How to Run x86 Assembly on an Intel Mac
In my undergraduate Computer Systems course, we learned x86-64 assembly language in the AT&T syntax, but we never actually ran any on a real machine. Here, I show how to do that, starting by inspecting the assembly output of a C compiler. I will only show the bare essentials of getting simple programs up and running, assuming you already know enough of the language for this to be useful, and I will prioritize built-in tools over more general-purpose ones. Vital prerequisite: you need to be running on an Intel Mac, i.e., not one of the newer Apple silicon ones. You also need the Xcode command line tools; if they are not installed yet, this command prompts you to install them:
% xcode-select --install
You can find the code referenced below in this GitHub repository.
Inspecting the C Compiler’s Output
Take the following hello world program in C and save it as hello_c.c:
#include <stdio.h>
int main() {
    printf("Hello, world!\n");
    return 0;
}
Hopefully if you compile and run this, you’ll get what you expect:
% gcc hello_c.c -o hello_c && ./hello_c
Hello, world!
Now instead of telling the compiler to give us an executable, let’s have it give us the assembly:
% gcc hello_c.c -S -o hello_c.s
On my machine, this produces the following hello_c.s:
.section __TEXT,__text,regular,pure_instructions
.build_version macos, 14, 0 sdk_version 14, 0
.globl _main ## -- Begin function main
.p2align 4, 0x90
_main: ## @main
.cfi_startproc
## %bb.0:
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq %rsp, %rbp
.cfi_def_cfa_register %rbp
subq $16, %rsp
movl $0, -4(%rbp)
leaq L_.str(%rip), %rdi
movb $0, %al
callq _printf
xorl %eax, %eax
addq $16, %rsp
popq %rbp
retq
.cfi_endproc
## -- End function
.section __TEXT,__cstring,cstring_literals
L_.str: ## @.str
.asciz "Hello, world!\n"
.subsections_via_symbols
Probably you recognize some of that and probably some of it is gibberish. Let’s press forward; it turns out that much of the gibberish will be unnecessary. To take this assembly file and run it, first assemble it:
% as hello_c.s -o hello_c.o
then link it with the system library:
% ld -lSystem hello_c.o -o hello_c_asm
If this gives you an error like ld: library 'System' not found, tell the linker where to find that library:
% ld -L/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/lib -lSystem hello_c.o -o hello_c_asm
Then run the result:
% ./hello_c_asm
Hello, world!
Okay, we did it! We ran some assembly!
Simplifying
It turns out that the hello world in assembly that the compiler generated for us can be simplified to this:
# First, let's define the message as a C-style string
my_message: .asciz "Hello, world!\n"
# Now, the actual program
.globl _main # Export the _main symbol so it can be called from outside the program
_main: # Code execution starts here
leaq my_message(%rip), %rdi # Load the message into the first argument register
# To call printf, we have to do some stack setup and teardown:
pushq %rbp # Save the base pointer
movq %rsp, %rbp # Save the stack pointer
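andq $-16, %rsp # Align the stack pointer to a 16-byte boundary
movb $0, %al # printf is variadic: %al holds the number of vector registers used (none)
callq _printf # Call printf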
movq %rbp, %rsp # Restore the stack pointer
popq %rbp # Restore the base pointer
movq $0, %rax # Load 0 into the return register
retq # Return
Save that as hello_asm.s, assemble, link, and execute, and you should get the same thing:
% as hello_asm.s -o hello_asm.o
% ld -L/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/lib -lSystem hello_asm.o -o hello_asm
% ./hello_asm
Hello, world!
Now that we know how to assemble and link manually, we can jump up a layer of abstraction and have gcc take care of that for us:
% gcc hello_asm.s -o hello_asm && ./hello_asm
Hello, world!
So that’s the answer: that’s how you actually write and run assembly programs on an Intel Mac. It’s not much more involved than writing them for a Computer Systems exam.
Next Steps
Whenever you aren’t sure how to do something in assembly, you can write a C program to do it and see what the compiler says (it can help to declare variables as volatile so they don’t get optimized away). Here’s a program that does a bit of math and prints the result, just as you would with printf("%d\n", ...) in C:
# A format string
fmt_str: .asciz "%d\n"
# The program
.globl _main
_main:
movq $42, %rsi # Second argument register = 42
addq $5, %rsi # Second argument register += 5
leaq fmt_str(%rip), %rdi # Format string into first argument register
# Same stack alignment stuff as in hello_asm.s
pushq %rbp
movq %rsp, %rbp
andq $-16, %rsp
movb $0, %al # printf is variadic: %al holds the number of vector registers used (0 here)
callq _printf # Call printf
movq %rbp, %rsp
popq %rbp
movq $0, %rax # Load 0 into the return register
retq # Return
And here’s a more complicated program with jumps to create loops and if/else structures. It’s a non-golfed translation of the Java pi program from this project. The C version:
#include <stdio.h>
int main() {
    // Use a temporary variable to avoid "unsequenced modification and access" warnings
    long x = 0;
    for (long t=0,n=0,y=1;;) {
        x = (int)((t++*y+n++)>>7);
        n -= (y*y+x*x)>>62;
        y = x;
        // Print every 2^(64-36)=268,435,456 iterations
        if (t<<36 == 0) printf("%.17g\n",n*4./t);
    }
    return 0; // Unreachable
}
And my manual translation to assembly:
fmt_str: .asciz "%.17g\n" # Format string
four: .double 4.0 # Double constant
.globl _main
_main:
pushq %rbx
movq $0, %r12 # t = 0
movq $0, %r13 # n = 0
movq $1, %r14 # y = 1
movq $0, %r15 # y2 = 0
i:
imulq %r12, %r14 # y *= t
addq %r13, %r14 # y += n
sarq $7, %r14 # y >>= 7
movslq %r14d, %r14 # y = (int)y
movq %r15, %rbx # tmp = y2
movq %r14, %r15 # y2 = y
imulq %r15, %r15 # y2 *= y2
addq %r15, %rbx # tmp += y2
sarq $62, %rbx # tmp >>= 62
subq %rbx, %r13 # n -= tmp
incq %r12 # t++
incq %r13 # n++
movq %r12, %rbx # tmp = t
salq $36, %rbx # tmp <<= 36
test %rbx, %rbx
jne i # Jump if tmp != 0
# Floating point math within the if block
cvtsi2sdq %r13, %xmm0 # xmm0 = (double)n
mulsd four(%rip), %xmm0 # xmm0 *= 4.
cvtsi2sdq %r12, %xmm1 # xmm1 = (double)t
divsd %xmm1, %xmm0 # xmm0 /= xmm1
# printf within the if block
leaq fmt_str(%rip), %rdi # Format string into first argument
pushq %rbp # Stack alignment
movq %rsp, %rbp
andq $-16, %rsp
movb $1, %al # printf is variadic: one vector register (%xmm0) carries an argument
callq _printf # Call printf
movq %rbp, %rsp
popq %rbp
jmp i
# Unreachable
popq %rbx
movq $0, %rax
retq
Interestingly, the assembly version runs about 8% faster than the C version compiled with -Ofast (46 seconds vs. 50 seconds to run 17 billion iterations), and, as you can see, the assembly version is nothing special beyond keeping everything in registers in the inner loop. It is left to the reader whether this suggests that there is still a performance advantage to hand-coded assembly.