How to Run x86 Assembly on an Intel Mac

In my undergraduate Computer Systems course, we learned x86-64 assembly language in AT&T syntax, but we never actually ran any on a real machine. Here, I show how to do that, starting by inspecting the assembly output of a C compiler. I will show only the bare essentials of getting simple programs up and running, assuming you already know enough of the language for this to be useful, and I will favor built-in tools over more general-purpose ones. Vital prerequisite: you need to be running an Intel Mac, i.e., not one of the newer Apple silicon ones. You also need the Xcode command line tools, whose installer you can trigger with:

% xcode-select --install

You can find the code referenced below in this GitHub repository.

Inspecting the C Compiler’s Output

Take the following hello world program in C and save it as hello_c.c:

#include <stdio.h>
int main() {
    printf("Hello, world!\n");
    return 0;
}

Hopefully if you compile and run this, you’ll get what you expect:

% gcc hello_c.c -o hello_c && ./hello_c 
Hello, world!

Now instead of telling the compiler to give us an executable, let’s have it give us the assembly:

% gcc hello_c.c -S -o hello_c.s

On my machine, this produces the following hello_c.s:

	.section	__TEXT,__text,regular,pure_instructions
	.build_version macos, 14, 0	sdk_version 14, 0
	.globl	_main                           ## -- Begin function main
	.p2align	4, 0x90
_main:                                  ## @main
	.cfi_startproc
## %bb.0:
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register %rbp
	subq	$16, %rsp
	movl	$0, -4(%rbp)
	leaq	L_.str(%rip), %rdi
	movb	$0, %al
	callq	_printf
	xorl	%eax, %eax
	addq	$16, %rsp
	popq	%rbp
	retq
	.cfi_endproc
                                        ## -- End function
	.section	__TEXT,__cstring,cstring_literals
L_.str:                                 ## @.str
	.asciz	"Hello, world!\n"

.subsections_via_symbols

You probably recognize some of that, and some of it probably looks like gibberish. Let’s press forward; it turns out that much of the gibberish is unnecessary. To run this assembly file, first assemble it:

% as hello_c.s -o hello_c.o

then link it with the system library:

% ld -lSystem hello_c.o -o hello_c_asm

If this gives you an error like ld: library 'System' not found, tell the linker where to find that library:

% ld -L/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/lib -lSystem hello_c.o -o hello_c_asm

Then run the result:

% ./hello_c_asm
Hello, world!

Okay, we did it! We ran some assembly!

Simplifying

It turns out that the hello world in assembly that the compiler generated for us can be simplified to this:

# First, let's define the message as a C-style string
my_message: .asciz "Hello, world!\n"

# Now, the actual program
.globl _main  # Export the _main symbol so that code outside this file can call it
_main:  # Code execution starts here
    leaq my_message(%rip), %rdi  # Load the address of the message into the first argument register
    
    # To call printf, we have to do some stack setup and teardown:
    pushq %rbp  # Save the base pointer
    movq %rsp, %rbp  # Save the stack pointer
    andq $-16, %rsp  # Align the stack pointer
    callq _printf  # Call printf
    movq %rbp, %rsp  # Restore the stack pointer
    popq %rbp  # Restore the base pointer

    movq $0, %rax  # Load 0 into the return register
    retq  # Return

Save that as hello_asm.s, assemble, link, and execute, and you should get the same thing:

% as hello_asm.s -o hello_asm.o
% ld -L/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/lib -lSystem hello_asm.o -o hello_asm
% ./hello_asm
Hello, world!
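The andq $-16, %rsp line in hello_asm.s is worth a closer look. As a rough illustration (not part of the original programs), the same arithmetic can be written in C; the helper name align_down_16 is made up for this sketch. Because -16 is all ones except for its low four bits, the AND clears those bits and rounds the address down to a multiple of 16, which is the stack alignment the x86-64 calling convention expects at a call instruction.

```c
#include <stdint.h>

/* Hypothetical helper: the same arithmetic as `andq $-16, %rsp`,
   applied to an ordinary 64-bit integer instead of the stack pointer. */
uint64_t align_down_16(uint64_t addr) {
    /* (uint64_t)-16 is 0xFFFFFFFFFFFFFFF0: the low four bits are cleared. */
    return addr & (uint64_t)-16;
}
```

For example, align_down_16(0x7ffeefbff4a8) gives 0x7ffeefbff4a0, and an address that is already a multiple of 16 comes back unchanged.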

Now that we know how to assemble and link manually, we can jump up a layer of abstraction and have gcc take care of that for us:

% gcc hello_asm.s -o hello_asm && ./hello_asm
Hello, world!

So that’s the answer: that’s how you actually write and run assembly programs on an Intel Mac. It’s not much more involved than writing them for a Computer Systems exam.

Next Steps

Whenever you aren’t sure how to do something in assembly, you can write a C program that does it and see what the compiler produces (it can help to declare variables as volatile so they don’t get optimized away). Here is a program that does a bit of math and prints the result, just as printf("%d\n", ...) would in C:

# A format string
fmt_str: .asciz "%d\n"

# The program
.globl _main
_main:
    movq $42, %rsi  # Second argument register = 42
    addq $5, %rsi  # Second argument register += 5
    leaq fmt_str(%rip), %rdi  # Format string into first argument register
    
    # Same stack alignment stuff as in hello_asm.s
    pushq %rbp
    movq %rsp, %rbp
    andq $-16, %rsp
    callq _printf  # Call printf
    movq %rbp, %rsp
    popq %rbp
	
    movq $0, %rax  # Load 0 into the return register
    retq  # Return
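To apply the "ask the compiler" advice to this program, one option is to start from a C counterpart and read the assembly it compiles to. This is a minimal sketch, and the function names are hypothetical; volatile is there so the compiler cannot fold the sum into the constant 47.

```c
#include <stdio.h>

/* Hypothetical helper mirroring the assembly above: 42 + 5. */
int forty_two_plus_five(void) {
    volatile int x = 42;  /* volatile forces real loads and stores of x */
    x += 5;
    return x;
}

/* Prints the result the same way the assembly program does. */
void print_sum(void) {
    printf("%d\n", forty_two_plus_five());
}
```

Running gcc with -S on a file containing this, as in the hello world example, should produce assembly in which the load, the add, and the store are all visible.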

And here’s a more complicated program with jumps to create loops and if/else structures. It’s a non-golfed translation of the Java pi program from this project. The C version:

#include <stdio.h>
int main() {
    // Use a temporary variable to avoid "unsequenced modification and access" warnings
    long x = 0;
    for (long t=0,n=0,y=1;;) {
        x = (int)((t++*y+n++)>>7);
        n -= (y*y+x*x)>>62;
        y = x;
        // Print every 2^(64-36)=268,435,456 iterations
        if (t<<36 == 0) printf("%.17g\n",n*4./t);
    }
    return 0;  // Unreachable
}
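The print condition in the loop above is compact enough to deserve a worked example. Shifting a 64-bit value left by 36 discards its top 36 bits, so the result is zero exactly when the remaining low 28 bits are all zero, which happens once every 2^28 = 268,435,456 values of t. The sketch below isolates that test; the name should_print is invented here, and it uses an unsigned type so the shift is fully defined in C.

```c
#include <stdint.h>

/* Hypothetical helper: nonzero when the low 28 bits of t are all zero,
   which is what `t<<36 == 0` checks for a 64-bit t. */
int should_print(uint64_t t) {
    return (t << 36) == 0;
}
```

With this definition, should_print(1ULL << 28) is true, while should_print((1ULL << 28) + 1) and should_print(1ULL << 27) are both false.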

And my manual translation to assembly:

fmt_str: .asciz "%.17g\n"      # Format string
four: .double 4.0              # Double constant
.globl _main
_main:
    pushq %rbx
    movq $0, %r12              # t = 0
    movq $0, %r13              # n = 0
    movq $1, %r14              # y = 1
    movq $0, %r15              # y2 = 0
i:
    imulq %r12, %r14           # y *= t
    addq %r13, %r14            # y += n
    sarq $7, %r14              # y >>= 7
    movslq %r14d, %r14         # y = (int)y
    movq %r15, %rbx            # tmp = y2
    movq %r14, %r15            # y2 = y
    imulq %r15, %r15           # y2 *= y2
    addq %r15, %rbx            # tmp += y2
    sarq $62, %rbx             # tmp >>= 62
    subq %rbx, %r13            # n -= tmp
    incq %r12                  # t++
    incq %r13                  # n++
    movq %r12, %rbx            # tmp = t
    salq $36, %rbx             # tmp <<= 36
    testq %rbx, %rbx           # Set flags from tmp
    jne i                      # Jump if tmp != 0

    # Floating point math within the if block
    cvtsi2sdq %r13, %xmm0      # xmm0 = (double)n
    mulsd four(%rip), %xmm0    # xmm0 *= 4.
    cvtsi2sdq %r12, %xmm1      # xmm1 = (double)t
    divsd %xmm1, %xmm0         # xmm0 /= xmm1

    # printf within the if block
    leaq fmt_str(%rip), %rdi   # Format string into first argument
    movb $1, %al               # Variadic call: one vector register (%xmm0) carries an argument
    pushq %rbp                 # Stack alignment
    movq %rsp, %rbp
    andq $-16, %rsp
    callq _printf              # Call printf
    movq %rbp, %rsp
    popq %rbp

    jmp i
    # Unreachable
    popq %rbx
    movq $0, %rax
    retq
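One way to gain some confidence in a hand translation like this is to mirror it step by step in C and compare it with the original loop body. The sketch below does that. The struct and function names are invented for illustration, with fields named after the variables in the assembly comments. It assumes, as gcc and clang do, that >> on a negative signed value is an arithmetic shift (like sarq) and that narrowing to int32_t keeps the low 32 bits (like movslq).

```c
#include <stdint.h>

/* Hypothetical struct naming the values held in %r12..%r15. */
typedef struct { int64_t t, n, y, y2; } regs;

/* One pass through the loop body, in the same order as the assembly. */
void asm_step(regs *r) {
    r->y = r->y * r->t + r->n;   /* imulq %r12, %r14; addq %r13, %r14 */
    r->y >>= 7;                  /* sarq $7, %r14 */
    r->y = (int32_t)r->y;        /* movslq %r14d, %r14 */
    int64_t tmp = r->y2;         /* tmp = previous y squared */
    r->y2 = r->y * r->y;         /* y2 = new y squared */
    tmp += r->y2;
    tmp >>= 62;                  /* sarq $62, %rbx */
    r->n -= tmp;
    r->t++;
    r->n++;
}

/* One pass through the C version's loop body, with the ++ side effects
   written out as separate statements. */
void c_step(int64_t *t, int64_t *n, int64_t *y) {
    int64_t x = (int32_t)((*t * *y + *n) >> 7);
    (*t)++;
    (*n)++;
    *n -= (*y * *y + x * x) >> 62;
    *y = x;
}
```

Starting both from t = 0, n = 0, y = 1 (and y2 = 0 on the assembly side), the two functions can be called in lockstep and their t, n, and y compared after every step.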

Interestingly, the assembly version runs about 8% faster than the C version compiled with -Ofast (46 seconds vs. 50 seconds to run 17 billion iterations), and, as you can see, the assembly version does nothing special beyond keeping everything in registers in the inner loop. Whether this suggests that hand-coded assembly still has a performance advantage is left to the reader.
