Stack allocation, padding, and alignment


Solution 1

It's a gcc feature controlled by -mpreferred-stack-boundary=n, where the compiler tries to keep the stack aligned to 2^n bytes. If you changed n to 2, it would allocate only 8 bytes on the stack. The default value for n is 4, i.e. it will try to align to 16-byte boundaries.

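The reason you see the "default" 8 bytes and then 24 = 8 + 16 bytes is that, by the time subl runs, the stack already holds 8 bytes: the return address pushed by the call (later popped by ret) and the saved %ebp (later restored by leave). The compiled code must therefore first adjust the stack by another 8 bytes to get it aligned to 2^4 = 16, and the locals are then added on top in 16-byte chunks.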
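The arithmetic above can be sketched in a few lines of C. This is only a model fitted to the sizes reported in the question (8, 24, and 40 bytes) and to the n = 2 case mentioned in this answer; it is not GCC's actual frame-layout algorithm, and the names FRAME_OVERHEAD, round_up, and model_stack_adjust are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes already on the stack when `subl` runs in a 32-bit frame:
   the return address (4) plus the saved %ebp (4). */
enum { FRAME_OVERHEAD = 8 };

/* Round `size` up to the next multiple of `boundary` (a power of two). */
static size_t round_up(size_t size, size_t boundary)
{
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    return (size + boundary - 1) & ~(boundary - 1);
}

/* Hypothetical model of N in `subl $N, %esp` for
   -mpreferred-stack-boundary=n. Assumption: locals are rounded up to the
   boundary, and extra padding compensates for the 8 bytes already pushed. */
size_t model_stack_adjust(size_t local_bytes, unsigned n)
{
    size_t boundary = (size_t)1 << n;
    size_t padding = round_up(FRAME_OVERHEAD, boundary) - FRAME_OVERHEAD;
    return round_up(local_bytes, boundary) + padding;
}
```

With n = 4, model_stack_adjust(5, 4) gives 24 and model_stack_adjust(17, 4) gives 40, matching the question; with n = 2 the padding term drops to zero and a 5-byte buffer needs only 8 bytes.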

Solution 2

The SSEx family of instructions requires packed 128-bit vectors to be aligned to 16 bytes when they are accessed with the aligned load/store forms (movaps, movdqa and friends); a misaligned access faults, which typically shows up as a segfault. I.e. if you want to safely keep 16-byte vectors on the stack for use with SSE, the stack needs to be kept consistently aligned to 16 bytes. GCC accounts for that by default.
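To make the alignment condition concrete, here is a minimal sketch in plain C11 that checks whether an address is a multiple of 16, which is what the aligned SSE loads and stores expect. The helper names is_aligned and stack_vector_is_aligned are made up for illustration, and the pointer-to-integer conversion assumes a flat address space as on mainstream x86 platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero if the address `p` is a multiple of `alignment`
   (which must be a power of two). */
int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Put a 128-bit vector of four floats on the stack with an explicit
   16-byte alignment request (C11 _Alignas) and report whether its
   address satisfies the requirement of aligned SSE accesses. */
int stack_vector_is_aligned(void)
{
    _Alignas(16) float v[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    return is_aligned(v, 16);
}
```

If the stack pointer is already kept on a 16-byte boundary, the compiler can satisfy such a request with a fixed offset instead of realigning the stack in every function that needs it.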

Solution 3

I found this site, which has a decent explanation at the bottom of the page of why the stack might be larger than expected. Scale the concept up to a 64-bit machine and it might explain what you are seeing.

Solution 4

LWN has an article on memory alignment that you may find interesting.

Solution 5

The Mac OS X / Darwin x86 ABI requires a stack alignment of 16 bytes. This is not the case on other x86 platforms such as Linux, Win32, FreeBSD ...

Author by

David

Updated on July 09, 2022

Comments

  • David
    David almost 2 years

    I've been trying to gain a deeper understanding of how compilers generate machine code, and more specifically how GCC deals with the stack. In doing so I've been writing simple C programs, compiling them into assembly and trying my best to understand the outcome. Here's a simple program and the output it generates:

    asmtest.c:

    void main() {
        char buffer[5];
    }
    

    asmtest.s:

    pushl   %ebp
    movl    %esp, %ebp
    subl    $24, %esp
    leave
    ret
    

    What's puzzling to me is why 24 bytes are being allocated for the stack. I know that because of how the processor addresses memory, the stack has to be allocated in increments of 4, but if this were the case, we should only move the stack pointer by 8 bytes, not 24. For reference, a buffer of 17 bytes produces a stack pointer moved 40 bytes and no buffer at all moves the stack pointer 8. A buffer between 1 and 16 bytes inclusive moves ESP 24 bytes.

    Now, assuming the 8 bytes are a necessary constant (what are they needed for?), this means that we're allocating in chunks of 16 bytes. Why would the compiler be aligning in such a way? I'm using an x86_64 processor, but even a 64-bit word should only require an 8-byte alignment. Why the discrepancy?

    For reference I'm compiling this on a Mac running 10.5 with gcc 4.0.1 and no optimizations enabled.