Function Prologue and Epilogue in C


Solution 1

There are lots of resources out there that explain this.

Basically, as you somewhat described, "the stack" serves several purposes in the execution of a program:

  1. Keeping track of where to return to, when calling a function
  2. Storage of local variables in the context of a function call
  3. Passing arguments from calling function to callee.

The prologue is what happens at the beginning of a function. Its responsibility is to set up the stack frame of the called function. The epilogue is the exact opposite: it is what happens last in a function, and its purpose is to restore the stack frame of the calling (parent) function.

In IA-32 (x86) cdecl, the ebp register is used by the language to keep track of the function's stack frame. The esp register is used by the processor to point to the most recent addition (the top value) on the stack. (In optimized code, using ebp as a frame pointer is optional; other ways of unwinding the stack for exceptions are possible, so there's no actual requirement to spend instructions setting it up.)

The call instruction does two things: First it pushes the return address onto the stack, then it jumps to the function being called. Immediately after the call, esp points to the return address on the stack. (So on function entry, things are set up so a ret could execute to pop that return address back into EIP. The prologue points ESP somewhere else, which is part of why we need an epilogue.)
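
As a rough sketch of those two steps, here is a toy model in C. It is not real hardware, just an illustration: the Machine struct, the function names and the stack size are all made up, and in this model eip holds the address of the call instruction itself, so the return address is computed as eip plus the instruction length (5 bytes for a direct near call with a 32-bit displacement).

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only: a tiny "machine" with a 32-bit stack that grows
   downward, as on x86. All names here are hypothetical. */
enum { STACK_BYTES = 256 };

typedef struct {
    uint32_t mem[STACK_BYTES / 4]; /* stack memory, one 32-bit word per slot */
    uint32_t esp;                  /* byte offset of the top of the stack */
    uint32_t eip;                  /* address of the current instruction */
} Machine;

static void machine_init(Machine *m, uint32_t entry) {
    m->esp = STACK_BYTES;          /* empty stack: esp is just past the end */
    m->eip = entry;
}

static void push32(Machine *m, uint32_t value) {
    assert(m->esp >= 4);           /* guard against overflowing the toy stack */
    m->esp -= 4;                   /* push first moves esp down ... */
    m->mem[m->esp / 4] = value;    /* ... then stores at the new top */
}

/* call target: push the address of the instruction after the call,
   then jump to the target. */
static void machine_call(Machine *m, uint32_t target, uint32_t call_len) {
    uint32_t return_address = m->eip + call_len;
    push32(m, return_address);
    m->eip = target;
}
```

Starting with eip at 0x1000, machine_call(&m, 0x2000, 5) leaves eip at 0x2000 and esp pointing at a word that holds 0x1005, which is the state the prologue starts from.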

Then the prologue is executed:

push  ebp         ; Save the stack-frame base pointer (of the calling function).
mov   ebp, esp    ; Set the stack-frame base pointer to be the current
                  ; location on the stack.
sub   esp, N      ; Grow the stack by N bytes to reserve space for local variables

At this point, we have:

...
ebp + 4:    Return address
ebp + 0:    Calling function's old ebp value
ebp - 4:    (local variables)
...
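
To tie that layout back to C, here is a small, hypothetical function with comments showing where each piece would typically live. The offsets assume an unoptimized 32-bit cdecl build that keeps ebp as a frame pointer; an optimizing compiler is free to keep everything in registers or inline the call, so treat the comments as a sketch rather than guaranteed output.

```c
/* Hypothetical example; the names are made up for illustration. */
int add_and_double(int a, int b)   /* a: [ebp + 8], b: [ebp + 12] */
{
    int sum = a + b;               /* sum: a local, e.g. [ebp - 4] */
    return sum * 2;                /* cdecl returns an int in eax */
}

int caller(void)
{
    /* A cdecl caller would typically do something like:
         push 3                ; arguments, right to left
         push 2
         call add_and_double   ; pushes the return address
         add  esp, 8           ; caller removes the arguments   */
    return add_and_double(2, 3);
}
```

The arguments sit above the return address (ebp + 8 and up) because the caller pushed them before the call, which is the third purpose of the stack listed earlier.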

The epilogue:

mov   esp, ebp    ; Put the stack pointer back where it was when this function
                  ; was called.
pop   ebp         ; Restore the calling function's stack frame.
ret               ; Return to the calling function.
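
The point of the pairing is that the epilogue undoes exactly what the prologue did. Below is a minimal sketch of that round trip as a toy model in C; the Cpu struct and every function name are assumptions for illustration, not anything a real compiler or CPU exposes.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model, not real hardware: a small downward-growing stack plus the
   three registers involved. Every name here is made up for illustration. */
enum { STACK_BYTES = 256 };

typedef struct {
    uint32_t mem[STACK_BYTES / 4];
    uint32_t esp, ebp, eip;
} Cpu;

static void push32(Cpu *c, uint32_t v) {
    assert(c->esp >= 4);
    c->esp -= 4;
    c->mem[c->esp / 4] = v;
}

static uint32_t pop32(Cpu *c) {
    assert(c->esp + 4 <= STACK_BYTES);
    uint32_t v = c->mem[c->esp / 4];
    c->esp += 4;
    return v;
}

static uint32_t load32(const Cpu *c, uint32_t addr) {
    assert(addr % 4 == 0 && addr < STACK_BYTES);
    return c->mem[addr / 4];
}

/* push ebp / mov ebp, esp / sub esp, N */
static void prologue(Cpu *c, uint32_t local_bytes) {
    push32(c, c->ebp);
    c->ebp = c->esp;
    assert(c->esp >= local_bytes);
    c->esp -= local_bytes;
}

/* mov esp, ebp / pop ebp (the leave instruction does the same two steps) */
static void epilogue(Cpu *c) {
    c->esp = c->ebp;
    c->ebp = pop32(c);
}

/* ret: pop the return address into eip */
static void ret(Cpu *c) {
    c->eip = pop32(c);
}
```

If you push a return address (standing in for call), run prologue, then epilogue, then ret, both esp and ebp end up with the values they had before the call, and eip holds the return address. In between, load32(&c, c.ebp + 4) reads the return address and load32(&c, c.ebp) reads the caller's saved ebp, matching the layout shown above.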

Solution 2

  1. C Function Call Conventions and the Stack explains well the concept of a call stack

  2. Function prologue briefly explains the assembly code and the hows and whys.

  3. The gen on function perilogues

Solution 3

I am quite late to the party & I am sure that in the last 7 years since the question was asked, you'd have gotten a way clearer understanding of things, that is of course if you chose to pursue the question any further. However, I thought I would still give a shot at especially the why part of the prolog & the epilog.

Also, the accepted answer elegantly & quite simply explains the how of the epilog & the prolog, with good references. I only intend to supplement that answer with the why (at least the logical why) part.

I will quote the below from the accepted answer & try to extend its explanation.

In IA-32 (x86) cdecl, the ebp register is used by the language to keep track of the function's stack frame. The esp register is used by the processor to point to the most recent addition (the top value) on the stack.

The call instruction does two things: First it pushes the return address onto the stack, then it jumps to the function being called. Immediately after the call, esp points to the return address on the stack.

The last line in the quote above says immediately after the call, esp points to the return address on the stack.

Why's that?

So let's say the code currently being executed is in the following situation, as shown in the (really badly drawn) diagram below:

[Diagram missing: the instruction stream, with EIP pointing at address 2]

So the next instruction to be executed is, say, at address 2. This is where EIP is pointing. The current instruction is a function call (which translates to the assembly call instruction).

Normally, because EIP is pointing to the very next instruction, that would indeed be the next instruction to get executed. But the call diverts execution from the current path, so EIP's value has to change. Why? Because now another instruction, somewhere else, say at address 1234 (or whatever), needs to be executed. But in order to complete the execution flow of the program as the programmer intended, once the diversion is done, control must return to address 2, as that is what would have been executed next had the diversion not happened. Let us call this address 2 the return address in the context of the call that is being made.

Problem 1

So, before the diversion actually happens, the return address, 2, would need to be stored somewhere temporarily.

There could have been many choices: storing it in any of the available registers, or in some memory location, etc. But (I believe for good reason) it was decided that the return address would be stored on the stack.

So what needs to be done now is increment the ESP (the stack pointer) such that the top of the stack now points at the next address on the stack. So TOS' (TOS before the increment) which was pointing to the address, say 292, now gets incremented & starts pointing to the address 293. That is where we put our return address 2. So something like this:

[Diagram missing: the stack with TOS moved from 292 to 293 and the return address 2 stored at 293]

So it looks like now we have achieved our goal of temporarily storing the return address somewhere. We should now just go about making the diversion call. And we could. But there's a small problem. During the execution of the called function, the stack pointer, along with the other register values, could be manipulated multiple times.

Problem 2

So, although our return address is still stored on the stack, at location 293, after the called function finishes executing, how would the execution flow know that it should now go to 293 & that's where it would find the return address?

So (I believe for good reason again) one of the ways of solving the above problem could be to store the stack address 293 (where the return address is) in a (designated) register called EBP. But then what about the contents of EBP? Would that not be overwritten? Sure, that's a valid point. So let's store the current contents of EBP on to the stack & then store this stack address into EBP. Something like this:

[Diagram missing: the stack with the old EBP value xxx stored at 294, and EBP now holding 294]

The stack pointer is incremented. The current value of EBP (denoted as EBP'), which is say xxx, is stored onto the top of the stack, i.e. at the address 294. Now that we have taken a backup of the current contents of EBP, we can safely put any other value onto the EBP. So we put the current address of the top of the stack, that is the address 294, in EBP.

With the above strategy in place, we solve Problem 2 discussed above. How? When the execution flow wants to know where to fetch the return address from, it would:

  • first get the value from EBP out and point the ESP to that value. In our case, this would make TOS (top of stack) point to the address 294 (since that is what is stored in EBP).

  • Then it would restore the previous value of EBP. To do this it would simply take the value at 294 (the TOS), which is xxx (which was actually the older value of EBP), & put it back to EBP.

  • Then it would decrement the stack pointer to go to the next lower address in the stack which is 293 in our case. Thus finally reaching 293 (see that's what our problem 2 was). That's where it would find the return address, which is 2.

  • It will finally pop this 2 out into EIP. Remember, that's the instruction that would have been executed next had the diversion not happened.

And the steps that we just saw being performed, with all the jugglery, to store the return address temporarily & then retrieve it are exactly what gets done by call, the function prolog (at the start of the called function), the epilog (just before the function's ret) & ret itself. The how was already answered, we just answered the why as well.

Just an end note: for the sake of brevity, I have ignored the fact that stack addresses may grow the other way round. On x86 they do: a push decrements ESP and a pop increments it, so the addresses in the walkthrough above would go down from 292 rather than up.
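
For what it's worth, here is a sketch of the same walkthrough with the stack growing downward, the way x86 does it. This is a toy model in C under stated assumptions: 4-byte slots, the starting top of stack at 292 and the return address 2 from the example above, a callee at 1234, and an arbitrary stand-in value of 400 for the old EBP ("xxx"). The Toy and Trace types and the walkthrough function are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

enum { MEM_BYTES = 512 };

typedef struct {
    uint32_t mem[MEM_BYTES / 4];
    uint32_t esp, ebp, eip;
} Toy;

typedef struct {
    uint32_t ret_slot;        /* where call stored the return address */
    uint32_t saved_ebp_slot;  /* where the prolog stored the old EBP */
    uint32_t final_esp, final_ebp, final_eip;
} Trace;

static void push32(Toy *t, uint32_t v) {
    assert(t->esp >= 4);
    t->esp -= 4;              /* downward growth: push decrements ESP */
    t->mem[t->esp / 4] = v;
}

static uint32_t pop32(Toy *t) {
    assert(t->esp + 4 <= MEM_BYTES);
    uint32_t v = t->mem[t->esp / 4];
    t->esp += 4;              /* pop increments ESP */
    return v;
}

static Trace walkthrough(void) {
    /* EIP already points at the next instruction (address 2). */
    Toy t = { .esp = 292, .ebp = 400, .eip = 2 };
    Trace tr;

    push32(&t, t.eip);        /* call: save the return address ... */
    tr.ret_slot = t.esp;
    t.eip = 1234;             /* ... and take the diversion */

    push32(&t, t.ebp);        /* prolog: back up the old EBP */
    tr.saved_ebp_slot = t.esp;
    t.ebp = t.esp;            /* EBP now remembers where we are */

    t.esp -= 16;              /* the callee moves ESP around freely */

    t.esp = t.ebp;            /* epilog: point ESP back at the saved EBP */
    t.ebp = pop32(&t);        /* restore the old EBP */
    t.eip = pop32(&t);        /* ret: pop the return address into EIP */

    tr.final_esp = t.esp;
    tr.final_ebp = t.ebp;
    tr.final_eip = t.eip;
    return tr;
}
```

In this version the return address lands at 288 and the saved EBP at 284, and after the ret everything is back where it started: ESP is 292 again, EBP is 400, and EIP is 2.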

Author by

user1843665

Updated on March 16, 2021

Comments

  • user1843665 about 3 years

    I know data in nested function calls goes on the stack. The stack itself implements a step-by-step method for storing and retrieving data as functions get called or return. These methods are best known as the prologue and epilogue.

    I tried with no success to find material on this topic. Do you guys know any resource (site, video, article) about how function prologue and epilogue work generally in C? Or if you can explain, that would be even better.

    P.S.: I just want a general view, not too detailed.

  • Ivan Kush over 7 years
    depends on calling convention
  • ZeZNiQ about 4 years
    Just for the sake of being complete, it's good to mention that the ret instruction does the opposite of the call instruction. That is, ret must also do 2 things: pop the return address off the stack using esp, then jump to that address to resume execution from there.
  • ZeZNiQ about 4 years
    Then it raises the question of who cleans up the passed arguments, and when. My guess would be that, as per the x86 cdecl calling convention, the caller must be the one to push args BEFORE executing the call instruction, and therefore it must be the same caller who cleans the args off the stack AFTER the callee's ret.
  • Peter Cordes about 3 years
    @ZeZNiQ: This shows the function using ret, not ret 12 or whatever, so it's a caller-pops convention like i386 System V, or MSVC cdecl, where code in the caller right after call foo finds ESP unmodified by the call. So the caller could mov new args into that space for another call, instead of add esp,12 / push. Calling a function that ends with ret 12 would (from the caller's POV) be like running add esp,12 after whatever code the function ran.
  • Peter Cordes about 3 years
    Some functions need to save more registers as part of their prologue, and reserve stack space with sub esp, N. If they don't do any more than push ebp / mov ebp, esp (you got that instruction backwards), then most compilers will use pop ebp / ret as the epilogue, because pop ebp is cheaper than leave, and does the same thing if ESP is still pointing to the saved EBP. Also, some calling conventions are callee-pops and end with ret 8 or whatever.
  • Arkan almost 2 years
    When pushing onto the stack, ESP is decremented (not incremented).
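
The caller-pops versus callee-pops distinction raised in the comments above can be sketched with a toy model in C. This is only an illustration under stated assumptions: three 4-byte arguments, a made-up return address of 0x1005, and hypothetical names (Cpu, ret_n, esp_after_call). The callee's body is elided, so its prologue and epilogue are assumed to have already cancelled out.

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_BYTES = 256 };

typedef struct {
    uint32_t mem[STACK_BYTES / 4];
    uint32_t esp, eip;
} Cpu;

static void push32(Cpu *c, uint32_t v) {
    assert(c->esp >= 4);
    c->esp -= 4;
    c->mem[c->esp / 4] = v;
}

static uint32_t pop32(Cpu *c) {
    assert(c->esp + 4 <= STACK_BYTES);
    uint32_t v = c->mem[c->esp / 4];
    c->esp += 4;
    return v;
}

/* ret N: pop the return address, then release N more bytes of arguments.
   Plain ret is the N == 0 case. */
static void ret_n(Cpu *c, uint32_t arg_bytes) {
    c->eip = pop32(c);
    assert(c->esp + arg_bytes <= STACK_BYTES);
    c->esp += arg_bytes;
}

/* Simulate a caller pushing three 4-byte args and calling a callee that
   returns with "ret callee_pops". Returns esp as the caller sees it
   right after the call. */
static uint32_t esp_after_call(Cpu *c, uint32_t callee_pops) {
    push32(c, 3);               /* args, right to left */
    push32(c, 2);
    push32(c, 1);
    push32(c, 0x1005);          /* call: pushes the return address */
    ret_n(c, callee_pops);      /* callee body elided */
    return c->esp;
}
```

With callee_pops set to 0 (plain ret, caller-pops) the caller finds esp still 12 bytes below where it started and has to run add esp, 12 itself or reuse the space; with callee_pops set to 12 (ret 12) esp is already back at its starting value.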