What happens when a computer program runs?

Solution 1

It really depends on the system, but modern OSes with virtual memory tend to load their process images and allocate memory something like this:

+---------+
|  stack  |  function-local variables, return addresses, return values, etc.
|         |  often grows downward, commonly accessed via "push" and "pop" (but can be
|         |  accessed randomly, as well; disassemble a program to see)
+---------+
| shared  |  mapped shared libraries (C libraries, math libs, etc.)
|  libs   |
+---------+
|  hole   |  unused memory allocated between the heap and stack "chunks", spans the
|         |  difference between your max and min memory, minus the other totals
+---------+
|  heap   |  dynamic, random-access storage, allocated with 'malloc' and the like.
+---------+
|   bss   |  Uninitialized global variables; must be in read-write memory area
+---------+
|  data   |  data segment, for globals and static variables that are initialized
|         |  (can further be split up into read-only and read-write areas, with
|         |  read-only areas being stored elsewhere in ROM on some systems)
+---------+
|  text   |  program code, this is the actual executable code that is running.
+---------+

This is the general process address space on many common virtual-memory systems. The "hole" is the size of your total memory, minus the space taken up by all the other areas; this gives a large amount of space for the heap to grow into. This is also "virtual", meaning it maps to your actual memory through a translation table, and may actually be stored at any location in physical memory. This is done to protect one process from accessing another process's memory, and to make each process think it's running on a complete system.
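
If you're curious, on Linux you can inspect this layout directly, because each process's map is exposed in /proc/self/maps. Here's a minimal sketch, assuming a Linux system (the file doesn't exist elsewhere), that prints its own layout; the output lines are tagged [heap], [stack], the mapped libraries, and so on:

/* Minimal sketch (Linux-specific): print this process's own memory map,
 * which shows the text, data, heap, shared-library and stack regions
 * described above. */
#include <stdio.h>

int main(void)
{
    FILE *maps = fopen("/proc/self/maps", "r");  /* Linux exposes the layout here */
    if (maps == NULL) {
        perror("fopen");
        return 1;
    }
    char line[512];
    while (fgets(line, sizeof line, maps) != NULL)
        fputs(line, stdout);   /* lines end in [heap], [stack], library paths, etc. */
    fclose(maps);
    return 0;
}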

Note that the positions of, e.g., the stack and heap may be in a different order on some systems (see Billy O'Neal's answer below for more details on Win32).

Other systems can be very different. DOS, for instance, ran in real mode, and its memory layout when running programs looked quite different:

+-----------+ top of memory
| extended  | above the high memory area, and up to your total memory; needed drivers to
|           | be able to access it.
+-----------+ 0x110000
|  high     | just over 1MB->1MB+64KB, used by 286s and above.
+-----------+ 0x100000
|  upper    | upper memory area, from 640kb->1MB, had mapped memory for video devices, the
|           | DOS "transient" area, etc. some was often free, and could be used for drivers
+-----------+ 0xA0000
| USER PROC | user process address space, from the end of DOS up to 640KB
+-----------+
|command.com| DOS command interpreter
+-----------+ 
|    DOS    | DOS permanent area, kept as small as possible, provided routines for display,
|  kernel   | *basic* hardware access, etc.
+-----------+ 0x600
| BIOS data | BIOS data area, contained simple hardware descriptions, etc.
+-----------+ 0x400
| interrupt | the interrupt vector table, starting from 0 and going to 1k, contained 
|  vector   | the addresses of routines called when interrupts occurred.  e.g.
|  table    | interrupt 0x21 checked the address at 0x21*4 and far-jumped to that 
|           | location to service the interrupt.
+-----------+ 0x0

You can see that DOS allowed direct access to the operating system memory, with no protection, which meant that user-space programs could generally directly access or overwrite anything they liked.

Within the process address space, however, programs tended to look similar, only the areas were described as the code segment, data segment, heap, stack segment, and so on, and they were mapped a little differently. But most of the general areas were still there.

After loading the program and the necessary shared libs into memory, and distributing the parts of the program into the right areas, the OS begins executing your process at its entry point (wherever your main function is), and your program takes over from there, making system calls as needed.

Different systems (embedded, whatever) may have very different architectures, such as stackless systems, Harvard architecture systems (with code and data being kept in separate physical memory), systems which actually keep the BSS in read-only memory (initially set by the programmer), etc. But this is the general gist.


You said:

I also know that a computer program uses two kinds of memory: stack and heap, which are also part of the primary memory of the computer.

"Stack" and "heap" are just abstract concepts, rather than (necessarily) physically distinct "kinds" of memory.

A stack is merely a last-in, first-out data structure. In the x86 architecture, it can actually be addressed randomly by using an offset from the end, but the most common functions are PUSH and POP to add and remove items from it, respectively. It is commonly used for function-local variables (so-called "automatic storage"), function arguments, return addresses, etc. (more below)

A "heap" is just a nickname for a chunk of memory that can be allocated on demand, and is addressed randomly (meaning, you can access any location in it directly). It is commonly used for data structures that you allocate at runtime (in C++, using new and delete, and malloc and friends in C, etc).

The stack and heap, on the x86 architecture, both physically reside in your system memory (RAM), and are mapped through virtual memory allocation into the process address space as described above.

The registers (still on x86), physically reside inside the processor (as opposed to RAM), and are loaded by the processor, from the TEXT area (and can also be loaded from elsewhere in memory or other places depending on the CPU instructions that are actually executed). They are essentially just very small, very fast on-chip memory locations that are used for a number of different purposes.

Register layout is highly dependent on the architecture (in fact, registers, the instruction set, and memory layout/design, are exactly what is meant by "architecture"), and so I won't expand upon it, but recommend you take an assembly language course to understand them better.


Your question:

At what point is the stack used for the execution of the instructions? Instructions go from the RAM, to the stack, to the registers?

The stack (in systems/languages that have and use them) is most often used like this:

int mul( int x, int y ) {
    return x * y;       // this MULtiplies the two arguments read from the stack
                        // and leaves the result in a register (EAX on x86) as the
                        // return value, then issues a RET, which pops the return
                        // address pushed by the CALLer and jumps back to it.
}

int main() {
    int x = 2, y = 3;   // these variables are stored on the stack
    mul( x, y );        // this pushes y onto the stack, then x, then issues an
                        // assembly CALL instruction, which pushes the return
                        // address and jumps to mul; afterwards the caller removes
                        // the arguments from the stack again.
    return 0;
}

Write a simple program like this, and then compile it to assembly (gcc -S foo.c if you have access to GCC), and take a look. The assembly is pretty easy to follow. You can see that the stack is used for function-local variables, and for calling functions, storing their arguments and return addresses. This is also why when you do something like:

f( g( h( i ) ) ); 

All of these get called in turn. It's literally building up a stack of function calls and their arguments, executing them, and then popping them off as it winds back down (or up ;). However, as mentioned above, the stack (on x86) actually resides in your process memory space (in virtual memory), and so it can be manipulated directly; it's not a separate step during execution (or at least is orthogonal to the process).

FYI, the above is the C calling convention, also used by C++. Other languages/systems may push arguments onto the stack in a different order, and some languages/platforms don't even use stacks, and go about it in different ways.

Also note, these aren't actual lines of C code executing. The compiler has converted them into machine language instructions in your executable. They are then (generally) copied from the TEXT area into the CPU pipeline, then into the CPU registers, and executed from there. [This was incorrect. See Ben Voigt's correction below.]

Solution 2

Sdaz has gotten a remarkable number of upvotes in a very short time, but sadly is perpetuating a misconception about how instructions move through the CPU.

The question asked:

Instructions go from the RAM, to the stack, to the registers?

Sdaz said:

Also note, these aren't actual lines of C code executing. The compiler has converted them into machine language instructions in your executable. They are then (generally) copied from the TEXT area into the CPU pipeline, then into the CPU registers, and executed from there.

But this is wrong. Except for the special case of self-modifying code, instructions never enter the datapath. They are not, and cannot be, executed from the datapath.

The x86 CPU registers are:

  • General registers EAX EBX ECX EDX

  • Segment registers CS DS ES FS GS SS

  • Index and pointers ESI EDI EBP EIP ESP

  • Indicator EFLAGS

There are also some floating-point and SIMD registers, but for the purposes of this discussion we'll classify those as part of the coprocessor and not the CPU. The memory-management unit inside the CPU also has some registers of its own, we'll again treat that as a separate processing unit.

None of these registers are used for executable code. EIP contains the address of the executing instruction, not the instruction itself.

Instructions take a completely different path through the CPU from data (Harvard architecture). All current machines are Harvard architecture inside the CPU; most these days are also Harvard architecture in the cache. x86 (your common desktop machine) is a Von Neumann architecture in main memory, meaning data and code are intermingled in RAM. That's beside the point, since we're talking about what happens inside the CPU.

The classic sequence taught in computer architecture is fetch-decode-execute. The memory controller looks up the instruction stored at the address held in EIP. The bits of the instruction go through some combinational logic to create all the control signals for the different multiplexers in the processor. And after some cycles, the arithmetic logic unit arrives at a result, which is clocked into the destination. Then the next instruction is fetched.
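
To make the fetch-decode-execute idea concrete, here's a toy simulator in C. This is only an illustrative sketch of an invented mini instruction set (not any real ISA): the instruction store and the datapath registers are separate arrays, and the "IP" only ever holds the address of the current instruction, never the instruction itself.

#include <stdio.h>
#include <stdint.h>

enum { OP_HALT = 0, OP_LOADI = 1, OP_ADD = 2 };

int main(void)
{
    /* Tiny "text segment": each instruction is opcode, dest register, two operands. */
    uint8_t text[][4] = {
        { OP_LOADI, 0, 2, 0 },   /* r0 = 2 */
        { OP_LOADI, 1, 3, 0 },   /* r1 = 3 */
        { OP_ADD,   2, 0, 1 },   /* r2 = r0 + r1 */
        { OP_HALT,  0, 0, 0 },
    };
    int reg[4] = { 0 };          /* the "datapath" registers hold only data */
    size_t ip = 0;               /* like EIP: the address of the next instruction */

    for (;;) {
        uint8_t *insn = text[ip++];                 /* fetch */
        switch (insn[0]) {                          /* decode */
        case OP_LOADI: reg[insn[1]] = insn[2]; break;                     /* execute */
        case OP_ADD:   reg[insn[1]] = reg[insn[2]] + reg[insn[3]]; break;
        case OP_HALT:  printf("r2 = %d\n", reg[2]); return 0;
        }
    }
}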

On a modern processor, things work a little differently. Each incoming instruction is translated into a whole series of microcode instructions. This enables pipelining, because the resources used by the first microinstruction aren't needed later, so the processor can begin working on the first microinstruction of the next instruction.

To top it off, terminology is slightly confused because "register" is an electrical engineering term for a collection of D flip-flops. And instructions (or especially microinstructions) may very well be stored temporarily in such a collection of D flip-flops. But this is not what is meant when a computer scientist or software engineer or run-of-the-mill developer uses the term register. They mean the datapath registers as listed above, and these are not used for transporting code.

The names and number of datapath registers vary for other CPU architectures, such as ARM, MIPS, Alpha, PowerPC, but all of them execute instructions without passing them through the ALU.

Solution 3

The exact layout of memory while a process is executing is completely dependent on the platform you're using. Consider the following test program:

#include <stdlib.h>
#include <stdio.h>

int main()
{
    int stackValue = 0;
    int *addressOnStack = &stackValue;
    int *addressOnHeap = malloc(sizeof(int));
    /* (Comparing pointers to unrelated objects isn't strictly portable C,
       but it serves for this demonstration.) */
    if (addressOnStack > addressOnHeap)
    {
        puts("The stack is above the heap.");
    }
    else
    {
        puts("The heap is above the stack.");
    }
    free(addressOnHeap);
    return 0;
}

On Windows NT (and its descendants), this program will generally produce:

The heap is above the stack.

On POSIX boxes, it will say:

The stack is above the heap.

The UNIX memory model is quite well explained above by @Sdaz MacSkibbons, so I won't reiterate it here. But that is not the only memory model. The reason POSIX requires this model is the sbrk system call. Basically, on a POSIX box, to get more memory, a process merely tells the kernel to move the divider between the "hole" and the "heap" further into the "hole" region. There is no way to return memory to the operating system, and the operating system itself does not manage your heap. Your C runtime library has to provide that (via malloc).
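
For illustration, here's a minimal sketch of the sbrk mechanism on a POSIX system (sbrk is deprecated in favor of mmap on modern systems, and a real malloc does a lot more bookkeeping on top of it):

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* The current "program break": the divider between the heap and the hole. */
    void *before = sbrk(0);

    /* Ask the kernel to move the break up by 4096 bytes, growing the heap. */
    if (sbrk(4096) == (void *)-1) {
        perror("sbrk");
        return 1;
    }

    void *after = sbrk(0);
    printf("program break moved from %p to %p\n", before, after);
    return 0;
}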

This also has implications for the kind of code actually used in POSIX binaries. POSIX boxes (almost universally) use the ELF file format. In this format, the operating system is responsible for communications between libraries in different ELF files. Therefore, all the libraries use position-independent code (That is, the code itself can be loaded into different memory addresses and still operate), and all calls between libraries are passed through a lookup table to find out where control needs to jump for cross library function calls. This adds some overhead and can be exploited if one of the libraries changes the lookup table.
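
As a rough analogy for that lookup table (this is only an illustration of the concept, not the actual ELF PLT/GOT machinery maintained by the dynamic loader), think of calls going through a table of function pointers that gets filled in at load time, so the calling code never hard-codes the library's final address:

/* Conceptual sketch only: a "lookup table" of function pointers that a loader
 * could fill in once a library's load address is known. */
#include <stdio.h>

static void real_puts_wrapper(const char *s) { puts(s); }

/* Imagine the dynamic loader writing the resolved address here at load time. */
static void (*lookup_table[])(const char *) = { real_puts_wrapper };

int main(void)
{
    lookup_table[0]("hello through the lookup table");  /* indirect, relocatable call */
    return 0;
}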

Windows' memory model is different because the kind of code it uses is different. Windows uses the PE file format, which leaves the code in position-dependent format. That is, the code depends on where exactly in virtual memory the code is loaded. There is a flag in the PE spec which tells the OS where exactly in memory the library or executable would like to be mapped when your program runs. If a program or library cannot be loaded at its preferred address, the Windows loader must rebase the library/executable -- basically, it moves the position-dependent code to point at the new positions -- which doesn't require lookup tables and cannot be exploited because there's no lookup table to overwrite. Unfortunately, this requires a very complicated implementation in the Windows loader, and has considerable startup time overhead if an image needs to be rebased. Large commercial software packages often modify their libraries to start purposely at different addresses to avoid rebasing; Windows itself does this with its own libraries (e.g. ntdll.dll, kernel32.dll, psapi.dll, etc. -- all have different start addresses by default).

On Windows, virtual memory is obtained from the system via a call to VirtualAlloc, and it is returned to the system via VirtualFree (okay, technically VirtualAlloc farms out to NtAllocateVirtualMemory, but that's an implementation detail) (contrast this to POSIX, where memory cannot be reclaimed). This process is slow (and IIRC, requires that you allocate in physical-page-sized chunks; typically 4 KB or more). Windows also provides its own heap functions (HeapAlloc, HeapFree, etc.) as part of a library known as RtlHeap, which is included as a part of Windows itself, upon which the C runtime (that is, malloc and friends) is typically implemented.
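
A minimal sketch of both levels on Windows (assuming the Win32 API; error handling kept to a bare minimum):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* Reserve and commit one page of virtual memory directly from the system. */
    SIZE_T size = 4096;
    void *page = VirtualAlloc(NULL, size, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    if (page == NULL)
        return 1;
    printf("VirtualAlloc gave us a page at %p\n", page);
    VirtualFree(page, 0, MEM_RELEASE);               /* give it back to the system */

    /* Allocate a small block from the default process heap (RtlHeap). */
    int *n = (int *)HeapAlloc(GetProcessHeap(), 0, sizeof *n);
    if (n == NULL)
        return 1;
    *n = 42;
    printf("HeapAlloc gave us %d at %p\n", *n, (void *)n);
    HeapFree(GetProcessHeap(), 0, n);
    return 0;
}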

Windows also has quite a few legacy memory allocation APIs from the days when it had to deal with old 80386s, and these functions are now built on top of RtlHeap. For more information about the various APIs that control memory management in Windows, see this MSDN article: http://msdn.microsoft.com/en-us/library/ms810627 .

Note also that this means that on Windows a single process can (and usually does) have more than one heap. (Typically, each shared library creates its own heap.)

(Most of this information comes from "Secure Coding in C and C++" by Robert Seacord)

Solution 4

The stack

In the x86 architecture the CPU executes operations with registers. The stack is only used for convenience reasons. You can save the contents of your registers to the stack before calling a subroutine or a system function and then load them back to continue your operation where you left off. (You could do it manually without the stack, but it is a frequently used operation, so it has CPU support.) But you can do pretty much anything on a PC without the stack.

For example an integer multiplication:

MUL BX

Multiplies AX register with BX register. (The result will be in DX and AX, DX containing the higher bits).

Stack-based machines (like the Java VM) use the stack for their basic operations. The equivalent of the above integer multiplication in Java bytecode is:

IMUL

This pops two values from the top of the stack, multiplies them, then pushes the result back onto the stack. The stack is essential for this kind of machine.
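
Here's a tiny C sketch of the same stack-machine idea, with an explicit operand stack: push two operands, then the "mul" step pops both and pushes the product:

/* Illustrative sketch of a stack-machine multiply, not a real VM. */
#include <stdio.h>

static int stack[16];
static int top = 0;

static void push(int v) { stack[top++] = v; }
static int  pop(void)   { return stack[--top]; }

int main(void)
{
    push(6);
    push(7);
    push(pop() * pop());        /* the stack-machine "mul": pop two, push the result */
    printf("%d\n", pop());      /* prints 42 */
    return 0;
}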

Some higher-level programming languages (like C and Pascal) use this latter, stack-based method for passing parameters to functions: the parameters are pushed onto the stack (right to left in the usual C convention, left to right in Pascal's), consumed by the function body, and the return value is passed back (on x86, typically in a register). (This is a choice that the compiler writers make, and it kind of abuses the way the x86 uses the stack.)

The heap

The heap is another concept that exists only in the realm of the compiler and runtime library. It takes away the pain of managing the memory behind your variables, but it is not a function of the CPU or the OS; it is just a way of doing housekeeping on the memory block which is given out by the OS. You could do this manually if you wanted to.
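
As an illustration of that "housekeeping", here's a minimal sketch of doing it by hand: a trivial bump allocator that carves pieces out of one big block instead of calling malloc (a real allocator also handles freeing, alignment, and growing the block, all ignored here):

#include <stdio.h>
#include <stddef.h>

static unsigned char arena[4096];   /* pretend this block came from the OS */
static size_t used = 0;

static void *bump_alloc(size_t n)
{
    if (used + n > sizeof arena)
        return NULL;                /* out of space in our hand-managed "heap" */
    void *p = &arena[used];
    used += n;                      /* just bump the high-water mark */
    return p;
}

int main(void)
{
    double *y = bump_alloc(sizeof *y);
    int    *x = bump_alloc(sizeof *x);
    if (x && y) {
        *y = 3.5;
        *x = 7;
        printf("%d %.1f\n", *x, *y);
    }
    return 0;
}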

Accessing system resources

The operating system has a public interface through which you can access its functions. In DOS, parameters are passed in CPU registers. Windows uses the stack to pass parameters to OS functions (the Windows API).


Comments

  • gaijinco
    gaijinco about 4 years

    I know the general theory but I can't fit in the details.

I know that a program resides in the secondary memory of a computer. Once the program begins execution it is entirely copied to the RAM. Then the processor retrieves a few instructions (it depends on the size of the bus) at a time, puts them in registers and executes them.

    I also know that a computer program uses two kinds of memory: stack and heap, which are also part of the primary memory of the computer. The stack is used for non-dynamic memory, and the heap for dynamic memory (for example, everything related to the new operator in C++)

    What I can't understand is how those two things connect. At what point is the stack used for the execution of the instructions? Instructions go from the RAM, to the stack, to the registers?

    • mkelley33
      mkelley33 over 13 years
      +1 for asking a fundamental question!
    • Andrey
      Andrey over 13 years
      hmm... you know, they write books about that. Do you really want to study this part of OS architecture with the help of SO?
    • mkelley33
      mkelley33 over 13 years
      I added a couple of tags based on the memory-related nature of the question, and the reference to C++, although I think a good answer could also come from someone knowledgeable in Java or C#!
    • Maxpm
      Maxpm over 13 years
      Upvoted and favorited. I've always been too afraid to ask...
    • Billy ONeal
      Billy ONeal over 13 years
      Added the x86 and x86-64 tags; all of the answers here assume an x86 style architecture. C++ itself actually doesn't define this at all. C++ has three storage locations: automatic storage (often called the "stack"), static storage (global variables, class statics, function local statics), and dynamic storage (malloc/free; new/delete). How these are provided is left up to the implementor of C/C++ on the target machine.
    • Admin
      Admin over 13 years
      The term "puts them in registers" isn't quite right. On most processors, registers are used to hold intermediate values, not executable code.
    • Jim Balter
      Jim Balter over 13 years
      You say "I know" and then follow that with things that aren't true, or at least are inaccurate. Better not to assume that you know things. And despite all the effort people pout into answering your question, you haven't accepted any of the answers. Do better.
    • Mackie Messer
      Mackie Messer about 13 years
      Nice question. You should get an introductory text on operating systems. Andrew Tanenbaum's Operating Systems: Design and Implementation comes to mind.
    • Nolin M.
      Nolin M. almost 9 years
      It is a good question
    • Ciro Santilli OurBigBook.com
      Ciro Santilli OurBigBook.com over 8 years
      Voting to close as too broad. Related for Linux: stackoverflow.com/questions/8352535/…
  • Andrey
    Andrey over 13 years
    sorry, but a good book recommendation would be a better answer, IMO
  • Sdaz MacSkibbons
    Sdaz MacSkibbons over 13 years
    Yeah, "RTFM" is always better.
  • mkelley33
    mkelley33 over 13 years
    @Andrey: maybe you should change that comment to "also, you might want to read your-good-book-recommendation" I understand that this sort of question merits more investigation, but whenever you have to begin a comment with "sorry but..." perhaps you should really consider flagging the post for moderator attention or at least offering an explanation as to why your opinion should matter to anyone anyway.
  • Andrey
    Andrey over 13 years
    I didn't want to be rough, sorry. It appeared obvious to me that the topic, to be completely understood, requires a deep study. Sdaz's answer is great. But the topic starter would still need to become acquainted with either assembly for the platform he is interested in or any of these: "Operating Systems" by A. Tanenbaum, "Security Tools and Exploits" by J. Foster, V. Liu, and "Secure Coding in C and C++" by R. Seacord. The last one might be the best in scope of this topic. I also believe that debugging a sample C function with one's favorite debugger would have been an even better course.
  • Kizaru
    Kizaru over 13 years
    @Andrey You think debugging C with gdb would tell you 'what happens when a computer program runs'?
  • Andrey
    Andrey over 13 years
    @Kizaru: surely debugging a C program with windbg tells me a lot about what lies on the stack. If gdb doesn't allow you to do so, why would anyone use it at all?
  • phooji
    phooji over 13 years
    @Sdaz MacSkibbons: +1, But I think the question shows some fundamental misunderstandings that you could address in more detail. In particular, the OP seems to conflate hardware concepts (RAM) and intangible concepts (stack, heap): "which are also part of the primary memory of the computer".
  • Sdaz MacSkibbons
    Sdaz MacSkibbons over 13 years
    @phooji: Good catch. Added to the above.
  • Merlyn Morgan-Graham
    Merlyn Morgan-Graham over 13 years
    @Andrey: Maybe you should post as an answer that list of books and your rationale as to why this requires further reading, and cannot be fully answered here.
  • Maxpm
    Maxpm over 13 years
    Excellent answer. It certainly cleared some things up for me!
  • Billy ONeal
    Billy ONeal over 13 years
    The idea of the stack being at the top isn't true in all systems. Windows NT, for example, does not do this. I'm not sure about Linux but I don't think it does either.
  • Sdaz MacSkibbons
    Sdaz MacSkibbons over 13 years
    @Billy: Linux (on x86) does exactly this; it was my primary model for this explanation. There are, of course, many architectures and operating systems, which is why I tried to be as careful as possible with my language. I'm not familiar with the NT process memory layout; please add an answer describing it.
  • Billy ONeal
    Billy ONeal over 13 years
    Example: The following program produces "The heap is above the stack" on NT. pastebin.com/YBLWpzMc
  • Sdaz MacSkibbons
    Sdaz MacSkibbons over 13 years
    Thanks for the clarification. I was hesitant to add that as I'm not intimately familiar with it, but did it at someone else's request.
  • Billy ONeal
    Billy ONeal over 13 years
    @Sdaz: Okay, answered -> stackoverflow.com/questions/5162580/…
  • Sdaz MacSkibbons
    Sdaz MacSkibbons over 13 years
    Great info, thanks! Hope "user487117" eventually actually comes back. :-)
  • Mikael Persson
    Mikael Persson over 13 years
    @Sdaz: Very nice answer! I was aware of all you said though. Maybe I missed something in your explanation, but I didn't see what I have always had little understanding of: What about cache memory? What and how do the necessary chunks of memory get put into cache? If you know, it would be a great addition (or maybe a good reference?).
  • Sdaz MacSkibbons
    Sdaz MacSkibbons over 13 years
    @Mikael: I'm hesitant to add more on that, as while I have a general idea for more straightforward architectures, I spend more time on the other side of the CPU, and my concepts may be outdated with respect to modern multi-core processors. You may wish to ask Ben Voigt below for clarification on CPU cache.
  • Bjarke Freund-Hansen
    Bjarke Freund-Hansen over 13 years
    s/ARM/RAM/ in "meaning data and code are intermingled in ARM". Right?
  • Ben Voigt
    Ben Voigt over 13 years
    @bjarkef: The first time yes, but not the second. I'll fix it.
  • Ben Voigt
    Ben Voigt over 13 years
    @Mikael: Depending on the implementation, you may have mandatory caching, in which case any time data is read from memory, an entire cache line is read and the cache is populated. Or it may be possible to give the cache manager a hint that the data will only be needed once, so copying it into cache isn't helpful. That's for read. For write there are write-back and write-through caches, which affect when DMA controllers can read the data, and then there a whole host of cache coherency protocols for dealing with multiple processors each having its own cache. This really deserves its own Q.
  • Andrej Mitrović
    Andrej Mitrović over 13 years
    I thought bss was used for uninitialized data, not constants?
  • Euro Micelli
    Euro Micelli over 11 years
    About the DOS memory layout, a refinement: to maximize user memory, command.com moved the bulk of itself to the top of what you called USER PROC, leaving only a stub at the bottom. In DOS, a user program owned all memory from its base address to the 640K mark, including the area now occupied by command.com. After a program exited, the stub would run a CRC on the block where command.com had been moved. If the CRC matched (the program hadn't used the memory), command.com just ran from there again. Otherwise, the stub would prompt with "insert disk with command.com in drive A" to reload itself.
  • Peter Cordes
    Peter Cordes over 7 years
    Registers aren't "loaded from the TEXT area". The stack pointer starts out pointing to the stack memory the OS has set up for your process. The others start out holding random garbage. (Or more likely, all zeros since the OS zeros them to avoid information leaks of kernel data into user processes). The ABI for the platform defines what you can expect to find in memory for a fresh process (e.g. in the System V ABI, ESP/RSP points at argc, and argv is just above it. IIRC, Windows might put a DLL exit function address as a fake "return address". See stackoverflow.com/tags/x86/info)