Cost of static memory allocation vs dynamic memory allocation in C

Solution 1

Static allocation will be much faster. Static allocation can happen at global scope or on the stack.

In global scope, statically allocated memory is built into the binary image: the total size of the memory required, and where it is to be located in the running binary, is computed at compile time. When the program loads, the operating system loader allocates enough memory for all the global static arrays as part of mapping the image. I'm pretty sure this happens in constant time for all the allocations (i.e., more allocations don't cost more time).

In local scope, static allocations are allocated on the stack. This involves simply reserving a fixed number of bytes on the stack, and happens in constant time per allocation. Stack space is very limited.

Dynamic memory must be allocated from a heap, and even in the best case most allocations take more than constant time - with many allocators the cost grows with the number and pattern of allocations, something like n log n in aggregate.

Also, practically speaking, dynamic allocation will be many times slower than static allocation.

@update: As has been pointed out in the comments below, stack allocations are not technically static allocations (but they have the syntactic form used in the OP's question).

Also when speaking about "allocation time", I'm considering total time to manage the memory ( alloc and free ).

In some dynamic allocators allocation time is faster than freeing time.

It is also true that some fast allocators trade memory efficiency for allocation speed. In these cases static is still "better": static allocations cost nothing at run time and stack allocations cost constant time, and both reserve exactly the size requested.

To be fast, dynamic allocators trade off significant memory efficiency (e.g., buddy allocators round up to the next power-of-two block size, so a 33k allocation will use a 64k block).

Solution 2

True static allocations (globals, and local variables marked static) are glued into sections and loaded along with the rest of the section at load time (execve) using mmap. They're essentially free, already covered by the cost of loading the program.

Automatic arrays with statically known sizes can be glued together with other local variables and allocated by adjusting the stack pointer. That's essentially one integer subtraction (the stack grows downwards), something close to 1 ns on modern processors.

Variable length arrays can't be glued to other variables so they should cost about 1 ns each.

I tried measuring mallocs of different sizes in a single-threaded process and got the results below, which imply that malloc+free pairs cost about 50-100 times as much as stack variables for allocations <16MiB. After that, the cost spikes upwards (32MiB and 64MiB allocations both cost about a hundred times as much as allocations <=16MiB do):

[chart: malloc+free times by allocation size]

These are just the costs of obtaining the pointer though. Actual access may cause page faults and thereby further increase the cost of the memory block. The page faults should be extremely rare with stack allocations, but likely with first-time accesses to dynamically allocated memory.

Additionally, many small dynamic allocations will put quite a strain on caches as your memory will be fragmented. (You can reduce this fragmentation by using memory pools, either your own or the obstack functionality available as part of glibc.)

Solution 3

Global memory is normally allocated when your program (or shared module/dll) loads and pre-populated with its initial values. This typically has its own memory section.

The stack is a block of memory (a series of memory pages) allocated by the kernel when you create a new thread. Allocating memory on the stack is typically done by decrementing the stack pointer (one machine instruction - stacks are typically "full descending", so newer allocations have lower addresses); when a stack object is deleted, the stack pointer is incremented. Stacks therefore have strict first-in, last-out semantics. Stacks are also fixed size, so when you run out you get a stack overflow.

When one uses malloc (in C) or new (in C++), memory is allocated on the heap/free store. Heap blocks can come in any number and any size. When more memory is needed than has already been obtained, malloc goes to the kernel to request more from the system; otherwise it will typically reuse freed memory it already holds.

Calls to malloc must be assumed to be slow and should be avoided in any performance-critical or real-time routines. This is for two reasons:

  1. The heap needs to allocate and free blocks of any size, in any order. Older algorithms walked a list of freed blocks until one of a suitable size was found; since memory can be highly fragmented, this could be slow. If the heap is used from multiple threads, some sort of locking must also be provided. Much research has been done to improve this situation, and modern heaps like jemalloc and tcmalloc do improve things. However, don't count on this.
  2. If the required memory cannot be allocated from what the heap allocator already manages, the allocator will need to ask the kernel for more memory (on Unix this is done with the mmap or sbrk system calls). The kernel needs to find some unallocated memory pages and then map those pages into your process's address space. Any form of caching goes out the window, so expect this to be super slow (for a computer).

So if you need to do any performance-critical processing, allocate all the memory up front and then reuse the already allocated memory. Always assume calls to malloc and free will be slow.

Solution 4

If you know exactly how much space you need to set aside and your primary concern is performance with respect to memory management and your code doesn't need to be re-entrant, then you will want to allocate objects with static storage duration, either by declaring them at file scope or by using the static storage class specifier.

int data[SOME_NUMBER];

void foo( /* some list of parameters here */ )
{
  static int some_more_data[SOME_OTHER_NUMBER];
  ...
}

Both data and some_more_data exist over the lifetime of the program, but some_more_data is only visible within the foo function¹.

In practice, items with static storage duration have space set aside for them in the binary image itself, so that memory is available as soon as the program is loaded. This may have an advantage as far as locality is concerned; if the data and the code are in the same page, there's no need to swap. Of course, it depends on the code.

The obvious drawback is that your executable file will have a larger footprint. Another drawback is that your code isn't re-entrant; every invocation of foo is working on the same block of memory when it reads or writes some_more_data. Depending on what your code does, that may or may not be a big deal.

If your code needs to be re-entrant, then you can't use objects with static storage duration. If the block of data is relatively small, use objects with automatic storage duration (i.e., regular local variables).

void foo( /* some list of parameters here */ )
{
  int some_more_data[SOME_SMALLISH_NUMBER];
  ...
}

In this case, some_more_data only exists for the lifetime of the foo function; each time you call foo, it allocates another some_more_data object automatically. Whatever overhead there is in setting aside the memory is part of the overhead of calling the function in the first place (at least in my experience). The stack pointer still gets adjusted whether you're using local variables or not, so using local variables won't make it any slower. The main issue is that the available memory for automatic objects is relatively small; for objects above a certain size, this approach simply won't work.

If your code needs to be re-entrant and you need to allocate large blocks of memory, you're pretty much stuck with dynamic memory management (malloc/calloc/realloc and free). Depending on how you design your code, you can minimize some of the performance issues.


1. Visibility rules are enforced during translation from source to machine code; they don't really apply at run time.
Author: samarasa

Updated on June 25, 2022

Comments

  • samarasa
    samarasa almost 2 years

I am very interested to know which method of memory allocation, static vs. dynamic, is better for performance (e.g., running time) when you know the exact number of objects/items, in C on Linux - both the cost for a small number of objects (a small amount of memory) and for a large number of objects (a huge amount of memory).

    e.g., type A[N] vs type *A = malloc(sizeof(type) * N)

    Please let me know. Thank you.

    Note: We can benchmark this and probably know the answer. But I would like to know the concepts that explain the performance differences between these two allocation methods.

  • samarasa
    samarasa about 9 years
Thank you very much for the answer. Please upvote my question, as I need some points to upvote the answers.
  • Charles Salvia
    Charles Salvia about 9 years
The complexity of heap memory allocation algorithms varies a lot. Common algorithms like first-fit/best-fit are usually linear, unless there is some clever indexing data structure. Binary buddy can be constant time for allocation and logarithmic for freeing, and simple-segregated-storage/slab allocators based on free lists for predefined sizes (like powers of 2) can be constant time as well.
  • pm100
    pm100 about 9 years
I don't really think you should say that stack space is statically allocated. It's allocated as needed and then released - that's not static.
  • doron
    doron about 9 years
Both static memory and stack memory (for the primary thread) are allocated at load time. By the same logic, allocating a large block of memory early in the execution phase should not present a problem. Constant use of malloc and free, however, may present a performance bottleneck.
  • Damon
    Damon about 9 years
While heap allocation can be expensive (often freeing is more expensive than allocating), the overhead of heap allocation is usually over-estimated. It's one of those monsters under your bed that is gone when you turn on the light. I was disappointed while benchmarking a lockfree producer-consumer queue that would only deliver 18-20 million items per second when I expected something like 2-3 times as many. Until my colleague asked me: "Uh, you do realize that you're doing 20 million heap allocations per second, too?". Now, anything you can do 20M times per second isn't precisely worrisome.
  • John Bode
    John Bode about 9 years
    @doron: but setting aside memory for a particular object on the stack is not done until function entry. True, it's just a matter of adjusting the stack pointer, which is done as part of the function call overhead anyway. Like I said, using an auto variable won't make things any slower.
  • Abhishek Jaiswal
    Abhishek Jaiswal about 3 years
@RafaelBaptista One thing I want to know from you... "where it is to be located in the running binary is computed at compile time" - this means the compiler has an idea about the location of memory (for the variables, arrays, etc.). But if I run the exe file generated by the compiler on one PC on another PC, then according to your statement, will the compiler have an idea about the location of memory allocation on the other PC too?
  • Rafael Baptista
    Rafael Baptista almost 3 years
@AbhishekJaiswal Statically allocated memory size is known exactly at compile time. The binary contains a list of byte offsets. E.g., for four static allocations of 32k each, the binary would note that 128k must be reserved, with offsets at 0, 32k, 64k, and 96k. When the OS loads the binary image it will allocate enough memory to hold the binary plus the static allocations. The exact addresses depend on where the binary image is loaded. Each process gets its own virtual address space, such that the addresses are always the same - but at the hardware level the addresses are not predictable.