Minimizing the amount of malloc() calls improves performance?

21,258

Solution 1

Of course this completely depends on the malloc implementation, but in this case, with no calls to free, most malloc implementations will probably give you the same algorithmic speed.

As another answer commented, usually there will be a list of free blocks, but if you have not called free, there will just be one, so it should be O(1) in both cases.

This assumes that the memory allocated for the heap is big enough in both cases. In case #1, you will have allocated more total memory, as each allocation involves memory overhead to store meta-data, as a result you may need to call sbrk(), or equivalent to grow the heap in case #1, which would add an additional overhead.

They will probably be different due to cache and other second order effects, since the memory alignments for the new allocation won't be the same.

If you have been freeing some of the memory blocks, then it is likely that #2 will be faster due to less fragmentation, and so a smaller list of free blocks to search.

If you have freed all the memory blocks, it should end up being exactly the same, since any sane free implementation will have coalesced the blocks back into a single arena of memory.

Solution 2

You asked 2 questions:

  • for which application the next malloc() call will be faster, #1 or #2?
  • In other words: Does malloc() have an index of allocated locations in memory?

You've implied that they are the same question, but they are not. The answer to the latter question is YES.

As for which will be faster, it is impossible to say. It depends on the allocator algorithm, the machine state, the fragmentation in the current process, and so on.

Your idea is sound, though: you should think about how malloc usage will affect performance. There was once an app I wrote that used lots of little blobs of memory, each allocated with malloc(). It worked correctly but was slow. I replaced the many calls to malloc with just one, and then sliced up that large block within my app. It was much much faster.

I don't recommend this approach; it's just an illustration of the point that malloc usage can materially affect performance.

My advice is to measure it.

Solution 3

Malloc has to run through a linked list of free blocks to find one to allocate. This takes time. So, #1 will usually be slower:

  • The more often you call malloc, the more time it will take - so reducing the number of calls will give you a speed improvement (though whether it is significant will depend on your exact circumstances).

  • In addition, if you malloc many small blocks, then as you free those blocks, you will fragment the heap much more than if you only allocate and free a few large blocks. So you are likely to end up with many small free blocks on your heap rather than a few big blocks, and therefore your mallocs may have to search further through the free-space lists to find a suitable block to allocate. WHich again will make them slower.

Solution 4

These are of course implementation details, but typically free() will insert the memory into a list of free blocks. malloc() will then look at this list for a free block that is the right size, or larger. Typically, only if this fails does malloc() ask the kernel for more memory.

There are also other considerations, such as when to coalesce multiple adjacent blocks into a single, larger block.

And, another reason that malloc() is expensive: If malloc() is called from multiple threads, there must be some kind of synchronization on these global structures. (i.e. locks.) There exist malloc() implementations with different optimization schemes to make it better for multple threads, but generally, keeping it multi-thread safe adds to the cost, as multiple threads will contend for those locks and block progress on each other.

Solution 5

You can always do a better job using malloc() to allocate a large chunk of memory and sub-dividing it yourself. Malloc() was optimized to work well in the general case and makes no assumptions whether or not you use threads or what the size of the program's allocations might be.

Whether it is a good idea to implement your own sub-allocator is a secondary question. It rarely is, explicit memory management is already hard enough. You rarely need another layer of code that can screw up and crash your program without any good way to debug it. Unless you are writing a debug allocator.

Share:
21,258
Dor
Author by

Dor

Updated on January 18, 2020

Comments

  • Dor
    Dor over 4 years

    Consider two applications: one (num. 1) that invokes malloc() many times, and the other (num. 2) that invokes malloc() few times. Both applications allocate the same amount of memory (assume 100MB).
    For which application the next malloc() call will be faster, #1 or #2?
    In other words: Does malloc() have an index of allocated locations in memory?