Track Memory Usage in C++ and evaluate memory consumption


Solution 1

Finally I was able to solve the problem and will happily share my findings. In general, the best tool to evaluate the memory consumption of a program is, from my perspective, the Massif tool from Valgrind. It allows you to profile heap consumption and gives you a detailed analysis.

To profile the heap of your application, run valgrind --tool=massif prog. This gives you basic access to all information about the typical memory allocation functions like malloc and friends. However, to dig deeper I activated the option --pages-as-heap=yes, which additionally reports information about the underlying system calls. To give an example, here is something from my profiling session:

 67  1,284,382,720      978,575,360      978,575,360             0            0
100.00% (978,575,360B) (page allocation syscalls) mmap/mremap/brk, --alloc-fns, etc.
->87.28% (854,118,400B) 0x8282419: mmap (syscall-template.S:82)
| ->84.80% (829,849,600B) 0x821DF7D: _int_malloc (malloc.c:3226)
| | ->84.36% (825,507,840B) 0x821E49F: _int_memalign (malloc.c:5492)
| | | ->84.36% (825,507,840B) 0x8220591: memalign (malloc.c:3880)
| | |   ->84.36% (825,507,840B) 0x82217A7: posix_memalign (malloc.c:6315)
| | |     ->83.37% (815,792,128B) 0x4C74F9B: std::_Rb_tree_node<std::pair<std::string const, unsigned int> >* std::_Rb_tree<std::string, std::pair<std::string const, unsigned int>, std::_Select1st<std::pair<std::string const, unsigned int> >, std::less<std::string>, StrategizedAllocator<std::pair<std::string const, unsigned int>, MemalignStrategy<4096> > >::_M_create_node<std::pair<std::string, unsigned int> >(std::pair<std::string, unsigned int>&&) (MemalignStrategy.h:13)
| | |     | ->83.37% (815,792,128B) 0x4C7529F: OrderIndifferentDictionary<std::string, MemalignStrategy<4096>, StrategizedAllocator>::addValue(std::string) (stl_tree.h:961)
| | |     |   ->83.37% (815,792,128B) 0x5458DC9: var_to_string(char***, unsigned long, unsigned long, AbstractTable*) (AbstractTable.h:341)
| | |     |     ->83.37% (815,792,128B) 0x545A466: MySQLInput::load(std::shared_ptr<AbstractTable>, std::vector<std::vector<ColumnMetadata*, std::allocator<ColumnMetadata*> >*, std::allocator<std::vector<ColumnMetadata*, std::allocator<ColumnMetadata*> >*> > const*, Loader::params const&) (MySQLLoader.cpp:161)
| | |     |       ->83.37% (815,792,128B) 0x54628F2: Loader::load(Loader::params const&) (Loader.cpp:133)
| | |     |         ->83.37% (815,792,128B) 0x4F6B487: MySQLTableLoad::executePlanOperation() (MySQLTableLoad.cpp:60)
| | |     |           ->83.37% (815,792,128B) 0x4F8F8F1: _PlanOperation::execute_throws() (PlanOperation.cpp:221)
| | |     |             ->83.37% (815,792,128B) 0x4F92B08: _PlanOperation::execute() (PlanOperation.cpp:262)
| | |     |               ->83.37% (815,792,128B) 0x4F92F00: _PlanOperation::operator()() (PlanOperation.cpp:204)
| | |     |                 ->83.37% (815,792,128B) 0x656F9B0: TaskQueue::executeTask() (TaskQueue.cpp:88)
| | |     |                   ->83.37% (815,792,128B) 0x7A70AD6: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16)
| | |     |                     ->83.37% (815,792,128B) 0x6BAEEFA: start_thread (pthread_create.c:304)
| | |     |                       ->83.37% (815,792,128B) 0x8285F4B: clone (clone.S:112)
| | |     |                         
| | |     ->00.99% (9,715,712B) in 1+ places, all below ms_print's threshold (01.00%)
| | |     
| | ->00.44% (4,341,760B) in 1+ places, all below ms_print's threshold (01.00%)

As you can see, ~85% of my memory allocations come from a single branch, and the question is now why the memory consumption is so high when the original heap profiling showed normal consumption. If you look at the example, you will see why. For allocation I used posix_memalign to make sure allocations happen on useful boundaries. This allocator was then passed down from the outer class to the inner member variables (a map in this case) so that their heap allocations go through it. However, the boundary I chose was too large in my case: 4096. This means you allocate 4 bytes with posix_memalign, but the system allocates a full page for you to align it correctly. If you now allocate many small values, you end up with lots of unused memory. This memory is not reported by normal heap profiling tools, since you allocated only a fraction of it, but the system allocation routines allocate more and hide the rest.
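The effect is easy to reproduce in isolation. The sketch below (not my original code; names are made up for illustration) requests many tiny blocks with a 4096-byte boundary. Since every returned pointer is page-aligned, no two of these blocks can share a page, which is exactly the hidden overhead described above:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Illustration only: allocate `count` blocks of `bytes` bytes, each forced
// onto a `boundary`-byte boundary via posix_memalign.
std::vector<void*> alloc_aligned(std::size_t count, std::size_t bytes,
                                 std::size_t boundary) {
    std::vector<void*> blocks;
    for (std::size_t i = 0; i < count; ++i) {
        void* p = nullptr;
        if (posix_memalign(&p, boundary, bytes) == 0)
            blocks.push_back(p);
    }
    return blocks;
}

// With boundary = 4096, every 4-byte block starts on its own page boundary,
// so no two blocks can share a page: ~1000 requests for 4 bytes each pin
// roughly 1000 pages (~4 MB) of address space for ~4 KB of payload.
```

A heap profiler in its default mode only attributes the 4 bytes you asked for to your program, which is why the waste stays invisible without --pages-as-heap=yes.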

To solve this problem, I switched to a smaller boundary and thus could drastically reduce the memory overhead.
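In case it helps, a minimal sketch of what such a strategy might look like (the real MemalignStrategy from the trace above is my own class; this reconstruction is an assumption about its shape, not the actual code):

```cpp
#include <cstdlib>
#include <new>

// Hedged sketch of an alignment strategy similar to the MemalignStrategy<N>
// seen in the Massif trace; the real implementation may differ.
template <std::size_t Boundary>
struct MemalignStrategy {
    static void* allocate(std::size_t bytes) {
        void* p = nullptr;
        // Boundary must be a power of two and a multiple of sizeof(void*).
        if (posix_memalign(&p, Boundary, bytes) != 0)
            throw std::bad_alloc();
        return p;
    }
    static void deallocate(void* p) { free(p); }
};

// Before: MemalignStrategy<4096> -- every small tree node gets its own page.
// After:  a small boundary such as MemalignStrategy<16> -- nodes can share
// pages again, and the hidden per-allocation overhead disappears.
```

The only change needed at the call sites is the template parameter, which is what made the fix cheap once the cause was clear.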

As a conclusion of my hours spent in front of Massif & Co., I can only recommend using this tool for deep profiling, since it gives you a very good understanding of what is happening and lets you track down errors easily. For posix_memalign the situation is different: there are cases where it is really necessary, but in most cases you will do just fine with a normal malloc.

Solution 2

According to this article, ps/top report how much memory your program would use if it were the only program running. Assuming your program uses a bunch of shared libraries, such as the STL, which are already loaded into memory, there is a gap between the amount of memory actually allocated due to the execution of your program and how much memory it would use if it were the only process.
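One way to see this breakdown from inside the process (Linux-only, and an assumption that your kernel exposes the RssAnon/RssFile fields in /proc/[pid]/status, as kernels since 4.5 do) is to read the status file directly. VmRSS is what top shows as RES; RssFile counts resident file-backed pages such as shared libraries, which heap profilers never see:

```cpp
#include <fstream>
#include <iostream>
#include <string>

// Return the value (in kB) of a "<key>: <n> kB" line from /proc/self/status,
// or -1 if the field is absent (e.g. RssAnon on pre-4.5 kernels).
long status_kb(const std::string& key) {
    std::ifstream status("/proc/self/status");
    std::string line;
    while (std::getline(status, line))
        if (line.compare(0, key.size(), key) == 0 &&
            line.size() > key.size() && line[key.size()] == ':')
            return std::stol(line.substr(key.size() + 1));
    return -1;
}

// Typical usage: compare total resident size against its anonymous
// (heap/stack) and file-backed (shared library) parts.
void print_rss_breakdown() {
    std::cout << "VmRSS:   " << status_kb("VmRSS")   << " kB\n"
              << "RssAnon: " << status_kb("RssAnon") << " kB\n"
              << "RssFile: " << status_kb("RssFile") << " kB\n";
}
```

If RssFile accounts for most of the gap between top and your heap profiler, the difference is mapped libraries rather than allocations your code made.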

Author: grundprinzip

Updated on June 20, 2022

Comments

  • grundprinzip
    grundprinzip almost 2 years

    I came across the following problem with my code: I was using Valgrind and gperftools to perform heap checking and heap profiling to see if I release all the memory that I allocate. The output of these tools looks good and it seems I'm not losing memory. However, when I look at top and the output of ps I'm confused, because they basically do not match what I'm observing with Valgrind and gperftools.

    Here are the numbers:

    • Top reports: RES 150M
    • Valgrind (Massif) reports: 23M peak usage
    • gperftools Heap Profiler reports: 22.7M peak usage

    My question is now, where does the difference come from? I tried as well to track the stack usage in Valgrind but without any success.

    Some more details:

    • The process is basically loading data from mysql via the C api to an in-memory storage
    • Performing a leak check and breaking shortly after the loading is done shows a definite loss of 144 bytes and 10M reachable, which matches the amount that is currently allocated
    • The library performs no complex IPC, it starts a few threads but only one of the threads is executing the work
    • It does not load other complex system libraries
    • The PSS size from /proc/pid/smaps corresponds to the RES size in top and ps

    Do you have any ideas, where this difference in reported memory consumption comes from? How can I validate that my program is behaving correctly? Do you have any ideas how I could further investigate this issue?

  • grundprinzip
    grundprinzip over 11 years
    I did check the stack size with Massif, as said above, but the stack size is constant at around 15k.
  • grundprinzip
    grundprinzip over 11 years
    The article is where I got the tip about the PSS size from, so I checked that. With further investigation, I'm now at a point where some of the difference in memory consumption comes from the underlying malloc() implementation, which allocates larger blocks, and some weirdness in const char* to std::string conversions. Valgrind's Massif with the --pages-as-heap=yes option helped a lot.
  • Christian
    Christian over 11 years
    I see. Just an idea for further investigation: increase the amount of memory you allocate within your program (e.g. 24MB -> 512MB -> 1024MB) and observe whether the unexplained difference to the top/ps output remains constant or also grows.