How to check for memory leaks in a large-scale C++ Linux application?

15,134

Solution 1

First of all...

"and we reached the point when it is mandatory to make a roundup of checks for memory leaks."

This is actually a problem of methodology. Correctness should be the primary goal of any piece of software, not an afterthought.

I will suppose though that you now realize this and how much easier it would have been to identify the problems had you been running an instrumented unit test at each commit.


So, what to do now?

  • Runtime detection:

    • Try to make Valgrind work; you probably have some environmental issues
    • Try ASan, TSan and MSan (AddressSanitizer, ThreadSanitizer and MemorySanitizer); they are not trivial to set up under Linux but oh so impressive!
    • Try instrumented builds: tcmalloc includes a heap-checker for example
    • ...
  • Compile time detection:

    • Turn on the warnings (preferably with -Werror) (not specific to your issue)
    • Use static analysis, such as Clang's; it may spot unpaired allocation routines
    • ...
  • Human detection:

    • Code reviews: make sure all resources are allocated within RAII classes
    • ...
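
As a rough illustration of the RAII advice in the list above, the sketch below contrasts a raw new/delete pair with a std::unique_ptr when a function exits early through an exception. Frame, live_frames, decode_raw and decode_raii are hypothetical names invented for this example; they are not taken from the application in question.

```cpp
#include <memory>
#include <stdexcept>

// Hypothetical counter of live Frame objects, used only to make a leak observable.
int& live_frames() {
    static int count = 0;
    return count;
}

// Hypothetical resource type standing in for a decoded media frame.
struct Frame {
    Frame() { ++live_frames(); }
    ~Frame() { --live_frames(); }
    Frame(const Frame&) = delete;
    Frame& operator=(const Frame&) = delete;
};

// Raw ownership: the early exit skips the delete, so the Frame is leaked.
void decode_raw(bool fail) {
    Frame* frame = new Frame;
    if (fail) {
        throw std::runtime_error("decode failed");  // frame is never deleted
    }
    delete frame;
}

// RAII ownership: the unique_ptr destructor runs on every exit path.
void decode_raii(bool fail) {
    auto frame = std::make_unique<Frame>();
    if (fail) {
        throw std::runtime_error("decode failed");  // frame is released during unwinding
    }
}
```

Calling decode_raii(true) inside a try block leaves live_frames() unchanged, while decode_raw(true) leaves one Frame behind. That is the kind of difference a reviewer would look for.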

Note: using only RAII classes helps remove memory leaks, but does not help with dangling references. Thankfully, detecting dangling references is what ASan does.
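
To make the note concrete, here is a minimal sketch of a dangling reference that RAII alone does not prevent. The function names are hypothetical. Building with GCC or Clang and -fsanitize=address -g would be expected to flag the first function as a heap-use-after-free whenever the push_back reallocates, but treat that as an assumption to verify on your own toolchain.

```cpp
#include <string>
#include <vector>

// Unsafe: 'first' refers into the vector's buffer, which push_back may replace.
// Every object here is owned by an RAII class, yet the reference can still dangle.
std::string first_name_dangling(std::vector<std::string>& names) {
    const std::string& first = names.front();
    names.push_back("extra");  // may reallocate and free the old buffer
    return first;              // undefined behaviour if a reallocation happened
}

// Safe: take a copy before mutating the container.
std::string first_name_copied(std::vector<std::string>& names) {
    std::string first = names.front();
    names.push_back("extra");
    return first;
}
```

Only the copied variant is safe to call unconditionally; the dangling one is there to give the sanitizer something to report.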


And once you have patched all the issues, make sure that this becomes part of the process. Changes should be reviewed and tested, always, so that rotten eggs are culled immediately rather than left to stink up the code base.

Solution 2

Instead of giving up on Valgrind, you should work with its developers and try to

  • get rid of the bugs you encountered in Valgrind
  • get your app thoroughly tested and debugged with the updated Valgrind.

Saying you gave up on Valgrind, which is the solution to your problem, isn't really helping...

Valgrind is the tool we all use to check for memory leaks and threading issues under Linux.

In the end, it's definitely better to invest time in figuring out "why Valgrind doesn't work with my app" rather than looking for alternative solutions. Valgrind is a proven and tested tool, but not perfect. And it beats the alternative methods by a long, long shot.

The Valgrind page says it's better to submit bugs to Bugzilla, but it's actually better to ask around on https://lists.sourceforge.net/lists/listinfo/valgrind-users whether anyone has seen such issues before and what to do in such a situation. Worst-case scenario: they'll tell you to file a bug in Bugzilla, or file it themselves.

Solution 3

You probably want to look at valgrind.

And you may want to start with really simple examples to get a feel for what valgrind reports, which can be somewhat verbose. Consider this simplified example, where valgrind reports exactly what and how much is missing:

edd@max:/tmp$ cat valgrindex.cpp 

#include <cstdlib>

int main() {
  double *a = new double[100];
  exit(0);
}
edd@max:/tmp$ g++ -o valgrindex valgrindex.cpp 
edd@max:/tmp$ valgrind ./valgrindex
==15910== Memcheck, a memory error detector
==15910== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==15910== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==15910== Command: ./valgrindex
==15910== 
==15910== 
==15910== HEAP SUMMARY:
==15910==     in use at exit: 800 bytes in 1 blocks
==15910==   total heap usage: 1 allocs, 0 frees, 800 bytes allocated
==15910== 
==15910== LEAK SUMMARY:
==15910==    definitely lost: 0 bytes in 0 blocks
==15910==    indirectly lost: 0 bytes in 0 blocks
==15910==      possibly lost: 0 bytes in 0 blocks
==15910==    still reachable: 800 bytes in 1 blocks
==15910==         suppressed: 0 bytes in 0 blocks
==15910== Rerun with --leak-check=full to see details of leaked memory
==15910== 
==15910== For counts of detected and suppressed errors, rerun with: -v
==15910== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)
edd@max:/tmp$ 
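
Note that the report above lists the 800 bytes as "still reachable" rather than "definitely lost". Memcheck uses the former when it can still find a pointer to the block at exit and the latter when it cannot. The following sketch (hypothetical function names, not part of the original example) shows one way each situation might arise. The exact classification can depend on the compiler and optimization level, so treat it as something to experiment with rather than a guaranteed result.

```cpp
#include <cstddef>

// A global pointer that keeps its block referenced until the process exits.
double* g_table = nullptr;

// The block stays referenced by g_table, so a leak checker can still find it.
std::size_t keep_reachable() {
    g_table = new double[100];
    return 100 * sizeof(double);  // bytes left allocated
}

// The only pointer is a local that goes away on return,
// so nothing in the program can free the block afterwards.
std::size_t lose_block() {
    double* a = new double[100];
    a[0] = 0.0;                   // touch the block so it is actually used
    return 100 * sizeof(double);  // bytes left allocated
}
```

Calling both functions from main, building with g++ -g -O0 and rerunning with --leak-check=full (as the output above suggests) should show the two blocks with their allocation stack traces.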

Solution 4

Results from free and top will not be helpful to you. I regret that you put effort into constructing graphs for their results. I've given a good explanation why they are unhelpful in a similar topic here: Memory stability of a C++ application in Linux.

I will also concur with other answers here that you should probably prioritize troubleshooting the crash you're encountering in Valgrind. Valgrind is considered very stable at this point, and I have personally run rather complex multi-threaded multimedia SDL/OpenGL/etc. apps through it without issue. It is much more likely that Valgrind's running environment is exposing latent instabilities in your application. The crash sounds like a thread race condition crash, though it may also be heap/memory corruption.

What you may want to ask for, then, is advice on how to debug an app that's crashing from within Valgrind's running environment (which is something I don't know the answer to).

Solution 5

The trouble with free and top is that they can show you a problem, but they give little help in fixing it. Of the hundreds or thousands of lines of code that allocate memory, which ones are leaking? This is where valgrind helps.

If this is for a company with a budget for tools, you might look at Purify or other commercial tools.

Just for completeness I will mention the Boehm conservative garbage-collecting memory allocator (which works for C and C++ code). You can turn off GC and use GC_FREE() and it becomes a leak detection tool. Or you can leave GC enabled to automatically free memory when it is no longer used.
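
When no external tool can be run, a crude stop-gap is to count allocations inside the process itself. This is a substitute technique shown purely as an illustration; it is not how Valgrind works. The sketch below replaces the global operator new and operator delete with counting versions. The names g_live_blocks, live_blocks, g_media, load_media and unload_media are hypothetical. Allocations made through malloc directly or through the aligned forms of operator new are not counted.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>
#include <new>

// Number of blocks obtained from operator new and not yet released.
static std::atomic<long> g_live_blocks{0};

// Replacement global allocation function; the default new[] forwards here.
void* operator new(std::size_t size) {
    void* p = std::malloc(size == 0 ? 1 : size);
    if (p == nullptr) throw std::bad_alloc();
    g_live_blocks.fetch_add(1, std::memory_order_relaxed);
    return p;
}

// Replacement global deallocation function; the default delete[] forwards here.
void operator delete(void* p) noexcept {
    if (p == nullptr) return;
    g_live_blocks.fetch_sub(1, std::memory_order_relaxed);
    std::free(p);
}

// Sized form (C++14): forward to the unsized replacement above.
void operator delete(void* p, std::size_t) noexcept { operator delete(p); }

long live_blocks() { return g_live_blocks.load(std::memory_order_relaxed); }

// Hypothetical stand-ins for one load/unload cycle of the application.
double* g_media = nullptr;
void load_media() { g_media = new double[100]; }
void unload_media() { delete[] g_media; g_media = nullptr; }
```

Recording live_blocks() before a load/unload cycle and comparing it afterwards tells you whether that cycle leaked, which narrows the search to one code path at a time. It does not tell you which line leaked.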

Author by

BaroneAshura

Writing this in order to complete the section. Unfortunately I really value my own privacy, so I just fill every required field with bogus but consistent things (though I probably am the only one who actually knows where the consistency is).

Updated on June 11, 2022

Comments

  • BaroneAshura
    BaroneAshura almost 2 years

    I am currently working on a large-scale application project (written in C++) which started from scratch some time ago, and we reached the point when it is mandatory to make a roundup of checks for memory leaks.

    The application runs on Ubuntu Linux, has a lot of multimedia content, and uses OpenGL, SDL and ffmpeg for various purposes including 3D graphics rendering, windows, audio and movie playback. You could think of it as a video game; it is not one, but its duties can be simplified by considering it as such.

    I am currently a little bit clueless in determining whether we still have memory leaks or not. In the past we had already identified some, and removed them. These days, though, the application is nearly complete, and the tests we ran are giving me results which I can't quite figure out.

    The first thing I did was to try to run the application through Valgrind... unfortunately the application crashes when running in a Valgrind environment. The crash is "non-deterministic", since it occurs in various different places. So I gave up on using Valgrind to easily identify the source of potential leaks, and ended up using two Linux commands: free and top.

    free is being used for probing system memory usage while the application is running

    top is being used with the '-p' option, to probe the application process memory usage while running.

    Output from top and free is being dumped into files for post-processing. I made two graphs with the data, which are linked at the bottom of the question.

    The test case is very simple: memory data is probed once the application has been launched and is waiting for commands. Then I start a sequence of commands which repeatedly does the same thing. The application is expected to load a whole lot of multimedia data into RAM, and then unload it.

    Unfortunately the graph is not showing me what I was expecting. Memory usage grows through 3 different steps and then stops. Memory is apparently never released, which hinted to me that there was a HUGE memory leak. That would be perfectly fine, since it would mean that very likely we were not freeing up memory eaten up by media stuff.

    But after the first three steps... memory usage is stable... there aren't any more huge steps... just slight ups and downs which correspond to the expected data loading and unloading. What is unexpected here is that the data which is supposed to be loaded/unloaded amounts to hundreds of megabytes of RAM, whereas the ups and downs amount to just a handful of megabytes (let's say 8-10 MB).

    I am currently pretty clueless in interpreting these data.

    Does anyone have some hints or suggestions? What am I missing? Is the method I am using for checking the presence of macroscopic memory leaks completely wrong? Do you know any other (preferably free) tool other than Valgrind for checking memory leaks?

    System Memory Usage Graph

    Process Memory Usage Graph