How to debug heap corruption errors?

193,888

Solution 1

Application Verifier combined with Debugging Tools for Windows is an amazing setup. You can get both as a part of the Windows Driver Kit or the lighter Windows SDK. (Found out about Application Verifier when researching an earlier question about a heap corruption issue.) I've used BoundsChecker and Insure++ (mentioned in other answers) in the past too, although I was surprised how much functionality was in Application Verifier.

Electric Fence (aka "efence"), dmalloc, valgrind, and so forth are all worth mentioning, but most of these are much easier to get running under *nix than Windows. Valgrind is ridiculously flexible: I've debugged large server software with many heap issues using it.

When all else fails, you can provide your own global operator new/delete and malloc/calloc/realloc overloads -- how to do so will vary a bit depending on compiler and platform, and it will be a bit of an investment -- but it may pay off over the long run. The desirable feature list should look familiar from dmalloc and Electric Fence, and from the surprisingly excellent book Writing Solid Code:

  • sentry values: allow a little more space before and after each alloc, respecting maximum alignment requirement; fill with magic numbers (helps catch buffer overflows and underflows, and the occasional "wild" pointer)
  • alloc fill: fill new allocations with a magic non-0 value -- Visual C++ will already do this for you in Debug builds (helps catch use of uninitialized vars)
  • free fill: fill in freed memory with a magic non-0 value, designed to trigger a segfault if it's dereferenced in most cases (helps catch dangling pointers)
  • delayed free: don't return freed memory to the heap for a while, keep it free filled but not available (helps catch more dangling pointers, catches proximate double-frees)
  • tracking: being able to record where an allocation was made can sometimes be useful

Note that in our local homebrew system (for an embedded target) we keep the tracking separate from most of the other stuff, because the run-time overhead is much higher.
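As a rough illustration of the first three items in that list (sentry values, alloc fill, free fill), here is a minimal sketch of a wrapper around malloc/free. Everything in it is an assumption made for the example: the `dbgheap` namespace, the function names, the block layout and the fill bytes are hypothetical, not taken from dmalloc, Electric Fence or any particular homebrew system. It is not thread-safe and leaves out delayed free and tracking.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

namespace dbgheap {  // hypothetical namespace, for illustration only

// Fill bytes chosen for this sketch.
constexpr unsigned char kFence     = 0xFD;  // sentry bytes around the user block
constexpr unsigned char kAllocFill = 0xCD;  // fresh, uninitialized memory
constexpr unsigned char kFreeFill  = 0xDD;  // memory that has been freed

// Header is padded to the maximum alignment so the user pointer stays aligned.
struct alignas(std::max_align_t) Header {
    std::size_t size;  // user-requested byte count
};

// One maximally aligned unit of sentry bytes on each side of the user block.
constexpr std::size_t kFenceSize = alignof(std::max_align_t);

// Layout: [Header][front fence][user bytes][back fence]
void* debug_alloc(std::size_t size) {
    const std::size_t overhead = sizeof(Header) + 2 * kFenceSize;
    if (size > SIZE_MAX - overhead) return nullptr;  // size overflow
    auto* raw = static_cast<unsigned char*>(std::malloc(size + overhead));
    if (raw == nullptr) return nullptr;

    const Header h{size};
    std::memcpy(raw, &h, sizeof h);
    unsigned char* user = raw + sizeof(Header) + kFenceSize;
    std::memset(user - kFenceSize, kFence, kFenceSize);  // front sentry
    std::memset(user, kAllocFill, size);                 // alloc fill
    std::memset(user + size, kFence, kFenceSize);        // back sentry
    return user;
}

// Returns true if neither sentry region has been overwritten.
bool fences_intact(const void* p) {
    const auto* user = static_cast<const unsigned char*>(p);
    Header h;
    std::memcpy(&h, user - kFenceSize - sizeof(Header), sizeof h);
    const unsigned char* front = user - kFenceSize;
    const unsigned char* back  = user + h.size;
    for (std::size_t i = 0; i < kFenceSize; ++i) {
        if (front[i] != kFence || back[i] != kFence) return false;
    }
    return true;
}

// Checks the sentries, free-fills the block and releases it.
// Returns false if an overflow or underflow was detected.
bool debug_free(void* p) {
    if (p == nullptr) return true;
    auto* user = static_cast<unsigned char*>(p);
    const bool ok = fences_intact(p);
    Header h;
    std::memcpy(&h, user - kFenceSize - sizeof(Header), sizeof h);
    // Free fill. A compiler may drop this store because the block is freed
    // right away; a delayed-free scheme would keep the block around instead.
    std::memset(user, kFreeFill, h.size);
    std::free(user - kFenceSize - sizeof(Header));
    return ok;
}

}  // namespace dbgheap
```

In a real system the failed check would more likely assert or break into the debugger than return a flag, and these functions would be called from the replaced operator new/delete or malloc/free.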


If you're interested in more reasons to overload these allocation functions/operators, take a look at my answer to "Any reason to overload global operator new and delete?"; shameless self-promotion aside, it lists other techniques that are helpful in tracking heap corruption errors, as well as other applicable tools.


Because I keep finding my own answer here when searching for alloc/free/fence values MS uses, here's another answer that covers Microsoft dbgheap fill values.

Solution 2

You can detect a lot of heap corruption problems by enabling Page Heap for your application. To do this, use gflags.exe, which comes as part of Debugging Tools for Windows.

Run Gflags.exe and in the Image file options for your executable, check "Enable Page Heap" option.

Now restart your exe and attach a debugger to it. With Page Heap enabled, the application will break into the debugger whenever heap corruption occurs.

Solution 3

To really slow things down and perform a lot of runtime checking, try adding the following at the top of your main() or equivalent in Microsoft Visual C++:

_CrtSetDbgFlag(_CRTDBG_ALLOC_MEM_DF | _CRTDBG_LEAK_CHECK_DF | _CRTDBG_CHECK_ALWAYS_DF );
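For context, a minimal sketch of how that call might be wrapped so the same source still builds outside MSVC Debug builds. The helper names `enable_crt_heap_checks` and `heap_looks_intact` are hypothetical. `_CrtSetDbgFlag`, `_CRTDBG_REPORT_FLAG` and `_CrtCheckMemory` are the real CRT debug-heap entry points from `<crtdbg.h>`, and they only do anything when `_DEBUG` is defined. Unlike the one-liner above, this version ORs the new bits into the current flags rather than replacing them.

```cpp
#if defined(_MSC_VER)
#include <crtdbg.h>
#endif

// Hypothetical helper: turns on the CRT debug-heap checks where available.
// Returns true only when the checks were actually enabled.
bool enable_crt_heap_checks() {
#if defined(_MSC_VER) && defined(_DEBUG)
    // Read the current flags, then add ours instead of overwriting them.
    int flags = _CrtSetDbgFlag(_CRTDBG_REPORT_FLAG);
    flags |= _CRTDBG_ALLOC_MEM_DF      // enable debug heap allocations
           | _CRTDBG_LEAK_CHECK_DF     // report leaks at program exit
           | _CRTDBG_CHECK_ALWAYS_DF;  // validate the heap on every alloc/free (slow)
    _CrtSetDbgFlag(flags);
    return true;
#else
    return false;  // not an MSVC Debug build: nothing to enable
#endif
}

// Hypothetical helper: an on-demand heap check, cheaper than
// _CRTDBG_CHECK_ALWAYS_DF when placed only around suspect code.
bool heap_looks_intact() {
#if defined(_MSC_VER) && defined(_DEBUG)
    return _CrtCheckMemory() != 0;
#else
    return true;  // no CRT debug heap available, so nothing to report
#endif
}
```

Calling `enable_crt_heap_checks()` first thing in main() matches the advice above. If `_CRTDBG_CHECK_ALWAYS_DF` makes the program too slow, one option is to drop that flag and call `heap_looks_intact()` before and after the code you suspect.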

Solution 4

A very relevant article is Debugging Heap corruption with Application Verifier and Debugdiag.

Solution 5

What sort of things can cause these errors?

Doing naughty things with memory, e.g. writing after the end of a buffer, or writing to a buffer after it's been freed back to the heap.

How do I debug them?

Use a tool which adds automated bounds-checking to your executable: e.g. valgrind on Unix, or a tool like BoundsChecker (Wikipedia also suggests Purify and Insure++) on Windows.

Beware that these will slow your application, so they may be unusable if yours is a soft-real-time application.

Another possible debugging aid/tool might be MicroQuill's HeapAgent.


Author by

trincot

Updated on November 23, 2021

Comments

  • trincot
    trincot over 2 years

    I am debugging a (native) multi-threaded C++ application under Visual Studio 2008. On seemingly random occasions, I get a "Windows has triggered a break point..." error with a note that this might be due to a corruption in the heap. These errors won't always crash the application right away, although it is likely to crash shortly after.

    The big problem with these errors is that they pop up only after the corruption has actually taken place, which makes them very hard to track and debug, especially on a multi-threaded application.

    • What sort of things can cause these errors?

    • How do I debug them?

    Tips, tools, methods, enlightenment... are welcome.

  • ChrisW
    ChrisW almost 15 years
    Yes: look at the application's compiler/build options, and ensure it's being built to link against a "multi-threaded" version of the C run-time library.
  • JaredPar
    JaredPar almost 15 years
    @ChrisW for the HeapAlloc style APIs this is different. It's actually a parameter that can be changed at heap creation time, not link time.
  • ChrisW
    ChrisW almost 15 years
    Oh. It didn't occur to me that the OP might be talking about that heap, and not about the heap in the CRT.
  • JaredPar
    JaredPar almost 15 years
    @ChrisW, the question is rather vague but I just hit the problem I detailed ~1 week ago so it's fresh on my mind.
  • Employed Russian
    Employed Russian almost 15 years
    Rebuilding the application with the debugging runtime (/MDd or /MTd flag) would be my first step. These perform additional checks at malloc and free, and are often quite effective at narrowing down the location of the bug(s).
  • leander
    leander almost 15 years
    One tiny thing worth noting about Application Verifier: you must register Application Verifier's symbols ahead of the microsoft symbol server symbols in your symbol search path, if you use that... Took me a bit of searching to figure out why !avrf wasn't finding the symbols it needed.
  • Admin
    Admin almost 15 years
    Application Verifier was a great deal of help, and combined with some guessing, I was able to solve the problem! Thanks a lot, and for everyone else too, for bringing up helpful points.
  • Samrat Patil
    Samrat Patil about 14 years
    MicroQuill's HeapAgent: There's not much written or heard about it, but for heap corruption, it should be on your list.
  • Guillaume Paris
    Guillaume Paris over 12 years
    Yes, but once I get this function call in my callstack dump (after a memory corruption crash): wow64!Wow64NotifyDebugger, what can I do? I still don't know what is going wrong in my application.
  • Dave F
    Dave F almost 11 years
    Just tried out gflags to debug heap corruption here, VERY useful little tool, highly recommended. Turned out I was accessing freed memory, which, when instrumented with gflags will immediately break into the debugger... Handy!
  • Nathan Reed
    Nathan Reed almost 10 years
    Does Application Verifier have to be used with WinDbg, or should it work with the Visual Studio debugger? I've been trying to use it, but it doesn't raise any errors or apparently do anything when I debug in VS2012.
  • leander
    leander almost 10 years
    @NathanReed: I believe it works with VS as well -- see msdn.microsoft.com/en-us/library/ms220944(v=vs.90).aspx -- although note this link is for VS2008, I'm not sure about later versions. Memory is a bit fuzzy, but I believe when I had the issue in the "earlier question" link I just ran Application Verifier and saved the options, ran the program, and when it crashed I chose VS to debug with. AV just made it crash / assert earlier. The !avrf command is specific to WinDbg as far as I know, though. Hopefully others can provide more info!
  • Nathan Reed
    Nathan Reed almost 10 years
    Thanks. I actually solved my original issue and it turned out not to be heap corruption after all, but something else, so that probably explains why App Verifier didn't find anything. :)
  • Rick Papo
    Rick Papo almost 9 years
    BoundsChecker works fine as a smoke test, but don't even think about running a program under it while trying to run that program in production as well. The slowdown can be anywhere from 60x to 300x, depending on which options you are using, and whether or not you are using the compiler instrumentation feature. Disclaimer: I am one of the guys who maintains the product for Micro Focus.
  • Devolus
    Devolus about 8 years
    Great tool! Just found a bug that I was hunting for days, because Windows doesn't say the address of the corruption, only that "something" is wrong, which is not really helpful.
  • uceumern
    uceumern over 7 years
    A bit late to the party, but I noticed a significant increase in memory usage by the application I am debugging when I turned on Page Heap. Unfortunately, it goes up to the point where the (32-bit) application runs out of memory before the heap corruption detection is triggered. Any ideas how to tackle that problem?
  • Matthias
    Matthias over 2 years
    While that made things really slow for me, I instead put calls to _CrtCheckMemory() before and after some places in my code which I suspected of causing the issue. A bit like laying "mouse traps" to better pinpoint the location at which the error occurs.
  • gil_mo
    gil_mo about 2 years
    Just found a heap corruption I've been chasing for days, which I finally caught not by any of the tools mentioned here. In my case, I got a crash while deleting some memory upon destruction. I've noticed that the "heap stamp" was being overwritten (I'm referring to the 4 bytes before the actual allocated memory). By placing a memory breakpoint on those 4 bytes, I found what was causing the corruption - it was an "array[-1] = value", where "array" was the allocated memory, and the "-1" came from a member with a bad value. So when this scenario occurs, this method is the fastest.