Dead code detection in legacy C/C++ project

Solution 1

You could use a code coverage analysis tool for this and look for unused spots in your code.

A popular tool for the gcc toolchain is gcov, together with the graphical frontend lcov (http://ltp.sourceforge.net/coverage/lcov.php).

If you use gcc, you can compile with gcov support, which is enabled by the '--coverage' flag. Next, run your application or your test suite with this gcov-enabled build.

Basically, gcc will emit some extra files during compilation (.gcno), and the application will also emit coverage data while running (.gcda). You have to collect all of these files. I'm not going into full detail here, but you probably need to set two environment variables to collect the coverage data in a sane way: GCOV_PREFIX and GCOV_PREFIX_STRIP...

After the run, you can put all the coverage data together and run it through the lcov tool suite. Merging the coverage files from different test runs is also possible, albeit a bit involved.

Anyhow, you end up with a nice set of webpages showing coverage information, pointing out the pieces of code that have no coverage and hence were not used.
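
If you want a plain list instead of the HTML report, the raw data is easy to read as well. Running gcov on a source file writes an annotated .gcov text file in which each line has the form count:line number:source, with ##### as the count for lines that were never executed. The Python sketch below pulls those lines out. The function name and the sample text are made up for illustration and are not part of gcov or lcov.

```python
def unexecuted_lines(gcov_text):
    """Return (line_number, source) pairs that gcov marked as never executed."""
    hits = []
    for raw in gcov_text.splitlines():
        # Each annotated line looks like "<count>:<line number>:<source text>".
        parts = raw.split(":", 2)
        if len(parts) < 3:
            continue
        count, lineno, source = parts[0].strip(), parts[1].strip(), parts[2]
        # "#####" marks executable lines that never ran, "=====" marks ones only
        # reachable through exceptional paths, "-" means no executable code.
        if count in ("#####", "=====") and lineno.isdigit():
            hits.append((int(lineno), source.strip()))
    return hits


# Hypothetical excerpt of a demo.c.gcov file, for illustration only.
SAMPLE = """\
        -:    0:Source:demo.c
        1:    1:int used(int x) { return x + 1; }
    #####:    2:int unused(int x) { return x - 1; }
        1:    3:int main(void) { return used(0) - 1; }
"""

for lineno, source in unexecuted_lines(SAMPLE):
    print(f"line {lineno}: {source}")
```

In a real project you would read each .gcov file from disk and pass its contents to the function.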

Of course, you need to double check whether the portions of code really are unused in every situation, and a lot depends on how well your tests exercise the codebase. But at least this will give you an idea of possible dead-code candidates...
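
One way to make that double check a bit more systematic is to keep only the lines that stayed unexecuted in every run you have data for. A minimal Python sketch, assuming you have already extracted one set of unexecuted line numbers per run for a given file (the function name and the numbers are hypothetical):

```python
def dead_code_candidates(unexecuted_per_run):
    """Return the line numbers that were unexecuted in every run, sorted."""
    runs = [set(run) for run in unexecuted_per_run]
    if not runs:
        return []
    # A line is only a candidate if no run at all managed to execute it.
    return sorted(set.intersection(*runs))


# Hypothetical unexecuted line numbers for one file from three test runs.
per_run = [{2, 5, 9, 14}, {5, 9, 12}, {5, 7, 9}]
print(dead_code_candidates(per_run))
```

Even the surviving lines are only candidates: error handling paths, for instance, may simply never be triggered by the tests you ran.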

Solution 2

Compile it under gcc with -Wunreachable-code.

I think that the more recent the version, the better the results you'll get, but I may be wrong in my impression that it's something they've been actively working on. Note that this does flow analysis, but I don't believe it tells you about "code" which is already dead by the time it leaves the preprocessor, because that's never parsed by the compiler. It also won't detect, e.g., exported functions which are never called, or special-case handling code which just so happens to be unreachable because nothing ever calls the function with that parameter. You need code coverage for that, and you should run the functional tests, not the unit tests: unit tests are supposed to have 100% code coverage, and hence execute code paths which are 'dead' as far as the application is concerned. Still, with these limitations in mind, it's an easy way to get started finding the most completely bollixed routines in the code base.

This CERT advisory lists some other tools for static dead code detection.

Solution 3

Your approach depends on the availability of (automated) tests. If you have a test suite that you trust to cover a sufficient amount of functionality, you can use coverage analysis, as previous answers already suggested.

If you are not so fortunate, you might want to look into source code analysis tools like SciTools' Understand, which can help you analyse your code using a lot of built-in analysis reports. My experience with that tool dates from 2 years ago, so I can't give you much detail, but what I do remember is that they had impressive support, with very fast turnaround times for bug fixes and answers to questions.

I found a page on static source code analysis that lists many other tools as well.

If that doesn't help you sufficiently either, and you're specifically interested in finding preprocessor-related dead code, I would recommend you post some more details about the code. For example, if it is mostly related to various combinations of #ifdef settings, you could write scripts to determine the (combinations of) settings and find out which combinations are never actually built, etc.
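
As a minimal sketch of such a script, assuming the conditionals mostly take the form #ifdef, #ifndef or defined(...): the Python snippet below collects the macro names that the conditionals test and subtracts the ones your builds actually define. The function names, the sample source and the set of build defines are hypothetical. In practice you would gather the defines from your makefiles or build logs. The sketch ignores value tests such as #if FOO > 1 and macros that are #defined inside the sources themselves.

```python
import re
from collections import Counter

# Preprocessor conditionals: #ifdef, #ifndef, #if, #elif (whitespace after '#' allowed).
DIRECTIVE = re.compile(r"^\s*#\s*(ifdef|ifndef|if|elif)\b(.*)$")
# defined(NAME) or defined NAME inside an #if / #elif expression.
DEFINED = re.compile(r"\bdefined\s*\(?\s*([A-Za-z_]\w*)")
IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


def conditional_macros(source):
    """Count how often each macro name is tested by a preprocessor conditional."""
    counts = Counter()
    for line in source.splitlines():
        match = DIRECTIVE.match(line)
        if not match:
            continue
        kind, rest = match.group(1), match.group(2)
        if kind in ("ifdef", "ifndef"):
            # The macro is the first identifier after the directive.
            names = IDENTIFIER.findall(rest)[:1]
        else:
            names = DEFINED.findall(rest)
        counts.update(names)
    return counts


def never_defined(source, defined_in_builds):
    """Macros tested in the source that no known build configuration defines."""
    return sorted(set(conditional_macros(source)) - set(defined_in_builds))


# Hypothetical C source, for illustration only.
SAMPLE_C = """\
#include <stdio.h>
#ifdef LEGACY_IO
void old_write(void);
#endif
#if defined(USE_SSL) && !defined(LEGACY_IO)
void secure_write(void);
#elif defined USE_PLAIN
void plain_write(void);
#endif
"""

print(never_defined(SAMPLE_C, {"USE_SSL"}))
```

A macro that no build defines makes the matching #ifdef blocks dead-code candidates. For #ifndef blocks it means the opposite: that code is always built.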

Solution 4

For C code only and assuming that the source code of the whole project is available, launch an analysis with the Open Source tool Frama-C. Any statement of the program that displays red in the GUI is dead code.

If you have "dead code" problems, you may also be interested in removing "spare code": code that is executed but does not contribute to the end result. This requires you to provide an accurate model of the I/O functions (you wouldn't want to remove a computation that appears to be "spare" but that is used as an argument to printf). Frama-C has an option for pointing out spare code.

Solution 5

Both Mozilla and Open Office have home-grown solutions.

Author: andyknas (c++ / ruby / linux / git)

Updated on July 05, 2022

Comments

  • andyknas almost 2 years

    How would you go about dead code detection in C/C++ code? I have a pretty large code base to work with and at least 10-15% of it is dead code. Is there any Unix-based tool to identify these areas? Some pieces of code still make heavy use of the preprocessor; can an automated process handle that?

  • andyknas over 15 years
    I'm still stuck with Sun C++ compilers, but we have a gcc migration underway, so I'm gonna try this out. Thanks.
  • philippe lhardy over 10 years
    This answer is no longer valid, due to the fact that the -Wunreachable-code option was removed from gcc. gcc.gnu.org/ml/gcc-help/2011-05/msg00360.html
  • Steve Jessop over 10 years
    Shame. For many purposes "unstable" dead code detection is still better than nothing. Aside from anything else, perfect dead code detection in general is impossible (halting problem), so everyone knows that whatever tool they use is imperfect. Presumably someone actually cares that it's more imperfect with -O0 than it is with -O3, or doesn't want new warnings whenever the optimizer improves.
  • Steve Jessop over 10 years
    Still, if your code uses no new features you could still use an old gcc as a static analysis tool. So my answer isn't completely wrong. Bit of a reach, I know ;-)
  • syam about 7 years
    Both the links are inaccessible now. Can anybody update?
  • Max Lybbert about 7 years
    I've switched the first link from a blog post to a (hopefully longer lasting) documentation page. The Open Office link appears to work.
  • Arun over 6 years
    Code coverage analysis (such as gcov) can provide data on which code is not covered by the particular run(s) of the software, but code that is not covered is not necessarily dead code. A different run of the software (with different compile options, different runtime options or different input data) or a different execution path (such as error handling) may trigger a function that was not invoked earlier.