What's the best way to become familiar with a large codebase?


Solution 1

Start with a small task if possible, and debug the code around your problem. Stepping through code in debug mode is the easiest way to learn how something works.

Solution 2

Another option is to write tests for the features you're interested in. Setting up the test harness is a good way of establishing what dependencies the system has and where its state resides. Each test starts with an assertion about the way you think the system should work. If it turns out to work that way, you've achieved something and you've got some working sample code to reproduce it. If it doesn't work that way, you've got a puzzle to solve and a line of enquiry to follow.
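As a rough illustration of that idea, here is a minimal sketch of such a test in C++ using plain `assert`. The `legacy::trim` function is a hypothetical stand-in for whatever function you are studying; in a real code base you would include its header rather than define it. Each assertion writes down one belief about how the code behaves.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a function found in the inherited code base.
// In practice you would #include the real header instead of defining this.
namespace legacy {
inline std::string trim(const std::string& s) {
    const char* ws = " \t\n";
    const auto first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}
}  // namespace legacy

// Each assertion records a belief about the current behaviour.
// A failing assertion is the "puzzle to solve".
inline void test_trim_matches_my_assumptions() {
    assert(legacy::trim("  abc  ") == "abc");  // belief: strips both ends
    assert(legacy::trim("a b") == "a b");      // belief: inner spaces are kept
    assert(legacy::trim("   ").empty());       // belief: all-blank gives empty
}
```

Call `test_trim_matches_my_assumptions()` from `main` or from your test runner. Note that `assert` is compiled out when `NDEBUG` is defined, so run this in a debug build or swap in your test framework's own checks.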

Solution 3

One thing I usually suggest, which has not yet been mentioned, is to become a competent user of the existing code base before you try to be a developer on it. When new developers join our large software project, I suggest that they spend time becoming expert users before diving in and working on the code.

Maybe that's obvious, but I have seen a lot of people try to jump into the code too quickly because they are eager to start making progress.

Solution 4

This is quite dependent on what sort of learner and what sort of programmer you are, but:

  • Broad first - you need an idea of scope and size. This might include skimming docs/UML if they're good. If it's a long-term project and you're going to need a full understanding of everything, I might actually read the docs properly. Again, if they're good.
  • Narrow - pick something manageable and try to understand it. Get a "taste" for the code.
  • Pick a feature - possibly a different one to the one you just looked at if you're feeling confident, and start making some small changes.
  • Iterate - assess how well things have gone and see if you could benefit from repeating an early step in more depth.

Solution 5

Pairing with strict rotation.

If possible, while going through the documentation/codebase, try to employ pairing with strict rotation. That means two of you sit together for a fixed period of time (say, a 2 hour session), then you switch pairs: one person continues working on that task while the other moves to another task with another partner.

In pairs you'll both pick up a piece of knowledge, which can then be fed to other members of the team when the rotation occurs. Another benefit is that when a new pair is brought together, the one who worked on the task (in this case, investigating the code) can summarise and explain the concepts in a more easily understood way. As time progresses everyone should reach a similar level of understanding, and hopefully you avoid the "Oh, only John knows that bit of the code" syndrome.

From what I can tell about your scenario, you have a good number for this (3 pairs). However, if you're distributed, or not working to the same timescale, it's unlikely to be possible.


Author: Dhir Pratap

Updated on June 23, 2020

Comments

  • Dhir Pratap
    Dhir Pratap about 4 years

    Joining an existing team with a large codebase already in place can be daunting. What's the best approach?

    • Broad; try to get a general overview of how everything links together, from the code
    • Narrow; focus on small sections of code at a time, understanding how they work fully
    • Pick a feature to develop and learn as you go along
    • Try to gain insight from class diagrams and UML, if available (and up to date)
    • Something else entirely?

    I'm working on what is currently an approx 20k line C++ app & library (Edit: small in the grand scheme of things!). In industry I imagine you'd get an introduction by an experienced programmer. However if this is not the case, what can you do to start adding value as quickly as possible?

    --
    Summary of answers:

    • Step through code in debug mode to see how it works
    • Pair up with someone more familiar with the code base than you, taking turns to be the person coding and the person watching/discussing. Rotate partners amongst team members so knowledge gets spread around.
    • Write unit tests. Start with an assertion of how you think the code will work. If it turns out as you expected, you've probably understood the code. If not, you've got a puzzle to solve and/or an enquiry to make. (Thanks Donal, this is a great answer)
    • Go through existing unit tests for functional code, in a similar fashion to above
    • Read UML, Doxygen generated class diagrams and other documentation to get a broad feel of the code.
    • Make small edits or bug fixes, then gradually build up
    • Keep notes, and don't jump in and start developing; it's more valuable to spend time understanding than to generate messy or inappropriate code.

    this post is a partial duplicate of the-best-way-to-familiarize-yourself-with-an-inherited-codebase

    • Paco
      Paco over 15 years
      20K lines is not a very large code base. When it's only 20K lines, I would read it. One of the things I did not learn at university is working with large code bases.
    • HS.
      HS. over 15 years
      Indeed. 20k does not seem like much. We have C++ files with more than 10k lines in each. I know, it is bad, but we don't have the time for cleanup right now. (just imagine me rolling my eyes just thinking about it) Much of the bloat is from comments though.
    • Dhir Pratap
      Dhir Pratap over 15 years
      Heh, indeed! I did not mean to imply 20k was a huge code base (I never said it was), was just looking for general, scalable, advice. Great answers so far; lots to think about.
    • user2864740
      user2864740 over 9 years
      20k is .. what, one file? ;-)
    • Dave Newton
      Dave Newton about 9 years
      One place I consulted at had a 40k line file of deeply-nested if/then statements that implemented some sort of business rule thing. It was awful.
  • extraneon
    extraneon over 15 years
    It pays to think of what you expect a variable to be when debugging, and if the debugger shows something different, find out why. I personally don't like debuggers but prefer print statements, which force you to think in advance.
  • Bill the Lizard
    Bill the Lizard over 15 years
    I've always thought this was the best way to get familiar with someone else's code.
  • Stephen Darlington
    Stephen Darlington over 15 years
    This is how I tend to do things. There is no substitute for doing things. Simply reading code/documentation/tests never really cuts it.
  • Elazar Leibovich
    Elazar Leibovich over 14 years
    @extraneon in addition to forcing you to think in advance, print statements let you see a large amount of input more easily. So for instance if a variable is usually 2, and becomes 10 once in 100 loops, you'll be able to spot it with a printf statement, but it is harder to trace that with a debugger.
  • Rokit
    Rokit over 5 years
    The link is broken.