Why Compile to an Object File First?

Solution 1

Compiling to object files first is called separate compilation. There are many advantages and a few drawbacks.

Advantages:

  • it is easy to turn object files (.o) into libraries and link against them later
  • many people can work on different source files at the same time
  • faster builds: you don't recompile files whose sources haven't changed (see the command sketch after this list)
  • object files can be made from sources in different languages and linked together at some later time; to do that, the object files just have to use the same format and compatible calling conventions
  • separate compilation enables the distribution of system-wide libraries (OS libraries, language standard libraries, or third-party libraries), whether static or shared
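
To make the faster-builds point concrete, here is a command sketch using the file names from the question (the -c and -o flags are exactly the ones shown there):

# one-shot build: every file is recompiled on every change
gfortran -o executable filea.f90 fileb.f90 mainfile.f90

# separate compilation: compile each unit to a .o once, then link
gfortran -c filea.f90
gfortran -c fileb.f90
gfortran -c mainfile.f90
gfortran -o executable filea.o fileb.o mainfile.o

# after editing only fileb.f90, redo just the affected steps
gfortran -c fileb.f90
gfortran -o executable filea.o fileb.o mainfile.o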

Drawbacks:

  • There are some optimizations (such as inlining across files or optimizing unused functions away) that the compiler cannot perform across translation units and that the linker traditionally does not attempt. However, many compilers now include an option for "link-time optimization" (LTO), which largely negates this drawback (see the sketch after this list). It is still an issue for system and third-party libraries, especially shared ones: it is impossible to optimize away parts of a component that may change at each run, although other techniques like JIT compilation may mitigate this.
  • in some languages, the programmer has to provide some kind of header for others who will link against the object. For example, in C you have to provide .h files to go with your object files. But that is good practice anyway.
  • in languages with text-based includes, like C or C++, if you change a function prototype, you have to change it in two places: once in the header file and once in the implementation file.
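
The LTO workaround mentioned above looks like this with GCC-family compilers (the -flto flag is real; the file names are the ones from the question). The compiler stores its intermediate representation in the .o files so that the link step can optimize across them:

gfortran -flto -c filea.f90 fileb.f90 mainfile.f90
gfortran -flto -o executable filea.o fileb.o mainfile.o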

Solution 2

When you have a project with a few hundred source files, you don't want to recompile all of them every time one changes. By compiling each source file into a separate object file and recompiling only those source files that are affected by a change, you spend the minimum amount of time going from a source code change to a new executable.

make is the common tool used to track such dependencies and recreate your binary when something changes. Typically you describe what each source file depends on (these dependencies can usually be generated by your compiler, in a format suitable for make) and let make handle the details of producing an up-to-date binary.
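
A minimal sketch of such a Makefile, assuming the three source files named in the question (a real project would also encode inter-module dependencies, which the compiler can generate for you):

FC     = gfortran
FFLAGS = -O2
OBJS   = filea.o fileb.o mainfile.o

executable: $(OBJS)
	$(FC) -o $@ $(OBJS)

# pattern rule: a .o is rebuilt only when its .f90 source is newer
# (note: recipe lines must be indented with a TAB)
%.o: %.f90
	$(FC) $(FFLAGS) -c $<

clean:
	rm -f executable $(OBJS) *.mod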

Solution 3

The .o file is an object file: an intermediate representation of the final program.

Typically, the .o file contains compiled code, but what it does not have is final addresses for all of the different routines or data.

One of the things that a program needs before it can be run is something similar to a memory image.

For example, suppose you have your main program and it calls a routine SQUARE. (This is faux Fortran, I haven't touched it in decades, so work with me here.)

PROGRAM MAIN
    INTEGER :: X, Y
    INTEGER :: SQUARE   ! external function, compiled separately
    X = 10
    Y = SQUARE(X)
    WRITE(*,*) Y
END PROGRAM MAIN

Then you have the SQUARE function.

INTEGER FUNCTION SQUARE(N)
    INTEGER :: N
    SQUARE = N * N
END FUNCTION SQUARE

These are individually compiled units. You can see that when MAIN is compiled, it does not KNOW where SQUARE is or what address it is at. It needs to know that so that when it executes the microprocessor's JUMP SUBROUTINE (JSR) instruction, the instruction has someplace to go.

The .o file has the JSR instruction already, but it doesn't have the actual target value. That comes later, in the linking or loading phase (depending on your application).
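
You can watch this with real tools. A sketch, assuming the two units above are saved as main.f90 and square.f90 (names invented here) and compiled with gfortran on x86-64; the output is abbreviated and its exact form varies by platform:

$ gfortran -c main.f90 square.f90
$ objdump -dr main.o
...
e8 00 00 00 00    call ...                      # the call is emitted, target still zero
                  R_X86_64_PLT32  square_-0x4   # relocation: patch SQUARE's address here
...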

So, MAIN's .o file has all of the code for MAIN, plus a list of references that it wants resolved (notably SQUARE). SQUARE is basically stand-alone; it doesn't have any outstanding references, but at the same time it has no address yet for where it will exist in memory.
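
nm makes the same point at the symbol level (gfortran's traditional name mangling appends an underscore; output abbreviated again):

$ nm main.o
                 U square_      # Undefined: referenced by MAIN, no address yet
$ nm square.o
0000000000000000 T square_      # Text (code) symbol at offset 0 within this .o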

The linker will take all of the .o files and combine them into a single executable. In the old days, compiled code would literally be a memory image: the program would start at some address, simply be loaded into RAM wholesale, and then executed. So, in this scenario, you can see the linker taking the two .o files, concatenating them together (which fixes SQUARE's actual address), then going back to find the SQUARE reference in MAIN and filling in the address.

Modern linkers don't go quite that far, and defer much of that final processing to when the program is actually loaded. But the concept is similar.
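
Once the linker has run, the same symbol has a concrete address. The address below is made up, but the shape of the output is what nm prints for a statically resolved symbol:

$ gfortran -o program main.o square.o
$ nm program | grep square
0000000000401170 T square_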

By compiling to .o files, you end up with reusable units of logic that are then combined later by the linking and loading processes before execution.

The other nice aspect is that the .o files can come from different languages. As long as the calling mechanisms are compatible (i.e. how arguments are passed to and from functions and procedures), then once the source is compiled into a .o, the source language becomes less relevant. You can link C code with FORTRAN code, say.
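
As a hedged sketch of that idea, SQUARE could be re-implemented in C and linked against the same MAIN, relying on gfortran's traditional defaults of a trailing underscore on external names and pass-by-reference arguments (modern code would make this explicit with Fortran's ISO_C_BINDING instead):

/* square.c: hypothetical C replacement for the Fortran SQUARE.      */
/* The trailing underscore and the pointer argument match gfortran's */
/* default naming and calling conventions for external procedures.   */
int square_(const int *n)
{
    return (*n) * (*n);
}

/* build:  gcc -c square.c                      */
/* link:   gfortran -o program main.o square.o  */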

In PHP et al., the process is different because all of the code is loaded into a single image at runtime. You can consider Fortran's .o files similar to how you would use PHP's include mechanism to combine files into a large, cohesive whole.

Solution 4

Another reason, apart from compile time, is that compilation is a multi-step process.

[Diagram omitted: the multi-step build pipeline from source through compiler, assembler, linker, and loader.]

The object files are just one intermediate output of that process. They will eventually be used by the linker to produce the executable file.
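
With GCC-family drivers you can keep the intermediate files and see those steps for yourself (-save-temps is a real flag; plain .f90 sources skip the preprocessing step, so what is left behind is the generated assembly and the object file):

$ gfortran -save-temps -c mainfile.f90
$ ls mainfile.*
mainfile.f90  mainfile.o  mainfile.s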

Solution 5

We compile to object files to be able to link them together to form larger executables. That is not the only way to do it.

There are also compilers that don't do it that way, but instead compile to memory and execute the result immediately. Earlier, when students had to use mainframes, this was standard; Turbo Pascal also worked this way.


Comments

  • tomshafer
    tomshafer almost 2 years

In the last year I've started programming in Fortran, working at a research university. Most of my prior experience is in web languages like PHP or old ASP, so I'm a newbie to compile statements.

I have two different codebases I'm modifying.

One has an explicit step that creates .o files from the modules (e.g. gfortran -c filea.f90) before creating the executable.

The other creates the executable file directly (sometimes creating .mod files, but no .o files; e.g. gfortran -o executable filea.f90 fileb.f90 mainfile.f90).

    • Is there a reason (other than, maybe, Makefiles) that one method is preferred over the other?
  • Phil Miller
    Phil Miller about 13 years
    Some linkers can in fact perform inlining or other optimization at the assembly level.
  • sbi
    sbi about 13 years
There are compilers that can optimize across object files. Newer VC versions do that. Nevertheless, a good answer; +1 from me.
  • Yakov Galka
    Yakov Galka about 13 years
    +0 for not mentioning templates in drawbacks.
  • SK-logic
    SK-logic about 13 years
    @ybungalobill, templates? In fortran?!?
  • sbi
    sbi about 13 years
    @Tomalak: Frankly, I don't know whether (real) compilation is deferred until the link stage or the linker is that smart.
  • Yakov Galka
    Yakov Galka about 13 years
    @SK: the question is tagged C++ too.
  • Lightness Races in Orbit
    Lightness Races in Orbit about 13 years
    @sbi: Fair point. I assume the typical build process still applies, but I never really thought about that: I guess it's possible that the lines between compilation and linking are blurred slightly by toolchains with link-time optimisation enabled.
  • SK-logic
    SK-logic about 13 years
@ybungalobill, anyway, instantiated function templates are marked as weak symbols and collapsed by the linker.
  • Neowizard
    Neowizard almost 13 years
Compilation is indeed a multi-step process, but including the assembler, linker and loader in it (even explicitly) is a source of so many misunderstandings about compilers. Only the first row in your diagram can be attributed to the process of compilation (and even that might be too much for some people/compilers).
  • kidsid49
    kidsid49 almost 13 years
You seem to have misunderstood the question. :-)
  • Neowizard
    Neowizard almost 13 years
The question is irrelevant with regard to my comments. I was talking about the diagram and the suggestion that it describes the steps of compilation. I make no claim against your response to the original question (which is more than fair IMO).
  • Neowizard
    Neowizard almost 13 years
Because I think the confusion the chart causes is greater than the information the answer gives, and I encounter this mistake way too often with pupils.
  • kidsid49
    kidsid49 almost 13 years
    OK. I disagree; but you're the wizard. :-)
  • Noel Widmer
    Noel Widmer over 6 years
Very good. This clarified a few things for me. Wish I'd read it earlier.
  • Toby Speight
    Toby Speight about 6 years
    One of the motivating advantages over all-in-one compilation is reduced memory overhead (linking does need more memory than compiling, but still not as much as an all-in-one compilation unless you're doing link-time optimization).
  • kriss
    kriss about 6 years
@Toby Speight: the relative memory use of linking and compiling depends very much on the language, on the toolchain, and of course on the source code itself. C linking historically used much less memory than compiling; the balance tipped the other way with C++ because of many factors (templates, LTO, security issues, etc.).
  • Vladimir F Героям слава
    Vladimir F Героям слава almost 3 years
I would note that you can combine Fortran, C, C++, and perhaps even Ada or Go in one gcc compilation command as well. I'm not sure whether that was the case 10 years ago, but the feature is not very new.