Link-time optimization and inline
Solution 1
Even with LTO, a compiler still has to use heuristics to decide whether or not to inline a function at each call site (note that it makes the decision per call, not per function). The heuristics take into account factors such as whether the call is in a loop, whether the loop is unrolled, how big the function is, and how often it is called globally. At compile time, the compiler can never accurately determine how frequently code will be called, or whether the code expansion is likely to blow out the instruction/trace/loop/microcode caches of a particular CPU.
Profile-guided optimization (PGO) is supposed to be a step towards addressing this, but if you've ever tried it, you have likely noticed that the swing in performance is on the order of 0-2%, and it can go in either direction! :-) It's still a work in progress.
If performance is your ultimate goal, you really know what you are doing, and you have done a thorough analysis of your code, what you really need is a way to tell the compiler to inline or not inline on a per-call basis, rather than a per-function hint. In practice I have managed this by using compiler-specific "force_no_inline"-type hints for the cases where I don't want inlining, and a separate "force_inline" copy of the function (or a macro, in the rare case this fails) for when I want it inlined. If anyone knows how to do this in a cleaner way with compiler-specific hints (for any C/C++ compiler), please let me know.
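A minimal sketch of the two-copy approach described above, assuming GCC or Clang attribute syntax (MSVC spells these __forceinline and __declspec(noinline)). The macro names and the functions scale_inl, scale_call, sum_scaled and scale_once are hypothetical, chosen only for illustration:

```c
#include <stddef.h>

/* Assumption: GCC/Clang attributes. Macro names are illustrative only. */
#define FORCE_INLINE static inline __attribute__((always_inline))
#define FORCE_NO_INLINE __attribute__((noinline))

/* The "force_inline" copy: the body is expanded at every call site. */
FORCE_INLINE double scale_inl(double x, double k) {
    return x * k;
}

/* The "force_no_inline" copy: an out-of-line wrapper for cold call sites. */
FORCE_NO_INLINE double scale_call(double x, double k) {
    return scale_inl(x, k);
}

/* Hot loop: call the inlined copy so the body is expanded in the loop. */
double sum_scaled(const double *v, size_t n, double k) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += scale_inl(v[i], k);
    return sum;
}

/* Cold path: call the out-of-line copy to avoid code growth here. */
double scale_once(double x, double k) {
    return scale_call(x, k);
}
```

The choice is made per call simply by picking which of the two names to call; whether it pays off still has to be confirmed by measuring and by inspecting the generated code.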
To specifically address your points:
1. The code becomes less succinct and somewhat less maintainable.
Generally, no - it's just a keyword hint that controls how it is inlined. However if you jump through hoops like I described in the last paragraph, then yes.
2. Sometimes, inlining can greatly increase run-time performance.
When leaving the compiler to its own devices: yes, it certainly can, but mostly it doesn't. The compiler has good heuristics that make good, although not always optimal, inlining decisions. As for the keyword specifically, compilers may ignore it entirely or treat it as a weak hint. In general they seem averse to inlining code that red-flags their heuristics (like inlining a 16k function into a loop unrolled 16x).
3. Inlining is decided at a fixed point in time, maybe without a terribly good foreknowledge of its uses, or without considering all (future) surrounding circumstances.
Yes, it uses static analysis. Dynamic analysis can come from your insight and you manually controlling inlining on a per-call basis, or theoretically from PGO (which still sucks).
Solution 2
GCC 9 Binutils 2.33 experiment to show that LTO can inline
For those who are curious whether ld inlines across object files or not, here is a quick experiment that confirms that it can:
main.c
int notmain(void);
int main(void) {
    return notmain();
}
notmain.c
int notmain(void) {
    return 42;
}
Compile with LTO and disassemble:
gcc -O3 -flto -ggdb3 -std=c99 -Wall -Wextra -pedantic -c -o main.o main.c
gcc -O3 -flto -ggdb3 -std=c99 -Wall -Wextra -pedantic -c -o notmain.o notmain.c
gcc -O3 -flto -ggdb3 -std=c99 -Wall -Wextra -pedantic -o main.out notmain.o main.o
gdb -batch -ex "disassemble/rs main" main.out
Disassembly output:
0x0000000000001040 <+0>: b8 2a 00 00 00 mov $0x2a,%eax
0x0000000000001045 <+5>: c3 retq
so we see that there is no callq or other jumps, which means that the call was inlined across the two object files.
Without -flto however we see:
0x0000000000001040 <+0>: f3 0f 1e fa endbr64
0x0000000000001044 <+4>: e9 f7 00 00 00 jmpq 0x1140 <notmain>
so now there is a jmpq, which means that the call was not inlined.
Note that, as an optimization, the compiler chose jmpq, which makes no stack changes, over a more naive callq. I think this is a trivial, minimal case of tail call optimization.
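To illustrate the tail-call remark, here is a small sketch contrasting a call in tail position with one that is not. The names callee, tail_caller and non_tail_caller are hypothetical, and the noinline attribute plus the empty asm statement are GCC/Clang-specific assumptions used only to keep the call visible in a single file. Whether a jmp is actually emitted depends on the compiler, target, and optimization level:

```c
/* Assumption: GCC/Clang. noinline keeps the callee out of line, and the
   empty asm statement discourages the optimizer from removing the call. */
__attribute__((noinline)) int callee(void) {
    __asm__ volatile("");
    return 42;
}

/* Tail position: nothing is left to do after the call, so an optimizing
   compiler may emit a plain jmp to callee instead of call + ret. */
int tail_caller(void) {
    return callee();
}

/* Not a tail call: the addition runs after callee returns, so a real
   call is needed before the result can be adjusted and returned. */
int non_tail_caller(void) {
    return callee() + 1;
}
```

Disassembling both callers at -O2 or -O3 is a quick way to compare the two forms on a given toolchain.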
So yes, if you are using -flto, you don't need to worry about putting definitions in headers so they can be inlined.
The main downside of having definitions in headers is that they may slow down compilation. For C++ templates, you may also be interested in explicit template instantiation: Explicit template instantiation - when is it used?
Tested in Ubuntu 19.10 amd64.
Solution 3
The question is: does link-time optimization (e.g., in GCC) render manual inlining, e.g., declaring in C99 a function "inline" and providing an implementation, obsolete?
This article would seem to answer "Yes":
Think for a minute: what turns a function into a good candidate for inlining? Apart from the size factor, the optimizer needs to know how often this function is called, where it is called from, how many other functions in the program are viable candidates for inlining and -- believe it or not -- whether the function is ever called. Optimizing (i.e. inlining) a function that isn't called even once is a waste of time and resources. But how can an optimizer know that a function is never called? Well, it cannot. Unless it has scanned the entire program. This is where [link-time optimization] becomes crucial.
Solution 4
If link-time optimization were as fast as compile-time optimization, it would obviate the need for compiler hints. Unfortunately, it is generally slower, so there is a tradeoff between overall build speed and the overall quality of optimizations for that build.
Also, you still need to use inline when defining functions in headers. Otherwise, you will get linker errors for multiple definitions of those functions if they are used in multiple translation units.
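As a hedged sketch of a header-defined function that can safely be included from several translation units in C: the header name angle.h is hypothetical, and deg_to_rad is borrowed from the question. This variant uses static inline, which is one common way to do it in C, where a plain inline definition does not by itself provide an external definition:

```c
/* angle.h (hypothetical header name) */
#ifndef ANGLE_H
#define ANGLE_H

/* static gives every translation unit its own private copy, so including
   this header from several .c files cannot produce a multiple-definition
   error at link time; inline remains only a hint to the optimizer. */
static inline double deg_to_rad(double deg) {
    return deg * (3.14159265358979323846 / 180.0);
}

#endif /* ANGLE_H */
```

In C++, plain inline on the header definition is enough to permit the repeated definitions.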
Admin
Updated on June 07, 2022
Comments
-
Admin almost 2 years
In my experience, there's a lot of code that explicitly uses inline functions, which comes with tradeoffs:
- The code becomes less succinct and somewhat less maintainable.
- Sometimes, inlining can greatly increase run-time performance.
- Inlining is decided at a fixed point in time, maybe without a terribly good foreknowledge of its uses, or without considering all (future) surrounding circumstances.
The question is: does link-time optimization (e.g., in GCC) render manual inlining, e.g., declaring in C99 a function "inline" and providing an implementation, obsolete? Is it true that we don't need to consider inlining for most functions ourselves? What about functions that do always benefit from inlining, e.g., deg_to_rad(x)?
Clarification: I am not thinking about functions that are in the same translation-unit anyway, but about functions that should logically reside in different translation-units.
Update: I have often seen opposition to "inline", and suggestions that it is obsolete. Personally, however, I do see explicitly inlined functions often: as functions defined in a class body.
-
Oliver Charlesworth almost 13 years
Would you care to share what that Item says, for those of us without that book?
-
Admin almost 13 years
The optimizer that could make inline obsolete cannot kick in before link-time, b/c the definition may not be (and in most interesting cases would not be) in the same translation unit.
-
Admin almost 13 years
Regarding 2: That's why I put a "sometimes" in it and have "3".
-
Billy ONeal almost 13 years
inline doesn't necessarily tell the compiler that the function must be inlined; most modern compilers ignore that and use it only to specify that the function has internal linkage.
-
Billy ONeal almost 13 years
@Oli: Particularly when that item # is from an obsolete edition... :)
-
GManNickG almost 13 years
@Billy: inline doesn't give a function internal linkage, static does. That said, the effect is often the same, but implementation-wise it's not.
-
Mooing Duck almost 13 years
@Billy inline also means the compiler shouldn't throw errors if the function is defined multiple times, is that what you were thinking of?
-
Billy ONeal almost 13 years
@Mooing: That's what internal linkage is, yes.
-
Mooing Duck almost 13 years
@Billy: stackoverflow.com/questions/4957582/…
7.1.2/3 footnote says "The inline keyword has no effect on the linkage of a function."
-
GManNickG almost 13 years
@Billy: No, that's not what internal linkage is.
-
Admin almost 13 years
Of course, LTO is necessary for making "inline" obsolete, the question is if it is obsolete in real-life.
-
Admin almost 13 years
That's a valid comment. But is "inline" really obsolete when LTO is assumed to be enabled?
-
Admin almost 13 years
Unfortunately, the article is mostly speculative, and was written when LTO wasn't as common-place as it is today.
-
Admin almost 13 years
Since you have an elaborate inline-system in place, how do you check if the compiler did inline a particular function call?
-
Admin almost 13 years
I would also be very interested in how to give the compiler the hint to inline a single function call.
-
Crowley9 almost 13 years
Really old-school: I look at the generated code, using objdump for gcc or the .asm output for msvc, to see what was actually emitted. I also have some scripts that pipe the objdump -d output through grep "call" and wc in order to get a total call count. Assuming the count for the function you are trying to inline (or not inline) changes, you get quick feedback on whether or not your code change made a difference.
-
Gnawme almost 13 years
"The question is: does link-time optimization render manual inlining obsolete." There is no "real life" in the question. Manual inlining is effectively a compiler hint, anyway. LTO allows compiler and linker to make a much more informed choice about what to inline.
-
Gnawme almost 13 years
Also, the article explains how LTO (aka WPO) operates in "Visual C++ 7.0 and later versions, including the most recent Visual C++ 2005 beta 2." How is that speculative? If it's in a beta, it has been implemented.
-
Admin almost 13 years
Cool. I wonder if it can be done with gdb, or another debugger/profiler.
-
Admin almost 13 years
I guess I should put "real life" in the question. However, since a perfect compiler and optimizer renders the whole discussion and question void, it's already implicit.
-
Admin almost 13 years
The article is speculative insofar as it mentions benefits without providing real data or comparisons. It's actually confusing that he tries to focus on Visual C++ when most of what he writes applies generally to LTO/PGO. It looks like he didn't study the compiling techniques he mentions, and did not run quantitative tests. He fails to mention that PGO does not always improve speed b/c many aspects of it are still subject to research. Statements like "[...] my impression that there's still room for improvement" are just biased opinions, and some parts just read like an ad.
-
Jason S over 7 years
"Otherwise, you will get linker errors for multiple definitions of those functions if they are used in multiple translation units" -- the static keyword is what is needed here.