Why push_back is slower than operator[] for a previously allocated vector
Solution 1
push_back
does a bounds check. operator[]
does not. So even if you've reserved the space, push_back
is going to have an extra conditional check that operator[]
will not have. Additionally, it will increase the size
value (reserve only sets the capacity
), so it will update that every time.
In short, push_back
is doing more than what operator[]
is doing - which is why it is slower (and more accurate).
Solution 2
As Yakk and me have found out, there might be another interesting factor that contributes to the apparent slowness of push_back
.
The first interesting observation is that in the original test, using new
and operating on a raw array is slower than using vector<int> bigarray(N);
and operator[]
-- more than a factor 2. Even more interesting is that you can get the same performance for both by inserting an additional memset
for the raw array variant:
int routine1_modified()
{
int sum;
int* bigarray = new int[N];
memset(bigarray, 0, sizeof(int)*N);
PROFILE (
{
for (unsigned int k = 0; k < N; ++k)
bigarray[k] = k;
}, "C++ new");
sum = std::accumulate (bigarray, bigarray + N, 0);
delete [] bigarray;
return sum;
}
The conclusion of course is, that PROFILE
measures something different than expected. Yakk and me guess it has something to do with memory management; from Yakk's comment to the OP:
resize
will touch the entire block of memory.reserve
will allocate without touching. If you have a lazy allocator that doesn't fetch or assign physical memory pages until accessed,reserve
on an empty vector can be nearly free (doesn't even have to find physical memory for the pages!) until you write to the pages (at which point, they have to be found).
I thought of something similar, so tried a small test for this hypothesis by touching certain pages with a "strided memset" (a profiling tool might get more reliable results):
int routine1_modified2()
{
int sum;
int* bigarray = new int[N];
for(int k = 0; k < N; k += PAGESIZE*2/sizeof(int))
bigarray[k] = 0;
PROFILE (
{
for (unsigned int k = 0; k < N; ++k)
bigarray[k] = k;
}, "C++ new");
sum = std::accumulate (bigarray, bigarray + N, 0);
delete [] bigarray;
return sum;
}
By changing the stride from every page half to every 4th page to leaving it out completely, we get a nice transition of the timings from the vector<int> bigarray(N);
case to the new int[N]
case where no memset
has been used.
In my opinion, that's a strong hint that memory management is a major contributor to the measurement results.
Another issue is the branching in push_back
. It is claimed in many answers that this is a / the major reason why push_back
is much slower as compared to using operator[]
. Indeed, comparing the raw-pointer w/o memset to using reserve
+ push_back
, the former is two times faster.
Similarly, if we add a bit of UB (but check the results later):
int routine3_modified()
{
int sum;
vector<int> bigarray;
bigarray.reserve (N);
memset(bigarray.data(), 0, sizeof(int)*N); // technically, it's UB
PROFILE (
{
for (unsigned int k = 0; k < N; ++k)
bigarray.push_back (k);
}, "reserve + push_back");
sum = std::accumulate (begin (bigarray), end (bigarray), 0);
return sum;
}
this modified version is about 2 times slower than using new
+ a full memset
. So it seems whatever the invocation of push_back
does, it results in a factor 2
slowdown when compared to just setting the element (via operator[]
in both the vector
and raw array case).
But is it the branching required in push_back
, or the additional operation?
// pseudo-code
void push_back(T const& p)
{
if(size() == capacity())
{
resize( size() < 10 ? 10 : size()*2 );
}
(*this)[size()] = p; // actually using the allocator
++m_end;
}
It is indeed that simple, see e.g. libstdc++'s implementation.
I've tested it by using the vector<int> bigarray(N);
+ operator[]
variant, and inserting a function call that mimics the behaviour of push_back
:
unsigned x = 0;
void silly_branch(int k)
{
if(k == x)
{
x = x < 10 ? 10 : x*2;
}
}
int routine2_modified()
{
int sum;
vector<int> bigarray (N);
PROFILE (
{
for (unsigned int k = 0; k < N; ++k)
{
silly_branch(k);
bigarray[k] = k;
}
}, "vector");
sum = std::accumulate (begin (bigarray), end (bigarray), 0);
return sum;
}
Even when declaring x
as volatile, this only has 1 % influence on the measurement. Of course, you had to verify that the branch is actually in the opcode, but my assembler knowledge doesn't allow me to verify that (at -O3
).
The interesting point now is what happens when I add an increment to silly_branch
:
unsigned x = 0;
void silly_branch(int k)
{
if(k == x)
{
x = x < 10 ? 10 : x*2;
}
++x;
}
Now, the modified routine2_modified
runs 2 times slower than the original routine2
, being on par with the proposed routine3_modified
above that includes UB to commit the memory pages. I don't find that particularly surprising, as it adds another write to every write in the loop, so we have two times the work and two times the duration.
Conclusion
Well you had to carefully look at the assembly and profiling tools to verify the hypotheses of memory management and the additional write is a good hypothesis ("correct"). But I think the hints are strong enough to claim that there's something more complicated going on than just a branch which makes push_back
slower.
Here's the full test code:
#include <iostream>
#include <iomanip>
#include <vector>
#include <numeric>
#include <chrono>
#include <string>
#include <cstring>
#define PROFILE(BLOCK, ROUTNAME) ProfilerRun([&](){do {BLOCK;} while(0);}, \
ROUTNAME, __FILE__, __LINE__);
//#define PROFILE(BLOCK, ROUTNAME) BLOCK
template <typename T>
void ProfilerRun (T&& func, const std::string& routine_name = "unknown",
const char* file = "unknown", unsigned line = 0)
{
using std::chrono::duration_cast;
using std::chrono::microseconds;
using std::chrono::steady_clock;
using std::cerr;
using std::endl;
steady_clock::time_point t_begin = steady_clock::now();
// Call the function
func();
steady_clock::time_point t_end = steady_clock::now();
cerr << "[" << std::setw (20)
<< (std::strrchr (file, '/') ?
std::strrchr (file, '/') + 1 : file)
<< ":" << std::setw (5) << line << "] "
<< std::setw (10) << std::setprecision (6) << std::fixed
<< static_cast<float> (duration_cast<microseconds>
(t_end - t_begin).count()) / 1e6
<< "s --> " << routine_name << endl;
cerr.unsetf (std::ios_base::floatfield);
}
using namespace std;
constexpr int N = (1 << 28);
constexpr int PAGESIZE = 4096;
uint64_t __attribute__((noinline)) routine1()
{
uint64_t sum;
int* bigarray = new int[N];
PROFILE (
{
for (int k = 0, *p = bigarray; p != bigarray+N; ++p, ++k)
*p = k;
}, "new (routine1)");
sum = std::accumulate (bigarray, bigarray + N, 0ULL);
delete [] bigarray;
return sum;
}
uint64_t __attribute__((noinline)) routine2()
{
uint64_t sum;
int* bigarray = new int[N];
memset(bigarray, 0, sizeof(int)*N);
PROFILE (
{
for (int k = 0, *p = bigarray; p != bigarray+N; ++p, ++k)
*p = k;
}, "new + full memset (routine2)");
sum = std::accumulate (bigarray, bigarray + N, 0ULL);
delete [] bigarray;
return sum;
}
uint64_t __attribute__((noinline)) routine3()
{
uint64_t sum;
int* bigarray = new int[N];
for(int k = 0; k < N; k += PAGESIZE/2/sizeof(int))
bigarray[k] = 0;
PROFILE (
{
for (int k = 0, *p = bigarray; p != bigarray+N; ++p, ++k)
*p = k;
}, "new + strided memset (every page half) (routine3)");
sum = std::accumulate (bigarray, bigarray + N, 0ULL);
delete [] bigarray;
return sum;
}
uint64_t __attribute__((noinline)) routine4()
{
uint64_t sum;
int* bigarray = new int[N];
for(int k = 0; k < N; k += PAGESIZE/1/sizeof(int))
bigarray[k] = 0;
PROFILE (
{
for (int k = 0, *p = bigarray; p != bigarray+N; ++p, ++k)
*p = k;
}, "new + strided memset (every page) (routine4)");
sum = std::accumulate (bigarray, bigarray + N, 0ULL);
delete [] bigarray;
return sum;
}
uint64_t __attribute__((noinline)) routine5()
{
uint64_t sum;
int* bigarray = new int[N];
for(int k = 0; k < N; k += PAGESIZE*2/sizeof(int))
bigarray[k] = 0;
PROFILE (
{
for (int k = 0, *p = bigarray; p != bigarray+N; ++p, ++k)
*p = k;
}, "new + strided memset (every other page) (routine5)");
sum = std::accumulate (bigarray, bigarray + N, 0ULL);
delete [] bigarray;
return sum;
}
uint64_t __attribute__((noinline)) routine6()
{
uint64_t sum;
int* bigarray = new int[N];
for(int k = 0; k < N; k += PAGESIZE*4/sizeof(int))
bigarray[k] = 0;
PROFILE (
{
for (int k = 0, *p = bigarray; p != bigarray+N; ++p, ++k)
*p = k;
}, "new + strided memset (every 4th page) (routine6)");
sum = std::accumulate (bigarray, bigarray + N, 0ULL);
delete [] bigarray;
return sum;
}
uint64_t __attribute__((noinline)) routine7()
{
uint64_t sum;
vector<int> bigarray (N);
PROFILE (
{
for (int k = 0; k < N; ++k)
bigarray[k] = k;
}, "vector, using ctor to initialize (routine7)");
sum = std::accumulate (begin (bigarray), end (bigarray), 0ULL);
return sum;
}
uint64_t __attribute__((noinline)) routine8()
{
uint64_t sum;
vector<int> bigarray;
PROFILE (
{
for (int k = 0; k < N; ++k)
bigarray.push_back (k);
}, "vector (+ no reserve) + push_back (routine8)");
sum = std::accumulate (begin (bigarray), end (bigarray), 0ULL);
return sum;
}
uint64_t __attribute__((noinline)) routine9()
{
uint64_t sum;
vector<int> bigarray;
bigarray.reserve (N);
PROFILE (
{
for (int k = 0; k < N; ++k)
bigarray.push_back (k);
}, "vector + reserve + push_back (routine9)");
sum = std::accumulate (begin (bigarray), end (bigarray), 0ULL);
return sum;
}
uint64_t __attribute__((noinline)) routine10()
{
uint64_t sum;
vector<int> bigarray;
bigarray.reserve (N);
memset(bigarray.data(), 0, sizeof(int)*N);
PROFILE (
{
for (int k = 0; k < N; ++k)
bigarray.push_back (k);
}, "vector + reserve + memset (UB) + push_back (routine10)");
sum = std::accumulate (begin (bigarray), end (bigarray), 0ULL);
return sum;
}
template<class T>
void __attribute__((noinline)) adjust_size(std::vector<T>& v, int k, double factor)
{
if(k >= v.size())
{
v.resize(v.size() < 10 ? 10 : k*factor);
}
}
uint64_t __attribute__((noinline)) routine11()
{
uint64_t sum;
vector<int> bigarray;
PROFILE (
{
for (int k = 0; k < N; ++k)
{
adjust_size(bigarray, k, 1.5);
bigarray[k] = k;
}
}, "vector + custom emplace_back @ factor 1.5 (routine11)");
sum = std::accumulate (begin (bigarray), end (bigarray), 0ULL);
return sum;
}
uint64_t __attribute__((noinline)) routine12()
{
uint64_t sum;
vector<int> bigarray;
PROFILE (
{
for (int k = 0; k < N; ++k)
{
adjust_size(bigarray, k, 2);
bigarray[k] = k;
}
}, "vector + custom emplace_back @ factor 2 (routine12)");
sum = std::accumulate (begin (bigarray), end (bigarray), 0ULL);
return sum;
}
uint64_t __attribute__((noinline)) routine13()
{
uint64_t sum;
vector<int> bigarray;
PROFILE (
{
for (int k = 0; k < N; ++k)
{
adjust_size(bigarray, k, 3);
bigarray[k] = k;
}
}, "vector + custom emplace_back @ factor 3 (routine13)");
sum = std::accumulate (begin (bigarray), end (bigarray), 0ULL);
return sum;
}
uint64_t __attribute__((noinline)) routine14()
{
uint64_t sum;
vector<int> bigarray;
PROFILE (
{
for (int k = 0; k < N; ++k)
bigarray.emplace_back (k);
}, "vector (+ no reserve) + emplace_back (routine14)");
sum = std::accumulate (begin (bigarray), end (bigarray), 0ULL);
return sum;
}
uint64_t __attribute__((noinline)) routine15()
{
uint64_t sum;
vector<int> bigarray;
bigarray.reserve (N);
PROFILE (
{
for (int k = 0; k < N; ++k)
bigarray.emplace_back (k);
}, "vector + reserve + emplace_back (routine15)");
sum = std::accumulate (begin (bigarray), end (bigarray), 0ULL);
return sum;
}
uint64_t __attribute__((noinline)) routine16()
{
uint64_t sum;
vector<int> bigarray;
bigarray.reserve (N);
memset(bigarray.data(), 0, sizeof(bigarray[0])*N);
PROFILE (
{
for (int k = 0; k < N; ++k)
bigarray.emplace_back (k);
}, "vector + reserve + memset (UB) + emplace_back (routine16)");
sum = std::accumulate (begin (bigarray), end (bigarray), 0ULL);
return sum;
}
unsigned x = 0;
template<class T>
void /*__attribute__((noinline))*/ silly_branch(std::vector<T>& v, int k)
{
if(k == x)
{
x = x < 10 ? 10 : x*2;
}
//++x;
}
uint64_t __attribute__((noinline)) routine17()
{
uint64_t sum;
vector<int> bigarray(N);
PROFILE (
{
for (int k = 0; k < N; ++k)
{
silly_branch(bigarray, k);
bigarray[k] = k;
}
}, "vector, using ctor to initialize + silly branch (routine17)");
sum = std::accumulate (begin (bigarray), end (bigarray), 0ULL);
return sum;
}
template<class T, int N>
constexpr int get_extent(T(&)[N])
{ return N; }
int main()
{
uint64_t results[] = {routine2(),
routine1(),
routine2(),
routine3(),
routine4(),
routine5(),
routine6(),
routine7(),
routine8(),
routine9(),
routine10(),
routine11(),
routine12(),
routine13(),
routine14(),
routine15(),
routine16(),
routine17()};
std::cout << std::boolalpha;
for(int i = 1; i < get_extent(results); ++i)
{
std::cout << i << ": " << (results[0] == results[i]) << "\n";
}
std::cout << x << "\n";
}
A sample run, on an old & slow computer; note:
-
N == 2<<28
, not2<<29
as in the OP - compiled with g++4.9 20131022 with
-std=c++11 -O3 -march=native
[ temp.cpp: 71] 0.654927s --> new + full memset (routine2) [ temp.cpp: 54] 1.042405s --> new (routine1) [ temp.cpp: 71] 0.605061s --> new + full memset (routine2) [ temp.cpp: 89] 0.597487s --> new + strided memset (every page half) (routine3) [ temp.cpp: 107] 0.601271s --> new + strided memset (every page) (routine4) [ temp.cpp: 125] 0.783610s --> new + strided memset (every other page) (routine5) [ temp.cpp: 143] 0.903038s --> new + strided memset (every 4th page) (routine6) [ temp.cpp: 157] 0.602401s --> vector, using ctor to initialize (routine7) [ temp.cpp: 170] 3.811291s --> vector (+ no reserve) + push_back (routine8) [ temp.cpp: 184] 2.091391s --> vector + reserve + push_back (routine9) [ temp.cpp: 199] 1.375837s --> vector + reserve + memset (UB) + push_back (routine10) [ temp.cpp: 224] 8.738293s --> vector + custom emplace_back @ factor 1.5 (routine11) [ temp.cpp: 240] 5.513803s --> vector + custom emplace_back @ factor 2 (routine12) [ temp.cpp: 256] 5.150388s --> vector + custom emplace_back @ factor 3 (routine13) [ temp.cpp: 269] 3.789820s --> vector (+ no reserve) + emplace_back (routine14) [ temp.cpp: 283] 2.090259s --> vector + reserve + emplace_back (routine15) [ temp.cpp: 298] 1.288740s --> vector + reserve + memset (UB) + emplace_back (routine16) [ temp.cpp: 325] 0.611168s --> vector, using ctor to initialize + silly branch (routine17) 1: true 2: true 3: true 4: true 5: true 6: true 7: true 8: true 9: true 10: true 11: true 12: true 13: true 14: true 15: true 16: true 17: true 335544320
Solution 3
When you allocate the array in the constructor, the compiler/library can basically memset()
the original fill and then just set each individual value. When you use push_back()
, the std::vector<T>
class will need to:
- Check if there is enough space.
- Change the end pointer to be a new location.
- Set the actual value.
The last step is the only thing that needs to be done when the memory is allocated in one go.
Solution 4
I can answer your second question. Although the vector is preallocated, push_back still needs to check the available space everytime you call push_back. On the other hand operator[] performs no checks and simply assumes the space is available.
Solution 5
This is an extended comment, not an answer, intended to help improve the question.
Routine 4 invokes undefined behaviour. You are writing past the end of the size
of the array. Replace reserve with resize to eliminate that.
Routine 3 through 5 could do nothing post optimization as they have no observable output.
An insert( vec.end(), src.begin(), src.end() )
where src
is a random access generator range (boost
probably has it) might emulate the new
version if your insert
is smart.
The duplication of routine1
seems funny -- by any chance does this change the timings?
xis
Updated on July 09, 2022Comments
-
xis almost 2 years
I just read this blog http://lemire.me/blog/archives/2012/06/20/do-not-waste-time-with-stl-vectors/ comparing performance of
operator[]
assignment andpush_back
on memory pre-reservedstd::vector
and I decided to try it myself. The operation is simple:// for vector bigarray.reserve(N); // START TIME TRACK for(int k = 0; k < N; ++k) // for operator[]: // bigarray[k] = k; // for push_back bigarray.push_back(k); // END TIME TRACK // do some dummy operations to prevent compiler optimize long sum = accumulate(begin(bigarray), end(array),0 0);
And here is the result:
~/t/benchmark> icc 1.cpp -O3 -std=c++11 ~/t/benchmark> ./a.out [ 1.cpp: 52] 0.789123s --> C++ new [ 1.cpp: 52] 0.774049s --> C++ new [ 1.cpp: 66] 0.351176s --> vector [ 1.cpp: 80] 1.801294s --> reserve + push_back [ 1.cpp: 94] 1.753786s --> reserve + emplace_back [ 1.cpp: 107] 2.815756s --> no reserve + push_back ~/t/benchmark> clang++ 1.cpp -std=c++11 -O3 ~/t/benchmark> ./a.out [ 1.cpp: 52] 0.592318s --> C++ new [ 1.cpp: 52] 0.566979s --> C++ new [ 1.cpp: 66] 0.270363s --> vector [ 1.cpp: 80] 1.763784s --> reserve + push_back [ 1.cpp: 94] 1.761879s --> reserve + emplace_back [ 1.cpp: 107] 2.815596s --> no reserve + push_back ~/t/benchmark> g++ 1.cpp -O3 -std=c++11 ~/t/benchmark> ./a.out [ 1.cpp: 52] 0.617995s --> C++ new [ 1.cpp: 52] 0.601746s --> C++ new [ 1.cpp: 66] 0.270533s --> vector [ 1.cpp: 80] 1.766538s --> reserve + push_back [ 1.cpp: 94] 1.998792s --> reserve + emplace_back [ 1.cpp: 107] 2.815617s --> no reserve + push_back
For all the compilers,
vector
withoperator[]
is much faster than raw pointer withoperator[]
. This led to the first question: why? The second question is, I have already "reserved" the memory, so whyopeator[]
is faster?The next question is, since the memory is already allocated, why
push_back
is times slower thanoperator[]
?The test code is attached below:
#include <iostream> #include <iomanip> #include <vector> #include <numeric> #include <chrono> #include <string> #include <cstring> #define PROFILE(BLOCK, ROUTNAME) ProfilerRun([&](){do {BLOCK;} while(0);}, \ ROUTNAME, __FILE__, __LINE__); template <typename T> void ProfilerRun (T&& func, const std::string& routine_name = "unknown", const char* file = "unknown", unsigned line = 0) { using std::chrono::duration_cast; using std::chrono::microseconds; using std::chrono::steady_clock; using std::cerr; using std::endl; steady_clock::time_point t_begin = steady_clock::now(); // Call the function func(); steady_clock::time_point t_end = steady_clock::now(); cerr << "[" << std::setw (20) << (std::strrchr (file, '/') ? std::strrchr (file, '/') + 1 : file) << ":" << std::setw (5) << line << "] " << std::setw (10) << std::setprecision (6) << std::fixed << static_cast<float> (duration_cast<microseconds> (t_end - t_begin).count()) / 1e6 << "s --> " << routine_name << endl; cerr.unsetf (std::ios_base::floatfield); } using namespace std; const int N = (1 << 29); int routine1() { int sum; int* bigarray = new int[N]; PROFILE ( { for (unsigned int k = 0; k < N; ++k) bigarray[k] = k; }, "C++ new"); sum = std::accumulate (bigarray, bigarray + N, 0); delete [] bigarray; return sum; } int routine2() { int sum; vector<int> bigarray (N); PROFILE ( { for (unsigned int k = 0; k < N; ++k) bigarray[k] = k; }, "vector"); sum = std::accumulate (begin (bigarray), end (bigarray), 0); return sum; } int routine3() { int sum; vector<int> bigarray; bigarray.reserve (N); PROFILE ( { for (unsigned int k = 0; k < N; ++k) bigarray.push_back (k); }, "reserve + push_back"); sum = std::accumulate (begin (bigarray), end (bigarray), 0); return sum; } int routine4() { int sum; vector<int> bigarray; bigarray.reserve (N); PROFILE ( { for (unsigned int k = 0; k < N; ++k) bigarray.emplace_back(k); }, "reserve + emplace_back"); sum = std::accumulate (begin (bigarray), end (bigarray), 0); return sum; } int routine5() { int sum; vector<int> bigarray; PROFILE ( { for (unsigned int k = 0; k < N; ++k) bigarray.push_back (k); }, "no reserve + push_back"); sum = std::accumulate (begin (bigarray), end (bigarray), 0); return sum; } int main() { long s0 = routine1(); long s1 = routine1(); long s2 = routine2(); long s3 = routine3(); long s4 = routine4(); long s5 = routine5(); return int (s1 + s2); }
-
dyp over 10 yearsWhat about branch prediction?
-
Zac Howland over 10 years@DyP even with branch prediction in your favor, you'd still be running an additional set of instructions (e.g. the branch check) that does not exist when you set items with
operator[]
-
dyp over 10 years@ZacHowland The difference is a factor 5, so I guess it's more than just a branch. Maybe it's additional instructions; maybe
push_back
will not be inlined, butoperator[]
. -
dyp over 10 yearsI understand why routine 4 invokes UB, but I somehow can't see how 2 does.
-
Yakk - Adam Nevraumont over 10 years@dyp I missed the
N
in the constructor. -
dyp over 10 yearsBtw I checked by summing all the results and using a large enough data type: the relative timings of routine 3 and 5 are correct; changing the order of calls also doesn't seem to have much effect.
-
Zac Howland over 10 years@DyP It is a combination of all of those. The short answer is that
push_back
does a lot more thanoperator[]
does. -
dyp over 10 yearsN.B. The "custom emplace_back" uses
resize
, therefore touching the memory twice. As opposed to thememset
before running the measurement, both writes are included in the timing. This might explain the difference to "vector + reserve + push_back". -
xis over 10 yearsThis is an excellent explanation. The
memset
should touched every bit of memory and then the cache is now effectively used. -
Yakk - Adam Nevraumont over 10 years@xiaogesu maybe not cache, but a lazy allocator: some allocators do not allocate really until the memory page is read or written...
-
Dietmar Kühl over 10 years@Yakk: When
push_back()
is used, the steps 1. and 2. need to be done in addition to setting the actual values; when usingoperator[]()
only setting the last value is done. I think this explains the time difference. What part of this answer is wrong? Assuming your comment is to explain the downvote, can you please explain to me how that is wrong, warranting a downvote? My last sentence about initializing an array with values being fast isn't wrong either but does, indeed, not need to be there and I shall remove it. -
dyp over 8 years@TemplateRex Yakk and me have been a bit late to the party :)
-
TemplateRex over 8 yearsI've noticed this Q&A because I've been playing around with the new
small_vector
andstatic_vector
in Boost.Container (since 1.58), also in relation to Scott Meyers's blog post this week aboutflat_set
performance. As your answer shows, you can't take anything for granted. -
dyp almost 8 yearsWith some help from a profiling tool, it seems to me that the slowdown of the OP's routine 3 vs routine 2 is not just caused by the difference of what is measured, but also the way in which the pages are committed. Routine 3 writes to a page, then produces a page fault, then commits a new page and so on (presumably). This seems to be much less efficient than direclty requesting zeroed-out pages.
-
xcvr over 7 yearsJust to add a bit. The call to
resize
/reserve
inpush_back
can have a measurable impact on performance even though it's never called (regardless of the branching). This is a result of inliningresize
/reserve
so YMMV. The assembly generated if inlined will bepush
ing all sorts of things, and thenpop
ing them, when it doesn't need to for the common case. And it would also makepush_back
a better candidate for inlining optimizations itself, if reserve/resize wasn't inlined.