Vector vs Array Performance

c++ benchmarking

12,544

Solution 1

I can guarantee that LLVM does infact misoptimize std::vector (if you are in fact optimising at all), at least as of right now. It does not correctly inline many of the function calls involved. You will get better performance with GCC.

Solution 2

A simpler explanation: you're building with optimisations disabled. You want -O3, not -o3.

I don't have clang available to exactly reproduce your tests, but my results are as follows:

//Array run # 1
$ g++ -std=c++11 -O3 test.cpp -o b.out && time ./b.out

real    0m25.323s
user    0m25.162s
sys 0m0.148s

//Vector run #1
$ g++ -std=c++11 -O3 test.cpp -o b.out && time ./b.out

real    0m25.634s
user    0m25.486s
sys 0m0.136s

12,544

ChrisCM

I'm Chris, software engineer and mobile accessibility expert at Deque Systems Inc. Also a member of the W3C WCAG 2.0 Mobile Accessibility Task Force and one of the leading Native Mobile Accessibility Experts in the world, with deep knowledge of both Android and iOS Accessibility APIs. I have been involved in software development since I was in middle school, starting out with Android development and moved onto iOS. Now I have a deep interest in mobile architecture, especially mobile accessibility architecture. I try to explore the many solutions and different ways of creating accessible applications. I think this question is particularly interesting when it comes to mobile development. I have worked on my interesting projects and technologies. Including the ASKInterface's accessible keyboard and Deque's Objective-C static accessibility analysis tool for iOS.

Updated on October 03, 2022

Comments

ChrisCM over 1 year
In another thread I started a discussion about Vectors and Arrays, in which I was largely playing devil's advocate, to push buttons. However, during the course of this, I stumbled onto a test case that has me a little perplexed, and I would like to have a real discussion about it, over the "abuse" I'm getting for playing devil's advocate, starting a real discussion on that thread is now impossible. However, the particular example has me intrigued, and I cannot explain it to myself satisfactorily.

The discussion is about the general performance of Vector vs Arrays, ignoring dynamic elements. Ex: obviously continually using push_back() in a vector is going to slow it down. We're assuming that the vector and array are pre-populated with data. The example I presented, and subsequently modified by an individual in the thread, was the following:
```
#include <iostream>
#include <vector>
#include <type_traits>
using namespace std;

const int ARRAY_SIZE = 500000000;

// http://stackoverflow.com/a/15975738/500104
template <class T>
class no_init_allocator
{
public:
    typedef T value_type;

    no_init_allocator() noexcept {}
    template <class U>
        no_init_allocator(const no_init_allocator<U>&) noexcept {}
    T* allocate(std::size_t n)
        {return static_cast<T*>(::operator new(n * sizeof(T)));}
    void deallocate(T* p, std::size_t) noexcept
        {::operator delete(static_cast<void*>(p));}
    template <class U>
        void construct(U*) noexcept
        {
            // libstdc++ doesn't know 'is_trivially_default_constructible', still has the old names
            static_assert(is_trivially_default_constructible<U>::value,
            "This allocator can only be used with trivally default constructible types");
        }
    template <class U, class A0, class... Args>
        void construct(U* up, A0&& a0, Args&&... args) noexcept
        {
            ::new(up) U(std::forward<A0>(a0), std::forward<Args>(args)...);
        }
};

int main() {
    srand(5);  //I use the same seed, we just need the random distribution.
    vector<char, no_init_allocator<char>> charArray(ARRAY_SIZE);
    //char* charArray = new char[ARRAY_SIZE];
    for(int i = 0; i < ARRAY_SIZE; i++) {
        charArray[i] = (char)((i%26) + 48) ;
    }

    for(int i = 0; i < ARRAY_SIZE; i++) {
        charArray[i] = charArray[rand() % ARRAY_SIZE];
    }
}
```
When I run this on my machine, I get the following terminal output. The first run is with the vector line uncommented, the second is with the array line uncommented. I used the highest level of optimization, to give the vector the best chance of success. Below are my results, the first two runs with the array line uncommented, the second two with the vector line.
```
//Array run # 1
clang++ -std=c++11 -stdlib=libc++ -o3 some.cpp -o b.out && time ./b.out

real    0m20.287s
user    0m20.068s
sys 0m0.175s

//Array run # 2
clang++ -std=c++11 -stdlib=libc++ -o3 some.cpp -o b.out && time ./b.out

real    0m21.504s
user    0m21.267s
sys 0m0.192s

//Vector run # 1
clang++ -std=c++11 -stdlib=libc++ -o3 some.cpp -o b.out && time ./b.out

real    0m28.513s
user    0m28.292s
sys 0m0.178s

//Vector run # 2
clang++ -std=c++11 -stdlib=libc++ -o3 some.cpp -o b.out && time ./b.out

real    0m28.607s
user    0m28.391s
sys 0m0.178s
```
That arrays outperform vectors does not surprise me, however, that the difference is on the order of 50% surprises me very much, I would expect that they would be negligible, and I feel like the nature of this test case me be obscuring the nature of the results. When you run this test on array sizes that are smaller, the performance differences dissipate dramatically.

My explanation:

The additional implementation instructions for vector are causing the vector instructions to align in memory poorly, perhaps even on this example, a split at a really bad point on 2 different "blocks". This is causing memory to jump back and forth between levels of cache vs data cache vs instruction cache more frequently than you would expect. I also suspect that the LLVM compiler may be exaggerating weaknesses, and optimizing poorly, due to some of the newer C++11 elements, though I have no reason for either of these explanations besides hypothesis and conjecture.

I am interested if A: that someone can replicate my results and B: if someone has a better explanation for how the computer is running this particular benchmark and why vector is so dramatically underperforming arrays in this instance.

My set up: http://www.newegg.com/Product/Product.aspx?Item=N82E16834100226
- Daniel Fischer almost 11 years
  
  -o3 should be -O3, you're not optimising.
- josesuero almost 11 years
  
  Interesting question, and so nice to see both source code, compiler flags and results, along with a detailed explanation. +1 for a well asked question. :)
- ChrisCM almost 11 years
  
  @Markku: What you outline is specifically what optimizations should fix though.
- Daniel Fischer
  
  Using -O3 and changing is_trivially_default_constructible to has_trivial_default_constructor (using libstdc++), I get 0m26.250s for array and 0m27.438s for vector, so that's more or less the expected small difference.