STL vectors with uninitialized storage?

29,823

Solution 1

std::vector must initialize the values in the array somehow, which means some constructor (or copy-constructor) must be called. The behavior of vector (or any container class) is undefined if you were to access the uninitialized section of the array as if it were initialized.

The best way is to use reserve() and push_back(), so that the copy-constructor is used, avoiding default-construction.

Using your example code:

struct YourData {
    int d1;
    int d2;
    YourData(int v1, int v2) : d1(v1), d2(v2) {}
};

std::vector<YourData> memberVector;

void GetsCalledALot(int* data1, int* data2, int count) {
    int mvSize = memberVector.size();

    // Does not initialize the extra elements
    memberVector.reserve(mvSize + count);

    // Note: consider using std::generate_n or std::copy instead of this loop.
    for (int i = 0; i < count; ++i) {
        // Copy construct using a temporary.
        memberVector.push_back(YourData(data1[i], data2[i]));
    }
}

The only problem with calling reserve() (or resize()) like this is that you may end up invoking the copy-constructor more often than you need to. If you can make a good prediction as to the final size of the array, it's better to reserve() the space once at the beginning. If you don't know the final size though, at least the number of copies will be minimal on average.

In the current version of C++, the inner loop is a bit inefficient as a temporary value is constructed on the stack, copy-constructed to the vectors memory, and finally the temporary is destroyed. However the next version of C++ has a feature called R-Value references (T&&) which will help.

The interface supplied by std::vector does not allow for another option, which is to use some factory-like class to construct values other than the default. Here is a rough example of what this pattern would look like implemented in C++:

template <typename T>
class my_vector_replacement {

    // ...

    template <typename F>
    my_vector::push_back_using_factory(F factory) {
        // ... check size of array, and resize if needed.

        // Copy construct using placement new,
        new(arrayData+end) T(factory())
        end += sizeof(T);
    }

    char* arrayData;
    size_t end; // Of initialized data in arrayData
};

// One of many possible implementations
struct MyFactory {
    MyFactory(int* p1, int* p2) : d1(p1), d2(p2) {}
    YourData operator()() const {
        return YourData(*d1,*d2);
    }
    int* d1;
    int* d2;
};

void GetsCalledALot(int* data1, int* data2, int count) {
    // ... Still will need the same call to a reserve() type function.

    // Note: consider using std::generate_n or std::copy instead of this loop.
    for (int i = 0; i < count; ++i) {
        // Copy construct using a factory
        memberVector.push_back_using_factory(MyFactory(data1+i, data2+i));
    }
}

Doing this does mean you have to create your own vector class. In this case it also complicates what should have been a simple example. But there may be times where using a factory function like this is better, for instance if the insert is conditional on some other value, and you would have to otherwise unconditionally construct some expensive temporary even if it wasn't actually needed.

Solution 2

In C++11 (and boost) you can use the array version of unique_ptr to allocate an uninitialized array. This isn't quite an stl container, but is still memory managed and C++-ish which will be good enough for many applications.

auto my_uninit_array = std::unique_ptr<mystruct[]>(new mystruct[count]);

Solution 3

C++0x adds a new member function template emplace_back to vector (which relies on variadic templates and perfect forwarding) that gets rid of any temporaries entirely:

memberVector.emplace_back(data1[i], data2[i]);

Solution 4

You can use boost::noinit_adaptor to default initialize new elements (which is no initialization for built-in types):

std::vector<T, boost::noinit_adaptor<std::allocator<T>> memberVector;

As long as you don't pass an initializer into resize, it default initializes the new elements.

Solution 5

So here's the problem, resize is calling insert, which is doing a copy construction from a default constructed element for each of the newly added elements. To get this to 0 cost you need to write your own default constructor AND your own copy constructor as empty functions. Doing this to your copy constructor is a very bad idea because it will break std::vector's internal reallocation algorithms.

Summary: You're not going to be able to do this with std::vector.

Share:
29,823
Jim Hunziker
Author by

Jim Hunziker

C/C++, Python, Machine learning

Updated on July 10, 2021

Comments

  • Jim Hunziker
    Jim Hunziker almost 3 years

    I'm writing an inner loop that needs to place structs in contiguous storage. I don't know how many of these structs there will be ahead of time. My problem is that STL's vector initializes its values to 0, so no matter what I do, I incur the cost of the initialization plus the cost of setting the struct's members to their values.

    Is there any way to prevent the initialization, or is there an STL-like container out there with resizeable contiguous storage and uninitialized elements?

    (I'm certain that this part of the code needs to be optimized, and I'm certain that the initialization is a significant cost.)

    Also, see my comments below for a clarification about when the initialization occurs.

    SOME CODE:

    void GetsCalledALot(int* data1, int* data2, int count) {
        int mvSize = memberVector.size()
        memberVector.resize(mvSize + count); // causes 0-initialization
    
        for (int i = 0; i < count; ++i) {
            memberVector[mvSize + i].d1 = data1[i];
            memberVector[mvSize + i].d2 = data2[i];
        }
    }