small string optimization for vector?

10,837

Solution 1

You can borrow the SmallVector implementation from LLVM. (header only, located in LLVM\include\llvm\ADT)

Solution 2

Boost 1.58 was just released and it's Container library has a small_vector class based on the LLVM SmallVector.

There is also a static_vector which cannot grow beyond the initially given size. Both containers are header-only.

facebook's folly library also has some awesome containers.

It has a small_vector which can be configured with a template parameter to act like boost's static or small vectors. It can also be configured to use small integer types for it's internal size bookkeeping which given that they are facebook is no surprise :)

There is work in progress to make the library cross platform so Windows/MSVC support should land some day...

Solution 3

It was discussed years ago (and a few of the names in that thread may look a bit familiar :-) ), but I don't know of an existing implementation. I don't think I'd try to adapt std::string to the task. The exact requirements on the type over which std::basic_string aren't well stated, but the standard is pretty clear that it's only intended for something that acts a lot like char. For types that are substantially different, it might still work, but it's hard to say what would happen -- it was never intended for, and probably hasn't been tested with many types other than small integers.

A fully conforming implementation of std::vector is a lot of work. But implementing a usable subset of std::vector from scratch (even including a small vector optimization) won't usually be terribly difficult. If you include a small vector optimization, I'm reasonably certain you can't meet all the requirements on std::vector though.

In particular, swapping or moving a vector where you've stored actual data in the vector object means you'll need to swap/move actual data items, where the requirements on std::vector are predicated on its storing only a pointer to the data, so it can normally1 swap or move the contents just by manipulating the pointers, without actually touching the data items themselves at all. As such, it's required to be able to do these things without throwing, even if manipulating the data items themselves would/will throw. As such, a small vector optimization will preclude meeting those requirements.

On the other hand, as noted above, one of the requirements on std::string is that it can only store items that can be manipulated without throwing. As such, if std::string is a viable option at all, implementing your own vector-like container probably won't need to worry about those details a lot either.


  1. There is one case where you end up having to swap/move actual data items, even in an actual std::vector: if the two vectors use different allocators, then you have to allocate space for the objects in the destination via that vector's allocator.

Solution 4

If T is a POD type why not basic_string instead of vector??

Share:
10,837
BuschnicK
Author by

BuschnicK

Software Engineer @Google

Updated on June 02, 2022

Comments

  • BuschnicK
    BuschnicK almost 2 years

    I know several (all?) STL implementations implement a "small string" optimization where instead of storing the usual 3 pointers for begin, end and capacity a string will store the actual character data in the memory used for the pointers if sizeof(characters) <= sizeof(pointers). I am in a situation where I have lots of small vectors with an element size <= sizeof(pointer). I cannot use fixed size arrays, since the vectors need to be able to resize dynamically and may potentially grow quite large. However, the median (not mean) size of the vectors will only be 4-12 bytes. So a "small string" optimization adapted to vectors would be quite useful to me. Does such a thing exist?

    I'm thinking about rolling my own by simply brute force converting a vector to a string, i.e. providing a vector interface to a string. Good idea?

  • josesuero
    josesuero about 14 years
    Implementing std::vector correctly from scratch is surprisingly hard. Of course, if you choose to ignore exception safety it becomes a lot easier, but then it's no longer a meaningful (or useful) vector implementation.
  • BuschnicK
    BuschnicK about 14 years
    Thanks for the pointer - unfortunately the discussion never really reached a conclusion. Implementing my own vector wouldn't be all that difficult, I agree. However, I was hoping of just plugging in some ready made class and see the effect on memory consumption. I'm currently toying with various ideas of reducing our memory footprint and this is only one of them.
  • josesuero
    josesuero about 14 years
    When that is said, I do think a custom vector implementation would be the best way to achieve this. Wrapping the existing vector class would make it hard to exploit the object's size efficiently, and a custom vector isn't rocket science, as long as you're careful and know what you're doing.
  • Jerry Coffin
    Jerry Coffin about 14 years
    @jalf: it's certainly true that it isn't trivial -- and there's one requirement that probably can't be met. Swapping two vectors isn't allowed to throw an exception except if the copy ctor for the allocator throws, even if T(T const&) and/or swap(T, T) throws. I don't see any way to meet that requirement using a built-in buffer.
  • josesuero
    josesuero about 14 years
    @Jerry: True, I don't think that can be done either. But at least it's a bit of a corner case, where you can probably live without the nothrow guarantee.
  • Admin
    Admin about 9 years
    It's not header-only anymore.
  • Nick
    Nick about 8 years
  • Steve Jessop
    Steve Jessop over 7 years
    Note that you'd have to write a std::char_traits specialization (or an equivalent traits class) for the POD type. The requirements aren't all that severe, other than the need to nominate a special eof value, but I expect it's kind of tedious.
  • Arne Vogel
    Arne Vogel almost 7 years
    Moreover, there is no guarantee that basic_string<MyPodType> will use small string optimization, and no portable way of specifying how many elements, even if it does.
  • xaxxon
    xaxxon about 6 years
    Why would you suggest mis-using a string?
  • phuclv
    phuclv over 4 years
    one of the hardest part about a vector is choosing the strategy for expanding the internal array
  • Eike
    Eike over 2 years
    @jalf: re: object size: couldn't you wrap a std::vector inside of a union with a std::array<T, number of shortvector elements> ? That way you can use the methods of the containers and "only" think of transition from one to the other.