Is shrink_to_fit the proper way of reducing the capacity a `std::vector` to its size?

c++ c++11 vector stl

18,028

Solution 1

Do the arguments in the quotation hold?

Measure and you will know. Are you constrained in memory? Can you figure out the correct size up front? It will be more efficient to reserve than it will be to shrink after the fact. In general I am inclined to agree on the premise that most uses are probably fine with the slack.

If yes, what's the proper way of shrinking an STL container's capacity to its size (at least for std::vector).

The comment does not only apply to shrink_to_fit, but to any other way of shrinking. Given that you cannot realloc in place, it involves acquiring a different chunk of memory and copying over there regardless of what mechanism you use for shrinking.

And if there's a better way to shrink a container, what's the reason for the existence of shrink_to_fit after-all?

The request is non-binding, but the alternatives don't have better guarantees. The question is whether shrinking makes sense: if it does, then it makes sense to provide a shrink_to_fit operation that can take advantage of the fact that the objects are being moved to a new location. I.e., if the type T has a noexcept(true) move constructor, it will allocate the new memory and move the elements.

While you can achieve the same externally, this interface simplifies the operation. The equivalent to shrink_to_fit in C++03 would have been:

std::vector<T>(current).swap(current);

But the problem with this approach is that when the copy is done to the temporary it does not know that current is going to be replaced, there is nothing that tells the library that it can move the held objects. Note that using std::move(current) would not achieve the desired effect as it would move the whole buffer, maintaining the same capacity().

Implementing this externally would be a bit more cumbersome:

{
   std::vector<T> copy;
   if (noexcept(T(std::move(declval<T>())))) {
      copy.assign(std::make_move_iterator(current.begin()),
                  std::make_move_iterator(current.end()));
   } else {
      copy.assign(current.begin(), current.end());
   }
   copy.swap(current);
}

Assuming that I got the if condition right... which is probably not what you want to write every time that you want this operation.

Solution 2

Will the arguments hold?

As the arguments are originally mine, don't mind if I defend them, one by one:

Either shrink_to_fit does nothing (...)

As it was mentioned, the standard says (many times, but in the case of vector it's section 23.3.7.3...) that the request is non-binding to allow an implementation latitude for optimizations. This means that the implementation can define shrink_to_fit as an no-op.
(...) or it gives you cache locality issues

In the case that shrink_to_fit is not implemented as a no-op, you have to allocate a new underlying container with capacity size(), copy (or, in the best case, move) construct all your N = size() new items from the old ones, destruct all the old ones (in the move case this should be optimized, but it's possible that this involves a loop again over the old container) and then destructing the old container per se. This is done, in libstdc++-4.9, exactly as David Rodriguez has described, by
```
      _Tp(__make_move_if_noexcept_iterator(__c.begin()),
          __make_move_if_noexcept_iterator(__c.end()),
          __c.get_allocator()).swap(__c);
```
and in libc++-3.5, by a function in __alloc_traits that does approximately the same.

Oh, and an implementation absolutely cannot rely on realloc (even if it uses malloc inside ::operator new for its memory allocations) because realloc, if it cannot shrink in-place, will either leave the memory alone (no-op case) or make a bitwise copy (and miss the opportunity for readjusting pointers, etc. that the proper C++ copying/moving constructors would give).

Sure, one can write a shrinkable memory allocator, and use it in the constructor of its vectors.

In the easy case where the vectors are larger than the cache lines, all that movement puts pressure on the cache.
and it's O(n)

If n = size(), I think it was established above that, at the very least, you have to do one n sized allocation, n copy or move constructions, n destructions, and one old_capacity sized deallocation.
usually it's cheaper just to leave the slack in memory

Obviously, unless you are really pressed for free memory (in which case it might be wiser to save your data to the disk and re-load it later on demand...)

If yes, what's the proper way of shrinking an STL container's capacity to its size (at least for std::vector).

The proper way is still shrink_to_fit... you just have to either not rely on it or know very well your implementation!

And if there's a better way to shrink a container, what's the reason for the existence of shrink_to_fit after-all?

There is no better way, but the reason for the existence of shrink_to_fit is, AFAICT, that sometimes your program might feel memory pressure and it's one way of treating it. Not a very good way, but still.

HTH!

Solution 3

If yes, what's the proper way of shrinking an STL container's capacity to its size (at least for std::vector).

The 'swap trick' will trim a vector to the exact size required (from More Effective SQL):

vector<Person>(persons).swap(persons);

Particularly useful when the vector is empty, to release all memory:

vector<Person>().swap(persons);

Vectors were constantly tripping my unit tester's memory leak detection code because of retained allocations of unused space, and this sorted them out perfectly.

This is the kind of example where I really don't care about runtime efficiency (size or speed), but I do care about exact memory usage.

And if there's a better way to shrink a container, what's the reason for the existence of shrink_to_fit after-all?

I really don't know what the point of providing a function that can legally do absolutely nothing is. I cheered when I saw it had been introduced, then despaired when I found it couldn't be relied upon.

Perhaps we'll see maybe_sort() in the next version.

18,028

Author by

101010

Updated on June 11, 2022

Comments

101010 almost 2 years
In C++11 shrink_to_fit was introduce to complement certain STL containers (e.g., std::vector, std::deque, std::string).

Synopsizing, its main functionality is to request the container that is associated to, to reduce its capacity to fit its size. However, this request is non-binding, and the container implementation is free to optimize otherwise and leave the vector with a capacity greater than its size.

Furthermore, in a previous SO question the OP was discouraged from using shrink_to_fit to reduce the capacity of his std::vector to its size. The reasons not to do so are quoted below:

shrink_to_fit does nothing or it gives you cache locality issues and it's O(n) to execute (since you have to copy each item to their new, smaller home). Usually it's cheaper to leave the slack in memory. @Massa

Could someone be so kind as to address the following questions:
- Do the arguments in the quotation hold?
- If yes, what's the proper way of shrinking an STL container's capacity to its size (at least for std::vector).
- And if there's a better way to shrink a container, what's the reason for the existence of shrink_to_fit after-all?
Jonas Schäfer almost 10 years

Given that you cannot realloc in place, it involves acquiring a different chunk of memory and copying over there regardless of what mechanism you use for shrinking. Can you explain that please? The realloc manpage suggests that it is possible for realloc to work in-place (which I would assume a sane implementation does if it is about to shrink a memory region). Is this due to some speciality of C++ semantics?
David Rodríguez - dribeas almost 10 years

@JonasWielicki: Except that C++ memory allocators don't use realloc. There is some work being done in a proof of concept inside boost, but allocators cannot use realloc (or rather, they can use realloc but only to implement malloc and free)
MikeMB over 7 years

You also can't rely on the swap mechanism. I don't think (correct me if I'm wrong) that anything prevents the newly created vector to have a bigger capacity then it's size. In practice however both approaches work.
Andy Krouwel over 7 years

No, I don't think the standard guarantees that an initially empty vector has a capacity of 0, although as you say in practice it normally is. However, even if it's non-0 on construction then it's highly likely that one initially empty vector will have the same capacity as another initially empty vector, so for my memory-comparison purposes it'd still work fine.
MikeMB over 7 years

I'm not primarily talking about the empty vector - any implementation where that causes a dynamic memory allocation is imho broken. I'm talking about the temporary vector you create as a copy from persons. It might very well be beneficial for a newly created vector to have a bigger capacity than it's size due to some particularities of the underlying heap allocation functions (e.g. because they only allocate memory blocks in discrete chunks anyway).
MikeMB over 7 years

Also I don't know why you quip about a function which may do nothing (although in reality it will do what you want) but present an alternative which doesn't give you any guarantees either, will in practice pretty much achieve the same result, but might actually have to do more work to do it, because the individual elements are not moved.
Andy Krouwel over 7 years

I'd rather have something guaranteed, but you make do with what you have, and swap trick works for me where shrink_to_fit() didn't. Swap trick is C++98 compatible, shrink_to_fit() isn't.
MikeMB over 7 years

Interesting. May I ask, on what compiler and in what situation it didn't work?
Andy Krouwel over 7 years

I think it was Visual C++ 2005, which I still have to use far too often.
MikeMB over 7 years

Ouch! Thank you very much. I have to admit that this reassures me somewhat.
Vyacheslav Lanovets about 5 years

Is there any implementation which ignores shrink_to_fit() request? Major ones seem to respect it.
ahcox about 2 years

Is the reference to an SQL book correct? More Effective C++ or Effective STL seem more likely.
ahcox about 2 years

As mentioned in the accepted answer, this older idiom has to copy, whereas the standard function can move the vector contents more efficiently (if it is not a NOP in a particular compiler).
Andy Krouwel about 2 years

SQL typo corrected to STL. Thanks!