Why is std::rotate so fast?

c++ algorithm sorting c++11 stl

16,626

Solution 1

Edit:

Since the context is not given, it is not clear if your code calls std::swap() or another swap(a,b) algorithm like

T tmp = a; a = b; b = tmp;

When a and b are vectors of 1000 ints each, this would copy all vector elements 3 times. The specialized version of std::swap() for containers like std::vector<T> call the container a.swap(b) method instead, essentially swapping only the dynamic data pointers of the containers.

Also, for different iterator types the std::rotate() implementation can utilize some optimizations (see my older, possibly misleading answer below).

Caveat: The implementation of std::rotate() is implementation-dependent. For different iterator categories, different algorithms can be utilized (e.g. look for __rotate( in bits/stl_algo.h header of GNU g++).

To shift n elements by m=std::distance(first,middle) a simple (naive) algorithm like m rotations by one element needs O(n*m) moving or copying operations. But only O(n) moves are needed, when each element is directly placed to its right position, which results in a (roughly) m times faster algorithm.

An example for illustration: Rotate a string s = "abcdefg" by three elements:

abcdefg : store 'a' in temporary place
dbcdefg : move s[3] to s[0] (where it belongs in the end, directly)
dbcgefg : move s[6] to s[3]
dbcgefc : move s[9%7] to s[6] (wrapping index modulo container size: 9%7 == 2)
dbfgefc : move s[5] to s[2]
dbfgebc : move s[1] to s[5] (another wrapping around)
defgebc : move s[4] to s[1]
defgabc : move 'a' from temporary place to s[4]

For n and m with greatest common divisor 1 you are done now. Otherwise you have to repeat that scheme n/m time for the first m consecutive elements (n > m assumed here). This little more complicated algorithm is much faster.

For bidirectional iterators another legendary O(3n) algorithm can be used, referred to as "flipping hands". According to Jon Bentley's book Programming Pearls it was used in early UNIX editors for moving text:

Place your hands in front of you, one above the other, thumbs up. Now

Turn one hand.
Turn the other.
Turn both, connected to each other.

In code:

reverse(first, middle);
reverse(middle, last);
reverse(first, last);

For random access iterators large chunks of memory can be relocated by swap_ranges() (or memmove() operations for POD types).

Microoptimization by utilising assembler operations can give a small extra amount of acceleration, it can be done on top of the fasted algorithm.

Algorithms using consecutive elements instead of "hopping around" in memory also result in smaller number of cache misses on modern computer architectures.

Solution 2

As the commentors already stated, it depends on your Standard Library implementation. But the code that you posted is valid even for forward iterators. As such, it imposes very little requirements (only that these iterators can be incremented and dereferenced).

Stepanov's classic Elements of Programming devotes an entire chapter (10) to rotate and other rearrangement algorithms. For forward iterators, the series of swaps in your code gives O(3N) assignments. For bidirectional iterators, three consecutive calls to reverse yield another O(3N) algorithm. For random access iterators, std::rotate can be implemented as O(N) assignments by defining a permutation of indices w.r.t. to the starting iterator first.

All the above algorithms are in-place. Using a memory buffer, it is possible that the random access version can benefit from the greater cache locality of memcpy() or memmove() (if the underlying value type is POD) in which entire blocks of contiguous memory can be swapped. If your insertion sort is done on an array or std::vector, it is likely that your Standard Library will take advantage of this optimization.

TL;DR: trust your Standard Library and don't reinvent the wheel!

16,626

Author by

bradshire

Updated on June 25, 2022

Comments

bradshire almost 2 years
Why is std::rotate so much faster than the equivalent function that cplusplus.com describes?

cplusplus.com's implementation:
```
template <class ForwardIterator>
  void rotate (ForwardIterator first, ForwardIterator middle, ForwardIterator last)
{
  ForwardIterator next= middle;

  while (first != next)
  {
    swap (*first++, *next++);

    if(next == last)
        next= middle;
    else if (first==middle)
        middle= next;
  }
}
```
I have two insertion sorting algorithms which are entirely identical, with the exception that one uses std::rotate, and one uses cplusplus.com's equivalent function. I'm setting them to sort 1000 vectors with 1000 int elements. The sort which uses std::rotate takes 0.376 seconds, and the other takes 8.181 seconds.

Why is this? I'm not intending to try and make something better than the STL functions but I'm still curious.
Marshall Clow over 10 years

> For random access iterators, std::rotate is very likely to benefit from memmove() optimizations in which entire blocks of contiguous memory can be swapped. This is, in general, not true. If the underlying data type is a POD (plain old data) type, then memmove/memcpy can be used. Otherwise the copy/move constructors for the objects must be called.
bradshire over 10 years

Thanks. These comments and answers have given me a good deal of insight. Looks like I have a lot of learning to do before I can fully understand what's going on...
bradshire over 10 years

Thanks. I think I'll need to do a lot of reading on iterators, algorithms and the STL before I can truly understand all of the factors at play here. I'll look into alternative algorithms when I get the chance =)
Chortos-2 about 9 years

The algorithm OP quoted performs O(n) moves, not O(nm).
Benjamin R about 7 years

@bradshire The fact that you want to learn and understand is the important part. Now comes the never-ending effort!
cesarvargas almost 3 years

In Big O notation O(3N) is the same as O(N) since we are only interested in the function behavior.