How big is the performance gap between std::sort and std::stable_sort in practice?
Solution 1
There are good answers that compare the algorithms theoretically. For curiosity's sake, I benchmarked std::sort and std::stable_sort with google/benchmark.
It is useful to point out ahead of time that:
- The benchmark machine has 1 x 2500 MHz CPU and 1 GB RAM.
- The benchmark OS is Arch Linux 2015.08 x86-64.
- The benchmarks were compiled with g++ 5.3.0 and clang++ 3.7.0 (-std=c++11, -O3 and -pthread).
- The BM_Base* benchmarks measure the time spent populating the std::vector<>. That time should be subtracted from the sorting results for a fair comparison.
The first benchmark sorts a std::vector<int> of 512k elements.
[ g++ ]# benchmark_sorts --benchmark_repetitions=10
Run on (1 X 2500 MHz CPU )
2016-01-08 01:37:43
Benchmark Time(ns) CPU(ns) Iterations
----------------------------------------------------------------
...
BM_BaseInt/512k_mean 24730499 24726189 28
BM_BaseInt/512k_stddev 293107 310668 0
...
BM_SortInt/512k_mean 70967679 70799990 10
BM_SortInt/512k_stddev 1300811 1301295 0
...
BM_StableSortInt/512k_mean 73487904 73481467 9
BM_StableSortInt/512k_stddev 979966 925172 0
[ clang++ ]# benchmark_sorts --benchmark_repetitions=10
Run on (1 X 2500 MHz CPU )
2016-01-08 01:39:07
Benchmark Time(ns) CPU(ns) Iterations
----------------------------------------------------------------
...
BM_BaseInt/512k_mean 26198558 26197526 27
BM_BaseInt/512k_stddev 320971 348314 0
...
BM_SortInt/512k_mean 70648019 70666660 10
BM_SortInt/512k_stddev 2030727 2033062 0
...
BM_StableSortInt/512k_mean 82004375 81999989 9
BM_StableSortInt/512k_stddev 197309 181453 0
The second benchmark sorts a std::vector<S> of 512k elements (sizeof(S) == 20).
[ g++ ]# benchmark_sorts --benchmark_repetitions=10
Run on (1 X 2500 MHz CPU )
2016-01-08 01:49:32
Benchmark Time(ns) CPU(ns) Iterations
----------------------------------------------------------------
...
BM_BaseStruct/512k_mean 26485063 26410254 26
BM_BaseStruct/512k_stddev 270355 128200 0
...
BM_SortStruct/512k_mean 81844178 81833325 8
BM_SortStruct/512k_stddev 240868 204088 0
...
BM_StableSortStruct/512k_mean 106945879 106857114 7
BM_StableSortStruct/512k_stddev 10446119 10341548 0
[ clang++ ]# benchmark_sorts --benchmark_repetitions=10
Run on (1 X 2500 MHz CPU )
2016-01-08 01:53:01
Benchmark Time(ns) CPU(ns) Iterations
----------------------------------------------------------------
...
BM_BaseStruct/512k_mean 27327329 27280000 25
BM_BaseStruct/512k_stddev 488318 333059 0
...
BM_SortStruct/512k_mean 78611207 78407400 9
BM_SortStruct/512k_stddev 690207 372230 0
...
BM_StableSortStruct/512k_mean 109477231 109333325 8
BM_StableSortStruct/512k_stddev 11697084 11506626 0
For anyone who would like to run the benchmark, here is the code:
#include <vector>
#include <random>
#include <algorithm>
#include "benchmark/benchmark_api.h"

#define SIZE (1024 << 9)  // 512k elements

// Baseline: measures only the cost of populating the vector.
static void BM_BaseInt(benchmark::State &state) {
  std::random_device rd;
  std::mt19937 mt(rd());
  std::uniform_int_distribution<int> dist;
  while (state.KeepRunning()) {
    std::vector<int> v;
    v.reserve(state.range_x());
    for (int i = 0; i < state.range_x(); i++) {
      v.push_back(dist(mt));
    }
  }
}
BENCHMARK(BM_BaseInt)->Arg(SIZE);

static void BM_SortInt(benchmark::State &state) {
  std::random_device rd;
  std::mt19937 mt(rd());
  std::uniform_int_distribution<int> dist;
  while (state.KeepRunning()) {
    std::vector<int> v;
    v.reserve(state.range_x());
    for (int i = 0; i < state.range_x(); i++) {
      v.push_back(dist(mt));
    }
    std::sort(v.begin(), v.end());
  }
}
BENCHMARK(BM_SortInt)->Arg(SIZE);

static void BM_StableSortInt(benchmark::State &state) {
  std::random_device rd;
  std::mt19937 mt(rd());
  std::uniform_int_distribution<int> dist;
  while (state.KeepRunning()) {
    std::vector<int> v;
    v.reserve(state.range_x());
    for (int i = 0; i < state.range_x(); i++) {
      v.push_back(dist(mt));
    }
    std::stable_sort(v.begin(), v.end());
  }
}
BENCHMARK(BM_StableSortInt)->Arg(SIZE);

struct S {
  int key;
  int arr[4];  // payload, so that sizeof(S) == 20
};

static void BM_BaseStruct(benchmark::State &state) {
  std::random_device rd;
  std::mt19937 mt(rd());
  std::uniform_int_distribution<int> dist;
  while (state.KeepRunning()) {
    std::vector<S> v;
    v.reserve(state.range_x());
    for (int i = 0; i < state.range_x(); i++) {
      v.push_back({dist(mt)});
    }
  }
}
BENCHMARK(BM_BaseStruct)->Arg(SIZE);

static void BM_SortStruct(benchmark::State &state) {
  std::random_device rd;
  std::mt19937 mt(rd());
  std::uniform_int_distribution<int> dist;
  while (state.KeepRunning()) {
    std::vector<S> v;
    v.reserve(state.range_x());
    for (int i = 0; i < state.range_x(); i++) {
      v.push_back({dist(mt)});
    }
    std::sort(v.begin(), v.end(),
              [](const S &a, const S &b) { return a.key < b.key; });
  }
}
BENCHMARK(BM_SortStruct)->Arg(SIZE);

static void BM_StableSortStruct(benchmark::State &state) {
  std::random_device rd;
  std::mt19937 mt(rd());
  std::uniform_int_distribution<int> dist;
  while (state.KeepRunning()) {
    std::vector<S> v;
    v.reserve(state.range_x());
    for (int i = 0; i < state.range_x(); i++) {
      v.push_back({dist(mt)});
    }
    std::stable_sort(v.begin(), v.end(),
                     [](const S &a, const S &b) { return a.key < b.key; });
  }
}
BENCHMARK(BM_StableSortStruct)->Arg(SIZE);

BENCHMARK_MAIN();
Solution 2
std::stable_sort performs O(N log N) comparisons when sufficient extra memory is available; when insufficient memory is available, it degrades to O(N (log N)^2) comparisons. It is therefore roughly as efficient as std::sort (which performs O(N log N) comparisons in both the average and the worst case) when memory is available.
For those interested, std::sort() typically uses an introsort (a quicksort that switches to heapsort when the recursion reaches a certain depth) and std::stable_sort() typically uses a merge sort.
Solution 3
Big enough to warrant a separate function that does stable sort and not have std::sort()
do it transparently.
Solution 4
Sometimes std::stable_sort() is needed because it maintains order of elements that are equal.
And the conventional advice is that, if maintaining order is not important, you should use std::sort() instead.
However, it is context-dependent. Plenty of data is best sorted with a stable sort even if you don't need to maintain order:
- Quicksort degrades toward its worst-case performance if the data consistently produces poor pivot points.
- The Burrows-Wheeler Transform is an algorithm used as part of data compression, e.g. bzip2. It requires sorting all the rotations of the text. For the majority of text data, merge sort (as often used by std::stable_sort()) is substantially faster than quicksort (as usually used by std::sort()).
- bbb is a BWT implementation that notes the advantages of std::stable_sort() over sort() for this application.
Solution 5
I was looking for something similar, but I was surprised no one mentioned auxiliary space.
As I understand it, the implementations of both stable_sort and sort are supposed to guarantee O(N log N) comparisons in all (best, average and worst) cases.
However, they differ in the auxiliary space used: stable_sort needs O(N) auxiliary space. Maybe the difference in performance lies in acquiring that space. :)
Otherwise, theoretically, they should be the same with respect to performance.
sort should do what you need, unless you need this: stable_sort preserves the relative order of elements with equivalent values.
SebastianK
Interested in algorithms, operations research, scheduling, C++
Updated on July 09, 2022

Comments
-
SebastianK almost 2 years
Both should run in O(n log n), but in general sort is faster than stable_sort. How big is the performance gap in practice? Do you have some experience about that?
I want to sort a very large number of structs that have a size of about 20 bytes. The stability of the result would be nice in my case, but it is not a must. At the moment the underlying container is a plain array, perhaps it could be changed to a std::deque later on.
-
Justin Meiners almost 3 years Note that std::sort is typically implemented as quicksort, and std::stable_sort is a sophisticated form of merge sort.
-
wcochran about 2 years @JustinMeiners quicksort is O(n^2) worst case, thus std::sort uses a hybrid that incorporates heapsort when quicksort recurses too deep.
-
Justin Meiners about 2 years @wcochran good point, I am greatly oversimplifying. The STL merge sort has some different cases like that as well: justinmeiners.github.io/efficient-programming-with-components/…
-
-
MSalters about 15 years LogN^2 usually does not mean log(N^2) but (log N)^2, especially when it occurs in big-O notation. This commonly happens with algorithms that take O(N log N) steps with O(log N) work per step, and vice versa. So, no, it's not a constant 2.
-
andrewmu about 15 yearsBasically what I am trying to say is try both ways.
-
Peter M almost 11 yearsHAHAH. Okay, so maybe not the most technically detailed answer, but certainly my favorite.
-
Admin over 10 years You say std::sort (which performs O(NlogN) comparisons in both average and worst case). But the link to std::sort says O(N·log(N)) in all cases only since C++11. Before C++11 there was no O(N·log(N)) requirement for the worst case.
-
Clearer about 9 yearsWhile this is probably too small on modern computers, when you're sorting very large data sets, you're better off using an external sorting algorithm which makes good use of disk to store intermediate results; such as external merge sort.
-
Erik Aronesty over 8 years(and sometimes sorting pointers is a lot faster than sorting structs - and vice versa, depending on the size of the structs, the available RAM, etc)
-
5gon12eder over 8 years+1 for actually going ahead and doing the benchmark. Your work looks sound. The results would be more accessible, however, if you'd summarize them in an easier to digest form like drawing a small diagram.