What is the performance of std::bitset?
Solution 1
Did a short test profiling std::bitset vs bool arrays for sequential and random access - you can too:
#include <iostream>
#include <bitset>
#include <cstdlib> // rand
#include <ctime> // timer
inline unsigned long get_time_in_ms()
{
return (unsigned long)((double(clock()) / CLOCKS_PER_SEC) * 1000);
}
void one_sec_delay()
{
unsigned long end_time = get_time_in_ms() + 1000;
while(get_time_in_ms() < end_time)
{
}
}
int main(int argc, char **argv)
{
srand(get_time_in_ms());
using namespace std;
bitset<5000000> bits;
bool *bools = new bool[5000000];
unsigned long current_time, difference1, difference2;
double total;
one_sec_delay();
total = 0;
current_time = get_time_in_ms();
for (unsigned int num = 0; num != 200000000; ++num)
{
bools[rand() % 5000000] = rand() % 2;
}
difference1 = get_time_in_ms() - current_time;
current_time = get_time_in_ms();
for (unsigned int num2 = 0; num2 != 100; ++num2)
{
for (unsigned int num = 0; num != 5000000; ++num)
{
total += bools[num];
}
}
difference2 = get_time_in_ms() - current_time;
cout << "Bool:" << endl << "sum total = " << total << ", random access time = " << difference1 << ", sequential access time = " << difference2 << endl << endl;
one_sec_delay();
total = 0;
current_time = get_time_in_ms();
for (unsigned int num = 0; num != 200000000; ++num)
{
bits[rand() % 5000000] = rand() % 2;
}
difference1 = get_time_in_ms() - current_time;
current_time = get_time_in_ms();
for (unsigned int num2 = 0; num2 != 100; ++num2)
{
for (unsigned int num = 0; num != 5000000; ++num)
{
total += bits[num];
}
}
difference2 = get_time_in_ms() - current_time;
cout << "Bitset:" << endl << "sum total = " << total << ", random access time = " << difference1 << ", sequential access time = " << difference2 << endl << endl;
delete [] bools;
cin.get();
return 0;
}
Please note: the outputting of the sum total is necessary so the compiler doesn't optimise out the for loop - which some do if the result of the loop isn't used.
Under GCC x64 with the following flags: -O2;-Wall;-march=native;-fomit-frame-pointer;-std=c++11; I get the following results:
Bool array: random access time = 4695, sequential access time = 390
Bitset: random access time = 5382, sequential access time = 749
Solution 2
In addition to what the other answers said about the performance of access, there may also be a significant space overhead: Typical bitset<>
implementations simply use the longest integer type to back their bits. Thus, the following code
#include <bitset>
#include <stdio.h>
struct Bitfield {
unsigned char a:1, b:1, c:1, d:1, e:1, f:1, g:1, h:1;
};
struct Bitset {
std::bitset<8> bits;
};
int main() {
printf("sizeof(Bitfield) = %zd\n", sizeof(Bitfield));
printf("sizeof(Bitset) = %zd\n", sizeof(Bitset));
printf("sizeof(std::bitset<1>) = %zd\n", sizeof(std::bitset<1>));
}
produces the following output on my machine:
sizeof(Bitfield) = 1
sizeof(Bitset) = 8
sizeof(std::bitset<1>) = 8
As you see, my compiler allocates a whopping 64 bits to store a single one, with the bitfield approach, I only need to round up to eight bits.
This factor eight in space usage can become important if you have a lot of small bitsets.
Solution 3
Not a great answer here, but rather a related anecdote:
A few years ago I was working on real-time software and we ran into scheduling problems. There was a module which was way over time-budget, and this was very surprising because the module was only responsible for some mapping and packing/unpacking of bits into/from 32-bit words.
It turned out that the module was using std::bitset. We replaced this with manual operations and the execution time decreased from 3 milliseconds to 25 microseconds. That was a significant performance issue and a significant improvement.
The point is, the performance issues caused by this class can be very real.
Solution 4
Rhetorical question: Why std::bitset
is written in that inefficacy way?
Answer: It is not.
Another rhetorical question: What is difference between:
std::bitset<128> a = src;
a[i] = true;
a = a << 64;
and
std::bitset<129> a = src;
a[i] = true;
a = a << 63;
Answer: 50 times difference in performance http://quick-bench.com/iRokweQ6JqF2Il-T-9JSmR0bdyw
You need be very careful what you ask for, bitset
support lot of things but each have it own cost. With correct handling you will have exactly same behavior as raw code:
void f(std::bitset<64>& b, int i)
{
b |= 1L << i;
b = b << 15;
}
void f(unsigned long& b, int i)
{
b |= 1L << i;
b = b << 15;
}
Both generate same assembly: https://godbolt.org/g/PUUUyd (64 bit GCC)
Another thing is that bitset
is more portable but this have cost too:
void h(std::bitset<64>& b, unsigned i)
{
b = b << i;
}
void h(unsigned long& b, unsigned i)
{
b = b << i;
}
If i > 64
then bit set will be zero and in case of unsigned we have UB.
void h(std::bitset<64>& b, unsigned i)
{
if (i < 64) b = b << i;
}
void h(unsigned long& b, unsigned i)
{
if (i < 64) b = b << i;
}
With check preventing UB both generate same code.
Another place is set
and []
, first one is safe and mean you will never get UB but this will cost you a branch. []
have UB if you use wrong value but is fast as using var |= 1L<< i;
. Of corse if std::bitset
do not need have more bits than biggest int available on system because other wise you need split value to get correct element in internal table. This mean for std::bitset<N>
size N
is very important for performance. If is bigger or smaller than optimal one you will pay cost of it.
Overall I find that best way is use something like that:
constexpr size_t minBitSet = sizeof(std::bitset<1>)*8;
template<size_t N>
using fasterBitSet = std::bitset<minBitSet * ((N + minBitSet - 1) / minBitSet)>;
This will remove cost of trimming exceeding bits: http://quick-bench.com/Di1tE0vyhFNQERvucAHLaOgucAY
![quant](https://i.stack.imgur.com/lAhj3.png?s=256&g=1)
quant
I'm here to learn programming and *nix mostly, and am immensely indebted to this community for always being there when I get stuck. The help and support I've received on SE sites has allowed me to develop my understanding in a range of disciplines much faster than I could ever have hoped for. I try to give back once in a while as well; it's the least I could do. I'm the owner and maintainer of www.portnovel.com
Updated on November 05, 2020Comments
-
quant over 3 years
I recently asked a question on Programmers regarding reasons to use manual bit manipulation of primitive types over
std::bitset
.From that discussion I have concluded that the main reason is its comparatively poorer performance, although I'm not aware of any measured basis for this opinion. So next question is:
what is the performance hit, if any, likely to be incurred by using
std::bitset
over bit-manipulation of a primitive?The question is intentionally broad, because after looking online I haven't been able to find anything, so I'll take what I can get. Basically I'm after a resource that provides some profiling of
std::bitset
vs 'pre-bitset' alternatives to the same problems on some common machine architecture using GCC, Clang and/or VC++. There is a very comprehensive paper which attempts to answer this question for bit vectors:http://www.cs.up.ac.za/cs/vpieterse/pub/PieterseEtAl_SAICSIT2010.pdf
Unfortunately, it either predates or considered out of scope
std::bitset
, so it focuses on vectors/dynamic array implementations instead.I really just want to know whether
std::bitset
is better than the alternatives for the use cases it is intended to solve. I already know that it is easier and clearer than bit-fiddling on an integer, but is it as fast? -
sp2danny over 6 yearsa single data point doesn't let you assess the asymptotic cost. is it linear? quadratic? something else?
-
user1319829 about 6 yearsWhat compiler was that?
-
Stewart about 6 yearsmsvc 12 I think from Visual Studio 2008
-
moongoal over 4 years
minBitSet * ((N + minBitSet - 1) / minBitSet) == N + minBitSet - 1
-
Yankes over 4 years@AlQafir
/
Cause value to be crop, this mean this equation is not true. Left side is alwaysminBitSet * k
where both numbers are integers, but right side can have any value you want, like13 + 32 - 1
. And I want32 * k
-
moongoal over 4 yearsNow I see what you did there. Thanks for explaining!