What is the performance of std::bitset?

c++ performance bitset

38,089

Solution 1

Did a short test profiling std::bitset vs bool arrays for sequential and random access - you can too:

#include <iostream>
#include <bitset>
#include <cstdlib> // rand
#include <ctime> // timer

inline unsigned long get_time_in_ms()
{
    return (unsigned long)((double(clock()) / CLOCKS_PER_SEC) * 1000);
}


void one_sec_delay()
{
    unsigned long end_time = get_time_in_ms() + 1000;

    while(get_time_in_ms() < end_time)
    {
    }
}



int main(int argc, char **argv)
{
    srand(get_time_in_ms());

    using namespace std;

    bitset<5000000> bits;
    bool *bools = new bool[5000000];

    unsigned long current_time, difference1, difference2;
    double total;

    one_sec_delay();

    total = 0;
    current_time = get_time_in_ms();

    for (unsigned int num = 0; num != 200000000; ++num)
    {
        bools[rand() % 5000000] = rand() % 2;
    }

    difference1 = get_time_in_ms() - current_time;
    current_time = get_time_in_ms();

    for (unsigned int num2 = 0; num2 != 100; ++num2)
    {
        for (unsigned int num = 0; num != 5000000; ++num)
        {
            total += bools[num];
        }
    }   

    difference2 = get_time_in_ms() - current_time;

    cout << "Bool:" << endl << "sum total = " << total << ", random access time = " << difference1 << ", sequential access time = " << difference2 << endl << endl;


    one_sec_delay();

    total = 0;
    current_time = get_time_in_ms();

    for (unsigned int num = 0; num != 200000000; ++num)
    {
        bits[rand() % 5000000] = rand() % 2;
    }

    difference1 = get_time_in_ms() - current_time;
    current_time = get_time_in_ms();

    for (unsigned int num2 = 0; num2 != 100; ++num2)
    {
        for (unsigned int num = 0; num != 5000000; ++num)
        {
            total += bits[num];
        }
    }   

    difference2 = get_time_in_ms() - current_time;

    cout << "Bitset:" << endl << "sum total = " << total << ", random access time = " << difference1 << ", sequential access time = " << difference2 << endl << endl;

    delete [] bools;

    cin.get();

    return 0;
}

Please note: the outputting of the sum total is necessary so the compiler doesn't optimise out the for loop - which some do if the result of the loop isn't used.

Under GCC x64 with the following flags: -O2;-Wall;-march=native;-fomit-frame-pointer;-std=c++11; I get the following results:

Bool array: random access time = 4695, sequential access time = 390

Bitset: random access time = 5382, sequential access time = 749

Solution 2

In addition to what the other answers said about the performance of access, there may also be a significant space overhead: Typical bitset<> implementations simply use the longest integer type to back their bits. Thus, the following code

#include <bitset>
#include <stdio.h>

struct Bitfield {
    unsigned char a:1, b:1, c:1, d:1, e:1, f:1, g:1, h:1;
};

struct Bitset {
    std::bitset<8> bits;
};

int main() {
    printf("sizeof(Bitfield) = %zd\n", sizeof(Bitfield));
    printf("sizeof(Bitset) = %zd\n", sizeof(Bitset));
    printf("sizeof(std::bitset<1>) = %zd\n", sizeof(std::bitset<1>));
}

produces the following output on my machine:

sizeof(Bitfield) = 1
sizeof(Bitset) = 8
sizeof(std::bitset<1>) = 8

As you see, my compiler allocates a whopping 64 bits to store a single one, with the bitfield approach, I only need to round up to eight bits.

This factor eight in space usage can become important if you have a lot of small bitsets.

Solution 3

Not a great answer here, but rather a related anecdote:

A few years ago I was working on real-time software and we ran into scheduling problems. There was a module which was way over time-budget, and this was very surprising because the module was only responsible for some mapping and packing/unpacking of bits into/from 32-bit words.

It turned out that the module was using std::bitset. We replaced this with manual operations and the execution time decreased from 3 milliseconds to 25 microseconds. That was a significant performance issue and a significant improvement.

The point is, the performance issues caused by this class can be very real.

Solution 4

Rhetorical question: Why std::bitset is written in that inefficacy way? Answer: It is not.

Another rhetorical question: What is difference between:

std::bitset<128> a = src;
a[i] = true;
a = a << 64;

and

std::bitset<129> a = src;
a[i] = true;
a = a << 63;

Answer: 50 times difference in performance http://quick-bench.com/iRokweQ6JqF2Il-T-9JSmR0bdyw

You need be very careful what you ask for, bitset support lot of things but each have it own cost. With correct handling you will have exactly same behavior as raw code:

void f(std::bitset<64>& b, int i)
{
    b |= 1L << i;
    b = b << 15;
}
void f(unsigned long& b, int i)
{
    b |= 1L << i;
    b = b << 15;
}

Both generate same assembly: https://godbolt.org/g/PUUUyd (64 bit GCC)

Another thing is that bitset is more portable but this have cost too:

void h(std::bitset<64>& b, unsigned i)
{
    b = b << i;
}
void h(unsigned long& b, unsigned i)
{
    b = b << i;
}

If i > 64 then bit set will be zero and in case of unsigned we have UB.

void h(std::bitset<64>& b, unsigned i)
{
    if (i < 64) b = b << i;
}
void h(unsigned long& b, unsigned i)
{
    if (i < 64) b = b << i;
}

With check preventing UB both generate same code.

Another place is set and [], first one is safe and mean you will never get UB but this will cost you a branch. [] have UB if you use wrong value but is fast as using var |= 1L<< i;. Of corse if std::bitset do not need have more bits than biggest int available on system because other wise you need split value to get correct element in internal table. This mean for std::bitset<N> size N is very important for performance. If is bigger or smaller than optimal one you will pay cost of it.

Overall I find that best way is use something like that:

constexpr size_t minBitSet = sizeof(std::bitset<1>)*8;

template<size_t N>
using fasterBitSet = std::bitset<minBitSet * ((N  + minBitSet - 1) / minBitSet)>;

This will remove cost of trimming exceeding bits: http://quick-bench.com/Di1tE0vyhFNQERvucAHLaOgucAY

View more solutions

38,089

Author by

quant

I'm here to learn programming and *nix mostly, and am immensely indebted to this community for always being there when I get stuck. The help and support I've received on SE sites has allowed me to develop my understanding in a range of disciplines much faster than I could ever have hoped for. I try to give back once in a while as well; it's the least I could do. I'm the owner and maintainer of www.portnovel.com

Updated on November 05, 2020

Comments

quant over 3 years

I recently asked a question on Programmers regarding reasons to use manual bit manipulation of primitive types over std::bitset.

From that discussion I have concluded that the main reason is its comparatively poorer performance, although I'm not aware of any measured basis for this opinion. So next question is:

what is the performance hit, if any, likely to be incurred by using std::bitset over bit-manipulation of a primitive?

The question is intentionally broad, because after looking online I haven't been able to find anything, so I'll take what I can get. Basically I'm after a resource that provides some profiling of std::bitset vs 'pre-bitset' alternatives to the same problems on some common machine architecture using GCC, Clang and/or VC++. There is a very comprehensive paper which attempts to answer this question for bit vectors:

http://www.cs.up.ac.za/cs/vpieterse/pub/PieterseEtAl_SAICSIT2010.pdf

Unfortunately, it either predates or considered out of scope std::bitset, so it focuses on vectors/dynamic array implementations instead.

I really just want to know whether std::bitset is better than the alternatives for the use cases it is intended to solve. I already know that it is easier and clearer than bit-fiddling on an integer, but is it as fast?
sp2danny over 6 years

a single data point doesn't let you assess the asymptotic cost. is it linear? quadratic? something else?
user1319829 about 6 years

What compiler was that?
Stewart about 6 years

msvc 12 I think from Visual Studio 2008
moongoal over 4 years

minBitSet * ((N + minBitSet - 1) / minBitSet) == N + minBitSet - 1
Yankes over 4 years

@AlQafir / Cause value to be crop, this mean this equation is not true. Left side is always minBitSet * k where both numbers are integers, but right side can have any value you want, like 13 + 32 - 1. And I want 32 * k
moongoal over 4 years

Now I see what you did there. Thanks for explaining!