Efficiently generating random bytes of data in C++11/14

22,367

Solution 1

Distributions take random bits and turn them into numbers. If you actually want random bits then you want to use an engine:

In particular, those requirements specify the algorithmic interface for types and objects that produce sequences of bits in which each possible bit value is uniformly likely.3

A single call to a URNG object is allowed to produce and deliver many (typically 32 or more) bits, returning these bits as a single packaged value of an unsigned integer type.4 N3847

random_device happens to be specified such that accessing uniformly distributed bits is easy:

std::random_device engine;
unsigned x = engine(); // sizeof(unsigned) * CHAR_BIT random bits

Note that other engines may not make it quite as easy to get uniformly random bits as random_device, due to returning fewer bits than their result_type can hold or even by effectively returning fractional bits.

If your concern is that unsigned's size is implementation defined and so random_device returns an implementation defined number of bits, you can write an adapter that either collects enough bits before giving them to you, or one that will give you just enough bits and cache the rest for your next request. (You can also do this to handle other engines which exhibit the previously mentioned issues.)

Solution 2

What you're looking for is the std::independent_bits_engine adaptor:

#include <vector>
#include <random>
#include <climits>
#include <algorithm>
#include <functional>

using random_bytes_engine = std::independent_bits_engine<
    std::default_random_engine, CHAR_BIT, unsigned char>;

int main()
{
    random_bytes_engine rbe;
    std::vector<unsigned char> data(1000);
    std::generate(begin(data), end(data), std::ref(rbe));
}

Note that the accepted answer is not strictly correct in a general case – random engines produce unsigned values belonging to a range [min(), max()], which doesn't necessarily cover all possible values of the result type (for instance, std::minstd_rand0::min() == 1) and thus you may get random bytes that are not uniformly distributed if using an engine directly. However, for std::random_device the range is [std::numeric_limits<result_type>::min(), std::numeric_limits<result_type>::max()], so this particular engine would also work well without the adaptor.

Solution 3

To answer your question: You can't.

The standard does not allow std::uniform_int_distribution to be templated on char, signed char, or unsigned char. Some believe that this is a defect in the standard, but it is the case.

You can simply template std::uniform_int_distribution on unsigned short, and set its min/max range to std::numeric_limits<unsigned char>::min() and std::numeric_limits<unsigned char>::max(), and then simply assign the result to an unsigned char.

From the standard:

Throughout this subclause 26.5, the effect of instantiating a template:

[...]

e) that has a template type parameter named IntType is undefined unless the corresponding template argument is cv-unqualified and is one of short, int, long, long long, unsigned short, unsigned int, unsigned long, or unsigned long long.

§26.5.1.1 [rand.req.genl]

Moreover:

You should use std::mt19937 to actually generate your random bytes. std::random_device is liable to be slow, and likely produces entropy with statistical properties (i.e. suitability for use in cryptography) that you don't need.

That said, you will need to seed your std::mt19937. You can do this with a std::random_device and a std::seed_seq.

Note that if you don't use a std::seed_seq to seed your std::mt19937, your std::mt19937 will be left with many, many zeroes in its internal state, and it will therefore take it quite a while to "warm up".

For more information on "warm up", see here.

Share:
22,367

Related videos on Youtube

Gerdiner
Author by

Gerdiner

Updated on July 19, 2021

Comments

  • Gerdiner
    Gerdiner almost 3 years

    My requirement is to generate random bytes of data (not random numbers) aka uniformly distributed bits.

    As such I was wondering what are the correct/efficient ways of doing this using C++11/14 random facilities. I've had a look around at the examples, but they all seem to focus on number generation (ints, floats etc)

    Current solution I'm using is the following:

    #include <vector>
    #include <random>
    
    int main()
    {
       std::random_device rd;
       std::uniform_int_distribution<int> dist(0,255);
       std::vector<char> data(1000);
       for (char& d : data)
       {
          d = static_cast<char>(dist(rd) & 0xFF);
       }
       return 0;
    }
    
    • soandos
      soandos over 9 years
      Whats the difference between a random int, and four random btyes?
    • Gerdiner
      Gerdiner over 9 years
      Nothing per say, so is the best way to generate 4 or 8 bytes at a time - and copy into the buffer 4/8 bytes per generation?
    • Cornstalks
      Cornstalks over 9 years
      That's what I would do. Then there's no need for a distribution (which adds processing overhead).
    • Captain Obvlious
      Captain Obvlious over 9 years
      ...and use the largest integral type available on the target platform.
    • Gerdiner
      Gerdiner over 9 years
      @Cornstalks CaptainOblivious: I would like to use random_device directly, but the size of the return type seem to be implementation defined.
    • Robert Allan Hennigan Leahy
      Robert Allan Hennigan Leahy over 9 years
      @soandos: Who's to say that int is four bytes? The standard certainly doesn't.
    • soandos
      soandos over 9 years
      @RobertAllanHenniganLeahy, True, but not so material here. The point is random int = multiple random bytes
    • Cornstalks
      Cornstalks over 9 years
      @Gerdiner: The size is implementation defined, but just use sizeof() to know how many bytes it is...
    • Rapptz
      Rapptz over 9 years
      @RobertAllanHenniganLeahy int_least32_t.
    • Shafik Yaghmour
      Shafik Yaghmour almost 9 years
  • Shafik Yaghmour
    Shafik Yaghmour over 9 years
    I personally would have pointed out the undefined behavior and given Bryan time to fix it before downvoting, it was not an horrible answer, but that is just me.
  • Lightness Races in Orbit
    Lightness Races in Orbit over 9 years
    I'd like to see this defect fixed. Was it already reported/discussed somewhere?
  • Geoff Romer
    Geoff Romer about 9 years
    The paper you quote is wrong (or at least misleading) on this point: URNGs are not required to fill their output with random bits, so this answer is not correct. To take an extreme example, if G::min() == 0 and G::max == 2, none of the output bits will have an equal probability of being 1 or 0. Real-life URNGs can and do take advantage of this license, so it is not safe to assume that you can get random bits from a URNG for exactly the reasons that are detailed in the paper you cite.
  • bames53
    bames53 about 9 years
    @GeoffRomer I suppose it's fair to say it's misleading; you can't just concatonate the integers returned from engines and treat the result like a bitstream. Engines with non-power of two ranges can be viewed simply returning fractional bits and in such cases recovering a random bit stream takes a bit more work. It's probably better to simply use an engine that does have a power of two range, as engines which don't, don't provide any particular advantage in return for imposing that cost.
  • bames53
    bames53 about 9 years
    @GeoffRomer It's also true that my answer is taking advantage of the fact that random_device is specified by C++ to fill its entire result value and not return any fractional bits, but I don't feel that there's anything unreasonable about such an implementation.
  • Geoff Romer
    Geoff Romer about 9 years
    Fair enough, but your answer gives the strong impression that this will work for any URNG, not just random_device.
  • Geoff Romer
    Geoff Romer about 9 years
    This defect has been reported as LWG 2326.
  • Geoff Romer
    Geoff Romer about 9 years
    Note: as @robert-allan-hennigan-leahy observed in another answer, random_bytes_engine doesn't technically support unsigned char, but this is probably a defect in the standard.
  • Andreas Yankopolus
    Andreas Yankopolus over 7 years
    What's a good way to seed std::independent_bits_engine<> from /dev/random?
  • BeeOnRope
    BeeOnRope about 7 years
    ... the new link to the closed defect. It was decided that it was a feature request.
  • BeeOnRope
    BeeOnRope about 7 years
    @AndreasYankopolus - that's a good question which could standalone on SO.
  • David
    David over 6 years
    Note that this is very inefficient. Independent bits engine will generate a new random for each byte, but the random it generates probably fills 32 or 64 bits, so this is 4 or 8x the required underlying rng engine calls. You'd basically have to reimplement independent_bits_engine in another form to do better, which is hard.