Converting bits to bytes in Python


Solution 1

The simplest approach is to consume the bits in chunks of 8 and treat running out of input (StopIteration) as zero padding:

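def getbytes(bits):
    done = False
    while not done:
        byte = 0
        for i in range(8):
            try:
                bit = next(bits)
            except StopIteration:
                if i == 0:
                    return  # input ended on a byte boundary: no extra zero byte
                bit = 0     # pad the final byte with zeros
                done = True
            byte = (byte << 1) | bit
        yield byte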

Usage:

lst = [1,0,0,0,0,0,0,0,1]
for b in getbytes(iter(lst)):
    print(b)

getbytes is a generator and accepts any iterator, so it works fine with large and potentially infinite streams.
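As a rough illustration of the streaming claim, the sketch below feeds a chunking generator an endless bit source and takes only the first few bytes with itertools.islice. The names pack_bits and alternating_bits are made up for this example, and pack_bits is a variant written for the sketch rather than the code from the answer.

```python
from itertools import cycle, islice

def pack_bits(bits):
    """Yield one int (0-255) per 8 bits, most significant bit first."""
    bits = iter(bits)                    # accept any iterable, not only iterators
    while True:
        chunk = list(islice(bits, 8))    # take up to 8 bits lazily
        if not chunk:
            return                       # input exhausted on a byte boundary
        byte = 0
        for bit in chunk:
            byte = (byte << 1) | bit
        yield byte << (8 - len(chunk))   # zero-pad a short final chunk

# Hypothetical endless source: 1, 0, 1, 0, ...
alternating_bits = cycle([1, 0])

# Only three bytes are ever computed, so the infinite input is not a problem.
first_three = list(islice(pack_bits(alternating_bits), 3))
print(first_three)  # [170, 170, 170]
```

Because islice stops pulling from the generator after three items, the endless source is never consumed beyond 24 bits.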

Solution 2

Step 1: Pad the end with zeros so the length is a multiple of 8

Step 2: Keep the bits in their original order, since the first bit is the most significant one

Step 3: Concatenate into a single string

Step 4: Convert 8 characters at a time into an integer and save it in a list

Step 5: ???

Step 6: Profit

def bitsToBytes(a):
    a = list(a) + [0] * (-len(a) % 8) # pad the end with 0 values up to a multiple of 8 bits
    s = ''.join(str(int(x)) for x in a) # joins all bits (int() also handles booleans)
    returnInts = []
    for i in range(0, len(s), 8):
        returnInts.append(int(s[i:i+8], 2)) # goes 8 bits at a time to save as ints
    return returnInts

Solution 3

Using the grouper() recipe from the itertools documentation:

from functools import reduce
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

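# bits is any iterable of 0/1 values; the name byte_values avoids shadowing the built-in bytes
byte_values = [reduce(lambda byte, bit: (byte << 1) | bit, eight_bits)
               for eight_bits in grouper(bits, 8, fillvalue=0)]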

Example

[] -> []
[1] -> [128]
[1, 1] -> [192]
[1, 0, 0, 0, 0, 0, 0, 0, 1] -> [128, 128]

If input is a string then a specialized solution might be faster:

>>> bits = '100000001'
>>> padded_bits = bits + '0' * (8 - len(bits) % 8)
>>> padded_bits
'1000000010000000'
>>> list(int(padded_bits, 2).to_bytes(len(padded_bits) // 8, 'big'))
[128, 128]

Note that the padding above appends a full extra zero byte when len(bits) % 8 == 0.
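A minimal sketch of the same idea wrapped in a function, assuming the input is a str of '0' and '1' characters. The name bitstring_to_bytes is hypothetical. It pads with -len(bits) % 8 zeros, so no extra zero byte appears when the length is already a multiple of 8, and it handles the empty string separately because int('', 2) raises ValueError.

```python
def bitstring_to_bytes(bits):
    """Convert a string of '0'/'1' characters to bytes, zero-padded on the right."""
    if not bits:
        return b""                           # int('', 2) would raise ValueError
    padded = bits + "0" * (-len(bits) % 8)   # pad only when needed
    return int(padded, 2).to_bytes(len(padded) // 8, "big")

print(list(bitstring_to_bytes("100000001")))  # [128, 128]
print(list(bitstring_to_bytes("10000000")))   # [128]
```

Passing the byte count explicitly to to_bytes keeps leading zero bits, which int() alone would drop.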

Author by

Admin

Updated on June 04, 2022

Comments

  • Admin
    Admin almost 2 years

    I am trying to convert a bit string into a byte string, in Python 3.x. In each byte, bits are filled from high order to low order. The last byte is filled with zeros if necessary. The bit string is initially stored as a "collection" of booleans or integers (0 or 1), and I want to return a "collection" of integers in the range 0-255. By collection, I mean a list or a similar object, but not a character string: for example, the function below returns a generator.

    So far, the fastest I am able to get is the following:

    def bitsToBytes(a):
        s = i = 0
        for x in a:
            s += s + x
            i += 1
            if i == 8:
                yield s
                s = i = 0
        if i > 0:
            yield s << (8 - i)
    

    I have tried several alternatives: using enumerate, building a list instead of a generator, computing s by "(s << 1) | x" instead of the sum, and everything seems to be a bit slower. Since this solution is also one of the shortest and simplest I found, I am rather happy with it.

    However, I would like to know if there is a faster solution. Especially, is there a library routine that would do the job much faster, preferably in the standard library?


    Example input/output

    [] -> []
    [1] -> [128]
    [1,1] -> [192]
    [1,0,0,0,0,0,0,0,1] -> [128,128]
    

    Here I show the examples with lists. Generators would be fine. However, strings would not, since it would then be necessary to convert back and forth between list-like data and strings.

  • Admin
    Admin over 9 years
    Thanks for this very interesting "grouper", and for int.to_bytes that I didn't even know! Actually, your first solution is a bit slower here, but your second is more than 30% faster than mine, and it is a one-liner. And you are right in your comment above, it's fine to return a byte string, they are very easy to use later for file io. Using a list as input, I do this: int("".join("01"[x] for x in data) + "0"*k, 2).to_bytes(n, "big") where k and n are computed as in your example.
  • Admin
    Admin over 9 years
    And even faster with int("".join(map("01".__getitem__, data)) + "0"*k, 2).to_bytes(n, "big")
  • jfs
    jfs over 9 years
    @Jean-ClaudeArbaut: You could post an answer with time comparisons and the ''.join-based solution. Note: the grouper()-based solution may convert an infinite bits stream into an infinite bytes stream (just replace [] with () to convert a list into a generator) -- its purpose is to accept arbitrary "collections" of bits. Different solutions may be faster for different types of input (bits sequence, iterable, a bytestring), and input sizes (small, large).
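
The two ideas from these comments can be sketched side by side: the ''.join-based conversion for a list of bits, and a lazy variant of the grouper()-based solution. The function names, the way k and n are computed, and the small timing harness are assumptions made for illustration. Timings depend on the machine and the input size, so none are quoted here.

```python
from functools import reduce
from itertools import zip_longest
import timeit

def join_bits_to_bytes(data):
    """''.join-based conversion; data is a sequence of 0/1 ints or booleans."""
    if not data:
        return b""                       # int('', 2) would raise ValueError
    k = -len(data) % 8                   # zero bits needed to fill the last byte
    n = (len(data) + k) // 8             # number of output bytes
    return int("".join(map("01".__getitem__, data)) + "0" * k, 2).to_bytes(n, "big")

def lazy_bits_to_bytes(bits):
    """grouper()-style conversion as a generator; works on any iterable of bits."""
    args = [iter(bits)] * 8              # the same iterator repeated 8 times
    return (reduce(lambda byte, bit: (byte << 1) | bit, eight_bits)
            for eight_bits in zip_longest(*args, fillvalue=0))

def time_both(n_bits=10_000, repeat=20):
    """Return (join_seconds, lazy_seconds) for one hypothetical test input."""
    data = [1, 0, 0, 1, 1] * (n_bits // 5)
    t_join = timeit.timeit(lambda: join_bits_to_bytes(data), number=repeat)
    t_lazy = timeit.timeit(lambda: list(lazy_bits_to_bytes(data)), number=repeat)
    return t_join, t_lazy

if __name__ == "__main__":
    sample = [1, 0, 0, 0, 0, 0, 0, 0, 1]
    print(list(join_bits_to_bytes(sample)))   # [128, 128]
    print(list(lazy_bits_to_bytes(sample)))   # [128, 128]
    print(time_both())                        # two timings in seconds
```

The join-based version needs len(), so it only suits sequences. The generator version accepts any iterable, including an endless one, at the cost of a Python-level reduce per byte.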