Converting bits to bytes in Python
Solution 1
The simplest tactic is to consume bits in chunks of 8 and treat the end of input as zero padding:
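def getbytes(bits):
    done = False
    while not done:
        byte = 0
        count = 0  # number of real (non-padding) bits in this byte
        for _ in range(8):
            try:
                bit = next(bits)
                count += 1
            except StopIteration:
                bit = 0  # pad the last byte with zeros
                done = True
            byte = (byte << 1) | bit
        if count:  # skip a byte that would consist only of padding
            yield byte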
Usage:
lst = [1,0,0,0,0,0,0,0,1]
for b in getbytes(iter(lst)):
    print(b)
getbytes is a generator and accepts a generator, that is, it works fine with large and potentially infinite streams.
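To illustrate the streaming claim, here is a minimal sketch that feeds an endless bit source through a byte-packing generator. The name pack_bits and the islice-based chunking are assumptions made for this example, not the code from the answer above.

```python
from itertools import cycle, islice

def pack_bits(bits):
    """Pack an iterable of 0/1 bits into byte values, most significant bit first.

    Hypothetical helper for illustration; it reads the input 8 bits at a time.
    """
    bits = iter(bits)
    while True:
        chunk = list(islice(bits, 8))  # take up to 8 bits without reading further
        if not chunk:
            return  # input exhausted on a byte boundary
        byte = 0
        for bit in chunk:
            byte = (byte << 1) | bit
        # Shift left so a short final chunk is zero-padded on the right.
        yield byte << (8 - len(chunk))

# An endless 1, 0, 1, 0, ... stream: only the bits needed for three bytes are read.
first_three = list(islice(pack_bits(cycle([1, 0])), 3))
print(first_three)  # [170, 170, 170]
```

Because both the input and the output are lazy, memory use stays constant no matter how long the stream is.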
Solution 2
Step 1: Add in buffer zeros
Step 2: Reverse bits since your endianness is reversed
Step 3: Concatenate into a single string
Step 4: Save off 8 bits at a time into an array
Step 5: ???
Step 6: Profit
def bitsToBytes(a):
    a = [0] * (-len(a) % 8) + a  # pad with zeros up to a multiple of 8 bits
    s = ''.join(str(x) for x in a)[::-1]  # join all bits into one string and reverse it
    returnInts = []
    for i in range(0, len(s), 8):
        returnInts.append(int(s[i:i+8], 2))  # convert 8 bits at a time to an int
    return returnInts
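The steps above can be traced on a small input. This is only a sketch: the variable names are made up for the example, and the padding length is assumed to be -len(bits) % 8 so that an already aligned input gets no extra zeros.

```python
bits = [1, 1]

# Step 1: zero-pad at the front up to a multiple of 8 (assumed rule: -len % 8).
padded = [0] * (-len(bits) % 8) + bits
# Steps 2-3: join into a single string and reverse it.
text = ''.join(str(b) for b in padded)[::-1]
# Step 4: read 8 characters at a time as base-2 integers.
values = [int(text[i:i + 8], 2) for i in range(0, len(text), 8)]

print(padded)  # [0, 0, 0, 0, 0, 0, 1, 1]
print(text)    # 11000000
print(values)  # [192]
```

Note that the reversal flips the order of the whole sequence, so the input is effectively read last bit first: [1, 0] gives [64] rather than [128]. That only matches the desired output when the bits really do arrive in reversed order.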
Solution 3
Using the itertools grouper() recipe:
from functools import reduce
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

bytes = [reduce(lambda byte, bit: byte << 1 | bit, eight_bits)
         for eight_bits in grouper(bits, 8, fillvalue=0)]
Example
[] -> []
[1] -> [128]
[1, 1] -> [192]
[1, 0, 0, 0, 0, 0, 0, 0, 1] -> [128, 128]
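As a quick check, the grouper-based approach can be wrapped in a function and run against the examples listed above. The wrapper name iter_bytes is an assumption for this sketch; it returns a generator expression instead of a list, so it can also be applied to an endless stream.

```python
from functools import reduce
from itertools import cycle, islice, zip_longest

def grouper(iterable, n, fillvalue=None):
    """Collect data into fixed-length chunks, padding the last one with fillvalue."""
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

def iter_bytes(bits):
    """Lazily yield one byte value per group of 8 bits (hypothetical helper name)."""
    return (reduce(lambda byte, bit: byte << 1 | bit, eight_bits)
            for eight_bits in grouper(bits, 8, fillvalue=0))

# The examples listed above.
assert list(iter_bytes([])) == []
assert list(iter_bytes([1])) == [128]
assert list(iter_bytes([1, 1])) == [192]
assert list(iter_bytes([1, 0, 0, 0, 0, 0, 0, 0, 1])) == [128, 128]

# Lazy evaluation: only the bits needed for two bytes are pulled from the endless stream.
two_bytes = list(islice(iter_bytes(cycle([1, 1, 1, 1, 0, 0, 0, 0])), 2))
print(two_bytes)  # [240, 240]
```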
If input is a string then a specialized solution might be faster:
>>> bits = '100000001'
>>> padded_bits = bits + '0' * (8 - len(bits) % 8)
>>> padded_bits
'1000000010000000'
>>> list(int(padded_bits, 2).to_bytes(len(padded_bits) // 8, 'big'))
[128, 128]
Note that this appends an extra zero byte if len(bits) % 8 == 0.
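If the extra zero byte is unwanted, one possible adjustment is to pad by -len(bits) % 8, which is 0 for an already aligned string. The function name bitstring_to_bytes is hypothetical, and the empty-string guard is an addition of this sketch, needed because int('', 2) raises ValueError.

```python
def bitstring_to_bytes(bits):
    """Convert a string of '0'/'1' characters to bytes, zero-padding the last byte."""
    if not bits:
        return b''  # int('', 2) would raise ValueError
    # -len(bits) % 8 is the number of zeros needed to reach a multiple of 8.
    padded = bits + '0' * (-len(bits) % 8)
    return int(padded, 2).to_bytes(len(padded) // 8, 'big')

print(list(bitstring_to_bytes('100000001')))  # [128, 128]
print(list(bitstring_to_bytes('10000000')))   # [128]
```

Leading zero bits are preserved, because the byte count passed to to_bytes is derived from the padded string length rather than from the integer value.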
Admin
Updated on June 04, 2022
Comments
-
Admin almost 2 years
I am trying to convert a bit string into a byte string, in Python 3.x. In each byte, bits are filled from high order to low order. The last byte is filled with zeros if necessary. The bit string is initially stored as a "collection" of booleans or integers (0 or 1), and I want to return a "collection" of integers in the range 0-255. By collection, I mean a list or a similar object, but not a character string: for example, the function below returns a generator.
So far, the fastest I am able to get is the following:
def bitsToBytes(a):
    s = i = 0
    for x in a:
        s += s + x
        i += 1
        if i == 8:
            yield s
            s = i = 0
    if i > 0:
        yield s << (8 - i)
I have tried several alternatives: using enumerate, building a list instead of a generator, computing s by "(s << 1) | x" instead of the sum, and everything seems to be a bit slower. Since this solution is also one of the shortest and simplest I found, I am rather happy with it.
However, I would like to know if there is a faster solution. In particular, is there a library routine that would do the job much faster, preferably in the standard library?
Example input/output
[] -> []
[1] -> [128]
[1,1] -> [192]
[1,0,0,0,0,0,0,0,1] -> [128,128]
Here I show the examples with lists. Generators would be fine. However, strings would not, since it would then be necessary to convert back and forth between list-like data and strings.
-
Admin over 9 years
Thanks for this very interesting "grouper", and for int.to_bytes, which I didn't even know! Actually, your first solution is a bit slower here, but your second is more than 30% faster than mine, with a one-liner. And you are right in your comment above: it's fine to return a byte string, since they are very easy to use later for file I/O. Using a list as input, I do this:
int("".join("01"[x] for x in data) + "0"*k, 2).to_bytes(n, "big")
where k and n are computed as in your example.
-
Admin over 9 years
And even faster with:
int("".join(map("01".__getitem__, data)) + "0"*k, 2).to_bytes(n, "big")
-
jfs over 9 years
@Jean-ClaudeArbaut: You could post an answer with time comparisons and the ''.join-based solution. Note: the grouper()-based solution can convert an infinite bit stream into an infinite byte stream (just replace [] with () to turn the list into a generator); its purpose is to accept arbitrary "collections" of bits. Different solutions may be faster for different types of input (bit sequence, iterable, bytestring) and input sizes (small, large).
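A time comparison of the kind suggested here could be sketched with timeit as follows. The function names, the test input, and the repetition count are arbitrary choices for this example, and k is assumed to be -len(data) % 8 so that aligned input gets no extra byte. Actual timings depend on the machine and the Python version, so none are shown.

```python
import timeit

def bits_to_bytes_gen(a):
    """Generator from the question: accumulate 8 bits, then yield the byte."""
    s = i = 0
    for x in a:
        s += s + x
        i += 1
        if i == 8:
            yield s
            s = i = 0
    if i > 0:
        yield s << (8 - i)

def bits_to_bytes_join(data):
    """''.join-based variant from the comments; returns a bytes object."""
    k = -len(data) % 8        # zero bits needed to fill the last byte (assumed rule)
    n = (len(data) + k) // 8  # number of output bytes
    if n == 0:
        return b""            # int("", 2) would raise ValueError
    return int("".join(map("01".__getitem__, data)) + "0" * k, 2).to_bytes(n, "big")

data = [1, 0, 0, 0, 0, 0, 0, 0, 1] * 1000  # arbitrary test input

# Both approaches must agree before their timings are worth comparing.
assert bytes(bits_to_bytes_gen(data)) == bits_to_bytes_join(data)

t_gen = timeit.timeit(lambda: list(bits_to_bytes_gen(data)), number=100)
t_join = timeit.timeit(lambda: bits_to_bytes_join(data), number=100)
print(f"generator: {t_gen:.4f}s  join: {t_join:.4f}s")
```

As the comment notes, results may differ with the input type and size, so it is worth repeating the measurement with data that resembles the real workload.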