How to read a file byte by byte in Python and how to print a bytelist as a binary?

125,660

Solution 1

To answer the second part of your question, to convert to binary you can use a format string and the ord function:

>>> byte = 'a'
>>> '{0:08b}'.format(ord(byte))
'01100001'

Note that the format pads with the right number of leading zeros, which seems to be your requirement. This method needs Python 2.6 or later.

Solution 2

To read one byte:

file.read(1)

8 bits is one byte.

Solution 3

The code you've shown will read 8 bytes. You could use

with open(filename, 'rb') as f:
   while 1:
      byte_s = f.read(1)
      if not byte_s:
         break
      byte = byte_s[0]
      ...

Solution 4

There's a python module especially made for reading and writing to and from binary encoded data called 'struct'. Since versions of Python under 2.6 doesn't support str.format, a custom method needs to be used to create binary formatted strings.

import struct

# binary string
def bstr(n): # n in range 0-255
    return ''.join([str(n >> x & 1) for x in (7,6,5,4,3,2,1,0)])

# read file into an array of binary formatted strings.
def read_binary(path):
    f = open(path,'rb')
    binlist = []
    while True:
        bin = struct.unpack('B',f.read(1))[0] # B stands for unsigned char (8 bits)
        if not bin:
            break
        strBin = bstr(bin)
        binlist.append(strBin)
    return binlist
Share:
125,660
zaplec
Author by

zaplec

Updated on May 27, 2020

Comments

  • zaplec
    zaplec about 4 years

    I'm trying to read a file byte by byte, but I'm not sure how to do that. I'm trying to do it like that:

    file = open(filename, 'rb')
    while 1:
       byte = file.read(8)
       # Do something...
    

    So does that make the variable byte to contain 8 next bits at the beginning of every loop? It doesn't matter what those bytes really are. The only thing that matters is that I need to read a file in 8-bit stacks.

    EDIT:

    Also I collect those bytes in a list and I would like to print them so that they don't print out as ASCII characters, but as raw bytes i.e. when I print that bytelist it gives the result as

    ['10010101', '00011100', .... ]