Python: how to read raw binary from a file? (audio/video/text)

46,284

Solution 1

To get the binary representation, I think you will need to import binascii, then:

byte = f.read(1)
binary_string = bin(int(binascii.hexlify(byte), 16))[2:].zfill(8)

or, broken down:

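import binascii

filePath = "mysong.mp3"
with open(filePath, "rb") as file:
    byte = file.read(1)  # read a single byte
    # hexlify returns bytes on Python 3, so decode it to get plain text
    hexadecimal = binascii.hexlify(byte).decode("ascii")
    decimal = int(hexadecimal, 16)
    binary = bin(decimal)[2:].zfill(8)  # drop the "0b" prefix, pad to 8 bits
    print("hex: %s, decimal: %s, binary: %s" % (hexadecimal, decimal, binary))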

will output (when the first byte of the file is 0x64, the ASCII code for "d"):

hex: 64, decimal: 100, binary: 01100100
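The same idea extends from one byte to a whole file. Below is a minimal sketch, not part of the original answer: the helper names `bytes_to_bitstring` and `file_to_bitstring` and the `chunk_size` parameter are hypothetical, and it uses the built-in `format(b, "08b")` rather than `binascii`.

```python
def bytes_to_bitstring(data):
    """Return a string of '0'/'1' characters, 8 per input byte."""
    # Iterating over a bytearray yields integers in the range 0..255.
    return "".join(format(b, "08b") for b in bytearray(data))


def file_to_bitstring(path, chunk_size=4096):
    """Read a file in binary mode and return its contents as a bit string."""
    parts = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # an empty result means end of file
                break
            parts.append(bytes_to_bitstring(chunk))
    return "".join(parts)


print(bytes_to_bitstring(b"d"))  # 01100100
```

Note that the resulting string has eight characters per byte of input, so for large audio or video files it may be better to process the chunks one at a time instead of joining them all.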

Solution 2

What you are reading really IS the "raw binary" content of your "binary" file. Strange as it might seem, binary data is not "0's and 1's" but binary words (aka bytes, cf. http://en.wikipedia.org/wiki/Byte), each of which has an integer value (0 to 255 for an 8-bit byte) and can be interpreted as an ASCII character. Or as an integer (which is how one usually does binary operations). Or as hexadecimal. For what it's worth, "text" is actually "raw binary data" too.
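To illustrate the point, here is a small sketch (assuming Python 3; the variable names are made up for the example) that takes one byte, as `file.read(1)` would return it, and shows it as a decimal integer, as hexadecimal, as a "0's and 1's" string, and as a character. The data is the same in every case; only the representation differs.

```python
data = b"d"                  # one byte, as returned by file.read(1)
value = bytearray(data)[0]   # the byte as an integer in the range 0..255

as_decimal = str(value)           # base-10 text
as_hex = format(value, "02x")     # base-16 text, two digits
as_binary = format(value, "08b")  # base-2 text, eight digits
as_char = chr(value)              # the character with that code point

print(as_decimal, as_hex, as_binary, as_char)  # 100 64 01100100 d
```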

To get a "binary" representation you can have a look at the question "Convert binary to ASCII and vice versa", but that's not going to give you more "raw binary data" than what you actually have...

Now the question: why exactly do you want this data as "0's and 1's"?

Author by

user2803250

Updated on March 30, 2020

Comments

  • user2803250
    user2803250 about 4 years

    I want to read the raw binary of a file and put it into a string. Currently I am opening a file with the "rb" flag and printing the byte, but it's coming up as ASCII characters (for text, that is; for video and audio files it's giving symbols and gibberish). I'd like to get the raw 0's and 1's if possible. This needs to work for audio and video files as well, so simply converting the ASCII to binary isn't an option.

    with open(filePath, "rb") as file:
        byte = file.read(1)
        print byte
    
  • bruno desthuilliers
    bruno desthuilliers over 10 years
    Note to the OP: please understand the difference between "raw data" and "binary representation".
  • wombatonfire
    wombatonfire about 8 years
    binascii is not needed here. When working with 1 byte, we can use ord() to get an integer ordinal and then convert it with hex() or bin(). But for multibyte values, binascii.hexlify() can be handy, as it will convert the whole byte string at once.
  • jfs
    jfs over 7 years
    To be crystal clear: raw_binary_data = open(filename, "rb").read(). It is unrelated to "01"-strings, which contain the ASCII characters '0' and '1' representing the data in the binary numeral system (base 2 is a positional notation with a radix of 2): b'\x0d'[0] == 0x0d == 13 == 0b1101 == int('1101', 2) (b'\x0d'[0] is a Python 3 expression; use ord('\x0d') on Python 2), but b'\x0d' != b'1101' (len(b'\x0d') == 1 and len(b'1101') == 4), and b'1101' == b'\x31\x31\x30\x31'.
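The points made in the comments above can be checked directly. A small sketch, assuming Python 3 (the names `single`, `n` and `multi` are only for illustration):

```python
import binascii

single = b"\x0d"

# One byte: ord() gives the integer value, hex() and bin() format it.
n = ord(single)
print(n, hex(n), bin(n))  # 13 0xd 0b1101

# The same integer written in different notations.
assert single[0] == 0x0d == 13 == 0b1101 == int("1101", 2)

# A "01"-string is different data: four ASCII characters, not one byte.
assert single != b"1101"
assert len(single) == 1 and len(b"1101") == 4
assert b"1101" == b"\x31\x31\x30\x31"

# Several bytes at once: hexlify converts the whole byte string.
multi = b"\x0d\xff"
print(binascii.hexlify(multi))  # b'0dff'
```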