Manipulating binary data in Python


Solution 1

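To print it without sending the raw control bytes to the terminal, print its escaped representation (the snippets in this answer use Python 2 syntax):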

print repr(data)

For the whole thing as hex:

print data.encode('hex')

For the decimal value of each byte:

print ' '.join([str(ord(a)) for a in data])
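The three snippets above are Python 2. A rough Python 3 equivalent might look like the sketch below, assuming the file was read in 'rb' mode so that the value is a bytes object; the sample value of data is made up for illustration.

```python
# Python 3 sketch. `data` is a hypothetical stand-in for bytes read from the file.
data = b'\xbe\x00\xc8d\xf8d\x08\xe4.\x07~\x03'

escaped = repr(data)                             # escaped text, safe to print
hex_text = data.hex()                            # the whole value as hex digits
decimal_text = ' '.join(str(b) for b in data)    # decimal value of each byte

print(escaped)
print(hex_text)
print(decimal_text)
```

In Python 3, iterating over a bytes object already yields integers, so ord is not needed there.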

To unpack binary integers, etc. from the data as if they originally came from a C-style struct, look at the struct module.
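As a minimal sketch of what that looks like in Python 3: the layout below (an unsigned 16-bit integer followed by a signed 32-bit integer, both little-endian) is purely hypothetical, since the real format of the file is not known.

```python
import struct

# Hypothetical layout, for illustration only:
#   '<' little-endian, no padding
#   'H' unsigned 16-bit integer
#   'i' signed 32-bit integer
record_format = '<Hi'
record = b'\xbe\x00\xc8\x64\xf8\x64'   # 6 bytes, matching the format size

count, value = struct.unpack(record_format, record)
print(struct.calcsize(record_format), count, value)
```

struct.unpack raises struct.error if the buffer length does not match the format, which is a useful early warning that the guessed layout is wrong.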

Solution 2

\xhh is the character with hex value hh. Other characters, such as . and ~, are ordinary printable characters that are shown as themselves.

Iterating on a string gives you the characters in it, one at a time.

ord(c) will return an integer representing the character. E.g., ord('A') == 65.

This will print the decimal numbers for each character:

s = '\xbe\x00\xc8d\xf8d\x08\xe4.\x07~\x03\x9e\x07\xbe\x03\xde\x07\xfe\n'
print ' '.join(str(ord(c)) for c in s)
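The escape notation can be checked directly. This is a small Python 3 sketch (bytes literals rather than Python 2 strings); the variable names are only for illustration.

```python
# \xhh and a printable character are two spellings of the same byte value.
same_byte = (b'\x64' == b'd')    # 0x64 is the code for 'd'

dot_value = ord('.')             # printable characters have codes too
tilde_value = b'~'[0]            # indexing bytes gives an int in Python 3
bell_value = b'\x07'[0]          # the BEL control character

print(same_byte, dot_value, tilde_value, bell_value)
```

This is why the sample line mixes \x escapes with plain characters such as d, . and ~: repr only escapes the bytes that have no printable form.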

Solution 3

Binary data is rarely divided into "lines" separated by '\n'. If it is, it will have an implicit or explicit escape mechanism to distinguish between '\n' as a line terminator and '\n' as part of the data. Reading such a file as lines blindly without knowledge of the escape mechanism is pointless.
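To illustrate the point, here is a Python 3 sketch that reads fixed-size chunks instead of lines. The 4-byte record size and the helper name read_records are assumptions made up for the example, and an in-memory stream stands in for the real file.

```python
import io

def read_records(stream, record_size):
    """Yield consecutive chunks of record_size bytes from a binary stream.

    The final chunk may be shorter if the data is not a whole number of records.
    """
    while True:
        chunk = stream.read(record_size)
        if not chunk:
            break
        yield chunk

# Stand-in for open("test/test.x", "rb"); note the 0x0a byte inside the data.
raw = b'\xbe\x00\xc8d\xf8\n\x08\xe4.\x07~\x03'

records = list(read_records(io.BytesIO(raw), 4))   # hypothetical 4-byte records
lines = io.BytesIO(raw).readlines()                # splits wherever 0x0a occurs

print(records)
print(lines)
```

Reading by lines cuts the second record in half at the embedded '\n', while reading by record size keeps every record intact.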

To answer your specific concerns:

'\x07' is the ASCII BEL character, which was originally for ringing the bell on a teletype machine.

You can get the integer value of a byte 'b' by doing ord(b).

HOWEVER, to process binary data properly, you need to know what the layout is. You can have signed and unsigned integers (of sizes 1, 2, 4, or 8 bytes), floating-point numbers, decimal numbers of varying lengths, fixed-length strings, variable-length strings, and so on. A further complication is whether the data is recorded in big-endian or little-endian byte order. Once you know all of the above (or have well-informed guesses), the Python struct module should handle all or most of your processing; the ctypes module may also be useful.
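A short Python 3 sketch of why the layout matters: the same four bytes (taken from the start of the sample line) give different numbers depending on the assumed size, signedness, and byte order. None of these interpretations is claimed to be the right one for the file in question.

```python
import struct

raw = b'\xbe\x00\xc8\x64'   # first four bytes of the sample line

little_u16 = struct.unpack('<2H', raw)     # two unsigned 16-bit, little-endian
big_u16 = struct.unpack('>2H', raw)        # two unsigned 16-bit, big-endian
big_i16 = struct.unpack('>2h', raw)        # two signed 16-bit, big-endian
little_u32 = struct.unpack('<I', raw)[0]   # one unsigned 32-bit, little-endian

print(little_u16, big_u16, big_i16, little_u32)
```

All four calls succeed, so struct cannot tell you which reading is correct; only knowledge of the format can.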

Does the data format have a name? If so, tell us; we may be able to point you to code or docs.

You ask "How do I go about using this data safely?" which begs the question: What do you want to use it for? What manipulations do you want to do?

Solution 4

As theatrus mentioned, ord and hex might help you. If you want to interpret structured binary data in the file, the struct module might be helpful.

Solution 5

You are trying to print the data converted to ASCII characters, which will not work.

You can safely use any byte of the data. If you want to print it as hexadecimal, look at the functions ord and hex.
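A minimal Python 3 sketch of the ord and hex approach, using a few made-up sample bytes; format is used as well because hex does not pad to two digits.

```python
data = b'\x07~\x03\x9e'   # hypothetical sample bytes

# ord accepts a length-1 bytes object; hex turns the integer into text.
first_as_hex = hex(ord(data[0:1]))

# Iterating bytes yields ints in Python 3, so hex can be applied directly.
hex_values = [hex(b) for b in data]

# Zero-padded, two digits per byte, which is easier to read in a dump.
padded = ' '.join(format(b, '02x') for b in data)

print(first_as_hex, hex_values, padded)
```

Printing the hex text instead of the raw bytes also avoids sending control characters such as BEL to the terminal.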

Author by

Dominic Bou-Samra

Updated on July 09, 2022

Comments

  • Dominic Bou-Samra
    Dominic Bou-Samra almost 2 years

    I am opening up a binary file like so:

    file = open("test/test.x", 'rb')
    

    and reading in lines to a list. Each line looks a little like:

    '\xbe\x00\xc8d\xf8d\x08\xe4.\x07~\x03\x9e\x07\xbe\x03\xde\x07\xfe\n'
    

    I am having a hard time manipulating this data. If I try to print each line, Python freezes and emits beeping noises (I think there's a binary beep code in there somewhere). How do I go about using this data safely? How can I convert each hex number to decimal?

  • Dominic Bou-Samra
    Dominic Bou-Samra almost 14 years
    Thank you! This is what I was looking for!
  • Noufal Ibrahim
    Noufal Ibrahim almost 14 years
    +1 for struct. Right way to go to interpret packed binary data.
  • dan04
    dan04 almost 14 years
    Note that \x07 is the ASCII BEL character. That's what's causing the beeping.