How to read bytes from a file


Solution 1

In Python 3, files are opened in text mode by default, and their contents are decoded using the system's default encoding. You need to open your file in binary mode:

file = open(self.file, 'rb')
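To see why the traceback in the question mentions cp1252, here is a small sketch that writes a few made-up bytes (including 0x81) to a temporary file and reads them back both ways. The encoding is passed explicitly as cp1252 to mimic the Windows default shown in the traceback; on another system the default encoding may differ.

```python
import os
import tempfile

# Made-up sample: 4 zero bytes, then 0x81, which cp1252 has no character for.
SAMPLE = b"\x00\x00\x00\x00\x81"

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(SAMPLE)

try:
    # Text mode decodes while reading; cp1252 mimics the Windows default.
    try:
        with open(path, "r", encoding="cp1252") as f:
            f.read()
        text_ok = True
    except UnicodeDecodeError as exc:
        text_ok = False
        print("text mode:", exc.reason)  # text mode: character maps to <undefined>

    # Binary mode does no decoding and returns a bytes object.
    with open(path, "rb") as f:
        data = f.read()
    print("binary mode:", data)  # binary mode: b'\x00\x00\x00\x00\x81'
finally:
    os.remove(path)
```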

Another problem you will run into is that file.read(4) gives you a bytes object of length 4 (which the hex function doesn't accept), whereas you probably want an integer. For that, refer to int.from_bytes or, more generally, to the struct module. You can then print that number in hexadecimal like so:

mdlength = int.from_bytes(file.read(4), byteorder='big')
print(hex(mdlength))
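A minimal sketch of the same idea on an in-memory stream, assuming the 8-byte header layout described in the question. The byte values are made up, and since the byte order of the .lrf length field isn't documented, it prints both interpretations; the one that yields a plausible metadata length is likely the right one.

```python
import io

# Made-up header: 4 unknown bytes, then a 4-byte length field.
header = io.BytesIO(b"\xde\xad\xbe\xef" + b"\x00\x00\x01\x2c")

header.read(4)                 # skip the 4 unknown bytes
length_bytes = header.read(4)  # a bytes object, not an int

big = int.from_bytes(length_bytes, byteorder="big")
little = int.from_bytes(length_bytes, byteorder="little")

print(big, hex(big))        # 300 0x12c
print(little, hex(little))  # 738263040 0x2c010000
```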

Solution 2

Binary files should be handled in binary mode:

f = open(filename, 'rb')

To skip bytes, I typically use file.seek() (with os.SEEK_CUR or os.SEEK_SET), or I just call file.read(n) and discard the result if I don't want to bother with formality. The only time I really need seeking is when I want to jump to a specific position.
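The three ways of skipping mentioned above can be compared on a made-up 16-byte stream whose byte values equal their offsets. io.BytesIO stands in for a file opened with 'rb'; note that a real file opened in text mode only allows cur-relative seeks with an offset of 0, which is one more reason to use binary mode here.

```python
import io
import os

# Made-up 16-byte "file" containing the values 0, 1, ..., 15.
stream = io.BytesIO(bytes(range(16)))

# Option 1: read and discard.
stream.read(4)
first = stream.read(1)[0]        # position was 4, so this is 4

# Option 2: seek relative to the current position (now 5).
stream.seek(4, os.SEEK_CUR)      # moves to 9
second = stream.read(1)[0]       # 9

# Option 3: jump to an absolute offset.
stream.seek(12, os.SEEK_SET)
third = stream.read(1)[0]        # 12

print(first, second, third, stream.tell())  # 4 9 12 13
```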

To interpret binary data, I just stick to the unpack function provided by the struct module, which makes it easy to specify whether a sequence of bytes should be interpreted as an int, float, char, etc. That's how I've been doing it for years, so there may be more efficient approaches, such as the int.from_bytes method described in other answers.
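As a small illustration of interpreting bytes as an int, a float, and a char with struct, here is a round trip through a made-up 9-byte record. The explicit "<" prefix selects little-endian byte order with standard sizes and no padding; without a prefix, struct uses the platform's native byte order and alignment (so, for example, "cI" is typically 8 bytes natively while "<cI" is 5).

```python
import struct

# "<" means little-endian, standard sizes, no padding; ">" is big-endian.
FMT = "<Ifc"  # unsigned int (4 bytes), float (4 bytes), char (1 byte)

# Build a made-up record, then decode it again.
raw = struct.pack(FMT, 300, 1.5, b"A")
number, value, char = struct.unpack(FMT, raw)

print(struct.calcsize(FMT), raw.hex())  # 9 2c0100000000c03f41
print(number, value, char)              # 300 1.5 b'A'
```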

With the struct module you can do things like

struct.unpack("3I", f.read(12))

This reads three unsigned integers at once. So, for example, given the format you've reverse-engineered, I would probably just write:

unk, size = struct.unpack("2I", f.read(8))
data = f.read(size)
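Extending that snippet to the whole header from the question might look like the sketch below. The helper name read_lrf_metadata is hypothetical, and two things are assumptions rather than known facts about .lrf files: that the length is little-endian (swap "<" for ">" if not; a bare "2I" would use native byte order) and that the JSON metadata is UTF-8 encoded. The length checks guard against truncated files, since f.read(n) can silently return fewer than n bytes.

```python
import io
import json
import struct

# ASSUMPTION: little-endian ("<"); use ">2I" if the length field is big-endian.
HEADER = struct.Struct("<2I")  # two 4-byte unsigned ints, no padding


def read_lrf_metadata(f):
    """Hypothetical helper: parse the header layout described in the question
    (4 unknown bytes, a 4-byte metadata length, then that many bytes of JSON).
    Leaves `f` positioned at the start of the replay contents."""
    header = f.read(HEADER.size)
    if len(header) != HEADER.size:
        raise ValueError("file is too short to contain the 8-byte header")
    unk, size = HEADER.unpack(header)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("metadata is truncated")
    # ASSUMPTION: the JSON metadata is UTF-8 encoded.
    return unk, json.loads(data.decode("utf-8"))


# Usage on made-up bytes standing in for a real .lrf file.
meta = b'{"example": 1}'
stream = io.BytesIO(HEADER.pack(7, len(meta)) + meta + b"<replay bytes>")
unk, metadata = read_lrf_metadata(stream)
print(unk, metadata)  # 7 {'example': 1}
rest = stream.read()  # whatever follows the metadata
```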

Solution 3

You should open the file in binary mode: open(filename, 'rb').


Author: shadefinale

Updated on September 15, 2022

Comments

  • shadefinale
    shadefinale over 1 year

    I'm trying to read the length of some metadata from a .lrf file (used with the program LoLReplay).

    There isn't really any documentation on these files, but I have already figured out how to do this in C++. I'm trying to rewrite the project in Python for multiple reasons, but I've run into an error.

    To first explain, the .lrf file has metadata immediately at the start of the file in this format:

    • The first 4 bytes are for something I have no clue about.

    • The next 4 bytes store the length of the metadata in hexadecimal, i.e. up to the end of the metadata; after that come the actual contents of the replay.

    • The bytes after the initial 8 bytes are the metadata, in JSON format.

    The problem I'm having is actually reading the metadata length. This is the current function I have:

    def getMetaLength(self):
        try:
            file = open(self.file,"r")
        except IOError:
            print ("Failed to open file.")
            file.close()
        #We need to skip the first 4 bytes.
        file.read(4)
        mdlength = file.read(4)
        print(hex(mdlength))
        file.close()
    

    When I call this function, the shell returns a traceback stating:

        Traceback (most recent call last):
          File "C:\Users\Donald\python\lolcogs\lolcogs_main.py", line 6, in <module>
            lolcogs.getMetaLength()
          File "C:\Users\Donald\python\lolcogs\LoLCogs.py", line 20, in getMetaLength
            file.read(4)
          File "C:\Python32\lib\encodings\cp1252.py", line 23, in decode
            return codecs.charmap_decode(input,self.errors,decoding_table)[0]
        UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 3648: character maps to <undefined>
    

    My best guess is that read() is trying to decode the data as characters in some Unicode format, but these are definitely just bytes that I am attempting to read. Is there a way to read them as bytes? Also, is there a better way to skip bytes when reading a file?

  • shadefinale
    shadefinale about 10 years
    Amazing! The int.from_bytes() function is exactly what I needed. I don't know if C++ has an equivalent function; I had to do this manually in C++ and was about to do it manually in Python until I read your comment! Thanks!
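Putting the answers together, the asker's getMetaLength could be rewritten roughly as follows. This is a sketch: the class is a hypothetical minimal stand-in (only self.file and the method name come from the question), and it keeps byteorder='big' as in Solution 1, which may need to be 'little' depending on the file. Using a with block also avoids a separate bug in the original: if open() failed, the except branch called file.close() on a name that was never assigned.

```python
import os
import tempfile


class LoLCogs:  # hypothetical minimal stand-in for the asker's class
    def __init__(self, path):
        self.file = path

    def getMetaLength(self):
        try:
            # "rb": binary mode, so read() returns bytes and nothing is decoded.
            with open(self.file, "rb") as f:
                f.seek(4)  # skip the 4 unknown bytes
                # byteorder follows Solution 1; the file format is undocumented.
                mdlength = int.from_bytes(f.read(4), byteorder="big")
        except IOError:  # alias of OSError on Python 3.3+
            print("Failed to open file.")
            return None
        print(hex(mdlength))
        return mdlength


# Usage: a made-up file with 4 unknown bytes, a big-endian length of 300, then "{}".
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(b"\xde\xad\xbe\xef" + (300).to_bytes(4, "big") + b"{}")
try:
    result = LoLCogs(path).getMetaLength()  # prints 0x12c
finally:
    os.remove(path)
```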