How to read bytes from file

python file python-3.x

11,429

Solution 1

In Python 3, files are opened in text mode by default, and their contents are decoded using the system's default encoding. You need to open your file in binary mode:

file = open(self.file, 'rb')

Another problem you will run into is that, in binary mode, file.read(4) returns a bytes object of length 4, which the hex function does not accept. What you probably want is an integer. For that, refer to int.from_bytes or, more generally, to the struct module. You can then print that number in hexadecimal format like so:

mdlength = int.from_bytes(file.read(4), byteorder='big')
print(hex(mdlength))
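
Putting both steps together, a minimal sketch might look like the following. The function name `read_meta_length`, its `byteorder` parameter, and the throwaway sample file are illustrative assumptions, not part of the original code. The question does not state the file's byte order, so `'big'` is only a default here; check it against a file whose metadata length you already know.

```python
import os
import tempfile


def read_meta_length(path, byteorder="big"):
    """Return the 4-byte integer stored at offset 4 of the file at `path`.

    `byteorder` is an assumption: the source does not say whether the
    field is big- or little-endian.
    """
    with open(path, "rb") as f:   # binary mode: read() returns bytes
        f.read(4)                 # skip the first 4 (unknown) bytes
        raw = f.read(4)           # the 4 bytes holding the length
    if len(raw) != 4:
        raise ValueError("file is too short to contain a length field")
    return int.from_bytes(raw, byteorder=byteorder)


# Build a small throwaway file for demonstration (made-up contents).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00\x00\x00\x01" + b"\x00\x00\x01\x00" + b"payload")
    sample_path = tmp.name

meta_length = read_meta_length(sample_path)
print(hex(meta_length))  # 0x100
os.remove(sample_path)
```

Using a `with` block closes the file even if reading fails, which avoids the manual close calls.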

Solution 2

Binary files should be handled in binary mode:

f = open(filename, 'rb')

To skip bytes, I typically use file.seek() (with os.SEEK_CUR or os.SEEK_SET), or I simply call file.read(n) and discard the result when I don't want to bother with the formality. The only time I really use seeking is when I want to jump to a specific position.
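
The skipping options can be sketched as follows. This uses an in-memory `io.BytesIO` object as a stand-in for a real file opened with `'rb'` (an assumption made so the example runs without a file on disk); the variable names are illustrative.

```python
import io
import os

# In-memory stand-in for a file opened with open(path, 'rb');
# it holds the 16 bytes 0x00 through 0x0f.
f = io.BytesIO(bytes(range(16)))

f.seek(4, os.SEEK_CUR)   # skip 4 bytes relative to the current position
after_skip = f.tell()    # position is now 4

f.read(4)                # reading and discarding also advances 4 bytes
after_read = f.tell()    # position is now 8

f.seek(2, os.SEEK_SET)   # jump to absolute offset 2
value = f.read(1)        # the single byte stored at offset 2
```

Seeking avoids actually reading the skipped data, which matters more for large skips than for a 4-byte header field.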

To interpret binary data, I stick to the unpack function provided by the struct module, which makes it easy to specify whether a sequence of bytes should be interpreted as an int, float, char, etc. That's how I've been doing it for years, so there may be more efficient approaches, such as the from_bytes method described in other answers.

With the struct module, you can do things like:

struct.unpack("3I", f.read(12))

This reads three unsigned integers at once. So, for example, given the format you've reverse engineered, I would probably just write:

unk, size = struct.unpack("2I", f.read(8))
data = f.read(size)
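
Note that format strings without a prefix, such as "2I", use the machine's native byte order, size, and alignment. A slightly fuller sketch with an explicit byte-order prefix might look like this. The function name `read_metadata`, the little-endian default, the UTF-8 decoding, and the fabricated sample data are all assumptions for illustration; the question only says the metadata is JSON and does not state the byte order.

```python
import io
import json
import struct


def read_metadata(f, byte_order="<"):
    """Read the 8-byte header, then the JSON metadata that follows it.

    `byte_order` is an assumption: "<" means little-endian and ">" means
    big-endian. With either prefix, "I" is always a 4-byte unsigned int.
    """
    header = f.read(8)
    if len(header) != 8:
        raise ValueError("truncated header")
    unk, size = struct.unpack(byte_order + "2I", header)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("truncated metadata")
    # Assumes the JSON text is UTF-8 encoded.
    return unk, json.loads(data.decode("utf-8"))


# Fabricated sample in the layout described in the question:
# 4 unknown bytes, 4 length bytes, JSON metadata, then replay contents.
payload = json.dumps({"name": "example"}).encode("utf-8")
sample = io.BytesIO(struct.pack("<2I", 1, len(payload)) + payload + b"replay bytes")

unk, metadata = read_metadata(sample)
print(unk, metadata)
```

Checking the lengths returned by read() guards against truncated files, since read(n) may return fewer than n bytes at the end of a file.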

Solution 3

You should open the file in binary mode: open(filename, 'rb').


shadefinale

Updated on September 15, 2022

Comments

shadefinale over 1 year
I'm trying to read the length of some metadata from a .lrf file. (Used with the program LoLReplay)

There isn't really any documentation on these files, but I have already figured out how to do this in C++. I'm trying to rewrite the project in Python for multiple reasons, but I've run into an error.

To first explain, the .lrf file has metadata immediately at the start of the file in this format:
- first 4 bytes are for something I have no clue about.
- next 4 bytes store the length of the metadata in hexadecimal; the metadata runs for that many bytes, after which the actual contents of the replay begin.
- bytes after the initial 8 bytes are the metadata in JSON format
The problem I'm having is actually reading the metadata length. This is the current function I have:
```
def getMetaLength(self):
    try:
        file = open(self.file,"r")
    except IOError:
        print ("Failed to open file.")
        file.close()
    #We need to skip the first 4 bytes.
    file.read(4)
    mdlength = file.read(4)
    print(hex(mdlength))
    file.close()
```
When I call this function, the shell returns a traceback stating:
```
    Traceback (most recent call last):
    File "C:\Users\Donald\python\lolcogs\lolcogs_main.py", line 6, in <module>
    lolcogs.getMetaLength()
    File "C:\Users\Donald\python\lolcogs\LoLCogs.py", line 20, in getMetaLength
    file.read(4)
    File "C:\Python32\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 3648: character maps to <undefined>
```
My best guess is that read() is trying to decode the data as characters in some Unicode encoding, but these are definitely just raw bytes that I am attempting to read. Is there a way to read them as bytes? Also, is there a better way to skip bytes when reading a file?
shadefinale about 10 years

Amazing! The int.from_bytes() function is exactly what I needed. I don't know if there is an equivalent function in C++, but I had to do this manually there and was about to do the same in Python until I read your comment! Thanks!