How to read bytes from a file
Solution 1
In Python 3, files are opened in text mode with the system's encoding by default. You need to open your file in binary mode:
file = open(self.file, 'rb')
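A minimal sketch of what binary mode changes, assuming a throwaway temporary file: read() returns a bytes object and no decoding is attempted, so a byte like 0x81 (the one cp1252 could not decode in the question's traceback) comes back untouched.

```python
import os
import tempfile

# Write a few raw bytes, including 0x81, to a temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00\x81\xff\x10")
    path = tmp.name

try:
    # 'rb' = read, binary: no text decoding is applied.
    with open(path, "rb") as f:
        raw = f.read()
    print(type(raw), raw)
finally:
    os.remove(path)
```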
Another problem you will run into is that file.read(4) will give you a bytes object of length 4 (which the hex function doesn't understand), and you probably want an integer. For that, refer to int.from_bytes or, more generally, to the struct module. Then you can print that number in hexadecimal format like so:
mdlength = int.from_bytes(file.read(4), byteorder='big')
print(hex(mdlength))
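The snippet above can be tried without a real .lrf file. This sketch uses io.BytesIO as a stand-in for the opened file and a made-up header whose length field is 300; big-endian is assumed only to match byteorder='big' above, and the real file may well use the other order.

```python
import io

# Hypothetical 8-byte header: 4 unknown bytes, then a length of 300.
# Big-endian is an assumption, matching byteorder='big' above.
header = b"\xde\xad\xbe\xef" + (300).to_bytes(4, byteorder="big")
file = io.BytesIO(header)  # stands in for open(..., 'rb')

file.read(4)  # skip the unknown first 4 bytes
mdlength = int.from_bytes(file.read(4), byteorder="big")
print(mdlength, hex(mdlength))  # 300 0x12c

# The same 4 bytes read with the other byte order give a very different number,
# so it is worth checking which order the file actually uses.
print(int.from_bytes(header[4:8], byteorder="little"))
```

hex() is only a way of displaying the integer; the value itself is stored as 4 raw bytes, not as hexadecimal text.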
Solution 2
Binary files should be handled in binary mode:
f = open(filename, 'rb')
For skipping bytes, I typically use file.seek (with SEEK_CUR or SEEK_SET), or I just do an arbitrary file.read(n) if I don't want to bother with formality. The only time I really use seeking is when I want to jump to a specific position.
For interpreting binary data, I just stick to the unpack function provided by the struct module, which makes it easy to define whether you want to interpret a sequence of bytes as an int, float, char, etc. That's how I've been doing it for years, so there may be more efficient approaches, like the from_bytes method described in other answers.
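For a single integer the two approaches agree once the byte order is spelled out. A quick comparison, assuming a 4-byte big-endian value:

```python
import struct

data = b"\x00\x00\x01\x2c"

# '>I' = big-endian unsigned 32-bit integer; struct.unpack always returns a tuple.
(via_struct,) = struct.unpack(">I", data)
via_from_bytes = int.from_bytes(data, byteorder="big")
print(via_struct, via_from_bytes)  # 300 300

# '<I' reads the same bytes as little-endian instead.
print(struct.unpack("<I", data)[0])
```

struct earns its keep when several fields of different types are read in one call; int.from_bytes is simpler for a lone integer.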
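With the struct module you can do things like
struct.unpack("3I", f.read(12))
to read in three unsigned integers at once. (Without a byte-order prefix such as < or >, struct uses the machine's native byte order, size and alignment.) So, for example, given the format you've reverse-engineered, I would probably just say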
unk, size = struct.unpack("2I", f.read(8))
data = f.read(size)
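Putting that together for the format in the question, here is a sketch that reads the header and decodes the JSON metadata. Two assumptions are labeled here and in the comments: the header integers are little-endian, and the JSON is UTF-8. read_metadata and the sample contents are hypothetical.

```python
import io
import json
import struct

# Assumption: both header fields are little-endian unsigned 32-bit ints.
# A bare "2I" uses the machine's native byte order and sizes; "<" makes
# the layout explicit and platform-independent.
HEADER = struct.Struct("<2I")

def read_metadata(f):
    """Read the 8-byte header, then `size` bytes of JSON metadata."""
    unk, size = HEADER.unpack(f.read(HEADER.size))
    data = f.read(size)
    if len(data) != size:
        raise ValueError("file ended before the metadata did")
    # Assumption: the metadata is UTF-8 encoded JSON.
    return unk, json.loads(data.decode("utf-8"))

# Hypothetical file contents built in memory for demonstration.
meta = json.dumps({"example": 1}).encode("utf-8")
sample = HEADER.pack(7, len(meta)) + meta + b"...replay data..."
unk, metadata = read_metadata(io.BytesIO(sample))
print(unk, metadata)
```

After read_metadata returns, the file position sits at the start of the replay contents, so the rest of the file can be read from there.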
Solution 3
You should open the file in binary mode: open(filename, 'rb').
shadefinale
Updated on September 15, 2022
Comments
shadefinale over 1 year
I'm trying to read the length of some metadata from a .lrf file (used with the program LoLReplay).
There isn't really any documentation on these files, but I have already figured out how to do this in C++. I'm trying to rewrite the project in Python for multiple reasons, but I've come across an error.
First, some background: the .lrf file has metadata at the very start of the file, in this format:
The first 4 bytes are for something I have no clue about.
The next 4 bytes store the length of the metadata (which I've been reading in hexadecimal), i.e. where the metadata ends; after the metadata come the actual contents of the replay.
The bytes after the initial 8 bytes are the metadata, in JSON format.
The problem I'm having is actually reading the metadata length. This is the current function I have:
def getMetaLength(self):
    try:
        file = open(self.file,"r")
    except IOError:
        print ("Failed to open file.")
        file.close()
    #We need to skip the first 4 bytes.
    file.read(4)
    mdlength = file.read(4)
    print(hex(mdlength))
    file.close()
When I call this function, the shell returns a traceback stating:
Traceback (most recent call last):
  File "C:\Users\Donald\python\lolcogs\lolcogs_main.py", line 6, in <module>
    lolcogs.getMetaLength()
  File "C:\Users\Donald\python\lolcogs\LoLCogs.py", line 20, in getMetaLength
    file.read(4)
  File "C:\Python32\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 3648: character maps to <undefined>
My best guess is that read() is trying to read characters that are encoded in some unicode format, but these are definitely just bytes that I am attempting to read. Is there a way to read these as bytes? Also, is there a better way to skip bytes when you are attempting to read a file?
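Pulling the answers together, here is a sketch of how the question's getMetaLength could look with binary mode, a with-block and struct. The LrfFile class is a hypothetical stand-in for the asker's class (only self.file is taken from the question), and little-endian is an assumption about the file format.

```python
import os
import struct
import tempfile

class LrfFile:
    """Hypothetical wrapper holding a path in self.file, like the question's class."""

    def __init__(self, path):
        self.file = path

    def getMetaLength(self):
        try:
            # 'rb': read raw bytes instead of decoding text with the locale codec.
            with open(self.file, "rb") as f:
                f.seek(4)  # skip the first 4 unknown bytes
                # Assumption: the length is a little-endian unsigned 32-bit int.
                (mdlength,) = struct.unpack("<I", f.read(4))
        except IOError:
            # Nothing to close here: a failed open() never produced a file object.
            print("Failed to open file.")
            return None
        print(hex(mdlength))
        return mdlength

# Usage with a throwaway file whose (made-up) length field is 300.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(struct.pack("<II", 0, 300))
    path = tmp.name
try:
    length = LrfFile(path).getMetaLength()  # prints 0x12c
finally:
    os.remove(path)
```

The with-block closes the file even if struct.unpack raises, which replaces both file.close() calls in the original; the one in the except branch could never work, because file is unbound when open() fails.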
shadefinale about 10 years
Amazing! The int.from_bytes() function is exactly what I needed. I don't know if C++ has an equivalent function, but I had to do this manually in C++ and was about to do it manually in Python until I read your comment! Thanks!