Reading 32-bit signed IEEE 754 floating-point numbers from a binary file with Python?

python parsing floating-point binaryfiles ieee-754

23,877

Solution 1

struct.unpack('f', file.read(4))

You can also unpack several at once, which will be faster:

struct.unpack('f'*n, file.read(4*n))
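As a minimal sketch of the batch approach above, the following reads a whole file of packed floats in one call. The function name read_floats, the path argument, and the little-endian default ("<") are assumptions for illustration; a bare 'f' uses the machine's native byte order, so pick the prefix that matches how the file was written.

```python
import struct

def read_floats(path, byteorder="<"):
    """Read an entire file of packed 32-bit floats into a tuple.

    byteorder is an assumption: "<" for little-endian, ">" for big-endian.
    """
    with open(path, "rb") as fh:  # binary mode, so read() returns raw bytes
        data = fh.read()
    n = len(data) // 4  # number of whole floats; trailing partial bytes are ignored
    # A count prefix such as "<3f" unpacks n floats in a single call.
    return struct.unpack("%s%df" % (byteorder, n), data[:4 * n])
```

Calling read_floats on a file that holds three packed floats should return a 3-tuple of Python floats.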

Solution 2

Take a peek at struct.unpack. Something like the following might work...

(f,) = struct.unpack('f', data_read)  # unpack always returns a tuple

Solution 3

import struct
(num,) = struct.unpack('f', f.read(4))
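For files too large to read in one go, the same idea can be applied chunk by chunk. This is only a sketch: iter_floats, chunk_floats, and the little-endian default are hypothetical names and choices, and it assumes a regular file (or similar object) whose read() returns short data only at EOF. It relies on struct.Struct.iter_unpack, available in Python 3.4 and later.

```python
import struct

def iter_floats(fh, chunk_floats=1024, byteorder="<"):
    """Yield 32-bit floats one at a time from a binary file object."""
    item = struct.Struct(byteorder + "f")
    chunk_size = item.size * chunk_floats
    while True:
        chunk = fh.read(chunk_size)
        # Drop any trailing bytes that do not form a whole float.
        usable = len(chunk) - len(chunk) % item.size
        for (value,) in item.iter_unpack(chunk[:usable]):
            yield value
        if len(chunk) < chunk_size:  # short read means EOF was reached
            break
```

A possible use is `with open(path, 'rb') as fh: values = list(iter_floats(fh))`, where path is whatever file holds the data.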

Author by

Razor Storm

Updated on June 09, 2020

Comments

Razor Storm almost 4 years

I have a binary file which is simply a list of signed 32-bit IEEE 754 floating-point numbers. They are not separated by anything, and simply appear one after another until EOF.

How would I read from this file and interpret them correctly as floating point numbers?

I tried using read(4), but it automatically converts them to a string with ASCII encoding.

I also tried using bytearray but that only takes it in 1 byte at a time instead of 4 bytes at a time as I need.
Andrew White almost 13 years

+1 for the 'f'*n; where is that syntax documented? I must have missed that in my Python primer.
Thomas Wouters almost 13 years

String multiplication is documented in the tutorial and in the library reference section on sequence objects.
Alex S almost 13 years

@Andrew: There's a brief mention of this in the tutorial, under Strings. Search for "repeated".
Scott Griffiths almost 13 years

The more general way of unpacking several would be unpack('{0}f'.format(n), ...), or, if you know how many in advance, just unpack('10f', ...) for example. Better to use the built-in repetition method than to rely on string manipulation.
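A short sketch of the two spellings discussed here, to show that they describe the same layout. The sample values and the variable names repeated and counted are made up for illustration; the size check assumes a platform where a C float is 4 bytes.

```python
import struct

n = 5
data = struct.pack("5f", 1.0, 2.0, 3.0, 4.0, 5.0)

repeated = "f" * n          # the string 'fffff'
counted = "{0}f".format(n)  # the string '5f', using struct's repeat count

# Both format strings describe n consecutive floats.
assert struct.calcsize(repeated) == struct.calcsize(counted) == 4 * n
assert struct.unpack(repeated, data) == struct.unpack(counted, data)
```

The counted form keeps the format string short no matter how large n gets, while the repeated form grows by one character per value.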
Alex S about 12 years

@cdiggins: I tend to favour whatever requires the least amount of typing and is easiest to read. These two factors occasionally clash, so you may have to trade one off against the other, but in this case my version is both shorter and clearer. Performance-wise, I expect the two forms to be almost identical, since the bulk of the time is spent in the I/O subsystem. If the length is known at coding time, then I agree that '10f' is better, for exactly the same reasons: it is slightly shorter and easier to read than 'f'*10.
cdiggins about 12 years

@Marcelo, I agree with the principle but consider unpacking 100,000 ints. It doesn't make sense to me to create a format string that is 100k long. Instead '{0}f'.format(1000000) makes more sense.
Alex S about 12 years

@cdiggins: What doesn't make sense? At 100000 elements, my version is 10% (12 µs) slower, and remains noticeably shorter and clearer.