Read a binary file (Python)
Solution 1
f = open("test/test.pdf", "rb")
You must include the pseudo-mode "b" (binary) when reading and writing binary files on Windows. Otherwise the OS silently translates whatever it considers to be line endings, which corrupts the data.
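As a rough illustration of the point above, here is a minimal round trip in binary mode. It is written for Python 3 (the question uses Python 2 syntax), and the temporary file and sample payload are made up for the example; they do not come from the question.

```python
import os
import tempfile

# Hypothetical payload that includes a CRLF pair, the kind of byte sequence
# that newline translation in text mode could alter.
payload = b"%PDF-1.4\r\n\x00\xff binary blob"

# Write the payload to a throwaway file in binary mode.
fd, path = tempfile.mkstemp(suffix=".pdf")
with os.fdopen(fd, "wb") as f:
    f.write(payload)

# Read it back in binary mode; the bytes come back unchanged.
with open(path, "rb") as f:
    data = f.read()

os.remove(path)
print(data == payload)
```

With "wb" and "rb" on both sides, no newline translation is applied, so the comparison should hold on any platform.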
Solution 2
Jonathan is correct that you should open the file in binary mode if you are on Windows.
However, a PDF file starts with "%PDF-", which would at least be read regardless of whether you use binary mode.
So it appears to me that your "test/test.pdf" is an empty file.
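One way to test that hypothesis is to look at the file size and the first five bytes. The sketch below assumes Python 3; classify_file is a hypothetical helper name, and the two throwaway files stand in for the asker's test/test.pdf. It is a simple header check only, not a PDF validator.

```python
import os
import tempfile

def classify_file(path):
    """Return 'empty', 'pdf', or 'not-pdf' based on size and the first 5 bytes."""
    if os.path.getsize(path) == 0:
        return "empty"
    with open(path, "rb") as f:
        header = f.read(5)
    return "pdf" if header == b"%PDF-" else "not-pdf"

# Demo on two throwaway files: one empty, one with a PDF-style header.
tmp_dir = tempfile.mkdtemp()
empty_path = os.path.join(tmp_dir, "empty.pdf")
open(empty_path, "wb").close()
stub_path = os.path.join(tmp_dir, "stub.pdf")
with open(stub_path, "wb") as f:
    f.write(b"%PDF-1.4\n")

print(classify_file(empty_path))  # empty
print(classify_file(stub_path))   # pdf
```

If the real file reports "empty", the mode flag is not the problem; the file simply has no content.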
Solution 3
- As best as I understand the PDF format, a PDF file shouldn't be a binary file. It should be a text file that may contain lots of binary blobs. I could be wrong.
- On Windows, if you are opening a binary file, you need to include b in the mode of your file, i.e. open(filename, "rb").
- On Unix-like systems, the b doesn't hurt anything, though it does not mean anything.
- Always use a context manager with your files. That is to say, instead of writing f = open("test/test.pdf", "rb"), say with open("test/test.pdf", "rb") as f:. This will ensure your file always gets closed.
- list(f.read()) is not likely to be useful code very often. f.read() returns a str and calling list on it makes a list of the characters (one-byte strings). This is very seldom needed.
- Binary or text or whatever, read should work. Are you positive that there is anything in test/test.pdf? Python does not seem to think there is.
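The context-manager advice, combined with what the asker wants (a list of byte values), might look like the sketch below. It assumes Python 3 and uses a throwaway file in place of test/test.pdf. Going through bytearray is a choice made here, not something the answers prescribe: it yields integers on both Python 2 and Python 3.

```python
import os
import tempfile

# Throwaway file standing in for test/test.pdf.
fd, path = tempfile.mkstemp(suffix=".pdf")
with os.fdopen(fd, "wb") as f:
    f.write(b"%PDF-1.4\n")

# The with block closes the file even if reading raises an exception.
with open(path, "rb") as f:
    raw = f.read()

# bytearray iterates as integers in the range 0-255.
byte_values = list(bytearray(raw))

os.remove(path)
print(byte_values[:5])  # [37, 80, 68, 70, 45]
```

The first five values are the ASCII codes for "%PDF-". An empty list here would again point to an empty file.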
Author: beratch
Updated on June 22, 2022

Original question from beratch:
I can't read a file, and I don't understand why:
f = open("test/test.pdf", "r")
data = list(f.read())
print data
Returns :
[]
I would like to open a PDF, extract every byte, and put them in a list.
What's wrong with my code? :(
Thanks,