Read a binary file (Python)

Solution 1

f = open("test/test.pdf", "rb")

You must include the "b" (binary) mode flag when reading and writing binary files on Windows. Otherwise the OS silently translates what it considers to be line endings, which corrupts the data.
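
As a minimal sketch of the point above, the snippet below writes a few bytes with "wb" and reads them back with "rb", then checks that nothing was altered. It assumes Python 3 (the question itself uses Python 2 syntax), and roundtrip_binary is a hypothetical helper name used only for illustration.

```python
import os
import tempfile

def roundtrip_binary(payload):
    """Write payload with "wb", read it back with "rb", and return what was read."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(payload)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)

# Bytes that text mode may mangle on Windows: CR LF and 0x1A (Ctrl-Z).
sample = b"%PDF-1.4\r\n\x1a\x00\xff"
print(roundtrip_binary(sample) == sample)
```

In binary mode the comparison should hold on every platform, because no line-ending translation is applied.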

Solution 2

Jonathan is correct that you should open the file in binary mode if you are on Windows.

However, a PDF file will start with "%PDF-", which would at least be read in regardless of whether you are using binary mode or not.

So it appears to me that your "test/test.pdf" is an empty file.
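
One way to test that suspicion is to look at the file size and the first five bytes. This is only a sketch, assuming Python 3; classify_file is a hypothetical helper, and the demo files are throwaway files created in a temporary directory, not the asker's test/test.pdf.

```python
import os
import tempfile

def classify_file(path):
    """Return "empty", "pdf", or "other" for the file at path."""
    if os.path.getsize(path) == 0:
        return "empty"
    with open(path, "rb") as f:
        header = f.read(5)  # a PDF starts with the bytes %PDF-
    return "pdf" if header == b"%PDF-" else "other"

# Demo on throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    empty_path = os.path.join(tmp, "empty.pdf")
    open(empty_path, "wb").close()  # create a zero-byte file
    pdf_path = os.path.join(tmp, "real.pdf")
    with open(pdf_path, "wb") as f:
        f.write(b"%PDF-1.4\n")
    print(classify_file(empty_path))
    print(classify_file(pdf_path))
```

If the check reports "empty" for the real file, the problem is the file on disk (or the path being opened), not the read call.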

Solution 3

  • As best as I understand the PDF format, a PDF file shouldn't be a binary file. It should be a text file that may contain lots of binary blobs. I could be wrong.
  • On Windows, if you are opening a binary file, you need to include b in the mode of your file, i.e. open(filename, "rb").
    • On Unix-like systems under Python 2, the b doesn't hurt anything, though it has no effect.
  • Always use a context manager with your files. That is to say, instead of writing f = open("test/test.pdf", "rb"), say with open("test/test.pdf", "rb") as f:. This will ensure your file always gets closed.
  • list(f.read()) is not likely to be useful code very often. f.read() returns a str, and calling list on it makes a list of the characters (one-byte strings). This is very seldom needed.
  • Binary or text or whatever, read should work. Are you positive that there is anything in test/test.pdf? Python does not seem to think there is.
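
Putting the points above together, here is a minimal sketch that reads every byte of a file into a list using a context manager and binary mode. It assumes Python 3, where list() on a bytes object yields integers from 0 to 255 (in Python 2 it would yield one-character strings, as described above). read_bytes_as_list is a hypothetical helper name, and the demo file is a temporary stand-in.

```python
import os
import tempfile

def read_bytes_as_list(path):
    """Read the whole file in binary mode and return its bytes as a list."""
    with open(path, "rb") as f:  # the file is closed when the block exits
        return list(f.read())

# Demo on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.pdf")
    with open(demo_path, "wb") as f:
        f.write(b"%PDF-")
    print(read_bytes_as_list(demo_path))
```

An empty list from this function means the file really contains no bytes.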
Author by

beratch

Updated on June 22, 2022

Comments

  • beratch, almost 2 years ago

    I can't read a file, and I don't understand why:

    f = open("test/test.pdf", "r")
    data = list(f.read())
    print data
    

    Returns: []

    I would like to open a PDF, extract every byte, and put them in a list.

    What's wrong with my code? :(

    Thanks,