Python: Pandas read_excel cannot open .xls file, xlrd not supported
Your file is not a .xls, I insist! :)
With Pandas 1.1.5 and xlrd 2.1.0
Rename Presentaciones.xls
to Presentaciones.xlsx
.
import pandas as pd
# Use openpyxl.
df = pd.read_excel(r'X:...\Presentaciones.xlsx', engine='openpyxl')
print(df)
Enjoy! :)
More info
How do I know that your file is a fake .xls
and a very real .xlsx
?
Because openpyxl
doesn't work with xls
files.
import pandas as pd
df = pd.read_excel(r'X:...\test.xls', engine='openpyxl')
/*
ERROR:
InvalidFileException: openpyxl does not support the old .xls file format,
please use xlrd to read this file, or convert it to the more recent .xlsx file format.
*/
And trying to simply rename test.xls
to test.xlsx
does not work either!
import pandas as pd
df = pd.read_excel(r'X:...\test.xlsx', engine='openpyxl')
/*
Error:
OSError: File contains no valid workbook part
*/
History
Beware, the .xlsx
extension (detected by pandas) means there may be scripts in this file. Sometimes the extension can lie, so be careful!
The reason why panda stopped supporting xlsx
files is that those files are a security hazard and no one was maintaining this part of the code.
Related videos on Youtube
Comments
-
Mapotofu almost 2 years
Problem:
I am opening a .xls with pd.read_excel, but I got an error. ("Pack excel file" downloaded from https://cima.aemps.es/cima/publico/nomenclator.html)
df_cima = pd.read_excel("price_tracker/es/support/Presentaciones.xls") xlrd.biffh.XLRDError: Excel xlsx file; not supported
The suffix of this file is .xls but this error tells me that it is .xlsx
Then I tried to add
engine="openpyxl"
, which is usually used for reading the .xlsx when xlrd version is no longer 1.2.0, then it gives me another erroropenpyxl.utils.exceptions.InvalidFileException: openpyxl does not support the old .xls file format, please use xlrd to read this file, or convert it to the more recent .xlsx file format.
MY env:
- pandas version: 1.1.5
- xlrd version: 2.0.1
- openpyxl version: 3.0.6
I do not want to change my xlrd version back to 1.2.0, from other answer I see that new version of xlrd support only .xls, but I don't understand why it is not working for my file.
Thanks in advance.
-
SeaBean about 3 yearsCan you try saving the file as xlsx first ?
-
Anurag Dabas about 3 yearsthen change the extension of file from
.xls
to.xlsx
by opening inms excel
or any otherexcel
editor/viewer and saving it in same place/path with same name but different extension i.e ..xlsx
-
Mapotofu about 3 years@AnuragDabas I have multiple files per month like this, I would not choose to change it manuelly, could we change .xls to .xlsx directly when we download it with a scrpit ?
-
Mapotofu about 3 years@SeaBean, I have a download script to download this .xls, could I just modify this script and download .xlsx directly? Because changing it manuelly would not be a good choice for me
-
Anurag Dabas about 3 yearsbetween you can also create a virtual environment if you want and install
xlrd 1.2.0
in that environment...and when ever you need too work with.xls
files just activate that environment -
SeaBean about 3 yearsI downloaded the file and tried opening it in a ipynb under VScode. No problem even without specifying engine. Anyway, when I open it with MS Excel, it did prompted a message stating the file is in a different format than specified by the file extension. I have: pandas version: 1.2.1 xlrd version: 2.0.1, openpyxl version: 3.0.6
-
SeaBean about 3 yearsIf you can download the xlsx directly, just try it. I tried saving it manually to xlsx and reopen the new xlsx file by MS Excel. The error message has just gone.
-
Mapotofu about 3 years@SeaBean, I tried at home again (the problem was at my office), this time with pandas version: 1.1.5 xlrd version: 2.0.1, which is exactly the same as my office, and it works! Could it possibly caused by the version of other packages or the version of python itself? (I have python 3.7 at home, but 3.6 at office)
-
SeaBean about 3 yearsMy Python is 3.8. Your situation then must be different combinations of packages causing it. However, it would be very time consuming to sort it out. Just bypass it if you can e.g. by what you mentioned downloading the xlsx file directly or do it like me to convert it manually.
-
Mapotofu about 3 years@SeaBean, I tried pandas-1.2.1 and xlrd-2.0.1 first then pandas-1.1.5 and xlrd-2.0.1, and pycharm console is not updated when i switched env. So I just tried again these two combinations (pandas-1.2.1 and xlrd-2.0.1 vs pandas-1.1.5 and xlrd-2.0.1), this time indeed pandas-1.2.1 and xlrd-2.0.1 works well but not pandas-1.1.5 and xlrd-2.0.1, I think there must be an unresolved issue with the combination of pandas-1.1.5 and xlrd-2.0.1, I will try again tomorrow at my office
-
SeaBean about 3 yearsThat's good experiments done! Trust that you are pretty close to solving it in office!
-
Mapotofu about 3 yearsHi, thanks so much. Then I wonder why pd.read_excel(r'X:...\test.xls') will work when I update pandas from 1.1.5 to pandas-1.2.1 ? if I assume this file is not .xls, when i didn't specify the engine, it's 'xlrd' by default, which means under this version this file can be read with xlrd, but xlrd only support .xls, then this file is therefore a xls which is against my assumption...?
-
Florian Fasmeyer about 3 yearsFrom what I remember from yesterday. read_excel() choose the engine depending on what file has been detected. So generally you shouldn't precise the engine. Maybe newer versions of pandas automatically detect the type of file regardless of the extension in the file name ".xls"... Or my instinct was right for all the wrong reasons and I got it all wrong. Either way, that was fun. xD