Pandas read_excel sometimes creates index even when index_col=None
Solution 1
The issue that you're describing matches a known pandas bug. This bug was fixed in the recent pandas 0.24.0 release:
Bug Fixes
- Bug in read_excel() in which index_col=None was not being respected and parsing index columns anyway (GH18792, GH20480)
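To see whether an installed pandas is at or past the release named in those notes, a minimal sketch like the one below may help. The helper names `parse_major_minor` and `has_index_col_fix` are made up for illustration and are not part of pandas; the only assumption about pandas itself is that `pd.__version__` starts with `major.minor`.

```python
import pandas as pd


def parse_major_minor(version):
    """Return (major, minor) from a version string such as '0.24.0'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def has_index_col_fix(version=pd.__version__):
    """True if the version is 0.24 or later, the release quoted above.

    Hypothetical helper: it only compares version numbers and does not
    verify the behaviour of read_excel itself.
    """
    return parse_major_minor(version) >= (0, 24)


print(pd.__version__, has_index_col_fix())
```

Note that this only compares version numbers. If the index still appears on a newer version, the check says nothing about why.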
Solution 2
You can also use index_col=0 instead of index_col=None.
Bill
Updated on June 10, 2022
Comments
-
Bill almost 2 years
I'm trying to read an Excel file into a data frame and I want to set the index later, so I don't want pandas to use column 0 for the index values.
By default (index_col=None), it shouldn't use column 0 for the index, but I find that if there is no value in cell A1 of the worksheet, it will.
Is there any way to override this behaviour? (I am loading many sheets that have no value in cell A1.)
This works as expected when test1.xlsx has the value "DATE" in cell A1:
In [19]: pd.read_excel('test1.xlsx')
Out[19]:
                 DATE         A         B         C
0 2018-01-01 00:00:00  0.766895  1.142639  0.810603
1 2018-01-01 01:00:00  0.605812  0.890286  0.810603
2 2018-01-01 02:00:00  0.623123  1.053022  0.810603
3 2018-01-01 03:00:00  0.740577  1.505082  0.810603
4 2018-01-01 04:00:00  0.335573 -0.024649  0.810603
But when the worksheet has no value in cell A1, it automatically assigns column 0 values to the index:
In [20]: pd.read_excel('test2.xlsx', index_col=None)
Out[20]:
                            A         B         C
2018-01-01 00:00:00  0.766895  1.142639  0.810603
2018-01-01 01:00:00  0.605812  0.890286  0.810603
2018-01-01 02:00:00  0.623123  1.053022  0.810603
2018-01-01 03:00:00  0.740577  1.505082  0.810603
2018-01-01 04:00:00  0.335573 -0.024649  0.810603
This is not what I want.
Desired result: Same as first example (but with 'Unnamed' as the column label perhaps).
Documentation says
index_col : int, list of int, default None.
Column (0-indexed) to use as the row labels of the DataFrame. Pass None if there is no such column.
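If the values have already ended up in the index, one possible workaround is to move them back into an ordinary column after loading. The sketch below is not an official fix. It assumes the frame looks like the second output above, the helper name `restore_first_column` is made up for illustration, and the default label 'Unnamed: 0' is only a guess at a suitable column name.

```python
import pandas as pd


def restore_first_column(df, label="Unnamed: 0"):
    """Move an unwanted index back into the first column.

    Hypothetical helper: frames that already have the default integer
    index are returned unchanged.
    """
    if isinstance(df.index, pd.RangeIndex):
        return df
    # Name the index, then turn it into a regular column; reset_index
    # also restores the default 0..n-1 integer index.
    return df.rename_axis(label).reset_index()


# Small frame mimicking the unwanted result shown above.
dates = pd.date_range("2018-01-01", periods=3, freq="h")
df = pd.DataFrame({"A": [0.1, 0.2, 0.3], "B": [1.0, 2.0, 3.0]}, index=dates)
fixed = restore_first_column(df)
print(fixed.columns.tolist())
```

The index can then be set later with `set_index`, which is what the question is after.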
-
DataGirl over 4 years
I'm using pandas version 0.25.3 and it is happening to me. Any thoughts?
-
Xukrao over 4 years
@DataGirl No. I do not experience this issue in pandas 0.25.3.
-
Bill over 4 years
This is not the correct answer. Using index_col=0 will cause pd.read_excel to use the first column as the index, which is what I am trying to avoid (see the first sentence of the question).
-
jabbiez over 4 years
Sorry for not reading carefully. I think you can hide the index when you save or convert the data frame to other formats, e.g. df.to_excel(filename, index=False). I don't find index_col=None working as we expect.
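As a rough illustration of the index=False idea, the sketch below writes a frame without its index. It uses to_csv on an in-memory string so that it runs without an Excel engine installed; to_excel accepts the same index parameter. The helper name `to_csv_without_index` is made up for illustration. This only hides the index on output, so it drops the date values rather than keeping them as a column.

```python
import pandas as pd


def to_csv_without_index(df):
    """Serialise a frame to CSV text, leaving the index out (hypothetical helper)."""
    return df.to_csv(index=False)


dates = pd.date_range("2018-01-01", periods=2, freq="h")
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=dates)
text = to_csv_without_index(df)
print(text)
```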
-
YoungSheldon over 4 years
I'm using pandas version 0.25.1, but the issue doesn't seem to be resolved. I am getting the index column, no matter what!
-
Xukrao over 4 years
@ChintanGotecha In that case you should report the issue at github.com/pandas-dev/pandas/issues.