Pandas read_excel sometimes creates index even when index_col=None

python excel pandas dataframe indexing

16,340

Solution 1

The issue you're describing matches a known pandas bug, which was fixed in the pandas 0.24.0 release. From the release notes:

Bug Fixes

Bug in read_excel() in which index_col=None was not being respected and parsing index columns anyway (GH18792, GH20480)
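
If upgrading past 0.24.0 is not an option, one possible workaround is to move the unwanted index back into a regular column after loading. The sketch below is only an illustration: `ensure_default_index` and the `"Unnamed: 0"` label are hypothetical names, not part of pandas, and the sample frame is built in memory to mimic the `test2.xlsx` output rather than read from a real file. It assumes a single-level index.

```python
import pandas as pd


def ensure_default_index(df, fallback_name="Unnamed: 0"):
    """Hypothetical helper: move a non-default index back into the first column."""
    # A frame that already has the default 0..n-1 index needs no change.
    if isinstance(df.index, pd.RangeIndex):
        return df
    # reset_index() inserts the old index as the first column.
    restored = df.reset_index()
    # Keep the index name if it had one, otherwise use the fallback label.
    new_label = df.index.name or fallback_name
    return restored.rename(columns={restored.columns[0]: new_label})


# Stand-in for what read_excel returned for test2.xlsx (assumed sample data).
sample = pd.DataFrame(
    {"A": [0.766895, 0.605812, 0.623123], "B": [1.142639, 0.890286, 1.053022]},
    index=pd.to_datetime(
        ["2018-01-01 00:00:00", "2018-01-01 01:00:00", "2018-01-01 02:00:00"]
    ),
)

fixed = ensure_default_index(sample)
print(fixed)
```

The same helper could be applied to the result of each `pd.read_excel(...)` call when loading many sheets, and the index can then be set later with `set_index`.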

Solution 2

You can also use

index_col=0

instead of

index_col = None

Author by

Bill


Updated on June 10, 2022

Comments

Bill almost 2 years
I'm trying to read an Excel file into a data frame and I want to set the index later, so I don't want pandas to use column 0 for the index values.

By default (index_col=None), it shouldn't use column 0 for the index but I find that if there is no value in cell A1 of the worksheet it will.

Is there any way to override this behaviour? (I am loading many sheets that have no value in cell A1.)

This works as expected when test1.xlsx has the value "DATE" in cell A1:
```
In [19]: pd.read_excel('test1.xlsx')                                             
Out[19]: 
                 DATE         A         B         C
0 2018-01-01 00:00:00  0.766895  1.142639  0.810603
1 2018-01-01 01:00:00  0.605812  0.890286  0.810603
2 2018-01-01 02:00:00  0.623123  1.053022  0.810603
3 2018-01-01 03:00:00  0.740577  1.505082  0.810603
4 2018-01-01 04:00:00  0.335573 -0.024649  0.810603
```
But when the worksheet has no value in cell A1, it automatically assigns column 0 values to the index:
```
In [20]: pd.read_excel('test2.xlsx', index_col=None)                             
Out[20]: 
                            A         B         C
2018-01-01 00:00:00  0.766895  1.142639  0.810603
2018-01-01 01:00:00  0.605812  0.890286  0.810603
2018-01-01 02:00:00  0.623123  1.053022  0.810603
2018-01-01 03:00:00  0.740577  1.505082  0.810603
2018-01-01 04:00:00  0.335573 -0.024649  0.810603
```
This is not what I want.

Desired result: Same as first example (but with 'Unnamed' as the column label perhaps).

Documentation says

index_col : int, list of int, default None.

Column (0-indexed) to use as the row labels of the DataFrame. Pass None if there is no such column.

DataGirl over 4 years

I'm using pandas version 0.25.3 and it is happening to me. Any thoughts?
Xukrao over 4 years

@DataGirl No. I do not experience this issue in pandas 0.25.3.
Bill over 4 years

This is not the correct answer. Using index_col=0 will cause pd.read_excel to use the first column as the index which is what I am trying to avoid (See first sentence of the question).
jabbiez over 4 years

Sorry for not reading the question properly. I think you can hide the index when you save or convert the data frame to another format, e.g. df.to_excel(filename, index=False). I also don't find index_col=None working as expected.
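
A minimal sketch of the `index=False` idea from this comment. It uses `to_csv` instead of `to_excel` so that it runs without an Excel engine installed; `to_excel` accepts the same `index=False` argument. The frame and its values are assumed sample data. Note that this only affects what is written out, not how `read_excel` parses a sheet.

```python
import pandas as pd

# Small in-memory frame (assumed sample data).
frame = pd.DataFrame({"A": [0.766895, 0.605812], "B": [1.142639, 0.890286]})

# Default: the index is written as an extra, unlabeled first column.
with_index = frame.to_csv()

# index=False: only the data columns are written.
without_index = frame.to_csv(index=False)

print(with_index.splitlines()[0])     # header row including the empty index label
print(without_index.splitlines()[0])  # header row with data columns only
```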
YoungSheldon over 4 years

I'm using pandas version 0.25.1, but the issue doesn't seem to be resolved. I am getting the index column, no matter what!
Xukrao over 4 years

@ChintanGotecha In that case you should report the issue at github.com/pandas-dev/pandas/issues.