Pandas read_excel sometimes creates index even when index_col=None


Solution 1

The issue that you're describing matches a known pandas bug. This bug was fixed in the recent pandas 0.24.0 release:

Bug Fixes
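Since the fix depends on the installed release, a quick first diagnostic is to check the pandas version. The snippet below is only a sketch: the helper name `at_least` is hypothetical, and it compares just the major and minor parts of the version string. Note that some commenters below report the behaviour on 0.25.x as well, so passing this check does not guarantee the problem is gone.

```python
import pandas as pd


def at_least(version, minimum=(0, 24)):
    """Return True if a version string such as '0.25.3' is >= minimum (major, minor)."""
    # Only the first two dot-separated parts are compared, so suffixes
    # in the patch component (e.g. '1.0.0rc0') do not matter here.
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= minimum


# Report the installed version and whether it is 0.24 or newer.
print(pd.__version__, at_least(pd.__version__))
```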

Solution 2

You can also use

index_col=0

instead of

index_col=None
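Note that index_col=0 tells pandas to use the first column as the index, so it does not keep that column as regular data. If the first column should stay a regular column, one possible workaround is to move the index back into a column after loading. This is a sketch that nobody in this thread has confirmed: the helper name `index_to_column` is hypothetical, and the small DataFrame only stands in for what pd.read_excel returned in the question.

```python
import pandas as pd


def index_to_column(df, label="Unnamed: 0"):
    """Return df with a non-default index moved back into the first column."""
    if isinstance(df.index, pd.RangeIndex):
        # Already a default 0..n-1 index, so there is nothing to undo.
        return df
    out = df.reset_index()
    # reset_index puts the old index first; give that column a label.
    out.columns = [label] + list(out.columns[1:])
    return out


# Stand-in for a frame whose first column was promoted to the index.
stamps = pd.to_datetime(
    ["2018-01-01 00:00:00", "2018-01-01 01:00:00", "2018-01-01 02:00:00"]
)
broken = pd.DataFrame(
    {"A": [0.766895, 0.605812, 0.623123], "B": [1.142639, 0.890286, 1.053022]},
    index=stamps,
)

fixed = index_to_column(broken)
print(fixed)
```

The helper returns the frame unchanged when the index is already the default, so it can be applied to every sheet regardless of whether cell A1 was empty.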
Author by

Bill

My goal is to identify and lead initiatives to transform industrial operations that consume energy and materials, increasing productivity and reducing environmental impacts. I am interested in: management control and reporting, data-driven decision-making, organizational learning, data analytics, industrial process control and optimization, coaching and training, and high-level programming languages (e.g. Python).

Updated on June 10, 2022

Comments

  • Bill
    Bill almost 2 years

    I'm trying to read an Excel file into a data frame and I want to set the index later, so I don't want pandas to use column 0 for the index values.

    By default (index_col=None), it shouldn't use column 0 for the index, but I find that it does if there is no value in cell A1 of the worksheet.

    Is there any way to override this behaviour (I am loading many sheets that have no value in cell A1)?

    This works as expected when test1.xlsx has the value "DATE" in cell A1:

    In [19]: pd.read_excel('test1.xlsx')                                             
    Out[19]: 
                     DATE         A         B         C
    0 2018-01-01 00:00:00  0.766895  1.142639  0.810603
    1 2018-01-01 01:00:00  0.605812  0.890286  0.810603
    2 2018-01-01 02:00:00  0.623123  1.053022  0.810603
    3 2018-01-01 03:00:00  0.740577  1.505082  0.810603
    4 2018-01-01 04:00:00  0.335573 -0.024649  0.810603
    

    But when the worksheet has no value in cell A1, it automatically assigns column 0 values to the index:

    In [20]: pd.read_excel('test2.xlsx', index_col=None)                             
    Out[20]: 
                                A         B         C
    2018-01-01 00:00:00  0.766895  1.142639  0.810603
    2018-01-01 01:00:00  0.605812  0.890286  0.810603
    2018-01-01 02:00:00  0.623123  1.053022  0.810603
    2018-01-01 03:00:00  0.740577  1.505082  0.810603
    2018-01-01 04:00:00  0.335573 -0.024649  0.810603
    

    This is not what I want.

    Desired result: Same as first example (but with 'Unnamed' as the column label perhaps).

    Documentation says

    index_col : int, list of int, default None.

    Column (0-indexed) to use as the row labels of the DataFrame. Pass None if there is no such column.

  • DataGirl
    DataGirl over 4 years
    I'm using pandas version 0.25.3 and it is happening to me. Any thoughts?
  • Xukrao
    Xukrao over 4 years
    @DataGirl No. I do not experience this issue in pandas 0.25.3.
  • Bill
    Bill over 4 years
    This is not the correct answer. Using index_col=0 will cause pd.read_excel to use the first column as the index which is what I am trying to avoid (See first sentence of the question).
  • jabbiez
    jabbiez over 4 years
    Sorry for not reading carefully. I think you can drop the index when you save or convert to other formats, e.g. df.to_excel(filename, index=False). I don't find index_col=None working as expected either.
  • YoungSheldon
    YoungSheldon over 4 years
    I'm using pandas version 0.25.1, but the issue doesn't seem to be resolved. I am getting the index column, no matter what!
  • Xukrao
    Xukrao over 4 years
    @ChintanGotecha In that case you should report the issue at github.com/pandas-dev/pandas/issues.