Accessing a Pandas index like a regular column


Solution 1

Index has a special meaning in Pandas. It's used to optimise specific operations and can be used in various methods such as merging / joining data. Therefore, make a choice:

  • If it's "just another column", use reset_index and treat it as another column.
  • If it's genuinely used for indexing, keep it as an index and use df.index.

We can't make this choice for you. It should be dependent on the structure of your underlying data and on how you intend to analyse your data.
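As a rough illustration of the two options above, here is a minimal sketch. The frame is modelled on the question's example; the `value` numbers are placeholders chosen for this sketch, not data from the original post.

```python
import pandas as pd

# Small frame with a named index, modelled on the question's example.
# The numeric values are placeholders for illustration only.
df = pd.DataFrame(
    {"id": range(10000, 10005), "value": [0.1, 0.2, 0.3, 0.4, 0.5]},
    index=pd.Index(list("abcde"), name="name"),
)

# Option 1: "just another column". reset_index returns a new frame in which
# the index level has become an ordinary column; df itself is unchanged.
flat = df.reset_index()
names_as_column = flat["name"]

# Option 2: keep it as an index and read it through df.index.
names_as_index = df.index
```

Note that `reset_index` is not in-place by default, so the original `df` keeps its index and can still be used for index-based operations.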

For more information on use of a dataframe index, see:

Solution 2

You could also use df.index.get_level_values if you need to access an index level by name. It works with both a plain named index and hierarchical indices (MultiIndex).

>>> df.index.get_level_values('name')
Index(['a', 'b', 'c', 'd', 'e'], dtype='object', name='name')
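Building on get_level_values, one possible way to let a function accept either a column name or an index name is a small lookup helper. This is only a sketch: `get_column_or_level` is a hypothetical name introduced here, not a pandas function, and it assumes column names and index names do not collide (columns win if they do).

```python
import pandas as pd


def get_column_or_level(df, name):
    """Return the values labelled `name`, whether a column or an index level.

    Hypothetical helper for illustration; not part of pandas.
    """
    if name in df.columns:
        return df[name]
    if name in df.index.names:
        # Wrap the level values in a Series that shares the frame's index,
        # so callers get a Series in both cases.
        return pd.Series(df.index.get_level_values(name), index=df.index, name=name)
    raise KeyError(name)


df = pd.DataFrame(
    {"id": range(10000, 10005)},
    index=pd.Index(list("abcde"), name="name"),
)

names = get_column_or_level(df, "name")  # taken from the index
ids = get_column_or_level(df, "id")      # taken from a regular column
```

A library function that takes a column name as a string could call such a helper internally instead of `df[name]`, which leaves the caller's index untouched.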

Solution 3

Instead of using reset_index, you could just copy the index to a normal column, do some work and then drop the column, for example:

df['tmp'] = df.index
# do stuff based on df['tmp']
del df['tmp']
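A variation on the same idea, sketched here as an alternative rather than as the answerer's method, is to use assign, which returns a new frame and so avoids having to delete the temporary column afterwards. The frame below is a placeholder modelled on the question's example.

```python
import pandas as pd

df = pd.DataFrame(
    {"id": range(10000, 10005)},
    index=pd.Index(list("abcde"), name="name"),
)

# assign returns a new frame with the extra column; df is left untouched,
# so there is nothing to clean up afterwards.
with_tmp = df.assign(tmp=df.index)

# do stuff based on with_tmp['tmp']
```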

Author: kuzzooroo

Updated on July 09, 2022

Comments

  • kuzzooroo
    kuzzooroo almost 2 years

    I have a Pandas DataFrame with a named index. I want to pass it off to a piece of code that takes a DataFrame, a column name, and some other stuff, and does a bunch of work involving that column. Only in this case the column I want to highlight is the index, and giving the index's label to this piece of code doesn't work because you can't extract an index like you can a regular column. For example, I can construct a DataFrame like this:

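    import numpy as np
    import pandas as pd
    
    # list(...) materialises the map iterator so the column is built reliably on Python 3
    df = pd.DataFrame({'name': list(map(chr, range(97, 102))), 'id': range(10000, 10005), 'value': np.random.randn(5)})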
    df.set_index('name', inplace=True)
    

    Here's the result:

             id     value
    name                 
    a     10000  0.659710
    b     10001  1.001821
    c     10002 -0.197576
    d     10003 -0.569181
    e     10004 -0.882097
    

    Now how do I go about accessing the name column?

    print(df.index)  # No problem
    print(df['name'])  # KeyError: u'name'
    

    I know there are workarounds like duplicating the column or changing the index to something else. But is there something cleaner, like some form of column access that treats the index the same way as everything else?

  • kuzzooroo
    kuzzooroo over 5 years
    Say I have a library function that takes a DataFrame and creates a scatter plot based on it. It labels points in the plot based on the column of your choice, currently specified as a string. Now a use case has come up where it would be useful for the labels to be based on the index of a certain DataFrame. The index of this DataFrame is undoubtedly special, as you say. It's just in the context of this one function where it would be convenient to treat the index like a regular column, and I'm wondering if it can be done transparently.
  • jpp
    jpp over 5 years
    @kuzzooroo, I suggest you ask a separate question with a minimal reproducible example of the problem you are facing. The example you gave in your question, for example, doesn't show any problem with using df.index. The methods available to pd.Index objects are different to those available to pd.Series objects so we need to see your code to determine the issue.
  • Peruz
    Peruz about 4 years
    I like this solution too. In the end, column and index serve different purposes, so just keep both if needed.
  • guibar
    guibar over 3 years
    One wonders what the point of giving the index a name is, if it can't be used as such like other column names ...
  • jpp
    jpp over 3 years
    @guibar, you can with pd.DataFrame.query
  • Keto
    Keto almost 3 years
    This is not really a solution and your answer is out of scope. So what you're implying is that "no, it's not possible because Pandas did not build an interface to interact with the index like a column". If yes, then let this be the answer. Now a natural follow-up question is: why not? We've seen other software being able to do this, like SQL.
  • william_grisaitis
    william_grisaitis about 2 years
    this is not an answer to the OP's question. it's a justification of why the problem exists in the first place... and as a pandas user of more than a decade now, it still confuses me why the API works this way.
  • jpp
    jpp about 2 years
    @grisaitis, you have a couple of options: provide a better answer, or propose a Pandas code update to the development team.
  • william_grisaitis
    william_grisaitis about 2 years
    @jpp thanks. i've upvoted answers and github issues that i believe do that.
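
jpp's comment above points to pd.DataFrame.query as a place where a named index can be referenced by name. A minimal sketch of that, using a placeholder frame modelled on the question's example:

```python
import pandas as pd

df = pd.DataFrame(
    {"id": range(10000, 10005)},
    index=pd.Index(list("abcde"), name="name"),
)

# Inside the query expression, the index name can be used like a column name.
subset = df.query("name in ['a', 'c']")
```

This only covers row filtering; it does not make `df['name']` work, so it addresses the "what is the index name for" comment rather than the original question as a whole.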