Pandas: Change a specific column name in dataframe having multilevel columns

36,573

Solution 1

This is my theory

pandas does not want pd.Indexs to be mutable. We can see this if we try to change the first element of the index ourselves

dataDF.columns[0] = ('Z', '100', 'Z')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-32-2c0b76762235> in <module>()
----> 1 dataDF.columns[0] = ('Z', '100', 'Z')
//anaconda/envs/3.5/lib/python3.5/site-packages/pandas/indexes/base.py in __setitem__(self, key, value)
   1372 
   1373     def __setitem__(self, key, value):
-> 1374         raise TypeError("Index does not support mutable operations")
   1375 
   1376     def __getitem__(self, key):
TypeError: Index does not support mutable operations

But pandas can't control what you do the values attribute.

dataDF.columns.values[0] = ('Z', '100', 'Z')

we see that dataDF.columns looks the same, but dataDF.columns.values clearly reflects the change. Unfortunately, df.columns.values isn't what shows up on the display of the dataframe.


On the other hand, this really does seem like it should work. The fact that it doesn't feels wrong to me.

dataDF.rename(columns={('A', '1', 'I'): ('Z', '100', 'Z')}, inplace=True)

I believe the reason this only works after having changed the values, is that rename is forcing the reconstruction of the columns by looking at the values. Since we change the values, it now works. This is exceptionally kludgy and I don't recommend building a process that relies on this.


my recommendation

  • identify location of column name you want to change
  • assign name of column to the array of values
  • build new columns from scratch, explicity

from_col = ('A', '1', 'I')
to_col = ('Z', '100', 'Z')
colloc = dataDF.columns.get_loc(from_col)
cvals = dataDF.columns.values
cvals[colloc] = to_col

dataDF.columns = pd.MultiIndex.from_tuples(cvals.tolist())

dataDF

enter code here

Solution 2

I came across this question as I was myself trying to find the solution for renaming the column names in a data frame with multiple levels. I tried the solution provided by @Dark Matter since it appeared to be very simple solution:

dataDF.columns.levels = [[u'Z', u'B', u'C', u'D', u'E'], [u'100', u'2', u'3', u'4', u'5'], [u'Z', u'II', u'III']]

But an error message was displayed:

C:\anaconda3\lib\site-packages\ipykernel_launcher.py:1: FutureWarning: setting `levels` directly is deprecated. Use set_levels instead
  """Entry point for launching an IPython kernel.

It appears that it worked but does not work anymore. So I used:

dataDF.columns.set_levels([['Z', 'B', 'C', 'D', 'E'],
                           ['100', '2', '3', '4', '5'],
                           ['Z', 'II', 'III']],
                          [0, 1, 2], inplace=True)

Result: dataDF

Z   B   C   D   E
100 2   3   4   5
Z   II  Z   II  III
0   1   1   1   1   1
1   2   2   2   2   2
2   3   3   3   3   3
3   4   4   4   4   4
4   5   5   5   5   5

Solution 3

You can simply change it like DF.columns.levels=[[u'Z', u'B', u'C', u'D', u'E'],[u'5', u'2', u'3', u'4', u'5'],[u'IIIIII', u'II', u'III']]

Share:
36,573
sh.jeon
Author by

sh.jeon

Updated on July 29, 2022

Comments

  • sh.jeon
    sh.jeon almost 2 years

    I want to find the way change name of specific column in a multilevel dataframe.

    With this data:

    data = {
        ('A', '1', 'I'): [1, 2, 3, 4, 5], 
        ('B', '2', 'II'): [1, 2, 3, 4, 5], 
        ('C', '3', 'I'): [1, 2, 3, 4, 5], 
        ('D', '4', 'II'): [1, 2, 3, 4, 5], 
        ('E', '5', 'III'): [1, 2, 3, 4, 5], 
    }
    
    dataDF = pd.DataFrame(data)
    

    This code not working:

    dataDF.rename(columns = {('A', '1', 'I'):('Z', '100', 'Z')}, inplace=True)
    

    Result:

        A   B   C   D   E
        1   2   3   4   5
        I   II  I   II  III
    0   1   1   1   1   1
    1   2   2   2   2   2
    2   3   3   3   3   3
    3   4   4   4   4   4
    4   5   5   5   5   5
    

    And also not:

    dataDF.columns.values[0] = ('Z', '100', 'Z')
    

    Result:

        A   B   C   D   E
        1   2   3   4   5
        I   II  I   II  III
    0   1   1   1   1   1
    1   2   2   2   2   2
    2   3   3   3   3   3
    3   4   4   4   4   4
    4   5   5   5   5   5
    

    But with combination of above codes working!!!

    dataDF.columns.values[0] = ('Z', '100', 'Z')
    dataDF.rename(columns = {('A', '1', 'I'):('Z', '100', 'Z')}, inplace=True)
    dataDF
    

    Result:

        Z   B   C   D   E
        100 2   3   4   5
        Z   II  I   II  III
    0   1   1   1   1   1
    1   2   2   2   2   2
    2   3   3   3   3   3
    3   4   4   4   4   4
    4   5   5   5   5   5
    

    Is this bug of Pandas?

  • Dark Matter
    Dark Matter over 7 years
    still I am not sure if its bug as your saying.
  • Jon
    Jon almost 3 years
    dataDF.columns.values[0] = ('Z', '100', 'Z') works for me. Printing dataDF.columns afterwards correctly reflects the change. What am I missing here? Did pandas fix this in a newer version?