Pandas KeyError using pivot

13,155

You want .pivot_table, not .pivot.

import pandas
from io import StringIO

x = StringIO("""\
Date    Stage   SubStage    Value
12/31/2015   1.00   a   0.896882891
1/1/2016     1.00   a   0.0458843
1/2/2016     1.00   a   0.126805588
1/3/2016     1.00   b   0.615824461
1/4/2016     1.00   b   0.245092069
1/5/2016     1.00   c   0.121936318
1/6/2016     1.00   c   0.170198128
1/7/2016     1.00   c   0.735872415
1/8/2016     1.00   c   0.542361912
1/4/2016     2.00   a   0.723769247
1/5/2016     2.00   a   0.305570257
1/6/2016     2.00   b   0.47461605
1/7/2016     2.00   b   0.173702623
1/8/2016     2.00   c   0.969260251
1/9/2016     2.00   c   0.017170798
""")

df = pandas.read_table(x, sep='\s+')
xtab = df.pivot_table(index='Date', columns=['Stage','SubStage'], values='Value')
print(xtab.to_string(na_rep='--'))

And that gives me:

Stage            1.0                           2.0                    
SubStage           a         b         c         a         b         c
Date                                                                  
1/1/2016    0.045884        --        --        --        --        --
1/2/2016    0.126806        --        --        --        --        --
1/3/2016          --  0.615824        --        --        --        --
1/4/2016          --  0.245092        --  0.723769        --        --
1/5/2016          --        --  0.121936  0.305570        --        --
1/6/2016          --        --  0.170198        --  0.474616        --
1/7/2016          --        --  0.735872        --  0.173703        --
1/8/2016          --        --  0.542362        --        --  0.969260
1/9/2016          --        --        --        --        --  0.017171
12/31/2015  0.896883        --        --        --        --        --
Share:
13,155
trob
Author by

trob

Updated on June 14, 2022

Comments

  • trob
    trob almost 2 years

    I'm new to Python and I would like to use Python to replicate a common excel task. If such a question has already been answered, please let me know. I've been unable to find it. I have the following pandas dataframe (data):

    Date    Stage   SubStage    Value
    12/31/2015   1.00   a   0.896882891
    1/1/2016     1.00   a   0.0458843
    1/2/2016     1.00   a   0.126805588
    1/3/2016     1.00   b   0.615824461
    1/4/2016     1.00   b   0.245092069
    1/5/2016     1.00   c   0.121936318
    1/6/2016     1.00   c   0.170198128
    1/7/2016     1.00   c   0.735872415
    1/8/2016     1.00   c   0.542361912
    1/4/2016     2.00   a   0.723769247
    1/5/2016     2.00   a   0.305570257
    1/6/2016     2.00   b   0.47461605
    1/7/2016     2.00   b   0.173702623
    1/8/2016     2.00   c   0.969260251
    1/9/2016     2.00   c   0.017170798
    

    In excel, I can use a pivot table to produce the following:

    excel pivot table using 'data'

    It seems reasonable to do the following in python:

    data.pivot(index='Date',
               columns=['Stage', 'SubStage'],
               values='Value')
    

    But that produces:

    KeyError: 'Level Stage not found'
    

    What gives?

    • Paul H
      Paul H almost 8 years
      do your column labels have trailing or leading whitespace? check by inspecting with data.columns
    • trob
      trob almost 8 years
      They do not: data.columns Index(['Date', 'Stage', 'SubStage', 'Value'], dtype='object')
  • trob
    trob almost 8 years
    This is helpful. Generally speaking, is pivot_table as opposed to pivot the agreed upon best practice? Why in the world are there two functions for the same concept?
  • Paul H
    Paul H almost 8 years
    They serve different purposes, but I couldn't tell you what they are. I always stack/unstack with a multi-level index to do something like this. @trob
  • trob
    trob almost 8 years
    Excellent. Thank you!