Filtering and selecting from pivot tables made with python pandas

42,497

Solution 1

Pivot table returns a DataFrame so you can simply filter by doing:

In [15]: pivoted = pivot_table(df, values='D', rows=['A', 'B'], cols=['C'])

In [16]: pivoted[pivoted.foo < 0]
Out[16]: 
C             bar       foo
A     B                    
one   A -0.412628 -1.062175
three B       NaN -0.562207
two   A       NaN -0.007245

You can use something like

pivoted.ix['one']

to select all A series groups

or

pivoted.ix['one', 'A']

to select distinct A and B series groups

Solution 2

Just to add info at the previous answer. When you try to use pivoted.ix['one'] in python3 you get the following message:

/usr/lib/python3.7/site-packages/ipykernel_launcher.py:7: DeprecationWarning: .ix is deprecated. Please use .loc for label based indexing or .iloc for positional indexing

See the documentation here: http://pandas.pydata.org/pandas-docs/stable/indexing.html#ix-indexer-is-deprecated import sys

So, in this version of python, use the following code instead:

pivoted.loc['one']

Share:
42,497
alexhli
Author by

alexhli

Updated on July 09, 2022

Comments

  • alexhli
    alexhli almost 2 years

    I'm struggling with hierarchical indexes in the Python pandas package. Specifically I don't understand how to filter and compare data in rows after it has been pivoted.

    Here is the example table from the documentation:

    import pandas as pd
    import numpy as np
    
    In [1027]: df = pd.DataFrame({'A' : ['one', 'one', 'two', 'three'] * 6,
                                  'B' : ['A', 'B', 'C'] * 8,
                                  'C' : ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 4,
                                  'D' : np.random.randn(24),
                                  'E' : np.random.randn(24)})
    
    In [1029]: pd.pivot_table(df, values='D', rows=['A', 'B'], cols=['C'])
    Out[1029]: 
        C             bar       foo
        A     B                    
        one   A -1.154627 -0.243234
              B -1.320253 -0.633158
              C  1.188862  0.377300
        three A -1.327977       NaN
              B       NaN -0.079051
              C -0.832506       NaN
        two   A       NaN -0.128534
              B  0.835120       NaN
              C       NaN  0.838040
    

    I would like to analyze as follows:

    1) Filter this table on column attributes, for example selecting rows with negative foo:

        C             bar       foo
        A     B                    
        one   A -1.154627 -0.243234
              B -1.320253 -0.633158
        three B       NaN -0.079051
        two   A       NaN -0.128534
    

    2) Compare the remaining B series values between the distinct A series groups? I am not sure how to access this information: {'one':['A','B'], 'two':['A'], 'three':['B']} and determine which series B values are unique to each key, or seen in multiple key groups, etc

    Is there a way to do this directly within the pivot table structure, or do I need to convert this back in to a pandas dataframe?

    Update: I think this code is a step in the right direction. It at least lets me access individual values within this table, but I am still hard-coding the series vales:

    table = pivot_table(df, values='D', rows=['A', 'B'], cols=['C'])
    table.ix['one', 'A']