Filtering and selecting from pivot tables made with python pandas
Solution 1
Pivot table returns a DataFrame so you can simply filter by doing:
In [15]: pivoted = pivot_table(df, values='D', rows=['A', 'B'], cols=['C'])
In [16]: pivoted[pivoted.foo < 0]
Out[16]:
C bar foo
A B
one A -0.412628 -1.062175
three B NaN -0.562207
two A NaN -0.007245
You can use something like
pivoted.ix['one']
to select all A series groups
or
pivoted.ix['one', 'A']
to select distinct A and B series groups
Solution 2
Just to add info at the previous answer. When you try to use pivoted.ix['one']
in python3 you get the following message:
/usr/lib/python3.7/site-packages/ipykernel_launcher.py:7: DeprecationWarning: .ix is deprecated. Please use .loc for label based indexing or .iloc for positional indexing
See the documentation here: http://pandas.pydata.org/pandas-docs/stable/indexing.html#ix-indexer-is-deprecated import sys
So, in this version of python, use the following code instead:
pivoted.loc['one']
alexhli
Updated on July 09, 2022Comments
-
alexhli almost 2 years
I'm struggling with hierarchical indexes in the Python
pandas
package. Specifically I don't understand how to filter and compare data in rows after it has been pivoted.Here is the example table from the documentation:
import pandas as pd import numpy as np In [1027]: df = pd.DataFrame({'A' : ['one', 'one', 'two', 'three'] * 6, 'B' : ['A', 'B', 'C'] * 8, 'C' : ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 4, 'D' : np.random.randn(24), 'E' : np.random.randn(24)}) In [1029]: pd.pivot_table(df, values='D', rows=['A', 'B'], cols=['C']) Out[1029]: C bar foo A B one A -1.154627 -0.243234 B -1.320253 -0.633158 C 1.188862 0.377300 three A -1.327977 NaN B NaN -0.079051 C -0.832506 NaN two A NaN -0.128534 B 0.835120 NaN C NaN 0.838040
I would like to analyze as follows:
1) Filter this table on column attributes, for example selecting rows with negative
foo
:C bar foo A B one A -1.154627 -0.243234 B -1.320253 -0.633158 three B NaN -0.079051 two A NaN -0.128534
2) Compare the remaining
B
series values between the distinctA
series groups? I am not sure how to access this information:{'one':['A','B'], 'two':['A'], 'three':['B']}
and determine which seriesB
values are unique to each key, or seen in multiple key groups, etcIs there a way to do this directly within the pivot table structure, or do I need to convert this back in to a
pandas
dataframe
?Update: I think this code is a step in the right direction. It at least lets me access individual values within this table, but I am still hard-coding the series vales:
table = pivot_table(df, values='D', rows=['A', 'B'], cols=['C']) table.ix['one', 'A']