Pivot Tables or Group By for Pandas?
Here are a couple of ways to reshape your data df:
In [27]: df
Out[27]:
Col X Col Y
0 class 1 cat 1
1 class 2 cat 1
2 class 3 cat 2
3 class 2 cat 3
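The session above starts from an existing df. To reproduce it, the frame can be rebuilt from the values shown; this is a minimal sketch that assumes both columns hold plain strings, which the original post does not state explicitly.

```python
import pandas as pd

# Sample data copied from the Out[27] display above.
df = pd.DataFrame({
    "Col X": ["class 1", "class 2", "class 3", "class 2"],
    "Col Y": ["cat 1", "cat 1", "cat 2", "cat 3"],
})
print(df)
```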
1) Using pd.crosstab()
In [28]: pd.crosstab(df['Col X'], df['Col Y'])
Out[28]:
Col Y cat 1 cat 2 cat 3
Col X
class 1 1 0 0
class 2 1 0 1
class 3 0 1 0
2) Or, use groupby on 'Col X','Col Y' with unstack over Col Y, then fill NaNs with zeros.
In [29]: df.groupby(['Col X','Col Y']).size().unstack('Col Y', fill_value=0)
Out[29]:
Col Y cat 1 cat 2 cat 3
Col X
class 1 1 0 0
class 2 1 0 1
class 3 0 1 0
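As a quick sanity check, the crosstab and groupby results can be compared directly. The sketch below rebuilds the sample frame from the values in In [27] (assumed to be strings) and compares the two count tables cell by cell; the variable names are only illustrative.

```python
import pandas as pd

# Sample data copied from the Out[27] display above.
df = pd.DataFrame({
    "Col X": ["class 1", "class 2", "class 3", "class 2"],
    "Col Y": ["cat 1", "cat 1", "cat 2", "cat 3"],
})

by_crosstab = pd.crosstab(df["Col X"], df["Col Y"])
by_groupby = df.groupby(["Col X", "Col Y"]).size().unstack("Col Y", fill_value=0)

# Compare row labels, column labels, and the counts themselves.
same_labels = (
    list(by_crosstab.index) == list(by_groupby.index)
    and list(by_crosstab.columns) == list(by_groupby.columns)
)
same_counts = by_crosstab.to_numpy().tolist() == by_groupby.to_numpy().tolist()
print(same_labels and same_counts)
```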
3) Or, use pd.pivot_table() with index=Col X, columns=Col Y
In [30]: pd.pivot_table(df, index=['Col X'], columns=['Col Y'], aggfunc=len, fill_value=0)
Out[30]:
Col Y cat 1 cat 2 cat 3
Col X
class 1 1 0 0
class 2 1 0 1
class 3 0 1 0
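One caveat: this frame has no columns besides the two grouping keys, so aggfunc=len has nothing to aggregate, and depending on the pandas version the call may not reproduce the output above. A variant that counts rows per group directly is aggfunc='size'. The sketch below assumes a reasonably recent pandas and the sample data from In [27].

```python
import pandas as pd

# Sample data copied from the Out[27] display above.
df = pd.DataFrame({
    "Col X": ["class 1", "class 2", "class 3", "class 2"],
    "Col Y": ["cat 1", "cat 1", "cat 2", "cat 3"],
})

# 'size' counts the rows in each (Col X, Col Y) group, so no value column is needed.
counts = df.pivot_table(index="Col X", columns="Col Y", aggfunc="size", fill_value=0)
print(counts)
```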
4) Or, use set_index with unstack
In [492]: df.assign(v=1).set_index(['Col X', 'Col Y'])['v'].unstack(fill_value=0)
Out[492]:
Col Y cat 1 cat 2 cat 3
Col X
class 1 1 0 0
class 2 1 0 1
class 3 0 1 0
Author: SteelyDanish
Updated on September 09, 2020
Comments
-
SteelyDanish over 3 years
I have a hopefully straightforward question that has been giving me a lot of difficulty for the last 3 hours. It should be easy.
Here's the challenge.
I have a pandas dataframe:
+--------------------------+
| Col 'X'      Col 'Y'     |
+--------------------------+
| class 1      cat 1       |
| class 2      cat 1       |
| class 3      cat 2       |
| class 2      cat 3       |
+--------------------------+
What I am looking to transform the dataframe into:
+------------------------------------------+
|            cat 1     cat 2     cat 3     |
+------------------------------------------+
| class 1    1         0         0         |
| class 2    1         0         1         |
| class 3    0         1         0         |
+------------------------------------------+
Where the values are value counts. Anybody have any insight? Thanks!
-
SteelyDanish almost 9 years
Thanks John - that was incredibly helpful, especially providing different possibilities! I didn't even think of the cross tab possibility.
-
Waylon Walker almost 7 years
Thanks for the comparison of all three. I default to groupby, and often see pivot_table used.
-
Fabian Bosler over 6 years
Came across this because I was trying to figure out the difference between groupby and pivot_table and when to use which. Your answer was certainly helpful. Do you know of any easily comprehensible information on the different concepts? Cheers
-
Bruno Feroleto almost 6 years
Warning: the last method (set_index and unstack) does not generally work: it fails when there is a duplicate line in the original data.
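The failure described in this warning can be reproduced with a small frame containing a repeated row. The data below is made up for illustration (it is not from the original question); the sketch shows unstack raising a ValueError on the duplicated index pair, while pd.crosstab simply counts the pair twice.

```python
import pandas as pd

# Hypothetical data with one duplicated (Col X, Col Y) pair.
dup = pd.DataFrame({
    "Col X": ["class 1", "class 1", "class 2"],
    "Col Y": ["cat 1", "cat 1", "cat 2"],
})

try:
    dup.assign(v=1).set_index(["Col X", "Col Y"])["v"].unstack(fill_value=0)
    unstack_failed = False
except ValueError as exc:
    # unstack cannot reshape an index that contains duplicate entries.
    unstack_failed = True
    print("unstack failed:", exc)

# crosstab aggregates duplicates instead of failing.
counts = pd.crosstab(dup["Col X"], dup["Col Y"])
print(counts)
```

The groupby/size approach behaves like crosstab here, since it also aggregates before reshaping.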