Pivot Tables or Group By for Pandas?

Here are a couple of ways to reshape your DataFrame df into a table of counts.

In [27]: df
Out[27]:
     Col X  Col Y
0  class 1  cat 1
1  class 2  cat 1
2  class 3  cat 2
3  class 2  cat 3
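
To make the snippets reproducible, the frame above can be rebuilt from its printed rows. This is a minimal sketch; the construction call is an assumption, since the original only shows the printed output.

```python
import pandas as pd

# Rebuild the four-row example frame exactly as printed above.
df = pd.DataFrame({
    "Col X": ["class 1", "class 2", "class 3", "class 2"],
    "Col Y": ["cat 1", "cat 1", "cat 2", "cat 3"],
})
print(df)
```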

1) Using pd.crosstab()

In [28]: pd.crosstab(df['Col X'], df['Col Y'])
Out[28]:
Col Y    cat 1  cat 2  cat 3
Col X
class 1      1      0      0
class 2      1      0      1
class 3      0      1      0

2) Or, use groupby on 'Col X','Col Y' with unstack over Col Y, then fill NaNs with zeros.

In [29]: df.groupby(['Col X','Col Y']).size().unstack('Col Y', fill_value=0)
Out[29]:
Col Y    cat 1  cat 2  cat 3
Col X
class 1      1      0      0
class 2      1      0      1
class 3      0      1      0
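
The groupby route is easier to follow when split into its steps. The sketch below rebuilds the example frame (the constructor call and the names sizes, wide and filled are illustrative assumptions) and shows why fill_value=0 is needed: pairs that never occur come out of unstack as NaN.

```python
import pandas as pd

df = pd.DataFrame({
    "Col X": ["class 1", "class 2", "class 3", "class 2"],
    "Col Y": ["cat 1", "cat 1", "cat 2", "cat 3"],
})

# Step 1: count rows per (Col X, Col Y) pair; the result is a Series
# indexed by a two-level MultiIndex.
sizes = df.groupby(["Col X", "Col Y"]).size()

# Step 2: move the 'Col Y' level into the columns. Pairs that never
# occur in the data show up as NaN.
wide = sizes.unstack("Col Y")

# Step 3: same reshape, but fill the missing pairs with 0 directly.
filled = sizes.unstack("Col Y", fill_value=0)

print(sizes)
print(wide)
print(filled)
```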

3) Or, use pd.pivot_table() with index=Col X, columns=Col Y

In [30]: pd.pivot_table(df, index=['Col X'], columns=['Col Y'], aggfunc=len, fill_value=0)
Out[30]:
Col Y    cat 1  cat 2  cat 3
Col X
class 1      1      0      0
class 2      1      0      1
class 3      0      1      0
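
A variant worth knowing: pivot_table also accepts aggfunc='size', which counts rows per group and does not need a separate value column to aggregate. This is a sketch on a rebuilt copy of the example frame, not the call used above; behaviour of aggfunc=len without value columns may differ between pandas versions, so it can be worth trying this spelling if the call above does not give the expected table.

```python
import pandas as pd

df = pd.DataFrame({
    "Col X": ["class 1", "class 2", "class 3", "class 2"],
    "Col Y": ["cat 1", "cat 1", "cat 2", "cat 3"],
})

# 'size' counts the rows in each (Col X, Col Y) group;
# fill_value=0 replaces the pairs that never occur.
counts = df.pivot_table(index="Col X", columns="Col Y",
                        aggfunc="size", fill_value=0)
print(counts)
```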

4) Or, use set_index with unstack

In [492]: df.assign(v=1).set_index(['Col X', 'Col Y'])['v'].unstack(fill_value=0)
Out[492]:
Col Y    cat 1  cat 2  cat 3
Col X
class 1      1      0      0
class 2      1      0      1
class 3      0      1      0
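
Note that method 4 relies on every ('Col X', 'Col Y') pair occurring at most once, because unstack cannot reshape an index with duplicate entries. The sketch below uses a made-up frame (df_dup is hypothetical, not from the question) with one repeated row to show the failure, and falls back to pd.crosstab, which counts duplicates instead.

```python
import pandas as pd

# Hypothetical data: the pair ("class 2", "cat 1") appears twice.
df_dup = pd.DataFrame({
    "Col X": ["class 1", "class 2", "class 2"],
    "Col Y": ["cat 1", "cat 1", "cat 1"],
})

try:
    df_dup.assign(v=1).set_index(["Col X", "Col Y"])["v"].unstack(fill_value=0)
    unstack_failed = False
except ValueError as exc:
    # unstack refuses to reshape when the index has duplicate entries.
    unstack_failed = True
    print("set_index + unstack failed:", exc)

# crosstab counts occurrences, so the repeated pair simply yields 2.
counts = pd.crosstab(df_dup["Col X"], df_dup["Col Y"])
print(counts)
```

Methods 1 to 3 all count rows per pair, so they are the safer choice when the data may contain repeated rows.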
Author: SteelyDanish

Updated on September 09, 2020

Comments

  • SteelyDanish
    SteelyDanish over 3 years

    I have a hopefully straightforward question that has been giving me a lot of difficulty for the last 3 hours. It should be easy.

    Here's the challenge.

    I have a pandas dataframe:

    +--------------------------+
    |     Col 'X'    Col 'Y'  |
    +--------------------------+
    |     class 1      cat 1  |
    |     class 2      cat 1  |
    |     class 3      cat 2  |
    |     class 2      cat 3  |
    +--------------------------+
    

    What I am looking to transform the dataframe into:

    +------------------------------------------+
    |                  cat 1    cat 2    cat 3 |
    +------------------------------------------+
    |     class 1         1        0        0  |
    |     class 2         1        0        1  |
    |     class 3         0        1        0  |
    +------------------------------------------+
    

    Where the values are value counts. Anybody have any insight? Thanks!

  • SteelyDanish
    SteelyDanish almost 9 years
    Thanks John - that was incredibly helpful, especially providing different possibilities! I didn't even think of the cross tab possibility.
  • Waylon Walker
    Waylon Walker almost 7 years
    Thanks for the comparison of all three. I default to groupby, and often see pivot_table used.
  • Fabian Bosler
    Fabian Bosler over 6 years
    Came across this because I was trying to figure out the difference between groupby and pivot_table and when to use which. Your answer was certainly helpful. Do you know of any easily comprehensible information on the different concepts? Cheers
  • Bruno Feroleto
    Bruno Feroleto almost 6 years
    Warning: the last method (set_index and unstack) does not generally work: it fails when there is a duplicate line in the original data.