Group by two columns and count the occurrences of each combination in Pandas

57,155

Solution 1

Maybe this is what you want?

>>> data = pd.DataFrame({'user_id' : ['a1', 'a1', 'a1', 'a2','a2','a2','a3','a3','a3'], 'product_id' : ['p1','p1','p2','p1','p1','p1','p2','p2','p3']})
>>> count_series = data.groupby(['user_id', 'product_id']).size()
>>> count_series
user_id  product_id
a1       p1            2
         p2            1
a2       p1            3
a3       p2            2
         p3            1
dtype: int64
>>> new_df = count_series.to_frame(name = 'size').reset_index()
>>> new_df
  user_id product_id  size
0      a1         p1     2
1      a1         p2     1
2      a2         p1     3
3      a3         p2     2
4      a3         p3     1
>>> new_df['size']
0    2
1    1
2    3
3    2
4    1
Name: size, dtype: int64

Solution 2

In Pandas 1.1.0 you can use the method value_counts with DataFrames:

df.value_counts()

Output:

product_id  user_id
p1          a2         3
p2          a3         2
p1          a1         2
p3          a3         1
p2          a1         1

If you need a DataFrame:

df.value_counts().to_frame('counts').reset_index()

Output:

  product_id user_id  counts
0         p1      a2       3
1         p2      a3       2
2         p1      a1       2
3         p3      a3       1
4         p2      a1       1
Share:
57,155
chessosapiens
Author by

chessosapiens

Updated on December 28, 2020

Comments

  • chessosapiens
    chessosapiens over 2 years

    I have the following data frame:

    data = pd.DataFrame({'user_id' : ['a1', 'a1', 'a1', 'a2','a2','a2','a3','a3','a3'], 'product_id' : ['p1','p1','p2','p1','p1','p1','p2','p2','p3']})
    product_id  user_id
        p1       a1
        p1       a1
        p2       a1
        p1       a2
        p1       a2
        p1       a2
        p2       a3
        p2       a3
        p3       a3
    

    in real case there might be some other columns as well, but what i need to do is to group by data frame by product_id and user_id columns and count number of each combination and add it as a new column in a new dat frame

    output should be something like this:

    user_id product_id  count
    a1       p1            2
    a1       p2            1
    a2       p1            3
    a3       p2            2
    a3       p3            1
    

    I have tried the following code:

    grouped=data.groupby(['user_id','product_id']).count()
    

    but the result is:

    user_id product_id
     a1       p1
              p2
     a2       p1
     a3       p2
              p3
    

    actually the most important thing for me is to have a column names count that has the number of occurrences , i need to use the column later.

  • chessosapiens
    chessosapiens almost 7 years
    Yes ,very close , but how can i have them in a new dataframe especially those size number as a column name for example 'size'?
  • chessosapiens
    chessosapiens almost 7 years
    actually i have R background and this task can be easily done using dplyr summaries and count()