Pandas Group By and Count
Solution 1
I think you can use:
import pandas as pd

df = pd.DataFrame({'user_id':['a','a','s','s','s'],
                   'session':[4,5,4,5,5],
                   'revenue':[-1,0,1,2,1]})
print(df)
   revenue  session user_id
0       -1        4       a
1        0        5       a
2        1        4       s
3        2        5       s
4        1        5       s
a = df.groupby('user_id') \
      .agg({'session': len, 'revenue': lambda x: len(x[x>0])}) \
      .rename(columns={'session':'number sessions','revenue':'number_transactions'})
print(a)
         number sessions  number_transactions
user_id
a                      2                    0
s                      3                    3
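Or, in older pandas only (renaming with a nested dict was deprecated in 0.20 and removed in 1.0, where it raises a SpecificationError):
a = df.groupby('user_id') \
      .agg({'session':{'number sessions': len},
            'revenue':{'number_transactions': lambda x: len(x[x>0])}})
a.columns = a.columns.droplevel()
print(a)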
         number sessions  number_transactions
user_id
a                      2                    0
s                      3                    3
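On pandas 0.25 or later, the same counts can probably be obtained with named aggregation, which sets the output column names directly instead of renaming afterwards. This is only a sketch under that version assumption; the underscore name number_sessions follows the question's wording, and 'size' stands in for len (both count rows per group, including missing values).

```python
import pandas as pd

df = pd.DataFrame({'user_id': ['a', 'a', 's', 's', 's'],
                   'session': [4, 5, 4, 5, 5],
                   'revenue': [-1, 0, 1, 2, 1]})

# Named aggregation: output_column=(input_column, function).
# Assumes pandas >= 0.25.
result = df.groupby('user_id').agg(
    # number of rows (sessions) per user
    number_sessions=('session', 'size'),
    # number of rows per user whose revenue is strictly positive
    number_transactions=('revenue', lambda x: (x > 0).sum()),
)
print(result)
```

If the column names need to contain spaces, the keyword arguments can be passed as an unpacked dict, e.g. `.agg(**{'number sessions': ('session', 'size')})`.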
Solution 2
I'd use nunique for session, to not double count the same session for a particular user:
# nested-dict renaming as written here only works on pandas < 1.0
funcs = dict(session={'number sessions': 'nunique'},
             revenue={'number transactions': lambda x: x.gt(0).sum()})
df.groupby('user_id').agg(funcs)
Setup:
df = pd.DataFrame({'user_id':['a','a','s','s','s'],
                   'session':[4,5,4,5,5],
                   'revenue':[-1,0,1,2,1]})
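To make the difference between counting rows and counting distinct sessions concrete, here is a small sketch on the same setup, written with named aggregation and therefore assuming pandas 0.25 or later. The column names session_rows and distinct_sessions are made up for this illustration and do not come from the answer.

```python
import pandas as pd

df = pd.DataFrame({'user_id': ['a', 'a', 's', 's', 's'],
                   'session': [4, 5, 4, 5, 5],
                   'revenue': [-1, 0, 1, 2, 1]})

# Assumes pandas >= 0.25 (named aggregation).
result = df.groupby('user_id').agg(
    # every row counts, so a repeated session id is counted again
    session_rows=('session', 'size'),
    # each distinct session id counts once per user
    distinct_sessions=('session', 'nunique'),
    # rows with strictly positive revenue
    number_transactions=('revenue', lambda x: x.gt(0).sum()),
)
print(result)
```

User s has session 5 twice, so the row count is 3 while nunique gives 2; for user a the two numbers agree.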
Author by
Riley Hun
ML Engineer | Data Engineer | Software Engineer | Data Scientist
Updated on August 21, 2022
Comments
-
Riley Hun over 1 year
A pandas dataframe df has 3 columns:
user_id, session, revenue
What I want to do now is group df by user_id and derive 2 new columns: one called number_sessions (the number of sessions associated with a particular user_id) and another called number_transactions (the number of rows for each user_id where the revenue column has a value > 0). How do I go about doing this?
I tried doing something like this:
df.groupby('user_id')['session', 'revenue'].agg({'number sessions': lambda x: len(x.session), 'number_transactions': lambda x: len(x[x.revenue>0])})
-
Riley Hun over 7 years
Thanks so much! This is exactly the solution that I needed. Cheers!