Using Pandas to Find Minimum Values of Grouped Rows
Solution 1
To get the minimum of column A for each group, repeated on every row of that group, use transform:
df['B'] = df.groupby('group_id')['A'].transform('min')
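As a minimal sketch of how this behaves, the snippet below rebuilds columns group_id and A from the table in the question and fills B with the per-group minimum. The DataFrame construction is an assumption made for illustration; only the transform call comes from the answer.

```python
import numpy as np
import pandas as pd

# Sample data mirroring group_id and "A" from the question's table.
df = pd.DataFrame({
    "group_id": ["x", "x", "x", "y", "y", "z", "z", "z", "z"],
    "A": [np.nan, 1, 2, np.nan, 3, 10, 6, 4, 2],
})

# transform('min') computes one minimum per group and repeats it on every
# row of that group, so the result has the same length and index as df.
# NaN values in A are skipped when the minimum is computed.
df["B"] = df.groupby("group_id")["A"].transform("min")

print(df)
```

Because the result is aligned to the original index, it can be assigned directly as a new column regardless of how many rows each group has.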
Solution 2
- focus on just ['col1', 'col2', 'col3']
- see if they are equal to 1 with eq(1), equivalent to == 1
- see if any are equal to one along axis=1 with any(axis=1) (older pandas also accepted the positional form any(1))
- use loc to make the assignment
anyone = df[['col1', 'col2', 'col3']].eq(1).any(axis=1)
df.loc[anyone, 'A'] = np.nan
NumPy equivalent:
anyone = (df[['col1', 'col2', 'col3']].values == 1).any(axis=1)
df['A'] = np.where(anyone, np.nan, df['A'])
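The steps above can be sketched end to end as follows. The three-row DataFrame and the starting values in A are hypothetical placeholders (the question's formula for A is not given); the mask and the loc assignment follow the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the first row has a 1 in col1, the others have no 1.
df = pd.DataFrame({
    "col1": [1, 3, 4],
    "col2": [2, 2, 2],
    "col3": [3, 3, 3],
    "A": [5.0, 1.0, 2.0],  # placeholder values standing in for the formula
})

# Boolean Series: True where any of the three columns equals 1.
anyone = df[["col1", "col2", "col3"]].eq(1).any(axis=1)

# Overwrite A with NaN only on the flagged rows.
df.loc[anyone, "A"] = np.nan

# The remaining rows can be selected with the negated mask (~ in pandas).
untouched = df.loc[~anyone, "A"]

print(df)
```

The mask is computed once and reused, which keeps the condition in one place if more columns need to be checked later.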
Solution 3
df.groupby('group_id')['A'].min()
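To illustrate how this differs from the transform approach, here is a small sketch on assumed sample data: min() returns one value per group indexed by group_id, whereas transform('min') returns one value per original row. Mapping the aggregated result back onto group_id is one possible way to turn it into a column; it is shown here as an illustration, not as part of the original answer.

```python
import numpy as np
import pandas as pd

# Assumed sample data: two groups of different sizes.
df = pd.DataFrame({
    "group_id": ["x", "x", "x", "y", "y"],
    "A": [np.nan, 1, 2, np.nan, 3],
})

# Aggregation: one row per group, indexed by group_id.
per_group = df.groupby("group_id")["A"].min()

# Transformation: one value per original row, aligned to df's index.
per_row = df.groupby("group_id")["A"].transform("min")

# The aggregated Series can be broadcast back by looking up each row's group.
df["B"] = df["group_id"].map(per_group)

print(per_group)
print(df)
```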
shishy
Updated on September 16, 2022
Comments
-
shishy over 1 year
This might be a trivial question but I'm still trying to figure out pandas/numpy.
So, suppose I have a table with the following structure:
group_id | col1 | col2 | col3 | "A" | "B"
x        | 1    | 2    | 3    | NaN | 1
x        | 3    | 2    | 3    | 1   | 1
x        | 4    | 2    | 3    | 2   | 1
y        | 1    | 2    | 3    | NaN | 3
y        | 3    | 2    | 3    | 3   | 3
z        | 3    | 2    | 3    | 10  | 2
z        | 2    | 2    | 3    | 6   | 2
z        | 4    | 2    | 3    | 4   | 2
z        | 4    | 2    | 3    | 2   | 2
Note that there is a group_id that groups elements in each row. So at the beginning, I have the values for columns group_id and col1-col3.
Then for each row, if col1, col2, or col3 have value = 1, then "A" is NaN, otherwise the value is based on a formula (irrelevant for here so I put some numbers in place).
That, I know how to do using:
df["A"] = np.where((df['col1'] == 1) | (df['col2'] == 1) | (df['col3'] == 1), np.nan, value)
But for column "B", I need to fill it in with the minimum of values from column A for a specific group.
So for example, "B" is equal to "1" for all rows with group X because the minimum value in column A for all of the group "x" rows is equal to 1.
Similarly, for rows in group "y", the minimum value is 3, and for group "z" the minimum value is 2. How exactly do I do that using pandas...? It's confusing me a little more because the number of rows for a specific group can be of varying size.
If they were all the same size I could just say fill it with the minimum of values in a pre-set range.
I hope that made sense; please let me know if I should provide a clearer example or clarify anything!
-
shishy over 7 years
So, does my approach for that part of my question using np.where not work? Or did you suggest using .loc to make it more elegant (as opposed to manually checking)?
-
shishy over 7 years
Hm, why do you use .all(1)? I only care if any one of the columns = 1, not all of them. Wouldn't .eq(1) suffice?
-
shishy over 7 years
I see; I'll update my code to use .any(1); it looks way cleaner than manually checking each column. Regarding my other question with filling in values for column B, is the trick to use groupby?
-
shishy over 7 years
Just so I understand correctly, for each row this will find the group_id, then group all rows with that group_id. And then the .transform('min') tells us that we're trying to find the minimum of values within that group. So would it be df['B'] = df.groupby('group_id').transform('min').min("A") if I want to minimize the values from column "A" for that group? I'm assuming that this will automatically ignore any value in "A" that's a NaN.
-
Ted Petrou over 7 years
There is no need to put .min('A'). Only column 'A' is used in the code above that I wrote. It's in brackets after the groupby statement. Everything else you wrote is correct.
-
shishy over 7 years
Ah I see. That makes sense; I didn't know about transform('min'). Thanks!
-
shishy over 7 years
So, I should be able to say: df.loc[anyone, 'A'] = np.nan to set 'A' to NaN if there is a 1. But for entries where all columns have values besides a 1, saying df.loc[!anyone, 'A'] = value would work, right?
-
Ted Petrou over 7 years
Just so it's clear - transform returns an object that has the same number of rows as each original group. So it keeps the original dataframe row dimensions the same.
-
shishy over 7 years
Did you mean columns? It wouldn't make sense if it returned the same # of rows, right? Some groups may have like 15 rows while others may have 2. Does it just pick the largest one?
-
Jeremy Caney about 3 years
What’s the benefit of the min() syntax compared to the transform('min') syntax used in the accepted answer?