Using Pandas to Find Minimum Values of Grouped Rows

Solution 1

To get the minimum of column A for each group, repeated on every row of that group, use transform:

df.groupby('group_id')['A'].transform('min')
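As a minimal sketch of the line above, the following rebuilds only the group_id and A columns from the question's table and assigns the result to B. The DataFrame construction is an assumption made for illustration; the original post does not show how df was created.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question's table (only the columns needed here).
df = pd.DataFrame({
    "group_id": ["x", "x", "x", "y", "y", "z", "z", "z", "z"],
    "A": [np.nan, 1, 2, np.nan, 3, 10, 6, 4, 2],
})

# transform('min') computes each group's minimum (NaN is skipped by default)
# and broadcasts it back to every row of that group, so the result has the
# same index as df and can be assigned directly as a new column.
df["B"] = df.groupby("group_id")["A"].transform("min")

print(df)
```

Because the result is aligned with the original index, groups of different sizes need no special handling.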

Solution 2

  • focus on just ['col1', 'col2', 'col3']
  • check whether each value equals 1 with eq(1), which is equivalent to == 1
  • check whether any value in a row equals 1 with any(axis=1)
  • use loc to make the assignment

anyone = df[['col1', 'col2', 'col3']].eq(1).any(axis=1)
df.loc[anyone, 'A'] = np.nan
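The steps above can be sketched end to end as follows. The col1 to col3 values come from the first five rows of the question's table; the starting values in A are hypothetical placeholders standing in for the asker's formula.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [1, 3, 4, 1, 3],
    "col2": [2, 2, 2, 2, 2],
    "col3": [3, 3, 3, 3, 3],
    "A": [5.0, 1.0, 2.0, 7.0, 3.0],  # hypothetical placeholder values
})

# Boolean Series: True for rows where at least one of the three columns is 1.
anyone = df[["col1", "col2", "col3"]].eq(1).any(axis=1)

# Overwrite A with NaN only on the flagged rows.
df.loc[anyone, "A"] = np.nan

# The complementary rows are selected with ~anyone (pandas uses ~, not !).
kept = df.loc[~anyone, "A"]

print(df)
```

Passing the axis by keyword is the safer choice, since recent pandas versions no longer accept it positionally for any.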

numpy equivalent

anyone = (df[['col1', 'col2', 'col3']].values == 1).any(axis=1)
df['A'] = np.where(anyone, np.nan, df['A'])
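A quick way to convince yourself that the NumPy version matches the pandas one is to build both masks on the same data and compare them. This is only a sketch; the sample values in A are hypothetical, and to_numpy() is used here in place of .values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [1, 3, 4, 1, 3],
    "col2": [2, 2, 2, 2, 2],
    "col3": [3, 3, 3, 3, 3],
    "A": [5.0, 1.0, 2.0, 7.0, 3.0],  # hypothetical placeholder values
})

cols = ["col1", "col2", "col3"]

# pandas mask: a boolean Series aligned with df's index.
mask_pd = df[cols].eq(1).any(axis=1)

# NumPy mask: a plain boolean array computed on the underlying values.
mask_np = (df[cols].to_numpy() == 1).any(axis=1)

# np.where picks NaN where the mask is True and the existing A otherwise.
df["A"] = np.where(mask_np, np.nan, df["A"])

print(df)
```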

Solution 3

df.groupby('group_id')['A'].min()
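Unlike transform('min'), calling min() on the grouped column returns one value per group, indexed by group_id, rather than one value per original row. A minimal sketch of the difference, assuming the question's sample data; mapping the result back onto the rows with map is one possible way to fill column B and is not something the original answer states.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "group_id": ["x", "x", "x", "y", "y", "z", "z", "z", "z"],
    "A": [np.nan, 1, 2, np.nan, 3, 10, 6, 4, 2],
})

# One row per group: a Series indexed by group_id.
mins = df.groupby("group_id")["A"].min()
print(mins)

# To fill a per-row column from it, look each row's group_id up in mins.
df["B"] = df["group_id"].map(mins)
print(df)
```

So min() suits a per-group summary, while transform('min') skips the lookup step when the goal is a column with the same length as the DataFrame.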
Author: shishy

Updated on September 16, 2022

Comments

  • shishy
    shishy over 1 year

    This might be a trivial question but I'm still trying to figure out pandas/numpy.

    So, suppose I have a table with the following structure:

    group_id | col1 | col2 | col3 |  "A"   |  "B"
       x     |   1  |   2  |  3   |  NaN   |   1
       x     |   3  |   2  |  3   |   1    |   1 
       x     |   4  |   2  |  3   |   2    |   1
       y     |   1  |   2  |  3   |  NaN   |   3 
       y     |   3  |   2  |  3   |   3    |   3 
       z     |   3  |   2  |  3   |   10   |   2
       z     |   2  |   2  |  3   |   6    |   2
       z     |   4  |   2  |  3   |   4    |   2
       z     |   4  |   2  |  3   |   2    |   2
    

    Note that there is a group_id that groups elements in each row. So at the beginning, I have the values for columns group_id and col1-col3.

    Then for each row, if col1, col2, or col3 equals 1, then "A" is NaN; otherwise the value is based on a formula (irrelevant here, so I put some numbers in place).

    That, I know how to do using:

    df["A"] = np.where((df['col1'] == 1) | (df['col2'] == 1) | (df['col3'] == 1), np.nan, value)
    

    But for column "B", I need to fill it in with the minimum of values from column A for a specific group.

    So for example, "B" is equal to 1 for all rows in group "x" because the minimum value in column A across all of the group "x" rows is 1.

    Similarly, for rows in group "y", the minimum value is 3, and for group "z" the minimum value is 2. How exactly do I do that using pandas...? It's confusing me a little more because the number of rows for a specific group can be of varying size.

    If they were all the same size I could just say fill it with the minimum of values in a pre-set range.

    I hope that made sense; please let me know if I should provide a clearer example or clarify anything!

  • shishy
    shishy over 7 years
    So, does my approach for that part of my question using np.where not work? Or did you suggest using .loc to make it more elegant (as opposed to manually checking)?
  • shishy
    shishy over 7 years
    Hm, why do you use .all(1)? I only care if any one of the columns = 1, not all of them. Wouldn't .eq(1) suffice?
  • shishy
    shishy over 7 years
    I see; I'll update my code to use .any(1); it looks way cleaner than manually checking each column. Regarding my other question with filling in values for column B, is the trick to use groupby?
  • shishy
    shishy over 7 years
    Just so I understand correctly, for each row this will find the group_id, then group all rows with that group_id. And then the .transform('min') tells us that we're trying to find the minimum of values within that group. So would it be df['B'] = df.groupby('group_id').transform('min').min("A") if I want to minimize the values from column "A" for that group? I'm assuming that this will automatically ignore any value in "A" that's a NaN
  • Ted Petrou
    Ted Petrou over 7 years
    There is no need to put .min('A'). Only column 'A' is used in the code above that I wrote. It's in brackets after the groupby statement. Everything else you wrote is correct.
  • shishy
    shishy over 7 years
    Ah I see. That makes sense; I didn't know about transform('min'). Thanks!
  • shishy
    shishy over 7 years
    So, I should be able to say: df.loc[anyone, 'A'] = np.nan to set 'A' to NaN if there is a 1. But for entries where all columns have values besides a 1, saying df.loc[!anyone, 'A'] = value would work, right?
  • Ted Petrou
    Ted Petrou over 7 years
    Just so it's clear - transform returns an object that has the same number of rows for each group. So it keeps the original dataframe row dimensions the same.
  • shishy
    shishy over 7 years
    Did you mean columns? It wouldn't make sense if it returned the same # of rows right? Some groups may have like 15 rows while others may have 2. Does it just pick the largest one?
  • Jeremy Caney
    Jeremy Caney about 3 years
    What’s the benefit of the min() syntax compared to the transform('min') syntax used in the accepted answer?