Create dataframe with row name, column name, and max column value
Solution 1
Use numpy.where
for indices for matched max
es and create new DataFrame
by indexing - better if performance is important in large DataFrame
:
c, r = np.where(df.eq(df.max()).T)
df = pd.DataFrame({'idx':df.index[r], 'cols':df.columns[c], 'vals': df.values[r, c]})
print(df)
idx cols vals
0 row_3 dogs 86
1 row_9 dogs 86
2 row_4 cats 80
3 row_8 sheep 89
4 row_6 lizards 88
5 row_7 lizards 88
6 row_1 birds 82
Another only pandas solution with DataFrame.unstack
and GroupBy.transform
for compare max
values per groups by first level:
s = df.unstack()
df = s[s.groupby(level=0).transform('max').eq(s)].reset_index(name='vals')
print(df)
animals Rows vals
0 dogs row_3 86
1 dogs row_9 86
2 cats row_4 80
3 sheep row_8 89
4 lizards row_6 88
5 lizards row_7 88
6 birds row_1 82
Solution 2
You may check with using .where
and eq to mask
the no ma
x value as NaN then stack
df.where(df.eq(df.max())).stack().sort_index(level=1).reset_index()
Rows animals 0
0 row_3 dogs 86.0
1 row_9 dogs 86.0
2 row_4 cats 80.0
3 row_8 sheep 89.0
4 row_6 lizards 88.0
5 row_7 lizards 88.0
6 row_1 birds 82.0
Related videos on Youtube
MarkS
Updated on June 04, 2022Comments
-
MarkS almost 2 years
I am going through a video package I bought from PACKT to learn pandas. The author used jijna2 style() to highlight the max value in each column. I quickly discovered that I can't use that technique in PyCharm. So I decided to extract the values.
What I am trying to do is create a three column dataframe by extracting the row index, column name and max column value from a dataframe with N columns, and then create the new dataframe. The new dataframe would show each row (if there are ties show all appropriate rows), the column and the max value in that column.
I created a toy dataframe just to work work through the code.
My code is below, along with the output and, at the very bottom, is what I actually want the new dataframe to look like.
I know I am using a print statement. That code is the only thing I have used so far that correctly picks up multiple rows if I have a tie.
I grab the whole row, which I don't want. I also am not sure how to construct the proposed new dataframe from the extracted data.
import pandas as pd raw_data = { 'dogs': [42, 39, 86, 15, 23, 57, 68, 81, 86], 'cats': [52, 41, 79, 80, 34, 47, 19, 22, 59], 'sheep': [62, 37, 84, 51, 67, 32, 23, 89, 73], 'lizards': [72, 43, 36, 26, 53, 88, 88, 34, 69], 'birds': [82, 35, 77, 63, 18, 12, 45, 56, 58], } df = pd.DataFrame(raw_data, index=pd.Index(['row_1', 'row_2', 'row_3', 'row_4', 'row_5', 'row_6', 'row_7', 'row_8', 'row_9'], name='Rows'), columns=pd.Index(['dogs', 'cats', 'sheep', 'lizards', 'birds'], name='animals')) print(df) print() # Get a list of all columns names cols = df.columns print(cols) print('*****') for col in cols: print((df[df[col] == df[col].max()])) ''' animals dogs cats sheep lizards birds Rows row_3 86 79 84 36 77 row_9 86 59 73 69 58 animals dogs cats sheep lizards birds Rows row_4 15 80 51 26 63 animals dogs cats sheep lizards birds Rows row_8 81 22 89 34 56 animals dogs cats sheep lizards birds Rows row_6 57 47 32 88 12 row_7 68 19 23 88 45 animals dogs cats sheep lizards birds Rows row_1 42 52 62 72 82 ''' row_3 dogs 86 row_9 dogs 86 row_4 cats 80 row_8 sheep 89 row_6 lizards 88 row_7 lizards 88 row_1 birds 82
-
BENY about 5 years@jezrael not sure why happen again, I do vote for u