How to find the maximum value of a column in a Python dataframe
Solution 1
If you are using pandas, .max() will work:
>>> import pandas as pd
>>> df2 = pd.DataFrame({'A': [1, 5, 0], 'B': [3, 5, 6]})
>>> df2['A'].max()
5
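A few equivalent pandas spellings, as a minimal sketch using the example frame above (column names A and B come from that example):

```python
import pandas as pd

df2 = pd.DataFrame({'A': [1, 5, 0], 'B': [3, 5, 6]})

# Bracket access and attribute access are equivalent when the
# column name is a valid Python identifier
print(df2['A'].max())  # 5
print(df2.A.max())     # 5

# Calling .max() on the whole frame gives the column-wise maxima
print(df2.max())       # A -> 5, B -> 6
```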
If it's a Spark dataframe instead, see:
Best way to get the max value in a Spark dataframe column
Solution 2
I'm coming from Scala, but I do believe that this is also applicable to Python.
val max = df.select(max("id")).first()
In PySpark, you first have to import the following:
from pyspark.sql.functions import max
Solution 3
The following can be used in pyspark:
from pyspark.sql.functions import max
df.select(max("id")).show()
Solution 4
You can use the aggregate max, as also mentioned in the pyspark documentation linked below:
Link : https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=agg
Code:
row1 = df1.agg({"id": "max"}).collect()[0]
User12345
Updated on July 31, 2022

Comments
-
User12345 over 1 year
I have a data frame in pyspark. In this data frame I have a column called id that is unique. Now I want to find the maximum value of the column id in the data frame. I have tried:
df['id'].max()
but got the error below:
TypeError: 'Column' object is not callable
Please let me know how to find the maximum value of a column in a data frame. In the answer by @Dadep the link gives the correct answer.
-
Edward over 3 years
I had to run df2.A.max() to make it work ... in case it helps anyone else.