Saving a dataframe result value to a string variable?
Solution 1
Try something like this:
from pyspark.sql.functions import max as max_

# get the latest partition across all deltas
alldeltas = sqlContext.read.json(alldeltasdir)
last_delta = alldeltas.agg(max_("ingest_date")).collect()[0][0]
last_delta now holds a plain value rather than a DataFrame: in this example, the maximum of the ingest_date column. collect() returns a list of rows, so [0][0] picks the first column of the first row.
Solution 2
Assuming you're computing a global aggregate (where the output will have a single row) and are using PySpark, the following should work:
spark.sql("SELECT MAX(date) as maxDate FROM account").first()["maxDate"]
I believe this will return a datetime object, but you can either convert that to a string in your driver code or do a SELECT CAST(MAX(date) AS string) instead.
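For the "convert it in your driver code" option, here is a minimal sketch in plain Python. The max_date value below is a hypothetical stand-in for whatever first()["maxDate"] returns, and to_string is a helper name made up for this example, not part of PySpark. It assumes the value is a datetime, a string, or None (which is what you get when the table is empty).

```python
from datetime import datetime

# Hypothetical stand-in for the value returned by
# spark.sql("SELECT MAX(date) AS maxDate FROM account").first()["maxDate"]
max_date = datetime(2018, 4, 19, 14, 11, 0)


def to_string(value):
    """Convert an aggregate result to a string (illustrative helper)."""
    if value is None:
        # MAX over an empty table yields NULL, which arrives as None
        return None
    if isinstance(value, datetime):
        # ISO 8601 text, e.g. 2018-04-19T14:11:00
        return value.isoformat()
    # Already a string (or some other type): fall back to str()
    return str(value)


max_date_str = to_string(max_date)
print(max_date_str)
```

If you need a different format, value.strftime(...) with your own pattern would replace the isoformat() call.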
Author by
oharr
Updated on June 12, 2022
Comments
-
oharr almost 2 years
I created a DataFrame in Spark. When I find the max date, I want to save it to a variable. I'm just trying to figure out how to get the result, which is a string, and save it to a variable.
code so far:
sqlDF = spark.sql("SELECT MAX(date) FROM account")
sqlDF.show()
what the results look like:
+--------------------+
|           max(date)|
+--------------------+
|2018-04-19T14:11:...|
+--------------------+
thanks
-
oharr about 6 years
That doesn't work in Spark. I got sqlDF.first() to show the first row, but I just want the value.
-
KeepLearning about 4 years
Your_max_date = spark.sql("SELECT MAX(date) FROM account").collect()[0][0]