Saving a dataframe result value to a string variable?

Solution 1

Try something like this:

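from pyspark.sql.functions import max as max_

# read all deltas (alldeltasdir is the path to the JSON data), then find the last partition
alldeltas = sqlContext.read.json(alldeltasdir)
# agg() yields a one-row DataFrame; collect()[0][0] pulls the single value out of it
last_delta = alldeltas.agg(max_("ingest_date")).collect()[0][0]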

last_delta now holds a plain Python value rather than a DataFrame: in this sample, the maximum value of the ingest_date column.

Solution 2

Assuming you're computing a global aggregate (where the output will have a single row) and are using PySpark, the following should work:

spark.sql("SELECT MAX(date) as maxDate FROM account").first()["maxDate"]

I believe this will return a datetime object (if date is a timestamp column), but you can either convert that to a string in your driver code or do a SELECT CAST(MAX(date) AS string) instead.
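For the "convert it in your driver code" option, here is a minimal sketch in plain Python. The helper name value_to_str is made up for illustration, and the sample timestamp is invented; the idea is just that whatever scalar comes out of the Row (a datetime, a date, an existing string, or None if the table is empty) ends up as a string:

```python
from datetime import date, datetime


def value_to_str(value):
    """Hypothetical helper: turn a scalar taken from a Row into a string."""
    if value is None:
        # MAX over an empty table is NULL, which arrives in Python as None
        return None
    if isinstance(value, (date, datetime)):
        # ISO 8601 text, e.g. 2018-04-19T14:11:05 for a datetime
        return value.isoformat()
    # already a string (or some other scalar): fall back to str()
    return str(value)


# Stand-in for the value returned by .first()["maxDate"]; not real data
max_date = datetime(2018, 4, 19, 14, 11, 5)
max_date_str = value_to_str(max_date)
print(max_date_str)
```

If you need a specific layout instead of ISO 8601, value.strftime("%Y-%m-%d %H:%M:%S") works the same way on a datetime.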

Author by

oharr

Updated on June 12, 2022

Comments

  • oharr almost 2 years

    I created a dataframe in Spark. After finding the max date, I want to save it to a variable. I'm just trying to figure out how to get the result, which is a string, and save it to a variable.

    code so far:

    sqlDF = spark.sql("SELECT MAX(date) FROM account")
    sqlDF.show()
    

    What the results look like:

    +--------------------+
    |           max(date)|
    +--------------------+
    |2018-04-19T14:11:...|
    +--------------------+
    

    thanks

  • oharr about 6 years
    That doesn't work in Spark. I got sqlDF.first() to show the first row, but I just want the value.
  • KeepLearning about 4 years
    Your_max_date = spark.sql("SELECT MAX(date) FROM account").collect()[0][0]