What is 'no viable alternative at input' for Spark SQL?


You can use Spark DataFrame functions.

scala> val df = Seq(("1", "1523937600000"), ("2", "1523941200000"),("3","1524024000000")).toDF("id", "unix")
df: org.apache.spark.sql.DataFrame = [id: string, unix: string]

scala> df.filter($"unix" > unix_timestamp()*1000).collect()
res5: Array[org.apache.spark.sql.Row] = Array([3,1524024000000])
scala> df.withColumn("unixinEST",
     |   from_utc_timestamp(
     |     from_unixtime(unix_timestamp()),
     |     "EST")
     | ).show()
+---+-------------+-------------------+
| id|         unix|          unixinEST|
+---+-------------+-------------------+
|  1|1523937600000|2018-04-18 06:13:19|
|  2|1523941200000|2018-04-18 06:13:19|
|  3|1524024000000|2018-04-18 06:13:19|
+---+-------------+-------------------+
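Note that unix_timestamp() with no arguments returns the current time, which is why all three rows show the same unixinEST value. Below is a minimal sketch in the same style that applies the conversion to the unix column itself (assuming, as above, that the column holds epoch milliseconds stored as strings; the unixinEST name is reused for illustration):

scala> import org.apache.spark.sql.functions._
import org.apache.spark.sql.functions._

scala> df.withColumn("unixinEST",
     |   from_utc_timestamp(
     |     from_unixtime(($"unix".cast("long") / 1000).cast("long")),  // from_unixtime expects seconds, hence the division by 1000
     |     "EST")
     | ).show()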

Comments

  • Vish almost 2 years

    I have a DF with a startTimeUnix column (of type Number in Mongo) that contains epoch timestamps. I want to query the DF on this column, but I want to pass in an EST datetime. I went through multiple hoops to test the following in spark-shell:

    val df = Seq(("1", "1523937600000"), ("2", "1523941200000"),("3","1524024000000")).toDF("id", "unix")
    
    df.filter($"unix" > java.time.ZonedDateTime.parse("04/17/2018 01:00:00", java.time.format.DateTimeFormatter.ofPattern ("MM/dd/yyyy HH:mm:ss").withZone ( java.time.ZoneId.of("America/New_York"))).toEpochSecond()*1000).collect()
    

    Output:

    = Array([3,1524024000000])
    

    Since the java.time functions work, I pass the same expression to spark-submit, where the filter query used while retrieving the data from Mongo looks like:

    startTimeUnix < (java.time.ZonedDateTime.parse(${LT}, java.time.format.DateTimeFormatter.ofPattern('MM/dd/yyyyHHmmss').withZone(java.time.ZoneId.of('America/New_York'))).toEpochSecond()*1000)
    AND startTimeUnix > (java.time.ZonedDateTime.parse(${GT}, java.time.format.DateTimeFormatter.ofPattern('MM/dd/yyyyHHmmss').withZone(java.time.ZoneId.of('America/New_York'))).toEpochSecond()*1000)

    However, I keep getting the following error:

    Caused by: org.apache.spark.sql.catalyst.parser.ParseException:
    no viable alternative at input '(java.time.ZonedDateTime.parse(04/18/2018000000, java.time.format.DateTimeFormatter.ofPattern('MM/dd/yyyyHHmmss').withZone('(line 1, pos 138)

    == SQL ==
    startTimeUnix < (java.time.ZonedDateTime.parse(04/18/2018000000, java.time.format.DateTimeFormatter.ofPattern('MM/dd/yyyyHHmmss').withZone(java.time.ZoneId.of('America/New_York'))).toEpochSecond()*1000).toString() AND startTimeUnix > (java.time.ZonedDateTime.parse(04/17/2018000000, java.time.format.DateTimeFormatter.ofPattern('MM/dd/yyyyHHmmss').withZone(java.time.ZoneId.of('America/New_York'))).toEpochSecond()*1000).toString()

      at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:217)
      at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:114)
      at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
      at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseExpression(ParseDriver.scala:43)
      at org.apache.spark.sql.Dataset.filter(Dataset.scala:1315)

    Somewhere I read that this error means a mismatched data type. I tried applying toString to the output of the date conversion, with no luck.

  • Vish about 6 years
    I read that unix_timestamp() converts a date column value into a Unix timestamp. In my case, the DF contains dates in Unix format, and they need to be compared with the input value (an EST datetime) that I'm passing in $LT, $GT.
  • Vijay Krishna about 6 years
    Your requirement was not clear in the question, but I updated the answer with what I understood. Let me know if that helps. You can use your own Unix timestamp instead of the one I generate with unix_timestamp(); see the sketch below.
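A minimal sketch of that suggestion (not from the original thread): parse the EST input with java.time on the driver, as the question already does, and compare the column against the resulting plain number. The cutoffMillis name is illustrative; the input string and zone are taken from the question.

scala> val cutoffMillis = java.time.ZonedDateTime.parse(
     |   "04/17/2018 01:00:00",
     |   java.time.format.DateTimeFormatter.ofPattern("MM/dd/yyyy HH:mm:ss")
     |     .withZone(java.time.ZoneId.of("America/New_York"))
     | ).toEpochSecond * 1000
cutoffMillis: Long = 1523941200000

scala> df.filter($"unix" > cutoffMillis).collect()

Because only the precomputed number ends up in the comparison (or in an interpolated filter string such as s"startTimeUnix > $cutoffMillis"), Spark's SQL parser never sees the java.time calls that triggered the 'no viable alternative at input' error.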