What is 'no viable alternative at input' for spark sql?
You can use Spark DataFrame functions.
scala> val df = Seq(("1", "1523937600000"), ("2", "1523941200000"),("3","1524024000000")).toDF("id", "unix")
df: org.apache.spark.sql.DataFrame = [id: string, unix: string]
scala> df.filter($"unix" > unix_timestamp()*1000).collect()
res5: Array[org.apache.spark.sql.Row] = Array([3,1524024000000])
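This filter uses the Column DSL, so the right-hand side is an ordinary Scala expression evaluated on the driver before Spark parses anything; only the resulting value is compared against the column. A minimal sketch of the same filter against a precomputed cutoff (the literal below is a hypothetical epoch-millisecond value, not from the thread):

// hypothetical cutoff in epoch milliseconds, computed in Scala beforehand
val cutoffMillis = 1523937600000L
df.filter($"unix" > cutoffMillis).collect()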
scala> df.withColumn("unixinEST",
         from_utc_timestamp(from_unixtime(unix_timestamp()), "EST"))
       .show()
+---+-------------+-------------------+
| id| unix| unixinEST|
+---+-------------+-------------------+
| 1|1523937600000|2018-04-18 06:13:19|
| 2|1523941200000|2018-04-18 06:13:19|
| 3|1524024000000|2018-04-18 06:13:19|
+---+-------------+-------------------+
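Note that the withColumn example above converts the current time (unix_timestamp() with no arguments), which is why every row shows the same value. If the goal is to render each row's own unix value in EST, a variation along these lines should work, assuming the column holds epoch milliseconds (an assumption, not verified against the original thread):

import org.apache.spark.sql.functions.{from_unixtime, from_utc_timestamp}

// convert each row's epoch value (milliseconds -> seconds) to an EST timestamp
df.withColumn("unixinEST",
    from_utc_timestamp(from_unixtime(($"unix" / 1000).cast("long")), "EST"))
  .show()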
Comments
-
Vish almost 2 years
I have a DF that has a startTimeUnix column (of type Number in Mongo) containing epoch timestamps. I want to query the DF on this column, but I want to pass an EST datetime. I went through multiple hoops to test the following on spark-shell: val df = Seq(("1", "1523937600000"), ("2", "1523941200000"),("3","1524024000000")).toDF("id", "unix"); df.filter($"unix" > java.time.ZonedDateTime.parse("04/17/2018 01:00:00", java.time.format.DateTimeFormatter.ofPattern("MM/dd/yyyy HH:mm:ss").withZone(java.time.ZoneId.of("America/New_York"))).toEpochSecond()*1000).collect()
Output:
= Array([3,1524024000000])
Since the java.time functions are working, I am passing the same to spark-submit, where, while retrieving the data from Mongo, the filter query goes like: startTimeUnix < (java.time.ZonedDateTime.parse(${LT}, java.time.format.DateTimeFormatter.ofPattern('MM/dd/yyyyHHmmss').withZone(java.time.ZoneId.of('America/New_York'))).toEpochSecond()*1000) AND startTimeUnix > (java.time.ZonedDateTime.parse(${GT}, java.time.format.DateTimeFormatter.ofPattern('MM/dd/yyyyHHmmss').withZone(java.time.ZoneId.of('America/New_York'))).toEpochSecond()*1000)
However, I keep getting the following error:
Caused by: org.apache.spark.sql.catalyst.parser.ParseException: no viable alternative at input '(java.time.ZonedDateTime.parse(04/18/2018000000, java.time.format.DateTimeFormatter.ofPattern('MM/dd/yyyyHHmmss').withZone('(line 1, pos 138)
== SQL ==
startTimeUnix < (java.time.ZonedDateTime.parse(04/18/2018000000, java.time.format.DateTimeFormatter.ofPattern('MM/dd/yyyyHHmmss').withZone(java.time.ZoneId.of('America/New_York'))).toEpochSecond()*1000).toString() AND startTimeUnix > (java.time.ZonedDateTime.parse(04/17/2018000000, java.time.format.DateTimeFormatter.ofPattern('MM/dd/yyyyHHmmss').withZone(java.time.ZoneId.of('America/New_York'))).toEpochSecond()*1000).toString()
at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:217)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:114)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseExpression(ParseDriver.scala:43)
at org.apache.spark.sql.Dataset.filter(Dataset.scala:1315)
Somewhere I read that this error means a mismatched data type. I tried applying toString to the output of the date conversion, with no luck (a likely cause and workaround are sketched after this comment thread).
-
Vish about 6 years
I read that unix_timestamp() converts a date column value into a unix timestamp. In my case, the DF already contains dates in unix format, and they need to be compared with the input value (an EST datetime) that I'm passing in $LT, $GT.
-
Vijay Krishna about 6 years
Your requirement was not clear in the question, but I updated the answer with what I understood. Let me know if that helps. You can use your own unix timestamp instead of the one I generate with unix_timestamp().
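As for the 'no viable alternative at input' error itself: Dataset.filter(String) hands the string to Spark's SQL parser (visible in the stack trace above), and that parser understands only SQL expressions, so the embedded java.time.ZonedDateTime.parse(...) calls cannot be parsed, regardless of data types. A minimal sketch of the usual workaround, assuming df is the Mongo-backed DataFrame with a startTimeUnix column and with hypothetical bounds: compute the epoch values in Scala first, then interpolate only the numeric literals into the filter string.

import java.time.{ZonedDateTime, ZoneId}
import java.time.format.DateTimeFormatter

// illustrative helper: parse an EST datetime string to epoch milliseconds on the driver
def estToEpochMillis(s: String): Long =
  ZonedDateTime
    .parse(s, DateTimeFormatter
      .ofPattern("MM/dd/yyyy HH:mm:ss")
      .withZone(ZoneId.of("America/New_York")))
    .toEpochSecond * 1000

val gt = estToEpochMillis("04/17/2018 00:00:00")  // hypothetical lower bound
val lt = estToEpochMillis("04/18/2018 00:00:00")  // hypothetical upper bound

// the filter string now contains only column names and numeric literals,
// which the SQL parser accepts
df.filter(s"startTimeUnix > $gt AND startTimeUnix < $lt")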