How do I convert a column of Unix epoch values to a Date in an Apache Spark DataFrame using Java?
Solution 1
After checking the Spark DataFrame API and SQL functions, I came up with the snippet below:
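// requires: import static org.apache.spark.sql.functions.from_unixtime;
DataFrame df = sqlContext.read().json("MY_JSON_DATA_FILE");
// divide by 1000 to turn epoch milliseconds into epoch seconds
DataFrame df_DateConverted = df.withColumn("creationDt", from_unixtime(df.col("creationDate").divide(1000)));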
The "creationDate" column is divided by 1000 because the time units differ. The original "creationDate" is a Unix epoch in milliseconds, whereas the Spark SQL function "from_unixtime" expects a Unix epoch in seconds.
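To make the unit mismatch concrete, here is a minimal plain-Java sketch that uses only java.time and no Spark. The class EpochMillisDemo and its methods are hypothetical names made up for illustration, and the sketch assumes UTC, whereas Spark formats the result in the session time zone. The sample value is the one used in Solution 3.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

// Hypothetical helper for illustration only; not part of the Spark API.
final class EpochMillisDemo {

    // Converts epoch milliseconds to whole epoch seconds,
    // the unit that from_unixtime expects.
    static long toEpochSeconds(long epochMillis) {
        return Math.floorDiv(epochMillis, 1000L);
    }

    // Returns the calendar year (assumption: UTC) of an epoch-millisecond value.
    static int yearOf(long epochMillis) {
        ZonedDateTime utc = Instant.ofEpochSecond(toEpochSeconds(epochMillis))
                .atZone(ZoneOffset.UTC);
        return utc.getYear();
    }

    public static void main(String[] args) {
        long creationDate = 1593543333062L; // sample epoch in milliseconds

        System.out.println(toEpochSeconds(creationDate));                       // 1593543333
        System.out.println(Instant.ofEpochSecond(toEpochSeconds(creationDate))); // 2020-06-30T18:55:33Z
        System.out.println(yearOf(creationDate));                               // 2020
    }
}
```

Without the division, the same number would be read as seconds and land tens of thousands of years in the future, which is why the year used for grouping has to be derived after converting to seconds.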
Solution 2
In PySpark, this converts Unix epoch milliseconds to a DataFrame timestamp column:
# requires: from pyspark.sql.functions import from_unixtime
df.select(from_unixtime((df.my_date_column.cast('bigint')/1000)).cast('timestamp').alias('my_date_column'))
Solution 3
In Spark with Scala:
spark.sql("select from_unixtime(1593543333062/1000) as ts").show(false)
Author by
ErhWen Kuo
Updated on June 18, 2022
Comments
ErhWen Kuo almost 2 years
I have a JSON data file that contains a property [creationDate], which is a Unix epoch stored as a "long" number. The Apache Spark DataFrame schema looks like this:
root
 |-- creationDate: long (nullable = true)
 |-- id: long (nullable = true)
 |-- postTypeId: long (nullable = true)
 |-- tags: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- title: string (nullable = true)
 |-- viewCount: long (nullable = true)
I would like to group by "creationData_Year", which needs to be derived from "creationDate".
What's the easiest way to do this kind of conversion in a DataFrame using Java?