SparkSQL and explode on DataFrame in Java
Solution 1
It seems it is possible to use a combination of org.apache.spark.sql.functions.explode(Column col)
and DataFrame.withColumn(String colName, Column col)
to replace the column with the exploded version of it.
Solution 2
I solved it in this manner: say that you have an array column containing job descriptions named "positions", for each person with "fullName".
Then you get from initial schema :
root
|-- fullName: string (nullable = true)
|-- positions: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- companyName: string (nullable = true)
| | |-- title: string (nullable = true)
...
to schema:
root
|-- personName: string (nullable = true)
|-- companyName: string (nullable = true)
|-- positionTitle: string (nullable = true)
by doing:
DataFrame personPositions = persons.select(persons.col("fullName").as("personName"),
org.apache.spark.sql.functions.explode(persons.col("positions")).as("pos"));
DataFrame test = personPositions.select(personPositions.col("personName"),
personPositions.col("pos").getField("companyName").as("companyName"), personPositions.col("pos").getField("title").as("positionTitle"));
JiriS
Updated on July 22, 2022Comments
-
JiriS almost 2 years
Is there an easy way how use
explode
on array column on SparkSQLDataFrame
? It's relatively simple in Scala, but this function seems to be unavailable (as mentioned in javadoc) in Java.An option is to use
SQLContext.sql(...)
andexplode
function inside the query, but I'm looking for a bit better and especially cleaner way.DataFrame
s are loaded from parquet files. -
SPS almost 3 yearsin java 8 and Spark 2.4.7 the explode(..) method returns a Column and not a DataFrame. Can you please specify your versions where explode is giving a dataframe to you?