na.fill in Spark DataFrame Scala
Take a look at dtypes: Array[(String, String)]. You can use the output of this method to generate a Map for fill, e.g.:
val typeMap = df.dtypes.collect {
  // dtypes gives the type as a string such as "IntegerType";
  // collect skips columns with no default instead of throwing a MatchError
  case (name, "IntegerType") => name -> 0
  case (name, "StringType")  => name -> ""
  case (name, "DoubleType")  => name -> 0.0
}.toMap

df.na.fill(typeMap)
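Since dtypes is just an Array[(String, String)], the mapping logic can be sanity-checked without a SparkSession by running the same collect over a hand-built array. This is a minimal sketch; the column names and the simulated dtypes output are hypothetical:

```scala
object TypeMapSketch {
  // Simulated output of df.dtypes -- hypothetical column names
  val dtypes: Array[(String, String)] = Array(
    ("id", "IntegerType"),
    ("name", "StringType"),
    ("score", "DoubleType"),
    ("active", "BooleanType") // no default defined, so collect drops it
  )

  // Same partial function as above: unmatched types are skipped,
  // not crashed on, because collect filters with isDefinedAt
  val typeMap: Map[String, Any] = dtypes.collect {
    case (name, "IntegerType") => name -> 0
    case (name, "StringType")  => name -> ""
    case (name, "DoubleType")  => name -> 0.0
  }.toMap

  def main(args: Array[String]): Unit = {
    println(typeMap)
  }
}
```

Note that the BooleanType column is silently left out of the map, which mirrors the limitation the question raises: na.fill with a value map does not cover booleans, so those columns would need separate handling.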
Author: Vijeth Hegde, updated on June 04, 2022

Comments

Vijeth Hegde:
I am using Spark/Scala and I want to fill the nulls in my DataFrame with default values based on the types of the columns,
i.e. string columns -> "string", numeric columns -> 111, boolean columns -> false, etc.
Currently the DataFrameNaFunctions API provides
fill(valueMap: Map[String, Any])
e.g. df.na.fill(Map("A" -> "unknown", "B" -> 1.0))
This requires knowing the column names and also the type of the columns.
OR
fill(value: String, cols: Seq[String])
This overload only covers String/Double values, not even Boolean.
Is there a smart way to do this?