na.fill in Spark DataFrame Scala


Take a look at dtypes: Array[(String, String)]. You can use the output of this method to generate a Map for fill, e.g.:

// collect skips columns whose type has no default below,
// avoiding a MatchError on unhandled types (e.g. BooleanType)
val typeMap = df.dtypes.collect {
    case (name, "IntegerType") => name -> 0
    case (name, "StringType")  => name -> ""
    case (name, "DoubleType")  => name -> 0.0
}.toMap
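The mapping logic can be exercised without a SparkSession by simulating the output of df.dtypes as plain (columnName, typeName) pairs; the column names below are hypothetical, for illustration only:

    // Simulated df.dtypes output; in real code call df.dtypes directly.
    val dtypes: Array[(String, String)] = Array(
      ("id",    "IntegerType"),
      ("name",  "StringType"),
      ("score", "DoubleType"),
      ("flag",  "BooleanType") // no default defined below, so it is skipped
    )

    // collect keeps only the types we have defaults for
    val typeMap: Map[String, Any] = dtypes.collect {
      case (name, "IntegerType") => name -> 0
      case (name, "StringType")  => name -> ""
      case (name, "DoubleType")  => name -> 0.0
    }.toMap

The resulting map can then be passed straight to df.na.fill(typeMap), which fills nulls column by column with the per-type defaults.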
Author by

Vijeth Hegde

Updated on June 04, 2022

Comments

  • Vijeth Hegde
    Vijeth Hegde almost 2 years

    I am using Spark/Scala and I want to fill the nulls in my DataFrame with default values based on the type of the columns.

    e.g. String columns -> "string", numeric columns -> 111, Boolean columns -> false, etc.

    Currently the DataFrameNaFunctions API provides na.fill with the signature
    fill(valueMap: Map[String, Any]), like

    df.na.fill(Map(
        "A" -> "unknown",
        "B" -> 1.0
    ))
    

    This requires knowing the column names and also their types.

    OR

    fill(value: String, cols: Seq[String])
    

    This overload only accepts String/Double values, not even Boolean.

    Is there a smart way to do this?