na.fill in Spark DataFrame Scala


Take a look at dtypes: Array[(String, String)]. You can use the output of this method to generate a Map for fill, e.g.:

// collect skips columns whose type has no default below,
// avoiding a MatchError on unhandled types (e.g. BooleanType)
val typeMap = df.dtypes.collect {
    case (name, "IntegerType") => name -> 0
    case (name, "StringType")  => name -> ""
    case (name, "DoubleType")  => name -> 0.0
}.toMap
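The mapping logic can be exercised without a SparkSession by simulating the output of df.dtypes as plain (columnName, typeName) pairs; the column names below are hypothetical, for illustration only:

    // Simulated df.dtypes output; in real code call df.dtypes directly.
    val dtypes: Array[(String, String)] = Array(
      ("id",    "IntegerType"),
      ("name",  "StringType"),
      ("score", "DoubleType"),
      ("flag",  "BooleanType") // no default defined below, so it is skipped
    )

    // collect keeps only the types we have defaults for
    val typeMap: Map[String, Any] = dtypes.collect {
      case (name, "IntegerType") => name -> 0
      case (name, "StringType")  => name -> ""
      case (name, "DoubleType")  => name -> 0.0
    }.toMap

The resulting map can then be passed straight to df.na.fill(typeMap), which fills nulls column by column with the per-type defaults.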
Author by

Vijeth Hegde

Updated on June 04, 2022

Comments

  • Vijeth Hegde
    Vijeth Hegde almost 2 years

    I am using Spark/Scala and I want to fill the nulls in my DataFrame with default values based on the type of the columns.

    e.g. String columns -> "string", numeric columns -> 111, Boolean columns -> false, etc.

    Currently the DataFrameNaFunctions API provides na.fill with the signature
    fill(valueMap: Map[String, Any]), like

    df.na.fill(Map(
        "A" -> "unknown",
        "B" -> 1.0
    ))
    

    This requires knowing the column names and also their types.

    OR

    fill(value: String, cols: Seq[String])
    

    This overload only accepts String/Double values, not even Boolean.

    Is there a smart way to do this?