PySpark: How to fillna values in dataframe for specific columns?

apache-spark pyspark spark-dataframe


Solution 1

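df = df.fillna(0, subset=['a', 'b'])

The subset parameter restricts the fill to the listed columns, so "a" and "b" are filled while "c" keeps its nulls. It is not available if your Spark version is lower than 1.3.1. fillna returns a new DataFrame instead of modifying df in place, so assign the result. The type of the fill value decides which columns are affected: a numeric value such as 0 only fills numeric columns, so a string column listed in subset would be left unchanged.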

Solution 2

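Use a dictionary that maps column names to replacement values. Columns not in the dictionary are left unchanged, and each column can get its own value:

df = df.fillna({'a': 0, 'b': 0})

When the value is a dictionary, the subset parameter is ignored.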


Author: Rakesh Adhikesavan

I'm a science enthusiast, a technophile, a dog lover and an aspiring Data Scientist.

Updated on April 18, 2020

Question (Rakesh Adhikesavan):

I have the following sample DataFrame:

a    | b    | c
-----+------+-----
1    | 2    | 4
0    | null | null
null | 3    | 4

I want to replace null values only in the first two columns, "a" and "b":

a    | b    | c
-----+------+-----
1    | 2    | 4
0    | 0    | null
0    | 3    | 4

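Here is the code to create the sample DataFrame (sc and sqlContext are the SparkContext and SQLContext that the Spark 1.x pyspark shell predefines):

rdd = sc.parallelize([(1, 2, 4), (0, None, None), (None, 3, 4)])
df2 = sqlContext.createDataFrame(rdd, ["a", "b", "c"])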

I know how to replace all null values using:

df2 = df2.fillna(0)

And when I try this, I lose the third column, because select keeps only the columns passed to it:

df2 = df2.select(df2.columns[0:2]).fillna(0)


Related

Adding a Arraylist value to a new column in Spark Dataframe using Pyspark

multi-processing with spark(PySpark)

convert dataframe to libsvm format

'RDD' object has no attribute '_jdf' pyspark RDD

Column is not iterable in pySpark

pyspark.sql.utils.IllegalArgumentException: u'Field "features" does not exist.'

Read a csv into an RDD using Spark 2.0

How to write summary of spark sql dataframe to excel file

Getting last value of group in Spark

replace values of one column in a spark df by dictionary key-values (pyspark)