Removing Blank Strings from a Spark Dataframe

Solution 1

Removing things from a dataframe requires filter().

newDF = oldDF.filter("colName != ''")

or am I misunderstanding your question?
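If several columns can hold blank strings, the same string-expression form of filter() can be built from a list of column names. The following is a minimal sketch, not part of the original answer: `BlankFilter`, `nonBlankCondition`, and the column name `otherCol` are hypothetical, and it assumes the column names are plain identifiers that need no quoting.

```scala
object BlankFilter {
  // Builds a SQL-style predicate such as "a != '' AND b != ''"
  // that can be passed to DataFrame.filter(conditionExpr: String).
  def nonBlankCondition(columns: Seq[String]): String = {
    require(columns.nonEmpty, "at least one column name is required")
    columns.map(c => s"$c != ''").mkString(" AND ")
  }
}

// Usage with an existing DataFrame named oldDF (not constructed here):
//   val newDF = oldDF.filter(BlankFilter.nonBlankCondition(Seq("colName", "otherCol")))
```

Because a comparison against null evaluates to null rather than true, rows where any of the listed columns is null are dropped by this filter as well.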

Solution 2

In case someone doesn't want to drop the records with blank strings, but just wants to convert the blank strings to some constant value:

val newdf = df.na.replace(df.columns,Map("" -> "0")) // to convert blank strings to zero
newdf.show()

Solution 3

You can use this:

df.filter(!($"col_name"===""))

It filters out the rows where the value of "col_name" is "", i.e. an empty string. I'm using the equality match and then inverting it with "!".
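The $"..." syntax depends on the implicits import, which the asker says is not working for them. A possible way around that, sketched here under assumptions, is to look the column up on the DataFrame itself with df("col_name"). The object name `DropBlankRows`, the helpers `dropBlank` and `sampleData`, and the sample rows are all hypothetical; the sketch uses SQLContext because the question mentions Spark 1.3.1, and it has not been checked against every Spark version.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object DropBlankRows {
  // Keeps only rows whose value in `colName` is not the empty string.
  // df(colName) returns a Column without needing the $"..." implicits.
  // Rows where the value is null are dropped too, since null === "" is null.
  def dropBlank(df: DataFrame, colName: String): DataFrame =
    df.filter(!(df(colName) === ""))

  // Hypothetical sample data: one normal value, one blank string, one null.
  def sampleData(sqlContext: SQLContext): DataFrame = {
    val schema = StructType(Seq(
      StructField("id", StringType, nullable = false),
      StructField("col_name", StringType, nullable = true)))
    val rows = sqlContext.sparkContext.parallelize(
      Seq(Row("1", "a"), Row("2", ""), Row("3", null)))
    sqlContext.createDataFrame(rows, schema)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("drop-blank-rows").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    dropBlank(sampleData(sqlContext), "col_name").show()

    sc.stop()
  }
}
```

With the sample data above, only the row with id "1" should remain, because the blank and null values both fail the filter.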

Author: mongolol

Updated on April 22, 2021

Comments

  • mongolol, about 3 years ago

    I'm attempting to remove rows in which a Spark dataframe column contains blank strings. I originally used val df2 = df1.na.drop(), but it turns out many of these values are encoded as "" rather than null.

    I'm stuck using Spark 1.3.1 and also cannot rely on DSL. (Importing spark.implicit_ isn't working.)