Removing Blank Strings from a Spark Dataframe
Solution 1
Removing rows from a DataFrame requires filter():
val newDF = oldDF.filter("colName != ''") // SQL expression string, so no DSL implicits are needed
Or am I misunderstanding your question?
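Because the question rules out the $ DSL, the string-expression form of filter() can be extended to every column by building the predicate text yourself. This is a minimal sketch, assuming a local Spark 1.3-era SQLContext (on Spark 2+ you would normally start from a SparkSession instead). Person, DropBlankStrings, nonBlankPredicate and the sample rows are hypothetical names used only for illustration. The backtick quoting of column names is an assumption about the expression parser and is worth checking on your exact Spark version.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}

// Hypothetical record type, used only to build sample data for this sketch.
case class Person(name: String, city: String)

object DropBlankStrings {
  // Builds a SQL predicate such as "`name` != '' AND `city` != ''".
  // Backticks guard against column names that contain spaces or dots.
  def nonBlankPredicate(cols: Seq[String]): String =
    cols.map(c => s"`$c` != ''").mkString(" AND ")

  // Keeps only rows in which none of the given columns equals "".
  // A null makes `c != ''` evaluate to null, which filter treats as false,
  // so rows with nulls in these columns are dropped as well.
  def dropBlankRows(df: DataFrame, cols: Seq[String]): DataFrame =
    df.filter(nonBlankPredicate(cols))

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("drop-blank").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    val df = sqlContext.createDataFrame(Seq(
      Person("Ann", "Oslo"),
      Person("", "Rome"),
      Person("Bob", "")))

    // Apply the check to every column of the DataFrame.
    dropBlankRows(df, df.columns).show()
    sc.stop()
  }
}
```

With this sample data, only the ("Ann", "Oslo") row should survive. Building a string keeps everything inside the SQL parser, which avoids the Column DSL entirely.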
Solution 2
In case someone doesn't want to drop the records with blank strings, but just wants to convert the blank strings to some constant value:
val newdf = df.na.replace(df.columns, Map("" -> "0")) // replace "" with the string "0" in every string column
newdf.show()
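na.replace also has an overload that takes a single column name, and the name "*" applies the map to every column whose type matches the map's keys. This is a minimal sketch, assuming a local Spark 1.3-era SQLContext. Reading, ReplaceBlankStrings, fillBlanks, the "unknown" placeholder and the sample rows are hypothetical and only illustrate the call shapes.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}

// Hypothetical record type with two string columns and one numeric column.
case class Reading(sensor: String, unit: String, value: Double)

object ReplaceBlankStrings {
  // Replaces "" with a placeholder in one named column only.
  def fillBlanks(df: DataFrame, column: String, placeholder: String): DataFrame =
    df.na.replace(column, Map("" -> placeholder))

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("replace-blank").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)

    val df = sqlContext.createDataFrame(Seq(
      Reading("s1", "", 1.5),
      Reading("", "C", 2.0)))

    // Only "unit" is touched; the blank "sensor" value is left as-is.
    fillBlanks(df, "unit", "unknown").show()

    // "*" targets every column whose type matches the map (strings here),
    // so the numeric "value" column is not affected.
    df.na.replace("*", Map("" -> "0")).show()

    sc.stop()
  }
}
```

Restricting the replacement to specific columns avoids turning legitimately empty fields elsewhere into "0".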
Solution 3
You can use this:
df.filter(!($"col_name"===""))
It filters out the rows where the value of "col_name" is "", i.e. a blank string. I'm using the equality filter and then inverting it with "!". Note that the $"col_name" syntax requires import sqlContext.implicits._.
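If importing the implicits fails, as it does in the question, the same negated-equality filter can be written without the $ interpolator by obtaining the Column from df("...") or from functions.col. This is a minimal sketch, assuming a local Spark 1.3-era SQLContext. Item, FilterWithoutImplicits, dropBlank and dropBlankViaCol are hypothetical names for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.functions.col

// Hypothetical record type, used only to build sample data.
case class Item(col_name: String, qty: Int)

object FilterWithoutImplicits {
  // Same predicate as !($"col_name" === ""), but the Column comes from
  // df(column), so sqlContext.implicits._ is not required.
  def dropBlank(df: DataFrame, column: String): DataFrame =
    df.filter(!(df(column) === ""))

  // Equivalent form using functions.col, which also needs no implicits.
  def dropBlankViaCol(df: DataFrame, column: String): DataFrame =
    df.filter(!(col(column) === ""))

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("no-implicits").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)

    val df = sqlContext.createDataFrame(Seq(Item("a", 1), Item("", 2)))
    dropBlank(df, "col_name").show()
    dropBlankViaCol(df, "col_name").show()
    sc.stop()
  }
}
```

Negating === with ! matches the original answer's approach. On Spark 2.0 and later, the =!= operator expresses the same inequality directly.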
Author: mongolol
Updated on April 22, 2021

Original question (mongolol):
I'm attempting to remove rows in which a Spark DataFrame column contains blank strings. Originally I did
val df2 = df1.na.drop()
but it turns out many of these values are encoded as "" rather than null, so na.drop() keeps them.
I'm stuck using Spark 1.3.1 and also cannot rely on the DSL. (Importing spark.implicit_ isn't working.)