Spark & Scala - Cannot Filter null Values from RDD
Solution 1
Ratings.filter(x => x._1 != null)
This does transform the RDD, but the returned RDD is never used — filter does not modify Ratings in place. You can chain the result instead:
Ratings.filter(_._1 != null).foreach(println)
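Since the sample data contains whole-null tuples, the same pattern extends to checking every field. A minimal sketch, using a plain Scala Seq in place of the RDD (the predicate would be identical on a Spark RDD of the same tuple type):

```scala
// Sketch: a plain Scala Seq stands in for the RDD here; the filter
// predicate is the same one you would pass to RDD.filter.
val rows: Seq[(String, String, String)] = Seq(
  ("3359", "1494", "4"),
  (null, null, null),
  ("12062", "1219", "5")
)

// Drop tuples where ANY of the three fields is null
val clean = rows.filter { case (u, i, r) => u != null && i != null && r != null }

clean.foreach(println)
```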
Solution 2
RDDs are immutable objects — any transformation on an RDD doesn't change the original RDD, but rather produces a new one. So you should use the RDD returned by filter (just as you do with the result of map) if you want to see the effect of the filter:
val result = Ratings.filter ( x => x._1 != null )
result.foreach(println)
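The immutability point can be seen without a Spark cluster; a sketch with a local Scala collection, which behaves the same way under filter:

```scala
// filter never mutates its receiver; it returns a new collection.
// RDD transformations in Spark follow the same rule.
val ratings: Seq[(String, String, String)] = Seq(
  ("3359", "1494", "4"),
  (null, null, null),
  ("28574", "1542", "5")
)

ratings.filter(_._1 != null)               // result discarded: ratings unchanged
val result = ratings.filter(_._1 != null)  // result kept

println(ratings.size)  // still 3: the original is untouched
println(result.size)   // 2: nulls removed in the new collection
```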
Author: questionasker
Updated on June 29, 2022
Comments
questionasker almost 2 years
I tried to filter null values from an RDD but failed. Here's my code:
val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
  classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
  classOf[org.apache.hadoop.hbase.client.Result])

val raw_hbaseRDD = hBaseRDD.map { kv => kv._2 }

val Ratings = raw_hbaseRDD.map { result =>
  val x = Bytes.toString(result.getValue(Bytes.toBytes("data"), Bytes.toBytes("user")))
  val y = Bytes.toString(result.getValue(Bytes.toBytes("data"), Bytes.toBytes("item")))
  val z = Bytes.toString(result.getValue(Bytes.toBytes("data"), Bytes.toBytes("rating")))
  (x, y, z)
}

Ratings.filter(x => x._1 != null)
Ratings.foreach(println)
When debugging, null values still appeared after the filter:
(3359,1494,4)
(null,null,null)
(28574,1542,5)
(null,null,null)
(12062,1219,5)
(14068,1459,3)
Any better idea?