Initialize an RDD to empty
Solution 1
To create an empty RDD in Java, you just need to do the following:
// Get an RDD that has no partitions or elements.
JavaSparkContext jsc;
...
JavaRDD<T> emptyRDD = jsc.emptyRDD();
The type parameter T is inferred from the declared type through generics. For your case, you'll need:
JavaRDD<Tuple2<String,List<String>>> emptyRDD = jsc.emptyRDD();
// Wrap the empty JavaRDD of tuples to get an empty JavaPairRDD.
JavaPairRDD<String,List<String>> emptyPairRDD = JavaPairRDD.fromJavaRDD(
emptyRDD
);
You can also use the mapToPair method to convert your JavaRDD to a JavaPairRDD.
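The Java approach above can be sketched end to end as follows. This is only a minimal illustration, assuming the Spark Java API is on the classpath and a local master is acceptable; the class name EmptyPairRddExample, the helper emptyPairRdd, and the sample data are hypothetical and not part of the original answer. It starts from an empty pair RDD and unions one batch into it. Because RDDs are immutable, union returns a new RDD, so the result has to be assigned back to the variable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class EmptyPairRddExample {

    // Hypothetical helper: wraps an empty JavaRDD of tuples as a JavaPairRDD.
    static JavaPairRDD<String, List<String>> emptyPairRdd(JavaSparkContext jsc) {
        JavaRDD<Tuple2<String, List<String>>> empty = jsc.emptyRDD();
        return JavaPairRDD.fromJavaRDD(empty);
    }

    public static void main(String[] args) {
        // Assumption: a single-threaded local master, for illustration only.
        SparkConf conf = new SparkConf()
                .setAppName("empty-rdd-sketch")
                .setMaster("local[1]");

        try (JavaSparkContext jsc = new JavaSparkContext(conf)) {
            // Start with an empty accumulator RDD instead of null.
            JavaPairRDD<String, List<String>> existingRDD = emptyPairRdd(jsc);

            // Made-up sample batch standing in for the "actual" RDD.
            List<Tuple2<String, List<String>>> data = new ArrayList<>();
            data.add(new Tuple2<>("k1", Arrays.asList("a", "b")));
            JavaPairRDD<String, List<String>> batch = jsc.parallelizePairs(data);

            // union does not modify existingRDD in place; keep the returned RDD.
            existingRDD = existingRDD.union(batch);

            System.out.println("elements after union: " + existingRDD.count());
        }
    }
}
```

Starting from an empty RDD rather than null means the first union needs no special case.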
In Scala:
val sc: SparkContext = ???
...
val emptyRDD = sc.emptyRDD
// emptyRDD: org.apache.spark.rdd.EmptyRDD[Nothing] = EmptyRDD[1] at ...
Solution 2
val emptyRdd = sc.emptyRDD[String]
The statement above creates an empty RDD with element type String.
From the SparkContext class:
// Get an RDD that has no partitions or elements.
def emptyRDD[T: ClassTag]: EmptyRDD[T] = new EmptyRDD[T](this)
Author: Chaitra Bannihatti
Updated on July 05, 2022
Comments
-
Chaitra Bannihatti almost 2 years
I have an RDD called JavaPairRDD<String, List<String>> existingRDD; Now I need to initialize this existingRDD to empty so that when I get the actual RDDs I can do a union with this existingRDD. How do I initialize existingRDD to an empty RDD, other than initializing it to null? Here is my code:
JavaPairRDD<String, List<String>> existingRDD;
if (ai.get() % 10 == 0) {
    existingRDD.saveAsNewAPIHadoopFile("s3://manthan-impala-test/kinesis-dump/" + startTime + "/" + k + "/" + System.currentTimeMillis() + "/", NullWritable.class, Text.class, TextOutputFormat.class); // on worker failure this will get overwritten
} else {
    existingRDD.union(rdd);
}
-
Dirk Groeneveld over 6 years
I'm pretty sure that creates an RDD[String] with one entry, the empty string.
-
mjbsgll over 5 years
I like this way. I was doing val emptyRdd:RDD[String]=null.