Initialize an RDD to empty

31,819

Solution 1

To create an empty RDD in Java, you just need to do the following:

// Get an RDD that has no partitions or elements.
JavaSparkContext jsc;
...
JavaRDD<T> emptyRDD = jsc.emptyRDD();

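I trust you know how to use generics; if not, for your case you'll need:

// Needs: import java.util.List; import scala.Tuple2;
JavaRDD<Tuple2<String, List<String>>> emptyRDD = jsc.emptyRDD();
// Wrap the empty RDD of tuples as a JavaPairRDD with the same key/value types.
JavaPairRDD<String, List<String>> emptyPairRDD = JavaPairRDD.fromJavaRDD(emptyRDD);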

You can also use the mapToPair method to convert your JavaRDD to a JavaPairRDD.
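Putting this together for the question's JavaPairRDD<String, List<String>>, here is a minimal sketch, assuming Spark is on the classpath and a local master is fine for trying it out. The class name, the accumulate helper and the batch contents are hypothetical. One detail matters for the union step: union does not modify the RDD it is called on; it returns a new RDD, so the result has to be assigned back (existingRDD = existingRDD.union(rdd)).

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

class EmptyPairRddUnion {

    // Hypothetical helper: starts from an empty pair RDD and unions in each batch.
    static long accumulate(JavaSparkContext jsc,
                           List<List<Tuple2<String, List<String>>>> batches) {
        // emptyRDD() yields an RDD with no partitions and no elements.
        JavaRDD<Tuple2<String, List<String>>> empty = jsc.emptyRDD();
        // Wrap it as a pair RDD so it has the same type as the incoming batches.
        JavaPairRDD<String, List<String>> existingRDD = JavaPairRDD.fromJavaRDD(empty);

        for (List<Tuple2<String, List<String>>> batch : batches) {
            JavaPairRDD<String, List<String>> rdd = jsc.parallelizePairs(batch);
            // union returns a new RDD; reassign, otherwise the result is lost.
            existingRDD = existingRDD.union(rdd);
        }
        return existingRDD.count();
    }

    static long demo() {
        SparkConf conf = new SparkConf().setAppName("empty-rdd-union").setMaster("local[2]");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        try {
            // Hypothetical sample data: one pair in the first batch, two in the second.
            List<Tuple2<String, List<String>>> batch1 = Arrays.asList(
                new Tuple2<String, List<String>>("a", Arrays.asList("x", "y")));
            List<Tuple2<String, List<String>>> batch2 = Arrays.asList(
                new Tuple2<String, List<String>>("b", Arrays.asList("z")),
                new Tuple2<String, List<String>>("c", Arrays.asList("w")));
            return accumulate(jsc, Arrays.asList(batch1, batch2));
        } finally {
            jsc.stop();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 3
    }
}
```

Running main prints 3: the empty starting RDD contributes nothing, so the count is just the one pair from the first batch plus the two from the second.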

In Scala:

val sc: SparkContext = ???
... 
val emptyRDD = sc.emptyRDD
// emptyRDD: org.apache.spark.rdd.EmptyRDD[Nothing] = EmptyRDD[1] at ...

Solution 2

val emptyRdd=sc.emptyRDD[String]

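The statement above creates an empty RDD of type String (RDD[String]). Unlike a bare sc.emptyRDD, which is inferred as EmptyRDD[Nothing], it can be unioned with other RDD[String]s.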

From the SparkContext class:

/** Get an RDD that has no partitions or elements. */
def emptyRDD[T: ClassTag]: EmptyRDD[T] = new EmptyRDD[T](this)
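The scaladoc's claim (no partitions, no elements) can be checked from Java too. A minimal sketch, assuming a local Spark setup; the class name is hypothetical. The explicit type witness jsc.<String>emptyRDD() is the Java counterpart of Scala's sc.emptyRDD[String], although here the assignment target already fixes the type.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

class EmptyRddCheck {

    // Returns {number of partitions, number of elements} for an empty RDD of Strings.
    static long[] inspect() {
        SparkConf conf = new SparkConf().setAppName("empty-rdd-check").setMaster("local[1]");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        try {
            // Java counterpart of Scala's sc.emptyRDD[String]; the witness is optional
            // here because the declared type JavaRDD<String> already determines T.
            JavaRDD<String> emptyRdd = jsc.<String>emptyRDD();
            return new long[] { emptyRdd.getNumPartitions(), emptyRdd.count() };
        } finally {
            jsc.stop();
        }
    }

    public static void main(String[] args) {
        long[] r = inspect();
        System.out.println("partitions=" + r[0] + ", elements=" + r[1]); // partitions=0, elements=0
    }
}
```

Both numbers come out as 0, which is what makes this RDD a neutral starting point for union.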

Author: Chaitra Bannihatti

Updated on July 05, 2022

Comments

  • Chaitra Bannihatti
    Chaitra Bannihatti almost 2 years

    I have an RDD called

    JavaPairRDD<String, List<String>> existingRDD; 
    

    Now I need to initialize existingRDD to an empty RDD so that, when I get the actual RDDs, I can union them with existingRDD. How do I initialize existingRDD to an empty RDD, other than setting it to null? Here is my code:

    JavaPairRDD<String, List<String>> existingRDD;
    if(ai.get()%10==0)
    {
        existingRDD.saveAsNewAPIHadoopFile("s3://manthan-impala-test/kinesis-dump/" + startTime + "/" + k + "/" + System.currentTimeMillis() + "/",
        NullWritable.class, Text.class, TextOutputFormat.class); //on worker failure this will get overwritten                                  
    }
    else
    {
        existingRDD = existingRDD.union(rdd); // union returns a new RDD, so keep the result
    }
    
  • Dirk Groeneveld
    Dirk Groeneveld over 6 years
    I'm pretty sure that creates an RDD[String] with one entry, the empty string.
  • mjbsgll
    mjbsgll over 5 years
    I like this way. I was doing val emptyRdd:RDD[String]=null.
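Dirk's comment concerns an RDD holding a single empty string, which is not the same as an empty RDD: the extra "" is carried into every union. (Starting from null, as in the last comment, avoids that but means the first union call has to be special-cased to avoid a NullPointerException.) A minimal sketch contrasting the two, assuming a local Spark setup; the class name and sample data are hypothetical.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

class EmptyVsEmptyString {

    // Returns {count after union with the "" RDD, count after union with a truly empty RDD}.
    static long[] compare() {
        SparkConf conf = new SparkConf().setAppName("empty-vs-empty-string").setMaster("local[1]");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        try {
            JavaRDD<String> data = jsc.parallelize(Arrays.asList("a", "b"));
            // One element: the empty string.
            JavaRDD<String> oneEmptyString = jsc.parallelize(Arrays.asList(""));
            // No elements at all.
            JavaRDD<String> trulyEmpty = jsc.emptyRDD();
            return new long[] {
                oneEmptyString.union(data).count(), // the "" is carried into the result
                trulyEmpty.union(data).count()
            };
        } finally {
            jsc.stop();
        }
    }

    public static void main(String[] args) {
        long[] r = compare();
        System.out.println(r[0] + " vs " + r[1]); // 3 vs 2
    }
}
```

The first count is 3 because the empty string survives the union; the second is 2, exactly the real data.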