Only one SparkContext may be running in this JVM - [SPARK]
A Spark shell already prepares a SparkSession or SparkContext for you to use, so you don't have to (and can't) initialize a new one. Usually there is a line at the end of the spark-shell launch output telling you which variable it is available under. allowMultipleContexts exists only for testing some functionalities of Spark, and shouldn't be used in most cases.
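As a minimal sketch of what this looks like in practice (assuming the standard spark-shell bindings, where the context is available as sc):

```scala
// Inside spark-shell: do NOT call `new SparkContext(...)`.
// The shell has already created one and bound it to `sc`
// (and, in Spark >= 2.0, a SparkSession bound to `spark`).
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Reuse the shell's existing context to build the streaming context:
val ssc = new StreamingContext(sc, Seconds(2))
```

This avoids the "Only one SparkContext may be running in this JVM" error entirely, because no second context is ever constructed.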
trick15f
Updated on June 16, 2022
-
trick15f almost 2 years
I'm trying to run the following code to get twitter information live:
import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.twitter._
import org.apache.spark.streaming.StreamingContext._
import twitter4j.auth.Authorization
import twitter4j.Status
import twitter4j.auth.AuthorizationFactory
import twitter4j.conf.ConfigurationBuilder
import org.apache.spark.streaming.api.java.JavaStreamingContext
import org.apache.spark.rdd.RDD
import org.apache.spark.SparkContext
import org.apache.spark.mllib.feature.HashingTF
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.SparkConf
import org.apache.spark.api.java.JavaSparkContext
import org.apache.spark.api.java.function.Function
import org.apache.spark.streaming.Duration
import org.apache.spark.streaming.api.java.JavaDStream
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream

val consumerKey = "xxx"
val consumerSecret = "xxx"
val accessToken = "xxx"
val accessTokenSecret = "xxx"
val url = "https://stream.twitter.com/1.1/statuses/filter.json"

val sparkConf = new SparkConf().setAppName("Twitter Streaming")
val sc = new SparkContext(sparkConf)

val documents: RDD[Seq[String]] = sc.textFile("").map(_.split(" ").toSeq)

// Twitter Streaming
val ssc = new JavaStreamingContext(sc, Seconds(2))

val conf = new ConfigurationBuilder()
conf.setOAuthAccessToken(accessToken)
conf.setOAuthAccessTokenSecret(accessTokenSecret)
conf.setOAuthConsumerKey(consumerKey)
conf.setOAuthConsumerSecret(consumerSecret)
conf.setStreamBaseURL(url)
conf.setSiteStreamBaseURL(url)

val filter = Array("Twitter", "Hadoop", "Big Data")
val auth = AuthorizationFactory.getInstance(conf.build())
val tweets: JavaReceiverInputDStream[twitter4j.Status] = TwitterUtils.createStream(ssc, auth, filter)

val statuses = tweets.dstream.map(status => status.getText)
statuses.print()
ssc.start()
But when it arrives at this command:
val sc = new SparkContext(sparkConf)
the following error appears:

17/05/09 09:08:35 WARN SparkContext: Multiple running SparkContexts detected in the same JVM!
org.apache.spark.SparkException: Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true.
I have tried to add the following parameters to the sparkConf value, but the error still appears:
val sparkConf = new SparkConf().setAppName("Twitter Streaming").setMaster("local[4]").set("spark.driver.allowMultipleContexts", "true")
If I ignore the error and continue running commands I get this other error:
17/05/09 09:15:44 WARN ReceiverSupervisorImpl: Restarting receiver with delay 2000 ms: Error receiving tweets 401: Authentication credentials (https://dev.twitter.com/pages/auth) were missing or incorrect. Ensure that you have set valid consumer key/secret, access token/secret, and the system clock is in sync.

Error 401 Unauthorized
HTTP ERROR: 401
Problem accessing '/1.1/statuses/filter.json'. Reason: Unauthorized
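The 401 usually means the OAuth credentials are wrong or the system clock is skewed; another common cause is overriding the stream base URLs. A hedged sketch of the credential setup alone (the "xxx" values are placeholders for your own app's keys, and the base-URL overrides from the question are deliberately omitted so twitter4j uses its defaults):

```scala
import twitter4j.conf.ConfigurationBuilder
import twitter4j.auth.AuthorizationFactory

// Placeholder credentials -- substitute the keys from your Twitter app.
val cb = new ConfigurationBuilder()
  .setOAuthConsumerKey("xxx")
  .setOAuthConsumerSecret("xxx")
  .setOAuthAccessToken("xxx")
  .setOAuthAccessTokenSecret("xxx")

// Build the Authorization to pass to TwitterUtils.createStream:
val auth = AuthorizationFactory.getInstance(cb.build())
```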
Any help is appreciated. Regards, and have a good day.
-
trick15f about 7 years: So the solution would be to omit the following commands:
val sparkConf = new SparkConf().setAppName("Twitter Streaming")
and
val sc = new SparkContext(sparkConf)
? Thanks for the clarification. -
Rick Moritz about 7 years: Yes - depending on your Spark version you may also have to substitute sc with spark.sparkContext (if Spark >= 2.0)
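A short sketch of that substitution, assuming the spark-shell of Spark >= 2.0 where a SparkSession is bound to spark:

```scala
// Spark >= 2.0 shells bind a SparkSession to `spark`;
// get the underlying SparkContext from it instead of constructing one.
import org.apache.spark.streaming.{Seconds, StreamingContext}

val sc = spark.sparkContext
val ssc = new StreamingContext(sc, Seconds(2))
```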