Only one SparkContext may be running in this JVM - [SPARK]
A Spark shell already prepares a SparkSession or SparkContext for you to use, so you don't have to (and can't) initialize a new one. Usually there is a line at the end of the spark-shell launch output telling you which variable it is available under. allowMultipleContexts exists only for testing some functionalities of Spark, and shouldn't be used in most cases.
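As a minimal sketch of what this looks like in practice (assuming the standard spark-shell bindings, where the context is available as sc):

```scala
// Inside spark-shell: do NOT call `new SparkContext(...)`.
// The shell has already created one and bound it to `sc`
// (and, in Spark >= 2.0, a SparkSession bound to `spark`).
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Reuse the shell's existing context to build the streaming context:
val ssc = new StreamingContext(sc, Seconds(2))
```

This avoids the "Only one SparkContext may be running in this JVM" error entirely, because no second context is ever constructed.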
trick15f
Updated on June 16, 2022
-
trick15f almost 2 years
I'm trying to run the following code to get twitter information live:
import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.twitter._
import org.apache.spark.streaming.StreamingContext._
import twitter4j.auth.Authorization
import twitter4j.Status
import twitter4j.auth.AuthorizationFactory
import twitter4j.conf.ConfigurationBuilder
import org.apache.spark.streaming.api.java.JavaStreamingContext
import org.apache.spark.rdd.RDD
import org.apache.spark.SparkContext
import org.apache.spark.mllib.feature.HashingTF
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.SparkConf
import org.apache.spark.api.java.JavaSparkContext
import org.apache.spark.api.java.function.Function
import org.apache.spark.streaming.Duration
import org.apache.spark.streaming.api.java.JavaDStream
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream

val consumerKey = "xxx"
val consumerSecret = "xxx"
val accessToken = "xxx"
val accessTokenSecret = "xxx"
val url = "https://stream.twitter.com/1.1/statuses/filter.json"

val sparkConf = new SparkConf().setAppName("Twitter Streaming")
val sc = new SparkContext(sparkConf)

val documents: RDD[Seq[String]] = sc.textFile("").map(_.split(" ").toSeq)

// Twitter Streaming
val ssc = new JavaStreamingContext(sc, Seconds(2))

val conf = new ConfigurationBuilder()
conf.setOAuthAccessToken(accessToken)
conf.setOAuthAccessTokenSecret(accessTokenSecret)
conf.setOAuthConsumerKey(consumerKey)
conf.setOAuthConsumerSecret(consumerSecret)
conf.setStreamBaseURL(url)
conf.setSiteStreamBaseURL(url)

val filter = Array("Twitter", "Hadoop", "Big Data")
val auth = AuthorizationFactory.getInstance(conf.build())
val tweets: JavaReceiverInputDStream[twitter4j.Status] = TwitterUtils.createStream(ssc, auth, filter)

val statuses = tweets.dstream.map(status => status.getText)
statuses.print()
ssc.start()
But when it arrives at this command:
val sc = new SparkContext(sparkConf)
the following error appears:

17/05/09 09:08:35 WARN SparkContext: Multiple running SparkContexts detected in the same JVM!
org.apache.spark.SparkException: Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true.
I have tried to add the following parameters to the sparkConf value, but the error still appears:
val sparkConf = new SparkConf().setAppName("Twitter Streaming").setMaster("local[4]").set("spark.driver.allowMultipleContexts", "true")
If I ignore the error and continue running commands I get this other error:
17/05/09 09:15:44 WARN ReceiverSupervisorImpl: Restarting receiver with delay 2000 ms: Error receiving tweets 401: Authentication credentials (https://dev.twitter.com/pages/auth) were missing or incorrect. Ensure that you have set valid consumer key/secret, access token/secret, and the system clock is in sync.

Error 401 Unauthorized
HTTP ERROR: 401
Problem accessing '/1.1/statuses/filter.json'. Reason: Unauthorized
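The 401 usually means the OAuth credentials are wrong or the system clock is skewed; another common cause is overriding the stream base URLs. A hedged sketch of the credential setup alone (the "xxx" values are placeholders for your own app's keys, and the base-URL overrides from the question are deliberately omitted so twitter4j uses its defaults):

```scala
import twitter4j.conf.ConfigurationBuilder
import twitter4j.auth.AuthorizationFactory

// Placeholder credentials -- substitute the keys from your Twitter app.
val cb = new ConfigurationBuilder()
  .setOAuthConsumerKey("xxx")
  .setOAuthConsumerSecret("xxx")
  .setOAuthAccessToken("xxx")
  .setOAuthAccessTokenSecret("xxx")

// Build the Authorization to pass to TwitterUtils.createStream:
val auth = AuthorizationFactory.getInstance(cb.build())
```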
Any help is appreciated. Regards, and have a good day.
-
trick15f about 7 years: So the solution would be to omit the following commands:
val sparkConf = new SparkConf().setAppName("Twitter Streaming")
and
val sc = new SparkContext(sparkConf)
? Thanks for the clarification. -
Rick Moritz about 7 years: Yes - depending on your Spark version you may also have to substitute sc with spark.sparkContext (if Spark >= 2.0)
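A short sketch of that substitution, assuming the spark-shell of Spark >= 2.0 where a SparkSession is bound to spark:

```scala
// Spark >= 2.0 shells bind a SparkSession to `spark`;
// get the underlying SparkContext from it instead of constructing one.
import org.apache.spark.streaming.{Seconds, StreamingContext}

val sc = spark.sparkContext
val ssc = new StreamingContext(sc, Seconds(2))
```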