spark in yarn-cluster 'sc' not defined
Solution 1
This is what worked for me:
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext
conf = SparkConf().setAppName("building a warehouse")
sc = SparkContext(conf=conf)
sqlCtx = SQLContext(sc)
Hope this helps.
Solution 2
sc is a helper value created in the spark-shell, but it is not automatically created by spark-submit. You must instantiate your own SparkContext and use that:
conf = SparkConf().setAppName(appName)
sc = SparkContext(conf=conf)
Author: Tara
Updated on September 14, 2022
Comments
-
Tara over 1 year
I am using Spark 1.3.1.
Do I have to declare sc when Spark runs in yarn-cluster mode? I have no problem running the same Python program in the Spark Python shell.
This is how I submit the job:
/bin/spark-submit --master yarn-cluster test.py --conf conf/spark-defaults.conf
where in spark-defaults I did declare where spark.yarn.jar is. I also checked permissions on where spark.yarn.jar is and on /user/admin, the spark user, to make sure there is read-write-execute for all. In my test.py program, I have from pyspark.sql import SQLContext, and the first line is sqlctx=SQLContext(sc), and the error is
NameError: name 'sc' is not defined
on that line.
Any idea?
-
Tara almost 9 years: Thanks. Then is my passing of the "conf" parameter in the submit command useless if I am creating another "conf"?
-
Justin Pihony almost 9 years: No, the conf file is used if nothing is set in the code. Also, if this helped you then don't forget to accept and upvote :)
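To illustrate the precedence the comment above describes: Spark resolves each property by taking values set in code on the SparkConf first, then flags passed to spark-submit, then entries from spark-defaults.conf. Here is a plain-Python sketch of that merge (no Spark required; the resolve helper and the sample property values are illustrative, not part of Spark's API):

```python
# Sketch of Spark's documented property precedence:
# code-level SparkConf > spark-submit flags > spark-defaults.conf.
def resolve(code_conf, submit_flags, defaults):
    merged = dict(defaults)       # lowest precedence: spark-defaults.conf
    merged.update(submit_flags)   # overridden by spark-submit flags
    merged.update(code_conf)      # overridden again by values set in code
    return merged

defaults = {
    "spark.app.name": "from-defaults",
    "spark.yarn.jar": "hdfs:///jars/spark-assembly.jar",  # hypothetical path
}
code_conf = {"spark.app.name": "building a warehouse"}

print(resolve(code_conf, {}, defaults)["spark.app.name"])  # code wins: building a warehouse
print(resolve({}, {}, defaults)["spark.yarn.jar"])         # defaults fill unset keys
```

So creating your own conf in code does not make spark-defaults.conf useless; the defaults file still supplies every property you do not set explicitly.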