spark in yarn-cluster 'sc' not defined


Solution 1

This is what worked for me:

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

# Create the SparkContext yourself; spark-submit does not define sc for you.
conf = SparkConf().setAppName("building a warehouse")
sc = SparkContext(conf=conf)
sqlCtx = SQLContext(sc)  # build the SQLContext on top of it

Hope this helps.

Solution 2

sc is a helper value that is created automatically in the spark-shell (and the pyspark shell), but it is not created for you by spark-submit. You must instantiate your own SparkContext and use that:

from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName(appName)  # appName: any name you want for your application
sc = SparkContext(conf=conf)
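
Applied to the test.py from the question, a minimal sketch of the whole fix might look like this (the DataFrame work at the end is purely illustrative and not part of the original script):

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

# Build the context yourself; in yarn-cluster mode nothing defines sc for you.
conf = SparkConf().setAppName("building a warehouse")
sc = SparkContext(conf=conf)
sqlctx = SQLContext(sc)  # the line that previously raised NameError

# Illustrative only: any DataFrame/SQL work now goes through sqlctx.
rdd = sc.parallelize([(1, "a"), (2, "b")])
df = sqlctx.createDataFrame(rdd, ["id", "value"])
df.registerTempTable("t")
sqlctx.sql("SELECT COUNT(*) FROM t").show()

sc.stop()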

Comments

  • Tara, over 1 year ago

    I am using Spark 1.3.1.

    Do I have to declare sc myself when Spark runs in yarn-cluster mode? I have no problem running the same Python program in the PySpark shell.

    This is how I submit the job:

    /bin/spark-submit --master yarn-cluster test.py --conf conf/spark-defaults.conf
    

    In spark-defaults.conf I did declare where spark.yarn.jar is. I also checked the permissions on the spark.yarn.jar location and on /user/admin (the Spark user) to make sure there is read-write-execute for all.

    In my test.py program I have from pyspark.sql import SQLContext, and the first line after that import is

    sqlctx=SQLContext(sc)
    

    and the error is

    NameError: name 'sc' is not defined
    

    on that line.

    Any idea?

  • Tara, almost 9 years ago
    Thanks. Then is my passing of the "conf" parameter in the submit command useless if I am creating another "conf" in the code?
  • Justin Pihony, almost 9 years ago
    No, the conf file is used if nothing is set in the code (see the sketch below for the precedence). Also, if this helped you, then don't forget to accept and upvote :)
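
To make that precedence concrete, here is a minimal sketch (assuming the documented spark-submit behaviour; the fallback strings passed to get() are just placeholders): properties set on the SparkConf in code win, and spark-defaults.conf only supplies what the code leaves unset.

from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("building a warehouse")

# Settings made in code take the highest precedence...
print(conf.get("spark.app.name", "unset"))             # -> "building a warehouse"
# ...while anything the code leaves alone still comes from the spark-submit
# flags or from spark-defaults.conf (here: whatever value, if any, the file set).
print(conf.get("spark.yarn.jar", "not set anywhere"))

sc = SparkContext(conf=conf)
sc.stop()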