Spark-submit fails to import SparkContext


The Python files executed by spark-submit need to be on the PYTHONPATH. Either add the full path of their directory:

export PYTHONPATH=full/path/to/dir:$PYTHONPATH

or, if you are already inside the directory that contains the Python script, add '.' to the PYTHONPATH:

export PYTHONPATH='.':$PYTHONPATH
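If you launch spark-submit from another Python program instead of a shell, the same "prepend a directory to PYTHONPATH" step can be done on the child process environment. This is a minimal sketch: `with_dir_on_pythonpath` is a hypothetical helper, not part of Spark, and it only reproduces the `export PYTHONPATH=dir:$PYTHONPATH` line above.

```python
import os


def with_dir_on_pythonpath(directory, env=None):
    """Return a copy of env (default: os.environ) with directory prepended to PYTHONPATH.

    Mirrors `export PYTHONPATH=directory:$PYTHONPATH`, using os.pathsep as the
    separator and omitting it when PYTHONPATH is unset or empty.
    The input mapping is never modified.
    """
    new_env = dict(os.environ if env is None else env)
    existing = new_env.get("PYTHONPATH", "")
    new_env["PYTHONPATH"] = directory + os.pathsep + existing if existing else directory
    return new_env
```

The result can be passed as `env=` to `subprocess.run(["spark-submit", "test.py"], env=...)`. Passing `os.path.abspath(".")` rather than `'.'` pins the directory at the moment you build the environment instead of relying on the child's working directory.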

Thanks to @Def_Os for pointing that out!

Author: Admin

Updated on July 06, 2022

Comments

  • Admin
    Admin almost 2 years

    I'm running Spark 1.4.1 on my local Mac laptop and am able to use pyspark interactively without any issues. Spark was installed through Homebrew and I'm using Anaconda Python. However, as soon as I try to use spark-submit, I get the following error:

    15/09/04 08:51:09 ERROR SparkContext: Error initializing SparkContext.
    java.io.FileNotFoundException: Added file file:test.py does not exist.
        at org.apache.spark.SparkContext.addFile(SparkContext.scala:1329)
        at org.apache.spark.SparkContext.addFile(SparkContext.scala:1305)
        at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:458)
        at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:458)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:458)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
        at py4j.Gateway.invoke(Gateway.java:214)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
        at py4j.GatewayConnection.run(GatewayConnection.java:207)
        at java.lang.Thread.run(Thread.java:745)
    15/09/04 08:51:09 ERROR SparkContext: Error stopping SparkContext after init error.
    java.lang.NullPointerException
        at org.apache.spark.network.netty.NettyBlockTransferService.close(NettyBlockTransferService.scala:152)
        at org.apache.spark.storage.BlockManager.stop(BlockManager.scala:1216)
        at org.apache.spark.SparkEnv.stop(SparkEnv.scala:96)
        at org.apache.spark.SparkContext.stop(SparkContext.scala:1659)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:565)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
        at py4j.Gateway.invoke(Gateway.java:214)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
        at py4j.GatewayConnection.run(GatewayConnection.java:207)
        at java.lang.Thread.run(Thread.java:745)
    Traceback (most recent call last):
      File "test.py", line 35, in <module>
        sc = SparkContext("local","test")
      File "/usr/local/Cellar/apache-spark/1.4.1/libexec/python/lib/pyspark.zip/pyspark/context.py", line 113, in __init__
      File "/usr/local/Cellar/apache-spark/1.4.1/libexec/python/lib/pyspark.zip/pyspark/context.py", line 165, in _do_init
      File "/usr/local/Cellar/apache-spark/1.4.1/libexec/python/lib/pyspark.zip/pyspark/context.py", line 219, in _initialize_context
      File "/usr/local/Cellar/apache-spark/1.4.1/libexec/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 701, in __call__
      File "/usr/local/Cellar/apache-spark/1.4.1/libexec/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
    py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
    : java.io.FileNotFoundException: Added file file:test.py does not exist.
        at org.apache.spark.SparkContext.addFile(SparkContext.scala:1329)
        at org.apache.spark.SparkContext.addFile(SparkContext.scala:1305)
        at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:458)
        at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:458)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:458)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
        at py4j.Gateway.invoke(Gateway.java:214)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
        at py4j.GatewayConnection.run(GatewayConnection.java:207)
        at java.lang.Thread.run(Thread.java:745)
    

    Here is my code:

    from pyspark import SparkContext
    
    if __name__ == "__main__":
        sc = SparkContext("local","test")
        sc.parallelize([1,2,3,4])
        sc.stop()
    

    If I move the file anywhere inside the /usr/local/Cellar/apache-spark/1.4.1/ directory, spark-submit works fine. My environment variables are set as follows:

    export SPARK_HOME="/usr/local/Cellar/apache-spark/1.4.1"
    export PATH=$SPARK_HOME/bin:$PATH
    export PYTHONPATH=$SPARK_HOME/libexec/python:$SPARK_HOME/libexec/python/lib/py4j-0.8.2.1-src.zip
    

    I'm sure something is set incorrectly in my environment, but I can't seem to track it down.
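To check whether the directory holding the script is actually covered by a PYTHONPATH like the one in the question, a small check can help. This is a sketch under stated assumptions: `pythonpath_covers` is a hypothetical helper, it resolves relative entries (such as '.') and a relative script path against a given working directory, and it compares normalized paths only (no symlink resolution).

```python
import os


def pythonpath_covers(script_path, pythonpath=None, cwd=None):
    """Return True if the directory containing script_path is listed in PYTHONPATH.

    Relative entries and a relative script_path are resolved against cwd
    (default: the current working directory). Empty entries are ignored.
    """
    if pythonpath is None:
        pythonpath = os.environ.get("PYTHONPATH", "")
    base = os.getcwd() if cwd is None else cwd
    # Directory of the script, made absolute and normalized (trailing slash dropped).
    script_dir = os.path.normpath(os.path.join(base, os.path.dirname(script_path)))
    entries = [e for e in pythonpath.split(os.pathsep) if e]
    resolved = {os.path.normpath(os.path.join(base, e)) for e in entries}
    return script_dir in resolved
```

With the three exports shown in the question and test.py outside the listed Spark directories, `pythonpath_covers("test.py")` returns False, because PYTHONPATH names only Spark's python directory and the py4j zip; after either export from the answer it returns True.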