Spark-submit fails to import SparkContext
The Python files executed by spark-submit
must be on the PYTHONPATH.
Either add the full path of the directory containing the script:

export PYTHONPATH=full/path/to/dir:$PYTHONPATH

or, if you are already inside the directory where the Python script lives, add '.' to the PYTHONPATH:

export PYTHONPATH='.':$PYTHONPATH
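As a concrete sketch of the first option, run the export from inside the script's directory and use "$PWD" so the absolute path is recorded (the spark-submit call is shown commented out, since it requires a local Spark installation):

```shell
# Prepend the directory containing the script to PYTHONPATH.
# "$PWD" expands to the absolute path of the current directory, so the
# setting keeps working even if you cd elsewhere before submitting.
export PYTHONPATH="$PWD:$PYTHONPATH"
echo "$PYTHONPATH"

# Then submit as usual (requires a local Spark install):
# spark-submit test.py
```

Preferring the absolute path over '.' avoids surprises when spark-submit is later invoked from a different working directory.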
Thanks to @Def_Os for pointing that out!
Admin
Updated on July 06, 2022

Comments

- Admin, almost 2 years ago:
I'm running Spark 1.4.1 on my local Mac laptop and am able to use
pyspark
interactively without any issues. Spark was installed through Homebrew and I'm using Anaconda Python. However, as soon as I try to use spark-submit
, I get the following error:

15/09/04 08:51:09 ERROR SparkContext: Error initializing SparkContext.
java.io.FileNotFoundException: Added file file:test.py does not exist.
	at org.apache.spark.SparkContext.addFile(SparkContext.scala:1329)
	at org.apache.spark.SparkContext.addFile(SparkContext.scala:1305)
	at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:458)
	at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:458)
	at scala.collection.immutable.List.foreach(List.scala:318)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:458)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
	at py4j.Gateway.invoke(Gateway.java:214)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
	at py4j.GatewayConnection.run(GatewayConnection.java:207)
	at java.lang.Thread.run(Thread.java:745)
15/09/04 08:51:09 ERROR SparkContext: Error stopping SparkContext after init error.
java.lang.NullPointerException
	at org.apache.spark.network.netty.NettyBlockTransferService.close(NettyBlockTransferService.scala:152)
	at org.apache.spark.storage.BlockManager.stop(BlockManager.scala:1216)
	at org.apache.spark.SparkEnv.stop(SparkEnv.scala:96)
	at org.apache.spark.SparkContext.stop(SparkContext.scala:1659)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:565)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
	at py4j.Gateway.invoke(Gateway.java:214)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
	at py4j.GatewayConnection.run(GatewayConnection.java:207)
	at java.lang.Thread.run(Thread.java:745)
Traceback (most recent call last):
  File "test.py", line 35, in <module>
    sc = SparkContext("local","test")
  File "/usr/local/Cellar/apache-spark/1.4.1/libexec/python/lib/pyspark.zip/pyspark/context.py", line 113, in __init__
  File "/usr/local/Cellar/apache-spark/1.4.1/libexec/python/lib/pyspark.zip/pyspark/context.py", line 165, in _do_init
  File "/usr/local/Cellar/apache-spark/1.4.1/libexec/python/lib/pyspark.zip/pyspark/context.py", line 219, in _initialize_context
  File "/usr/local/Cellar/apache-spark/1.4.1/libexec/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 701, in __call__
  File "/usr/local/Cellar/apache-spark/1.4.1/libexec/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.io.FileNotFoundException: Added file file:test.py does not exist.
	at org.apache.spark.SparkContext.addFile(SparkContext.scala:1329)
	at org.apache.spark.SparkContext.addFile(SparkContext.scala:1305)
	at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:458)
	at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:458)
	at scala.collection.immutable.List.foreach(List.scala:318)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:458)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
	at py4j.Gateway.invoke(Gateway.java:214)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
	at py4j.GatewayConnection.run(GatewayConnection.java:207)
	at java.lang.Thread.run(Thread.java:745)
Here is my code:
from pyspark import SparkContext

if __name__ == "__main__":
    sc = SparkContext("local", "test")
    sc.parallelize([1, 2, 3, 4])
    sc.stop()
If I move the file to anywhere in the
/usr/local/Cellar/apache-spark/1.4.1/
directory, then spark-submit
works fine. I have my environment variables set as follows:

export SPARK_HOME="/usr/local/Cellar/apache-spark/1.4.1"
export PATH=$SPARK_HOME/bin:$PATH
export PYTHONPATH=$SPARK_HOME/libexec/python:$SPARK_HOME/libexec/python/lib/py4j-0.8.2.1-src.zip
I'm sure something is set incorrectly in my environment, but I can't seem to track it down.
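A quick way to inspect a setup like the one above (a diagnostic sketch, not part of the original post) is to print the relevant variables and then confirm that pyspark is importable under the current PYTHONPATH:

```shell
# Show what the shell actually has, one variable per line;
# "<unset>" is printed when a variable is not defined at all.
echo "SPARK_HOME=${SPARK_HOME:-<unset>}"
echo "PYTHONPATH=${PYTHONPATH:-<unset>}"

# Confirm the interpreter can find pyspark with this PYTHONPATH
# (uncomment if Python and Spark are installed locally):
# python -c 'import pyspark; print(pyspark.__file__)'
```

Comparing the printed PYTHONPATH against the directory holding the script makes it obvious whether that directory is missing, which is what the accepted fix addresses.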