Spark-submit not working when application jar is in HDFS

Solution 1

The only way it worked for me was by using:

--master yarn-cluster
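
For reference, the yarn-cluster master string is deprecated in newer Spark releases in favor of specifying the master and deploy mode separately. A minimal sketch of the equivalent command (the main class and jar path are placeholders borrowed from the answer below):

$SPARK_HOME/bin/spark-submit \
--master yarn \
--deploy-mode cluster \
--class <main_class> \
hdfs://myhost:8020/user/root/myjar.jar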

Solution 2

To make an application jar stored on HDFS accessible to a Spark job, you have to run the job in cluster mode.

$SPARK_HOME/bin/spark-submit \
--deploy-mode cluster \
--class <main_class> \
--master yarn-cluster \
hdfs://myhost:8020/user/root/myjar.jar
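
If the jar is not already on HDFS, it has to be uploaded first; a minimal sketch using the same placeholder paths as above:

hdfs dfs -put myjar.jar hdfs://myhost:8020/user/root/myjar.jar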

Also, there is a Spark JIRA open for client mode, which is not supported yet.

SPARK-10643: Support HDFS application download in client mode spark submit
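
Until that is supported, one workaround for client mode is to copy the jar out of HDFS and submit the local copy; a minimal sketch (the /tmp path is an arbitrary choice):

hdfs dfs -get hdfs://myhost:8020/user/root/myjar.jar /tmp/myjar.jar
$SPARK_HOME/bin/spark-submit --class <main_class> --master yarn --deploy-mode client /tmp/myjar.jar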

Solution 3

There is a workaround: you could mount the directory in HDFS (the one containing your application jar) as a local directory.

I did the same with an Azure file share (but it should be similar for HDFS).

Example command for mounting an Azure file share via CIFS:

sudo mount -t cifs //{storageAccountName}.file.core.windows.net/{directoryName} {local directory path} -o vers=3.0,username={storageAccountName},password={storageAccountKey},dir_mode=0777,file_mode=0777
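
For HDFS specifically, one way to get such a mount is the HDFS NFS Gateway; a minimal sketch, assuming the gateway is configured and running (the gateway host and mount point are placeholders):

sudo mkdir -p /mnt/hdfs
sudo mount -t nfs -o vers=3,proto=tcp,nolock,noacl,sync {nfs_gateway_host}:/ /mnt/hdfs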

Now, in your spark-submit command, provide the path from the mount above:

$ ./bin/spark-submit --class com.example.SimpleApp --master local {local directory path}/simple-project-1.0-SNAPSHOT.jar


Comments

  • dilm, over 3 years ago

    I'm trying to run a Spark application using bin/spark-submit. When I reference my application jar on my local filesystem, it works. However, when I copy the application jar to a directory in HDFS, I get the following exception:

    Warning: Skip remote jar hdfs://localhost:9000/user/hdfs/jars/simple-project-1.0-SNAPSHOT.jar. java.lang.ClassNotFoundException: com.example.SimpleApp

    Here's the command:

    $ ./bin/spark-submit --class com.example.SimpleApp --master local hdfs://localhost:9000/user/hdfs/jars/simple-project-1.0-SNAPSHOT.jar

    I'm using Hadoop version 2.6.0 and Spark version 1.2.1.