Spark job running on YARN cluster: java.io.FileNotFoundException: File does not exist, even though the file exists on the master node

The Spark configuration was not pointing to the right Hadoop configuration directory. For Hadoop 2.7.2, the configuration resides under the install directory at etc/hadoop/ rather than under a conf/ directory. Once I set HADOOP_CONF_DIR=/root/hadoop2.7.2/etc/hadoop/ in spark-env.sh, spark-submit started working and the FileNotFoundException disappeared. Earlier it was pointing to /root/hadoop2.7.2/conf (which does not exist). If Spark does not point to the proper Hadoop configuration directory, it can produce a similar error. I think it's probably a bug in Spark; it should handle this gracefully rather than throwing an ambiguous error message.
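
For reference, here is a minimal sketch of the fix in spark-env.sh, assuming Hadoop 2.7.2 is unpacked under /root/hadoop2.7.2 as in my setup (adjust the path to your installation):

    # $SPARK_HOME/conf/spark-env.sh
    # Point Spark at the directory that actually contains core-site.xml,
    # hdfs-site.xml and yarn-site.xml for this Hadoop install.
    export HADOOP_CONF_DIR=/root/hadoop2.7.2/etc/hadoop

And a quick sanity check from the shell before submitting:

    ls /root/hadoop2.7.2/etc/hadoop   # should list core-site.xml, hdfs-site.xml, yarn-site.xml, ...

This would also explain the file:/ staging paths in the logs below: without a readable core-site.xml, fs.defaultFS falls back to the local file system, so Spark treats the jar and conf archive as already local ("Source and destination file systems are the same. Not copying ...") and the NodeManager on the other box then cannot find them.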

Author: Dev Loper
Updated on July 17, 2022

Comments

  • Dev Loper, almost 2 years ago

    I am fairly new to Spark. I tried searching but couldn't find a proper solution. I have installed Hadoop 2.7.2 on two boxes (one master node and one worker node) and set up the cluster by following this link: http://javadev.org/docs/hadoop/centos/6/installation/multi-node-installation-on-centos-6-non-sucure-mode/ I was running the Hadoop and Spark applications as the root user to test the cluster.

    I have installed Spark on the master node and Spark starts without any errors. However, when I submit the job using spark-submit, I get a FileNotFoundException even though the file is present on the master node at the very same location shown in the error. I am executing the spark-submit command below; the log output follows the command.

    /bin/spark-submit --class com.test.Engine --master yarn --deploy-mode cluster /app/spark-test.jar
    
    16/04/21 19:16:13 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    16/04/21 19:16:13 INFO RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
    16/04/21 19:16:14 INFO Client: Requesting a new application from cluster with 1 NodeManagers
    16/04/21 19:16:14 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
    16/04/21 19:16:14 INFO Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
    16/04/21 19:16:14 INFO Client: Setting up container launch context for our AM
    16/04/21 19:16:14 INFO Client: Setting up the launch environment for our AM container
    16/04/21 19:16:14 INFO Client: Preparing resources for our AM container
    16/04/21 19:16:14 INFO Client: Source and destination file systems are the same. Not copying file:/mi/spark/lib/spark-assembly-1.6.1-hadoop2.6.0.jar
    16/04/21 19:16:14 INFO Client: Source and destination file systems are the same. Not copying file:/app/spark-test.jar
    16/04/21 19:16:14 INFO Client: Source and destination file systems are the same. Not copying file:/tmp/spark-120aeddc-0f87-4411-9400-22ba01096249/__spark_conf__5619348744221830008.zip
    16/04/21 19:16:14 INFO SecurityManager: Changing view acls to: root
    16/04/21 19:16:14 INFO SecurityManager: Changing modify acls to: root
    16/04/21 19:16:14 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
    16/04/21 19:16:15 INFO Client: Submitting application 1 to ResourceManager
    16/04/21 19:16:15 INFO YarnClientImpl: Submitted application application_1461246306015_0001
    16/04/21 19:16:16 INFO Client: Application report for application_1461246306015_0001 (state: ACCEPTED)
    16/04/21 19:16:16 INFO Client: 
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: default
         start time: 1461246375622
         final status: UNDEFINED
         tracking URL: http://sparkcluster01.testing.com:8088/proxy/application_1461246306015_0001/
         user: root
    16/04/21 19:16:17 INFO Client: Application report for application_1461246306015_0001 (state: ACCEPTED)
    16/04/21 19:16:18 INFO Client: Application report for application_1461246306015_0001 (state: ACCEPTED)
    16/04/21 19:16:19 INFO Client: Application report for application_1461246306015_0001 (state: ACCEPTED)
    16/04/21 19:16:20 INFO Client: Application report for application_1461246306015_0001 (state: ACCEPTED)
    16/04/21 19:16:21 INFO Client: Application report for application_1461246306015_0001 (state: FAILED)
    16/04/21 19:16:21 INFO Client: 
         client token: N/A
         diagnostics: Application application_1461246306015_0001 failed 2 times due to AM Container for appattempt_1461246306015_0001_000002 exited with  exitCode: -1000
    For more detailed output, check application tracking page:http://sparkcluster01.testing.com:8088/cluster/app/application_1461246306015_0001Then, click on links to logs of each attempt.
    Diagnostics: java.io.FileNotFoundException: File file:/app/spark-test.jar does not exist
    Failing this attempt. Failing the application.
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: default
         start time: 1461246375622
         final status: FAILED
         tracking URL: http://sparkcluster01.testing.com:8088/cluster/app/application_1461246306015_0001
         user: root
    Exception in thread "main" org.apache.spark.SparkException: Application application_1461246306015_0001 finished with failed status
        at org.apache.spark.deploy.yarn.Client.run(Client.scala:1034)
        at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
        at org.apache.spark.deploy.yarn.Client.main(Client.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    
    

    I even tried running Spark against HDFS by placing my application on HDFS and giving the HDFS path in the spark-submit command. Even then it throws a FileNotFoundException, this time for a Spark conf file. I am executing the spark-submit command below; the log output follows the command.

    ./bin/spark-submit --class com.test.Engine --master yarn --deploy-mode cluster hdfs://sparkcluster01.testing.com:9000/beacon/job/spark-test.jar
    
    16/04/21 18:11:45 INFO RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
    16/04/21 18:11:46 INFO Client: Requesting a new application from cluster with 1 NodeManagers
    16/04/21 18:11:46 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
    16/04/21 18:11:46 INFO Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
    16/04/21 18:11:46 INFO Client: Setting up container launch context for our AM
    16/04/21 18:11:46 INFO Client: Setting up the launch environment for our AM container
    16/04/21 18:11:46 INFO Client: Preparing resources for our AM container
    16/04/21 18:11:46 INFO Client: Source and destination file systems are the same. Not copying file:/mi/spark/lib/spark-assembly-1.6.1-hadoop2.6.0.jar
    16/04/21 18:11:47 INFO Client: Uploading resource hdfs://sparkcluster01.testing.com:9000/beacon/job/spark-test.jar -> file:/root/.sparkStaging/application_1461234217994_0017/spark-test.jar
    16/04/21 18:11:49 INFO Client: Source and destination file systems are the same. Not copying file:/tmp/spark-f4eef3ac-2add-42f8-a204-be7959c26f21/__spark_conf__6818051470272245610.zip
    16/04/21 18:11:50 INFO SecurityManager: Changing view acls to: root
    16/04/21 18:11:50 INFO SecurityManager: Changing modify acls to: root
    16/04/21 18:11:50 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
    16/04/21 18:11:50 INFO Client: Submitting application 17 to ResourceManager
    16/04/21 18:11:50 INFO YarnClientImpl: Submitted application application_1461234217994_0017
    16/04/21 18:11:51 INFO Client: Application report for application_1461234217994_0017 (state: ACCEPTED)
    16/04/21 18:11:51 INFO Client: 
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: default
         start time: 1461242510849
         final status: UNDEFINED
         tracking URL: http://sparkcluster01.testing.com:8088/proxy/application_1461234217994_0017/
         user: root
    16/04/21 18:11:52 INFO Client: Application report for application_1461234217994_0017 (state: ACCEPTED)
    16/04/21 18:11:53 INFO Client: Application report for application_1461234217994_0017 (state: ACCEPTED)
    16/04/21 18:11:54 INFO Client: Application report for application_1461234217994_0017 (state: FAILED)
    16/04/21 18:11:54 INFO Client: 
         client token: N/A
         diagnostics: Application application_1461234217994_0017 failed 2 times due to AM Container for appattempt_1461234217994_0017_000002 exited with  exitCode: -1000
    For more detailed output, check application tracking page:http://sparkcluster01.testing.com:8088/cluster/app/application_1461234217994_0017Then, click on links to logs of each attempt.
    Diagnostics: File file:/tmp/spark-f4eef3ac-2add-42f8-a204-be7959c26f21/__spark_conf__6818051470272245610.zip does not exist
    java.io.FileNotFoundException: File file:/tmp/spark-f4eef3ac-2add-42f8-a204-be7959c26f21/__spark_conf__6818051470272245610.zip does not exist
        at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:609)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:822)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:599)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
        at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
        at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
        at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
        at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358)
        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
    
    Failing this attempt. Failing the application.
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: default
         start time: 1461242510849
         final status: FAILED
         tracking URL: http://sparkcluster01.testing.com:8088/cluster/app/application_1461234217994_0017
         user: root
    Exception in thread "main" org.apache.spark.SparkException: Application application_1461234217994_0017 finished with failed status
        at org.apache.spark.deploy.yarn.Client.run(Client.scala:1034)
        at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
        at org.apache.spark.deploy.yarn.Client.main(Client.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    16/04/21 18:11:55 INFO ShutdownHookManager: Shutdown hook called
    16/04/21 18:11:55 INFO ShutdownHookManager: Deleting directory /tmp/spark-f4eef3ac-2add-42f8-a204-be7959c26f21
    
  • Yoga Gowda, over 5 years ago
    I have Hadoop 2.8.2; when I changed HADOOP_CONF_DIR to /etc/hadoop from /etc/hadoop/conf, I got some weird exception and the application failed to start.