Submit Spark job on Yarn cluster

10,392

Solution 1

In my case I used:

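import org.apache.spark.SparkConf

val config = new SparkConf()
config.setMaster("local[*]") // hard-coded master: takes precedence over spark-submit --master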

and submitted the job using:

spark-submit --master yarn-cluster ..

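Once I removed config.setMaster from my code, the issue was resolved. A master set directly on SparkConf takes precedence over the --master flag passed to spark-submit, so the hard-coded local[*] was overriding yarn-cluster.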
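
If you still want to launch the same class directly (for example from an IDE) without editing the code each time, one option is SparkConf.setIfMissing, which only sets a key that is not already present. The sketch below assumes that, when the job is launched through spark-submit, spark.master is already in the driver's system properties, which new SparkConf() loads by default. The object name HelloLocalFallback and the helper buildConf are hypothetical names for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; submit it with --class HelloLocalFallback.
object HelloLocalFallback {
  // Hypothetical helper. Sets the app name, and sets spark.master to local[*]
  // only if no master is present yet. A master supplied by spark-submit is
  // left untouched, unlike setMaster, which would overwrite it.
  def buildConf(base: SparkConf = new SparkConf()): SparkConf =
    base.setAppName("HelloWorld").setIfMissing("spark.master", "local[*]")

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(buildConf())
    try {
      println("Running with master " + sc.master)
    } finally {
      sc.stop()
    }
  }
}
```

Submitted with --master yarn-cluster, the master should stay yarn-cluster; run without spark-submit, it falls back to local[*].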

Solution 2

You will have to set the master of the application to "yarn-cluster" like so:

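import org.apache.spark.{SparkConf, SparkContext}

// An object with an explicit main method: spark-submit --class Hello needs a static
// entry point, and Spark recommends main() over extending scala.App.
object Hello {
  def main(args: Array[String]): Unit = {
    val config = new SparkConf().setAppName("HelloWorld")
    // Should agree with the --master passed to spark-submit; a SparkConf setting wins over the flag.
    config.setMaster("yarn-cluster")
    val context = new SparkContext(config)

    println("Application executed!")
    context.stop()
  }
}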
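
Whether the master comes from the code, as above, or from spark-submit, a cheap safeguard against a forgotten local master is to check spark.master before creating the context. This is only a sketch of that idea, not something the answers here rely on; MasterGuard and requireClusterMaster are hypothetical names, and the check simply treats any master starting with "local" (or no master at all) as a mistake.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; submit it with --class MasterGuard.
object MasterGuard {
  // Throws IllegalArgumentException (via require) when spark.master is unset
  // or names a local master such as "local" or "local[*]".
  def requireClusterMaster(conf: SparkConf): Unit = {
    val master = conf.getOption("spark.master")
    require(master.exists(m => !m.startsWith("local")),
      "spark.master is " + master.getOrElse("unset") + "; expected a cluster master such as yarn-cluster")
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("HelloWorld")
    requireClusterMaster(conf) // fail before any SparkContext is created
    val sc = new SparkContext(conf)
    try {
      println("Running on " + sc.master)
    } finally {
      sc.stop()
    }
  }
}
```

With a check like this, a leftover setMaster("local[*]") makes the driver fail immediately with a readable message instead of quietly running in local mode.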

Solution 3

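I can't see any SparkContext initialization in your code. By submitting your JAR with spark-submit, you are only setting up the classpath; your code never actually uses Spark. In yarn-cluster mode the ApplicationMaster waits for the driver to create a SparkContext (the "Waiting for spark context initialization" lines in your log), and when none appears it fails the application, even though your program itself exited with code 0.

If you want to create a context and run it in cluster mode, don't set any master in your code and it will work. For example:

val config = new SparkConf().setAppName("HelloWorld")
val context = new SparkContext(config)

Then submit it using:

spark-submit --verbose --master yarn-cluster --class Hello SCALA/hello.jar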
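
Applied to the program from the question, Solution 3 could look like the sketch below. It keeps the object name Hello so the same --class Hello command works, creates a SparkContext without calling setMaster, and moves the 1 to 15 loop onto an RDD so the job actually uses Spark. The values helper is a hypothetical name added only so the Spark part can be exercised on its own, and the one-second sleep from the question is left out.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object Hello {
  // Hypothetical helper: distributes 1 to 15 as an RDD and collects the
  // values back to the driver (collect keeps the original order here).
  def values(sc: SparkContext): Array[Int] = sc.parallelize(1 to 15).collect()

  def main(args: Array[String]): Unit = {
    // No setMaster: the master comes from spark-submit --master.
    val config = new SparkConf().setAppName("HelloWorld")
    val context = new SparkContext(config)
    try {
      println("WELCOME TO A FIRST TEST WITH SCALA COMPILED WITH SBT")
      values(context).foreach(a => println("Value of a: " + a))
    } finally {
      // Shut the context down once the work is done.
      context.stop()
    }
  }
}
```

In yarn-cluster mode the driver runs inside the ApplicationMaster container, so these println lines end up in that container's stdout, which is where the question's yarn logs output showed them.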

Author: user2573552

Updated on June 18, 2022

Comments

  • user2573552 almost 2 years

    I have been struggling with the following problem for more than 2 days now.

    I wrote a basic "HelloWorld" script in Scala:

    object Hello extends App {
      println("WELCOME TO A FIRST TEST WITH SCALA COMPILED WITH SBT counting fr. 1:15 with sleep 1")
      val data = 1 to 15

      for (a <- data) {
        println("Value of a: " + a)
        Thread.sleep(1000)
      }
    }
    

    I then compiled it with SBT to get a JAR.

    I then transferred everything to a cluster (the Hortonworks sandbox running on a Linux virtual machine) with HDP 2.2.4.2.

    I am able to run the job on the cluster in yarn-client mode with the following command:

    spark-submit --verbose --master yarn-client --class Hello SCALA/hello.jar
    

    However, when I submit the same HelloWorld job in yarn-cluster mode with the following command:

    spark-submit --verbose --master yarn-cluster --class Hello SCALA/hello.jar
    

    The job first runs properly (the output is as expected, and it exits with code 0), but then it stops with the following:

    15/06/05 15:52:09 INFO Client: Application report for application_1433491352951_0010 (state: FAILED)
    
    15/06/05 15:52:09 INFO Client:
             client token: N/A
             diagnostics: Application application_1433491352951_0010 failed 2 times due to AM Container for appattempt_1433491352951_0010_000002 exited with  exitCode: 0
    For more detailed output, check application tracking page:http://sandbox.hortonworks.com:8088/proxy/application_1433491352951_0010/Then, click on links to logs of each attempt.
    Diagnostics: Failing this attempt. Failing the application.
             ApplicationMaster host: N/A
             ApplicationMaster RPC port: -1
             queue: default
             start time: 1433519471297
             final status: FAILED
             tracking URL: http://sandbox.hortonworks.com:8088/cluster/app/application_1433491352951_0010
             user: root
    Error: application failed with exception
    org.apache.spark.SparkException: Application finished with failed status
            at org.apache.spark.deploy.yarn.ClientBase$class.run(ClientBase.scala:522)
            at org.apache.spark.deploy.yarn.Client.run(Client.scala:35)
            at org.apache.spark.deploy.yarn.Client$.main(Client.scala:139)
            at org.apache.spark.deploy.yarn.Client.main(Client.scala)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:606)
            at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:367)
            at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:77)
            at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    

    I then checked the logs with the following command:

    yarn logs -applicationId application_1433491352951_0010
    

    And I get:

    15/06/05 15:56:33 INFO impl.TimelineClientImpl: Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
    15/06/05 15:56:33 INFO client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/192.168.182.129:8050
    15/06/05 15:56:35 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
    15/06/05 15:56:35 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
    
    
    Container: container_e08_1433491352951_0010_01_000001 on sandbox.hortonworks.com_45454
    ========================================================================================
    LogType:stderr
    Log Upload Time:Fri Jun 05 15:52:10 +0000 2015
    LogLength:2050
    Log Contents:
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/hadoop/yarn/local/usercache/root/filecache/28/spark-assembly-1.2.1.2.2.4.2-2-hadoop2.6.0.2.2.4.2-2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/usr/hdp/2.2.4.2-2/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/hadoop/yarn/local/usercache/root/filecache/29/hello.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
    15/06/05 15:51:18 INFO yarn.ApplicationMaster: Registered signal handlers for [TERM, HUP, INT]
    15/06/05 15:51:20 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1433491352951_0010_000001
    15/06/05 15:51:21 INFO spark.SecurityManager: Changing view acls to: yarn,root
    15/06/05 15:51:21 INFO spark.SecurityManager: Changing modify acls to: yarn,root
    15/06/05 15:51:21 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, root); users with modify permissions: Set(yarn, root)
    15/06/05 15:51:21 INFO yarn.ApplicationMaster: Starting the user JAR in a separate Thread
    15/06/05 15:51:21 INFO yarn.ApplicationMaster: Waiting for spark context initialization
    15/06/05 15:51:21 INFO yarn.ApplicationMaster: Waiting for spark context initialization ... 0
    15/06/05 15:51:31 INFO yarn.ApplicationMaster: Waiting for spark context initialization ... 1
    15/06/05 15:51:36 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0
    15/06/05 15:51:41 ERROR yarn.ApplicationMaster: SparkContext did not initialize after waiting for 100000 ms. Please check earlier log output for errors. Failing the application.
    15/06/05 15:51:41 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED
    15/06/05 15:51:41 INFO yarn.ApplicationMaster: Deleting staging directory .sparkStaging/application_1433491352951_0010
    
    LogType:stdout
    Log Upload Time:Fri Jun 05 15:52:10 +0000 2015
    LogLength:300
    Log Contents:
    WELCOME TO A FIRST TEST WITH SCALA COMPILED WITH SBT counting fr. 1:15 with sleep 1
    Value of a: 1
    Value of a: 2
    Value of a: 3
    Value of a: 4
    Value of a: 5
    Value of a: 6
    Value of a: 7
    Value of a: 8
    Value of a: 9
    Value of a: 10
    Value of a: 11
    Value of a: 12
    Value of a: 13
    Value of a: 14
    Value of a: 15
    
    
    
    Container: container_e08_1433491352951_0010_02_000001 on sandbox.hortonworks.com_45454
    ========================================================================================
    LogType:stderr
    Log Upload Time:Fri Jun 05 15:52:10 +0000 2015
    LogLength:2050
    Log Contents:
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/hadoop/yarn/local/usercache/root/filecache/28/spark-assembly-1.2.1.2.2.4.2-2-hadoop2.6.0.2.2.4.2-2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/usr/hdp/2.2.4.2-2/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/hadoop/yarn/local/usercache/root/filecache/29/hello.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
    15/06/05 15:51:45 INFO yarn.ApplicationMaster: Registered signal handlers for [TERM, HUP, INT]
    15/06/05 15:51:47 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1433491352951_0010_000002
    15/06/05 15:51:48 INFO spark.SecurityManager: Changing view acls to: yarn,root
    15/06/05 15:51:48 INFO spark.SecurityManager: Changing modify acls to: yarn,root
    15/06/05 15:51:48 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, root); users with modify permissions: Set(yarn, root)
    15/06/05 15:51:48 INFO yarn.ApplicationMaster: Starting the user JAR in a separate Thread
    15/06/05 15:51:48 INFO yarn.ApplicationMaster: Waiting for spark context initialization
    15/06/05 15:51:48 INFO yarn.ApplicationMaster: Waiting for spark context initialization ... 0
    15/06/05 15:51:58 INFO yarn.ApplicationMaster: Waiting for spark context initialization ... 1
    15/06/05 15:52:03 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0
    15/06/05 15:52:08 ERROR yarn.ApplicationMaster: SparkContext did not initialize after waiting for 100000 ms. Please check earlier log output for errors. Failing the application.
    15/06/05 15:52:08 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED
    15/06/05 15:52:08 INFO yarn.ApplicationMaster: Deleting staging directory .sparkStaging/application_1433491352951_0010
    
    LogType:stdout
    Log Upload Time:Fri Jun 05 15:52:10 +0000 2015
    LogLength:300
    Log Contents:
    WELCOME TO A FIRST TEST WITH SCALA COMPILED WITH SBT counting fr. 1:15 with sleep 1
    Value of a: 1
    Value of a: 2
    Value of a: 3
    Value of a: 4
    Value of a: 5
    Value of a: 6
    Value of a: 7
    Value of a: 8
    Value of a: 9
    Value of a: 10
    Value of a: 11
    Value of a: 12
    Value of a: 13
    Value of a: 14
    Value of a: 15
    

    I took the HelloWorld project that someone suggested, recompiled it, and tried again. Now I have another problem: when I submit the job with the following command:

    spark-submit --verbose --master yarn-cluster SCALA/hello.jar
    

    I get the following message, repeating indefinitely:

    15/06/08 16:42:35 INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
    

    I don't understand this: it looks like the server is not responding, even though the program is supposed to run on the Hadoop cluster in the sandbox.

  • absmiths over 6 years
    I just did this: I deployed to a YARN cluster and forgot that I had the master set to local mode. Once I removed the local setting, it ran fine. :|