Where are logs in Spark on YARN?

Solution 1

A good reference for this question:

Running Spark on YARN - see the section "Debugging your Application". It gives a decent explanation with all the required examples.

The only thing you need to do to get a correctly working history server for Spark is to close your Spark context in your application (e.g. by calling sc.stop()). Otherwise, the history server does not see your application as COMPLETE and does not show anything for it (the history UI is still accessible, but the run is not visible there).

Solution 2

You can access logs through the command

yarn logs -applicationId <application ID> [OPTIONS]

The general options are:

  • -appOwner <Application Owner> - the application owner (assumed to be the current user if not specified)
  • -containerId <Container ID> - the container ID (must be specified if a node address is specified)
  • -nodeAddress <Node Address> - the node address in the format nodename:port (must be specified if a container ID is specified)

Examples:

yarn logs -applicationId application_1414530900704_0003

# if the application was launched by a different user
yarn logs -applicationId application_1414530900704_0003 -appOwner myuserid
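As a hedged sketch, the two invocations above can be wrapped in a small helper. The function name fetch_yarn_logs and the output-file naming are my own; the yarn flags are the ones documented above:

```shell
# Sketch: save aggregated YARN logs for an application to a local file.
# Usage: fetch_yarn_logs <application ID> [owner]
# The owner defaults to the current user, matching the -appOwner default.
fetch_yarn_logs() {
  local app_id="$1"
  local owner="${2:-$USER}"
  yarn logs -applicationId "$app_id" -appOwner "$owner" > "${app_id}.log"
}

# usage (hypothetical IDs):
# fetch_yarn_logs application_1414530900704_0003
# fetch_yarn_logs application_1414530900704_0003 myuserid
```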

Solution 3

None of the answers makes it crystal clear where to look for the logs (although they do in pieces), so I am putting it together.

If log aggregation is turned on (via the yarn.log-aggregation-enable property in yarn-site.xml), then do this:

yarn logs -applicationId <app ID>

However, if log aggregation is not turned on, then one needs to go to the DataNode machine and look at:

$HADOOP_HOME/logs/userlogs/application_1474886780074_XXXX/

where application_1474886780074_XXXX is the application ID.
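Putting the two cases together, a rough helper might look like this. The fallback path mirrors the directory quoted above; the function name and the reliance on HADOOP_HOME being set are assumptions:

```shell
# Sketch: try aggregated logs first; fall back to the local userlogs
# directory on this node when log aggregation is disabled.
get_app_logs() {
  local app_id="$1"
  # Case 1: log aggregation is on, so the yarn CLI can fetch everything.
  if yarn logs -applicationId "$app_id" 2>/dev/null; then
    return 0
  fi
  # Case 2: aggregation is off; read the per-container logs kept locally.
  # Assumes HADOOP_HOME is set and this shell runs on the node in question.
  cat "${HADOOP_HOME}/logs/userlogs/${app_id}"/*/stdout \
      "${HADOOP_HOME}/logs/userlogs/${app_id}"/*/stderr 2>/dev/null
}
```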

Solution 4

It logs to:

/var/log/hadoop-yarn/containers/[application id]/[container id]/stdout

The logs are on every node that your Spark job runs on.
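Because each node only keeps logs for the containers it ran, gathering them all means visiting every node. A sketch over SSH, where the hostnames and the output-file naming are assumptions and the path is the one quoted above:

```shell
# Sketch: copy container stdout logs from each worker node into local files.
# Usage: collect_container_logs <host1> [host2 ...]
collect_container_logs() {
  local host
  for host in "$@"; do
    ssh "$host" 'cat /var/log/hadoop-yarn/containers/*/*/stdout' \
      > "containers_${host}.stdout"
  done
}

# usage (hypothetical hostnames):
# collect_container_logs worker1 worker2 worker3
```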

Author: DeepNightTwo (Coder). Updated on July 09, 2022.

Comments

  • DeepNightTwo
    DeepNightTwo almost 2 years

    I'm new to spark. Now I can run spark 0.9.1 on yarn (2.0.0-cdh4.2.1). But there is no log after execution.

    The following command is used to run a spark example. But logs are not found in the history server as in a normal MapReduce job.

    SPARK_JAR=./assembly/target/scala-2.10/spark-assembly-0.9.1-hadoop2.0.0-cdh4.2.1.jar \
    ./bin/spark-class org.apache.spark.deploy.yarn.Client --jar ./spark-example-1.0.0.jar \
    --class SimpleApp --args yarn-standalone  --num-workers 3 --master-memory 1g \
    --worker-memory 1g --worker-cores 1
    

    where can I find the logs/stderr/stdout?

    Is there someplace to set the configuration? I did find an output from console saying:

    14/04/14 18:51:52 INFO Client: Command for the ApplicationMaster: $JAVA_HOME/bin/java -server -Xmx640m -Djava.io.tmpdir=$PWD/tmp org.apache.spark.deploy.yarn.ApplicationMaster --class SimpleApp --jar ./spark-example-1.0.0.jar --args 'yarn-standalone' --worker-memory 1024 --worker-cores 1 --num-workers 3 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr

    In this line, notice 1> $LOG_DIR/stdout 2> $LOG_DIR/stderr

    Where can LOG_DIR be set?

  • ChenZhou
    ChenZhou over 8 years
    That's only true if yarn.log-aggregation-enable is true in yarn-site.xml and the application is already finished.
  • iec2011007
    iec2011007 over 8 years
    nish1013 Try yarn application --list
  • Ravi Chinoy
    Ravi Chinoy almost 8 years
    how to get the logs if the log files are zipped ? I tried this command but it prints garbled zipped file output. thanks.
  • Ben Hoyt
    Ben Hoyt over 7 years
    To get the application ID, run yarn application -list -appStates ALL and get the first field of the first line that starts with "application_". In my case it's something like "application_1480604706480_0001".
  • stefan.m
    stefan.m about 7 years
    For those like me who do not know how to get the application id: use yarn applications -list
  • Harikrishnan Ck
    Harikrishnan Ck almost 7 years
    @jacek Laskowski, From the spark history server, I am unable to access the container logs if log aggregation is enabled. it keeps checking in the node manager log directory and not in the aggregated log location. This is for a completed job
  • iruvar
    iruvar over 6 years
    @stefan.m, that would be yarn application -list and not yarn applications -list
  • soMuchToLearnAndShare
    soMuchToLearnAndShare over 6 years
    does it matter if the spark job was client mode or cluster mode? I do not seem to see the whole logs (i am under client mode)
  • snark
    snark over 6 years
And yarn logs -applicationId <app ID> -log_files stdout will retrieve just the stdout, if that's all you're interested in :).
  • JMess
    JMess almost 5 years
    How do you determine which container was running the driver?
  • alex
    alex almost 5 years
    When log aggregation isn't turned on it's stored in /tmp/logs. I found mine using hdfs dfs -ls /tmp/logs/{USER}/logs where {USER} is the user which launched the spark application.
  • mawaldne
    mawaldne over 3 years
This worked for me. Also, if you are running a cluster, the logs might be on one of the nodes, so make sure you check each node in your cluster and see if the logs are in this directory structure.
  • user238607
    user238607 about 3 years
    For future readers, I found this library useful. Please give it a try : github.com/hammerlab/yarn-logs-helpers
  • Aashish Chaubey
    Aashish Chaubey about 2 years
    what do I do if I want a running log of this application?