View worker / executor logs in Spark UI since 1.0.0+

Solution 1

These answers document how to find the logs from the command line or the UI:

Where are logs in Spark on YARN?

For the UI, on an edge node:

Look in /etc/hadoop/conf/yarn-site.xml for the YARN ResourceManager web UI address (yarn.resourcemanager.webapp.address).
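
As a rough illustration of that lookup, here is a minimal Python sketch that reads a property out of a Hadoop-style configuration file. The helper name `read_hadoop_property` and the sample host `rm-host.example.com:8088` are made up for the example; on a real edge node you would pass in the contents of /etc/hadoop/conf/yarn-site.xml instead of the inline sample.

```python
import xml.etree.ElementTree as ET


def read_hadoop_property(xml_text, name):
    """Return the <value> of the named <property>, or None if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


# Hypothetical sample; a real file lives at /etc/hadoop/conf/yarn-site.xml.
SAMPLE = """<configuration>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>rm-host.example.com:8088</value>
  </property>
</configuration>
"""

address = read_hadoop_property(SAMPLE, "yarn.resourcemanager.webapp.address")
if address is None:
    print("Property not set in this file; the cluster default applies.")
else:
    print("ResourceManager UI: http://" + address)
```

If the property is missing from the file, the function returns None, which means the cluster is relying on a default that this sketch does not try to resolve.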

Or use command line:

yarn logs -applicationId <application ID> [OPTIONS]
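
To make the command above concrete, the following Python sketch pulls a YARN application ID (the `application_<timestamp>_<sequence>` form) out of some submission output and assembles the matching `yarn logs` invocation. The helper names and the sample log line are hypothetical, and the sketch only prints the command; actually running it needs a machine with the `yarn` CLI and log aggregation enabled.

```python
import re

# YARN application IDs look like application_<clusterTimestamp>_<sequence>.
APP_ID_PATTERN = re.compile(r"application_\d+_\d+")


def find_application_id(text):
    """Return the first YARN application ID found in text, or None."""
    match = APP_ID_PATTERN.search(text)
    return match.group(0) if match else None


def yarn_logs_command(application_id):
    """Build the argument list for fetching aggregated logs of one application."""
    return ["yarn", "logs", "-applicationId", application_id]


# Hypothetical line, standing in for whatever your submission prints.
sample_output = "INFO Client: Submitted application application_1400000000000_0042"

app_id = find_application_id(sample_output)
command = yarn_logs_command(app_id)
print(" ".join(command))

# On a node with the yarn CLI installed, the list can be handed to
# subprocess.run(command, check=True) after "import subprocess".
```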

Solution 2

Depending on your configuration of YARN NodeManager log aggregation, the Spark job logs are aggregated automatically. The runtime logs can usually be found in the following ways:

Spark Master Log

If you're running in yarn-cluster mode, go to the YARN Scheduler web UI. You can find the Spark Master log there. The "log" button on the job description page shows its content.

With yarn-client, the driver runs inside your spark-submit command. What you see there is the driver log, provided log4j.properties is configured to write to stderr or stdout.

Spark Executor Log

Search for "executorHostname" in driver logs. See comments for more detail.

Author by

samthebest

Updated on June 18, 2022

Comments

  • samthebest
    samthebest almost 2 years

    In 0.9.0, viewing worker logs was simple: they were one click away from the Spark UI home page.

    Now (1.0.0+) I cannot find them. Furthermore, the Spark UI stops working when my job crashes! This is annoying: what is the point of a debugging tool that only works when your application does not need debugging? According to http://apache-spark-user-list.1001560.n3.nabble.com/Viewing-web-UI-after-fact-td12023.html I need to find out what my master URL is, but I don't know how to. Spark doesn't print this information at startup; all it says is:

    ... -Dspark.master=\"yarn-client\" ...
    

    and obviously http://yarn-client:8080 doesn't work. Some sites say that finding logs on YARN has become heavily obfuscated: rather than the logs simply being in the UI, you have to log in to the boxes to find them. Surely this is a massive regression and there has to be a simpler way?

    How am I supposed to find out what the master URL is? How can I find my worker (now called executor) logs?

  • samthebest
    samthebest over 9 years
    Please could you expand on "Search for "executorHostname" in driver logs."? Suppose I find the hostnames for my executors (which I already know): how do I then view the logs?
  • suztomo
    suztomo over 9 years
    Check the location set by yarn.nodemanager.log-dirs, which determines where the container logs are stored on the node while the containers are running. The default is ${yarn.log.dir}/userlogs. hortonworks.com/blog/…
  • samthebest
    samthebest over 9 years
    Yes, I'm aware that I can ssh into each box, find the actual files and read them. I want to know how to read the logs in a web UI, just like I could in 0.9.0. It seems like a major regression to make me ssh into boxes to find logs.
  • suztomo
    suztomo over 9 years
    If yarn.nodemanager.log-dirs is under yarn.log.dir, then you can read the logs via the NodeManager's web UI in the same way as you read the NodeManager's own log.
  • samthebest
    samthebest over 9 years
    How do I find the "NodeManager's web UI" URL? I guess I just have to ask my DevOps team what they have configured it to, right? Or is there a self-service way to find out, given that one can ssh into the box?
  • suztomo
    suztomo over 9 years
    Yes: "ask my DevOps team".