How to stop INFO messages displaying on the Spark console?


Solution 1

Thanks @AkhlD and @Sachin Janani for suggesting changes to the .conf file.

The following code solved my issue:

1) Added import org.apache.log4j.{Level, Logger} to the import section

2) Added the following lines after the creation of the Spark context object, i.e. after val sc = new SparkContext(conf):

val rootLogger = Logger.getRootLogger()
rootLogger.setLevel(Level.ERROR)
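
Put together, a minimal sketch (the object name, app name and local master are placeholders for illustration):

import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkConf, SparkContext}

object QuietApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("QuietApp").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Raise the root logger threshold so INFO (and WARN) chatter is suppressed
    val rootLogger = Logger.getRootLogger()
    rootLogger.setLevel(Level.ERROR)
  }
}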

Solution 2

Edit your conf/log4j.properties file (if it still has the .template suffix, copy it to conf/log4j.properties first, as several commenters note) and change the following line:

log4j.rootCategory=INFO, console

to

log4j.rootCategory=ERROR, console

Another approach: start spark-shell and type the following:

import org.apache.log4j.Logger
import org.apache.log4j.Level

Logger.getLogger("org").setLevel(Level.OFF)
Logger.getLogger("akka").setLevel(Level.OFF)

You won't see any logs after that.

Other options for Level include: ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE and WARN (the API docs also list TRACE_INT, but that is an internal integer constant, not a level you set).

Details about each can be found in the Log4j documentation.
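
If OFF feels too restrictive (a commenter below suggests WARN or ERROR may fit better), you can raise the level only for the noisiest namespaces instead, for example:

import org.apache.log4j.{Level, Logger}

// Quiet Spark's internals but keep their warnings and errors visible
Logger.getLogger("org").setLevel(Level.ERROR)
Logger.getLogger("akka").setLevel(Level.WARN)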

Solution 3

Right after starting spark-shell, type:

sc.setLogLevel("ERROR")

Valid levels are ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE and WARN.

In Spark 2.0 (Scala):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.getOrCreate()
spark.sparkContext.setLogLevel("ERROR")

API Docs : https://spark.apache.org/docs/2.2.0/api/scala/index.html#org.apache.spark.sql.SparkSession

For Java:

import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder().getOrCreate();
spark.sparkContext().setLogLevel("ERROR");

Solution 4

All the methods collected with examples

Intro

Actually, there are many ways to do it. Some are harder than others, but it is up to you which one suits you best. I will try to showcase them all.


#1 Programmatically in your app

Seems to be the easiest, but you will need to recompile your app to change those settings. Personally, I don't like it, but it works fine.

Example:

import org.apache.log4j.{Level, Logger}

val rootLogger = Logger.getRootLogger()
rootLogger.setLevel(Level.ERROR)

Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
Logger.getLogger("org.spark-project").setLevel(Level.WARN)

You can achieve much more using just the log4j API.
Source: [Log4J Configuration Docs, Configuration section]
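
For instance, a sketch of sending the remaining output to a file programmatically (the path and pattern are illustrative, not from the original answer):

import org.apache.log4j.{FileAppender, Level, Logger, PatternLayout}

val rootLogger = Logger.getRootLogger()
rootLogger.setLevel(Level.ERROR)

// Append whatever is left to a log file instead of the console
rootLogger.addAppender(new FileAppender(
  new PatternLayout("%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n"),
  "/tmp/spark-app.log",
  true))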


#2 Pass log4j.properties during spark-submit

This one is very tricky, but not impossible. And my favorite.

During app startup, Log4J always looks for and loads a log4j.properties file from the classpath.

However, when using spark-submit, the Spark cluster's classpath takes precedence over the app's classpath! This is why putting this file in your fat-jar will not override the cluster's settings!

Add -Dlog4j.configuration=<location of configuration file> to spark.driver.extraJavaOptions (for the driver) or
spark.executor.extraJavaOptions (for executors).

Note that if using a file, the file: protocol should be explicitly provided, and the file needs to exist locally on all the nodes.

To satisfy the last condition, you can either upload the file to a location available to the nodes (like HDFS) or access it locally with the driver if using deploy-mode client. Otherwise:

upload a custom log4j.properties using spark-submit, by adding it to the --files list of files to be uploaded with the application.

Source: Spark docs, Debugging

Steps:

Example log4j.properties:

# Blacklist all to warn level
log4j.rootCategory=WARN, console

log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Whitelist our app to info :)
log4j.logger.com.github.atais=INFO

Executing spark-submit, for cluster mode:

spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
    --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
    --files "/absolute/path/to/your/log4j.properties" \
    --class com.github.atais.Main \
    "SparkApp.jar"

Note that you must use --driver-java-options if using client mode. Spark docs, Runtime env

Executing spark-submit, for client mode:

spark-submit \
    --master yarn \
    --deploy-mode client \
    --driver-java-options "-Dlog4j.configuration=file:/absolute/path/to/your/log4j.properties" \
    --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
    --files "/absolute/path/to/your/log4j.properties" \
    --class com.github.atais.Main \
    "SparkApp.jar"

Notes:

  1. Files uploaded to the Spark cluster with --files will be available at the root dir, so there is no need to add any path in file:log4j.properties.
  2. Files listed in --files must be provided with an absolute path!
  3. file: prefix in configuration URI is mandatory.
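
For completeness, a minimal sketch of how the whitelisted package behaves under the example log4j.properties above (the class and messages are made up for illustration):

package com.github.atais

import org.apache.log4j.Logger

object Main {
  private val log = Logger.getLogger(getClass)

  def main(args: Array[String]): Unit = {
    log.info("Printed: com.github.atais is whitelisted down to INFO")
    log.debug("Not printed: DEBUG is below this logger's INFO threshold")
  }
}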

#3 Edit cluster's conf/log4j.properties

This changes the global logging configuration file.

update the $SPARK_CONF_DIR/log4j.properties file and it will be automatically uploaded along with the other configurations.

Source: Spark docs, Debugging

To find your SPARK_CONF_DIR you can use spark-shell:

atais@cluster:~$ spark-shell 
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.1
      /_/   

scala> System.getenv("SPARK_CONF_DIR")
res0: String = /var/lib/spark/latest/conf

Now just edit /var/lib/spark/latest/conf/log4j.properties (with the example from method #2) and all your apps will share this configuration.


#4 Override configuration directory

If you like solution #3 but want to customize it per application, you can actually copy the conf folder, edit its contents, and specify it as the root configuration during spark-submit.

To specify a different configuration directory other than the default “SPARK_HOME/conf”, you can set SPARK_CONF_DIR. Spark will use the configuration files (spark-defaults.conf, spark-env.sh, log4j.properties, etc) from this directory.

Source: Spark docs, Configuration

Steps:

  1. Copy cluster's conf folder (more info, method #3)

  2. Edit log4j.properties in that folder (example in method #2)

  3. Set SPARK_CONF_DIR to this folder, before executing spark-submit,
    example:

    export SPARK_CONF_DIR=/absolute/path/to/custom/conf
    
    spark-submit \
        --master yarn \
        --deploy-mode cluster \
        --class com.github.atais.Main \
        "SparkApp.jar"
    

Conclusion

I am not sure if there is any other method, but I hope this covers the topic from A to Z. If not, feel free to ping me in the comments!

Enjoy your way!

Solution 5

Use the command below to change the log level when submitting an application with spark-submit or spark-sql:

spark-submit \
--conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:<file path>/log4j.xml" \
--conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:<file path>/log4j.xml"

Note: replace <file path> with the path where your log4j config file is stored.

The two configurations below are equivalent; use either one, referencing whichever file you pass in -Dlog4j.configuration.

log4j.properties:

log4j.rootLogger=ERROR, console

# set the log level for these components
log4j.logger.com.test=DEBUG
log4j.logger.org=ERROR
log4j.logger.org.apache.spark=ERROR
log4j.logger.org.spark-project=ERROR
log4j.logger.org.apache.hadoop=ERROR
log4j.logger.io.netty=ERROR
log4j.logger.org.apache.zookeeper=ERROR

# add a ConsoleAppender to the logger stdout to write to the console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
# use a simple message format
log4j.appender.console.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n

log4j.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">

<log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/">
   <appender name="console" class="org.apache.log4j.ConsoleAppender">
    <param name="Target" value="System.out"/>
    <layout class="org.apache.log4j.PatternLayout">
    <param name="ConversionPattern" value="%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n" />
    </layout>
  </appender>
    <logger name="org.apache.spark">
        <level value="error" />
    </logger>
    <logger name="org.spark-project">
        <level value="error" />
    </logger>
    <logger name="org.apache.hadoop">
        <level value="error" />
    </logger>
    <logger name="io.netty">
        <level value="error" />
    </logger>
    <logger name="org.apache.zookeeper">
        <level value="error" />
    </logger>
   <logger name="org">
        <level value="error" />
    </logger>
    <root>
        <priority value ="ERROR" />
        <appender-ref ref="console" />
    </root>
</log4j:configuration>

Switch to FileAppender in log4j.xml if you want to write logs to a file instead of the console. LOG_DIR is a variable for the logs directory, which you can supply via spark-submit, e.g. --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:<file path>/log4j.xml -DLOG_DIR=<log dir path>".

<appender name="file" class="org.apache.log4j.DailyRollingFileAppender">
        <param name="file" value="${LOG_DIR}"/>
        <param name="datePattern" value="'.'yyyy-MM-dd"/>
        <layout class="org.apache.log4j.PatternLayout">
            <param name="ConversionPattern" value="%d [%t] %-5p %c %x - %m%n"/>
        </layout>
    </appender>

Another important thing to understand here: when a job is launched in distributed mode (deploy-mode cluster with master yarn or mesos), the log4j configuration file must exist on the driver and the worker nodes (log4j.configuration=file:<file path>/log4j.xml), or else log4j initialization will complain:

log4j:ERROR Could not read configuration file [log4j.properties]. java.io.FileNotFoundException: log4j.properties (No such file or directory)

Hint on solving this problem:

Keep the log4j config file in a distributed file system (HDFS or Mesos) and add the external configuration using the log4j PropertyConfigurator, or use sparkContext.addFile to make it available on each node and then use the PropertyConfigurator to reload the configuration.
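
A minimal sketch of the addFile route, assuming log4j 1.x and an illustrative HDFS path (to affect executors, the configure call must also run inside their JVMs, e.g. in a mapPartitions):

import org.apache.log4j.PropertyConfigurator
import org.apache.spark.SparkFiles

// Ship the config file to every node together with the application
sc.addFile("hdfs:///path/to/log4j.properties")

// Reload log4j from the local copy Spark placed on this node
PropertyConfigurator.configure(SparkFiles.get("log4j.properties"))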


Comments

  • Vishwas
    Vishwas about 3 years

    I'd like to stop the various messages that appear on the Spark shell.

    I tried to edit the log4j.properties file in order to stop these messages.

    Here are the contents of log4j.properties

    # Define the root logger with appender file
    log4j.rootCategory=WARN, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.target=System.err
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
    
    # Settings to quiet third party logs that are too verbose
    log4j.logger.org.eclipse.jetty=WARN
    log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
    log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
    log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
    

    But messages are still getting displayed on the console.

    Here are some example messages

    15/01/05 15:11:45 INFO SparkEnv: Registering BlockManagerMaster
    15/01/05 15:11:45 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20150105151145-b1ba
    15/01/05 15:11:45 INFO MemoryStore: MemoryStore started with capacity 0.0 B.
    15/01/05 15:11:45 INFO ConnectionManager: Bound socket to port 44728 with id = ConnectionManagerId(192.168.100.85,44728)
    15/01/05 15:11:45 INFO BlockManagerMaster: Trying to register BlockManager
    15/01/05 15:11:45 INFO BlockManagerMasterActor$BlockManagerInfo: Registering block manager 192.168.100.85:44728 with 0.0 B RAM
    15/01/05 15:11:45 INFO BlockManagerMaster: Registered BlockManager
    15/01/05 15:11:45 INFO HttpServer: Starting HTTP Server
    15/01/05 15:11:45 INFO HttpBroadcast: Broadcast server star
    

    How do I stop these?

  • Vishwas
    Vishwas over 9 years
    In which file do I set the above properties?
  • Sachin Janani
    Sachin Janani over 9 years
    You can add these lines in your Driver program @Vishwas
  • Vishwas
    Vishwas over 9 years
    I have added the same, but logs still appear on the console.
  • Sachin Janani
    Sachin Janani over 9 years
    Have you changed the property to log4j.rootCategory=OFF? I have tested this at my end and it's working fine.
  • Vishwas
    Vishwas over 9 years
    I have made the changes as you mentioned. Can you please tell me which version of Spark you are using?
  • Sachin Janani
    Sachin Janani over 9 years
    I am using Spark 1.0, but I think Spark 0.9 also has the same settings, as they are related to logs.
  • snowindy
    snowindy almost 9 years
    I think that OFF is too restrictive. WARN or ERROR may fit better here.
  • AkhlD
    AkhlD over 8 years
    Add that in your projects Main class.
  • horatio1701d
    horatio1701d about 8 years
    Tried this but still getting the logging outputs.
  • Alex Raj Kaliamoorthy
    Alex Raj Kaliamoorthy almost 8 years
    How would you set this property in a program?
  • Tagar
    Tagar almost 8 years
    Great answer. Any way to do the same from PySpark programmatically?
  • Jim Ho
    Jim Ho almost 8 years
    I like this solution as having no permission to access conf/
  • Sam-T
    Sam-T over 7 years
    I did both- removed the .template suffix from log4j.properties and set the level to ERROR, and val rootLogger = Logger.getRootLogger() rootLogger.setLevel(Level.ERROR) It worked
  • SharpLu
    SharpLu almost 7 years
    Is this only available for spark.sql.SparkSession, or is it also available for JavaSparkContext?
  • deepelement
    deepelement almost 7 years
    This is one of the few examples that doesn't clobber all org logs that are traditionally from the default logger.
  • alan
    alan almost 7 years
    Yes, it's available for JavaSparkContext. Thanks, @cantdutchthis. This has bothered me for a while.
  • abhihello123
    abhihello123 about 6 years
    This is the only answer which worked for me without creating a separate log4j file. Thanks!
  • Ami
    Ami about 6 years
    This works, but it doesn't stop the 58 lines of INFO messages that appear during the creation of the Spark context.
  • Ami
    Ami about 6 years
    what is spark2-submit ?
  • Ami
    Ami about 6 years
    This works very well, but what is the Log4j.properties file for? It doesn't seem to be used. Are you simply documenting the properties set in the XML file?
  • Rahul Sharma
    Rahul Sharma about 6 years
    You can use either of them.
  • WestCoastProjects
    WestCoastProjects about 6 years
    The programmatic part of this does not work. Instead see this answer from @cantdutchthis stackoverflow.com/a/37836847/1056563
  • Admin
    Admin about 6 years
    It works for me, however I'm still getting a couple of messages at the beginning of my test. Any idea?
  • Toby Eggitt
    Toby Eggitt about 6 years
    In spark 2.3.1, this reduces my messages by half, but I still get lots of INFO
  • Toby Eggitt
    Toby Eggitt about 6 years
    This makes zero difference for me on Spark 2.3.1
  • Nephilim
    Nephilim almost 6 years
    spark2-submit is used for Spark2.
  • Gaurav Adurkar
    Gaurav Adurkar almost 6 years
    Please check this answer, stackoverflow.com/a/51554118/2094086 hope you're looking for the same.
  • Ben Watson
    Ben Watson almost 6 years
    I have had success with the above - I use --files in the spark-submit command to make log4j.properties available on all nodes.
  • dlamblin
    dlamblin over 5 years
    Nice, For PySpark it's basically the same syntax actually.
  • Yeikel
    Yeikel over 5 years
    This is the only solution that worked for me and it does not involve any code change. Create a file Log4.properties under main/resources in case that it does not exist
  • Yeikel
    Yeikel over 5 years
    FYI , this solution works very well with Spark 2.x but it does not seem to work with Spark 1.6.x standalone
  • Yeikel
    Yeikel over 5 years
    This does not seem to work with Spark 1.6.x in standalone
  • Yeikel
    Yeikel over 5 years
    Note that this is for Spark 2.x
  • swdev
    swdev over 5 years
    To set this from the command-line would have be awesome. But this didn't work for me.
  • Aviad Klein
    Aviad Klein about 5 years
    Who is this @AkhlD?
  • MrCartoonology
    MrCartoonology almost 5 years
    I'd love to find a programatic way that works without messing with the log4j file -- but when I try that, I still get warnings like WARN org.apache.spark.scheduler.TaskSetManager: Lost task 612.1 in stage 0.0 (TID 2570 ..., executor 15): TaskKilled (another attem
  • belgacea
    belgacea over 4 years
    This should be the accepted answer. It offers much details and sums up a lot more use cases than the others. (Without encouraging to disable the logs.)
  • oneday
    oneday over 4 years
    @Atais - You should add below So if you are like me and find that the answers above didn't help, then maybe you too have to remove the '.template' suffix from your log4j conf file and then the above works perfectly!
  • Arunraj Nair
    Arunraj Nair over 4 years
    Additional note on the programmatic approach- The level has to be set before the SparkContext is created
  • Adiga
    Adiga about 4 years
    This solution like multiple other solutions online, seems to work only when the deploy mode is 'client'. When deployed in 'cluster' mode, it always uses the log4j.properties files from 'SPARK_HOME/conf/', no matter what combinations of the runtime options are tried (Of course, I am talking about the logs displayed in the console or terminal from where application is submitted). However, all these settings seem to get applied for the logs we see in yarn logs, irrespective of deploy-mode.
  • Atais
    Atais about 4 years
    @ArunrajNair should not be the case, because logging is a separate feature, not connected to SparkContext.
  • Nikhil Redij
    Nikhil Redij about 4 years
    @AkhlD: I have added these 3 in my code. Still I can see INFO and WARN messages in log log4j.rootCategory=ERROR, console And ``` val rootLogger = Logger.getRootLogger() rootLogger.setLevel(Level.OFF) Logger.getLogger("org.apache.spark").setLevel(Level.OFF) Logger.getLogger("org.spark-project").setLevel(Level.OFF) Logger.getLogger("org").setLevel(Level.OFF) Logger.getLogger("akka").setLevel(Level.OFF) ``` And ssc.sparkContext.setLogLevel("ERROR")
  • Thomas Decaux
    Thomas Decaux almost 3 years
    this will change only the log level of the current java process, after Spark has been initialized
  • Vishal Kamlapure
    Vishal Kamlapure about 2 years
    This works well. Just remove the .template from the log4j file and set log4j.rootCategory=ERROR, console.