List of spark-submit options

Solution 1

While @suj1th's valuable inputs did solve my problem, I'm answering my own question to directly address my query.


  • You need not look up the SparkSubmitOptionParser attribute-name for a given Spark property (configuration setting); either form will do just fine. However, do note that there is a subtle difference in their usage, as shown below:

    spark-submit --executor-cores 2

    spark-submit --conf spark.executor.cores=2

    Both commands shown above have the same effect. The second form takes configurations in the format --conf <key>=<value> (a combined example follows this list).

  • Enclosing values in quotes (correct me if this is incorrect / incomplete)

    (i) Values need not be enclosed in quotes of any kind (single '' or double ""), though you still can if you want.

    (ii) If the value has a space character, enclose the entire thing in double quotes "" like "<key>=<value>" as shown here.

  • For a comprehensive list of all configurations that can be passed with spark-submit, just run spark-submit --help

  • In this link provided by @suj1th, they say that:

    configuration values explicitly set on a SparkConf take the highest precedence, then flags passed to spark-submit, then values in the defaults file.

    If you are ever unclear where configuration options are coming from, you can print out fine-grained debugging information by running spark-submit with the --verbose option.
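
Putting the bullet points above together, here is a minimal sketch that combines both styles in one invocation; the application JAR (my-app.jar), main class (com.example.MyApp), and the specific property values are illustrative placeholders rather than anything from the answer above:

    # Flag form (--executor-cores) and --conf form can be mixed freely.
    # The value containing spaces is wrapped in double quotes, per (ii) above.
    spark-submit \
      --master yarn \
      --executor-cores 2 \
      --conf spark.executor.memory=4g \
      --conf "spark.driver.extraJavaOptions=-XX:+UseG1GC -Dlog.level=WARN" \
      --verbose \
      --class com.example.MyApp \
      my-app.jar

The --verbose flag makes spark-submit print where each effective setting came from (SparkConf, command-line flag, or the defaults file).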


The following two links from the Spark docs list many of the available configurations:

Solution 2

In your case, you should actually load your configurations from a file, as mentioned in this document, instead of passing them as flags to spark-submit. This relieves you of the overhead of mapping SparkSubmitArguments to Spark configuration parameters. To quote from the above document:

Loading default Spark configurations this way can obviate the need for certain flags to spark-submit. For instance, if the spark.master property is set, you can safely omit the --master flag from spark-submit. In general, configuration values explicitly set on a SparkConf take the highest precedence, then flags passed to spark-submit, then values in the defaults file.
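
As a minimal sketch of that approach (the file name, property values, main class, and JAR below are illustrative placeholders), you would put the settings into a properties file and point spark-submit at it with --properties-file; without that flag, spark-submit reads conf/spark-defaults.conf by default:

    # Create a properties file with whitespace-separated key/value pairs
    # (same format as conf/spark-defaults.conf).
    printf '%s\n' \
      'spark.master            yarn' \
      'spark.executor.cores    2' \
      'spark.executor.memory   4g' > my-defaults.conf

    # Because spark.master is set in the file, --master can be omitted here.
    spark-submit \
      --properties-file my-defaults.conf \
      --class com.example.MyApp \
      my-app.jar

Per the precedence quoted above, anything set explicitly on a SparkConf or passed as a flag to spark-submit would still override the values in this file.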



Comments

  • y2k-shubham, almost 2 years ago

    There are a ton of tunable settings mentioned on the Spark configurations page. However, as noted here, the SparkSubmitOptionParser attribute-name for a Spark property can be different from that property's name.

    For instance, spark.executor.cores is passed as --executor-cores in spark-submit.


    Where can I find an exhaustive list of all tuning parameters of Spark (along with their SparkSubmitOptionParser attribute names) that can be passed with the spark-submit command?