List of spark-submit options
Solution 1
While @suj1th's valuable inputs did solve my problem, I'm answering my own question to directly address my query.
- You need not look up the `SparkSubmitOptionParser` attribute name for a given Spark property (configuration setting); both will do just fine. However, do note that there's a subtle difference in their usage, as shown below:

  ```
  spark-submit --executor-cores 2
  spark-submit --conf spark.executor.cores=2
  ```

  Both commands shown above have the same effect. The second method takes configurations in the format `--conf <key>=<value>`.
- Enclosing values in quotes (correct me if this is incorrect / incomplete):

  (i) Values need not be enclosed in quotes of any kind, single `''` or double `""` (you still can if you want).

  (ii) If the value has a space character, enclose the entire thing in double quotes `""`, like `"<key>=<value>"`, as shown here.

  For a comprehensive list of all configurations that can be passed with `spark-submit`, just run:

  ```
  spark-submit --help
  ```
- In this link provided by @suj1th, they say that:

  > Configuration values explicitly set on a `SparkConf` take the highest precedence, then flags passed to `spark-submit`, then values in the defaults file. If you are ever unclear where configuration options are coming from, you can print out fine-grained debugging information by running `spark-submit` with the `--verbose` option.

The following two links from the Spark docs list a lot of configurations:
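The quoting rule above can be sanity-checked with plain shell word splitting. The snippet below uses `printf` as a stand-in for `spark-submit`, and the option value is a made-up example:

```shell
# A hypothetical value containing a space:
VAL='-Dkey1=a -Dkey2=b'

# Unquoted, the shell splits the value at the space, so the command
# would receive the <key>=<value> pair as two separate arguments:
printf '[%s]\n' --conf spark.driver.extraJavaOptions=$VAL

# Quoted, the entire "<key>=<value>" travels as a single argument:
printf '[%s]\n' --conf "spark.driver.extraJavaOptions=$VAL"
```

The first call prints three bracketed arguments, the second only two, which is why a value with spaces must be wrapped in double quotes on the `spark-submit` command line.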
Solution 2
In your case, you should actually load your configurations from a file, as mentioned in this document, instead of passing them as flags to `spark-submit`. This relieves the overhead of mapping `SparkSubmitArguments` to Spark configuration parameters. To quote from the above document:

> Loading default Spark configurations this way can obviate the need for certain flags to `spark-submit`. For instance, if the `spark.master` property is set, you can safely omit the `--master` flag from `spark-submit`. In general, configuration values explicitly set on a `SparkConf` take the highest precedence, then flags passed to `spark-submit`, then values in the defaults file.
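As a minimal sketch of such a file (the file name and values here are made up for illustration), a defaults file holds one `<key> <value>` pair per line, separated by whitespace:

```
spark.master            yarn
spark.executor.cores    2
spark.executor.memory   4g
spark.serializer        org.apache.spark.serializer.KryoSerializer
```

By default, `spark-submit` reads `conf/spark-defaults.conf` from the Spark directory; an alternative file can be supplied with the `--properties-file` flag, e.g. `spark-submit --properties-file my-spark.conf ...`.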
y2k-shubham
Updated on July 09, 2022

Comments
- y2k-shubham almost 2 years

  There are a ton of tunable settings mentioned on the Spark configurations page. However, as told here, the `SparkSubmitOptionParser` attribute name for a Spark property can be different from that property's name. For instance, `spark.executor.cores` is passed as `--executor-cores` in `spark-submit`.

  Where can I find an exhaustive list of all tuning parameters of Spark (along with their `SparkSubmitOptionParser` property names) that can be passed with the `spark-submit` command?