How to build and run Scala Spark locally
Building Spark locally, the short answer:
git clone [email protected]:apache/spark.git
cd spark
sbt/sbt compile    # in recent Spark versions the sbt launcher lives at build/sbt instead
Going into your question in more detail, what you're actually asking is 'How do I debug a Spark application in Eclipse?'. To debug in Eclipse, you don't really need to build Spark in Eclipse. All you need is to create a job that declares a dependency on the Spark library and ask Maven to 'download sources'. That way you can use the Eclipse debugger to step into the Spark code.
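As a sketch, the Maven dependency might look like the following (the Scala suffix and version shown are examples; adjust both to the Spark and Scala versions you are actually using):

```xml
<!-- pom.xml: depend on spark-core; with m2e's 'download sources' enabled,
     Eclipse attaches the Spark sources so the debugger can step into them -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <!-- the suffix must match your Scala version -->
  <artifactId>spark-core_2.12</artifactId>
  <!-- example version; use the one you want to debug -->
  <version>3.5.0</version>
</dependency>
```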
Then, when creating the SparkContext, set the master to local[1], like:
val conf = new SparkConf()
  .setMaster("local[1]")
  .setAppName("SparkDebugExample")
val sc = new SparkContext(conf)
// set a breakpoint here and step into reduce
val sum = sc.parallelize(1 to 100).reduce(_ + _)
so that all Spark interactions are executed in local mode in one thread and therefore visible to your debugger.
If you are investigating a performance issue, remember that Spark is a distributed system in which the network plays an important role. Debugging the system locally will only give you part of the answer; monitoring the job on the actual cluster is required to get a complete picture of the performance characteristics of your job.
Comments
-
blue-sky over 1 year
I'm attempting to build Apache Spark locally. The reason is to debug Spark methods like reduce. In particular, I'm interested in how Spark implements and distributes MapReduce under the covers, because I'm experiencing performance issues and I think running these tasks from source is the best way of finding out what the issue is.
So I have cloned the latest from Spark repo :
git clone https://github.com/apache/spark.git
Spark appears to be a Maven project, so when I create it in Eclipse, here is the structure:
Some of the top-level folders also have pom files:
So should I just be building one of these sub-projects? Are these the correct steps for running Spark against a local code base?
-
maasg almost 10 years
To see Spark internals, you only need core. This should get you there: syndeticlogic.net/?p=311 BTW, SBT is better for getting Spark up and running. I also recommend using IntelliJ instead of Eclipse.
-
RagHaven about 9 years
Can you elaborate on what you mean by "All you need is to create a job with its Spark lib dependency and ask Maven 'download sources'"? Currently I have a simple Spark application similar to the one on the Apache Spark website. I'd like to run it from within Eclipse and step through the code, so that I can step into the actual core implementation of Spark and get an idea of how certain things work.