Running Spark Application from Eclipse

Solution 1

It seems you are not using any package/library manager (e.g. sbt or Maven), which would eliminate most versioning issues. It can be challenging to pick the correct versions of Java, Scala, Spark and all of their transitive dependencies on your own. I strongly recommend converting your project to Maven: Convert Existing Eclipse Project to Maven Project

Personally, I have had very good experiences with sbt in IntelliJ IDEA (https://confluence.jetbrains.com/display/IntelliJIDEA/Getting+Started+with+SBT), which is easy to set up and maintain.
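
If you go the sbt route, a minimal build.sbt along these lines is enough to pull in Spark and all of its transitive dependencies. The Spark and Scala versions below match the ones mentioned in Solution 2 and are assumptions you should adapt to your setup:

    name := "simple-spark-app"

    version := "0.1.0"

    scalaVersion := "2.10.4"

    // A single spark-core coordinate resolves Spark's whole dependency tree,
    // so there is no need to hunt down individual jars by hand.
    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0"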

Solution 2

I created a Maven archetype for Spark just the other day.
It sets up a new Spark 1.3.0 project in Eclipse/IDEA with Scala 2.10.4.

Just follow the instructions here.

You'll just have to change the Scala version after the project is generated:
Right click on the generated project and select:
Scala > Set the Scala Installation > Fixed 2.10.5 (bundled)

The default version that comes with ScalaIDE (currently 2.11.6) is automatically added to the project by ScalaIDE when it detects scala-maven-plugin in the pom.

I'd appreciate feedback if someone knows how to set the Scala library container version from Maven while it bootstraps a new project. Where does ScalaIDE look up the Scala version, if anywhere?

BTW, just make sure you download the sources (right-click the project > Maven > Download Sources) before stepping into Spark code in the debugger.

If you want to use the (IMHO very best) Eclipse goodies (References, Type Hierarchy, Call Hierarchy), you'll have to build Spark yourself so that all the sources are on your build path (Maven Scala dependencies are not processed by Eclipse/JDT, even though they are, of course, on the build path).

Have fun debugging; I can tell you it helped me tremendously to get deeper into Spark and really understand how it works :)

Solution 3

You could try to add the spark-assembly.jar instead.

As others have noted, the better way is to use sbt (or Maven) to manage your dependencies; spark-core has many dependencies of its own, and adding just that one jar won't be enough.
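
As a sketch of the sbt approach: once the spark-core dependency shown under Solution 1 is in build.sbt, the sbteclipse plugin can generate Eclipse project files with every transitive jar on the build path. The plugin version below is an assumption; check the sbteclipse project for a current release:

    // project/plugins.sbt
    // Plugin version is an assumption -- check the sbteclipse releases.
    addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "4.0.0")

Running sbt eclipse afterwards regenerates the .classpath and .project files, so Eclipse sees the same dependencies sbt does.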


Comments

  • RagHaven, almost 2 years ago

    I am trying to develop a Spark application in Eclipse and then debug it by stepping through it.

    I downloaded the Spark source code and added some of the Spark sub-projects (such as spark-core) to Eclipse. Now I am trying to develop a Spark application using Eclipse. I have already installed ScalaIDE in Eclipse, and I created a simple application based on the example given on the Spark website.

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._
    import org.apache.spark.SparkConf
    
    object SimpleApp {
      def main(args: Array[String]) {
        val logFile = "YOUR_SPARK_HOME/README.md" // Should be some file on your system
        val conf = new SparkConf().setAppName("Simple Application")
        val sc = new SparkContext(conf)
        val logData = sc.textFile(logFile, 2).cache()
        val numAs = logData.filter(line => line.contains("a")).count()
        val numBs = logData.filter(line => line.contains("b")).count()
        println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
      }
    }
    

    To my project, I added the spark-core project as a dependent project (right click -> Build Path -> Add Project). Now I am trying to build my application and run it, but Eclipse shows that the project has errors; I don't see any errors listed in the Problems view, nor any lines highlighted in red, so I am not sure what the problem is. My assumption is that I need to add external jars to my project, but I am not sure which jars those would be. The error is caused by val conf = new SparkConf().setAppName("Simple Application") and the subsequent lines; when I removed those lines, the error went away. I would appreciate any help and guidance, thanks!
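
    For reference, here is a sketch of the same program adjusted to launch directly from Eclipse once spark-core and its dependencies are on the build path; the local[2] master and sc.stop() are illustrative additions for in-IDE debugging:

    import org.apache.spark.{SparkConf, SparkContext}

    object SimpleApp {
      def main(args: Array[String]): Unit = {
        val logFile = "YOUR_SPARK_HOME/README.md" // any local text file works
        val conf = new SparkConf()
          .setAppName("Simple Application")
          .setMaster("local[2]") // run in-process so the debugger can step through it
        val sc = new SparkContext(conf)
        val logData = sc.textFile(logFile, 2).cache()
        val numAs = logData.filter(_.contains("a")).count()
        val numBs = logData.filter(_.contains("b")).count()
        println(s"Lines with a: $numAs, Lines with b: $numBs")
        sc.stop() // release the local Spark context explicitly
      }
    }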

  • tgkprog, over 7 years ago

    Better to set this in the environment or the run configuration than in code. Do you really want a code change before every deploy?