How to run a Spark example program in IntelliJ IDEA


Solution 1

The Spark lib isn't on your classpath.

Execute sbt/sbt assembly, and then include "/assembly/target/scala-$SCALA_VERSION/spark-assembly*hadoop*-deps.jar" in your project.
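
If your application is itself an sbt project, one way to pick up that assembly jar is as an unmanaged dependency. A minimal build.sbt sketch (sbt 0.13-era syntax; the scala-2.10 directory and the jar glob are placeholders for whatever sbt/sbt assembly actually produced):

// build.sbt -- minimal sketch; adjust the Scala version directory and the
// jar glob to match what sbt/sbt assembly wrote under assembly/target.
unmanagedJars in Compile ++= {
  val assemblyDir = baseDirectory.value / "assembly" / "target" / "scala-2.10"
  (assemblyDir * "spark-assembly*hadoop*-deps.jar").classpath
}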

Solution 2

This may help: IntelliJ-Runtime-error-tt11383. Change the module dependencies' scope from provided to compile. This worked for me.

Solution 3

You need to add the Spark dependency. If you are using Maven, just add these lines to your pom.xml:

<dependencies>
    ...
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.binary.version}</artifactId>
        <version>${spark.version}</version>
        <scope>provided</scope>
    </dependency>
    ...
</dependencies>

This way you'll have the dependency for compiling and testing purposes but not in the "jar-with-dependencies" artifact.
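
As a quick sanity check that spark-core resolves at both compile time and run time, a minimal driver along these lines (the object name here is made up) should compile and run once the dependency is visible:

import org.apache.spark.{SparkConf, SparkContext}

// Minimal driver used only to verify that spark-core is on the classpath.
object ClasspathCheck {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("classpath-check").setMaster("local[*]")
    val sc = new SparkContext(conf)
    println(sc.parallelize(1 to 100).count())   // should print 100
    sc.stop()
  }
}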

But if you want to execute the whole application in a standalone cluster running from your IntelliJ, you can add a Maven profile that adds the dependency with compile scope, like this:

<properties>
    <scala.binary.version>2.11</scala.binary.version>
    <spark.version>1.2.1</spark.version>
    <spark.scope>provided</spark.scope>
</properties>

<profiles>
    <profile>
        <id>local</id>
        <properties>
            <spark.scope>compile</spark.scope>
        </properties>
        <dependencies>
            <!--<dependency>-->
                <!--<groupId>org.apache.hadoop</groupId>-->
                <!--<artifactId>hadoop-common</artifactId>-->
                <!--<version>2.6.0</version>-->
            <!--</dependency>-->
            <!--<dependency>-->
                <!--<groupId>com.hadoop.gplcompression</groupId>-->
                <!--<artifactId>hadoop-gpl-compression</artifactId>-->
                <!--<version>0.1.0</version>-->
            <!--</dependency>-->
            <dependency>
                <groupId>com.hadoop.gplcompression</groupId>
                <artifactId>hadoop-lzo</artifactId>
                <version>0.4.19</version>
            </dependency>
        </dependencies>
        <activation>
            <activeByDefault>false</activeByDefault>
            <property>
                <name>env</name>
                <value>local</value>
            </property>
        </activation>
    </profile>
</profiles>

<dependencies>
    <!-- SPARK DEPENDENCIES -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.binary.version}</artifactId>
        <version>${spark.version}</version>
        <scope>${spark.scope}</scope>
    </dependency>
</dependencies>

I also added an option to my application to run with a local master if --local is passed:

private def sparkContext(appName: String, isLocal: Boolean): SparkContext = {
  val sparkConf = new SparkConf().setAppName(appName)
  if (isLocal) {
    // Run against an in-process local master instead of the cluster's.
    sparkConf.setMaster("local")
  }
  new SparkContext(sparkConf)
}
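
And a rough sketch (names made up) of a main() that wires the --local flag into that helper:

// Assumes this lives in the same object as the sparkContext helper above.
def main(args: Array[String]): Unit = {
  val isLocal = args.contains("--local")
  val sc = sparkContext("my-app", isLocal)
  try {
    // ... submit the actual job here ...
  } finally {
    sc.stop()
  }
}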

Finally, you have to enable the "local" profile in IntelliJ in order to get the proper dependencies. Just go to the "Maven Projects" tab and enable the profile (or activate it from the command line with -Denv=local, matching the <activation> property above).

Author: WestCoastProjects

R/Python/JavaScript recently, and before that Scala/Spark. Machine learning and data pipeline apps.

Updated on July 09, 2022

Comments

  • WestCoastProjects, almost 2 years ago

    First, on the command line from the root of the downloaded Spark project, I ran

    mvn package
    

    It was successful.

    Then an IntelliJ project was created by importing the Spark pom.xml.

    In the IDE the example class appears fine: all of the libraries are found. This can be viewed in the screenshot.

    However, when attempting to run main(), a ClassNotFoundException on SparkContext occurs.

    Why can IntelliJ not simply load and run this Maven-based Scala program? And what can be done as a workaround?

    As one can see below, the SparkContext looks fine in the IDE, but is not found when attempting to run (screenshot).

    The test was run by right-clicking inside main() and selecting Run GroupByTest (screenshot).

    It gives

    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/SparkContext
        at org.apache.spark.examples.GroupByTest$.main(GroupByTest.scala:36)
        at org.apache.spark.examples.GroupByTest.main(GroupByTest.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
    Caused by: java.lang.ClassNotFoundException: org.apache.spark.SparkContext
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 7 more
    

    Here is the run configuration (screenshot).

  • WestCoastProjects, over 10 years ago
    If spark-lib were not in the classpath then why is it found in the IDE? No errors in IDE. Notice the classpath for the JVM is using the module classpath.
  • WestCoastProjects, over 10 years ago
    OK so that did it. So apparently running mvn package is not sufficient
  • WestCoastProjects, about 9 years ago
    I am running from within the Spark codebase itself, so it is not about adding a Spark dependency.
  • WestCoastProjects, over 8 years ago
    This is an old thread - and the problem may not be the same anymore. But it is still a useful tip to include in the troubleshooting toolbox.