How to run a spark example program in Intellij IDEA
Solution 1
The Spark libraries aren't on your classpath.
Execute sbt/sbt assembly, and afterwards add assembly/target/scala-$SCALA_VERSION/spark-assembly*hadoop*-deps.jar to your project.
Solution 2
This may help: IntelliJ-Runtime-error-tt11383. Change the scope of the module dependencies from "provided" to "compile". This worked for me.
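In Maven terms, that IDE change corresponds to switching the dependency's scope in the pom.xml. A minimal sketch (the version numbers here are illustrative, not from the original answer):

```xml
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>1.2.1</version>
    <!-- "provided" keeps Spark off the runtime classpath when running
         from the IDE; "compile" makes it available at run time -->
    <scope>compile</scope>
</dependency>
```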
Solution 3
You need to add the Spark dependency. If you are using Maven, just add these lines to your pom.xml:
<dependencies>
    ...
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.binary.version}</artifactId>
        <version>${spark.version}</version>
        <scope>provided</scope>
    </dependency>
    ...
</dependencies>
This way you'll have the dependency for compiling and testing purposes but not in the "jar-with-dependencies" artifact.
But if you want to execute the whole application in a standalone cluster running from IntelliJ, you can add a Maven profile that adds the dependency with compile scope, like this:
<properties>
    <scala.binary.version>2.11</scala.binary.version>
    <spark.version>1.2.1</spark.version>
    <spark.scope>provided</spark.scope>
</properties>

<profiles>
    <profile>
        <id>local</id>
        <properties>
            <spark.scope>compile</spark.scope>
        </properties>
        <dependencies>
            <!--<dependency>-->
                <!--<groupId>org.apache.hadoop</groupId>-->
                <!--<artifactId>hadoop-common</artifactId>-->
                <!--<version>2.6.0</version>-->
            <!--</dependency>-->
            <!--<dependency>-->
                <!--<groupId>com.hadoop.gplcompression</groupId>-->
                <!--<artifactId>hadoop-gpl-compression</artifactId>-->
                <!--<version>0.1.0</version>-->
            <!--</dependency>-->
            <dependency>
                <groupId>com.hadoop.gplcompression</groupId>
                <artifactId>hadoop-lzo</artifactId>
                <version>0.4.19</version>
            </dependency>
        </dependencies>
        <activation>
            <activeByDefault>false</activeByDefault>
            <property>
                <name>env</name>
                <value>local</value>
            </property>
        </activation>
    </profile>
</profiles>

<dependencies>
    <!-- SPARK DEPENDENCIES -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.binary.version}</artifactId>
        <version>${spark.version}</version>
        <scope>${spark.scope}</scope>
    </dependency>
</dependencies>
I also added an option to my application to start a local cluster if --local is passed:
private def sparkContext(appName: String, isLocal: Boolean): SparkContext = {
  val sparkConf = new SparkConf().setAppName(appName)
  if (isLocal) {
    sparkConf.setMaster("local")
  }
  new SparkContext(sparkConf)
}
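For completeness, a minimal sketch of how such a --local flag might be wired into a main method. The AppLauncher name and the flag parsing are illustrative assumptions, not from the original answer; the actual SparkContext construction is left as a comment so the flag logic stays self-contained:

```scala
// Illustrative entry point: passing "--local" switches on a local master.
object AppLauncher {
  // Pure helper so the flag check is testable without a SparkContext.
  def isLocal(args: Array[String]): Boolean = args.contains("--local")

  def main(args: Array[String]): Unit = {
    val appName = "GroupByTest"
    // sparkContext(appName, isLocal(args)) from the answer above
    // would be called here to build the context.
    println(s"appName=$appName localMode=${isLocal(args)}")
  }
}
```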
Finally, you have to enable the "local" profile in IntelliJ in order to get the proper dependencies. Just go to the "Maven Projects" tab and enable the profile.
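Outside the IDE, the same profile can be activated from the command line. These invocations are a sketch based on the profile id and activation property defined in the pom above:

```shell
# Activate the profile by id...
mvn package -Plocal
# ...or via the property-based activation (<name>env</name>, <value>local</value>):
mvn package -Denv=local
```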
WestCoastProjects
R/Python/JavaScript recently, and before that Scala/Spark. Machine learning and data pipeline apps.
Updated on July 09, 2022

Comments
-
WestCoastProjects almost 2 years
First on the command line from the root of the downloaded spark project I ran
mvn package
It was successful.
Then an IntelliJ project was created by importing the Spark pom.xml.
In the IDE the example class appears fine: all of the libraries are found. This can be viewed in the screenshot.
However, when attempting to run main(), a ClassNotFoundException on SparkContext occurs.
Why can IntelliJ not simply load and run this Maven-based Scala program? And what can be done as a workaround?
As one can see below, SparkContext looks fine in the IDE, but is not found when attempting to run:
The test was run by right-clicking inside main():
.. and selecting Run GroupByTest
It gives
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/SparkContext
    at org.apache.spark.examples.GroupByTest$.main(GroupByTest.scala:36)
    at org.apache.spark.examples.GroupByTest.main(GroupByTest.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.SparkContext
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 7 more
Here is the run configuration:
-
WestCoastProjects over 10 years: If spark-lib were not on the classpath, then why is it found in the IDE? There are no errors in the IDE. Notice the classpath for the JVM is using the module classpath.
-
WestCoastProjects over 10 years: OK, so that did it. So apparently running mvn package is not sufficient.
-
WestCoastProjects about 9 years: I am running out of the Spark codebase, so it is not about adding a Spark dependency.
-
WestCoastProjects over 8 years: This is an old thread, and the problem may not be the same anymore. But it is still a useful tip to include in the troubleshooting toolbox.