How to spark-submit with main class in jar?

Solution 1

Afraid none of these were the issue. I had previously tried deleting everything in the project and starting over, but that didn't work either. Once it occurred to me to start an entirely different project, it worked just fine. Apparently IntelliJ (of which I am a fan) decided to create a hidden problem somewhere.

Solution 2

Why don't you use the path to the jar file so spark-submit (like any other command-line tool) can find and use it?

Given the path out/artifacts/TimeSeriesFilter_jar/scala-ts.jar I'd use the following:

spark-submit --class com.stronghold.HelloWorld out/artifacts/TimeSeriesFilter_jar/scala-ts.jar

Please note that you should be in the project's main directory, which seems to be /home/[USER]/projects/scala_ts.

Please also note that I removed --master local[*] since that's the default master URL spark-submit uses.
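
A quick way to double-check the packaged artifact, independent of the IDE, is to list the jar's entries and print its manifest. This is a minimal sketch, assuming the JDK's jar tool and unzip are on the PATH and that it is run from the project's main directory:

# Confirm the fully qualified class is actually in the jar.
jar tf out/artifacts/TimeSeriesFilter_jar/scala-ts.jar | grep HelloWorld

# Show how the artifact was packaged (e.g. whether a Main-Class is declared).
unzip -p out/artifacts/TimeSeriesFilter_jar/scala-ts.jar META-INF/MANIFEST.MF

If the first command lists com/stronghold/HelloWorld.class and spark-submit still throws ClassNotFoundException, the problem is more likely in how the jar was built than in the path or the class name.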

Author by

Marvin Ward Jr

I am a researcher with the JPMorgan Chase Institute. We use de-identifed data that is administratively collected by the bank to gain insight into the economic decisions of households, firms, and market actors. All of our work is available for free on our website (including some summary data!). Personally, I lead the local commerce work which relies on billions of credit and debit transactions to explore consumption activity within 14 metro areas in the US. Before joining JPMC in 2016, I had the great fortune to work with the folks in the DC Office of Revenue Analysis and the Tax Analysis Division of the Congressional Budget Office.

Updated on June 30, 2022

Comments

  • Marvin Ward Jr
    Marvin Ward Jr almost 2 years

    There are a ton of questions about ClassNotFoundException but I haven't seen any (yet) that fit this specific case. I am attempting to run the following command:

    spark-submit --master local[*] --class com.stronghold.HelloWorld scala-ts.jar

    It throws the following exception:

    spark-submit --class com.stronghold.HelloWorld scala-ts.jar
    2018-05-06 19:52:33 WARN  Utils:66 - Your hostname, asusTax resolves to a loopback address: 127.0.1.1; using 192.168.1.184 instead (on interface p1p1)                               
    2018-05-06 19:52:33 WARN  Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address                                                                                       
    2018-05-06 19:52:33 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable                                
    java.lang.ClassNotFoundException: com.stronghold.HelloWorld                               
            at java.net.URLClassLoader.findClass(URLClassLoader.java:381)                     
            at java.lang.ClassLoader.loadClass(ClassLoader.java:424)                          
            at java.lang.ClassLoader.loadClass(ClassLoader.java:357)                          
            at java.lang.Class.forName0(Native Method)                                        
            at java.lang.Class.forName(Class.java:348)                                        
            at org.apache.spark.util.Utils$.classForName(Utils.scala:235)                     
            at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:836)                                                                  
            at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)        
            at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)             
            at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)               
            at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)                    
    2018-05-06 19:52:34 INFO  ShutdownHookManager:54 - Shutdown hook called                   
    2018-05-06 19:52:34 INFO  ShutdownHookManager:54 - Deleting directory /tmp/spark-e8a77988-d30c-4e96-81fe-bcaf5d565c75
    

    However, the jar clearly contains this class:

    " zip.vim version v28
    " Browsing zipfile /home/[USER]/projects/scala_ts/out/artifacts/TimeSeriesFilter_jar/scala-ts.jar
    " Select a file with cursor and press ENTER

    META-INF/MANIFEST.MF
    com/
    com/stronghold/
    com/stronghold/HelloWorld$.class
    com/stronghold/TimeSeriesFilter$.class
    com/stronghold/DataSource.class
    com/stronghold/TimeSeriesFilter.class
    com/stronghold/HelloWorld.class
    com/stronghold/scratch.sc
    com/stronghold/HelloWorld$delayedInit$body.class
    

    Typically, the hang-up here is the file structure, but I am pretty sure that's correct here:

    ../
    scala_ts/
    | .git/
    | .idea/
    | out/
    | | artifacts/
    | | | TimeSeriesFilter_jar/
    | | | | scala-ts.jar
    | src/
    | | main/
    | | | scala/
    | | | | com/
    | | | | | stronghold/
    | | | | | | DataSource.scala
    | | | | | | HelloWorld.scala
    | | | | | | TimeSeriesFilter.scala
    | | | | | | scratch.sc
    | | test/
    | | | scala/
    | | | | com/
    | | | | | stronghold/
    | | | | | | AppTest.scala
    | | | | | | MySpec.scala                                                                                                                                                                                                                                                                                                                                                  
    | target/
    | README.md
    | pom.xml
    

    I have run other jobs with the same structure at work (so, a different environment). I am now trying to gain more facility with a home project, but this seems to be an early hang-up.

    In a nutshell, am I just missing something glaringly obvious?

    APPENDIX

    For those that are interested, here is my pom:

    <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
      <modelVersion>4.0.0</modelVersion>
      <groupId>com.stronghold</groupId>
      <artifactId>scala-ts</artifactId>
      <version>1.0-SNAPSHOT</version>
      <inceptionYear>2008</inceptionYear>
      <properties>
        <scala.version>2.11.8</scala.version>
      </properties>
    
      <repositories>
        <repository>
          <id>scala-tools.org</id>
          <name>Scala-Tools Maven2 Repository</name>
          <url>http://scala-tools.org/repo-releases</url>
        </repository>
      </repositories>
    
      <pluginRepositories>
        <pluginRepository>
          <id>scala-tools.org</id>
          <name>Scala-Tools Maven2 Repository</name>
          <url>http://scala-tools.org/repo-releases</url>
        </pluginRepository>
      </pluginRepositories>
    
      <dependencies>
        <dependency>
          <groupId>org.scala-lang</groupId>
          <artifactId>scala-library</artifactId>
          <version>2.11.8</version>
        </dependency>
        <dependency>
          <groupId>junit</groupId>
          <artifactId>junit</artifactId>
          <version>4.9</version>
          <scope>test</scope>
        </dependency>
        <dependency>
          <groupId>org.scala-tools.testing</groupId>
          <artifactId>specs_2.10</artifactId>
          <version>1.6.9</version>
          <scope>test</scope>
        </dependency>
        <dependency>
          <groupId>org.apache.spark</groupId>
          <artifactId>spark-core_2.11</artifactId>
          <version>2.2.0</version>
        </dependency>
        <dependency>
          <groupId>org.apache.spark</groupId>
          <artifactId>spark-sql_2.11</artifactId>
          <version>2.2.0</version>
        </dependency>
        <dependency>
          <groupId>org.apache.spark</groupId>
          <artifactId>spark-catalyst_2.11</artifactId>
          <version>2.2.0</version>
        </dependency>
        <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-common</artifactId>
          <version>2.7.3</version>
        </dependency>
      </dependencies>
    
      <build>
        <sourceDirectory>src/main/scala</sourceDirectory>
        <testSourceDirectory>src/test/scala</testSourceDirectory>
        <plugins>
          <plugin>
            <groupId>org.scala-tools</groupId>
            <artifactId>maven-scala-plugin</artifactId>
            <executions>
              <execution>
                <goals>
                  <goal>compile</goal>
                  <goal>testCompile</goal>
                </goals>
              </execution>
            </executions>
            <configuration>
              <scalaVersion>${scala.version}</scalaVersion>
              <args>
                <arg>-target:jvm-1.5</arg>
              </args>
            </configuration>
          </plugin>
          <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-eclipse-plugin</artifactId>
            <configuration>
              <downloadSources>true</downloadSources>
              <buildcommands>
                <buildcommand>ch.epfl.lamp.sdt.core.scalabuilder</buildcommand>
              </buildcommands>
              <additionalProjectnatures>
                <projectnature>ch.epfl.lamp.sdt.core.scalanature</projectnature>
              </additionalProjectnatures>
              <classpathContainers>
                <classpathContainer>org.eclipse.jdt.launching.JRE_CONTAINER</classpathContainer>
                <classpathContainer>ch.epfl.lamp.sdt.launching.SCALA_CONTAINER</classpathContainer>
              </classpathContainers>
            </configuration>
          </plugin>
        </plugins>
      </build>
      <reporting>
        <plugins>
          <plugin>
            <groupId>org.scala-tools</groupId>
            <artifactId>maven-scala-plugin</artifactId>
            <configuration>
              <scalaVersion>${scala.version}</scalaVersion>
            </configuration>
          </plugin>
        </plugins>
      </reporting>
    </project>
    

    UPDATE

    Apologies for the lack of clarity. I ran the command from within the same directory as the .jar (/home/[USER]/projects/scala_ts/out/artifacts/TimeSeriesFilter_jar/). That said, just to be clear, specifying the full path does not change the outcome.

    It should also be noted that I can run HelloWorld from within IntelliJ, and it uses the same class reference (com.stronghold.HelloWorld).

  • Marvin Ward Jr
    Marvin Ward Jr almost 6 years
    Would you mind elaborating on why this is useful? I didn't have to use uber-jars in other contexts.
  • Marvin Ward Jr
    Marvin Ward Jr almost 6 years
    I cleaned and packaged the jar again, but I am afraid it didn't make a difference. As for referencing an old jar, I only created one for this project. Just to be safe, I deleted the jar and started from scratch by building a new one. Unfortunately, no dice.
  • Ramesh Maharjan
    Ramesh Maharjan almost 6 years
    did you check the jar file name in target folder of the project?
  • Marvin Ward Jr
    Marvin Ward Jr almost 6 years
    Sorry, I was swamped with work this week. I am just getting back to this. The answer is yes, the jar name is correct.
  • Ramesh Maharjan
    Ramesh Maharjan almost 6 years
    what do you mean by correct? can you share the jar file name with the full path?
  • Marvin Ward Jr
    Marvin Ward Jr almost 6 years
    I mean there is no other jar. The only one I have built is here: ../scala_ts/out/artifacts/TimeSeriesFilter_jar/scala-ts.jar. Also, there are no jars in the target folder; they sit in the out folder when built.
  • Ramesh Maharjan
    Ramesh Maharjan almost 6 years
    I am talking about ../scala_ts/target/. What's the jar name there? Use that jar. That's what I meant in my answer.
  • Marvin Ward Jr
    Marvin Ward Jr almost 6 years
    In general, in a variety of Scala applications I have run (in my work environment), there is no information about the jar name in the target folder.
  • Ramesh Maharjan
    Ramesh Maharjan almost 6 years
    I am looking at your pom file, and it suggests that when you package the project the jar goes to the target folder with the name scala-ts-1.0-SNAPSHOT.jar, with all the updates. That's why I suggest you use that one and try (see the sketch after these comments).
  • Marvin Ward Jr
    Marvin Ward Jr almost 6 years
    I hear you, but again, that file does not exist. There is no jar called scala-ts-1.0-SNAPSHOT.jar. There is only a jar called scala-ts.jar. You cannot access a jar that does not exist: Error: Unable to access jarfile scala-ts-1.0-SNAPSHOT.jar. The issue isn't accessing the right jar, I think it has something to do with how the jar is built.
  • Ramesh Maharjan
    Ramesh Maharjan almost 6 years
    Yeah, I guess so too. Are you building your jar using an IntelliJ artifact or Maven?
  • Marvin Ward Jr
    Marvin Ward Jr almost 6 years
    I am indeed. I added an artifact to the project, and built the jar using that artifact configuration.
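
Following up on the Maven route discussed in the comments, here is a minimal sketch of building and submitting the Maven-produced jar. It assumes mvn is installed, is run from the project's main directory, and that the jar name follows the pom's artifactId and version (scala-ts-1.0-SNAPSHOT.jar):

# Package with Maven; by default the jar is written to target/ as <artifactId>-<version>.jar.
mvn clean package

# Submit the Maven-built jar instead of the IntelliJ artifact under out/artifacts/.
spark-submit --class com.stronghold.HelloWorld target/scala-ts-1.0-SNAPSHOT.jar

If this jar runs while the IntelliJ artifact does not, that points at the artifact configuration rather than the path or the class name, which is where the comment thread ends up.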