How to run a Spark Java program from the command line
Pick up the wordcount example, for instance from https://github.com/holdenk/fastdataprocessingwithsparkexamples/tree/master/src/main/scala/pandaspark/examples. Then follow these steps to build the jar file:
mkdir example-java-build/; cd example-java-build
mvn archetype:generate \
-DarchetypeGroupId=org.apache.maven.archetypes \
-DgroupId=spark.examples \
-DartifactId=JavaWordCount \
-Dfilter=org.apache.maven.archetypes:maven-archetype-quickstart
cp ../examples/src/main/java/spark/examples/JavaWordCount.java \
   JavaWordCount/src/main/java/spark/examples/JavaWordCount.java
Add the spark-core and spark-examples dependencies, making sure their versions match your version of Spark. I use Spark 1.1.0, so my pom.xml looks like this:
<dependencies>
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>3.8.1</version>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-examples_2.10</artifactId>
    <version>1.1.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.1.0</version>
  </dependency>
</dependencies>
Build your jar file using mvn.
cd example-java-build/JavaWordCount
mvn package
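This creates your jar file inside the target directory. Note that a plain mvn package with this pom.xml produces a regular jar that does not bundle its dependencies; that is enough here, because spark-submit puts the Spark classes on the classpath at run time.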
Copy the jar file to any location on the server.
Go to the bin folder of your Spark installation (in my case: /root/spark-1.1.0-bin-hadoop2.4/bin).
Submit the Spark job. Mine looks like this:
./spark-submit --class "spark.examples.JavaWordCount" --master yarn://myserver1:8032 /root/JavaWordCount-1.0-SNAPSHOT.jar hdfs://myserver1:8020/user/root/hackrfoe.txt
Here:
--class: the entry point for your application (e.g. org.apache.spark.examples.SparkPi)
--master: the master URL for the cluster (e.g. spark://23.195.26.187:7077)
The last argument is any text file of your choice, which the program reads as its input.
The output should look like this, giving the count of every word in the text file:
in: 17
sleeping.: 1
sojourns: 1
What: 4
protect: 1
largest: 1
other: 1
public: 1
worst: 1
hackers: 12
detected: 1
from: 4
and,: 1
secretly: 1
breaking: 1
football: 1
answer.: 1
attempting: 2
"hacker: 3
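Tokens such as sleeping. and and, keep their punctuation because the example splits lines on single spaces only and prints each result as word: count. The plain-Java sketch below mirrors that counting logic without Spark, to make the output format concrete. It is an illustration, not the JavaWordCount source: the class name WordCountSketch, the method countWords, and the choice to skip empty tokens are assumptions made for this sketch.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only (hypothetical class); the real job runs
// the equivalent split / count steps on Spark RDDs.
class WordCountSketch {

    // Split each line on single spaces and count how often each token occurs.
    // Punctuation is not stripped, so "and," and "and" are different words.
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                if (word.isEmpty()) {
                    continue; // assumption of this sketch: ignore empty tokens
                }
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "What hackers protect",
                "hackers and, hackers");
        // Print in the same "word: count" shape as the job output.
        for (Map.Entry<String, Integer> e : countWords(lines).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

Unlike this sketch, which prints words in first-seen order, the Spark job makes no such ordering promise, which is why the output above looks shuffled.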
Hope this helps!
Pooja3101
Updated on June 17, 2022
Comments
-
Pooja3101 almost 2 years
I am running the wordcount Java program in Spark. How do I run it from the command line?
-
WestCoastProjects over 9 years
+1 Well-documented answer. I haven't tried it yet, but even if it has small bugs it will be helpful. I will report back if any details are missing.