spark + sbt-assembly: "deduplicate: different file contents found in the following"
Solution 1
You will have to define a mergeStrategy in assembly, as I did for my Spark app below.
mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
  {
    case PathList("javax", "servlet", xs @ _*)        => MergeStrategy.last
    case PathList("javax", "activation", xs @ _*)     => MergeStrategy.last
    case PathList("org", "apache", xs @ _*)           => MergeStrategy.last
    case PathList("com", "google", xs @ _*)           => MergeStrategy.last
    case PathList("com", "esotericsoftware", xs @ _*) => MergeStrategy.last
    case PathList("com", "codahale", xs @ _*)         => MergeStrategy.last
    case PathList("com", "yammer", xs @ _*)           => MergeStrategy.last
    case "about.html"                                 => MergeStrategy.rename
    case "META-INF/ECLIPSEF.RSA"                      => MergeStrategy.last
    case "META-INF/mailcap"                           => MergeStrategy.last
    case "META-INF/mimetypes.default"                 => MergeStrategy.last
    case "plugin.properties"                          => MergeStrategy.last
    case "log4j.properties"                           => MergeStrategy.last
    case x                                            => old(x)
  }
}
Solution 2
As an addition to Wesley Miao's answer, the code needs to be adapted slightly for the newer version (i.e. 0.13.0) of the sbt-assembly plugin, in case anyone is wondering about deprecation warnings:
assemblyMergeStrategy in assembly := {
  case PathList("javax", "servlet", xs @ _*)        => MergeStrategy.last
  case PathList("javax", "activation", xs @ _*)     => MergeStrategy.last
  case PathList("org", "apache", xs @ _*)           => MergeStrategy.last
  case PathList("com", "google", xs @ _*)           => MergeStrategy.last
  case PathList("com", "esotericsoftware", xs @ _*) => MergeStrategy.last
  case PathList("com", "codahale", xs @ _*)         => MergeStrategy.last
  case PathList("com", "yammer", xs @ _*)           => MergeStrategy.last
  case "about.html"                                 => MergeStrategy.rename
  case "META-INF/ECLIPSEF.RSA"                      => MergeStrategy.last
  case "META-INF/mailcap"                           => MergeStrategy.last
  case "META-INF/mimetypes.default"                 => MergeStrategy.last
  case "plugin.properties"                          => MergeStrategy.last
  case "log4j.properties"                           => MergeStrategy.last
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
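For readers on current sbt (1.3+) with sbt-assembly 1.x, the `in`-scoping syntax above is deprecated in favour of slash syntax. A minimal sketch of the same strategy in the modern form (key names taken from the sbt-assembly 1.x plugin; only a few representative cases shown):

```scala
// Modern sbt slash syntax; same idea as the answers above.
assembly / assemblyMergeStrategy := {
  case PathList("javax", "servlet", xs @ _*) => MergeStrategy.last
  case PathList("org", "apache", xs @ _*)    => MergeStrategy.last
  case "about.html"                          => MergeStrategy.rename
  case x =>
    // Delegate everything else to the plugin's default strategy.
    val oldStrategy = (assembly / assemblyMergeStrategy).value
    oldStrategy(x)
}
```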
Author: Grant
Updated on June 14, 2022

Comments
-
Grant, almost 2 years ago: I ran a Spark application and want to pack the test classes into the fat jar. What is strange is that "sbt assembly" ran successfully, but "sbt test:assembly" failed.
I tried the approach from "sbt-assembly: including test classes", but it didn't work in my case.
SBT version : 0.13.8
build.sbt:
import sbtassembly.AssemblyPlugin._

name := "assembly-test"

version := "1.0"

scalaVersion := "2.10.5"

libraryDependencies ++= Seq(
  ("org.apache.spark" % "spark-core_2.10" % "1.3.1" % Provided)
    .exclude("org.mortbay.jetty", "servlet-api")
    .exclude("commons-beanutils", "commons-beanutils-core")
    .exclude("commons-collections", "commons-collections")
    .exclude("commons-logging", "commons-logging")
    .exclude("com.esotericsoftware.minlog", "minlog")
    .exclude("com.codahale.metrics", "metrics-core"),
  "org.json4s" % "json4s-jackson_2.10" % "3.2.10" % Provided,
  "com.google.inject" % "guice" % "4.0"
)

Project.inConfig(Test)(assemblySettings)
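Since the failure here is specific to "sbt test:assembly", the merge strategy from the answers likely has to be declared in the Test configuration as well, because `Project.inConfig(Test)(assemblySettings)` gives the Test-scoped assembly task its own settings. A sketch in the sbt 0.13.x-era syntax used in this question (the Test-scoped key form is an assumption; adapt the cases to whatever actually conflicts):

```scala
// Enable the assembly task in the Test configuration, as in the question.
Project.inConfig(Test)(assemblySettings)

// Re-declare the merge strategy scoped to (Test, assembly) so that
// `sbt test:assembly` resolves duplicates the same way `sbt assembly` does.
mergeStrategy in (Test, assembly) <<= (mergeStrategy in (Test, assembly)) { old =>
  {
    case PathList("org", "apache", xs @ _*) => MergeStrategy.last
    case "log4j.properties"                 => MergeStrategy.last
    case x                                  => old(x)
  }
}
```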
-
Grant, almost 9 years ago: I put all of this in the sbt file and added more exclude(...) clauses; the jar can be generated and the test classes are in it. However, I found that "provided" doesn't work.
-
Wesley Miao, almost 9 years ago: "provided" is only needed if you submit your Spark app through spark-submit. If you run your Spark app directly, don't use "provided".
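To illustrate the comment above: the difference is just the dependency configuration in build.sbt. A sketch of the two variants (version numbers taken from the question):

```scala
// Provided: spark-core is on the compile classpath but excluded from the fat
// jar -- appropriate when the jar is launched via spark-submit, which supplies
// the Spark classes at runtime.
libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.3.1" % Provided

// No Provided scope: spark-core is bundled into the fat jar -- needed when the
// app is run directly (e.g. java -jar), with no Spark runtime on the classpath.
libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.3.1"
```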
-
Felipe, about 8 years ago: I've been using Scala for more than a year and I have no idea what this bit of code does, but the important thing is that it works. Thanks.
-
Admin, almost 8 years ago: @FelipeAlmeida You seem to be experienced with Spark, so I was wondering if you could help me out a bit... I am trying to create a jar file from my SBT project to run it. Do you know how I can do that?
-
Felipe, almost 8 years ago: @1290 Sure. I've actually written a piece on this: queirozf.com/entries/…
-
ecoe, over 7 years ago: Spark 2.x.x requires a slight variation of this solution: queirozf.com/entries/…