spark + sbt-assembly: "deduplicate: different file contents found in the following"

Solution 1

You will have to define mergeStrategy in assembly, as I did for my Spark app below.

mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
  {
    case PathList("javax", "servlet", xs @ _*) => MergeStrategy.last
    case PathList("javax", "activation", xs @ _*) => MergeStrategy.last
    case PathList("org", "apache", xs @ _*) => MergeStrategy.last
    case PathList("com", "google", xs @ _*) => MergeStrategy.last
    case PathList("com", "esotericsoftware", xs @ _*) => MergeStrategy.last
    case PathList("com", "codahale", xs @ _*) => MergeStrategy.last
    case PathList("com", "yammer", xs @ _*) => MergeStrategy.last
    case "about.html" => MergeStrategy.rename
    case "META-INF/ECLIPSEF.RSA" => MergeStrategy.last
    case "META-INF/mailcap" => MergeStrategy.last
    case "META-INF/mimetypes.default" => MergeStrategy.last
    case "plugin.properties" => MergeStrategy.last
    case "log4j.properties" => MergeStrategy.last
    case x => old(x)
  }
}

Solution 2

As an addition to Wesley Milano's answer, the code needs to be adapted a bit for the newer version (i.e. 0.13.0) of the sbt-assembly plugin, in case someone is wondering about deprecation warnings:

assemblyMergeStrategy in assembly := {
    case PathList("javax", "servlet", xs @ _*) => MergeStrategy.last
    case PathList("javax", "activation", xs @ _*) => MergeStrategy.last
    case PathList("org", "apache", xs @ _*) => MergeStrategy.last
    case PathList("com", "google", xs @ _*) => MergeStrategy.last
    case PathList("com", "esotericsoftware", xs @ _*) => MergeStrategy.last
    case PathList("com", "codahale", xs @ _*) => MergeStrategy.last
    case PathList("com", "yammer", xs @ _*) => MergeStrategy.last
    case "about.html" => MergeStrategy.rename
    case "META-INF/ECLIPSEF.RSA" => MergeStrategy.last
    case "META-INF/mailcap" => MergeStrategy.last
    case "META-INF/mimetypes.default" => MergeStrategy.last
    case "plugin.properties" => MergeStrategy.last
    case "log4j.properties" => MergeStrategy.last
    case x =>
        val oldStrategy = (assemblyMergeStrategy in assembly).value
        oldStrategy(x)
}
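
For readers on a more recent toolchain, the same idea can be sketched with the slash syntax. This assumes sbt 1.x and sbt-assembly 1.0 or later, neither of which appears in the original answers, and only two of the cases above are repeated to keep it short:

```scala
// build.sbt (assumption: sbt 1.x with sbt-assembly 1.0 or later)
assembly / assemblyMergeStrategy := {
  // Keep the last copy when several jars ship the same javax.servlet classes.
  case PathList("javax", "servlet", xs @ _*) => MergeStrategy.last
  case "log4j.properties"                    => MergeStrategy.last
  // Everything else falls back to the plugin's previously defined strategy.
  case x =>
    val oldStrategy = (assembly / assemblyMergeStrategy).value
    oldStrategy(x)
}
```

The remaining cases from the answer should carry over unchanged; only the key scoping on the first line and in the fallback differs.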
Author by

Grant

Updated on June 14, 2022

Comments

  • Grant
    Grant almost 2 years

    I am running a Spark application and want to pack the test classes into the fat jar. Oddly, "sbt assembly" succeeds, but "sbt test:assembly" fails.

    I tried the approach from "sbt-assembly : including test classes", but it didn't work in my case.

    SBT version : 0.13.8

    build.sbt:

    import sbtassembly.AssemblyPlugin._
    
    name := "assembly-test"
    
    version := "1.0"
    
    scalaVersion := "2.10.5"
    
    libraryDependencies ++= Seq(
      ("org.apache.spark" % "spark-core_2.10" % "1.3.1" % Provided)
        .exclude("org.mortbay.jetty", "servlet-api")
        .exclude("commons-beanutils", "commons-beanutils-core")
        .exclude("commons-collections", "commons-collections")
        .exclude("commons-logging", "commons-logging")
        .exclude("com.esotericsoftware.minlog", "minlog")
        .exclude("com.codahale.metrics", "metrics-core"),
      "org.json4s" % "json4s-jackson_2.10" % "3.2.10" % Provided,
      "com.google.inject" % "guice" % "4.0"
    )
    
    Project.inConfig(Test)(assemblySettings)
    
  • Grant
    Grant almost 9 years
    I put all of this in the sbt file and added more "exclude(...)" clauses. The jar can now be generated and the test classes are in it; however, I found that "provided" doesn't work.
  • Wesley Miao
    Wesley Miao almost 9 years
    "provided" is only needed if you submit your spark app through spark-submit. If you run your spark app directly, don't use "provided".
  • Felipe
    Felipe about 8 years
    I've been using Scala for more than a year and I have no idea what this bit of code does, but the important thing is that it works. Thanks
  • Admin
    Admin almost 8 years
    @FelipeAlmeida You seem to be experienced with Spark, so I was wondering if you could help me out a bit... I am trying to create a jar file from my SBT project to run it. Do you know how I can do that?
  • Felipe
    Felipe almost 8 years
    @1290 Sure. I've actually written a piece on this: queirozf.com/entries/…
  • ecoe
    ecoe over 7 years
    spark 2.x.x requires a slight variation of this solution: queirozf.com/entries/…
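
For the "sbt test:assembly" case raised in the first comment, one possible reading is that the merge strategy also has to be scoped to the Test configuration, since a strategy defined only for the default assembly task may not be picked up there. The following is a rough sketch under that assumption, written in the 0.13-style syntax used in the thread and assuming sbt-assembly 0.13.0 or later; the jar name is hypothetical and only two cases are shown:

```scala
// build.sbt (assumption: sbt 0.13.x with sbt-assembly 0.13.0 or later)

// Make the assembly task available in the Test configuration.
Project.inConfig(Test)(baseAssemblySettings)

// Hypothetical jar name, chosen so the test jar does not overwrite the main one.
assemblyJarName in (Test, assembly) := s"${name.value}-test-${version.value}.jar"

// Merge strategy scoped to Test; extend the cases to match the conflicts reported.
assemblyMergeStrategy in (Test, assembly) := {
  case PathList("javax", "servlet", xs @ _*) => MergeStrategy.last
  case "log4j.properties"                    => MergeStrategy.last
  case x =>
    val oldStrategy = (assemblyMergeStrategy in (Test, assembly)).value
    oldStrategy(x)
}
```

Whether this resolves the failure depends on which files the "deduplicate" error lists, so the case list would need to be adjusted to the actual output.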