Scala: how to merge a collection of Maps

32,578

Solution 1

How about this one:

def mergeMap[A, B](ms: List[Map[A, B]])(f: (B, B) => B): Map[A, B] =
  (Map[A, B]() /: (for (m <- ms; kv <- m) yield kv)) { (a, kv) =>
    a + (if (a.contains(kv._1)) kv._1 -> f(a(kv._1), kv._2) else kv)
  }

val ms = List(Map("hello" -> 1.1, "world" -> 2.2), Map("goodbye" -> 3.3, "hello" -> 4.4))
val mm = mergeMap(ms)((v1, v2) => v1 + v2)

println(mm) // prints Map(hello -> 5.5, world -> 2.2, goodbye -> 3.3)

And it works in both 2.7.5 and 2.8.0.

Solution 2

Well, you could do:

mapList reduce (_ ++ _)

except for the special requirement for collision.

Since you do have that special requirement, perhaps the best would be doing something like this (2.8):

def combine(m1: Map, m2: Map): Map = {
  val k1 = Set(m1.keysIterator.toList: _*)
  val k2 = Set(m2.keysIterator.toList: _*)
  val intersection = k1 & k2

  val r1 = for(key <- intersection) yield (key -> (m1(key) + m2(key)))
  val r2 = m1.filterKeys(!intersection.contains(_)) ++ m2.filterKeys(!intersection.contains(_)) 
  r2 ++ r1
}

You can then add this method to the map class through the Pimp My Library pattern, and use it in the original example instead of "++":

class CombiningMap(m1: Map[Symbol, Double]) {
  def combine(m2: Map[Symbol, Double]) = {
    val k1 = Set(m1.keysIterator.toList: _*)
    val k2 = Set(m2.keysIterator.toList: _*)
    val intersection = k1 & k2
    val r1 = for(key <- intersection) yield (key -> (m1(key) + m2(key)))
    val r2 = m1.filterKeys(!intersection.contains(_)) ++ m2.filterKeys(!intersection.contains(_))
    r2 ++ r1
  }
}

// Then use this:
implicit def toCombining(m: Map[Symbol, Double]) = new CombiningMap(m)

// And finish with:
mapList reduce (_ combine _)

While this was written in 2.8, so keysIterator becomes keys for 2.7, filterKeys might need to be written in terms of filter and map, & becomes **, and so on, it shouldn't be too different.

Solution 3

I'm surprised no one's come up with this solution yet:

myListOfMaps.flatten.toMap

Does exactly what you need:

  1. Merges the list to a single Map
  2. Weeds out any duplicate keys

Example:

scala> List(Map('a -> 1), Map('b -> 2), Map('c -> 3), Map('a -> 4, 'b -> 5)).flatten.toMap
res7: scala.collection.immutable.Map[Symbol,Int] = Map('a -> 4, 'b -> 5, 'c -> 3)

flatten turns the list of maps into a flat list of tuples, toMap turns the list of tuples into a map with all the duplicate keys removed

Solution 4

Starting Scala 2.13, another solution which handles duplicate keys and is only based on the standard library consists in merging the Maps as sequences (flatten) before applying the new groupMapReduce operator which (as its name suggests) is an equivalent of a groupBy followed by a mapping and a reduce step of grouped values:

List(Map("hello" -> 1.1, "world" -> 2.2), Map("goodbye" -> 3.3, "hello" -> 4.4))
  .flatten
  .groupMapReduce(_._1)(_._2)(_ + _)
// Map("world" -> 2.2, "goodbye" -> 3.3, "hello" -> 5.5)

This:

  • flattens (concatenates) the maps as a sequence of tuples (List(("hello", 1.1), ("world", 2.2), ("goodbye", 3.3), ("hello", 4.4))), which keeps all key/values (even duplicate keys)

  • groups elements based on their first tuple part (_._1) (group part of groupMapReduce)

  • maps grouped values to their second tuple part (_._2) (map part of groupMapReduce)

  • reduces mapped grouped values (_+_) by taking their sum (but it can be any reduce: (T, T) => T function) (reduce part of groupMapReduce)


The groupMapReduce step can be seen as a one-pass version equivalent of:

list.groupBy(_._1).mapValues(_.map(_._2).reduce(_ + _))

Solution 5

Interesting, noodling around with this a bit, I got the following (on 2.7.5):

General Maps:

   def mergeMaps[A,B](collisionFunc: (B,B) => B)(listOfMaps: Seq[scala.collection.Map[A,B]]): Map[A, B] = {
    listOfMaps.foldLeft(Map[A, B]()) { (m, s) =>
      Map(
        s.projection.map { pair =>
        if (m contains pair._1)
          (pair._1, collisionFunc(m(pair._1), pair._2))
        else
          pair
      }.force.toList:_*)
    }
  }

But man, that is hideous with the projection and forcing and toList and whatnot. Separate question: what's a better way to deal with that within the fold?

For mutable Maps, which is what I was dealing with in my code, and with a less general solution, I got this:

def mergeMaps[A,B](collisionFunc: (B,B) => B)(listOfMaps: List[mutable.Map[A,B]]): mutable.Map[A, B] = {
    listOfMaps.foldLeft(mutable.Map[A,B]()) {
      (m, s) =>
      for (k <- s.keys) {
        if (m contains k)
          m(k) = collisionFunc(m(k), s(k))
        else
          m(k) = s(k)
      }
      m
    }
  }

That seems a little bit cleaner, but will only work with mutable Maps as it's written. Interestingly, I first tried the above (before I asked the question) using /: instead of foldLeft, but I was getting type errors. I thought /: and foldLeft were basically equivalent, but the compiler kept complaining that I needed explicit types for (m, s). What's up with that?

Share:
32,578
Jeff
Author by

Jeff

Updated on February 11, 2020

Comments

  • Jeff
    Jeff over 4 years

    I have a List of Map[String, Double], and I'd like to merge their contents into a single Map[String, Double]. How should I do this in an idiomatic way? I imagine that I should be able to do this with a fold. Something like:

    val newMap = Map[String, Double]() /: listOfMaps { (accumulator, m) => ... }
    

    Furthermore, I'd like to handle key collisions in a generic way. That is, if I add a key to the map that already exists, I should be able to specify a function that returns a Double (in this case) and takes the existing value for that key, plus the value I'm trying to add. If the key does not yet exist in the map, then just add it and its value unaltered.

    In my specific case I'd like to build a single Map[String, Double] such that if the map already contains a key, then the Double will be added to the existing map value.

    I'm working with mutable maps in my specific code, but I'm interested in more generic solutions, if possible.