Merge maps by key

Solution 1

scala.collection.immutable.IntMap has an intersectionWith method that does precisely what you want (I believe):

import scala.collection.immutable.IntMap

val a = IntMap(1 -> "one", 2 -> "two", 3 -> "three", 4 -> "four")
val b = IntMap(1 -> "un", 2 -> "deux", 3 -> "trois")

val merged = a.intersectionWith(b, (_, av, bv: String) => Seq(av, bv))

This gives you IntMap(1 -> List(one, un), 2 -> List(two, deux), 3 -> List(three, trois)). Note that it correctly ignores the key that only occurs in a.

As a side note: I've often found myself wanting the unionWith, intersectionWith, etc. functions from Haskell's Data.Map in Scala. I don't think there's any principled reason that they should only be available on IntMap, instead of in the base collection.Map trait.
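Until something like that exists on the base trait, a generic version is short to write by hand. The following is only a sketch: intersectionWith here is a hypothetical helper (the name is borrowed from IntMap), not a method on collection.Map, and it works for any key type.

```scala
// Hypothetical helper, not part of the standard library: keeps only the keys
// present in both maps and combines the two values with f.
def intersectionWith[K, A, B, C](m1: Map[K, A], m2: Map[K, B])(f: (A, B) => C): Map[K, C] =
  for {
    (k, a) <- m1
    b      <- m2.get(k) // keys missing from m2 are skipped
  } yield k -> f(a, b)

val a = Map(1 -> "one", 2 -> "two", 3 -> "three", 4 -> "four")
val b = Map(1 -> "un", 2 -> "deux", 3 -> "trois")

// Key 4 occurs only in a, so it does not appear in the result.
val merged = intersectionWith(a, b)(Seq(_, _))
```

Putting the combining function in its own parameter list lets the compiler infer its argument types from the two maps.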

Solution 2

val a = Map(1 -> "one", 2 -> "two", 3 -> "three")
val b = Map(1 -> "un", 2 -> "deux", 3 -> "trois")

val c = a.toList ++ b.toList
val d = c.groupBy(_._1).map{case(k, v) => k -> v.map(_._2).toSeq}
// d: scala.collection.immutable.Map[Int,Seq[java.lang.String]] =
//   Map((2,List(two, deux)), (1,List(one, un)), (3,List(three, trois)))
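Note that this approach behaves like a union: a key that occurs in only one map is kept, with a single-element list. The sketch below shows that, plus one possible extra filtering step for when only shared keys are wanted. The filter is an assumption about the desired semantics, and the names grouped and inBoth are made up for the illustration.

```scala
val a = Map(1 -> "one", 2 -> "two", 3 -> "three", 4 -> "four") // key 4 is only in a
val b = Map(1 -> "un", 2 -> "deux", 3 -> "trois")

// Same groupBy approach as above.
val grouped = (a.toList ++ b.toList)
  .groupBy(_._1)
  .map { case (k, kvs) => k -> kvs.map(_._2) }
// grouped(4) is List("four"): the unmatched key is kept, not dropped.

// Optional extra step: keep only the keys that occur in both inputs.
val inBoth = grouped.filter { case (k, _) => a.contains(k) && b.contains(k) }
```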

Solution 3

Scalaz adds a method |+| for any type A for which a Semigroup[A] is available.

If you mapped your Maps so that each value was a single-element sequence, then you could use this quite simply:

scala> a.mapValues(Seq(_)) |+| b.mapValues(Seq(_))
res3: scala.collection.immutable.Map[Int,Seq[java.lang.String]] = Map(1 -> List(one, un), 2 -> List(two, deux), 3 -> List(three, trois))
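If you don't have Scalaz on the classpath, the effect of |+| in this example can be approximated in plain Scala. This is a rough sketch only: unionWith and wrap are hypothetical helper names (unionWith is borrowed from Haskell's Data.Map), not Scalaz or standard-library methods, and the combining function plays the role of the semigroup's append.

```scala
// Hypothetical helper: union of two maps, combining the values of shared keys with f.
def unionWith[K, V](m1: Map[K, V], m2: Map[K, V])(f: (V, V) => V): Map[K, V] =
  m2.foldLeft(m1) { case (acc, (k, v)) =>
    // If the key is already present, combine; otherwise just insert the new value.
    acc.updated(k, acc.get(k).map(f(_, v)).getOrElse(v))
  }

val a = Map(1 -> "one", 2 -> "two", 3 -> "three")
val b = Map(1 -> "un", 2 -> "deux", 3 -> "trois")

// Wrap each value in a single-element Seq so that ++ can act as the append.
def wrap(m: Map[Int, String]): Map[Int, Seq[String]] =
  m.map { case (k, v) => k -> Seq(v) }

val merged = unionWith(wrap(a), wrap(b))(_ ++ _)
```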

Solution 4

Starting with Scala 2.13, you can use groupMap, which (as its name suggests) is equivalent to a groupBy followed by a map over the grouped values:

// val map1 = Map(1 -> "one", 2 -> "two",  3 -> "three")
// val map2 = Map(1 -> "un",  2 -> "deux", 3 -> "trois")
(map1.toSeq ++ map2).groupMap(_._1)(_._2)
// Map(1 -> List("one", "un"), 2 -> List("two", "deux"), 3 -> List("three", "trois"))

This:

  • Concatenates the two maps into a single sequence of tuples (List((1, "one"), (2, "two"), (3, "three"), (1, "un"), (2, "deux"), (3, "trois"))). For conciseness, map2 is passed to ++ as-is, since ++ accepts any collection of tuples - but you could choose to make the conversion explicit by using map2.toSeq.

  • Groups elements by their first tuple part (_._1) (the group part of groupMap).

  • Maps the grouped values to their second tuple part (_._2) (the map part of groupMap).
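The steps above can be spelled out with intermediate values, which also makes it easy to check that groupMap gives the same result as groupBy followed by a map. This is a small sketch for Scala 2.13 or later; the names pairs, viaGroupMap and viaGroupBy are just for illustration.

```scala
val map1 = Map(1 -> "one", 2 -> "two", 3 -> "three")
val map2 = Map(1 -> "un", 2 -> "deux", 3 -> "trois")

// Step 1: one flat sequence of (key, value) pairs, map1's pairs first.
val pairs: Seq[(Int, String)] = map1.toSeq ++ map2

// Steps 2 and 3 in a single call.
val viaGroupMap: Map[Int, Seq[String]] = pairs.groupMap(_._1)(_._2)

// The same two steps written out with groupBy and map.
val viaGroupBy: Map[Int, Seq[String]] =
  pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
```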

Solution 5

val en = Map(1 -> "one", 2 -> "two", 3 -> "three")
val fr = Map(1 -> "un", 2 -> "deux", 3 -> "trois")

// Keeps only the keys present in both maps, pairing up their values.
def innerJoin[K, A, B](m1: Map[K, A], m2: Map[K, B]): Map[K, (A, B)] = {
  m1.flatMap{ case (k, a) =>
    m2.get(k).map(b => Map((k, (a, b)))).getOrElse(Map.empty[K, (A, B)])
  }
}

innerJoin(en, fr) // Map(1 -> ("one", "un"), 2 -> ("two", "deux"), 3 -> ("three", "trois")): Map[Int, (String, String)]
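
The same shape extends to the other kinds of join. As one possible illustration, here is a sketch of a left outer join, which keeps every key of the first map and records a missing match as None. leftOuterJoin is a hypothetical name, not a library method.

```scala
// Hypothetical helper: every key of m1 survives; the match from m2 is optional.
def leftOuterJoin[K, A, B](m1: Map[K, A], m2: Map[K, B]): Map[K, (A, Option[B])] =
  m1.map { case (k, a) => (k, (a, m2.get(k))) }

val left  = Map(1 -> "one", 2 -> "two", 3 -> "three", 4 -> "four")
val right = Map(1 -> "un", 2 -> "deux", 3 -> "trois")

// Key 4 has no partner in right, so its second component is None.
val joined = leftOuterJoin(left, right)
```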

Author by

Submonoid

Updated on July 09, 2022

Comments

  • Submonoid
    Submonoid almost 2 years

    Say I have two maps:

    val a = Map(1 -> "one", 2 -> "two", 3 -> "three")
    val b = Map(1 -> "un", 2 -> "deux", 3 -> "trois")
    

    I want to merge these maps by key, applying some function to collect the values (in this particular case I want to collect them into a Seq), giving:

    val c = Map(1 -> Seq("one", "un"), 2 -> Seq("two", "deux"), 3 -> Seq("three", "trois"))
    

    It feels like there should be a nice, idiomatic way of doing this.

    • user unknown
      user unknown over 12 years
      You should say how to handle keys that exist in only one Map, preferably in the example data for easy testing, to avoid ambiguity.
  • Submonoid
    Submonoid over 12 years
    Actually, my values in the real case are Sequences, but I want to combine them by building into another sequence, rather than by appending one to the other.
  • Cristiano Fontes
    Cristiano Fontes over 12 years
    Would you mind explaining that _._1 for a complete Scala newbie?
  • Infinity
    Infinity over 12 years
    A Map is a collection of Tuple2s. For example, given val tuple: Tuple3[Int, Int, String] = (100, 10, "one"), you can get the string "one" with tuple._3. Tuples are useful, e.g., if you want to return more than one value.
  • Ben James
    Ben James over 12 years
    I'm not sure if I understand you, sorry - do you want the values to be nested sequences or not?
  • om-nom-nom
    om-nom-nom over 12 years
    And the first part of _._1 (the underscore before the dot) is a placeholder for the anonymous function's argument. For example, List(1,2,3,4).map(_.toDouble) converts every element of the list to Double. It is like i in for(i <- List(1,2,3,4)) ...
  • Submonoid
    Submonoid over 12 years
    Yes, I would want nested sequences, which I could do by wrapping my existing sequences in a Seq, but this feels somewhat like cheating - and in other cases I might want to use a completely different combiner that wouldn't fit into the semigroup structure - giving the size of the intersection of the value sequences, for example.
  • Ben James
    Ben James over 12 years
    I see, admittedly this is just a cheat that I would consider using in this particular situation, not a general solution.
  • Joshua Hartman
    Joshua Hartman over 12 years
    You've implemented a hash join. You could write different methods for each type of join, like left outer, right outer, outer, and inner that would give you the behavior you needed in each circumstance.
  • Luigi Plinge
    Luigi Plinge over 12 years
    + 1 but you can simplify by leaving off the final .toSeq as it doesn't do anything useful
  • Travis Brown
    Travis Brown over 12 years
    Note that IntMap's intersectionWith handles the case of a key only occurring in one map as you specify here.
  • Travis Brown
    Travis Brown over 12 years
    This doesn't correctly handle cases where a key is in one map but not the other, and rebuilding the map also makes it more expensive than intersectionWith, which is linear with the total number of elements.
  • Submonoid
    Submonoid over 12 years
    unionWith, intersectionWith etc. look exactly like what I'm looking for. Just a shame they're in the wrong language!
  • OliverKK
    OliverKK over 8 years
    I just tested Scalaz's intersectionWith and found that keys occurring only in b are ignored, just like keys occurring only in a.
  • Markus Knecht
    Markus Knecht over 8 years
    A shorter, more efficient alternative that does the same would be: def merge[A,B,C](a: Map[A,B], b: Map[A,B])(c: (B,B) => C) = { for ((k, v1) <- a; v2 <- b.get(k)) yield (k, c(v1, v2)) } merge(a, b){Seq(_, _)}
  • mpr
    mpr about 7 years
    What if the keys are of type String, or any other type?
  • Travis Brown
    Travis Brown about 7 years
    @mpr Then you'll need to do something like map over the values with List(_) and sum with the monoid instance for maps in Scalaz or Cats (or of course just write your own intersectionWith from scratch).
  • stackexchanger
    stackexchanger about 5 years
    "This doesn't correctly handle cases where a key is in one map but not the other." Doesn't it? I can't see how that fails. Agree with the second point though.
  • gehbiszumeis
    gehbiszumeis over 4 years
    Please put your answer always in context instead of just pasting code. See here for more details.