Scala: build a Map from a List of tuples, but fail if there are contradictory entries

12,186

Solution 1

As always when working with Seq[A] the optimal solution performance-wise depends on the concrete collection type. A general but not very efficient solution would be to fold over an Option[Map[A,B]]:

def optMap[A,B](in: Iterable[(A,B)]): Option[Map[A,B]] = 
  in.iterator.foldLeft(Option(Map[A,B]())) {
    case (Some(m),e @ (k,v)) if m.getOrElse(k, v) == v => Some(m + e)
    case _ => None
  }

If you restrict yourself to using List[A,B]s an optimized version would be:

@tailrec
def rmap[A,B](in: List[(A,B)], out: Map[A,B] = Map[A,B]()): Option[Map[A,B]] = in match {
  case (e @ (k,v)) :: tail if out.getOrElse(k,v) == v =>
    rmap(tail, out + e)
  case Nil =>
    Some(out)
  case _ => None
}

Additionally a less idiomatic version using mutable maps could be implemented like this:

def mmap[A,B](in: Iterable[(A,B)]): Option[Map[A,B]] = {
  val dest = collection.mutable.Map[A,B]()

  for (e @ (k,v) <- in) {
    if (dest.getOrElse(k, v) != v) return None
    dest += e
  }

  Some(dest.toMap)
}

Solution 2

Here is a fail-slowly solution (if creating the entire map and then discarding it is okay):

def uniqueMap[A,B](s: Seq[(A,B)]) = {
  val m = s.toMap
  if (m.size == s.length) Some(s) else None
}

Here is a mutable fail-fast solution (bail out as soon as the error is detected):

def uniqueMap[A,B](s: Seq[(A,B)]) = {
  val h = new collection.mutable.HashMap[A,B]
  val i = s.iterator.takeWhile(x => !(h contains x._1)).foreach(h += _)
  if (h.size == s.length) Some(h) else None
}

And here's an immutable fail-fast solution:

def uniqueMap[A,B](s: Seq[(A,B)]) = {
  def mapUniquely(i: Iterator[(A,B)], m: Map[A,B]): Option[Map[A,B]] = {
    if (i.hasNext) {
      val j = i.next
      if (m contains j._1) None
      else mapUniquely(i, m + j)
    }
    else Some(m)
  }
  mapUniquely(s.iterator, Map[A,B]())
}

Edit: and here's a solution using put for speed (hopefully):

def uniqueMap[A,B](s: Seq[(A,B)]) = {
  val h = new collection.mutable.HashMap[A,B]
  val okay = s.iterator.forall(x => {
    val y = (h put (x._1,x._2))
    y.isEmpty || y.get == x._2 
  })
  if (okay) Some(h) else None
}

Edit: now tested, and it's ~2x as fast on input that works (returns true) than Moritz' or my straightforward solution.

Solution 3

Scala 2.9 is near, so why not to take advantage of the combinations method (inspired by Moritz's answer):

def optMap[A,B](in: List[(A,B)]) = {
  if (in.combinations(2).exists { 
    case List((a,b),(c,d)) => a == c && b != d
    case _ => false 
  }) None else Some(in.toMap)
}

scala> val in = List(1->1,2->3,3->4,4->5,2->3)
in: List[(Int, Int)] = List((1,1), (2,3), (3,4), (4,5), (2,3))

scala> optMap(in)
res29: Option[scala.collection.immutable.Map[Int,Int]] = Some(Map(1 -> 1, 2 -> 3, 3 -> 4, 4 -> 5))

scala> val in = List(1->1,2->3,3->4,4->5,2->3,1->2)
in: List[(Int, Int)] = List((1,1), (2,3), (3,4), (4,5), (2,3), (1,2))

scala> optMap(in)
res30: Option[scala.collection.immutable.Map[Int,Int]] = None

Solution 4

You can also use gourpBy as follows:

  val pList = List(1 -> "a", 1 -> "b", 2 -> "c", 3 -> "d")

  def optMap[A,B](in: Iterable[(A,B)]): Option[Map[A,B]] = {
    Option(in.groupBy(_._1).map{case(_, list) => if(list.size > 1) return None else list.head})
  }

  println(optMap(pList))

It's efficiency is competitive to the above solutions. In fact if you examine the gourpBy implementation you will see that it is very similar to some of the solutions suggested.

Share:
12,186
ziggystar
Author by

ziggystar

Updated on June 09, 2022

Comments

  • ziggystar
    ziggystar about 2 years

    I think this might be a common operation. So maybe it's inside the API but I can't find it. Also I'm interested in an efficient functional/simple solution if not.

    Given a sequence of tuples ("a" -> 1, "b" ->2, "c" -> 3) I want to turn it into a map. That's easy using TraversableOnce.toMap. But I want to fail this construction if the resulting map "would contain a contradiction", i.e. different values assigned to the same key. Like in the sequence ("a" -> 1, "a" -> 2). But duplicates shall be allowed.

    Currently I have this (very imperative) code:

    def buildMap[A,B](in: TraversableOnce[(A,B)]): Option[Map[A,B]] = {
      val map = new HashMap[A,B]
      val it = in.toIterator
      var fail = false
      while(it.hasNext){
        val next = it.next()
        val old = map.put(next._1, next._2)
        fail = old.isDefined && old.get != next._2
      }
    
      if(fail) None else Some(map.toMap)
    }
    

    Side Question

    Is the final toMap really necessary? I get a type error when omitting it, but I think it should work. The implementation of toMap constructs a new map which I want to avoid.

  • ziggystar
    ziggystar about 13 years
    I want to fail only on contradicting duplicates and not on exact duplicates. Nonetheless a very nice answer and I can easily adapt it to my case. I like how you use takeWhile and foreach in a counter intuitive sequence. :) I'll stick to the second suggestion for performance reasons.
  • Rex Kerr
    Rex Kerr about 13 years
    @ziggystar - My mistake. Replace h contains x._1 with h.get(x._1).map(_ == x._2).getOrElse(true) or somesuch and you'll be set.
  • ziggystar
    ziggystar about 13 years
    Uh, adaption is not so easy. The problem is that I probably want to use HashMap.put which returns the old element which I can then compare to the new one. But using your solution this is impossible. I'd have to add the element and then check the condition by performing a second query to the map.
  • Rex Kerr
    Rex Kerr about 13 years
    @ziggystar - Picky, picky! Okay, I'll update with an answer that works there.
  • ziggystar
    ziggystar about 13 years
    Using getOrElseUpdate might make your third code a bit shorter I think.
  • Alexander Aleksandrovič Klimov
    Alexander Aleksandrovič Klimov about 13 years
    Your "less idiomatic" one seems significantly clearer to me. I'd be interested to hear from readers with more functional programming experience then me if they would find working out what your first one does easier than doing the same for the last one...
  • Alexander Aleksandrovič Klimov
    Alexander Aleksandrovič Klimov about 13 years
    @ziggystar, shorter, but the side-effect-in-condition means less clear (to me, anywway)
  • ziggystar
    ziggystar about 13 years
    Whoa, that's astonishingly inefficient.
  • Yinzara
    Yinzara over 11 years
    Welcome to Scala, the language most easy to write AMAZINGLY inefficient code with the smallest footprint possible