zipWith (mapping over multiple Seq) in Scala

22,579

Solution 1

The function you want is called zipWith, but it isn't a part of the standard library. It will be in 2.8 (UPDATE: Apparently not, see comments).

foo zipWith((f: Double, b : Double) => f+b) bar

See this Trac ticket.

Solution 2

In Scala 2.8:

val baz = (foo, bar).zipped map (_ + _)

And it works for more than two operands in the same way. I.e. you could then follow this up with:

(foo, bar, baz).zipped map (_ * _ * _)

Solution 3

Well, that, the lack of zip, is a deficiency in Scala's 2.7 Seq. Scala 2.8 has a well-thought collection design, to replace the ad-hoc way the collections present in 2.7 came to be (note that they weren't all created at once, with an unified design).

Now, when you want to avoid creating temporary collection, you should use "projection" on Scala 2.7, or "view" on Scala 2.8. This will give you a collection type for which certain instructions, particularly map, flatMap and filter, are non-strict. On Scala 2.7, the projection of a List is a Stream. On Scala 2.8, there is a SequenceView of a Sequence, but there is a zipWith right there in the Sequence, you wouldn't even need it.

Having said that, as mentioned, JVM is optimized to handle temporary object allocations, and, when running in server mode, the run-time optimization can do wonders. So, do not optimize prematurely. Test the code in the conditions it will be run -- and if you haven't planned to run it in server mode, then rethink that if the code is expected to be long-running, and optmize when/where/if necessary.

EDIT

What is actually going to be available on Scala 2.8 is this:

(foo,bar).zipped.map(_+_)

Solution 4

A lazy list isn't a copy of a list - it's more like a single object. In the case of a lazy zip implementation, each time it is asked for the next item, it grabs an item from each of the two input lists and creates a tuple from them, and you then break the tuple apart with the pattern-matching in your lambda.

So there's never a need to create a complete copy of the whole input list(s) before starting to operate on them. It boils down to a very similar allocation pattern to any application running on the JVM - lots of very short-lived but small allocations, which the JVM is optimised to deal with.

Update: to be clear, you need to be using Streams (lazy lists) not Lists. Scala's streams have a zip that works the lazy way, and so you shouldn't be converting things into lists.

Ideally your algorithm should be capable of working on two infinite streams without blowing up (assuming it doesn't do any folding, of course, but just reads and generates streams).

Solution 5

When faced a similar task, I added the following pimp to Iterables:

implicit class IterableOfIterablePimps[T](collOfColls: Iterable[Iterable[T]]) {
  def mapZipped[V](f: Iterable[T] => V): Iterable[V] = new Iterable[V] {
    override def iterator: Iterator[V] = new Iterator[V] {
      override def next(): V = {
        val v = f(itemsLeft.map(_.head))
        itemsLeft = itemsLeft.map(_.tail)
        v
      }

      override def hasNext: Boolean = itemsLeft.exists(_.nonEmpty)

      private var itemsLeft = collOfColls
    }
  }
}

Having this, one can do something like:

val collOfColls = List(List(1, 2, 3), List(4, 5, 6), List(7, 8, 9))
collOfColls.mapZipped { group =>
  group // List(1, 4, 7), then List(2, 5, 8), then List(3, 6, 9)
}

Notice that you should carefully consider collection type passed as nested Iterable, since tail and head will be recurrently called on it. So, ideally you should pass Iterable[List] or other collection with fast tail and head.

Also, this code expects nested collections of the same size. That was my use case, but I suspect this can be improved, if needed.

Share:
22,579
bsdfish
Author by

bsdfish

Updated on January 26, 2021

Comments

  • bsdfish
    bsdfish over 3 years

    Suppose I have

    val foo : Seq[Double] = ...
    val bar : Seq[Double] = ...
    

    and I wish to produce a seq where the baz(i) = foo(i) + bar(i). One way I can think of to do this is

    val baz : Seq[Double] = (foo.toList zip bar.toList) map ((f: Double, b : Double) => f+b)
    

    However, this feels both ugly and inefficient -- I have to convert both seqs to lists (which explodes with lazy lists), create this temporary list of tuples, only to map over it and let it be GCed. Maybe streams solve the lazy problem, but in any case, this feels like unnecessarily ugly. In lisp, the map function would map over multiple sequences. I would write

    (mapcar (lambda (f b) (+ f b)) foo bar)
    

    And no temporary lists would get created anywhere. Is there a map-over-multiple-lists function in Scala, or is zip combined with destructuring really the 'right' way to do this?

  • bsdfish
    bsdfish almost 15 years
    I know what a lazy list is, but am not very familiar with Scala. foo.toList isn't lazy, right? In any case, coming from a CL background, it feels very weird that there isn't a mapMultiple function, so my reason for asking this question is just to figure out what the Scala correct way of doing this is. Performance is actually quite important; this is in my inner loop, and while I can try to optimize later, I'd like to code it up in a reasonable way first.
  • Daniel Earwicker
    Daniel Earwicker almost 15 years
    I'm telling you that you were correct when you said "maybe streams solve the problem" - use the stream version of zip. If you think that small allocations are putting pressure on the GC, write an imperative equivalent in the JVM-based language of your choice, and time them to see if it's true (I've frequently been amazed at the brilliants of VMs dealing with lots of small shortlived allocations).
  • Daniel C. Sobral
    Daniel C. Sobral almost 15 years
    I won't vote down this question for now, but it's completely wrong. That will result in NxN elements, and what the question asked for results in only N elements. You are adding each combination of elements from foo and bar, but what is asked for is foo(i)+bar(i).
  • Daniel Spiewak
    Daniel Spiewak almost 15 years
    Good point. It was a little early in the morning, so apparently my brain wasn't working properly. I'm going to delete this answer, since it really doesn't provide what the originator was asking.
  • Daniel Spiewak
    Daniel Spiewak almost 15 years
    Actually, I'll just update the answer. It's good information, just not applicable to this question.
  • Daniel C. Sobral
    Daniel C. Sobral over 14 years
    Sorry, no zipWith on Scala 2.8.
  • Debilski
    Debilski almost 14 years
    It doesn’t seem to work with more than three operands, however. Is that correct?
  • retronym
    retronym almost 14 years
    Correct, zipped is defined only on Tuple2 and Tuple3. Abstracting over arity is one of the final frontiers in Scala (and most other statically typed langauges). HLists offer one possibility...
  • AmigoNico
    AmigoNico over 11 years
    Just to be clear (and I'm sure Daniel would agree), Scala has nothing to apologize for here -- what you get with Scala is even better. See Martin's answer below, and Daniel's. It would be nice if somebody could make Martin's the approved answer to this question...
  • Mysterious Dan
    Mysterious Dan about 10 years
    @retronym there's also the <*>/<$> approach we take with ZipLists in Haskell, where you don't actually need to abstract over arity due to the homogeneity of curried functions. So if I wanted to zipWith a 5-parameter f, I could do more or less f <$> xs <*> ys <*> zs <*> ps <*> qs. Unfortunately curried functions are a lot more painful to deal with in Scala :( perhaps some ideas could be carried over, since this approach seems substantially more elegant than the HList one.