How to get distinct items from a Scala Iterable, maintaining laziness
Solution 1
You can use the distinct
method on Stream
. For example, if you have this Iterable
:
val it = new java.lang.Iterable[Int] {
def iterator = new java.util.Iterator[Int] {
var i = 0
var first = true
def hasNext = true
def next =
if (first) { first = false; i } else { first = true; i += 1; i - 1 }
def remove() { throw new UnsupportedOperationException("Can't remove.") }
}
}
You can write:
scala> import scala.collection.JavaConverters._
import scala.collection.JavaConverters._
scala> val s = it.asScala.toStream
s: scala.collection.immutable.Stream[Int] = Stream(0, ?)
scala> s.take(10).toList
res0: List[Int] = List(0, 0, 1, 1, 2, 2, 3, 3, 4, 4)
scala> val s = it.asScala.toStream.distinct
s: scala.collection.immutable.Stream[Int] = Stream(0, ?)
scala> s.take(10).toList
res1: List[Int] = List(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
We can tell that everything is appropriately lazy since the stream is infinite.
Solution 2
UPDATE Reading questions carefully is good. No laziness in this solution. Sorry.
toSet
will do exactly what you want:
- Store iterated elements in a collection (not what you want but required)
- Drop / Replace duplicates
Example
val it = Seq(1,2,3,4,2,4): Iterable[Int]
it.toSet
// Set(1,2,3,4)
If you feel fancy, you can convert that back to an iterable:
it.toSet.toIterable
Or, pimp the Iterable
:
implicit class UniquableIterable[T](t: Iterable[T]) {
def unique = t.toSet.toIterable
}
And then call
it.unique
Solution 3
Expanding on my comment above, but I can't test it right now:
def unique[A](it: Iterator[A]): Iterator[A] = {
val seen = mutable.Set[A]()
it.filter { a =>
if (seen(a))
false
else {
seen += a
true
}
}
}
You get the idea, at least. You would then apply this to the iterator you get from your iterable, and not get the unnecessary storage behavior of Stream
.
Solution 4
Here is the code that adds .disctinct
method to Iterator
.
implicit class IteratorWrapper[T](it: Iterator[T]) {
def distinct = new Iterator[T] {
var seen = Set.empty[T]
var ahead = Option.empty[T]
def searchAhead {
while (ahead.isEmpty && it.hasNext) {
val v = it.next
if (!seen(v)) {
seen += v
ahead = Some(v)
}
}
}
def hasNext = {
searchAhead
ahead.nonEmpty
}
def next = {
searchAhead
val result = ahead.get
ahead = None
result
}
}
}
Be aware that, as it is usually so with Iterators, the original iterator is not valid after calling .distinct
on it.
Dan Gravell
Indie developer and founder of bliss and OneMusicAPI. Started programming when I was seven years old with an Amstrad CPC 6128 and Appendix 3 of the "Amstrad CPC 6128 User Instructions"... In the mid 90s I got into Java via my University CS course, and that has been my main programming environment/ecosystem ever since.
Updated on June 19, 2022Comments
-
Dan Gravell almost 2 years
I have a
java.lang.Iterable
which computes its values lazily. I am accessing it from Scala. Is there a core API way of returning only distinct values? For instance, imaging there was a filter method that also provided all results returned thus far:val myLazyDistinctIterable = iterable.filter((previousReturnedItems, newItem) => previousReturnedItems.contains(newItem))
I guess this is not a very general case because it involves storing previously returned items, and that might be why it isn't in the core API.
I know about
List.distinct
andSet
s but I want something that will not compute its elements until asked. -
Dan Gravell about 11 yearsNo worries, guess you're just not a lazy person ;)
-
Dan Gravell about 11 yearsD'oh, should've checked Stream. Thanks. I remember now that Stream stores its previously returned items. I thought that was an implementation detail originally but I guess it's part of its semantics too.
-
gzm0 about 11 years@DanGravell Yep... I don't think you'll get around writing a
UniqueIterable
with really ugly internal state. -
Andrew Scott Evans over 8 yearshave you ever tried to use Scalas own distinct? x.toList.distinct
-
gzm0 over 8 yearsThis is wrong! Consider:
Iterator(1, 1)
. After callingnext
for the first time,hasNext
will still be true, but callingnext
will actually throw. -
Rok Kralj over 8 yearsI have done exactly that and everything is fine as it should be. pastebin.com/NPAjq2aS
-
gzm0 over 8 yearsWell yes... because you are calling next on the original iterator, not the one created by the call to
distinct
. -
Rok Kralj over 8 yearsI have fixed the problem. The consequence is that even
.hasNext
can mutate the iterator, as there is no way of telling whether the distinct iterator has any elements left, you have to peek.