Java 8 Stream Filter - Sort based on pdate
Solution 1
You can use the .sorted() Stream API method:
.sorted(Comparator.comparing(Document::getPDate).reversed())
And the full, refactored example:
List<Document> outList = documentList.stream()
    .filter(p -> p.getInteger(CommonConstants.VISIBILITY) == 1)
    .sorted(Comparator.comparing(Document::getPDate).reversed())
    .skip(skipValue).limit(limtValue)
    .collect(Collectors.toCollection(ArrayList::new));
A few things to remember:
- If you do not care about the List implementation, use Collectors.toList()
- collect() is a terminal operation and should be called as the last operation
- .parallel().sequential() is totally useless: if you want to parallelize, stick to .parallel(); if not, do not write anything, streams are sequential by default
- The whole Stream will be loaded into memory for the sake of sorting
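The pipeline above can be tried in isolation with a minimal sketch. The Doc class here is a hypothetical stand-in for the question's Document type (it is not the real class), and its getPDate(), getVisibility() and getId() accessors are assumptions made only so the example compiles on its own.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Date;
import java.util.List;
import java.util.stream.Collectors;

public class SortedPageSketch {

    // Hypothetical stand-in for the question's Document type.
    static final class Doc {
        final String id;
        final int visibility;
        final Date pdate;

        Doc(String id, int visibility, Date pdate) {
            this.id = id;
            this.visibility = visibility;
            this.pdate = pdate;
        }

        String getId() { return id; }
        int getVisibility() { return visibility; }
        Date getPDate() { return pdate; }
    }

    // Filter visible docs, sort by pdate descending, then apply skip/limit paging.
    static List<String> page(List<Doc> docs, long skip, long limit) {
        return docs.stream()
                .filter(d -> d.getVisibility() == 1)
                .sorted(Comparator.comparing(Doc::getPDate).reversed())
                .skip(skip)
                .limit(limit)
                .map(Doc::getId)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
                new Doc("a", 1, new Date(1_000L)),
                new Doc("b", 1, new Date(3_000L)),
                new Doc("c", 0, new Date(4_000L)),  // filtered out, not visible
                new Doc("d", 1, new Date(2_000L)));
        // Visible docs sorted newest first are b, d, a; skipping one and taking two leaves d, a.
        System.out.println(page(docs, 1, 2)); // prints [d, a]
    }
}
```

Note that skip and limit come after sorted, so paging is applied to the ordered result rather than to the encounter order of the source list.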
Solution 2
An alternative approach to pivovarit's answer, which might be useful in case your dataset is potentially too big to hold in memory at once (sorted Streams have to keep the whole underlying dataset in an intermediate container to be able to sort it properly).
We will not use the stream sort operation here. Instead, we will use a data structure that holds at most as many elements as we tell it to and pushes out extra elements based on the sort criteria (I do not claim to provide the best implementation here, just the idea of it).
To achieve this, we need a custom collector:
import java.util.*;
import java.util.function.*;
import java.util.stream.Collector;

class SortedPileCollector<E> implements Collector<E, SortedSet<E>, List<E>> {
    int maxSize;
    Comparator<E> comptr;

    public SortedPileCollector(int maxSize, Comparator<E> comparator) {
        if (maxSize < 1) {
            throw new IllegalArgumentException("Max size cannot be " + maxSize);
        }
        this.maxSize = maxSize;
        comptr = Objects.requireNonNull(comparator);
    }

    public Supplier<SortedSet<E>> supplier() {
        return () -> new TreeSet<>(comptr);
    }

    public BiConsumer<SortedSet<E>, E> accumulator() {
        return this::accumulate; // see below
    }

    public BinaryOperator<SortedSet<E>> combiner() {
        return this::combine;
    }

    public Function<SortedSet<E>, List<E>> finisher() {
        return set -> new ArrayList<>(set);
    }

    public Set<Characteristics> characteristics() {
        return EnumSet.of(Characteristics.UNORDERED);
    }

    // The interesting part
    public void accumulate(SortedSet<E> set, E el) {
        Objects.requireNonNull(el);
        Objects.requireNonNull(set);
        if (set.size() < maxSize) {
            set.add(el);
        }
        else {
            if (set.contains(el)) {
                return; // we already have this element
            }
            E tailEl = set.last();
            Comparator<E> c = set.comparator();
            if (c.compare(tailEl, el) <= 0) {
                // If we did not have capacity, received element would've gone to the end of our set.
                // However, since we are at capacity, we will skip the element
                return;
            }
            else {
                // We received element that we should preserve.
                // Remove set tail and add our new element.
                set.remove(tailEl);
                set.add(el);
            }
        }
    }

    public SortedSet<E> combine(SortedSet<E> first, SortedSet<E> second) {
        SortedSet<E> result = new TreeSet<>(first);
        second.forEach(el -> accumulate(result, el)); // inefficient, but hopefully you see the general idea.
        return result;
    }
}
The above collector acts as a mutable structure that manages a sorted set of data. Note that "duplicate" elements (those the comparator considers equal) are ignored by this implementation; you will need to change the implementation if you want to allow duplicates.
Use of this collector for your case, assuming you want the top three elements:
Comparator<Document> comparator = Comparator.comparing(Document::getPDate).reversed(); // see pivovarit's answer
List<Document> outList = documentList.stream()
    .filter(p -> p.getInteger(VISIBILITY) == 1)
    .collect(new SortedPileCollector<>(3, comparator));
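If elements that compare as equal should be kept rather than dropped, one possible variation is a bounded heap instead of a TreeSet. This is a different technique from the collector above, shown only as a rough sketch: the class name BoundedTopN and the method topN are hypothetical, and it works on any Iterable rather than as a Collector.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class BoundedTopN {

    // Returns up to maxSize elements that sort first under cmp, keeping duplicates.
    static <E> List<E> topN(Iterable<E> source, int maxSize, Comparator<E> cmp) {
        if (maxSize < 1) {
            throw new IllegalArgumentException("Max size cannot be " + maxSize);
        }
        // Reversed ordering puts the worst retained element at the head of the heap.
        PriorityQueue<E> heap = new PriorityQueue<>(maxSize, cmp.reversed());
        for (E el : source) {
            heap.offer(el);
            if (heap.size() > maxSize) {
                heap.poll(); // evict the element that sorts last under cmp
            }
        }
        // A heap does not iterate in sorted order, so sort the survivors explicitly.
        List<E> result = new ArrayList<>(heap);
        result.sort(cmp);
        return result;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(3, 1, 3, 2, 5);
        // Largest three values, with the duplicate 3 preserved.
        System.out.println(topN(values, 3, Comparator.<Integer>reverseOrder())); // prints [5, 3, 3]
    }
}
```

At most maxSize + 1 elements are held at any time, so the memory argument made for the collector applies here as well.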
Bharathiraja S
Updated on June 04, 2022
Comments
-
Bharathiraja S almost 2 years
I am trying to sort by a field after filtering.
Input Document / Sample Record:
DocumentList: [ Document{ { _id=5975ff00a213745b5e1a8ed9, u_id=, mailboxcontent_id=5975ff00a213745b5e1a8ed8, idmapping=Document{ {ptype=PDF, cid=00988, normalizedcid=00988, systeminstanceid=, sourceschemaname=, pid=0244810006} }, batchid=null, pdate=Tue Jul 11 17:52:25 IST 2017, locale=en_US } }, Document{ { _id=597608aba213742554f537a6, u_id=, mailboxcontent_id=597608aba213742554f537a3, idmapping=Document{ {platformtype=PDF, cid=00999, normalizedcid=00999, systeminstanceid=, sourceschemaname=, pid=0244810006} }, batchid=null, pdate=Fri Jul 28 01:26:22 IST 2017, locale=en_US } } ]
Here, I need to sort based on pdate.
List<Document> outList = documentList.stream() .filter(p -> p.getInteger(CommonConstants.VISIBILITY) == 1) .parallel() .sequential() .collect(Collectors.toCollection(ArrayList::new)) .sort() .skip(skipValue) .limit(limtValue);
Not sure how to sort
"order by pdate DESC"
Thank you in advance!
-
M. Prokhorov almost 7 years
This will not compile: List does not have a sort() method, it has sort(Comparator<T>), and that method does not return anything.
-
M. Prokhorov almost 7 years
Also, the combination .parallel().sequential() is an exact equivalent to .sequential() alone. That chain looks like you threw some random operations at it to see if anything sticks.
-
fps almost 7 years
This seems like the return value of some NoSql database to me... Tell the database to sort, skip and limit, it doesn't make any sense to load the whole dataset in memory to then skip and limit the final result...
-
Grzegorz Piwowarek almost 7 years
if you found an answer helpful, please accept it
-
Anthony Raymond almost 7 years
Is a simple comparator going to be in DESC order?
-
Trash Can almost 7 years
By default, it is in ascending order
-
Anthony Raymond almost 7 years
As I thought, the OP asked for DESC order
-
Anthony Raymond almost 7 years
I just deleted my last comment when I saw your edit ^^.
-
Trash Can almost 7 years
@AnthonyRaymond the OP didn't state that. My bad, see it
-
M. Prokhorov almost 7 years
@Dummy, see at the bottom, he wanted order by pdate DESC.
-
Anthony Raymond almost 7 years
@Dummy read the question again (at the very end): order by pdate DESC
-
Trash Can almost 7 years
@M.Prokhorov, fixed it.
-
Trash Can almost 7 years
@AnthonyRaymond Fixed it
-
M. Prokhorov almost 7 years
Don't sorted streams materialize their whole datasets into memory (at least I think they do) regardless of whether limit is in place? If so, it warrants a mention, I think.
-
Anthony Raymond almost 7 years
@M.Prokhorov they do indeed, but the returned list will contain only limit elements. I don't think it was used for optimization purposes here.
-
M. Prokhorov almost 7 years
@AnthonyRaymond, well, in case DocumentList is huge and streamed over IO from disk, he may get a surprise at a later point.
-
Anthony Raymond almost 7 years
@M.Prokhorov He may, let's hope he won't :)
-
Grzegorz Piwowarek almost 7 years
@M.Prokhorov good idea - I added info about that just in case - but I assume that OP realizes that sorting can be performed only on the fully loaded data set
-
Bharathiraja S almost 7 years
Thank you all!
&& (!StringUtils.isEmpty(req.getPType()) ? (((Document)d.get("idmapping")).getString("ptype").equalsIgnoreCase(req.getPType())) : true)) .sorted(Comparator.comparing(Document::getPdate).reversed()) .collect(Collectors.toList());
I am getting a compile error: The type Document does not define getPublicationdate(T) that is applicable here, in "comparing(Document::getPdate"
-
Bharathiraja S almost 7 years
After adding sorted, I am getting the error below: The type Document does not define getPdate(T) that is applicable here
-
M. Prokhorov almost 7 years
@pivovarit, I made an example implementation that might enable sort and limit even for huge datasets, based on a collector that skips elements. See my answer for details.
-
Trash Can almost 7 years
What is the method name in the Document class that returns the pdate?
-
Holger almost 7 years
Considering that contains and remove have the same complexity as add, all bearing a lookup operation, it might be better to use just if(set.add(el)) set.pollLast();, which might create a node that will be removed right afterwards, but bears only a single lookup operation and is much simpler than E tailEl = set.last(); Comparator<E> c = set.comparator(); if (c.compare(tailEl, el) <= 0) { return; } else { set.remove(tailEl); set.add(el); } …
-
M. Prokhorov almost 7 years
@Holger, what you suggest is entirely valid, assuming that we are OK replacing SortedSet with NavigableSet (in my example implementation we can, since TreeSet is-a NavigableSet anyway). I understand that my implementation might not be among the better approaches, and I even said that I only intend to give the basic idea.
-
Holger almost 7 years
If you follow the usual pattern, you provide a factory method returning a Collector<E,?,List<E>>, making the actual intermediate container type an implementation detail. Using methods like Collector.of in the factory method, you don’t even need to implement Collector.
-
M. Prokhorov almost 7 years
@Holger, Collector.of would require function factories for the accumulator and combiner that capture the maxSize parameter and pass it to extracted static methods with arity of 3 (the current implementation works around this by having captured a reference to this). While this approach is generally preferred because it allows testing things separately, I feel this would be slightly too much code for such a simple answer. Good points on design though.
-
Holger almost 7 years
Well, the accumulator could be simplified to a single lambda expression, (set,el) -> { if(set.add(el) && set.size()>maxSize) set.pollLast(); }, likewise the combiner (set1,set2) -> { if(set1.addAll(set2)) { while(set1.size()>maxSize) set1.pollLast(); } return set1; }, indeed, they are capturing maxSize.
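The simplification discussed in the last few comments can be put together roughly as follows. This is only a sketch of that suggestion: the class name TopNCollectorSketch and the factory method name topN are hypothetical, while the two lambdas are the ones quoted in the comment above, wrapped in Collector.of with a TreeSet as the hidden intermediate container.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collector;
import java.util.stream.Stream;

public class TopNCollectorSketch {

    // Hypothetical factory: keeps the maxSize elements that sort first under cmp.
    // Elements the comparator treats as equal are stored only once.
    static <E> Collector<E, ?, List<E>> topN(int maxSize, Comparator<? super E> cmp) {
        if (maxSize < 1) {
            throw new IllegalArgumentException("Max size cannot be " + maxSize);
        }
        return Collector.<E, TreeSet<E>, List<E>>of(
                () -> new TreeSet<>(cmp),
                // Add first, then trim the tail if the set grew past capacity.
                (set, el) -> { if (set.add(el) && set.size() > maxSize) set.pollLast(); },
                // Merge partial results (used for parallel streams) and trim again.
                (set1, set2) -> {
                    if (set1.addAll(set2)) {
                        while (set1.size() > maxSize) set1.pollLast();
                    }
                    return set1;
                },
                set -> new ArrayList<>(set),
                Collector.Characteristics.UNORDERED);
    }

    public static void main(String[] args) {
        Comparator<Integer> descending = Comparator.reverseOrder();
        List<Integer> top = Stream.of(5, 1, 4, 2, 3).collect(topN(3, descending));
        System.out.println(top); // prints [5, 4, 3]
    }
}
```

Because the factory returns Collector<E, ?, List<E>>, callers never see the TreeSet, which matches the point about keeping the container type an implementation detail.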