Java 8 Stream Filter - Sort based on pdate

java sorting java-8 java-stream

12,267

Solution 1

You can use the Stream API's .sorted() method:

.sorted(Comparator.comparing(Document::getPDate).reversed())

And the full, refactored example:

List<Document> outList = documentList.stream()
  .filter(p -> p.getInteger(CommonConstants.VISIBILITY) == 1)
  .sorted(Comparator.comparing(Document::getPDate).reversed())
  .skip(skipValue).limit(limtValue)
  .collect(Collectors.toCollection(ArrayList::new));

A few things to keep in mind:

If you do not care about the List implementation, use Collectors.toList().
collect() is a terminal operation and must be the last call in the chain.
.parallel().sequential() is pointless: the last call wins, so the stream ends up sequential. If you want parallelism, use .parallel() alone; otherwise write nothing, since streams are sequential by default.
sorted() has to load the whole stream into memory before it can emit anything.
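
A self-contained sketch of the pipeline above, assuming a hypothetical `Doc` class that stands in for the real `Document` type (the `visibility` and `pDate` fields, their getters, and the `page` method are assumptions made for illustration):

```java
import java.util.Comparator;
import java.util.Date;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for the Document type used in the question.
class Doc {
  private final int visibility;
  private final Date pDate;

  Doc(int visibility, Date pDate) {
    this.visibility = visibility;
    this.pDate = pDate;
  }

  int getVisibility() { return visibility; }
  Date getPDate() { return pDate; }
}

class SortedPageSketch {
  // Keeps visible documents, orders them newest first, then returns one page.
  static List<Doc> page(List<Doc> docs, long skipValue, long limitValue) {
    return docs.stream()
        .filter(d -> d.getVisibility() == 1)
        .sorted(Comparator.comparing(Doc::getPDate).reversed())
        .skip(skipValue)
        .limit(limitValue)
        .collect(Collectors.toList());
  }
}
```

The order of the calls matters: skip() and limit() must come after sorted(), otherwise the page is cut from the unsorted data.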

Solution 2

An alternative to pivovarit's answer, which might be useful if your dataset is potentially too big to hold in memory at once (a sorted stream has to keep the whole underlying dataset in an intermediate container in order to sort it).

We will not use the stream's sort operation here. Instead, we will use a data structure that holds at most as many elements as we tell it to and pushes out the extra elements based on the sort criteria (I do not claim this is the best implementation, just the idea of it).

To achieve this, we need a custom collector:

class SortedPileCollector<E> implements Collector<E, SortedSet<E>, List<E>> {
  int maxSize;
  Comparator<E> comptr;

  public SortedPileCollector(int maxSize, Comparator<E> comparator) {
    if (maxSize < 1) {
      throw new IllegalArgumentException("Max size cannot be " + maxSize);
    }
    this.maxSize = maxSize;
    comptr = Objects.requireNonNull(comparator);
  }

  public Supplier<SortedSet<E>> supplier() {
    return () -> new TreeSet<>(comptr);
  }

  public BiConsumer<SortedSet<E>, E> accumulator() {
    return this::accumulate; // see below
  }

  public BinaryOperator<SortedSet<E>> combiner() {
    return this::combine;
  }

  public Function<SortedSet<E>, List<E>> finisher() {
    return set -> new ArrayList<>(set);
  }

  public Set<Characteristics> characteristics() {
    return EnumSet.of(Characteristics.UNORDERED);
  }

  // The interesting part
  public void accumulate(SortedSet<E> set, E el) {
    Objects.requireNonNull(el);
    Objects.requireNonNull(set);
    if (set.size() < maxSize) {
      set.add(el);
    }
    else {
      if (set.contains(el)) {
        return; // we already have this element
      }
      E tailEl = set.last();
      Comparator<? super E> c = set.comparator(); // SortedSet.comparator() returns Comparator<? super E>
      if (c.compare(tailEl, el) <= 0) {
        // If we did not have capacity, received element would've gone to the end of our set.
        // However, since we are at capacity, we will skip the element
        return;
      }
      else {
        // We received element that we should preserve.
        // Remove set tail and add our new element.
        set.remove(tailEl);
        set.add(el);
      }
    }
  }

  public SortedSet<E> combine(SortedSet<E> first, SortedSet<E> second) {
    SortedSet<E> result = new TreeSet<>(first);
    second.forEach(el -> accumulate(result, el)); // inefficient, but hopefully you see the general idea.
    return result;
  }
}

The collector above acts as a mutable structure that manages a sorted set of data. Note that "duplicate" elements are ignored by this implementation; you will need to change it if you want to allow duplicates.
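
In this collector, "duplicate" means "compares as equal under the comparator", not equals(). With a comparator on the date alone, two different documents that share the same pdate count as duplicates and one of them is dropped. A minimal sketch of the effect and of one possible workaround (a tie-breaker), assuming a hypothetical `Item` class with `id` and `date` fields:

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical element type: two items may share the same date.
class Item {
  final String id;
  final long date;

  Item(String id, long date) {
    this.id = id;
    this.date = date;
  }

  String getId() { return id; }
  long getDate() { return date; }
}

class TieBreakSketch {
  // Counts how many of the given items survive in a TreeSet built with the comparator.
  static int distinctCount(Comparator<Item> comparator, Item... items) {
    TreeSet<Item> set = new TreeSet<>(comparator);
    for (Item item : items) {
      set.add(item); // add() returns false when the comparator reports a tie
    }
    return set.size();
  }

  public static void main(String[] args) {
    Item a = new Item("a", 100L);
    Item b = new Item("b", 100L); // same date as a, different id

    Comparator<Item> byDate = Comparator.comparingLong(Item::getDate).reversed();
    Comparator<Item> byDateThenId =
        Comparator.comparingLong(Item::getDate).reversed()
            .thenComparing(Item::getId);

    System.out.println(distinctCount(byDate, a, b));       // one item is dropped
    System.out.println(distinctCount(byDateThenId, a, b)); // both items are kept
  }
}
```

Whether a tie-breaker is appropriate depends on the data; it only helps if the second key (here the hypothetical `id`) is unique.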

Using this collector for your case, assuming you want the top three elements:

Comparator<Document> comparator = Comparator.comparing(Document::getPDate).reversed(); // see pivovarit's answer
List<Document> outList = documentList.stream()
  .filter(p -> p.getInteger(VISIBILITY) == 1)
  .collect(new SortedPileCollector<>(3, comparator));

Bharathiraja S

Updated on June 04, 2022

Comments

Bharathiraja S almost 2 years

I am trying to sort by a field after filtering.

Input Document / Sample Record:

DocumentList: [
    Document{
        {
            _id=5975ff00a213745b5e1a8ed9,
            u_id=,
            mailboxcontent_id=5975ff00a213745b5e1a8ed8,                
            idmapping=Document{
                {ptype=PDF, cid=00988, normalizedcid=00988, systeminstanceid=, sourceschemaname=, pid=0244810006}
            },
            batchid=null,
            pdate=Tue Jul 11 17:52:25 IST 2017, locale=en_US
        }
    },
    Document{
        {
            _id=597608aba213742554f537a6,
            u_id=,
            mailboxcontent_id=597608aba213742554f537a3, 
            idmapping=Document{
                {platformtype=PDF, cid=00999, normalizedcid=00999, systeminstanceid=, sourceschemaname=, pid=0244810006}
            },
            batchid=null,
            pdate=Fri Jul 28 01:26:22 IST 2017,
            locale=en_US
        }
    }
]

Here, I need to sort based on pdate.

List<Document> outList = documentList.stream()
    .filter(p -> p.getInteger(CommonConstants.VISIBILITY) == 1)
    .parallel()
    .sequential()
    .collect(Collectors.toCollection(ArrayList::new))
    .sort()
    .skip(skipValue)
    .limit(limtValue);

Not sure how to sort

"order by pdate DESC"

Thank you in advance!

M. Prokhorov almost 7 years

This will not compile: List does not have sort() method - it has sort(Comparator<T>), and that method does not return anything.
M. Prokhorov almost 7 years

Also, the combination .parallel().sequential() is exactly equivalent to .sequential() alone. That chain looks like you threw some random operations at it to see if anything sticks.
fps almost 7 years

This seems like the return value of some NoSql database to me... Tell the database to sort, skip and limit, it doesn't make any sense to load the whole dataset in memory to then skip and limit the final result...
Grzegorz Piwowarek almost 7 years

if you found an answer helpful, please accept it

Anthony Raymond almost 7 years

Is a simple comparator going to be in DESC order?
Trash Can almost 7 years

By default, it is in ascending order
Anthony Raymond almost 7 years

As I thought, the OP asked for DESC order
Anthony Raymond almost 7 years

I just deleted my last comment when i saw your edit ^^.
Trash Can almost 7 years

@AnthonyRaymond the OP didn't state that. My bad, see it
M. Prokhorov almost 7 years

@Dummy, see at the bottom, he wanted order by pdate DESC.
Anthony Raymond almost 7 years

@Dummy read the question again (at the very end) order by pdate DESC
Trash Can almost 7 years

@M.Prokhorov, fixed it.
Trash Can almost 7 years

@AnthonyRaymond Fixed it
M. Prokhorov almost 7 years

Don't sorted streams materialize their whole datasets into memory (at least I think they do) regardless of whether limit is in place? If so, it warrants a mention, I think.
Anthony Raymond almost 7 years

@M.Prokhorov they do indeed, but the returned list will contain only limit elements. I don't think it was used for optimization purposes here.
M. Prokhorov almost 7 years

@AnthonyRaymond, well, in case that DocumentList is huge and streamed over IO from disk, he may get a surprise at later point.
Anthony Raymond almost 7 years

@M.Prokhorov He may, let's hope he won't :)
Grzegorz Piwowarek almost 7 years

@M.Prokhorov good idea - I added info about that just in case - but I assume that OP realizes that sorting can be performed only on the fully loaded data set
Bharathiraja S almost 7 years

Thank you all! && (!StringUtils.isEmpty(req.getPType()) ? (((Document)d.get("idmapping")).getString("ptype").equalsIgnoreCase(req.getPType())) : true)) .sorted(Comparator.comparing(Document::getPdate).reversed()) .collect(Collectors.toList()); Compile Error Getting The type Document does not define getPublicationdate(T) that is applicable here, in "comparing(Document::getPdate"
Bharathiraja S almost 7 years

After adding sorted(), I am getting the error below: The type Document does not define getPdate(T) that is applicable here
M. Prokhorov almost 7 years

@pivovarit, I made example implementation that might enable sort and limit even for huge datasets, based on collector that skips elements. See my answer for details.
Trash Can almost 7 years

What is the method name in Document class that returns the pdate ?
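
The compile error reported above means that Document has no getPdate() method to reference. If Document here is org.bson.Document (an assumption, although the getInteger, getString and get("idmapping") calls suggest it), fields are read by key, for example with getDate("pdate"), so the sort key has to be a lambda rather than a method reference. Since org.bson.Document implements Map<String, Object>, the sketch below uses a plain map as a stand-in so that it runs without the MongoDB driver; the `sortByPdateDesc` name is hypothetical:

```java
import java.util.Comparator;
import java.util.Date;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class KeyExtractorSketch {
  // Sorts records by their "pdate" entry, newest first.
  // With org.bson.Document the extractor would presumably be d -> d.getDate("pdate").
  static List<Map<String, Object>> sortByPdateDesc(List<Map<String, Object>> records) {
    // Declaring the extractor's type up front lets reversed() be chained;
    // an untyped lambda inside comparing(...) followed by .reversed() does not compile.
    Function<Map<String, Object>, Date> pdate = m -> (Date) m.get("pdate");
    return records.stream()
        .sorted(Comparator.comparing(pdate).reversed())
        .collect(Collectors.toList());
  }
}
```

Records without a "pdate" entry would make this comparator throw a NullPointerException, so filter them out first or wrap the key comparison in Comparator.nullsLast if that can happen.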
Holger almost 7 years

Considering that contains and remove have the same complexity as add, all bearing a lookup operation, it might be better to use just if(set.add(el)) set.pollLast();, which might create a node that will be removed right afterwards, but bears only a single lookup operation and is much simpler than E tailEl = set.last(); Comparator<E> c = set.comparator(); if (c.compare(tailEl, el) <= 0) { return; } else { set.remove(tailEl); set.add(el); }…
M. Prokhorov almost 7 years

@Holger, what you suggest is entirely valid, assuming that we are OK replacing SortedSet with NavigableSet (in my example implementation we can, since TreeSet is-a NavigableSet anyway). I understand that my implementation might not be among the better approaches, and I even said that I only intend to give basic idea.
Holger almost 7 years

If you follow the usual pattern, you provide a factory method returning a Collector<E,?,List<E>>, making the actual intermediate container type an implementation detail. Using methods like Collector.of in the factory method, you don’t even need to implement Collector.
M. Prokhorov almost 7 years

@Holger, Collector.of would require function factories for accumulator and combiner, that capture maxSize parameter, and pass it to extracted static methods with arity of 3 (current implementation works around by having captured a reference to this). While this approach is generally more preferred because it allows testing things separately, I feel this would be slightly too much code for such simple answer. Good points on design though.
Holger almost 7 years

Well, the accumulator could be simplified to a single lambda expression, (set,el) -> { if(set.add(el) && set.size()>maxSize) set.pollLast(); }, likewise the combiner (set1,set2) -> { if(set1.addAll(set2)) { while(set1.size()>maxSize) set1.pollLast(); } return set1; }, indeed, they are capturing maxSize.
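
Putting the suggestions from this comment thread together, a factory method built on Collector.of might look like the following sketch. The `TopN` class and `topN` method names are hypothetical; as in the answer, elements that the comparator considers equal are collapsed into one.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;
import java.util.TreeSet;
import java.util.stream.Collector;

final class TopN {
  private TopN() {}

  // Returns a collector that keeps at most maxSize elements, ordered by the
  // comparator; the TreeSet container stays an implementation detail.
  static <E> Collector<E, ?, List<E>> topN(int maxSize, Comparator<? super E> comparator) {
    if (maxSize < 1) {
      throw new IllegalArgumentException("Max size cannot be " + maxSize);
    }
    Objects.requireNonNull(comparator);
    return Collector.of(
        () -> new TreeSet<E>(comparator),
        (set, el) -> {
          // Add first, then evict the last element if the set grew past capacity.
          if (set.add(el) && set.size() > maxSize) {
            set.pollLast();
          }
        },
        (left, right) -> {
          // Merge partial results (parallel streams), then trim back to capacity.
          left.addAll(right);
          while (left.size() > maxSize) {
            left.pollLast();
          }
          return left;
        },
        set -> new ArrayList<>(set),
        Collector.Characteristics.UNORDERED);
  }
}
```

Usage would then be something like .collect(TopN.topN(3, comparator)) in place of new SortedPileCollector<>(3, comparator).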