What is more efficient: sorted stream or sorting a list?
Solution 1
It is safe to say that two forms of sort will have the same complexity ... even without looking at the code. (If they didn't then one form would be severely broken!)
Looking at Java 8 source code for streams (specifically the internal class java.util.stream.SortedOps
), the sorted()
method adds a component to a stream pipeline that captures all of the stream elements into either an array or an ArrayList
.
-
An array is used if and only if the pipeline assembly code can deduce the number of elements in the stream ahead of time.
-
Otherwise, an
ArrayList
is used to gather the elements to be sorted.
If an ArrayList
is used, you incur the extra overhead of building / growing the list.
Then we return to two versions of the code:
List<Item> sortedItems = new ArrayList<>(items);
Collections.sort(sortedItems, itemComparator);
In this version, the ArrayList
constructor copies the elements items
to an appropriately sized array, and then Collections.sort
does an in-place sort of that array. (This happens under the covers).
List<Item> sortedItems = items
.stream()
.sorted(itemComparator)
.collect(Collectors.toList());
In this version, as we have seen above, the code associated with sorted()
either builds and sorts an array (equivalent to what happens above) or it builds the ArrayList
the slow way. But on top of that, there are the overheads of stream the data from items
and to the collector.
Overall (with the Java 8 implementation at least) code examination tells me that first version of the code cannot be slower than the second version, and in most (if not all) cases it will be faster. But as the list gets larger, the O(NlogN)
sorting will tend to dominate the O(N)
overheads of copying. That will mean that the relative difference between the two versions will get smaller.
If you really care, you should write a benchmark to test the actual difference with a specific implementation of Java, and a specific input dataset. (Or adapt @Eugene's benchmark!)
Solution 2
To be honest I don't trust myself too much either in JMH
(unless I understand the assembly, which takes lots of time in my case), especially since I've used @Setup(Level.Invocation)
, but here is a small test (I took the StringInput
generation from some other test I did, but it should not matter, it's just some data to sort)
@State(Scope.Thread)
public static class StringInput {
private String[] letters = { "q", "a", "z", "w", "s", "x", "e", "d", "c", "r", "f", "v", "t", "g", "b",
"y", "h", "n", "u", "j", "m", "i", "k", "o", "l", "p" };
public String s = "";
public List<String> list;
@Param(value = { "1000", "10000", "100000" })
int next;
@TearDown(Level.Invocation)
public void tearDown() {
s = null;
}
@Setup(Level.Invocation)
public void setUp() {
list = ThreadLocalRandom.current()
.ints(next, 0, letters.length)
.mapToObj(x -> letters[x])
.map(x -> Character.toString((char) x.intValue()))
.collect(Collectors.toList());
}
}
@Fork(1)
@Benchmark
public List<String> testCollection(StringInput si){
Collections.sort(si.list, Comparator.naturalOrder());
return si.list;
}
@Fork(1)
@Benchmark
public List<String> testStream(StringInput si){
return si.list.stream()
.sorted(Comparator.naturalOrder())
.collect(Collectors.toList());
}
Results show that Collections.sort
is faster, but not by a big margin:
Benchmark (next) Mode Cnt Score Error Units
streamvsLoop.StreamVsLoop.testCollection 1000 avgt 2 0.038 ms/op
streamvsLoop.StreamVsLoop.testCollection 10000 avgt 2 0.599 ms/op
streamvsLoop.StreamVsLoop.testCollection 100000 avgt 2 12.488 ms/op
streamvsLoop.StreamVsLoop.testStream 1000 avgt 2 0.048 ms/op
streamvsLoop.StreamVsLoop.testStream 10000 avgt 2 0.808 ms/op
streamvsLoop.StreamVsLoop.testStream 100000 avgt 2 15.652 ms/op
Solution 3
Below is my benchmark (not really sure if it is correct):
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OperationsPerInvocation;
import org.openjdk.jmh.annotations.OutputTimeUnit;
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@BenchmarkMode(Mode.AverageTime)
@OperationsPerInvocation(MyBenchmark.N)
public class MyBenchmark {
public static final int N = 50;
public static final int SIZE = 100000;
static List<Integer> sourceList = new ArrayList<>();
static {
System.out.println("Generating the list");
for (int i = 0; i < SIZE; i++) {
sourceList.add(i);
}
System.out.println("Shuffling the list.");
Collections.shuffle(sourceList);
}
@Benchmark
public List<Integer> sortingList() {
List<Integer> sortedList = new ArrayList<>(sourceList);
Collections.sort(sortedList);
return sortedList;
}
@Benchmark
public List<Integer> sortedStream() {
List<Integer> sortedList = sourceList.stream().sorted().collect(Collectors.toList());
return sortedList;
}
@Benchmark
public List<Integer> treeSet() {
Set<Integer> sortedSet = new TreeSet<>(sourceList);
List<Integer> sortedList = new ArrayList<>(sortedSet);
return sortedList;
}
}
Results:
Benchmark Mode Cnt Score Error Units
MyBenchmark.sortedStream avgt 200 300691.436 ± 15894.717 ns/op
MyBenchmark.sortingList avgt 200 262704.939 ± 5073.915 ns/op
MyBenchmark.treeSet avgt 200 856577.553 ± 49296.565 ns/op
As in @Eugene's benchmark, sorting list is slightly (ca. 20%) faster than sorted stream. What surprizes me a bit is that treeSet
is significantly slower. I did not expect that.
Related videos on Youtube
lexicore
Please see my Xing profile if you want a business contact.
Updated on September 15, 2022Comments
-
lexicore over 1 year
Assume we have some items in a collection and we want to sort them using certain comparator, expecting result in a list:
Collection<Item> items = ...; Comparator<Item> itemComparator = ...;
One of the approaches is to sort items in a list, something like:
List<Item> sortedItems = new ArrayList<>(items); Collections.sort(sortedItems, itemComparator);
Anothe approach is using a sorted stream:
List<Item> sortedItems = items .stream() .sorted(itemComparator) .collect(Collectors.toList());
I wonder, which approach is more efficient? Are there any advantages of a sorted stream (like faste sorting on multiple cores)?
Efficient in a sense of runtime complexity/fastest.
I don't trust myself to implement a perfect benchmark and studying
SortedOps
did not really enlighten me.-
Eugene about 6 yearswell at least one is sorting in place and there is no way to make it parallel (if you have enough data to squeeze some performance)
-
OldCurmudgeon about 6 yearsWhat do you mean by efficient? Fastest =
Collections.sort
; Most Readable =Stream
; Most memory efficient =Collections.sort
perhaps ... -
lexicore about 6 years@OldCurmudgeon Fastest. I've added that to the question.
-
Eugene about 6 yearsboth use
Arrays#sort
as far as I see -
Turing85 about 6 years@Eugene just checked again. My statement is only true for primitives. =) (that's the reason I deleted my comment)
-
Eugene about 6 years@Aominè almost smart - it will only not do anything for a natural sort... stackoverflow.com/questions/44486170/…
-
Eugene about 6 yearsinteresting... why would you offer a bounty here? not that I mind, I don't really care, but wondering what the two answers have (plus yours) have not already answered
-
lexicore about 6 years@Eugene As it is stated in the description of the bounty, I want to award it to one of the existing answers - yours. You put an effort to actually write a benchmark, I think it is worth additional bounty.
-
-
lexicore about 6 yearsI've also started writing a benchmark, we'll see what I get.
-
Stephen C about 6 years"So stream is a little bit slower." - As predicted :-)
-
Malte Hartwig about 6 yearsIn a comment on a completely unrelated question/answer, Holger has once explained me why
TreeSet
is slower: Adding to aTreeSet
is O(log n), and unless thesourceList
is sorted by the same comparator,addAll
will simply calladd
for each element. In plain terms, the slower performance stems from the fact thatTreeSet
keeps itself sorted at all times. This results in way more comparisons when adding many elements, as opposed to adding all elements first and then sorting like you do insortingList
. -
Holger about 6 yearsI’m a bit surprised about the baroque way of creating the list, concatenating strings just to split them afterwards. How about
int[] letters = IntStream.range('A', 'Z').toArray(); List<String> random = IntStream.range(0, listSize) .mapToObj(ix -> ThreadLocalRandom.current() .ints(ThreadLocalRandom.current().nextInt(1, maxWordSize), 0, letters.length) .map(i -> letters[i]) .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append).toString()) .collect(Collectors.toCollection(ArrayList::new));
? -
Eugene about 6 years@Holger no no, you are correct, its morw than ugly, I had 20 minutes between two meetings, had to build the JMH test amd run it, so I took it from a test I already had in place. Very nice what you have done btw :)
-
Eugene about 6 years@MalteHartwig nice! just looked up the sources for it. thank you
-
Eugene about 6 years@Holger while this is indeed much nicer, it does not fit the JMH rules of testing (at least not something I would like). What I would like to create is the same input for each iteration of the tests. so
test1
must getabcd
andtest2
would still getabcd
... -
Holger about 6 yearsHow does your code ensure that each iteration gets the same list?
-
Eugene about 6 years@Holger for the test where I got this input from, what I needed is the same sizes of Strings
-
Holger about 6 yearsWell, it’s so hard to recognize what your code actually does. When you want a fixed word size, just replace
ThreadLocalRandom.current().nextInt(1, maxWordSize)
withdesiredWordSize
in my code and you’re done… But this has nothing to do with the JMH rules, as your code still generates a new list each time just like mine. -
Holger about 6 years@MalteHartwig well, sorting the list also has a worst case of O(n×log n), but in practice, it’s between O(n) and that worst case. But keep in mind that inserting into a tree map also implies allocating a node object and potentially rebalancing the tree. In contrast,
new ArrayList<>(collection)
just allocates and populates one array. -
Eugene about 6 years@Holger didn't see the edit. it should generate a new List, actually it does so every time because of
Level.Invocation
. I don't get your point here. Let's suppose I want to test "something" that deals with Strings and their lengths, well if I have two tests to benchmark - I want them both to get the same input for each trial, and by "same" I really mean of same lengths. Since@Setup
is invoked for each trial - if I used any kind of "random", I might get different lengths fortestA
andtestB
-
Eugene about 6 years@Holger thus I've used
@Param
here, and I know that for each trial whenparam == 1000
for example, I would generate inputs for the benchmark that would fit into "the same" pattern. I've been through the JMH documentation and samples a few times - even pinged Shipilev on this a few times, on how to actually do that in a JMH-ish way, because right now I don't like it either too much -
Eugene about 6 years@lexicore there are a few problems as far as I see here, the only way to do this correctly is to go to the JMH samples (numerous and numerous times.. ), simple tests are simple to do, well a bit more complicated and it gets fun! that being said here are the thing I don't like personally: 1) One of your tests does this
List<Integer> sortedList = new ArrayList<>(sourceList);
- implies that this is now part of the benchmark itself (Setup methods classes are for this) 2) you are shuffling your collection once - after that it is the same for all threads/tests -
Eugene about 6 years@lexicore (continued) /trials; this might trigger optimizations that you are no aware of. correct way to do it would be to generate something pseudo-random for... well the next word is not easy to choose. In your example it should really be
Invocation
, since you are sorting, and IIRC sorting an already sorted list is much faster then one that is not (JMH even has an example about this with bubble sort). So data should be provided to each trial already shuffled and out of order (read my last two comments at my answer) -
Eugene about 6 years@lexicore 3) then read about
@Fork
4)@OperationsPerInvocation
is dangerous, like any other JMH annotation that you don't really understand - the samples are there to help! 5) The most painful one : JMH is not going to tell you what you did wrong - you are going to measure "something", whatever that makes sense or not is an entire different story -
lexicore about 6 years@Eugene This is the first benchmark I ever did with JMH, so I'm not suprprized I did not do it right. This was one of the reasons I actually did not want to do it myself.
-
Eugene about 6 years@lexicore understood, boy you will have fun :) go through the examples - its an entire new world there!
-
lexicore about 6 years@Eugene I actually cargo-culted this from some example I've found somewhere.
-
lexicore about 6 years@Eugene Thank you for your feedback. The only think I don't quite agree is 1). I think
new ArrayList<>(sourceList);
should be part of the benchmark, as creating the resulting list is a part of every version. I test creation of a sorted copy of the given list. -
Eugene about 6 years@lexicore may be - this depends on what you really want to test - just the sorting?
-
lexicore about 6 years@Eugene I want to test sorted stream vs. sorting list. But in case of a stream I have to collect items in a list to get a result. This basically creates a sorted copy of the original list. So to make it fair, I have to create a copy of the original list for in-place sorting as well.
-
Eugene about 6 years@lexicore makes sense, yes. But in case of the stream there is no way to specify the initial capacity, so lots of resizes pottentially, they could steal the time you see in output... If I make sense
-
lexicore about 6 years@Eugene Sure, I get it. My interest is actually quite practical. I saw a lot of usages of a sorted stream to sort something in answers recently. I was sceptical that it's better that creating a copy and
Collections.sort
ing that copy. -
Holger about 6 years@Eugene my point was, my suggested alternative is not less “JMH-ish” than yours, just much simpler. Since your code is so complicated, I didn’t recognize that it generates words of the same size and provided a solution generating words of random size. But as shown in this comment, it’s easy to make it fixed word sized; it’s even simpler. You could even make the desired word size a parameter.
-
Eugene about 6 years@Holger agreed. I looked at it again and you make perfect sense, I only used a smaller version of your proposal here though. thank you
-
Holger about 6 yearsYou seem to really like that
Character.toString(…)
method, don’t you? Now, you’re even trying to invoke it on aString
(yourletters
array still is that pre-shuffled array of strings). In the end, your shortened version would create single-character words only, if it worked. -
Eugene about 6 years@Holger taking into consideration that I thought of single letter words only indeed, why is
Character#toString
so bad here? :| -
Holger about 6 yearsYou already have a single letter string. You are trying to invoke
intValue()
on it (which doesn’t compile), just to invokeCharacter.toString
on it. I thought, I saw this already somewhere else, but maybe I’m wrong. Anyway,.mapToObj(x -> letters[x])
produces aString
, asletters
has the typeString[]
. The entire subsequentmap
step is obsolete.