Is it faster to sort a list after inserting items, or to add them to a sorted list?

Solution 1

If you add enough items that you're effectively building the list from scratch, you should be able to get better performance by sorting the list afterwards.

If items are mostly in order, you can tweak both the incremental update and the regular sort to take advantage of that, but frankly, it usually isn't worth the trouble. (You also need to be careful that some unexpected ordering can't make your algorithm take much longer, q.v. naive quicksort.)

Both incremental update and regular list sorting are O(N log N), but you can get a better constant factor by sorting everything afterward. (I'm assuming here that you've got some auxiliary data structure, so your incremental update can access list items faster than O(N)...) Generally speaking, sorting all at once has a lot more design freedom than maintaining the ordering incrementally, since incremental update has to maintain a complete order at all times, but an all-at-once bulk sort does not.

If nothing else, remember that there are lots of highly-optimized bulk sorts available.
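In C++ terms, the bulk approach is little more than a call into one of those optimized sorts (a minimal sketch, assuming a std::vector; the function name is illustrative):

```cpp
#include <algorithm>
#include <vector>

// Append the batch of new items, then re-sort the whole container once.
// A single O((n+m) log(n+m)) sort usually beats m separate ordered
// inserts when m is large relative to n.
template <typename T>
void AppendAndSort(std::vector<T>& sorted, const std::vector<T>& newitems)
{
    sorted.insert(sorted.end(), newitems.begin(), newitems.end());
    std::sort(sorted.begin(), sorted.end());
}
```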

Solution 2

Usually it's far better to use a heap. In short, it splits the cost of maintaining order between the pusher and the picker. Both operations are O(log n), instead of O(n log n) as in most other solutions.
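For example, with the standard library's priority_queue as the heap (a sketch; PopSmallest is an illustrative helper, not part of the library):

```cpp
#include <functional>
#include <queue>
#include <vector>

// A min-heap: push() and pop() are each O(log n), so the cost of keeping
// the next item available in order is split between pusher and picker.
using MinHeap = std::priority_queue<int, std::vector<int>, std::greater<int>>;

int PopSmallest(MinHeap& heap)
{
    int smallest = heap.top();
    heap.pop();
    return smallest;
}
```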

Solution 3

If you're adding in bunches, you can use a merge sort. Sort the list of items to be added, then copy from both lists, comparing items to determine which one gets copied next. You could even copy in place if you resize your destination array and work from the end backwards.

The efficiency of this solution is O(n+m) + O(m log m) where n is the size of the original list, and m is the number of items being inserted.
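The in-place, back-to-front variant can be sketched like this (my own illustration of the idea, assuming array-backed storage; the function name is made up):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Merge sorted additions into an already-sorted dest in place.
// dest grows by m, then is filled from the back, so no element is
// overwritten before it is read: O(n + m) after the O(m log m) sort.
template <typename T>
void MergeInPlace(std::vector<T>& dest, std::vector<T> newitems)
{
    std::sort(newitems.begin(), newitems.end());
    std::size_t n = dest.size();
    std::size_t m = newitems.size();
    dest.resize(n + m);
    std::size_t write = n + m;
    while (m > 0)
    {
        if (n > 0 && newitems[m - 1] < dest[n - 1])
            dest[--write] = dest[--n];   // existing item is larger
        else
            dest[--write] = newitems[--m]; // new item is larger or equal
    }
    // When m reaches 0, dest[0..n-1] is already in its final position.
}
```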

Edit: Since this answer isn't getting any love, I thought I'd flesh it out with some C++ sample code. I assume that the sorted list is kept in a linked list rather than an array. This changes the algorithm to look more like an insertion than a merge, but the principle is the same.

// Note that itemstoadd is sorted in place as a side effect of this function
#include <algorithm>
#include <list>
#include <vector>

template<typename T>
void AddToSortedList(std::list<T> & sortedlist, std::vector<T> & itemstoadd)
{
    std::sort(itemstoadd.begin(), itemstoadd.end());
    typename std::list<T>::iterator listposition = sortedlist.begin();
    typename std::vector<T>::iterator nextnewitem = itemstoadd.begin();
    // Once the new items are exhausted we are done; once the list is
    // exhausted, the remaining new items are inserted before end(),
    // i.e. appended in order.
    while (nextnewitem != itemstoadd.end())
    {
        if ((listposition == sortedlist.end()) || (*nextnewitem < *listposition))
            sortedlist.insert(listposition, *nextnewitem++);
        else
            ++listposition;
    }
}

Solution 4

I'd say, let's test it! :)

I tried it with quicksort, but sorting an almost-sorted array with quicksort is... well, not really a good idea. I tried a modified version that cuts off at 7 elements and uses insertion sort below that. Still, horrible performance. So I switched to merge sort. It may need quite a lot of memory for sorting (it's not in-place), but its performance is much better on sorted arrays and almost identical on random ones (the initial sort took almost the same time for both; quicksort was only slightly faster).

This already shows one thing: the answer to your question depends strongly on the sorting algorithm you use. If it performs poorly on almost-sorted lists, inserting each item at the right position will be much faster than adding at the end and then re-sorting; and merge sort might not be an option for you, as it might need far too much external memory if the list is huge. BTW, I used a custom merge sort implementation that needs only half the external storage of the naive implementation (which needs as much external storage as the array itself).

If merge sort is not an option and quicksort is certainly not, the best alternative is probably heap sort.

My results: adding the new elements at the end and then re-sorting the array was several orders of magnitude faster than inserting them at the right positions. However, my initial array had 10 million elements (sorted) and I was adding another million (unsorted). If you add 10 elements to an array of 10 million, inserting them in place is much faster than re-sorting everything. So the answer to your question also depends on how big the initial (sorted) array is and how many new elements you want to add to it.
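The two strategies being compared look roughly like this in C++ (an illustrative sketch, not the original benchmark code; wrap each call in a timer of your choice to reproduce the experiment):

```cpp
#include <algorithm>
#include <vector>

// Strategy 1: binary-search the insertion point for each new item;
// the search is O(log n), but each vector insert shifts O(n) elements.
std::vector<int> InsertEach(std::vector<int> sorted, const std::vector<int>& adds)
{
    for (int v : adds)
        sorted.insert(std::upper_bound(sorted.begin(), sorted.end(), v), v);
    return sorted;
}

// Strategy 2: append everything, then one O((n+m) log(n+m)) re-sort.
std::vector<int> AppendThenSort(std::vector<int> sorted, const std::vector<int>& adds)
{
    sorted.insert(sorted.end(), adds.begin(), adds.end());
    std::sort(sorted.begin(), sorted.end());
    return sorted;
}
```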

Solution 5

In principle, it's faster to build a tree than to sort a list. Each tree insert is O(log n), leading to O(n log n) overall. Sorting is likewise O(n log n).

That's why Java has TreeMap (in addition to TreeSet, TreeList, and the ArrayList and LinkedList implementations of List).

  • A TreeSet keeps things in object-comparison order. The ordering is defined by the Comparable interface.

  • A LinkedList keeps things in the insertion order.

  • An ArrayList uses more memory, but is faster for some operations.

  • A TreeMap, similarly, removes the need to sort by a key. The map is built in key order during the inserts and maintained in sorted order at all times.

However, for some reason, the Java implementation of TreeSet is quite a bit slower than using an ArrayList and a sort.

[It's hard to speculate as to why it would be dramatically slower, but it is. It should be slightly faster, since it takes only one pass through the data. This kind of thing is often the cost of memory management trumping the algorithmic analysis.]
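The same trade-off exists in C++, where std::multiset plays the role of TreeSet (a red-black tree) and std::vector plus std::sort plays the role of the ArrayList approach (a sketch; the function names are illustrative):

```cpp
#include <algorithm>
#include <set>
#include <vector>

// Tree route: each insert is O(log n), so n inserts cost O(n log n),
// and the container is in sorted order at all times.
std::vector<int> ViaTree(const std::vector<int>& items)
{
    std::multiset<int> tree(items.begin(), items.end());
    return std::vector<int>(tree.begin(), tree.end());
}

// Array route: O(n) appends plus one O(n log n) sort; same asymptotics,
// but contiguous memory usually wins on the constant factor.
std::vector<int> ViaArraySort(std::vector<int> items)
{
    std::sort(items.begin(), items.end());
    return items;
}
```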

Author: Steve ("Born a Developer, will die a developer!")

Updated on July 04, 2021

Comments

  • Steve
    Steve almost 3 years

    If I have a sorted list (say, sorted with quicksort), and I have a lot of values to add, is it better to suspend sorting and add them to the end, then sort, or to use a binary chop to place the items correctly while adding them? Does it make a difference whether the items are random or already more or less in order?

  • bmdhacks
    bmdhacks over 15 years
    Inserting into a sorted list is O(log n). Inserting into a hash is O(1).
  • bmdhacks
    bmdhacks over 15 years
    Ok, you fixed your notation, but now your first statement is incorrect. Sorting and inserting are the same speed. Sorting is O(N log N), and inserting is performing an O(log N) operation N times, thus being O(N log N).
  • Remo.D
    Remo.D over 15 years
    Not to be picky, but N is the number of elements to insert, so the last paragraph of your answer does not make too much sense to me! Did you mean, "if N is not too large"?
  • Martin Beckett
    Martin Beckett over 15 years
    But it is a different N: if you only need to insert 10 items into a million, then 10 * (log 1M) beats 10 + (1M log 1M). P.S. Sorry, I left you a comment thanking you for spotting the typo, but it seems to have gone?
  • bmdhacks
    bmdhacks over 15 years
    Edited to clarify after Remo.D's comment.
  • bmdhacks
    bmdhacks over 15 years
    Fair enough. Technically Big-O doesn't care about the size of N, only Big-Omega does, but only computer-science professors probably care. Thanks for putting up with my scrutiny.
  • Martin Beckett
    Martin Beckett over 15 years
    And most people assume that O() tells you everything about the speed. Building pyramids is O(n) but is still a lot slower than sorting their heights!
  • Tony BenBrahim
    Tony BenBrahim over 15 years
    paragraph 2 is wrong in some cases. Doing a quick sort on an almost sorted list approaches O(n^2) rather than O(n log n).
  • hazzen
    hazzen over 15 years
    I'd be careful saying a tree is faster than a list. It really depends on the size of the input and the tree implementation used.
  • hazzen
    hazzen over 15 years
    Run some speed tests and you will see that is not the case. TreeSet vs. ArrayList, ArrayList was ~2x faster to add 500k random numbers, sort, and dump them into another list. If we don't dump them to another list, ArrayList wins by ~1.6x.
  • hazzen
    hazzen over 15 years
    TreeSet and TreeMap are essentially the same class; a TreeSet<E> is a TreeMap<E, Object> with the value set to a singleton object on insert. The times are almost identical, and still ~2x slower than an ArrayList solution.
  • hazzen
    hazzen over 15 years
    I said that insert all into an ArrayList + Collections.sort is about 2x faster than just insert all into a Tree[Set|Map]. This is for large numbers of values. The difference is still about 2x for small numbers of values, but 1ms vs. 2ms doesn't really matter.
  • ddimitrov
    ddimitrov over 15 years
    The reason about the speed difference is that ArrayList is implemented using a single array, while the tree map is a linked structure with different node objects for each entry. Accessing arrays is much faster and the JVM can optimize better than objects (reuse registers, better cache locality)
  • warren
    warren over 15 years
    the insert in a dynamic construct such as a linked list is still O(1) per insert. So yes, overall that adds up to O(N) - but it's additive, not multiplicative (i.e. 2 times O(n), not O(n^2)).
  • Greg Rogers
    Greg Rogers over 15 years
    if it was a vector, finding where and inserting an item is O(N). But appending and sorting afterwards for just one element is still worse - O(N log N).
  • Mecki
    Mecki over 15 years
    I think you have a totally incorrect view of the O notation. Inserting an item into a list is not O(n); it is always O(1) in algorithm theory. Moving millions of bytes in memory may not be a constant-time operation, but O notation is not about the time it takes, but about the complexity, which is 1
  • tloach
    tloach over 15 years
    inserting should be O(log(N)) if you do it right and have relatively evenly distributed data
  • hazzen
    hazzen over 15 years
    If it is not a constant operation, it isn't O(1). Period. The code for insert into a list is (for array-based list): for (i = last; i > idx; --i) { list[i + 1] = list[i]; } list[idx] = item; I don't think you will debate that is O(n). You can't just ignore part of your code in Big O.
  • Daniel Rikowski
    Daniel Rikowski over 15 years
    That is especially good advice if the list is some kind of priority queue. Google for weak heaps in that case.
  • Mike Dunlavey
    Mike Dunlavey over 15 years
    It is O(1) if it is bounded by some constant for any N. There are ways to organize an array so insertion is efficient, such as by making it out of blocks that have a certain amount of empty space.
  • Maciej Piechotka
    Maciej Piechotka over 13 years
    @Martin Beckett: Unless you happen to build a lots of pyramids ;)
  • Miles Rout
    Miles Rout about 11 years
    O(n+m) + O(m log m) is O(n+m)
  • Mark Ransom
    Mark Ransom about 11 years
    @MilesRout, not true at all. m log m > m, so the best you can simplify it is O(n+(m log m)).
  • Miles Rout
    Miles Rout about 11 years
    oops, didn't see the m before the log m. ignore me!
  • Peter Perháč
    Peter Perháč over 10 years
    TreeList? I have never seen one of those. Searched for it, in Java™ Platform, Standard Edition 7 and could not find it.
  • Peter Cordes
    Peter Cordes over 8 years
    Your first paragraph is describing a single merge of two sorted linked lists. If one merge is O(N), your total sort will be O(NlogN), unless you somehow can get an O(1) number of sorted chunks in less than O(NlogN) time. Incrementally sorting by inserting each element into a binary search tree is O(N log N), because the insertion operation is O(logN), and you have to do it N times. (simple binary trees have O(N) worst case insertion for one element.) Anyway, the last two paragraphs are nonsense. Neither of these helps you beat O(NlogN), or even beat qsort.
  • warren
    warren over 8 years
    @PeterCordes- I am not describing a merge of two sorted lists at all: I am describing adding items of an unknown sorted order to an already-sorted list
  • iAdjunct
    iAdjunct over 3 years
    @MikeDunlavey For anybody else who comes by this, we've entered the realm of implementation-dependent-complexity. It may be possible to construct a list where insertion is an O(1) operation, but that doesn't mean your list is. For example, std::vector::insert is O(n) whereas std::list::insert is O(1).
  • mahee96
    mahee96 almost 3 years
    insertion of an element into a sorted list may take O(log n) time to binary-search the insertion point, but if the insertion point is the start, the list has to be shifted almost n places to the right, which is O(n). So for n insertions the time complexity is not O(n log n) but O(n^2) worst case!! (assuming an array, or AUX = O(1))
  • mahee96
    mahee96 almost 3 years
    insertion into a linear sorted array will take O(n), not O(1), assuming up to n shifts to the right for an almost-full array/ArrayList. Preserving the AUX = O(1) condition: if a tree is used, insertion is still O(n) for a skewed or almost-sorted tree, unless self-adjusting trees like AVL or red-black trees are used, where it is O(log n) - still not O(1). Also, for a tree, AUX = O(n).
  • mahee96
    mahee96 almost 3 years
    @MikeDunlavey, how would you insert an element into 0th index in an array of n elements without shifting/copying_to_new_place in O(1) time. I am not sure if I am missing something on arrays and linked list just in case.
  • Mike Dunlavey
    Mike Dunlavey almost 3 years
    @mahee96: In arrays, suppose in every sequential group of 10 items, you initially leave 5 empty spaces. Then you can add new items easily, until you get to a point where you need to re-make the array.
  • mahee96
    mahee96 almost 3 years
    @MikeDunlavey, yeah, I guess whether the input is bounded or unbounded makes a difference in this case. If the input is bounded but the list still needs to grow incrementally, because at each increment someone expects it to be sorted (as with heap access), then this approach of leaving gaps would be better; if the input is unbounded and incremental sorting is required, it can still be done with linearly or geometrically spaced gaps (expanded after a certain number of increments).
  • warren
    warren almost 3 years
    @mahee96 - thanks for agreeing with me. Not sure why you felt the need to reword my answer into a comment stating the same thing, though