Which is more efficient, Sorting and then Binary Search over a Collection or Linear Search in java
Solution 1
It depends.
- If you are searching for only one string the linear search is better because it is in
O(n)
- If you are searching for multiple strings first sorting and then binary searching maybe better. it will be
O(logn + n*logn)
which isO(n*logn)
. So if you are checking for ca.n
strings, this one is better. - If you only want to know if your Collection contains an element (ignoring order), you should consider using
HashSet
which hasO(1)
. - If you need order and a fast contains method, use
LinkedHashSet
P.S. premature optimization is the root of all evil.
Solution 2
If you do the search just one time:
- Complexity for sort + binary search will be
O(n * log n)
. - Complexity for linear search is
O(n)
.
If you do the search for more than one time, let's say k
times:
- Complexity for sort + binary search will be
O((n + k) * log n)
. - Complexity for linear search will be
O(k * n)
.
So if you do the search just one time you should go with linear search. If you do the search for more than one time, most probably you should sort first.
Also, maybe in this case you can consider using a hash table, which has amortized complexity of O(1)
for an element search.
Zeeshan
Updated on June 21, 2022Comments
-
Zeeshan almost 2 years
Suppose I am having a Collection of object:
List<String> myList = populateMyArrayList(); //Here I am having an ArrayList with 1000 elements
Which is the better approach:
1 : Mergesort then Binary Search
Collections.sort(myList); int keyIndex = Collections.binarySearch(myList, key);
2 : Sequential Search
for(String s : myList){ if(s.equals(key)){ return s; } }
Should there be a difference in searching approach based on the size of the collection to be searched? If YES then how to decide.
EDIT1: Suppose I have to search the list a couple of times, and no new elements will be added in the list.
EDIT2: I could have gone for a
HashSet
, but I am actually having aList<CustomObject>
and I can search theList
multiple times based on different attributes of CustomObject. So I can't have a overriddenequals
method in my CustomObject -
Thomas Jungblut almost 10 yearsHow is that premature?
-
Absurd-Mind almost 10 yearsHe didn't check if a naive solution suffices his needs.
-
SJuan76 almost 10 yearsIt is worth mentioning that "big O" notation expresses the behavior of the algorithm as
n
varies, but it does not imply that one algorithm is "faster" than other for givenn
value. Alog(n)
algorithm could be slower than aO(n^2)
algorithm for a givenn
(ifn
is big enough, in the end the algorithm with the better function will win, but suchn
value may be large enough to be meaningless). Not that this comments belongs only to your answer, but I had to post it somewhere. -
Absurd-Mind almost 10 years@SJuan76 Big O assumes that
n
tends to infinity, so it does not say anything about smalln
. -
TheLostMind almost 10 yearsComplexity for sort + binary search will be O(nLogn + log n) not O(n * log n)
-
SJuan76 almost 10 yearsThat is my point... For a limited value (
1000
in this case), determining which algorithm is "faster" is not so easy. -
Andrei Bozantan almost 10 years@TheLostMind In complexity theory O(n * log n + log n) is considered equal to O(n * log n), since only the term with the highest rank is considered important. Please check the other answers for this question also, and reconsider your negative vote.
-
Absurd-Mind almost 10 years@SJuan76 I did not say anything about speed. The question was 'what is more efficient' and just by looking at the formulas it is linear search in case of one search string. Furthermore i tried to suggest that the OP should first try if any approach suffices ("Premature optimization...").
-
Archeg almost 10 yearsComplexity for sort + binary search in case of k runs will be O(k * log n) only if k >> n and it definitely remains O(n * log n) if k = 1. The proper complexity I believe is O( (k + n)*log n) or at least a bit of explanation is required.
-
TheLostMind almost 10 years@bosonix - My bad. I thought you had entered the complexity of mergesort as "n". Please be more clear the next time around.
-
adaba over 3 years@AndreiBozantan why is it O(n * log n) ? is the quicksort (assuming that's the most common sorting used in languages) complexity negligeable to the binarysearch complexity ?