Which is more efficient in Java: sorting and then binary search over a Collection, or linear search?


Solution 1

It depends.

  • If you are searching for only one string, linear search is better because it is O(n).
  • If you are searching for multiple strings, sorting first and then binary searching may be better: the sort plus one lookup is O(n log n + log n), which is O(n log n), while each linear search costs O(n). So if you are checking roughly n strings, sorting first wins with O(n log n) total versus O(n^2) for repeated linear search (a short sketch follows this list).
  • If you only want to know whether your Collection contains an element (ignoring order), consider using a HashSet, whose contains is O(1) on average.
  • If you need to preserve insertion order and still want a fast contains method, use a LinkedHashSet.
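
A minimal sketch of these options, assuming a List<String> and a single key (the class and variable names are only placeholders):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.HashSet;
    import java.util.List;

    public class SearchSketch {
        public static void main(String[] args) {
            List<String> names = new ArrayList<>(List.of("carol", "alice", "bob"));
            String key = "bob";

            // 1. Linear search: O(n) per lookup, no preprocessing.
            boolean foundLinear = false;
            for (String s : names) {
                if (s.equals(key)) {
                    foundLinear = true;
                    break;
                }
            }

            // 2. Sort once (O(n log n)), then binary search (O(log n) per lookup).
            Collections.sort(names);
            boolean foundBinary = Collections.binarySearch(names, key) >= 0;

            // 3. HashSet: contains is O(1) on average; element order is not kept.
            boolean foundHash = new HashSet<>(names).contains(key);

            System.out.println(foundLinear + " " + foundBinary + " " + foundHash);
        }
    }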

P.S. premature optimization is the root of all evil.

Solution 2

If you do the search just once:

  • Complexity for sort + binary search will be O(n * log n).
  • Complexity for linear search is O(n).

If you do the search more than once, say k times:

  • Complexity for sort + binary search will be O((n + k) * log n).
  • Complexity for linear search will be O(k * n).

So if you do the search just once, go with linear search. If you will search more than once, you should most probably sort first.
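
As a rough illustration of the k-lookup case (the class and method names here are placeholders), sort once and reuse the sorted list for every binary search:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    public class RepeatedLookups {
        // Sort once: O(n log n). Each of the k lookups is then O(log n),
        // for O((n + k) * log n) overall.
        static int countMatches(List<String> data, List<String> keys) {
            List<String> sorted = new ArrayList<>(data);
            Collections.sort(sorted);
            int hits = 0;
            for (String key : keys) {
                if (Collections.binarySearch(sorted, key) >= 0) {
                    hits++;
                }
            }
            return hits;
        }
    }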

Also, in this case you could consider using a hash table, which has amortized O(1) complexity per element lookup.
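
In Java that could be as simple as building a HashSet once and reusing it for every query (a sketch; the class and method names are placeholders):

    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    public class HashLookup {
        // Building the set is O(n); each contains() afterwards is O(1) amortized.
        static Set<String> buildIndex(List<String> data) {
            return new HashSet<>(data);
        }
    }

Then buildIndex(myList).contains(key) answers each lookup without re-scanning the list.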


Comments

  • Zeeshan almost 2 years

    Suppose I have a Collection of objects:

    List<String> myList = populateMyArrayList();
    // Here I have an ArrayList with 1000 elements
    

    Which is the better approach:

    1: Mergesort then Binary Search

    Collections.sort(myList);
    int keyIndex = Collections.binarySearch(myList, key);
    

    2: Sequential Search

    for(String s : myList){
       if(s.equals(key)){
          return s;
       }
    }
    

    Should the choice of searching approach depend on the size of the collection to be searched? If yes, how do I decide?

    EDIT1: Suppose I have to search the list a couple of times, and no new elements will be added to the list.

    EDIT2: I could have gone for a HashSet, but I actually have a List<CustomObject> and I may search the List multiple times based on different attributes of CustomObject, so I can't have a single overridden equals method in my CustomObject.
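
    For reference, a single search on one attribute looks roughly like this (getCustomAttribute() is just a placeholder for whichever getter is being searched on):

    for(CustomObject o : myList){
       if(o.getCustomAttribute().equals(key)){
          return o;
       }
    }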

  • Thomas Jungblut almost 10 years
    How is that premature?
  • Absurd-Mind almost 10 years
    He didn't check whether a naive solution suffices for his needs.
  • SJuan76 almost 10 years
    It is worth mentioning that "big O" notation expresses the behavior of the algorithm as n varies, but it does not imply that one algorithm is "faster" than another for a given n value. A log(n) algorithm could be slower than an O(n^2) algorithm for a given n (if n is big enough, in the end the algorithm with the better function will win, but such an n value may be large enough to be meaningless). Not that this comment belongs only to your answer, but I had to post it somewhere.
  • Absurd-Mind almost 10 years
    @SJuan76 Big O assumes that n tends to infinity, so it does not say anything about small n.
  • TheLostMind almost 10 years
    Complexity for sort + binary search will be O(n log n + log n), not O(n * log n)
  • SJuan76 almost 10 years
    That is my point... For a limited value (1000 in this case), determining which algorithm is "faster" is not so easy.
  • Andrei Bozantan almost 10 years
    @TheLostMind In complexity theory O(n * log n + log n) is considered equal to O(n * log n), since only the term with the highest rank is considered important. Please check the other answers for this question also, and reconsider your negative vote.
  • Absurd-Mind almost 10 years
    @SJuan76 I did not say anything about speed. The question was 'what is more efficient', and just by looking at the formulas it is linear search in the case of one search string. Furthermore, I tried to suggest that the OP should first try whether any approach suffices ("Premature optimization...").
  • Archeg almost 10 years
    Complexity for sort + binary search in the case of k runs will be O(k * log n) only if k >> n, and it definitely remains O(n * log n) if k = 1. The proper complexity I believe is O((k + n) * log n), or at least a bit of explanation is required.
  • TheLostMind almost 10 years
    @bosonix - My bad. I thought you had entered the complexity of mergesort as "n". Please be more clear the next time around.
  • adaba over 3 years
    @AndreiBozantan Why is it O(n * log n)? Is the quicksort complexity (assuming that's the most common sort used in languages) negligible compared to the binary search complexity?