distance between std::set begin() and std::set iterator in O(logn)

21,220

Solution 1

You can use sorted std::vector<int>. If it is sorted, you can find element in O(log n). And you can find distance in constant time O(1).

By sorted vector I mean that after every insertion (or after many insertions) you do std::sort(v.begin(), v.end());

If your type inside std::set<T> is not as light like int - you can keep both - std::set<T> and sorted vector of iterators std::vector<std::set<T>::iterator>. But it could not trivial to keep these structures in sync. Maybe you can add some like position to T? Or keep std::set<std::pair<T,int>, comp_first_of_pair<T>> where comp_first_of_pair is just to have set sorted only by T and the second int is for keeping position in set?

Just a few ideas - to have even O(1) distance time...

Solution 2

You can find the index of an element in a set in O(log(N)) with an ordered set: https://www.geeksforgeeks.org/ordered-set-gnu-c-pbds/ . This is implemented as a red-black tree. I know this topic is very old, but it might help readers in the future.

Solution 3

You can use the function std::set<>::find to search for an element x and compute the distance to the first iterator of the set.

std::distance(s.begin(), s.find(x))

However, as comments indicate the runtime of distance depends on the type of iterator used. In the case of a set this is a bidirectional iterator and distance is O(n).

Solution 4

If computing the index is really your bottleneck, then I see 2 options:

  • Store the index. Either in the nodes themselves or in a separate std::map. Of course this means you have to keep this cache updated.
  • Use a std::vector. That is not as bad as it might look at first. If you keep the vector always sorted you can use it like a set. The performance will be similar to set. The biggest drawback is: the node might be copied a lot. (This can be compensated by using pointers, boost:shared_ptr or std::unique_ptr [c++11 only])
    To look up an element you use std::lower_bound.
    Instead of insert/push_back you do: insert( lower_bound(b,e,x), x )

Solution 5

You cannot use matematics with bidirectional iterators. So only acceptable way is to count by yourself (how many int less than X you inserted into set).

But, if you have cleanly separated "data collection" and "data usage" stages - probably it is worth to replace std::set with sorted std::vector. Its harder to maintain, but have own benefits, including iterator matematics (so you can get search with O(log n) with std::binary_search and the distance with O(1) )

Share:
21,220
divanshu
Author by

divanshu

An avid learner of algorithms. Interest in Application development.

Updated on August 07, 2022

Comments

  • divanshu
    divanshu over 1 year

    I need to find the index of an element in std::set. This index can be visualized as the distance of the iterator from the beginning. One way can be:

    for(int i = 0, set<int>::iterator it = s.begin(); it != iteratorToBeFound; ++it, ++i);
    

    This clearly takes O(n) time. But we know that the distance from the root in a binary search tree as implemented by set internally can be found in O(log n) time.

    Is their any way to implement the same to find the index in O(log n) time in C++ set?

  • Xeo
    Xeo about 11 years
    That's O(log n + m), though. But the best you can do, AFAIK.
  • juanchopanza
    juanchopanza about 11 years
    But std::distance is O(N) here.
  • divanshu
    divanshu about 11 years
    I know about std::distance but this is implemented the same way as written in the question and definitely is O(n).
  • divanshu
    divanshu about 11 years
    But sorting after every insertion in std::vector would cost me O(nlogn). Where's the advantage?
  • PiotrNycz
    PiotrNycz about 11 years
    1) You can sort only after a series of consecutive insertions. 2) Cost of insertion in std::set<> is O(log n) - n insertions: O(n Log n). 3) Maybe you insert once - but testing distance many times....
  • chb
    chb over 4 years
    Rather than link to an external resource, please excerpt the relevant portion of it and include it in your answer.
  • Rithik Singh
    Rithik Singh over 4 years
    insert is a O(n) function in vector
  • vSzemkel
    vSzemkel about 2 years
    And remember to use specialized version of lower_bound for containers that implements it. Newer use std::lower_bound for ordered (multi)map and (multi)sets
  • Chris Vilches
    Chris Vilches almost 2 years
    Very good code, and I could solve one problem I had where the bottleneck was std::set's distance being O(N). This worked like magic!