indices of the k largest elements in an unsorted length n array

16,169

Solution 1

Here is my implementation that does what I want and I think is reasonably efficient:

#include <queue>
#include <vector>
// maxindices.cc
// compile with:
// g++ -std=c++11 maxindices.cc -o maxindices
int main()
{
  std::vector<double> test = {0.2, 1.0, 0.01, 3.0, 0.002, -1.0, -20};
  std::priority_queue<std::pair<double, int>> q;
  for (int i = 0; i < test.size(); ++i) {
    q.push(std::pair<double, int>(test[i], i));
  }
  int k = 3; // number of indices we need
  for (int i = 0; i < k; ++i) {
    int ki = q.top().second;
    std::cout << "index[" << i << "] = " << ki << std::endl;
    q.pop();
  }
}

which gives output:

index[0] = 3
index[1] = 1
index[2] = 0

Solution 2

This should be an improved version of @hazelnusse which is executed in O(nlogk) instead of O(nlogn)

#include <queue>
#include <iostream>
#include <vector>
// maxindices.cc
// compile with:
// g++ -std=c++11 maxindices.cc -o maxindices
int main()
{
  std::vector<double> test = {2, 8, 7, 5, 9, 3, 6, 1, 10, 4};
  std::priority_queue< std::pair<double, int>, std::vector< std::pair<double, int> >, std::greater <std::pair<double, int> > > q;
    int k = 5; // number of indices we need
  for (int i = 0; i < test.size(); ++i) {
    if(q.size()<k)
        q.push(std::pair<double, int>(test[i], i));
    else if(q.top().first < test[i]){
        q.pop();
        q.push(std::pair<double, int>(test[i], i));
    }
  }
  k = q.size();
  std::vector<int> res(k);
  for (int i = 0; i < k; ++i) {
    res[k - i - 1] = q.top().second;
    q.pop();
  }
  for (int i = 0; i < k; ++i) {
    std::cout<< res[i] <<std::endl;
  }
}

8 4 1 2 6

Solution 3

The question has the partial answer; that is std::nth_element returns the "the n-th statistic" with a property that none of the elements preceding nth one are greater than it, and none of the elements following it are less.

Therefore, just one call to std::nth_element is enough to get the k largest elements. Time complexity will be O(n) which is theoretically the smallest since you have to visit each element at least one time to find the smallest (or in this case k-smallest) element(s). If you need these k elements to be ordered, then you need to order them which will be O(k log(k)). So, in total O(n + k log(k)).

Solution 4

You can use the basis of the quicksort algorithm to do what you need, except instead of reordering the partitions, you can get rid of the entries falling out of your desired range.

It's been referred to as "quick select" and here is a C++ implementation:

int partition(int* input, int p, int r)
{
    int pivot = input[r];

    while ( p < r )
    {
        while ( input[p] < pivot )
            p++;

        while ( input[r] > pivot )
            r--;

        if ( input[p] == input[r] )
            p++;
        else if ( p < r ) {
            int tmp = input[p];
            input[p] = input[r];
            input[r] = tmp;
        }
    }

    return r;
}

int quick_select(int* input, int p, int r, int k)
{
    if ( p == r ) return input[p];
    int j = partition(input, p, r);
    int length = j - p + 1;
    if ( length == k ) return input[j];
    else if ( k < length ) return quick_select(input, p, j - 1, k);
    else  return quick_select(input, j + 1, r, k - length);
}

int main()
{
    int A1[] = { 100, 400, 300, 500, 200 };
    cout << "1st order element " << quick_select(A1, 0, 4, 1) << endl;
    int A2[] = { 100, 400, 300, 500, 200 };
    cout << "2nd order element " << quick_select(A2, 0, 4, 2) << endl;
    int A3[] = { 100, 400, 300, 500, 200 };
    cout << "3rd order element " << quick_select(A3, 0, 4, 3) << endl;
    int A4[] = { 100, 400, 300, 500, 200 };
    cout << "4th order element " << quick_select(A4, 0, 4, 4) << endl;
    int A5[] = { 100, 400, 300, 500, 200 };
    cout << "5th order element " << quick_select(A5, 0, 4, 5) << endl;
}

OUTPUT:

1st order element 100
2nd order element 200
3rd order element 300
4th order element 400
5th order element 500

EDIT

That particular implementation has an O(n) average run time; due to the method of selection of pivot, it shares quicksort's worst-case run time. By optimizing the pivot choice, your worst case also becomes O(n).

Solution 5

The standard library won't get you a list of indices (it has been designed to avoid passing around redundant data). However, if you're interested in n largest elements, use some kind of partitioning (both std::partition and std::nth_element are O(n)):

#include <iostream>
#include <algorithm>
#include <vector>

struct Pred {
    Pred(int nth) : nth(nth) {};
    bool operator()(int k) { return k >= nth; }
    int nth;
};

int main() {

    int n = 4;
    std::vector<int> v = {5, 12, 27, 9, 4, 7, 2, 1, 8, 13, 1};

    // Moves the nth element to the nth from the end position.
    std::nth_element(v.begin(), v.end() - n, v.end());

    // Reorders the range, so that the first n elements would be >= nth.
    std::partition(v.begin(), v.end(), Pred(*(v.end() - n)));

    for (auto it = v.begin(); it != v.end(); ++it)
        std::cout << *it << " ";
    std::cout << "\n";

    return 0;
}
Share:
16,169
Luke Peterson
Author by

Luke Peterson

Updated on June 14, 2022

Comments

  • Luke Peterson
    Luke Peterson almost 2 years

    I need to find the indices of the k largest elements of an unsorted, length n, array/vector in C++, with k < n. I have seen how to use nth_element() to find the k-th statistic, but I'm not sure if using this is the right choice for my problem as it seems like I would need to make k calls to nth_statistic, which I guess it would have complexity O(kn), which may be as good as it can get? Or is there a way to do this just in O(n)?

    Implementing it without nth_element() seems like I will have to iterate over the whole array once, populating a list of indices of the largest elements at each step.

    Is there anything in the standard C++ library that makes this a one-liner or any clever way to implement this myself in just a couple lines? In my particular case, k = 3, and n = 6, so efficiency isn't a huge concern, but it would be nice to find a clean and efficient way to do this for arbitrary k and n.

    It looks like Mark the top N elements of an unsorted array is probably the closest posting I can find on SO, the postings there are in Python and PHP.

  • Luke Peterson
    Luke Peterson about 11 years
    I specifically need the indices.
  • amdn
    amdn about 11 years
    I timed an implementation using nth_element and one with partial_sort and using a custom comparator... your implementation is faster.
  • Jim Mischel
    Jim Mischel about 11 years
    There's no need to add all the items to the priority queue. That makes the algorithm O(n log n). It can be done in O(n log k) if you don't add things that are smaller than the smallest item already on the queue. See stackoverflow.com/q/7746648/56778 for discussion.
  • Ziyuan
    Ziyuan about 11 years
    @hazelnusse You can define a structure type for your elements, storing both value and the original index, and meanwhile define the comparator for it.
  • spurra
    spurra over 9 years
    @JimMischel Perhaps I'm missing something, but as far as I can see, if I only add elements which are bigger than the smallest element in the queue I may end up missing some of the k-top elements. E.g if the first element I add into the priority queue is the maximum element, it is at the same time the smallest element in the queue and would result in the algorithm not adding any additional elements.
  • Jim Mischel
    Jim Mischel over 9 years
    @BananaCode: If you look at the linked answer, you'll see that you initially populate the priority queue with the first k elements. Then you use the only-add-if-larger-than-smallest rule on the remaining elements.
  • Asad Saeeduddin
    Asad Saeeduddin over 8 years
    This finds the k largest elements, whereas the OP's requirement is finding the k largest indices.
  • Halil  Sen
    Halil Sen over 8 years
    Well, you are right and (looking at the question again) I don't know why I gave this answer in the first place and why people up-voted it. But most probably, they misunderstood the question just like me, and apparently, this answer helped them in some way so I will keep it like this.
  • Oleg Shirokikh
    Oleg Shirokikh about 7 years
    What about the case when there's a tie(s) with k-th largest element? Would be nice to have this extension to your method.