C++ How to merge sorted vectors into a sorted vector / pop the least element from all of them?

c++ sorting vector mergesort processing-efficiency

11,117

One option is to use a std :: priority queue to maintain a heap of iterators, where the iterators bubble up the heap depending on the values they point at.

You could also consider using repeating applications of std :: inplace_merge. This would involve appending all the data together into a big vector and remembering the offsets at which each distinct sorted block begins and ends, and then passing those into inplace_merge. This would probably be faster then the heap solution, although I think fundamentally the complexity is equivalent.

Update: I've implemented the second algorithm I just described. Repeatedly doing a mergesort in place. This code is on ideone.

This works by first concatenating all the sorted lists together into one long list. If there were three source lists, this means there are four 'offsets', which are four points in the full list between which the elements are sorted. The algorithm will then pull off three of these at a time, merging the two corresponding adjacent sorted lists into one sorted list, and then remembering two of those three offsets to be used in the new_offsets.

This repeats in a loop, with pairs of adjacent sorted ranges merged together, until only one sorted range remains.

Ultimately, I think the best algorithm would involve merging the shortest pairs of adjacent ranges together first.

// http://stackoverflow.com/questions/9013485/c-how-to-merge-sorted-vectors-into-a-sorted-vector-pop-the-least-element-fro/9048857#9048857
#include <iostream>
#include <vector>
#include <algorithm>
#include <cassert>
using namespace std;

template<typename T, size_t N>
vector<T> array_to_vector( T(*array)[N] ) { // Yes, this works. By passing in the *address* of
                                            // the array, all the type information, including the
                                            // length of the array, is known at compiler. 
        vector<T> v( *array, &((*array)[N]));
        return v;
}   

void merge_sort_many_vectors() {

    /* Set up the Sorted Vectors */ 
    int a1[] = { 2, 10, 100};
    int a2[] = { 5, 15, 90, 200};
    int a3[] = { 12 };

    vector<int> v1  = array_to_vector(&a1);
    vector<int> v2  = array_to_vector(&a2);
    vector<int> v3  = array_to_vector(&a3);


    vector<int> full_vector;
    vector<size_t> offsets;
    offsets.push_back(0);

    full_vector.insert(full_vector.end(), v1.begin(), v1.end());
    offsets.push_back(full_vector.size());
    full_vector.insert(full_vector.end(), v2.begin(), v2.end());
    offsets.push_back(full_vector.size());
    full_vector.insert(full_vector.end(), v3.begin(), v3.end());
    offsets.push_back(full_vector.size());

    assert(full_vector.size() == v1.size() + v2.size() + v3.size());

    cout << "before:\t";
    for(vector<int>::const_iterator v = full_vector.begin(); v != full_vector.end(); ++v) {
            cout << ", " << *v;
    }       
    cout << endl;
    while(offsets.size()>2) {
            assert(offsets.back() == full_vector.size());
            assert(offsets.front() == 0);
            vector<size_t> new_offsets;
            size_t x = 0;
            while(x+2 < offsets.size()) {
                    // mergesort (offsets[x],offsets[x+1]) and (offsets[x+1],offsets[x+2])
                    inplace_merge(&full_vector.at(offsets.at(x))
                                 ,&full_vector.at(offsets.at(x+1))
                                 ,&(full_vector[offsets.at(x+2)]) // this *might* be at the end
                                 );
                    // now they are sorted, we just put offsets[x] and offsets[x+2] into the new offsets.
                    // offsets[x+1] is not relevant any more
                    new_offsets.push_back(offsets.at(x));
                    new_offsets.push_back(offsets.at(x+2));
                    x += 2;
            }
            // if the number of offsets was odd, there might be a dangling offset
            // which we must remember to include in the new_offsets
            if(x+2==offsets.size()) {
                    new_offsets.push_back(offsets.at(x+1));
            }
            // assert(new_offsets.front() == 0);
            assert(new_offsets.back() == full_vector.size());
            offsets.swap(new_offsets);

    }
    cout << "after: \t";
    for(vector<int>::const_iterator v = full_vector.begin(); v != full_vector.end(); ++v) {
            cout << ", " << *v;
    }
    cout << endl;
}

int main() {
        merge_sort_many_vectors();
}

11,117

Deniz

Python, C++

Updated on June 04, 2022

Comments

Deniz almost 2 years

I have a collection of about a hundred or so sorted vector<int>'s Although most vectors have a small number of integers in them, some of the vectors contain a large (>10K) of them (thus the vectors don't necessarily have the same size).

What I'd like to do essentially iterate through smallest to largest integer, that are contained in all these sorted vectors.

One way to do it would be to merge all these sorted vectors into a sorted vector & simply iterate. Thus,

Question 1: What is the fastest way to merge sorted vectors into a sorted vector?

I'm sure on the other hand there are faster / clever ways to accomplish this without merging & re-sorting the whole thing -- perhaps popping the smallest integer iteratively from this collection of sorted vectors; without merging them first.. so:

Question 2: What is the fasted / best way to pop the least element from a bunch of sorted vector<int>'s?

Based on replies below, and the comments to the question I've implemented an approach where I make a priority queue of iterators for the sorted vectors. I'm not sure if this is performance-efficient, but it seems to be very memory-efficient. I consider the question still open, since I'm not sure we've established the fastest way yet.

// compare vector pointers by integers pointed
struct cmp_seeds {
    bool operator () (const pair< vector<int>::iterator, vector<int>::iterator> p1, const pair< vector<int>::iterator, vector<int>::iterator> p2) const {
        return *(p1.first) >  *(p2.first);      
    }
};

int pq_heapsort_trial() {

    /* Set up the Sorted Vectors */ 
    int a1[] = { 2, 10, 100};
    int a2[] = { 5, 15, 90, 200};
    int a3[] = { 12 };

    vector<int> v1 (a1, a1 + sizeof(a1) / sizeof(int));
    vector<int> v2 (a2, a2 + sizeof(a2) / sizeof(int));
    vector<int> v3 (a3, a3 + sizeof(a3) / sizeof(int));

    vector< vector <int> * > sorted_vectors;
    sorted_vectors.push_back(&v1);
    sorted_vectors.push_back(&v2);
    sorted_vectors.push_back(&v3);
    /* the above simulates the "for" i have in my own code that gives me sorted vectors */

    pair< vector<int>::iterator, vector<int>::iterator> c_lead;
    cmp_seeds mycompare;

    priority_queue< pair< vector<int>::iterator, vector<int>::iterator>, vector<pair< vector<int>::iterator, vector<int>::iterator> >, cmp_seeds> cluster_feeder(mycompare);


    for (vector<vector <int> *>::iterator k = sorted_vectors.begin(); k != sorted_vectors.end(); ++k) {
        cluster_feeder.push( make_pair( (*k)->begin(), (*k)->end() ));
    }


    while ( cluster_feeder.empty() != true) {
        c_lead = cluster_feeder.top();
        cluster_feeder.pop();
        // sorted output
        cout << *(c_lead.first) << endl;

        c_lead.first++;
        if (c_lead.first != c_lead.second) {
            cluster_feeder.push(c_lead);
        }
    }

    return 0;
}

Deniz over 12 years

thanks Aaron, implemented the first suggestion and posted code -- any suggestions? If i get around to doing the inplace_merge will update again.
Aaron McDaid over 12 years

@Deniz, your priority_queue algorithm looks good. I've now updated my answer here to include an implementation of my second algorithm, where pairs of adjacent sorted ranges are repeatedly merge-sorted together until only one range remains.
SyncMaster almost 12 years

@AaronMcDaid I tried the above program with different inputs and the results were not in sorted order. Input: int a1[] = { 30, 50, 3, 8}; int a2[] = { 11, 14, 19, 6, 8, 30}; int a3[] = { 8, 6 }; Output: 11, 14, 19, 6, 8, 30, 30, 50, 3, 8, 6, 8
Aaron McDaid almost 12 years

@SyncMaster, the question assumes that the input vectors are already sorted. But each vector that you have supplied is not already sorted. So I think my program is still correct for the question. If the goal simply was to merge a number of unsorted vectors, then the solution is simply to concatenate the vectors and then run a standard std::sort on it. But the goal here is to use the fact that the inputs are already sorted, and use this fact to get a faster sorting.
Eric Roller over 5 years

this has a bug when one of the input sorted vectors is empty. you can just check for this upfront before adding to cluster_feeder