C++ STL: Custom sorting one vector based on contents of another

29,799

Solution 1

Obvious Approach

Instead of creating two separate vectors/lists, the usual way to handle this is to create a single vector/list of objects that include both names and ages:

struct person { 
    std::string name;
    int age;
};

To get a sort based on age, pass a comparator that looks at the ages:

std::sort(people.begin(), people.end(), 
          [](auto const &a, auto const &b) { return a.age < b.age; });

In older C++ (pre C++11, so no lambda expressions) you can define the comparison as a member overload of operator< or else as a function-object (an object that overloads operator()) to do the comparison:

struct by_age { 
    bool operator()(person const &a, person const &b) const noexcept { 
        return a.age < b.age;
    }
};

Then your sort would look something like:

std::vector<person> people;
// code to put data into people goes here.

std::sort(people.begin(), people.end(), by_age());

As for choosing between defining operator< for the class, or using a separate comparator object as I show above, it's mostly a question of whether there's a single ordering that's "obvious" for this class.

In my opinion, it's not necessarily obvious that sorting people would always happen by age. If, however, in the context of your program it would be obvious that sorting people would be done by age unless you explicitly specified otherwise, then it would make sense to implement the comparison as person::operator< instead of in a separate comparison class the way I've done it above.

Other Approaches

All that having been said, there are a few cases where it really is impractical or undesirable to combine the data into a struct before sorting.

If this is the case, you have a few options to consider. If a normal sort is impractical because the key you're using is too expensive to swap (or can't be swapped at all, though that's pretty rare), you might be able to use a type where you store the data to be sorted along with just an index into the collection of keys associated with each:

using Person = std::pair<int, std::string>;

std::vector<Person> people = {
    { "Anne", 0},
    { "Bob", 1},
    { "Charlie", 2},
    { "Douglas", 3}
};

std::vector<int> ages = {23, 28, 25, 21};

std::sort(people.begin(), people.end(), 
    [](Person const &a, person const &b) { 
        return Ages[a.second] < Ages[b.second];
    });

You can also pretty easily create a separate index that you sort in the order of the keys, and just use that index to read through the associated values:

std::vector<std::string> people = { "Anne", "Bob", "Charlie", "Douglas" };   
std::vector<int> ages = {23, 28, 25, 21};

std::vector<std::size_t> index (people.size());
std::iota(index.begin(), index.end(), 0);

std::sort(index.begin(), index.end(), [&](size_t a, size_t b) { return ages[a] < ages[b]; });

for (auto i : index) { 
    std::cout << people[i] << "\n";
}

Note, however, that in this case, we haven't really sorted the items themselves at all. We've just sorted the index based on the ages, then used the index to index into the array of data we wanted sorted--but both the ages and names remain in their original order.

Of course, it's theoretically possible that you have such a bizarre situation that none of the above will work at all, and you'll need to re-implement sorting to do what you really want. While I suppose the possibility could exist, I've yet to see it in practice (nor do I even recall seeing a close call where I almost decided that was the right thing to do).

Solution 2

As others have noted, you should consider grouping People and Ages.

If you can't/don't want to, you could create an "index" to them, and sort that index instead. For example:

// Warning: Not tested
struct CompareAge : std::binary_function<size_t, size_t, bool>
{
    CompareAge(const std::vector<unsigned int>& Ages)
    : m_Ages(Ages)
    {}

    bool operator()(size_t Lhs, size_t Rhs)const
    {
        return m_Ages[Lhs] < m_Ages[Rhs];
    }

    const std::vector<unsigned int>& m_Ages;
};

std::vector<std::string> people = ...;
std::vector<unsigned int> ages = ...;

// Initialize a vector of indices
assert(people.size() == ages.size());
std::vector<size_t> pos(people.size());
for (size_t i = 0; i != pos.size(); ++i){
    pos[i] = i;
}


// Sort the indices
std::sort(pos.begin(), pos.end(), CompareAge(ages));

Now, the name of the nth person is people[pos[n]] and its age is ages[pos[n]]

Solution 3

Generally you wouldn't put data that you want to keep together in different containers. Make a struct/class for Person and overload operator<.

struct Person
{
    std::string name;
    int age;
}

bool operator< (const Person& a, const Person& b);

Or if this is some throw-away thing:

typedef std::pair<int, std::string> Person;
std::vector<Person> persons;
std::sort(persons.begin(), persons.end());

std::pair already implement comparison operators.

Solution 4

It doesn't make sense to keep them in two separate data structures: if you reorder People, you no longer have a sensible mapping to Ages.

template<class A, class B, class CA = std::less<A>, class CB = std::less<B> >
struct lessByPairSecond
    : std::binary_function<std::pair<A, B>, std::pair<A, B>, bool>
{
    bool operator()(const std::pair<A, B> &left, const std::pair<A, B> &right) {
        if (CB()(left.second, right.second)) return true;
        if (CB()(right.second, left.second)) return false;
        return CA()(left.first, right.first);
    }
};

std::vector<std::pair<std::string, int> > peopleAndAges;
peopleAndAges.push_back(std::pair<std::string, int>("Anne", 23));
peopleAndAges.push_back(std::pair<std::string, int>("Bob", 23));
peopleAndAges.push_back(std::pair<std::string, int>("Charlie", 23));
peopleAndAges.push_back(std::pair<std::string, int>("Douglas", 23));
std::sort(peopleAndAges.begin(), peopleAndAges.end(),
        lessByPairSecond<std::string, int>());
Share:
29,799
Admin
Author by

Admin

Updated on June 11, 2020

Comments

  • Admin
    Admin almost 4 years

    This is probably best stated as an example. I have two vectors/lists:

    People = {Anne, Bob, Charlie, Douglas}
    Ages   = {23, 28, 25, 21}
    

    I want to sort the People based on their ages using something like sort(People.begin(), People.end(), CustomComparator), but I don't know how to write the CustomComparator to look at Ages rather than People.

  • Xavier Nodet
    Xavier Nodet over 14 years
    But would yhe default operator< for a pair sort on the second member, as the questions ask? I doubt it...
  • Jerry Coffin
    Jerry Coffin over 14 years
    @Xavier: he made the int the first member of the std::pair.
  • jm1234567890
    jm1234567890 almost 14 years
    This line should be std::sort(people.begin(), people.end(), by_age); I think there shouldn't be parentheses after "by_age"
  • Jerry Coffin
    Jerry Coffin almost 14 years
    @jm1234567890: not so. by_age is a class, and what we need is an object. The parens mean we're invoking its ctor, so we're passing an object of that class to do the comparison for sorting. If by_age were a function, however, you'd be absolutely correct.
  • pretzlstyle
    pretzlstyle over 5 years
    Thanks a bunch. Just a note: some people here are treating "people and ages should be in the same container" as an answer, when it isn't, and there are legitimate reasons to want to be able to do what OP has asked.
  • Kari
    Kari over 4 years
    +1. By the way what's the purpose of inheriting from binary_function<size_t, size_t>, is it possible to remove this inheritance, and still use CompareAge?
  • Éric Malenfant
    Éric Malenfant over 4 years
    @Kari: binary_function is deprecated since C++11. Prior to that, its purpose was to provide typedefs that were required by some function adaptors. Since no such adaptors are used here, it was not required but it was common practice to define functors like this to make sure that they were compatible with the adaptors.
  • mabraham
    mabraham over 4 years
    Sure it can make sense, like if Ages is rarely used you pay for it's existence in increased cache pressure every time you refer to People