merge vector into existing vector

10,260

Solution 1

I would at least try:

std::vector<T> tmp;
tmp.reserve(src.size() + dst.size()); // commenters are probably right about this
std::merge(src.begin(), src.end(), dst.begin(), dst.end(), std::back_inserter(tmp));
src.swap(tmp);

But I suspect that much depends on the nature of T, the size of src and dst, and why we need to optimize.

Solution 2

It can be done more efficiently if T is heavy to copy and your compiler supports C++0x.

#include <iterator> // for make_move_iterator

size_t n = dst.size();

dst.insert(dst.end(),
    std::make_move_iterator(src.begin()),
    std::make_move_iterator(src.end()));

std::inplace_merge(dst.begin(), dst.begin() + n, dst.end());

Using make_move_iterator() will cause insert() to move the contents of src into dst instead of copying them.

Update:

You're dealing with POD types and you're already resizing/copying everything in the dst vector in the likely case that insert() overflows the reserve, so it could be faster to just use std::merge() into a new vector. This would avoid that initial copy AND have a better worst-case complexity:

inplace_merge() has a best case of O(n) complexity, but degrades into a worst-case O(n log n) depending on your data.

merge() has a worst-case O(n) so it is guaranteed to be at least as fast, potentially much faster. It also has move optimization built-in.

Share:
10,260
zwol
Author by

zwol

If you want to know about me, please see my website: https://www.owlfolio.org/ . DO NOT CONTACT ME WITH ANY SORT OF JOB OFFER. The policy of treating comment threads as ephemeral and "not for extended discussion" is wrong, is actively harmful to the community, and I will not cooperate with it in any way. Using people's preferred pronouns is basic courtesy and it is reasonable for the code of conduct to insist on it. (I go by "he", for the record.)

Updated on July 18, 2022

Comments

  • zwol
    zwol almost 2 years

    In C++, given vector<T> src, dst, both already sorted, is there a more efficient way to merge the contents of src into dst than

    size_t n = dst.size();
    dst.insert(dst.end(), src.begin(), src.end());
    std::inplace_merge(dst.begin(), dst.begin() + n, dst.end());
    

    ? In the case I care about, T is a small (12-16 bytes, depending on ABI) POD structure, but each vector contains millions of elements, so the total amount of memory in play is tens to hundreds of megabytes.

  • Oliver Charlesworth
    Oliver Charlesworth almost 13 years
    To avoid growing tmp with a back_inserter, why not construct it as std::vector<T> tmp(src.size() + dst.size())? Or use reserve?
  • pmr
    pmr almost 13 years
    @Oli, I'd rather use reserve in case default construction is expensive.
  • Kerrek SB
    Kerrek SB almost 13 years
    That would construct lots of elements -- why not just reserve the vector before the merging?
  • Karl Knechtel
    Karl Knechtel almost 13 years
    Sure, .reserve() is a good idea here... edited accordingly.
  • Kerrek SB
    Kerrek SB almost 13 years
    It should be dst.end() in the final line, and you can probably make n const.
  • Steve Jessop
    Steve Jessop almost 13 years
    And if T is heavy to copy but your compiler doesn't support move semantics, you could resize dst and then swap each element of src into place. Obviously T needs an optimized swap. And I've never used inplace_merge, but it looks as though it does a lot of assignment and/or copying, which might kill performance anyway in the absence of move.
  • zwol
    zwol almost 13 years
    In the case I care about, T is a small (12 bytes, possibly padded to 16 in a vector) POD structure, so it seems to me that copy and move will both degenerate to memcpy.
  • Steve Jessop
    Steve Jessop almost 13 years
    And if T is a nasty type, expensive to copy, then you could unpack merge and change it to default-construct and then swap from the orginal, rather than copying. C++0x has a push_back(T &&), but that won't be used without help since the thing being pushed isn't necessarily disposable, so combine this answer with Cory's.
  • zwol
    zwol almost 13 years
    I might try this, but the vectors in question are potentially very large (each element is small, but there could be millions) so I expect that would wind up hurting more than it helps.
  • Cory Nelson
    Cory Nelson almost 13 years
    Yes. In that case you can still try the merge suggestion in the other answer. He doesn't say it, but merge can actually be more efficient depending on your data: inplace_merge is O(n) in the best case, and O(n log n) in the worst. merge is always O(n).
  • zwol
    zwol almost 13 years
    The vectors potentially have millions of elements, I don't think copying everything to a new vector is a good idea.
  • Steve Jessop
    Steve Jessop almost 13 years
    @Zack: you're going to copy (nearly) everything anyway, unless you're really lucky and (nearly) all of src is greater than (nearly) all of dst. The difference is that inplace_merge (usually) copies everything more than once, and (if necessary) uses less RAM as working space.
  • zwol
    zwol over 12 years
    Months later: this was, in fact, an improvement over what I had, although not as much as I was hoping. An improvement of the magnitude I wanted probably requires a smarter algorithm, and I'm no longer working on that particular problem, so this'll do for the accepted answer.
  • Admin
    Admin almost 11 years
    @KarlKnechtel Why it give me error on std::back_inserter(tmp)); as i am using it with string , std::vector<string> tmp; , Error : std ahs member back_inserter
  • Karl Knechtel
    Karl Knechtel almost 11 years
    It comes from <iterator>.