Most efficient way of erasing/deleting multiple std::vector elements while retaining original order?
Solution 1
In `<algorithm>` there is a `remove_if` function, which squeezes all the values not removed to the front while maintaining their order. This works if those 200 elements can be determined purely from the values, not from their indexes.

This is essentially the Erase-remove idiom you have linked to. `remove_if` is guaranteed to perform O(N) comparisons (and at most O(N) copies), which is more efficient than sorting (O(N log N)), although your last option doesn't actually require sorting if the indexes are determined from the values (just scan in the reverse direction while copying).

Nevertheless, using `remove_if` (if you can) is better than the other two options, because the implementation has already been written for you, so there is less chance of logical error, and it conveys better what (not how) to do.
Solution 2
How about looping through the vector, and for each element that needs to be removed, copying the next element that doesn't need to be removed into that position? Then when you get to the end, truncate it.
```cpp
int last = 0;
for (int i = 0; i < vec.size(); ++i, ++last)
{
    while (needs_to_be_removed(i))
        ++i;
    if (i >= vec.size()) break;
    vec[last] = vec[i];
}
vec.resize(last);
```
Solution 3
First thing: don't call `erase` more times than you have to, because for a vector it shuffles all the later elements down, giving the whole operation an Ω(n*m) worst-case run time (n the size of the vector, m the size of the list of indexes to remove).
I think the first thing I'd try would be similar to your current code:

- Sort the indexes.
- Create a new vector of size n - m.
- Iterate over the original vector, copying `indexes[0]` elements, skipping an element, then copying `indexes[1] - indexes[0] - 1` elements, skipping an element, and so on.
- `swap` the original vector with the new one.
You might be able to do the third step with `remove_copy_if` and a predicate which contains state (counting how many items it has copied and how far it is through the sorted list of indexes), but for extremely tedious and obscure reasons this isn't guaranteed to work (algorithm predicates with mutable state are problematic; the consensus seems to be that the standard doesn't guarantee the same copy of the predicate is used throughout the algorithm). So I really don't advise trying it, but it might help to bear in mind that what you're writing basically is a modified version of `remove_copy_if`.
You could avoid the second step by using a `back_inserter` rather than pre-sizing the vector, although you'd presumably still `reserve` the space in advance.
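The steps above might look like the following sketch. The name `erase_indexes_copy` is made up for illustration; `indexes` is assumed sorted and duplicate-free, and `reserve` plus `push_back` stands in for pre-sizing the new vector:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: build a new vector without the elements at the
// given sorted, unique indexes, preserving the survivors' order.
std::vector<int> erase_indexes_copy(const std::vector<int>& vec,
                                    const std::vector<std::size_t>& indexes)
{
    std::vector<int> out;
    out.reserve(vec.size() - indexes.size());  // the "size n - m" step
    std::size_t k = 0;                         // next doomed index
    for (std::size_t i = 0; i < vec.size(); ++i) {
        if (k < indexes.size() && indexes[k] == i)
            ++k;                               // skip a doomed element
        else
            out.push_back(vec[i]);             // copy a survivor
    }
    return out;                                // caller swaps/moves it in
}
```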
[Edit: come to think of it, why am I copying anything? Rather than implementing a modified `remove_copy_if`, implement a modified `remove_if` and just copy to an earlier point in the vector, then `erase`/`resize` at the end. I wouldn't worry about the O(m log m) sort of the indexes until it's proven to be a problem, because it's unlikely to be significantly slower than the Ω(m) operation to read all the values to be removed and store them in some kind of container. Then, using this container in the predicate to `remove_if` may or may not be O(1). Sorting might turn out faster for plausible values of m.]
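A sketch of the edited idea: shift the survivors to earlier points in the vector in place, then resize. `erase_indexes` is an illustrative name, and the indexes are assumed unique (they are sorted internally, with the vector taken by value so the caller's copy is untouched):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical in-place variant: one O(m log m) sort plus one O(n) pass.
void erase_indexes(std::vector<int>& vec, std::vector<std::size_t> indexes)
{
    std::sort(indexes.begin(), indexes.end());
    std::size_t last = 0;  // next write position for a survivor
    std::size_t k = 0;     // next doomed index to skip
    for (std::size_t i = 0; i < vec.size(); ++i) {
        if (k < indexes.size() && indexes[k] == i)
            ++k;                   // skip, don't copy
        else
            vec[last++] = vec[i];  // copy survivor to an earlier point
    }
    vec.resize(last);  // the erase/resize at the end
}
```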
Solution 4
You can copy all elements of the vector to a list unless their index is in your second container, and then copy back to a vector. Even with your algorithm of going from the end of the vector to the front, there's a lot of work going on behind the scenes in your vector.

Make your second container a map, so it keeps the indexes sorted for you automatically.
edit:
To respond to the comment
The cost of maintaining a map is, in the worst case, the same as maintaining another structure (list or vector) and then sorting it. If you're already doing that, you might as well keep it as a map. It doesn't make sense to complain about the overhead of a map versus the overhead of sorting a list.

As for the performance of my suggested algorithm: if m is the number of elements to be deleted and n is the total number of elements, then it results in O(n - m).
Of course, this is mostly just humoring your attempt to optimize with a vector.
1 - You shouldn't be using a vector if you want to do random-access deletes. That's not what they're good at; use a list if at all possible. And since you seem to be much more interested in relative order than in absolute index, I am wondering why a vector is needed at all. If you gave the entire problem, there's probably a common solution that lets you use the most efficient data structure to solve it.

2 - Instead of maintaining a second data structure, mark elements that need to be deleted directly in their container. A trivial way: instead of using a `container<T>`, use a `container<std::pair<T, char>>` and use the char to keep track of the element's status.
If you do 1 and 2, you remove all copying completely and get a much more efficient implementation.
Solution 5
Elements of what? Maybe I'm taking your post too seriously, but if you have a vector of 1000 elements, why not mark the ones that are no longer valid and do away with erasing in the first place? Obviously I'm making an assumption here that your elements are not demanding a lot of memory.

I only bring this up because you seem to be concerned with speed. If the suggestions already given don't do the trick, maybe this idea is worth a thought! In essence, speed things up by not doing the operation in the first place.
sascha
Updated on January 01, 2021

Comments
-
sascha over 3 years

I have a `std::vector<int>` and a second container holding iterators or indexes (no keys; I want constant access to the elements) into this vector for deletion purposes. Let's assume I have a vector of 1000 elements and want to erase 200 of them. The order of the non-removed elements should be the same after the deletion operations as before. One more thing I missed in the first version of my question: the values are unique. They are identities.

How would you do that in a safe (regarding the STL rules) and efficient manner (the decision for a vector shall be final)?
Possibilities or methods I thought about:

- The erase-remove idiom (http://en.wikipedia.org/wiki/Erase-remove_idiom): originally for the deletion of elements which fulfill a condition (including a linear search), but I think with ranges of size 1 this method could be used with already-given iterators and a dummy condition. Question: is the original order of elements kept, and is it more performant than the last method?
- Loop over the indexes and erase the elements with `vector.erase(vector.begin() + index + offset)`, keeping the removed indexes in a container for calculating the offset. This offset could be determined for every remove iteration with a `std::lower_bound` on the container of already-removed elements. The problem: a lot of binary searches for getting the offset, and a lot of move operations because of random-location deletion.
- At the moment I'm doing the following: get all the iterators for the elements to remove, sort them in descending order according to their location in the vector, and loop over them for the final deletion with `vector.erase`. Now I'm not invalidating any iterator and there are no vector rearrange operations except for the deletion itself. The problem: a lot of sorting.
So, how would you tackle this? Any new ideas? Any recommendations?
Thanks for your input.
Sascha
Edit / Update / Own results: I implemented the erase-remove idiom, which was also mentioned by KennyTM, with a predicate based on a lookup in a `boost::dynamic_bitset`, and it's insanely fast. Furthermore, I tried PigBen's move-and-truncate method (also mentioned by Steve Jessop), which also accesses the bitset in its while-loop. Both seem to be equally fast with my kind of data. I tried to delete 100 of 1000 elements (unsigned ints) and did these 100 deletes 1M times, and there was no significant difference. Because I think the STL-based erase-remove idiom is kinda more "natural", I'm choosing this method (an argument also mentioned by KennyTM).
-
sascha over 13 years

Okay, thanks for your answer. Now my only problem is that I don't have a "fast" predicate for the `remove_if` function, but "already decided" indexes/iterators. It seems that I have to convert my decisions to a bitset which tells me, for each element, whether a remove should be done, so I can use a predicate which runs in constant time. O(n) more work, but asymptotically it should be way better. Maybe I should rewrite my "decision algorithm" to output a bitset. Thanks again for your answer, especially for mentioning that the original order of elements is kept.
-
sascha over 13 years

Doesn't seem too efficient to me, but I could be wrong. How does this work you mention compare to the asymptotic guarantees that KennyTM gave in his answer? If you're talking about resizing operations, then I would say there are no ops before the final deletions (because remove is only swapping). And a map always brings a huge overhead (personal opinion; no big fan of balanced search trees).
-
sascha over 13 years

Still not convinced. I don't need the sorting step anymore (with the erase-remove idiom), so the asymptotic argument for holding a map is gone. And as for "don't use a vector": I wrote that this decision is final. I know the advantages and disadvantages of a vector (at least I think so). The reason for using a vector: raw iteration performance and cache efficiency (which is bad for list AND map, of course). I do a lot more iterating than deleting. My question is: do you think your solution is faster than the erase-remove idiom where I have a constant-time predicate?
-
sascha over 13 years

That sounds simple. I will try it. Thanks for your answer.
-
sascha over 13 years

Your edited solution kinda sounds like the one proposed by PigBen, or did I misunderstand? So there is no more sorting involved, or is there?
-
Steve Jessop over 13 years

@sascha: yes, it's pretty similar. I hadn't seen PigBen's answer. The difference is that he has a function `needs_to_be_removed`, where I count. Also, I waffle more.
-
sascha over 13 years

As assumed, and mentioned in the opening, the elements are small integral types (int). I think the suggestions by the others did the trick and I'm satisfied now, but nonetheless, I like your solution. The only question is whether, when iterating through these elements, the cache efficiency is still high (because we now have many more elements). But it seems like a good solution for some applications.
-
Admin over 9 years

This is equivalent to `vec.erase(std::remove_if(vec.begin(), vec.end(), needs_to_be_removed), vec.end());`.
-
Benjamin Lindley over 9 years

@robinjam: No it isn't. `remove_if` removes based on the values of the elements. This solution removes based on the element's position in the vector.
-
Admin over 9 years

@BenjaminLindley: Right you are. Apologies, I misread one of the lines in your code snippet.
-
Christopher Johnson about 8 years

Can the `while (needs_to_be_removed(i))` loop overrun the range [0, vec.size()) here? If the entire vector is marked for deletion, `i` will take on the value of `vec.size()`, which `needs_to_be_removed` will have to handle.
-
Benjamin Lindley about 8 years

@ChristopherJohnson: `needs_to_be_removed` is a hypothetical function. It can be written to handle the case of `i >= vec.size()` and return false in that case.
-
smead almost 6 years

I like this implementation except for the fact that it alters the `indexes` variable by sorting it. That is not the variable we are operating on, so it should not be changed. It would be better to pass it by value and sort the local copy instead.
-
Valeriy Ivanov almost 6 years

You are right. But vectors can be large. The vector with indexes could be almost as large as the vector with values. So I decided to pass the vector with indices by reference.