Fast intersection of sets: C++ vs C#

15,575

Solution 1

There are several problems with your test.

First, you are not testing set intersection, but "create a couple of arrays, fill them with random numbers, and then perform set intersection". You should only time the portion of the code you're actually interested in. Even if you're going to want to do those things, they should not be benchmarked here. Measure one thing at a time, to reduce uncertainty. If you want your C++ implementation to perform better, you first need to know which part of it is slower than expected. Which means you have to separate setup code from intersection test.

Second, you should run the test a large number of times to take possible caching effects and other uncertainties into account. (And probably output one total time for, say, 1000 runs, rather than an individual time for each. That way you reduce the uncertainty from the timer which might have limited resolution and report inaccurate results when used in the 0-20ms range.

Further, as far as I can read from the docs, the input to set_intersection should be sorted, which set2 won't be. An there seems to be no reason to use unordered_map, when unordered_set would be a far better match for what you're doing.

About the setup code being needed, note that you probably don't need to populate vectors in order to run the intersection. Both your own implementation and set_intersection work on iterators already, so you can simply pass them a pair of iterators to the data structures your inputs are in already.

A few more specific comments on your code:

  • Use ++iterator instead of iterator++
  • rather than calling vector.end() at each loop iteration, call it once and cache the result
  • experiment with using sorted vectors vs std::set vs unordered_set (not unordered_map)

Edit:

I haven't tried your C# version, so I can't compare the numbers properly, but here's my modified test. Each is run 1000 times, on a Core 2 Quad 2.5GHz with 4GB RAM:

std::set_intersection on std::set: 2606ms
std::set_intersection on tr1::unordered_set: 1014ms
std::set_intersection on sorted vectors: 171ms
std::set_intersection on unsorted vectors: 10140ms

The last one is a bit unfair, because it has to both copy and sort the vectors. Ideally, only the sort should be part of the benchmark. I tried creating a version that used an array of 1000 unsorted vectors (so I woudln't have to copy the unsorted data in each iteration), but the performance was about the same, or a bit worse, because this would cause constant cache misses, so I reverted back to this version

And my code:

#define _SECURE_SCL 0

#include <ctime>
#include <vector>
#include <set>
#include <iostream>
#include <algorithm>
#include <unordered_set>
#include <windows.h>

template <typename T, typename OutIter>
void stl_intersect(const T& set1, const T& set2, OutIter out){
    std::set_intersection(set1.begin(), set1.end(), set2.begin(), set2.end(), out);
}

template <typename T, typename OutIter>
void sort_stl_intersect(T& set1, T& set2, OutIter out){
    std::sort(set1.begin(), set1.end());
    std::sort(set2.begin(), set2.end());
    std::set_intersection(set1.begin(), set1.end(), set2.begin(), set2.end(), out);
}


template <typename T>
void init_sorted_vec(T first, T last){
    for ( T cur = first; cur != last; ++cur)
    {
        int i = cur - first;
        int value = 1000000000 + i;
        *cur = value;
    }
}

template <typename T>
void init_unsorted_vec(T first, T last){
    for ( T cur = first; cur != last; ++cur)
    {
        int i = rand() % 200000 + 1;
        i *= 10;

        int value = 1000000000 + i;
        *cur = value;
    }
}

struct resize_and_shuffle {
    resize_and_shuffle(int size) : size(size) {}

    void operator()(std::vector<int>& vec){
        vec.resize(size);

    }
    int size;
};

int main()
{
    srand ( time(NULL) );
    std::vector<int> out(100000);

    std::vector<int> sortedvec1(100000);
    std::vector<int> sortedvec2(1000);

    init_sorted_vec(sortedvec1.begin(), sortedvec1.end());
    init_unsorted_vec(sortedvec2.begin(), sortedvec2.end());
    std::sort(sortedvec2.begin(), sortedvec2.end());

    std::vector<int> unsortedvec1(sortedvec1.begin(), sortedvec1.end());
    std::vector<int> unsortedvec2(sortedvec2.begin(), sortedvec2.end());

    std::random_shuffle(unsortedvec1.begin(), unsortedvec1.end());
    std::random_shuffle(unsortedvec2.begin(), unsortedvec2.end());

    std::vector<int> vecs1[1000];
    std::vector<int> vecs2[1000];

    std::fill(vecs1, vecs1 + 1000, unsortedvec1);
    std::fill(vecs2, vecs2 + 1000, unsortedvec2);

    std::set<int> set1(sortedvec1.begin(), sortedvec1.end());
    std::set<int> set2(sortedvec2.begin(), sortedvec2.end());

    std::tr1::unordered_set<int> uset1(sortedvec1.begin(), sortedvec1.end());
    std::tr1::unordered_set<int> uset2(sortedvec2.begin(), sortedvec2.end());

    DWORD start, stop;
    DWORD delta[4];

    start = GetTickCount();
    for (int i = 0; i < 1000; ++i){
        stl_intersect(set1, set2, out.begin());
    }
    stop = GetTickCount();
    delta[0] = stop - start;

    start = GetTickCount();
    for (int i = 0; i < 1000; ++i){
        stl_intersect(uset1, uset2, out.begin());
    }
    stop = GetTickCount();
    delta[1] = stop - start;

    start = GetTickCount();
    for (int i = 0; i < 1000; ++i){
        stl_intersect(sortedvec1, sortedvec2, out.begin());
    }
    stop = GetTickCount();
    delta[2] = stop - start;

    start = GetTickCount();
    for (int i = 0; i < 1000; ++i){
        sort_stl_intersect(vecs1[i], vecs1[i], out.begin());
    }
    stop = GetTickCount();
    delta[3] = stop - start;

    std::cout << "std::set_intersection on std::set: " << delta[0] << "ms\n";
    std::cout << "std::set_intersection on tr1::unordered_set: " << delta[1] << "ms\n";
    std::cout << "std::set_intersection on sorted vectors: " << delta[2] << "ms\n";
    std::cout << "std::set_intersection on unsorted vectors: " << delta[3] << "ms\n";


    return 0;
}

There's no reason why C++ should always be faster than C#. C# has a few key advantages that require a lot of care to compete with in C++. The primary one I can think of is that dynamic allocations are ridiculously cheap in .NET-land. Every time a C++ vector, set or unordered_set (or any other container) has to resize or expand, it is a very costly malloc operation. In .NET, a heap allocation is little more than adding an offset to a pointer.

So if you want the C++ version to compete, you'll probably have to solve that, allowing your containers to resize without having to perform actual heap allocations, probably by using custom allocators for the containers (perhaps boost::pool might be a good bet, or you can try rolling your own)

Another issue is that set_difference only works on sorted input, and in order to reproduce tests results that involve a sort, we have to make a fresh copy of the unsorted data in each iteration, which is costly (although again, using custom allocators will help a lot). I don't know what form your input takes, but it is possible that you can sort your input directly, without copying it, and then run set_difference directly on that. (That would be easy to do if your input is an array or a STL container at least.)

One of the key advantages of the STL is that it is so flexible, it can work on pretty much any input sequence. In C#, you pretty much have to copy the input to a List or Dictionary or something, but in C++, you might be able to get away with running std::sort and set_intersection on the raw input.

Finally, of course, try running the code through a profiler and see exactly where the time is being spent. You might also want to try running the code through GCC instead. It's my impression that STL performance in MSVC is sometimes a bit quirky. It might be worth testing under another compiler just to see if you get similar timings there.

Finally, you might find these blog posts relevant for performance of C++ vs C#: http://blogs.msdn.com/ricom/archive/2005/05/10/416151.aspx

The morale of those is essentially that yes, you can get better performance in C++, but it is a surprising amount of work.

Solution 2

One problem I see right away is that you're passing the sets in C++ by value and not by const reference. So you're copying them every time you pass them around!

Also, I would not use a set for the target of set_intersection. I would use something like

int runSetIntersection(const set<int>& set1, const set<int>& set2)
{   
    vector<int> intersection;
    intersection.reserve(10000) // or whatever the max is

    set_intersection(set1.begin(),set1.end(), set2.begin(), set2.end(), back_inserter(intersection));

    return intersection.size(); 
}

This code, however, still allocates inside the function. Even faster would be

int runSetIntersection(const set<int>& set1, const set<int>& set2, vector<int>& scratch)
{   
    scratch.reserve(10000) // or whatever the max is

    set_intersection(set1.begin(),set1.end(), set2.begin(), set2.end(), back_inserter(scratch));

    return scratch.size(); 
}

And then allocate scratch before you start the timer.

Though, if you're just looking for the size, a hand-written for loop, combined with set::find might give even better results.

Solution 3

Use this...

vector<int> set1(10000);
vector<int> set2(1000);

... to get vectors of non-zero initial size. Then don't use push_back, but just update the values directly.

Solution 4

It may also be worthwhile looking at the boost Disjoint Set container, which is specially optimized for certain kinds of large set operations.

It works by treating a group of sets as the unions of several disjoint sets, making it possible to build other sets, such as intersections or unions very cheaply, once the initial set of disjoint sets is constructed. If you expect to be doing a lot of set operations on sets that don't change much, you can probably expect this to be very fast. If, on the other hand, you will use each set once and throw it away, it's probably not going to do too much.

Anyway, you'd be doing yourself a favor to at least experiment with this to see if it gives you any bump in your specific case.

Solution 5

By the way, if you have large sorted sets std::set_intersection is not the fastest algorithm. std::set_intersection takes up to 2*(m+n)-1 comparisons but algorithms like the one from Baeza-Yates can be faster. For small m, Baeza-Yates is O(m * log(n)), while for n = alpha * m it is O(n). The basic idea is to do a kind of 2 way binary search.

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.7899&rep=rep1&type=pdf

Experimental Analysis of a Fast Intersection Algorithm for Sorted Sequences Ricardo Baeza-Yates and Alejandro Salinger

OR

R. Baeza-Yates. A Fast Set Intersection Algorithm for Sorted Sequences. In Proceedings of the 15th Annual Symposium on Combinatorial Pattern Matching (CPM 2004), Springer LNCS 3109, pp 400-408, Istanbul, Turkey, July 2004.

Below is an explanation and an implementation by Erik Frey where he shows significantly faster results than std::set_intersection with a binary probe. I have not tried his code yet.
http://fawx.com/

  1. Pick the median element, A, in the smaller set.
  2. Search for its insertion-position element, B, in the larger set.
  3. If A and B are equal, append the element to the result.
  4. Repeat steps 1-4 on non-empty subsets on either side of elements A and B.

;



/* * baeza_intersect */ template< template class Probe, class RandomAccessIterator, class OutputIterator> void baeza_intersect(RandomAccessIterator begin1, RandomAccessIterator end1, RandomAccessIterator begin2, RandomAccessIterator end2, OutputIterator out) { RandomAccessIterator probe1, probe2;

if ( (end1 - begin1) < ( end2 - begin2 ) ) { if ( begin1 == end1 ) return; probe1 = begin1 + ( ( end1 - begin1 ) >> 1 ); probe2 = lower_bound< Probe >( begin2, end2, *probe1 ); baeza_intersect< Probe >(begin1, probe1, begin2, probe2, out); // intersect left if (! (probe2 == end2 || *probe1 < *probe2 )) *out++ = *probe2++; baeza_intersect< Probe >(++probe1, end1, probe2, end2, out); // intersect right } else { if ( begin2 == end2 ) return; probe2 = begin2 + ( ( end2 - begin2 ) >> 1 ); probe1 = lower_bound< Probe >( begin1, end1, *probe2 ); baeza_intersect< Probe >(begin1, probe1, begin2, probe2, out); // intersect left if (! (probe1 == end1 || *probe2 < *probe1 )) *out++ = *probe1++; baeza_intersect< Probe >(probe1, end1, ++probe2, end2, out); // intersect right } }

/* * with a comparator */ template< template class Probe, class RandomAccessIterator, class OutputIterator, class Comparator > void baeza_intersect(RandomAccessIterator begin1, RandomAccessIterator end1, RandomAccessIterator begin2, RandomAccessIterator end2, OutputIterator out, Comparator cmp) { RandomAccessIterator probe1, probe2;

  if ( (end1 - begin1) < ( end2 - begin2 ) )
  {
    if ( begin1 == end1 )
      return;
    probe1 = begin1 + ( ( end1 - begin1 ) >> 1 );
    probe2 = lower_bound< Probe >( begin2, end2, *probe1, cmp );
    baeza_intersect< Probe >(begin1, probe1, begin2, probe2, out, cmp); // intersect left
    if (! (probe2 == end2 || cmp( *probe1, *probe2 ) ))
      *out++ = *probe2++;
    baeza_intersect< Probe >(++probe1, end1, probe2, end2, out, cmp); // intersect right
  }
  else
  {
    if ( begin2 == end2 )
      return;
    probe2 = begin2 + ( ( end2 - begin2 ) >> 1 );
    probe1 = lower_bound< Probe >( begin1, end1, *probe2, cmp );
    baeza_intersect< Probe >(begin1, probe1, begin2, probe2, out, cmp); // intersect left
    if (! (probe1 == end1 || cmp( *probe2, *probe1 ) ))
      *out++ = *probe1++;
    baeza_intersect< Probe >(probe1, end1, ++probe2, end2, out, cmp); // intersect right
  }
}

// probe.hpp

/** * binary probe: pick the next element by choosing the halfway point between low and high */ template< class RandomAccessIterator, class T > struct binary_probe { RandomAccessIterator operator()(RandomAccessIterator begin, RandomAccessIterator end, const T & value) { return begin + ( (end - begin) >> 1); } };

/** * lower_bound: like stl's lower_bound but with different kinds of probing * note the appearance of the rare template parameter template! */ template< template class Probe, class RandomAccessIterator, class T > RandomAccessIterator lower_bound(RandomAccessIterator begin, RandomAccessIterator end, const T & value) { RandomAccessIterator pit; Probe< RandomAccessIterator, T > pfunc; // probe-functor (wants to get func'd up)

while ( begin < end ) { pit = pfunc(begin, end, value); if ( *pit < value ) begin = pit + 1; else end = pit; } return begin; }

/* * this time with a comparator! */ template< template class Probe, class RandomAccessIterator, class T, class Comparator > RandomAccessIterator lower_bound(RandomAccessIterator begin, RandomAccessIterator end, const T & value, Comparator cmp) { RandomAccessIterator pit; Probe< RandomAccessIterator, T > pfunc;

while ( begin < end ) { pit = pfunc(begin, end, value); if ( cmp(*pit, value) ) begin = pit + 1; else end = pit; } return begin; }

Share:
15,575
Alex Black
Author by

Alex Black

Alex Black's blog Alex Black on LinkedIn

Updated on July 27, 2022

Comments

  • Alex Black
    Alex Black almost 2 years

    On my machine (Quad core, 8gb ram), running Vista x64 Business, with Visual Studio 2008 SP1, I am trying to intersect two sets of numbers very quickly.

    I've implemented two approaches in C++, and one in C#. The C# approach is faster so far, I'd like to improve the C++ approach so its faster than C#, which I expect C++ can do.

    Here is the C# output: (Release build)

    Found the intersection 1000 times, in 4741.407 ms
    

    Here is the initial C++ output, for two different approaches (Release x64 build):

    Found the intersection (using unordered_map) 1000 times, in 21580.7ms
    Found the intersection (using set_intersection) 1000 times, in 22366.6ms
    

    Here is the latest C++ output, for three approaches (Release x64 build):

    Latest benchmark:

    Found the intersection of 504 values (using unordered_map) 1000 times, in 28827.6ms
    Found the intersection of 495 values (using set_intersection) 1000 times, in 9817.69ms
    Found the intersection of 504 values (using unordered_set) 1000 times, in 24769.1ms
    

    So, the set_intersection approach is now approx 2x slower than C#, but 2x faster than the initial C++ approaches.

    Latest C++ code:

    Code:
    
    // MapPerformance.cpp : Defines the entry point for the console application.
    //
    
    #include "stdafx.h"
    #include <hash_map>
    #include <vector>
    #include <iostream>
    #include <time.h>
    #include <algorithm>
    #include <set>
    #include <unordered_set>
    
    #include <boost\unordered\unordered_map.hpp>
    
    #include "timer.h"
    
    using namespace std;
    using namespace stdext;
    using namespace boost;
    using namespace tr1;
    
    
    int runIntersectionTest2(const vector<int>& set1, const vector<int>& set2)
    {
        // hash_map<int,int> theMap;
        // map<int,int> theMap;
        unordered_set<int> theSet;      
    
         theSet.insert( set1.begin(), set1.end() );
    
        int intersectionSize = 0;
    
        vector<int>::const_iterator set2_end = set2.end();
    
        for ( vector<int>::const_iterator iterator = set2.begin(); iterator != set2_end; ++iterator )
        {
            if ( theSet.find(*iterator) != theSet.end() )
            {
                    intersectionSize++;
            }
        }
    
        return intersectionSize;
    }
    
    int runIntersectionTest(const vector<int>& set1, const vector<int>& set2)
    {
        // hash_map<int,int> theMap;
        // map<int,int> theMap;
        unordered_map<int,int> theMap;  
    
        vector<int>::const_iterator set1_end = set1.end();
    
        // Now intersect the two sets by populating the map
        for ( vector<int>::const_iterator iterator = set1.begin(); iterator != set1_end; ++iterator )
        {
            int value = *iterator;
    
            theMap[value] = 1;
        }
    
        int intersectionSize = 0;
    
        vector<int>::const_iterator set2_end = set2.end();
    
        for ( vector<int>::const_iterator iterator = set2.begin(); iterator != set2_end; ++iterator )
        {
            int value = *iterator;
    
            unordered_map<int,int>::iterator foundValue = theMap.find(value);
    
            if ( foundValue != theMap.end() )
            {
                theMap[value] = 2;
    
                intersectionSize++;
            }
        }
    
        return intersectionSize;
    
    }
    
    int runSetIntersection(const vector<int>& set1_unsorted, const vector<int>& set2_unsorted)
    {   
        // Create two vectors
        std::vector<int> set1(set1_unsorted.size());
        std::vector<int> set2(set2_unsorted.size());
    
        // Copy the unsorted data into them
        std::copy(set1_unsorted.begin(), set1_unsorted.end(), set1.begin());
        std::copy(set2_unsorted.begin(), set2_unsorted.end(), set2.begin());
    
        // Sort the data
        sort(set1.begin(),set1.end());
        sort(set2.begin(),set2.end());
    
        vector<int> intersection;
        intersection.reserve(1000);
    
        set_intersection(set1.begin(),set1.end(), set2.begin(), set2.end(), back_inserter(intersection));
    
        return intersection.size(); 
    }
    
    void createSets( vector<int>& set1, vector<int>& set2 )
    {
        srand ( time(NULL) );
    
        set1.reserve(100000);
        set2.reserve(1000);
    
        // Create 100,000 values for set1
        for ( int i = 0; i < 100000; i++ )
        {
            int value = 1000000000 + i;
            set1.push_back(value);
        }
    
        // Try to get half of our values intersecting
        float ratio = 200000.0f / RAND_MAX;
    
    
        // Create 1,000 values for set2
        for ( int i = 0; i < 1000; i++ )
        {
            int random = rand() * ratio + 1;
    
            int value = 1000000000 + random;
            set2.push_back(value);
        }
    
        // Make sure set1 is in random order (not sorted)
        random_shuffle(set1.begin(),set1.end());
    }
    
    int _tmain(int argc, _TCHAR* argv[])
    {
        int intersectionSize = 0;
    
        vector<int> set1, set2; 
        createSets( set1, set2 );
    
        Timer timer;
        for ( int i = 0; i < 1000; i++ )
        {
            intersectionSize = runIntersectionTest(set1, set2);
        }
        timer.Stop();
    
        cout << "Found the intersection of " << intersectionSize << " values (using unordered_map) 1000 times, in " << timer.GetMilliseconds() << "ms" << endl;
    
        timer.Reset();
        for ( int i = 0; i < 1000; i++ )
        {
            intersectionSize = runSetIntersection(set1,set2);
        }
        timer.Stop();
    
        cout << "Found the intersection of " << intersectionSize << " values (using set_intersection) 1000 times, in " << timer.GetMilliseconds() << "ms" << endl;
    
        timer.Reset();
        for ( int i = 0; i < 1000; i++ )
        {
            intersectionSize = runIntersectionTest2(set1,set2);
        }
        timer.Stop();
    
        cout << "Found the intersection of " << intersectionSize << " values (using unordered_set) 1000 times, in " << timer.GetMilliseconds() << "ms" << endl;
    
        getchar();
    
        return 0;
    }
    

    C# code:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    
    namespace DictionaryPerformance
    {
        class Program
        {
            static void Main(string[] args)
            {
                List<int> set1 = new List<int>(100000);
                List<int> set2 = new List<int>(1000);
    
                // Create 100,000 values for set1
                for (int i = 0; i < 100000; i++)
                {
                    int value = 1000000000 + i;
                    set1.Add(value);
                }
    
                Random random = new Random(DateTime.Now.Millisecond);
    
                // Create 1,000 values for set2
                for (int i = 0; i < 1000; i++)
                {
                    int value = 1000000000 + (random.Next() % 200000 + 1);
                    set2.Add(value);
                }
    
                long start = System.Diagnostics.Stopwatch.GetTimestamp();
                for (int i = 0; i < 1000; i++)
                {
                    runIntersectionTest(set1,set2);
                }
                long duration = System.Diagnostics.Stopwatch.GetTimestamp() - start;
    
                Console.WriteLine(String.Format("Found the intersection 1000 times, in {0} ms", ((float) duration * 1000.0f) / System.Diagnostics.Stopwatch.Frequency));
    
                Console.ReadKey();
            }
    
            static int runIntersectionTest(List<int> set1, List<int> set2)
            {
    
                Dictionary<int,int> theMap = new Dictionary<int,int>(100000);
    
                // Now intersect the two sets by populating the map
                foreach( int value in set1 )
                {
                    theMap[value] = 1;
                }
    
                int intersectionSize = 0;
    
                foreach ( int value in set2 )
                {
                    int count;
                    if ( theMap.TryGetValue(value, out count ) )
                    {
                        theMap[value] = 2;
                        intersectionSize++;
                    }
                }
    
                return intersectionSize;
            }
        }
    }
    

    C++ code:

    // MapPerformance.cpp : Defines the entry point for the console application.
    //
    
    #include "stdafx.h"
    #include <hash_map>
    #include <vector>
    #include <iostream>
    #include <time.h>
    #include <algorithm>
    #include <set>
    
    #include <boost\unordered\unordered_map.hpp>
    
    #include "timer.h"
    
    using namespace std;
    using namespace stdext;
    using namespace boost;
    
    int runIntersectionTest(vector<int> set1, vector<int> set2)
    {
        // hash_map<int,int> theMap;
        // map<int,int> theMap;
        unordered_map<int,int> theMap;
    
        // Now intersect the two sets by populating the map
        for ( vector<int>::iterator iterator = set1.begin(); iterator != set1.end(); iterator++ )
        {
            int value = *iterator;
    
            theMap[value] = 1;
        }
    
        int intersectionSize = 0;
    
        for ( vector<int>::iterator iterator = set2.begin(); iterator != set2.end(); iterator++ )
        {
            int value = *iterator;
    
            unordered_map<int,int>::iterator foundValue = theMap.find(value);
    
            if ( foundValue != theMap.end() )
            {
                theMap[value] = 2;
    
                intersectionSize++;
            }
        }
    
        return intersectionSize;
    
    }
    
    int runSetIntersection(set<int> set1, set<int> set2)
    {   
        set<int> intersection;
    
        set_intersection(set1.begin(),set1.end(), set2.begin(), set2.end(), inserter(intersection, intersection.end()));
    
        return intersection.size(); 
    }
    
    
    
    int _tmain(int argc, _TCHAR* argv[])
    {
        srand ( time(NULL) );
    
        vector<int> set1;
        vector<int> set2;
    
        set1.reserve(10000);
        set2.reserve(1000);
    
        // Create 100,000 values for set1
        for ( int i = 0; i < 100000; i++ )
        {
            int value = 1000000000 + i;
            set1.push_back(value);
        }
    
        // Create 1,000 values for set2
        for ( int i = 0; i < 1000; i++ )
        {
            int random = rand() % 200000 + 1;
            random *= 10;
    
            int value = 1000000000 + random;
            set2.push_back(value);
        }
    
    
        Timer timer;
        for ( int i = 0; i < 1000; i++ )
        {
            runIntersectionTest(set1, set2);
        }
        timer.Stop();
    
        cout << "Found the intersection (using unordered_map) 1000 times, in " << timer.GetMilliseconds() << "ms" << endl;
    
        set<int> set21;
        set<int> set22;
    
        // Create 100,000 values for set1
        for ( int i = 0; i < 100000; i++ )
        {
            int value = 1000000000 + i;
            set21.insert(value);
        }
    
        // Create 1,000 values for set2
        for ( int i = 0; i < 1000; i++ )
        {
            int random = rand() % 200000 + 1;
            random *= 10;
    
            int value = 1000000000 + random;
            set22.insert(value);
        }
    
        timer.Reset();
        for ( int i = 0; i < 1000; i++ )
        {
            runSetIntersection(set21,set22);
        }
        timer.Stop();
    
        cout << "Found the intersection (using set_intersection) 1000 times, in " << timer.GetMilliseconds() << "ms" << endl;
    
        getchar();
    
        return 0;
    }
    

    Ok, here is the latest, with some changes:

    • The C++ sets are now properly setup so they have a 50% intersection (like the C#)
    • Set1 is shuffled so its not sorted, set2 was already not sorted
    • The set_intersection implementation now uses vectors, and sorts them first

    C++ (Release, x64) Results:

    Found the intersection of 503 values (using unordered_map) 1000 times, in 35131.1ms
    Found the intersection of 494 values (using set_intersection) 1000 times, in 10317ms
    

    So its 2x slower than C#. @Jalf: You're getting some pretty fast numbers, is there something I'm doing wrong here?

    C++ Code:

    // MapPerformance.cpp : Defines the entry point for the console application.
    //
    
    #include "stdafx.h"
    #include <hash_map>
    #include <vector>
    #include <iostream>
    #include <time.h>
    #include <algorithm>
    #include <set>
    
    #include <boost\unordered\unordered_map.hpp>
    
    #include "timer.h"
    
    using namespace std;
    using namespace stdext;
    using namespace boost;
    
    int runIntersectionTest(const vector<int>& set1, const vector<int>& set2)
    {
        // hash_map<int,int> theMap;
        // map<int,int> theMap;
        unordered_map<int,int> theMap;  
    
        vector<int>::const_iterator set1_end = set1.end();
    
        // Now intersect the two sets by populating the map
        for ( vector<int>::const_iterator iterator = set1.begin(); iterator != set1_end; ++iterator )
        {
            int value = *iterator;
    
            theMap[value] = 1;
        }
    
        int intersectionSize = 0;
    
        vector<int>::const_iterator set2_end = set2.end();
    
        for ( vector<int>::const_iterator iterator = set2.begin(); iterator != set2_end; ++iterator )
        {
            int value = *iterator;
    
            unordered_map<int,int>::iterator foundValue = theMap.find(value);
    
            if ( foundValue != theMap.end() )
            {
                theMap[value] = 2;
    
                intersectionSize++;
            }
        }
    
        return intersectionSize;
    
    }
    
    int runSetIntersection(const vector<int> set1_unsorted, const vector<int> set2_unsorted)
    {   
        // Create two vectors
        std::vector<int> set1(set1_unsorted.size());
        std::vector<int> set2(set2_unsorted.size());
    
        // Copy the unsorted data into them
        std::copy(set1_unsorted.begin(), set1_unsorted.end(), set1.begin());
        std::copy(set2_unsorted.begin(), set2_unsorted.end(), set2.begin());
    
        // Sort the data
        sort(set1.begin(),set1.end());
        sort(set2.begin(),set2.end());
    
        vector<int> intersection;
        intersection.reserve(1000);
    
        set_intersection(set1.begin(),set1.end(), set2.begin(), set2.end(), inserter(intersection, intersection.end()));
    
        return intersection.size(); 
    }
    
    void createSets( vector<int>& set1, vector<int>& set2 )
    {
        srand ( time(NULL) );
    
        set1.reserve(100000);
        set2.reserve(1000);
    
        // Create 100,000 values for set1
        for ( int i = 0; i < 100000; i++ )
        {
            int value = 1000000000 + i;
            set1.push_back(value);
        }
    
        // Try to get half of our values intersecting
        float ratio = 200000.0f / RAND_MAX;
    
    
        // Create 1,000 values for set2
        for ( int i = 0; i < 1000; i++ )
        {
            int random = rand() * ratio + 1;
    
            int value = 1000000000 + random;
            set2.push_back(value);
        }
    
        // Make sure set1 is in random order (not sorted)
        random_shuffle(set1.begin(),set1.end());
    }
    
    int _tmain(int argc, _TCHAR* argv[])
    {
        int intersectionSize = 0;
    
        vector<int> set1, set2; 
        createSets( set1, set2 );
    
        Timer timer;
        for ( int i = 0; i < 1000; i++ )
        {
            intersectionSize = runIntersectionTest(set1, set2);
        }
        timer.Stop();
    
        cout << "Found the intersection of " << intersectionSize << " values (using unordered_map) 1000 times, in " << timer.GetMilliseconds() << "ms" << endl;
    
        timer.Reset();
        for ( int i = 0; i < 1000; i++ )
        {
            intersectionSize = runSetIntersection(set1,set2);
        }
        timer.Stop();
    
        cout << "Found the intersection of " << intersectionSize << " values (using set_intersection) 1000 times, in " << timer.GetMilliseconds() << "ms" << endl;
    
        getchar();
    
        return 0;
    }
    
  • GManNickG
    GManNickG almost 15 years
    Or call reserve(1000), but continue using push_back
  • josesuero
    josesuero almost 15 years
    or just don't include this in your timing in the first place. Your setup code should not be part of the benchmark.
  • Alex Black
    Alex Black almost 15 years
    My C++ code includes an implementation using set_intersection. I will take a look at "includes".
  • Alex Black
    Alex Black almost 15 years
    I want to include the setup time, because my data is not in one of these data structures at the start... so if the approach requires me to put it into a data structure then I need to include that tme.
  • Roddy
    Roddy almost 15 years
    @jalf. Fair point. I was assuming he actually wanted to measure that but as well...
  • Kevin Anderson
    Kevin Anderson almost 15 years
    I agree that there is a vast difference between initializing something to 10000 bytes for the C# and doing multiple re-allocs on the C++ side.
  • josesuero
    josesuero almost 15 years
    nevertheless, it is not part of this test. You'll get much more useful benchmarking data by treating them as separate problems. First worry about the intersection performance, and then do a separate test for the "build initial data structures" part.
  • Alex Black
    Alex Black almost 15 years
    Agreed, the test is not strictly set intersection, it also includes populating any data structures needed for the test. I've updated the test to run 1000 times.
  • josesuero
    josesuero almost 15 years
    Populating the data structures is not part of this test though. Measure that separately. You are introducing a huge amount of uncertainty which essentially invalidates your results.
  • Alex Black
    Alex Black almost 15 years
    I modified the C++ implementation that uses ordered_set and vector to call reserve() on the vectors. Not much change.
  • Alex Black
    Alex Black almost 15 years
    I modified the code and re-ran the benchmarks with the population of the data structures done before the timed tests.
  • Alex Black
    Alex Black almost 15 years
    Jalf: it looks to me like std::set is a sorted data structure. sgi.com/tech/stl/set.html
  • josesuero
    josesuero almost 15 years
    oh right, I thought you were running that one on vectors too. (Why aren't you?)
  • Alex Black
    Alex Black almost 15 years
    because the first example I found of how to use set_intersection showed it working on sets. If you think switching to vectors will speed things up I will give it a try.
  • Alex Black
    Alex Black almost 15 years
    which ones? Optimization is set to "Maximize Speed (/O2)"
  • josesuero
    josesuero almost 15 years
    I don't know if it'll speed it up, but set_intersection works on any input iterator, as long as it is a sorted sequence (so if you use vectors, you'll have to first sort it)
  • Alex Black
    Alex Black almost 15 years
    I set "Favor Size or Speed" to "Favor Fast Code (/Ot)", no difference.
  • Alex Black
    Alex Black almost 15 years
    @Jalf: I modified the code to pass sorted vectors into set_intersection. If I sorted them before the test, then the intersection was done in 352ms (1000 times!) so very fast. But if I do the sort inside the test, then it takes 2401ms.
  • Magnus
    Magnus almost 15 years
    Did you figure out what was taking most of the time?
  • Alex Black
    Alex Black almost 15 years
    no I haven't figured out what takes most of the time. In the set_intersection test there isn't much going on except calling set_intersection :)
  • josesuero
    josesuero almost 15 years
    2401 is still twice as good as your C# code though, isn't it?
  • Andreas Magnusson
    Andreas Magnusson almost 15 years
    Good spotting! That ought to count for some of the slowness.
  • Andreas Magnusson
    Andreas Magnusson almost 15 years
    Don't pass vectors by value!! const vector<int> &set1 is the C++ way.
  • Alex Black
    Alex Black almost 15 years
    yes, 2401ms twice as fast as C# and very good.. I like it. I'm just wondering if its a valid test given the data in set1 is already sorted (my real data is not). I'm going to look at updating the test to work on two random (not sorted) tests.
  • Alex Black
    Alex Black almost 15 years
    good call, sloppy mistake on my part trying to rapidly change the code.
  • Alex Black
    Alex Black almost 15 years
    I'll adjust and repost without copying... I have set _SECURE_SCL to 0. (#define _SECURE_SCL 0)
  • Alex Black
    Alex Black almost 15 years
    hey Jalf, thats awesome, very fast.. However, would it be easy to adjust so that the vectors are re-sorted each time? My data isn't sorted, so I am going to have to sort it. In real life this won't be run 1000 times on pre-sorted vectors.
  • SingleNegationElimination
    SingleNegationElimination almost 15 years
    This structure has just given me a chill, for it contains the first practical use of the Ackermann function, or rather its inverse. Amazing!
  • josesuero
    josesuero almost 15 years
    Done. I test with both pre-sorted and unsorted vectors now. It looks like both set and unordered_set are a lot faster than in your own tests though. Can you reproduce the same results if you run my code?
  • Alex Black
    Alex Black almost 15 years
    Nice. One more question: set1 is already sorted... I'm trying to find a good way to randomize it while still ensuring it has values from 0 to 100,000 (excluding the offset).
  • josesuero
    josesuero almost 15 years
    looks like std::random_shuffle can do that :)
  • Alex Black
    Alex Black almost 15 years
    I posted an updated benchmark and code below, I'm not seeing the crazy fast results you are Jalf. (it was fast, until I shuffled set1 and sorted it each time)
  • josesuero
    josesuero almost 15 years
    You could write the intersection results to a vector instead of a set (should be a lot faster). Also I'd expect unordered_set to be a lot faster than plain sets
  • Alex Black
    Alex Black almost 15 years
    You're now getting the same times I'm getting, about 10s to use set_intersection on unsorted lists. This is a 2x improvement on where we started, but still 2x slower than C#.
  • josesuero
    josesuero almost 15 years
    Does that one matter though? If set or unsorted_set is still 4x faster than C#
  • Alex Black
    Alex Black almost 15 years
    Set: the sorting is done outside the benchmark... so doesn't seem fair. Unordered_set: does this work? You're passing what looks like an unsorted set to set_intersection...
  • josesuero
    josesuero almost 15 years
    Set is an ordered data structure, remember? So whether the input it is built from is sorted or not doesn't matter. It sorts its input on insertion anyway. About unordered_set, I think you're right, that isn't valid, since it obviously isn't sorted. Hadn't really thought of that. :)
  • Alex Black
    Alex Black almost 15 years
    Set: yes, so essentially it 'sorts' as you insert.. Insertions are slower into sets than into vectors. So if I started with my unsorted data, it will take me longer to get it into a set than a vector.
  • Alex Black
    Alex Black almost 15 years
    I adjusted the C# code now, to shuffle set1, it didn't make any diffence. The C# method doesn't care if the sets are sorted or not.
  • josesuero
    josesuero almost 15 years
    yeah, I realized the same thing. I implemented the same algorithm again in C++, and so far, getting around 600ms on it, but I think I have a bug or two, so it'll be a few mins before I post it.
  • Alex Black
    Alex Black almost 15 years
    cool - one thing to do is to output the count of items in the intersection, as a sanity check (it should be ~500)
  • josesuero
    josesuero almost 15 years
    yep, it was a bug, was testing intersect(set2, set2) :) Well, it's bedtime for me. Going to add a bit to my answer, and then head off. :)
  • josesuero
    josesuero almost 15 years
    Ok, posted my final update for tonight. No new code or benchmarks, but a few hints on what you should probably try next. I might give it a shot again tomorrow. :)
  • Alex Black
    Alex Black almost 15 years
    I made those changes, no noticeable difference. Your new runIntersectionTest is similar in performance to the unordered_map one (about 2x slower than set_intersection)
  • deft_code
    deft_code almost 15 years
    The code I posted deliberately didn't pass the vectors by reference, so the compiler would implicitly copy them. If you'd like to do it your way, replace to code before the sort with this: vector<int> set1( unsorted_set1 ); vector<int> set2( unsorted_set2 ); this way the vectors are efficiently copied. The code as it stands fills each vector with zeros. Then it copies the desired value over the top. The above method allocates enough space then copies the values from the parameter with no extra assignments. This will speed things up, though I don't know by how much
  • Alex Black
    Alex Black almost 15 years
    I'm not sure I 100% get you. I made a change though, I made runSetIntersection take vector<int> not vector<int>& (so it gets a copy), and then removed the code inside that function that copies the vector. This didn't make a noticeable difference. I do agree in general the benchmark would be more fair if the vector was not copied each time.