Fast intersection of sets: C++ vs C#

c++ performance stl intersection

15,575

Solution 1

There are several problems with your test.

First, you are not testing set intersection, but "create a couple of arrays, fill them with random numbers, and then perform set intersection". You should only time the portion of the code you're actually interested in. Even if you're going to want to do those things, they should not be benchmarked here. Measure one thing at a time, to reduce uncertainty. If you want your C++ implementation to perform better, you first need to know which part of it is slower than expected. Which means you have to separate setup code from intersection test.

Second, you should run the test a large number of times to take possible caching effects and other uncertainties into account. (And probably output one total time for, say, 1000 runs, rather than an individual time for each. That way you reduce the uncertainty from the timer which might have limited resolution and report inaccurate results when used in the 0-20ms range.

Further, as far as I can read from the docs, the input to set_intersection should be sorted, which set2 won't be. An there seems to be no reason to use unordered_map, when unordered_set would be a far better match for what you're doing.

About the setup code being needed, note that you probably don't need to populate vectors in order to run the intersection. Both your own implementation and set_intersection work on iterators already, so you can simply pass them a pair of iterators to the data structures your inputs are in already.

A few more specific comments on your code:

Use ++iterator instead of iterator++
rather than calling vector.end() at each loop iteration, call it once and cache the result
experiment with using sorted vectors vs std::set vs unordered_set (not unordered_map)

Edit:

I haven't tried your C# version, so I can't compare the numbers properly, but here's my modified test. Each is run 1000 times, on a Core 2 Quad 2.5GHz with 4GB RAM:

std::set_intersection on std::set: 2606ms
std::set_intersection on tr1::unordered_set: 1014ms
std::set_intersection on sorted vectors: 171ms
std::set_intersection on unsorted vectors: 10140ms

The last one is a bit unfair, because it has to both copy and sort the vectors. Ideally, only the sort should be part of the benchmark. I tried creating a version that used an array of 1000 unsorted vectors (so I woudln't have to copy the unsorted data in each iteration), but the performance was about the same, or a bit worse, because this would cause constant cache misses, so I reverted back to this version

And my code:

#define _SECURE_SCL 0

#include <ctime>
#include <vector>
#include <set>
#include <iostream>
#include <algorithm>
#include <unordered_set>
#include <windows.h>

template <typename T, typename OutIter>
void stl_intersect(const T& set1, const T& set2, OutIter out){
    std::set_intersection(set1.begin(), set1.end(), set2.begin(), set2.end(), out);
}

template <typename T, typename OutIter>
void sort_stl_intersect(T& set1, T& set2, OutIter out){
    std::sort(set1.begin(), set1.end());
    std::sort(set2.begin(), set2.end());
    std::set_intersection(set1.begin(), set1.end(), set2.begin(), set2.end(), out);
}


template <typename T>
void init_sorted_vec(T first, T last){
    for ( T cur = first; cur != last; ++cur)
    {
        int i = cur - first;
        int value = 1000000000 + i;
        *cur = value;
    }
}

template <typename T>
void init_unsorted_vec(T first, T last){
    for ( T cur = first; cur != last; ++cur)
    {
        int i = rand() % 200000 + 1;
        i *= 10;

        int value = 1000000000 + i;
        *cur = value;
    }
}

struct resize_and_shuffle {
    resize_and_shuffle(int size) : size(size) {}

    void operator()(std::vector<int>& vec){
        vec.resize(size);

    }
    int size;
};

int main()
{
    srand ( time(NULL) );
    std::vector<int> out(100000);

    std::vector<int> sortedvec1(100000);
    std::vector<int> sortedvec2(1000);

    init_sorted_vec(sortedvec1.begin(), sortedvec1.end());
    init_unsorted_vec(sortedvec2.begin(), sortedvec2.end());
    std::sort(sortedvec2.begin(), sortedvec2.end());

    std::vector<int> unsortedvec1(sortedvec1.begin(), sortedvec1.end());
    std::vector<int> unsortedvec2(sortedvec2.begin(), sortedvec2.end());

    std::random_shuffle(unsortedvec1.begin(), unsortedvec1.end());
    std::random_shuffle(unsortedvec2.begin(), unsortedvec2.end());

    std::vector<int> vecs1[1000];
    std::vector<int> vecs2[1000];

    std::fill(vecs1, vecs1 + 1000, unsortedvec1);
    std::fill(vecs2, vecs2 + 1000, unsortedvec2);

    std::set<int> set1(sortedvec1.begin(), sortedvec1.end());
    std::set<int> set2(sortedvec2.begin(), sortedvec2.end());

    std::tr1::unordered_set<int> uset1(sortedvec1.begin(), sortedvec1.end());
    std::tr1::unordered_set<int> uset2(sortedvec2.begin(), sortedvec2.end());

    DWORD start, stop;
    DWORD delta[4];

    start = GetTickCount();
    for (int i = 0; i < 1000; ++i){
        stl_intersect(set1, set2, out.begin());
    }
    stop = GetTickCount();
    delta[0] = stop - start;

    start = GetTickCount();
    for (int i = 0; i < 1000; ++i){
        stl_intersect(uset1, uset2, out.begin());
    }
    stop = GetTickCount();
    delta[1] = stop - start;

    start = GetTickCount();
    for (int i = 0; i < 1000; ++i){
        stl_intersect(sortedvec1, sortedvec2, out.begin());
    }
    stop = GetTickCount();
    delta[2] = stop - start;

    start = GetTickCount();
    for (int i = 0; i < 1000; ++i){
        sort_stl_intersect(vecs1[i], vecs1[i], out.begin());
    }
    stop = GetTickCount();
    delta[3] = stop - start;

    std::cout << "std::set_intersection on std::set: " << delta[0] << "ms\n";
    std::cout << "std::set_intersection on tr1::unordered_set: " << delta[1] << "ms\n";
    std::cout << "std::set_intersection on sorted vectors: " << delta[2] << "ms\n";
    std::cout << "std::set_intersection on unsorted vectors: " << delta[3] << "ms\n";


    return 0;
}

There's no reason why C++ should always be faster than C#. C# has a few key advantages that require a lot of care to compete with in C++. The primary one I can think of is that dynamic allocations are ridiculously cheap in .NET-land. Every time a C++ vector, set or unordered_set (or any other container) has to resize or expand, it is a very costly malloc operation. In .NET, a heap allocation is little more than adding an offset to a pointer.

So if you want the C++ version to compete, you'll probably have to solve that, allowing your containers to resize without having to perform actual heap allocations, probably by using custom allocators for the containers (perhaps boost::pool might be a good bet, or you can try rolling your own)

Another issue is that set_difference only works on sorted input, and in order to reproduce tests results that involve a sort, we have to make a fresh copy of the unsorted data in each iteration, which is costly (although again, using custom allocators will help a lot). I don't know what form your input takes, but it is possible that you can sort your input directly, without copying it, and then run set_difference directly on that. (That would be easy to do if your input is an array or a STL container at least.)

One of the key advantages of the STL is that it is so flexible, it can work on pretty much any input sequence. In C#, you pretty much have to copy the input to a List or Dictionary or something, but in C++, you might be able to get away with running std::sort and set_intersection on the raw input.

Finally, of course, try running the code through a profiler and see exactly where the time is being spent. You might also want to try running the code through GCC instead. It's my impression that STL performance in MSVC is sometimes a bit quirky. It might be worth testing under another compiler just to see if you get similar timings there.

Finally, you might find these blog posts relevant for performance of C++ vs C#: http://blogs.msdn.com/ricom/archive/2005/05/10/416151.aspx

The morale of those is essentially that yes, you can get better performance in C++, but it is a surprising amount of work.

Solution 2

One problem I see right away is that you're passing the sets in C++ by value and not by const reference. So you're copying them every time you pass them around!

Also, I would not use a set for the target of set_intersection. I would use something like

int runSetIntersection(const set<int>& set1, const set<int>& set2)
{   
    vector<int> intersection;
    intersection.reserve(10000) // or whatever the max is

    set_intersection(set1.begin(),set1.end(), set2.begin(), set2.end(), back_inserter(intersection));

    return intersection.size(); 
}

This code, however, still allocates inside the function. Even faster would be

int runSetIntersection(const set<int>& set1, const set<int>& set2, vector<int>& scratch)
{   
    scratch.reserve(10000) // or whatever the max is

    set_intersection(set1.begin(),set1.end(), set2.begin(), set2.end(), back_inserter(scratch));

    return scratch.size(); 
}

And then allocate scratch before you start the timer.

Though, if you're just looking for the size, a hand-written for loop, combined with set::find might give even better results.

Solution 3

Use this...

vector<int> set1(10000);
vector<int> set2(1000);

... to get vectors of non-zero initial size. Then don't use push_back, but just update the values directly.

Solution 4

It may also be worthwhile looking at the boost Disjoint Set container, which is specially optimized for certain kinds of large set operations.

It works by treating a group of sets as the unions of several disjoint sets, making it possible to build other sets, such as intersections or unions very cheaply, once the initial set of disjoint sets is constructed. If you expect to be doing a lot of set operations on sets that don't change much, you can probably expect this to be very fast. If, on the other hand, you will use each set once and throw it away, it's probably not going to do too much.

Anyway, you'd be doing yourself a favor to at least experiment with this to see if it gives you any bump in your specific case.

Solution 5

By the way, if you have large sorted sets std::set_intersection is not the fastest algorithm. std::set_intersection takes up to 2*(m+n)-1 comparisons but algorithms like the one from Baeza-Yates can be faster. For small m, Baeza-Yates is O(m * log(n)), while for n = alpha * m it is O(n). The basic idea is to do a kind of 2 way binary search.

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.7899&rep=rep1&type=pdf

Experimental Analysis of a Fast Intersection Algorithm for Sorted Sequences Ricardo Baeza-Yates and Alejandro Salinger

R. Baeza-Yates. A Fast Set Intersection Algorithm for Sorted Sequences. In Proceedings of the 15th Annual Symposium on Combinatorial Pattern Matching (CPM 2004), Springer LNCS 3109, pp 400-408, Istanbul, Turkey, July 2004.

Below is an explanation and an implementation by Erik Frey where he shows significantly faster results than std::set_intersection with a binary probe. I have not tried his code yet.
http://fawx.com/

Pick the median element, A, in the smaller set.
Search for its insertion-position element, B, in the larger set.
If A and B are equal, append the element to the result.
Repeat steps 1-4 on non-empty subsets on either side of elements A and B.

;



/*
* baeza_intersect
*/
template< template  class Probe,
  class RandomAccessIterator, class OutputIterator>
void baeza_intersect(RandomAccessIterator begin1, RandomAccessIterator end1,
                     RandomAccessIterator begin2, RandomAccessIterator end2,
                     OutputIterator out)
{
  RandomAccessIterator probe1, probe2;

if ( (end1 - begin1) < ( end2 - begin2 ) )
  {
    if ( begin1 == end1 )
      return;
    probe1 = begin1 + ( ( end1 - begin1 ) >> 1 );
    probe2 = lower_bound< Probe >( begin2, end2, *probe1 );
    baeza_intersect< Probe >(begin1, probe1, begin2, probe2, out); // intersect left
    if (! (probe2 == end2 || *probe1 < *probe2 ))
      *out++ = *probe2++;
    baeza_intersect< Probe >(++probe1, end1, probe2, end2, out); // intersect right
  }
  else
  {
    if ( begin2 == end2 )
      return;
    probe2 = begin2 + ( ( end2 - begin2 ) >> 1 );
    probe1 = lower_bound< Probe >( begin1, end1, *probe2 );
    baeza_intersect< Probe >(begin1, probe1, begin2, probe2, out); // intersect left
    if (! (probe1 == end1 || *probe2 < *probe1 ))
      *out++ = *probe1++;
    baeza_intersect< Probe >(probe1, end1, ++probe2, end2, out); // intersect right
  }
}

/*
* with a comparator
*/
 template< template  class Probe,
      class RandomAccessIterator, class OutputIterator, class Comparator >
    void baeza_intersect(RandomAccessIterator begin1, RandomAccessIterator end1,
                         RandomAccessIterator begin2, RandomAccessIterator end2,
                         OutputIterator out, Comparator cmp)
    {
      RandomAccessIterator probe1, probe2;

  if ( (end1 - begin1) < ( end2 - begin2 ) )
  {
    if ( begin1 == end1 )
      return;
    probe1 = begin1 + ( ( end1 - begin1 ) >> 1 );
    probe2 = lower_bound< Probe >( begin2, end2, *probe1, cmp );
    baeza_intersect< Probe >(begin1, probe1, begin2, probe2, out, cmp); // intersect left
    if (! (probe2 == end2 || cmp( *probe1, *probe2 ) ))
      *out++ = *probe2++;
    baeza_intersect< Probe >(++probe1, end1, probe2, end2, out, cmp); // intersect right
  }
  else
  {
    if ( begin2 == end2 )
      return;
    probe2 = begin2 + ( ( end2 - begin2 ) >> 1 );
    probe1 = lower_bound< Probe >( begin1, end1, *probe2, cmp );
    baeza_intersect< Probe >(begin1, probe1, begin2, probe2, out, cmp); // intersect left
    if (! (probe1 == end1 || cmp( *probe2, *probe1 ) ))
      *out++ = *probe1++;
    baeza_intersect< Probe >(probe1, end1, ++probe2, end2, out, cmp); // intersect right
  }
}


// probe.hpp

/**
* binary probe: pick the next element by choosing the halfway point between low and high
*/
template< class RandomAccessIterator, class T >
struct binary_probe
{
  RandomAccessIterator operator()(RandomAccessIterator begin, RandomAccessIterator end, const T & value)
  {
    return begin + ( (end - begin) >> 1);
  }
};

/**
* lower_bound: like stl's lower_bound but with different kinds of probing
* note the appearance of the rare template parameter template!
*/
template< template  class Probe, class RandomAccessIterator, class T >
RandomAccessIterator lower_bound(RandomAccessIterator begin, RandomAccessIterator end, const T & value)
{
  RandomAccessIterator pit;
  Probe< RandomAccessIterator, T > pfunc; // probe-functor (wants to get func'd up)

while ( begin < end )
  {
    pit = pfunc(begin, end, value);
    if ( *pit < value )
      begin = pit + 1;
    else
      end = pit;
  }
  return begin;
}

/*
* this time with a comparator!
*/
template< template  class Probe, class RandomAccessIterator, class T, class Comparator >
RandomAccessIterator lower_bound(RandomAccessIterator begin, RandomAccessIterator end, const T & value, Comparator cmp)
{
  RandomAccessIterator pit;
  Probe< RandomAccessIterator, T > pfunc;

while ( begin < end )
  {
    pit = pfunc(begin, end, value);
    if ( cmp(*pit, value) )
      begin = pit + 1;
    else
      end = pit;
  }
  return begin;
}

View more solutions

15,575

Author by

Alex Black

Alex Black's blog Alex Black on LinkedIn

Updated on July 27, 2022

Comments

Alex Black almost 2 years

On my machine (Quad core, 8gb ram), running Vista x64 Business, with Visual Studio 2008 SP1, I am trying to intersect two sets of numbers very quickly.

I've implemented two approaches in C++, and one in C#. The C# approach is faster so far, I'd like to improve the C++ approach so its faster than C#, which I expect C++ can do.

Here is the C# output: (Release build)

Found the intersection 1000 times, in 4741.407 ms

Here is the initial C++ output, for two different approaches (Release x64 build):

Found the intersection (using unordered_map) 1000 times, in 21580.7ms
Found the intersection (using set_intersection) 1000 times, in 22366.6ms

Here is the latest C++ output, for three approaches (Release x64 build):

Latest benchmark:

Found the intersection of 504 values (using unordered_map) 1000 times, in 28827.6ms
Found the intersection of 495 values (using set_intersection) 1000 times, in 9817.69ms
Found the intersection of 504 values (using unordered_set) 1000 times, in 24769.1ms

So, the set_intersection approach is now approx 2x slower than C#, but 2x faster than the initial C++ approaches.

Latest C++ code:

Code:

// MapPerformance.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"
#include <hash_map>
#include <vector>
#include <iostream>
#include <time.h>
#include <algorithm>
#include <set>
#include <unordered_set>

#include <boost\unordered\unordered_map.hpp>

#include "timer.h"

using namespace std;
using namespace stdext;
using namespace boost;
using namespace tr1;


int runIntersectionTest2(const vector<int>& set1, const vector<int>& set2)
{
    // hash_map<int,int> theMap;
    // map<int,int> theMap;
    unordered_set<int> theSet;      

     theSet.insert( set1.begin(), set1.end() );

    int intersectionSize = 0;

    vector<int>::const_iterator set2_end = set2.end();

    for ( vector<int>::const_iterator iterator = set2.begin(); iterator != set2_end; ++iterator )
    {
        if ( theSet.find(*iterator) != theSet.end() )
        {
                intersectionSize++;
        }
    }

    return intersectionSize;
}

int runIntersectionTest(const vector<int>& set1, const vector<int>& set2)
{
    // hash_map<int,int> theMap;
    // map<int,int> theMap;
    unordered_map<int,int> theMap;  

    vector<int>::const_iterator set1_end = set1.end();

    // Now intersect the two sets by populating the map
    for ( vector<int>::const_iterator iterator = set1.begin(); iterator != set1_end; ++iterator )
    {
        int value = *iterator;

        theMap[value] = 1;
    }

    int intersectionSize = 0;

    vector<int>::const_iterator set2_end = set2.end();

    for ( vector<int>::const_iterator iterator = set2.begin(); iterator != set2_end; ++iterator )
    {
        int value = *iterator;

        unordered_map<int,int>::iterator foundValue = theMap.find(value);

        if ( foundValue != theMap.end() )
        {
            theMap[value] = 2;

            intersectionSize++;
        }
    }

    return intersectionSize;

}

int runSetIntersection(const vector<int>& set1_unsorted, const vector<int>& set2_unsorted)
{   
    // Create two vectors
    std::vector<int> set1(set1_unsorted.size());
    std::vector<int> set2(set2_unsorted.size());

    // Copy the unsorted data into them
    std::copy(set1_unsorted.begin(), set1_unsorted.end(), set1.begin());
    std::copy(set2_unsorted.begin(), set2_unsorted.end(), set2.begin());

    // Sort the data
    sort(set1.begin(),set1.end());
    sort(set2.begin(),set2.end());

    vector<int> intersection;
    intersection.reserve(1000);

    set_intersection(set1.begin(),set1.end(), set2.begin(), set2.end(), back_inserter(intersection));

    return intersection.size(); 
}

void createSets( vector<int>& set1, vector<int>& set2 )
{
    srand ( time(NULL) );

    set1.reserve(100000);
    set2.reserve(1000);

    // Create 100,000 values for set1
    for ( int i = 0; i < 100000; i++ )
    {
        int value = 1000000000 + i;
        set1.push_back(value);
    }

    // Try to get half of our values intersecting
    float ratio = 200000.0f / RAND_MAX;


    // Create 1,000 values for set2
    for ( int i = 0; i < 1000; i++ )
    {
        int random = rand() * ratio + 1;

        int value = 1000000000 + random;
        set2.push_back(value);
    }

    // Make sure set1 is in random order (not sorted)
    random_shuffle(set1.begin(),set1.end());
}

int _tmain(int argc, _TCHAR* argv[])
{
    int intersectionSize = 0;

    vector<int> set1, set2; 
    createSets( set1, set2 );

    Timer timer;
    for ( int i = 0; i < 1000; i++ )
    {
        intersectionSize = runIntersectionTest(set1, set2);
    }
    timer.Stop();

    cout << "Found the intersection of " << intersectionSize << " values (using unordered_map) 1000 times, in " << timer.GetMilliseconds() << "ms" << endl;

    timer.Reset();
    for ( int i = 0; i < 1000; i++ )
    {
        intersectionSize = runSetIntersection(set1,set2);
    }
    timer.Stop();

    cout << "Found the intersection of " << intersectionSize << " values (using set_intersection) 1000 times, in " << timer.GetMilliseconds() << "ms" << endl;

    timer.Reset();
    for ( int i = 0; i < 1000; i++ )
    {
        intersectionSize = runIntersectionTest2(set1,set2);
    }
    timer.Stop();

    cout << "Found the intersection of " << intersectionSize << " values (using unordered_set) 1000 times, in " << timer.GetMilliseconds() << "ms" << endl;

    getchar();

    return 0;
}

C# code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace DictionaryPerformance
{
    class Program
    {
        static void Main(string[] args)
        {
            List<int> set1 = new List<int>(100000);
            List<int> set2 = new List<int>(1000);

            // Create 100,000 values for set1
            for (int i = 0; i < 100000; i++)
            {
                int value = 1000000000 + i;
                set1.Add(value);
            }

            Random random = new Random(DateTime.Now.Millisecond);

            // Create 1,000 values for set2
            for (int i = 0; i < 1000; i++)
            {
                int value = 1000000000 + (random.Next() % 200000 + 1);
                set2.Add(value);
            }

            long start = System.Diagnostics.Stopwatch.GetTimestamp();
            for (int i = 0; i < 1000; i++)
            {
                runIntersectionTest(set1,set2);
            }
            long duration = System.Diagnostics.Stopwatch.GetTimestamp() - start;

            Console.WriteLine(String.Format("Found the intersection 1000 times, in {0} ms", ((float) duration * 1000.0f) / System.Diagnostics.Stopwatch.Frequency));

            Console.ReadKey();
        }

        static int runIntersectionTest(List<int> set1, List<int> set2)
        {

            Dictionary<int,int> theMap = new Dictionary<int,int>(100000);

            // Now intersect the two sets by populating the map
            foreach( int value in set1 )
            {
                theMap[value] = 1;
            }

            int intersectionSize = 0;

            foreach ( int value in set2 )
            {
                int count;
                if ( theMap.TryGetValue(value, out count ) )
                {
                    theMap[value] = 2;
                    intersectionSize++;
                }
            }

            return intersectionSize;
        }
    }
}

C++ code:

// MapPerformance.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"
#include <hash_map>
#include <vector>
#include <iostream>
#include <time.h>
#include <algorithm>
#include <set>

#include <boost\unordered\unordered_map.hpp>

#include "timer.h"

using namespace std;
using namespace stdext;
using namespace boost;

int runIntersectionTest(vector<int> set1, vector<int> set2)
{
    // hash_map<int,int> theMap;
    // map<int,int> theMap;
    unordered_map<int,int> theMap;

    // Now intersect the two sets by populating the map
    for ( vector<int>::iterator iterator = set1.begin(); iterator != set1.end(); iterator++ )
    {
        int value = *iterator;

        theMap[value] = 1;
    }

    int intersectionSize = 0;

    for ( vector<int>::iterator iterator = set2.begin(); iterator != set2.end(); iterator++ )
    {
        int value = *iterator;

        unordered_map<int,int>::iterator foundValue = theMap.find(value);

        if ( foundValue != theMap.end() )
        {
            theMap[value] = 2;

            intersectionSize++;
        }
    }

    return intersectionSize;

}

int runSetIntersection(set<int> set1, set<int> set2)
{   
    set<int> intersection;

    set_intersection(set1.begin(),set1.end(), set2.begin(), set2.end(), inserter(intersection, intersection.end()));

    return intersection.size(); 
}



int _tmain(int argc, _TCHAR* argv[])
{
    srand ( time(NULL) );

    vector<int> set1;
    vector<int> set2;

    set1.reserve(10000);
    set2.reserve(1000);

    // Create 100,000 values for set1
    for ( int i = 0; i < 100000; i++ )
    {
        int value = 1000000000 + i;
        set1.push_back(value);
    }

    // Create 1,000 values for set2
    for ( int i = 0; i < 1000; i++ )
    {
        int random = rand() % 200000 + 1;
        random *= 10;

        int value = 1000000000 + random;
        set2.push_back(value);
    }


    Timer timer;
    for ( int i = 0; i < 1000; i++ )
    {
        runIntersectionTest(set1, set2);
    }
    timer.Stop();

    cout << "Found the intersection (using unordered_map) 1000 times, in " << timer.GetMilliseconds() << "ms" << endl;

    set<int> set21;
    set<int> set22;

    // Create 100,000 values for set1
    for ( int i = 0; i < 100000; i++ )
    {
        int value = 1000000000 + i;
        set21.insert(value);
    }

    // Create 1,000 values for set2
    for ( int i = 0; i < 1000; i++ )
    {
        int random = rand() % 200000 + 1;
        random *= 10;

        int value = 1000000000 + random;
        set22.insert(value);
    }

    timer.Reset();
    for ( int i = 0; i < 1000; i++ )
    {
        runSetIntersection(set21,set22);
    }
    timer.Stop();

    cout << "Found the intersection (using set_intersection) 1000 times, in " << timer.GetMilliseconds() << "ms" << endl;

    getchar();

    return 0;
}

Ok, here is the latest, with some changes:

The C++ sets are now properly setup so they have a 50% intersection (like the C#)
Set1 is shuffled so its not sorted, set2 was already not sorted
The set_intersection implementation now uses vectors, and sorts them first

C++ (Release, x64) Results:

Found the intersection of 503 values (using unordered_map) 1000 times, in 35131.1ms
Found the intersection of 494 values (using set_intersection) 1000 times, in 10317ms

So its 2x slower than C#. @Jalf: You're getting some pretty fast numbers, is there something I'm doing wrong here?

C++ Code:

// MapPerformance.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"
#include <hash_map>
#include <vector>
#include <iostream>
#include <time.h>
#include <algorithm>
#include <set>

#include <boost\unordered\unordered_map.hpp>

#include "timer.h"

using namespace std;
using namespace stdext;
using namespace boost;

int runIntersectionTest(const vector<int>& set1, const vector<int>& set2)
{
    // hash_map<int,int> theMap;
    // map<int,int> theMap;
    unordered_map<int,int> theMap;  

    vector<int>::const_iterator set1_end = set1.end();

    // Now intersect the two sets by populating the map
    for ( vector<int>::const_iterator iterator = set1.begin(); iterator != set1_end; ++iterator )
    {
        int value = *iterator;

        theMap[value] = 1;
    }

    int intersectionSize = 0;

    vector<int>::const_iterator set2_end = set2.end();

    for ( vector<int>::const_iterator iterator = set2.begin(); iterator != set2_end; ++iterator )
    {
        int value = *iterator;

        unordered_map<int,int>::iterator foundValue = theMap.find(value);

        if ( foundValue != theMap.end() )
        {
            theMap[value] = 2;

            intersectionSize++;
        }
    }

    return intersectionSize;

}

int runSetIntersection(const vector<int> set1_unsorted, const vector<int> set2_unsorted)
{   
    // Create two vectors
    std::vector<int> set1(set1_unsorted.size());
    std::vector<int> set2(set2_unsorted.size());

    // Copy the unsorted data into them
    std::copy(set1_unsorted.begin(), set1_unsorted.end(), set1.begin());
    std::copy(set2_unsorted.begin(), set2_unsorted.end(), set2.begin());

    // Sort the data
    sort(set1.begin(),set1.end());
    sort(set2.begin(),set2.end());

    vector<int> intersection;
    intersection.reserve(1000);

    set_intersection(set1.begin(),set1.end(), set2.begin(), set2.end(), inserter(intersection, intersection.end()));

    return intersection.size(); 
}

void createSets( vector<int>& set1, vector<int>& set2 )
{
    srand ( time(NULL) );

    set1.reserve(100000);
    set2.reserve(1000);

    // Create 100,000 values for set1
    for ( int i = 0; i < 100000; i++ )
    {
        int value = 1000000000 + i;
        set1.push_back(value);
    }

    // Try to get half of our values intersecting
    float ratio = 200000.0f / RAND_MAX;


    // Create 1,000 values for set2
    for ( int i = 0; i < 1000; i++ )
    {
        int random = rand() * ratio + 1;

        int value = 1000000000 + random;
        set2.push_back(value);
    }

    // Make sure set1 is in random order (not sorted)
    random_shuffle(set1.begin(),set1.end());
}

int _tmain(int argc, _TCHAR* argv[])
{
    int intersectionSize = 0;

    vector<int> set1, set2; 
    createSets( set1, set2 );

    Timer timer;
    for ( int i = 0; i < 1000; i++ )
    {
        intersectionSize = runIntersectionTest(set1, set2);
    }
    timer.Stop();

    cout << "Found the intersection of " << intersectionSize << " values (using unordered_map) 1000 times, in " << timer.GetMilliseconds() << "ms" << endl;

    timer.Reset();
    for ( int i = 0; i < 1000; i++ )
    {
        intersectionSize = runSetIntersection(set1,set2);
    }
    timer.Stop();

    cout << "Found the intersection of " << intersectionSize << " values (using set_intersection) 1000 times, in " << timer.GetMilliseconds() << "ms" << endl;

    getchar();

    return 0;
}

GManNickG almost 15 years

Or call reserve(1000), but continue using push_back
josesuero almost 15 years

or just don't include this in your timing in the first place. Your setup code should not be part of the benchmark.
Alex Black almost 15 years

My C++ code includes an implementation using set_intersection. I will take a look at "includes".
Alex Black almost 15 years

I want to include the setup time, because my data is not in one of these data structures at the start... so if the approach requires me to put it into a data structure then I need to include that tme.
Roddy almost 15 years

@jalf. Fair point. I was assuming he actually wanted to measure that but as well...
Kevin Anderson almost 15 years

I agree that there is a vast difference between initializing something to 10000 bytes for the C# and doing multiple re-allocs on the C++ side.
josesuero almost 15 years

nevertheless, it is not part of this test. You'll get much more useful benchmarking data by treating them as separate problems. First worry about the intersection performance, and then do a separate test for the "build initial data structures" part.
Alex Black almost 15 years

Agreed, the test is not strictly set intersection, it also includes populating any data structures needed for the test. I've updated the test to run 1000 times.
josesuero almost 15 years

Populating the data structures is not part of this test though. Measure that separately. You are introducing a huge amount of uncertainty which essentially invalidates your results.
Alex Black almost 15 years

I modified the C++ implementation that uses ordered_set and vector to call reserve() on the vectors. Not much change.
Alex Black almost 15 years

I modified the code and re-ran the benchmarks with the population of the data structures done before the timed tests.
Alex Black almost 15 years

Jalf: it looks to me like std::set is a sorted data structure. sgi.com/tech/stl/set.html
josesuero almost 15 years

oh right, I thought you were running that one on vectors too. (Why aren't you?)
Alex Black almost 15 years

because the first example I found of how to use set_intersection showed it working on sets. If you think switching to vectors will speed things up I will give it a try.
Alex Black almost 15 years

which ones? Optimization is set to "Maximize Speed (/O2)"
josesuero almost 15 years

I don't know if it'll speed it up, but set_intersection works on any input iterator, as long as it is a sorted sequence (so if you use vectors, you'll have to first sort it)
Alex Black almost 15 years

I set "Favor Size or Speed" to "Favor Fast Code (/Ot)", no difference.
Alex Black almost 15 years

@Jalf: I modified the code to pass sorted vectors into set_intersection. If I sorted them before the test, then the intersection was done in 352ms (1000 times!) so very fast. But if I do the sort inside the test, then it takes 2401ms.
Magnus almost 15 years

Did you figure out what was taking most of the time?
Alex Black almost 15 years

no I haven't figured out what takes most of the time. In the set_intersection test there isn't much going on except calling set_intersection :)
josesuero almost 15 years

2401 is still twice as good as your C# code though, isn't it?
Andreas Magnusson almost 15 years

Good spotting! That ought to count for some of the slowness.
Andreas Magnusson almost 15 years

Don't pass vectors by value!! const vector<int> &set1 is the C++ way.
Alex Black almost 15 years

yes, 2401ms twice as fast as C# and very good.. I like it. I'm just wondering if its a valid test given the data in set1 is already sorted (my real data is not). I'm going to look at updating the test to work on two random (not sorted) tests.
Alex Black almost 15 years

good call, sloppy mistake on my part trying to rapidly change the code.
Alex Black almost 15 years

I'll adjust and repost without copying... I have set _SECURE_SCL to 0. (#define _SECURE_SCL 0)
Alex Black almost 15 years

hey Jalf, thats awesome, very fast.. However, would it be easy to adjust so that the vectors are re-sorted each time? My data isn't sorted, so I am going to have to sort it. In real life this won't be run 1000 times on pre-sorted vectors.
SingleNegationElimination almost 15 years

This structure has just given me a chill, for it contains the first practical use of the Ackermann function, or rather its inverse. Amazing!
josesuero almost 15 years

Done. I test with both pre-sorted and unsorted vectors now. It looks like both set and unordered_set are a lot faster than in your own tests though. Can you reproduce the same results if you run my code?
Alex Black almost 15 years

Nice. One more question: set1 is already sorted... I'm trying to find a good way to randomize it while still ensuring it has values from 0 to 100,000 (excluding the offset).
josesuero almost 15 years

looks like std::random_shuffle can do that :)
Alex Black almost 15 years

I posted an updated benchmark and code below, I'm not seeing the crazy fast results you are Jalf. (it was fast, until I shuffled set1 and sorted it each time)
josesuero almost 15 years

You could write the intersection results to a vector instead of a set (should be a lot faster). Also I'd expect unordered_set to be a lot faster than plain sets
Alex Black almost 15 years

You're now getting the same times I'm getting, about 10s to use set_intersection on unsorted lists. This is a 2x improvement on where we started, but still 2x slower than C#.
josesuero almost 15 years

Does that one matter though? If set or unsorted_set is still 4x faster than C#
Alex Black almost 15 years

Set: the sorting is done outside the benchmark... so doesn't seem fair. Unordered_set: does this work? You're passing what looks like an unsorted set to set_intersection...
josesuero almost 15 years

Set is an ordered data structure, remember? So whether the input it is built from is sorted or not doesn't matter. It sorts its input on insertion anyway. About unordered_set, I think you're right, that isn't valid, since it obviously isn't sorted. Hadn't really thought of that. :)
Alex Black almost 15 years

Set: yes, so essentially it 'sorts' as you insert.. Insertions are slower into sets than into vectors. So if I started with my unsorted data, it will take me longer to get it into a set than a vector.
Alex Black almost 15 years

I adjusted the C# code now, to shuffle set1, it didn't make any diffence. The C# method doesn't care if the sets are sorted or not.
josesuero almost 15 years

yeah, I realized the same thing. I implemented the same algorithm again in C++, and so far, getting around 600ms on it, but I think I have a bug or two, so it'll be a few mins before I post it.
Alex Black almost 15 years

cool - one thing to do is to output the count of items in the intersection, as a sanity check (it should be ~500)
josesuero almost 15 years

yep, it was a bug, was testing intersect(set2, set2) :) Well, it's bedtime for me. Going to add a bit to my answer, and then head off. :)
josesuero almost 15 years

Ok, posted my final update for tonight. No new code or benchmarks, but a few hints on what you should probably try next. I might give it a shot again tomorrow. :)
Alex Black almost 15 years

I made those changes, no noticeable difference. Your new runIntersectionTest is similar in performance to the unordered_map one (about 2x slower than set_intersection)
deft_code almost 15 years

The code I posted deliberately didn't pass the vectors by reference, so the compiler would implicitly copy them. If you'd like to do it your way, replace to code before the sort with this: vector<int> set1( unsorted_set1 ); vector<int> set2( unsorted_set2 ); this way the vectors are efficiently copied. The code as it stands fills each vector with zeros. Then it copies the desired value over the top. The above method allocates enough space then copies the values from the parameter with no extra assignments. This will speed things up, though I don't know by how much
Alex Black almost 15 years

I'm not sure I 100% get you. I made a change though, I made runSetIntersection take vector<int> not vector<int>& (so it gets a copy), and then removed the code inside that function that copies the vector. This didn't make a noticeable difference. I do agree in general the benchmark would be more fair if the vector was not copied each time.