Chunk partitioning IEnumerable in Parallel.Foreach
If your IEnumerable is really backed by something that has an indexer (i.e. you can do obj[1] to get an item out) you could do the following:

    var rangePartitioner = Partitioner.Create(0, source.Length);
    Parallel.ForEach(rangePartitioner, (range, loopState) =>
    {
        // Loop over each range element without a delegate invocation. 
        for (int i = range.Item1; i < range.Item2; i++)
        {
            var item = source[i];
            // Do work on item
        }
    });

However, if it can't do that, you must write a custom partitioner by creating a new class derived from System.Collections.Concurrent.Partitioner&lt;TSource&gt;. That subject is too broad to cover in an SO answer, but you can take a look at this guide on MSDN to get you started.
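As a rough illustration, a chunking partitioner along those lines can be sketched as below. The class name, chunk size, and locking strategy here are my own assumptions (loosely following the pattern in the MSDN guide), not a definitive implementation: all worker threads pull fixed-size chunks from one shared enumerator, so a thread that finishes early simply comes back for more.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Concurrent;
using System.Linq;

// Hypothetical minimal chunk partitioner: hands out fixed-size chunks
// from a single shared enumerator so fast threads come back for more work.
public class ChunkPartitioner<T> : Partitioner<T>
{
    private readonly IEnumerable<T> _source;
    private readonly int _chunkSize;

    public ChunkPartitioner(IEnumerable<T> source, int chunkSize)
    {
        _source = source;
        _chunkSize = chunkSize;
    }

    public override bool SupportsDynamicPartitions => true;

    public override IList<IEnumerator<T>> GetPartitions(int partitionCount)
    {
        // Each partition is just an enumerator over the shared dynamic partitions.
        var dynamicPartitions = GetDynamicPartitions();
        return Enumerable.Range(0, partitionCount)
                         .Select(_ => dynamicPartitions.GetEnumerator())
                         .ToList();
    }

    public override IEnumerable<T> GetDynamicPartitions()
    {
        return new DynamicPartitions(_source.GetEnumerator(), _chunkSize);
    }

    private class DynamicPartitions : IEnumerable<T>
    {
        private readonly IEnumerator<T> _shared;
        private readonly int _chunkSize;

        public DynamicPartitions(IEnumerator<T> shared, int chunkSize)
        {
            _shared = shared;
            _chunkSize = chunkSize;
        }

        public IEnumerator<T> GetEnumerator()
        {
            while (true)
            {
                var chunk = new List<T>(_chunkSize);
                lock (_shared) // serialize access to the single source enumerator
                {
                    while (chunk.Count < _chunkSize && _shared.MoveNext())
                        chunk.Add(_shared.Current);
                }
                if (chunk.Count == 0)
                    yield break; // source exhausted
                foreach (var item in chunk)
                    yield return item;
            }
        }

        System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
            => GetEnumerator();
    }
}
```

You would then pass it straight to the loop, e.g. Parallel.ForEach(new ChunkPartitioner&lt;MyItem&gt;(source, 100), item => { /* work */ });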

UPDATE: As of .NET 4.5 there is a Partitioner.Create overload that does not buffer data; it has the same effect as a custom partitioner with a max range size of 1. With this you won't get a single thread stuck with a bunch of queued-up work because it got unlucky with several slow items in a row.

    var partitioner = Partitioner.Create(source, EnumerablePartitionerOptions.NoBuffering);
    Parallel.ForEach(partitioner, item =>
    {
        // Do work on item
    });

Author: bjoern

Updated on June 17, 2022

Comments

  • bjoern (almost 2 years)

    Does anyone know of a way to get the Parallel.ForEach loop to use chunk partitioning versus what I believe is range partitioning by default? It seems simple when working with arrays because you can just create a custom partitioner and set load balancing to true.

    Since the number of elements in an IEnumerable isn't known until runtime I can't seem to figure out a good way to get chunk partitioning to work.

    Any help would be appreciated.

    thanks!

    The tasks I'm trying to perform on each object take significantly different amounts of time. At the end I'm usually waiting hours for the last thread to finish its work. What I'm trying to achieve is to have the parallel loop request chunks along the way instead of pre-allocating items to each thread.

    • It'sNotALie. (about 11 years)
      Why would you want this (out of interest)?
    • SimpleVar (about 11 years)
      Do you want to chunk it up by number of chunks or by size per chunk?
    • Scott Chamberlain (about 11 years)
      Do you have an IEnumerable, or do you have something that implements an indexer (so you could do obj[i] on it)? If you can pass in an index I have a solution.
    • JosephHirn (about 11 years)
      I'm not entirely sure I understand what you're trying to do. But it sounds like you could start by looking into the Partitioner class.
    • bjoern (about 11 years)
      I'm trying to dynamically partition it by number of chunks so the Parallel.ForEach loop keeps asking the partitioner for more work. It seems easy to do with arrays: msdn.microsoft.com/en-us/library/dd997411(v=vs.100).aspx
    • SimpleVar (about 11 years)
      If you're doing it by number of chunks, you must have the total count. To get it, you need to consume the IEnumerable either way, so I would just use an array.
    • JosephHirn (about 11 years)
      If it's easy to do with arrays, and you have an IEnumerable, why not just .ToArray() it?
    • bjoern (about 11 years)
      What's the downside of converting my IEnumerable to an array prior to entering the loop? I'm assuming there is a pretty large one-time performance hit, no?
    • Scott Chamberlain (about 11 years)
      It depends on what the IEnumerable is sourced from and what is happening in the yield return loop from the data source. If the concrete object is an Array already (almost) no work is performed at all.
    • SimpleVar (about 11 years)
      Well, it does take its time to consume the IEnumerable. BUT if you know that you'll go over it all either way, it can even improve performance. The whole idea of IEnumerable is "not having to consume all, for the case where you don't go over all". And as Scott mentioned, if the IEnumerable wraps an array or a list already, then there is no performance issue at all.
    • SimpleVar (about 11 years)
      By the way, is the order of the elements an issue? If not, you could "push" the elements into different IEnumerables using their index % chunksCount as a "chunk id" (even with Linq).
    • bjoern (about 11 years)
      No, the order isn't an issue. The problem with .ToArray() might be that this array will have a count around 3 billion, even though every object is relatively small (5 string properties). I'm going to try to see if the array even returns.
    • bjoern (about 11 years)
      Thanks guys! .ToArray() was the way to go.