Chunk partitioning IEnumerable in Parallel.Foreach
If your IEnumerable is really something that has an indexer (i.e. you could do obj[1] to get an item out) you could do the following:
var rangePartitioner = Partitioner.Create(0, source.Length);
Parallel.ForEach(rangePartitioner, (range, loopState) =>
{
    // Loop over each range element without a delegate invocation.
    for (int i = range.Item1; i < range.Item2; i++)
    {
        var item = source[i];
        // Do work on item
    }
});
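The snippet above assumes an existing indexable source; a minimal self-contained sketch of the same range-partitioning pattern (the source array and the summing workload are stand-ins for illustration) looks like this:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

class RangePartitionDemo
{
    static void Main()
    {
        // Hypothetical indexable source standing in for the real data.
        int[] source = new int[1000];
        for (int i = 0; i < source.Length; i++) source[i] = i;

        long sum = 0;

        // Split [0, Length) into ranges; each thread gets a contiguous slice.
        var rangePartitioner = Partitioner.Create(0, source.Length);
        Parallel.ForEach(rangePartitioner, range =>
        {
            long localSum = 0;
            // Tight loop over the range; no per-item delegate invocation.
            for (int i = range.Item1; i < range.Item2; i++)
                localSum += source[i];
            Interlocked.Add(ref sum, localSum);
        });

        Console.WriteLine(sum); // 0 + 1 + ... + 999 = 499500
    }
}
```

The per-range local sum plus a single Interlocked.Add keeps the ranges independent, which is the point of range partitioning: each thread works on its slice with no shared-state contention inside the loop.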
However, if it can't do that, you must write a custom partitioner by creating a new class derived from System.Collections.Concurrent.Partitioner&lt;TSource&gt;. That subject is too broad to cover in a SO answer, but you can take a look at this guide on MSDN to get you started.
UPDATE: As of .NET 4.5 they added a Partitioner.Create overload that does not buffer data; it has the same effect as making a custom partitioner with a max range size of 1. With this you won't get a single thread stuck with a bunch of queued-up work if it gets unlucky with a run of slow items.
var partitioner = Partitioner.Create(source, EnumerablePartitionerOptions.NoBuffering);
Parallel.ForEach(partitioner, item =>
{
    // Do work
});
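A runnable sketch of the NoBuffering approach, using a lazy sequence as a stand-in for the asker's IEnumerable (the sequence and the summing body are assumptions for illustration):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

class NoBufferingDemo
{
    // A lazy sequence standing in for an IEnumerable of unknown length.
    static IEnumerable<int> Items()
    {
        for (int i = 0; i < 100; i++)
            yield return i;
    }

    static void Main()
    {
        // NoBuffering hands each worker one item at a time, so a slow item
        // doesn't strand a queue of buffered work behind it on one thread.
        var partitioner = Partitioner.Create(
            Items(), EnumerablePartitionerOptions.NoBuffering);

        long total = 0;
        Parallel.ForEach(partitioner, item =>
        {
            Interlocked.Add(ref total, item);
        });

        Console.WriteLine(total); // 0 + 1 + ... + 99 = 4950
    }
}
```

The trade-off is per-item synchronization on the shared enumerator, so NoBuffering pays off when the per-item work dominates, which matches the asker's hours-long items.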
bjoern
Updated on June 17, 2022

Comments
-
bjoern almost 2 years
Does anyone know of a way to get the Parallel.ForEach loop to use chunk partitioning versus what I believe is range partitioning by default? It seems simple when working with arrays because you can just create a custom partitioner and set load balancing to true.
Since the number of elements in an IEnumerable isn't known until runtime I can't seem to figure out a good way to get chunk partitioning to work.
Any help would be appreciated.
thanks!
The tasks I'm trying to perform on each object take significantly different times to complete. At the end I'm usually waiting hours for the last thread to finish its work. What I'm trying to achieve is to have the parallel loop request chunks along the way instead of pre-allocating items to each thread.
-
It'sNotALie. about 11 years Why would you want this (out of interest)?
-
SimpleVar about 11 years Do you chunk it up by number of chunks or by size per chunk?
-
Scott Chamberlain about 11 years Do you have an IEnumerable, or do you have something that implements an indexer (so you could do obj[i] on it)? If you can pass in an index I have a solution.
-
JosephHirn about 11 years I'm not entirely sure I understand what you're trying to do. But it sounds like you could start by looking into the Partitioner class.
-
bjoern about 11 years I'm trying to dynamically partition it by number of chunks so the Parallel.ForEach loop keeps asking the partitioner for more work. It seems easy to do with arrays: msdn.microsoft.com/en-us/library/dd997411(v=vs.100).aspx
-
SimpleVar about 11 years If you're doing it by number of chunks, you must have the total count. To get it, you need to consume the IEnumerable either way, so I would just use an array.
-
JosephHirn about 11 years If it's easy to do with arrays, and you have an IEnumerable, why not just .ToArray() it?
-
bjoern about 11 years What's the downside of converting my IEnumerable to an array prior to entering the loop? I'm assuming there is a pretty large one-time performance hit, no?
-
Scott Chamberlain about 11 years It depends on what the IEnumerable is sourced from and what is happening in the yield return loop from the data source. If the concrete object is an Array already, (almost) no work is performed at all.
-
SimpleVar about 11 years Well, it does take its time to consume the IEnumerable. BUT if you know that you'll go over it all either way, it can even improve performance. The whole idea of IEnumerable is "not having to consume it all, for the case where you don't go over all of it". And as Scott mentioned, if the IEnumerable wraps an array or a list already, then there is no performance issue at all.
-
SimpleVar about 11 years By the way, is the order of the elements an issue? If not, you could "push" the elements into different IEnumerables using their index % chunksCount as "chunk id" (even with Linq).
-
bjoern about 11 years No, the order isn't an issue. The problem with .ToArray() might be that this array will have a Count() around 3 billion, even though every object is relatively small (5 string properties). I'm going to try to see if the array even returns.
-
bjoern about 11 years Thanks guys! .ToArray() was the way to go.
-
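SimpleVar's index % chunksCount suggestion from the comments can be sketched with LINQ. This is a hypothetical illustration (the chunk count and sample data are assumptions), showing how GroupBy on the element's index spreads an unordered sequence across a fixed number of chunks:

```csharp
using System;
using System.Linq;

class ModuloChunkDemo
{
    static void Main()
    {
        int chunksCount = 4; // assumed number of chunks
        var source = Enumerable.Range(0, 10); // stand-in data

        // Tag each element with its index, then group by index % chunksCount
        // so elements are dealt round-robin into chunksCount buckets.
        var chunks = source
            .Select((item, index) => new { item, index })
            .GroupBy(x => x.index % chunksCount, x => x.item);

        foreach (var chunk in chunks)
            Console.WriteLine($"chunk {chunk.Key}: {string.Join(",", chunk)}");
    }
}
```

Each chunk could then be handed to its own task. Note this only balances by element count, not by per-item cost, so it doesn't help when item durations vary wildly, which is why the partitioner-based answers above fit the asker's problem better.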