Parallel.ForEach vs Task.Factory.StartNew


Solution 1

The first is a much better option.

Parallel.ForEach, internally, uses a Partitioner<T> to distribute your collection into work items. It will not create one task per item; instead, it batches items to lower the overhead involved.

The second option will schedule a single Task per item in your collection. While the results will be (nearly) the same, this will introduce far more overhead than necessary, especially for large collections, and cause the overall runtimes to be slower.

FYI - The Partitioner used can be controlled by using the appropriate overloads to Parallel.ForEach, if so desired. For details, see Custom Partitioners on MSDN.
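For example, a load-balancing partitioner hands each worker a chunk of items rather than a single item at a time. A minimal sketch (the per-item work here is just an atomic add, standing in for your real DoSomething):

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class PartitionerDemo
{
    static void Main()
    {
        int[] items = Enumerable.Range(1, 1000).ToArray();
        long total = 0;

        // Partitioner.Create(..., loadBalance: true) hands out chunks of
        // items, so workers pay scheduling overhead per chunk, not per item.
        var partitioner = Partitioner.Create(items, loadBalance: true);
        Parallel.ForEach(partitioner, item => Interlocked.Add(ref total, item));

        Console.WriteLine(total); // 500500
    }
}
```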

The main difference at runtime is that the second acts asynchronously. This can be duplicated using Parallel.ForEach by doing:

Task.Factory.StartNew( () => Parallel.ForEach<Item>(items, item => DoSomething(item)));

By doing this, you still take advantage of the partitioners, but don't block until the operation is complete.
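The returned Task can be stored and waited on only when the results are actually needed. A sketch (using a simple counter in place of real per-item work):

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class FireAndWaitDemo
{
    static void Main()
    {
        int[] items = Enumerable.Range(0, 100).ToArray();
        int processed = 0;

        // Run the whole partitioned loop as a single background task.
        Task loopTask = Task.Factory.StartNew(() =>
            Parallel.ForEach(items, item => Interlocked.Increment(ref processed)));

        // ... the caller is free to do other work here ...

        loopTask.Wait(); // block only when the results are needed
        Console.WriteLine(processed); // 100
    }
}
```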

Solution 2

I ran a small experiment that executes a method 1,000,000,000 (one billion) times, once with Parallel.For and once with Task objects.

I measured the processor time and found Parallel.For more efficient. Parallel.For divides your work into small work items and executes them on all the cores in parallel in an optimal way. Creating a lot of Task objects (FYI, the TPL uses thread pooling internally) moves every execution onto its own task, putting more stress on the box, as is evident from the experiment below.

I have also created a small video that explains basic TPL and demonstrates how Parallel.For utilizes your cores more efficiently compared to normal tasks and threads: http://www.youtube.com/watch?v=No7QqSc5cl8

Experiment 1

Parallel.For(0, 1000000000, x => Method1());

Experiment 2

for (int i = 0; i < 1000000000; i++)
{
    Task o = new Task(Method1);
    o.Start();
}

Processor time comparison

Solution 3

Parallel.ForEach will optimize (it may not even start new threads) and block until the loop is finished, while Task.Factory.StartNew will explicitly create a new task instance for each item and return before they are finished (asynchronous tasks). Parallel.ForEach is much more efficient.
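That difference is easy to observe: after Parallel.ForEach returns, every item has been processed; after the foreach/StartNew loop, some tasks may still be running until you explicitly wait. A sketch:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class BlockingDemo
{
    static void Main()
    {
        int[] items = Enumerable.Range(0, 50).ToArray();

        int done = 0;
        Parallel.ForEach(items, _ => Interlocked.Increment(ref done));
        Console.WriteLine(done); // always 50: the call blocked until finished

        done = 0;
        var tasks = new List<Task>();
        foreach (var item in items)
            tasks.Add(Task.Factory.StartNew(() => Interlocked.Increment(ref done)));
        // 'done' may be anything from 0 to 50 here: the loop did not block.
        Task.WaitAll(tasks.ToArray());
        Console.WriteLine(done); // 50 once we explicitly wait
    }
}
```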

Solution 4

In my view the most realistic scenario is when tasks have a heavy operation to complete. Shivprasad's approach focuses more on object creation/memory allocation than on the computation itself. I investigated this by calling the following method:

public static double SumRootN(int root)
{
    double result = 0;
    for (int i = 1; i < 10000000; i++)
    {
        result += Math.Exp(Math.Log(i) / root);
    }
    return result;
}

Execution of this method takes about 0.5 seconds.

I called it 200 times using Parallel:

Parallel.For(0, 200, (int i) =>
{
    SumRootN(10);
});

Then I called it 200 times using the old-fashioned way:

List<Task> tasks = new List<Task>();
for (int i = 0; i < 200; i++)
{
    Task t = new Task(() => SumRootN(10));
    t.Start();
    tasks.Add(t);
}

Task.WaitAll(tasks.ToArray()); 

The first case completed in 26656 ms, the second in 24478 ms. I repeated this many times; every time, the second approach was marginally faster.

Author: stackoverflowuser

Updated on December 24, 2020
Comments

  • stackoverflowuser
    stackoverflowuser over 3 years

    What is the difference between the below code snippets? Won't both be using threadpool threads?

    For instance if I want to call a function for each item in a collection,

    Parallel.ForEach<Item>(items, item => DoSomething(item));
    
    vs
    
    foreach(var item in items)
    {
      Task.Factory.StartNew(() => DoSomething(item));
    }
    
  • Mal Ross
    Mal Ross about 13 years
    IIRC, the default partitioning done by Parallel.ForEach also takes into account the number of hardware threads available, saving you from having to work out the optimum number of Tasks to start. Check out Microsoft's Patterns of Parallel Programming article; it's got great explanations of all of this stuff in it.
  • Reed Copsey
    Reed Copsey about 13 years
    @Mal: Sort of... That's actually not the Partitioner, but rather the job of the TaskScheduler. The TaskScheduler, by default, uses the new ThreadPool, which handles this very well now.
  • Mal Ross
    Mal Ross about 13 years
    Thanks. I knew I should've left in the "I'm no expert, but..." caveat. :)
  • Konstantin Tarkus
    Konstantin Tarkus almost 12 years
    @ReedCopsey: How to attach tasks started via Parallel.ForEach to the wrapper task? So that when you call .Wait() on a wrapper task it hangs until tasks running in parallel are completed?
  • Konstantin Tarkus
    Konstantin Tarkus almost 12 years
    ...for example, if I want to make multiple HTTP requests in parallel with HttpClient.GetStringAsync(), should I still use Parallel.ForEach, or does it not make sense in this case?
  • Reed Copsey
    Reed Copsey almost 12 years
    @Tarkus If you're making multiple requests, you're better off just using HttpClient.GetString in each work item (in your Parallel loop). No reason to put an async option inside of the already concurrent loop, typically...
  • Tim
    Tim over 10 years
    It would be more efficient; the reason is that creating threads is costly. Experiment 2 is a very bad practice.
  • Shivprasad Koirala
    Shivprasad Koirala almost 10 years
    @Georgi-it, please elaborate on what is bad.
  • Georgi-it
    Georgi-it almost 10 years
    I am sorry, my mistake, I should have clarified. I mean the creation of Tasks in a loop to 1000000000. The overhead is unimaginable. Not to mention that Parallel cannot create more than 63 tasks at a time, which makes it much more optimized in this case.
  • Tedd Hansen
    Tedd Hansen almost 9 years
    This is true for 1000000000 tasks. However when I process an image (repeatedly, zooming fractal) and do Parallel.For on lines a lot of the cores are idle while waiting for the last threads to finish up. To make it faster I subdivided the data myself into 64 work packages and created tasks for it. (Then Task.WaitAll to wait for completion.) The idea is to have idle threads pick up a work package to help finish the work instead of waiting for 1-2 threads to finish up their (Parallel.For) assigned chunk.
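    Tedd's subdivision approach can be sketched like this (assumptions: ProcessLine stands in for the per-line fractal work, 1000 lines, and 64 packages as in the comment; the counter exists only to verify coverage):

    ```csharp
    using System;
    using System.Linq;
    using System.Threading;
    using System.Threading.Tasks;

    class WorkPackageDemo
    {
        const int Lines = 1000;   // e.g. image height
        const int Packages = 64;  // more packages than cores, so idle threads keep helping

        static int processed;

        // Stand-in for the per-line fractal computation.
        static void ProcessLine(int line) => Interlocked.Increment(ref processed);

        static void Main()
        {
            int chunk = (Lines + Packages - 1) / Packages; // ceiling division

            // One task per package; the thread pool hands remaining packages to
            // whichever threads finish early, instead of leaving them idle while
            // a statically assigned chunk drags on.
            Task[] tasks = Enumerable.Range(0, Packages)
                .Select(p => Task.Run(() =>
                {
                    for (int line = p * chunk; line < Math.Min((p + 1) * chunk, Lines); line++)
                        ProcessLine(line);
                }))
                .ToArray();

            Task.WaitAll(tasks); // block until every package is done

            Console.WriteLine(processed); // 1000
        }
    }
    ```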
  • Paul Chernoch
    Paul Chernoch almost 9 years
    Thanks for the Task.Factory.StartNew wrapping Parallel.ForEach idea. Exactly what I wanted. I need to wait for all my producer tasks to complete, then I can mark the BlockingCollection as finished. Only when the collection won't be getting any more items can I wait for the single consumer task to complete.
  • Sudhir.net
    Sudhir.net almost 9 years
    @Reed, Parallel is not asynchronous and multithreaded like Task. On a multicore system, can it provide parallelism?
  • Reed Copsey
    Reed Copsey almost 9 years
    @Sudhir.net Parallel is most definitely multithreaded... The Parallel methods are not asynchronous, hence the suggestion to wrap in a task if that's required.
  • Sudhir.net
    Sudhir.net almost 9 years
    Thanks for the prompt response. Also, multithreaded and asynchronous are not the same? I want to understand the difference between the Task class and the Parallel class in real projects.
  • Reed Copsey
    Reed Copsey almost 9 years
    @Sudhir.net I recommend reading reedcopsey.com/series/parallelism-in-net4
  • Sudhir.net
    Sudhir.net over 8 years
    @Reed, thanks, I am going through that blog. Can you please tell me the difference between Task and ThreadPool?
  • Sudhir.net
    Sudhir.net over 8 years
    @Reed: Please look at this query: stackoverflow.com/questions/31871206/… and provide some suggestions.
  • Zapnologica
    Zapnologica almost 7 years
    What does Method1() do in this example?
  • Suncat2000
    Suncat2000 over 4 years
    Using Parallel.For is the old-fashioned way. Using Task is recommended for units of work that are not uniform. Microsoft MVPs and designers of the TPL also mention that using Tasks will use threads more efficiently, i.e. not block as many while waiting for other units to complete.