Why is await async so slow?


Solution 1

Your benchmark has a couple of flaws:

  • You are timing the first run which includes initialization time (loading class Task, JIT-compilation etc.)
  • You are using DateTime.Now, which is too inaccurate for timings in the millisecond range. You'll need to use StopWatch

With these two issues fixed, I get the following benchmark results:

Regular Sum:  499946 in 00:00:00.0047378
Async Sum:    499946 in 00:00:00.0016994
Threaded Sum: 499946 in 00:00:00.0026898

Async now comes out as the fastest solution, taking less than 2ms.

That brings up the next problem: timing something as fast as 2ms is extremely unreliable; your thread can get paused for longer than that if some other process is using the CPU in the background. You should average the results over several thousand benchmark runs.
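
The advice above can be sketched as a small helper (names and the run count are illustrative, not from the original benchmark): use `Stopwatch` rather than `DateTime.Now`, do a warm-up call so JIT and class-loading cost is excluded, and average over many runs so a single context switch can't dominate the result.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Times an action with a warm-up pass, then averages over many runs
// so one-time initialization cost and scheduler noise are amortized.
double AverageMilliseconds(Action action, int runs = 200)
{
    action(); // warm-up: triggers JIT compilation and class loading

    var sw = Stopwatch.StartNew();
    for (int i = 0; i < runs; i++)
        action();
    sw.Stop();

    return sw.Elapsed.TotalMilliseconds / runs;
}

int[] data = Enumerable.Range(0, 1_000_000).Select(i => i % 2).ToArray();
long sum = 0;
double ms = AverageMilliseconds(() => { sum = 0; foreach (int x in data) sum += x; });
Console.WriteLine($"Sum = {sum}, average {ms:F4} ms per run");
```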

Also, what's going on with your core-count detection? On my quad-core it uses a chunk size of 333334, which allows only 3 threads to run.

Solution 2

It's important to separate "asynchrony" from "parallelization". await is there to help make writing asynchronous code easier. Code that runs in parallel may (or may not) involve asynchrony, and code that is asynchronous may or may not run in parallel.

Nothing about await is designed to make parallel code faster. The purpose of await is to make writing asynchronous code easier while minimizing the negative performance implications. Using await won't ever be faster than correctly written non-await asynchronous code (although, because writing correct code with await is easier, it will sometimes be faster in practice when the programmer isn't capable of writing that asynchronous code correctly without await, or isn't willing to put the time in to do so). If the non-async code is written well, it will perform about as well as, if not a tad better than, the await code.

C# does have support specifically for parallelization; it's just not provided through await. The Task Parallel Library (TPL) as well as Parallel LINQ (PLINQ) offer several very effective means of parallelizing code that are generally more efficient than naive threaded implementations.

In your case, an effective implementation using PLINQ might be something like this:

public static int Sum(int[] array)
{
    return array.AsParallel().Sum();
}

Note that this will take care of efficiently partitioning the input sequence into chunks that run in parallel: it determines the appropriate chunk size and the number of concurrent workers, and it aggregates the workers' results in a manner that is both properly synchronized to ensure a correct result (unlike your threaded example) and efficient (meaning it won't completely serialize all of the aggregation).

Solution 3

async isn't intended for heavy-duty parallel computation. You can do basic parallel work using Task.Run with Task.WhenAll, but any serious parallel work should be done using the task parallel library (e.g., Parallel). Asynchronous code on the client side is about responsiveness, not parallel processing.
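
The basic Task.Run with Task.WhenAll pattern mentioned above might look like this (the function name and chunking scheme are illustrative): one thread-pool task per processor, each summing its own slice, with the results awaited together.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// One thread-pool task per processor, each summing a slice of the array.
async Task<long> SumChunkedAsync(int[] data)
{
    int workers = Environment.ProcessorCount;
    int chunk = (data.Length + workers - 1) / workers; // ceiling division

    var tasks = Enumerable.Range(0, workers).Select(w => Task.Run(() =>
    {
        long partial = 0;
        int end = Math.Min((w + 1) * chunk, data.Length);
        for (int i = w * chunk; i < end; i++)
            partial += data[i];
        return partial;
    }));

    long[] partials = await Task.WhenAll(tasks); // all chunks run concurrently
    return partials.Sum();
}

int[] data = Enumerable.Range(0, 1_000_000).Select(i => i % 2).ToArray();
Console.WriteLine(await SumChunkedAsync(data)); // prints 500000
```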

A common approach is to use Parallel for the parallel work, and then wrap it in a Task.Run and use await on it to keep the UI responsive.
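
A minimal sketch of that pattern, assuming a hypothetical helper method (the name and chunking are illustrative): the CPU-bound `Parallel.For` runs on the thread pool via `Task.Run`, so the thread that awaits it (e.g., the UI thread) stays free.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// The CPU-bound Parallel.For runs inside Task.Run; awaiting the returned
// task keeps the calling (UI) thread responsive while the work proceeds.
async Task<long> SumOffUiThreadAsync(int[] data)
{
    return await Task.Run(() =>
    {
        int workers = Environment.ProcessorCount;
        int chunk = (data.Length + workers - 1) / workers; // ceiling division
        long[] buckets = new long[workers];

        Parallel.For(0, workers, w =>
        {
            long s = 0;
            int end = Math.Min((w + 1) * chunk, data.Length);
            for (int i = w * chunk; i < end; i++)
                s += data[i];
            buckets[w] = s; // each worker writes only its own slot: no race
        });

        return buckets.Sum();
    });
}

int[] data = Enumerable.Range(0, 1_000_000).Select(i => i % 2).ToArray();
Console.WriteLine(await SumOffUiThreadAsync(data)); // prints 500000
```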

Solution 4

On a quick look, the results are expected: your async sum is using just one thread, while you asynchronously wait for it to finish, so it's slower than the multi-threaded sum.

You'd use async when you have something else to do while the work finishes, so this isn't the right test for any speed or responsiveness improvements.

Author: ohmusama

I work for Microsoft coding all the C#. I was also involved with the Minecraft modding community. I have created a simple Biome Object Builder to hook into Phoenix Terrain Mod in C#. You can check it out at http://faskerstudio.com/minecraft/BBOB

Updated on June 14, 2022

Comments

  • ohmusama
    ohmusama almost 2 years

I finally got VS2012 and got a simple demo up and working to check out the potential performance boost of async and await, but to my dismay it is slower! It's possible I'm doing something wrong, but maybe you can help me out. (I also added a simple threaded solution, and that runs faster as expected.)

My code uses a class to sum an array, based on the number of cores on your system (minus 1). Mine has 4 cores, so I saw about a 2x speed-up (2.5 threads) from threading, but a 2x slow-down for the same thing with async/await.

Code: (Note you will need to add a reference to System.Management to get the core detection working.)

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Threading.Tasks;
    using System.Threading;
    using System.Management;
    using System.Diagnostics;
    
    namespace AsyncSum
    {
        class Program
        {
            static string Results = "";
    
            static void Main(string[] args)
            {
                Task t = Run();
                t.Wait();
    
                Console.WriteLine(Results);
                Console.ReadKey();
            }
    
            static async Task Run()
            {
                Random random = new Random();
    
                int[] huge = new int[1000000];
    
                for (int i = 0; i < huge.Length; i++)
                {
                    huge[i] = random.Next(2);
                }
    
                ArraySum summer = new ArraySum(huge);
    
                Stopwatch sw = new Stopwatch();
    
                sw.Restart();
                long tSum = summer.Sum();
                for (int i = 0; i < 100; i++)
                {
                    tSum = summer.Sum();
                }
                long tticks = sw.ElapsedTicks / 100;
    
                long aSum = await summer.SumAsync();
                sw.Restart();
                for (int i = 0; i < 100; i++)
                {
                    aSum = await summer.SumAsync();
                }
                long aticks = sw.ElapsedTicks / 100;
    
                long dSum = summer.SumThreaded();
                sw.Restart();
                for (int i = 0; i < 100; i++)
                {
                    dSum = summer.SumThreaded();
                }
                long dticks = sw.ElapsedTicks / 100;
    
    
                long pSum = summer.SumParallel();
                sw.Restart();
                for (int i = 0; i < 100; i++)
                {
                    pSum = summer.SumParallel();
                }
                long pticks = sw.ElapsedTicks / 100;
    
                Program.Results += String.Format("Regular Sum: {0} in {1} ticks\n", tSum, tticks);
                Program.Results += String.Format("Async Sum: {0} in {1} ticks\n", aSum, aticks);
                Program.Results += String.Format("Threaded Sum: {0} in {1} ticks\n", dSum, dticks);
                Program.Results += String.Format("Parallel Sum: {0} in {1} ticks\n", pSum, pticks);
            }
        }
    
        class ArraySum
        {
            int[] Data;
            int ChunkSize = 1000;
            int cores = 1;
    
    
            public ArraySum(int[] data)
            {
                Data = data;
    
                cores = 0;
                foreach (var item in new System.Management.ManagementObjectSearcher("Select * from Win32_Processor").Get())
                {
                    cores += int.Parse(item["NumberOfCores"].ToString());
                }
                cores--;
                if (cores < 1) cores = 1;
    
                ChunkSize = Data.Length / cores + 1;
            }
    
            public long Sum()
            {
                long sum = 0;
                for (int i = 0; i < Data.Length; i++)
                {
                    sum += Data[i];
                }
                return sum;
            }
    
            public async Task<long> SumAsync()
            {
                Task<long>[] psums = new Task<long>[cores];
                for (int i = 0; i < psums.Length; i++)
                {
                    int start = i * ChunkSize;
                    int end = start + ChunkSize;
    
                    psums[i] = Task.Run<long>(() =>
                    {
                        long asum = 0;
                        for (int a = start; a < end && a < Data.Length; a++)
                        {
                            asum += Data[a];
                        }
                        return asum;
                    });
                }
    
                long sum = 0;
                for (int i = 0; i < psums.Length; i++)
                {
                    sum += await psums[i];
                }
    
                return sum;
            }
    
            public long SumThreaded()
            {
                long sum = 0;
                Thread[] threads = new Thread[cores];
                long[] buckets = new long[cores];
                for (int i = 0; i < cores; i++)
                {
                    int start = i * ChunkSize;
                    int end = start + ChunkSize;
                    int bucket = i;
                    threads[i] = new Thread(new ThreadStart(() =>
                    {
                        long asum = 0;
                        for (int a = start; a < end && a < Data.Length; a++)
                        {
                            asum += Data[a];
                        }
                        buckets[bucket] = asum;
                    }));
                    threads[i].Start();
                }
    
                for (int i = 0; i < cores; i++)
                {
                    threads[i].Join();
                    sum += buckets[i];
                }
    
                return sum;
            }
    
            public long SumParallel()
            {
                long sum = 0;
                long[] buckets = new long[cores];
                ParallelLoopResult lr = Parallel.For(0, cores, new Action<int>((i) =>
                {
                    int start = i * ChunkSize;
                    int end = start + ChunkSize;
                    int bucket = i;
                    long asum = 0;
                    for (int a = start; a < end && a < Data.Length; a++)
                    {
                        asum += Data[a];
                    }
                    buckets[bucket] = asum;
                }));
    
                for (int i = 0; i < cores; i++)
                {
                    sum += buckets[i];
                }
    
                return sum;
            }
        }
    }
    

    Any thoughts? Am I doing async/await wrong? I'll be happy to try any suggestions.

  • Jodrell
    Jodrell about 11 years
@ohmusama, Not reinventing the wheel has numerous benefits +1; unless you know something special about the data, this is about as fast as anything you can come up with. It's quick to type and easy to get right. If the data were distributed according to a known function, you might be able to short-circuit the operation, but that doesn't apply here.
  • Alexei Levenkov
    Alexei Levenkov about 11 years
    +1. And being single threaded in async/await case explains slowdown - there is much more code to run than basic Sum call to support all async infrastructure.
  • Servy
    Servy about 11 years
    The issues with benchmarking go beyond just that. You also aren't doing enough for the results to be statistically meaningful. The run method should be called a few thousand times and the results averaged for starters, GC collections should be run between timings, warmup runs through the code to ensure everything is JITted should be done, and I'm sure many other things. Having said all that, the question itself is flawed (as per my answer) not just the benchmarking tests.
  • ohmusama
    ohmusama about 11 years
    I like that I can just do a parallel sum, but, I want to know the full way of doing it. so I added a parallel test, is this a good example?
  • Servy
    Servy about 11 years
@ohmusama Not really. Parallel.For isn't particularly good at aggregating work in parallel, so you'll do much better using PLINQ, as per my example. Your loop isn't really going to function much differently than your other examples. For a non-aggregate operation, where each item in the collection is processed independently, the ability of Parallel.For to partition the data better than you can and dynamically determine the appropriate number of workers will help you see noticeable improvements.
  • Jodrell
    Jodrell about 11 years
    I suppose, given what we know about the generation of the data, it may be optimal to approximate the answer to 1000000
  • Stephen Cleary
    Stephen Cleary about 11 years
    Parallel.For can aggregate, but the syntax is awkward. For a simple summing example, PLINQ is better.
  • Daniel
    Daniel about 11 years
    This solution is slower than the single-threaded one! Not sure why; maybe AsParallel() forces access to the array to go through an IEnumerator, adding significant overhead?
  • Daniel
    Daniel about 11 years
    The results I posted are quite stable across multiple runs; but you're right that you should do many thousands of runs so that you don't run the risk of a context switch completely screwing up your results; I've edited my answer.
  • Servy
    Servy about 11 years
    @Daniel Well, the first thing I'd question is your method of benchmarking, because benchmarking is hard and people don't often do it right. Next, there is overhead involved in setting all of this up, and if the time spent doing the productive work is sufficiently small the overhead can cost more than the gains from parallelization; adding more work, or more costly work, will help mitigate that overhead in benchmarks.
  • ohmusama
    ohmusama about 11 years
@Servy Okay, I ran the same test over again, and Daniel is right: the slow-down is actually caused by the lambda expression being compiled on the first run. Running it again results in faster times than the linear approach (even with async/await).
  • ohmusama
    ohmusama about 11 years
@Daniel, You actually have the right answer up in my comment; if you could add it to your answer here, I will give you an upvote. The thing is that the lambda expression must be compiled at run time, so the first run through is very slow, but the second run through is faster than the linear sum. Thanks for finding that!
  • Daniel
    Daniel about 11 years
    @ohmusama: that's what I meant with the first point (initialization time). And it's not related to the use of a lambda expression (initialization of lambdas isn't any more expensive than initialization of ordinary methods), just the fact that more code needs to be loaded for the whole async/await machinery to work.
  • Marc2001
    Marc2001 over 5 years
    @StephenCleary It doesn't really look like hard compute-bound work to me
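
The Parallel.For aggregation Stephen Cleary mentions can be sketched with the localInit/localFinally overload (the data here is illustrative, mirroring the question's 0/1 array). Each worker keeps a thread-local subtotal, so there is no shared-state race, at the cost of the admittedly awkward signature:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

int[] data = Enumerable.Range(0, 1_000_000).Select(i => i % 2).ToArray();

long total = 0;
Parallel.For(0, data.Length,
    () => 0L,                                   // localInit: per-worker subtotal
    (i, state, local) => local + data[i],       // body: accumulate locally, no locking
    local => Interlocked.Add(ref total, local)  // localFinally: merge once per worker
);
Console.WriteLine(total); // prints 500000
```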