Process List of Files Asynchronously using async and await in C# Console App


Solution 1

I combined the suggestions from the comments above to reach my solution. It turns out I didn't need the async or await keywords at all: I only had to create a list of tasks, start them all, and then call Task.WaitAll. Here is the resulting code:

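using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public class MyClass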
{
    private int filesRead = 0;

    public void Go()
    {
        string[] fileSystemEntries = Directory.GetFileSystemEntries(@"Path\To\Files");

        Console.WriteLine("Starting to read from files! Count: {0}", fileSystemEntries.Length);
        List<Task> tasks = new List<Task>();
        foreach (var filePath in fileSystemEntries.OrderBy(s => s))
        {
            Task task = Task.Run(() => DoStuff(filePath));
            tasks.Add(task);
        }
        Task.WaitAll(tasks.ToArray());
        Console.WriteLine("Finish! Read {0} file(s).", filesRead);
    }

    private void DoStuff(string filePath)
    {
        string fileName = Path.GetFileName(filePath);
        string firstLineOfFile = File.ReadLines(filePath).First();
        Console.WriteLine("[{0}] {1}: {2}", Thread.CurrentThread.ManagedThreadId, fileName, firstLineOfFile);
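        // Tasks run concurrently, so increment atomically; filesRead++ could lose updates.
        Interlocked.Increment(ref filesRead);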
    }
}

When testing, I added Thread.Sleep calls and busy loops to peg the CPUs on my machine. In Task Manager, I saw every core pegged during the busy loops, and on every run the files were processed in a different order (a good sign, since it suggests the only bottleneck is the number of available threads).

On every run, fileSystemEntries.Length matched filesRead.
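A matching count is only guaranteed when the shared counter is updated atomically, because DoStuff runs on several threads at once and a plain ++ is a read-modify-write that can lose increments. As a minimal sketch of that pattern in isolation (CounterDemo and CountWithInterlocked are hypothetical names, not part of the code above):

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class CounterDemo
{
    // Hypothetical helper: performs `iterations` increments from parallel
    // workers and returns the final value of the shared counter.
    public static int CountWithInterlocked(int iterations)
    {
        int count = 0;
        Parallel.ForEach(Enumerable.Range(0, iterations), _ =>
        {
            // Atomic increment: no update is lost even when several
            // threads reach this line at the same time.
            Interlocked.Increment(ref count);
        });
        return count;
    }
}
```

With Interlocked.Increment the returned value equals the number of iterations on every run; with count++ in its place it could come up short under contention.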

EDIT: Based on the comment discussion above, I've found that a cleaner (and, based on the linked question in the comments, more efficient) solution is to use Parallel.ForEach:

public class MyClass
{
    private int filesRead;

    public void Go()
    {
        string[] fileSystemEntries = Directory.GetFileSystemEntries(@"Path\To\Files");

        Console.WriteLine("Starting to read from files! Count: {0}", fileSystemEntries.Length);
        Parallel.ForEach(fileSystemEntries, DoStuff);
        Console.WriteLine("Finish! Read {0} file(s).", filesRead);
    }

    private void DoStuff(string filePath)
    {
        string fileName = Path.GetFileName(filePath);
        string firstLineOfFile = File.ReadLines(filePath).First();
        Console.WriteLine("[{0}] {1}: {2}", Thread.CurrentThread.ManagedThreadId, fileName, firstLineOfFile);
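        // Parallel.ForEach calls DoStuff from several threads; increment atomically.
        Interlocked.Increment(ref filesRead);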
    }
}

It seems there are many ways to approach asynchronous programming in C# now. Between Parallel, Task, and async/await, there's a lot to choose from. Based on this thread, the best solution for me looks like Parallel: it is the cleanest, it is more efficient than manually creating Task objects, and it doesn't clutter the code with async and await keywords while achieving similar results.

Solution 2

One of the major design goals behind async/await was to facilitate the use of naturally asynchronous I/O APIs. In this light, your code might be rewritten like this (untested):

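using System;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public class MyClass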
{
    private int filesRead = 0;

    public void Go()
    {
        GoAsync().Wait();
    }

    private async Task GoAsync()
    {
        string[] fileSystemEntries = Directory.GetFileSystemEntries(@"Path\To\Files");

        Console.WriteLine("Starting to read from files! Count: {0}", fileSystemEntries.Length);

        var tasks = fileSystemEntries.OrderBy(s => s).Select(
            fileName => DoStuffAsync(fileName));
        await Task.WhenAll(tasks.ToArray());

        Console.WriteLine("Finish! Read {0} file(s).", filesRead);
    }

    private async Task DoStuffAsync(string filePath)
    {
        string fileName = Path.GetFileName(filePath);
        using (var reader = new StreamReader(filePath))
        {
            string firstLineOfFile = 
                await reader.ReadLineAsync().ConfigureAwait(false);
            Console.WriteLine("[{0}] {1}: {2}", Thread.CurrentThread.ManagedThreadId, fileName, firstLineOfFile);
            Interlocked.Increment(ref filesRead);
        }
    }
}

Note that it doesn't spawn any new threads explicitly, though that may happen behind the scenes with await reader.ReadLineAsync().ConfigureAwait(false).
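Whether that read is truly asynchronous depends on how the underlying file is opened. As a hedged sketch (not part of the answer's code): a FileStream opened with useAsync: true requests asynchronous I/O from the operating system, whereas a stream opened without it may service async calls by running the synchronous read on a thread-pool thread. The names AsyncFirstLine and ReadFirstLineAsync are hypothetical, and the UTF-8 encoding and 4096-byte buffer are assumptions.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

public static class AsyncFirstLine
{
    // Hypothetical helper: reads the first line of a file through a
    // FileStream that was opened for asynchronous I/O.
    public static async Task<string> ReadFirstLineAsync(string filePath)
    {
        using (var stream = new FileStream(
            filePath, FileMode.Open, FileAccess.Read, FileShare.Read,
            bufferSize: 4096, useAsync: true))
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            // ReadLineAsync returns null when the file is empty.
            return await reader.ReadLineAsync().ConfigureAwait(false);
        }
    }
}
```

DoStuffAsync could then await AsyncFirstLine.ReadFirstLineAsync(filePath) instead of constructing the StreamReader from the path directly.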

Author: Doctor Blue

Updated on June 05, 2022

Comments

  • Doctor Blue
    Doctor Blue almost 2 years

    I'm playing around with async and await in C# in a simple little console application. My goal is simple: to process a list of files in an asynchronous manner, so that the processing of one file does not block the processing of others. None of the files depend on one another, and there are (let's say) thousands of files to go through.

    Here is the code I have currently.

    public class MyClass
    {
        public void Go()
        {
            string[] fileSystemEntries = Directory.GetFileSystemEntries(@"Path\To\Files");
    
            Console.WriteLine("Starting to read from files!");
            foreach (var filePath in fileSystemEntries.OrderBy(s => s))
            {
                Task task = new Task(() => DoStuff(filePath));
                task.Start();
                task.Wait();
            }
        }
    
        private async void DoStuff(string filePath)
        {
            await Task.Run(() =>
            {
                Thread.Sleep(1000);
                string fileName = Path.GetFileName(filePath);
                string firstLineOfFile = File.ReadLines(filePath).First();
                Console.WriteLine("{0}: {1}", fileName, firstLineOfFile);
            });
        }
    }
    

    And my Main() method simply invokes this class:

    public static class Program
    {
        public static void Main()
        {
            var myClass = new MyClass();
            myClass.Go();
        }
    }
    

    There's some piece of this asynchronous programming pattern that I seem to be missing, though, since whenever I run the program, the number of files actually processed seems random, anywhere from none of them to all six of them (in my example file set).

    Basically, the main thread isn't waiting for all of the files to be processed, which I suppose is part of the point of asynchronously-running things, but I don't quite want that. All I want is: Process as many of these files in as many threads as you can, but still wait for them all to complete processing before finishing up.

  • Stephen Cleary
    Stephen Cleary about 10 years
    Tip: Use Task.Run instead of new Task + Task.Start.
  • Doctor Blue
    Doctor Blue about 10 years
    Thanks! Updated answer to reflect this.
  • Daniel Mann
    Daniel Mann about 10 years
    Task.Run is the wrong approach here. Use Task.Run for long-running, CPU-bound operations. Use async/await for long-running, I/O-bound operations. Noseratio's answer is the correct approach to be taking here.
  • Doctor Blue
    Doctor Blue about 10 years
    I actually think that Parallel.ForEach is serving my purposes best now. You are correct, though: Task isn't what I should be using, I think.