When to use BlockingCollection and when ConcurrentBag instead of List<T>?

50,278

Solution 1

You can indeed use a BlockingCollection, but there is absolutely no point in doing so.

First off, note that BlockingCollection is a wrapper around a collection that implements IProducerConsumerCollection<T>. Any type that implements that interface can be used as the underlying storage:

When you create a BlockingCollection<T> object, you can specify not only the bounded capacity but also the type of collection to use. For example, you could specify a ConcurrentQueue<T> object for first in, first out (FIFO) behavior, or a ConcurrentStack<T> object for last in,first out (LIFO) behavior. You can use any collection class that implements the IProducerConsumerCollection<T> interface. The default collection type for BlockingCollection<T> is ConcurrentQueue<T>.

This includes ConcurrentBag<T>, which means you can have a blocking concurrent bag. So what's the difference between a plain IProducerConsumerCollection<T> and a blocking collection? The documentation of BlockingCollection says (emphasis mine):

BlockingCollection<T> is used as a wrapper for an IProducerConsumerCollection<T> instance, allowing removal attempts from the collection to block until data is available to be removed. Similarly, a BlockingCollection<T> can be created to enforce an upper-bound on the number of data elements allowed in the IProducerConsumerCollection<T> [...]

Since in the linked question there is no need to do either of these things, using BlockingCollection simply adds a layer of functionality that goes unused.

Solution 2

  • List<T> is a collection designed to use in single thread applications.

  • ConcurrentBag<T> is a class of Collections.Concurrent namespace designed to simplify using collections in multi-thread environments. If you use ConcurrentCollection you will not have to lock your collection to prevent corruption by other threads. You can insert or take data from your collection with no need to write special locking codes.

  • BlockingCollection<T> is designed to get rid of the requirement of checking if new data is available in the shared collection between threads. if there is new data inserted into the shared collection then your consumer thread will awake immediately. So you do not have to check if new data is available for consumer thread in certain time intervals typically in a while loop.

Solution 3

Whenever you find the need for a thread-safe List<T>, in most cases neither the ConcurrentBag<T> nor the BlockingCollection<T> are going to be your best option. Both collections are specialized for facilitating producer-consumer scenarios, so unless you have more than one threads that are concurrently adding and removing items from the collection, you should look for other options (with the best candidate being the ConcurrentQueue<T> in most cases).

Regarding especially the ConcurrentBag<T>, it's an extremely specialized class targeting mixed producer-consumer scenarios. This means that each worker-thread is expected to be both a producer and a consumer (that adds and removes items from the same collection). It could be a good candidate for the internal storage of an ObjectPool class, but beyond that it is hard to imagine any advantageous usage scenario for this class.

People usually think that the ConcurrentBag<T> is the thread-safe equivalent of a List<T>, but it's not. The similarity of the two APIs is misleading. Calling Add to a List<T> results to adding an item at the end of the list. Calling Add to a ConcurrentBag<T> results instead to the item being added at a random slot inside the bag. The ConcurrentBag<T> is essentially unordered. It is not optimized for being enumerated, and does a lousy job when it is commanded to do so. It maintains internally a bunch of thread-local queues, so the order of its contents is dominated by which thread did what, not by when did something happened.

These characteristics make the ConcurrentBag<T> a less than ideal choice for storing the results of a Parallel.For/Parallel.ForEach loop.

A better thread-safe substitute of the List<T>.Add is the ConcurrentQueue<T>.Enqueue method. "Enqueue" is a less familiar word than "Add", but it actually does what you expect it to do.

There is nothing that a ConcurrentBag<T> can do that a ConcurrentQueue<T> can't. For example neither collection offers a way to remove a specific item from the collection. If you want a concurrent collection with a TryRemove method that has a key parameter, you could look at the ConcurrentDictionary<K,V> class.

The ConcurrentBag<T> appears frequently in the Task Parallel Library-related examples in Microsoft's documentation. Like here for example. Whoever wrote the documentation, apparently they valued more the tiny usability advantage of writing Add instead of Enqueue, than the behavioral/performance disadvantage of using the wrong collection. This makes some sense considering that the examples were authored at a time when the TPL was new, and the goal was the fast adoption of the library by developers who were mostly unfamiliar with parallel programming. I get it, Enqueue is a scary word when you see it for the first time. Unfortunately now there is a whole generation of developers that have incorporated the ConcurrentBag<T> in their mental tools, although it has no business being there, considering how specialized this collection is.

In case you want to collect the results of a Parallel.ForEach loop in exactly the same order as the source elements, you can use a List<T> protected with a lock. In most cases the overhead should be negligible, especially if the work inside the loop is chunky. An example is shown below, featuring the Select LINQ operator for getting the index of each element.

var indexedSource = source.Select((item, index) => (item, index));

List<TResult> results = new();

Parallel.ForEach(indexedSource, parallelOptions, entry =>
{
    var (item, index) = entry;
    TResult result = GetResult(item);
    lock (results)
    {
        while (results.Count <= index) results.Add(default);
        results[index] = result;
    }
});

Solution 4

Yes, you could use BlockingCollection for that. finishedProxies would be defined as:

BlockingCollection<string> finishedProxies = new BlockingCollection<string>();

and to add an item, you would write:

finishedProxies.Add(checkResult);

And when it's done, you could create a list from the contents.

Share:
50,278
Fulproof
Author by

Fulproof

Updated on November 10, 2021

Comments

  • Fulproof
    Fulproof over 2 years

    The accepted answer to question "Why does this Parallel.ForEach code freeze the program up?" advises to substitute the List usage by ConcurrentBag in a WPF application.

    I'd like to understand whether a BlockingCollection can be used in this case instead?

  • Fulproof
    Fulproof about 11 years
    @ Jon, thanks, this helped me a lot to break me from the state of idiocy and stop loosing time studying ConcurrentBag and BlockingCollection when I really need ConcurrentDictionary
  • Fulproof
    Fulproof about 11 years
    Jim Mischel, reading your .NET Reference Guide. I wish I could find it earlier, so close to my practical needs
  • C. Tewalt
    C. Tewalt over 4 years
    I see no class for ConcurrentCollection<T> From the decompiler: public class ConcurrentBag<T> : IProducerConsumerCollection<T>, IEnumerable<T>, IEnumerable, ICollection, IReadOnlyCollection<T>
  • C. Tewalt
    C. Tewalt over 4 years
    I still gave +1 - it helped to have the clarification on the consumer thread awakening on BlockingCollection<T> Thanks!