11

Preface: I'm only asking this because I don't have an environment (dataset large enough + computing power) to test it in a reliable fashion.

Question: Given a ConcurrentBag<T>, loaded with billions of items, being accessed/used by a single thread, does it perform similar to a List<T>? Putting in another words, is the enumeration over a ConcurrentBag<T> any more or less performatic than over a List<T>?

Theodor Zoulias
  • 34,835
  • 7
  • 69
  • 104
Leonardo
  • 10,737
  • 10
  • 62
  • 155
  • To what conclusion did your benchmark results lead you? – Lucas Trzesniewski Feb 15 '16 at 18:33
  • @LucasTrzesniewski my benchmark was inconclusive because test results very wildly. As mentioned in the question, my dev env is very weak (8gb ram only and i3 processor) and i don't have a good dataset to really stress it... – Leonardo Feb 15 '16 at 18:41
  • How will you be accessing the collection? Will you always be enumerating over it? (`ConcurrentBag` doesn't have indexers.) – Douglas Feb 15 '16 at 18:42
  • @Douglas the bag can be iterated over, as in `foreach` – Leonardo Feb 15 '16 at 18:43
  • And is there a possibility that your thread will add more items to the collection while iterating over it? – Douglas Feb 15 '16 at 18:44
  • @Douglas no items will be added after the enumeration starts – Leonardo Feb 15 '16 at 18:47
  • If you want to compare the performance simply Run your horses!! May be you can use library like [BenchmarkIt](https://github.com/bodyloss/BenchmarkIt) to help you doing it quickly. But again performance would vary based on the system configuration and architecture. – vendettamit Feb 15 '16 at 18:48
  • 3
    Why are you using a collection specifically designed to be accessed from multiple threads if you're only going to use it from a single thread? Also note that without knowing what you're actually doing with the collection, we couldn't possibly answer such a question. – Servy Feb 15 '16 at 18:52
  • @Servy i'm not going to use it ONLY from a single thread, but there's a possibility that due to high load, only 1 thread will iterate it! And as far as the operations being performed on the collection, at the collection lvl, its only read... no items being removed or inserted, only having properties updated – Leonardo Feb 15 '16 at 20:32

3 Answers3

12

ConcurrentBag<T> will inevitably be less performant than List<T>. Although you will only be accessing it from a single thread, the structure still needs to have mechanisms in place to protect against the possibility of race hazards should concurrent access arise.

If you will be loading the collection from a single thread before starting your enumerations, you can avoid the performance overhead by using the ConcurrentBag(IEnumerable<T>) constructor, rather than adding each item individually through its Add method.

ConcurrentBag<T> provides “moment-in-time snapshot” semantics for enumerations; see the remarks for its GetEnumerator method. When you access ConcurrentBag<T> from a foreach loop, it will first copy its entire contents into a plain List<T>, then enumerate over that. This will incur a substantial performance overhead (both computation- and memory-wise) each time you use it in a loop.

If your scenario is that your list will be populated by multiple threads, but then only read by one thread, then you should convert it to a List<T> as soon as the writers are done.

Douglas
  • 53,759
  • 13
  • 140
  • 188
7

Billions of items and List or Concurrent bag? That is a "no go".

As far as performance goes try this to test adding: (feel free to modify this to test other operations)

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

namespace ConcurrentBagTest
{
    // You must compile this for x64 or you will get OutOfMemory exception
    class Program
    {
        static void Main(string[] args)
        {
            ListTest(10000000);
            ListTest(100000000);
            ListTest(1000000000);
            ConcurrentBagTest(10000000);
            ConcurrentBagTest(100000000);

            Console.ReadKey();

        }


        static void ConcurrentBagTest(long count)
        {
            try
            {
                var bag = new ConcurrentBag<long>();
                Console.WriteLine($"--- ConcurrentBagTest count = {count}");
                Console.WriteLine($"I will use {(count * sizeof(long)) / Math.Pow(1024, 2)} MiB of RAM");
                Stopwatch stopwatch = new Stopwatch();
                stopwatch.Start();
                for (long i = 0; i < count; i++)
                {
                    bag.Add(i);
                }
                stopwatch.Stop();
                Console.WriteLine($"Inserted {bag.LongCount()} items in {stopwatch.Elapsed.TotalSeconds} s");
                Console.WriteLine();
                Console.WriteLine();
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.ToString());
            }
            GC.Collect();
            GC.WaitForPendingFinalizers();
        }

        static void ListTest(long count)
        {
            try
            {
                var list = new List<long>();
                Console.WriteLine($"--- ListTest count = {count}");
                Console.WriteLine($"I will use {(count * sizeof(long)) / Math.Pow(1024, 2)} MiB of RAM");
                Stopwatch stopwatch = new Stopwatch();
                stopwatch.Start();
                for (long i = 0; i < count; i++)
                {
                    list.Add(i);
                }
                stopwatch.Stop();
                Console.WriteLine($"Inserted {list.LongCount()} items in {stopwatch.Elapsed.TotalSeconds} s");
                Console.WriteLine();
                Console.WriteLine();
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.ToString());
            }
            GC.Collect();
            GC.WaitForPendingFinalizers();
        }
    }
}

My Output:

--- ListTest count = 10000000
I will use 76,2939453125 MiB of RAM
Inserted 10000000 items in 0,0807315 s


--- ListTest count = 100000000
I will use 762,939453125 MiB of RAM
Inserted 100000000 items in 0,7741546 s


--- ListTest count = 1000000000
I will use 7629,39453125 MiB of RAM
System.OutOfMemoryException: Array dimensions exceeded supported range.

--- ConcurrentBagTest count = 10000000
I will use 76,2939453125 MiB of RAM
Inserted 10000000 items in 1,0744069 s


--- ConcurrentBagTest count = 100000000
I will use 762,939453125 MiB of RAM
Inserted 100000000 items in 11,3976436 s

Using CPU: Intel Core i7-2600 @ 3.4 GHz,

Using RAM: 16 GB

Also take a look at this answer for limitations.

Community
  • 1
  • 1
Eiver
  • 2,594
  • 2
  • 22
  • 36
2

But, if you need to remove items, the ConcurrentBag is SIGNIFICANTLY faster than the List

void Main()
{
    ConcurrentBag<int> bag = new ConcurrentBag<int>();
    ConcurrentStack<int> stack = new ConcurrentStack<int>();
    ConcurrentQueue<int> q = new ConcurrentQueue<int>();
    List<int> list = new List<int>();

    Stopwatch sw = new Stopwatch();
    int count = 100000;
    sw.Start();
    for (int i = 0; i < count; i++)
    {
        bag.Add(i);
    }
    for (int i = 0; i< count; i++)
    {
        bag.TryTake(out _);
    }
    sw.Elapsed.Dump("BAG");
    sw.Start();
    for (int i = 0; i < count; i++)
    {
        stack.Push(i);
    }
    for (int i = 0; i < count; i++)
    {
        stack.TryPop(out _);
    }
    sw.Elapsed.Dump("Stack");
    sw.Start();
    for (int i = 0; i < count; i++)
    {
        q.Enqueue(i);
    }
    for (int i = 0; i < count; i++)
    {
        q.TryDequeue(out _);
    }
    sw.Elapsed.Dump("Q");
    sw.Start();
    for (int i = 0; i < count; i++)
    {
        list.Add(i);
    }
    for (int i = 0; i < count; i++)
    {
        list.RemoveAt(0);
    }
    sw.Elapsed.Dump("list remove at 0");
    sw.Start();
    for (int i = 0; i < count; i++)
    {
        list.Add(i);
    }
    for (int i = 0; i < count; i++)
    {
        list.RemoveAt(list.Count -1);
    }
    sw.Elapsed.Dump("list remove at end");
}

Results:

BAG 00:00:00.0144421

Stack 00:00:00.0341379

Q 00:00:00.0400114

list remove at 0 00:00:00.6188329

list remove at end 00:00:00.6202170

  • Your benchmark is wrong. You need to [`Restart`](https://learn.microsoft.com/en-us/dotnet/api/system.diagnostics.stopwatch.restart) the stopwatch, not just `Start` it. Removing from the end of the `List` [is actually the fastest](https://dotnetfiddle.net/Erbzp4). – Theodor Zoulias Feb 02 '23 at 15:57