Optimizing concurrent count operations

I recently reviewed a function that looked something like this:public class WorkQueue { private readonly ConcurrentQueue _embeddingsQueue = new(); private long _approximateCount = 0; public long ApproximateCount => Interlocked.Read(ref _approximateCount); public void Register(IEnumerable items) { foreach (var item in items) { _embeddingsQueue.Enqueue(item); Interlocked.Increment(ref _approximateCount); } } }I commented that we should move the Increment() operation outside of the loop because if two threads are calling Register() at the same time, we’ll have a lot of contention here.The reply was that this was intentional since calling Interlocked.CompareExchange() to do the update in a batch manner is more complex. The issue was a lack of familiarity with the Interlocked.Add() function, which allows us to write the function as:public void Register(IEnumerable items) { int count = 0; foreach (var item in items) { _embeddingsQueue.Enqueue(item); count++; } Interlocked.Add(ref _approximateCount, count); }This allows us to perform just one atomic operation on the count. In terms of assembly, we are going to have these two options:lock inc qword ptr [rcx] ; Interlocked.Increment() lock add [rbx], rcx ; Interlocked.Add()Both options have essentially the same exact performance characteristics, but if we need to register a large batch of items, the second option drastically reduces the contention. In this case, we don’t actually care about having an accurate count as items are added, so there is no reason to avoid the optimization.

Mar 26, 2025 - 14:40
 0

I recently reviewed a function that looked something like this:


public class WorkQueue<T>
{
    private readonly ConcurrentQueue<T> _embeddingsQueue = new();
    private long _approximateCount = 0;


    public long ApproximateCount => Interlocked.Read(ref _approximateCount);


    public void Register(IEnumerable<T> items)
    {
        foreach (var item in items)
        {
            _embeddingsQueue.Enqueue(item);


            Interlocked.Increment(ref _approximateCount);
        }
    }
}

I commented that we should move the Increment() operation outside of the loop because if two threads are calling Register() at the same time, we’ll have a lot of contention here.

The reply was that this was intentional since calling Interlocked.CompareExchange() to do the update in a batch manner is more complex. The issue was a lack of familiarity with the Interlocked.Add() function, which allows us to write the function as:


public void Register(IEnumerable<T> items)
{
    int count = 0;
    foreach (var item in items)
    {
        _embeddingsQueue.Enqueue(item);
        count++;
    }
    Interlocked.Add(ref _approximateCount, count);
}

This allows us to perform just one atomic operation on the count. In terms of assembly, we are going to have these two options:


lock inc qword ptr [rcx] ; Interlocked.Increment()
lock add [rbx], rcx      ; Interlocked.Add()

Both options have essentially the same exact performance characteristics, but if we need to register a large batch of items, the second option drastically reduces the contention.

In this case, we don’t actually care about having an accurate count as items are added, so there is no reason to avoid the optimization.