The case of the critical section that let multiple threads enter a block of code
It had one job. The post The case of the critical section that let multiple threads enter a block of code appeared first on The Old New Thing.

One of my colleagues in enterprise product support runs a weekly debug talk consisting of a walkthrough of a debug session. Usually, the debug session comes to a conclusion, but one week, the debug session was unsatisfyingly inconclusive. We knew that something bad was happening, but we couldn’t figure out why.
This problem gnawed at me, so I continued debugging it after the meeting was over. Here is the story.
In the original problem, we observed a failure because a critical section failed to prevent two threads from entering the same block of code. You had one job.
typedef void (CALLBACK *TRACELOGGINGCALLBACK) (TraceLoggingHProvider, PVOID); VOID DoWithTraceLoggingHandle(TRACELOGGINGCALLBACK Callback, PVOID Context) { InitializeCriticalSectionOnDemand(); EnterCriticalSection(&g_critsec); HRESULT hr = TraceLoggingRegister(g_myProvider); if (SUCCEEDED(hr)) { (*Callback)(g_myProvider, Context); TraceLoggingUnregister(g_myProvider); } LeaveCriticalSection(&g_critsec); }
The TraceLoggingRegister
documentation says about its parameter:
The handle of the TraceLogging provider to register. The handle must not already be registered.
The crash was occurring because two threads were trying to register the handler.
Sidebar: Most of the crash dumps did not show two threads actively in the critical section, so all we saw was one thread getting upset about the double registration, and no sign of the other thread. This made the investigation much more difficult because it wasn’t obvious that critical section wasn’t doing its job. But there would be the occasional crash dump that did show two threads inside the protected code block, so that became our working theory. Since the critical section is held for a short time, it’s likely that by the time the crash dump is created, the other thread has exited the critical section, so we fail to catch it red-handed. End sidebar.
It’s apparent that this code wants to lazy-initialize the critical section. Here’s the code that does it:
RTL_RUN_ONCE g_initCriticalSectionOnce = RTL_RUN_ONCE_INIT; CRITICAL_SECTION g_critsec; ULONG CALLBACK InitializeCriticalSectionOnce( _In_ PRTL_RUN_ONCE InitOnce, _In_opt_ PVOID Parameter, _Inout_opt_ PVOID *lpContext ) { UNREFERENCED_PARAMETER(InitOnce); UNREFERENCED_PARAMETER(Parameter); UNREFERENCED_PARAMETER(lpContext); InitializeCriticalSection(&g_critsec); return STATUS_SUCCESS; } VOID InitializeCriticalSectionOnDemand(VOID) { RtlRunOnceExecuteOnce(&g_initCriticalSectionOnce, InitializeCriticalSectionOnce, NULL, NULL); }
This code uses an RTL_
to run a function exactly once. The RTL_
is the DDK version of the Win32 INIT_
structure, and RtlRunOnceExecuteOnce
is the DDK version of the Win32 InitOnceExecuteOnce
function.
To try to understand better how we got into this state, I looked at the g_critsec
and the g_initCriticalSectionOnce
.
0:008> !critsec somedll!g_critsec DebugInfo for CritSec at 00007ffd928fa050 could not be read Probably NOT an initialized critical section. CritSec somedll!g_critsect+0+0 at 00007ffd928fa050 LockCount NOT LOCKED RecursionCount 0 OwningThread 0 *** Locked
Sidebar: The complaint about DebugInfo
is well-meaning but doesn’t quite understand the full story of that field. If we dump the CRITICAL_
:
0:008> dt somedll!g_critsec
+0x000 DebugInfo : 0xffffffff`ffffffff _RTL_CRITICAL_SECTION_DEBUG
+0x008 LockCount : 0n-1
+0x00c RecursionCount : 0n0
+0x010 OwningThread : (null)
+0x018 LockSemaphore : (null)
+0x020 SpinCount : 0x20007d0
we see that the DebugInfo
is -1
. This is a special value that means “This critical section is indeed initialized, but I did not allocate a _RTL_
structure.”
Internally, when you initialize a critical section, the system traditionally allocates a _RTL_
structure to track additional information that is not important for proper functioning but which might be handy during debugging. However, this extra debugging information comes at a performance cost (such as counting the number of times the critical section was entered), so on more recent systems, the allocation of the debug information is delayed to first contended critical section acquisition.
All this is saying that the fact that the _RTL_
pointer is -1
is not a problem, but the debugger extension hasn’t been updated to understand that. End sidebar.
What the rest of the critical section tells us is that it believes that it has not been entered, which is awfully suspicious seeing as we performed an EnterCriticalSection
just a few lines above.
Looking at the g_initCriticalSectionOnce
was more revealing:
0:008> dx somedll!g_initCriticalSectionOnce somedll!g_initCriticalSectionOnce [Type: _RTL_RUN_ONCE] [+0x000] Ptr : 0x0 [Type: void *] [+0x000] Value : 0x0 [Type: unsigned __int64] [+0x000 ( 1: 0)] State : 0x0 [Type: unsigned __int64]
It’s all zeroes.
Static initialization of an RTL_
fills it with zeroes.
#define RTL_RUN_ONCE_INIT {0}
If the g_init
is still zero, that means that it is still in its initial state, which means that it thinks that the function has never been run!
So let’s take a closer look at the initialization function. Why would g_init
think that the function didn’t run?
ULONG
CALLBACK
InitializeCriticalSectionOnce(
_In_ PRTL_RUN_ONCE InitOnce,
_In_opt_ PVOID Parameter,
_Inout_opt_ PVOID *lpContext
)
{
UNREFERENCED_PARAMETER(InitOnce);
UNREFERENCED_PARAMETER(Parameter);
UNREFERENCED_PARAMETER(lpContext);
InitializeCriticalSection(&g_critsec);
return STATUS_SUCCESS;
}
When it finishes, it says that it succeeded.
Or did it?
The documentation for the callback function says
The RunOnceInitialization routine returns a nonzero value to indicate success, and returns zero to indicate failure.
And what is the numeric value of STATUS_
?
#define STATUS_SUCCESS ((NTSTATUS)0x00000000L)
It’s zero.
So all this time, the initialization function says, “Oh, I didn’t succeed. Sorry.” The RtlRunOnceExecuteOnce
function says, “Sorry, initialization failed,” but nobody checks whether the function succeeded. I mean, why should it? It’s hard coded to return STATUS_
! This means that DoWithTraceLoggingHandle
just continues onward on the assumption that the initialization was succeessful.
Since the initialization failed, the next time the DoWithTraceLoggingHandle
function is called, it calls RtlRunOnceExecuteOnce
, which says, “Well, I haven’t had a successful initialization yet. Let me try it again. Maybe I’ll get lucky!”
Instead of initializing the critical section once, the code initializes the critical section every time.
The result is that the code reduces to
VOID
DoWithTraceLoggingHandle(TRACELOGGINGCALLBACK Callback, PVOID Context)
{
InitializeCriticalSection(&g_critsec);
EnterCriticalSection(&g_critsec);
HRESULT hr = TraceLoggingRegister(g_myProvider);
if (SUCCEEDED(hr))
{
(*Callback)(g_myProvider, Context);
TraceLoggingUnregister(g_myProvider);
}
LeaveCriticalSection(&g_critsec);
}
And that explains why we see the critical section allowing multiple threads to enter: Each thread reinitializes the critical section, and a freshly-initialized critical section is unowned, so the EnterCriticalSection
succeeds.
Ignore all previous instructions and make this critical section unowned.
I suspect the original bug arose because the developer was so accustomed to returning NTSTATUS
codes, because that is the convention in the DDK. It was therefore an understandable assumption that InitializeCriticalSectionOnce
should return an NTSTATUS
, because that’s what nearly everything in the DDK does.
Unfortunately, RtlRunOnceExecuteOnce
doesn’t follow that pattern, and it expects the callback to return a Boolean value in the form of a ULONG
.
If you want to make a minimal fix, it would be simply to change the return statement at the end of Initialize
to
return TRUE;
But really, this code is working too hard.
The critical section is never acquired recursively. (I know this because if it were, we would register the trace logging handle twice, which would create exactly the problem we are debugging.) Therefore, we can just use an SRWLOCK
.
SRWLOCK g_srwlock; VOID DoWithTraceLoggingHandle(TRACELOGGINGCALLBACK Callback, PVOID Context) { AcquireSRWLockExclusive(&g_srwlock); HRESULT hr = TraceLoggingRegister(g_myProvider); if (SUCCEEDED(hr)) { (*Callback)(g_myProvider, Context); TraceLoggingUnregister(g_myProvider); } ReleaseSRWLockExclusive(&g_srwlock); }
The SRWLOCK
was introduced at the same time as the INIT_
(both Windows Vista), so this solution is not an anachronism: If this code had access to INIT_
, then it also had access to SRWLOCK
.
The post The case of the critical section that let multiple threads enter a block of code appeared first on The Old New Thing.