Crate-training Tiamat, un-calling Cthulhu: Taming the UB monsters in C++

Mar 31, 2025 - 09:21

For more background on safety and security issues related to C++, including definitions of “language safety” and “software security” and similar terms, see my March 2024 essay “C++ safety, in context.” This essay picks up our story where that one left off to bring us up to date with a specific focus on undefined behavior (aka UB).

This is a status update on improvements currently in progress for hardening and securing our C++ software.

The C++ community broadly has a lot of hardening work well underway. Across the industry, this includes work being done by individual vendors, which they are then contributing to the standardization process so C++ programmers can use it portably. In the standard, it ranges from things we have had for a while (UB-free constexpr compile-time code) to things we’ve done recently (in draft C++26: erroneous behavior, a bounds-hardened standard library, and contracts for functional safety) to proposals we’re actively pursuing next (in progress: Bjarne Stroustrup’s profiles, Úlfar Erlingsson’s remote code execution hardening).

A common underlying thread of all this work is that each piece addresses more and more of C++’s UB, especially the UB most exploited by attackers. We’re addressing UB methodically, starting with the common high-value cases that will do the most to harden our code: uninitialized variables, out-of-bounds access, pointer misuse, and the key UB cases that adversaries need to implement remote code execution. These are the weaknesses that attackers exploit, and we are locking them down to lock attackers out.

Common (dis)belief: “UB is just too central to C++, trying to improve it enough to matter is hopeless”

[Image: Tiamat and Cthulhu in a cage, with a happy person in front making a thumbs-up sign]
For the sake of discussion, assume the cage is impervious to dragon breath and psionics. It’s just a metaphor.

Tech pundits still seem to commonly assume that UB is so fundamentally entangled in C++’s specification and programs that C++ will never be able to address enough UB to really matter. And it is true that it’s currently way too easy to accidentally let tendrils of silent UB slither pervasively throughout our C++ code.

Background in a nutshell: In C++, code that (usually accidentally) exercises UB is the primary root cause of our memory safety and security vulnerability issues. When a program contains UB, anything can happen; it’s common to call the whole thing “the UB dragon” and say “UB can reformat your hard drive or make demons fly out your nose” — hence the Tiamat and Cthulhu metaphors. Worse than those things, however, is that UB regularly leads to exploitable security vulnerabilities and other expensive-to-fix bugs. (For more details about UB, see the Appendix.)

So it’s valid to ask: Can and will C++ ever do enough about UB to make a major difference?

Summary and spoilers

In this post, I’m happy to report that serious taming of C++ UB is underway…

(1) Since C++11 in 2011, more and more C++ code has already become UB-free. Most people just didn’t notice.

  • Spoiler: All constexpr/consteval compile-time code is UB-free. As of C++26, almost the entire language and much of the standard library are available at compile time, and are UB-free when executed at compile time (but not when the same code is executed at run time, hence the following additional work, all of which is about run-time execution).

(2) Since March 2024, the draft C++26 standard has already removed key “low-hanging fruit” run-time UB cases that were the root cause of significant categories of security vulnerabilities.

  • Spoiler: In draft C++26, uninitialized local variables are no longer UB, and most common non-iterator bounds errors in the hardened standard library, such as for string and vector and string_view and span, will no longer be UB in a “hardened” implementation. (And C++26 also has language contracts for a different aspect of safety, namely functional safety for defensive programming to reduce bugs in general.)

(3) Now, we’re undertaking to add more tools and to systematically catalog and address run-time UB in the C++ language.

If successful, these steps would achieve parity with the other modern memory-safe languages as measured by the number of security vulnerabilities, which would eliminate any safety-related reasons not to use C++. Note that leveling the playing field with other languages still means there are other security issues that need to be addressed too, in all languages, such as logic bugs for functional safety (C++26 contracts will help here); we’re first addressing the most valuable target to get to parity with other modern languages and then will continue to do more.

Importantly, this approach to hardening C++ doesn’t change C++’s value proposition: it keeps C++ still C++, and it doesn’t try to turn C++ into “something else” such as by requiring mandatory performance overheads. All of the above embrace C++’s existing source code and its “zero-overhead, don’t pay for it if you don’t use it” core values, and just make it convenient to make memory safety the default, always with an opt-out, so that full performance and control are always available when you want to let Tiamat and Cthulhu use their powers in your service, under your control and for good.

And it’s designed to be super adoptable to bring existing code forward:

  • Many of the improvements are adoptable without any code changes (really!) — just recompile your existing project with a C++26 compiler, and your code will be safer. This is important because when you write code you write bugs, and even when you write code to fix bugs you write new bugs; this is part of the cost of requiring code changes that we’d like to minimize.
  • Even when you opt into a profile language subset that rejects unsafe code by default, you can still opt back out to writing the unsafe thing with an explicit, greppable, and auditable “suppress safety rule here” annotation (similar to “unsafe” in other languages).

That’s it — if you stop reading here, you have the full story.

But I think the details are pretty interesting, so join me if you like as we dive further into the above points (1), (2), and (3)…

(1) Since 2011: constexpr code

Starting in C++11, C++’s compile-time constexpr world has become a sandbox free from undefined behavior, quietly revolutionizing C++ by enabling powerful compile-time computation while also ensuring safety. During constexpr evaluation, the language mandates well-defined behavior: no wild pointers, no uninitialized reads, no surprises. If evaluating an operation would trigger undefined behavior, the expression is not a constant expression, so the compiler rejects the constexpr evaluation at compile time. This guarantees that such code is free of UB before it ever runs, empowering developers to write faster, safer, and more expressive code.
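
To make this concrete, here is a minimal sketch (the function names are illustrative, and it assumes a C++17 or later compiler). Both static_asserts compile; uncommenting either of the marked lines turns what would be silent run-time UB into a compile-time error, because that evaluation is no longer a constant expression:

```cpp
#include <climits>

// Well-defined for these inputs; usable at compile time and at run time.
constexpr int add(int a, int b) { return a + b; }

// Reads element i of a small local array.
constexpr int element(int i) {
    int arr[3] = {10, 20, 30};
    return arr[i];
}

static_assert(add(2, 3) == 5, "fine: no UB");
static_assert(element(2) == 30, "fine: in bounds");

// Each of these is rejected at compile time, because evaluating it hits UB,
// so the expression is not a constant expression:
//   static_assert(add(INT_MAX, 1) > 0);  // signed integer overflow
//   static_assert(element(3) == 0);      // out-of-bounds read
//
// The same calls made at run time (e.g. element(3) with a runtime index)
// are still UB today: the guarantee covers compile-time evaluation only.
```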

Every release of C++ has continued making more of the language and standard library available in compile-time constexpr code, so that as of C++26 nearly the entire language and much of the standard library is available in constexpr code.

This is modern C++ at its best: unleashing compile-time power while also enforcing its correctness.

This is in production use, not vaporware: All major compilers have supported UB-free constexpr compile-time code for over a decade and it’s in widespread production use. Probably almost every nontrivial C++ project today is already using at least some UB-free constexpr code, unless it is very old code compiled with a very old compiler.

(2) Since 2024: Language safety and software security improvements adopted for C++26

Over the past year, C++26 has made further solid progress on language safety and software security. Briefly, here’s what C++26 has already adopted (some of this material is repeated from my previous trip reports; see the links for much more detail and discussion):

  • In March 2024 (see my March 2024 trip report), draft C++26 eliminated UB for uninitialized variables by turning it instead into a new kind of behavior: erroneous behavior (aka EB) that is still considered “wrong code” (so compilers should still warn about it) but is now well-defined so it is no longer UB-dragon-bait even if your code does transgress. That eliminates one root cause of a serious class of security vulnerabilities.
  • Last month (see my February 2025 trip report), draft C++26 additionally added a specification for a hardened standard library. Just recompiling with a hardened library gives our programs bounds safety guarantees for many common non-iterator C++26 standard library operations, including common operations on very popular standard types: string, string_view, span, mdspan, vector, array, optional, expected, bitset, and valarray. (At the same meeting, we also adopted language contracts to help improve functional safety for defensive programming to reduce bugs in general.)

Importantly, both of these achieve the holy grail of adoptability: “Just recompile all your existing code with a C++26 compiler / hardened library, and it will be safer.” That’s just an awesome adoption story. If you’ve seen any of my recent talks, you know this is close to my heart… see especially this short clip from my November talk in Poland and also this short clip in the Q&A about the societal value of improving C++. Of course, getting the full safety improvements will sometimes require code changes; nobody is saying otherwise. For example, if you write a dangling pointer because your code is confused about ownership, then you really will need to go fix and possibly restructure your code. But it’s pretty nice that we can get a subset of the safety improvements just by recompiling our existing code!
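
Here is a minimal sketch of the kind of code the uninitialized-variable change affects (the helper names are hypothetical). The bug is the same before and after C++26; what changes is the consequence when the failure path runs:

```cpp
// Hypothetical helper: on success writes the parsed digit to `out`;
// on failure returns false and leaves `out` untouched.
constexpr bool parse_digit(char c, int& out) {
    if (c < '0' || c > '9') return false;
    out = c - '0';
    return true;
}

// Buggy: ignores the result, so on failure `value` is read uninitialized.
//   C++23 and earlier: undefined behavior (anything can happen).
//   Draft C++26: erroneous behavior. Still a bug that compilers are encouraged
//   to diagnose, but `value` holds an implementation-chosen erroneous value
//   and the read is no longer UB.
int digit_buggy(char c) {
    int value;
    parse_digit(c, value);
    return value;
}

// Fixed: always initialize, and check the result.
constexpr int digit_or_zero(char c) {
    int value = 0;
    return parse_digit(c, value) ? value : 0;
}

// Draft C++26 also provides an explicit opt-out where an uninitialized
// variable is intended, for example:  int buffer [[indeterminate]];
```

The opt-out attribute in the last comment is the draft C++26 spelling; compiler support for it varies, so treat that line as illustrative.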

Again, this is in production use, not vaporware: The support for uninitialized variables and the hardened standard library may be new to draft standard C++26, but they are already well supported on existing compilers. For uninitialized variables, you can already use the pre-standard compiler switches -ftrivial-auto-var-init=pattern (GCC, Clang) and /RTC1 (MSVC). For the hardened standard library, as the P3471 authors note, it has already been deployed in major commercial environments (you can use it today in libc++, as described in the libc++ hardening documentation; MS-STL and libstdc++ have some similar options):

“We have experience deploying hardening on Apple platforms in several existing codebases.

Google recently published an article where they describe their experience with deploying this very technology to hundreds of millions of lines of code. They reported a performance impact as low as 0.3% and finding over 1000 bugs, including security-critical ones.

Google Andromeda published an article ~1 year ago about their successful experience enabling hardening.

The libc++ maintainers have received numerous informal reports of hardening being turned on and helping find bugs in codebases.

Overall, standard library hardening has been a huge success, in fact we never expected so much success. The reception has been overwhelmingly positive …”

This really demonstrates the value of addressing low-hanging fruit, and the Pareto principle (aka 80/20 rule): Often 80% of the benefit comes from the first 20% of investment.
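
For example, here is a small sketch of an access that a hardened library build checks (the function name is illustrative). The comments show how one might enable the existing pre-standard modes; the macro spellings are the ones the libc++ and libstdc++ documentation describe, but check the documentation for your toolchain version:

```cpp
#include <cstddef>
#include <string_view>

// Returns the character at position i. If i >= sv.size(), operator[] has a
// precondition violation: UB in a plain build, but a hardened library build
// checks the index and stops the program instead of reading past the end.
constexpr char char_at(std::string_view sv, std::size_t i) {
    return sv[i];
}

// Example builds (check your toolchain's documentation for exact support):
//   libc++ (Clang):   clang++ -stdlib=libc++ -D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_FAST ...
//   libstdc++ (GCC):  g++ -D_GLIBCXX_ASSERTIONS ...
// and, for uninitialized locals, the pre-standard switch mentioned above:
//   GCC/Clang:        -ftrivial-auto-var-init=pattern
```

Note that no source change is involved: the same char_at compiles in both plain and hardened builds, which is exactly the "just recompile" adoption story.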

(3) Since the past month: More work ongoing in the C++26 timeframe

For about a year now, multiple C++ committee experts have independently proposed systematically cataloging and/or addressing UB in C++:

  • December 2023: Shafik Yaghmour’s proposal P3075R0 to catalog C++’s language UB and document it as an Annex to the standard. (Building on his earlier pre-pandemic paper P1705R1.) This was encouraged by the core language specification subgroup (aka CWG) at the March 2024 meeting.
  • October 2024: My proposal P3436R0 to catalog UB and systematically address it using the opt-in mechanism of Bjarne Stroustrup and Gabriel Dos Reis’ language profiles proposal which has the ability to designate profiles as “named groups” of related compile-time restrictions and run-time checks that are easy to opt into to make safety the default. For more details, see my November 2024 trip report. This was unanimously encouraged by the Safety and Security subgroup (aka SG23) at the November 2024 meeting.
  • October 2024: Timur Doumler, Gašper Ažman, and Joshua Berne’s proposal P3100R1 to catalog UB and systematically address it as contract violations, using the new C++26 contract_assert feature to perform run-time checks for problematic language features as well. There is a related proposal P3400 to designate contract labels as “named groups” of related run-time checks that are easy to opt into to make safety the default. P3100 was unanimously encouraged by the Contracts subgroup (aka SG21) at the November 2024 meeting.
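
To give a rough feel for what treating language UB as a contract violation could look like, here is a sketch using integer division, whose two UB cases are easy to state. Because contract_assert is not yet widely available in compilers, the CHECK_NO_UB macro below is a hypothetical stand-in that simply aborts; it is not the P3100 mechanism itself, which as proposed would have the language perform such checks implicitly:

```cpp
#include <climits>
#include <cstdlib>

// HYPOTHETICAL stand-in for a contract check. C++26 spells a contract
// assertion as contract_assert(cond); this sketch just aborts on failure.
#define CHECK_NO_UB(cond) ((cond) ? void(0) : std::abort())

// Integer division has two UB cases: division by zero, and INT_MIN / -1
// (the result does not fit in int). Here the checks are written by hand.
constexpr int checked_div(int a, int b) {
    CHECK_NO_UB(b != 0);
    CHECK_NO_UB(!(a == INT_MIN && b == -1));
    return a / b;
}
```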

You can see the pattern: there are proposers and volunteers to

  • systematically catalog language UB,
  • specify a way to eliminate the UB (make it illegal, or well-defined including where necessary with a run-time check such as a bounds check),
  • make that elimination happen preferably all the time where it’s efficient enough (as C++26 is doing for uninitialized local variables) or else under a named group that’s easy to opt into (profile name, or contract label name), and
  • recognize that different UB cases need to be addressed in different ways, and be willing to put in the effort… no magic wand, Just Engineering.

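At our February 2025 meeting, the main subgroup responsible for all language evolution (aka EWG) took these suggestions and gathered them together, and the group approved a mandate to pursue a white paper that systematically catalogs and addresses C++ language UB, with erroneous behavior, profiles, and contracts as the primary tools.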

Note that this is separate from C++26, because C++26 is now in feature freeze and will spend the next year on comment review and fit-and-finish, so we cannot add new material (such as UB mitigations) to C++26 itself. But we want to keep our momentum and not let this important work wait for C++29, so concurrently with C++26, “in the C++26 timeframe,” we intend to work on a white paper to catalog and address C++ language UB, which we hope to publish around the same time as C++26.

Note: A white paper is an ISO publication that’s a flavor of Technical Specification (TS); think of a white paper or TS as a “feature branch.” The C++ committee has already published a dozen TSes since 2012, such as the concepts and modules TSes, most of which have already been merged into the “trunk” international standard (aka IS). A white paper and TS use the same process within the C++ committee, but a white paper just has less ISO red tape at the end compared to a TS so it can be published faster.

So now and over the next year or two, we’re undertaking to systematically catalog cases of UB in the C++ language to put a visible label on each fang and tentacle. Then, starting with the most important high-value targets, we will decide whether and how to address each one in the most appropriate way, likely using the three tools mentioned in the mandate:

  1. C++26 erroneous behavior, which you’ll recall the draft C++26 standard is already using to deal with uninitialized local variables.
  2. Bjarne Stroustrup’s profiles and Gabriel Dos Reis’ P3589 profiles framework which allow us to create named groups of rules and checks, so that program code can easily opt into full safety by default and tactically opt out again where needed. Efforts now underway are focusing on implementation and deployment of the profiles framework and a few key profiles for experimentation across the C++ ecosystem.
  3. C++26 contract assertions to check language features, as extended with P3400 labels which allow us to create named groups of checks.

I won’t lie: This is going to be a metric ton of work. And it’s work that I think some people don’t expect C++ to ever be able to do. But I think that it is achievable, and that it will be worth it, and we appreciate and want to thank all the committee members who have already expressed interest in volunteering to help — thank you!

New a week ago: P3656 strongly encouraged

Gašper Ažman and I were appointed to organize the work. To get this started, Gašper and I wrote paper P3656 to detail a proposed procedure and plan. On March 19, EWG reviewed it in a telecon and voted strong encouragement for that procedure and plan.

So here’s a quick overview of what we aim to do over the coming year or two, in the same timeframe as C++26…

First, list cases: Enumerate language UB

The goal of this part is to tag every case of language UB directly in the standard’s LaTeX sources, with at least a short description and code example. Using LaTeX tags right in the standard’s sources will let us automatically build another Annex to list the UB in one place, as the standard already does for the grammar for example. Additional detailed discussion and selected mitigations will go into the white paper.

We will also likely tag some basic attributes of each UB, such as:

  • have security experts tag whether it is directly exploitable, so that we can prioritize security-critical low-hanging fruit first; and
  • tag whether it is cheap to check locally with information already available (such as null pointer dereference which is easy to check locally with ptr != nullptr) or requires more information (such as other-than-null dangling pointer dereference which is more challenging, and some UB may be too expensive to entirely remove).
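
The difference between the two kinds of check can be sketched like this (names are illustrative): a null check needs only the pointer value in hand, while a dangling pointer carries no local evidence that it is invalid:

```cpp
#include <memory>

// Cheap, local check: everything needed (the pointer value) is right here.
constexpr int read_or(const int* p, int fallback) {
    return p != nullptr ? *p : fallback;
}

// No equally cheap local check exists for dangling pointers: a non-null
// pointer to a destroyed object looks exactly like a valid one.
int* make_dangling() {
    auto owner = std::make_unique<int>(42);
    return owner.get();  // owner frees the int when this function returns
}
// read_or(make_dangling(), 0) passes the null check and then reads freed
// memory: UB that read_or cannot detect with the information it has.
```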

The catalog also creates backpressure against adding new UB in the future, by requiring discussion and documentation in this list for any proposal that adds UB.

Second, list tools: Create a “non-exhaustive starter menu of tools”

The idea is to make an initial list of the tool(s) we can apply to each case of UB.

The EWG mandate already included erroneous behavior (EB), profiles, and contracts as the primary expected tools, so a slightly more detailed candidate list might be:

  • make the UB well-defined (just fix it always, no opt-in required; this could be a run-time check);
  • make the UB fail to compile (e.g., make it ill-formed, which could change the meaning of SFINAE code that detects the construct and takes a different fallback path to avoid the UB path; or reject it directly without changing any meaning), either always or when a profile/label is enforced;
  • make the UB deprecated, either always or when a profile/label is enforced; and/or
  • make the UB be EB instead, either always (as we did for uninitialized locals) or when a profile/label is enforced.

This list is not exhaustive; we may find UB we want to handle using another technique, but I expect most cases of UB can be handled well using these tools.
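
The SFINAE caveat in the “fail to compile” option is worth a short illustration. Generic code can ask whether an expression is well-formed and quietly pick another path if it is not. In this sketch (the trait and function names are hypothetical), if some future rule made a + expression ill-formed for a type, has_plus would flip to false and chosen_path would silently take the fallback branch instead of failing to compile, which is exactly the kind of meaning change that a direct rejection avoids:

```cpp
#include <type_traits>
#include <utility>

// C++17 detection idiom: is `t + t` a well-formed expression for T?
template <class T, class = void>
struct has_plus : std::false_type {};

template <class T>
struct has_plus<T, std::void_t<decltype(std::declval<T>() + std::declval<T>())>>
    : std::true_type {};

// Generic code that silently picks a path based on well-formedness.
template <class T>
constexpr int chosen_path() {
    if constexpr (has_plus<T>::value) return 1;  // the "+" path
    else return 2;                               // the fallback path
}

struct NoPlus {};
```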

We also intend to write some initial guidelines, for EWG to review and approve, about when to use each tool, including performance considerations, adoption hurdles (like frequency of that UB, or consequences of crashes), and other common considerations.

Third, apply: For each case of UB, say how we plan to address it

In many cases, this will require thoughtful papers, including strong implementation experience when there is a risk that performance or deployability may be difficult. My expectation is that we will find groups of similar UB that can all be handled in one paper, but the point is we want to be methodical about this… we aim to move fast, but the primary goal here is to make sure we actually unbreak things.

Fourth, group: Group UB cases into cohesive groups (profile names / contract labels)

Finally, we can identify cohesive groups of UB that programs will want to address together, which makes them easy to opt into as a unit; for example, a “bounds_safety” group could include all bounds safety-related UB. These groups can overlap; for example, the same UB fix might be selectable as part of a “bounds_safety” group and as part of a general larger “strict_cplusplus” group.
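
One way to picture overlapping groups is as a small catalog in which each entry lists the groups it belongs to. This is a purely hypothetical model for illustration: the struct, the two entries, and their group assignments are invented here and are not committee decisions or data:

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// HYPOTHETICAL catalog entry; field names and assignments are illustrative only.
struct UbEntry {
    std::string_view description;
    std::array<std::string_view, 2> groups;  // named groups; "" marks an unused slot
};

constexpr std::array<UbEntry, 2> catalog{{
    {"null pointer dereference", {"strict_cplusplus", ""}},
    {"out-of-bounds subscript", {"bounds_safety", "strict_cplusplus"}},
}};

constexpr bool in_group(const UbEntry& e, std::string_view group) {
    for (std::string_view g : e.groups)
        if (!g.empty() && g == group) return true;
    return false;
}

// Opting into a group selects every entry that lists it; groups may overlap.
constexpr std::size_t count_in_group(std::string_view group) {
    std::size_t n = 0;
    for (const UbEntry& e : catalog)
        if (in_group(e, group)) ++n;
    return n;
}
```

Here the out-of-bounds entry is selected by either group name, mirroring the overlap described above.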

New a few days ago: Efforts in progress to lock down the specific UB that malicious code relies on

Relatedly, a very interesting proposal was brought to the February ISO C++ meeting by Úlfar Erlingsson, Google’s DE for Cloud Security, P3627R0 (slides): “Easy-to-adopt security profiles for preventing RCE (remote code execution) in existing C++ code.”

Summarizing Úlfar’s premise:

  • We have already developed sufficient hardening implementation technology in modern compilers to effectively harden existing C++ code without code changes — not by aiming for language memory safety guarantees broadly, but by surgically targeting key UB that makes remote code execution (RCE) possible. Specifically: Stack integrity, control-flow integrity (CFI), heap data integrity, and pointer integrity and unforgeability. (Note: Úlfar was the first to efficiently implement stack integrity with strong guarantees, working with George Necula who originally designed it in CCured; and he and collaborators were the first to propose and implement CFI.)
  • If we do nothing more than take away the UB that can be used as building blocks for RCE (even if we still allowed other corruption), then bad actors would lose most of the tools they use to gain control over execution and run their malware, and we would dramatically harden the world’s code.
  • A key problem is that right now these technologies exist as separate features when the real benefit comes from enabling them together, and so we should standardize a profile that lets programmers tell their compilers to activate them together.

On Thursday, Úlfar published a new paper elaborating these ideas: “How to Secure Existing C and C++ Software without Memory Safety” describes how these techniques could not only prevent most RCE but also generally retake control of execution away from the attackers.

It’s well worth reading. An updated paper proposing this material for C++ standardization is expected soon in the C++ committee. As Úlfar notes (emphasis added): “This is a big change and will require a team effort: Researchers and standards bodies need to work together to define a set of protection profiles that can be applied to secure existing software — without new risks or difficulties — easily, at the flip of a flag …”

Note: A related new publication updated a week ago is the OpenSSF “Compiler Options Hardening Guide for C and C++.” This is a useful guide to existing security options that are good to know about and can be used in today’s compilers. These options add a variety of warnings and mechanisms that will help with security, including some used in Úlfar’s proposal (CFI and address space layout randomization, aka ASLR). However, these options are all “best effort,” and do not promise any guarantees, even when used all together — including options needing source code changes and those with noticeable overhead. What makes Úlfar’s approach different is that it carefully selects four specific techniques designed to reinforce each other such that they establish guarantees about the nested execution of functions, and the use of heap objects and pointers. Those guarantees eliminate almost all of the specific UB that malware authors rely on, and will hold even when the remaining UB is triggered, e.g., to corrupt memory.
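
To make the CFI idea concrete, here is a sketch of the kind of indirect call it guards (names are illustrative). The flags in the comments are real options in today's compilers, but this is not a complete or authoritative list, and each has its own requirements; see the guide and your compiler's documentation:

```cpp
// A function pointer stored in ordinary writable memory, e.g. inside a heap
// object. Corrupting this field is a classic step toward hijacking control flow.
struct Callback {
    int (*fn)(int);
};

constexpr int twice(int x) { return 2 * x; }

// The indirect call below is the kind of site control-flow integrity (CFI)
// protects: with CFI, a corrupted `fn` that points somewhere other than a
// permitted call target is caught instead of followed. What "permitted"
// means varies: Clang's cfi-icall checks the function type, while x86 CET's
// indirect branch tracking only checks for a valid landing pad.
constexpr int call_through(const Callback& cb, int arg) {
    return cb.fn(arg);
}

// Some related options (best effort, not guarantees on their own):
//   Clang:      -fsanitize=cfi-icall -flto -fvisibility=hidden
//   GCC/Clang:  -fcf-protection=full     (x86 CET instrumentation)
//   GCC/Clang:  -fstack-protector-strong
```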

If the language UB white paper could achieve not only its first goal of a broad systematic cataloging and mitigation of UB (grouped into profile/label names that programmers can turn on), but also specifically a “controlled_execution_security” profile that eliminates nearly all remote code execution attacks, that would be a great outcome — and would dramatically reduce C++ software security vulnerability exposure to parity (equality) with the other modern languages.

Summary, and what’s next

As a wise sage once said: “If you choose not to decide, you still have made a choice.”

For many years, software security may not have seemed pressing enough for C++ standardization broadly to make it a top priority, though gradual improvement has always continued. But times have changed. We have been confronted with a spike in cyberattacks and cyberwar that creates serious threats to the systems we rely on to sustain our civilization, and we faced stark choices: react decisively? and how? or not? Making a choice was not optional, as the sage pointed out.

We have chosen: to focus on improving C++ language safety as a priority, with the goal of achieving parity (as measured in number of security vulnerabilities) with other modern languages.

We have already accomplished a great deal. Compile-time C++ is already fully free of UB, which means a huge chunk of real-world C++ is already UB-free today. In C++26 we’re already eliminating several frequent vulnerability UB root causes, where in the language uninitialized variables are no longer UB and in the standard library many common operations on widely used types like vector and string and span and string_view are becoming bounds-safe in a C++26 hardened implementation. Although these are new to the standard, all have been deployed at scale in the field, and making them standard will make them easier to adopt even more widely. (In C++26, we are also shipping language contracts for a different aspect of safety, namely functional safety for defensive programming to reduce bugs in general.)

It’s working: The price of zero-day exploits has already increased. Now we have a path to get the rest of the way to taming UB in C++. Yes, there’s still a great deal of work ahead, but if we can make a solid push over the next one to two years, we do have a real shot at systematically addressing UB in C++, including eliminating nearly all remote code execution attacks. If these efforts to cage the monsters work out even half as well as we hope, I think a lot of folks are going to be very (and I think happily) surprised.

As several other wise sages said: “Let the good times roll.”

If you’re one of the ones helping with either what’s been accomplished already and/or with our next steps above, we want to again say a big “thank you!” — your help is appreciated, and it really matters.

Thanks, very much.


Appendix: UB, briefly

Historically, UB was allowed in C and then C++ as the basis for compiler optimizations: Compilers are allowed to assume that UB never happens and optimize your program based on that assumption. In the real world, compilers are variously aggressive about making that assumption; for a survey of what common examples the different major compilers actually do optimize in what ways at different optimization levels, see my 2020 paper P2064R0 section 3.4.

We have now been reconsidering UB for several reasons, which to me correspond to the UB dragon’s multiple heads:

  • UB often has direct safety and security implications. For example, if a program sometimes tries to access out-of-bounds memory, a malicious actor can use that vulnerability to write an exploit that will install malware to steal cryptocurrency or worse.
  • UB also has indirect safety and security implications. For example, if the compiler encounters an if/else branch and notices that one side of the branch would always encounter UB, it can not only assume that branch is never taken, but it can also assume that the condition the branch is testing is always true (or always false) and so not even test it — which is problematic if the branch was doing a deliberately-enabled safety-related contract check that the compiler ends up silently optimizing out of the compiled program so that the check is never performed at all.
  • UB optimizations also just create mysterious ordinary bugs, such as variables that appear to be simultaneously true and false, unreachable code that gets executed anyway, and “time travel” optimizations that change code that precedes the point where the UB can happen (hence, the idea of UB ‘reaching back to modify the past’).

Less of all those things, please. Over the past decade C++ has been pursuing ways to keep all our glorious optimizations but to specify the optimizability in ways other than fire-breathing mind-flaying UB.
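
A classic instance of the indirect implications described above is a null check placed after the dereference it was meant to guard. This sketch (names illustrative) shows the pattern and the fix; whether a given compiler actually deletes the check depends on the compiler and optimization level:

```cpp
// The dereference comes first, so the compiler may reason: "if p were null,
// *p would be UB; UB cannot happen; therefore p is not null here", and then
// delete the later null check as dead code. A null p then crashes (or worse)
// even though the code appears to have a null check.
int first_or_zero_buggy(const int* p) {
    int v = *p;
    if (p == nullptr) return 0;  // may be optimized away
    return v;
}

// Fixed: test before dereferencing, so there is no UB path to reason from.
constexpr int first_or_zero(const int* p) {
    if (p == nullptr) return 0;
    return *p;
}
```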

Notes:

  • Addressing UB in C++ is easier to do than in C, because C is a fine language but is lower-level with fewer standard abstractions, which means it has fewer universally available alternatives to recommend and fewer standard library features that the standard can directly harden.
  • UB is closely related to another technical concept in the C++ standard called “ill-formed, no diagnostic required” (IF-NDR). For convenience, herein I’m saying just “UB” as a shorthand for “UB and [or, including] IF-NDR.”