How to Optimize your Python Program for Slowness

Write a short program that finishes after the universe dies.


Also available: A Rust version of this article.

Everyone talks about making Python programs faster [1, 2, 3], but what if we pursue the opposite goal? Let’s explore how to make them slower — absurdly slower. Along the way, we’ll examine the nature of computation, the role of memory, and the scale of unimaginably large numbers.

Our guiding challenge: write short Python programs that run for an extraordinarily long time.

To do this, we’ll explore a sequence of rule sets — each one defining what kind of programs we’re allowed to write, by placing constraints on halting, memory, and program state. This sequence isn’t a progression, but a series of shifts in perspective. Each rule set helps reveal something different about how simple code can stretch time.

Here are the rule sets we’ll investigate:

  1. Anything Goes — Infinite Loop
  2. Must Halt, Finite Memory — Nested, Fixed-Range Loops
  3. Infinite, Zero-Initialized Memory — 5-State Turing Machine
  4. Infinite, Zero-Initialized Memory — 6-State Turing Machine (>10↑↑15 steps)
  5. Infinite, Zero-Initialized Memory — Plain Python (compute 10↑↑15 without Turing machine emulation)

Aside: 10↑↑15 is not a typo or a double exponent. It’s a number so large that “exponential” and “astronomical” don’t describe it. We’ll define it in Rule Set 4.

We start with the most permissive rule set. From there, we’ll change the rules step by step to see how different constraints shape what long-running programs look like — and what they can teach us.

Rule Set 1: Anything Goes — Infinite Loop

We begin with the most permissive rules: the program doesn’t need to halt, can use unlimited memory, and can contain arbitrary code.

If our only goal is to run forever, the solution is immediate:

while True:
  pass

This program is short, uses negligible memory, and never finishes. It satisfies the challenge in the most literal way — by doing nothing forever.

Of course, it’s not interesting — it does nothing. But it gives us a baseline: if we remove all constraints, infinite runtime is trivial. In the next rule set, we’ll introduce our first constraint: the program must eventually halt. Let’s see how far we can stretch the running time under that new requirement — using only finite memory.

Rule Set 2: Must Halt, Finite Memory — Nested, Fixed-Range Loops

If we want a program that runs longer than the universe will survive and then halts, it’s easy. Just write two nested loops, each counting over a fixed range from 0 to 10¹⁰⁰−1:

for a in range(10**100):
  for b in range(10**100):
      if b % 10_000_000 == 0:
          print(f"{a:,}, {b:,}")

You can see that this program halts after 10¹⁰⁰ × 10¹⁰⁰ steps. That's 10²⁰⁰. And, ignoring the print, this program uses only a small amount of memory to hold its two integer loop variables: just 144 bytes.

My desktop computer runs this program at about 14 million steps per second. But suppose it could run at Planck speed (one step per Planck time, the smallest meaningful unit of time in physics). That would be about 10⁵⁰ steps per year — so 10¹⁵⁰ years to complete.

Current cosmological models estimate the heat death of the universe in 10¹⁰⁰ years, so our program will run about 10⁵⁰ (100,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000) times longer than the projected lifetime of the universe.
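If you want to check that arithmetic, here is a quick sanity check in plain Python (the 10⁵⁰ steps-per-year rate is the estimate from above):

steps = 10**100 * 10**100          # total loop iterations: 10^200
steps_per_year = 10**50            # assumed Planck-speed rate
years = steps // steps_per_year
assert years == 10**150            # runtime in years
assert years // 10**100 == 10**50  # universe-lifetimes of margin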

Aside: Practical concerns about running a program beyond the end of the universe are outside the scope of this article.

For an added margin, we can use more memory. Instead of 144 bytes for variables, let's use 64 gigabytes — about what you'd find in a well-equipped personal computer. That's about 500 million times more memory, which gives us about one billion loop variables instead of two. If each variable iterates over the full 10¹⁰⁰ range, the total number of steps becomes roughly (10¹⁰⁰)^(10⁹) = 10^(100 × 10⁹), or about 10^(100 billion) steps. At Planck speed — roughly 10⁵⁰ steps per year — that corresponds to 10^(100 billion − 50) years of computation.
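Nobody wants to type a billion nested for statements. Fortunately, a list of counters can emulate any number of nested fixed-range loops. Here is a minimal odometer-style sketch (nested_count is my own illustration, not code from the original article):

def nested_count(num_vars, base):
  # Emulates `num_vars` nested loops, each running over range(base).
  # Total steps taken: base ** num_vars.
  counters = [0] * num_vars
  steps = 0
  while True:
      steps += 1
      i = num_vars - 1
      while i >= 0 and counters[i] == base - 1:
          counters[i] = 0  # this "digit" rolls over
          i -= 1
      if i < 0:
          return steps  # every counter rolled over: done
      counters[i] += 1

assert nested_count(2, 10) == 100  # two nested loops over range(10)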


Can we do better? Well, if we allow an unrealistic but interesting rule change, we can do much, much better.

Rule Set 3: Infinite, Zero-Initialized Memory — 5-State Turing Machine

What if we allow infinite memory — so long as it starts out entirely zero?

Aside: Why don’t we allow infinite, arbitrarily initialized memory? Because it trivializes the challenge. For example, you could mark a single byte far out in memory with a 0x01—say, at position 10¹²⁰—and write a tiny program that just scans until it finds it. That program would take an absurdly long time to run — but it wouldn’t be interesting. The slowness is baked into the data, not the code. We’re after something deeper: small programs that generate their own long runtimes from simple, uniform starting conditions.

My first idea was to use the memory to count upward in binary:

0
1
10
11
100
101
110
111
...

We can do that — but how do we know when to stop? If we don’t stop, we’re violating the “must halt” rule. So, what else can we try?

Let’s take inspiration from the father of Computer Science, Alan Turing. We’ll program a simple abstract machine — now known as a Turing machine — under the following constraints:

  • The machine has infinite memory, laid out as a tape that extends endlessly in both directions. Each cell on the tape holds a single bit: 0 or 1.
  • A read/write head moves across the tape. On each step, it reads the current bit, writes a new bit (0 or 1), and moves one cell left or right.
[Image: A read/write head positioned on an infinite tape.]
  • The machine also has an internal variable called state, which can hold one of n values. For example, with 5 states, we might name the possible values A, B, C, D, and E—plus a special halting state H, which we don’t count among the five. The machine always starts in the first state, A.

We can express a full Turing machine program as a transition table. Here’s an example we’ll walk through step by step.

[Image: A 5-state Turing machine transition table.]
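In text form, the table reads as follows. (The figure shows the well-known 5-state busy beaver champion; the 1RB instruction and the 47,176,870-step runtime discussed below confirm it.)

    A   B   C   D   E
0   1RB 1RC 1RD 1LA 1RH
1   1LC 1RB 0LE 1LD 0LA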
  • Each row corresponds to the current tape value (0 or 1).
  • Each column corresponds to the current state (A through E).
  • Each entry in the table tells the machine what to do next:
    • The first character is the bit to write (0 or 1)
    • The second is the direction to move (L for left, R for right)
    • The third is the next state to enter (A, B, C, D, E, or H, where H is the special halting state).

Now that we’ve defined the machine, let’s see how it behaves over time.

We’ll refer to each moment in time — the full configuration of the machine and tape — as a step. This includes the current tape contents, the head position, and the machine’s internal state (like A, B, or H).

Below is Step 0. The head is pointing to a 0 on the tape, and the machine is in state A.

Looking at row 0, column A in the program table, we find the instruction 1RB. That means:

  • Write 1 to the current tape cell.
  • Move the head Right.
  • Enter state B.

Step 0:

[Image: the head reads a 0 in state A on an all-zero tape.]

This puts us in Step 1:

[Image: the head has moved one cell to the right.]

The machine is now in state B, pointing at the next tape cell (again 0).

What will happen if we let this Turing machine keep running? It will run for exactly 47,176,870 steps — and then halt. 

Aside: With a Google sign in, you can run this yourself via a Python notebook on Google Colab. Alternatively, you can copy and run the notebook locally on your own computer by downloading it from GitHub.
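If you'd rather see the mechanics spelled out, here is a minimal interpreter sketch of my own (it is not the notebook's code, and the RULES table is the champion machine transcribed from above):

from collections import defaultdict

# (state, bit read) -> (bit to write, head move, next state).
RULES = {
  ("A", 0): (1, +1, "B"), ("A", 1): (1, -1, "C"),
  ("B", 0): (1, +1, "C"), ("B", 1): (1, +1, "B"),
  ("C", 0): (1, +1, "D"), ("C", 1): (0, -1, "E"),
  ("D", 0): (1, -1, "A"), ("D", 1): (1, -1, "D"),
  ("E", 0): (1, +1, "H"), ("E", 1): (0, -1, "A"),
}

def run(rules):
  tape = defaultdict(int)  # infinite, zero-initialized tape
  head, state, steps = 0, "A", 0
  while state != "H":
      bit, move, state = rules[(state, tape[head])]
      tape[head] = bit
      head += move
      steps += 1
  return steps

print(f"{run(RULES):,}")  # 47,176,870 (a few minutes in pure Python)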

That number 47,176,870 is astonishing on its own, but seeing the full run makes it more tangible. We can visualize the execution using a space-time diagram, where each row shows the tape at a single step, from top (earliest) to bottom (latest). In the image:

  • The first row is blank — it shows the all-zero tape before the machine takes its first step.
  • 1s are shown in orange.
  • 0s are shown in white.
  • Light orange appears where 0s and 1s are so close together they blend.
[Image: Space-time diagram for the champion 5-state Turing machine. It runs for 47,176,870 steps before halting. Each row shows the tape at a single step, starting from the top. Orange represents 1, white represents 0.]

In 2024, an online group of amateur researchers organized through bbchallenge.org proved that this is the longest-running 5-state Turing machine that eventually halts.


Want to see this Turing machine in motion? You can watch the full 47-million-step execution unfold in this pixel-perfect video:

Or interact with it directly using the Busy Beaver Blaze web app.

The video generator and web app are part of busy-beaver-blaze, the open-source Python & Rust project that accompanies this article.


It’s hard to believe that such a small machine can run 47 million steps and still halt. But it gets even more astonishing: the team at bbchallenge.org found a 6-state machine with a runtime so long it can’t even be written with ordinary exponents.

Rule Set 4: Infinite, Zero-Initialized Memory — 6-State Turing Machine (>10↑↑15 steps)

As of this writing, the longest running (but still halting) 6-state Turing machine known to humankind is:

    A   B   C   D   E   F
0   1RB 1RC 1LC 0LE 1LF 0RC
1   0LD 0RF 1LA 1RH 0RB 0RE

Here is a video showing its first 10 trillion steps:

And here you can run it interactively via a web app.

So, if we are patient — comically patient — how long will this Turing machine run? More than 10↑↑15, where 10↑↑15 means a tower of exponents fifteen levels high:

10↑↑15 = 10^(10^(10^(⋯^10))), a tower of fifteen 10s

This is not the same as 10¹⁵ (which is just a regular exponent). Instead, as the list below (and the short code sketch after it) shows:

  • 10¹ = 10
  • 10¹⁰ = 10,000,000,000
  • 10^10^10 is 10 raised to the 10,000,000,000th power, already unimaginably large.
  • 10↑↑4 is so large that it vastly exceeds the number of atoms in the observable universe.
  • 10↑↑15 is so large that writing it in exponent notation becomes annoying.
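For small heights, you can pin the definition down in ordinary Python (an illustrative sketch; tetrate_int is my name, not the article's):

def tetrate_int(base, height):
  # Right-associative power tower: base ** (base ** (... ** base)),
  # with `height` copies of base.
  result = 1
  for _ in range(height):
      result = base ** result
  return result

assert tetrate_int(10, 1) == 10
assert tetrate_int(10, 2) == 10**10
# tetrate_int(10, 3) has ten billion digits (don't print it), and
# tetrate_int(10, 4) could not be stored in this universe.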

Pavel Kropitz announced this 6-state machine on May 30, 2022. Shawn Ligocki has a great write-up explaining both his and Pavel's discoveries. To prove that these machines run so long and then halt, researchers used a mix of analysis and automated tools. Rather than simulating every step, they identified repeating structures and patterns that could be proven — using formal, machine-verified proofs — to eventually lead to halting.


Up to this point, we’ve been talking about Turing machines — specifically, the longest-known 5- and 6-state machines that eventually halt. We ran the 5-state champion to completion and watched visualizations to explore its behavior. But the discovery that it’s the longest halting machine with 5 states — and the identification of the 6-state contender — came from extensive research and formal proofs, not from running them step-by-step.

That said, the Turing machine interpreter I built in Python can run for millions of steps, and the visualizer written in Rust can handle trillions (see GitHub). But even 10 trillion steps isn’t an atom in a drop of water in the ocean compared to the full runtime of the 6-state machine. And running it that far doesn’t get us any closer to understanding why it runs so long.

Aside: Python and Rust “interpreted” the Turing machines up to some point — reading their transition tables and applying the rules step by step. You could also say they “emulated” them, in that they reproduced their behavior exactly. I avoid the word “simulated”: a simulated elephant isn’t an elephant, but a simulated computer is a computer.

Returning to our central challenge: we want to understand what makes a short program run for a long time. Instead of analyzing these Turing machines, let’s construct a Python program whose 10↑↑15 runtime is clear by design.

Rule Set 5: Infinite, Zero-Initialized Memory — Plain Python (compute 10↑↑15 without Turing machine emulation)

Our challenge is to write a small Python program that runs for at least 10↑↑15 steps, using any amount of zero-initialized memory.

To achieve this, we’ll compute the value of 10↑↑15 in a way that guarantees the program takes at least that many steps. The ↑↑ operator is called tetration—recall from Rule Set 4 that ↑↑ stacks exponents: for example, 10↑↑3 means 10^(10^10). It’s an extremely fast-growing function. We will program it from the ground up.

Rather than rely on built-in operators, we’ll define tetration from first principles:

  • Tetration, implemented by the function tetrate, as repeated exponentiation
  • Exponentiation, via exponentiate, as repeated multiplication
  • Multiplication, via multiply, as repeated addition
  • Addition, via add, as repeated increment

Each layer builds on the one below it, using only zero-initialized memory and in-place updates.

We’ll begin at the foundation — with the simplest operation of all: increment.

Increment

Here’s our definition of increment and an example of its use:

from gmpy2 import xmpz

def increment(increment_acc):
  assert is_valid_accumulator(increment_acc), "not a valid accumulator"
  increment_acc += 1

def is_valid_accumulator(acc):
  return isinstance(acc, xmpz) and acc >= 0  

b = xmpz(4)
print(f"++{b} = ", end="")
increment(b)
print(b)
assert b == 5

Output:

++4 = 5

We’re using xmpz, a mutable arbitrary-precision integer type provided by the gmpy2 library. It behaves like Python’s built-in int in terms of numeric range—limited only by memory—but unlike int, it supports in-place updates.

To stay true to the spirit of a Turing machine and to keep the logic minimal and observable, we restrict ourselves to just a few operations (demonstrated in the short snippet after this list):

  • Creating an integer with value 0 (xmpz(0))
  • In-place increment (+= 1) and decrement (-= 1)
  • Comparing with zero
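In code, that entire vocabulary looks like this (a minimal check):

from gmpy2 import xmpz

n = xmpz(0)    # create an integer with value 0
n += 1         # in-place increment
n -= 1         # in-place decrement
assert n == 0  # compare with zero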

All arithmetic is done in-place, with no copies and no temporary values. Each function in our computation chain modifies an accumulator directly. Most functions also take an input value a, but increment—being the most basic—does not. We use descriptive names like increment_acc, add_acc, and so on to make the operation clear and to support later functions where multiple accumulators will appear together.

Aside: Why not use Python’s built-in int type? It supports arbitrary precision and can grow as large as your memory allows. But it’s also immutable, meaning any update like += 1 creates a new integer object. Even if you think you’re modifying a large number in place, Python is actually copying all of its internal memory—no matter how big it is.
For example:

x = 10**100
y = x
x += 1
assert x == 10**100 + 1 and y == 10**100

Even though x and y start out identical, x += 1 creates a new object—leaving y unchanged. This behavior is fine for small numbers, but it violates our rules about memory use and in-place updates. That’s why we use gmpy2.xmpz, a mutable arbitrary-precision integer that truly supports efficient, in-place changes.
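You can watch the difference directly through object identity (a small check of my own; the Addition section below does the same for add):

from gmpy2 import xmpz

x = 10**100
old_id = id(x)
x += 1
assert id(x) != old_id  # int: += allocated a brand-new object

z = xmpz(10**100)
old_id = id(z)
z += 1
assert id(z) == old_id  # xmpz: truly updated in place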

Addition

With increment defined, we next define addition as repeated incrementing.

def add(a, add_acc):
  assert is_valid_other(a), "not a valid other"
  assert is_valid_accumulator(add_acc), "not a valid accumulator"
  for _ in range(a):
      add_acc += 1

def is_valid_other(a):
  return isinstance(a, int) and a >= 0      

a = 2
b = xmpz(4)
print(f"Before: id(b) = {id(b)}")
print(f"{a} + {b} = ", end="")
add(a, b)
print(b)
print(f"After:  id(b) = {id(b)}")  # ← compare object IDs
assert b == 6

Output:

Before: id(b) = 2082778466064
2 + 4 = 6
After:  id(b) = 2082778466064

The function adds a to add_acc by incrementing add_acc one step at a time, a times. The before and after ids are the same, showing that no new object was created—add_acc was truly updated in place.

Aside: You might wonder why add doesn’t just call our increment function. We could write it that way—but we’re deliberately inlining each level by hand. This keeps all loops visible, makes control flow explicit, and helps us reason precisely about how much work each function performs.

Even though gmpy2.xmpz supports direct addition, we don’t use it. We’re working at the most primitive level possible—incrementing by 1—to keep the logic simple, intentionally slow, and to make the amount of work explicit.

As with increment_acc, we update add_acc in place, with no copying or temporary values. The only operation we use is += 1, repeated a times.

Next, we define multiplication.

Multiplication

With addition in place, we can now define multiplication as repeated addition. Here’s the function and example usage. Unlike add and increment, this one builds up a new xmpz value from zero and returns it.

def multiply(a, multiply_acc):
  assert is_valid_other(a), "not a valid other"
  assert is_valid_accumulator(multiply_acc), "not a valid accumulator"

  add_acc = xmpz(0)
  for _ in count_down(multiply_acc):
      for _ in range(a):
          add_acc += 1
  return add_acc

def count_down(acc):
  assert is_valid_accumulator(acc), "not a valid accumulator"
  while acc > 0:
      acc -= 1
      yield

a = 2
b = xmpz(4)
print(f"{a} * {b} = ", end="")
c = multiply(a, b)
print(c)
assert c == 8
assert b == 0

Output:

2 * 4 = 8

This multiplies a by the value of multiply_acc by adding a to add_acc once for every time multiply_acc can be decremented. The result is returned and then assigned to c. The original multiply_acc is decremented to zero and consumed in the process.

You might wonder what this line does:

for _ in count_down(multiply_acc):

While xmpz technically works with range(), doing so converts it to a standard Python int, which is immutable. That triggers a full copy of its internal memory—an expensive operation for large values. Worse, each decrement step would involve allocating a new integer and copying all previous bits, so what should be a linear loop ends up doing quadratic total work. Our custom count_down() avoids all that by decrementing in place, yielding control without copying, and maintaining predictable memory use.
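Here is count_down's contract in miniature: one yield per decrement, and the accumulator is consumed in place.

b = xmpz(3)
yields = 0
for _ in count_down(b):
  yields += 1
assert yields == 3 and b == 0  # three passes; b ends at zero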

We’ve built multiplication from repeated addition. Now it’s time to go a layer further: exponentiation.

Exponentiation

We define exponentiation as repeated multiplication. As before, we perform all work using only incrementing, decrementing, and in-place memory. As with multiply, the final result is returned while the input accumulator is consumed.

Here’s the function and example usage:

def exponentiate(a, exponentiate_acc):
  assert is_valid_other(a), "not a valid other"
  assert is_valid_accumulator(exponentiate_acc), "not a valid accumulator"
  assert a > 0 or exponentiate_acc != 0, "0^0 is undefined"

  multiply_acc = xmpz(0)
  multiply_acc += 1
  for _ in count_down(exponentiate_acc):
      add_acc = xmpz(0)
      for _ in count_down(multiply_acc):
          for _ in range(a):
              add_acc += 1
      multiply_acc = add_acc
  return multiply_acc


a = 2
b = xmpz(4)
print(f"{a}^{b} = ", end="")
c = exponentiate(a, b)
print(c)
assert c == 16
assert b == 0

Output:

2^4 = 16

This raises a to the power of exponentiate_acc, using only incrementing, decrementing, and loop control. We initialize multiply_acc to 1 with a single increment—because repeatedly multiplying from zero would get us nowhere. Then, for each time exponentiate_acc can be decremented, we multiply the current result (multiply_acc) by a. As with the earlier layers, we inline the multiply logic directly instead of calling the multiply function—so the control flow and step count stay fully visible.

Aside: And how many times is += 1 called? Obviously at least 2⁴ times—because our result is 2⁴, and we reach it by incrementing from zero. More precisely, the number of increments is:

• 1 increment — initializing multiply_acc to one

Then we loop four times, and in each loop, we multiply the current value of multiply_acc by a = 2, using repeated addition:

• 2 increments — for multiply_acc = 1, add 2 once
• 4 increments — for multiply_acc = 2, add 2 twice
• 8 increments — for multiply_acc = 4, add 2 four times
• 16 increments — for multiply_acc = 8, add 2 eight times

That’s a total of 1 + 2 + 4 + 8 + 16 = 31 increments, which is 2⁵−1. In general, the number of += 1 operations grows exponentially with the exponent, but it is not the same exponential that we are computing.
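To verify that count, here is a hypothetical instrumented version of the same inlined loops, using plain ints purely for bookkeeping:

def exponentiate_counting(a, n):
  # Mirrors exponentiate's loops, tallying every += 1 on the accumulator.
  increments = 1  # initializing multiply_acc to one
  multiply_acc = 1
  for _ in range(n):
      add_acc = 0
      for _ in range(multiply_acc):
          for _ in range(a):
              add_acc += 1
              increments += 1
      multiply_acc = add_acc
  return multiply_acc, increments

value, increments = exponentiate_counting(2, 4)
assert value == 16 and increments == 31  # 1 + 2 + 4 + 8 + 16 = 2⁵ − 1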

With exponentiation defined, we’re ready for the top of our tower: tetration.

Tetration

Here’s the function and example usage:

def tetrate(a, tetrate_acc):
  assert is_valid_other(a), "not a valid other"
  assert is_valid_accumulator(tetrate_acc), "not a valid accumulator"
  assert a > 0, "we don't define 0↑↑b"

  exponentiate_acc = xmpz(0)
  exponentiate_acc += 1
  for _ in count_down(tetrate_acc):
      multiply_acc = xmpz(0)
      multiply_acc += 1
      for _ in count_down(exponentiate_acc):
          add_acc = xmpz(0)
          for _ in count_down(multiply_acc):
              for _ in range(a):
                  add_acc += 1
          multiply_acc = add_acc
      exponentiate_acc = multiply_acc
  return exponentiate_acc


a = 2
b = xmpz(3)
print(f"{a}↑↑{b} = ", end="")
c = tetrate(a, b)
print(c)
assert c == 16  # 2^(2^2)
assert b == 0   # Confirm tetrate_acc is consumed

Output:

2↑↑3 = 16

This computes a ↑↑ tetrate_acc, meaning it exponentiates a by itself repeatedly, tetrate_acc times.

For each decrement of tetrate_acc, we exponentiate the current value. We inline the entire exponentiate and multiply logic again, all the way down to repeated increments.

As expected, this computes 2^(2^2) = 16. With a Google sign-in, you can run this yourself via a Python notebook on Google Colab. Alternatively, you can copy the notebook from GitHub and then run it on your own computer.

We can also run tetrate on 10↑↑15. It will start running, but it won’t stop during our lifetimes — or even the lifetime of the universe:

a = 10
b = xmpz(15)
print(f"{a}↑↑{b} = ", end="")
c = tetrate(a, b)
print(c)

Let’s compare this tetrate function to what we found in the previous Rule Sets.

Rule Set 1: Anything Goes — Infinite Loop

Recall our first function:

while True:
  pass

Unlike this infinite loop, our tetrate function eventually halts — though not anytime soon.

Rule Set 2: Must Halt, Finite Memory — Nested, Fixed-Range Loops

Recall our second function:

for a in range(10**100):
  for b in range(10**100):
      if b % 10_000_000 == 0:
          print(f"{a:,}, {b:,}")

Both this function and our tetrate function contain a fixed number of nested loops. But tetrate differs in an important way: the number of loop iterations grows with the input value. In this function, each loop runs from 0 to 10¹⁰⁰−1, a hardcoded bound. In tetrate, the loop bounds are dynamic — they grow explosively with each layer of computation.

Rule Sets 3 & 4: Infinite, Zero-Initialized Memory — 5- and 6-State Turing Machines

Compared to the Turing machines, our tetrate function has a clear advantage: we can directly see that it will call += 1 more than 10↑↑15 times. Even better, we can also see — by construction — that it halts.

What the Turing machines offer instead is a simpler, more universal model of computation — and perhaps a more principled definition of what counts as a “small program.”

Conclusion

So, there you have it — a journey through writing absurdly slow programs. Along the way, we explored the outer edges of computation, memory, and performance, using everything from deeply nested loops to Turing machines to a hand-inlined tetration function.

Here’s what surprised me:

  • Nested loops are enough.
    If you just want a short program that halts after outliving the universe, two nested loops with 144 bytes of memory will do the job. I hadn’t realized it was that simple.
  • Turing machines escalate fast.
    The jump from 5 to 6 states unleashes a dramatic leap in complexity and runtime. Also, the importance of starting with zero-initialized memory is obvious in retrospect — but it wasn’t something I’d considered before.
  • Python’s int type can kill performance.
    Yes, Python integers are arbitrary precision, which is great. But they’re also immutable. That means every time you do something like x += 1, Python silently allocates a brand-new integer object—copying all the memory of x, no matter how big it is. It feels in-place, but it’s not. This behavior turns efficient-looking code into a performance trap when working with large values. To get around this, we use the gmpy2.xmpz type—a mutable, arbitrary-precision integer that allows true in-place updates.
  • There’s something beyond exponentiation — and it’s called tetration.
    I didn’t know this. I wasn’t familiar with the ↑↑ notation or the idea that exponentiation could itself be iterated to form something even faster-growing. It was surprising to learn how compactly it can express numbers that are otherwise unthinkably large.
    And because I know you’re asking — yes, there’s something beyond tetration too. It’s called pentation, then hexation, and so on. These are part of a whole hierarchy known as hyperoperations. There’s even a metageneralization: systems like the Ackermann function and fast-growing hierarchies capture entire families of these functions and more.
  • Writing tetration with explicit loops was eye-opening.
    I already knew that exponentiation is repeated multiplication, and so on. I also knew this could be written recursively. What I hadn’t seen was how cleanly it could be written as nested loops, without copying values and with strict in-place updates.

Thank you for joining me on this journey. I hope you now have a clearer understanding of how small Python programs can run for an astonishingly long time — and what that reveals about computation, memory, and minimal systems. We’ve seen programs that halt only after the universe dies, and others that run even longer.

Please follow Carl on Towards Data Science and on @carlkadie.bsky.social. I write on scientific programming in Python and Rust, machine learning, and statistics. I tend to write about one article per month.
