S37

The Thermodynamic Endgame for AI

When tokens are free, physics becomes the competitive frontier

We're living through a brief historical anomaly when the cost of intelligence is measured in dollars per token. This won't last. Token prices are collapsing at roughly 10x per year. The trajectory is clear enough that we can see where it leads. It's can't go to zero (you can't do computation for free), but to a floor set by physics rather than economics. When we hit that floor, the entire landscape of artificial intelligence shifts from "who can train the biggest model" to "who can run inference in the most thermodynamically constrained environments."

This will be the age of entropy-constrained computing.

The Landauer Floor

There's a fundamental limit to how cheaply you can process information: Landauer's principle. Erasing one bit of information requires dissipating at least kT ln(2) joules of energy, about 2.8 × 10⁻²¹ joules at room temperature. This is just a physics constraint. The second law of thermodynamics encoded as a tax on computation.

Current AI hardware operates roughly a million times above this theoretical minimum. That gap will close, but it won't disappear. When token costs approach the Landauer limit, three things become competitively decisive:

  1. Joules per inference: How much energy does your reasoning require?

  2. Thermal budget: How much heat can your deployment environment dissipate?

  3. Connectivity independence: Can you operate without talking to a central server?

These will become the new moats.

Inference at the Edge

If inference becomes nearly free at the margin, the constraint shifts to where you can run it. And physics says that everywhere is different.

Consider the deployment scenarios that become possible and necessary when intelligence is cheap but thermodynamics remains expensive:

In each case, the winning architecture avoids "the cloud" to solve it locally, with the joules you have, within the thermal budget available."

The Decentralization Gradient

This creates a natural gradient: as you move away from data centers, intelligence must become more energy and thermally efficient. The center has abundant power and cooling. The edge has neither.

The Architectural Shift

Model architecture matters differently. Not "how accurate does it perform on benchmark X?" but "how much accuracy per watt per degree?" Sparse activation, quantization, and distillation will become first-class design constraints rather than optimization techniques.

Hardware specialization will also have to accelerate. The neuromorphic approaches that seemed academic suddenly matter. Chips designed for specific inference patterns at specific thermal envelopes. The end of the general-purpose GPU as the universal intelligence substrate. Sovereignty becomes physical.

When your AI can run offline on local hardware within local thermal constraints, you don't need to phone home. Data stays where it's generated. Decisions happen where they matter. The "mothership" becomes optional.

I'll be honest here: I'm a decentralization freak, so my bias might be showing through here. But I think the analysis is sharp. So... come at me.

The Architectural Shift

The companies that win this transition need to be thinking about three architectural layers:

The active inference framework becomes particularly relevant here.

Systems that minimize surprise, that maintain accurate models of their environment while expending minimum energy, are exactly what thermodynamically constrained deployment requires. It's not coincidental that the mathematics of efficient inference and the physics of efficient computation point the same direction.

Thermodynamic Investment Thesis

Where does value accrue in this transition?

Custom silicon for edge inference: No big phat NVIDIA GPUs or general-purpose NPUs. Instead chips optimized for specific inference patterns at specific thermal envelopes. The surface area for differentiation is enormous. I think the endgame will include Active Inference algos on thermodynamic substrates, if anyone can actually build it at scale (I'm looking at you, Extropic).

Energy harvesting and management: The limiting factor for deployed intelligence is the power. Better energy harvesting, better power management, better thermal design.

Sparse and efficient model architectures: The research that matters will be parsimonious "smarter models." Models that achieve adequate performance with radically less computation.

Decentralized coordination protocols: How do swarms of edge devices share intelligence without centralized infrastructure? This is distributed systems meets thermodynamics.

The attention economy of AI, including training compute, frontier models, and data center scale is a transition state. The equilibrium is different: intelligence embedded everywhere, constrained by physics rather than economics, sovereign rather than dependent.

When tokens are free, energy and heat is the currency. The winners are whoever can think coldest.