The Compute Bill Comes Due

By Eric Pilkington, Chief Executive and General Manager, UST | Strategy Perspective | June 2026

For three years, enterprise AI strategy has been written as if intelligence were free and infinite. It is neither. The constraint that will define the next planning cycle is not model quality, regulation, or talent. It is supply.

There is a number that should be on the cover of every 2027 operating plan. Between January and March of this year, the weekly volume of tokens routed through OpenRouter, the largest neutral aggregator of large-language-model traffic, rose fourfold. By April, year-over-year growth on the same platform was running at seven to eight times. Inside China, MiniMax has been suspending API access. Anthropic experienced a major outage in April under a load it had not forecast. Two of the most severe incidents in the history of public cloud have already occurred in the first half of 2026. The pattern is not a string of unrelated operational stumbles. It is the first visible edge of a compute supply shortage that the industry has been quietly pricing in for eighteen months, and that most enterprise buyers have not yet priced in at all.

The thesis of this piece is narrow and specific: the economics of enterprise AI are about to invert. For three years, the default assumption in every transformation roadmap, every agentic-AI pilot, every “AI-native” product reorg has been that tokens are cheap, getting cheaper, and effectively unrationed. That assumption was correct in 2023 and 2024. It is no longer correct in 2026, and it will not be correct again before 2028 at the earliest. The companies that recognize the inversion early will buy capacity, lock pricing, and re-architect their products around it. The companies that do not will discover—somewhere between the next board meeting and the one after—that their AI strategy has a supply chain, and that they do not control it.

What Actually Broke

Three things broke at the same time, and their convergence is the story.

The first is demand. The shift from training to inference, which was the consensus capital-allocation thesis of 2025, turned out to be a polite description of what happened. Inference did not replace training. It stacked on top of it. Agentic workloads, in which a single user request fans out into dozens or hundreds of model calls before a result comes back, have rewritten the compute math by an order of magnitude. JPMorgan Asset Management’s April 2026 read on the sector put it directly: a single user with the latest agentic tools can demand 10 to 100 times more compute than the same user with a 2024 chat interface. Coding assistants are the leading edge. OpenRouter’s quadrupling between January and March was driven primarily by code-generation traffic, where every prompt expands into iterative tool calls, file reads, and verification loops. The “tokenmaxxing” culture inside Silicon Valley, in which model providers and application developers openly compete on how many tokens they consume per task, is not vanity. It is what the frontier of capability looks like when reasoning is priced in tokens.

The second is chips. Cumulative orders for Nvidia GPUs through 2027 have surpassed $1 trillion, double the figure from a year ago. Lead times on advanced GPUs and custom silicon now run close to a year. High-bandwidth memory—the chokepoint inside the chokepoint—is sold out across all three suppliers for the entirety of 2026. TSMC, which fabricates the advanced nodes that Nvidia, AMD, and Google all depend on, will not bring its next generation of three-nanometer fabs online until 2027 and 2028. Its 2026 capital expenditure, projected at roughly $56 billion and up 37% year over year, is already the maximum the company can deploy against the demand curve. The bottleneck is not money. It is fab capacity, and fab capacity is governed by physics, lithography, and construction timelines that do not respond to urgency.

The third is everything around the chips. Power generation, transmission, water, land, and the data-center construction pipeline itself have all become binding constraints, in roughly that order. Deloitte’s late-2025 prediction that the industry would need every data center currently in the pipeline, plus every kilowatt-hour required to run them, has aged into a near-term reality. Hyperscalers are no longer competing only on chip allocation. They are competing for substation interconnect queues, gas turbine slots, and 20-year power purchase agreements with utilities and independent producers. The center of gravity of the AI capital stack has moved upstream from silicon to electrons.

The result is a market that looks superficially healthy and is structurally rationed. Cloud providers are still selling AI compute. Model labs are still shipping new releases. Token prices, per unit, continue to fall in headline terms. Underneath, capacity is being allocated by relationship, by contract vintage, and increasingly by who locked in supply in 2024. The publicly visible outages are the metered release of a system operating much closer to its physical limit than the marketing surface admits.

Why the 2024 Playbook Will Fail

The enterprise AI playbook that every Fortune 500 has been executing against was built on three assumptions, each of which has now broken.

The first assumption was that token cost would fall faster than token consumption would rise. For two years, that was true, and it produced the comfortable internal math in which AI adoption was self-funding because each new use case was cheaper than the last. That math is now wrong, not because per-token prices stopped falling, but because the unit of useful work moved. A 2024 use case consumed a few thousand tokens to draft an email. A 2026 agentic use case consumes a few million to complete a multi-step business process. Per-token deflation of thirty percent against a hundredfold expansion in tokens per task is a cost increase, not a cost decrease—and it is the cost increase that lands in the operating budget.
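The arithmetic behind that last sentence can be checked in a few lines. The sketch below is illustrative only: the function name `cost_multiplier` is hypothetical, and the inputs are the two ratios quoted above (a thirty percent per-token price decline and a hundredfold rise in tokens per task), not measured data.

```python
def cost_multiplier(price_change: float, token_growth: float) -> float:
    """Return how many times more one task costs after both changes.

    price_change: fractional change in per-token price (-0.30 means 30% cheaper).
    token_growth: factor by which tokens per task grew (100.0 means hundredfold).
    """
    return (1.0 + price_change) * token_growth


# The article's two ratios: 30% per-token deflation, 100x tokens per task.
multiplier = cost_multiplier(price_change=-0.30, token_growth=100.0)
print(f"Cost per task changes by about {multiplier:.0f}x")
```

Under those two assumptions, the cost of a single task rises roughly seventyfold even though every individual token is cheaper. The absolute price level does not matter to the result; only the two ratios do.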

The second assumption was that capacity was elastic. CIOs treated AI compute the way they treated SaaS seats: order more, pay more, receive more, with at most a procurement-cycle friction point. That assumption is now wrong on a multi-quarter horizon. The hyperscalers that the enterprise depends on are themselves rationed by Nvidia, TSMC, and the power grid. A request for a tenfold increase in inference capacity for a 2027 launch is no longer a commercial conversation. It is a capacity-planning conversation—and in some geographies and some model families, it is a queue.

The third assumption was that the model layer was commoditizing. It is, in capability terms. A capable open-weights model is now within striking distance of a frontier closed model on most enterprise tasks, and the gap is closing. What is not commoditizing, and what 2024 strategy decks systematically underestimate, is the infrastructure layer underneath the model. Pricing power is migrating from the model providers to the providers of scarce inputs: advanced semiconductors, high-bandwidth memory, data-center capacity, and dispatchable power. The strategic implication is the opposite of what most enterprise AI roadmaps assume. The leverage point is not which model you choose. It is whether you have contracted access to the compute it runs on.

What This Means for the Enterprise

A supply-constrained AI market produces a different set of winners than an abundant one, and the operating implications fall into four buckets.

The first is unit economics. Every active AI use case in the enterprise portfolio needs to be re-underwritten against a realistic 2027 token-cost curve, rather than the 2024 extrapolation most business cases were built on. The use cases that look marginal at four times current consumption and flat unit pricing are not marginal. They are the ones whose economics will be tested first, because they consume the most tokens per dollar of business outcome. Coding copilots, agentic customer service, and multi-step research workflows top this list. The discipline to apply is the one large enterprises learned the hard way with cloud in the mid-2010s: usage is a P&L line, not a footnote, and it needs a Finance owner with the authority to govern it.

The second is sourcing. The procurement model that worked for SaaS does not work for AI compute in a rationed market. Single-vendor dependence on a single hyperscaler or model lab is now a capacity risk, not just a commercial one. The companies that will be operationally resilient through 2027 are those running at least two model providers in production for every business-critical workload, with the ability to route traffic based on price and availability. That capability is not free. It requires an inference gateway, cross-provider observability, and the engineering discipline to keep prompts and tool definitions portable. It is the AI equivalent of multi-cloud, and the rationale is the same: the supplier you cannot leave is the supplier who will eventually price you accordingly.
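To make "route traffic based on price and availability" concrete, here is a minimal sketch of the selection step an inference gateway might perform. Everything in it is an assumption for illustration: the `Provider` record, the provider names, the prices, and the `available` flag (which in practice would come from health checks or quota signals). A production gateway would also need retries, observability, and prompt portability, none of which is shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Provider:
    name: str                        # hypothetical provider label
    price_per_million_tokens: float  # dollars, illustrative only
    available: bool                  # assumed result of a health or quota check


def choose_provider(providers: list[Provider]) -> Optional[Provider]:
    """Return the cheapest provider that is currently available, or None."""
    candidates = [p for p in providers if p.available]
    return min(candidates, key=lambda p: p.price_per_million_tokens, default=None)


# Usage: the cheaper provider is rationed, so traffic falls to the second one.
fleet = [
    Provider("provider_a", 8.0, available=False),
    Provider("provider_b", 12.0, available=True),
]
chosen = choose_provider(fleet)
print(chosen.name if chosen else "no capacity available")
```

The point of the sketch is the `None` branch as much as the happy path: with a single provider in the list, an outage leaves the workload with nowhere to go.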

The third is architecture. In an abundant market, the right answer is almost always to call the largest available frontier model and let it figure out the task. In a rationed market, that answer is wasteful and increasingly expensive. The architectural pattern that wins is graduated compute: small models for routing and triage, mid-tier models for the bulk of structured work, frontier models reserved for the steps that genuinely require frontier reasoning. The same logic applies to context. Pulling a hundred-thousand-token context window for a task that needs five thousand tokens of retrieval is not engineering; it is a transfer of margin to the model provider. The companies running this pattern internally are already seeing two to four times lower inference cost per task at equivalent quality, and they are buying themselves headroom against the supply curve.
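A rough sketch of graduated compute, under stated assumptions, looks like the following. The step labels, the tier names, and the per-tier prices are all hypothetical placeholders chosen for illustration; they are not vendor prices, and the comparison says nothing about output quality, which has to be evaluated separately.

```python
# Hypothetical mapping from the kind of step to the smallest tier assumed to handle it.
TIER_FOR_STEP = {
    "routing": "small",
    "triage": "small",
    "structured": "mid",
    "frontier_reasoning": "frontier",
}

# Illustrative prices in dollars per million tokens (assumed, not real quotes).
PRICE_PER_MILLION = {"small": 0.5, "mid": 3.0, "frontier": 15.0}


def tier_for(step_kind: str) -> str:
    """Pick a tier for a step; unknown step kinds fall back to the frontier tier."""
    return TIER_FOR_STEP.get(step_kind, "frontier")


def task_cost(steps: list[tuple[str, int]], graduated: bool = True) -> float:
    """Sum the cost of (step_kind, tokens) pairs, tiered or all-frontier."""
    total = 0.0
    for step_kind, tokens in steps:
        tier = tier_for(step_kind) if graduated else "frontier"
        total += PRICE_PER_MILLION[tier] * tokens / 1_000_000
    return total


# Usage: one task broken into three steps with assumed token counts.
steps = [("routing", 1_000), ("structured", 20_000), ("frontier_reasoning", 5_000)]
print("graduated:", task_cost(steps, graduated=True))
print("all frontier:", task_cost(steps, graduated=False))
```

Falling back to the frontier tier for unrecognized steps is a deliberate choice in this sketch: it errs toward quality over cost, so savings come only from steps that have been explicitly classified.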

The fourth is positioning against suppliers. Most enterprises today treat AI infrastructure the way they treated cloud in 2015: as a utility they buy in arrears. In a supply-constrained market, that is the wrong posture. The companies that secure their 2027 and 2028 compute envelope in 2026 will operate from a different cost base than those that wait. Multi-year compute commitments, reserved capacity, co-location agreements, and in some cases direct power procurement are no longer exotic moves. The hyperscalers are doing them upstream because they have to. The largest enterprise buyers should be doing them downstream for the same reason. Capacity is the asset. Whoever locks it first, prices it best.

The Decision Before Us

There is a recurring failure mode in enterprise technology cycles. The constraint that ultimately defines the cycle is visible to the suppliers eighteen months before it is visible to the buyers, and the buyers spend those eighteen months optimizing the wrong variable. In 2014, it was cloud capacity. In 2019, it was data engineering talent. In 2026, it is compute supply.

The signals are already in the operating data. Token volumes are quadrupling per quarter at the busiest aggregator. Memory suppliers are sold out for the year. Frontier labs are taking outages they did not forecast. Chinese AI providers are openly rationing access. The trillion dollars of Nvidia orders through 2027 is not a forward-looking bet. It is the supply side telling the demand side what the next two years will cost.

The companies that treat this as a procurement problem will be reorganizing their AI roadmaps in eighteen months, under pressure from capacity they cannot get and prices they did not budget for. The companies that treat it as a strategy problem will spend the next two quarters doing four things: re-underwriting every use case against a realistic compute curve, building multi-provider inference into the production stack, redesigning workloads for graduated compute, and locking multi-year capacity at 2026 prices.

The question for the executive committee is not whether AI will deliver. It will. The question is who pays for the supply crunch that gets us there—and whether your operating model is built to absorb it or to be absorbed by it.
