Why AI changes the economics of computer memory

2025 · last edited June 2026

Pre-AI, computer memory was usually bought to clear a bottleneck. Post-AI, memory increasingly determines what the machine can be.

For most of computing history, memory mattered up to a threshold. A program had a working set: the data and code it needed in fast memory during execution. If that working set fit, the program ran well. If it did not, the machine fetched from slower storage and performance suffered. Beyond that threshold, additional memory mostly bought headroom: a database query that no longer spilled to disk, say. It rarely changed the kind of work the machine could do.

AI is beginning to change this. The shift is not yet complete, and present-day mechanisms impose many local limits.[1] Still, the direction matters. By memory here I mean high-capacity, high-performance memory: large, fast active state the processor operates over directly, not slow bulk storage. As AI systems become more general, that memory behaves less like a level you reach and then stop thinking about, and more like a dimension along which capability continues to scale.

A useful lens is marginal value. In the era of fixed-purpose software, the value of an additional gigabyte was lumpy: large up to the working-set threshold, small beyond it. For a thinking system the curve is different. A model that can keep more state active can carry more context across turns, hold more retrieved evidence in attention, sustain longer chains of reasoning, and track more parallel lines of inquiry. The task expands to absorb whatever memory it can use. There is no point at which a thinking system simply has enough: each added gigabyte of capacity is worth a little less than the last but, over the range where more still helps rather than distracts, never nothing. So the buyer keeps adding memory as long as the next gigabyte is worth more than it costs. And because that worth no longer collapses past a threshold, steadily cheaper capacity keeps demand climbing rather than letting it hit a wall. The question itself changes: not how much memory the workload requires, but what larger class of problems becomes thinkable when more of the world can stay resident at once.
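The buying rule above can be made concrete with a toy model. The sketch below is illustrative only: the 64 GB working set, the logarithmic value curve, and the prices are assumptions chosen to show the shape of the argument, not estimates of any real workload. It compares how many gigabytes a buyer acquires under a threshold-shaped value curve and under a diminishing-but-never-zero one as the price per gigabyte falls.

```python
import math

def threshold_value(gb, working_set_gb=64.0, value_at_fit=100.0):
    """Fixed-purpose software: value rises until the working set fits, then is flat.
    The 64 GB working set and the value scale are hypothetical."""
    return value_at_fit * min(gb, working_set_gb) / working_set_gb

def open_ended_value(gb, scale=100.0):
    """Open-ended reasoning: each gigabyte is worth less than the last, but never nothing.
    The logarithmic shape is an assumption, not a measured curve."""
    return scale * math.log1p(gb)

def marginal_value(value_fn, gb):
    """Value added by the next whole gigabyte beyond `gb`."""
    return value_fn(gb + 1) - value_fn(gb)

def demand(value_fn, price_per_gb, max_gb=1_000_000):
    """Keep buying while the next gigabyte is worth more than it costs."""
    gb = 0
    while gb < max_gb and marginal_value(value_fn, gb) > price_per_gb:
        gb += 1
    return gb

if __name__ == "__main__":
    # Prices are in the same arbitrary units as value.
    for price in (1.0, 0.1, 0.01):
        fixed = demand(threshold_value, price)
        open_ended = demand(open_ended_value, price)
        print(f"price {price:>5}: threshold buyer {fixed} GB, open-ended buyer {open_ended} GB")
```

Under these assumptions the threshold buyer stops at the working set however cheap memory gets, while the open-ended buyer's demand grows each time the price drops. A different concave curve would change the numbers but not that qualitative contrast.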

This isn’t the kind of scaling law people talk about for training today. It’s a claim about a regime. As machines shift from running fixed programs to open-ended reasoning, high-capacity memory starts to behave the way compute and data do in those laws, becoming a resource that keeps buying capability across a wide range. The claim is not that every gigabyte buys the same capability, nor that adequate hardware automatically becomes intelligent; only that, over that range, memory keeps converting into more powerful computation. For fixed-purpose software, memory mostly helped a known computation fit. For AI, memory increasingly helps define the computation itself.

To see where this leads, consider an AI scientist a decade from now working on a hard problem: protein design, climate dynamics, new materials. Given modest usable memory, the system must attend to one paper, one simulation, one cluster of hypotheses at a time, much as a human researcher does. Given vastly more, it can keep the live frontier of a research program resident in active state: the relevant literature, the current hypotheses and their dependencies, the in-progress simulations and their failure modes, the experimental constraints, the historical record of what has been tried. Because everything is available at once, the system can notice connections that would otherwise be lost: that an experiment a graduate student ran in 2007 quietly contradicts a result published yesterday, and that the contradiction bears on an experiment underway now. What emerges is a kind of cognition no human institution currently sustains: everything bearing on the problem, held in view at once.

The human analogy is useful here. A person with more working memory can hold more variables active, compare more ideas at once, and sustain a longer thread of reasoning before things slip. Working memory capacity is among the strongest known predictors of fluid intelligence in humans. Machines face a structurally similar constraint, even if they do not have working memory in any biological sense. Thinking, in large part, is spotting how things connect; the more a system can hold active and actually attend to at once, the more connections it can find. That is what a larger workspace buys, and why the size of the active workspace tends to set the size of the thought: how much can be held in relation in a single step of reasoning.

For fixed-purpose computers, memory demand is bounded by the workload. For general thinking machines, the workload is bounded by memory. The marginal value of a gigabyte no longer collapses past a threshold. So, increasingly, more memory means a larger space of thought.

Notes

[1] These limits live in the concrete forms active state takes today: the parameters of trained models, the KV caches that hold attention during inference, retrieval that brings documents into the context window, and the bandwidth and latency of the memory hierarchy feeding the accelerator. Each has its own ceiling: context length, retrieval quality, cost, energy. The directional claim is about what becomes possible as systems grow more general; it does not assume any current architecture already shows the full pattern.
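As a rough illustration of one of these forms, the sketch below estimates how a KV cache grows with context length. It assumes a standard transformer that stores one key and one value vector per token, per layer, per key-value head, with no quantization, paging, or cache compression. The model dimensions are hypothetical and do not describe any particular system.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_value=2):
    """Approximate KV cache size in bytes.

    The factor of 2 covers the two cached tensors (keys and values) per layer,
    each shaped [batch, kv_heads, seq_len, head_dim].
    bytes_per_value=2 assumes 16-bit storage.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size * bytes_per_value

if __name__ == "__main__":
    # Hypothetical configuration, chosen only to make the arithmetic easy to follow.
    config = dict(n_layers=32, n_kv_heads=8, head_dim=128)
    for tokens in (8_192, 131_072, 1_048_576):
        gib = kv_cache_bytes(seq_len=tokens, **config) / 2**30
        print(f"{tokens:>9} tokens of context: about {gib:g} GiB of KV cache")
```

With these assumed dimensions each token of context costs 128 KiB of cache, so memory grows linearly with the amount of context held active: 1 GiB at 8,192 tokens and 128 GiB at 1,048,576.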