MIT Called It a Disenchanted Intern. METR Says Check the Growth Rate.

Something happened this week that I keep turning over.

MIT published findings this month showing that when 41 AI models were tested across more than 11,000 real workplace tasks, the result resembled, in the researchers’ words, a “disenchanted intern”: the models hit minimum quality benchmarks about 65% of the time but never exceeded a 50% success rate on tasks requiring genuinely superior output. If you work in software, marketing, legal services, or knowledge work of any kind, that’s the snapshot.

METR, a nonprofit focused on measuring AI capabilities, published a different kind of snapshot. Their metric is the “time horizon”: the longest autonomous task a frontier AI can reliably complete. In 2019, the best AI could handle a task lasting only a few seconds without human intervention. By the end of 2025, that had grown to roughly an hour. The doubling time across that whole period: around seven months.
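
A quick sanity check on that rate. This is a sketch with rounded endpoints; the exact 2019 and 2025 horizons below are assumptions for illustration, not METR’s precise measurements.

```python
from math import log2

# Rounded endpoints from the paragraph above; both values are
# assumptions for illustration, not METR's exact measurements.
horizon_2019_s = 3            # a few seconds of autonomous work in 2019
horizon_2025_s = 60 * 60      # roughly an hour by late 2025
months_elapsed = 12 * 6.5     # early 2019 to late 2025

doublings = log2(horizon_2025_s / horizon_2019_s)
print(f"{doublings:.1f} doublings over {months_elapsed:.0f} months")
print(f"implied doubling time: {months_elapsed / doublings:.1f} months")
# -> 10.2 doublings at roughly 7.6 months each, in the ballpark
#    of METR's headline figure of around seven months
```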

METR’s January 2026 update shortened that estimate further: post-2023, the best-fit doubling period is now 130 days, closer to four months.

My read on this:

The MIT study and the METR data aren’t in conflict. They’re measuring different things at different timescales. MIT is taking a photograph. METR is measuring the shutter speed. And the shutter speed is getting faster.

I don’t think the “disenchanted intern” framing is wrong — it describes today accurately. What I’m less sure about is the assumption, implicit in most of the coverage I’ve read this week, that “today” is a stable state. An intern who gets twice as capable every four months is not the same resource at the end of the year as they are today.

What I keep returning to is the gap between the current snapshot and the trajectory, and the opportunity that opens up in that gap. Anyone building workflows, designing teams, or structuring how they work around AI capability today is working from a reference point that will be measurably out of date within a single planning cycle. That’s an opportunity signal at a scale and pace most planning assumptions don’t account for.
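
To put rough numbers on “out of date within a single planning cycle”: a minimal projection, assuming the post-2023 cadence of a 130-day doubling simply holds, which is exactly the assumption worth questioning.

```python
# Project the ~1 hour snapshot forward, assuming the 130-day doubling
# cadence holds for the whole window (an assumption, not a forecast).
horizon_hours = 1.0
doubling_days = 130.0

for months in (6, 12, 24):
    factor = 2 ** (months * 30.4 / doubling_days)   # ~30.4 days/month
    print(f"after {months:2d} months: ~{horizon_hours * factor:5.1f} h "
          f"({factor:4.1f}x today's snapshot)")
# -> ~2.6x after six months, ~7x after a year, ~49x after two years
```

Even if the cadence relaxes back to the longer seven-month historical average, a year-old capability assumption is still off by roughly 3x.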

Three things I’m watching:

1. Where the doubling curve hits friction. Every exponential eventually meets a wall: physical limits, data constraints, regulatory friction. METR’s time-horizon metric is useful precisely because it measures real-world task completion, not synthetic benchmark scores. When the doubling cadence breaks, that will be the signal that the curve has met something real; a toy illustration of what that break looks like follows this list. I expect it to happen. I just don’t know when.

2. Whether “minimally sufficient” is enough. MIT’s 65% minimally-sufficient rate sounds modest. But most enterprise workflows run on people who are minimally sufficient most of the time. The threshold isn’t excellence; it’s “acceptable at scale, around the clock, at near-zero marginal cost.” That bar is lower than it sounds, and closer than the headline number implies; a back-of-envelope cost sketch follows this list.

3. The infrastructure spend as an access unlock. Alphabet, Meta, Microsoft, and Amazon are projected to spend nearly $700 billion combined on AI infrastructure in 2026 — roughly double what they spent last year. That capital isn’t just building capacity for the current snapshot. It’s funding the cost compression that makes the next several capability doublings broadly accessible. When the infrastructure matures, the cost floor drops — and the surface area for building on top of it expands with it.
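
On the first point: the doubling cadence is a cheap detector for the wall, because an exponential bending toward a ceiling looks exponential right up until it doesn’t. A toy illustration with invented numbers follows; if capability actually traces a logistic curve, the interval between successive doublings stretches before the ceiling itself becomes visible.

```python
from math import exp

# Toy logistic capability curve: time horizon in minutes, with an
# invented ceiling of 10,000 minutes and an early ~4-month doubling.
def horizon(t_months, ceiling=10_000.0, rate=0.173, midpoint=48.0):
    return ceiling / (1.0 + exp(-rate * (t_months - midpoint)))

def months_to_reach(target, t=0.0, step=0.01):
    # Walk forward in small steps until the curve crosses the target.
    while horizon(t) < target:
        t += step
    return t

level, prev_t = horizon(0.0) * 2, 0.0
while level < 9_000:                      # stop short of the ceiling
    t = months_to_reach(level)
    print(f"doubling to {level:7.0f} min took {t - prev_t:4.1f} months")
    prev_t, level = t, level * 2
# Early doublings take ~4.0 months; the last few stretch past 5 and
# then 6 months as the curve bends. The stretch is the early signal.
```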
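
On the second point, the “acceptable at scale” bar turns into simple arithmetic once you attach costs. Every number below is invented for illustration; the only figure taken from the study is the 65% rate. Assume misses are caught and redone by a human, and ignore review overhead on successes.

```python
# Back-of-envelope: all numbers except the 65% rate are invented.
human_cost_per_task = 20.00    # assumed fully loaded human cost
model_cost_per_task = 0.50     # assumed inference cost per attempt
success_rate = 0.65            # MIT's minimally-sufficient rate

# Every task goes through the model first; the 35% of misses are
# redone by a human at full cost.
blended = model_cost_per_task + (1 - success_rate) * human_cost_per_task
print(f"blended: ${blended:.2f}/task vs ${human_cost_per_task:.2f} all-human")
# -> $7.50 vs $20.00. The bar isn't "better than the best person";
#    it's "cheap enough that 65% acceptable still wins".
```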

The disenchanted intern framing is apt today. My expectation is that it’s a better description of 2025 than it is of 2027.

— Andreas