The pilot-to-production gap is an execution problem, not a model problem

What was announced

During the week of February 9–15, 2026, the enterprise AI deployment story sharpened around a paradox: 95% of generative AI pilots still fail to reach production, yet 42% of enterprises now run agentic AI in production and 72% have agentic systems live in production or pilot. Microsoft’s February enterprise update reframed Copilot from “assistant” to “governance-first agent” capable of completing entire workflows. Oracle introduced Fusion Agentic Applications for finance, supply chain, and HR. OutSystems research released the same week reported that 94% of enterprises adopting agentic AI now flag agent sprawl as a primary concern.

What it means

The two statistics are not in conflict. They describe two different populations of organizations. The 95% pilot-failure figure describes how the average enterprise treats generative AI: a proof-of-concept budget, a small team, and a handoff to operations that never happens. The 42% in-production figure describes a smaller cohort that has done the operational work: governance, identity, runtime monitoring, rollback procedures, and explicit ownership of the agent fleet. The gap between the two cohorts is not technical. It is procedural.

Microsoft’s “governance-first agent” framing acknowledges this directly. The next phase of enterprise AI is not better models. It is the operating discipline around models — who deploys them, who owns them when they misbehave, who pays for the inference, and how the organization rolls back a bad agent without disrupting downstream work. That is a CIO problem, not a CTO problem.

Andreas’s view

My read on this: the production cohort is pulling away from the pilot cohort, and the gap is widening every quarter. The companies in production are climbing an operational learning curve: what governance looks like, how to staff agent operations, how to track agent behavior in production, and how to compose agents into workflows without losing accountability. The companies still iterating on pilots are learning about prompts and demos. Those are different skill sets, and they compound at different rates.

I don’t think the next 12 months will reward the companies that pick the best model. They will reward the companies that have figured out how to operate any reasonable model at production scale, with controls, with monitoring, and with an explicit chain of accountability when an agent does the wrong thing. Agent sprawl is the leading indicator that the operations layer is missing. When 94% of practitioners flag it as a top concern, the conversation has moved past whether agents work and on to whether they are manageable.

The way I see it: the clearest signal a board can get on where an organization actually stands is whether the CIO can produce an inventory of production agents, listed by name, owner, usage volume, and incident count. If the question produces a list, the organization is in the production cohort. If it produces “we are still piloting,” it is in the failure cohort, and the strategic gap to peers will be visible in operating costs by mid-2027.

Three things I’m watching

I’ll be watching whether companies can produce, on demand, a named, owned, and monitored agent inventory with rollback procedures. That capability is the clearest proxy I have for whether a real agent operating model exists.
The organizations that interest me are the ones shifting pilot evaluation from “did the demo work” to “did the agent ship to production with controls in place,” and backing that shift by defunding pilots that remain in demo mode past a fixed time-box.
The question I’d be asking myself is whether a dedicated agent-operations lead is in place, with explicit authority over the production fleet and seniority equivalent to the head of enterprise systems. Without single ownership, sprawl is the default outcome, and I expect that to show up clearly in incident and cost data over the next several quarters.

References and related signals

OpenAI: the next phase of enterprise AI
BusinessWire: Agentic AI mainstream in enterprise — 94% concerned about sprawl (OutSystems)
Computerworld: Agentic AI — ongoing coverage of its impact on the enterprise
Microsoft’s Copilot-to-governance-first-agents transition
Related signal: Anthropic’s Model Context Protocol crossed 97 million installs in March. Production-grade agent infrastructure is consolidating around a small number of standards, which weakens the operational excuse for not shipping.

Andreas Timm