Blog

  • The pilot-to-production gap is an execution problem, not a model problem

    What was announced

    During the week of February 9–15, 2026, the enterprise AI deployment story sharpened around a paradox: 95% of generative AI pilots still fail to reach production, yet 42% of enterprises now run agentic AI in production and 72% have agentic systems live in production or pilot. Microsoft’s February enterprise update reframed Copilot from “assistant” to “governance-first agent” capable of completing entire workflows. Oracle introduced Fusion Agentic Applications for finance, supply chain, and HR. OutSystems research released the same week found that 94% of enterprises adopting agentic AI now flag agent sprawl as a primary concern.

    What it means

    The two statistics are not in conflict. They describe two different populations of organizations. The 95%-pilot-failure number describes how the average enterprise treats generative AI: a proof-of-concept budget, a small team, and a handoff to operations that never happens. The 42%-in-production number describes a smaller cohort that has done the operational work — governance, identity, runtime monitoring, rollback procedures, and explicit ownership of the agent fleet. The gap between the two cohorts is not technical. It is procedural.

    Microsoft’s “governance-first agent” framing acknowledges this directly. The next phase of enterprise AI is not better models. It is the operating discipline around models — who deploys them, who owns them when they misbehave, who pays for the inference, and how the organization rolls back a bad agent without disrupting downstream work. That is a CIO problem, not a CTO problem.

    Andreas’s view

    My read on this: the production cohort is pulling away from the pilot cohort, and the gap is widening every quarter. The companies in production are accumulating an operational learning curve — what governance looks like, how to staff agent operations, how to track agent behavior in production, how to compose agents into workflows without losing accountability. The companies still iterating on pilots are accumulating learnings about prompts and demos. Those are different skill sets and they compound at different rates.

    I don’t think the next 12 months reward the companies that pick the best model. They reward the companies that figured out how to operate any reasonable model at production scale, with controls, with monitoring, and with an explicit chain of accountability when an agent does the wrong thing. Agent sprawl is the leading indicator that the operations layer is missing — when 94% of practitioners flag it as a top concern, the conversation has moved past whether agents work and onto whether they are manageable.

    The way I see it: the clearest signal a board can get on where an organization actually stands is whether the CIO can produce a production agent inventory — by name, by owner, by usage volume, by incident count. If the question produces a list, the organization is in the production cohort. If it produces “we are still piloting,” it is in the failure cohort, and the strategic gap to peers will be visible in operating costs by mid-2027.

    Three things I’m watching

    1. I’ll be watching whether companies can produce a named, owned, monitored agent inventory with rollback procedures on demand — that capability is the clearest proxy I have for whether a real agent operating model exists or not.
    2. The organizations that interest me are the ones shifting pilot evaluation from “did the demo work” to “did the agent ship to production with controls in place” — and backing that shift by defunding pilots that stay in demo mode past a fixed time-box.
    3. The question I’d be asking myself is whether a dedicated agent-operations lead — with explicit authority over the production fleet and seniority equivalent to the head of enterprise systems — is in place. Without single ownership, sprawl is the default outcome, and I expect that to show up clearly in incident and cost data over the next several quarters.
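    The inventory test above can be made concrete. A minimal sketch in Python, assuming a simple record shape; the field names and the two example agents are illustrative, not drawn from any vendor or survey:

```python
from dataclasses import dataclass

@dataclass
class AgentRecord:
    # One production agent: the fields a board-level inventory needs.
    name: str
    owner: str            # named human accountable for the agent ("" = unowned)
    monthly_calls: int    # usage volume
    incidents_90d: int    # incident count over a trailing window
    has_rollback: bool    # documented rollback procedure exists

def inventory_gaps(fleet):
    """Return the agents that fail the minimum operating bar:
    no named owner, or no rollback procedure."""
    return [a.name for a in fleet
            if not a.owner or not a.has_rollback]

fleet = [
    AgentRecord("invoice-triage", "j.doe", 12_000, 2, True),
    AgentRecord("hr-faq", "", 300, 0, False),   # unowned: a sprawl candidate
]
```

If `inventory_gaps` returns a non-empty list, that list is the sprawl problem, named.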

  • Hyperscaler 2026 capex hits ~$700B. Free cash flow is the variable that breaks.

    What was announced

    On February 6, CNBC reported that combined 2026 AI capex commitments across Amazon, Google, Microsoft, and Meta now approach $700 billion. Amazon: roughly $200 billion. Alphabet: up to $185 billion. Microsoft: up from prior 2025 levels (analyst consensus near $99 billion for FY26, which ends in June). Meta: a budgeted $115–135 billion. Approximately 75% of the spend is AI-related — call it $450 billion of AI infrastructure in a single year, up about 36% versus 2025. Free cash flow projections for the same companies show meaningful compression; Amazon is forecast to turn negative, with analyst estimates ranging from roughly –$17 billion to –$28 billion in 2026.

    What it means

    Capex of this magnitude rewrites the financial model for the entire frontier compute stack. The hyperscalers are no longer building toward a near-term revenue profile — they are building toward a 5-to-7-year usage curve they believe is coming. That is a different posture than the 2018–2022 capex cycle, which was largely demand-led. This one is conviction-led, and the conviction is asymmetric: if AI compute demand materializes at the projected rate, today’s capex looks conservative; if it lags by even 18 months, the depreciation schedule eats free cash flow at a rate the public markets have not yet priced.
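    The depreciation mechanics behind that risk can be sketched with deliberately simplified numbers: straight-line depreciation over a five-year useful life and a single year of AI capex. The figures are illustrative, not company guidance:

```python
def annual_depreciation(capex_by_year, useful_life=5):
    """Straight-line depreciation: each year's capex adds
    capex / useful_life to each of the next `useful_life` years."""
    horizon = len(capex_by_year) + useful_life
    dep = [0.0] * horizon
    for year, capex in enumerate(capex_by_year):
        for y in range(year, year + useful_life):
            dep[y] += capex / useful_life
    return dep

# Hypothetical: $450B of AI capex in year 0 alone adds $90B/yr of
# depreciation for five years, regardless of when the revenue arrives.
dep = annual_depreciation([450.0])
```

The point of the sketch: the expense schedule is fixed the moment the capex lands, so an 18-month demand lag shifts revenue right while the depreciation line stays put.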

    A second-order effect matters more for non-hyperscalers: every CIO planning AI infrastructure in 2026 is now negotiating against a supplier base whose capacity is partially already absorbed by internal hyperscaler workloads. Pricing power for capacity is structurally higher, lead times for premium GPU instances are longer, and the cost-per-token of frontier inference will move on hyperscaler margin compression rather than competition.

    Andreas’s view

    My read on this: $700 billion is not a number that resolves itself by spreadsheet logic. It resolves itself by which hyperscaler is willing to absorb the cash-flow hit longest. The strategic question inside each company is no longer “should we build” but “which competitor blinks first when the free-cash-flow line turns red on quarterly reporting.” Amazon is closest to that line. Microsoft has the strongest cash position to absorb it. Google sits in between. Meta has the most flexibility because its core ad business is funding the AI infrastructure with the lightest accounting drag.

    I don’t think the capex commitment will be revised down materially in 2026. The competitive cost of unilaterally easing off — handing GPU capacity, customer relationships, and the model-training cadence to a competitor — is too high. What will happen instead is creative financing: more debt, more partnerships with sovereign wealth and infrastructure funds, more long-term capacity contracts that move spend off the balance sheet. The capex will continue. The accounting around it will get more interesting.

    The way I see it, adjacent businesses should not assume the capacity they need will be available at the price they modeled. My expectation is that premium-tier inference and training capacity will be priced as a scarce resource for the rest of 2026 and most of 2027. Any AI roadmap that depends on flat or declining unit costs over that window has a hidden assumption built in that I think is unlikely to hold.

    Three things I’m watching

    1. I’ll be watching whether companies move to lock multi-year capacity contracts for premium inference and training now, or wait — because negotiating against scarcity in 2027 will be more expensive than over-committing modestly in 2026.
    2. The companies that preserve optionality will be the ones that have stress-tested their AI cost models against a scenario where frontier-tier compute prices are flat or rising for 18 months — and redesigned the workflow, not the budget, when the unit economics broke.
    3. Hyperscaler free-cash-flow disclosures over the next four quarters are the leading indicator I’m focused on — they will show whether the capex commitments hold or quietly compress.
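    The stress test in point 2 can be sketched as a two-scenario cost comparison. The token volumes, prices, and deflation rate below are hypothetical assumptions, not market data:

```python
def total_cost(monthly_tokens_m, price_per_m, months=18,
               monthly_price_change=0.0):
    """Cumulative inference spend over a horizon, with a monthly
    multiplicative price drift (negative = the deflation most plans assume)."""
    total, price = 0.0, price_per_m
    for _ in range(months):
        total += monthly_tokens_m * price
        price *= (1 + monthly_price_change)
    return total

# Common planning assumption: prices fall ~5% per month.
optimistic = total_cost(1000, 10.0, monthly_price_change=-0.05)
# The scarcity scenario from the text: flat pricing for 18 months.
flat = total_cost(1000, 10.0, monthly_price_change=0.0)
```

If the budget only closes under `optimistic`, the workflow design, not the budget line, is what needs to change.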

  • Davos 2026 made AI sovereignty the policy line — and the corporate one

    What was announced

    The World Economic Forum 2026 met in Davos January 19–23 with AI as the dominant agenda item. The conversation converged on three themes: risk-proportionate governance, runtime governance for multi-agent systems, and what Microsoft CEO Satya Nadella framed as “corporate AI sovereignty” — firms owning the intelligence layer that encodes their distinctive capabilities. Anthropic CEO Dario Amodei warned the forum that frontier AI is uniquely well-suited to autocracy, calling for targeted chip-export controls. A WEF press release on the same week reported leading organizations are shifting from “potential” to “performance” — measuring AI by realized output rather than pilot count.

    What it means

    The vocabulary shift is the substantive event. For two years, AI policy discussion at this forum was framed as risk management — what to restrict, what to monitor, what to ban. The 2026 framing is different. It treats AI as critical infrastructure where the governance question is who owns it, not whether it should exist. “Sovereignty” applied to AI is a deliberate echo of “data sovereignty” — a recognition that the layer of intelligence inside an organization is becoming as load-bearing as its data layer was a decade ago.

    For governments, this redirects policy from rule-writing to capability-building: domestic compute, domestic foundation models, controlled exports. For corporations, it redirects strategy from procurement to capability ownership: which models do you fine-tune yourself, which workflows encode your tacit knowledge, and which partners do you let inside the trust boundary. Both translations point to the same architectural question: where does the irreducible cognitive core of your organization live, and who can take it from you.

    Andreas’s view

    My read on this: Davos is a leading indicator of where C-suite vocabulary moves over the next 12 months. “Corporate AI sovereignty” is not a slogan — it is a framing that makes specific decisions easier to defend in a board meeting. Building your own model fine-tunes is sovereignty. Choosing not to send your customer interactions through a third-party model API is sovereignty. Maintaining a private inference cluster is sovereignty. The vocabulary justifies budgets that previously read as duplicative or paranoid.

    I don’t think the sovereignty framing is purely defensive. There is a competitive argument inside it: organizations that operate as pure consumers of frontier models are paying rent on the cognitive layer of their own business. Organizations that operate as owner-operators of a fine-tuned, workflow-embedded intelligence layer pay less rent and accumulate a moat that compounds with their data. The Davos talking points are starting to reflect that distinction.

    The way I see it, the question that matters this quarter is not “what is our AI strategy” but “what would it take to lose access to our primary model provider, and what would happen to the business if we did.” If the answer is catastrophic, the sovereignty argument is operational, not philosophical, and it has a budget implication.

    Three things I’m watching

    1. I’ll be watching whether companies run model-dependency stress tests — simulating the operational impact of losing their primary frontier-model provider for 30, 90, and 180 days. The result is the size of their sovereignty problem, and whether they even know that number tells me a lot.
    2. The companies that preserve strategic optionality will be the ones that draw a clear line between work requiring owned cognition (fine-tuned, embedded, internal) and work that can run on rented cognition (API-served frontier models) — and treat that boundary as a capital decision, not a procurement decision.
    3. I’ll be watching how the policy direction develops across major operating jurisdictions. Sovereignty framing in Davos has a consistent track record of translating into sovereignty requirements in regulated industries within 12–24 months.
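    The 30/90/180-day stress test in point 1 reduces to an exposure calculation over a workflow register. A minimal sketch; the workflow names, provider labels, and dollar figures are invented for illustration:

```python
def outage_exposure(workflows, provider, outage_days):
    """Sum the daily value-at-risk of every workflow that depends on
    `provider` and has no qualified fallback, over an outage window."""
    return sum(w["daily_value"] * outage_days
               for w in workflows
               if w["provider"] == provider and not w["fallback_ready"])

workflows = [
    {"name": "support-triage", "provider": "frontier-a",
     "daily_value": 40_000, "fallback_ready": False},
    {"name": "code-review",    "provider": "frontier-a",
     "daily_value": 15_000, "fallback_ready": True},    # can fail over
    {"name": "doc-search",     "provider": "self-hosted",
     "daily_value": 5_000,  "fallback_ready": False},
]

exposure_90d = outage_exposure(workflows, "frontier-a", 90)
```

The output is the size of the sovereignty problem in currency, which is the form a board can act on.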

  • When 88% of organizations have adopted AI, adoption stops being the question

    What was announced

    The Stanford HAI 2026 AI Index landed in mid-January with a set of numbers that close out a debate. Organizational AI adoption reached 88% globally. Global corporate AI investment more than doubled in 2025 to $581.7 billion. Generative AI hit 53% population adoption within three years — faster than the personal computer or the internet. Four out of five university students now use generative AI as part of their coursework.

    What it means

    When adoption crosses the 80% line, the question of “should we adopt” becomes structurally uninteresting. Every relevant comparison group has already answered it. What remains is differentiation — and differentiation in a world of universal access is harder, not easier, than in a world of selective access. The strategic margin moves from access to integration depth, from licenses to workflow penetration, and from procurement decisions to operating-model decisions.

    The investment number is the more telling signal. $581.7 billion of corporate AI investment in a single year is a capital allocation that prices in a specific belief: that AI capability will compound at a rate that makes today’s spending the cheap option in retrospect. That belief either turns out to be correct, in which case the laggards face a permanent gap, or it overshoots, in which case the survivors of the correction still own infrastructure and skills the laggards do not.

    Andreas’s view

    My read on this: the AI Index numbers are not a celebration of momentum, they are a notice of obsolescence. Adoption was the entry-level metric — the one that let companies say “we are doing AI” without committing to anything that mattered. With 88% adoption, that metric is exhausted. The companies that conflate “we have AI deployed” with “we have an AI strategy” will be the ones surprised in 18 months when peers with the same headline adoption rate are operating at a fundamentally different unit-economics base.

    I don’t think the next two years will be about adopting more. They will be about routing work differently — deciding which functions become AI-native, which roles get redesigned, which middle-management layers compress, and which workflows get rebuilt from the ground up rather than augmented. The companies treating this as a tooling question will keep the org chart they had in 2024 and bolt assistants onto it. The companies treating it as a structural question will redesign for AI-native operations and harvest a different cost base.

    My expectation is that boards still reporting on adoption rates are measuring the wrong thing entirely. The number that matters is the percentage of work routed through AI-native processes versus AI-augmented legacy processes. Those are two different cost structures and two different competitive positions. The first is a step change. The second is a feature.

    Three things I’m watching

    1. I’ll be watching whether companies move away from adoption KPIs toward integration-depth KPIs — specifically, the percentage of revenue-generating workflows that are AI-native, not just AI-touched.
    2. The companies that stand out to me will be the ones that build the comparison the AI Index doesn’t make for them: how their spend per FTE on AI infrastructure and tooling stacks up against the 90th-percentile peer in their sector. If that number isn’t visible to leadership, it isn’t informing strategy.
    3. I’ll be watching whether organizations use the next 12 months as a workflow-redesign window rather than a tooling-procurement window. The structural opportunity narrows the moment competitors finish their redesign.
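    The integration-depth KPI in point 1 can be computed from a revenue-weighted workflow register. A sketch with invented classification labels and weights:

```python
def integration_depth(workflows):
    """Share of revenue-weighted workflows that are AI-native,
    as opposed to merely AI-touched or legacy."""
    total = sum(w["revenue_share"] for w in workflows)
    native = sum(w["revenue_share"] for w in workflows
                 if w["status"] == "ai-native")
    return native / total

workflows = [
    {"name": "claims-intake", "status": "ai-native",  "revenue_share": 0.30},
    {"name": "underwriting",  "status": "ai-touched", "revenue_share": 0.50},
    {"name": "renewals",      "status": "legacy",     "revenue_share": 0.20},
]
```

An adoption KPI would score this register near 80% (two of three workflows touched); the integration-depth KPI scores it 30%, which is the honest number.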

  • Humanoids crossed from demo to deployment in one week

    What was announced

    At CES 2026 in Las Vegas (Jan 5–9), a cluster of robotics announcements crossed the same threshold in a single week. Boston Dynamics unveiled the production-ready electric Atlas with Hyundai committing the first fleet to its Metaplant in Savannah, Georgia, and announced a partnership with Google DeepMind to integrate Gemini Robotics models into the platform. LG demonstrated CLOiD performing real household work — laundry, dishwasher loading, food preparation — in a staged living environment. EngineAI introduced the T800 with a $25,000 starting price and mid-2026 shipping. CES listed 40 companies referencing humanoids on the show floor.

    What it means

    For three years humanoids were a category of demo videos. CES 2026 is where the category became a category of contracts. Production is committed, factories are named, prices are listed, and the foundation-model layer (Gemini Robotics, comparable initiatives at other labs) supplies the cognitive component that previously made every demo brittle. The constraint is no longer “can it walk on stage.” The constraint is “what does the deployment workflow look like, and who owns the integration.”

    From this follows a second-order effect: industrial buyers now have a real procurement question to answer in 2026 — not in 2030. Hyundai’s timeline (Atlas at Metaplant, dedicated robotics factory targeting 30,000 units per year by 2028) is the explicit benchmark. Every competing automaker, every large logistics operator, and every contract manufacturer now sits with a known reference deployment to react to.

    Andreas’s view

    My read on this: the news is not that the robots are good enough. The news is that buyers have decided they are good enough to commit — and the price has moved into range. At $25,000, a humanoid sits below the annual cost of an industrial worker in most developed markets. That shifts the question from “is this technology real” to “where does it amortize fastest.”
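    The amortization question can be put in numbers. A minimal payback sketch, using the $25,000 list price from the announcement and hypothetical operating figures for displaced labor and robot upkeep:

```python
def payback_months(unit_price, monthly_labor_displaced, monthly_opex):
    """Months until cumulative net savings cover the purchase price."""
    net_monthly = monthly_labor_displaced - monthly_opex
    if net_monthly <= 0:
        return None   # never amortizes
    months, covered = 0, 0.0
    while covered < unit_price:
        covered += net_monthly
        months += 1
    return months

# $25k unit vs. an illustrative $4,500/month fully loaded labor cost
# and $1,000/month of maintenance, power, and supervision.
m = payback_months(25_000, 4_500, 1_000)
```

Under those assumed figures the unit pays back in well under a year, which is why "where does it amortize fastest" is now a procurement question rather than a research one.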

    My three takeaways:

    1. The barrier that fell was cognitive, not mechanical. The hardware has been close to ready for years. What changed is that foundation models — think Atlas plus Gemini Robotics — absorbed the cognitive deficit that kept robots out of unstructured environments. CES 2026 looks different because the system is different, not just the chassis. I think anyone framing this as “better robots” is underestimating the speed of what comes next.

    2. The 2030 humanoid timeline is already stale. In my view, this is now a 2026 pilot conversation for any organization with manufacturing, warehousing, or fulfillment in its operations footprint — anywhere unit-level labor is the dominant cost driver. Not as a capex bet, but as a learning investment. The compounding advantage goes to whoever builds operational muscle around these systems first.

    3. The real cost of waiting isn’t hardware — it’s the operating model. Hardware will be available to everyone. What won’t be available off the shelf is three years of deployment experience. My expectation is that late movers won’t just be buying machines from competitors — they’ll be importing the playbook for how to use them.

  • The agentic year begins underprepared

    The agentic year begins underprepared

    The year opens with a measurable gap. McKinsey’s 2026 trust maturity survey, fielded in December and January, puts 23% of organizations into the scaling phase for agentic systems and 39% into experimentation. Everyone else — more than three quarters of organizations, counting the experimenters — has not yet begun scaling AI across the enterprise. The capability frontier moved twelve to eighteen months faster than the operating models around it. That gap is no longer an experimentation question. It is the year’s defining strategic risk.

    The boards that close this gap first will not be using better models than their competitors. They will be running organizations that can metabolize what the models already do. The constraint is no longer technology. It is adoption — and adoption is a leadership problem.

    The shift is structural, not cyclical

    Agentic systems are not a new feature inside a familiar product. They are a new class of worker. They take a goal, decompose it into steps, hold state across those steps, call other tools, recover from errors, and return a completed unit of work. That changes what a job is, not how a job is done.
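    That description of an agentic system (take a goal, decompose it into steps, hold state across them, call tools, recover from errors, return finished work) maps onto a small control loop. A schematic sketch, not any vendor's implementation; the planner and tools here are stand-ins:

```python
def run_agent(goal, plan, tools, max_retries=2):
    """Minimal agentic loop: decompose a goal into steps, hold state
    across them, call tools, retry on error, return the finished work."""
    state = {"goal": goal, "results": []}
    for step in plan(goal):                        # decomposition
        tool = tools[step["tool"]]
        for attempt in range(max_retries + 1):
            try:
                state["results"].append(tool(step["input"], state))
                break                              # step completed
            except Exception:
                if attempt == max_retries:
                    raise                          # escalate to a human
    return state["results"]                        # the completed unit of work

# Illustrative usage: a two-step "report" goal.
tools = {
    "fetch": lambda q, st: f"data({q})",
    "write": lambda q, st: f"draft using {st['results'][-1]}",
}
plan = lambda goal: [{"tool": "fetch", "input": goal},
                     {"tool": "write", "input": goal}]
out = run_agent("q1-report", plan, tools)
```

The contrast with a copilot is visible in the shape of the code: the human appears only at the escalation path, not inside the loop.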

    The 2025 narrative — copilots, productivity boosts, ten percent uplift — is over. The 2026 question is harder. What units of work no longer require a human originator? What units of work now require a human reviewer instead of a human executor? Which decisions can be delegated to a system that explains its reasoning? The companies asking these questions on a Monday morning are reorganizing. The companies still benchmarking model accuracy are stalling.

    The shift is one-way. No board will vote in 2027 to remove agentic systems from a workflow they reduced from forty hours to four. The architectural choices made this year will compound.

    The role change has already happened on the ground

    Inside organizations that have actually shipped agentic systems, the role redefinition is happening informally, by individual contributors, ahead of any HR process. A senior analyst who used to write three reports a week now reviews twelve agent-drafted reports a week and signs off on the analysis. A staff engineer who used to write three pull requests a day now reviews fifteen agent-generated pull requests a day. An account manager who used to draft proposals now edits proposals the agent has built from CRM context.

    The work that survives is judgment, taste, accountability, and relationship. The work that does not survive is execution under specification. Job titles still describe the second category. Job content has already shifted to the first.

    First-line managers feel this most acutely. They were trained to manage humans doing execution work. They are now managing humans doing review work, who in turn are managing systems doing execution work. That is a different management discipline — closer to portfolio management of automated processes than to people management of execution teams.

    The organizational consequence is delayering

    Span of control widens when the work below each manager becomes more automated and more reviewable. McKinsey’s parallel work on the state of organizations points in the same direction: companies that scale agentic systems also flatten by removing one to two layers of middle management. The economic logic is direct. Middle layers existed to translate strategy into execution and to coordinate the humans doing that execution. When the execution is increasingly handled by systems and the translation is increasingly handled by models, the layer is doing less.

    This is not the 2024 layoff cycle that hit individual contributors. This is a 2026 reorganization that compresses the manager-of-managers layer. It is structurally different and politically harder. The people most threatened by it are the people running the budget meetings about it.

    Organizations that resist the delayering will have a temporary cost advantage and a permanent decision-velocity disadvantage. Decision cycles compress when fewer humans need to be in the loop. The competitor who removed two layers will commit to a market move three weeks faster. Over a year, that compounds into a different market position.

    So what boards should do this quarter

    Two actions belong on the Q1 agenda. First, demand a workforce plan that names the units of work moving from human execution to human review, with a twelve-month horizon. Vague AI strategies are no longer acceptable as deliverables; the question is which jobs, which tasks, which review cadences, which accountability lines.

    Second, name an executive owner for the operating-model redesign — not for AI strategy as a separate track, but for the way the company will be organized around the systems it has already deployed. The CHRO and the COO are the natural owners. The CTO is not. The technology decision is downstream of the operating-model decision, and treating it as upstream is how organizations end up with sophisticated tools and a 2023 org chart.

    The year that just started will be measured by the gap between capability and operating model. The companies that close it first set the pace for the rest of the decade. The risk is not moving too fast. The risk is moving too late. Execution speed will separate leaders from followers.