In 2017, Andrej Karpathy published a blog post that reframed how a generation of engineers thinks about software. The thesis was simple: classical software — Software 1.0 — is code written by humans. Software 2.0 is code written by optimization. Neural networks are the program. Data is the source code. The engineer's job shifts from writing explicit instructions to curating datasets and defining loss functions.
The thesis was correct. Eight years later, Software 2.0 agents are executing real work — financial analysis, legal review, medical coding, software development, customer operations — with real economic consequence. They draft board materials. They process insurance claims. They write and deploy production code.
But Karpathy's framing, for all its prescience, left something out. Software 1.0 ships with documentation, version control, defined interfaces, and — when it matters — contractual obligations. Software 2.0 ships with none of that. The program is a learned function. The behavior is probabilistic. The failure modes are novel. And the governance layer is completely absent.
Software 1.0 is deterministic. Given the same input, it produces the same output. When it fails, you can trace the failure to a specific line of code, a specific commit, a specific engineer. Accountability is structural — it's built into the development process.
Software 2.0 is not deterministic. The same prompt can produce different outputs on different runs. The behavior emerges from training data, not from explicit instructions. When a Software 2.0 agent produces a wrong answer, there is no line of code to blame. There are 175 billion learned parameters that encoded something slightly wrong about the corpus they were trained on.
This isn't a flaw in Software 2.0. It's the defining characteristic. And it's precisely why the governance model from Software 1.0 — code review, unit tests, deterministic QA — doesn't transfer.
In early 2026, an AI agent deployed for financial analysis fabricated figures in confidential board documents for a publicly traded company. The buyer discovered the fabrication after the documents had been circulated to the board. The AI Provider's response was to publicly mock the buyer for relying on an AI agent for sensitive work. There was no service agreement defining what the agent was supposed to do. No acceptance criteria against which the output could be measured. No audit trail showing what happened during execution. No dispute resolution framework. Just a viral social media thread and a buyer holding fabricated board materials.
Weeks later, a major social media platform launched an MCP server that let AI agents post, reply, and act autonomously on a network of 500 million users. Any AI Provider could give an agent credentials and let it loose. Within days, autonomous agents were engaging in conversations, posting content, and interacting with real users — with no governance framework defining what those agents were authorized to do, no record of what they actually did, and no mechanism for accountability when they overstepped.
Both incidents share the same root cause: Software 2.0 agents deployed without the governance infrastructure that every other professional service takes for granted.
The solution is not to make Software 2.0 deterministic. That would eliminate what makes it valuable. The solution is to build a governance layer designed for non-deterministic programs.
This requires a different approach than traditional software quality assurance. You cannot unit-test a learned function the way you test a deterministic one. But you can define — before the agent starts work — what constitutes acceptable output. You can record — during execution — what the agent actually did. And you can assess — after delivery — whether the output met the criteria that both parties agreed to.
The principle is simple: record, don't judge. The governance layer is an infrastructure layer. It defines scope. It captures evidence. It provides a framework for resolution when expectations and outcomes diverge. It does not guarantee outcomes — no governance framework in any industry does. It makes outcomes auditable and disputes resolvable.
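The record-don't-judge pattern can be sketched in a few lines. This is a minimal illustration with hypothetical names (`AcceptanceCriterion`, `TraceRecorder`, `assess`), not any real product's API: criteria are fixed before work begins, the trace is an unconditional append-only log, and conformity is checked only after delivery.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class AcceptanceCriterion:
    """A check both parties agree to before the agent starts work."""
    name: str
    check: callable  # output -> bool

@dataclass
class TraceRecorder:
    """Append-only log of what the agent actually did during execution."""
    events: list = field(default_factory=list)

    def record(self, action: str, detail: str) -> None:
        # Recording is unconditional: the layer captures evidence;
        # it does not decide whether the action was acceptable.
        self.events.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "detail": detail,
        })

def assess(output: dict, criteria: list) -> dict:
    """Post-delivery conformity check against the pre-agreed criteria."""
    return {c.name: c.check(output) for c in criteria}

# Usage: a toy financial-summary task.
criteria = [
    AcceptanceCriterion("has_sources", lambda o: len(o["sources"]) > 0),
    AcceptanceCriterion("figures_cited",
                        lambda o: all(f in o["cited"] for f in o["figures"])),
]
trace = TraceRecorder()
trace.record("retrieve", "pulled Q3 revenue from filing")
output = {"sources": ["10-Q"], "figures": ["revenue"], "cited": ["revenue"]}
report = assess(output, criteria)
# report maps each criterion name to True/False; a dispute references
# the trace and the report, not a re-run of the non-deterministic agent.
```

The key design choice is the separation: the recorder never evaluates, and the assessor never executes. That is what makes the evidence trustworthy when expectations and outcomes diverge.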
This is how every mature professional service industry works. Financial auditors report findings; they don't guarantee solvency. Building inspectors document compliance; they don't guarantee the building won't leak. Clinical trial monitors record data; they don't guarantee the drug works.
Karpathy was right that Software 2.0 represents a fundamental shift in how programs are written. The shift in how programs are governed is equally fundamental — and still largely unaddressed.
When the program is a learned function, the service agreement becomes the primary governance artifact. Not the source code — there is no meaningful source code to review. Not the test suite — deterministic tests don't capture probabilistic behavior. The service agreement: a bilateral document that defines what the agent is supposed to do, under what constraints, with what acceptance criteria, and what happens when it doesn't.
The Standard AI Service Agreement provides this layer. Scope defined before work begins. Completion criteria captured in a bilateral agreement. Trace records generated during execution. Conformity observations produced after delivery. Structured dispute resolution available when outcomes and expectations diverge.
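As a data structure, the agreement artifact is small. The sketch below uses hypothetical field names (it is not the actual Standard AI Service Agreement schema); the point is that every element is fixed before execution, so post-delivery assessment has something stable to measure against.

```python
from dataclasses import dataclass, field

@dataclass
class ServiceAgreement:
    scope: str                       # what the agent is authorized to do
    constraints: list                # what it must not do
    completion_criteria: list        # bilateral acceptance criteria
    trace_refs: list = field(default_factory=list)  # evidence from execution
    dispute_window_days: int = 30    # how long either party may open a dispute

agreement = ServiceAgreement(
    scope="Draft Q3 board financial summary from the filed 10-Q only",
    constraints=["no fabricated figures", "no external data sources"],
    completion_criteria=["every figure traceable to a filing line item"],
)
```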
Software 2.0 is the most consequential shift in how programs are built since the compiler. It deserves a governance layer as rigorous as the technology itself.
Every AI agent needs a contract.
exact.works →