A user emailed an AI platform requesting a refund because, over an 8+ hour session, their AI agent had produced repeated material errors in confidential board documents — incorrect financial figures, fabricated data, internal contradictions, wrong calculations. The user had spent hours personally identifying and correcting each one.
The founder posted the email publicly. It reached 456,000 views. The internet mostly dunked on the user.
We think that's a mistake.
Here's what the pile-on missed: the user's complaint is structurally sound.
Not because AI agents should be perfect. They aren't, and no serious practitioner expects them to be. Hallucination is a known, documented, managed risk in AI-assisted work.
The complaint is sound because there was no framework for accountability.
No acceptance criteria defined before the session started. No quality pipeline running against those criteria. No audit trail. No structured way to say 'this is what we agreed the agent would produce, here is what it actually produced, and here is the gap.' Just a token bill, a hope, and — when things went wrong — a public dunking.
In every other professional services context, that framework exists. We call it a contract.
When you hire a law firm, there's an engagement letter. When you hire a developer, there's a statement of work. When you hire an accountant, there's an engagement agreement with defined scope and deliverables.
These documents don't exist because professionals are untrustworthy. They exist because complex work on sensitive matters requires a shared, documented understanding of what success looks like — and a structured mechanism for resolving disputes when it doesn't arrive.
AI agents performing complex work on sensitive matters need exactly the same infrastructure. Right now, almost none of them have it.
This situation will repeat itself thousands of times this year. AI agents are being deployed on increasingly high-stakes work — financial documents, legal analysis, due diligence, board materials — with no underlying accountability framework.
When something goes wrong, there are two options: absorb the loss silently, or argue about it publicly with no evidence either way. Neither is acceptable at enterprise scale.
The exact.works model is different. It comes down to four parts, sketched in code after the list:
Exact before you run. A binding service agreement is Exacted into a Paper before the agent starts work, with completion criteria defined upfront. Both parties — the buyer and the AI provider — know exactly what success looks like before a single token is consumed.
Report, don't guarantee. We're not the agent. We don't guarantee agent performance. What we do is run an independent quality pipeline that reports objectively against the defined criteria. Think Deloitte, not your CFO. A home inspector, not the builder.
Certify the output. Every completed engagement produces a documented record of what was produced, what criteria it was evaluated against, and whether it passed.
Resolve with evidence. If a dispute arises, it's resolved against an evidence trail — not a he-said-she-said email chain sent to a founder who posts it publicly.
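To make that concrete, here is a minimal, hypothetical sketch of the shape of the model above. None of these names (Paper, Criterion, run_quality_pipeline, CertificationRecord) are exact.works' actual API or product; they only illustrate the idea: criteria fixed before the agent runs, an independent check against those criteria, and a timestamped record that can settle a dispute with evidence instead of screenshots.

```python
# Hypothetical sketch only. These names and checks are illustrative,
# not the exact.works API or its real evaluation logic.
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable

@dataclass
class Criterion:
    """One acceptance criterion, agreed before the agent runs."""
    name: str
    check: Callable[[str], bool]  # evaluates the agent's output

@dataclass
class Paper:
    """The 'contract': scope plus completion criteria, fixed upfront."""
    scope: str
    criteria: list[Criterion]

@dataclass
class CertificationRecord:
    """Audit-trail entry: what was evaluated, against what, and the result."""
    scope: str
    results: dict[str, bool]
    passed: bool
    evaluated_at: str

def run_quality_pipeline(paper: Paper, agent_output: str) -> CertificationRecord:
    """Independent evaluation: reports against the agreed criteria, no guarantees."""
    results = {c.name: c.check(agent_output) for c in paper.criteria}
    return CertificationRecord(
        scope=paper.scope,
        results=results,
        passed=all(results.values()),
        evaluated_at=datetime.now(timezone.utc).isoformat(),
    )

# Usage: buyer and provider agree on the Paper before a single token is spent.
paper = Paper(
    scope="Q3 board deck: financial summary",
    criteria=[
        Criterion("cites_source_for_every_figure", lambda out: "[source:" in out),
        Criterion("no_placeholder_text", lambda out: "TBD" not in out),
    ],
)
record = run_quality_pipeline(paper, "Revenue grew 12% [source: ledger]. Costs: TBD")
print(record.passed, record.results)  # False, with per-criterion evidence
```

The point of the sketch is the ordering, not the code: the criteria exist before the work starts, and the certification record is just a function of the agreed contract and the delivered output, which is what makes it usable as evidence later.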
This isn't a story about a bad user or a bad AI. It's a story about a missing layer.
The AI industry has invested heavily in capability — faster agents, smarter models, longer context windows, better tool use. The trust infrastructure that should sit underneath all of that — contracts, quality gates, dispute resolution, audit trails — has been largely ignored.
That gap is closing. And it matters enormously, because the work AI agents are being trusted to do is only getting more sensitive.
Every AI agent needs a contract. That's not a tagline. It's a design requirement.
Every AI agent needs a contract.
exact.works →