On April 2, Microsoft released the Agent Governance Toolkit — a seven-package open-source system for governing AI agent behavior at runtime. It intercepts tool calls. It enforces policy rules. It scores behavioral trust with cryptographic identity. It maps agent behavior to the EU AI Act, HIPAA, SOC 2, and all ten OWASP Agentic AI Security risks.
It's serious infrastructure. It is not a service agreement.
Microsoft's README says it clearly: AGT "is not a model safety or prompt guardrails tool." It "governs agent actions at the application layer." What that means in practice: AGT watches what your agent does and stops it from doing things it shouldn't. That's the firewall. It's necessary. It's not sufficient.
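What that looks like in code, reduced to its smallest form (hypothetical names, not AGT's actual API): every tool call passes through a check against the agent's declared scope before it executes. The check is on the action itself, not on the prompt that produced it.

```python
# Minimal sketch of application-layer tool-call interception. Names are
# hypothetical, not AGT's actual API; the point is that the check happens on
# the action itself, not on the prompt that produced it.
from dataclasses import dataclass, field


@dataclass
class AgentPolicy:
    agent_id: str
    allowed_tools: set[str] = field(default_factory=set)


class PolicyViolation(Exception):
    """Raised when an agent attempts an action outside its declared scope."""


def governed_call(policy: AgentPolicy, tool_name: str, tool_fn, *args, **kwargs):
    """Intercept a tool call and enforce the declared scope before executing it."""
    if tool_name not in policy.allowed_tools:
        raise PolicyViolation(
            f"{policy.agent_id} attempted '{tool_name}' outside declared scope"
        )
    return tool_fn(*args, **kwargs)


# The agent was hired to read and summarize documents, not to move money.
policy = AgentPolicy("contract-reviewer-01", {"read_document", "summarize"})
governed_call(policy, "read_document", lambda path: f"contents of {path}", "nda.txt")
# governed_call(policy, "send_wire", wire_fn, ...)  # would raise PolicyViolation
```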
Here's what the firewall can't do.
Suppose AGT catches something. Your agent made a tool call outside its declared scope. The behavioral trust score dropped. The kill switch fired. The session is suspended.
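In code, that sequence looks roughly like this (invented names and thresholds, not AGT's):

```python
# Illustrative sketch of the behavioral-trust / kill-switch sequence above.
# The scoring model and thresholds are invented for the example.
class Session:
    def __init__(self, agent_id: str, trust: float = 1.0, floor: float = 0.5):
        self.agent_id = agent_id
        self.trust = trust      # behavioral trust score
        self.floor = floor      # below this, the kill switch fires
        self.suspended = False

    def record_deviation(self, penalty: float = 0.3) -> None:
        """Lower the trust score; suspend the session if it falls below the floor."""
        self.trust = max(0.0, self.trust - penalty)
        if self.trust < self.floor:
            self.suspended = True  # the runtime layer stops here


session = Session("contract-reviewer-01")
session.record_deviation()   # trust drops to 0.7, still running
session.record_deviation()   # trust drops to 0.4, kill switch fires
assert session.suspended
```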
Now what?
Who is liable for the work the agent didn't complete? If the agent had already consumed compute budget before suspension, who absorbs that cost? If the deviation caused downstream harm — a financial decision made on corrupted data, a contract reviewed by a compromised agent — how does the Buyer pursue relief? How does the AI Provider defend their methodology without exposing proprietary training data? How does this get resolved before the EU AI Act's 72-hour incident reporting deadline?
AGT answers none of these questions. It can't. They're not runtime questions. They're contractual questions.
The firewall tells you the agent deviated. The service agreement tells you what deviation means for everyone involved.
There's a financial markets parallel that makes this concrete.
SWIFT is the global messaging network for bank-to-bank transactions. It's remarkable infrastructure: it carries authenticated payment instructions for trillions of dollars in transactions every day, the messaging layer the world's settlement rails depend on. SWIFT makes transactions possible.
ISDA is the International Swaps and Derivatives Association. It publishes the Master Agreement that governs what happens when a derivatives transaction goes wrong — who owes what, how disputes are resolved, what triggers a close-out. ISDA makes transactions enforceable.
Nobody asks whether SWIFT competes with ISDA. They serve different layers of the same ecosystem. SWIFT routes the transaction. ISDA governs it.
AGT is SWIFT for AI agent transactions. It routes and monitors the execution. It makes the transaction possible.
exact.works is ISDA for AI agent transactions. It defines what the agent is obligated to deliver, records whether it did, and resolves disputes when it didn't.
A service agreement for an AI agent transaction isn't a standard software license. It has to answer questions that didn't exist before autonomous agents.
What did the parties actually agree to? Not what the agent was capable of — what it was specifically hired to do. Success criteria defined upfront, locked at Exact time, so neither party can move the goalposts after delivery.
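Mechanically, "locked" can be as simple as a digest taken at agreement time. A sketch, with assumed field names rather than the actual exact.works scheme:

```python
# Illustrative sketch of "locked at Exact time": the agreed success criteria
# are serialized and hashed when the engagement starts. The scheme and field
# names are assumptions, not the actual exact.works implementation.
import hashlib
import json


def digest(criteria: dict) -> str:
    return hashlib.sha256(json.dumps(criteria, sort_keys=True).encode()).hexdigest()


criteria = {
    "task": "review NDA for indemnification clauses",
    "deliverable": "clause-by-clause summary",
    "deadline_hours": 24,
}
locked_digest = digest(criteria)  # recorded at agreement time

# At delivery, either party can re-derive the digest from the criteria they
# claim were agreed. A silently edited goalpost produces a different digest.
assert digest(criteria) == locked_digest
assert digest({**criteria, "deadline_hours": 48}) != locked_digest
```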
What happened during execution? An immutable audit record — every action the agent took, every tool it called, every input it received, hash-chained so nothing can be altered after the fact. Not a log. A Trace. The difference: a log can be edited; a Trace can be verified by anyone with the hash.
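A generic hash chain is enough to show why that holds. This is a minimal sketch, not the actual Trace format:

```python
# Minimal sketch of a hash-chained record, not the actual Trace format. Each
# entry commits to the previous entry's hash, so editing any earlier action
# breaks verification of everything after it.
import hashlib
import json


def entry_hash(prev_hash: str, action: dict) -> str:
    payload = json.dumps({"prev": prev_hash, "action": action}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def append(chain: list, action: dict) -> None:
    prev = chain[-1]["hash"] if chain else "GENESIS"
    chain.append({"action": action, "hash": entry_hash(prev, action)})


def verify(chain: list) -> bool:
    """Anyone holding the chain can recompute every hash and detect tampering."""
    prev = "GENESIS"
    for entry in chain:
        if entry["hash"] != entry_hash(prev, entry["action"]):
            return False
        prev = entry["hash"]
    return True


trace = []
append(trace, {"tool": "read_document", "input": "nda.txt"})
append(trace, {"tool": "summarize", "input": "nda.txt"})
assert verify(trace)

trace[0]["action"]["input"] = "other.txt"   # tamper with the earliest entry
assert not verify(trace)                     # verification now fails
```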
Who is accountable? In an A2A world where orchestrators hire sub-agents that hire sub-agents, accountability can disappear into the delegation chain. A service agreement anchors every transaction to a verified human responsible party — what we call the Human Root. The chain of accountability cannot terminate at an agent.
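In data-structure terms the requirement is simple, shown here with hypothetical names: every delegation link records its principal, and resolving the chain has to end at a verified human.

```python
# Illustrative sketch of the Human Root requirement. The structure is an
# assumption for the example: each party records who it acts for, and
# resolving the chain must end at a verified human, never at another agent.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Party:
    party_id: str
    is_human: bool
    principal: Optional["Party"] = None  # who this party acts on behalf of


def human_root(party: Party) -> Party:
    """Walk the delegation chain; fail if it terminates at an agent."""
    current = party
    while current.principal is not None:
        current = current.principal
    if not current.is_human:
        raise ValueError("accountability chain does not terminate at a human")
    return current


buyer = Party("jane.doe", is_human=True)
orchestrator = Party("orchestrator-7", is_human=False, principal=buyer)
sub_agent = Party("research-agent-3", is_human=False, principal=orchestrator)
assert human_root(sub_agent).party_id == "jane.doe"
```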
What happens when it goes wrong? A tiered dispute mechanism — automated resolution for clear-cut cases, AI-assisted panel review for complex ones, human expert determination for high-stakes disputes, AAA arbitration as the backstop. Proportional to the stakes. Available to both parties. Not a black box.
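Routing by stakes can be sketched in a few lines; the tiers below come from the description above, the thresholds are purely illustrative:

```python
# Sketch of stake-proportional routing. The tiers come from the description
# above; the thresholds are purely illustrative.
from enum import Enum


class DisputeTier(Enum):
    AUTOMATED = "automated resolution"
    AI_PANEL = "AI-assisted panel review"
    HUMAN_EXPERT = "human expert determination"
    ARBITRATION = "AAA arbitration"


def select_tier(amount_in_dispute: float, clear_cut: bool) -> DisputeTier:
    """Route a dispute to the cheapest tier that fits its stakes and ambiguity."""
    if clear_cut:
        return DisputeTier.AUTOMATED
    if amount_in_dispute < 5_000:
        return DisputeTier.AI_PANEL
    if amount_in_dispute < 100_000:
        return DisputeTier.HUMAN_EXPERT
    return DisputeTier.ARBITRATION
```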
The firewall catches the problem. The service agreement resolves it.
AGT's Agent Compliance package is genuinely useful — it maps EU AI Act articles, HIPAA requirements, and OWASP risks to agent behavior. An enterprise buyer can see which compliance frameworks their agent addresses at runtime.
What it doesn't do: attach that compliance evidence to a binding commitment. An agent that passes AGT's EU AI Act checks is better than one that doesn't. An agent governed by a Standard AI Service Agreement has made a legally enforceable commitment to operate within those parameters — and the Trace record proves whether it did.
The distinction matters when a regulator asks for your EU AI Act Article 12 logging evidence. AGT can tell you the agent ran within policy. The SAISA and Trace can tell you what the agent was contracted to do, what it actually did, and whether the two matched — with a hash-chain that proves neither record was altered.
That's the conformity file an ISO 42001 auditor actually needs. Compliance evidence attached to an enforceable agreement, not compliance evidence floating free of one.
The organizations deploying serious AI agents in 2026 are discovering that they need three things, not one.
Model safety: defenses at the model layer — alignment, content filtering, guardrails. NemoClaw and similar tools handle this.
Runtime governance: behavioral monitoring, policy enforcement, capability gating. AGT handles this. It handles it well.
Contractual governance: the service agreement, the audit record, the dispute mechanism, the liability framework. This is the layer that makes the enterprise transaction possible at scale — not just technically, but legally and commercially.
No single tool covers all three layers. That's not a gap — it's an architecture. The organizations that understand this are building stacks, not searching for a single solution.
AGT just made the runtime layer significantly better. That's good for everyone who needs to deploy AI agents with confidence.
It's not a service agreement. You still need one.
Every AI agent needs a contract.
exact.works →