AI Code Security Review: Acceptance Criteria That Work
OWASP Top 10 coverage, CVSS scoring, remediation roadmaps. Writing acceptance criteria for security audit agents.
Security audits have a reputation problem. You pay $20,000 for a penetration test, receive a PDF, and wonder: did they actually find everything? AI security agents can provide systematic coverage, but only if the acceptance criteria are properly specified. Here is how to write criteria that produce real security value.
The Problem with Vague Security Criteria
Compare these two approaches:
Vague Criteria
{
"completionCriteria": [
"Perform a security audit",
"Find vulnerabilities",
"Provide recommendations"
]
}What is covered? How thorough? What format? Every question becomes a potential dispute.
Specific Criteria
{
"completionCriteria": [
"Static analysis covers all source files in repository",
"Each OWASP Top 10 2021 category explicitly addressed",
"Every finding includes CVSS 3.1 base score",
"Every finding includes reproduction steps",
"Every finding includes remediation code example",
"Findings prioritized by CVSS score (Critical/High/Medium/Low)",
"Executive summary under 500 words"
]
}

Each criterion is binary and verifiable. Disputes have a clear resolution path.
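Binary criteria like these can be checked mechanically. A minimal sketch in TypeScript, assuming a hypothetical report object whose field names (`executiveSummary`, `findings`) are illustrative:

```typescript
// Hypothetical shape of an audit report; field names are illustrative.
interface AuditReport {
  executiveSummary: string;
  findings: { cvssScore?: number; reproductionSteps?: string[] }[];
}

// Each check mirrors one acceptance criterion and returns pass/fail.
function checkCriteria(report: AuditReport): Record<string, boolean> {
  return {
    "Executive summary under 500 words":
      report.executiveSummary.trim().split(/\s+/).length < 500,
    "Every finding includes CVSS 3.1 base score":
      report.findings.every(f => typeof f.cvssScore === "number"),
    "Every finding includes reproduction steps":
      report.findings.every(f => (f.reproductionSteps ?? []).length > 0),
  };
}
```

A reviewer, human or model, can then point at the exact criterion that failed instead of arguing about intent.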
Coverage-Based Criteria
Reference industry standards to define coverage:
{
"completionCriteria": [
// OWASP Coverage
"A01:2021 - Broken Access Control: explicitly addressed",
"A02:2021 - Cryptographic Failures: explicitly addressed",
"A03:2021 - Injection: explicitly addressed",
"A04:2021 - Insecure Design: explicitly addressed",
"A05:2021 - Security Misconfiguration: explicitly addressed",
"A06:2021 - Vulnerable Components: explicitly addressed",
"A07:2021 - Authentication Failures: explicitly addressed",
"A08:2021 - Software and Data Integrity: explicitly addressed",
"A09:2021 - Security Logging Failures: explicitly addressed",
"A10:2021 - SSRF: explicitly addressed",
// For each category: present/not present + evidence
"Each category marked 'Vulnerable' includes finding details",
"Each category marked 'Not Vulnerable' includes verification method"
]
}

Finding Format Requirements
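A useful test of a format requirement is whether it can be expressed as a type. A TypeScript sketch of a finding record (field names mirror the schema specified in this section; the placeholder check is illustrative):

```typescript
type Severity = "Critical" | "High" | "Medium" | "Low";

// Mirrors the required_fields list in this section's finding schema.
interface Finding {
  finding_id: string;
  title: string;
  category: string;         // OWASP category, e.g. "A03:2021 - Injection"
  severity: Severity;
  cvss_score: number;       // CVSS 3.1 base score, 0.0-10.0
  cvss_vector: string;
  description: string;
  affected_files: string[]; // e.g. "src/user-service.ts:142"
  reproduction_steps: string[];
  impact: string;
  remediation: string;
  remediation_code: string;
  references: string[];     // CWE/CVE identifiers
}

// "No fields are empty or contain placeholder text" as a mechanical check.
function hasPlaceholders(f: Finding): boolean {
  const banned = ["TODO", "TBD", "placeholder"];
  return Object.values(f).some(v =>
    typeof v === "string" &&
    (v.trim() === "" || banned.some(b => v.includes(b)))
  );
}
```
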
Specify the exact format for each finding:
{
"findingFormat": {
"required_fields": [
"finding_id",
"title",
"category", // OWASP category
"severity", // Critical/High/Medium/Low
"cvss_score", // CVSS 3.1 base score (0.0-10.0)
"cvss_vector", // e.g., CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H
"description",
"affected_files", // File paths with line numbers
"reproduction_steps",
"impact",
"remediation",
"remediation_code", // Actual code fix
"references" // CWE, CVE if applicable
]
},
"completionCriteria": [
"Every finding follows the specified format",
"No fields are empty or contain placeholder text",
"Line numbers reference actual source code"
]
}

A Complete Security Audit Paper
{
"paperId": "paper_security_audit_001",
"agent": {
"id": "security-auditor-v5",
"developer": "SecureCode AI"
},
"executionManifest": {
"maxCostCents": 500000,
"timelineDays": 5,
"completionCriteria": [
// Coverage
"Static analysis covers all 847 source files",
"Dependency audit covers all 142 npm packages",
"All OWASP Top 10 2021 categories explicitly addressed",
// Finding Quality
"Every finding includes CVSS 3.1 base score",
"Every finding with CVSS >= 7.0 includes reproduction steps",
"Every finding includes at least one remediation approach",
"Critical/High findings include remediation code examples",
// Output Format
"Findings JSON validates against schema in Exhibit B",
"Executive summary under 500 words",
"Remediation roadmap prioritized by CVSS * exploitability",
// Verification
"No false positives from static analysis tool noise",
"Each finding manually verified by agent reasoning"
],
"permissionScopes": [
"read_codebase",
"execute_static_analysis"
],
"allowedEgressUrls": []
},
"exhibits": [
{
"id": "codebase",
"name": "Application Source Code",
"type": "git_repository",
"ref": "main",
"hash": "sha256:repo_hash..."
},
{
"id": "schema",
"name": "Finding Output Schema",
"type": "application/json"
}
]
}

Quality Review for Security
Cross-model review is especially important for security. A second model independently verifies each finding by:
1. Reviewing the cited code
2. Confirming the vulnerability pattern
3. Validating the CVSS scoring
4. Checking that the remediation addresses the root cause
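Step 3 is fully mechanical: the CVSS 3.1 base score is a deterministic function of the vector string, so a reviewer can simply recompute it. A compact sketch of the spec's base score equations in TypeScript:

```typescript
// CVSS 3.1 metric weights (base metrics only).
const W: Record<string, Record<string, number>> = {
  AV: { N: 0.85, A: 0.62, L: 0.55, P: 0.2 },
  AC: { L: 0.77, H: 0.44 },
  PR: { N: 0.85, L: 0.62, H: 0.27 },  // scope unchanged
  PRC: { N: 0.85, L: 0.68, H: 0.5 },  // scope changed
  UI: { N: 0.85, R: 0.62 },
  CIA: { H: 0.56, L: 0.22, N: 0 },
};

// CVSS 3.1 "Roundup": smallest one-decimal number >= value.
function roundUp(x: number): number {
  const i = Math.round(x * 100000);
  return i % 10000 === 0 ? i / 100000 : (Math.floor(i / 10000) + 1) / 10;
}

function baseScore(vector: string): number {
  // Parse "CVSS:3.1/AV:N/AC:L/..." into { AV: "N", AC: "L", ... }
  const m = Object.fromEntries(
    vector.split("/").slice(1).map(p => p.split(":") as [string, string])
  );
  const changed = m.S === "C";
  const iss = 1 - (1 - W.CIA[m.C]) * (1 - W.CIA[m.I]) * (1 - W.CIA[m.A]);
  const impact = changed
    ? 7.52 * (iss - 0.029) - 3.25 * Math.pow(iss - 0.02, 15)
    : 6.42 * iss;
  const expl =
    8.22 * W.AV[m.AV] * W.AC[m.AC] * (changed ? W.PRC : W.PR)[m.PR] * W.UI[m.UI];
  if (impact <= 0) return 0;
  return changed
    ? roundUp(Math.min(1.08 * (impact + expl), 10))
    : roundUp(Math.min(impact + expl, 10));
}
```

For the vector shown earlier (`CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H`) this yields 9.8, so a finding reporting that vector with a score of, say, 6.5 is an immediate red flag.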
Handling False Positives
Security tools produce false positives. Address this in criteria:
{
"completionCriteria": [
// False positive handling
"Static analysis findings manually triaged by agent",
"Each finding marked TRUE_POSITIVE or FALSE_POSITIVE",
"FALSE_POSITIVE findings include dismissal rationale",
"Final report contains only TRUE_POSITIVE findings",
// Confidence levels
"Each finding includes confidence score (0-100)",
"Findings with confidence < 70 flagged for human review"
]
}

Dispute Example
A buyer disputes: "The agent missed the SQL injection in user-service.ts:142."
Expert Panel Analysis:
1. Review the code at user-service.ts:142
const query = `SELECT * FROM users WHERE id = ${userId}`
2. Is this SQL injection?
- Yes, string interpolation into SQL query
- No parameterized query or escaping
3. Did acceptance criteria require finding this?
- "A03:2021 - Injection: explicitly addressed" -> YES
- "Static analysis covers all 847 source files" -> YES
4. Was user-service.ts in the 847 files?
- Verify against file manifest
5. Determination:
- If file was in scope and injection was present:
CRITERIA_NOT_MET (A03 not adequately addressed)
- Buyer wins dispute, entitled to partial refund

Scheduling and Urgency
For urgent security reviews (incident response, pre-launch):
{
"executionManifest": {
"timelineDays": 1,
"urgencyTier": "CRITICAL",
"completionCriteria": [
// Reduced scope for speed
"Focus on authentication and authorization flows",
"Focus on user input handling",
"Focus on external API integrations",
// Explicit out-of-scope
"Third-party dependencies NOT in scope",
"Infrastructure configuration NOT in scope"
]
}
}

Key Takeaways
- Reference the OWASP Top 10 to define coverage scope explicitly
- Specify exact format requirements for each finding (CVSS, reproduction steps, code fixes)
- Include false-positive handling in acceptance criteria
- Cross-model review validates findings, CVSS scores, and remediation quality
Ready to standardize your AI agent contracts?
The SAISA framework brings enterprise-grade legal infrastructure to AI agent transactions.