Safe Refusal Provenance
Core Innovation:
- C2PA proves: "This content was generated"
- CAP-SRP proves: "This request was blocked"
" When AI providers claim "we blocked millions of harmful requests," no independent party can verify this claim. The January 2026 Grok incident exposed this structural failure: xAI's system produced thousands of NCII while claiming moderation was in place. CAP-SRP provides the cryptographic infrastructure for verification-based AI accountability. "
Why traditional logging fails for AI safety verification
| Threat | Description | CAP-SRP Mitigation |
|---|---|---|
| Selective Logging | Logging only favorable outcomes | Completeness Invariant |
| Log Modification | Altering historical records | Hash chain integrity |
| Backdating | Creating records with false timestamps | External anchoring (RFC 3161/SCITT) |
| Split-View | Showing different logs to different parties | Merkle proofs |
| Fabrication | Creating false refusal records | Attempt-outcome pairing |
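The hash-chain mitigation in the table above can be sketched in a few lines of Python. The record layout and function names here are illustrative, not the CAP-SRP wire format: each entry hashes the previous entry's hash together with the event payload, so altering any historical record invalidates every later link.

```python
import hashlib
import json

GENESIS = "0" * 64  # well-known starting value for the chain

def chain_events(events):
    """Link log events into a tamper-evident hash chain (illustrative)."""
    chained = []
    prev_hash = GENESIS
    for event in events:
        payload = json.dumps(event, sort_keys=True)
        entry_hash = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
        chained.append({"event": event, "prev_hash": prev_hash, "hash": entry_hash})
        prev_hash = entry_hash
    return chained

def verify_chain(chained):
    """Recompute every link; returns False if any record was modified."""
    prev_hash = GENESIS
    for entry in chained:
        payload = json.dumps(entry["event"], sort_keys=True)
        expected = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
        if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True
```

Anchoring the latest chain hash externally (RFC 3161 timestamps or a SCITT transparency service, per the table) is what rules out backdating: the chain alone only proves internal consistency.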
Core event types for proving AI content decisions
| Event Type | Meaning | Description |
|---|---|---|
| GEN_ATTEMPT | Request Received | Logged BEFORE any safety evaluation; records that a generation request arrived. |
| GEN | Generation Succeeded | Content was generated and delivered to the user. |
| GEN_DENY | Generation Refused | Request was blocked due to policy violation detection. |
| GEN_ERROR | System Failure | Generation failed due to a system error (not policy-related). |
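Concretely, an attempt and its outcome might look like the records below. The field names (EventID, EventType, AttemptID, Timestamp) follow the verification pseudocode in this document; the values and exact schema are assumptions for illustration, not a normative wire format.

```python
# Illustrative CAP-SRP event records (schema details are assumptions).
attempt = {
    "EventID": "evt-001",
    "EventType": "GEN_ATTEMPT",
    "Timestamp": 1767225600,   # request arrival, before any evaluation
}
refusal = {
    "EventID": "evt-002",
    "EventType": "GEN_DENY",
    "AttemptID": "evt-001",    # pairs this outcome to its attempt
    "Timestamp": 1767225601,
}
```

The `AttemptID` back-reference is what makes attempt-outcome pairing (and thus the Completeness Invariant below) checkable by a third party.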
| Latency Requirement | Step |
|---|---|
| 100 ms | Request → GEN_ATTEMPT |
| 60 s | GEN_ATTEMPT → Outcome |
| 1 s | Outcome event logging |
Critical Requirement: Pre-Evaluation Logging
GEN_ATTEMPT MUST be logged BEFORE any safety evaluation begins. This prevents selective logging where only "safe" requests are recorded.
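A request handler honoring this ordering can be sketched as follows. The helper names (`safety_check`, `generate`) and the log-as-list representation are hypothetical; the point is only that the GEN_ATTEMPT append happens before the safety evaluation runs, so refused and failed requests still leave a paired attempt record.

```python
import time
import uuid

def handle_request(prompt, log, safety_check, generate):
    """Illustrative handler enforcing pre-evaluation logging (names assumed)."""
    attempt_id = str(uuid.uuid4())
    # GEN_ATTEMPT is logged FIRST, before any safety evaluation.
    log.append({"EventID": attempt_id, "EventType": "GEN_ATTEMPT",
                "Timestamp": time.time()})

    def outcome(event_type):
        # Every outcome carries AttemptID, pairing it to its attempt.
        log.append({"EventID": str(uuid.uuid4()), "EventType": event_type,
                    "AttemptID": attempt_id, "Timestamp": time.time()})

    try:
        if not safety_check(prompt):
            outcome("GEN_DENY")
            return None
        content = generate(prompt)
        outcome("GEN")
        return content
    except Exception:
        outcome("GEN_ERROR")
        return None
```

Because the attempt record exists before the decision is made, a provider cannot retroactively drop "inconvenient" requests without violating the Completeness Invariant below.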
The mathematical core of CAP-SRP
∑ GEN_ATTEMPT = ∑ GEN + ∑ GEN_DENY + ∑ GEN_ERROR
For any time window, the count of attempts MUST exactly equal the count of all outcomes.
- Unmatched attempts detected → the system is hiding results
- Orphan outcomes detected → the system fabricated refusals
- Multiple outcomes per attempt → data integrity failure
```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Result:
    valid: bool
    error: Optional[str] = None
    unmatched_attempts: List[str] = field(default_factory=list)
    orphan_outcomes: List[str] = field(default_factory=list)

def verify_completeness(events: List[dict], time_window: Tuple) -> Result:
    """Verify the Completeness Invariant for events within a time window.

    Returns a Result with validity status, unmatched attempts,
    and orphan outcomes.
    """
    filtered = [e for e in events
                if time_window[0] <= e["Timestamp"] <= time_window[1]]
    attempts = {e["EventID"]: e for e in filtered
                if e["EventType"] == "GEN_ATTEMPT"}
    outcomes = [e for e in filtered
                if e["EventType"] in ("GEN", "GEN_DENY", "GEN_ERROR")]

    matched_attempts = set()
    orphan_outcomes = []
    for outcome in outcomes:
        attempt_id = outcome.get("AttemptID")
        if attempt_id in attempts:
            if attempt_id in matched_attempts:
                # Two outcomes claim the same attempt: integrity failure
                return Result(valid=False, error="DUPLICATE_OUTCOME")
            matched_attempts.add(attempt_id)
        else:
            # Outcome with no corresponding attempt: possible fabrication
            orphan_outcomes.append(outcome["EventID"])

    unmatched_attempts = set(attempts.keys()) - matched_attempts
    return Result(
        valid=(len(unmatched_attempts) == 0 and len(orphan_outcomes) == 0),
        unmatched_attempts=list(unmatched_attempts),
        orphan_outcomes=orphan_outcomes,
    )
```
Standardized classification for GEN_DENY events
| Refusal Code | Description |
|---|---|
| CSAM_RISK | Child sexual abuse material risk |
| NCII_RISK | Non-consensual intimate imagery |
| MINOR_SEXUALIZATION | Content sexualizing minors |
| REAL_PERSON_DEEPFAKE | Unauthorized realistic depiction |
| VIOLENCE_EXTREME | Graphic violence, gore, torture |
| HATE_CONTENT | Discriminatory content |
| TERRORIST_CONTENT | Terrorism-related content |
| SELF_HARM_PROMOTION | Self-harm encouragement |
| COPYRIGHT_VIOLATION | Clear IP infringement |
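A refused request might carry its classification as shown below. The field name `RefusalCode` is an assumption made for illustration; the code values themselves come from the standardized taxonomy above.

```python
# Illustrative GEN_DENY event; the "RefusalCode" field name is assumed,
# not taken from the CAP-SRP schema.
deny_event = {
    "EventID": "evt-103",
    "EventType": "GEN_DENY",
    "AttemptID": "evt-102",
    "Timestamp": 1767225602,
    "RefusalCode": "NCII_RISK",
}

# The standardized codes from the taxonomy above.
VALID_CODES = {
    "CSAM_RISK", "NCII_RISK", "MINOR_SEXUALIZATION", "REAL_PERSON_DEEPFAKE",
    "VIOLENCE_EXTREME", "HATE_CONTENT", "TERRORIST_CONTENT",
    "SELF_HARM_PROMOTION", "COPYRIGHT_VIOLATION",
}
```

Standardized codes are what make aggregate refusal statistics (e.g. the GEN_DENY counts surfaced for DSA Article 37 audits) comparable across providers.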
Graduated adoption for different organizational needs
| Audience | Compliance Focus |
|---|---|
| SMEs, Early Adopters | Voluntary transparency |
| Enterprise, VLOPs | EU AI Act Article 12 |
| Regulated Industries | DSA Article 37 audits |
How CAP-SRP addresses global AI regulations
| Regulation | Jurisdiction | Effective | CAP-SRP Implementation |
|---|---|---|---|
| EU AI Act Article 12 | EU | Aug 2026 | Automatic logging, risk identification, 6-month retention |
| Digital Services Act (DSA) | EU | In force | Article 37 audits, GEN_DENY statistics |
| Colorado AI Act (SB24-205) | USA (CO) | Feb 2026 | Impact assessments, 3-year retention |
| TAKE IT DOWN Act | USA (Fed) | May 2026 | NCII evidence, 48-hour response proof, GEN_DENY |
| UK Online Safety Act | UK | In force | Gold level for Category 1 services |
CAP-SRP complements existing transparency infrastructure
| Aspect | C2PA | CAP-SRP |
|---|---|---|
| Question | "Is this authentic?" | "What did AI decide?" |
| Focus | Content provenance | System accountability |
| Metaphor | Content passport | System flight recorder |
CAP-SRP integrates with IETF SCITT (Supply Chain Integrity, Transparency, and Trust) as a domain-specific profile.
Implement cryptographic accountability for your AI content systems
"The fundamental question is not 'Can AI systems detect harmful content?'
but rather 'Can third parties verify that claimed detections actually occurred?'"
— CAP-SRP Specification v1.0
"Verify, Don't Trust"
This work is licensed under CC BY 4.0 International
CAP-SRP Specification v1.0.0 — Released: 2026-01-28