VSO-SCORE-001 | Version 1.0 | December 2025

AI Decision Auditability Benchmark v1.0

Aligned with VeritasChain Protocol (VCP)

Measure Algorithmic Trading Transparency with a Common Score

A vendor-neutral, standards-aligned benchmark for evaluating auditability — usable with or without VCP implementations.

  • Quantifies third-party, independent verifiability
  • 10 criteria covering tamper detection, sequence fixation, and decision provenance
  • Includes an Evidence Pack template for audit submission

Why Now

The current state of algorithmic trading auditability

Black-Box Decisions

AI-driven trading decisions are opaque. When regulators ask "why?", there's no auditable answer.

Logs Exist, But...

Logs are recorded, but authenticity and sequence cannot be proven. Timestamps can be disputed.

Evidence Quality Gap

Audits fail at the final stage: evidence quality. Manual gathering takes days, and formats are inconsistent.

Reference Implementation

Auditability Benchmark — Reference Implementation

A local-only, audit-safe reference implementation for running the AI Decision Auditability Benchmark and exporting regulator-ready evidence.

VAP Scorecard Explorer is a reference implementation of the AI Decision Auditability Benchmark (10 criteria / 20 points).

It allows audit and assurance teams to:

  • Run a consistent, repeatable assessment
  • Record scoring rationale and evidence notes
  • Export an Audit Evidence Pack (ZIP / PDF)

Privacy & Security

All processing runs locally. No network communication. No external APIs. No analytics.


The benchmark specification and scoring criteria are published openly as the canonical reference.

What It Is

A diagnostic score, not an implementation proposal

Purpose

This is not a technology adoption proposal.

This benchmark enables organizations to diagnose their auditability against an industry-standard measure. Results serve directly as quality evidence for external audits and regulatory compliance.

  • Self-assessment tool for internal teams
  • Third-party evaluation framework
  • Vendor-neutral, technology-agnostic

Note: This benchmark does not provide certification or endorsement. It offers an independent, evidence-based assessment framework.

Maximum Points: 20
Criteria: 10
Points each: 0 / 1 / 2
PoC Time: ~3 hours

10 Evaluation Criteria

Ordered by audit relevance: evidence-centric criteria first, technical implementation details later.

#1 Third-Party Verifiability (0 / 1 / 2)

"Can an external party independently verify the audit trail?"

0: No external verification possible
1: Partial verification with vendor assistance
2: Full independent verification using standard tools

#2 Tamper Evidence (0 / 1 / 2)

"Can unauthorized modifications be detected?"

0: No tamper detection; silent modification possible
1: Basic checksums with gaps
2: Cryptographic integrity (hash chains, Merkle trees)
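
For illustration, a minimal hash-chain sketch in Python showing how a score-2 system detects a modified field. The record structure and genesis value are illustrative, not mandated by the benchmark:

    import hashlib
    import json

    def record_hash(record: dict, prev_hash: str) -> str:
        """Hash the canonical JSON of a record together with the previous hash."""
        payload = json.dumps(record, sort_keys=True) + prev_hash
        return hashlib.sha256(payload.encode()).hexdigest()

    def build_chain(records: list) -> list:
        """Return the hash chain for an ordered list of records."""
        hashes, prev = [], "0" * 64  # illustrative genesis value
        for rec in records:
            prev = record_hash(rec, prev)
            hashes.append(prev)
        return hashes

    def verify_chain(records: list, hashes: list):
        """Return the index of the first tampered record, or None if intact."""
        prev = "0" * 64
        for i, rec in enumerate(records):
            prev = record_hash(rec, prev)
            if prev != hashes[i]:
                return i
        return None

    records = [{"seq": 1, "event": "DECISION"}, {"seq": 2, "event": "ORDER"}]
    chain = build_chain(records)
    records[0]["event"] = "CANCELLED"          # tamper with history
    assert verify_chain(records, chain) == 0   # modification detected at index 0

Because each hash folds in its predecessor, a single changed field invalidates every subsequent link, and the first mismatch pinpoints the modification location.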

#3 Sequence Fixation (0 / 1 / 2)

"Is Decision → Order → Execution order immutable?"

0: Events can be reordered post-hoc
1: Timestamps exist but no cryptographic binding
2: Monotonic sequencing with cryptographic linkage
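
A sketch of score-2 sequence fixation under the same assumptions: each event embeds its predecessor's hash plus a monotonic sequence number, so a backdated insertion breaks verification. Field names such as `prev_hash` are hypothetical:

    import hashlib
    import json

    GENESIS = "0" * 64

    def link(event: dict, prev_hash: str) -> dict:
        """Bind an event to its predecessor by embedding the previous hash."""
        event = {**event, "prev_hash": prev_hash}
        body = json.dumps(event, sort_keys=True).encode()
        return {**event, "hash": hashlib.sha256(body).hexdigest()}

    def verify_order(chain: list) -> bool:
        """Check both cryptographic linkage and monotonic sequence numbers."""
        prev, last_seq = GENESIS, 0
        for ev in chain:
            body = {k: v for k, v in ev.items() if k != "hash"}
            digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
            if ev["prev_hash"] != prev or ev["hash"] != digest or ev["seq"] <= last_seq:
                return False
            prev, last_seq = ev["hash"], ev["seq"]
        return True

    chain, prev = [], GENESIS
    for seq, kind in [(1, "DECISION"), (2, "ORDER"), (3, "EXECUTION")]:
        ev = link({"seq": seq, "event": kind}, prev)
        chain.append(ev)
        prev = ev["hash"]

    assert verify_order(chain)
    chain.insert(1, link({"seq": 1, "event": "BACKDATED"}, GENESIS))  # attempt insertion
    assert not verify_order(chain)  # linkage breaks; a score-0 system would accept this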

#4 Decision Provenance (0 / 1 / 2)

"Can inputs, conditions, and rationale be traced?"

0: Only outcomes recorded
1: Some inputs logged but incomplete
2: Full provenance: data, parameters, model state, logic
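
A sketch of what a score-2 provenance record might capture, expressed as a Python dataclass. The field names are illustrative, not a required schema:

    from dataclasses import dataclass, asdict
    import json

    @dataclass
    class DecisionRecord:
        """Illustrative score-2 provenance record: every element needed to
        reconstruct why the decision was made, not just what it was."""
        decision_id: str
        inputs: dict       # market data snapshot identifiers, feature values
        parameters: dict   # strategy parameters in force at decision time
        model_state: dict  # model version, weights hash, config hash
        logic: str         # rule or code path that produced the decision
        outcome: str

    record = DecisionRecord(
        decision_id="D-2025-0142",
        inputs={"feed_snapshot": "md_20250612T093000Z", "mid_price": 101.25},
        parameters={"max_position": 500, "signal_threshold": 0.7},
        model_state={"model_version": "v3.2.1", "weights_sha256": "<sha256 of weights file>"},
        logic="signal 0.82 > threshold 0.70 -> BUY",
        outcome="BUY 200 @ 101.26",
    )
    print(json.dumps(asdict(record), indent=2))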

#5 Responsibility Boundaries (0 / 1 / 2)

"Who approved, modified, or overrode each action?"

0: No attribution; generic accounts
1: Username logged but no signature
2: Digital signatures on all approvals/overrides
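
A minimal sketch of score-2 attribution, assuming the third-party `cryptography` package (pip install cryptography); key distribution and storage (HSM/KMS) are out of scope here:

    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
    from cryptography.exceptions import InvalidSignature
    import json

    approver_key = Ed25519PrivateKey.generate()  # in practice: an HSM or KMS key
    action = json.dumps(
        {"action": "OVERRIDE", "order_id": "O-9913", "approver": "jdoe"},
        sort_keys=True,
    ).encode()

    signature = approver_key.sign(action)  # signed at approval time

    # Any auditor holding the public key can verify who approved what.
    public_key = approver_key.public_key()
    try:
        public_key.verify(signature, action)
        print("approval verified")
    except InvalidSignature:
        print("attribution cannot be verified")

Unlike a logged username, a signature cannot be added or altered after the fact without the approver's private key.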

#6 Audit Submission Readiness (0 / 1 / 2)

"Can evidence be exported for regulatory review?"

0: Manual gathering required; takes days
1: Partial export; separate extraction needed
2: One-click export; complete package in <5 min

#7 Retention & Durability (0 / 1 / 2)

"Are records retained for required periods (e.g., 7 years)?"

0: No policy; data may be lost
1: Policy exists but incomplete enforcement
2: Enforced retention with redundancy & integrity checks

#8 Timestamp Reliability (0 / 1 / 2)

"Are timestamps synchronized to a trusted source?"

0: Local system clocks only
1: NTP sync but no drift monitoring
2: PTP or RFC 3161 with documented accuracy

#9 Cryptographic Strength (0 / 1 / 2)

"Do algorithms meet current security standards?"

0: Deprecated algorithms (MD5, SHA-1)
1: Adequate algorithms but no key management
2: Strong algorithms (Ed25519, SHA-256+) with key lifecycle management

#10 Cryptographic Agility (0 / 1 / 2)

"Can the system migrate to new algorithms?"

0: Hard-coded algorithms; migration breaks verification
1: Algorithm identifiers exist but migration is untested
2: Documented and verified PQC migration path
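
One plausible way to reach score 2, sketched in Python: tag every sealed record with an algorithm identifier and dispatch through a registry, so adding a post-quantum successor is a registry entry rather than a format change. The `alg`/`digest` field names are hypothetical:

    import hashlib

    # Registry of supported digest algorithms, keyed by an identifier stored
    # alongside every record.
    DIGESTS = {
        "sha256": hashlib.sha256,
        "sha3_256": hashlib.sha3_256,
    }

    def seal(payload: bytes, alg: str = "sha256") -> dict:
        """Seal a payload, recording which algorithm produced the digest."""
        return {"alg": alg, "digest": DIGESTS[alg](payload).hexdigest()}

    def verify(payload: bytes, sealed: dict) -> bool:
        """Old records verify under the algorithm they were sealed with."""
        return DIGESTS[sealed["alg"]](payload).hexdigest() == sealed["digest"]

    old = seal(b"legacy record", "sha256")
    new = seal(b"new record", "sha3_256")
    assert verify(b"legacy record", old) and verify(b"new record", new)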

3-Hour PoC Assessment

Minimum viable test procedure for all 10 criteria

Total Time: ~3 hours

Step 1: Export & Verify (30 min)

Export sample audit log (10-100 records). Give it to someone unfamiliar with your system.

Rule: No phone calls, no vendor support, no internal tools allowed.

Step 2: Tamper Test (20 min)

Modify one field in one historical record. Run integrity check.

Pass: Automatic detection with alert; modification location identified.

Step 3: Sequence Check (15 min)

Find a Decision → Order → Execution chain. Verify cryptographic binding.

Test: Try to insert a backdated event. If possible, score 0.

Step 4: Provenance & Attribution (35 min)

Pick a random decision from last week. Reconstruct: inputs, parameters, logic, approver.

Target: Full context retrievable in <10 min = Score 2.

Step 5: Audit Export (30 min)

Simulate: "Regulator requests all activity for Account X, Date Y."

Target: One-click export; complete package in <5 minutes = Score 2.
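
A sketch of what the one-click target might look like, assuming records carry `account` and `date` fields (hypothetical names): filter, hash, and package in a single call:

    import hashlib
    import json
    import zipfile

    def export_evidence(records: list, account: str, date: str,
                        out_path: str = "evidence_pack.zip") -> None:
        """Filter records for one account/date and package them with a hash index."""
        matched = [r for r in records if r["account"] == account and r["date"] == date]
        body = json.dumps(matched, sort_keys=True, indent=2).encode()
        index = {"file": "records.json",
                 "sha256": hashlib.sha256(body).hexdigest(),
                 "record_count": len(matched)}
        with zipfile.ZipFile(out_path, "w") as zf:
            zf.writestr("records.json", body)
            zf.writestr("index.json", json.dumps(index, indent=2))

    export_evidence(
        [{"account": "X", "date": "2025-06-12", "event": "ORDER"}],
        account="X", date="2025-06-12",
    )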

Step 6: Technical Review (50 min)

Review retention policy, time source, cryptographic algorithms, migration plan.

Covers: Criteria #7-10 (Retention, Timestamp, Crypto Strength, Agility)

Evidence Pack Template

Third-party submission template for audit and regulatory review

CONFIDENTIAL | Third-Party Submission Template | Version 1.0

Template Contents

  • Overall Score: /20 points with assessment level
  • Score Breakdown: All 10 criteria with individual scores
  • Evidence Index: Filename + SHA-256 hash for each item
  • Attestation: Assessor signature and date

Evidence Index Sample

#1 audit_log_2025-01.json
SHA-256: a7f3c9d2...
#2 tamper_test_results.pdf
SHA-256: b8e4d1f5...
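
A minimal sketch that produces the index format above from a list of evidence files; because only filenames and SHA-256 hashes are published, integrity can be proven without disclosing file contents:

    import hashlib
    from pathlib import Path

    def evidence_index(paths: list) -> str:
        """Emit an Evidence Index entry (name + SHA-256) per evidence file."""
        lines = []
        for i, p in enumerate(paths, start=1):
            digest = hashlib.sha256(Path(p).read_bytes()).hexdigest()
            lines.append(f"#{i} {Path(p).name}\nSHA-256: {digest}")
        return "\n".join(lines)

    # Using the sample files named above:
    print(evidence_index(["audit_log_2025-01.json", "tamper_test_results.pdf"]))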

EU AI Act Regulatory Mapping

Alignment with Regulation (EU) 2024/1689 for high-risk AI systems

EU AI Act Article | Requirement              | Benchmark Coverage
Article 12        | Record-keeping / Logging | ✓ Direct (Criteria 1-7)
Article 13        | Transparency             | ◐ Partial (Criteria 4, 5)
Article 14        | Human Oversight          | ◐ Partial (Criterion 5)
Article 17        | Quality Management       | ✓ Supported (Criteria 6, 7)

MiFID II / RTS 25 Synergy: Criterion #8 (Timestamp Reliability) also addresses RTS 25 clock synchronization requirements (±100μs for HFT, ±1ms for others).
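
A rough drift sanity check, assuming the third-party `ntplib` package (pip install ntplib). Note that NTP can only demonstrate millisecond-level accuracy; the ±100μs HFT bound in RTS 25 requires PTP with hardware timestamping:

    import ntplib

    client = ntplib.NTPClient()
    response = client.request("pool.ntp.org", version=3)
    offset_ms = response.offset * 1000  # seconds -> milliseconds
    print(f"local clock offset vs NTP: {offset_ms:+.3f} ms")
    if abs(offset_ms) > 1.0:
        print("WARNING: drift exceeds the ±1 ms RTS 25 bound for non-HFT systems")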

Who Is This For

Industry stakeholders who benefit from standardized auditability measurement

Audit / Assurance

Set a common baseline for audit engagements. Compare systems objectively.

  • Standardized assessment criteria
  • Evidence Pack for submissions
  • Cross-organization comparison

RegTech Vendors

Demonstrate your product's auditability with quantifiable metrics.

  • Marketing with concrete scores
  • Product differentiation
  • Regulatory compliance proof

Brokers / Venues

Turn transparency into a competitive advantage. Speed up audit submissions.

  • Client trust differentiator
  • Faster regulatory responses
  • Reduced audit costs

What Your Score Means

Interpretation guide for assessment results

16-20: Strong Auditability
Ready for external audit and regulatory review. Continue maintaining best practices.

11-15: Moderate Auditability
Address gaps in 0-score areas before external audit. Focus on quick wins first.

6-10: Limited Auditability
Significant improvements needed. Prioritize evidence-centric criteria #1-6.

0-5: Inadequate
Fundamental gaps require immediate attention. Consider system redesign.

Downloads & Resources

All benchmark documents and resources

FAQ

Frequently asked questions about the benchmark

Do I need to adopt VCP to use this benchmark?

No. This benchmark is a measurement tool for auditability, usable regardless of technology choice. However, achieving scores close to 20 typically requires cryptographic integrity mechanisms—which VCP provides as one option.

Can audit firms use this for client assessments?

Yes. The benchmark is licensed under CC BY 4.0. Audit firms can use it for client engagements with attribution. The Evidence Pack provides a standardized submission format.

Does confidential data leave our organization?

Not necessarily. The benchmark is designed for internal self-assessment. For third-party submissions, the Evidence Pack uses SHA-256 hashes to prove file integrity without exposing actual content. You control what gets shared.

What should I submit for audit?

Use the Evidence Pack template: overall score, 10-criteria breakdown, Evidence Index (filename + SHA-256 hash), and assessor attestation. The hash-based index proves evidence authenticity without requiring full data disclosure.

What score is considered "good enough"?

16-20 points indicates strong auditability and readiness for external audit. 11-15 is moderate—address 0-score items first. 10 or below requires significant improvement before regulatory engagement.

Is there certification available?

This benchmark is for self-assessment and third-party evaluation. For formal certification, see the VC-Certified program which uses VCP compliance as its basis.

Published by VeritasChain Standards Organization (VSO)
as part of the VCP standards ecosystem.