# What an Audit-Grade Trail for Agents Actually Looks Like

> An audit log is a transcript. An audit-grade trail is a four-tuple: Intent + Evidence + Decision + Confidence. That distinction is what separates ship-it-to-prod from regulator-blocked.

URL: https://agentsbooks.com/blog/audit-trail-agents
Published: 2026-05-19T16:20:00Z
Category: Deep Dive
Tags: audit, compliance, spoke, p4, nist, eu-ai-act

An audit-grade trail isn't a transcript. It's a structured artefact that lets a regulator or auditor answer *"why did the agent do that, and on what basis?"* without having to infer the answer from free text. Most agent frameworks ship a transcript and call it an audit log. This essay lays out the four-tuple that actually qualifies.

## The four-tuple

For every agent decision worth auditing, the trail captures:

1. **Intent** — what the agent was trying to accomplish on this call. Encoded as a structured field, not as free text.
2. **Evidence** — the inputs the agent drew on. Includes retrieved Knowledge documents (with IDs), prior Memory items, the principal's request payload.
3. **Decision** — the structured output. Not the free-text reply — the typed decision object (`{verdict: "approve", risk_score: 0.34, reasons: [...]}`).
4. **Confidence** — the agent's self-reported confidence in the decision, plus the model's logprob distribution if available.
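
As a rough illustration, the four fields could be modelled as a typed record like the one below. This is a minimal sketch, not the AgentsBooks schema: every class and field name here (`AuditRecord`, `Evidence`, `knowledge_doc_ids`, `token_logprobs`, and so on) is a hypothetical choice made for the example.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
from typing import Any, Optional
import json


@dataclass(frozen=True)
class Evidence:
    knowledge_doc_ids: tuple[str, ...]  # IDs of retrieved Knowledge documents
    memory_item_ids: tuple[str, ...]    # IDs of prior Memory items consulted
    request_payload: dict[str, Any]     # the principal's request, as received


@dataclass(frozen=True)
class Decision:
    verdict: str
    risk_score: float
    reasons: tuple[str, ...]


@dataclass(frozen=True)
class AuditRecord:
    intent: str          # structured goal identifier, not free text
    evidence: Evidence
    decision: Decision
    confidence: float    # self-reported, expected in [0, 1]
    token_logprobs: Optional[tuple[float, ...]] = None  # only if the model exposes them
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def __post_init__(self) -> None:
        # Reject records that could not be filtered meaningfully later.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")

    def to_json(self) -> str:
        # Nested dataclasses become dicts; tuples serialise as JSON arrays.
        return json.dumps(asdict(self), sort_keys=True)


record = AuditRecord(
    intent="refund.review",
    evidence=Evidence(("doc-1",), (), {"amount": 120}),
    decision=Decision("approve", 0.34, ("within policy",)),
    confidence=0.82,
)
```

Keeping the record immutable and validating it at construction time means a malformed entry fails when it is written rather than when an auditor tries to query it.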

This four-tuple is what an auditor can *query*. *"Show me every decision in Q2 2026 where confidence was <0.7 but the verdict was 'approve'"* — answerable in seconds against a four-tuple log. Unanswerable against a transcript.
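
To make the auditor's question concrete, here is a sketch of that query against an in-memory SQLite table. The table layout, column names, and sample rows are assumptions invented for the example; a real trail would live in whatever store the platform uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE decisions (
        id          INTEGER PRIMARY KEY,
        recorded_at TEXT NOT NULL,   -- ISO 8601, UTC, fixed format
        intent      TEXT NOT NULL,
        verdict     TEXT NOT NULL,
        confidence  REAL NOT NULL
    )
    """
)
# Hypothetical sample data.
conn.executemany(
    "INSERT INTO decisions (recorded_at, intent, verdict, confidence) "
    "VALUES (?, ?, ?, ?)",
    [
        ("2026-04-14T09:30:00Z", "refund.review", "approve", 0.62),
        ("2026-05-02T11:05:00Z", "refund.review", "approve", 0.91),
        ("2026-06-20T16:45:00Z", "credit.limit", "deny", 0.55),
        ("2026-07-03T08:00:00Z", "refund.review", "approve", 0.41),
    ],
)


def low_confidence_approvals(conn, start, end, threshold=0.7):
    """Approvals in [start, end) whose confidence is below the threshold."""
    cur = conn.execute(
        "SELECT recorded_at, intent, confidence FROM decisions "
        "WHERE verdict = 'approve' AND confidence < ? "
        "AND recorded_at >= ? AND recorded_at < ? "
        "ORDER BY recorded_at",
        (threshold, start, end),
    )
    return cur.fetchall()


# Q2 2026 is 1 April (inclusive) to 1 July (exclusive).
flagged = low_confidence_approvals(
    conn, "2026-04-01T00:00:00Z", "2026-07-01T00:00:00Z"
)
```

The comparison on `recorded_at` works as plain string ordering only because every timestamp uses the same ISO 8601 UTC format. The same question asked of a transcript would require parsing free text for each call.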

## Why this satisfies the regimes

The mapping to specific clauses:

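- **[NIST AI RMF](https://www.nist.gov/itl/ai-risk-management-framework) MEASURE 2** (the TEVV subcategories: test, evaluation, verification, validation): evaluating a system for trustworthy characteristics requires structured outcomes, and MEASURE 2.8 in particular asks that transparency and accountability risks be examined and documented. The four-tuple supplies those outcomes.
- **EU AI Act Art. 12** (record-keeping): high-risk systems must automatically record events (logs) over their lifetime so that their operation is traceable. A transcript records what was said; the four-tuple records what was decided and on what basis.
- **SOC 2 Processing Integrity (PI1.4)**: the category's test is that system processing is complete, valid, accurate, timely, and authorized, and PI1.4 applies it to delivered output. The four-tuple is what an attestor inspects.
- **ISO/IEC 42001 Clause 9** (performance evaluation): monitoring, measurement, and internal audit need structured records to evaluate against, and the four-tuple supplies them.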

## How the substrate emits it

In the AgentsBooks substrate ([Pillar P1](/blog/eight-primitives-agentic-firm)), the four-tuple is emitted as a side effect of normal operation. Each Heart task wraps the LLM call in an `audit_decorator` that captures:

- Intent: from the task definition's `goal` field.
- Evidence: from Memory's retrieval log + the inbound A2A/event payload.
- Decision: from the agent's typed output schema (defined in the task).
- Confidence: from the model's logprobs (when available) + a self-reported `confidence` field in the output schema.
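
The capture pattern described above can be sketched with a plain Python decorator. This is not the substrate's `audit_decorator`: the name `audited`, its parameters, the list used as a sink, and the stubbed task are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import functools
from typing import Any, Callable

# Stand-in for the Episodic memory store; a real sink would be durable.
AUDIT_SINK: list[dict[str, Any]] = []


def audited(goal: str, sink: list[dict[str, Any]]) -> Callable:
    """Wrap a task so each call appends one four-tuple to the sink."""

    def wrap(task: Callable) -> Callable:
        @functools.wraps(task)
        def inner(payload: dict, retrieved_ids: list[str]) -> dict:
            output = task(payload, retrieved_ids)
            sink.append(
                {
                    "intent": goal,  # from the task definition
                    "evidence": {
                        "retrieved_ids": list(retrieved_ids),
                        "payload": payload,
                    },
                    # Typed output minus the confidence field.
                    "decision": {
                        k: v for k, v in output.items() if k != "confidence"
                    },
                    "confidence": output["confidence"],
                }
            )
            return output

        return inner

    return wrap


@audited(goal="refund.review", sink=AUDIT_SINK)
def review_refund(payload: dict, retrieved_ids: list[str]) -> dict:
    # Stub in place of the LLM call; returns the typed output schema.
    approve = payload["amount"] <= 200
    return {
        "verdict": "approve" if approve else "escalate",
        "reasons": ["within policy limit" if approve else "over policy limit"],
        "confidence": 0.82 if approve else 0.55,
    }


result = review_refund({"amount": 120}, ["doc-1"])
```

Because the record is written by the wrapper rather than by the task body, a task author cannot forget to log a decision, which is the point of emitting the trail as a side effect.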

The four-tuple lands in Episodic memory ([Pillar P8](/blog/agent-memory-knowledge)) and is exposed to the operator dashboard via a structured query.

## What to leave OUT of the audit log

Three things commonly bloat audit logs without adding compliance value:

1. **Full chain-of-thought.** Reasoning traces are useful for *debugging*, not for *auditing*. Keep them in a separate diagnostic store with shorter retention.
2. **Raw model API metadata** (response IDs, region, etc.) beyond what's needed for cost reconciliation.
3. **Repeated cacheable context.** Hash the prompt + cache flags; don't store the full 50K-token prompt for every call.
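
For the third item, one way to record a hash instead of the prompt is sketched below. The function name, the record fields, and the idea of keeping a single copy of each distinct prompt in a content-addressed store are assumptions for illustration, not a description of how the substrate does it.

```python
import hashlib

# Hypothetical content-addressed store: each distinct prompt is kept once,
# keyed by its SHA-256 digest, so a hash in the trail can still be resolved.
PROMPT_STORE: dict[str, str] = {}


def prompt_reference(prompt: str, cache_hit: bool) -> dict:
    """Return the compact reference that goes into the audit record."""
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    PROMPT_STORE.setdefault(digest, prompt)  # stored only on first sight
    return {
        "prompt_sha256": digest,
        "prompt_chars": len(prompt),
        "cache_hit": cache_hit,
    }


system_prompt = "You are a refund-review agent. " * 1000
first = prompt_reference(system_prompt, cache_hit=False)
second = prompt_reference(system_prompt, cache_hit=True)
```

Both calls yield the same 64-character digest, and the store holds one copy of the prompt regardless of how many calls reuse it.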

The audit log should be queryable in <500ms for any rolling 90-day window. If it's slower, you've stored too much.

## FAQ

**Q: How long should the audit trail be retained?**
A: Regulator-dependent. The EU AI Act requires logs for high-risk systems to be kept for at least six months unless other applicable law says otherwise; some financial regimes require 7 years. The substrate supports tiered retention (hot/warm/cold) so longer windows don't blow up query cost.
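
A tiering rule of this kind might look like the sketch below. The thresholds are assumptions picked to echo the numbers in this post (a 90-day hot window, roughly six months warm, seven years total); the actual boundaries depend on the regime and on the store.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical tier boundaries; adjust to the applicable regime.
HOT_WINDOW = timedelta(days=90)       # rolling window that must stay fast
WARM_WINDOW = timedelta(days=183)     # roughly six months
RETENTION = timedelta(days=7 * 365)   # seven-year outer limit


def storage_tier(recorded_at: datetime, now: datetime) -> str:
    """Classify a record by age into hot, warm, cold, or expired."""
    age = now - recorded_at
    if age < timedelta(0):
        raise ValueError("recorded_at is in the future")
    if age <= HOT_WINDOW:
        return "hot"
    if age <= WARM_WINDOW:
        return "warm"
    if age <= RETENTION:
        return "cold"
    return "expired"


now = datetime(2026, 9, 1, tzinfo=timezone.utc)
tiers = [
    storage_tier(now - timedelta(days=d), now)
    for d in (10, 120, 400, 8 * 365)
]
```

Only the hot tier needs to meet the interactive latency budget; older tiers can trade query speed for storage cost.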

**Q: Can the auditor query the trail directly?**
A: With a tenant-scoped read-only token, yes. AgentsBooks exposes an `/audit/decisions` query endpoint that takes a filter spec and returns the four-tuples. Most attestation engagements run from this directly.

**Q: How does this relate to the rest of the compliance pillar?**
A: This spoke is the *evidence* layer. The [P4 pillar](/blog/compliance-agentic-systems) is the *control* layer (the NIST, EU AI Act, SOC 2, and ISO mapping). The substrate emits both.

