Compliance & Auditability for Agentic Systems

The hardest thing about putting AI agents into a regulated workflow isn't getting them to work. It's proving — to a regulator, to an auditor, to a court — that they did work, and that the way they worked complies with the regime the firm operates under.

This essay is the compliance pillar for AgentsBooks. It maps the four regulatory regimes that matter most to AI-native service firms — NIST AI Risk Management Framework, the EU AI Act, SOC 2 Trust Services Criteria, and ISO/IEC 42001 — to specific demands they make of an agentic system, and to specific design choices in the substrate that meet those demands.

The framing throughout: compliance is not a layer you add on top of agents. It's a property of how the agents are built.

The four regimes — what each one wants

NIST AI RMF (US, voluntary but de-facto baseline)

NIST's AI Risk Management Framework 1.0 and the Generative AI Profile organize requirements into four functions: GOVERN, MAP, MEASURE, MANAGE.

What this means for an agent fleet:

  • GOVERN-1.4 — there must be a documented owner for each AI system. In the substrate: every agent has an Identity, every Identity has a tenant_id + owner_user_id. Pulling a roster is one query.
  • MAP-1.1 — the system's intended use must be documented. In the substrate: the agent's role, mission, and task definitions form the documented use.
  • MEASURE-2.3 — performance must be measured against intended use. In the substrate: each Heart task logs success/failure + token spend; aggregate-by-agent over rolling windows.
  • MANAGE-2.4 — high-impact risks must have escalation procedures. In the substrate: approvals + human-in-the-loop gates wired to specific task types via Heart's requires_approval flag.
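
The MANAGE-2.4 gate above can be sketched in a few lines. This is a minimal illustration, not the AgentsBooks API: only the requires_approval flag comes from the text, while the Task and ApprovalQueue classes and the dispatch function are hypothetical names, and the queue is held in memory where a real one would be persistent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Task:
    """Hypothetical stand-in for a Heart task definition."""
    task_id: str
    task_type: str
    agent_id: str
    requires_approval: bool = False


@dataclass
class ApprovalQueue:
    """In-memory approvals queue (assumption: a real one would be durable)."""
    pending: Dict[str, Task] = field(default_factory=dict)
    approved_by: Dict[str, str] = field(default_factory=dict)

    def submit(self, task: Task) -> None:
        self.pending[task.task_id] = task

    def approve(self, task_id: str, reviewer: str) -> Task:
        task = self.pending.pop(task_id)  # KeyError if nothing is pending
        self.approved_by[task_id] = reviewer  # who signed off, for the audit trail
        return task


def dispatch(task: Task, queue: ApprovalQueue,
             run: Callable[[Task], str]) -> Optional[str]:
    """Run the task now, or park it until a human has approved it."""
    if task.requires_approval and task.task_id not in queue.approved_by:
        queue.submit(task)
        return None  # nothing executes before sign-off
    return run(task)


executed: List[str] = []


def run_task(task: Task) -> str:
    executed.append(task.task_id)
    return f"done:{task.task_id}"


queue = ApprovalQueue()
kyc = Task("t-1", "kyc_decision", "KYC-3", requires_approval=True)

first = dispatch(kyc, queue, run_task)    # parked in the queue, not executed
queue.approve("t-1", reviewer="alice")
second = dispatch(kyc, queue, run_task)   # executes after sign-off
```

The point of the design is that the reviewer's identity is recorded next to the task, so the escalation procedure leaves evidence without a separate logging step.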

NIST is voluntary in the US — but federal procurement increasingly requires conformance, and most enterprise buyers have made it the baseline they review against.

EU AI Act (EU, in force since 2024; high-risk obligations from 2026)

The EU AI Act entered into force in August 2024 with staggered enforcement: General-Purpose AI obligations apply from August 2025 and high-risk system requirements from August 2026. The Act categorises AI systems by risk (unacceptable, high-risk, limited risk, minimal risk) and applies different obligations to each.

What this means for an agent fleet operating in or selling into the EU:

  • Art. 9 (risk management) — continuous, iterative risk assessment must be in place. In the substrate: the Memory + Heart loop produces episodic logs that feed downstream risk dashboards.
  • Art. 12 (logging) — automatically generated logs sufficient to trace decisions. In the substrate: every model call, every tool invocation, every task firing is logged with agent_id, model, prompt_hash, output_hash, tokens, cost, timestamp.
  • Art. 13 and Art. 50 (transparency) — deployers must receive clear information about the system (Art. 13), and people must be informed when they're interacting with an AI system (Art. 50). In the substrate: the agent's profile page on Shares carries the disclosure; Control channels carry a per-message disclosure when configured.
  • Art. 14 (human oversight) — high-risk systems require human review. In the substrate: the approvals queue + Heart's requires_approval flag, wired to Slack notifications.
  • Art. 15 (accuracy, robustness, cybersecurity) — quantified metrics. In the substrate: eval harnesses on Heart tasks; periodic adversarial test runs against a held-out set.
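
As a rough sketch of the Art. 12 log entry described above: the field names (agent_id, model, prompt_hash, output_hash, tokens, cost, timestamp) come from the text, but the CallRecord class, the choice of SHA-256, the JSON-lines encoding, and the "example-model" name are assumptions made for illustration.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


def sha256_hex(text: str) -> str:
    """Hex digest used to fingerprint prompts and outputs."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class CallRecord:
    agent_id: str
    model: str
    prompt_hash: str
    output_hash: str
    tokens: int
    cost: float       # assumption: cost in the firm's billing currency
    timestamp: str    # ISO 8601, UTC


def record_call(agent_id: str, model: str, prompt: str, output: str,
                tokens: int, cost: float,
                now: Optional[datetime] = None) -> CallRecord:
    """Build one immutable log entry for a single model call."""
    ts = (now or datetime.now(timezone.utc)).isoformat()
    return CallRecord(agent_id, model, sha256_hex(prompt),
                      sha256_hex(output), tokens, cost, ts)


def to_log_line(record: CallRecord) -> str:
    """Serialise with sorted keys so identical records give identical lines."""
    return json.dumps(asdict(record), sort_keys=True)


rec = record_call("KYC-3", "example-model", "Review customer X", "approve",
                  tokens=120, cost=0.0042,
                  now=datetime(2026, 1, 1, tzinfo=timezone.utc))
line = to_log_line(rec)
```

Storing hashes rather than raw text keeps the log free of customer data while still letting an auditor confirm that an archived prompt or output is the one the agent actually saw.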

GPAI obligations (Art. 53) apply upstream — to Anthropic, OpenAI, Google — not to the agentic firm. But the systemic risk clause for GPAI with significant impact does push obligations down the chain.

SOC 2 (US-led, effectively mandatory for selling into financial services)

SOC 2 is an attestation framework — an auditor inspects the controls described against the Trust Services Criteria (Security, Availability, Processing Integrity, Confidentiality, Privacy) and writes a report. SOC 2 Type II covers a 6–12 month operating window.

For an agentic firm:

  • Security (CC6.x) — access control, change management. In the substrate: RBAC at the tenant + agent level; every config change is audit-logged.
  • Processing Integrity (PI1.x) — system processing complete, valid, accurate. In the substrate: episodic Memory + Heart task outcomes provide the trail; eval harnesses provide the accuracy metric.
  • Confidentiality (C1.x) — confidential information protected throughout its lifecycle. In the substrate: tenant isolation at the data layer; Knowledge documents tagged with confidentiality class; the Brain receives only the in-class subset.
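
The Confidentiality control above combines two filters: tenant isolation and a confidentiality class. A minimal sketch follows, assuming an ordered set of classes and reading "in-class subset" as "documents at or below the task's clearance". The ConfClass levels, the KnowledgeDoc shape, and docs_for_brain are hypothetical names, not the product's schema.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class ConfClass(IntEnum):
    """Illustrative confidentiality classes, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass(frozen=True)
class KnowledgeDoc:
    doc_id: str
    tenant_id: str
    conf_class: ConfClass
    text: str


def docs_for_brain(docs: List[KnowledgeDoc], tenant_id: str,
                   clearance: ConfClass) -> List[KnowledgeDoc]:
    """Return only this tenant's documents at or below the given clearance."""
    return [d for d in docs
            if d.tenant_id == tenant_id and d.conf_class <= clearance]


docs = [
    KnowledgeDoc("d1", "tenant-a", ConfClass.PUBLIC, "pricing sheet"),
    KnowledgeDoc("d2", "tenant-a", ConfClass.CONFIDENTIAL, "client file"),
    KnowledgeDoc("d3", "tenant-a", ConfClass.RESTRICTED, "board minutes"),
    KnowledgeDoc("d4", "tenant-b", ConfClass.PUBLIC, "other tenant's doc"),
]
visible = docs_for_brain(docs, "tenant-a", ConfClass.CONFIDENTIAL)
visible_ids = [d.doc_id for d in visible]
```

Filtering before the model call, rather than asking the model to ignore what it should not use, is what makes the control testable by an auditor.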

SOC 2 is what your enterprise customers will ask for first. The audit isn't free — budget $30–80K for a SOC 2 Type II with a Big-4-tier firm — but it unlocks the lane to financial-services, healthcare, and B2B SaaS revenue.

ISO/IEC 42001 (international, 2023)

ISO/IEC 42001 is the first international management-system standard specifically for AI. Where SOC 2 audits controls and NIST gives a framework, ISO 42001 defines an AI management system (AIMS) — the meta-process by which an organization governs its AI.

The clauses that matter:

  • Clause 6 — planning: the organization must document AI objectives + AI risks.
  • Clause 7.4 — communication: stakeholders must be informed about AI behaviour boundaries.
  • Clause 8 — operation: processes for AI development, deployment, monitoring.
  • Clause 9 — performance evaluation: continual monitoring + internal audit.
  • Annex A — 38 controls grouped under 9 control objectives, covering data, model, deployment, transparency.

ISO 42001 is the regime most likely to become the global default — the EU AI Act references it; auditors are training to it; analyst firms (Gartner) treat 42001 certification as the proxy for "AI governance maturity."

How the 8 primitives map to the regimes

The pillar-1 essay (The 8 Primitives of an Agentic Firm) introduces the substrate. Here's the cross-mapping to the four regimes:

Primitive | NIST                              | EU AI Act                      | SOC 2                        | ISO 42001
----------+-----------------------------------+--------------------------------+------------------------------+-----------------------
Identity  | GOVERN-1.4 (ownership)            | Art. 14 (oversight identity)   | CC6.1 (logical access)       | A.3.3 (roles)
Brain     | MAP-2.3 (system characterisation) | Art. 53 (GPAI upstream)        | PI1.1 (input requirements)   | A.6.2 (model lifecycle)
Heart     | MEASURE-2.7 (TEVV)                | Art. 9 (risk management)       | PI1.4 (processing integrity) | A.6.2.6 (operation)
Memory    | MANAGE-2.3 (incident logging)     | Art. 12 (logging)              | CC7.2 (system monitoring)    | A.7.5 (recording)
Control   | GOVERN-3.2 (workforce)            | Art. 13 (transparency)         | CC6.6 (channels)             | A.3.4 (responsibility)
Knowledge | MAP-2.2 (context)                 | Art. 10 (data)                 | C1.1 (information lifecycle) | A.7.2 (data quality)
Friends   | MAP-3.4 (third-party)             | Art. 16 (importer obligations) | CC9.2 (vendor management)    | A.6.2.5 (interaction)
Shares    | GOVERN-5.1 (engagement)           | Art. 50 (disclosure)           | C1.2 (external)              | A.6.2.8 (transparency)

This is the spine of the compliance-agent-handbook satellite — each cell expands to a control specification: what to ship, what to test, what evidence to capture.
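
To make "each cell expands to a control specification" concrete, here is one possible shape for such a cell. The clause references are copied from the matrix; the ControlSpec class, the MATRIX dictionary, and the ship/test/evidence wording are hypothetical and only show the structure, not the handbook's actual contents.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ControlSpec:
    reference: str   # clause taken from the matrix cell
    ship: str        # what to ship
    test: str        # what to test
    evidence: str    # what evidence to capture


# Keyed by (primitive, regime). Only two cells are filled in here, and the
# ship/test/evidence text is illustrative, not taken from the handbook.
MATRIX: Dict[Tuple[str, str], ControlSpec] = {
    ("Memory", "EU AI Act"): ControlSpec(
        reference="Art. 12 (logging)",
        ship="a structured log entry for every model call and tool invocation",
        test="replay a sampled task and confirm every step has a log entry",
        evidence="exported log entries for the sampled task",
    ),
    ("Identity", "SOC 2"): ControlSpec(
        reference="CC6.1 (logical access)",
        ship="role-based access control at tenant and agent level",
        test="attempt a cross-tenant read and confirm it is denied",
        evidence="access-denied audit log entry for the attempt",
    ),
}


def specs_for_regime(regime: str) -> Dict[str, ControlSpec]:
    """Collect every filled-in cell for one regime, keyed by primitive."""
    return {primitive: spec
            for (primitive, cell_regime), spec in MATRIX.items()
            if cell_regime == regime}


soc2 = specs_for_regime("SOC 2")
```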

The audit-trail problem

The bottleneck for every audit-related regime above is the same: can you produce a forensic trail of why the agent did what it did?

Most agent frameworks can't. A LangChain or AutoGen agent typically logs the LLM call (prompt + completion + tokens) and the tool invocations. That's a transcript, not a forensic trail. An auditor asking "why did agent KYC-3 approve customer X on the third review pass" gets a transcript and has to infer the reasoning.

The fix is to make the trail structural — log the agent's intent, the evidence it drew on (with citations to specific Knowledge items), the decision, and the confidence. That's a 4-tuple, not a transcript. AgentsBooks emits all four for every audit-flagged task; the regulator gets a queryable structure, not a wall of text.
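
A minimal sketch of that 4-tuple as a queryable record, under stated assumptions: intent, evidence, decision, and confidence come from the text, while the DecisionRecord class, the task_id and agent_id keys, the Knowledge item IDs, and the rule that every record must cite at least one item are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DecisionRecord:
    task_id: str
    agent_id: str
    intent: str                  # what the agent was trying to do
    evidence: Tuple[str, ...]    # IDs of the Knowledge items it cited
    decision: str
    confidence: float            # assumption: a score between 0.0 and 1.0

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")
        if not self.evidence:
            # Assumption: an audit-flagged decision must cite something.
            raise ValueError("a decision record must cite evidence")


def find_decisions(records: List[DecisionRecord], agent_id: str,
                   decision: str) -> List[DecisionRecord]:
    """Answer 'which tasks did this agent decide this way, and on what basis?'"""
    return [r for r in records
            if r.agent_id == agent_id and r.decision == decision]


records = [
    DecisionRecord("review-2", "KYC-3", "verify identity of customer X",
                   ("kb:passport-scan-17",), "escalate", 0.55),
    DecisionRecord("review-3", "KYC-3", "verify identity of customer X",
                   ("kb:passport-scan-17", "kb:sanctions-check-2"),
                   "approve", 0.93),
]
hits = find_decisions(records, agent_id="KYC-3", decision="approve")
```

With records in this shape, the auditor's question about the third review pass becomes a filter over fields rather than a reading exercise over a transcript.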

The technical pattern comes straight from Anthropic's research on long-running agent harnesses: persistent state + structured episodic memory + explicit reasoning traces. The compliance application is a natural fit.

What this costs

A defensible posture across all four regimes — meaning: a SOC 2 Type II report, ISO 42001 certification within 18 months, EU AI Act conformance for high-risk uses, NIST AI RMF alignment — runs $80–200K in audit + tooling for a 50-person firm. That's expensive only relative to a tech startup; relative to a regulated practice (where compliance is 10–15% of opex anyway), it's a reorg of existing spend.

The savings come from the substrate. Most of the artefacts auditors ask for (decision logs, role assignments, eval results, change-management records) are already produced by the 8 primitives as a side-effect of operating. You're not building an audit pipeline — you're exposing the one the substrate already maintains.

Counter-narratives we take seriously

The most common pushback: "compliance kills velocity."

The honest answer: yes, bolted-on compliance does. A team that built fast and then has to retrofit audit trails on a year-old codebase will lose 6 months. A team that built on a primitives-first substrate from the start gets the trail for free.

The second pushback: "regulators don't know what they're doing on AI yet — wait for the dust to settle."

This was defensible in 2024. It's not defensible in 2026. The EU AI Act timeline is locked. NIST has shipped 1.0 and the GenAI profile. ISO 42001 is in force. SOC 2 auditors have AI-specific test plans. The cost of waiting is now larger than the cost of complying.

Operator checklist (download)

For a copy of this matrix as a printable PDF, plus the 38 ISO 42001 Annex-A controls cross-referenced to specific AgentsBooks features, see the compliance-agent-handbook satellite. Bring it to your next audit kickoff.

Frequently asked questions

Q: Do I need all four regimes from day one?
A: No. Most firms start with SOC 2 (because their first enterprise customer asked). NIST AI RMF alignment usually follows naturally. EU AI Act + ISO 42001 are the bigger lifts and typically come in year 2.

Q: Can AgentsBooks itself produce my SOC 2 report?
A: AgentsBooks's substrate emits the artefacts auditors typically request (per-agent decision logs, change-management records, eval results, role assignments) as a side-effect of operating, which shortens the evidence-collection phase. Your own report still needs your own controls + your own auditor — we make evidence production faster, we don't replace the audit. Check the live trust page for our current attestation status.

Q: What about EU AI Act General-Purpose AI obligations?
A: Those land on the model providers (Anthropic, OpenAI, Google, Meta) under Art. 53. The agentic firm itself is the deployer (Art. 26), which carries different obligations and a smaller scope.

Q: How does this compare to LangChain / AutoGen / OpenAI Assistants for compliance?
A: Those are agent toolkits, not substrates. They don't ship the primitives that produce audit-grade artefacts. You can add an audit layer on top, but then you're back in the bolted-on case above.

