Human-in-the-Loop Patterns for Agentic Firms

"Human-in-the-loop" is a phrase that's often used as cover for "we don't trust the agent yet". Used properly, it's a deliberate design choice with four distinct patterns. This essay walks through each.

Why HITL even exists

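Regulation and standards call for it: the EU AI Act (Art. 14) requires human oversight for high-risk AI systems, and the voluntary NIST AI RMF (MANAGE 2.4) expects mechanisms to override or deactivate a system that behaves outside its intended use. Customers (especially in regulated B2B) require it as a trust signal. And operators require it during shadow-mode rollout (per the Heart spoke).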

Not every workflow needs HITL. Adding it where it's not needed adds latency, cost, and a human bottleneck. The art is putting it precisely where it matters.

The four patterns

Pattern 1 — Approval gate

The agent prepares the decision; a human reviews and approves before action.

Right for: high-stakes irreversible actions (contract sends, fund transfers, regulator filings).

Cost: high latency (typically 1–4 hours during business hours). Use sparingly.

Substrate support: Heart's requires_approval flag + the operator-side approvals queue.
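A minimal sketch of the approval-gate idea in Python. It is not Heart's actual API: the `ProposedAction` and `ApprovalQueue` classes are hypothetical, and only the `requires_approval` name is borrowed from the flag mentioned above. The point it illustrates is that the agent prepares the action, but nothing runs until a human approves it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"
    EXECUTED = "executed"


@dataclass
class ProposedAction:
    """An action the agent has prepared but not yet performed (hypothetical type)."""
    description: str
    execute: Callable[[], str]
    requires_approval: bool = True
    status: Status = Status.PENDING
    result: Optional[str] = None


class ApprovalQueue:
    """Hypothetical operator-side queue: gated actions wait here for a human."""

    def __init__(self) -> None:
        self._pending: List[ProposedAction] = []

    def submit(self, action: ProposedAction) -> ProposedAction:
        # Ungated actions run immediately; gated ones wait for review.
        if action.requires_approval:
            self._pending.append(action)
        else:
            self._run(action)
        return action

    def pending(self) -> List[ProposedAction]:
        return list(self._pending)

    def review(self, action: ProposedAction, approve: bool) -> None:
        # The human decision is the only path from PENDING to execution.
        self._pending.remove(action)
        if approve:
            action.status = Status.APPROVED
            self._run(action)
        else:
            action.status = Status.REJECTED

    @staticmethod
    def _run(action: ProposedAction) -> None:
        action.result = action.execute()
        action.status = Status.EXECUTED
```

In use, the agent calls `submit()` with a prepared contract send or transfer, and the action sits in `pending()` until a reviewer calls `review(action, approve=True)`. The wait between those two calls is where the latency cost comes from.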

Pattern 2 — Confidence escalation

The agent acts autonomously when confidence is above a threshold; escalates to a human when below.

Right for: variable-quality work where most cases are clear-cut but some are not.

Cost: lower than Pattern 1 — only ~10–20% of cases escalate.

Substrate support: confidence threshold per task type; escalation routes to a designated reviewer agent or human.
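The routing decision can be sketched in a few lines of Python. The task-type names, threshold values, and the `route` function are illustrative assumptions, not part of any real substrate. The sketch also assumes a confidence exactly at the threshold counts as high enough to act.

```python
# Hypothetical per-task-type thresholds; real values come from calibration.
THRESHOLDS = {
    "invoice_matching": 0.8,
    "ticket_triage": 0.6,
}
DEFAULT_THRESHOLD = 0.7  # fallback for task types without their own setting


def route(task_type: str, confidence: float, thresholds=THRESHOLDS) -> str:
    """Return "autonomous" or "escalate" for one case."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    threshold = thresholds.get(task_type, DEFAULT_THRESHOLD)
    # Assumption: equality counts as confident enough to act.
    return "autonomous" if confidence >= threshold else "escalate"
```

Keeping the threshold per task type means a noisy task can escalate more often without dragging down the autonomy of a well-behaved one.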

Pattern 3 — Sample audit

The agent acts autonomously on all cases; a random sample (5–20%) goes to a human reviewer for quality audit.

Right for: high-volume work where individual case stakes are low but aggregate quality matters.

Cost: low marginal latency. The sample is what produces the eval data that justifies the autonomous bulk.

Substrate support: sample-rate config per task type; reviewer dashboard shows the sample with full four-tuple context.
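One way to sketch the sampling step in Python. Instead of drawing a fresh random number per case, this version hashes the case ID, a deliberate swap that makes the selection reproducible while still sampling at roughly the configured rate. The function name, the `salt` parameter, and the ID format are assumptions for illustration.

```python
import hashlib


def selected_for_audit(case_id: str, sample_rate: float, salt: str = "audit-v1") -> bool:
    """Decide deterministically whether a case falls into the audit sample.

    The same case_id and salt always give the same answer, so a reviewer
    can reproduce which cases were sampled. Changing the salt reshuffles
    the sample.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError(f"sample_rate must be in [0, 1], got {sample_rate}")
    digest = hashlib.sha256(f"{salt}:{case_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a bucket in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    return bucket < sample_rate * 2**64
```

The agent acts on every case either way. The function only decides which completed cases are copied to the reviewer dashboard.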

Pattern 4 — Override channel

The agent acts autonomously; a human can intervene to override at any time.

Right for: ambient long-running agents (monitoring, drafting, scheduling) where the human catches the agent mid-task.

Cost: near-zero. Intervention is opportunistic.

Substrate support: every Heart task has an interrupt() operation that can be triggered from the operator UI; the agent's next heartbeat respects the interrupt.
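The interrupt-at-next-heartbeat behaviour can be sketched as a flag that the task checks before each step. This is a hypothetical `Task` class, not Heart's implementation. Only the `interrupt()` name comes from the description above, and the `before_step` hook exists purely to simulate an operator acting while the task runs.

```python
import threading
from typing import Callable, Iterable, List, Optional


class Task:
    """A long-running task that checks for an interrupt before every step."""

    def __init__(self, steps: Iterable[str]) -> None:
        self._steps = list(steps)
        self._interrupted = threading.Event()  # safe to set from another thread
        self.completed: List[str] = []

    def interrupt(self) -> None:
        # Called from the operator side; takes effect at the next heartbeat.
        self._interrupted.set()

    def run(self, before_step: Optional[Callable[["Task", str], None]] = None) -> str:
        for step in self._steps:
            if before_step is not None:
                before_step(self, step)  # stand-in for "time passes, operator may act"
            # Heartbeat: honour a pending interrupt before doing more work.
            if self._interrupted.is_set():
                return "interrupted"
            self.completed.append(step)
        return "finished"
```

The interrupt is cooperative: a step already in progress finishes, and the task stops at the next check. That keeps the override cheap, but the granularity of the steps bounds how quickly a human can actually stop the agent.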

Picking the pattern

Case stakes | Volume | Pattern
High, irreversible | Low | 1 — Approval gate
Medium, mostly clear-cut | Medium-high | 2 — Confidence escalation
Low, but quality matters in aggregate | High | 3 — Sample audit
Ambient, drifty | Low-medium | 4 — Override channel

Many production workflows combine patterns: confidence escalation as the default, a sample audit on the autonomous bulk, and an override channel that is always on.
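A rough sketch of how two of these patterns might compose for a single case, in Python. The `disposition` function and its parameters are hypothetical. The `audit_draw` argument is a number in [0, 1) supplied by the caller (for example from a random generator), which keeps the function deterministic and easy to test.

```python
def disposition(confidence: float, threshold: float,
                audit_draw: float, sample_rate: float) -> str:
    """Combine confidence escalation (first) with a sample audit (second).

    The override channel is not modelled here: it applies while the
    action runs, not at decision time.
    """
    if confidence < threshold:
        return "escalate"        # a human decides this case
    if audit_draw < sample_rate:
        return "act_and_audit"   # agent acts; a human reviews afterwards
    return "act"                 # agent acts; no review
```

Ordering matters in this sketch: escalated cases already get human attention, so the audit sample is drawn only from the cases the agent handles on its own.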

What HITL is NOT

It's not just "send the agent's output to a human for a thumbs-up before acting on it". That's Pattern 1, only one of the four, and it's the most expensive option. Most workflows don't need it.

It's not a substitute for evaluation. HITL catches what evals miss; it doesn't replace them. A workflow with strong HITL but no evals will degrade slowly without anyone noticing.

It's not permanent. The shadow-mode → HITL → autonomous progression is the path. Most cases sit at HITL for 3–6 months while eval data accumulates, then move to autonomous.

FAQ

Q: How do you choose the confidence threshold for Pattern 2?
A: Empirically. Start at 0.7. Measure agreement-with-human on the cases that would have escalated vs the ones that didn't. Adjust until the threshold is where human review catches enough quality issues to justify its cost.
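A minimal sketch of that measurement in Python, assuming you have historical cases recorded as (confidence, agreed-with-human) pairs. The function name and the sample data are made up for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Each case: (agent confidence, did the human agree with the agent's decision?)
Case = Tuple[float, bool]


def evaluate_threshold(cases: List[Case], threshold: float) -> Dict[str, Optional[float]]:
    """Report what a candidate threshold would have done on past cases."""
    if not cases:
        raise ValueError("need at least one case")
    autonomous = [agreed for conf, agreed in cases if conf >= threshold]
    escalated = len(cases) - len(autonomous)
    return {
        # Share of cases a human would have had to review.
        "escalation_rate": escalated / len(cases),
        # Agreement with humans on the cases the agent would have handled alone.
        "autonomous_agreement": (
            sum(autonomous) / len(autonomous) if autonomous else None
        ),
    }


# Illustrative data only, not real measurements.
history = [(0.95, True), (0.90, True), (0.75, False),
           (0.72, True), (0.60, False), (0.50, True)]
for t in (0.6, 0.7, 0.8):
    print(t, evaluate_threshold(history, t))
```

Sweeping a few candidate thresholds this way makes the trade-off explicit: raising the threshold improves agreement on the autonomous cases but increases the share a human must review.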

Q: Doesn't Pattern 3 mean some bad decisions slip through?
A: Yes. That's why it's only right when individual case stakes are low. The aggregate quality control comes from acting on the audit findings.

Q: How does this relate to the compliance pillar?
A: Pillar P4 covers what regulators require. This spoke covers what patterns work in practice. Both inform the deployment posture.

