Operator · Moderate · Advanced · 9 min

Build an Incident-Triage Agent for Operators

Halt reads every incoming alert, classifies severity, opens the right Slack thread, and pages the on-call only when it actually matters.

Every alert classified within seconds — sev-1 pages, sev-3 logs
One Slack thread per incident, with all related events cross-linked
A daily 5 PM summary so the team doesn't miss a slow burn
A starting point you can clone in two clicks instead of seven


Create the agent
From the AgentsBooks dashboard click + New Agent. Pick the Custom Agent preset on the wizard's first card, then on step two enter:
- Name: Halt
- Role: On-call Triage
Halt is the worked example throughout this playbook. We gave it a one-syllable name because half the time you'll be typing it on your phone at 2 AM.

Click ✨ Create Agent. The empty profile hub opens — eight cards waiting to be wired.
Tip. One-syllable names beat clever ones at 2 AM. Halt, not IncidentResponderAgentV2.
Personal: persona and voice
Open the Personal card. Triage agents fail two ways — too noisy, or too calm during a real fire. Personality keeps Halt on the right side of that line. Set:
- Traits: calm under pressure, precise, evidence-driven
- Communication style: terse incident-report cadence with cited timestamps
- Tone (default): concise operational
- Voice ID: halt-grave · Provider: elevenlabs · Pace: measured · Pitch: low
Three traits is the sweet spot. The voice block matters because Halt narrates the daily 5 PM summary in the team Slack huddle.
Tip. Pick three traits, not seven. The LLM averages a long list into mush.
Brain: model and system prompt
Open Brain. Pick a low-temperature reasoning model — we use claude-sonnet-4-6 at temperature 0.2 — and paste the four-rule system prompt:
```
You are Halt, an incident triage agent. Always classify severity using the
runbook matrix — never guess. Open exactly one Slack thread per incident;
cross-link related events from long-term memory. Page the on-call human
only for sev-1 or repeated sev-2. Refuse to auto-resolve any incident
that affects authentication, billing, or data integrity.
```
Low temperature matters here. You want Halt to follow the matrix, not improvise. The four rules are a contract — every behaviour in the loop maps back to one of them.
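To make the contract concrete, here is a minimal Python sketch of rules 1, 3, and 4 as plain functions. It is not how AgentsBooks evaluates the prompt. The matrix entries, the tag names, and the reading of "repeated sev-2" as "at least one prior sev-2 for the same service" are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical severity matrix: (service, signal) -> severity.
# In Halt this lives in the Knowledge card; here it is a plain dict.
SEVERITY_MATRIX = {
    ("auth", "login_error_rate_high"): 1,
    ("billing", "charge_failures"): 1,
    ("web-edge", "p99_latency_high"): 2,
    ("data-pipeline", "job_delayed"): 3,
}

# Rule 4: anything tagged with one of these must go to a human.
PROTECTED_TAGS = {"auth", "billing", "data-integrity"}


@dataclass
class Alert:
    service: str
    signal: str
    tags: frozenset = frozenset()


def classify(alert: Alert) -> Optional[int]:
    """Rule 1: look severity up in the matrix; None means 'ask a human', never guess."""
    return SEVERITY_MATRIX.get((alert.service, alert.signal))


def should_page(severity: Optional[int], prior_sev2_count: int) -> bool:
    """Rule 3: page for sev-1, or for a sev-2 that has happened before (assumed threshold: 1)."""
    if severity == 1:
        return True
    return severity == 2 and prior_sev2_count >= 1


def can_auto_resolve(alert: Alert) -> bool:
    """Rule 4: never auto-resolve auth, billing, or data-integrity incidents."""
    return not (alert.tags & PROTECTED_TAGS)


a = Alert("auth", "login_error_rate_high", frozenset({"auth"}))
print(classify(a), should_page(classify(a), 0), can_auto_resolve(a))  # 1 True False
```

Keeping the rules as separate, boring functions mirrors the prompt: each behaviour traces back to exactly one rule, which makes a misfire easy to attribute.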
Tip. Save this prompt to a snippet so future triage agents inherit the same four rules.
Knowledge: runbooks and severity matrix
Open Knowledge and add three texts plus two URLs. Halt retrieves from this on every classification call — this is what keeps the page rate low.

Add at minimum:
- A per-service runbook (auth, billing, data-pipeline, web-edge — one section each)
- A severity matrix with explicit sev-1, sev-2, sev-3 rules
- The last 25 post-mortems so Halt cites prior precedent
- URL: your status page archive (hourly refresh)
- URL: the internal runbook docs (daily refresh)
No runbook yet? Paste a one-page placeholder with the three severity tiers and one detection signal per service. Halt will work with that and ask clarifying questions on the first sev-1.
Tip. Keep each runbook section under 800 words. Long runbooks confuse retrieval — chunk by service.
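If your runbook is one long Markdown file, a small script can split it by service and flag sections over the 800-word budget before you upload. A sketch, assuming each service section starts with a `## <service>` heading; that layout is an assumption about your file, not an AgentsBooks requirement.

```python
import re

MAX_WORDS = 800  # per-section budget from the tip


def chunk_by_service(runbook: str) -> dict[str, str]:
    """Split a runbook into one chunk per '## <service>' heading (### subsections stay inside)."""
    chunks: dict[str, str] = {}
    current = None
    lines: list[str] = []
    for line in runbook.splitlines():
        m = re.match(r"^##\s+(\S.*)$", line)
        if m:
            if current is not None:
                chunks[current] = "\n".join(lines).strip()
            current, lines = m.group(1).strip(), []
        elif current is not None:
            lines.append(line)
    if current is not None:
        chunks[current] = "\n".join(lines).strip()
    return chunks


def oversized(chunks: dict[str, str], limit: int = MAX_WORDS) -> list[str]:
    """Services whose section exceeds the word budget."""
    return [svc for svc, body in chunks.items() if len(body.split()) > limit]


sample = "## auth\nCheck login error rate.\n## billing\n" + "word " * 900
chunks = chunk_by_service(sample)
print(sorted(chunks))     # ['auth', 'billing']
print(oversized(chunks))  # ['billing']
```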
Memory: a long-term store
Open Memory and add a long-term store:
- Name: incident-history
- Type: vector_db
- Default: ✅ on
- Purpose (in config): Per-service incident archive with severity, root cause, time-to-resolve. Source of pattern detection across alerts.
Memory is the difference between triage and noise. Knowledge is what Halt knows about the matrix; memory is what he remembers about your specific systems. Combined with the daily 5 PM summary task, this is what surfaces repeat offenders before they become a sev-1.
Tip. Cross-link new events to prior incidents when the service plus error class matches — that's how you catch slow burns.
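The matching rule in the tip is simple enough to sketch directly. This Python version uses a plain list in place of the vector_db store and exact matching on service and error class. The `Incident` fields and the 30-day look-back window are hypothetical choices, not AgentsBooks settings.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    incident_id: str
    service: str
    error_class: str
    opened_at: datetime
    slack_thread: str  # hypothetical reference to the incident's Slack thread


def related_incidents(history, service, error_class, now, window=timedelta(days=30)):
    """Prior incidents with the same service and error class inside the window, newest first."""
    matches = [
        inc for inc in history
        if inc.service == service
        and inc.error_class == error_class
        and now - inc.opened_at <= window
    ]
    return sorted(matches, key=lambda inc: inc.opened_at, reverse=True)


now = datetime(2024, 5, 1, 12, 0)
history = [
    Incident("INC-1", "billing", "timeout", now - timedelta(days=3), "thread-a"),
    Incident("INC-2", "billing", "timeout", now - timedelta(days=45), "thread-b"),
    Incident("INC-3", "auth", "timeout", now - timedelta(days=1), "thread-c"),
]
links = related_incidents(history, "billing", "timeout", now)
print([inc.incident_id for inc in links])  # ['INC-1']
```

A growing list of links for the same service and error class is the slow-burn signal: several small incidents that individually never crossed the paging line.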
Heart: scheduled and webhook trigger
Open Heart and create one task with two triggers:
- Name: Yesterday's incident summary
- Trigger 1: Schedule · Cron 0 17 * * * · Timezone America/New_York
- Trigger 2: Webhook · Path /incidents/inbound
- Prompt: On schedule: pull the last 24h of incidents from incident-history, summarise sev-1+sev-2 with cause and resolution, post to #incidents Slack. On webhook: classify severity from payload, open a Slack thread, write to incident-history, page the on-call if sev-1 or a repeated sev-2.
- Memory namespace: incident-history · Read+Write: ✅
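Assuming the scheduler evaluates the cron expression in the configured timezone, the Timezone field is what keeps the summary at 5 PM local time across daylight-saving changes: `0 17 * * *` in America/New_York lands at 22:00 UTC in winter and 21:00 UTC in summer. A standard-library sketch of computing the next run, assuming Python 3.9+ with a system time-zone database available:

```python
from datetime import datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo

NY = ZoneInfo("America/New_York")


def next_daily_run(now: datetime, at: time = time(17, 0), tz: ZoneInfo = NY) -> datetime:
    """Next firing of a daily cron such as '0 17 * * *' evaluated in tz.

    `now` must be timezone-aware; the result is aware and expressed in tz.
    """
    local_now = now.astimezone(tz)
    candidate = datetime.combine(local_now.date(), at, tzinfo=tz)
    if candidate <= local_now:
        # Today's slot has passed (or is exactly now): move to tomorrow.
        candidate = datetime.combine(local_now.date() + timedelta(days=1), at, tzinfo=tz)
    return candidate


winter = datetime(2024, 1, 15, 23, 30, tzinfo=timezone.utc)  # 18:30 EST
summer = datetime(2024, 7, 15, 20, 0, tzinfo=timezone.utc)   # 16:00 EDT
print(next_daily_run(winter).isoformat())  # 2024-01-16T17:00:00-05:00
print(next_daily_run(summer).isoformat())  # 2024-07-15T17:00:00-04:00
```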
Dual triggers are the move here. The schedule is the daily reflective summary; the webhook is the real-time loop. One task, two doors in.
Tip. Putting both triggers on one task keeps the prompt and memory in sync. Don't split the schedule and webhook into two tasks.
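One way to picture "one task, two doors in" is a single handler that branches on the trigger but reads and writes the same memory. In this hypothetical Python sketch a list stands in for the incident-history namespace, incoming events arrive already classified, and the field names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def run_task(trigger: str, memory: list, now: datetime, payload: Optional[dict] = None) -> dict:
    """One task, two triggers, one shared memory (a list stands in for incident-history)."""
    if trigger == "webhook":
        # Real-time door: record the event so the schedule door can see it later.
        if payload is None:
            raise ValueError("webhook trigger needs a payload")
        memory.append({"at": now, **payload})
        return {"action": "recorded", "severity": payload["severity"]}
    if trigger == "schedule":
        # Reflective door: sev-1 and sev-2 events from the last 24 hours.
        since = now - timedelta(hours=24)
        recent = [e for e in memory if e["at"] >= since and e["severity"] in (1, 2)]
        return {"action": "summary", "incidents": recent}
    raise ValueError(f"unknown trigger: {trigger}")


# Usage: two webhook events, then a later schedule run reads the same memory.
memory: list = []
t0 = datetime(2024, 5, 1, 9, 0, tzinfo=timezone.utc)
run_task("webhook", memory, t0, {"service": "billing", "severity": 1})
run_task("webhook", memory, t0 + timedelta(hours=1), {"service": "web-edge", "severity": 3})
summary = run_task("schedule", memory, t0 + timedelta(hours=8))
print([e["service"] for e in summary["incidents"]])  # ['billing']
```

With two separate tasks you would maintain two memory configurations that can silently drift apart; here there is only one list to point at.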
Outcome: Halt goes live
Every card you configured above is now wired. Open Halt's profile hub: each of those sections shows a green check. Hit Publish.

What you have:
- Webhook intake at /incidents/inbound: your alerting stack POSTs every event here and Halt classifies it within seconds.
- One Slack thread per incident in #incidents, with related events cross-linked from incident-history.
- Daily 5 PM summary of the last 24 hours, surfacing sev-1 and sev-2 with cause and resolution.
- Auth, billing, and data-integrity events routed to humans only — Halt refuses to auto-resolve them.
- A starting point you can clone with the button on this playbook page — your triage agent in two clicks instead of seven.
Tip. Point your alerting stack at the webhook URL the same day you publish. Halt only triages what he sees.
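Before wiring your alerting stack, you can smoke-test the endpoint by POSTing a fake event yourself. A standard-library sketch: the host below is a placeholder for your own webhook URL, and the payload fields are hypothetical, since this playbook does not specify a schema.

```python
import json
import urllib.request

# Placeholder: substitute the full webhook URL for your published agent.
WEBHOOK_URL = "https://example.invalid/incidents/inbound"


def build_alert_request(alert: dict, url: str = WEBHOOK_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST to the triage webhook."""
    body = json.dumps(alert).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_alert_request({
    "service": "auth",
    "signal": "login_error_rate_high",
    "summary": "login error rate above threshold",
})
print(req.get_method(), req.full_url)  # POST https://example.invalid/incidents/inbound
# To actually send it: urllib.request.urlopen(req, timeout=10)
```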
