Build an Incident-Triage Agent for Operators
Halt reads every incoming alert, classifies severity, opens the right Slack thread, and pages the on-call only when it actually matters.
- Every alert classified within seconds — sev-1 pages, sev-3 logs
- One Slack thread per incident, with all related events cross-linked
- A daily 5 PM summary so the team doesn't miss a slow burn
- A starting point you can clone in two clicks instead of seven
Create the agent
Profile · Create
From the AgentsBooks dashboard click + New Agent. Pick the Custom Agent preset on the wizard's first card, then on step two enter:
- Name: Halt
- Role: On-call Triage
Halt is our worked example. The playbook teaches you how to build an incident-triage agent, and we use a one-syllable name because half the time you're naming it from your phone at 2 AM.
Click ✨ Create Agent. The empty profile hub opens — eight cards waiting to be wired.
Personal: persona and voice
Personal
Open the Personal card. Triage agents fail two ways — too noisy, or too calm during a real fire. Personality keeps Halt on the right side of that line. Set:
- Traits: calm under pressure, precise, evidence-driven
- Communication style: terse incident-report cadence with cited timestamps
- Tone (default): concise operational
- Voice ID: halt-grave · Provider: elevenlabs · Pace: measured · Pitch: low
Three traits is the sweet spot. The voice block matters because Halt narrates the daily 5 PM summary in the team Slack huddle.
Brain: model and system prompt
Brain
Open Brain. Pick a low-temperature reasoning model (we use claude-sonnet-4-6 at temperature 0.2) and paste the four-rule system prompt:
You are Halt, an incident triage agent. Always classify severity using the runbook matrix; never guess. Open exactly one Slack thread per incident; cross-link related events from long-term memory. Page the on-call human only for sev-1 or repeated sev-2. Refuse to auto-resolve any incident that affects authentication, billing, or data integrity.
Low temperature matters here. You want Halt to follow the matrix, not improvise. The four rules are a contract: every behaviour in the loop maps back to one of them.
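To make the last two rules of the prompt concrete, here is a minimal Python sketch of the decisions they describe. It is an illustration only, not how AgentsBooks or Halt evaluates the prompt. The `Incident` fields, the area labels, and the threshold for what counts as a "repeated" sev-2 are all assumptions made for the example.

```python
from dataclasses import dataclass

# Areas the system prompt says must never be auto-resolved.
# The exact label strings are hypothetical.
PROTECTED_AREAS = {"authentication", "billing", "data-integrity"}

# Assumption: two or more sev-2 alerts on one incident count as "repeated".
REPEATED_SEV2_THRESHOLD = 2


@dataclass
class Incident:
    severity: int               # 1, 2, or 3, as classified from the runbook matrix
    area: str                   # hypothetical label, e.g. "billing" or "web-edge"
    sev2_occurrences: int = 0   # sev-2 alerts linked to this incident so far


def should_page(incident: Incident) -> bool:
    """Page the on-call human only for sev-1 or repeated sev-2."""
    if incident.severity == 1:
        return True
    return (
        incident.severity == 2
        and incident.sev2_occurrences >= REPEATED_SEV2_THRESHOLD
    )


def may_auto_resolve(incident: Incident) -> bool:
    """Refuse auto-resolution for auth, billing, and data-integrity incidents."""
    return incident.area not in PROTECTED_AREAS
```

Writing the rules out this way is a quick check that each one is unambiguous. If you cannot say what "repeated" means in code, the prompt probably needs the number spelled out too.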
Knowledge: runbooks and severity matrix
Knowledge
Open Knowledge and add three texts plus two URLs. Halt retrieves from this on every classification call — this is what keeps the page rate low.
Upload at minimum:
- A per-service runbook (auth, billing, data-pipeline, web-edge — one section each)
- A severity matrix with explicit sev-1, sev-2, sev-3 rules
- The last 25 post-mortems so Halt cites prior precedent
- URL: your status page archive (hourly refresh)
- URL: the internal runbook docs (daily refresh)
No runbook yet? Paste a one-page placeholder with the three severity tiers and one detection signal per service. Halt will work with that and ask clarifying questions on the first sev-1.
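If you are starting from nothing, the one-page placeholder could be generated from a structure like the one below. This is a sketch under stated assumptions: the tier descriptions and detection signals are invented for illustration and should be replaced with your own, and only the four service names come from the runbook list above.

```python
# Hypothetical placeholder content: tier wording and signals are illustrative only.
SEVERITY_TIERS = {
    "sev-1": "customer-facing outage or data at risk; page on-call",
    "sev-2": "degraded service with a workaround; thread only unless repeated",
    "sev-3": "minor or cosmetic; log only",
}

# One detection signal per service, as the placeholder calls for.
DETECTION_SIGNALS = {
    "auth": "login error rate",
    "billing": "failed charge rate",
    "data-pipeline": "ingest lag",
    "web-edge": "5xx rate",
}


def render_placeholder() -> str:
    """Build the plain-text page to paste into the Knowledge card."""
    lines = ["Severity tiers"]
    lines += [f"- {tier}: {rule}" for tier, rule in SEVERITY_TIERS.items()]
    lines.append("Detection signals")
    lines += [f"- {service}: {signal}" for service, signal in DETECTION_SIGNALS.items()]
    return "\n".join(lines)


print(render_placeholder())
```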
Memory: a long-term store
Memory
Open Memory and add a long-term store:
- Name: incident-history
- Type: vector_db
- Default: ✅ on
- Purpose (in config): Per-service incident archive with severity, root cause, time-to-resolve. Source of pattern detection across alerts.
Memory is the difference between triage and noise. Knowledge is what Halt knows about the matrix; memory is what he remembers about your specific systems. Combined with the daily 5 PM summary task, this is what surfaces repeat offenders before they become a sev-1.
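As a rough picture of what "surfacing repeat offenders" means, the sketch below counts recurring service and root-cause pairs in a list of incident records. It runs over plain Python objects, not the actual incident-history store. The record fields mirror the purpose string above, but their names and the `min_count` threshold are assumptions.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class IncidentRecord:
    # Hypothetical field names modelled on the store's stated purpose.
    service: str
    severity: int
    root_cause: str
    minutes_to_resolve: int


def repeat_offenders(records, min_count=3):
    """Return ((service, root_cause), count) pairs seen at least min_count times.

    Results are ordered from most to least frequent.
    """
    counts = Counter((r.service, r.root_cause) for r in records)
    return [(pair, n) for pair, n in counts.most_common() if n >= min_count]
```

A pair that keeps showing up at sev-3 is the slow burn the daily summary is meant to catch before it escalates.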
Heart: scheduled and webhook trigger
Heart
Open Heart and create one task with two triggers:
- Name: Yesterday's incident summary
- Trigger 1: Schedule · Cron 0 17 * * * · Timezone America/New_York
- Trigger 2: Webhook · Path /incidents/inbound
- Prompt: On schedule: pull last 24h of incidents from incident-history, summarise sev-1+sev-2 with cause and resolution, post to #incidents Slack. On webhook: classify severity from payload, open a Slack thread, write to incident-history, page on-call if sev-1.
- Memory namespace: incident-history · Read+Write: ✅
Dual triggers are the move here. The schedule is the daily reflective summary; the webhook is the real-time loop. One task, two doors in.
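On the webhook side, your alerting stack needs to POST each event to the task's path. The sketch below builds such a request with Python's standard library. Only the /incidents/inbound path comes from the task above. The base URL, the JSON payload shape, and every field name are assumptions for illustration, so check what your AgentsBooks workspace actually expects before wiring this in.

```python
import json
import urllib.request

WEBHOOK_PATH = "/incidents/inbound"  # path configured on the Heart task


def build_alert_request(base_url: str, alert: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST for one alert event."""
    body = json.dumps(alert).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + WEBHOOK_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host and payload; the field names are illustrative only.
request = build_alert_request(
    "https://agents.example.com",
    {
        "service": "billing",
        "signal": "failed charge rate",
        "value": 0.31,
        "observed_at": "2025-01-01T02:14:00Z",
    },
)

# urllib.request.urlopen(request) would send it; it is left out here so the
# sketch has no network side effects.
```

The cron expression 0 17 * * * on the other trigger reads as minute 0, hour 17, every day, which is 5 PM in the configured America/New_York timezone.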
Outcome: Halt goes live
Outcome
All seven cards are wired. Open Halt's profile hub — every section now shows a green check. Hit Publish.
What you have:
- Webhook intake at /incidents/inbound: your alerting stack POSTs every event here, and Halt classifies it in seconds.
- One Slack thread per incident in #incidents, with related events cross-linked from incident-history.
- Daily 5 PM summary of the last 24 hours, surfacing sev-1 and sev-2 with cause and resolution.
- Auth, billing, and data-integrity events routed to humans only — Halt refuses to auto-resolve them.
- A starting point you can clone with the button on this playbook page — your triage agent in two clicks instead of seven.