Operator Moderate Advanced 9 min

Build an Incident-Triage Agent for Operators

Halt reads every incoming alert, classifies severity, opens the right Slack thread, and pages the on-call only when it actually matters.

  • Every alert classified within seconds — sev-1 pages, sev-3 logs
  • One Slack thread per incident, with all related events cross-linked
  • A daily 5 PM summary so the team doesn't miss a slow burn
  • A starting point you can clone in two clicks instead of seven
  1. Create the agent

    Profile · Create
    Wizard step 2 with the Custom Agent preset, name Halt, role On-call Triage, ready to create.

    From the AgentsBooks dashboard click + New Agent. Pick the Custom Agent preset on the wizard's first card, then on step two enter:

    • Name: Halt
    • Role: On-call Triage

    Halt is our worked example; the same steps apply to any incident-triage agent you build. We chose a one-syllable name because half the time you'll be typing it on your phone at 2 AM.

    Click ✨ Create Agent. The empty profile hub opens — eight cards waiting to be wired.

  2. Personal: persona and voice

    Personal
    Personal card with Halt's traits, communication style, tone, and TTS voice configured.

    Open the Personal card. Triage agents fail in two ways: too noisy, or too calm during a real fire. Personality keeps Halt on the right side of that line. Set:

    • Traits: calm under pressure, precise, evidence-driven
    • Communication style: terse incident-report cadence with cited timestamps
    • Tone (default): concise operational
    • Voice ID: halt-grave · Provider: elevenlabs · Pace: measured · Pitch: low

    Three traits is the sweet spot. The voice block matters because Halt narrates the daily 5 PM summary in the team Slack huddle.

  3. Brain: model and system prompt

    Brain
    Brain card with claude-sonnet-4-6 selected, temperature 0.2, and the four-rule system prompt visible.

    Open Brain. Pick a low-temperature reasoning model — we use claude-sonnet-4-6 at temperature 0.2 — and paste the four-rule system prompt:

    You are Halt, an incident triage agent. Always classify severity using the
    runbook matrix — never guess. Open exactly one Slack thread per incident;
    cross-link related events from long-term memory. Page the on-call human
    only for sev-1 or repeated sev-2. Refuse to auto-resolve any incident
    that affects authentication, billing, or data integrity.
    

    Low temperature matters here. You want Halt to follow the matrix, not improvise. The four rules are a contract — every behaviour in the loop maps back to one of them.
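    Rules 3 and 4 are mechanical enough to double-check outside the model. The Python sketch below shows one hypothetical way to do that, assuming you run a deterministic guard on Halt's decision before any page or auto-resolve goes out; AgentsBooks may not expose such a hook. The `Incident` fields, the domain labels, and treating two sev-2 events on the same service as "repeated" are all assumptions.

```python
from dataclasses import dataclass

# Domains Halt must never auto-resolve (rule 4 of the system prompt).
# The label strings are hypothetical; use whatever your alerts carry.
PROTECTED_DOMAINS = {"authentication", "billing", "data-integrity"}

# Hypothetical threshold: sev-2 events on one service that count as "repeated".
REPEATED_SEV2_THRESHOLD = 2


@dataclass
class Incident:
    service: str
    severity: int            # 1, 2, or 3, taken from the runbook matrix
    domains: frozenset       # areas the incident touches, e.g. {"billing"}


def should_page(incident: Incident, recent_sev2_count: int) -> bool:
    """Rule 3: page a human only for sev-1 or a repeated sev-2.

    recent_sev2_count is the number of sev-2 events recorded for the same
    service in the lookback window, including this one.
    """
    if incident.severity == 1:
        return True
    return incident.severity == 2 and recent_sev2_count >= REPEATED_SEV2_THRESHOLD


def may_auto_resolve(incident: Incident) -> bool:
    """Rule 4: never auto-resolve anything touching a protected domain."""
    return not (incident.domains & PROTECTED_DOMAINS)


if __name__ == "__main__":
    login_outage = Incident("auth", 2, frozenset({"authentication"}))
    print(should_page(login_outage, recent_sev2_count=1))  # False: first sev-2
    print(may_auto_resolve(login_outage))                  # False: protected domain
```

Keeping the guard outside the prompt means a model slip cannot silently page someone or close a billing incident.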

  4. Knowledge: runbooks and severity matrix

    Knowledge
    Knowledge card with the runbooks, severity matrix, post-mortems, and two URL sources attached.

    Open Knowledge and add three texts plus two URLs. Halt retrieves from these sources on every classification call, which is what keeps the page rate low.

    Add at minimum:

    • A per-service runbook (auth, billing, data-pipeline, web-edge — one section each)
    • A severity matrix with explicit sev-1, sev-2, sev-3 rules
    • The last 25 post-mortems so Halt cites prior precedent
    • URL: your status page archive (hourly refresh)
    • URL: the internal runbook docs (daily refresh)

    No runbook yet? Paste a one-page placeholder with the three severity tiers and one detection signal per service. Halt will work with that and ask clarifying questions on the first sev-1.
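    To check that your matrix really is explicit, it can help to write it out as data. Below is a minimal Python sketch of the placeholder described above: three tiers and one detection signal per service, checked from most to least severe. Every signal name and threshold is invented for illustration, and Halt itself reads the matrix as Knowledge text rather than executing code. The point is that each value maps to exactly one tier, and an uncovered service raises an error instead of receiving a guessed severity.

```python
from typing import Optional

# Hypothetical placeholder matrix: one detection signal per service and a
# threshold per tier. sev-1 is the most severe; all numbers are illustrative.
SEVERITY_MATRIX = {
    "auth":          {"signal": "login_error_rate_pct",   "sev1": 20.0, "sev2": 5.0,  "sev3": 1.0},
    "billing":       {"signal": "failed_charges_per_min", "sev1": 50,   "sev2": 10,   "sev3": 1},
    "data-pipeline": {"signal": "lag_minutes",            "sev1": 120,  "sev2": 30,   "sev3": 10},
    "web-edge":      {"signal": "p99_latency_ms",         "sev1": 5000, "sev2": 1500, "sev3": 800},
}


def classify(service: str, signal: str, value: float) -> Optional[int]:
    """Return 1, 2, or 3, or None when the value is below every threshold.

    Raises LookupError when no rule covers this service and signal, so the
    caller escalates to a human instead of guessing a severity.
    """
    rule = SEVERITY_MATRIX.get(service)
    if rule is None or rule["signal"] != signal:
        raise LookupError(f"no severity rule for {service}/{signal}")
    for sev in (1, 2, 3):  # most severe tier first, so the first match wins
        if value >= rule[f"sev{sev}"]:
            return sev
    return None
```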

  5. Memory: a long-term store

    Memory
    Memory card with the incident-history vector store added and marked as default.

    Open Memory and add a long-term store:

    • Name: incident-history
    • Type: vector_db
    • Default: ✅ on
    • Purpose (in config): Per-service incident archive with severity, root cause, time-to-resolve. Source of pattern detection across alerts.

    Memory is the difference between triage and noise. Knowledge is what Halt knows about the matrix; memory is what it remembers about your specific systems. Combined with the daily 5 PM summary task, this is what surfaces repeat offenders before they become a sev-1.
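    Surfacing repeat offenders boils down to grouping past incidents by service and root cause within a time window. The Python sketch below illustrates that with a plain list standing in for the incident-history vector store. The record fields follow the Purpose line above; the seven-day window and the threshold of three are assumptions. A real vector store would also match root causes that are worded differently, which exact-string grouping cannot do.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class IncidentRecord:
    service: str
    severity: int               # 1 (worst) to 3, from the severity matrix
    root_cause: str
    opened_at: datetime         # timezone-aware
    time_to_resolve: timedelta


def repeat_offenders(records, now, window=timedelta(days=7), min_count=3):
    """Return sorted (service, root_cause) pairs seen at least min_count times in the window."""
    cutoff = now - window
    counts = Counter(
        (rec.service, rec.root_cause) for rec in records if rec.opened_at >= cutoff
    )
    return sorted(pair for pair, n in counts.items() if n >= min_count)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    history = [
        IncidentRecord("billing", 3, "stale cache", now - timedelta(days=d), timedelta(minutes=5))
        for d in (1, 2, 3)
    ]
    print(repeat_offenders(history, now))  # [('billing', 'stale cache')]
```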

  6. Heart: scheduled and webhook triggers

    Heart
    Heart card showing the Yesterday's incident summary task with both schedule and webhook triggers configured.

    Open Heart and create one task with two triggers:

    • Name: Yesterday's incident summary
    • Trigger 1: Schedule · Cron 0 17 * * * · Timezone America/New_York
    • Trigger 2: Webhook · Path /incidents/inbound
    • Prompt: On schedule: pull last 24h of incidents from incident-history, summarise sev-1+sev-2 with cause and resolution, post to #incidents Slack. On webhook: classify severity from payload, open a Slack thread, write to incident-history, page on-call if sev-1.
    • Memory namespace: incident-history · Read+Write
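    The cron expression `0 17 * * *` means minute 0, hour 17, every day: 5 PM. Because the trigger pins the timezone to America/New_York, that is 5 PM local time year-round, which lands at 21:00 UTC during daylight saving time and 22:00 UTC otherwise. The Python sketch below computes the next run for this single expression with the standard-library `zoneinfo` module (Python 3.9+). It is not a general cron parser, and it assumes the scheduler evaluates the expression in the configured timezone.

```python
from datetime import datetime, time, timedelta
from zoneinfo import ZoneInfo

NEW_YORK = ZoneInfo("America/New_York")
RUN_AT = time(hour=17, minute=0)  # "0 17 * * *": minute 0, hour 17, every day


def next_summary_run(after: datetime) -> datetime:
    """Next 17:00 America/New_York strictly after `after` (a timezone-aware datetime)."""
    local = after.astimezone(NEW_YORK)
    candidate = datetime.combine(local.date(), RUN_AT, tzinfo=NEW_YORK)
    if candidate <= local:
        # Today's 5 PM has already passed (or is now), so use tomorrow's.
        candidate = datetime.combine(local.date() + timedelta(days=1), RUN_AT, tzinfo=NEW_YORK)
    return candidate


if __name__ == "__main__":
    utc = ZoneInfo("UTC")
    print(next_summary_run(datetime(2024, 7, 1, 12, 0, tzinfo=utc)).astimezone(utc))
    # 2024-07-01 21:00:00+00:00 (EDT, UTC-4)
```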

    Dual triggers are the move here. The schedule is the daily reflective summary; the webhook is the real-time loop. One task, two doors in.
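    "One task, two doors" can be modelled as a single handler that branches on which trigger fired, with both branches sharing the same memory. The Python sketch below is illustrative only: the `Event` shape, the dictionary fields, and the `classify` callback are hypothetical, a plain list stands in for incident-history, and the handler returns a description of the action instead of calling Slack or a pager.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, Literal


@dataclass
class Event:
    trigger: Literal["schedule", "webhook"]      # which door the task was entered through
    received_at: datetime
    payload: dict = field(default_factory=dict)  # alert body; empty for the schedule


def handle(event: Event, history: list, classify: Callable[[dict], int]) -> dict:
    """Route one invocation of the task; `history` stands in for incident-history."""
    if event.trigger == "schedule":
        # Daily summary: sev-1 and sev-2 incidents opened in the last 24 hours.
        cutoff = event.received_at - timedelta(hours=24)
        recent = [
            inc for inc in history
            if inc["opened_at"] >= cutoff and inc["severity"] in (1, 2)
        ]
        return {"action": "post_summary", "channel": "#incidents", "incidents": recent}

    if event.trigger == "webhook":
        # Real-time loop: classify, write to memory, open a thread, page on sev-1.
        severity = classify(event.payload)
        record = {
            "service": event.payload.get("service", "unknown"),
            "severity": severity,
            "opened_at": event.received_at,
        }
        history.append(record)
        return {
            "action": "open_thread",
            "channel": "#incidents",
            "incident": record,
            "page_on_call": severity == 1,
        }

    raise ValueError(f"unknown trigger: {event.trigger!r}")
```

Because both branches read and write the same `history`, an alert that arrives at 3 PM shows up in that day's 5 PM summary without any extra wiring.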

  7. Outcome: Halt goes live

    Outcome
    Halt's profile hub with all seven cards configured, ready to publish.

    All seven cards are wired. Open Halt's profile hub — every section now shows a green check. Hit Publish.

    What you have:

    • Webhook intake at /incidents/inbound: your alerting stack POSTs every event here, and Halt classifies it within seconds.
    • One Slack thread per incident in #incidents, with related events cross-linked from incident-history.
    • Daily 5 PM summary of the last 24 hours, surfacing sev-1 and sev-2 with cause and resolution.
    • Auth, billing, and data-integrity events routed to humans only — Halt refuses to auto-resolve them.
    • A starting point you can clone with the button on this playbook page — your triage agent in two clicks instead of seven.
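    From the alerting side, intake is a single HTTP POST with a JSON body. The Python sketch below builds that request with the standard library without sending it. The base URL and the payload fields are placeholders: this playbook does not specify the full webhook URL, authentication, or payload schema, so use whatever AgentsBooks shows for Halt's webhook trigger.

```python
import json
import urllib.request

# Hypothetical base URL: substitute the webhook URL AgentsBooks shows for Halt.
HALT_WEBHOOK = "https://agents.example.com/incidents/inbound"


def build_alert_request(service: str, summary: str, fired_at: str) -> urllib.request.Request:
    """Build (but do not send) the POST an alerting stack would make for one event."""
    body = json.dumps({
        "service": service,    # hypothetical field names; match your alert source
        "summary": summary,
        "fired_at": fired_at,  # ISO 8601 timestamp, so Halt can cite it
    }).encode("utf-8")
    return urllib.request.Request(
        HALT_WEBHOOK,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_alert_request("auth", "login error rate above 20%", "2024-05-10T02:14:00Z")
    print(req.get_method(), req.full_url)
    # To actually deliver it: urllib.request.urlopen(req, timeout=10)
```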

Ready to build it?

Setup takes about as long as it took to read this page.
