This case study describes a generic managed-services platform serving B2B SMB customers. Specific firm, individuals, and customer cohort have been anonymised per AgentsBooks's privacy policy.
The starting state
The firm operated a customer-support team handling tickets from several thousand B2B SMB customers across a managed-services product (the specifics of the product don't matter for this case — what matters is the support shape).
Pre-deployment: 12 support agents handled ~800 tickets/day. Median time-to-first-response: 4 hours during business hours. Median time-to-resolution: 23 hours. Senior-engineer escalations: ~10% of tickets. CSAT: in the high-70s.
The growth target was 3× ticket volume in 18 months without proportional support-team growth. The starting CSAT was acceptable but the team was at saturation; new growth would degrade it.
The 8-primitive shape
The firm deployed three agents:
- Tier-1 resolver (Identity: `support-tier1`). Handles the routine ~70% of tickets end-to-end. Categorisation, knowledge-base lookup, draft reply, send. Heart: `event`, triggered on inbound ticket.
- Tier-2 escalator (Identity: `support-tier2`). When tier-1 isn't confident (per the HITL confidence-escalation pattern), routes the ticket: to a specialist human agent for hard cases, to engineering for product bugs, to billing for account issues. Heart: `A2A` from tier-1.
- Voice-of-customer synthesizer (Identity: `voc-synth`). Weekly digest of ticket themes for the product team. Heart: `schedule` (Mondays 6am).
The substrate supplied: the Knowledge primitive (the firm's KB + product docs), the Memory primitive (per-customer ticket history), the Friends graph (the routes between agents and humans), the Control primitive (Zendesk + Slack channels).
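The three agents and their triggers can be written down as plain data. The sketch below is illustrative only: the `AgentSpec` class, its field names, and the cron string are assumptions made for this example and are not the platform's actual schema. Only the identities and Heart kinds come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical description of one agent: who it is and what wakes it."""
    identity: str
    role: str
    heart: str    # trigger kind: "event", "A2A", or "schedule"
    trigger: str  # what fires the Heart


AGENTS = [
    AgentSpec("support-tier1", "Tier-1 resolver", "event", "inbound ticket"),
    AgentSpec("support-tier2", "Tier-2 escalator", "A2A", "support-tier1"),
    # "0 6 * * 1" is standard five-field cron syntax for Mondays at 06:00.
    AgentSpec("voc-synth", "Voice-of-customer synthesizer", "schedule", "0 6 * * 1"),
]


def agents_woken_by(heart: str) -> list[str]:
    """Return the identities of agents whose Heart is of the given kind."""
    return [a.identity for a in AGENTS if a.heart == heart]
```

Keeping the trigger next to the identity makes the wiring easy to audit: `agents_woken_by("A2A")` lists every agent that only runs when another agent calls it.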
What changed
After a 60-day shadow-mode period, followed by 60 days of human-in-the-loop operation and then fully autonomous tier-1:
| Metric | Pre | Post (6 mo) | Δ |
|---|---|---|---|
| Tickets/day handled | 800 | 2,400 | 3× (+200%) |
| Headcount | 12 | 11 | -1, by attrition |
| Median time-to-first-response | 4 h | 4 min | -98% (60× faster) |
| Median time-to-resolution | 23 h | 8 h | -65% |
| Senior-engineer escalations | 10% | 8% | -20% relative (-2 points) |
| CSAT | high-70s | low-80s | +mid-single-digits |
The firm hit its 3× volume target without proportional team growth. Natural attrition reduced the team by one; there were no layoffs. Three support agents re-skilled into the Agent Operator role and the Voice-of-Customer analysis role.
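The Δ column can be checked with a few lines of arithmetic. This is a minimal sketch using only the pre and post values from the table, with times converted to minutes; the function and dictionary names are chosen for this example.

```python
def relative_change(pre: float, post: float) -> float:
    """Relative change from pre to post, as a fraction (-0.65 means 65% lower)."""
    return (post - pre) / pre


# Values taken from the table; times are converted to minutes.
metrics = {
    "tickets_per_day": (800, 2400),
    "first_response_min": (4 * 60, 4),
    "resolution_min": (23 * 60, 8 * 60),
    "escalation_rate": (0.10, 0.08),
}

for name, (pre, post) in metrics.items():
    print(f"{name}: {relative_change(pre, post):+.0%}")
```

Note that a 3× volume is a +200% change, and that the escalation figure is a relative drop of 20%, which corresponds to 2 percentage points.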
What didn't work the first time
Two corrections:
- Initial tier-1 was too aggressive. It tried to resolve every ticket end-to-end, including ones it had low confidence in. CSAT briefly dropped in week 2 of shadow mode. Tuning the confidence threshold and routing low-confidence tickets to humans (Pattern 2 from the HITL spoke) recovered CSAT within a week.
- The Knowledge primitive was stale. The firm's KB hadn't been updated in 6 months, so the agent was citing outdated procedures. A 2-week Knowledge-refresh sprint, run before scaling tier-1 to autonomous, corrected that.
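The first correction amounts to a confidence gate in front of the send step. The following is a minimal sketch of that idea, not the firm's implementation: the `Draft` type, the `0.8` threshold, the category labels, and the queue names are all hypothetical and stand in for whatever the deployment actually uses.

```python
from dataclasses import dataclass


@dataclass
class Draft:
    """A tier-1 draft reply plus the agent's own confidence in it (0.0 to 1.0)."""
    ticket_id: str
    category: str      # hypothetical labels, e.g. "how-to", "product-bug", "billing"
    confidence: float


# Hypothetical threshold; in practice it would be tuned against CSAT and evals.
CONFIDENCE_THRESHOLD = 0.8

# Hypothetical mapping from ticket category to the tier-2 destination.
ESCALATION_ROUTES = {
    "product-bug": "engineering",
    "billing": "billing",
}


def route(draft: Draft, threshold: float = CONFIDENCE_THRESHOLD) -> str:
    """Send confident drafts automatically; hand everything else to a human queue."""
    if draft.confidence >= threshold:
        return "auto-send"
    # Low confidence: pick the queue by category, defaulting to a specialist human.
    return ESCALATION_ROUTES.get(draft.category, "specialist-human")
```

For example, `route(Draft("t-1", "billing", 0.4))` returns `"billing"`, while the same ticket at confidence `0.95` returns `"auto-send"`. Raising the threshold trades autonomous coverage for fewer bad replies, which is the tuning knob the first correction describes.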
The economics
- Token spend: ~$2,400/month at steady state. Heavy use of prompt caching (per the prompt-caching spoke) on the stable Knowledge context — cache hit rate stabilised around 82%.
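- Saved support headcount: substantial. At the pre-deployment rate of ~67 tickets per agent per day (800 tickets across 12 agents), the no-AI baseline would have required ~24 additional agents (~36 in total) at the new volume.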
- Payback period: ~5 months.
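The headcount baseline and the payback period both reduce to short formulas. The sketch below assumes per-agent throughput stays at its pre-deployment level, which is a simplification, and uses a basic payback definition (upfront cost divided by monthly net saving). The case study does not publish the upfront cost or the salary figures, so `payback_months` is shown without inputs from this case; both function names are chosen for this example.

```python
def baseline_headcount(agents: int, tickets_per_day: int, target_tickets_per_day: int) -> int:
    """Agents needed at the target volume if per-agent throughput stays fixed."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return (target_tickets_per_day * agents + tickets_per_day - 1) // tickets_per_day


def payback_months(upfront_cost: float, monthly_saving: float, monthly_run_cost: float) -> float:
    """Months until cumulative net savings cover the upfront cost."""
    net = monthly_saving - monthly_run_cost
    if net <= 0:
        raise ValueError("no payback: running cost meets or exceeds savings")
    return upfront_cost / net


# Pre-deployment figures from this case: 12 agents, 800 tickets/day, 3x target.
needed = baseline_headcount(12, 800, 2400)
print(needed, needed - 12)
```

With the figures from this case the linear baseline is 36 agents in total, 24 more than the original team of 12. In `payback_months`, the ~$2,400/month token spend would be one component of `monthly_run_cost`.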
What this case is not
It's not "AI replaces support". The team didn't shrink meaningfully — what changed was the team's shape. Three support agents became Agent Operators. The Tier-2 layer is still human-shaped for hard cases.
It's not a SaaS-only pattern. The same shape works for managed services, support-intensive B2B, marketplace customer-success, etc. The differentiator is whether the team has the engineering rigour to invest in eval + Knowledge maintenance.
FAQ
Q: How does this differ from out-of-the-box Intercom Fin / Zendesk AI?
A: Those are good products for the "tier-1 resolver" agent in isolation. The 8-primitive substrate adds: cross-agent A2A coordination, custom Knowledge with confidentiality classes, audit-grade per-decision four-tuples, model-routing flexibility, and a firm-level org chart that includes both humans and agents. The substrate isn't a competitor to Fin/Zendesk; it's a layer above where the firm composes them.
Q: What about CSAT regression risk?
A: Continuous evals against a held-out ticket set (per eval-driven routing) catch model-side regressions before they reach customers. The firm runs the eval weekly and on every model version change.
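A regression gate of this kind can be sketched in a few lines. This is an illustration under stated assumptions, not the firm's eval harness: scoring by exact match on a category label, the `0.01` tolerance, and the function names are all hypothetical.

```python
from typing import Callable, Sequence


def pass_rate(model: Callable[[str], str], tickets: Sequence[tuple[str, str]]) -> float:
    """Fraction of held-out tickets where the model's label matches the expected one."""
    if not tickets:
        raise ValueError("held-out set is empty")
    hits = sum(1 for text, expected in tickets if model(text) == expected)
    return hits / len(tickets)


def passes_gate(candidate_rate: float, baseline_rate: float, tolerance: float = 0.01) -> bool:
    """Allow rollout only if the candidate is no more than `tolerance` below baseline."""
    return candidate_rate >= baseline_rate - tolerance
```

Running `pass_rate` for both the current and the candidate model on the same held-out set, then calling `passes_gate`, gives a yes/no answer that can block a model version change before it reaches customers.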
Q: Can this replicate at smaller scale (a 2-person support team)?
A: Yes — at smaller scale the substrate's value comes more from amplifying the team than from coordination across agents. A 2-person team with one tier-1 agent often processes 5–10× their pre-deployment volume.