Mike is an experienced Product Manager who focuses on all the “non-development” areas of My AskAI, from finance and customer success to product design, copywriting, testing and more.
"85% containment," "85% deflection," and "85% resolution" are three different numbers measuring three different things. Most vendors sell them as if they're the same one.
Three vendors will pitch you "85% containment," "85% deflection," and "85% resolution." On the slide, all three look identical. None of them measures the same thing.
A contained ticket can be unresolved. A deflected ticket can be a customer who gave up. A resolved ticket can be one a vendor defines into existence (more on that below).
The labels get used interchangeably across vendor decks, analyst reports, and procurement spreadsheets. They shouldn't be.
This post is the decoder I wish CX leaders had before that meeting. Three terms, three formulas, three different events on the customer journey, and the one number I'd actually pin a weekly review to.
I'm Mike, co-founder of My AskAI. We help 200+ teams run AI customer service inside Zendesk, Intercom, HubSpot, Gorgias, and Freshdesk. We sit through procurement conversations every week where vendors quote 85% on metrics that don't even measure the same event.
The dissent on this is loud and getting louder. Nikola Mrksic (PolyAI's co-founder) wrote on LinkedIn recently: "We deflected 70% of all contact. That number is meaningless without another question: what happened next?" He's right.
Why "85% containment", "85% deflection", and "85% resolution" can all mean nothing
⚡
TL;DR: Containment, deflection, and resolution measure three different events. A vendor can report 85% on each and be describing three different things, only one of which is "the customer's problem got solved".
Pick a vendor at random. The metric they lead with is the one that flatters their product.
I've sat in enough procurement calls to spot the pattern: the terms get sold as synonyms because synonymy makes the headline numbers look comparable across pitches. They aren't comparable.
Take a concrete example I run into a lot. A vendor reports 85% containment. The support manager reviews the same conversations and finds that 30% of all conversations end with the customer abandoning the chat after asking for a human three times.
The vendor's containment number includes those abandonments. Reality: 85% containment, somewhere around 55% resolution, roughly 0% deflection (the underlying contact still happened).
Same conversations. Three different headlines.
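The arithmetic behind that example can be sketched in a few lines. This is a minimal illustration, assuming a made-up volume of 1,000 conversations and assuming the 30% abandonment share applies to all conversations; the variable names are hypothetical.

```python
# Illustrative numbers only: 1,000 conversations is an assumed volume.
total_conversations = 1000
transferred_to_human = 150   # left the AI channel
abandoned_in_channel = 300   # gave up after asking for a human; never transferred
solved_in_channel = total_conversations - transferred_to_human - abandoned_in_channel

# Containment counts everything that was not transferred, abandonments included.
containment_rate = 100 * (solved_in_channel + abandoned_in_channel) / total_conversations

# Resolution counts only the conversations where the problem was actually solved.
resolution_rate = 100 * solved_in_channel / total_conversations

# Deflection counts contacts that never happened; every one of these did happen.
contacts_prevented = 0
deflection_rate = 100 * contacts_prevented / total_conversations

print(containment_rate, resolution_rate, deflection_rate)  # 85.0 55.0 0.0
```

The only thing that changes between the three lines is what goes in the numerator.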
Figure: three stat columns each showing 85% with different labels: containment (in-channel only), deflection (ticket prevented), and resolution (problem solved). The same headline number measures three different events.
The vendors themselves quietly acknowledge this, which I always find telling. Decagon's own containment glossary admits it: "a high containment rate isn't automatically good." Intercom Fin's deflection glossary is even more direct: "A high deflection rate could mean excellent automation, or it could mask customer dissatisfaction and eroded trust."
Mavenoid calls out the underlying incentive too: basic chatbots are "incentivized to show off a higher deflection rate" whether or not anything actually gets solved.
None of this is fringe. Calabrio's own containment-rate piece cites Salesforce data showing the share of companies tracking containment climbed from 36% in 2018 to 67% in 2022. More tracking, same problem.
The fix isn't to throw out the metrics; I'm not anti-metric. The three names describe three real events, and the fix is to know which event each one measures and to stop pretending they're the same number.
The Three-Event Decoder: containment, deflection, resolution
⚡
TL;DR: Deflection fires before a ticket opens, containment fires while it stays in-channel, and resolution fires when the problem actually ends. Each has its own formula and its own blind spot.
The three terms aren't synonyms, and I think that's where most of the confusion starts. They're sequential events on the customer journey:
Deflection fires before the customer ever opens a ticket.
Containment fires after a ticket opens but stays inside the AI or IVR channel.
Resolution fires when (and only when) the customer's problem actually ends.
Each vendor leads with the event that flatters their own product. I'll walk through the decoder one event at a time.
Figure: a three-step flow showing the sequence deflection (before a ticket opens), containment (while it stays in the AI channel), and resolution (when the problem actually ends).
Containment rate: "did the AI keep the conversation in-channel?"
Containment measures what share of conversations that started in the AI or IVR channel ended in that channel without being transferred to a human. Decagon's formula is the cleanest public version: "(Number of interactions completed entirely within the chatbot ÷ Total number of chatbot interactions) × 100." Their illustrative example: 800 contained out of 1,000 conversations is an 80% containment rate.
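As a quick sketch, the quoted formula translates directly into code. The function name and the zero-total guard are my own additions, not part of Decagon's glossary.

```python
def containment_rate(contained: int, total: int) -> float:
    """Percentage of chatbot interactions completed without a transfer."""
    if total <= 0:
        raise ValueError("total must be a positive number of interactions")
    return 100 * contained / total

# The illustrative example from the quoted formula: 800 contained out of 1,000.
print(containment_rate(800, 1000))  # 80.0
```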
The metric is inherited from IVR (fun fact: containment is older than chatbots by a couple of decades). Calabrio traces the genealogy directly: containment is "often referred to as 'deflection' or 'case deflection,' a term that stems from the contact center."
The contact-center world used containment for decades to mean "the IVR kept the call away from an agent". Chatbot vendors carried the metric over almost unchanged, which is why containment and deflection get used interchangeably in older industry writing.
What containment misses is the thing I actually care about: whether the customer's problem got solved. A frustrated customer who gave up is contained, and so is a customer who got the right answer.
Same headline number, two completely different realities.
Calabrio's own piece is honest about this. They name four failure modes that look like containment success: customer abandonment without resolution, customers escalating to agents after "completing" tasks, bot failures to interpret the customer correctly, and false positives leading to irrelevant responses. (We'd add a fifth: the customer who got part of the answer and is now waiting on an email reply, which the AI counts as contained because nothing got transferred.)
Honest benchmark range: 50-70% containment for general-purpose chatbots; 75-90% for narrow single-channel bots (order status, password reset). Calabrio reports that companies "typically aim for in the range of 70, 80, or even 90%", but the same piece is clear that "aim for" isn't "achieve in a way that's worth bragging about".
The trap here is well-named by Kashkar's "Rethinking Containment Metrics in Call Centers": chasing containment as a primary metric pushes teams to suppress handoff, exactly the wrong behavior when handoff is the right answer for the ticket.
Deflection rate: "did we prevent the ticket at all?"
Deflection measures what share of would-have-been-tickets never opened because the customer found their answer earlier in the funnel: self-service, the help center, a chatbot before contact. Mavenoid's definition is the closest I've seen to the operator-honest version: "the percentage of support requests addressed by self-service or self-help tools that agents would otherwise service."
The structural problem with deflection is the denominator: it's unobservable. You can't directly count the tickets that didn't happen, so vendors approximate the baseline in three different ways.
Some use a historical-volume baseline (last quarter we had X tickets; this quarter 0.7X, so we deflected 30%). Some use widget-engagement attribution (the customer viewed the suggested article and didn't open a ticket within 24 hours). Some use a post-chat survey ("did this answer prevent you from contacting support?" tick-box).
None of these are comparable across vendors.
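To make the first approximation concrete, here is a minimal sketch of the historical-volume baseline. The function name and the 1,000-ticket baseline are hypothetical; only the X to 0.7X ratio comes from the example above.

```python
def deflection_from_baseline(baseline_tickets: int, current_tickets: int) -> float:
    """Estimated deflection as the percentage drop from a historical baseline."""
    if baseline_tickets <= 0:
        raise ValueError("baseline_tickets must be positive")
    return 100 * (baseline_tickets - current_tickets) / baseline_tickets

# Last quarter X = 1,000 tickets (assumed), this quarter 0.7X = 700.
print(deflection_from_baseline(1000, 700))  # 30.0
```

Note what the calculation assumes: every ticket that didn't arrive is credited to deflection, so seasonality, churn, or a quieter release cycle all show up as "deflected" too.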
What deflection doesn't see is whether the customer's problem got solved or whether they gave up. That's Mrksic's point ("that number is meaningless without another question: what happened next?"), and it's the blind spot the metric has carried since it was invented for IVRs.
Mavenoid has the killer stat on this. They cite research showing "30% of customers start their support journeys in self-service, but only 25% of cases get resolved by self-service."
The deflection-resolution gap in a single number: about a sixth of the people who started in the self-service funnel got dropped somewhere in the middle, and a deflection-only metric counts them as wins.
Honest benchmark range: it varies widely because the denominator does. For transactional ecommerce deflecting at the top of the funnel through a help center or chatbot, 20-40% is realistic. For higher-judgment categories (medical, legal, fintech), much lower.
The trap with deflection is one our industry has been openly arguing about for a year. Mrksic's post and the Reddit r/SaaS founder thread on a 39.5% deflection rate in week one both make the same argument from different directions: deflection is the easiest of the three numbers to game and the least connected to whether the customer was actually helped.
Resolution rate: "did the customer's problem actually end?"
Resolution measures what share of conversations the AI handled where the issue stayed solved. Of the three metrics, it sits closest to what the customer actually cares about, and its definition is the one vendors fight over hardest.
Six vendors, six different definitions. We pull these together in our pricing-models comparison post, and they're worth restating because the definitional spread is the whole game:
Fin (Intercom): "Fin resolves the issue end to end, or successfully executes a Procedure you've configured to end in a handoff to a human or a workflow." Procedure-handoffs were added to the billable surface in late 2025 when Fin renamed "resolutions" to "outcomes".
Sierra: counts "a resolved support conversation, a saved cancellation, an upsell, a cross-sell", which are outcomes the customer agrees to during the conversation.
Decagon: an internal "AI resolved" tag (criteria non-public, which makes it hard to audit from the outside).
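Zendesk Autoresolve: a 72-hour quiet period, so the charge fires only when the ticket stays closed for 72 hours.
Gorgias Automate: counts a resolution when the customer doesn't reply.
My AskAI: counts an AI reply followed by either explicit confirmation or no reopen.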
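To show why the definitional spread matters, here is a rough sketch that scores the same four made-up conversations under three simplified definitions. These predicates are hypothetical stand-ins I wrote for illustration; they are not any vendor's actual billing logic.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Conversation:
    customer_confirmed: bool             # customer explicitly said the issue was solved
    handed_to_human: bool                # AI passed the conversation to an agent
    hours_until_reopen: Optional[float]  # None means the customer never came back

conversations = [
    Conversation(True, False, None),    # confirmed and stayed closed
    Conversation(False, False, None),   # customer went quiet and never returned
    Conversation(False, False, 20.0),   # customer went quiet, then reopened within a day
    Conversation(False, True, None),    # escalated to a human
]

def quiet_period(c: Conversation, hours: float = 72) -> bool:
    """Resolved if not handed off and not reopened within the quiet period."""
    return not c.handed_to_human and (
        c.hours_until_reopen is None or c.hours_until_reopen > hours
    )

def explicit_confirmation(c: Conversation) -> bool:
    """Resolved only if the customer confirmed and no human was involved."""
    return c.customer_confirmed and not c.handed_to_human

def no_handoff(c: Conversation) -> bool:
    """Resolved whenever the AI did not hand off, whatever happened next."""
    return not c.handed_to_human

def resolution_rate(is_resolved: Callable[[Conversation], bool]) -> float:
    return 100 * sum(is_resolved(c) for c in conversations) / len(conversations)

rates = {
    "quiet_period": resolution_rate(quiet_period),
    "explicit_confirmation": resolution_rate(explicit_confirmation),
    "no_handoff": resolution_rate(no_handoff),
}
print(rates)  # {'quiet_period': 50.0, 'explicit_confirmation': 25.0, 'no_handoff': 75.0}
```

Same conversations, three "resolution rates" from 25% to 75%, and nothing about the AI changed between them.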
What resolution doesn't see is customer satisfaction with the path taken. A fast resolution that felt frustrating is still a resolution. CSAT pairs with resolution rate for exactly that reason: the two together catch what either misses alone.
Honest benchmark range, tier-banded from rollouts we've watched:
25-40%: early rollout, knowledge base not yet optimized. (Edel Optics started at 25-30%; TravelJoy at 24% with Zendesk's own AI; RecruitCRM at around 35%.)
40-65%: knowledge base optimized, common questions covered.
65-80%: APIs connected, custom answers in place, guidance tuned.
80%+: mature deployment, weekly review of where the AI struggled. (TravelJoy at 80%; Edel Optics sits just under the line at 75-79%.)
For comparison, Mavenoid claims "resolution rates of 58% and up" for their platform, a figure that starts in the 40-65% band and is defensible for the high-judgment industries they target.
The trap with resolution is hiding handoff. If the AI never tells the customer it's AI and never offers a clear escalation, "resolution" quietly becomes "the customer gave up trying to reach a person". I've watched that happen on vendors that lean heavily on human-styled avatars without disclosure.
If I had to pin a weekly review to just one of the three, it would be resolution. It's the only one of the three that implies an issue actually got solved, and that's the whole point of running support in the first place.
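Deflection. Fires: pre-ticket. Numerator: contacts prevented. Denominator: the unprevented baseline (not directly observable). Doesn't see: whether the customer's problem ended, or whether they gave up. Gameable: yes (denominator is the lever).
Containment. Fires: in-channel. Numerator: in-channel resolutions or abandonments. Denominator: all in-channel starts. Doesn't see: whether the customer was satisfied or gave up. Gameable: less so (channel-defined).
Resolution. Fires: end-to-end. Numerator: conversations that didn't return. Denominator: conversations the AI handled. Doesn't see: CSAT, time-to-resolution, handoff quality. Gameable: yes (numerator is the lever).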
What the three numbers actually look like on real teams
⚡
TL;DR: RecruitCRM (35% to 68% resolution), Sofar Sounds (26% resolution but 85% CSAT), and TravelJoy (24% to 80%) show the same three metrics producing very different headline numbers on real rollouts.
Here are three rollouts I've watched closely. Three different headline numbers, and the same three metrics sitting underneath.
Figure: a horizontal resolution-rate axis from lower to higher, with three My AskAI customers plotted: Sofar Sounds at 26%, RecruitCRM at 68%, and TravelJoy at 80% (highlighted).
RecruitCRM: the resolution-rate climb
RecruitCRM is a SaaS platform for recruitment agencies running customer support on Intercom. They climbed from around 35% AI resolution at go-live to 68% over the first year, while AI CSAT settled at 75% and the AI saved roughly 62 hours of agent time every month.
The decoder reading on RecruitCRM: 35% → 68% is resolution rate, the end-to-end number. Containment over the same period ran higher than resolution (some conversations stayed in-channel without being fully resolved, like when a customer self-served the suggested article and didn't reopen).
Deflection is effectively zero here, because they run inside Intercom rather than a help-center funnel that prevents tickets. Three different "85%-equivalents" from one rollout.
The numbers also show something I always point to: the customer does the work that moves the number. RecruitCRM connected their Intercom help center, added their website via sync, uploaded context files, added live user data via the API, and ran a disciplined weekly review fixing the questions the AI couldn't answer.
That's the work, and it's what makes the resolution-rate number worth tracking. It climbs when the team puts that effort in, and a new model release from the vendor barely moves it on its own.
Sofar Sounds: the inverse-pattern proof
Sofar Sounds runs around 750 monthly Zendesk tickets through My AskAI. Their headline numbers are 26% AI resolution, 85% AI CSAT, and around 16 hours saved every month. The AI deliberately resolves about a quarter of the inbox and triages the rest to humans with full AI-prepared context.
The decoder reading on Sofar Sounds: this team reports CSAT as the primary metric because their triage-first setup intentionally suppresses resolution. Containment is intentionally low, and deflection isn't something we track for them at all.
The "one number to report on" varies by what the team is optimizing for. Sofar Sounds picked the honest number for their use case rather than relabelling something else.
This is the legitimate alternative to my resolution-first verdict. For triage-first teams where the goal is "fastest, best, right answer" (the framing from our AI-to-human handoff glossary), CSAT carries more signal than resolution rate would.
Resolution doesn't automatically win everywhere. The team has to pick the number that matches the outcome they're actually trying to produce.
TravelJoy: when the same number gets two names
TravelJoy is a SaaS platform for travel advisors running support on Zendesk, processing 2,500 to 2,700 tickets per month. They climbed from a 24% AI resolution rate (under Zendesk's own AI) to 80% with My AskAI, with 86% AI CSAT in the most recent 30-day window and 193 hours saved per month.
Alan Pugh, their Head of Customer Service, described the result this way:
"Our experience with My AskAI has been nothing short of transformative. In comparison to Zendesk's AI agent, we're now achieving an impressive 76% AI resolution rate, versus just 24% before." Alan Pugh, Head of Customer Service at TravelJoy. Full case study: TravelJoy 80% AI resolution, saving 193 hrs/month.
That's a genuinely big lift (and I'd take that result any day). The measured metric behind it is AI resolution rate, under each vendor's own definition. What strikes me is how naturally "deflection" and "resolution" get used as shorthand for the same figure across the industry. In everyday CX conversation, a 76% number gets called "deflection" one minute and "resolution rate" the next, even though the thing being measured is a single, specific metric.
That's exactly why I think the decoder matters. The vocabulary is loose everywhere because the three terms have been treated as synonyms for so long that the habit is baked into how the whole industry talks. When the number drives a procurement decision or a board report, the precise definition is worth pinning down.
What to do this week
⚡
TL;DR: Get each vendor's definitions in writing, compute all three numbers from your own data, report on AI resolution rate paired with CSAT, and sample 30 conversations a week to check the headline is real.
I'd run four actions this week, about three hours total.
Get every vendor's resolution, deflection, and containment definition in writing. Roughly 30 minutes per vendor. The contract addendum or pricing FAQ is where the real definitions live; the marketing page tends to gloss them. If the resolution definition has more than two clauses ("the AI resolves the issue end-to-end, OR successfully executes a Procedure handoff, OR the conversation stays closed for 72 hours, OR..."), flag it for the lock-in test in our pricing-models post. Multiple clauses are how vendors expand their billable surface over time without renegotiating the headline rate.
Compute all three numbers from your last 30 days. Roughly 60-90 minutes. Pull conversation counts from your helpdesk. Calculate (a) percentage AI-resolved using your own internal definition (what you would call resolved), (b) percentage contained (didn't transfer), and (c) percentage deflected if you can defensibly estimate the unprevented baseline. The gap between (a) and (b) is the most diagnostic number in the report: it's the conversations that stayed in-channel without being solved.
Pick the single number you'll report on, and pair it with CSAT. Roughly 15 minutes. For most teams the pair that matters is AI resolution rate plus CSAT. I'd put resolution first because it's the one of the three that implies an issue actually got solved, and CSAT catches the fast-but-frustrating case the resolution number alone would miss. Sofar Sounds reporting CSAT-primary is the legitimate alternative for triage-first teams.
Add a "what actually happened to those tickets" review. Roughly 30 minutes per week, ongoing. Sample 30 conversations the AI reported as resolved, contained, or deflected (I'd pick them at random). Read the last message. How many ended with the customer satisfied? How many ended with the customer dropping off mid-conversation? That weekly sample is the only honest check on whether the headline number is real or vendor-inflated. (It's also what we've watched drive the resolution-rate climbs at RecruitCRM, TravelJoy, and Edel Optics: the team reviewing the AI's failures every week is the lever.)
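The second and fourth actions can be sketched together. This is a minimal example under stated assumptions: the export format, the field names (`transferred`, `resolved`), and the way (c) is computed from an estimate of prevented contacts are all hypothetical, and `resolved` stands for your own internal definition.

```python
import random

# Hypothetical helpdesk export: one dict per AI-handled conversation, last 30 days.
conversations = [
    {"id": 1, "transferred": False, "resolved": True},
    {"id": 2, "transferred": False, "resolved": False},
    {"id": 3, "transferred": True, "resolved": False},
    {"id": 4, "transferred": False, "resolved": True},
]

total = len(conversations)
resolved_pct = 100 * sum(c["resolved"] for c in conversations) / total          # (a)
contained_pct = 100 * sum(not c["transferred"] for c in conversations) / total  # (b)

# The diagnostic gap: conversations that stayed in-channel without being solved.
gap = contained_pct - resolved_pct

# (c) needs an estimate of contacts prevented; keep it None without a defensible one.
estimated_prevented = None
deflected_pct = (
    None
    if estimated_prevented is None
    else 100 * estimated_prevented / (estimated_prevented + total)
)

# Weekly review: a random sample of up to 30 conversations to read by hand.
rng = random.Random(0)  # fixed seed so the same sample can be re-drawn
review_sample = rng.sample(conversations, k=min(30, total))

print(resolved_pct, contained_pct, gap, deflected_pct, len(review_sample))
```

On this toy data the gap is 25 points: one conversation in four stayed in-channel and still wasn't solved. In a real week, those are the ones to read first.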
When this decoder doesn't apply (and the one case where it matters most)
⚡
TL;DR: For internal reviews the label matters less than using one metric consistently. For procurement, where vendors quote "85%" on differently-defined metrics, the decoder is the whole ballgame.
For internal optimization work, the distinction matters less than for procurement. A team that has settled on one metric and uses it consistently is fine (Sofar Sounds reporting CSAT, TravelJoy reporting AI resolution rate); the trend matters more than the absolute label.
Where it matters the most, and where it's most often gotten wrong, is procurement. When you have two vendors quoting "85%" on metrics with different names and different definitions, the decoder is the difference between picking the right vendor and overpaying for a number that doesn't measure what you think it measures.
There's a legitimate dissenting view here. Forethought's piece on deflection rate argues deflection is still the right primary metric for cost-focused teams.
The steelman: at scale, deflection-as-cost-proxy is defensible if (a) the team also tracks customer-side outcomes downstream and (b) the deflection definition is genuinely "contact prevented", not "contact suppressed by a bot that wouldn't escalate". Both conditions are doable, but most teams don't actually meet them.
There's also a structural carve-out for high-judgment industries. Mavenoid (who target tech support specifically) quote resolution rates of "58% and up". In high-judgment categories like medical, legal, and fintech, I'd read a figure in that range as closer to a ceiling than a floor, because the right answer for many tickets is escalation rather than resolution. In those industries I'd treat a high containment number as a warning sign rather than a goal.
The takeaway
⚡
TL;DR: Three metrics, three journey events, three different "85%"s. I'd report on AI resolution rate paired with CSAT, and pin every vendor's definition in writing before signing.
Three metrics. Three different events on the customer journey. Three different "85%"s that vendors will pitch as if they're the same.
The framework: deflection fires pre-ticket, containment fires in-channel, resolution fires end-to-end. The formulas are different, the denominators are different, and the vendor-defined surfaces are different (I can't stress that last one enough).
My verdict, for what it's worth, is to pin AI resolution rate plus CSAT as the pair to report on, get the vendor's resolution definition in writing before signing anything, and sample 30 conversations a week to check what the headline number is actually doing.
If you want the pricing-model consequence of all this, our piece on per-conversation vs per-resolution AI pricing is the next read. Once you know which metric you're measuring, the question becomes which pricing model fits the curve that metric is on.
FAQs
What is the difference between deflection rate and resolution rate?
Deflection rate measures what share of would-have-been-tickets never reached your support team, because the customer found their answer in self-service, the help center, or a chatbot before contact opened. Resolution rate measures what share of conversations the AI actually handled where the customer's problem ended.
We tend to think of deflection as a funnel metric (did the contact open at all?) and resolution as an outcome metric (was the customer's problem solved?). Intercom Fin's own glossary on this is honest: "A high deflection rate could mean excellent automation, or it could mask customer dissatisfaction and eroded trust." The two numbers can move in opposite directions on the same rollout.
What is the difference between containment rate and deflection rate?
Containment fires after a ticket or conversation opens: it measures whether the AI or IVR kept the conversation in-channel without transferring to a human. Deflection fires before a ticket opens: it measures whether the contact was prevented in the first place.
In practice the two terms get used interchangeably because the IVR world used "containment" and the chatbot world inherited it as "deflection". (Calabrio's piece traces the genealogy explicitly.) Both describe absence-of-human; neither describes whether the customer's problem actually got solved.
What is containment rate in AI customer service?
Containment rate is the share of conversations that started in the AI channel and ended in that channel without being transferred to a human agent. The metric was inherited from contact-center IVR, where it measured the share of calls the IVR handled without routing to a live agent, and chatbot vendors carried it over almost unchanged.
What containment doesn't see is whether the customer was satisfied with how the conversation ended. (A frustrated customer who gave up reads as "contained" the same way a customer who got the right answer does.) For that reason most operators we work with treat containment as a directional signal rather than a goal, and pair it with CSAT.
How do you calculate containment rate?
The standard formula, from Decagon's glossary, is "(Number of interactions completed entirely within the chatbot ÷ Total number of chatbot interactions) × 100." The worked example: if a chatbot handled 800 out of 1,000 conversations without passing them to a live agent, containment is 80%.
The catch is in the numerator: what counts as "completed entirely within the chatbot" varies by vendor. Some count any conversation that didn't transfer; some require the customer to confirm the issue is solved; some count abandonments as contained because the customer didn't explicitly request a human.
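Here is a small sketch of how much the numerator choice moves the number. The four chats and the three predicates are made up for illustration and don't represent any specific vendor's definition.

```python
from dataclasses import dataclass

@dataclass
class Chat:
    transferred: bool       # handed to a human agent
    confirmed_solved: bool  # customer explicitly said the issue was solved
    asked_for_human: bool   # customer requested a person at any point

chats = [
    Chat(False, True, False),   # solved and confirmed
    Chat(False, False, False),  # went quiet, never asked for a human
    Chat(False, False, True),   # asked for a human, then abandoned
    Chat(True, False, True),    # transferred
]

def rate(counts_as_contained) -> float:
    return 100 * sum(counts_as_contained(c) for c in chats) / len(chats)

# Definition 1: anything that did not transfer.
no_transfer = rate(lambda c: not c.transferred)
# Definition 2: the customer must confirm the issue is solved.
confirmed_only = rate(lambda c: not c.transferred and c.confirmed_solved)
# Definition 3: contained unless the customer explicitly asked for a human.
no_explicit_request = rate(lambda c: not c.transferred and not c.asked_for_human)

print(no_transfer, confirmed_only, no_explicit_request)  # 75.0 25.0 50.0
```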
We'd ask the vendor for the precise definition in writing, the same way we would for resolution rate.
How do you calculate call deflection rate?
The numerator is contacts prevented: the tickets, calls, or chats that did not occur because the customer found their answer earlier. The denominator is the unprevented baseline, which is the part nobody can directly observe.
Three vendor approximations: (a) a historical-volume baseline (last quarter we had X tickets; this quarter 0.7X; the delta is deflection); (b) widget-engagement attribution (the customer viewed a help-center article and didn't open a ticket within 24 hours, so we count it as deflected); (c) a post-chat survey with a "did this answer prevent you from contacting support?" tick-box. None of these are comparable across vendors. We tend to ask which method the vendor uses before believing the headline figure.
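As an illustration of approximation (b), here is a minimal sketch of widget-engagement attribution. The data shapes, customer IDs, and timestamps are invented; only the 24-hour window comes from the description above.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=24)

def deflected_views(article_views, tickets, window=WINDOW):
    """Count article views not followed by a ticket from the same customer
    within the window. Both inputs are lists of (customer_id, timestamp)."""
    count = 0
    for customer, viewed_at in article_views:
        opened_ticket = any(
            ticket_customer == customer and viewed_at <= opened_at <= viewed_at + window
            for ticket_customer, opened_at in tickets
        )
        if not opened_ticket:
            count += 1
    return count

views = [
    ("a", datetime(2025, 1, 6, 9, 0)),
    ("b", datetime(2025, 1, 6, 10, 0)),
    ("c", datetime(2025, 1, 6, 11, 0)),
]
tickets = [
    ("b", datetime(2025, 1, 6, 15, 0)),  # 5 hours after the view: not deflected
    ("c", datetime(2025, 1, 8, 11, 0)),  # 48 hours after the view: outside the window
]

print(deflected_views(views, tickets))  # 2
```

Customer "a" counts as deflected whether they found the answer or gave up, and customer "c" counts as deflected even though they opened a ticket two days later. That is the blind spot in a single function.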
Is deflection rate a vanity metric?
Sometimes, yes. Whenever the deflection number is reported without a downstream check on what happened to the deflected customers, it tilts toward vanity. The metric goes up when contact is suppressed for any reason, including the customer giving up.
A LinkedIn post by sdsc puts it bluntly: "Hey CEOs, deflection rate is the worst metric to celebrate in customer support." That overstates the case (deflection can be a legitimate cost-proxy at scale), but the underlying point is the right one: a deflection number without an outcome number is a vanity metric. Pair it with a resolution or CSAT check on the deflected cohort and the metric becomes useful again.
Which is the most important AI customer service metric?
Resolution rate, paired with CSAT, is the pair I'd report on for most CX teams. Resolution is the one of the three that implies an issue actually got solved, which is why I'd put it first, and CSAT catches the fast-but-frustrating case the resolution number alone would miss.
The legitimate exception is triage-first teams where the AI is deliberately escalating most of the inbox to humans (Sofar Sounds is the cleanest example we know, at 26% AI resolution but 85% AI CSAT). For that pattern, CSAT is the primary metric and resolution is the supporting one. The trick is to pick the number that matches the outcome the team is actually trying to produce, and to be honest about it.
Why do AI vendors report different "resolution rates" for the same product?
Because the numerator of resolution rate is vendor-defined, and every vendor defines it differently. Six examples: Fin counts "issue resolved end-to-end OR successful Procedure handoff"; Sierra counts "resolved conversation, saved cancellation, upsell, or cross-sell"; Decagon uses a non-public internal tag; Zendesk Autoresolve fires the resolution only after a 72-hour quiet period; Gorgias Automate counts a resolution when the customer doesn't reply; My AskAI counts an AI reply followed by either explicit confirmation or no reopen.
We cover this definitional spread in more detail in our per-conversation vs per-resolution pricing piece, because it's also the lock-in surface that contracts tend to be vague on. The takeaway for any procurement conversation: ask for the definition in the contract addendum, in writing, before the rate becomes the headline.