Conversational AI for support that resolves, then hands off cleanly.
A support bot that closes a ticket without solving the problem only hides demand, and the customer comes back. We build support agents grounded in your knowledge base, measured on resolution rather than deflection, that escalate to a person with full context the moment they should, and that run entirely inside the EU.
Conversational AI for support is a system that understands a customer's question, answers from your knowledge base, and either resolves the request or hands it to a person. The lesson of 2026 is that resolution, not deflection, is the metric that matters: a bot that closes a ticket without solving the problem hides demand, and the customer comes back within days. Argus Root builds support agents grounded in your own content and honest about what they do not know, on open-weight models hosted inside the EU.
In short
- Resolution is not deflection: a bot can post a 90% deflection rate while truly resolving only about 40%. The tell is re-contact rate, around 11.3% on AI-resolved tickets versus 8.7% on human-resolved ones.
- The cost case is real: an AI-handled resolution runs about $0.62 against roughly $7.40 for a human, and a hybrid AI-plus-human model cuts cost per resolution 71% at a CSAT cost of just 0.05 points.
- The satisfaction gap has nearly closed: pure-AI lands near 4.1/5 CSAT against 4.3/5 for humans, and hybrid handling narrows that to about 0.05 points, but only when the AI resolves rather than stalls.
- Escalation is a feature, not a failure: CSAT drops about 22 points when a handoff is needed, so the handoff has to be fast and carry full context, without making the customer repeat themselves.
- Grounding decides the ceiling: a RAG-grounded agent reaches 40–70% containment on tier-one contacts, with routine intents at 65–72% and nuanced complaints under 38%, so scope it narrowly.
Deflection hides demand. Resolution removes it.
Tier-1 deflection across enterprise programs sits around 41%, and the best reach the high 50s, but containment is not the same as solving the problem: re-contact runs higher on poorly handled AI tickets than on human ones. For routine questions the satisfaction gap has effectively closed, with AI around 4.1 out of 5 against a human 4.3. The gap that remains is about escalation. Sentiment-heavy contacts like complaints and billing disputes score far lower, so the design rule is simple: there is never a dead end, only a route to a person.
| Intent | Typical AI CSAT | What we do |
|---|---|---|
| Password & account reset | 4.41 / 5 | Resolve autonomously |
| Order & refund status | 4.32 / 5 | Resolve autonomously |
| Basic troubleshooting | Strong | Resolve, escalate the edge cases |
| Billing dispute | 3.61 / 5 | Assist, then escalate with context |
| Complaint, sentiment-heavy | 3.34 / 5 | Escalate fast to a person |
The economics explain why this matters so much. An AI resolution costs around $0.62 against roughly $7.40 for a human-handled ticket, and a blended model of AI plus fast human escalation cuts cost per resolution by some 71% at a satisfaction cost of just five hundredths of a point. The prize is real, which is exactly why the temptation to claim it dishonestly, by counting contained contacts as solved, is so strong, and why the line between deflection and resolution is the one that decides whether the saving is genuine or borrowed from next month.
Containment is the most misleading number of all: the customer did not reach a human, so the system logs a success, whether or not the problem was solved. The data is blunt about the gap, with AI containing 45% or more of contacts but only around 14% genuinely resolved end to end without a person. A bot optimised for containment becomes a barrier customers fight to get past, which is why the only honest design has no dead end, only resolution or a clean escalation. Gartner expects agentic AI to resolve 80% of common service issues by 2029; reaching that is about real resolution rather than a higher containment score.
Grounding is the difference between help and harm.
An ungrounded support bot invents answers 15 to 27% of the time, which in support means confidently telling a customer something untrue. Grounding the model in your own help content with retrieval brings that down to roughly 1%, and lets every answer cite the article it came from. That retrieval layer is the same RAG work we build elsewhere, pointed at your knowledge base.
Escalation carries the full thread when a person takes over: the history, the customer's sentiment and a suggested next step, so nobody has to repeat themselves. The agent escalates below a confidence threshold rather than guessing, the interaction is labelled as AI with a one-click route to a human, and because support conversations are personal data, the whole thing runs inside the EU. The agent itself is built on our agents engine.
Understanding the question is the other half of not causing harm. A generative model reads customer intent correctly around 92% of the time against 65 to 70% for the keyword bots that gave automated support its bad name, which is why it can handle a question phrased in a way no script anticipated. But accuracy is task-dependent rather than uniform: the same system that resolves a password reset almost perfectly is far weaker on an emotionally charged complaint, so the design has to know which kind of contact it is holding and route accordingly rather than treat every question as equal.
How do we build them?
For resolution and a clean handoff, measured on the numbers that tell the truth.
Knowledge-base grounding
Retrieval over your help articles, policies and past tickets, so the agent answers from your content with a citation instead of improvising.
Resolution-focused agent
Built to complete routine requests end to end on our agents engine, connected to your help desk and order system, not a scripted menu.
Context-carrying escalation
Below a confidence threshold or on a sentiment-heavy contact, it hands to a person with the full thread and a suggested next step.
Transparent handoff
The customer always knows they are talking to AI and can reach a human in one click, because trust still favours people for the hard cases.
The right metrics
Resolution rate, re-contact within 48 hours and the CSAT gap to human handling, rather than a deflection number that flatters a dashboard.
EU-resident & multilingual
Conversations stay in the EU as the personal data they are, and the agent replies in the customer's language. See compliance →
We build it on what we run.
A support agent is our agents work and our retrieval work pointed at one job, on open-weight models we host inside the EU. We are honest about the limit: AI reads what a customer says well and reads how they feel poorly, so anything emotional or high-stakes goes to a person quickly. The goal is a system that resolves the routine majority and routes the rest with grace, measured on whether the problem was solved.
    # honest by default: say it is AI, and always let them out
    identity:
      disclose_ai: true
      offer_human: always
    grounding:
      source: your_help_docs
      refuse_if_unsupported: true   # no source → escalate, never guess
    escalate_when:
      - user_requests_human
      - confidence < 0.7
      - sentiment: frustrated
      - topic in [billing_dispute, cancellation, legal]
    handoff:
      pass_context: full_transcript   # the human starts informed
    measure: resolution_rate          # not deflection
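As an illustration only, the `escalate_when` rules in the config above could be evaluated once per customer turn along these lines. The 0.7 threshold and the topic list come from the config; the `Turn` fields, the reason strings and the function name are hypothetical, and the upstream classifiers that would fill them in are assumed rather than shown.

```python
from dataclasses import dataclass
from typing import Optional

# Values taken from the config above; everything else here is hypothetical.
CONFIDENCE_THRESHOLD = 0.7
ALWAYS_HUMAN_TOPICS = {"billing_dispute", "cancellation", "legal"}


@dataclass
class Turn:
    """One customer turn, as scored by assumed upstream classifiers."""
    user_requests_human: bool
    confidence: float  # agent's confidence in its grounded answer, 0..1
    sentiment: str     # e.g. "neutral" or "frustrated"
    topic: str         # e.g. "order_status" or "billing_dispute"


def escalation_reason(turn: Turn) -> Optional[str]:
    """Return the first matching escalation rule, or None to let the agent answer."""
    if turn.user_requests_human:
        return "user_requests_human"
    if turn.confidence < CONFIDENCE_THRESHOLD:
        return "low_confidence"
    if turn.sentiment == "frustrated":
        return "frustrated"
    if turn.topic in ALWAYS_HUMAN_TOPICS:
        return "sensitive_topic"
    return None
```

Returning the reason rather than a bare boolean means the handoff can tell the person why the contact arrived, which fits the idea of passing context along with the transcript.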
A scripted chatbot is not a support agent.
The old chatbot followed a fixed decision tree: it matched a keyword, offered a menu, and dead-ended when the customer said anything it had not been scripted for. That is the experience that taught a generation of customers to type "agent" immediately, and it deflected by exhausting people rather than helping them. A modern support agent is a different thing: it reasons about the request, retrieves the relevant policy, and takes the action, a refund, an account update, an order change, by connecting to your help desk and order systems rather than reading out a script.
That shift, from matching keywords to resolving requests, is what moved automated support from a cost-cutting annoyance to something that genuinely resolves a large share of contacts. Agentic systems now reach autonomous resolution on the majority of routine requests precisely because they can do the task rather than describe it. We build the resolving kind, connected to the systems where the work really happens, because a support agent that can answer but not act is a more articulate version of the chatbot customers already learned to skip past.
Which support metric quietly lies?
Containment is the number vendors love and customers hate. It counts an interaction as a success whenever the customer did not escalate to a human, treating the absence of escalation as proof the problem was solved, which it often is not. A customer who gave up, accepted a link that did not help, or simply went away counts identically to one whose issue was genuinely resolved, and a dashboard optimised for containment looks magnificent while satisfaction quietly erodes underneath it.
The tell is in the re-contact data: across deployments, AI-resolved contacts come back at around 11% against under 9% for human-resolved ones, and when tickets are marked resolved but the same customers keep returning about the same issue, the system is forcing closure rather than solving anything. We refuse to report containment as if it were resolution, and we watch re-contact precisely because it exposes the difference. A support system measured on containment will, given the chance, learn to trap customers rather than help them, which is the opposite of what you are paying for.
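The gap between the two numbers can be made concrete with a small sketch. It assumes each closed contact records whether a human took over and when, if ever, the same customer came back about the same issue. The `Contact` fields and function names are hypothetical, and the 48-hour window is the one named in the metrics list earlier on this page.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

RECONTACT_WINDOW = timedelta(hours=48)


@dataclass
class Contact:
    closed_at: datetime
    escalated: bool                      # True if a human took over
    next_contact_at: Optional[datetime]  # same customer, same issue; None if never


def came_back(c: Contact) -> bool:
    """True if the customer returned about the same issue inside the window."""
    return (
        c.next_contact_at is not None
        and c.next_contact_at - c.closed_at <= RECONTACT_WINDOW
    )


def containment_rate(contacts: List[Contact]) -> float:
    """Share of contacts that never reached a human, solved or not."""
    if not contacts:
        return 0.0
    return sum(not c.escalated for c in contacts) / len(contacts)


def verified_resolution_rate(contacts: List[Contact]) -> float:
    """Share handled without a human AND without a re-contact in the window."""
    if not contacts:
        return 0.0
    resolved = sum(not c.escalated and not came_back(c) for c in contacts)
    return resolved / len(contacts)
```

On the same records the second number can never exceed the first, and the distance between them is the share of contacts that were closed rather than solved.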
Which metrics really matter?
We measure the things that predict whether customers are helped, not the ones that flatter a report. Resolution rate, verified by what happened after the conversation rather than by the bot marking its own homework. Re-contact rate within a couple of days, which catches forced closures. Knowledge coverage, the share of questions your content can genuinely answer, because an agent cannot resolve what your knowledge base never documented. And the CSAT gap between AI and human handling, measured separately rather than blended into one figure that hides which is which.
Measuring AI satisfaction separately matters because automation handles the easy contacts and humans take the hard ones, so a blended score tells you nothing about either. Tracked apart, the data often shows AI satisfaction matching or beating humans on the contacts it should handle, and falling short on the ones it should not, which is exactly the signal you need to set the escalation line. We build the measurement in from the start, with audit-ready logging of resolutions and handoffs, because a support system you cannot honestly measure is one you cannot improve, and one whose vendor reports only the flattering number is one to distrust.
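A minimal sketch of measuring the two apart, assuming each survey response is stored as an `(intent, handler, score)` triple with `handler` equal to `"ai"` or `"human"`. The function name and record shape are illustrative, not a description of any particular help-desk export.

```python
from collections import defaultdict
from statistics import mean


def csat_gap_by_intent(ratings):
    """Return {intent: human_mean - ai_mean} for intents rated on both sides.

    A positive gap means humans score higher on that intent; a negative gap
    means the AI does. Intents with ratings from only one side are skipped.
    """
    scores = defaultdict(lambda: {"ai": [], "human": []})
    for intent, handler, score in ratings:
        scores[intent][handler].append(score)
    return {
        intent: mean(s["human"]) - mean(s["ai"])
        for intent, s in scores.items()
        if s["ai"] and s["human"]
    }
```

Read per intent, a gap near zero or below it suggests the contact type is safe to automate, while a large positive gap marks a candidate for the escalation list.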
Escalation is a feature, not a failure.
A low handoff rate is not the goal, and chasing it is how customers get trapped. A healthy support agent escalates somewhere between 15 and 30% of contacts depending on their complexity, and a system reporting near-zero handoffs alongside frustrated customers is not resolving more, it is refusing to let people out. The aim is to resolve what should be resolved automatically and to escalate what should not, quickly and cleanly, rather than to minimise the number of humans involved at any cost to the customer.
The quality of the handoff matters as much as its timing. When the agent escalates, it passes the full conversation, the customer's sentiment and a suggested next step, so the person picks up with context rather than asking the customer to start again, which is the single most infuriating part of badly built support. As one industry leader put it, there should never be a dead end, only an escalation path. We design escalation as a first-class flow rather than an admission of defeat, because a clean handoff to a human is a better outcome than a confident wrong answer, and customers remember which one they got.
CSAT, honestly: where AI wins and where it does not.
The satisfaction gap between AI and human support has effectively closed for the right contacts, and remains real for the wrong ones. On structured, routine requests, AI now scores around 4.1 out of 5 against a human 4.3, a gap that narrows to five hundredths of a point under good hybrid escalation, and on the most structured intents like password resets and order status it matches or beats human handling outright. The reason is simple: those problems get solved instantly, without a queue or an availability window.
The honesty is in naming where it falls down. On emotionally charged contacts, a complaint, a cancellation, a customer who is already angry, autonomous AI scores far lower, around 3.3, because what those customers need is acknowledgement and judgement rather than a correct answer. We tune the system to recognise sentiment and route those contacts to a person quickly, so AI handles what it is good at and humans handle what they are better at. Forcing an agent to handle an emotional complaint to protect a containment number is how a support operation damages the relationships that matter most, and we design against it deliberately.
Tell them it is AI, and let them out.
Around 84% of customers still believe a human is more likely to get their answer right, and pretending an AI is a person insults that instinct and backfires the moment it slips. Transparency is both ethical and effective: telling the customer they are talking to AI, and giving them a visible one-click route to a human, raises satisfaction rather than lowering it, because it removes the suspicion and the sense of being trapped. A customer who knows they can reach a person relaxes into letting the AI try first.
This is why we build labelling and easy escalation as defaults rather than options. A hidden bot that fights to keep the customer away from a human is the pattern that earns automated support its worst reputation, and it is short-term thinking: the customer who feels deceived or cornered is the one who churns or complains publicly. Being honest about what the customer is talking to, and making the exit obvious, is both the decent design and the one that performs better, which is a happy alignment we lean into rather than the trade-off it is often assumed to be.
Does the cost case hold up?
The savings are large and well documented. An AI-handled resolution runs around $0.62 against roughly $7.40 for a human one, first response times have dropped from hours to minutes, and total resolution time has compressed by something like 87% in AI-native operations. A blended model, AI handling the routine and humans the rest, cuts cost per resolution by around 71% at a negligible satisfaction cost, which is the configuration that truly works rather than the all-AI fantasy that damages CSAT.
The reason the case holds up only with resolution, not deflection, is that a deflected-but-unsolved contact comes back, and a returning customer costs more than one helped the first time. Savings built on containment are borrowed against future re-contacts and lost goodwill; savings built on genuine resolution are real and compounding. We model the economics on resolved contacts and the blended human-plus-AI flow, because that is where the documented numbers really come from, and we would rather show you a credible 71% on resolution than an impressive number on a metric that unravels the moment the deflected customers come back.
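The arithmetic behind the blended figure can be sketched as follows. This is a deliberately simple linear model using the two per-resolution costs quoted above. It ignores the cost of AI attempts that end in a handoff and the cost of re-contacts, so treat it as a rough check rather than a forecast. The function names are illustrative.

```python
AI_COST = 0.62     # $ per AI-handled resolution (figure quoted above)
HUMAN_COST = 7.40  # $ per human-handled resolution (figure quoted above)


def blended_cost(ai_share: float) -> float:
    """Cost per resolution when `ai_share` of contacts are resolved by AI."""
    if not 0.0 <= ai_share <= 1.0:
        raise ValueError("ai_share must be between 0 and 1")
    return ai_share * AI_COST + (1.0 - ai_share) * HUMAN_COST


def saving(ai_share: float) -> float:
    """Fractional cost reduction versus all-human handling."""
    return 1.0 - blended_cost(ai_share) / HUMAN_COST


def ai_share_for_saving(target: float) -> float:
    """Invert `saving`: the AI share needed to reach a target reduction."""
    return target * HUMAN_COST / (HUMAN_COST - AI_COST)


if __name__ == "__main__":
    print(f"AI share implied by a 71% saving: {ai_share_for_saving(0.71):.1%}")
    print(f"Ceiling with every contact on AI: {saving(1.0):.1%}")
```

Under these assumptions a 71% reduction corresponds to the AI genuinely resolving roughly 77% of contacts, and even an all-AI flow tops out near 92%. Real deployments will differ because handoffs and re-contacts carry their own cost.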
Voice or chat: which channel wins?
Conversational AI is no longer only text. Voice now carries around 19% of inbound contact-centre volume, roughly triple its share two years ago, as speech models have become good enough to handle a real phone conversation rather than a frustrating menu tree. The same agent that resolves a chat can take a call, understand the spoken request, and complete it or hand off, which matters because a large share of customers still reach for the phone for anything that feels urgent or complicated.
Channel choice also shapes satisfaction in its own right. Live chat consistently outscores email and phone on CSAT by a wide margin, because it is immediate and keeps a written record, so meeting customers on the channel that already performs best compounds the gains from automation rather than fighting them. We build across the channels your customers really use rather than forcing them onto the one cheapest to automate, because a support experience that is excellent on chat and absent on the phone has solved half the problem and annoyed every customer who picked up the phone.
Grounded, and aware of what it does not know.
In support, a confident wrong answer is worse than no answer, because the customer acts on it. An ungrounded model invents answers 15 to 27% of the time, which in a support setting means telling customers things about your policies, your products and their accounts that are simply untrue. Grounding the agent in your own help content, policies and resolved tickets through retrieval brings that rate down to somewhere around 1%, and lets every answer cite the source it came from, so a claim can be checked rather than trusted blindly.
Grounding only works if the knowledge is there and current, which is why knowledge coverage, the share of real questions your content can answer, is one of the metrics we track from the start. Where the knowledge base has a gap, the honest behaviour is to escalate rather than improvise, so the agent knows what it does not know and hands off instead of guessing. We build on the same governed retrieval as our knowledge systems work, kept current as your policies change, because a support agent grounded in last year's refund policy is a liability dressed up as automation.
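The "escalate rather than improvise" rule and the coverage metric can be sketched together. The example below uses a toy word-overlap score as a stand-in for a real retrieval layer, purely so it runs on its own. The `MIN_SUPPORT` cut-off, the `Article` shape and the function names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_SUPPORT = 0.5  # hypothetical cut-off: share of question terms found in an article


@dataclass
class Article:
    article_id: str
    text: str


def _terms(text: str) -> set:
    """Lower-cased words longer than three characters, punctuation stripped."""
    words = (w.strip(".,?!").lower() for w in text.split())
    return {w for w in words if len(w) > 3}


def best_source(question: str, articles: List[Article]) -> Optional[Article]:
    """Return the best-supported article, or None when nothing clears MIN_SUPPORT.

    None is the signal to escalate instead of generating an unsupported answer.
    """
    q = _terms(question)
    if not q:
        return None
    best, best_score = None, 0.0
    for article in articles:
        score = len(q & _terms(article.text)) / len(q)
        if score > best_score:
            best, best_score = article, score
    return best if best_score >= MIN_SUPPORT else None


def knowledge_coverage(questions: List[str], articles: List[Article]) -> float:
    """Share of real questions for which the knowledge base has a usable source."""
    if not questions:
        return 0.0
    covered = sum(best_source(q, articles) is not None for q in questions)
    return covered / len(questions)
```

The same check serves both purposes: at answer time a missing source triggers a handoff, and run over a sample of past tickets it gives a rough coverage figure and a list of questions the content does not yet answer.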
Multilingual, and sovereign by default.
A support agent built on a language model speaks your customers' languages without a separate team or a translation layer for each one, answering in the language the customer wrote or spoke in. For a business serving several European markets, that turns multilingual support from a staffing problem into a capability the system has by default, with consistent quality across languages rather than the uneven coverage that comes from hiring for each one separately.
It also concentrates exactly the data that has to be handled carefully. A support conversation is personal data, often including account details, complaints and sometimes special-category information, and routing all of it through a foreign provider's model is a continuous export of customer data on every contact. We run the agent on open-weight models hosted inside the EU, so the transcripts and the knowledge index stay in-region with no foreign provider in the chain that a non-EU court could reach, and the data handling follows our compliance work. For support at scale, sovereignty is not a detail but the difference between a compliant operation and a daily data-protection problem.
Pilots are easy; production is the test.
Almost everyone has tried this and far fewer have shipped it: around 64% of customer-experience teams ran an agentic AI pilot in 2026, but only about 27% had even one channel fully in production. The gap is the same one that defeats AI everywhere, the distance between a demo that impresses and a system that holds up under real, varied, high-stakes customer contact, where the cost of a wrong answer is a damaged relationship rather than a bad test result.
Closing it means building the accuracy and the guardrails in from the start rather than retrofitting them after the first incident with a real customer. The grounding, the escalation, the measurement and the sovereignty are what turn a promising pilot into a channel you can trust with your customers, and they are exactly the parts a quick proof-of-concept skips. We build for production from the first day, because a support agent that works in a demo and embarrasses you in front of a customer is worse than no agent at all, and the 27% who shipped are the ones who treated it as a production system rather than an experiment to bolt onto the queue.
How does it start?
An engagement starts with two things you already have: your help content and your real tickets. The help articles, policies and resolved cases become the grounded knowledge the agent answers from, and the ticket history shows which contact types are high-volume and routine enough to automate well and which are emotional or complex enough to route to a person. That analysis usually reveals that a handful of intent types make up most of the volume, which is exactly where automation pays.
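One way to run that first pass over the ticket history, assuming each ticket already carries an intent label. The 80% target share and the function name are illustrative choices, not a fixed rule.

```python
from collections import Counter
from typing import Iterable, List


def intents_covering(ticket_intents: Iterable[str], target_share: float = 0.8) -> List[str]:
    """Smallest set of most frequent intents whose combined volume reaches target_share."""
    counts = Counter(ticket_intents)
    total = sum(counts.values())
    chosen: List[str] = []
    covered = 0
    for intent, n in counts.most_common():
        if covered / total >= target_share:
            break
        chosen.append(intent)
        covered += n
    return chosen
```

This ranks by volume only. An intent that is high-volume but emotional or high-stakes, such as a billing dispute, would still be routed to a person rather than automated.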
From there we build the agent on those high-value intents first, grounded, measured and connected to your help desk and order systems, with escalation tuned to your tolerance and the whole thing inside the EU. It goes live on the contacts it handles well and grows its scope as the measurement proves each new intent, rather than launching broad and hoping. The aim is an agent that resolves the routine majority and hands off the rest cleanly from day one, earning more of the volume as the data shows it has earned the trust, which is how a support operation improves rather than gambles.
Who needs this, and when should it stay human?
Conversational AI fits a support operation with real volume and a core of routine, repeatable contacts: the password resets, order statuses, refund requests and policy questions that make up most of the queue and need answering fast rather than thoughtfully. A business fielding the same questions thousands of times, or one whose support cannot keep up at acceptable cost, has exactly the profile where automation resolves a large share at a fraction of the cost and frees the human team for the contacts that need them.
Where the contacts are mostly complex, emotional or high-stakes, we will say so. A support operation that is overwhelmingly complaints, sensitive cases or bespoke problems may get little from autonomous resolution and should keep humans in front, perhaps with AI assisting them rather than replacing them. Low volume is the other case where the build does not pay, and we would rather tell you that than sell an agent for a queue that a person could clear by hand. The technology earns its place against high-volume routine work, and where your support is not that, the honest answer is to keep it human or to wait until it is.
Augmenting humans, not replacing them.
Not every use of AI in support is autonomous resolution. A large share of the value is in making human agents faster on the contacts they still handle, drafting a reply for them to check, surfacing the relevant knowledge-base answer, suggesting the next action, and automating the after-call write-up. Around 45% of calls involve an agent searching for an answer mid-conversation, and putting that knowledge at their fingertips is what drives much of the reduction in handling time, without taking the human out of the loop at all.
This matters because the honest path for many operations is augmentation before, or instead of, full automation. An agent-assist layer lifts the productivity of the team you have on exactly the complex, emotional contacts that autonomous AI handles worst, which is the opposite end of the queue from the routine work the agent resolves alone. We build both, and recommend the mix that fits your contact profile, because a support strategy that only counts the tickets AI closed on its own misses half of where AI really helps, which is in the hands of the people doing the hard part.
Your voice, and the things it must not say.
A support agent speaks for your brand on every contact, so how it speaks matters as much as what it resolves. The tone, the formality, the warmth, the house style, is something we tune to your brand rather than leave at a generic default, because a customer notices when the support voice is jarringly off from the rest of the company. An agent that resolves correctly but sounds wrong is still a poor representative, and consistency of voice across thousands of conversations is one of the quiet advantages a well-built system has over a room of differently-trained people.
Just as important is what it must never do. Guardrails define the commitments it cannot make, the topics it must not engage, the claims it is forbidden to invent, and the actions that always need a human, so the agent cannot promise a refund outside policy or improvise an answer on a regulated matter. We set those limits to your business and your compliance obligations, because an agent speaking for you needs the same boundaries you would put on a new hire, and rather more, since it speaks to far more customers far faster than any person could.
Connected to where the work happens.
Resolution requires action, and action requires connection. An agent that can only talk is back to being a chatbot; one that can resolve has to reach into your help desk to read and update a ticket, your order system to check or change an order, your account systems to look up a customer and act on their request. The difference between describing the solution and performing it is the integration into the systems where the work really lives, which is why a real support agent is a connected system rather than a chat window bolted to a model.
We build those connections to the tools you already run, your help desk, your order and account systems, so the agent completes the request end to end rather than handing the customer a set of instructions to follow themselves. The actions run through the same controlled layer as the rest of our agents work, with guardrails on what the agent may change and a record of everything it did. The point is the customer's problem solved inside your systems, not a polite explanation of how they could solve it, because the former is resolution and the latter is the deflection customers have learned to resent.
It gets better from every conversation.
A support agent should not be frozen at the quality it launched with. Every conversation is data: the contacts it resolved, the ones it escalated, the questions it could not answer, and the human corrections on the cases it got wrong. Fed back, that stream shows where the knowledge base has gaps, which intents are ready to automate next, and where the agent is drifting, so the system improves with use rather than slowly degrading as products and policies change around a fixed configuration.
We build that improvement loop in, because it is the difference between a support agent that earns more of the volume over time and one that plateaus and then quietly slips. The escalations point to the next intents worth automating; the failed answers point to the content worth writing; the corrections refine the handling. This is the same discipline of measurement and iteration we apply across our AI work, applied to support, and it is what turns a launch into a system that is resolving meaningfully more a year later than on its first day, rather than one nobody has touched since the project closed.
Questions buyers ask.
What is conversational AI for customer support?
Does AI support hurt customer satisfaction?
What is the difference between deflection and resolution?
How do you stop the agent making things up?
When does it escalate to a person?
Is customer data kept inside the EU?
What is the difference between a chatbot and a support agent?
Why optimise for containment at all?
How often should the AI escalate to a human?
Where does AI support score worst on satisfaction?
Should we tell customers they are talking to AI?
Can it handle phone calls as well as chat?
What does it cost compared to human support?
Send us your help docs and your top tickets. We'll show you what resolves.
Share your knowledge base and the questions that fill your queue. We test what an agent can resolve on a sample, show the answers with their citations and escalation points, and report the honest resolution rate, before you commit to anything.