How to build an AI customer service agent with OpenAI Agent Builder in 2025
The most comprehensive guide to building, customizing and deploying an AI customer service agent for your business using OpenAI’s Agent Builder, AgentKit and ChatKit in 2025
Mike is an experienced Product Manager who focuses on all the “non-development” areas of My AskAI, from finance and customer success to product design, copywriting, testing and more.
I’m guessing you’re here because you watched or heard about OpenAI’s DevDay, where they launched their new Agent Builder (AgentKit) tool alongside ChatKit.
And maybe your ears pricked up when you heard them mention the ability to ‘build your own customer service AI agent’.
Well then, you’re in the right place.
I am going to show you how you can set up your new AI agent, what to look out for and how to get the best out of it.
Let’s go.
How do I create my customer service chat workflow?
To get started building your AI agent, you’ll need to log in to your OpenAI account and go to the ‘Agent Builder’ section (in the left-hand menu).
Once in the Agent Builder, choose the ‘Customer service’ ChatKit template to give yourself a head start (you can also build completely from scratch by clicking “Create” if you already have experience building workflows and AI agents with tools like Make or n8n).
A screenshot of OpenAI’s Agent Builder, listing out all workflow templates
Once you have selected the Customer service template, you will be shown a draft workflow with a few steps already populated. In order, from start to finish, they are:
Start: the entry point for your AI agent.
Jailbreak guardrail: to determine if the customer’s input is reasonable or just trying to ‘hack’ your AI agent.
Classification agent: to identify your customer’s intent when they reach out e.g. asking for a refund, a return or just a purely informational request.
Agent routing: your AI customer service agent will potentially be made up of multiple agents, each capable of doing different things, so once you have determined an intent for your customer, you’ll need to send their request to the right agent.
Agents: the different agents you will be using for each different task.
User approval: to ask the customer to approve or reject a proposed action (for example, whether they want a replacement) before the flow continues.
End: the culmination of the agent flow, how you want it to finish.
Don’t worry, I’ll go through all of these in much more detail throughout this post.
A screenshot of the OpenAI Agent Builder customer service agent template in full.
How do I start my AI agent?
The first step in your AI agent builder workflow will be “Start”.
This is where you define the inputs that the AI agent will be taking in.
The simplest input (input variable) at the start of the flow is a text input, “input_as_text”. This is just your customer’s question in plain text, and to get started that’s probably all you need.
A screenshot of a section of the customer service agent workflow focusing on the start and guardrail sections.
If you wanted to make your AI agent ‘smarter’ or more aware of the customer it is talking to (in order to personalize responses for instance), then you may also want to pass in a “State variable”.
These can be passed in a number of different formats: strings (plain text), numbers, booleans (true/false logic), objects (structured data) or lists.
For an AI customer service agent you may want to pass in things like:
Customer name
Customer identifier
Plan type
Or any other user-specific information or context that may assist the AI agent in its responses later down the line.
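To make this concrete, here is a rough sketch of the kind of data a Start step might take in. This is purely illustrative (the variable names are hypothetical, and in Agent Builder you define these through the Start node’s UI rather than in code):

```python
# Hypothetical example of the inputs a customer service workflow might receive.
# Illustrative only - in Agent Builder these are defined in the Start node's UI.
workflow_inputs = {
    "input_as_text": "Hi, my headphones arrived broken - can I get a replacement?",
    "state": {
        "customer_name": "Jane Doe",   # string
        "customer_id": "cus_12345",    # string identifier
        "plan_type": "pro",            # string
        "is_trial": False,             # boolean
        "open_tickets": 2,             # number
    },
}
```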
A screenshot of the start step in the customer service workflow showing the different possible settings.
What are ‘guardrails’ in Agent Builder and why do I need them?
The next step in your AI agent workflow will be to specify any ‘guardrails’ you want your AI agent to use.
A screenshot of the guardrails step in the customer service agent builder workflow.
Guardrails are designed to do a number of things to protect both your company and your customers. They include:
PII checks and cleansing: AI customer service agents rarely need any PII to answer questions effectively, but customers often provide it anyway, usually without thinking. This could put you at risk of breaching privacy regulations such as GDPR if you don’t have a good way to manage it (recording, deleting etc.).
So the easiest thing to do is strip it out as soon as it enters the system, which is exactly what toggling this on will do.
From there you can make numerous selections of what you do or do not want to redact, from common data points like names and phone numbers, to country-specific information like bank account numbers and ID numbers.
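For context, and purely as an illustration of the idea (this is not how OpenAI’s guardrail works internally), a naive pre-processing step that redacts emails and phone numbers before a message ever reaches your logs might look like this:

```python
import re

# Naive, illustrative redaction of two common PII types. The real guardrail
# covers many more categories (names, IDs, bank details, etc.) and is far
# more robust than a couple of regexes.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact_pii(text: str) -> str:
    text = EMAIL_RE.sub("[REDACTED_EMAIL]", text)
    text = PHONE_RE.sub("[REDACTED_PHONE]", text)
    return text

print(redact_pii("Call me on +44 7700 900123 or email jane@example.com"))
# -> "Call me on [REDACTED_PHONE] or email [REDACTED_EMAIL]"
```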
A screenshot of the PII guardrail in the agent builder workflow.
Moderation: different AI agents will be used for different purposes, but most of the time you probably don’t want customers contacting you with harmful or sexually explicit material. Enabling this will ensure your AI agent won’t process any requests containing this material. You can customize exactly what type of content you want the AI to block by clicking the settings icon.
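If you want to run a similar check outside the builder, OpenAI also exposes a standalone Moderation API. A minimal sketch, assuming the official `openai` Python SDK and an `OPENAI_API_KEY` in your environment:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Screen a customer message before letting it reach the rest of the workflow.
result = client.moderations.create(
    model="omni-moderation-latest",
    input="Example customer message to screen",
)

flagged = result.results[0].flagged
print("Blocked by moderation" if flagged else "Safe to continue")
```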
A screenshot of the moderation guardrail in the agent builder workflow.
Jailbreak: this one is unique to AI. As soon as you add an AI customer service agent to your site, one of the inevitable consequences is that people will try to use it for other purposes.
Whether that is to get it to go “off-script” for fun, to try and get access to data or just to get free access to powerful AI models to do their homework.
Enabling a jailbreak guardrail will largely prevent such attacks and is based on an open source framework constantly being worked on and improved. Depending on how critical this is for you, you have the option to choose more powerful models and higher or lower confidence levels (by default it uses GPT-5-nano, with a 70% confidence threshold).
If you want to be very conservative, choose a more powerful model like GPT-5-mini or even GPT-5, with a lower confidence threshold. But be mindful that this will result in more false positives, more expensive requests (although the cost is minor given the size of the input) and slower responses.
A screenshot of the jailbreak guardrail in the agent builder workflow.
Hallucination: another AI-specific guardrail, and one you’ll recognize if you’ve ever heard anyone say “AI makes things up”.
There will be protection points across the workflow for this, but this is the first. It checks the request against the knowledge you upload to the AI (stored in your vector store - more on this later) to ensure the agent only answers from verified information.
Similar to the Jailbreak guardrail, you can choose the model you use for this detection and the confidence level required (by default it is gpt-4.1-mini and 70% respectively).
A more capable model and a lower confidence threshold should be used to be more conservative, but a smaller, faster model, like the nano-series, could improve your response speed (and decrease your request cost, although very minor given the size of the input).
Beware though: even with such controls in place, hallucinations can be very difficult to eliminate comprehensively without a robust, multi-step pipeline, such as the one we use for My AskAI.
A screenshot of the hallucination guardrail in the agent builder workflow.
For most customer service agent use cases the default setting will be sufficient.
Finally, you can choose what you want to happen if the guardrails run into an error (such as if OpenAI’s APIs are down - something that is not uncommon).
Handling such errors can be frustrating (we learned this through experience), so with our AI agents we set up fallbacks to different model providers to ensure maximum uptime of our service for customers.
How does Agent Builder identify my customer’s intent?
If you’ve worked in customer service for a few years, you doubtless have heard the word “intent” thrown around.
If you haven’t, “intent” refers to what the customer is trying to do when they contact your AI agent: what is their goal?
Most AI agents (using OpenAI’s AgentKit or otherwise), after initial guardrail checks, will start by trying to determine what the customer wants to do so it can assign the conversation or ticket to the correct agent down the line.
Think of it like the triage you probably have in your support team already: you want to quickly identify the type of query the customer has so you can get them help from the right person as quickly as possible.
This could be based on the task (e.g. certain queries require input from certain teams, like refunds or accounts), or it could be based on complexity (e.g. tier 1 vs tier 3 queries).
You will be creating different agents responsible for different tasks and you need a way of triaging tickets so you send them to the correct agent.
A screenshot of the customer service agent builder workflow focusing on the intent identification.
For this we use a “Classification agent”.
Its only job is to classify the customer’s intent into a set of categories.
It does this via the “Instructions” provided to it. By default the instructions are set as:
A screenshot of the instructions for the classification agent in the customer service agent builder template.
Classify the user’s intent into one of the following categories: "return_item", "cancel_subscription", or "get_information".
Any device-related return requests should route to return_item.
Any retention or cancellation risk, including any request for discounts should route to cancel_subscription.
Any other requests should go to get_information.
So it’s a simple three-category classification with a few examples (I’d always recommend providing examples where you can) to help explain each one.
However, your business will likely have far more categories you want to use, so just add your additional categories here, in natural language.
And if you get stuck, you can use the pencil icon to help you improve the prompt.
You can also provide further “context” (read, data) to your agent here if required to improve the classification.
A screenshot of the detailed settings for the classification agent in the customer service agent builder template.
You can also choose whether you want to include “Chat history” at this point.
I would highly recommend keeping this checked, as customers’ messages or issues often get split across multiple messages, and you want the AI to review them together when making its classification.
Like with guardrails, you again have the choice of which model you want to use for the classifying agent (by default it is GPT-4.1-mini, a good balance of speed and accuracy).
If, when testing your agent, you find it is misclassifying queries and sending them to the wrong agent, or if you have larger, more complex routing or classifying logic, you may want to upgrade this to a more powerful model such as GPT-5. For most use cases, though, the default will be sufficient.
At all stages of your workflow you can also attach “Tools” to each agent step; I’ll discuss these more in the Information agent section later.
Finally (for the general settings), you will want to specify the output format of your classifying agent. You’ve defined the input via your Instructions, so you know what you want it to do; now you have to decide what format you want that output to take.
It’s best to decide on a simple format for a classifier that splits responses into a few different categories.
By default, if you click response_schema under Output format, you can see that one of three specific tags is applied to each response.
The tags specified should match the ones you have provided as categories in your previous instructions.
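The schema itself is just a small piece of structured-output JSON. Here is a sketch of what it could look like for the three default categories (the exact shape in your template may differ slightly):

```python
# Illustrative JSON schema for the classifier's structured output. The three
# allowed values mirror the categories named in the instructions above.
classification_schema = {
    "name": "response_schema",
    "schema": {
        "type": "object",
        "properties": {
            "intent": {
                "type": "string",
                "enum": ["return_item", "cancel_subscription", "get_information"],
            }
        },
        "required": ["intent"],
        "additionalProperties": False,
    },
}
```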
A screenshot of the classification structured output for the classification agent.
Finally, there are some default “advanced” settings you may want to take a look at too:
Model parameters:
Temperature, Top P - in simple terms, these control the ‘variance’ of your AI agent’s responses. This can be both good and bad for an AI customer service agent: varied responses are difficult to test consistently and reliably, but unvaried responses can seem more ‘robotic’, as they will be the same each time for the same question. In our experience, most businesses prefer consistency over creativity, so we tend to use a lower temperature (less than 1) and Top P.
Max tokens - this can control the length of responses. While the AI agent will follow your instructions on length (later in the workflow), the token limit will put a hard cap on this. The default limit of 2,048 is probably sufficient for most customer service use cases, but if you tend to need super detailed, long responses you may need to extend it (see the sketch after these settings for how these parameters map onto a direct API call).
ChatKit settings:
Display response in chat - I would say in most customer service use cases, showing the reasoning for the intent classification isn’t required, so you can toggle this off.
Show search sources - the same goes for this: it isn’t needed here, though it may be useful later on when generating a final response.
Continue on error - this should be toggled “on”; otherwise you won’t be able to create a path for the error case if an API call fails.
Write to conversation history - This should be kept on, so the AI agent has context of previous decisions made in the conversation.
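For reference, here is roughly how those model parameters map onto a direct call to the model. This is a hedged sketch using the Responses API from the official `openai` Python SDK; Agent Builder sets these for you through the UI, and the values shown are illustrative:

```python
from openai import OpenAI

client = OpenAI()

# Roughly what the classifier's model parameters correspond to if you were
# calling the model directly rather than through Agent Builder.
response = client.responses.create(
    model="gpt-4.1-mini",
    instructions="Classify the user's intent into one of: return_item, cancel_subscription, get_information.",
    input="I'd like a refund for my last order",
    temperature=0.2,         # low temperature -> more consistent answers
    top_p=1.0,
    max_output_tokens=2048,  # hard cap on response length
)

print(response.output_text)
```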
If that all sounds like a little too much work for you, try our AI agents: integrated into your existing help desk, we’ll do it all for you, based on lessons we have learned from managing millions of support tickets.
How do I make sure the AI agent uses the right… AI agent?
This is a pretty simple step as it uses no AI, just plain old conditional logic.
It is an If/else block added to the flow that takes the output from the classification agent we discussed in the last section and passes it to the AI agent responsible for the appropriate task.
So if the customer’s query was about returning an item, it is sent to the “returns agent”; if it was about cancelling a subscription, to the “retention agent”; and if it was any other question, to the “information agent”.
If you created more categories of query in the previous section, you’ll add more conditions here and more routing once you’ve created the respective agents.
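Under the hood this is no more complicated than an if/elif in code. A sketch of the equivalent logic (the agent names here are just placeholders for whatever you build):

```python
def route(intent: str) -> str:
    # Map the classifier's output tag to the agent that should handle it.
    if intent == "return_item":
        return "returns_agent"
    elif intent == "cancel_subscription":
        return "retention_agent"
    else:  # "get_information" and anything unexpected
        return "information_agent"

print(route("cancel_subscription"))  # -> "retention_agent"
```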
A screenshot of the conditional logic section of the customer service agent workflow.
What can an AI customer service agent built with Agent Builder do?
This is where we get to the good stuff - making the AI agent actually do something useful.
In this example we have 3 different agents set up to handle 3 different tasks in 3 different ways, but you can mix and match what each agent is capable of and add as many additional agents as you want to suit your business’ use case.
Let’s go through each agent one-by-one so you get an idea of what they are capable of.
Return agent
Unsurprisingly, the Returns agent is responsible for offering a return to the customer.
In this case it is a very simple process: the AI agent has been given the instruction to “Offer a replacement device with free shipping”.
Chat history has been included (it pretty much always should be), the model is a lightweight, fast one (gpt-4.1-mini) since this is a simple task that doesn’t need a more powerful model, there are no tools required (I’ll explain these later) and the output is plain text.
A screenshot of the return agent in the customer service agent workflow showing its settings.
Once this message has been sent, the agent follows up with a question (created using the User approval block) asking the customer to approve or reject the replacement.
A screenshot of the return agent user approval step in the customer service agent workflow.
And an End block is created with a message back to the customer stating whether or not their return is being processed.
Note that in this example the AI agent would say the return is being processed, but as no tool call has been made, nothing would actually happen; you would need to set up a tool or action to initiate the process.
A screenshot of the structured output of the return agent in the customer service agent workflow.
Retention agent
This is the next level up in complexity as, in addition to an instruction:
You are a customer retention conversational agent whose goal is to prevent subscription cancellations. Ask for their current plan and reason for dissatisfaction. Use the get_retention_offers to identify return options. For now, just say there is a 20% offer available for 1 year.
which requests information from the user, this agent also utilizes a “Tool” called “get_retention_offers”.
A tool allows your AI customer service agent to access information from your business data by submitting an API call and getting a response.
For example, it could request details of a customer’s account or, in this case, find which offers the customer is eligible for.
A screenshot of the retention agent in the customer service agent workflow showing its settings.
By clicking on the Function you can see an example function call that will gather information about the customer to determine what offers they can be given based on their account type, current plan, tenure as a customer and number of complaints they have made.
This information would then be fed back into the instruction to decide how to respond to the customer in that specific instance.
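As a rough illustration, the definition behind a tool like get_retention_offers is a standard function-calling schema. The parameter names below are assumptions based on the description above, not the template’s exact schema:

```python
# Illustrative function tool definition (Responses API style). Parameter names
# are guesses based on the data described above, not the template's exact schema.
get_retention_offers_tool = {
    "type": "function",
    "name": "get_retention_offers",
    "description": "Look up which retention offers a customer is eligible for.",
    "parameters": {
        "type": "object",
        "properties": {
            "account_type": {"type": "string"},
            "current_plan": {"type": "string"},
            "tenure_months": {"type": "integer"},
            "complaint_count": {"type": "integer"},
        },
        "required": ["account_type", "current_plan", "tenure_months", "complaint_count"],
        "additionalProperties": False,
    },
}
```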
A screenshot of the function used in the retention agent.
Information agent
The final, and arguably most powerful (and complex) agent in the template is the Information agent, whose job is to answer any general, informational queries outside of returns and cancellations.
A screenshot of the information agent in the customer service agent workflow showing its settings.
In the case of the template, the agent has all of its “informational knowledge” stuffed inside its instructions.
A screenshot of the detailed instructions used by the informational agent.
However, in practice, the information used by the AI agent would be retrieved via a tool (most likely a vector store).
There are several different types of “Tool” available to each agent:
Client tool
MCP server
File search
Web search
Code interpreter
Let’s go through each one:
A screenshot demonstrating the different tools available to each AI agent in Agent Builder.
Client tool
A client tool is a function or piece of code that can run or call an API to perform an action or retrieve information.
The get_retention_offers function shown in the Retention agent section is an example of a Client tool in use.
A screenshot of the Client tool feature showing an example function.
MCP server
One of the most powerful tools available, this allows the AI agent to flexibly call MCP servers that have already been set up, whether OpenAI’s, a 3rd party’s or your own.
In simple terms an MCP server is like a collection of APIs where the AI can figure out which ones it needs to use in order to solve the problem you have given it.
OpenAI has already released a number of MCPs that will allow your AI agent to retrieve information from: Gmail, Google Calendar, Google Drive, Outlook (Email and Calendar), SharePoint, Microsoft Teams and Dropbox.
3rd parties have also produced their own MCP servers for Zapier, Shopify, Intercom, Stripe, Plaid, Square, Cloudflare, HubSpot, Pipedream, PayPal and Deepwiki.
But you can also build your own MCP to flexibly access data in your own product or business.
So if you want to allow your AI agent to search within these tools, you can with a simple authorization.
In a customer service agent, this could be useful for setting up actions in Shopify or Stripe to handle refund or return flows entirely automatically.
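If you later move beyond the visual builder, the same MCP connection can be expressed as a tool entry when calling the model directly. A hedged sketch (the server URL and label below are placeholders, not a real server):

```python
from openai import OpenAI

client = OpenAI()

# Illustrative: attach a remote MCP server as a tool on a Responses API call.
response = client.responses.create(
    model="gpt-4.1-mini",
    input="Has order #1234 shipped yet?",
    tools=[
        {
            "type": "mcp",
            "server_label": "my_shop_backend",          # placeholder label
            "server_url": "https://example.com/mcp",    # placeholder URL
            "require_approval": "never",  # or require approval for sensitive actions
        }
    ],
)

print(response.output_text)
```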
A screenshot of MCP server options set up in the Agent Builder.
File search
The other powerful tool you can set up is allowing your AI customer service agent to access information stored in files you have uploaded to it.
You can either directly upload a handful of files using the upload feature (for example SOPs, policies or FAQs), or, if you have more content, connect a vector store.
Setting this up takes a little more work on your side, as you will need to create the vector store in the first instance (a database that stores your file content in an ordered way that an AI can search easily).
You can either do this directly in OpenAI (though you will be limited to uploading files only), or use a third-party provider such as Pinecone, which you can integrate with services such as Apify to crawl websites.
Once you have this connected your AI agent will be able to reference all of your content when answering, effectively becoming your most knowledgeable agent instantly.
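Here is a minimal sketch of the OpenAI route: upload a file, create a vector store, then let the model search it via the file_search tool. Treat it as indicative; method locations can vary slightly between SDK versions (older ones nest vector stores under client.beta), and the file name is a placeholder:

```python
from openai import OpenAI

client = OpenAI()

# 1. Upload a knowledge file (e.g. an FAQ or policy doc).
faq_file = client.files.create(file=open("faq.pdf", "rb"), purpose="assistants")

# 2. Create a vector store and attach the file to it.
#    (On older SDK versions these calls live under client.beta.vector_stores.)
store = client.vector_stores.create(name="support-knowledge")
client.vector_stores.files.create(vector_store_id=store.id, file_id=faq_file.id)
# In production you'd wait for indexing to finish before querying
# (the SDK provides polling helpers for this).

# 3. Answer a question grounded in that store via the file_search tool.
response = client.responses.create(
    model="gpt-4.1-mini",
    input="What is your returns policy?",
    tools=[{"type": "file_search", "vector_store_ids": [store.id]}],
)

print(response.output_text)
```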
A screenshot showing how the vector store is created in the OpenAI platform.
Web search
Probably less useful in the AI customer service agent world, but you can also allow your AI agent to search the web for answers.
You probably don’t want to allow this, as it becomes much harder to keep your agent’s responses accurate, and it is more likely to recommend competitors or refer to outdated information.
For a customer service agent, I would stick to file search and MCPs as your knowledge source.
A screenshot showing how to set up a web search tool for an AI agent.
Code interpreter
Again, unless you have a highly technical product and want your AI agent to do analysis on behalf of your customers, the code interpreter tool is going to be less useful for your AI customer service agent.
It allows you to run specific Python code or let your AI agent analyze files that are uploaded to it.
A screenshot of the code interpreter tool upload screen in the Agent Builder.
How do I edit my ChatKit widget?
Now you’ve built your AI customer service agent with the Agent Builder, you’ll probably want to make it look pretty!
Well, luckily OpenAI have their own ChatKit widget builder too, which allows you to customize almost all aspects of your widget.
You can start by going to your ChatKit playground; from there you can edit almost every aspect of the widget’s look and behavior.
What’s the fastest way to build a customer service agent with OpenAI Agent Builder + ChatKit?
Start from the “Customer service” ChatKit template, define inputs (text + optional state variables like customer_id/plan), enable guardrails, add an intent Classification agent, route with simple if/else logic, and implement task-specific agents (e.g., Returns, Retention, Information) with the right Tools. Finish by testing and styling via the ChatKit playground.
Which data sources and tools should I connect for reliable answers?
Prefer owned/curated sources: File search (uploaded docs or a vector store) and MCP servers/Client tools (e.g., Shopify, Stripe, Intercom) for live account/order actions. Avoid open Web search for support flows to prevent drift or competitor mentions. Use Code interpreter only if you truly need file/data analysis.
How do I handle PII, jailbreaks, and hallucinations in Agent Builder?
Turn on guardrails: automatic PII redaction, content moderation, jailbreak detection, and hallucination checks against your knowledge base. Set your confidence thresholds, pick balanced models, and define an error fallback (e.g., safe handoff) so the flow degrades gracefully.
How does intent classification and routing work in this setup?
A Classification agent maps messages to tags (e.g., return_item, cancel_subscription, get_information) using clear instructions and examples - ideally with chat history enabled. Then a conditional block routes to the right agent. Start with a fast, capable model; tighten instructions or upgrade if you see misroutes.
When should I use Agent Builder instead of a turnkey platform like My AskAI?
Choose Agent Builder if you want deep customization, custom Tools/MCP workflows, and have the resources to maintain prompts, guardrails, and data pipelines. Choose a dedicated platform for faster Shopify/help-desk integration, built-in deflection analytics, policy-aware Tasks, and predictable rollout with less engineering lift.