A Complete Developer Suite for Building Production-Ready AI Agents
Developing intelligent agents has historically been a fragmented and time-consuming process. Teams face challenges ranging from orchestrating complex workflows to building custom interfaces, all while struggling to maintain version control and evaluate performance effectively. OpenAI is addressing these pain points with AgentKit—a comprehensive platform that streamlines every phase of agent development from design through deployment.
Visual Workflow Design Made Simple
Agent Builder transforms the way teams create multi-agent workflows by replacing complex code with an intuitive visual interface. This drag-and-drop canvas allows developers to compose logic using nodes, integrate various tools, and implement custom safety measures—all while maintaining clear visibility into workflow operations.
The platform includes preview functionality and inline evaluation configuration, enabling rapid iteration with complete version tracking. Teams can either start from scratch or accelerate development using pre-configured templates tailored to common use cases.
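To make the idea of composing logic from nodes concrete, here is a minimal sketch in plain Python. It is not Agent Builder's export format or API. The node names (`classify`, `billing`, `general`), the `WORKFLOW` mapping, and the `run_workflow` helper are all hypothetical, chosen only to show how a canvas of connected nodes can be modeled as functions that each name the next node to run.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, str]
# A node receives the current state and returns the updated state plus
# the name of the next node to run (None ends the workflow).
Node = Callable[[State], Tuple[State, Optional[str]]]


def classify(state: State) -> Tuple[State, Optional[str]]:
    # Hypothetical routing rule: keyword match stands in for a model call.
    topic = "billing" if "invoice" in state["input"].lower() else "general"
    return {**state, "topic": topic}, topic


def billing(state: State) -> Tuple[State, Optional[str]]:
    return {**state, "output": "Routing to the billing agent."}, None


def general(state: State) -> Tuple[State, Optional[str]]:
    return {**state, "output": "Routing to the general agent."}, None


# The "canvas": node names mapped to their implementations.
WORKFLOW: Dict[str, Node] = {
    "classify": classify,
    "billing": billing,
    "general": general,
}


def run_workflow(
    workflow: Dict[str, Node], start: str, state: State, max_steps: int = 20
) -> Tuple[State, List[str]]:
    """Run nodes until one returns None; also return the path taken."""
    path: List[str] = []
    current: Optional[str] = start
    while current is not None:
        if len(path) >= max_steps:
            raise RuntimeError("workflow did not terminate")
        path.append(current)
        state, current = workflow[current](state)
    return state, path
```

Calling `run_workflow(WORKFLOW, "classify", {"input": "Where is my invoice?"})` returns the final state together with the list of nodes visited, which is the kind of visibility into workflow operations that a visual canvas provides graphically.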
Real-World Impact:
At Ramp, the development team used Agent Builder to create a buyer agent in hours, work that previously required months of orchestration, custom coding, and manual optimization. The team reports that the visual approach reduced iteration cycles by 70% and shortened delivery timelines from two quarters to two sprints. Cross-functional collaboration also improved, because product, legal, and engineering teams could work together in a single interface.
LY Corporation, a prominent Japanese technology company, saw similar results when it built a workplace assistant agent in under two hours. The visual orchestration environment let engineers and subject matter experts work on the same canvas, which sped up both agent creation and deployment.
Centralized Data Governance and Safety
The Connector Registry addresses a critical enterprise need: managing data sources and integrations across multiple organizational workspaces. This centralized administration panel consolidates access to pre-built connectors for popular platforms like Dropbox, Google Drive, SharePoint, and Microsoft Teams, alongside third-party integrations.
Safety is integrated directly into the development workflow through Guardrails—an open-source, modular protection layer. This system helps prevent unintended or malicious agent behavior by masking personally identifiable information, detecting jailbreak attempts, and applying customizable safeguards. Teams can implement Guardrails either as a standalone solution or through dedicated libraries available for both Python and JavaScript.
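The modular idea behind such a protection layer can be illustrated with a small, self-contained sketch. This is not the Guardrails library or its API. The `GuardrailResult` type, the `mask_pii` and `detect_jailbreak` checks, and the `run_guardrails` helper are hypothetical names. The regular expressions and phrase list are deliberately naive stand-ins for the detection a production system would use.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class GuardrailResult:
    text: str                      # possibly rewritten text
    blocked: bool = False          # True if the input should not reach the agent
    reasons: List[str] = field(default_factory=list)


# Each guardrail is a small function that inspects or rewrites the result.
Guardrail = Callable[[GuardrailResult], GuardrailResult]

# Naive illustrative patterns; real PII detection needs far more care.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

# Assumed phrase list, for illustration only.
JAILBREAK_PHRASES = ("ignore previous instructions", "disregard your rules")


def mask_pii(result: GuardrailResult) -> GuardrailResult:
    masked = EMAIL_RE.sub("[EMAIL]", result.text)
    masked = PHONE_RE.sub("[PHONE]", masked)
    if masked != result.text:
        result.reasons.append("pii_masked")
    result.text = masked
    return result


def detect_jailbreak(result: GuardrailResult) -> GuardrailResult:
    lowered = result.text.lower()
    if any(phrase in lowered for phrase in JAILBREAK_PHRASES):
        result.blocked = True
        result.reasons.append("possible_jailbreak")
    return result


def run_guardrails(text: str, guardrails: Sequence[Guardrail]) -> GuardrailResult:
    """Apply guardrails in order and stop at the first one that blocks."""
    result = GuardrailResult(text=text)
    for guardrail in guardrails:
        result = guardrail(result)
        if result.blocked:
            break
    return result
```

Because each check is an independent function, a team could enable only the safeguards it needs, for example `run_guardrails(user_message, [mask_pii, detect_jailbreak])`, and add custom checks without touching the others.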
Deploying User-Facing Chat Interfaces
Creating sophisticated chat experiences typically involves complex engineering work: managing streaming responses, handling conversation threads, displaying reasoning processes, and designing engaging interactions. ChatKit eliminates this complexity by providing ready-to-embed chat components that feel native to any product.
The toolkit is designed for flexible integration into websites and applications with full customization options to match brand aesthetics and themes. Canva’s development team reported saving over two weeks by using ChatKit to build a support agent for their developer community. The integration took less than an hour and transformed their documentation into an interactive conversational experience.
ChatKit currently supports diverse applications including internal knowledge bases, employee onboarding systems, customer support solutions, and research assistants. HubSpot has successfully deployed their customer support agent using this framework.
Comprehensive Performance Evaluation
Building agents that perform reliably in production requires rigorous testing and measurement. The expanded Evals platform now offers four critical capabilities:
Dataset Management: Rapidly construct evaluation scenarios from scratch and expand them iteratively using automated graders and human feedback.
Trace Grading: Conduct comprehensive end-to-end assessments of complete agentic workflows, with automated grading that identifies specific weaknesses.
Automated Prompt Optimization: Generate improved prompts automatically based on human annotations and grading system outputs.
Third-Party Model Support: Evaluate models from various providers within a single unified platform.
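As a rough illustration of how dataset-driven evaluation and trace grading fit together, the sketch below grades every step of an agent's trace against a small dataset and reports which step fails most often. It does not use the Evals platform or its API. `Step`, `Case`, `grade_trace`, `run_eval`, and the substring-match grading rule are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    name: str     # e.g. "retrieve" or "answer"
    output: str   # what the agent produced at this step


@dataclass
class Case:
    prompt: str
    expected: Dict[str, str]  # expected substring per step name


def grade_trace(trace: List[Step], case: Case) -> Dict[str, bool]:
    """Pass/fail per step. Steps without an expectation pass by default."""
    return {
        step.name: case.expected.get(step.name, "").lower() in step.output.lower()
        for step in trace
    }


def run_eval(agent: Callable[[str], List[Step]], dataset: List[Case]) -> dict:
    if not dataset:
        raise ValueError("dataset must contain at least one case")
    passed = 0
    failures_by_step: Dict[str, int] = {}
    for case in dataset:
        grades = grade_trace(agent(case.prompt), case)
        if all(grades.values()):
            passed += 1
        for name, ok in grades.items():
            if not ok:
                failures_by_step[name] = failures_by_step.get(name, 0) + 1
    return {
        "pass_rate": passed / len(dataset),
        "failures_by_step": failures_by_step,
    }


def toy_agent(prompt: str) -> List[Step]:
    # Hypothetical stand-in for a real agent: always gives the same answer.
    return [
        Step("retrieve", f"docs about {prompt}"),
        Step("answer", "Refunds take 5 days"),
    ]


DATASET = [
    Case("refunds", {"retrieve": "refunds", "answer": "5 days"}),
    Case("shipping", {"retrieve": "shipping", "answer": "2 days"}),
]
```

Running `run_eval(toy_agent, DATASET)` shows the value of grading whole traces rather than final outputs alone: the retrieval step passes on both cases, so the report points at the answer step as the weak point.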
Early users report measurable gains. Carlyle reduced development time on its multi-agent due diligence framework by over 50% while improving agent accuracy by 30%.
Advanced Model Customization
Reinforcement fine-tuning enables developers to customize reasoning models for specific use cases. Currently available on OpenAI o4-mini and in private beta for GPT-5, this capability is being refined through close collaboration with numerous customers.
Two new features enhance this customization process:
Custom Tool Calls: Train models to select and invoke appropriate tools at optimal times for superior reasoning performance.
Custom Graders: Define evaluation criteria that align with specific business requirements and use case priorities.
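One way to picture a custom grader is as a function that turns a model output into a score reflecting business priorities. The sketch below is an assumption-laden illustration, not the grader format that the fine-tuning API expects. The `grade` function, the `WEIGHTS` table, and the dictionary fields (`tool`, `arguments`, `answer`) are hypothetical. It rewards choosing the right tool most heavily, which loosely mirrors the goal of training models to invoke appropriate tools.

```python
from typing import Any, Dict

# Hypothetical weights expressing what matters most for this use case.
WEIGHTS = {"tool": 0.5, "arguments": 0.2, "answer": 0.3}


def _normalize(value: Any) -> str:
    return str(value).strip().lower()


def grade(sample: Dict[str, Any], reference: Dict[str, Any]) -> float:
    """Return a score between 0.0 and 1.0 for one model output."""
    score = 0.0
    if sample.get("tool") == reference.get("tool"):
        score += WEIGHTS["tool"]
        # Arguments only earn credit when the right tool was chosen.
        if sample.get("arguments") == reference.get("arguments"):
            score += WEIGHTS["arguments"]
    if _normalize(sample.get("answer", "")) == _normalize(reference.get("answer", "")):
        score += WEIGHTS["answer"]
    return round(score, 6)
```

Adjusting `WEIGHTS` is how a team would encode its own priorities in this sketch, for instance by giving the final answer more weight than tool selection in a use case where any tool path is acceptable.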
Getting Started
ChatKit and the enhanced Evals capabilities are available now to all developers. Agent Builder is available in beta. The Connector Registry is beginning its beta rollout to select API, ChatGPT Enterprise, and Education customers who have a Global Admin Console, which is a prerequisite for enabling the Connector Registry.
All these tools are included with standard API model pricing, with no additional fees for the development and deployment capabilities. Upcoming additions include a standalone Workflows API and expanded agent deployment options within ChatGPT.
This comprehensive suite eliminates the traditional barriers to agent development—fragmented tooling, extensive frontend work, manual evaluation processes, and complex orchestration—allowing teams to focus on creating innovative solutions rather than managing infrastructure.