When an OpenAI finance analyst wanted to compare revenue across geographies and customer cohorts last year, it took hours of work — searching through 70,000 datasets, writing SQL queries, verifying table schemas. Today, the same analyst types a plain-English question into Slack and gets a finished chart in minutes.
The tool behind that transformation was built by two engineers in three months. Seventy percent of its code was written by AI. And it's now used daily by thousands of OpenAI's employees — making it one of the most aggressive deployments of an AI data agent inside any company, anywhere.
In an exclusive interview with VentureBeat, Emma Tang, the head of data infrastructure at OpenAI whose team built the agent, offered a rare look inside the system — how it works, how it fails, and what it signals about the future of enterprise data. The conversation, paired with the company's blog post announcing the tool, paints a picture of a company that turned its own AI on itself and discovered something every enterprise will soon confront: the bottleneck to smarter organizations isn't better models. It's better data.
"The agent is used for any sort of analysis," Tang said. "Almost every team in the company uses it."
A plain-English interface to 600 petabytes of company data
To understand why OpenAI built this system, consider the scale of the problem. The company's data platform spans more than 600 petabytes across 70,000 datasets. Even locating the right table can consume hours of a data scientist's time. Tang's Data Platform team — which sits under infrastructure and oversees big data systems, streaming, and the data tooling layer — serves a staggering internal user base. "There are 5,000 employees at OpenAI right now," Tang said. "Over 4,000 use data tools that our team provides."
The agent, built on GPT-5.2 and available wherever employees already work — Slack, a web interface, IDEs, the Codex CLI, and OpenAI's internal ChatGPT app — accepts plain-English questions and returns charts, dashboards, and long-form analytical reports. In follow-up responses shared with VentureBeat on background, the team estimated it saves two to four hours of work per query. But Tang emphasized that the larger win is harder to measure: the agent gives people access to analysis they simply couldn't have done before, regardless of how much time they had.
"Engineers, growth, product, as well as non-technical teams, who may not know all the ins and outs of the company data systems and table schemas" can now pull sophisticated insights on their own, her team noted.
From revenue breakdowns to latency debugging, one agent does it all
Tang walked through several concrete use cases that illustrate the agent's range. OpenAI's finance team queries it for revenue comparisons across geographies and customer cohorts. "It can, just literally in plain text, send the agent a query, and it will be able to respond and give you charts and give you dashboards, all of these things," she said.
But the real power lies in strategic, multi-step analysis. Tang described a recent case where a user noticed discrepancies between two dashboards tracking Plus subscriber growth. "The data agent can give you a chart and show you, stack rank by stack rank, exactly what the differences are," she said. "There turned out to be five different factors. For a human, that would take hours, if not days, but the agent can do it in a few minutes."
Product managers use it to understand feature adoption. Engineers use it to diagnose performance regressions — asking, for instance, whether a particular ChatGPT component really is slower than yesterday, and if so, which latency components explain the change. The agent can break it all down and compare prior periods from a single prompt.
What makes this especially unusual is that the agent operates across organizational boundaries. Most enterprise AI agents today are siloed within departments — a finance bot here, an HR bot there. OpenAI's cuts horizontally across the company. Tang said they launched division by division, curating specific memory and context for each team, but "at some point it's all in the same database." A senior leader can combine sales data with engineering metrics and product analytics in a single query. "That's a really unique feature of ours," Tang said.
How Codex solved the hardest problem in enterprise data
Finding the right table among 70,000 datasets is, by Tang's own admission, the single hardest technical challenge her team faces. "That's the biggest problem with this agent," she said. And it's where Codex — OpenAI's AI coding agent — plays its most creative role.
Codex serves triple duty in the system. Users access the data agent through Codex via MCP. The team used Codex to generate more than 70% of the agent's own code, enabling two engineers to ship in three months. But the third role is the most technically fascinating: a daily asynchronous process in which Codex examines important data tables, analyzes the underlying pipeline code, and determines each table's upstream and downstream dependencies, ownership, granularity, join keys, and related tables.
"We give it a prompt, have Codex look at the code and respond with what we need, and then persist that to the database," Tang explained. When a user later asks about revenue, the agent searches a vector database to find which tables Codex has already mapped to that concept.
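The enrichment-then-retrieval flow Tang describes can be sketched roughly as follows. Everything here — the table names, the persisted descriptions, and the bag-of-words stand-in for a real embedding model — is illustrative, not OpenAI's implementation:

```python
import math
from collections import Counter

# Hypothetical records a daily Codex enrichment pass might persist for each
# table: names, join keys, and descriptions are all made up for illustration.
ENRICHED_TABLES = {
    "finance.revenue_daily": "daily revenue by geography and customer cohort; join key: customer_id",
    "growth.plus_subscribers": "Plus subscriber counts by day and signup channel; join key: user_id",
    "infra.request_latency": "per-component ChatGPT request latency percentiles by hour",
}

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding', standing in for a real embedding model."""
    return Counter(text.lower().replace(";", " ").replace(":", " ").split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def find_candidate_tables(question: str, top_k: int = 2) -> list[str]:
    """Rank enriched tables by similarity to the user's question."""
    q = embed(question)
    scored = sorted(
        ENRICHED_TABLES,
        key=lambda name: cosine(q, embed(ENRICHED_TABLES[name])),
        reverse=True,
    )
    return scored[:top_k]

# A revenue question surfaces the revenue table first
print(find_candidate_tables("compare revenue across geography and customer cohort"))
```

In a production system the toy `embed` would be replaced by a real embedding model and the dictionary by a vector database, but the shape of the lookup is the same.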
This "Codex Enrichment" is one of six context layers the agent uses. The layers range from basic schema metadata and curated expert descriptions to institutional knowledge pulled from Slack, Google Docs, and Notion, plus a learning memory that stores corrections from earlier conversations. When no prior information exists, the agent falls back to live queries against the data warehouse.
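The article doesn't say how the six layers are consulted; a minimal sketch of one plausible tiered lookup, with the layer contents and the live-query fallback stubbed out as assumptions:

```python
from typing import Callable, Optional

def make_layer(store: dict) -> Callable[[str], Optional[str]]:
    """A context layer is just a lookup that may or may not know a topic."""
    return lambda topic: store.get(topic)

# Layer names follow the article's description; contents are invented.
layers = [
    ("codex_enrichment", make_layer({"revenue": "finance.revenue_daily (joins on customer_id)"})),
    ("schema_metadata", make_layer({"latency": "infra.request_latency: component, p50_ms, p99_ms"})),
    ("expert_descriptions", make_layer({})),
    ("institutional_docs", make_layer({})),   # Slack / Google Docs / Notion
    ("learning_memory", make_layer({})),      # corrections from past conversations
]

def live_warehouse_probe(topic: str) -> str:
    """Last resort: issue live queries against the warehouse (stubbed here)."""
    return f"LIVE QUERY: sampled rows for '{topic}'"

def gather_context(topic: str) -> str:
    """Check layers in priority order; fall back to a live probe if all miss."""
    for name, lookup in layers:
        hit = lookup(topic)
        if hit is not None:
            return f"{name}: {hit}"
    return live_warehouse_probe(topic)

print(gather_context("revenue"))   # served from the enrichment layer
print(gather_context("churn"))     # nothing cached, so it falls through
```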
The team also tiers historical query patterns. "All query history is everybody's 'select star, limit 10.' It's not really helpful," Tang said. Canonical dashboards and executive reports — where analysts invested significant effort determining the right representation — get flagged as "source of truth." Everything else gets deprioritized.
The prompt that forces the AI to slow down and think
Even with six context layers, Tang was remarkably candid about the agent's biggest behavioral flaw: overconfidence. It's a problem anyone who has worked with large language models will recognize.
"It's a really big problem, because what the model often does is feel overconfident," Tang said. "It'll say, 'This is the right table,' and just go forth and start doing analysis. That's actually the wrong approach."
The fix came through prompt engineering that forces the agent to linger in a discovery phase. "We found that the more time it spends gathering possible scenarios and evaluating which table to use — just spending more time in the discovery phase — the better the results," she said. The prompt reads almost like coaching a junior analyst: "Before you run ahead with this, I really want you to do more validation on whether this is the right table. So please check more sources before you go and create actual data."
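OpenAI hasn't published the actual prompt. A hedged sketch of how a discovery-first instruction might be wired into a Responses API request — the wording is paraphrased from Tang's description, and the network call is left commented out:

```python
# Sketch of a discovery-first system prompt. The exact wording OpenAI uses
# is not public; this text paraphrases Tang's description in the interview.
DISCOVERY_INSTRUCTIONS = (
    "Before you run ahead with analysis, spend time in a discovery phase: "
    "enumerate several candidate tables, compare their schemas and join keys, "
    "and validate that you have the right table. Please check more sources "
    "before you go and create actual data."
)

def build_request(question: str) -> dict:
    """Assemble the payload an agent might send to the Responses API."""
    return {
        "model": "gpt-5.2",  # the model cited in the article
        "instructions": DISCOVERY_INSTRUCTIONS,
        "input": question,
    }

req = build_request("Why do the two Plus subscriber dashboards disagree?")
# from openai import OpenAI
# resp = OpenAI().responses.create(**req)  # requires an API key; not run here
print(req["model"])
```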
The team also found, through rigorous evaluation, that less context can produce better results. "It's very easy to dump everything in and just expect it to do better," Tang said. "From our evals, we actually found the opposite. The fewer things you give it, and the more curated and accurate the context is, the better the results."
To build trust, the agent streams its intermediate reasoning to users in real time, exposes which tables it selected and why, and links directly to underlying query results. Users can interrupt the agent mid-analysis to redirect it. The system also checkpoints its progress, enabling it to resume after failures. And at the end of every task, the model evaluates its own performance. "We ask the model, 'how did you think that went? Was that good or bad?'" Tang said. "And it's actually pretty good at evaluating how well it's doing."
Guardrails that are deliberately simple — and surprisingly effective
When it comes to safety, Tang took a pragmatic approach that may surprise enterprises expecting sophisticated AI alignment techniques.
"I think you just want to have a lot more dumb guardrails," she said. "We have really strong access control. It's always using your personal token, so whatever you have access to is just what you have access to."
The agent operates purely as an interface layer, inheriting the same permissions that govern OpenAI's data. It never appears in public channels — only in private channels or a user's own interface. Write access is restricted to a temporary test schema that gets wiped periodically and can't be shared. "We don't let it randomly write to systems either," Tang said.
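The article doesn't detail enforcement, but one "dumb guardrail" in this spirit is a statement-level check that passes reads through and confines writes to a scratch schema. The schema name and regexes below are assumptions:

```python
import re

SCRATCH_SCHEMA = "tmp_agent"  # hypothetical temporary schema, wiped periodically

# Any statement starting with a write verb is a write.
WRITE_VERBS = re.compile(r"^\s*(INSERT|UPDATE|DELETE|CREATE|DROP|ALTER)\b", re.I)
# Writes are only allowed when they explicitly target the scratch schema.
SCRATCH_TARGET = re.compile(
    rf"^\s*(INSERT\s+INTO|CREATE\s+TABLE|DROP\s+TABLE)\s+{SCRATCH_SCHEMA}\.", re.I
)

def allow_statement(sql: str) -> bool:
    """Reads pass through; writes are permitted only into the scratch schema."""
    if not WRITE_VERBS.match(sql):
        return True  # SELECT etc. — governed upstream by the user's own token
    return bool(SCRATCH_TARGET.match(sql))

assert allow_statement("SELECT * FROM finance.revenue_daily LIMIT 10")
assert allow_statement("CREATE TABLE tmp_agent.cohort_scratch AS SELECT 1")
assert not allow_statement("DELETE FROM finance.revenue_daily")
```

A real deployment would do this with database grants rather than regexes, but the point of a "dumb" guardrail is exactly that it is easy to reason about.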
User feedback closes the loop. Employees flag incorrect results directly, and the team investigates. The model's self-evaluation adds another check. Longer term, Tang said, the plan is to move toward a multi-agent architecture where specialized agents monitor and assist each other. "We're moving towards that eventually," she said, "but right now, even as it is, we've gotten pretty far."
Why OpenAI won't sell this tool — but wants you to build your own
Despite the obvious commercial potential, OpenAI told VentureBeat that the company has no plans to productize its internal data agent. The strategy is to offer building blocks and let enterprises assemble their own. And Tang made clear that everything her team used to build the system is already available externally.
"We use all the same APIs that are available externally," she said. "The Responses API, the Evals API. We don't have a fine-tuned model. We just use 5.2. So you can definitely build this."
That message aligns with OpenAI's broader enterprise push. The company launched OpenAI Frontier in early February, an end-to-end platform for enterprises to build and manage AI agents. It has since enlisted McKinsey, Boston Consulting Group, Accenture, and Capgemini to help sell and implement the platform. AWS and OpenAI are jointly developing a Stateful Runtime Environment for Amazon Bedrock that mirrors some of the persistent context capabilities OpenAI built into its data agent. And Apple recently integrated Codex directly into Xcode.
According to information shared with VentureBeat by OpenAI, Codex is now used by 95% of engineers at OpenAI and reviews all pull requests before they're merged. Its global weekly active user base has tripled since the start of the year, surpassing one million. Overall usage has grown more than fivefold.
Tang described a shift in how employees use Codex that transcends coding entirely. "Codex isn't even a coding tool anymore. It's much more than that," she said. "I see non-technical teams use it to organize thoughts and create slides and to create daily summaries." One of her engineering managers has Codex review her notes each morning, identify the most important tasks, pull in Slack messages and DMs, and draft responses. "It's really working on her behalf in a lot of ways," Tang said.
The unsexy prerequisite that will determine who wins the AI agent race
When asked what other enterprises should take away from OpenAI's experience, Tang didn't point to model capabilities or clever prompt engineering. She pointed to something far more mundane.
"This isn't sexy, but data governance is really important for data agents to work well," she said. "Your data needs to be clean enough and annotated enough, and there needs to be a source of truth somewhere for the agent to crawl."
The underlying infrastructure — storage, compute, orchestration, and business intelligence layers — hasn't been replaced by the agent. It still needs all of those tools to do its job. But it serves as a fundamentally new access point for data intelligence, one that is more autonomous and accessible than anything that came before it.
Tang closed the interview with a warning for companies that hesitate. "Companies that adopt this are going to see the benefits very soon," she said. "And companies that don't are going to fall behind. It's going to pull apart. The companies who use it are going to advance very, very quickly."
Asked whether that acceleration worried her own colleagues — especially after a wave of recent layoffs at companies like Block — Tang paused. "How much we're able to do as a company has accelerated," she said, "but it still doesn't match our ambitions, not even one bit."

