The rumors were true: OpenAI on Thursday announced the release of its new frontier large language model (LLM) family, GPT-5.2.
It comes at a pivotal moment for the AI pioneer, which has faced intensifying pressure since rival Google's Gemini 3 LLM seized the top spot on major third-party performance leaderboards and many key benchmarks last month. OpenAI leaders stressed in a press briefing, however, that the timing of this release had been discussed and worked on well in advance of the release of Gemini 3.
OpenAI describes GPT-5.2 as its "most capable model series yet for professional knowledge work," aiming to reclaim the performance crown with significant gains in reasoning, coding, and agentic workflows.
"It's our most advanced frontier model and the strongest yet in the market for professional use," Fidji Simo, OpenAI's CEO of Applications, said during a press briefing today. "We designed 5.2 to unlock even more economic value for people. It's better at creating spreadsheets, building presentations, writing code, perceiving images, understanding long context, using tools, and handling complex, multi-step projects."
GPT-5.2 features a large 400,000-token context window, allowing it to ingest hundreds of documents or large code repositories at once, and a 128,000-token maximum output limit, enabling it to generate extensive reports or full applications in a single pass.
The model also has a knowledge cutoff of August 31, 2025, keeping it up to date with relatively recent world events and technical documentation. It explicitly includes "Reasoning token support," confirming the underlying architecture uses the chain-of-thought processing popularized by the "o1" series.
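The context and output limits quoted above can be turned into a simple budget check. The sketch below is illustrative only: the function name is hypothetical, and it assumes that prompt and output tokens share the same 400,000-token window, which the article does not confirm.

```python
# Limits quoted in the article for GPT-5.2.
CONTEXT_WINDOW_TOKENS = 400_000
MAX_OUTPUT_TOKENS = 128_000


def output_budget(prompt_tokens: int, desired_output_tokens: int) -> int:
    """Return how many output tokens can be requested for a given prompt.

    Assumption (not confirmed by the article): prompt and output tokens
    share the same 400,000-token context window.
    """
    if prompt_tokens < 0 or desired_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    remaining = CONTEXT_WINDOW_TOKENS - prompt_tokens
    if remaining <= 0:
        # The prompt alone fills (or overflows) the window.
        return 0
    # The request is capped by the output limit and by the space left.
    return min(desired_output_tokens, MAX_OUTPUT_TOKENS, remaining)
```

Under that assumption, a 300,000-token prompt leaves room for at most 100,000 output tokens, while a 100,000-token prompt is capped by the 128,000-token output limit instead.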
The 'Code Red' Reality Check
The release arrives following The Information's report of an emergency "Code Red" directive to OpenAI staff from CEO Sam Altman to improve ChatGPT, a move reportedly designed to mobilize resources following the "quality gap" exposed by Gemini 3. The Verge similarly reported on the timing of GPT-5.2's release ahead of the official announcement.
During the briefing, OpenAI executives acknowledged the directive but pushed back on the narrative that the model was rushed solely to answer Google.
"It is important to note this has been in the works for many, many months," Simo told reporters. She clarified that while the "Code Red" helped focus the company, it wasn't the sole driver of the timeline.
"We announced this Code Red to really signal to the company that we want to marshal resources in one particular area… but that's not the reason it's coming out this week in particular."
Max Schwarzer, lead of OpenAI's post-training team, echoed this sentiment to dispel the idea of a panic launch. "We've been planning for this release since a very long time ago… this specific week we talked about many months ago."
A spokesperson from OpenAI further clarified that the "Code Red" call applied to ChatGPT as a product, not solely underlying model development or the release of new models.
Under the Hood: Instant, Thinking, and Pro
OpenAI is segmenting the GPT-5.2 release into three distinct tiers within ChatGPT, a strategy likely designed to balance the large compute costs of "reasoning" models with user demand for speed:
- GPT-5.2 Instant: Optimized for speed and daily tasks like writing, translation, and information seeking.
- GPT-5.2 Thinking: Designed for "complex, structured work" and long-running agents, this model leverages deeper reasoning chains to handle coding, math, and multi-step projects.
- GPT-5.2 Pro: The new heavyweight champion. OpenAI describes this as its "smartest and most trustworthy option," delivering the highest accuracy for difficult questions where quality outweighs latency.
For developers, the models are available immediately in the application programming interface (API) as gpt-5.2, gpt-5.2-chat-latest (Instant), and gpt-5.2-pro.
The Numbers: Beating the Benchmarks
The GPT-5.2 release includes leading metrics across most domains, especially those that target the "professional knowledge work" gap where competitors have recently gained ground.
OpenAI highlighted a new benchmark called GDPval, which measures performance on "well-specified knowledge work tasks" across 44 occupations.
"GPT-5.2 Thinking is now state-of-the-art on that benchmark… and beats or ties top industry professionals on 70.9% of well-specified professional tasks like spreadsheets, presentations, and document creation, according to expert human judges," Simo said.
In the critical arena of coding, OpenAI is claiming a decisive lead. Schwarzer noted that on SWE-bench Pro, a rigorous evaluation of real-world software engineering, GPT-5.2 Thinking sets a new state-of-the-art score of 55.6%.
He emphasized that this benchmark is "more contamination resistant, challenging, diverse, and industrially relevant than previous benchmarks like SWE-bench Verified."
Other key benchmark results include:
- GPQA Diamond (Science): GPT-5.2 Pro scored 93.2%, edging out GPT-5.2 Thinking (92.4%) and surpassing GPT-5.1 Thinking (88.1%).
- FrontierMath: On Tier 1-3 problems, GPT-5.2 Thinking solved 40.3%, a significant jump from the 31.0% achieved by its predecessor.
- ARC-AGI-1: GPT-5.2 Pro is reportedly the first model to cross the 90% threshold on this general reasoning benchmark, scoring 90.5%.
The Price of Intelligence
Performance comes at a premium. While ChatGPT subscription pricing remains unchanged for now, the API costs for the new flagship models are steep compared to previous generations, reflecting the high compute demands of "thinking" mode. They are also at the upper end of API costs for the industry.
- GPT-5.2 Thinking: Priced at $1.75 per 1 million input tokens and $14 per 1 million output tokens.
- GPT-5.2 Pro: The costs jump significantly to $21 per 1 million input tokens and $168 per 1 million output tokens.
GPT-5.2 Thinking is priced 40% higher in the API than the standard GPT-5.1 ($1.25/$10), signaling that OpenAI views the new reasoning capabilities as a tangible value-add rather than a mere efficiency update.
The high-end GPT-5.2 Pro follows the same pattern, costing 40% more than the previous GPT-5 Pro ($15/$120). While expensive, it still undercuts OpenAI's most specialized reasoning model, o1-pro, which remains the most expensive offering on the menu at a staggering $150 per million input tokens and $600 per million output tokens.
OpenAI argues that despite the higher per-token cost, the model's "greater token efficiency" and ability to solve tasks in fewer turns make it economically viable for high-value enterprise workflows.
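To make the per-token pricing concrete, here is a minimal sketch that computes the cost of a single request and checks the 40% premium quoted above. The prices come from the article; the function names and the example request size are hypothetical.

```python
# Per-million-token API prices in USD (input, output), as quoted in the article.
PRICES = {
    "GPT-5.1": (1.25, 10.00),
    "GPT-5.2 Thinking": (1.75, 14.00),
    "GPT-5 Pro": (15.00, 120.00),
    "GPT-5.2 Pro": (21.00, 168.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed per-million prices."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def premium_percent(new_model: str, old_model: str) -> tuple[float, float]:
    """Return the (input, output) price increase of new_model over old_model, in percent."""
    new_in, new_out = PRICES[new_model]
    old_in, old_out = PRICES[old_model]
    return (
        round((new_in / old_in - 1) * 100, 1),
        round((new_out / old_out - 1) * 100, 1),
    )
```

For an illustrative request of 50,000 input tokens and 2,000 output tokens, `request_cost("GPT-5.2 Thinking", 50_000, 2_000)` comes to about $0.12, and `premium_percent` returns 40.0 for both input and output in each of the two generation-over-generation comparisons.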
Here's how it compares to the current API costs for other competing models across the LLM space:
| Model | Input (/1M) | Output (/1M) | Total Cost |
| --- | --- | --- | --- |
| Qwen 3 Turbo | $0.05 | $0.20 | $0.25 |
| Grok 4.1 Fast (reasoning) | $0.20 | $0.50 | $0.70 |
| Grok 4.1 Fast (non-reasoning) | $0.20 | $0.50 | $0.70 |
| deepseek-chat (V3.2-Exp) | $0.28 | $0.42 | $0.70 |
| deepseek-reasoner (V3.2-Exp) | $0.28 | $0.42 | $0.70 |
| Qwen 3 Plus | $0.40 | $1.20 | $1.60 |
| ERNIE 5.0 | $0.85 | $3.40 | $4.25 |
| Claude Haiku 4.5 | $1.00 | $5.00 | $6.00 |
| Qwen-Max | $1.60 | $6.40 | $8.00 |
| Gemini 3 Pro (≤200K) | $2.00 | $12.00 | $14.00 |
| GPT-5.2 | $1.75 | $14.00 | $15.75 |
| Gemini 3 Pro (>200K) | $4.00 | $18.00 | $22.00 |
| Claude Sonnet 4.5 | $3.00 | $15.00 | $18.00 |
| Claude Opus 4.5 | $5.00 | $25.00 | $30.00 |
| GPT-5.2 Pro | $21.00 | $168.00 | $189.00 |
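The "Total Cost" column sums one million input tokens and one million output tokens, but real workloads rarely split evenly. The following sketch, which uses a handful of prices from the comparison above and hypothetical function names, shows how the ranking can shift with the input-to-output ratio.

```python
# Per-million-token prices in USD (input, output) for a few models in the comparison.
PRICES = {
    "Gemini 3 Pro (<=200K)": (2.00, 12.00),
    "GPT-5.2": (1.75, 14.00),
    "Claude Sonnet 4.5": (3.00, 15.00),
    "Claude Opus 4.5": (5.00, 25.00),
}


def workload_cost(model: str, input_millions: float, output_millions: float) -> float:
    """Return the USD cost of a workload measured in millions of tokens."""
    input_price, output_price = PRICES[model]
    return input_millions * input_price + output_millions * output_price


def rank_by_cost(input_millions: float, output_millions: float) -> list[str]:
    """Return model names ordered from cheapest to most expensive for the workload."""
    return sorted(
        PRICES,
        key=lambda model: workload_cost(model, input_millions, output_millions),
    )
```

With an even 1M/1M split, Gemini 3 Pro (≤200K) is cheapest at $14.00 versus $15.75 for GPT-5.2. For an input-heavy workload of 10M input tokens and 1M output tokens, GPT-5.2 comes out slightly ahead at $31.50 versus $32.00, because its lower input price outweighs its higher output price.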
Image Generation: Nothing New Yet… But 'More to Come'
During the briefing, VentureBeat asked the OpenAI participants if the new release included any boost to image generation capabilities, noting the excitement around similar features in recent competitor launches like Google's Gemini 3 Image, aka Nano Banana Pro.
Unfortunately for those seeking to recreate that kind of text-and-information-heavy graphics and image editing capabilities, OpenAI executives clarified that GPT-5.2 comes with no current image improvements over the prior GPT-5.1 and OpenAI's built-in DALL-E 3 and gpt-4o native image generation models.
"On image Gen, nothing to announce today, but more to come," Simo said. She acknowledged the popularity of the feature, adding, "We know this is an essential use case that people love, that we launched [to] the market, and so definitely more to come there."
Aidan Clark, OpenAI's lead of training, also declined to comment on visual generation specifics, stating simply, "I can't really speak to image Gen myself."
The 'Mega-Agent' Era
Beyond raw scores, OpenAI is positioning GPT-5.2 as the engine for a new generation of "long-running agents" capable of executing multi-step workflows without human hand-holding.
"Box found that 5.2 can extract information from long, complex documents about 40% faster, and also saw a 40% increase in reasoning accuracy for Life Sciences and healthcare," Simo said.
She also noted that Notion reported the model "outperforms 5.1 across every dimension… and it excels at the kind of really ambiguous, longer horizon tasks that define real knowledge work."
Schwarzer added that coding startups like Augment Code found the model "delivered considerably stronger deep code capabilities than any prior model," which is why it was chosen to power their new code review agent.
Visual capabilities have also seen an upgrade.
OpenAI's launch blog post shows an example where "a traveler reports a delayed flight, a missed connection, an overnight stay in New York, and a medical seating requirement."
The result? "GPT-5.2 manages the entire chain of tasks (rebooking, special-assistance seating, and compensation), delivering a more complete outcome than GPT-5.1."
A new evaluation called ScreenSpot-Pro, which tests a model's ability to understand GUI screenshots, shows GPT-5.2 Thinking achieving 86.3% accuracy, compared to just 64.2% for GPT-5.1.
Science and Reliability
OpenAI leaders also stressed the model's utility for scientific research, attempting to move the conversation beyond simple chatbots to research assistants.
Aidan Clark, lead of the training team, shared an example of a senior immunology researcher testing the model.
"They tested it by asking it to generate the most important unanswered questions about the immune system," Clark said. "That immunology researcher reported that GPT-5.2 produced sharper questions and stronger explanations for why those questions… matter compared to any previous pro model."
Reliability was another key focus. Schwarzer claimed the new model "hallucinates considerably less than GPT-5.1," noting that on a set of de-identified queries, "responses contained errors 38% less often."
The 'Vibe' Shift
Interestingly, OpenAI acknowledged that not every user might immediately prefer the new models.
When asked why legacy models like GPT-5.1 would remain available, Schwarzer admitted that "models change a little bit every time."
"Some users may find that they like the vibes of the previous model, even though we think the latest one is across the board generally much better," Schwarzer said. He also noted that for some enterprise customers who've "really fine-tuned a prompt for a specific model," there might be "small regressions," necessitating access to the older versions.
Safety, 'Adult Mode,' and Future Roadmap
Addressing safety concerns, Simo confirmed that the company is preparing to roll out an "Adult Mode" in the first quarter of next year, following the implementation of a new age prediction system.
"We're in the process of improving that," Simo said regarding the age prediction technology. "We want to do that ahead of launching adult mode."
Looking further ahead, industry reports suggest OpenAI is working on a more fundamental architectural shift under the codename "Project Garlic," targeting a flagship release in early 2026.
While executives did not comment on specific future roadmaps during the briefing, Simo remained optimistic about the economics of their current trajectory.
"If you look at historical trends, compute has increased about 3x every year for the last three years," she explained. "Revenue has also increased at the same pace… creating this virtuous cycle."
Clark added that efficiency is improving rapidly: "The model we're releasing today achieves an even better score [on ARC-AGI] with almost 400 times less cost and less compute associated with it" compared to models from a year ago.
GPT-5.2 Instant, Thinking, and Pro begin rolling out in ChatGPT today to paid users (Plus, Pro, Team, and Enterprise). The company notes the rollout will be gradual to maintain stability.
