Cursor’s new coding model Composer 2 is here: It beats Claude Opus 4.6 but still trails GPT-5.4

Cursor, a San Francisco AI coding platform from startup Anysphere valued at $29.3 billion, has launched Composer 2, a new in-house coding model now available inside its agentic AI coding environment. The model posts sharply better benchmark scores than its prior in-house model.

It's also launching Composer 2 Fast, a higher-priced but faster variant, and making it the default experience for users.

Here's the pricing breakdown:

Composer 2 Standard: $0.50/$2.50 per 1 million input/output tokens
Composer 2 Fast: $1.50/$7.50 per 1 million input/output tokens

That's a big drop from Cursor's predecessor in-house model, Composer 1.5, released in February, which cost $3.50 per million input tokens and $17.50 per million output tokens; Composer 2 is about 86% cheaper on both counts.

Composer 2 Fast is also roughly 57% cheaper than Composer 1.5.
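
The percentages above follow directly from the list prices. As a quick check, here is a minimal Python sketch that recomputes them; the prices are the ones quoted in this article, while the `PRICES` table and the `percent_cheaper` helper are names invented for illustration.

```python
# Per-million-token list prices in USD as quoted in the article: (input, output).
PRICES = {
    "Composer 1.5": (3.50, 17.50),
    "Composer 2 Standard": (0.50, 2.50),
    "Composer 2 Fast": (1.50, 7.50),
}


def percent_cheaper(old: float, new: float) -> float:
    """Return how much cheaper `new` is than `old`, as a percentage of `old`."""
    return (old - new) / old * 100


baseline_in, baseline_out = PRICES["Composer 1.5"]

# Map each Composer 2 tier to its (input, output) price reduction vs. Composer 1.5.
reductions = {}
for name in ("Composer 2 Standard", "Composer 2 Fast"):
    price_in, price_out = PRICES[name]
    reductions[name] = (
        round(percent_cheaper(baseline_in, price_in), 1),
        round(percent_cheaper(baseline_out, price_out), 1),
    )
    print(f"{name}: input {reductions[name][0]}% cheaper, "
          f"output {reductions[name][1]}% cheaper")
```

Because the input and output prices were cut by the same ratio within each tier, the reduction is identical on both counts: about 85.7% for the standard tier and about 57.1% for the fast tier.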

There are also discounts for "cache-read pricing," that is, sending some of the same tokens in a prompt to the model again: $0.20 per million tokens for Composer 2 and $0.35 per million for Composer 2 Fast, versus $0.35 per million for Composer 1.5.
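
To see how cache reads change what a request costs, here is a rough Python sketch. The rates come from this article. The `request_cost` helper and the token counts are hypothetical. The sketch also assumes cached prompt tokens are billed at the cache-read rate instead of the normal input rate, which the article does not spell out.

```python
# USD per 1 million tokens, as quoted in the article.
RATES = {
    "Composer 1.5": {"input": 3.50, "cache_read": 0.35, "output": 17.50},
    "Composer 2 Standard": {"input": 0.50, "cache_read": 0.20, "output": 2.50},
    "Composer 2 Fast": {"input": 1.50, "cache_read": 0.35, "output": 7.50},
}


def request_cost(model: str, fresh_input: int, cached_input: int, output: int) -> float:
    """Estimate the USD cost of one request.

    Assumption: cached prompt tokens are charged at the cache-read rate
    and only the remaining (fresh) prompt tokens at the input rate.
    """
    rate = RATES[model]
    total = (
        fresh_input * rate["input"]
        + cached_input * rate["cache_read"]
        + output * rate["output"]
    )
    return total / 1_000_000


# Hypothetical agent step: 100k-token prompt, 80k of it already cached, 5k tokens out.
costs = {
    model: request_cost(model, fresh_input=20_000, cached_input=80_000, output=5_000)
    for model in RATES
}
for model, cost in costs.items():
    print(f"{model}: ${cost:.4f}")
```

Under these made-up token counts, most of the prompt is billed at the lower cache-read rate. That matters for long agent sessions, which tend to resend much of the same context on every step.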

It also matters that this appears to be a Cursor-native release, not a broadly distributed standalone model. In the company’s announcement and model documentation, Composer 2 is described as available in Cursor, tuned for Cursor’s agent workflow and integrated with the product’s tool stack.

The materials provided do not mention separate availability through external model platforms or as a general-purpose API outside the Cursor environment.

Cursor is pitching long-horizon coding, not just better completions

The deeper technical claim in this release is not merely that Composer 2 scores higher than Composer 1.5. It is that Cursor says the model is better suited to long-horizon agentic coding.

In its blog, Cursor says the quality gains come from its first continued pretraining run, which gave it a stronger base for scaled reinforcement learning. From there, the company says it trained Composer 2 on long-horizon coding tasks and that the model can solve problems requiring hundreds of actions.

That framing is important because it addresses one of the biggest unresolved issues in coding AI. Many models are good at isolated code generation. Far fewer remain reliable across a longer workflow that includes reading a repository, deciding what to change, editing multiple files, running commands, interpreting failures and continuing toward a goal.

Cursor’s documentation reinforces that this is the use case it cares about. It describes Composer 2 as an agentic model with a 200,000-token context window, tuned for tool use, file edits and terminal operations inside Cursor.

It also notes training techniques such as self-summarization for long-running tasks. For developers already using Cursor as their main environment, that tighter tuning may matter more than a generic leaderboard claim.

The benchmark gains are substantial, even if GPT-5.4 still leads on one key chart

Cursor’s published results show a clear improvement over prior Composer models. The company lists Composer 2 at 61.3 on CursorBench, 61.7 on Terminal-Bench 2.0, and 73.7 on SWE-bench Multilingual.

That compares with Composer 1.5 at 44.2, 47.9 and 65.9, and Composer 1 at 38.0, 40.0 and 56.9.
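
To put those scores side by side, here is a small Python sketch that computes the point gains between Composer generations. The scores are the ones listed in this article, while the `SCORES` table and the `point_gains` helper are illustrative names only.

```python
# Benchmark order used throughout this sketch.
BENCHMARKS = ("CursorBench", "Terminal-Bench 2.0", "SWE-bench Multilingual")

# Scores as listed in the article, in the order given by BENCHMARKS.
SCORES = {
    "Composer 1": (38.0, 40.0, 56.9),
    "Composer 1.5": (44.2, 47.9, 65.9),
    "Composer 2": (61.3, 61.7, 73.7),
}


def point_gains(newer: str, older: str) -> dict:
    """Return the per-benchmark score difference (newer minus older), in points."""
    return {
        bench: round(new - old, 1)
        for bench, new, old in zip(BENCHMARKS, SCORES[newer], SCORES[older])
    }


gains_vs_1_5 = point_gains("Composer 2", "Composer 1.5")
gains_vs_1 = point_gains("Composer 2", "Composer 1")

for bench in BENCHMARKS:
    print(f"{bench}: +{gains_vs_1_5[bench]} vs Composer 1.5, "
          f"+{gains_vs_1[bench]} vs Composer 1")
```

On these numbers, the largest jump over Composer 1.5 is on CursorBench (about 17 points) and the smallest is on SWE-bench Multilingual (about 8 points).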

The release is more measured than some model launches because Cursor is not claiming universal leadership.

On Terminal-Bench 2.0, which measures how well an AI agent performs tasks in command-line, terminal-style interfaces, GPT-5.4 still leads at 75.1, while Composer 2 scores 61.7, ahead of Opus 4.6 at 58.0, Opus 4.5 at 52.1 and Composer 1.5 at 47.9.

That makes Cursor’s pitch more pragmatic and arguably more useful for buyers. The company is not saying Composer 2 is the single best model at everything. It is saying the model has moved into a more competitive quality tier while offering more attractive economics and stronger integration with the product developers are already using.

Cursor also included a performance-versus-cost chart for its CursorBench benchmarking suite that appears designed to make a Pareto-style argument for Composer 2.

In that graphic, Composer 2 sits at a stronger cost-to-performance point than Composer 1.5 and compares favorably with the higher-cost GPT-5.4 and Opus 4.6 settings shown by Cursor. The company’s message is not merely that Composer 2 scores higher than its predecessor, but that it may offer a more efficient cost-to-intelligence tradeoff for everyday coding work inside Cursor.

Why the “locked to Cursor” point matters for buyers

For readers deciding whether to use Composer 2, the most important question may not be benchmark performance alone. It may be whether they want a model optimized for Cursor’s own product experience.

That can be a strength. According to the documentation, Composer 2 can access Cursor’s agent tool stack, including semantic code search, file and folder search, file reads, file edits, shell commands, browser control and web access.

That kind of integration can be more valuable than raw model quality if the goal is to complete real software tasks rather than produce impressive one-shot answers.

But it also narrows the addressable audience. Teams looking for a model they can deploy broadly across multiple external tools and platforms should recognize that Cursor is presenting Composer 2 as a model for Cursor users, not as a generally available standalone foundation model.

The bigger picture: Cursor is making an operational argument

The significance of Composer 2 is not that Cursor has suddenly taken the top spot on every coding benchmark. It has not. The more important point is that Cursor is making an operational argument: its model is getting better, its pricing is low enough to encourage broader use, and its faster tier is responsive enough that the company is comfortable making it the default despite the higher cost.

That combination may resonate with engineering teams that increasingly care less about abstract model prestige and more about whether an assistant can stay useful across long coding sessions without becoming prohibitively expensive.

Cursor’s broader pricing structure helps frame the competitive pressure around this release. On its current pricing page, Cursor offers a free Hobby tier, a Pro plan at $20 per month, Pro+ at $60 per month, and Ultra at $200 per month for individual users, with higher tiers offering more usage across models from OpenAI, Anthropic and Google.

On the business side, Teams costs $40 per user per month, while Enterprise is custom-priced and adds pooled usage, centralized billing, usage analytics, privacy controls, SSO, audit logs and granular admin controls. In other words, Cursor is not just charging for access to a coding model. It is charging for a managed application layer that sits on top of multiple model providers while adding team features, governance and workflow tooling.

That model is increasingly under pressure as first-party AI companies push deeper into coding itself. OpenAI and Anthropic are no longer just selling models through third-party products; they are also shipping their own coding interfaces, agents and evaluation frameworks, such as Codex and Claude Code, raising the question of how much room remains for an intermediary platform.

Commenters on X, while unverified and not necessarily representative of the broader market, have increasingly described switching from Cursor to Anthropic’s Claude Code, especially power users drawn to terminal-first workflows, longer-running agent behavior and lower perceived overhead.

Some of these posts describe frustration with Cursor’s pricing, context loss or editor-centric experience, while praising Claude Code as a more direct and fully agentic way to work. Even treated cautiously, that kind of social chatter points to the strategic problem Cursor faces: it has to prove that its integrated platform, team controls and now its own in-house models add enough value to justify sitting between developers and the model makers’ increasingly capable coding products.

That makes Composer 2 strategically important for Cursor.

By offering a cheaper in-house model than Composer 1.5, tuning it tightly to Cursor’s own tool stack and making a faster version the default, the company is trying to show that it provides more than a wrapper around outside systems.

The challenge is that as first-party coding products improve, developers and enterprise buyers may increasingly ask whether they want a separate AI coding platform at all, or whether the model makers’ own tools are becoming sufficient on their own.
