OpenCUA’s open source computer-use agents rival proprietary models from OpenAI and Anthropic

Madisony
Last updated: August 23, 2025 2:17 am

A new framework from researchers at The University of Hong Kong (HKU) and collaborating institutions provides an open source foundation for creating robust AI agents that can operate computers. The framework, called OpenCUA, includes the tools, data, and recipes for scaling the development of computer-use agents (CUAs).

Models trained using this framework perform strongly on CUA benchmarks, outperforming existing open source models and competing closely with closed agents from leading AI labs like OpenAI and Anthropic.

The challenge of building computer-use agents

Computer-use agents are designed to autonomously complete tasks on a computer, from navigating websites to operating complex software. They can also help automate workflows in the enterprise. However, the most capable CUA systems are proprietary, with critical details about their training data, architectures, and development processes kept private.

“As the lack of transparency limits technical advancements and raises safety concerns, the research community needs truly open CUA frameworks to study their capabilities, limitations, and risks,” the researchers state in their paper.


At the same time, open source efforts face their own set of hurdles. There has been no scalable infrastructure for collecting the diverse, large-scale data needed to train these agents. Existing open source datasets for graphical user interfaces (GUIs) have limited data, and many research projects provide insufficient detail about their methods, making it difficult for others to replicate their work.

According to the paper, “These limitations collectively hinder advances in general-purpose CUAs and restrict a meaningful exploration of their scalability, generalizability, and potential learning approaches.”

Introducing OpenCUA

OpenCUA framework (Source: XLANG Lab at HKU)

OpenCUA is an open source framework designed to address these challenges by scaling both the data collection and the models themselves. At its core is the AgentNet Tool for recording human demonstrations of computer tasks on different operating systems.

The tool streamlines data collection by running in the background on an annotator’s personal computer, capturing screen videos, mouse and keyboard inputs, and the underlying accessibility tree, which provides structured information about on-screen elements. This raw data is then processed into “state-action trajectories,” pairing a screenshot of the computer (the state) with the user’s corresponding action (a click, key press, etc.). Annotators can then review, edit, and submit these demonstrations.
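
To make the idea of a state-action trajectory concrete, here is a minimal Python sketch. It is not the AgentNet Tool's actual code or data format: the `RawEvent`, `Frame`, and `Step` types and the `build_trajectory` function are hypothetical names invented for illustration, and the pairing rule (each input event is matched with the most recent screenshot captured at or before it) is an assumption about one reasonable way to do the alignment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RawEvent:
    """A recorded input event (hypothetical schema, not AgentNet's)."""
    timestamp: float  # seconds since the recording started
    kind: str         # e.g. "click" or "key"
    detail: str       # e.g. "x=10,y=20" or "Enter"


@dataclass
class Frame:
    """A captured screenshot (hypothetical schema)."""
    timestamp: float
    screenshot: str   # path or identifier of the image


@dataclass
class Step:
    """One state-action pair: what was on screen, and what the user did."""
    state: str
    action: str


def build_trajectory(frames: List[Frame], events: List[RawEvent]) -> List[Step]:
    """Pair each event with the latest frame captured at or before it."""
    ordered_frames = sorted(frames, key=lambda f: f.timestamp)
    steps: List[Step] = []
    for event in sorted(events, key=lambda e: e.timestamp):
        prior = [f for f in ordered_frames if f.timestamp <= event.timestamp]
        if not prior:
            continue  # no screenshot precedes this event, so it is skipped
        steps.append(Step(state=prior[-1].screenshot,
                          action=f"{event.kind}({event.detail})"))
    return steps
```

A real pipeline would likely also merge low-level events (for example, collapsing individual key presses into a single typing action), which this sketch leaves out.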

AgentNet tool (Source: XLANG Lab at HKU)

Using this tool, the researchers collected the AgentNet dataset, which contains over 22,600 task demonstrations across Windows, macOS, and Ubuntu, spanning more than 200 applications and websites. “This dataset authentically captures the complexity of human behaviors and environmental dynamics from users’ personal computing environments,” the paper notes.

Recognizing that screen-recording tools raise significant data privacy concerns for enterprises, the researchers designed the AgentNet Tool with security in mind. Xinyuan Wang, co-author of the paper and PhD student at HKU, explained that they implemented a multi-layer privacy protection framework. “First, annotators themselves can fully observe the data they generate… before deciding whether to submit it,” he told VentureBeat. The data then undergoes manual verification for privacy issues and automated scanning by a large model to detect any remaining sensitive content before release. “This layered process ensures enterprise-grade robustness for environments handling sensitive customer or financial data,” Wang added.

To accelerate research, the team also curated AgentNetBench, an offline benchmark that provides multiple correct actions for each step, offering a more efficient way to measure an agent’s performance.
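
The value of allowing multiple correct actions per step can be illustrated with a short Python sketch. This is not AgentNetBench's scoring code: `step_accuracy` is a hypothetical function, and comparing actions as exact strings is a simplifying assumption, since the article does not describe how the benchmark actually matches a predicted action against the accepted ones.

```python
from typing import List, Set


def step_accuracy(predicted: List[str], acceptable: List[Set[str]]) -> float:
    """Fraction of steps where the predicted action is one of the accepted ones.

    predicted[i] is the agent's action at step i; acceptable[i] is the set of
    actions counted as correct for that step (illustrative string encoding).
    """
    if len(predicted) != len(acceptable):
        raise ValueError("predicted and acceptable must have the same length")
    if not predicted:
        return 0.0
    hits = sum(1 for action, ok in zip(predicted, acceptable) if action in ok)
    return hits / len(predicted)


# Saving via the menu or via a keyboard shortcut both count as correct here.
example_score = step_accuracy(
    ["hotkey(ctrl+s)", "key(Enter)"],
    [{"click(save)", "hotkey(ctrl+s)"}, {"key(Enter)"}],
)
```

With a single reference action per step, the agent in this example would be penalized for using the shortcut even though it reaches the same result; accepting a set of actions avoids that.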

A new recipe for training agents

The OpenCUA framework introduces a novel pipeline for processing data and training computer-use agents. The first step converts the raw human demonstrations into clean state-action pairs suitable for training vision-language models (VLMs). However, the researchers found that simply training models on these pairs yields limited performance gains, even with large amounts of data.

OpenCUA chain-of-thought pipeline (Source: XLANG Lab at HKU)

The key insight was to augment these trajectories with chain-of-thought (CoT) reasoning. This process generates a detailed “inner monologue” for each action, which includes planning, memory, and reflection. This structured reasoning is organized into three levels: a high-level observation of the screen, reflective thoughts that analyze the situation and plan the next steps, and finally, the concise, executable action. This approach helps the agent develop a deeper understanding of the tasks.
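
The three-level structure described above could be represented along the following lines. This Python sketch is illustrative only: the `ReasonedStep` type, the `format_target` function, and the "Observation / Thought / Action" section labels are assumptions made for this example, not the format used in the OpenCUA training data.

```python
from dataclasses import dataclass


@dataclass
class ReasonedStep:
    """One trajectory step augmented with reasoning (hypothetical schema)."""
    observation: str  # high-level description of what is on screen
    thought: str      # reflection on the situation and plan for next steps
    action: str       # concise, executable action


def format_target(step: ReasonedStep) -> str:
    """Render the three levels as a single text target for model training."""
    return "\n".join([
        f"Observation: {step.observation}",
        f"Thought: {step.thought}",
        f"Action: {step.action}",
    ])


example = ReasonedStep(
    observation="A file dialog is open with the name field focused.",
    thought="The document still needs a name before it can be saved.",
    action="type('report.txt')",
)
print(format_target(example))
```

Keeping the action as the final, separate field means it can still be parsed and executed on its own, while the observation and thought give the model the surrounding reasoning to learn from.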

“We find natural language reasoning crucial for generalizable computer-use foundation models, helping CUAs internalize cognitive capabilities,” the researchers write.

This data synthesis pipeline is a general framework that can be adapted by companies to train agents on their own unique internal tools. According to Wang, an enterprise can record demonstrations of its proprietary workflows and use the same “reflector” and “generator” pipeline to create the necessary training data. “This allows them to bootstrap a high-performing agent tailored to their internal tools without needing to handcraft reasoning traces manually,” he explained.

Putting OpenCUA to the test

The researchers applied the OpenCUA framework to train a range of open source VLMs, including variants of Qwen and Kimi-VL, with parameter sizes from 3 billion to 32 billion. The models were evaluated on a suite of online and offline benchmarks that test their ability to perform tasks and understand GUIs.

The 32-billion-parameter model, OpenCUA-32B, established a new state-of-the-art success rate among open source models on the OSWorld-Verified benchmark. It also surpassed OpenAI’s GPT-4o-based CUA and significantly closed the performance gap with Anthropic’s leading proprietary models.

OpenCUA shows large improvement over base models (left) while competing with leading CUA models (right) (Source: XLANG Lab at HKU)

For enterprise developers and product leaders, the research offers several key findings. The OpenCUA method is broadly applicable, improving performance on models with different architectures (both dense and mixture-of-experts) and sizes. The trained agents also show strong generalization, performing well across a diverse range of tasks and operating systems.

According to Wang, the framework is particularly suited to automating repetitive, labor-intensive enterprise workflows. “For example, in the AgentNet dataset, we already capture a few demonstrations of launching EC2 instances on Amazon AWS and configuring annotation parameters on MTurk,” he told VentureBeat. “These tasks involve many sequential steps but follow repeatable patterns.”

However, Wang noted that bridging the gap to live deployment requires addressing key challenges around safety and reliability. “The biggest challenge in real deployment is safety and reliability: the agent must avoid mistakes that could inadvertently alter system settings or trigger harmful side effects beyond the intended task,” he said.

The researchers have released the code, dataset, and weights for their models.

As open source agents built on frameworks like OpenCUA become more capable, they could fundamentally change the relationship between knowledge workers and their computers. Wang envisions a future where proficiency in complex software becomes less important than the ability to clearly articulate goals to an AI agent.

He described two primary modes of work: “offline automation, where the agent leverages its broader software knowledge to pursue a task end-to-end,” and “online collaboration, where the agent responds in real-time and works side by side with the human, much like a colleague.” In short, humans will provide the strategic “what,” while increasingly sophisticated AI agents handle the operational “how.”
