A team of researchers led by Nvidia has unveiled DreamDojo, a new AI system designed to teach robots how to interact with the physical world by watching tens of thousands of hours of human video, a development that could significantly reduce the time and cost required to train the next generation of humanoid machines.
The research, published this month and involving collaborators from UC Berkeley, Stanford, the University of Texas at Austin, and several other institutions, introduces what the team calls "the first robot world model of its kind that demonstrates strong generalization to diverse objects and environments after post-training."
At the core of DreamDojo is what the researchers describe as "a large-scale video dataset" comprising "44k hours of diverse human egocentric videos, the largest dataset to date for world model pretraining." The dataset, called DreamDojo-HV, is a dramatic leap in scale: "15x longer duration, 96x more skills, and 2,000x more scenes than the previously largest dataset for world model training," according to the project documentation.
Inside the two-phase training system that teaches robots to see like humans
The system operates in two distinct phases. First, DreamDojo "acquires comprehensive physical knowledge from large-scale human datasets by pre-training with latent actions." It then undergoes "post-training on the target embodiment with continuous robot actions." In essence, the model learns general physics from watching humans, then fine-tunes that knowledge for specific robot hardware.
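In broad strokes, that recipe can be sketched as follows. This is a minimal, hypothetical PyTorch-style illustration rather than the released DreamDojo code: the class and function names are invented, and it assumes the latent actions for the human video have already been inferred by some upstream model.

```python
# Hypothetical sketch of the two-phase recipe described above; not DreamDojo's code.
import torch
import torch.nn as nn

class LatentActionWorldModel(nn.Module):
    """Toy video world model conditioned on an action vector in a latent space."""
    def __init__(self, frame_dim=64, latent_action_dim=8, robot_action_dim=7):
        super().__init__()
        self.dynamics = nn.Sequential(
            nn.Linear(frame_dim + latent_action_dim, 256), nn.ReLU(),
            nn.Linear(256, frame_dim),
        )
        # Maps continuous robot commands into the latent action space (used in phase 2).
        self.action_adapter = nn.Linear(robot_action_dim, latent_action_dim)

    def forward(self, frame, latent_action):
        return self.dynamics(torch.cat([frame, latent_action], dim=-1))

def pretrain_on_human_video(model, human_batches, steps=1000):
    """Phase 1: learn general dynamics from human video via inferred latent actions."""
    opt = torch.optim.Adam(model.dynamics.parameters(), lr=1e-4)
    for _ in range(steps):
        frame, latent_action, next_frame = next(human_batches)
        loss = nn.functional.mse_loss(model(frame, latent_action), next_frame)
        opt.zero_grad()
        loss.backward()
        opt.step()

def posttrain_on_robot_data(model, robot_batches, steps=1000):
    """Phase 2: adapt to the target embodiment using continuous robot actions."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-5)
    for _ in range(steps):
        frame, robot_action, next_frame = next(robot_batches)
        latent = model.action_adapter(robot_action)
        loss = nn.functional.mse_loss(model(frame, latent), next_frame)
        opt.zero_grad()
        loss.backward()
        opt.step()
```

The point of the two-phase split, as described, is that the physical knowledge learned from human video carries over, and only the action interface needs to be adapted to the target robot.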
For enterprises considering humanoid robots, this approach addresses a stubborn bottleneck. Teaching a robot to manipulate objects in unstructured environments traditionally requires vast amounts of robot-specific demonstration data, which is expensive and time-consuming to collect. DreamDojo sidesteps the problem by leveraging existing human video, allowing robots to learn from observation before ever touching a physical object.
One of the technical breakthroughs is speed. Through a distillation process, the researchers achieved "real-time interactions at 10 FPS for over 1 minute," a capability that enables practical applications such as live teleoperation and on-the-fly planning. The team demonstrated the system running across multiple robot platforms, including the GR-1, G1, AgiBot, and YAM robots, showing what they call "realistic action-conditioned rollouts" across "a wide range of environments and object interactions."
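Holding that frame rate is mostly a scheduling exercise once a distilled model can predict a frame in a single fast step. The loop below is a minimal, hypothetical sketch of a real-time, action-conditioned rollout under that assumption; `world_model` and `controller` are user-supplied callables, not part of any published DreamDojo interface.

```python
import time

def realtime_rollout(world_model, controller, first_frame, fps=10, seconds=60):
    """Step a distilled world model at a fixed frame rate, e.g. for live teleoperation."""
    frame, dt = first_frame, 1.0 / fps
    for _ in range(int(fps * seconds)):
        start = time.monotonic()
        action = controller(frame)           # e.g., a teleoperation command
        frame = world_model(frame, action)   # predict the next video frame
        # Sleep off whatever remains of the per-frame budget to hold ~10 FPS.
        time.sleep(max(0.0, dt - (time.monotonic() - start)))
    return frame
```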
Why Nvidia is betting big on robotics as AI infrastructure spending soars
The release comes at a pivotal moment for Nvidia's robotics ambitions, and for the broader AI industry. At the World Economic Forum in Davos last month, CEO Jensen Huang declared that AI robotics represents a "once-in-a-generation" opportunity, particularly for regions with strong manufacturing bases. According to Digitimes, Huang has also stated that the next decade will be "a critical period of accelerated development for robotics technology."
The financial stakes are enormous. Huang told CNBC's "Halftime Report" on February 6 that the tech industry's capital expenditures, potentially reaching $660 billion this year from major hyperscalers, are "justified, appropriate and sustainable." He characterized the current moment as "the largest infrastructure buildout in human history," with companies like Meta, Amazon, Google, and Microsoft dramatically increasing their AI spending.
That infrastructure push is already reshaping the robotics landscape. Robotics startups raised a record $26.5 billion in 2025, according to data from Dealroom. European industrial giants including Siemens, Mercedes-Benz, and Volvo have announced robotics partnerships in the past year, while Tesla CEO Elon Musk has claimed that 80 percent of his company's future value will come from its Optimus humanoid robots.
How DreamDojo could transform enterprise robot deployment and testing
For technical decision-makers evaluating humanoid robots, DreamDojo's most immediate value may lie in its simulation capabilities. The researchers highlight downstream applications including "reliable policy evaluation without real-world deployment and model-based planning for test-time improvement," capabilities that could let companies simulate robot behavior extensively before committing to costly physical trials.
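As a rough illustration of what "policy evaluation without real-world deployment" can mean in practice, the sketch below scores a policy entirely inside a learned world model. Every name here, including the reward function, is a hypothetical stand-in rather than anything DreamDojo is documented to ship.

```python
def evaluate_policy_in_world_model(world_model, policy, reward_fn, start_frames,
                                   horizon=100):
    """Return the mean simulated return of `policy` over imagined rollouts."""
    returns = []
    for frame in start_frames:
        total = 0.0
        for _ in range(horizon):
            action = policy(frame)
            frame = world_model(frame, action)   # imagined next observation
            total += reward_fn(frame, action)    # task-specific scoring
        returns.append(total)
    return sum(returns) / len(returns)
```

A low score in imagination is not proof a policy will fail on hardware, but it lets teams filter out obviously weak candidates before any costly physical trial.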
This matters because the gap between laboratory demonstrations and factory floors remains significant. A robot that performs flawlessly under controlled conditions often struggles with the unpredictable variation of real-world environments: different lighting, unfamiliar objects, unexpected obstacles. By training on 44,000 hours of diverse human video spanning thousands of scenes and nearly 100 distinct skills, DreamDojo aims to build the kind of general physical intuition that makes robots adaptable rather than brittle.
The research team, led by Linxi "Jim" Fan, Joel Jang, and Yuke Zhu, with Shenyuan Gao and William Liang as co-first authors, has indicated that the code will be released publicly, though no timeline was specified.
The bigger picture: Nvidia's transformation from gaming giant to robotics powerhouse
Whether DreamDojo translates into commercial robotics products remains to be seen. But the research signals where Nvidia's ambitions are heading as the company increasingly positions itself beyond its gaming roots. As Kyle Barr observed at Gizmodo earlier this month, Nvidia now views "anything related to gaming and the 'personal computer'" as "outliers on Nvidia's quarterly spreadsheets."
The shift reflects a calculated bet: that the future of computing is physical, not just digital. Nvidia has already invested $10 billion in Anthropic and signaled plans to invest heavily in OpenAI's next funding round. DreamDojo suggests the company sees humanoid robots as the next frontier where its AI expertise and chip dominance can converge.
For now, the 44,000 hours of human video at the heart of DreamDojo represent something more fundamental than a technical benchmark. They represent a theory: that robots can learn to navigate our world by watching us live in it. The machines, it turns out, have been taking notes.

