Nvidia is getting into world models — AI models that take inspiration from the mental models of the world that humans develop naturally.
At CES 2025 in Las Vegas, the company announced that it is making openly available a family of world models that can predict and generate “physics-aware” videos. Nvidia is calling this family Cosmos World Foundation Models, or Cosmos WFMs for short.
The models, which can be fine-tuned for specific applications, are available from Nvidia’s API and NGC catalogs, GitHub, and the AI dev platform Hugging Face.
“Nvidia is making available the first wave of Cosmos WFMs for physics-based simulation and synthetic data generation,” the company wrote in a blog post provided to TechCrunch. “Researchers and developers, regardless of their company size, can freely use the Cosmos models under Nvidia’s permissive open model license that allows commercial usage.”
There are a number of models in the Cosmos WFM family, divided into three categories: Nano for low-latency and real-time applications, Super for “highly performant baseline” models, and Ultra for outputs of maximum quality and fidelity.
The models range in size from 4 billion to 14 billion parameters, with Nano being the smallest and Ultra being the largest. Parameters roughly correspond to a model’s problem-solving skills, and models with more parameters generally perform better than those with fewer parameters.
As a part of Cosmos WFM, Nvidia is also releasing an “upsampling model,” a video decoder optimized for augmented reality, and guardrail models to ensure responsible use, as well as fine-tuned models for applications like generating sensor data for autonomous vehicle development. These, as well as the other Cosmos WFM models, were trained on 9,000 trillion tokens from 20 million hours of real-world human interactions, environment, industrial, robotics, and driving data, Nvidia said. (In AI, “tokens” represent bits of raw data — in this case, video footage.)
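To put those training figures in perspective, the short calculation below divides the reported token count by the reported hours of footage. It is only a back-of-the-envelope sketch: it assumes all 9,000 trillion tokens came from the 20 million hours Nvidia cited, and the variable names are illustrative. Nvidia has not described its actual tokenization rate, so the result is an average implied by the two headline numbers, nothing more.

```python
# Figures as reported by Nvidia (per the article).
TOTAL_TOKENS = 9_000 * 10**12   # 9,000 trillion tokens
TOTAL_HOURS = 20 * 10**6        # 20 million hours of data

# Assumption: every token came from those hours, so a simple average applies.
tokens_per_hour = TOTAL_TOKENS // TOTAL_HOURS
tokens_per_second = tokens_per_hour // 3600

print(f"Average tokens per hour of data:   {tokens_per_hour:,}")
print(f"Average tokens per second of data: {tokens_per_second:,}")
```

On those assumptions, each hour of data works out to roughly 450 million tokens, or about 125,000 tokens per second.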
Nvidia wouldn’t say where this training data came from, but at least one report — and lawsuit — alleges that the company trained on copyrighted YouTube videos without permission.
When reached for comment, an Nvidia spokesperson told TechCrunch that Cosmos “isn’t designed to copy or infringe any protected works.”
“Cosmos learns just like people learn,” the spokesperson said. “To help Cosmos learn, we gathered data from a variety of public and private sources and are confident our use of data is consistent with both the letter and spirit of the law. Facts about how the world works — which are what the Cosmos models learn — are not copyrightable or subject to the control of any individual author or company.”