We believe the next major advancement in AI will come from systems that understand the visual world and its dynamics, which is why we’re starting a new long-term research effort around what we call general world models.
A world model is an AI system that builds an internal representation of an environment and uses it to simulate future events within that environment. Research in world models has so far focused on very limited and controlled settings, either in toy simulated worlds (like those of video games) or in narrow contexts (such as driving). The aim of general world models will be to represent and simulate a wide range of situations and interactions, like those encountered in the real world.
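To make the definition concrete, here is a deliberately tiny sketch of the "toy simulated world" kind of model described above. It is an illustration only, not how Gen-2 or any general world model is built: the corridor environment, the `TabularWorldModel` class, and the `true_step` function are all hypothetical names invented for this example. The model's internal representation is a table of observed transitions, which it then uses to simulate a sequence of future states without touching the environment again.

```python
from collections import Counter, defaultdict


class TabularWorldModel:
    """Toy world model: remembers observed (state, action) -> next_state transitions."""

    def __init__(self):
        # Internal representation: counts of outcomes seen for each (state, action).
        self.counts = defaultdict(Counter)

    def observe(self, state, action, next_state):
        """Record one transition seen in the environment."""
        self.counts[(state, action)][next_state] += 1

    def predict(self, state, action):
        """Return the most frequently observed outcome for (state, action)."""
        seen = self.counts.get((state, action))
        if not seen:
            # Assumption for this sketch: unseen transitions leave the state unchanged.
            return state
        return seen.most_common(1)[0][0]

    def simulate(self, state, actions):
        """Roll the model forward through a sequence of actions."""
        trajectory = [state]
        for action in actions:
            state = self.predict(state, action)
            trajectory.append(state)
        return trajectory


def true_step(position, action, size=5):
    """Hypothetical environment: a 1-D corridor with walls at 0 and size - 1."""
    delta = {"left": -1, "right": 1}[action]
    return min(max(position + delta, 0), size - 1)


# Build the model from experience: try every action in every position once.
model = TabularWorldModel()
for position in range(5):
    for action in ("left", "right"):
        model.observe(position, action, true_step(position, action))

# Simulate future events using only the learned model.
imagined = model.simulate(0, ["right", "right", "left", "right"])
print(imagined)
```

A lookup table like this only works because the corridor has ten possible transitions, all of which can be observed directly. It breaks down as soon as the environment is too large or varied to enumerate, which is exactly the gap between controlled settings and the real world.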
You can think of video generative systems such as Gen-2 as very early and limited forms of general world models. To generate realistic short videos, Gen-2 has had to develop some understanding of physics and motion. However, its capabilities are still very limited: it struggles with complex camera or object motions, among other things.
To build general world models, there are several open research challenges that we’re working on. For one, these models will need to generate consistent maps of an environment and be able to navigate and interact within it. They will also need to capture not just the dynamics of the world but the dynamics of its inhabitants, which involves building realistic models of human behavior.
We are building a team to tackle those challenges. If you’re interested in joining this research effort, we’d love to hear from you.