Google DeepMind's Genie 3 can dynamically alter the state of its simulated worlds

Engadget · 14 hours ago
At the start of December, Google DeepMind released Genie 2. The Genie systems belong to a class of AI known as world models: they're capable of generating images as the user — either a human or, more likely, an automated AI agent — moves through the world the software is simulating. The resulting video of the model in action may look like a video game, but DeepMind has always positioned Genie 2 as a way to train other AI systems to be better at what they're designed to accomplish. With its new Genie 3 model, which the lab announced on Tuesday, DeepMind believes it has made an even better system for training AI agents.
At first glance, the jump between Genie 2 and 3 isn't as dramatic as the one the model made last year. With Genie 2, DeepMind's system became capable of generating 3D worlds, and could accurately reconstruct part of the environment even after the user or an AI agent left it to explore other parts of the generated scene. Environmental consistency was often a weakness of prior world models. For instance, Decart's Oasis system had trouble remembering the layout of the Minecraft levels it would generate.
By comparison, the enhancements offered by Genie 3 seem more modest, but in a press briefing Google held ahead of today's official announcement, Shlomi Fruchter, research director at DeepMind, and Jack Parker-Holder, research scientist at DeepMind, argued that they represent important stepping stones on the road toward artificial general intelligence.
So what exactly does Genie 3 do better? To start, it outputs footage at 720p, instead of 360p like its predecessor. It's also capable of sustaining a "consistent" simulation for longer. Genie 2 had a theoretical limit of up to 60 seconds, but in practice the model would often start to hallucinate much earlier. By contrast, DeepMind says Genie 3 is capable of running for several minutes before it starts producing artifacts.
Also new to the model is a capability DeepMind calls "promptable world events." Genie 2 was interactive insofar as the user or an AI agent was able to input movement commands and the model would respond after it had a few moments to generate the next frame. Genie 3 does this work in real-time. Moreover, it's possible to tweak the simulation with text prompts that instruct Genie to alter the state of the world it's generating. In a demo DeepMind showed, the model was told to insert a herd of deer into a scene of a person skiing down a mountain. The deer didn't move in the most realistic manner, but this is the killer feature of Genie 3, says DeepMind.
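To make the interaction model concrete, here is a minimal sketch of what a Genie-3-style loop could look like: per-frame movement actions on one channel, occasional text events that mutate the world on another. The WorldModel class, its methods, and the prompts are hypothetical stand-ins for illustration; DeepMind has not published an API for Genie 3.

```python
# Illustrative sketch of a Genie-3-style interaction loop with
# "promptable world events." Everything here is hypothetical;
# DeepMind has not published an API for Genie 3.

from dataclasses import dataclass, field

@dataclass
class WorldModel:
    prompt: str                              # text description seeding the world
    events: list = field(default_factory=list)
    frames: list = field(default_factory=list)

    def step(self, action: str) -> dict:
        """Generate the next frame in response to a movement command."""
        # A real model would render pixels conditioned on the prompt,
        # the accumulated events, and the action; we record a stub.
        frame = {"t": len(self.frames) / 24.0, "action": action}
        self.frames.append(frame)
        return frame

    def world_event(self, event_prompt: str) -> None:
        """A text instruction that alters the world state mid-run."""
        self.events.append(event_prompt)

world = WorldModel("a skier descending a snowy mountain")
for _ in range(48):                          # two seconds at 24 fps
    world.step("move_forward")
world.world_event("a herd of deer crosses the slope")  # promptable event
world.step("turn_left")                      # simulation continues, deer included
```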
As mentioned before, the lab primarily envisions the model as a tool for training and evaluating AI agents. DeepMind says Genie 3 could be used to teach AI systems to tackle "what if" scenarios that aren't covered by their pre-training. "There are a lot of things that have to happen before a model can be deployed in the real world, but we do see it as a way to more efficiently train models and increase their reliability," said Fruchter, pointing to, for example, a scenario where Genie 3 could be used to teach a self-driving car how to safely avoid a pedestrian that walks in front of it.
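As a rough sketch of how a generated world could serve as a training environment for that kind of "what if" scenario, consider the toy loop below. The environment class, reward values, and placeholder policy are all invented for illustration; this is not DeepMind's training setup.

```python
# Minimal sketch: a simulated world as a synthetic environment where an
# agent rehearses a rare event (a pedestrian stepping into the road).
# All names and the reward logic are invented for illustration.

class PedestrianScenario:
    """Stand-in for a generated world containing a sudden hazard."""

    def reset(self) -> dict:
        self.distance = 30.0   # metres between car and pedestrian
        self.speed = 10.0      # car speed, metres per second
        return {"distance": self.distance, "speed": self.speed}

    def step(self, action: str):
        if action == "brake":
            self.speed = max(0.0, self.speed - 4.0)
        self.distance -= self.speed * 0.1            # 100 ms per step
        collided = self.distance <= 0.0 and self.speed > 0.0
        done = collided or self.speed == 0.0
        reward = -100.0 if collided else 1.0         # penalize the crash
        return {"distance": self.distance, "speed": self.speed}, reward, done

env = PedestrianScenario()
obs, done = env.reset(), False
while not done:
    # A learned policy would act here; this placeholder brakes when close.
    action = "brake" if obs["distance"] < 15.0 else "cruise"
    obs, reward, done = env.step(action)
```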
Despite the improvements DeepMind has made to Genie, the lab acknowledges there's much work to be done. For instance, the model can't generate real-world locations with perfect accuracy, and it struggles with text rendering. Moreover, for Genie to be truly useful, DeepMind believes the model needs to be able to sustain a simulated world for hours, not minutes. Still, the lab feels Genie is ready to make a real-world impact.
"We already at the point where you wouldn't use [Genie] as your sole training environment, but you can certainly finds things you wouldn't want agents to do because if they act unsafe in some settings, even if those settings aren't perfect, it's still good to know," said Parker-Holder. "You can already see where this is going. It will get increasingly useful as the models get better."
For the time being, Genie 3 isn't available to the general public. However, DeepMind says it's working to make the model available to additional testers.
Related Articles

Could Genie 3 From Google DeepMind Resurrect VR For Education?

Forbes · 10 hours ago

Virtual Reality has long promised to enhance education. It offered visions of students exploring Ancient Rome in 3D, dissecting virtual frogs without the mess, or walking through the human bloodstream at a microscopic scale. Yet those visions rarely made it into everyday classrooms. Why? Creating accurate and robust VR learning experiences just isn't feasible for the average educator. Teachers already juggle lesson planning, grading and classroom management. Expecting them to also master 3D modeling software, animation pipelines and game engine scripting was never realistic. Even companies that specialize in educational VR often struggle with the cost and complexity of developing interactive content that is both engaging and accurate. Could all of that be about to change? Google DeepMind just announced Genie 3, and it's quite frankly astonishing.

What is Genie 3?

Genie 3 generates interactive 3D environments in real time, from simple text prompts. Type "a rainforest ecosystem" or "the surface of Mars in 2050" and Genie responds by building immersive, explorable worlds in seconds. These aren't pre-rendered videos. They're dynamic, reactive spaces that users can navigate and interact with at 24 frames per second.

This capability isn't entirely new. Previous iterations, like Genie 1 and Genie 2, and other video-generation models such as Veo 2, began to explore what was possible. But they lacked the real-time interactivity and environmental consistency needed for serious educational use. Genie 3 seems to bridge that gap. It allows learners to explore a world, revisit locations, and witness events unfold with continuity.

For educators, this could be the tipping point. Building a virtual field trip used to require teams of developers, designers and researchers. Genie 3 collapses that workflow into a few lines of text. A teacher preparing a lesson on climate zones might input: "A desert landscape transitions into a temperate forest, then a polar ice cap." Genie 3 renders it on the spot, complete with weather patterns and animal behavior. This ease of creation addresses the most critical bottleneck: time.

Genie 3 may make certain types of immersive teaching possible for the first time. A history teacher could summon ancient Babylon and guide students through its streets. A physics teacher could create zero-gravity environments to demonstrate Newton's laws.

Genie 3 also allows "promptable world events," which means educators can inject interactivity into the scene. Want to demonstrate the impact of deforestation? Trigger a scenario where logging machines clear a portion of the forest. Students can observe changes in weather, animal migration, and biodiversity. These are not scripted animations. They are emergent responses, built on the fly based on user inputs. This level of control and flexibility could move Genie 3 beyond novelty. Could it become a tool for critical thinking and exploration, with students not just observing but experimenting?

Genie 3 Limitations

Despite the excitement, limitations remain. Genie 3 can't yet model real-world locations with geographic precision. It doesn't simulate complex interactions between multiple agents, meaning multiplayer educational scenarios are still out of reach. And while the system supports a few minutes of consistent interaction, it isn't designed for extended sessions. But these are technical constraints, not conceptual ones. The trajectory is clear.

This raises new questions. What happens when content creation becomes so easy that anyone can build a virtual experience? Who ensures accuracy? Who reviews for bias? In classrooms, these questions matter deeply. A world model that misrepresents historical events or scientific principles could mislead students at scale. DeepMind acknowledges this. Genie 3 is being released gradually, with oversight from its Responsible Development & Innovation Team. Only selected researchers and creators have access for now. That approach slows widespread adoption but gives space to refine safeguards.

Even in this early phase, it's clear that Genie 3 could redefine what is possible in educational content creation. We could be entering a time when educators no longer have to choose between depth and interactivity, or spend months developing a single VR lesson. If Genie 3 delivers on its promise, or indeed Genie 4 or 5, immersive learning will move from the margins to the mainstream.

The real power of Genie 3 isn't in its graphics or speed. It lies in who gets to use it. When a teacher or a student with no technical background can build a realistic simulation in seconds, the conversation around educational VR changes: from "why don't more schools use this?" to "how will we use this next?" VR in education hasn't failed. It's been waiting. Waiting for a tool that matches the ambitions of the classroom. Genie 3 might just be that tool.

DeepMind reveals Genie 3, a world model that could be the key to reaching AGI

TechCrunch · 14 hours ago

Google DeepMind has revealed Genie 3, its latest foundation world model, which the AI lab says presents a crucial stepping stone on the path to artificial general intelligence, or human-like intelligence. "Genie 3 is the first real-time interactive general purpose world model," Shlomi Fruchter, a research director at DeepMind, said during a press briefing. "It goes beyond narrow world models that existed before. It's not specific to any particular environment. It can generate both photo-realistic and imaginary worlds, and everything in between."

Genie 3, which is still in research preview and not publicly available, builds on both its predecessor Genie 2 – which can generate new environments for agents – and DeepMind's latest video generation model Veo 3 – which exhibits a deep understanding of physics.

With a simple text prompt, Genie 3 can generate multiple minutes – up from 10 to 20 seconds in Genie 2 – of diverse, interactive 3D environments at 24 frames per second with a resolution of 720p. The model also features "promptable world events," or the ability to use a prompt to change the generated world. Perhaps most importantly, Genie 3's simulations stay physically consistent over time because the model is able to remember what it has previously generated – an emergent capability that DeepMind researchers didn't explicitly program into the model.

Fruchter said that while Genie 3 clearly has implications for educational experiences and new generative media like gaming or prototyping creative concepts, its real unlock will manifest in training agents for general-purpose tasks, which he said is essential to reaching AGI. "We think world models are key on the path to AGI, specifically for embodied agents, where simulating real world scenarios is particularly challenging," Jack Parker-Holder, a research scientist on DeepMind's open-endedness team, said during a briefing.

Genie 3 is designed to solve that bottleneck. Like Veo, it doesn't rely on a hard-coded physics engine. Instead, it teaches itself how the world works – how objects move, fall, and interact – by remembering what it has generated and reasoning over long time horizons. "The model is auto-regressive, meaning it generates one frame at a time," Fruchter told TechCrunch in a separate interview. "It has to look back at what was generated before to decide what's going to happen next. That's a key part of the architecture."
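To illustrate what "auto-regressive, one frame at a time" means in code, here is a toy sketch. The predictor below is a trivial placeholder rather than Genie 3's unpublished network; only the looking-back structure is the point.

```python
# Toy illustration of autoregressive frame generation: each new frame
# is predicted from the history of frames generated so far. The
# predictor is a placeholder, not Genie 3's actual (unpublished) model.

import numpy as np

HEIGHT, WIDTH, FPS = 720, 1280, 24   # 720p at 24 fps, as reported

def predict_next_frame(history: list, action: str) -> np.ndarray:
    # A real model would attend over the history (its memory) and the
    # action; we copy the previous frame to keep the sketch runnable.
    if history:
        return history[-1].copy()
    return np.zeros((HEIGHT, WIDTH, 3), dtype=np.uint8)

history = []
for _ in range(FPS * 2):             # two seconds of simulation
    frame = predict_next_frame(history, action="move_forward")
    history.append(frame)            # the growing history enables consistency
```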
That memory creates consistency in its simulated worlds, and that consistency allows it to develop a kind of intuitive grasp of physics, similar to how humans understand that a glass teetering on the edge of a table is about to fall, or that they should duck to avoid a falling object. This ability to simulate coherent, physically plausible environments over time makes Genie 3 much more than a generative model. It becomes an ideal training ground for general-purpose agents. Not only can it generate endless, diverse worlds to explore, but it also has the potential to push agents to their limits – forcing them to adapt, struggle, and learn from their own experience in a way that mirrors how humans learn in the real world.

Currently, the range of actions an agent can take is still limited. For example, the promptable world events allow for a wide range of environmental interventions, but they're not necessarily performed by the agent itself. Similarly, it's still difficult to accurately model complex interactions between multiple independent agents in a shared environment. Genie 3 can also only support a few minutes of continuous interaction, whereas hours would be necessary for proper training.

Still, Genie 3 presents a compelling step forward in teaching agents to go beyond reacting to inputs so they can plan, explore, seek out uncertainty, and improve through trial and error – the kind of self-driven, embodied learning that's key to moving toward general intelligence. "We haven't really had a Move 37 moment for embodied agents yet, where they can actually take novel actions in the real world," Parker-Holder said, referring to the legendary moment in the 2016 Go match between DeepMind's AI agent AlphaGo and world champion Lee Sedol, in which AlphaGo played an unconventional and brilliant move that became symbolic of AI's ability to discover new strategies beyond human understanding. "But now, we can potentially usher in a new era," he said.
