Date: 2024-02-26



Google researchers have unveiled a new artificial intelligence model that can take a text prompt, sketch, or idea and turn it into a virtual world you can interact with and play in.

The virtual world model, called Genie, was trained on gameplay and other videos found online, and is currently only a research preview. The worlds it generates are closer to 2D platformers than to full VR.

While this may still be a long way from a true Star Trek-style holodeck, it does show that you might one day be able to walk into a room and create a fully interactive adventure from just a few words.

What is Google Genie?

In the world of AI, people talk about opening Pandora's box and letting the genie out of the lamp to illustrate how content can now be created with relatively little effort. In reality, just as humans acquire skills over years, AI models require extensive training.

You can't just rub the lamp and expect a genie to come out. First the lamp must be filled with knowledge and abilities. In Genie's case, that knowledge is derived from “a large dataset of publicly available internet videos,” and engineers went to great lengths to create the model's code and weights.

Tim Rocktäschel, Genie's Google DeepMind team lead, wrote on X that the team used a dataset consisting of more than 200,000 hours of video from 2D platformers and focused on scale.

The model was trained unsupervised using unlabeled videos. This allows it to learn a character's different movements, controls, and actions, and to apply them in a consistent way. As a result, “our model can transform any image into a playable 2D world,” Rocktäschel explained.

What does this actually mean?

There are many tools on the market that allow you to take website or app mockups created by graphic designers and turn them into code.

The result may not always be the best code, but it is a functional prototype that you can use. There are also AI tools that create websites from text prompts.

With Genie, you can essentially give it a sketch drawn on paper, a carefully crafted piece of digital art, or an AI-generated 2D depiction of a world, and Genie will do the rest.

“We're excited to reveal what the Open Endedness team at @GoogleDeepMind has been up to. Genie is a foundational world model trained solely from internet videos that, given image prompts, can generate an endless variety of action-controllable 2D worlds.”

Genie generates the images and other assets needed to turn your sketch into a fully realized open world, predicting each next frame based on the actions provided by the player.

The authors used a tokenizer that compresses the video into discrete tokens. These tokens are then sent to an action model, which encodes the transition between two consecutive frames as one of eight possible actions. A further model then uses the tokens and the chosen action to predict future frames.

What brought it all together was the same ingredient behind OpenAI's Sora: scale, meaning a lot of data and a correspondingly large amount of computing power.
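The three steps described above can be illustrated with a toy sketch. This is not Genie's actual code: the function names, the four-level quantizer, and the hand-built codebook are all hypothetical stand-ins for components that the real model learns from video. The sketch only shows the general idea of turning frames into tokens, mapping a frame-to-frame transition to one of eight action indices by nearest-codebook lookup, and predicting the next frame from the current tokens plus an action.

```python
# Toy illustration only. All names and numbers except the count of eight
# actions are assumptions made for this example, not details of Genie.
NUM_ACTIONS = 8  # the article says transitions map to one of eight actions


def tokenize(frame, levels=4):
    """Toy tokenizer: quantize each pixel value in [0, 1] to an integer token."""
    return [min(int(p * levels), levels - 1) for p in frame]


def make_toy_codebook(num_tokens):
    """Hypothetical codebook: action i shifts the first token by (i - 4).

    A real model would learn these vectors; here they are fixed by hand.
    """
    return [[float(i - 4)] + [0.0] * (num_tokens - 1) for i in range(NUM_ACTIONS)]


def infer_action(tokens_t, tokens_next, codebook):
    """Encode a transition as the index of the nearest codebook vector."""
    delta = [b - a for a, b in zip(tokens_t, tokens_next)]

    def squared_distance(code):
        return sum((d - c) ** 2 for d, c in zip(delta, code))

    return min(range(len(codebook)), key=lambda i: squared_distance(codebook[i]))


def predict_next(tokens_t, action, codebook, levels=4):
    """Toy dynamics step: apply the action's code as a clamped token shift."""
    return [
        max(0, min(levels - 1, t + round(c)))
        for t, c in zip(tokens_t, codebook[action])
    ]


frame_a = tokenize([0.30, 0.60])   # tokens [1, 2]
frame_b = tokenize([0.80, 0.60])   # tokens [3, 2]
codebook = make_toy_codebook(num_tokens=2)
action = infer_action(frame_a, frame_b, codebook)
print(action)                                   # prints 6
print(predict_next(frame_a, action, codebook))  # prints [3, 2]
```

In this toy setup the player never labels anything: the action index is recovered purely from how one frame differs from the next, which is the sense in which the article describes the training as unsupervised.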

What will happen to Genie in the future?

Genie doesn't have a release date, and since it's a research project, it's unclear if it will ever become a real product. It's possible that someday you'll be able to pick up your Android smartphone and have your assistant make you a game about avoiding vampires, but that won't happen for a few years.

More important are the underlying technologies and new approaches to content generation developed during its creation, such as learning from unlabeled video to produce open, playable worlds.

Rocktäschel mentioned Sora on X, specifically the idea that it is a “world model.” He said that while it is impressive and visually stunning, “a world model needs 'action'.” He added, “Genie is an action-controllable world model, but it is trained completely unsupervised from video.”

Another major advance brought by Genie is a deeper understanding of real-world physics, which could be used to train robots to move more efficiently through their environments and to complete tasks they have never done before.

