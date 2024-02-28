



Google has introduced another generative artificial intelligence (AI) model, one that can create an endless variety of 2D platformer video games. Called Genie, it is touted as an action-controllable world model trained without supervision on video game footage. It generates video game levels in which a playable character can be controlled, predicting how the scene changes in response to each action. Interestingly, earlier this month OpenAI also introduced a world model called Sora that can generate hyper-realistic videos of up to one minute in length.

The announcement was made by Tim Rocktäschel, Open-Endedness Team Lead at Google DeepMind, through a series of posts on X (formerly Twitter). He said, “We introduce Genie, a foundation world model trained exclusively from Internet videos that can generate an endless variety of action-controllable 2D worlds given image prompts.” Genie is unusual in that it is built to generate one specific kind of output, and it is described as the only video game generation model announced so far.

Google's Genie AI model is not yet publicly available and exists only as a research model at this time. As a result, details of its user-facing features are still unknown. It can take an image as a prompt to generate video game levels, but it is unclear whether it will also accept text or video prompts. A preprint of the paper describing the technical details has been posted online. The AI model was trained without supervision on 200,000 hours of video game footage and contains 11 billion parameters. Its architecture has three components: a spatiotemporal video tokenizer, an autoregressive dynamics model, and a simple and scalable latent action model.

How Google Genie works

Put simply, the spatiotemporal video tokenizer takes video game footage and breaks it into small units called tokens that the underlying model can process. "Spatiotemporal" means the data is split in both time and space (for example, a video might be split into 2-second clips, with each frame also divided into multiple patches).
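To make the idea of splitting in both time and space concrete, here is a minimal sketch in Python. It is an illustration only, not Genie's tokenizer, which is a learned neural network. The function name `split_spatiotemporal`, the toy 4-frame video, and the clip and patch sizes are all assumptions made up for this example.

```python
def split_spatiotemporal(video, clip_len, patch):
    """Split a video (a list of frames, each a list of pixel rows) into chunks
    that each cover clip_len consecutive frames and a patch x patch region."""
    n_frames = len(video)
    height = len(video[0])
    width = len(video[0][0])
    chunks = {}
    for t0 in range(0, n_frames, clip_len):      # split along time
        for y0 in range(0, height, patch):       # split along image rows
            for x0 in range(0, width, patch):    # split along image columns
                chunks[(t0, y0, x0)] = [
                    [row[x0:x0 + patch] for row in frame[y0:y0 + patch]]
                    for frame in video[t0:t0 + clip_len]
                ]
    return chunks


# Hypothetical 4-frame, 4x4-pixel "video"; each pixel value encodes (t, y, x).
video = [[[100 * t + 10 * y + x for x in range(4)] for y in range(4)]
         for t in range(4)]

chunks = split_spatiotemporal(video, clip_len=2, patch=2)
print(len(chunks))        # 2 time slices * 2 * 2 patches = 8
print(chunks[(2, 0, 2)])  # [[[202, 203], [212, 213]], [[302, 303], [312, 313]]]
```

Each chunk here is just a raw block of pixels. In a real tokenizer, a trained model would compress each block into a compact token rather than keep the pixels as they are.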

Next comes the autoregressive dynamics model. Autoregressive models predict what comes next based on what has come before, while dynamics models are responsible for capturing how things change and move over time. This, therefore, is where the prediction happens. The final component is the latent action model. Here, the AI learns how playable characters move and traverse the video game world.
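The interplay between these two components can be sketched with a toy example. The Python below is a simplified illustration under stated assumptions, not Genie's actual method: the real components are learned neural networks operating on video tokens, whereas here a "frame" is just a character's x-position, and the names `infer_latent_actions` and `rollout` are hypothetical. The first function assigns a discrete code to each transition using only the observed change (no action labels), and the second predicts future states one step at a time, feeding each prediction back in as input.

```python
def infer_latent_actions(positions):
    """Label each transition with a discrete code, using only the observed change."""
    deltas = [b - a for a, b in zip(positions, positions[1:])]
    codebook = sorted(set(deltas))               # one code per distinct change
    codes = [codebook.index(d) for d in deltas]
    return codebook, codes


def rollout(start, codes, codebook):
    """Autoregressive prediction: each predicted state becomes the next input."""
    states = [start]
    for code in codes:
        states.append(states[-1] + codebook[code])
    return states


# Hypothetical x-positions of a character, taken from unlabeled footage.
observed = [5, 6, 7, 6, 6, 7]
codebook, codes = infer_latent_actions(observed)
print(codebook)  # [-1, 0, 1]
print(codes)     # [2, 2, 0, 1, 2]

# A "player" now picks codes to control a new rollout from position 0.
print(rollout(0, [2, 2, 1, 0], codebook))  # [0, 1, 2, 2, 1]
```

In this toy setting the three codes happen to correspond to moving left, standing still, and moving right, even though no such labels were ever given.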

The latent action space learned by Genie is not only diverse and consistent, but also interpretable. Rocktäschel says that after a few turns, humans generally figure out a mapping to semantically meaningful actions (going left, going right, jumping, and so on). This matters because it shows that the main problem the AI model solves is not only generating 2D video game levels, but also understanding how basic movement occurs, knowledge that could in turn be used to navigate real-world terrain.

Emphasizing this, he added that Genie's model is general and not constrained to 2D. The team also trained a Genie on robotics data (RT-1) without action labels and showed that an action-controllable simulator can be learned there as well. The team sees this as a promising step toward general world models for AGI.

