



Artificial intelligence (AI) is blurring the line between imagination and reality. Tools from ChatGPT to Midjourney have already shown how much you can create from nothing more than an idea and a prompt. Recently, OpenAI introduced Sora, a text-to-video AI generator. So what's next? Now Google's DeepMind team has announced 'Genie', a new model that lets you create interactive 2D video games from a single image prompt or text description.

Simply put, Google Genie is an AI platform that generates interactive video games. Developed by Google DeepMind's Open-Endedness team, this research project has great potential for the future of entertainment, game development, and even robotics. Google describes Genie as a “world model” trained on a large dataset consisting primarily of 200,000 hours of unlabeled video footage from 2D platformer games. Unlike traditional AI models that require explicit instructions or labeled data, Genie learns by observing the actions and interactions in these videos, and it can then generate a playable video game from a single prompt or image.

But how exactly does this AI Genie work?

At first glance, Genie may seem like a magical AI that can transform imagination into reality. However, the underlying process is quite complex. Let me explain with an analogy.

Genie has three core components.

— Video Tokenizer: Imagine Genie as a skilled chef preparing a complex dish. Just as a chef chops ingredients into smaller pieces for easier handling, the video tokenizer breaks large amounts of video data into manageable units called “tokens.” These tokens serve as the basic building blocks of Genie's understanding of the visual world.

— Latent Action Model: Once the video has been chopped into tokens, the latent action model takes over. Like a seasoned culinary expert, it carefully analyzes the transitions between consecutive frames in the video. This analysis lets it identify eight basic actions, the “spices” that are essential to Genie's recipes. These actions range from jumping and running to interacting with objects within the game environment.

— Dynamics Model: Finally comes the dynamics model, the creative chef that brings it all together. Just as a chef predicts how flavors will interact based on the ingredients selected, this model predicts the next frame of a video sequence. It takes into account the current state of the game world and the player's action (the selected “spice”) and generates the next frame accordingly. Repeating this prediction frame after frame ultimately creates the illusion of an interactive and engaging gaming experience.
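To make the three steps more concrete, here is a minimal toy sketch in Python of how the pieces could fit together. It is not Genie's actual code: the function names, the patch size, the vocabulary size, and the simple arithmetic inside each function are all assumptions made for illustration. In the real system each step is a learned neural network. Only the count of eight actions comes from the description above.

```python
from typing import List

NUM_LATENT_ACTIONS = 8  # the article reports eight learned actions
PATCH = 2               # hypothetical patch size (assumption)
VOCAB = 16              # hypothetical token vocabulary size (assumption)

Frame = List[List[int]]  # grayscale frame, pixel values 0..255


def tokenize(frame: Frame) -> List[int]:
    """Toy video tokenizer: average each PATCH x PATCH block of pixels
    and quantize the average into one of VOCAB discrete tokens."""
    tokens = []
    for r in range(0, len(frame), PATCH):
        for c in range(0, len(frame[0]), PATCH):
            block = [frame[r + i][c + j] for i in range(PATCH) for j in range(PATCH)]
            mean = sum(block) // len(block)
            tokens.append(mean * VOCAB // 256)
    return tokens


def infer_latent_action(prev_tokens: List[int], next_tokens: List[int]) -> int:
    """Toy latent action model: map the change between two consecutive
    tokenized frames onto one of the eight action ids."""
    change = sum(b - a for a, b in zip(prev_tokens, next_tokens))
    return change % NUM_LATENT_ACTIONS


def predict_next(tokens: List[int], action: int) -> List[int]:
    """Toy dynamics model: produce the next frame's tokens from the current
    tokens and the chosen action. A rotation stands in for the learned model."""
    if not 0 <= action < NUM_LATENT_ACTIONS:
        raise ValueError("action must be one of the eight latent actions")
    shift = action % len(tokens)
    return tokens[shift:] + tokens[:shift]


def play(prompt_frame: Frame, actions: List[int]) -> List[List[int]]:
    """Start from a single prompt image and roll the world forward,
    one predicted frame per player action."""
    states = [tokenize(prompt_frame)]
    for action in actions:
        states.append(predict_next(states[-1], action))
    return states
```

Calling `play(frame, [1, 0, 3])` on a small image returns four tokenized frames: the prompt plus one predicted frame per action. This mirrors the loop described above, where each new frame depends on the current state and the player's chosen action.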

Note that Genie is still in development and has the following limitations:

Limited visual quality: Currently, Genie can only run its games at a very low frame rate (1 FPS), which limits visual fidelity and smoothness.

Research-only access: At this time, Genie is not available to the public and remains a research project within Google DeepMind.

Ethical considerations: As with any powerful technology, the potential misuse of Genie requires careful consideration. Google says it is committed to responsible development and deployment.

However, once released, Genie is expected to revolutionize creativity across many areas. Its ability to generate interactive worlds from minimal input opens the door to exciting possibilities in the future for entertainment, education, and more.

Issuer:

Divya Bhati

Date of issue:

February 27, 2024

