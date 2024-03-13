



Google DeepMind has announced new research on an AI agent that can carry out a range of tasks in 3D games it has never seen before. For years, the team has experimented with AI models that can win at games such as Go and chess, and even learn games without being told the rules. Now, DeepMind says, it has shown for the first time that an AI agent can understand a wide range of game worlds and carry out tasks within them based on natural-language instructions.

The researchers collaborated with studios and publishers including Hello Games (No Man's Sky), Tuxedo Labs (Teardown), and Coffee Stain (Valheim and Goat Simulator 3) to train the Scalable Instructable Multiworld Agent (SIMA) on nine games. The team also used four research environments, including one built in Unity in which agents are instructed to form sculptures using building blocks. This gave SIMA, described as a “generalist AI agent in a 3D virtual setting,” a variety of environments and settings to learn from, with different graphical styles and perspectives (first and third person).

“Each game in SIMA's portfolio opens up a new interactive world with a wide range of skills to learn, from simple navigation and menu use to mining resources, piloting spaceships and crafting helmets,” the researchers wrote in a blog post. Learning to follow instructions for such tasks in video game worlds could lead to AI agents that are more useful in any environment, they noted.

Google DeepMind

The researchers recorded humans playing the games and noted the keyboard and mouse inputs used to carry out actions. They used this information to train SIMA, which features “precise image and language mapping and video models that predict what will happen next on screen.” The AI can understand a range of environments and carry out tasks to achieve specific goals.

According to the researchers, SIMA does not need a game's source code or API access, and it works with commercial versions of games. It also requires only two inputs: what is shown on screen and the instructions given by the user. Because it uses the same keyboard and mouse input method as a human, DeepMind claims SIMA can operate in almost any virtual environment.
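The interface described above can be sketched in a few lines of Python. This is only an illustration of the two-inputs-in, keyboard-and-mouse-out idea, not DeepMind's code: the names `Observation`, `Action`, `Agent`, and `ScriptedAgent` are hypothetical, and the hand-written rules stand in for a learned model.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Observation:
    """The two inputs the article says the agent needs (hypothetical types)."""
    frame: bytes       # raw pixels of the current screen; encoding is an assumption
    instruction: str   # natural-language instruction from the user


@dataclass(frozen=True)
class Action:
    """Human-style output: keys held down plus a relative mouse movement."""
    keys: frozenset = frozenset()
    mouse_dx: int = 0
    mouse_dy: int = 0


class Agent(Protocol):
    """Anything that turns an observation into a keyboard-and-mouse action."""
    def act(self, obs: Observation) -> Action: ...


class ScriptedAgent:
    """Stand-in policy with fixed rules. A real agent would use a trained model."""

    def act(self, obs: Observation) -> Action:
        text = obs.instruction.lower()
        if "turn right" in text:
            return Action(mouse_dx=50)    # move the mouse right to rotate the view
        if "turn left" in text:
            return Action(mouse_dx=-50)   # move the mouse left to rotate the view
        return Action(keys=frozenset({"w"}))  # default: walk forward


agent = ScriptedAgent()
action = agent.act(Observation(frame=b"", instruction="Turn right"))
```

Because nothing in this interface touches a game's internals, the same agent object could in principle be pointed at any title that accepts keyboard and mouse input, which is the portability argument DeepMind makes.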

The agent is evaluated on hundreds of basic skills that can be carried out in about 10 seconds, across several categories, including navigation (“turn right”), object interaction (“pick up mushrooms”), and menu-based tasks, such as opening a map or crafting an item. Eventually, DeepMind wants to be able to command agents to perform more complex, multi-step tasks based on natural-language prompts, such as “find resources and build a camp.”

SIMA performed well against several training-based benchmarks. The researchers trained an agent on a single game (say, Goat Simulator 3, for the sake of clarity), had it play that same title, and used the result as a performance baseline. SIMA agents trained on all nine games performed significantly better than the agent trained on Goat Simulator 3 alone.

Of particular interest is that a version of SIMA trained on eight of the games and then set to play the ninth performed, on average, about as well as an agent trained only on that ninth game. “This ability to function in entirely new environments highlights SIMA's ability to generalize beyond training,” DeepMind said. “While this is a promising initial result, further research is needed before SIMA can operate at the human level in both seen and unseen games.”
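The comparison described here, in which each agent is scored against a single-game specialist, can be sketched roughly as follows. This is an assumption about how such a normalization might be computed, not DeepMind's evaluation code. The function names, game labels, and scores are all made up for illustration and are not results from the research.

```python
from statistics import mean


def relative_to_specialist(scores, baseline):
    """Express each game's score as a percentage of the specialist's score.

    `scores` and `baseline` map a game name to a task success rate.
    The specialist is an agent trained only on that one game.
    """
    return {game: 100.0 * scores[game] / baseline[game] for game in baseline}


def average_relative(scores, baseline):
    """Average the per-game percentages into a single summary figure."""
    return mean(relative_to_specialist(scores, baseline).values())


# Placeholder numbers only; they are NOT figures reported by DeepMind.
specialist = {"game_a": 0.40, "game_b": 0.20}
held_out = {"game_a": 0.40, "game_b": 0.20}   # trained on every game except the one tested

summary = average_relative(held_out, specialist)
```

A summary near 100 would mean the held-out agent roughly matches the specialist, which is the pattern the article describes. A value above 100 would correspond to the all-games agent outperforming the single-game baseline.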

For SIMA to be truly successful, however, it needs language input. In tests where the agent was given no language training or instructions, it performed the common action of gathering resources instead of, for example, walking where it was told to go. In such cases, SIMA “acts in an appropriate but purposeless manner,” the researchers said. So it's not just us humans. Artificial intelligence models may also need a little nudge to do their job properly.

DeepMind says this is early-stage research and that the results “demonstrate the potential for developing a new wave of generalist, language-driven AI agents.” The team expects the AI to become more versatile and generalizable as it is exposed to more training environments. The researchers hope that future versions of the agent will improve on SIMA's understanding and its ability to carry out more complex tasks. “Ultimately, our research builds toward more general AI systems and agents that can understand and safely perform a wide range of tasks in ways that are useful to people online and in the real world,” DeepMind said.

