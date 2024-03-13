



Even hunt-and-fetch quests would be better with a little help from an AI.

At this point in the evolution of machine-learning AI, we are used to specially trained agents that can completely dominate everything from Atari games to complex board games like Go. But what if an AI agent could be trained not only to play a specific game, but also to interact with 3D environments in general? And what if that AI focused on responding to natural-language commands?

These are the kinds of questions driving Google's DeepMind research group in developing SIMA, a “Scalable, Instructable, Multiworld Agent” that is “not trained to win, but trained to do what it's told,” as research engineer Tim Harley put it in a presentation attended by Ars Technica. “And not just in one game, but across a bunch of different games all at the same time.”

Harley emphasized that SIMA is still “mostly a research project” and that the results in the project's first technical report show there is a long way to go before SIMA begins to approach human-level listening abilities. Still, Harley said he hopes SIMA can ultimately provide the foundation for AI agents that players can direct and converse with in cooperative gameplay situations, agents that players think of more as a “trusted partner” than a “superhuman adversary.”

“This work isn't about achieving high game scores,” Google wrote in a blog post announcing the research. “Learning to play even one video game is a technical feat for an AI system, but learning to follow instructions in a variety of game settings could unlock more helpful AI agents for any environment.”

Learning how to learn

Google trained SIMA on nine very different open-world games to create generalizable AI agents.

To train SIMA, the DeepMind team focused on three-dimensional games and test environments controlled from a first-person or over-the-shoulder third-person perspective. All nine games in the test suite, provided by Google's development partners, prioritize “open-ended interactions,” avoid “extreme violence,” and offer a wide range of environments and interactions, from “space exploration” to “wacky goat mayhem.” To make SIMA as generalizable as possible, the agent is not given privileged access to a game's internal data or control APIs. The system takes only the pixels on the screen as input and provides only keyboard and mouse controls as output, mirroring “the [model] humans have used [to play video games].” The research team also designed the agent to work with games that run in real time (i.e., 30 frames per second), rather than slowing down the simulation to allow for extra processing time, as other interactive machine-learning projects have done.
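The pixels-in, keyboard-and-mouse-out interface described above can be sketched as a simple control loop. This is only an illustration under stated assumptions, not DeepMind's code: the names `Action`, `toy_policy`, and `run_episode` are hypothetical, the frame is modeled as a nested list of RGB tuples, and the policy is a stand-in keyword rule rather than a trained model. The only figures taken from the article are the 30-frames-per-second budget and the restriction to screen pixels as input and keyboard and mouse controls as output.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence, Tuple

# Assumption: a frame is a grid (rows x columns) of RGB pixel tuples.
Frame = Sequence[Sequence[Tuple[int, int, int]]]

# Real-time budget per frame at 30 frames per second.
FRAME_BUDGET_S = 1.0 / 30


@dataclass(frozen=True)
class Action:
    """Hypothetical output record: held keys plus a relative mouse move."""
    keys: Tuple[str, ...] = ()
    mouse_dx: int = 0
    mouse_dy: int = 0


def toy_policy(frame: Frame, instruction: str) -> Action:
    """Stand-in for a learned model: sees only pixels and the instruction."""
    if "forward" in instruction.lower():
        return Action(keys=("w",))
    return Action()


def run_episode(
    policy: Callable[[Frame, str], Action],
    frames: Iterable[Frame],
    instruction: str,
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[List[Action], int]:
    """Query the policy once per frame and count frames that miss the budget."""
    actions: List[Action] = []
    late_frames = 0
    for frame in frames:
        start = clock()
        action = policy(frame, instruction)
        if clock() - start > FRAME_BUDGET_S:
            late_frames += 1  # the game keeps running; it is not paused
        actions.append(action)
    return actions, late_frames


if __name__ == "__main__":
    black_frame = [[(0, 0, 0)] * 4 for _ in range(3)]
    acts, late = run_episode(toy_policy, [black_frame] * 3, "Go forward")
    print(acts, late)
```

Nothing in the loop reads game state other than the frame itself, which is the property that lets such an agent be dropped into a new game without a game-specific API.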

Animated samples of SIMA responding to basic commands in very different game environments.

Although these restrictions make SIMA's task harder, they also mean the agent can be integrated into a new game or environment “off the shelf” with minimal setup and without any special training on the “ground truth” of the game world. They also make it relatively easy to test whether what SIMA has learned from training on previous games can “transfer” to games it has never seen before, which could be an important step toward achieving artificial general intelligence.

For training data, SIMA uses videos of human gameplay (and the associated timecoded input) in the provided games, annotated with natural-language descriptions of what is happening in the footage. As the researchers note in the technical report, these clips focus on “instructions that can be completed in less than approximately 10 seconds” to avoid complications that can arise from “the wide range of instructions possible over long timescales.” Integration with pre-trained models such as SPARC and Phenaki means the SIMA model does not need to learn how to interpret language and visual data from scratch.
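To make the shape of that training data concrete, here is a minimal sketch of how one annotated clip might be represented. The field names, the `InputEvent` and `Clip` types, and the `short_horizon` filter are all hypothetical; the article only says that clips pair gameplay video with timecoded input and a natural-language description, and that the instructions can be completed in roughly 10 seconds or less.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumption: a hard cutoff standing in for "approximately 10 seconds".
MAX_INSTRUCTION_SECONDS = 10.0


@dataclass(frozen=True)
class InputEvent:
    """One timecoded keyboard/mouse sample recorded from a human player."""
    t_seconds: float
    keys: Tuple[str, ...] = ()
    mouse_dx: int = 0
    mouse_dy: int = 0


@dataclass
class Clip:
    """Hypothetical record for one annotated gameplay clip."""
    game: str
    instruction: str        # natural-language description of the footage
    duration_seconds: float
    inputs: List[InputEvent] = field(default_factory=list)


def short_horizon(clips: List[Clip]) -> List[Clip]:
    """Keep only clips whose instruction fits the short time horizon."""
    return [c for c in clips if c.duration_seconds < MAX_INSTRUCTION_SECONDS]


if __name__ == "__main__":
    clips = [
        Clip("example-game", "open the door", 4.2,
             [InputEvent(0.0, ("w",)), InputEvent(3.9, ("e",))]),
        Clip("example-game", "build a base", 180.0),
    ]
    print([c.instruction for c in short_horizon(clips)])
```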

