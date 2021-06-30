



Posted by Leopold Haller and Hernan Moraldo, Software Engineers, Google Research

Over the last two decades, dramatic advances in computing and connectivity have allowed game developers to create works of ever-increasing scope and complexity. Simple linear levels have evolved into photorealistic open worlds, procedural algorithms have enabled games with unprecedented variety, and expanding internet access has transformed games into dynamic online services. Unfortunately, scope and complexity have grown faster than the size of quality assurance teams and the capabilities of traditional automated testing. This poses a challenge for both product quality (for example, delayed releases and post-launch patches) and developer quality of life.

Machine learning (ML) techniques offer a possible solution, as they have shown the potential to significantly change game development workflows. They can help designers balance their game and let artists produce high-quality assets in a fraction of the time traditionally needed. They can also be used to train challenging opponents that compete at the highest levels of play. However, some ML techniques currently pose requirements that are impractical for production game teams, such as designing game-specific network architectures, developing expertise in implementing ML algorithms, and generating billions of frames of training data. Conversely, game developers work in a setting that offers unique advantages for leveraging ML techniques, such as direct access to the game source, expert demonstrations, and the uniquely interactive nature of video games.

Today, we are introducing an ML-based system that game developers can use to train game-testing agents quickly and efficiently. This helps developers find serious bugs quickly and lets human testers focus on more complex and intricate problems. The resulting solution requires no ML expertise, works on many of the most popular game genres, and can train an ML policy, which generates game actions from game state, in less than an hour on a single game instance. We have also released an open source library that demonstrates a functional application of these techniques.

Supported genres include arcade, action/adventure, and racing games.

The Right Tool for the Right Job

The most basic form of video game testing is simply to play the game. A lot. Many of the most serious bugs (such as crashes or falling out of the world) are easy to detect and fix; the challenge is finding them within the vast state space of modern games. That is why we decided to focus on training a system that could “just play the game” at scale.

We found that the most effective way to do this was not to train a single, highly effective agent that could play the entire game end to end, but to give developers the ability to train an ensemble of game-testing agents, each of which can effectively accomplish tasks lasting a few minutes. Game developers call such a task a “gameplay loop.”

These core gameplay behaviors are often expensive to program by traditional means, but they are much more efficient to train than a single end-to-end ML model. In practice, commercial games create longer loops by repeating and remixing core gameplay loops. This means that developers can test large stretches of gameplay by combining ML policies with a small amount of simple scripting.
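To make the idea of combining policies with simple scripts concrete, here is a minimal sketch. Everything in it is hypothetical: the `LoopTest` class, the toy simulator `toy_step`, and the lambdas standing in for trained ML policies are illustrations, not part of the released library. The script runs one short gameplay loop after another and records whether each one reached its scripted success condition.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]  # toy game state, e.g. {"x": 0, "has_key": False}


@dataclass
class LoopTest:
    """One short gameplay loop: a policy plus scripted pass/timeout checks."""
    name: str
    policy: Callable[[State], str]   # stand-in for a trained ML policy
    done: Callable[[State], bool]    # scripted success condition
    max_steps: int = 50              # scripted timeout


def toy_step(state: State, action: str) -> State:
    """Toy simulator (hypothetical): apply one action to the state."""
    new_state = dict(state)
    if action == "right":
        new_state["x"] += 1
    elif action == "grab" and new_state["x"] == 3:
        new_state["has_key"] = True
    return new_state


def run_script(tests: List[LoopTest], state: State) -> Tuple[list, State]:
    """Chain the loops in order and record whether each one succeeded."""
    results = []
    for test in tests:
        steps = 0
        while not test.done(state) and steps < test.max_steps:
            state = toy_step(state, test.policy(state))
            steps += 1
        results.append((test.name, test.done(state)))
    return results, state


tests = [
    LoopTest("reach_key", lambda s: "right", lambda s: s["x"] >= 3),
    LoopTest("grab_key", lambda s: "grab", lambda s: s["has_key"]),
    LoopTest("reach_exit", lambda s: "right", lambda s: s["x"] >= 6),
]
results, final_state = run_script(tests, {"x": 0, "has_key": False})
```

A loop that times out shows up as a `False` entry in `results`, which is the kind of signal a test script could report as a potential bug.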

A Simulation-Centric Semantic API

One of the most fundamental challenges in applying ML to game development is bridging the gap between the simulation-centric world of video games and the data-centric world of ML. Rather than asking developers to convert game state directly into custom, low-level ML features (too much effort) or attempting to learn from raw pixels (too much training data), our system provides game developers with an idiomatic API that lets them describe their game in terms of the essential state that a player observes and the semantic actions that the player can perform. All of this information is expressed through concepts familiar to game developers, such as entities, raycasts, 3D positions and rotations, buttons, and joysticks.

As you can see in the example below, the API allows you to specify observations and actions in just a few lines of code.

An example of actions and observations for a racing game.
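As a rough illustration of what such a description might look like, the sketch below declares observations and actions for a racing game using the concepts named above (entities, raycasts, joysticks, buttons). All class and field names here are hypothetical and chosen for this example; they are not the API of the released library.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class EntityObservation:
    name: str
    position: Tuple[float, float, float]
    rotation: Tuple[float, float, float, float]  # quaternion (x, y, z, w)


@dataclass
class RaycastObservation:
    name: str
    angles_deg: List[float]  # directions probed around the entity
    max_distance: float


@dataclass
class JoystickAction:
    name: str
    axes: int  # 1 for a single axis, 2 for a two-axis stick


@dataclass
class ButtonAction:
    name: str


@dataclass
class GameSpec:
    observations: list = field(default_factory=list)
    actions: list = field(default_factory=list)


def count_action_outputs(spec: GameSpec) -> int:
    """Number of scalar outputs a policy would need to produce."""
    total = 0
    for action in spec.actions:
        total += action.axes if isinstance(action, JoystickAction) else 1
    return total


# Hypothetical racing-game description: what the player sees and can do.
racing_spec = GameSpec(
    observations=[
        EntityObservation("car", (0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 1.0)),
        RaycastObservation("track_feelers", [-45.0, 0.0, 45.0], 50.0),
    ],
    actions=[
        JoystickAction("steering", axes=1),
        JoystickAction("throttle", axes=1),
        ButtonAction("handbrake"),
    ],
)
```

Because the description is semantic rather than numeric, a system built on it can inspect which kinds of building blocks are present before deciding how to process them.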

From APIs to Neural Networks

This high-level semantic API is not only easy to use, but also gives the system the flexibility to adapt to the specific game being developed. The specific combination of API building blocks that a game developer uses informs our choice of network architecture, since it provides information about the type of game scenario in which the system is deployed. Examples include handling action outputs differently depending on whether they represent a digital button or an analog joystick, and using image processing techniques to handle observations that result from an agent probing its environment with raycasts (similar to how self-driving cars probe their environment with lidar).
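The post does not say how button and joystick outputs are actually handled, so the following is only one plausible sketch under stated assumptions: a raw network output for a button is squashed with a sigmoid and thresholded into pressed or released, while a raw output for a joystick axis is squashed with `tanh` into the range [-1, 1]. The function names are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def postprocess_button(raw: float, threshold: float = 0.5) -> bool:
    # Assumption: treat the raw output as a logit for "button pressed".
    return sigmoid(raw) >= threshold


def postprocess_joystick(raw: float) -> float:
    # Assumption: squash the raw output into the analog axis range [-1, 1].
    return math.tanh(raw)


pressed = postprocess_button(2.0)      # True: probability is above 0.5
released = postprocess_button(-2.0)    # False: probability is below 0.5
steer = postprocess_joystick(0.3)      # a gentle analog deflection
```

The point of the distinction is that a digital control needs a discrete decision while an analog control needs a bounded continuous value, so the same raw output should not be interpreted the same way for both.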

Our API is general enough to model many common control schemes (the configuration of action outputs that control movement) in games, such as first-person games, third-person games with camera-relative controls, racing games, and twin-stick shooters. Since 3D movement and aiming are often an integral part of gameplay in general, we create networks that automatically tend towards simple behaviors such as aiming, approaching, or avoiding in these games. The system accomplishes this by analyzing the game’s control scheme and creating neural network layers that perform custom processing of the game’s observations and actions. For example, the positions and rotations of objects in the world are automatically translated into directions and distances from the point of view of the AI-controlled game entity. This transformation typically speeds up learning and helps the learned network generalize better.

An example of a neural network generated for a game with joystick controls and raycast inputs. Depending on the inputs (red) and the control scheme, the system generates custom pre- and post-processing layers (orange).
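The translation of world positions into directions and distances from the agent's point of view can be sketched as follows. This is a simplified 2D, top-down version written for illustration, assuming the agent's heading is a yaw angle in radians measured counterclockwise from the +x axis; the function name `to_egocentric` is hypothetical, and the real system works with 3D positions and rotations.

```python
import math
from typing import Tuple


def to_egocentric(
    agent_pos: Tuple[float, float],
    agent_yaw: float,
    target_pos: Tuple[float, float],
) -> Tuple[float, float]:
    """Return (distance, relative_angle) of the target as seen by the agent.

    relative_angle is in [-pi, pi): 0 means straight ahead, positive means
    to the agent's left (counterclockwise), negative means to its right.
    """
    dx = target_pos[0] - agent_pos[0]
    dy = target_pos[1] - agent_pos[1]
    distance = math.hypot(dx, dy)
    world_angle = math.atan2(dy, dx)
    relative = world_angle - agent_yaw
    # Wrap into [-pi, pi) so that equivalent angles get the same value.
    relative = (relative + math.pi) % (2.0 * math.pi) - math.pi
    return distance, relative


# Agent at the origin facing +y; the target sits two units along +x,
# which is directly to the agent's right.
distance, angle = to_egocentric((0.0, 0.0), math.pi / 2, (2.0, 0.0))
```

One reason such a transform can help generalization is that a target two units to the right produces the same input no matter where in the world the agent happens to be standing.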

Real-Time Learning from Experts

After generating a neural network architecture, the network needs to be trained to play the game using an appropriate learning algorithm.

Reinforcement learning (RL), in which an ML policy is trained directly to maximize a reward, may seem like the natural choice, since it has been used to train highly competent ML policies for games. However, RL algorithms tend to require more data than a single game instance can produce in a reasonable amount of time, and achieving good results in a new domain often requires hyperparameter tuning and strong ML domain knowledge.

Instead, we found that imitation learning (IL), which trains ML policies by observing experts play the game, works well for our use case. Unlike RL, where the agent has to discover a good policy on its own, IL only needs to reproduce the behavior of a human expert. Game developers and testers are experts in their own games, so they can easily provide demonstrations of how to play.

We use an IL approach inspired by the DAgger algorithm, which lets us take advantage of the most compelling quality of video games: interactivity. Thanks to the reductions in training time and data requirements enabled by our semantic API, training is effectively real time, so a developer can switch fluidly between providing gameplay demonstrations and watching the system play. This creates a natural feedback loop in which the developer iteratively provides corrections to a continuous stream of ML policies.

From the developer’s point of view, providing a demonstration or a correction for faulty behavior is as simple as picking up the controller and starting to play. When they are done, they can put the controller down and watch the ML policy play. The result is a training experience that is real-time, interactive, highly experiential, and, very often, more than a little enjoyable.

An ML policy for an FPS game, trained with our system.

Conclusion

We have presented a system that combines a high-level semantic API with a DAgger-inspired interactive training flow, making it possible to train ML policies that are useful for testing video games across a variety of genres. We have released an open source library as a functional illustration of the system. No ML expertise is required, and training agents for test applications often takes less than an hour on a single developer machine. We hope that this work will help inspire the development of ML techniques that can be deployed in real-world game development flows in ways that are accessible, effective, and enjoyable.

Acknowledgments

We would like to thank the core members of the project: Dexter Allen, Leopold Haller, Nathan Marts, Hernan Moraldo, Stewart Miles, and Hina Sakazaki. The training algorithms are provided by TF Agents, and on-device inference is provided by TF Lite. Special thanks to our research advisors Olivier Bachem, Erik Frey, and Toby Pohlen, and to Eugene Brevdo, Jared Duke, Oscar Ramirez, and Neal Wu for their helpful guidance and support.

