Facebook today proposed NetHack as a spectacular challenge for AI research at the NeurIPS 2021 AI conference in Sydney, Australia. NetHack is a video game from the 1980s with simple visuals that is considered one of the most difficult in the world. With the game, Facebook claims, data scientists can benchmark state-of-the-art AI methods in a complex environment without having to run experiments on powerful computers.

Games have served as benchmarks for AI for decades, but things kicked into high gear in 2013, when Google’s DeepMind demonstrated a system that could play Pong, Breakout, Space Invaders, Seaquest, Beamrider, Enduro, and Q*bert at superhuman levels. According to experts like DeepMind co-founder Demis Hassabis, the progress goes beyond improving game design. Rather, it is informing the development of systems that may one day diagnose diseases, predict complex protein structures, and segment CT scans.

In particular, reinforcement learning, a type of AI that can learn strategies for coordinating large-scale systems such as manufacturing plants, traffic control systems, financial portfolios, and robots, is migrating from laboratories to highly influential real-world applications. For example, self-driving car companies such as Wayve and Waymo are using reinforcement learning to develop control systems for their vehicles. And through Microsoft’s Bonsai, Siemens is employing reinforcement learning to tune CNC machines.

Recent advances in reinforcement learning have been underpinned by simulation environments, including games such as StarCraft II, Dota 2, and Minecraft. However, this progress has been quite computationally expensive, requiring thousands of GPUs running in parallel for a single experiment, and has yet to carry over to more realistic problems outside of these games, Facebook AI researchers Edward Grefenstette, Tim Rocktäschel, and Eric Hambro wrote in a blog post. What is needed, they argue, is a complex environment that allows very fast simulation at low computational cost while still highlighting the shortcomings of RL.

NetHack

Facebook’s proposal follows its release of the NetHack Learning Environment (NHLE), a research tool based on the original NetHack. (The NetHack Challenge is built on NHLE.) First released in 1987, NetHack tasks players with descending 50 or more dungeon levels to retrieve a magical amulet. Along the way, they must use spellbooks and other items and fight monsters. NetHack levels are procedurally generated, so every game is different. Facebook researchers point out that this tests the generalization limits of leading AI approaches.

Winning a game of NetHack requires long-term planning in a very unforgiving environment. When the player’s character dies, the game starts from scratch in an entirely new dungeon, Grefenstette, Rocktäschel, and Hambro continued. Even for expert players, successfully completing a game takes on average 25 to 50 times more steps than an average StarCraft II game, and players’ interactions with objects and the environment are very complex. Success therefore often depends on solving problems in surprising ways and on consulting external sources of knowledge, such as the official NetHack Guidebook, the NetHack Wiki, online videos, and forum discussions.

Partial observability makes exploration essential in NetHack, and procedural generation and permanent death significantly increase the cost of failure. An AI agent also cannot reset or interfere with the environment, which rules out approaches like those behind DeepMind’s AlphaZero or Uber’s Go-Explore, the system that tackled Montezuma’s Revenge.

The challenges in NetHack range from randomized mazes to dangerous, more structured settings such as large rooms full of monsters and traps, towns and forts, and kraken-infested waters, Grefenstette, Rocktäschel, and Hambro say. Dealing with ever-changing observations in a stochastic and rich game world calls for new approaches, they add, which should drive the development of techniques that are more likely to scale to highly variable real-world settings.

Lightweight

NetHack also has the advantage of a lightweight architecture. Its turn-based, ASCII-art world and its game engine, written primarily in C, capture the game’s complexity. NetHack forgoes all but the simplest physics and renders symbols instead of pixels, which importantly allows AI to learn quickly without wasting computational resources on simulating dynamics or rendering observations.

Indeed, training advanced machine learning models in the cloud remains exorbitantly expensive. According to a recent Synced report, the University of Washington’s Grover, a model built for both generating and detecting fake news, cost $25,000 to train over two weeks. OpenAI spent $256 an hour training its GPT-2 language model, and Google spent an estimated $6,912 to train BERT.

In contrast, a single high-end graphics card is sufficient to train an AI-powered NetHack agent for hundreds of millions of steps a day using the TorchBeast framework, which supports further scaling by adding graphics cards or machines. Agents can experience billions of steps in the environment in a reasonable time frame while still pushing the limits of what current techniques can achieve.
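To put those throughput figures in perspective, the short Python sketch below converts a daily step count into steps per second and into the number of days needed to reach a billion steps. The three daily rates are assumptions chosen only to span "hundreds of millions"; they are not measurements from Facebook or from TorchBeast.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def steps_per_second(steps_per_day: float) -> float:
    """Average environment steps per second for a given daily total."""
    return steps_per_day / SECONDS_PER_DAY


def days_to_reach(total_steps: float, steps_per_day: float) -> float:
    """Days of training needed to accumulate total_steps at a fixed daily rate."""
    return total_steps / steps_per_day


if __name__ == "__main__":
    # Assumed daily throughputs (illustrative only, not reported figures).
    for daily in (100e6, 300e6, 500e6):
        rate = steps_per_second(daily)
        days = days_to_reach(1e9, daily)
        print(f"{daily:,.0f} steps/day = {rate:,.0f} steps/s, "
              f"{days:.1f} days per billion steps")
```

Under these assumptions, even the lowest rate works out to more than a thousand steps every second on one card, and a billion steps fits within about ten days.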

The NHLE can train reinforcement learning agents 15 times faster than even the decade-old Atari benchmark, the researchers say. In addition, NetHack can be used to test the limits of state-of-the-art deep reinforcement learning methods, offering a higher degree of complexity while running 50 to 100 times faster than tasks of comparable difficulty.

Challenge

NHLE consists of three components: a Python interface to NetHack using the popular OpenAI Gym API, a set of benchmark tasks, and a baseline machine learning agent. To win the NetHack Challenge, participants must develop an AI that reliably beats NetHack or achieves the highest possible score. In doing so, the competition aims to provide a direct comparison of different methods on a new benchmark for future research, while at the same time demonstrating NHLE’s suitability as a research setting.
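For readers unfamiliar with the Gym convention, the sketch below shows the reset-and-step loop that such an interface implies, using the classic four-value step result (observation, reward, done flag, info). ToyDungeonEnv, its two actions, and its trap probability are hypothetical stand-ins invented for this illustration; the code does not use NHLE itself and says nothing about its actual observations or action set.

```python
import random


class ToyDungeonEnv:
    """Hypothetical stand-in with a Gym-style reset()/step() interface (not NHLE)."""

    ACTIONS = ("descend", "wait")

    def __init__(self, depth=5, seed=None):
        self.depth = depth
        self.rng = random.Random(seed)
        self.level = 0
        self.traps = set()

    def reset(self):
        # Every episode starts in a freshly generated dungeon.
        self.level = 0
        self.traps = {lvl for lvl in range(1, self.depth)
                      if self.rng.random() < 0.3}
        return self._observe()

    def _observe(self):
        # Partial observation: the agent sees its level, not the traps ahead.
        return {"level": self.level}

    def step(self, action):
        reward, done = 0.0, False
        if self.ACTIONS[action] == "descend":
            self.level += 1
            if self.level in self.traps:
                done = True                # death ends the episode for good
            elif self.level == self.depth:
                reward, done = 1.0, True   # reached the bottom level
        return self._observe(), reward, done, {}


def run_random_episode(env, rng, max_steps=100):
    """Play one episode with uniformly random actions."""
    obs = env.reset()
    total_reward, steps = 0.0, 0
    for _ in range(max_steps):
        action = rng.randrange(len(env.ACTIONS))
        obs, reward, done, _info = env.step(action)
        total_reward += reward
        steps += 1
        if done:
            break
    return total_reward, steps, obs


if __name__ == "__main__":
    env = ToyDungeonEnv(depth=5, seed=0)
    print(run_random_episode(env, random.Random(1)))
```

An agent written against this loop only needs reset() and step(), so in principle the toy class could be replaced by a real Gym environment without changing run_random_episode.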

There are no restrictions on how participants may train their systems for the NetHack Challenge. Awards will go to (1) the best overall AI system, (2) the best AI system that does not use a neural network, and (3) the best AI system from an academic or independent team.

Grefenstette, Rocktäschel, and Hambro say that achieving these goals could lay the foundation for follow-up competitions that focus on specific aspects of AI. In addition, the NetHack Challenge may help shed light on a class of training and modeling approaches that can handle a wide variety of environments and costly errors, such as having to start over when a character is killed by a creature.

Many real-world and industrial problems, navigation for example, share these characteristics. As a result, progress on NetHack is progress toward reinforcement learning in a wider range of applications, say Grefenstette, Rocktäschel, and Hambro.

Facebook’s NeurIPS 2021 NetHack Challenge, held in partnership with co-sponsor AIcrowd, runs from early June to October. Winners will be announced at NeurIPS in December.

