



This article is part of our reviews of AI research papers, a series of posts that explore the latest findings in artificial intelligence.

In a paper published last week in the peer-reviewed scientific journal Nature, scientists at Google Brain introduced a deep reinforcement learning technique for floorplanning, the process of arranging the placement of the different components of a computer chip.

The researchers used the reinforcement learning technique to design the next generation of Tensor Processing Units, Google’s specialized artificial intelligence processors.

The use of software in chip design is nothing new. However, according to the Google researchers, the new reinforcement learning model automatically generates chip floorplans that are better than or comparable to human-generated ones in all key metrics, including power consumption, performance, and chip area. And it does so in a fraction of the time it takes a human.

The AI’s superiority to human performance has drawn a lot of attention. Some media outlets described it as “artificial intelligence software that can design computer chips faster than humans can,” and wrote that “a chip that would take humans months to design can be built by [Google’s] new AI in less than six hours.”

Another outlet said, “The virtuous cycle of AI designing chips for AI seems to be just beginning.”

But what struck me when I read the paper was not the complexity of the AI system used to design computer chips, but the synergy between human and artificial intelligence.

Analogy, intuition, rewards

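The paper describes floorplanning as a placement problem: the components must be arranged so that the key metrics are optimized while respecting hard constraints such as routing congestion.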

Basically, what you want to do is place the components in the best possible way. However, as with many other problems, finding the optimal design becomes harder as the number of components in the chip grows.

Existing software helps speed up the search for chip placements, but it falls short when the target chip becomes complex. The researchers decided to draw on the way reinforcement learning has solved other problems with very large search spaces, such as the game of Go.

“Chip floorplanning is analogous [emphasis mine] to a game with varying pieces (for example, netlist topologies, macro counts, macro sizes and aspect ratios), boards (varying canvas sizes and aspect ratios) and win conditions (relative importance of different evaluation metrics, or different density and routing congestion constraints),” the researchers wrote.

This is a manifestation of analogy, one of the most important and complex aspects of human intelligence. We humans can draw abstractions from problems we have solved and apply them to new problems. We take these skills for granted, but they are what make us very good at transfer learning. That is why the researchers were able to reframe the chip floorplanning problem as a board game and tackle it in the same way that other scientists solved Go.

Deep reinforcement learning models are particularly good at searching very large spaces, a feat that is physically impossible with the computing power of the brain. But the scientists faced a problem far more complex than Go. “[The] state space of placing 1,000 clusters of nodes on a grid with 1,000 cells is of the order of 1,000! (greater than 10^2,500), whereas Go has a state space of 10^360,” the researchers write. The chips they wanted to design consist of millions of nodes.
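The comparison in that quote can be checked with a quick calculation. The sketch below is my own illustration, not code from the paper: it uses the log-gamma function to estimate the power of ten that matches 1,000! without building the full integer. The function and variable names are made up for this example.

```python
import math


def log10_factorial(n: int) -> float:
    """Return log10(n!) via the log-gamma function, since lgamma(n + 1) = ln(n!)."""
    return math.lgamma(n + 1) / math.log(10)


# Exponent of ten for 1,000!, the placement state space quoted in the paper.
placement_exponent = log10_factorial(1000)

# Exponent of ten for Go's state space, as quoted in the paper.
go_exponent = 360

print(f"1,000! is roughly 10^{placement_exponent:.0f}")
print(f"Ratio to Go's state space: roughly 10^{placement_exponent - go_exponent:.0f}")
```

Working in logarithms keeps the numbers small enough to compare directly, which is the only reason for using lgamma here instead of math.factorial.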

They solved the complexity problem with an artificial neural network that could encode chip designs as vector representations, making it much easier to explore the problem space. According to the paper, “Our intuition [emphasis mine] was that a policy capable of the general task of chip placement should also be able to encode the state associated with a new unseen chip into a meaningful signal at inference time. We therefore trained a neural network architecture capable of predicting reward on placements of new netlists, with the ultimate goal of using this architecture as the encoder layer of our policy.”

The word intuition is often used loosely. But it’s a very complex and little-understood process that involves experience, unconscious knowledge, pattern recognition, and more. Our intuition comes from years of experience in one area, but also from experience in another. Fortunately, testing these intuitions is getting easier with the help of high-power computing and machine learning tools.

It is also worth noting that reinforcement learning systems require well-designed rewards. In fact, some scientists believe that, given the right reward function, reinforcement learning is enough to reach artificial general intelligence. Without the right reward, however, an RL agent can get stuck in endless loops, doing ridiculous and meaningless things. In one well-known example, an RL agent playing the game CoastRunners abandons the main goal of winning the race in an attempt to maximize points.

The Google scientists designed the reward for the floorplanning system as the negative weighted sum of proxy wirelength, congestion, and density. The weights are hyperparameters that had to be tuned during the development and training of the reinforcement learning model.
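To make the shape of that reward concrete, here is a minimal sketch of a negative weighted sum over the three proxy metrics. It is an illustration under my own assumptions, not the paper's implementation: the class name, the function name, and the default weight values are all hypothetical, and the metrics are treated as plain numbers that some other tool has already computed.

```python
from dataclasses import dataclass


@dataclass
class PlacementMetrics:
    """Proxy costs for one candidate floorplan (hypothetical container)."""
    wirelength: float
    congestion: float
    density: float


def placement_reward(
    metrics: PlacementMetrics,
    wirelength_weight: float = 1.0,   # illustrative value, not from the paper
    congestion_weight: float = 0.5,   # illustrative value, not from the paper
    density_weight: float = 0.5,      # illustrative value, not from the paper
) -> float:
    """Return the negative weighted sum of the proxy costs.

    Lower costs give a higher (less negative) reward, so an agent that
    maximizes reward is pushed toward shorter wires and less crowding.
    """
    weighted_cost = (
        wirelength_weight * metrics.wirelength
        + congestion_weight * metrics.congestion
        + density_weight * metrics.density
    )
    return -weighted_cost


tight = PlacementMetrics(wirelength=1.0, congestion=2.0, density=4.0)
sprawling = PlacementMetrics(wirelength=3.0, congestion=2.0, density=4.0)
print(placement_reward(tight), placement_reward(sprawling))
```

Changing the weights changes which trade-offs the agent prefers, which is why they have to be tuned as hyperparameters rather than fixed in advance.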

With the right reward, the reinforcement learning model was able to use its computing power to find all sorts of ways to design floorplans that maximize that reward.

Selected dataset

The deep neural network used in the system was developed using supervised learning. Supervised machine learning requires labeled data to adjust the model’s parameters during training. The Google scientists created a dataset of 10,000 chip placements, in which the input is the state associated with a particular placement and the label is the reward for that placement.
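The input-and-label setup can be sketched in a few lines. The example below is a toy stand-in, not the paper's model: it replaces the deep neural network with a plain linear model, and it trains on synthetic (state, reward) pairs generated on the spot rather than on real chip placements. Every name and number in it is an assumption made for illustration.

```python
import random


def make_synthetic_dataset(n_samples, n_features, seed=0):
    """Build toy (state, reward) pairs as a stand-in for real placement data."""
    rng = random.Random(seed)
    # Hidden weights that define the toy reward; rewards are non-positive.
    true_weights = [rng.uniform(-1.0, 0.0) for _ in range(n_features)]
    dataset = []
    for _ in range(n_samples):
        state = [rng.random() for _ in range(n_features)]
        reward = sum(w * x for w, x in zip(true_weights, state))
        dataset.append((state, reward))
    return dataset


def predict(weights, state):
    """Predict the reward of a state with a linear model."""
    return sum(w * x for w, x in zip(weights, state))


def mean_squared_error(weights, dataset):
    """Average squared gap between predicted and labeled rewards."""
    return sum((predict(weights, s) - r) ** 2 for s, r in dataset) / len(dataset)


def train(dataset, n_features, learning_rate=0.05, epochs=200):
    """Fit the weights by stochastic gradient descent on squared error."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for state, reward in dataset:
            error = predict(weights, state) - reward
            weights = [w - learning_rate * error * x for w, x in zip(weights, state)]
    return weights


dataset = make_synthetic_dataset(n_samples=200, n_features=4)
untrained_error = mean_squared_error([0.0] * 4, dataset)
weights = train(dataset, n_features=4)
trained_error = mean_squared_error(weights, dataset)
print(f"MSE before training: {untrained_error:.6f}, after: {trained_error:.6f}")
```

The point of the sketch is only the data flow: states go in, labeled rewards supervise the fit, and the trained model can then score placements it has not seen.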

To avoid creating every floorplan manually, the researchers used a combination of human-designed plans and computer-generated data. The paper does not say much about how much human effort went into evaluating the algorithm-generated examples included in the training dataset. Without quality training data, however, supervised learning models end up making poor inferences.

In this sense, the AI system differs from other reinforcement learning programs such as AlphaZero, which developed its game-playing policy without the need for human input. In the future, the researchers may develop an RL agent that can design its own floorplans without the need for supervised learning components. My guess, however, is that given the complexity of the problem, solving it will very likely continue to require a combination of human intuition, machine learning, and high-performance computing.

Reinforcement learning design vs human design

One of the interesting aspects of the work presented by the Google researchers is the layout of the chips. We humans use all sorts of shortcuts to overcome the limits of our brains. We cannot tackle complex problems in one big chunk. But we can design modular, hierarchical systems to divide and conquer complexity. Our ability to think about and design top-down architectures has played a major role in developing systems that can perform very complicated tasks.

Here is an example from software engineering, my own area of expertise. In theory, an entire program could be written as one very large, contiguous stream of commands in a single file. But software developers never write programs that way. We build software out of small pieces (functions, classes, and modules) that interact with each other through well-defined interfaces. We then nest those pieces into larger pieces, gradually creating a hierarchy of components. You do not have to read every line of a program to understand what it does. Modularity lets multiple programmers work on a single program and lets several programs reuse previously built components. Sometimes just looking at the class architecture of a program is enough to point you in the right direction to locate a bug or find the right place to add an upgrade. We often trade away speed in exchange for modularity and better design.

In a way, the same can be seen in computer chip design. Human-designed chips tend to have neat boundaries between different modules. On the other hand, the floorplans designed by Google’s reinforcement learning agent follow the path of least resistance, regardless of what the layout looks like (see figure below).

Right: human-designed chip. Left: AI-designed chip.

I’m curious whether this will become a sustainable design model in the future, or whether it will require some kind of compromise between highly optimized, machine learning-generated designs and the top-down order imposed by human engineers.

AI + human intelligence

As Google’s reinforcement learning-powered chip designer shows, innovation in AI hardware and software will continue to require abstract thinking, finding the right problems to solve, developing intuitions about solutions, and choosing the right kind of data to validate those solutions. These are skills that better AI chips can enhance but not replace.

Ultimately, I don’t think this is a story of AI surpassing humans, AI creating smarter AI, or AI developing recursive self-improvement capabilities. Rather, it is a manifestation of humans finding ways to use AI as a prop to overcome their own cognitive limits and extend their capabilities. If there is a virtuous cycle, it is one of AI and humans finding better ways to work together.

