Date: 2021-02-28



The so-called search space of an accelerator chip for artificial intelligence: the functional blocks that the chip architecture must optimize. A feature of many AI chips is a large number of identical parallel processor elements, here called “PE,” that carry out simple mathematical operations, performing the many vector-matrix multiplications that are the mainstay of neural network processing.

Yazdanbakhsh et al.

A year ago, ZDNet talked to Google Brain director Jeff Dean about how the company is using artificial intelligence to advance its internal development of custom chips that accelerate its software. Dean said that deep learning forms of artificial intelligence can, in some cases, make better decisions than humans about how to lay out the circuits in a chip.

This month, Google unveiled one of these research projects, Apollo, in a paper, “Apollo: Transferable Architecture Exploration,” posted on the arXiv preprint server, and in a companion blog post by lead author Amir Yazdanbakhsh.

Apollo represents an interesting development that goes beyond what Dean hinted at in his formal address at the International Solid-State Circuits Conference a year ago and in his remarks to ZDNet.

In the example Dean gave at the time, machine learning could be used for some low-level design decisions known as “place and route.” In place and route, chip designers use software to determine the layout of the circuits that carry out the chip’s operations, much like designing the floor plan of a building.

In Apollo, by contrast, the program performs what Yazdanbakhsh and colleagues call “architecture exploration” rather than floor planning.

The chip architecture is the design of the chip’s functional elements, how they interact, and the way software programmers access those functional elements.

For example, a traditional Intel x86 processor has a certain amount of on-chip memory, a dedicated arithmetic logic unit, and a number of registers. The way these parts are put together is what defines the so-called Intel architecture.

Asked about Dean’s description, Yazdanbakhsh told ZDNet in an email: “I think our work and the place-and-route project are orthogonal and complementary.

“Architecture exploration sits at a much higher level of the computing stack than place and route,” explained Yazdanbakhsh, citing a presentation by Christopher Batten of Cornell University.

“I believe [architecture exploration] is where there are higher margins for better performance,” said Yazdanbakhsh.

Yazdanbakhsh and colleagues call Apollo the “first transferable architecture exploration infrastructure”: the first program that gets better at exploring possible chip architectures the more it works on different chips, transferring what it learns to each new task.

The chips that Yazdanbakhsh and the team are developing are themselves chips for AI, known as accelerators. This is the same class of chips as the Nvidia A100 “Ampere” GPU, the Cerebras Systems WSE chip, and many other startup parts currently coming to market. Hence there is a nice symmetry in using AI to design chips that run AI.

Given that the task is to design an AI chip, the architectures that the Apollo program explores are architectures suited to running neural networks. That means a lot of linear algebra: many simple mathematical units that perform matrix multiplications and sum the results.
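To make “multiply and sum” concrete, here is a minimal sketch in Python of a matrix-vector product built from nothing but multiply-accumulate steps, the kind of simple operation a processor element repeats. The function names are illustrative assumptions, and the sketch says nothing about how a real accelerator schedules the work across its units.

```python
def multiply_accumulate(acc, weight, activation):
    """One multiply-accumulate step: the basic operation of a simple math unit."""
    return acc + weight * activation

def matvec(matrix, vector):
    """Matrix-vector product written as repeated multiply-accumulate steps."""
    outputs = []
    for row in matrix:
        acc = 0
        for weight, activation in zip(row, vector):
            acc = multiply_accumulate(acc, weight, activation)
        outputs.append(acc)
    return outputs

print(matvec([[1, 2], [3, 4]], [5, 6]))  # [17, 39]
```

In hardware, each row's inner loop could in principle run on its own unit in parallel, which is why such chips replicate the same simple element many times.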

The team defined the challenge as one of finding the right mix of those math blocks for a given AI task. They chose a fairly simple AI task, a convolutional neural network called MobileNet, a resource-efficient network designed by Andrew G. Howard and colleagues at Google in 2017. In addition, they tested workloads using several internally designed networks for tasks such as object detection and semantic segmentation.

In this way, the goal becomes: what are the right parameters for the chip’s architecture so that, for a given neural network task, the chip meets certain criteria such as speed?

The search involved sorting through more than 452 million parameter combinations, covering choices such as how many of the math units, called processor elements, to use and how much parameter memory and activation memory would be optimal for a given model.
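A rough way to see why such a search space grows so quickly is to treat each design choice as a knob with a handful of settings; the number of candidate designs is then the product of the settings per knob. The sketch below, in Python, uses knob names and value ranges that are purely illustrative assumptions and are not the parameters from the Apollo paper.

```python
from itertools import product
from math import prod

# Hypothetical design knobs; names and value ranges are illustrative
# assumptions, not the parameters used in the Apollo paper.
SEARCH_SPACE = {
    "pes_x": [1, 2, 4, 8, 16],              # processor elements per row
    "pes_y": [1, 2, 4, 8, 16],              # processor elements per column
    "param_memory_mb": [1, 2, 4, 8],        # parameter memory
    "activation_memory_mb": [1, 2, 4, 8],   # activation memory
}

def space_size(space):
    """Number of distinct designs: the product of the choices per knob."""
    return prod(len(values) for values in space.values())

def enumerate_designs(space):
    """Yield every design as a dict mapping knob name to chosen value."""
    names = list(space)
    for combo in product(*(space[name] for name in names)):
        yield dict(zip(names, combo))

print(space_size(SEARCH_SPACE))  # 400
```

Four knobs with four or five settings each already give 400 designs; adding more knobs or finer settings multiplies that figure, which is why exhaustive enumeration stops being practical.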

The virtue of Apollo is that it puts a variety of existing optimization techniques head to head to see how they stack up when optimizing the architecture of a new chip design. Here, violin plots show the relative results.

Yazdanbakhsh et al.

Apollo is a framework. That is, it can take a variety of methods developed in the literature for so-called black-box optimization, adapt them to a particular workload, and compare how well each method does at meeting the goal.
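The idea of a comparison framework can be sketched in a few lines of Python: every optimizer gets the same objective, the same search space, and the same evaluation budget, and the best design each one finds is recorded. Everything here is a hypothetical stand-in: the knobs, the made-up cost model, and the two simple optimizers (random search and a small aging-based evolutionary loop) are assumptions for illustration and are not Apollo’s code or API.

```python
import random

# Hypothetical knobs and a toy cost model (assumptions, not from the paper).
SPACE = {"num_pes": [16, 32, 64, 128, 256], "memory_mb": [1, 2, 4, 8, 16]}

def toy_runtime(design):
    """Made-up cost to minimize: more PEs and memory help, but cost overhead."""
    return (1000.0 / design["num_pes"] + 40.0 / design["memory_mb"]
            + 0.05 * design["num_pes"] + 2.0 * design["memory_mb"])

def random_design(space, rng):
    return {knob: rng.choice(values) for knob, values in space.items()}

def random_search(objective, space, budget, rng):
    """Evaluate `budget` random designs; return the best (score, design)."""
    scored = [(objective(d), d) for d in (random_design(space, rng) for _ in range(budget))]
    return min(scored, key=lambda pair: pair[0])

def evolutionary_search(objective, space, budget, rng, population_size=4):
    """Tournament selection plus one-knob mutation; the oldest member is dropped."""
    population = []
    for _ in range(population_size):
        d = random_design(space, rng)
        population.append((objective(d), d))
    best = min(population, key=lambda pair: pair[0])
    for _ in range(budget - population_size):
        contenders = rng.sample(population, 2)
        parent = min(contenders, key=lambda pair: pair[0])[1]
        child = dict(parent)
        knob = rng.choice(list(space))
        child[knob] = rng.choice(space[knob])   # mutate a single knob
        scored = (objective(child), child)
        population.append(scored)
        population.pop(0)                       # age out the oldest design
        if scored[0] < best[0]:
            best = scored
    return best

def compare(optimizers, objective, space, budget, seed=0):
    """Run each optimizer with the same budget and seed; collect the results."""
    return {name: opt(objective, space, budget, random.Random(seed))
            for name, opt in optimizers.items()}

results = compare({"random": random_search, "evolutionary": evolutionary_search},
                  toy_runtime, SPACE, budget=20)
for name, (score, design) in results.items():
    print(name, round(score, 2), design)
```

Holding the budget fixed is the design choice that matters: it makes the comparison about how well each method spends its evaluations rather than about how many it was given.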

In yet another nice symmetry, Yazdanbakhsh and the team employ several optimization techniques that were originally designed to develop neural network architectures. These include the so-called evolutionary approach developed by Google’s Quoc V. Le and colleagues in 2019; model-based reinforcement learning and so-called population-based ensemble approaches developed by Google’s Christof Angermueller and others for the purpose of “designing” DNA sequences; and a Bayesian optimization approach. Apollo therefore contains several levels of pleasing symmetry, bringing together approaches designed for neural network design and biological synthesis to design circuits that may in turn be used for neural network design and biological synthesis.

All of these optimizations are compared, and that is where the Apollo framework excels. Its raison d’être is to run different approaches systematically and report what works best. The Apollo test results detail how the evolutionary and model-based approaches can be superior to random selection and other approaches.

However, Apollo’s most striking finding is that running these optimization techniques can make for a much more efficient process than a brute-force search. The team compared, for example, the population-based ensemble approach with what they call a semi-exhaustive search of the solution set of architectural approaches.

Yazdanbakhsh and colleagues found that the population-based approach can discover solutions that exploit trade-offs in the circuitry, such as compute versus memory, which would typically require domain-specific knowledge. Because the population-based approach is a learned approach, it finds solutions beyond the reach of the semi-exhaustive search:

P3BO [population-based black-box optimization] actually finds a design slightly better than the semi-exhaustive search over a 3K-sample search space. We observe that the design uses a very small memory size (3MB) in favor of more compute units. This leverages the compute-intensive nature of vision workloads, which was not included in the original semi-exhaustive search space. This demonstrates the need for manual search space engineering for semi-exhaustive approaches, whereas learning-based optimization techniques leverage large search spaces, reducing the manual effort.

So Apollo can work out how well different optimization approaches fare in chip design. But it does more than that: it can run what is called transfer learning to show how those optimization approaches can in turn be improved.

By running the optimization strategies to improve a chip design at one design point, such as the maximum chip size in millimeters, the results of those experiments can then be fed as inputs to a subsequent optimization method. The Apollo team found that the various optimization techniques improve their performance on a task such as area-constrained circuit design by leveraging the best results of the initial, or seed, optimization technique.
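The seeding idea can be sketched in Python as follows: a search runs under one area budget, and its best designs become the starting points for a second search under a tighter budget. The area model, the runtime model, the budgets, and the simple local search are all made-up assumptions for illustration; they are not the models or the transfer mechanism used in the Apollo paper.

```python
import random

# Hypothetical knobs (assumptions, not from the paper).
SPACE = {"num_pes": [16, 32, 64, 128, 256], "memory_mb": [1, 2, 4, 8, 16]}

def area_mm2(design):
    """Made-up area model: each PE and each MB of memory costs fixed area."""
    return 0.1 * design["num_pes"] + 1.5 * design["memory_mb"]

def make_objective(max_area_mm2):
    """Toy runtime to minimize; designs over the area budget score infinity."""
    def objective(design):
        if area_mm2(design) > max_area_mm2:
            return float("inf")
        return 1000.0 / design["num_pes"] + 40.0 / design["memory_mb"]
    return objective

def mutate(design, rng):
    """Copy the design and re-draw one randomly chosen knob."""
    child = dict(design)
    knob = rng.choice(list(SPACE))
    child[knob] = rng.choice(SPACE[knob])
    return child

def search(objective, budget, rng, seeds=None, keep=5):
    """Local search that returns the `keep` best (score, design) pairs seen.

    `seeds` are designs carried over from an earlier run; without them the
    search starts from a single random design.
    """
    starts = list(seeds) if seeds else [{k: rng.choice(v) for k, v in SPACE.items()}]
    evaluated = [(objective(d), d) for d in starts]
    for _ in range(budget):
        parent = min(evaluated, key=lambda pair: pair[0])[1]
        child = mutate(parent, rng)
        evaluated.append((objective(child), child))
    return sorted(evaluated, key=lambda pair: pair[0])[:keep]

# Source task: loose area budget. Target task: tighter area budget.
source_top = search(make_objective(40.0), budget=30, rng=random.Random(0))
seeds = [design for _, design in source_top]
target_objective = make_objective(25.0)
transferred = search(target_objective, budget=30, rng=random.Random(1), seeds=seeds)
print(transferred[0])
```

Because the seeds are evaluated before any new design is tried, the seeded run can never end up worse on the target task than the best seed it was handed; whether it ends up better than an unseeded run depends on how similar the two tasks are.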

All of this has to be bracketed by the fact that designing chips for MobileNet, or for any other network or workload, is bounded by how well the design process applies to that particular workload.

In fact, Berkin Akin, one of the authors, who helped develop a version of MobileNet called MobileNet Edge, points out that optimization is the product of both chip and neural network optimization.

“Neural network architectures need to be aware of the target hardware architecture in order to optimize overall system performance and energy efficiency,” Akin wrote in a paper with colleague Suyog Gupta last year.

ZDNet contacted Akin by email and asked, “How valuable is a hardware design if it is separated from the design of the neural net architecture?”

“It’s a great question,” Akin replied by email. “I think it depends on the situation.”

According to Akin, Apollo may be sufficient for certain workloads, but so-called co-optimization between chips and neural networks will yield further benefits down the road.

Here is Akin’s complete reply:

There are certainly use cases where you are designing hardware for a particular suite of fixed neural network models. These models can be part of a highly optimized and representative workload from the hardware’s target application domain, or they can be required by the user of the custom-built accelerator. This work addresses this type of problem, using ML to find the best hardware architecture for a particular workload suite. However, there are certainly cases where you have the flexibility to jointly optimize the hardware design and the neural network architecture. In fact, there is some work in progress on such co-optimization, and we hope that it will lead to even better trade-offs.

That final point means that even as chip design is being affected by new AI workloads, the new process of chip design may have a measurable impact on the design of neural networks, and that dialectic may evolve in interesting ways in the years to come.

