Date: 2021-05-21



Google CEO Sundar Pichai spent just 1 minute and 42 seconds of this week’s Google I/O virtual conference keynote on the company’s latest tensor processing unit, TPU v4, but it may have been the most important and most anticipated news of the event.

With the new release, the company has more than doubled the performance of its TPU hardware over its predecessor, the TPU v3 chip, bringing significant new power and promise to machine learning training speeds on Google Cloud Platform.

Google’s computing infrastructure is how the company drives and sustains these [AI and ML] advances, and tensor processing units are a big part of that, Pichai said in the nearly two-hour keynote on Tuesday, May 18. He said the company was excited to announce its next generation, TPU v4. TPUs are connected together into supercomputers called pods. Each v4 pod contains 4,096 v4 chips, and each pod has 10 times more interconnect bandwidth per chip than other networking technologies, he said.

As a result, a single pod of TPU v4 chips can deliver exaflop-scale floating-point performance. The performance metrics are based on Google’s custom floating-point format, Brain Floating Point Format, or bfloat16.
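For readers unfamiliar with bfloat16, the format keeps the sign bit and all 8 exponent bits of a standard 32-bit float but only 7 of its 23 mantissa bits, so it covers roughly the same numeric range with less precision. The following is a minimal sketch of that idea in plain Python, not Google's implementation: the function names are hypothetical, and the conversion simply truncates the low 16 bits, whereas real hardware and libraries typically round.

```python
import struct


def float_to_bfloat16_bits(x: float) -> int:
    """Return a 16-bit bfloat16 pattern for x (hypothetical helper).

    Assumption: we truncate the float32 encoding instead of rounding,
    which keeps the sketch short but is not what production code does.
    """
    # Encode x as an IEEE 754 float32 and read it back as a 32-bit integer.
    (bits32,) = struct.unpack(">I", struct.pack(">f", x))
    # Keep the upper 16 bits: 1 sign bit, 8 exponent bits, 7 mantissa bits.
    return bits32 >> 16


def bfloat16_bits_to_float(bits16: int) -> float:
    """Expand a 16-bit bfloat16 pattern back into a Python float."""
    # Pad the dropped mantissa bits with zeros to rebuild a float32.
    (value,) = struct.unpack(">f", struct.pack(">I", (bits16 & 0xFFFF) << 16))
    return value


def round_trip(x: float) -> float:
    """Show the value that survives a float -> bfloat16 -> float trip."""
    return bfloat16_bits_to_float(float_to_bfloat16_bits(x))
```

Under these assumptions, `round_trip(3.14159)` returns `3.140625`: the magnitude is preserved, but only about two to three decimal digits of precision remain, which is the trade-off the format makes in exchange for halving memory and bandwidth per value.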

The new TPU v4 infrastructure, which will be available to Google Cloud customers later this year, is the fastest system Google has ever introduced, which Pichai called a historic milestone.

Previously, he said, getting an exaflop of computing power required building a custom supercomputer. Google already has many of these pods deployed today and will soon have dozens of TPU v4 pods in its data centers, many of which will operate on nearly 90% carbon-free energy. It is very exciting to see this pace of innovation, Pichai said.

Google’s previous version, TPU v3, was announced in 2018.

TPUs are Google’s custom-developed application-specific integrated circuits (ASICs) used to accelerate ML workloads. Developers can run ML workloads on Google Cloud TPUs using Google’s TensorFlow open source machine learning software library. TensorFlow was developed by Google and first released in 2015.

Google Cloud TPU is designed to help researchers, developers, and businesses build TensorFlow computing clusters that can use CPUs, GPUs, and TPUs as needed. TensorFlow APIs allow users to run replicated models on Cloud TPU hardware, while TensorFlow applications can access TPU nodes from Google Cloud containers, instances, or services.

Several AI analysts were quick to weigh in on the TPU v4 news and what it means for companies facing ever-increasing demand for ML training.

This can be a big deal for anyone training large AI/ML systems, especially those using Google’s TensorFlow, Jack E. Gold, president and chief analyst of J. Gold Associates, told Enterprise AI. Training a large model can demand more processing power than is readily available, he said. It can take days or weeks to run on the systems currently available in the cloud, which are mostly based on highly parallelized GPUs, and that can be very costly in terms of both cloud fees and power.

What Google has done with the TPU is build a chip that is highly optimized for TensorFlow-based modeling, making training easier, especially for models that need to be updated frequently and that use large datasets.

So what Google is doing with the v4 chip is dramatically increasing the computing power available and significantly reducing the time it takes to train a model, Gold said. It also makes it possible to run much larger models in a reasonable amount of time. Just as important, a model that runs faster uses less total power, which lowers the power consumed per model. That benefits not only the cost of running the cloud data center but also its capacity to handle more users.

Relying on its own TPUs, he said, is also part of Google’s move to keep replacing processors from other vendors with its own. Google wants to stay ahead of other companies, such as AWS and Microsoft, that are building their own accelerators for cloud-based AI services.

Gold also said that Google does a lot of its own AI/ML/DL modeling, so anything the company can do to better serve its internal needs is a big win for it. The chips support not only external customers but also Google’s own requirements, he said.

Charles King, principal analyst at Pund-IT, said Google’s ability to double the performance of the previous v3 chips while achieving exascale performance with a single v4 pod is impressive.

According to King, this is a remarkable achievement that demonstrates the company’s technical insight and willingness to continue to fund chip development. He added that it is also important for corporate customers.

These new chips will absolutely enhance the AI-related workloads and services offered on Google Cloud, King said. If Google can deliver superior performance at a very competitive price, it could diminish the appeal of its competitors’ services.

Holger Mueller, principal analyst at Constellation Research, said the TPU v4 news was one of Google I/O’s most exciting announcements, with Google extending its lead in algorithms on silicon through TPU v4.

With this development, Google continues to build its lead in AI computing over AWS and Microsoft Azure, Mueller said. It is the first architecture to reach an exaflop, which AI needs, he said. With Google’s approach, he added, businesses and governments, including the military, win with faster and cheaper AI.

Another analyst, Karl Freund, founder and chief analyst for machine learning, HPC, and AI at Cambrian AI Research, said early benchmarks look promising for the new TPUs.

According to Freund, TPU v4 looks like a winner, based on early MLPerf benchmarks. With availability and pricing announcements expected later this year, he is waiting for the final benchmarks, which he expects to see this summer. The chip took a lot longer to arrive than previous TPUs, he said, but it might be worth the wait.

