Date: 2021-06-02



Google LLC today introduced a new infrastructure option for its cloud platform that lets enterprises provision instances with direct access to its Tensor Processing Units.

The new instances, called Cloud TPU VMs, are available in preview. Early adopters are using them for a variety of tasks, from AI-powered healthcare analytics to quantum chemistry.

Customers on Google Cloud Platform have long been able to provision instances that use TPUs. However, these instances did not run in the same physical server enclosure as the TPU. Instead, they were connected to it remotely over a network link. This slowed applications down, because they had to send data to the TPU over the network and wait for the results to come back.

The Cloud TPU VM removes the delay. Instances connect directly to Google’s AI chip, avoiding network-related performance degradation and delays.

The search giant also believes that the new instances can help customers reduce the cost of their cloud infrastructure in some cases. In large AI projects that involve large amounts of data, the task of sending data from a cloud instance to the TPU can itself require significant computing resources. As a result, companies may have to buy more expensive instances with faster processors. With Cloud TPU VMs, customers avoid that extra cost because they don’t have to send data over a network link.

Enterprises can provision instances with either a Cloud TPU v2 unit, the second iteration of the chip, or the newer Cloud TPU v3. The main difference is performance. A single Cloud TPU v2 can provide up to 180 teraflops of performance, or 180 trillion computing operations per second. Cloud TPU v3, on the other hand, can manage up to 420 teraflops.

One of the use cases that Google sees for the new instances is developing algorithms that will run on Cloud TPU pods. A Cloud TPU pod is a large cluster of TPU-powered AI servers that enterprises can rent to run particularly complex machine learning models. The fastest cluster on offer exceeds 100 petaflops, or 100 quadrillion operations per second.
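The performance figures quoted above are easier to compare once the unit prefixes are written out. The short Python sketch below does only that arithmetic, using the peak numbers from the article. The constant and function names are illustrative, and the pod figure is treated as a lower bound because the article says the fastest cluster exceeds 100 petaflops.

```python
# SI prefixes behind the article's performance figures.
TERA = 10**12  # one trillion
PETA = 10**15  # one quadrillion

# Peak figures quoted in the article, in operations per second.
TPU_V2_FLOPS = 180 * TERA
TPU_V3_FLOPS = 420 * TERA
POD_FLOPS = 100 * PETA  # "exceeds 100 petaflops", so this is a lower bound


def speedup(new_flops: int, old_flops: int) -> float:
    """Return the ratio between two peak-throughput figures."""
    return new_flops / old_flops


if __name__ == "__main__":
    print(f"v2:  {TPU_V2_FLOPS:.2e} operations/s")
    print(f"v3:  {TPU_V3_FLOPS:.2e} operations/s")
    print(f"pod: {POD_FLOPS:.2e} operations/s")
    print(f"v3 vs v2:  {speedup(TPU_V3_FLOPS, TPU_V2_FLOPS):.2f}x")
    print(f"pod vs v3: {speedup(POD_FLOPS, TPU_V3_FLOPS):.0f}x")
```

These are peak numbers only. Real workloads rarely sustain them, so the ratios are best read as rough upper bounds on the difference between the options.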

Developers can build algorithms on Cloud TPU VMs for a fraction of the cost of renting pods and move their software to more powerful hardware when they are ready for production. Because Cloud TPU VMs and Cloud TPU pods use the same chip, the task of moving workloads is much easier than if you had to migrate your software between different processors.

As an added benefit, the new instances come with root access. This means that developers have full access to the software running within their instance, which simplifies certain coding tasks.

“The Google Cloud TPU VM has allowed us to significantly scale up our research while minimizing implementation complexity,” said James Townsend, a researcher at the UCL Queen Square Institute of Neurology in London. “There is a low-friction path from model implementation and debugging on a single TPU device to multi-device and multi-host (pod-scale) training.”

Google is also using the new instances internally to support its efforts to develop quantum computers. “Our team has built one of the most powerful classical simulators of quantum circuits,” said Shrestha Basu Mallick, product manager of the Sandbox @ Alphabet research team at Google’s parent company Alphabet Inc. “It can evolve a 40-qubit wave function, which involves manipulating about a trillion complex amplitudes. TPU scalability is also key to enabling our team to perform quantum chemistry calculations on macromolecules with up to 500,000 orbitals.”
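The scale of a 40-qubit simulation follows from simple counting: a state of n qubits is described by 2^n complex amplitudes. The Python sketch below works out that count and a rough memory footprint. The article does not say what numeric precision the simulator uses, so the 8 bytes per amplitude (two 32-bit floats) is purely an assumption for illustration, as are the function names.

```python
def num_amplitudes(num_qubits: int) -> int:
    """Number of complex amplitudes in an n-qubit state vector (2**n)."""
    return 2 ** num_qubits


def state_vector_bytes(num_qubits: int, bytes_per_amplitude: int = 8) -> int:
    """Memory needed to hold the full state vector.

    bytes_per_amplitude=8 assumes two 32-bit floats per complex amplitude.
    This is an illustrative assumption, not a detail from the article.
    """
    return num_amplitudes(num_qubits) * bytes_per_amplitude


if __name__ == "__main__":
    n = 40
    amplitudes = num_amplitudes(n)
    tebibytes = state_vector_bytes(n) / 2**40
    print(f"{n} qubits -> {amplitudes:,} amplitudes")
    print(f"state vector at 8 bytes per amplitude: {tebibytes:.0f} TiB")
```

Each extra qubit doubles both numbers, which is why a state this large has to be spread across many devices rather than held on one.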

Each TPU in Google’s cloud consists of multiple matrix units, which are processing cores optimized for the specific types of mathematical operations that AI models use to process data. The third-generation Cloud TPU v3 chips are supported by a water cooling system that absorbs the heat the chips generate while they are running.

AI models represent the data they process in the form of numbers called floating-point values. Google’s TPUs store these numbers in a data format called bfloat16, which the search giant developed to speed up its AI workloads. With bfloat16, the chip can store numbers that typically occupy 32 bits of space in only 16 bits. This reduces the total number of bits that need to be processed, which speeds up calculations.
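To make the 32-bit to 16-bit reduction concrete, the Python sketch below converts a value to bfloat16 by keeping only the top 16 bits of its 32-bit floating-point encoding: the sign bit, all 8 exponent bits, and the first 7 bits of the fraction. It uses only the standard library, and the helper names are illustrative. Plain truncation is a simplification chosen for clarity. Actual hardware may round to the nearest representable value instead, and the article does not describe which method the TPU uses.

```python
import struct


def float32_bits(x: float) -> int:
    """Return the 32-bit IEEE 754 single-precision encoding of x as an integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def to_bfloat16_bits(x: float) -> int:
    """Encode x as bfloat16 by truncation: keep the top 16 bits of float32.

    Truncation is a simplification for illustration. It drops the low
    16 fraction bits without rounding.
    """
    return float32_bits(x) >> 16


def from_bfloat16_bits(bits: int) -> float:
    """Decode a 16-bit bfloat16 pattern by padding the low 16 bits with zeros."""
    return struct.unpack(">f", struct.pack(">I", bits << 16))[0]


if __name__ == "__main__":
    value = 3.141592653589793
    encoded = to_bfloat16_bits(value)
    print(f"float32 bits:  0x{float32_bits(value):08X}")
    print(f"bfloat16 bits: 0x{encoded:04X}")
    print(f"decoded value: {from_bfloat16_bits(encoded)}")
```

Because the exponent field is kept whole, bfloat16 covers roughly the same range of magnitudes as a 32-bit float. What it gives up is precision, which many AI workloads can tolerate.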

