



Google recently announced new Cloud TPU virtual machines (VMs) that provide direct access to the TPU host machines. With these VMs, the company offers a new and improved user experience for developing and deploying TensorFlow, PyTorch, and JAX on Cloud TPUs.

Customers could already set up virtual instances on Google Cloud that use TPU chips. However, this setup had a drawback: the instances did not run in the same server environment as the TPU. The TPU was attached to the instance remotely over a network connection, so the application had to send data to the TPU over the network and wait for the processed data to be returned, which slowed processing down.

The Cloud TPU VM, now in preview, gives customers an instance with the TPU attached directly. This removes the network delay that previously sat between the application and the TPU. Alexander Spiridonov, Product Manager at Google AI, said in a blog post about the new Cloud TPU VMs:

This new Cloud TPU system architecture is simpler and more flexible. In addition to major usability benefits, you may also achieve performance gains because your code no longer needs to make round trips across the data center network to reach the TPUs. You may also see significant cost savings: if you previously needed a fleet of powerful Compute Engine VMs to feed data to remote hosts in a Cloud TPU Pod slice, you can now run that data processing directly on the Cloud TPU hosts and eliminate the need for the additional Compute Engine VMs.

Source: https://cloud.google.com/blog/products/compute/introducing-cloud-tpu-vms
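
The performance argument in the quote can be made concrete with a toy model. The sketch below assumes that one training step costs a fixed compute time plus, in the older remote setup, one network round trip. The function name and both millisecond values are hypothetical and chosen only for illustration; they are not measurements of any Cloud TPU system.

```python
def step_time_ms(compute_ms: float, network_round_trip_ms: float = 0.0) -> float:
    """Time for one step: on-device compute plus any network hop to the TPU."""
    return compute_ms + network_round_trip_ms


# Hypothetical numbers, for illustration only (not measurements).
COMPUTE_MS = 10.0
ROUND_TRIP_MS = 2.0

# Older setup: the user's VM reaches the TPU host over the network.
remote = step_time_ms(COMPUTE_MS, ROUND_TRIP_MS)
# Cloud TPU VM: the code runs on the TPU host itself, so there is no hop.
direct = step_time_ms(COMPUTE_MS)

speedup = remote / direct
print(f"remote: {remote} ms, direct: {direct} ms, speedup: {speedup:.2f}x")
```

In this model, the benefit grows as the round trip becomes large relative to the compute time per step, which is why workloads with many small steps would be expected to gain the most.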

Google offers Cloud TPU VMs in two variants. The first is Cloud TPU v2, which is based on the second-generation TPU chip; the second is the newer Cloud TPU v3, which is based on the third-generation TPU. According to Google Cloud, the difference between the two lies in performance: Cloud TPU v2 delivers up to 180 teraflops, and Cloud TPU v3 delivers up to 420 teraflops.

One use case for Cloud TPU VMs is developing algorithms on the existing Cloud TPU Pods. These are large clusters of TPU-based AI servers and are particularly suited to running highly complex machine learning models. The fastest clusters, for example, provide more than 100 petaflops of performance. Building algorithms this way is significantly cheaper: customers pay only for the pod rental and, when moving to production, for the cost of migrating to more powerful hardware. In addition, Google Cloud plans to use Cloud TPU VMs in its quantum computing program.
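
To put the quoted figures side by side (180 and 420 teraflops per Cloud TPU, and more than 100 petaflops for the fastest clusters), here is a small arithmetic sketch. It assumes that peak throughput adds up linearly across devices, which real workloads rarely achieve, and the helper name `devices_needed` is hypothetical.

```python
import math

# Peak figures as quoted in the article, in teraflops.
PEAK_TFLOPS = {"v2": 180, "v3": 420}
POD_TARGET_TFLOPS = 100 * 1000  # 100 petaflops expressed in teraflops

# How much higher the v3 peak is than the v2 peak.
v3_over_v2 = PEAK_TFLOPS["v3"] / PEAK_TFLOPS["v2"]


def devices_needed(target_tflops: float, per_device_tflops: float) -> int:
    """Smallest device count whose summed peak reaches the target.

    Assumes peak throughput scales linearly with the number of devices.
    """
    return math.ceil(target_tflops / per_device_tflops)


v2_count = devices_needed(POD_TARGET_TFLOPS, PEAK_TFLOPS["v2"])
v3_count = devices_needed(POD_TARGET_TFLOPS, PEAK_TFLOPS["v3"])

print(f"v3 peak is {v3_over_v2:.2f}x v2")
print(f"devices for 100 petaflops: v2={v2_count}, v3={v3_count}")
```

Under that linear assumption, a 100-petaflop cluster corresponds to a few hundred Cloud TPU devices of either generation.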

Hugging Face, an AI community, said in a tweet:

With the power of JAX/Flax and the new Cloud TPU v3-8, a masked LM can now be pre-trained in just 18 hours.

Cloud TPU VMs are currently in preview and available in the us-central1 and europe-west4 regions. Prices start at $1.35 per hour per TPU host machine with Google's preemptible offering. See the pricing page for more information. Finally, customers can quickly start training their ML models on Cloud TPUs and Cloud TPU Pods by using the documentation and the quickstarts for JAX, PyTorch, and TensorFlow.
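
As a rough budgeting aid, the starting price above can be turned into a cost estimate. The sketch below assumes simple hourly billing per TPU host at the quoted $1.35 starting rate; the function name, the 10-hour duration, and the host counts are hypothetical, and the rate that actually applies depends on the TPU type and region listed on the pricing page.

```python
from decimal import Decimal

# Starting price quoted in the article (preemptible, per TPU host, per hour).
STARTING_RATE_USD = Decimal("1.35")


def run_cost_usd(hours, hourly_rate_usd=STARTING_RATE_USD, hosts=1):
    """Estimated cost of a run: hours x hourly rate x number of TPU hosts."""
    return Decimal(str(hours)) * hourly_rate_usd * hosts


# Hypothetical 10-hour run on a single TPU host.
single_host = run_cost_usd(10)
# Hypothetical 2.5-hour run on four TPU hosts.
four_hosts = run_cost_usd(2.5, hosts=4)

print(f"single host, 10 h: ${single_host:.2f}")
print(f"four hosts, 2.5 h: ${four_hosts:.2f}")
```

`Decimal` is used instead of floating point so that currency amounts stay exact. Because preemptible capacity can be reclaimed before a job finishes, an estimate like this is better treated as a lower bound than as a quote.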

