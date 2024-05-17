



On Tuesday, May 14th, Google announced its 6th generation TPU (Tensor Processing Unit) called Trillium.

The chip is essentially a TPU v6 and is the company's latest weapon in the AI battle with GPU maker Nvidia and with cloud providers Microsoft and Amazon, which have their own AI chips.

TPU v6 replaces two types of TPU v5 chips: TPU v5e and TPU v5p. The company said the Trillium chip is its most high-performance, energy-efficient TPU to date.

(Source: Google)

Google said at its I/O conference in Mountain View, California, that the Trillium chip will run an AI model that is the successor to the current Gemini large language model.

Performance

Google has made improvements across the chip. Trillium provides 4.7x more peak compute performance per chip than TPU v5e. It also doubles the high-bandwidth memory capacity and bandwidth, as well as the chip-to-chip interconnect speed.

“We arrived at the 4.7x number by comparing the peak compute performance per chip (BF16) of Trillium TPU and Cloud TPU v5e,” a Google spokesperson said in an email to HPCwire.

BF16 performance on TPU v5e is 197 teraflops, and with a 4.7x improvement, BF16's peak performance on Trillium is 925.9 teraflops.

Significant performance improvements to Google's TPUs have been long overdue. The TPU v5e's 197 teraflops BF16 performance is actually down from the TPU v4's 275 teraflops.
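The arithmetic behind these figures can be checked with a minimal Python sketch. The constants are the numbers quoted above; the function and variable names are illustrative only, and the comparison of Trillium to TPU v4 is a derived ratio, not a figure Google has published.

```python
# BF16 peak teraflops per chip, as quoted in the article.
TPU_V5E_TFLOPS = 197.0
TPU_V4_TFLOPS = 275.0
TRILLIUM_SPEEDUP_VS_V5E = 4.7  # Google's stated per-chip improvement


def scaled_peak(baseline_tflops: float, speedup: float) -> float:
    """Return the peak teraflops implied by a baseline and a speedup factor."""
    return baseline_tflops * speedup


trillium_tflops = scaled_peak(TPU_V5E_TFLOPS, TRILLIUM_SPEEDUP_VS_V5E)
v5e_vs_v4 = TPU_V5E_TFLOPS / TPU_V4_TFLOPS      # below 1.0 means a regression
trillium_vs_v4 = trillium_tflops / TPU_V4_TFLOPS  # derived, not an official figure

print(f"Trillium peak BF16: {trillium_tflops:.1f} teraflops")
print(f"TPU v5e relative to TPU v4: {v5e_vs_v4:.2f}x")
print(f"Trillium relative to TPU v4: {trillium_vs_v4:.2f}x")
```

By these numbers, the TPU v5e delivered roughly 0.72x of the TPU v4's peak BF16 throughput, while Trillium would land at about 3.37x.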

Memory and bandwidth

The Trillium chip has next-generation HBM memory, but it's not clear whether it's HBM3 or HBM3e, which Nvidia uses in its H200 and Blackwell GPUs.

The TPU v5e had 16 GB of HBM2 capacity, so a doubling gives Trillium 32 GB, a capacity that is available with both HBM3 and HBM3e. HBM3e provides the most bandwidth.

A server pod can link up to 256 Trillium chips, and chip-to-chip communication is twice as fast as on TPU v5e. Google hasn't disclosed the chip-to-chip communication speed, but doubling the TPU v5e's 1,600 Gbps would put it at 3,200 Gbps.

Trillium TPUs are also 67% more energy efficient than TPU v5e, Google said in a blog entry.
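As a rough check on the memory, interconnect, and efficiency claims, the following Python sketch applies the stated scaling factors to the TPU v5e baselines. The interconnect result is an estimate, since Google has not disclosed the real figure, and reading "67% more energy efficient" as 1.67x the work per unit of energy is an assumption made here for illustration.

```python
# TPU v5e baselines quoted in the article.
V5E_HBM_GB = 16
V5E_INTERCONNECT_GBPS = 1_600

DOUBLING = 2            # Google's stated 2x improvement over TPU v5e
EFFICIENCY_GAIN = 0.67  # "67% more energy efficient"

trillium_hbm_gb = V5E_HBM_GB * DOUBLING
# Estimate only: Google has not published the actual interconnect speed.
trillium_interconnect_gbps = V5E_INTERCONNECT_GBPS * DOUBLING

# Assumption: 67% more efficient means 1.67x the work per unit of energy,
# so the same workload needs 1 / 1.67 of the energy.
relative_energy = 1 / (1 + EFFICIENCY_GAIN)
energy_saving_pct = (1 - relative_energy) * 100

print(f"Estimated Trillium HBM capacity: {trillium_hbm_gb} GB")
print(f"Estimated Trillium interconnect: {trillium_interconnect_gbps} Gbps")
print(f"Energy saved for the same work: {energy_saving_pct:.1f}%")
```

Under that reading, a 67% efficiency gain corresponds to roughly 40% less energy for the same workload, not 67% less.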

Shorter chip release cycle

Trillium will replace the TPU brand name and become the brand behind future generations of chips. Trillium is based on the name of a flower and should not be confused with AWS's Trainium, which is an AI training chip.

Google wasted no time in releasing its 6th generation TPU. It's been less than a year since the company released its TPU v5 chips.

TPU v4, introduced in 2020, was left alone for three years before the release of TPU v5. The development of TPU v5 itself was mired in controversy.

Google claimed that its AI agent helped floorplan its TPU v5 chip about six hours faster than human experts.

Researchers involved in the TPU v5 AI design project have been fired or retired, and the claims are currently being investigated by Nature magazine. (https://www.hpcwire.com/2023/10/03/googles-controversial-ai-chip-paper-under-scrutiny-again/)

System

A server pod hosts 256 Trillium chips, and the chips communicate 2x faster than in a comparable TPU v5 pod setup.

Pods can be combined into larger clusters, and communication is done over an optical network. Communication between pods is also 2x faster, providing the scalability needed for large AI models.

“Trillium TPUs can scale to hundreds of pods, connecting tens of thousands of chips in building-scale supercomputers interconnected in multi-petabits per second data center networks,” Google said.
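To put the pod and cluster sizes in perspective, here is a back-of-the-envelope Python sketch. It multiplies the 256 chips per pod by the 925.9 teraflops per-chip BF16 peak derived earlier in the article. The pod count of 100 is a hypothetical stand-in for "hundreds of pods", and the totals are theoretical peaks that ignore communication overhead and real-world utilization.

```python
CHIPS_PER_POD = 256                 # from the article
TRILLIUM_PEAK_TFLOPS = 197.0 * 4.7  # 925.9, derived from the article's figures
EXAMPLE_POD_COUNT = 100             # hypothetical; Google only says "hundreds of pods"


def cluster_summary(pods: int) -> tuple[int, float]:
    """Return (total chips, theoretical peak BF16 petaflops) for a number of pods."""
    chips = pods * CHIPS_PER_POD
    peak_pflops = chips * TRILLIUM_PEAK_TFLOPS / 1_000  # teraflops -> petaflops
    return chips, peak_pflops


pod_chips, pod_pflops = cluster_summary(1)
cluster_chips, cluster_pflops = cluster_summary(EXAMPLE_POD_COUNT)

print(f"One pod: {pod_chips} chips, about {pod_pflops:.1f} petaflops peak")
print(f"{EXAMPLE_POD_COUNT} pods: {cluster_chips:,} chips, "
      f"about {cluster_pflops:,.0f} petaflops peak")
```

With these assumptions, a single pod peaks at roughly 237 petaflops, and 100 pods reach 25,600 chips, which is consistent with the "tens of thousands of chips" in Google's statement.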

A technology called Multislice spreads large AI workloads across thousands of TPUs in large clusters. This ensures high uptime and power efficiency of the TPU.
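The general idea of spreading one workload over many chips can be illustrated with a generic data-parallel split. The sketch below is not Google's Multislice implementation or API; the function `partition_batch` and its parameters are hypothetical names used only to show how a global batch might be divided across slices and then across the chips in each slice.

```python
def partition_batch(global_batch: int, num_slices: int,
                    chips_per_slice: int) -> list[list[int]]:
    """Split a global batch as evenly as possible across slices, then chips.

    Returns one list per slice, holding the number of examples each chip handles.
    """
    total_chips = num_slices * chips_per_slice
    base, remainder = divmod(global_batch, total_chips)
    # The first `remainder` chips each take one extra example.
    per_chip = [base + (1 if i < remainder else 0) for i in range(total_chips)]
    return [
        per_chip[s * chips_per_slice:(s + 1) * chips_per_slice]
        for s in range(num_slices)
    ]


# Example: 4 slices of 256 chips each sharing a batch of 4,096 examples.
layout = partition_batch(global_batch=4096, num_slices=4, chips_per_slice=256)
print(f"Slices: {len(layout)}, examples per chip in slice 0: {layout[0][0]}")
```

A real system also has to handle gradient synchronization between slices and recovery from chip failures, which this sketch leaves out.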

Chip

The chip features a third-generation SparseCore, which is an intermediate chip near high-bandwidth memory where most of the AI crunching happens.

SparseCore provides near-in-memory processing of data and supports new computing architectures being explored by AMD, Intel, and Qualcomm.

Typically, data must be moved from memory to a processing unit, which consumes bandwidth and creates chokepoints. Sparse computing models attempt to free up network bandwidth by moving processing units closer to memory clusters.

“Trillium TPUs enable us to train the next wave of foundational models faster, reduce latency, and deliver those models at lower cost,” Google said.

Trillium also has TensorCore for matrix calculations. Trillium chips are designed for AI and cannot run scientific applications.

The company recently announced Axion, its first CPU, which will be paired with Trillium.

Hypercomputer

The Trillium chip will be part of Hypercomputer, Google's homegrown AI supercomputer design optimized for TPUs.

This design integrates compute, networking, storage, and software to support different AI consumption and scheduling models. A “calendar” system adheres to strict deadlines for when tasks should start, while a “flex start” model guarantees when tasks will finish and produce results.

Hypercomputer includes software stacks and other tools to develop, optimize, deploy, and tune AI models for inference and training. These include JAX, PyTorch/XLA, and Kubernetes.

Hypercomputer will continue to work with GPU-optimized interconnect technologies, including Titanium offload systems and technologies based on the Nvidia H100 GPU.

Availability

Trillium chips are expected to be available on Google Cloud, but Google has not disclosed an availability date. Trillium will be the top-of-the-line product and will be more expensive than the TPU v5 products.

The high price of GPUs in the cloud could make Trillium attractive to customers. Customers already using AI models available on Vertex, Google Cloud's AI platform, can also switch to Trillium.

AWS' Trainium chip is also available, but Microsoft's Azure Maia chip is primarily for inference.

Possible mitigation from GPU squeeze

Google has previously positioned its TPUs as an AI alternative to Nvidia's GPUs. Google published a research paper comparing the TPU's performance to that of an equivalent Nvidia GPU.

Google recently announced that it will host Nvidia's new B200 GPUs and specialized DGX boxes with Blackwell GPUs.

Nvidia recently announced it would acquire Run.ai in a deal worth $700 million. The acquisition of Run.ai will allow Nvidia to keep its software stack independent from Google's stack when running AI models.

The TPU was originally designed for Google's homegrown models, but the company is trying to improve its mapping to open-source models, including Gemma, a derivative of Gemini.

Sources 1/ https://Google.com/ 2/ https://www.hpcwire.com/2024/05/17/google-announces-sixth-generation-ai-chip-a-tpu-called-trillium/

