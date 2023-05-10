



Implementing state-of-the-art artificial intelligence (AI) and machine learning (ML) models requires large amounts of computation, both to train the underlying models and to serve them once trained. Given the demands of these workloads, a one-size-fits-all approach is not enough: you need infrastructure that is purpose-built for AI.

Together with our partners, we offer a wide range of compute options for ML use cases such as large language models (LLMs), generative AI, and diffusion models. We recently announced G2 VMs, making Google Cloud the first cloud to offer the new NVIDIA L4 Tensor Core GPUs for running generative AI workloads. Today, we are expanding that portfolio with the private preview of our next-generation A3 GPU supercomputer. Google Cloud now offers a complete range of GPU options for both training and inference of ML models.

The Google Compute Engine A3 supercomputer is built to train and serve the most demanding AI models driving innovation in generative AI and language models at scale today. Our A3 VMs combine NVIDIA H100 Tensor Core GPUs with Google’s cutting-edge networking advancements to serve customers of all sizes.

A3 is the first GPU instance to use our custom-designed 200 Gbps IPUs, with GPU-to-GPU data transfers bypassing the CPU host and flowing over separate interfaces from other VM network and data traffic. This enables up to 10x more network bandwidth than our A2 VMs, with lower tail latencies and higher bandwidth stability.
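Why does inter-server bandwidth matter so much for training? In data-parallel training, GPUs on different servers repeatedly exchange gradients through collective operations such as all-reduce, which NCCL implements with algorithms that include ring all-reduce. The minimal sketch below uses the textbook bandwidth cost of a ring all-reduce, in which each of N participants sends and receives 2(N - 1)/N × S bytes for an S-byte buffer. It ignores latency, topology, and overlap with compute, and the gradient size, node count, and link speeds in the example are hypothetical values chosen for illustration, not A2 or A3 specifications.

```python
def ring_allreduce_seconds(buffer_bytes: float, num_nodes: int, link_gbps: float) -> float:
    """Bandwidth-only time estimate for a ring all-reduce (latency ignored).

    In a ring all-reduce over N participants, each one sends and receives
    2 * (N - 1) / N * S bytes for an S-byte buffer, so the transfer time is
    that volume divided by the per-link bandwidth.
    """
    if num_nodes < 2:
        return 0.0  # nothing to exchange with a single participant
    bytes_per_second = link_gbps * 1e9 / 8  # convert Gbit/s to bytes/s
    volume = 2 * (num_nodes - 1) / num_nodes * buffer_bytes
    return volume / bytes_per_second


if __name__ == "__main__":
    grads = 10e9  # hypothetical 10 GB of gradients exchanged per step
    nodes = 16    # hypothetical number of servers in the ring
    for gbps in (100, 1000):  # hypothetical baseline link vs. a 10x faster one
        t = ring_allreduce_seconds(grads, nodes, gbps)
        print(f"{gbps:>5} Gbps: {t:.2f} s per all-reduce")
```

With these made-up inputs, the bandwidth-bound term drops from 1.5 s to 0.15 s per all-reduce. Real speedups are smaller whenever latency or compute dominates, which is why the text above also emphasizes tail latency and bandwidth stability.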

Our industry-unique intelligent Jupiter data center networking fabric scales to tens of thousands of highly interconnected GPUs and allows for full-bandwidth, reconfigurable optical links that can adjust the topology on demand. For almost every workload structure, it delivers workload bandwidth that is indistinguishable from more expensive commercial non-blocking network fabrics, at a lower TCO.

At scale, the A3 supercomputer provides up to 26 exaflops of AI performance, which considerably reduces the time and cost of training large ML models.
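To put an exaflop-scale figure in perspective, a common rule of thumb estimates the compute needed to train a dense transformer as roughly 6 × N × D floating-point operations, for N parameters and D training tokens. The back-of-envelope sketch below divides that total by a sustained throughput, taken as peak FLOP/s times an assumed utilization. Real training runs sustain only a fraction of peak, and the model size, token count, and utilization in the example are hypothetical, so treat the result as an order-of-magnitude illustration rather than a measured A3 result.

```python
SECONDS_PER_DAY = 86_400


def training_days(params: float, tokens: float, peak_flops: float, utilization: float) -> float:
    """Rough wall-clock training time in days.

    Uses the ~6 * N * D rule of thumb for total training FLOPs and assumes
    the cluster sustains `utilization` (a fraction in (0, 1]) of its peak
    FLOP/s for the whole run.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    total_flops = 6 * params * tokens
    sustained_flops = peak_flops * utilization
    return total_flops / sustained_flops / SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical inputs: 70B parameters, 1.4T tokens, 40% of a 26 EFLOP/s peak.
    days = training_days(params=70e9, tokens=1.4e12, peak_flops=26e18, utilization=0.4)
    print(f"~{days:.2f} days")
```

Because time scales inversely with sustained throughput, doubling either the peak or the achieved utilization halves the estimate; the formula makes no claim about which precision or sparsity mode a given peak figure assumes.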

As companies transition from training to serving their ML models, A3 VMs are also a strong fit for inference workloads, delivering up to a 30x inference performance boost compared with our A2 VMs, which are powered by NVIDIA A100 Tensor Core GPUs.*

Designed for performance and scalability

A3 GPU VMs are purpose-built to deliver the highest-performance training for today’s ML workloads, complete with the latest CPUs, improved host memory, next-generation NVIDIA GPUs, and major network upgrades. Key features of A3 include:

8 H100 GPUs using NVIDIA’s Hopper architecture, delivering 3x compute throughput

3.6 TB/s bisection bandwidth between A3’s 8 GPUs via NVIDIA NVSwitch and NVLink 4.0

Next-generation 4th Gen Intel Xeon Scalable processors

2 TB of host memory with 4800 MHz DDR5 DIMMs

10x greater networking bandwidth, powered by our hardware-enabled IPUs, specialized inter-server GPU communication stack, and NCCL optimizations
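The 3.6 TB/s bisection figure follows from simple arithmetic if we assume NVIDIA’s published NVLink 4.0 bandwidth of 900 GB/s per H100 GPU (a number taken from NVIDIA’s H100 specifications, not from this post) and a non-blocking NVSwitch fabric. Split the eight GPUs into two groups of four: all traffic crossing that cut is bounded by the four GPUs on one side, so bisection bandwidth is (N / 2) × per-GPU bandwidth = 4 × 900 GB/s = 3.6 TB/s. A minimal sketch of that calculation:

```python
def bisection_bandwidth_tb_s(num_gpus: int, per_gpu_gb_s: float) -> float:
    """Bisection bandwidth of a non-blocking switched fabric, in TB/s.

    Cutting num_gpus into two equal halves, the traffic that can cross the
    cut is bounded by the GPUs on one side: (num_gpus / 2) * per-GPU bandwidth.
    """
    if num_gpus < 2 or num_gpus % 2:
        raise ValueError("need an even number of GPUs (at least 2)")
    return (num_gpus // 2) * per_gpu_gb_s / 1000  # GB/s -> TB/s


if __name__ == "__main__":
    # Assumption: 900 GB/s of NVLink bandwidth per H100 GPU (NVIDIA spec sheet).
    print(bisection_bandwidth_tb_s(8, 900.0))  # prints 3.6
```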

A3 GPU VMs are a step forward for customers developing the most advanced ML models. By considerably speeding up the training and inference of ML models, A3 VMs enable businesses to train more complex ML models faster, creating an opportunity for customers to build large language models (LLMs), generative AI, and diffusion models that help optimize their operations and stay ahead of the competition.

This announcement builds on our partnership with NVIDIA to provide customers with a full range of GPU options for ML model training and inference.

“Google Cloud’s A3 VMs, powered by next-generation NVIDIA H100 GPUs, will accelerate training and serving of generative AI applications,” said Ian Buck, vice president of hyperscale and high performance computing at NVIDIA. “Following Google Cloud’s recent launch of G2 instances, we’re proud to continue our work with Google Cloud to help transform enterprises around the world with AI infrastructure.”

Fully managed AI infrastructure optimized for performance and cost

Customers looking to develop complex ML models without the hands-on maintenance can deploy A3 VMs on Vertex AI, an end-to-end platform for building ML models on fully managed infrastructure that is purpose-built for low-latency serving and high-performance training. Today, at Google I/O 2023, we’re pleased to build on these offerings by opening up generative AI support in Vertex AI to more customers, and by introducing new features and foundation models.

Customers looking to architect their own custom software stack can also deploy A3 VMs on Google Kubernetes Engine (GKE) and Compute Engine, with support for autoscaling, workload orchestration, and automatic upgrades.
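For the self-managed path on Compute Engine, a VM is created by sending an `instances.insert` request to the Compute Engine REST API (`POST https://compute.googleapis.com/compute/v1/projects/PROJECT/zones/ZONE/instances`). The minimal sketch below only builds the JSON request body with the standard library, so it runs without credentials. Several details are assumptions: the machine type name `a3-highgpu-8g` should be checked against the current machine-type list, the instance name, zone, and image path in the example are hypothetical placeholders, and the body configures a single default network interface, so it does not reproduce the full networking setup A3 may need to reach the bandwidth described above; consult the Compute Engine documentation for that.

```python
import json


def a3_instance_body(name: str, zone: str, source_image: str, boot_disk_gb: int = 200) -> dict:
    """Build a minimal instances.insert request body for an A3 VM (sketch only).

    Assumption: 'a3-highgpu-8g' is the A3 machine type name; verify it before use.
    """
    return {
        "name": name,
        # Partial resource URLs like this are accepted by the Compute Engine REST API.
        "machineType": f"zones/{zone}/machineTypes/a3-highgpu-8g",
        "disks": [{
            "boot": True,
            "autoDelete": True,
            "initializeParams": {
                "sourceImage": source_image,
                "diskSizeGb": str(boot_disk_gb),  # int64 fields are encoded as strings in JSON
            },
        }],
        # Single default NIC only. Assumption: A3 uses gVNIC, so the image must support it.
        "networkInterfaces": [{"network": "global/networks/default", "nicType": "GVNIC"}],
        # VMs with attached GPUs cannot live-migrate, so host maintenance must stop the VM.
        "scheduling": {"onHostMaintenance": "TERMINATE", "automaticRestart": True},
    }


if __name__ == "__main__":
    # Hypothetical instance name, zone, and custom image path.
    body = a3_instance_body("a3-demo", "us-central1-a",
                            "projects/my-project/global/images/my-gpu-image")
    print(json.dumps(body, indent=2))
```

Keeping the body as a plain dictionary makes it easy to inspect or test before sending it with whichever authenticated HTTP client or Google Cloud client library you already use.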

“Google Cloud’s A3 VMs provide the compute power and scale for the most demanding training and inference workloads. We look forward to leveraging Google Cloud’s AI expertise and leadership in large-scale infrastructure to deliver a powerful platform for ML workloads.” – Noam Shazeer, CEO, Character.AI

At Google Cloud, AI is in our DNA. We’ve applied decades of experience running global-scale computing for AI. We designed this infrastructure to scale and to be optimized for running a wide variety of AI workloads, and it’s available today. To join the A3 preview waitlist, register at this link.

*Data source: https://www.nvidia.com/en-us/data-center/h100/

