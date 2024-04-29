



Google Cloud is rapidly expanding its portfolio of solutions for high performance computing, highlighted by a series of announcements at its latest event, Google Cloud Next 24. Conference attendees were introduced to new products and to updates across a suite of highly customizable products designed specifically for the enterprise HPC community.

H3 VM

The update includes several new benchmark results, including those for the Google Cloud H3 virtual machine series. H3 VMs support HPC workloads for applications such as climate modeling, scientific computing, and engineering simulation. Built on Intel's 4th generation Xeon platform, H3 VMs consolidate computing, networking, and storage into one HPC-optimized platform.

Recent industry-standard benchmark results show that, compared to the company's previous-generation C2 instances, H3 VMs deliver up to 3x better performance per node, better scalability for multi-node workloads, and up to 2x better price-performance.
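To make the two ratios concrete, here is a minimal sketch of how a per-node speedup and a price-performance gain can be computed for the same job. The runtimes and hourly prices in the usage section are hypothetical placeholders chosen for illustration; they are not published benchmark figures or Google Cloud prices, and the function names are assumptions of this sketch.

```python
def speedup(baseline_runtime_s: float, new_runtime_s: float) -> float:
    """Return how many times faster the new node finishes the same job."""
    if baseline_runtime_s <= 0 or new_runtime_s <= 0:
        raise ValueError("runtimes must be positive")
    return baseline_runtime_s / new_runtime_s


def job_cost(runtime_s: float, price_per_hour: float) -> float:
    """Return the cost of one job given its runtime and an hourly node price."""
    return runtime_s * price_per_hour / 3600.0


def price_performance_gain(
    baseline_runtime_s: float,
    baseline_price_per_hour: float,
    new_runtime_s: float,
    new_price_per_hour: float,
) -> float:
    """Return the ratio of baseline job cost to new job cost.

    A value of 2.0 means the same job costs half as much on the new node.
    """
    baseline_cost = job_cost(baseline_runtime_s, baseline_price_per_hour)
    new_cost = job_cost(new_runtime_s, new_price_per_hour)
    return baseline_cost / new_cost


if __name__ == "__main__":
    # Hypothetical inputs: the new node is faster but costs more per hour.
    print(speedup(3600.0, 1200.0))
    print(price_performance_gain(3600.0, 2.0, 1200.0, 3.0))
```

The point of separating the two functions is that a node can be several times faster without being proportionally cheaper per job, because its hourly price may also be higher.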

A3 VM shines with MLPerf Inference v4.0

Turning to AI performance, Google Cloud's A3 VMs showed impressive results in the latest MLPerf Inference v4.0 benchmark tests. A3 VMs are designed for training advanced AI models such as LLMs and combine NVIDIA H100 GPUs with Google's cutting-edge networking technology.

Google submitted 20 results across seven models for MLPerf, including Stable Diffusion XL and Llama 2 (70B), running on A3 VMs. All results were within 0-5% of the peak performance demonstrated by NVIDIA's submissions.
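As a rough illustration of what "within 0-5% of peak" means, the sketch below computes the gap between a submitted score and a reference peak score as a percentage of the peak. The function names and the sample scores in the comments are hypothetical; they are not actual MLPerf results.

```python
def gap_from_peak_pct(submitted: float, peak: float) -> float:
    """Return how far `submitted` falls below `peak`, as a percentage of peak."""
    if peak <= 0:
        raise ValueError("peak must be positive")
    return (peak - submitted) / peak * 100.0


def within_band(submitted: float, peak: float, max_gap_pct: float = 5.0) -> bool:
    """Return True if the submitted score is between 0% and max_gap_pct below peak."""
    gap = gap_from_peak_pct(submitted, peak)
    return 0.0 <= gap <= max_gap_pct


if __name__ == "__main__":
    # Hypothetical throughput scores (for example, queries per second).
    print(gap_from_peak_pct(970.0, 1000.0))
    print(within_band(970.0, 1000.0))
```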

Parallelstore

Parallelstore, part of the Google Cloud HPC Toolkit, is a storage system based on Intel's open source DAOS project. Parallelstore optimizes resources for data-intensive AI/ML workloads by eliminating redundant data storage, reducing costs and GPU idle time.

The service is currently in private preview, but Intel's latest benchmarks show there's reason to expect a broader release. In a distributed Google Cloud environment, the results showed 96 GiB/s reads and 60 GiB/s writes, with I/O latency as low as 0.28 ms for random reads and 0.36 ms for random writes.
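To put the throughput figures in perspective, the following back-of-the-envelope sketch estimates how long it would take to move a dataset at a sustained rate. It assumes the quoted 96 GiB/s and 60 GiB/s are sustained for the whole transfer, which real workloads may not achieve; the 10 TiB dataset size and the function name are hypothetical.

```python
GIB_PER_TIB = 1024

# Throughput figures quoted in the benchmark results above.
READ_GIB_PER_S = 96.0
WRITE_GIB_PER_S = 60.0


def transfer_time_s(size_gib: float, throughput_gib_per_s: float) -> float:
    """Return the ideal transfer time in seconds at a sustained throughput."""
    if throughput_gib_per_s <= 0:
        raise ValueError("throughput must be positive")
    return size_gib / throughput_gib_per_s


if __name__ == "__main__":
    dataset_gib = 10 * GIB_PER_TIB  # hypothetical 10 TiB training dataset
    print(f"read:  {transfer_time_s(dataset_gib, READ_GIB_PER_S):.1f} s")
    print(f"write: {transfer_time_s(dataset_gib, WRITE_GIB_PER_S):.1f} s")
```

Estimates like this are a lower bound only: they ignore latency, metadata operations, and contention from other clients.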

Cloud HPC Toolkit additions: Blueprints for ML and CAE

Cloud HPC Toolkit has two interesting new blueprints. The first is a blueprint for ML workloads (including LLM training) that lets users spin up systems running on A3 VMs with NVIDIA H100 Tensor Core GPUs, which otherwise require careful management of infrastructure and network configuration. The Cloud HPC Toolkit ML Blueprint enables this through components including the open source scheduler Slurm, a fully managed Filestore, preconfigured user environments, and more.

The second new solution is a computer-aided engineering blueprint. CAE workloads are compute-intensive applications such as structural, fluid mechanics, thermal, and electromagnetic analysis. The CAE reference architecture blueprint leverages the H3 and C3 VM families to deliver robust performance for leading CAE software such as Ansys Fluent and Siemens Simcenter STAR-CCM+, supporting memory-intensive workloads and ensuring efficient handling of complex resource management.

Customer Success Story: Stanford University

Stanford's Doerr School of Sustainability leverages Google Cloud's HPC Toolkit to meet the growing demands of researchers. The Toolkit's flexible deployment options allow Stanford University to seamlessly integrate cloud computing and on-premises resources and provide a consistent and familiar user interface through Chrome Remote Desktop. This approach allows researchers to remotely access interactive nodes while maintaining an experience similar to using an on-premises cluster.

As a testament to the unparalleled customization provided by the HPC Toolkit, the school has developed a unique module to safely and efficiently use Vertex AI instances for code development.

Robert Clapp, senior research engineer at Stanford University, explains how the HPC Toolkit enables fast, secure, and consistent HPC deployments at scale. The Toolkit lets the school spin up clusters with different partitions depending on its needs, so it can use the latest hardware, such as NVIDIA GPUs, when needed and rely on Google Cloud's workload-optimized VMs to meet price-performance goals. Dynamic cluster sizing, the ability to use Spot VMs when appropriate within cluster partitions, and the ability for researchers to get up and running quickly in a familiar environment are all enabled by the Toolkit.

See Google Cloud at ISC 2024

The rapid pace of innovation means this is an exciting time for HPC customers. Hot on the heels of Google Cloud Next 24 just a few weeks ago comes another big event in Hamburg, Germany: ISC High Performance 2024. The May 12-16 conference and exhibition will focus on the latest advances in HPC, machine learning, data analytics, and quantum computing. The Google Cloud team will be on hand to connect with the HPC community and demonstrate its continuing innovations in HPC. For more information, visit booth D19.

