Optimize traffic for generative AI apps with networking capabilities

Many companies are exploring how to incorporate the benefits of generative AI (gen AI) into their business. According to the 2023 Gartner report, “We Shape AI, AI Shapes Us: 2023 IT Symposium/Xpo Keynote Insights” (October 16, 2023), most organizations use or plan to use AI on a daily basis to improve productivity. In the 2024 Gartner CIO and Technology Executive Survey, 80% of respondents said they plan to adopt generative AI within three years.1

Enterprises looking to deploy large language models (LLMs) face a different set of network challenges than they do when serving traditional web applications, because generative AI applications behave very differently from most other web applications.

For example, web applications typically exhibit predictable traffic patterns, with requests and responses processed quickly, usually within milliseconds. In contrast, because of their multimodal nature, gen AI inference applications can have widely varying request and response times, which makes their load harder to predict and balance. In addition, a single LLM query often consumes 100% of a GPU's or TPU's compute time, whereas typical web requests are processed in parallel. Because of this computational cost, inference latencies can range from seconds to minutes.

[Figure: typical web traffic compared with gen AI traffic]

As a result, traditional round-robin or utilization-based traffic management techniques are generally not suitable for gen AI applications. To deliver the best end-user experience for gen AI applications and efficiently use limited and costly GPU and TPU resources, we recently announced several new networking capabilities to optimize traffic for AI applications.
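To make the point about round-robin concrete, here is a minimal toy simulation, not a description of any Google Cloud load balancer. It assumes each backend serves one request at a time (mirroring a query that occupies a whole accelerator) and compares round-robin with a queue-depth-aware policy chosen purely for illustration. The workload, the function names, and the two-backend setup are all hypothetical.

```python
from statistics import mean

# Hypothetical workload: (arrival_time_s, service_time_s) pairs.
# Two long "inference-style" requests are mixed in with short ones.
REQUESTS = [(0.0, 30.0), (1.0, 1.0), (2.0, 1.0), (3.0, 1.0),
            (4.0, 30.0), (5.0, 1.0), (6.0, 1.0), (7.0, 1.0)]


def simulate(requests, n_backends, pick):
    """Return per-request latency when each backend serves one request at a time."""
    free_at = [0.0] * n_backends                 # when each backend next becomes idle
    finishes = [[] for _ in range(n_backends)]   # finish times of requests sent to each backend
    latencies = []
    for i, (arrival, service) in enumerate(requests):
        # Queue depth = requests already assigned to a backend and not yet finished.
        depths = [sum(1 for f in done if f > arrival) for done in finishes]
        b = pick(i, depths)
        start = max(arrival, free_at[b])         # wait if the backend is still busy
        finish = start + service
        free_at[b] = finish
        finishes[b].append(finish)
        latencies.append(finish - arrival)
    return latencies


def round_robin(i, depths):
    """Ignore load entirely and rotate through the backends."""
    return i % len(depths)


def least_queue_depth(i, depths):
    """Send the request to the backend with the fewest outstanding requests."""
    return depths.index(min(depths))


if __name__ == "__main__":
    for policy in (round_robin, least_queue_depth):
        latencies = simulate(REQUESTS, n_backends=2, pick=policy)
        print(f"{policy.__name__}: mean latency {mean(latencies):.3f} s")
```

In this toy workload, round-robin keeps sending short requests to the backend that is stuck behind a long one, so the queue-depth-aware policy ends up with a lower mean latency. Real inference traffic is far less tidy, so treat the numbers as illustrative only.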

Many of these innovations are built into Vertex AI and are now also available in Cloud Networking, so you can use them regardless of the LLM platform you choose.

Let's take a closer look.

1. Accelerating AI training and inference with Cross-Cloud Network

According to an IDC report, 66% of enterprises cite generative AI and AI/ML workloads as one of the primary use cases for multi-cloud networking.2 This is because the data required for model training and fine-tuning, retrieval-augmented generation (RAG), or grounding resides in many different environments. That data must be accessed remotely or copied so that the LLM can use it.

Last year, Google introduced the Cross-Cloud Network, which provides service-centric any-to-any connectivity built on Google's global network, making it easier to build and assemble distributed applications across clouds.

Cross-Cloud Network includes products that provide reliable, secure, SLA-backed cross-cloud connectivity for high-speed data transfer between clouds, helping move the vast amounts of data required to train AI models. One of these products is Cross-Cloud Interconnect, which provides managed interconnects with 10 Gbps or 100 Gbps of bandwidth, a 99.99% SLA, and end-to-end encryption.
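For a rough sense of what the two bandwidth options mean for moving training data, the arithmetic can be sketched as follows. This is an idealized estimate that assumes the link is fully saturated and ignores protocol overhead, encryption cost, and storage throughput. The 100 TB dataset size and the function name are hypothetical, chosen only for illustration.

```python
def transfer_hours(dataset_terabytes: float, link_gbps: float) -> float:
    """Idealized hours to move a dataset over a fully used link.

    Uses decimal units: 1 TB = 1e12 bytes and 1 Gbps = 1e9 bits per second.
    """
    bits = dataset_terabytes * 1e12 * 8
    seconds = bits / (link_gbps * 1e9)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical 100 TB training dataset at the two link speeds.
    for gbps in (10, 100):
        print(f"{gbps} Gbps: about {transfer_hours(100, gbps):.1f} hours")
```

Under these assumptions, the hypothetical 100 TB dataset takes roughly 22 hours at 10 Gbps and a little over 2 hours at 100 Gbps. Actual transfer times will be longer.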

In addition to providing secure and reliable data transfer for AI training, Cross-Cloud Network also lets you run AI model inference applications across hybrid environments. For example, application services running in another cloud environment can access models running in Google Cloud.




