



Google has designed its own new processor, the Argos video (trans)coding unit (VCU), which has one sole purpose: processing video. The highly efficient new chip has allowed the tech giant to replace tens of millions of Intel CPUs with its own silicon.

For many years, Intel’s video decoding / encoding engine built into the CPU has dominated the market because of its state-of-the-art performance, features, and ease of use. However, custom-built application-specific integrated circuits (ASICs) tend to outperform general-purpose hardware because they are designed specifically for a single workload. As a result, Google turned to developing its own dedicated hardware for YouTube’s video processing tasks, which was a huge success.

However, Intel may have a trick up its sleeve: its latest technology could win back Google’s specialized video processing business.

Massive video requires new hardware

Users upload over 500 hours of video content per minute to YouTube in a variety of formats. Google has to transcode that content quickly into multiple resolutions (144p, 240p, 360p, 480p, 720p, 1080p, 1440p, 2160p, 4320p, etc.) and data-efficient formats (for example, H.264, VP9, or AV1), which requires formidable encoding horsepower.
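To get a feel for the scale, the figures above can be turned into a rough back-of-the-envelope estimate. The sketch below assumes, purely for illustration, that every upload is transcoded into every listed resolution and codec; the real output set is not described here and probably depends on the source resolution. The function names are hypothetical.

```python
# Back-of-the-envelope workload estimate from the figures quoted above.
UPLOAD_HOURS_PER_MINUTE = 500  # "over 500 hours of video content per minute"
RESOLUTIONS = ["144p", "240p", "360p", "480p", "720p",
               "1080p", "1440p", "2160p", "4320p"]
CODECS = ["H.264", "VP9", "AV1"]


def realtime_factor(hours_per_minute):
    """Minutes of source video arriving per minute of wall-clock time."""
    return hours_per_minute * 60


def output_variants(resolutions, codecs):
    """Upper bound on outputs per upload.

    Assumption: every resolution is produced in every codec.
    """
    return len(resolutions) * len(codecs)


if __name__ == "__main__":
    print("Ingest rate vs. real time:", realtime_factor(UPLOAD_HOURS_PER_MINUTE))
    print("Output variants per upload (upper bound):",
          output_variants(RESOLUTIONS, CODECS))
```

In other words, video arrives roughly 30,000 times faster than real time before a single output variant has been produced.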

Previously, Google had two options for transcoding / encoding content. The first option was Intel’s Visual Computing Accelerator (VCA), which packed three Xeon E3 CPUs with Iris Pro P6300 / P580 GT4e integrated graphics cores and state-of-the-art hardware encoders. The second option was to use software encoding on general-purpose Intel Xeon processors.

Google determined that neither option was power efficient enough for YouTube’s growing workload. The Visual Computing Accelerator itself draws a lot of power, while scaling the number of Xeon CPUs essentially means adding more servers, which increases both power consumption and data center footprint. As a result, Google decided to develop custom in-house hardware.


Google’s first-generation Argos VCU does not completely replace Intel’s central processing units, as the servers still need to run the OS and manage storage drives and network connections. In this respect, Google’s Argos VCU resembles a GPU, which always needs a host CPU alongside it.

Instead of the stream processors found in GPUs, Google’s VCU integrates 10 H.264 / VP9 encoder engines, several decoder cores, four LPDDR4-3200 memory channels (with 4×32-bit interfaces), a PCIe interface, a DMA engine, and a small general-purpose core for scheduling. Most of the IP, with the exception of the in-house designed encoders / transcoders, is licensed from third parties to reduce development costs. Each VCU is also equipped with 8GB of usable ECC LPDDR4 memory.

The main idea behind Google’s VCU is to put as many high-performance encoders / transcoders as possible onto a single piece of silicon (while maintaining power efficiency) and then to scale the number of VCUs separately from the number of servers required. Google places two VCUs on a board and installs 10 cards per dual-socket Intel Xeon server, significantly increasing its decoding / transcoding performance per rack.
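The per-server totals implied by this layout can be worked out directly. The following is a minimal sketch that uses only the counts quoted above (two VCUs per card, 10 cards per server, 10 encoder engines and 8GB of memory per VCU); the class and property names are hypothetical and do not reflect any Google tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VcuServer:
    """Hypothetical description of one VCU-equipped dual-socket server."""
    vcus_per_card: int = 2           # two VCUs per board
    cards_per_server: int = 10       # 10 cards per dual-socket Xeon server
    encoder_cores_per_vcu: int = 10  # 10 H.264 / VP9 encoder engines per VCU
    memory_gb_per_vcu: int = 8       # 8GB of usable ECC LPDDR4 per VCU

    @property
    def vcus(self):
        return self.vcus_per_card * self.cards_per_server

    @property
    def encoder_cores(self):
        return self.vcus * self.encoder_cores_per_vcu

    @property
    def vcu_memory_gb(self):
        return self.vcus * self.memory_gb_per_vcu


if __name__ == "__main__":
    server = VcuServer()
    print(server.vcus, "VCUs per server")
    print(server.encoder_cores, "encoder engines per server")
    print(server.vcu_memory_gb, "GB of VCU-attached memory per server")
```

Because the VCU count is a property of the card layout and not of the host, changing `cards_per_server` scales encoding capacity without adding another Xeon system.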

Increased efficiency leads to migration from Xeon

According to Google, its VCU-based machines deliver up to 7x (H.264) and up to 33x (VP9) better performance/TCO compute efficiency than server systems based on Intel Skylake processors. This improvement accounts for the cost of the VCUs (versus Intel CPUs) and three years of operating expenses, making the VCU an easy choice for video giant YouTube.

Offline 2-pass single output (SOT) throughput on CPU, GPU, and VCU-powered systems

System | Throughput H.264 (MPix/s) | Throughput VP9 (MPix/s) | Performance/TCO H.264 | Performance/TCO VP9
--- | --- | --- | --- | ---
2-way Skylake | 714 | 154 | 1x | 1x
4x Nvidia T4 | ... | - | ... | -
8x Google Argos VCU | 5,973 | 6,122 | 4.4x | 20.8x
20x Google Argos VCU | 14,932 | 15,306 | 7x | 33.3x
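The throughput and performance/TCO figures above can be combined to estimate how much more a VCU-equipped system costs to own than the Skylake baseline. This is only a sketch: it assumes that performance/TCO is the throughput ratio divided by the TCO ratio, both relative to the 2-way Skylake system, which the figures suggest but which is not spelled out here. The function name is hypothetical.

```python
# Figures for the Skylake baseline and the two VCU configurations.
BASELINE_MPIX = {"H.264": 714, "VP9": 154}
SYSTEMS = {
    "8x Argos VCU": {
        "throughput": {"H.264": 5973, "VP9": 6122},
        "perf_per_tco": {"H.264": 4.4, "VP9": 20.8},
    },
    "20x Argos VCU": {
        "throughput": {"H.264": 14932, "VP9": 15306},
        "perf_per_tco": {"H.264": 7.0, "VP9": 33.3},
    },
}


def implied_relative_tco(throughput, baseline, perf_per_tco):
    """TCO relative to the baseline.

    Assumption: perf/TCO = (throughput / baseline) / (TCO / baseline TCO).
    """
    return (throughput / baseline) / perf_per_tco


if __name__ == "__main__":
    for name, data in SYSTEMS.items():
        for codec, baseline in BASELINE_MPIX.items():
            tco = implied_relative_tco(
                data["throughput"][codec], baseline, data["perf_per_tco"][codec]
            )
            print(f"{name} ({codec}): implied TCO is {tco:.2f}x the baseline")
```

Under that assumption both codecs give nearly the same answer for each configuration (roughly 1.9x for the 8-VCU system and roughly 3x for the 20-VCU system), which is what one would expect if the cost ratio belongs to the machine and not to the codec.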

Performance figures shared by Google show that a single Argos VCU is barely faster than a 2-way Intel Skylake server in H.264. However, because 20 VCUs can be installed in such a server, the VCU wins from an efficiency standpoint. With the more demanding VP9 codec, Google’s VCU appears to be about five times faster than Intel’s dual-socket Xeon server, which translates into a dramatic efficiency advantage.
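The per-VCU comparison follows from dividing the 8-VCU system throughput by eight and comparing the result with the Skylake server. The sketch below reproduces that arithmetic; it assumes throughput divides evenly across the VCUs in a system, and the helper names are hypothetical.

```python
# Throughput figures in MPix/s from Google's published numbers.
SKYLAKE_2WAY = {"H.264": 714, "VP9": 154}
ARGOS_8X = {"H.264": 5973, "VP9": 6122}
VCU_COUNT = 8


def per_vcu_throughput(system_throughput, vcu_count):
    """Average throughput of one VCU, assuming an even split across VCUs."""
    return system_throughput / vcu_count


def speedup_vs_skylake(codec):
    """Single-VCU throughput relative to the whole 2-way Skylake server."""
    return per_vcu_throughput(ARGOS_8X[codec], VCU_COUNT) / SKYLAKE_2WAY[codec]


if __name__ == "__main__":
    for codec in ("H.264", "VP9"):
        one_vcu = per_vcu_throughput(ARGOS_8X[codec], VCU_COUNT)
        print(f"{codec}: {one_vcu:.0f} MPix/s per VCU, "
              f"{speedup_vs_skylake(codec):.2f}x a 2-way Skylake server")
```

This gives a ratio of about 1.05 for H.264 and just under 5 for VP9, matching the "barely faster" and "five times faster" descriptions.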

Since Google has been using Argos VCUs for several years, it has evidently replaced many of its Xeon-based YouTube servers with machines running its own silicon. It is very difficult to estimate how many Xeon systems Google has actually replaced, but some analysts believe the tech giant could have swapped out anywhere from 4 to 33 million Intel CPUs for its VCUs. Even if the second number is an overestimate, we are still talking about millions of units.


Because Google needs a large number of processors for its other services, the number of CPUs it buys from AMD and Intel is still very high and may not decrease any time soon, since it will be years before Google’s own data center-grade systems-on-chips (SoCs) are ready.

It’s also worth noting that because Argos doesn’t support newer codecs such as AV1, Google still has to fall back on general-purpose CPUs whenever it wants to use an innovative encoding technology on YouTube. In addition, as more efficient codecs emerge (these tend to be more demanding in terms of compute), Google will need to keep using CPUs for their initial deployment. Ironically, the advantage of dedicated hardware will only grow in the future.

To make its encoding even more efficient, Google is already working on a second-generation VCU that supports the AV1, H.264, and VP9 codecs. It’s unclear when the new VCU will be introduced, but it’s clear that the company wants to use its own SoCs instead of general-purpose processors whenever possible.

Intel is not standing still

Intel isn’t standing still, however. The company’s quad-chip SG1 server card, based on DG1 Xe-LP GPUs, can decode up to 28 4Kp60 streams and transcode up to 12 streams simultaneously. Essentially, Intel’s SG1 does exactly what Google’s Argos VCU does: it scales video decoding and transcoding performance separately from the server count, reducing the number of general-purpose processors required in data centers used for video applications.


Intel also promises simultaneous transcoding of 10 high-quality 4Kp60 streams on a single-tile version of its upcoming Xe-HP GPU. Keeping in mind that some Xe-HP GPUs will scale to four tiles and that multiple GPUs can be installed per system, this should further strengthen Intel’s market-leading media decoding and encoding capabilities.

Summary

Google has successfully built a remarkable video (trans)coding unit (VCU) that supports H.264 and VP9. The unit delivers significantly higher efficiency in video encoding / transcoding workloads than Intel’s existing CPUs. In addition, the VCU allows Google to scale video encoding / transcoding performance independently of the number of servers.

Still, because Intel already has Xe-LP GPUs and SG1 cards that offer serious video decoding and encoding capabilities, it should remain successful in data centers used for video streaming. Furthermore, with the arrival of its Xe-HP GPUs, the company promises to solidify its position in this market.

