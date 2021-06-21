



In a world where Moore’s Law is slowing down and hardware increasingly has to be co-designed with the system software stack and the applications running on it, the matrix of possible hardware combinations is getting wider and deeper. Above all, this marks the end of the era of general-purpose CPU computing. But it also makes the job of choosing the right hardware for a particular workload much harder than it was 20 years ago, or even 10 years ago, when general-purpose X86 servers were the safest bet and almost everyone adopted them.

Over the first two decades of the 21st century, a wide range of compute specializations has emerged: various CPUs, GPUs, FPGAs, and custom ASICs, especially for running AI workloads, in addition to fine-grained capacity that is sized specifically for workloads and charged on an hourly basis. More and more, the public cloud rules, because it allows enterprises to test which combination of capacity and capabilities is appropriate before making a large capital commitment. As a result, compute is skyrocketing at the hyperscalers and cloud builders, which need vast amounts of it to drive their own workloads and which let the rest of us rent it. This means that the buyer's burden of working through the compute matrix can be offloaded to a hyperscaler that is also a cloud builder.

Nothing proves this better than Google’s announcement of a new Tau instance type on Google Cloud.

Tau, as most people know, is a Greek symbol for the golden ratio, and as Urs Hölzle, Google’s senior vice president of technical infrastructure, explained to The Next Platform, the name is meant to convey the balance of compute, memory, and I/O that the company is trying to strike for scale-out workloads, the particular kind of scale commonly run by the search engine and application giant and by its obvious competitors in the public cloud race. And to be precise, the types of workloads we are talking about when Google says scale-out are search engines, web serving, and other workloads like that.

Tau t2d is the first instance type in a family that is very likely to include other processors tuned to provide better value for very specific workloads. The t2d instance is based on a single-socket implementation of AMD’s “Milan” Epyc 7003 processor. In this case, up to 60 cores are activated, and these can be carved up into smaller slices. Since the special Milan chip has a total of 64 cores, four of these cores are used to run the KVM hypervisor and other storage and network functions. As far as we know, Google doesn’t have a complete in-house DPU to handle this task, one that would free all of the cores from running its modified KVM hypervisor and I/O. However, we strongly suspect that Google has a SmartNIC of some kind that can offload some storage and network functions without amounting to a full DPU. As a result, a Tau t2d instance has 60 of the 64 cores available to do work. Otherwise, as much as 30% of the CPU cores would be burned by hypervisor, storage, and I/O overhead.

Neither Google nor AMD is being specific about the feeds and speeds of this special Milan Epyc 7003 chip. This is annoying, but expected. Ultimately, those details will be visible when t2d is available on Google Cloud. What is Google telling us? That the Tau instance has 56% better performance and 42% better price/performance than the Arm-based Graviton2 instance on Amazon Web Services, and even wider margins over Cascade Lake Xeon SP instances on Microsoft Azure, running the SPECrate2017_int_base integer benchmark test.

Here are the performance differences that Google sees in its own SPEC test:

And here is the price / performance difference:

As you can see, Google has normalized this data to a Graviton2 m6g.8xlarge instance. This instance has 32 vCPUs and 128 GB of memory, and its link to the network runs at 12 Gb/sec. It rents for $1.232 per hour on demand. With 32 vCPUs and 128 GB of memory, a Tau instance costs $1.352 per hour on demand, which makes it a bit more expensive, but with much higher performance. The question is what Google is paying AMD to get this 64-core Milan part for t2d instances.
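The 42% price/performance figure can be checked against the numbers quoted here. The following is a minimal sketch in Python, assuming the 56% performance advantage Google reports and the two on-demand prices given above; the function and variable names are illustrative only.

```python
def perf_per_dollar(relative_perf: float, price_per_hour: float) -> float:
    """Relative performance delivered per dollar of hourly rental."""
    return relative_perf / price_per_hour


# Graviton2 m6g.8xlarge is the baseline, with performance normalized to 1.0.
graviton2 = perf_per_dollar(1.00, 1.232)

# Tau t2d with 32 vCPUs, 56% faster according to Google's SPEC test.
tau_t2d = perf_per_dollar(1.56, 1.352)

# Price/performance advantage of Tau t2d over the Graviton2 baseline.
advantage = tau_t2d / graviton2 - 1.0
print(f"Tau t2d price/performance advantage: {advantage:.1%}")
```

Under these inputs the advantage works out to roughly 42%, which matches the figure Google cites.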

The Microsoft Azure instance specifically identified in the comparison was a D32s_v4 instance with 32 vCPUs, 128 GB of memory, and a 16 Gb/sec network link. It costs $1.536 per hour.

The Microsoft Azure D32s_v4 instance is not using the latest Ice Lake Xeon SP, which might close some of the performance gap with Milan and Graviton2 and close the price/performance gap as well, depending on what Microsoft charges. Azure Dsv5-series instances based on Ice Lake Xeon SP have been in public preview since late April. With 32 vCPUs and 128 GB of memory, the D32s_v5 instance costs only $0.768 per hour, which is half the price, while improving raw integer performance per core by about 20%. The calculation is as follows.
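A rough sketch of that math in Python, assuming only the prices quoted in this article and the roughly 20% per-core uplift. The performance of the D32s_v4 relative to Graviton2 is not stated in the text, so rather than guess it, the sketch solves for the break-even point; all names are illustrative.

```python
V4_PRICE = 1.536   # D32s_v4, dollars per hour
V5_PRICE = 0.768   # D32s_v5, dollars per hour
V5_UPLIFT = 1.20   # assumed: about 20% more integer performance than v4

# Generation-over-generation gain in performance per dollar on Azure.
v5_vs_v4 = V5_UPLIFT * V4_PRICE / V5_PRICE

# Tau t2d performance per dollar, with Graviton2 performance normalized to 1.0.
tau_perf_per_dollar = 1.56 / 1.352

# Relative performance (Graviton2 = 1.0) at which the D32s_v5 would tie
# the Tau t2d on performance per dollar, and the D32s_v4 level that implies.
v5_break_even = tau_perf_per_dollar * V5_PRICE
v4_equivalent = v5_break_even / V5_UPLIFT

print(f"v5 performance per dollar vs v4: {v5_vs_v4:.2f}x")
print(f"v5 ties Tau t2d at {v5_break_even:.2f}x Graviton2 performance")
print(f"which implies a v4 at {v4_equivalent:.2f}x Graviton2 performance")
```

Under these assumptions, halving the price while adding 20% performance gives 2.4X the performance per dollar, and the Ice Lake instance would come out ahead of Tau t2d if the Cascade Lake instance delivers more than about 74% of the Graviton2 instance's integer throughput.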

It seems that Microsoft was able to get one hell of a discount on the Ice Lake Xeon SP, because we strongly suspect that Microsoft is not losing money on the instance. And if integer performance scales from Cascade Lake to Ice Lake as expected, then Ice Lake appears to be the price/performance winner. Yes, this is surprising. No, it is not surprising that Google didn’t run tests on Azure Ice Lake instances that aren’t yet generally available. But Google definitely did the same math we did. Any benchmark win is fleeting. All we know is that competition is good, and it has all of the cloud vendors competing fiercely for dollars.

The AWS and Azure instances scale up to 48 and 64 vCPUs, so it is strange that Google didn’t show off even more by comparing maximum performance. Maybe a DPU would be useful here?

By the way, the 56% performance improvement depends on using the AMD Optimizing C/C++ (AOCC) compiler, which is highly tuned to the Epyc architecture, just as the Intel compilers are highly tuned for Xeon SP chips. With the open source GCC 11 compiler, there was only a 25% performance improvement over Graviton2. So roughly half of the performance gain came from the compiler and half from the chip, and the price came down because the Tau instance doesn’t have the maximum memory and probably has some other features dialed down. (That, presumably, lets Google pay less for the chips and charge less for the instances.)
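The compiler's share of the gain can be sketched the same way. This assumes the 25% (GCC 11) and 56% (AOCC) advantages over Graviton2 quoted above and the on-demand prices given earlier in the article, and it treats the gains as additive, which is a simplification; the names are illustrative.

```python
GCC_GAIN = 0.25    # Tau t2d over Graviton2 when compiled with GCC 11
AOCC_GAIN = 0.56   # Tau t2d over Graviton2 when compiled with AOCC

# Share of the total gain that survives with the open source compiler,
# attributed here to the chip; the remainder is attributed to AOCC.
chip_share = GCC_GAIN / AOCC_GAIN
compiler_share = 1.0 - chip_share

# Price/performance advantage over Graviton2 if only GCC 11 is used,
# with on-demand prices of $1.352 (Tau t2d) and $1.232 (m6g.8xlarge).
gcc_price_perf = (1.0 + GCC_GAIN) * 1.232 / 1.352 - 1.0

print(f"chip: {chip_share:.0%}, compiler: {compiler_share:.0%}")
print(f"price/performance advantage with GCC 11: {gcc_price_perf:.1%}")
```

With these inputs the split is about 45% chip and 55% compiler, close to the half-and-half described above, and the price/performance advantage with GCC 11 shrinks to roughly 14%.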

Here is how the three different VMs stack up on the CoreMark benchmark test, which is a common way to gauge CPU performance. For some reason, Google only talks about price/performance here.

The Tau t2d instances will be available in the third quarter on Google Compute Engine (the equivalent of AWS EC2 and Microsoft Azure VMs), as well as an underlying compute type for Google Kubernetes Engine, the container platform service that is also available on the Google Cloud public cloud.

By the way, you can expect Google to have Tau instances based on other processors.

“For us, this is the first instance of the Tau family,” Hölzle tells The Next Platform. “We will continue the family with other chipsets, preferably from AMD, perhaps from others. The exercise is to create a configuration that is very well suited to this type of user. I think it’s really great to be able to create this gap in the X86 world without compromising and without asking customers to recompile or relicense software for a different architecture. But as you know, we continue to look at Arm as a contender and will embrace any solution that helps our customers. That said, AMD, even against the Arm-based Graviton2, as the numbers show, is clearly one step ahead in this workload category.”

All targets in the data center are always in motion. It’s amazing that anyone hits anything. You just have to lead the target, we guess, and keep shooting.

