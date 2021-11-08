



At this morning’s virtual event, AMD CEO Lisa Su unveiled the company’s latest and long-awaited server products: the new Milan-X CPU, which leverages AMD’s new 3D V-Cache technology, and the new Instinct MI200 GPU, which offers up to 220 compute units across two Infinity Fabric-connected dies and delivers up to 47.9 peak double-precision teraflops.

AMD framed the launch as part of a high-performance computing “megacycle,” driven by the increasing need to deploy additional computing performance, delivered more efficiently and at scale, to power the services and devices that define modern life.

AMD’s new 3D Epyc CPU (codenamed Milan-X) with AMD 3D V-Cache is the company’s first server CPU with 3D chiplet technology. The processor has three times the L3 cache of a standard Milan processor. In Milan, each CCD has 32MB of L3 cache. Milan-X stacks a 64MB 3D cache on top, for a total of 96MB per CCD. Across eight CCDs, that adds up to 768MB of L3 cache. Counting the L2 and L1 caches as well, there is a total of 804MB of cache per socket.
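The per-socket cache total can be checked with a short calculation. This is a minimal sketch: the L3 figures are the ones quoted above, while the per-core L2 and L1 sizes (512KB of L2, plus 32KB each of L1 instruction and data cache) are assumptions based on the Zen 3 core design rather than figures from the announcement.

```python
# L3 figures are from the article. The per-core L2 and L1 sizes are
# assumptions based on the Zen 3 core design, not from the announcement.
CCDS_PER_SOCKET = 8
CORES_PER_SOCKET = 64
BASE_L3_MB_PER_CCD = 32
STACKED_L3_MB_PER_CCD = 64
L2_KB_PER_CORE = 512       # assumption
L1_KB_PER_CORE = 32 + 32   # assumption: instruction cache + data cache


def cache_totals_mb():
    """Return per-socket cache totals in megabytes, by level and overall."""
    l3 = CCDS_PER_SOCKET * (BASE_L3_MB_PER_CCD + STACKED_L3_MB_PER_CCD)
    l2 = CORES_PER_SOCKET * L2_KB_PER_CORE // 1024
    l1 = CORES_PER_SOCKET * L1_KB_PER_CORE // 1024
    return {"l3": l3, "l2": l2, "l1": l1, "total": l3 + l2 + l1}


if __name__ == "__main__":
    for level, size_mb in cache_totals_mb().items():
        print(f"{level}: {size_mb} MB")
```

Under those assumptions, the L2 and L1 caches account for the gap between the 768MB of L3 and the 804MB per-socket total.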

Milan-X is built on the same Zen 3 cores as Milan, with up to 64 cores in total. The new processor is compatible with existing platforms after a BIOS upgrade.

According to AMD, Milan-X and 3D V-Cache employ a hybrid bonding plus through-silicon via (TSV) approach that provides more than 200 times the interconnect density of 2D chiplets and more than 15 times the density of existing 3D stacking solutions. The die-to-die interface uses a direct copper-to-copper bond with no solder bumps, improving thermals, transistor density, and interconnect pitch.

According to AMD, Milan-X improves performance by 50% on targeted technical computing workloads compared with Milan processors. Microsoft Azure is the first announced customer, with new HBv3 instances in preview today, and server partners Dell Technologies, HPE, Lenovo and Supermicro are preparing products for the first quarter of 2022. ISV ecosystem partners include Altair, Ansys, Cadence, Siemens and Synopsys.

AMD ran Synopsys’ VCS verification solution to demonstrate Milan-X’s faster performance on EDA workloads. The 16-core Milan-X with AMD’s 3D V-Cache delivers 66% faster RTL verification than a standard Milan without V-Cache. VCS is used by many of the world’s top semiconductor companies to catch defects early in the development process, before a chip is committed to silicon.

Manufactured on TSMC’s 6nm process, the MI200 is the world’s first multi-chip GPU, designed to maximize compute and data throughput in a single package. The MI200 series contains two CDNA 2 GPU dies with a combined 58 billion transistors. It has up to 220 compute units and 880 second-generation matrix cores. Eight stacks of HBM2e memory provide a total of 128GB of memory at 3.2 TB/s, four times the capacity and 2.7 times the bandwidth of the MI100. Connecting the two CDNA 2 dies are Infinity Fabric links running at 25 Gbps, for a total bidirectional bandwidth of 400 GB/s.
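A few per-die and per-stack figures follow from the numbers above. The sketch below assumes the compute units and memory are split evenly across the two dies and eight HBM2e stacks, which the announcement does not state explicitly, and back-calculates the implied MI100 baseline from the quoted ratios rather than taking it from a spec sheet.

```python
# All inputs are figures quoted in the article for the MI200 series.
DIES = 2
COMPUTE_UNITS = 220
MATRIX_CORES = 880
HBM_STACKS = 8
HBM_TOTAL_GB = 128
HBM_BANDWIDTH_TBS = 3.2
CAPACITY_GAIN_VS_MI100 = 4.0
BANDWIDTH_GAIN_VS_MI100 = 2.7


def derived_figures():
    """Derive per-die and per-stack numbers, assuming an even split."""
    return {
        "compute_units_per_die": COMPUTE_UNITS // DIES,
        "matrix_cores_per_cu": MATRIX_CORES // COMPUTE_UNITS,
        "gb_per_hbm_stack": HBM_TOTAL_GB // HBM_STACKS,
        "tbs_per_hbm_stack": round(HBM_BANDWIDTH_TBS / HBM_STACKS, 2),
        # Implied MI100 baseline, back-calculated from the quoted ratios.
        "implied_mi100_gb": HBM_TOTAL_GB / CAPACITY_GAIN_VS_MI100,
        "implied_mi100_tbs": round(HBM_BANDWIDTH_TBS / BANDWIDTH_GAIN_VS_MI100, 2),
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value}")
```

Because the 2.7x bandwidth ratio is itself rounded, the implied MI100 bandwidth is only approximate.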

The MI200 accelerator appears to answer a question: what if chip designers dramatically optimized a GPU architecture for double-precision (FP64) performance? The OAM form factor MI250X increases peak double-precision performance by 4.2 times over the MI100 (47.9 teraflops compared to 11.5 teraflops). By comparison, AMD noted that Nvidia improved the peak traditional double-precision (FP64) performance of its server GPUs by 3.7x between 2014 and 2020. See the graph on the right.
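The generational ratio is simple to reproduce from the two peak figures. The following is an illustrative sketch using only numbers quoted in the paragraph above; the function name is hypothetical, and the comparison with the 3.7x figure sets a single generation against a multi-year span, so it is not a like-for-like benchmark.

```python
MI250X_FP64_TFLOPS = 47.9      # peak, as quoted in the article
MI100_FP64_TFLOPS = 11.5       # peak, as quoted in the article
NVIDIA_GAIN_2014_2020 = 3.7    # AMD's characterization, as quoted


def generational_gain(new_tflops, old_tflops):
    """Return the ratio of new peak throughput to old peak throughput."""
    return new_tflops / old_tflops


gain = generational_gain(MI250X_FP64_TFLOPS, MI100_FP64_TFLOPS)

if __name__ == "__main__":
    print(f"MI250X vs. MI100 peak FP64 gain: {gain:.1f}x")
    print(f"Relative to the 3.7x figure: {gain / NVIDIA_GAIN_2014_2020:.2f}")
```

The unrounded ratio is slightly below 4.2, which is consistent with the rounded figure AMD quotes.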

The Instinct MI250X offers 47.5 teraflops of peak single-precision (FP32) performance, 2.5 times that of its predecessor, and provides 383 teraflops of peak theoretical half-precision (FP16) performance for AI workloads. That compute density does not come without a power cost. According to AMD, the top-of-stack part, the OAM MI250X, consumes up to 560 watts, but lower-power parts are available for air-cooled systems and other configurations.

At this morning’s launch event, Forrest Norrod, senior vice president and general manager of AMD’s Data Center and Embedded Solutions Business Group, directly compared the MI200 OAM with Nvidia’s A100 (80GB) GPU across a variety of HPC applications. In AMD testing, a single-socket 3rd generation AMD Epyc server with one 560-watt AMD Instinct MI250X OAM GPU achieved a median of 42.26 teraflops on the High-Performance Linpack (HPL) benchmark.

Norrod also showed a combustion simulation of a hydrocarbon molecule, run with the molecular dynamics code LAMMPS, as a competitive comparison of the MI200 OAM and the Nvidia A100 (80GB). The time-lapse of the simulation shows four 560-watt MI250X GPUs completing the job in less than half the time of four 400-watt A100 SXM 80GB GPUs.
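Since the MI250X draws more power per GPU, one rough way to read the LAMMPS result is in terms of GPU energy (power multiplied by time). The sketch below is a back-of-the-envelope bound, not an AMD claim: it assumes every GPU draws its full rated board power for the whole run and ignores CPU and host power, and the function name is hypothetical. The GPU count cancels out because both systems use four GPUs.

```python
MI250X_WATTS = 560             # rated board power, as quoted
A100_WATTS = 400               # rated board power, as quoted
TIME_RATIO_UPPER_BOUND = 0.5   # "less than half the time"


def relative_gpu_energy(power_a_w, power_b_w, time_ratio_a_to_b):
    """Energy of system A divided by energy of system B.

    Assumes both systems run at their rated power for the entire job,
    which is a simplification; real power draw varies with the workload.
    """
    return (power_a_w * time_ratio_a_to_b) / power_b_w


bound = relative_gpu_energy(MI250X_WATTS, A100_WATTS, TIME_RATIO_UPPER_BOUND)

if __name__ == "__main__":
    print(f"MI250X GPU energy is under {bound:.0%} of the A100 figure")
```

Under these assumptions the higher per-GPU power is more than offset by the shorter runtime, though measured energy would depend on actual power draw during the job.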

The MI200 accelerator incorporates 3rd generation AMD Infinity Fabric technology. Up to eight Infinity Fabric links connect the AMD Instinct MI200 to 3rd generation Epyc Milan CPUs and to other GPUs in the node, enabling unified CPU/GPU memory coherency.

AMD has also introduced Elevated Fanout Bridge (EFB) technology. “Unlike substrate-embedded silicon bridge architectures, EFB allows the use of standard substrates and assembly techniques, providing greater precision, scalability, and yield while maintaining high performance,” Norrod said.

AMD announced three new MI200 series products. The MI250X and MI250 are available in the open-hardware compute accelerator module, or OCP Accelerator Module (OAM), form factor. In addition, the AMD Instinct MI210, a PCIe card form factor, will be available in OEM servers.

The AMD MI250X accelerator is available now from HPE in the HPE Cray EX supercomputer. Other MI200 series accelerators, including the PCIe form factor, are expected in Q1 2022 from server partners including ASUS, Atos, Dell Technologies, Gigabyte, HPE, Lenovo and Supermicro.

The MI250X accelerator will be the primary compute engine of the upcoming 1.5-exaflops (peak) Frontier supercomputer, currently being installed at the DOE’s Oak Ridge National Laboratory. Each of the more than 9,000 Frontier nodes contains one “optimized 3rd generation AMD Epyc CPU” (rather than Milan-X) linked to four AMD MI250X accelerators over AMD’s coherent Infinity Fabric.

As detailed recently, the MI200 powers three giant systems on three continents. In addition to Frontier, which is expected to be the first exascale computer in the United States when it comes online next year, the MI200 has been selected for the European Union’s pre-exascale LUMI system and Australia’s petascale Setonix system.

“As our momentum has grown, Milan’s adoption has far outpaced Rome’s,” Su said. Looking ahead on the roadmap, the next-generation Genoa Epyc platform will feature up to 96 high-performance 5nm Zen 4 cores and support next-generation memory and I/O technologies: DDR5, PCIe Gen 5, and CXL. According to AMD, Genoa is now sampling to customers, with production and launch expected next year.

“We worked with TSMC to optimize 5nm for high performance computing,” says Su. “[The new process] offers twice the density, twice the power efficiency, and 1.25 times the performance of the 7nm process used in today’s products.”

Su also announced a new version of Zen 4 for cloud-native computing, called “Bergamo.” Bergamo features up to 128 high-performance Zen 4c cores and comes with the same set of other features as Genoa, including DDR5, PCIe Gen 5, CXL 1.1, and the Infinity Guard security features. It is also socket compatible with Genoa and shares the same Zen 4 instruction set. Bergamo is scheduled to begin shipping in the first half of 2023, Su said.

“Investing in a multi-generational CPU core roadmap combined with advanced process and packaging technologies gives us leadership across general-purpose technical computing and cloud workloads,” says Su. “You can count on us to continue pushing the boundaries of high performance computing.”

AMD also announced version 5.0 of ROCm, its open software platform that supports environments across multiple accelerator vendors and architectures. “ROCm 5.0 adds support and optimizations for MI200, extends ROCm support to include the Radeon Pro W6800 workstation GPU, and improves developer tools to increase end-user productivity,” said Brad McCredie, AMD corporate vice president of GPU platforms, at a media briefing last week.

The company also has a new Infinity Hub that gives developers access to HIP and OpenMP documentation, tools, and educational materials, while system administrators and scientists can download containerized HPC apps and ML frameworks that are optimized and supported on AMD platforms.

Market watcher Addison Snell, CEO of Intersect360 Research, commented on today’s raft of news: “Either Milan-X or MI200 would make a statement on its own, with multiple benchmark-based claims. Coherent memory over Infinity Fabric is a milestone that neither Intel nor Nvidia can immediately answer.”

