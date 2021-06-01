



The team at AMD has surprised us here. What seemed like a fairly run-of-the-mill Computex keynote turned into a substantial demonstration of AMD testing TSMC’s new 3D Fabric technology in the lab. We’ve covered 3D Fabric before, but AMD is putting it to good use by stacking additional cache on its processors, enabling very high bandwidth and better gaming performance. That’s the claim, at least, and AMD showcased a new demo processor on the Computex stage. Here’s a deeper look at what it actually is.

3D Chiplets: The Next Step

AMD announced at its Financial Analyst Day in March 2020 that it was looking into 3D stacking technologies under the name “X3D”, with a very odd diagram showing a chiplet processor surrounded by what looked like stacks of HBM or some other kind of memory. At the time, AMD said it was a mix of 2.5D and 3D packaging technologies enabling more than 10x the bandwidth density. The “X” in “X3D” was meant to stand for hybrid, and the technology was slated for “the future.” Since then, TSMC has announced its 3D Fabric line of technologies, a broad name for its combination of 2.5D and 3D integration offerings.

Today, AMD presented the first stage of its 3D chiplet journey. The first application is a stacked cache on top of a standard processor chiplet. On stage, Lisa Su showcased one of AMD’s Ryzen 5000 dual-chiplet processors with Zen 3 cores. On one of the compute chiplets, 64 MB of SRAM built on TSMC’s 7nm had been integrated on top, effectively tripling the amount of cache the cores can access.

In other words, the original Ryzen 5000 chiplet, with eight cores that have access to 32 MB of L3 cache, becomes an eight-core complex with access to 96 MB of L3 cache. The two dies are bonded with through-silicon vias (TSVs), which pass power and data between the two. AMD claims that the total bandwidth of the L3 cache exceeds 2 TB/sec, which would technically be faster than the die’s L1 cache (although with higher latency).
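As a quick check of the capacity arithmetic above, the short Python sketch below adds the stacked SRAM to the on-die L3 of one chiplet. The constant and function names are our own, chosen for illustration only; the figures are the ones AMD quoted on stage.

```python
# Per-chiplet L3 capacity with and without the stacked SRAM die,
# using the figures AMD quoted on stage. Names are illustrative.
BASE_L3_MB = 32       # on-die L3 shared by the 8 cores of a Zen 3 chiplet
STACKED_SRAM_MB = 64  # 3D V-Cache SRAM die bonded on top via TSVs


def l3_per_chiplet(stacked: bool) -> int:
    """Return the L3 capacity (MB) visible to one 8-core chiplet."""
    return BASE_L3_MB + (STACKED_SRAM_MB if stacked else 0)


plain = l3_per_chiplet(False)
vcache = l3_per_chiplet(True)
print(f"{plain} MB -> {vcache} MB ({vcache // plain}x)")
```

The stacked die triples the capacity rather than doubling it because the original 32 MB stays in place underneath.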

As shown in the chip diagram, the TSVs use direct copper-to-copper bonding. Because the cache die is not the same size as the core complex, additional structural silicon is needed to ensure equal pressure across both the lower compute die and the upper cache die. Both dies are thinned so that the new chiplet can use the same substrate and heatspreader technology currently used in Ryzen 5000 processors.

The prototype processor shown on stage had one of its chiplets using this new caching technology. The other chiplet was left standard to show the difference, and on the modified chiplet the cache die was “exposed”, making it easy to compare with the regular, non-stacked chiplet. CEO Dr. Lisa Su said that the 64 MB SRAM in this case is a 6 mm x 6 mm design (36 mm2), which puts it at just under half the die area of a full Zen 3 chiplet.

In a full product, Lisa explained, the stacked cache would be enabled on all chiplets, for 96 MB of cache per chiplet, or a total of 192 MB on a processor like this one with 12 or 16 cores.
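The per-processor totals follow directly from the per-chiplet figure. A minimal sketch, assuming a two-chiplet part (as on the 12- and 16-core Ryzen 5000 processors) with the stacked cache enabled on every chiplet; the dictionary keys and function name are hypothetical labels, not AMD terminology.

```python
# Total L3 for a multi-chiplet Ryzen 5000-style processor.
# Assumes every chiplet carries the same cache configuration.
L3_PER_CHIPLET_MB = {
    "standard": 32,   # Zen 3 chiplet as shipped today
    "3d_vcache": 96,  # 32 MB on-die + 64 MB stacked SRAM
}


def total_l3_mb(chiplets: int, variant: str) -> int:
    """Sum the L3 across all chiplets of one processor."""
    return chiplets * L3_PER_CHIPLET_MB[variant]


for variant in ("standard", "3d_vcache"):
    print(variant, total_l3_mb(2, variant), "MB")
```

The two results, 64 MB and 192 MB, are exactly the two configurations AMD pitted against each other in its gaming demo.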

As for the technology itself, AMD explained that this packaging enables more than 200x the interconnect density of regular 2D packaging (something we already know from HBM stacking), a higher interconnect density than microbump technology (as used in Intel’s Foveros), and more than 3x better interconnect efficiency than microbumps. The TSV interface is a direct die-to-die copper interconnect, meaning that AMD is using TSMC’s Chip-on-Wafer technology. Dr. Su claimed on stage that these features make it the industry’s most advanced and flexible “active-on-active” chip stacking technology.

For the performance demonstration, AMD compared a before and after using Gears 5. One system used a standard Ryzen 9 5900X 12-core processor, and the other a prototype with the new 3D V-Cache built on the Ryzen 9 5900X. Both processors were fixed at 4 GHz and paired with an unnamed graphics card.

The comparison point in this scenario is that one processor has 64 MB of L3 cache and the other has 192 MB. One of the selling points of Ryzen 5000 processors was the enlarged L3 cache available to each processor to help gaming performance, and moving that up to 96 MB per chiplet extends the advantage further, with AMD showing a +12% FPS gain (184 FPS vs. 206 FPS) at 1080p from the increased cache size. Across a series of games, AMD claimed an average gaming uplift of +15%:

DOTA2 (Vulkan): +18%
Gears 5 (DX12): +12%
Monster Hunter World (DX11): +25%
League of Legends (DX11): +4%
Fortnite (DX12): +17%
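The headline numbers can be reproduced from the figures above. This sketch computes the Gears 5 uplift from the quoted frame rates and, assuming a simple arithmetic mean (AMD did not say how it averaged, or whether more titles were included), the average of the five listed gains. Variable names are illustrative.

```python
# Reproduce AMD's quoted gaming uplifts from the on-stage figures.
gears5_fps = {"standard": 184, "3d_vcache": 206}  # 1080p, both CPUs fixed at 4 GHz


def uplift_pct(before: float, after: float) -> float:
    """Relative FPS gain in percent."""
    return (after - before) / before * 100


claimed_uplifts = {
    "DOTA2 (Vulkan)": 18,
    "Gears 5 (DX12)": 12,
    "Monster Hunter World (DX11)": 25,
    "League of Legends (DX11)": 4,
    "Fortnite (DX12)": 17,
}

gears5 = uplift_pct(gears5_fps["standard"], gears5_fps["3d_vcache"])
# Assumption: plain arithmetic mean over the five listed titles only.
mean = sum(claimed_uplifts.values()) / len(claimed_uplifts)

print(f"Gears 5: +{gears5:.1f}%")
print(f"Mean of listed titles: +{mean:.1f}%")
```

206 vs. 184 FPS works out to roughly +12%, and the five listed titles average +15.2%, consistent with AMD’s +15% figure.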

This is by no means an exhaustive list, but it makes for interesting reading. AMD’s claim here is that a +15% bump is akin to a full architectural generation jump, effectively a rare improvement enabled by a difference in design philosophy. Here at AnandTech, we’d note that as it becomes harder to move to new process nodes, design philosophy enhancements may become a major driver of future performance.

AMD says it has made great strides with this technology and plans to start production on its highest-end processors by the end of the year. There was no word on which products it would come to, or whether they would be consumer or enterprise parts. Alongside this, AMD stated that Zen 4 is set to arrive in 2022.

AnandTech Analysis

Well, that was unexpected. I knew AMD would invest in TSMC’s 3D Fabric technology, but I don’t think I expected it to be demoed in a desktop processor so soon, or first.

Starting with the technology, this is clearly TSMC’s SoIC Chip-on-Wafer in action, albeit with only two layers. TSMC has demonstrated twelve layers, but those were non-active layers. The problem with stacking silicon lies in activity and, consequently, heat. Other TSV-stacked hardware, such as HBM, has shown that SRAM/memory/cache is the ideal candidate for this, as it doesn’t add much to the processor’s thermal requirements. The downside is that the cache stacked on top is just that: cache.

This is where AMD’s and Intel’s stacking approaches differ. By using TSVs rather than microbumps, AMD gets greater bandwidth and power efficiency, and can also stack multiple chiplets higher if needed. TSVs can carry both power and data, but designs still have to be built around the two for cross-signaling. Intel’s Foveros technology, while also 3D stacking, relies on microbumps between the two chiplets. These are larger and more power-hungry, but they allow Intel to place logic on both the lower and upper dies. The other factor is thermals. Logic on the top die can usually manage thermals better because it is closer to the heatspreader/heatsink, but moving logic further away from the substrate means power has to be transported up to the top die. Intel wants to mix microbumps and TSVs in its upcoming technologies, and TSMC has a similar roadmap for its customers.

Moving on to the chiplet itself, the 64 MB L3 cache chiplet was claimed to be 6 mm x 6 mm, or 36 mm2, built on TSMC 7nm. The fact that it is built on TSMC 7nm is significant here, since a cache chiplet might seem better suited to a cheaper process node. The trade-off with a cheaper node is power and die area (yield at such a small die size isn’t worth considering). If AMD is making these cache chiplets on TSMC 7nm, then a Zen 3 with additional cache needs 80.7 mm2 for the Zen 3 chiplet plus another 36 mm2 for the cache, effectively requiring 45% more silicon per processor. With the current silicon shortage, this could affect how many processors are made available more widely. This may be why AMD said it is looking at “highest-end” products first.
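The die-area figures can be checked the same way. A minimal sketch using the quoted 80.7 mm2 Zen 3 chiplet and the 6 mm x 6 mm cache die; it also confirms the “just under half” comparison Dr. Su made on stage. Names are illustrative, and the structural filler silicon is deliberately left out because no area was given for it.

```python
# Extra 7nm silicon needed per stacked chiplet, from the quoted die sizes.
# Ignores the structural filler silicon (no area was disclosed for it).
ZEN3_CHIPLET_MM2 = 80.7   # Zen 3 compute chiplet, TSMC 7nm
VCACHE_DIE_MM2 = 6 * 6    # 6 mm x 6 mm stacked SRAM die, also TSMC 7nm

extra_fraction = VCACHE_DIE_MM2 / ZEN3_CHIPLET_MM2
total_mm2 = ZEN3_CHIPLET_MM2 + VCACHE_DIE_MM2

print(f"Cache die: {VCACHE_DIE_MM2} mm^2 ({extra_fraction:.1%} of a Zen 3 chiplet)")
print(f"7nm silicon per stacked chiplet: {total_mm2:.1f} mm^2")
```

The ratio comes out at about 44.6%, which rounds to the 45% quoted above and is indeed just under half a Zen 3 chiplet.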

Now, adding 64 MB of cache to a chip that already has 32 MB of L3 cache isn’t as straightforward as it looks. If AMD integrates it directly as an adjacency to the existing L3 cache, we end up with a two-tier L3 cache. Accessing that 64 MB might require more power, but it would provide greater bandwidth. Whether the regular 32 MB is sufficient compared to the extra 64 MB provided by the stacked die will depend on the workload. The extra 64 MB could be considered the equivalent of an L4 cache, but the issue there is that to get out to main memory, that extra 64 MB has to go through the main chiplet underneath it, which is a notable additional power draw. I’m very interested in seeing what the memory profile looks like from a core’s perspective with this additional chiplet, and how AMD integrates it into the structure. AMD has stated that it is an SRAM-based design, so unfortunately it’s nothing as exotic as persistent memory, which would have been a completely different design philosophy. Sticking to SRAM at least means the performance gains can be seamless.

In terms of performance, we know that L3 cache depth helps gaming performance, with both discrete and integrated graphics. However, increasing L3 cache depth doesn’t necessarily do much for performance elsewhere. This was best illustrated in our review of Intel’s Broadwell processors with 128 MB of L4 cache (about 77 mm2 on Intel 22nm), where the additional cache only improved gaming and compression/decompression tests. It will be interesting to see how AMD markets the technology beyond gaming.

Finally, the crossover into the mainstream. AMD states that it is ready to begin integrating the technology into its high-end portfolio, with production at the end of the year. AMD also states that Zen 4 on 5nm will arrive in 2022. Based on previous timescales, AMD’s next processor family was expected around February 2022. It’s unclear at this point whether that will be Zen 4, but Zen 4 is on 5nm, and AMD is demonstrating this 3D V-Cache on 7nm. It’s also unclear whether AMD plans to monetize this feature on 7nm, or whether it could combine a 5nm Zen 4 chiplet with a 7nm 64 MB cache chiplet. Combining the two wouldn’t be too difficult, but AMD may want to push the cache technology into premium products rather than Ryzen desktop. We may see special one-off editions as the technology makes its way through the stack.

In conclusion, I have a few questions I’d like to ask AMD about this. I’m hoping to get answers, and if I do, I’ll circle back with the details.

