By Dr. Phil Garrou, Contributing Editor
Most of us packaging-focused technologists do not traditionally follow what’s being presented at the IEEE High Performance Computing Architectures Conference (HPCA)…but that’s why you follow IFTLE…i.e., to find such material. This year’s conference was held in Austin, TX, the first week of February. What I want to focus on is the presentation by AMD on “Design and Analysis of an APU for Exascale Computing.”
If some of these concepts look familiar, check out IFTLE 323, “The New DARPA Program ‘CHIPS’.”
The need for ever more computational power continues to grow, and exaflop (10^18 floating-point operations per second) capabilities may soon become necessary. This paper presents the AMD vision for an exascale node architecture, including low-power and high-performance CPU cores, integrated energy-efficient GPU units, in-package high-bandwidth 3D memory, die-stacking and chiplet technologies, and advanced memory systems.
Two of the building blocks for this exascale node architecture are (1) its chiplet-based approach, which decouples performance-critical processing components like CPUs and GPUs from components that do not scale well with technology (e.g., analog components), allowing fabrication in individually optimized process technologies for cost reduction and design reuse in other market segments, and (2) the use of in-package 3D memory, which is stacked directly above high-bandwidth-consuming GPUs.
The exascale heterogeneous processor, or EHP (shown below), is an accelerated processing unit (APU) consisting of CPU and GPU compute integrated with in-package 3D DRAM. The overall structure makes use of a modular “chiplet” design, with the chiplets 3D-stacked on other “active interposer” chips. “The use of advanced packaging technologies enables a large amount of computational and memory resources to be located in a single package.” The exascale targets for memory bandwidth and energy efficiency are incredibly challenging for off-package memory solutions. Thus, AMD proposes to integrate 3D-stacked DRAM into the EHP package.
In the center of the EHP are two CPU clusters, each consisting of four multi-core CPU chiplets stacked on an active interposer base die. On either side of the CPU clusters are a total of four GPU clusters, each consisting of two GPU chiplets on a respective active interposer. A 3D stack of DRAM sits directly on top of each GPU chiplet to maximize bandwidth. The interposers underneath the chiplets provide interconnection between the chiplets along with other functions such as external I/O interfaces, power distribution and system management. Interposers maintain high-bandwidth connectivity among themselves by utilizing wide, short-distance, point-to-point paths.
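To keep the layout straight, here is a minimal Python sketch that simply tallies the pieces described above. The cluster and chiplet counts come from the description; the class and function names (Cluster, build_ehp, summarize) are my own illustrative assumptions, not anything from the AMD paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """One active interposer base die and the chiplets stacked on it."""
    kind: str                      # "CPU" or "GPU"
    chiplets: int                  # chiplets stacked on this interposer
    dram_stacks_per_chiplet: int   # 3D DRAM stacks on top of each chiplet


def build_ehp():
    """Return the clusters as described in the text (names are hypothetical)."""
    cpu_clusters = [Cluster("CPU", chiplets=4, dram_stacks_per_chiplet=0)
                    for _ in range(2)]
    gpu_clusters = [Cluster("GPU", chiplets=2, dram_stacks_per_chiplet=1)
                    for _ in range(4)]
    return cpu_clusters + gpu_clusters


def summarize(clusters):
    """Count interposers, chiplets and DRAM stacks across the package."""
    return {
        "interposers": len(clusters),  # one active interposer per cluster
        "cpu_chiplets": sum(c.chiplets for c in clusters if c.kind == "CPU"),
        "gpu_chiplets": sum(c.chiplets for c in clusters if c.kind == "GPU"),
        "dram_stacks": sum(c.chiplets * c.dram_stacks_per_chiplet
                           for c in clusters),
    }


summary = summarize(build_ehp())
print(summary)
```

Under this reading, the package holds six active interposers, eight CPU chiplets, eight GPU chiplets and eight DRAM stacks.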
The performance targets call for a large amount of compute and memory to be integrated into a single package. Rather than build a single, monolithic system on chip (SOC), AMD proposes to leverage advanced die-stacking technologies to decompose the EHP into smaller components consisting of active interposers and chiplets. Each chiplet houses either multiple GPU compute units or CPU cores. The chiplet approach differs from conventional multi-chip module (MCM) designs in that each individual chiplet is not a complete chip. For example, the CPU chiplet contains CPU cores and caches, but lacks memory interfaces and external I/O.
While a monolithic SOC imposes a single process technology choice on all components in the system, with chiplets and interposers each discrete piece of silicon can be optimized for its own functions.
It is expected that smaller chiplets will have higher yield due to their size and, when combined with known good die (KGD) testing, can be assembled into larger systems at reasonable cost (IFTLE note – this is yet to be proven).
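The yield argument can be illustrated with a back-of-the-envelope sketch. The Python below assumes the simple Poisson yield model, Y = exp(-A x D0), which is a textbook approximation and not necessarily what AMD used. The defect density and die areas are made-up numbers chosen purely for illustration, and the sketch deliberately ignores interposer cost, assembly yield and test cost, which is exactly where the "yet to be proven" caveat bites.

```python
import math


def poisson_yield(area_mm2, defects_per_cm2):
    """Fraction of good die under the simple Poisson model Y = exp(-A * D0)."""
    area_cm2 = area_mm2 / 100.0
    return math.exp(-area_cm2 * defects_per_cm2)


def silicon_per_good_die(area_mm2, defects_per_cm2):
    """Average silicon area (mm^2) that must be fabricated per good die."""
    return area_mm2 / poisson_yield(area_mm2, defects_per_cm2)


# Hypothetical inputs, not taken from the AMD paper.
D0 = 0.2                  # defects per cm^2 (assumed)
MONOLITHIC_MM2 = 800.0    # one large monolithic die (assumed)
N_CHIPLETS = 8            # same silicon split into eight equal chiplets
CHIPLET_MM2 = MONOLITHIC_MM2 / N_CHIPLETS

# Silicon consumed per working system, assuming KGD testing screens out
# bad chiplets before assembly and assembly itself is loss-free.
mono_silicon = silicon_per_good_die(MONOLITHIC_MM2, D0)
chiplet_silicon = N_CHIPLETS * silicon_per_good_die(CHIPLET_MM2, D0)

print(f"monolithic yield: {poisson_yield(MONOLITHIC_MM2, D0):.1%}")
print(f"chiplet yield:    {poisson_yield(CHIPLET_MM2, D0):.1%}")
print(f"silicon per good system, monolithic: {mono_silicon:.0f} mm^2")
print(f"silicon per good system, chiplets:   {chiplet_silicon:.0f} mm^2")
```

With these assumed numbers the chiplet route burns far less silicon per working system, but the gap narrows once real assembly losses and interposer costs are included.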
It is expected that the decomposition (or disintegration, as IFTLE prefers to call it) of the EHP into smaller pieces will enable silicon-level reuse of IP.
(Note – this is one of the main drivers of the DARPA CHIPS program…see IFTLE 323.)
Thermal Issues?
The EHP’s in-package DRAMs stay below the 85°C limit. The figure below shows the temperature difference in the bottom-most DRAM die in a stack (in the package). The authors conclude that their use of aggressive die stacking should be thermally feasible with air cooling. However, “…more advanced cooling solutions may become necessary as the hit rate of the in-package DRAM improves, more power from the external memory is shifted to the EHP, or if a design point uses a greater per-node power budget.”
For all the latest in advanced packaging, stay linked to IFTLE…