Inside the Hybrid Memory Cube
The HMC provides a breakthrough solution that delivers unmatched performance with the utmost reliability.
Since the beginning of the computing era, memory technology has struggled to keep pace with CPUs. In the mid 1970s, CPU design and semiconductor manufacturing processes began to advance rapidly. CPUs have used these advances to increase core clock frequencies and transistor counts. Conversely, DRAM manufacturers have primarily used the advancements in process technology to rapidly and consistently scale DRAM capacity. But as more transistors were added to systems to increase performance, the memory industry was unable to keep pace in terms of designing memory systems capable of supporting these new architectures. In fact, the number of memory controllers per core decreased with each passing generation, increasing the burden on memory systems.
To address this challenge, in 2006 Micron tasked internal teams to look beyond memory performance. Their goal was to consider overall system-level requirements, with the goal of creating a balanced architecture for higher system level performance with more capable memory and I/O systems. The Hybrid Memory Cube (HMC), which blends the best of logic and DRAM processes into a heterogeneous 3D package, is the result of this effort. At its foundation is a small logic layer that sits below vertical stacks of DRAM die connected by through-silicon -vias (TSVs), as depicted in FIGURE 1. An energy-optimized DRAM array provides access to memory bits via the internal logic layer and TSV - resulting in an intelligent memory device, optimized for performance and efficiency.
By placing intelligent memory on the same substrate as the processing unit, each system can do what it’s designed to do more efficiently than previous technologies. Specifically, processors can make use of all of their computational capability without being limited by the memory channel. The logic die, with high-performance transistors, is responsible for DRAM sequencing, refresh, data routing, error correction, and high-speed interconnect to the host. HMC’s abstracted memory decouples the memory interface from the underlying memory technology and allows memory systems with different characteristics to use a common interface. Memory abstraction insulates designers from the difficult parts of memory control, such as error correction, resiliency and refresh, while allowing them to take advantage of memory features such as performance and non-volatility. Because HMC supports up to 160 GB/s of sustained memory bandwidth, the biggest question becomes, “How fast do you want to run the interface?”
The HMC Consortium
A radically new technology like HMC requires a broad ecosystem of support for mainstream adoption. To address this challenge, Micron, Samsung, Altera, Open-Silicon, and Xilinx, collaborated to form the HMC Consortium (HMCC), which was officially launched in October, 2011. The Consortium’s goals included pulling together a wide range of OEMs, enablers, and tool vendors to work together to define an industry-adoptable serial interface specification for HMC. The consortium delivered on this goal within 17 months and introduced the world’s first HMC interface and protocol specification in April 2013.
The specification provides a short-reach (SR), very short-reach (VSR), and ultra short-reach (USR) interconnection across physical layers (PHYs) for applications requiring tightly coupled or close proximity memory support for FPGAs, ASICs and ASSPs, such as high-performance networking and computing along with test and measurement equipment.
|FIGURE 1. The HMC employs a small logic layer that sits below vertical stacks of DRAM die connected by through-silicon-vias (TSVs).|
The next goal for the consortium is to develop a second set of standards designed to increase data rate speeds. This next specification, which is expected to gain consortium agreement by 1Q14, shows SR speeds improving from 15 Gb/s to 28 Gb/s and VSR/USR interconnection speeds increasing from 10 to 15–28 Gb/s.
Architecture and Performance
Other elements that separate HMC from traditional memories include raw performance, simplified board routing, and unmatched RAS features. Unique DRAM within the HMC device are designed to support sixteen individual and self-supporting vaults. Each vault delivers 10 GB/s of sustained memory bandwidth for an aggregate cube bandwidth of 160 GB/s. Within each vault there are two banks per DRAM layer for a total of 128 banks in a 2GB device or 256 banks in a 4GB device. Impact on system performance is significant, with lower queue delays and greater availability of data responses compared to conventional memories that run banks in lock-step. Not only is there massive parallelism, but HMC supports atomics that reduce external traffic and offload remedial tasks from the processor.
As previously mentioned, the abstracted interface is memory-agnostic and uses high-speed serial buses based on the HMCC protocol standard. Within this uncomplicated protocol, commands such as 128-byte WRITE (WR128), 64-byte READ (RD64), or dual 8-byte ADD IMMEDIATE (2ADD8), can be randomly mixed. This interface enables bandwidth and power scaling to suit practically any design—from “near memory,” mounted immediately adjacent to the CPU, to “far memory,” where HMC devices may be chained together in futuristic mesh-type networks. A near memory configuration is shown in FIGURE 2, and a far memory configuration is shown in FIGURE 3. JTAG and I2C sideband channels are also supported for optimization of device configuration, testing, and real-time monitors.
HMC board routing uses inexpensive, standard high-volume interconnect technologies, routes without complex timing relationships to other signals, and has significantly fewer signals. In fact, 160GB/s of sustained memory bandwidth is achieved using only 262 active signals (66 signals for a single link of up to 60GB/s of memory bandwidth).
|FIGURE 2. The HMC communicates with the CPU using a protocol defined by the HMC consortium. A near memory configuration is shown.|
|FIGURE 3.A far memory communication configuration.|
FIGURE 2. The HMC communicates with the CPU using a protocol defined by the HMC consortium. A near memory configuration is shown.
A single robust HMC package includes the memory, memory controller, and abstracted interface. This enables vault-controller parity and ECC correction with data scrubbing that is invisible to the user; self-correcting in-system lifetime memory repair; extensive device health-monitoring capabilities; and real-time status reporting. HMC also features a highly reliable external serializer/deserializer (SERDES) interface with exceptional low-bit error rates (BER) that support cyclic redundancy check (CRC) and packet retry.
HMC will deliver 160 GB/s of bandwidth or a 15X improvement compared to a DDR3-1333 module running at 10.66 GB/s. With energy efficiency measured in pico-joules per bit, HMC is targeted to operate in the 20 pj/b range. Compared to DDR3-1333 modules that operate at about 60 pj/b, this represents a 70% improvement in efficiency. HMC also features an almost-90% pin count reduction—66 pins for HMC versus ~600 pins for a 4-channel DDR3 solution. Given these comparisons, it’s easy to see the significant gains in performance and the huge savings in both the footprint and power usage.
HMC will enable new levels of performance in applications ranging from large-scale core and leading-edge networking systems, to high-performance computing, industrial automation, and eventually, consumer products.
Embedded applications will benefit greatly from high-bandwidth and energy-efficient HMC devices, especially applications such as testing and measurement equipment and networking equipment that utilizes ASICs, ASSPs, and FPGA devices from both Xilinx and Altera, two Developer members of the HMC Consortium. Altera announced in September that it has demonstrated interoperability of its Stratix FPGAs with HMC to benefit next-generation designs.
According to research analysts at Yole Développement Group, TSV-enabled devices are projected to account for nearly $40B by 2017—which is 10% of the global chip business. To drive that growth, this segment will rely on leading technologies like HMC.
|FIGURE 4.Engineering samples are set to debut in 2013, but 4GB production in 2014.|
Micron is working closely with several customers to enable a variety of applications with HMC. HMC engineering samples of a 4 link 31X31X4mm package are expected later this year, with volume production beginning the first half of 2014. Micron’s 4GB HMC is also targeted for production in 2014.
Future stacks, multiple memories
Moving forward, we will see HMC technology evolve as volume production reduces costs for TSVs and HMC enters markets where traditional DDR-type of memory has resided. Beyond DDR4, we see this class of memory technology becoming mainstream, not only because of its extreme performance, but because of its ability to overcome the effects of process scaling as seen in the NAND industry. HMC Gen3 is on the horizon, with a performance target of 320 GB/s and an 8GB density. A packaged HMC is shown in FIGURE 4.
Among the benefits of this architectural breakthrough is the future ability to stack multiple memories onto one chip. •
THOMAS KINSLEY is a Memory Development Engineer and ARON LUNDE is the Product Program Manager at Micron Technology, Inc., Boise, ID.