We recently pulled apart a Sapphire Radeon HD5750 graphics board, containing an AMD/ATI RV840 40-nm GPU, running at 700 MHz, and supported by eight Gb (1 GB) of Samsung GDDR5 memory. This card is a budget card, but the ATI chip still boasts 1.04 billion transistors, 720 stream processors and 36 texture units, can compute at ~1 TFLOPS with a pixel fill rate of 11 Gpixel/s, and can run memory at 1150 MHz with 74 GB/sec of memory bandwidth. I’m not a gamer, but those numbers are impressive to me!
When we started looking at the memory chips, and decoded the part number, we found that we had Samsung’s fastest graphics memory part, claimed to run at 5 Gbps. Graphics DRAMs are designed to run faster anyway, but 5 Gbps is three times faster than the fastest regular DDR3 (Double-Data Rate, 3rd Generation) SDRAM, which can do 1.6 Gbps.*
So what makes this one so blazing fast? Beginning with the x-ray, the difference between a Graphics DDR5 when compared with a 1Gb DDR3 (K4B1G0846F-HCF8) part starts to show up. If we look at an x-ray of the DDR3 chip, we can see that it has the conventional wire-bonding down the central spine:
|Plan-View X-ray of Samsung 1 Gb DDR3 SDRAM
When we compare the K4G10325FE-HC04 GDDR5 we can see first that it’s a flip-chip device (no wires), and if we squint hard enough we can also see that the bumps are distributed across the die as well as along the spine.
|Plan-view X-ray of Samsung 1 Gb GDDR5 Part from ATI Radeon
This is confirmed in the die photograph:
|Die Photo of Samsung 1 Gb GDDR5 SGRAM
Which compares with the die photo of the 1-Gb DDR3:
|Die Photo of Samsung 1 Gb DDR3 SDRAM
The die layout is clearly optimized to reduce RC delays from the memory blocks to the outside world. The next question for me is the nature of the flip-chip bonding; is it regular solder bumps or gold stud bumps? A cross-section solves that problem – solder, on plated-up copper lands.
|Cross-sectional Images of Samsung GDDR5 Chip in Package
A quick x-ray spectroscopy analysis tells us that the solder is silver-tin lead-free, confirming the package marking.
So the answer to our question is actually fairly obvious – lay out the die to reduce input/output line lengths, and thereby RC delays on the chip, and replace bond wires with bumps to minimize RC delays in the package. A nice exposition of basic principles used to optimize performance.
The next step would be to co-package the memory chips with the GPU to reduce lateral board delays, and we have seen that in products such as the Sony RSX chip in the PS3 gaming system. And after that, lay out the GPU for through-silicon vias – but that will be another story..
* At the time of writing!