TSMC HKMG is Out There!

I have to apologise for a hiatus in posting due to pressure from the day job, but this week is Semicon West week, so it seems appropriate to announce that we’ve started analysing TSMC’s 28-nm gate-last HKMG product, in this case a Xilinx Kintex-7 FPGA, fabbed in TSMC’s HPL process.

Having seen two generations of Intel’s HKMG parts (the 45-nm Xeon and 32-nm Westmere) using gate-last technology, it’s inevitable that we’ll compare those with the TSMC process.

The Kintex family is the mid-range group in the latest 28-nm generation 7-series of FPGAs from the company. These are optimised for the highest price/performance benefit, giving the performance of the previous Virtex-6 parts at half the price.
The Kintex-7 has eleven layers of metal (Fig. 1); the 1x layers run from metals 1-4, with a pitch of ~96 nm, the smallest we have ever seen.
Fig. 1 General Structure of Xilinx Kintex-7

Contacted gate pitch is ~118 nm in our initial analysis, with minimum gate length of ~33 nm, though since this is replacement gate there is no way of knowing absolutely the original poly gate width, which defines the source/drain engineering.

Plan-view imaging (Fig. 2) indicates that TSMC has implemented the restricted design rules that have been much discussed in the gate-first/gate-last debate. Regular, uni-directional patterning of functional gate and dummy gate lines helps out the lithography, but inevitably reduces packing density compared with Manhattan layout schemes.
Fig. 2 Plan-View Image of Gates and Active Silicon
By the look of it, double patterning with a gate plus a cut mask has been used. FPGAs are usually laid out in a more relaxed manner than dense logic, so here we can see lots of dummy gates, and also dummy active regions.

The gate structure itself definitely has some similarities with Intel’s 45-nm, as we can see from figures 3 and 4.
Fig. 3 Intel 45-nm (left) and TSMC/Xilinx 28-nm NMOS Transistors
Fig. 4 Intel 45-nm (left) and TSMC/Xilinx 28-nm PMOS Transistors

In both it appears that the buffer oxide, the high-k layer and a common work-function material are put down before the sacrificial polysilicon gate. Then the source/drain engineering is performed, and dielectric stack deposited and planarized back to the polysilicon; and the sacrificial gate is removed, and the NMOS/PMOS gate stacks are put in and planarized.

Of course there are also differences – TSMC is not using embedded SiGe for PMOS strain, and there is an additional high-density metal layer in the PMOS gate. There is also no distinct dielectric capping layer in the TSMC structure, and there is an extra sidewall spacer (likely part of the source/drain tuning). The wafers are also rotated to give a <100> channel direction.
Intel stated that they applied stress to NMOS devices using the gate metal stack and the contacts; TSMC could be doing the same, although the contacts are spaced further from the gate edge. If there is PMOS stress, the mechanism is unclear, though it is possible that the extra high-density layer in the gate could be for that purpose. However, this part is fabbed in the HPL low-power process, and typically we do not see e-SiGe in such processes.
Analysis is ongoing – more details to come, and possibly a comparison with the AMD Llano gate-first HKMG part that’s in our labs at the moment.

N.B. We are at Semicon West, at Booth 2337 – drop by and get a coupon for a free die photo!

Intel Goes Tri-Gate at 22-nm!

In a pair of press and analyst briefings this morning, Mark Bohr and Steve Smith announced that Intel will indeed be using a 3D transistor structure for their 22-nm product, settling one of the big questions about Intel’s process development over the last few years – do they stay planar or not? (And, incidentally, settling a bet between me and Scott Thompson – Scott wins!)

The big debate at IEDM last year about advanced CMOS was whether transistor structures would move to a 3D structure (finFET, tri-gate, whatever label you choose), or use ultra-thin SOI layers to attain fully depleted operation. The debate was not resolved – I was definitely left with the impression that the adherents in both camps held to their opinions, which probably means we will have two process groupings, much as we have with the gate-first/gate-last high-k/metal gate (HKMG) structures.

Intel have come down on the side of tri-gate – apparently the decision was taken in 2008, after their researchers had shown that the gate-last HKMG gate structure would work in 3D, and that the planar version could not give enough of a performance boost. So for the last three years they’ve been developing the process and getting it manufacturable for the production of the Ivy Bridge product line later this year.

Intel’s Research and Development Sequence to Reach the Tri-Gate 22-nm Node

I may have lost the bet about planar, but my gut feel that their HKMG process could be extended to 22-nm seemed to be right, since Mark confirmed that they are using gate-last (replacement gate) technology, with evolutions of existing NMOS and PMOS strain technology. Immersion lithography and double patterning will be used where necessary, and no extra mask layers are needed so the additional cost is only 2 – 3%. And apparently it’s scalable to 14 nm!

The schematic below shows a gate formed on three sides of three fins, to give more drive strength than available from one fin:

Schematic of Tri-Gate Across Three Fins (Source – Intel)

When translated to gate-last HKMG, it looks like this in this Intel image from 2007 (the section is through three gates, with three fins buried under oxide running across the field of view):

Gate-Last HKMG Tri-Gate Transistors (Source – Intel)

And now a new image from today’s briefing, showing an array of transistors with six fins in the centre, and some with two fins at the top right and bottom left:

Intel Tri-Gate Transistors (with STI and gate mold oxide removed) (Source – Intel)

Clearly this means a whole new set of design and layout paradigms, and we can see evidence here of double patterning using fin and gate masks, with cut masks to define the individual fins and gates.

During the briefings, Mark also scotched the rumour that appeared a few weeks ago about a hybrid process, where the SRAM is tri-gate and other areas are planar – all of the chip area will be tri-gate. In addition a parallel SoC process is being developed so that the Atom line of products can be extended to 22-nm.

For us commentators, going tri-gate was always a possibility for Intel; they have been publishing papers on the topic for almost ten years, with a flurry of them five years ago – here’s an image from a press briefing in 2006:

TEM Image of HKMG Tri-Gate Transistor, Sectioned Through the Fin (Source – Intel)

Their R-D-M (Research-Development-Manufacturing) methodology has been well established for quite a while now, and enabled them to keep to their schedule of a new process generation every two years. Based on comments today, we can expect to see 22-nm production in the second half of this year, and product on the shelves in the New Year.

Then we’ll see what it really looks like!

A Shameless Plug for ASMC

Winter is finally starting to fade in Ottawa, and the early signs of spring are showing. The maple sap is running, the first migrant birds have arrived, the frogs are peeping, and we have evening daylight. On the conference calendar, spring means that ASMC (IEEE/SEMI Advanced Semiconductor Manufacturing Conference) is on the horizon, this year in Saratoga Springs, New York on May 16-18. There, spring should be well advanced, and it will be a great time of year to visit the Empire State.

As the name says, ASMC is an annual conference focused on the manufacturing of semiconductor devices – in this it differs from other conferences, since the emphasis is on what goes on in the wafer fab, not the R&D labs, and the papers are not exclusively research papers.

I’m plugging ASMC because it seems to be one of the more under-rated conferences, unlike IEDM and the VLSI symposia, which get the media attention for leading-edge R&D and processes. However, it’s the nitty-gritty of manufacturing in the fab that gets the chips out of the door, and this meeting discusses the work that pushes the yield and volumes up and keeps them there.

I always come away impressed by the quality of the engineering involved; not being a fab person myself any more, it’s easy to get disconnected from the density of effort required to equip a fab, keep it running and bring new products/processes into production. Usually the guys in the fab only get publicity if something goes wrong!

This year, in addition to the 50-plus papers, there are keynotes from Norm Armour of GLOBALFOUNDRIES (GloFo), Gary Patton of IBM, and Peter Wright of Tradition Equities, as well as a panel discussion on partnerships in semiconductor manufacturing, moderated by Dave Lammers. There are also tutorials, on 3D (by James Lu of Rensselaer Poly), and EUV (by Obert Wood of GloFo), and an invited session of ISMI papers.

The technical sessions include:

  • Factory Optimization
  • Advanced Metrology
  • Advanced Equipment, Materials and Processes
  • Advanced Process Development and Control
  • Advanced Lithography
  • Defect Inspection and Yield Optimization
  • Data Management

Of course, I’m biased to some extent because we’ll be giving a paper there again. I can’t make it this year, but a colleague of mine, Ray Fontaine, is presenting on "Recent Innovations in CMOS Image Sensors". This will be the seventh year running we’ve given a paper; the manufacturing and equipment engineers that attend seem to like seeing what their competitors are doing. In this case Ray will run through some of the changes in the camera chips that we all take for granted in our phones these days.

Other papers that caught my eye may give us some clues as to what to expect in the lithographic field: the IBM/GloFo/Toshiba alliance has one on contact patterning strategies (paper 6.3), there is another cooperative paper by IBM/JSR/KLA Tencor/Tokyo Electron on double patterning (6.1), and an IBM/ASML contribution on advanced overlay control (2.5). On the materials processing side, there are three papers on low-k dielectrics from GloFo/KLA Tencor (2.4), UAlbany/Air Liquide (3.5), and Novellus (poster in session 4); a couple on nickel silicide by GloFo (5.3) and Ultratech (poster in session 4); and a clue to the mysteries of high-k dielectrics from UMC/National Cheng Kung U (3.4).

More strategically aimed discussions come from Infineon (1.1), on the challenges of having a global supply chain; Sumita Bas of Intel, speaking on sustainable/green practices in the chip business (1.3); and SEMATECH, with two talks, one on 450 mm manufacturing (ISMI session) and the other on 3D/TSV manufacturing (3.1).

Out of the conference room, there’s a poster session and reception on the Monday evening, and on the Tuesday, Dave Lammers’ panel session, "Models for Successful Partnerships in Semiconductor Manufacturing". Partnership is one of the buzzwords in chipmaking these days, and the panelists we have should know it well: Ari Komeran from the industry development side of Intel; Michael Fancher from Albany; Olivier Demolliens, head of LETI-NANOTEC in France; and Dr Walid Ali, from ATIC in Abu Dhabi.

After the panel session comes what could be a highlight of the conference: a tour of the Luther Forest Technology Campus, including a look at GLOBALFOUNDRIES (Norm Armour’s) new Fab 8, followed by a reception at the Canfield Casino.

Register soon – rates go up on May 8th!

Panasonic Gate-First HKMG also First Out of the Gate

As I suggested a few months ago, we put some credence in Panasonic’s press release last September that they would be shipping their first 32-nm HKMG parts last October. Samsung had announced their Saratoga chip, and both Altera and Xilinx have displayed silicon from TSMC, but until last Friday (18 March), none had said that they were shipping product. On Friday Xilinx announced that they were shipping their Kintex-7 product, the first of their 7-series of FPGAs.

Earlier this month our faith in Panasonic was rewarded, and we found the chip! It took a few false starts buying Panasonic products that we tore down and threw away, but now we have a verified 32-nm, gate-first, high-k metal-gate (HKMG) product. The supply chain was a bit longer than we had hoped, but as promised the chip was shipped with a week 41 date code, in October.

So, for the curious, this is what a transistor looks like:

Panasonic’s 32-nm HKMG NMOS Transistor

We can see the TiN metal gate at the base of the polysilicon, and the thin line of high-k at the base of the TiN. Also noticeable are a dual-spacer technology (sometimes referred to as differential offset spacers), and a thin line of nitride over the source/drain extension regions (possibly indicating a nitrided oxide under the high-k). The salicide is the usual platinum-doped nickel silicide. Less visible are mechanisms of applying strain, other than the nitride layer over the gate; embedded SiGe and dual-stress liners are not used.

All of which is typical for Panasonic – their 45-nm product did not appear to use any enhanced strain techniques, and the only concession to PMOS enhancement was wafer rotation to give a 1-0-0 channel direction. The emphasis is different from Intel; rather than raw performance, the targets are increased integration, die size reduction/reduced cost, and now we have high-k, reduced leakage/lower power. The September press release does say that transistor performance is improved by 40%, but it also claims 40% power reduction and a 30% smaller footprint.

Here’s a 45-nm transistor for comparison:

Panasonic’s 45-nm Generation Transistor

And, for good measure, Intel’s 32-nm device:

Intel 32-nm NMOS Transistor

The part itself uses a nine-metal (eight Cu, one Al) stack with a hybrid low-k/extra-low-k stack. Die size is ~45 mm² in a conventional FC-BGA package. Minimum metal pitch is specified as 120 nm [1], and we have found 125 nm in our early investigations.

Panasonic 32 nm General Structure

Analysis is ongoing – stay tuned for more details, and of course we’ll be doing reports!

[1] S. Matsumoto et al., “Highly Manufacturable ELK Integration Technology with Metal Hard Mask Process for High Performance 32nm-node Interconnect and Beyond”, IITC 2010

Apple’s A5 Processor is by Samsung, not TSMC

Forty-eight hours ago we obtained an iPad 2 and brought it back to the lab, and took it apart to have a look at Apple’s A5 processor chip. We’ve come to the conclusion that the main innovation in the new iPad is the A5 chip. Flash memory is flash memory (multi-sourced from Samsung and Toshiba in the iPads we’ve seen), the DRAM in the A5 package is 512 MB instead of 256 MB, and the touchscreen control uses the same trio of chips as the iPad 1 – not even a single chip solution as we’ve seen in the later iPhones. And the 3G version uses the same chipset as the Verizon iPhone launched a few weeks ago. This is the mother-board from a 32-GB WiFi-only iPad 2:

Motherboard from 32-GB iPad 2

The A5 can be seen in the centre of the board. If we look at the package we can identify Apple’s APL0498 marking for the A5 (the A4 is APL0398), and also 4 Gb of Elpida mobile DRAM. Date codes are 1107 for the A5 and 1103 for the memory – only a few weeks in the supply chain here!

Apple A5 from iPad 2

The x-ray images show us that we have the usual package-on-package (PoP) structure, with two memory chips in the top part of the PoP, and the APL0498 processor on the lower half.

X-Ray Image of A5 Package-on-Package

The two rows of dense black dots on the outside of the image are the solder balls from the memory chips in the top half of the package (connecting with the bottom half), and the less dense dots are the solder balls on the bottom half of the package connecting the A5 chip to the iPad board below. If you squint really hard you can see smaller dots about five rows in from the edge which are the flip-chip solder balls on the A5 die – and they take up quite a large proportion of the area, showing that this is a good-sized die.
The die photo and die mark are shown here:
Die Photo of Apple’s A5 Chip from the iPad 2
APL0498E01 Die Mark of Apple A5 Chip

The x-ray is right – the A5 die is more than twice as large as the A4, at 10.1 x 12.1 mm (122.2 mm²), vs 7.3 x 7.3 mm (53.3 mm²) – here’s the A4 chip for comparison:
Apple A4 Die Photo

Given that the A5 has dual ARM cores, and has more graphics capability than the A4, more than doubling the size is to be expected, but it’s also a clue that this is still made in 45-nm technology.
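
As a quick sanity check on the "more than twice as large" claim, the areas and their ratio can be recomputed from the die dimensions quoted above. This is only arithmetic on the numbers in the text; the variable names are ours.

```python
# Die dimensions in mm, as quoted in the text.
a5_w_mm, a5_h_mm = 10.1, 12.1
a4_w_mm, a4_h_mm = 7.3, 7.3

a5_area_mm2 = a5_w_mm * a5_h_mm
a4_area_mm2 = a4_w_mm * a4_h_mm
area_ratio = a5_area_mm2 / a4_area_mm2

print(f"A5 area: {a5_area_mm2:.1f} mm2")
print(f"A4 area: {a4_area_mm2:.1f} mm2")
print(f"A5/A4 area ratio: {area_ratio:.2f}x")
```

The ratio comes out a little under 2.3, which is the basis for the 45-nm inference: a shrink to a smaller node would have been expected to hold the area growth down.
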
So after the web speculation that TSMC might be fabbing the A5 rather than Samsung, we had to take a look, and the quickest way is to do a cross-section and compare it with the A4 from last year’s iPad.
So here’s the A5:
SEM Cross-Section of Apple A5
It’s a nine-metal layer part, with eight levels of copper and one aluminum. Zooming into the transistor level:
SEM Cross-Section of Transistors and M1 in A5 Processor
And now the A4:

SEM Cross-Section of Transistors and M1 – M4 in A4 Processor

At this scale even electron microscopes start to run out of steam, so not the clearest of images in either case, but good enough to see the similar shape of the transistor gates and the dielectric layers. So at least this sample of the A5 is fabbed by Samsung, just as all Apple’s processor chips have been for the last while.

Many thanks to the guys in the lab who’ve worked through the weekend to get this information – Chipworks is not really in the media business, but there’s always a buzz when a hot new consumer part comes out.

And on a different note, commiserations and condolences to our Japanese colleagues; they have much more important things of concern than the details of the iPad 2.

How to Get 5 Gbps Out of a Samsung Graphics DRAM

It’s well known that electronics games buffs like their image creation as realistic (or at least as cinema-like) as possible, which in image-processing terms means handling more and more fine-grained pixel data as fast as possible. That means more and more stream processors and texture units in the graphics processor to handle parallel data streams, and faster and faster memory to funnel the data in and out of the GPU.

We recently pulled apart a Sapphire Radeon HD5750 graphics board, containing an AMD/ATI RV840 40-nm GPU, running at 700 MHz, and supported by eight Gb (1 GB) of Samsung GDDR5 memory. This card is a budget card, but the ATI chip still boasts 1.04 billion transistors, 720 stream processors and 36 texture units, can compute at ~1 TFLOPS with a pixel fill rate of 11 Gpixel/s, and can run memory at 1150 MHz with 74 GB/sec of memory bandwidth. I’m not a gamer, but those numbers are impressive to me!

When we started looking at the memory chips, and decoded the part number, we found that we had Samsung’s fastest graphics memory part, claimed to run at 5 Gbps. Graphics DRAMs are designed to run faster anyway, but 5 Gbps is about three times as fast as the fastest regular DDR3 (Double-Data Rate, 3rd Generation) SDRAM, which can do 1.6 Gbps.*
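
For readers who want to see how the headline numbers hang together, here is a rough sketch relating the 1150 MHz memory clock to the quoted ~74 GB/sec. Two inputs are assumptions that are not stated above: a 128-bit memory bus on this card, and GDDR5 moving four data bits per pin per memory-clock cycle. The 5 Gbps and 1.6 Gbps figures are the rated speeds from the text.

```python
# Figures from the text.
MEM_CLOCK_MHZ = 1150        # memory clock on the Radeon HD5750 board
GDDR5_RATED_GBPS = 5.0      # rated per-pin speed of the Samsung GDDR5 part
DDR3_RATED_GBPS = 1.6       # fastest regular DDR3 at the time of writing

# Assumptions (not stated in the text).
BITS_PER_PIN_PER_CLOCK = 4  # assumed GDDR5 data rate relative to the memory clock
BUS_WIDTH_BITS = 128        # assumed memory bus width of this card

per_pin_gbps = MEM_CLOCK_MHZ * BITS_PER_PIN_PER_CLOCK / 1000
bandwidth_gb_per_s = per_pin_gbps * BUS_WIDTH_BITS / 8
speed_ratio = GDDR5_RATED_GBPS / DDR3_RATED_GBPS

print(f"Per-pin data rate on this board: {per_pin_gbps:.1f} Gbps")
print(f"Aggregate memory bandwidth: {bandwidth_gb_per_s:.1f} GB/s")
print(f"Rated GDDR5 vs DDR3 speed: {speed_ratio:.2f}x")
```

Under those assumptions the board runs the memory a little below its 5 Gbps rating, and the aggregate figure lands close to the ~74 GB/sec quoted above.
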

So what makes this one so blazing fast? Beginning with the x-ray, the differences between the graphics DDR5 part and a 1 Gb DDR3 part (K4B1G0846F-HCF8) start to show up. If we look at an x-ray of the DDR3 chip, we can see that it has the conventional wire-bonding down the central spine:

Plan-View X-ray of Samsung 1 Gb DDR3 SDRAM

When we compare the K4G10325FE-HC04 GDDR5 we can see first that it’s a flip-chip device (no wires), and if we squint hard enough we can also see that the bumps are distributed across the die as well as along the spine.

Plan-view X-ray of Samsung 1 Gb GDDR5 Part from ATI Radeon

This is confirmed in the die photograph:

Die Photo of Samsung 1 Gb GDDR5 SGRAM

Which compares with the die photo of the 1-Gb DDR3:

Die Photo of Samsung 1 Gb DDR3 SDRAM

The die layout is clearly optimized to reduce RC delays from the memory blocks to the outside world. The next question for me is the nature of the flip-chip bonding; is it regular solder bumps or gold stud bumps? A cross-section solves that problem – solder, on plated-up copper lands.

Cross-sectional Images of Samsung GDDR5 Chip in Package

A quick x-ray spectroscopy analysis tells us that the solder is silver-tin lead-free, confirming the package marking.

So the answer to our question is actually fairly obvious – lay out the die to reduce input/output line lengths, and thereby RC delays on the chip, and replace bond wires with bumps to minimize RC delays in the package. A nice exposition of basic principles used to optimize performance.

The next step would be to co-package the memory chips with the GPU to reduce lateral board delays, and we have seen that in products such as the Sony RSX chip in the PS3 gaming system. And after that, lay out the GPU for through-silicon vias – but that will be another story..

For those with an interest in the memory interface circuitry in the RV840, my colleague Randy Torrance has posted a discussion on the Chipworks blog.

* At the time of writing!

Samsung’s 3x DDR3 SDRAM – 4F² or 6F²? You Be the Judge..

We recently acquired Samsung’s latest DDR3 SDRAM, allegedly a 3x-nm part. When we did a little research, we found that the package markings K4B2G0846D-HCH9 lined up with a press release from Samsung last year about their 2 Gb 3x-nm generation DRAMs. My colleague at Chipworks, Randy Torrance, popped the lid to take a look, and drafted the following discussion (which, amongst other things, raises the perennial question for us reverse engineers – how do you define a process node in real terms?). Now read on..

The first thing we did was measure the die size. This chip is 35 sq mm, compared to the previous generation 48-nm Samsung 1Gb DDR3 SDRAM, which is 28.6 sq mm. Clearly this 2 Gb die is much smaller than 2X the 48-nm 1 Gb die, so our assumption that we have a 3x nm part looks good so far.
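
To put a rough number on "much smaller", one can compare the measured die with twice the 1 Gb die and ask what linear shrink that would imply. The sketch below assumes, purely for illustration, that doubling the capacity doubles the area and that the whole die scales with the square of the feature size. Real dies have periphery and pads that do not scale that way, so treat the result as a ballpark only.

```python
import math

# Figures from the text.
DIE_2GB_MM2 = 35.0   # measured 2 Gb die
DIE_1GB_MM2 = 28.6   # previous-generation 48-nm 1 Gb die
OLD_NODE_NM = 48.0

# Assumption: a 2 Gb die on the old process would be twice the 1 Gb die.
naive_2gb_at_48nm_mm2 = 2 * DIE_1GB_MM2
area_ratio = DIE_2GB_MM2 / naive_2gb_at_48nm_mm2

# Assumption: area scales with the square of the feature size.
implied_node_nm = OLD_NODE_NM * math.sqrt(area_ratio)

print(f"Naive 2 Gb die at 48 nm: {naive_2gb_at_48nm_mm2:.1f} mm2")
print(f"Measured / naive area: {area_ratio:.2f}")
print(f"Implied feature size: {implied_node_nm:.1f} nm")
```

With those simplifications the implied feature size falls in the high 30s of nanometres, which is at least consistent with a 3x-nm label.
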

Die Photo of Samsung 3x DDR3 SDRAM

Next we did a bevel-section of the part to take a look at the cell array. We were surprised with what we found. The capacitors are laid out in a square array instead of the more usual hexagonal pattern (see below), and the wordline (WL) and bitline (BL) pitches are both about 96 nm. The usual method of determining DRAM node is to take half the minimum WL or BL pitch. That places this DRAM at the 48-nm process node, the same as the previous Samsung generation. So why does the die size look like it should be a smaller technology? For this we need to look at cell size.

Plan-View TEM image of Capacitors in Samsung 3x-nm SDRAM

But before we get into that we should discuss the DRAM convention of describing the memory cell size in terms of the minimum feature size, F. For many years, DRAM cells used an 8F² architecture. This allows for the use of a folded bitline architecture, which helps reduce noise. In order to decrease cell area, companies came out with the first 6F² cells in 2007; this 6F² architecture is now used by all major players in the DRAM market. The guys at IC Insights published the plot below in the latest McLean report, which nicely illustrates the progress:

DRAM Cell Size Reduction Through the Years
The 48 nm SDRAM has a cell size of ~0.014 sq µm. This new SDRAM has a cell size of 0.0092 sq µm. Clearly this cell is much smaller than the 48 nm generation. If we take the half-WL pitch as the minimum feature size (F), we get an F of 48 nm for this process. The cell area of 0.0092 sq µm is almost exactly 4 × F², or 4F². Is this the world’s first 4F² cell? From this point of view it certainly appears so: the cell area is four times the square of the minimum feature size. But there are other ways of looking at this.
A 4F² architecture is defined as having a memory cell at each and every possible location, that being each and every crossing of WL and BL, with the cell being 2F x 2F. This is in fact what we see on this Samsung DRAM, so maybe we are looking at the first 4F² architecture. But let’s look just a bit closer to be sure.
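
The cell-size bookkeeping is easy to reproduce. The sketch below is our own illustration (the helper name `cell_area_um2` is hypothetical); it takes the ~96 nm wordline pitch quoted earlier, uses half of it as F, and computes the area of a cell that is k times F squared.

```python
def cell_area_um2(k: float, f_nm: float) -> float:
    """Area in square microns of a cell of size k * F^2, with F given in nm."""
    f_um = f_nm / 1000.0
    return k * f_um ** 2

WL_PITCH_NM = 96.0          # wordline pitch from the text
f_nm = WL_PITCH_NM / 2      # half-pitch convention gives F = 48 nm

area_4f2 = cell_area_um2(4, f_nm)
area_6f2 = cell_area_um2(6, f_nm)

print(f"F = {f_nm:.0f} nm")
print(f"4F2 cell at this F: {area_4f2:.4f} sq um")
print(f"6F2 cell at this F: {area_6f2:.4f} sq um")
```

With F = 48 nm, the 4F² figure matches the measured 0.0092 sq µm cell, while the 6F² figure comes out close to the ~0.014 sq µm quoted for the previous 48 nm generation.
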

We compared the poly and active layout under the array between the 48 nm SDRAM and this new one. The images are shown below. As can be seen, both have very similar layouts. The angle of the active silicon (diffusion) direction is about the same. The active areas are ovals. Each diffusion has two wordlines crossing it. There is a gap between all the active areas, such that a third WL does not cross active on this diagonal active direction.

Samsung K4B1G0846F 48nm 1 Gb DDR3 SDRAM,
Poly and Active Area Image under Cell Array

Samsung K4B2G0846D 2Gb DDR3 SDRAM,
Poly Remnants and Active Areas under Cell Array
This new DRAM clearly has a very similar cell layout to the previous one. In both cases the wordlines do not have a transistor under them at every possible location that a transistor would fit. Rather, one of every three possible transistor locations is filled with a break in the diffusion stripe. This is really a better definition of a 6F² cell, since in a 6F² architecture 2/3 of the WL/BL intersections are filled with storage cells. As we noted above, a 4F² cell really should have transistors at every possible transistor location.

When we look at the pitch of the diffusions in this new DRAM, we see it is much tighter. In fact, along the WL direction the diffusion pitch is now 64 nm, whereas in the 48 nm SDRAM this pitch was 96 nm. So if you take half the minimum pitch in the chip as the node, this is a 32-nm part (ITRS 2009 still defines F as half the contacted M1 pitch, which would be 48 nm).

So, do we have a 32 nm node, and a 6F² architecture? Maybe. The only issue is that if we use 32 nm as F, then when we plug that into the 6F² equation we get 0.0061 µm² as the cell size. However, the cell size is actually 0.0092 µm². If we use that number and use the equation to calculate F we find that F=39nm. So… do we call this a 32 nm or a 39 nm node? It depends how you calculate it – either way it’s a 3x!
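
The two ways of running the equation can be written out explicitly. This is a small sketch of our own (the function name `feature_size_nm` is hypothetical), using only the cell area and pitches quoted in the text.

```python
import math

def feature_size_nm(cell_area_um2: float, k: float) -> float:
    """Feature size F in nm implied by a cell of area k * F^2 (area in sq um)."""
    return math.sqrt(cell_area_um2 / k) * 1000.0

MEASURED_CELL_UM2 = 0.0092      # measured cell size from the text
HALF_DIFFUSION_PITCH_NM = 32.0  # half of the 64 nm diffusion pitch

# Forward: what a 6F2 cell would measure if F really were 32 nm.
area_6f2_at_32nm = 6 * (HALF_DIFFUSION_PITCH_NM / 1000.0) ** 2

# Backward: what F the measured cell implies under each architecture.
f_if_6f2 = feature_size_nm(MEASURED_CELL_UM2, 6)
f_if_4f2 = feature_size_nm(MEASURED_CELL_UM2, 4)

print(f"6F2 cell at F = 32 nm: {area_6f2_at_32nm:.4f} sq um")
print(f"F implied by a 6F2 cell: {f_if_6f2:.1f} nm")
print(f"F implied by a 4F2 cell: {f_if_4f2:.1f} nm")
```

The same measured area reads as roughly 39 nm under a 6F² assumption and roughly 48 nm under a 4F² assumption, which is exactly why the node label depends on how you choose to calculate it.
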

So, although it’s a little disappointing that I don’t think we can announce the world’s first 4F² DRAM, we can announce the world’s smallest node, 32 or 39 nm, production 6F² DRAM.

Samsung have had to put in a few process tweaks to squeeze the cells into the much smaller area, mostly at the transistor and STI level. We’re still looking at it, so we may not have the whole story yet, but some of what we’ve seen so far is:
• Ti-? (likely TiN)-gate buried wordline transistors
• STI filled with nitride in the array
• Bitlines at the same level as peripheral transistors
Our up-coming reports will give many more details on this fascinating part.

Common Platform Goes Gate-Last – at Last!

At the IBM/GLOBALFOUNDRIES/Samsung Common Platform Technology Forum on Tuesday, Gary Patton of IBM announced that the Platform would be moving to a gate-last high-k, metal-gate (HKMG) technology at the 20-nm node.

At the 45- and 32-nm nodes there has been a dichotomy between gate-last as embodied by Intel, TSMC, and UMC, and gate-first, promoted by the Common Platform and others such as Panasonic. (Though, to be realistic, Intel’s is the only HKMG we’ve seen so far, and the only 32-nm product.)

The split puzzled me a bit, at least for high-performance processes, since Intel have clearly shown that for PMOS, compressive stress using embedded SiGe source/drains is a really big crank that is enhanced by removal of the dummy polysilicon gate in the gate-last sequence. In fact, in their 32-nm paper at IEDM 2009 [1], the PMOS linear drive current exceeds NMOS, and the saturated drive current (Idsat) is 85% of NMOS. This trend is shown below:

Intel Drive Currents at the Different Nodes [1]

We can clearly see the narrowing between NMOS and PMOS drive currents at the 45-nm node, namely with the start of replacement gate (gate-last) technology.

So it seems obvious that to have high-performance PMOS, gate-last is the way to go; admittedly IBM and their allies have been using compressive nitride for PMOS, which Intel never have (at least to my knowledge), but there are limitations to that – now that contacted gate pitch has shrunk to less than 200 nm, there is not much room to get the nitride close to the channel – a problem that will increase with further shrinks.

So in a way it’s not surprising that the Platform has made the change; nitride stress is running out of steam, and gate replacement offers improved compressive stress for PMOS, and other stress techniques for NMOS (Intel builds some stress in with the gate metal).

Gary Patton said that IBM have been evaluating gate-last in parallel with gate-first since 2001, and it’s logical that they and their partners should. Both GLOBALFOUNDRIES and Samsung have published on gate-last, so there has been some evidence of checking out the parallel paths.

GLOBALFOUNDRIES PMOS and NMOS (right) Gate-Last Transistors [2]


Samsung Gate-Last Transistor [3]

Patton said that they selected gate-first in 2004; judging by their papers, Intel took their decision in 2003. The rationale that he put forward for the change to gate-last involved four points:

  • Density – gate-first has higher density, since gate-last requires restricted design rules (RDRs). That prevents orthogonal layout, requiring local interconnect; but at 20-nm RDRs are needed for lithography, so that advantage disappears.
  • Scaling – it’s easier to scale without having to cope with RDRs; at 20-nm there’s no choice.
  • Process simplicity – it’s obviously easier to shrink if you can keep the same process architecture, whether it be to 32- or 20-nm.
  • Power/performance – the gate last structure allows strain closer to the channel, increasing performance; but fully contacted source/drains increase parasitic capacitance, slowing things down. According to Patton these net each other out for a high-performance process, making the gate first/last decision neutral. For low-power processes, strain is not used at the 45/32-nm nodes, so gate-first gives better power/performance metrics.  At 20-nm strain has to be used for low-power, and with the need for RDRs and local interconnect, the balance shifts in favour of gate-last.

So it appears that for the Platform the equation between pure transistor performance, process convenience, and power/performance made gate-first the choice at 45/32/28-nm, but at 20-nm the balance changes to make gate-last the way to go. That was likely influenced by the adoption of immersion lithography between 65- and 45-nm, which reduced the need for RDRs.

Intel presumably did similar sums during their 45-nm development, and figured that using RDRs would save them the cost of going to wet lithography at that node, and at the same time adopting gate-last technology would give them a manufacturing advantage. (My speculation is that they had also concluded that their version of gate-last may be more complicated to start up, but would prove to be more manufacturable than struggling with the instabilities that seem to go with the gate-first work-function materials. I guess they’ve proved that!)

Interestingly, now that Intel is using immersion lithography at 32-nm, they have loosened up on the RDRs; there’s more flexibility in the layout than there appeared to be at 45-nm.

I have to congratulate the Common Platform marketing guys on putting up a live webstream of the Technology Forum – I couldn’t get to the event itself, so wouldn’t have been able to comment without it. The stream will be available until April 29, so if you want to see Gary Patton for yourself, you can.

Screen Shot of Gary Patton of IBM at the Common Platform Technology Forum

Unfortunately, talking to my journalist colleagues, no slide sets were available, even at the press conference, so watching the stream occasionally leaves you puzzled as to what’s being talked about; and as you can see from the screen shot above, the room screens were carefully blanked out for the camera. Also, the breakout sessions in the afternoon were not streamed, or if they were, not recorded for later viewing. Still, kudos to the Platform for the live stream we did have, and the pre-recorded panel sessions!

From Gary’s and other comments at the Forum, it’s clear that the first HKMG products will be launched at 32-nm, and 28-nm will be following along fairly soon after. We can’t wait to see some!

For those waiting for more details of last year’s IEDM, I will finish my review. There were 36 sessions with 212 papers, so it is not a small task to do conscientiously; the Christmas break interrupted things, and there have been distractions since (like the Forum!), but I will get there!


  1. P. Packan et al., High Performance 32nm Logic Technology Featuring 2nd Generation High-k + Metal Gate Transistors, IEDM 2009, paper 28.4, pp. 659 – 662
  2. M. Horstmann et al., Advanced SOI CMOS Transistor Technologies for High-Performance Microprocessor Applications, CICC 2009, paper 8.3, pp. 149 – 152
  3. K-Y. Lim et al., Novel Stress-Memorization-Technology (SMT) for High Electron Mobility Enhancement of Gate Last High-k/Metal Gate Devices, IEDM 2010, paper 10.1, pp. 229 – 232

IEDM 2010 Retrospective – Part 1

The International Electron Devices Meeting started its 56th session last week on Sunday in San Francisco. This year the program appears to be more academic than in previous years, and this was confirmed by the conference chair in his opening address – only 145 submissions out of a total of 555, an all-time low as a percentage. Attendance was guesstimated at ~1500, again lower than earlier years on the west coast. On the other hand, the atmosphere is noticeably more upbeat than last year, and there are plenty of industry attendees.

Sunday was short course day, well attended with ~580 participants. There were two courses, “15nm CMOS Technology”, and “Reliability and Yield of Advanced Integrated Technologies” – I sat in on the reliability session and brought myself up to date on the issues now that we’re into the deep nanometer era. The European weather had its effects: the chair, Guido Groeseneken, was stuck in Amsterdam due to snow, and Werner Weber had to take over. So far Europe has had a worse winter than I’ve had to cope with in Canada!

The course had some useful stuff for me, not being involved in reliability – it’s not something we need to worry about when we take stuff apart! We had a good review of time-dependent breakdown and n- and p-BTI by Ben Kaczer of IMEC; some interesting new analytical work on changes in low-k dielectrics from Shinichi Ogawa; a surprisingly optimistic review of ESD techniques by Christian Russ of Infineon (apparently strain can actually improve ESD performance!); and the day was rounded out by consecutive reviews of different approaches to reliability by design mitigation by Ashraf Alam (Purdue) and Andrzej Strojwas (PDF Solutions).

On the 15-nm course, the gossip I heard was that folks were pleasantly surprised that there is a roadmap to get there. Tom Skotnicki convinced people that the thin SOI/thin BOX solution will work better than finFETs – at least he didn’t get snowed in!

Monday morning we got into the plenary session, starting with Kinam Kim of Samsung. He started off by predicting that DRAM will get into the 10-nm generation, though not for another ten years, by using new variants of MIM stacked capacitors, and evolving through buried wordlines to vertical access transistors with buried bitlines.

Then he moved on to flash, detailing the problems created by a shrinking number of electrons on the floating gate, the increasing aspect ratio of the gate stack, and the inability to scale the dielectrics. We’ll still get to the 1x node, but after that 3D cell structures will appear, likely with charge-trapping technology. We had a brief reference to ReRAM as universal memory (though as the Scots say, I ha’e ma doots), but it’ll be a while before we get there.

Then we moved into logic, with the many variants possible below 20-nm – finFETs, hybrid chips with III-V devices on silicon, graphene, etc, and a quick run-through of the various stacking options such as package-on-package and (of course) TSVs; the latter was apt in the context of the day’s announcement of an 8-GB DIMM using TSVs.

The second plenary talk was equally interesting in pointing up the actual and potential use of semiconductors in making electrical consumption more efficient, from generation through transmission to end usage. Examples given were whole-wafer thyristors used for switching HVDC lines (apparently DC transmission is much more efficient than AC, and there’s a 1400-km, 800 kV line in China), and at the other end of the scale a server power supply with 99% efficiency.

Schematic (top) and Image of Whole-Wafer Laser-Triggered Thyristor Switch (Source: Infineon/IEDM)

The afternoon memory session started off with Samsung’s 27-nm NAND flash paper (5.1). It amazes me every time that we see a new generation of NAND flash that the cell is essentially a shrink of the classic control gate/floating gate structure, even though we’re now counting electrons.

That’s what we have here:

Samsung 27-nm (left, source: Samsung/IEDM) and 35-nm NAND Flash Gate Structures

I’ve included an image of the 35-nm cell for comparison, to show the essentially similar structures, control gate/wordline (CG) on top, and floating gate (FG) below. Below is an orthogonal section along the line of the control gate, again with the 35-nm part for comparison.

Section Parallel to Control Gate of 27-nm (left, source: Samsung/IEDM) and 35-nm NAND Flash

The main difference we can see, apart from the dimensional shrink, is the increase in the aspect ratio of both gates. This is deliberate, to maintain the coupling ratio between the control gate and floating gate, and also to keep the wordline sheet resistance low enough to minimize RC delay (8 ohm/sq is quoted).
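As a rough illustration of why the wordline figure matters: the 8 ohm/sq quoted is a sheet resistance, so the end-to-end resistance of a line is that number multiplied by the number of squares (length divided by width). The sketch below takes the 27-nm feature size as the line width; the 100 µm wordline length is a made-up value for illustration only, not a figure from the paper.

```python
def line_resistance_ohms(sheet_res_ohm_per_sq, length_nm, width_nm):
    """Resistance of a uniform line: sheet resistance times number of squares."""
    squares = length_nm / width_nm
    return sheet_res_ohm_per_sq * squares


SHEET_RES = 8.0          # ohm/sq, as quoted in the paper
WIDTH_NM = 27.0          # assumed equal to the 27-nm feature size
LENGTH_NM = 100_000.0    # hypothetical 100 um wordline, illustration only

r_wordline = line_resistance_ohms(SHEET_RES, LENGTH_NM, WIDTH_NM)
print(f"{LENGTH_NM / WIDTH_NM:.0f} squares, about {r_wordline / 1000:.1f} kohm")
```

Since sheet resistance is resistivity divided by film thickness, a taller wordline is one way to hold that number down as the line gets narrower.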

One of the changes discussed in the paper is a novel tunnel oxidation process (i.e. between FG and substrate) that conserves the boron doping in the channel and raises the Vt by ~0.5V. This tweak is useful for a number of performance considerations:

  • the reduced Vt shift between a programmed state and an un-programmed state helps reduce the capacitance linkage between adjacent floating gates
  • it improves endurance by reducing the fringing field between the top corner of the active silicon and the control gate, where it comes down close to the substrate between floating gates – during programming the high voltage across this gap can cause tunneling to the CG, which can degrade the tunnel oxide and affect endurance.
  • it improves the data retention by reducing charge leakage off the floating gate

This “novel tunnel oxidation” is not described in detail, but if we blow up their somewhat fuzzy TEM image, and compare again with the 35-nm chip, it looks as though the tunnel oxide has been nitrided.

Tunnel Dielectrics of 27-nm (left, source: Samsung/IEDM) and 35-nm NAND Flash

A question was asked at the end of the paper about the novel oxidation, and of course the presenter didn’t give a direct answer, but an implant step was mentioned; a locking implant for the boron seems likely.

I’ve gone into more detail about this paper than is probably sensible in a blog, but it was the first paper of the regular sessions, and the detail of getting 64Gb of cells that work onto one die can’t help being fascinating to a process geek like me.

Next up (5.2) was Micron’s 25-nm flash, which they announced almost a year ago. Taking a different approach to the same challenges, Micron have used air-gap technology to mitigate the capacitance linkage between adjacent gates and adjacent bitlines (see below).

"Air Gaps" in Micron 25-nm NAND Flash (Source: Micron/IEDM)

Essentially they seem to have optimized the uneven fill that we have often seen in similar structures (e.g. the Samsung part above), and of course “air-gap” is a bit of a misnomer – I presume it’s actually a vacuum with whatever trace gases are in the deposition chamber when the dielectric is formed.

They also illustrated that at the 25-nm node, we’re down to about ten electrons on the floating gate for a 100 mV Vt shift, so in a typical MLC cell with 300 – 500 mV separation between levels, that’s about 30 – 50 electrons difference. With this degree of sensitivity, any traps in the stack can affect the Vt, so considerable effort has been taken to minimize trapping and charge leakage.

Electrons required for a 100mV Vt shift vs. cell feature size. 
(Source: Micron/IEDM)
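To put numbers on that, here is a back-of-envelope sketch of the arithmetic, assuming the Vt shift scales linearly with stored charge at the ~10 electrons per 100 mV that Micron showed. The effective capacitance is my own inference from Q = C·ΔV, not a number from the paper.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19   # coulombs per electron
ELECTRONS_PER_100MV = 10                # approximate 25-nm figure from the talk


def electrons_for_shift(delta_vt_mv, electrons_per_100mv=ELECTRONS_PER_100MV):
    """Electrons needed for a given Vt shift, assuming linear scaling."""
    return electrons_per_100mv * delta_vt_mv / 100.0


def implied_capacitance_af(electrons_per_100mv=ELECTRONS_PER_100MV):
    """Effective capacitance (attofarads) implied by Q = C * dV over 100 mV."""
    charge_c = electrons_per_100mv * ELEMENTARY_CHARGE_C
    return charge_c / 0.1 * 1e18


for separation_mv in (300, 500):
    print(separation_mv, "mV ->", electrons_for_shift(separation_mv), "electrons")
print(f"implied capacitance ~{implied_capacitance_af():.0f} aF")
```

On these assumptions, losing or gaining a single electron moves the Vt by about 10 mV, which is why traps in the stack matter so much.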

Micron also highlighted the sensitivity to boron concentration in the channel – instead of counting electrons, we’re counting atoms – at 25 nm we’re down to ~75 atoms, with a 3σ variation of ~35%, and a corresponding effect on Vt; together with the increased sensitivity to noise at this node, some serious work has had to be done on the programming algorithms and error correction.

Number of Boron atoms per cell vs. feature size.
(Squares – mean; diamond – -3σ; circle – +3σ; triangles – ±3σ percentage divided by the mean. Source: Micron/IEDM)
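Interestingly, the ~35% figure is about what you would expect if the number of boron atoms per cell simply followed Poisson statistics, where the standard deviation is the square root of the mean. That is my assumption, a common first-order model for random dopant fluctuation, and not necessarily how Micron derived their curve; a quick check:

```python
import math


def three_sigma_percent(mean_atoms):
    """3-sigma spread as a percentage of the mean for a Poisson count."""
    sigma = math.sqrt(mean_atoms)   # Poisson: variance equals the mean
    return 100.0 * 3.0 * sigma / mean_atoms


print(f"{three_sigma_percent(75):.1f}%")
```

Under the same assumption the relative spread goes as 3/√N, so it keeps growing as the mean atom count falls with each shrink.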

Later in the afternoon Macronix had a couple of papers (5.5, 5.6) on the other flash technology, charge trapping (CT) using a nitride layer. Macronix has been one of the more prolific industrial contributors in recent years, with six papers this year and seven last year.

Coincidentally, the first paper is on a BE-SONOS NAND flash structure (barrier-engineered SONOS), which uses a thin ONO layer under the charge-trapping nitride layer, instead of under the floating gate (as we speculate above, in the Samsung paper). The thin ONO layer is used as a “modulated tunneling barrier”, which suppresses hole tunneling at low electric fields for retention, but allows efficient tunneling at high fields for erase.

That gives us an ONONO stack under the gate:

Orthogonal Sections of Macronix 38-nm BE-SONOS NAND Flash
(Source: Macronix/IEDM)

The detail in the paper reveals that the lower two oxides are actually nitrided, and the ONO barrier layer thicknesses are 13/20/35 Å (bottom – top), covered by a ~70 Å nitride layer and another ~70 Å oxide, formed by oxidizing the nitride layer. 75-nm and 38-nm NAND flash structures were tested.

The intent of this work is to show that the reliability is improved by leaving the dielectric stack intact, as opposed to etching it when the gates are etched; previously it had been thought that the CT layer had to be etched to stop charge spreading on the nitride. In this they seem to have succeeded, since there is no change after a multiplicity of cycling tests, too many to go into detail here. The results indicate that there is no lateral charge spreading on the nitride CT layer.

Since the CT dielectrics do not need to be cut, this avoids any in-process damage at the edge of the dielectric; an advantage over the cut-dielectric version of CT-flash, but also over floating-gate flash, since these days the floating gate and STI are defined simultaneously, and the FG edge and tunnel dielectric are vulnerable.

The other Macronix paper (5.6) details a study of fast initial charge loss in CT-flash devices, including BE-SONOS, where the Vt shifts within a second of programming, and then saturates at a stable value. They minimised this by optimizing the film stack, and by refill programming to duplicate program levels.

Macronix has done a lot of work on the different CT-flash technologies, but BE-SONOS seems to be a particularly pragmatic form, and a viable alternative to FG-flash – will we see it in production any time soon?

That was the end of day 1 of the conference; there were other papers that I missed, but I will be trying to review them in the next few weeks; meanwhile part 2 of IEDM retrospective will be up in a few days, covering the final two days of the meeting.

IEDM Next Week!

Next Sunday the great and the good of the electron device world will be gathering in San Francisco for the 2010 IEEE International Electron Devices Meeting. To quote the conference web front page, “IEDM has been the world’s main forum for reporting breakthroughs in technology, design, manufacturing, physics and the modeling of semiconductors and other electronic devices.”

From my perspective at Chipworks, focused on chips that have made it to production, it’s the conference where companies strut their technology, and post some of the research that may make it into real product in the next few years.

In the last few days I’ve gone through the advance program, and here’s my pick of what I want to try and get to, in more or less chronological order. As usual there are overlapping sessions with interesting papers in parallel slots, but we’ll take the decision as to which to attend on the conference floor.

On Sunday December 5th, we start with the short courses, “15nm CMOS Technology” and “Reliability and Yield of Advanced Integrated Technologies”. Kelin Kuhn of Intel has organised the former, and we have some impressive speakers – Thomas Skotnicki (ST – Trends and Scaling), Mukesh Khare (IBM – Device Challenges), Sam Sivakumar (Intel – Lithography), Yoshihiro Hayashi (Renesas – BEOL), and Clive Bittlestone (TI – Device/Circuit Interactions).

Having started in the business on 10-micron geometries, 15-nm devices seem crazy to me, but on the Intel clock it’s only three years away! I’m starting to tell folks to think about the end of silicon, at least as we know it, since my brain will not wrap around the idea of 11- and 8-nm gates, and 11-nm is only five years away (and 30 – 40 atoms across, depending on orientation!). The guys in the R&D labs have been thinking about that for the last decade or so (as we’ve seen at IEDM), so this should be an interesting day to see what they’ve come up with and how we get there.

The other side of the technology coin is reliability at these advanced nodes, and IMEC’s Guido Groeseneken has set up the other short course, with a slightly more academic slate of instructors. We have Ben Kaczer (IMEC – FEOL), Shinichi Ogawa (NIAIST – BEOL), Christian Russ (Infineon – ESD), Ashraf Alam (Purdue – Reliability-Aware Design), and Andrzej Strojwas (PDF Solutions – Yield, Yield models, and DFM). Another good day – although the courses make a long Sunday, from 9 a.m. to 5.30 p.m., it’s worth sticking around to the end.

Monday morning we have the plenary session; a couple of good ones here, kicking off with Kinam Kim of Samsung discussing silicon’s future (will it get beyond 11 nm?), followed by Arunjai Mittal of Infineon on potential energy savings through the use of semiconductors – energy efficiency is one of the major themes of the conference this year. The third plenary is on bionanoscience in healthcare, a whole new area to me; the plenaries traditionally make the link between semiconductors and other fields of science.

After lunch we get to the conference proper. Straight into session 2, we have a set of 3D integration papers, by TSMC (paper 2.1), on integration at 28 nm, TSV-induced stress on HKMG (high-k, metal-gate) devices by Panasonic/Qualcomm/Samsung/IMEC/Newcastle U (2.2), and IBM (tungsten TSVs – 2.4), and some more academic institutions.

Session 5 on memory technology is split between traditional floating gate flash papers by Samsung (5.1) and Intel/Micron (5.2), and charge-trapping MONOS/SONOS flash, with two papers by Macronix (5.5, 5.6). Micron and Samsung are touting their 25 and 27-nm (respectively) NAND flash technologies; Micron has solved the interline capacitance (wordline/wordline and bitline/bitline) problem by using airgaps:

Micron 25-nm NAND Flash – X-Sections of Wordlines and Select Transistors (top) and Bitlines (bottom) Source: Micron/IEDM

To my knowledge this will be the first volume production of air-gap technology in any product, even though there have been announcements by IBM and others discussing its use in the metal-dielectric stack. It’s kind of ironic that this is in the process front end! Laura Peters has been previewing a number of IEDM papers at ElectroIQ, and more details of this one can be found here. We’ll see how Samsung gets around the same problem…

There’s also a paper from NCSU (5.3) discussing TaN floating gates down to 1 nm thick; if it works it will be a step towards vertically shrinking the NAND flash stack, something that hasn’t happened much so far in the conventional two-polySi gate structure.

Intel have a couple of papers (6.1, 6.7) indicating where they may go at the 15- and 11-nm generations; both detailing Quantum-Well Field Effect Transistors (QWFETs), the former InGaAs finFETs, and the latter strained germanium pFETs.

Come 6.30 there’s the reception, a chance to see folks we haven’t seen since last year, or at least since Semicon West, the last tech-fest I was able to get to. Bring ear-plugs – a thousand-plus engineers talking at the same time make a lot of noise!

Tuesday morning there’s a session on high-k and channel engineering, with the IBM Alliance tuning pFET VTH with a Ge implant (11.4, Laura’s take here), and TSMC/Nanyang U discussing gate stack annealing in their gate-last process (11.6, Laura).

Session 12 features another group of memory papers; Hynix is giving an invited review paper (12.4), and is co-affiliate with Grandis on a spin-torque RAM (12.7); and IBM (12.5) and Samsung (12.6) have papers on the same topic.

Session 13 is a highlight session of invited papers on “Next Generation Power Devices and Technology”, covering the field from silicon and silicon carbide to gallium nitride devices.

In the field of image sensors, TSMC has an invited paper (14.1) on a 0.9 µm pixel BSI (backside illumination) image sensor and the scaling challenges involved. TSMC fabs the sensors for Omnivision, which are now at the 1.4 µm BSI generation commercially, so maybe we’ll see this one in a couple of years. Omnivision has now migrated to a 300-mm copper process for their 1.1 µm pixel part, just being launched. This is how we get 12-Mpixel cameras in a cell phone!

Cross-Section of Omnivision OV5642 1.4 µm-Pixel BSI Image Sensor
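For a sense of scale, here is a quick sketch of the pixel-array size for a 12-Mpixel sensor at the two pixel pitches mentioned. The 4000 × 3000 array format is my assumption (a typical 4:3 layout), and the numbers cover the pixel array only, not the surrounding circuitry.

```python
import math


def array_size_mm(cols, rows, pixel_pitch_um):
    """Return (width, height, diagonal) of the pixel array in millimetres."""
    width = cols * pixel_pitch_um / 1000.0
    height = rows * pixel_pitch_um / 1000.0
    return width, height, math.hypot(width, height)


COLS, ROWS = 4000, 3000   # assumed 4:3 format giving 12 Mpixel

for pitch_um in (1.4, 1.1):
    w, h, d = array_size_mm(COLS, ROWS, pitch_um)
    print(f"{pitch_um} um pixels: {w:.1f} x {h:.1f} mm, diagonal {d:.1f} mm")
```

Going from 1.4 µm to 1.1 µm pixels cuts the array area by roughly 40% for the same pixel count, which is what makes it fit in a phone-sized camera module.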

The IEDM conference lunch speaker is Jim Clifford of Qualcomm, on the evolution of their chipsets and the technology required – we have just seen their first 45-nm part, and they are leaders in multichip packages; maybe we’ll get a hint of their 28/32 nm and TSV plans.

Macronix has a third paper (19.2) in the afternoon memory session on tungsten oxide resistive memory; and in the power session Panasonic (20.5) describes a high-voltage (1300+ V) AlGaN/GaN on silicon device, and TSMC talks high-performance LDMOS (20.8). In general the afternoon has a preponderance of academic speakers, with other sessions on device/circuit interactions, advanced processes, thin film transistors, memory simulation, and graphene (sessions 17, 18, and 21 – 23).

At 5.15 we have the first of three sponsored events; Applied Materials is holding a technical symposium, “Is Moore’s Law Taking Us in a New Direction? The Future of Transistor Technology”, around the corner at the Wyndham Parc 55 hotel, with a slate of speakers from GLOBALFOUNDRIES, IBM, Qualcomm, ST and other companies.

And if that’s not enough, there are the conference panel sessions back at the Hilton at 8 p.m. – “Heterogeneous Device Integration as Enabler of Functional Diversification for More than Moore”, which promises to range from nanomaterials to 3D chip stacking; and “Power Crunch – Threat or Opportunity?”, discussing power optimization at the transistor, circuit, and system level.

By the end of those (if I’ve lasted that long) I will surely be getting into information overload, so I hope I sleep well, ready for session 27 on Wednesday morning, which covers off the advanced HKMG CMOS papers.

TSMC are discussing a 22-nm FinFET process (27.1), Intel (27.2) have a HKMG RFCMOS review, Qualcomm (27.3) are talking 28-nm low-power SoC technology (gate-first or gate-last – we’ll see!), and IBM (27.5) are updating the 32-nm eDRAM work they presented last year. The odd paper out (27.4) is a more theoretical study by Texas Instruments of the way 1/f noise is affected by layout features such as active/active spacing and dual stress liner boundaries.

Cross and Longitudinal (right) Sections of TSMC 22-nm FinFET (Source: TSMC/IEDM)

In the parallel sessions Renesas is detailing microwave annealing of NiPt silicide (26.1), STMicroelectronics et al. (29.1) and NXP-TSMC (29.2) have phase-change memory papers, and Hynix (29.7) is showing off a 3D NAND-flash memory cell.

Lunchtime, ASM is holding their fourth annual seminar with their own speakers and Mike Chudzik from IBM, on ALD and epitaxy in CMOS.

Afternoon session 33 (novel processes) kicks off with an invited talk by Ichiro Mori of SELETE (33.1) on their EUV results; I’m not sure the concept of EUV is novel any more, but it’ll be interesting to see how far things have come.

Renesas has a paper on embedded DRAM with MIM capacitors in porous low-k (33.3), continuing the technology we have seen from the former NEC in the Nintendo Wii – now in volume production in their 55-nm process.

Embedded DRAM Capacitor Stack in NEC-Fabbed Memory Die From Nintendo Wii

A little later there is a talk by the IBM consortium on 32 nm BEOL using copper with a copper/manganese seed layer (33.5), followed by TSMC (33.6) discussing chip/package interactions when extreme low-k dielectrics are used.

In parallel sessions, TSMC have another FinFET paper (34.1), and a breakdown study of low-k dielectrics (35.2). Toshiba have an interesting failure analysis study (35.3) looking at anomalous phosphorus diffusion by scanning spreading resistance imaging, followed by U. Cal, IMEC, and Infineon (35.4) examining the effect of strain on ESD protection devices. Laura P. adds detail at ElectroIQ here.

By Wednesday afternoon a lot of attendees will be heading for home, and I’m usually thankful when the last paper’s done, but that’s not the end this year! The SOI Industry Consortium is holding a workshop on fully depleted SOI starting at 5 p.m., with some notable speakers from academe and industry. It will be in the Hilton, preceded by a reception and followed by a buffet supper to aid the weary bones and brain cells.

So as always, no peace for the curious! I will be trying to post a more detailed blog as the conference unfolds, but given all the interesting topics being covered, time may be at a premium. I hope to see you there!