

PDF issue: 2024-11-01

# Robust Design of Embedded SRAM on Deep-Submicron Low-Power SoC

Nii, Koji

(Degree) Doctor of Engineering

(Date of Degree) 2008-03-25

(Date of Publication) 2013-02-06

(Resource Type) doctoral thesis

(Report Number) 甲4351

(URL) https://hdl.handle.net/20.500.14094/D1004351

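※ This content is a scholarly work of Kobe University. Unauthorized reproduction and improper use are prohibited. Please use it appropriately within the scope permitted by copyright law.

※ This thesis file cannot be printed.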



#### **Doctoral Dissertation**

# Robust Design of Embedded SRAM on Deep-Submicron Low-Power SoC

"A Study on Robust Design of On-Chip SRAM for Low-Power SoCs Using Ultra-Fine Process Technology"

January 2008

Graduate School of Science and Technology Kobe University

Koji Nii

### **Abstract**

This thesis reports robust design techniques for static random access memory (SRAM) in deep-submicron low-power system-on-chip (SoC) devices. To miniaturize mobile instruments and extend their battery life, SoC devices offering low power, high integration, and low cost are in demand. Embedded memory, which is typically SRAM, has become the key component of SoC devices because it increases the storage capability and occupies most of a die's area. Therefore, it is necessary to study methods to shrink it while retaining high performance and without sacrificing its low-power benefits. This study specifically examines embedded SRAM design techniques that address several limitations of deep-submicron technology.

First, the background of this research area and the objective of this study are explained. In the second part, the fundamentals of an SRAM are described briefly. Critical issues remain related to scaling down of embedded SRAM to meet the challenge of Moore's Law for deep submicron technology. The main issues of an SRAM can be summarized as three limitations: power dissipation, device variations, and access ability per unit time. The reason underlying each limitation is described to provide a better understanding of the objective of this study.

In the subsequent four parts, practical robust design techniques to surmount each limitation are demonstrated. Power reduction techniques, which target both standby leakage in sleep mode and active power dissipation during read and write operations, are discussed first. The substrate back-bias control technique is shown to reduce the subthreshold leakage in memory cell arrays while maintaining the stored data. A gate tunneling leakage reduction technique is also introduced for sub-100-nm CMOS technology. The dynamic source-biasing technique is demonstrated to reduce not only the standby leakage but also the operating power, including the power consumed in charging and discharging capacitances, in the memory cell array and peripheral circuit. Evaluation results using 90-nm and 65-nm CMOS technology confirm the effective power reduction of the SRAM by the proposed circuit techniques.

In the fourth part, a robust SRAM design technique against process variation and temperature variation is described. To enhance the read stability and write stability of SRAM memory cells, a wordline suppression technique and a dynamic power-source-line bias control technique are proposed. These circuits lower the minimum operating voltage more effectively than other circuit techniques. It is shown that the stabilities are adequately compensated against variations in transistor threshold voltage and temperature. A test chip is designed and fabricated using 45-nm advanced CMOS technology. Test results verify that the minimum operating voltage is reduced using these circuits.

In the fifth part, a high-density two-port SRAM design technique is reported. Recent application processors demand memory IP blocks not only as single-port SRAM, but also as dual-port SRAM to perform parallel operations. Their storage capacities tend to increase as scaling progresses. An access scheme is proposed that circumvents simultaneous access to a common row when both ports access the dual-port SRAM at the same time. This access scheme reduces the two-port SRAM cell area because it permits smaller transistors than those of typical two-port SRAMs. The scheme is also expected to reduce the standby leakage. The test results for a prototype two-port SRAM designed and fabricated using 65-nm technology are shown in this part.

In the sixth part, SRAM design under a dynamic voltage and frequency scaling (DVFS) environment is discussed. The main objective of DVFS is to reduce power consumption in accordance with the workload. The supply voltage is lowered dynamically to reduce the power when the workload is light. However, it is difficult to lower the supply voltage of SRAM IP blocks because of the increase in process variations. Although the design solution described in the fourth part is expected to improve the minimum operating voltage (*VDDmin*) of an ordinary 6T single-port SRAM, it will eventually confront the limit of the SRAM *VDDmin* because of the degraded SRAM stability. For that reason, alternatives to the 6T single-port SRAM cell, namely an 8T read-margin-free cell and several types of 10T read-margin-free cells, are introduced. Their advantages and disadvantages are discussed.

The conclusion of this study is described in the last part, along with remaining problems for developing the deep sub-micron SRAM. The latest robust design technique to improve the stability is also introduced. Lastly, the results of these studies are summarized with comments related to future work for 32-nm or 22-nm advanced device technology and beyond.

# **List of Figures**

- Figure 1.1: System-on-Chip (SoC):
	- (a) 32-bit RISC microcontroller; (b) Media application processor
- Figure 1.2: Prediction of memory occupation within a die
- Figure 1.3: DRAM and SRAM bitcells:
	- (a) DRAM bitcell; (b) High impedance SRAM bitcell; (c) Full-CMOS SRAM bitcell
- Figure 1.4: Outline of this thesis
- Figure 2.1: List of input pins and output pins for the typical embedded 1-port SRAM macros and a timing diagram
- Figure 2.2: Schematic of Full-CMOS 6T SRAM bitcell
- Figure 2.3: Layouts of 6T SRAM bitcell:
	- (a) Conventional orthogonal type; (b) Double well-boundary thin type
- Figure 2.4: Scaling trend of 6T SRAM bitcell
- Figure 2.5: Layout plot of $4.0 \mu m^2$ 6T SRAM bitcell for 0.18 $\mu m$ technology
- Figure 2.6: Top view of $0.49 \mu m^2$ 6T SRAM bitcell for 65 nm technology
- Figure 2.7: Leakage current flows of a MOSFET
- Figure 2.8: Dependences of drain current versus gate-source bias
- Figure 2.9: Global and local  $V_{th}$  variations
- Figure 2.10: Device model of local variations of transistors
- Figure 2.11: Pelgrom plot
- Figure 2.12: Static noise margins (SNMs) for various  $V_{th}$  combinations:
	- (a) With global  $V_{th}$  variation; (b) With local  $V_{th}$  variation
- Figure 2.13: Measured distributions of the SNMs for 45-nm SRAM bitcell
- Figure 2.14: Distributions of  $I_{ds}$  versus  $V_{th}$  for each SRAM transistors
- Figure 3.1: Concept of ABC-MT-CMOS
- Figure 3.2: ABC-MT-CMOS circuit and waveform
- Figure 3.3: Simulated leakage current for a memory cell
- Figure 3.4: Microphotograph of test chip
- Figure 3.5: Power dissipation versus supply voltage
- Figure 3.6: Measured leakage current
- Figure 3.7: Gate leakage current versus *Tox*
- Figure 3.8: Gate leakage current versus gate voltage ( $EOT = 2.0$  nm)
- Figure 3.9: Leakage model in 6T SRAM cell
- Figure 3.10: Gate leakage current suppression in a cell
- Figure 3.11: Standby leakage current of 6T SRAM cell
- Figure 3.12: Static noise margin of 6T SRAM cell
- Figure 3.13: Wordline driver circuit with *Ig* suppression (a), (b), and simulated timing waveform (c)
- Figure 3.14: Schematic diagram of LDLC
- Figure 3.15: Schematic diagram of 32-kB SRAM with *Ig* suppression
- Figure 3.16: Simulated waveform in read operation
- Figure 3.17: Microphotograph of 32-kB SRAM
- Figure 3.18: Layout plot of LDLC
- Figure 3.19: Standby leakage measured for 32-kB SRAM
- Figure 3.20: Active power reduction by column-based source bias control
- Figure 3.21: Reduction of cell current
- Figure 3.22: Simulated waveform of DCB SRAM
- Figure 3.23: Measured static noise margin in reading operation
- Figure 3.24: Die photograph of 32-kB DCB SRAM
- Figure 3.25: Estimated active power reduction
- Figure 3.26: Estimated leakage reduction
- Figure 4.1: 6T SRAM cell schematic and butterfly curves under worst and best conditions
- Figure 4.2: Simulated static noise margin (SNM) versus NMOS  $V_{th}$
- Figure 4.3: SNM improvement by lowering wordline voltage  $(V_{WL})$  in cases with and without local  $V_{th}$  variations
- Figure 4.4: Schematics of read assist circuits (RAC)
- Figure 4.5: Simulation result of WL voltage  $(V_{WL})$  depending on NMOS  $V_{th}$
- Figure 4.6: Operating analysis of read assist circuit by SPICE simulation
- Figure 4.7: Practical read assist circuit enhanced sensitivity of process variation
- Figure 4.8: Layout of each passive resistance for proposed RAC
- Figure 4.9: Resistance sensitivities depend on critical dimension (CD) shift.
- Figure 4.10: Comparison of WL voltage  $(V_{WL})$  in RAC with enhanced gate controller (GC) and without GC
- Figure 4.11: Simulated waveform of proposed RAC
- Figure 4.12: Write assist circuit (WAC) improving write ability.
- Figure 4.13: The simulated waveform of the ary-VDM and dmy-VDM in the write status
- Figure 4.14: Voltage of ary-VDM depending on the division number
- Figure 4.15: Comparison of the write ability by dc simulation result of the write-trip-point
- Figure 4.16: Estimation of power reduction in the write cycle
- Figure 4.17: Simulated waveform in the read and write cycle
- Figure 4.18: SEM images of 6T SRAM cells:
	- (a)  $0.327 \mu m^2$  normal cell; (b)  $0.245 \mu m^2$  high-density cell
- Figure 4.19: Measured SNMs: (a)  $0.327 \mu m^2$  normal cell; (b)  $0.245 \mu m^2$  high-density cell
- Figure 4.20: Measured dc characteristics of 6T SRAM cells
- Figure 4.21: Die photograph of a test chip
- Figure 4.22: Layout plot of 256-kbit SRAM macro
- Figure 4.23: Shmoo plot
- Figure 4.24: Measured minimum operating voltage (*VDDmin*) of 512-kbit SRAM at worst temperature
- Figure 5.1: System block diagrams and timing charts of the memory access:
	- (a) Sequential memory access; (b) Parallel memory access
- Figure 5.2: SRAM memory cell circuits:
	- (a) 1RW 6T-SRAM cell; (b) 1R1W 8T-SRAM cell; (c) 2RW 8T-SRAM cell
- Figure 5.3: Assortment of the access modes of the dual-port SRAM: (a) Different row and column access; (b) Different row and common column access; (c) Common row and Different column access; (d) Common row and column access
- Figure 5.4: Butterfly curves and static noise margin of the DP 8T-cell for both common row access and different row access
- Figure 5.5: Concept of proposed circumventing simultaneous common-row-access
- Figure 5.6: Block diagram and timing chart of proposed access scheme
- Figure 5.7: Circuit of row-address comparator (RAC)
- Figure 5.8: Circuit of bitline shifter for secondary port
- Figure 5.9: Scaling trend of SRAM memory cell size
- Figure 5.10: The 8T DP-cell layout
- Figure 5.11: Top view SEM images of 8T DP-cell after poly-etching and second metal copper dual-damascene interconnect: (a) FEOL; (b) BEOL
- Figure 5.12: Measured SNM for conventional and proposed 8T DP-cells
- Figure 5.13: Stability analysis by  $V_{th}$  curve simulations
- Figure 5.14: Estimation of the stand-by leakage of the 8T DP-cell by SPICE Simulation
- Figure 5.15: Die photograph of a test chip
- Figure 5.16: Layout plot of fabricated 32-kB UHD-DP-SRAM macros
- Figure 5.17: Comparisons of the bit-density and cell size ratio
- Figure 5.18: Measured SNMs
- Figure 5.19: Shmoo plot
- Figure 6.1: Alternative 6T SRAM bitcells:
	- (a) 8T single-end type; (b) 10T single-end type; (c) 10T differential type
- Figure 6.2: Conventional memory cell circuits:
	- (a) Conventional 2-port SRAM bitcell; (b) 2-port cell for gate-array
- Figure 6.3: Proposed 2-port memory circuit with a read buffer
- Figure 6.4: Simulated waveforms of RBLs
- Figure 6.5: Dependence of delay time on the supply voltage
- Figure 6.6: Microphotograph of test chip
- Figure 6.7: Shmoo plot
- Figure 6.8: Proposed 10T SRAM cell circuit:
	- (a) Column source bias type; (b) Column gate bias type
- Figure 6.9: Enhancement of SNM
- Figure 6.10: Microphotograph of test chip for 45 nm technology
- Figure 6.11: Top views of 10T SRAM bitcells:
	- (a) Column source bias type; (b) Column gate bias type

# **List of Tables**

- Table 3.1: Test chip features
- Table 3.2: Target specifications of leakage for 90-nm low-power CMOS technology
- Table 3.3: Transistor size of 6T SRAM cell
- Table 3.4: Standby leakage current of 6T SRAM cell
- Table 3.5: State of cell block
- Table 3.6: Process features for 90-nm low-power applications
- Table 3.7: Features of 32-kB SRAM
- Table 4.1: Features of fabricated 512-kbit SRAM macros
- Table 5.1: Dimension of 8T DP-cells
- Table 5.2: Features of fabricated dual-port SRAM macros
- Table 6.1: SRAM contents for test chip

# **Chapter 1 Introduction**

### **1.1. Background of Research Area**

Semiconductor devices are ubiquitous today. They are solid-state devices that exploit the electrical conduction characteristics of semiconductor materials. Among them, integrated circuits (ICs) using metal oxide semiconductor field effect transistors (MOSFETs) are already used widely. These semiconductor ICs are used not only for various digital instruments such as personal computers, radios, video cameras, televisions, games, and phones, but also for automobiles, home security devices, medical instruments, and industrial instruments. In particular, mobile applications such as mobile phones, laptop computers, and wearable audio and video appliances prevail in our modern lifestyle worldwide. To miniaturize these instruments and extend their battery life, system-on-chip (SoC) devices offering low power, high integration, and low cost are in demand. SoCs are very-large-scale integrated (VLSI) circuits or ultra-large-scale integrated (ULSI) circuits that include many intellectual property (IP) blocks with over a million transistors within a die. Consequently, many of the discrete ICs on a system board can be replaced by a single SoC device. This substitution reduces the number of ICs and the system board size, contributing to down-scaling and cost reduction of mobile instruments. The power dissipation is also expected to be suppressed by this integration into one SoC device.

Figure 1.1: System-on-Chip (SoC): (a) 32-bit RISC microcontroller (MCU); (b) SH-Mobile media application processor

As described above, SoC devices have many IP blocks within a die. These IP blocks include user logic, CPUs, memories, input/output interfaces, analog circuits, and so on. For example, Fig. 1.1 shows microphotographs of dies for a 32-bit RISC microcontroller [1] and a mobile application processor [2]. Memory IPs in particular have become key components in SoC devices because they play a crucial role in increasing the storage capability and occupy most of a die's area. As presented in Fig. 1.2, the International Technology Roadmap for Semiconductors (ITRS) 2003 Edition predicts that memory will occupy about 90% of the chip area in 2013 [3]. Therefore, it is necessary to study feasible means to shrink it while retaining high performance and low power.

The two types of available memory devices are volatile memory and nonvolatile memory. The former loses stored data when the power supply is turned off. The latter retains the stored data even if the power supply is turned off. Typical volatile memory devices are dynamic random access memory (DRAM) and static random access memory (SRAM). Typically, the DRAM bitcell consists of one transistor (1T) and one capacitor (1C) with a wordline and a single-ended bitline, as presented in Fig. 1.3(a). On the other hand, the SRAM bitcell consists of four transistors (4T) with high-resistance loads or six transistors (6T), with a wordline and a complementary bitline pair, as shown respectively in Figs. 1.3(b) and 1.3(c). Both memories present advantages and disadvantages. Although the SRAM cell area is larger than that of a DRAM, the SRAM offers higher access speed and requires no additional circuit for refreshing data. Moreover, it is more process-friendly for SoC products because, unlike the DRAM, it requires no additional process steps. Consequently, the SRAM has been used widely for SoC products.

Figure 1.2: Prediction of memory occupation within a die

Figure 1.3: DRAM and SRAM bitcells

Commodity SRAM chips, which operate only as memory devices, can use customized process technology to minimize the memory cell area. For example, a bitcell with highly resistive loads using multiple layers of poly-silicon, or a bitcell with thin-film transistors (TFTs) using 3D-stacked devices, reduces the memory cell area and saves manufacturing costs by reducing the die size of the commodity SRAM chip. For SoC devices, however, the on-chip memory IP, which is generally called *embedded memory*, must be evaluated in terms of the total chip cost because the memory and logic areas are combined within a die. The 3D-stacked TFT type of bitcell requires additional process steps, as does a DRAM cell. Therefore, this type of memory cell is not amenable to logic CMOS technology because of the increased manufacturing costs. The type of bitcell with highly resistive loads also requires additional process steps; moreover, it has the disadvantage of increasing the standby leakage and decreasing the operating stability. Accordingly, from 0.25-μm CMOS technology onward, the 6T full-CMOS type of memory cell with one-layer poly-silicon and multiple metal layers, which is core-logic-friendly, is the major type of embedded SRAM for advanced SoC devices because it incurs no additional manufacturing costs.

Against the background described above, this study specifically examines embedded SRAM design techniques for low-power SoC applications. In practical terms, this thesis mainly discusses robust design techniques of embedded SRAM for advanced logic CMOS technology in the deep-submicron generation. As described in the next section, many issues and requirements pertain to the design of embedded SRAM on advanced deep-submicron CMOS devices. This study is intended to overcome these device issues and to contribute to meeting the requirements of an advanced SoC product.

### **1.2. Objective of this Study**

In deep-submicron technology, SoC products require high-speed, low-power embedded memory to support increased storage capabilities. As devices are scaled, many issues arise from device limitations in keeping up with the miniaturization trend. Although scaling of both the supply voltage and the threshold voltage of transistors enables low-power, high-speed operation, it considerably increases the static leakage current. A lower transistor threshold voltage increases the subthreshold leakage current flowing from drain to source in off-state transistors. A thinner gate oxide substantially increases the direct tunneling leakage, which becomes non-negligible for 90-nm technology and beyond. This increased leakage strongly affects the performance of embedded SRAM IPs, which occupy a large area of SoC devices. For low-power operation, leakage reduction by circuit techniques for embedded SRAM is demanded in addition to device improvement.

Embedded SRAM in deep-submicron CMOS technology for SoCs is facing a crisis of increasing threshold voltage ($V_{th}$) variation within a die caused by doping fluctuation or line edge roughness (LER). A large $V_{th}$ variation induces asymmetry in the DC characteristics, which deteriorates both the static noise margin (SNM) and the write capability. This variation-induced stability degradation of the 6T SRAM cell becomes more apparent as the total memory capacity on a chip increases, and it is a crucial factor preventing scaling down of the SRAM supply voltage. Up to the 65-nm technology, 6T SRAM cell sizes shrank according to the scaling trend, each being half the size of that of the previous generation. However, in the sub-50-nm era, it has become more difficult to keep up with the scaling trend because of increasing local $V_{th}$ variation. Lithography and process technology also impede scaling of the SRAM cell size, but the main constraint is that the transistor sizes in an SRAM cell cannot be shrunk because local $V_{th}$ variations increase with scaling. Consequently, this problem must be overcome by design solutions as well as by reducing the variation through device optimization.
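The link between transistor size and local $V_{th}$ variation can be illustrated with Pelgrom's area law, $\sigma(V_{th}) = A_{VT}/\sqrt{WL}$. The following is only a minimal sketch: the coefficient `A_VT` and the transistor dimensions are assumed values chosen for illustration, not data from this study.

```python
import math

def sigma_vth_mv(a_vt_mv_um: float, width_um: float, length_um: float) -> float:
    """Standard deviation of local Vth (mV) from Pelgrom's area law."""
    return a_vt_mv_um / math.sqrt(width_um * length_um)

# Hypothetical Pelgrom coefficient and gate dimensions (assumed, not from the thesis).
A_VT = 3.0  # mV*um
big = sigma_vth_mv(A_VT, width_um=0.20, length_um=0.10)
small = sigma_vth_mv(A_VT, width_um=0.10, length_um=0.05)  # W and L both halved

print(f"sigma(Vth): {big:.1f} mV -> {small:.1f} mV")
```

Under this model, halving both the gate width and the gate length doubles the local variation, which is why the transistors in an SRAM cell cannot simply be shrunk along with the technology node.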

SRAMs face power dissipation limits when the clock frequency is raised to improve SoC performance as technology advances. Accordingly, system architectures have moved to parallel operation, increasing the practical computation speed through parallel processing rather than through increased clock frequency. For that reason, the number of memory accesses has increased considerably, rendering the memory access speed a system bottleneck. To date, most embedded memory is single-port SRAM, which has one access port for reading and writing operations, although the demand for multi-port SRAM continues to increase to accommodate high-speed communications and image processing. Multi-port SRAM is suitable for such parallel operation and improves the total chip performance. Although the memory access speed, measured in the number of clock cycles, improves as the number of SRAM access ports increases, the area penalty also increases with the number of ports. Consequently, high-density multi-port SRAM that can provide a capacity comparable to that of single-port SRAM (SP-SRAM) has recently been demanded for SoC chips.

Dynamic voltage and frequency scaling (DVFS) is expected to reduce the total power consumption of a chip. The main objective of DVFS is to reduce the power consumption in accordance with the workload. The supply voltage is lowered dynamically to reduce the power when the workload is light. However, it is difficult to lower the supply voltage of SRAM IP blocks because of the increase in process variations. A design solution can lower the minimum operating voltage (*VDDmin*) of a normal 6T single-port SRAM. However, it will eventually face the limit of the SRAM *VDDmin* because of the degradation of SRAM stability. Therefore, alternatives to the 6T single-port SRAM cell that are more robust against device variations are demanded for the DVFS environment.
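The benefit of lowering the supply voltage under a light workload can be estimated with the first-order CMOS switching-power relation $P = \alpha C V_{DD}^2 f$. The sketch below is illustrative only: the capacitance, voltages, and frequencies are assumed operating points, not values from this thesis, and leakage power is ignored.

```python
def dynamic_power(c_eff_f: float, vdd_v: float, freq_hz: float,
                  activity: float = 1.0) -> float:
    """First-order switching power in watts: activity * C * VDD^2 * f."""
    return activity * c_eff_f * vdd_v ** 2 * freq_hz

# Hypothetical operating points (assumed values).
C_EFF = 1e-9  # effective switched capacitance in farads
p_high = dynamic_power(C_EFF, vdd_v=1.2, freq_hz=400e6)  # heavy workload
p_low = dynamic_power(C_EFF, vdd_v=0.8, freq_hz=200e6)   # light workload

print(f"power ratio (light / heavy): {p_low / p_high:.3f}")
```

Because the voltage enters quadratically, most of the saving comes from the voltage reduction rather than from the frequency reduction, which is why the SRAM *VDDmin* becomes the limiting factor in a DVFS system.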

### **1.3. Overview of this Thesis**

Figure 1.4 presents the outline of this thesis. First, the background and objective of this study are described. For deep-submicron technology, critical issues related to embedded SRAM are pointed out in Chapter 2. The main issues of an advanced SRAM design are summarized as three limitations: power dissipation, device variation, and access ability per unit time. An explanation for each limitation is provided to enhance understanding of the study objective.

In the next four chapters of this thesis, practical robust design techniques against each limitation are demonstrated. In Chapter 3, power reduction techniques, which target both standby leakage in sleep mode and active power dissipation during read and write operations, are discussed first. The substrate back-bias control technique is shown to reduce the subthreshold leakage in memory cell arrays while maintaining the stored data. A gate tunneling leakage reduction technique is also introduced for sub-100-nm CMOS technology. The dynamic source-biasing technique is demonstrated to reduce not only standby leakage but also operating power in an SRAM cell array and peripheral circuit. An evaluation using 90-nm and 65-nm CMOS technologies confirms the effective reduction of the SRAM power using the proposed circuit techniques.

Figure 1.4: Outline of this thesis

In Chapter 4, an SRAM design technique that is robust against process variation and temperature variation is discussed for low-power or low-standby-power CMOS technology. To enhance the read stability and write stability of SRAM memory cells, a wordline suppression technique and a dynamic power-source-line bias control technique are proposed. These circuits lower the minimum operating voltage more effectively than other circuit techniques. It is shown that the stabilities are adequately compensated against variations in transistor threshold voltage and temperature. A test chip is designed and fabricated using 45-nm advanced CMOS technology. Test results show that the minimum operating voltage is lowered by these circuits.

In Chapter 5, a high-density two-port SRAM design technique is reported. Recent application processors demand memory IP blocks not only as single-port SRAM but also as dual-port SRAM to perform parallel operations. Their storage capacities tend to increase as scaling advances. An access scheme that circumvents simultaneous access to a common row by both ports is presented in this chapter. This access scheme reduces the two-port SRAM cell area by permitting smaller transistors than those of a typical two-port SRAM. It is also expected to reduce the standby leakage. Test results of a prototype two-port SRAM, which is designed and fabricated using 65-nm technology, are presented in this chapter.

In Chapter 6, SRAM design for a dynamic voltage and frequency scaling (DVFS) environment is discussed. Alternatives to the 6T single-port SRAM cell, namely an 8T read-margin-free cell and several types of 10T read-margin-free cells, are introduced; their advantages and disadvantages are discussed.

The overall conclusion of this contribution is presented as a summary in Chapter 7.

# **Chapter 2 Issues of an Embedded SRAM for Deep Submicron Logic CMOS Technology**

For sub-100-nm advanced logic CMOS technology, embedded SRAMs for SoCs are facing a crisis of increasing power dissipation and threshold voltage ($V_{th}$) variation. Meanwhile, the number of memory accesses is increasing in systems, causing the memory access speed to become a system bottleneck. In this chapter, the fundamentals of the embedded SRAM and the scaling trend of the bitcell are described briefly. The relations of these issues to SRAM in the deep-submicron CMOS technology era are then introduced.

### **2.1. Introduction**

#### **2.1.1. Embedded SRAM Macros**

Figure 2.1 shows a list of input pins and output pins for typical embedded 1-port SRAM macros along with a timing diagram. The symbols CLK, CE\_N, WE\_N, and OE\_N respectively denote the clock input, cell enable control, write enable control, and output enable control input signals. In the figure, ADD<*n*>, DI<*m*>, and DO<*m*> denote the address input bus, data input bus, and data output bus, where the integers *n* and *m* depend respectively on the word depth and the data bit width. For instance, *n* = 8 and *m* = 16 correspond to an SRAM macro with a 256-word ($2^8$) depth and a 16-bit width. Most SoC devices demand many SRAM instances in many different configurations. The SRAM macro operates synchronously with the system clock CLK; the other inputs CE\_N, WE\_N, ADD<*n*>, and DI<*m*> must satisfy the set-up and hold times relative to the positive clock edge in every clock cycle. When the control input CE\_N is disabled (in this case, CE\_N="H"), the SRAM macro is in sleep status: it is not accessed for reading or writing. If CE\_N is enabled (CE\_N="L"), the SRAM performs a read or a write depending on the WE\_N input level. When WE\_N="H", the SRAM reads out the stored data bits of the memory cells addressed by ADD<*n*>. Conversely, when WE\_N="L", the SRAM writes the data bits to the memory cells addressed by ADD<*n*>. OE\_N switches the output bus between the enabled status and the high-impedance status.
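The pin-level behavior described above can be summarized as a small behavioral model. This is a minimal sketch rather than the macro's actual implementation: the class and method names are hypothetical, an *n*-bit address bus is assumed to address $2^n$ words, OE\_N is assumed to be active-low like the other control pins, and the read data is assumed to be held until the next read.

```python
class SramMacro:
    """Behavioral model of a synchronous 1-port SRAM macro (illustrative only)."""

    def __init__(self, addr_bits: int, data_bits: int):
        self.depth = 1 << addr_bits        # assumed: n address bits -> 2**n words
        self.mask = (1 << data_bits) - 1   # keep written data within m bits
        self.mem = [0] * self.depth
        self.do = 0                        # read data held until the next read

    def clock(self, ce_n: int, we_n: int, add: int, di: int = 0) -> None:
        """Apply the inputs sampled at one positive edge of CLK."""
        if ce_n == 1:                      # CE_N = "H": macro is not accessed
            return
        if we_n == 1:                      # WE_N = "H": read the addressed word
            self.do = self.mem[add]
        else:                              # WE_N = "L": write the addressed word
            self.mem[add] = di & self.mask

    def output(self, oe_n: int):
        """Return DO, or None to represent high impedance (assumed OE_N = "H")."""
        return self.do if oe_n == 0 else None


ram = SramMacro(addr_bits=8, data_bits=16)       # 256 words x 16 bits
ram.clock(ce_n=0, we_n=0, add=0x12, di=0xBEEF)   # write cycle
ram.clock(ce_n=1, we_n=0, add=0x12, di=0x0000)   # CE_N disabled: ignored
ram.clock(ce_n=0, we_n=1, add=0x12)              # read cycle
print(hex(ram.output(oe_n=0)))
```

The model ignores set-up and hold timing; it captures only which operation each combination of CE\_N and WE\_N selects at a clock edge.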

#### **2.1.2. 6T SRAM cell**

Figure 2.1: List of input pins and output pins for the typical embedded 1-port SRAM macros and a timing diagram

Figure 2.2 shows a schematic of the 6T SRAM bitcell, which consists of a cross-coupled inverter pair and two transfer NMOS transistors (access transistors). Each inverter has a pull-down NMOS (drive transistor) and a pull-up PMOS (load transistor). The wordline is connected to the gate terminals of both access NMOS transistors. The bitline pair BL, /BL is connected to the source terminals of the access NMOS transistors. These transistors must be sized to achieve both a small area and sufficient stability. Empirically, each transistor is sized such that the ratio of the drain currents of the load PMOS, access NMOS, and drive NMOS is approximately 1:2:4. A detailed discussion of circuit design optimization for the 6T SRAM bitcell is omitted here because it depends strongly on the process technology.

Figure 2.2: Schematic of Full-CMOS 6T SRAM bitcell







Figure 2.3: Layouts of 6T SRAM bitcell



Figure 2.4: Scaling trend of 6T SRAM bitcell

Figure 2.3 displays typical layouts of the 6T SRAM memory cell. Figure 2.3(a) shows the conventional orthogonal type, and Fig. 2.3(b) shows the double well-boundary thin type, which is used widely for sub-micron technology. In this type, all transistors are oriented in the same direction and the poly-silicon gates are almost straight rectangles, which keeps the process variation small and simplifies the lithography, in line with design for manufacturing.

#### **2.1.3. Scaling Trends of 6T SRAM cell**

The scaling trend over the last ten years of the typical industrial SRAM bitcell for low-power CMOS technology is presented in Fig. 2.4. The cell size was halved every two years up to the 45-nm technology era. The trend might slow down beyond the 32-nm technology because of the increased device variation discussed in a later section. The conventional orthogonal type of memory cell was used up to 0.25-μm CMOS technology. For 0.18-μm technology, the double well-boundary thin type of memory cell was used for the first time in 1999. Subsequently, the ladder type of memory cell came to be used. Figure 2.5 portrays the layout plot of a 6T SRAM bitcell with an area of $4.0 \mu m^2$ for 0.18-μm CMOS technology. Figure 2.6 shows the top view of a 6T SRAM bitcell for 65-nm CMOS technology; it has a straight poly-silicon shape and a cell size of approximately $0.5 \mu m^2$.

Figure 2.5: Layout plot of $4.0 \mu m^2$ 6T SRAM bitcell for 0.18-μm technology

Figure 2.6: Top view of $0.49 \mu m^2$ 6T SRAM bitcell for 65-nm technology
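As a rough consistency check on the halving trend, the two cell sizes quoted above can be compared with a simple projection. The sketch below assumes that the step from 0.18-μm to 65-nm technology spans three generations (0.13 μm, 90 nm, and 65 nm); that node sequence and the helper name `scaled_cell_area` are assumptions made for illustration.

```python
def scaled_cell_area(area_um2: float, generations: int, shrink: float = 0.5) -> float:
    """Cell area after several generations, each multiplying the area by `shrink`."""
    return area_um2 * shrink ** generations

# Start from the 4.0 um^2 cell of the 0.18-um generation and apply three halvings.
area_65nm = scaled_cell_area(4.0, generations=3)
print(f"projected 65-nm cell area: {area_65nm:.2f} um^2")
```

Under that assumption, the projection lands close to the 0.49 $\mu m^2$ cell of Fig. 2.6, in line with the halving per generation described in the text.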

In the next three sections, the main issues confronting advanced SRAM design are summarized as three limitations: power dissipation, device variations, and access ability per unit time. The reason for each limitation is described.

### **2.2. Limitation of Power Dissipation**

In fact, CMOS devices have been scaled for more than 30 years to achieve higher density and performance and lower power dissipation. Although scaling of both the supply voltage and the threshold voltage of transistors enables low-power, high-speed operation, it causes a significant increase of the static leakage current. In deep submicron technology, such high leakage current becomes an important contributor to power dissipation of CMOS circuits as threshold voltage, channel length, and gate oxide thickness are reduced.

Figure 2.7 portrays the leakage current flows of a MOSFET in deep submicron technology. Five leakage flows exist, as presented in Fig. 2.7:

- $I_1$: Subthreshold current
- $I_2$: Gate-induced drain leakage current (GIDL)
- $I_3$: Junction band-to-band tunneling current (BTBT)
- $I_4$: Punch-through current
- $I_5$: Gate direct tunneling current ($I_g$)



Figure 2.7: Leakage current flows of a MOSFET



Figure 2.8: Dependences of drain current versus gate-source bias

The subthreshold current ($I_1$) between the source and drain of a MOSFET flows when the gate voltage is below the threshold voltage ($V_{th}$), that is, when the MOSFET is in a weak inversion condition. This subthreshold current depends exponentially on the $V_{th}$ of the MOSFET. In short-channel devices, the source and drain depletion width in the vertical direction and the source-drain potential have a strong effect on the band bending over a large fraction of the device. Consequently, the $V_{th}$ and subthreshold leakage vary with the drain bias. This effect is drain-induced barrier lowering (DIBL). The barrier becomes lower as the drain bias becomes higher, thereby increasing the subthreshold leakage. Therefore, lowering the supply voltage reduces the subthreshold current through this DIBL effect.

The GIDL ($I_2$) results from a high-field effect in the drain junction of the MOSFET. If the negative gate bias is large, the field crowding and peak field increase, thereby enhancing high-field effects. These effects induce a large current from the drain to the substrate through BTBT or avalanche multiplication. The GIDL increases with thinner oxides and higher supply voltage because of the enhanced electric field.

Drain and source to well junctions are typically reverse-biased, causing *pn*-junction leakage current. If both *p* and *n* regions are heavily doped, BTBT dominates as the $I_3$ leakage. This is the case for advanced MOSFETs using heavily doped shallow junctions and halo doping for better short-channel-effect control.

Reduction of the gate oxide thickness increases the field across the oxide. The high electric field induces tunneling of electrons from the channel (substrate) to the gate.



Figure 2.9: Global and local  $V_{th}$  variations

This is the gate leakage ($I_5$) presented in Fig. 2.7. A thinner oxide produces a higher electric field, inducing high gate leakage. In typical bulk CMOS devices, the on-state NMOSFET has the highest electric field between the gate and channel. In this case, the gate leakage flows from the gate to the drain and source. Figure 2.8 shows the dependence of the drain current on the gate-source bias.

The active (operating) power of the integrated circuit is proportional to the square of the supply voltage. Reducing the supply voltage is the most effective approach to reduce the active power. However, to maintain the performance, the threshold voltage  $V_{th}$  must also be reduced, thereby increasing the subthreshold leakage. Although the thinner gate oxide improves the drain current, it induces the marked increment of gate leakage. It becomes non-negligible for 90-nm technology and beyond. This increment of the leakage strongly affects the performance of embedded SRAM IPs, which occupy a large part of the area of SoC devices. Leakage reduction using circuit techniques for embedded SRAM has become the essential technique for deep submicron technology.
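The quadratic dependence can be made concrete with the usual first-order switching-power expression. The following is a sketch under the assumption that the activity factor $a$, the switched capacitance $C_L$, and the clock frequency $f$ (symbols introduced here for illustration only) stay fixed while only the supply voltage $V_{dd}$ changes; the two voltages are example values:

$$
P_{active} \approx a \cdot C_L \cdot V_{dd}^2 \cdot f, \qquad \frac{P_{active}(V_{dd} = 1.0\ \text{V})}{P_{active}(V_{dd} = 1.2\ \text{V})} = \left(\frac{1.0}{1.2}\right)^2 \approx 0.69
$$

Under these assumptions, lowering the supply voltage from 1.2 V to 1.0 V would cut the active power by roughly 30%, before any change in frequency or capacitance is considered.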

## **2.3. Variability of Device Characteristics**

The scaling of CMOS transistors has progressed aggressively in accordance with Moore's Law. However, along with this scaling, many researchers in advanced technology have come to confront difficulties that had been negligible so far. The variation of MOS device characteristics has become an obstacle to obtaining stable CMOS circuits. The variation of MOS devices, typically considered in terms of the threshold voltage ($V_{th}$), has several components, as presented in Fig. 2.9: die-to-die (D2D) variation, wafer-to-wafer (W2W) variation, and lot-to-lot (L2L) variation. This global $V_{th}$ variation ($\sigma_{Vth\_global}$) is caused mainly by variations in gate length, gate width, and gate oxide thickness. Meanwhile, a within-die variation ($\sigma_{Vth\_local}$) exists, which occurs randomly in the local area for paired MOS transistors, as presented in Fig. 2.9. Statistical analyses show that MOS device variability follows a normal distribution, as shown in eq. (2.1).

$$
f(x) = \frac{1}{\sqrt{2\pi} \cdot \sigma} \cdot \exp\left(\frac{-(x-\mu)^2}{2\sigma^2}\right), \quad -\infty < x < \infty, \quad \sigma^2 > 0,\tag{2.1}
$$

The total standard deviation of $V_{th}$ is represented by the root sum of squares of $\sigma_{Vth\_global}$ and $\sigma_{Vth\_local}$, as shown in eq. (2.2)

$$
\sigma_{Vth\_total} = \sqrt{\sigma_{Vth\_local}^2 + \sigma_{Vth\_global}^2},\tag{2.2}
$$

and, in particular, the local component of the $V_{th}$ variation ($\sigma_{Vth\_local}$), which originates from random dopant fluctuation (RDF) and gate line-edge roughness (LER) [4-7], cannot be ignored: $\sigma_{Vth\_local}$ grows in inverse proportion to the square root of the transistor channel area. Specifically, $\sigma_{Vth\_local}$ is represented by the following eq. (2.3).

$$
\sigma_{Vth\_local} = \sqrt{\sigma_{Vth\_RDF}^2 + \sigma_{Vth\_LER}^2 + \sigma_{Vth\_Others}^2}.\tag{2.3}
$$

Figure 2.10 portrays the model of the dopant and interface fluctuation of the transistor and the LER. The RDF component, which occurs randomly within a die, tends to increase as the device shrinks. The standard deviation of $V_{th}$ caused by RDF is represented by eq. (2.4).

$$
\sigma_{Vth} \propto \frac{T_{ox} \cdot \sqrt[4]{N_{sub}}}{\sqrt{W_{\text{eff}} L_{\text{eff}}}} = \frac{Avt}{\sqrt{LW}} \quad (Avt \text{ : Pelgrom coefficient}). \tag{2.4}
$$

In that equation, $T_{ox}$, $N_{sub}$, $W_{eff}$, and $L_{eff}$ respectively denote the gate oxide thickness, channel dopant concentration, and the effective channel width and length. Figure 2.11 shows the standard deviation of $V_{th}$ versus $1/\sqrt{LW}$, which is generally called a "Pelgrom plot." Here, $L$ represents the channel length and $W$ represents the channel width. As depicted in this graph, the standard deviation of $V_{th}$ is directly proportional to $1/\sqrt{LW}$; a small transistor has a large $V_{th}$ variation. The coefficient (slope) of this plot depends on the equivalent gate-oxide thickness (EOT). Because the EOT becomes slightly thinner with each generation, the slope of this plot becomes gentler with technology scaling. However, the feature size of transistors shrinks by 0.7× per technology generation, so the standard deviation of $V_{th}$ nevertheless increases. Therefore, SRAM cells, which have small transistors, might face a higher probability of functional failure: some failing bits might appear once the cells are assembled into an array of megabit order because of this unavoidable $V_{th}$ fluctuation. Consequently, CMOS device stability against $V_{th}$ is impossible to discuss properly without due consideration of the $\sigma_{Vth\_local}$ component.

Figure 2.10: Device model of local variations of transistors
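The scaling argument above can be sketched numerically. The following is a minimal illustration of eqs. (2.2) and (2.4), not a calibrated model: the Pelgrom coefficient `AVT_MV_UM`, the transistor dimensions, and the global sigma are hypothetical values chosen only to show the trend, and the coefficient is assumed to stay constant across the shrink.

```python
import math


def sigma_vth_local(avt_mv_um, length_um, width_um):
    """Eq. (2.4): local Vth sigma in mV, Avt / sqrt(L * W)."""
    return avt_mv_um / math.sqrt(length_um * width_um)


def sigma_vth_total(sigma_local_mv, sigma_global_mv):
    """Eq. (2.2): root sum of squares of the local and global sigmas."""
    return math.sqrt(sigma_local_mv ** 2 + sigma_global_mv ** 2)


# Hypothetical numbers for illustration only (not taken from the thesis).
AVT_MV_UM = 3.0          # Pelgrom coefficient, mV*um
SIGMA_GLOBAL_MV = 20.0   # global Vth sigma, mV

before = sigma_vth_local(AVT_MV_UM, length_um=0.10, width_um=0.20)
after = sigma_vth_local(AVT_MV_UM, length_um=0.07, width_um=0.14)  # 0.7x shrink

print(f"local sigma before shrink: {before:.1f} mV")
print(f"local sigma after shrink:  {after:.1f} mV")
print(f"ratio after/before:        {after / before:.2f}")
print(f"total sigma after shrink:  {sigma_vth_total(after, SIGMA_GLOBAL_MV):.1f} mV")
```

With both $L$ and $W$ scaled by 0.7× and the coefficient unchanged, the local sigma grows by a factor of 1/0.7; a gentler slope from a thinner EOT would reduce this factor but, per the discussion above, does not cancel it.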

Figure 2.12 shows an example of this fatal phenomenon in 6T SRAM cells. We specifically examine the static noise margin (SNM), which corresponds to the SRAM readout stability [8]. The gray curves in Figs. 2.12(a) and 2.12(b) show SNMs without $V_{th}$ variations. In Fig. 2.12(a), the NMOS $V_{th}$ in a memory cell is lowered by 100 mV because of the global $V_{th}$ variation. In this case, both SNM<sub>L</sub> and SNM<sub>R</sub> worsen simultaneously. In contrast, the $\sigma_{Vth\_local}$ that appears randomly in the six transistors engenders a $V_{th}$ imbalance in a memory cell, which causes asymmetry between SNM<sub>L</sub> and SNM<sub>R</sub>, as presented in Fig. 2.12(b). Therefore, the SRAM cell must be designed with allowances for the unbalanced $V_{th}$ combination originating from the $\sigma_{Vth\_local}$ component [9-12].

The standard deviation of the read-stability SNM caused by the local $V_{th}$ variation of each of the six memory cell transistors is described as

$$
\sigma_{\text{SNM}} = \sqrt{k_1 \cdot \sigma_{Vth\_P1}^2 + k_2 \cdot \sigma_{Vth\_P2}^2 + k_3 \cdot \sigma_{Vth\_N1}^2 + k_4 \cdot \sigma_{Vth\_N2}^2 + k_5 \cdot \sigma_{Vth\_N3}^2 + k_6 \cdot \sigma_{Vth\_N4}^2},\tag{2.5}
$$

where $k_n$ ($n = 1, \ldots, 6$) is the sensitivity coefficient. From DC stability analysis, the margin of stability is evaluated as

$$
\mu_{\text{SNM}} \ge N \cdot \sigma_{\text{SNM}},\tag{2.6}
$$

where *N* is a positive number depending on the total capacity of the target SRAM. For example, an 8-Mbit SRAM corresponds to $5.3\sigma$ ($N = 5.3$). Figure 2.13 presents the measured SNM distribution for a 45-nm 6T SRAM memory cell. In this graph, two distributions are portrayed for high-density 6T SRAM cells: with an assist circuit (AST) and without an assist circuit, as described in Chapter 5.
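The link between array capacity and *N* can be sketched as follows. This is one possible reading, not necessarily the derivation used in this thesis: it assumes a Gaussian SNM, about one expected failing cell per array, and two failure opportunities per cell (for example the left and right SNM lobes). The function names, sensitivity coefficients, and sigma values are hypothetical.

```python
import math
from statistics import NormalDist


def sigma_snm(k, sigma_vth):
    """Eq. (2.5): combine the six per-transistor sigmas, weighted by k_n."""
    return math.sqrt(sum(kn * s ** 2 for kn, s in zip(k, sigma_vth)))


def required_n(num_cells, tails=2):
    """N for which about one cell in the array is expected beyond N sigma.

    Assumption: Gaussian SNM and `tails` failure opportunities per cell.
    """
    tail_probability = 1.0 / (num_cells * tails)
    return -NormalDist().inv_cdf(tail_probability)


def meets_margin(mu_snm, sigma, n):
    """Eq. (2.6): the mean SNM must cover N standard deviations."""
    return mu_snm >= n * sigma


n_8mbit = required_n(8 * 2 ** 20)
print(f"N for an 8-Mbit array: {n_8mbit:.2f}")

# Hypothetical sensitivities and per-transistor sigmas (mV), illustration only.
s = sigma_snm([0.1, 0.1, 0.3, 0.3, 0.2, 0.2], [30, 30, 25, 25, 28, 28])
print("margin met:", meets_margin(mu_snm=150.0, sigma=s, n=n_8mbit))
```

Under these assumptions the value for 8 Mbit comes out close to the 5.3 quoted above; a one-sided criterion (`tails=1`) gives a somewhat smaller *N*.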

The discussion so far has addressed the $V_{th}$ variation; the drain current ($I_{ds}$) variation must also be considered. The drain current is represented by the following equation.

$$
I_{ds} = \beta (V_{gs} - V_{th})^{\alpha} \tag{2.7}
$$

Figure 2.14 shows a Monte-Carlo simulation result of $V_{th}$ versus drain current ($I_{ds}$) for a 45-nm 6T SRAM bitcell considering the local $V_{th}$ variation. It shows that the $I_{ds}$ of each of the load PMOS, access NMOS, and drive NMOS is distributed according to the local $V_{th}$ variation. The variations of these drain currents affect the write ability and readout sense margin. Local variations of drain current must therefore be considered along with $V_{th}$ variations.
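How a local $V_{th}$ spread turns into an $I_{ds}$ spread through eq. (2.7) can be sketched with a small Monte-Carlo loop. This is an illustrative toy, not the simulation behind Fig. 2.14: $\beta$, $\alpha$, $V_{gs}$, and the $V_{th}$ mean and sigma below are hypothetical, and only $V_{th}$ is varied.

```python
import random
import statistics


def drain_current(beta, vgs, vth, alpha):
    """Eq. (2.7): I_ds = beta * (V_gs - V_th)^alpha; zero below threshold."""
    overdrive = vgs - vth
    return beta * overdrive ** alpha if overdrive > 0 else 0.0


def sample_ids(n, vth_mean, vth_sigma, beta, vgs, alpha, seed=0):
    """Draw n Gaussian Vth samples and map each one to a drain current."""
    rng = random.Random(seed)
    return [
        drain_current(beta, vgs, rng.gauss(vth_mean, vth_sigma), alpha)
        for _ in range(n)
    ]


# Hypothetical parameters (arbitrary current units), for illustration only.
samples = sample_ids(n=1000, vth_mean=0.40, vth_sigma=0.04,
                     beta=1.0, vgs=1.0, alpha=1.3)
mean_ids = statistics.mean(samples)
spread = statistics.pstdev(samples) / mean_ids
print(f"mean I_ds: {mean_ids:.3f}, relative sigma: {spread:.1%}")
```

Repeating the sampling per transistor type (load, access, drive) with its own parameters would give one distribution per device, which is the kind of picture the figure presents.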



Figure 2.11: Pelgrom plot



Figure 2.12: Static noise margins for various  $V_{th}$  combinations



Figure 2.13: Measured distribution of the SNM for 45-nm SRAM bitcell



Figure 2.14: Distribution of $I_{ds}$ versus $V_{th}$ for each SRAM transistor

### **2.4. Overflow of Memory Access Frequency**

The memory capacity tends to increase as technology is scaled. Figure 1.2 in Chapter 1 shows a prediction of the percentage of memory area in an SoC device. This growth reflects the increase in data processed per unit time to improve the performance of the application chip. However, as described above, raising the clock frequency as technology advances is limited by power dissipation. Accordingly, the system architecture has moved to parallel operations to increase the practical computation speed through increased parallel processing rather than through increased clock frequency. Consequently, the number of memory accesses has increased considerably, which has made the memory access speed a system bottleneck.

To date, most embedded memory is single-port SRAM (SP-SRAM), which has one access port for reading and writing operations, although the demand for multi-port SRAM continues to increase to accommodate high-speed communications and image processing. A multi-port SRAM is suitable for such parallel operation and improves the total chip performance. Although the memory access throughput (in terms of the number of clock cycles) improves as the number of SRAM access ports increases, the area penalty also increases exponentially with the number of ports. Consequently, high-density multi-port SRAM that can provide a capacity comparable to that of SP-SRAM has recently been in demand for SoC chips.

# **2.5. Summary**

Increasing leakage currents, namely the subthreshold leakage, which is aggravated by drain-induced barrier lowering (DIBL), in addition to gate-induced drain leakage (GIDL), band-to-band tunneling leakage (BTBT), and gate leakage, were introduced. Their effects on SRAM performance were discussed. The issue of increasing active power dissipation, which results from the slowdown in voltage scaling despite rising clock frequency, was also introduced. From another perspective, the degradation of stability against process variation was described. As the device feature size is scaled down, the variability of the threshold voltage ($V_{th}$) of MOSFETs within a die increases because of dopant fluctuation and line-edge roughness. This $V_{th}$ variation causes a considerable degradation of the stability of reading and writing operations. Meanwhile, the system architecture has moved to parallel operations to increase the practical computation speed because the clock frequency has almost reached its limit. The memory access capability has become a system bottleneck because memory accesses are so numerous in such parallel systems.

# **Chapter 3 Power Reduction Technique of SRAM**

A low-power SRAM using an effective method called "ABC-MT-CMOS" is presented in this chapter [13]. It controls the back-gate bias to reduce the leakage current when the SRAM is not activated, as during sleep mode, while retaining the data stored in the memory cells. Test chips were designed and fabricated with a 4-kB gate-array SRAM. Experimental results show that the leakage current is reduced to 1/1000 in sleep mode. The active power is 0.27 mW/MHz at 1 V, which is 1/12 that of a conventional SRAM operating at 3.3 V.

A gate leakage reduction technique is then described. In the sub-100-nm generation, the gate-tunneling leakage current increases and dominates the total standby leakage current of LSIs because of the decreasing gate-oxide thickness. Based on the observation that the gate leakage current is reduced by lowering the gate voltage, a local DC level control (LDLC) for SRAM cell arrays and an automatic gate leakage suppression driver (AGLSD) for peripheral circuits are proposed. A 32-kB 1-port SRAM was designed and fabricated using 90-nm low-standby CMOS technology. The evaluation result shows that the standby current of the 32-kB SRAM is 1.2 μA at 1.2 V at room temperature, which is 7.5% of that of the conventional SRAM.

The dynamically controlled column bias (DCB) scheme is proposed to reduce the active power. It reduces the active power by 64% and the standby current by 93% when using 90-nm CMOS technology.

# **3.1. Introduction**

Low-power, low-voltage SRAMs are becoming increasingly necessary for mobile systems. Although scaling of both the supply voltage and the threshold voltage of transistors enables low-power, high-speed operation, it considerably increases the static leakage current. To avoid this undesirable leakage current, several methods have been reported. One is called the "Variable Threshold (VT) CMOS" [14], which controls the substrate bias to reduce the leakage current in sleep mode when the circuit is not operating. Another is called the "Multi Threshold (MT) CMOS" [15], which uses two threshold voltages for the transistors. Using this method, the higher-threshold transistors cut off the leakage current in sleep mode. Although the former method can retain the stored data in sleep mode, it requires a triple-well structure and a charge-pump circuit. On the other hand, the latter method is simple, but it loses the data stored in the memory cells because the source line and the ground line of the internal circuit become floating nodes when the high-threshold transistors are turned off. Reportedly, to avoid loss of latched data in an MT-CMOS, the stored data can take refuge in an embedded high-threshold latch circuit (balloon circuit) in sleep mode [16]. This approach is unsuitable for memories because of the large area overhead and design complexity.

An embedded SRAM, whose memory cell array blocks consist of high-threshold transistors to reduce the leakage current, and whose peripheral blocks consist of low-threshold transistors to improve the access time, has been reported [17]. However, the use of that method is restricted to embedded design. It is not applicable to master-slice designs such as gate arrays. In this work, the Auto-Backgate-Controlled Multi-Threshold CMOS (ABC-MT-CMOS) is adopted [13]. It can reduce the leakage current considerably in sleep mode using a simple circuit. All transistors in the core area have a low threshold voltage, making the scheme suitable for master-slice chips such as gate arrays. To reduce undesirable leakage current in sleep mode, the backgate bias is automatically controlled to increase the threshold voltage.

Concomitantly with the rapid development of micro-fabrication technology, the effect of gate leakage current has become dominant in the standby mode of SoCs, mainly because thinner gate-oxide films increase the probability of quantum tunneling between the poly gate and the inversion layer [18], [19]. Especially in 90-nm CMOS technology nodes, the gate leakage current originating from an embedded SRAM, which occupies most of the area in an SoC, cannot be neglected. Therefore, to apply CMOS devices fabricated in 90-nm technology nodes to various mobile applications, the gate oxide is made thicker to realize very low standby power. However, because the power supply voltage is scaled down according to the scaling law, the drain current decreases if the gate-oxide thickness is not also scaled down. This prevents high-speed operation and the realization of scaling merits. Several methods have been proposed to solve such problems. For example, the thin gate oxide was reportedly applied only to the critical path by dual or triple gate-oxide technology [20]. However, because of the difficulty in controlling the gate-oxide thickness locally, this method is not feasible. On the other hand, the high-*k* gate stack, which is considered to be an alternative to conventional SiO<sub>2</sub>, has attracted much attention from researchers [21]. With a high-*k* gate stack, it is possible to suppress the gate leakage current because the physical thickness of the high-*k* gate stack is greater than that of a conventional SiO<sub>2</sub> film of the same equivalent oxide thickness. Even if the high-*k* gate stack is used as the gate insulator, one will eventually encounter the same problem as that of the SiO<sub>2</sub> film: the gate leakage current cannot be neglected in future device scaling. In other words, quantum mechanical behavior inevitably appears as the high-*k* insulating film is scaled.

Therefore, the problem of the gate leakage current, which appears in 90-nm technology nodes, will become more pronounced in future CMOS devices. In this section, a new design method for an embedded SRAM is proposed. It suppresses the gate leakage current without complicated fabrication processes.

This chapter is organized as follows. In Section 3.2, the concept of the ABC-MT-CMOS and its effect of reducing the power are introduced. Then the implementation of the chip, which contains a 4-kB gate-array SRAM, is reported; relevant experimental results are discussed. In Section 3.3, the relationship between gate voltage and gate leakage current is quantitatively discussed. Because lowering the NMOS gate voltage is shown to be the most crucial factor for suppressing the total gate leakage current, this gate-voltage lowering method is applied both in the memory cell array and in peripheral circuits. Then, the detailed structure of a test chip is demonstrated. A 32-kB SRAM is fabricated using the 90-nm CMOS logic process. The standby leakage current is evaluated and is shown to be lower than that of the conventional embedded SRAM. A brief summary is presented in Section 3.5.

Figure 3.1: Concept of ABC-MT-CMOS

## **3.2. Reducing Subthreshold Leakage in Sleep Mode**

#### **3.2.1. ABC-MT-CMOS Circuit**

Figure 3.1 portrays the concept of the ABC-MT-CMOS circuit. Here, Q1 and Q2, which have a higher threshold than the transistors of the internal circuit, act as switches to cut off the leakage current. While the LSI is operating, in the so-called *active mode*, these transistors are turned on. The virtual source line, *VVDD*, becomes 1.0 V, supplied by the voltage source *Vdd1* through Q1. Another virtual source line, *VGND*, is forced to ground level through Q2. The internal circuits consist only of low-threshold transistors. In the active mode, the dynamic current and static leakage current flow from *Vdd1* to ground, as denoted by $I_{dd}$ (active) in Fig. 3.1. This current can be cut off if the switch transistors Q1 and Q2 turn off in sleep mode. However, the data stored in the memory cell disappears in such a condition. The ABC-MT-CMOS uses the higher voltage source *Vdd2* (3.3 V) and diodes D1 and D2 to reduce the leakage current while retaining the stored data. In sleep mode, *VVDD* is connected to *Vdd2* through D1; also, *VGND* is connected to ground through D2. Here, D1 and D2 each actually consist of two diodes in series. The forward voltages of D1 and D2 are each 1.0 V if it is assumed that the forward bias of one diode is 0.5 V. Consequently, *VVDD* and *VGND* respectively become about 2.3 V and 1 V. The static leakage current $I_{dd}$ (sleep), which flows from *Vdd2* to ground, decreases considerably compared to that of the active mode because the threshold voltage of the internal transistors increases through the backgate bias effect. In sleep mode, *VVDD* and *VGND* maintain their voltage levels because of the weak leakage current $I_{dd}$ (sleep), so that the data stored in the memory cell is retained. This method requires no triple-well structure or complicated circuits such as charge pumps and balloon circuits.

Figure 3.2: ABC-MT-CMOS circuit and waveform

Figure 3.2 shows the actual configuration of the ABC-MT-CMOS circuit. It has two additional high-threshold transistors, Q3 and Q4. In the active mode, SL = "L" and /SL = "H" are applied; Q1, Q2, and Q3 turn on and Q4 turns off. Thereby, both *VVDD* and the substrate bias BP become 1.0 V. On the other hand, in sleep mode, SL = "H" and /SL = "L" are applied; Q1, Q2, and Q3 turn off and Q4 turns on. Thereby, BP becomes 3.3 V. The static leakage current, which flows from *Vdd2* to ground through D1 and D2, determines the voltages $V_{d1}$, $V_{d2}$, and $V_m$. Here, $V_{d1}$ denotes the bias between the source and the substrate of the PMOS transistors, $V_{d2}$ denotes that of the NMOS transistors, and $V_m$ denotes the voltage between *VVDD* and *VGND*. The static leakage current in a memory cell in sleep mode was obtained using SPICE simulation.

Figure 3.3 shows the leakage current per memory cell as a function of the voltage between *VVDD* and *VGND*, denoted by $V_m$, which is equal to (BP − $V_{d1}$ − $V_{d2}$) as depicted in Fig. 3.2. The horizontal axis represents $V_m$ and the vertical axis represents the leakage current on a logarithmic scale. In the graph, the solid line shows the leakage current of a memory cell circuit as $V_m$ changes under BP = 3.3 V. Actually, $V_m$ is changed by varying $V_{d1}$ and $V_{d2}$ simultaneously. Here, it is assumed that $V_{d1}$ is equal to $V_{d2}$. Results show that the leakage current can be reduced exponentially by the reduction of $V_m$. When $V_m = 1.0$ V ($V_{d1} = V_{d2} = 1.15$ V), the leakage current is reduced to 20 pA/cell, which is comparable to that of a conventional memory cell consisting of high-threshold transistors and operated at 3.3 V, as shown in the figure. Therefore, a sufficient increase in the threshold voltage of the circuit is obtained through the backgate effect.

Figure 3.3: Simulated leakage current for a memory cell
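The sleep-mode voltage levels quoted in this subsection follow from simple arithmetic, sketched below. The function names are hypothetical; the numbers (3.3 V, 0.5 V per diode, two diodes per stack, 1.15 V biases) are the ones given in the text, and the diode drop is treated as an ideal constant, which is a simplification.

```python
def sleep_mode_levels(vdd2, diode_drop, diodes_per_stack=2):
    """Return (VVDD, VGND, Vm) in sleep mode for ideal constant diode drops."""
    stack_drop = diode_drop * diodes_per_stack
    vvdd = vdd2 - stack_drop   # VVDD hangs below Vdd2 through D1
    vgnd = stack_drop          # VGND is lifted above ground through D2
    return vvdd, vgnd, vvdd - vgnd


def cell_voltage_from_bias(bp, vd1, vd2):
    """Vm = BP - Vd1 - Vd2, with BP the PMOS substrate bias."""
    return bp - vd1 - vd2


vvdd, vgnd, vm = sleep_mode_levels(vdd2=3.3, diode_drop=0.5)
print(f"VVDD = {vvdd:.2f} V, VGND = {vgnd:.2f} V, Vm = {vm:.2f} V")

# Operating point quoted for Fig. 3.3: Vd1 = Vd2 = 1.15 V under BP = 3.3 V.
print(f"Vm = {cell_voltage_from_bias(3.3, 1.15, 1.15):.2f} V")
```

The first call reproduces the "about 2.3 V and 1 V" levels; the second reproduces the $V_m = 1.0$ V point, which corresponds to a slightly larger source-substrate bias than the ideal 0.5-V-per-diode estimate.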

#### **3.2.2. Implementation of 0.35-μm 4-kB SRAM**

To evaluate the effectiveness of this work, test chips containing a 4-kB SRAM were designed and fabricated using a 0.35-$\mu$m CMOS gate array with triple metal. Figure 3.4 shows a microphotograph of a chip. The chip size is 6 mm $\times$ 5 mm. The core area, which corresponds to the sea-of-gates region of the gate array, is 4 mm $\times$ 3 mm. All transistors in the core area have a low threshold voltage: 0.12 V for NMOS transistors and 0.20 V for PMOS transistors. Here, the threshold voltage is defined at a drain current of 1 $\mu$A. The high-threshold transistors (Q1–Q4, and D1 and D2) are located in the I/O buffer region with a small area overhead. The high threshold voltage is 0.53 V for NMOS transistors and 0.58 V for PMOS transistors. The numbers of NMOS and PMOS transistors in the core region are 139 k and 73 k, respectively, including peripheral circuits. The chip characteristics are presented in Table 3.1. The SRAM size is 1.8 mm $\times$ 2.8 mm. Its configuration is 128 bits by 256 words. The memory cell arrays are divided into two planes. Each memory cell array plane has 256 rows by 64 columns. The 128 cells are accessed simultaneously.

Figure 3.4: Microphotograph of test chip

Figure 3.5: Power dissipation versus supply voltage

Evaluation results show that the SRAM passed all test programs including the march pattern and row bar pattern. It also passed the data retention test in sleep mode. The active power of the proposed SRAM using ABC-MT-CMOS is compared with that of a conventional high-threshold SRAM, which was fabricated for reference.

Table 3.1: Features of test chip

| Item                   | Specification                                  |
|------------------------|------------------------------------------------|
| Process                | 0.35-$\mu$m CMOS, 3-metal                      |
| Supply voltage         | 3.3 V (I/O), 1.0 V (core)                      |
| High threshold voltage | 0.58 V (PMOS), 0.53 V (NMOS)                   |
| Low threshold voltage  | 0.20 V (PMOS), 0.12 V (NMOS)                   |
| Chip size              | 6.0 mm $\times$ 5.0 mm                         |
| Core size              | 4.0 mm $\times$ 3.0 mm                         |
| SRAM configuration     | 32-kbit SRAM (128 bits $\times$ 256 words)     |
| SRAM size              | 2.8 mm $\times$ 1.8 mm (5.0 mm<sup>2</sup>)    |
| Total SRAM transistors | 73 k (PMOS), 139 k (NMOS)                      |

Figure 3.6: Measured leakage current

Figure 3.5 shows the measured active power dissipation versus the supply voltage. The solid line shows the SRAM using the ABC-MT-CMOS scheme; the dashed line shows the conventional SRAM. The active power of the proposed SRAM is 0.27 mW/MHz at 1.0 V. That of the conventional SRAM is 3.23 mW/MHz at 3.3 V; the conventional SRAM cannot operate at 1.0 V. The power is thus reduced to 1/12 of that of the conventional SRAM. Figure 3.6 shows the measured leakage current of the chip. For the conventional SRAM, which consists of high-threshold transistors, the leakage current is about 1 μA. The leakage current of the ABC-MT-CMOS in sleep mode is 0.6 μA, with a power dissipation of 2 μW. The standby current of the ABC-MT-CMOS at 1.0 V is 0.4 mA. The leakage current in sleep mode is therefore reduced to approximately 1/1000 of that in standby mode and is comparable to that of the conventional SRAM with a 3.3-V power supply. The access time of the ABC-MT-CMOS SRAM is 11.8 ns at 1.0 V. The minimum operating voltage is 0.8 V.
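The headline ratios in this paragraph can be cross-checked from the quoted measurements. The short script below is only an arithmetic check; the variable names are illustrative, and it assumes that the sleep-mode current is drawn from the 3.3-V supply *Vdd2*.

```python
# Measured values quoted in the text.
ACTIVE_PROPOSED_MW_PER_MHZ = 0.27      # ABC-MT-CMOS SRAM at 1.0 V
ACTIVE_CONVENTIONAL_MW_PER_MHZ = 3.23  # conventional SRAM at 3.3 V
SLEEP_CURRENT_UA = 0.6                 # ABC-MT-CMOS in sleep mode
STANDBY_CURRENT_UA = 400.0             # 0.4 mA standby at 1.0 V
SLEEP_SUPPLY_V = 3.3                   # assumption: sleep current flows from Vdd2

active_ratio = ACTIVE_CONVENTIONAL_MW_PER_MHZ / ACTIVE_PROPOSED_MW_PER_MHZ
sleep_power_uw = SLEEP_CURRENT_UA * SLEEP_SUPPLY_V
leakage_ratio = STANDBY_CURRENT_UA / SLEEP_CURRENT_UA

print(f"active power ratio (conventional / proposed): {active_ratio:.1f}")
print(f"sleep-mode power: {sleep_power_uw:.2f} uW")
print(f"standby / sleep current ratio: {leakage_ratio:.0f}")
```

The active power ratio is consistent with the stated 1/12 and the sleep power with the stated 2 μW. The current ratio computed from these rounded figures is several hundred, the same order of magnitude as the approximately 1/1000 stated above.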

### **3.3. Gate Leakage Suppression Technique**

#### **3.3.1. Gate Leakage Issues**

In recent microprocessors, the capacity of on-chip memory is increasing rapidly and improving overall performance. According to the ITRS road map [3], memory will occupy about 90% of chip area in 2013. In such a memory-rich chip, the leakage current of an embedded SRAM dominates the standby current. Therefore, the reduction of the standby leakage current of SRAMs is the most important factor to achieve low power consumption. When the supply voltage is scaled down, the subthreshold leakage current, which is a main contributor to the standby leakage in SRAM, tends to increase because of the decreased threshold voltage ($V_{th}$) of the transistors. A dual-$V_{th}$ technique is generally used to reduce the subthreshold leakage current in a system LSI. Using low-$V_{th}$ transistors in the critical path and high-$V_{th}$ transistors in the non-critical paths, one can improve both the speed and the power. This dual-$V_{th}$ technique is easier to implement than the multi-gate-oxide technique described above. For that reason, the subthreshold leakage current can be reduced readily.

Figure 3.7: Gate leakage current versus $T_{ox}$

Figure 3.8: Gate leakage current versus gate voltage $V_{gs}$ (EOT = 2.0 nm)

Figure 3.7 shows the relationship between the gate leakage current and the gate insulation film thickness. The vertical axis shows the gate leakage current per unit gate area; the horizontal axis shows the physical oxide thickness of the gate insulation film. The gate leakage current increases exponentially as the gate insulation film becomes physically thinner with scaling. The gate leakage current of NMOS is 4–10 times greater than that of PMOS of the same thickness. Figure 3.8 shows the relationship between the measured gate tunneling current of the MOS transistor and the gate voltage in 90-nm CMOS logic technology [22]. The equivalent gate-oxide thickness is 2.0 nm. There are two modes of gate leakage current: one is the inversion mode, in which the leakage flows in the turned-on MOSFET, and the other is the accumulation mode, in which it flows in the turned-off MOSFET. The directions of the gate leakage currents are also depicted on the right side of Fig. 3.8. Because of the difference in their leakage directions, the voltage dependence of the gate leakage current differs between the inversion mode and the accumulation mode, as depicted in Fig. 3.8. At an operating voltage of 1.2 V, the inversion current of NMOS is the largest of all modes. Furthermore, when the gate voltage is lowered from 1.2 V to 0.6 V, the gate leakage current is suppressed to about one-tenth of the leakage current at 1.2 V. Therefore, lowering the NMOS gate voltage reduces the gate leakage current in LSIs. In the next section, according to the considerations presented above, a method to suppress the gate leakage current is proposed for an embedded SRAM.

Having demonstrated the effectiveness of lowering the NMOS gate voltage, this characteristic feature is next introduced to our circuitry. Two techniques are useful to reduce the gate leakage current: controlling the power supply voltage of the memory cell dynamically, and modification of the conventional peripheral circuit.

#### **3.3.2. Memory Cell Array**

Figure 3.9 portrays the leakage model in a six-transistor (6T) SRAM cell. The wordline WL is at low level ("L") and the bitlines BL and /BL are at high level ("H") when the cell is inactive. One storage node of the cell is "H"; the other is "L." The gate leakage current ($I_{g\_cell}$) and drain leakage current ($I_{off\_cell}$) of a cell are calculated as follows.

$$
\begin{aligned}
I_{g\_cell} &= W_p \times I_{gpi} + W_p \times I_{gpa} + W_d \times I_{gni} + (W_d + 3 \times W_a) \times I_{gna} \\
I_{off\_cell} &= W_p \times I_{offp} + (W_d + W_a) \times I_{offn}
\end{aligned}\tag{3.1}
$$

Therein, $W_a$, $W_d$, and $W_p$ respectively denote the gate widths of the access transistor, driver transistor, and load transistor. In addition, $I_{offp}$ ($I_{offn}$) represents the drain leakage current per unit width for the PMOS (NMOS) transistor. Furthermore, $I_{gpi}$ ($I_{gni}$) and $I_{gpa}$ ($I_{gna}$) respectively represent the gate leakage currents per unit width in the inversion mode and the accumulation mode. The standby leakage current of a memory cell can be estimated as the sum of $I_{g\_cell}$ and $I_{off\_cell}$. Here, gate-induced drain leakage (GIDL) is ignored because it is smaller than one-tenth of the total drain leakage current in 90-nm CMOS technology [22]. The inversion current of NMOS is dominant among the gate leakage currents (Fig. 3.8). Therefore, the third term of $I_{g\_cell}$ in (3.1) is the most dominant factor; this work specifically targets the suppression of the gate leakage current of the NMOS driver transistors.

Figure 3.9: Leakage model in 6T-SRAM cell

Figure 3.10: Gate leakage current suppression in a cell
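The leakage budget of eq. (3.1) can be evaluated with a few lines of code. The sketch below is illustrative only: the function names are hypothetical, and every width and per-unit-width current in the example is a placeholder value, not a figure taken from the test chip.

```python
def gate_leakage_cell(wp, wd, wa, igpi, igpa, igni, igna):
    """First line of eq. (3.1): widths in um, currents in pA/um, result in pA."""
    return wp * igpi + wp * igpa + wd * igni + (wd + 3 * wa) * igna


def off_leakage_cell(wp, wd, wa, ioffp, ioffn):
    """Second line of eq. (3.1): drain (off-state) leakage of one cell in pA."""
    return wp * ioffp + (wd + wa) * ioffn


# Hypothetical gate widths in um (load, driver, access).
WP, WD, WA = 0.12, 0.20, 0.14
# Hypothetical per-unit-width currents in pA/um.
IGPI, IGPA, IGNI, IGNA = 5.0, 1.0, 40.0, 4.0
IOFFP, IOFFN = 4.0, 4.0

i_g = gate_leakage_cell(WP, WD, WA, IGPI, IGPA, IGNI, IGNA)
i_off = off_leakage_cell(WP, WD, WA, IOFFP, IOFFN)
driver_share = WD * IGNI / i_g  # contribution of the third term (NMOS driver)

print(f"I_g_cell   = {i_g:.2f} pA")
print(f"I_off_cell = {i_off:.2f} pA")
print(f"standby leakage per cell = {i_g + i_off:.2f} pA")
print(f"share of NMOS driver inversion term in I_g_cell: {driver_share:.0%}")
```

With placeholder values in which the NMOS inversion current is much larger than the other gate currents, the third term accounts for most of $I_{g\_cell}$, which is the situation the text describes.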

Here, the method to control the potential of the source line of the memory cells is adopted to reduce the gate leakage current in inactive mode. Figure 3.10 depicts the concept of this method. Node $V_M$ (the PMOS source line) is disconnected from the PMOS substrate lines, which remain connected to the supply voltage. The NMOS source line is connected to the ground line in common with the NMOS substrate lines. A DC level controller is introduced to control the $V_M$ potential dynamically. Using this circuitry, the $V_M$ potential is lowered during inactive mode, which lowers the gate voltage of the driver transistor. Before developing this discussion, the merit of lowering the $V_M$ potential must be assessed. Various methods of controlling the source line potential to reduce leakage current have been reported to date [23]–[29]. Lowering the source line has the advantage of reducing the subthreshold leakage current of the PMOS load transistor. The leakage reduction results from a reverse body bias effect of the PMOS originating from the decrease of the source voltage. In addition, because the reduced drain–source voltage weakens drain-induced barrier lowering (DIBL), the subthreshold leakage current in both the NMOS driver transistor and the PMOS load transistor is reduced. This method does not address the subthreshold leakage current in NMOS access

| Leakage (pA/µm) | Transistor  | R.T., Typ. process | R.T., Leakage-worst | 40°C, Typ. process | 40°C, Leakage-worst |
|-----------------|-------------|--------------------|---------------------|--------------------|---------------------|
| $I_{off}$       | NMOS        | 4                  | 16 (×4)             | R.T. ×2            | R.T. ×2             |
| $I_{off}$       | PMOS        | 4                  | 16 (×4)             | R.T. ×2            | R.T. ×2             |
| $I_g$           | NMOS (Inv.) | 40                 | 200 (×5)            | R.T. ×1            | R.T. ×1             |
| $I_g$           | PMOS (Inv.) | 5                  | 25 (×5)             | R.T. ×1            | R.T. ×1             |
| $I_g$           | NMOS (Acc.) | $\leq 1$           | $< 3$ (×3)          | R.T. ×1            | R.T. ×1             |
| $I_g$           | PMOS (Acc.) | $\leq 1$           | $< 3$ (×3)          | R.T. ×1            | R.T. ×1             |

Table 3.2: Target specifications of leakage for 90-nm low-power CMOS technology

Table 3.3: Transistor size of 6T SRAM cell

| Cell dimensions | Width (nm) | $L_g$ (nm) |
|-----------------|------------|------------|
| Access Tr.      | 140        | 90         |
| Driver Tr.      | 200        | 90         |
| Load Tr.        | 140        | 90         |

transistors. In particular, for the access transistor between the pre-charged bitline and the cell node storing "H," the subthreshold leakage current might increase slightly. Nevertheless, that leakage is very small compared to the subthreshold leakage of the access transistor on the opposite side. For that reason, it can be ignored.

On the other hand, raising the GND line is also well known to be an effective means to reduce the subthreshold leakage current of the access transistor and the driver transistor. This reduction results mainly from the substrate bias effect of the NMOS. In this method, the DIBL effect also reduces the subthreshold leakage current in both NMOS and PMOS. Regarding the reduction of the gate leakage current in the SRAM cell, both methods described above are effective. Lowering the *VDD* source line or raising the GND source line decreases the gate–source voltage of the NMOS driver transistor, which greatly reduces the total gate leakage current. Therefore, it is important to choose an appropriate method to suppress the most dominant leakage current.



Figure 3.11: Standby leakage current of a 6T-SRAM cell



Figure 3.12: Static noise margin of 6T-SRAM cell

This study adopts the lowering of *VDD* because the gate leakage current is the dominant factor in our 90-nm CMOS process, as presented in Table 3.2. The gate leakage current in NMOS (Inv.) is one order of magnitude larger than the others; for that reason, the total standby current can be reduced by lowering *VDD*. In addition, to suppress the extraordinary leakage caused by some failed bits, it is convenient to cut the power supply line of the corresponding cell block for repair. Another reason to use *VDD* control is that the line is easy to handle in layout design. Moreover, when the GND source line is controlled, the additional pull-down NMOS switch might cause undesirable access delay. In contrast, when the *VDD* source line is controlled, the pull-up PMOS switch does not affect the access time because no cell current flows through it.

Using process data in our 90-nm CMOS technology, the leakage current of a 6T SRAM cell is estimated. The target specifications of the leakage current are presented in Table 3.2. Here, *Ioff* is the drain current of the turned-off MOSFET, which includes GIDL and subthreshold leakage current. As described above, GIDL is negligible in our process, so *Ioff* is equal to the subthreshold leakage current in any process condition and at the moderate temperatures of mobile applications. Substituting the transistor dimensions of the 6T SRAM cell (presented in Table 3.3) and the leakage specifications into (3.1) and (3.2), the leakage current of a unit cell is obtained. The estimates of leakage current are shown in Table 3.4. It becomes 11.4 pA at room temperature in the typical

| Temperature | Process | $I_{off}$ (pA) | $I_g$ (pA) | Total (pA) | $I_g$/Total |
|-------------|---------|----------------|------------|------------|-------------|
| R.T.        | Typical | 1.92           | 9.46       | 11.38      | 83.1%       |
| R.T.        | Worst   | 7.68           | 45.78      | 53.46      | 85.6%       |
| 40°C        | Typical | 3.84           | 9.46       | 13.30      | 71.1%       |
| 40°C        | Worst   | 15.36          | 45.78      | 61.14      | 74.9%       |

Table 3.4: Standby leakage current of 6T SRAM cell

process, and 61.1 pA at 40°C in the worst-leakage process, respectively. Results show that the gate leakage current accounts for a large percentage of the total standby current of the cell.
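The substitution described above can be retraced numerically. The following is a minimal sketch, assuming that the accumulation-mode entries of Table 3.2 (given there as upper bounds) are taken at their bound and that gate widths are converted from nm to µm; the function and variable names are illustrative only.

```python
# Per-unit-width leakage targets from Table 3.2, in pA/um at room temperature.
# The accumulation-mode entries are upper bounds in the table; using the bound
# itself is an assumption of this sketch.
SPECS = {
    "typical": {"ioff_n": 4.0, "ioff_p": 4.0, "ig_ni": 40.0,
                "ig_pi": 5.0, "ig_na": 1.0, "ig_pa": 1.0},
    "worst": {"ioff_n": 16.0, "ioff_p": 16.0, "ig_ni": 200.0,
              "ig_pi": 25.0, "ig_na": 3.0, "ig_pa": 3.0},
}

# Gate widths from Table 3.3, converted from nm to um.
W_ACCESS, W_DRIVER, W_LOAD = 0.14, 0.20, 0.14

# Table 3.2 scales Ioff by 2 and Ig by 1 when going from R.T. to 40 degrees C.
TEMP_SCALE = {"RT": {"ioff": 1.0, "ig": 1.0}, "40C": {"ioff": 2.0, "ig": 1.0}}


def cell_leakage(process, temperature):
    """Return (Ioff_cell, Ig_cell, total) in pA for one 6T cell."""
    s = SPECS[process]
    k = TEMP_SCALE[temperature]
    # Gate leakage of one cell: load (inv. + acc.), driver (inv.),
    # and one driver plus three access-transistor overlaps (acc.).
    ig = (W_LOAD * s["ig_pi"] + W_LOAD * s["ig_pa"]
          + W_DRIVER * s["ig_ni"]
          + (W_DRIVER + 3 * W_ACCESS) * s["ig_na"]) * k["ig"]
    # Drain (subthreshold) leakage of one cell: one load, one driver, one access.
    ioff = (W_LOAD * s["ioff_p"]
            + (W_DRIVER + W_ACCESS) * s["ioff_n"]) * k["ioff"]
    return ioff, ig, ioff + ig


if __name__ == "__main__":
    for temperature in ("RT", "40C"):
        for process in ("typical", "worst"):
            ioff, ig, total = cell_leakage(process, temperature)
            print(f"{temperature:>3} {process:<7} Ioff={ioff:6.2f} pA "
                  f"Ig={ig:6.2f} pA total={total:6.2f} pA "
                  f"Ig/total={ig / total:.1%}")
```

Under these assumptions the four rows of Table 3.4 are reproduced, which also confirms that the driver-transistor inversion term is what makes the gate leakage dominant.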

Figure 3.11 presents the standby leakage current at $V_M$ = 1.2 V for various combinations of process type (Typ. and Leakage-worst) and temperature (R.T. and 40°C). In addition, the standby leakage currents with lowered $V_M$ potential ($V_M$ = 0.6 V) are also shown. The gate leakage current $I_g$ accounts for the largest part of the total standby leakage at $V_M$ = 1.2 V. These results show that the gate leakage current is reduced to about one-tenth by lowering the $V_M$ potential. Moreover, because of the effect of back gate bias in the PMOS and that of DIBL, $I_{off}$ is reduced approximately by half. Overall, lowering the *VDD* source line reduces the total leakage current of the memory cell in inactive mode by more than 80%.
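As a rough cross-check of the reduction quoted above, the rounded factors from the text can be applied to the room-temperature rows of Table 3.4. This is only a sketch: "about one-tenth" and "approximately by half" are taken as exactly 0.1 and 0.5, which is an assumption, so the result is indicative rather than a substitute for the data in Fig. 3.11.

```python
# Table 3.4 values (pA per cell) at V_M = 1.2 V and room temperature:
# process -> (Ioff, Ig).
TABLE_3_4_RT = {"typical": (1.92, 9.46), "worst": (7.68, 45.78)}

# Rounded factors quoted in the text for V_M = 0.6 V (approximations).
IG_FACTOR = 0.1    # gate leakage falls to about one-tenth
IOFF_FACTOR = 0.5  # Ioff falls to about one-half


def standby_reduction(ioff, ig):
    """Fractional reduction of total cell leakage when V_M is lowered."""
    before = ioff + ig
    after = IOFF_FACTOR * ioff + IG_FACTOR * ig
    return 1.0 - after / before


if __name__ == "__main__":
    for process, (ioff, ig) in TABLE_3_4_RT.items():
        print(f"{process}: {standby_reduction(ioff, ig):.1%} reduction")
```

Because $I_g$ makes up more than 80% of the total at room temperature, the one-tenth factor on $I_g$ dominates the result; at higher temperature, where $I_{off}$ takes a larger share, the same rounded factors give a somewhat smaller reduction.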

During the active period, in which the SRAM is in read or write operation mode, the $V_M$ potential is set to *VDD*, thereby increasing the gate leakage current. However, because the power in read or write operations is mostly consumed in charging and discharging bitlines, the increase in the gate leakage current in the active cell does not affect the overall active power. Moreover, to maintain performance and ensure cell stability, the $V_M$ potential must be recovered to *VDD*, even if the gate leakage current is increased. Figure 3.12 presents the measured DC characteristics of 6T SRAM cells in the 90-nm CMOS process. The measured static noise margin is sufficient for stable read-out operations.

#### **3.3.3. Peripheral Circuit**

In this section, the reduction of the gate leakage current in the peripheral circuit is presented. In the peripheral circuit, most gate leakage current flows in the WL driver circuit of a row decoder. Figure 3.13(a) portrays the conventional WL driver circuit. The largest NMOS transistors in the final stages are turned on to drive the wordlines to a low level, except for the one accessed row. As described in Section 3.3.2, the gate leakage current of a turned-on NMOS is dominant. To estimate the contribution of the peripheral circuit to the total standby leakage current per unit row, let us compare the gate leakage current of one row to that of the largest NMOS in the WL driver. From (3.1), the gate leakage current of the memory cells in one row ($I_{g\_row}$) is represented approximately as $I_{g\_row} = W_d \times I_{gni} \times N$, where *N* is the total number of columns of the memory cell array. On the other hand, the gate leakage current of the final-stage NMOS in the WL driver ($I_{g\_d1}$) can be expressed as $I_{g\_d1} = W_{d1} \times I_{gni}$, where $W_{d1}$ is its gate width. If it is assumed that $N = 256$ and $W_{d1} = 5$ µm, the gate leakage currents of the cells and of the WL driver in a unit row become 2048 pA and 200 pA, respectively. Therefore, once the $V_M$ voltage is controlled to reduce the leakage current in the memory cells to about one-tenth, the leakage current in the peripheral circuit becomes almost equal to that in the memory cell array. For that reason, the leakage current originating from the WL driver must also be reduced.
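The per-row comparison can be written out as a short calculation. This sketch uses the typical room-temperature $I_{gni}$ of Table 3.2 and the driver width of Table 3.3; the one-tenth factor for the lowered $V_M$ is the rounded value from Section 3.3.2, and all names are illustrative.

```python
I_GNI = 40.0        # pA/um, NMOS inversion-mode gate leakage (Table 3.2, typical, R.T.)
W_DRIVER = 0.20     # um, driver transistor width of the cell (Table 3.3)
W_D1 = 5.0          # um, final-stage NMOS width of the WL driver (assumed in the text)
N_COLUMNS = 256     # columns per row (assumed in the text)
VM_IG_FACTOR = 0.1  # rounded: cell gate leakage at lowered V_M is about one-tenth


def row_gate_leakage(n_columns=N_COLUMNS):
    """Approximate gate leakage of all cells in one row, in pA."""
    return W_DRIVER * I_GNI * n_columns


def wl_driver_gate_leakage(width_um=W_D1):
    """Gate leakage of the final-stage NMOS of one WL driver, in pA."""
    return width_um * I_GNI


if __name__ == "__main__":
    row = row_gate_leakage()
    driver = wl_driver_gate_leakage()
    print(f"cells per row, V_M = VDD : {row:.0f} pA")
    print(f"WL driver                : {driver:.0f} pA")
    print(f"cells per row, V_M low   : {row * VM_IG_FACTOR:.1f} pA")
```

With the cell array suppressed, the row and its driver leak comparable amounts, which is the motivation for treating the WL driver separately.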

Figures 3.13(b) and 3.13(c) present the new WL driver, which suppresses the gate leakage current. It is called an automatic gate leakage suppression driver (AGLSD). Figure 3.13(b) shows a schematic diagram and Fig. 3.13(c) shows the timing diagram. Symbols XL, XM, and XH denote decode signals. Transistors Qp1 and Qn1 drive the WL. The gate inputs of Qp1 and Qn1 are separated into internal nodes i\_p and i\_n, which are connected via the PMOS pass transistor Qp2. The gate of Qp2 is controlled by the WL. Both i\_p and i\_n become "L" when the WL is selected. The activated WL is charged by Qp1. After a read or write operation is completed, nodes i\_p and i\_n are set to "H." The WL is discharged by Qn1. Then, the gate node of Qp2 becomes "H." Therefore, the node i\_n becomes floating because Qp2 is turned off. As the large gate leakage current of Qn1 gradually decreases the potential of the node i\_n, the Qn1 gate leakage current itself rapidly decreases. Consequently, Qn1 is turned off. In this manner, the gate leakage current in Qn1 is eliminated. The small NMOS Qn2 holds the WL at "L" after Qn1 is turned off. The gate leakage current of Qn2 is negligible because its gate width is much smaller than that of Qn1. In this way, the gate leakage current in the peripheral circuit can be reduced.
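The self-limiting discharge of the floating node i\_n can be illustrated with a simple behavioral model. This is a sketch under strong assumptions, not the simulated circuit of Fig. 3.13(c): the gate leakage is fitted with an exponential through the two points quoted in Section 3.3.1 (tenfold reduction from 1.2 V to 0.6 V), the node capacitance `C_NODE` is a hypothetical value, and the turn-off of Qn1 below its threshold is not modeled.

```python
I_G0 = 200e-12   # A, Qn1 gate leakage at 1.2 V (5 um x 40 pA/um, from the text)
V_0 = 1.2        # V, initial potential of node i_n
V_DECADE = 0.6   # V, voltage drop that cuts the leakage tenfold (1.2 V -> 0.6 V)
C_NODE = 10e-15  # F, HYPOTHETICAL capacitance of the floating node


def gate_leakage(v):
    """Exponential fit through the two quoted points (assumed functional form)."""
    return I_G0 * 10.0 ** ((v - V_0) / V_DECADE)


def discharge(t_end, dt=1e-7):
    """Forward-Euler integration of C dV/dt = -Ig(V).

    Returns a list of (time, node voltage, gate leakage) tuples.
    """
    steps = round(t_end / dt)
    v = V_0
    trace = [(0.0, v, gate_leakage(v))]
    for i in range(1, steps + 1):
        v -= gate_leakage(v) / C_NODE * dt
        trace.append((i * dt, v, gate_leakage(v)))
    return trace


if __name__ == "__main__":
    trace = discharge(200e-6)
    for t, v, ig in trace[::400]:
        print(f"t={t * 1e6:6.1f} us  V(i_n)={v:.3f} V  Ig={ig * 1e12:6.1f} pA")
```

The trace shows the qualitative behavior described above: the leakage itself pulls the node down, and the discharge slows as the leakage shrinks. The time scale depends entirely on the assumed capacitance.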

#### **3.3.4. Design of 32-kB 1-port SRAM**

In this section, the proposed local DC level control, which dynamically controls the $V_M$ source lines in the cell array, is explained in detail. Figure 3.14 shows a schematic diagram of the local DC level controller (LDLC). The LDLC controls one $V_M$ line, which is the shared source line of one cell block. The $V_M$ line is connected to four PMOS pull-up transistors (Q1–Q4). Q3 is connected to *VDD* through a single diode, and Q4 is connected to *VDD* through two stacked diodes. The remaining transistors, Q1 and Q2, are connected directly to *VDD*. The DCL0, DCL1, and DCL2 signals select the level to which the $V_M$ potential is lowered in inactive mode. A block enable signal (BS) is introduced to select the cell block when it contains the addressed memory cell for a read or write operation. A KILL signal is introduced to deselect the cell block when it contains failed bits. Such a failed block is replaced with a spare block by a repair circuit; then the power supply of the failed block is cut off to remove its extraordinary leakage current.




Figure 3.13: Wordline driver circuit with *Ig* suppression (a),(b), and simulated timing waveform (c)



Figure 3.14: Schematic diagram of LDLC

The PMOS transistor Q1 pulls up the $V_M$ potential to *VDD* when the BS signal is set to "H" (activated cell block). On the other hand, when the BS signal is set to "L" (inactivated cell block), the $V_M$ potential is controlled by turning on one of Q2–Q4. If the DCL2 signal is "L," the $V_M$ potential becomes $VDD - 2 \times V_{tp}$, where $V_{tp}$ is the PMOS transistor threshold voltage. The $V_M$ potential becomes $VDD - V_{tp}$ if the DCL1 signal is "L." Therefore, the $V_M$ potential takes one of the three levels *VDD*, $VDD - V_{tp}$, or $VDD - 2 \times V_{tp}$, depending on the DCL0, DCL1, and DCL2 signals. Using this circuitry, the optimum level can be selected based on the supply voltage. In this manner, the lower $V_M$ during inactive mode is achieved. The potentials of the $V_M$ source line are presented in Table 3.5. Based on the above preparations, a 32-kB 1-port SRAM is constructed (Fig. 3.15). The LDLC is located at the center of each cell block, which consists of 4 columns and 64 rows. The $V_M$ lines of the selected cell blocks rise to *VDD* via the corresponding LDLC circuits. In the method described here, the $V_M$ potential of the addressed cell block must be raised to the *VDD* level before the corresponding WL is activated. The block-enable signals BS0–7 are connected to the upper pre-decode signal (XH), which transfers the signal faster than



Figure 3.15: Schematic diagram of 32-kB SRAM with *Ig* suppression

the other pre-decode signals (XM, XL) to achieve this functionality. After the WL is deactivated, the $V_M$ potential is gradually lowered to $(VDD-2\times V_{tp})$ by the leakage current over a long cycle. If the same cell block is activated again in the next cycle, the $V_M$ potential is still almost at the *VDD* level, so no extra power is consumed for charging up $V_M$. In standby mode, no cell block is activated for a long time; consequently, all $V_M$ lines are lowered to $VDD-V_{tp}$ or $VDD-2\times V_{tp}$. In the peripheral circuit, the WL driver (AGLSD) proposed in the previous section is used. Thereby, the gate leakage current of the SRAM macro is suppressed dynamically.

Figure 3.16 depicts the simulated waveforms in the read operation particularly when the memory block is switched from inactive to active. In this simulation, the DCL2 is selected as



Figure 3.16: Simulated waveform in read operation

shown in Table 3.5. In inactive mode, the  $V_M$  line voltage is set to  $(VDD-2\times V_{tp})$  of 0.62 V for 1.2-V supply voltage. This waveform confirms that the  $V_M$  line rises to 1.2 V.
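The level selection performed by the LDLC can be summarized as a small behavioral function. This is a sketch, not the circuit of Fig. 3.14: the encoding of the DCL0–DCL2 choice as a number of diode drops and the boolean `killed` flag are hypothetical simplifications, and the effective $V_{tp}$ is inferred from the simulated level quoted above (0.62 V at a 1.2-V supply).

```python
VDD = 1.2
# Effective PMOS diode drop inferred from the simulated level in the text:
# VDD - 2 * Vtp = 0.62 V at VDD = 1.2 V.
V_TP = (1.2 - 0.62) / 2


def vm_potential(block_selected, killed, diode_drops):
    """Return the V_M level in volts, or None when the line is cut off (Hi-Z).

    block_selected: True when BS is "H" (the block holds the addressed cell).
    killed: True when the block was replaced by a spare and its supply cut off.
    diode_drops: 0, 1, or 2 stacked diodes chosen for inactive mode
                 (hypothetical encoding of the DCL0-DCL2 selection).
    """
    if killed:
        return None  # supply disconnected, the line floats
    if block_selected:
        return VDD   # Q1 pulls V_M up for read/write access
    if diode_drops not in (0, 1, 2):
        raise ValueError("diode_drops must be 0, 1, or 2")
    return VDD - diode_drops * V_TP


if __name__ == "__main__":
    print("active block      :", vm_potential(True, False, 2))
    print("inactive, 1 diode :", round(vm_potential(False, False, 1), 2))
    print("inactive, 2 diodes:", round(vm_potential(False, False, 2), 2))
    print("killed block      :", vm_potential(False, True, 2))
```

The function makes the priority explicit: a killed block is always cut off, an addressed block always sits at *VDD*, and the diode selection matters only for inactive blocks.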

#### **3.3.5. Fabrication and Evaluation**

The 32-kB SRAM was fabricated with 90-nm CMOS technology. Figure 3.17 shows a microphotograph of the test chip. The backend process uses dual-damascene Cu metallization for four layers and low-*k* dielectrics [31]. The features of this process are presented in Table 3.6. The supply voltage is 1.2 V, the equivalent oxide thickness is 2.0 nm, and the minimum gate length is 90 nm. An advanced lithographically symmetrical memory cell is used, as in previous studies [22], [25]. The cell size is 1.25 µm<sup>2</sup>. The $V_M$ source lines for cells and the bitlines run vertically using the second Cu layer, and wordlines run horizontally using the third Cu layer. The *VDD* supply lines, GND lines, and the DCL0–2 and KILL signals run vertically using the fourth Cu layer [32].

The cell area is divided into two planes to reduce the resistance of long wordlines. Each plane consists of 8 I/O bits and a column multiplex of 32 per I/O. The total I/O width is 16 bits and the word depth is 16 k. The SRAM includes two sets of spare rows and four sets of spare columns for redundancy. Each set of spare rows includes four rows; each set of spare columns includes four columns. Each half of the cell area has 520 rows and 264 columns of memory cells. The LDLC units are located at well-tap regions in the cell area. Figure 3.18 presents the layout plot of the LDLC. Placing the LDLC increases the cell array region by 11.7%. In addition, applying the AGLSD circuit to the row decoder increases the peripheral area by 20.2%. The physical layout of the 32-kB SRAM macro is 1012 µm × 513 µm and the bit density of this SRAM macro is 493 kbit/mm<sup>2</sup>. Using our gate-leakage suppression scheme, the total area overhead becomes 13.2%. The fabricated 32-kB SRAM was tested and its full functional operation was confirmed. The measured access time was 2.8 ns and the active power dissipation was 0.12 mW/MHz at 1.2 V and room temperature. There was no penalty in speed or active power in comparison to the conventional SRAM. The measured standby leakage current of the 32-kB SRAM was 1.2 μA at 1.2 V and room temperature in the leakage-worst condition. This is less than 10% of that of the conventional SRAM without a gate leakage suppression circuit. Figure 3.19 shows the components of the measured standby current. The standby current of the cell array was reduced to 7% and that of the peripheral circuit to 20% of those of the conventional SRAM, reducing the total standby current to 7.5%. Table 3.7 summarizes the 32-kB SRAM features.
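The array organization and bit density quoted above can be checked with a few lines of arithmetic. In this sketch two points are assumptions rather than statements from the text: the spare columns are taken to be split evenly between the two planes, and 1 kbit is taken as 1024 bits.

```python
CAPACITY_BITS = 32 * 1024 * 8            # 32 kB
IO_BITS, WORDS = 16, 16 * 1024           # 16 I/O bits, 16-k word depth
PLANES, IO_PER_PLANE, COLUMN_MUX = 2, 8, 32
SPARE_ROW_SETS, ROWS_PER_SET = 2, 4
SPARE_COL_SETS, COLS_PER_SET = 4, 4
MACRO_W_UM, MACRO_H_UM = 1012.0, 513.0   # macro dimensions in um


def organization():
    """Return (rows, columns) of one plane, including spares."""
    data_rows = WORDS // COLUMN_MUX
    data_cols = IO_PER_PLANE * COLUMN_MUX
    rows = data_rows + SPARE_ROW_SETS * ROWS_PER_SET
    # Assumption: spare columns are shared evenly between the planes.
    cols = data_cols + SPARE_COL_SETS * COLS_PER_SET // PLANES
    return rows, cols


def bit_density_kbit_per_mm2():
    """Bit density of the macro, with 1 kbit = 1024 bits (assumption)."""
    area_mm2 = MACRO_W_UM * MACRO_H_UM * 1e-6
    return CAPACITY_BITS / 1024 / area_mm2


if __name__ == "__main__":
    print("I/O x words == capacity:", IO_BITS * WORDS == CAPACITY_BITS)
    print("rows, columns per plane:", organization())
    print(f"bit density: {bit_density_kbit_per_mm2():.0f} kbit/mm^2")
```

Under these assumptions the row and column counts and the 493 kbit/mm<sup>2</sup> density follow directly from the stated word depth, column multiplex, redundancy, and macro dimensions.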





Figure 3.18: Layout plot of LDLC



Figure 3.19: Standby leakage measured for 32-kB SRAM

| BS | KILL | DCL0 | DCL1 | DCL2 | $V_M$ potential         | Status of cell block           |
|----|------|------|------|------|-------------------------|--------------------------------|
| L  | H    | H    | H    | L    | *VDD*                   | Inactivated                    |
| L  | H    | H    | L    | H    | $VDD - V_{tp}$          | Inactivated                    |
| L  | H    | L    | H    | H    | $VDD - 2 \times V_{tp}$ | Inactivated                    |
| H  | H    | X    | X    | X    | *VDD*                   | Activated                      |
| X  | L    | H    | H    | H    | Hi-Z                    | Replaced and cut off the power |

Table 3.5: State of cell block

Table 3.6: Process features for 90-nm low-power applications

| Feature             | Value                            |
|---------------------|----------------------------------|
| *VDD*               | 1.2 V                            |
| Minimum gate length | 90 nm                            |
| Tox                 | 2.2 nm (EOT = 2.0 nm)            |
| Metal pitch         | 1Cu: 0.24 µm, 2Cu–4Cu: 0.28 µm   |

Table 3.7: Features of 32-kB SRAM



# **3.4. Active Power Reduction using Column-Based Source Bias Control Technique**

Figure 3.20 is a schematic diagram of the proposed dynamically controlled column bias (DCB) SRAM. The power lines and the ground source lines (VSLs) run through each column of the cell array region. The VSL potentials are controlled dynamically according to the column decode signals. In the activated columns, the VSLs are driven to 0 V by NMOSs placed in each column. On the other hand, the VSLs in the inactivated columns are not driven, so their potential rises gradually, through the leakage current of the cells, to the limit set by the diode-footed transistor. Figure 3.21 shows the measured (median) and simulated relationship between the cell current and the VSL voltage using 90-nm CMOS technology. The substrate bias of the NMOSs and the supply voltage *VDD* are set to 0 V and 1.2 V, respectively. As the VSL potential increases, the cell current decreases rapidly because the back gate bias effect appears in addition to the decrease in



Figure 3.20: Active power reduction by column based source bias control



Figure 3.21: Reduction of cell current

 $V_{gs}$ . Both the measurement and the simulation results show that the cell current decreases to about 20% when VSL rises to 0.4 V at the R.T.

Figure 3.22(a) shows simulated waveforms in a readout operation. Because of the decrease of the cell current explained above, the potential of BL1 in the inactive column decreases more slowly than that of BL0 in the active column. The BL swing in the inactive column becomes about one-fifth of that in the active column (Δ*Vactive*). Only 16 of the 512 columns are activated, so the undesirable power discharged in the inactive columns is suppressed. Figure 3.22(b) shows the waveforms for the subsequent two cycles. The first cycle is the same as that shown in Fig. 3.22(a). In the second cycle, we assume that column 1 is active, whereas column 0 is inactive. In this situation, the lowered VSL0 potential is raised by the BL0 discharge through the diode-footed transistor. In the third cycle, if column 1 remains selected, VSL0 rises to 0.4 V. Thereby, the lowered VSL can be raised to a higher level. Because the potential of VSL1 stays at 0 V, the power dissipation of the transition overhead can be ignored.
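The effect of the reduced swing on bitline discharge can be estimated with a simple ratio. This sketch covers only the bitline component, assuming equal bitline capacitance in every column and discharge energy proportional to the swing; it is therefore not comparable with macro-level measured power, which also includes the peripheral circuits.

```python
TOTAL_COLUMNS = 512
ACTIVE_COLUMNS = 16
INACTIVE_SWING_RATIO = 0.2  # inactive BL swing is about one-fifth of the active swing


def relative_bitline_charge(total=TOTAL_COLUMNS, active=ACTIVE_COLUMNS,
                            ratio=INACTIVE_SWING_RATIO):
    """Bitline charge per read, normalized to all columns swinging fully."""
    inactive = total - active
    return (active + inactive * ratio) / total


if __name__ == "__main__":
    rel = relative_bitline_charge()
    print(f"relative bitline charge with DCB: {rel:.3f}")
    print(f"bitline charge saved            : {1 - rel:.1%}")
```

Because 496 of the 512 columns are inactive in each access, nearly all of the saving comes from the reduced swing in the inactive columns.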

We show the measured read stability SNM in Fig. 3.23 to verify the data stability of the memory cell using our DCB. Regarding the activated column (VSL=0V), the SNM in the activated row is 179 mV. Even when the VSL rises to 0.4 V, the SNM stays at 104 mV, which is sufficient to hold the data. Therefore, raising the VSL potential does not affect the data stability of the cells.



Figure 3.22: Simulated waveform of DCB SRAM

Figure 3.24 depicts a die photograph of the 32-kB DCB SRAM fabricated in 90-nm CMOS technology. Figures 3.25 and 3.26 show the effects of power reduction on a 32-kB DCB SRAM. With the DCB scheme, the active power is reduced to 71 μW/MHz at R.T. and 1.2-V supply voltage, a 64% reduction compared to that without DCB. The standby leakage current of the cell array is reduced to 31% of that without the DCB scheme. In sleep mode, the power in the peripheral circuit can be cut off completely, so the total standby current becomes 0.45 μA at R.T., which is a 93% reduction compared to that without DCB. By applying DCB, a low-power embedded SRAM can be achieved in 90-nm CMOS technology.



Figure 3.23: Measured static noise margin in reading operation



Figure 3.24: Die photograph of 32-kB DCB SRAM



Figure 3.25: Estimated active power reduction



Figure 3.26: Estimated leakage reduction

## **3.5. Summary**

This chapter described how to reduce the leakage current in deep-submicron technology. A low-power, low-voltage SRAM using Auto-Backgate Controlled MT-CMOS was proposed. Test chips containing a 4-kB gate array SRAM were designed and fabricated. The experimental results demonstrated that this method reduced undesirable leakage currents to 1/1000 in sleep mode while retaining the data stored in the memory cells. The active power at 1.0 V was reduced to 1/12 of that of a conventional SRAM operating at 3.3 V.

A technique for suppressing gate leakage current for embedded SRAM in SoC was proposed in this chapter. To reduce the gate leakage current in a cell array, it adopts an LDLC circuit, which controls the potential of the source line of the memory cells dynamically. An AGLSD circuit, which can reduce gate leakage current in the peripheral circuit, was also proposed. A 32-kB SRAM using 90-nm CMOS technology was designed and fabricated. Results of tests using that SRAM show that standby leakage current was reduced to 7.5% of that of a conventional SRAM without a speed penalty. This improvement will help to realize high-performance ultra-low-power SoCs for use in mobile applications.

# **Chapter 4 Read/Write Stability Enhancement Technique for SRAM**

Variation-tolerant assist circuits for an SRAM, which are robust against process and temperature variations, are proposed. Passive resistances are introduced to the read assist circuit with replica memory transistors to lower the wordline voltage while accurately reflecting the process and temperature variations. To enlarge the write margin while also reducing power consumption and speed overhead, a divided dynamic power-line scheme based on charge sharing is adopted. Test chips of 512-kbit SRAM macros and isolated memory cell TEGs are fabricated using 45-nm bulk CMOS technology. Two types of 6T SRAM cells, $0.245 \mu m^2$ and $0.327 \mu m^2$, were designed and evaluated. Measurement results show that the assist circuitry improves the static noise margin by more than 100 mV and the write margin by 35 mV for both SRAM cells at the 1.0-V worst condition. It enables the wordline level to maintain a higher voltage at the slow process condition than at the typical process condition, which results in 83% improvement of the cell current compared to a conventional assist circuit. Furthermore, the minimum operating voltage in the worst-case condition was improved by 170 mV, thereby confirming a high immunity against process and temperature variations with less than 10% area overhead.

### **4.1. Introduction**

Embedded SRAM in sub-50-nm advanced CMOS technology for system-on-chip (SoC) is facing a crisis of increasing threshold voltage ($V_{th}$) variation within a die caused by doping fluctuation [33,34] or line edge roughness (LER) [35]. It is readily apparent that the large $V_{th}$ variation induces asymmetry in DC characteristics, which deteriorates both the static noise margin (SNM) and the write ability [36]. According to the 2005 Edition of the ITRS road map [37], on-chip memory capacity in 2013 will be 10 times as large as that in 2005. The variation-induced stability degradation of a 6T SRAM cell becomes more apparent as the total memory capacity on a chip increases, which makes it a crucial factor preventing the scaling down of the SRAM supply voltage.

Until 65-nm technology, 6T SRAM cell sizes were shrunk according to the scaling trend of being half the size of cells of the preceding generation. However, in the sub-50-nm era, it has become more difficult to keep up with the scaling trend because of increasing local $V_{th}$ variations. Lithography and process technology are disincentives to scaling down of an SRAM cell size, but



Figure 4.1: 6T SRAM cell schematic and butterfly curves under worst and best conditions

the main constraint is that the transistor sizes in an SRAM cell cannot be shrunk because of the increase in such local  $V_{th}$  variations with scaling. To overcome this problem, several approaches for improving the device structure have been proposed to reduce local  $V_{th}$  variations by optimizing the channel profile [38], suppressing LER [39], or using a high-*k* material for the gate dielectric [40]. Other anticipated devices such as FD-SOI, Fin-FET, and double-gate are being investigated as candidates for the next sub-50 nm generation. Although these new devices are expected to suppress the increase of  $V_{th}$  variations caused by a decrease in the amount of dopant implantation, there is a concern about higher manufacturing costs to realize these new structures. Therefore, a better alternative is to reduce the manufacturing cost by continuing to use bulk CMOS technology with improved stability using circuit-based techniques.

Dual supply voltage schemes were proposed to improve the cell stability by circuit technique [41, 42]. These schemes use higher supply voltage for the memory cell array and a lower voltage level for peripheral blocks including a wordline (WL) driver circuit to improve the read margin (SNM). For enhancing the write capability, a power-floating scheme in the writing operation was reported for 90-nm SRAM [43]. The prior art [42] was also proposed to improve the write capability using a dynamic power supply. According to the latest reports, the short WL and bitline (BL) pulse scheme and write-back scheme were proposed [44, 45]. These approaches have an effect on global or local  $V_{th}$  variations, but few advantages against temperature. Optimal timing control or a voltage level would be difficult to design against temperature and process variations



Figure 4.2: Simulated static noise margin (SNM) versus NMOS  $V_{th}$ 

in manufacturing. On the other hand, body-biasing schemes with replica monitor circuits have been reported [46, 47]. These methods mainly reduce the leakage current while maintaining the operating speed or the data retention margin in the standby mode. Although such schemes help to reduce the global $V_{th}$ variations, the SRAM stability against the local $V_{th}$ variations would not be improved considerably. The 7T, 8T, and 10T SRAM cells [48–50] were proposed as alternatives to the 6T SRAM cell design, although they entail an increased area overhead. A simple method for improving the read and write stabilities of an SRAM was proposed for 65-nm technology [51]. It was available only for a single supply voltage with small area and power overhead.

In this chapter, the disadvantages of the previous work under particular conditions are pointed out first; then it is shown that the proposed read and write assist circuits enlarge the operating margin against wide process and temperature variations. The proposed read assist circuit is controlled adaptively to maintain the cell stability against $V_{th}$ variations. Its stability immunity is demonstrated through simulation and evaluation results using 45-nm bulk CMOS low-standby-power (LSTP) technology [52].

This chapter is organized as follows. In section 4.2, the concept of improving read stability is introduced first. In the next section, enhanced read/write circuitry improving SRAM cell stability against process and temperature variations is discussed. In particular, the advantage of new assist circuit is demonstrated by comparing it with the conventional scheme. In section 4.4,



Figure 4.3: SNM improvement by lowering wordline voltage  $(V_{\text{WL}})$ 

two kinds of memory cells are introduced, and the simulated and measured DC characteristics for these SRAM cells are discussed. In section 4.5, evaluation results of test chips fabricated on 45-nm bulk CMOS technology are also shown. Finally, a brief summary is presented in section 4.6.

### **4.2. Concept of Improving Read Stability**

Figure 4.1(a) shows the full-CMOS 6T SRAM cell schematic that widely prevails among SoC circuits. The SRAM read margin is well expressed by the SNM. Figure 4.1(b) shows the simulated butterfly curves without local $V_{th}$ variations using our original 45-nm node SPICE model under two different conditions: one is the fast-slow (FS) global process corner at high temperature (125°C); the other is the slow-fast (SF) corner at low temperature (-40°C). Here, FS (SF) means the combination of fast (slow) NMOS and slow (fast) PMOS, and the supply voltage is set to the minimum condition of 1.0 V. The butterfly curves without local $V_{th}$ variations show good bilateral symmetry; the SNM strongly depends on the temperature and global process variations. In particular, the SNM has a minimum value at the FS corner. For that reason, the SNM in the FS process condition is specifically addressed hereinafter. Figure 4.2 shows the SNM dependence on the NMOS $V_{th}$ variation for three temperatures (-40, 27, 125°C). The SNM degrades at lower NMOS $V_{th}$ or at a higher temperature condition, although it improves at higher NMOS $V_{th}$ or lower temperature condition. In this way, by taking the temperature and



Figure 4.4: Schematics of read assist circuits (RAC)

process global variation into consideration, the SRAM transistor size is designed so that the operating margin becomes larger. Consequently, the SNM is expected to be ensured at the worst case of FS process condition at high temperature. Although only the read margin is discussed here, it is also necessary to examine, precisely, the write margin against temperature and process variation. By performing a similar analysis, one can see that the SF process corner at low temperature provides the worst condition of the write operation.

Next, a technique to enlarge the SNM in the presence of local $V_{th}$ variation is discussed. In this section, a methodology to lower the voltage level of the WL compared to that of the power line of the flip-flop in an SRAM cell is suggested [51]. Figure 4.3(a) shows SNMs without local $V_{th}$ variations at the worst condition (discussed above), where the lowered voltage level of the WL is introduced. As shown in this graph, it is confirmed that the SNM is improved by lowering the WL level while maintaining the symmetric margin. In contrast, Fig. 4.3(b) presents the result in



Figure 4.5: Simulation result of WL voltage  $(V_{WL})$  depending on NMOS  $V_{th}$ 

the same simulated condition, but the local $V_{th}$ variation is additionally introduced to each SRAM transistor. Because randomness of the local $V_{th}$ variation destroys the symmetry of the butterfly curve, the SNM without lowering the WL level in Fig. 4.3(a) is reduced to less than zero, which means that no read margin remains. Assuming that the local $V_{th}$ variation obeys a normal probability distribution, this deterioration of the SNM becomes statistically more apparent as the total capacity of the SRAM memory array increases. In other words, a unit cell with asymmetric degradation of the SNM is more likely to appear in a 1-Mb SRAM array than in a 1-kbit array [53]. Consequently, reducing the WL level is necessary so that the SNM with local $V_{th}$ variation becomes larger than zero. Using the results obtained in Fig. 4.3(b), the WL level should be lowered by more than 20% relative to the supply voltage of 1.0 V.
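The dependence on array capacity follows from simple order statistics. As a minimal sketch, assume that each cell's deviation is an independent standard-normal variable (normality is the assumption stated in the text; independence and the helper names are additional assumptions made here). The sigma level that the worst cell of an N-cell array is expected to reach can then be estimated from the inverse normal CDF.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # zero mean, unit sigma

def worst_cell_sigma(n_cells: int) -> float:
    """Sigma level exceeded on average by one cell out of n_cells.

    Solves P(deviation > z) = 1 / n_cells for z.
    """
    return STD_NORMAL.inv_cdf(1.0 - 1.0 / n_cells)

def prob_any_cell_beyond(n_cells: int, z: float) -> float:
    """Probability that at least one of n_cells deviates by more than z sigma."""
    p_single = 1.0 - STD_NORMAL.cdf(z)
    return 1.0 - (1.0 - p_single) ** n_cells

sigma_1k = worst_cell_sigma(1024)
sigma_1m = worst_cell_sigma(1024 * 1024)
p_1k = prob_any_cell_beyond(1024, 4.0)
p_1m = prob_any_cell_beyond(1024 * 1024, 4.0)

print(f"1-kbit array: worst cell near {sigma_1k:.2f} sigma, P(any > 4 sigma) = {p_1k:.3f}")
print(f"1-Mbit array: worst cell near {sigma_1m:.2f} sigma, P(any > 4 sigma) = {p_1m:.3f}")
```

Under these assumptions a 4-sigma cell is rare in a 1-kbit array but almost certain to exist in a 1-Mbit array, which is why the margin must be designed for a larger sigma level as capacity grows.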

Lowering the WL voltage entails disadvantages not only of the operation speed (or cell current) but also of the write operation because the lowered WL reduces the ability of the access (pass-gate) transistor in an SRAM cell, which directly affects the operation speed and write margin. Therefore, it is necessary to search for the optimum condition to enhance the SNM to the greatest extent possible, with a minimum loss of both the write margin and operation speed. In the next section, a method to control the WL voltage level is described.



Figure 4.6: Operating analysis of read assist circuit by SPICE simulation


Figure 4.7: Practical read assist circuit enhanced sensitivity of process variation

### **4.3. Variation Tolerant Read/Write Assist Circuits**

#### **4.3.1. Read Assist Circuit**

Figure 4.4 shows a read assist circuit (RAC) used to control the wordline voltage ($V_{WL}$). In the conventional circuitry [51], as portrayed in Fig. 4.4(a), the $V_{WL}$ is lowered by plural pull-down NMOSs (called replica access transistors, or RATs). Figure 4.5 shows the simulated $V_{WL}$ dependence on the NMOS global $V_{th}$ variations at high and low temperatures, which reveals two serious problems in the conventional circuit (see dotted lines). The first is a considerable decrease of $V_{WL}$ in the FS condition at low temperature. An excessive lowering of the $V_{WL}$ strongly enhances the SNM but in turn degrades the write margin because of the low gate overdrive of the access transistor. The second is a decrease in the $V_{WL}$ in the SS condition as well. In Fig. 4.5, the $V_{WL}$ at SS, which is the worst condition of the cell current, becomes lower than that at TT because



Figure 4.8: Layout of each passive resistance for proposed RAC



Figure 4.9: Resistance sensitivities dependence on critical dimension (CD) shift

of its higher NMOS  $V_{th}$ , engendering further degradation of the cell current.

The reason for this unexpected $V_{WL}$ degradation in the conventional circuit is examined next. Figure 4.6(a) shows the simulated I-V curves for the RAT and P0. Each simulation was performed at the worst-case supply voltage of 1.0 V for -40°C and 125°C. In the static condition, the current that flows through P0 (IDD) equals that through the RAT (ISS), so that the $V_{WL}$ is determined by the cross point between these two I-V curves. In contrast to the RAT, P0 has a strong dependence on the temperature, which causes a wide range of $V_{WL}$ fluctuation, as



Figure 4.10: Comparison of WL voltage ($V_{WL}$) in RAC with enhanced gate controller (GC) and without GC

portrayed in Fig. 4.5(a). In other words, conventional circuits have disadvantages against process  $V_{th}$  variations and temperature fluctuation.

To overcome this problem, the new circuitry portrayed in Fig. 4.4(b) is proposed. The RATs are introduced to the source of the WL driver, not to the WL in the memory cell array. In addition, passive resistance elements (R) formed from N-type non-silicide poly-silicon gate are used. This structure enables control of the WL voltage, reflecting both the process and the temperature variations. Figure 4.6(b) shows that the IDD current obeys the ohmic characteristic, whereas the ISS current conforms to the transistor I-V curve. Because the temperature dependence of the resistance element is much smaller than that of the drain current in the saturation region, it can suppress the $V_{WL}$ fluctuation against temperature by eliminating the PMOS characteristic. Furthermore, it is noteworthy that the use of the resistance elements presents a great advantage related to the $V_{th}$ change associated with the process variation. In general, the gate length of the poly-silicon in the SS condition is longer than that in the TT condition, which induces a higher $V_{th}$ value. At the same time, the resistance of the element using the poly-silicon gate is expected to be lower because of the enlarged cross-sectional area. Using Fig. 4.6(b), it is readily inferred that a lower resistivity of R and a higher NMOS $V_{th}$ (smaller ISS) at the SS



Figure 4.11: Simulated waveform of proposed RAC

corner engender a higher $V_{WL}$, whereas a higher resistivity of R and a lower NMOS $V_{th}$ at the FF corner engender a lower $V_{WL}$. Considering these facts together, it can be inferred that the $V_{th}$ of the RATs dominates the $V_{WL}$ value with little temperature dependence. The validity of these $V_{WL}$ behaviors is verified through our simulation, as portrayed in Fig. 4.5 (see solid lines).
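The cross-point argument can be made concrete with a small numerical sketch. The model below is an assumption for illustration only: the RAT is represented by a single long-channel square-law NMOS with its gate at VDD, the load by an ideal resistor, and all parameter values (`K`, `vth`, `r`) are hypothetical rather than taken from the 45-nm process. It solves (VDD − V)/R = I_RAT(V) for the node voltage V by bisection.

```python
def rat_current(v, vdd, vth, k):
    """Square-law NMOS drain current with gate at vdd and drain at v (assumed model)."""
    vov = vdd - vth                      # gate overdrive
    if vov <= 0.0:
        return 0.0
    if v < vov:                          # triode region
        return k * (vov * v - 0.5 * v * v)
    return 0.5 * k * vov * vov           # saturation region

def operating_point(vdd, r, vth, k, iters=60):
    """Node voltage at which the resistor current equals the RAT current."""
    lo, hi = 0.0, vdd
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        i_dd = (vdd - mid) / r           # ohmic load current
        i_ss = rat_current(mid, vdd, vth, k)
        if i_dd > i_ss:                  # node is still being pulled up
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

VDD = 1.0
K = 1.0e-4  # A/V^2, hypothetical transconductance parameter

# Hypothetical corners: slow = higher Vth and lower R, fast = lower Vth and higher R.
corners = {
    "FF": {"r": 11.0e3, "vth": 0.35},
    "TT": {"r": 10.0e3, "vth": 0.45},
    "SS": {"r": 9.0e3, "vth": 0.55},
}
v_node = {name: operating_point(VDD, p["r"], p["vth"], K) for name, p in corners.items()}

for name, v in v_node.items():
    print(f"{name}: node voltage = {v:.3f} V")
```

With these illustrative numbers the node voltage is highest at the slow corner and lowest at the fast corner, which is the ordering described in the text; the absolute values carry no significance.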

Figure 4.7 depicts the overall architecture implementing the proposed RAC. Additional circuitry denoted as the Gate Controller (GC) has been introduced, which consists of serial resistors (R1 and R2) and the switching transistors (P1 and N1). By applying this circuitry, one can realize a highly tolerant read assist circuit against the PVT variations rather than a simple pull-down transistor.

Before explaining the functionality of the GC, the layout of these critical resistors is described. Figure 4.8 displays a schematic layout view of our resistance elements. To reflect the process variation of the gate length *L* and width *W* of the access transistor in the SRAM memory cell, the line width of non-silicide poly-silicon (R0, R2) and the non-silicide diffusion elements (R1) respectively have the same size as *L* and *W*. In addition, the poly-silicon pitch of R0 and R2 is equal to that of the SRAM memory cell, engendering the same resistance sensitivity in conjunction with the access transistor of the SRAM cell. Figure 4.9 shows the resistance sensitivities of R0, R1, and R2 to the critical dimension (CD) shift of the gate and the diffusion sizes. The CD shift, which results from manufacturing variations in the lithography or etching process steps, influences the global  $V_{th}$  variations. For simplicity, the SS corner is specifically



Figure 4.12: Write assist circuit (WAC) improving write ability

described to explain the behavior of the additional GC. The SS corner indicates a high $V_{th}$ and a low $I_{ds}$ condition, implying that the line width of the poly-silicon increases and that of the diffusion decreases. In other words, at the SS corner a positive CD shift of the poly-silicon decreases the resistivity of R0 and R2, whereas a negative CD shift of the diffusion increases the R1 resistivity (see Fig. 4.9). The lowered resistance of R2 relative to R1 contributes to pulling down the NB node, which prevents the NA node from decreasing excessively. Consequently, a higher voltage level of the $V_{WL}$ is realized at the SS condition than in the case in which a simple pull-down transistor is applied for the RAT.
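This qualitative argument can be summarized by a voltage-divider relation. Assuming, as the description suggests but does not state explicitly, that R1 connects node NB to the supply and R2 connects NB to ground while the GC is enabled, and neglecting the on-resistance of the switching transistors, the intermediate level of NB is approximately

$$V_{NB} \approx VDD \cdot \frac{R_2}{R_1 + R_2}$$

Under this assumption, a smaller R2 together with a larger R1 at the SS corner lowers $V_{NB}$. If NB drives the gates of the RATs, their pull-down strength is then reduced and node NA is lowered less, which is consistent with the higher $V_{WL}$ described for the SS condition.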

Accepting that the effectiveness of our RAC with the GC has been proven qualitatively, its validity should next be verified by simulating  $V_{WL}$ . Figure 4.10 shows the simulated result of the  $V_{WL}$  against NMOS  $V_{th}$  variation, or process variation. The result of the RAC without the GC is



Figure 4.13: The simulated waveform of the ary-VDM and dmy-VDM in the write status

also shown to portray the advantage of our RAC with the additional GC. In the NMOS slow corner (SF, SS), the  $V_{WL}$  with GC is higher than that without GC; for that reason, the degradation of cell current will be suppressed. Conversely, in the NMOS fast corner (FS, FF), the  $V_{WL}$  with GC is lower than that without GC; the read stability SNM will be improved. Figure 4.11 depicts the simulated waveforms, particularly addressing WL activation. In the inactivated blocks of rows, the node NA voltage level  $V_{NA}$  remains at *VDD*. The node NB voltage level  $V_{NB}$  rises to intermediate voltage level fixed by R1 and R2 when the node XR voltage level  $V_{XR}$  declines to a low level. Then the  $V_{NA}$  is dropped by the pull-down RATs. The substantial capacitance of the node NA makes its voltage drop gradually. Consequently, the RAC demonstrably does not affect the rising speed of the WL voltage.

#### **4.3.2. Write Assist Circuit**

Figure 4.12 shows the write assist circuit (WAC). Lowering the voltage level of the power line in the memory cell array (ary-VDM) is an effective means of ensuring the SRAM write margin [42, 51]. The capacitive WAC makes use of the capacitance ratio between the ary-VDM (Cav) and the additional dmy-VDM (Cdv) [51]. The dmy-VDM is wired individually by the



Figure 4.14: Voltage of ary-VDM depending on the division number

fourth metal layer above the cell array in each column. The voltage of ary-VDM must be lowered immediately, which requires a small Cav/Cdv to enhance the write margin against the increasing variation that accompanies scaling. However, enlarging the Cdv degrades the speed and power of the write operation. Therefore, a divided ary-VDM scheme is proposed. In Fig. 4.12, the signals $XH_0$ to $XH_7$ are the most significant pre-decode signals of the row decoder, which select one of the segment arrays. Each two-input NAND gate is connected to the bitline (BL) pair in each column of the segment arrays. In the NOP state, where all BL pairs are pre-charged to the high level, all ary-VDM lines are connected to the power source line through the pull-up PMOS transistors, and all dmy-VDM lines are connected to the ground line through the stacked pull-down NMOS transistors. In the read state, although one BL of the pair is slightly lowered in each column by the cell current flowing into the activated memory cell, the voltage levels of the BLs remain sufficiently high that all ary-VDM and dmy-VDM lines maintain the high level and low level, respectively. In the write state, because one BL of the pair is forced low, both the ary-VDM and dmy-VDM lines are put into a floating state. Subsequently, when the corresponding XH signal is activated, the corresponding ary-VDM line falls to the intermediate voltage level determined by Cav/Cdv. As shown in Fig. 4.12, the column multiplex, data I/O width, and number of divided segments are 32 columns, 16 bits, and 8 segments, respectively, and each column in a segment contains 64 memory cells, as demonstrated in the following section.

Figure 4.13 shows the simulated waveform of the ary-VDM lowering for several segment division numbers (#div = 1, 2, 4, 8). In this simulation, it is assumed that the Cdv is a constant



Figure 4.15: The comparison of the write ability by dc simulation result of the write-trip-point

value determined by the wire length of the fourth metal of the 512 memory cells (#row=512). The simulation condition is room temperature (RT) and the typical supply voltage of 1.2 V. In the case of #div being equal to 1, the voltage level of ary-VDM falls from 1.2 V to 1.13 V (-0.07 V), whereas it falls to 0.68 V (-0.52 V) in the case of #div being equal to 8 because of the small Cav/Cdv. Figure 4.14 shows the voltage of ary-VDM as a function of the division number at the worst-case condition ($VDD=1.0$ V, SF-process, -40°C). According to $V_{th}$ curve simulation [53], to ensure the write margin in our 45-nm CMOS technology, the voltage level of ary-VDM must be 0.7 V at the worst-case $VDD = 1.0$ V operation, so that the ary-VDM is divided into eight, as portrayed in Fig. 4.12.
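The trend with the division number follows from charge sharing between the two floating lines. The sketch below assumes ideal charge conservation between one segment of ary-VDM precharged to VDD and a dmy-VDM line precharged to ground, with the ary-VDM capacitance scaling as 1/#div; the capacitance values are hypothetical. Because it ignores the NMOS pass-gate threshold drop, cell currents, and voltage-dependent capacitances, it is not expected to reproduce the simulated levels quoted above, only their ordering.

```python
def shared_voltage(vdd, c_av_total, c_dv, n_div):
    """Ideal charge-sharing level of one ary-VDM segment (assumed model).

    c_av_total: ary-VDM capacitance of the undivided column.
    c_dv:       dmy-VDM capacitance, kept constant as in the text.
    n_div:      number of segments the ary-VDM line is divided into.
    """
    c_av = c_av_total / n_div
    return vdd * c_av / (c_av + c_dv)

VDD = 1.2                # typical supply voltage from the text
C_AV_TOTAL = 160e-15     # hypothetical, farads
C_DV = 10e-15            # hypothetical, farads

levels = {n: shared_voltage(VDD, C_AV_TOTAL, C_DV, n) for n in (1, 2, 4, 8)}
for n, v in levels.items():
    print(f"#div = {n}: ary-VDM settles near {v:.3f} V")
```

Dividing the line reduces Cav while Cdv stays fixed, so the ratio Cav/Cdv falls and the shared voltage drops further with each doubling of #div.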

Figure 4.15 shows the DC simulation result of the improvement of write ability defined by the write-trip-point [54]. The voltage levels of internal nodes Q and QB of a memory cell (as shown in Fig. 4.1) are plotted for #div=1, 2, 4, and 8 when one line of the BL pair (BL) is changed from 1.0 V to 0.0 V and the other (BLB) is held constantly at the pre-charge level of 1.0 V. The initial condition is set Q equal to low level and QB equal to high level. As the voltage levels of BL and node Q decrease, the voltage level of node QB is flipped. The cross point of the input BL (Q) and output QB is defined as the write-trip-point. The higher voltage level of the



Figure 4.16: Estimation of power reduction in the write cycle

write-trip-point has the higher write ability; that is, the data are easier to flip. After the data flip, the voltage level of QB in each case of #div settles to the lowered voltage level of the ary-VDM determined by Cav/Cdv, as described previously. Results show that the write-trip-point of the proposed #div=8 scheme is 0.41 V at the worst condition, which is an improvement of 0.2 V over the conventional WAC (#div=1).

The proposed divided WAC contributes to the suppression of the power overhead in the write operation compared to the conventional WAC [51]. Figure 4.16 portrays the estimated power reduction in a write cycle at the memory cell array (MAT), depending on the number of segment divisions. The reference power is that without the WAC for a column multiplex of 32, in which the power is dissipated by one full-swing BL and the read cell currents of the other 31 half-selected cells. The conventional WAC has a 68% power overhead because the large capacitances of ary-VDM and dmy-VDM must be charged and discharged in every write cycle. In contrast, the proposed eight-divided WAC has only an 8% power overhead because of the small capacitances of ary-VDM and dmy-VDM. The case of a column multiplex of 4, which is frequently used in small embedded SRAM macros of SoCs, is also plotted. Results show that the write power is further suppressed by dividing the ary-VDM.

#### **4.3.3. Simulation Result**

Figure 4.17 shows the practical simulated waveforms of a 512-kbit SRAM macro in the read and write cycles in the best condition (FF-process, 1.4 V, 125°C) and the worst condition (SS-process, 1.0 V, -40°C). The simulated clock cycle is 4.0 ns (250-MHz operation). The first and the third clock cycles are in the write state, whereas the second and the fourth clock cycles are in the read state. Because the proposed RAC acts during both the read state and the write state, the $V_{WL}$ is decreased, reflecting the difference of the process condition: a 30% reduction in the FF condition but an 8% reduction in the SS condition. On the other hand, during the write cycle, the $V_{AV}$ at the SS (worst) condition is lowered from 1.0 V to 0.75 V immediately after the BL starts swinging, which satisfies the requirement to obtain a sufficient write margin against the process variation. In the worst condition, the voltage levels of dmy-VDM and ary-VDM are not equalized because of the NMOS pass-gate, as portrayed in Fig. 4.12. This voltage difference can be avoided only by introducing a PMOS pass-gate. However, the equalization of these nodes lowers the $V_{AV}$ level, which degrades the retention margin of unselected memory cells in the activated column. The issue of the retention margin must be addressed, especially in the case of lower supply voltage. In addition, introduction of the extra PMOS engenders an area penalty. In this way, from the viewpoint of the retention margin and the area penalty, a single NMOS pass-gate between ary-VDM and dmy-VDM is adopted.



Figure 4.17: Simulated waveform in the read and write cycle

# **4.4. Fabrications and Evaluations in 45-nm technology**

### **4.4.1. Fabricated 6T SRAM Cells**

The high-density and normal 6T SRAM cells are designed and fabricated using 45-nm bulk CMOS technology with the SiON gate. The high-density cell (0.245 $\mu$m<sup>2</sup>) was shrunk by 50% and the normal one (0.327 $\mu$m<sup>2</sup>) by 35%, given that the SRAM cell size in the 65-nm technology node is 0.49 $\mu$m<sup>2</sup> [51]. Figures 4.18(a) and 4.18(b) show SEM photographs of the 6T SRAM cells after poly-silicon etching. The critical-dimension layers are patterned using ArF immersion exposure technology with a high numerical aperture (NA). As portrayed in Fig. 4.18, the poly-silicon gates have an almost straight pattern to reduce the intra-die $V_{th}$ variations for both high-density and normal-density cells. Moreover, the high-density cell has a straight NMOS diffusion pattern to achieve a small cell area. Although a small electrical beta ratio degrades the SNM of the high-density SRAM, this disadvantage can be compensated by applying our assist circuits, as explained in the next section.

### **4.4.2. Measured DC Characteristics of 6T SRAM Cells**

To evaluate DC characteristics of two types of 6T SRAM cells, isolated memory cell TEGs are designed and fabricated to measure the butterfly curves (SNM), write margins [54] and cell currents. According to the measurement result of the butterfly curves, as portrayed in Fig. 4.19, the SNM of 0.327  $\mu$ m<sup>2</sup> without the RAC is 150 mV, whereas that with the RAC is 214 mV at the





Figure 4.18: SEM images of 6T SRAM cells: (a) normal cell (0.327 $\mu$m<sup>2</sup>), (b) high-density cell (0.245 $\mu$m<sup>2</sup>)

1.0 V and 125°C. Furthermore, a large SNM of more than 200 mV is observed for the high-density SRAM cell, which verifies the effect of the assist circuit.

Figure 4.20(a) shows the measured relationship between the cell current and the SNM for



Figure 4.20: Measured dc characteristics of 6T SRAM cells

several wafers with different $V_{th}$ conditions at a supply voltage of 1.0 V. All points indicate the mean values measured in each wafer. By introducing our RAC, it is confirmed that the SNM in the Fast-Slow (FS) process condition for both 0.245 $\mu$m<sup>2</sup> and 0.327 $\mu$m<sup>2</sup> increases by about 100 mV. On the other hand, the cell current at the Slow-Slow (SS) process condition in 0.327 $\mu$m<sup>2</sup> with the RAC is improved by 83% compared to that of the conventional RAC. In addition, the cell current at FS is drastically improved compared to that of the conventional RAC. These improvements result from the $V_{WL}$ dependence on the temperature, so that the RAC realizes a stable read margin without degradation of the operation speed over the wide temperature range. Figure 4.20(b) also shows the measured write margins. In the circuitry, the write margin is affected not only by the WAC, but also by the RAC. The lower write margin at the Slow-Fast (SF) process condition without an assist circuit is improved because $V_{AV}$ rather than $V_{WL}$ is dominant. In the FS process condition, because the $V_{WL}$ decrease becomes comparable to that of the $V_{AV}$, the WAC has little effect on the write margin. This results in the suppression of the write margin. However, the write margin at FS is kept higher than that at SF, so SF remains the worst condition for the write margin. Consequently, for overall process variation, the write margin was improved by about 35 mV compared to the case without assist circuits.

### **4.4.3. Implementations and Measurement Results of 512-kbit SRAM Macros**

SRAM test chips were fabricated with 45-nm advanced low-standby-power bulk CMOS technology. Figure 4.21 shows a die photograph of the test chip for the normal-density SRAM cell. It consists of two conventional 256-kbit SRAM macros and two proposed 256-kbit ones. The physical layout size of the latter 256-kbit normal-density SRAM macro is $550 \times 305\ \mu\mathrm{m}^2$. The layout plot of the proposed 256-kbit SRAM macro is portrayed in Fig. 4.22. The memory cell array is divided into two MATs, each of which consists of eight I/O bits and a column multiplex of 32 per data I/O. The total data I/O width is 16 bits and the word depth is 16-K. One MAT of the cell area has 512 rows by 256 columns of memory cells. The RAC units are located between the row decoder and each MAT. The WAC units are located at well-tap regions in the cell area, which is divided into eight segments as described previously. The area penalty of the read and write assist circuits is 7% and 9% for the normal and high-density SRAM macros, respectively. The bit densities of the SRAM macros are 1.57 Mbit/mm<sup>2</sup> and 2.12 Mbit/mm<sup>2</sup> for the normal and high-density SRAM, respectively. The features of the fabricated SRAM macros are presented in Table 4.1.
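The macro organization quoted above can be cross-checked with a few lines of arithmetic. The sketch uses only numbers stated in this section; the variable names are illustrative, and 1 Mbit is taken as 10^6 bits for the density estimate (an assumption, because the convention is not stated).

```python
ROWS, COLS, MATS = 512, 256, 2      # memory cell array per macro
IO_BITS, MUX = 16, 32               # data I/O width and column multiplex
WORDS = 16 * 1024                   # 16-K words

bits_from_array = ROWS * COLS * MATS
bits_from_words = IO_BITS * WORDS
cols_per_mat = (IO_BITS // MATS) * MUX   # eight I/O bits per MAT, 32 columns each

# Normal-density macro: 550 um x 305 um, converted to mm^2.
area_mm2 = 0.550 * 0.305
density_mbit_per_mm2 = bits_from_array / area_mm2 / 1e6

print(f"bits from array organization : {bits_from_array}")
print(f"bits from word organization  : {bits_from_words}")
print(f"columns per MAT              : {cols_per_mat}")
print(f"estimated bit density        : {density_mbit_per_mm2:.2f} Mbit/mm^2")
```

Both organizations give 262,144 bits (256 kbit), and eight I/O bits with a multiplex of 32 give the 256 columns per MAT. The density estimate comes out close to, though not exactly at, the quoted 1.57 Mbit/mm<sup>2</sup>; the small difference presumably reflects rounding of the macro dimensions.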

Figures 4.23(a) and 4.23(b) respectively portray the measured shmoo plots at room temperature for the normal-density and high-density 512-kbit SRAM macros. The region filled with "#" symbols signifies the function pass results without the assist circuit; the region filled with "\$" symbols signifies those with the assist circuit. With the assist circuit, the minimum



Figure 4.21: Die photograph of a test chip



Figure 4.22: Layout plot of 256-kb SRAM macro

operating voltage for both SRAMs was improved by 0.08 V, as portrayed in Fig. 4.23. The high-density cell has a characteristic fail region at a higher voltage level. This is attributable to the degradation of the SNM that accompanies the small electrical beta ratio between the access and driver transistors in a memory cell. Using the proposed assist circuit, the function passes even if the SRAM operates at voltage levels above 1.35 V. The access times for both normal-density and high-density SRAM macros are also measured. Because the WL voltage is



lowered in our proposed assist circuit, there is a concern about degradation of the operating speed. From the measured data, the access times for the normal and high-density SRAM are 2.2 ns and 2.5 ns, respectively, at the typical voltage of 1.2 V. Comparison with and without the assist circuit shows that the speed penalty is within 0.2 ns over a voltage range of 1.2 V plus-minus 10% (from 1.08 V to 1.32 V). To evaluate the robustness against temperature and process variations, several process-skewed chips were tested under a wide temperature environment. Figure 4.24 shows the relationship between the minimum operating voltage and the process variations in the temperature range between -40°C and 125°C for the normal 512-kbit SRAM macro. The dotted line represents the minimum operating voltages without the assist circuit, whereas the solid one represents those with the assist circuit. Results show that the $VDD_{min}$ in each process condition (FF, FS, TT, SF, SS) is improved by the read and write assist circuits. The $VDD_{min}$ is lowered from 1.13 V to 0.96 V, achieving a 170 mV improvement for the worst process and temperature conditions. Moreover, it is noteworthy that the spread of $VDD_{min}$ across the process variations is suppressed from 154 mV to 60 mV by applying the assist circuit. These results indicate that the SRAM with our proposed assist circuit has reduced vulnerability to process and temperature variations.

# **4.5. Summary**

New assist circuits to enhance the SRAM operating margin were proposed for 45-nm bulk CMOS technology. With the proposed assist circuits, improvements of over 100 mV for the SNM and 35 mV for the write margin were achieved. Compared to the conventional assist circuit, the

| Feature                          | Normal-density cell             | High-density cell               |
|----------------------------------|---------------------------------|---------------------------------|
| Technology                       | 45-nm (hp65) LSTP bulk CMOS     | 45-nm (hp65) LSTP bulk CMOS     |
| Macro configuration              | 16 bits × 16-K words (MUX = 32) | 16 bits × 16-K words (MUX = 32) |
| Memory cell arrays               | 512 rows × 256 columns × 2 MATs | 512 rows × 256 columns × 2 MATs |
| 6T SRAM cell size                | 0.327 $\mu$m<sup>2</sup>        | 0.245 $\mu$m<sup>2</sup>        |
| Bit density                      | 1.57 Mbit/mm<sup>2</sup>        | 2.12 Mbit/mm<sup>2</sup>        |
| Area overhead of assist circuits | 7%                              | 9%                              |
| Access time @ RT, 256 Kb         | 2.2 ns @ 1.2 V                  | 2.5 ns @ 1.2 V                  |
| Standby leakage @ RT             | 15 pA/cell @ 0.7 V              | 26 pA/cell @ 0.7 V              |

Table 4.1: Features of the fabricated SRAM macros



Figure 4.24: Measured minimum operating voltage ($VDD_{min}$) of 512-kbit SRAM at worst temperature

cell current at the worst-case condition was improved by 83%. More stable functionality of 512-kbit SRAM macros with 0.245  $\mu$ m<sup>2</sup> and 0.327  $\mu$ m<sup>2</sup> was observed. The minimum operating voltage in the process-worst case and the temperature-worst case condition was improved by 170 mV; its variation was improved to 60 mV, confirming a high immunity against process and temperature variations with less than 10% area overhead.

Recently, many SRAM architectures such as the multi-power supply scheme and dynamic

voltage and frequency scaling (DVFS) have been proposed [55,56] to maintain the scaling of the SRAM cell size and to decrease power consumption. Although these new schemes have considerable effects not only on the improvement of the stability against the local $V_{th}$ variations but also on the reduction of the power, the issues related to PVT variations remain. As described repeatedly, the proposed methodology optimizes the SRAM characteristics automatically, especially against the PVT variations. Therefore, by combining the proposed circuitry with these new techniques, a still lower minimum operating voltage can be achieved, yielding a promising architecture for future SRAM development beyond 45-nm CMOS technology.

# **Chapter 5 High-Density Two-Port SRAM Design**

In this chapter, the high-density and low-power dual-port SRAM design technique is discussed. An access scheme for a synchronous dual-port (DP) SRAM that minimizes the memory cell area and maintains cell stability is proposed. A priority row decoder circuit and a shifted bit-line access scheme eliminate access conflict issues. Using 65-nm CMOS technology (hp90) with the proposed scheme, 32-kB DP-SRAM macros are fabricated. A 0.71 $\mu$m<sup>2</sup> 8T DP-cell is obtained, of which the cell size is only 1.44× larger than that of a 6T single-port (SP) cell. The bit density of the fabricated 32-kB DP-SRAM macro is 667 kbit/mm<sup>2</sup>, which is 25% higher than that of a conventional 8T SRAM. The standby leakage is 27% less because of the small drive-NMOS of the proposed 8T DP-cell.

### **5.1. Introduction**

In deep submicron technology, System-on-Chip (SoC) products require high-speed and low-power embedded memory to support increased storage capability. Typically, the static random access memory (SRAM) has been widely used for SoC products. To date, most embedded memory is single-port SRAM, which has one access port for reading and writing operations, although the demands for multi-port SRAM continue to increase to accommodate high-speed communications and image processing. The multi-port SRAM is suitable for parallel operation and improves the total chip performance [57–67].

Underlying the trend is the fact that, as technology advances, SRAMs face limitations in terms of the power dissipated when the clock frequency is increased to improve the performance of SoCs. Accordingly, system architecture has moved toward parallel operation, increasing practical computation speed through parallel processing rather than through a higher clock frequency. Many reports have described high-performance and low-power multi-core processors that have plural CPUs within a die. The number of memory accesses increases considerably, to the extent that the memory access speed becomes a system bottleneck. That fact creates increasing demand for a multi-port SRAM that supports access from plural ports simultaneously.

Although the memory access speed (the number of clock cycles) improves with the increasing number of access ports of the SRAM, its area penalty also increases with the number of ports. Consequently, a multi-port SRAM with more than three access ports has a low capacity on a die; it is used particularly for high-speed register files in a data path [64,65,66] or as buffer memory for a video image processor engine [67]. Alternatively, a dual-port SRAM with two



Figure 5.1: System block diagrams and timing charts of the memory access: (a) sequential memory access, (b) parallel memory access

access ports is frequently used in recent SoC chips with a large capacity, as is SP-SRAM. For example, it is used as buffer memory in multimedia applications [59] or as a data cache in a multi-core processor [62,63]. From the perspective described above, the embedded DP-SRAM is an essential IP block whose capacity tends to increase.

In this section, it is demonstrated that the embedded dual-port SRAM can increase the internal memory access speed. Figure 5.1 presents simple block diagrams and timing charts of the memory access. Figure 5.1(a) portrays the case of sequential memory access using a typical single-port (SP)-SRAM block, whereas Fig. 5.1(b) depicts the case of parallel memory access by a dual-port SRAM block. In Fig. 5.1(a), two functional units (UNIT-A, UNIT-B) must access SP-SRAM in series through the internal data bus because there is only one-port accessibility. Consequently, two clock cycles are required if each UNIT accesses the SRAM once. On the other hand, both UNIT-A and UNIT-B can access a DP-SRAM block simultaneously within a cycle.



Figure 5.2: SRAM memory cell circuits

Thereby, the parallel memory access can increase the memory access speed in relation to sequential memory access.
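The cycle counts in this comparison can be captured in a toy model. The function below is only an illustration: it assumes that every access takes one clock cycle, that each port serves one access per cycle, and that no access conflicts occur; `cycles_needed` is a hypothetical helper name.

```python
import math

def cycles_needed(n_requests: int, n_ports: int) -> int:
    """Clock cycles to serve n_requests when n_ports accesses fit in one cycle."""
    return math.ceil(n_requests / n_ports)

sp_cycles = cycles_needed(2, 1)   # UNIT-A and UNIT-B share a single-port SRAM
dp_cycles = cycles_needed(2, 2)   # UNIT-A and UNIT-B each use one port of a DP-SRAM

print(f"single-port: {sp_cycles} cycles, dual-port: {dp_cycles} cycle")
```

For the two-unit case of Fig. 5.1 this gives two cycles for the single-port block and one cycle for the dual-port block, matching the description above.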

As the DP-SRAM capacity increases, the chip area that it occupies increases. For that reason, a higher-density DP-SRAM is strongly required. In general, the unit cell of the dual-port SRAM has been about twice as large as that of the single-port SRAM to date. Although the area penalty has been reduced by a new layout structure, the DP-cell is still 1.63 times larger than the SP-cell [68]. This is because the unit cell of DP-SRAM has eight transistors, whereas that of SP-SRAM has six transistors. In addition, some transistors must have an expanded gate channel length and width to maintain the cell stability and access speed. This expansion of the transistors in a unit cell is considered to be the inherent worst-case design of the DP-SRAM, in which possible simultaneous



Figure 5.3: Assortment of the access modes of the dual-port SRAM

access from both ports occurs.

For this study, a priority row decoder and shifted bitline (BL) access scheme for synchronous DP-SRAM are proposed. In addition, the physical layout of the 8T DP-cell, which has been contrived to reduce its area, is introduced. The local and global variations of threshold voltage ($V_{th}$) are carefully considered in determining the unit cell transistors. This approach has no access penalty and achieves the smallest memory cell size ever reported in a 65-nm technology [69]. Figure 5.1 shows that this circumventive scheme must operate with a common internal clock. The proposed scheme cannot be adopted if the two clocks have asynchronous, mutually independent frequencies. When both clock phases are synchronized, however, this scheme is



Figure 5.4: Butterfly curves and static noise margin of the DP-8T-cell for both common row access and different row access

available for use even if the clocks do not have exactly the same frequency [59].

This chapter is organized as follows. Section 5.2 discusses the access conflict issues of the dual-port SRAM. Section 5.3 introduces the proposed circumventive common-row-access scheme, which reduces the cell size while maintaining the stability and access speed. Section 5.4 explains the design of the high-density 8T SRAM cells and examines the cell stability using SPICE simulation. Evaluation results of the test chips fabricated in 65-nm CMOS technology are presented in section 5.5. A brief summary is provided in section 5.6.

# **5.2. Access Conflict Issue of Dual-Port SRAM**

Figure 5.2 shows memory cell circuits for single-port and dual-port SRAM. The standard single-port SRAM cell presented in Fig. 5.2(a) comprises six transistors: two pull-up PMOSs (load-PMOS), two pull-down NMOSs (drive-NMOS), and two transfer NMOSs (access-NMOS). The single-port SRAM performs either a read operation or a write operation at a time, so that its operation is often denoted as "1RW". As presented in Figs. 5.2(b) and 5.2(c), two major types of memory cells are normally used for the dual-port SRAM. Although both memory cells have eight transistors, their functions differ greatly. Figure 5.2(b) portrays the one-read/one-write (1R1W) type DP-SRAM cell, in which only one of the two ports can be used for read operation. This 1R1W memory cell has a stable read operation, but its single-ended read-bitline (RBL) structure can degrade the access time because the RBL must swing fully. Figure 5.2(c) portrays the two-read-write (2RW) type of 8T SRAM memory cell corresponding to Fig. 5.1. In this type of dual-port memory cell, both ports are available for reading and writing, so that the 2RW type of memory cell can also operate as a 1R1W, whereas the 1R1W type cannot operate as a 2RW. The 2RW type of 8T DP-cell thus has more access flexibility. Hereafter, this study specifically addresses this type of DP-SRAM.



Figure 5.5: Concept of proposed circumventing simultaneous common-row-access



Figure 5.6: Block diagram and timing chart of proposed access scheme

Figure 5.7: Circuit of row-address comparator (RAC)

Figure 5.3 depicts the possible access situations of the 2RW dual-port SRAM when both ports are enabled simultaneously. The memory cell array is shown in simplified form with the activated 8T-cells, wordlines (WLA, WLB), and bitlines (BLA, BLB). The buffers on both sides of the memory cell array represent the addressed WL drivers of the two ports. Figure 5.3(a) depicts a situation in which different rows and columns are accessed from the two ports, each designated independently by its own address input. Figure 5.3(b) shows the different-row, common-column access situation. These two situations raise no access conflict between the ports because each selected memory cell, for which either WLA or WLB is activated, operates as in a single-port access. Figures 5.3(c) and 5.3(d) respectively show the common-row, different-column access and the common-row, common-column access. In these common-row access situations, the cell stability must be considered as the worst case for reading because the two enabled wordlines degrade the static noise margin (SNM) of all memory cells along the selected row. Three combinations are possible: both ports read, one port reads while the other writes, or both ports write. Therefore, the write stability of the selected memory cell must also be considered as the worst case. The read stability must still be considered in writing operations because the half-selected memory cells (selected row, unselected column) are in the same state as in a read, even if one or both ports are performing a writing operation. In general, writing different (namely opposite) data from both ports to exactly the same address, as presented in Fig. 5.3(d), is inhibited because an abnormal leakage current flows in the accessed memory cell. Still, simultaneous reading, or reading and writing, from both ports is frequently required by the system. Therefore, the conventional DP-SRAM design must satisfy such a worst-case access situation: the 8T DP-cell necessarily becomes large because the gate width of the drive-NMOSs is increased to improve the cell stability.
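The four access situations of Fig. 5.3 can be told apart from the two port addresses alone. The following is a minimal sketch of that classification; the names `AccessMode`, `classify`, and `is_worst_case` are hypothetical and are not taken from the thesis.

```python
from enum import Enum


class AccessMode(Enum):
    """The four simultaneous-access situations of Fig. 5.3."""
    DIFF_ROW_DIFF_COL = "Fig. 5.3(a)"
    DIFF_ROW_SAME_COL = "Fig. 5.3(b)"
    SAME_ROW_DIFF_COL = "Fig. 5.3(c)"
    SAME_ROW_SAME_COL = "Fig. 5.3(d)"


def classify(row_a, col_a, row_b, col_b):
    """Classify a simultaneous access from port A and port B."""
    same_row = row_a == row_b
    same_col = col_a == col_b
    if same_row:
        return AccessMode.SAME_ROW_SAME_COL if same_col else AccessMode.SAME_ROW_DIFF_COL
    return AccessMode.DIFF_ROW_SAME_COL if same_col else AccessMode.DIFF_ROW_DIFF_COL


def is_worst_case(mode):
    """True when WLA and WLB of one row are both active (degraded SNM along that row)."""
    return mode in (AccessMode.SAME_ROW_DIFF_COL, AccessMode.SAME_ROW_SAME_COL)


if __name__ == "__main__":
    mode = classify(row_a=3, col_a=1, row_b=3, col_b=2)
    print(mode.value, "worst case:", is_worst_case(mode))
```

Only the two common-row modes are flagged, which matches the observation above that the different-row modes behave like single-port accesses.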

Figure 5.4 shows simulated butterfly curves of the SNM for the 8T DP-cell. As described earlier, the 8T DP-cell has two different SNM values depending on the access situation: one is the common access situation, in which the two wordlines (WLs) of the same row are selected; the other is the different access situation, in which two WLs in two different rows are selected. In the common access situation shown in Figs. 5.3(c) and 5.3(d), both WLs are activated. Therefore, the electrical β ratio of the 8T DP-cell is expressed as $\beta_{ND1}/(\beta_{NA1}+\beta_{NA2})$. Here, $\beta_{ND1}$, $\beta_{NA1}$, and $\beta_{NA2}$ respectively indicate the coefficients of the source-drain currents of the drive-NMOS, the access-NMOS for port A, and the access-NMOS for port B. In the different access situation, on the other hand, the corresponding β ratio becomes $\beta_{ND1}/\beta_{NA1}$ or $\beta_{ND1}/\beta_{NA2}$ because only one WL is activated. In general, a lower $\beta$ ratio reduces the read stability (SNM), which indicates that the SNM in the common access situation must be used for the worst-case design of the 8T DP-cell.
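The effect of the second active wordline on the β ratio can be illustrated numerically. The sketch below assumes that β is proportional to the transistor width (equal gate length and mobility for all NMOSs) and that both access-NMOSs are identical; these are simplifying assumptions of this example, not statements of the thesis. The widths are those of the normal 65-nm 8T DP-cell in Table 5.1 (drive-NMOS 260 nm, access-NMOS 90 nm), and the function name `beta_ratio` is hypothetical.

```python
def beta_ratio(w_drive_nm, w_access_nm, active_wordlines):
    """Drive-NMOS strength over the summed strength of the active access-NMOSs.

    Assumes beta is proportional to width and that both access-NMOSs are identical.
    """
    return w_drive_nm / (w_access_nm * active_wordlines)


# Normal 65-nm 8T DP-cell widths from Table 5.1.
different_row = beta_ratio(260, 90, active_wordlines=1)  # only WLA or WLB active
common_row = beta_ratio(260, 90, active_wordlines=2)     # WLA and WLB both active

if __name__ == "__main__":
    print(f"different-row beta ratio: {different_row:.2f}")
    print(f"common-row beta ratio:    {common_row:.2f}")
```

Under these assumptions the common-row access halves the ratio, from about 2.89 to about 1.44, which is why the conventional cell is sized for that case.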

# **5.3. Circumventing Access Scheme of Simultaneous Common Row Activation**

Figure 5.8: Circuit of bit-line shifter for secondary port

Figure 5.5 presents the fundamental concept of the proposed DP-SRAM access scheme. For convenience, port A, which is connected to the bitline pair BLA and /BLA, is defined as primary, whereas port B, which is connected to the pair BLB and /BLB, is defined as secondary. The row-address comparator (RAC) and the BSC are introduced in the secondary port B. Figure 5.6 shows the operation in more detail for each access mode. The circuitry implemented in the test chip design is portrayed in Figs. 5.7 and 5.8. Figure 5.6(a) shows that the address input signal AA activates WLA in the *m*-th row (WLAm), whereas AB activates WLB in the *n*-th row (WLBn), which means that the two ports access different rows. In this condition, the RAC is designed to output the "H" level so that the DP-SRAM as a whole performs a standard read or write operation. Once AA and AB select the WLs in a common row, as presented in Fig. 5.6(b), the row decoder for port B is disabled by the RAC. Consequently, only WLAn is activated for the memory cell. Simultaneously, the "L" level generated by the RAC (see also Fig. 5.7) switches the connection of secondary port B from the BLB pair to the BLA pair, making it possible to read data stably without SNM degradation. In other words, this scheme circumvents the common access mode.
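The decision made by the row-address comparator can be summarized as a small behavioral model. This is only a sketch of the scheme as described in the text, not the circuit of Figs. 5.7 and 5.8; the names `AccessPlan` and `plan_access` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AccessPlan:
    wla_row: int             # row whose WLA is driven (primary port A)
    wlb_row: Optional[int]   # row whose WLB is driven, or None if port B's decoder is disabled
    port_b_bitlines: str     # bitline pair that port B is connected to: "BLB" or "BLA"


def plan_access(row_a: int, row_b: int) -> AccessPlan:
    """Behavioral model of the RAC decision for one simultaneous access."""
    rac_output_high = row_a != row_b  # "H" when the two row addresses differ
    if rac_output_high:
        # Different rows: both wordlines fire and port B uses its own bitlines.
        return AccessPlan(wla_row=row_a, wlb_row=row_b, port_b_bitlines="BLB")
    # Common row: WLB stays low and port B is shifted onto the BLA pair.
    return AccessPlan(wla_row=row_a, wlb_row=None, port_b_bitlines="BLA")


if __name__ == "__main__":
    print(plan_access(5, 9))
    print(plan_access(7, 7))
```

In the common-row case the model never activates both wordlines of a row, which is the property that allows the cell to be sized for single-wordline access.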

For that reason, it is possible to reduce the drive-NMOS transistor width, which contributes directly to the reduction of the DP-SRAM unit cell area. In addition, this circuitry has a strong effect on the write operation. In fact, the common access mode becomes a critical problem in the write operation because a read operation effectively takes place in the unselected columns, which means that the stored data might be flipped during writing. However, the proposed scheme keeps the WLBs at the "L" level in the write operation as well. Consequently, whenever the common access mode occurs, this type of error is avoided. In this way, the fatal risk associated with this specific operation in the DP-SRAM can be circumvented. Furthermore, it is noteworthy that the area of the additional circuitry is compensated by the reduction of the DP-SRAM unit cell area.

# **5.4. 8T Dual-Port Cell Design**

### **5.4.1. Scaling Trend of Memory Cell Sizes**

Figure 5.9 shows scaling trends of the embedded SRAM cell size for 6T SRAM (1RW single-port) and 8T SRAM (2RW dual-port). The cell size of 6T SRAM shrinks by half with each technology node. Conventionally, the 8T DP-cell sizes were more than twice as large as the 6T SP-cell sizes up to 130-nm technology. A new elongated 8T DP-cell layout was then proposed; its cell size was 2.04 μm<sup>2</sup>, which is only 1.63 times larger than the 1.25 μm<sup>2</sup> 6T SP-cell in 90-nm technology [68,70]. According to the scaling trend, both the 6T SP-cell and 8T DP-cell sizes become approximately half, namely 0.61 μm<sup>2</sup> and 0.99 μm<sup>2</sup>, respectively, in 65-nm technology with the same layout topology [71]. In this work, the new access scheme described in section 5.3 is applied to achieve a smaller cell beyond the scaling trend. In addition, aggressive shrinkage is obtainable by making the active diffusion and poly-silicon gates into regular polygons from the design for manufacturability (DFM) perspective, which improves the printability of the cell layout.

As a result, the proposed thin 8T DP-cell size is 0.71 μm<sup>2</sup>, which is about 30% smaller than a normal 8T DP-cell and is only 1.44× the cell size of an advanced high-density 6T SP-cell [69].
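The area figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses the rounded cell areas given in the text and in Table 5.1 (the 0.49 μm<sup>2</sup> advanced 6T SP-cell is the reference cell of Table 5.1); the helper names are hypothetical.

```python
# (8T DP-cell area, 6T SP-cell area) in square micrometres, rounded values from the text.
CELL_AREAS_UM2 = {
    "90 nm": (2.04, 1.25),
    "65 nm (normal)": (0.99, 0.61),
    "65 nm (this work)": (0.71, 0.49),
}


def area_ratio(dp_um2, sp_um2):
    """Area overhead of the dual-port cell over the single-port cell."""
    return dp_um2 / sp_um2


def shrink(new_um2, old_um2):
    """Fractional area reduction of the new cell relative to the old one."""
    return 1.0 - new_um2 / old_um2


if __name__ == "__main__":
    for node, (dp, sp) in CELL_AREAS_UM2.items():
        print(f"{node}: DP/SP = {area_ratio(dp, sp):.2f}x")
    print(f"proposed vs. normal 8T cell: {shrink(0.71, 0.99):.0%} smaller")
```

With these rounded areas the last two ratios come out slightly above the quoted 1.61× and 1.44×, presumably because the thesis values were computed from unrounded dimensions, and the shrink relative to the normal 65-nm 8T cell is about 28%.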



Figure 5.9: Scaling trend of SRAM memory cell size

### **5.4.2. Contrived 8T SRAM Cell Layout**

Beyond 100-nm technology, the major memory cell layout of 6T SRAM becomes a wide and thin rectangle type, which includes two well-bounded regions. Extending the same layout topology, the conventional 2RW type of an 8T SRAM cell layout [68] was also a thin rectangle type similar to a 6T SRAM cell. The proposed high-density 8T DP-cell layout is based on these wide and thin rectangle types. Figures 5.10 and 5.11 show the layout and an SEM image of the proposed 8T DP-cell using 65 nm LSTP CMOS technology. As with a conventional 8T DP-cell, four shared contacts of tungsten plugs connect the poly-silicon gate and diffusion region directly to achieve a smaller cell size. In terms of front-end-of-line (FEOL), the cell width (*x* direction) can be shrunk aggressively because the transistor width of drive-NMOS can be reduced to about half that of the normal cell. Regarding the back-end-of-line (BEOL), however, no scaling down occurs in the *x* direction because the second metal tracks consisting of BL pairs, WL islands, and power line are almost completely occupied, even for a conventional 8T DP-cell.

To resolve this BEOL bottleneck, the BLs, WLs, and the power line are each moved to upper metal layers, as presented in Fig. 5.10(b). The BLs and the power line run vertically in the third metal layer, and the WLs run horizontally in the fourth metal layer. The ground line remains in the second metal layer, but it is connected directly across both sides of each cell by a zigzag (snake-pattern) wire. Consequently, the required second metal tracks are reduced from nine to seven; the cell width is then determined by the FEOL, not the BEOL.

Figure 5.10: The 8T-DP-cell layout; panel (b): BEOL

In this design, the electrical β ratio is reduced to one, i.e. $\beta_{ND1} = \beta_{NA1} = \beta_{NA2}$, which minimizes the 8T DP-cell width in the *x* direction. This is the same ratio as that of the 6T SP-cell [71]. For that reason, the regions of *n*-type active diffusion and the poly-silicon gates become straight polygon patterns, which is advantageous from the DFM perspective. The layout is lithographically friendly and robust against misalignment between mask steps because rounded corner shapes are reduced. Therefore, the minimum dimensions of the FEOL can be reduced aggressively without yield loss. The dimensions of each cell are presented in Table 5.1. Concerns about the read stability attributable to the small electrical β ratio are addressed in the following two subsections.

Figure 5.11: Top view SEM images of 8T DP-cell after poly-etching and second metal copper dual-damascene interconnect

### **5.4.3. Simulated Butterfly Curves for Static Noise Margin**

Next, the read stability of the proposed 8T DP-cell is discussed. Figure 5.12 shows the simulated butterfly curves of both the conventional DP-SRAM cell and the proposed UHD-8T-SRAM cell in 65-nm technology. The data are for the typical process condition, a 1.2 V supply voltage, and room temperature. The conventional 8T SRAM must be evaluated for the worst case of the common row access situation. The proposed 8T DP-SRAM, on the other hand, is evaluated for the case in which either WLA or WLB is activated, as in a 6T SP-SRAM. The DC simulation result shows that the SNM values are 186 mV and 194 mV, respectively, for the conventional and proposed 8T DP-SRAMs. In spite of the small electrical β ratio, the SNM of the proposed UHD-8T-SRAM cell is slightly larger than that of the conventional DP-SRAM cell under typical conditions because the $V_{th}$ of the small access transistor of the conventional unit cell is lowered by the reverse narrow-channel effect, whereas the $V_{th}$ of the access transistor of the proposed cell is almost the same as that of the driver transistor [71].
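For readers unfamiliar with how an SNM value is read from a butterfly curve, the following sketch extracts the side of the largest square that fits in the smaller lobe. It is an illustration only: the inverter transfer curve is an idealized tanh shape chosen for this example (its `trip` and `gain` parameters are hypothetical), not the SPICE model behind Fig. 5.12, so it does not reproduce the 186 mV and 194 mV values.

```python
import math
from bisect import bisect_left

VDD = 1.2  # supply voltage quoted in the text (V)


def inverter_vtc(vin, trip=VDD / 2, gain=20.0):
    """Idealized inverter transfer curve (tanh shape); an assumption, not a SPICE model."""
    return 0.5 * VDD * (1.0 - math.tanh(gain * (vin - trip)))


def _interp(u, us, ws):
    """Linear interpolation of w(u) on a strictly increasing grid us (clamped at the ends)."""
    i = bisect_left(us, u)
    if i <= 0:
        return ws[0]
    if i >= len(us):
        return ws[-1]
    t = (u - us[i - 1]) / (us[i] - us[i - 1])
    return ws[i - 1] + t * (ws[i] - ws[i - 1])


def snm(vtc1, vtc2, n=2001):
    """Side of the largest square inscribed in the smaller lobe of the butterfly plot."""
    grid = [VDD * k / (n - 1) for k in range(n)]
    # Curve 1 is (x, vtc1(x)); curve 2 is the mirrored curve (vtc2(y), y).
    # In the 45-degree rotated frame u = x - y, w = x + y, the gap in w at equal u
    # is twice the side of the square whose opposite corners lie on the two curves.
    curve1 = [(x - vtc1(x), x + vtc1(x)) for x in grid]
    curve2 = sorted((vtc2(y) - y, vtc2(y) + y) for y in grid)
    us2 = [p[0] for p in curve2]
    ws2 = [p[1] for p in curve2]
    left = right = 0.0
    for u, w1 in curve1:
        side = abs(w1 - _interp(u, us2, ws2)) / 2.0
        if u < 0:
            left = max(left, side)
        else:
            right = max(right, side)
    return min(left, right)


if __name__ == "__main__":
    balanced = snm(inverter_vtc, inverter_vtc)
    skewed = snm(lambda v: inverter_vtc(v, trip=0.5), inverter_vtc)
    print(f"balanced cell SNM: {balanced * 1e3:.0f} mV")
    print(f"skewed cell SNM:   {skewed * 1e3:.0f} mV")
```

Shifting the trip point of one inverter shrinks one lobe and enlarges the other, so the reported SNM, which is the smaller of the two, decreases.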



Figure 5.12: Measured SNM for conventional and proposed 8T-DP-cells

### **5.4.4. Cell Stability Analysis**

The stability of the proposed DP-SRAM cell is verified by considering the global and local $V_{th}$ variations. The global variation is the inter-die variation, which results from variations of the gate length, gate width, gate oxide thickness, and dopant implantation. The local variation is the intra-die variation, which results from dopant fluctuation in the channel and gate line-edge roughness (LER). Figure 5.13 shows the result of the stability analysis by $V_{th}$ curve simulation [72] considering both global and local $V_{th}$ variations. The read and write boundaries are solved using the "worst case model analysis". In this analysis, it is assumed that the total memory capacity of DP-SRAM in one die is up to 1 Mbit. The temperature range is −40°C to 125°C; the supply voltage is 1.2 V $\pm$ 10%. As presented in Fig. 5.13, the read and write margins are sufficient for the global corner models FF, FS, SS, and SF, as well as for the typical model CC. Here, FS means fast-NMOS and slow-PMOS; SF means slow-NMOS and fast-PMOS. This simulation result shows that DP-SRAM instability causes no yield loss in mass production.
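The reason the assumed capacity matters can be sketched as follows. If the local $V_{th}$ variation is treated as Gaussian and at most one failing cell is tolerated on average per die, the number of standard deviations that the cell margin must cover grows with the number of cells. Both the Gaussian model and the one-failure criterion are assumptions of this example; the thesis does not state the criterion used in its worst-case model analysis. The function name `required_sigma` is hypothetical.

```python
from statistics import NormalDist


def required_sigma(n_cells, max_expected_failures=1.0):
    """One-sided z-score z such that n_cells * P(Z > z) equals max_expected_failures."""
    p_fail = max_expected_failures / n_cells
    return NormalDist().inv_cdf(1.0 - p_fail)


if __name__ == "__main__":
    one_mbit = 2 ** 20  # 1 Mbit, the per-die DP-SRAM capacity assumed in the text
    print(f"1 Mbit: {required_sigma(one_mbit):.2f} sigma")
    print(f"8 Mbit: {required_sigma(8 * one_mbit):.2f} sigma")
```

Under these assumptions a 1-Mbit capacity corresponds to a margin of a little under five standard deviations, and a larger total capacity would tighten the requirement.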

### **5.4.5. Reduction of Standby Leakage**

Figure 5.13: Stability analysis by $V_{th}$ curve simulations

Figure 5.14: Estimation of the stand-by leakage of the 8T-DP-cell by SPICE Simulation

The small drive-NMOS transistor contributes not only to area reduction but also to standby leakage reduction. Figure 5.14 presents a comparison of the simulated standby leakages of the reference 0.49 μm<sup>2</sup> 6T SRAM (SP) cell, the proposed 0.71 μm<sup>2</sup> 8T SRAM (DP) cell, and the conventional 0.99 μm<sup>2</sup> 8T SRAM (DP) cell, all in 65-nm CMOS technology. For each cell, the total leakage current is estimated as the sum of the subthreshold leakage current, the gate-induced drain leakage (GIDL), and the gate leakage of all transistors. The typical standby leakage of the proposed 8T DP-cell is 9.0 pA/cell at a 1.2 V supply voltage and room temperature, which is 30% lower than that of the conventional 8T DP-cell. The standby leakage of the proposed 8T DP-cell is held to only 1.4 times that of the 6T SP-cell because only the leakage component of the access-NMOS transistors is doubled.
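The per-cell leakage of the other two cells is not stated explicitly, but it can be back-calculated from the ratios given above. The sketch below does so and also scales the proposed cell to a 1-Mbit array; the derived numbers are estimates implied by the quoted 30% and 1.4× figures, not values reported in the thesis.

```python
PROPOSED_PA_PER_CELL = 9.0  # simulated leakage of the proposed 8T DP-cell (pA/cell), from the text

# "30% lower than the conventional 8T DP-cell" -> conventional = proposed / (1 - 0.30)
conventional_pa = PROPOSED_PA_PER_CELL / (1.0 - 0.30)

# "1.4 times that of the 6T SP-cell" -> single-port = proposed / 1.4
single_port_pa = PROPOSED_PA_PER_CELL / 1.4

# Cell-array leakage for 1 Mbit (2**20 cells), converted from pA to uA.
array_ua_per_mbit = PROPOSED_PA_PER_CELL * 2 ** 20 * 1e-6

if __name__ == "__main__":
    print(f"conventional 8T DP-cell: {conventional_pa:.1f} pA/cell (implied)")
    print(f"6T SP-cell:              {single_port_pa:.1f} pA/cell (implied)")
    print(f"proposed cell array:     {array_ua_per_mbit:.1f} uA/Mbit (cells only)")
```

The array figure covers the memory cells only; peripheral circuits add to it in a real macro.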

# **5.5. Implementations and Evaluation**

### **5.5.1. Design and Fabrication of Test Chip**

Figure 5.15: Die photograph of a test chip

Figure 5.16: Layout plot of fabricated 32-kB UHD-DP-SRAM macros

Test chips with eight embedded 32-kB DP-SRAM macros are designed and fabricated using 65-nm CMOS technology. Figure 5.15 shows a microphotograph of the 36.2 mm<sup>2</sup> test chip. The four macros on the right side are the proposed ultra-high-density (UHD) DP-SRAM, whereas the other four macros on the left side are normal DP-SRAM macros. Although the test chips were fabricated with eight metal layers, both the conventional and proposed SRAM macros were implemented within four metal layers. Figure 5.16 presents the layout plot of the proposed 32-kB UHD DP-SRAM macro. The two row decoders for port A and port B are placed exactly at the center of the macro, so that the memory cell array is divided into two cell arrays by the row decoders, thereby shortening the wordlines. The primary data I/O for port A is located at the upper side and the secondary data I/O for port B is located at the opposite lower side. The BL shifter is inserted between the cell array and the secondary data I/O; it is not inserted on the primary data I/O side. The row address comparator (RAC) is placed in the secondary address buffer region. The total cell array region is decreased by 30% relative to the conventional one because of the small memory cell. On the other hand, the peripheral part is about 5% larger because of the BL shifter and the RAC. The physical size of the 32-kB macro is 868 µm × 442 µm; the bit density is 667 kbit/mm<sup>2</sup>. Figure 5.17 compares the bit densities with those obtained in previous works, along with the area overheads of the 8T-cell over the 6T-cell. Figure 5.17 shows that a 25% increase in bit density is achieved in this work.
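The quoted bit densities follow directly from the macro capacity and dimensions. The sketch below reproduces them, taking the conventional macro size (1084 µm × 442 µm) from Table 5.2 and assuming that "kbit" denotes 1024 bits, an interpretation inferred here because it matches the quoted figures. The function name is hypothetical.

```python
def bit_density_kbit_per_mm2(capacity_bytes, width_um, height_um):
    """Bit density of a macro in kbit/mm^2, with 1 kbit taken as 1024 bits."""
    kbits = capacity_bytes * 8 / 1024
    area_mm2 = (width_um * 1e-3) * (height_um * 1e-3)
    return kbits / area_mm2


MACRO_BYTES = 32 * 1024  # 32-kB macro

proposed = bit_density_kbit_per_mm2(MACRO_BYTES, 868, 442)
conventional = bit_density_kbit_per_mm2(MACRO_BYTES, 1084, 442)

if __name__ == "__main__":
    print(f"proposed:     {proposed:.0f} kbit/mm^2")
    print(f"conventional: {conventional:.0f} kbit/mm^2")
    print(f"increase:     {proposed / conventional - 1:.0%}")
```

The result agrees with the 667 kbit/mm<sup>2</sup> and 534 kbit/mm<sup>2</sup> entries of Table 5.2 and with the stated 25% increase.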

### **5.5.2. Measurement Result**

All 32-kB macros were tested and confirmed to be fully operational. Additionally, the SNMs of the proposed 8T-cells were measured for both ports, as presented in Fig. 5.18. The results verified that the SNMs for port A and port B are well balanced and that the measured mean value correlates with the SPICE simulation result. The SNM under simultaneous activation of both WLs need not be considered because that situation never occurs with this design. Table 5.2 summarizes the test chip features. Figure 5.19 presents a typical shmoo plot of the supply voltage versus clock access time at room temperature. It shows that the measured SRAM macro functions from 0.8 V to 1.44 V. The measured clock access time was 3.0 ns at the typical supply voltage of 1.2 V, whereas the conventional access time is 3.1 ns, which indicates that there is no access time penalty (see Table 5.2). The measured typical standby leakage of four 32-kB macros (128 kB in total), including both the cell array and the peripheral circuits, was 20 μA, which is 27% lower than that of the conventional macros because of the small drive transistor of the DP-SRAM cell.

| | | 90 nm, 8T (DP) | 90 nm, 6T (SP) | 65 nm, 8T (DP) (Normal) | 65 nm, 6T (SP) | Advanced 65 nm, 8T (DP) (this work) | Advanced 65 nm, 6T (SP) |
|---|---|---|---|---|---|---|---|
| Rectangle size (μm) | | 2.84 × 0.72 | 1.76 × 0.72 | 1.90 × 0.52 | 1.18 × 0.52 | 1.48 × 0.48 | 1.03 × 0.48 |
| Cell area (μm<sup>2</sup>) | | 2.04 | 1.25 | 0.99 | 0.61 | 0.71 | 0.49 |
| | $A_{DP}/A_{SP}$ ratio | 1.63× | 1 | 1.61× | 1 | 1.44× | 1 |
| Tr. width (nm) | Load-PMOS | 140 | ← | 90 | ← | 80 | ← |
| | Access-NMOS | 140 | ← | 90 | ← | 120 | ← |
| | Drive-NMOS | 400 | 200 | 260 | 130 | 120 | ← |
| Physical dimensions (nm) | Tr. pitch | 360 | ← | 260 | ← | 240 | ← |
| | P/N Iso. | 280 | ← | 200 | ← | 150 | ← |
| | STI Iso. | 140 | ← | 100 | ← | ← | ← |
| | Gate Iso. | 120 | ← | 120 | ← | 110 | ← |
| | Metal 1 (L/S) | 120 / 120 | ← | 90 / 90 | ← | ← | ← |
| | Metal 2–4 (L/S) | 140 / 140 | ← | 100 / 100 | ← | ← | ← |

Table 5.1: Dimension of 8T-DP-cells



Figure 5.17: Comparisons of the bit-density and cell size ratio

| | Conv. DP-SRAM | Proposed |
|---|---|---|
| Technology | 65 nm (hp90) LSTP CMOS | 65 nm (hp90) LSTP CMOS |
| Configuration | 16 bit × 16 k word × 4 macro | 16 bit × 16 k word × 4 macro |
| MAT size | 512 row × 256 column × 2 MAT/macro | 512 row × 256 column × 2 MAT/macro |
| Mux I/O | 32 | 32 |
| Memory cell size | 0.99 μm<sup>2</sup> | 0.71 μm<sup>2</sup> |
| Physical macro size | 1084 μm × 442 μm | 868 μm × 442 μm |
| Bit density | 534 kbit/mm<sup>2</sup> | 667 kbit/mm<sup>2</sup> |
| Read access time @ 1.2 V | 3.1 ns | 3.0 ns |
| Standby leakage @ 1.2 V | 27 µA/Mbit | 20 µA/Mbit |

Table 5.2: Features of the fabricated SRAM macros



Figure 5.18: Measured SNMs


Figure 5.19: Shmoo plot

## **5.6. Summary**

In this chapter, a new access scheme for an ultra-high-density synchronous DP-SRAM was demonstrated. Using 65-nm CMOS technology, 32-kB DP-SRAM macros were designed and fabricated. This work yielded the smallest 8T DP-cell and the highest bit density ever reported in the 65-nm era. Test results show that the speed penalty was negligible; standby leakage was reduced by 27% because of the small cell size. The next generation of 45-nm or 32-nm advanced SoC products will require further consideration of device variation. In this work, assist techniques that enhance the read stability and write ability, as reported recently for single-port SRAM [71], [73–76], were not applied because the total size of the DP-SRAM embedded in one die is not as great as that of the SP-SRAM and the variation is within allowed limits. In the near future, system applications will require a larger total dual-port SRAM capacity. Furthermore, the variation continues to increase with shrinkage. In such cases, stability enhancement techniques would become necessary for the DP-SRAM as well as for the SP-SRAM so that the shrinkage of the DP-SRAM cell can continue. The proposed circumventing access scheme for the DP-SRAM is therefore expected to support area reduction and leakage suppression with no speed overhead in future advanced SoC products.

## **Chapter 6 Alternative 6T SRAM Design for DVFS**

In this chapter, SNM-free multi-port SRAM cells having a read buffer are proposed. The CMOS type of read buffer of a memory cell, which has logic-gate with pull-up PMOS driver and pull-down NMOS drivers, enables the read bitline to swing full voltage. It improves the access time even at lower voltage operation. During the read operation, the storage nodes of the activated memory cell are unaffected by the read bitline, even when the read wordline is activated. Consequently, the static noise margin (SNM) remains the same as in the data retention mode, resulting in a lower minimum operating voltage *VDDmin*. Moreover, the pre-charge operation is obviated for the read bitline and write bitline, thereby reducing charge and discharge power dissipation depending on the data toggle rate. Using 0.5 μm CMOS gate-array technology, several types of these SRAM macros were designed and fabricated.

Other types of 10T SRAM cells are presented for the DVFS environment. The SNM is improved by the additional access-NMOS transistors, which also prevent the degradation of the cell current. Although the write ability is worse than that of the 6T SRAM cell because the access-NMOS transistors are connected in series, it is improved by optimizing $V_{th}$, the transistor dimensions, and the source bias scheme. These 10T SRAM cells can mitigate the half-select issue, which occurs in cells in the selected row and unselected columns during a writing operation. A 128-kbit 10T SRAM macro is designed and fabricated using 45-nm CMOS technology. The evaluation results show that the macro can operate over a wide supply voltage range of 0.5–1.3 V. The *VDDmin* of the proposed 10T SRAM is 0.48 V, which is 460 mV lower than that of the normal 6T SRAM.

## **6.1. Introduction**

Because the number of integrated transistors increases as device technology is scaled, advanced SoC devices face a power crisis. To address power issues, dynamic voltage and frequency scaling (DVFS) has become an important technique for low-power and high-speed SoC applications. The main objective of DVFS is to reduce power consumption in accordance with the workload. The supply voltage is lowered dynamically to reduce the power when the workload is light. However, it is difficult to lower the supply voltage of SRAM IP blocks because of increasing process variations. The design solution described in Chapter 4 helps lower the minimum operating voltage (*VDDmin*) of a general 6T single-port SRAM. However, it will eventually reach the limit of the SRAM *VDDmin* because of the degradation of the SRAM stability. For that reason, alternatives to the 6T single-port SRAM cell, which use an 8T read-margin-free cell or several types of 10T read-margin-free cells, have been reported recently [50, 77, 78]. Figure 6.1 depicts typical schematics of these alternative SRAM cells.
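The benefit of lowering the supply voltage together with the clock frequency can be illustrated with the standard first-order expression for CMOS dynamic power, $P = \alpha C V^2 f$. This expression is general background rather than a result of the thesis; it ignores leakage, and the operating points below are hypothetical.

```python
def relative_dynamic_power(v_scale, f_scale):
    """Dynamic power relative to nominal for P = alpha * C * V^2 * f.

    v_scale and f_scale are the supply voltage and clock frequency divided by
    their nominal values; activity factor and capacitance are assumed unchanged.
    """
    return v_scale ** 2 * f_scale


if __name__ == "__main__":
    # Hypothetical light-workload point: 1.2 V -> 0.8 V at half the clock frequency.
    p = relative_dynamic_power(v_scale=0.8 / 1.2, f_scale=0.5)
    print(f"dynamic power at the scaled point: {p:.0%} of nominal")
```

Because the voltage enters quadratically, an SRAM that cannot follow the logic supply down to low voltage limits the power saving available from DVFS, which motivates the low-*VDDmin* cells discussed in this chapter.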

To improve the read margin of an SRAM, which is the static noise margin (SNM), additional transistors are used so that the storage node is not disturbed during the read operation. Typically, an 8T SRAM cell, whose schematic is depicted in Fig. 6.1(a), has become widely used for advanced CMOS devices. This type of memory cell has two additional NMOS transistors with a single read bitline (RBL) for the read operation. For write operations, a complementary write bitline pair (WBL, /WBL) is used just as in the 6T SRAM cell. Consequently, this 8T SRAM cell improves the read stability over that of the 6T SRAM cell. If the same gate length and width were used for each memory cell transistor, the 8T SRAM cell area would be larger because of the two additional NMOS transistors. However, considering the local $V_{th}$ variation discussed in Chapters 2 and 4, the transistor sizes of the 6T SRAM cannot shrink further. Its transistors must even be designed larger than those of the previous generation to ensure the cell stability in sub-50-nm technology. Meanwhile, the read stability of the 8T SRAM cell is robust against process variation. For that reason, its transistor sizes can shrink in accordance with the scaling trend for the 45-nm, 32-nm, or 22-nm generations. As a result, the 6T SRAM cell might become larger than the 8T SRAM cell. The crossover point of this area advantage in future generations is predicted and discussed in some reports [79,80].

On the other hand, a non-precharge type of 10T SRAM cell has been proposed, as depicted in Fig. 6.1(b). This type of memory cell has a read buffer within the cell, just as the 8T SRAM cell does. Pull-up and transfer PMOS transistors and a complementary read wordline pair (RWL, /RWL) are used to achieve the precharge-less RBL structure. Although this 10T SRAM cell has a larger area penalty than the 8T SRAM cell, the active power can be reduced further depending on the toggle rate of the readout data. It is therefore more effective for power reduction in the DVFS environment. However, these single-ended bitline structures, which operate with a full voltage swing on the RBL, have the disadvantage that the transition time of the RBL varies widely because of the local $V_{th}$ variation. To make matters worse, the former 8T SRAM cell requires a keeper circuit for the RBL, which contends with the readout data. The 10T SRAM ladder cell also has the disadvantage that the RBL rises slowly because of the weak drive strength of the serially connected pull-up PMOSs.

The SNM-free 10T SRAM cell with four extra NMOS transistors, which is depicted in Fig. 6.1(c), has a differential read bitline pair (RBL, /RBL). This memory cell can compensate for the variation of the readout transition time because the small differential signal can be detected using a sense amplifier, just as in 6T SRAM cell sensing. The area penalty is almost the same as that of the 10T SRAM cell depicted in Fig. 6.1(b).

These three typical alternative 6T SRAM cells present advantages of read stability. Notwithstanding, the half select issue during writing operation remains. This half select issue occurs in cells with a selected row and unselected column in the case of column multiplexing.



(c) 10T differential type

Figure 6.1: Alternative 6T SRAM bitcells

## **6.2. High-Speed Multi-Port Register File**

### **6.2.1. Conventional Memory Cell Circuits**

Multi-port memory cell circuits have a single bitline for the read operation and a single bitline for the write operation to minimize their area. Figure 6.2(a) portrays a typical two-port memory cell circuit. The read buffer enables an asynchronous read operation without pre-charging the read bitline (RBL). However, a serious problem is that the RAM does not operate at a low supply voltage because the RBL is driven by an NMOS transistor, which cannot drive the RBL to a full high level because of the body effect. For write operations, the same problem arises because high data must be written from a single write bitline (WBL) through the NMOS pass transistor. For this reason, this type of memory cell is unsuitable for lower supply voltage operation.
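The limited high level of an NMOS-driven RBL can be estimated with the textbook body-effect expression $V_{th} = V_{th0} + \gamma(\sqrt{2\phi_F + V_{SB}} - \sqrt{2\phi_F})$. The following is a minimal sketch, not a result from this study: the device parameters ($V_{th0}$, $\gamma$, $2\phi_F$) are hypothetical values chosen only for illustration, and the RBL is assumed to stop rising when the gate-source voltage of the NMOS falls to $V_{th}$.

```python
import math


def vth_with_body_effect(vsb, vth0=0.7, gamma=0.5, two_phi_f=0.7):
    """Threshold voltage with body effect (all parameters are hypothetical)."""
    return vth0 + gamma * (math.sqrt(two_phi_f + vsb) - math.sqrt(two_phi_f))


def max_high_level(vdd, iterations=200):
    """Highest level an NMOS with its gate at VDD can drive.

    Solves V = VDD - Vth(V) by fixed-point iteration, because the
    source (the RBL) rises together with the source-to-body voltage.
    """
    v = 0.0
    for _ in range(iterations):
        v = max(0.0, vdd - vth_with_body_effect(v))
    return v


for vdd in (3.3, 2.0, 1.5):
    v_high = max_high_level(vdd)
    print(f"VDD = {vdd:.1f} V -> RBL high level = {v_high:.2f} V "
          f"({v_high / vdd:.0%} of VDD)")
```

With these assumed parameters, the fraction of *VDD* reached by the RBL shrinks as the supply voltage is lowered, which is consistent with the low-voltage limitation described here.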

Figure 6.2(b) depicts the conventional memory cell circuit used in the RAM for gate arrays. To operate at a low supply voltage, a full CMOS three-state buffer is used as the read buffer and the differential bitline architecture is used for the write operation. The RBL swings between the supply voltage level and the ground level. However, the RBL is charged through the serially connected transistors p1 and p2, as depicted in Fig. 6.2(b), so that the RBL rises slowly. This long delay time does not meet the requirements of high-speed operation under a low supply voltage. In gate arrays especially, where all transistor sizes are unified, the transistor size cannot be optimized well. Thereby, the conventional memory cell incurs a large delay time because of the small PMOS drain current.

(a) Conventional 2-port SRAM bitcell

Figure 6.2: Conventional memory cell circuits

Figure 6.3: Proposed 2-port memory circuit having read buffer

### **6.2.2. SNM Free Memory Cell Having Read Buffer**

To achieve both high-speed and low-voltage operation, a novel memory cell circuit is proposed. Figure 6.3 shows the proposed memory cell circuit for a two-port RAM. The read buffer circuit is modified from the conventional one. It comprises a NAND gate, a PMOS transistor (p1), and two NMOS transistors (n1 and n2). The source of transistor p1 is connected to *VDD* with no transfer gates. This makes the RBL swing fully and enables the RAM to operate even at low voltage. Furthermore, because the RBL is driven by the buffer with no transfer gates, the delay time for the RBL voltage to rise can be decreased, leading to high-speed operation. The three-port memory cell has two read ports: it has two read buffers, two read wordlines (RWLs), and two RBLs.

### **6.2.3. Effect of the Proposed Memory Cell**

Figure 6.4 shows SPICE-simulated waveforms of the RBLs of various types of memory cell and of an RWL. The RBL level of the full-custom memory cell does not reach the high-level voltage *VDD* because of the body effect of its NMOS transistor, as depicted in Fig. 6.2. This causes a poor noise margin and a DC current flow at the sense latch, which engenders large power consumption. On the other hand, the RBL levels of the conventional and proposed memory cells can reach the high-level voltage *VDD*. In this figure, tpd is defined as the delay time from the RWL to the RBL, measured at half the *VDD* voltage. Figure 6.5 shows the dependence of tpd on the supply voltage for these memory cells. According to this graph, the tpd of the full-custom memory cell depends strongly on the supply voltage; it increases sharply as the supply voltage decreases. In addition, this graph shows that the full-custom memory cell cannot operate at voltages less than 2.0 V. On the other hand, both the conventional and proposed memory cells operate at lower voltages because the RBL swings fully. Results show that the tpd of the proposed memory cell is smaller than that of the conventional one at any voltage.

Figure 6.4: Simulated waveforms of RBLs

Figure 6.5: Dependence of delay time on supply voltage

Figure 6.6: Microphotograph of test chip
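The tpd definition used in Section 6.2.3 (delay from the RWL to the RBL, both measured at half of *VDD*) can be applied to sampled waveforms as follows. This is a minimal sketch that assumes rising edges and linear interpolation between samples. The waveform values are synthetic and hypothetical, not the simulated data of Fig. 6.4.

```python
def crossing_time(times, volts, level):
    """First time a waveform rises through `level` (linear interpolation)."""
    for i in range(1, len(times)):
        v0, v1 = volts[i - 1], volts[i]
        if v0 < level <= v1:
            t0, t1 = times[i - 1], times[i]
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    return None  # the waveform never reaches the level


def tpd(times, rwl, rbl, vdd):
    """Delay from RWL to RBL, both measured at VDD/2; None if undefined."""
    t_rwl = crossing_time(times, rwl, vdd / 2)
    t_rbl = crossing_time(times, rbl, vdd / 2)
    if t_rwl is None or t_rbl is None:
        return None
    return t_rbl - t_rwl


# Synthetic waveforms (time in ns, voltage in V) at VDD = 3.3 V.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
rwl = [0.0, 3.3, 3.3, 3.3, 3.3]
rbl_full_swing = [0.0, 0.0, 1.1, 2.2, 3.3]
rbl_limited = [0.0, 0.0, 0.5, 1.0, 1.2]   # never reaches VDD/2

print(round(tpd(times, rwl, rbl_full_swing, 3.3), 3))  # 2.0
print(tpd(times, rwl, rbl_limited, 3.3))               # None
```

An RBL whose high level stays below half of *VDD* has no defined tpd under this measurement, which corresponds to a cell that cannot be read at that supply voltage.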

### **6.2.4. Implementation Using 0.5 μm CMOS Gate-Array**

A test chip was designed and fabricated using the 0.5 μm gate array to verify the performance of several SRAM macros. Figure 6.6 shows a microphotograph of the test chip, which is $7.5 \times 7.5$ mm<sup>2</sup>. It contains four two-port SRAMs (A to D) and three three-port SRAMs (E to G), which have several bit-word configurations. The contents of the SRAMs on the test chip are shown in Table 6.1, along with the measurement results. In the test chip, the upper memory cell array of each RAM intentionally differs from the lower one, so that two different access times can be measured from one RAM. For example, RAM-A has 1 bit for the upper memory cell array (designated U-bit) and 8 bits for the lower memory cell array (designated D-bit), so that the access times of 2 b $\times$ 256 w and 16 b $\times$ 256 w SRAMs were measured for the case in which the memory cell arrays are equally divided. The address access times measured at room temperature and 3.3 V are shown in Table 6.1. In RAM-E, the access time of the 16 b $\times$ 256 w three-port RAM is 4.8 ns. The active power consumption was also measured. For example, in RAM-A, the maximum power consumption at 3.3 V is 0.45 mW/MHz for the write operation and 0.33 mW/MHz for the read operation. Here, the maximum power consumption is that under the maximum toggle rate of data bits. The practical power consumption depends on the toggle rate of data bits, so that the typical power consumption is generally approximately 10–20% of the maximum. Figure 6.7 shows an example of the shmoo plot of RAM-A for the access time versus the supply voltage. Results show that the minimum operation voltage is 1.2 V. It was verified that all SRAMs operate at voltages greater than 1.4 V.

|                            |        | A                | B                 | C                 | D                | E                | F               | G                |
|----------------------------|--------|------------------|-------------------|-------------------|------------------|------------------|-----------------|------------------|
| # of ports                 |        | 2                | 2                 | 2                 | 2                | 3                | 3               | 3                |
| Bit (U-bit + D-bit) × word |        | 9 (1+8) b × 256 w | 32 (8+24) b × 16 w | 1 (1+0) b × 2048 w | 4 (1+3) b × 128 w | 9 (1+8) b × 256 w | 9 (1+8) b × 16 w | 4 (1+3) b × 128 w |
| Columns per bit (CPB)      |        | 1                | 1                 | 8                 | 8                | 1                | 1               | 8                |
| Access time (ns)           | U-bit  | 4.8              | 2.2               | 5.8               | 3.1              | 4.8              | 2.1             | 3.4              |
|                            | D-bit  | 4.8              | 2.6               | -                 | 3.4              | 4.8              | 2.2             | 3.7              |
| Max power (mW/MHz)         | W-port | 0.45             | 0.58              | 0.29              | 0.33             | 0.48             | 0.29            | 0.39             |
|                            | R-port | 0.33             | 0.33              | 0.24              | 0.24             | 0.37             | 0.21            | 0.30             |

Table 6.1: SRAM contents for test chip

Figure 6.7: Shmoo plot
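The relation between the maximum power per megahertz and a practical operating power in Section 6.2.4 can be written out numerically. The following is a minimal sketch using the RAM-A values quoted from Table 6.1; the 100 MHz operating frequency and the function name are hypothetical choices for illustration only.

```python
# Maximum power of RAM-A at 3.3 V, as quoted from Table 6.1 (mW/MHz).
MAX_POWER_MW_PER_MHZ = {"write": 0.45, "read": 0.33}


def typical_power_mw(port, freq_mhz, fraction_of_max):
    """Estimated power at a given frequency and fraction of the maximum.

    fraction_of_max stands for the reduction caused by a data toggle
    rate below the maximum (about 0.10 to 0.20 according to the text).
    """
    return MAX_POWER_MW_PER_MHZ[port] * freq_mhz * fraction_of_max


FREQ_MHZ = 100.0  # hypothetical operating frequency
for port in ("write", "read"):
    low = typical_power_mw(port, FREQ_MHZ, 0.10)
    high = typical_power_mw(port, FREQ_MHZ, 0.20)
    print(f"{port}: {low:.1f} to {high:.1f} mW at {FREQ_MHZ:.0f} MHz")
```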

## **6.3. 10T SRAM Cell Design**

Two types of new cross-point 10T SRAM cells are presented for the DVFS environment. The memory cell circuits are depicted in Fig. 6.8. For both circuits, four NMOS transistors (N5, N6, N7, and N8) are added to the 6T SRAM cell. The former type of 10T SRAM cell, depicted in Fig. 6.8(a), enhances the SNM and compensates the write ability by the column source bias technique. The half nodes (a) and (/a) contribute to expanding the read static noise margin when the wordline WL is activated, because neither node rises to the *VDD* level; each stays at a half-bias level. Consequently, the low-level voltage at the storage nodes (m) and (/m) is reduced during read operations. For write operations, although the series-connected NMOS access transistors weaken the write ability, the write margin can be compensated by raising the $V_{th}$ target of the load PMOS. Furthermore, the source bias technique contributes to enhancement of the write ability by driving one of the source lines (CL or /CL) to high data. This removes the disturbance of the additional drive NMOS (N5 or N6) during write operations.

(a) Column source bias type

(b) Column gate bias type

Figure 6.8: Proposed cross-point 10T SRAM bitcell circuits

Figure 6.9: Enhancement of SNM

Another type of memory cell, depicted in Fig. 6.8(b), further enhances the SNM compared with that of Fig. 6.8(a). In read operation, only the read wordline (RWL) is activated; the write wordline (WWL), which runs in the column direction, is not activated. Consequently, the storage nodes are not disturbed, so that the SNM is almost the same as in the data hold condition. However, the weak write ability is compensated only by changing the PMOS $V_{th}$ target. The source bias technique used for the former type of cell cannot be combined with this type of memory cell. Which is the better solution depends on the device characteristics of the process technology.

Both 10T SRAM cells shown in Figs. 6.8(a) and 6.8(b) are evaluated using 45-nm technology. From the SPICE simulation result depicted in Fig. 6.9, the SNMs of the proposed 10T SRAM cells are 219 mV and 271 mV, respectively, which are 1.54 times and 1.91 times that of the general 6T SRAM cell at 1.0 V. The source bias technique improves the write ability of the former 10T SRAM cell by 50 mV. Both proposed 10T SRAM cells can compensate for the half-select issue, which occurs at cells in a selected row and an unselected column during the write operation.
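The quoted SNM values and improvement ratios can be cross-checked with simple arithmetic. The following minimal sketch derives the 6T SRAM SNM implied by each pair of numbers and the corresponding percentage improvement. The 6T value itself is not stated in the text, so the derived figure is an inference from the quoted numbers only.

```python
# SNM (mV) and ratio to the 6T cell, as quoted for 45-nm SPICE simulation at 1.0 V.
cells = {
    "column source bias, Fig. 6.8(a)": (219.0, 1.54),
    "column gate bias, Fig. 6.8(b)": (271.0, 1.91),
}


def implied_6t_snm_mv(snm_10t_mv, ratio):
    """6T SNM implied by a 10T SNM and its stated improvement ratio."""
    return snm_10t_mv / ratio


for name, (snm_mv, ratio) in cells.items():
    base_mv = implied_6t_snm_mv(snm_mv, ratio)
    gain_pct = (ratio - 1.0) * 100.0
    print(f"{name}: implied 6T SNM = {base_mv:.0f} mV, improvement = {gain_pct:.0f}%")
```

Both pairs imply a 6T SNM of about 142 mV, so the two ratios are mutually consistent, and they correspond to improvements of 54% and 91%.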

The 128-kbit 10T SRAM macros were designed and fabricated using 45-nm CMOS technology. A die photograph of the test chip for the source-bias type of 10T SRAM is shown in Fig. 6.10. The macro size is 428 $\mu$m × 368 $\mu$m. Top-view SEM images of both 10T SRAM cells are shown in Fig. 6.11. The memory cell sizes are 0.77 $\mu$m<sup>2</sup> and 0.76 $\mu$m<sup>2</sup>, respectively. The evaluation results show that the *VDD<sub>min</sub>* of the proposed 10T SRAM is 0.48 V, which is 380 mV lower than that of the normal 6T SRAM. It functioned over a wide supply voltage range from 0.5 V to 1.3 V, making it suitable for the DVFS environment.
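Two figures follow from the numbers quoted for the 45-nm macro: the share of the macro area occupied by memory cells, and the *VDD<sub>min</sub>* of the reference 6T SRAM. The following is a minimal sketch under stated assumptions: 1 kbit is taken as 1024 bits, and the 428 μm × 368 μm macro is assumed to use the 0.77 μm<sup>2</sup> source-bias cell. Neither assumption is confirmed by the text.

```python
BITS = 128 * 1024                    # assumption: 1 kbit = 1024 bits
CELL_AREA_UM2 = 0.77                 # assumption: source-bias type cell
MACRO_W_UM, MACRO_H_UM = 428.0, 368.0


def array_area_fraction(bits, cell_area_um2, macro_w_um, macro_h_um):
    """Fraction of the macro area occupied by the memory cells."""
    return bits * cell_area_um2 / (macro_w_um * macro_h_um)


VDDMIN_10T_V = 0.48                  # measured VDDmin of the 10T SRAM
VDDMIN_REDUCTION_V = 0.38            # stated reduction versus the 6T SRAM
vddmin_6t_v = VDDMIN_10T_V + VDDMIN_REDUCTION_V

fraction = array_area_fraction(BITS, CELL_AREA_UM2, MACRO_W_UM, MACRO_H_UM)
print(f"cell array share of macro area: {fraction:.1%}")
print(f"implied 6T VDDmin: {vddmin_6t_v:.2f} V")
```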




Figure 6.10: Die photograph of test chip for 45 nm technology



(a) Column source bias type (b) Column gate bias type





## **6.4. Summary**

New types of SNM-free multi-port SRAM cells having a read buffer were proposed. The CMOS read buffer, which has a NAND gate with a pull-up PMOS driver and pull-down NMOS drivers, enables the RBL to swing to the full voltage. It improves the access time even during lower-voltage operation. Moreover, the pre-charge operation is obviated for both the RBL and the WBL pairs, thereby reducing the charge and discharge power dissipation depending on the data toggle rate. The evaluation results of test chips fabricated using 0.5 μm CMOS gate-array technology verify the advantages of the proposed memory cell.

New 10T SRAM cells were presented for the DVFS environment. The SNM was improved by 54% for the column source bias type and by 91% for the column gate bias type compared to the general 6T SRAM cell. The proposed 10T SRAM cells can compensate for the half-select issue, which occurs at cells in a selected row and an unselected column during the write operation. The 128-kbit 10T SRAM macros were designed and fabricated using 45-nm CMOS technology. The evaluation results show that the *VDDmin* of the proposed 10T SRAM is 0.48 V, which is 380 mV lower than that of the normal 6T SRAM. It functioned over a wide supply voltage range from 0.5 V to 1.3 V, making it suitable for a DVFS environment.

## **Chapter 7 Conclusion**

This thesis described the robust design of SRAMs on deep submicron low-power system-on-chip (SoC) devices. This study is presented as measures to address the following four main points:

1. Power reduction technique of SRAM (Chapter 3)
2. Read/Write stability enhancement technique for SRAM (Chapter 4)
3. High-density two-port SRAM design (Chapter 5)
4. Alternative 6T SRAM design for DVFS (Chapter 6)

These circuit techniques contribute to resolution of SRAM critical issues in deep submicron technology.

In Chapter 1, the background and objective of this study were prefaced. Before presenting practical design techniques to improve robustness, the actual limitations and undesirable problems in developing the advanced SRAM are introduced in Chapter 2. Increasing leakage currents, which are subthreshold leakage, gate-induced drain leakage (GIDL), drain-induced barrier lowering (DIBL), band-to-band tunnel leakage (BTBT), and gate leakage, are introduced and their effects on SRAM performance are discussed. The issue of increasing active power dissipation, which results from the slowdown in voltage scaling despite rising clock frequencies, is also introduced. Consequently, a power reduction technique is demanded. From another perspective, the degradation of stability against process variation is described. Because the device feature size is scaled down, the variation of the threshold voltage ($V_{th}$) of MOSFETs within a die increases because of dopant fluctuation or line-edge roughness. This $V_{th}$ variation considerably degrades the stability for reading and writing operations. For that reason, a design solution to enhance SRAM stability against variation is strongly required. To reduce the power dissipation while maintaining performance, dynamic voltage and frequency scaling (DVFS) is needed. However, the minimum operating voltage of SRAM becomes a critical issue.

In Chapter 3, power reduction techniques of an SRAM are demonstrated. The substrate back bias control technique contributes to reduction of the subthreshold leakage in memory cell arrays while maintaining the stored data. Also in that chapter, the gate tunneling leakage reduction technique is introduced for sub-100-nm CMOS technology. The dynamic source-biasing technique is demonstrated for reducing not only standby leakage but also operating power, including charge and discharge capacitances in an SRAM cell array and peripheral circuits. Evaluation results using 90-nm and 65-nm CMOS technology show an effective reduction of the SRAM power using the proposed circuit techniques.

In Chapter 4, the robust SRAM design technique against process variation and temperature variation is discussed. To enhance the reading stability and writing stability of SRAM memory cells, the wordline suppression technique and dynamic power-source-line bias control technique are proposed. These circuits present an advantage over other circuit techniques: a lower minimum operating voltage. In that chapter, it is shown that the stabilities are compensated adequately against variations of transistor threshold voltage and temperature. The circuitry was designed and fabricated using 45-nm advanced CMOS technology. The test results show that the minimum operating voltage is lowered by this circuitry.

In Chapter 5, the high-density two-port SRAM design technique is reported. Recent application processors demand memory IP blocks not only as single-port SRAM, but also as dual-port SRAM to perform parallel operations. Their storage capabilities tend to increase as scaling increases. A circumvention of an access scheme is presented, which can maintain access to both ports simultaneously. This access scheme contributes to reduction of the two-port SRAM cell area because of the reduced transistor sizes compared to the conventional two-port SRAM ones. It is also expected to reduce the standby leakage. The test result of a prototype two-port SRAM, which is designed and fabricated using 65-nm technology, is shown.

In Chapter 6, the SRAM design for the dynamic voltage and frequency scaling (DVFS) environment is discussed. The main objective of DVFS is to reduce the power consumption in accordance with the workload. The supply voltage is lowered dynamically to reduce the power when the workload is light. Nevertheless, it is difficult to lower the supply voltage of SRAM IP blocks because of the increased process variations. The design solution described in Chapter 4 will help lower the minimum operating voltage (*VDD<sub>min</sub>*) of a normal 6T single-port SRAM. However, it will eventually face the limitation of the SRAM *VDD<sub>min</sub>* because of the degradation of the SRAM stability. Thereby, alternatives to the 6T single-port SRAM cell, which use an 8T read-margin-free cell or several types of read-margin-free cells with 10 or more transistors, are introduced. Their advantages and disadvantages are discussed.
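The benefit of lowering the supply voltage can be quantified with the standard $C V^2$ switching-energy relation. The following is a minimal sketch, not a measured result: it assumes that the switched capacitance is unchanged between the two supply voltages and it ignores leakage. The two voltages are the ends of the 0.5 V to 1.3 V operating range reported for the 10T SRAM in Chapter 6.

```python
def switching_energy_ratio(v_low, v_high):
    """Ratio of C*V^2 switching energy at v_low to that at v_high.

    Assumes the same switched capacitance at both voltages and
    neglects leakage energy.
    """
    if v_low <= 0 or v_high <= 0:
        raise ValueError("supply voltages must be positive")
    return (v_low / v_high) ** 2


ratio = switching_energy_ratio(0.5, 1.3)
print(f"{ratio:.3f}")  # 0.148
```

Under these assumptions, an access at 0.5 V costs about 15% of the switching energy of the same access at 1.3 V, which is why the minimum operating voltage of the SRAM limits the power saving obtainable with DVFS.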

Lastly, the results of this research are summarized with comments related to future work. The circuit technique to reduce the SRAM power is necessary in deep sub-micron technology. For example, the adaptive body bias technique will be used for low-power SoC devices beyond 45-nm CMOS technology. The reverse body bias technique in sleep mode is proposed in this thesis. Conversely, the forward body bias technique for active mode was recently reported [81]. The best solution depends on the device characteristics. Combining the proposed circuit technique with a new high-$k$ device, which is expected to reduce the gate leakage and $V_{th}$ variation of MOSFETs, will contribute to further reduction of the SRAM power dissipation. Regarding the assist circuit described in Chapter 4, several problems remain. Suppressing wordlines to improve reading stability weakens the drive strength of the access NMOSs in a memory cell because of *Vgs* degradation. Consequently, suppression considerably degrades the read current and deteriorates write ability at lower voltage operation. To improve write ability even in lower voltage operation, negative bitline circuitry is incorporated in the SRAM peripheral circuits. This circuitry is effective not only for single-port SRAM but also for dual-port SRAM, whose embedded capacity tends to increase for image-processing units in multimedia application SoCs. Details of this ongoing research will be reported at some future date.

For the alternatives to the 6T SRAM cell discussed in Chapter 6, the considerable disadvantage at this time is area overhead. In the subsequent 32-nm technology, the general 6T SRAM with the design solution demonstrated in Chapter 4 retains higher density under a typical supply voltage of around 1.0 V. Even if the chip is under the DVFS environment, the power supply of the SRAM is independent of the other power supplies for logic and analog operation, and it is kept constantly at a higher voltage. For ultra-low-power and low-voltage operation at a supply voltage of around 0.5 V, however, no 6T SRAM solution exists because of its lower stability. Consequently, memory cells with 8, 10, or more transistors are demanded to improve the robustness of an SRAM against process variations. In the near future, a computer-ubiquitous society will usher in wearable computers or sensor networks using solar power. Especially for those applications, the proposed alternative 6T SRAM technique is expected to be very useful.

For 32-nm or 22-nm and subsequent generations of new device technologies such as high-*k*/metal-gate, ultra-thin-body FD-SOI, Fin-FET, and double-gate FET, it is crucial to develop a low-power SRAM that retains its robustness: the SRAM is the key component of low-power SoCs. These proposed robust SRAM design techniques will contribute to implementation of future downscaled low-power application SoCs.

## **References**

- [1] H. Sato, N. Itoh, K. Nii, K. Yoshida, Y. Nakase, H. Makino, A. Yamada, T. Arakawa, S. Iwade, Y. Hirano, T. Ipposhi, "A 400MHz 183mW microcontroller in body-tied SOI technology," in IEEE ISSCC Dig. Tech. Papers, pp. 110-111, 481, Feb. 2003.
- [2] Toshihiro Hattori, Takahiro Irita, Masayuki Ito, Eiji Yamamoto, Hisashi Kato, Go Sado, Tetsuhiro Yamada, Kunihiko Nishiyama, Hiroshi Yagi, Takao Koike, Yoshihiko Tsuchihashi, Motoki Higashida, Hiroyuki Asano, Izumi Hayashibara, Ken Tatezawa, Yasuhisa Shimazaki, Naozumi Morino, Kenji Hirose, Saneaki Tamaki, Shinichi Yoshioka, Reiko Tsuchihashi, Nobuto Arai, Tomohiro Akiyama, and Koji Ohno, "A Power Management Scheme Controlling 20 Power Domains for a Single-Chip Mobile Processor," in IEEE ISSCC Dig. Tech. Papers, pp. 542-543, 672, Feb. 2006.
- [3] International Technology Roadmap for Semiconductors, 2005, by the Semiconductors Industry Association & SEMATECH.
- [4] K. R. Lakshmikumar, R. A. Hadaway and M. A. Copeland, "Characterization and Modeling of Mismatch in MOS Transistors for Precision Analog Design," IEEE Journal of Solid-State Circuits, Vol. SC-21, No. 6, 1057-1066, 1986.
- [5] M. J. M. Pelgrom, A. C. J. Duinmaijer and A. P. G. Welbers, "Matching Properties of MOS Transistors," IEEE Journal of Solid-State Circuits, Vol. 24, No. 5, 1433-1440, 1989.
- [6] P. A. Stolk, F. P. Widdershoven and D. B. M. Klaassen, "Modeling Statistical Dopant Fluctuations in MOS Transistors," IEEE Trans. on Electron Devices, Vol. 45, No. 9, 1960-1971, 1998.
- [7] X. Tang, V. K. De and J. D. Meindl, "Intrinsic MOSFET Parameter Fluctuations Due to Random Dopant Placement," IEEE Trans. on VLSI Syst., Vol. 5, No. 4, 369-376, 1997.
- [8] E. Seevinck, F. J. List and J. Lohstroh, "Static Noise Margin Analysis of MOS SRAM Cells," IEEE Journal of Solid-State Circuits, Vol.SC-22, No. 5, 748-754, 1987.
- [9] A. J. Bhavnagarwala, X. Tang, J. D. Meindl, "The Impact of Intrinsic Device Fluctuations on CMOS SRAM Cell Stability," IEEE Journal of Solid-State Circuits, Vol. 36, No. 4, 658-665, 2001.
- [10] M. Yamaoka, K. Osada, R. Tsuchiya, M. Horiuchi, S. Kimura and T. Kawahara, "Low Power SRAM Menu for SOC Application Using Yin-Yang-Feedback Memory Cell," in Proc. of 2004 Symp. on VLSI Circ., 288-291.
- [11] S. Mukhopadhyay, H. Mahmoodi-Meimand, K. Roy, "Modeling and Estimation of Failure Probability due to Parameter Variations in Nano-scale SRAMs for Yield Enhancement," Proc. of 2004 Symp. on VLSI Circ., 64-67.

- [12] F. Tachibana and T. Hiramoto, "Re-examination of Impact of Intrinsic Dopant Fluctuations on SRAM Static Noise Margin," Proc. of Intl. Conf. on Solid State Devices and Materials, 192-193, 2004.
- [13] H. Makino, Y. Tsujihashi, K. Nii, C. Morishima, Y. Hayakawa, T. Shimizu and T. Arakawa, "An Auto-Backgate-Controlled MTCMOS Circuit," in Symposium on VLSI Circuits Digest of Technical Papers, pp. 42-43, June 1998.
- [14] T. Kuroda, T. Fujita, S. Mita, T. Nagamatsu, S. Yoshioka, K. Suzuki, F. Sano, M. Norishima, M. Murota, M. Kako, M. Kinugawa, M. Kakumu and T. Sakurai, "A 0.9 V 150 MHz 10 mW 4 mm2 2-D Discrete Cosine Transform Core Processor with Variable-Threshold-Voltage Scheme," IEEE Journal of Solid-State Circuits, Vol. 31, No. 11, pp. 1770-1779, Nov. 1996.
- [15] S. Mutoh, T. Douseki, Y. Matsuya, T. Aoki, S. Shigematsu and J. Yamada, "1-V Power Supply High-speed Digital Circuit Technology with Multithreshold-Voltage CMOS," IEEE Journal of Solid-State Circuits, Vol. 30, No. 8, pp. 847-854, August 1995.
- [16] S. Shigematsu, S. Mutoh, Y. Matsuya, Y. Tanabe, J. Yamada, "A 1-V high-speed MTCMOS circuit scheme for power-down applications," IEEE Journal of Solid-State Circuits, Vol. 32, No. 6, pp. 861-869, June 1997.
- [17] W. Lee, P. Landman, B. Barton, S. Abiko, H. Takahashi, H. Mizuno, S. Muramatsu, K. Tashiro, M. Fusumada, P. Luat, F. Boutaud, E. Ego, G. Gallo, T. Hiep, C. Lemonds, A. Shih, M. Nandakumar, B. Eklund and Ih-Chin Chen; "A 1 V DSP for Wireless Communications", in IEEE ISSCC Dig. Tech. Papers, pp. 92-93, Feb. 1997.
- [18] S. M. Sze, "Physics of Semiconductor Devices", a Wiley-Interscience Publication.
- [19] A. Chandrakasan, W. J. Bowhill, F. Fox, "Design of High-Performance Microprocessor Circuits", IEEE Press.
- [20] K. Imai, K. Yamaguchi, T. Kudo, N. Kimizuka, H. Onishi, A. Ono, Y. Nakahara, Y. Goto, K. Noda, S. Masuoka, S. Ito, K. Matsui, K. Ando, E. Hasegawa, T. Ohashi, N. Oda, K. Yokoyama, T. Takewaki, S. Sone, and T. Horiuchi, "CMOS Device Optimization for System-on-a-chip Applications", in IEDM. Tech. Dig. 2000, pp455-458.
- [21] C. Oh, H. Ryu, H. Kang, M. Oh, J. Lee, N. Lee, H. Lee, C. Jun, Y. Kim, K. Suh, "Ultra Low Power 6T-SRAM Chip with Improved Transistor Performance and Reliability by HfO2-Al2O3 High-k Gate Dielectric Process Optimization", VLSI Tech. Dig. pp. 71-72, 2003.
- [22] K. Tomita, H. Hashimoto, T. Inbe, T. Oashi, K. Tsukamoto, Y. Nishioka, M. Matsuura, T. Eimori, M. Inuishi, I. Miyanaga, M. Nakamura, T. Kishimoto, T. Yamada, K. Eriguchi, H. Yuasa, T. Satake, A. Kajiya, "Sub-1μm<sup>2</sup> High Density Embedded SRAM Technologies for 100nm Generation SOC and Beyond", in Symp. VLSI Technology Dig., June 2003, pp11-12, 2002.

- [23] H. Yamauchi, T. Iwata, H. Akamatsu, and A. Matsuzawa, "A 0.5 V Single Power Supply Operated High-Speed Boosted and Offset-Grounded Data Storage (BOGS) SRAM Cell Architecture," IEEE Trans. Very Large Scale Integration (VLSI) Systems, vol. 5, no. 4, pp. 377-387, Dec. 1997.
- [24] K. Nii, H. Makino, Y. Tujihashi, C. Morishima, Y. Hayakawa, H. Nunogami, T. Arakawa, and H. Hamano, "A Low-Power SRAM using Auto-Backgate-Controlled MT-CMOS," in Proc. Int. Symp. Low Power Electronics and Design, Aug. 1998, pp. 293-298.
- [25] K. Osada, J. Shin, M. Khan, Y. Liou, K. Wang, K. Shoji, K. Kuroda, S. Ikeda and K. Ishibashi, "Universal-Vdd 0.65-2.0V 32KB Cache using Voltage-Adapted Timing Generation Scheme and Lithographical Symmetric Cell," in ISSCC Dig. Tech. Papers, Feb. 2001.
- [26] A. Agarwal, H. Li, Kaushik Roy, "A Single-Vt Low-Leakage Gated-Ground Cache for Deep Submicron," IEEE JSSC, Vol.38, No.2, pp319-328, Feb., 2003.
- [27] K. Osada, Y. Saitoh, Eishi Ibe, K. Ishibashi, "16.7fA/cell Tunnel-Leakage-Suppressed 16Mb SRAM for Handling Cosmic-Ray-Induced Multi-Errors," ISSCC Digest of Technical Papers, pp302-303, 2003.
- [28] A. J. Bhavnagarwala, S. V. Kosonocky, M. Immediato, D. Knebel, and A. Haen, "A Pico-Joule Class, 1 GHz, 32 KByte x 64b DSP SRAM with Self Reverse Bias," in Symp. VLSI Circuits Dig., June 2003, pp251-252, 2003.
- [29] T. Enomoto, Y. Oka, H. Shikano, "A Self-Controllable Voltage Level (SVL) Circuit and its Low-Power High-Speed CMOS Circuit Applications," IEEE Journal of Solid-State Circuits, Vol. 38, no. 7, pp. 1220-1226, July 2003.
- [30] A. J. Bhavnagarwala, X. Tang, J. D. Meindl, "The Impact of Intrinsic Device Fluctuations on CMOS SRAM Cell Stability", IEEE JSSC, VOL. 36, No.4, April 2001.
- [31] S. Matsumoto, A. Ishii, K. Tomita, K. Hashimoto, Y. Nishioka, M. Sekiguchi, A. Iwasaki, S. Isono, T. Satake, G. Okazaki, M. Fujiwara, M. Matsumoto, S. Yamamoto and M. Matsuura, "Reliability Improvement of 90nm-node Cu/Low-k Interconnects", in Proc. IITC, pp. 262-264, 2003.
- [32] Y. Tsukamoto, K. Nii, Y. Yamagami, T. Yoshizawa, S. Imaoka, T. Suzuki, A. Shibayama and H. Makino, "Comparison of the Interconnect Capacitances for Various SRAM Cell Layouts to Achieve High Speed, Low Power SRAM Cells", in Proc. of Solid State Devices and Materials (SSDM), pp. 22-23, 2003.
- [33] M. J. Pelgrom, A.C.J. Duinmaijer, A. P. G. Welbers, "Matching properties of MOS transistors," IEEE Journal of Solid-State Circuits, Vol. 24, No. 5, pp. 1433-1440, Oct. 1989.
- [34] P. A. Stolk, F.P. Widdershoven, D.B.M. Klaassen, "Modeling statistical dopant fluctuations in MOS transistors," IEEE Trans. On Electron Devices, Vol. 45, No.9, pp. 1960-1971, 1998.
- [35] J.A. Croon, G. Storms, S. Winkelmeier, I. Pollentier, M. Ercken, S. Decoutere, W. Sansen, H.E. Maes, "Line edge roughness: characterization, modeling and impact on device behavior," in IEEE IEDM Dig. Tech. Papers, pp. 307-310, Dec. 2002.
- [36] A. J. Bhavnagarwala, Xinghai Tang, J. D. Meindl, "The impact of intrinsic device fluctuations on CMOS SRAM cell stability," IEEE Journal of Solid-State Circuits, Vol. 36, No. 4, pp. 658-665, Apr. 2001.
- [37] International Technology Roadmap for Semiconductors, 2005, by the Semiconductors Industry Association & SEMATECH.
- [38] E. Josse, S. Parihar, O. Callen, P. Ferreira, C. Monget, A. Farcy, M. Zaleski, D. Villanueva, R. Ranica, M. Bidaud, D. Barge, C. Laviron, N. Auriac, C. Le Cam, S. Harrison, S. Warrick, F. Leverd, P. Gouraud, S. Zoll, F. Guyader, E. Perrin, E. Baylac, J. Belledent, B. Icard, B. Minghetti, S. Manakli, L. Pain, V. Huard, G. Ribes, K. Rochereau, S. Bordez, C. Blanc, A. Margain, D. Delille, R. Pantel, K. Barla, N. Cave, M. Haond, "A Cost-Effective Low-Power Platform For The 45-nm Technology Node," in IEEE IEDM Dig. Tech. Papers, pp. 1-4, Dec. 2006.
- [39] H. Fukutome, Y. Momiyama, T. Kubo, E. Yoshida, H. Morioka, M. Tajima, T. Aoyama, "Suppression of Poly-Gate-Induced Fluctuations in Carrier Profiles of Sub-50nm MOSFETs," in IEEE IEDM Dig. Tech. Papers, pp. 1-4, Dec. 2006.
- [40] T. Hayashi, M. Mizutani, M. Inoue, J. Yugami, J. Tsuchimoto, M. Anma, S. Komori, K. Tsukamoto, Y. Tsukamoto, K. Nii, Y. Nishida, H. Sayama, T. Yamashita, H. Oda, T. Eimori, Y. Ohji, "Vth-tunable CMIS platform with high-k gate dielectrics and variability effect for 45nm node," in IEEE IEDM Dig. Tech. Papers, pp. 906-909, Dec. 2005.
- [41] M. Yamaoka, K. Osada, K. Ishibashi, "0.4-V logic-library-friendly SRAM array using rectangular-diffusion cell and delta-boosted-array voltage scheme," IEEE Journal of Solid-State Circuits, Vol. 39, No. 6, pp. 934-940, June 2004.
- [42] K. Zhang, U. Bhattacharya, C. Zhanping, F. Hamzaoglu, D. Murray, N. Vallepalli, Y. Wang, B. Zheng, M. Bohr, "A 3-GHz 70-mb SRAM in 65-nm CMOS technology with integrated column-based dynamic power supply," IEEE Journal of Solid-State Circuits, Vol. 41, No. 1, pp. 146-151, Jan. 2006.
- [43] M. Yamaoka, N. Maeda, Y. Shinozaki, Y. Shimazaki, K. Nii, S. Shimada, K. Yanagisawa and T. Kawahara, "90-nm process-variation adaptive embedded SRAM modules with power-line-floating write technique," Journal of Solid-State Circuits, Vol. 41, No. 3, pp. 705-711, March 2006.
- [44] M. Khellah, Y. Ye, N. Kim, D. Somasekhar, G. Pandya, A. Farhang, K. Zhang, C. Webb, V. De, "Wordline & Bitline Pulsing Schemes for Improving SRAM Cell Stability in Low-Vcc 65nm CMOS Designs," in Symp. VLSI Circuits Digest of Technical Papers, pp. 9-10, June 2006.
- [45] Harold Pilo, Charlie Barwin, Geordie Braceras, Chris Browning, Steve Lamphier, Fred Towler, "An SRAM Design in 65-nm Technology Node Featuring Read and Write-Assist Circuits to Expand Operating Voltage," IEEE Journal of Solid-State Circuits, Vol. 42, no. 4, pp. 813-819, April 2007.
- [46] Y. Takeyama, H. Otake, O. Hirabayashi, K. Kushida, N. Otsuka, "A low leakage SRAM macro with replica cell biasing scheme," IEEE Journal of Solid-State Circuits, Vol. 41, no. 4, pp. 815-822, April 2006.
- [47] M. Sumita, S. Sakiyama, M. Kinoshita, Y. Araki, Y. Ikeda, K. Fukuoka, "Mixed body bias techniques with fixed Vt and Isub generation circuits," IEEE Journal of Solid-State Circuits, Vol. 40, no. 1, pp. 60-66, Jan. 2005.
- [48] K. Takeda, Y. Hagihara, Y. Aimoto, M. Nomura, Y. Nakazawa, T. Ishii, H. Kobatake, "A read-static-noise-margin-free SRAM cell for low-VDD and high-speed applications," IEEE Journal of Solid-State Circuits, Vol. 41, no. 1, pp. 113-121, Jan. 2006.
- [49] T. Suzuki, H. Yamauchi, Y. Yamagami, K. Satomi, H. Akamatsu, "A Stable SRAM Cell Design Against Simultaneously R/W Disturbed Accesses," in Symp. VLSI Circuits Digest of Technical Papers, pp. 11-12, June 2006.
- [50] N. Shibata, H. Kiya, S. Kurita, H. Okamoto, M. Tanno, T. Douseki, "A 0.5-V 25-MHz 1-mW 256-kb MTCMOS/SOI SRAM for solar-power-operated portable personal digital equipment - sure write operation by using step-down negatively overdriven bitline scheme," IEEE Journal of Solid-State Circuits, Vol. 41, no. 3, pp. 728-742, March 2006.
- [51] S. Ohbayashi, M. Yabuuchi, K. Nii, Y. Tsukamoto, S. Imaoka, Y. Oda, T. Yoshihara, M. Igarashi, M. Takeuchi, H. Kawashima, Y. Yamaguchi, K. Tsukamoto, M. Inuishi, H. Makino, K. Ishibashi, H. Shinohara, "A 65-nm SoC Embedded 6T-SRAM Designed for Manufacturability With Read and Write Operation Stabilizing Circuits," IEEE Journal of Solid-State Circuits, Vol. 42, no. 4, pp. 820-829, April 2007.
- [52] M. Yabuuchi, K. Nii, Y. Tsukamoto, S. Ohbayashi, S. Imaoka, H. Makino, Y. Yamagami, S. Ishikura, T. Terano, T. Oashi, K. Hashimoto, A. Sebe, G. Okazaki, K. Satomi, H. Akamatsu, H. Shinohara, "A 45nm Low-Standby-Power Embedded SRAM with Improved Immunity Against Process and Temperature Variations," in IEEE ISSCC Dig. Tech. Papers, pp. 326-327, Feb. 2007.
- [53] Y. Tsukamoto, K. Nii, S. Imaoka, Y. Oda, S. Ohbayashi, T. Yoshizawa, H. Makino, K. Ishibashi, H. Shinohara, "Worst-case analysis to obtain stable read/write DC margin of high density 6T-SRAM-array with local Vth variability," ICCAD Digest, pp. 398-405, 2005.

- [54] R. Heald, P. Wang, "Variability in sub-100nm SRAM designs," ICCAD Digest, pp. 347-352, 2004.
- [55] M. Khellah, D. Somasekhar, Y. Ye, N. S. Kim, J. Howard, G. Ruhl, M. Sunna, J. Tschanz, N. Borkar, F. Hamzaoglu, G. Pandya, A. Farhang, K. Zhang, V. De, "256-Kb Dual-V $_{CC}$ SRAM Building Block in 65-nm CMOS Process With Actively Clamped Sleep Transistor," IEEE Journal of Solid-State Circuits, Vol. 42, no. 1, pp. 233-242, Jan. 2007.
- [56] Y. Morita, H. Noguchi, H. Fujiwara, K. Kawakami, J. Miyakoshi, S. Mikami, K. Nii, H. Kawaguchi, M. Yoshimoto, "A Vth-Variation-Tolerant SRAM with 0.3-V Minimum Operation Voltage for Memory-Rich SoC Under DVS Environment," in Symp. VLSI Circuits Digest of Technical Papers, pp. 16-17, June 2006.
- [57] M. Yamashina, T. Enomoto, T. Kunio, I. Tamitani, H. Harasaki, T. Nishitani, M. Satoh, K. Kikuchi, "A micro programmable real-time video signal processor (VSP) LSI," IEEE Journal of Solid-State Circuits, Vol. 22, No. 6, pp. 1117-1123, Dec. 1987.
- [58] M. Inamori, J. Naganuma, M. Endo, "A memory-based architecture for MPEG2 system protocol LSIs," IEEE Trans. on VLSI Systems, Vol. 7, No. 3, pp. 339-344, Sep. 1999.
- [59] Chi-Weon Yoon, Ramchan Woo, Jeengheon Kook, Se-Joong Lee, Kangmin Lee, Hoi-Jun Yeo, "An 80/20-MHz 160-mW multimedia processor integrated with embedded DRAM, MPEG-4 accelerator and 3-D rendering engine for mobile applications," IEEE Journal of Solid-State Circuits, Vol. 36, No. 11, pp. 1758-1767, Nov. 2001.
- [60] H. -J. Stolberg, S. Moch, L. Friebe, A. Dehnhardt, M.B. Kulaczewski, M. Berekovic, P. Pirsch, "An SoC with two multimedia DSPs and a RISC core for video compression applications," in IEEE ISSCC Dig. Tech. Papers, pp. 330-531, Feb. 2004.
- [61] "The v2.0+EDR Bluetooth SOC architecture for multimedia,"
- [62] T. Shiota, K. Kawasaki, Y. Kawabe, W. Shibamoto, A. Sato, T. Hashimoto, F. Hayakawa, S. Tago, H. Okano, Y. Nakamura, H. Miyake, A. Suga, H. Takahashi, "A 51.2 GOPS 1.0 GB/s-DMA single-chip multi-processor integrating quadruple 8-way VLIW processors," in IEEE ISSCC Dig. Tech. Papers, pp. 194-593, Feb. 2005.
- [63] M. Nakajima, T. Yamamoto, M.Yamasaki, K. Kaneko, T. Hosoki, "Homogeneous Dual-Processor core with Shared L1 Cache for Mobile Multimedia SoC," in VLSI Circuits, 2007. Digest of Technical Papers, pp. 216-217, June 2007.
- [64] Hwang Wei, R.V. Joshi, W.H. Henkels, "A 500-MHz, 32-word×64-bit, eight-port self-resetting CMOS register file," IEEE Journal of Solid-State Circuits, Vol. 37, No. 5, pp. 56-67, May 1999.
- [65] R.K. Krishnamurthy, A. Alvandpour, G. Balamurugan, N.R. Shanbhag, K. Soumyanath, S.Y. Borkar, "A 130-nm 6-GHz 256 × 32 bit leakage-tolerant register file," IEEE Journal of Solid-State Circuits, Vol. 37, No. 5, pp. 624-632, May 2002.

- [66] N. Tzartzanis, W.W. Walker, "A differential current-mode sensing method for high-noise-immunity, single-ended register files," in IEEE ISSCC Dig. Tech. Papers, pp. 506-543, Feb. 2004.
- [67] M. Miyama, J. Miyakoshi, Y. Kuroda, K. Imamura, H. Hashimoto, M. Yoshimoto, "A sub-mW MPEG-4 motion estimation processor core for mobile video application," IEEE Journal of Solid-State Circuits, Vol. 39, No. 9, pp. 1562-1570, Sep. 2004.
- [68] K. Nii, Y. Tsukamoto, S. Imaoka, H. Makino, "A 90 nm dual-port SRAM with  $2.04 \mu m^2$ 8T-thin cell using dynamically-controlled column bias scheme," in IEEE ISSCC Dig. Tech. Papers, pp. 508-543, Feb. 2004.
- [69] K. Nii, Y. Masuda, M. Yabuuchi, Y. Tsukamoto, S. Ohbayashi, S. Imaoka, M. Igarashi, K. Tomita, N. Tsuboi, H. Makino, K. Ishibashi, H. Shinohara, "A 65 nm Ultra-High-Density Dual-Port SRAM with  $0.71 \mu m^2$  8T-Cell for SoC," in Symposium on VLSI Circuits Digest of Technical Papers, pp. 162-163, June 2006.
- [70] K. Nii, Y. Tsukamoto, T. Yoshizawa, S. Imaoka, Y. Yamagami, T. Suzuki, A. Shibayama, H. Makino and S. Iwade, "A 90-nm Low-Power 32-kB Embedded SRAM With Gate Leakage Suppression Circuit for Mobile Applications," IEEE Journal of Solid-State Circuits, Vol. 39, No. 4, pp. 684-693, April 2004.
- [71] S. Ohbayashi, M. Yabuuchi, K. Nii, Y. Tsukamoto, S. Imaoka, Y. Oda, T. Yoshihara, M. Igarashi, M. Takeuchi, H. Kawashima, Y. Yamaguchi, K. Tsukamoto, M. Inuishi, H. Makino, K. Ishibashi, H. Shinohara, "A 65-nm SoC Embedded 6T-SRAM Designed for Manufacturability With Read and Write Operation Stabilizing Circuits," IEEE Journal of Solid-State Circuits, Vol. 42, No. 4, pp. 820-829, April 2007.
- [72] Y. Tsukamoto, K. Nii, S. Imaoka, Y. Oda, S. Ohbayashi, T. Yoshizawa, H. Makino, K. Ishibashi, H. Shinohara, "Worst-case analysis to obtain stable read/write DC margin of high density 6T-SRAM-array with local Vth variability," ICCAD Digest, pp. 398-405, 2005.
- [73] K. Zhang, U. Bhattacharya, C. Zhanping, F. Hamzaoglu, D. Murray, N. Vallepalli, Y. Wang, B. Zheng, M. Bohr, "A 3-GHz 70-Mb SRAM in 65-nm CMOS technology with integrated column-based dynamic power supply," IEEE Journal of Solid-State Circuits, Vol. 41, No. 1, pp. 146-151, Jan. 2006.
- [74] M. Yamaoka, N. Maeda, Y. Shinozaki, Y. Shimazaki, K. Nii, S. Shimada, K. Yanagisawa, T. Kawahara, "90-nm process-variation adaptive embedded SRAM modules with power-line-floating write technique," IEEE Journal of Solid-State Circuits, Vol. 41, No. 3, pp. 705-711, March 2006.
- [75] H. Pilo, C. Barwin, G. Braceras, C. Browning, S. Lamphier, F. Towler, "An SRAM Design in 65-nm Technology Node Featuring Read and Write-Assist Circuits to Expand Operating Voltage," IEEE Journal of Solid-State Circuits, Vol. 42, no. 4, pp. 813-819, April 2007.

- [76] M. Yabuuchi, K. Nii, Y. Tsukamoto, S. Ohbayashi, S. Imaoka, H. Makino, Y. Yamagami, S. Ishikura, T. Terano, T. Oashi, K. Hashimoto, A. Sebe, G. Okazaki, K. Satomi, H. Akamatsu, H. Shinohara, "A 45 nm Low-Standby-Power Embedded SRAM with Improved Immunity Against Process and Temperature Variations," in IEEE ISSCC Dig. Tech. Papers, pp. 326-327, Feb. 2007.
- [77] Y. Morita, H. Fujiwara, H. Noguchi, Y. Iguchi, K. Nii, H. Kawaguchi, and M. Yoshimoto, "An Area-Conscious Low-Voltage-Oriented 8T-SRAM Design under DVS Environment," in IEEE VLSI Circuits Symp. Dig., pp. 256-257, June 2007.
- [78] H. Noguchi, Y. Iguchi, H. Fujiwara, Y. Morita, K. Nii, H. Kawaguchi, M. Yoshimoto, "A 10T Non-Precharge Two-Port SRAM for 74% Power Reduction in Video Processing," in Proc. IEEE Computer Society Annual Symp. VLSI (ISVLSI), pp. 107-112, March 2007.
- [79] B. Cheng, S. Roy, and A. Asenov, "The Scalability of 8T-SRAM Cells under the Influence of Intrinsic Parameter Fluctuations," in Proc. IEEE European Solid-State Circuits Conf. (ESSCIRC), pp. 93-96, Sep. 2007.
- [80] Yasuhiro Morita, Hidehiro Fujiwara, Hiroki Noguchi, Yusuke Iguchi, Koji Nii, Hiroshi Kawaguchi, and Masahiko Yoshimoto, "Area Comparison between 6T and 8T SRAM Cells in Dual-Vdd Scheme and DVS Scheme," IEICE Trans. on Fundamentals, Vol. E90-A, No. 12, pp. 2695-2702, Dec. 2007.
- [81] Y. Hirano, M. Tsujiuchi, K. Ishikawa, H. Shinohara, T. Terada, Y. Maki, T. Iwamatsu, K. Eikyu, T. Uchida, S. Obayashi, K. Nii, Y. Tsukamoto, M. Yabuuchi, T. Ipposhi, H. Oda, and Y. Inoue, "A Robust SOI SRAM Architecture by using Advanced ABC technology for 32nm node and beyond LSTP devices," in IEEE VLSI Technology Symp. Dig., pp. 78-79, June 2007.

## **Publications**

### **Major Papers**

- 1) Koji Nii, Yasumasa Tsukamoto, Makoto Yabuuchi, Yasuhiro Masuda, Susumu Imaoka, Keiichi Usui, Shigeki Ohbayashi, Hiroshi Makino, and Hirofumi Shinohara, "Synchronous Ultra-High-Density 2RW Dual-Port 8T-SRAM with Circumvention of Simultaneous Common-Row-Access," submitted to IEEE Journal of Solid-State Circuits (under review process)
- 2) Hirofumi Shinohara, Koji Nii, and Hidetoshi Onodera, "Analytical Model of Static-Noise-Margin in CMOS SRAM for Variation Consideration," submitted to IEICE Trans. on Electronics (under review process)
- 3) Masako Fujii, Koji Nii, H. Makino, S. Ohbayashi, M. Igarashi, T. Kawamura, M. Yokota, N. Tsuda, T. Yoshizawa, T. Tsutsui, N. Takeshita, N. Murata, T. Tanaka, T. Fujiwara, K. Asahina, M. Okada, K. Tomita, M. Takeuchi, S. Yamamoto, H. Sugimoto, and H. Shinohara, "A Large Scale, Flip-Flop RAM imitating a logic LSI for fast development of process technology," submitted to IEICE Trans. on Electronics (under review process)
- 4) Satoshi Ishikura, M. Kurumada, T. Terano, Y. Yamagami, N. Kotani, K. Satomi, K. Nii, M. Yabuuchi, Y. Tsukamoto, S. Ohbayashi, T. Oashi, H. Makino, H. Shinohara, and H. Akamatsu, "A 45nm 2port 8T-SRAM using hierarchical replica bitline technique with immunity from simultaneous R/W access issues," IEEE Journal of Solid-State Circuits, 2008 to be published.
- 5) Hidehiro Fujiwara, Koji Nii, Hiroki Noguchi, Junichi Miyakoshi, Yuichiro Murachi, Yasuhiro Morita, Hiroshi Kawaguchi, and Masahiko Yoshimoto, "A Novel Video Memory Reducing 45% of Bitline Power with Majority Logic and Data-Bit Reordering," IEEE Trans. on VLSI Systems, 2008 to be published.
- 6) Koji Nii, Makoto Yabuuchi, Yasumasa Tsukamoto, Shigeki Ohbayashi, Susumu Imaoka, Hiroshi Makino, Yoshinobu Yamagami, Satoshi Ishikura, Toshio Terano, Toshiyuki Oashi, Keiji Hashimoto, Akio Sebe, Gen Okazaki, Katsuji Satomi, Hironori Akamatsu, and Hirofumi Shinohara, "A 45-nm Bulk CMOS Embedded SRAM with Improved Immunity against Process and Temperature Variations," IEEE Journal of Solid-State Circuits, Vol. 43, no. 1, pp. 180-191, Jan. 2008.
- 7) Shigeki Ohbayashi, Makoto Yabuuchi, Kazushi Kono, Yuji Oda, Susumu Imaoka, Keiichi Usui, Toshiaki Yonezu, Takeshi Iwamoto, Koji Nii, Yasumasa Tsukamoto, Masashi Arakawa, Takahiro Uchida, Masakazu Okada, Atsushi Ishii, Tsutomu Yoshihara, Hiroshi Makino, Koichiro Ishibashi, and Hirofumi Shinohara, "A 65 nm Embedded SRAM with Wafer Level Burn-In Mode, Leak-Bit Redundancy and Cu E-trim Fuse for Known Good Die," IEEE Journal of Solid-State Circuits, Vol. 43, no. 1, pp. 96-108, Jan. 2008.

- 8) Hiroki Noguchi, Shunsuke Okumura, Yusuke Iguchi, Hidehiro Fujiwara, Yasuhiro Morita, Koji Nii, Hiroshi Kawaguchi, and Masahiko Yoshimoto, "A 10T Non-Precharge Two-Port SRAM Reducing Readout Power for Video Processing," IEICE Trans. on Electronics, April 2008 to be published.
- 9) Y. Hirano, M. Tsujiuchi, Y. Maki, T. Iwamatsu, Y. Ishii, A. Miyanishi, Y. Tsukamoto, K. Nii, T. Ipposhi, H. Oda, S. Maegawa, and Y. Inoue, "A Novel Low-Power and High-Speed SOI SRAM With Actively Body-Bias Controlled (ABC) Technology for Emerging Generations," IEEE Trans. on Electron Devices, Vol. 55, No. 1, pp. 365-371, Jan. 2008.
- 10) Yasuhiro Morita, Hidehiro Fujiwara, Hiroki Noguchi, Yusuke Iguchi, Koji Nii, Hiroshi Kawaguchi, and Masahiko Yoshimoto, "Area Comparison between 6T and 8T SRAM Cells in Dual-Vdd Scheme and DVS Scheme," IEICE Trans. on Fundamentals, Vol. E90-A, No. 12, pp. 2695-2702, Dec. 2007.
- 11) Yasuhiro Morita, Hidehiro Fujiwara, Hiroki Noguchi, Yusuke Iguchi, Koji Nii, Hiroshi Kawaguchi, and Masahiko Yoshimoto, "Area Optimization in 6T and 8T SRAM Cells Considering Vth Variation in Future Processes," IEICE Trans. on Electronics, Vol. E90-C, No. 10, pp. 1949-1956, Oct. 2007.
- 12) S. Ohbayashi, M. Yabuuchi, K. Nii, Y. Tsukamoto, S. Imaoka, Y. Oda, T. Yoshihara, M. Igarashi, M. Takeuchi, H. Kawashima, Y. Yamaguchi, K. Tsukamoto, M. Inuishi, H. Makino, K. Ishibashi, and H. Shinohara, "A 65-nm SoC Embedded 6T-SRAM Designed for Manufacturability With Read and Write Operation Stabilizing Circuits," IEEE Journal of Solid-State Circuits, Vol. 42, no. 4, pp. 820-829, April 2007.
- 13) Yasuhiro Morita, Hidehiro Fujiwara, Hiroki Noguchi, Kentaro Kawakami, Junichi Miyakoshi, Shinji Mikami, Koji Nii, Hiroshi Kawaguchi, and Masahiko Yoshimoto, "A 0.3-V Operating, Vth-Variation-Tolerant SRAM under DVS Environment for Memory-Rich SoC in 90-nm Technology Era and Beyond," IEICE Trans. on Fundamentals, Vol. E89-A, No. 12, pp. 3634-3641, Dec. 2006.
- 14) M. Yamaoka, N. Maeda, Y. Shinozaki, Y. Shimazaki, K. Nii, S. Shimada, K. Yanagisawa, and T. Kawahara, "90-nm process-variation adaptive embedded SRAM modules with power-line-floating write technique," IEEE Journal of Solid-State Circuits, Vol. 41, No. 3, pp. 705-711, March 2006.
- 15) K. Nii, Y. Tsukamoto, T. Yoshizawa, S. Imaoka, Y. Yamagami, T. Suzuki, A. Shibayama, H. Makino, and S. Iwade, "A 90-nm Low-Power 32-kB Embedded SRAM With Gate Leakage Suppression Circuit for Mobile Applications," IEEE Journal of Solid-State Circuits, Vol. 39, No. 4, pp. 684-693, April 2004.
- 16) Hisakazu Sato, Yasuhiro Nunomura, Niichi Itoh, Koji Nii, Kanako Yoshida, Hironobu Ito, Jingo Nakanishi, Hidehiro Takata, Yasunobu Nakase, Hiroshi Makino, Akira Yamada, Takahiko Arakawa, Toru Shimizu, Yuichi Hirano, Takashi Ipposhi, and Shuhei Iwade, "A Low-Power Microcontroller with Body-Tied SOI Technology," IEICE Trans. on Electronics, Vol. E87-C, No. 4, pp. 563-570, April 2004.

- 17) Yasumasa Tsukamoto, Tatsuya Kunikiyo, Koji Nii, Hiroshi Makino, Shuhei Iwade, Kiyoshi Ishikawa, Yasuo Inoue, and Norihiko Kotani, "Realistic Scaling Scenario for Sub-100 nm Embedded SRAM Based on 3-Dimensional Interconnect Simulation," IEICE Trans. on Electronics, Vol. E86-C, No. 3, pp. 439-446, March 2003.
- 18) Y. Hirano, S. Maeda, T. Matsumoto, K. Nii, T. Iwamatsu, Y. Yamaguchi, T. Ipposhi, H. Kawashima, S. Maegawa, M. Inuishi, and T. Nishimura, "Bulk-Layout-Compatible 0.18-μm SOI-CMOS Technology Using Body-Tied Partial-Trench-Isolation (PTI)," IEEE Trans. on Electron Devices, Vol. 48, No. 12, pp. 2816-2822, Dec. 2001.
- 19) Kimio Ueda, Koji Nii, Yoshiki Wada, Shigenobu Maeda, Toshiaki Iwamatsu, Yasuo Yamaguchi, Takashi Ipposhi, Shigeto Maegawa, Koichiro Mashiko, and Yasutaka Horiba, "A CAD-Compatible SOI-CMOS Gate Array Using 0.35μm Partially-Depleted Transistors," IEICE Trans. on Electronics, Vol. E83-C, No. 2, pp. 205-211, Feb. 2000.
- 20) S. Maeda, Y. Yamaguchi, I.-J. Kim, T. Iwamatsu, T. Ipposhi, S. Miyamoto, S. Maegawa, K. Ueda, K. Nii, K. Mashiko, Y. Inoue, T. Nishimura, and H. Miyoshi, "Analysis of delay time instability according to the operating frequency in field shield isolated SOI circuits," IEEE Trans. on Electron Devices, Vol. 45, No. 7, pp. 1479-1486, July 1998.
- 21) K. Nii, H. Maeno, T. Osawa, S. Iwade, S. Kayano, and H. Shibata, "A Novel Memory Cell for Multiport RAM on 0.5μm CMOS Sea-of-Gates," IEEE Journal of Solid-State Circuits, Vol. 30, No. 3, pp. 316-320, March 1995.

#### **Major Conferences**

- 1) K. Nii, M. Yabuuchi, Y. Tsukamoto, S. Ohbayashi, Y. Oda, K. Usui, T. Kawamura, N. Tsuboi, T. Iwasaki, K. Hashimoto, H. Makino, and H. Shinohara, "A 45-nm Single-port and Dual-port SRAM family with Robust Read/Write Stabilizing Circuitry under DVFS Environment," submitted to IEEE VLSI Circuits Symp., 2008.
- 2) S. Ishikura, M. Kurumada, T. Terano, Y. Yamagami, N. Kotani, K. Satomi, K. Nii, M. Yabuuchi, Y. Tsukamoto, S. Ohbayashi, T. Oashi, H. Makino, H. Shinohara, and H. Akamatsu, "A 45nm 2port 8T-SRAM using hierarchical replica bitline technique with immunity from simultaneous R/W access issues," in IEEE VLSI Circuits Symp. Dig., pp. 254-255, June 2007.
- 3) Y. Morita, H. Fujiwara, H. Noguchi, Y. Iguchi, K. Nii, H. Kawaguchi, and M. Yoshimoto, "An Area-Conscious Low-Voltage-Oriented 8T-SRAM Design under DVS Environment," in IEEE VLSI Circuits Symp. Dig., pp. 256-257, June 2007.
- 4) Y. Hirano, M. Tsujiuchi, K. Ishikawa, H. Shinohara, T. Terada, Y. Maki, T. Iwamatsu, K. Eikyu, T. Uchida, S. Obayashi, K. Nii, Y. Tsukamoto, M. Yabuuchi, T. Ipposhi, H. Oda, and Y. Inoue, "A Robust SOI SRAM Architecture by using Advanced ABC technology for 32nm node and beyond LSTP devices," in IEEE VLSI Technology Symp. Dig., pp. 78-79, June 2007.
- 5) H. Noguchi, Y. Iguchi, H. Fujiwara, Y. Morita, K. Nii, H. Kawaguchi, M. Yoshimoto, "A 10T Non-Precharge Two-Port SRAM for 74% Power Reduction in Video Processing," in Proc. IEEE Computer Society Annual Symp. VLSI (ISVLSI), pp. 107-112, March 2007.
- 6) M. Fujii, K. Nii, H. Makino, S. Ohbayashi, M. Igarashi, T. Kawamura, M. Yokota, N. Tsuda, T. Yoshizawa, T. Tsutsui, N. Takeshita, N. Murata, T. Tanaka, T. Fujiwara, K. Asahina, M. Okada, K. Tomita, M. Takeuchi, and H. Shinohara, "A Large Scale, Flip-Flop RAM imitating a logic LSI for fast development of process technology," in Proc. IEEE Int. Conf. Microelectronic Test Structures (ICMTS), pp. 131-134, March 2007.
- 7) M. Yabuuchi, K. Nii, Y. Tsukamoto, S. Ohbayashi, S. Imaoka, H. Makino, Y. Yamagami, S. Ishikura, T. Terano, T. Oashi, K. Hashimoto, A. Sebe, G. Okazaki, K. Satomi, H. Akamatsu, and H. Shinohara, "A 45nm Low-Standby-Power Embedded SRAM with Improved Immunity Against Process and Temperature Variations," in IEEE ISSCC Dig. Tech. Papers, pp. 326-327, 606, Feb. 2007.
- 8) S. Ohbayashi, M. Yabuuchi, K. Kono, Y. Oda, S. Imaoka, K. Usui, T. Yonezu, T. Iwamoto, K. Nii, Y. Tsukamoto, M. Arakawa, T. Uchida, M. Okada, A. Ishii, H. Makino, K. Ishibashi, H. Shinohara, "A 65nm Embedded SRAM with Wafer-Level Burn-In Mode, Leak-Bit Redundancy and E-Trim Fuse for Known Good Die," in IEEE ISSCC Dig. Tech. Papers, pp. 488-489, 617, Feb. 2007.
- 9) H. Fujiwara, K. Nii, J. Miyakoshi, Y. Murachi, Y. Morita, H. Kawaguchi, and M. Yoshimoto, "A Two-Port SRAM for Real-Time Video Processor Saving 53% of Bitline Power with Majority Logic and Data-Bit Reordering," in Proc. Int. Symp. Low Power Electronics and Design (ISLPED), pp. 61-66, Oct. 2006.

- 10) K. Nii, Y. Masuda, M. Yabuuchi, Y. Tsukamoto, S. Ohbayashi, S. Imaoka, M. Igarashi, K. Tomita, N. Tsuboi, H. Makino, K. Ishibashi, H. Shinohara, "A 65 nm Ultra-High-Density Dual-Port SRAM with  $0.71 \mu m^2$  8T-Cell for SoC," in Symposium on VLSI Circuits Digest of Technical Papers, pp. 130-131, June 2006.
- 11) S. Ohbayashi, M. Yabuuchi, K. Nii, Y. Tsukamoto, S. Imaoka, Y. Oda, M. Igarashi, M. Takeuchi, H. Kawashima, H. Makino, Y. Yamaguchi, K. Tsukamoto, M. Inuishi, K. Ishibashi, and H. Shinohara, "A 65 nm SoC Embedded 6T-SRAM Design for Manufacturing with Read and Write Cell Stabilizing Circuits," in VLSI Circuits Symp. Dig., pp. 20-21, June 2006.
- 12) Y. Morita, H. Fujiwara, H. Noguchi, K. Kawakami, J. Miyakoshi, S. Mikami, K. Nii, H. Kawaguchi, M. Yoshimoto, "A Vth-Variation-Tolerant SRAM with 0.3-V Minimum Operation Voltage for Memory-Rich SoC Under DVS Environment," in VLSI Circuits Symp. Dig., pp. 16-17, June 2006.
- 13) T. Hayashi, M. Mizutani, M. Inoue, J. Yugami, J. Tsuchimoto, M. Anma, S. Komori, K. Tsukamoto, Y. Tsukamoto, K. Nii, Y. Nishida, H. Sayama, T. Yamashita, H. Oda, T. Eimori, Y. Ohji, "Vth-tunable CMIS platform with high-k gate dielectrics and variability effect for 45nm node," in Proc. IEEE Int. Electron Devices Meeting (IEDM), pp. 906-909, Dec. 2005.
- 14) Y. Tsukamoto, K. Nii, S. Imaoka, Y. Oda, S. Ohbayashi, T. Yoshizawa, H. Makino, K. Ishibashi, and H. Shinohara, "Worst-case analysis to obtain stable read/write DC margin of high density 6T-SRAM-array with local Vth variability," in Proc. IEEE/ACM Int. Conf. on Computer-Aided Design (ICCAD), pp. 398-405, Nov. 2005.
- 15) M. Yamaoka, N. Maeda, Y. Shinozaki, Y. Shimazaki, K. Nii, S. Shimada, K. Yanagisawa, T. Kawahara, "Low-power embedded SRAM modules with expanded margins for writing," in IEEE ISSCC Dig. Tech. Papers, pp. 480-481, 611, Feb. 2005.
- 16) K. Nii, Y. Tsukamoto, S. Imaoka, H. Makino, "A 90 nm dual-port SRAM with $2.04 \mu m^2$ 8T-thin cell using dynamically-controlled column bias scheme," in IEEE ISSCC Dig. Tech. Papers, pp. 508-509, 543, Feb. 2004.
- 17) Y. Hirano, T. Ipposhi, H. Dang, T. Matsumoto, T. Iwamatsu, K. Nii, Y. Tsukamoto, T. Yoshizawa, H. Kato, S. Maegawa, K. Arimoto, Y. Inoue, M. Inuishi, and Y. Ohji, "Impact of actively body-bias controlled (ABC) SOI SRAM by using direct body contact technology for low-voltage application," in Proc. IEEE Int. Electron Devices Meeting (IEDM), pp. 35-38, Dec. 2003.
- 18) K. Nii, Y. Tenoh, T. Yoshizawa, S. Imaoka, Y. Tsukamoto, Y. Yamagami, T. Suzuki, A. Shibayama, H. Makino, and S. Iwade, "A 90 nm low power 32 K-byte embedded SRAM with gate leakage suppression circuit for mobile applications," in VLSI Circuits Symp. Dig., pp. 247-250, June 2003.

- 19) H. Sato, N. Itoh, K. Nii, K. Yoshida, Y. Nakase, H. Makino, A. Yamada, T. Arakawa, S. Iwade, Y. Hirano, T. Ipposhi, "A 400MHz 183mW microcontroller in body-tied SOI technology," in IEEE ISSCC Dig. Tech. Papers, pp. 110-111, 481, Feb. 2003.
- 20) Y. Tsukamoto, T. Kunikiyo, K. Nii, H. Makino, S. Iwade, K. Ishikawa, and Y. Inoue, "Realistic scaling scenario for sub-100nm embedded SRAM based on 3-dimensional interconnect simulation," in Proc. Int. Conf. Simulation of Semiconductor Processes and Devices (SISPAD), pp. 63-66, Sep. 2002.
- 21) Y. Hirano, T. Iwamatsu, K. Shiga, K. Nii, K. Sonoda, T. Matsumoto, S. Maeda, Y. Yamaguchi, T. Ipposhi, S. Maegawa, and Y. Inoue, "High soft-error tolerance body-tied SOI technology with partial trench isolation (PTI) for next generation devices," in VLSI Technology Symp. Dig., pp. 48-49, June 2002.
- 22) Y. Hirano, T. Matsumoto, S. Maeda, T. Iwamatsu, T. Kunikiyo, K. Nii, K. Yamamoto, Y. Yamaguchi, T. Ipposhi, S. Maegawa, M. Inuishi, "Impact of 0.10 μm SOI CMOS with body-tied hybrid trench isolation structure to break through the scaling crisis of silicon technology," in Proc. IEEE Int. Electron Devices Meeting (IEDM), pp. 467-470, Dec. 2000.
- 23) Y. Hirano, S. Maeda, T. Matsumoto, K. Nii, T. Iwamatsu, Y. Yamaguchi, T. Ipposhi, H. Kawashima, S. Maegawa, M. Inuishi, T. Nishimura, "Bulk-layout-compatible 0.18 μm SOI-CMOS technology using body-fixed partial trench isolation (PTI)," in Proc. IEEE Int. SOI Conf., pp. 131-132, Oct. 1999.
- 24) Y. Wada, K. Nii, H. Kuriyama, S. Maeda, K. Ueda, and Y. Matsuda, "A 128 kb SRAM with soft error immunity for 0.35 μm SOI-CMOS embedded cell arrays," in Proc. IEEE Int. SOI Conf., pp. 127-128, Oct. 1998.
- 25) C. Morishima, K. Nii, Y. Tsujihashi, Y. Hayakawa, and H. Makino, "A 1-V 20-ns 512-Kbit MT-CMOS SRAM with auto-power-cut scheme using dummy memory cells," in Proc. IEEE European Solid-State Circuits Conf. (ESSCIRC), pp. 452-455, Sep. 1998.
- 26) K. Nii, H. Makino, Y. Tsujihashi, C. Morishima, Y. Hayakawa, H. Nunogami, T. Arakawa, and H. Hamano, "A Low-Power SRAM using Auto-Backgate-Controlled MT-CMOS," in Proc. Int. Symp. Low Power Electronics and Design (ISLPED), pp. 293-298, Aug. 1998.
- 27) H. Makino, Y. Tsujihashi, K. Nii, C. Morishima, Y. Hayakawa, T. Shimizu, and A. Arakawa, "An Auto-Backgate-Controlled MT-CMOS Circuit," in VLSI Circuits Symp. Dig., pp. 42-43, June 1998.
- 28) K. Mashiko, K. Ueda, K. Nii, Y. Wada, T. Hirota, S. Maeda, T. Iwamatsu, Y. Yamaguchi, T. Ipposhi, S. Maegawa, H. Hamano, "A 0.35 μm 560 KG SOI/CMOS gate array using field-shield isolation technique," in Proc. IEEE Int. SOI Conf., pp. 166-167, Oct. 1997.
- 29) K. Nii, K. Ueda, Y. Wada, S. Iwade, H. Hamano and K. Tsuchihashi, "A High Speed SRAM macro for 0.35 μm Low Voltage SOI/CMOS Gate Arrays," in Proc. IEEE European Solid-State Circuits Conf. (ESSCIRC), pp. 196-199, Sep. 1997.
- 30) S. Maeda, Y. Yamaguchi, I.-J. Kim, T. Iwamatsu, T. Ipposhi, S. Miyamoto, Y. Hirano, K. Ueda, K. Nii, K. Mashiko, S. Maegawa, Y. Inoue, and T. Nishimura, "A Highly Reliable 0.35μm Field-shield Body-tied SOI Gate Array For Substrate-bias-effect Free Operation," in VLSI Technology Symp. Dig., pp. 93-94, June 1997.
- 31) K. Ueda, K. Nii, Y. Wada, I. Takimoto, S. Maeda, T. Iwamatsu, Y. Yamaguchi, S. Maegawa, K. Mashiko, and H. Hamano, "A CAD compatible SOI/CMOS gate array having body-fixed partially-depleted transistors," in IEEE ISSCC Dig. Tech. Papers, pp. 288-289, 472, Feb. 1997.
- 32) S. Maeda, Y. Yamaguchi, I-J. Kim, T. Iwamatsu, T. Ipposhi, S. Miyamoto, S. Maegawa, K. Ueda, K. Nii, K. Mashiko, Y. Inoue, and H. Miyoshi, "Suppression of delay time instability on frequency using field shield isolation technology for deep sub-micron SOI circuits," in Proc. IEEE Int. Electron Devices Meeting (IEDM), pp. 129-132, Dec. 1996.
- 33) K. Nii, H. Maeno, T. Osawa, and S. Iwade, "A multi-port RAM generator with novel memory cell for CMOS Sea-of-Gates," in Proc. IEEE Custom Integrated Circuits Conf. (CICC), pp. 667-670, May 1994.
- 34) H. Maeno, K. Nii, S. Sakayanagi, and S. Kato, "LSSD Compatible and Concurrently Testable RAM," in Proc. Int. Test Conf. (ITC), p. 608, Sep. 1992.

# **Patents**

### **United States Patents**



- 27) 6,741,492 Semiconductor memory device
- 28) 6,717,842 Static type semiconductor memory device with dummy memory cell
- 29) 6,717,223 Semiconductor device with integrally formed well contact areas
- 30) 6,710,412 Static semiconductor memory device
- 31) 6,693,820 Soft error resistant semiconductor memory device
- 32) 6,690,608 Semiconductor memory device with internal data reading timing set precisely
- 33) 6,670,262 Method of manufacturing semiconductor device
- 34) 6,643,167 Semiconductor memory
- 35) 6,627,960 Semiconductor data storage apparatus
- 36) 6,590,802 Semiconductor storage apparatus
- 37) 6,535,453 Semiconductor memory device
- 38) 6,535,417 Semiconductor storage device
- 39) 6,529,401 Semiconductor memory
- 40) 6,504,788 Semiconductor memory with improved soft error resistance
- 41) 6,493,256 Semiconductor memory device
- 42) 6,347,062 Semiconductor memory device
- 43) 6,046,949 Semiconductor integrated circuit
- 44) 6,043,521 Layout pattern of memory cell circuit
- 45) 5,793,681 Multiport memory cell circuit having read buffer for reducing read access time
- 46) 5,777,929 Multiport memory cell circuit having read buffer for reducing read access time
- 47) 5,748,541 Latch circuit operating in synchronization with clock signals
- 48) 5,712,630 High power moving object identification system
- 49) 5,684,743 Multiport memory cell circuit having read buffer for reducing read access time
- 50) RE35,591 Memory cell array semiconductor integrated circuit device
- 51) 5,654,914 Memory cell array semiconductor integrated circuit device
- 52) 5,535,159 Multiport memory cell circuit having read buffer for reducing read access time
- 53) 5,471,420 Memory cell array semiconductor integrated circuit device
- 54) 5,420,813 Multiport memory cell circuit having read buffer for reducing read access time

#### **Japanese Patents**

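- 1) JP 2008-047698 A Semiconductor memory device
- 2) JP 2008-047180 A Semiconductor memory device
- 3) JP 2007-258507 A Semiconductor device for reliability evaluation
- 4) JP 2007-066493 A Semiconductor memory device
- 5) JP 2007-019166 A Semiconductor memory device
- 6) JP 2007-004960 A Semiconductor memory device
- 7) JP 2006-339480 A Semiconductor memory device
- 8) JP 2006-203756 A Data processing device, data processing system, and data processing method
- 9) JP 2006-127669 A Semiconductor memory device
- 10) JP 2006-099937 A Semiconductor device
- 11) JP 2006-085786 A Semiconductor integrated circuit device
- 12) JP 2006-085785 A Semiconductor integrated circuit device
- 13) JP 2006-049784 A Semiconductor memory device and method of manufacturing the same
- 14) JP 2006-032375 A Semiconductor device for reliability evaluation
- 15) JP 2005-051061 A Semiconductor device for reliability evaluation
- 16) JP 2005-044456 A Semiconductor memory device
- 17) JP 2004-362695 A Semiconductor memory device
- 18) JP 2004-335535 A Semiconductor memory device
- 19) JP 2004-303340 A Semiconductor memory device
- 20) JP 2004-192694 A Semiconductor memory device
- 21) JP 2004-104754 A Semiconductor device
- 22) JP 2004-095058 A Semiconductor memory device
- 23) JP 2004-079897 A Static type semiconductor memory device
- 24) JP 2004-071118 A Static type semiconductor memory device
- 25) JP 2004-047003 A Memory device
- 26) JP 2004-022809 A Semiconductor memory device
- 27) JP 2004-022070 A Semiconductor memory device
- 28) JP 2003-323792 A Semiconductor memory device
- 29) JP 2003-297953 A Semiconductor memory device
- 30) JP 2003-273250 A Semiconductor memory device
- 31) JP 2003-218238 A Semiconductor memory device
- 32) JP 2003-173681 A Semiconductor memory circuit and latch circuit
- 33) JP 2003-152111 A Semiconductor memory device
- 34) JP 2003-077294 A Memory circuit
- 35) JP 2003-060089 A Semiconductor memory device
- 36) JP 2003-060088 A Semiconductor memory device
- 37) JP 2003-051191 A Semiconductor memory device
- 38) JP 2003-030988 A Semiconductor memory circuit
- 39) JP 2002-367925 A Method of manufacturing semiconductor device
- 40) JP 2002-359298 A Semiconductor memory device
- 41) JP 2002-353413 A Semiconductor memory device
- 42) JP 2002-270701 A Semiconductor device
- 43) JP 2002-237539 A Semiconductor memory device
- 44) JP 2002-074964 A Semiconductor memory device
- 45) JP 2002-050183 A Semiconductor memory device
- 46) JP 2002-043441 A Semiconductor memory device
- 47) JP H11-242886 A Semiconductor integrated circuit
- 48) JP H11-185475 A Precharge circuit
- 49) JP H11-054632 A Layout pattern of memory cell
- 50) JP H09-307410 A Latch circuit
- 51) JP H08-195084 A Placement and routing of memory cell circuit
- 52) JP H08-045294 A Semiconductor memory device
- 53) JP H06-252366 A Memory cell array semiconductor integrated circuit device
- 54) JP H06-215580 A Memory cell circuit
- 55) JP H06-196670 A Semiconductor integrated circuit device
- 56) JP H06-103774 A Memory cell circuit
- 57) JP H06-028862 A Semiconductor memory circuit device
- 58) JP H05-325566 A Sense amplifier circuit
- 59) JP H05-290577 A Semiconductor integrated circuit device
- 60) Japanese Patent No. 3630847 Latch circuit
- 61) Japanese Patent No. 3493058 Semiconductor memory device
- 62) Japanese Patent No. 3214132 Memory cell array semiconductor integrated circuit device
- 63) Japanese Patent No. 2888701 Sense amplifier circuit
- 64) Japanese Patent No. 2871962 Semiconductor memory circuit device
- 65) Japanese Patent No. 2667941 Memory cell circuit

## **Acknowledgements**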

I would like to express my gratitude to Professor Masahiko Yoshimoto of Kobe University for giving me the great opportunity to study in his laboratory and for his guidance and valuable advice on this research. I am also grateful to Associate Professor Hiroshi Kawaguchi of Kobe University, who gave me fruitful advice on this research. I would also like to thank Professor Zhiwei Luo and Professor Masahiro Numa for their valuable guidance in revising this dissertation. My appreciation also goes to Associate Professor Chikara Ohta and Associate Professor Makoto Nagata.

This thesis comprises the results of experiments undertaken since the author joined the ASIC Design Engineering Center of Mitsubishi Electric Corp. in 1990 and later moved to the System LSI Laboratory. The experimental results were obtained at what is now Renesas Technology Corp., which was founded in 2003 through the merger of the semiconductor sectors of Mitsubishi Electric Corp. and Hitachi Ltd. They include results from experiments carried out during the three years since 2005 at the Kobe University Innovative Software & Silicon Architecture Laboratory, where the author was a doctoral student.

I am exceedingly grateful to the Managing Officer & Executive General Manager of Renesas Technology Corp., Dr. Masao Nakaya, for providing me an opportunity to study as a doctoral student and for his encouragement. My thanks also go to successive Executive General Managers, Renesas Technology Corp., Mr. Hisaharu Miwa and Mr. Toshiyasu Akiyama, at the LSI Product Technology Unit. I am grateful to the General Manager of Renesas Technology Corp., Dr. Hirofumi Shinohara, Group Manager Dr. Hiroshi Makino, and Group Manager Dr. Hidehiro Takata for their support and encouragement. I also thank the former Department Manager of Mitsubishi Electric Corp., currently Professor at Osaka Institute of Technology, Professor Shuhei Iwade, as well as the former Department Manager, Mitsubishi Electric Corp., currently Professor at Kanazawa University, Professor Yoshio Matsuda, for positively guiding me into pursuing this degree. I am grateful as well to Chief Engineer Hiroyuki Nunogami and Manager Dr. Takahiko Arakawa for their cooperation in executing this experiment, and to General Manager Dr. Toshifumi Takeda and General Manager Dr. Masahide Inuishi, Deputy General Manager Dr. Kazutami Arimoto, Department Manager Kazumasa Yanagisawa, Department Manager Dr. Koichiro Ishibashi, Department Manager Keiichi Higashitani, Department Manager Dr. Shigeto Maegawa, Department Manager Dr. Yasuo Inoue, and Department Manager Takio Ono for their cooperation and encouragement. I would also like to extend my appreciation to former Professor Takeomi Tamesada and Professor Masaki Hashizume of Tokushima University for showing me the fundamentals of electronics and guiding me to a successful career.

I greatly appreciate the useful advice and active discussions offered by my colleagues at Renesas Technology Corp.: Dr. Shigeki Ohbayashi, Dr. Yasumasa Tsukamoto, Mr. Makoto Yabuuchi, and Mr. Tomoaki Yoshizawa. I appreciate the great design support provided by Mr. Susumu Imaoka, Mr. Yoshihiro Tenoh, and Mr. Yasuhiro Masuda of Renesas Design Corp., and by Mr. Yuji Oda of Shikino High-Tech. Corporation. I would like to express my gratitude to Manager Mr. Yasuyuki Okamoto, Mr. Keiichi Usui, Ms. Sumie Usui, Mr. Shoji Okuda, and Mr. Akinobu Morikawa of Daio Electric Ltd. for their useful support for design and evaluation. I also thank my colleagues, Mr. Yoshiki Tsujihasi, Mr. Yasushi Hayakawa, Mr. Hideshi Maeno, Mr. Tokuya Osawa, Mr. Chikayoshi Morishima, and Mr. Nobuhiro Tsuda of Renesas Technology Corp., for contributing to this research.

The M3 collaboration project members were of invaluable help with their active discussions and useful advice: Mr. Yoshihiro Mori, Mr. Tsuguyasu Hatsuda, Mr. Yoshinobu Yamagami, Mr. Satoshi Ishikura, Dr. Toshikazu Suzuki, Mr. Akinori Shibayama, Mr. Toshio Terano, Mr. Akio Sebe, Mr. Gen Okazaki, Mr. Katsuji Satomi, and Mr. Hironori Akamatsu of Matsushita Electric Industrial Co. Ltd. I also received great assistance from many other project members.

I acknowledge Dr. Yasuo Yamaguchi, Mr. Kazuhiro Tsukamoto, Mr. Hiroshi Kawashima, Mr. Kenji Yoshiyama, Mr. Kazuhiro Ohnishi, Mr. Atsushi Ishii, Mr. Masahiko Takeuchi, Mr. Masao Sugiyama, Mr. Tamotsu Ogata, Mr. Toshiyuki Oashi, Mr. Hiroki Shinkawata, Mr. Motoshige Igarashi, Mr. Masakazu Okada, Mr. Takashi Imbe, Mr. Nobuo Tsuboi, Mr. Toshifumi Iwasaki, Mr. Kazuo Tomita, Mr. Keiji Hashimoto, Mr. Hidekazu Oda, Dr. Takashi Ipposhi, Dr. Toshiaki Iwamatsu, and Mr. Yuichi Hirano of Renesas Technology Corp. for accurate guidance and helpful discussions related to device technology. I thank Mr. Naofumi Murata, Mr. Eiji Yoshida, Mr. Tomohiro Tanaka, Mr. Shigehisa Yamamoto, and Mr. Hiromitsu Sugimoto, as well as Mr. Cozy Ban, who helped me greatly in evaluation analysis. I also thank Dr. Masanao Yamaoka of Central Laboratory, Hitachi Ltd.; Mr. Yoshihiro Shinozaki of Hitachi ULSI Systems Co., Ltd.; and Mr. Ryuichi Sakano, Dr. Yasuhisa Shimazaki, Mr. Takashi Akioka, Mr. Shigheru Shimada, Mr. Hisashi Matsumoto, Mr. Kazuyoshi Okamoto, Mr. Atsushi Miyanishi, Mr. Naofumi Sato, and Mr. Noriaki Maeda of Renesas Technology Corp. for helpful guidance and discussions on SRAM design and evaluation. I also thank my colleagues, Dr. Yasunobu Nakase, Mr. Masanori Kurimoto, Ms. Hiromi Notani, Mr. Jingo Nakanishi, Dr. Hiroaki Suzuki, Ms. Masako Fujii, Mr. Issei Kashima, and Mr. Daisuke Inoue, for discussions related to advanced circuit design.

I must acknowledge as well the research members who discussed and supported my research: Dr. Yuichiro Murachi, Dr. Yasuhiro Morita, Dr. Hidehiro Fujiwara, Mr. Hiroki Noguchi, Mr. Yusuke Iguchi, and Mr. Shunsuke Okumura, as well as Office Administrator Ms. Emi Go and Ms. Yurie Izumi, of the Innovative Software & Silicon Architecture Laboratory of Kobe University. My appreciation extends also to the graduates Dr. Junichi Miyakoshi, Dr. Shinji Mikami, and Dr. Kentaro Kawakami, and to all the people who strove for results together with me and gave me enormous help. I also thank my English teacher, Ms. Mei Ling Go, for her assistance in proofreading this thesis.

I am grateful to the many researchers and engineers at Renesas Technology Corp. for their valuable advice and discussions. Finally, I would like to thank the many researchers who participated in lively discussions with me at international conferences and symposiums.