

PDF issue: 2025-12-05

A 40-nm Resilient Cache Memory for Dynamic Variation Tolerance Delivering ×91 Failure Rate Improvement under 35% Supply Voltage Fluctuation

Nakata, Yohei ; Kimi, Yuta ; Okumura, Shunsuke ; Jung, Jinwook ; Sawada, Takuya ; Toshikawa, Taku ; Nagata, Makoto ; Nakano, Hirofumi ;…

(Citation)

IEICE Transactions on Electronics, 97(4):332-341

(Issue Date) 2014-04

(Resource Type)
journal article

(Version)

Version of Record

(Rights)

copyright@2014 IEICE

(URL)

https://hdl.handle.net/20.500.14094/90002984



PAPER Special Section on Solid-State Circuit Design — Architecture, Circuit, Device and Design Methodology

# A 40-nm Resilient Cache Memory for Dynamic Variation Tolerance Delivering ×91 Failure Rate Improvement under 35% Supply Voltage Fluctuation

Yohei NAKATA<sup>†a)</sup>, Member, Yuta KIMI<sup>†</sup>, Shunsuke OKUMURA<sup>†</sup>, Student Members, Jinwook JUNG<sup>†</sup>, Takuya SAWADA<sup>†</sup>, Taku TOSHIKAWA<sup>†</sup>, Nonmembers, Makoto NAGATA<sup>†,††</sup>, Member, Hirofumi NAKANO<sup>†††</sup>, Makoto YABUUCHI<sup>††††</sup>, Nonmembers, Hidehiro FUJIWARA<sup>††††</sup>, Koji NII<sup>††††</sup>, Hiroyuki KAWAI<sup>†††††</sup>, Hiroshi KAWAGUCHI<sup>†</sup>, and Masahiko YOSHIMOTO<sup>†,††</sup>, Members

SUMMARY This paper presents a resilient cache memory for dynamic variation tolerance in 40-nm CMOS. The cache can sustain operation under a large-amplitude voltage droop. To do so, it exploits a 7T/14T bit-enhancing SRAM and an on-chip voltage/temperature monitoring circuit. The 7T/14T bit-enhancing SRAM can reconfigure itself dynamically to a reliable bit-enhancing mode. The on-chip voltage/temperature monitoring circuit can sense the precise supply voltage level on a power rail of the cache. The proposed cache dynamically changes its operation mode using the voltage/temperature monitoring result and operates reliably under a large-amplitude voltage droop. Experimental results show that it does not fail with 25% and 30% droops of $V_{\rm dd}$ and that it provides a 91 times better failure rate with a 35% droop of $V_{\rm dd}$ compared with the conventional design.

key words: design for robustness, cache, variation tolerance, 7T/14T SRAM

#### 1. Introduction

Technology scaling increases the threshold-voltage ($V_{\rm th}$) variation of MOS transistors, mainly because of random dopant fluctuation, NBTI, and RTN. The minimum operating voltage ($V_{\rm min}$) of an SRAM cell increases as the $V_{\rm th}$ variation increases with technology scaling, which degrades the operating margin of a processor. A processor with a shrinking operating margin is more susceptible to power supply noise, IR drops, and temperature fluctuations. In particular, electronic control units (ECUs) in electric vehicles suffer large temperature fluctuations and large voltage fluctuations/droops caused by motor noise, EMI, voltage surges, and sudden interruptions in wiring harness connections. A sudden interruption, for example, can disconnect the ECU from the power supply for several milliseconds. Power supply circuits implemented in the ECU have large capacitors to improve tolerance against sudden interruptions. If the capacitance is hundreds of microfarads, then the voltage droops caused by the sudden interruptions are reduced to less than 20%, with droop durations in the milliseconds. However, the use of a large capacitor in the ECU should be avoided for reasons of reliability, cost, and size. Consequently, processors that tolerate voltage variation (voltage droops of 20% of $V_{\rm dd}$) and temperature variation are needed for ECUs in electric vehicles.

Manuscript received August 6, 2013.

Manuscript revised November 25, 2013.

a) E-mail: nkt@cs28.cs.kobe-u.ac.jp

DOI: 10.1587/transele.E97.C.332

Earlier designs [1]–[3] have addressed timing errors caused by a high-frequency (ca. $100\,\mathrm{MHz}$) voltage droop. A tunable replica scheme [4] can reduce the $V_{\mathrm{min}}$ of SRAM by 9% under a 13% voltage droop. However, these approaches cannot mitigate embedded SRAM margin failures caused by large-amplitude (ca. 20% of $V_{\mathrm{dd}}$) voltage droops. An SRAM block in a processor, with its high integration and minimum-size transistors, determines the $V_{\mathrm{min}}$ of the entire processor. A dynamic-variation-tolerant processor therefore requires a fault-tolerant cache.

A common fault-tolerant cache architecture uses redundant columns/rows [5]. This architecture requires many redundant columns/rows to accommodate a large number of faults, and these columns/rows are used inefficiently when the failure rate is low. The PADed cache proposed in [6] uses a programmable decoder to remap faulty cache lines to nonfaulty ones. As another solution, error correction codes (ECCs) have been applied to caches [7]–[9]. The two-dimensional ECC proposed in [8] combines vertical and horizontal error coding. In [9], 1-bit ECC is applied to cache blocks uniformly, and blocks containing two or more defective cells are protected selectively with multi-bit ECC. The multi-bit ECC check bits and the block locations are stored in a small dedicated cache. These techniques are not effective for large-amplitude voltage droops that cause many faults.

Herein, we present a resilient cache memory that can sustain operation under a large-amplitude voltage droop. To realize sustained operation, the resilient cache exploits a 7T/14T bit-enhancing SRAM, which has a more reliable operation mode, and an on-chip voltage monitoring circuit.

<sup>†</sup>The authors are with Kobe University, Kobe-shi, 657-8501 Japan.

<sup>††</sup>The authors are with JST CREST, Tokyo, 102-0076 Japan.

<sup>†††</sup>The author is with the Renesas Electronics Corporation, Itami-shi, 664-0005 Japan.

<sup>††††</sup>The authors are with the Renesas Electronics Corporation, Kodaira-shi, 187-8588 Japan.

<sup>†††††</sup>The author is with the Renesas Electronics Corporation, Tokyo, 100-0004 Japan.



Fig. 1 Block diagram of the resilient cache.

# 2. Proposed Resilient Cache

The resilient cache (Fig. 1) is a 256 KB 8-way cache memory array with a 7T/14T bit-enhancing (BE) SRAM bitcell structure [10], voltage and temperature monitoring circuits [11], and an autonomous resilient cache controller. The power supply of each memory block can be switched individually between the power supply for runtime operation ($V_{\rm dd\_rt}$) and the power supply for testing ($V_{\rm dd\_test}$). The power supplies of the monitoring circuits and of the controller are separated from the power supply of the memory blocks. The local power rails of the memory blocks are monitored by voltage monitoring circuits, which can obtain a precise supply voltage level at testing time and monitor voltage fluctuation during runtime. Furthermore, a temperature monitoring circuit can sense the on-chip temperature. The temperature information recorded at testing time is used for temperature correction of the $V_{\min}$. The autonomous resilient cache controller comprises an autonomous controller and an online testing controller with a test module and a data transfer unit. The online testing controller can execute memory testing that is completely transparent to user accesses. It obtains the operating margin and $V_{\min}$ of each memory block. The autonomous controller controls the probing point of the voltage monitor and the reference voltage ($V_{\rm ref}$) using the external DAC, and it receives results from the monitoring circuits. The results are used for voltage droop detection and block-basis voltage droop control, as described in the remainder of the paper.

# 2.1 7T/14T Bit-Enhancing SRAM Bitcell

Each SRAM cell in the proposed resilient cache uses the 7T/14T BE SRAM cell structure [10]. The 7T/14T BE SRAM cell has a pair of conventional 6T SRAM bitcells. The internal nodes of the pair are connected directly by two additional PMOS transistors, as presented in Fig. 2. This structure provides an additional operation mode, designated as the enhancing mode, along with the normal mode. The two modes of 7T/14T BE SRAM are presented in Table 1. Figure 3 shows bit error rates in 7T/14T BE SRAM and in other schemes. In enhancing mode, the added transistors are activated and the BE SRAM combines two bitcells, which provides reliable operation, especially at low voltages.

**Fig. 2** Schematics showing a conventional 6T SRAM bitcell pair and 7T/14T SRAM bitcell.

**Table 1** Two operation modes of 7T/14T bit-enhancing bitcell.

|                      | # of memory cells comprising 1 bit | # of WL<br>drives | CL        |
|----------------------|------------------------------------|-------------------|-----------|
| Normal               | 1 (7T/bit)                         | 1                 | Off ("H") |
| Enhancing<br>(read)  | 2 (14T/bit)                        | 1                 | On ("L")  |
| Enhancing<br>(write) | 2 (14T/bit)                        | 2                 | On ("L")  |

## 2.2 On-Chip Monitoring Circuits

On-chip monitoring circuits, presented in Fig. 4, comprise a source follower (SF) and a latch comparator (LC) [11]. Supply voltage monitoring circuits measure the supply voltage fluctuation on the power rails of each SRAM array. Temperature monitoring circuits sense thermal diodes placed near the center of the cache macro.

Voltages on the probing point and thermal diode are level-shifted by the SFs. The level-shifted voltage ( $V_{\rm sfo}$ ) is compared with the reference voltage ( $V_{\rm ref}$ ) by the LC in synchronization with a sampling clock. The LC outputs "1" or "0" corresponding to the comparison result.

The on-chip monitoring circuits are area-efficient and accurately sense the voltage level of the SRAM array as well as the cache temperature. Therefore, they are suitable for use in online built-in self-tests (BISTs) and voltage droop detection.
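Because the LC returns only a one-bit comparison, a controller that needs a voltage reading has to vary $V_{\rm ref}$ and observe where the output flips. The following Python sketch illustrates one way this could be done with a binary search. It is not taken from the paper: the callback `compare`, the search range, the comparator polarity (1 when $V_{\rm sfo}$ is above $V_{\rm ref}$), and the assumption that the probed voltage stays constant during the search are all hypothetical.

```python
def estimate_level_mv(compare, lo_mv=0, hi_mv=1200, resolution_mv=1):
    """Binary-search the reference voltage at which the comparator flips.

    compare(v_ref_mv) is a hypothetical hook that returns 1 when the
    level-shifted voltage V_sfo is above v_ref_mv and 0 otherwise.
    The range [lo_mv, hi_mv] must bracket V_sfo.
    """
    while hi_mv - lo_mv > resolution_mv:
        mid_mv = (lo_mv + hi_mv) // 2
        if compare(mid_mv):
            lo_mv = mid_mv  # V_sfo is above the midpoint
        else:
            hi_mv = mid_mv  # V_sfo is at or below the midpoint
    return lo_mv  # within resolution_mv of V_sfo
```

The returned value is the level-shifted voltage; converting it back to the rail voltage would additionally require the SF level shift, which is not modeled here.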

# 2.3 Block Basis Online Testing

**Fig. 3** Bit error rates (BERs) of "6T", "1-bit ECC", and the "7T normal" and "14T enhancing" modes of 7T/14T bit-enhancing SRAM.

Fig. 4 Supply voltage/temperature monitoring circuit.

Figure 5 shows the block-basis online testing scheme for the proposed resilient cache. The online test controller conducts memory testing on each memory block in order of the physical block address. The supply voltage of the block under test is decreased gradually during the testing time. For each operation mode of the BE SRAM, the controller records the testing voltage and temperature, obtained from the on-chip monitoring circuits, at which the first failure is detected. Because testing is performed on a block basis, the resilient cache still has cache lines to which data can be allocated while memory testing is in progress. The memory blocks other than the current memory under test (MUT) block remain accessible and operate as runtime (RT) blocks.

The testing controller uses a test bus that is separate from the user bus, so the proposed testing scheme is transparent to the processor operation. The test is conducted periodically. The testing cycle can be regulated from outside the cache (e.g., a cycle matched to a control period of the software). Although one cache way cannot be used during testing, the IPC degradation in the SPEC 2006 [12] benchmarks is 1% at most, although it depends on the testing cycle.

The flowchart in Fig. 6 depicts the online testing flow. At the beginning of memory testing on each block, the data transfer unit transfers data from the MUT block to the previous MUT block. The MUT block power supply is switched to the testing voltage ($V_{\rm dd\_test}$). After switching, a test is executed to check whether a failure is detected. If no failure is detected, then $V_{\rm dd\_test}$ is decreased by one step and the test is executed again. If a failure is detected, then the voltage at that time is recorded together with the temperature. Having completed the testing on one block, the online test controller sets $V_{\rm dd\_test}$ to the nominal $V_{\rm dd}$ and makes the next block the MUT. This flow continues until all blocks have been tested.
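The step-down loop of this flow can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the controller's actual implementation: `passes_at` and `measure` are hypothetical hooks standing in for the test module and the on-chip monitors, the voltage parameters are placeholders supplied by the caller, and the data transfer step is omitted.

```python
def run_online_test(blocks, modes, passes_at, measure,
                    v_nominal_mv, v_step_mv, v_floor_mv):
    """Record, per block and operation mode, the monitor reading at first failure.

    passes_at(block, mode, v_set_mv) -> bool   hypothetical memory-test hook
    measure(block, v_set_mv) -> (v_mv, temp_c) hypothetical monitor readout
    Returns {(block, mode): (v_mv, temp_c) or None if no failure was seen}.
    """
    table = {}
    for block in blocks:                      # physical block address order
        for mode in modes:
            v_set_mv = v_nominal_mv           # start from nominal V_dd
            record = None
            while v_set_mv >= v_floor_mv:
                if not passes_at(block, mode, v_set_mv):
                    record = measure(block, v_set_mv)  # first failure detected
                    break
                v_set_mv -= v_step_mv         # decrease V_dd_test by one step
            table[(block, mode)] = record
    return table
```

The resolution of the recorded voltage in this sketch is bounded by `v_step_mv`, so a smaller step gives a tighter estimate at the cost of a longer test.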

Operation of the data transfer unit is depicted in Fig. 7. First, physical block 0 is tested while physical blocks 1–7 operate as runtime blocks. Next, the data transfer unit transfers data from physical block 1 (the next MUT block) to physical block 0 (the previous MUT block). After the transfer, physical block 1 is tested while physical block 0 and physical blocks 2–7 operate as runtime blocks. In this way, the MUT block moves among the 8 blocks without losing the memory contents.
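The rotation of the MUT block can be illustrated with a small Python model. It is a sketch only: block contents are represented by opaque values, and the first MUT block is assumed to hold no live data when testing starts, which the text does not specify.

```python
def run_rotation(contents):
    """Simulate one pass of the MUT block over all physical blocks.

    contents[i] is the data held by physical block i. contents[0] is
    assumed to be None (the initial MUT block holds no live data).
    Returns (final_contents, schedule), where schedule lists
    (mut_block, runtime_blocks) for each step.
    """
    contents = list(contents)
    schedule = []
    for mut in range(len(contents)):
        if mut > 0:
            # Data transfer unit: next MUT block -> previous MUT block.
            contents[mut - 1] = contents[mut]
            contents[mut] = None
        runtime = [b for b in range(len(contents)) if b != mut]
        schedule.append((mut, runtime))
    return contents, schedule
```

After a full pass, every data item has shifted down by one physical block and none has been dropped.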

An example of test results is presented in Fig. 8. The online testing controller has a test result table to record $V_{\min}$ together with the corresponding temperature. The recorded testing voltage is the actual $V_{\min}$ of the memory array because the on-chip voltage monitor probes the local power rail of the memory blocks. The voltage monitor traces the bottom level of the testing voltage ($V_{\rm bottom}$) during the testing time to record the actual $V_{\rm min}$. The test result table is used as a reference for the voltage-temperature variation adaptive control described later.

Fig. 5 Online testing architecture.

Fig. 6 Flowchart of the online testing.

Fig. 7 Block basis online testing scheme.

Fig. 8 Block basis actual $V_{\min}$ and temperature recording.

**Fig. 9** (a) Example of voltage waveform. (b) Voltage droop detection scheme.

**Fig. 10** (a) Block basis voltage droop control. (b) Cache configuration during voltage droop.

#### 2.4 Voltage and Temperature Variation Adaptive Control

Figures 9, 10, and 11 present a voltage-temperature variation adaptive control scheme. The autonomous controller detects degradation of the operating margin caused by the voltage and temperature fluctuation. If the margin is insufficient for stable operation, the controller changes the operation mode of 7T/14T bit-enhancing SRAM to the 14T enhancing mode. This adaptive control enables maintenance of the required voltage margin in the current operating condition.

Fig. 11 $V_{\min}$ correction responding to temperature variation.

To detect the voltage droop, reference voltages "high" ($V_{\text{ref\_high}}$) and "low" ($V_{\text{ref\_low}}$) are set to proper levels, as shown in Fig. 9. $V_{\text{dd}}$ is monitored by the monitoring circuit using $V_{\text{ref\_high}}$ and $V_{\text{ref\_low}}$. When $V_{\text{dd}}$ falls below $V_{\text{ref\_high}}$, a timer starts counting. Then, when $V_{\rm dd}$ falls below $V_{\rm ref\_low}$, the timer stops counting and the gradient of the voltage droop is calculated. The autonomous controller uses the gradient to estimate whether $V_{\rm dd}$ will drop below $V_{\rm min\_normal}$. If the gradient is greater than the threshold value, the controller estimates that $V_{\rm dd}$ will cross $V_{\rm min\_normal}$; if not, it estimates that $V_{\rm dd}$ will not cross $V_{\rm min\_normal}$ (Fig. 9(b)). The resilient cache changes the operation mode to the 14T enhancing mode at voltages below $V_{\min\_normal}$. The controller may miss very steep droops and very slow droops. Very steep droops are caused by high-frequency noise, to which SRAM cells are less susceptible [14]. Even when the voltage droop is very slow, the cache is still reconfigured because the autonomous controller changes the operation mode to the 14T enhancing mode when the supply voltage falls below a specified level.
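The two-threshold gradient estimate described above can be sketched in Python as follows. This is an illustrative model rather than the controller's logic: it works on a list of sampled `(time_us, vdd_mv)` pairs, the units and the `gradient_threshold` parameter are assumptions, and a droop that crosses both references within one sample is reported as undetermined, reflecting the remark that very steep droops may be missed.

```python
def detect_droop(samples, v_ref_high_mv, v_ref_low_mv, gradient_threshold):
    """Estimate from the droop gradient whether V_dd will cross V_min_normal.

    samples: iterable of (time_us, vdd_mv) pairs in time order.
    Returns True if the gradient (mV/us) exceeds gradient_threshold,
    False if it does not, and None if no droop could be timed.
    """
    t_start = None
    for t_us, vdd_mv in samples:
        if vdd_mv >= v_ref_high_mv:
            t_start = None            # supply is at or above V_ref_high: reset
            continue
        if t_start is None:
            t_start = t_us            # V_dd fell below V_ref_high: start timer
        if vdd_mv < v_ref_low_mv:     # V_dd fell below V_ref_low: stop timer
            dt_us = t_us - t_start
            if dt_us <= 0:
                return None           # too steep to time at this sampling rate
            gradient = (v_ref_high_mv - v_ref_low_mv) / dt_us
            return gradient > gradient_threshold
    return None
```

For example, with references at 1050 mV and 950 mV, a droop that takes 20 µs between the two crossings has a gradient of 5 mV/µs and is flagged against a 2 mV/µs threshold, whereas one that takes 100 µs is not.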

This voltage variation adaptive control scheme is performed on a block basis. Only blocks whose $V_{\rm dd}$ drops below their $V_{\rm min\_normal}$ change the operation mode to the 14T enhancing mode, as presented in Fig. 10. The other blocks remain in the 7T normal mode. Dirty lines in the proposed cache can be written back to main memory even if $V_{\rm dd}$ drops by 35% in $100\,\mu\rm s$.

To reconfigure the blocks, the tag array of the resilient cache must be modified. One bit is added to the tag bits in each cache line, and the comparators for the tag comparison must be extended for the additional bit. The additional bit holds the MSB of the index and is compared as the LSB of the tag bits. Moreover, the decoder must be designed so that it does not select half of the indices: the LSB of the decoder input is fixed to "0" in the bit-enhancing mode.
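One possible reading of this tag and decoder modification is sketched below in Python. It rests on assumptions that the text does not state: the field widths `offset_bits` and `index_bits` are hypothetical parameters, and the index MSB is assumed to drive the decoder LSB so that fixing the decoder LSB to "0" disables the odd rows. The sketch only shows that two addresses differing in the index MSB share a row in bit-enhancing mode yet remain distinguishable by the extended tag.

```python
def map_address(addr, offset_bits, index_bits, enhancing):
    """Return (extended_tag, decoder_row) for an address.

    Assumption (not stated in the text): the index MSB drives the
    decoder LSB, so forcing the decoder LSB to 0 selects even rows only.
    """
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    index_msb = index >> (index_bits - 1)
    index_rest = index & ((1 << (index_bits - 1)) - 1)
    # The added tag bit holds the index MSB and is compared as the tag LSB.
    extended_tag = (tag << 1) | index_msb
    # In bit-enhancing mode the decoder LSB is fixed to 0.
    decoder_lsb = 0 if enhancing else index_msb
    decoder_row = (index_rest << 1) | decoder_lsb
    return extended_tag, decoder_row
```

Under this reading, a line stored in an even row keeps both its row and its extended tag across the mode change, which is consistent with even-index lines remaining usable after the transition.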

Fig. 12 Migration process for dirty cache lines in the mode transition target way.

The $V_{\rm min}$ at runtime is corrected according to the runtime temperature to compensate for temperature fluctuation (Fig. 11). The autonomous controller obtains the current temperature using the on-chip temperature monitor and looks up $V_{\rm min}$ in the test result table. The $V_{\rm min}$ corresponding to the current temperature is calculated using these data. The coefficient data used to compensate for the temperature difference between the testing time and the current time are recorded in coefficient tables. The calculated $V_{\min}$ values are collected in $V_{\min}$ tables, which are used to determine the threshold of the gradient in the droop detection.
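The temperature correction can be illustrated with a short Python sketch. The text does not give the form of the correction, so a linear dependence on the temperature difference is assumed here, and the per-block coefficient values in the usage example are hypothetical.

```python
def corrected_vmin_mv(vmin_test_mv, temp_test_c, temp_now_c, coeff_mv_per_c):
    """Correct a recorded V_min for the current temperature.

    Assumes a linear model: V_min shifts by coeff_mv_per_c for each
    degree of difference between the current and testing temperatures.
    """
    return vmin_test_mv + coeff_mv_per_c * (temp_now_c - temp_test_c)


def build_vmin_table(test_results, coefficients, temp_now_c):
    """Build a per-block V_min table for the current temperature.

    test_results: {block: (vmin_test_mv, temp_test_c)} from online testing
    coefficients: {block: coeff_mv_per_c} (hypothetical coefficient table)
    """
    return {
        block: corrected_vmin_mv(vmin_mv, temp_test_c, temp_now_c,
                                 coefficients[block])
        for block, (vmin_mv, temp_test_c) in test_results.items()
    }
```

For instance, a block tested at 25°C with a recorded $V_{\min}$ of 1000 mV and a hypothetical coefficient of 0.5 mV/°C would be assigned 1030 mV at 85°C.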

When the autonomous controller changes the operation mode of blocks to bit-enhancing mode, the dirty cache lines in those blocks must be migrated. The migration process is shown in Fig. 12. In this example, the target block of the mode transition is block 7. The controller searches for dirty cache lines at the odd indices of block 7. Dirty lines at the even indices do not need to migrate because these lines remain in use after the mode transition. If a dirty cache line is detected, then the cache line migrates into the LRU cache line in the same set. If the LRU cache line is also dirty, then the LRU line is written back to main memory before the detected dirty line is migrated. If the detected dirty line is itself the LRU line, then the line is written back to main memory.
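The migration rules above can be sketched in Python as follows. The data layout is a simplification chosen for illustration: `sets[index][way]` holds a line as a dictionary with a `"dirty"` flag or `None`, `lru_way[index]` gives the LRU way of each set, and `write_back` is a hypothetical hook for the write to main memory. Clean lines at odd indices are left untouched in this sketch, since the text describes only the handling of dirty lines.

```python
def migrate_dirty_lines(sets, lru_way, target_way, write_back):
    """Migrate dirty lines at odd indices out of the mode-transition target way.

    sets[index][way]: dict with a "dirty" key, or None for an empty line.
    lru_way[index]:   way currently holding the LRU line of that set.
    write_back(index, line): hypothetical hook that writes a line to memory.
    """
    for index in range(1, len(sets), 2):       # odd indices only
        line = sets[index][target_way]
        if line is None or not line["dirty"]:
            continue
        victim_way = lru_way[index]
        if victim_way == target_way:
            write_back(index, line)            # the dirty line is itself LRU
        else:
            victim = sets[index][victim_way]
            if victim is not None and victim["dirty"]:
                write_back(index, victim)      # flush a dirty LRU line first
            sets[index][victim_way] = line     # move the line into the LRU way
        sets[index][target_way] = None         # the odd-index slot is vacated
```

In the example of Fig. 12, `target_way` would correspond to block 7.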

If $V_{\rm dd}$ rises above $V_{\rm ref\_high}$ again, then the autonomous controller changes the operation mode of the blocks from bit-enhancing mode back to normal mode. In this case, it is unnecessary to migrate cache lines. The controller simply deactivates the control signal of the 7T/14T bit-enhancing SRAM (CL depicted in Fig. 2) and sets the cache state of the cache lines at the odd indices to invalid.

#### 3. Measurement Results

# 3.1 On-Chip Voltage Droop Waveform and $V_{\min}$ of Memory Blocks

Measurement results obtained using a test chip fabricated in 40-nm CMOS (Fig. 13) are presented in Figs. 14–16. The voltage monitoring circuit measures the on-chip voltage droop waveform (Fig. 14). The upper waveform in Fig. 14 is the waveform injected from outside the chip, measured at an off-chip probing point on the global power rail. The lower waveform is acquired with the on-chip monitoring circuit, which probes the local power rail of each memory block. The on-chip waveform has a different shape from that of the injected waveform because of parasitic elements of the chip. This result shows that the on-chip monitoring circuit is necessary to obtain a precise voltage level.



Fig. 13 Micrograph and features of test chips.



Fig. 14 Measured off-chip/on-chip voltage droop waveforms.

Measured $V_{\rm min}$ characteristics of the memory blocks are shown in Fig. 15. The $V_{\rm min}$ values are acquired for 8 blocks of 11 chips in each operation mode of the BE SRAM. Measurements are taken at normal (25°C) and high (100°C) temperatures. The averages of the $V_{\rm min}$ of the worst block (i.e., the $V_{\rm min}$ of the entire cache) over the 11 chips are 1015 mV in normal mode and 806 mV in bit-enhancing mode at 25°C. At 100°C, the average $V_{\rm min}$ values in normal and bit-enhancing modes are 1050 mV and 827 mV, respectively. The results show that changing the operation mode of the BE SRAM to bit-enhancing mode improves the operating margin by 205 mV at 25°C and 223 mV at 100°C, on average.

# 3.2 Voltage Variation Tolerance

The voltage variation tolerance of the resilient cache is evaluated by injecting a voltage droop into the external power supply rail. During voltage droop injection, a cache access trace is input to the resilient cache, and the accesses to failed bits are counted. Five cache traces were taken from SPEC2006 [12]. The evaluation shows that the resilient cache does not fail irrespective of the droop duration when the voltage droop amplitude is 20%. Therefore, the resilient cache can be applied to ECUs in electric vehicles.

Fig. 15 $V_{\text{min}}$s of eight memory blocks of 11 chips (measured).

To investigate the voltage variation tolerance of the resilient cache further, we conducted evaluations under voltage droop conditions with amplitudes higher than 20%. The amplitudes are 25%, 30%, and 35% of $V_{\rm dd}$, as shown in Fig. 16(a). The droop durations are $50\,\mu\rm s$, $500\,\mu\rm s$, 5 ms, and 50 ms. Evaluation results under the 25%, 30%, and 35% droop conditions are depicted in Figs. 16(b)–16(d), respectively. Under the 25% and 30% droop conditions, the failures increase linearly with droop duration without the proposed scheme (no variation adaptive control; always normal mode). With the proposed scheme (variation adaptive control with switching to enhancing mode), the resilient cache does not fail irrespective of the droop duration. Under the severe 35% droop condition, the number of failures without the proposed scheme increased to about ten times that under the 25% droop condition. With the proposed scheme, the failure rate improved by a factor of 91 relative to that without the proposed scheme for a 50 ms droop duration.

# 3.3 Processor Performance

**Fig. 16** Voltage droop tolerance and failure count evaluation: (a) droop waveform example, (b) 25% $V_{\rm dd}$ droop, (c) 30% $V_{\rm dd}$ droop, and (d) 35% $V_{\rm dd}$ droop.

**Fig. 17** Normalized IPCs with respect to the number of bit-enhancing mode blocks.

The cache reconfiguration affects processor performance. The cache capacity decreases by 16 KB when one block changes its operation mode to bit-enhancing mode. The capacity decrease degrades processor performance because cache misses occur more frequently. Figure 17 shows normalized instructions per cycle (IPC) with respect to the number of bit-enhancing mode blocks. The evaluation is conducted using the gem5 simulator [13] with benchmarks selected from SPEC 2006 [12]. The average IPC loss is 2.88% when all blocks are in bit-enhancing mode (128 KB cache capacity). The resilient cache operates in bit-enhancing mode only if the operating margin is insufficient, and it continues stable operation although processor performance degrades.
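The capacity figures quoted above follow from simple arithmetic, which the short Python sketch below makes explicit. It assumes that the 256 KB capacity is divided evenly over the 8 blocks and that a block in bit-enhancing mode stores half as many bits because two bitcells form one bit; the function name is introduced here for illustration only.

```python
def effective_capacity_kb(num_enhancing_blocks, total_kb=256, num_blocks=8):
    """Cache capacity when some blocks operate in bit-enhancing mode.

    Each block holds total_kb / num_blocks in normal mode and half of
    that in bit-enhancing mode (two bitcells per bit).
    """
    if not 0 <= num_enhancing_blocks <= num_blocks:
        raise ValueError("num_enhancing_blocks out of range")
    block_kb = total_kb // num_blocks
    return total_kb - num_enhancing_blocks * (block_kb // 2)
```

With one block in bit-enhancing mode the capacity is 240 KB, and with all eight blocks it is 128 KB, matching the values in the text.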

#### 4. Conclusion

As described in this paper, we proposed a resilient cache with bit-enhancing memory and on-chip diagnosis structures in 40-nm CMOS. The resilient cache has a bit-enhancing memory that can dynamically change itself to enhancing mode and an on-chip voltage/temperature monitoring circuit. It dynamically reconfigures its operation mode using the voltage/temperature monitoring result. It achieves a 91 times better failure rate under a 35% droop of $V_{\rm dd}$ compared with that of the conventional design.

#### References

- [1] K.A. Bowman, C. Tokunaga, J.W. Tschanz, A. Raychowdhury, M.M. Khellah, B.M. Geuskens, S.L. Lu, P.A. Aseron, T. Karnik, and V. De, "All-digital circuit-level dynamic variation monitor for silicon debug and adaptive clock control," IEEE Trans. Circuits Syst. I, vol.58, no.9, pp.2017–2025, Sept. 2011.
- [2] K.A. Bowman, J.W. Tschanz, S.L. Lu, P.A. Aseron, M.M. Khellah, A. Raychowdhury, B.M. Geuskens, C. Tokunaga, C.B. Wilkerson, T. Karnik, and V. De, "A 45 nm resilient microprocessor core for dynamic variation tolerance," IEEE J. Solid-State Circuits, vol.46, no.1, pp.194–208, Jan. 2011.
- [3] J. Tschanz, N.S. Kim, S. Dighe, J. Howard, G. Ruhl, S. Vangal, S. Narendra, Y. Hoskote, H. Wilson, C. Lam, M. Shuman, C. Tokunaga, D. Somasekhar, S. Tang, D. Finan, T. Karnik, N. Borkar, N. Kurd, and V. De, "Adaptive frequency and biasing techniques for tolerance to dynamic temperature-voltage variations and aging," ISSCC Dig. of Tech. Papers, pp.292–293, Feb. 2007.
- [4] A. Raychowdhury, B. Geuskens, K. Bowman, J. Tschanz, S.L. Lu, T. Karnik, M. Khellah, and V. De, "Tunable replica bits for dynamic variation tolerance in 8T SRAM arrays," IEEE J. Solid-State Circuits, vol.46, no.4, pp.797–805, April 2011.
- [5] J. Lee, Y.J. Lee, and Y.B. Kim, "SRAM Word-oriented redundancy methodology using built in self-repair," Proc. IEEE Intl. System-on-Chip Conference (SOCC), pp.219–222, Sept. 2004.
- [6] P.P. Shirvani and E.J. McCluskey, "PADed cache: A new fault-tolerance technique for cache memories," Proc. VLSI Test Symp., pp.440–445, April 1999.
- [7] J.L. Shin, B. Petrick, M. Singh, and A.S. Leon, "Design and implementation of an embedded 512-KB level-2 cache subsystem," IEEE J. Solid-State Circuits, vol.40, no.9, pp.1815–1820, Sept. 2005.
- [8] J. Kim, N. Hardavellas, K. Mai, B. Falsafi, and J. Hoe, "Multi-bit error tolerant caches using two-dimensional error coding," Proc. IEEE/ACM Intl. Symp. on Microarchitecture (MICRO), pp.197–209, Dec. 2007.
- [9] H. Sun, N. Zheng, and T. Zhang, "Leveraging access locality for the efficient use of multibit error-correcting codes in L2 cache," IEEE Trans. Comput., vol.58, no.10, pp.1297–1306, Oct. 2009.
- [10] H. Fujiwara, S. Okumura, Y. Iguchi, H. Noguchi, Y. Morita, H. Kawaguchi, and M. Yoshimoto, "Quality of a bit (QoB): A new concept in dependable SRAM," Proc. IEEE Intl. Symp. on Quality Electronic Design (ISQED), pp.98–102, March 2008.
- [11] K. Noguchi and M. Nagata, "An on-chip multichannel waveform monitor for diagnosis of systems-on-a-chip integration," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol.15, no.10, pp.1101–1110, Oct. 2007.
- [12] Standard Performance Evaluation Corporation, "The SPEC CPU 2006 Benchmark Suite," http://www.specbench.org
- [13] N. Binkert, S. Sardashti, R. Sen, K. Sewell, M. Shoaib, N. Vaish, M.D. Hill, D.A. Wood, B. Beckmann, G. Black, S.K. Reinhardt, A. Saidi, A. Basu, J. Hestness, D.R. Hower, and T. Krishna, "The gem5 simulator," ACM SIGARCH Computer Architecture News, vol.39, no.2, pp.1–7, Aug. 2011.
- [14] T. Sawada, T. Toshikawa, K. Yoshikawa, H. Takaya, K. Nii, and M. Nagata, "Immunity evaluation of SRAM core using DPI with on-chip diagnosis structures," Proc. Workshop on Electromagnetic Compatibility of Integrated Circuits (EMC Compo), pp.65–70, Nov. 2011.



Yohei Nakata received B.E. and M.E. degrees in Computer and Systems Engineering from Kobe University, Hyogo, Japan in 2008 and 2010, respectively, and received a Ph.D. degree in Electrical Engineering from Kobe University in 2013. His current research interests include dependable and variation-aware processor designs and multi-core processor architecture. Dr. Nakata is a recipient of the 2011 IPSJ Yamashita SIG research award. He is a member of IEEE.



Yuta Kimi received B.E. degree in Computer and Systems Engineering from Kobe University, Hyogo, Japan in 2013. He is currently on the master course at Kobe University. His current research interests include variation-tolerant processor designs and multi-core processor architecture.



Shunsuke Okumura received B.E. and M.E. degrees in Computer and Systems Engineering in 2008 and 2010, respectively from Kobe University, Hyogo, Japan, and received a Ph.D. degree in Electrical Engineering from Kobe University in 2013. His current research is high-performance, low-power SRAM designs, dependable SRAM designs, and error correcting codes implementation. He is a student member of IPSJ and IEEE.



Jinwook Jung received the B.E. and M.E. degrees in computer and systems engineering from Kobe University, Hyogo, Japan, in 2011 and 2013, respectively, with the support of Korea-Japan Joint Government Scholarship Program for the Students in Science and Engineering Departments. He is currently working toward the M.E. degree in system informatics at the same university. His current research interests include low-power and variation-tolerant circuit techniques and microarchitecture designs for reliability. He is a student member of IEEE.



**Takuya Sawada** received a B.E. and M.E. degree in Department of Computer and System Engineering, Faculty of Engineering, Kobe University in 2008 and 2010, respectively, and received a Ph.D. degree in Graduate School of System Informatics, Kobe University in 2013. He currently works as a semiconductor chip designer at MegaChips Corporation, Osaka, Japan. His present research focus is power supply noise and signal integrity in digital VLSI systems.



**Taku Toshikawa** received a B.E. and M.E. degree in Department of Computer and System Engineering, Faculty of Engineering, Kobe University in 2010 and 2012, respectively. He was engaged in research on power noise measurement techniques. He is now with Panasonic Inc.



Makoto Nagata received the B.S. and M.S. degrees in physics from Gakushuin University, Tokyo, Japan, in 1991 and 1993, respectively, and the Ph.D. in electronics engineering from Hiroshima University, Japan, in 2001. He was a research associate at Hiroshima University, Japan, from 1994 to 2002, and then an associate professor of Kobe University, Japan, from 2002 to 2009. He is currently a professor of the graduate school of system informatics, Kobe University. His research interests include design techniques toward high performance mixed analog, RF, and digital VLSI systems with particular emphasis on power noise issues, substrate coupling/crosstalk, signal integrity, as well as mixed-signal testing and diagnosis. Dr. Nagata has been a member of a variety of technical program committees of international conferences such as the Symposium on VLSI Circuits (2002–2009), Custom Integrated Circuits Conference (2007–2009), Asian Solid-State Circuits Conference (2005–2009), Asia and South Pacific Design Automation Conference, and others. He was a technical program chair (2010–2011) and a symposium chair (2012–2013) for Symposium on VLSI circuits. He also served as an associate editor of the IEICE Transactions on Electronics (2002–2005).



Hirofumi Nakano received the B.E. and M.E. degrees in electronics engineering from Osaka University, Osaka, Japan, in 1997 and 1999, respectively. In 1999, he joined the System LSI Business Unit, Mitsubishi Electric Corporation, Itami, Japan, where he has been working on designing embedded SRAMs for CMOS ASICs. In 2003, he was transferred to Renesas Technology Corporation, Itami, Japan, which is a joint company of Mitsubishi Electric Corp. and Hitachi Ltd. in the semiconductor field. He currently works on the development of IPs for MCU in the Technology Development Unit, Renesas Electronics Corporation, Itami, Japan.



Makoto Yabuuchi was born in Toyama, Japan, in 1979. He received the B.S. and M.S. degrees in electronic engineering from Kanazawa University, Ishikawa, Japan in 2004. He joined Renesas Technology Corporation after his graduation and was transferred to Renesas Electronics Corporation in 2010, where he has been working on designing embedded SRAMs for advanced CMOS logic process.



Hidehiro Fujiwara received the B.E. degree in computer and systems engineering from Kobe University, Hyogo, Japan in 2005. He also received the M.E. and Ph.D. degrees in electrical engineering from Kobe University in 2006 and 2009, respectively. He did an internship program in Takumi Technology B.V., Eindhoven, the Netherlands in 2008. In 2009, he joined Renesas Electronics Corporation, Japan, where he has been working on designing embedded SRAM for advanced CMOS logic process. Dr. Fujiwara is a member of the IEICE and IEEE Solid-State Circuits Society.



Koji Nii received the B.E. and M.E. degrees in electrical engineering from Tokushima University, Tokushima, Japan, in 1988 and 1990, respectively, and the Ph.D. degree in informatics and electronics engineering from Kobe University, Hyogo, Japan, in 2008. In 1990, he joined the ASIC Design Engineering Center, Mitsubishi Electric Corporation, Itami, Japan, where he has been working on designing embedded SRAMs for CMOS ASICs. In 2003, he was transferred to Renesas Technology Corporation, Itami, Japan, which is a joint company of Mitsubishi Electric Corp. and Hitachi Ltd. in the semiconductor field. He transferred his work location to Kodaira, Tokyo from Itami, Hyogo in April 2009. His current responsibility is Chief Professional. He currently works on the research and development of 28 nm embedded SRAM and low-power design techniques with power gating for 28 nm High-k/Metal-gate low-power and high-performance platform in the Technology Development Unit, Renesas Electronics Corporation, Kodaira, Tokyo, Japan. Dr. Nii holds 81 US patents, and has published 23 IEEE/IEICE papers and given 66 talks at major international conferences. He received the Best Paper Award at the IEEE International Conference on Microelectronic Test Structures (ICMTS) in 2007. He also received the LSI IP Design Awards in 2007 and 2008, Japan. He is a Technical Program Committee member of the IEEE CICC. He is a senior member of the IEEE Solid-State Circuits Society and the IEEE Electron Devices Society. He is also a Visiting Associate Professor of Graduate School of Natural Science and Technology, Kanazawa University, Ishikawa, Japan.



Hiroyuki Kawai was born in 1960. He received the B.S. and M.S. degrees in control engineering from Osaka University, Osaka, Japan, in 1984 and 1986, respectively, and received the Ph.D. degree in electronics, information and communication engineering from Waseda University, Tokyo, Japan, in 2005. In 1986, he joined the LSI Laboratory, Mitsubishi Electric Corporation, Hyogo, Japan, where he worked on the research and development of LSIs for digital image processing including graphics. He was transferred to Renesas Technology Corporation in 2003 and Renesas Electronics Corporation in 2010. He has been engaged in the development of reconfigurable circuits and communication IPs in the Logic IP development Department of Renesas Electronics Corporation, Hyogo, Japan. His current research interests are innovative architectures for high-performance and low-power IPs. Dr. Kawai is a member of the Institute of Electronics, Information and Communication Engineers of Japan.



Hiroshi Kawaguchi received B.Eng. and M.Eng. degrees in electronic engineering from Chiba University, Chiba, Japan, in 1991 and 1993, respectively, and earned a Ph.D. degree in electronic engineering from The University of Tokyo, Tokyo, Japan, in 2006. He joined Konami Corporation, Kobe, Japan, in 1993, where he developed arcade entertainment systems. He moved to The Institute of Industrial Science, The University of Tokyo, as a Technical Associate in 1996, and was appointed as a Research Associate in 2003. In 2005, he moved to Kobe University, Kobe, Japan. Since 2007, he has been an Associate Professor with The Department of Information Science at that university. He is also a Collaborative Researcher with The Institute of Industrial Science, The University of Tokyo. His current research interests include low-voltage SRAM, RF circuits, and ubiquitous sensor networks. Dr. Kawaguchi was a recipient of the IEEE ISSCC 2004 Takuo Sugano Outstanding Paper Award and the IEEE Kansai Section 2006 Gold Award. He has served as a Design and Implementation of Signal Processing Systems (DISPS) Technical Committee Member for IEEE Signal Processing Society, as a Program Committee Member for IEEE Custom Integrated Circuits Conference (CICC) and IEEE Symposium on Low-Power and High-Speed Chips (COOL Chips), and as an Associate Editor of IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences and IPSJ Transactions on System LSI Design Methodology (TSLDM). He is a member of the IEEE, ACM, IEICE, and IPSJ.



Masahiko Yoshimoto joined the LSI Laboratory, Mitsubishi Electric Corporation, Itami, Japan, in 1977. From 1978 to 1983 he was engaged in the design of NMOS and CMOS static RAM. From 1984 he was involved in the research and development of multimedia ULSI systems. He earned a Ph.D. degree in Electrical Engineering from Nagoya University, Nagoya, Japan in 1998. From 2000, he was a professor of Dept. of Electrical & Electronic System Engineering in Kanazawa University, Japan. Since 2004, he has been a professor of Dept. of Computer and Systems Engineering in Kobe University, Japan. His current activity is focused on the research and development of ultra-low-power multimedia and ubiquitous media VLSI systems and a dependable SRAM circuit. He holds 70 registered patents. He served on the program committee of the IEEE International Solid-State Circuits Conference from 1991 to 1993. He also served as Guest Editor for special issues on Low-Power System LSI, IP and Related Technologies of IEICE Transactions in 2004. He was a chair of IEEE SSCS (Solid State Circuits Society) Kansai Chapter from 2009 to 2010. He was also a chair of The IEICE Electronics Society Technical Committee on Integrated Circuits and Devices from 2011 to 2012. He received the R&D100 awards from the R&D magazine for the development of the DISP and the development of the realtime MPEG2 video encoder chipset in 1990 and 1996, respectively. He also received the 21st TELECOM System Technology Award in 2006.