

PDF issue: 2025-04-28

## Low-energy block-level instantaneous comparison 7T SRAM for dual modular redundancy

Okumura, Shunsuke ; Nakata, Yohei ; Yanagida, Koji ; Kagiyama, Yuki ; Yoshimoto, Shusuke ; Kawaguchi, Hiroshi ; Yoshimoto, Masahiko

(Citation) IEICE Electronics Express,9(6):470-476

(Issue Date) 2012-03-25

(Resource Type) journal article

(Version) Version of Record

(Rights) copyright©2012 IEICE

(URL) https://hdl.handle.net/20.500.14094/90002969





# Low-energy block-level instantaneous comparison 7T SRAM for dual modular redundancy

Shunsuke Okumura<sup>1a)</sup>, Yohei Nakata<sup>1</sup>, Koji Yanagida<sup>1</sup>, Yuki Kagiyama<sup>1</sup>, Shusuke Yoshimoto<sup>1</sup>, Hiroshi Kawaguchi<sup>1</sup>, and Masahiko Yoshimoto<sup>1,2</sup>

<sup>1</sup> Department of Information Science, Kobe University

1-1 Rokkodai, Nada, Kobe, Hyogo, 657-8501, Japan

- <sup>2</sup> JST CREST, Japan
- a) s-oku@cs28.cs.kobe-u.ac.jp

**Abstract:** This paper proposes a 7T SRAM that realizes a block-level instantaneous comparison feature. The proposed SRAM is useful for comparing operation results in dual modular redundancy (DMR). The data size that can be instantaneously compared is scalable using the proposed structure. The 1-Mb SRAM comprises 16-kb blocks in which 8-kb data can be compared in 130.0 ns. The proposed scheme reduces the energy consumed in data comparison to 1/418 of that of a parallel cyclic redundancy check (CRC) circuit.

**Keywords:** SRAM, 7T bitcell, comparison circuit, DMR

**Classification:** Integrated circuits

#### References

- [1] K. Shimamura, T. Takehara, Y. Shima, and K. Tsunedomi, "A Single-Chip Fail-Safe Microprocessor with Memory Data Comparison Feature," *Pacific Rim International Symposium on Dependable Computing*, pp. 359–368, 2006.
- [2] T. Sato and T. Funaki, "Dependability, Power, and Performance Tradeoff on a Multicore Processor," *Asia and South Pacific Design Automation Conference*, pp. 714–719, 2008.
- [3] J. C. Smolens, B. T. Gold, B. Falsafi, and J. C. Hoe, "Reunion: Complexity-Effective Multicore Redundancy," *ACM International Symposium on Microarchitecture*, vol. 10, no. 5, pp. 223–234, 2006.
- [4] M. A. Gomaa, C. Scarbrough, T. N. Vijaykumar, and I. Pomeranz, "Transient-Fault Recovery for Chip Multiprocessors," *IEEE MICRO*, vol. 23, no. 5, pp. 76–83, 2003.
- [5] H. Fujiwara, S. Okumura, Y. Iguchi, H. Noguchi, Y. Morita, H. Kawaguchi, and M. Yoshimoto, "Quality of a Bit (QoB): A New Concept in Dependable SRAM," *International Symposium on Quality Electronic Design*, pp. 98–102, 2008.
- [6] S. Okumura, S. Yoshimoto, K. Yamaguchi, Y. Nakata, H. Kawaguchi, and M. Yoshimoto, "7T SRAM Enabling Low-Energy Simultaneous Block Copy," *Proc. IEEE Custom Integrated Circuits Conference (CICC)*, Sept. 2010.

#### 1 Introduction

As silicon LSIs support massive social infrastructure, dependable computing systems have attracted attention. In advanced CMOS process technology, however, transistor variability is an important reliability issue, and soft errors cause accidental faults. To maintain dependability, fault-tolerant systems have been researched. With advanced process technology, redundant modules that can detect transient errors are integrated on a single chip. Dual modular redundancy (DMR) adopted on chip multiprocessors (CMPs) is one method for improving dependability [1, 2, 3, 4]. DMR has three key design requirements [3]. First, all execution must be replicated in space or time. Second, the data input to the dual processor cores must also be replicated and checked for coherence. Finally, the outputs must be checked to confirm that the execution results are correct. The number of comparison cycles for the input/output data between the dual cores increases with the data size, and this cycle overhead degrades processor performance [1]. The memory bandwidth for the comparison is also a problem in DMR systems. To reduce the comparison bandwidth, a cyclic redundancy check (CRC) is implemented in DMR [3, 4]. However, the energy consumed by the CRC and the bandwidth to the CRC registers remain critical, because the data to be compared must be encoded. To achieve high speed and low energy, we propose a 7T SRAM that realizes instantaneous block-level data comparison.

#### 2 7T Comparison SRAM structure and its application

This section gives an overview of the proposed comparison scheme. Fig. 1 (a) depicts the proposed comparison SRAM for a DMR system. In conventional systems [1], all inputs and outputs are stored as "data to be compared" (DC), and their coherence is then verified by a data comparator. The DC and the data comparator can be replaced with the proposed comparison 7T SRAM block. (The 7T/14T SRAM can improve reliability when one bit of data is stored in two bitcells; the 7T/14T bitcell and its applications have been studied previously [5, 6].) Data A and B are stored and compared on a block-level basis. They are stored in the same 7T bitcell pair; in other words, they are allocated to physically adjacent bitcells. (The comparison scheme using 7T bitcell pairs is described later.) When the data match in all bitcell pairs, the comparison SRAM outputs a "match" signal, whereas it outputs a "miscompare" signal if different data are stored. The comparison is carried out within each 7T bitcell pair without using the data bus, so the operation does not affect the data bandwidth. Therefore, the size of the simultaneously compared data is scalable with this circuit structure. In other words, the comparison cycle is constant in the proposed scheme even if the size of the data to be compared increases. By applying the proposed comparison SRAM, a constant comparison cycle time and low-energy operation are both realized.
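At the logic level, the block-level result described above amounts to an OR over the per-pair differences. The following Python sketch is only a behavioural model under that assumption, not the circuit itself; the function name `block_compare` and the list-of-bits representation are illustrative choices that do not appear in the paper.

```python
from typing import Sequence


def block_compare(data_a: Sequence[int], data_b: Sequence[int]) -> str:
    """Behavioural model: bit i of A and bit i of B share one 7T bitcell pair."""
    if len(data_a) != len(data_b):
        raise ValueError("data A and data B must have the same length")
    # A pair draws supply current only when its two stored bits differ.
    conducting_pairs = sum(1 for a, b in zip(data_a, data_b) if a != b)
    # No current anywhere in the block means every pair matched.
    return "match" if conducting_pairs == 0 else "miscompare"
```

The hardware evaluates all pairs in parallel; the loop here is only a software stand-in for that parallel evaluation.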

Next, we explain the data comparison scheme using the proposed 7T SRAM. The data comparison is made within a 7T bitcell pair. Fig. 1 (b) presents the proposed 7T bitcell pair, which achieves high-speed and low-energy data comparison. The 7T bitcells in a pair mutually connect their internal nodes through pMOS transistors. The principle of the data comparison is as follows. Fig. 1 (b) shows the cases in which a bitcell pair stores different data and the same data. During a data comparison, the CTRL signal is "low", so the connecting pMOSes are turned on. When different data are stored, a supply current flows through the connecting pMOSes. If the same data are stored, however, the bitcell pair draws no supply current because no current path exists. Even if many bitcells exist, no supply current flows if all data match. To detect the current in a miscompare bitcell pair, pMOS switches are added to the supply lines. In the data comparison phase, the switch pMOSes are turned off to put the supply lines in a floating state; a voltage drop on the supply line can then be observed in the "miscompare" case. This feature achieves the comparison within each bitcell pair.

Fig. 1. (a) DMR concept and bitcell deployment in the proposed SRAM, (b) schematics of 7T bitcell pairs storing different data and same data.

Fig. 2. (a) Block-level instantaneous comparison SRAM architecture, (b) waveforms in the proposed comparison scheme.

The comparison circuit is presented in Fig. 2 (a). In the proposed comparison circuit, the speed depends on the capacitance of the supply line ($VDD_{EVEN}$/$VDD_{ODD}$). To realize high-speed data comparison, the power supply is segmented into supply lines ($VDD_{EVEN}<0,\ldots,32>$ and $VDD_{ODD}<0,\ldots,31>$) along the row direction. Each power line is driven by a pMOS switch controlled by a signal, SW. Each supply line is shared by the lower and upper bitcells of different pairs to suppress the area overhead. Furthermore, the control signals are separated into two signals ("/CTRL<sub>M</sub>" and "/CTRL<sub>N</sub>"). If all bitcell pairs were controlled with only a single signal, the supply lines would all be connected together through the connecting pMOSes in bitcell pairs storing the same data, which would make the capacitance of the supply line large. Two comparisons are therefore conducted, one for "/CTRL<sub>M</sub>" and one for "/CTRL<sub>N</sub>".
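A first-order way to see why the line capacitance sets the speed is to treat the floating supply line as a capacitor discharged by a constant current. This is a simplifying assumption rather than a model given in the paper; the symbols $C_{line}$, $I_{pair}$, $n$, $V_{DD}$, and $t_{detect}$ are introduced here for illustration only.

$$
t_{detect} \approx \frac{C_{line}\,\left(V_{DD} - V_{REF}\right)}{n\, I_{pair}}
$$

Here $C_{line}$ is the capacitance of one floating supply-line segment, $V_{DD}$ its initial voltage, $V_{REF}$ the comparator reference, $n$ the number of miscompare pairs on the line, and $I_{pair}$ the current drawn by each miscompare pair. Under this approximation, anything that reduces $C_{line}$, such as segmenting the supply or preventing neighbouring lines from being tied together, shortens the time needed for the line to fall below $V_{REF}$.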

The comparators in Fig. 2 (a) are of the sense-amplifier type and take $VDD_{EVEN}$ and $V_{REF}$ as inputs. The comparators' outputs are forwarded to the domino OR gate. The size of the simultaneously compared data is scalable using this circuit structure. In the comparison phase, there is some possibility of overwriting in a miscompare bitcell pair. Once overwritten, the miscompare bitcell pair would be treated as a matched one. To prevent such erroneous flipping, the driver for the control signals consists only of pMOSes and works as a $V_{tp}$-floating driver for the "L" output, which puts the additional pMOSes in a bitcell pair in a "weakly on" state.
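As a rough behavioural illustration of this read-out path, each comparator can be modelled as flagging a supply line that carries at least one miscompare pair, with the domino OR gate merging the flags over the two control phases. The sketch below assumes that abstraction; the function names and the way pairs are grouped into lines and phases are hypothetical and are not specified in this form by the paper.

```python
from typing import Sequence

# One supply line is modelled as a list of per-pair mismatch flags.
SupplyLine = Sequence[bool]


def comparator(line: SupplyLine) -> bool:
    """Model a sense-amplifier comparator: True if the line would droop below V_REF."""
    return any(line)


def domino_or(flags: Sequence[bool]) -> bool:
    """Model the domino OR gate that merges all comparator outputs."""
    return any(flags)


def block_miscompare(phase_m: Sequence[SupplyLine],
                     phase_n: Sequence[SupplyLine]) -> bool:
    """Run the two comparisons (/CTRL_M, then /CTRL_N) and merge the results."""
    for lines in (phase_m, phase_n):
        if domino_or([comparator(line) for line in lines]):
            return True
    return False
```

Adding more supply lines only lengthens the lists passed to `domino_or`, which is one way to picture why the compared data size scales with the structure.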

Fig. 2 (b) presents the simulated waveforms in a data comparison. First, the internal nodes of a bitcell pair are connected under the control of the "/CTRL<sub>M</sub>" (or "/CTRL<sub>N</sub>") signal. Note again that "/CTRL<sub>M</sub>" is not completely grounded; the "weakly on" state prevents a miscompare bitcell pair from being overwritten. After the "SW" signal goes "high," the supply line enters a floating state; it is discharged by the current path through miscompare bitcells, and its voltage is thus lowered. This detection is not destructive for miscompare bitcells, because the pMOS drivers for the control signals prevent the supply lines from being pulled fully down to the retention voltage. We confirmed that the stored data in miscompare bitcells are not destroyed in the comparison phase.

#### 3 Energy consumption and experimental results

Fig. 3 (a) shows a simulated energy comparison. We compared the proposed scheme with a circuit composed of EXOR gates and with a comparison circuit for the CRC in DMR [3, 4]. The proposed SRAM reduces the comparison energy because it requires no complex calculation; data comparison is achieved only by charging and discharging the supply lines ($VDD_{EVEN}$ and $VDD_{ODD}$) under the control signals. The energy consumption is reduced to 1/486 and 1/418 of that of the EXOR gates and the CRC comparison circuit, respectively.

The proposed SRAM with the instantaneous comparison function was implemented in a 65-nm CMOS technology. Fig. 3 (b) shows the layout of a 16-kb SRAM block and a photograph of the 1-Mb 7T SRAM test chip. The 1-Mb SRAM comprises 16 kb $\times$ 64 blocks. A 16-kb block consists of 128 rows $\times$ 8 columns $\times$ 16 bits/word, and has 32 comparators.
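The organization figures above can be cross-checked with a few lines of arithmetic. The short Python sketch below does only that; the variable names are illustrative, and the pairs-per-block line rests on the assumption that every two bitcells form one comparison pair holding one bit of data A and one bit of data B.

```python
ROWS = 128
COLUMNS = 8
BITS_PER_WORD = 16
BLOCKS = 64

block_bits = ROWS * COLUMNS * BITS_PER_WORD   # bitcells in one block
total_bits = block_bits * BLOCKS              # bitcells in the whole SRAM
# Assumption: every two bitcells form one comparison pair (one bit of A, one of B).
pairs_per_block = block_bits // 2

print(block_bits, total_bits, pairs_per_block)  # 16384 1048576 8192
```

Under that pairing assumption, a 16-kb block holds 8 k pairs, which is consistent with the 8-kb comparison size quoted for one block.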

The measured Shmoo plot for the instantaneous comparison is shown in Fig. 3 (c). In the proposed scheme, a miscompare is easy to detect when many bitcell pairs store different data, because the short-circuit current increases and the supply line is strongly pulled down. When 16 miscompare bitcell pairs exist on a shared supply line, the margin is enlarged. As a result, the proposed scheme can compare 8-kb data in 87 ns at 1.2 V, and the minimum operating voltage is 840 mV. In the worst case for comparison time, only one bitcell pair miscompares; the proposed SRAM then compares 8-kb data in 130 ns at 1.2 V.

Fig. 3. (a) Energy consumption comparison, (b) 1-Mb SRAM die photograph and 16-kb block layout, and (c) measured Shmoo plot.

#### 4 Conclusion

We designed a 7T SRAM that realizes a block-level instantaneous comparison feature. The proposed SRAM is useful for comparing input data or operation results in DMR. The data size that can be instantaneously compared is scalable using the proposed structure. The comparison is conducted within each 7T bitcell pair without using the data bus, so the operation does not affect the memory bandwidth. In other words, even if the size of the data to be compared increases, the comparison cycle time remains constant. The 1-Mb SRAM comprises 16-kb blocks in which 8-kb data can be compared in 130.0 ns. The proposed scheme reduces the energy consumed in data comparison to 1/418 of that of a parallel cyclic redundancy check (CRC) circuit.





### Acknowledgments

This work was supported by the VLSI Design and Education Center (VDEC), the University of Tokyo, in collaboration with Cadence Design Systems, Mentor Graphics, and Synopsys, Inc.

