<table>
<thead>
<tr>
<th>Title</th>
<th>Hybrid-mode SRAM sense amplifiers: new approach on transistor sizing</th>
</tr>
</thead>
<tbody>
<tr>
<td>Author(s)</td>
<td>Do, Anh Tuan; Kong, Zhi Hui; Yeo, Kiat Seng</td>
</tr>
<tr>
<td>Date</td>
<td>2008</td>
</tr>
<tr>
<td>URL</td>
<td><a href="http://hdl.handle.net/10220/6260">http://hdl.handle.net/10220/6260</a></td>
</tr>
<tr>
<td>Rights</td>
<td>© 2008 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder. <a href="http://www.ieee.org/portal/site">http://www.ieee.org/portal/site</a></td>
</tr>
</tbody>
</table>
Hybrid-Mode SRAM Sense Amplifiers: New Approach on Transistor Sizing
Do Anh-Tuan, Kong Zhi-Hui, and Yeo Kiat-Seng

Abstract—A novel high-speed sense amplifier for ultra-low-voltage SRAM applications is presented. It introduces a completely different way of sizing the aspect ratio of the transistors on the data-path, hence realizing a current-voltage hybrid mode Sense Amplifier. Extensive post-layout simulations have proved that the new Sense Amplifier provides both high-speed and low-power properties, with its delay and power reduced to 25.8% and 37.6% of those of the best prior art. It also offers a much better read-effectiveness and robustness against the bit- and data-line capacitances as well as VDD variations. Furthermore, the new Sense Amplifier is able to tolerate a large difference between the parasitic capacitances associated with the complementary DLs. It can operate down to a supply voltage of 0.9 V, the lowest reported for a 0.18 μm CMOS process. A modified cross-coupled amplifier is also introduced, allowing the Sense Amplifier to operate down to 0.55 V.

Index Terms—Low-power SRAM, low-voltage SRAM, sense amplifier (SA).

I. INTRODUCTION

SRAM-based cache is one of the most important components of state-of-the-art very large-scale integration (VLSI) systems. Fast SRAM caches are vital to increasing the speed of data flow and, hence, the speed of the system [1]. According to the 2002 International Technology Roadmap for Semiconductors (ITRS 2002) [2], memory will occupy 90% of the chip area by 2013. Therefore, the power dissipated within the on-chip caches will become a dominant part of the total power consumption of the chip. There is thus an urgent need to address the often-conflicting power and delay requirements [1], [3], [4]. The current-mode sense amplifier (SA) can quickly amplify a small differential signal from the bit-lines (BLs) and data-lines (DLs) to full CMOS logic-level outputs without requiring large voltage swings on these capacitive lines (whose parasitic capacitances are normally referred to as $C_{\text{BL}}$ and $C_{\text{DL}}$, respectively). It is widely used as one of the most effective ways to reduce both the sensing delay and the power consumption of the SRAM [5]–[14]. In this work, we extensively studied the operation of existing SAs, analyzed their weaknesses, and proposed a new sensing scheme that has a much higher driving force, hence providing a shorter sensing delay, lower power consumption, and greater reliability.

This paper is organized as follows. The operation and read effectiveness of the existing current-mode SAs and of the proposed SA are described in Sections II and III, respectively. The sensing delay and power consumption of the new design are presented graphically in Section IV and compared with those of the three best existing designs, namely, the charge-transfer [13], the ultralow-power [9], and the high-speed [8] designs. In Section V, we describe a modified version of the proposed SA that can operate at a supply voltage as low as 0.55 V. Section VI concludes the paper.

II. CURRENT-MODE SA

A. Current-Mode SA and Its Derivatives

The current-mode sensing scheme (Fig. 1) in SRAM applications was first introduced in [5]. The current-mode SA, marked by the presence of the conventional current conveyor [5], is insensitive to $C_{\text{BL}}$ and, hence, offers a higher sensing speed and consumes less power than its voltage-mode counterparts [5]. Over the last two decades, a number of current-mode SAs have been proposed, aiming at improving the sensing speed and the power consumption in the read operation of the SRAM [7]–[13]. Since all of these SA designs utilize the differential output currents of the current conveyor, their improvement is only incremental. Reference [5] indicates that this differential current is equal to the current flowing into the cell node where a “0” is stored, i.e., $I_{\text{cell}}$. Our analysis and simulations show otherwise. In fact, the differential current is much smaller than $I_{\text{cell}}$ because of the imperfection of the current conveyor. This issue is discussed in more detail in the following sections.

Manuscript received January 23, 2008; revised April 16, 2008. This paper was recommended by Associate Editor M. Ghovanloo.

A. T. Do and Z. H. Kong are with the Centre for Integrated Circuits and Systems (CICS), School of Electrical and Electronics Engineering, Nanyang Technological University, Singapore (e-mail: atdo@ntu.edu.sg; zhkong@ntu.edu.sg).

K. S. Yeo is with the School of Electrical and Electronics Engineering, Nanyang Technological University, Singapore (e-mail: eksyeo@ntu.edu.sg).

Digital Object Identifier 10.1109/TCSII.2008.2001965

B. Imperfections of the Current Conveyor and its Impact on the Current-Mode Sensing Scheme

The current conveyor consists of four identical pMOS transistors P2, P3, P4, and P5 (Fig. 1). Since this configuration is common to all current-mode SAs, its drawback is also their shared weakness. As mentioned in [5], the current conveyor realizes a virtual short circuit across the complementary BLs, i.e., $V_{\text{BL}} = V_{\overline{\text{BL}}}$ during the read operation. Therefore, the currents $I_0$ and $I_1$, which are sourced by the large BL load transistors P0 and P1, respectively (Fig. 1), are equal, and a current difference of $I_{\text{cell}}$ is realized at the inputs of the current conveyor. This is true only in the ideal case where process variations and short-channel effects are absent. In deep-submicrometer technologies, these effects become significant and there is a slight difference between the BL voltages. For instance, at $V_{\text{DD}} = 1.8\, \text{V}$, $W_{\text{P0,1}} = 15\, \mu\text{m}$, and $W_{\text{P2,3,4,5}} = 1\, \mu\text{m}$ (Fig. 1), $V_{\text{BL}} = 1.79\, \text{V}$ and $V_{\overline{\text{BL}}} = 1.77\, \text{V}$. This 0.02-V difference has a significant impact on the effectiveness of the read process, for the following reason. Since $V_{\text{BL}}$ and $V_{\overline{\text{BL}}}$ are close to $V_{\text{DD}}$, both P0 and P1 operate in the triode region, where their drain currents are approximately proportional to their drain-to-source voltages. With $V_{\text{DD}} = 1.8\, \text{V}$, $V_{\text{BL}} = 1.79\, \text{V}$, and $V_{\overline{\text{BL}}} = 1.77\, \text{V}$, the two loads see drain-to-source voltages of 0.01 V and 0.03 V, so $I_0$ is about three times larger than $I_1$ (see the second column of Table I). Consequently, the effective difference between the BL currents after taking $I_{\text{cell}}$ into account is very small ($\Delta I$ in Table I). We use

$$\%\,\text{utilization} = \frac{\Delta I}{\text{Total current}} = \frac{I_1 - I_0'}{I_0 + I_1}, \qquad I_0' = I_0 - I_{\text{cell}}$$

as a figure of merit to measure the effectiveness of the read scheme. In our calculations, we assumed that $I_{\text{cell}}$, which is measured from the standard 6T cell, does not change during the sensing process. As indicated in Table I, over a range of transistor sizes, the % utilization of the current-mode SA is only around 10%, with $\Delta I$ much smaller than $I_{\text{cell}}$. Therefore, the differential current in the current-mode scheme needs to be improved to obtain a more effective read operation in terms of both speed and power consumption.

### Table I

<table>
<thead>
<tr>
<th>$W_{\text{P2,3}}$ ($\mu\text{m}$)</th>
<th>1</th>
<th>2</th>
<th>3</th>
<th>4</th>
<th>5</th>
<th>6</th>
<th>7</th>
</tr>
</thead>
<tbody>
<tr>
<td>$I_0$ ($\mu\text{A}$)</td>
<td>119</td>
<td>146</td>
<td>174</td>
<td>198</td>
<td>221</td>
<td>244</td>
<td>267</td>
</tr>
<tr>
<td>$I_1$ ($\mu\text{A}$)</td>
<td>38</td>
<td>76</td>
<td>113</td>
<td>143</td>
<td>172</td>
<td>199</td>
<td>226</td>
</tr>
<tr>
<td>$I_0' = I_0 - I_{\text{cell}}$ ($\mu\text{A}$)</td>
<td>27</td>
<td>54</td>
<td>82</td>
<td>106</td>
<td>129</td>
<td>152</td>
<td>174</td>
</tr>
<tr>
<td>$\Delta I$ ($\mu\text{A}$)</td>
<td>11</td>
<td>22</td>
<td>31</td>
<td>37</td>
<td>43</td>
<td>47</td>
<td>52</td>
</tr>
<tr>
<td>% utilization</td>
<td>7.0</td>
<td>9.9</td>
<td>10.8</td>
<td>10.9</td>
<td>10.6</td>
<td>10.5</td>
<td>10.2</td>
</tr>
</tbody>
</table>
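
The percentages in Table I can be checked with a few lines of Python. This is a minimal sketch under two assumptions that are not stated next to the table itself: that $I_{\text{cell}}$ is the 92 μA quoted with Table II, and that the third current row of Table I holds $I_0 - I_{\text{cell}}$. With $\Delta I = I_1 - (I_0 - I_{\text{cell}})$ and a total current of $I_0 + I_1$, the computed values land within about half a percentage point of the tabulated ones; the small residue is consistent with the currents being rounded to whole microamps.

```python
# Currents from Table I in microamps, one entry per column (W = 1..7 um).
I0 = [119, 146, 174, 198, 221, 244, 267]
I1 = [38, 76, 113, 143, 172, 199, 226]
THIRD_ROW = [27, 54, 82, 106, 129, 152, 174]   # assumed to be I0 - I_cell
UTIL_TABLE = [7.0, 9.9, 10.8, 10.9, 10.6, 10.5, 10.2]

I_CELL = 92  # assumption: cell current quoted with Table II, in microamps


def utilization_percent(i0, i1, i_cell):
    """Return 100 * delta_I / (I0 + I1) with delta_I = I1 - (I0 - I_cell)."""
    delta_i = i1 - (i0 - i_cell)
    return 100.0 * delta_i / (i0 + i1)


computed = [utilization_percent(a, b, I_CELL) for a, b in zip(I0, I1)]

for width, (calc, table) in enumerate(zip(computed, UTIL_TABLE), start=1):
    print(f"W = {width} um: computed {calc:5.2f} %, tabulated {table:5.1f} %")
```

The same loop also shows why widening the conveyor transistors does not help much: both $I_0$ and $I_1$ grow with the width, so the ratio stays near 10%.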

**Fig. 2.** Proposed SA with a simplified read-cycle-only memory system. Channel lengths of all transistors are 0.18 $\mu$m.

### III. Proposed Hybrid Current-Mode SA

A. Circuit Operation

Our proposed SA is presented in Fig. 2. It consists of 11 pMOS and 5 nMOS transistors (P0–P10 and N1–N5). P0 and P3 are responsible for pre-charging the BLs to $V_{\text{DD}}$, while P1 and P2 hold the BLs at $V_{\text{DD}}$ during the read cycle. P4 and P5 act as switches that connect the BLs to the DLs. The other 10 transistors (P6–P10 and N1–N5) form a cross-coupled inverter that amplifies the small voltage difference on the DLs to full CMOS logic levels [8], [9], [11], [13], [14]. The new SA is unique in that it eliminates the current conveyor and replaces the normally large BL loads (P0, P1 in Fig. 1) with small transistors (P1, P2 in Fig. 2). Unlike in the conventional designs, these transistors serve to hold the BLs at $V_{\text{DD}}$ rather than to source the BL currents. P1 and P2 are purposely sized small so that they are not strong enough to keep a BL at $V_{\text{DD}}$ when $I_{\text{cell}}$ is present. As a result, one of the BLs drops below $V_{\text{DD}}$ during a read access.

Before any read cycle, the BLs and DLs are precharged to $V_{\text{DD}}$ by (P0, P3) and the pre-charge circuit, respectively (Fig. 2). Meanwhile, the Sense Amplifier Enable (SAE) signal turns off N5 to prevent any dc current from flowing to ground, which saves power, while the Equalization (EQ) signal turns on P10 to hold the two nodes E and F at the same potential. When a cell is accessed by the Word Line (WL) and Column Select (CS) signals, the $\overline{\text{Read}}$ signal is triggered high to deactivate P0 and P3. The pre-charge circuit is turned off, but P1 and P2 remain on to hold the BLs and the DLs at $V_{\text{DD}}$. Fig. 3 presents the current paths during a read cycle. The Access, Drive, and Load are transistors of the accessed memory cell. We assume that the first cell of the shown column (Fig. 2) is accessed. On the left side, where a “1” is stored, the BL and DL remain at $V_{\text{DD}}$ since they are held by P1 while no discharge current is available [Fig. 3(a)]. Also, since both DL and BL are kept at $V_{\text{DD}}$, the drain-to-source voltage of P1 is very small and, hence, it sources only a negligible current $I_1$ during a read, just enough to compensate for the leakage current along the BL.

On the side where a “0” is stored, the cell sinks a current $I_{\text{cell}}$ that is larger than the current $I_0$ sourced by P2. As a result, a current $I_{\text{discharge}} = I_{\text{cell}} - I_0$ is available to discharge both the BL and the DL to voltage levels lower than $V_{\text{DD}}$. The cross-coupled amplifier (P6–P10 and N1–N5) senses the difference between the DLs and amplifies it to a full CMOS logic level. It is worth mentioning that $\overline{\text{Read}}$ is also triggered high during the write cycle to turn off P0 and P3.

Fig. 3. Current paths during the read cycle in the proposed SA on the side where (a) a “1” is stored or (b) a “0” is stored.

### TABLE II

SUMMARY OF CURRENTS CONSUMED DURING A READ CYCLE IN THE PROPOSED SA. $I_{cell} = 92\ \mu A$, $I_0' = I_0 - I_{cell}$, $\Delta I = I_1 - I_0'$

<table>
<thead>
<tr>
<th>$W_{L2}$ ($\mu$m)</th>
<th>0.3</th>
<th>0.5</th>
<th>0.7</th>
<th>0.9</th>
<th>1.1</th>
<th>1.3</th>
<th>1.5</th>
</tr>
</thead>
<tbody>
<tr>
<td>$I_0$ ($\mu A$)</td>
<td>4</td>
<td>6</td>
<td>8</td>
<td>10</td>
<td>13</td>
<td>15</td>
<td>17</td>
</tr>
<tr>
<td>$I_0'$ ($\mu A$)</td>
<td>-88</td>
<td>-86</td>
<td>-84</td>
<td>-82</td>
<td>-79</td>
<td>-77</td>
<td>-75</td>
</tr>
<tr>
<td>$\Delta I$ ($\mu A$)</td>
<td>91</td>
<td>90</td>
<td>88</td>
<td>86</td>
<td>84</td>
<td>82</td>
<td>80</td>
</tr>
<tr>
<td>% utilization</td>
<td>1308</td>
<td>900</td>
<td>735</td>
<td>614</td>
<td>467</td>
<td>410</td>
<td>363</td>
</tr>
<tr>
<td>% utilization (adjusted)</td>
<td>96</td>
<td>94</td>
<td>92</td>
<td>89</td>
<td>86</td>
<td>84</td>
<td>82</td>
</tr>
</tbody>
</table>

### B. Read Effectiveness

As for the current-mode SA in Fig. 1, % utilization is used to measure the read effectiveness of the new design. All simulated results are presented in Table II. The % utilization of the new design is higher than 100% because the total supplied current ($I_1 + I_0$) is much smaller than $I_{cell}$. However, during standby, a current equivalent to the $I_{discharge}$ mentioned above is used to pre-charge one of the BLs to $V_{DD}$ (the other BL is already at $V_{DD}$ and needs no charging). Therefore, the following equation is used to measure the effectiveness of the read scheme:

$$\%\,\text{utilization (adjusted)} = \frac{\Delta I}{\text{Total current}} = \frac{\Delta I}{I_1 + I_0 + I_{discharge}}$$

Table II shows that the new design offers a much better % utilization, with lower power consumption and a stronger discharge current, than the current-mode scheme.
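
The adjusted figure can be reproduced from the Table II currents with a short Python sketch. It rests on assumptions that should be read as such: the first two current rows of Table II are taken to be $I_0$ and $I_0' = I_0 - I_{cell}$, $I_1$ is inferred from $\Delta I = I_1 - I_0'$, and $I_{cell} = 92\ \mu A$ as given in the table caption. Under these assumptions the denominator $I_1 + I_0 + I_{discharge}$ reduces to $I_1 + I_{cell}$, and the results agree with the tabulated percentages to within about one point.

```python
# Currents from Table II in microamps, one entry per column (W = 0.3..1.5 um).
I_CELL = 92.0
I0 = [4, 6, 8, 10, 13, 15, 17]               # assumed: first current row
I0_PRIME = [-88, -86, -84, -82, -79, -77, -75]  # assumed: I0 - I_cell
DELTA_I = [91, 90, 88, 86, 84, 82, 80]
ADJUSTED_TABLE = [96, 94, 92, 89, 86, 84, 82]


def adjusted_utilization(i0, i0_prime, delta_i, i_cell):
    """Return 100 * delta_I / (I1 + I0 + I_discharge)."""
    i1 = delta_i + i0_prime          # from delta_I = I1 - I0'
    i_discharge = i_cell - i0        # current that must be restored in standby
    return 100.0 * delta_i / (i1 + i0 + i_discharge)


adjusted = [
    adjusted_utilization(a, b, d, I_CELL)
    for a, b, d in zip(I0, I0_PRIME, DELTA_I)
]

for calc, table in zip(adjusted, ADJUSTED_TABLE):
    print(f"computed {calc:5.1f} %, tabulated {table} %")
```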

### C. Tolerance to the Difference Between $C_{DL}$ and $C_{\overline{DL}}$

All of the SA designs in the reference list are based on the assumption that $C_{DL} = C_{\overline{DL}}$. Under this assumption, the voltage difference developed on the DLs is governed by

$$V_{DL} = V_o + \frac{I_{BL} \cdot \Delta t}{C_{DL}}, \qquad V_{\overline{DL}} = V_o + \frac{I_{\overline{BL}} \cdot \Delta t}{C_{\overline{DL}}}$$

$$\Delta V = V_{DL} - V_{\overline{DL}} = \Delta t \cdot \frac{I_{BL} - I_{\overline{BL}}}{C_{DL}} = \Delta t \cdot \frac{\Delta I}{C_{DL}}$$

When $C_{DL}$ differs from $C_{\overline{DL}}$, the SA works properly only if $I_{BL}/C_{DL} \geq I_{\overline{BL}}/C_{\overline{DL}}$. With the BL currents presented in Table I, the current-mode SA can tolerate only up to a 10% difference between the two DL capacitances with a reasonable sensing delay. The new design, in contrast, does not rely on the relationship between $C_{DL}$ and $C_{\overline{DL}}$, since a discharge current is available on one side only, while on the other side the voltage levels (of both the DL and BL) are kept at $V_{DD}$. Fig. 4 illustrates how the sensing delay changes with the difference between $C_{DL}$ and $C_{\overline{DL}}$. We varied $C_{\overline{DL}}$ by up to $\pm 50\%$ of $C_{DL}$, and the new SA still works with minimal variation in sensing delay.
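
To make the capacitance condition concrete, the following Python sketch evaluates the linear charging model above for each data line and checks the sign of the resulting differential voltage. The numbers are illustrative assumptions only (currents of 38 μA and 27 μA loosely based on the first column of Table I, 1 pF lines, a 1 ns window); the function names are hypothetical and not taken from the paper.

```python
import math


def dl_voltage(v0, current, dt, capacitance):
    """Linear model: V = V0 + I * dt / C (SI units)."""
    return v0 + current * dt / capacitance


def differential_voltage(v0, i_a, i_b, dt, c_a, c_b):
    """Voltage difference between the two data lines after time dt."""
    return dl_voltage(v0, i_a, dt, c_a) - dl_voltage(v0, i_b, dt, c_b)


def senses_correctly(i_a, i_b, c_a, c_b):
    """Condition from the text: I_a / C_a must not fall below I_b / C_b."""
    return i_a / c_a >= i_b / c_b


V0, DT = 1.8, 1e-9                 # illustrative start voltage and window
I_A, I_B = 38e-6, 27e-6            # illustrative line currents

# Matched capacitances: the difference is dt * delta_I / C = 11 mV.
matched = differential_voltage(V0, I_A, I_B, DT, 1e-12, 1e-12)

# Halving one capacitance flips the sign of the difference.
mismatched = differential_voltage(V0, I_A, I_B, DT, 1e-12, 0.5e-12)

print(f"matched:    {matched * 1e3:+.1f} mV, ok = {senses_correctly(I_A, I_B, 1e-12, 1e-12)}")
print(f"mismatched: {mismatched * 1e3:+.1f} mV, ok = {senses_correctly(I_A, I_B, 1e-12, 0.5e-12)}")
```

In this model a scheme that compares two nonzero line currents fails as soon as the capacitance ratio outweighs the current ratio, whereas a scheme with current on one side only has no such crossover, which is the argument made for the proposed SA.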

### IV. PERFORMANCE COMPARISONS

The proposed SA and the existing designs in [8], [9], and [13] have been optimized and extensively simulated using Cadence’s Affirma Spectre circuit simulator with a 0.18-μm CMOS process from CHRT. All four circuits were simulated using a simplified read-cycle-only memory system with two columns and two rows. Standard 6T SRAM memory cells were used. The new SA’s active layout area is the smallest among the four (see Table III and Fig. 5), owing to its simple structure without the current-conveyor and current-mirror pairs. All four designs were tested against $C_{\text{BL}}$, $C_{\text{DL}}$, and $V_{\text{DD}}$ variations. To gauge the actual behavior of the circuits, a wide range of $C_{\text{BL}}$ and $C_{\text{DL}}$ values (from 1 to 5 pF) was used to model the parasitic capacitances of the memory array. Post-layout simulation results are presented in Figs. 6–8. Fig. 6 shows that only the proposed design can operate down to a $V_{\text{DD}}$ of 0.9 V, while [13], [9], and [8] cease to work at $V_{\text{DD}}$ equal to 1.2, 1.3, and 1.3 V, respectively. Furthermore, at any supply voltage, the new design outperforms the rest with the smallest sensing delay. Figs. 7 and 8 demonstrate the superiority of the proposed design over the other circuits at a 1.8-V supply voltage against $C_{\text{DL}}$ and $C_{\text{BL}}$ variations. For example, at $C_{\text{DL}} = 1 \text{ pF}$, $C_{\text{BL}} = 5 \text{ pF}$, and a load capacitor $C_L = 0.1 \text{ pF}$, its sensing delay is reduced to 25.8%, 20.3%, and 28.6% and its power consumption to 37.6%, 46.5%, and 24.9% of those of [13], [9], and [8], respectively. In addition, the new SA offers enhanced speed robustness against varying $C_{\text{DL}}$, with a sensitivity of only 3 ps/pF, compared with 3.5, 32, and 45 ps/pF for the designs in [13], [9], and [8], respectively. Table III summarizes the performance metrics of all of the SAs working at 1.8 V and of the proposed design working at 1.8 and 0.9 V.
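
As a rough cross-check of these figures, the Python sketch below recomputes the ratios from the 1.8-V rows of Table III and converts the quoted sensitivities into a delay change over the 1 to 5 pF range. Two caveats apply. The source does not state whether Table III was taken at the same capacitance settings as the quoted example, so the ratios need not match exactly: the delay ratios agree to within rounding, while the power ratios come out near 41%, 46%, and 25%. The delay-change estimate also assumes that the sensitivity is constant over the range, which is a simplification.

```python
# (sensing delay in ns, average power in mW) at 1.8 V, from Table III.
PROPOSED = (0.26, 0.244)
OTHERS = {
    "charge-transfer [13]": (1.01, 0.597),
    "ultralow-power [9]": (1.28, 0.526),
    "high-speed [8]": (0.91, 0.983),
}

# Quoted delay sensitivities to data-line capacitance, in ps/pF.
SENSITIVITY_PS_PER_PF = {
    "proposed": 3.0,
    "charge-transfer [13]": 3.5,
    "ultralow-power [9]": 32.0,
    "high-speed [8]": 45.0,
}


def percent_of(value, reference):
    """Express value as a percentage of reference."""
    return 100.0 * value / reference


def delay_change_ps(sensitivity, c_low_pf, c_high_pf):
    """Delay change over a capacitance range, assuming constant sensitivity."""
    return sensitivity * (c_high_pf - c_low_pf)


delay_ratios = {k: percent_of(PROPOSED[0], v[0]) for k, v in OTHERS.items()}
power_ratios = {k: percent_of(PROPOSED[1], v[1]) for k, v in OTHERS.items()}
delay_changes = {
    k: delay_change_ps(s, 1.0, 5.0) for k, s in SENSITIVITY_PS_PER_PF.items()
}

for name in OTHERS:
    print(f"{name}: delay {delay_ratios[name]:.1f} %, power {power_ratios[name]:.1f} %")
for name, change in delay_changes.items():
    print(f"{name}: {change:.0f} ps over 1 to 5 pF")
```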

V. 0.55-V Sense Amplifier

In an attempt to reduce the supply voltage below 0.9 V, we propose a modified cross-coupled inverter that can work at a supply voltage of 0.55 V. Two additional pMOS transistors $P_{\text{11}}$ and $P_{\text{12}}$ are added to the conventional design, as shown in Fig. 9. Furthermore, the other transistors are resized so that they are strong enough to work at 0.55 V. $P_{\text{11}}$ and $P_{\text{12}}$ serve as a pre-charge circuit that pre-charges the two nodes E and F to the same potential, i.e., $V_{\text{DD}}$. As a result, a small-sized transistor $P_{\text{10}}$ can be used, thus significantly reducing the switching time of the cross-coupled inverter [14]. At a 0.55-V supply voltage, its sensing delay is 4.53 ns and it consumes 6.72 μW. For the purpose of this paper, we optimized the SA to work only at a very low voltage, rather than over a wide range (from 1.8 to 0.55 V), since the large transistors would consume a huge amount of power if operated at 1.8 V. However, this is enough to demonstrate the superiority of the proposed design over its state-of-the-art counterparts. Transistor sizing for the improved circuit is also shown in Fig. 9.

TABLE III

<table>
<thead>
<tr>
<th></th>
<th>Sensing delay, ns</th>
<th>Average power, mW</th>
<th>Layout area, $\mu$m$^2$</th>
</tr>
</thead>
<tbody>
<tr>
<td>Proposed (1.8 V)</td>
<td>0.26</td>
<td>0.244</td>
<td>376</td>
</tr>
<tr>
<td>Proposed (0.9 V)</td>
<td>1.6</td>
<td>0.019</td>
<td>376</td>
</tr>
<tr>
<td>Charge-transfer (1.8 V)</td>
<td>1.01</td>
<td>0.597</td>
<td>568</td>
</tr>
<tr>
<td>Ultralow-power (1.8 V)</td>
<td>1.28</td>
<td>0.526</td>
<td>579</td>
</tr>
<tr>
<td>High-speed (1.8 V)</td>
<td>0.91</td>
<td>0.983</td>
<td>659</td>
</tr>
</tbody>
</table>

Fig. 5. Layout of the proposed design.

Fig. 6. Sensing delay versus $V_{\text{DD}}$ variation for the circuits in comparison at $C_{\text{BL}} = 1 \text{ pF}$, $C_{\text{DL}} = 1 \text{ pF}$, and $C_L = 0.1 \text{ pF}$.

Fig. 7. Sensing delay and average power at 50 MHz versus $C_{\text{BL}}$ variation for the circuits in comparison at $C_{\text{DL}} = 1 \text{ pF}$ and $C_L = 0.1 \text{ pF}$.

Fig. 8. Sensing delay and average power at 50 MHz versus $C_{\text{DL}}$ variation for the circuits in comparison at $C_{\text{BL}} = 1 \text{ pF}$ and $C_L = 0.1 \text{ pF}$.

VI. CONCLUSION

A robust high-performance SA is presented. It introduces a new read scheme that combines the current- and voltage-sensing schemes to maximize the utilization of $I_{\text{cell}}$, hence offering much better performance in terms of both sensing speed and power consumption. Since only one of the BLs and one of the DLs are discharged below $V_{\text{DD}}$ while their complementary lines are kept at $V_{\text{DD}}$, the new SA is insensitive to the difference between $C_{\text{DL}}$ and $C_{\overline{\text{DL}}}$. This feature helps it to cope with the increasing fluctuation of these parasitic capacitances due to the layout and fabrication processes. The new design can operate over a wide supply voltage range, from 1.8 to 0.9 V, with minimal performance degradation. Furthermore, a modified cross-coupled inverter is introduced, which brings the operating voltage down to 0.55 V.

Although this modified version needs larger transistors and works only in a small supply voltage range, both versions of the proposed SA demonstrate the robustness and the suitability of the new read scheme for applications where ultralow voltage, ultralow power, and high speed are crucial design considerations.

REFERENCES