<table>
<thead>
<tr>
<th><strong>Title</strong></th>
<th>A 0.2 V, 480 kb subthreshold SRAM with 1 k cells per bitline for ultra-low-voltage computing.</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Author(s)</strong></td>
<td>Kim, Tony Tae-Hyoung; Liu, Jason; Keane, John; Kim, Chris H.</td>
</tr>
<tr>
<td><strong>Citation</strong></td>
<td>Kim, T. H., Liu, J., Keane, J., &amp; Kim, C. H. (2008). A 0.2 V, 480 kb subthreshold SRAM with 1 k cells per bitline for ultra-low-voltage computing. IEEE Journal of Solid-State Circuits, 43(2), 518-529.</td>
</tr>
<tr>
<td><strong>Date</strong></td>
<td>2008</td>
</tr>
<tr>
<td><strong>URL</strong></td>
<td><a href="http://hdl.handle.net/10220/6332">http://hdl.handle.net/10220/6332</a></td>
</tr>
<tr>
<td><strong>Rights</strong></td>
<td>© 2008 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.</td>
</tr>
</tbody>
</table>
A 0.2 V, 480 kb Subthreshold SRAM With 1 k Cells Per Bitline for Ultra-Low-Voltage Computing

Tae-Hyoung Kim, Student Member, IEEE, Jason Liu, Member, IEEE, John Keane, Student Member, IEEE, and Chris H. Kim, Member, IEEE

Abstract—A 2 μW, 100 kHz, 480 kb subthreshold SRAM operating at 0.2 V is demonstrated in a 130 nm CMOS process. A 10-T SRAM cell allows 1 k cells per bitline by eliminating the data-dependent bitline leakage. A virtual ground replica scheme is proposed for logic “0” level tracking and optimal sensing margin in read buffers. Utilizing the strong reverse short channel effect in the subthreshold region improves cell writability and row decoder performance due to the increased current drivability at a longer channel length. The sizing method leads to an equivalent write wordline voltage boost of 70 mV and a delay improvement of 28% in the row decoder compared to the conventional sizing scheme at 0.2 V. A bitline writeback scheme was used to eliminate the pseudo-write problem in unselected columns.

Index Terms—Low-voltage memory, reverse short channel effect, subthreshold SRAM, voltage scaling.

I. INTRODUCTION

SUBTHRESHOLD logic circuits are becoming increasingly popular in ultra-low-power applications where minimal power consumption is the primary design constraint [1]–[7]. Subthreshold static CMOS logic can operate while consuming roughly an order of magnitude less power than in the normal strong-inversion region. Characteristics of MOS transistors in the subthreshold region are significantly different from those in the strong inversion region. The MOS saturation current, which is a near-linear function of the gate and threshold voltages in that region, becomes an exponential function of those values in the subthreshold regime. This leads to an exponential increase in MOS current variability under process–voltage–temperature (PVT) fluctuations.
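
For reference, the exponential dependence described above is commonly captured by the first-order textbook subthreshold model sketched below. This expression is not taken from this paper; $I_0$ (a process- and geometry-dependent prefactor), $n$ (the subthreshold slope factor), and $V_T = kT/q$ (the thermal voltage) are the usual symbols and are introduced here as assumptions.

```latex
I_D \approx I_0 \exp\!\left(\frac{V_{GS} - V_{TH}}{n V_T}\right)
\left[1 - \exp\!\left(-\frac{V_{DS}}{V_T}\right)\right],
\qquad V_T = \frac{kT}{q}
```

Under this model, a threshold shift $\Delta V_{TH}$ scales the current by $\exp(-\Delta V_{TH}/(n V_T))$, which is why modest PVT shifts translate into a large current spread.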

Designing robust SRAM memory for subthreshold systems is extremely challenging because of the reduced voltage margin and the increased device variability. Conventional six-transistor (6-T) SRAMs in the subthreshold region fail to deliver the density and yield requirements due to the reduced static noise margin (SNM), poor writability, limited number of cells per bitline, and reduced bitline sensing margin. Previously, 7-T, 8-T and 10-T SRAM cells have been proposed to improve the SNM by decoupling the SRAM cell nodes from the bitline and hence making the read mode SNM equal to the hold mode SNM [6], [8], [9]. Writability has been improved in prior designs by using a higher supply voltage for the write access transistors at the cost of generating and routing the extra supply voltage [6]. The maximum number of cells per bitline in previous subthreshold SRAMs was limited to 256 at 0.3 V [6]. Robust high-density subthreshold SRAMs are indispensable for the successful deployment of subthreshold circuits in emerging ultra-low-power applications.

This paper introduces various circuit techniques for designing robust and high-density SRAMs in the subthreshold regime. The following techniques are proposed to enable a fully functional 480 kb SRAM operating at 0.2 V: 1) decoupled 10-T SRAM cell for read margin improvement; 2) utilizing the reverse short channel effect (RSCE) for write margin improvement; 3) eliminating data-dependent bitline leakage to enable 1 k cells per bitline; 4) Virtual Ground (VGND) replica scheme for improved bitline sensing margin; and 5) writeback scheme for row data preservation in unselected columns during write. A 130 nm SRAM test chip was successfully measured and characterized.

II. PROPOSED SUBTHRESHOLD SRAM DESIGN

A. Decoupled 10-T SRAM Cell

Fig. 1 shows the proposed 10-T SRAM cell and simulated SNM. The proposed SRAM cell consists of a cross-coupled inverter pair (M1, M2, M4, M5), write access devices (M3, M6), and decoupled read-out circuits (M7, M8, M9, M10). The write bitlines (WBL, WBLB) and the read bitline (RBL) are precharged to $V_{DD}$ before the cell is accessed. When read is enabled ($RWL = 1$), RBL is conditionally discharged through pull-down transistors M7, M8, and M9 depending on the QB value. The cell node is decoupled from the read bitline, retaining a hold mode SNM during the read operation. When read is disabled ($RWL = 0$), node $A$ is held at $V_{DD}$ by M10, making the bitline leakage flow from node $A$ to RBL regardless of the data stored in the SRAM cell. This results in a bitline leakage independent of the cell data, allowing a larger number of cells to be attached to a single bitline. Details on this topic will be described in Section II-D. The proposed 10-T SRAM cell has an SNM of 82 mV at a supply voltage of 0.2 V and a temperature of 27 °C, while the conventional 6-T SRAM cell SNM is 24 mV under these conditions [Fig. 1(b)]. The SNM of the proposed 10-T SRAM at a supply voltage of 0.2 V is equal to that of the conventional 6-T SRAM cell at 0.4 V [Fig. 1(c)]. In addition, the SNM normalized to supply voltage in Fig. 1(d) shows that the variation of SNM in the proposed 10-T SRAM cell is smaller than that of the conventional 6-T SRAM cell, which is the result of reduced variation in the longer access transistor used in our design to utilize the reverse short channel effect. Further details on this topic will be given in Sections II-B and II-C. Write operation is similar to 6-T SRAM cells where the write wordline is asserted (WWL = 1) after new data is loaded onto the write bitlines (WBL, WBLB).

Fig. 1. (a) Proposed 10-T SRAM cell with data independent leakage. (b) SNM comparison of conventional 6-T and proposed 10-T SRAM cell. (c) SNM comparison at different corner parameters sweeping supply voltages. (d) Normalized SNM of the results in (c).

Data retention voltage represents the minimum supply level below which an SRAM cell has a negative SNM. Global process variations and local device mismatches play major roles in determining this voltage. The worst case device corners for the data retention voltage simulation are illustrated in Fig. 2(a). The weak pull-up device connected to Q and the strong pull-down device are the worst case for flipping the logic “1” at node Q. At the other side of the cross-coupled latch, strong pull-up device and weak pull-down device have the largest probability of flipping the logic “0”. The simulated waveforms [Fig. 2(b)] indicate that the proposed 10-T SRAM cell has a data retention voltage of 0.24 V in this worst case scenario. The proposed SRAM has a positive SNM even at the supply voltage of 0.1 V when only global process variation is considered.

B. Impact of RSCE in Subthreshold Operation

Maintaining a sufficient write margin is challenging in subthreshold SRAMs due to the small gate overdrive and large process variation in the write access devices (M3 and M6 in Fig. 1). Virtual supply rails have been used in previous work to improve cell writability [6]. In [6], the cell supply voltage of the selected column becomes floating during write operation. The virtual supply rails collapse making it easier for the write access devices to flip the cell value. However, this technique is not suitable in subthreshold SRAMs as the virtual supply droop cannot be controlled accurately and the SNM is already close to the limitation. Another previous SRAM implementation used a wordline voltage which is higher than the cell voltage to increase the drive current of the write access transistors [6]. However, this technique requires an additional high $V_{DD}$ to be generated and routed.

In this work, we utilize the RSCE in the subthreshold region to improve the cell writability without having to introduce a separate high $V_{DD}$ [10]. RSCE is observed in modern CMOS devices due to the HALO pocket implants used to compensate for the $V_{TH}$ roll-off [10], [11]. RSCE is not a major concern in conventional strong-inversion designs since SCE is dominant in minimum channel length devices in that region. However, in the subthreshold region, only the RSCE is present due to the significantly reduced drain-induced barrier lowering (DIBL) [10]. This causes the $V_{TH}$ to decrease monotonically, and operating current to increase exponentially, with a longer channel length as shown in Fig. 3(a). For a fixed device width, maximum current is achieved at a channel length of 0.55 $\mu$m in the subthreshold region, which is 4.6X longer than the minimum channel length (0.12 $\mu$m). Another factor to consider when increasing the channel length for optimal subthreshold sizing is the change in device capacitance, because delay and power consumption increase linearly with capacitance.

Fig. 2. (a) Condition for worst case data retention voltage. (b) Simulated waveforms showing a minimum data retention voltage of 0.24 V.

Fig. 3. Impact of the reverse short channel effect on current drivability: (a) Dependency of normalized $V_{TH}$ and drain current on channel length for $V_{DD} = 1.2$ V and 0.2 V. (b) Device cross sections corresponding to A, A', B, and B' in (a).

Fig. 4(a) shows the different components of device capacitance in the subthreshold region. $C_{DEP}$ is the depletion capacitance, $C_{GS,GD}$ is the overlap capacitance, $C_{OX}$ is the oxide capacitance, and $C_J$ is the junction capacitance. To show the impact of increased channel length on device capacitances, the capacitances of a transistor having a constant current are plotted versus channel length in Fig. 4(b). Note that the device width can be reduced as the channel length is increased since RSCE lowers the $V_{TH}$ and exponentially increases the device current. Increasing the channel length alone has no effect on the junction and overlap capacitances ($C_J$, $C_{GS}$, and $C_{GD}$). However, since the device width can be reduced simultaneously for constant current, the sum of junction capacitance and overlap capacitance is reduced as shown in Fig. 4(b). The increase in gate capacitance ($C_G$) is moderate between channel lengths of 0.12 $\mu$m and 0.36 $\mu$m for two reasons. First, the reduction in width makes the increase in gate area smaller. Second, the RSCE associated with longer channel length makes the $C_{DEP}$ smaller since the depletion layer width increases as shown in Fig. 3(b). At channel lengths longer than 0.36 $\mu$m, however, $C_G$ increases linearly since the RSCE is significantly weaker, and gate area must be increased to drive the same current. As a result, the minimum total capacitance for iso-current is obtained at a channel length of 0.36 $\mu$m, which is 3X longer than the minimum channel length. By using this optimal channel length, we can reduce delay and power consumption in subthreshold circuits. In particular, the lower junction and overlap capacitances obtained at equal drive current significantly reduce the write power consumption.
Utilization of RSCE yields further advantages such as an improved subthreshold slope owing to the longer channel length, and a reduction in the impact of random dopant fluctuation due to the increased gate area [10].

Fig. 4. Reverse short channel effect utilized for performance improvement. (a) Capacitance in subthreshold MOS device. (b) Capacitance versus channel length for constant current.

C. Utilizing RSCE for Improved Writability, Higher Performance, and Lower Power Consumption

The cell writability in our SRAM design is improved by using write access transistors with a channel length that is 3X the minimum value to utilize the RSCE [Fig. 5(a)]. The stronger drive current enables a robust write operation, and hence lowers the minimum operating voltage. Unlike prior techniques, no additional supply voltage is required for our proposed technique. The bitline capacitance is the sum of the wire capacitance and the capacitance at the junction of the write access transistors. Since neither the junction nor the overlap capacitance changes with the increased channel length, the bitline capacitance is not affected.

Simulation results in Fig. 5(b) show that the write operation of the proposed SRAM at 0.2 V is equivalent to that of a conventional scheme using a 0.27 V WWL voltage. Fig. 5(c) and (d) show the write margin simulation results for different supply voltages. Fast pMOS and slow nMOS process parameters were used to represent the worst case write condition. All devices have a minimum channel width (200 nm). A negative write margin in Fig. 5(c) indicates a write failure. Using a channel length of 0.36 μm for M3 and M6, the write margin of the proposed SRAM cell is improved from −90 mV to 70 mV at 0.2 V. Fig. 5(d) illustrates the equivalent wordline boost normalized to $V_{DD}$. It can be seen that the normalized equivalent wordline boost increases at lower supply voltages, which illustrates the usefulness of the proposed technique in the deeper subthreshold region.
Random dopant fluctuations (RDF) cause parameter mismatches even between devices with identical layout in close proximity [14]. The impact of RDF is more severe in the subthreshold region due to the exponential relationship between the current and threshold voltage [3]. The standard deviation ($\sigma$) of the threshold voltage distribution is known to be proportional to $(WL)^{-1/2}$ [15], where $W$ is the device width and $L$ is the channel length. The gate area of the access transistors M3 and M6 utilizing RSCE is $0.2\ \mu m \times 0.36\ \mu m = 0.072\ \mu m^2$, which is 3X larger than that of the minimum size access transistors in conventional 10-T SRAM cells. Since $1/\sqrt{3} \approx 0.58$, this translates into a 42% smaller standard deviation in the threshold voltage, reducing the write margin variability in the proposed SRAM cell. Fig. 6(a) and (b) show write margin distributions using Monte Carlo simulation at two different supply levels. It is assumed that each device in the 10-T SRAM has an independent threshold voltage that follows a normal distribution. Results are also shown for a 6-T SRAM cell using all minimum channel length devices at 0.2 V and 0.27 V. The average and the standard deviation of the proposed cell’s write margin are 79 mV and 1.4 mV, respectively, which are considerably better than those of the conventional cell (65 mV and 15 mV) at 0.2 V. The large improvement comes from the smaller RDF and the increased current drivability of the write access transistors in the proposed 10-T SRAM cell. In addition to the SRAM cells, longer channel length devices are used for the static CMOS gates in the SRAM row decoding path and peripheral read/write circuits to reduce the delay, power consumption, and circuit variability.
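
The area scaling argument can be checked with a few lines of arithmetic. The sketch below simply plugs the dimensions quoted in the text (0.2 μm width, 0.12 μm minimum length, 0.36 μm RSCE length) into the $(WL)^{-1/2}$ relation; the function and variable names are illustrative and do not come from the paper.

```python
import math

def gate_area_um2(width_um: float, length_um: float) -> float:
    """Gate area W * L in square micrometers."""
    return width_um * length_um

def sigma_vth_ratio(area_ratio: float) -> float:
    """Ratio of threshold-voltage sigma (new / old), using sigma ~ (W*L)^(-1/2)."""
    return 1.0 / math.sqrt(area_ratio)

W_UM = 0.2        # channel width quoted in the text
L_MIN_UM = 0.12   # minimum channel length quoted in the text
L_RSCE_UM = 0.36  # channel length used for M3 and M6

area_min = gate_area_um2(W_UM, L_MIN_UM)
area_rsce = gate_area_um2(W_UM, L_RSCE_UM)
area_ratio = area_rsce / area_min
sigma_ratio = sigma_vth_ratio(area_ratio)
sigma_reduction_pct = (1.0 - sigma_ratio) * 100.0

print(f"gate area: {area_min:.3f} um^2 -> {area_rsce:.3f} um^2 ({area_ratio:.1f}x)")
print(f"sigma(V_TH) ratio: {sigma_ratio:.3f} ({sigma_reduction_pct:.0f}% smaller)")
```

This only covers the RDF-induced threshold spread; the write margin distributions in Fig. 6 also reflect the higher drive current of the longer devices, which this arithmetic does not model.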

D. Data-Independent Bitline Leakage for High Density

The small $I_{on}/I_{off}$ ratio in the subthreshold region limits the number of cells per bitline and negatively impacts the SRAM density. As the number of cells in a bitline increases, bitline leakage from the unaccessed cells can rival the read current of the accessed cell, making it difficult to distinguish between the bitline high and low levels. Previous techniques suffer from the data-dependent bitline leakage, which can cause the RBL high level to droop or RBL low level to rise based on the data stored in the unaccessed cells of a bitline [6], [12]. Fig. 7(a) shows the simplified schematic of the bitline with data-dependent bitline leakage current [6]. For the sake of simplicity, only the cross-coupled inverters and read ports are shown. When reading a “1”, the worst case read bitline (RBL) voltage is determined based on the contention between the pull up current from the accessed cell and the pull down bitline leakage currents from the unaccessed cells. Likewise, when reading a “0”, the contention between the pull down current of the accessed cell and the pull up bitline leakage currents of the unaccessed cells decides the worst case RBL voltage. As the number of cells per bitline increases, the worst case RBL for data “1” decreases and that for data “0” increases due to the bitline leakage current. As a result, the bitline voltage for data “1” may be lower than that for data “0” under the worst case data patterns, which can cause the read buffer to generate an incorrect output as shown in Fig. 7(b). A 0.3 V subthreshold SRAM with 256 cells on a single bitline has been reported in [6]. Our simulations indicate that the maximum number of cells per bitline of the prior design quickly reduces to 16 at a supply voltage of 0.2 V due to the bitline leakage problem.

The proposed 10-T SRAM cell eliminates the data-dependent bitline leakage problem by turning on M10 in Fig. 1(a) when the SRAM cell is unaccessed (RWL = 0). The drain voltage of M10 therefore becomes $V_{DD}$ and forces the leakage current to flow from the cell into the bitline regardless of the data stored. Fig. 8(a) shows the simplified schematic of the proposed bitline with data-independent bitline leakage current. The logic low level is decided by the balance between the pull up leakage current of unaccessed cells and the pull down read current of the accessed cell as shown in Fig. 8(a). The logic high level is close to $V_{DD}$ because both bitline leakage current and cell current are pulling up the RBL. By doing so, RBL voltages for the different logic levels are pinned and are independent of the cell data pattern as described in Fig. 8(b).
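
The current balance that sets the logic low level can be illustrated with a toy model. The sketch below is not the authors' simulation setup: it assumes a first-order subthreshold drain-voltage dependence $1 - e^{-V_{DS}/V_T}$, normalizes every unaccessed cell's leakage to 1, and uses a hypothetical on/off current ratio (`ion_ioff_ratio`) that is not calibrated to the 130 nm process or to the stacked read path of the actual cell. It only shows the qualitative trend that the low level rises as more cells share the bitline.

```python
import math

V_T = 0.0259  # thermal voltage kT/q near 27 C, in volts (assumed)

def rbl_low_level(vdd: float, n_cells: int, ion_ioff_ratio: float,
                  tol: float = 1e-9) -> float:
    """Solve for the RBL voltage when reading a '0' in a toy balance model.

    The accessed cell pulls RBL down with a normalized current of
    ion_ioff_ratio, while each of the (n_cells - 1) unaccessed cells
    leaks a normalized current of 1 from V_DD into RBL.
    """
    def net_current(v: float) -> float:
        pull_down = ion_ioff_ratio * (1.0 - math.exp(-v / V_T))
        pull_up = (n_cells - 1) * (1.0 - math.exp(-(vdd - v) / V_T))
        return pull_up - pull_down  # positive means RBL is still charging

    # net_current decreases monotonically with v, so bisection converges.
    lo, hi = 0.0, vdd
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if net_current(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical ratio chosen only for illustration.
levels = {n: rbl_low_level(0.2, n, ion_ioff_ratio=5000.0)
          for n in (128, 256, 512, 1024)}
for n, v in levels.items():
    print(f"{n:5d} cells per bitline: RBL low level = {v * 1e3:.2f} mV")
```

Because the leakage term does not depend on the stored data in this scheme, the solved level depends only on the cell count and the supply voltage, which is what makes a replica bitline a plausible reference for it.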

![Fig. 8](image.png)

Fig. 8. Effect of data-independent bitline leakage current on bitline voltage. (a) Simplified bitline schematic with data-independent bitline leakage current. (b) Read bitline voltage independency upon data pattern.

![Fig. 9](image.png)

Fig. 9. Simulation results of read bitline voltage with worst case data pattern using nominal process parameter. (a) Conventional scheme with data-dependent bitline leakage current. (b) Proposed scheme eliminating data-dependent bitline leakage current.

Fig. 9 shows the worst case RBL voltages simulated using HSPICE. It can be seen that the RBL voltage for logic “1” is lower than that for logic “0” in previous scheme [Fig. 9(a)] [6]. However, in this work, a bitline swing of 130 mV irrespective of the column data pattern is achieved at a 0.2 V supply voltage for a 1 k cell bitline [Fig. 9(b)].

E. VGND Replica Scheme for Improved Sensing Margin

In subthreshold SRAMs, sense amplifiers are replaced with static inverter type read buffers because noise margin, not speed, is the key design concern [5]. Because the bitlines swing fully, these read buffers provide the maximum sensing margin for a given supply voltage. Based on the fact that the bitline logic levels are insensitive to the column data pattern in our design (Section II-D), a VGND replica scheme is devised to maximize the sensing margin of the read buffers. The proposed VGND replica scheme automatically tracks the optimal read buffer trip point to obtain the largest possible sensing margin. The trip point of the read buffer is set to the middle of the logic high and low levels by using the VGND level generated from a replica bitline as the ground level of the read buffer as shown in Fig. 10. Fig. 10(a) and (b) compare the sensing margin of the proposed scheme with a conventional scheme using a zero ground level. The sensing margin of the conventional scheme degrades significantly as the number of cells per bitline increases because the increased logic “0” level of RBL strengthens the pull down path. However, the trip point of the proposed scheme is always maintained at half the bitline swing because VGND tracks the logic “0” level, balancing the strength of the pull down device with that of the pull up device. A replica bitline with hardwired data and control signals is used as the VGND generator. The reading “0” condition is implemented to generate the logic low level, which is used as the ground level for the read buffers as shown in Fig. 10(c). A single VGND is shared among multiple columns to reduce the area overhead of the replica bitline. Eight columns can share a single VGND generator without generating noise in VGND. The VGND level depends on the accessed cell current. Simulation results of VGND at various corner parameters show a variation of 20 mV, which roughly translates into a trip point variation of 10 mV (Fig. 11). Due to this relatively small variation in trip point, the read buffer can generate robust output data even when the drive currents of the devices in the read buffers differ by 5X.
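
The relation between VGND and the trip point can be made concrete with a small idealized calculation. The sketch below assumes the trip point sits exactly midway between the logic high level and the tracked logic low level, and takes the logic high level as $V_{DD}$ (the text only says it is close to $V_{DD}$). The 70 mV low level is derived from the 130 mV swing quoted for a 1 k cell bitline at 0.2 V; the function names are illustrative.

```python
def trip_point(v_high: float, v_low: float) -> float:
    """Ideal read buffer trip point: midpoint of the two bitline logic levels."""
    return 0.5 * (v_high + v_low)

def sensing_margins(v_high: float, v_low: float, v_trip: float) -> tuple:
    """Return (margin for reading '1', margin for reading '0') in volts."""
    return (v_high - v_trip, v_trip - v_low)

VDD = 0.2               # supply voltage in volts
SWING = 0.13            # simulated bitline swing quoted for a 1 k cell bitline
v_low = VDD - SWING     # logic low level tracked by VGND (assumes high = VDD)

v_trip = trip_point(VDD, v_low)
margin_one, margin_zero = sensing_margins(VDD, v_low, v_trip)
print(f"trip point = {v_trip * 1e3:.1f} mV, "
      f"margins = {margin_one * 1e3:.1f} mV / {margin_zero * 1e3:.1f} mV")

# A 20 mV change in VGND moves the midpoint by half as much.
trip_shift = trip_point(VDD, v_low + 0.02) - v_trip
print(f"trip point shift for 20 mV VGND change = {trip_shift * 1e3:.1f} mV")
```

The factor of one half in the last step is the same first-order relation the text uses when it maps a 20 mV VGND variation to roughly 10 mV of trip point variation.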

F. Writeback Scheme for Row Data Preservation

In a column MUXed array, the write operation still has stability problems because the enabled write wordline is also shared by the unselected columns. This is also referred to as the pseudo-write (or pseudo-read) problem in conventional 6-T designs. Fig. 12 illustrates this issue where the unselected cells can undergo a write when the WWL signal is asserted while the write bitlines (WBL, WBLB) are precharged to $V_{DD}$. This is exactly the same condition as the worst case read stability in conventional 6-T SRAMs.

A writeback scheme shown in Fig. 13 is applied to resolve the pseudo-write problem [13]. The write driver consists of a conventional write path and a writeback path. During a write operation, the read wordline (RWL) and write wordline (WWL) are enabled simultaneously. If the column is not selected for access ($Y(\bar{i}) = 0$), the write bitlines are held at $V_{DD}$ and a read operation is executed. The writeback signal (WB) is enabled a fixed delay after the rising edge of RWL, which enables the writeback path; the read data from the read buffer is then transferred to D_INT and written back to WBL and WBLB. By rewriting the read data back to WBL and WBLB, there is no voltage difference between the write bitlines (WBL, WBLB) and the cell nodes, eliminating the contention current.
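
The writeback sequence can be summarized with a purely behavioral model. The sketch below ignores all timing and analog effects and only tracks logic values for one write cycle on a single row; `write_row` and its arguments are hypothetical names used for illustration, not signals or blocks from the test chip.

```python
from typing import List

def write_row(row: List[int], selected: List[bool],
              new_data: List[int]) -> List[int]:
    """Behavioral model of one write cycle on a column-MUXed row.

    RWL and WWL are asserted together for the whole row. Selected columns
    take new data from the conventional write path; unselected columns
    read their own stored bit and drive it back onto the write bitlines.
    """
    result = []
    for stored, is_selected, data_in in zip(row, selected, new_data):
        if is_selected:
            wbl = data_in        # conventional write path drives the new data
        else:
            read_value = stored  # read port senses the stored bit (RWL = 1)
            wbl = read_value     # writeback path drives the same value on WBL
        result.append(wbl)       # cell settles to the WBL value while WWL = 1
    return result

row_before = [1, 0, 1, 1]
row_after = write_row(row_before,
                      selected=[False, True, False, True],
                      new_data=[0, 1, 0, 0])
print(row_after)
```

In this abstraction the unselected cells see their own data on the write bitlines, which is the condition the text describes as removing the contention current.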
III. TEST CHIP IMPLEMENTATION AND EXPERIMENTAL RESULTS

A 1.5 × 4.1 mm² SRAM with 480 kb cells was fabricated in a 130 nm, 8-metal CMOS technology. The cell size is 2.68 × 2.80 μm² using logic design rules. The threshold voltages of nMOS and pMOS are 0.32 V and −0.32 V, respectively. The nominal supply voltage for this process is 1.2 V. No standard IO circuit was used and the supply voltage for subthreshold operation was directly applied to the power pads. The test chip microphotograph is shown in Fig. 14. The test chip contains four SRAM quadrants with different numbers of rows (128, 256, 512, and 1024) to demonstrate our proposed techniques on progressively longer bitlines. Each SRAM quadrant has 256 columns, which are divided into 32 sub-blocks. The size of a sub-block with 1024 cells on a bitline is 42.9 × 3181 μm². To verify the effect of RSCE on circuit performance, a replica of the row decoding path was also implemented.
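
The stated capacity follows directly from the quadrant organization. The short check below uses only the row, column, and sub-block counts given in the text; the constant names are illustrative.

```python
ROWS_PER_QUADRANT = (128, 256, 512, 1024)  # cells per bitline in each quadrant
COLUMNS_PER_QUADRANT = 256
SUB_BLOCKS_PER_QUADRANT = 32

# Total number of bit cells across the four quadrants.
total_bits = sum(rows * COLUMNS_PER_QUADRANT for rows in ROWS_PER_QUADRANT)
total_kb = total_bits / 1024

# Columns grouped into each sub-block.
columns_per_sub_block = COLUMNS_PER_QUADRANT // SUB_BLOCKS_PER_QUADRANT

print(f"total capacity: {total_bits} bits = {total_kb:.0f} kb")
print(f"columns per sub-block: {columns_per_sub_block}")
```

The eight columns per sub-block obtained here match the number of columns that the text says can share a single VGND generator.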

VGND from the replica bitline was measured to validate the proposed sensing scheme. The VGND level corresponds to the logic low level of the bitline. The VGND levels of the four quadrants were measured from separate probing pads using a multimeter. Fig. 15 shows the measurement data. The VGND level depends on the number of cells connected to a bitline and the supply voltage. As the number of cells increases, the amount of leakage current flowing from the unaccessed SRAM cells into the bitline also increases, causing a rise in the VGND level. The normalized VGND voltage also rises significantly as the supply voltage is reduced due to the decreased $I_{on}$-to-$I_{off}$ ratio. This effect is shown in Fig. 15(a), where VGND becomes as high as 50% of the supply voltage at 0.2 V for a bitline with 1 k cells attached. Conventional read buffers will fail under these conditions due to the data-dependent bitline leakage and the fixed trip point in the read buffers. Our proposed scheme tracks the logic low level using a replica bitline to provide the optimal read margin in the read buffers, enabling 1 k cells per bitline. The impact of temperature on the VGND level is small because the change in temperature causes a similar rate of change in both the bitline leakage and cell read current in the subthreshold region, and VGND is determined by the balance between those currents. A 6% change in VGND was measured when varying the temperature from 27 °C to 80 °C at a supply voltage of 0.2 V [Fig. 15(b)].

Leakage current and power consumption were measured and are summarized in Fig. 16. The leakage current of the 480 kb SRAM was 10 μA for a supply voltage of 0.2 V at 27 °C [Fig. 16(a)]. This current increases exponentially as the supply voltage increases. As seen in that figure, the leakage at a supply voltage of 0.2 V is 10% of that at 1.2 V. The total power consumption of the SRAM operating at the maximum frequency with a supply voltage of 0.2 V was 2 μW.
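
As a rough consistency check on these figures, the leakage power implied by the measured current can be compared with the total power. The arithmetic below uses the rounded values quoted in the text and the 100 kHz operating frequency stated in the abstract; the symbols $P_{\text{leak}}$ and $E_{\text{cycle}}$ are introduced here for illustration and do not appear in the paper.

```latex
P_{\text{leak}} = V_{DD}\, I_{\text{leak}} = 0.2\,\text{V} \times 10\,\mu\text{A} = 2\,\mu\text{W},
\qquad
E_{\text{cycle}} \approx \frac{P_{\text{total}}}{f_{\max}} = \frac{2\,\mu\text{W}}{100\,\text{kHz}} = 20\,\text{pJ}
```

With these rounded numbers the leakage term alone accounts for essentially all of the reported total power, which suggests that the array is leakage dominated at this operating point.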

The access time and the maximum operation frequency of the four quadrants were measured. The maximum operating frequency was 100 kHz at 0.2 V and 27 °C for the quadrant with 1 k cells per bitline [Fig. 17(a) and (b)]. The access time difference between the four quadrants was 4X. Operating frequency increases exponentially as the supply voltage is increased due to the subthreshold MOS device behavior [Fig. 17(b)].

The minimum supply voltage for proper read operation is shown in Fig. 18. The quadrants with 128 cells and 1 k cells per bitline were readable at supply voltages of 0.15 V and 0.17 V, respectively. This difference was caused by the VGND level, which limits the proper operation of the read buffers.

Measured waveforms from the replicated row decoding path are shown in Fig. 19(b). For accurate on-chip delay measurements, a differential measurement technique was used where a dummy bypass path was included to cancel out the I/O path delay as shown in Fig. 19(a). Measurement results indicate a 28% delay improvement by utilizing RSCE in the subthreshold region. The devices with longer channel lengths offer a higher drive current per width which in turn is utilized to reduce the junction capacitance for higher performance. Fig. 20(a) shows the read data output waveform at 0.17 V, which demonstrates a 100 kHz operation for the largest quadrant. The implemented SRAM is fully functional.
Fig. 19. Measured performance improvement utilizing RSCE. (a) Block diagram for test circuit implemented. (b) Measured row decoding path delay improvement.

<table>
<caption>TABLE I: COMPARISON BETWEEN OUR DESIGN AND PREVIOUS SUBTHRESHOLD SRAMs</caption>
<thead>
<tr>
<th></th>
<th>This work</th>
</tr>
</thead>
<tbody>
<tr>
<td>Technology</td>
<td>130 nm CMOS</td>
</tr>
<tr>
<td>Density</td>
<td>480kb</td>
</tr>
<tr>
<td>Number of cells on a bitline</td>
<td>1024</td>
</tr>
<tr>
<td>SRAM cell type</td>
<td>10-T</td>
</tr>
<tr>
<td>Chip size</td>
<td>4.1 × 1.5 mm²</td>
</tr>
<tr>
<td>VDD min</td>
<td>0.2 V @ 1024 cells per bitline, 27 °C</td>
</tr>
<tr>
<td>Performance</td>
<td>120 kHz @ 0.2 V, 27 °C</td>
</tr>
<tr>
<td>Power consumption</td>
<td>2.04 μW</td>
</tr>
</tbody>
</table>

IV. CONCLUSION

A 0.2 V, 480 kb subthreshold SRAM was implemented in a 130 nm process technology. A 10-T SRAM cell is proposed to eliminate the read failure caused by data-dependent bitline leakage. A VGND replica scheme is proposed to track the logic “low” level of the bitlines under PVT variations, which allows us to achieve the maximum read sensing margin. The strong RSCE in the subthreshold region was utilized to improve cell writability, reduce power consumption, improve logic performance, and enhance circuit immunity to process variations. By combining these proposed circuit techniques, we were able to implement a fully functional subthreshold SRAM with 1 k cells per bitline operating at 0.2 V and 27 °C.

ACKNOWLEDGMENT

The authors would like to thank K. C. Wang, B. Lin, and M. Fisher for assistance with chip fabrication and laboratory equipment.

Fig. 20. Read data waveform at minimum supply voltage.

The implemented SRAM is fully functional at 0.2 V for proper read and write operation, and the key measured data are summarized in Table I.

Tae-Hyoung Kim (S’06) received the B.S. and M.S. degrees in electrical engineering from Korea University, Seoul, Korea, in 1999 and 2001, respectively. In 2001, he joined the Device Solution Network Division, Samsung Electronics, Yong-in, Korea. From 2001 to 2005, he performed research on the design of high-speed SRAM memories. He joined the Department of Electrical and Computer Engineering at the University of Minnesota, Minneapolis, in 2005 for a Ph.D. degree. In summer 2007, he was with IBM T. J. Watson Research Center, Yorktown Heights, NY, where he worked on frequency degradation monitoring circuit and isolated NBTI/PBTI test structures. His research interests include low power and high performance VLSI circuit design in nanoscale technologies.

Mr. Kim received a Silver Prize and Honor Prize at the 5th and 7th Humantech Thesis Contest held by Samsung Electronics, Korea in 1999 and 2001, respectively. He was the co-recipient of the 2005 ETRI Journal Paper of the Year Award.

REFERENCES


Jason Liu (M’07) received the B.S. degree in electrical engineering and the B.S. degree in computer engineering from the University of Michigan, Ann Arbor, in 2003 and 2004, respectively. From 2004 to 2005, he worked in the VLSI Circuit Design group at IBM Rochester, where he was a member of the Broadway microprocessor design team, which designed the CPU for the Nintendo Wii. He received the M.S. degree in electrical engineering from the University of Minnesota, Minneapolis, in 2007.

His research interests include high-performance and low-power VLSI circuit design. He is currently working for a start-up in Los Angeles, CA.

John Keane (S’06) received the B.S. degree with highest honors in computer engineering from the University of Notre Dame, South Bend, IN, in 2003. He received the M.S. degree in electrical engineering from the University of Minnesota, Twin Cities, in 2005.

He has completed numerous internships with companies including Seagate and IBM, including a six month assignment with IBM Research in Austin, TX. He is a coauthor of six journal and conference papers. His current research interests include on-chip CMOS sensors for variation and reliability monitoring, and low-power circuit design.

Mr. Keane received a Graduate School Fellowship from the University of Minnesota Graduate School in 2003.

Chris H. Kim (S’98–M’04) received the B.S. degree in electrical engineering and the M.S. degree in biomedical engineering from Seoul National University, Seoul, Korea, in 1998 and 2000, respectively. He received the Ph.D. degree in electrical and computer engineering from Purdue University, West Lafayette, IN.

He spent a year with Intel Corporation where he performed research on variation-tolerant circuits, on-die leakage sensor design and crosstalk noise analysis. He joined the electrical and computer engineering faculty at University of Minnesota, Minneapolis, in 2004. His current research interests include theoretical and experimental aspects of VLSI system design in nanoscale technologies.

Mr. Kim was the recipient of the 2006 and 2007 IBM Faculty Partnership Award, 2005 IEEE Circuits and Systems Society Outstanding Young Author Award, 2005 ISLPED Low Power Design Contest Award, 2003 Intel Ph.D. Fellowship Award, 2001 Magoon’s Award for Excellence in Teaching, and the Best Paper Award in 1999 IEEE-EMBS APBME. He is a co-author of more than 60 journal and conference papers and serves as a technical program committee member for ISLPED, ASSCC, ICCAD, ISQED, and ICICDT.