<table>
<thead>
<tr>
<th><strong>Title</strong></th>
<th>Stack sizing for optimal current drivability in subthreshold circuits</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Author(s)</strong></td>
<td>Keane, John.; Eom, Hanyong.; Kim, Tony Tae-Hyoung; Sapatnekar, Sachin.; Kim, Chris H.</td>
</tr>
<tr>
<td><strong>Date</strong></td>
<td>2008</td>
</tr>
<tr>
<td><strong>URL</strong></td>
<td><a href="http://hdl.handle.net/10220/6269">http://hdl.handle.net/10220/6269</a></td>
</tr>
</tbody>
</table>

© 2008 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder. http://www.ieee.org/portal/site

Stack Sizing for Optimal Current Drivability in Subthreshold Circuits

John Keane, Hanyong Eom, Taehyung Kim, Sachin Sapatnekar, and Chris Kim

Abstract—Subthreshold circuit designs have been demonstrated to be a successful alternative when ultra-low power consumption is paramount. However, the characteristics of MOS transistors in the subthreshold region are significantly different from those in strong inversion. This presents new challenges in design optimization, particularly in complex gates with stacks of transistors. In this paper, we present a framework for choosing the optimal transistor stack sizing factors in terms of current drivability for subthreshold designs. We derive a closed-form solution for the correct sizing of transistors in a stack, both in relation to other transistors in the stack, and to a single device with equivalent current drivability. Simulation results show that our framework provides a performance benefit ranging up to more than 10% in certain critical paths.

Index Terms—Logical effort, subthreshold logic, ultra low power design.

I. INTRODUCTION

Due to the robust nature of static CMOS logic, circuits in this technology family can operate with supply voltages below the transistor threshold voltage ($V_{TH}$), while consuming orders of magnitude less power than in the normal strong-inversion region. The operating frequency of subthreshold logic is much lower than that of regular strong-inversion circuits ($V_{dd} > V_{TH}$) due to the small transistor current, which consists entirely of subthreshold leakage current. The low operating frequency and low supply voltage combine to reduce both dynamic and leakage power, leading to the significant power savings seen in subthreshold designs.

Subthreshold logic holds promise for the growing number of applications in which minimal power consumption is the primary design constraint. Such circuits have received much attention in recent research, and a number of successful designs have been demonstrated. A multiplexer-based SRAM was proposed for subthreshold operation by Wang and Chandrakasan [1], who also introduced new tiny-XOR circuits and demonstrated their performance in a fast Fourier transform (FFT) processor running at a supply voltage of 180 mV. Kim et al. [2] presented a new high-density SRAM system operating down to 200 mV at ISSCC’07. In [3], Kim et al. built an ultra-low-power adaptive filter for hearing aid applications using subthreshold logic; subthreshold-friendly logic styles and massively parallel digital signal processing (DSP) architectures were used in that work to achieve low-voltage operation.

The characteristics of MOS transistors in the subthreshold region are significantly different from those in the strong inversion region. The saturation current, which is a near-linear function of the gate and threshold voltages in the strong inversion region, becomes an exponential function of those values in the subthreshold regime [4]. In this paper, we show that, because of these different characteristics, the sizing methods used to obtain maximum performance must be reformulated for subthreshold designs. In particular, we present a framework for choosing the optimal transistor stack sizing factors in terms of current drivability for subthreshold circuits. We derive a closed-form solution for the optimal sizing of stacked transistors, and the resulting theoretical sizing values closely match those found in simulations with predictive technology model (PTM) [5], [6] devices ranging from the 130-nm down to the 45-nm node. This sizing method provides a clear benefit in logic paths containing a large number of stacks, where the nodal capacitance is not dominated by the increased device sizes used in our method.

Manuscript received October 22, 2006; revised May 16, 2007.

The authors are with the Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55455 USA (e-mail: jkeane@ece.umn.edu; eomxx001@ece.umn.edu; thkim@ece.umn.edu; sachiin@ece.umn.edu; chriskim@ece.umn.edu).

Digital Object Identifier 10.1109/TVLSI.2008.917571

II. OPTIMAL TWO-STACK SIZING

A. Optimal Ratio Between Two Stacked Devices

The first step we take in developing the subthreshold stack sizing framework is finding the optimal width ratio between transistors in a stack for maximum drive current. Here, we will present a closed-form expression for the relative sizing of two transistors in a stack, showing that it is beneficial to size up the transistor nearest to the supply rail ($V_{dd}$ for PMOS, ground for NMOS). The starting point is the following pair of current equations for upper and lower transistors as situated in an NMOS stack (so the lower device is connected to ground), excluding the common factors that will cancel out when they are equated:

$$I_U = W_U e^{\frac{(V_{dd} - V_X) - V_{th} - \lambda_d (V_{dd} - V_X) - \gamma V_X}{mV_T}} \left(1 - e^{-\frac{V_{dd} - V_X}{V_T}}\right) \tag{1}$$

$$I_L = W_L e^{\frac{V_{dd} - V_{th} - \lambda_d V_X}{mV_T}} \left(1 - e^{-\frac{V_X}{V_T}}\right). \tag{2}$$

Here, $W_U$ and $W_L$ denote the upper and lower transistor widths, respectively, and $V_X$ denotes the voltage at the node between those devices. The drain-induced barrier lowering (DIBL) coefficient (a negative number) is represented by $\lambda_d$, $\gamma$ is the body effect coefficient, and $m$ is the subthreshold slope factor. The thermal voltage is represented by $V_T$, while $V_{th}$ stands for the nominal threshold voltage. According to simulation results, $V_X \approx 10\%$ of $V_{dd}$. Each $V_X$ term multiplied by the small DIBL coefficient (whose magnitude ranges from roughly $0.01$ to $0.2$ in current bulk technologies) can then be approximated as $\sim 0$. Moreover, since $V_{dd} - V_X$ spans several thermal voltages, $e^{-\frac{V_{dd} - V_X}{V_T}} \approx 0$. We use the symbol

$$\alpha = e^{\frac{-\lambda_d V_{dd}}{mV_T}} \tag{3}$$

as well as the fact that $m = 1 + \gamma$, to further simplify calculations. Rewriting the two current equations and equating them yields the following relationship:

$$\alpha W_U e^{\frac{-V_X}{V_T}} = W_L \left(1 - e^{-\frac{V_X}{V_T}}\right). \tag{4}$$

Solving for $V_X$ and using the definition $V_T = kT/q$ gives us

$$V_X = \frac{kT}{q} \ln \left(1 + \frac{\alpha W_U}{W_L}\right). \tag{5}$$
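The approximations that lead from (1) and (2) to the closed-form $V_X$ can be checked numerically. The sketch below solves the unsimplified current equations (1) and (2) for $V_X$ by bisection and compares the result with $V_X = (kT/q)\ln(1 + \alpha W_U/W_L)$. All device parameters ($\lambda_d = -0.08$, $\gamma = 0.2$, $V_{th} = 0.4$ V, $T = 300$ K) are hypothetical placeholders, not PTM values, and the common prefactor of the current equations is dropped, as in the text.

```python
import math

# Hypothetical device parameters (placeholders, NOT extracted from the PTM models)
K_B = 1.380649e-23       # Boltzmann constant, J/K
Q_E = 1.602176634e-19    # elementary charge, C
T = 300.0                # temperature, K
VT = K_B * T / Q_E       # thermal voltage kT/q (about 25.9 mV)
LAMBDA_D = -0.08         # DIBL coefficient (negative, as in the text)
GAMMA = 0.2              # body-effect coefficient
M = 1.0 + GAMMA          # slope factor, m = 1 + gamma
VTH = 0.4                # nominal threshold voltage, V


def i_upper(w_u, vdd, vx):
    """Upper NMOS, eq. (1): V_GS = V_DS = Vdd - Vx, V_SB = Vx (common factor dropped)."""
    vds = vdd - vx
    expo = (vdd - vx - VTH - LAMBDA_D * vds - GAMMA * vx) / (M * VT)
    return w_u * math.exp(expo) * (1.0 - math.exp(-vds / VT))


def i_lower(w_l, vdd, vx):
    """Lower NMOS, eq. (2): V_GS = Vdd, V_DS = Vx, source grounded."""
    expo = (vdd - VTH - LAMBDA_D * vx) / (M * VT)
    return w_l * math.exp(expo) * (1.0 - math.exp(-vx / VT))


def solve_vx(w_u, w_l, vdd, iters=200):
    """Bisection on I_U - I_L, which falls monotonically as Vx rises."""
    lo, hi = 0.0, vdd
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if i_upper(w_u, vdd, mid) > i_lower(w_l, vdd, mid):
            lo = mid  # upper device still stronger, so the node settles higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


def vx_closed_form(w_u, w_l, vdd):
    """Closed-form V_X = (kT/q) ln(1 + alpha W_U / W_L), alpha from eq. (3)."""
    alpha = math.exp(-LAMBDA_D * vdd / (M * VT))
    return VT * math.log(1.0 + alpha * w_u / w_l)


for vdd in (0.2, 0.3):
    exact = solve_vx(1.0, 1.0, vdd)
    approx = vx_closed_form(1.0, 1.0, vdd)
    print(f"Vdd = {vdd:.1f} V: numerical Vx = {exact * 1e3:.2f} mV, "
          f"closed form = {approx * 1e3:.2f} mV")
```

With these placeholder values the two results differ by less than 10%; nearly all of the gap comes from the $\lambda_d V_X$ terms that the closed form neglects, while the $e^{-(V_{dd} - V_X)/V_T}$ term contributes only about 0.1%.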

Substituting this $V_X$ into $I_U$ and defining $W_T = W_U + W_L$ to eliminate $W_L$ results in the following current equation:

$$I_U = I_L = \frac{\alpha W_U (W_T - W_U)}{\alpha W_U + W_T - W_U} e^{\frac{V_{dd} - V_{th}}{mV_T}}. \tag{6}$$

We find the optimal size for $W_U$ by setting $\partial I_U / \partial W_U$ equal to zero, which reduces to $(W_T - W_U)^2 = \alpha W_U^2$. Again using our definition of $W_T$, we then find the optimal size for $W_L$. This derivation results in the following equations:

$$W_U = \frac{W_T}{1 + \sqrt{\alpha}} \tag{7}$$

$$W_L = \frac{\sqrt{\alpha}\, W_T}{1 + \sqrt{\alpha}}. \tag{8}$$
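The closed-form optimum can be checked against a brute-force search over the two-stack current (6). The sketch below drops the common exponential factor and uses a hypothetical $\alpha = 1.5$, chosen only to be of the same size as the values implied by Table I.

```python
import math


def stack_current(w_u, w_t, alpha):
    """Two-stack current from eq. (6), common exponential factor dropped.

    Equivalent to a series combination of alpha*W_U and W_L = W_T - W_U.
    """
    w_l = w_t - w_u
    return alpha * w_u * w_l / (alpha * w_u + w_l)


def optimal_widths(w_t, alpha):
    """Closed form: W_U = W_T/(1+sqrt(alpha)), W_L = sqrt(alpha) W_T/(1+sqrt(alpha))."""
    w_u = w_t / (1.0 + math.sqrt(alpha))
    return w_u, w_t - w_u


def grid_optimum(w_t, alpha, steps=100_000):
    """Brute-force maximization of the stack current over W_U in (0, W_T)."""
    best_k = max(range(1, steps), key=lambda k: stack_current(k * w_t / steps, w_t, alpha))
    return best_k * w_t / steps


alpha = 1.5  # hypothetical value, similar in size to those implied by Table I
w_u, w_l = optimal_widths(1.0, alpha)
print(f"closed form: W_U = {w_u:.4f}, W_L = {w_l:.4f}, W_L/W_U = {w_l / w_u:.4f}")
print(f"grid search: W_U = {grid_optimum(1.0, alpha):.4f}")
```

Writing the current as a series combination makes the optimum easy to see: maximizing it is the same as minimizing $1/(\alpha W_U) + 1/(W_T - W_U)$.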

According to these results, we expect to drive a higher current through the two-transistor stack when the lower device is larger than the upper transistor by a factor of $\sqrt{\alpha}$. For example, with an NMOS stack in 90-nm PTM technology, when using a $W_U$ of 1 $\mu$m, the optimal $W_L$ would be 1.23 $\mu$m at $V_{dd} = 0.2$ V, and 1.30 $\mu$m at $V_{dd} = 0.3$ V. As shown in (3), $\alpha$ is a function of $V_{dd}$, resulting in the different optimal width ratios for different $V_{dd}$ values.
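The quoted example can be cross-checked against Table I. Squaring the width ratios 1.23 and 1.30 gives $\alpha \approx 1.51$ and $1.69$, so the two-stack sizing factor $1 + \alpha$ from Section II-B should be about 2.51 and 2.69, in line with the 90-nm theory entries of 2.52 and 2.70 once rounding of the quoted widths is allowed for. The arithmetic is:

```python
# Optimal W_L / W_U = sqrt(alpha) quoted for a 90-nm NMOS stack with W_U = 1 um
sqrt_alpha = {0.2: 1.23, 0.3: 1.30}
# Two-stack sizing factor 1 + alpha listed as "theory" for 90 nm in Table I
table_i_theory_90nm = {0.2: 2.52, 0.3: 2.70}

for vdd, ratio in sqrt_alpha.items():
    alpha = ratio ** 2
    print(f"Vdd = {vdd} V: alpha = {alpha:.3f}, 1 + alpha = {1 + alpha:.3f}, "
          f"Table I = {table_i_theory_90nm[vdd]:.2f}")
```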

HSPICE simulations using 45- to 130-nm PTM technology files closely match the results of our derivation and verify that the benefit of using the $\sqrt{\alpha}$ sizing ratio is more pronounced for larger $\alpha$ values (i.e., when the supply voltage is larger). PMOS transistor stacks exhibited the same sizing trends: optimal sizing requires the upper transistor (adjacent to the power supply) to be sized up by a factor of $\sim \sqrt{\alpha}$.

Fig. 1. DC current in stacks of two devices for a range of $W_U : W_L$ sizing ratios. The total width of the stacked devices is held constant at 1 $\mu$m. The small benefits derived by using skewed stack sizing are indicated in the upper corners of the plots. (a) NMOS $W_U : W_L$ ratio. (b) PMOS $W_U : W_L$ ratio.

TABLE I
NMOS STACK SIZING FACTORS

<table>
<thead>
<tr>
<th>Vdd</th>
<th>Sizing Method</th>
<th>130nm</th>
<th>90nm</th>
<th>65nm</th>
<th>45nm</th>
</tr>
</thead>
<tbody>
<tr>
<td>0.2V</td>
<td>simulation</td>
<td>2.19</td>
<td>2.30</td>
<td>2.42</td>
<td>2.66</td>
</tr>
<tr>
<td></td>
<td>theory</td>
<td>2.39</td>
<td>2.52</td>
<td>2.67</td>
<td>3.04</td>
</tr>
<tr>
<td>0.3V</td>
<td>simulation</td>
<td>2.27</td>
<td>2.44</td>
<td>2.64</td>
<td>3.11</td>
</tr>
<tr>
<td></td>
<td>theory</td>
<td>2.50</td>
<td>2.70</td>
<td>2.93</td>
<td>3.57</td>
</tr>
<tr>
<td>1.2V</td>
<td>simulation</td>
<td>1.58</td>
<td>1.60</td>
<td>1.63</td>
<td>1.69</td>
</tr>
</tbody>
</table>

TABLE II
PMOS STACK SIZING FACTORS

<table>
<thead>
<tr>
<th>Vdd</th>
<th>Sizing Method</th>
<th>130nm</th>
<th>90nm</th>
<th>65nm</th>
<th>45nm</th>
</tr>
</thead>
<tbody>
<tr>
<td>0.2V</td>
<td>simulation</td>
<td>2.33</td>
<td>2.48</td>
<td>2.68</td>
<td>3.00</td>
</tr>
<tr>
<td></td>
<td>theory</td>
<td>2.45</td>
<td>2.66</td>
<td>2.90</td>
<td>3.34</td>
</tr>
<tr>
<td>0.3V</td>
<td>simulation</td>
<td>2.60</td>
<td>2.85</td>
<td>3.20</td>
<td>3.95</td>
</tr>
<tr>
<td></td>
<td>theory</td>
<td>2.57</td>
<td>2.88</td>
<td>3.28</td>
<td>4.13</td>
</tr>
<tr>
<td>1.2V</td>
<td>simulation</td>
<td>1.98</td>
<td>2.08</td>
<td>2.05</td>
<td>2.15</td>
</tr>
</tbody>
</table>

B. Optimal Two-Stack Sizing Factor

Because skewed sizing yields only small benefits (Fig. 1), we use a 1:1 ratio for the two devices in a stack. We must then find the amount by which they should be sized up to drive the same current as a single transistor. Defining \( W = W_U = W_L \) as the size of each transistor in the stack, we can modify (6) as follows:

\[
I_U = I_L = \frac{\alpha W \cdot W}{\alpha W + W} e^{\frac{V_{dd} - V_{th}}{mV_T}} = \frac{\alpha}{1 + \alpha} W e^{\frac{V_{dd} - V_{th}}{mV_T}} . \tag{9}
\]

For a single transistor, whose drain-source voltage is \( V_{dd} \) and which has no body effect, the current equation is

\[
I = W_{eff} e^{\frac{V_{dd} - V_{th} - \lambda_d V_{dd}}{mV_T}} = \alpha W_{eff} e^{\frac{V_{dd} - V_{th}}{mV_T}} \tag{10}
\]

where \( W_{eff} \) stands for the effective width of this device. From (9) and (10), we have the following relationship:

\[
\alpha W_{eff} = \frac{\alpha}{1 + \alpha} W \Rightarrow W_{eff} = \frac{1}{1 + \alpha} W . \tag{11}
\]

According to this equation, each of two stacked transistors should be sized up by a factor of \( 1 + \alpha \) relative to a single device for the same current drivability. Tables I and II compare the \( 1 + \alpha \) sizing factors predicted by this theory with those found in simulation, supporting the validity of (11). DC simulations were performed to find the sizing at which a stack conducts the same amount of current as a single unit-sized device. The sizing factors found in simulation were generally slightly smaller than those predicted by the theory derived above, due to effects not captured by current equations (1) and (2), but the trend with technology scaling is nearly identical in both cases.
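The equivalence in (11) can be checked directly: a two-stack whose devices are each \( 1 + \alpha \) times wider than a unit device should carry exactly the unit device's current. The sketch below drops the common exponential factor of (9) and (10); the \( \alpha \) values are hypothetical.

```python
import math


def two_stack_current(w, alpha):
    """Eq. (9), common exponential factor dropped: two stacked devices of width W."""
    return alpha * w * w / (alpha * w + w)  # simplifies to alpha * W / (1 + alpha)


def single_current(w_eff, alpha):
    """Eq. (10), common exponential factor dropped: one device of width W_eff."""
    return alpha * w_eff


def two_stack_sizing_factor(alpha):
    """Eq. (11): each stacked device is (1 + alpha) times the single-device width."""
    return 1.0 + alpha


for alpha in (1.2, 1.5, 2.0):  # hypothetical alpha values
    w = two_stack_sizing_factor(alpha)  # unit single device: W_eff = 1
    print(alpha, w, math.isclose(two_stack_current(w, alpha), single_current(1.0, alpha)))
```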

Results indicate that stacks need to be sized up by a larger amount in the subthreshold region compared to the strong inversion region. Also note that NMOS stack sizing factors are significantly smaller in strong inversion due to velocity saturation.

III. ARBITRARY STACK SIZES

A. Proof of the Symmetry of the Lowest \( n - 1 \) Device Widths in an \( n \)-Stack

Building an extensive cell library based on this stack sizing framework requires extending our work to stacks of three or more devices. The derivation of the current equation for a three-stack, which follows a method similar to that of Section II-A, gives the following result:

\[
I = \alpha \left[ \frac{(W_T - W_1 - W_2) W_1 W_2}{\alpha (W_T - W_1 - W_2)(W_1 + W_2) + W_1 W_2} \right] e^{\frac{V_{dd} - V_{th}}{mV_T}} . \tag{12}
\]

\( W_1 \) and \( W_2 \) stand for the widths of the two lower transistors in the stack of NMOS devices (see notation in Fig. 2). \( W_T \) is defined as \( W_T = W_1 + W_2 + W_3 \) and is used to eliminate \( W_3 \), the width of the upper device. This equation is symmetric with respect to the widths of the \( W_1 \) and \( W_2 \) transistors, indicating that the optimal sizes for the lower two devices in the stack are equal. We now extend this finding through a straightforward direct proof, which confirms the symmetry of the lower \( n - 1 \) transistor widths in a general \( n \)-stack achieving maximum drive current.
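The symmetry of (12) in \( W_1 \) and \( W_2 \) can be seen numerically. The sketch below evaluates (12) with the common exponential factor dropped, confirms that swapping \( W_1 \) and \( W_2 \) leaves the current unchanged, and runs a coarse grid search over \( (W_1, W_2) \) with \( W_T = 1 \). The value \( \alpha = 1.5 \) is hypothetical.

```python
import itertools


def three_stack_current(w1, w2, w_t, alpha):
    """Eq. (12), common exponential factor dropped; W_3 = W_T - W_1 - W_2."""
    w3 = w_t - w1 - w2
    return alpha * w3 * w1 * w2 / (alpha * w3 * (w1 + w2) + w1 * w2)


def grid_optimum(w_t, alpha, steps=400):
    """Brute-force search over (W_1, W_2), keeping all three widths positive."""
    grid = [k * w_t / steps for k in range(1, steps)]
    pairs = ((a, b) for a, b in itertools.product(grid, grid) if a + b < w_t)
    return max(pairs, key=lambda p: three_stack_current(p[0], p[1], w_t, alpha))


alpha = 1.5  # hypothetical
# Swapping W_1 and W_2 leaves the current unchanged
print(three_stack_current(0.2, 0.5, 1.0, alpha), three_stack_current(0.5, 0.2, 1.0, alpha))
w1, w2 = grid_optimum(1.0, alpha)
print(f"grid optimum: W_1 = {w1:.4f}, W_2 = {w2:.4f}")
```

The search settles at \( W_1 \approx W_2 \approx 0.355 \), which agrees with \( \sqrt{\alpha}\, W_T/(1 + 2\sqrt{\alpha}) \), the value that (25) gives for \( n = 3 \).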

The following equations hold for the drive current through the transistors in an \( n \)-stack:

\[
I_n = \alpha W_n \beta \nu_{n-1} \tag{13}
\]

\[
I_{n-1} = W_{n-1} \beta (\nu_{n-2} - \nu_{n-1}) \tag{14}
\]

\[
\vdots
\]

\[
I_3 = W_3 \beta (\nu_2 - \nu_3) \tag{15}
\]

\[
I_2 = W_2 \beta (\nu_1 - \nu_2) \tag{16}
\]

\[
I_1 = W_1 \beta (1 - \nu_1) \tag{17}
\]

The \( \nu_i \) variables are shorthand for \( e^{-V_i/V_T} \), where \( V_i \) is the voltage of the node between transistors \( i \) and \( i + 1 \), and \( \beta \) stands for \( e^{(V_{dd} - V_{th})/(mV_T)} \).

Step 1) By setting (16) equal to (17), we can show that

\[
\nu_1 = \frac{W_1 + W_2 \nu_2}{W_1 + W_2} . \tag{18}
\]

Step 2) Next, by setting (15) equal to (16), substituting (18), and solving for \( \nu_2 \), we have

\[
\nu_2 = \frac{W_3 \nu_3 + W_{2\|1}}{W_3 + W_{2\|1}} \tag{19}
\]

where \( W_{2\|1} = W_2 W_1/(W_2 + W_1) \) is called the parallel combination of \( W_2 \) and \( W_1 \).
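In other words, the lower two devices behave as a single device of width \( W_{2\|1} \), carrying current \( W_{2\|1} \beta (1 - \nu_2) \). Repeating this argument up the stack, the lower \( k \) devices always carry \( W_{[k]\|1} \beta (1 - \nu_k) \). Setting the current through device \( n - 1 \), \( W_{n-1} \beta (\nu_{n-2} - \nu_{n-1}) \), equal to that through the lower \( n - 2 \) devices, we find

\[
\nu_{n-2} = \frac{\nu_{n-1} W_{n-1} + W_{[n-2]\|1}}{W_{n-1} + W_{[n-2]\|1}} \tag{20}
\]

where \( W_{[n-2]\|1} \) is the parallel combination of transistors 1 through \( n - 2 \).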

Step 3) Finally, setting (13) equal to (14) and using (20), we can solve for \( \nu_{n-1} \):

\[
\nu_{n-1} = \frac{W_{[n-1]\|1}}{\alpha W_n + W_{[n-1]\|1}}. \tag{21}
\]

We now have the following current equation:

\[
I_n = \alpha W_n \beta \nu_{n-1} = \beta \left[ \frac{(\alpha W_n) W_{[n-1]\|1}}{\alpha W_n + W_{[n-1]\|1}} \right]. \tag{22}
\]
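The stepwise derivation can be replayed numerically. The sketch below assumes the device model of (13)-(17): the top device carries \( \alpha W_n \beta \nu_{n-1} \), device \( k \) carries \( W_k \beta (\nu_{k-1} - \nu_k) \), and the bottom device carries \( W_1 \beta (1 - \nu_1) \). It computes \( \nu_{n-1} \) from (21), walks down the stack with the pattern of (18)-(20), and checks that every device carries the current given by (22). The widths and \( \alpha \) are hypothetical, and \( \beta = 1 \).

```python
def parallel(widths):
    """Parallel combination used in the text: W_[k]||1 = 1 / sum(1 / W_i)."""
    return 1.0 / sum(1.0 / w for w in widths)


def node_nus(widths, alpha):
    """Return [nu_1, ..., nu_{n-1}], nu_i = exp(-V_i / V_T), via eqs. (18)-(21).

    widths[0] is W_1 (next to ground) and widths[-1] is W_n (top device).
    """
    n = len(widths)
    nu = [0.0] * n                   # nu[k] holds nu_k for k = 1 .. n-1
    p = parallel(widths[: n - 1])
    nu[n - 1] = p / (alpha * widths[n - 1] + p)      # eq. (21)
    for k in range(n - 1, 1, -1):    # eq. (20) pattern, walking down the stack
        w_k = widths[k - 1]
        p_below = parallel(widths[: k - 1])
        nu[k - 1] = (nu[k] * w_k + p_below) / (w_k + p_below)
    return nu[1:]


def device_currents(widths, alpha):
    """Eqs. (13)-(17) with beta = 1: current through each device, bottom to top."""
    n = len(widths)
    nu = [1.0] + node_nus(widths, alpha)             # nu_0 = 1 at the grounded source
    currents = [widths[k] * (nu[k] - nu[k + 1]) for k in range(n - 1)]
    currents.append(alpha * widths[-1] * nu[n - 1])  # top device, eq. (13)
    return currents


def closed_form(widths, alpha):
    """Eq. (22) with beta = 1."""
    a_wn, p = alpha * widths[-1], parallel(widths[:-1])
    return a_wn * p / (a_wn + p)


widths = [1.7, 1.3, 1.1, 0.9]   # hypothetical W_1 .. W_4
alpha = 1.5                     # hypothetical
print(device_currents(widths, alpha))
print(closed_form(widths, alpha))
```

All four device currents come out equal to the value of (22), which is the consistency condition the proof relies on.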

Defining \( W_T = \sum_{i=1}^{n} W_i \) and substituting for \( W_n \) in (22), we get

\[
I_n = \beta \left[ \frac{\alpha \left( W_T - \sum_{i=1}^{n-1} W_i \right) W_{[n-1]\|1}}{\alpha \left( W_T - \sum_{i=1}^{n-1} W_i \right) + W_{[n-1]\|1}} \right]. \tag{23}
\]

An examination of (23) shows that the variables \( W_1 \) through \( W_{n-1} \) appear symmetrically in the expression. Therefore, when \( I_n \) is optimized, \( W_1 \) through \( W_{n-1} \) must have identical values, since setting the partial derivative of \( I_n \) with respect to each \( W_i \) (\( i = 1, \ldots, n - 1 \)) to zero results in a symmetric set of \( n - 1 \) equations.

B. Optimal \( n \)-Stack Sizing Factor

Given the symmetry of the lower \( n - 1 \) device sizes, i.e., \( W_1 = W_2 = \cdots = W_{n-1} = W_X \), so that \( W_{[n-1]\|1} = W_X/(n-1) \), we have the following general form for \( I_n \) in an \( n \)-stack:

\[
I_n = \beta \left[ \frac{\alpha \left( W_T - (n-1) W_X \right) W_X}{\alpha (n-1) \left( W_T - (n-1) W_X \right) + W_X} \right]. \tag{24}
\]

To optimize \( I_n \), we set \( \partial I_n / \partial W_X = 0 \) to obtain

\[
W_X = \frac{(\alpha n - \alpha - \sqrt{\alpha})}{(\alpha n^2 - 2\alpha n + \alpha - 1)} W_T. \tag{25}
\]

Using the definition of \( W_T \), i.e., \( W_n = W_T - (n - 1) W_X \), the ratio of each lower device width to the top device width is

\[
\frac{W_X}{W_n} = \frac{W_X}{W_T - (n-1) W_X} = \left[ \frac{\alpha n^2 - 2\alpha n + \alpha - 1}{\alpha n - \alpha - \sqrt{\alpha}} - (n - 1) \right]^{-1} = \sqrt{\alpha}. \tag{26}
\]

Thus, we have proven that the \( \sqrt{\alpha} \) sizing ratio holds for the general \( n \)-stack case.
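The result can be checked by comparing (25) with a brute-force maximization of (24) and confirming that the ratio in (26) equals \( \sqrt{\alpha} \) for several stack heights. The sketch below assumes \( n - 1 \) equal lower devices of width \( W_X \), so that \( W_{[n-1]\|1} = W_X/(n-1) \), sets \( \beta = 1 \) and \( W_T = 1 \), and uses a hypothetical \( \alpha = 1.5 \).

```python
import math


def n_stack_current(w_x, w_t, n, alpha):
    """Eq. (24), beta = 1: n-1 lower devices of width W_X, top device W_T - (n-1) W_X."""
    w_n = w_t - (n - 1) * w_x
    return alpha * w_n * w_x / (alpha * (n - 1) * w_n + w_x)


def w_x_closed_form(w_t, n, alpha):
    """Eq. (25); singular when alpha * (n - 1)**2 == 1."""
    num = alpha * n - alpha - math.sqrt(alpha)
    den = alpha * n**2 - 2 * alpha * n + alpha - 1
    return num / den * w_t


def w_x_grid(w_t, n, alpha, steps=100_000):
    """Brute-force maximization of eq. (24) over 0 < W_X < W_T / (n - 1)."""
    upper = w_t / (n - 1)
    best_k = max(range(1, steps), key=lambda k: n_stack_current(k * upper / steps, w_t, n, alpha))
    return best_k * upper / steps


alpha = 1.5  # hypothetical
for n in (2, 3, 4, 5):
    w_x = w_x_closed_form(1.0, n, alpha)
    w_n = 1.0 - (n - 1) * w_x
    print(n, round(w_x, 5), round(w_x_grid(1.0, n, alpha), 5), round(w_x / w_n, 5))
```

One design note: as written, (25) is singular when \( \alpha (n-1)^2 = 1 \) (for example, \( \alpha = 1 \) in a two-stack). Cancelling the common factor \( \sqrt{\alpha}(n-1) - 1 \) gives the equivalent form \( W_X = \sqrt{\alpha}\, W_T/(1 + (n-1)\sqrt{\alpha}) \), which avoids the singularity.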

As in the two-transistor stack case, the \( \sqrt{\alpha} \) scaling factor leads to a trivial performance benefit (e.g., a 0.3\% increase in current through a PMOS or NMOS stack in 90-nm technology with a total stack width of \( 1 \mu m \)), so sizing all stacked transistors equally is the best choice in terms of overall design complexity. Using (24) and following the example of (11), we find that each device in an \( n \)-stack should then be scaled up by a factor of \( [1 + \alpha (n - 1)] \) to set the effective width of the stack equal to that of a single unit transistor (see Fig. 2). All of this analysis applies to PMOS stacks in the same manner. The discrepancies between the larger sizing factors predicted by this theory and those found in simulation become slightly more pronounced as the stack size grows. For three-transistor PMOS stacks, the difference stays within roughly 4\%–7\%, while for large \( \alpha \) values, NMOS sizing factors are overestimated by up to \( \sim 15\% \) due to second-order effects not captured in (1) and (2).
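The \( [1 + \alpha (n - 1)] \) rule can be evaluated and checked in a few lines. The sketch below sets every device width to \( W \) in (24), so that \( I_n = \alpha W/(1 + \alpha (n-1)) \), and confirms that the scaled stack carries the current \( \alpha \) of a unit single device from (10). The value \( \alpha = 1.51 \) is an assumption taken from the 90-nm, 0.2-V NMOS theory entry of Table I (\( 1 + \alpha = 2.52 \)).

```python
def n_stack_equal_current(w, n, alpha):
    """Eq. (24) with all n devices equal (W_X = W_n = W), beta = 1."""
    return alpha * w * w / (alpha * (n - 1) * w + w)  # = alpha * W / (1 + alpha (n - 1))


def n_stack_sizing_factor(n, alpha):
    """Per-device width multiplier so an n-stack matches one unit device."""
    return 1.0 + alpha * (n - 1)


alpha = 1.51  # assumed; roughly the 90-nm, 0.2-V NMOS theory value in Table I
for n in (2, 3, 4):
    s = n_stack_sizing_factor(n, alpha)
    # A unit single device carries alpha * 1 (eq. (10) with W_eff = 1)
    print(n, round(s, 2), n_stack_equal_current(s, n, alpha) / alpha)
```

Under this model the per-device factor grows linearly with stack height (about 2.51, 4.02, and 5.53 for two-, three-, and four-stacks), which is why the theory-versus-simulation discrepancies noted above matter more for taller stacks.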

IV. SIMULATION RESULTS

A. Critical Path: A Chain of Stacks

We tested our sizing with 130-, 90-, 65-, and 45-nm PTM simulations using simple chains of logic gates that are representative of those that may be found in the critical paths of ultra-low-power circuits. In order to isolate the benefits of using the larger stack sizing in subthreshold operation, a consistent beta ratio (PMOS to NMOS width ratio) of 1.5 was employed across all simulations. This nominal value is close to that used in simulations of advanced CMOS processes. The sizing factors found with dc simulations as described in Section II-B were used; these simulation-derived factors closely match our theoretical results, as stated earlier.

The logical effort sizing method was used as a straightforward means of quickly optimizing the delay through a logic path [7]. Logical effort is defined as the ratio of the input capacitance of a gate to that of an inverter driving the same amount of output current. Fig. 3 displays logical effort values based on our stack sizing parameters, as well as the corresponding parasitic delay values.
Parasitic delay represents the delay of a gate driving no load, and is set by the parasitic junction capacitance.
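The effect of the stack factors on logical effort can be estimated with the standard definition above. The sketch below is a minimal estimate, assuming gate capacitance proportional to total gate width, a unit inverter with NMOS width 1 and PMOS width 1.5 (the beta ratio used in the simulations), and each stacked device scaled by the 90-nm two-stack factors from the simulation rows of Tables I and II. The resulting numbers are illustrations under those assumptions, not the values plotted in Fig. 3.

```python
# Hypothetical model: gate capacitance proportional to total gate width, and a
# unit inverter with NMOS width 1 and PMOS width BETA (equal rise/fall drive).
BETA = 1.5  # PMOS-to-NMOS width ratio used in the simulations


def inverter_cin():
    """Input capacitance of the unit inverter (arbitrary units)."""
    return 1.0 + BETA


def nand2_effort(s_n):
    """NAND2: two stacked NMOS scaled by s_n, two parallel PMOS of width BETA."""
    return (s_n + BETA) / inverter_cin()


def nor2_effort(s_p):
    """NOR2: two parallel NMOS of width 1, two stacked PMOS of width s_p * BETA."""
    return (1.0 + s_p * BETA) / inverter_cin()


# 90-nm two-stack factors (NMOS, PMOS) from the simulation rows of Tables I and II
factors = {"1.2 V": (1.60, 2.08), "0.3 V": (2.44, 2.85), "0.2 V": (2.30, 2.48)}
for vdd, (s_n, s_p) in factors.items():
    print(f"{vdd}: g_NAND2 = {nand2_effort(s_n):.3f}, g_NOR2 = {nor2_effort(s_p):.3f}")
```

With the textbook values (stack factor 2, beta ratio 2), the same formulas give the familiar 4/3 for NAND2 and 5/3 for NOR2; the larger subthreshold stack factors raise both efforts, which is the extra input loading discussed below.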

While the additional loading on previous stages created by the larger stack sizes can degrade the performance of some logic chains, critical paths driving substantial fan-out capacitance, and particularly paths dominated by stacks, do benefit from this sizing. The simple circuit illustrated in Fig. 4 is an example of a critical path whose delay is improved with our stack sizing framework. Delay results for this circuit at $V_{dd} = 0.3 \text{ V}$ and $V_{dd} = 0.2 \text{ V}$ are shown in Tables III and IV, respectively. As indicated there, the critical path shifts from the Stacks path to the Fast path when using the optimized subthreshold sizing, and the critical delay is consistently reduced. Also note that the 1.2-V sizing scheme was optimal when operating in strong inversion, with improvements over subthreshold sizing performance ranging from $1\%$ to $12.3\%$.

In logic paths where there are not chains of stacks driving each other in sequence, the larger subthreshold stack sizing becomes less beneficial, or even detrimental in terms of performance, due to its loading effect on the previous stage. For instance, if inverters are inserted between each NAND/NOR pair in the circuit in Fig. 4, the subthreshold improvements from our larger stack sizes are reduced to $\sim 1\%$. In a chain of NAND gates alone, the smaller stack sizes used in superthreshold operation were generally better choices across all supply levels. In detailed optimization schemes, care must be taken to account for transient effects, including the variation of load capacitances as operating conditions change. DC sizing schemes such as the one presented here provide intuition about the devices from which circuits are constructed, and a starting point for thorough optimization procedures.

V. CONCLUSION

We have presented a new stack sizing framework for circuits operating in the subthreshold region. A closed-form solution for the optimal width ratio between different devices within a stack, as well as for the sizing factor of stacked transistors, was presented and shown to closely match simulation results. Our optimization scheme resulted in performance gains of up to $10.4\%$ in simulations of critical paths where internal node capacitance is not dominated by the increased stack sizing factors.

![Fig. 4. Representative chain of logic gates with FO4 at each output.](image)

TABLE III
DELAY RESULTS FOR THE CIRCUIT OF FIG. 4 AT Vdd = 0.3 V

<table>
<thead>
<tr>
<th rowspan="2">Technology</th>
<th colspan="2">Conventional 1.2V sizing</th>
<th colspan="2">Subthreshold 0.3V sizing</th>
</tr>
<tr>
<th>Delay</th>
<th>Crit. Path</th>
<th>Delay</th>
<th>Crit. Path</th>
</tr>
</thead>
<tbody>
<tr>
<td>130nm</td>
<td>14.86ns</td>
<td>Stacks</td>
<td></td>
<td></td>
</tr>
<tr>
<td>90nm</td>
<td>14.10ns</td>
<td>Stacks</td>
<td></td>
<td></td>
</tr>
<tr>
<td>65nm</td>
<td>16.14ns</td>
<td>Stacks</td>
<td></td>
<td></td>
</tr>
<tr>
<td>45nm</td>
<td>24.23ns</td>
<td>Stacks</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

TABLE IV
DELAY RESULTS FOR THE CIRCUIT OF FIG. 4 AT Vdd = 0.2 V

<table>
<thead>
<tr>
<th rowspan="2">Technology</th>
<th colspan="2">Conventional 1.2V sizing</th>
<th colspan="2">Subthreshold 0.2V sizing</th>
</tr>
<tr>
<th>Delay</th>
<th>Crit. Path</th>
<th>Delay</th>
<th>Crit. Path</th>
</tr>
</thead>
<tbody>
<tr>
<td>130nm</td>
<td>98.12ns</td>
<td>Stacks</td>
<td></td>
<td></td>
</tr>
<tr>
<td>90nm</td>
<td>96.25ns</td>
<td>Stacks</td>
<td></td>
<td></td>
</tr>
<tr>
<td>65nm</td>
<td>113.8ns</td>
<td>Stacks</td>
<td></td>
<td></td>
</tr>
<tr>
<td>45nm</td>
<td>174.6ns</td>
<td>Stacks</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

REFERENCES


