<table>
<thead>
<tr>
<th>Title</th>
<th>Design of a low power wide-band high resolution programmable frequency divider (Published version)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Author(s)</td>
<td>Yu, Xiao Peng; Do, Manh Anh; Jia, Lin; Ma, Jianguo; Yeo, Kiat Seng</td>
</tr>
<tr>
<td>Citation</td>
<td>Yu, X. P. (2005). Design of a low power wide-band high resolution programmable frequency divider. IEEE Transactions on Very Large Scale Integration (VLSI) Systems. 13(9), 1098-1103.</td>
</tr>
<tr>
<td>Date</td>
<td>2005</td>
</tr>
<tr>
<td>URL</td>
<td><a href="http://hdl.handle.net/10220/4569">http://hdl.handle.net/10220/4569</a></td>
</tr>
<tr>
<td>Rights</td>
<td>© 2006 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.</td>
</tr>
</tbody>
</table>
Considering (4) and the maximum theoretical SFDR of 90.5 dBc, \( W = 15 \) and \( D = 14 \) are reasonable choices. Fig. 3 illustrates the corrected design for the second-order approximation method. The phase accumulator output varies between 0 and 6270. The corrected system is simulated in MATLAB. The FFT calculation shows an SFDR of 84.6 dBc, which is the maximum achievable SFDR of the design and approximately satisfies (14).

3) To prove the high-resolution results, the roots of the first derivative of the error between \( T \cdot 42(x) \) and the ideal cosine function are graphically evaluated in [1] and it is shown that they coincide with the maximum and the minimum of the error function. Clearly, this statement is true for any continuous and differentiable function [10] and nothing can be concluded regarding the resolution of the system.

4) The authors in [1] claimed that their 13-bit output resolution is “more accurate than any prior work.” However, at least one ROM-less DDFS design offers better output resolution: the design introduced by Madisetti et al. achieves a 16-bit output resolution with an SFDR better than 100 dBc [11].

IV. CONCLUSION

The DDFS system based on the QAF method introduced in [1], whose SFDR is claimed to be 130 dBc, is revisited. It is proven that the theoretical limit of a DDFS design based on the QAF method is 90.5 dBc. MATLAB simulation shows that the actual SFDR obtained from a digital implementation of the method in [1] is 76 dBc, not 130 dBc. Finally, the entire design is corrected, and the corrected method is shown to achieve a practical SFDR of 84.6 dBc.

REFERENCES


Design of a Low Power Wide-Band High Resolution Programmable Frequency Divider

X. P. Yu, M. A. Do, L. Jia, J. G. Ma, and K. S. Yeo

Abstract—The design of a high-speed, wide-band, high-resolution programmable frequency divider is investigated. A new reloadable D flip-flop for high-speed programmable frequency dividers is proposed; it is optimized for propagation delay and power consumption compared with existing designs. Measurement results show that an all-stage programmable counter implemented with this D flip-flop in the Chartered 0.18-\(\mu\)m CMOS process operates up to 1.8 GHz at a 1.8-V supply voltage with a power consumption of 5.8 mW. Using this counter, an ultra-wide-range, high-resolution frequency divider with low power consumption is achieved for 5–6-GHz wireless LAN applications.

Index Terms—CMOS digital integrated circuits, flip-flops, frequency dividers, frequency synthesizers, phase locked loops.

I. INTRODUCTION

The frequency synthesizer is one of the basic building blocks in modern communication systems. Its operating frequency is limited by the frequency divider and the voltage-controlled oscillator. Channel selection in the frequency synthesizer demands programmable division ratios for the frequency divider. The integer-N frequency synthesizer is more practical, less costly, and has lower spurious sidebands than the fractional-N frequency synthesizer [1]. It is usually formed by a prescaler, a program counter (P counter), and a swallow counter (S counter), as shown in Fig. 1 [1]. Such a topology provides a programmable division ratio of $N \times P + S$, where $N$, $P$, and $S$ are the division ratios of the three blocks, respectively. The prescaler provides a dual modulus of $N/(N+1)$. The $P$ counter provides a fixed division ratio set by the required overall division ratio, while continuous division ratios from 3 to $2^n$ are achieved through the $S$ counter by periodically reloading its divide-by-2 stages, where $n$ is the number of stages of the $S$ counter. The continuous division ratio is used to select the desired channel.

Manuscript received March 9, 2005. The authors are with the Center for Integrated Circuits and Systems, School of Electrical and Electronic Engineering, Nanyang Technological University, 639798 Singapore (e-mail: xyup@pmail.ntu.edu.sg). Digital Object Identifier 10.1109/TVLSI.2005.857153
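
To see where $N \times P + S$ comes from, here is a minimal behavioral sketch in Python of a generic pulse-swallow divider, not of the circuit in Fig. 1. The function name `pulse_swallow_ratio` is hypothetical, and the model assumes the usual convention that the prescaler divides by $N+1$ for the first $S$ of its output pulses and by $N$ for the remaining $P - S$, which requires $S \le P$.

```python
def pulse_swallow_ratio(n: int, p: int, s: int) -> int:
    """Return the number of input (VCO) cycles per output pulse.

    Behavioral sketch of a generic pulse-swallow divider (hypothetical helper):
    the P counter counts p prescaler output pulses; during the first s of them
    the S counter holds the prescaler in divide-by-(n+1) mode, afterwards the
    prescaler divides by n. Assumes 0 <= s <= p so the S counter finishes first.
    """
    if not 0 <= s <= p:
        raise ValueError("pulse swallowing requires 0 <= S <= P")
    input_cycles = 0
    for prescaler_pulse in range(p):
        # Modulus control: N+1 while the swallow counter is still running.
        modulus = n + 1 if prescaler_pulse < s else n
        input_cycles += modulus
    return input_cycles


print(pulse_swallow_ratio(4, 258, 4))  # 1036 = 4*258 + 4
```

Counting cycle by cycle reproduces $S(N+1) + (P-S)N = NP + S$.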

Much research has focused on raising the operating frequency of the prescaler [2]. However, modern communication systems increasingly demand multi-standard operation. For example, the HIPERLAN I/II and IEEE 802.11a standards share the 5–6-GHz band but use different center frequencies and channel spacings. Although many frequency synthesizers for 5–6-GHz wireless LAN applications have been reported [1], [3], wide-band and high-resolution operation remain challenging. To satisfy these requirements, different reference frequencies and different arrangements of the $N$, $P$, and $S$ counters are selected for different applications [1], [4]. For example, only the UNII bands are covered in [1], [3]. In this paper, a new wide-band high-resolution programmable frequency divider is proposed. The wide band and high resolution are obtained by using an all-stage programmable topology in both counters.

II. DESIGN CONSIDERATIONS

A high-speed digital frequency divider is usually formed by cascading divide-by-2 stages. However, this method only gives division ratios that are powers of 2. A reloadable digital counter offers a better solution, providing division ratios continuously ranging from 3 to $2^n$, where $n$ is the number of divide-by-2 stages [5]. In a reloadable counter, input pulses are counted until the count equals a preset value, at which point the counter is reloaded. Changing the preset value gives a programmable division ratio. The counter is fully programmable if all of its divide-by-2 stages are reloadable. Each such reloadable stage is called a bitcell.
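
The reload behavior can be illustrated at the cycle level with the following Python sketch. It is a behavioral model only (the helper `reload_cycles` is hypothetical and ignores bit-level bitcell timing); the allowed preset range of 3 to $2^n$ is taken from the text and enforced, not derived.

```python
from typing import List


def reload_cycles(preset: int, n_stages: int, input_cycles: int) -> List[int]:
    """Input-cycle indices at which a reloadable counter reaches its preset.

    Behavioral sketch (hypothetical helper): each input pulse is accumulated;
    when the count reaches the preset value the counter is reloaded and one
    output pulse is produced. Bit-level timing of the bitcells is ignored.
    """
    if not 3 <= preset <= 2 ** n_stages:
        raise ValueError("preset must lie in [3, 2**n_stages]")
    count = 0
    reloads = []
    for cycle in range(input_cycles):
        count += 1               # accumulate one input pulse
        if count == preset:      # preset reached: reload and emit a pulse
            reloads.append(cycle)
            count = 0
    return reloads


print(reload_cycles(5, 3, 20))  # [4, 9, 14, 19]: one output per 5 inputs
```

The spacing between reloads equals the preset, so changing the preset changes the division ratio directly.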

In many existing integer-N frequency dividers, $N$ and $P$ are fixed while $S$ is varied to obtain the desired output frequencies [1]. If both the $P$ and $S$ counters are programmable, more division ratios can be obtained. Thus, for a desired operating band, high resolution can be obtained by using a smaller prescaler division ratio while widening the range of the counters' division ratios. However, in such a wide-band high-resolution frequency divider, the digital counter must operate at high frequencies within the constraints of low power consumption. Therefore, the design of a high-speed low-power counter is the key consideration.

![Fig. 1. An integer-N frequency divider.](image)

The key parameters of high-speed digital circuits are the operating frequency and the power consumption. The maximum toggle frequency of the individual bitcell is inversely proportional to the propagation delay [6].

As for power, the dominant source of power dissipation in digital CMOS circuits is switching power [6]:

$$P_{\text{switching}} = C_L V_d^2 f_{\text{clk}}$$

where $C_L$ is the load capacitance, $V_d$ is the supply voltage, and $f_{\text{clk}}$ is the input frequency.
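
A quick numerical reading of this expression is sketched below in Python. The 10-fF load is a hypothetical value chosen for illustration, not a figure from the paper, and, like the formula above, the sketch has no activity factor, so it assumes the load switches on every clock cycle.

```python
def switching_power(c_load_f: float, v_d: float, f_clk_hz: float) -> float:
    """Switching power P = C_L * V_d**2 * f_clk in watts (formula as given above)."""
    return c_load_f * v_d ** 2 * f_clk_hz


C_L = 10e-15  # hypothetical 10-fF load, for illustration only
p_18 = switching_power(C_L, 1.8, 1.8e9)
p_15 = switching_power(C_L, 1.5, 1.8e9)
print(f"{p_18 * 1e6:.1f} uW at 1.8 V, {p_15 * 1e6:.1f} uW at 1.5 V")
print(f"ratio = {p_15 / p_18:.3f}")  # equals (1.5/1.8)**2
```

At a fixed frequency, lowering the supply from 1.8 V to 1.5 V cuts switching power by the factor $(1.5/1.8)^2 \approx 0.69$, which is why supply voltage is the strongest lever in this expression.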

III. DESIGN OF A HIGH SPEED LOW POWER DIGITAL COUNTER

The toggled TSPC DFF, shown in Fig. 2, is the most popular divide-by-2 unit in single-ended frequency divider design because of its high operating frequency and low power consumption [7].

The toggled TSPC DFF can only function as a divide-by-2 stage. However, programmability in a frequency divider requires the bitcell to be reloadable. Fig. 3 shows the general block view of a reloadable bitcell in a programmable counter. If the reload signal $LD$ is logically low, the bitcell functions as a divide-by-2 stage. When $LD$ is logically high, the input data signal is disabled and the programmable bit signal $PI$ is loaded to the output. Because a reload can be triggered in any cycle, the total propagation delay, i.e., the sum of the divide-by-2 delay and the reload delay, determines the maximum operating frequency. Since the reload is triggered only once per division cycle, for a division ratio much larger than 2 the power consumption of the bitcell can be taken as its power consumption in divide-by-2 mode.
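
The two modes of the generic bitcell, and the first-order frequency bound implied by summing the two delays, can be sketched in Python as follows. This is a hypothetical behavioral model (the class and function names are ours, and one `clock()` call stands for one active clock edge), not a transistor-level description of any of the bitcells discussed below; the 200-ps and 300-ps delays are made-up example values.

```python
from dataclasses import dataclass


@dataclass
class ReloadableBitcell:
    """Behavioral model of the generic reloadable bitcell of Fig. 3.

    Hypothetical sketch: one call to clock() is one active clock edge.
    LD = 0 -> the cell toggles (divide-by-2); LD = 1 -> PI is loaded into Q.
    """
    q: int = 0

    def clock(self, ld: int, pi: int) -> int:
        self.q = pi if ld else 1 - self.q
        return self.q


def max_toggle_frequency(t_div_s: float, t_reload_s: float) -> float:
    """First-order bound: f_max ~ 1 / (divide-by-2 delay + reload delay)."""
    return 1.0 / (t_div_s + t_reload_s)


cell = ReloadableBitcell()
# Q toggles on every edge, so its period is two clock periods (divide-by-2).
print([cell.clock(ld=0, pi=0) for _ in range(4)])  # [1, 0, 1, 0]
print(cell.clock(ld=1, pi=1))                       # 1: PI loaded into Q
# Hypothetical delays of 200 ps and 300 ps give roughly 2 GHz.
print(max_toggle_frequency(200e-12, 300e-12) / 1e9)
```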

In [8], a novel reloadable TSPC-based bitcell is proposed, as shown in Fig. 4.

$Q$ is fed to node $D$ when the bitcell performs the divide-by-2 function. In this bitcell, the upper block, a traditional 9-transistor TSPC D flip-flop (DFF), functions as a divide-by-2 unit, while the lower part provides the reload function. When $LD$ is asserted, $PM5$ either pulls the output up to logic high or keeps it at logic low, depending on the value of $PI$. When $LD$ is logically low, the bitcell is a divide-by-2 unit; unlike a single TSPC unit, however, $C_{\text{dissipation}}$ is added to the load capacitance of the final stage. The other delay is the reload delay, defined as the propagation delay from $PI$ to the output. This path passes through three stages, so its delay is larger than that of the TSPC divide-by-2 operation. Compared with a toggled TSPC divide-by-2 bitcell, the additional power consumption of this bitcell comes from the switching power of the additional inverters and from short-circuit power in $P5$, because a direct path is established periodically as $N3$ switches. Moreover, $N3$ is also periodically switched by $Qbo f$. Finally, the bitcell has a complex structure with 30 MOS transistors.

A novel bitcell that integrates the reload function into the TSPC-based $D$ flip-flop was first proposed in [9], [10], as shown in Fig. 5. This bitcell greatly reduces the circuit complexity relative to the reloadable bitcell in [8]. However, it may suffer from glitches at high frequencies. Since the $LD$ signal is derived logically from the outputs of all stages of the counter, $LD$ lags behind $D$. Suppose the output has just changed from logic high to low and $LD$ then goes high to load a logic-high $PI$ to the output: because $LD$ arrives late, the output produces a glitch. In a programmable counter built from the bitcell of [9], such a glitch charges node $A$ of the next stage and causes an erroneous trigger. As a result, the bitcell is not suitable for high-speed applications.

A new bitcell with an improved architecture that eliminates the impact of the glitch is proposed, as shown in Fig. 6. $Q$ is fed to node $D$ when the bitcell performs the divide-by-2 function. It is modified from the bitcell proposed in [11] and is based on a method similar to that in [9]. The number of MOS transistors is reduced. Moreover, to achieve a higher operating frequency, the critical signal path is optimized by placing the $LD$- or $\overline{LD}$-controlled MOS transistors near the supply voltage or ground. In the proposed bitcell, during reload, $\overline{PI}$ is loaded to $Q$ instead of $S2$ to reduce the delay from $PI$ to the output. In addition, an $LD$-controlled feedback removes the erroneous charge in $S2$ during reload, so the charge at node $A$ has no impact on node $S2$. Therefore, the charging voltage at node $A$ cannot cause an erroneous trigger of the next bitcell, and the possible glitch in [9] has no impact on a frequency divider built from the proposed bitcell. The power consumption of the $LD$-controlled inverter is negligible since the reload is triggered only once per division cycle. The glitch problem of the bitcell in [11] is thus solved effectively without any trade-off.

The power consumption of the proposed bitcell can still be analyzed from the switching activity of its devices. When $LD$ is logically low, the right side is disabled while the left side is a toggled TSPC divide-by-2 unit, and the MOS devices on the right side act as load capacitors of that unit. The power increase relative to the other bitcells can therefore be calculated and simulated by adding these capacitances to those of a TSPC divide-by-2 unit. As for propagation delay, when $LD$ is logically low the bitcell functions as a divide-by-2 unit: the right side is blocked, while on the left side the $LD$- and $\overline{LD}$-controlled PMOS and NMOS transistors are constantly on and can be modeled as fixed resistors, i.e., as source resistances for the switching MOS transistors [6]. When $LD$ is logically high, the left side is blocked and $\overline{PI}$ is loaded to $Q$ through only one stage, the $LD$-controlled inverter. Even though the additional load capacitance increases the divide-by-2 delay, the total delay of the proposed bitcell remains lower than that of the previous bitcells.
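
Applying the switching-power expression $P_{\text{switching}} = C_L V_d^2 f_{\text{clk}}$ to this argument gives a first-order estimate. Here $C_{\text{TSPC}}$ and $C_{\text{right}}$ are our labels, not the paper's, for the switched capacitance of the bare TSPC divide-by-2 unit and for the extra load presented by the right-side devices; the once-per-cycle reload is neglected, as argued above.

$$P_{\text{bitcell}} \approx \left(C_{\text{TSPC}} + C_{\text{right}}\right) V_d^2 f_{\text{clk}}, \qquad \Delta P = P_{\text{bitcell}} - P_{\text{TSPC}} \approx C_{\text{right}}\, V_d^2 f_{\text{clk}}$$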

The performance of the three bitcells, namely the TSPC divide-by-2 unit, the bitcell in [8], and the proposed bitcell, is compared by simulation in Cadence SPECTRE RF for the Chartered 0.18-$\mu$m CMOS process. To minimize load capacitance in the high-speed digital circuit, transistor lengths are kept at the minimum (0.18 $\mu$m). For simplicity, the PMOS-to-NMOS width ratio is 2 [12]. Fig. 7 shows the calculated and simulated propagation delay and power consumption of the three bitcells for different transistor sizes. Here the propagation delay is the sum of the divide-by-2 and reload delays. The proposed bitcell consumes about 20% less power than the bitcell in [8]. Its total delay is much lower than that of the bitcell in [8] thanks to the large reduction in reload delay, even though its divide-by-2 delay is larger. Since the operating frequency is inversely proportional to the total propagation delay, an improvement of about 25% in operating frequency can be expected.
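
The link between the delay and frequency figures is simple arithmetic: with $f_{\max} \propto 1/t_{\text{pd}}$, where $t_{\text{pd}}$ is the total (divide-by-2 plus reload) delay, a 25% frequency gain corresponds to a total delay about 20% shorter. This is our back-calculation, not a delay figure stated in the paper.

$$\frac{f_{\max,\text{proposed}}}{f_{\max,[8]}} = \frac{t_{\text{pd},[8]}}{t_{\text{pd},\text{proposed}}} \approx 1.25 \quad\Longleftrightarrow\quad t_{\text{pd},\text{proposed}} \approx 0.8\, t_{\text{pd},[8]}$$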

These results are also verified on circuits fabricated in the same process. Fig. 8 shows the simulated and measured power consumption versus operating frequency for the three bitcells: the single TSPC divide-by-2 unit, the proposed bitcell, and the bitcell in [8]. Because of its glitch problem, the bitcell in [9] could not be simulated to provide comparable results. The input signal is a rail-to-rail square wave. The single TSPC divide-by-2 unit has the lowest power consumption, while the proposed bitcell consumes about 30% less power than the bitcell in [10]. The measurements for the three dies are carried out under the same input frequency and input swing, and the power consumption of the output buffer is excluded.

Fig. 7. Simulation results of the three bitcells.

Fig. 8. Measurement results of the three bitcells.

IV. SIMULATION AND MEASUREMENT RESULTS OF THE COUNTER

A programmable counter based on the proposed bitcell is constructed using the architecture proposed in [5]. To compare the proposed bitcell with the one used in the counter of [5], a six-stage counter is also designed in a 0.8-μm CMOS process. Simulation shows that the proposed counter works up to 1 GHz with a power consumption of 13.5 mW at a 5-V supply, whereas the counter in [5] achieves 723 MHz with 17.2 mW at the same supply voltage. The further simulations, layout, and fabrication of the proposed counter, however, use the Chartered 0.18-μm 1P6M CMOS process. The active area of the programmable counter is about 150 × 75 μm², and the total die size, including the test pads, is 300 × 400 μm². The test chip is shown in Fig. 9.

For testing, the divide-by-32/33 configuration (PI information of 100 001 and 100 000, respectively, for six stages) is used to show that the chip works properly, since the maximum operating frequency is determined by the total propagation delay of the first stage.

Fig. 10 shows the post-layout simulation and measurement results of power versus operating frequency for the programmable counter. In post-layout simulation with a 1.8-V supply, the counter reaches a maximum operating frequency of 2 GHz while dissipating about 4.7 mW. With a 1.5-V supply it operates up to 1.67 GHz and consumes 2.4 mW, as illustrated by the dashed circled line. Measurements are carried out on-wafer with an RF probe station; the input signal is provided by an HP E4433B 0.25 MHz–4 GHz signal generator, and the output signals are captured by a LeCroy WaveMaster 8600A 6-GHz oscilloscope. The measurements show that the proposed counter works up to 1.5 GHz at a 1.5-V supply with a power consumption of 3.5 mW (the power-hungry output buffer consumes 8 mW), and up to 1.8 GHz with a power consumption of 5.8 mW (buffer: 10 mW) at a 1.8-V supply. Fig. 11 shows the output transient waveform for the divide-by-33 function with a 1.8-GHz input signal. The proposed counter is the first all-stage programmable counter to achieve GHz operation with low power consumption. Its power/frequency ratio is 3.2 μW/MHz, compared with 28.6–34.3 μW/MHz for the counter in [8], derived from the measurements reported in [8] over its operating range.
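
The 3.2 μW/MHz figure is the measured power divided by the maximum operating frequency (5.8 mW / 1800 MHz). The short Python check below reproduces it and, for comparison, applies the same ratio to the other operating points quoted in this section. The helper name is hypothetical, and the extra rows are our arithmetic on the paper's numbers, not values reported by the authors.

```python
def power_per_mhz_uw(power_mw: float, freq_mhz: float) -> float:
    """Power/frequency figure of merit in uW/MHz."""
    return power_mw * 1000.0 / freq_mhz


# (label, power in mW, max frequency in MHz), values quoted in the text;
# measured powers exclude the output buffer.
cases = [
    ("0.18 um, measured, 1.8 V", 5.8, 1800.0),
    ("0.18 um, measured, 1.5 V", 3.5, 1500.0),
    ("0.18 um, post-layout, 1.8 V", 4.7, 2000.0),
    ("0.18 um, post-layout, 1.5 V", 2.4, 1670.0),
    ("0.8 um, simulated, proposed", 13.5, 1000.0),
    ("0.8 um, simulated, counter of [5]", 17.2, 723.0),
]
for label, p_mw, f_mhz in cases:
    print(f"{label:34s} {power_per_mhz_uw(p_mw, f_mhz):6.2f} uW/MHz")
```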

V. A WIDEBAND HIGH RESOLUTION FREQUENCY DIVIDER

A wide-range, high-resolution programmable frequency divider for wireless LAN applications can be built from the proposed counter using the structure in [1]. The prescaler can be implemented with any of the topologies in the literature [1], [2], [4]; in this implementation, the low-power divide-by-4/5 dynamic prescaler of [4] is used. The S counter has the same topology as the P counter, except that in its bitcells the $LD$ signal for the left part of the proposed bitcell is replaced with a STOP signal that provides the modulus control signal of the prescaler [9]. Both the $P$ and $S$ counters are implemented with the proposed bitcells and are therefore all-stage programmable. The resulting frequency divider for 5–6-GHz applications has a division ratio of $4 \times P + S$, which covers all the required frequencies of the wireless LAN standards in the 5–6-GHz range. A 5-MHz reference frequency is used so that all the center frequencies of the two standards are covered. The frequency divider consumes only 18 mW in post-layout simulation, showing that an ultrawide-range divider with low power consumption is achievable with the proposed programmable counter. This topology can also be used for 2.4-GHz standards such as IEEE 802.11b/g and Bluetooth.
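
As an illustration of how a channel maps onto the $4 \times P + S$ divider, the Python sketch below computes a division ratio and one valid $(P, S)$ pair for a few IEEE 802.11a center frequencies, which lie at $5000 + 5k$ MHz. Several assumptions are ours and are labeled in the code: the VCO runs at the channel frequency so that the ratio is $f_{\text{channel}}/5\,\text{MHz}$, $S$ is chosen in the window 3 to 6 (based on the 3-to-$2^n$ counter range stated earlier), and the pulse-swallow constraint $S \le P$ applies. The helper names are hypothetical, and this is not the authors' control logic.

```python
from typing import Tuple

F_REF_MHZ = 5    # reference frequency given in the text
N_PRESCALER = 4  # divide-by-4/5 prescaler, overall ratio 4*P + S


def split_ratio(ratio: int, n: int = N_PRESCALER, s_min: int = 3) -> Tuple[int, int]:
    """Hypothetical helper: find (P, S) with n*P + S == ratio.

    S is chosen in [s_min, s_min + n - 1] (s_min = 3 is an assumption based on
    the 3-to-2**n counter range) and must satisfy S <= P for pulse swallowing.
    """
    s = s_min + (ratio - s_min) % n
    p = (ratio - s) // n
    if s > p:
        raise ValueError(f"ratio {ratio} is too small for this prescaler")
    return p, s


def channel_setting(f_channel_mhz: int) -> Tuple[int, int, int]:
    """Division ratio and (P, S) for a channel, assuming f_out = ratio * f_ref."""
    if f_channel_mhz % F_REF_MHZ:
        raise ValueError("channel is not a multiple of the reference frequency")
    ratio = f_channel_mhz // F_REF_MHZ
    p, s = split_ratio(ratio)
    return ratio, p, s


# 802.11a channels 36 (5180 MHz) and 149 (5745 MHz).
for f in (5180, 5745):
    print(f, channel_setting(f))  # 5180 -> (1036, 258, 4); 5745 -> (1149, 286, 5)
```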

VI. CONCLUSION

The design difficulties of wide-band high-resolution programmable frequency dividers for multi-standard applications are investigated. A high-speed low-power counter is successfully implemented for multi-standard operation. Measurement results show that the first GHz all-stage programmable divider with low power consumption is achievable with the proposed bitcell. The application of this counter in a 5-GHz wide-range frequency divider is described.

REFERENCES

Level-Shifter Free Design of Low Power Dual Supply Voltage CMOS Circuits Using Dual Threshold Voltages

Abdulkadir Utku Diril, Yuvraj Singh Dhillon, Abhijit Chatterjee, and Adit D. Singh

Abstract—Using dual supply voltages in a digital circuit is an effective way to reduce dynamic power consumption because dynamic power depends quadratically on the supply voltage. However, the need for level shifters when a low-voltage gate drives a high-voltage gate has prevented the widespread use of dual supply voltages in digital circuit design. The overhead of level shifters forces designers to coarsen the granularity of dual-voltage assignment, reducing the maximum obtainable savings. We propose a method of incorporating voltage level conversion into regular CMOS gates by using a second threshold voltage. The proposed level-shifter design makes it possible to apply dual supply voltages at gate-level granularity with much less overhead than traditional level shifters. We modify the threshold voltage of the high-voltage gates that are driven by low-voltage gates so that they perform level shifting together with the logic operation. Using our method, we obtained average energy savings of 20% for ISCAS’85 benchmark circuits designed in 180-nm technology and 17% in 70-nm technology.

Index Terms—CMOS, critical-path, low voltage, low-power design.

I. INTRODUCTION

Dynamic energy consumption in CMOS circuits is proportional to the square of the supply voltage. This makes dual supply voltage usage popular for energy reduction. Since the speed of a gate decreases with decreasing supply voltage, dual supply voltage techniques [1]–[8] put low-voltage gates on the noncritical paths and high-voltage gates on the critical paths. This reduces the energy consumption in the low voltage gates while keeping the circuit delay unchanged.

Gate-level dual supply voltage usage in CMOS circuits may suffer from excessive leakage energy if low-voltage gates directly drive high-voltage gates. In these situations, the PMOS transistor in the high-voltage gate is not turned off completely by the low-voltage “logic high” input signal, because its source-gate voltage, equal to the difference between the two supply voltages, can exceed the magnitude of the PMOS threshold voltage.

This leads to the use of level shifters wherever low voltage gates drive high voltage gates. To reduce or eliminate the delay, area, and energy overhead of the level shifters, researchers have proposed clustered voltage scaling (CVS) [1], [2] and module level voltage scaling (MLVS) [3], [4]. In CVS, low voltage clusters are constructed in the circuit in such a way that there is no low voltage gate driving a high voltage gate. This is done by assigning low supply voltage to the gates starting from the circuit outputs depending on their slacks, eliminating the use of level shifters in the combinational logic. MLVS assigns the dual supply voltages to partitions of the circuit, reducing the number of level shifters needed. Clearly both of these methods introduce ad-