<table>
<thead>
<tr>
<th><strong>Title</strong></th>
<th>FPGA implementation of digital filters synthesized using the frequency-response masking technique (Accepted version)</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Author(s)</strong></td>
<td>Lim, Yong Ching; Yu, Ya Jun; Zheng, H. Q.; Foo, Say Wei</td>
</tr>
<tr>
<td><strong>Date</strong></td>
<td>2001</td>
</tr>
<tr>
<td><strong>URL</strong></td>
<td><a href="http://hdl.handle.net/10220/4589">http://hdl.handle.net/10220/4589</a></td>
</tr>
<tr>
<td><strong>Rights</strong></td>
<td>© 2001 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder. <a href="http://www.ieee.org/portal/site">http://www.ieee.org/portal/site</a></td>
</tr>
</tbody>
</table>
FPGA IMPLEMENTATION OF DIGITAL FILTERS SYNTHESIZED USING THE FREQUENCY-RESPONSE MASKING TECHNIQUE

Y. C. Lim¹, Y. J. Yu¹, H. Q. Zheng², and S. W. Foo¹
¹ Department of Electrical and Computer Engineering
National University of Singapore, Singapore 119260
² School of Engineering, Temasek Polytechnic
Singapore 529757

ABSTRACT

The effective length of a filter designed using the frequency-response masking technique is very high, so a very large number of delay elements is required. In this paper, we present some useful techniques for reducing the data transfer between the FPGA and external memory when the random logic is implemented on the FPGA and the delay elements are implemented in external memory such as DRAM.

1. INTRODUCTION

The frequency-response masking technique [1-14] uses a system of sub-filters, as shown in Fig. 1, to synthesize a very sharp filter. The sub-filter that determines the transition band of the filter (called the band-edge shaping filter) is obtained by replacing each delay element of a prototype filter with \( M \) delay elements; it therefore has a very long filter length, although its arithmetic complexity is very low. When an FPGA is used in the implementation, only the arithmetic unit, the random logic functions, and a limited number of storage locations are implemented on the FPGA in order to reduce hardware cost. The tap delay line of the band-edge shaping filter and those of the masking filters are usually implemented in low-cost external memory such as DRAM.

2. EXTERNAL MEMORY ACCESS

Let the lengths of \( H_a(z) \), \( H_{Ma}(z) \), and \( H_{Mc}(z) \) be \( N_a \), \( N_{Ma} \), and \( N_{Mc} \), respectively. Thus, in every sampling interval, \( N_a + N_{Ma} + N_{Mc} \) samples of signal data must be fetched from memory. Taking the symmetry of the coefficients into consideration, there are \( (N_a + N_{Ma} + N_{Mc})/2 \) distinct coefficients. If two address pointers (one pointing to each of the two data samples to be multiplied by the same coefficient) are maintained inside the FPGA for each sub-filter, then there will be \( (N_a + N_{Ma} + N_{Mc})/2 \) ROM fetches per sampling interval to fetch the coefficients; otherwise, there will be \( N_a + N_{Ma} + N_{Mc} \) ROM fetches per sampling interval.
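
The fetch counts above can be tabulated with a minimal sketch. The function name and the example lengths below are hypothetical and chosen only for illustration. The expression \( (N_a + N_{Ma} + N_{Mc})/2 \) corresponds to even filter lengths; for an odd length the centre tap has no partner, so the sketch assumes rounding up per sub-filter.

```python
def fetches_per_sample(n_a, n_ma, n_mc, paired_pointers=True):
    """Return (data_fetches, coefficient_fetches) per sampling interval.

    n_a, n_ma, n_mc are the lengths of H_a, H_Ma and H_Mc.
    paired_pointers=True models two address pointers per sub-filter, so
    each distinct coefficient of a symmetric filter is fetched only once.
    """
    lengths = (n_a, n_ma, n_mc)
    data_fetches = sum(lengths)              # every tap needs its own data sample
    if paired_pointers:
        # Assumption: an odd-length filter has one unpaired centre tap.
        coeff_fetches = sum((n + 1) // 2 for n in lengths)
    else:
        coeff_fetches = sum(lengths)         # one ROM fetch per tap
    return data_fetches, coeff_fetches


# Hypothetical sub-filter lengths, not taken from the paper.
with_pointers = fetches_per_sample(46, 40, 34)
without_pointers = fetches_per_sample(46, 40, 34, paired_pointers=False)
```

For these assumed lengths, maintaining the paired pointers halves the number of coefficient fetches while leaving the number of data fetches unchanged.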

3. REDUCING MEMORY FETCH FOR \( H_{Ma}(z) \) AND \( H_{Mc}(z) \)

Let \( H_{Ma}(z) \) and \( H_{Mc}(z) \) be given by

\[
H_{Ma}(z) = \sum_{n=0}^{N_{Ma}-1} h_{Ma}(n) z^{-n}
\]

(1)

and

\[
H_{Mc}(z) = \sum_{n=0}^{N_{Mc}-1} h_{Mc}(n) z^{-n}
\]

(2)

$H_{Ma}(z)$ and $H_{Mc}(z)$ may each be expressed in terms of $R$ polyphase components as

$$H_{Ma}(z) = \sum_{r=0}^{R-1} z^{-r} \sum_{n=0}^{N_{Ma}/R-1} h_{Ma}(nR + r) z^{-nR}$$

(3)

and

$$H_{Mc}(z) = \sum_{r=0}^{R-1} z^{-r} \sum_{n=0}^{N_{Mc}/R-1} h_{Mc}(nR + r) z^{-nR}$$

(4)

In (3) and (4), we have assumed that $N_{Ma}$ and $N_{Mc}$ are divisible by $R$. If $N_{Ma}$ and $N_{Mc}$ are not divisible by $R$, then zero-valued terms are appended to $H_{Ma}(z)$ and $H_{Mc}(z)$ until their lengths are divisible by $R$. The synthesis structure for $R = 3$ is shown in Fig. 2. It can be seen from Fig. 2 that the number of memory accesses is reduced by a factor of $R$, but $R - 1$ additional storage locations are needed to store the outputs of the polyphase filters. Furthermore, the coefficients of the polyphase filters may not be symmetrical.

**Fig. 2** Polyphase implementation of $H_{Ma}(z)$ and $H_{Mc}(z)$ for $R = 3$. Note that the number of memory accesses is reduced by a factor of $R$.
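
The decomposition in (3) and (4) can be checked numerically with a minimal sketch. Since Fig. 2 is not reproduced here, the arrangement below is only one structure consistent with the text: each fetched data sample is reused for $R$ coefficients, and the branch outputs are aligned through $R - 1$ stored values. The function names, the read counters, and the test data are hypothetical.

```python
def fir_direct(h, x):
    """Direct-form FIR: each output reads len(h) stored input samples."""
    y, fetches = [], 0
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            fetches += 1                       # one RAM read per tap
            acc += hk * (x[n - k] if n - k >= 0 else 0.0)
        y.append(acc)
    return y, fetches


def fir_polyphase(h, x, R):
    """Polyphase FIR following (3)/(4): each RAM read serves R coefficients."""
    h = list(h) + [0.0] * (-len(h) % R)        # zero-pad to a multiple of R
    taps = len(h) // R
    store = [0.0] * (R - 1)                    # the R - 1 extra storage locations
    y, fetches = [], 0
    for n in range(len(x)):
        v = [0.0] * R                          # outputs of the R polyphase filters
        for k in range(taps):
            idx = n - k * R
            xk = x[idx] if idx >= 0 else 0.0   # one RAM read ...
            fetches += 1
            for r in range(R):                 # ... reused for R coefficients
                v[r] += h[k * R + r] * xk
        # Branch r must be delayed by r samples before summation; the chain
        # below does this with R - 1 stored partial sums.
        y.append(v[0] + (store[0] if R > 1 else 0.0))
        for r in range(1, R):
            carry = store[r] if r < R - 1 else 0.0
            store[r - 1] = v[r] + carry
    return y, fetches


# Hypothetical coefficients and input, used only to compare the two forms.
h_demo = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
x_demo = [float(n) for n in range(1, 21)]
y_direct, reads_direct = fir_direct(h_demo, x_demo)
y_poly, reads_poly = fir_polyphase(h_demo, x_demo, 3)
```

Both functions produce the same output sequence. In this sketch the data-read count drops from `len(h)` to `ceil(len(h) / R)` per output sample, which is exactly a factor of $R$ when the length is divisible by $R$.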

**4. $H_{Ma}(z)$ AND $H_{Mc}(z)$ MAY SHARE THE SAME SET OF DELAY ELEMENTS**

If $H_{Ma}(z)$ and $H_{Mc}(z)$ are implemented as shown in Fig. 3, they may share the same delay elements, as shown in Fig. 4. The implementation structure of Fig. 4 is useful if the delay elements of $H_{Ma}(z)$ and $H_{Mc}(z)$ are implemented on the FPGA, where the total number of delay elements must be kept to a minimum. The corresponding polyphase structure for $R = 2$ is shown in Fig. 5.

**Fig. 3** An implementation for $H_{Ma}(z)$ and $H_{Mc}(z)$.

**Fig. 4** An implementation for $H_{Ma}(z)$ and $H_{Mc}(z)$ where $H_{Ma}(z)$ and $H_{Mc}(z)$ share the same delay elements assuming that $H_{Ma}(z)$ and $H_{Mc}(z)$ have the same order.

**Fig. 5** Polyphase implementation of the circuit shown in Fig. 4 for $R = 2$. 
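
Figs. 3 to 5 are not reproduced here, so the following is only a sketch of one arrangement in which two filters whose outputs are summed can share a single delay line: the transposed direct form, where the delay elements hold partial output sums rather than input samples. Whether this matches Fig. 4 exactly is an assumption. The function names and the inputs `a` and `b` (standing for the two masking-filter input signals) are hypothetical, and both filters are assumed to have the same length, as in the caption of Fig. 4.

```python
def fir_direct(h, x):
    """Reference direct-form FIR used only for comparison."""
    return [
        sum(hk * x[n - k] for k, hk in enumerate(h) if n - k >= 0)
        for n in range(len(x))
    ]


def shared_transposed(h1, h2, a, b):
    """Compute h1*a + h2*b with one shared transposed-form delay line."""
    assert len(h1) == len(h2), "sketch assumes equal filter lengths"
    assert len(a) == len(b)
    N = len(h1)
    s = [0.0] * (N - 1)                  # shared delay elements (partial sums)
    y = []
    for an, bn in zip(a, b):
        # Output tap: current products plus the oldest partial sum.
        y.append(h1[0] * an + h2[0] * bn + (s[0] if N > 1 else 0.0))
        # Shift the chain, adding the products of both filters at each tap.
        for k in range(1, N):
            nxt = s[k] if k < N - 1 else 0.0
            s[k - 1] = h1[k] * an + h2[k] * bn + nxt
    return y


# Hypothetical coefficients and signals, not taken from the paper.
h_ma = [1.0, 3.0, 3.0, 1.0]
h_mc = [2.0, -1.0, -1.0, 2.0]
a = [1.0, 0.0, 2.0, -1.0, 4.0, 0.5, 3.0, -2.0]
b = [0.0, 1.0, -1.0, 2.0, 1.0, -3.0, 0.0, 2.0]
y_shared = shared_transposed(h_ma, h_mc, a, b)
y_separate = [p + q for p, q in zip(fir_direct(h_ma, a), fir_direct(h_mc, b))]
```

The shared version uses $N - 1$ delay elements in total, whereas two separate delay lines would use $2(N - 1)$.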

5. REDUCING MEMORY FETCH FOR \( H_a(z^M) \)

Let

\[
X(z) = \sum_{n=0}^{\infty} x(n)z^{-n}
\]

(5)

\( X(z) \) may be expressed in its polyphase components as

\[
X(z) = \sum_{m=0}^{M-1} z^{-m} X_m(z^M)
\]

(6)

where

\[
X_m(z^M) = \sum_{n=0}^{\infty} x(nM + m)z^{-nM}
\]

(7)

Since \( U(z) = X(z)H_a(z^M) \), we have

\[
U(z) = \sum_{m=0}^{M-1} z^{-m} X_m(z^M) H_a(z^M)
\]

(8)

Let

\[
U_m(z^M) = X_m(z^M) H_a(z^M)
\]

(9)

From (8) and (9), \( U(z) \) may be written as

\[
U(z) = \sum_{m=0}^{M-1} z^{-m} U_m(z^M)
\]

(10)

It can be seen from (10) that \( U_m(z) \) is the \( m^{th} \) polyphase component of \( U(z) \). From (9), the polyphase components of \( U(z) \) are the polyphase components of \( X(z) \) filtered by the same filter \( H_a(z^M) \). Because the same filter is used to filter all the polyphase components of \( X(z) \), the number of ROM fetch operations needed to fetch the coefficients of \( H_a(z^M) \) can be reduced if a further delay of \( M \) sampling intervals in the computation of \( U(z) \) can be tolerated. Let

\[
U'(z) = z^{-M} U(z)
\]

(11)

\( U'(z) \) may be obtained using the configuration shown in Fig. 6. In Fig. 6, the down-sampling and up-sampling processes do not cause any aliasing problem. Let \( h_a(n) \) be the \( n^{th} \) coefficient of \( H_a(z) \). In the implementation of Fig. 6, in order to minimize the number of ROM fetch operations needed to fetch the coefficients of \( H_a(z) \), when \( h_a(n) \) for a particular value of \( n \) is fetched, it should be multiplied by the corresponding elements of \( X_m(z) \) for all \( m \) before it is discarded. This reduces the number of ROM fetch operations for the coefficients of \( H_a(z) \) by a factor of \( M \) at the expense of an additional \( 2M \) storage locations on the FPGA.

**Fig. 6** Implementation of \( H_a(z^M) \) where the input signal is split into its polyphase components.
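
The coefficient reuse described above can be illustrated with a minimal sketch that processes the signal in blocks of \( M \) samples, one sample per polyphase branch of \( X(z) \), and fetches each \( h_a(n) \) once per block. The function names, the fetch counters, and the test data are hypothetical. The sketch runs offline, so it returns \( u(n) \) without the \( z^{-M} \) delay of (11); in a real-time implementation a block can only be processed once all \( M \) of its samples have arrived.

```python
def interpolated_direct(ha, x, M):
    """u(n) = sum_k ha[k] * x(n - k*M), fetching every ha[k] for every sample."""
    u, rom_fetches = [], 0
    for n in range(len(x)):
        acc = 0.0
        for k, c in enumerate(ha):
            rom_fetches += 1                 # coefficient fetched per sample
            idx = n - k * M
            if idx >= 0:
                acc += c * x[idx]
        u.append(acc)
    return u, rom_fetches


def interpolated_blocked(ha, x, M):
    """Same filter, but each ha[k] is fetched once per block of M samples."""
    assert len(x) % M == 0, "sketch assumes a whole number of blocks"
    u, rom_fetches = [0.0] * len(x), 0
    for j in range(len(x) // M):             # block index
        acc = [0.0] * M                      # one accumulator per branch m
        for k, c in enumerate(ha):
            rom_fetches += 1                 # coefficient fetched once ...
            for m in range(M):               # ... and used for all M branches
                idx = (j - k) * M + m        # sample x((j - k)M + m) of X_m
                if idx >= 0:
                    acc[m] += c * x[idx]
        u[j * M:(j + 1) * M] = acc
    return u, rom_fetches


# Hypothetical prototype coefficients and input, not taken from the paper.
ha_demo = [1.0, -2.0, 3.0, 4.0]
x_demo = [float(n) for n in range(1, 13)]
u_direct, rom_direct = interpolated_direct(ha_demo, x_demo, 3)
u_blocked, rom_blocked = interpolated_blocked(ha_demo, x_demo, 3)
```

Both functions return the same sequence, while the blocked version performs \( M \) times fewer coefficient fetches. The \( M \) accumulators in `acc` account for part of the extra on-chip storage; how the full \( 2M \) locations are allocated in Fig. 6 is not modelled here.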

Each filter \( H_a(z) \) in Fig. 6 may also be split into its polyphase components to reduce the number of memory fetches needed to fetch \( x(nM+m) \), in the same way that \( H_{Ma}(z) \) and \( H_{Mc}(z) \) are implemented by splitting them into their polyphase components, as discussed in Section 3. If each \( H_a(z) \) in Fig. 6 is split into \( R \) polyphase components, an additional \( MR \) storage locations are needed.

6. CONCLUSION

Several techniques for reducing RAM and ROM fetches in the FPGA implementation of filters synthesized using the frequency-response masking technique are presented. The number of signal data fetches from RAM when implementing an FIR filter can be reduced by splitting the filter into its polyphase components. The number of coefficient data fetches in the implementation of the band-edge shaping filter can be reduced by splitting the input signal into its polyphase components. A method that allows \( H_{Ma}(z) \) and \( H_{Mc}(z) \) to share the same delay elements is also presented.

7. REFERENCES


