# A 200MHz, 3mW, 16-Tap Mixed-Signal FIR Filter

Miguel Figueroa and Chris Diorio

Computer Science and Engineering, University of Washington Box 352350, Seattle, WA 98195-2350, USA Phone: +1 (206) 543-7119, Fax: +1 (206) 543-2969, Email: {miguel,diorio}@cs.washington.edu

## Abstract

We have built a 16-tap, 7-bit, 200MHz, mixed-signal FIR filter that consumes 3mW at 3.3V. The filter uses *p*-channel synapse transistors to store the tap coefficients; electron tunneling and hot-electron injection to modify the tap weights; digital registers for the delay line; and multiplying digital-to-analog converters to multiply the digital delay-line values with the analog tap weights. The measured bandwidth is 225MHz; the measured tap multiplier resolution is 7 bits at 200MHz. The total die area is 0.13mm<sup>2</sup>; we can readily scale the design to higher bit resolutions and longer delay-lines.

## Introduction

There is an ever-present need for low-power, high-throughput signal-processing chips. Digital signal processing (DSP) chips, although immensely popular, tend to be large and power hungry; special-purpose applications still employ custom digital VLSI circuitry. Nonetheless, even custom solutions require significant die area and power (1), mainly because of the digital adders and multipliers. Although analog circuitry can implement arithmetic functions with low power and area, these circuits present other problems such as offsets, error accumulation, and noise sensitivity. In particular, offsets and signal attenuation make it difficult to implement long tapped delay lines.

We have developed a new mixed-signal approach to signal processing that exploits the strengths of new devices called synapse transistors (2) to enable high-throughput, low-power, scalable VLSI circuits. We have built a FIR filter using this technology that employs digital delay lines, synapse transistors for weight storage and updates, and mixed-signal hardware for compact, low-power arithmetic units. Our 16-tap FIR filter, shown in Fig. 1, operates at 200MHz with 7-bit accuracy, consuming less than 3mW. The die area is 0.13mm<sup>2</sup> in a 0.35µm process.

## The Filter

Fig. 2(A) shows the filter architecture. Because analog delay lines are difficult to implement in VLSI, we use a 7-bit digital delay line to shift the input signal across the filter taps. The delay line employs TSPC D flip-flops (3) and uses a 1's complement code to simplify the multiplier design.

Central to our approach are the synapse transistors that store the tap weights. Synapse transistors are EEPROM-like devices that afford nonvolatile analog storage, allow bidirectional updates, and offer resolutions exceeding 12 bits (4). The synapse transistor outputs a current that is proportional to the weight magnitude. We store the sign in a static latch.

We multiply the synapse-transistor current by the digital delay-line values using a differential multiplying digital-to-analog converter (MDAC) (5). We use 16 MDACs, one for each delay-line tap. We sum the current output from the 16 MDACs by connecting them to common output wires.

Fig. 1. The layout. The filter is 450µm wide and 295µm high in a triple-metal, double-poly, 0.35µm process available from MOSIS.

Fig. 2. The filter. Part (A) shows the top-level architecture. We use a 7-bit digital delay line and store the tap weights on analog weight cells. We implement the multipliers using differential multiplying digital-to-analog converters (MDACs). The chip output is a differential current, comprising the sum of the currents from the 16 MDACs. Part (B) shows the weight-storage circuit for a single tap. We store the tap weight magnitude on p-channel synapse transistors, and the sign on a static latch. The weight magnitude can be changed using selectable tunneling and injection circuitry.

Fig. 2(B) depicts a weight cell. We use a 2-segment MDAC, so each cell contains two identical synapse transistors that share a common floating gate. A common bias input, shared by all 16 cells, sets the output current range; this feature lets us trade power for performance and resolution.
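The architecture described above can be summarized in a short behavioral sketch. This is only an idealized software model, not a description of the circuit: it assumes input samples normalized to [-1, 1], maps them to signed 7-bit codes in the range [-63, 63], and treats each MDAC as an ideal multiplier. The names `quantize`, `fir_mixed_signal`, and `i_unit` are hypothetical and do not come from the design.

```python
import numpy as np

N_TAPS = 16                          # number of taps, from the paper
N_BITS = 7                           # delay-line word width, from the paper
FULL_SCALE = 2 ** (N_BITS - 1) - 1   # 63: largest magnitude of a 7-bit signed code


def quantize(x):
    """Map samples in [-1, 1] to signed integer codes in [-63, 63]."""
    x = np.clip(np.asarray(x, dtype=float), -1.0, 1.0)
    return np.rint(x * FULL_SCALE).astype(int)


def fir_mixed_signal(codes, magnitudes, signs, i_unit=1.0):
    """Idealized model: each tap scales its weight current by the delayed code.

    codes      : integer input samples fed to the digital delay line
    magnitudes : non-negative tap-weight magnitudes (synapse-transistor currents)
    signs      : +1 or -1 per tap (the static latch)
    i_unit     : hypothetical full-scale current unit set by the common bias
    """
    weights = np.asarray(signs) * np.asarray(magnitudes, dtype=float)
    delay_line = np.zeros(N_TAPS, dtype=int)
    out = np.empty(len(codes))
    for n, code in enumerate(codes):
        # Shift the delay line by one tap and load the new sample.
        delay_line = np.roll(delay_line, 1)
        delay_line[0] = code
        # Each MDAC contributes weight * code / full scale; the shared
        # output wires sum the 16 contributions.
        out[n] = i_unit * np.dot(weights, delay_line) / FULL_SCALE
    return out
```

Feeding a single full-scale sample followed by zeros returns the signed tap weights one per clock, which is the impulse response of the filter.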



Fig. 3. Convolution of a square-wave input with uniform tap coefficients. We set all the tap weights to the same positive value, applied a square-wave input with a 50% duty cycle and a period of 32 clock cycles, and converted the chip's differential current output to a single-ended voltage using the oscilloscope as a 50 Ohm differential transimpedance amplifier. Each step represents the contribution of a single synapse transistor to the output. These data provide a rough measure of tap-weight uniformity; other measurements, taken on an analog memory cell (4), indicate that synapse-transistor storage can exceed 12-bit resolution. The overshoot is clock feedthrough.

We write the weights selectively using electron tunneling and hot-electron injection (2). Both mechanisms can be active during filter operation.

We show the contribution of each tap weight to the filter output in Fig. 3. We show the filter's bandwidth (output amplitude versus clock frequency) in Fig. 4; the filter maintains its performance for clock rates to 225MHz. Finally, we used the filter to compute correlations in a DS-CDMA decoding application (6); we show the results in Fig. 5. We measured a 42.6dB dynamic range for a single user signal.
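The staircase in Fig. 3 follows directly from the convolution. A minimal numerical sketch, assuming ideal and exactly equal tap weights in arbitrary units (the measured steps differ slightly from tap to tap), is:

```python
import numpy as np

N_TAPS = 16
PERIOD = 32   # square-wave period in clock cycles, as in Fig. 3

# Two periods of a 50% duty-cycle square wave: 16 clocks high, 16 clocks low.
square = np.tile(np.r_[np.ones(PERIOD // 2), np.zeros(PERIOD // 2)], 2)
taps = np.ones(N_TAPS)   # uniform positive weights (arbitrary unit)

# Causal FIR output: y[n] = sum_k taps[k] * square[n - k]
y = np.convolve(square, taps)[: len(square)]

# The output changes by exactly one tap weight per clock.
steps = np.diff(y)
```

Because the high half of the period spans exactly 16 clocks, the output ramps up one tap weight per clock until all 16 taps hold a high sample, then ramps back down. With unequal weights the steps would have unequal heights, which is why the figure gives a rough measure of tap-weight uniformity.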

## Conclusion

We have built a 16-tap FIR filter that uses synapse transistors for analog weight storage and concurrent weight updates. This approach allows us to use mixed-signal arithmetic units, resulting in a compact, high-speed, low-power design. Because we use a digital delay line, we can scale our solution to a larger number of taps (>>16). On a DS-CDMA decoding application, our filter demonstrated an input dynamic range of 42.6dB, enabling us to scale up to 128 taps (up to 128 users).
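As a rough check on these figures, and assuming the 42.6dB dynamic range is expressed as an amplitude ratio (the 20 log<sub>10</sub> convention, which is an assumption here), the corresponding linear ratio and equivalent resolution are as follows. The symbols $A_{\max}$ and $A_{\min}$ are introduced only for this illustration.

$$
\frac{A_{\max}}{A_{\min}} = 10^{42.6/20} \approx 135, \qquad \log_2 135 \approx 7.1\ \text{bits}
$$

Under that assumption the ratio is consistent with the measured 7-bit multiplier resolution, and it exceeds 128, the largest power of two below it.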



Fig. 4. Output amplitude versus frequency. We set all tap weights to the same positive value, applied a maximum-amplitude impulse (a pulse whose width was the clock period, with 1/32 duty cycle), and measured the output settling time (into a 50 Ohm load) as we clocked the pulse from tap to tap.

## References

- (1) N. Zhang, C. Teuscher, H. Lee and R. Brodersen, "Architectural implementation issues in a wideband receiver using multiuser detection," *Proc. of the Allerton Conf. on Communication, Control, and Computing*, September 1998.
- (2) C. Diorio, "A *p*-channel MOS synapse transistor with self-convergent memory writes," *IEEE Trans. Electron Devices*, vol. 47, no. 2, February 2000.
- (3) J. Rabaey, *Digital Integrated Circuits: A Design Perspective*, Prentice-Hall, 1996, pp. 359-352.
- (4) C. Diorio, S. Mahajan, P. Hasler, B. A. Minch, and C. Mead, "A high-resolution nonvolatile analog memory cell," *Proc. IEEE Intl. Symp. on Circuits and Systems*, vol. 3, pp. 2233–2236, 1995.
- (5) E. Zuch, Ed., *Data Acquisition and Conversion Handbook*, Datel, 1979, pp. 13-15.
- (6) R. Lupas and S. Verdu, "Linear multiuser detectors for synchronous code-division multiple access channels," *IEEE Trans. Information Theory*, IT-35, January 1989.





Fig. 5. Application to DS-CDMA decoding. We applied a 100Mbps CDMA-like input to the filter, comprising two bit streams encoded using orthogonal bases. We set the tap weights to decode the shown basis. Part (A) is the input bit stream and the basis we used to encode it. Part (B) is the filter output and the strobe pulse we used to recover the data. Part (C) is the reconstructed data, for 64 (superimposed) experiments, showing the logic-level variance at the output. We used the oscilloscope as a differential 50 Ohm transimpedance amplifier, low-pass filter, and track-hold, and reconstructed the output data in software using the oscilloscope measurements.
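The decoding experiment can be illustrated with an idealized software sketch. The bases used in the measurement are not specified here, so this example assumes 16-chip Walsh-Hadamard codes, noiseless signals, and ideal arithmetic; the helper `walsh` and the choice of code rows are hypothetical.

```python
import numpy as np

N_TAPS = 16


def walsh(n):
    """Return an n x n matrix of +/-1 Walsh-Hadamard codes (n a power of two)."""
    h = np.array([[1]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h


codes = walsh(N_TAPS)
basis_a, basis_b = codes[3], codes[5]     # two orthogonal bases (arbitrary picks)

rng = np.random.default_rng(0)
bits_a = rng.integers(0, 2, 8) * 2 - 1    # user A data as +/-1
bits_b = rng.integers(0, 2, 8) * 2 - 1    # user B data as +/-1

# Each bit is spread over 16 chips; the input carries the sum of both users.
signal = np.kron(bits_a, basis_a) + np.kron(bits_b, basis_b)

# Tap k sees the sample delayed by k clocks, so loading the basis in reverse
# order aligns chip j with tap (15 - j) when the last chip of a bit arrives.
taps = basis_a[::-1]
y = np.convolve(signal, taps)[: len(signal)]

# Strobe the output once per bit, on the clock where a full bit fills the taps.
strobed = y[N_TAPS - 1 :: N_TAPS]
decoded = np.sign(strobed).astype(int)
```

At each strobe instant the filter output is the correlation of the last 16 chips with the chosen basis. The orthogonal user cancels, so the sign of the strobed value recovers user A's bit.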