# A 9.8 Gbps, 6.5 mW Forwarded-clock Receiver with Phase Interpolator and Equalized Current Sampler in 65 nm CMOS

Shunli Ma<sup>1,2</sup>, Sai Manoj P.D.<sup>2</sup>, Hao Yu<sup>2\*</sup>, Junyan Ren<sup>1</sup>, and Roshan Weerasekera<sup>3</sup>

<sup>1</sup>State Key Lab of ASIC and System, Fudan University, Shanghai 200433, China; <sup>2</sup>School of EEE, Nanyang Technological University, Singapore 639798; <sup>3</sup>Institute of Microelectronics, A\*STAR, Singapore

Abstract- A full-rate, energy-efficient forwarded-clock (FC) receiver is demonstrated in this paper. A current sampler with continuous-time equalization is realized with a 20 GHz sampling bandwidth for data recovery. Moreover, a phase interpolator is introduced to generate a deskewed sampling clock for data recovery. The test chip was fabricated in a 65 nm CMOS process and occupies an area of 0.16 mm<sup>2</sup>. Measurements show that the FC receiver achieves a data rate of up to 9.8 Gbps with a power consumption of 6.5 mW.

Index terms- Forwarded-clock receiver, current-sampling, phase interpolator

## I. INTRODUCTION

Source-synchronous links with the forwarded-clock (FC) architecture [1]-[7] are widely deployed in parallel I/O interfaces because of their low power consumption, inherent correction of clock and data jitter, and appropriate jitter tracking bandwidth (JTB). In the FC receiver, the static phase offset (SPO) between the input data and the sampling clock is corrected at start-up, while the dynamic phase offset (DPO), or jitter, is tracked by the forwarded clock with jitter correction.

The model of the FC receiver is shown in Fig.1. The data and the clock are sent to the receiver simultaneously. However, because of PCB trace mismatch and the frequency-dependent delay of the channels, the data and the clock are misaligned in time at the receiver side, especially at high data rates. As a result, the SPO has to be corrected before sampling, which is realized by a phase interpolator (PI) in this paper. Owing to the appropriate jitter tracking bandwidth (JTB) of the FC receiver structure, the DPO is also well restrained. The PI introduced in this paper generates a wide range (0°-360°) of clock deskew, which covers the phase misalignment and ensures that sampling occurs at the center of the data, as shown in Fig.1. As a result, a low bit error rate (BER) can be achieved and the FC receiver becomes insensitive to jitter.
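As a rough illustration of the deskew requirement, a timing skew $\Delta t$ between data and clock can be converted to a phase offset at the full-rate clock frequency $f_{clk}$. The symbols and the 30 ps skew below are assumptions chosen only for illustration, not values reported for this receiver.

$$
\Delta\phi = 360^{\circ} \times \left( \Delta t \cdot f_{clk} \bmod 1 \right), \qquad \Delta t = 30\ \text{ps},\ f_{clk} = 9.8\ \text{GHz} \ \Rightarrow\ \Delta\phi = 360^{\circ} \times 0.294 \approx 105.8^{\circ}
$$

Because the skew can take any value within a clock period, the corresponding phase offset can fall anywhere in 0°-360°.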

Moreover, the continuous-time linear equalizer (CTLE) is widely utilized in FC receivers [7]-[9] because of its compact structure and good high-frequency performance for middle-distance interconnects (such as interposer-based memory-logic integration), where decision feedback equalizer (DFE) taps are not required. In traditional data recovery circuits, the CTLE is usually followed by a sampler. However, the voltage-sampling structure limits the bandwidth and speed of that sampler, which seriously degrades the overall speed even though the equalizer provides a high-frequency gain boost to compensate for channel loss [9]-[11]. In this paper, we use a current-sampling structure merged with the equalization function to realize high-speed sampling.

Fig.1 FC receiver model and the proposed full-rate and energy-efficient FC receiver architecture

A current sampler is introduced with 20 GHz bandwidth, 10 GSps sampling rate, and 18 dB gain boost at 10 GHz.



Fig.2 Signal flow (a) and circuit diagram (b) of data recovery by equalized sampler with inductor load (1.2 nH)

In contrast to the conventional arrangement, in which a voltage sampler follows the equalizer, the switched-source-follower (SSF) based current sampler is merged with an active CTLE whose equalization is realized by inductive loading.

The test chip was fabricated in a 65 nm CMOS process within an area of $0.16 \text{ mm}^2$. The measurements show that a data rate of up to 9.8 Gbps can be achieved with a BER below $10^{-12}$ and an energy efficiency of 0.67 mW/Gbps. The rest of the paper is organized as follows. Section II presents the equalized current sampler for data recovery. Section III discusses the PI for clock recovery. The FC receiver prototype and its measurement results are presented in Section IV, and conclusions are drawn in Section V.

## II. DATA RECOVERY: CURRENT SAMPLER WITH CONTINUOUS-TIME EQUALIZATION

As shown in Fig.2 (a)-(b), the proposed current sampler is merged with the active CTLE. It consists of an input buffer with inductive load $L_1$ for active equalization and a switched source follower (SSF), which is a current-sampling structure. The merging principle is as follows: when CLK=1, $I_1$ flows through path-I and the input buffer boosts the high-frequency part of the data to realize the equalization function, as shown in Fig.2(a); when CLKB=1, the current $I_1$ flows through path-II and $M_2$ is turned off to hold the data. As such, the proposed circuit realizes the equalization and sampling functions simultaneously. Meanwhile, the input matching of the FC receiver is realized by the shunt resistor $R_{match}$.

### A. CTLE Equalization

Fig.3 Simulation results of CTLE equalization of sampler: (a) gain-peaking is above 18 dB at 10 GHz; (b) tunable gain-boost at high-frequency from 0.6 mA to 3 mA

Fig.4 The proposed phase interpolator by controlling strength of I/Q signals

Fig.5 Measurement setup and die photo of FC receiver in 65 nm CMOS process

For middle-distance interconnects (<10 cm), such as interposers for memory-logic integration at the inter-die level, a continuous-time linear equalizer (CTLE) is sufficient for data recovery [7]-[9] without decision feedback equalizer (DFE) taps. As shown in Fig.2 (b), when the input data with channel loss arrives at the input (VIN, VIP) of the input buffer, the high-frequency loss is compensated by the gain boost of the inductive load L<sub>1</sub>. The gain of the input buffer is targeted to peak at 10 GHz for this compensation, so the value of the inductor load L<sub>1</sub> must be optimized. As shown in Fig.3 (a), L<sub>1</sub> = 1.2 nH is obtained by sweeping from 0.3 nH to 2.7 nH, and it is realized within a compact area of 50 µm $\times$ 50 µm. Moreover, the current source I<sub>2</sub> can be tuned from 0.6 mA to 2.4 mA for adaptive equalization, as shown in Fig.3 (b).
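The gain peaking produced by the inductive load can be illustrated with a simple second-order model, in which the load is a series R-L branch in parallel with the capacitance at the output node. This is only a sketch under stated assumptions and not the simulation behind Fig.3: `R_LOAD` (30 Ω) is a hypothetical value, and `C_X` is chosen so that the L-C resonance falls at the 10 GHz target; only `L1` = 1.2 nH comes from the text.

```python
import math

L1 = 1.2e-9          # inductor load from the text (1.2 nH)
F_PEAK = 10e9        # targeted peaking frequency from the text (10 GHz)
# Assumption: node capacitance chosen so that 1/(2*pi*sqrt(L*C)) = 10 GHz.
C_X = 1.0 / ((2 * math.pi * F_PEAK) ** 2 * L1)
R_LOAD = 30.0        # assumption: series resistance of the load branch, in ohms


def load_impedance(freq_hz, r=R_LOAD, l=L1, c=C_X):
    """Impedance of (R + jwL) in parallel with 1/(jwC)."""
    w = 2 * math.pi * freq_hz
    series = complex(r, w * l)
    return series / (1 + 1j * w * c * series)


def boost_db(freq_hz):
    """Gain relative to DC, where the load reduces to R."""
    return 20 * math.log10(abs(load_impedance(freq_hz)) / R_LOAD)


freqs = [f * 0.1e9 for f in range(1, 201)]   # 0.1 GHz .. 20 GHz
peak_freq = max(freqs, key=boost_db)
print(f"C_X = {C_X * 1e15:.0f} fF")
print(f"peak near {peak_freq / 1e9:.1f} GHz, boost {boost_db(peak_freq):.1f} dB")
```

In this model a smaller `R_LOAD` raises the quality factor of the load and therefore the amount of peaking, while `L1` and `C_X` together set where the peak occurs.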

### B. Current Sampling

Compared with voltage sampling, current sampling can achieve superior sampling speed [11]-[12]. The SSF structure is commonly utilized to implement current sampling.

As shown in Fig. 2(a), the equalized data can be recognized as "0" or "1" at point X and is further sampled by the SSF. Note that the input buffer transfers the input data from the voltage domain to the current domain through the transconductance of $M_8$. When CLK=1, the current I<sub>1</sub> flows through $M_2$ via path-I, and the sampler tracks the equalized data (track mode); when CLKB=1, the current I<sub>1</sub> flows through $M_4$ and $R_1$ via path-II, and the sampler holds the input data because the low voltage at node X turns off transistor $M_2$ (hold mode). Moreover, the bandwidth of the SSF is improved because the inductor L<sub>1</sub> absorbs part of the parasitic capacitance C at node X [12].

As a result, the equalized current sampler realizes both the sampling and equalization functions at the same time with low power and high energy efficiency.
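The track and hold modes described above can be summarized by an idealized behavioral model. The sketch below is purely illustrative: it ignores bandwidth, droop, and switching transients, and the function name, the initial output value, and the sample values are assumptions rather than circuit data.

```python
def track_and_hold(samples, clk):
    """Ideal track-and-hold: follow the input while clk is 1, freeze while clk is 0."""
    out = []
    held = 0.0  # assumption: output starts at 0 before the first track phase
    for x, c in zip(samples, clk):
        if c:            # track mode (CLK=1): bias current flows through path-I
            held = x
        # hold mode (CLKB=1): the follower is off, so the last value is kept
        out.append(held)
    return out


# Hypothetical equalized waveform (arbitrary units) and clock pattern.
equalized = [0.1, 0.4, 0.9, 0.7, 0.2, -0.3, -0.8, -0.5]
clk =       [1,   1,   0,   0,   1,   1,    0,    0]
print(track_and_hold(equalized, clk))
# prints [0.1, 0.4, 0.4, 0.4, 0.2, -0.3, -0.3, -0.3]
```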

## III. CLOCK RECOVERY: PHASE INTERPOLATOR

In a conventional clock recovery design, the clock deskew is realized by a single injection-locked oscillator (ILO), and the deskew depends strongly on the offset between the injected frequency and the ILO's free-running frequency. What is worse, it can only provide a 90° phase deskew. To achieve a larger phase deskew that covers the phase misalignment between data and clock, a phase interpolator (PI) is applied in this paper to generate the clock deskew instead of a single ILO.

Fig.6 Measured eye diagrams: (a) recovered data at 5 Gbps; (b) recovered data at 9.8 Gbps

Fig.7 Measured ILO-I: (a) IQ signals; (b) peak-to-peak jitter measurement

Fig.8 The measured BER vs. clock deskew at 5 Gbps, 8 Gbps and 10 Gbps

As shown in Fig.4, the ILO-I, which has a quadrature voltage-controlled oscillator (QVCO) structure, is first locked to the input clock signal. After locking, the ILO-I provides four phases (I+, I-, Q+, Q-) for the PI, which can realize a clock deskew range of $0-360^{\circ}$. As the phase-vector diagrams in Fig.4 show, a phase-interpolated vector with $0-360^{\circ}$ clock deskew is obtained by combining the I phase-vector and the Q phase-vector. The combining strength of the four phases is controlled by off-chip DACs.
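The vector combination can be sketched numerically by treating the I and Q clocks as ideal sinusoids in quadrature. The cosine/sine weighting below is an assumption made for illustration; the text only states that the combining strengths are set by off-chip DACs, and the function names are hypothetical.

```python
import math


def iq_weights(deskew_deg):
    """Weights for the I and Q phases that place the output at deskew_deg.

    A negative weight stands for selecting the complementary phase (I- or Q-).
    """
    theta = math.radians(deskew_deg)
    return math.cos(theta), math.sin(theta)


def interpolated_phase_deg(w_i, w_q):
    """Phase of w_i * I + w_q * Q, with I at 0 degrees and Q at 90 degrees."""
    return math.degrees(math.atan2(w_q, w_i)) % 360.0


for target in (0, 45, 135, 225, 315):
    w_i, w_q = iq_weights(target)
    phase = interpolated_phase_deg(w_i, w_q)
    print(target, round(w_i, 3), round(w_q, 3), round(phase, 1))
```

The signs of the two weights select the quadrant (that is, which of I+, I-, Q+, Q- are active), and their ratio sets the phase within that quadrant, which is how four fixed phases can cover the full circle.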

## IV. MEASUREMENT RESULTS

The prototype of the proposed FC receiver was fabricated in the UMC 65 nm CMOS process. The channel is a $4\sim5$ cm trace on the FR-4 substrate of a 2-layer PCB. The test setup is shown in Fig.5. The random data is generated and transmitted by an Agilent J-BERT N4903A. The chip occupies an area of 0.16 mm<sup>2</sup>, and its die photo is shown in Fig.5.

### A. Data Recovery Measurements

First, the data recovery is measured. The eye diagrams of the recovered 5 Gbps and 9.8 Gbps data with $2^{15}-1$ random data patterns are measured by the Agilent J-BERT N4903A, as shown in Fig.6 (a)-(b). The eye is well open with a 200 mV opening, and the BER is below $10^{-12}$ at 5 Gbps and below $10^{-10}$ at 9.8 Gbps.

### B. Clock Recovery Measurements

Second, the clock recovery is measured. The transient I/Q signals of ILO-I in the FC receiver are measured to check the jitter performance by an Agilent Infiniium 90008 with 40 GSps sampling rate and 13 GHz bandwidth. The measured result of the 8 GHz I/Q signals is shown in Fig.7 (a), and the peak-to-peak jitter is around 20 ps, as shown in Fig.7 (b). The measured BER versus phase deskew is shown in Fig.8.

TABLE I: COMPARISON WITH STATE-OF-THE-ART FORWARDED-CLOCK I/O RECEIVERS

|               | [1]        | [2]        | [3]        | [4]        | This work  |
|---------------|------------|------------|------------|------------|------------|
| Technology    | 40 nm CMOS | 65 nm CMOS | 65 nm CMOS | 65 nm CMOS | 65 nm CMOS |
| Supply (V)    | 1          | 1          | 1          | 1          | 1/0.8*     |
| Architecture  | MSSC       | ILO+DJM    | ILO        | DCA+ILO    | ILO+PI     |
| Data rate     | 5.6 Gb/s   | 9.6 Gb/s   | 7.4 Gb/s   | 12 Gb/s    | 9.8 Gb/s   |
| Clocking      | 1/2 rate   | 1/2 rate   | 1/2 rate   | 1/4 rate   | Full rate  |
| Power (mW)    | 13.5       | 11.8       | 6.8        | 11         | 5~6.5      |
| FoM (mW/Gbps) | 2.4        | 1.22       | 0.92       | 0.917      | 0.65       |

\*Supply voltages: 1 V for sampler and buffer, and 0.8 V for ILOs

Lastly, Table I compares recently published FC receivers. The proposed FC receiver achieves a data rate of 9.8 Gbps and the highest energy efficiency, 0.65 mW/Gbps, with a full-rate architecture.
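The figure of merit used in Table I is the power divided by the data rate. The sketch below recomputes it from the power and data-rate rows of the table; using the upper end of the 5~6.5 mW range for this work is an assumption, so small differences from the tabulated values can arise from rounding or from the operating point chosen.

```python
# Power (mW) and data rate (Gb/s) as listed in Table I; for this work the
# upper end of the 5~6.5 mW range is used (an assumption for illustration).
receivers = {
    "[1]": (13.5, 5.6),
    "[2]": (11.8, 9.6),
    "[3]": (6.8, 7.4),
    "[4]": (11.0, 12.0),
    "This work": (6.5, 9.8),
}


def fom_mw_per_gbps(power_mw, rate_gbps):
    """Energy efficiency: power divided by data rate (lower is better)."""
    return power_mw / rate_gbps


fom = {name: fom_mw_per_gbps(p, r) for name, (p, r) in receivers.items()}
for name, value in fom.items():
    print(f"{name}: {value:.2f} mW/Gbps")

best = min(fom, key=fom.get)
print("lowest energy per bit:", best)
```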

## V. CONCLUSION

This paper presents an FC receiver that uses an equalized current sampler for data recovery and phase interpolation for clock recovery, implemented in 65 nm CMOS. The current sampler merges the CTLE function, with 18 dB gain at 10 GHz, and achieves 10 GSps sampling speed with 20 GHz bandwidth. Moreover, the PI provides 0-360° clock deskew. The measurement results show a data rate of up to 9.8 Gbps with an energy efficiency of 0.65 mW/Gbps.

#### Acknowledgement

The authors acknowledge the support from MediaTek for the UMC 65nm CMOS tape-out.

#### References

- [1] J. Zerbe et al., "A 5.6Gb/s 2.4mW/Gb/s Bidirectional Link With 8ns Power-On," *VLSI-Symp Circuits*, pp. 82-83, Jun. 2011.
- [2] S. H. Chung et al., "1.22 mW/Gb/s 9.6 Gb/s data jitter mixing forwarded-clock receiver robust against power noise with 1.92 ns latency mismatch between data and clock in 65nm CMOS," *VLSI-Symp Circuits*, pp. 144-145, Jun. 2011.
- [3] M. Hossain et al., "A 6.8mW 7.4Gb/s Clock-Forwarded Receiver with up to 300MHz Jitter Tracking in 65nm CMOS," *ISSCC Dig. Tech. Papers*, pp. 158-159, Feb. 2010.
- [4] Y. J. Kim, "A 12Gb/s 0.92 mW/Gb/s forwarded clock receiver based on ILO with 60MHz jitter tracking bandwidth variation using duty cycle adjuster in 65nm CMOS," *VLSI-Symp Circuits*, pp. 236-237, Jun. 2013.
- [5] K. Hu et al., "A 0.6 mW/Gb/s, 6.4–7.2 Gb/s serial link receiver using local injection-locked ring oscillators in 90 nm CMOS," *IEEE J. Solid State Circuits*, vol. 45, no. 4, pp. 899–908, Apr. 2010.
- [6] K. Hu et al., "0.16-0.25 pJ/bit, 8 Gb/s Near-Threshold Serial Link Receiver With Super-Harmonic Injection-Locking," *IEEE J. Solid State Circuits*, vol. 47, no. 8, pp. 619–728, Aug. 2010.
- [7] B. Casper and F. O'Mahony, "Clocking analysis, implementation and measurement techniques for high-speed data links - A tutorial," *IEEE TCAS-I*, vol. 56, no. 1, pp. 17–39, Jan. 2009.
- [8] M. Meghelli et al., "A 10 Gb/s 5-Tap-DFE/4-Tap-FFE transceiver in 90 nm CMOS," *ISSCC Dig. Tech. Papers*, pp. 80–81, Feb. 2006.
- [9] S. Gondi, J. Lee, and B. Razavi, "A 10 Gb/s CMOS adaptive equalizer for backplane applications," *ISSCC Dig. Tech. Papers*, pp. 328–329, Feb. 2005.
- [10] J. Lee, J. Weiner, and Y. K. Chen, "A 20-GS/s 5-b SiGe ADC for 40-Gb/s coherent optical links," *IEEE Trans. Circuits and Systems I*, vol. 57, pp. 65-74, Oct. 2010.
- [11] S. Yamanaka, K. Sano, and K. Murata, "A 20-GS/s Track-and-hold Amplifier in InP HBT Technology," *IEEE Trans. Microwave Theory & Tech.*, vol. 58, pp. 2334-2339, Sept. 2010.
- [12] S. L. Ma, J. C. Wang, H. Yu, and J. Y. Ren, "A 32.5-GS/s two-channel time-interleaved CMOS sampler with switched-source follower based track-and-hold amplifier," in *Int. Microw. Symp. Dig.*, Jun. 2014, vol. 10, pp. 1-3.