Title: IBM Research GmbH Zurich Research Laboratory R
1 IBM Research GmbH, Zurich Research Laboratory, Rüschlikon, Switzerland
Design Techniques for Ultra-Low-Power and Compact Transceivers in CMOS
- Thomas Toifl, Christian Menolfi, Marcel Kossel, Matthias Brändli, Thomas Morf, Peter Buchmann, Martin Schmatz
- Feb 19th, 2009
2 Comparison of High-speed Serial to Wired Ethernet
3 Low-power Design Areas
- Clock generation for CDR
- Transmitter
- Sampling latch
- Input data path: buffer / amplifier / signal adder
- DFE architecture
4 Outline: Low-power techniques
- Transmitter Architecture
- CML vs. SST
- Receiver data path
- Sampling Latch
- Sub-rate Processing
- Sampling in Data Path
- Receiver clock path
- RX Clock Generation with P-PLL
- DFE Architecture
- Integrating DFE
- Switched-cap DFE
- Conclusions
5 Transmitter Architecture
- CML vs. Source-Series Terminated (SST)
- SST driver example
6 CML vs. SST driver
CML driver:
- 1 Vppd swing
- Load current ±5 mA
- Total current 20 mA
SST driver:
- 1 Vppd swing
- Load current ±5 mA
- Total current 5 mA
- Power is proportional to output swing in both cases
- SST driver allows different termination options (differential, to GND, to VDD)
[1] C. Menolfi et al., "A 16Gb/s Source-Series Terminated Transmitter in 65nm CMOS SOI," ISSCC 2007.
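The 20 mA and 5 mA figures can be reproduced with a short calculation. The sketch below assumes 50 Ω single-ended terminations (100 Ω differential at the far end), which the slide does not state; the function names are illustrative only.

```python
def cml_tail_current(vppd, z0=50.0):
    """Tail current of a double-terminated CML driver (assumed z0 loads).

    Each output sees z0 at the driver in parallel with z0 at the far end,
    i.e. z0/2, so the differential peak-to-peak swing is
    vppd = 2 * I_tail * (z0 / 2) = I_tail * z0.
    """
    return vppd / z0


def sst_supply_current(vppd, z0=50.0):
    """Supply current of an SST driver into a 2*z0 differential termination.

    The current flows through z0 (pull-up branch), 2*z0 (far-end load) and
    z0 (pull-down branch). The load sees +/- vppd/2, so I = (vppd/2)/(2*z0).
    """
    return (vppd / 2.0) / (2.0 * z0)


if __name__ == "__main__":
    swing = 1.0  # V peak-to-peak differential, as on the slide
    print(f"CML total current: {cml_tail_current(swing) * 1e3:.0f} mA")
    print(f"SST total current: {sst_supply_current(swing) * 1e3:.0f} mA")
```

In both cases the current scales linearly with the swing, which is the first bullet of the comparison; the factor of four comes from the CML driver burning current in its own load resistors.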
7 Half-rate SST Driver
- Transistors are small due to large gate overdrive
- Small input capacitance → low power in pre-driver
[1] C. Menolfi et al., "A 16Gb/s Source-Series Terminated Transmitter in 65nm CMOS SOI," ISSCC 2007.
8 Power Consumption
- 3.6 mW/Gbps at 16 Gbps and 1 V swing
- Power proportional to data rate due to full-swing CMOS clocking
[1] C. Menolfi et al., "A 16Gb/s Source-Series Terminated Transmitter in 65nm CMOS SOI," ISSCC 2007.
9 Low-power Receiver Data Path
- Sampling Latch
- Sub-rate Processing
- Sampling in Data Path
10 Sampling Latch
[Figure [2]: power x delay vs. amplification for SPA (single-pole amp), OMA (optimized multi-pole amp), and RSA (regenerative amp)]
- Energy/bit = Power x Delay
- Regenerative amplification most power efficient
[2] J. Wu and B. Wooley, "A 100-MHz Pipelined CMOS Comparator," JSSC 1988.
11 CML vs. DCVS latch
- For Vdd = 1 V, transistors are in a similar operating point (Vgs = Vds = 0.5 V)
- CML latch achieves smallest τi when R = 2/gmn
- DCVS latch intrinsically faster
- P/N ratio was > 2, now approaching 1 due to strain engineering
12 Low-power Receiver Data Path
- Sampling Latch
- Sub-rate Processing
- Sampling in Data Path
13 Sub-rate processing
[Figure: energy/bit vs. τ/τi; going to quarter rate gives τ → 4τ and Tcycle → 4Tcycle]
- τi: intrinsic time constant of latch (CL = 0)
- τ: required time constant, τ = τi (1 + CL/Ci)
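The relation τ = τi (1 + CL/Ci) can be inverted to see why a relaxed time constant saves energy: Ci = CL / (τ/τi - 1). The sketch below is a minimal illustration; treating the switched capacitance Ci + CL as a proxy for energy per bit is an assumption made here, not a formula from the slide.

```python
def required_input_cap(c_load, tau_ratio):
    """Latch capacitance Ci that satisfies tau = tau_i * (1 + CL/Ci).

    tau_ratio is tau / tau_i and must exceed 1, because tau_i itself is
    only reached for CL = 0.
    """
    if tau_ratio <= 1.0:
        raise ValueError("tau/tau_i must exceed 1 for a non-zero load")
    return c_load / (tau_ratio - 1.0)


def switched_cap_per_decision(c_load, tau_ratio):
    """Ci + CL, used here as an assumed proxy for energy per bit."""
    return required_input_cap(c_load, tau_ratio) + c_load


if __name__ == "__main__":
    c_load = 1.0  # normalized load
    for ratio in (1.5, 6.0):  # e.g. full rate vs. a 4x relaxed time constant
        total = switched_cap_per_decision(c_load, ratio)
        print(f"tau/tau_i = {ratio}: Ci + CL = {total:.2f} x CL")
```

With these example ratios, the latch running close to its intrinsic speed needs Ci = 2·CL, while the relaxed one needs only Ci = 0.2·CL.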
14 Low-power Receiver Data Path Optimization
- Sampling Latch
- Sub-rate Processing
- Sampling in Data Path
15 Sampling in Data Path
- Sampling is done by T/H at the input
- Total input capacitance is N·Cs/2
- Advantages
- - Signal processing now in discrete time domain
- → Reduced bandwidth requirements due to sub-rate
- → Can use reset to erase history
- → Buffers can use incomplete settling or integration
16 Incomplete Settling → Integrating buffer
[Figure [9]: noise and power (arbitrary units) vs. normalized settling time ts/τ, with ts = Tcycle/2 and τ = RL·CL; the limit RL → ∞, ts/τ → 0 is the integrator]
- For given CL, settling time ts = Tcycle/2, and gm/I = 2/Vdsat
- τ = RL·CL is varied
- Power can be reduced significantly for ts/τ → 0 (RL → ∞, integrator)
- But: noise rises significantly when ts/τ < 1.5
- Can drive higher load at same noise with lower power
[9] E. Iroaga and B. Murmann, "A 12-Bit 75-MS/s Pipelined ADC Using Incomplete Settling," JSSC 2007.
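The power trend can be illustrated with a first-order model. The sketch assumes the buffer must reach a fixed gain G at the sampling instant ts while driving a fixed CL: G = gm·RL·(1 - exp(-ts/τ)) with τ = RL·CL. Solving for gm gives gm = (G·CL/ts) · x / (1 - exp(-x)) with x = ts/τ, and at fixed gm/I the power scales with gm. This model and the function name are assumptions made for illustration; noise is not modeled.

```python
import math


def normalized_power(x):
    """Relative gm (hence power at fixed gm/I) as a function of x = ts/tau.

    The result is normalized to the integrator limit G*CL/ts, which is
    approached for x -> 0 (RL -> infinity).
    """
    if x < 0:
        raise ValueError("ts/tau must be non-negative")
    if x == 0:
        return 1.0
    # -expm1(-x) equals 1 - exp(-x) and stays accurate for small x.
    return x / -math.expm1(-x)


if __name__ == "__main__":
    for x in (0.0, 0.5, 1.5, 3.0, 5.0):
        print(f"ts/tau = {x:3.1f}: relative power = {normalized_power(x):.2f}")
```

In this model the power grows roughly linearly with ts/τ once the buffer settles almost completely, so the integrating limit is the cheapest way to obtain a given gain at the sampling instant.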
17 Low-power RX clock generation
- Clock generation
- Phase-programmable PLL (P-PLL)
- Design example: 40 Gbit/s RX
18 Quarter-rate Dual-loop architecture
Previous solution:
- Phase rotators
- - Area
- - Power
- - Mismatch
19 Quarter-rate Dual-loop architecture → Phase-Programmable PLL (P-PLL)
20 CDR Architecture with Phase-Programmable PLL
- P-PLL: 288 steps / 4 UI
[19] T. Toifl et al., "A 72 mW 0.03 mm2 inductorless 40 Gb/s CDR in 65 nm SOI CMOS," ISSCC 2007.
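As a quick check of what "288 steps / 4 UI" means in time, the sketch below converts the step count into a phase resolution. The 40 Gb/s data rate is taken from the design example; the function name is illustrative.

```python
def phase_step_ps(data_rate_gbps, steps=288, ui_span=4):
    """Phase resolution in ps when `steps` settings cover `ui_span` UI."""
    ui_ps = 1000.0 / data_rate_gbps  # one unit interval in ps
    return ui_span * ui_ps / steps


if __name__ == "__main__":
    rate = 40.0  # Gb/s
    print(f"UI = {1000.0 / rate:.1f} ps, steps per UI = {288 // 4}")
    print(f"phase step = {phase_step_ps(rate):.3f} ps")
```

At quarter rate, 4 UI is one period of the 10 GHz clock, so the 288 steps span a full 360º of the clock.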
21 P-PLL Key points
- Advantages
- - Clocks from VCO go directly into the latches
- - No need for phase rotators
- - Clock path is very short
- - Phase rotation with XOR phase detectors is inherently linear
- - High-frequency noise on input clock signal is filtered out
- Disadvantages
- - Phase noise is accumulated due to PLL operation
- - But: PLL bandwidth is extremely high (> 1 GHz for 10 GHz clock)
- → Noise is attenuated to a large extent
- - Phase rotation now in feedback path
- - But: no influence on CDR due to high PLL bandwidth
22 40 Gb/s Inductorless CDR circuit in 65nm CMOS
[Chip micrograph, 190 µm x 150 µm: samplers, demux, VCO, phase detectors, loop filter, CDR logic and PRBS15 checker (digital)]
- BER < 10^-12
- Area: 0.03 mm²
- Vamin: 350 mV
- 1.8 mW/Gbps at 40 Gbps
23 P-PLL based Receivers/Transceivers
[20] E. Prete et al., "A 100mW 9.6Gb/s Transceiver in 90nm CMOS for next-generation memory interfaces," ISSCC 2006.
[21] R. Palmer et al., "A 14mW 6.25Gb/s Transceiver in 90nm CMOS for Serial Chip-to-Chip Communication," ISSCC 2007.
[19] T. Toifl et al., "A 72 mW 0.03 mm2 inductorless 40 Gb/s CDR in 65 nm SOI CMOS," ISSCC 2007.
24 DFE Architecture
- Low-power Options
- Integrating DFE
- Switched-cap DFE
- Design Example
- 8 Channel bus with X-Talk Cancellation
25 Current-Integrating DFE
Resistors replaced with resettable capacitors
- → Integrating buffer (see slide 16)
- Power advantage due to integrating buffer/adder
- 1.4 mW/Gbps for 2 taps, 90nm CMOS
[15] M. Park, J. Bulzacchelli, M. Beakes, D. Friedman, "A 7 Gb/s 9.3 mW 2-Tap Current-Integrating DFE Receiver," ISSCC 2007.
26 Switched-cap DFE
Switched-cap DFE operation
- Integrating buffer (as previous slide)
- DFE feedback signal added by charge injection
27 Switched-cap approach for filter implementation
[Figure: one filter tap built from binary-weighted linear caps (cap-DAC); N-tap filter (e.g. N = 64); select transistors are constantly on or off, to reduce the cap load of unused taps]
- DFE/X-DFE implemented by digitally weighted metal capacitances → highly linear
- Transistors are used as switches (no worry about gm, gds) → easily portable
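A minimal sketch of how a binary-weighted cap-DAC tap could set its weight is shown below. The charge-sharing formula ΔV = ±Vstep · Ctap / (Ctap + Cnode), the 4-bit width, and all names are assumptions for illustration; the slides do not give the circuit equations.

```python
def tap_capacitance(code, c_unit, bits=4):
    """Capacitance selected by a binary code from caps 1x, 2x, 4x, ... c_unit."""
    if not 0 <= code < (1 << bits):
        raise ValueError("code out of range for the given number of bits")
    return sum(((code >> b) & 1) * (1 << b) * c_unit for b in range(bits))


def injected_step(code, decision, v_step, c_unit, c_node, bits=4):
    """Voltage step on the summation node for one tap (assumed model).

    decision is the previous symbol (+1 or -1); c_node is all other
    capacitance on the summation node.
    """
    c_tap = tap_capacitance(code, c_unit, bits)
    return decision * v_step * c_tap / (c_tap + c_node)


if __name__ == "__main__":
    # Normalized example: unit cap 1, node cap 95, tap code 0b0101 = 5 units.
    print(injected_step(code=5, decision=-1, v_step=1.0, c_unit=1.0, c_node=95.0))
```

Because the weight is a ratio of capacitances, it does not depend on transistor gm or gds, which is the portability argument made on the slide.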
28 DFE Architecture
- Low-power Options
- Integrating DFE
- Switched-cap approach
- Design Example
- 8 Channel bus with X-Talk Cancellation
29 Motivation: Double data rate/pin in single-ended link
- Graph shows signal and noise with and without FEXT X-talk cancellation
- → Present single-ended links have a data rate limitation caused by X-talk
- → X-talk cancellation can double the data rate per pin
- But: needs to be low power, < 10 mW/Gbps for TX+RX pair
30 Elimination of Pre- and Post-cursor X-Talk Components
[Figure: X-talk pulse response with components x-2, x-1, x0, x1, x2, x3, x4]
- Pre-cursor and cursor X-talk components can only be eliminated by X-FFE (since at time t0 we do not yet know the symbol decisions of the aggressors at RX)
31 With 5x3-tap X-FFE filter at each TX
[Figure: X-talk pulse response with components x-2, x-1, x0, x1, x2, x3, x4]
- Pre-cursor and cursor noise terms are minimized (See Ref 1)
- Post-cursor X-talk noise can be eliminated with cross-DFE
[31] V. Stojanovic et al., "Transmit Pre-emphasis for High-Speed Time-Division-Multiplexed Serial-Link Transceiver," IEEE Trans. Comm. 2001
32 FEXT cancellation with X-FFE and X-DFE
[Block diagram: TX with FFE and X-FFE, channel, RX with DFE and X-DFE]
TX:
- 4-tap Feed-Forward Equalizer (FFE)
- 7x1-tap X-FFE for X-talk cancellation
- FFE implemented with digital adder
- < 1 mW/Gbps for 12 taps
RX:
- 8-tap Decision-Feedback Equalizer (DFE)
- 7x8-tap X-DFE for X-talk cancellation
- DFE using switched-cap concept
- 3 mW/Gbps for 64 taps
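The receive-side idea can be sketched in a few lines: post-cursor ISI from the victim and post-cursor X-talk from each aggressor are predicted from past decisions and subtracted before slicing. This is a behavioral illustration with made-up tap values and names, not the switched-cap implementation.

```python
def dfe_correct(sample, victim_history, aggressor_histories, dfe_taps, xdfe_taps):
    """Subtract post-cursor ISI and X-talk predicted from past decisions.

    Histories hold past symbols (+1/-1), most recent first. dfe_taps are the
    victim's own post-cursor weights; xdfe_taps holds one weight list per
    aggressor, in the same order as aggressor_histories.
    """
    correction = sum(w * d for w, d in zip(dfe_taps, victim_history))
    for taps, history in zip(xdfe_taps, aggressor_histories):
        correction += sum(w * d for w, d in zip(taps, history))
    return sample - correction


def decide(value):
    """Binary slicer."""
    return 1 if value >= 0 else -1


if __name__ == "__main__":
    # Hypothetical numbers: cursor 0.2, one ISI tap 0.3, one X-talk tap 0.1.
    sent, prev_victim, prev_aggressor = 1, -1, -1
    sample = 0.2 * sent + 0.3 * prev_victim + 0.1 * prev_aggressor
    fixed = dfe_correct(sample, [prev_victim], [[prev_aggressor]], [0.3], [[0.1]])
    print(decide(sample), decide(fixed))
```

In this example the raw sample is sliced to the wrong symbol, while the corrected sample recovers the transmitted one. Pre-cursor and cursor X-talk cannot be handled this way, since the aggressor decisions are not yet available; that part is left to the transmit-side X-FFE.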
33 8-channel Receiver with DFE and X-talk cancellation
[Layout figure, 400 µm scale: one equalizer tap (out of 8x8 per RX) and the data bus]
- Each RX slice implements 8 DFE taps and 7x8 X-DFE taps for X-talk cancellation
- Total 64 taps/RX; each tap has local memory; taps connected by data bus
- Metal wire serves as integration/summation node
- Power consumption (simulated): 3 mW/Gbps for 64 taps → about 50 µW/(Gbps·tap)
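The efficiency figures quoted throughout the talk are related by simple arithmetic (1 mW/Gbps is the same as 1 pJ/bit). The helper names below are illustrative.

```python
def power_mw(efficiency_mw_per_gbps, data_rate_gbps):
    """Total power from an efficiency figure and a data rate."""
    return efficiency_mw_per_gbps * data_rate_gbps


def per_tap_uw_per_gbps(efficiency_mw_per_gbps, taps):
    """Efficiency per equalizer tap, in uW/(Gbps*tap)."""
    return 1000.0 * efficiency_mw_per_gbps / taps


if __name__ == "__main__":
    print(f"64-tap DFE: {per_tap_uw_per_gbps(3.0, 64):.1f} uW/(Gbps*tap)")
    print(f"40 Gb/s CDR at 1.8 mW/Gbps: {power_mw(1.8, 40.0):.0f} mW")
    print(f"16 Gb/s SST TX at 3.6 mW/Gbps: {power_mw(3.6, 16.0):.1f} mW")
```

The per-tap value comes out slightly below the rounded 50 µW/(Gbps·tap) on the slide, and the CDR result matches the 72 mW in the title of [19].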
34 Conclusions
- Low power/small area serial links achieved with
- - SST driver
- - DCVS latch, full swing CMOS clocking
- - Sub-rate receivers
- - Sampled data path with buffer reset
- - Integrating/Incomplete settling buffer
- - P-PLL multi-phase clock generation
- - Integrating and switched-cap DFE
- → Complex equalization (> 50 taps) at low power possible
35 Acknowledgements
- Christian Menolfi
- Peter Buchmann
- Christoph Hagleitner
- Marcel Kossel
- Thomas Morf
- Jonas Weiss
- John Bulzacchelli
- Mounir Meghelli
- Matthias Brändli
- Martin Schmatz
36 Thank you!
37 Appendix
38 Sampling latch
Additional slides
39 Sampling Latch
- Latch defines requirements of whole data path: input cap, clocking, offset, noise
- Two functions to be separated
- - Sampling
- - Defined by shape of sampling window
- → Sensitivity function hs(t)
- - Regenerative amplification
- - Defined by time constant τreg
- - Gain ∝ exp(teval/τreg)
40 Latch Model
- Latch = time-variant regenerative amplifier
- - Linear front-end defines sampling window
- - Ideal sampler
- Regenerative amplification
- Threshold comparison to indicate valid 1 or -1
- Feedback tap captures history
41 Sensitivity Function hs(t): Evaluation Procedure
teval = tC→Q
For every time delay Δt:
- Determine the minimum height A(Δt) of a narrow (e.g. 2 ps) pulse required to flip the latch within the required evaluation time teval
- → Can define sensitivity function hs(Δt) = 1/A(Δt)
[3] T. Toifl et al., "A 22-Gb/s PAM-4 receiver in 90-nm CMOS SOI technology," JSSC 2006.
42 Latch Sensitivity Function
[Figure: sensitivity hs(Δt) in 1/(ps·V), on linear and logarithmic scales, vs. time offset from clock edge (ps), for teval = 60 ps and 120 ps; annotations: tw, 5 ps, f-3dB = 40 GHz]
- Sensitivity function hs(Δt) = 1/A(Δt)
- Displayed for different teval = tC→Q (from 60 to 120 ps)
- Defines aperture window of sampler relative to clock edge
- Can calculate -3 dB sampling bandwidth from F(jω) = FFT{hs(t)}
- More accurate than "setup and hold time" description
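The bandwidth calculation can be sketched without any external library. The example evaluates the Fourier sum of a sampled window directly and scans for the -3 dB point. The Gaussian window is only a hypothetical stand-in for a simulated hs(t), and its width is an assumed value, so the printed number is not meant to reproduce the 40 GHz on the slide.

```python
import cmath
import math


def gaussian_window(sigma, dt, span=5.0):
    """Hypothetical stand-in for hs(t): a Gaussian sampled every dt."""
    half = round(span * sigma / dt)
    return [math.exp(-0.5 * (n * dt / sigma) ** 2) for n in range(-half, half + 1)]


def spectrum_magnitude(hs, dt, f):
    """|F(j*2*pi*f)| of the sampled window, by direct Fourier summation."""
    acc = sum(h * cmath.exp(-2j * math.pi * f * n * dt) for n, h in enumerate(hs))
    return abs(acc) * dt


def bandwidth_3db(hs, dt, f_step=1e8, f_max=1e12):
    """First frequency on the f_step grid where |F| drops below |F(0)|/sqrt(2)."""
    limit = spectrum_magnitude(hs, dt, 0.0) / math.sqrt(2.0)
    f = f_step
    while f <= f_max:
        if spectrum_magnitude(hs, dt, f) < limit:
            return f
        f += f_step
    return None  # no crossing found below f_max


if __name__ == "__main__":
    sigma, dt = 3e-12, 0.1e-12  # assumed values, not taken from the slides
    bw = bandwidth_3db(gaussian_window(sigma, dt), dt)
    print(f"-3 dB bandwidth: {bw / 1e9:.1f} GHz")
```

For a real latch, `gaussian_window` would be replaced by the simulated samples of hs(Δt) from the evaluation procedure.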
43 Energy Consumption Limit in Pre-charged Latch
- Nodes vo, vob precharged to Vdd
- One node (vo or vob) discharged to ground
- → Minimum achievable energy consumption: Vdd² · CL
44 Energy consumption in DCVS (Sense Amplifier) Latch
[Figure: vo, vob (V) and short-circuit current (A) vs. time, with short-circuit duration tsc]
- Capacitive load (intrinsic plus external load)
- - Ecap = 1.5 Vdd² (Ci + CL) + Vdd² Ca
- Short-circuit current Isc
- - Duration depends on the differential input signal vx: tsc = τ ln(voff / vx)
- - Average over uniform distribution: <tsc> = τ
- - Esc = Vdd · Isc · τ
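The averaging step can be checked numerically. With tsc = τ ln(voff/vx) and vx uniformly distributed between 0 and voff, the mean is τ times the integral of -ln(u) over (0, 1), which equals 1. The sketch below confirms this with a midpoint sum; the function name and the numeric values are illustrative.

```python
import math


def mean_short_circuit_time(tau, v_off, n=100000):
    """Midpoint-rule average of tau * ln(v_off / vx) for vx uniform in (0, v_off)."""
    total = 0.0
    for k in range(n):
        vx = (k + 0.5) * v_off / n
        total += tau * math.log(v_off / vx)
    return total / n


if __name__ == "__main__":
    tau = 10e-12  # example value
    avg = mean_short_circuit_time(tau, v_off=0.1)
    print(f"<tsc> / tau = {avg / tau:.4f}")
```

The result does not depend on voff, which is why the short-circuit energy reduces to Vdd · Isc · τ.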
45 CML vs. DCVS Latch: Energy Consumption
[Figure: energy / (Vdd² · CL) vs. τreg (ps); curves CMLspeed, CMLoptim, DCVS]
- CML latch optimized for speed or power
- DCVS latch 2x as power efficient at Tcycle = 100 ps
46 CML vs. DCVS Latch Including Clock Power
[Figure: energy / (Vdd² · CL) vs. τreg (ps); curves CML, CMLoptim, DCVS]
- DCVS about 1.8x as power efficient when regeneration time considered
- CML latch can use inductive peaking
- - Speedup 1.7
- - But: inductors need large area
47 CML vs. DCVS Latch with DCVS initial delay
[Figure: energy / (Vdd² · CL) vs. Tcycle (ps); curves CML, CMLoptim, DCVS]
- Comparison for given Tregeneration = Tcycle/2 and amplification A = exp(5)
- DCVS latch needs initial time to start latching
- Optimized CML latch consumes 50% more power at f = 5 GHz
- CML requires additional circuitry (e.g. DCVS latch to convert to CMOS levels)
48 CML vs. DCVS latch: Power Consumption
- CML latch energy/bit proportional to Tcycle/τ.
- DCVS latch energy/bit independent of Tcycle/τ.
- Latch gain: log(A) = Tcycle/(2τ) for regeneration time Tcycle/2.
- Have to specify required gain A for comparison.
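The gain relation for a regeneration time of Tcycle/2, log(A) = Tcycle/(2τ), fixes the time constant the latch must reach once a gain is specified. A small sketch, using the A = exp(5) from the comparison slide and a 100 ps cycle as the example (function names are illustrative):

```python
import math


def latch_gain(t_regen, tau):
    """Regenerative gain after t_regen for time constant tau."""
    return math.exp(t_regen / tau)


def required_tau(t_cycle, gain):
    """Time constant needed to reach `gain` within t_cycle / 2."""
    return t_cycle / (2.0 * math.log(gain))


if __name__ == "__main__":
    t_cycle = 100e-12
    tau = required_tau(t_cycle, math.exp(5))
    print(f"tau = {tau * 1e12:.1f} ps, Tcycle/tau = {t_cycle / tau:.0f}")
```

For these numbers the required τ is 10 ps, i.e. Tcycle/τ = 10.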
49 Sub-rate Processing
[Figure: full-rate receiver (N = 1, clock f) vs. quarter-rate receiver (N = 4, clock f/4)]
- Frequency is lowered by factor N
- - Lowers speed requirements in data path and latches
- - Allows trade-off between area and power
50 Sub-rate processing
[Figure: input at rate f sampled by latches clocked at f/4]
- Inherent de-multiplexing to lower speed
- → No additional circuitry needed
- Disadvantage
- - Higher input load
- - More area for latches/buffers
51 Clocking Style: CML vs. full-swing CMOS
Additional slides
52 Clocking style: CML vs. Full-swing CMOS?
[Figure: CML buffer and full-swing CMOS buffer]
- Buffer driving identical buffer and load capacitance CL.
- Regulator needed for full-swing CMOS clocking style.
- Which option results in smaller power consumption?
53 Clocking style: CML vs. Full-swing CMOS?
- Specification
- - trf-20-80 16 Tcycle over process
- CML
- - optimize W, ΔV = I·R
- - DC gain = 1.4 in wc corner
- - ΔVmin = 250 mV
- → 400 mV single-ended swing in wc corner
- FS-CMOS
- - optimize W, Vreg
- - 0.3 V regulator overhead
54 Delay, Rise Time and Energy Consumption
[Figure: delay, rise time, and energy consumption for CML and full-swing CMOS]
55 Clocking style: CML vs. Full-swing CMOS?
CML
- optimize W, ΔV = I·R
- ΔVmin = 250 mV
- Gain A = 1.4 in wc corner
- Tcycle/τ = 10
- trf-20-80 16 tcycle
FS-CMOS
- optimize W, Vreg such that trf-20-80 16 tcycle in wc corner
- 0.3 V regulator overhead
- Vreg,max = 1 V, Vdd,max = 1.3 V
56 Supply Voltages for CML vs. FS-CMOS
[Figure, two plots vs. cycle time (ps): voltage (V) showing Vdd,fs, Vreg,fs (0.3 V below Vdd,fs), Vdd,cml, and ΔVcml; energy / (C · (1 V)²) for CML and FS-CMOS]
- Vreg,fs (dotted red line): required regulated supply voltage for full-swing CMOS at given cycle time in worst-case corner. Vdd,fs = Vreg,fs + 0.3 V.
- ΔVcml (dotted blue line) = I·R. Voltage swing in CML buffer = 2·ΔVcml. Higher values of ΔVcml require higher values of Vdd,cml to avoid that the current source goes into saturation.
- Graph reading example: for a given cycle time of 100 ps, the optimized full-swing CMOS buffer requires Vreg,fs of 0.97 V and Vdd,fs of 1.27 V in the wc corner. In typical corner, Vreg is derived from the same 1.27 V supply.
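The graph-reading example can be turned into numbers. The sketch assumes a linear (series) regulator, so that the charge C·Vreg needed per clock cycle is drawn from the unregulated supply Vdd,fs = Vreg,fs + 0.3 V; that assumption and the function names are not from the slides.

```python
def fs_cmos_supply(v_reg, overhead=0.3):
    """Unregulated supply needed for a regulated voltage v_reg."""
    return v_reg + overhead


def fs_cmos_energy_norm(v_reg, overhead=0.3):
    """Energy per cycle, normalized to C * (1 V)^2.

    Assumes a linear regulator: charge C * v_reg is taken from Vdd,fs once
    per cycle, so E = C * v_reg * Vdd,fs.
    """
    return v_reg * fs_cmos_supply(v_reg, overhead)


if __name__ == "__main__":
    v_reg = 0.97  # V, worst-case corner at 100 ps cycle time (from the slide)
    print(f"Vdd,fs = {fs_cmos_supply(v_reg):.2f} V")
    print(f"energy = {fs_cmos_energy_norm(v_reg):.3f} x C(1V)^2")
```

Under this assumption, the regulator overhead costs about 30% extra energy compared with running the buffer directly from 0.97 V.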
57 P-PLL Implementation
Additional slides
58 Phase-Programmable PLL (P-PLL)
e.g. a0 = 0.3, a1 = 0.7
[18] T. Toifl et al., "0.94 ps-rms-jitter 0.016 mm2 2.5 GHz multi-phase generator PLL with 360º digitally programmable phase shift for 10 Gb/s serial links," ISSCC 2005.
59 XOR Phase Detector Implementation I
From [18]; for an alternative implementation see [22].
60 XOR Phase Detector Implementation II
61 XOR Phase Detector Implementation III
62 P-PLL based RX architecture: Clock generation for DFE
[Block diagram: data input, FWD CLK input, data sampler (incl. spec. DFE), edge sampler, early/late logic (incl. pattern filter), CDR loop filter, PD interpol control, PD interpolator, PD array, CP, loop filter, VCO control, multi-phase ring oscillator, phase shifter, data output]
- Phase shifter to adjust for non-180º D-E timing difference in DFE operation
63 Solution with 2 Phase Rotators (D and E) Suffers from Phase Rotator Integral Non-Linearity (INL)
- The phase difference Δφ = (φd - φe) depends on the absolute phase position.
- This is not the case for receivers based on P-PLL, since the offset introduced by the phase shifter is static and does not depend on the absolute phase position.
64 Low-power RX clock generation
- Clock generation for quarter-rate system
- Phase-programmable PLL (P-PLL)
- Design example: 40 Gbit/s RX
65 Quarter-rate Clock and Data Recovery
66 Low-power RX design example
- 40 Gbit/s CDR circuit
- Full-swing CMOS clocking
- Low-power DCVS latches
- Quarter rate
- NFET-switch based fast T/H
67 High-Speed Sampling Stage
Sampling clock: ΔV = 0.9 V, tf < 8 ps
- Sampling done with NFET T/H
- Sampling clock is tightly coupled to VCO
- Achieves large swing, fast fall time
- FSC (Full-Swing Converter): converts clock from regulated supply to Vdd domain.
68 Data Path Sensitivity Function
[Figure: sensitivity hs(Δt) in 1/(ps·V) vs. time offset from clock edge (ps), about 5 ps wide; its FFT as attenuation (dB) vs. frequency (GHz), with f-3dB = 40 GHz]
- Evaluated with procedure described on slides 41-42
- Includes T/H and latch
- Integration window 5 ps wide → corresponds to 40 GHz bandwidth
69 VCO Layout Optimization for Max Speed
- Delay stage n feeds stages n+1 and n+2
- How to achieve shortest wires?
- Shown arrangement (0,1,2,7,6,3,4,5) has wire length of 4 units
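The wire-length claim can be checked with a small script. The sketch assumes the eight delay stages sit in a single row at unit pitch, so that a wire between two stages has a length equal to their position difference; this placement model is an assumption made here, chosen because it reproduces the 4 units stated on the slide.

```python
def max_wire_length(order, fanout=(1, 2)):
    """Longest connection when stage n feeds stages n+1 and n+2 (mod N).

    `order` lists the stage numbers in their physical left-to-right order.
    """
    position = {stage: i for i, stage in enumerate(order)}
    n = len(order)
    return max(
        abs(position[s] - position[(s + k) % n]) for s in order for k in fanout
    )


if __name__ == "__main__":
    print(max_wire_length((0, 1, 2, 7, 6, 3, 4, 5)))  # arrangement from the slide
    print(max_wire_length(tuple(range(8))))           # naive ordering
```

The interleaved arrangement limits the longest wire to 4 units, whereas placing the stages in numerical order requires a wire of 7 units to close the ring.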
70 40 Gbit/s Input Eye
[Eye diagram: 70 mV/div, 5 ps/div]
- Eye opening: 19 ps horizontal, 350 mVdiff-pp vertical
- PRBS 15
- Δf = 400 ppm
- BER = 10^-12
- Measured eye width inside receiver: 11 ps (error-free range with phase fixed)
71 Implementation Summary
72 Comparison with Prior Art
[Figure: Tbps/mm2 and Tbps/W for this work [19] (CMOS, 40 Gb/s) compared with [23] (CMOS, 40 Gb/s), [24] (CMOS, 25 Gb/s), [25] (SiGe, 45 Gb/s), and [26] (CMOS, 44 Gb/s)]
73 CML Latch Speed-optimization
74 CML Latch Time Constants τlin and τi
- τlin: time constant during linear amplification. Defines sampling window hs(t).
- τreg: time constant during regenerative amplification.
where α = 1 - gdsn/gmn, gm = β·Vsat, I = β/2 · Vsat², ΔV = I0·R, Cn,lin = total effective cap from M1-M4 in linear amplification mode, Cn,reg = total effective cap from M1-M4 in regenerative amplification mode.
75 CML Latch Speed-optimization
(1)
(2)
where α = 1 - gdsn/gmn, gm = β·Vsat, I0 = β/2 · Vsat², ΔV = I0·R, Cn = total effective cap from M1-M4. Width of M1-M4 assumed equal.
From (2): for speed optimization, min τi(Vsat) requires Vsat ? ΔV.
- , where α = 0.9 was assumed.
- Note (1): It follows that for this case τlin = τreg.
- (2) Since it follows that a small τreg,min requires large ΔV.
76 CML Latch Power-optimization procedure
77 CML Latch Power-optimization procedure
Optimize Vsat and transistor W to achieve minimum energy/bit for required
- load CL
- total amplification time Ta (Tcycle/2)
- amplification
→ Need expressions for
- Energy(W, Vsat)
- Ta(W, Vsat)
Can optimize (numerically) Energy(W, Vsat) under constraint Ta(W, Vsat) = Tcycle/2
78 Energy (W, Vsat)
Energy(W, Vsat) = Power · Tcycle
79 Amplification time Ta (W, Vsat)
Two phases during regenerative amplification → two time constants
(1) Vout < Vsat: small-signal range, Idiff = gm·Vg,diff → τreg1
(2) Vout > Vsat: slewing range, Idiff = 2·I0 → τreg2
Ta = τreg1 · log(Vsat/(Alin·Vmin)) + τreg2 · log(VL/Vsat), where
- Alin: amplification during linear phase (see next slide)
- VL: required latch output voltage
- A: required latch gain after Ta (e.g. exp(5))
- Vmin = VL/A: minimum required input voltage to achieve VL
- C': total cap during regeneration per unit width
- β': transconductance parameter per unit width
(influence of gds here neglected for simplicity)
80 Equal time constants in Linear and Regenerative Phase
- For fair comparison want same time constant in linear phase as in regeneration phase: τlin = τreg1
- Otherwise input sampling bandwidth would suffer.
- Can be implemented with resistors switched on only during linear phase.
- Also lowers the gain during the linear phase.
- Can show that for τlin = τreg1
81 Switched-cap DFE Implementation
Additional slides
82 X-talk cancellation receiver: basic architecture
[Figure: victim and aggressors 1, 2, ..., n feeding a common summation node; the summation node is a distributed cap load on a wire pair (actually differential)]
83 References
Transmitter, Latch, Receiver data path
[1] C. Menolfi, T. Toifl, P. Buchmann, M. Kossel, T. Morf, J. Weiss, M. Schmatz, "A 16Gb/s Source-Series Terminated Transmitter in 65nm CMOS SOI," ISSCC Dig. Tech. Papers, pp. 446-447, Feb. 2007.
[2] J. Wu and B. Wooley, "A 100-MHz Pipelined CMOS Comparator," IEEE Journal of Solid-State Circuits, vol. 23, pp. 1379-1385, December 1988.
[3] T. Toifl, C. Menolfi, M. Ruegg, R. Reutemann, P. Buchmann, M. Kossel, T. Morf, J. Weiss, M. Schmatz, "A 22-Gb/s PAM-4 receiver in 90-nm CMOS SOI technology," IEEE Journal of Solid-State Circuits, vol. 41, pp. 954-965, April 2006.
[4] B. Wicht, T. Nirschl, D. Schmitt-Landsiedel, "Yield and Speed Optimization of a Latch-Type Voltage Sense Amplifier," IEEE Journal of Solid-State Circuits, vol. 39, pp. 1148-1158, July 2004.
[5] P. Heydari, R. Mohanavelu, "Design of Ultrahigh-Speed Low-Voltage CMOS CML Buffers and Latches," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 12, no. 10, October 2004.
[6] T. Chalvatzis, K.H.K. Yau, P. Schvan, M.T. Yang, S.P. Voinigescu, "A 40-Gb/s Decision Circuit in 90-nm CMOS," Proceedings of the 32nd European Solid-State Circuits Conference, vol. 32, pp. 512-515, September 2006.
[7] Y. Okaniwa, H. Tamura, M. Kibune, D. Yamazaki, T. Cheung, J. Ogawa, N. Tzartzanis, W. W. Walker, and T. Kuroda, "A 40-Gb/s CMOS clocked comparator with bandwidth modulation technique," IEEE Journal of Solid-State Circuits, vol. 40, pp. 1680-1687, August 2005.
[8] M. Choi, A. Abidi, "A 6-b 1.3-Gsample/s A/D converter in 0.35-um CMOS," IEEE Journal of Solid-State Circuits, vol. 36, pp. 1847-1858, Dec. 2001.
[9] E. Iroaga and B. Murmann, "A 12-Bit 75-MS/s Pipelined ADC Using Incomplete Settling," IEEE Journal of Solid-State Circuits, vol. 42, pp. 748-756, April 2007.
84 References
DFE Implementations
[10] R. Payne, B. Bhakta, S. Ramaswamy, S. Wu, J. Powers, P. Landman, U. Erdogan, A. Yee, R. Gu, L. Wu, B. Parthasarathy, K. Brouse, W. Mohammed, K. Heragu, V. Gupta, L. Dyson, W. Lee, "A 6.25Gb/s Binary Adaptive DFE with First Post-Cursor Tap Cancellation for Serial Backplane Communications," ISSCC Dig. Tech. Papers, pp. 68-69, Feb. 2005.
[11] B. Leibowitz, J. Kizer, H. Lee, F. Chen, A. Ho, M. Jeeradit, A. Bansal, T. Greer, S. Li, R. Farjad-Rad, W. Stonecypher, Y. Frans, B. Daly, F. Heaton, B. W. Garlepp, C. W. Werner, N. Nguyen, V. Stojanovic, J. L. Zerbe, "A 7.5 Gb/s 10-Tap DFE Receiver with First Tap Partial Response, Spectrally Gated Adaptation, and 2nd-Order Data Filtered CDR," ISSCC Dig. Tech. Papers, pp. 228-229, Feb. 2007.
[12] T. Beukema, M. Sorna, K. Selander, S. Zier, B. L. Ji, P. Murfet, J. Mason, W. Rhee, H. Ainspan, B. Parker, and M. Beakes, "A 6.4-Gb/s CMOS SerDes core with feed-forward and decision-feedback equalization," IEEE Journal of Solid-State Circuits, vol. 40, pp. 2633-2645, December 2005.
[13] J. F. Bulzacchelli, M. Meghelli, S. V. Rylov, W. Rhee, A. V. Rylyakov, H. A. Ainspan, B. D. Parker, M. P. Beakes, A. Chung, T. J. Beukema, P. K. Pepeljugoski, L. Shan, Y. H. Kwark, S. Gowda, and D. J. Friedman, "A 10-Gb/s 5-tap DFE/4-tap FFE transceiver in 90-nm CMOS technology," IEEE Journal of Solid-State Circuits, vol. 41, pp. 2885-2900, December 2006.
[14] A. Emami-Neyestanak, A. Varzaghani, J. Bulzacchelli, A., C. K. Ken Yang and D. Friedman, "A Low-Power Receiver with Switched-Capacitor Summation DFE," Symp. VLSI Circuits Dig. Tech. Papers, June 2006.
[15] M. Park, J. Bulzacchelli, M. Beakes, D. Friedman, "A 7Gb/s 9.3mW 2-Tap Current-Integrating DFE Receiver," ISSCC Dig. Tech. Papers, pp. 230-231, Feb. 2007.
[16] K. J. Wong, A. Rylyakov, and C. K. Ken Yang, "A 5-mW 6-Gb/s quarter-rate sampling receiver with a 2-tap DFE using soft decisions," Symp. VLSI Circuits Dig., pp. 190-191, June 2006.
[17] A. Garg, A. C. Carusone, and S. P. Voinigescu, "A 1-tap 40-Gb/s look-ahead decision feedback equalizer in 0.18-µm SiGe BiCMOS technology," IEEE Journal of Solid-State Circuits, vol. 41, pp. 2224-2232, October 2006.
85 References
P-PLL implementations
[18] T. Toifl, C. Menolfi, M. Ruegg, R. Reutemann, P. Buchmann, M. Kossel, T. Morf, J. Weiss, M. Schmatz, "A 0.94-ps-RMS-jitter 0.016-mm2 2.5-GHz multiphase generator PLL with 360º digitally programmable phase shift for 10-Gb/s serial links," IEEE Journal of Solid-State Circuits, vol. 40, pp. 2700-2712, Dec. 2005.
[19] T. Toifl, C. Menolfi, P. Buchmann, C. Hagleitner, M. Kossel, T. Morf, J. Weiss, M. Schmatz, "A 72mW 0.03mm2 Inductorless 40Gb/s CDR in 65nm SOI CMOS," ISSCC Dig. Tech. Papers, pp. 410-411, Feb. 2007.
[20] E. Prete, D. Scheideler, A. Sanders, "A 100mW 9.6Gb/s Transceiver in 90nm CMOS for Next-Generation Memory Interfaces," ISSCC Dig. Tech. Papers, vol. 49, pp. 88-89, Feb. 2006.
[21] R. Palmer, J. Poulton, W. J. Dally, J. Eyles, A. M. Fuller, T. Greer, M. Horowitz, M. Kellam, F. Quan, F. Zarkeshvari, "A 14mW 6.25Gb/s Transceiver in 90nm CMOS for Serial Chip-to-Chip Communications," ISSCC Dig. Tech. Papers, pp. 440-441, Feb. 2007.
[22] J. Poulton, R. Palmer, A. M. Fuller, T. Greer, J. Eyles, W. J. Dally, M. Horowitz, "A 14-mW 6.25-Gb/s Transceiver in 90-nm CMOS," IEEE Journal of Solid-State Circuits, vol. 42, pp. 2745-2755, December 2007.
86 References
25 Gbps CDR Circuits
[23] J. Lee, B. Razavi, "A 40-Gb/s clock and data recovery circuit in 0.18-µm CMOS technology," IEEE Journal of Solid-State Circuits, vol. 38, pp. 2181-2190, Dec. 2003.
[24] C. Kromer, G. Sialm, C. Menolfi, M. Schmatz, F. Ellinger, H. Jäckel, "A 25Gb/s CDR in 90nm CMOS for High-Density Interconnects," ISSCC Dig. Tech. Papers, pp. 326-327, Feb. 2006.
[25] D. Kucharski, K. Kornegay, "2.5 V 43-45 Gb/s CDR Circuit and 55 Gb/s PRBS Generator in SiGe Using a Low-Voltage Logic Family," IEEE Journal of Solid-State Circuits, vol. 41, pp. 2154-2165, Sept. 2006.
[26] N. Nedovic, N. Tzartzanis, H. Tamura, F. Rotella, M. Wiklund, J. Ogawa, W. Walker, "A 40-44Gb/s 3x Oversampling CMOS CDR/1:16 DEMUX," ISSCC Dig. Tech. Papers, pp. 224-225, Feb. 2007.
87 References
Additional/General Papers
[27] M. Lee, W. Dally, P. Chiang, "Low-power area-efficient high-speed I/O circuit techniques," IEEE Journal of Solid-State Circuits, vol. 35, pp. 1591-1599, November 2000.
[28] J. Kim, M. Horowitz, "Adaptive supply serial links with sub-1-V operation and per-pin clock recovery," IEEE Journal of Solid-State Circuits, vol. 37, pp. 1403-1413, November 2002.
[29] K. J. Wong, H. Hatamkhani, M. Mansuri, and C. K. Ken Yang, "A 27-mW 3.6-Gb/s I/O transceiver," IEEE Journal of Solid-State Circuits, vol. 39, pp. 602-612, April 2004.
[30] B. Casper, J. Jaussi, F. O'Mahony, M. Mansuri, K. Canagasaby, J. Kennedy, E. Yeung, and R. Mooney, "A 20Gb/s forwarded clock transceiver in 90nm CMOS," IEEE International Solid-State Circuits Conference, vol. XLIX, pp. 90-91, February 2006.
[31] R. Gonzalez, B. Gordon, and M. A. Horowitz, "Supply and threshold voltage scaling for low power CMOS," IEEE Journal of Solid-State Circuits, vol. 32, pp. 1210-1216, August 1997.
[31] V. Stojanovic et al., "Transmit Pre-emphasis for High-Speed Time-Division-Multiplexed Serial-Link Transceiver," IEEE Trans. Comm. 2001
88 Acronyms
- BER: Bit Error Rate
- CDR: Clock and Data Recovery
- CML: Current Mode Logic
- DCVS: Differential Cascode Voltage Switch (circuit)
- DFE: Decision Feedback Equalizer
- DLL: Delay Locked Loop
- FFT: Fast Fourier Transform
- FS-CMOS: Full-swing CMOS (inverter based clocking)
- FSC: Full swing converter
- INL: Integral Non-Linearity
- ISI: Intersymbol Interference
- PLL: Phase-Locked Loop
- P-PLL: Phase Programmable PLL
- RX: Receiver
- S.E.: Single-Ended
- SOI: Silicon On Insulator
- SST: Source-Series Terminated
- VCO: Voltage Controlled Oscillator
- T/H: Track and Hold
- TX: Transmitter
- WC: Worst Case