Digital and Other ICs - PowerPoint PPT Presentation

1
Digital (and Other) ICs

Borivoje Nikolic
bora@eecs.berkeley.edu

2
Research Areas

Low-Density Parity-Check Decoders
PLLs
Low-Power Digital ICs
Power-performance optimization
Compensating the impact of variations
Working with advanced devices
Analog-to-Digital Converters

3
Outline

Low-Density Parity-Check Decoders
PLLs
Low-Power Digital ICs
Power-performance optimization
Compensating the impact of variations
Working with advanced devices
Analog-to-Digital Converters

4
Iterative Coding

Iterative decoders are a part of many new
standards

Also 10G Ethernet, magnetic disk drive and tape
storage,

5
Low-Density Parity-Check Codes

Low-density parity-check codes [Gallager, 1963]
Sparse binary parity-check matrix, H
Nullspace of H forms the set of codewords
Decoded using message-passing algorithms
Message-passing decoders
Low-density parity-check (LDPC) codes
Turbo-product codes are decoded similarly
[Figure label: interleaver]
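
As a minimal sketch of the two ideas on this slide (codewords as the nullspace of H, and iterative decoding over the checks), the example below uses a toy 4 x 6 parity-check matrix, the vertex-edge incidence matrix of the complete graph K4. The matrix, the function names, and the decoder are assumptions chosen for illustration: the decoder is hard-decision bit flipping, a simple relative of message passing, and is not claimed to be the algorithm used in the decoders cited in this talk.

```python
# Toy parity-check matrix (illustrative assumption): incidence matrix of K4.
# Each bit takes part in 2 checks, each check covers 3 bits, and no two
# columns share more than one row, so the graph has no cycles of length 4.
H = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]


def syndrome(H, word):
    """Return H * word^T over GF(2); all zeros means word is a codeword."""
    return [sum(h * w for h, w in zip(row, word)) % 2 for row in H]


def bit_flip_decode(H, received, max_iters=10):
    """Hard-decision bit flipping: flip the bits with the most failed checks."""
    word = list(received)
    for _ in range(max_iters):
        s = syndrome(H, word)
        if not any(s):
            break  # every parity check is satisfied
        # For each bit, count the unsatisfied checks it participates in.
        votes = [sum(s[i] for i in range(len(H)) if H[i][j])
                 for j in range(len(word))]
        worst = max(votes)
        word = [b ^ 1 if v == worst else b for b, v in zip(word, votes)]
    return word


codeword = [1, 1, 0, 1, 0, 0]          # lies in the nullspace of H
received = list(codeword)
received[4] ^= 1                       # inject a single bit error
decoded = bit_flip_decode(H, received)
print("syndrome of codeword:", syndrome(H, codeword))
print("error corrected:", decoded == codeword)
```

Soft-decision message passing replaces the vote counts with probabilities or log-likelihood ratios exchanged between bit and check nodes; the loop structure stays the same.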

6
Parallel LDPC Decoder Architecture
A. Blanksby and C. J. Howland, JSSC 2002
[Figure: processing elements PEv1, PEv2, PEv3, PEv4, ..., PEvN and PEc1, PEc2, ..., PEcM connected through an interconnect fabric]

7
Staggered Serial LDPC Decoder
E. Yeo et al., Globecom 2001

8
LDPC codes based on Galois Fields

Codes based on GF projections are low rate.
No cycles of length 4 (short loops)
Cyclic rows
e.g. (1023 x 1023) code has a rate of 0.68
Column splitting
Each column in the original matrix is split into four
Non-zero entries in the original column are cycled through the 4 new columns
e.g. (1023 x 4092) code has a rate of 0.75
Partial loss of regularity (cyclic structure)
Complex O(N^2) encoding
Puncturing
Truncate the height of the parity-check (PC) matrix
Columns in the maximum zero-runlength region correspond to parity bit locations
Cyclic encoding using direct application of the PC matrix is now possible

Y. Kou et al., ISIT 2000
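
The column-splitting step described on this slide can be sketched as follows. This is an illustration under stated assumptions, not the construction from the cited paper: the 21 x 21 circulant built from the difference set {0, 1, 4, 14, 16} is a small stand-in for the 1023 x 1023 matrix, the toy split factor is 2 rather than 4, the new columns are placed next to each other, and all function names are hypothetical.

```python
def circulant(first_row):
    """Square matrix whose row i is first_row cyclically shifted right by i."""
    n = len(first_row)
    return [[first_row[(j - i) % n] for j in range(n)] for i in range(n)]


def split_columns(H, q):
    """Split every column into q columns, dealing its ones out in rotation."""
    m, n = len(H), len(H[0])
    out = [[0] * (n * q) for _ in range(m)]
    for j in range(n):
        seen = 0                       # ones met so far in column j
        for i in range(m):
            if H[i][j]:
                out[i][j * q + seen % q] = 1
                seen += 1
    return out


def rate_lower_bound(rows, cols):
    """1 - rows/cols; the true rate is higher when rows are dependent."""
    return 1 - rows / cols


# Toy cyclic matrix (assumption): ones at a perfect difference set mod 21,
# so no two columns share more than one row (no cycles of length 4).
first_row = [1 if j in (0, 1, 4, 14, 16) else 0 for j in range(21)]
H = circulant(first_row)
H_split = split_columns(H, 2)

print("split matrix size:", len(H_split), "x", len(H_split[0]))
print("bound for a 1023 x 4092 matrix:", rate_lower_bound(1023, 4092))
```

The bound for the 1023 x 4092 case agrees with the rate of 0.75 quoted on the slide. The same bound says nothing useful for the square 1023 x 1023 matrix, whose rate of 0.68 comes from the strong dependence among its rows.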
9
Shift register-based implementation

Staggered decoding.
Regularity of codes based on Finite Field
geometries.

E. Yeo et al., Globecom 2001
10
4092-bit LDPC Decoder
1.8 million transistors
2.7 mm x 3.1 mm (10x smaller than a 1024-bit LDPC decoder)
1 GHz
Chip back in December
E. Yeo
11
Structured LDPCs
Bit node groups
Check node groups
Ed Liao

Construction based on Ramanujan graphs allows for
hierarchical decomposition and good performance

12
LDPC Codes - Status

Two students graduated
Engling Yeo (Ph.D.), ST Microelectronics, Berkeley Lab
Ed Liao (M.S.), Qualcomm R&D, San Diego
Continuing investigation of variable rate,
variable block size LDPC codes, based on
structured constructions
BEE and ASIC implementations

13
Outline

Low-Density Parity-Check Decoders
PLLs
Low-Power Digital ICs
Power-performance optimization
Compensating the impact of variations
Working with advanced devices
Analog-to-Digital Converters

14
PLL Jitter Analysis
15
PLL Jitter Analysis
Adjusting the loop characteristics (ωn, ζ) modulates the output jitter. There exists a minimum that depends on the noise source characteristics.
Problem: noise characteristics are NOT known a priori!
Therefore, adaptive jitter optimization is desirable!
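
To illustrate why a minimum exists, the sketch below sweeps the natural frequency of a textbook second-order PLL and integrates the output phase noise. Everything specific here is an assumption for illustration: the type-II closed-loop transfer functions, white reference noise with density `s_ref`, VCO phase noise falling as `k_vco / f^2`, and the numeric values. It is not the analysis or the adaptive circuit from this talk.

```python
import math


def output_phase_variance(wn, zeta, s_ref, k_vco,
                          f_lo=1e3, f_hi=1e9, points=400):
    """Integrate the output phase-noise PSD of a second-order PLL (rad^2)."""
    log_lo, log_hi = math.log(f_lo), math.log(f_hi)
    step = (log_hi - log_lo) / points
    total = 0.0
    for i in range(points):
        f = math.exp(log_lo + (i + 0.5) * step)      # midpoint on a log grid
        s = complex(0.0, 2.0 * math.pi * f)
        den = s * s + 2.0 * zeta * wn * s + wn * wn
        h_ref = (2.0 * zeta * wn * s + wn * wn) / den  # low-pass for reference noise
        h_vco = (s * s) / den                          # high-pass for VCO noise
        psd = abs(h_ref) ** 2 * s_ref + abs(h_vco) ** 2 * k_vco / (f * f)
        total += psd * f * step                        # df = f * d(ln f)
    return total


def sweep_natural_frequency(zeta=0.707, s_ref=1e-12, k_vco=1.0):
    """Return (best_wn, variance) over a log sweep from 10 kHz to 100 MHz."""
    candidates = [2.0 * math.pi * 10 ** (4 + 0.1 * n) for n in range(41)]
    results = [(output_phase_variance(wn, zeta, s_ref, k_vco), wn)
               for wn in candidates]
    best_var, best_wn = min(results)
    return best_wn, best_var


best_wn, best_var = sweep_natural_frequency()
print(f"best natural frequency ~ {best_wn / (2 * math.pi):.3g} Hz, "
      f"variance ~ {best_var:.3g} rad^2")
```

A narrow loop lets VCO noise through and a wide loop lets reference noise through, so the optimum moves whenever `s_ref` or `k_vco` changes. That dependence on quantities not known in advance is the motivation for adapting the loop at run time.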
16
PLL Circuit
17
Jitter Estimation
Signals track jitter boundaries
18
Implementation
Jitter Estimation

Circuit designed in 0.13 µm CMOS and taped out
Chip back in December, now in evaluation
Socrates Vamvakos

[Chip photo labels: PLL, DL, Driver]
19
Outline

Low-Density Parity-Check Decoders
PLLs
Low-Power Digital ICs
Power-performance optimization
Compensating the impact of variations
Working with advanced devices
Analog-to-Digital Converters

20
Power is a Problem

If we continue doing business as usual, both
dynamic and leakage power will be a problem

chips are getting hot
and phones leaky!

Need to deliver maximum performance under power constraints

From S. Borkar, Intel
21
Optimizing Combinational Circuits
[Figure: optimization flow. Inputs: initial W, Vdd, Vth and the netlist. Blocks: static timer (C) and OPTIMIZER (Matlab), which minimizes DELAY subject to maximum ENERGY. Signals: delay, energy; W, Vdd, Vth. Output.]

Generate Energy Delay (E-D) tradeoffs for
combinational blocks
Investigate the optimality of any given design

Optimize critical single-cycle blocks
Use inside microarchitecture optimizer
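
A minimal sketch of "minimize delay subject to maximum energy" is shown below on a three-stage inverter chain. The delay and energy models, the fixed input and load capacitances, the integer size grid, and the function names are all assumptions for illustration; only gate sizes are tuned here, whereas the flow on the slide also tunes Vdd and Vth and uses a static timer rather than a closed-form model.

```python
def chain_delay(w2, w3, c_in=1.0, c_load=64.0):
    """Normalized delay of a 3-stage inverter chain: sum of stage fanouts."""
    return w2 / c_in + w3 / w2 + c_load / w3


def chain_energy(w2, w3):
    """Switched capacitance of the two sizable stages (arbitrary units)."""
    return w2 + w3


def min_delay_under_energy(e_max, sizes=range(1, 65)):
    """Grid-search the sizing with the lowest delay that meets the energy cap."""
    best = None
    for w2 in sizes:
        for w3 in sizes:
            if chain_energy(w2, w3) > e_max:
                continue               # violates the energy constraint
            d = chain_delay(w2, w3)
            if best is None or d < best[0]:
                best = (d, w2, w3)
    return best


# Sweeping the energy cap traces out an energy-delay tradeoff curve.
for e_max in (8, 12, 16, 20, 40):
    d, w2, w3 = min_delay_under_energy(e_max)
    print(f"E <= {e_max:2d}: delay = {d:.2f}, sizes = (1, {w2}, {w3})")
```

Once the cap is loose enough the result settles at the unconstrained optimum (equal fanout of 4 per stage); any design that sits above the swept curve is not energy-delay optimal under this model.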

22
Example: 64-bit CLA Adders

Wide adders are common in the critical paths of
high performance microprocessors
Static adders are low power but slow
Domino logic is the choice for short cycle times
Setup
0.13 µm, 6M, 1.2 V
Cout = 450 fF
Cin ? 150fF

Zlatanovici et al., ESSCIRC 2003
23
Adder Architecture in 90nm CMOS
[Figure: adder block diagram. Labels: inputs a63..0, b63..0, and carry-in; t, g gen producing t63..0 and g63..0; group4, group16, and group64 gen and prop stages (H4, I4; H16, I16; H64); carries c1 to c4; pc1 to pc4 and psel; sum gen producing sum0 63..0 and sum1 63..0 (j, k static); XOR (transmission gate) sum select.]
Critical path of five gate delays: 6.3 FO4 at 8.5 pJ/cycle
S. Kao, R. Zlatanovici
24
90nm Design

Finalize test strategy
Implement clock generator and assemble all
circuits
Extract layout for timing and power verification
Tapeout Feb03

25
Optimizing Pipelined Circuits

Cycle boundaries: transparent latches

[Figure label: COMBINATIONAL LOGIC]

Grand goal: find the configuration (transistor sizes, cutset) achieving the shortest cycle time for a given power budget and pipeline depth
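
The cutset part of this goal can be illustrated in isolation with a much simpler stand-in problem: given fixed gate delays along a single path, choose where to cut it into a given number of stages so that the slowest stage is as fast as possible. The sketch below solves that with a dynamic-programming linear partition. It is an assumption-laden toy: it ignores transistor sizing, the power budget, and the time borrowing that transparent latches allow, and the function name and delay values are hypothetical.

```python
def best_cutset(delays, stages):
    """Return (cycle_time, cut_positions) minimizing the slowest stage.

    A cut position p means a latch sits between gate p-1 and gate p.
    """
    n = len(delays)
    prefix = [0]
    for d in delays:
        prefix.append(prefix[-1] + d)

    inf = float("inf")
    # best[k][i]: smallest achievable cycle time for the first i gates in k stages
    best = [[inf] * (n + 1) for _ in range(stages + 1)]
    cut = [[0] * (n + 1) for _ in range(stages + 1)]
    best[0][0] = 0
    for k in range(1, stages + 1):
        for i in range(1, n + 1):
            for j in range(k - 1, i):          # last stage holds gates j..i-1
                cand = max(best[k - 1][j], prefix[i] - prefix[j])
                if cand < best[k][i]:
                    best[k][i] = cand
                    cut[k][i] = j

    # Walk back through the table to recover the latch positions.
    cuts, i = [], n
    for k in range(stages, 1, -1):
        i = cut[k][i]
        cuts.append(i)
    return best[stages][n], sorted(cuts)


gate_delays = [1, 2, 3, 4, 5, 6]               # hypothetical delays, arbitrary units
cycle, cuts = best_cutset(gate_delays, stages=3)
print("cycle time:", cycle, "latch positions:", cuts)
```

In the full problem the gate delays are not fixed: resizing under a power budget changes them, which is why sizes and cutset have to be optimized together.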

26
Using Posynomial Models: Results

Widening profile of NANDs and NORs, 3 cycles
Latches migrate towards the output as the power constraint tightens (starting from the input)
Reverse direction of migration for a narrowing profile
No migration for a flat profile
Demonstrate on a floating-point unit

R. Zlatanovici
27
Micro-Architecture Optimization
A, B: adders; input data rate: f
Optimal ELk/ESw is about 0.5 (all designs operate at the throughput of the nominal design, sized for minimum delay under Vddmax and Vthref)
D. Markovic
28
Time-Mux SVD Example
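[Figure: parallel architecture with four PE U? blocks, inputs s1,w1 to s4,w4, outputs rk1 to rk4, and y. Annotations: PE too fast, large area. Timing diagram from 0 to Tsymbol with PE-U?1 to PE-U?4 and intervals marked "wasted time".]
[Figure: time-mux architecture with a single PE U? block, inputs s1,w1 to s4,w4, and output y; PE-U?1 to PE-U?4 run in sequence. Annotations: some mux overhead, large area reduction.]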
29
Energy-Area Tradeoff

Top1 can be achieved with M = 5 (E < Eop1) or M = 3 (E < Eop2)
Area (M = 5): 3/5 of Area (M = 3)

Energy-area is a measure of the overall chip cost
30
Working With Advanced Devices
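[Figure: cross-sections of three device types: bulk MOSFET, double-gate (DG), and ultra-thin body (UTB). Labels: gate, source, drain, buried oxide, Tbody, substrate.]
[Figure: bar charts comparing Bulk, DG, and UTB under three conditions set by changing VDD: match delay (energy, fJ), match leakage (FO4, ps), and match power (FO4, ps). Values in the extracted text, not attributable to individual bars: 12, 12, 7.2, 10.2, 9, 8.7, 7.9, 4.4, 3.3.]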
L. Chang, T.-J. King
31
FinFET SRAM Array

FinFET devices, Ldrawn = 50 nm, Leff = 20 nm
1 metal layer, 0.35 µm technology
SRAM cell size: 5.75 µm x 4 µm
WL poly
BL M1 fin

15x15 SRAM array
Static NAND decoder
Cross-coupled latch-based sense amp
Array size approx. 140 µm x 70 µm
Sematech run Jan04

R. Zlatanovici, S. Balasubramanian, with Prof. T.-J. King
32
Advanced Devices
Devices compared: HP FinFET, LP FinFET, and back-gated MOSFET in enhancement mode (BG-ENH) and accumulation mode (BG-ACC)
[Figure: plots comparing HP Fin, LP Fin, BG-ENH, and BG-ACC, with DSP, embedded, and uP regions marked. Axis labels: delay (ps), frequency (GHz), logic depth.]
Adaptive VDD, Vth
J. Garrett, S. Balasubramanian, with T.-J. King
33
Outline

Low-Density Parity-Check Decoders
PLLs
Low-Power Digital ICs
Power-performance optimization
Compensating the impact of variations
Working with advanced devices
Analog-to-Digital Converters

34
ADCs

Measured: 1.8-V, 14-b, 12-MS/s pipelined ADC in 0.18-µm CMOS with 102-dB SFDR (Yun Chiu)
In design: 500-MS/s, 12-b, 1.2-V digitally background-calibrated ADC in 0.13-µm CMOS
After the lunch

35
Summary