Title: ELEN 468 Advanced Logic Design
1. ELEN 468 Advanced Logic Design
- Lecture 31
- Midterm-2 Review
2. Design Cycles
System/Architectural Design
HDL
Logic Design
Verification/Simulation
Physical Design/Layout
Parasitic Extraction
Fabrication
Testing
3. Design and Technology Styles
- Custom design
  - Mostly manual design, long design cycle
  - High performance, high volume
  - Microprocessors, analog, leaf cells, IP
- Standard cell
  - Pre-designed cells, CAD, short design cycle
  - Medium performance, ASIC
- FPGA/PLD
  - Pre-fabricated, fast automated design, low cost
  - Prototyping, reconfigurable computing
4. Primitives
5. Primitives
- Pre-defined primitives
  - Total 26 pre-defined primitives
  - All combinational
  - buf and not may drive multiple outputs; the other gate primitives have a single output
- User-Defined Primitives (UDP)
  - Combinational or sequential
  - Single output
- UDP vs. modules
  - Used to model cell library
  - Require less memory
  - Simulate faster
6. Edge-sensitive Behavior
- primitive d_flop( q, clock, d );
-   output q;
-   input clock, d;
-   reg q;
-   table
-   // clock d : state : q/next_state
-     (01) 0 : ? : 0; // Parentheses indicate signal transition
-     (01) 1 : ? : 1; // Rising clock edge
-     (0?) 1 : 1 : 1;
-     (0?) 0 : 0 : 0;
-     (?0) ? : ? : -; // Falling clock edge
-     ? (??) : ? : -; // Steady clock
-   endtable
- endprimitive
[Figure: d_flop block with inputs clock and d and output q]
7. Delay Model and Simulation
8. Asymmetric Delay Assignment
- module nand1 ( O, A, B );
-   input A, B;
-   output O;
-   nand ( O, A, B );
-   specify
-     specparam
-       T01 = 1.13:3.09:7.75, // rising time, min:typ:max delay
-       T10 = 0.93:2.50:7.34; // falling time, min:typ:max delay
-     ( A => O ) = ( T01, T10 );
-     ( B => O ) = ( T01, T10 );
-   endspecify
- endmodule
9. Simulation with Delay
[Figure: example circuit with signals A, B, C, and D, its waveforms over tsim = 0 to 50, and the simulator's event list; every signal starts at x]
10. Delay Models
- Gate delay
  - Intrinsic delay
  - Layout-induced delay due to capacitive load
  - Waveform slope-induced delay
- Net delay/transport delay
  - Signal propagation delay along interconnect wires
- Module path delay
  - Delay between input port and output port
11. Inertial Delay
- Delay is caused by charging and discharging node capacitors in circuit
- Gate delay and wire delay
- Pulse rejection
  - If pulse width is less than delay, the pulse is ignored
[Figure: circuit with signals A, B, C, and D illustrating pulse rejection]
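The pulse-rejection rule can be sketched in Python. This is a simplified model for illustration, not a simulator algorithm from the lecture: the function name `inertial_delay`, the event-list format, and the choice to let a pulse pass when its width exactly equals the delay are all assumptions.

```python
def inertial_delay(events, delay):
    """Apply a simplified inertial delay to sorted (time, value) input events.

    Each value is held until the next event. A value held for less than
    `delay` is treated as a rejected pulse.
    """
    out = []
    for k, (t, v) in enumerate(events):
        # Time at which the input changes again (never, for the last event).
        next_t = events[k + 1][0] if k + 1 < len(events) else float("inf")
        if next_t - t < delay:
            continue  # pulse narrower than the delay: ignored
        if not out or out[-1][1] != v:
            out.append((t + delay, v))  # surviving change appears after the delay
    return out


# Input: 0 at t=0, a 2-unit pulse to 1 at t=10, then a lasting rise at t=20.
print(inertial_delay([(0, 0), (10, 1), (12, 0), (20, 1)], delay=3))
```

With a delay of 3, the 2-unit pulse at t=10 never reaches the output; only the lasting rise at t=20 does, at t=23.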
12. Gate Delay
- and (yout, x1, x2); // default, zero gate delay
- and #3 (yout, x1, x2); // 3 units delay for all transitions
- and #(2,3) G1(yout, x1, x2); // rising, falling delay
- and #(2,3) G1(yout, x1, x2), G2(yout2, x3, x4); // Multiple instances
- a_buffer #(3,5,2) (yout, x); // UDP, rise, fall, turnoff
- bufif1 #(3:4:5, 6:7:9, 5:7:8) (yout, xin, enable); // min:typ:max / rise, fall, turnoff
- Simulators simulate with only one of min, typ and max delay values
- Selection is made through compiler directives or user interfaces
- Default delay is typ delay
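13. Gate and Wire Model
- r = resistance per unit length, c = capacitance per unit length
[Figure: gate modeled by a resistance R and a capacitance C; wire of length L modeled as a resistance rL with a capacitance cL/2 at each end]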
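14. Example of Model
15. Delay Estimation
[Figure: RC tree. Driver resistance R feeds node 0 (capacitance C0); R1 connects node 0 to node 1 (C1); R2 and R3 connect node 1 to node 2 (C2) and node 3 (C3)]
- $D_0 = R\,(C_0 + C_1 + C_2 + C_3)$
- $D_1 = D_0 + R_1\,(C_1 + C_2 + C_3)$
- $D_2 = D_1 + R_2\,C_2$
- $D_3 = D_1 + R_3\,C_3$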
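The four delay expressions follow one rule: the delay at a node is its parent's delay plus the resistance of the incoming edge times all capacitance downstream of that edge. A minimal Python sketch of that rule follows; the function name, the dictionary layout, and the numeric values (R = 1, R1 = 2, R2 = 3, R3 = 4, C0..C3 = 1..4) are illustrative assumptions, not data from the lecture.

```python
def elmore_delays(parent, res, cap):
    """Return the delay at every node of an RC tree.

    parent[n] is the parent of node n (None for the root),
    res[n] is the resistance of the edge entering n,
    cap[n] is the capacitance attached to n.
    """
    children = {n: [] for n in parent}
    for n, p in parent.items():
        if p is not None:
            children[p].append(n)

    def downstream(n):
        # Total capacitance at n and below.
        return cap[n] + sum(downstream(c) for c in children[n])

    delays = {}

    def visit(n, parent_delay):
        delays[n] = parent_delay + res[n] * downstream(n)
        for c in children[n]:
            visit(c, delays[n])

    root = next(n for n, p in parent.items() if p is None)
    visit(root, 0.0)
    return delays


# Tree from the slide: node 0 is fed through R, node 1 hangs off node 0,
# nodes 2 and 3 hang off node 1.
parent = {0: None, 1: 0, 2: 1, 3: 1}
res = {0: 1, 1: 2, 2: 3, 3: 4}  # R, R1, R2, R3 (assumed values)
cap = {0: 1, 1: 2, 2: 3, 3: 4}  # C0, C1, C2, C3 (assumed values)
print(elmore_delays(parent, res, cap))
```

For these values D0 = 1·10 = 10, D1 = 10 + 2·9 = 28, D2 = 28 + 3·3 = 37, and D3 = 28 + 4·4 = 44.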
16. Net Delay
- ...
- wire #2 y_tran;
- and #3 (y_tran, x1, x2);
- buf #1 (buf_out, y_tran);
- and #3 (y_inertial, x1, x2);
- ...
[Figure: circuit and waveforms for x1, x2, y_inertial, y_tran, and buf_out]
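17. Clock Scheduling
- LD = logic delay
[Figure: registers i and j receive Clock at times ti and tj; the logic between them has delay LD]
18. Timing Constraints
[Figure: timing diagram marking ti, tj, LDmin, LDmax, and the setup and hold windows]
- $\mathrm{skew}_{ij} = t_i - t_j > \mathrm{hold}_{\max} - LD_{\min}$
- $\mathrm{skew}_{ij} = t_i - t_j < CP - LD_{\max} - \mathrm{setup}_{\max}$
- CP = clock period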
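The two constraints bound the permissible skew between registers i and j from below (hold) and from above (setup). A small Python sketch of the check, with hypothetical function names and made-up numbers, could look like this:

```python
def skew_bounds(cp, ld_min, ld_max, setup, hold):
    """Return (lower, upper) bounds on skew_ij = ti - tj.

    lower comes from the hold constraint, upper from the setup constraint.
    """
    return hold - ld_min, cp - ld_max - setup


def skew_ok(ti, tj, cp, ld_min, ld_max, setup, hold):
    """True when ti - tj lies strictly inside the permissible range."""
    lower, upper = skew_bounds(cp, ld_min, ld_max, setup, hold)
    return lower < ti - tj < upper


# Assumed example: CP = 10, LDmin = 2, LDmax = 7, setup = 1, hold = 1.
print(skew_bounds(cp=10, ld_min=2, ld_max=7, setup=1, hold=1))
```

Here the permissible range is -1 < skew < 2, so a skew of 1 is safe while a skew of 3 violates the setup constraint.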
19. Time Scales
- Time scale directive: `timescale <time_unit>/<time_precision>
  - time_unit -> physical unit of measure, time scale of delay
  - time_precision -> time resolution/minimum step size during simulation
  - time_unit ≥ time_precision
20. Example of Time Scale
- `timescale 1 ns / 10 ps
- module modA( y, x1, x2 );
-   ...
-   nand #(3.225, 4.237) ( y, x1, x2 );
- endmodule
- `timescale 10 ns / 10 ns
- module modB();
-   ...
-   modA M1(y, x1, x2);
-   initial begin
-     $monitor ( $time,
-       "%f x1=%b x2=%b y=%b",
-       $realtime, x1, x2, y );
-   end
-   initial begin
-     #5 x1 = 0; x2 = 0;
-     #5 x2 = 1;
-     #5 x1 = 1;
-     #5 x2 = 0;
-   end
- endmodule
- Simulation output (times in modB units of 10 ns; modA delays are rounded to 10 ps):
- t   real_t     x1    x2    y
- -------------------------------------------------
- 0   0.000000   x1=x  x2=x  y=x
- 5   5.000000   x1=0  x2=0  y=x
- 5   5.323000   x1=0  x2=0  y=1
- 10  10.000000  x1=0  x2=1  y=1
- 15  15.000000  x1=1  x2=1  y=1
- 15  15.424000  x1=1  x2=1  y=0
- 20  20.000000  x1=1  x2=0  y=0
- 20  20.323000  x1=1  x2=0  y=1
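The fractional times in the output come from rounding each modA delay to modA's 10 ps precision and then reporting it in modB's 10 ns unit. The Python sketch below reproduces that arithmetic. The helper name `round_delay_ps` is hypothetical, and round-half-up is an assumption about how the simulator rounds, chosen because it matches the printed values.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_delay_ps(delay, unit_ps, precision_ps):
    """Convert a delay given in time units to picoseconds, rounded to the precision."""
    ps = Decimal(str(delay)) * unit_ps
    steps = (ps / precision_ps).quantize(Decimal(1), rounding=ROUND_HALF_UP)
    return int(steps) * precision_ps


# modA: `timescale 1 ns / 10 ps  ->  unit = 1000 ps, precision = 10 ps
rise_ps = round_delay_ps("3.225", 1000, 10)  # 3.225 ns rounds to 3.23 ns
fall_ps = round_delay_ps("4.237", 1000, 10)  # 4.237 ns rounds to 4.24 ns

# modB: time unit = 10 ns = 10000 ps; x1 and x2 are driven at 5 units (50 ns).
print((5 * 10000 + rise_ps) / 10000)   # time of the first rise of y, in modB units
print((15 * 10000 + fall_ps) / 10000)  # time of the fall of y, in modB units
```

Decimal arithmetic is used because binary floating point cannot represent 3.225 exactly, which would make the half-way rounding unreliable.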
21. Assignment
22. Blocking and Non-blocking Assignment
- initial
- begin
-   a = 1;
-   b = 0;
-   a = b; // a = 0
-   b = a; // b = 0
- end
- initial
- begin
-   a = 1;
-   b = 0;
-   a <= b; // a = 0
-   b <= a; // b = 1
- end
- Blocking assignment (=)
  - Statement order matters
  - A statement has to be executed before next statement
- Non-blocking assignment (<=)
  - Concurrent assignment
  - Normally the last assignment at certain simulation time step
  - If it triggers other blocking assignments, it is executed before the blocking assignment it triggers
  - If there are multiple non-blocking assignments to same variable in same behavior, latter overwrites previous
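The difference between the two initial blocks can be mimicked in Python: blocking assignments update a variable before the next statement runs, while non-blocking assignments sample every right-hand side first and update afterwards. This is only an analogy for the two statements shown, not a model of the Verilog scheduler, and the function names are hypothetical.

```python
def run_blocking(a, b):
    """Mimic 'a = b; b = a;': each statement sees the previous statement's result."""
    a = b
    b = a
    return a, b


def run_nonblocking(a, b):
    """Mimic 'a <= b; b <= a;': sample both right-hand sides, then update."""
    new_a, new_b = b, a  # right-hand sides evaluated with the old values
    return new_a, new_b


print(run_blocking(1, 0))     # both end up with the old value of b
print(run_nonblocking(1, 0))  # the two values are swapped
```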
23. Procedural Continuous Assignment
- Continuous assignment establishes static binding for net variables
- Procedural continuous assignment (PCA) establishes dynamic binding for variables
  - assign ... deassign for register variables only
  - force ... release for both register and net variables
24. Intra-assignment Delay: Blocking Assignment
- // B = 0 at time 0
- // B = 1 at time 4
- ...
- #5 A = B; // A = 1
- C = D;
- ...
- A = #5 B; // A = 0
- C = D;
- ...
- A = @(enable) B;
- C = D;
- ...
- A = @(named_event) B;
- C = D;
- If timing control operator (#, @) on LHS
  - Blocking delay
  - RHS evaluated at (#, @)
  - Assignment at (#, @)
- If timing control operator (#, @) on RHS
  - Intra-assignment delay
  - RHS evaluated immediately
  - Assignment at (#, @)
25. Example
- initial begin a = #10 1; b = #2 0; c = #3 1; end
- initial begin d <= #10 1; e <= #2 0; f <= #3 1; end
- t   a  b  c  d  e  f
- 0   x  x  x  x  x  x
- 2   x  x  x  x  0  x
- 3   x  x  x  x  0  1
- 10  1  x  x  1  0  1
- 12  1  0  x  1  0  1
- 15  1  0  1  1  0  1
26. Tell the Differences
- always @ (a or b) y = a | b;
- always @ (a or b) #5 y = a | b;
- always @ (a or b) y = #5 a | b;
- always @ (a or b) y <= #5 a | b;
- Which one describes an or gate?
- Event control is blocked
27. Race Condition
- always @ ( posedge clk ) // c will get previous b or new b ?
-   c = b;
- always @ ( posedge clk )
-   b = a;
28. Avoid Race Condition
- always @ ( posedge clk ) // Solution 1: merge always
- begin
-   c = b; b = a;
- end
- always @ ( posedge clk ) // Solution 2: intra-assignment delay
-   c = #1 b;
- always @ ( posedge clk )
-   b = #1 a;
- always @ ( posedge clk ) // Solution 3: non-blocking assignment
-   c <= b;
- always @ ( posedge clk )
-   b <= a;
29. Finite State Machine
30. FSM Example: Speed Machine
31. Verilog Code for Speed Machine
- // Explicit FSM style
- module speed_machine ( clock, accelerator, brake, speed );
-   input clock, accelerator, brake;
-   output [1:0] speed;
-   reg [1:0] state, next_state;
-   parameter stopped = 2'b00;
-   parameter s_low = 2'b01; // named s_low to match its use in the case items
-   parameter s_medium = 2'b10;
-   parameter s_high = 2'b11;
-   assign speed = state;
-   always @ ( posedge clock )
-     state <= next_state;
-   always @ ( state or accelerator or brake )
-     if ( brake == 1'b1 )
-       case ( state )
-         stopped: next_state <= stopped;
-         s_low: next_state <= stopped;
-         s_medium: next_state <= s_low;
-         s_high: next_state <= s_medium;
-         default: next_state <= stopped;
-       endcase
-     else if ( accelerator == 1'b1 )
-       case ( state )
-         stopped: next_state <= s_low;
-         s_low: next_state <= s_medium;
-         s_medium: next_state <= s_high;
-         s_high: next_state <= s_high;
-         default: next_state <= stopped;
-       endcase
-     else next_state <= state;
- endmodule
32. State Encoding Example
33. State Encoding
- A state machine having N states requires a register of at least ⌈log2 N⌉ bits to store the encoded representation of states
- Binary and Gray encoding use the minimum number of bits for state register
- Gray and Johnson code
  - Two adjacent codes differ by only one bit
  - Reduce simultaneous switching
    - Reduce crosstalk
    - Reduce glitch
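Both claims are easy to check numerically. The Python sketch below computes the minimum register width for N states and generates a binary-reflected Gray code; the function names are illustrative, and the binary-reflected construction is one common Gray code, not necessarily the one used in the lecture's example.

```python
def min_state_bits(n_states):
    """Smallest register width that can encode n_states distinct states."""
    return max(1, (n_states - 1).bit_length())  # ceil(log2 N), at least 1 bit


def gray(i):
    """Binary-reflected Gray code of integer i."""
    return i ^ (i >> 1)


codes = [gray(i) for i in range(8)]
print(min_state_bits(5), [format(c, "03b") for c in codes])

# Adjacent codes (including the wrap from last to first) differ in one bit.
for k in range(len(codes)):
    diff = codes[k] ^ codes[(k + 1) % len(codes)]
    assert bin(diff).count("1") == 1
```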
34. One-hot Encoding
- Employ one bit register for each state
- Less combinational logic to decode
- Consume greater area, does not matter for certain hardware such as FPGA
- Easier for design, friendly to incremental change
- case and if statement may give different result for one-hot encoding
- Runs faster
- `define state_0 3'b001
- `define state_1 3'b010
- `define state_2 3'b100
35. Synthesis
36. Unexpected and Unwanted Latch
- Combinational logic must specify output value for all input values
- Incomplete case statements and conditionals (if) imply
  - Output should retain value for unspecified input values
  - Unwanted latches
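37. Example of Unwanted Latch
- module myMux( y, selA, selB, a, b );
-   input selA, selB, a, b;
-   output y;
-   reg y;
-   always @ ( selA or selB or a or b )
-     case ( {selA, selB} )
-       2'b10: y = a;
-       2'b01: y = b;
-     endcase // no item for 2'b00 or 2'b11, so y must hold its value
- endmodule
[Figure: synthesized circuit in which a and b, selected by selA and selB, feed a latch with enable en and output y]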
38. Synthesis of Register Variables
- A hardware register will be generated for a register variable when
  - It is referenced before value is assigned in a behavior
  - Assigned value in an edge-sensitive behavior and is referenced by an assignment outside the behavior
  - Assigned value in one clock cycle and referenced in another clock cycle
- Multi-phased latches may not be supported in synthesis
39. Synthesis of Arithmetic Operators
- If corresponding library cell exists, an operator will be directly mapped to it
- Synthesis tool may select among different options in library cell, for example, when synthesizing an adder
  - Small wordlength -> ripple-carry adder
  - Long wordlength -> carry-look-ahead adder
  - Need small area -> bit-serial adder
- Implementation of * and /
  - May be inefficient when both operands are variables
  - If a multiplier or the divisor is a power of two, can be implemented through shift register
40. Synthesis of fork ... join Blocks
- Synthesis tools may
  - Either fail
  - Or require that it does not contain event and delay controls that are equal to or longer than a clock cycle
    - Equivalent to a set of non-blocking assignments
41. Static Loops without Internal Timing Controls -> Combinational Logic
- module count1sA ( bit_cnt, data, clk, rst );
-   parameter data_width = 4; parameter cnt_width = 3;
-   output [cnt_width-1:0] bit_cnt;
-   input [data_width-1:0] data; input clk, rst;
-   reg [cnt_width-1:0] cnt, bit_cnt, i; reg [data_width-1:0] tmp;
-   always @ ( posedge clk )
-     if ( rst ) begin cnt = 0; bit_cnt = 0; end
-     else begin cnt = 0; tmp = data;
-       for ( i = 0; i < data_width; i = i + 1 )
-       begin
-         if ( tmp[0] ) cnt = cnt + 1;
-         tmp = tmp >> 1; end
-       bit_cnt = cnt;
-     end
- endmodule
42. Static Loops with Internal Timing Controls -> Sequential Logic
- module count1sB ( bit_cnt, data, clk, rst );
-   parameter data_width = 4; parameter cnt_width = 3;
-   output [cnt_width-1:0] bit_cnt;
-   input [data_width-1:0] data; input clk, rst;
-   reg [cnt_width-1:0] cnt, bit_cnt, i; reg [data_width-1:0] tmp;
-   always @ ( posedge clk )
-     if ( rst ) begin cnt = 0; bit_cnt = 0; end
-     else begin
-       cnt = 0; tmp = data;
-       for ( i = 0; i < data_width; i = i + 1 )
-         @ ( posedge clk )
-         begin if ( tmp[0] ) cnt = cnt + 1;
-           tmp = tmp >> 1; end
-       bit_cnt = cnt;
-     end
- endmodule
43. Non-Static Loops without Internal Timing Controls -> Not Synthesizable
- module count1sC ( bit_cnt, data, clk, rst );
-   parameter data_width = 4; parameter cnt_width = 3;
-   output [cnt_width-1:0] bit_cnt;
-   input [data_width-1:0] data; input clk, rst;
-   reg [cnt_width-1:0] cnt, bit_cnt, i; reg [data_width-1:0] tmp;
-   always @ ( posedge clk )
-     if ( rst ) begin cnt = 0; bit_cnt = 0; end
-     else begin
-       cnt = 0; tmp = data;
-       for ( i = 0; tmp; i = i + 1 ) // loop bound depends on data
-       begin if ( tmp[0] ) cnt = cnt + 1;
-         tmp = tmp >> 1; end
-       bit_cnt = cnt;
-     end
- endmodule
44. Non-Static Loops with Internal Timing Controls -> Sequential Logic
- module count1sD ( bit_cnt, data, clk, rst );
-   parameter data_width = 4; parameter cnt_width = 3;
-   output [cnt_width-1:0] bit_cnt;
-   input [data_width-1:0] data; input clk, rst;
-   reg [cnt_width-1:0] cnt, bit_cnt, i; reg [data_width-1:0] tmp;
-   always @ ( posedge clk )
-     if ( rst ) begin cnt = 0; bit_cnt = 0; end
-     else begin: bit_counter
-       cnt = 0; tmp = data;
-       while ( tmp )
-         @ ( posedge clk ) begin
-           if ( rst ) begin cnt = 0; disable bit_counter; end
-           else begin cnt = cnt + tmp[0]; tmp = tmp >> 1; end
-           bit_cnt = cnt;
-         end
-     end
- endmodule
45. VHDL
46. Example
- -- eqcomp4 is a four bit equality comparator
- -- Entity declaration
- entity eqcomp4 is
-   port ( a, b : in bit_vector( 3 downto 0 );
-          equals : out bit ); -- equals is active high
- end eqcomp4;
- -- Architecture body
- architecture dataflow of eqcomp4 is
- begin
-   equals <= '1' when ( a = b ) else '0';
- end dataflow;
47. Behavioral Descriptions
- library ieee;
- use ieee.std_logic_1164.all;
- entity eqcomp4 is port (
-   a, b : in std_logic_vector( 3 downto 0 );
-   equals : out std_logic );
- end eqcomp4;
- architecture behavioral of eqcomp4 is
- begin
-   comp: process ( a, b ) -- sensitivity list
-   begin
-     if a = b then equals <= '1';
-     else equals <= '0'; -- sequential assignment
-     end if;
-   end process comp;
- end behavioral;
48. Dataflow Descriptions
- library ieee;
- use ieee.std_logic_1164.all;
- entity eqcomp4 is port (
-   a, b : in std_logic_vector( 3 downto 0 );
-   equals : out std_logic );
- end eqcomp4;
- architecture dataflow of eqcomp4 is
- begin
-   equals <= '1' when ( a = b ) else '0';
- end dataflow;
- -- No process
- -- Concurrent assignment
49. Structural Descriptions
- library ieee;
- use ieee.std_logic_1164.all;
- entity eqcomp4 is port (
-   a, b : in std_logic_vector( 3 downto 0 );
-   equals : out std_logic );
- end eqcomp4;
- use work.gatespkg.all;
- architecture struct of eqcomp4 is
-   signal x : std_logic_vector( 0 to 3 );
- begin
-   u0: xnor2 port map ( a(0), b(0), x(0) ); -- component instantiation
-   u1: xnor2 port map ( a(1), b(1), x(1) );
-   u2: xnor2 port map ( a(2), b(2), x(2) );
-   u3: xnor2 port map ( a(3), b(3), x(3) );
-   u4: and4 port map ( x(0), x(1), x(2), x(3), equals );
- end struct;
50. Test and Design For Testability
51. Single Stuck-at Fault
- Three properties define a single stuck-at fault
  - Only one line is faulty
  - The faulty line is permanently set to 0 or 1
  - The fault can be at an input or output of a gate
- Example: XOR circuit has 12 fault sites (marked in the figure) and 24 single stuck-at faults
[Figure: XOR circuit with lines a through k and output z, annotated with good circuit values and, in parentheses, faulty circuit values under the test vector for the h s-a-0 fault]
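52. Stuck-Open Example
- Vector 1: test for A s-a-0 (initialization vector)
- Vector 2: test for A s-a-1
- Two-vector s-op test can be constructed by ordering two s-at tests
[Figure: CMOS gate with pMOS FETs to VDD and nMOS FETs to ground, inputs A and B, output C, and one stuck-open transistor. A goes 1 then 0 while B stays 0; C goes from 0 to 1 in the good circuit but floats (Z) in the faulty circuit]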
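53. Stuck-Short Example
- Test vector for A s-a-0
[Figure: CMOS gate with PFETs to VDD and NFETs to ground, inputs A = 1 and B = 0, and one stuck-short transistor that creates an IDDQ path in the faulty circuit; output C is 0 in the good circuit and indeterminate (X) in the faulty circuit]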
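54. Test Pattern for Stuck-At Faults
- $Y_{good} = a \cdot b \cdot c$
- With input a stuck-at-1 (SA1): $Y_{a\text{-}SA1} = b \cdot c$
- No need to enumerate all input combinations to detect a fault
- Test pattern: a,b,c = 011 (good output 0, faulty output 1)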
55. Fault Simulation
- Fault simulation problem: given
  - A circuit
  - A sequence of test vectors
  - A fault model
- Determine
  - Fault coverage: fraction (or percentage) of modeled faults detected by test vectors
  - Set of undetected faults
- Motivation
  - Determine test quality and in turn product quality
  - Find undetected fault targets to improve tests
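The problem statement maps directly onto a small program. The Python sketch below is a brute-force serial fault simulator for an assumed circuit, a three-input AND gate with single stuck-at faults on its three inputs and its output. The circuit, the names `and3` and `fault_simulate`, and the test vectors are illustrative choices, not taken from the lecture.

```python
def and3(a, b, c, fault=None):
    """Evaluate y = a AND b AND c, optionally with one stuck-at fault.

    fault is a (line, stuck_value) pair with line in {"a", "b", "c", "y"}.
    """
    lines = {"a": a, "b": b, "c": c}
    if fault is not None and fault[0] in lines:
        lines[fault[0]] = fault[1]  # faulty input line is forced
    y = lines["a"] & lines["b"] & lines["c"]
    if fault is not None and fault[0] == "y":
        y = fault[1]  # faulty output line is forced
    return y


# Fault model: single stuck-at-0 and stuck-at-1 on every line (8 faults).
FAULTS = [(line, v) for line in ("a", "b", "c", "y") for v in (0, 1)]


def fault_simulate(vectors, faults):
    """Return (fault coverage, list of undetected faults) for the given vectors."""
    detected = set()
    for vec in vectors:
        good = and3(*vec)
        for f in faults:
            if f not in detected and and3(*vec, fault=f) != good:
                detected.add(f)  # output differs from the good circuit
    undetected = [f for f in faults if f not in detected]
    return len(detected) / len(faults), undetected


print(fault_simulate([(0, 1, 1)], FAULTS))
print(fault_simulate([(1, 1, 1), (0, 1, 1), (1, 0, 1), (1, 1, 0)], FAULTS))
```

The single vector 011 detects only a stuck-at-1 and y stuck-at-1 (coverage 0.25), and the list of undetected faults shows which targets still need vectors. The four-vector set covers all eight faults.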
56. Goal of Design for Testability (DFT)
- Improve
  - Controllability
  - Observability
  - Predictability
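57. Scan Storage Cell
[Figure: scan storage cell (SSC) with data input D, scan input Si, normal/test select N/T, clock Clk, and output Q, which also serves as scan output So]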
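58. Integrated Serial Scan
[Figure: scan flip-flops (SFF) chained from SCANIN to SCANOUT around the combinational logic, with primary inputs PI, primary outputs PO, and a Control signal]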
59. Interconnect Timing Optimization
60. Buffers Reduce Wire Delay
- $t_{unbuf} = R\,(cx + C) + rx\,(cx/2 + C)$
- $t_{buf} = 2R\,(cx/2 + C) + rx\,(cx/4 + C) + t_b$
- $t_{buf} - t_{unbuf} = RC + t_b - rcx^2/4$
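The difference formula can be checked numerically. The Python sketch below assumes the usual reading of the symbols: R is the driver (and buffer) output resistance, C the sink (and buffer) input capacitance, r and c the wire resistance and capacitance per unit length, x the wire length, and tb the buffer's intrinsic delay, with the buffer placed at the wire midpoint. The numeric values are made up for illustration.

```python
import math


def t_unbuf(R, C, r, c, x):
    """Delay of an unbuffered wire of length x driven through R into load C."""
    return R * (c * x + C) + r * x * (c * x / 2 + C)


def t_buf(R, C, r, c, x, tb):
    """Delay of the same wire with one buffer (delay tb) at its midpoint."""
    return 2 * R * (c * x / 2 + C) + r * x * (c * x / 4 + C) + tb


# Assumed example values in consistent units.
R, C, r, c, x, tb = 100.0, 0.01, 1.0, 0.02, 50.0, 1.0
diff = t_buf(R, C, r, c, x, tb) - t_unbuf(R, C, r, c, x)
print(diff, R * C + tb - r * c * x**2 / 4)
assert math.isclose(diff, R * C + tb - r * c * x**2 / 4)
```

The buffer pays off when the difference is negative, that is, when the quadratic wire term rcx²/4 exceeds RC + tb, which happens for sufficiently long wires.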
61. Buffers Improve Slack
- RAT = Required Arrival Time; Slack = RAT - Delay
- Without buffer
  - RAT = 300, Delay = 350, Slack = -50
  - RAT = 700, Delay = 600, Slack = 100
  - slack_min = -50
- With buffer: decouple capacitive load from critical path
  - RAT = 300, Delay = 250, Slack = 50
  - RAT = 700, Delay = 400, Slack = 300
  - slack_min = 50
62. Candidate Solution Characteristics
- Each candidate solution is associated with
  - $v_i$: a node
  - $c_i$: downstream capacitance
  - $q_i$: RAT
63. Van Ginneken's Algorithm
- Start from sinks
- Candidate solutions are generated
64. Solution Pruning
- Two candidate solutions
  - $(v, c_1, q_1)$
  - $(v, c_2, q_2)$
- Solution 1 is inferior if
  - $c_1 > c_2$: larger load
  - and $q_1 < q_2$: tighter timing
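The inferiority test leads to a simple pruning pass over all candidates at one node: sort by capacitance and keep a candidate only if its RAT beats every candidate with smaller load. The Python sketch below is one way to do this; the function name and sample values are hypothetical, and it also drops candidates that merely tie on q with a lighter-load candidate, which is slightly more aggressive than the strict inequalities above.

```python
def prune(candidates):
    """Remove inferior (c, q) candidates that all belong to the same node.

    c is downstream capacitance (smaller is better),
    q is required arrival time (larger is better).
    """
    kept = []
    best_q = float("-inf")
    # Ascending load; among equal loads, the largest q comes first.
    for c, q in sorted(candidates, key=lambda s: (s[0], -s[1])):
        if q > best_q:  # not beaten by any lighter-load candidate
            kept.append((c, q))
            best_q = q
    return kept


# (5, 90) has a larger load and tighter timing than (4, 120), so it is pruned.
print(prune([(3, 100), (5, 90), (4, 120), (6, 150)]))
```

After pruning, the surviving candidates are ordered so that both c and q increase, which is what keeps the candidate lists short as they are propagated toward the source.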
65. Slew Constraints
- When a buffer is inserted, assume ideal slew rate at its input
- Check slew rate at downstream buffers/sinks
- If slew is too large, candidate is discarded
66. Cost-Slack Trade-off
67. Continuous Wire Sizing
- Min delay wire shape: $w(x) = a\,e^{-bx}$
[Figure: wire width w(x) tapering along position x]
68. Wire Sizing Monotone Property
- Ancestor edges cannot be narrower than downstream edges
69. Simultaneous Buffer Insertion and Wire Sizing
70. Area or Radius?
- Radius: the longest source-sink path length
- Dijkstra's shortest path tree
  - Short path to sinks
  - Large total wire length
- Prim's minimum spanning tree
  - Small total wire length
  - Long path to sinks
71. Area Radius Trade-off
- Find a solution in middle
  - Not too much area
  - Not too long radius
- How to find an ideal point?
72. Prim's and Dijkstra's Algorithms
- d(i,j): length of edge (i, j)
- p(i): length of path from source to i
- Prim: min d(i,j)
- Dijkstra: min d(i,j) + p(i)
[Figure: tree node i at path length p(i) from the source, with candidate edge (i, j)]
73. The Prim-Dijkstra Trade-off
- Prim: add edge minimizing d(i,j)
- Dijkstra: add edge minimizing p(i) + d(i,j)
- Trade-off: c·p(i) + d(i,j) for 0 ≤ c ≤ 1
  - When c = 0, trade-off = Prim
  - When c = 1, trade-off = Dijkstra
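The trade-off rule fits in a few lines of Python. The sketch below grows a tree from the source and, at each step, adds the edge (i, j) with i in the tree and j outside it that minimizes c·p(i) + d(i,j). Manhattan distance, the point coordinates, and the function name are assumptions chosen for illustration.

```python
def prim_dijkstra(points, c, source=0):
    """Build a routing tree with the Prim-Dijkstra trade-off parameter c.

    Returns (edges, total wire length, radius), where radius is the longest
    source-to-sink path length in the tree.
    """
    def dist(i, j):
        (xi, yi), (xj, yj) = points[i], points[j]
        return abs(xi - xj) + abs(yi - yj)  # Manhattan distance

    path = {source: 0}  # p(i) for every node already in the tree
    edges = []
    while len(path) < len(points):
        # Pick the edge from the tree to an outside node with the least cost.
        _, i, j = min(
            (c * path[i] + dist(i, j), i, j)
            for i in path
            for j in range(len(points))
            if j not in path
        )
        path[j] = path[i] + dist(i, j)
        edges.append((i, j))
    wire_length = sum(dist(i, j) for i, j in edges)
    return edges, wire_length, max(path.values())


pts = [(0, 0), (-1, 2), (2, 2)]  # source first, then two sinks (assumed)
print(prim_dijkstra(pts, c=0))   # Prim-like: shorter wire, longer radius
print(prim_dijkstra(pts, c=1))   # Dijkstra-like: longer wire, shorter radius
```

On these three points, c = 0 yields total wire length 6 with radius 6, while c = 1 yields wire length 7 with radius 4, which is the area versus radius tension in miniature.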
74. Spanning Tree → Steiner Tree
75. P-Tree Abstract Tree
[Figure: abstract tree over nodes a through g]
76. P-Tree Embedding
[Figure: embedding of the abstract tree on the Hanan grid]
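77. Gate Characteristics
78. I-V Characteristics
- Cutoff region
  - $V_{gs} < V_t$
  - $I_{ds} = 0$
- Linear region
  - $V_{gs} > V_t$, $0 < V_{ds} < V_{gs} - V_t$
  - $I_{ds} = B\,[(V_{gs} - V_t)\,V_{ds} - V_{ds}^2/2]$
- Saturation region
  - $V_{gs} > V_t$, $0 < V_{gs} - V_t < V_{ds}$
  - $I_{ds} = B\,(V_{gs} - V_t)^2/2$
- $B \propto W/L$
[Figure: transistor with terminals d, g, and s, and its Ids versus Vds curves]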
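The three operating regions translate into a piecewise function. The Python sketch below evaluates Ids from the region equations; the function name and the sample values (Vt = 0.5, B = 2.0) are illustrative assumptions, and the boundary points are assigned to cutoff and saturation by choice.

```python
import math


def ids(vgs, vds, vt, beta):
    """Drain current from the three-region model (vds >= 0 assumed)."""
    if vgs <= vt:
        return 0.0  # cutoff region
    if vds < vgs - vt:
        # linear region
        return beta * ((vgs - vt) * vds - vds**2 / 2)
    # saturation region
    return beta * (vgs - vt) ** 2 / 2


print(ids(0.3, 1.0, vt=0.5, beta=2.0))  # cutoff
print(ids(1.5, 0.5, vt=0.5, beta=2.0))  # linear
print(ids(1.5, 2.0, vt=0.5, beta=2.0))  # saturation
```

The linear and saturation expressions agree at Vds = Vgs - Vt, so the modeled current is continuous across that boundary.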
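79. Switching Characteristics
[Figure: inverter (in, out) powered from Vdd, with waveforms of Vin and Vout versus time t marking tfall and tdelay, and the corresponding Ids versus Vds trajectory]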
80. Falling and Rising Procedure
[Figure: inverter output transitions for input rising and input falling; in each case the conducting transistor starts in saturation and then moves into the linear region]
81. Falling Time
- Falling time = $t_1 + t_2$
  - $t_1$: Vout drops from 0.9Vdd to Vdd - Vt
  - $t_2$: Vout drops from Vdd - Vt to 0.1Vdd
- Falling time ≈ rising time ≈ $k \cdot C / (B \cdot V_{dd})$
- Delay ≈ Falling time / 2
82. Gate Power Dissipation
- Leakage power
- Dynamic power
- Short circuit power
83. Leakage Power
- Static
- Leakage current ∝ Vdd
- Leakage current ∝ 1/Vt
- Killer to CMOS technology
[Figure: inverter in both output states with the leakage path marked]
84. Dynamic Power
- Occurs at each switching
- $P_d = C_L \cdot V_{dd}^2 \cdot f_p$
- $f_p$ = switching frequency
[Figure: inverter charging and discharging the output node out]
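A short Python sketch of the dynamic power formula makes the quadratic dependence on supply voltage concrete. The function name and the numbers (1 pF load, 1 GHz switching frequency) are assumptions for illustration only.

```python
import math


def dynamic_power(cl, vdd, fp):
    """Dynamic power: load capacitance times Vdd squared times switching frequency."""
    return cl * vdd**2 * fp


full = dynamic_power(cl=1e-12, vdd=1.0, fp=1e9)  # 1 pF, 1 V, 1 GHz
half = dynamic_power(cl=1e-12, vdd=0.5, fp=1e9)  # same load at half the supply
print(full, half)
assert math.isclose(half, full / 4)  # halving Vdd quarters the dynamic power
```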
85. Short Circuit Power
- During switching, there is a short moment when both PMOS and NMOS are partially on
- $P_s = Q \cdot (V_{dd} - V_t)^3 \cdot t_r \cdot f_p$
- $t_r$ = rising time
[Figure: inverter with both transistors conducting during input rising and input falling]