Title: Delay Model and Simulation
2. Simulation with Delay
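[Figure: circuit with inputs A and B and nets C and D driven through gates with delays, and the event-driven simulation timeline (t_sim from 0 to 50) listing each value change on A, B, C and D, starting from all x.]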
3. Delay Models
- Gate delay
  - Intrinsic delay
  - Layout-induced delay due to capacitive load
  - Waveform slope-induced delay
- Net delay/transport delay
  - Signal propagation delay along interconnect wires
- Module path delay
  - Delay between an input port and an output port
4. Inertial Delay
- Delay is caused by charging and discharging the node capacitances in the circuit
  - Applies to both gate delay and wire delay
- Pulse rejection
  - If the pulse width is less than the delay, the pulse is ignored
[Figure: waveforms A, B, C and D illustrating pulse rejection.]
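The pulse-rejection rule can be modeled in a few lines of Python. This is a simplified sketch, not how any particular simulator implements it: the input is assumed to be a time-sorted list of `(time, value)` changes on a single buffer, and a scheduled output change is cancelled whenever a new input change arrives before it matures. The function name `inertial_delay` is hypothetical.

```python
def inertial_delay(changes, delay, init=0):
    """Simplified inertial-delay model of a single buffer.

    changes: time-sorted list of (time, value) input transitions.
    Returns the list of (time, value) output transitions.
    """
    out = []
    level = init        # output value already committed
    pending = None      # (time, value) scheduled but not yet committed
    for t, v in changes:
        if pending is not None and pending[0] <= t:
            level = pending[1]          # scheduled change matured: commit it
            out.append(pending)
        pending = None                  # otherwise the new change cancels it
        if v != level:
            pending = (t + delay, v)    # schedule the new value
    if pending is not None:
        out.append(pending)             # flush the last scheduled change
    return out


# A 2-unit pulse through a 5-unit delay is rejected; a 10-unit pulse passes.
print(inertial_delay([(10, 1), (12, 0)], 5))   # []
print(inertial_delay([(10, 1), (20, 0)], 5))   # [(15, 1), (25, 0)]
```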
5. Gate Delay
    and (yout, x1, x2);                                // default: zero gate delay
    and #3 (yout, x1, x2);                             // 3 units of delay for all transitions
    and #(2,3) G1 (yout, x1, x2);                      // rise delay 2, fall delay 3
    and #(2,3) G1 (yout, x1, x2), G2 (yout2, x3, x4);  // multiple instances
    a_buffer #(3,5,2) (yout, x);                       // UDP: rise, fall, turnoff
    bufif1 #(3:4:5, 6:7:9, 5:7:8) (yout, xin, enable); // min:typ:max for rise, fall, turnoff
- Simulators simulate with only one of the min, typ and max delay values
  - The selection is made through compiler directives or user interfaces
  - The default is the typ delay
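6. Gate and Wire Model
[Figure: a gate is modeled as a driving resistance R and a capacitance C; a wire of length L is modeled as a π-segment with series resistance rL and capacitance cL/2 at each end, where r is the resistance per unit length and c is the capacitance per unit length.]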
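7. Example of Model
8. Delay Estimation
[Figure: RC tree. The driver resistance R feeds node 0 (capacitance C0); R1 connects node 0 to node 1 (C1); node 1 branches through R2 to node 2 (C2) and through R3 to node 3 (C3).]
- $D_0 = R\,(C_0 + C_1 + C_2 + C_3)$
- $D_1 = D_0 + R_1\,(C_1 + C_2 + C_3)$
- $D_2 = D_1 + R_2 C_2$
- $D_3 = D_1 + R_3 C_3$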
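The four formulas above are instances of the Elmore delay: the delay to a node is the sum, over every resistor on the path from the driver, of that resistance times all the capacitance downstream of it. The Python sketch below assumes the tree is given as a parent map plus a per-node resistance (of the edge entering the node, with the driver resistance R on node 0) and capacitance; the function name and the numeric values are made up for illustration.

```python
def elmore_delays(parent, res, cap):
    """Elmore delay to every node of an RC tree.

    parent[n]: upstream node of n (None for node 0, fed by the driver)
    res[n]:    resistance of the edge entering n (driver resistance for node 0)
    cap[n]:    capacitance at node n
    """
    children = {n: [] for n in parent}
    for n, p in parent.items():
        if p is not None:
            children[p].append(n)

    def downstream_cap(n):
        return cap[n] + sum(downstream_cap(k) for k in children[n])

    delay = {}

    def visit(n, upstream_delay):
        delay[n] = upstream_delay + res[n] * downstream_cap(n)
        for k in children[n]:
            visit(k, delay[n])

    root = next(n for n, p in parent.items() if p is None)
    visit(root, 0.0)
    return delay


# Tree from the slide: driver R -> node 0 -R1-> node 1, which branches to 2 and 3.
parent = {0: None, 1: 0, 2: 1, 3: 1}
res = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}   # R, R1, R2, R3 (illustrative values)
cap = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}   # C0..C3 (illustrative values)
print(elmore_delays(parent, res, cap))   # {0: 10.0, 1: 28.0, 2: 37.0, 3: 44.0}
```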
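9. Clock Scheduling
[Figure: register i launches data through combinational logic with logic delay LD to register j; the clock arrives at register i at time t_i and at register j at time t_j.]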
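10. Timing Constraints
[Figure: timing diagram relating the clock arrival times t_i and t_j, the hold and setup windows at register j, and the minimum and maximum logic delays LD_min and LD_max.]
- Hold: $skew_{ij} = t_i - t_j > hold_{max} - LD_{min}$
- Setup: $skew_{ij} = t_i - t_j < CP - LD_{max} - setup_{max}$
- CP: clock period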
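Both constraints can be checked mechanically for a register pair. The sketch assumes all times share one unit and that `hold_max`, `setup_max`, `ld_min` and `ld_max` are already known for the pair; the function name `check_skew` is hypothetical.

```python
def check_skew(t_i, t_j, cp, ld_min, ld_max, setup_max, hold_max):
    """Return (hold_ok, setup_ok) for launch register i and capture register j."""
    skew = t_i - t_j
    hold_ok = skew > hold_max - ld_min            # fastest path must not arrive too early
    setup_ok = skew < cp - ld_max - setup_max     # slowest path must beat the next edge
    return hold_ok, setup_ok


# CP = 10, LD in [2, 7], setup = hold = 1: the skew must lie in (-1, 2).
print(check_skew(0.5, 0, 10, 2, 7, 1, 1))   # (True, True)
print(check_skew(0, 2, 10, 2, 7, 1, 1))     # (False, True): hold violation
print(check_skew(3, 0, 10, 2, 7, 1, 1))     # (True, False): setup violation
```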
11. Assignment
12. Blocking and Non-blocking Assignment
    initial
    begin
      a = 1;
      b = 0;
      a = b;    // a = 0
      b = a;    // b = 0
    end

    initial
    begin
      a = 1;
      b = 0;
      a <= b;   // a = 0
      b <= a;   // b = 1
    end
- Blocking assignment (=)
  - Statement order matters
  - A statement has to be executed before the next statement
- Non-blocking assignment (<=)
  - Concurrent assignment
  - Normally the last assignment at a given simulation time step
  - If it triggers other blocking assignments, it is executed before the blocking assignments it triggers
  - If there are multiple non-blocking assignments to the same variable in the same behavior, the latter overwrites the previous one
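The difference between the two initial blocks can be mimicked in Python: blocking assignments update the variable immediately, while non-blocking assignments sample every right-hand side first and update at the end of the time step. This is a toy model of a single time step, with each statement reduced to a hypothetical `(lhs, rhs_variable)` pair.

```python
def run_blocking(env, stmts):
    """Blocking (=): each statement completes before the next one starts."""
    env = dict(env)
    for lhs, rhs in stmts:
        env[lhs] = env[rhs]
    return env


def run_nonblocking(env, stmts):
    """Non-blocking (<=): sample all RHS values, then apply the updates in order."""
    env = dict(env)
    updates = [(lhs, env[rhs]) for lhs, rhs in stmts]
    for lhs, value in updates:        # a later update to the same variable wins
        env[lhs] = value
    return env


start = {"a": 1, "b": 0}
swap = [("a", "b"), ("b", "a")]
print(run_blocking(start, swap))      # {'a': 0, 'b': 0}
print(run_nonblocking(start, swap))   # {'a': 0, 'b': 1}
```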
13. Procedural Continuous Assignment
- Continuous assignment establishes a static binding for net variables
- Procedural continuous assignment (PCA) establishes a dynamic binding for variables
  - assign ... deassign: for register variables only
  - force ... release: for both register and net variables
14. Intra-assignment Delay: Blocking Assignment
    // B = 0 at time 0
    // B = 1 at time 4

    #5 A = B;               // A = 1
    C = D;

    A = #5 B;               // A = 0
    C = D;

    A = @(enable) B;
    C = D;

    A = @(named_event) B;
    C = D;
- If the timing control operator (#, @) is on the LHS
  - Blocking delay
  - The RHS is evaluated at (#, @)
  - The assignment happens at (#, @)
- If the timing control operator (#, @) is on the RHS
  - Intra-assignment delay
  - The RHS is evaluated immediately
  - The assignment happens at (#, @)
15. Example
    initial begin
      a = #10 1; b = #2 0; c = #3 1;
    end
    initial begin
      d <= #10 1; e <= #2 0; f <= #3 1;
    end

    t    a  b  c  d  e  f
    0    x  x  x  x  x  x
    2    x  x  x  x  0  x
    3    x  x  x  x  0  1
    10   1  x  x  1  0  1
    12   1  0  x  1  0  1
    15   1  0  1  1  0  1
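16. Tell the Differences
    always @(a or b) y = a | b;
    always @(a or b) #5 y = a | b;
    always @(a or b) y = #5 a | b;
    always @(a or b) y <= #5 a | b;
- Which one describes an OR gate?
- With a blocking delay (#5 before the statement or inside a blocking assignment), event control is blocked: changes on a or b during the delay are missed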
17. Race Condition
    always @(posedge clk)   // will c get the previous b or the new b?
      c = b;
    always @(posedge clk)
      b = a;
18. Avoid Race Condition
    always @(posedge clk)   // Solution 1: merge the always blocks
    begin
      c = b;
      b = a;
    end

    always @(posedge clk)   // Solution 2: intra-assignment delay
      c = #1 b;
    always @(posedge clk)
      b = #1 a;

    always @(posedge clk)   // Solution 3: non-blocking assignment
      c <= b;
    always @(posedge clk)
      b <= a;
19. Finite State Machine
20. FSM Example: Speed Machine
21. Verilog Code for Speed Machine
    // Explicit FSM style
    module speed_machine (clock, accelerator, brake, speed);
      input clock, accelerator, brake;
      output [1:0] speed;
      reg [1:0] state, next_state;
      parameter stopped  = 2'b00;
      parameter s_low    = 2'b01;
      parameter s_medium = 2'b10;
      parameter s_high   = 2'b11;

      assign speed = state;

      // State register
      always @(posedge clock)
        state <= next_state;

      // Combinational next-state logic (blocking assignments)
      always @(state or accelerator or brake)
        if (brake == 1'b1)
          case (state)
            stopped:  next_state = stopped;
            s_low:    next_state = stopped;
            s_medium: next_state = s_low;
            s_high:   next_state = s_medium;
            default:  next_state = stopped;
          endcase
        else if (accelerator == 1'b1)
          case (state)
            stopped:  next_state = s_low;
            s_low:    next_state = s_medium;
            s_medium: next_state = s_high;
            s_high:   next_state = s_high;
            default:  next_state = stopped;
          endcase
        else
          next_state = state;
    endmodule
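To check the transition table independently of a simulator, the next-state logic can be restated as a small Python function. This is a behavioral sketch mirroring the `speed_machine` case statements, not generated from them; the helper `run` and the input sequence are illustrative, and each tuple is `(accelerator, brake)` sampled at one clock edge.

```python
STATES = ["stopped", "s_low", "s_medium", "s_high"]


def next_state(state, accelerator, brake):
    """Brake has priority: step down one speed; accelerator steps up; else hold."""
    i = STATES.index(state)
    if brake:
        return STATES[max(i - 1, 0)]
    if accelerator:
        return STATES[min(i + 1, len(STATES) - 1)]
    return state


def run(inputs, state="stopped"):
    """Apply one (accelerator, brake) pair per clock edge and record the states."""
    trace = []
    for accelerator, brake in inputs:
        state = next_state(state, accelerator, brake)
        trace.append(state)
    return trace


print(run([(1, 0), (1, 0), (1, 0), (1, 0), (0, 1), (0, 0)]))
# ['s_low', 's_medium', 's_high', 's_high', 's_medium', 's_medium']
```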
22. State Encoding Example
23. State Encoding
- A state machine with N states requires a state register of at least $\lceil \log_2 N \rceil$ bits to store the encoded representation of the states
- Binary and Gray encoding use the minimum number of bits for the state register
- Gray and Johnson codes
  - Two adjacent codes differ in only one bit
  - Reduce simultaneous switching
  - Reduce crosstalk
  - Reduce glitches
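The bit counts and the one-bit-change property can be checked numerically. The sketch below uses the standard binary-reflected Gray code and a k-bit Johnson (twisted-ring) counter; the function names are hypothetical, and the wrap-around check for Gray codes assumes the sequence length is a power of two.

```python
def min_state_bits(n):
    """Smallest register width that can hold n distinct state codes."""
    return max(1, (n - 1).bit_length())


def gray_codes(n):
    """First n binary-reflected Gray codes."""
    return [i ^ (i >> 1) for i in range(n)]


def johnson_codes(k):
    """The 2k states of a k-bit Johnson counter (shift left, feed back ~MSB)."""
    codes, s, mask = [], 0, (1 << k) - 1
    for _ in range(2 * k):
        codes.append(s)
        msb = (s >> (k - 1)) & 1
        s = ((s << 1) & mask) | (msb ^ 1)
    return codes


def cyclic_bit_changes(codes):
    """Number of bits that flip between consecutive codes, wrapping around."""
    return [bin(a ^ b).count("1") for a, b in zip(codes, codes[1:] + codes[:1])]


print(min_state_bits(5))                    # 3
print(cyclic_bit_changes(list(range(8))))   # [1, 2, 1, 3, 1, 2, 1, 3]
print(cyclic_bit_changes(gray_codes(8)))    # [1, 1, 1, 1, 1, 1, 1, 1]
print(johnson_codes(3))                     # [0, 1, 3, 7, 6, 4]
```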
24. One-hot Encoding
- Employs one register bit for each state
- Less combinational logic to decode the state
- Consumes more area, which does not matter for certain hardware such as FPGAs
- Easier to design, friendly to incremental change
- case and if statements may give different results for one-hot encoding
- Runs faster
    `define state_0 3'b001
    `define state_1 3'b010
    `define state_2 3'b100
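The decode advantage of one-hot encoding shows up in a tiny Python sketch: testing for state k needs only bit k, and a legal code has exactly one bit set. The helpers below are hypothetical and simply mirror the three `define values above.

```python
def one_hot(index):
    """One-hot code for state number `index` (state_0 -> 0b001, ...)."""
    return 1 << index


def is_legal_one_hot(code):
    """True if exactly one bit is set."""
    return code != 0 and code & (code - 1) == 0


def in_state(code, index):
    """A one-hot state is decoded with a single bit test,
    whereas a binary code would need a compare on every state bit."""
    return (code >> index) & 1 == 1


print([format(one_hot(i), "03b") for i in range(3)])      # ['001', '010', '100']
print(is_legal_one_hot(0b010), is_legal_one_hot(0b011))   # True False
```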
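25. Transistor Level Model
26. Static CMOS Circuits
    module cmos_inverter (out, in);
      output out;
      input in;
      supply0 GND;
      supply1 PWR;
      pmos (out, PWR, in);   // terminals: drain, source, gate
      nmos (out, GND, in);
    endmodule
[Figure: CMOS inverter schematic (pMOS from Vdd to out, nMOS from out to ground, both gates driven by in) and a transistor symbol labeled with its drain, source and gate terminals.]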
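27. Pull Gates
    module nmos_nand_2 (Y, A, B);
      output Y;
      input A, B;
      supply0 GND;
      tri w;
      pullup (Y);          // resistive load to Vdd
      nmos (Y, w, A);      // series pull-down network
      nmos (w, GND, B);
    endmodule
[Figure: nMOS NAND gate with a pull-up load from Vdd to output Y and two series nMOS transistors driven by A and B.]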
28. Assign Drive Strengths
    nand (pull1, strong0) G1 (Y, A, B);
    wire (pull0, weak1) A_wire = net1 || net2;
    assign (pull1, weak0) A_net = reg_b;
- Drive strength is specified through an unordered pair
  - one value from supply0, strong0, pull0, weak0, highz0
  - the other from supply1, strong1, pull1, weak1, highz1
- Only scalar nets may receive strength assignments
- When a tri0 or tri1 net is not driven, it is pulled to the indicated logic value with a strength of pull0 or pull1
- The trireg net models a capacitance that holds a charge after its drivers are removed; the net has a charge strength of small, medium (default) or large
29. Signal Strength Levels

    Strength     Name               Level
    Su0 / Su1    Supply drive       7 (strongest)
    St0 / St1    Strong drive       6
    Pu0 / Pu1    Pull drive         5
    La0 / La1    Large capacitor    4
    We0 / We1    Weak drive         3
    Me0 / Me1    Medium capacitor   2
    Sm0 / Sm1    Small capacitor    1
    HiZ0 / HiZ1  High impedance     0 (weakest)

- Signal strength specifies a signal's ability to act as a logic driver and determines the resultant logic value on a net in cases of
  - signal contention between multiple drivers of a net
  - charge distribution between nodes in a circuit
- The default is strong drive
- Capacitive strengths may be assigned only to trireg nets
30. Strength Reduction
- Dependence of output strength on input strength
  - Combinational and pull gates: NO, except 3-state gates
  - Transistor switches and bidirectional gates: YES
- In general, output strength <= input strength
31. Transistor Switch and Bidirectional Gate
- Transistor switch
  - nmos, pmos, cmos
- Bidirectional gate
  - tran, tranif0, tranif1
- If the input is supply0 or supply1
  - The output is strong0 or strong1
- Otherwise
  - Output strength = input strength
32. Signal Contention: Known Strength and Known Value
- The signal with the greater strength dominates
- Same strength, different logic values
  - wand -> and, wor -> or
  - Otherwise -> x
[Figure: driver1 at We0 and driver2 at Pu1 drive the same net, which resolves to Pu1.]
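The contention rules above, together with the switch rule on slide 31, can be written as a small Python resolver. This is a simplified sketch: it handles only signals with a single known strength and a known value (no ambiguous strength ranges), uses the standard Verilog strength levels 7 (supply) down to 0 (high impedance), and the names `resolve` and `through_switch` are hypothetical.

```python
# Verilog strength levels, strongest first (7 = supply ... 0 = high impedance).
LEVEL = {"Su": 7, "St": 6, "Pu": 5, "La": 4, "We": 3, "Me": 2, "Sm": 1, "HiZ": 0}


def split(sig):
    """'Pu1' -> ('Pu', '1')."""
    return sig[:-1], sig[-1]


def resolve(a, b, net="wire"):
    """Resolve two drivers with known strengths and known values."""
    sa, va = split(a)
    sb, vb = split(b)
    if LEVEL[sa] != LEVEL[sb]:
        return a if LEVEL[sa] > LEVEL[sb] else b   # greater strength dominates
    if va == vb:
        return a
    if net == "wand":
        return sa + "0"                             # 0 AND 1
    if net == "wor":
        return sa + "1"                             # 0 OR 1
    return sa + "X"


def through_switch(sig):
    """Non-resistive switch: supply strength drops to strong, others pass."""
    s, v = split(sig)
    return ("St" if s == "Su" else s) + v


print(resolve("We0", "Pu1"))           # Pu1 (the example on this slide)
print(resolve("St0", "St1"))           # StX
print(resolve("St0", "St1", "wand"))   # St0
print(through_switch("Su1"))           # St1
```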
33. Synthesis
34. Unexpected and Unwanted Latch
- Combinational logic must specify an output value for all input values
- Incomplete case statements and conditionals (if) imply that
  - the output should retain its value for unspecified input values
  - and therefore produce unwanted latches
35. Example of Unwanted Latch
    module myMux (y, selA, selB, a, b);
      input selA, selB, a, b;
      output y;
      reg y;
      always @(selA or selB or a or b)
        case ({selA, selB})
          2'b10: y = a;
          2'b01: y = b;
        endcase
    endmodule
[Figure: synthesized result: a and b, selected by selA and selB, feed a latch whose enable (en) is derived from selA and selB.]
36. Synthesis of case and if
- case and if statements imply priority
- A synthesis tool will determine whether the case items of a case statement are mutually exclusive
  - If so, synthesis will treat them with the same priority and synthesize a mux
- A synthesis tool will treat casex and casez the same as case
  - x and z will be treated as don't-cares
- The post-synthesis simulation result may differ from the pre-synthesis simulation
37. Example of if and case
    input [3:0] data;
    output [1:0] code;
    reg [1:0] code;
    always @(data)
    begin // implicit priority
      if (data[3]) code = 3;
      else if (data[2]) code = 2;
      else if (data[1]) code = 1;
      else if (data[0]) code = 0;
      else code = 2'bx;
    end

    input [3:0] data;
    output [1:0] code;
    reg [1:0] code;
    always @(data)
      case (data)
        4'b1000: code = 3;
        4'b0100: code = 2;
        4'b0010: code = 1;
        4'b0001: code = 0;
        default: code = 2'bx;
      endcase
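The two descriptions agree only when exactly one data bit is set. The Python sketch below compares them over all 16 input values, using `None` as a stand-in for `2'bx`; it models pre-synthesis simulation only, since a synthesis tool may treat the x default as a don't-care and produce identical hardware.

```python
def encode_if(data):
    """if/else-if chain: the highest set bit wins (implicit priority)."""
    for bit in (3, 2, 1, 0):
        if (data >> bit) & 1:
            return bit
    return None                      # else code = 2'bx


def encode_case(data):
    """case statement: only exact one-hot patterns match."""
    table = {0b1000: 3, 0b0100: 2, 0b0010: 1, 0b0001: 0}
    return table.get(data)           # default: code = 2'bx


mismatches = [d for d in range(16) if encode_if(d) != encode_case(d)]
print(mismatches)   # [3, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15]
```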
38. Synthesis of Register Variables
- A hardware register will be generated for a register variable when it is
  - referenced before a value is assigned to it in a behavior
  - assigned a value in an edge-sensitive behavior and referenced by an assignment outside the behavior
  - assigned a value in one clock cycle and referenced in another clock cycle
- Multi-phase latches may not be supported in synthesis
39. Synthesis of Arithmetic Operators
- If a corresponding library cell exists, an operator will be mapped directly to it
- A synthesis tool may select among different library cell options, for example, when synthesizing an adder:
  - Small wordlength -> ripple-carry adder
  - Long wordlength -> carry-look-ahead adder
  - Need small area -> bit-serial adder
- Implementation of * and /
  - May be inefficient when both operands are variables
  - If the multiplier or the divisor is a power of two, the operation can be implemented as a shift
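The power-of-two case reduces to shifting, which a quick exhaustive check in Python confirms for unsigned 8-bit operands. Signed division needs extra care: right-shifting a negative number rounds toward minus infinity, while Verilog's `/` truncates toward zero. The helper names are hypothetical.

```python
def mul_pow2(x, k):
    """x * 2**k as a left shift (unsigned x)."""
    return x << k


def div_pow2(x, k):
    """x // 2**k as a right shift (unsigned x, result truncated)."""
    return x >> k


ok = all(mul_pow2(x, k) == x * 2**k and div_pow2(x, k) == x // 2**k
         for x in range(256) for k in range(8))
print(ok)   # True
```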
40. Static Loops without Internal Timing Controls -> Combinational Logic
    module count1sA (bit_cnt, data, clk, rst);
      parameter data_width = 4;
      parameter cnt_width = 3;
      output [cnt_width-1:0] bit_cnt;
      input [data_width-1:0] data;
      input clk, rst;
      reg [cnt_width-1:0] cnt, bit_cnt, i;
      reg [data_width-1:0] tmp;
      always @(posedge clk)
        if (rst) begin cnt = 0; bit_cnt = 0; end
        else begin
          cnt = 0; tmp = data;
          for (i = 0; i < data_width; i = i + 1)
          begin
            if (tmp[0]) cnt = cnt + 1;
            tmp = tmp >> 1;
          end
          bit_cnt = cnt;
        end
    endmodule
41. Static Loops with Internal Timing Controls -> Sequential Logic
    module count1sB (bit_cnt, data, clk, rst);
      parameter data_width = 4;
      parameter cnt_width = 3;
      output [cnt_width-1:0] bit_cnt;
      input [data_width-1:0] data;
      input clk, rst;
      reg [cnt_width-1:0] cnt, bit_cnt, i;
      reg [data_width-1:0] tmp;
      always @(posedge clk)
        if (rst) begin cnt = 0; bit_cnt = 0; end
        else begin
          cnt = 0; tmp = data;
          for (i = 0; i < data_width; i = i + 1)
            @(posedge clk)
            begin
              if (tmp[0]) cnt = cnt + 1;
              tmp = tmp >> 1;
            end
          bit_cnt = cnt;
        end
    endmodule
42. Non-Static Loops without Internal Timing Controls -> Not Synthesizable
    module count1sC (bit_cnt, data, clk, rst);
      parameter data_width = 4;
      parameter cnt_width = 3;
      output [cnt_width-1:0] bit_cnt;
      input [data_width-1:0] data;
      input clk, rst;
      reg [cnt_width-1:0] cnt, bit_cnt, i;
      reg [data_width-1:0] tmp;
      always @(posedge clk)
        if (rst) begin cnt = 0; bit_cnt = 0; end
        else begin
          cnt = 0; tmp = data;
          for (i = 0; tmp; i = i + 1)
          begin
            if (tmp[0]) cnt = cnt + 1;
            tmp = tmp >> 1;
          end
          bit_cnt = cnt;
        end
    endmodule
43. Non-Static Loops with Internal Timing Controls -> Sequential Logic
    module count1sD (bit_cnt, data, clk, rst);
      parameter data_width = 4;
      parameter cnt_width = 3;
      output [cnt_width-1:0] bit_cnt;
      input [data_width-1:0] data;
      input clk, rst;
      reg [cnt_width-1:0] cnt, bit_cnt, i;
      reg [data_width-1:0] tmp;
      always @(posedge clk)
        if (rst) begin cnt = 0; bit_cnt = 0; end
        else begin: bit_counter
          cnt = 0; tmp = data;
          while (tmp)
            @(posedge clk) begin
              if (rst) begin cnt = 0; disable bit_counter; end
              else begin cnt = cnt + tmp[0]; tmp = tmp >> 1; end
              bit_cnt = cnt;
            end
        end
    endmodule
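Why the static loops unroll while the non-static ones do not becomes clear by counting iterations. In this Python sketch (a behavioral model, not synthesis output), the `for` loop of `count1sA` always runs `data_width` times, whereas the `while (tmp)` loop of `count1sD` runs as many times as the position of the highest set bit, which is only known at run time. The function names are hypothetical.

```python
def count1s_static(data, data_width=4):
    """Like count1sA: fixed trip count, so the loop can be unrolled."""
    cnt, tmp, iterations = 0, data, 0
    for _ in range(data_width):
        cnt += tmp & 1
        tmp >>= 1
        iterations += 1
    return cnt, iterations


def count1s_nonstatic(data):
    """Like count1sD: the trip count depends on the data value."""
    cnt, tmp, iterations = 0, data, 0
    while tmp:
        cnt += tmp & 1
        tmp >>= 1
        iterations += 1
    return cnt, iterations


for d in (0b0000, 0b0010, 0b1010):
    print(d, count1s_static(d), count1s_nonstatic(d))
# 0 (0, 4) (0, 0)
# 2 (1, 4) (1, 2)
# 10 (2, 4) (2, 4)
```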
44. VHDL
45. Example
    -- eqcomp4 is a four-bit equality comparator
    -- Entity declaration
    entity eqcomp4 is
      port (a, b   : in  bit_vector(3 downto 0);
            equals : out bit);   -- equals is active high
    end eqcomp4;
    -- Architecture body
    architecture dataflow of eqcomp4 is
    begin
      equals <= '1' when (a = b) else '0';
    end dataflow;
46. Behavioral Descriptions
    library ieee;
    use ieee.std_logic_1164.all;
    entity eqcomp4 is port (
      a, b   : in  std_logic_vector(3 downto 0);
      equals : out std_logic);
    end eqcomp4;
    architecture behavioral of eqcomp4 is
    begin
      comp: process (a, b)   -- sensitivity list
      begin
        if a = b then equals <= '1';
        else equals <= '0';  -- sequential assignment
        end if;
      end process comp;
    end behavioral;
47. Dataflow Descriptions
    library ieee;
    use ieee.std_logic_1164.all;
    entity eqcomp4 is port (
      a, b   : in  std_logic_vector(3 downto 0);
      equals : out std_logic);
    end eqcomp4;
    architecture dataflow of eqcomp4 is
    begin
      equals <= '1' when (a = b) else '0';
    end dataflow;
    -- No process
    -- Concurrent assignment
48. Structural Descriptions
    library ieee;
    use ieee.std_logic_1164.all;
    entity eqcomp4 is port (
      a, b   : in  std_logic_vector(3 downto 0);
      equals : out std_logic);
    end eqcomp4;
    use work.gatespkg.all;
    architecture struct of eqcomp4 is
      signal x : std_logic_vector(0 to 3);
    begin
      u0: xnor2 port map (a(0), b(0), x(0));   -- component instantiation
      u1: xnor2 port map (a(1), b(1), x(1));
      u2: xnor2 port map (a(2), b(2), x(2));
      u3: xnor2 port map (a(3), b(3), x(3));
      u4: and4  port map (x(0), x(1), x(2), x(3), equals);
    end struct;
49. Test and Design For Testability
50. Single Stuck-at Fault
- Three properties define a single stuck-at fault
  - Only one line is faulty
  - The faulty line is permanently set to 0 or 1
  - The fault can be at an input or output of a gate
- Example: the XOR circuit has 12 fault sites (lines a through k and z) and 24 single stuck-at faults
[Figure: XOR circuit with inputs a and b, internal lines c through k and output z, showing the test vector for the h s-a-0 fault; each line is annotated as good circuit value (faulty circuit value), e.g. 1(0).]
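51. Stuck-Open Example
[Figure: two-input CMOS NOR gate (series pMOS FETs from VDD, parallel nMOS FETs) with inputs A, B and output C, where the pMOS transistor driven by A is stuck open. Vector 1, (A, B) = (1, 0), is the test for A s-a-0 and serves as the initialization vector; vector 2, (A, B) = (0, 0), is the test for A s-a-1. The good circuit output goes from 0 to 1; in the faulty circuit C floats (Z) during vector 2.]
- A two-vector stuck-open test can be constructed by ordering two stuck-at tests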
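52. Stuck-Short Example
[Figure: the same two-input CMOS gate with the pFET driven by A stuck short. The test vector for A s-a-0, (A, B) = (1, 0), creates a conducting path from VDD to ground (the IDDQ path) in the faulty circuit; the good circuit output C is 0, while the faulty output is X.]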
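53. Test Pattern for Stuck-At Faults
- Good circuit: $Y_{good} = a \cdot b \cdot c$
- With input a stuck at 1 (SA1): $Y_{a\text{-}SA1} = b \cdot c$
- Test pattern: (a, b, c) = 011, which gives $Y_{good} = 0$ but $Y_{a\text{-}SA1} = 1$
- There is no need to enumerate all input combinations to detect a fault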
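For a circuit this small the choice of pattern can be confirmed by brute force. The Python sketch below, assuming Y = a·b·c as on the slide, compares the good and faulty outputs over all eight inputs; only 011 distinguishes them. Real test generators find such a pattern without enumerating the input space.

```python
from itertools import product


def y_good(a, b, c):
    return a & b & c


def y_a_sa1(a, b, c):
    return y_good(1, b, c)        # line a stuck at 1


tests = [v for v in product((0, 1), repeat=3) if y_good(*v) != y_a_sa1(*v)]
print(tests)   # [(0, 1, 1)]
```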
54. Fault Simulation
- Fault simulation problem. Given
  - a circuit
  - a sequence of test vectors
  - a fault model
- Determine
  - fault coverage: the fraction (or percentage) of modeled faults detected by the test vectors
  - the set of undetected faults
- Motivation
  - Determine test quality and, in turn, product quality
  - Find undetected fault targets to improve tests
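Serial fault simulation, the simplest approach, re-simulates the circuit once per fault and compares the result against the good circuit. The Python sketch below applies it to the three-input AND Y = a·b·c, with single stuck-at faults on lines a, b, c and y (8 faults); the helper names are hypothetical, and production fault simulators use much faster parallel, deductive or concurrent methods.

```python
LINES = ("a", "b", "c", "y")
FAULTS = [(line, value) for line in LINES for value in (0, 1)]


def simulate(vec, fault=None):
    """Evaluate y = a & b & c, optionally with one (line, stuck_value) fault."""
    v = dict(zip("abc", vec))
    if fault is not None and fault[0] in v:
        v[fault[0]] = fault[1]
    y = v["a"] & v["b"] & v["c"]
    if fault is not None and fault[0] == "y":
        y = fault[1]
    return y


def fault_simulate(vectors):
    """Return (fault coverage, list of undetected faults)."""
    detected = {f for f in FAULTS for vec in vectors
                if simulate(vec, f) != simulate(vec)}
    undetected = [f for f in FAULTS if f not in detected]
    return len(detected) / len(FAULTS), undetected


print(fault_simulate([(1, 1, 1)]))              # (0.5, [('a', 1), ('b', 1), ('c', 1), ('y', 1)])
print(fault_simulate([(1, 1, 1), (0, 1, 1)]))   # (0.75, [('b', 1), ('c', 1)])
```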
55. Goal of Design for Testability (DFT)
- Improve
  - Controllability
  - Observability
  - Predictability
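56. Scan Storage Cell
[Figure: scan storage cell (SSC): the N/T (normal/test) control selects between the data input D and the scan input Si for a flip-flop clocked by Clk, whose output serves as both Q and the scan output So; shown next to the SSC block symbol with D, Si, N/T and Clk inputs and output Q.]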
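57. Integrated Serial Scan
[Figure: combinational logic between primary inputs (PI) and primary outputs (PO); its state flip-flops are scan flip-flops (SFF) chained from SCANIN to SCANOUT, with a Control signal selecting normal or scan mode.]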
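Test application through a scan chain follows a shift-capture-shift pattern. The Python sketch below models the SFF chain as a list whose element 0 is fed from SCANIN and whose last element drives SCANOUT; the `capture` step simply inverts every bit as a stand-in for the combinational logic, so both the helper names and that logic are assumptions.

```python
def scan_shift(chain, scan_in_bits):
    """Shift bits in at SCANIN; return (new chain, bits observed at SCANOUT)."""
    chain = list(chain)
    scan_out = []
    for bit in scan_in_bits:
        scan_out.append(chain[-1])       # last SFF drives SCANOUT
        chain = [bit] + chain[:-1]       # every SFF loads its predecessor
    return chain, scan_out


def capture(chain):
    """Normal-mode clock: SFFs load the combinational response (here: inversion)."""
    return [1 - b for b in chain]


chain, _ = scan_shift([0, 0, 0], [1, 1, 0])   # scan mode: load a test pattern
print(chain)                                  # [0, 1, 1]
chain = capture(chain)                        # normal mode: one capture cycle
_, response = scan_shift(chain, [0, 0, 0])    # scan mode: unload the response
print(response)                               # [0, 0, 1]
```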
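58. Interconnect Timing Optimization
59. Buffers Reduce Wire Delay
- Wire of length x (resistance r and capacitance c per unit length) driven through resistance R into a load C:
  - $t_{unbuf} = R(cx + C) + rx(cx/2 + C)$
- With a buffer (intrinsic delay $t_b$, output resistance R, input capacitance C) inserted at the midpoint:
  - $t_{buf} = 2R(cx/2 + C) + rx(cx/4 + C) + t_b$
- $t_{buf} - t_{unbuf} = RC + t_b - rcx^2/4$, so buffering reduces the delay once $rcx^2/4 > RC + t_b$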
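The two delay expressions can be evaluated directly to see where buffering starts to pay off: setting RC + t_b - rcx²/4 to zero gives a break-even length x = 2·sqrt((RC + t_b)/(rc)). The Python sketch below assumes, as the formulas do, a midpoint buffer with the same output resistance R and input capacitance C as the driver and sink; the unit values are arbitrary.

```python
import math


def t_unbuf(R, C, r, c, x):
    """Driver R drives a wire of length x (r, c per unit length) into load C."""
    return R * (c * x + C) + r * x * (c * x / 2 + C)


def t_buf(R, C, r, c, x, tb):
    """Same net with a buffer (delay tb, resistance R, input cap C) at x/2."""
    return 2 * R * (c * x / 2 + C) + r * x * (c * x / 4 + C) + tb


def break_even_length(R, C, r, c, tb):
    """Length at which t_buf == t_unbuf; longer wires benefit from the buffer."""
    return 2 * math.sqrt((R * C + tb) / (r * c))


R = C = r = c = tb = 1.0
print(t_unbuf(R, C, r, c, 1), t_buf(R, C, r, c, 1, tb))   # 3.5 5.25
print(t_unbuf(R, C, r, c, 4), t_buf(R, C, r, c, 4, tb))   # 17.0 15.0
print(round(break_even_length(R, C, r, c, tb), 3))       # 2.828
```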
60. Buffers Improve Slack
- RAT: Required Arrival Time; Slack = RAT - Delay
- Without the buffer: RAT = 300, Delay = 350, Slack = -50; RAT = 700, Delay = 600, Slack = 100; $slack_{min} = -50$
- With a buffer that decouples the capacitive load from the critical path: RAT = 300, Delay = 250, Slack = 50; RAT = 700, Delay = 400, Slack = 300; $slack_{min} = 50$
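The slack bookkeeping in this example is simple enough to reproduce directly. The sketch assumes each sink is given as a `(RAT, delay)` pair in the same time unit; the helper name `slacks` is hypothetical.

```python
def slacks(sinks):
    """Slack = RAT - Delay for each (rat, delay) pair."""
    return [rat - delay for rat, delay in sinks]


before = [(300, 350), (700, 600)]   # unbuffered net
after = [(300, 250), (700, 400)]    # load decoupled by a buffer
print(slacks(before), min(slacks(before)))   # [-50, 100] -50
print(slacks(after), min(slacks(after)))     # [50, 300] 50
```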
61. Slew Constraints
- When a buffer is inserted, assume an ideal slew rate at its input
- Check the slew rate at downstream buffers/sinks
- If the slew is too large, the candidate is discarded
62. Cost-Slack Trade-off
63. Wire Sizing: Monotone Property
- Ancestor edges cannot be narrower than downstream edges
64. Area or Radius?
- Radius: the longest source-sink path length
- Dijkstra's shortest path tree
  - Short paths to sinks
  - Large total wire length
- Prim's minimum spanning tree
  - Small total wire length
  - Long paths to sinks
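The contrast can be reproduced on a toy instance. The Python sketch below places a source at the origin and five sinks on a diagonal, all at Manhattan distance 4 from the source. On a complete graph with a metric distance, Dijkstra's shortest-path tree reduces to a star from the source, so the sketch builds it directly, while Prim's algorithm builds the minimum spanning tree. Point coordinates and helper names are illustrative assumptions.

```python
def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def prim_mst(points):
    """Prim's MST on the complete graph, rooted at points[0]; returns parents."""
    n = len(points)
    in_tree = [False] * n
    best = [float("inf")] * n
    parent = [None] * n
    best[0] = 0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        for v in range(n):
            d = manhattan(points[u], points[v])
            if not in_tree[v] and d < best[v]:
                best[v], parent[v] = d, u
    return parent


def star_spt(points):
    """Shortest-path tree: with metric distances every sink hangs off the source."""
    return [None] + [0] * (len(points) - 1)


def tree_cost(points, parent):
    """Return (total wire length, radius = longest source-to-sink path)."""
    def path_len(i):
        length = 0
        while parent[i] is not None:
            length += manhattan(points[i], points[parent[i]])
            i = parent[i]
        return length

    total = sum(manhattan(points[i], points[p])
                for i, p in enumerate(parent) if p is not None)
    return total, max(path_len(i) for i in range(len(points)))


pts = [(0, 0), (4, 0), (3, 1), (2, 2), (1, 3), (0, 4)]   # source first
print("SPT:", tree_cost(pts, star_spt(pts)))   # SPT: (20, 4)
print("MST:", tree_cost(pts, prim_mst(pts)))   # MST: (12, 12)
```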
65. Area-Radius Trade-off
- Find a solution in the middle
  - Not too much area
  - Not too long a radius
- How do we find an ideal point?
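66. Gate Characteristics
67. I-V Characteristics
- Cutoff region
  - $V_{gs} < V_t$
  - $I_{ds} = 0$
- Linear region
  - $V_{gs} > V_t$, $0 < V_{ds} < V_{gs} - V_t$
  - $I_{ds} = \beta\left[(V_{gs} - V_t)V_{ds} - V_{ds}^2/2\right]$
- Saturation region
  - $V_{gs} > V_t$, $0 < V_{gs} - V_t < V_{ds}$
  - $I_{ds} = \beta\,(V_{gs} - V_t)^2/2$
- $\beta \propto W/L$
[Figure: nMOS transistor with drain d, gate g and source s, and its I_ds versus V_ds curves.]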
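The three regions translate directly into a piecewise function. This Python sketch uses the simple square-law model above for an nMOS device with V_ds >= 0 (no channel-length modulation or velocity saturation); the name `ids` and the sample values are illustrative.

```python
def ids(vgs, vds, vt, beta):
    """Square-law drain current of an nMOS transistor (vds >= 0)."""
    vov = vgs - vt                                  # overdrive voltage
    if vov <= 0:
        return 0.0                                  # cutoff
    if vds < vov:
        return beta * (vov * vds - vds ** 2 / 2)    # linear
    return beta * vov ** 2 / 2                      # saturation


beta, vt, vgs = 2.0, 0.5, 1.5                       # overdrive of 1.0
print([ids(vgs, vds, vt, beta) for vds in (0.0, 0.5, 1.0, 2.0)])
# [0.0, 0.75, 1.0, 1.0]
print(ids(0.3, 1.0, vt, beta))                      # 0.0 (cutoff)
```

The two expressions meet at V_ds = V_gs - V_t, so the current is continuous at the boundary between the linear and saturation regions.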
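68. Falling Time
- Falling time $t_f = t_1 + t_2$
  - $t_1$: $V_{out}$ drops from $0.9V_{dd}$ to $V_{dd} - V_t$
  - $t_2$: $V_{out}$ drops from $V_{dd} - V_t$ to $0.1V_{dd}$
- Falling time ≈ rising time ≈ $k\,C/(\beta V_{dd})$ (for matched nMOS and pMOS β)
- Delay ≈ falling time / 2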
69. Gate Power Dissipation
- Leakage power
- Dynamic power
- Short-circuit power
70. Leakage Power
- Static
- Leakage current increases with Vdd
- Leakage current increases as Vt decreases
- A major threat to CMOS technology
[Figure: leakage current through the off transistor of a CMOS inverter, for output high and output low.]
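71. Dynamic Power
- Occurs at each switching event
- $P_d = C_L V_{dd}^2 f_p$
  - $f_p$: switching frequency
[Figure: charging and discharging of the load capacitance through the pull-up and pull-down transistors of a CMOS inverter.]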
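72. Short-Circuit Power
- During switching, there is a short moment when both the PMOS and NMOS transistors are partially on
- $P_s = Q\,(V_{dd} - V_t)^3\, t_r\, f_p$
  - $t_r$: rising time
[Figure: CMOS inverter with input falling and input rising, each showing the transient current path from Vdd to ground.]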
73. Low Power Design
74. Clock Gating
- Gate off the clock to idle functional units
  - e.g., floating-point units
- Needs logic to generate the disable signal
  - Increases the complexity of the control logic
  - Consumes power
  - Timing is critical to avoid clock glitches at the OR gate output
- Additional gate delay on the clock signal
  - The gating OR gate can replace a buffer in the clock distribution tree
75. Active Power Reduction: Supply Voltage Reduction
- Dynamic
  - Adjust the operating voltage and frequency to the performance requirements
    - High performance: high Vdd and frequency
    - Power saving: low Vdd and frequency
  - Pros
    - Doesn't limit performance
  - Cons
    - The penalty of transitions between power states can be high (in performance and power)
    - Additional control logic
- Static
  - Pros
    - Always active in saving
  - Cons
    - Additional power delivery network
    - Needs special care at the interface between power domains
    - Signals close to Vt cause excessive leakage and reduced noise margins
76. Dynamic Frequency and Voltage Scaling
- Always run at the lowest supply voltage that meets the timing constraints
- DFS (dynamic frequency scaling) saves only power
- DVS (dynamic voltage scaling) + DFS saves both energy and power
- A DVS/DFS system requires the following
  - A programmable clock generator (PLL)
    - PLL from 200 MHz to 700 MHz in increments of 33 MHz
  - A supply regulation loop that sets the minimum VDD necessary for operation at the desired frequency
    - 32 levels of VDD from 1.1 V to 1.6 V
  - An operating system that sets the required frequency and supply voltage to meet the task completion deadlines
    - Heavier load: ramp up VDD, and speed up the clock once VDD is stable
    - Lighter load: slow down the clock, and ramp down VDD once the PLL locks onto the new rate
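The distinction between saving power and saving energy follows from P_d = C_L·Vdd²·f_p: running a fixed task at a lower frequency lowers power but leaves the energy per cycle, C_L·Vdd², unchanged, while lowering Vdd cuts energy quadratically. The Python sketch below uses the frequency and voltage endpoints quoted on this slide; the effective switched capacitance and the cycle count are made-up values.

```python
def dynamic_power(c_eff, vdd, f):
    """P_d = C * Vdd^2 * f."""
    return c_eff * vdd ** 2 * f


def task_energy(c_eff, vdd, f, cycles):
    """Energy for a task of `cycles` clock cycles: power times run time."""
    return dynamic_power(c_eff, vdd, f) * (cycles / f)


C_EFF, CYCLES = 1e-9, 1e6          # illustrative: 1 nF switched, 1M cycles

fast = task_energy(C_EFF, 1.6, 700e6, CYCLES)   # full speed
dfs = task_energy(C_EFF, 1.6, 200e6, CYCLES)    # DFS only: lower f, same Vdd
dvs = task_energy(C_EFF, 1.1, 200e6, CYCLES)    # DVS + DFS: lower f and Vdd

print(dynamic_power(C_EFF, 1.6, 200e6) < dynamic_power(C_EFF, 1.6, 700e6))  # True
print(round(dfs / fast, 6))   # 1.0    (no energy saved)
print(round(dvs / fast, 4))   # 0.4727 (= (1.1 / 1.6) ** 2)
```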
77. Design with Dual Vth
[Figure: dual-Vth evaluation.]
- Dual-Vth design
  - Two flavors of transistors: slow high-Vth and fast low-Vth
  - Low-Vth transistors are faster but have about 10X the leakage
78. Power Gating Using Sleep Transistors
- Leakage can also be reduced by gating the supply rails when the circuit is in sleep mode
  - In normal mode, sleep = 0, and the sleep transistors must present as small a resistance as possible (via sizing)
  - In sleep mode, sleep = 1, and the transistor stack effect reduces leakage by orders of magnitude
- Leakage can also be eliminated by switching off the power supply (but the memory state is lost)