Delay Model and Simulation - PowerPoint PPT Presentation

Slides: 79
Provided by: Jian93
1
Delay Model and Simulation
2
Simulation with Delay
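[Figure: gate-level circuit with signals A, B, C, and D and gate delays of 3 and 2 time units. The event trace runs over tsim = 0 to 50, starting from A = B = C = D = x; input events on A and B (A = 1, B = 0, B = 1, A = 0, B = 0) produce delayed events on C and D.]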
3
Delay Models
  • Gate delay
      • Intrinsic delay
      • Layout-induced delay due to capacitive load
      • Waveform slope-induced delay
  • Net delay/transport delay
      • Signal propagation delay along interconnect wires
  • Module path delay
      • Delay between input port and output port

4
Inertial Delay
  • Delay is caused by charging and discharging node
    capacitors in circuit
  • Gate delay and wire delay
  • Pulse rejection
  • If the pulse width is less than the delay, the pulse is
    ignored

5
Gate Delay
  • and (yout, x1, x2); // default, zero gate delay
  • and #3 (yout, x1, x2); // 3 units delay for all transitions
  • and #(2,3) G1(yout, x1, x2); // rising, falling delay
  • and #(2,3) G1(yout, x1, x2), G2(yout2, x3, x4); // multiple instances
  • a_buffer #(3,5,2) (yout, x); // UDP, rise, fall, turnoff
  • bufif1 #(3:4:5, 6:7:9, 5:7:8) (yout, xin, enable); // min:typ:max / rise, fall, turnoff
  • Simulators simulate with only one of min, typ and
    max delay values
  • Selection is made through compiler directives or
    user interfaces
  • Default delay is typ delay

6
Gate and Wire Model
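[Figure: a gate modeled by a driver resistance R and a capacitance C; a wire of length L modeled as a pi-section with series resistance rL and a capacitance cL/2 at each end, where r is the resistance per unit length and c is the capacitance per unit length]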
7
Example of Model
8
Delay Estimation
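[Figure: RC tree. Driver resistance R feeds node 0 (capacitance C0); R1 connects node 0 to node 1 (C1); R2 and R3 connect node 1 to node 2 (C2) and node 3 (C3).]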
  • D0 = R ( C0 + C1 + C2 + C3 )
  • D1 = D0 + R1 ( C1 + C2 + C3 )
  • D2 = D1 + R2 C2
  • D3 = D1 + R3 C3
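
The delay expressions on this slide follow one pattern: each edge adds its resistance times all the capacitance downstream of it. A minimal Python sketch of that rule is given here, assuming the four-node tree shown on the slide; the function name `elmore_delays` and all component values are hypothetical and chosen only for illustration.

```python
def elmore_delays(parent, res, cap):
    """Return the delay to every node of an RC tree.

    parent[n] is the upstream node of n (None for the node fed by the driver),
    res[n] is the resistance of the edge feeding n, cap[n] is the capacitance at n.
    """
    children = {n: [] for n in parent}
    for n, p in parent.items():
        if p is not None:
            children[p].append(n)

    def downstream_cap(n):
        # Capacitance at n plus everything below it in the tree.
        return cap[n] + sum(downstream_cap(c) for c in children[n])

    delays = {}

    def visit(n, upstream_delay):
        delays[n] = upstream_delay + res[n] * downstream_cap(n)
        for c in children[n]:
            visit(c, delays[n])

    root = next(n for n, p in parent.items() if p is None)
    visit(root, 0.0)
    return delays


# Hypothetical values: node 0 is fed by R, node 1 by R1, nodes 2 and 3 branch from node 1.
parent = {0: None, 1: 0, 2: 1, 3: 1}
res = {0: 2.0, 1: 1.0, 2: 3.0, 3: 4.0}   # R, R1, R2, R3
cap = {0: 1.0, 1: 1.0, 2: 2.0, 3: 1.0}   # C0, C1, C2, C3
delays = elmore_delays(parent, res, cap)
print(delays)
```

With these values the sketch reproduces D0 = R(C0 + C1 + C2 + C3) = 10 and D2 = D1 + R2 C2 = 20.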

9
Clock Scheduling
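[Figure: flip-flops i and j separated by combinational logic with logic delay LD; the clock arrives at i at time ti and at j at time tj]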
10
Timing Constraints
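[Figure: timing diagram of the clock arrivals ti and tj, showing the hold and setup windows and the minimum and maximum logic delays LDmin and LDmax]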
  • skew_ij = ti - tj > hold_max - LD_min
  • skew_ij = ti - tj < CP - LD_max - setup_max
  • CP: clock period
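
The two constraints bound the permissible skew from below (hold) and from above (setup). A short Python sketch of that check follows; the function names and the numeric values are assumptions for illustration, not part of the lecture.

```python
def skew_window(cp, ld_min, ld_max, setup_max, hold_max):
    """Return the open interval (lower, upper) that ti - tj must fall in."""
    lower = hold_max - ld_min            # hold constraint
    upper = cp - ld_max - setup_max      # setup constraint
    return lower, upper


def skew_ok(ti, tj, cp, ld_min, ld_max, setup_max, hold_max):
    lower, upper = skew_window(cp, ld_min, ld_max, setup_max, hold_max)
    return lower < ti - tj < upper


# Hypothetical numbers, all in the same time unit.
params = dict(cp=10, ld_min=2, ld_max=7, setup_max=1, hold_max=1)
print(skew_window(**params))     # permissible skew range
print(skew_ok(1, 0, **params))   # skew of 1
print(skew_ok(3, 0, **params))   # skew of 3
```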

11
Assignment
12
Blocking and Non-blocking Assignment
  • initial
  • begin
  • a = 1;
  • b = 0;
  • a = b; // a = 0
  • b = a; // b = 0
  • end
  • initial
  • begin
  • a = 1;
  • b = 0;
  • a <= b; // a = 0
  • b <= a; // b = 1
  • end
  • Blocking assignment: =
      • Statement order matters
      • A statement has to be executed before the next statement
  • Non-blocking assignment: <=
      • Concurrent assignment
      • Normally the last assignment at a given simulation time step
      • If it triggers other blocking assignments, it is executed before the blocking assignment it triggers
      • If there are multiple non-blocking assignments to the same variable in the same behavior, the later one overwrites the earlier one
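
The difference between the two assignment kinds can be mimicked in ordinary software: blocking assignments update a variable immediately, while non-blocking assignments first sample every right-hand side and only then update. The Python sketch below is an analogy under that assumption, not a Verilog simulator; both function names are hypothetical.

```python
def run_blocking(a, b):
    # Each statement completes before the next one starts.
    a = b
    b = a
    return a, b


def run_nonblocking(a, b):
    # Phase 1: sample all right-hand sides using the old values.
    new_a = b
    new_b = a
    # Phase 2: update all left-hand sides together.
    a, b = new_a, new_b
    return a, b


print(run_blocking(1, 0))
print(run_nonblocking(1, 0))
```

Starting from a = 1 and b = 0, the first function ends with both variables at 0, while the second swaps them, matching the comments in the two initial blocks.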

13
Procedural Continuous Assignment
  • Continuous assignment establishes static binding
    for net variables
  • Procedural continuous assignment (PCA)
    establishes dynamic binding for variables
      • assign ... deassign: for register variables only
      • force ... release: for both register and net variables

14
Intra-assignment Delay: Blocking Assignment
  • // B = 0 at time 0
  • // B = 1 at time 4
  • #5 A = B; // A = 1
  • C = D;
  • A = #5 B; // A = 0
  • C = D;
  • A = @(enable) B;
  • C = D;
  • A = @(named_event) B;
  • C = D;
  • If timing control operator (#, @) is on the LHS
      • Blocking delay
      • RHS evaluated at (#, @)
      • Assignment at (#, @)
  • If timing control operator (#, @) is on the RHS
      • Intra-assignment delay
      • RHS evaluated immediately
      • Assignment at (#, @)

15
Example
initial begin a = #10 1; b = #2 0; c = #3 1; end
initial begin d <= #10 1; e <= #2 0; f <= #3 1; end
t    a  b  c  d  e  f
0    x  x  x  x  x  x
2    x  x  x  x  0  x
3    x  x  x  x  0  1
10   1  x  x  1  0  1
12   1  0  x  1  0  1
15   1  0  1  1  0  1
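
The update times in the table can be derived mechanically: blocking intra-assignment delays accumulate, because each statement waits for the previous one, while non-blocking delays are all measured from the time the block starts. The Python sketch below assumes exactly that rule; the helper names are hypothetical.

```python
def blocking_schedule(stmts):
    """stmts: list of (name, delay, value). Delays accumulate in statement order."""
    t = 0
    events = []
    for name, delay, value in stmts:
        t += delay
        events.append((t, name, value))
    return events


def nonblocking_schedule(stmts):
    """All delays are measured from time 0, so updates are independent."""
    return sorted((delay, name, value) for name, delay, value in stmts)


blocking = blocking_schedule([("a", 10, 1), ("b", 2, 0), ("c", 3, 1)])
nonblocking = nonblocking_schedule([("d", 10, 1), ("e", 2, 0), ("f", 3, 1)])
print(blocking)
print(nonblocking)
```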
16
Tell the Differences
always @ (a or b) y = a | b;
always @ (a or b) #5 y = a | b;
always @ (a or b) y = #5 a | b;
always @ (a or b) y <= #5 a | b;
Which one describes an OR gate?
Event control is blocked
17
Race Condition
  • always @ ( posedge clk ) // will c get the previous b or the new b?
  • c = b;
  • always @ ( posedge clk )
  • b = a;

18
Avoid Race Condition
  • always @ ( posedge clk ) // Solution 1: merge always
  • begin
  • c = b; b = a;
  • end
  • always @ ( posedge clk ) // Solution 2: intra-assignment delay
  • c = #1 b;
  • always @ ( posedge clk )
  • b = #1 a;
  • always @ ( posedge clk ) // Solution 3: non-blocking assignment
  • c <= b;
  • always @ ( posedge clk )
  • b <= a;

19
Finite State Machine
20
FSM Example: Speed Machine
21
Verilog Code for Speed Machine
  • // Explicit FSM style
  • module speed_machine ( clock, accelerator, brake, speed );
  • input clock, accelerator, brake;
  • output [1:0] speed;
  • reg [1:0] state, next_state;
  • parameter stopped = 2'b00;
  • parameter s_low = 2'b01;
  • parameter s_medium = 2'b10;
  • parameter s_high = 2'b11;
  • assign speed = state;
  • always @ ( posedge clock )
  • state <= next_state;
  • always @ ( state or accelerator or brake )
  • if ( brake == 1'b1 )
  • case ( state )
  • stopped: next_state <= stopped;
  • s_low: next_state <= stopped;
  • s_medium: next_state <= s_low;
  • s_high: next_state <= s_medium;
  • default: next_state <= stopped;
  • endcase
  • else if ( accelerator == 1'b1 )
  • case ( state )
  • stopped: next_state <= s_low;
  • s_low: next_state <= s_medium;
  • s_medium: next_state <= s_high;
  • s_high: next_state <= s_high;
  • default: next_state <= stopped;
  • endcase
  • else next_state <= state;
  • endmodule

22
State Encoding Example
23
State Encoding
  • A state machine with N states requires a register of at least ⌈log2 N⌉ bits to store the encoded representation of the states
  • Binary and Gray encoding use the minimum number of bits for the state register
  • Gray and Johnson code
      • Two adjacent codes differ by only one bit
      • Reduce simultaneous switching
      • Reduce crosstalk
      • Reduce glitches
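
Both claims on this slide are easy to check numerically: the minimum register width is the ceiling of log2 N, and consecutive Gray codes differ in exactly one bit. The Python sketch below uses the standard binary-reflected Gray code; the function names are hypothetical.

```python
def min_state_bits(n_states):
    """Smallest register width that can hold n_states distinct codes."""
    return max(1, (n_states - 1).bit_length())


def gray_code(i):
    """Binary-reflected Gray code of the integer i."""
    return i ^ (i >> 1)


def hamming_distance(x, y):
    return bin(x ^ y).count("1")


codes = [gray_code(i) for i in range(4)]
print(min_state_bits(4), min_state_bits(5))
print([format(c, "02b") for c in codes])
```

For a four-state machine such as the speed machine, two bits are enough; a fifth state would push the register to three bits.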

24
One-hot Encoding
  • Employ one bit register for each state
  • Less combinational logic to decode
  • Consume greater area, does not matter for certain
    hardware such as FPGA
  • Easier for design, friendly to incremental change
  • case and if statement may give different result
    for one-hot encoding
  • Runs faster
  • `define state_0 3'b001
  • `define state_1 3'b010
  • `define state_2 3'b100

25
Transistor Level Model
26
Static CMOS Circuits
  • module cmos_inverter ( out, in );
  • output out;
  • input in;
  • supply0 GND;
  • supply1 PWR;
  • pmos ( out, PWR, in );
  • nmos ( out, GND, in );
  • endmodule

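[Figure: CMOS inverter schematic between Vdd and ground with input in and output out; transistor terminals labeled drain (d), source, and gate]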
27
Pull Gates
  • module nmos_nand_2 ( Y, A, B );
  • output Y;
  • input A, B;
  • supply0 GND;
  • tri w;
  • pullup ( Y );
  • nmos ( Y, w, A );
  • nmos ( w, GND, B );
  • endmodule

[Figure: two schematics of an nMOS NAND gate with a pull-up to Vdd, inputs A and B, and output Y]
28
Assign Drive Strengths
  • nand ( pull1, strong0 ) G1( Y, A, B );
  • wire ( pull0, weak1 ) A_wire = net1 || net2;
  • assign ( pull1, weak0 ) A_net = reg_b;
  • Drive strength is specified through an unordered pair
      • one value from supply0, strong0, pull0, weak0, highz0
      • the other from supply1, strong1, pull1, weak1, highz1
  • Only scalar nets may receive strength assignment
  • When a tri0 or tri1 net is not driven, it is pulled to the indicated logic value with a strength of pull0 or pull1
  • A trireg net models a capacitive node: it holds a charge after the drivers are removed, with a charge strength of small, medium (default), or large

29
Signal Strength Levels
Strength levels, from strongest to weakest:
  • Su0 / Su1: Supply Drive
  • St0 / St1: Strong Drive
  • Pu0 / Pu1: Pull Drive
  • La0 / La1: Large Capacitor
  • We0 / We1: Weak Drive
  • Me0 / Me1: Medium Capacitor
  • Sm0 / Sm1: Small Capacitor
  • HiZ0 / HiZ1: High Impedance
  • Signal strength indicates a signal's ability to act as a logic driver and determines the resultant logic value on a net in case of
      • Signal contention between multiple drivers of a net
      • Charge distribution between nodes in a circuit
  • Default: strong drive
  • Capacitive strengths may be assigned only to trireg nets

30
Strength Reduction
  • Dependence of output strength on input strength
      • Combinational and pull gates: NO, except 3-state gates
      • Transistor switches and bi-directional gates: YES
  • In general, output strength <= input strength

31
Transistor Switch and Bi-directional Gate
  • Transistor switch
      • nmos, pmos, cmos
  • Bi-directional gate
      • tran, tranif0, tranif1
  • If input = ( supply0 or supply1 )
      • Output = ( strong0 or strong1 )
  • Otherwise
      • Output strength = input strength

32
Signal Contention: Known Strength and Known Value
  • Signal with greater strength dominates
  • Same strength, different logic values
      • wand -> and, wor -> or
      • Otherwise -> x

[Figure: driver1 (We0) and driver2 (Pu1) drive the same net; the resulting value is Pu1]
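
The contention rules above can be written as a small resolution function. The Python sketch below handles only the case this slide covers, two drivers with known strength and known value; the numeric ranking follows the strength ordering listed on the Signal Strength Levels slide, and the function and table names are hypothetical.

```python
# Higher number = stronger driver.
STRENGTH = {"Su": 7, "St": 6, "Pu": 5, "La": 4, "We": 3, "Me": 2, "Sm": 1, "HiZ": 0}


def resolve(d1, d2, net_type="wire"):
    """Resolve two drivers given as (strength, value) with value 0 or 1."""
    (s1, v1), (s2, v2) = d1, d2
    if STRENGTH[s1] > STRENGTH[s2]:
        return d1                      # stronger signal dominates
    if STRENGTH[s2] > STRENGTH[s1]:
        return d2
    if v1 == v2:
        return d1                      # same strength, same value
    if net_type == "wand":
        return (s1, v1 & v2)           # wired-and
    if net_type == "wor":
        return (s1, v1 | v2)           # wired-or
    return (s1, "x")                   # same strength, conflicting values


print(resolve(("We", 0), ("Pu", 1)))
print(resolve(("St", 0), ("St", 1)))
print(resolve(("St", 0), ("St", 1), net_type="wand"))
```

The first call corresponds to the We0 and Pu1 example in the figure. Ambiguous strengths and unknown values are deliberately left out of this sketch.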
33
Synthesis
34
Unexpected and Unwanted Latch
  • Combinational logic must specify output value for
    all input values
  • Incomplete case statements and conditionals (if) imply that
      • the output should retain its value for unspecified input values
      • unwanted latches are inferred

35
Example of Unwanted Latch
  • module myMux( y, selA, selB, a, b );
  • input selA, selB, a, b;
  • output y;
  • reg y;
  • always @ ( selA or selB or a or b )
  • case ( {selA, selB} )
  • 2'b10: y = a;
  • 2'b01: y = b;
  • endcase
  • endmodule

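[Figure: synthesized circuit in which selA and selB select between a and b and also generate the enable (en) of a latch that drives y]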
36
Synthesis of case and if
  • case and if statement imply priority
  • Synthesis tool will determine if case items of a
    case statement are mutually exclusive
  • If so, synthesis will treat them with same
    priority and synthesize a mux
  • A synthesis tool will treat casex and casez same
    as case
  • x and z will be treated as don't-cares
  • Post-synthesis simulation result may be different
    from pre-synthesis simulation

37
Example of if and case
  • input [3:0] data;
  • output [1:0] code;
  • reg [1:0] code;
  • always @(data)
  • begin // implicit priority
  • if ( data[3] ) code = 3;
  • else if ( data[2] ) code = 2;
  • else if ( data[1] ) code = 1;
  • else if ( data[0] ) code = 0;
  • else code = 2'bx;
  • end
  • input [3:0] data;
  • output [1:0] code;
  • reg [1:0] code;
  • always @(data)
  • case (data)
  • 4'b1000: code = 3;
  • 4'b0100: code = 2;
  • 4'b0010: code = 1;
  • 4'b0001: code = 0;
  • default: code = 2'bx;
  • endcase
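
The two descriptions agree on one-hot inputs but not on inputs with several bits set, which is where the implicit priority of the if chain shows. The Python sketch below models both behaviorally; `None` stands in for the unknown value 2'bx, and the function names are hypothetical.

```python
def encode_if(data):
    """Priority encoder: the highest set bit wins."""
    if data & 0b1000:
        return 3
    if data & 0b0100:
        return 2
    if data & 0b0010:
        return 1
    if data & 0b0001:
        return 0
    return None  # models 2'bx


def encode_case(data):
    """Exact-match decoder: only one-hot inputs are recognized."""
    return {0b1000: 3, 0b0100: 2, 0b0010: 1, 0b0001: 0}.get(data)


one_hot = [0b1000, 0b0100, 0b0010, 0b0001]
print([encode_if(d) == encode_case(d) for d in one_hot])
print(encode_if(0b1010), encode_case(0b1010))
```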

38
Synthesis of Register Variables
  • A hardware register will be generated for a
    register variable when
  • It is referenced before value is assigned in a
    behavior
  • Assigned value in an edge-sensitive behavior and
    is referenced by an assignment outside the
    behavior
  • Assigned value in one clock cycle and referenced
    in another clock cycle
  • Multi-phased latches may not be supported in
    synthesis

39
Synthesis of Arithmetic Operators
  • If corresponding library cell exists, an operator
    will be directly mapped to it
  • The synthesis tool may select among different options in the cell library, for example, when synthesizing an adder
      • Small wordlength -> ripple-carry adder
      • Long wordlength -> carry-look-ahead adder
      • Need small area -> bit-serial adder
  • Implementation of * and /
      • May be inefficient when both operands are variables
      • If the multiplier or the divisor is a power of two, the operation can be implemented through a shift register

40
Static Loops without Internal Timing Controls => Combinational Logic
  • module count1sA ( bit_cnt, data, clk, rst );
  • parameter data_width = 4; parameter cnt_width = 3;
  • output [cnt_width-1:0] bit_cnt;
  • input [data_width-1:0] data; input clk, rst;
  • reg [cnt_width-1:0] cnt, bit_cnt, i; reg [data_width-1:0] tmp;
  • always @ ( posedge clk )
  • if ( rst ) begin cnt = 0; bit_cnt = 0; end
  • else begin cnt = 0; tmp = data;
  • for ( i = 0; i < data_width; i = i + 1 )
  • begin
  • if ( tmp[0] ) cnt = cnt + 1;
  • tmp = tmp >> 1; end
  • bit_cnt = cnt;
  • end
  • endmodule

41
Static Loops with Internal Timing Controls => Sequential Logic
  • module count1sB ( bit_cnt, data, clk, rst );
  • parameter data_width = 4; parameter cnt_width = 3;
  • output [cnt_width-1:0] bit_cnt;
  • input [data_width-1:0] data; input clk, rst;
  • reg [cnt_width-1:0] cnt, bit_cnt, i; reg [data_width-1:0] tmp;
  • always @ ( posedge clk )
  • if ( rst ) begin cnt = 0; bit_cnt = 0; end
  • else begin
  • cnt = 0; tmp = data;
  • for ( i = 0; i < data_width; i = i + 1 )
  • @ ( posedge clk )
  • begin if ( tmp[0] ) cnt = cnt + 1;
  • tmp = tmp >> 1; end
  • bit_cnt = cnt;
  • end
  • endmodule

42
Non-Static Loops without Internal Timing Controls => Not Synthesizable
  • module count1sC ( bit_cnt, data, clk, rst );
  • parameter data_width = 4; parameter cnt_width = 3;
  • output [cnt_width-1:0] bit_cnt;
  • input [data_width-1:0] data; input clk, rst;
  • reg [cnt_width-1:0] cnt, bit_cnt, i; reg [data_width-1:0] tmp;
  • always @ ( posedge clk )
  • if ( rst ) begin cnt = 0; bit_cnt = 0; end
  • else begin
  • cnt = 0; tmp = data;
  • for ( i = 0; tmp; i = i + 1 )
  • begin if ( tmp[0] ) cnt = cnt + 1;
  • tmp = tmp >> 1; end
  • bit_cnt = cnt;
  • end
  • endmodule
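
What separates a static loop from a non-static one is whether the iteration count is known before the data arrives. The Python sketch below mirrors the two loop styles and reports how many iterations each one runs; it is a behavioral analogy with hypothetical function names, not a statement about any particular synthesis tool.

```python
def count_ones_static(data, width=4):
    """Loop bound fixed by width, as in the for loop over data_width."""
    cnt, tmp = 0, data
    for _ in range(width):
        cnt += tmp & 1
        tmp >>= 1
    return cnt, width            # iteration count never depends on data


def count_ones_data_dependent(data):
    """Loop runs until tmp is zero, as in the loop whose condition is tmp."""
    cnt, tmp, iterations = 0, data, 0
    while tmp:
        cnt += tmp & 1
        tmp >>= 1
        iterations += 1
    return cnt, iterations       # iteration count depends on data


print(count_ones_static(0b0101), count_ones_data_dependent(0b0101))
print(count_ones_static(0b0001), count_ones_data_dependent(0b0001))
```

Both versions return the same count, but only the first has an iteration count that can be fixed ahead of time and unrolled.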

43
Non-Static Loops with Internal Timing Controls => Sequential Logic
  • module count1sD ( bit_cnt, data, clk, rst );
  • parameter data_width = 4; parameter cnt_width = 3;
  • output [cnt_width-1:0] bit_cnt;
  • input [data_width-1:0] data; input clk, rst;
  • reg [cnt_width-1:0] cnt, bit_cnt, i; reg [data_width-1:0] tmp;
  • always @ ( posedge clk )
  • if ( rst ) begin cnt = 0; bit_cnt = 0; end
  • else begin: bit_counter
  • cnt = 0; tmp = data;
  • while ( tmp )
  • @ ( posedge clk ) begin
  • if ( rst ) begin cnt = 0; disable bit_counter; end
  • else begin cnt = cnt + tmp[0]; tmp = tmp >> 1; end
  • bit_cnt = cnt;
  • end
  • end
  • endmodule

44
VHDL
45
Example
  • -- eqcomp4 is a four bit equality comparator
  • -- Entity declaration
  • entity eqcomp4 is
  • port ( a, b : in bit_vector( 3 downto 0 );
  • equals : out bit ); -- equals is active high
  • end eqcomp4;
  • -- Architecture body
  • architecture dataflow of eqcomp4 is
  • begin
  • equals <= '1' when ( a = b ) else '0';
  • end dataflow;

46
Behavioral Descriptions
  • library ieee;
  • use ieee.std_logic_1164.all;
  • entity eqcomp4 is port (
  • a, b : in std_logic_vector( 3 downto 0 );
  • equals : out std_logic );
  • end eqcomp4;
  • architecture behavioral of eqcomp4 is
  • begin
  • comp: process ( a, b ) -- sensitivity list
  • begin
  • if a = b then equals <= '1';
  • else equals <= '0'; -- sequential assignment
  • end if;
  • end process comp;
  • end behavioral;

47
Dataflow Descriptions
  • library ieee;
  • use ieee.std_logic_1164.all;
  • entity eqcomp4 is port (
  • a, b : in std_logic_vector( 3 downto 0 );
  • equals : out std_logic );
  • end eqcomp4;
  • architecture dataflow of eqcomp4 is
  • begin
  • equals <= '1' when ( a = b ) else '0';
  • end dataflow;
  • -- No process
  • -- Concurrent assignment

48
Structural Descriptions
  • library ieee;
  • use ieee.std_logic_1164.all;
  • entity eqcomp4 is port (
  • a, b : in std_logic_vector( 3 downto 0 );
  • equals : out std_logic );
  • end eqcomp4;
  • use work.gatespkg.all;
  • architecture struct of eqcomp4 is
  • signal x : std_logic_vector( 0 to 3 );
  • begin
  • u0: xnor2 port map ( a(0), b(0), x(0) ); -- component instantiation
  • u1: xnor2 port map ( a(1), b(1), x(1) );
  • u2: xnor2 port map ( a(2), b(2), x(2) );
  • u3: xnor2 port map ( a(3), b(3), x(3) );
  • u4: and4 port map ( x(0), x(1), x(2), x(3), equals );
  • end struct;

49
Test and Design For Testability
50
Single Stuck-at Fault
  • Three properties define a single stuck-at fault
  • Only one line is faulty
  • The faulty line is permanently set to 0 or 1
  • The fault can be at an input or output of a gate
  • Example XOR circuit has 12 fault sites ( ) and
    24 single stuck-at faults

[Figure: XOR circuit with lines a through k and output z, annotated with good-circuit values and, in parentheses, faulty-circuit values; it shows a test vector for the h s-a-0 fault]
51
Stuck-Open Example
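Vector 1: test for A s-a-0 (initialization vector)
Vector 2: test for A s-a-1
A two-vector stuck-open (s-op) test can be constructed by ordering two stuck-at (s-at) tests
[Figure: CMOS gate with pMOS FETs connected to VDD and nMOS FETs below, inputs A and B, output C, and one stuck-open transistor. The applied vectors are A = 1, 0 and B = 0, 0; the output C is 0 and then 1 in the good circuit (Z in the faulty circuit).]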
52
Stuck-Short Example
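Test vector for A s-a-0
[Figure: CMOS gate with PFETs connected to VDD and NFETs below, inputs A = 1 and B = 0, output C, and one stuck-short transistor. The good-circuit state of C is 0 and the faulty-circuit state is X, with an IDDQ path in the faulty circuit.]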
53
Test Pattern for Stuck-At Faults
[Figure: gate with inputs a, b, c and a stuck-at-1 (SA1) fault on input a]
Ygood = (a · b · c)
Ya-SA1 = (b · c)
No need to enumerate all input combinations to detect a fault
Test pattern: a,b,c = 011
54
Fault Simulation
  • Fault simulation problem. Given:
      • A circuit
      • A sequence of test vectors
      • A fault model
  • Determine
      • Fault coverage: the fraction (or percentage) of modeled faults detected by the test vectors
      • Set of undetected faults
  • Motivation
      • Determine test quality and in turn product quality
      • Find undetected fault targets to improve tests
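
The problem statement above maps directly onto a brute-force procedure: simulate the good circuit and every faulty circuit on each vector, and count the faults whose output differs. The Python sketch below does this for a single three-input AND gate under the single stuck-at model; the circuit choice and all names are assumptions made for illustration.

```python
def and3(a, b, c, fault=None):
    """Three-input AND. fault is None or (line, stuck_value), line in 'a', 'b', 'c', 'y'."""
    vals = {"a": a, "b": b, "c": c}
    if fault is not None and fault[0] in vals:
        vals[fault[0]] = fault[1]          # input line stuck at a value
    y = vals["a"] & vals["b"] & vals["c"]
    if fault is not None and fault[0] == "y":
        y = fault[1]                       # output line stuck at a value
    return y


def fault_coverage(vectors):
    """Return (coverage fraction, list of undetected faults)."""
    faults = [(line, v) for line in "abcy" for v in (0, 1)]
    detected = {f for f in faults for vec in vectors
                if and3(*vec) != and3(*vec, fault=f)}
    undetected = [f for f in faults if f not in detected]
    return len(detected) / len(faults), undetected


print(fault_coverage([(0, 1, 1)]))
print(fault_coverage([(1, 1, 1), (0, 1, 1), (1, 0, 1), (1, 1, 0)]))
```

The single vector 011 detects the stuck-at-1 fault on input a, and four vectors out of the eight possible are enough to cover all eight modeled faults of this gate.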

55
Goal of Design for Testability (DFT)
  • Improve
  • Controllability
  • Observability
  • Predictability

56
Scan Storage Cell
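[Figure: scan storage cell (SSC) with data input D, scan input Si, mode select N/T, clock Clk, and output Q, which also serves as scan output So]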
57
Integrated Serial Scan
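[Figure: integrated serial scan. Combinational logic with inputs PI and outputs PO; scan flip-flops (SFF) are chained from SCANIN to SCANOUT and share a Control signal.]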
58
Interconnect Timing Optimization
59
Buffers Reduce Wire Delay
t_unbuf = R( cx + C ) + rx( cx/2 + C )
t_buf = 2R( cx/2 + C ) + rx( cx/4 + C ) + tb
t_buf - t_unbuf = RC + tb - rcx^2/4
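
Because the saving grows with the square of the wire length x while the buffer cost RC + tb is fixed, there is a break-even length beyond which the buffer helps. The Python sketch below evaluates the three expressions from this slide; the function names and the numeric values are hypothetical.

```python
import math


def t_unbuf(R, C, r, c, x):
    return R * (c * x + C) + r * x * (c * x / 2 + C)


def t_buf(R, C, r, c, x, tb):
    return 2 * R * (c * x / 2 + C) + r * x * (c * x / 4 + C) + tb


def break_even_length(R, C, r, c, tb):
    """Wire length at which t_buf equals t_unbuf, from RC + tb = r c x^2 / 4."""
    return 2 * math.sqrt((R * C + tb) / (r * c))


# Hypothetical unit values.
R, C, r, c, tb, x = 1.0, 1.0, 1.0, 1.0, 1.0, 4.0
difference = t_buf(R, C, r, c, x, tb) - t_unbuf(R, C, r, c, x)
print(t_unbuf(R, C, r, c, x), t_buf(R, C, r, c, x, tb), difference)
print(break_even_length(R, C, r, c, tb))
```

With these values the difference is negative, so the buffered wire is faster; a wire shorter than the break-even length would be slowed down by the buffer instead.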
60
Buffers Improve Slack
RAT: Required Arrival Time; Slack = RAT - Delay
Without buffer:
  • RAT = 300, Delay = 350, Slack = -50
  • RAT = 700, Delay = 600, Slack = 100
  • slack_min = -50
With buffer (decouples the capacitive load from the critical path):
  • RAT = 300, Delay = 250, Slack = 50
  • RAT = 700, Delay = 400, Slack = 300
  • slack_min = 50
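
The slack figures on this slide come from one subtraction per sink followed by a minimum over the sinks. A short Python sketch using the slide's own numbers is shown here; the function names are hypothetical, and the labeling of the two cases as unbuffered and buffered is an assumption based on the slide title.

```python
def slack(rat, delay):
    return rat - delay


def worst_slack(sinks):
    """sinks: list of (RAT, delay) pairs; the net is limited by its worst sink."""
    return min(slack(rat, delay) for rat, delay in sinks)


unbuffered = [(300, 350), (700, 600)]
buffered = [(300, 250), (700, 400)]
print(worst_slack(unbuffered), worst_slack(buffered))
```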
61
Slew Constraints
  • When a buffer is inserted, assume ideal slew rate
    at its input
  • Check slew rate at downstream buffers/sinks
  • If slew is too large, candidate is discarded

62
Cost-Slack Trade-off
63
Wire Sizing: Monotone Property
  • Ancestor edges cannot be narrower than downstream
    edges

64
Area or Radius?
Radius: the longest source-sink path length
  • Dijkstra's shortest path tree
      • Short path to sinks
      • Large total wire length
  • Prim's minimum spanning tree
      • Small total wire length
      • Long path to sinks
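
The contrast between the two trees can be reproduced on a handful of points. The Python sketch below grows a tree over the complete Euclidean graph with either Dijkstra's key (path length from the source) or Prim's key (length of the connecting edge) and reports total wire length and radius; the point set and the function names are hypothetical.

```python
import heapq
import math


def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def build_tree(points, source, use_path_length):
    """Return (total wire length, radius) of the tree grown from source.

    use_path_length=True  selects by path length from the source (Dijkstra).
    use_path_length=False selects by connecting edge length (Prim).
    """
    path_len = {source: 0.0}
    total = 0.0
    heap = [(dist(points[source], points[v]), source, v)
            for v in range(len(points)) if v != source]
    heapq.heapify(heap)
    while len(path_len) < len(points):
        _, u, v = heapq.heappop(heap)
        if v in path_len:
            continue                       # stale entry, v already connected
        edge = dist(points[u], points[v])
        path_len[v] = path_len[u] + edge
        total += edge
        for w in range(len(points)):
            if w not in path_len:
                d = dist(points[v], points[w])
                key = path_len[v] + d if use_path_length else d
                heapq.heappush(heap, (key, v, w))
    return total, max(path_len.values())


# Source at the origin, three sinks on the corners of a square.
points = [(0, 0), (3, 0), (3, 3), (0, 3)]
spt_total, spt_radius = build_tree(points, 0, use_path_length=True)
mst_total, mst_radius = build_tree(points, 0, use_path_length=False)
print(spt_total, spt_radius)
print(mst_total, mst_radius)
```

On this example the shortest path tree has the smaller radius and the spanning tree has the smaller total wire length, which is the trade-off the slide describes.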

65
Area-Radius Trade-off
  • Find a solution in middle
  • Not too much area
  • Not too long radius
  • How to find an ideal point?

66
Gate Characteristics
67
I-V Characteristics
  • Cutoff region
      • Vgs < Vt
      • Ids = 0
  • Linear region
      • Vgs > Vt, 0 < Vds < Vgs - Vt
      • Ids = B[(Vgs - Vt)Vds - Vds^2/2]
  • Saturation region
      • Vgs > Vt, 0 < Vgs - Vt < Vds
      • Ids = B(Vgs - Vt)^2/2
  • B ∝ W/L
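
The three regions form one piecewise function of Vgs and Vds that is continuous at the boundary Vds = Vgs - Vt. A minimal Python sketch of that function follows; the name `ids` and the numeric values are hypothetical.

```python
def ids(vgs, vds, vt, beta):
    """Drain current from the three-region model (beta is the slide's B)."""
    if vgs <= vt:
        return 0.0                                   # cutoff
    vov = vgs - vt
    if vds < vov:
        return beta * (vov * vds - vds ** 2 / 2)     # linear region
    return beta * vov ** 2 / 2                       # saturation region


print(ids(0.3, 1.0, 0.5, 2.0))   # cutoff
print(ids(1.5, 0.5, 0.5, 2.0))   # linear
print(ids(1.5, 2.0, 0.5, 2.0))   # saturation
```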

[Figure: MOS transistor with terminals d, g, and s, and a plot of Ids versus Vds]
68
Falling Time
  • Falling time = t1 + t2
      • t1: Vout drops from 0.9Vdd to Vdd - Vt
      • t2: Vout drops from Vdd - Vt to 0.1Vdd
  • Falling time ≈ rising time ≈ k C / (B Vdd)
  • Delay ≈ Falling time / 2

69
Gate Power Dissipation
  • Leakage power
  • Dynamic power
  • Short circuit power

70
Leakage Power
  • Static
  • Leakage current a ? Vdd
  • Leakage current b/Vt
  • Killer to CMOS technology

[Figure: two inverter circuits between Vdd and out showing leakage paths; labels: Leakage, Linear, Saturation]
71
Dynamic Power
  • Occurs at each switching
  • Pd = CL · Vdd^2 · fp
  • fp: switching frequency

[Figure: two inverter circuits between Vdd and out during switching; labels: Linear, Saturation]
72
Short Circuit Power
  • During switching, there is a short moment when
    both PMOS and NMOS are partially on
  • Ps = Q · (Vdd - Vt)^3 · tr · fp
  • tr: rising time
[Figure: inverter between Vdd and out, shown once for a falling input and once for a rising input]
73
Low Power Design
74
Clock Gating
  • Gate off clock to idle functional units
      • e.g., floating point units
      • need logic to generate the disable signal
          • increases complexity of control logic
          • consumes power
          • timing critical to avoid clock glitches at OR gate output
      • additional gate delay on clock signal
          • gating OR gate can replace a buffer in the clock distribution tree

75
Active Power Reduction - Supply Voltage Reduction
Static
  • Pros
      • Always active in saving
  • Cons
      • Additional power delivery network
      • Needs special care at the interface between power domains: signals close to Vt cause excessive leakage and reduced noise margins
Dynamic
  • Adjusting operation voltage and frequency to performance requirements
      • High performance: high Vdd and frequency
      • Power saving: low Vdd and frequency
  • Pros
      • Doesn't limit performance
  • Cons
      • Penalty of transition between different power states can be high (in performance and power)
      • Additional control logic

76
Dynamic Frequency and Voltage Scaling
  • Always run at the lowest supply voltage that meets the timing constraints
  • DFS (dynamic frequency scaling) saves only power
  • DVS (dynamic voltage scaling) + DFS saves both energy and power
  • A DVS+DFS system requires the following
      • A programmable clock generator (PLL)
          • PLL from 200 MHz to 700 MHz in increments of 33 MHz
      • A supply regulation loop that sets the minimum VDD necessary for operation at the desired frequency
          • 32 levels of VDD from 1.1 V to 1.6 V
      • An operating system that sets the required frequency and supply voltage to meet the task completion deadlines
          • heavier load: ramp up VDD; when stable, speed up the clock
          • lighter load: slow down the clock; when the PLL locks onto the new rate, ramp down VDD
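
The claim that DFS saves only power while DVS with DFS saves energy as well follows from the dynamic power relation (load capacitance times Vdd squared times switching frequency): a task of N cycles takes N / f seconds, so its energy is independent of f. The Python sketch below works this through; the operating points are picked from the ranges quoted above, but their pairing, the load capacitance, and the cycle count are hypothetical.

```python
def dynamic_power(c_load, vdd, freq):
    return c_load * vdd ** 2 * freq


def task_energy(c_load, vdd, freq, cycles):
    """Energy for a fixed amount of work: power times the time it takes."""
    return dynamic_power(c_load, vdd, freq) * (cycles / freq)


C_LOAD = 1e-9          # hypothetical switched capacitance in farads
CYCLES = 1_000_000     # hypothetical task length in clock cycles

baseline = (1.6, 700e6)    # (VDD in volts, frequency in hertz)
dfs_only = (1.6, 350e6)    # frequency halved, voltage unchanged
dvs_dfs = (1.1, 350e6)     # frequency halved and voltage lowered

for name, (vdd, freq) in [("baseline", baseline), ("DFS", dfs_only), ("DVS+DFS", dvs_dfs)]:
    print(name, dynamic_power(C_LOAD, vdd, freq), task_energy(C_LOAD, vdd, freq, CYCLES))
```

This sketch covers dynamic power only; leakage and the transition penalties between operating points are left out.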

77
Design with Dual Vth
Dual Vth evaluation
  • Dual Vth design
  • Two flavors of transistors: slow high Vth, fast low Vth
  • Low-Vth transistors are faster, but have 10X the leakage

78
Power Gating Using Sleep Transistors
  • Or can reduce leakage by gating the supply rails
    when the circuit is in sleep mode
      • in normal mode, sleep = 0 and the sleep transistors must present as small a resistance as possible (via sizing)
      • in sleep mode, sleep = 1 and the transistor stack effect reduces leakage by orders of magnitude
  • Or can eliminate leakage by switching off the
    power supply (but lose the memory state)