Shuvendu K. Lahiri


1
Modeling and Verification of Out-of-Order
Microprocessors in UCLID

Shuvendu K. Lahiri
Sanjit A. Seshia
Randal E. Bryant
Carnegie Mellon University, USA

2
Processor Verification

Views of System Operation
Instruction Set
Instructions executed in sequential order
Each instruction modifies programmer-visible state
Microarchitecture
At any given time, multiple instructions are in flight
State held in hidden pipeline registers and buffers
Verification Task
Prove that all instruction sequences execute as predicted by the instruction set model

3
Introduction and Related Work

In-order Pipeline Verification
Burch and Dill, CAV 94
Relates implementation and specification by completing partially executed instructions in the pipeline (flushing)
Infinite data words, memories
Bounded (fixed) resources only
Can't model a reorder buffer (ROB) of arbitrary length
Out-of-Order Processor Verification
Arbitrarily large (64-128 entry) reorder buffers, reservation stations and load-store queues
Very large number of instructions in the pipeline
No finite flushing function to drain the pipeline

4
Out-Of-Order Processor Verification

Theorem Proving Approaches
Hosabettu et al. (00), Sawada et al. (98), Arons et al. (00)
Write inductive invariants
Manually guide the theorem provers in proving the invariants
Large, complicated proof scripts (fragile)
Seldom have good counterexample facilities
Compositional Model Checking (McMillan et al.)
Use compositional model checking with temporal case splitting, path splitting, symmetry and data-type reduction
No need to write inductive invariants
User needs to decompose the proof manually
Has not been demonstrated to be effective for deep, superscalar pipelines
Other Approaches
Finite State Model Checking (Berezin et al.), Incremental Flushing (Skakkebæk et al.), Decision Procedure (Velev)

5
Contributions

Extends the work by Bryant & Velev
Restricted to in-order pipelines with bounded resources
Application of UCLID
Modeling framework for out-of-order processors
Application of three verification approaches to an out-of-order processor
Effective use of an automated decision procedure
For proving large formulas automatically
Simple heuristics for quantifier instantiation

6
CLU Logic of UCLID

Terms (T): Integer Expressions
ITE(F, T1, T2)          If-then-else
Fun(T1, …, Tk)          Function application
succ(T)                 Increment
pred(T)                 Decrement
Formulas (F): Boolean Expressions
¬F, F1 ∧ F2, F1 ∨ F2    Boolean connectives
T1 = T2                 Equation
T1 < T2                 Inequality
P(T1, …, Tk)            Predicate application
Functions (Fun): Integers → Integer
f                       Uninterpreted function symbol
λ x1, …, xk . T         Function definition
Predicates (P): Integers → Boolean
p                       Uninterpreted predicate symbol
λ x1, …, xk . F         Predicate definition

7
Decision Procedure
CLU Formula → Lambda Expansion → λ-free Formula → Function & Predicate Elimination → Function-free Formula → Convert to Boolean Formula → Boolean Formula → Boolean Satisfiability

Operation
Series of transformations leading to a propositional formula
Propositional formula checked with BDD or SAT tools
Bryant, Lahiri, Seshia CAV02
8
Modeling Memories with λs

Writing Transforms Memory
next[M] := Write(M, wa, wd)
= λ a . ITE(a = wa, wd, M(a))
Future reads of address wa will get wd

Memory M Modeled as Function
M(a): value at location a

Initially
Arbitrary state
Modeled by uninterpreted function m0
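The λ-style write can be mimicked with closures. This is a minimal Python sketch assuming integer addresses and data; `write` is a hypothetical helper name, and `m0` is a concrete stand-in for the uninterpreted initial memory, so it shows only the semantics of the update, not how UCLID reasons about it symbolically.

```python
from typing import Callable

# Memory modeled as a function from addresses to values, like M(a).
Memory = Callable[[int], int]


def write(m: Memory, wa: int, wd: int) -> Memory:
    """Return the new memory: lambda a. ITE(a = wa, wd, M(a))."""
    return lambda a: wd if a == wa else m(a)


def m0(a: int) -> int:
    """Hypothetical concrete stand-in for the uninterpreted initial memory."""
    return 1000 + a


m1 = write(m0, 4, 42)
m2 = write(m1, 7, 99)
print(m2(4), m2(7), m2(5))  # 42 99 1005
```

Each write wraps the previous function, so a read after n writes unfolds into a chain of n address comparisons, mirroring the nested ITEs that lambda expansion produces.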

9
Modeling Unbounded FIFO Buffer

Queue is Subrange of Infinite Sequence
h : INT
Head of the queue
t : INT
Tail of the queue
q : INT → INT
Function mapping indices to values
q(i) valid only when h ≤ i < t

[Figure: infinite sequence …, q(h−2), q(h−1), q(h) (head), q(h+1), …, q(t−2), q(t−1), q(t) (tail), q(t+1), …]
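The head/tail/sequence view of an unbounded queue can be sketched directly. This is a hypothetical Python class (`SymFifo`, `push`, `pop` are illustrative names) that assumes h = t = 0 initially; pushing writes q(t) and increments t, popping reads q(h) and increments h, and nothing is ever erased, since entries outside [h, t) are simply "don't care".

```python
from typing import Callable, List


class SymFifo:
    """Unbounded FIFO: the window [h, t) of an infinite sequence q."""

    def __init__(self, q0: Callable[[int], int]) -> None:
        self.h = 0    # head index
        self.t = 0    # tail index
        self.q = q0   # q : INT -> INT, arbitrary outside [h, t)

    def push(self, v: int) -> None:
        old_q, t = self.q, self.t
        # next q := lambda i. ITE(i = t, v, q(i)); next t := succ(t)
        self.q = lambda i: v if i == t else old_q(i)
        self.t = t + 1

    def pop(self) -> int:
        assert self.h < self.t, "queue is empty"
        v = self.q(self.h)
        self.h += 1   # next h := succ(h); q is left unchanged
        return v

    def contents(self) -> List[int]:
        # Only q(i) with h <= i < t is meaningful.
        return [self.q(i) for i in range(self.h, self.t)]


fifo = SymFifo(lambda i: -1)  # -1 plays the role of "don't care"
for v in (10, 20, 30):
    fifo.push(v)
print(fifo.pop(), fifo.contents())  # 10 [20, 30]
```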
10
Modeling FIFO Buffer (cont.)
11
Modeling Parallel Updates

Simultaneous-Update Memories
Update an arbitrary subset of entries in the same step
next[M] := λ i . ITE(P(i), D(i), M(i))
Any entry i that satisfies the predicate P(i) gets updated with D(i)
Useful for modeling reorder buffers
Forwarding data to all dependent instructions
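As an illustration of a simultaneous update, here is a Python sketch of result-bus forwarding: every entry waiting on the broadcast tag picks up the value in one step. The ROB snapshot, tags and names (`parallel_update`, `waits_on_result`) are hypothetical; a complete model would also set src1valid with the same predicate, which is omitted here.

```python
from typing import Callable

Mem = Callable[[int], int]
Pred = Callable[[int], bool]


def parallel_update(m: Mem, p: Pred, d: Mem) -> Mem:
    """next[M] := lambda i. ITE(P(i), D(i), M(i)), for every i at once."""
    return lambda i: d(i) if p(i) else m(i)


# Hypothetical 4-entry ROB snapshot: entries 1 and 3 wait on tag 2.
SRC1TAG = {0: 5, 1: 2, 2: 7, 3: 2}
SRC1VALID = {0: True, 1: False, 2: True, 3: False}
RESULT_TAG, RESULT_VAL = 2, 123   # value broadcast on the result bus


def src1val(i: int) -> int:
    return 0                       # operand values before forwarding


def waits_on_result(i: int) -> bool:
    return not SRC1VALID[i] and SRC1TAG[i] == RESULT_TAG


src1val_next = parallel_update(src1val, waits_on_result, lambda i: RESULT_VAL)
print([src1val_next(i) for i in range(4)])  # [0, 123, 0, 123]
```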

12
[Figure: UCLID tool structure. Components: UCLID description; Bounded Property Checking, Correspondence Checking, Inductive Invariant Checking; Term-level Symbolic Simulator; Decision Procedure with SAT and BDD back ends; Counter Example Generator]

Systems are modeled in CLU logic
Three verification techniques
Based on symbolic simulation
Use the decision procedure
Counterexample traces generated for verification failures

13
Verification Techniques in UCLID

Bounded Property Checking
Start in reset state
Symbolically simulate for a fixed number of steps
Verify a safety property for all states reachable within the fixed number of steps from the start state
Correspondence Checking
Run 2 different simulations starting in the most general state
Prove that the final states are equivalent
e.g. Burch-Dill technique
Invariant Checking
Start in a general state s
Prove Inv(s) ⇒ Inv(next[s])
Limited support for automatic quantifier instantiation
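To make the difference between bounded property checking and invariant checking concrete, the following Python sketch uses explicit-state enumeration on a hypothetical toy system (queue pointers h, t with a nondeterministic push/pop/nop input). This is a swapped-in stand-in for UCLID's symbolic simulation: the finite window used for the induction check gives evidence, not a proof over unbounded integers.

```python
from itertools import product

# Hypothetical toy system: queue pointers s = (h, t); each step takes a
# nondeterministic input op, and "pop" stalls when the queue is empty.
OPS = ("push", "pop", "nop")


def step(s, op):
    h, t = s
    if op == "push":
        return (h, t + 1)
    if op == "pop" and h < t:
        return (h + 1, t)
    return (h, t)


def inv(s):
    h, t = s
    return 0 <= h <= t


def bounded_check(reset, prop, k):
    """Check prop on every state reachable from reset in at most k steps."""
    frontier = {reset}
    for _ in range(k + 1):
        if not all(prop(s) for s in frontier):
            return False
        frontier = {step(s, op) for s in frontier for op in OPS}
    return True


def inductive_check(inv, window=range(-6, 6)):
    """Look for s with inv(s) but not inv(step(s, op)) in a finite window."""
    for h, t, op in product(window, window, OPS):
        if inv((h, t)) and not inv(step((h, t), op)):
            return (h, t), op      # counterexample to induction
    return None


print(bounded_check((0, 0), inv, 5))   # True
print(inductive_check(inv))            # None
```

A property such as t < 3 passes a 2-step bounded check from reset but is not inductive, which is why invariant checking needs carefully chosen invariants.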

14
An Out-of-order Processor (OOO)
[Figure: out-of-order processor. Components: Program memory, PC (incr), DECODE, Register Rename Unit (valid, tag, val), Reorder Buffer (head, tail), ALU (1st and 2nd operand), result bus; operations: dispatch, execute, retire. Reorder Buffer fields: valid, value, src1valid, src1val, src1tag, src2valid, src2val, src2tag, dest, op]

Out-of-order execution engine
Register renaming
In-order retirement
Unbounded reorder buffer
Arithmetic instructions only
Model different components in UCLID

15
Verification of OOO: Automation vs. Guarantee

Method                       | Resources | Verification (# of steps) | Auxiliary variables | Invariants
Bounded Property Checking    | Unbounded | Bounded                   | None                | None
Burch-Dill Technique         | Fixed     | Unbounded                 | None                | Very few
Inductive Invariant Checking | Unbounded | Unbounded                 | Significant         | Significant, including those for auxiliary variables

Presence of a decision procedure
Efficiency: allows improved bounded property checking and Burch-Dill method
Automation: reduces manual guidance in proving invariants
Automatic instantiation of quantifiers

16
Technique 1: Bounded Property Checking

Debugging OOO using Bounded Property Checking
All the errors were discovered during this phase
Counterexample traces were of great help
Debugging Motorola ELF
Superscalar out-of-order processor
Reorder buffer, memory unit, load-store queues, etc.
Applied during the early design exploration phase

17
Bounded Property Checking Results

Model    | Steps | # of terms | Term formula size | Prop. formula size | UCLID time (s) | SVC time (s)
OOO unit | 10    | 59         | 2566              | 15290              | 10.8           | 233.18
OOO unit | 14    | 87         | 7480              | 62504              | 76.55          | > 5 hrs
OOO unit | 20    | 129        | 19921             | 263413             | 1679.12        | > 1 day
Elf      | 6     | 33         | 218               | 942                | 1.2            | 10.9
Elf      | 8     | 70         | 1085              | 4481               | 8.4            | 1851.6
Elf      | 10    | 104        | 2467              | 16453              | 30.6           | > 1 day
Elf      | 12    | 149        | 4553              | 54288              | 111.0          | > 1 day

SVC (Stanford): another decision procedure for CLU formulas
Can decide a more expressive class
CVC (successor of SVC) runs out of memory on larger cases

18
Technique 2: Burch-Dill Technique
k: issue width of OOO
δimpl: transition function of OOO
δspec: transition function of ISA
Abs: relates an OOO state with an ISA state

Restrict the number of entries in the reorder buffer
The number of ROB entries = r
Flushing as the abstraction function Abs
Alternate between executing the instruction at the head of the reorder buffer and retiring the head
Inductive invariants required for the initial state Qimpl
Critical for out-of-order processor verification
Redundancy present in the OOO model
Because of out-of-order execution and register renaming
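The correspondence check behind the Burch-Dill technique can be illustrated on a hypothetical toy design with issue width k = 1: an accumulator ISA, and an implementation with a one-entry instruction buffer whose abstraction function flushes the buffer. The sketch compares Abs(δimpl(q, x)) with δspec applied to Abs(q) (zero ISA steps on a stall), enumerating small integers explicitly instead of using symbolic terms; all names are illustrative, not taken from the UCLID model.

```python
from itertools import product


def delta_spec(acc, x):
    """ISA step: the instruction adds operand x to the accumulator."""
    return acc + x


def delta_impl(q, x):
    """Pipeline step: retire the buffered instruction, latch x (None = stall)."""
    acc, valid, pend = q
    acc = acc + pend if valid else acc
    return (acc, False, 0) if x is None else (acc, True, x)


def abs_flush(q):
    """Abs by flushing: complete the buffered instruction, keep the ISA state."""
    acc, valid, pend = q
    return acc + pend if valid else acc


def burch_dill_check(impl, values):
    """Check Abs(impl(q, x)) == spec^j(Abs(q), x), j = 0 on stall else 1."""
    inputs = list(values) + [None]
    for acc, valid, pend, x in product(values, (False, True), values, inputs):
        q = (acc, valid, pend)
        lhs = abs_flush(impl(q, x))
        rhs = abs_flush(q) if x is None else delta_spec(abs_flush(q), x)
        if lhs != rhs:
            return q, x            # counterexample state and input
    return None


def buggy_impl(q, x):
    """Hypothetical bug: a stall drops the buffered instruction."""
    acc, valid, pend = q
    if x is None:
        return (acc, False, 0)
    return delta_impl(q, x)


print(burch_dill_check(delta_impl, range(-2, 3)))              # None
print(burch_dill_check(buggy_impl, range(-2, 3)) is not None)  # True
```

Unlike this toy, the checked OOO state Qimpl must also satisfy the inductive invariants listed on these slides, because arbitrary ROB states include unreachable, inconsistent ones.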

19
Technique 2: Burch-Dill Technique (cont.)

More automated than inductive invariant checking
Does not require auxiliary structures
Far fewer invariants than invariant checking
Only 4 invariants, compared to about 12 for the inductive invariant checking approach

20
Burch-Dill Technique for OOO

Exponential blowup with the number of ROB entries
Limited to r ≤ 8 entries currently
r = 8 finished after case-splitting in 2.5 hrs

# of ROB entries | # of terms | Term formula size | Prop. formula size | UCLID time (s)
2                | 63         | 398               | 5325               | 6.83
3                | 83         | 618               | 10248              | 30.23
4                | 103        | 886               | 18175              | 157.41
6                | 143        | 1534              | 41208              | 3051.79
8                | 183        | 2342              | 82915              | > 31 hrs
21
Technique 3 Invariant Checking

Deriving the inductive invariants
Requires additional (auxiliary) variables to express invariants
Auxiliary variables do not affect system operation
Proving that the invariants are inductive
Automate proof of invariants in UCLID
Eliminates the need for large (often fragile) proof scripts

22
Restricted Invariants and Proofs

Restricted classes of invariants
∀x1 ∀x2 … ∀xk Φ(x1, …, xk)
Φ(x1, …, xk) is a CLU formula without quantifiers
x1, …, xk are integer variables free in Φ(x1, …, xk)
Proving these invariants requires quantifiers
(∀x1 ∀x2 … ∀xk Φ(x1, …, xk)) ⇒ ∀y1 ∀y2 … ∀ym Ψ(y1, …, ym)
Automatic instantiation of x1, …, xk with concrete terms
Sound but incomplete method
Reduce the quantified formula to a CLU formula
Can use the decision procedure for CLU

23
Shadow Structures

Auxiliary variables
Added to predict the correct values of state variables
3 shadow variables for 3 state variables
rob.value shdw.value
rob.src1val shdw.src1val
rob.src2val shdw.src2val
Similar to McMillan's approach and Arons et al.'s approach

24
Adding Shadow Structures
shdw.src1val[rob.tail] ← Rfisa(src1)
shdw.src2val[rob.tail] ← Rfisa(src2)
shdw.value[rob.tail] ← ALU(Rfisa(src1), Rfisa(src2), op)

25
Adding Shadow Structures
∀t ∈ rob. rob.valid(t) ⇒ rob.value(t) = shdw.value(t)
∀t ∈ rob. rob.src1valid(t) ⇒ rob.src1val(t) = shdw.src1val(t)
∀t ∈ rob. rob.src2valid(t) ⇒ rob.src2val(t) = shdw.src2val(t)

26
Refinement Maps

Correspondence with a sequential ISA model
OOO and ISA synchronized at dispatch
For Register File Contents
∀r. reg.valid(r) ⇒ reg.val(r) = Rfisa(r)
For Program Counter
PCooo = PCisa

27
Invariants

Tag Consistency invariants (2)
Instructions depend only on instructions that precede them in program order
Register Renaming invariants (2)
A tag in the rename unit should be in the ROB, and the destination register should match
∀r. ¬reg.valid(r) ⇒ (rob.head ≤ reg.tag(r) < rob.tail ∧ rob.dest(reg.tag(r)) = r)
For any entry, the destination should have reg.valid as false, and its tag should point to this or a later instruction
∀t ∈ rob. (¬reg.valid(rob.dest(t)) ∧ t ≤ reg.tag(rob.dest(t)) < rob.tail)

28
Invariants (cont.)

Executed instructions have operands ready
∀t ∈ rob. rob.valid(t) ⇒ rob.src1valid(t) ∧ rob.src2valid(t)
Shadow-Value-Operands Relationship
∀t ∈ rob. shdw.value(t) = Alu(shdw.src1val(t), shdw.src2val(t), rob.op(t))
Producer-Consumer Values (2)
∀t ∈ rob. ¬rob.src1valid(t) ⇒ shdw.src1val(t) = shdw.value(rob.src1tag(t))
Total 13 invariants
Includes refinement maps
Constraints on shadow variables

29
Proving Invariants

Proved automatically
Quantifier instantiation was sufficient in these cases
Relieves the user of writing proof scripts to discharge the proofs
Time spent: 54 s on a 1.4 GHz machine
Total effort: 2 person-days
Not possible to use SVC or CVC
Ordering between integer array indices
∀t ∈ rob. ¬rob.src1valid(t) ⇒ rob.src1tag(t) < t
SVC/CVC interprets terms over the reals
(x < y + 1) ⇒ (x ≤ y)
Valid when x, y are integers
Invalid when x, y are reals
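The integer/real gap can be checked quickly in Python. The sketch below spot-checks (x < y + 1) ⇒ (x ≤ y) over a finite integer range (for integers the two sides are equivalent, since no integer lies strictly between y and y + 1) and then exhibits a rational counterexample using the standard-library `fractions` module; the names are illustrative.

```python
from fractions import Fraction
from itertools import product


def formula(x, y):
    """(x < y + 1) => (x <= y)"""
    return (not (x < y + 1)) or (x <= y)


# Over the integers the formula holds; spot-check a finite range.
print(all(formula(x, y) for x, y in product(range(-20, 21), repeat=2)))  # True

# Over the reals (rationals suffice) it fails, e.g. x = 1/2, y = 0.
print(formula(Fraction(1, 2), Fraction(0)))  # False
```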

30
Why Quantifier Instantiation works
31
Extensions to the base model

Increase concurrency of the design
Infinite number of execution units
Any subset of dispatch, execute, retire, nop can be active
The same invariants were proved inductive without any changes
Scalar → Superscalar
Incorporate issue width 2 and retire width 2
Data forwarding logic of the processor gets complicated
Same set of invariants proved automatically
No change in the proof script!
Runtime increased from 54 s to 134 s

32
Adding circular reorder buffer

ROB modeled as a finite but arbitrary-size circular FIFO
Tags are reused
No dispatch when the reorder buffer is full
Changes in the model
Add a predicate rob.present() to indicate that a ROB entry contains a valid instruction
Change the dispatch logic to stall when the ROB is full
Modify < to incorporate wrap-around
Changes in the proof script
Add 1 invariant about the relationship between rob.present and the active elements of the ROB
Again, the proof of the invariants is automatic!
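A wrap-around order can be sketched by measuring each tag's distance from the head. The Python class below is hypothetical (`CircularRob`, `lt`, `present_invariant` are illustrative names, and tracking an explicit `count` is an assumption); it shows the stall on a full ROB, tag reuse, one possible wrap-around `<`, and one way to state the invariant linking rob.present to the active entries. It is not necessarily the formulation used in the UCLID model.

```python
from typing import List, Optional


class CircularRob:
    """Hypothetical circular ROB of fixed size with reused tags 0..size-1."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.head = 0
        self.count = 0                               # active entries
        self.present: List[bool] = [False] * size    # rob.present(i)

    def tail(self) -> int:
        return (self.head + self.count) % self.size

    def dispatch(self) -> Optional[int]:
        if self.count == self.size:
            return None                              # stall: ROB is full
        tag = self.tail()
        self.present[tag] = True
        self.count += 1
        return tag

    def retire(self) -> int:
        assert self.count > 0, "ROB is empty"
        tag = self.head
        self.present[tag] = False
        self.head = (self.head + 1) % self.size
        self.count -= 1
        return tag

    def lt(self, a: int, b: int) -> bool:
        """Program-order '<' on active tags: compare offsets from head."""
        return (a - self.head) % self.size < (b - self.head) % self.size

    def present_invariant(self) -> bool:
        """rob.present(i) holds exactly for the active entries."""
        return all(self.present[i] == ((i - self.head) % self.size < self.count)
                   for i in range(self.size))


rob = CircularRob(4)
print([rob.dispatch() for _ in range(5)])  # [0, 1, 2, 3, None]
rob.retire()
rob.retire()
print(rob.dispatch())                      # 0  (tag 0 is reused)
print(rob.lt(3, 0), 3 < 0)                 # True False
print(rob.present_invariant())             # True
```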

33
Liveness Proof

Liveness
Every dispatched instruction is eventually retired
Assumes a fair scheduler
Attempts to execute the instruction at the head infinitely often
Proceeds by a high-level induction
Not mechanical
Similar to the Hosabettu CAV98 approach
Most required lemmas are already proved during the safety proof (in UCLID)
Concise proof

34
Current Status and Future Work

Use of decision procedure in deductive verification
Automate proof of invariants in micro-architecture verification with speculation and memory instructions (CMU-TR)
Automate proof of invariants in verification of a directory-based cache coherence protocol with unbounded clients and unbounded channels
Need ways to generate (some) invariants automatically
Pnueli et al.'s invisible invariant method, CAV01
Difficult to handle unbounded data, uninterpreted functions and ordering
Detecting convergence of such term-level models
Would enable automatic proof of models with finite buffers

Questions

36
Introduction and Related Work

Microprocessor Verification
Finite-state symbolic model checking: Berezin et al.
Compositional model checking: McMillan et al.
Symbolic simulation + decision procedure based: Burch & Dill; Bryant & Velev
Theorem proving techniques: Sawada & Hunt; Hosabettu et al.; Arons & Pnueli

37
Outline

UCLID TOOL
Logic and Decision Procedure
Modeling Framework
Verification Frameworks
Out-Of-Order Processor Description
Bounded Property Checking Results
Burch-Dill Verification Results
Invariant Checking Framework
Shadow Structures
Inductive Invariant Checking and Quantifiers
Invariants Required
Extension of the simple processor model

38
Exploiting Positive Equality

Decision procedure exploits positive equality
Bryant, German, Velev, CAV99
Extended in the presence of succ, pred operations
Bryant, Lahiri, Seshia CAV02
Positive Equality
Number of interpretations can be greatly reduced
Equations appearing only under an even number of negations are assigned false
Except when restricted by functional consistency
Terms compared in these equations get distinct interpretations (called p-terms)
Identifying p-terms is a pre-processing step

39
Instruction Set Architecture (ISA)
40
Symbols


41
UCLID description
42
Modeling Circular Queues
43
Term-level modeling

Abstract bit-vectors with integers (terms)
Allow a restricted set of operations
x = y, x < y, succ(x), pred(x)
Black-box certain combinational blocks
Replace by uninterpreted functions
Maintain functional consistency
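One standard way to remove uninterpreted function applications while keeping functional consistency is the nested-ITE scheme from the positive-equality work of Bryant, German and Velev: the i-th application f(xi) becomes ITE(xi = x1, vf_1, ITE(xi = x2, vf_2, … vf_i)) over fresh variables. The slides do not spell out UCLID's exact encoding, so treat this Python sketch as an illustration; `eliminate` and `interpret` are hypothetical helpers working on plain strings and integers.

```python
from typing import List


def eliminate(fname: str, args: List[str]) -> List[str]:
    """Replace f(args[0]), ..., f(args[n-1]) by nested ITEs over fresh
    variables so that equal arguments always yield equal results."""
    fresh = [f"v{fname}_{i + 1}" for i in range(len(args))]
    out = []
    for i, arg in enumerate(args):
        expr = fresh[i]
        # Build ITE(arg = args[0], fresh[0], ITE(arg = args[1], ..., fresh[i]))
        for j in reversed(range(i)):
            expr = f"ITE({arg} = {args[j]}, {fresh[j]}, {expr})"
        out.append(expr)
    return out


def interpret(arg_vals: List[int], fresh_vals: List[int]) -> List[int]:
    """Value of each ITE term under one interpretation of the arguments."""
    return [fresh_vals[next(j for j in range(i + 1) if arg_vals[j] == arg_vals[i])]
            for i in range(len(arg_vals))]


for term in eliminate("f", ["x1", "x2", "x3"]):
    print(term)
# vf_1
# ITE(x2 = x1, vf_1, vf_2)
# ITE(x3 = x1, vf_1, ITE(x3 = x2, vf_2, vf_3))

print(interpret([3, 5, 3], [10, 20, 30]))  # [10, 20, 10]
```

Because x1 and x3 receive equal values in the interpretation, both applications evaluate to vf_1, which is exactly the functional-consistency requirement.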

44
Example Motorola ELF Processor

Features
32-bit, dual issue, with 64 GPRs
5-stage pipeline
Out-of-order issue, in-order completion of up to 2 instructions
Load/Store unit
3-cycle load latency
Fully pipelined
Load queue for loads that miss in the cache
Store queue for retiring store instructions
Other buffers to hide cache-miss latency
1000 lines of UCLID model derived from 20K lines of RTL

45
Bounded Property Checking

Compare the micro-architecture with a sequential
ISA model w.r.t. Register File, Memory and PC
ISA model synchronized at completion

[Figure: repeated applications of the implementation transition function δimpl]
46
Quantifier Instantiation

Prove
(∀x1 ∀x2 … ∀xk Φ(x1, …, xk)) ⇒ ∀y1 ∀y2 … ∀ym Ψ(y1, …, ym)
Introduce Skolem constants (y1, …, ym)
(∀x1 ∀x2 … ∀xk Φ(x1, …, xk)) ⇒ Ψ(y1, …, ym)
Instantiate x1, …, xk with concrete terms
Assume single-arity functions and predicates
Let Fx = { f | f(x) is a sub-expression of Φ(x1, …, xk) }
Let Tf = { t | f(t) is a sub-expression of Ψ(y1, …, ym) }
For each bound variable x, Ax = { t | f ∈ Fx and t ∈ Tf }
Instantiate Φ over Ax1 × Ax2 × … × Axk
Formula size grows exponentially with the number of bound variables
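The instantiation heuristic can be sketched on expressions written as nested tuples. In this Python sketch (hypothetical helper names `apps`, `substitute`, `instantiate`), only unary applications are considered, matching the single-arity assumption above; for each bound variable x it computes Fx from Φ, collects Ax from matching applications in the Skolemized Ψ, and instantiates Φ over the cross product. The example Φ is the shadow-value invariant for rob.value; the goal Ψ and the Skolem constant c are made up for illustration.

```python
from itertools import product

LOGICAL = {"and", "or", "not", "=>", "=", "<"}   # connectives, not functions


def apps(expr):
    """All unary applications f(t) in expr, as (f, t) pairs."""
    found = set()
    if isinstance(expr, tuple):
        op, *args = expr
        if op not in LOGICAL and len(args) == 1:
            found.add((op, args[0]))
        for a in args:
            found |= apps(a)
    return found


def substitute(expr, env):
    """Replace variable names by terms; the operator slot is left alone."""
    if isinstance(expr, tuple):
        return (expr[0],) + tuple(substitute(a, env) for a in expr[1:])
    return env.get(expr, expr)


def instantiate(bound_vars, phi, psi):
    """Instantiate forall bound_vars. phi with terms drawn from psi."""
    phi_apps, psi_apps = apps(phi), apps(psi)
    choices = []
    for x in bound_vars:
        f_x = {f for f, arg in phi_apps if arg == x}           # F_x
        a_x = {t for f, t in psi_apps if f in f_x}             # A_x
        choices.append(sorted(a_x, key=repr))
    return [substitute(phi, dict(zip(bound_vars, combo)))
            for combo in product(*choices)]


# Phi(t): rob.valid(t) => rob.value(t) = shdw.value(t)
phi = ("=>", ("valid", "t"), ("=", ("value", "t"), ("shdw_value", "t")))
# Hypothetical goal Psi after Skolemization, with Skolem constant c
psi = ("=>", ("valid", "c"), ("=", ("value", "c"), ("shdw_value", ("succ", "c"))))

for inst in instantiate(["t"], phi, psi):
    print(inst)
```

Here At = {c, succ(c)}, so Φ is instantiated twice; with k bound variables the product of the Ax sets is what makes the formula grow exponentially.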

47
Updating Shadow Structures

During the dispatch of a new instruction
I = <src1, src2, dest, op>
next[shdw.value] :=
λt. ITE(t = rob.tail, Alu(Rfisa(src1), Rfisa(src2), op), shdw.value(t))
next[shdw.src1val] :=
λt. ITE(t = rob.tail, Rfisa(src1), shdw.src1val(t))
next[shdw.src2val] :=
λt. ITE(t = rob.tail, Rfisa(src2), shdw.src2val(t))
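The shadow updates can be simulated concretely to see why the Shadow-Value-Operands invariant holds after each dispatch. This Python sketch uses finite dicts keyed by ROB tag in place of the λ-functions (writing only key `tail` plays the role of ITE(t = rob.tail, new, old)); `alu` is a hypothetical concrete ALU, whereas the model keeps Alu uninterpreted. It also assumes, based on "OOO and ISA synchronized at dispatch", that the ISA register file executes the instruction at dispatch time.

```python
from dataclasses import dataclass, field
from typing import Dict


def alu(a: int, b: int, op: str) -> int:
    """Hypothetical concrete ALU; the model keeps Alu uninterpreted."""
    return a + b if op == "add" else a - b


@dataclass
class Shadow:
    value: Dict[int, int] = field(default_factory=dict)
    src1val: Dict[int, int] = field(default_factory=dict)
    src2val: Dict[int, int] = field(default_factory=dict)
    op: Dict[int, str] = field(default_factory=dict)   # copy of rob.op


def dispatch(sh: Shadow, tail: int, rf_isa: Dict[int, int],
             src1: int, src2: int, dest: int, op: str) -> None:
    """Dispatch I = <src1, src2, dest, op> into ROB entry rob.tail."""
    a, b = rf_isa[src1], rf_isa[src2]
    # Only entry `tail` changes: ITE(t = rob.tail, new, old).
    sh.src1val[tail], sh.src2val[tail] = a, b
    sh.value[tail] = alu(a, b, op)
    sh.op[tail] = op
    # Assumption: the ISA model executes I at dispatch.
    rf_isa[dest] = sh.value[tail]


def shadow_operand_invariant(sh: Shadow) -> bool:
    """forall t. shdw.value(t) = Alu(shdw.src1val(t), shdw.src2val(t), rob.op(t))"""
    return all(sh.value[t] == alu(sh.src1val[t], sh.src2val[t], sh.op[t])
               for t in sh.value)


rf = {1: 5, 2: 7, 3: 0}
sh = Shadow()
dispatch(sh, 0, rf, 1, 2, 3, "add")   # r3 := r1 + r2
dispatch(sh, 1, rf, 3, 1, 3, "sub")   # r3 := r3 - r1
print(sh.value, rf[3], shadow_operand_invariant(sh))  # {0: 12, 1: 7} 7 True
```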

48
Adding Shadow Structures
[Figure: front end of the OOO processor (Program memory, PC, incr, DECODE) with shadow structures added]
49
Refinement Maps

For Register File Contents
∀r. reg.valid(r) ⇒ reg.val(r) = Rfisa(r)
If a register is not being modified by any instruction in the ROB, then its value matches the ISA value
For Program Counter
PCooo = PCisa

50
Invariants
[Figure: reorder buffer entry fields: valid, value, src1valid, src1val, src1tag, src2valid, src2val, src2tag, dest, op]
51
Burch-Dill Technique

Invariants on the initial state Qooo
Instructions depend only on instructions that precede them in program order
A tag in the rename unit should be in the ROB, and the destination register should match
For any entry, the destination should have reg.valid as false, and its tag should point to this or a later instruction
rob.head ≤ rob.tail ≤ rob.head + r

52
Invariants

Total 13 invariants required
Refinement map for RF and PC (2)
Shadow structure constraints (3)
Tag Consistency invariants (2)
Instructions depend only on instructions that precede them in program order
Circular Register Renaming invariants (2)
A tag in the rename unit should be in the ROB, and the destination register should match
∀r. ¬reg.valid(r) ⇒ (rob.head ≤ reg.tag(r) < rob.tail ∧ rob.dest(reg.tag(r)) = r)
For any entry, the destination should have reg.valid as false, and its tag should point to this or a later instruction
∀t ∈ rob. (¬reg.valid(rob.dest(t)) ∧ t ≤ reg.tag(rob.dest(t)) < rob.tail)
