Title: Shuvendu K. Lahiri
1Modeling and Verification of Out-of-Order
Microprocessors in UCLID
- Shuvendu K. Lahiri
- Sanjit A. Seshia
- Randal E. Bryant
- Carnegie Mellon University, USA
2Processor Verification
- Views of System Operation
- Instruction Set
- Instructions executed in sequential order
- Instruction modifies programmer-visible state
- Microarchitecture
- At any given time, multiple instructions in flight
- State held in hidden pipeline registers and buffers
- Verification Task
- Prove all instruction sequences execute as predicted by instruction set model
3Introduction and Related Work
- Inorder Pipeline Verification
- Burch and Dill, CAV '94
- Relates implementation and specification by completing partially-executed instructions in the pipeline (flushing)
- Infinite data words, memories
- Bounded (fixed) resources only
- Can't model a reorder buffer (ROB) of arbitrary length
- Out-of-Order Processor Verification
- Arbitrarily large (64-128) reorder buffer, reservation stations and load-store queues
- Very large number of instructions in the pipeline
- No finite flushing function to drain the pipeline
4Out-Of-Order Processor Verification
- Theorem Proving approaches
- Hosabettu et al. ('00), Sawada et al. ('98), Arons et al. ('00)
- Write inductive invariants
- Manually guide the theorem-provers for proving invariants
- Large, complicated proof scripts (fragile)
- Seldom have good counterexample facilities
- Compositional Model Checking [McMillan et al.]
- Use compositional model checking with temporal case splitting, path splitting, symmetry and data-type reduction
- Does not need to write inductive invariants
- User needs to manually decompose the proof
- Has not been demonstrated effective for deep, superscalar pipelines
- Other Approaches
- Finite State Model Checking [Berezin et al.], Incremental Flushing [Skakkebæk et al.], Decision Procedure [Velev]
5Contributions
- Extends the work by Bryant and Velev
- Restricted to Inorder pipelines with bounded resources
- Application of UCLID
- Modeling Framework for Out-Of-Order processors
- Application of three verification approaches to Out-Of-Order Processor
- Effective use of automated decision procedure
- For proving large formulas automatically
- Simple heuristics for quantifier instantiation
6CLU Logic of UCLID
- Terms (T): Integer Expressions
- ITE(F, T1, T2): If-then-else
- Fun(T1, …, Tk): Function application
- succ(T): Increment
- pred(T): Decrement
- Formulas (F): Boolean Expressions
- ¬F, F1 ∧ F2, F1 ∨ F2: Boolean connectives
- T1 = T2: Equation
- T1 < T2: Inequality
- P(T1, …, Tk): Predicate application
- Functions (Fun): Integers → Integer
- f: Uninterpreted function symbol
- λ x1, …, xk . T: Function definition
- Predicates (P): Integers → Boolean
- p: Uninterpreted predicate symbol
- λ x1, …, xk . F: Predicate definition
7Decision Procedure
- Operation
- Series of transformations leading to propositional formula
- Propositional formula checked with BDD or SAT tools
- Bryant, Lahiri, Seshia [CAV '02]
- Flow: CLU Formula → (Lambda Expansion) → λ-free Formula → (Function and Predicate Elimination) → Function-free Formula → (Convert to Boolean Formula) → Boolean Formula → (Boolean Satisfiability)
8Modeling Memories with λs
- Writing Transforms Memory
- nextM = Write(M, wa, wd)
- λ a . ITE(a = wa, wd, M(a))
- Future reads of address wa will get wd
- Memory M Modeled as Function
- M(a): Value at location a
- Initially
- Arbitrary state
- Modeled by uninterpreted function m0
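The function view of memory can be tried out directly in Python. The following is a minimal sketch, not UCLID syntax: `initial_memory` is a hypothetical stand-in for the uninterpreted function m0 (any fixed function of the address would do), and `write` builds the λ-expression for nextM as a closure.

```python
from typing import Callable

Memory = Callable[[int], int]

def initial_memory() -> Memory:
    """Stand-in for the uninterpreted function m0 (assumption: arbitrary but fixed contents)."""
    return lambda a: 1000 + a

def write(mem: Memory, wa: int, wd: int) -> Memory:
    """Memory after writing wd at address wa: λa. ITE(a = wa, wd, M(a))."""
    return lambda a: wd if a == wa else mem(a)

m0 = initial_memory()
m1 = write(m0, 5, 42)   # later reads of address 5 see 42
m2 = write(m1, 7, 99)   # earlier write is still visible through m2
```

Each write wraps the previous function rather than mutating a table, which mirrors how the term-level model keeps the memory unbounded.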
9Modeling Unbounded FIFO Buffer
- Queue is Subrange of Infinite Sequence
- h : INT
- Head of the queue
- t : INT
- Tail of the queue
- q : INT → INT
- Function mapping indices to values
- q(i) valid only when h ≤ i < t
[Figure: infinite sequence …, q(h−2), q(h−1), q(h), q(h+1), …, q(t−2), q(t−1), q(t), q(t+1), …, with head marking index h and tail marking index t]
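A small Python sketch of this queue model may help. The `push` and `pop` operations below are assumptions for illustration (the slide only defines h, t, and q); they follow the usual convention that the tail index is the next free slot, consistent with q(i) being valid for h ≤ i < t.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass(frozen=True)
class Queue:
    h: int                      # head index
    t: int                      # tail index (first free slot)
    q: Callable[[int], int]     # maps indices to values

def push(queue: Queue, x: int) -> Queue:
    """Store x at index t and advance the tail (hypothetical operation)."""
    old_q, t = queue.q, queue.t
    return Queue(queue.h, t + 1, lambda i: x if i == t else old_q(i))

def pop(queue: Queue) -> Tuple[int, Queue]:
    """Return the head value and advance the head (hypothetical operation)."""
    if queue.h == queue.t:
        raise IndexError("pop from empty queue")
    return queue.q(queue.h), Queue(queue.h + 1, queue.t, queue.q)

def contents(queue: Queue) -> List[int]:
    """Values at the valid indices h <= i < t."""
    return [queue.q(i) for i in range(queue.h, queue.t)]

empty = Queue(0, 0, lambda i: 0)
```

Because h and t are plain integers that only grow, the queue never needs a capacity bound.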
10Modeling FIFO Buffer (cont.)
11Modeling Parallel Updates
- Simultaneous-Update Memories
- Update arbitrary subset of entries at the same step
- nextM = λ i. ITE(P(i), D(i), M(i))
- Any entry, i, which satisfies a predicate P(i) will get updated with D(i)
- Useful for modeling Reorder Buffers
- Forwarding data to all dependent instructions
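The simultaneous update can be sketched in Python as a higher-order function. The ROB snapshot below (entries 4, 5, 6 and their tags) is invented purely for illustration; only the shape λi. ITE(P(i), D(i), M(i)) comes from the slide.

```python
def parallel_update(mem, pred, data):
    """nextM = λi. ITE(P(i), D(i), M(i))"""
    return lambda i: data(i) if pred(i) else mem(i)

# Hypothetical ROB snapshot: entries 4 and 5 wait on the producer with tag 3,
# entry 6 waits on tag 2, every other entry already has its operand.
src1tag = lambda i: {4: 3, 5: 3, 6: 2}.get(i, -1)
src1valid = lambda i: i not in (4, 5, 6)
src1val = lambda i: 0

# The result bus broadcasts (tag 3, value 77); all waiting consumers update at once.
result_tag, result_value = 3, 77
waiting = lambda i: (not src1valid(i)) and src1tag(i) == result_tag
next_src1val = parallel_update(src1val, waiting, lambda i: result_value)
```

A single step updates every matching entry, which is what data forwarding to all dependent instructions requires.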
12UCLID description
[Figure: UCLID block diagram. Components: Term-level Symbolic Simulator; Bounded Property Checking, Correspondence Checking, Inductive Invariant Checking; Decision Procedure (SAT, BDD); Counter Example Generator]
- Systems are modeled in CLU logic
- Three verification techniques
- Based on Symbolic Simulation
- Uses the decision procedure
- Counter example traces generated for verification failures
13Verification Techniques in UCLID
- Bounded Property Checking
- Start in reset state
- Symbolically simulate for fixed number of steps
- Verify a safety property for all states reachable within the fixed number of steps from the start state
- Correspondence Checking
- Run 2 different simulations starting in most general state
- Prove that final states are equivalent
- e.g. Burch-Dill Technique
- Invariant Checking
- Start in general state s
- Prove Inv(s) ⇒ Inv(next(s))
- Limited support for automatic quantifier instantiation
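The difference between bounded property checking and invariant checking can be illustrated on a toy system. This sketch uses explicit enumeration over a small domain as a stand-in for symbolic simulation and the decision procedure; the system (a head/tail pair with push and pop) and the invariant h ≤ t are assumptions chosen for illustration.

```python
from itertools import product

def next_states(s):
    """Successors of state (h, t): push always, pop only when non-empty."""
    h, t = s
    succ = [(h, t + 1)]
    if h < t:
        succ.append((h + 1, t))
    return succ

def inv(s):
    h, t = s
    return h <= t

def bounded_check(reset, steps):
    """Check inv on every state reachable from reset within `steps` steps."""
    frontier = {reset}
    for _ in range(steps + 1):
        if not all(inv(s) for s in frontier):
            return False
        frontier = {n for s in frontier for n in next_states(s)}
    return True

def inductive_check(domain):
    """Check Inv(s) => Inv(next(s)) for every state s drawn from the domain."""
    return all(inv(n)
               for s in product(domain, repeat=2) if inv(s)
               for n in next_states(s))

bounded_ok = bounded_check((0, 0), 5)
inductive_ok = inductive_check(range(-3, 4))
```

The bounded check starts from the reset state and covers only a fixed depth; the inductive check starts from an arbitrary state satisfying the invariant and covers a single step.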
14An Out-of-order Processor (OOO)
[Figure: out-of-order processor block diagram: PC (incr), program memory, decode, dispatch, register rename unit (valid, tag, val), reorder buffer with head and tail, 1st and 2nd operands, ALU, execute, result bus, retire]
Reorder Buffer Fields: valid, value, src1valid, src1val, src1tag, src2valid, src2val, src2tag, dest, op
- Out of order execution engine
- Register Renaming
- Inorder retirement
- Unbounded Reorder buffer
- Arithmetic instructions only
- Model different components in UCLID
15Verification of OOO: Automation vs. Guarantee
Method | Resources | Verification (# of steps) | Auxiliary variables | Invariants
Bounded Property Checking | Unbounded | Bounded | None | None
Burch-Dill Technique | Fixed | Unbounded | None | Very few
Inductive Invariant Checking | Unbounded | Unbounded | Significant | Significant, including those for auxiliary variables
- Presence of decision procedure
- Efficiency: Allows improved bounded property checking and Burch-Dill method
- Automation: Reduces manual guidance in proving invariants
- Automatic Instantiation of quantifiers
16Technique 1 Bounded Property Checking
- Debugging OOO using Bounded Property Checking
- All the errors were discovered during this phase
- Counterexample trace of great help
- Debugging Motorola ELF
- Superscalar out-of-order processor
- Reorder Buffer, memory unit, load-store queues etc.
- Applied during early design exploration phase
17Bounded Property Checking Results
Model | # steps | # terms | Term formula size | Prop Formula Size | UCLID time (s) | SVC time (s)
OOO unit | 10 | 59 | 2566 | 15290 | 10.8 | 233.18
OOO unit | 14 | 87 | 7480 | 62504 | 76.55 | > 5 hrs
OOO unit | 20 | 129 | 19921 | 263413 | 1679.12 | > 1 day
Elf | 6 | 33 | 218 | 942 | 1.2 | 10.9
Elf | 8 | 70 | 1085 | 4481 | 8.4 | 1851.6
Elf | 10 | 104 | 2467 | 16453 | 30.6 | > 1 day
Elf | 12 | 149 | 4553 | 54288 | 111.0 | > 1 day
- SVC (Stanford): Another decision procedure to solve CLU formulas
- Can decide more expressive class
- CVC (Successor of SVC) runs out of memory on larger cases
18Technique 2 Burch-Dill Technique
k: issue width of OOO
δimpl: Transition function of OOO
δspec: Transition function of ISA
Abs: Relates OOO state with an ISA state
- Restrict the number of entries in the Reorder Buffer
- The number of ROB entries = r
- Flushing as the abstraction function Abs
- Alternate between executing the instruction at the head of the reorder buffer and retiring the head
- Inductive Invariants required for the initial state Qimpl
- Critical for Out-of-Order processor verification
- Redundancy present in the OOO model
- Because of out-of-order execution and register renaming
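The commuting condition behind the Burch-Dill technique can be sketched on a deliberately tiny example. Everything below is a hypothetical toy, not the OOO model: the "ISA" is an accumulator, the "implementation" holds at most one instruction in flight, and the abstraction function flushes that instruction. The check is Abs(δimpl(s)) = δspec(Abs(s)) with issue width k = 1, enumerated over a small domain instead of being decided symbolically.

```python
from itertools import product

def spec_step(acc, n):
    """Toy ISA model: each instruction adds n to the accumulator."""
    return acc + n

def impl_step(state, n):
    """Toy implementation: retire the pending instruction, then hold n in flight."""
    acc, pending = state
    if pending is not None:
        acc += pending
    return (acc, n)

def abs_flush(state):
    """Abstraction by flushing: complete the in-flight instruction, if any."""
    acc, pending = state
    return acc if pending is None else acc + pending

def commutes(state, n):
    """One implementation step then flush equals flush then one ISA step."""
    return abs_flush(impl_step(state, n)) == spec_step(abs_flush(state), n)

small = range(-2, 3)
states = [(a, p) for a in small for p in [None, *small]]
all_commute = all(commutes(s, n) for s, n in product(states, small))
```

In the real setting the starting state is symbolic and must be constrained by invariants, which is why the slide calls them critical for out-of-order designs.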
19Technique 2 Burch-Dill Technique
k: issue width of OOO
δimpl: Transition function of OOO
δspec: Transition function of ISA
Abs: Relates OOO state with an ISA state
- More automated than inductive invariant checking
- Does not require auxiliary structures
- Far fewer invariants than invariant checking
- Only 4 invariants compared to about 12 for inductive invariant checking approach
20Burch-Dill Technique for OOO
- Exponential blowup with the number of ROB entries
- Limited to r = 8 entries currently
- r = 8 finished after case-splitting in 2.5 hrs
# of ROB Entries | # of terms | Term formula size | Prop Formula Size | UCLID time (s)
2 | 63 | 398 | 5325 | 6.83
3 | 83 | 618 | 10248 | 30.23
4 | 103 | 886 | 18175 | 157.41
6 | 143 | 1534 | 41208 | 3051.79
8 | 183 | 2342 | 82915 | > 31 hrs
21Technique 3 Invariant Checking
- Deriving the inductive invariants
- Require additional (auxiliary) variables to express invariants
- Auxiliary variables do not affect system operation
- Proving that the invariants are inductive
- Automate proof of invariants in UCLID
- Eliminates need for large (often fragile) proof script
22Restricted Invariants and Proofs
- Restricted classes of invariants
- ∀x1∀x2…∀xk Φ(x1…xk)
- Φ(x1…xk) is a CLU formula without quantifiers
- x1…xk are integer variables free in Φ(x1…xk)
- Proving these invariants requires quantifiers
- (∀x1∀x2…∀xk Φ(x1…xk)) ⇒ ∀y1∀y2…∀ym Ψ(y1…ym)
- Automatic instantiation of x1…xk with concrete terms
- Sound but incomplete method
- Reduce the quantified formula to a CLU formula
- Can use the decision procedure for CLU
23Shadow Structures
- Auxiliary variables
- Added to predict correct value of state variables
- 3 shadow variables for 3 state variables
- rob.value: shadowed by shdw.value
- rob.src1val: shadowed by shdw.src1val
- rob.src2val: shadowed by shdw.src2val
- Similar to McMillan's approach and Arons et al.'s approach
24Adding Shadow Structures
- shdw.src1val[rob.tail] ← Rfisa(src1)
- shdw.src2val[rob.tail] ← Rfisa(src2)
- shdw.value[rob.tail] ← ALU(Rfisa(src1), Rfisa(src2), op)
25Adding Shadow Structures
- ∀_rob t. rob.valid(t) ⇒ rob.value(t) = shdw.value(t)
- ∀_rob t. rob.src1valid(t) ⇒ rob.src1val(t) = shdw.src1val(t)
- ∀_rob t. rob.src2valid(t) ⇒ rob.src2val(t) = shdw.src2val(t)
26Refinement Maps
- Correspondence with a sequential ISA model
- OOO and ISA synchronized at dispatch
- For Register File Contents
- ∀r. reg.valid(r) ⇒ reg.val(r) = Rfisa(r)
- For Program Counter
- PCooo = PCisa
27Invariants
- Tag Consistency invariants (2)
- Instructions only depend on instructions preceding in program order
- Register Renaming invariants (2)
- Tag in a rename-unit should be in the ROB, and the destination register should match
- ∀r. ¬reg.valid(r) ⇒ (rob.head ≤ reg.tag(r) < rob.tail ∧ rob.dest(reg.tag(r)) = r)
- For any entry, the destination should have reg.valid as false and tag should contain this or later instruction
- ∀_rob t. (¬reg.valid(rob.dest(t)) ∧ t ≤ reg.tag(rob.dest(t)) < rob.tail)
28Invariants (cont.)
- Executed instructions have operands ready
- ∀_rob t. rob.valid(t) ⇒ rob.src1valid(t) ∧ rob.src2valid(t)
- Shadow-Value-Operands Relationship
- ∀_rob t. shdw.value(t) = Alu(shdw.src1val(t), shdw.src2val(t), rob.op(t))
- Producer-Consumer Values (2)
- ∀_rob t. ¬rob.src1valid(t) ⇒ shdw.src1val(t) = shdw.value(rob.src1tag(t))
- Total 13 Invariants
- Includes Refinement Maps
- Constraints on Shadow Variables
29Proving Invariants
- Proved automatically
- Quantifier instantiation was sufficient in these cases
- Relieves the user of writing proof scripts to discharge the proofs
- Time spent: 54s on 1.4GHz m/c
- Total effort: 2 person days
- Not possible to use SVC or CVC
- Ordering between integer array indices
- ∀_rob t. ¬rob.src1valid(t) ⇒ rob.src1tag(t) < t
- SVC/CVC interprets terms over reals
- (x < y+1) ⇒ (x ≤ y)
- Valid when x,y are integers
- Invalid when x,y are reals
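The integer-versus-real distinction is easy to confirm by hand. The sketch below checks the implication (x < y+1) ⇒ (x ≤ y) exhaustively on a small integer range and then evaluates it at one rational point; the range and the chosen counterexample are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

def claim(x, y):
    """(x < y + 1) => (x <= y), written as not-antecedent or consequent."""
    return (not (x < y + 1)) or (x <= y)

# Over integers the implication holds for every pair in the sampled range.
ints = range(-5, 6)
holds_on_ints = all(claim(x, y) for x, y in product(ints, repeat=2))

# Over rationals (and hence reals) x = 1/2, y = 0 satisfies x < y + 1 but not x <= y.
counterexample = (Fraction(1, 2), Fraction(0))
holds_on_counterexample = claim(*counterexample)
```

A procedure that interprets terms over the reals would therefore fail to prove invariants that rely on integer ordering of ROB indices.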
30Why Quantifier Instantiation works
31Extensions to the base model
- Increase concurrency of design
- Infinite number of execution units
- Any subset of {dispatch, execute, retire, nop} can be active
- The same invariants were proved inductive without any changes
- Scalar → Superscalar
- Incorporate issue width = 2 and retire width = 2
- Data forwarding logic of the processor gets complicated
- Same set of invariants proved automatically
- No change in the proof script!
- Runtime increased from 54s to 134s
32Adding circular reorder buffer
- ROB modeled as a finite but arbitrary-size circular FIFO
- Tags are reused
- No dispatch when the reorder buffer is full
- Changes in the model
- Add a predicate rob.present() to indicate that a ROB entry holds a valid instruction
- Change the dispatch logic to stall when ROB full
- Modify < to incorporate wrap-around
- Changes in proof script
- Add 1 invariant about the relationship of rob.present and active elements of ROB
- Again the proof of invariants is automatic!
33Liveness Proof
- Liveness
- Every dispatched instruction is eventually retired
- Assumes a fair scheduler
- Attempts to execute the instruction at the head infinitely often
- Proceed by a high level induction
- Not mechanical
- Similar to Hosabettu [CAV '98] approach
- Most lemmas required are already proved during safety proof (in UCLID)
- Concise proof
34Current Status and Future Work
- Use of decision procedure in deductive verification
- Automate proof of invariants in micro-architecture verification with speculation, memory instructions [CMU-TR]
- Automate proof of invariants in verification of a directory based cache coherence protocol with unbounded clients and unbounded channels
- Need ways to generate (some) invariants automatically
- Pnueli et al.'s invisible invariant method [CAV '01]
- Difficult to handle unbounded data, uninterpreted functions and ordering
- Detecting convergence of such term-level models
- Would enable automatic proof of models with finite buffers
36Introduction and Related Work
- Microprocessor Verification
- Finite state symbolic Model Checking
- Berezin et al.
- Compositional Model Checking
- McMillan et al.
- Symbolic Simulation and Decision Procedure based
- Burch and Dill
- Bryant and Velev
- Theorem Proving Techniques
- Sawada and Hunt
- Hosabettu et al.
- Arons and Pnueli
37Outline
- UCLID TOOL
- Logic and Decision Procedure
- Modeling Framework
- Verification Frameworks
- Out-Of-Order Processor Description
- Bounded Property Checking Results
- Burch-Dill Verification Results
- Invariant Checking Framework
- Shadow Structures
- Inductive Invariant Checking and Quantifiers
- Invariants Required
- Extension of the simple processor model
38Exploiting Positive Equality
- Decision Procedure exploits positive-equality
- Bryant, German, Velev [CAV '99]
- Extended in presence of succ, pred operations
- Bryant, Lahiri, Seshia [CAV '02]
- Positive Equality
- Number of interpretations can be greatly reduced
- Equations appearing only under an even # of negations assigned false
- Except when restricted by functional consistency
- Terms compared in these equations get distinct interpretations (called p-terms)
- Identifying p-terms is a pre-processing step
39Instruction Set Architecture (ISA)
41UCLID description
42Modeling Circular Queues
43Term-level modeling
- Abstract Bit-Vectors with Integers (Terms)
- Allow restricted set of operations
- x = y, x ≤ y, succ(x), pred(x)
- Black-box certain combinational blocks
- Replace by uninterpreted functions
- Maintain functional consistency
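Functional consistency means only that equal arguments give equal results; nothing else is assumed about the block. The class below is a minimal Python sketch of one such interpretation, in which every new argument tuple receives a fresh symbolic value. The class name and the value format are hypothetical, and a decision procedure reasons over all consistent interpretations rather than this single one.

```python
import itertools

class Uninterpreted:
    """A black-box function that is only guaranteed to be functionally consistent."""

    def __init__(self, name: str):
        self.name = name
        self._table = {}                  # remembers results for arguments already seen
        self._fresh = itertools.count()   # source of fresh symbolic values

    def __call__(self, *args):
        if args not in self._table:
            self._table[args] = f"{self.name}#{next(self._fresh)}"
        return self._table[args]

alu = Uninterpreted("alu")
first = alu(1, 2, "add")
again = alu(1, 2, "add")     # same arguments, so the same result
other = alu(2, 1, "add")     # different arguments get a fresh value here
```

Replacing a combinational block such as the ALU with a symbol like this removes its bit-level detail while keeping the one property the proofs rely on.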
44Example Motorola ELF Processor
- Features
- 32-bit Dual issue with 64 GPRs
- 5 stage pipeline
- Out-of-order issue, in order completion of up to 2 instructions
- Load/Store unit
- 3-cycle load latency
- Fully pipelined
- Load queue for loads that miss in cache
- Store queue for retiring store instruction
- Other buffers to hide cache miss latency
- 1000 lines of UCLID model derived from 20K lines of RTL
45Bounded Property Checking
- Compare the micro-architecture with a sequential ISA model w.r.t. Register File, Memory and PC
- ISA model synchronized at completion
[Figure: sequence of implementation steps, each labeled δimpl]
46Quantifier Instantiation
- Prove
- (∀x1∀x2…∀xk Φ(x1…xk)) ⇒ ∀y1∀y2…∀ym Ψ(y1…ym)
- Introduce Skolem Constants (y1, …, ym)
- (∀x1∀x2…∀xk Φ(x1, …, xk)) ⇒ Ψ(y1, …, ym)
- Instantiate x1, …, xk with concrete terms
- Assume single-arity functions and predicates
- Let Fx = {f | f(x) is a sub-expression of Φ(x1…xk)}
- Let Tf = {t | f(t) is a sub-expression of Ψ(y1…ym)}
- For each bound variable x, Ax = {t | f ∈ Fx and t ∈ Tf}
- Instantiate Φ over Ax1 × Ax2 × … × Axk
- Formula size grows exponentially with the number of bound variables
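The instantiation heuristic can be sketched over a small expression representation. In this sketch, expressions are nested tuples whose first element is a symbol; the connective set, the example formulas `phi` and `psi`, and the Skolem constant `t0` are all assumptions made for illustration, not taken from the OOO proofs.

```python
from itertools import product

CONNECTIVES = {"not", "and", "or", "=>", "=", "<"}

def applications(expr):
    """Yield (symbol, argument) for every single-arity application inside expr."""
    if isinstance(expr, tuple):
        head, *args = expr
        if head not in CONNECTIVES and len(args) == 1:
            yield head, args[0]
        for arg in args:
            yield from applications(arg)

def instantiation_sets(phi, bound_vars, psi):
    """Compute A_x for each bound variable x of phi."""
    terms_of = {}                                   # T_f: arguments of f in psi
    for f, arg in applications(psi):
        terms_of.setdefault(f, set()).add(arg)
    sets = {}
    for x in bound_vars:
        f_x = {f for f, arg in applications(phi) if arg == x}   # F_x
        sets[x] = set().union(*(terms_of.get(f, set()) for f in f_x))
    return sets

def instantiations(phi, bound_vars, psi):
    """Enumerate the product A_x1 × … × A_xk as variable assignments."""
    sets = instantiation_sets(phi, bound_vars, psi)
    choices = [sorted(sets[x], key=repr) for x in bound_vars]
    return [dict(zip(bound_vars, combo)) for combo in product(*choices)]

# Hypothetical antecedent with bound variable t, and consequent over Skolem constant t0.
phi = ("=>", ("not", ("src1valid", "t")), ("<", ("src1tag", "t"), "t"))
psi = ("=>", ("not", ("src1valid", "t0")),
       ("<", ("src1tag", ("src1tag", "t0")), "t0"))
sets = instantiation_sets(phi, ["t"], psi)
```

With k bound variables the number of instances is the product of the set sizes, which is the exponential growth noted in the last bullet.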
47Updating Shadow Structures
- During the dispatch of new instruction
- I = ⟨src1, src2, dest, op⟩
- next[shdw.value] = λt. ITE(t = rob.tail, Alu(Rfisa(src1), Rfisa(src2), op), shdw.value(t))
- next[shdw.src1val] = λt. ITE(t = rob.tail, Rfisa(src1), shdw.src1val(t))
- next[shdw.src2val] = λt. ITE(t = rob.tail, Rfisa(src2), shdw.src2val(t))
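The dispatch-time update of the shadow structures can be sketched in Python with functions standing in for the shadow arrays. The `alu` stand-in returns a symbolic tuple instead of computing a value, and the dictionary layout, register-file contents, and instruction are assumptions for illustration.

```python
def alu(a, b, op):
    """Stand-in for the uninterpreted ALU: returns a symbolic result."""
    return ("alu", a, b, op)

def dispatch_shadow(shdw, rob_tail, rf_isa, instr):
    """Return the next shadow structures after dispatching instr at rob_tail."""
    src1, src2, _dest, op = instr
    v1, v2 = rf_isa(src1), rf_isa(src2)

    def updated(old, new):
        # λt. ITE(t = rob.tail, new, old(t))
        return lambda t: new if t == rob_tail else old(t)

    return {
        "value": updated(shdw["value"], alu(v1, v2, op)),
        "src1val": updated(shdw["src1val"], v1),
        "src2val": updated(shdw["src2val"], v2),
    }

# Hypothetical starting point: empty shadow arrays and an ISA register file r -> 10*r.
empty = {name: (lambda t: None) for name in ("value", "src1val", "src2val")}
rf_isa = lambda r: 10 * r
nxt = dispatch_shadow(empty, 7, rf_isa, (1, 2, 3, "add"))
```

After the update, the entry at the tail satisfies shdw.value(t) = Alu(shdw.src1val(t), shdw.src2val(t), op) by construction, while all other entries are unchanged.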
48Adding Shadow Structures
[Figure: processor diagram (PC, incr, program memory, decode) with shadow structures added]
49Refinement Maps
- For Register File Contents
- ∀r. reg.valid(r) ⇒ reg.val(r) = Rfisa(r)
- If a register is not being modified by any instruction in ROB, then the value matches the ISA value
- For Program Counter
- PCooo = PCisa
50Invariants
[Figure: reorder buffer fields: valid, value, src1valid, src1val, src1tag, src2valid, src2val, src2tag, dest, op]
51Burch-Dill Technique
- More automated than inductive invariant checking
- Does not require auxiliary structures
- Far fewer invariants than invariant checking
- Only 4 invariants compared to about 12 for inductive invariant checking approach
- Invariants on initial state Qooo
- Instructions only depend on instructions preceding in program order
- Tag in a rename-unit should be in the ROB, and the destination register should match
- For any entry, the destination should have reg.valid as false and tag should contain this or later instruction
- rob.head ≤ rob.tail ≤ rob.head + r
52Invariants
- Total 13 invariants required
- Refinement map for RF and PC (2)
- Shadow structure constraints (3)
- Tag Consistency invariants (2)
- Instructions only depend on instructions preceding in program order
- Circular Register Renaming invariants (2)
- Tag in a rename-unit should be in the ROB, and the destination register should match
- ∀r. ¬reg.valid(r) ⇒ (rob.head ≤ reg.tag(r) < rob.tail ∧ rob.dest(reg.tag(r)) = r)
- For any entry, the destination should have reg.valid as false and tag should contain this or later instruction
- ∀_rob t. (¬reg.valid(rob.dest(t)) ∧ t ≤ reg.tag(rob.dest(t)) < rob.tail)