Title: Formal Processor Verification
1 Decision Procedures Customized for Formal Verification
Randal E. Bryant
Carnegie Mellon University
http://www.cs.cmu.edu/~bryant
Contributions by former graduate students Sanjit
Seshia, Shuvendu Lahiri
2 Outline
- Context
  - Infinite state models of hardware systems
  - Verification techniques
- Needs
  - Requirements for decision procedures
  - Dealing with quantifiers
- Our Solution
  - SAT-based procedure
  - Eager Boolean encoding
3 Verification Example
- Task
  - Verify that microprocessor correctly implements instruction set definition
  - Even though heavily pipelined
[Figure: Alpha 21264 Microprocessor. Source: Microprocessor Report, Oct. 28, 1996]
4 Existing Hardware Verification Methods
- Simulators, equivalence checkers, model checkers, ...
- All Operate at Bit Level
  - View each register or memory bit as state variable
  - Behavior of each state variable defined by Boolean function
- Strengths
  - Finite-state systems conceptually simple
  - BDDs & SAT procedures allow high degrees of automation
- Limitations
  - State space can be very large
  - Only verify fixed instantiation of system
    - Specific memory sizes, number of processes, buffer lengths, ...
5 Verification Challenges
- Sources of Complexity
  - Lots of internal state
  - Complex control logic
- Opportunities
  - Most of the logic serves to store, select, and communicate data
[Figure: Alpha 21264 Microprocessor. Source: Microprocessor Report, Oct. 28, 1996]
6 Applying Data Abstraction to Hardware Verification
- Idea
  - Abstract details of data encodings and operations
  - Keep control logic precise
- Applications
  - Verify overall correctness of system
  - Assuming individual functional units correct
- Advantages of Abstraction
  - Abstract infinite-state system easier to verify than detailed finite-state one
  - Parametric representation allows verification of many different system variants
    - Arbitrary number of processes, buffer lengths, etc.
7 Word Abstraction
[Figure: Control Logic]
- Data: Abstract details of form & functions
- Control: Keep at bit level
- Timing: Keep at cycle level
8 Data Abstraction 1: Bits → Terms
[Figure: data bits x0, x1, x2, ..., xn-1]
- View Data as Symbolic Words
  - Arbitrary integers
    - No assumptions about size or encoding
    - Classic model for reasoning about software
  - Can store in memories & registers
9 Modeling Data Selection
- If-Then-Else Operation
  - Multiplexor
  - Allows control-dependent data flow
10 Abstracting Data Bits
[Figure: Control Logic]
11 Abstraction 2: Uninterpreted Functions
[Figure: block f]
- For any Block that Transforms or Evaluates Data
  - Replace with generic, unspecified function
  - Only assumed property is functional consistency
    - a = x ∧ b = y ⇒ f(a, b) = f(x, y)
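Functional consistency can be illustrated with a small Python sketch. The class name `UninterpretedFunction` and its tagging scheme are assumptions made for illustration only. The sketch picks one arbitrary interpretation (a fresh value for each new argument tuple), whereas a true uninterpreted function stands for every possible interpretation.

```python
class UninterpretedFunction:
    """One arbitrary interpretation of an unspecified function (hypothetical helper)."""

    def __init__(self, name):
        self.name = name
        self.table = {}  # argument tuple -> arbitrary result chosen on first use

    def __call__(self, *args):
        # Same arguments always give the same result: functional consistency.
        if args not in self.table:
            self.table[args] = (self.name, len(self.table))
        return self.table[args]


f = UninterpretedFunction("f")
a, b = 3, 7
x, y = 3, 7
# a = x and b = y, so f(a, b) must equal f(x, y)
consistent = (f(a, b) == f(x, y))
```

Nothing else about `f` is assumed, so any property proved with it holds for whatever logic block it replaces.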
12 Abstracting Functions
[Figure: Control Logic; Data Path with blocks labeled Com. Log. 1]
- For Any Block that Transforms Data
  - Replace by uninterpreted function
  - Ignore detailed functionality
  - Conservative approximation of actual system
13 Modeling Data-Dependent Control
[Figure labels: Branch?, Cond, Adata, Bdata, Branch Logic, p]
- Model by Uninterpreted Predicate
  - Yields arbitrary Boolean value for each control + data combination
  - Produces same result when arguments match
  - Pipeline & reference model will branch under same conditions
14 Abstraction 3: Modeling Memories as Mutable Functions
- Memory M Modeled as Function
  - M(a): Value at location a
- Initially
  - Arbitrary state
  - Modeled by uninterpreted function m0
15 Effect of Memory Write Operation
- Writing Transforms Memory
  - M′ = Write(M, wa, wd)
  - Reading from updated memory
    - Address wa will get wd
    - Otherwise get what's already in M
- Express with Lambda Notation
  - M′ = λ a . ITE(a = wa, wd, M(a))
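The lambda formulation of a write maps directly onto closures. The following is a minimal Python sketch; the helper names `ite`, `write`, and `m0`, and the tagged tuple standing in for the arbitrary initial contents, are illustrative assumptions.

```python
def ite(cond, then_val, else_val):
    """If-then-else on already evaluated values."""
    return then_val if cond else else_val


def write(mem, wa, wd):
    """Return M' = lambda a . ITE(a = wa, wd, M(a)) without modifying M."""
    return lambda a: ite(a == wa, wd, mem(a))


def m0(a):
    # Stand-in for the uninterpreted initial memory: an opaque value per address.
    return ("m0", a)


m1 = write(m0, 5, 42)   # write 42 to address 5
m2 = write(m1, 9, 7)    # then write 7 to address 9
```

Reading `m2(5)` gives 42 and `m2(9)` gives 7, while any other address falls through to the initial contents `m0`.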
16 Systems with Buffers
[Figure: Circular Queue; Unbounded Buffer]
- Modeling Method
  - Mutable function to describe buffer contents
  - Integers to represent head & tail pointers
  - Parameterize buffer capacity with symbolic value Max
17 Some History of Term-Level Modeling
- Historically
  - Standard model used for program verification
    - Unbounded integer data types
  - Widely used with theorem-proving approaches to hardware verification
    - E.g., Hunt '85
- Automated Approaches to Hardware Verification
  - Burch & Dill, '95
    - Tool for verifying pipelined microprocessors
    - Implemented by form of symbolic simulation
  - Continued application to pipelined processor verification
18 UCLID
- Seshia, Lahiri, Bryant, CAV '02
- Term-Level Verification System
  - Language for describing systems
    - Inspired by CMU SMV
  - Symbolic simulator
    - Generates integer expressions describing system state after sequence of steps
  - Decision procedure
    - Determines validity of formulas
  - Support for multiple verification techniques
- Available by Download
  - http://www.cs.cmu.edu/~uclid
19 Required Logic
- Scalar Data Types
  - Formulas (F): Boolean Expressions
    - Control signals
  - Terms (T): Integer Expressions
    - Data values
- Functional Data Types
  - Functions (Fun): Integer → Integer
    - Immutable: Functional units
    - Mutable: Memories
  - Predicates (P): Integer → Boolean
    - Immutable: Data-dependent control
    - Mutable: Bit-level memories
20 CLU Logic
- Counter Arithmetic, Lambda Expressions and Uninterpreted Functions
- Terms (T): Integer Expressions
  - ITE(F, T1, T2): If-then-else
  - Fun(T1, …, Tk): Function application
  - succ(T): Increment
  - pred(T): Decrement
- Formulas (F): Boolean Expressions
  - ¬F, F1 ∧ F2, F1 ∨ F2: Boolean connectives
  - T1 = T2: Equation
  - T1 < T2: Inequality
  - P(T1, …, Tk): Predicate application
21 CLU Logic (Cont.)
- Functions (Fun): Integer → Integer
  - f: Uninterpreted function symbol
  - λ x1, …, xk . T: Function definition
- Predicates (P): Integer → Boolean
  - p: Uninterpreted predicate symbol
  - λ x1, …, xk . F: Predicate definition
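To make the CLU syntax concrete, here is a small evaluator sketch in Python. The nested-tuple representation and the operator tags (`"var"`, `"app"`, `"papp"`, and so on) are assumptions chosen for illustration, not UCLID's input language. Function and predicate symbols are interpreted by ordinary Python callables supplied in `env`, and lambda definitions are left out for brevity.

```python
def eval_term(t, env):
    """Evaluate a CLU term (nested tuple) to an integer under environment env."""
    op = t[0]
    if op == "var":
        return env[t[1]]
    if op == "succ":                      # increment
        return eval_term(t[1], env) + 1
    if op == "pred":                      # decrement
        return eval_term(t[1], env) - 1
    if op == "ite":                       # ITE(F, T1, T2)
        return eval_term(t[2], env) if eval_formula(t[1], env) else eval_term(t[3], env)
    if op == "app":                       # Fun(T1, ..., Tk)
        return env[t[1]](*(eval_term(a, env) for a in t[2:]))
    raise ValueError(f"unknown term operator: {op}")


def eval_formula(f, env):
    """Evaluate a CLU formula (nested tuple) to a Boolean under environment env."""
    op = f[0]
    if op == "not":
        return not eval_formula(f[1], env)
    if op == "and":
        return eval_formula(f[1], env) and eval_formula(f[2], env)
    if op == "or":
        return eval_formula(f[1], env) or eval_formula(f[2], env)
    if op == "eq":                        # T1 = T2
        return eval_term(f[1], env) == eval_term(f[2], env)
    if op == "lt":                        # T1 < T2
        return eval_term(f[1], env) < eval_term(f[2], env)
    if op == "papp":                      # P(T1, ..., Tk)
        return bool(env[f[1]](*(eval_term(a, env) for a in f[2:])))
    raise ValueError(f"unknown formula operator: {op}")


# ITE(x < y, succ(x), f(y))
example = ("ite", ("lt", ("var", "x"), ("var", "y")),
           ("succ", ("var", "x")),
           ("app", "f", ("var", "y")))
```

A decision procedure asks whether a formula holds under every such environment; the evaluator only checks one environment at a time.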
22 Outline
- Context
  - Infinite state models of hardware systems
  - Verification techniques
- Needs
  - Requirements for decision procedures
  - Dealing with quantifiers
- Our Solution
  - SAT-based procedure
  - Eager Boolean encoding
23 Verifying Safety Properties
[Figure: Reset States, Reachable States, Bad States; Reset]
- State Machine Model
  - State encoded as Booleans, integers, and functions
  - Next-state function expresses how state is updated on each step
- Prove: System will never reach bad state
24 Bounded Model Checking
[Figure: Reset States, R1, R2, Bad States]
- Repeatedly Perform Image Computations
  - Set of all states reachable by one more state transition
- Underapproximation of Reachable State Set
  - But, typically catch most bugs with 8–10 steps
25 Implementing BMC
[Figure label: Satisfiable?]
- Construct verification condition formula for step n by symbolically simulating system for n cycles
- Check with decision procedure
- Do as many cycles as tractable
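The bounded search can be sketched in Python with explicit state sets. This swaps in explicit enumeration for the symbolic simulation and decision procedure described above, so it only illustrates the control structure. The function name `bounded_check` and the toy counter system are assumptions.

```python
def bounded_check(reset_states, successors, is_bad, max_steps):
    """Return the first step count at which a bad state is reachable, or None.

    frontier holds the states reachable in exactly `step` transitions, i.e. the
    result of repeated image computations starting from the reset states.
    """
    frontier = set(reset_states)
    for step in range(max_steps + 1):
        if any(is_bad(s) for s in frontier):
            return step
        frontier = {t for s in frontier for t in successors(s)}
    return None


# Toy system (assumed): an integer register that can be incremented or doubled.
def toy_successors(s):
    return {s + 1, s * 2}


first_bad_step = bounded_check({1}, toy_successors, lambda s: s == 10, max_steps=10)
```

In this toy system the bad state 10 first becomes reachable after 4 steps; with a smaller bound the search reports nothing, which is why the result is an underapproximation.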
26 True Model Checking
[Figure: Reset States, R1, R2, Bad States]
- Reach Fixed-Point
  - Rn = Rn+1 = Reachable
- Impractical for Term-Level Models
  - Many systems never reach fixed point
    - Can keep adding elements to buffer
  - Convergence test undecidable
    - (Bryant, Lahiri, Seshia, CHARME '03)
27 Inductive Invariant Checking
[Figure: Reset States, Reachable States, Bad States]
- Key Properties of System that Make it Operate Correctly
  - Formulate as formula I
- Prove Inductive
  - Holds initially: I(s0)
  - Preserved by all state changes: I(s) ⇒ I(δ(i, s))
28 Inductive Invariants
- Formulas I1, …, In
  - Ij(s0) holds for any initial state s0, for 1 ≤ j ≤ n
  - I1(s) ∧ I2(s) ∧ … ∧ In(s) ⇒ Ij(s′) for any current state s and successor state s′, for 1 ≤ j ≤ n
- Overall Correctness
  - Follows by induction on time
- Restricted form of invariants
  - ∀x1∀x2…∀xk Φ(x1…xk)
  - Φ(x1…xk) is a CLU formula without quantifiers
  - x1…xk are integer variables free in Φ(x1…xk)
  - Express properties that hold for all buffer indices, register IDs, etc.
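The two proof obligations for an inductive invariant (holds initially, preserved by every step) can be sketched in Python. The queue-pointer system, the names `step`, `inv`, and `is_inductive`, and the finite sample of states are all assumptions for illustration. Checking a finite sample only gives evidence; a real proof discharges the obligations with a decision procedure over all states.

```python
from itertools import product


def step(inp, state):
    """Next-state function for a toy queue with integer head and tail pointers."""
    head, tail = state
    if inp == "enq":
        return (head, tail + 1)
    if inp == "deq" and head < tail:
        return (head + 1, tail)
    return state


def inv(state):
    """Candidate invariant: the head pointer never passes the tail pointer."""
    head, tail = state
    return head <= tail


def is_inductive(candidate, step_fn, init_states, inputs, sample_states):
    """Check both obligations on the given states."""
    base = all(candidate(s) for s in init_states)
    preserved = all(candidate(step_fn(i, s))
                    for s in sample_states if candidate(s)
                    for i in inputs)
    return base and preserved


sample = list(product(range(-3, 4), repeat=2))
inputs = ["enq", "deq"]
holds = is_inductive(inv, step, [(0, 0)], inputs, sample)
# A property that holds initially but is not preserved by "enq":
too_strong = is_inductive(lambda s: s[1] - s[0] <= 2, step, [(0, 0)], inputs, sample)
```

Here `holds` is true while `too_strong` is false, because enqueueing from state (0, 2) breaks the second candidate.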
29 Proving Invariants
- Proving invariants inductive requires quantifiers
  - ∀x1∀x2…∀xk Φ(x1…xk) ⇒ ∀y1∀y2…∀ym Ψ(y1…ym)
- Prove unsatisfiability of formula
  - ∀x1∀x2…∀xk Φ(x1…xk) ∧ ¬Ψ(y1…ym)
- Undecidable Problem
  - In logic with uninterpreted functions and equality
30 Invariant Checking: Out-of-Order Processor Designs

                  base    exc     exc / br   exc / br / mem-simp   exc / br / mem
Total Invariants  13      34      39         67                    71
UCLID time        54 s    236 s   403 s      1594 s                2200 s
Person time       2 days  7 days  9 days     24 days               34 days

- Generating invariants requires considerable human effort
  - Impractical for realistic designs
31 Constructing Invariants from Predicates
- Predicates
  - rob.head ≤ reg.tag(r)
  - reg.valid(r)
  - reg.tag(r) = t
  - rob.dest(t) = r
- Invariant
  - ∀r,t. ¬reg.valid(r) ∧ reg.tag(r) = t ⇒ (rob.head ≤ reg.tag(r) < rob.tail ∧ rob.dest(t) = r)
- Result: Correctness
32 Automatic Predicate Abstraction
- Graf & Saïdi, CAV '97
- Idea
  - Given set of predicates P1(s), …, Pk(s)
    - Boolean formulas describing properties of system state
  - View as abstraction mapping: States → {0,1}^k
  - Defines abstract FSM over state set {0,1}^k
    - Form of abstract interpretation
  - Do reachability analysis similar to symbolic model checking
- Early Implementations Inefficient
  - Guess at possible next abstract states
  - Test with call to decision procedure
33 P.A. as Invariant Generator
[Figure: Abstract System]
- Reach Fixed-Point on Abstract System
  - Termination guaranteed, since finite state
- Equivalent to Computing Invariant for Concrete System
  - Strongest possible invariant that can be expressed by formula over these predicates
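A minimal Python sketch of the abstract fixed-point computation follows. It assumes a small finite concrete system so that the abstract image can be found by enumeration, where a real tool would use a decision procedure. The function name `abstract_reach`, the mod-16 counter, and the two predicates are illustrative assumptions.

```python
def abstract_reach(concrete_states, init_states, step, predicates):
    """Compute the reachable abstract states under the given predicates.

    Each abstract state is the tuple of predicate values for a concrete state.
    The loop terminates because there are at most 2^k abstract states.
    """
    def alpha(s):
        return tuple(p(s) for p in predicates)

    reached = {alpha(s) for s in init_states}
    while True:
        # Abstract image: abstractions of successors of every concrete state
        # whose abstraction is already reached.
        image = {alpha(step(s)) for s in concrete_states if alpha(s) in reached}
        if image <= reached:
            return reached
        reached |= image


# Toy concrete system (assumed): a counter that adds 2 modulo 16, starting at 0.
def is_even(x):
    return x % 2 == 0


def below_8(x):
    return x < 8


abstract_states = abstract_reach(range(16), [0], lambda x: (x + 2) % 16,
                                 [is_even, below_8])
```

The result contains only abstract states where `is_even` is true, so "the counter is even" is the invariant obtained over these predicates.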
34 Symbolic Formulation of Predicate Abstraction
Lahiri, Bryant, Cook, CAV '03
- Basic Operation
  - Compute set of legal abstract next states ρ′(B′) given current abstract states ρ(B)
    - B, B′: Abstract current-state and next-state variables
    - ρ, ρ′: Boolean formulas
  - Create formula of form φ(S, B′)
    - Possible combinations of current concrete state S and next abstract state B′
- Formulate as Quantifier Elimination Problem
  - Generate formula of form ρ′(B′) ⇔ ∃S φ(S, B′)
    - S: Integer variables
  - For interpretation of B′, formula ρ′ true iff φ(S, B′) satisfiable
35 Outline
- Context
  - Infinite state models of hardware systems
  - Verification techniques
- Needs
  - Requirements for decision procedures
  - Dealing with quantifiers
- Our Solution
  - SAT-based procedure
  - Eager Boolean encoding
36 Decision Procedure Needs
- Bounded Model Checking
  - Satisfiability of quantifier-free CLU formula
  - Handled by decision procedure
- Invariant Checking
  - Satisfiability of quantified CLU formula
  - Undecidable
- Predicate Abstraction
  - Eliminate quantifiers from CLU formula
- Role of Decision Procedure
  - Apply in sound, but incomplete way
37 UCLID Decision Procedure Operation
- Series of transformations leading to propositional formula
  - Except for lambda expansion, each has polynomial complexity
[Figure: CLU Formula → Lambda Expansion → λ-free Formula → Function & Predicate Elimination → Term Formula → Finite Instantiation → Boolean Formula → Boolean Satisfiability]
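The function elimination step can be illustrated as follows. The slides do not say which scheme is used, so this sketch assumes the nested-ITE approach: the i-th application of f becomes ITE(arg_i = arg_1, v1, ITE(arg_i = arg_2, v2, ... v_i)) over fresh values v1, v2, .... The Python code works on concrete argument values rather than building symbolic expressions, and the name `eliminate_applications` is hypothetical.

```python
def eliminate_applications(arg_values, fresh_values):
    """Give each application f(arg_values[i]) a value using nested ITEs.

    fresh_values[i] is the fresh value introduced for the i-th application.
    An application reuses the fresh value of the earliest application with
    an equal argument, which enforces functional consistency.
    """
    results = []
    for i, arg in enumerate(arg_values):
        value = fresh_values[i]  # innermost "else" case: no earlier match
        # Build the ITE chain from the inside out, so the comparison with the
        # earliest application ends up outermost.
        for j in range(i - 1, -1, -1):
            value = fresh_values[j] if arg == arg_values[j] else value
        results.append(value)
    return results


# f(3), f(7), f(3): the third application must agree with the first.
values = eliminate_applications([3, 7, 3], ["v1", "v2", "v3"])
```

After this step the formula mentions only the fresh values and argument comparisons, so no function symbols remain.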
38 SAT-based Decision Procedures
39 Eager Encoding Characteristics
- Must encode all information about domain properties into Boolean formula
  - Some properties can give exponential blowup
- Lets SAT solver do all of the work
- Good Approach for Some Domains
  - Modern SAT solvers have remarkable capacity
  - Good at extracting relevant portions out of very large formulas
  - Learns about formula properties as search proceeds
40 Advances in Eager SAT Encodings
- Per-constraint encoding of EUF (Equality & Uninterp. Functs.)
  - Goel, et al., CAV '98
- Exploit polarity structure of equations
  - Bryant, German, Velev, CAV '99
- Reduce variable ranges in small-domain encodings
  - Pnueli, Rodeh, Shtrichman, Siegel, CAV '99
- Sparse encoding of transitivity constraints
  - Bryant, Velev, CAV '00
- Select encoding method using criteria trained by machine learning
  - Lahiri, Seshia, Bryant, DAC '03
- Exploit sparseness in linear constraints
  - Seshia, Bryant, LICS '04
41 Encoding Methods
[Figure: Difference Logic Formula; Small Domain Encoding (SD); Per-Constraint Encoding (PC)]
42 Small Domain Encoding (SD)
Bryant, Lahiri, Seshia, CAV '02
x ≥ y ∧ y ≥ z ∧ z ≥ x+1
⟨0 x1 x0⟩ ≥ ⟨0 y1 y0⟩ ∧ ⟨0 y1 y0⟩ ≥ ⟨0 z1 z0⟩ ∧ ⟨0 z1 z0⟩ ≥ ⟨0 x1 x0⟩+1
- Observation
  - To check satisfiability, need to consider all possible relative orderings of finitely many expressions
  - Can use Boolean encoding of finite range of values
    - 4 values in this case, so 2-bit encoding
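The small-domain idea can be sketched in Python by searching only the finite range of values. This sketch reads the comparisons in the three-variable example as ≥, which is an assumption since the relation symbols did not survive in the slide text, and it uses plain enumeration where the actual method hands a Boolean encoding to a SAT solver. The name `small_domain_sat` is hypothetical.

```python
from itertools import product


def small_domain_sat(constraint, num_vars, domain_size):
    """Search for a satisfying assignment with every variable in 0..domain_size-1."""
    for values in product(range(domain_size), repeat=num_vars):
        if constraint(*values):
            return values
    return None


# Three variables and an offset of 1: a range of 4 values (2 bits each) suffices.
unsat_example = small_domain_sat(
    lambda x, y, z: x >= y and y >= z and z >= x + 1, num_vars=3, domain_size=4)

# A satisfiable variant (assumed for contrast): z + 1 >= x instead of z >= x + 1.
sat_example = small_domain_sat(
    lambda x, y, z: x >= y and y >= z and z + 1 >= x, num_vars=3, domain_size=4)
```

The first search finds no assignment, and by the small-model argument no assignment over unbounded integers exists either.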
43 Per-Constraint Encoding (PC)
Strichman, Seshia, Bryant, CAV '02
x ≥ y ∧ y ≥ z ∧ z ≥ x+1
44 Size of Boolean Encoding: SD better than PC
- Let N be size of original difference logic formula
  - Size of a directed acyclic graph representation
- SD encoding size is worst-case O(N^2)
- PC encoding size is worst-case O(2^N)
  - Can generate O(2^N) transitivity constraints
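Where transitivity constraints come from can be sketched in Python. In a per-constraint encoding each difference constraint gets its own Boolean variable, and extra clauses rule out combinations that cannot hold together. This simplified sketch handles only constraints asserted true, finds conflicts by checking every subset with a Bellman-Ford style test, and reads the running example with ≥; all three choices are assumptions. The subset loop also shows why the number of constraints can grow exponentially.

```python
from itertools import combinations


def consistent(constraints):
    """True iff the conjunction of constraints (a, b, c), meaning a >= b + c, is satisfiable.

    a >= b + c is an edge a -> b with weight -c; the conjunction is
    unsatisfiable exactly when the graph has a negative cycle.
    """
    nodes = {v for a, b, _ in constraints for v in (a, b)}
    dist = {v: 0 for v in nodes}  # as if a virtual source reached every node at cost 0
    for _ in range(len(nodes)):
        changed = False
        for a, b, c in constraints:
            if dist[a] - c < dist[b]:
                dist[b] = dist[a] - c
                changed = True
        if not changed:
            return True
    return False  # still relaxing after |nodes| rounds: negative cycle


def transitivity_clauses(constraints):
    """Minimal index sets of constraints that cannot all be true together."""
    clauses = []
    for k in range(1, len(constraints) + 1):
        for subset in combinations(range(len(constraints)), k):
            if any(set(c) <= set(subset) for c in clauses):
                continue  # already excluded by a smaller conflict
            if not consistent([constraints[i] for i in subset]):
                clauses.append(subset)
    return clauses


# e0: x >= y, e1: y >= z, e2: z >= x + 1
example = [("x", "y", 0), ("y", "z", 0), ("z", "x", 1)]
conflicts = transitivity_clauses(example)
```

For the example the only conflict is all three constraints together, giving the single clause ¬(e0 ∧ e1 ∧ e2).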
45 Impact on SAT Problem: SD vs. PC
- Experimentally compared zChaff performance on SD and PC encodings of several unsatisfiable formulas
- Sample result

Method  Boolean variables  CNF Clauses  Conflict Clauses  zChaff Time (sec)
PC      57211              169387       150               0.56
SD      23112              67699        15811             21.63

- PC better than SD for zChaff
46 How to Choose Encoding
- Hybrid Strategy
  - Partition variables into classes
    - Which ones are compared to each other
  - For each class, choose encoding method
    - PC, except SD when PC blows up
- How to Determine Whether PC Will Work
  - Try to predict based on formula characteristics
    - Number of constraints, density, ...
  - Selection procedure trained by machine learning
47 Some Lessons We've Learned About Decision Procedures
- Preserve Boolean Structure
  - Other approaches require collapsing to conjunctions of predicates (or extracting them dynamically)
- Exploit Problem Characteristics
  - Sparseness
  - Polarity structure
- Let SAT Solver Do the Work
  - Eager encoding: provide sufficient set of constraints to prove / disprove formula
  - SAT solvers are good at digesting large volumes of information
48 Invariant Checking Revisited
- Prove Unsatisfiability of Formula
  - ∀x1∀x2…∀xk Φ(x1…xk) ∧ ¬Ψ(y1…ym)
  - General Form: ∀X Φ(X) ∧ ¬Ψ(Y)
- Quantifier Instantiation
  - Generate expressions E1(Y), …, En(Y)
    - Using terms that appear in Q
  - Expand as Φ(E1(Y)) ∧ … ∧ Φ(En(Y)) ∧ ¬Ψ(Y)
  - If unsatisfiable, then so is quantified formula
  - Sound, but incomplete
- Trade-off
  - Be clever about instantiation, or
  - Instantiate many terms and rely on decision procedure capacity
49 Predicate Abstraction Revisited
- Formulate as Quantifier Elimination Problem
  - Generate formula of form ρ′(B′) ⇔ ∃S φ(S, B′)
    - S: Integer variables
- Use Eager SAT Encoding of φ
  - Get formula ∃A P(A, B′)
    - A: Boolean variables
  - Satisfying solutions for P w.r.t. B′ same as those for φ
- Core problem of symbolic model checking
50 Quantifier Elimination for P.A.
- Formula ∃A P(A, B′)
  - A: Boolean variables
  - Typically 200 variables for A, 20 for B
- BDD-Based
  - Use partitioning techniques developed for symbolic model checking
  - Typically too many total Boolean variables
- SAT Enumeration
  - Find satisfying solution α(A) ∧ β(B′) to P
  - Enumerate solution β(B′)
  - Reformulate P as P ∧ ¬β(B′)
  - Performance: about 1000 solutions / second
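The SAT enumeration loop can be sketched in Python. A brute-force search stands in for the SAT solver, and the names `solve`, `enumerate_projection`, and the toy formula `toy_p` are assumptions for illustration. Each round finds a model of P, records its projection onto the B variables, and then blocks that projection before searching again.

```python
from itertools import product


def solve(formula, variables):
    """Brute-force stand-in for a SAT solver: return a model dict or None."""
    for values in product([False, True], repeat=len(variables)):
        model = dict(zip(variables, values))
        if formula(model):
            return model
    return None


def enumerate_projection(formula, a_vars, b_vars):
    """Enumerate all assignments to b_vars that extend to a model of formula."""
    blocked = []  # projections found so far

    def current(model):
        # P conjoined with the negation of every projection already enumerated
        return formula(model) and tuple(model[b] for b in b_vars) not in blocked

    while True:
        model = solve(current, a_vars + b_vars)
        if model is None:
            return blocked
        blocked.append(tuple(model[b] for b in b_vars))


# Toy P(A, B): b1 is the AND of a1 and a2, and b2 is their OR.
def toy_p(m):
    return m["b1"] == (m["a1"] and m["a2"]) and m["b2"] == (m["a1"] or m["a2"])


solutions = enumerate_projection(toy_p, ["a1", "a2"], ["b1", "b2"])
```

The loop runs once per solution over B, so its cost grows with the number of abstract states rather than with the number of models of P.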
51 Why Verification Tasks Feasible
- CLU Logic Fairly Simple
  - Equality, uninterpreted functions, difference constraints
  - Small model property
- Deep Reasoning Not Required
  - Formulas large and messy, but straightforward
  - Verifying systems that are designed to have constrained behaviors
  - Only checking effect of a few cycles of system operation
52 Decision Procedures Revisited
- SAT-Based Approaches Effective
  - Good performance as decision procedures
  - Key to implementing predicate abstraction
    - Quantifier elimination
- Eager Encoding Gives Good Performance
  - Avoids many iterations of theory-specific checkers
  - Extends to linear integer arithmetic
    - Seshia & Bryant, LICS '04
    - Quantifier-free Presburger
    - Small domain encoding exploiting sparseness
53 Areas of Research
- Bit-Vector Decision Procedures
  - True model for hardware & low-level software
    - Bit-field extraction
    - Bit-wise Boolean operations
    - Overflow effects
  - Automatically apply abstractions
    - Abstract to symbolic terms whenever possible
- Boolean Quantifier Elimination
  - SAT enumeration still not good enough
    - Limits predicate abstraction to 25 predicates
  - Core problem for symbolic model checking
54 More Research
- Proof Generation
  - Hard to see how to generate unsatisfiability proof for CLU formula
- Debugging Support
  - Bounded model checking: provides counterexample trace
  - Invariant checking: hard to determine why invariant fails
    - And may be due to weakness in quantifier instantiation
  - Predicate abstraction: gets nowhere without right set of predicates
- Proving Liveness
  - Current abstractions do not preserve liveness properties
  - Can help in proving progress invariant