Title: Using First-Order Theorem Provers in Data Structure Verification
1Using First-Order Theorem Provers in Data Structure Verification
- Charles Bouillaguet (Ecole Normale Supérieure, Cachan, France)
- Viktor Kuncak, Martin Rinard (MIT CSAIL)
2Implementing Data Structures is Hard
- Often small, but complex code
- Lots of pointers
- Unbounded, dynamic allocation
- Complex shape invariants
- Dag
- Properties involving arithmetic (ordering)
- Need strong invariants to guarantee correctness
- e.g. lookup in ordered tree needs sortedness
3How to obtain reliable data structure implementations?
- Approach
- Prove that the program is correct
- For all program executions (sound)
- Verified properties
- Data structure operations do not crash
- Data structure invariants are preserved
- Data structure content is correctly updated
4Infrastructure
- Jahob system for verifying data structure implementations
- Kuncak, Wies, Zee, Rinard, Nguyen, Bouillaguet, Schmitt, Marnette, Bugrara
- Analyzed programs: subset of Java
- Specification: subset of Isabelle's language
5Summary of Verified Data Structures
- Implementations of relations
- Add a binding
- Remove all bindings for a given key
- Test key membership
- Retrieve data bound to a key
- Test emptiness
- Verified implementations
- Linked list
- Ordered tree
- Hash table
6An Example: Ordered Trees
- Implementation of a finite map
- Operations
- insert
- lookup
- remove
- Representation invariants
- tree shaped (acyclicity, unique parent)
- ordering constraints
[Figure: tree node with key, value, left, and right fields]
7Sample code
public static FuncTree update(int k, Object v, FuncTree t)
{
    FuncTree new_left, new_right;
    Object new_data;
    int new_key;
    if (t == null) {
        new_data = v; new_key = k;
        new_left = null; new_right = null;
    } else {
        if (k < t.key) {
            new_left = update(k, v, t.left);
            new_right = t.right;
            new_key = t.key; new_data = t.data;
        } else if (t.key < k) {
            ...
        } else {
            new_data = v; new_key = k;
            new_left = t.left; new_right = t.right;
        }
    }
    FuncTree r = new FuncTree();
    r.left = new_left; r.right = new_right;
    r.data = new_data; r.key = new_key;
    return r;
}
8Sample code
public static FuncTree update(int k, Object v, FuncTree t)
/*: requires "v ~= null"
    ensures "result..content = t..content - {(x,y). x=k} + {(k,v)}" */
{
    FuncTree new_left, new_right;
    Object new_data;
    int new_key;
    if (t == null) {
        new_data = v; new_key = k;
        new_left = null; new_right = null;
    } else {
        if (k < t.key) {
            new_left = update(k, v, t.left);
            new_right = t.right;
            new_key = t.key; new_data = t.data;
        } else if (t.key < k) {
            ...
        } else {
            new_data = v; new_key = k;
            new_left = t.left; new_right = t.right;
        }
    }
    FuncTree r = new FuncTree();
    r.left = new_left; r.right = new_right;
    r.data = new_data; r.key = new_key;
    return r;
}
- no null dereferences
- 3 lines spec, 30 lines code
- postcondition holds and invariants preserved
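The update method can be completed into a small runnable class. This is only a sketch: the branch for t.key < k is elided on the slide and is filled in here by symmetry with the k < t.key branch (an assumption), the lookup body is a hypothetical implementation of the interface method, and the Jahob annotations are left out, so nothing here is verified.

```java
// Runnable sketch of the functional ordered tree (plain Java, no Jahob annotations).
final class FuncTree {
    private int key;
    private Object data;
    private FuncTree left, right;

    // Returns a new tree equal to t except that key k is bound to v; t is not modified.
    public static FuncTree update(int k, Object v, FuncTree t) {
        FuncTree newLeft, newRight;
        Object newData;
        int newKey;
        if (t == null) {
            newData = v; newKey = k;
            newLeft = null; newRight = null;
        } else if (k < t.key) {
            newLeft = update(k, v, t.left);
            newRight = t.right;
            newKey = t.key; newData = t.data;
        } else if (t.key < k) {
            // Assumption: symmetric to the branch above (elided on the slide).
            newLeft = t.left;
            newRight = update(k, v, t.right);
            newKey = t.key; newData = t.data;
        } else {
            newData = v; newKey = k;
            newLeft = t.left; newRight = t.right;
        }
        FuncTree r = new FuncTree();
        r.left = newLeft; r.right = newRight;
        r.data = newData; r.key = newKey;
        return r;
    }

    // Hypothetical lookup: returns the value bound to k, or null if k is unbound.
    // It relies on the ordering invariants to search only one subtree.
    public static Object lookup(int k, FuncTree t) {
        while (t != null) {
            if (k < t.key) t = t.left;
            else if (t.key < k) t = t.right;
            else return t.data;
        }
        return null;
    }

    public static void main(String[] args) {
        FuncTree t = update(2, "two", update(1, "one", update(3, "three", null)));
        FuncTree u = update(2, "TWO", t); // rebinding key 2 leaves t untouched
        System.out.println(lookup(2, t) + " " + lookup(2, u) + " " + lookup(4, u));
    }
}
```

Because update copies the nodes along the search path instead of mutating them, the old tree t stays valid after the call, which is why the postcondition can relate result..content to t..content directly.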
9Ordered tree interface
public ghost specvar content :: "(int * obj) set" = "{}";

public static FuncTree empty_set()
    ensures "result..content = {}"

public static FuncTree add(int k, Object v, FuncTree t)
    requires "v ~= null & (ALL y. (k,y) ~: t..content)"
    ensures "result..content = t..content Un {(k,v)}"

public static FuncTree update(int k, Object v, FuncTree t)
    requires "v ~= null"
    ensures "result..content = t..content - {(x,y). x=k} + {(k,v)}"

public static Object lookup(int k, FuncTree t)
    ensures "((k, result) : t..content) |
             (result = null & (ALL v. (k,v) ~: t..content))"

public static FuncTree remove(int k, FuncTree t)
    ensures "result..content = t..content - {(x,y). x=k}"
10Representation Invariants
public final class FuncTree {
    private int key;
    private Object data;
    private FuncTree left, right;

    /*: public ghost specvar content :: "(int * obj) set";

        invariant ("content definition") "this ~= null -->
            content = {(key, data)} Un left..content Un right..content"

        invariant ("null implies empty") "this = null --> content = {}"

        invariant ("left children are smaller")
            "ALL k v. (k,v) : left..content --> k < key"

        invariant ("right children are bigger")
            "ALL k v. (k,v) : right..content --> k > key" */
}
Callouts:
- abstract set-valued field
- tuples
- implicit universal quantification over this
- equality between sets
- arithmetic
- explicit quantification
11How could these properties be verified?
12Standard Approach
- Example of an interactive proof script (Coq):
eauto intros . intuition subst . apply
Extensionality_Ensembles. unfold Same_set.
unfold Included. unfold In. unfold In in
H1. intuition. destruct H0. destruct (eq_nat_dec
x1 ArraySet_size).subst. rewrite
arraywrite_match in H0 auto. intuition. subst.
apply Union_intror. auto with sets. assert (x1 lt
ArraySet_size). omega. clear n. apply
Union_introl. rewrite arraywrite_not_same_i in
H0.unfold In. exists x1. intuition.omega.
inversion H0 subst clear H0. unfold In in
H3. destruct H3. exists x1. intuition. rewrite
arraywrite_not_same_i. intuition omega. omega.
exists ArraySet_size. intuition. inversion H3.
subst. rewrite arraywrite_match trivial.
- Transform the program into a logic formula
- Using weakest preconditions
- The program is correct iff the formula is valid
- Prove the formula
- Very difficult formulas: interactively (Coq, Isabelle)
- Decidable classes: automated (MONA, CVCL, Omega)
- This talk: difficult formulas in an automated way :)
- Interactive proofs:
- low efficiency
- 1 line per grad student-minute
- parallelization looks non-trivial
13Formulas in Jahob
- Very expressive specification language
- Higher-Order features
- How to prove formulas automatically?
- Convert them to something simpler
- Decidable classes
- First-Order Logic
14Automated reasoning in Jahob
15Why FOL?
- Existing theorem provers
- SPASS, E, Vampire, Theo, Prover9, ...
- continuously improving (yearly competition)
- Effective on formulas with short proofs
- Handle formulas with quantifiers well
16HOL → FOL
- Ideas
- avoid axiomatizing rich theories
- translate what can naturally be expressed in FOL
- soundly approximate the rest
- Sound, incomplete approach
- Full details in the long version of the paper
- (x,y) ∈ z.content   becomes   Content(x,y,z)
- w = f(x := v) y   becomes   (x = y ∧ w = v) ∨ (x ≠ y ∧ w = f(y))
- λx.E = λx.F   becomes   ∀x. E = F
17Arithmetic
- Numbers are uninterpreted constants in FOL
- Provers do not know that 1+1=2!
- Still need to reason about arithmetic
- Our solution
- Provide partial, incomplete axiomatization
- Still cannot deduce 1+1=2!
- comparison between constants in formula
- Satisfactory results in practice
- ordering of elements in tree
- array bound checks
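One plausible reading of "comparison between constants in formula" is sketched below: state a few general ordering axioms once, then add one fact per pair of adjacent integer constants that occur in the formula. The predicate name intless appears in the verification condition shown later in the deck; the class, the method names, and the exact axioms are assumptions for illustration, not the axiomatization Jahob actually uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical generator for a partial axiomatization of integer ordering.
class ArithAxioms {
    // General ordering facts (illustrative): irreflexivity, transitivity, totality.
    static List<String> orderAxioms() {
        List<String> axioms = new ArrayList<>();
        axioms.add("ALL x. ~intless(x, x)");
        axioms.add("ALL x y z. intless(x, y) & intless(y, z) --> intless(x, z)");
        axioms.add("ALL x y. intless(x, y) | x = y | intless(y, x)");
        return axioms;
    }

    // One fact per pair of adjacent constants; transitivity gives the other pairs.
    static List<String> constantFacts(int[] constants) {
        TreeSet<Integer> sorted = new TreeSet<>();
        for (int c : constants) sorted.add(c);
        List<String> facts = new ArrayList<>();
        Integer prev = null;
        for (Integer c : sorted) {
            if (prev != null) facts.add("intless(" + prev + ", " + c + ")");
            prev = c;
        }
        return facts;
    }

    public static void main(String[] args) {
        for (String a : orderAxioms()) System.out.println(a);
        for (String f : constantFacts(new int[] {5, 0, 5, 1})) System.out.println(f);
    }
}
```

Nothing here defines addition, so a prover given these axioms can order keys and check array bounds against constants but still cannot derive 1+1=2, which matches the limitation stated on the slide.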
18Observation
- Most formulas are easy to prove
- i.e. in no measurable time
- have very short proofs (in number of resolution steps)
- Problem often concentrated in a small number of formulas that take very long to prove
- We applied two existing techniques to make them easier
- Eliminating type/sort information
- Filtering unnecessary assumptions
19Sort Information
- Specification language has sorts
- Integers
- Objects
- Boolean
- Translate to unsorted FOL
- ∀(x : Obj). P(x)
- becomes
- ∀x. Obj(x) → P(x)
20Sort Information
- Encoding sort information
- bigger formulas
- longer proofs
- Formulas become harder to prove
- Temptation to omit sort information
21Effect on hard formulas
- Formulas that take more than 1s to prove, from
the Tree implementation (SPASS)
22Omitting Sorts (contd)
- Great speed-up (more than 10x sometimes)!
- However
- ∀(x y : S). x = y
- ∃(x y : T). x ≠ y
- Satisfiable with sorts (S = {a}, T = {b, c})
- Unsatisfiable without!
- Omitting sort guards breaks soundness!
- Possible workaround: type-check the generated proof
- When is it possible to skip type-checking?
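The counterexample can be checked by brute force over small finite domains. The sketch below evaluates the two formulas once with separate domains for S and T, and once over a single shared domain, which is what erasing the sorts amounts to. The class and method names are hypothetical.

```java
// Brute-force check of the two-formula counterexample on finite domains.
class SortGuardDemo {
    // forall x y in dom. x = y
    static boolean allEqual(int[] dom) {
        for (int x : dom) for (int y : dom) if (x != y) return false;
        return true;
    }

    // exists x y in dom. x != y
    static boolean someDistinct(int[] dom) {
        for (int x : dom) for (int y : dom) if (x != y) return true;
        return false;
    }

    // With sorts: the first formula ranges over S, the second over T.
    static boolean holdsSorted(int[] s, int[] t) {
        return allEqual(s) && someDistinct(t);
    }

    // Without sorts: both formulas range over the same universe.
    static boolean holdsUnsorted(int[] universe) {
        return allEqual(universe) && someDistinct(universe);
    }

    public static void main(String[] args) {
        System.out.println(holdsSorted(new int[] {0}, new int[] {1, 2}));
        System.out.println(holdsUnsorted(new int[] {0, 1, 2}));
    }
}
```

The sorted model uses domains of different sizes (one element for S, two for T), which is exactly what the equal-cardinality hypothesis of the theorem on the next slide rules out.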
23Omitting Sorts: Result
- We proved the following
- Theorem. Suppose that
- Sorts are pairwise disjoint (no sub-sorting)
- Sorts have the same cardinality
- Then omitting sort guards is
- sound and complete
- This justifies this useful optimization
24Assumption Filtering
- Provers get confused by too many assumptions
- Lots of useless assumptions
- Hardest shown benchmark needs 12 out of 56
- Big benchmark: on average 33 necessary
- Assumption filtering
- Try to eliminate irrelevant assumptions automatically
- Give a score to each assumption based on relevance
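The slide does not say how the relevance score is computed. As a minimal sketch under that caveat, the code below scores an assumption by the fraction of its symbols that also occur in the goal and keeps assumptions at or above a threshold. The scoring rule, the threshold, the keyword list, and all names are assumptions for illustration, not Jahob's actual heuristic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical symbol-overlap filter for assumptions.
class AssumptionFilter {
    static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList("ALL", "EX", "Un"));

    // Crude symbol extraction: identifiers longer than one character that are
    // not logical keywords (one-letter names are treated as bound variables).
    static Set<String> symbols(String formula) {
        Set<String> out = new HashSet<>();
        for (String tok : formula.split("[^A-Za-z0-9_]+")) {
            if (tok.length() > 1 && !KEYWORDS.contains(tok)) out.add(tok);
        }
        return out;
    }

    // Fraction of the assumption's symbols that also appear in the goal.
    static double score(String assumption, Set<String> goalSymbols) {
        Set<String> syms = symbols(assumption);
        if (syms.isEmpty()) return 0.0;
        int shared = 0;
        for (String s : syms) if (goalSymbols.contains(s)) shared++;
        return (double) shared / syms.size();
    }

    // Keeps the assumptions whose score reaches the threshold, in original order.
    static List<String> filter(List<String> assumptions, String goal, double threshold) {
        Set<String> goalSymbols = symbols(goal);
        List<String> kept = new ArrayList<>();
        for (String a : assumptions) if (score(a, goalSymbols) >= threshold) kept.add(a);
        return kept;
    }

    public static void main(String[] args) {
        List<String> assumptions = Arrays.asList(
            "ALL x. ~intless(x, x)",
            "content(null) = empty",
            "intless(key(left), key(root))");
        System.out.println(filter(assumptions, "intless(kk, key(tt))", 0.5));
    }
}
```

Dropping assumptions cannot make an invalid goal provable, so a filter of this kind can only cost completeness: if it removes something necessary, the prover fails and the conjunct can be retried with more assumptions.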
25Experimental results
26Verification effort
- Decreased as we improved the system
- functional list was easy
- a few days for trees
- two hours for simple hash table
- FOL: currently the most usable method for these kinds of data structures
27Related work
- Interactive provers: Isabelle, Coq, HOL, PVS, ACL2
- First-order ATP
- Vampire [Voronkov 04]
- SPASS [Weidenbach 01]
- E [Schulz IJCAR04]
- Program checking
- ESC/Java2 [Kiniry, Chalin, Hurlin]
- Krakatoa [Marché, Paulin-Mohring, Urbain 03]
- Spec# [Barnett, DeLine, Jacobs, Fähndrich, Leino, Schulte, Venter 05]
- Hob system: verify set implementations (we verify relations)
- Shape analysis
- PALE [Møller and Schwartzbach PLDI01]
- TVLA [Sagiv, Reps, and Wilhelm TOPLAS02]
- Roles [Kuncak, Lam, and Rinard POPL02]
28Multiple Provers - Screenshot
29Conclusion
- Jahob verification system
- Automation by translation HOL → FOL
- omitting sorts theorem gives speedup
- filtering automates selection of assumptions
- Promising experimental results
- strong properties: correct implementation
- operations do not crash
- operations correctly update the content (the specification clarifies behavior in case of duplicate keys)
- representation invariants are preserved (ordering, treeness, each element is in the appropriate bucket)
- relatively fast
- verification effort much smaller than using interactive provers
30Thank you
- Formal Methods are the Future of Computer Science.
- Always have been
- Always will be.
- Questions?
31Converting to GCL
- Conditional statement: easy
- if cond then tbranch else fbranch
- (Assume cond; tbranch) [] (Assume !cond; fbranch)
- Procedure calls
- Could inline (potentially exponential blowup)
- Desugaring (modularity)
- r = CALL m(x, y, z)
- Assert (m's precondition)
- Havoc r
- Havoc vars modified by m
- Assume (m's postcondition)
32Converting to GCL (contd)
- Loops: invariant required
- while /*: invariant */ (condition) { lbody }
- becomes:
    assert invariant;          (invariant holds initially)
    havoc vars(lbody);         (no assumptions on variables ...)
    assume invariant;          (... except that the invariant holds)
    ((assume condition;        (condition holds)
      lbody;
      assert invariant;        (invariant is preserved)
      assume false)            (no need to verify anything more)
     [] (assume !condition))   (or condition does not hold and execution continues)
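The desugarings on these two slides can be sketched as a tiny guarded-command library in which each command maps a postcondition to its weakest precondition. Formulas are plain strings in an Isabelle-like ASCII syntax, assignment is left out (it would need substitution), and every class and method name is hypothetical. It illustrates the standard wp rules only and is not Jahob's implementation.

```java
// Guarded commands as functions from a postcondition to a weakest precondition.
class Gcl {
    interface Cmd { String wp(String post); }

    static Cmd assume(String f)  { return q -> "(" + f + " --> " + q + ")"; }
    static Cmd assertF(String f) { return q -> "(" + f + " & " + q + ")"; }
    static Cmd havoc(String x)   { return q -> "(ALL " + x + ". " + q + ")"; }

    // Sequential composition: wp(c1; c2, Q) = wp(c1, wp(c2, Q)).
    static Cmd seq(Cmd... cs) {
        return q -> {
            String acc = q;
            for (int i = cs.length - 1; i >= 0; i--) acc = cs[i].wp(acc);
            return acc;
        };
    }

    // Nondeterministic choice: both alternatives must establish Q.
    static Cmd choice(Cmd a, Cmd b) {
        return q -> "(" + a.wp(q) + " & " + b.wp(q) + ")";
    }

    static Cmd havocAll(String... vars) {
        Cmd[] cs = new Cmd[vars.length];
        for (int i = 0; i < vars.length; i++) cs[i] = havoc(vars[i]);
        return seq(cs);
    }

    // if cond then t else e
    static Cmd ifThenElse(String cond, Cmd t, Cmd e) {
        return choice(seq(assume(cond), t), seq(assume("~(" + cond + ")"), e));
    }

    // result = CALL m(...), using only m's contract and the variables it modifies.
    static Cmd call(String pre, String post, String result, String... modified) {
        return seq(assertF(pre), havoc(result), havocAll(modified), assume(post));
    }

    // while (cond) body, with a user-supplied invariant.
    static Cmd loop(String inv, String cond, Cmd body, String... modified) {
        return seq(assertF(inv), havocAll(modified), assume(inv),
                   choice(seq(assume(cond), body, assertF(inv), assume("False")),
                          assume("~(" + cond + ")")));
    }

    public static void main(String[] args) {
        System.out.println(ifThenElse("c", assertF("A"), assertF("B")).wp("Q"));
        System.out.println(loop("I", "c", assertF("B"), "x").wp("Q"));
    }
}
```

The nested string produced for even a one-statement loop body suggests how verification conditions for real methods grow to the size shown on the next slide.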
33Verification condition for remove
- ((((fieldRead Pair_data null) null)
((fieldRead FuncTree_data null) null)
((fieldRead FuncTree_left null) null)
((fieldRead FuncTree_right null) null) (ALL
(xObjobj). (xObj Object)) ((Pair Int
FuncTree) null) ((Array Int FuncTree)
null) ((Array Int Pair) null) (null
Object_alloc) (pointsto Pair Pair_data Object)
(pointsto FuncTree FuncTree_data Object)
(pointsto FuncTree FuncTree_left FuncTree)
(pointsto FuncTree FuncTree_right FuncTree)
comment ''unalloc_lonely'' (ALL (xobj). ((x
Object_alloc) --gt ((ALL (yobj). ((fieldRead
Pair_data y) x)) (ALL (yobj). ((fieldRead
FuncTree_data y) x)) (ALL (yobj).
((fieldRead FuncTree_left y) x)) (ALL
(yobj). ((fieldRead FuncTree_right y) x))
((fieldRead Pair_data x) null) ((fieldRead
FuncTree_data x) null) ((fieldRead
FuncTree_left x) null) ((fieldRead
FuncTree_right x) null)))) comment
''ProcedurePrecondition'' (True comment
''FuncTree_PrivateInv content definition'' (ALL
(thisobj). (((this Object_alloc) (this
FuncTree) ((this obj) null)) --gt
((fieldRead (FuncTree_content (obj gt ((int
obj)) set)) (this obj)) ((((fieldRead
(FuncTree_key (obj gt int)) (this obj)),
(fieldRead (FuncTree_data (obj gt obj)) (this
obj))) Un (fieldRead (FuncTree_content
(obj gt ((int obj)) set)) (fieldRead
(FuncTree_left (obj gt obj)) (this obj))))
Un (fieldRead (FuncTree_content (obj gt ((int
obj)) set)) (fieldRead (FuncTree_right (obj
gt obj)) (this obj))))))) comment
''FuncTree_PrivateInv null implies empty'' (ALL
(thisobj). (((this Object_alloc) (this
FuncTree) ((this obj) null)) --gt
((fieldRead (FuncTree_content (obj gt ((int
obj)) set)) (this obj)) ))) comment
''FuncTree_PrivateInv no null data'' (ALL
(thisobj). (((this Object_alloc) (this
FuncTree) ((this obj) null)) --gt
((fieldRead (FuncTree_data (obj gt obj)) (this
obj)) null))) comment ''FuncTree_PrivateIn
v left children are smaller'' (ALL (thisobj).
(((this Object_alloc) (this FuncTree)) --gt
(ALL k. (ALL v. (((k, v) (fieldRead
(FuncTree_content (obj gt ((int obj)) set))
(fieldRead (FuncTree_left (obj gt obj)) (this
obj)))) --gt (intless k (fieldRead
(FuncTree_key (obj gt int)) (this
obj)))))))) comment ''FuncTree_PrivateInv right
children are bigger'' (ALL (thisobj). (((this
Object_alloc) (this FuncTree)) --gt (ALL k.
(ALL v. (((k, v) (fieldRead (FuncTree_content
(obj gt ((int obj)) set)) (fieldRead
(FuncTree_right (obj gt obj)) (this obj))))
--gt ((fieldRead (FuncTree_key (obj gt int))
(this obj)) lt k))))))) comment ''t_type''
(((t obj) (FuncTree obj set)) ((t
obj) (Object_alloc obj set)))) --gt ((comment
''TrueBranch'' (((t obj) null) bool) --gt
(comment ''ProcedureEndPostcondition''
((((fieldRead (FuncTree_content (obj gt ((int
obj)) set)) (null obj)) ((fieldRead
(FuncTree_content (obj gt ((int obj)) set))
(t obj)) - p. (EX x y. ((p (x, y)) (x
(k int)))))) (ALL (framedObjobj).
(((framedObj Object_alloc) (framedObj
FuncTree)) --gt ((fieldRead FuncTree_content
framedObj) (fieldRead FuncTree_content
framedObj))))) comment ''FuncTree_PrivateInv
content definition'' (ALL (thisobj). (((this
Object_alloc) (this FuncTree) ((this
obj) null)) --gt ((fieldRead (FuncTree_content
(obj gt ((int obj)) set)) (this obj))
((((fieldRead (FuncTree_key (obj gt int))
(this obj)), (fieldRead (FuncTree_data (obj
gt obj)) (this obj))) Un (fieldRead
(FuncTree_content (obj gt ...
- And 200 more kilobytes
- Infeasible to prove directly
34Splitting heuristic
- Verification condition is a big conjunction
- conjunctions in postcondition
- proving each invariant
- proving each branch in program
- Solution: split VC into individual conjuncts
- Prove each conjunct separately
- Each conjunct has the form
- H1 /\ ... /\ Hn → Gi
- Tree.Remove has 230 such conjuncts
- How do we prove them?
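Splitting rests on two equivalences: A & B is valid iff A and B are both valid, and H --> (A & B) is equivalent to (H --> A) & (H --> B). A minimal sketch over a small hypothetical formula type follows. The class names are illustrative and quantifiers are left out, so the heuristic as implemented in Jahob may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Splits a verification condition into independently provable conjuncts.
class VcSplitter {
    static abstract class F {}

    static final class Atom extends F {
        final String name;
        Atom(String name) { this.name = name; }
        @Override public String toString() { return name; }
    }

    static final class And extends F {
        final F left, right;
        And(F left, F right) { this.left = left; this.right = right; }
        @Override public String toString() { return "(" + left + " & " + right + ")"; }
    }

    static final class Imp extends F {
        final F hyp, concl;
        Imp(F hyp, F concl) { this.hyp = hyp; this.concl = concl; }
        @Override public String toString() { return "(" + hyp + " --> " + concl + ")"; }
    }

    // Conjunctions are split; implications push their hypothesis onto each piece.
    static List<F> split(F f) {
        List<F> out = new ArrayList<>();
        if (f instanceof And) {
            out.addAll(split(((And) f).left));
            out.addAll(split(((And) f).right));
        } else if (f instanceof Imp) {
            Imp imp = (Imp) f;
            for (F goal : split(imp.concl)) out.add(new Imp(imp.hyp, goal));
        } else {
            out.add(f);
        }
        return out;
    }

    public static void main(String[] args) {
        F vc = new Imp(new Atom("H"),
                new And(new Atom("G1"),
                        new Imp(new Atom("A"), new And(new Atom("G2"), new Atom("G3")))));
        for (F piece : split(vc)) System.out.println(piece);
    }
}
```

Each piece keeps every hypothesis on its path, so many of them are irrelevant to the particular goal.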
35Detupling (contd)
36Handling of Fields (contd)
- We dealt with field updates
- New function expressed in terms of old one
- Base case: field variables
- Natural encoding in FOL using functions
- x = y.f   becomes   x = f(y)
- Verify more examples
- balanced trees
- fancy priority queues (binomial, Fibonacci, ...)
- hash table with dynamic resizing
- hash function
- verify clients of data structures
- Improve assumption filtering
- take rarity of symbols into account
- check for occurring polarity