1
Using First-Order Theorem Provers in Data
Structure Verification

Charles Bouillaguet
Ecole Normale Supérieure, Cachan, France

Viktor Kuncak, Martin Rinard
MIT CSAIL
2
Implementing Data Structures is Hard

Often small, but complex code
Lots of pointers
Unbounded, dynamic allocation
Complex shape invariants
Dag
Properties involving arithmetic (ordering)
Need strong invariants to guarantee correctness
e.g. lookup in ordered tree needs sortedness

3
How to obtain reliable data structure
implementations?

Approach
Prove that the program is correct
For all program executions (sound)
Verified properties
Data structure operations do not crash
Data structure invariants are preserved
Data structure content is correctly updated

4
Infrastructure

Jahob: a system for verifying data structure
implementations
Kuncak, Wies, Zee, Rinard, Nguyen, Bouillaguet,
Schmitt, Marnette, Bugrara
Analyzed programs: subset of Java
Specification: subset of Isabelle's language

5
Summary of Verified Data Structures

Implementations of relations
Add a binding
Remove all bindings for a given key
Test key membership
Retrieve data bound to a key
Test emptiness
Verified implementations
Linked list
Ordered tree
Hash table

6
An Example: Ordered Trees

Implementation of a finite map
Operations
insert
lookup
remove
Representation invariants
tree shaped (acyclicity, unique parent)
ordering constraints

[Figure: tree node with key, value, left and right fields]
7
Sample code

public static FuncTree update(int k, Object v, FuncTree t)
{
    FuncTree new_left, new_right;
    Object new_data;
    int new_key;
    if (t == null) {
        new_data = v; new_key = k;
        new_left = null; new_right = null;
    } else {
        if (k < t.key) {
            new_left = update(k, v, t.left);
            new_right = t.right;
            new_key = t.key; new_data = t.data;
        } else if (t.key < k) {
            // ... (branch not shown on the slide)
        } else {
            new_data = v; new_key = k;
            new_left = t.left; new_right = t.right;
        }
    }
    FuncTree r = new FuncTree();
    r.left = new_left; r.right = new_right;
    r.data = new_data; r.key = new_key;
    return r;
}

8
Sample code

public static FuncTree update(int k, Object v, FuncTree t)
/*: requires "v ~= null"
    ensures "result..content = t..content - {(x,y). x=k} Un {(k,v)}" */
{
    FuncTree new_left, new_right;
    Object new_data;
    int new_key;
    if (t == null) {
        new_data = v; new_key = k;
        new_left = null; new_right = null;
    } else {
        if (k < t.key) {
            new_left = update(k, v, t.left);
            new_right = t.right;
            new_key = t.key; new_data = t.data;
        } else if (t.key < k) {
            // ... (branch not shown on the slide)
        } else {
            new_data = v; new_key = k;
            new_left = t.left; new_right = t.right;
        }
    }
    FuncTree r = new FuncTree();
    r.left = new_left; r.right = new_right;
    r.data = new_data; r.key = new_key;
    return r;
}

no null dereferences
3 lines of spec, 30 lines of code
postcondition holds and invariants are preserved
9
Ordered tree interface

public ghost specvar content :: "(int * obj) set" = "{}";

public static FuncTree empty_set()
    ensures "result..content = {}"

public static FuncTree add(int k, Object v, FuncTree t)
    requires "v ~= null & (ALL y. (k,y) ~: t..content)"
    ensures "result..content = t..content Un {(k,v)}"

public static FuncTree update(int k, Object v, FuncTree t)
    requires "v ~= null"
    ensures "result..content = t..content - {(x,y). x=k} Un {(k,v)}"

public static Object lookup(int k, FuncTree t)
    ensures "((k, result) : t..content) |
             (result = null & (ALL v. (k,v) ~: t..content))"

public static FuncTree remove(int k, FuncTree t)
    ensures "result..content = t..content - {(x,y). x=k}"

10
Representation Invariants

public final class FuncTree {
    private int key;
    private Object data;
    private FuncTree left, right;
    /*:
      public ghost specvar content :: "(int * obj) set";
      invariant ("content definition") "this ~= null -->
         content = {(key, data)} Un left..content Un right..content"
      invariant ("null implies empty") "this = null -->
         content = {}"
      invariant ("left children are smaller")
         "ALL k v. (k,v) : left..content --> k < key"
      invariant ("right children are bigger")
         "ALL k v. (k,v) : right..content --> k > key"
    */
}

abstract set-valued field
tuples
implicit universal quantification over this
equality between sets
arithmetic
explicit quantification
11
How could these properties be verified?
12
Standard Approach
eauto intros . intuition subst . apply
Extensionality_Ensembles. unfold Same_set.
unfold Included. unfold In. unfold In in
H1. intuition. destruct H0. destruct (eq_nat_dec
x1 ArraySet_size).subst. rewrite
arraywrite_match in H0 auto. intuition. subst.
apply Union_intror. auto with sets. assert (x1 lt
ArraySet_size). omega. clear n. apply
Union_introl. rewrite arraywrite_not_same_i in
H0.unfold In. exists x1. intuition.omega.
inversion H0 subst clear H0. unfold In in
H3. destruct H3. exists x1. intuition. rewrite
arraywrite_not_same_i. intuition omega. omega.
exists ArraySet_size. intuition. inversion H3.
subst. rewrite arraywrite_match trivial.

Transform program into a logic formula
Using weakest precondition
The program is correct iff the formula is valid
Prove the formula
Very difficult formulas: interactively (Coq,
Isabelle)
Decidable classes: automated (MONA, CVCL, Omega)
This talk: difficult formulas in an automated way
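
The weakest-precondition step can be illustrated with a small sketch. Everything here is a hypothetical toy, not Jahob's implementation: the class name WpSketch, the encoding of formulas as Java predicates over integer states, and the bounded check standing in for a theorem prover are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical toy guarded-command language; not Jahob's actual code. */
class WpSketch {
    /** A command maps a postcondition to its weakest precondition. */
    interface Cmd {
        Predicate<Map<String, Integer>> wp(Predicate<Map<String, Integer>> post);
    }

    // wp(assume c, Q) = c --> Q
    static Cmd assume(Predicate<Map<String, Integer>> c) {
        return post -> s -> !c.test(s) || post.test(s);
    }

    // wp(assert c, Q) = c & Q
    static Cmd assertThat(Predicate<Map<String, Integer>> c) {
        return post -> s -> c.test(s) && post.test(s);
    }

    // wp(x := e, Q) = Q[e/x], computed by evaluating Q in the updated state
    static Cmd assign(String x, Function<Map<String, Integer>, Integer> e) {
        return post -> s -> {
            Map<String, Integer> updated = new HashMap<>(s);
            updated.put(x, e.apply(s));
            return post.test(updated);
        };
    }

    // wp(c1 ; c2, Q) = wp(c1, wp(c2, Q))
    static Cmd seq(Cmd c1, Cmd c2) {
        return post -> c1.wp(c2.wp(post));
    }

    // wp(c1 [] c2, Q) = wp(c1, Q) & wp(c2, Q)
    static Cmd choice(Cmd c1, Cmd c2) {
        return post -> s -> c1.wp(post).test(s) && c2.wp(post).test(s);
    }

    /** Evaluates the verification condition on every state with x in [lo, hi]. */
    static boolean validOnRange(Predicate<Map<String, Integer>> vc, int lo, int hi) {
        for (int x = lo; x <= hi; x++) {
            Map<String, Integer> s = new HashMap<>();
            s.put("x", x);
            s.put("r", 0);
            if (!vc.test(s)) return false;
        }
        return true;
    }

    /** if (x < 0) r = -x; else r = x;  written as a choice of guarded branches. */
    static Cmd absProgram() {
        return choice(
            seq(assume(s -> s.get("x") < 0), assign("r", s -> -s.get("x"))),
            seq(assume(s -> s.get("x") >= 0), assign("r", s -> s.get("x"))));
    }

    public static void main(String[] args) {
        Predicate<Map<String, Integer>> post = s -> s.get("r") >= 0;
        System.out.println(validOnRange(absProgram().wp(post), -10, 10));
    }
}
```

Checking a bounded range of states is only testing; it stands in for the prover call so that the sketch runs on its own. A real verifier would hand the formula produced by wp to a theorem prover to establish validity for all states.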

low efficiency
1 line per grad student-minute
parallelization looks non-trivial

13
Formulas in Jahob

Very expressive specification language
Higher-Order features
How to prove formulas automatically?
Convert them to something simpler
Decidable classes
First-Order Logic

14
Automated reasoning in Jahob
15
Why FOL?

Existing theorem provers
SPASS, E, Vampire, Theo, Prover9, ...
continuously improving (yearly competition)
Effective on formulas with short proofs
Handle formulas with quantifiers nicely

16
HOL → FOL

Ideas
avoid axiomatizing rich theories
Translate what can naturally be expressed in FOL
soundly approximate the rest
Sound, incomplete approach
Full details in long version of the paper
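$(x,y) \in z.\mathit{content} \;\mapsto\; \mathit{Content}(x,y,z)$
$w = f[x := v](y) \;\mapsto\; (x = y \land w = v) \lor (x \neq y \land w = f(y))$
$\lambda x.\,E = \lambda x.\,F \;\mapsto\; \forall x.\; E = F$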

17
Arithmetic

Numbers are uninterpreted constants in FOL
Provers do not know that 1+1=2!
Still need to reason about arithmetic
Our solution
Provide partial, incomplete axiomatization
Still cannot deduce 1+1=2!
comparison between constants in formula
Satisfactory results in practice
ordering of elements in tree
array bound checks

18
Observation

Most formulas are easy to prove
i.e. in no measurable time
have very short proofs (in number of resolution steps)
Difficulty is often concentrated in a small number of
formulas that take very long to prove
We applied two existing techniques to make them
easier
Eliminating type/sort information
Filtering unnecessary assumptions

19
Sort Information

Specification language has sorts
Integers
Objects
Boolean
Translate to unsorted FOL
$\forall (x :: \mathit{Obj}).\; P(x)$
$\mapsto$
$\forall x.\; \mathit{Obj}(x) \rightarrow P(x)$

20
Sort Information

Encoding sort information
bigger formulas
longer proofs
Formulas become harder to prove
Temptation to omit sort information

21
Effect on hard formulas

Formulas that take more than 1s to prove, from
the Tree implementation (SPASS)

22
Omitting Sorts (contd)

Great speed-up (sometimes more than 10x)!
However
$\forall (x\ y :: S).\; x = y$
$\exists (x\ y :: T).\; x \neq y$
Satisfiable with sorts ($S = \{a\}$, $T = \{b, c\}$)
Unsatisfiable without!
Omitting sort guards breaks soundness!
Possible workaround: type-check the generated proof
When is it possible to skip type-checking?
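
The counterexample can be replayed on small finite domains. The sketch below is illustrative only: the class and method names are hypothetical, elements are encoded as distinct integers, and trying unsorted domains of size 1 to 5 illustrates the unsatisfiable case rather than proving it.

```java
/** Illustrative finite-model check; names are hypothetical, not from Jahob. */
class SortGuardSketch {
    /** ALL x y in dom. x = y */
    static boolean allEqual(int[] dom) {
        for (int x : dom) {
            for (int y : dom) {
                if (x != y) return false;
            }
        }
        return true;
    }

    /** EX x y in dom. x ~= y */
    static boolean someDistinct(int[] dom) {
        for (int x : dom) {
            for (int y : dom) {
                if (x != y) return true;
            }
        }
        return false;
    }

    /** With sorts: the first formula ranges over S, the second over T. */
    static boolean holdsWithSorts(int[] s, int[] t) {
        return allEqual(s) && someDistinct(t);
    }

    /** Without sorts: both formulas range over one shared domain. */
    static boolean holdsWithoutSorts(int[] dom) {
        return allEqual(dom) && someDistinct(dom);
    }

    public static void main(String[] args) {
        // S = {a}, T = {b, c}, with a, b, c encoded as 0, 1, 2.
        System.out.println("with sorts: " + holdsWithSorts(new int[] {0}, new int[] {1, 2}));
        // Unsorted domains of distinct elements, sizes 1 to 5.
        for (int n = 1; n <= 5; n++) {
            int[] dom = new int[n];
            for (int i = 0; i < n; i++) dom[i] = i;
            System.out.println("no sorts, size " + n + ": " + holdsWithoutSorts(dom));
        }
    }
}
```

The first formula forces the shared domain to have a single element, while the second needs at least two, so no unsorted domain satisfies both. The two sorts in the example have different cardinalities, which is the situation the cardinality condition of the theorem on the next slide rules out.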

23
Omitting Sorts Result

We proved the following
Theorem. Suppose that
Sorts are pair-wise disjoint (no sub-sorting)
Sorts have the same cardinality
Then omitting sort guards is
sound and complete
This justifies this useful optimization

24
Assumption Filtering

Provers get confused by too many assumptions
Lots of useless assumptions
Hardest shown benchmark needs 12 out of 56
Big benchmark: on average 33 necessary
Assumption filtering
Try to eliminate irrelevant assumptions
automatically
Give a score to each assumption based on relevance
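
One way to score assumptions is sketched below. The scoring rule is an assumption made for illustration and is not claimed to be the one Jahob uses: an assumption earns credit for each symbol it shares with the goal, and rarer symbols count for more. Formulas are plain strings, and all names (AssumptionFilterSketch, symbols, score, filter) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical relevance filter; the scoring rule is an assumption. */
class AssumptionFilterSketch {
    /** Extracts identifier-like symbols from a formula given as text. */
    static Set<String> symbols(String formula) {
        Set<String> result = new HashSet<>();
        for (String token : formula.split("[^A-Za-z0-9_]+")) {
            if (!token.isEmpty()) result.add(token);
        }
        return result;
    }

    /** Counts in how many assumptions each symbol occurs. */
    static Map<String, Integer> occurrences(List<String> assumptions) {
        Map<String, Integer> counts = new HashMap<>();
        for (String a : assumptions) {
            for (String sym : symbols(a)) counts.merge(sym, 1, Integer::sum);
        }
        return counts;
    }

    /** Sums 1/occurrences over symbols shared with the goal, so rare symbols weigh more. */
    static double score(String assumption, String goal, Map<String, Integer> counts) {
        Set<String> goalSymbols = symbols(goal);
        double total = 0.0;
        for (String sym : symbols(assumption)) {
            if (goalSymbols.contains(sym)) total += 1.0 / counts.get(sym);
        }
        return total;
    }

    /** Keeps the assumptions whose score reaches the threshold. */
    static List<String> filter(List<String> assumptions, String goal, double threshold) {
        Map<String, Integer> counts = occurrences(assumptions);
        List<String> kept = new ArrayList<>();
        for (String a : assumptions) {
            if (score(a, goal, counts) >= threshold) kept.add(a);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> assumptions = new ArrayList<>();
        assumptions.add("alloc(t)");
        assumptions.add("lt(key(t), k)");
        assumptions.add("alloc(u)");
        System.out.println(filter(assumptions, "lt(key(r), k)", 0.5));
    }
}
```

Dropping assumptions cannot make an invalid formula provable, because a proof from fewer assumptions is still a proof. The cost is incompleteness: if a needed assumption is filtered out, the prover simply fails to find a proof.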

25
Experimental results
26
Verification effort

Decreased as we improved the system
functional list was easy
a few days for trees
two hours for simple hash table
FOL: currently the most usable method for these kinds
of data structures

27
Related work

Interactive provers: Isabelle, Coq, HOL, PVS, ACL2
First-order ATP
Vampire [Voronkov 04]
SPASS [Weidenbach 01]
E [Schulz, IJCAR 04]
Program checking
ESC/Java2 [Kiniry, Chalin, Hurlin]
Krakatoa [Marché, Paulin-Mohring, Urbain 03]
Spec# [Barnett, DeLine, Jacobs, Fähndrich,
Leino, Schulte, Venter 05]
Hob system: verifies set implementations (we verify
relations)
Shape analysis
PALE [Møller and Schwartzbach, PLDI 01]
TVLA [Sagiv, Reps, and Wilhelm, TOPLAS 02]
Roles [Kuncak, Lam, and Rinard, POPL 02]

28
Multiple Provers - Screenshot
29
Conclusion

Jahob verification system
Automation by translation HOL → FOL
omitting sorts theorem gives speedup
filtering automates selection of assumptions
Promising experimental results
strong properties: correct implementation
operations do not crash
operations correctly update the content,
which clarifies behavior in case of duplicate keys
representation invariants preserved (ordering,
treeness, each element is in appropriate bucket)
relatively fast
verification effort much smaller than using
interactive provers

30
Thank you

Formal Methods are the Future of computer
Science.
Always have been
Always will be.
Questions ?

31
Converting to GCL

Conditional statement: easy
if cond then tbranch else fbranch
(assume cond; tbranch) [] (assume !cond; fbranch)
Procedure calls
Could inline (potentially exponential blowup)
Desugaring (modularity)
r = CALL m(x, y, z)
assert (m's precondition)
havoc r
havoc vars modified by m
assume (m's postcondition)
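
The call desugaring can be written down as a small helper. This is a sketch under stated assumptions, not Jahob's code: commands are rendered as strings, the names CallDesugarSketch and desugarCall are hypothetical, and the substitution of actual arguments for formal parameters in the contract is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: desugars "result = CALL m(...)" using m's contract. */
class CallDesugarSketch {
    /**
     * Returns the guarded commands that replace the call. The caller is
     * assumed to pass a contract already expressed over the actual arguments.
     */
    static List<String> desugarCall(String result, String precondition,
                                    String postcondition, List<String> modifiedVars) {
        List<String> cmds = new ArrayList<>();
        cmds.add("assert " + precondition);   // the caller must establish the precondition
        cmds.add("havoc " + result);          // the result is unknown after the call
        for (String v : modifiedVars) {
            cmds.add("havoc " + v);           // so is everything the callee may modify
        }
        cmds.add("assume " + postcondition);  // the contract is all the caller learns
        return cmds;
    }

    public static void main(String[] args) {
        List<String> modified = new ArrayList<>();
        modified.add("Object.alloc");
        for (String cmd : desugarCall("r", "v ~= null",
                "r..content = t..content Un {(k,v)}", modified)) {
            System.out.println(cmd);
        }
    }
}
```

Because the callee's body never appears in the result, each procedure is verified once against its own contract, which is the modularity advantage over inlining.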

32
Converting to GCL (contd)

Loops: invariant required
while /*: invariant */ (condition) lbody
assert invariant;
havoc vars(lbody);
assume invariant;
((assume condition;
  lbody;
  assert invariant;
  assume false)
 [] (assume !condition))

invariant holds initially
no assumptions on variables except that the
invariant holds
condition holds
invariant is preserved
no need to verify anything more
or condition does not hold and execution continues
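
The loop translation can be generated mechanically once the invariant is given. The sketch below renders it as text; the names LoopDesugarSketch and desugarLoop are hypothetical, and the list of variables modified by the body is assumed to be supplied by the caller rather than computed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: renders the loop desugaring as guarded-command text. */
class LoopDesugarSketch {
    static String desugarLoop(String invariant, String condition,
                              String body, List<String> modifiedVars) {
        StringBuilder sb = new StringBuilder();
        // The invariant must hold on entry to the loop.
        sb.append("assert ").append(invariant).append(";\n");
        // Forget everything about variables the body may change...
        for (String v : modifiedVars) {
            sb.append("havoc ").append(v).append(";\n");
        }
        // ...except that the invariant holds.
        sb.append("assume ").append(invariant).append(";\n");
        // Either run one arbitrary iteration and check the invariant again,
        sb.append("((assume ").append(condition).append(";\n");
        sb.append("  ").append(body).append(";\n");
        sb.append("  assert ").append(invariant).append(";\n");
        sb.append("  assume false)\n");
        // or leave the loop with the condition false.
        sb.append(" [] (assume !(").append(condition).append(")))");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> modified = new ArrayList<>();
        modified.add("i");
        System.out.println(desugarLoop("i <= n", "i < n", "i = i + 1", modified));
    }
}
```

The "assume false" at the end of the first branch cuts that path off after the invariant check, so only the exit branch continues into the code after the loop.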
33
Verification condition for remove

((((fieldRead Pair_data null) null)
((fieldRead FuncTree_data null) null)
((fieldRead FuncTree_left null) null)
((fieldRead FuncTree_right null) null) (ALL
(xObjobj). (xObj Object)) ((Pair Int
FuncTree) null) ((Array Int FuncTree)
null) ((Array Int Pair) null) (null
Object_alloc) (pointsto Pair Pair_data Object)
(pointsto FuncTree FuncTree_data Object)
(pointsto FuncTree FuncTree_left FuncTree)
(pointsto FuncTree FuncTree_right FuncTree)
comment ''unalloc_lonely'' (ALL (xobj). ((x
Object_alloc) --gt ((ALL (yobj). ((fieldRead
Pair_data y) x)) (ALL (yobj). ((fieldRead
FuncTree_data y) x)) (ALL (yobj).
((fieldRead FuncTree_left y) x)) (ALL
(yobj). ((fieldRead FuncTree_right y) x))
((fieldRead Pair_data x) null) ((fieldRead
FuncTree_data x) null) ((fieldRead
FuncTree_left x) null) ((fieldRead
FuncTree_right x) null)))) comment
''ProcedurePrecondition'' (True comment
''FuncTree_PrivateInv content definition'' (ALL
(thisobj). (((this Object_alloc) (this
FuncTree) ((this obj) null)) --gt
((fieldRead (FuncTree_content (obj gt ((int
obj)) set)) (this obj)) ((((fieldRead
(FuncTree_key (obj gt int)) (this obj)),
(fieldRead (FuncTree_data (obj gt obj)) (this
obj))) Un (fieldRead (FuncTree_content
(obj gt ((int obj)) set)) (fieldRead
(FuncTree_left (obj gt obj)) (this obj))))
Un (fieldRead (FuncTree_content (obj gt ((int
obj)) set)) (fieldRead (FuncTree_right (obj
gt obj)) (this obj))))))) comment
''FuncTree_PrivateInv null implies empty'' (ALL
(thisobj). (((this Object_alloc) (this
FuncTree) ((this obj) null)) --gt
((fieldRead (FuncTree_content (obj gt ((int
obj)) set)) (this obj)) ))) comment
''FuncTree_PrivateInv no null data'' (ALL
(thisobj). (((this Object_alloc) (this
FuncTree) ((this obj) null)) --gt
((fieldRead (FuncTree_data (obj gt obj)) (this
obj)) null))) comment ''FuncTree_PrivateIn
v left children are smaller'' (ALL (thisobj).
(((this Object_alloc) (this FuncTree)) --gt
(ALL k. (ALL v. (((k, v) (fieldRead
(FuncTree_content (obj gt ((int obj)) set))
(fieldRead (FuncTree_left (obj gt obj)) (this
obj)))) --gt (intless k (fieldRead
(FuncTree_key (obj gt int)) (this
obj)))))))) comment ''FuncTree_PrivateInv right
children are bigger'' (ALL (thisobj). (((this
Object_alloc) (this FuncTree)) --gt (ALL k.
(ALL v. (((k, v) (fieldRead (FuncTree_content
(obj gt ((int obj)) set)) (fieldRead
(FuncTree_right (obj gt obj)) (this obj))))
--gt ((fieldRead (FuncTree_key (obj gt int))
(this obj)) lt k))))))) comment ''t_type''
(((t obj) (FuncTree obj set)) ((t
obj) (Object_alloc obj set)))) --gt ((comment
''TrueBranch'' (((t obj) null) bool) --gt
(comment ''ProcedureEndPostcondition''
((((fieldRead (FuncTree_content (obj gt ((int
obj)) set)) (null obj)) ((fieldRead
(FuncTree_content (obj gt ((int obj)) set))
(t obj)) - p. (EX x y. ((p (x, y)) (x
(k int)))))) (ALL (framedObjobj).
(((framedObj Object_alloc) (framedObj
FuncTree)) --gt ((fieldRead FuncTree_content
framedObj) (fieldRead FuncTree_content
framedObj))))) comment ''FuncTree_PrivateInv
content definition'' (ALL (thisobj). (((this
Object_alloc) (this FuncTree) ((this
obj) null)) --gt ((fieldRead (FuncTree_content
(obj gt ((int obj)) set)) (this obj))
((((fieldRead (FuncTree_key (obj gt int))
(this obj)), (fieldRead (FuncTree_data (obj
gt obj)) (this obj))) Un (fieldRead
(FuncTree_content (obj gt
And 200 more kilobytes
Infeasible to prove directly