Title: Shape Analysis via 3-Valued Logic
1Shape Analysis via 3-Valued Logic
- Mooly Sagiv
- Tel Aviv University
http://www.cs.tau.ac.il/msagiv/toplas02.ps
www.cs.tau.ac.il/tvla
2Topics
- A new abstract domain for static analysis
- Abstract dynamically allocated memory
- TVLA: a system for generating abstract interpreters
- Applications
3Motivation
- Dynamically allocated storage and pointers are essential programming tools
- Object oriented
- Modularity
- Data structure
- But
- Error prone
- Inefficient
- Static analysis can be very useful here
4A Pathological C Program
a = malloc();
b = a;
free(a);
c = malloc();
if (b == c) printf("unexpected equality");
5Dereference of NULL pointers
typedef struct element {
    int value;
    struct element *next;
} Elements;

bool search(int value, Elements *c) {
    Elements *elem;
    for (elem = c; c != NULL; elem = elem->next)
        if (elem->value == value)
            return TRUE;
    return FALSE;
}
6Dereference of NULL pointers
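Potential null dereference: the loop tests c != NULL rather than elem != NULL, so elem is dereferenced after it reaches the end of the list.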
7Memory leakage
typedef struct element {
    int value;
    struct element *next;
} Elements;

Elements *reverse(Elements *c) {
    Elements *h, *g;
    h = NULL;
    while (c != NULL) {
        g = c->next;
        h = c;
        c->next = h;
        c = g;
    }
    return h;
}
8Memory leakage
Leakage of the address pointed to by h
9Memory leakage
? No memory leaks
10Example List Creation
typedef struct node {
    int val;
    struct node *next;
} *List;

List create() {
    List x, t;
    x = NULL;
    while () do {
        t = malloc();
        t->next = x;
        x = t;
    }
    return x;
}
- No null dereferences
- No memory leaks
- Returns acyclic list
11Example Collecting Interpretation
12Example Abstract Interpretation
13Challenge 1 - Memory Allocation
- The number of allocated objects/threads is not known
- Concrete state space is infinite
- How to guarantee termination?
14Challenge 2 - Destructive Updates
- The program manipulates states using destructive updates
- e->next = t
- Hard to define concrete interpretation
- Harder to define abstract interpretation
15Challenge 2 - Destructive Update
Unsound ?
16Challenge 2 - Destructive Update
Imprecise ?
17Challenge 3 Re-establishing Data Structure
Invariants
- Data-structure invariants typically only hold at the beginning and end of ADT operations
- Need to verify that data-structure invariants are re-established
18Challenge 3 Re-establishing Data Structure
Invariants
rotate(List first, List last) {
    if (first != NULL) {
        last->next = first;
        first = first->next;
        last = last->next;
        last->next = NULL;
    }
}
19Plan
- Concrete interpretation
- Canonical abstraction
- Abstract interpretation using canonical abstraction
- The TVLA system
20Traditional Heap Interpretation
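- States: two-level stores
- Env: $\mathit{Var} \to \mathit{Values}$
- fields: $\mathit{Loc} \to \mathit{Values}$
- $\mathit{Values} = \mathit{Loc} \cup \mathit{Atoms}$
- Example
- Env: $[x \mapsto 30,\ p \mapsto 79]$
- next: $[30 \mapsto 40,\ 40 \mapsto 50,\ 50 \mapsto 79,\ 79 \mapsto 90]$
- val: $[30 \mapsto 1,\ 40 \mapsto 2,\ 50 \mapsto 3,\ 79 \mapsto 4,\ 90 \mapsto 5]$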
21Predicate Logic
- Vocabulary
- A finite set of predicate symbols $P$, each with a fixed arity
- Logical structures $S$ provide meaning for predicates
- A set of individuals (nodes) $U^S$
- $p^S : (U^S)^k \to \{0, 1\}$
- $FO^{TC}$ over $TC, \forall, \exists, \neg, \wedge, \vee$ expresses logical structure properties
22Representing Stores as Logical Structures
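- Locations $\approx$ Individuals
- Program variables $\approx$ Unary predicates
- Fields $\approx$ Binary predicates
- Example
- $U = \{u_1, u_2, u_3, u_4, u_5\}$
- $x = \{u_1\}$, $p = \{u_3\}$
- $n = \{\langle u_1, u_2 \rangle, \langle u_2, u_3 \rangle, \langle u_3, u_4 \rangle, \langle u_4, u_5 \rangle\}$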
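The encoding of a store as a logical structure can be sketched in a few lines of Python. This is only an illustration, not TVLA code: the function name `store_to_structure`, the dictionary layout, and the choice of a store in which p holds location 50 (so that the result matches the example above) are all assumptions.

```python
def store_to_structure(env, next_field):
    """Encode a two-level store as a 2-valued logical structure (a sketch)."""
    # Individuals: one per location occurring in the store, in sorted order.
    locs = sorted(set(env.values()) | set(next_field) | set(next_field.values()))
    name = {loc: f"u{i + 1}" for i, loc in enumerate(locs)}
    universe = [name[loc] for loc in locs]
    # Each program variable becomes a unary predicate (the set where it is 1).
    unary = {var: {name[loc]} for var, loc in env.items()}
    # The next field becomes a binary predicate (the set of pairs where it is 1).
    n = {(name[a], name[b]) for a, b in next_field.items()}
    return universe, unary, n

# Hypothetical store: x holds location 30, p holds location 50.
env = {"x": 30, "p": 50}
next_field = {30: 40, 40: 50, 50: 79, 79: 90}
universe, unary, n = store_to_structure(env, next_field)
print(universe)
print(unary)
print(sorted(n))
```

The concrete addresses disappear in the result; only the variable and field relationships between individuals remain.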
23Formal Semantics of First Order Formulae
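- For a structure $S = \langle U^S, p^S \rangle$
- Formulae $\varphi$ with LVar free variables
- Assignment $z : \mathit{LVar} \to U^S$
- $[\![\varphi]\!]^S(z) \in \{0, 1\}$
$[\![1]\!]^S(z) = 1$
$[\![0]\!]^S(z) = 0$
$[\![p(v_1, v_2, \ldots, v_k)]\!]^S(z) = p^S(z(v_1), z(v_2), \ldots, z(v_k))$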
24Formal Semantics of First Order Formulae
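- For a structure $S = \langle U^S, p^S \rangle$
- Formulae $\varphi$ with LVar free variables
- Assignment $z : \mathit{LVar} \to U^S$
- $[\![\varphi]\!]^S(z) \in \{0, 1\}$
$[\![\varphi_1 \vee \varphi_2]\!]^S(z) = \max([\![\varphi_1]\!]^S(z), [\![\varphi_2]\!]^S(z))$
$[\![\varphi_1 \wedge \varphi_2]\!]^S(z) = \min([\![\varphi_1]\!]^S(z), [\![\varphi_2]\!]^S(z))$
$[\![\neg \varphi_1]\!]^S(z) = 1 - [\![\varphi_1]\!]^S(z)$
$[\![\exists v: \varphi_1]\!]^S(z) = \max \{ [\![\varphi_1]\!]^S(z[v \mapsto u]) : u \in U^S \}$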
25Formal Semantics of Transitive Closure
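- For a structure $S = \langle U^S, p^S \rangle$
- Formulae $\varphi$ with LVar free variables
- Assignment $z : \mathit{LVar} \to U^S$
- $[\![\varphi]\!]^S(z) \in \{0, 1\}$
$$[\![p^*(v_1, v_2)]\!]^S(z) = \max_{\substack{u_1, \ldots, u_k \in U^S \\ z(v_1) = u_1,\ z(v_2) = u_k}} \ \min_{1 \le i < k} p^S(u_i, u_{i+1})$$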
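The max/min semantics above translates almost directly into an evaluator. The following Python sketch is an illustration under stated assumptions: formulas are encoded as nested tuples (a hypothetical encoding), predicates are dictionaries from argument tuples to values with missing entries read as 0, and the empty path (k = 1) is given value 1, so the closure is reflexive.

```python
def closure(universe, p):
    """best[a, b] = max over paths from a to b of the min edge value."""
    best = {(a, b): 1 if a == b else p.get((a, b), 0)
            for a in universe for b in universe}
    for m in universe:  # Floyd-Warshall style relaxation through m
        for a in universe:
            for b in universe:
                best[a, b] = max(best[a, b], min(best[a, m], best[m, b]))
    return best

def evaluate(phi, universe, preds, z):
    """Value of formula phi in the structure (universe, preds) under assignment z."""
    op = phi[0]
    if op == "const":                      # ("const", 0) or ("const", 1)
        return phi[1]
    if op == "pred":                       # ("pred", name, v1, ..., vk)
        return preds[phi[1]].get(tuple(z[v] for v in phi[2:]), 0)
    if op == "or":
        return max(evaluate(phi[1], universe, preds, z),
                   evaluate(phi[2], universe, preds, z))
    if op == "and":
        return min(evaluate(phi[1], universe, preds, z),
                   evaluate(phi[2], universe, preds, z))
    if op == "not":
        return 1 - evaluate(phi[1], universe, preds, z)
    if op == "exists":                     # ("exists", v, body)
        return max(evaluate(phi[2], universe, preds, {**z, phi[1]: u})
                   for u in universe)
    if op == "tc":                         # ("tc", name, v1, v2) stands for p*(v1, v2)
        return closure(universe, preds[phi[1]])[z[phi[2]], z[phi[3]]]
    raise ValueError(f"unknown operator: {op}")

# A three-element list u1 -> u2 -> u3 with x pointing to u1.
universe = ["u1", "u2", "u3"]
preds = {"x": {("u1",): 1}, "n": {("u1", "u2"): 1, ("u2", "u3"): 1}}

# "w is reachable from x": exists v: x(v) and n*(v, w)
reach = ("exists", "v", ("and", ("pred", "x", "v"), ("tc", "n", "v", "w")))
print(evaluate(reach, universe, preds, {"w": "u3"}))
```

Because every connective is defined through max, min and 1 - x, the same evaluator also works unchanged when predicate values may be 1/2.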
26Concrete Interpretation Rules
Statement | Update formula
x = NULL | $x'(v) = 0$
x = malloc() | $x'(v) = \mathit{IsNew}(v)$
x = y | $x'(v) = y(v)$
x = y->next | $x'(v) = \exists w: y(w) \wedge n(w, v)$
x->next = y | $n'(v, w) = (\neg x(v) \wedge n(v, w)) \vee (x(v) \wedge y(w))$
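Two of the update formulas above can be executed directly on a 2-valued structure. The Python sketch below is only an illustration: the dictionary layout of the structure and the function names `assign_from_next` and `store_to_next` are assumptions, and null-dereference checks are left out.

```python
def assign_from_next(S, x, y):
    """x = y->next:  x'(v) = exists w: y(w) and n(w, v)."""
    new_x = {v for v in S["U"]
             if any(w in S["vars"][y] and (w, v) in S["n"] for w in S["U"])}
    return {"U": S["U"], "vars": {**S["vars"], x: new_x}, "n": S["n"]}

def store_to_next(S, x, y):
    """x->next = y:  n'(v, w) = (not x(v) and n(v, w)) or (x(v) and y(w))."""
    xs, ys = S["vars"][x], S["vars"][y]
    new_n = {(v, w) for v in S["U"] for w in S["U"]
             if (v not in xs and (v, w) in S["n"]) or (v in xs and w in ys)}
    return {"U": S["U"], "vars": S["vars"], "n": new_n}

# List u1 -> u2 -> u3 with x at u1 and y at u3; t is initially NULL.
S = {"U": ["u1", "u2", "u3"],
     "vars": {"x": {"u1"}, "y": {"u3"}, "t": set()},
     "n": {("u1", "u2"), ("u2", "u3")}}

S1 = store_to_next(S, "x", "y")      # destructive update: u1 now points to u3
S2 = assign_from_next(S1, "t", "x")  # t = x->next
print(sorted(S1["n"]))
print(S2["vars"]["t"])
```

Each statement produces a new structure by re-evaluating one predicate over all individuals, which is what makes the rules independent of any bound on the number of locations.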
27Invariants
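- No memory leaks: $\forall v: \bigvee_{x \in \mathit{PVar}} \exists w: x(w) \wedge n^*(w, v)$
- Acyclic list(x): $\forall v, w: x(v) \wedge n^*(v, w) \to \neg n^+(w, v)$
- Reverse(x): $\forall v, w, r: x(v) \wedge n^*(v, w) \wedge n(w, r) \to n'(r, w)$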
28Why use logical structures?
- Naturally model pointers and dynamic allocation
- No a priori bound on number of locations
- Use formulas to express semantics
- Indirect store updates using quantifiers
- Can model other features
- Concurrency
- Abstract fields
29Why use logical structures?
- Behaves well under abstraction
- Enables automatic construction of abstract
interpreters from concrete interpretation rules
(TVLA)
30Collecting Interpretation
- The set of reachable logical structures at every program point
- Statements operate on sets of logical structures
- Cannot be directly computed for programs with unbounded store and loops
x = NULL; while () do { t = malloc(); t->next = x; x = t; }
[Figure: reachable structures at each program point, starting from the empty structure]
31Plan
- Concrete interpretation
- Canonical abstraction
- TVLA
32Canonical Abstraction
- Convert logical structures of unbounded size into bounded size
- Guarantees that the number of logical structures in every program is finite
- Every first-order formula can be conservatively interpreted
33Kleene Three-Valued Logic
- 1: True
- 0: False
- 1/2: Unknown
- A join semi-lattice: $0 \sqcup 1 = 1/2$
[Figure: logical order $0 < 1/2 < 1$]
34Boolean Connectives Kleene
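The Kleene connectives can be tabulated with a short Python sketch. It assumes the usual encoding of the three values as the numbers 0, 1/2 and 1, with conjunction as min, disjunction as max and negation as 1 - x; the function names are illustrative.

```python
HALF = 0.5  # the "unknown" value 1/2
VALUES = [0, HALF, 1]

def k_and(a, b):
    return min(a, b)

def k_or(a, b):
    return max(a, b)

def k_not(a):
    return 1 - a

def join(a, b):
    """Join in the information order: equal values stay, different ones become 1/2."""
    return a if a == b else HALF

print(" a    b    and  or   join")
for a in VALUES:
    for b in VALUES:
        print(f"{a:<4} {b:<4} {k_and(a, b):<4} {k_or(a, b):<4} {join(a, b):<4}")
```

Note that a definite argument can still force a definite result: 0 and 1/2 is 0, and 1 or 1/2 is 1.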
353-Valued Logical Structures
- A set of individuals (nodes) U
- Predicate meaning
- $p^S : (U^S)^k \to \{0, 1, 1/2\}$
36Canonical Abstraction
- Partition the individuals into equivalence classes based on the values of their unary predicates
- Every individual is mapped into its equivalence class
- Collapse predicates via $\sqcup$
- $p^S(u'_1, \ldots, u'_k) = \bigsqcup \{ p^B(u_1, \ldots, u_k) \mid f(u_1) = u'_1, \ldots, f(u_k) = u'_k \}$
- At most $2^{|A|}$ abstract individuals
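The steps above can be sketched in Python for a 2-valued input structure. This is a minimal illustration, not the TVLA implementation: the representation (sets for predicates, tuples of unary values as abstract node names) and the function name `canonical_abstraction` are assumptions.

```python
from itertools import product

HALF = 0.5  # the "unknown" value 1/2

def join(values):
    """Join in the information order: a single value stays, mixed values give 1/2."""
    vals = set(values)
    return vals.pop() if len(vals) == 1 else HALF

def canonical_abstraction(universe, unary, binary):
    names = sorted(unary)
    # The canonical name of an individual is the vector of its unary predicate values.
    key = {u: tuple(int(u in unary[p]) for p in names) for u in universe}
    classes = {}
    for u in universe:
        classes.setdefault(key[u], []).append(u)
    abs_universe = list(classes)
    abs_unary = {p: {c: join(int(u in unary[p]) for u in classes[c])
                     for c in abs_universe}
                 for p in names}
    abs_binary = {p: {(c1, c2): join(int((a, b) in pairs)
                                     for a in classes[c1] for b in classes[c2])
                      for c1, c2 in product(abs_universe, repeat=2)}
                  for p, pairs in binary.items()}
    # Classes with more than one member are summary nodes.
    summary = {c for c in abs_universe if len(classes[c]) > 1}
    return abs_universe, abs_unary, abs_binary, summary

# List u1 -> u2 -> u3 with both x and t pointing to u1.
universe = ["u1", "u2", "u3"]
unary = {"x": {"u1"}, "t": {"u1"}}
binary = {"n": {("u1", "u2"), ("u2", "u3")}}
abs_universe, abs_unary, abs_binary, summary = canonical_abstraction(universe, unary, binary)
print(abs_universe)
print(abs_binary["n"])
print(summary)
```

In this example u2 and u3 agree on all unary predicates and collapse into one summary node, and the n edges into and inside that node become 1/2.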
37Canonical Abstraction
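x = NULL; while () do { t = malloc(); t->next = x; x = t; }
[Figure: the concrete list u1, u2, u3 with x and t, and its canonical abstraction with u1 and the summary node u2,3]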
38Canonical Abstraction
x = NULL; while () do { t = malloc(); t->next = x; x = t; }
[Figure: list nodes u1, u2, u3 linked by n edges, with x and t]
39Canonical Abstraction and Equality
- Summary nodes may represent more than one element
- (In)equality need not be preserved under abstraction
- Explicitly record equality
- Summary nodes are nodes with $eq(u, u) = 1/2$
40Canonical Abstraction and Equality
x = NULL; while () do { t = malloc(); t->next = x; x = t; }
[Figure: the list u1, u2, u3 with n and eq edges, x and t, and its abstraction with u1 and the summary node u2,3]
41Canonical Abstraction
x = NULL; while () do { t = malloc(); t->next = x; x = t; }
[Figure: list nodes u1, u2, u3 linked by n edges, with x and t]
42Challenges: Heap and Concurrency [Yahav, POPL'01]
- Concurrency with the heap is evil
- Java threads are just heap allocated objects
- Data and control are strongly related
- Thread-scheduling info may require understanding of heap structure (e.g., scheduling queue)
- Heap analysis requires information about thread scheduling
Thread t1 = new Thread();
Thread t2 = new Thread();
t = t1;
t.start();
43Configurations Example
l_0: while (true) {
l_1:     synchronized(myLock) {
l_C:         // critical actions
l_2:     }
l_3: }
[Figure: configuration with predicates held_by, blocked, at[l_0], at[l_1], at[l_C], rval[myLock]]
44Concrete Configuration
[Figure: concrete configuration with predicates held_by, blocked, at[l_0], at[l_1], at[l_C], rval[myLock]]
45Abstract Configuration
[Figure: abstract configuration with predicates held_by, blocked, at[l_0], at[l_1], at[l_C], rval[myLock]]
46 Examples Verified
Program | Property
twoLock Q | No interference; no memory leaks; partial correctness
Producer/consumer | No interference; no memory leaks
Apprentice Challenge | Counter increasing
Dining philosophers with resource ordering | Absence of deadlock
Mutex | Mutual exclusion
Web Server | No interference
47Summary
- Canonical abstraction guarantees a finite number of structures
- The concrete location of an object has no significance
- But what is the significance of 3-valued logic?
48Topics
- Embedding
- Instrumentation
- Abstract Interpretation
- Extensions
49Embedding
50Embedding
- $B \sqsubseteq^f S$
- onto function $f$
- $p^B(u_1, \ldots, u_k) \sqsubseteq p^S(f(u_1), \ldots, f(u_k))$
- $S$ is a tight embedding of $B$ with respect to $f$ if
- $p^S(u'_1, \ldots, u'_k) = \bigsqcup \{ p^B(u_1, \ldots, u_k) \mid f(u_1) = u'_1, \ldots, f(u_k) = u'_k \}$
- Canonical abstraction is a tight embedding
51Embedding (cont)
- $S_1 \sqsubseteq^f S_2 \Rightarrow$ every concrete state represented by $S_1$ is also represented by $S_2$
- The set of nodes in $S_1$ and $S_2$ may be different
- No meaning for node names (abstract locations)
- $\gamma(S) = \{ S' : \text{2-valued structure } S',\ S' \sqsubseteq^f S \}$
52Embedding Theorem
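- Assume $B \sqsubseteq^f S$, $p^B(u_1, \ldots, u_k) \sqsubseteq p^S(f(u_1), \ldots, f(u_k))$
- Then every formula $\varphi$ is preserved
- If $[\![\varphi]\!] = 1$ in $S$, then $[\![\varphi]\!] = 1$ in $B$
- If $[\![\varphi]\!] = 0$ in $S$, then $[\![\varphi]\!] = 0$ in $B$
- If $[\![\varphi]\!] = 1/2$ in $S$, then $[\![\varphi]\!]$ could be 0 or 1 in $B$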
53Embedding Theorem
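- Every formula $\varphi$ is preserved
- If $[\![\varphi]\!] = 1$ in $S$, then $[\![\varphi]\!] = 1$ for all $B \in \gamma(S)$
- If $[\![\varphi]\!] = 0$ in $S$, then $[\![\varphi]\!] = 0$ for all $B \in \gamma(S)$
- If $[\![\varphi]\!] = 1/2$ in $S$, then $[\![\varphi]\!]$ could be 0 or 1 in $\gamma(S)$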
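The theorem can be checked on a small hand-built example. In the Python sketch below, the concrete list B, the abstract structure S (node a for u1, summary node s for u2 and u3), and the two test formulas are illustrative assumptions; S is the canonical abstraction of B, so B embeds into S.

```python
HALF = 0.5  # the "unknown" value 1/2

def value(universe, x, n, body):
    """Value of 'exists v, w: body': the max of body over all pairs of individuals."""
    return max(body(x, n, v, w) for v in universe for w in universe)

def phi_next(x, n, v, w):   # x(v) and n(v, w)
    return min(x[v], n[v, w])

def phi_pred(x, n, v, w):   # x(v) and n(w, v)
    return min(x[v], n[w, v])

# Concrete structure B: list u1 -> u2 -> u3, x points to u1.
B_U = ["u1", "u2", "u3"]
B_x = {"u1": 1, "u2": 0, "u3": 0}
B_pairs = {("u1", "u2"), ("u2", "u3")}
B_n = {(v, w): int((v, w) in B_pairs) for v in B_U for w in B_U}

# Abstract structure S: a represents u1, the summary node s represents u2 and u3.
S_U = ["a", "s"]
S_x = {"a": 1, "s": 0}
S_n = {("a", "a"): 0, ("a", "s"): HALF, ("s", "a"): 0, ("s", "s"): HALF}

def preserved(abstract, concrete):
    """A definite abstract value must agree with the concrete one; 1/2 allows either."""
    return abstract in (HALF, concrete)

for name, body in [("x(v) and n(v, w)", phi_next), ("x(v) and n(w, v)", phi_pred)]:
    abstract = value(S_U, S_x, S_n, body)
    concrete = value(B_U, B_x, B_n, body)
    print(name, abstract, concrete, preserved(abstract, concrete))
```

The first formula evaluates to 1/2 in S although it is 1 in B, which is the imprecise but sound case; the second is 0 in both.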
54Challenge 2 - Destructive Update
[Figure: 3-valued structure with variables x, p, y and n edges]
y->next = NULL
$n'(v, w) = \neg y(v) \wedge n(v, w)$
Sound
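Evaluating the update formula in Kleene logic can be sketched as follows. The structure used here (a node u1 pointed to by y, followed by a summary node s) is a hypothetical example, not the one drawn on the slide, and the function name `update_next_null` is an assumption.

```python
HALF = 0.5  # the "unknown" value 1/2

def update_next_null(universe, y, n):
    """y->next = NULL:  n'(v, w) = not y(v) and n(v, w), evaluated in Kleene logic."""
    return {(v, w): min(1 - y[v], n[v, w]) for v in universe for w in universe}

# Hypothetical abstract structure: y points to u1, s is a summary node.
universe = ["u1", "s"]
y = {"u1": 1, "s": 0}
n = {("u1", "u1"): 0, ("u1", "s"): HALF, ("s", "u1"): 0, ("s", "s"): HALF}

new_n = update_next_null(universe, y, n)
print(new_n)
```

The edge leaving the node pointed to by y becomes definitely 0, while the edges inside the summary node keep their value 1/2, so the update removes exactly the information that the statement invalidates.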
55Challenge 2 - Destructive Update
[Figure: 3-valued structure with variables x, p, y and n edges]
y->next = NULL
$n'(v, w) = \neg y(v) \wedge n(v, w)$
Sound
56Embedding Theorem
$\exists v: x(v)$: 1 (Yes)
$\exists v: x(v) \wedge t(v)$: 1 (Yes)
$\exists v: x(v) \wedge y(v)$: 0 (No)
$\exists v, w: x(v) \wedge n(v, w)$: 1/2 (Maybe)
?v, w x(v)?n(v, w) ?n(v, w)
0No
$\exists v, w: x(v) \wedge n(v, w) \wedge n(w, w)$: 1/2 (Maybe)
57Summary
- The embedding theorem eliminates the need for proving near commutativity
- Guarantees soundness
- Applied to arbitrary logics
- But can be imprecise
58Limitations
- Information on summary nodes is lost
- Leads to useless verification
59Increasing Precision
- User (Programming Language) supplied global invariants
- Naturally expressed in $FO^{TC}$
- Record extra information in the concrete interpretation
- Tune the abstraction
- Refine concretization
60Cyclicity predicate
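$c[x]() = \exists v_1, v_2: x(v_1) \wedge n^*(v_1, v_2) \wedge n^+(v_2, v_2)$
[Figure: acyclic list u1, u2, ..., un with x and t, and its abstraction with u1 and the summary node u2..n; c[x]() = 0 in both]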
61Cyclicity predicate
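$c[x]() = \exists v_1, v_2: x(v_1) \wedge n^*(v_1, v_2) \wedge n^+(v_2, v_2)$
[Figure: cyclic list u1, u2, ..., un with x and t, and its abstraction with u1 and the summary node u2..n; c[x]() = 1 in both]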
62Heap Sharing predicate
$is(v) = \exists v_1, v_2: n(v_1, v) \wedge n(v_2, v) \wedge v_1 \neq v_2$
[Figure: list u1, u2, ..., un with x and t, and its abstraction with u1 and the summary node u2..n; is(v) = 0 on every node]
63Heap Sharing predicate
$is(v) = \exists v_1, v_2: n(v_1, v) \wedge n(v_2, v) \wedge v_1 \neq v_2$
[Figure: list u1, u2, ..., un with x and t in which one node has is(v) = 1 and the others have is(v) = 0]
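The defining formula of the heap-sharing predicate can be evaluated directly on a concrete structure. The Python sketch below is an illustration; the function name `is_shared` and the two example heaps are assumptions.

```python
def is_shared(universe, n):
    """is(v) = exists v1, v2: n(v1, v) and n(v2, v) and v1 != v2."""
    return {v: int(any((v1, v) in n and (v2, v) in n and v1 != v2
                       for v1 in universe for v2 in universe))
            for v in universe}

universe = ["u1", "u2", "u3"]

# Acyclic list u1 -> u2 -> u3: no node has two distinct predecessors.
acyclic = {("u1", "u2"), ("u2", "u3")}
print(is_shared(universe, acyclic))

# u3 points back to u2, so u2 is the target of both u1 and u3.
with_back_edge = {("u1", "u2"), ("u2", "u3"), ("u3", "u2")}
print(is_shared(universe, with_back_edge))
```

Recording this value as an extra unary predicate lets canonical abstraction keep shared and unshared nodes in different equivalence classes.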
64Concrete Interpretation Rules
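Statement | Update formula
x = NULL | $x'(v) = 0$
x = malloc() | $x'(v) = \mathit{IsNew}(v)$
x = y | $x'(v) = y(v)$
x = y->next | $x'(v) = \exists w: y(w) \wedge n(w, v)$
x->next = NULL | $n'(v, w) = \neg x(v) \wedge n(v, w)$; $is'(v) = is(v) \wedge \exists v_1, v_2: n(v_1, v) \wedge n(v_2, v) \wedge \neg x(v_1) \wedge \neg x(v_2) \wedge \neg eq(v_1, v_2)$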
65Reachability predicate
$t[n](v_1, v_2) = n^*(v_1, v_2)$
[Figure: list u1, u2, ..., un with x and t, and its abstraction with u1 and the summary node u2..n]
66Additional Instrumentation predicates
- reachable-from-variable-x(v)
- $c_{fb}(v) = \forall v_1: f(v, v_1) \to b(v_1, v)$
- tree(v)
- dag(v)
- $inOrder(v) = \forall v_1: n(v, v_1) \to dle(v, v_1)$
- Weakest Precondition [Ramalingam, PLDI'02]
67Instrumentation (Summary)
- Refines the abstraction
- Adds global invariants
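- But requires update-formulas (generated automatically in TVLA2)
$is(v) = \exists v_1, v_2: n(v_1, v) \wedge n(v_2, v) \wedge v_1 \neq v_2$
$is(v) \Leftrightarrow \exists v_1, v_2: n(v_1, v) \wedge n(v_2, v) \wedge v_1 \neq v_2$
$\gamma(S) = \{ S' : S' \models \varphi,\ S' \sqsubseteq^f S \}$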
68Plan
- Embedding Theorem
- Instrumentation
- Abstract interpretation using canonical abstraction
- TVLA
69Best Conservative Interpretation (CC79)
70Best Transformer (x = x->n)
[Figure: inverse embedding]
71Focus-Based Transformer (x = x->n)
[Figure: structures with x and y; inverse embedding, canonic abstraction]
72Focus-Based Transformer (x = x->n)
[Figure: structures with x and y]
73Semantic Reduction
- Improve the precision by recovering properties of the program semantics
- A Galois connection $(L_1, \alpha, \gamma, L_2)$
- An operation $op : L_2 \to L_2$ is a semantic reduction if
- $\forall l \in L_2: op(l) \sqsubseteq l$
- $\gamma(op(l)) = \gamma(l)$
- Can be applied before and after basic operations
74Three Valued Logic Analysis (TVLA), T. Lev-Ami, R. Manevich
- Input (FOTC)
- Concrete interpretation rules
- Definition of instrumentation predicates
- Definition of safety properties
- First Order Transition System (TVP)
- Output
- Warnings (text)
- The 3-valued structure at every node (invariants)
75Null Dereferences
typedef struct element {
    int value;
    struct element *n;
} Element;

bool search(int value, Element *x) {
    Element *c = x;
    while (x != NULL) {
        if (c->value == value)
            return TRUE;
        c = c->n;
    }
    return FALSE;
}
Demo
76TVLA inputs
- TVP - Three Valued Program
- Predicate declaration
- Action definitions SOS
- Control flow graph
- TVS - Three Valued Structure
Demo
77Challenge 1
- Write a C procedure on which TVLA reports a false null dereference
78Proving Correctness of Sorting Implementations (Lev-Ami, Reps, Sagiv, Wilhelm, ISSTA 2000)
- Partial correctness
- The elements are sorted
- The list is a permutation of the original list
- Termination
- At every loop iteration, the set of elements reachable from the head decreases
79Example InsertSort
typedef struct list_cell {
    int data;
    struct list_cell *n;
} *List;

List InsertSort(List x) {
    List r, pr, rn, l, pl;
    r = x;
    pr = NULL;
    while (r != NULL) {
        l = x;
        rn = r->n;
        pl = NULL;
        while (l != r) {
            if (l->data > r->data) {
                pr->n = rn;
                r->n = l;
                if (pl == NULL)
                    x = r;
                else
                    pl->n = r;
                r = pr;
                break;
            }
            pl = l;
            l = l->n;
        }
        pr = r;
        r = rn;
    }
    return x;
}
pred.tvp
actions.tvp
Run Demo
80Example InsertSort
typedef struct list_cell {
    int data;
    struct list_cell *n;
} *List;

List InsertSort(List x) {
    if (x == NULL)
        return NULL;
    pr = x;
    r = x->n;
    while (r != NULL) {
        pl = x;
        rn = r->n;
        l = x->n;
        while (l != r) {
            pr->n = rn;
            r->n = l;
            pl->n = r;
            r = pr;
            break;
            pl = l;
            l = l->n;
        }
        pr = r;
        r = rn;
    }
}
Run Demo
81Example Reverse
typedef struct list_cell {
    int data;
    struct list_cell *n;
} *List;

List reverse(List x) {
    List y, t;
    y = NULL;
    while (x != NULL) {
        t = y;
        y = x;
        x = x->n;
        y->n = t;
    }
    return y;
}
Run Demo
82Challenge
- Write a sorting C procedure on which TVLA fails
to prove sortedness or permutation
83Example Mark and Sweep
void Sweep() {
    unexplored = Universe
    collected = ∅
    while (unexplored ≠ ∅) {
        x = SelectAndRemove(unexplored)
        if (x ∉ marked)
            collected = collected ∪ {x}
    }
    assert(collected == Universe − Reachset(root))
}

void Mark(Node root) {
    if (root != NULL) {
        pending = ∅
        pending = pending ∪ {root}
        marked = ∅
        while (pending ≠ ∅) {
            x = SelectAndRemove(pending)
            marked = marked ∪ {x}
            t = x->left
            if (t ≠ NULL)
                if (t ∉ marked)
                    pending = pending ∪ {t}
            t = x->right
            if (t ≠ NULL)
                if (t ∉ marked)
                    pending = pending ∪ {t}
        }
    }
    assert(marked == Reachset(root))
}
pred.tvp
Run Demo
84Challenge 2
- Use TVLA to show termination of markAndSweep
85Verification of Safety Properties (PLDI'02, '04)
- The Canvas Project (with IBM Watson)
- (Component Annotation, Verification and Stuff)
Component: a library with cleanly encapsulated state
Client: a program that uses the library
- Lightweight Specification
- "correct usage" rules a client must follow
- "call open() before read()"
Certification: does the client program satisfy the lightweight specification?
86Prototype Implementation
- Applied to several example programs
- Up to 5000 lines of Java
- Used to verify
- Absence of concurrent modification exception
- JDBC API conformance
- IOStreams API conformance
90Scaling
- Staged analysis
- Controlled complexity
- More coarse abstractions [Manevich, SAS'04]
- Handle libraries
- Use procedure specifications [Yorsh, TACAS'04]
- Decision procedures for linked data structures [Immerman, CAV'04; Lev-Ami, CADE'05]
- Handling procedures
- Compute procedure summaries [Jeannet, SAS'04]
- Local heaps [Rinetzky, POPL'05]
91Local heaps [Rinetzky, POPL'05]
[Figure: heap at call p(x), with variables y, g, t]
92Why is Heap Analysis Difficult?
- Destructive updating through pointers
- p->next = q
- Produces complicated aliasing relationships
- Track aliasing on 3-valued structures
- Dynamic storage allocation
- No bound on the size of run-time data structures
- Canonical abstraction $\Rightarrow$ finite-sized 3-valued structures
- Data-structure invariants typically only hold at the beginning and end of operations
- Need to verify that data-structure invariants are re-established
- Query the 3-valued structures that arise at the exit
93Summary
- Canonical abstraction is powerful
- Intuitive
- Adapts to the property of interest
- Used to verify interesting program properties
- Very few false alarms
- But scaling is an issue
94Summary
- Effective Abstract Interpretation
- Always terminates
- Precise enough
- But still expensive
- Can model
- Heap
- Unbounded arrays
- Concurrency
- More instrumentation can make the analysis more efficient
- But canonic abstraction is limited
- Correlation between list lengths
- Arithmetic
- Partial heaps
95Summary
- The embedding theorem eliminates the need for proving near commutativity
- Guarantees soundness
- Applied to arbitrary logics
- But can be imprecise