Title: Introduction to Abstract Interpretation
1Introduction to Abstract Interpretation
- Neil Kettle, Andy King and Axel Simon
- a.m.king_at_kent.ac.uk
- http://www.cs.kent.ac.uk/amk
- Acknowledgments: much of this material has been adapted from surveys by Patrick and Radhia Cousot
2Applications of abstract interpretation
- Verification: can a concurrent program deadlock? Is termination assured?
- Parallelisation: are two or more tasks independent? What is the worst/best-case running time of a function?
- Transformation: can a definition be unfolded? Will unfolding terminate?
- Implementation: can an operation be specialised with knowledge of its (global) calling context?
- Applications and players are incredibly diverse
3House-keeping
4Computing Lab Xmas Party
- Located in Origins, the restaurant in Darwin
- A buffet lunch will be served courtesy of the department
- Department will supply some wine (which last year lasted 10 minutes)
- Bar will be open afterwards if some wine is not enough
- Send an e-mail to Deborah Sowrey (D.J.Sowery_at_kent.ac.uk) if you want to attend
- Come along and meet other post-grads
5Casting out nines algorithm
- Which of the following multiplications is correct?
- $2173 \times 38 = 81574$ or
- $2173 \times 38 = 82574$
- Casting out nines is a checking technique that is really a form of abstract interpretation
- Sum the digits in the multiplicand $n_1$, multiplier $n_2$ and the product $n$ to obtain $s_1$, $s_2$ and $s$.
- Divide $s_1$, $s_2$ and $s$ by 9 to compute the remainders, that is, $r_1 = s_1 \bmod 9$, $r_2 = s_2 \bmod 9$ and $r = s \bmod 9$.
- If $(r_1 \times r_2) \bmod 9 \neq r$ then the multiplication is incorrect
- The algorithm returns "incorrect" or "don't know"
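6Running the numbers for $2173 \times 38 = 81574$
- Compute $r_1 = (2+1+7+3) \bmod 9 = 4$
- Compute $r_2 = (3+8) \bmod 9 = 2$
- Calculate $(r_1 \times r_2) \bmod 9 = 8$
- Calculate $r = (8+1+5+7+4) \bmod 9 = 7$
- Check $((r_1 \times r_2) \bmod 9 = r)$, which fails since $8 \neq 7$
- Deduce that $2173 \times 38 = 81574$ is incorrect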
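The check above can be sketched in a few lines of Python. This is only an illustration of the procedure as stated on the slides; the function names `digit_sum_mod9` and `cast_out_nines` are hypothetical and not part of the original material.

```python
def digit_sum_mod9(n: int) -> int:
    """Abstract a non-negative integer to the remainder of its digit sum modulo 9."""
    return sum(int(d) for d in str(n)) % 9


def cast_out_nines(n1: int, n2: int, n: int) -> str:
    """Check the claim n1 * n2 == n using only the remainders r1, r2 and r."""
    r1, r2, r = digit_sum_mod9(n1), digit_sum_mod9(n2), digit_sum_mod9(n)
    # A mismatch proves the claim wrong; a match proves nothing.
    return "incorrect" if (r1 * r2) % 9 != r else "don't know"


print(cast_out_nines(2173, 38, 81574))  # incorrect
print(cast_out_nines(2173, 38, 82574))  # don't know
```

Note that the check never answers "correct": a wrong product whose remainder happens to agree still yields "don't know".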
7Abstract interpretation is a theory of
relationships
- The computational domain for multiplication (concrete domain)
- $\mathbb{N}$, the set of non-negative integers
- The computational domain of remainders used in the checking algorithm (abstract domain)
- $R = \{0, 1, \ldots, 8\}$
- Key question: what is the relationship between an element $n \in \mathbb{N}$ which is used in the real algorithm and its analogue $r \in R$ in the check?
8What is the relationship?
- When the multiplicand is $n_1 = 456$, say, then the check uses $r_1 = (4+5+6) \bmod 9 = 6$
- Observe that
- $456 \bmod 9$
- $= (4 \cdot 100 + 56) \bmod 9$
- $= (4 \cdot 90 + 4 \cdot 10 + 56) \bmod 9$
- $= (4 \cdot 10 + 56) \bmod 9$
- $= ((4+5) \cdot 10 + 6) \bmod 9$
- $= ((4+5) \cdot 9 + (4+5) + 6) \bmod 9$
- $= (4+5+6) \bmod 9$
- More generally, induction can show $r_1 = n_1 \bmod 9$ and $r_2 = n_2 \bmod 9$
9Correctness is the preservation of relationships
- The check simulates the concrete multiplication and, in effect, is an abstract multiplication
- Concrete multiplication is $n = n_1 \times n_2$
- Abstract multiplication is $r = (r_1 \times r_2) \bmod 9$
- Where $r_1$ describes $n_1$ and $r_2$ describes $n_2$
- For brevity, write $r \propto n$ iff $r = n \bmod 9$
- Then abstract multiplication preserves $\propto$ iff whenever $r_1 \propto n_1$ and $r_2 \propto n_2$ it follows that $r \propto n$
10Correctness argument
- Suppose $r_1 \propto n_1$ and $r_2 \propto n_2$
- If $n = n_1 \times n_2$ then
- $n \bmod 9 = (n_1 \times n_2) \bmod 9$ hence
- $n \bmod 9 = ((n_1 \bmod 9) \times (n_2 \bmod 9)) \bmod 9$ whence
- $n \bmod 9 = (r_1 \times r_2) \bmod 9 = r$ therefore
- $r \propto n$
- Consequently if $\neg(r \propto n)$ then $n \neq n_1 \times n_2$
11Summary
- Formalise the relationship between the data
- Check that the relationship is preserved by the abstract analogues of the concrete operations
- The relational framework [Acta Informatica, 30(2):103-129, 1993] not only emphasises the theory of relations but is very general
12Numeric approximation and widening
- Abstract interpretation does not require a domain
to be finite
13Interval approximation
- Consider the following Pascal-like program
- SYNTOX [PLDI'90] inferred the invariants scoped within {...}
- Invariants occur between consecutive lines in the program
- $i \in [0,15]$ asserts $0 \le i \le 15$ whereas $i \in [0,0]$ means $i = 0$

    begin
      i := 0;              {1: i ∈ [0,0]}
      while (i < 16) do    {2: i ∈ [0,15]}
        i := i + 1         {3: i ∈ [1,16]}
    end                    {4: i ∈ [16,16]}
14Compilation versus (classic) interpretation
- Abstract compilation: compile the concrete program into an abstract program (equation system) and execute the abstract program
- good separation of concerns that aids debugging
- the particulars of the domain can be exploited to reorder operations, specialise operations, etc.
- Abstract interpretation: run the concrete program but on-the-fly interpret its concrete operations as abstract operations
- ideal for a generic framework (toolkit) which is parameterised by abstract domain plugins
15Abstract domain that is used in interval analysis
- Domain of intervals includes
- $[l,u]$ where $l \le u$ and $l,u \in \mathbb{Z}$ for bounded sets, i.e. $[0,5]$ describes $\{0,1,4\}$ since $\{0,1,4\} \subseteq [0,5]$
- $\bot$ to represent the empty set of numbers, that is, $\bot$ describes $\emptyset$
- $[l,\infty]$ for sets which are bounded below such as $\{l, l+2, l+4, \ldots\}$
- $[-\infty,u]$ to represent sets which are bounded above such as $\{\ldots, u-5, u-3, u\}$
16Weakening intervals
    if … then    {1: i ∈ [0,2]}
    else         {2: i ∈ [3,5]}
    endif        {3: i ∈ [0,5]}

- Join (path merge) is defined
- Put $d_1 \sqcup d_2 = d_1$ if $d_2 = \bot$
- $d_2$ else if $d_1 = \bot$
- $[\min(l_1,l_2), \max(u_1,u_2)]$ otherwise
- whenever $d_1 = [l_1,u_1]$ and $d_2 = [l_2,u_2]$
17Strengthening intervals
- Meet is defined
- Put $d_1 \sqcap d_2 = \bot$ if $(d_1 = \bot) \lor (d_2 = \bot)$
- $[\max(l_1,l_2), \min(u_1,u_2)]$ otherwise
- whenever $d_1 = [l_1,u_1]$ and $d_2 = [l_2,u_2]$

                     {3: i ∈ [0,5]}
    if (2 < i) then  {4: i ∈ [3,5]}
    else             {5: i ∈ [0,2]}
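A minimal Python sketch of the interval join and meet may help make the two definitions concrete. The representation is an assumption made for illustration: `None` stands for bottom, a pair `(l, u)` stands for an interval, and `math.inf` stands for an infinite bound. The sketch also normalises a meet whose lower bound exceeds its upper bound to bottom, a case the slide leaves implicit.

```python
import math
from typing import Optional, Tuple

# Assumed representation: None is bottom, (l, u) is the interval [l, u].
Interval = Optional[Tuple[float, float]]


def join(d1: Interval, d2: Interval) -> Interval:
    """Weakening at a path merge: the smallest interval containing both arguments."""
    if d2 is None:
        return d1
    if d1 is None:
        return d2
    (l1, u1), (l2, u2) = d1, d2
    return (min(l1, l2), max(u1, u2))


def meet(d1: Interval, d2: Interval) -> Interval:
    """Strengthening at a test: the largest interval contained in both arguments."""
    if d1 is None or d2 is None:
        return None
    (l1, u1), (l2, u2) = d1, d2
    l, u = max(l1, l2), min(u1, u2)
    return (l, u) if l <= u else None  # assumption: an empty result is bottom


print(join((0, 2), (3, 5)))           # (0, 5)
print(meet((0, 5), (3, math.inf)))    # (3, 5)
print(meet((0, 5), (-math.inf, 2)))   # (0, 2)
```

The three calls reproduce the invariants at points (3), (4) and (5) of the two conditional fragments.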
18Meet and join are the basic primitives for
compilation
- $I_1 = [0,0]$ since program point (1) immediately follows the i := 0
- $I_2 = (I_1 \sqcup I_3) \sqcap [-\infty, 15]$ since
- control from program points (1) and (3) flows into (2)
- point (2) is reached only if i < 16 holds
- $I_3 = \{n+1 \mid n \in I_2\}$ since (3) is only reachable from (2) via the increment
- $I_4 = (I_1 \sqcup I_3) \sqcap [16, \infty]$ since
- control from (1) and (3) flows into (4)
- point (4) is reached only if $\neg(i < 16)$ holds
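The four equations can be executed directly. The following Python sketch, under the assumption that `None` represents bottom and pairs represent intervals, re-evaluates the equations in order until nothing changes; the helper names `join`, `meet`, `shift` and `solve` are illustrative only.

```python
import math


def join(d1, d2):
    """Interval join; None is bottom."""
    if d2 is None:
        return d1
    if d1 is None:
        return d2
    return (min(d1[0], d2[0]), max(d1[1], d2[1]))


def meet(d1, d2):
    """Interval meet; an empty intersection is bottom."""
    if d1 is None or d2 is None:
        return None
    l, u = max(d1[0], d2[0]), min(d1[1], d2[1])
    return (l, u) if l <= u else None


def shift(d, k):
    """Abstract counterpart of adding the constant k."""
    return None if d is None else (d[0] + k, d[1] + k)


def solve():
    """Iterate the equations for I1..I4 from bottom until they stabilise."""
    I1 = I2 = I3 = I4 = None
    while True:
        old = (I1, I2, I3, I4)
        I1 = (0, 0)
        I2 = meet(join(I1, I3), (-math.inf, 15))
        I3 = shift(I2, 1)
        I4 = meet(join(I1, I3), (16, math.inf))
        if (I1, I2, I3, I4) == old:
            return old


print(solve())  # ((0, 0), (0, 15), (1, 16), (16, 16))
```

The result agrees with the invariants annotated on the Pascal-like program. Iteration terminates here only because the loop bound caps the growth of the intervals.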
19Interval iteration
20Jacobi versus Gauss-Seidel iteration
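- With Jacobi, the new vector $\langle I_1', I_2', I_3', I_4'\rangle$ of intervals is calculated from the old $\langle I_1, I_2, I_3, I_4\rangle$
- With Gauss-Seidel iteration
- $I_1'$ is calculated from $\langle I_1, I_2, I_3, I_4\rangle$
- $I_2'$ is calculated from $\langle I_1', I_2, I_3, I_4\rangle$
- $I_3'$ is calculated from $\langle I_1', I_2', I_3, I_4\rangle$
- $I_4'$ is calculated from $\langle I_1', I_2', I_3', I_4\rangle$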
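To contrast the two strategies, here is a hedged Python sketch that runs both on the interval equations for the incrementing loop. The encoding (a list of functions over the vector `[I1, I2, I3, I4]`, with `None` as bottom) and the names `jacobi` and `gauss_seidel` are assumptions made for the example.

```python
import math


def join(d1, d2):
    if d2 is None:
        return d1
    if d1 is None:
        return d2
    return (min(d1[0], d2[0]), max(d1[1], d2[1]))


def meet(d1, d2):
    if d1 is None or d2 is None:
        return None
    l, u = max(d1[0], d2[0]), min(d1[1], d2[1])
    return (l, u) if l <= u else None


def shift(d, k):
    return None if d is None else (d[0] + k, d[1] + k)


# One right-hand side per equation; each reads the vector v = [I1, I2, I3, I4].
EQUATIONS = [
    lambda v: (0, 0),
    lambda v: meet(join(v[0], v[2]), (-math.inf, 15)),
    lambda v: shift(v[1], 1),
    lambda v: meet(join(v[0], v[2]), (16, math.inf)),
]


def jacobi(eqs):
    """Every component of the new vector is computed from the old vector."""
    vec, sweeps = [None] * len(eqs), 0
    while True:
        new = [f(vec) for f in eqs]
        sweeps += 1
        if new == vec:
            return vec, sweeps
        vec = new


def gauss_seidel(eqs):
    """Each component is updated in place, so later equations see fresh values."""
    vec, sweeps = [None] * len(eqs), 0
    while True:
        old = list(vec)
        for k, f in enumerate(eqs):
            vec[k] = f(vec)
        sweeps += 1
        if vec == old:
            return vec, sweeps


jacobi_result, jacobi_sweeps = jacobi(EQUATIONS)
gs_result, gs_sweeps = gauss_seidel(EQUATIONS)
print(jacobi_result == gs_result, gs_sweeps < jacobi_sweeps)  # True True
```

Both strategies reach the same solution; Gauss-Seidel needs fewer sweeps on this system because each update is visible to the equations that follow it within the same sweep.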
21Gauss-Seidel versus chaotic iteration
- Observe that $I_4$ might change if either $I_1$ or $I_3$ change, hence evaluate $I_4$ after $I_1$ and $I_3$ stabilise
- This suggests waiting until stability is achieved at one level before starting on the next
[Figure: dependency graph over $I_1$, $I_2$, $I_3$, $I_4$ and its grouping into the levels $I_1$; $I_2, I_3$; $I_4$]
22Gauss-Seidel versus chaotic iteration
- Chaotic iteration can postpone evaluating $I_i$ for a bounded number of iterations
- $I_1$ is calculated from $\langle I_1, -, -, -\rangle$
- $I_2$ and $I_3$ are calculated Gauss-Seidel style from $\langle I_1, I_2, I_3, -\rangle$
- $I_4$ is calculated from $\langle I_1, I_2, I_3, I_4\rangle$
- Fast and (incremental) fixpoint solvers [TOPLAS, 22(2):187-223, 2000] apply chaotic iteration
23Research challenge
- Compiling to equations and iteration is well-understood (albeit not well-known)
- The implicit assumption is that source is available
- With the advent of component and multi-linguistic programming, the problem is how to generate the equations from
- A specification of the algorithm or the API
- The types of the algorithm or component
- In the interim, environments with support for modularity either
- Equip the programmer with an equation language
- Or make worst-case assumptions about behaviour
24Suppose i was decremented rather than incremented
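    begin
      i := 0;              {1: i ∈ [0,0]}
      while (i < 16) do    {2: i ∈ [-∞,0]}
        i := i - 1         {3: i ∈ [-∞,-1]}
    end                    {4: i ∈ ⊥}

- $I_1 = [0,0]$
- $I_2 = (I_1 \sqcup I_3) \sqcap [-\infty, 15]$
- $I_3 = \{n-1 \mid n \in I_2\}$
- $I_4 = (I_1 \sqcup I_3) \sqcap [16, \infty]$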
25Ascending chain condition
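- A domain D is ACC iff it does not contain an infinite strictly increasing chain $d_1 \sqsubset d_2 \sqsubset d_3 \sqsubset \ldots$ where $d \sqsubset d'$ iff $d \sqsubseteq d'$ and $d \neq d'$ (see below)
- The interval domain D is ordered by
- $\bot \sqsubseteq d$ for all $d \in D$ and
- $[l_1,u_1] \sqsubseteq [l_2,u_2]$ iff $l_2 \le l_1 \le u_1 \le u_2$
- and is not ACC since $[0,0] \sqsubset [-1,0] \sqsubset [-2,0] \sqsubset \ldots$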
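[Figure: the interval domain drawn with $\top$ at the top and $\bot$ at the bottom, over the number line -4, -3, -2, -1, 0, 1, 2, 3, 4]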
26Some very expressive relational domains are ACC
- Sub-expression elimination relies on detecting duplicated expression evaluation
- Karr [Acta Informatica, 6, 133-151] noticed that detecting an invariance such as
- $y = x/2 + 7$ was key to this optimisation

    begin
      x := sin(a) * 2;
      y := sin(a) + 7;
    end
27The affine domain
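- The domain of affine equations over n variables is
- $D = \{\langle A,B\rangle \mid A \text{ is an } m \times n \text{ dimensional matrix and } B \text{ is an } m \text{ dimensional column vector}\}$
- D is ordered by
- $\langle A_1,B_1\rangle \sqsubseteq \langle A_2,B_2\rangle$ iff (if $A_1 x = B_1$ then $A_2 x = B_2$)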
28Pre-orders versus posets
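- A pre-order $\langle D, \sqsubseteq\rangle$ is a set D ordered by a binary relation $\sqsubseteq$ such that
- $d \sqsubseteq d$ for all $d \in D$
- If $d_1 \sqsubseteq d_2$ and $d_2 \sqsubseteq d_3$ then $d_1 \sqsubseteq d_3$
- A poset is a pre-order $\langle D, \sqsubseteq\rangle$ such that
- If $d_1 \sqsubseteq d_2$ and $d_2 \sqsubseteq d_1$ then $d_1 = d_2$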
29The affine domain is a pre-order (so it is not
ACC)
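- Observe that $\langle A_1,B_1\rangle \sqsubseteq \langle A_2,B_2\rangle$ and $\langle A_2,B_2\rangle \sqsubseteq \langle A_1,B_1\rangle$ can both hold even though $\langle A_1,B_1\rangle \neq \langle A_2,B_2\rangle$
- [Figure: example matrices $A_1$, $B_1$, $A_2$, $B_2$]
- To build a poset from a pre-order
- define $d \equiv d'$ iff $d \sqsubseteq d'$ and $d' \sqsubseteq d$
- define $[d]_\equiv = \{d' \in D \mid d \equiv d'\}$ and $D_\equiv = \{[d]_\equiv \mid d \in D\}$
- define $[d]_\equiv \sqsubseteq [d']_\equiv$ iff $d \sqsubseteq d'$
- The poset $\langle D_\equiv, \sqsubseteq\rangle$ is ACC since chain length is bounded by the number of variables n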
30Inducing termination for non-ACC (and huge ACC)
domains
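- Enforce convergence for intervals with a widening operator $\nabla : D \times D \to D$
- $\bot \nabla d = d$
- $d \nabla \bot = d$
- $[l_1,u_1] \nabla [l_2,u_2] = [\text{if } l_2 < l_1 \text{ then } -\infty \text{ else } l_1,\ \text{if } u_1 < u_2 \text{ then } \infty \text{ else } u_1]$
- Examples
- $[1,2] \nabla [1,2] = [1,2]$
- $[1,2] \nabla [1,3] = [1,\infty]$ but $[1,3] \nabla [1,2] = [1,3]$
- Safe since $[l_i,u_i] \sqsubseteq ([l_1,u_1] \nabla [l_2,u_2])$ for $i \in \{1,2\}$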
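The widening operator above is short enough to transcribe into Python. In this sketch `None` stands for bottom and `math.inf` for an infinite bound; both are representation choices made for the example rather than anything fixed by the slides.

```python
import math


def widen(d1, d2):
    """Interval widening: any bound that d2 moves outwards jumps to infinity."""
    if d1 is None:
        return d2
    if d2 is None:
        return d1
    (l1, u1), (l2, u2) = d1, d2
    lower = -math.inf if l2 < l1 else l1
    upper = math.inf if u1 < u2 else u1
    return (lower, upper)


print(widen((1, 2), (1, 2)))  # (1, 2)
print(widen((1, 2), (1, 3)))  # (1, inf)
print(widen((1, 3), (1, 2)))  # (1, 3)
```

The last two calls show that widening is not commutative: the first argument is the previous iterate and the second is the new one.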
31Chaotic iteration with widening
- To terminate it is necessary to traverse each loop a finite number of times
- It is sufficient to pass through $I_2$ or $I_3$ a finite number of times [Bourdoncle, 1990]
- Thus widen at $I_3$ since it is simpler
[Figure: dependency graph over $I_1$, $I_2$, $I_3$, $I_4$]
32Termination for the decrement
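- $I_1 = [0,0]$
- $I_2 = (I_1 \sqcup I_3) \sqcap [-\infty, 15]$
- $I_3 = I_3 \nabla \{n-1 \mid n \in I_2\}$ (note the fix)
- $I_4 = (I_1 \sqcup I_3) \sqcap [16, \infty]$
- When $I_2 = [-1,0]$ and $I_3 = [-1,-1]$, then
- $I_3 \nabla \{n-1 \mid n \in I_2\} = [-1,-1] \nabla [-2,-1] = [-\infty,-1]$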
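The effect of the fix can be observed by running the equations for the decrementing loop. The Python below is a sketch under the same assumed representation as before (`None` for bottom, pairs for intervals, `math.inf` for infinite bounds); the function names are hypothetical.

```python
import math


def join(d1, d2):
    if d2 is None:
        return d1
    if d1 is None:
        return d2
    return (min(d1[0], d2[0]), max(d1[1], d2[1]))


def meet(d1, d2):
    if d1 is None or d2 is None:
        return None
    l, u = max(d1[0], d2[0]), min(d1[1], d2[1])
    return (l, u) if l <= u else None


def shift(d, k):
    return None if d is None else (d[0] + k, d[1] + k)


def widen(d1, d2):
    if d1 is None:
        return d2
    if d2 is None:
        return d1
    return (-math.inf if d2[0] < d1[0] else d1[0],
            math.inf if d1[1] < d2[1] else d1[1])


def solve_decrement():
    """Iterate the equations for the decrementing loop, widening at I3."""
    I1 = I2 = I3 = I4 = None
    while True:
        old = (I1, I2, I3, I4)
        I1 = (0, 0)
        I2 = meet(join(I1, I3), (-math.inf, 15))
        I3 = widen(I3, shift(I2, -1))  # the fix: widen the old I3 with the new value
        I4 = meet(join(I1, I3), (16, math.inf))
        if (I1, I2, I3, I4) == old:
            return old


print(solve_decrement())  # ((0, 0), (-inf, 0), (-inf, -1), None)
```

Without the call to `widen` the lower bound of `I3` would decrease by one on every sweep and the loop would never stabilise; with it the iteration stops after a handful of sweeps, and `I4` stays at bottom because the loop is never exited.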
33Widening dynamic data-structures
    begin
      i := 0;
      p := nil;
      while (i < 16) do
        i := i + 1;
        p := new cons(i, p)    {1: p ∈ cons(i, cons(0, nil))}
    end

[Figure: type graphs built from cons, or, 0, 1, 2 and nil nodes describing p on successive iterations of the loop]
34Depth-2 versus type-graph widening
[Figure: depth-2 widening versus type-graph widening of the list type graph; the graphs are built from cons, or, 0, 1, 2, nil and any nodes]
- Type-graph widening is more compact
- Type-graph widening becomes difficult when a list contains lists as its elements
- In constraint-based analysis, widening is dispensed with altogether
35(Malicious) research challenge
- Read a survey paper to find an abstract domain that is ACC but has a maximal chain length of $O(2^n)$
- Construct a program with $O(n)$ symbols that iterates through all $O(2^n)$ abstractions
- Publish the program in IPL
36Not all numeric domains are convex
- A set $S \subseteq \mathbb{R}^n$ is convex iff for all $x,y \in S$ it follows that $\{\lambda x + (1-\lambda)y \mid 0 \le \lambda \le 1\} \subseteq S$
- The 2 leftmost sets in $\mathbb{R}^2$ are convex but the 2 rightmost sets are not.
37Are intervals or affine equations convex?
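- Suppose the values of n variables are represented by n intervals $[l_1,u_1], \ldots, [l_n,u_n]$
- Suppose $x = \langle x_1,\ldots,x_n\rangle$, $y = \langle y_1,\ldots,y_n\rangle \in \mathbb{R}^n$ are described by the intervals
- Then each $l_i \le x_i \le u_i$ and each $l_i \le y_i \le u_i$
- Let $0 \le \lambda \le 1$ and observe $z = \lambda x + (1-\lambda)y = \langle \lambda x_1 + (1-\lambda)y_1, \ldots, \lambda x_n + (1-\lambda)y_n\rangle$
- Therefore $l_i \le \min(x_i, y_i) \le \lambda x_i + (1-\lambda)y_i \le \max(x_i, y_i) \le u_i$ and convexity follows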
38Arithmetic congruences are not convex
- Elements of the arithmetic congruence (AC) domain take the form $x + 2y \equiv 1 \pmod 3$ which describes integral values of x and y
- More exactly, the AC domain consists of conjunctions of equations of the form
- $c_1 x_1 + \ldots + c_m x_m \equiv c \pmod n$ where $c_i, c \in \mathbb{Z}$ and $n \in \mathbb{N}$
- Incredibly AC is ACC [IJCM, 30, 165-190, 1989]
39Research challenge
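- Søndergaard [FSTTCS'95] introduced the concept of an immediate fixpoint
- Consider the following (groundness) dependency equations over the domain of Boolean functions $\langle Bool, \models, \lor\rangle$
- $f_1 = x \land (y \leftrightarrow z)$
- $f_2 = \exists t(\exists x(\exists z(u \leftrightarrow (t \land x) \land v \leftrightarrow (t \land z) \land f_4)))$
- $f_3 = \exists u(\exists v(x \leftrightarrow u \land z \leftrightarrow v \land f_2))$
- $f_4 = f_1 \lor f_3$
- Where $\exists x(f) = f[x \mapsto true] \lor f[x \mapsto false]$ thus $\exists x(x \lor y) = true$ and $\exists x(x \land y) = y$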
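The quantifier elimination rule used in these equations can be tried out on small Boolean functions. In the Python sketch below a Boolean function is modelled, purely as an assumption for illustration, by a callable over an environment (a dict from variable names to truth values); `exists` and `equivalent` are hypothetical helper names.

```python
from itertools import product


def exists(x, f):
    """Eliminate x from f: f with x set to true, or f with x set to false."""
    return lambda env: f({**env, x: True}) or f({**env, x: False})


def equivalent(f, g, variables):
    """Compare two Boolean functions by enumerating their truth table."""
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if f(env) != g(env):
            return False
    return True


disjunction = lambda env: env["x"] or env["y"]
conjunction = lambda env: env["x"] and env["y"]

# Eliminating x from (x or y) leaves true; eliminating x from (x and y) leaves y.
print(equivalent(exists("x", disjunction), lambda env: True, ["y"]))      # True
print(equivalent(exists("x", conjunction), lambda env: env["y"], ["y"]))  # True
```

Truth-table enumeration is exponential in the number of variables, so this is only suitable for checking small examples by hand.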
40The alternative tactic
- The standard tactic is to apply iteration
- Søndergaard found that the system can be solved symbolically (like a quadratic)
- This would be very useful for infinite domains for improved precision and predictability
41Combining analyses
- Verifiers and optimisers are often multi-pass, built from several separate analyses
- Should the analyses be performed in parallel or in sequence?
- Analyses can interact to improve one another (the problem is in the complexity of the interaction [Pratt])
42Pruning combined domains
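- Suppose that $\propto_1 \subseteq D_1 \times C$ and $\propto_2 \subseteq D_2 \times C$, then how is $D = D_1 \times D_2$ interpreted?
- Then $\langle d_1,d_2\rangle \propto c$ iff $d_1 \propto_1 c \land d_2 \propto_2 c$
- Ideally, many $\langle d_1,d_2\rangle \in D$ will be redundant, that is, $\neg\exists c \in C \,.\, d_1 \propto_1 c \land d_2 \propto_2 c$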
43Time versus precision from TOPLAS 17(1):28-44, 1993
44The Galois framework
- Abstract interpretation is often presented in
terms of Galois connections
45Lattices a prelude to Galois connections
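- Suppose $\langle S, \sqsubseteq\rangle$ is a poset
- A mapping $\sqcup : S \times S \to S$ is a join (least upper bound) iff
- $a \sqcup b$ is an upper bound of a and b, that is, $a \sqsubseteq a \sqcup b$ and $b \sqsubseteq a \sqcup b$ for all $a,b \in S$
- $a \sqcup b$ is the least upper bound, that is, if $c \in S$ is an upper bound of a and b, then $a \sqcup b \sqsubseteq c$
- The definition of the meet $\sqcap : S \times S \to S$ (the greatest lower bound) is analogous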
46Complete lattices
- A lattice $\langle S, \sqsubseteq, \sqcup, \sqcap\rangle$ is a poset $\langle S, \sqsubseteq\rangle$ equipped with a join $\sqcup$ and a meet $\sqcap$
- The join concept can often be lifted to sets by defining $\bigsqcup : \wp(S) \to S$ iff
- $t \sqsubseteq (\bigsqcup T)$ for all $T \subseteq S$ and for all $t \in T$
- if $t \sqsubseteq s$ for all $t \in T$ then $(\bigsqcup T) \sqsubseteq s$
- If the meet can be lifted analogously, then the lattice is complete
- A lattice that contains a finite number of elements is always complete
47A lattice that is not complete
- A hyperplane in 2-d space is a line and in 3-d space is a plane
- A hyperplane in $\mathbb{R}^n$ is any space that can be defined by $\{x \in \mathbb{R}^n \mid c_1 x_1 + \ldots + c_n x_n = c\}$ where $c_1,\ldots,c_n,c \in \mathbb{R}$
- A halfspace in $\mathbb{R}^n$ is any space that can be defined by $\{x \in \mathbb{R}^n \mid c_1 x_1 + \ldots + c_n x_n \le c\}$
- A polyhedron is the intersection of a finite number of half-spaces
48Examples and non-examples in planar space
49Join for polyhedra
- Join of polyhedra $P_1$ and $P_2$ in $\mathbb{R}^n$ coincides with (the topological closure of) the convex hull of $P_1 \cup P_2$
50The join of an infinite set of polyhedra
- Consider the following infinite chain of regular polyhedra
- The smallest space that contains all these polyhedra is a circle, yet this is not polyhedral
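51$\langle A, \alpha, C, \gamma\rangle$ is a Galois connection whenever
- $\langle A, \sqsubseteq_A\rangle$ and $\langle C, \sqsubseteq_C\rangle$ are complete lattices
- The mappings $\alpha : C \to A$ and $\gamma : A \to C$ are monotonic, that is,
- If $c_1 \sqsubseteq_C c_2$ then $\alpha(c_1) \sqsubseteq_A \alpha(c_2)$
- If $a_1 \sqsubseteq_A a_2$ then $\gamma(a_1) \sqsubseteq_C \gamma(a_2)$
- The compositions $\gamma \circ \alpha : C \to C$ and $\alpha \circ \gamma : A \to A$ are extensive and reductive respectively, that is,
- $c \sqsubseteq_C (\gamma \circ \alpha)(c)$ for all $c \in C$
- $(\alpha \circ \gamma)(a) \sqsubseteq_A a$ for all $a \in A$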
52A classic Galois connection example
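- The concrete domain $\langle C, \sqsubseteq_C, \sqcup_C, \sqcap_C\rangle$ is $\langle \wp(\mathbb{Z}), \subseteq, \cup, \cap\rangle$
- The abstract domain $\langle A, \sqsubseteq_A, \sqcup_A, \sqcap_A\rangle$ where
- $A = \{\bot, +, -, \top\}$
- $\bot \sqsubseteq_A a \sqsubseteq_A \top$ for all $a \in A$
- join $\sqcup_A$ and meet $\sqcap_A$ are defined by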
53The relationship between A and C
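- The concretisation mapping $\gamma : A \to C$ is defined
- $\gamma(\bot) = \emptyset$
- $\gamma(+) = \{n \in \mathbb{Z} \mid n > 0\}$
- $\gamma(-) = \{n \in \mathbb{Z} \mid n < 0\}$
- $\gamma(\top) = \mathbb{Z}$
- The abstraction mapping $\alpha : C \to A$ is defined
- $\alpha(S) = \bot$ if $S = \emptyset$
- $\alpha(S) = +$ else if $n > 0$ for all $n \in S$
- $\alpha(S) = -$ else if $n < 0$ for all $n \in S$
- $\alpha(S) = \top$ otherwise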
54Avoiding repetition
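- Can define $\alpha$ with $\gamma$ and vice versa
- $\alpha(S) = \sqcap_A \{a \in A \mid S \subseteq \gamma(a)\}$
- And dually $\gamma(a) = \cup \{S \subseteq \mathbb{Z} \mid \alpha(S) \sqsubseteq_A a\}$
- As an example consider $\alpha(\{1,2\})$
- $\{1,2\} \subseteq \gamma(\top)$ holds
- $\{1,2\} \subseteq \gamma(+)$ holds
- $\{1,2\} \subseteq \gamma(-)$ fails
- $\{1,2\} \subseteq \gamma(\bot)$ fails
- Therefore $\alpha(\{1,2\}) = \sqcap_A \{+, \top\} = +$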
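The derivation of the abstraction map from the concretisation map can be checked mechanically on finite sets of integers. The Python sketch below encodes the four abstract elements as strings and represents the concretisation by a membership test, since the concretisations themselves are infinite sets; all names are assumptions introduced for the example.

```python
BOT, POS, NEG, TOP = "bot", "+", "-", "top"
ELEMENTS = (BOT, POS, NEG, TOP)


def leq(a, b):
    """The ordering on A: bot is below everything and everything is below top."""
    return a == b or a == BOT or b == TOP


def meet(a, b):
    """Greatest lower bound in A; + and - only have bot below both."""
    if leq(a, b):
        return a
    if leq(b, a):
        return b
    return BOT


def in_gamma(n, a):
    """Membership test for the (possibly infinite) concretisation of a."""
    return {BOT: False, POS: n > 0, NEG: n < 0, TOP: True}[a]


def alpha(S):
    """Abstraction map, transcribed from its direct definition."""
    if not S:
        return BOT
    if all(n > 0 for n in S):
        return POS
    if all(n < 0 for n in S):
        return NEG
    return TOP


def alpha_via_gamma(S):
    """Abstraction map derived as the meet of every a whose concretisation contains S."""
    result = TOP
    for a in ELEMENTS:
        if all(in_gamma(n, a) for n in S):
            result = meet(result, a)
    return result


print(alpha({1, 2}), alpha_via_gamma({1, 2}))  # + +
```

For the set {1, 2} only + and top qualify, and their meet is +, matching the worked example.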
55Collecting domains and semantics
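- Observe that C is not that concrete: programs include operations such as $* : \mathbb{Z} \times \mathbb{Z} \to \mathbb{Z}$
- $C = \wp(\mathbb{Z})$ is a collecting domain which is easier to abstract than $\mathbb{Z}$ since it is already a lattice
- To abstract $* : \mathbb{Z} \times \mathbb{Z} \to \mathbb{Z}$, say, we synthesise a collecting version $*_C : \wp(\mathbb{Z}) \times \wp(\mathbb{Z}) \to \wp(\mathbb{Z})$ and then abstract that
- Put $S_1 *_C S_2 = \{n_1 * n_2 \mid n_1 \in S_1 \text{ and } n_2 \in S_2\}$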
56Safety and optimality requirements
- Safety requires $\alpha(\gamma(a_1) *_C \gamma(a_2)) \sqsubseteq_A a_1 *_A a_2$ for all $a_1,a_2 \in A$
- Optimality [POPL, 269-282, 1979] also requires $a_1 *_A a_2 \sqsubseteq_A \alpha(\gamma(a_1) *_C \gamma(a_2))$
- Arguing optimality is harder than safety since rare-case approximation can simplify a tricky argument [JLP]
57Abstract multiplication
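- Consider safety for $\alpha(\gamma(+) *_C \gamma(+)) \sqsubseteq_A + *_A +$
- Recall $\gamma(+) = \{n \in \mathbb{Z} \mid n > 0\}$
- Thus $\gamma(+) *_C \gamma(+) = \{n_1 n_2 \mid n_1, n_2 > 0\}$
- Hence $\alpha(\gamma(+) *_C \gamma(+)) \sqsubseteq_A + *_A +$
- Need $+ *_A + \sqsubseteq_A \alpha(\gamma(+) *_C \gamma(+))$ for optimality
- Recall $\alpha(\gamma(+) *_C \gamma(+)) \sqsubseteq_A + *_A +$
- Hence $\alpha(\gamma(+) *_C \gamma(+)) \in \{\bot, +\}$
- But $\gamma(+) \neq \emptyset$, thus $\gamma(+) *_C \gamma(+) \neq \emptyset$
- Therefore $\alpha(\gamma(+) *_C \gamma(+)) \neq \bot$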
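A sampling-based sanity check of the safety requirement can be written in Python. Two caveats apply: the table for abstract multiplication in `mul_abs` is an assumption (the usual rule of signs, since the slides do not spell the table out), and the check uses finite sample subsets of each concretisation, so it illustrates safety rather than proving it.

```python
BOT, POS, NEG, TOP = "bot", "+", "-", "top"
ELEMENTS = (BOT, POS, NEG, TOP)


def leq(a, b):
    """The ordering on the sign domain."""
    return a == b or a == BOT or b == TOP


def alpha(S):
    """Abstraction of a finite set of integers."""
    if not S:
        return BOT
    if all(n > 0 for n in S):
        return POS
    if all(n < 0 for n in S):
        return NEG
    return TOP


def mul_collect(S1, S2):
    """Collecting multiplication: all products of an element of S1 with one of S2."""
    return {n1 * n2 for n1 in S1 for n2 in S2}


def mul_abs(a1, a2):
    """Assumed abstract multiplication table (rule of signs)."""
    if a1 == BOT or a2 == BOT:
        return BOT
    if a1 == TOP or a2 == TOP:
        return TOP
    return POS if a1 == a2 else NEG


# Finite samples drawn from the concretisation of each abstract element.
SAMPLES = {BOT: set(), POS: {1, 2, 7}, NEG: {-1, -3}, TOP: {-2, 0, 5}}

safe = all(
    leq(alpha(mul_collect(SAMPLES[a1], SAMPLES[a2])), mul_abs(a1, a2))
    for a1 in ELEMENTS
    for a2 in ELEMENTS
)
print(safe)  # True
```

A full argument, as on the slide, reasons about the infinite concretisations directly rather than about samples.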
58Exotic applications of abstract interpretation
- Recovering programmer intentions for understanding undocumented or third-party code
- Verifying that a buffer overflow cannot occur, or pin-pointing where one might occur in a C program
- Inferring the environment in which a system of synchronising agents will not deadlock
- Lower-bound time-complexity analysis for granularity throttling
- Binding-time analysis for inferring off-line unfolding decisions which avoid code-bloat
59Pointers to the literature
- SAS, POPL, ESOP, ICLP, ICFP, …
- Useful review articles and books
- Patrick and Radhia Cousot, Comparing the Galois connection and Widening/Narrowing approaches to Abstract Interpretation, PLILP, LNCS 631, 269-295, 1992. Available from LIX library.
- Patrick and Radhia Cousot, Abstract interpretation and Application to Logic Programs, JLP, 13(2-3):103-179, 1992
- Flemming Nielson, Hanne Riis Nielson and Chris Hankin, Principles of Program Analysis, Springer, 1999.
- Patrick has a database of abstract interpretation researchers and regularly writes tutorials, see CC'02.
60Appendix SAT solving
- SAT is not a form of abstract interpretation but abstraction and abstract interpretation are often used to reduce a verification problem to a satisfiability checking problem
- Acknowledgments: much of this material is adapted from the review article, The Quest for Efficient Boolean Satisfiability Solvers by Zhang and Malik, 2002.
61The SAT problem
- Given an arbitrary propositional formula, f say, does there exist a variable assignment (a model) under which f evaluates to true?
- One model for $f = (x \lor y)$ is $\{x \mapsto true, y \mapsto true\}$
- SAT is the stereotypical NP-complete problem but this does not preclude the existence of efficient SAT algorithms for certain SAT instances
- Stålmarck [US Patent N527689, 1995] and applications in AI planning, software verification and circuit testing have promoted a resurgence of interest in SAT
62The other type of completeness
- A SAT algorithm is said to be complete iff (given enough resource) it will either
- compute a satisfying variable assignment or
- verify that no such assignment exists
- A SAT algorithm is incomplete (stochastic) iff unsatisfiability cannot always be detected
- Trade incompleteness for speed when a solution is very likely to exist (planning applications).
- In program verification (partial) correctness often follows by proving unsatisfiability
63The Davis-Logemann-Loveland (DPLL) approach
- 1st generation solvers such as POSIT, 2cl, CSAT, etc. are based on DPLL, as are the 2nd generation solvers such as SATO and zChaff which tune DPLL
- Davis and Putnam [JACM, 7:201-215, 1960] proposed resolution for Boolean SAT; DLL [CACM, 5:394-397, 1962] replaced resolution with search to improve memory usage (special case)
- CNF is used to simplify unsatisfiability checking; conversion is polynomial [JSC, 2, 293-304, 1986]
- CNF is a conjunction of clauses, for example, $(x \leftrightarrow y) = (x \leftarrow y) \land (y \leftarrow x) = (x \lor \neg y) \land (\neg x \lor y)$
64The Davis-Logemann-Loveland (DPLL) algorithm

    bool function DPLL(f, θ)
    begin
      ⟨fail, θ⟩ := unit(f, θ);
      if (fail) return false;
      if (satisfied(f, θ)) return true
      else if (unsatisfied(f, θ)) return false
      else begin
        let x ∈ var(f) - var(θ);
        if (DPLL(f, θ ∪ {x ↦ true})) return true
        else return DPLL(f, θ ∪ {x ↦ false})
      end
    end

- unit applies unit propagation, possibly detecting unsatisfiability
- satisfied returns true if one literal in each clause is true
- unsatisfied returns true if there exists one clause with every literal false
- non-determinacy is in the choice of variable
- stack for search
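The recursive search can be sketched in runnable Python. The encoding is an assumption borrowed from the common DIMACS convention: a formula is a list of clauses, a clause is a list of non-zero integers, and a negative integer denotes a negated variable. The function returns a model (a dict) rather than a Boolean, the unsatisfied test is folded into `unit`, and the branching variable is simply the smallest unassigned one.

```python
def unit(clauses, theta):
    """Unit propagation; returns (fail, extended assignment)."""
    theta = dict(theta)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(theta.get(abs(lit)) == (lit > 0) for lit in clause):
                continue  # the clause already has a true literal
            free = [lit for lit in clause if abs(lit) not in theta]
            if not free:
                return True, theta  # every literal is false
            if len(free) == 1:
                theta[abs(free[0])] = free[0] > 0  # forced by the unit clause rule
                changed = True
    return False, theta


def satisfied(clauses, theta):
    """True if every clause contains at least one true literal."""
    return all(any(theta.get(abs(lit)) == (lit > 0) for lit in clause)
               for clause in clauses)


def dpll(clauses, theta=None):
    """Return a satisfying assignment as a dict, or None if none exists."""
    fail, theta = unit(clauses, theta or {})
    if fail:
        return None
    if satisfied(clauses, theta):
        return theta
    variables = {abs(lit) for clause in clauses for lit in clause}
    x = min(variables - set(theta))  # the non-deterministic choice, made naively
    model = dpll(clauses, {**theta, x: True})
    if model is not None:
        return model
    return dpll(clauses, {**theta, x: False})


print(dpll([[1, -2], [-1, 2]]))  # {1: True, 2: True}
print(dpll([[1], [-1]]))         # None
```

The first call solves the CNF form of x ↔ y with x numbered 1 and y numbered 2; the Python call stack plays the role of the search stack.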
65Unit propagation
- Unit clause rule: if all the literals but one are false, then the remainder is set to true
- Many SAT solvers use a counter scheme [Crawford, AAAI, 1993] that uses
- One counter per clause to track the number of false literals in each clause
- If a count reaches the total number of literals, then unsatisfiability has been detected
- Otherwise if it is one less then the remaining literal is set
- Each assignment updates many counts, and pointer-based schemes are used within SATO and zChaff [Gu et al, DIMACS series DMTCS, 1997]
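A simplified sketch of the counter idea in Python follows. It is written in the spirit of the scheme described above rather than as a reproduction of any particular solver; clauses are assumed to be non-empty lists of non-zero integers without repeated literals, and `propagate` is a hypothetical name.

```python
from collections import defaultdict


def propagate(clauses, initial):
    """Counter-based unit propagation.

    `initial` lists literals to make true. Returns (conflict, assignment).
    """
    occurs = defaultdict(list)  # literal -> indices of the clauses containing it
    for i, clause in enumerate(clauses):
        for lit in clause:
            occurs[lit].append(i)
    false_count = [0] * len(clauses)  # one counter per clause
    theta = {}
    queue = list(initial) + [c[0] for c in clauses if len(c) == 1]
    while queue:
        lit = queue.pop()
        var, val = abs(lit), lit > 0
        if var in theta:
            if theta[var] != val:
                return True, theta  # asked to set a variable both ways
            continue
        theta[var] = val
        for i in occurs[-lit]:  # the opposite literal has just become false
            false_count[i] += 1
            clause = clauses[i]
            if any(theta.get(abs(l)) == (l > 0) for l in clause):
                continue  # the clause is already true
            if false_count[i] == len(clause):
                return True, theta  # every literal is false
            if false_count[i] == len(clause) - 1:
                # exactly one literal is left unassigned, so it is forced
                queue.append(next(l for l in clause if abs(l) not in theta))
    return False, theta


print(propagate([[1, 2], [-2, 3]], [-1]))  # (False, {1: False, 2: True, 3: True})
print(propagate([[1], [-1]], []))
```

Each assignment touches every clause in which the falsified literal occurs, which is the cost that pointer-based schemes aim to reduce.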
66Choices, choices
- If variables remain uninstantiated after propagation, then resort to random binding
- Better to rank variables by the number of times they occur in clauses which are not (yet) true
- But a variable in 128 clauses each with 2 uninstantiated variables is a better candidate than another in 128 clauses each with 32 uninstantiated variables
- But what about the overhead of ranking, especially with learnt clauses?
- But what about trailing for backtracking?
- But what about intelligent back-jumping?