Title: Abstraction and Approximation via Abstract Interpretation
1. Abstraction and Approximation via Abstract Interpretation
Abstract interpretation
- a systematic approach to program analysis and verification
- Giorgio Levi
- Dipartimento di Informatica, Università di Pisa
- levi_at_di.unipi.it
- http://www.di.unipi.it/levi.html
2. Abstraction and approximation
Abstract interpretation
- two relevant concepts in several areas of computer science (and engineering)
- to reason about complex systems
- to make reasoning computationally feasible
3. Abstract Interpretation (Cousot & Cousot, POPL 77 & 79)
Abstract interpretation
- a 20-year-old technique to systematically handle abstraction and approximation
- born to describe (and prove correct) static analyses (for imperative programs)
- popular mainly in declarative paradigms
- viewed today as a general technique to reason about semantics at different levels of abstraction
- successfully applied to distributed and mobile systems and to model checking
- recently applied to program verification
4. Abstract Interpretation, Semantics, Analysis Algorithms
Abstract interpretation
- how abstract interpretation is often used in static program analysis
  - a semantics
  - an analysis algorithm developed by ad-hoc techniques
  - the A.I. theory (definition of an abstract domain) is used to prove that the algorithm is correct, i.e., that its results are an approximation of the property to be analyzed
5. Abstract Interpretation, Semantics, Analysis Algorithms
Abstract interpretation
- the abstract interpretation I like
  - a semantics
  - an abstract domain designed to model the property to be analyzed
  - the A.I. theory is used to systematically derive the abstract semantics
  - the analysis algorithm is exactly the computation of the abstract semantics and is correct by construction
6. Abstract Interpretation Theory in 4 Steps
Abstract interpretation
- concrete and abstract domain
- the Galois insertion
- abstract operations
- from the concrete to the abstract semantics
7. Concrete and Abstract Domains
Abstract interpretation
- two complete partial orders
  - the partial orders reflect precision
  - smaller is better
- concrete domain (C, ⊆)
  - C has the structure of a powerset
- abstract domain (A, ≤)
  - each abstract value is a description of a set of concrete values
8. The Sign Abstract Domain
Abstract Domain
- concrete domain (P(Z), ⊆)
  - sets of integers
- abstract domain (Sign, ≤)
9. Galois insertions
Galois insertion
- (C, ⊆), (A, ≤)
- γ : A → C (concretization)
- α : C → A (abstraction)
- α, γ monotonic
- ∀x ∈ C. x ⊆ γ(α(x))
- ∀y ∈ A. α(γ(y)) = y
- α, γ mutually determine each other
10. The sign example
Galois insertion
- α_sign(y) = glb of
  - bot, if y = ∅
  - -, if y ⊆ {z | z < 0}
  - 0-, if y ⊆ {z | z ≤ 0}
  - 0, if y = {0}
  - 0+, if y ⊆ {z | z ≥ 0}
  - +, if y ⊆ {z | z > 0}
  - top, if y ⊆ Z
- γ_sign(x) =
  - ∅, if x = bot
  - {z | z > 0}, if x = +
  - {z | z ≥ 0}, if x = 0+
  - {0}, if x = 0
  - {z | z ≤ 0}, if x = 0-
  - {z | z < 0}, if x = -
  - Z, if x = top
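The two Galois insertion conditions can be checked mechanically for the sign domain. The sketch below is only an illustration: it assumes a finite window of integers (`WINDOW`, a hypothetical stand-in for Z) so that every concretization is a finite set, and the names `GAMMA`, `alpha` and `subsets` are not from the slides.

```python
from itertools import combinations

WINDOW = range(-2, 3)  # assumption: finite stand-in for Z, so gamma yields finite sets

# gamma_sign, restricted to the window
GAMMA = {
    "bot": frozenset(),
    "-": frozenset(z for z in WINDOW if z < 0),
    "0-": frozenset(z for z in WINDOW if z <= 0),
    "0": frozenset({0}),
    "0+": frozenset(z for z in WINDOW if z >= 0),
    "+": frozenset(z for z in WINDOW if z > 0),
    "top": frozenset(WINDOW),
}

def alpha(ys):
    """alpha_sign: the glb of all abstract values whose concretization contains ys."""
    ys = frozenset(ys)
    candidates = [x for x, g in GAMMA.items() if ys <= g]
    # the concretizations are closed under intersection, so the glb is the
    # candidate with the smallest concretization
    return min(candidates, key=lambda x: len(GAMMA[x]))

def subsets(xs):
    xs = list(xs)
    return (frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r))

# x ⊆ gamma(alpha(x)) for every concrete x in the window
extensive = all(ys <= GAMMA[alpha(ys)] for ys in subsets(WINDOW))
# alpha(gamma(y)) = y for every abstract y
insertion = all(alpha(GAMMA[x]) == x for x in GAMMA)
print(extensive, insertion)
```

Both checks range over the window only, so they illustrate the conditions rather than prove them for all of Z.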
11. Abstract Operations
Abstract operations
- the concrete semantic evaluation function is defined in terms of primitive semantic operations f_i on C
- for each f_i we need to provide a corresponding f_i^α defined on A
  - f_i^α must be locally correct, i.e., ∀x1,..,xn ∈ C. f_i(x1,..,xn) ⊆ γ(f_i^α(α(x1),..,α(xn)))
  - the optimal (most precise) abstract operator is f_i^α(y1,..,yn) = α(f_i(γ(y1),..,γ(yn)))
  - the operator is complete (precise) if ∀x1,..,xn ∈ C. α(f_i(x1,..,xn)) = f_i^α(α(x1),..,α(xn))
12. Times_sign
Abstract operations
13. Plus_sign
Abstract operations
14. The Sign example
Abstract operations
- Times and Plus are the usual operations lifted to P(Z)
- both Times_sign and Plus_sign are optimal (hence correct)
- Times_sign is also complete (no approximation)
- Plus_sign is necessarily incomplete
- α_sign(Times({2},{-3})) = Times_sign(α_sign({2}), α_sign({-3}))
- α_sign(Plus({2},{-3})) ≤ Plus_sign(α_sign({2}), α_sign({-3}))
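The two comparisons above can be replayed with a small executable model of the sign domain. This is a sketch under assumptions: an abstract value is represented as the set of sign classes (negative, zero, positive) it allows, and the abstract operators are obtained by applying the operation class by class. The representation and the names `close`, `lift`, `times_sign` and `plus_sign` are illustrative choices, not taken from the slides.

```python
from itertools import product

NEG, ZERO, POS = "neg", "zero", "pos"
TOP = frozenset({NEG, ZERO, POS})
# the seven elements of Sign, each represented by the sign classes it allows
NAME = {frozenset(): "bot", frozenset({NEG}): "-", frozenset({NEG, ZERO}): "0-",
        frozenset({ZERO}): "0", frozenset({ZERO, POS}): "0+",
        frozenset({POS}): "+", TOP: "top"}

def close(classes):
    """Smallest element of Sign covering the classes ({neg, pos} is only covered by top)."""
    s = frozenset(classes)
    return s if s in NAME else TOP

def alpha(ints):
    """alpha_sign for a finite set of integers."""
    return close(NEG if n < 0 else POS if n > 0 else ZERO for n in ints)

def times_class(a, b):
    """Sign classes of z1 * z2 when z1 is in class a and z2 is in class b."""
    if ZERO in (a, b):
        return {ZERO}
    return {POS} if a == b else {NEG}

def plus_class(a, b):
    """Sign classes of z1 + z2; neg + pos can land in any class."""
    if a == ZERO:
        return {b}
    if b == ZERO or a == b:
        return {a}
    return {NEG, ZERO, POS}

def lift(class_op):
    """Abstract operator: apply the class-level operation to all pairs, then close."""
    def abstract_op(x, y):
        return close(c for a, b in product(x, y) for c in class_op(a, b))
    return abstract_op

times_sign = lift(times_class)
plus_sign = lift(plus_class)

lhs_times = alpha({2 * -3})
rhs_times = times_sign(alpha({2}), alpha({-3}))
lhs_plus = alpha({2 + -3})
rhs_plus = plus_sign(alpha({2}), alpha({-3}))
print(NAME[lhs_times], NAME[rhs_times])  # both sides agree for Times
print(NAME[lhs_plus], NAME[rhs_plus])    # the abstract Plus is strictly less precise
```

With this representation the order on Sign is set inclusion, so `lhs_plus < rhs_plus` expresses the strict loss of precision in the Plus case.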
15. The Abstract Semantics
Abstract semantics
- F = concrete semantic evaluation function
  - if we start from a standard semantic definition, the lifting to the powerset (collecting semantics) is simply a conceptual operation
- lfp F = concrete semantics
- F^α = abstract semantic evaluation function
  - obtained by replacing in F every concrete semantic operation by a corresponding (locally correct) abstract operation
- lfp F^α = abstract semantics
- global correctness: α(lfp F) ≤ lfp F^α
  - the abstract semantics is less precise than the abstraction of the concrete semantics
16. Where does the approximation come from?
Abstract semantics
- incomplete abstract operations
- more execution paths in the abstract control flow
  - the abstract state does not have enough information to make deterministic choices
  - conditionals, pattern matching, etc.
  - the set of resulting abstract states is turned into a single abstract state, by performing an abstract lub operation
17. Approximation in abstract Sign computations
Abstract semantics
- abstract state: x = +
  - if x > 2 then y := 3 else y := -5
  - the abstract guard can be both true and false
  - both paths need to be abstractly evaluated
  - the two resulting abstract states are merged by performing a lub in Sign
  - abstract state: x = +, y = top
- concrete state: x = 3
  - if x > 2 then y := 3 else y := -5
  - concrete state: x = 3, y = 3
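The merge of the two branches can be sketched as follows. The encoding is an assumption made for illustration: abstract values are sets of sign classes, and `guard_gt2` and `abstract_if` are hypothetical helper names hard-wired to this one conditional.

```python
NEG, ZERO, POS = "neg", "zero", "pos"
TOP = frozenset({NEG, ZERO, POS})
NAME = {frozenset(): "bot", frozenset({NEG}): "-", frozenset({NEG, ZERO}): "0-",
        frozenset({ZERO}): "0", frozenset({ZERO, POS}): "0+",
        frozenset({POS}): "+", TOP: "top"}

def close(classes):
    s = frozenset(classes)
    return s if s in NAME else TOP  # {neg, pos} is only covered by top

def alpha(ints):
    return close(NEG if n < 0 else POS if n > 0 else ZERO for n in ints)

def lub(a, b):
    return close(a | b)

def guard_gt2(x):
    """Truth values that `x > 2` may take when x is only known up to its sign."""
    outcomes = set()
    if x & {NEG, ZERO}:
        outcomes.add(False)
    if POS in x:
        outcomes |= {True, False}  # 3 > 2 holds, 1 > 2 does not
    return outcomes

def abstract_if(state):
    """Abstractly evaluate `if x > 2 then y := 3 else y := -5`."""
    results = [{**state, "y": alpha({3}) if taken else alpha({-5})}
               for taken in guard_gt2(state["x"])]
    if not results:  # x is bot: no execution reaches the conditional
        return {**state, "y": frozenset()}
    merged = results[0]
    for r in results[1:]:
        merged = {v: lub(merged[v], r[v]) for v in merged}
    return merged

after = abstract_if({"x": alpha({3})})
print(NAME[after["x"]], NAME[after["y"]])
```

Starting from a negative x instead, only the else branch is possible and y stays precise.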
18. α(lfp F) ≤ lfp F^α: why compute lfp F^α?
Abstract interpretation
- lfp F cannot be computed in finitely many steps
  - ω steps are in general required
- lfp F^α can be computed in finitely many steps, if the abstract domain is finite or at least noetherian
  - no infinite increasing chains
- static analysis 1
  - noetherian abstract domain
  - termination, approximation
- static analysis 2
  - non-noetherian domain
  - termination via widening
  - further approximation
- comparative semantics
  - non-noetherian domain
  - abstraction without approximation (completeness): α(lfp F) = lfp F^α
19. Static Analysis
Static Analysis
- abstract domain and Galois connection to model the property
- (possibly optimal) correct abstract operations
  - F^α
- the analysis is the computation of lfp F^α
- if the abstract domain is non-noetherian, or
- if the complexity of lfp F^α is too high
  - use a widening operator
  - which effectively computes an (upper) approximation of lfp F^α
  - one example later
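On a finite domain such as Sign, computing lfp F^α amounts to iterating F^α from bottom until nothing changes. The sketch below assumes a hypothetical loop, `x := 0; while ... do x := x + 1`, whose collecting semantics {0}, {0,1}, {0,1,2}, ... needs ω steps, while the abstract iteration stabilises after a few steps. The program, the encoding of Sign as sets of sign classes, and all function names are assumptions for illustration.

```python
from itertools import product

NEG, ZERO, POS = "neg", "zero", "pos"
TOP = frozenset({NEG, ZERO, POS})
NAME = {frozenset(): "bot", frozenset({NEG}): "-", frozenset({NEG, ZERO}): "0-",
        frozenset({ZERO}): "0", frozenset({ZERO, POS}): "0+",
        frozenset({POS}): "+", TOP: "top"}

def close(classes):
    s = frozenset(classes)
    return s if s in NAME else TOP

def lub(a, b):
    return close(a | b)

def plus_class(a, b):
    if a == ZERO:
        return {b}
    if b == ZERO or a == b:
        return {a}
    return {NEG, ZERO, POS}

def plus_sign(x, y):
    return close(c for a, b in product(x, y) for c in plus_class(a, b))

def kleene_lfp(f, bottom):
    """Iterate f from bottom until a fixpoint is reached; also count the steps."""
    x, steps = bottom, 0
    while True:
        nxt = f(x)
        if nxt == x:
            return x, steps
        x, steps = nxt, steps + 1

def f_abs(s):
    """Abstract semantics of the loop head: x is 0 on entry, or a previous x plus 1."""
    return lub(frozenset({ZERO}), plus_sign(s, frozenset({POS})))

fixpoint, steps = kleene_lfp(f_abs, frozenset())
print(NAME[fixpoint], steps)
```

The iterates are bot, 0, 0+, and the last one is stable, so the analysis concludes that x is never negative.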
20. Comparative Semantics: α(lfp F) = lfp F^α
Semantics
- hierarchy of transition systems semantics (P. Cousot, MFPS 97)
  - trace, big-step operational, denotational, relational, predicate transformer, axiomatic, etc.
- systematic reconstruction of several fixpoint (T_P-like) semantics for (positive) logic programs (Comini, Levi & Meo, Info. & Comp. 00)
  - applied in Pisa also to finite failure and infinite computations, CLP, CCP, Prolog, λ-Prolog, sequent calculi
- neither of the two fixpoints is finitely computable
- useful to reason about different semantics and to systematically derive more abstract semantics
  - choice of the most adequate reference semantics for analysis and verification
  - F^α is less expensive than F in computing the observable property modeled by α
  - no junk
21. Polymorphic type inference in ML-like functional languages
Static analysis
- inference rules mimic the concrete semantics
  - in the structure of the semantic evaluation function
  - in the semantic domains (environment)
- assigning a semantics to well-typed programs only introduces approximation
  - if true then 2 else false
- the most general polymorphic type for recursive functions is not computable
  - the inferred type may not be the most general
  - some type-correct programs cannot be typed
- the ad-hoc solution
  - Milner's algorithm, specified by a set of inference rules
  - an elegant, well-understood, universally accepted semantic formalization
- the systematic derivation via abstract interpretation
  - provides a better insight
  - shows how to improve precision
22. Polymorphic type inference via Abstract Interpretation
Type inference
- an optimal abstract operation
  - +^α((t1,c1),(t2,c2)) = (int, c1 ∪ c2 ∪ {t1 = int, t2 = int})
- abstracting functional values
  - the concrete semantics: E⟦λx.e⟧ ρ = λv. E⟦e⟧ (bind ρ x v)
  - the abstract value: let v1 = newvar() in let (v2,c2) = E^α⟦e⟧ (bind ρ x (v1,{})) in (v1c2 -> v2, c2)
- abstract values: pairs of
  - a term (with variables): type expression
  - a constraint (on variables): set of term equalities in solved form
- partial order (on terms only)
  - top is no type
  - bottom is any type
  - t1 ≤ t2, if t2 is an instance of t1
- the domain is non-noetherian
  - there exist infinite increasing chains
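The abstract addition on (type, constraint) pairs can be sketched with a small unifier. Everything in the encoding is an assumption for illustration: type variables are strings starting with a quote, `int` is the string "int", arrow types are tuples `("->", a, b)`, a constraint is a substitution stored as a dict, and `None` plays the role of top (no type).

```python
def is_var(t):
    return isinstance(t, str) and t.startswith("'")

def walk(t, s):
    """Follow variable bindings in substitution s."""
    while is_var(t) and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    t = walk(t, s)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, s) for a in t[1:])

def unify(a, b, s):
    """Return a substitution extending s that makes a and b equal, or None."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return None if occurs(a, b, s) else {**s, a: b}
    if is_var(b):
        return unify(b, a, s)
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0] and len(a) == len(b):
        for x, y in zip(a[1:], b[1:]):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

def solve(equations):
    """Solve a list of term equalities; None means the constraint is unsatisfiable."""
    s = {}
    for left, right in equations:
        s = unify(left, right, s)
        if s is None:
            return None
    return s

def plus_abs(v1, v2):
    """Abstract `+`: both arguments must be int, and the result is int."""
    (t1, c1), (t2, c2) = v1, v2
    c = solve(list(c1.items()) + list(c2.items()) + [(t1, "int"), (t2, "int")])
    return None if c is None else ("int", c)

print(plus_abs(("'a", {}), ("int", {})))
print(plus_abs((("->", "int", "int"), {}), ("int", {})))
```

The first call forces the type variable to int; the second fails because a function type cannot be unified with int.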
23. Recursion and Widening
Type inference
- the abstraction of recursive functions is similar to the one of regular functions, but
  - a fixpoint computation is required
  - the first approximation of the abstract value of the function is bottom
  - since the abstract domain is non-noetherian the fixpoint computation may diverge
- the solution in Milner's algorithm
  - take the results of the first two iterations and compute their lub (most general common instantiation, computed through unification)
  - if the lub is top (unification fails), the program is not typable (type error)
  - this is exactly a widening operator, which returns a (correct) upper approximation of the lfp (Furiesi, Master Thesis, Pisa, 99)
24. How to improve precision
Type inference
- straightforward!
  - perform at most k iterations of the fixpoint computation
  - if we reach a fixpoint, it is the most general type
  - otherwise, we apply Milner's widening to the last two results
- we succeed in typing more functions
- we get more precise types
- one example (due to Cousot)
  - CaML
    let rec f f1 g n x = if n=0 then (g x) else (((((f f1)(fun x -> (fun h -> (g(h x)))))(n - 1))(x))(f1))
    This expression has type ('a -> 'a) -> 'b but is here used with type 'b
  - our answer (the fixpoint is reached in 3 iterations)
    val f : ('a -> 'a) -> ('a -> 'b) -> int -> 'a -> 'b = <fun>
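The "at most k iterations, then widen the last two results" strategy can be sketched generically. The example domain is a deliberately simple stand-in, not the type domain of the slides: the natural numbers extended with infinity, ordered by ≤, which is non-noetherian. The toy widening jumps to infinity (top) as soon as the chain is still growing, which is trivially a correct upper approximation. All names are hypothetical.

```python
import math

def lfp_with_widening(step, bottom, widen, k):
    """Iterate step from bottom at most k times (k >= 1).

    Returns (value, exact): the least fixpoint if it is reached within k
    iterations, otherwise the widening of the last two iterates.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    prev = bottom
    for _ in range(k):
        nxt = step(prev)
        if nxt == prev:
            return prev, True           # fixpoint reached: no approximation
        before_last, prev = prev, nxt
    return widen(before_last, prev), False

def widen(a, b):
    """Toy widening on naturals with infinity: give up as soon as the chain grows."""
    return a if b <= a else math.inf

def bounded(n):
    return min(n + 1, 3)                # chain 0, 1, 2, 3, 3, ...

def unbounded(n):
    return n + 1                        # strictly increasing chain, no finite fixpoint

print(lfp_with_widening(bounded, 0, widen, 5))
print(lfp_with_widening(bounded, 0, widen, 2))
print(lfp_with_widening(unbounded, 0, widen, 10))
```

A larger k gives the exact fixpoint in the first call, while the same function with k = 2 is widened to top, mirroring the gain in precision described above.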
25. Abstract Interpretation vs. Type Systems
Abstract Interpretation
- Patrick Cousot has reconstructed a hierarchy of type systems for ML-like languages by using abstract interpretation (Cousot, POPL 97)
- type systems have been proposed to cope with other static analyses (strictness, various properties related to security)
  - type systems need to be proved correct w.r.t. a semantics
  - abstract semantics are systematically derived from the semantics and are correct by construction
- two related, interesting open problems
  - comparison of the two approaches from the viewpoint of expressive power and analysis precision (and complexity)
  - definition of methods to automatically translate formalizations from one approach to the other
26. Static Analysis of Logic Programs
Static analysis
- abstract interpretation is very popular in logic languages
  - the computational model has several opportunities for optimization, based on analysis results
  - it is (relatively) easy to define, because the standard semantics is collecting and the concrete domain (sets of substitutions) is quite simple
- several important properties (groundness, freeness, sharing, depth(k))
  - for some properties (e.g., groundness and sharing) a lot of different abstract domains
  - techniques to compare the relative precision of abstract domains
  - important results on techniques for the systematic design of abstract domains, which can probably be applied to other paradigms as well
- abstract compilation in CLP (Giacobazzi, Debray & Levi, JLP 95)
  - the program is transformed by syntactically replacing concrete constraints by abstract constraints
  - the abstract computation is a standard CLP computation on a different constraint system
27. Groundness in Logic Programs
Groundness analysis
- CLP version
- concrete domain
  - (P(Eqns), ⊆), sets of sets of term equations in solved form
- concrete semantics
  - the CLP version of the s-semantics (answer constraints)
- 3 abstract domains
  - G: the property of being ground
  - DEF: functional groundness dependencies
  - POS: DEF + some disjunctive information
  - lattices shown in the 2-variables case
28. An example
Groundness analysis
- the program
  - p(X,Y) :- X=a.
  - p(X,Y) :- Y=b.
  - q(X,Y) :- X=Y.
  - r(X,Y) :- p(X,Y), q(X,Y).
- the concrete semantics
  - p(X,Y) -> {{X=a},{Y=b}}
  - q(X,Y) -> {{X=Y}}
  - r(X,Y) -> {{X=a,Y=a},{X=b,Y=b}}
- in the concrete semantics of r
  - both the arguments are bound to ground terms (in all the answer constraints)
29. The domain G
Groundness analysis
- γ_G(v) =
  - ∅, if v = bot
  - {e ∈ Eqns | X is bound to a ground term in e}, if v = X (X is always ground)
  - Eqns, if v = true (no groundness information)
- the concrete semantics
  - p(X,Y) -> {{X=a},{Y=b}}
  - q(X,Y) -> {{X=Y}}
  - r(X,Y) -> {{X=a,Y=a},{X=b,Y=b}}
- the program
  - p(X,Y) :- X=a.
  - p(X,Y) :- Y=b.
  - q(X,Y) :- X=Y.
  - r(X,Y) :- p(X,Y), q(X,Y).
- the abstraction of the concrete semantics
  - p(X,Y) -> true
  - q(X,Y) -> true
  - r(X,Y) -> X ∧ Y
- the abstract semantics
  - p(X,Y) -> true
  - q(X,Y) -> true
  - r(X,Y) -> true
- the abstract program
  - p(X,Y) :- lub_G(X,Y).
  - q(X,Y) :- true.
  - r(X,Y) :- glb_G(p(X,Y),q(X,Y)).
30. The domain Def
Groundness analysis
- γ_Def(v) =
  - {e ∈ Eqns | X=Y ∈ e}, if v = X ↔ Y (X is ground if and only if Y is ground)
  - {e ∈ Eqns | X=t ∈ e and Y occurs in t}, if v = X → Y (if X is ground then Y is ground)
  - ...
- the concrete semantics
  - p(X,Y) -> {{X=a},{Y=b}}
  - q(X,Y) -> {{X=Y}}
  - r(X,Y) -> {{X=a,Y=a},{X=b,Y=b}}
- the program
  - p(X,Y) :- X=a.
  - p(X,Y) :- Y=b.
  - q(X,Y) :- X=Y.
  - r(X,Y) :- p(X,Y), q(X,Y).
- the abstraction of the concrete semantics
  - p(X,Y) -> true
  - q(X,Y) -> X ↔ Y
  - r(X,Y) -> X ∧ Y
- the abstract semantics
  - p(X,Y) -> true
  - q(X,Y) -> X ↔ Y
  - r(X,Y) -> X ↔ Y
- the abstract program
  - p(X,Y) :- lub_Def(X,Y).
  - q(X,Y) :- X ↔ Y.
  - r(X,Y) :- glb_Def(p(X,Y),q(X,Y)).
31. The domain Pos
Groundness analysis
- γ_Pos(v) =
  - {e ∈ Eqns | either X or Y is bound to a ground term in e}, if v = X ∨ Y (either X or Y is ground)
  - ...
- the concrete semantics
  - p(X,Y) -> {{X=a},{Y=b}}
  - q(X,Y) -> {{X=Y}}
  - r(X,Y) -> {{X=a,Y=a},{X=b,Y=b}}
- the program
  - p(X,Y) :- X=a.
  - p(X,Y) :- Y=b.
  - q(X,Y) :- X=Y.
  - r(X,Y) :- p(X,Y), q(X,Y).
- the abstraction of the concrete semantics
  - p(X,Y) -> X ∨ Y
  - q(X,Y) -> X ↔ Y
  - r(X,Y) -> X ∧ Y
- the abstract semantics
  - p(X,Y) -> X ∨ Y
  - q(X,Y) -> X ↔ Y
  - r(X,Y) -> X ∧ Y
- the abstract program
  - p(X,Y) :- lub_Pos(X,Y).
  - q(X,Y) :- X ↔ Y.
  - r(X,Y) :- glb_Pos(p(X,Y),q(X,Y)).
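The three abstract programs can be run side by side. The sketch below assumes a standard encoding that the slides do not spell out: a groundness formula over X and Y is represented by its set of models, each model being the set of variables that are ground. In this encoding glb is intersection in all three domains, while lub differs: plain union in Pos, closure under intersection of models in Def, and "keep only the variables ground in every model" in G. All names are illustrative.

```python
from itertools import combinations

VARS = ("X", "Y")
# a model is the set of variables that are ground; ALL is the formula `true`
ALL = frozenset(frozenset(c) for r in range(len(VARS) + 1) for c in combinations(VARS, r))

def models(pred):
    return frozenset(m for m in ALL if pred(m))

X = models(lambda m: "X" in m)
Y = models(lambda m: "Y" in m)
IFF = models(lambda m: ("X" in m) == ("Y" in m))  # X ↔ Y
TRUE = ALL

def lub_pos(a, b):
    """Pos can express disjunction exactly."""
    return a | b

def lub_def(a, b):
    """Def: close the union under intersection of models."""
    s = set(a | b)
    while True:
        new = {m & n for m in s for n in s} - s
        if not new:
            return frozenset(s)
        s |= new

def lub_g(a, b):
    """G: keep only the variables that are ground in every model."""
    ms = a | b
    if not ms:
        return frozenset()
    ground = frozenset.intersection(*ms)
    return models(lambda m: ground <= m)

def analyze(lub, q):
    """Abstract program: p = lub(X, Y), q as given, r = glb(p, q)."""
    p = lub(X, Y)
    return p, p & q

p_pos, r_pos = analyze(lub_pos, IFF)
p_def, r_def = analyze(lub_def, IFF)
p_g, r_g = analyze(lub_g, TRUE)  # in G the best description of X=Y is true
print(r_pos == X & Y, r_def == IFF, r_g == TRUE)
```

Only Pos recovers X ∧ Y for r, because only Pos keeps the disjunction X ∨ Y computed for p.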
32. Program Verification by Abstract Interpretation
Verification
- F = concrete semantic evaluation function
  - concrete enough to observe the property
- the property is modeled by an abstract domain (A, ≤) and a Galois insertion α, γ
- F^α = abstract semantic evaluation function
- S^α = specification of the property, i.e., abstraction of the intended concrete semantics
- partial correctness: α(lfp F) ≤ S^α
- sufficient partial correctness condition: F^α(S^α) ≤ S^α (Comini, Levi, Meo & Vitiello, JLP 99)
  - if F^α(S^α) ≤ S^α
  - then S^α is a prefixpoint of F^α
  - hence α(lfp F) ≤ lfp F^α ≤ S^α
33. Analysis and Verification
Verification
- F = concrete semantic evaluation function
- F^α = abstract semantic evaluation function
- analysis: compute lfp F^α
  - we need to compute a fixpoint
  - noetherian domain or widening
- S^α = specification of the property
- verification: prove F^α(S^α) ≤ S^α
  - no fixpoint computation and no need for noetherian domains
  - finite representation of the specification
  - decidability of ≤
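The difference from analysis is that verification applies F^α once to the specification and compares. A minimal sketch, reusing the groundness program p, q, r with the Pos domain: a specification assigns a formula (a set of models over X and Y) to each predicate, and ≤ is inclusion of model sets. The encoding and the names `f_abs`, `leq` and `verifies` are assumptions for illustration.

```python
from itertools import combinations

VARS = ("X", "Y")
ALL = frozenset(frozenset(c) for r in range(len(VARS) + 1) for c in combinations(VARS, r))

def models(pred):
    return frozenset(m for m in ALL if pred(m))

X = models(lambda m: "X" in m)
Y = models(lambda m: "Y" in m)
IFF = models(lambda m: ("X" in m) == ("Y" in m))

def f_abs(spec):
    """One application of the abstract semantic function of the program in Pos."""
    return {"p": X | Y,                   # p(X,Y) :- X=a.   p(X,Y) :- Y=b.
            "q": IFF,                     # q(X,Y) :- X=Y.
            "r": spec["p"] & spec["q"]}   # r(X,Y) :- p(X,Y), q(X,Y).

def leq(i, j):
    """Pointwise order on specifications: implication, i.e. inclusion of models."""
    return all(i[k] <= j[k] for k in i)

def verifies(spec):
    """Sufficient condition: F^α(S^α) ≤ S^α, with no fixpoint computation."""
    return leq(f_abs(spec), spec)

exact = {"p": X | Y, "q": IFF, "r": X & Y}
weaker = {"p": X | Y, "q": IFF, "r": IFF}   # claims less about r
wrong = {"p": X, "q": IFF, "r": X & Y}      # claims p always grounds X
print(verifies(exact), verifies(weaker), verifies(wrong))
```

The wrong specification is rejected because the clause `p(X,Y) :- Y=b.` does not ground X.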
34. Completeness of the proof method
Verification
- the proof method is complete if the following holds
  - assume the program to be partially correct w.r.t. the specification S^α, i.e., α(lfp F) ≤ S^α
  - then there exists another specification T^α, stronger than S^α, such that the sufficient condition F^α(T^α) ≤ T^α holds
- we have shown that the proof method is complete if and only if the abstraction is complete (precise) (Levi & Volpe, PLILP 98)
35. Proof methods and the reference semantics
Verification
- one can be interested in establishing different kinds of properties
  - of the final state
  - of the relation between initial and final state
  - of the relation between specific pairs of intermediate states, e.g., procedure calls
  - ...
- there exist different corresponding proof methods
- all the proof methods are instances of F^α(S^α) ≤ S^α for different choices of the concrete semantic evaluation function F
  - F can be derived by abstract interpretation (comparative semantics) from the most concrete semantics, i.e., a trace semantics
  - first step of abstraction: choice of the right semantics
- in (positive) logic programming, all the known verification methods have been reconstructed (Levi & Volpe, PLILP 98)
36. Making F^α(S^α) ≤ S^α effective
Verification
- extensional specifications
  - typical analysis properties described by noetherian abstract domains
  - properties such as polymorphic types which lead to finite abstract semantics, even with non-noetherian domains
- intensional specifications, specified by means of assertions
  - assertions are abstract domains
  - a formula describes the set of all the concrete states which satisfy it (concretization)
  - if the specification language is closed under conjunction, it is easy to define the abstraction function
  - we can derive an abstract function F^α, which computes on the domain of assertions, and instantiate the verification condition (Comini, Gori & Levi, MFCSIT 00)
- the relation ≤ on the domain of assertions must be decidable
- an open problem: completeness of the abstract semantics associated to a specific language of assertions
37. Specification Languages
Verification
- decidable specification languages have been proposed for functional programming and logic programming
  - one example: a powerful language which allows one to express several properties of logic programs, including types, freeness and groundness (Volpe, SCP 00)
- experiments using Horn Clause Logic as specification language (Comini, Gori & Levi, AGP 00)
  - it is not decidable
  - most of the verification conditions can be proved without using a theorem prover
  - simple logic program transformation techniques, which can be partially supported by an automatic tool
38. Systematic abstract domain design
Domain design
- once we have the abstract domain, the design of the abstract semantics is systematic
- abstract interpretation theory provides results which can be exploited to make the design of abstract domains (more) systematic
  - to compare and combine domains
  - to refine domains so as to improve their precision
- reduced product (of domains A and B)
  - allows one to analyze (together) the properties modeled by A and B
  - often delivers better results than the separate analyses
  - because of domain interaction
- lifting to the powerset (and disjunctive completion)
  - roughly speaking, transform A into P(A)
  - better precision
  - no loss of information in computing lubs
39. Operations on Abstract Domains
Domain design
- several useful operators on abstract domains (refinements)
  - a survey in (Filé, Giacobazzi & Ranzato, ACM Comput. Surv. 96)
- linear completion (Giacobazzi, Ranzato & Scozzari, SAS 98)
  - functional dependencies modeled by linear implication
- reconstruction of all the known domains for groundness analysis (Scozzari, SAS 97)
  - DEF = G -> G
  - POS = DEF -> DEF
  - POS = POS -> POS
  - optimality of POS
- successfully applied to other domains for logic programs
  - types (Levi & Spoto, PLILP 98)
  - sharing and freeness (Levi & Spoto, PEPM 00)
- open problems
  - do the same refinements apply to other programming paradigms?
  - can refinements be extended to domains of assertions and to type systems?
40. Abstract Interpretation
Abstract Interpretation
- a mathematically simple and solid foundation for
  - comparative semantics
  - static analysis
  - verification
- a methodology for the systematic derivation of
  - abstract domains from the property
  - complexity issues?
  - quantitative analyses?
  - abstract semantics from the concrete semantics and the abstract domain