Title: A brief Introduction to Automated Theorem Proving
1 A brief Introduction to Automated Theorem Proving
- Theoretical Foundations, History and the Resolution Calculus for classical First-order Logic
- Uwe Keller
- based on material by B. Beckert, R. Hähnle, A. Voronkov, A. Leitsch and T. Tammet
2 Content
- Introduction
- Motivation and History
- Theorem Proving, ATP and Calculi
- Foundations
- FOL, Normal Forms and Preprocessing, Metaresults
- Resolution
- Basic calculus, Unification
- Refinements, Redundancy
- Decision procedures
- Chain Resolution
- A Variant of Resolution for the Semantic Web
- Demo
3 Part I: Introduction
- Motivation and History
- Theorem Proving, ATP and Calculi
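4 Logic and Theorem Proving
- Real-world description in natural language: mathematical problems, program specifications
- Formalization leads to a logic consisting of
- Syntax (formal language): First-order Logic, Dynamic Logic, ...
- Semantics (truth function), which determines the valid formulae
- Calculus (derivation / proof), which determines the provable formulae
- Correctness: every provable formula is valid
- Completeness: every valid formula is provable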
5 How did it start?
- Results from the first half of the 20th century in mathematical logic showed:
- we can do logical reasoning with a limited set of simple (computable) rules in restricted formal languages like First-order Logic (FOL)
- That means computers can do reasoning!
- Implementation of ATP
- First, computers were needed :-)
- AI as a prominent field: reasoning as a basic skill!
- Mid 1950s: first attempts to implement an ATP
- Today
- (A)TP is no longer only a part of mainstream AI
- Central shared problem: how to represent and search extremely large search spaces!
6 A rough timeline in ATP
- before 1950: Proof-theoretic work by Skolem, Herbrand, Gentzen and Schütte
- 1954: First machine-generated proof (Davis)
- 1955ff: Semantic Tableaux (Beth, Hintikka)
- 1957: First machine-generated proof in Logic Calculus (Newell, Simon)
- 1957: Lazy substitution by free (dummy) variables (Kanger, Prawitz)
- 1958: First prover for Predicate Logic (Prawitz)
- 1959: More provers (Gilmore, Wang)
- 1960: Davis-Putnam Procedure (Davis, Putnam, Logemann)
- 1963: Unification (J.A. Robinson)
- 1963ff: Resolution (J.A. Robinson), Inverse Method (Maslov)
- 1963ff: Modern Tableau Method (Smullyan, Lis), without Unification
- 1968: Model Elimination (Loveland), with Unification
- 1970ff: PROLOG (Colmerauer, Kowalski), Refinements of Resolution
- 1971: Connection Method (Bibel), Matings (Andrews), with Unification
- 1985: ATP in non-classical logics, renaissance of Tableaux Methods
- 1987: Tableaux with Unification
- 1993ff: Renewed interest in Instance-based Methods: DPLL, Model Evolution
7 Theorem Proving
- Given
- a formal language (or logic) L
- a calculus C for this language (= set of rules)
- a conjecture S and a set of assumptions or axioms A in the language L
- Determine
- Can we construct a proof for S (from A) in calculus C?
- Logic = Syntax + Semantics + Calculus
- TP = proof search in C (huge search problem)
- Correctness and completeness of calculi: essential properties
- Calculus = non-deterministic algorithm
- Central problem in ATP: how to implement a non-deterministic algorithm efficiently on a deterministic machine :-)
8 Theorem Proving (II)
- Research areas
- Interactive / tactic TP vs. Automated TP
- Classical Logic vs. Non-classical logics
- Calculi for
- ATP: general principle is the refutation approach
- Resolution, Tableau, Inverse Method, Instance-based Methods
- ITP: general principle is to show the proof situation / context
- Sequent Calculi
- others: general principle is the generation of complex formulae based on very simple axioms
- Hilbert-style Calculi
- Central difference
- What are the elements in a proof, and what is a proof?
9 Main TP Applications
- Main Applications
- Software and Hardware Verification
- Theorem proving in Mathematics
- Query answering in rich knowledge bases (Ontologies)
- Verification of cryptographic protocols
- Retrieval of Software Components
- Reasoning in non-classical Logics
- Program synthesis
- many systems implemented
- ATP: Vampire, Otter, Spass, E-SETHEO, Darwin, Epilog, SNARK, Gandalf
- ITP: Isabelle/HOL, Coq, Theorema, KeY-Prover
10 Why is FOL of special interest in the ATP community?
- There are less expressive and more expressive logics than FOL
- Less expressive: Classical Propositional Logic, Modal Propositional Logic, Description Logics, Temporal Propositional Logic
- More expressive: Higher-order Predicate Logics, Dynamic Predicate Logics, Type Theory
- Research in ATP mainly focused on FOL
- FOL is very expressive, many real-world problems can be formalized in FOL
- FOL turned out to be the most expressive logic that one can adequately approach with ATP techniques
11 Example
- Theorem in (elementary) Calculus
- Nullstellensatz: Every function f which is continuous over a closed interval I = [a,b] must take the value 0 somewhere in I if f(a) < 0 and f(b) > 0
- Proof idea: Consider the supremum l of the set M = { x | f(x) < 0, a < x < b } and show that f(l) = 0
12 Example (II)
- Formalization
- Compact (only LEQ)
- Redundancy-free
- Specific definitions
- Continuous functions
- Main idea of the proof is already encoded
- Use Supremum
- Can be done by an ATP system
- but without proper formalization ?!?
- ATP better than a human prover? Robbins Problem in Algebra
- Intelligent proving vs. combinatorial proving
13 Part II: Foundations
- FOL, Normal Forms and Preprocessing, Metaresults
14 Classical First-order Logic (FOL)
- Syntax
- Signature Σ
- Function Symbols, Predicate Symbols, Arity, logical Connectives, Quantifiers
- Terms (over Σ), Atomic Formulae (over Σ), Formulae (over Σ)
- Definition relative to the signature Σ of the predicate logic
- Semantics
- First-order structure / interpretation S = (U,I)
- Universe U, Signature-Interpretation I
- Constants: I(c) is an element of U
- Function symbols: I(f) is a total function on U
- Relation symbols: I(R) is a relation on U
- Logical connectives and quantifiers in the usual way
- Definition relative to the signature Σ of the predicate logic
15 Classical FOL (II)
- Model of a statement
- An interpretation S = (U,I) is called a model of a statement s iff val_S(s) = t
- What does it mean to infer a statement from given premises?
- Informally: Whenever our premises P hold, it is the case that the statement holds as well
- Formally: Logical Entailment
- For every interpretation S which is a model of P it holds that S is a model of s as well
- Special case: Validity (the set of premises is empty)
- Logical entailment in a logic L is the (semantic) relation that a calculus C aims at formalizing syntactically (by means of a derivability relation)!
- Logical entailment considers semantics (interpretations) relative to a set of premises or axioms!
16 Normal Forms
- What is a normal form?
- Why are they interesting?
- Relation to ATP?
- Conversion of the input to a specific NF may be required by a calculus (e.g. Resolution) ⇒ Preprocessing step
- ATP can in a sense be seen as a conversion into a NF itself; the borderline is fuzzy
- Normal forms in FOL
- Negation Normal Form
- Standard Form
- Prenex Normal Form
- Clause Normal Form (in a sense a logic-free form)
- There are logics where certain NFs do not exist, like CNF in a Dynamic First-order Logic
- Certain calculi then cannot be applied in these logics!
17 Negation Normal Form
- A formula is in Negation NF (NNF) iff it contains no implication and no bi-implication symbols and all negation symbols occur only as part of a literal (directly in front of atomic formulae)
- How to achieve this NF?
- Replace implication and bi-implication by their definition (in terms of ¬, ∧ and ∨)
- Move negation symbols inside to atomic formulae
- De Morgan laws
- Dualize quantifiers when moving negation symbols over a quantifier
- Eliminate multiple negations
- All these syntactical transformations generate semantically equivalent formulae
- Example
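The rewriting steps listed above can be sketched in a few lines of Python. This is only an illustrative sketch: the formula classes (`Atom`, `Not`, `And`, `Or`, `Implies`, `Forall`, `Exists`) and the function name `nnf` are assumptions made for this example, atoms are kept as opaque strings, and bi-implication is left out for brevity.

```python
from dataclasses import dataclass

# Hypothetical formula constructors, introduced only for this sketch.
@dataclass(frozen=True)
class Atom:
    name: str

@dataclass(frozen=True)
class Not:
    arg: object

@dataclass(frozen=True)
class And:
    left: object
    right: object

@dataclass(frozen=True)
class Or:
    left: object
    right: object

@dataclass(frozen=True)
class Implies:
    left: object
    right: object

@dataclass(frozen=True)
class Forall:
    var: str
    body: object

@dataclass(frozen=True)
class Exists:
    var: str
    body: object


def nnf(f):
    """Return a formula in negation normal form that is equivalent to f."""
    if isinstance(f, Atom):
        return f
    if isinstance(f, And):
        return And(nnf(f.left), nnf(f.right))
    if isinstance(f, Or):
        return Or(nnf(f.left), nnf(f.right))
    if isinstance(f, Implies):
        # A -> B is replaced by its definition (not A) or B.
        return Or(nnf(Not(f.left)), nnf(f.right))
    if isinstance(f, Forall):
        return Forall(f.var, nnf(f.body))
    if isinstance(f, Exists):
        return Exists(f.var, nnf(f.body))
    if not isinstance(f, Not):
        raise TypeError(f"unknown formula: {f!r}")
    # f is a negation: push it inwards depending on what it negates.
    g = f.arg
    if isinstance(g, Atom):
        return f                                         # already a literal
    if isinstance(g, Not):
        return nnf(g.arg)                                # eliminate double negation
    if isinstance(g, And):
        return Or(nnf(Not(g.left)), nnf(Not(g.right)))   # De Morgan
    if isinstance(g, Or):
        return And(nnf(Not(g.left)), nnf(Not(g.right)))  # De Morgan
    if isinstance(g, Implies):
        return And(nnf(g.left), nnf(Not(g.right)))       # not (A -> B) = A and not B
    if isinstance(g, Forall):
        return Exists(g.var, nnf(Not(g.body)))           # dualize quantifier
    if isinstance(g, Exists):
        return Forall(g.var, nnf(Not(g.body)))           # dualize quantifier
    raise TypeError(f"unknown formula: {f!r}")
```

For instance, calling `nnf` on the encoding of ¬∀x (P(x) → Q(x)) yields the encoding of ∃x (P(x) ∧ ¬Q(x)).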
18 Standard Form
- A formula A is in Standard Form if no variable x in A occurs both bound and free and no bound variable is used as a quantifier variable for multiple subformulae
- How to generate this NF?
- Bound renaming of quantifier variables and the respective occurrences
- The transformed formula is semantically equivalent to the original one
- Example
- (∀x P(x) ∧ Q(z)) → (∃x R(x) ∨ ∃z (P(z) ∧ Q(z)))
19 Prenex Normal Form
- A formula A is in Prenex NF iff it is of the form A = Q1 x1 ... Qn xn B, where each Qk is a universal or existential quantifier and B contains no quantifiers. B is called the matrix of A
- How to construct this NF?
- Transform A into NNF and Standard Form
- Iteratively move the outermost quantifier to the outside until it reaches another quantifier. Quantifiers may not cross quantifiers of a different sort (the in-scope relation between quantifier occurrences may not be changed)
- This transformation generates a formula which is logically equivalent to the original one.
- Example
20 Clause Normal Form
- A formula A is in Clause NF iff it is in PNF, closed, the prefix only contains universal quantifiers and the matrix is in conjunctive normal form.
- In other words: A = ∀x1 ... ∀xn ((L1,1 ∨ ... ∨ L1,m1) ∧ ... ∧ (Lk,1 ∨ ... ∨ Lk,mk)) where each Li,j is a literal (negated or positive atomic formula)
- How to construct this NF?
- Transform A into NNF and Standard Form
- Transform the result into PNF
- Remove existential quantifiers by Skolemization (function terms)
- Apply distributivity laws to convert the matrix of the result into conjunctive normal form (conjunction of disjunctions of literals)
- This transformation results in a formula which is not logically equivalent, but it is satisfiability-preserving (which is enough for the ATP methods later)
- Example
21 Clause Normal Form (II)
- A formula A in Clause NF can be written as A = ∀x1 ... ∀xn ((L1,1 ∨ ... ∨ L1,m1) ∧ ... ∧ (Lk,1 ∨ ... ∨ Lk,mk)) where each Li,j is a literal (negated or positive atomic formula)
- Since every formula can be transformed into CNF, the CNF can be seen as a logic-free representation of a formula
- All quantifiers are universal, no free variables are allowed -> drop quantifiers
- Matrix is in CNF: conjunction of disjunctions of literals -> model as a set of sets of literals
- Example
- The sketched transformation to CNF is not optimal
- Exponential blowup possible (already for NNF)
- Syntactical structure of the original formula gets lost
- Skolem symbols have unnecessarily many parameters
- Unnecessarily many new Skolem symbols are introduced
- One can improve all these aspects of a transformation to CNF!
- Skolemization before PNF transformation, Definitional CNF for the matrix, reuse of Skolem functions
22 Metaresults
- Metaresult = property of a logic L
- Most famous example: Gödel's Incompleteness Theorems!
- Here some metaresults for FOL which form the theoretical foundation of ATP
- carry over to many other logics as well
- Deduction Theorem
- If M ∪ {s} ⊨ s' then M ⊨ s → s'
- Logical entailment can be reduced to validity
- Proof by contradiction
- If M is a set of closed formulae then M ⊨ s iff M ∪ {¬s} is unsatisfiable (i.e. has no model)
- Logical entailment can be reduced to unsatisfiability checking
- Refutation can be used as a universal principle for inference in FOL
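The reduction of entailment to unsatisfiability can be tried out directly in the propositional case, where satisfiability is decidable by enumerating truth assignments. The following is a minimal sketch under that restriction; the function names `satisfiable` and `entails` and the encoding of formulas as Python predicates over an assignment dictionary are assumptions made for this example.

```python
from itertools import product

def satisfiable(formulas, atoms):
    """Brute force: is there a truth assignment making all formulas true?"""
    for values in product([False, True], repeat=len(atoms)):
        v = dict(zip(atoms, values))
        if all(f(v) for f in formulas):
            return True
    return False

def entails(premises, goal, atoms):
    """M |= s iff M together with (not s) is unsatisfiable."""
    negated_goal = lambda v: not goal(v)
    return not satisfiable(list(premises) + [negated_goal], atoms)

# M = {p, p -> q}; the goal q follows, the goal r does not.
M = [lambda v: v["p"], lambda v: (not v["p"]) or v["q"]]
follows_q = entails(M, lambda v: v["q"], ["p", "q", "r"])
follows_r = entails(M, lambda v: v["r"], ["p", "q", "r"])
```

In FOL the same reduction holds, but the satisfiability check can no longer be done by exhaustive enumeration, which is why the calculi discussed later are needed.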
23 Metaresults (II)
- Complexity of logical entailment, validity and satisfiability
- Propositional Logic
- Logical entailment (⊨-relation) is decidable, satisfiability too
- Set of valid formulae is co-NP-complete
- Set of satisfiable formulae is NP-complete
- First-order Predicate Logic
- Logical entailment / validity / satisfiability is undecidable
- Set of valid formulae is semi-decidable (recursively enumerable)
- Set of satisfiable formulae is not recursively enumerable
24 Metaresults (III)
- Term Interpretations and Herbrand Theorem
- S = (U,I) is a term-interpretation if U = Term⁰_Σ (the set of ground terms over Σ)
- Let Term⁰_Σ be non-empty. An interpretation S = (U,I) is called a Herbrand interpretation if
- S is a term-interpretation and
- I(f)(t1,...,tn) = f(t1,...,tn) for all n-ary function symbols f ∈ Σ and ground terms t1,...,tn
- A Herbrand model of s is a Herbrand interpretation I with I ⊨ s
- Herbrand interpretations are special because they have a simple (syntactical) universe and terms are basically uninterpreted. Quantifiers then have ground terms as their range!
- Computers can deal with such special (syntactical) interpretations, but not with interpretations in general!
25 Metaresults (IV)
- Term Interpretations and Herbrand Theorem
- Let M be a set of closed formulae s in Prenex Normal Form that contain no existential quantifiers (for instance s in CNF)
- Let T be a set of terms (over signature Σ)
- T(M) = set of T-instances of M, i.e. replace every occurrence of a (universal) variable in any formula in M with any term in T
- Herbrand Theorem
- Let Term⁰_Σ be non-empty and M a set of formulae in Prenex NF without existential quantifiers.
- Then the following statements are equivalent
- M has a model
- M has a Herbrand model
- Term⁰_Σ(M) has a model
- The last set is a set of formulae in propositional logic
26 Metaresults (V)
- Compactness of FOL
- A (possibly infinite) set M of formulae has a model iff every finite subset M' ⊆ M has a model (i.e. is satisfiable)
- Combining Compactness with Herbrand's Theorem
- Let Term⁰_Σ be non-empty and M a set of formulae in Prenex NF without existential quantifiers.
- Then M is unsatisfiable iff T(M) is unsatisfiable for a finite set of ground terms T ⊆ Term⁰_Σ
- Note that T is a finite set of ground terms over the signature Σ of the formula set M
- No external function symbols have to be considered!
- Allows for using guided substitutions (Unification!)
27 Metaresults (VI)
- That means logical entailment / validity can be checked
- by reduction to unsatisfiability of a set of formulae M
- which can be done by finding suitable finite (counter)examples for the quantified variables such that a contradiction arises
- One only needs the signature Σ of the given set M to find the counterexamples
- Basically this is what all ATP procedures do: find a finite set of counterexamples (objects) such that a respective instance of the original formula set is determined as being inconsistent (unsatisfiable)
- The theorem immediately gives an algorithm for ATP!
- Problem: How to construct / find T in the theorem in a clever way?
28 Herbrand's Theorem: From Clause Logic to Propositional Logic
[Diagram: clauses (Clause Logic) are mapped by (ground) substitutions σ to an inconsistent set of ground clauses (Propositional Logic)]
29 Part III: The Resolution Calculus
- Pre-resolution phase
- Gilmore's Method, Davis-Putnam Procedure
- Unification
- Basic Resolution Calculus
- Refinements, Redundancy
30 Pre-Resolution period: Gilmore's Method
- First ATP procedure for First-order logic
- Directly based on Herbrand's Theorem
- Reduction of FOL entailment to satisfiability in Prop. Logic
- How to generate candidates C' for propositional satisfiability checking from a FOL clause set C?
- Saturation by ground instances from Hn(C) (= set of ground terms of depth n)
- More precisely: successively generate the sets C'n = { cσ | c ∈ C and rg(σ) ⊆ Hn(C) } of ground clauses
- Since Hn(C) grows exponentially, it is very important to have a good algorithm for checking satisfiability
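To get a feeling for how quickly the sets Hn(C) grow, the level-wise construction of ground terms can be sketched as follows. This is an illustrative sketch only: the function name `herbrand_levels`, the string encoding of terms, and the cumulative reading of Hn (all ground terms up to nesting depth n) are assumptions of this example, not taken from the slides.

```python
from itertools import product

def herbrand_levels(constants, functions, depth):
    """Return the ground terms up to the given nesting depth.

    constants: iterable of constant names
    functions: dict mapping a function name to its arity (> 0)
    """
    # If the signature has no constant, start from a dummy constant "a".
    terms = set(constants) or {"a"}
    for _ in range(depth):
        new_terms = set(terms)
        for f, arity in functions.items():
            # Apply f to every tuple of terms built so far.
            for args in product(sorted(terms), repeat=arity):
                new_terms.add(f"{f}({','.join(args)})")
        terms = new_terms
    return terms

h2 = herbrand_levels(["a"], {"f": 1}, 2)   # a, f(a), f(f(a))
```

With a single binary function symbol the number of terms already roughly squares from one level to the next, which illustrates why the ground clause sets become large so quickly.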
31 Pre-Resolution period: Gilmore's Method
- Easy test of satisfiability of the generated set C'n of ground clauses
- Transform C'n into Disjunctive Normal Form
- D = DNF(C'n) is unsatisfiable iff every constituent of D contains a contradiction L ∧ ¬L for some literal L
- Can be done in deterministic time O(n log(n))
- Problem: conversion from CNF into DNF is (almost always) exponential (inherently complex, since otherwise P = NP), (not known at that time!)
- Pseudocode
begin
  contr := false
  n := 0
  while not contr do
    D := DNF(C'_n)
    contr := all constituents of D contain complementary literals
    n := n + 1
  end while
end
32 Pre-Resolution period: Gilmore's Method
- Weak points of Gilmore's approach
- The generation of the candidate ground clause sets C'n to be checked
- The disjunctive normal form transformation
- The first weakness is inherent to all procedures directly applying Herbrand's theorem
- The second problem concerns propositional logic only
- Gilmore's pioneering implementation did not yield actual proofs for quite simple predicate logic formulas
- A possible improvement
- Avoid the transformation to DNF and try to find good decision methods for satisfiability on CNFs
- This is basically what was achieved by Davis and Putnam [DP, 1960] shortly after Gilmore's implementation
33 Pre-Resolution period: Davis-Putnam Procedure
- Like Gilmore's method, based on successive production of ground clause sets C'n and testing of their unsatisfiability
- (Still) very efficient decision method for satisfiability. Requires CNF for ground clauses.
- Invented originally for FOL, it became the most powerful SAT decision procedure for Propositional Logic. Many very powerful SAT solvers are still refining DPP today.
- Davis-Logemann-Loveland Rules [DLL, 1962]
- Preliminary step: reduce all clauses in C
- Eliminate multiple occurrences of the same literal (leave only one). Generates a clause set C'
- Then apply the following rules non-deterministically to C'
- Tautology-Rule
- One-Literal-Rule
- Pure-Literal-Rule
- Splitting-Rule
34 Pre-Resolution period: Davis-Putnam Procedure
- Davis-Logemann-Loveland Rules [DLL, 1962]
- Tautology-Rule: Delete all clauses in C' containing complementary literals
- One-Literal-Rule: If there is a clause c = {l} with only one literal l, remove all clauses d from C' which contain l, and remove the dual literal ld from all other clauses
- Pure-Literal-Rule: Let D ⊆ C' with the following property: there exists a literal l appearing in all clauses of D, but ld does not appear in C'. Then delete D from C'
- Splitting-Rule: Let C' = {A1,...,An,B1,...,Bm} ∪ R such that R contains neither l nor ld, all Ai contain l but not ld and all Bj contain ld but not l. Let Ai' = Ai after deletion of l and let Bj' = Bj after deletion of ld. Then split C' into C1 = {A1',...,An'} ∪ R and C2 = {B1',...,Bm'} ∪ R
- Properties of the DLL procedure
- The rules are essentially reductive (atoms are deleted in each step)
- The rules are correct (they preserve satisfiability; in the case of a split, only for one of the newly introduced clause sets)
- The procedure generates sets that contain the empty clause for all cases (of the applied splits) iff C is unsatisfiable (decision criterion: correctness and completeness, termination)
Example: C = { P ∨ Q, R ∨ S ∨ S, R ∨ S, R ∨ S, R ∨ S, P ∨ Q ∨ P } (the negation signs of individual literals are not preserved in this transcript)
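The four rules can be turned into a small recursive satisfiability test for ground clause sets. The sketch below is one possible reading, not the original implementation: the names `dll` and `assign` are made up for this example, and clauses are assumed to be encoded as frozensets of non-zero integers where `-n` stands for the dual of literal `n`. The set encoding already performs the preliminary reduction step (duplicate literals collapse).

```python
def assign(clauses, l):
    """Make literal l true: drop clauses containing l, delete -l elsewhere."""
    return {c - {-l} for c in clauses if l not in c}

def dll(clauses):
    """Return True iff the given set of ground clauses is satisfiable."""
    # Tautology rule: drop clauses that contain complementary literals.
    clauses = {c for c in clauses if not any(-l in c for l in c)}
    while True:
        if not clauses:
            return True                 # no clause left to satisfy
        if frozenset() in clauses:
            return False                # empty clause reached
        literals = {l for c in clauses for l in c}
        # One-literal rule: a unit clause forces its literal.
        unit = next((next(iter(c)) for c in clauses if len(c) == 1), None)
        if unit is not None:
            clauses = assign(clauses, unit)
            continue
        # Pure-literal rule: l occurs, its dual does not.
        pure = next((l for l in literals if -l not in literals), None)
        if pure is not None:
            clauses = {c for c in clauses if pure not in c}
            continue
        # Splitting rule: try both cases for some literal l.
        l = next(iter(literals))
        return dll(assign(clauses, l)) or dll(assign(clauses, -l))

# {p ∨ q, ¬p ∨ q, p ∨ ¬q, ¬p ∨ ¬q} with p = 1, q = 2
unsat_example = [frozenset({1, 2}), frozenset({-1, 2}),
                 frozenset({1, -2}), frozenset({-1, -2})]
```

The two branches of the split correspond to the sets C1 and C2 of the Splitting-Rule: deleting l from the Ai and dropping the Bj is the same as making ld true, and vice versa.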
35 Pre-Resolution period: Davis-Putnam Procedure
- Pseudocode of the first-order ATP procedure by Davis and Putnam
begin
  C := finite set of clauses
  if C does not contain (real) function symbols then
    apply DP1 - DP3 to C'_0
    check the DP decision tree for unsatisfiability
  else begin
    n := 0
    contr := false
    while not contr do
      perform DP1 - DP3 on C'_n
      if the DP decision tree proves unsatisfiability
        then contr := true
        else contr := false
      n := n + 1
    end while
  end
end
- Nondeterministic (DP3)
- If C does not contain function symbols (with arity > 0) then the procedure always terminates (= decision procedure for FOL clause sets)
- If C is satisfiable and C contains function symbols then the algorithm does not terminate
- Yields a decision procedure for validity of the Bernays-Schönfinkel class in FOL (∀*∃*)
DP1: Reduce all clauses
DP2: Delete all tautologies
DP3: Construct a DP decision tree according to the given rules
36 Interlude: Inferences and Inference Systems
- An inference I has the form
    F1  F2  ...  Fn    (premises)
    ---------------
           G           (conclusion)
  where n ≥ 0 and F1,...,Fn, G are formulae
- An inference rule R is a set of inferences
- more precisely a decidable (usually efficiently computable) (n+1)-ary relation over formulae
- Usually one uses schematic variables for representing formulae in inference rules and attaches some (most often syntactic) conditions to these variables
- Every inference I ∈ R is called an instance of R
- An inference system is a (finite) set of inference rules
- A proof of G from a set of formulae N in an inference system is a finite sequence of formulae F1, ..., Fn such that
- Fn = G and
- for all Fi (i ≤ n) it holds that either Fi ∈ N or there is an inference I such that Fi is the conclusion of I and all the premises P1, ..., Pj of I are contained in the prefix F1, ..., F(i-1)
- Here we mainly consider inference systems on clauses, for instance Resolution
37 A Revolution in ATP: Robinson's Resolution Principle
- In some sense the simplest possible calculus for FOL (without equality)
- In principle only a single inference rule which combines substitution and atomic cut
- Possible since it requires the set of input formulae in CNF (very simple and uniform syntactic form)
- Binary substitution rule computing a minimal substitution which makes two atoms equal
- A quote from Robinson's landmark paper [Robinson, 1965]
- "Theorem-proving on the computer, using procedures based on the fundamental theorem of Herbrand concerning the FOL Predicate Calculus, is examined with a view towards improving the efficiency and widening the range of practical applicability of these procedures. A close analysis of the process of substitution (of terms for variables) and the process of truth-functional analysis of the results of such substitutions reveals that both processes can be combined into a single new iterating process (called resolution) which is vastly more efficient than the older cyclic procedures consisting of substitution stages alternating with truth-functional analysis stages."
38 A Revolution in ATP: Robinson's Resolution Principle
- The basic Resolution Calculus (BRC)
- Ground case
  Binary Resolution: from L ∨ C and ¬L ∨ D infer C ∨ D
  Factoring: from C ∨ L ∨ L infer C ∨ L
- General case
  Binary Resolution: from L ∨ C and ¬L' ∨ D infer (C ∨ D)σ
  Factoring: from C ∨ L ∨ L' infer (C ∨ L)σ
  where σ is the most general unifier of L and L'
- Fundamental aspects
- Iterative grounding of the clause set
- Guided guessing of interesting instances (Unification) built into the calculus
- Resolving upon an atom L does not require L to be ground (unnecessary grounding avoided)
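The most general unifier used in the general rules can be computed by a simple syntactic algorithm. The following is a minimal sketch, assuming a term encoding chosen only for this example: variables are plain strings, and constants and compound terms (including atoms) are tuples whose first component is the symbol name. The names `walk`, `occurs`, `unify` and `apply` are hypothetical helpers, and the unifier is returned in triangular form (bindings may mention other bound variables).

```python
def walk(t, subst):
    """Follow variable bindings until an unbound variable or a non-variable."""
    while isinstance(t, str) and t in subst:
        t = subst[t]
    return t

def occurs(v, t, subst):
    """Occurs check: does variable v occur in term t under subst?"""
    t = walk(t, subst)
    if t == v:
        return True
    if isinstance(t, tuple):
        return any(occurs(v, a, subst) for a in t[1:])
    return False

def unify(s, t):
    """Return a most general unifier of s and t as a dict, or None."""
    subst = {}
    stack = [(s, t)]
    while stack:
        a, b = stack.pop()
        a, b = walk(a, subst), walk(b, subst)
        if a == b:
            continue
        if isinstance(a, str):          # a is an unbound variable
            if occurs(a, b, subst):
                return None
            subst[a] = b
        elif isinstance(b, str):        # b is an unbound variable
            if occurs(b, a, subst):
                return None
            subst[b] = a
        elif a[0] == b[0] and len(a) == len(b):
            stack.extend(zip(a[1:], b[1:]))   # same symbol: unify arguments
        else:
            return None                 # symbol clash
    return subst

def apply(t, subst):
    """Apply a (triangular) substitution to a term."""
    t = walk(t, subst)
    if isinstance(t, tuple):
        return (t[0],) + tuple(apply(a, subst) for a in t[1:])
    return t

# P(x, f(y)) and P(a, f(g(x))) with variables x, y and constant a
left = ("P", "x", ("f", "y"))
right = ("P", ("a",), ("f", ("g", "x")))
mgu = unify(left, right)
```

Applying `mgu` to either atom gives the same instance P(a, f(g(a))), so a resolution step on these two atoms only instantiates x and y as far as needed.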
39 Basic Resolution Calculus: Properties
- Properties of the basic Resolution Calculus
- Given any two clauses, there are only finitely many resolvents using the Resolution inference rule.
- The Resolution Calculus is sound
- If c is provable from C in BRC then C ⊨ c
- This means in particular: if we can derive the empty clause then C is unsatisfiable
- The Resolution Calculus is refutationally complete
- If a set C of clauses is unsatisfiable then the empty clause can be proven (derived) from C
- Altogether
- A set C of clauses is unsatisfiable iff there is a proof for the empty clause from C in BRC
- Remark: Soundness of the inference system can be relaxed to satisfiability-preserving!
- How to find a contradiction (empty clause) starting with an initial (unsatisfiable) formula set?
- Saturation approach (wrt. the inference system BRC)
40 Resolution: Proof search by Saturation
- Saturated sets
- A set of clauses C is called saturated (wrt. a given inference system) if every inference of that system with premises in C gives a clause in C
- Completeness reformulated (in terms of saturated sets)
- A set C of clauses is unsatisfiable iff every saturated set S of clauses with C ⊆ S also contains the empty clause
- That means: simply construct a(ny) saturated set S of clauses (wrt. BRC) with C ⊆ S (saturation algorithm)
- Simple algorithm
- S := set of input clauses
- while not finished do
- Repeatedly apply all inferences to clauses in S, adding to S the conclusions of these inferences
- If the empty clause is proved, terminate with success. If no inference rule is applicable, terminate with failure
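For ground clauses the simple algorithm can be written down almost literally. The sketch below is restricted to the ground case and uses an encoding assumed for this example: a clause is a frozenset of non-zero integers, `-n` is the negation of `n`, and the function names `resolvents` and `saturate` are made up. Because clauses are sets, factoring is implicit; the first-order case would additionally need unification.

```python
def resolvents(c1, c2):
    """All ground binary resolvents of clauses c1 and c2."""
    return {(c1 - {l}) | (c2 - {-l}) for l in c1 if -l in c2}

def saturate(clauses):
    """Return True iff the empty clause is derivable from the ground clauses."""
    s = set(clauses)
    while True:
        if frozenset() in s:
            return True          # empty clause proved: unsatisfiable
        new = set()
        for c1 in s:
            for c2 in s:
                new |= resolvents(c1, c2)
        if new <= s:
            return False         # saturated without the empty clause
        s |= new

# {p ∨ q, ¬p ∨ q, p ∨ ¬q, ¬p ∨ ¬q} with p = 1, q = 2
refuted = saturate([frozenset({1, 2}), frozenset({-1, 2}),
                    frozenset({1, -2}), frozenset({-1, -2})])
```

In the ground case this loop always terminates, since only finitely many clauses exist over a finite set of literals. With variables and function symbols the set S can grow without bound, so termination is not guaranteed.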
43 Resolution: Proof search by Saturation
- Possible theoretical scenarios
- At some moment the empty clause is generated; in this case the input set of clauses is unsatisfiable
- Saturation terminates without ever generating the empty clause; in this case the input set of clauses is satisfiable
- Saturation runs forever, but without generating the empty clause; in this case the input set of clauses is satisfiable
- Possible practical scenarios
- At some moment the empty clause is generated; in this case the input set of clauses is unsatisfiable
- Saturation terminates without ever generating the empty clause; in this case the input set of clauses is satisfiable
- Saturation runs until we run out of resources, but without generating the empty clause; in this case it is unknown whether the input set of clauses is (un)satisfiable
44 Resolution: How to saturate in a clever way?
- The simple saturation algorithm is highly inefficient
- Apply inferences not in an arbitrary way, but in some sensible / useful order.
- Generate the empty clause as early as possible in the saturation process
- Prefer some inferences over others (in a sense), for instance goal-directedness
- Actually, what we need to ensure in order to guarantee completeness is fairness
- A saturation algorithm is fair iff every possible inference is eventually selected
- Completeness Theorem reformulated (for saturation algorithms)
- Let A be a fair saturation algorithm. A set C of clauses is unsatisfiable iff A eventually produces the empty clause
- Central problem: how to find good saturation algorithms!
45 How to guess suitable instances? Unification
46 Example: Basic Resolution Calculus
47 Enhancing Efficiency: Refinements of Resolution
48 Resolution Refinements: Hyperresolution
49 Resolution Refinements: Ordered Resolution
50 Enhancing Efficiency: Redundancy Criteria in Resolution
51 Part IV: Chain Resolution
- A Variant of Resolution for the Semantic Web
52 Part IV: Demo
- assisted by a Resolution-based ATP System: VAMPIRE
53 And there is a lot we have not talked about yet
- Different Calculi
- Tableaux Methods, Instance-based Methods, Inverse Method
- Decision Procedures
- Theory Reasoning, in particular equality
- ATP in other logics
- Modal, temporal logics, description logics
- Logics for non-monotonic reasoning
- Paraconsistent logics
- Reasoning tasks other than logical entailment / unsatisfiability
- Query answering
54 References and further Reading