Title: Deductive Databases
Deductive Databases
- CS 95 Advanced Database Systems
- Handout 6
Deductive Databases
- An area at the intersection of databases, logic, and artificial intelligence or knowledge bases.
- A deductive database system is a database system that includes capabilities to define (deductive) rules, which can deduce or infer additional information from the facts that are stored in a database.
- Because part of the theoretical foundation for some deductive database systems is mathematical logic, such systems are often referred to as logic databases.
- They may also be referred to as intelligent databases, expert database systems, or knowledge-based systems.
- These systems also incorporate reasoning and inferencing capabilities using techniques that were developed in the field of artificial intelligence.
Knowledge-based Systems vs Deductive Database Systems
- Knowledge-based expert systems have traditionally assumed that the data they need resides in main memory; hence, secondary storage management is not an issue.
- Deductive database systems attempt to relax this restriction, so that either a DBMS is enhanced to handle an expert system interface or an expert system is enhanced to handle data resident on secondary storage.
- The knowledge in an expert or knowledge-based system is extracted from application experts and refers to an application domain rather than to knowledge inherent in the data.
Deductive Databases Terminology
- A deductive database system is a database system that includes capabilities to define (deductive) rules, which can deduce or infer additional information from the facts that are stored in a database.
- Rules are specified using a declarative language: a language in which we specify what to achieve rather than how to achieve it.
- An inference engine (or deduction mechanism) within the system can deduce new facts from the database by interpreting these rules.
- The model used for deductive databases is closely related to the relational model, and particularly to the domain relational calculus formalism.
Deductive Databases Terminology (contd)
- Deductive databases are also related to the field of logic programming and the Prolog language.
- Work on deductive databases based on logic has used Prolog as a starting point.
- Datalog: a variation of Prolog that is used to define rules declaratively in conjunction with an existing set of relations, which are themselves treated as literals in the language.
- Although the language structure of Datalog resembles that of Prolog, its operational semantics (that is, how a Datalog program is to be executed) is still a topic of active research.
Deductive Databases Facts and Rules
- A deductive database uses two main types of specifications: facts and rules.
- Facts are specified in a manner similar to the way relations are specified, except that it is not necessary to include attribute names.
- Recall that a tuple in a relation describes some real-world fact whose meaning is partly determined by the attribute names.
- In a deductive database, the meaning of an attribute value in a tuple is determined solely by its position within the tuple.
- Rules are somewhat similar to relational views.
- They specify virtual relations that are not actually stored but can be formed from the facts by applying inference mechanisms based on the rule specifications.
- The main difference between rules and views is that rules may involve recursion and hence may yield virtual relations that cannot be defined in terms of standard relational views.
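To make the difference between views and recursive rules concrete, here is a minimal Python sketch (an illustration, not part of the original material). Facts are positional tuples, reusing the parent facts from the summary slide. The non-recursive grandparent rule is a single self-join, which an ordinary view can express; the hypothetical recursive `ancestor` rule is evaluated by naive iteration until no new tuple appears (a fixpoint), so the length of the parent chain does not have to be known in advance.

```python
# Facts are positional tuples: ("anne", "betty") means parent(anne, betty).
parent = {("anne", "betty"), ("betty", "cameron")}


def grandparent(par):
    """grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
    Non-recursive: one self-join, just like an ordinary relational view."""
    return {(x, z) for (x, y1) in par for (y2, z) in par if y1 == y2}


def ancestor(par):
    """Hypothetical recursive rules:
         ancestor(X, Y) :- parent(X, Y).
         ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).
    Evaluated naively: re-apply the rules until no new tuple appears."""
    anc = set(par)  # the first rule copies the parent facts
    while True:
        step = {(x, y) for (x, z1) in par for (z2, y) in anc if z1 == z2}
        if step <= anc:  # nothing new was derived: fixpoint reached
            return anc
        anc |= step


print(sorted(grandparent(parent)))  # [('anne', 'cameron')]
print(sorted(ancestor(parent)))     # [('anne', 'betty'), ('anne', 'cameron'), ('betty', 'cameron')]
```

Each pass of the loop extends the derived chains by one more parent link; a fixed view would need one join per level and so could only cover chains of a predetermined length.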
Deductive Databases Evaluation of Prolog Programs
- The evaluation of Prolog programs is based on a technique called backward chaining, which involves a top-down evaluation of goals.
- A goal in Prolog is equivalent to a query in a relational database system.
- In deductive databases that use Datalog, attention has been devoted to handling large volumes of data stored in a relational database. Hence, evaluation techniques have been devised that resemble bottom-up evaluation (forward chaining).
- Prolog suffers from the limitation that the order in which facts and rules are specified is significant in evaluation; moreover, the order of literals within a rule is significant.
- The execution techniques for Datalog programs attempt to circumvent these problems.
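A minimal sketch of bottom-up (forward-chaining) evaluation in Python, assuming facts are tuples `(predicate, arg1, ...)` and rules are `(head, body)` pairs that follow Prolog's naming convention (strings starting with an upper-case letter are variables). It reuses the database from the Prolog evaluation example in this handout and evaluates the rule for r with its two body literals in both orders. Because every rule is applied to all known facts until nothing new is derived, the resulting set of r facts is the same either way. The helper names (`match`, `satisfy`, `forward_chain`) are hypothetical, and this is naive evaluation, not an optimized Datalog engine.

```python
def is_var(t):
    # Prolog convention: variables start with an upper-case letter or '_'.
    return isinstance(t, str) and (t[:1].isupper() or t[:1] == "_")


def match(literal, fact, subst):
    """Match a body literal against a ground fact; return the extended
    substitution, or None if they do not match."""
    if literal[0] != fact[0] or len(literal) != len(fact):
        return None
    subst = dict(subst)
    for arg, value in zip(literal[1:], fact[1:]):
        if is_var(arg):
            if subst.setdefault(arg, value) != value:  # already bound elsewhere
                return None
        elif arg != value:
            return None
    return subst


def satisfy(body, facts, subst):
    """Yield every substitution that turns all body literals into known facts."""
    if not body:
        yield subst
        return
    for fact in facts:
        extended = match(body[0], fact, subst)
        if extended is not None:
            yield from satisfy(body[1:], facts, extended)


def forward_chain(facts, rules):
    """Naive bottom-up evaluation: fire every rule against all known facts
    and add the instantiated heads, until a round derives nothing new.
    Assumes safe rules (every head variable also occurs in the body)."""
    facts = set(facts)
    while True:
        derived = {
            (head[0],) + tuple(s[a] if is_var(a) else a for a in head[1:])
            for head, body in rules
            for s in satisfy(body, facts, {})
        }
        if derived <= facts:
            return facts
        facts |= derived


base = {("p", "a", "b"), ("p", "a", "c"), ("q", "b", "d"), ("q", "c", "f")}
r_head = ("r", "A", "B", "C")
# r(A,B,C) :- p(A,B), q(B,C).   versus   r(A,B,C) :- q(B,C), p(A,B).
r_pq = {f for f in forward_chain(base, [(r_head, [("p", "A", "B"), ("q", "B", "C")])]) if f[0] == "r"}
r_qp = {f for f in forward_chain(base, [(r_head, [("q", "B", "C"), ("p", "A", "B")])]) if f[0] == "r"}
print(r_pq == r_qp)  # True: literal order does not change the derived set
```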
Prolog Programming System
- Prolog is a logic programming system that is based on a resolution theorem prover. The system consists of two main components: the Prolog database and the inference engine. The Prolog database contains the sequence of Horn clauses that defines the logic program. The Prolog inference engine provides the control mechanism for proof construction, using a theorem-proving algorithm based on unification and backtracking. Prolog is not a pure logic programming language but rather a practical and partial implementation of logic programming. Apart from Horn clause logic, Prolog also incorporates evaluable predicates that have only a procedural interpretation, and second-order predicate logic features that allow the representation and manipulation of lists.
Components of the Prolog System
Prolog Programming System (contd)
- The data objects of Prolog, called terms, can be either a constant, a variable, a structure, or a list. Prolog is a function-free language: functional expressions are not valid terms, but structures are allowed, and these can be used to the same effect as functional expressions. Each type of term is briefly described below.
- Constants include integers (e.g. 0, 1, 10), reals (e.g. 1.45, 10.04), strings (e.g. "Hello"), and atoms (e.g. like, john, 'New York'), which normally begin with a lower-case letter or are enclosed in single quotes. Some special combinations of symbols are also considered atoms (e.g. ?-, :-, ->). The underscore character '_' may be inserted in the middle of an atom to improve its legibility.
- Variables are similar to atoms except that they begin with a capital letter or an underscore '_' (e.g. X, Name, _address). The underscore '_' on its own also denotes an anonymous variable, each occurrence of which is a distinct variable.
Prolog Programming System (contd)
- Structures are more complex data objects. A structure comprises a functor and a sequence of one or more terms called arguments. A functor is characterized by its name, which is an atom, and its arity, or number of arguments. In contrast to functional expressions, structures are not evaluated when used as arguments. However, the use of structures as arguments allows meta-programming in Prolog, since a structure can both be manipulated as a datum when used as an argument and be evaluated as a procedure when taken independently as a predicate. For example, the structure point/3 with arguments X, Y and Z, which is written as point(X, Y, Z), can be used as an argument to line/2 as follows: line(point(X1, Y1, Z1), point(X2, Y2, Z2)).
- Lists are sequences of Prolog terms built as nested structures of the form .(a, .(b, .(c, []))), or written simply as [a, b, c].
- As in logic programs, a clause in Prolog can be either a fact, a rule, or a query. In Prolog, ':-' is used instead of '←' as the implication symbol, and the naming convention for atoms and variables is the reverse of that of the standard logic program notation; that is, atoms start with a lower-case character and variables start with an upper-case character. Prolog has a declarative and a procedural semantics, which are basically similar to those of logic programs.
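The term types above can be mimicked in Python to show that a list is really a nested structure. In this sketch (an assumed representation, not Prolog's internal one) atoms and variables are strings, a structure is a tuple `(functor, arg1, ..., argN)`, and the empty list is the atom `[]`. The hypothetical helpers build and flatten the `.`/2 chain, report a structure's name/arity, and print terms in functor notation. The `.` functor follows the traditional notation used here; some modern systems (e.g. SWI-Prolog 7 and later) use a different list functor.

```python
EMPTY = "[]"  # the empty-list atom


def to_dot(items):
    """Build the nested '.'/2 structure: [a, b, c] -> .(a, .(b, .(c, [])))."""
    term = EMPTY
    for item in reversed(items):
        term = (".", item, term)
    return term


def from_dot(term):
    """Flatten a '.'/2 chain back into a Python list."""
    items = []
    while term != EMPTY:
        if not (isinstance(term, tuple) and len(term) == 3 and term[0] == "."):
            raise ValueError(f"not a proper list: {term!r}")
        items.append(term[1])
        term = term[2]
    return items


def functor_arity(term):
    """Name and arity of a structure, e.g. point(X, Y, Z) -> ('point', 3)."""
    return term[0], len(term) - 1


def show(term):
    """Write a term in functor(arg1, ..., argN) notation."""
    if isinstance(term, tuple):
        name, *args = term
        return f"{name}({', '.join(show(a) for a in args)})"
    return str(term)


# A structure used as an argument is just data: nothing here is evaluated.
line = ("line", ("point", "X1", "Y1", "Z1"), ("point", "X2", "Y2", "Z2"))
print(show(line))                     # line(point(X1, Y1, Z1), point(X2, Y2, Z2))
print(functor_arity(line))            # ('line', 2)
print(show(to_dot(["a", "b", "c"])))  # .(a, .(b, .(c, [])))
```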
Prolog Comparison Predicates
Prolog Representation of Entity-Relationship Database Schemes
Prolog Representation of Entity-Relationship Database Relations
Prolog Evaluation Strategy
- As mentioned above, the Prolog inference engine (PIE) is based on a resolution theorem prover that uses unification and backtracking. Briefly, resolution is an inference pattern that permits the taking of arbitrarily large inference steps, which require very considerable computational effort to carry out (Robinson, 1992); unification is the process of matching a subgoal with the head of a clause; and backtracking is a non-deterministic process of reviewing the goals that have been satisfied and attempting to resatisfy these goals by finding alternative solutions (Cohen, 1992). The Prolog goal evaluation strategy is by default top-down and proceeds from left to right (see also Figure 4.2).
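Unification, as described above, can be sketched in a few lines of Python. This is an illustrative assumption, not the PIE's actual code: variables are strings starting with an upper-case letter or '_', other strings are constants, structures are tuples `(functor, arg1, ...)`, and a substitution is a dict from variable names to terms. Like standard Prolog, it omits the occurs check, and for simplicity it treats '_' as an ordinary variable rather than a fresh anonymous one.

```python
def is_var(t):
    # Variables start with an upper-case letter or '_'; other strings are constants.
    return isinstance(t, str) and (t[:1].isupper() or t[:1] == "_")


def walk(t, subst):
    # Follow variable bindings until reaching a non-variable or an unbound variable.
    while is_var(t) and t in subst:
        t = subst[t]
    return t


def unify(a, b, subst):
    """Return a substitution (dict) that makes a and b identical, or None.
    Structures are tuples (functor, arg1, ..., argN); no occurs check."""
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):
        return {**subst, a: b}
    if is_var(b):
        return {**subst, b: a}
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and len(a) == len(b) and a[0] == b[0]):  # same functor and arity
        for x, y in zip(a[1:], b[1:]):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None  # different constants, functors or arities


print(unify(("q", "b", "Z"), ("q", "b", "d"), {}))  # {'Z': 'd'}
print(unify(("q", "b", "Z"), ("q", "c", "f"), {}))  # None
```

Matching the subgoal q(b,Z) against the clause head q(b,d) binds Z to d, while matching it against q(c,f) fails; such a failure is what sends the inference engine back to try another alternative.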
Prolog Evaluation Strategy
- (a) p(a,b). q(b,d).
- p(a,c). q(c,f).
- r(A,B,C) :- p(A,B), q(B,C).
- (b) ?- r(X,Y,Z).
- (c) (1) 0 CALL r(X,Y,Z)?
- (2) 1 CALL p(X,Y)?
- (2) 1 EXIT p(a,b)
- (3) 1 CALL q(b,Z)?
- (3) 1 EXIT q(b,d)
- (1) 0 EXIT r(a,b,d)
- (1) 0 REDO r(a,b,d)?
- (3) 1 REDO q(b,d)?
- (3) 1 FAIL q(b,Z)
- (2) 1 REDO p(a,b)?
- (2) 1 EXIT p(a,c)
- (4) 1 CALL q(c,Z)?
- (4) 1 EXIT q(c,f)
- (1) 0 EXIT r(a,c,f)
- (1) 0 REDO r(a,c,f)?
- (4) 1 REDO q(c,f)?
- (4) 1 FAIL q(c,Z)
- (2) 1 REDO p(a,c)?
- (2) 1 FAIL p(X,Y)
- (1) 0 FAIL r(X,Y,Z)
Prolog Evaluation: (a) Database, (b) Query, (c) Evaluation Trace
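The trace can be reproduced with a tiny top-down interpreter. The Python sketch below uses hypothetical names and is a simplification of a real Prolog engine: it tries clauses in textual order, solves body goals from left to right, renames clause variables apart on each use, and backtracks through Python generators, so the answers come out in the same order as the EXIT lines for r in the trace.

```python
from itertools import count


def is_var(t):
    return isinstance(t, str) and (t[:1].isupper() or t[:1] == "_")


def walk(t, s):
    while is_var(t) and t in s:
        t = s[t]
    return t


def unify(a, b, s):
    """Most general unifier extending s, or None (no occurs check)."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return {**s, a: b}
    if is_var(b):
        return {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b) and a[0] == b[0]:
        for x, y in zip(a[1:], b[1:]):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None


# The database from the slide, in textual order, as (head, body) pairs.
PROGRAM = [
    (("p", "a", "b"), []),
    (("p", "a", "c"), []),
    (("q", "b", "d"), []),
    (("q", "c", "f"), []),
    (("r", "A", "B", "C"), [("p", "A", "B"), ("q", "B", "C")]),
]
_fresh = count(1)


def rename(t, n):
    """Rename clause variables apart (A -> A_n) for each use of a clause."""
    if is_var(t):
        return f"{t}_{n}"
    if isinstance(t, tuple):
        return (t[0],) + tuple(rename(x, n) for x in t[1:])
    return t


def solve(goals, s):
    """Top-down, left-to-right resolution. Each yielded substitution is a
    success (EXIT); asking the generator for the next one is a REDO, and
    exhausting the clause loop is a FAIL that backtracks to the caller."""
    if not goals:
        yield s
        return
    goal, rest = goals[0], goals[1:]
    for head, body in PROGRAM:  # clauses tried in the order they are written
        n = next(_fresh)
        s1 = unify(goal, rename(head, n), s)
        if s1 is not None:
            yield from solve([rename(g, n) for g in body] + rest, s1)


def resolve(t, s):
    """Apply substitution s throughout term t."""
    t = walk(t, s)
    if isinstance(t, tuple):
        return (t[0],) + tuple(resolve(x, s) for x in t[1:])
    return t


query = ("r", "X", "Y", "Z")  # ?- r(X,Y,Z).
answers = [resolve(query, s) for s in solve([query], {})]
print(answers)  # [('r', 'a', 'b', 'd'), ('r', 'a', 'c', 'f')]
```

Because clauses are tried in textual order, moving p(a,c) above p(a,b) in `PROGRAM` would make r(a,c,f) the first answer, an instance of Prolog's sensitivity to the order of facts and rules.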
Prolog/Datalog Notation
- The notation is based on providing predicates with unique names.
- A predicate has an implicit meaning, which is suggested by the predicate name, and a fixed number of arguments.
- If the arguments are all constant values, the predicate simply states that a certain fact is true.
- If the predicate has variables as arguments, it is considered either as a query or as part of a rule or constraint.
- Prolog convention: all constant values in a predicate are either numeric or character strings; they are represented as identifiers (or names) starting with lower-case letters only, whereas variable names always start with an upper-case letter.
Prolog/Datalog Example
(b) The supervisory tree
- (a) Prolog notation
- Facts
- supervise(franklin,john).
- supervise(franklin,namesh).
- supervise(franklin,joyce).
- supervise(jennifer,alicia).
- supervise(jennifer,ahmad).
- supervise(james,franklin).
- supervise(james,jennifer).
- Rules
- superior(X,Y) :- supervise(X,Y).
- superior(X,Y) :- supervise(X,Z), superior(Z,Y).
- subordinate(X,Y) :- superior(Y,X).
- Queries
- superior(james,Y)?
- superior(james,joyce)?
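As a bottom-up alternative to Prolog's top-down evaluation, the sketch below computes the superior relation with semi-naive evaluation: each round joins supervise only with the tuples that were new in the previous round, and the loop stops when a round adds nothing. The two queries then become simple lookups in the computed set. This is an illustration in Python under stated assumptions (facts held as in-memory tuples; the function name `superior_seminaive` is hypothetical), not how any particular deductive DBMS implements it.

```python
# Facts from the slide: supervise(X, Y) means X directly supervises Y.
supervise = {
    ("franklin", "john"), ("franklin", "namesh"), ("franklin", "joyce"),
    ("jennifer", "alicia"), ("jennifer", "ahmad"),
    ("james", "franklin"), ("james", "jennifer"),
}


def superior_seminaive(edges):
    """Bottom-up, semi-naive evaluation of
         superior(X,Y) :- supervise(X,Y).
         superior(X,Y) :- supervise(X,Z), superior(Z,Y).
    Each round joins supervise only with the tuples that were new in the
    previous round (delta) instead of the whole superior relation."""
    total = set(edges)  # first rule
    delta = set(edges)
    while delta:
        step = {(x, y) for (x, z) in edges for (z2, y) in delta if z == z2}
        delta = step - total  # keep only genuinely new tuples
        total |= delta
    return total


superior = superior_seminaive(supervise)
subordinate = {(y, x) for (x, y) in superior}  # subordinate(X,Y) :- superior(Y,X).

# superior(james,Y)?
print(sorted(y for (x, y) in superior if x == "james"))
# ['ahmad', 'alicia', 'franklin', 'jennifer', 'john', 'joyce', 'namesh']

# superior(james,joyce)?
print(("james", "joyce") in superior)  # True
```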
Deductive Databases Summary
- Stores knowledge with the database.
- Different methods of storing knowledge give rise to different terms:
- KBMS or expert databases use expert-system IF..THEN..ELSE type rules.
- Deductive or logic-based databases often use Prolog-type rules.
- Expert databases generally incorporate knowledge extracted from experts in the field to provide reasoning and inferencing capabilities.
- Logic-based databases use axioms (a logic theory) to store the data and deductive axioms (rules) to extend that information.
- e.g. to store the facts that Anne is the parent of Betty and that Betty is the parent of Cameron, use parent(anne, betty). parent(betty, cameron).
- A grandparent can now be defined by the rule grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
- Many forms of deductive databases exist, including deductive object-oriented databases.
- Applications include:
- Enterprise modelling
- Hypothesis testing
- Electronic commerce