Title: Experiments with MRDTL
1. Experiments with MRDTL: A Multi-Relational Decision Tree Learning Algorithm

Hector Leiva, Anna Atramentov and Vasant Honavar
Artificial Intelligence Laboratory, Department of Computer Science and Graduate Program in Bioinformatics and Computational Biology
Iowa State University, Ames, IA 50011, USA
www.cs.iastate.edu/honavar/aigroup/html
Support provided in part by the National Science Foundation, the Carver Foundation, and Pioneer Hi-Bred, Inc.
2. Motivation

- Importance of multi-relational learning
- Growth of data stored in multi-relational databases (MRDBs)
- Techniques for learning from unstructured data often extract the data into an MRDB
- Expansion of techniques for multi-relational learning:
  - Blockeel's framework (ILP, 1998)
  - Getoor's framework (first-order extensions of probabilistic models, 2001)
  - Knobbe's framework (MRDM, 1999)
- Problem: no experimental results available

Goals
- Perform experiments and evaluate the performance of Knobbe's framework
- Understand the strengths and limits of the approach
3. Multi-Relational Learning Literature

- Inductive Logic Programming
- First-order extensions of probabilistic models
- Multi-Relational Data Mining
- Propositionalization methods
- Extensions of PRMs for cumulative learning, i.e. learning and reasoning as agents interact with the world
- Approaches for mining data in the form of graphs

(Blockeel, 1998; De Raedt, 1998; Knobbe et al., 1999; Friedman et al., 1999; Koller, 1999; Krogel and Wrobel, 2001; Getoor, 2001; Kersting et al., 2000; Pfeffer, 2000; Dzeroski and Lavrac, 2001; Dehaspe and De Raedt, 1997; Dzeroski et al., 2001; Jaeger, 1997; Karalic and Bratko, 1997; Holder and Cook, 2000; Gonzalez et al., 2000)
4. Problem Formulation

- Given: data stored in a relational database
- Goal: build a decision tree for predicting a target attribute in the target table

Example of a multi-relational database

Schema:

    Department(ID, Specialization, Students)
    Grad.Student(ID, Name, GPA, Publications, Advisor, Department)
    Staff(ID, Name, Department, Position, Salary)

Instances:

Department:

    ID  Specialization    Students
    d1  Math              1000
    d2  Physics           300
    d3  Computer Science  400

Grad.Student:

    ID  Name    GPA  Publications  Advisor  Department
    s1  John    2.0  4             p1       d3
    s2  Lisa    3.5  10            p4       d3
    s3  Michel  3.9  3             p4       d4

Staff:

    ID  Name    Department  Position           Salary
    p1  Dale    d1          Professor          70-80k
    p2  Martin  d3          Postdoc            30-40k
    p3  Victor  d2          VisitingScientist  40-50k
    p4  David   d3          Professor          80-100k
5. Propositional decision tree algorithm. Construction phase

PlayTennis training data:

    Day  Outlook   Temperature  Humidity  Wind    PlayTennis
    d1   Sunny     Hot          High      Weak    No
    d2   Sunny     Hot          High      Strong  No
    d3   Overcast  Hot          High      Weak    Yes
    d4   Overcast  Cold         Normal    Weak    No

Splitting {d1, d2, d3, d4} on the attribute Outlook yields {d1, d2} (Outlook = Sunny) and {d3, d4} (Outlook = Overcast).

    Tree_induction(D: data)
      A = optimal_attribute(D)
      if stopping_criterion(D)
        return leaf(D)
      else
        D_left  = split(D, A)
        D_right = split_complement(D, A)
        child_left  = Tree_induction(D_left)
        child_right = Tree_induction(D_right)
        return node(A, child_left, child_right)
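The recursion above can be made concrete with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: binary splits on attribute values, entropy-based information gain as the optimal-attribute criterion, and stopping when the node is pure or no attributes remain. All helper names are ours.

    from collections import Counter
    import math

    def entropy(rows, target):
        # Shannon entropy of the class distribution in rows
        counts = Counter(r[target] for r in rows)
        n = len(rows)
        return -sum(c / n * math.log2(c / n) for c in counts.values())

    def info_gain(rows, attr, value, target):
        # gain of the binary split rows[attr] == value vs. rows[attr] != value
        left = [r for r in rows if r[attr] == value]
        right = [r for r in rows if r[attr] != value]
        if not left or not right:
            return 0.0
        n = len(rows)
        return entropy(rows, target) - (len(left) / n * entropy(left, target)
                                        + len(right) / n * entropy(right, target))

    def tree_induction(rows, attrs, target):
        # stopping criterion: pure node -> leaf labeled with the majority class
        if len({r[target] for r in rows}) == 1 or not attrs:
            return Counter(r[target] for r in rows).most_common(1)[0][0]
        # optimal attribute-value pair by information gain
        a, v = max(((a, v) for a in attrs for v in {r[a] for r in rows}),
                   key=lambda av: info_gain(rows, av[0], av[1], target))
        left = [r for r in rows if r[a] == v]
        right = [r for r in rows if r[a] != v]
        if not left or not right:
            return Counter(r[target] for r in rows).most_common(1)[0][0]
        return {"split": (a, v),
                "left": tree_induction(left, attrs, target),
                "right": tree_induction(right, attrs, target)}

Run on the four PlayTennis rows above with attrs = ["Outlook", "Temperature", "Humidity", "Wind"], this splits the root on Outlook, as in the figure.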
6. MR setting. Splitting data with selection graphs

The Grad.Student and Department tables are as on slide 4. The Staff table:

    ID  Name    Department  Position           Salary
    p1  Dale    d1          Professor          70-80k
    p2  Martin  d3          Postdoc            30-40k
    p3  Victor  d2          VisitingScientist  40-50k
    p4  David   d3          Professor          80-100k

[Figure: a selection graph over Staff, Grad.Student and Department selects the subset {p4} of the Staff table; the complement selection graphs select {p1} and {p2, p3}.]
7. What is a selection graph?

- It corresponds to a subset of the instances from the target table
- Nodes correspond to tables in the database
- Edges correspond to associations between tables
- An open edge means "have at least one"
- A closed edge means "have none of"

[Figure: example selection graph with nodes Grad.Student, Staff and Department, with the condition Specialization = math on the Department node.]
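To make this concrete, here is a minimal sketch of a selection graph as a data structure, directly following the definition above (node = table plus attribute conditions, edge = association marked open/present or closed). The class and field names are illustrative, not from the paper; they are reused in the sketches on the following slides.

    from dataclasses import dataclass, field

    @dataclass
    class Node:
        table: str                                       # database table this node refers to
        conditions: list = field(default_factory=list)   # triples like ("Specialization", "=", "math")

    @dataclass
    class Edge:
        parent: Node
        child: Node
        keys: tuple            # (parent_column, child_column) association between the tables
        present: bool = True   # open edge: "have at least one"; closed edge: "have none of"

    @dataclass
    class SelectionGraph:
        target: Node                                # node for the target table
        nodes: list = field(default_factory=list)   # all nodes, target first
        edges: list = field(default_factory=list)

For instance, "staff members in math departments who advise at least one graduate student":

    staff = Node("Staff")
    dept = Node("Department", [("Specialization", "=", "math")])
    stud = Node("Graduate_Student")
    g = SelectionGraph(staff, [staff, dept, stud],
                       [Edge(staff, dept, ("Department", "ID")),
                        Edge(staff, stud, ("ID", "Advisor"))])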
8. Automatically transforming selection graphs into SQL queries

Generic query:

    SELECT DISTINCT T0.primary_key
    FROM   table_list
    WHERE  join_list AND condition_list

Selection graph: a single Staff node with the condition Position = 'Professor':

    SELECT DISTINCT T0.ID
    FROM   Staff T0
    WHERE  T0.Position = 'Professor'

Selection graph: Staff with an open edge to Grad.Student (staff advising at least one graduate student):

    SELECT DISTINCT T0.ID
    FROM   Staff T0, Graduate_Student T1
    WHERE  T0.ID = T1.Advisor

Selection graph: Staff with a closed edge to Grad.Student (staff advising no graduate students):

    SELECT DISTINCT T0.ID
    FROM   Staff T0
    WHERE  T0.ID NOT IN (SELECT T1.Advisor FROM Graduate_Student T1)

Selection graph: Staff advising at least one graduate student, but none with GPA > 3.9:

    SELECT DISTINCT T0.ID
    FROM   Staff T0, Graduate_Student T1
    WHERE  T0.ID = T1.Advisor
      AND  T0.ID NOT IN (SELECT T1.Advisor
                         FROM Graduate_Student T1
                         WHERE T1.GPA > 3.9)
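A sketch of the translation using the illustrative SelectionGraph structure from slide 7: nodes reached by open edges get table aliases, open edges contribute join conditions, node conditions go into the WHERE clause, and each closed edge becomes a NOT IN subquery, following the generic query pattern above. Quoting of values and nesting of closed edges are simplified.

    def graph_to_sql(g, primary_key="ID"):
        # Nodes reached by open edges appear in FROM; each closed edge
        # turns into a NOT IN subquery on its child table.
        open_nodes, subqueries = [g.target], []
        for e in g.edges:
            if e.present:
                open_nodes.append(e.child)
            else:
                pcol, ccol = e.keys
                conds = " AND ".join(f"S.{c} {op} '{v}'" for c, op, v in e.child.conditions)
                sub = f"SELECT S.{ccol} FROM {e.child.table} S" + (f" WHERE {conds}" if conds else "")
                subqueries.append((e.parent, pcol, sub))
        alias = {id(n): f"T{i}" for i, n in enumerate(open_nodes)}
        joins = [f"{alias[id(e.parent)]}.{e.keys[0]} = {alias[id(e.child)]}.{e.keys[1]}"
                 for e in g.edges if e.present]
        conds = [f"{alias[id(n)]}.{c} {op} '{v}'"
                 for n in open_nodes for c, op, v in n.conditions]
        not_ins = [f"{alias[id(p)]}.{col} NOT IN ({sub})" for p, col, sub in subqueries]
        where = " AND ".join(joins + conds + not_ins)
        tables = ", ".join(f"{n.table} {alias[id(n)]}" for n in open_nodes)
        sql = f"SELECT DISTINCT {alias[id(g.target)]}.{primary_key} FROM {tables}"
        return sql + (f" WHERE {where}" if where else "")

For the graph g built on slide 7 this yields a query of the same shape as the examples above: SELECT DISTINCT T0.ID FROM Staff T0, Department T1, Graduate_Student T2 WHERE T0.Department = T1.ID AND T0.ID = T2.Advisor AND T1.Specialization = 'math'.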
9. MR decision tree

- Each node contains a selection graph
- Each child's selection graph is a supergraph of the parent's selection graph
10. How to choose selection graphs in nodes?

- Problem: there are too many supergraph selection graphs to choose from at each node
- Solution:
  - start with an initial selection graph
  - use a greedy heuristic to choose supergraph selection graphs (refinements)
  - use binary splits for simplicity
  - for each refinement, get its complement refinement
  - choose the best refinement based on the information gain criterion
- Problem: some potentially good refinements may give no immediate benefit
- Solution:
  - look-ahead capability
11. Refinements of selection graph

[Figure: selection graph with nodes Grad.Student (GPA > 3.9), Department (Specialization = math) and Staff.]

Two kinds of refinements (a sketch of generating both follows below):
- add a condition to a node: explores attribute information in the tables
- add a present edge and open node: explores relational properties between the tables
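Continuing the illustrative classes from slide 7, both refinement kinds can be enumerated mechanically. Which conditions and associations are candidates comes from the database schema, which is assumed given here; complement refinements (negated condition, closed edge) are produced analogously.

    import copy

    def refinements(g, schema_conditions, schema_edges):
        # Yield supergraph selection graphs of g.
        # schema_conditions: table -> candidate condition triples
        # schema_edges: table -> list of (child_table, (parent_col, child_col))
        for i, n in enumerate(g.nodes):
            for cond in schema_conditions.get(n.table, []):
                r = copy.deepcopy(g)              # kind 1: add condition to a node
                r.nodes[i].conditions.append(cond)
                yield r
        for i, n in enumerate(g.nodes):
            for child_table, keys in schema_edges.get(n.table, []):
                r = copy.deepcopy(g)              # kind 2: add present edge and open node
                child = Node(child_table)
                r.nodes.append(child)
                r.edges.append(Edge(r.nodes[i], child, keys))
                yield r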
12. Refinements of selection graph: add condition to the node

[Figure: starting from the selection graph with Specialization = math, the refinement adds the condition Position = 'Professor' to the Staff node; the complement refinement uses Position != 'Professor'.]
13. Refinements of selection graph: add condition to the node

[Figure: the refinement adds the condition GPA > 2.0 to the Grad.Student node of the selection graph with Specialization = math; the complement refinement uses the negated condition.]
14. Refinements of selection graph: add condition to the node

[Figure: the refinement adds the condition Students > 200 to the Department node of the selection graph (Grad.Student with GPA > 3.9, Department with Specialization = math, Staff); the complement refinement uses the negated condition.]
15. Refinements of selection graph: add present edge and open node

[Figure: the refinement adds a present edge and open node to the selection graph (Grad.Student with GPA > 3.9, Department with Specialization = math, Staff). Note: the information gain of this refinement is 0.]
16. Refinements of selection graph: add present edge and open node

[Figure: another refinement adding a present edge and open node to the same selection graph, together with its complement refinement.]
17. Refinements of selection graph: add present edge and open node

[Figure: another refinement adding a present edge and open node to the same selection graph, together with its complement refinement.]
18. Refinements of selection graph: add present edge and open node

[Figure: another refinement adding a present edge and open node to the same selection graph, together with its complement refinement.]
19. Look-ahead capability

[Figure: a refinement and its complement refinement of the selection graph (Grad.Student with GPA > 3.9, Department with Specialization = math, Staff) that by themselves give no immediate benefit.]
20. Look-ahead capability

[Figure: a look-ahead refinement that, in a single step, adds a present edge to Department together with the condition Students > 200 to the selection graph with Specialization = math.]
21. MR decision tree algorithm. Construction phase

- for each non-leaf node:
  - consider all possible refinements, and their complements, of the node's selection graph
  - choose the best pair based on the information gain criterion
  - create the child nodes

[Figure: the root node (selection graph: Staff) is split into two children by a refinement adding a present edge to Grad.Student and its complement.]
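Putting the pieces together, here is a compact sketch of the choice made at each node, reusing the illustrative helpers from the earlier slides. A hypothetical run_query(sql) helper that executes a query and returns the selected target-table keys is assumed, as is a labels mapping from keys to classes; neither is part of the paper.

    from collections import Counter
    import math

    def best_refinement(g, candidates, labels, run_query):
        # Score each candidate refinement by the information gain of the
        # binary split (refinement vs. complement) over the node's instances.
        def H(keys):
            counts, n = Counter(labels[k] for k in keys), len(keys)
            return -sum(c / n * math.log2(c / n) for c in counts.values()) if n else 0.0
        current = set(run_query(graph_to_sql(g)))     # instances at this node
        best, best_gain = None, -1.0
        for r in candidates:
            left = set(run_query(graph_to_sql(r))) & current
            right = current - left                    # covered by the complement refinement
            if not left or not right:
                continue
            gain = H(current) - (len(left) / len(current) * H(left)
                                 + len(right) / len(current) * H(right))
            if gain > best_gain:
                best, best_gain = r, gain
        return best, best_gain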
22. MR decision tree algorithm. Classification phase

- for each leaf:
  - apply the selection graph of the leaf to the test data
  - classify the resulting instances with the classification of the leaf

[Figure: a decision tree over Staff whose nodes refine the selection graph, e.g. by a present edge to Grad.Student with GPA > 3.9, the condition Position = 'Professor', and the Department conditions Specialization = math / Specialization = physics, with leaf predictions such as Salary = 70-80k and 80-100k.]
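Classification then reduces to running each leaf's query against the test database. A minimal sketch, again assuming the hypothetical run_query helper:

    def classify(leaves, run_query):
        # leaves: (SelectionGraph, predicted_label) pairs collected from the tree.
        # Leaf selection graphs partition the target table, so each test
        # instance is selected by exactly one leaf.
        predictions = {}
        for graph, label in leaves:
            for key in run_query(graph_to_sql(graph)):
                predictions[key] = label
        return predictions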
23. Experimental results. Mutagenesis

- The most widely used DB in ILP.
- Describes molecules of certain nitroaromatic compounds.
- Goal: predict their mutagenic activity (label attribute), i.e. the ability to cause DNA to mutate. High mutagenic activity can cause cancer.
- Class distribution:

    Compounds              Active  Inactive  Total
    Regression friendly    125     63        188
    Regression unfriendly  13      29        42
    Total                  138     92        230

- 5 levels of background knowledge: B0, B1, B2, B3, B4. They provide increasingly richer descriptions of the examples. Only the first three levels (B0, B1, B2) are used.
24. Experimental results. Mutagenesis

- Results of 10-fold cross-validation for the regression friendly set:

    System  Accuracy (%)    Time (secs.)
            B0   B1   B2    B0     B1     B2
    Progol  79   86   86    8595   4627   6530
    Progol  76   81   83    117k   64k    42k
    FOIL    61   61   83    4950   9138   0.5
    TILDE   75   79   85    41     170    142
    MRDTL   67   87   88    0.85   332    221

    System  Number of nodes
            B0   B1   B2
    MRDTL   1    53   51
25. Experimental results. Mutagenesis

- Results of leave-one-out cross-validation for the regression unfriendly set:

    Background  Accuracy (%)  Time       Nodes
    B0          70            0.6 secs.  1
    B1          81            86 secs.   24
    B2          81            60 secs.   22

- Two recent approaches (Sebag and Rauveirol, 1997) and (Kramer and De Raedt, 2001) using B3 have achieved 93.6% and 94.7%, respectively, on the mutagenesis database.
26. Experimental results. KDD Cup 2001

- Consists of a variety of details about the various genes of one particular type of organism.
- Genes code for proteins, and these proteins tend to localize in various parts of cells and interact with one another in order to perform crucial functions.
- Task: prediction of gene/protein localization (15 possible values)
- Target table: Gene
- Target attribute: Localization
- 862 training genes, 381 test genes.
- Challenge: many attribute values are missing.
- Approach: use a special value to encode a missing value. Result: accuracy of 50%
- Better techniques for filling in missing values are needed.
27. Experimental results. KDD Cup 2001

- Approach: replace missing values by the most common value of the attribute for the class. Results:
  - accuracy of around 85% with a decision tree of 367 nodes, with no limit on the number of times an association can be instantiated
  - accuracy of 80% when limiting the number of times an association can be instantiated
  - accuracy of around 75% when following associations only in the forward direction
- This shows that providing reasonable guesses for missing values can significantly enhance the performance of MRDTL on real-world data sets.
- In practice, since the class labels for test data are unknown, it is not possible to apply this method.
- Approach: extension of the Naïve Bayes algorithm for relational data. Result: no improvement compared to the first approach.
- Handling of missing values has to be incorporated into the decision tree algorithm.
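For illustration, the per-class most-common-value imputation can be sketched in a few lines of plain Python; the row/attribute representation is generic, not the actual KDD Cup schema. Note the caveat above: the method needs class labels, so it is only applicable to training data.

    from collections import Counter

    def impute_by_class_mode(rows, target, missing=None):
        # Replace each missing attribute value by the most common value
        # of that attribute among rows of the same class.
        by_class = {}
        for r in rows:
            by_class.setdefault(r[target], []).append(r)
        for group in by_class.values():
            for attr in group[0]:
                if attr == target:
                    continue
                observed = [r[attr] for r in group if r[attr] is not missing]
                if not observed:
                    continue      # attribute entirely missing for this class
                mode = Counter(observed).most_common(1)[0][0]
                for r in group:
                    if r[attr] is missing:
                        r[attr] = mode
        return rows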
28. Experimental results. Adult database

- Suitable for propositional learning: one table, 6 numerical attributes, 8 nominal attributes.
- Information from the 1994 census.
- Task: determine whether a person makes over 50k a year.
- Class distribution for the adult database:

                          Training        Test            Total
                          >50k   <=50k    >50k   <=50k
    With missing values   7841   24720    3846   12435    48842
    W/o missing values    7508   22654    3700   11360    45222

- Result after removal of missing values, using the original train/test split: 82.2%.
- Filling missing values with the Naïve Bayes approach yields 83%.
- C4.5 result: 84.46%.
29. Summary

- The algorithm is a promising alternative to existing algorithms such as Progol, FOIL, and TILDE
- The running time is comparable with the best existing approaches
- If equipped with principled approaches to handling missing values, it is an effective algorithm for learning from real-world relational data
- The approach is an extension of propositional learning, and can be successfully applied to propositional learning

Questions
- Why can't we split the data based on the value of an attribute in an arbitrary table right away?
- Is there a less restrictive and simpler way of representing the splits of data than selection graphs?
- The running time for computing the first nodes of the decision tree is much smaller than for the rest of the nodes. Is this unavoidable? Can we implement the same idea more efficiently?
30. Future work

- Incorporation of more sophisticated techniques for handling missing values
- Incorporation of more sophisticated pruning techniques or complexity regularization
- More extensive evaluation of MRDTL on real-world data sets
- Development of ontology-guided multi-relational decision tree learning algorithms to generate classifiers at multiple levels of abstraction (Zhang et al., 2002)
- Development of variants of MRDTL for classification tasks where the classes are not disjoint, based on the recently developed propositional decision tree counterparts of such algorithms (Caragea et al., 2002)
- Development of variants of MRDTL that can learn from heterogeneous, distributed, autonomous data sources, based on recently developed techniques for distributed learning and ontology-based data integration