Title: Experiments with MRDTL
1. Experiments with MRDTL: A Multi-Relational Decision Tree Learning Algorithm

Hector Leiva, Anna Atramentov and Vasant Honavar
Artificial Intelligence Laboratory, Department of Computer Science and Graduate Program in Bioinformatics and Computational Biology
Iowa State University, Ames, IA 50011, USA
www.cs.iastate.edu/honavar/aigroup/html
Support provided in part by the National Science Foundation, the Carver Foundation, and Pioneer Hi-Bred, Inc.
2. Motivation

- Importance of multi-relational learning
- Growth of data stored in multi-relational databases (MRDBs)
- Techniques for learning from unstructured data often extract the data into an MRDB
- Expansion of techniques for multi-relational learning:
  - Blockeel's framework (ILP, 1998)
  - Getoor's framework (first-order extensions of probabilistic models, 2001)
  - Knobbe's framework (MRDM, 1999)
- Problem: no experimental results available

Goals
- Perform experiments and evaluate the performance of Knobbe's framework
- Understand the strengths and limits of the approach
3. Multi-Relational Learning Literature

- Inductive Logic Programming
- First-order extensions of probabilistic models
- Multi-Relational Data Mining
- Propositionalization methods
- Extensions of PRMs for cumulative learning, i.e. learning and reasoning as agents interact with the world
- Approaches for mining data in the form of graphs

(Blockeel, 1998; De Raedt, 1998; Knobbe et al., 1999; Friedman et al., 1999; Koller, 1999; Krogel and Wrobel, 2001; Getoor, 2001; Kersting et al., 2000; Pfeffer, 2000; Dzeroski and Lavrac, 2001; Dehaspe and De Raedt, 1997; Dzeroski et al., 2001; Jaeger, 1997; Karalic and Bratko, 1997; Holder and Cook, 2000; Gonzalez et al., 2000)
4. Problem Formulation

- Given: data stored in a relational database
- Goal: build a decision tree for predicting a target attribute in the target table

Example of a multi-relational database

Schema:

    Department(ID, Specialization, Students)
    Grad.Student(ID, Name, GPA, Publications, Advisor, Department)
    Staff(ID, Name, Department, Position, Salary)

Instances:

Department:

    ID  Specialization    Students
    d1  Math              1000
    d2  Physics           300
    d3  Computer Science  400

Grad.Student:

    ID  Name    GPA  Publications  Advisor  Department
    s1  John    2.0  4             p1       d3
    s2  Lisa    3.5  10            p4       d3
    s3  Michel  3.9  3             p4       d4

Staff:

    ID  Name    Department  Position           Salary
    p1  Dale    d1          Professor          70-80k
    p2  Martin  d3          Postdoc            30-40k
    p3  Victor  d2          VisitingScientist  40-50k
    p4  David   d3          Professor          80-100k
5. Propositional decision tree algorithm. Construction phase

PlayTennis training data:

    Day  Outlook   Temperature  Humidity  Wind    PlayTennis
    d1   Sunny     Hot          High      Weak    No
    d2   Sunny     Hot          High      Strong  No
    d3   Overcast  Hot          High      Weak    Yes
    d4   Overcast  Cold         Normal    Weak    No

Splitting {d1, d2, d3, d4} on the attribute Outlook yields {d1, d2} (Outlook = Sunny) and {d3, d4} (Outlook = Overcast).

    Tree_induction(D: data)
      A = optimal_attribute(D)
      if stopping_criterion(D)
        return leaf(D)
      else
        D_left  = split(D, A)
        D_right = split_complement(D, A)
        child_left  = Tree_induction(D_left)
        child_right = Tree_induction(D_right)
        return node(A, child_left, child_right)
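The recursion above can be made concrete with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: binary splits on attribute values, entropy-based information gain as the optimal-attribute criterion, and stopping when the node is pure or no attributes remain. All helper names are ours.

    from collections import Counter
    import math

    def entropy(rows, target):
        # Shannon entropy of the class distribution in rows
        counts = Counter(r[target] for r in rows)
        n = len(rows)
        return -sum(c / n * math.log2(c / n) for c in counts.values())

    def info_gain(rows, attr, value, target):
        # gain of the binary split rows[attr] == value vs. rows[attr] != value
        left = [r for r in rows if r[attr] == value]
        right = [r for r in rows if r[attr] != value]
        if not left or not right:
            return 0.0
        n = len(rows)
        return entropy(rows, target) - (len(left) / n * entropy(left, target)
                                        + len(right) / n * entropy(right, target))

    def tree_induction(rows, attrs, target):
        # stopping criterion: pure node -> leaf labeled with the majority class
        if len({r[target] for r in rows}) == 1 or not attrs:
            return Counter(r[target] for r in rows).most_common(1)[0][0]
        # optimal attribute-value pair by information gain
        a, v = max(((a, v) for a in attrs for v in {r[a] for r in rows}),
                   key=lambda av: info_gain(rows, av[0], av[1], target))
        left = [r for r in rows if r[a] == v]
        right = [r for r in rows if r[a] != v]
        if not left or not right:
            return Counter(r[target] for r in rows).most_common(1)[0][0]
        return {"split": (a, v),
                "left": tree_induction(left, attrs, target),
                "right": tree_induction(right, attrs, target)}

Run on the four PlayTennis rows above with attrs = ["Outlook", "Temperature", "Humidity", "Wind"], this splits the root on Outlook, as in the figure.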
6. MR setting. Splitting data with selection graphs

The Grad.Student and Department tables are as on slide 4. The Staff table:

    ID  Name    Department  Position           Salary
    p1  Dale    d1          Professor          70-80k
    p2  Martin  d3          Postdoc            30-40k
    p3  Victor  d2          VisitingScientist  40-50k
    p4  David   d3          Professor          80-100k

[Figure: a selection graph over Staff, Grad.Student and Department selects the subset {p4} of the Staff table; the complement selection graphs select {p1} and {p2, p3}.]
7. What is a selection graph?

- It corresponds to a subset of the instances from the target table
- Nodes correspond to tables in the database
- Edges correspond to associations between tables
- An open edge means "have at least one"
- A closed edge means "have none of"

[Figure: example selection graph with nodes Grad.Student, Staff and Department, with the condition Specialization = math on the Department node.]
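To make this concrete, here is a minimal sketch of a selection graph as a data structure, directly following the definition above (node = table plus attribute conditions, edge = association marked open/present or closed). The class and field names are illustrative, not from the paper; they are reused in the sketches on the following slides.

    from dataclasses import dataclass, field

    @dataclass
    class Node:
        table: str                                       # database table this node refers to
        conditions: list = field(default_factory=list)   # triples like ("Specialization", "=", "math")

    @dataclass
    class Edge:
        parent: Node
        child: Node
        keys: tuple            # (parent_column, child_column) association between the tables
        present: bool = True   # open edge: "have at least one"; closed edge: "have none of"

    @dataclass
    class SelectionGraph:
        target: Node                                # node for the target table
        nodes: list = field(default_factory=list)   # all nodes, target first
        edges: list = field(default_factory=list)

For instance, "staff members in math departments who advise at least one graduate student":

    staff = Node("Staff")
    dept = Node("Department", [("Specialization", "=", "math")])
    stud = Node("Graduate_Student")
    g = SelectionGraph(staff, [staff, dept, stud],
                       [Edge(staff, dept, ("Department", "ID")),
                        Edge(staff, stud, ("ID", "Advisor"))])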
8. Automatically transforming selection graphs into SQL queries

Generic query:

    SELECT DISTINCT T0.primary_key
    FROM   table_list
    WHERE  join_list AND condition_list

Selection graph: a single Staff node with the condition Position = 'Professor':

    SELECT DISTINCT T0.ID
    FROM   Staff T0
    WHERE  T0.Position = 'Professor'

Selection graph: Staff with an open edge to Grad.Student (staff advising at least one graduate student):

    SELECT DISTINCT T0.ID
    FROM   Staff T0, Graduate_Student T1
    WHERE  T0.ID = T1.Advisor

Selection graph: Staff with a closed edge to Grad.Student (staff advising no graduate students):

    SELECT DISTINCT T0.ID
    FROM   Staff T0
    WHERE  T0.ID NOT IN (SELECT T1.Advisor FROM Graduate_Student T1)

Selection graph: Staff advising at least one graduate student, but none with GPA > 3.9:

    SELECT DISTINCT T0.ID
    FROM   Staff T0, Graduate_Student T1
    WHERE  T0.ID = T1.Advisor
      AND  T0.ID NOT IN (SELECT T1.Advisor
                         FROM Graduate_Student T1
                         WHERE T1.GPA > 3.9)
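A sketch of the translation using the illustrative SelectionGraph structure from slide 7: nodes reached by open edges get table aliases, open edges contribute join conditions, node conditions go into the WHERE clause, and each closed edge becomes a NOT IN subquery, following the generic query pattern above. Quoting of values and nesting of closed edges are simplified.

    def graph_to_sql(g, primary_key="ID"):
        # Nodes reached by open edges appear in FROM; each closed edge
        # turns into a NOT IN subquery on its child table.
        open_nodes, subqueries = [g.target], []
        for e in g.edges:
            if e.present:
                open_nodes.append(e.child)
            else:
                pcol, ccol = e.keys
                conds = " AND ".join(f"S.{c} {op} '{v}'" for c, op, v in e.child.conditions)
                sub = f"SELECT S.{ccol} FROM {e.child.table} S" + (f" WHERE {conds}" if conds else "")
                subqueries.append((e.parent, pcol, sub))
        alias = {id(n): f"T{i}" for i, n in enumerate(open_nodes)}
        joins = [f"{alias[id(e.parent)]}.{e.keys[0]} = {alias[id(e.child)]}.{e.keys[1]}"
                 for e in g.edges if e.present]
        conds = [f"{alias[id(n)]}.{c} {op} '{v}'"
                 for n in open_nodes for c, op, v in n.conditions]
        not_ins = [f"{alias[id(p)]}.{col} NOT IN ({sub})" for p, col, sub in subqueries]
        where = " AND ".join(joins + conds + not_ins)
        tables = ", ".join(f"{n.table} {alias[id(n)]}" for n in open_nodes)
        sql = f"SELECT DISTINCT {alias[id(g.target)]}.{primary_key} FROM {tables}"
        return sql + (f" WHERE {where}" if where else "")

For the graph g built on slide 7 this yields a query of the same shape as the examples above: SELECT DISTINCT T0.ID FROM Staff T0, Department T1, Graduate_Student T2 WHERE T0.Department = T1.ID AND T0.ID = T2.Advisor AND T1.Specialization = 'math'.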
9. MR decision tree

- Each node contains a selection graph
- Each child's selection graph is a supergraph of the parent's selection graph
10. How to choose selection graphs in nodes?

- Problem: there are too many supergraph selection graphs to choose from at each node
- Solution:
  - start with an initial selection graph
  - use a greedy heuristic to choose supergraph selection graphs (refinements)
  - use binary splits for simplicity
  - for each refinement, get its complement refinement
  - choose the best refinement based on the information gain criterion
- Problem: some potentially good refinements may give no immediate benefit
- Solution:
  - look-ahead capability
11. Refinements of selection graph

[Figure: selection graph with nodes Grad.Student (GPA > 3.9), Department (Specialization = math) and Staff.]

Two kinds of refinements (a sketch of generating both follows below):
- add a condition to a node: explores attribute information in the tables
- add a present edge and open node: explores relational properties between the tables
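Continuing the illustrative classes from slide 7, both refinement kinds can be enumerated mechanically. Which conditions and associations are candidates comes from the database schema, which is assumed given here; complement refinements (negated condition, closed edge) are produced analogously.

    import copy

    def refinements(g, schema_conditions, schema_edges):
        # Yield supergraph selection graphs of g.
        # schema_conditions: table -> candidate condition triples
        # schema_edges: table -> list of (child_table, (parent_col, child_col))
        for i, n in enumerate(g.nodes):
            for cond in schema_conditions.get(n.table, []):
                r = copy.deepcopy(g)              # kind 1: add condition to a node
                r.nodes[i].conditions.append(cond)
                yield r
        for i, n in enumerate(g.nodes):
            for child_table, keys in schema_edges.get(n.table, []):
                r = copy.deepcopy(g)              # kind 2: add present edge and open node
                child = Node(child_table)
                r.nodes.append(child)
                r.edges.append(Edge(r.nodes[i], child, keys))
                yield r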
12. Refinements of selection graph: add condition to the node

[Figure: starting from the selection graph with Specialization = math, the refinement adds the condition Position = 'Professor' to the Staff node; the complement refinement uses Position != 'Professor'.]
13. Refinements of selection graph: add condition to the node

[Figure: the refinement adds the condition GPA > 2.0 to the Grad.Student node of the selection graph with Specialization = math; the complement refinement uses the negated condition.]
14. Refinements of selection graph: add condition to the node

[Figure: the refinement adds the condition Students > 200 to the Department node of the selection graph (Grad.Student with GPA > 3.9, Department with Specialization = math, Staff); the complement refinement uses the negated condition.]
15. Refinements of selection graph: add present edge and open node

[Figure: the refinement adds a present edge and open node to the selection graph (Grad.Student with GPA > 3.9, Department with Specialization = math, Staff). Note: the information gain of this refinement is 0.]
16. Refinements of selection graph: add present edge and open node

[Figure: another refinement adding a present edge and open node to the same selection graph, together with its complement refinement.]
17. Refinements of selection graph: add present edge and open node

[Figure: another refinement adding a present edge and open node to the same selection graph, together with its complement refinement.]
18. Refinements of selection graph: add present edge and open node

[Figure: another refinement adding a present edge and open node to the same selection graph, together with its complement refinement.]
19. Look-ahead capability

[Figure: a refinement and its complement refinement of the selection graph (Grad.Student with GPA > 3.9, Department with Specialization = math, Staff) that by themselves give no immediate benefit.]
20. Look-ahead capability

[Figure: a look-ahead refinement that, in a single step, adds a present edge to Department together with the condition Students > 200 to the selection graph with Specialization = math.]
21. MR decision tree algorithm. Construction phase

- for each non-leaf node:
  - consider all possible refinements, and their complements, of the node's selection graph
  - choose the best pair based on the information gain criterion
  - create the child nodes

[Figure: the root node (selection graph: Staff) is split into two children by a refinement adding a present edge to Grad.Student and its complement.]
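Putting the pieces together, here is a compact sketch of the choice made at each node, reusing the illustrative helpers from the earlier slides. A hypothetical run_query(sql) helper that executes a query and returns the selected target-table keys is assumed, as is a labels mapping from keys to classes; neither is part of the paper.

    from collections import Counter
    import math

    def best_refinement(g, candidates, labels, run_query):
        # Score each candidate refinement by the information gain of the
        # binary split (refinement vs. complement) over the node's instances.
        def H(keys):
            counts, n = Counter(labels[k] for k in keys), len(keys)
            return -sum(c / n * math.log2(c / n) for c in counts.values()) if n else 0.0
        current = set(run_query(graph_to_sql(g)))     # instances at this node
        best, best_gain = None, -1.0
        for r in candidates:
            left = set(run_query(graph_to_sql(r))) & current
            right = current - left                    # covered by the complement refinement
            if not left or not right:
                continue
            gain = H(current) - (len(left) / len(current) * H(left)
                                 + len(right) / len(current) * H(right))
            if gain > best_gain:
                best, best_gain = r, gain
        return best, best_gain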
22. MR decision tree algorithm. Classification phase

- for each leaf:
  - apply the selection graph of the leaf to the test data
  - classify the resulting instances with the classification of the leaf

[Figure: a decision tree over Staff whose nodes refine the selection graph, e.g. by a present edge to Grad.Student with GPA > 3.9, the condition Position = 'Professor', and the Department conditions Specialization = math / Specialization = physics, with leaf predictions such as Salary = 70-80k and 80-100k.]
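Classification then reduces to running each leaf's query against the test database. A minimal sketch, again assuming the hypothetical run_query helper:

    def classify(leaves, run_query):
        # leaves: (SelectionGraph, predicted_label) pairs collected from the tree.
        # Leaf selection graphs partition the target table, so each test
        # instance is selected by exactly one leaf.
        predictions = {}
        for graph, label in leaves:
            for key in run_query(graph_to_sql(graph)):
                predictions[key] = label
        return predictions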
23. Experimental results. Mutagenesis

- The most widely used DB in ILP.
- Describes molecules of certain nitroaromatic compounds.
- Goal: predict their mutagenic activity (label attribute), i.e. the ability to cause DNA to mutate. High mutagenic activity can cause cancer.
- Class distribution:

    Compounds              Active  Inactive  Total
    Regression friendly    125     63        188
    Regression unfriendly  13      29        42
    Total                  138     92        230

- 5 levels of background knowledge: B0, B1, B2, B3, B4. They provide increasingly richer descriptions of the examples. Only the first three levels (B0, B1, B2) are used.
24. Experimental results. Mutagenesis

- Results of 10-fold cross-validation for the regression friendly set:

    System  Accuracy (%)    Time (secs.)
            B0   B1   B2    B0     B1     B2
    Progol  79   86   86    8595   4627   6530
    Progol  76   81   83    117k   64k    42k
    FOIL    61   61   83    4950   9138   0.5
    TILDE   75   79   85    41     170    142
    MRDTL   67   87   88    0.85   332    221

    System  Number of nodes
            B0   B1   B2
    MRDTL   1    53   51
25. Experimental results. Mutagenesis

- Results of leave-one-out cross-validation for the regression unfriendly set:

    Background  Accuracy (%)  Time       Nodes
    B0          70            0.6 secs.  1
    B1          81            86 secs.   24
    B2          81            60 secs.   22

- Two recent approaches (Sebag and Rauveirol, 1997) and (Kramer and De Raedt, 2001) using B3 have achieved 93.6% and 94.7%, respectively, on the mutagenesis database.
26. Experimental results. KDD Cup 2001

- Consists of a variety of details about the various genes of one particular type of organism.
- Genes code for proteins, and these proteins tend to localize in various parts of cells and interact with one another in order to perform crucial functions.
- Task: prediction of gene/protein localization (15 possible values)
- Target table: Gene
- Target attribute: Localization
- 862 training genes, 381 test genes.
- Challenge: many attribute values are missing.
- Approach: use a special value to encode a missing value. Result: accuracy of 50%
- Better techniques for filling in missing values are needed.
27. Experimental results. KDD Cup 2001

- Approach: replace missing values by the most common value of the attribute for the class. Results:
  - accuracy of around 85% with a decision tree of 367 nodes, with no limit on the number of times an association can be instantiated
  - accuracy of 80% when limiting the number of times an association can be instantiated
  - accuracy of around 75% when following associations only in the forward direction
- This shows that providing reasonable guesses for missing values can significantly enhance the performance of MRDTL on real-world data sets.
- In practice, since the class labels for test data are unknown, it is not possible to apply this method.
- Approach: extension of the Naïve Bayes algorithm for relational data. Result: no improvement compared to the first approach.
- Handling of missing values has to be incorporated into the decision tree algorithm.
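For illustration, the per-class most-common-value imputation can be sketched in a few lines of plain Python; the row/attribute representation is generic, not the actual KDD Cup schema. Note the caveat above: the method needs class labels, so it is only applicable to training data.

    from collections import Counter

    def impute_by_class_mode(rows, target, missing=None):
        # Replace each missing attribute value by the most common value
        # of that attribute among rows of the same class.
        by_class = {}
        for r in rows:
            by_class.setdefault(r[target], []).append(r)
        for group in by_class.values():
            for attr in group[0]:
                if attr == target:
                    continue
                observed = [r[attr] for r in group if r[attr] is not missing]
                if not observed:
                    continue      # attribute entirely missing for this class
                mode = Counter(observed).most_common(1)[0][0]
                for r in group:
                    if r[attr] is missing:
                        r[attr] = mode
        return rows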
28. Experimental results. Adult database

- Suitable for propositional learning: one table, 6 numerical attributes, 8 nominal attributes.
- Information from the 1994 census.
- Task: determine whether a person makes over 50k a year.
- Class distribution for the adult database:

                          Training        Test            Total
                          >50k   <=50k    >50k   <=50k
    With missing values   7841   24720    3846   12435    48842
    W/o missing values    7508   22654    3700   11360    45222

- Result after removal of missing values, using the original train/test split: 82.2%.
- Filling missing values with the Naïve Bayes approach yields 83%.
- C4.5 result: 84.46%.
29. Summary

- The algorithm is a promising alternative to existing algorithms such as Progol, FOIL, and TILDE
- The running time is comparable with the best existing approaches
- If equipped with principled approaches to handling missing values, it is an effective algorithm for learning from real-world relational data
- The approach is an extension of propositional learning, and can be successfully applied to propositional learning

Questions
- Why can't we split the data based on the value of an attribute in an arbitrary table right away?
- Is there a less restrictive and simpler way of representing the splits of data than selection graphs?
- The running time for computing the first nodes of the decision tree is much smaller than for the rest of the nodes. Is this unavoidable? Can we implement the same idea more efficiently?
30. Future work

- Incorporation of more sophisticated techniques for handling missing values
- Incorporation of more sophisticated pruning techniques or complexity regularization
- More extensive evaluation of MRDTL on real-world data sets
- Development of ontology-guided multi-relational decision tree learning algorithms to generate classifiers at multiple levels of abstraction (Zhang et al., 2002)
- Development of variants of MRDTL for classification tasks where the classes are not disjoint, based on the recently developed propositional decision tree counterparts of such algorithms (Caragea et al., 2002)
- Development of variants of MRDTL that can learn from heterogeneous, distributed, autonomous data sources, based on recently developed techniques for distributed learning and ontology-based data integration