Title: Decision Tree Algorithms
1. Decision Tree Algorithms
2. What will be discussed
- Some definitions.
- Algorithms.
- There is no perfect algorithm.
- The importance of software that can implement multiple algorithms for the same dataset (the meta-learner and the meta-meta-learner).
- The software being developed.
3. Some definitions
- Variables:
  - Continuous: its measured values are real numbers (ex. 73.827, 23).
  - Categorical: takes values in a finite set with no natural ordering (ex. black, red, green).
  - Ordered: takes values in a finite set with some way of sorting its elements (ex. age in years, an interval of integers, 01/09/2004).
- Dependent variable, or set of classes: the aspect of the data to be studied.
- Independent variables, or set of attributes: variables that are manipulated to explain the dependent variable.
- Regression-type problems. Ex: house selling price (a value).
- Classification-type problems. Ex: who will graduate (yes, no).
4. CRT
- CRT family: CRT, tree (S), etc.
- Motivation: classification-type and regression-type problems.
- Exactly two branches from each nonterminal node.
- The split attribute can be continuous or categorical.
- Independent variables can be categorical, ordered or continuous.
- Splitting a node into more than two branches often creates more parsimonious models.
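As a rough illustration of CRT's binary splitting, here is a minimal sketch of searching for the best two-way split on a continuous attribute. The use of Gini impurity is an assumption (the slides do not name the impurity measure), and all names are illustrative:

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a collection of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_binary_split(values, labels):
    """Scan candidate thresholds on one continuous attribute and return
    (weighted impurity, threshold) for the best two-way split."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (float("inf"), None)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[0]:
            best = (score, threshold)
    return best

print(best_binary_split([1.0, 2.0, 3.0, 4.0], ["no", "no", "yes", "yes"]))
# (0.0, 2.5): splitting at 2.5 gives two pure branches
```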
5. C4.5
- CLS family: CLS, ID3, C4.5, etc.
- Motivation: concept learning (classification-type problems).
- Usually creates parsimonious trees.
- Great for categorical dependent variables.
- In the original version, if the independent variable being split is not continuous, the number of branches equals the number of values that independent variable can take. However, there are other adaptations of this.
- Independent variables are nominal only. In some circumstances it is acceptable to divide continuous variables into discrete bands as a workaround for that issue. Ex: LOS bands A [0, 0.5), B [0.5, 1), C [1, 2), etc. (A sketch of this banding follows.)
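A minimal sketch of the banding workaround, using the LOS band edges from the example above (the function name and the catch-all band are illustrative):

```python
def band(value, edges=(0.0, 0.5, 1.0, 2.0), labels=("A", "B", "C")):
    """Map a continuous value to a discrete band label.
    Band i covers the half-open interval [edges[i], edges[i+1])."""
    for lo, hi, lab in zip(edges, edges[1:], labels):
        if lo <= value < hi:
            return lab
    return "D+"  # illustrative catch-all for values beyond the last edge

print(band(0.3))  # A
print(band(1.7))  # C
```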
6. C4.5
- Idea:
  - Select a leaf node with an inhomogeneous sample set.
  - Replace that leaf node with a test node that divides the inhomogeneous sample set into minimally inhomogeneous subsets, according to an entropy calculation.
7. C4.5
- Entropy formulae
- Entropy, a measure from information theory, characterizes the (im)purity, or homogeneity, of an arbitrary collection of examples.
- Given:
  - $n_b$, the number of instances in branch b.
  - $n_{bc}$, the number of instances in branch b of class c (of course, $n_{bc} \le n_b$).
  - $n_t$, the total number of instances in all branches.
- The proportion of instances of class c in branch b is $p_{bc} = n_{bc}/n_b$, and the entropy of branch b is $E_b = -\sum_c p_{bc} \log_2 p_{bc}$.
- For a two-class problem, let $P_b$ be the proportion of positive instances in branch b, so $E_b = -P_b \log_2 P_b - (1 - P_b)\log_2(1 - P_b)$:
  - If all the instances in the branch are positive, then $P_b = 1$ (homogeneous positive).
  - If all the instances in the branch are negative, then $P_b = 0$ (homogeneous negative).
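A minimal sketch of the entropy calculation defined above (names are illustrative):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """E_b = -sum_c p_bc * log2(p_bc) over the classes in one branch."""
    n = len(labels)
    return sum(-c / n * log2(c / n) for c in Counter(labels).values())

print(entropy(["yes"] * 4))        # 0.0: homogeneous branch
print(entropy(["yes", "no"] * 2))  # 1.0: perfectly balanced branch
```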
8. C4.5
- As you move between perfect homogeneity and perfect balance, entropy varies smoothly between zero and one.
- The entropy is zero when the set is perfectly homogeneous.
- The entropy is one when the set is perfectly inhomogeneous (for two classes, a 50/50 split).
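Putting slides 6-8 together, a minimal sketch of the split-selection step: choose the attribute whose branches have the lowest weighted entropy (equivalently, the highest information gain). This is a simplified ID3-style version; all names and the toy data are illustrative:

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return sum(-c / n * log2(c / n) for c in Counter(labels).values())

def weighted_entropy(rows, labels, attr):
    """Weighted entropy of the branches created by splitting on `attr`."""
    branches = {}
    for row, lab in zip(rows, labels):
        branches.setdefault(row[attr], []).append(lab)
    n = len(labels)
    return sum(len(b) / n * entropy(b) for b in branches.values())

def best_attribute(rows, labels, attrs):
    """The attribute whose split leaves the subsets minimally inhomogeneous."""
    return min(attrs, key=lambda a: weighted_entropy(rows, labels, a))

rows = [{"outlook": "sunny", "windy": True},
        {"outlook": "rain",  "windy": True},
        {"outlook": "sunny", "windy": False},
        {"outlook": "rain",  "windy": False}]
labels = ["no", "yes", "no", "yes"]
print(best_attribute(rows, labels, ["outlook", "windy"]))  # outlook
```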
9. There is no perfect algorithm
- Examples:
  - C4.5, THAID and QUEST are classification algorithms only.
  - AID, MAID and XAID are for quantitative responses only.
  - CRT does both. However, it is a slow algorithm and it always yields binary trees, which sometimes cannot be summarised efficiently.
  - QUEST does not do regression. It is very fast, but unfortunately uses a lot of memory for large datasets.
10. CRT and C4.5 comparison using the golf dataset (AnswerTree vs. Spartacus)
11. CRT (golf dataset)
12. [Tree diagram; no recoverable text.]
13. Basic comparison results
14. Software which can implement multiple algorithms
- The software will be able to run the different algorithms on the same dataset.
- Trees generated by the different algorithms will be created and compared. The user will be able to compare them visually, or to pick the one with the lowest misclassification rate.
- Depending on the nature of the problem (classification or regression), a specific algorithm can be much more efficient.
15. Software which can implement multiple algorithms
- The meta-learner:
  - The user will choose the dataset and the variables.
  - A trial of different runs, using combinations of different methods, will be the input of a neural network (the meta-learner). A sketch of the run-and-compare step follows.
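A minimal sketch of running several algorithms on the same dataset and keeping the best tree. The fitting interface is an assumption (the actual system's API is not described), and `majority_fit` is only a toy stand-in for a real tree learner:

```python
from collections import Counter

def misclassification_rate(predict, rows, labels):
    """Fraction of rows the fitted model labels incorrectly."""
    return sum(predict(r) != lab for r, lab in zip(rows, labels)) / len(labels)

def pick_best_model(fitters, train, test):
    """Fit every candidate algorithm on the same dataset and keep the
    model with the lowest misclassification rate on held-out data."""
    models = [(name, fit(*train)) for name, fit in fitters]
    return min(models, key=lambda m: misclassification_rate(m[1], *test))

def majority_fit(rows, labels):
    """Toy stand-in for a real learner (ID3, C4.5, CRT, ...):
    always predicts the training majority class."""
    majority = Counter(labels).most_common(1)[0][0]
    return lambda row: majority

train = ([{"x": 0}, {"x": 1}], ["yes", "yes"])
test = ([{"x": 2}], ["yes"])
name, model = pick_best_model([("majority", majority_fit)], train, test)
print(name, misclassification_rate(model, *test))  # majority 0.0
```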
16. Set of rules
[Diagram: the dataset is run through candidate classifiers (C1: CRT, C2: QUEST), each reporting data quality, CPU time and memory utilisation. These measurements feed a neural network, the meta-learner, which weighs optimal data quality, simpler rules and total CPU time and outputs a set of rules. A per-run score that appears to read S = (sum over classifiers c of memory(c) / CPU(c)) / total time accompanies the diagram.]
17. The meta-meta-learners
[Diagram: the dataset feeds three candidate meta-learners (Meta-Learner 1: CRT; Meta-Learner 2: neural network / linear discriminant; Meta-Learner 3: relation rules / C4.5 / STR-Tree), optionally followed by a further neural network (probably not necessary). The best meta-learner is user-defined and could be chosen by a function like: A * DataQuality + B * SimplerRules - C * Memory - D * Time.]
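A minimal sketch of that user-defined scoring function; the weights, metric names and candidate figures below are placeholders, not values from the system:

```python
def meta_learner_score(m, a, b, c, d):
    """Score = A*DataQuality + B*SimplerRules - C*Memory - D*Time."""
    return (a * m["data_quality"] + b * m["simpler_rules"]
            - c * m["memory"] - d * m["time"])

candidates = [  # illustrative measurements for two meta-learners
    {"name": "ML1 (CRT)", "data_quality": 80, "simpler_rules": 60,
     "memory": 40, "time": 30},
    {"name": "ML3 (relation rules)", "data_quality": 70, "simpler_rules": 90,
     "memory": 20, "time": 50},
]
best = max(candidates, key=lambda m: meta_learner_score(m, a=2, b=1, c=1, d=1))
print(best["name"])  # ML3 (relation rules)
```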
18. The meta-meta-learners: user input and output
- Input:
  - Dataset name? NHS
  - Dependent variables? LOS, OUTCOME, STROKE
  - How much do you care about: data quality (0-99), parsimonious models (0-99), time to process (0-99), memory utilisation (0-99)?
- Output:
  - The best meta-learner for you is a combination of C4.5, ANN and relation rules. These are the best rules:
    1. IF HEART ATTACK AND AGE > 90 THEN DEAD (error 3)
    2. Everybody that has STROKE also has HIGH BLOOD PRESSURE.
    3. 2.3 * AGE + 0.4 * APACHE2 -> LOS (error 25)
19. Software which can implement multiple algorithms
- Once the best meta-learner is found for a given situation, dataset and dependent variable, the user can define this meta-learner as the one to be executed in similar situations.
- Ex: to find out the patients' LOS in the ICU datasets, ML3 (CRT) will be used. However, to find out the outcome of the patient (died or survived), ML103 (C4.5, relation rules) will be used.
20. Work in progress
- Algorithm fully implemented in the system:
  - ID3.
- Algorithm partially implemented in the system:
  - C4.5 (still missing: grouping of categorical attributes, pruning, classification error, and handling of missing attributes).
- Algorithms to be implemented:
  - CRT, CHAID and Spartacus (PhD thesis).
- Future implementations of neural network aspects, such as automatic tree adaptation based on recent inputs. Various neural network architectures are also applicable to regression-type problems.
- Any other suggestions?
21. Work in progress
- Capacity to handle large datasets (memory optimisation).
- VirtualTable concept: no unnecessary data copies for nodes (see the sketch after this list).
- Sub-datasets on the fly (speed optimisation).
  - Instead of creating a sub-VirtualTable for each set of data used to test a split, the software tests for splits in the parent node on the fly.
  - This makes the tests a little more complex, but speeds up the system.
- In-memory access to items.
  - No I/O delay when the dataset is under 350 MB on computers with 512 MB of RAM. (Theoretical; it has never been tested due to lack of time.)
- Node data visualisation.
- Support for comma-separated values (.csv) files, dBase tables and MS Excel spreadsheets.
- In the near future:
  - Tree pruning.
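A minimal sketch of the VirtualTable idea as described above; the class shape and names are assumptions, not the actual implementation. Child nodes hold index views into the shared dataset instead of copying rows:

```python
class VirtualTable:
    """A view over a shared dataset: each node keeps row indices,
    never copies of the rows themselves."""

    def __init__(self, data, indices=None):
        self.data = data  # shared by every node, never copied
        self.indices = list(indices) if indices is not None else list(range(len(data)))

    def split(self, predicate):
        """Partition this view into two child views by a split test."""
        left = [i for i in self.indices if predicate(self.data[i])]
        right = [i for i in self.indices if not predicate(self.data[i])]
        return VirtualTable(self.data, left), VirtualTable(self.data, right)

rows = [{"age": 30}, {"age": 95}, {"age": 60}]
root = VirtualTable(rows)
old, young = root.split(lambda r: r["age"] > 90)
print(old.indices, young.indices)  # [1] [0, 2]
```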
22. Work in progress
- Reports.
- Grouping attribute values in binary splits.
- Manually moving data across nodes.
- Costs associated with misclassification.
- C4.5 gain ratio.
- Special treatment for missing attributes.
- Bug fixes.
- Allowing trees to be saved (XML).
- Look-up tables for codes.
- Translation of leaves into rules.
- Relation rules, ANN.
- Meta-learners and meta-meta-learners.