Title: Decision Trees
1 Decision Trees
Klassifikations- und Clustering-Methoden für die Computerlinguistik
Sabine Schulte im Walde, Irene Cramer, Stefan Schacht
Universität des Saarlandes, Winter 2004/2005
2 Outline
- Example
- What are decision trees?
- Some characteristics
- A (tentative) definition
- How to build them?
- Lots of questions
- Discussion
- Advantages and disadvantages
- When should we use them?
3 Illustration: Classification Example
Remember the example at the blackboard.
4 Discussion: Illustration Results
- Let's gather some characteristics of our decision tree
- binary decision questions (yes/no questions)
- not necessarily balanced
- arbitrary tree depth, but dependent on the desired granularity
- features and classes are fixed
- annotated data
- nominal and ordinal features
- What questions arose?
- a feature like size cannot be answered with yes/no
- the order of the questions matters; depending on it, the tree may be unbalanced
5 Illustration Results
- Let's gather some characteristics of our decision tree
- annotated data at hand
- look for clever features (knowledge about features)
- at each node the tree splits the data into subsets → decide whether to grow the tree further or to stop
- set of rules → one rule at each node
- binary → thus the answer is yes or no at each step
- nominal features, but real-valued ones are also possible
- tree ↔ rule set
- What questions arose?
- overfitting is possible if the impurity at each node is driven to 0
- when to prune?
6 Our First Definition
- A decision tree is a graph
- It consists of nodes, edges and leaves
- nodes → questions about features
- edges → possible values of a feature
- leaves → class labels
- Path from root to leaf → conjunction of questions (rules)
- A decision tree is learned by splitting the source data into subsets based on features/rules (how, we will see later on)
- This process is repeated recursively until splitting is either not feasible or a single classification can be applied to each element of the derived subset (a minimal sketch of these structures follows below)
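As a purely illustrative sketch of this definition (the class and function names and the toy feature questions are assumptions, not taken from the slides), a binary decision tree and the root-to-leaf classification path could look as follows in Python:

from typing import Callable

class Leaf:
    """A leaf carries a class label."""
    def __init__(self, label: str):
        self.label = label

class DecisionNode:
    """An inner node asks a yes/no question about the features;
    its two edges lead to the 'yes' and 'no' subtrees."""
    def __init__(self, question: Callable[[dict], bool], yes_branch, no_branch):
        self.question = question
        self.yes_branch = yes_branch
        self.no_branch = no_branch

def classify(node, features: dict) -> str:
    """Follow the path from the root to a leaf; the conjunction of the
    questions answered on the way is the rule that fires for this object."""
    while isinstance(node, DecisionNode):
        node = node.yes_branch if node.question(features) else node.no_branch
    return node.label

# Hypothetical example with two nominal features
tree = DecisionNode(lambda f: f["size"] == "large",
                    Leaf("class A"),
                    DecisionNode(lambda f: f["colour"] == "red",
                                 Leaf("class B"),
                                 Leaf("class C")))
print(classify(tree, {"size": "small", "colour": "red"}))  # -> class B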
7 Building Decision Trees
- We face a number of questions while building/using decision trees
- Should we only allow binary questions? Why?
- Which features (properties) should we use? Thus, what questions should we ask?
- Under what circumstances is a node a leaf?
- How large should our tree become?
- How should the category labels be assigned?
- What should we do with corrupted data?
8 Only Binary Questions?
Taken from the web: http://www.smartdraw.com/resources/examples/business/images/decision_tree_diagram.gif
9 Only Binary Questions?
Taken from the web: http://www.cs.cf.ac.uk/Dave/AI2/dectree.gif
10 Only Binary Questions?
- Branching factor: how many outgoing edges does a node have?
- Binary → branching factor 2
- All decision trees can be converted into binary ones
- Binary trees are very expressive
- Binary decision trees are simpler to train
- With a binary tree, 2^n classifications are possible (n is the number of features)
11 What Questions Should We Ask?
- Try to follow Ockham's Razor → prefer the simplest model, i.e. prefer those features/questions that lead to a simple tree (not very helpful by itself?)
12 What Questions Should We Ask?
- Measure impurity at each split
- Impurity i(N)
- metaphorically speaking, shows how many different classes we have at each node
- best would be just one class → leaf
- Some impurity measures:
- Entropy impurity
- Gini impurity
- Misclassification impurity
13 What Questions Should We Ask?
- Entropy impurity
- Gini impurity
- Misclassification impurity
- where P(ω_j) is the fraction of patterns at node N that are in class ω_j
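For reference, the standard definitions of these three measures (in the notation of Duda, Hart and Stork 2000) are:

i_{entropy}(N) = -\sum_j P(\omega_j)\,\log_2 P(\omega_j)

i_{Gini}(N) = \sum_{i \neq j} P(\omega_i)\,P(\omega_j) = \tfrac{1}{2}\Bigl[1 - \sum_j P(\omega_j)^2\Bigr]

i_{misclassification}(N) = 1 - \max_j P(\omega_j)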
14 Illustration
Scanned from Pattern Classification by Duda, Hart, and Stork
15 What Questions Should We Ask?
- Calculate the best question/rule at a node via the drop in impurity Δi(N) (see below)
- where N_L and N_R are the left and right descendent nodes, i(N_L) and i(N_R) are their impurities, and P_L is the fraction of patterns at node N that will go to N_L when this question is used
- Δi(N) should be as high as possible
- Most common: entropy impurity
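The impurity drop referred to above is, in the notation of Duda, Hart and Stork (2000):

\Delta i(N) = i(N) - P_L\, i(N_L) - (1 - P_L)\, i(N_R)

A small sketch of this criterion with entropy impurity (the function names and the toy labels are illustrative assumptions, not from the slides):

from collections import Counter
from math import log2

def entropy_impurity(labels):
    """i(N) = -sum_j P(omega_j) * log2 P(omega_j) over the labels at a node."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def impurity_drop(parent, left, right):
    """Delta i(N) = i(N) - P_L * i(N_L) - (1 - P_L) * i(N_R)."""
    p_left = len(left) / len(parent)
    return (entropy_impurity(parent)
            - p_left * entropy_impurity(left)
            - (1 - p_left) * entropy_impurity(right))

# The question that yields the highest drop is the one we should ask.
parent = ["verb", "verb", "noun", "noun", "noun"]
print(impurity_drop(parent, ["verb", "verb"], ["noun", "noun", "noun"]))  # ~0.971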
16 What Questions Should We Ask?
- Additional information about questions
- monothetic (one feature per question) vs. polythetic (several features combined in one question)
- we now understand why binary trees are simpler
- Keep in mind: a local optimum isn't necessarily a global one!
17 When to Declare a Node a Leaf?
- On the one hand ... on the other:
- if i(N) is near 0 → (possible) overfitting
- if the tree is too small → (highly) erroneous classification
- 2 solutions:
- stop before i(N) = 0 → how to decide when?
- pruning → how?
18 When to Declare a Node a Leaf?
- When to stop growing?
- Cross-validation
- split the training data into two subsets
- train with the bigger set
- validate with the smaller one
- Δi(N) < threshold (see the sketch after this list)
- may yield an unbalanced tree
- what threshold is reasonable?
- P(N_L), P(N_R) < threshold
- reasonable thresholds: 5% or 10% of the data
- advantage: good partition where the data density is high
- is Δi(N) significantly different from 0?
- hypothesis testing
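One possible reading of the two threshold rules above, as a sketch (the concrete defaults only echo the slide's suggested values; the function name and signature are assumptions):

def should_stop(delta_i, n_left, n_right, n_total,
                min_drop=0.01, min_fraction=0.05):
    """Stop splitting a node if the impurity drop Delta i(N) is below a
    threshold, or if one of the children would receive less than e.g. 5%
    of the training data."""
    too_small_drop = delta_i < min_drop
    too_few_patterns = min(n_left, n_right) / n_total < min_fraction
    return too_small_drop or too_few_patterns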
19 Large Tree vs. Small Tree?
- Tree too large? Prune!
- first grow the tree fully, then cut
- cut those nodes/leaves where i(N) is very small
- avoids the horizon effect
- Tree too large? Merge branches or rules!
20 Which Category Label to Assign to a Leaf?
- If i(N) = 0, then the category label is the class of all objects at the leaf
- If i(N) > 0, then the category label is the class of most objects (majority vote, sketch below)
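A minimal sketch of this majority-vote assignment (the helper name is an assumption); it also covers the pure case i(N) = 0, where all objects share one class:

from collections import Counter

def leaf_label(labels):
    """Return the class of most objects at the node."""
    return Counter(labels).most_common(1)[0][0]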
21 Discussion: What We Have Learned So Far
- characteristics of decision trees
- decision questions posed
- difference between entropy impurity and Gini impurity?
- the question of the optimal tree is not settled (NP-complete problem)
22 Examples
Scanned from Pattern Classification by Duda, Hart, and Stork
23 Examples
Scanned from Pattern Classification by Duda, Hart, and Stork
24 Examples
Scanned from Pattern Classification by Duda, Hart, and Stork
25 What to Do with Corrupted Data?
- Missing attributes
- during classification:
- look for surrogate questions
- use a virtual value
- during training:
- calculate the impurity on the basis of the attributes at hand
- dirty solution: don't consider data with missing attributes
26 Some Terminology
- CART (classification and regression trees)
- general framework → can be instantiated in many ways
- see the questions on the previous slide
- ID3
- for unordered nominal attributes (real-valued variables → intervals)
- seldom binary
- the algorithm continues until a node is pure or no more variables are left
- no pruning
- C4.5
- refinement of ID3 (in various respects, e.g. real-valued variables, pruning, etc.)
27 Advantages and Disadvantages
- Advantages of decision trees
- non-metric data (nominal features) → yes/no questions
- easily interpretable for humans
- the information in the tree can be converted into rules
- expert knowledge can be included
- Disadvantages of decision trees
- the deduced rules can be very complex
- the decision tree could be suboptimal (cf. cross-validation, overfitting)
- annotated data is needed
28 Discussion: When Could We Use Decision Trees?
- Named entity recognition
- Verb classification
- Polysemy
- Spam filtering
- whenever the features are nominal
- POS tagging
29 Literature
- Richard O. Duda, Peter E. Hart and David G. Stork (2000): Pattern Classification. John Wiley & Sons, New York.
- Tom M. Mitchell (1997): Machine Learning. McGraw-Hill, Boston.
- www.wikipedia.org