1
Decision Trees
Classification and Clustering Methods for
Computational Linguistics
Sabine Schulte im Walde, Irene Cramer, Stefan Schacht
Universität des Saarlandes, Winter 2004/2005
2
Outline
  • Example
  • What are decision trees?
  • Some characteristics
  • A (tentative) definition
  • How to build them?
  • Lots of questions
  • Discussion
  • Advantages and disadvantages
  • When should we use them?

3
Illustration: Classification Example
Recall the example from the blackboard
4
Discussion: Illustration Results
  • Let's gather some characteristics of our decision
    tree:
  • binary decision questions (yes/no questions)
  • not necessarily balanced
  • tree depth is arbitrary, but depends on the
    desired granularity
  • features and classes are fixed in advance
  • annotated data
  • nominal and ordinal features
  • Which questions arose?
  • size cannot be answered with yes/no
  • the order of the questions matters; whether the
    tree is unbalanced also depends on it

5
Illustration Results
  • Let's gather some characteristics of our decision
    tree:
  • annotated data at hand
  • look for clever features (knowledge about the
    features)
  • at each node the tree splits the data into subsets
    → decide whether to grow the tree further or stop
  • set of rules → one rule at each node
  • binary → thus answer yes or no at each step
  • nominal features, but real-valued ones are also
    possible
  • tree ↔ rule set
  • Which questions arose?
  • is overfitting possible if the impurity at each
    node is 0?
  • when to prune?

6
Our First Definition
  • A decision tree is a graph
  • It consists of nodes, edges, and leaves
  • nodes → questions about features
  • edges → possible values of a feature
  • leaves → class labels
  • Path from root to leaf → conjunction of questions
    (rules)
  • A decision tree is learned by splitting the
    source data into subsets based on features/rules
    (how, we will see later on)
  • This process is repeated recursively until
    splitting is either not feasible or a single
    classification can be applied to each element of
    the derived subset (see the sketch below)
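
A minimal sketch of this definition in Python (the names Node, Leaf, and classify are illustrative, not from the slides): internal nodes carry the questions, the two edges of a node correspond to the yes/no answers, and leaves carry the class labels.

```python
class Node:
    """An internal node asks a binary question about one feature."""
    def __init__(self, feature, threshold, left, right):
        self.feature = feature      # index of the feature being queried
        self.threshold = threshold  # question: x[feature] <= threshold?
        self.left = left            # subtree for answer "yes"
        self.right = right          # subtree for answer "no"

class Leaf:
    """A leaf carries a class label."""
    def __init__(self, label):
        self.label = label

def classify(tree, x):
    """Follow the path from root to leaf: a conjunction of questions."""
    while isinstance(tree, Node):
        tree = tree.left if x[tree.feature] <= tree.threshold else tree.right
    return tree.label

# one question: "is feature 0 <= 2.0?" -> class "a", else class "b"
toy = Node(feature=0, threshold=2.0, left=Leaf("a"), right=Leaf("b"))
print(classify(toy, [1.5]))  # -> a
```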

7
Building Decision Trees
  • We face a lot of questions while building and
    using decision trees:
  • Should we only allow binary questions? Why?
  • Which features (properties) should we use? Thus,
    what questions should we ask?
  • Under what circumstances is a node a leaf?
  • How large should our tree become?
  • How should the category labels be assigned?
  • What should we do with corrupted data?

8
Only Binary Questions?
Taken from the web: http://www.smartdraw.com/resources/examples/business/images/decision_tree_diagram.gif
9
Only Binary Questions?
Taken from the web: http://www.cs.cf.ac.uk/Dave/AI2/dectree.gif
10
Only Binary Questions?
  • Branching factor: how many edges does a node have?
  • Binary → branching factor 2
  • All decision trees can be converted into binary
    ones (see the sketch below)
  • Binary trees are very expressive
  • Binary decision trees are simpler to train
  • With a binary tree: 2^n possible classifications
    (where n is the number of features)
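
As a small illustration of the convertibility claim, here is a hypothetical three-valued nominal question recast as a chain of binary yes/no questions; the feature values and class names are made up.

```python
def classify_multiway(color):
    # one node with branching factor 3
    return {"red": "class_1", "green": "class_2", "blue": "class_3"}[color]

def classify_binary(color):
    # the same decision as two stacked binary (yes/no) questions
    if color == "red":          # question 1: color == red?
        return "class_1"
    if color == "green":        # question 2: color == green?
        return "class_2"
    return "class_3"

assert classify_multiway("green") == classify_binary("green")
```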

11
What Questions Should We Ask?
  • Try to follow Ockham's razor → prefer the
    simplest model, thus prefer those
    features/questions that lead to a simple tree
    (not very helpful?)

12
What Questions Should We Ask?
  • Measure the impurity at each split
  • Impurity i(N)
  • Metaphorically speaking, it shows how many
    different classes we have at each node
  • Best would be just one class → leaf
  • Some impurity measures:
  • Entropy impurity
  • Gini impurity
  • Misclassification impurity

13
What Questions Should We Ask?
  • Entropy impurity:
    i(N) = − Σ_j P(ω_j) log₂ P(ω_j)
  • Gini impurity:
    i(N) = Σ_{i≠j} P(ω_i) P(ω_j) = ½ [1 − Σ_j P(ω_j)²]
  • Misclassification impurity:
    i(N) = 1 − max_j P(ω_j)
  • where P(ω_j) is the fraction of patterns at node N
    that are in class ω_j (the measures are sketched in
    code below)
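
A small Python sketch of the three measures; the function names are mine, and the ½ factor in the Gini form follows the Duda/Hart/Stork definition reconstructed above.

```python
import math
from collections import Counter

def class_fractions(labels):
    """P(w_j): fraction of the node's patterns in each class w_j."""
    n = len(labels)
    return [c / n for c in Counter(labels).values()]

def entropy_impurity(labels):
    return -sum(p * math.log2(p) for p in class_fractions(labels))

def gini_impurity(labels):
    return 0.5 * (1 - sum(p * p for p in class_fractions(labels)))

def misclassification_impurity(labels):
    return 1 - max(class_fractions(labels))

node = ["a", "a", "a", "b"]  # 3/4 class a, 1/4 class b
print(entropy_impurity(node))            # ~0.811
print(gini_impurity(node))               # 0.1875
print(misclassification_impurity(node))  # 0.25
```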

14
Illustration
Scanned from Pattern Classification by Duda,
Hart, and Stork
15
What Questions Should We Ask?
  • Calculate the best question/rule at a node via the
    drop in impurity:
    Δi(N) = i(N) − P_L · i(N_L) − (1 − P_L) · i(N_R)
  • where N_L and N_R are the left and right descendant
    nodes, i(N_L) and i(N_R) are their impurities, and
    P_L is the fraction of patterns at node N that
    will go to N_L when this question is used
  • Δi(N) should be as high as possible (see the
    sketch below)
  • Most common: entropy impurity
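
A hedged sketch of Δi(N) for one candidate question, reusing entropy_impurity() from the previous sketch; the name impurity_drop and the list-of-booleans encoding of a question are my own conventions.

```python
def impurity_drop(labels, go_left):
    """Δi(N) = i(N) - P_L * i(N_L) - (1 - P_L) * i(N_R) for one
    candidate question; go_left[k] is True iff pattern k answers
    the question with 'yes' and goes to the left child."""
    left = [y for y, g in zip(labels, go_left) if g]
    right = [y for y, g in zip(labels, go_left) if not g]
    if not left or not right:   # the question does not split the node
        return 0.0
    p_left = len(left) / len(labels)
    return (entropy_impurity(labels)
            - p_left * entropy_impurity(left)
            - (1 - p_left) * entropy_impurity(right))

# the candidate question "x <= 2?" separates the classes perfectly:
xs, ys = [1, 2, 3, 4], ["a", "a", "b", "b"]
print(impurity_drop(ys, [x <= 2 for x in xs]))  # 1.0 bit
```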

16
What Questions Should We Ask?
  • Additional information about questions
  • monothetic (one feature per question) vs.
    polythetic (several features combined in one
    question)
  • we now understand why binary trees are simpler
  • Keep in mind: a local optimum isn't necessarily a
    global one!

17
When to Declare a Node a Leaf?
  • On the one hand ... on the other:
  • if i(N) is near 0 → overfitting (possible)
  • tree too small → (highly) erroneous classification
  • 2 solutions:
  • stop before i(N) = 0 → how to decide when?
  • pruning → how?

18
When to Declare a Node a Leaf?
  • When to stop growing?
  • Cross-validation:
  • split the training data into two subsets
  • train with the bigger set
  • validate with the smaller one
  • Δi(N) < threshold:
  • yields an unbalanced tree
  • what threshold is reasonable?
  • P(N_L), P(N_R) < threshold:
  • reasonable thresholds: 5% or 10% of the data
  • advantage: good partitions where the data density
    is high (both threshold rules are sketched below)
  • Δi(N) significantly ≠ 0? → hypothesis testing
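
A sketch of the two threshold-based stopping rules; the concrete default values are illustrative assumptions, apart from the 5% size threshold suggested on the slide.

```python
def should_stop(best_drop, n_left, n_right, n_total,
                min_drop=0.01, min_fraction=0.05):
    """Stop growing when the best question gains too little
    (Δi(N) < threshold) or a child would receive too small a
    fraction of the data (P(N_L) or P(N_R) < threshold)."""
    if best_drop < min_drop:
        return True
    if min(n_left, n_right) / n_total < min_fraction:
        return True
    return False

# e.g. a split sending 3 of 100 patterns left is rejected:
print(should_stop(best_drop=0.2, n_left=3, n_right=97, n_total=100))  # True
```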

19
Large Tree vs. Small Tree?
  • Tree too large? Prune!
  • first grow the tree fully, then cut
  • cut those nodes/leaves where i(N) is very small
  • this avoids the horizon effect
  • Tree too large? Merge branches or rules!
    (see the sketch below)
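
A bottom-up pruning sketch ("first grow fully, then cut"). It assumes each internal node stored two extra fields at training time, node.drop (the impurity reduction its split achieved) and node.majority (the majority label of its training patterns); neither field is part of the earlier Node sketch.

```python
def prune(tree, min_drop=0.01):
    """Collapse a subtree into a leaf when its split bought
    almost no impurity reduction."""
    if isinstance(tree, Leaf):
        return tree
    tree.left = prune(tree.left, min_drop)
    tree.right = prune(tree.right, min_drop)
    if (isinstance(tree.left, Leaf) and isinstance(tree.right, Leaf)
            and tree.drop < min_drop):
        return Leaf(tree.majority)  # merge the two leaves into one
    return tree
```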

20
When to Assign a Category Label to a Leaf?
  • If i(N) = 0, then the category label is the class
    of all objects at the node
  • If i(N) > 0, then the category label is the class
    of most objects at the node (see the sketch below)
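
The label assignment rule in code, a trivial majority vote; leaf_label is an illustrative name.

```python
from collections import Counter

def leaf_label(labels):
    """Majority vote; for i(N) = 0 all labels agree anyway."""
    return Counter(labels).most_common(1)[0][0]

print(leaf_label(["a", "a", "a"]))  # pure leaf   -> "a"
print(leaf_label(["a", "b", "a"]))  # impure leaf -> majority "a"
```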

21
Discussion: What We Have Learnt So Far
  • characteristics of decision trees
  • posed decision questions
  • difference between entropy impurity and Gini
    impurity?
  • the question of the optimal tree is still open:
    finding it is an NP-complete problem

22
Examples
Scanned from Pattern Classification by Duda,
Hart, and Stork
23
Examples
Scanned from Pattern Classification by Duda,
Hart, and Stork
24
Examples
Scanned from Pattern Classification by Duda,
Hart, and Stork
25
What to Do with Corrupted Data?
  • Missing attributes
  • during classification:
  • look for surrogate questions
  • use a virtual value (see the sketch below)
  • during training:
  • calculate the impurity on the basis of the
    attributes at hand
  • dirty solution: don't consider data with missing
    attributes
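
One simple stand-in for the "virtual value" idea at classification time, sketched under the assumption that each node recorded which branch the majority of its training patterns took (node.majority_branch, a hypothetical field); proper surrogate questions as in CART would need more machinery.

```python
def classify_with_missing(tree, x):
    """Like classify(), but a missing feature value (None) is routed
    down the branch most training patterns took at that node."""
    while isinstance(tree, Node):
        value = x[tree.feature]
        if value is None:
            branch_left = (tree.majority_branch == "left")
        else:
            branch_left = (value <= tree.threshold)
        tree = tree.left if branch_left else tree.right
    return tree.label
```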

26
Some Terminology
  • CART (classification and regression trees)
  • general framework → can be instantiated in many
    ways
  • see the questions on the slide before
  • ID3
  • for unordered nominal attributes (if real-valued
    variables → intervals)
  • seldom binary
  • the algorithm continues until a node is pure or no
    more variables are left
  • no pruning
  • C4.5
  • refinement of ID3 (in various respects, e.g.
    real-valued variables, pruning, etc.)

27
Advantages and Disadvantages
  • Advantages of decision trees:
  • non-metric data (nominal features) → yes/no
    questions
  • easily interpretable for humans
  • the information in the tree can be converted into
    rules
  • expertise can be included
  • Disadvantages of decision trees:
  • deduced rules can be very complex
  • the decision tree can be suboptimal (e.g.
    overfitting, so cross-checking is needed)
  • annotated data is needed

28
Discussion: When Could We Use Decision Trees?
  • named entity recognition
  • verb classification
  • polysemy
  • spam filtering
  • whenever we have nominal features
  • POS tagging

29
Literature
  • Richard O. Duda, Peter E. Hart, and David G. Stork
    (2000): Pattern Classification. John Wiley &
    Sons, New York.
  • Tom M. Mitchell (1997): Machine Learning.
    McGraw-Hill, Boston.
  • www.wikipedia.org