DECISION TREES - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: DECISION TREES


1
DECISION TREES + NOISY/R-BOOSTING
  • LECTURE 6

2
BOOSTING EXAMPLE
  • X = {0,1}^n
  • F_n = { f(x) = MAJ_{i∈S} x_i : S ⊆ [n], |S| odd }
  • Weak learner L: output the h(x) = x_i with minimum emp-err (a short code sketch follows this slide)
  • Thm: For any n ≥ 1 and m ≥ 16n² log(5n), the above is a γ = 1/(4n)-weak learner, i.e. E_{Z^m}[err(h)] ≤ ½ − γ.
  • Proof: Each training example's label agrees with (|S|+1)/2 of the bits in S, a ≥ ½ + 1/(2n) fraction.
  • ⇒ emp-err(h) ≤ ½ − 1/(2n). From Lecture 4,
    E[ max_{f∈F} |err(f) − emp-err(f)| ] ≤ (log(5|F|)/m)^½ ≤ 1/(4n)
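A minimal Python sketch (my own, not from the lecture) of this weak learner, assuming the sample is stored as a 0/1 matrix X and label vector y; it scans the n single-coordinate hypotheses h(x) = x_i and returns the one with smallest empirical error:

import numpy as np

def weak_learner(X, y):
    # Scan the n hypotheses h(x) = x_i and return the index with the smallest
    # empirical error on the sample (X in {0,1}^{m x n}, y in {0,1}^m).
    errs = (X != y[:, None]).mean(axis=0)   # emp-err of each coordinate
    best = int(np.argmin(errs))
    return best, float(errs[best])

# toy check: labels given by MAJ over S = {0, 1, 2} with n = 5
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(2000, 5))
y = (X[:, :3].sum(axis=1) >= 2).astype(int)
i, err = weak_learner(X, y)
print(i, err)   # err should be at most about 1/2 - 1/(2n)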
3
BOOSTING EXAMPLE
  • X = {0,1}^n
  • F_n = { f(x) = MAJ_{i∈S} x_i : S ⊆ [n], |S| odd }
  • Weak learner L: output the h(x) = x_i with minimum emp-err
  • Therefore, boost(L) PAC-learns F_n

4
DECISION TREE EXAMPLE
X = ℝ × {0,1,2} × ℝ,  Y = {0,1},  distribution D over X × Y
err(f) = Pr_{(x,y)~D}[ f(x) ≠ y ]

[Figure: SIZE-10 DECISION TREE T: X → Y, with internal tests x1 ≥ 10, x2 ∈ {0,1,2}, x3 ≥ ½, x3 ≥ ¼ and leaf labels in {0,1}; the 1-labeled leaves satisfy f(x) = … ∨ (x1 ≥ 10 ∧ x3 ≥ ½)]
5
REGRESSION TREE EX.
X = ℝ × {0,1,2} × ℝ,  Y = [0,1],  distribution D over X × Y
err(f) = ?

[Figure: SIZE-10 REGRESSION TREE T: X → Y, same tests (x1 ≥ 10, x2 ∈ {0,1,2}, x3 ≥ ½, x3 ≥ ¼) but with real-valued leaf predictions such as 0.2, 0.7, 0.8]
6
SQUARED ERROR
  • Suppose Pr_D[y=1] = ¼, Pr_D[y=0] = ¾ (regardless of x)
  • Absolute error E_{(x,y)~D}[ |f(x) − y| ]:
  • f(x) ≡ ¼ ⇒ E[ |f(x) − y| ] = ¼·¾ + ¾·¼ = 3/8
  • f(x) ≡ 0 ⇒ E[ |f(x) − y| ] = ¼
  • Squared error E_{(x,y)~D}[ (f(x) − y)² ]: min_{c∈[0,1]} E[ (c − y)² ] is attained at c = E[y] = ¼ (numeric check below)
  • Def: η(x) = E_{(x,y)~D}[ y | x ].  Then E_{(x,y)~D}[ (f(x) − y)² ] = E[ (f(x) − η(x))² ] + E[ (y − η(x))² ]
  • (homework)

η(x) = E_{(x,y)~D}[ y | x ];  E[ (y − η(x))² ] = variance (noise term)
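A small numeric check of the values on this slide (my own sketch, using the slide's distribution Pr[y=1] = ¼): the constant prediction ¼ has absolute error 3/8 while the constant 0 has ¼, whereas under squared error the best constant is exactly c = E[y] = ¼:

import numpy as np

p = 0.25                               # Pr[y = 1], independent of x
y_vals, probs = np.array([1.0, 0.0]), np.array([p, 1 - p])

abs_err = lambda c: float(np.sum(probs * np.abs(c - y_vals)))
sq_err  = lambda c: float(np.sum(probs * (c - y_vals) ** 2))

print(abs_err(0.25), abs_err(0.0))     # 0.375 (= 3/8) and 0.25
cs = np.linspace(0, 1, 1001)
print(cs[np.argmin([sq_err(c) for c in cs])])   # ~0.25, i.e. c = E[y]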
7
R BATCH LEARNING
  • Set X, Y = [0,1]
  • Family F of f: X → Y
  • Distribution D over X × Y
  • emp-err(f) = (1/m) Σ_i (f(x_i) − y_i)²
  • Define η(x) = E[ y | x ]
  • err(f) = E_{(x,y)~D}[ (f(x) − η(x))² ]
  • err(f) = E[ (f(x) − y)² ] − E[ (y − η(x))² ]
  • Assume η ∈ F
  • Special binary cases:
  • Pr_{(x,y)~D}[ y ∈ {0,1} ] = 1
  • Noiseless: ∀ f ∈ F, f: X → {0,1}
  • Random noise: ∀ f ∈ F, f: X → {η, 1−η}

8
GROWING TREES TOP-DOWN
9
DATA CALIBRATION
Data calibration minimizes empirical error.

[Figure: tree with tests x1 ≥ 10, x2 ≥ 4, x3 ≥ ¼ and calibrated leaf values 0.2, 0.5, 0.8; emp-err(T, Z^m) is computed from these leaves]

Proof:
  • Value at each leaf L is the mean of the training data in L (see the short sketch after this slide).
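A tiny illustration (assumed helper name, not the lecture's notation) of the calibration step: under squared error, setting a leaf's value to the mean of the training labels that reach it beats any other constant:

import numpy as np

def calibrate_leaf(y_leaf):
    # best constant prediction at a leaf under squared error = mean label
    return float(np.mean(y_leaf))

y_leaf = np.array([0.0, 0.0, 1.0, 1.0, 1.0])     # labels reaching one leaf
mean = calibrate_leaf(y_leaf)
for c in [0.0, 0.5, mean, 1.0]:
    print(c, float(np.mean((c - y_leaf) ** 2)))  # the mean gives the smallest value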

10
DATA CALIBRATION
Data calibration minimizes empirical error.
Any split can only reduce empirical error.

[Figure: same tree, tests x1 ≥ 10, x2 ≥ 4, x3 ≥ ¼, calibrated leaf values 0.2, 0.5, 0.8]

  • Value at each leaf L is the mean of the training data in L.

11
DATA CALIBRATION
Data calibration minimizes empirical error.
Any split can only reduce empirical error.
Proof: calibrating the two new leaves is at least as good as keeping the old value.

[Figure: the same tree with one leaf split on x3 ≥ ½ into two new leaves (values not yet calibrated, shown as ?); the other leaves keep values 0.2, 0.8]

  • Value at each leaf L is the mean of the training data in L.

12
TOP-DOWN ALGORITHM
  • Input: Z^m ∈ (ℝ^n × [0,1])^m, size s ≥ 1 (number of internal nodes)
  • Output: (binary) decision tree
  • Start with a one-node tree.
  • For i = 1 to s:
    • Find the split (over any leaf L) that results in the calibrated tree of smallest emp-error.
    • Make the split.
  • For a decision tree, round each leaf value to {0,1}.

(A code sketch of this loop follows this slide.)

Runtime: poly(n, m)?
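A compact Python sketch of the top-down loop, under my own assumptions (axis-aligned threshold splits, squared error, calibrated leaf values equal to label means); the names grow_tree and sse are illustrative, not from the lecture:

import numpy as np

def sse(y_sub):
    # squared error of a calibrated (mean-valued) leaf on its training labels
    return float(((y_sub - y_sub.mean()) ** 2).sum()) if len(y_sub) else 0.0

def grow_tree(X, y, s):
    # Start from a one-node tree; make s splits. Each iteration picks the
    # (leaf, feature, threshold) whose calibrated tree has smallest emp-error,
    # i.e. the split with the largest drop in summed squared error.
    leaves = [np.arange(len(y))]            # each leaf = indices of examples reaching it
    splits = []
    for _ in range(s):
        best = None                         # (drop, leaf, feature, threshold, left, right)
        for li, idx in enumerate(leaves):
            for j in range(X.shape[1]):
                for t in np.unique(X[idx, j]):
                    left, right = idx[X[idx, j] < t], idx[X[idx, j] >= t]
                    if len(left) == 0 or len(right) == 0:
                        continue
                    drop = sse(y[idx]) - sse(y[left]) - sse(y[right])
                    if best is None or drop > best[0]:
                        best = (drop, li, j, float(t), left, right)
        if best is None:
            break
        _, li, j, t, left, right = best
        leaves[li:li + 1] = [left, right]   # replace the split leaf by its two children
        splits.append((j, t))
    values = [float(y[idx].mean()) for idx in leaves]   # calibrated leaf values
    return leaves, values, splits           # round values to {0,1} for a decision tree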
13
EMP-ERR vs. IMPURITY

[Figure: comparison of emp-err(T, Z^m), impurity(T), and the splitting function g applied to Z^m]
14
TOP-DOWN ALGORITHM
  • Input: Z^m ∈ (ℝ^n × [0,1])^m, size s ≥ 1
  • Output: (binary) decision tree
  • Start with a one-node tree.
  • For i = 1 to s:
    • Find the split (over any leaf L) that results in the calibrated tree with the largest impurity decrease.
    • Make the split.

Or use other splitting criteria, e.g., information gain (entropy); see the impurity sketch after this slide.
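A sketch of the kind of impurity functions this slide refers to, for binary labels (Gini impurity and the binary entropy used by information gain); the helper name impurity_decrease is mine:

import numpy as np

def gini(q):
    # Gini impurity of a leaf whose fraction of 1-labels is q
    return 2.0 * q * (1.0 - q)

def entropy(q):
    # binary entropy H(q); information gain uses this as the impurity
    return 0.0 if q in (0.0, 1.0) else -q * np.log2(q) - (1 - q) * np.log2(1 - q)

def impurity_decrease(y_parent, y_left, y_right, g=gini):
    # weighted decrease in impurity from splitting one leaf into two
    wl, wr = len(y_left) / len(y_parent), len(y_right) / len(y_parent)
    return g(y_parent.mean()) - wl * g(y_left.mean()) - wr * g(y_right.mean())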
15
WHAT SIZE TREE?
16
WHAT SIZE TREE?
  • Theory: take size(T) ≪ m.
  • Practice: divide Z^m into (A, B) of sizes (0.9m, 0.1m).
  • Stopping criterion (sketched after this slide):
    • Build the tree for s = 1, 2, … on A.
    • Among all trees generated, choose the one that minimizes error on B.
  • Pruning:
    • Build the complete tree T on A (s = ∞).
    • Use the sub-tree with minimum error on B (efficient bottom-up).
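A sketch of the "practice" recipe above; the 90/10 split into (A, B) and the loop over s follow the slide, while grow_tree (e.g. the earlier sketch) and tree_error are assumed helper functions, not part of the lecture:

import numpy as np

def select_size(X, y, max_size, grow_tree, tree_error, frac=0.9, seed=0):
    # split Z^m into A (90%) and B (10%), grow trees of size s = 1, 2, ... on A,
    # and keep the tree with the smallest error on the held-out set B
    perm = np.random.default_rng(seed).permutation(len(y))
    cut = int(frac * len(y))
    A, B = perm[:cut], perm[cut:]
    best = None
    for s in range(1, max_size + 1):
        tree = grow_tree(X[A], y[A], s)
        err_B = tree_error(tree, X[B], y[B])   # held-out error
        if best is None or err_B < best[0]:
            best = (err_B, s, tree)
    return best   # (held-out error, chosen size, chosen tree)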

17
BOOSTING D.T. NOISE
[Dietterich '99]
  • AdaBoost performs poorly with noise.
  • A feature of AdaBoost: it quickly identifies outliers by putting (exponentially) large weight on them.

18
DECISION TREE BOOSTING
  • Replace the test (x_i ≥ θ) with a weak learner's output
  • Natural "divide and conquer" boosting
  • Problem: boost to learn f(x) = MAJ_{i∈[n]} x_i
  • Need an exponential-size tree!
  • Solution: merge nodes (graph instead of tree)

19
DECISION GRAPH
  • X = {0,1}^5
  • f(x) = MAJ(x1, x2, x3, x4, x5)

[Figure: decision graph computing MAJ of 5 bits; level i tests x_i ≥ ½ and has 1, 2, 3, 2, 1 merged nodes for i = 1, …, 5 (one node per still-undecided count of ones), with leaves labeled 0 and 1; a node-counting sketch follows]
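A small counting sketch (my own, consistent with the figure) of why merging helps for MAJ: reading x1, …, xn in order, two partial assignments can share a node whenever they have seen the same number of ones, so each level needs at most a linear number of nodes instead of doubling:

def majority_graph_nodes(n):
    # nodes per level in the merged decision graph for MAJ on n bits:
    # after reading x1..xi only the count of ones matters, and a node is
    # needed only while the outcome is still undecided
    total = 0
    for i in range(n):                      # level i tests x_{i+1}
        undecided = sum(1 for ones in range(i + 1)
                        if ones <= n // 2 and (i - ones) <= n // 2)
        total += undecided
    return total

print(majority_graph_nodes(5))   # 9 internal nodes (1+2+3+2+1), as in the figure
print(2 ** 5 - 1)                # 31 internal nodes for a full depth-5 decision tree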