Title: DECISION TREES
1. DECISION TREES: NOISY / R-BOOSTING
2. BOOSTING EXAMPLE
- X = {0,1}^n
- F_n = { f(x) = MAJ_{i∈S}(x_i) : S ⊆ [n], |S| odd }
- Weak learner L: output h(x) = x_i with minimum emp-err.
- Thm: For any n ≥ 1 and m ≥ 16 n² log(5n), the above is a γ = 1/(4n)-weak learner, i.e., E_{Z^m}[err(h)] ≤ ½ − γ.
- Proof: Each training example's label agrees with at least (|S|+1)/2 of the bits in S, i.e., with a ≥ ½ + 1/(2n) fraction.
- ⇒ emp-err(h) ≤ ½ − 1/(2n). From Lecture 4,
  E[max_{f∈F} |err(f) − emp-err(f)|] ≤ (log(5|F|)/m)^{1/2} ≤ 1/(4n).
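As a minimal Python sketch (toy data and names are mine, not the lecture's): the weak learner L just scans all n coordinate hypotheses h(x) = x_i and returns the one with the smallest empirical error; the sample size follows the m ≥ 16 n² log(5n) bound above.

```python
import numpy as np

def coordinate_weak_learner(X, y):
    """Return (i, emp-err) for the single coordinate h(x) = x_i with minimum
    empirical error.  X: (m, n) array over {0,1}; y: (m,) array over {0,1}."""
    emp_errs = (X != y[:, None]).mean(axis=0)        # emp-err of h(x) = x_i, for every i
    i = int(np.argmin(emp_errs))
    return i, float(emp_errs[i])

# Toy check: labels are the majority vote over an odd subset S of coordinates.
rng = np.random.default_rng(0)
n = 10
m = int(16 * n**2 * np.ceil(np.log(5 * n)))          # m >= 16 n^2 log(5n)
S = [0, 3, 7]                                        # |S| odd
X = rng.integers(0, 2, size=(m, n))
y = (X[:, S].sum(axis=1) > len(S) / 2).astype(int)   # MAJ over S
i, err = coordinate_weak_learner(X, y)
print(i, err)                                        # expect i in S and err <= 1/2 - 1/(2n)
```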
3. BOOSTING EXAMPLE
- X = {0,1}^n
- F_n = { f(x) = MAJ_{i∈S}(x_i) : S ⊆ [n], |S| odd }
- Weak learner L: output h(x) = x_i with minimum emp-err.
- Therefore, boost(L) AC-learns F_n.
4. DECISION TREE EXAMPLE
- X = ℝ × {0,1,2} × ℝ, Y = {0,1}, distribution μ over X × Y
- err(f) = P_{(x,y)~μ}[f(x) ≠ y]
- Example: f(x) = (x1 ≤ ¼) ∨ (x1 ≤ 10 ∧ x3 ≤ ½)
- [Figure: SIZE-10 DECISION TREE T: X → Y, with internal nodes testing x1 ≤ 10, x2 ∈ {0,1,2}, x3 ≤ ½, x3 ≤ ¼, and leaves labeled 0 or 1.]
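A minimal sketch, assuming a simple threshold-node representation, of how a small decision tree like T can be stored and evaluated; the splits and leaf labels below are placeholders, since the figure's exact structure is only partly recoverable.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Node:
    feature: int                  # index into x
    threshold: float              # test: x[feature] <= threshold
    yes: Union["Node", int]       # subtree, or leaf label, when the test holds
    no: Union["Node", int]        # subtree, or leaf label, otherwise

def predict(tree, x):
    """Follow threshold tests from the root down to a leaf label."""
    while isinstance(tree, Node):
        tree = tree.yes if x[tree.feature] <= tree.threshold else tree.no
    return tree

# Placeholder tree using splits that appear in the figure (x1 <= 10, x3 <= 1/2, x3 <= 1/4).
T = Node(0, 10.0, Node(2, 0.5, 1, 0), Node(2, 0.25, 1, 0))
print(predict(T, (3.0, 1, 0.4)))   # -> 1
```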
5. REGRESSION TREE EXAMPLE
- X = ℝ × {0,1,2} × ℝ, Y = [0,1], distribution μ over X × Y
- err(f) = ? (defined on the next slide)
- [Figure: SIZE-10 REGRESSION TREE T: X → Y, with the same threshold splits as above but real-valued leaf predictions such as 0.2, 0.7, and 0.8.]
6. SQUARED ERROR
- Suppose P_μ[y=1] = ¼, P_μ[y=0] = ¾ (regardless of x).
- Absolute error E_{(x,y)~μ}[|f(x) − y|]:
  - f(x) ≡ ¼ ⇒ E|f(x) − y| = ¼·¾ + ¾·¼ = 3/8
  - f(x) ≡ 0 ⇒ E|f(x) − y| = ¼
- Squared error E_{(x,y)~μ}[(f(x) − y)²]: min_{c∈[0,1]} E[(c − y)²] is attained at c = E[y] = ¼.
- Def: η(x) = E_{(x,y)~μ}[y | x].
- E_{(x,y)~μ}[(f(x) − y)²] = E[(f(x) − η(x))²] + E[(y − η(x))²]   (homework)
- E[(y − η(x))²] = variance (noise term).
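A quick numeric check of the arithmetic above (Python, values as in the slide): the constant ¼ has larger absolute error than the constant 0, while for squared error the best constant is c = E[y] = ¼.

```python
import numpy as np

p1 = 0.25                              # P[y = 1]; P[y = 0] = 0.75, regardless of x
ys = np.array([1.0, 0.0])
ps = np.array([p1, 1 - p1])

def abs_err(c):
    return float(np.sum(ps * np.abs(c - ys)))

def sq_err(c):
    return float(np.sum(ps * (c - ys) ** 2))

print(abs_err(0.25), abs_err(0.0))     # 0.375 vs 0.25: absolute error prefers the constant 0
cs = np.linspace(0, 1, 1001)
print(cs[int(np.argmin([sq_err(c) for c in cs]))])   # ~0.25: squared error prefers c = E[y]
```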
7. R BATCH LEARNING
- Set X, Y = [0,1]
- Family F of f: X → Y
- Distribution μ over X × Y
- emp-err(f) = (1/m) Σ_i (f(x_i) − y_i)²
- Define η(x) = E[y | x]
- err(f) = E_{(x,y)~μ}[(f(x) − η(x))²]
- err(f) = E[(f(x) − y)²] − E[(y − η(x))²]
- Assume η ∈ F
- Special binary cases:
  - P_{(x,y)~μ}[y ∈ {0,1}] = 1
  - Noiseless: ∀f ∈ F, f: X → {0,1}
  - Random noise: ∀f ∈ F, f: X → {η, 1−η}
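The identity err(f) = E[(f(x) − y)²] − E[(y − η(x))²] can be sanity-checked on a small finite distribution; the distribution and predictor below are made up for illustration.

```python
import numpy as np

# A tiny finite distribution over X x Y with X = {0, 1, 2} and y in {0, 1}.
px  = np.array([0.5, 0.3, 0.2])        # marginal distribution of x
eta = np.array([0.1, 0.6, 0.9])        # eta(x) = E[y | x]
fx  = np.array([0.0, 0.3, 0.8])        # values f(x) of some predictor f: X -> [0, 1]

E_f_to_y = np.sum(px * (eta * (fx - 1.0) ** 2 + (1 - eta) * fx ** 2))   # E[(f(x) - y)^2]
noise    = np.sum(px * eta * (1 - eta))                                 # E[(y - eta(x))^2]
err      = np.sum(px * (fx - eta) ** 2)                                 # E[(f(x) - eta(x))^2]
print(np.isclose(err, E_f_to_y - noise))                                # True
```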
8. GROWING TREES TOP-DOWN
9. DATA CALIBRATION
- Data calibration minimizes empirical error.
- Value at each leaf L is the mean of the training labels that reach L.
- Proof: emp-err(T, Z^m) decomposes as a sum over leaves, and within each leaf the mean of the labels minimizes the sum of squared errors.
- [Figure: tree with splits x1 ≤ 10, x2 ≤ 4, x3 ≤ ¼ and calibrated leaf values 0.5, 0.2, 0.2, 0.8.]
10. DATA CALIBRATION
- Data calibration minimizes empirical error.
- Any split can only reduce empirical error.
- Value at each leaf L is the mean of the training labels that reach L.
- [Figure: the same tree as above.]
11. DATA CALIBRATION
- Data calibration minimizes empirical error.
- Any split can only reduce empirical error.
- Proof: after splitting a leaf, calibrating the two new leaves is at least as good as keeping the old value (which remains available to both).
- Value at each leaf L is the mean of the training labels that reach L.
- [Figure: the same tree with the 0.5 leaf further split on x3 ≤ ½ into two new leaves whose values are to be calibrated.]
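A small sketch of both claims for squared error, on made-up leaf data: calibrating a leaf to the mean of its labels beats any other constant, and splitting a leaf (then recalibrating) can only lower the empirical error.

```python
import numpy as np

def emp_err(values, groups):
    """Mean squared error when each group of training labels is predicted by a single value."""
    m = sum(len(g) for g in groups)
    return sum(float(np.sum((g - v) ** 2)) for v, g in zip(values, groups)) / m

rng = np.random.default_rng(1)
leaf_labels = rng.random(20)                        # labels reaching one leaf

# Calibration: the leaf mean beats any other constant at this leaf.
mean = leaf_labels.mean()
print(emp_err([mean], [leaf_labels]) <= emp_err([0.3], [leaf_labels]))     # True

# Splitting: partition the leaf (here arbitrarily) and calibrate both parts.
left, right = leaf_labels[:8], leaf_labels[8:]
split_err = emp_err([left.mean(), right.mean()], [left, right])
print(split_err <= emp_err([mean], [leaf_labels]))                          # True
```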
12. TOP-DOWN ALGORITHM
- Input: Z^m ∈ (ℝ^n × [0,1])^m, size s ≥ 1 (number of internal nodes)
- Output: (binary) decision tree
- Start with the one-node tree.
- For i = 1 to s:
  - Find the split (over any leaf L) that results in the calibrated tree of smallest emp-error.
  - Make the split.
- For a decision tree, round each leaf value to {0,1}.
- Runtime: poly(n, m)?
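A minimal sketch of this top-down procedure for regression trees with axis-aligned threshold splits (function and variable names are mine); each iteration makes the single split that yields the smallest empirical error of the calibrated tree.

```python
import numpy as np

def sse(y_leaf):
    """Sum of squared errors of a calibrated leaf (its value = mean of its labels)."""
    return float(((y_leaf - y_leaf.mean()) ** 2).sum()) if len(y_leaf) else 0.0

def best_split(X, y, idx):
    """Best axis-aligned threshold split of one leaf, scored by post-split SSE."""
    best_score, best = np.inf, None
    for j in range(X.shape[1]):
        for t in np.unique(X[idx, j])[:-1]:           # thresholds between observed values
            left, right = idx[X[idx, j] <= t], idx[X[idx, j] > t]
            score = sse(y[left]) + sse(y[right])
            if score < best_score:
                best_score, best = score, (j, t, left, right)
    return best_score, best

def top_down(X, y, s):
    """Start with one leaf; make s greedy splits, each chosen to minimize the
    empirical error of the calibrated tree.  Returns the leaves (index sets)
    and the (feature, threshold) splits that were made."""
    leaves, splits = [np.arange(len(y))], []
    for _ in range(s):
        options = [(i,) + best_split(X, y, idx)
                   for i, idx in enumerate(leaves) if len(idx) > 1]
        options = [o for o in options if o[2] is not None]
        if not options:
            break
        # total SSE after a candidate split = its own post-split SSE + SSE of the other leaves
        i, _, (j, t, left, right) = min(
            options,
            key=lambda o: o[1] + sum(sse(y[l]) for k, l in enumerate(leaves) if k != o[0]))
        splits.append((j, float(t)))
        leaves[i:i + 1] = [left, right]
    return leaves, splits

# Toy usage on synthetic data: labels mostly depend on a threshold of feature 0.
rng = np.random.default_rng(0)
X = rng.random((50, 3))
y = (X[:, 0] > 0.5).astype(float) + 0.1 * rng.random(50)
leaves, splits = top_down(X, y, s=3)
print(splits[0])          # the first split should be on feature 0, near 0.5
```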
13. EMP-ERR & IMPURITY
- [Figure: derivation relating emp-err(T, Z^m) to impurity(T) via a function g(Z^m).]
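One concrete instantiation, assuming the impurity is the Gini index g(q) = q(1 − q): for {0,1} labels, the mean squared error of a calibrated leaf equals the Gini impurity of its labels, so the calibrated emp-err is a leaf-weighted impurity. The slide's g may be a different (concave) function.

```python
import numpy as np

# For {0,1} labels, a calibrated leaf predicts q = fraction of 1s in the leaf,
# and its mean squared error equals q(1 - q), the Gini impurity g(q).
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=17).astype(float)
q = labels.mean()
mse_calibrated = float(((labels - q) ** 2).mean())
print(np.isclose(mse_calibrated, q * (1 - q)))      # True
```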
14. TOP-DOWN ALGORITHM
- Input: Z^m ∈ (ℝ^n × [0,1])^m, size s ≥ 1
- Output: (binary) decision tree
- Start with the one-node tree.
- For i = 1 to s:
  - Find the split (over any leaf L) that results in the calibrated tree with the largest impurity decrease.
  - Make the split.
- Or use other splitting criteria, e.g., information gain (entropy).
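A short sketch of the alternative criterion mentioned above, information gain with entropy as the impurity (toy data, illustrative names).

```python
import numpy as np

def entropy(y):
    """Binary entropy of a {0,1} label vector (0 for an empty or pure leaf)."""
    if len(y) == 0:
        return 0.0
    q = float(np.mean(y))
    return 0.0 if q in (0.0, 1.0) else -q * np.log2(q) - (1 - q) * np.log2(1 - q)

def information_gain(x, y, t):
    """Impurity decrease of splitting on x <= t, with entropy as the impurity."""
    left, right = y[x <= t], y[x > t]
    w = len(left) / len(y)
    return entropy(y) - (w * entropy(left) + (1 - w) * entropy(right))

# A split aligned with the labels has a larger gain than a poorly placed one.
x = np.array([0.1, 0.2, 0.3, 0.7, 0.8, 0.9])
y = np.array([0, 0, 0, 1, 1, 1])
print(information_gain(x, y, 0.5), information_gain(x, y, 0.15))   # 1.0 vs ~0.19
```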
15. WHAT SIZE TREE?
16. WHAT SIZE TREE?
- Theory: take size(T) ≪ m.
- Practice: divide Z^m into (A, B) of sizes (0.9m, 0.1m).
- Stopping criterion:
  - Build the tree for s = 1, 2, … on A.
  - Among all trees generated, choose the one that minimizes error on B.
- Pruning:
  - Build the complete tree T on A (s = ∞).
  - Use the sub-tree with minimum error on B (efficient bottom-up).
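A sketch of the "stopping criterion" recipe, using scikit-learn's DecisionTreeRegressor as a stand-in for the top-down grower, with tree size controlled by max_leaf_nodes; the data is synthetic.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor   # stand-in for the top-down grower

rng = np.random.default_rng(0)
m, n = 500, 5
X = rng.random((m, n))
y = (X[:, 0] > 0.5).astype(float) + 0.1 * rng.standard_normal(m)   # noisy labels

# Divide Z^m into A (90%) and B (10%).
cut = int(0.9 * m)
XA, yA, XB, yB = X[:cut], y[:cut], X[cut:], y[cut:]

# Grow trees of increasing size on A; keep the size with the smallest error on B.
errors = {}
for s in range(2, 40):                           # here s = maximum number of leaves
    tree = DecisionTreeRegressor(max_leaf_nodes=s, random_state=0).fit(XA, yA)
    errors[s] = float(np.mean((tree.predict(XB) - yB) ** 2))
best_s = min(errors, key=errors.get)
print(best_s, errors[best_s])
```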
17. BOOSTING D.T. & NOISE
[Dietterich '99]
- AdaBoost performs poorly with noise.
- AdaBoost's reweighting quickly identifies outliers by putting (exponentially) large weight on them.
18. DECISION TREE BOOSTING
- Replace the split test (x_i ≤ θ) with a weak learner's output.
- Natural divide-and-conquer boosting.
- Problem: boost to learn f(x) = MAJ_{i∈[n]}(x_i).
- Need an exponential-size tree!
- Solution: merge nodes (graph instead of tree).
19. DECISION GRAPH
- X = {0,1}^5
- f(x) = MAJ(x1, x2, x3, x4, x5)
- [Figure: a decision graph computing MAJ, with layered nodes testing x1 ≥ ½, x2 ≥ ½, …, x5 ≥ ½; nodes in the same layer that agree on the count of 1s seen so far are merged, and the leaves are labeled 0 and 1.]
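A back-of-the-envelope sketch of why merging helps for MAJ: if nodes that agree on the number of 1s seen so far are merged, the graph needs at most one node per (level, count) pair, i.e. O(n²) nodes, versus 2^n − 1 internal nodes for the unmerged tree. The count below is an upper bound; the figure's graph may merge further.

```python
from itertools import product

def maj_graph_eval(x):
    """Evaluate MAJ(x) the way the decision graph does: read x1..xn in order,
    carrying only the number of 1s seen so far (nodes with equal counts merge)."""
    count = 0
    for bit in x:                     # one merged node per (level, count) pair
        count += bit
    return int(count > len(x) / 2)

n = 5
# At most one node per (level, count-of-1s-so-far) pair for the merged graph,
# versus 2^n - 1 internal nodes for the unmerged complete tree.
graph_nodes = sum(i + 1 for i in range(n))       # levels 0..n-1, counts 0..i
tree_nodes = 2 ** n - 1
print(graph_nodes, tree_nodes)                   # 15 vs 31 for n = 5; the gap grows exponentially

# Sanity check against the direct definition of majority over {0,1}^5.
assert all(maj_graph_eval(x) == int(sum(x) > n / 2) for x in product([0, 1], repeat=n))
```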