Title: Near-Minimax Optimal Learning with Decision Trees
1. Near-Minimax Optimal Learning with Decision Trees
Rob Nowak and Clay Scott
University of Wisconsin-Madison and Rice University
nowak_at_engr.wisc.edu
Supported by the NSF and the ONR
2. Basic Problem
Classification: build a decision rule based on labeled training data.
Given n training points, how well can we do?
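To make the question concrete, here is a minimal synthetic sketch (our illustration, not from the slides): the boundary curve, the 10% label noise, and the majority-vote histogram rule are all assumptions of ours, chosen only to show the experimental setup the deck has in mind.

    import numpy as np

    rng = np.random.default_rng(0)

    def bayes_label(X):
        # Hypothetical smooth boundary (our choice): class 1 above the curve
        # x2 = 0.5 + 0.2*sin(4*pi*x1)
        return (X[:, 1] > 0.5 + 0.2 * np.sin(4 * np.pi * X[:, 0])).astype(int)

    def sample(n, flip=0.1):
        X = rng.random((n, 2))
        y = bayes_label(X)
        noisy = rng.random(n) < flip      # 10% label noise -> Bayes error = 0.1
        return X, y ^ noisy.astype(int)

    def fit_histogram(Xtr, ytr, bins=8):
        # Majority vote per cell of a fixed bins x bins grid (a non-adaptive partition)
        ones = np.zeros((bins, bins)); total = np.zeros((bins, bins))
        idx = np.minimum((Xtr * bins).astype(int), bins - 1)
        np.add.at(ones, (idx[:, 0], idx[:, 1]), ytr)
        np.add.at(total, (idx[:, 0], idx[:, 1]), 1)
        return (ones > total / 2).astype(int)

    bins = 8
    Xtr, ytr = sample(2000)
    Xte, yte = sample(20000)
    rule = fit_histogram(Xtr, ytr, bins)
    idx = np.minimum((Xte * bins).astype(int), bins - 1)
    print("test error:", np.mean(rule[idx[:, 0], idx[:, 1]] != yte))

The test error minus the Bayes error (0.1 here) is the quantity whose decay with n the rest of the talk studies.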
3. Smooth Decision Boundaries
Suppose that the Bayes decision boundary behaves locally like a Lipschitz function (Mammen & Tsybakov '99).
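The slide's class definition did not survive extraction; a standard formalization consistent with Mammen & Tsybakov '99 (our reconstruction, not the slide's verbatim formula) takes the Bayes decision set to be a boundary fragment,

    G^* = \{ x \in [0,1]^d : x_d \ge g(x_1, \dots, x_{d-1}) \}, \qquad
    |g(u) - g(v)| \le L \, \| u - v \|,

i.e. the boundary is the graph of a Lipschitz function g.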
4. Dyadic Thinking about Classification Trees
Recursive dyadic partition (construction sketched below).
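A sketch of how a recursive dyadic partition (RDP) of [0,1]^d can be built. The splitting convention here, midpoint splits cycling through the coordinates, is one common choice and is our assumption; the slide does not fix it.

    from dataclasses import dataclass
    from typing import Optional, Tuple

    @dataclass
    class Cell:
        lo: Tuple[float, ...]
        hi: Tuple[float, ...]
        depth: int
        children: Optional[Tuple["Cell", "Cell"]] = None

    def split(cell: Cell) -> None:
        # Halve the cell at its midpoint along one coordinate, cycling by depth.
        d = len(cell.lo)
        axis = cell.depth % d
        mid = 0.5 * (cell.lo[axis] + cell.hi[axis])
        lo, hi = list(cell.lo), list(cell.hi)
        hi[axis] = mid
        left = Cell(tuple(cell.lo), tuple(hi), cell.depth + 1)
        lo[axis] = mid
        right = Cell(tuple(lo), tuple(cell.hi), cell.depth + 1)
        cell.children = (left, right)

    def complete_rdp(cell: Cell, max_depth: int) -> None:
        # Split every cell down to max_depth: the "complete" RDP.
        if cell.depth < max_depth:
            split(cell)
            for child in cell.children:
                complete_rdp(child, max_depth)

    root = Cell((0.0, 0.0), (1.0, 1.0), depth=0)
    complete_rdp(root, max_depth=4)    # 2^4 = 16 leaf cells in this 2-d example

A pruned RDP (next slide) is obtained by undoing some of these splits.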
5. Dyadic Thinking about Classification Trees
Pruned dyadic partition
Pruned dyadic tree
The hierarchical structure facilitates optimization (see the sketch after this slide).
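One way to see why the hierarchy helps: with a penalty that is additive over leaves, the best pruned subtree can be found exactly in a single bottom-up pass. A sketch reusing the Cell nodes above; the errors_if_leaf attribute (training errors if the node votes its majority label) and the CART-style penalty of alpha per leaf are stand-in assumptions, not the slides' exact criterion.

    def prune(node, alpha):
        # Return the minimal penalized cost (training errors + alpha per leaf)
        # achievable at this node; prunes the tree in place.
        leaf_cost = node.errors_if_leaf + alpha   # assumed precomputed per node
        if node.children is None:
            return leaf_cost
        split_cost = sum(prune(child, alpha) for child in node.children)
        if split_cost < leaf_cost:
            return split_cost
        node.children = None                      # collapse: subtree not worth its leaves
        return leaf_cost

Because each subtree's optimal cost depends only on its own data, this dynamic program visits every node once.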
6. The Classification Problem
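The slide body was lost in extraction; the standard setup that the rest of the deck presupposes is:

    (X_1, Y_1), \dots, (X_n, Y_n) \overset{\text{i.i.d.}}{\sim} P_{XY},
    \qquad X_i \in [0,1]^d, \; Y_i \in \{0,1\}.

A classifier is a map f : [0,1]^d \to \{0,1\} with risk R(f) = P(f(X) \ne Y); the goal is a rule \hat{f}_n, built from the training data alone, whose risk approaches the best achievable.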
7. Classifiers
The Bayes Classifier
Minimum Empirical Risk Classifier
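Reconstructing the two definitions named above (standard, so stated with confidence):

    \eta(x) = P(Y = 1 \mid X = x), \qquad
    f^*(x) = \mathbf{1}\{\eta(x) \ge 1/2\}, \qquad
    R^* = R(f^*) = \min_f R(f);

    \hat{R}_n(f) = \frac{1}{n} \sum_{i=1}^n \mathbf{1}\{f(X_i) \ne Y_i\}, \qquad
    \hat{f} = \arg\min_{f \in \mathcal{F}} \hat{R}_n(f).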
8–10. Generalization Error Bounds
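The formulas on these three slides were lost; the standard Occam-style bound they build toward (Hoeffding's inequality plus a union bound weighted by a prefix code, a reconstruction on our part) reads: if each f in a countable class carries a codelength c(f) satisfying Kraft's inequality, then with probability at least 1 - \delta, simultaneously for all f,

    R(f) \le \hat{R}_n(f) + \sqrt{ \frac{ c(f) \log 2 + \log(1/\delta) }{ 2n } }.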
11. Selecting a good h
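A natural reading of this slide, consistent with the bound above: select the hypothesis that minimizes the bound itself,

    \hat{h} = \arg\min_h \left[ \hat{R}_n(h)
        + \sqrt{ \frac{ c(h) \log 2 + \log(1/\delta) }{ 2n } } \right],

so empirical fit and coding complexity are traded off automatically.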
12. Convergence to Bayes Error
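Schematically (exact constants depend on the analysis), such a rule satisfies an oracle inequality,

    \mathbb{E}[ R(\hat{h}_n) ] - R^* \;\lesssim\;
    \min_h \left[ \big( R(h) - R^* \big) + \sqrt{ \frac{c(h)}{n} } \right],

an approximation/estimation tradeoff: if the family of candidate rules grows with n so that the approximation term vanishes, the risk converges to the Bayes error.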
13. Example: Dyadic Classification Trees
[Figure: labeled training data and the Bayes decision boundary; a complete RDP, a pruned RDP, and the resulting dyadic classification tree.]
14. Codes for DCTs
Code-lengths. Example: the tree is encoded as the bit string 0001001111, plus 6 bits for the leaf labels (one bit per leaf; the encoding is sketched below).
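A sketch of computing such a codelength, under our reading of the slide's example: one bit per node in preorder (0 = internal, 1 = leaf) plus one class bit per leaf. In d > 1 dimensions, roughly log2(d) extra bits per internal node would be needed to name the split coordinate unless splits cycle deterministically as assumed earlier. The label attribute (a 0/1 majority label per leaf) is a hypothetical field of ours.

    def encode(node, bits, labels):
        # Preorder traversal: 0 = internal node, 1 = leaf (its class bit is
        # collected separately and appended at the end).
        if node.children is None:
            bits.append(1)
            labels.append(node.label)     # assumed 0/1 majority label at the leaf
        else:
            bits.append(0)
            for child in node.children:
                encode(child, bits, labels)
        return bits, labels

    def codelength(root):
        bits, labels = encode(root, [], [])
        return len(bits) + len(labels)    # structure bits + one bit per leaf label

This prefix code automatically satisfies Kraft's inequality, so it plugs directly into the Occam bound of slides 8–10.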
15. Error Bounds for DCTs
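Specializing the Occam bound to this code (a reconstruction; the slide's own display was lost): with probability at least 1 - \delta, simultaneously for every pruned DCT T,

    R(T) \le \hat{R}_n(T) + \sqrt{ \frac{ c(T) \log 2 + \log(1/\delta) }{ 2n } },

with c(T) growing linearly in the number of leaves of T.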
16. Rate of Convergence
Suppose again that the Bayes decision boundary behaves locally like a Lipschitz function (Mammen & Tsybakov '99); the DCT rate below is from C. Scott & R. Nowak '02.
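The rates being compared (our reconstruction of the lost display, hedged because it is recalled from the cited papers rather than read off the slide): for the Lipschitz-boundary class the minimax rate of excess risk is

    \inf_{\hat{f}_n} \sup \; \mathbb{E}[R(\hat{f}_n)] - R^* \asymp n^{-1/d}
    \quad \text{(Mammen–Tsybakov '99)},

while balancing the approximation error 2^{-j} of a depth-j partition against a global penalty of order \sqrt{2^{j(d-1)}/n} yields a DCT rate of roughly n^{-1/(d+1)} (up to logarithmic factors), which falls short of minimax.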
17. Why too slow?
Because the Bayes boundary is a (d-1)-dimensional manifold, good trees are unbalanced: at resolution 2^{-j} the boundary intersects only about 2^{j(d-1)} of the 2^{jd} cells, so a good tree is deep along the boundary and shallow elsewhere.
But in the global bound, all T-leaf trees are equally favored, regardless of shape.
18. Local Error Bounds in Classification
Spatial error decomposition (Mansour & McAllester '00).
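The decomposition in question (reconstructed, in the spirit of Mansour & McAllester '00): both the true and the empirical error are additive over a tree's leaves,

    R(T) = \sum_{\ell \in \mathrm{leaves}(T)} P\big( X \in A_\ell, \; T(X) \ne Y \big),
    \qquad
    \hat{R}_n(T) = \sum_{\ell \in \mathrm{leaves}(T)} \frac{1}{n} \sum_{i=1}^n
        \mathbf{1}\big\{ X_i \in A_\ell, \; T(X_i) \ne Y_i \big\},

so the deviation R(T) - \hat{R}_n(T) can be controlled leaf by leaf.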
19–20. Relative Chernoff Bound
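The standard multiplicative form (stated with confidence, as it is classical): if n\hat{p} \sim \mathrm{Binomial}(n, p), then for 0 < \gamma < 1,

    P\big( \hat{p} \le (1 - \gamma) p \big) \le e^{ -n p \gamma^2 / 2 }.

Inverting: with probability at least 1 - \delta,  p \le \hat{p} + \sqrt{ 2 p \log(1/\delta) / n }. The deviation scales with \sqrt{p}, so events of small probability (e.g. small cells) concentrate far faster than Hoeffding's additive \sqrt{\log(1/\delta)/(2n)} suggests.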
21. Local Error Bounds in Classification
22. Bounded Densities
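The role of the assumption (reconstructed): if the marginal density of X is bounded by a constant c_0, then a dyadic cell A at depth j satisfies

    P(X \in A) \le c_0 \, \mathrm{vol}(A) = c_0 \, 2^{-j},

so deep leaves carry exponentially small probability, which is exactly what the relative Chernoff bound rewards.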
23. Global vs. Local
Key: local complexity is offset by small volumes!
24. Local Bounds for DCTs
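Schematically (our reconstruction; the precise constants and codelength terms are in the papers), combining the spatial decomposition with the relative Chernoff bound at each leaf gives, with probability at least 1 - \delta,

    R(T) \le \hat{R}_n(T) + \sum_{\ell \in \mathrm{leaves}(T)}
        O\!\left( \sqrt{ \frac{ p_\ell \, \big( j_\ell + \log(1/\delta) \big) }{ n } } \right),

where p_\ell = P(X \in A_\ell) and j_\ell is the depth of leaf \ell: each leaf's penalty is weighted by the square root of its probability.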
25. Unbalanced Tree
Example: a cascade tree with J leaves and depth J-1.
Global bound vs. local bound: compared below.
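Working the comparison for this cascade tree (our arithmetic, using the bounded-density estimate p_\ell \le c_0 2^{-j_\ell}):

    \text{global:} \quad \mathrm{pen}(T) \asymp \sqrt{ J / n }
        \quad (\text{grows with } J);
    \qquad
    \text{local:} \quad \mathrm{pen}(T) \asymp
        \sum_{j=1}^{J-1} \sqrt{ \frac{ j \, 2^{-j} }{ n } } = O\big( 1/\sqrt{n} \big),

since \sum_j \sqrt{ j \, 2^{-j} } converges. The local bound charges this highly unbalanced tree a constant, J-free price; the global bound cannot tell it apart from a balanced tree with the same number of leaves.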
26. Convergence to Bayes Error
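The payoff (reconstructed, consistent with the talk's title and the cited papers; hedged for the same reason as slide 16): with the local penalty, dyadic classification trees attain

    \mathbb{E}[ R(\hat{T}_n) ] - R^*
        = O\!\left( \left( \frac{\log n}{n} \right)^{1/d} \right),

within a logarithmic factor of the n^{-1/d} minimax lower bound, hence "near-minimax optimal".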
27. Concluding Remarks
Data-dependent bound.
Papers: Neural Information Processing Systems (NIPS) 2002, 2003.
nowak_at_engr.wisc.edu