Title: Decision Trees with Minimal Costs (ICML 2004, Banff, Canada)
Slide 1: Decision Trees with Minimal Costs (ICML 2004, Banff, Canada)
- Charles X. Ling, Univ of Western Ontario, Canada
- Qiang Yang, HK UST, Hong Kong
- Jianning Wang, Univ of Western Ontario, Canada
- Shichao Zhang, UTS, Australia
- Contact: cling_at_csd.uwo.ca
Slide 2: Outline
- Introduction
- Building Trees with Minimal Total Costs
- Testing Strategies
- Experiments and Results
- Conclusions
Slide 3: Costs in Machine Learning
- Most inductive learning algorithms minimize classification errors
- Different types of misclassification have different costs, e.g. FP (false positive) and FN (false negative)
- In this talk:
  - Test costs should also be considered
- Cost-sensitive learning considers a variety of costs; see the survey by Peter Turney (2000)
Slide 4: Applications
- Medical practice
  - Doctors may ask a patient to go through a number of tests (e.g., blood tests, X-rays)
  - Which of these new tests will bring about higher value?
- Biological experimental design
  - When testing a new drug, new tests are costly
  - Which experiments should be performed?
Slide 5: Previous Work
- Many previous works consider the two types of cost separately, an obvious oversight
- (Turney 1995) ICET uses a genetic algorithm to build trees that minimize the total cost
- (Zubek and Dietterich 2002) a Markov Decision Process (MDP) that searches a state space for optimal policies
- (Greiner et al. 2002) PAC learning
Slide 6: An Example of Our Problem
- Training examples may contain unknown (?) values whose true values cannot be obtained

  ID | Fever (C1) | X-ray (C2) | Blood_1 (C3) | Blood_2 (C4) | D
  12 | 101        | ?          | H            | ?            | Yes
  23 | ?          | L          | M            | L            | No

- Goal 1: build a tree that minimizes the total cost
- Test examples may contain many ? values, which may be obtained at a cost

  ID | Fever (C1) | X-ray (C2) | Blood_1 (C3) | Blood_2 (C4) | D
  45 | 98         | ?          | ?            | ?            | ?
  58 | ?          | ?          | ?            | ?            | ?

- Goal 2: obtain test values at a cost so as to minimize the total cost
Slide 7: Outline
- Introduction
- Building Trees with Minimal Total Costs
- Testing Strategies
- Experiments and Results
- Conclusions
Slide 8: Building Trees with Minimal Total Costs
- Assumption: binary classes, with misclassification costs FP and FN
- Goal: minimize the total cost
- Total cost = misclassification cost + test cost
- Previous work: information gain as the attribute selection criterion
- This work needs a new attribute selection criterion
Slide 9: Attribute Selection Criterion (vs. C4.5)
- Minimal total cost (C4.5: minimal entropy)
- If growing the tree yields a smaller total cost, then choose the attribute with the minimal total cost; else stop and form a leaf
Slide 10 (cont.)
- Label the leaf according to the minimal total cost:
- If P × FN ≥ N × FP, then class = positive; else class = negative
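- For instance (illustrative numbers, not from the paper): with P = 3, N = 7, FP = 100, and FN = 600, labeling the leaf negative would cost P × FN = 1800 while labeling it positive costs N × FP = 700, so the leaf is labeled positive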
Slide 11: Differences in Handling ? Values
- First, how to handle ? values in the training data
- Previous work:
  - built a separate ? branch
  - problematic
- This work:
  - deals with unknown values inside the training set
  - no branch for ? is built
  - examples with ? are gathered inside the internal nodes
Slide 12: A Tree-Building Example
- A node holds P positive and N negative examples; P0 positives and N0 negatives among them have an unknown (?) value of a potential attribute A, whose test cost is C
- Splitting on A sends the known-value examples into branch 1 (P1 positives, N1 negatives) and branch 2 (P2 positives, N2 negatives); the ? examples stay at the internal node
- If the ? examples are not split further and P0 × FN > N0 × FP, they are labeled positive, contributing a misclassification cost of N0 × FP
- Total cost = total test cost + total misclassification cost
- Total test cost = (P1 + N1 + P2 + N2) × C
- Total misclassification cost = N1 × FP + P2 × FN + N0 × FP (branch 1 labeled positive, branch 2 labeled negative, ? examples labeled positive)
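To make the computation concrete, here is a minimal Python sketch of the criterion (variable names are hypothetical; it generalizes the formulas above slightly by letting each branch, and the ? group, take whichever label is cheaper, per the labeling rule on slide 10):

```python
def leaf_cost(p, n, FP, FN):
    """Misclassification cost of labeling a group of p positive and n negative
    examples as one leaf: positive costs n * FP, negative costs p * FN;
    take the cheaper label (the rule from slide 10)."""
    return min(n * FP, p * FN)

def split_total_cost(p1, n1, p2, n2, p0, n0, C, FP, FN):
    """Total cost of splitting on attribute A with test cost C (slide 12):
    every example with a known value of A pays the test cost; the two
    branches and the ? examples kept at the internal node each contribute
    their leaf misclassification cost."""
    total_test_cost = (p1 + n1 + p2 + n2) * C
    total_misclassification_cost = (leaf_cost(p1, n1, FP, FN)
                                    + leaf_cost(p2, n2, FP, FN)
                                    + leaf_cost(p0, n0, FP, FN))
    return total_test_cost + total_misclassification_cost

# The attribute is worth splitting on only if this beats making the whole
# node a single leaf: leaf_cost(p1 + p2 + p0, n1 + n2 + n0, FP, FN).
```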
Slide 13: Desirable Properties
- 1. The effect of the difference between the misclassification costs and the test costs
Slide 14 (cont.)
- 2. Attributes with smaller test costs are preferred
Slide 15 (cont.)
- 3. If an attribute's test cost increases, the attribute tends to be pushed down the tree and eventually falls out of it
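Property 3 can be checked numerically with the split_total_cost sketch above (illustrative numbers only): raising the test cost C eventually makes the split more expensive than a plain leaf, so the attribute drops out of the tree.

```python
# Branch 1 = (8 pos, 1 neg), branch 2 = (1 pos, 8 neg), ? group = (1, 1),
# FP = FN = 100; making the whole node one leaf costs 1000.
no_split = leaf_cost(8 + 1 + 1, 1 + 8 + 1, 100, 100)        # 1000
cheap    = split_total_cost(8, 1, 1, 8, 1, 1, 10, 100, 100)  # 480  -> split
costly   = split_total_cost(8, 1, 1, 8, 1, 1, 50, 100, 100)  # 1200 -> no split
```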
Slide 16: Outline
- Introduction
- Building Trees with Minimal Total Costs
- Testing Strategies
- Experiments and Results
- Conclusions
Slide 17: Missing Values in Test Cases
- A new patient arrives:

  Blood test | X-ray result | Urine test | S-test
  ?          | good         | ?          | ?
Slide 18: OST Intuition
- Classify the test case with the cost-sensitive tree; whenever it reaches an attribute whose value is unknown, perform that test (paying its test cost) and follow the branch for the outcome
- Because the tree was built to minimize total cost, the tests it requests along the way are the ones expected to be worth their cost
Slide 19: Four Testing Strategies
- First: Optimal Sequential Test (OST), as sketched after the strategy list (a simple batch test would instead do all tests at once)
- Second: no test will be performed; predict with the internal node
- Third: no test will be performed; predict with a weighted sum of subtrees
- Fourth: a new tree is built dynamically for each test case, using only the known attributes
Slide 20: Six Testing Strategies (5 and 6)
- Fifth: batch test; when the test case is stopped at the first unknown attribute, all the unknown values in its subtree are tested at once
- Sixth: always test the first unknown attribute
- Baseline: sequential test in C4.5
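To make the OST strategy concrete, here is a minimal Python sketch under simple assumptions (a hypothetical Node structure, discrete attribute values, and a perform_test callback standing in for actually carrying out a test); it is a sketch of the idea, not the authors' implementation:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Node:
    attribute: Optional[str] = None    # attribute tested here; None at a leaf
    test_cost: float = 0.0             # cost of obtaining this attribute's value
    children: Dict[str, "Node"] = field(default_factory=dict)  # value -> subtree
    label: Optional[str] = None        # class label at a leaf

def ost_classify(node, case, perform_test):
    """Optimal Sequential Test: walk the cost-sensitive tree; whenever the
    case reaches an attribute whose value is unknown (?), pay for that test,
    record the outcome, and continue. Returns (label, total_test_cost)."""
    spent = 0.0
    while node.attribute is not None:
        if case.get(node.attribute) is None:      # unknown value: do the test
            case[node.attribute] = perform_test(node.attribute)
            spent += node.test_cost
        node = node.children[case[node.attribute]]
    return node.label, spent

# Example: a one-split tree on Fever; the patient's Fever is unknown, so OST
# orders the Fever test (cost 10) and then predicts.
tree = Node(attribute="Fever", test_cost=10.0,
            children={"high": Node(label="Yes"), "normal": Node(label="No")})
print(ost_classify(tree, {"Fever": None}, lambda a: "high"))  # ('Yes', 10.0)
```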
Slide 21: Outline
- Introduction
- Building Trees with Minimal Total Costs
- Testing Strategies
- Experiments and Results
- Conclusions
Slide 22: Experiment Settings
- Five binary-class datasets
- 60/40 split for training/testing, repeated 5 times
- Unknown values in training/test examples are selected randomly with a specified probability
- Also compared to the C4.5 tree, using OST for testing
Slide 23: Results with Different Percentages of Unknown Values
[Chart: total cost of each strategy as the percentage of unknown values grows; legend includes the no-test and distributed (weighted-sum) strategies]
- OST is best; M4 and C4.5 are next; M3 is worst
- OST's cost does not increase with more ? values; overall, the others' costs do
Slide 24: Results with Different Test Costs
[Chart: total cost of each strategy as test costs increase; legend includes the no-test and distributed (weighted-sum) strategies]
- With large test costs, OST, M2, M3, and M4 perform similarly
- C4.5 is much worse (its tree building is cost-insensitive)
Slide 25: Results with Unbalanced Class Costs
- With large test costs, OST, M2, and M4 perform similarly
- C4.5 is much worse (its tree building is cost-insensitive)
- M3 is worse than M2 (M3 is the strategy used in C4.5)
Slide 26: Comparing OST and C4.5 Across Six Datasets
- OST always outperforms C4.5
Slide 27: Outline
- Introduction
- Building Trees with Minimal Total Costs
- Testing Strategies
- Experiments and Results
- Conclusions
Slide 28: Conclusions
- New tree-building algorithm for minimal total costs
- Desirable properties
- Computationally efficient (similar to C4.5)
- Test strategies (OST and batch) are very effective
- Can solve many real-world diagnosis problems
Slide 29: Future Work
- More intelligent batch test methods
  - Consider the cost of an additional batch test
  - Optimal sequential batch test: batch 1 = (test 1, test 2); batch 2 = (test 3, test 4, test 5); ...
- Other learning algorithms with minimal total cost
  - A wrapper that works for any black box