Title: Three kinds of learning
1Three kinds of learning
- Supervised learning
- Learning some mapping from inputs to outputs
- Unsupervised learning
- Given data, what kinds of patterns can you
find? - Reinforcement learning
- Learn from positive negative reinforcement
2Categorical data example
Example from Ross Quinlan, Decision Tree
Induction graphics from Tom Mitchell, Machine
Learning
3Decision Tree Classification
4Which feature to split on?
Try to classify as many as possible with each
split (This is a good split)
5Which feature to split on?
These are bad splits no classifications obtained
6Improving a good split
7Decision Tree Algorithm Framework
- Use splitting criterion to decide on best
attribute to split - Each child is new decision tree recurse with
parent feature removed - If all data points in child node are same class,
classify node as that class - If no attributes left, classify by majority rule
- If no data points left, no such example seen
classify as majority class from entire dataset
8How do we know which splits are good?
- Want nodes as pure as possible
- How do we quantify randomness of a node? Want
- All elements randomness 0
- All elements randomness 0
- Half , half - randomness 1
- Draw plot
- What should randomness function look like?
9Typical solution Entropy
- pp proportion of examples
- pn proportion of examples
- A collection with low entropy is good.
10ID3 Criterion
- Split on feature with most information gain.
- Gain entropy in original node weighted sum of
entropy in child nodes
11How good is this split?
12How good is this split?
13The big picture
- Start with root
- Find attribute to split on with most gain
- Recurse
14Assessment
- How do I know how well my decision tree works?
- Training set data that you use to build decision
tree - Test set data that you did not use for training
that you use to assess the quality of decision
tree
15Issues on training and test sets
- Do you know the correct classification for the
test set? - If you do, why not include it in the training set
to get a better classifier? - If you dont, how can you measure the performance
of your classifier?
16Cross Validation
- Tenfold cross-validation
- Ten iterations
- Pull a different tenth of the dataset out each
time to act as a test set - Train on the remaining training set
- Measure performance on the test set
- Leave one out cross-validation
- Similar, but leave only one point out each time,
then count correct vs. incorrect
17Noise and Overfitting
- Can we always obtain a decision tree that is
consistent with the data? - Do we always want a decision tree that is
consistent with the data? - Example Predict Carleton students who become
CEOs - Features state/country of origin, GPA letter,
major, age, high school GPA, junior high GPA, ... - What happens with only a few features?
- What happens with many features?
18Overfitting
- Fitting a classifier too closely to the data
- finding patterns that arent really there
- Prevented in decision trees by pruning
- When building trees, stop recursion on irrelevant
attributes - Do statistical tests at node to determine if
should continue or not
19Examples of decision treesusing Weka
20Preventing overfitting by cross validation
- Another technique to prevent overfitting (is this
valid)? - Keep on recursing on decision tree as long as you
continue to get improved accuracy on the test set
21Ensemble Methods
- Many weak learners, when combined together, can
perform more strongly than any one by itself - Bagging Boosting many different learners,
voting on which classification - Multiple algorithms, or different features, or
both
22Bagging / Boosting
- Bagging vote to determine answer
- Run one algorithm on random subsets of data to
obtain multiple classifiers - Boosting weighted vote to determine answer
- Each iteration, weight more heavily data that
learner got wrong - What does it mean to weight more heavily for
k-nn? For decision trees? - AdaBoost is recent (1997) and has become popular,
fast
23Computational Learning Theory
24Chapter 20 up next
- Moving on to Chapter 20 statistical learning
methods - Skipping to will revisit earlier topics
(perhaps) near end of course - 20.5 Neural Networks
- 20.6 Support vector machines