Title: Identifying Feature Relevance Using a Random Forest
1. Identifying Feature Relevance Using a Random Forest
2. Overview
- What is a Random Forest?
- Why Perform Relevance Identification?
- Estimating Feature Importance with a Random Forest
- Node Complexity Compensation
- Employing Feature Relevance
- Extension to Feature Selection
3. Random Forest
- Combination of base learners using Bagging
- Uses CART-based decision trees
4. Random Forest (cont.)
- Optimises split using Information Gain
- Selects a feature at random to perform each split
- The implicit feature selection of CART is thereby removed (see the sketch below)
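A minimal stand-in for this construction, using scikit-learn rather than the authors' implementation: bagged CART-style trees, entropy splits for the information-gain criterion, and `max_features=1` so that each split considers a single randomly chosen feature.

```python
# Sketch only: scikit-learn approximation of the forest described above.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=3, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,      # number of bagged base learners
    criterion="entropy",   # split quality measured by information gain
    max_features=1,        # a single feature chosen at random per split
    bootstrap=True,        # bagging: each tree sees a bootstrap sample
    random_state=0,
).fit(X, y)
```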
5. Feature Relevance Ranking
- Analyses each feature individually
- Uses measures of correlation to the target
- A feature Xi is relevant if P(Y | Xi) ≠ P(Y)
- Assumes no feature interaction
- Fails to identify relevant features in the parity problem (see the sketch below)
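A small synthetic illustration of that failure, using numpy only: in a two-bit parity problem each feature is individually uncorrelated with the target, even though two of the features jointly determine it.

```python
# Sketch: per-feature correlation fails on a parity (XOR-like) target.
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(10_000, 3))   # x2 is irrelevant
y = X[:, 0] ^ X[:, 1]                      # two-bit parity target

for j in range(X.shape[1]):
    r = np.corrcoef(X[:, j], y)[0, 1]
    print(f"feature {j}: corr with target = {r:+.3f}")   # all close to 0
```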
6. Feature Relevance Subset Methods
- Use the implicit feature selection of decision tree induction
- Wrapper methods
- Subset search methods
- Identifying Markov Blankets
- A feature Xi is relevant if there exists a subset of features S such that P(Y | Xi, S) ≠ P(Y | S)
7. Relevance Identification Using Average Information Gain
- Can identify feature interaction
- Reliability is dependent upon node composition
- Irrelevant features give non-zero relevance (see the sketch below)
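One way to compute this measure, sketched with scikit-learn tree internals as a stand-in for the paper's forest: every split's information gain is credited to the feature used at that node, and a feature's relevance is the mean of its recorded gains (no node-complexity compensation yet).

```python
# Sketch: average information gain per feature across a forest's splits.
import numpy as np
from collections import defaultdict
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=8,
                           n_informative=3, random_state=0)
forest = RandomForestClassifier(n_estimators=100, criterion="entropy",
                                max_features=1, random_state=0).fit(X, y)

gains = defaultdict(list)                 # feature index -> list of IG values
for est in forest.estimators_:
    t = est.tree_
    for node in range(t.node_count):
        left, right = t.children_left[node], t.children_right[node]
        if left == -1:                    # leaf node: no split, no gain
            continue
        n, nl, nr = (t.weighted_n_node_samples[i] for i in (node, left, right))
        ig = t.impurity[node] - (nl / n) * t.impurity[left] \
                              - (nr / n) * t.impurity[right]
        gains[t.feature[node]].append(ig)

avg_ig = {f: np.mean(v) for f, v in gains.items()}
print(sorted(avg_ig.items(), key=lambda kv: -kv[1]))   # ranked feature relevance
```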
8. Node Complexity Compensation
- Some nodes are easier to split than others
- Requires each sample to be weighted by some measure of node complexity
- Data is projected onto a one-dimensional space
- For binary classification:
9. Unique and Non-Unique Arrangements
- Some arrangements are reflections of one another (non-unique)
- Some arrangements are symmetrical about their centre (unique)
10. Node Complexity Compensation (cont.)
- Au - No. of unique arrangements (a brute-force count is sketched below)
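The closed-form expression for Au is not reproduced here; the sketch below simply counts the arrangements of positive and negative samples on the one-dimensional projection by brute force, treating an arrangement and its mirror image as the same.

```python
# Sketch: brute-force count of unique arrangements for a binary node.
# Reflections are non-unique; symmetric arrangements are their own reflection.
from itertools import combinations

def unique_arrangements(n, i):
    """Arrangements of i positives among n positions, up to reflection."""
    seen = set()
    for pos in combinations(range(n), i):
        s = tuple(1 if k in pos else 0 for k in range(n))
        seen.add(min(s, s[::-1]))          # canonical form: lexicographic min
    return len(seen)

print(unique_arrangements(4, 2))   # 4 of the C(4,2) = 6 arrangements are unique
```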
11. Information Gain Density Functions
- Node complexity compensation improves the measure of average IG
- The effect is visible when examining the IG density functions for each feature
- These are constructed by building a forest and recording the frequencies of the IG values achieved by each feature
12. Information Gain Density Functions
- RF used to construct 500 trees on an artificial dataset
- IG density functions recorded for each feature (a plotting sketch follows below)
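A sketch of how such density functions can be built empirically, reusing the `gains` dictionary collected in the average-IG sketch above; matplotlib is an assumption here, not part of the paper.

```python
# Sketch: empirical IG density function per feature, from recorded split gains.
import matplotlib.pyplot as plt

def plot_ig_densities(gains, bins=50):
    """gains: mapping of feature index -> list of IG values from a forest."""
    for f, values in sorted(gains.items()):
        plt.hist(values, bins=bins, density=True, histtype="step", label=f"x{f}")
    plt.xlabel("information gain")
    plt.ylabel("density")
    plt.legend()
    plt.show()

plot_ig_densities(gains)   # `gains` collected as in the previous sketch
```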
13. Employing Feature Relevance
- Feature Selection
- Feature Weighting
- Random Forest uses a feature sampling distribution to select each feature
- The distribution can be altered in two ways
- Parallel - updated during forest construction
- Two-stage - fixed prior to forest construction
14. Parallel
- Control update rate using confidence intervals.
- Assume information gain values have a normal distribution
- The statistic t = (x̄ − μ) / (s / √n) then has a Student's t distribution with n − 1 degrees of freedom
- Maintain the most uniform sampling distribution within the confidence bounds (see the sketch below)
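A sketch of one plausible reading of this update rule; the function name and the choice of a single common target value are illustrative assumptions, not taken from the paper. Each feature's sampling weight is pulled as close to a uniform value as the Student's t confidence interval on its mean information gain allows.

```python
# Sketch: confidence-interval-controlled update of the feature sampling distribution.
import numpy as np
from scipy import stats

def update_sampling_distribution(gains_per_feature, confidence=0.95):
    """gains_per_feature: one 1-D array of IG samples per feature."""
    lows, highs = [], []
    for g in gains_per_feature:
        n, mean = len(g), np.mean(g)
        half = stats.t.ppf(0.5 + confidence / 2, df=n - 1) * stats.sem(g)
        lows.append(max(mean - half, 0.0))   # information gain cannot be negative
        highs.append(mean + half)
    lows, highs = np.array(lows), np.array(highs)

    # Pull every weight towards one common value (as uniform as possible)
    # while keeping it inside that feature's confidence interval.
    target = np.mean((lows + highs) / 2)
    weights = np.clip(target, lows, highs)
    return weights / weights.sum()

rng = np.random.default_rng(0)
print(update_sampling_distribution([rng.exponential(s, 50) for s in (0.4, 0.1, 0.1)]))
```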
15. Convergence Rates
16. Results
- 90% of the data was used for training, 10% for testing
- Forests of 100 trees were tested and averaged over 100 trials
17. Irrelevant Features
- Average IG is the mean of a non-negative sample.
- Expected IG of an irrelevant feature is non-zero.
- Performance is degraded when there is a high proportion of irrelevant features (see the sketch below)
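A Monte Carlo illustration of the non-zero expected gain, assuming entropy-based information gain and a random binary split of random labels; the positive mean reflects the fact that IG is non-negative and only rarely exactly zero.

```python
# Sketch: the average IG of a completely irrelevant feature is still positive.
import numpy as np

rng = np.random.default_rng(0)

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def class_entropy(y):
    _, counts = np.unique(y, return_counts=True)
    return entropy(counts / counts.sum())

def random_split_ig(n=20):
    y = rng.integers(0, 2, n)                  # labels carry no signal
    mask = rng.integers(0, 2, n).astype(bool)  # split on an irrelevant feature
    if mask.all() or (~mask).all():
        return 0.0                             # degenerate split: no gain
    return class_entropy(y) - mask.mean() * class_entropy(y[mask]) \
                            - (~mask).mean() * class_entropy(y[~mask])

print(np.mean([random_split_ig() for _ in range(10_000)]))   # clearly > 0
```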
18. Expected Information Gain
- nL - No. of examples in the left descendant
- iL - No. of positive examples in the left descendant
19. Expected Information Gain
- No. of positive examples
- No. of negative examples
20. Bounds on Expected Information Gain
- The lower bound is given by
- The upper bound can be approximated as
21. Irrelevant Feature Bounds
- 100 trees built on an artificial dataset
- Average IG recorded and bounds calculated
22. Friedman
[Figure: results for FS and CFS on the Friedman dataset]
23. Simple
[Figure: results for FS and CFS on the Simple dataset]
24. Results
- 90% of the data was used for training, 10% for testing
- Forests of 100 trees were tested and averaged over 100 trials
- 100 trees constructed for feature evaluation in each trial
25. Summary
- Node complexity compensation improves the measure of feature relevance by examining node composition
- The feature sampling distribution can be updated using confidence intervals to control the update rate
- Irrelevant features can be removed by calculating their expected performance