Title: Identifying Feature Relevance Using a Random Forest
1. Identifying Feature Relevance Using a Random Forest
2. Overview
- What is a Random Forest?
- Why Perform Relevance Identification?
- Estimating Feature Importance with a Random Forest
- Node Complexity Compensation
- Employing Feature Relevance
- Extension to Feature Selection
3. Random Forest
- Combination of base learners using Bagging
- Uses CART-based decision trees
4. Random Forest (cont.)
- Optimises split using Information Gain
- Selects a feature at random to perform each split
- The implicit feature selection of CART is thereby removed (see the sketch below)
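A minimal stand-in for this construction, using scikit-learn rather than the authors' implementation: bagged CART-style trees, entropy splits for the information-gain criterion, and `max_features=1` so that each split considers a single randomly chosen feature.

```python
# Sketch only: scikit-learn approximation of the forest described above.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=3, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,      # number of bagged base learners
    criterion="entropy",   # split quality measured by information gain
    max_features=1,        # a single feature chosen at random per split
    bootstrap=True,        # bagging: each tree sees a bootstrap sample
    random_state=0,
).fit(X, y)
```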
5. Feature Relevance Ranking
- Analyses each feature individually
- Uses measures of correlation to the target
- A feature Xi is relevant if P(Y | Xi) ≠ P(Y)
- Assumes no feature interaction
- Fails to identify relevant features in the parity problem (see the sketch below)
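A small synthetic illustration of that failure, using numpy only: in a two-bit parity problem each feature is individually uncorrelated with the target, even though two of the features jointly determine it.

```python
# Sketch: per-feature correlation fails on a parity (XOR-like) target.
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(10_000, 3))   # x2 is irrelevant
y = X[:, 0] ^ X[:, 1]                      # two-bit parity target

for j in range(X.shape[1]):
    r = np.corrcoef(X[:, j], y)[0, 1]
    print(f"feature {j}: corr with target = {r:+.3f}")   # all close to 0
```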
6. Feature Relevance Subset Methods
- Use the implicit feature selection of decision tree induction
- Wrapper methods
- Subset search methods
- Identifying Markov Blankets
- A feature Xi is relevant if there exists a subset of features S such that P(Y | Xi, S) ≠ P(Y | S)
7. Relevance Identification Using Average Information Gain
- Can identify feature interaction
- Reliability is dependent upon node composition
- Irrelevant features give non-zero relevance (see the sketch below)
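One way to compute this measure, sketched with scikit-learn tree internals as a stand-in for the paper's forest: every split's information gain is credited to the feature used at that node, and a feature's relevance is the mean of its recorded gains (no node-complexity compensation yet).

```python
# Sketch: average information gain per feature across a forest's splits.
import numpy as np
from collections import defaultdict
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=8,
                           n_informative=3, random_state=0)
forest = RandomForestClassifier(n_estimators=100, criterion="entropy",
                                max_features=1, random_state=0).fit(X, y)

gains = defaultdict(list)                 # feature index -> list of IG values
for est in forest.estimators_:
    t = est.tree_
    for node in range(t.node_count):
        left, right = t.children_left[node], t.children_right[node]
        if left == -1:                    # leaf node: no split, no gain
            continue
        n, nl, nr = (t.weighted_n_node_samples[i] for i in (node, left, right))
        ig = t.impurity[node] - (nl / n) * t.impurity[left] \
                              - (nr / n) * t.impurity[right]
        gains[t.feature[node]].append(ig)

avg_ig = {f: np.mean(v) for f, v in gains.items()}
print(sorted(avg_ig.items(), key=lambda kv: -kv[1]))   # ranked feature relevance
```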
8. Node Complexity Compensation
- Some nodes are easier to split than others
- Requires each sample to be weighted by some measure of node complexity
- Data is projected onto a one-dimensional space
- For binary classification:
9. Unique and Non-Unique Arrangements
- Some arrangements are reflections of one another (non-unique)
- Some arrangements are symmetrical about their centre (unique)
10. Node Complexity Compensation (cont.)
- Au - No. of unique arrangements (a brute-force count is sketched below)
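The closed-form expression for Au is not reproduced here; the sketch below simply counts the arrangements of positive and negative samples on the one-dimensional projection by brute force, treating an arrangement and its mirror image as the same.

```python
# Sketch: brute-force count of unique arrangements for a binary node.
# Reflections are non-unique; symmetric arrangements are their own reflection.
from itertools import combinations

def unique_arrangements(n, i):
    """Arrangements of i positives among n positions, up to reflection."""
    seen = set()
    for pos in combinations(range(n), i):
        s = tuple(1 if k in pos else 0 for k in range(n))
        seen.add(min(s, s[::-1]))          # canonical form: lexicographic min
    return len(seen)

print(unique_arrangements(4, 2))   # 4 of the C(4,2) = 6 arrangements are unique
```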
11. Information Gain Density Functions
- Node complexity compensation improves the measure of average IG
- The effect is visible when examining the IG density functions for each feature
- These are constructed by building a forest and recording the frequencies of the IG values achieved by each feature
12. Information Gain Density Functions
- RF used to construct 500 trees on an artificial dataset
- IG density functions recorded for each feature (a plotting sketch follows below)
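A sketch of how such density functions can be built empirically, reusing the `gains` dictionary collected in the average-IG sketch above; matplotlib is an assumption here, not part of the paper.

```python
# Sketch: empirical IG density function per feature, from recorded split gains.
import matplotlib.pyplot as plt

def plot_ig_densities(gains, bins=50):
    """gains: mapping of feature index -> list of IG values from a forest."""
    for f, values in sorted(gains.items()):
        plt.hist(values, bins=bins, density=True, histtype="step", label=f"x{f}")
    plt.xlabel("information gain")
    plt.ylabel("density")
    plt.legend()
    plt.show()

plot_ig_densities(gains)   # `gains` collected as in the previous sketch
```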
13. Employing Feature Relevance
- Feature Selection
- Feature Weighting
- Random Forest uses a feature sampling distribution to select each feature
- The distribution can be altered in two ways
- Parallel - updated during forest construction
- Two-stage - fixed prior to forest construction
14. Parallel
- Control update rate using confidence intervals.
- Assume information gain values have a normal distribution
- The statistic t = (x̄ − μ) / (s / √n) then has a Student's t distribution with n − 1 degrees of freedom
- Maintain the most uniform sampling distribution within the confidence bounds (see the sketch below)
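A sketch of one plausible reading of this update rule; the function name and the choice of a single common target value are illustrative assumptions, not taken from the paper. Each feature's sampling weight is pulled as close to a uniform value as the Student's t confidence interval on its mean information gain allows.

```python
# Sketch: confidence-interval-controlled update of the feature sampling distribution.
import numpy as np
from scipy import stats

def update_sampling_distribution(gains_per_feature, confidence=0.95):
    """gains_per_feature: one 1-D array of IG samples per feature."""
    lows, highs = [], []
    for g in gains_per_feature:
        n, mean = len(g), np.mean(g)
        half = stats.t.ppf(0.5 + confidence / 2, df=n - 1) * stats.sem(g)
        lows.append(max(mean - half, 0.0))   # information gain cannot be negative
        highs.append(mean + half)
    lows, highs = np.array(lows), np.array(highs)

    # Pull every weight towards one common value (as uniform as possible)
    # while keeping it inside that feature's confidence interval.
    target = np.mean((lows + highs) / 2)
    weights = np.clip(target, lows, highs)
    return weights / weights.sum()

rng = np.random.default_rng(0)
print(update_sampling_distribution([rng.exponential(s, 50) for s in (0.4, 0.1, 0.1)]))
```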
15. Convergence Rates
16. Results
- 90% of the data was used for training, 10% for testing
- Forests of 100 trees were tested and averaged over 100 trials
17. Irrelevant Features
- Average IG is the mean of a non-negative sample.
- Expected IG of an irrelevant feature is non-zero.
- Performance is degraded when there is a high proportion of irrelevant features (see the sketch below)
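A Monte Carlo illustration of the non-zero expected gain, assuming entropy-based information gain and a random binary split of random labels; the positive mean reflects the fact that IG is non-negative and only rarely exactly zero.

```python
# Sketch: the average IG of a completely irrelevant feature is still positive.
import numpy as np

rng = np.random.default_rng(0)

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def class_entropy(y):
    _, counts = np.unique(y, return_counts=True)
    return entropy(counts / counts.sum())

def random_split_ig(n=20):
    y = rng.integers(0, 2, n)                  # labels carry no signal
    mask = rng.integers(0, 2, n).astype(bool)  # split on an irrelevant feature
    if mask.all() or (~mask).all():
        return 0.0                             # degenerate split: no gain
    return class_entropy(y) - mask.mean() * class_entropy(y[mask]) \
                            - (~mask).mean() * class_entropy(y[~mask])

print(np.mean([random_split_ig() for _ in range(10_000)]))   # clearly > 0
```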
18. Expected Information Gain
- nL - No. of examples in the left descendant
- iL - No. of positive examples in the left descendant
19. Expected Information Gain
- No. of positive examples
- No. of negative examples
20. Bounds on Expected Information Gain
- The lower bound is given by
- The upper bound can be approximated as
21. Irrelevant Feature Bounds
- 100 trees built on an artificial dataset
- Average IG recorded and bounds calculated
22. Friedman
[Figure: results for FS and CFS on the Friedman dataset]
23. Simple
[Figure: results for FS and CFS on the Simple dataset]
24. Results
- 90% of the data was used for training, 10% for testing
- Forests of 100 trees were tested and averaged over 100 trials
- 100 trees constructed for feature evaluation in each trial
25. Summary
- Node complexity compensation improves the measure of feature relevance by examining node composition
- The feature sampling distribution can be updated using confidence intervals to control the update rate
- Irrelevant features can be removed by calculating their expected performance