Title: Feature Selection for Regression Problems
1 Feature Selection for Regression Problems
- M. Karagiannopoulos, D. Anyfantis, S. B. Kotsiantis, P. E. Pintelas
- Educational Software Development Laboratory and Computers and Applications Laboratory
- Department of Mathematics, University of Patras, Greece
2 Scope
- To investigate the most suitable wrapper feature selection technique (if any) for some well-known regression algorithms.
3 Contents
- Introduction
- Feature selection techniques
- Wrapper algorithms
- Experiments
- Conclusions
4 Introduction
- What is the feature subset selection problem?
- It occurs prior to the learning (induction) algorithm.
- It is the selection of the relevant features (variables) that influence the prediction of the learning algorithm.
5 Why is feature selection important?
- It may improve the performance of the learning algorithm.
- The learning algorithm may not scale up to the size of the full feature set, either in sample size or in time.
- It allows us to better understand the domain.
- It is cheaper to collect a reduced set of features.
6 Characterising features
- Generally, features are characterised as:
- Relevant: features that have an influence on the output and whose role cannot be assumed by the rest.
- Irrelevant: features that have no influence on the output; their values could be generated at random for each example.
- Redundant: a redundancy exists whenever a feature can take the role of another (perhaps the simplest way to model redundancy).
7 Typical Feature Selection - First step
[Diagram: the original feature set is fed to (1) Generation, which produces a candidate subset; (2) Evaluation measures the goodness of the subset; (3) a Stopping Criterion either loops back to Generation (No) or proceeds (Yes) to (4) Validation.]
8 Typical Feature Selection - Second step
- Measures the goodness of the subset.
- Compares it with the previous best subset; if found better, it replaces the previous best subset.
9 Typical Feature Selection - Third step
- Based on the generation procedure:
- a pre-defined number of features
- a pre-defined number of iterations
- Based on the evaluation function:
- whether the addition or deletion of a feature no longer produces a better subset
- whether an optimal subset according to some evaluation function has been reached
10 Typical Feature Selection - Fourth step
- Validation is basically not part of the feature selection process itself; it compares the results with already established results or with results from competing feature selection methods. A sketch of the whole selection loop follows below.
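- A minimal sketch (in Python) of the four-step loop above, with hypothetical generate and evaluate functions standing in for a concrete generation procedure and evaluation function; it illustrates the scheme only and is not the method used in the experiments.

def select_features(generate, evaluate, max_iterations=100):
    """Generic loop: (1) generation, (2) evaluation, (3) stopping criterion.
    (4) Validation of the returned subset happens outside this loop."""
    best_subset = frozenset()
    best_score = evaluate(best_subset)
    for _ in range(max_iterations):            # stop after a pre-defined number of iterations
        candidate = generate(best_subset)      # step 1: generate a candidate subset
        score = evaluate(candidate)            # step 2: measure the goodness of the subset
        if score > best_score:                 # keep the better of the two subsets
            best_subset, best_score = candidate, score
        else:
            break                              # step 3: stop when no better subset is produced
    return best_subset                         # step 4: validate this subset afterwards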
11 Categorization of feature selection techniques
- Feature selection methods are grouped into two broad groups:
- Filter methods, which take the set of data (features), attempt to trim some, and then hand the new set of features to the learning algorithm.
- Wrapper methods, which use the accuracy of the learning algorithm itself as the evaluation measure. Both styles of evaluation are sketched below.
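- A minimal sketch of the two evaluation styles, assuming a scikit-learn-style regressor; the filter score below uses the absolute correlation with the target (one common choice), and DecisionTreeRegressor is only a stand-in for the learners used in the study.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor  # illustrative stand-in for RepTree

def filter_score(X, y, feature):
    """Filter: score a single feature from the data alone (here, |correlation| with y)."""
    return abs(np.corrcoef(X[:, feature], y)[0, 1])

def wrapper_score(X, y, subset):
    """Wrapper: score a feature subset by the cross-validated accuracy of the learner itself."""
    model = DecisionTreeRegressor()
    # negative mean squared error, so that higher is better (as with the filter score)
    return cross_val_score(model, X[:, subset], y,
                           scoring="neg_mean_squared_error", cv=10).mean()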
12 Argument for wrapper methods
- The estimated accuracy of the learning algorithm is the best available heuristic for measuring the value of features.
- Different learning algorithms may perform better with different feature sets, even if they are using the same training set.
13 Wrapper selection algorithms (1)
- The simplest method is forward selection (FS). It starts with the empty set and greedily adds features one at a time (without backtracking), as sketched below.
- Backward stepwise selection (BS) starts with all features in the feature set and greedily removes them one at a time (without backtracking).
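- A minimal sketch of greedy forward selection, assuming the wrapper_score function from the earlier sketch (higher is better); BS is the mirror image, starting from the full set and removing one feature at a time.

def forward_selection(X, y, n_features):
    """Greedy FS: repeatedly add the single best feature until no addition improves the score."""
    selected = []
    remaining = list(range(n_features))
    best_score = float("-inf")
    while remaining:
        # score every one-feature expansion of the current subset
        scores = {f: wrapper_score(X, y, selected + [f]) for f in remaining}
        best_feature, score = max(scores.items(), key=lambda kv: kv[1])
        if score <= best_score:        # no expansion helps: stop (no backtracking)
            break
        selected.append(best_feature)
        remaining.remove(best_feature)
        best_score = score
    return selected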
14 Wrapper selection algorithms (2)
- The Best First search starts with an empty set of features and generates all possible single-feature expansions. The subset with the highest evaluation is chosen and is expanded in the same manner by adding single features (with backtracking). Best First search can be combined with forward (BFFS) or backward (BFBS) selection.
- Genetic algorithm selection (GS). A solution is typically a fixed-length binary string representing a feature subset; the value of each position in the string represents the presence or absence of a particular feature. The algorithm is an iterative process where each successive generation is produced by applying genetic operators such as crossover and mutation to the members of the current generation, as sketched below.
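- A minimal sketch of the binary-string representation and one generation step, again assuming wrapper_score as the fitness function; the population handling, crossover scheme and mutation rate are illustrative choices, not the settings used in the experiments.

import random

def random_individual(n_features):
    """A solution: fixed-length binary string, 1 = feature present, 0 = feature absent."""
    return [random.randint(0, 1) for _ in range(n_features)]

def crossover(a, b):
    """Single-point crossover of two parent strings."""
    point = random.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(individual, rate=0.05):
    """Flip each bit with a small probability."""
    return [1 - bit if random.random() < rate else bit for bit in individual]

def next_generation(population, X, y):
    """Produce the next generation from the fitter half of the current one."""
    def fitness(individual):
        subset = [i for i, bit in enumerate(individual) if bit]
        return wrapper_score(X, y, subset) if subset else float("-inf")
    parents = sorted(population, key=fitness, reverse=True)[: len(population) // 2]
    return [mutate(crossover(*random.sample(parents, 2))) for _ in range(len(population))]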
15 Experiments
- For the purpose of the present study, we used 4 well-known learning algorithms (RepTree, M5rules, K, SMOreg), the presented feature selection algorithms, and 12 datasets from the UCI repository.
16 Methodology of experiments
- The whole training set was divided into ten mutually exclusive and equal-sized subsets, and for each subset the learner was trained on the union of all the other subsets.
- The best features are selected according to the feature selection algorithm, and the performance of the subset is measured by how well it predicts the values of the test instances.
- This cross-validation procedure was run 10 times for each algorithm, and the average value of the 10 cross-validations was calculated (see the sketch below).
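- A minimal sketch of this protocol, assuming scikit-learn and re-using the hypothetical forward_selection and DecisionTreeRegressor stand-ins from the earlier sketches; mean absolute error is used only as an illustrative regression metric, since the slides do not state which error measure was reported.

import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeRegressor

def repeated_cv_error(X, y, n_repeats=10, n_folds=10):
    """10 runs of 10-fold CV: select features on each training split, test on the held-out fold."""
    errors = []
    for repeat in range(n_repeats):
        folds = KFold(n_splits=n_folds, shuffle=True, random_state=repeat)
        for train_idx, test_idx in folds.split(X):
            X_train, y_train = X[train_idx], y[train_idx]
            X_test, y_test = X[test_idx], y[test_idx]
            subset = forward_selection(X_train, y_train, X.shape[1])  # any selector could be plugged in
            model = DecisionTreeRegressor().fit(X_train[:, subset], y_train)
            errors.append(mean_absolute_error(y_test, model.predict(X_test[:, subset])))
    return float(np.mean(errors))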
17 Experiment with regression tree - RepTree
- BS is a slightly better feature selection method (on average) than the others for RepTree.
18 Experiment with rule learner - M5rules
- BS, BFBS and GS are the best feature selection methods (on average) for the M5rules learner.
19 Experiment with instance-based learner - K
- BS and BFBS are the best feature selection methods (on average) for the K algorithm.
20 Experiment with SMOreg
- All feature selection methods give similar results for SMOreg.
21 Conclusions
- None of the described feature selection algorithms is superior to the others on all data sets for a specific learning algorithm.
- None of the described feature selection algorithms is superior to the others on all data sets.
- Backward selection strategies are very inefficient for large-scale datasets, which may have hundreds of original features.
- Forward selection wrapper methods are less able to improve the performance of a given learner, but they are less expensive in terms of computational effort and use fewer features for the induction.
- Genetic selection typically requires a large number of evaluations to reach a minimum.
22 Future Work
- We will use a light filter feature selection procedure as a preprocessing step in order to reduce the computational cost of the wrapping procedure without harming accuracy.