Title: Feature Selection for Regression Problems
1 Feature Selection for Regression Problems
- M. Karagiannopoulos, D. Anyfantis, S. B. Kotsiantis, P. E. Pintelas
- Educational Software Development Laboratory and Computers and Applications Laboratory
- Department of Mathematics, University of Patras, Greece
2 Scope
- To investigate the most suitable wrapper feature selection technique (if any) for some well-known regression algorithms.
3 Contents
- Introduction
- Feature selection techniques
- Wrapper algorithms
- Experiments
- Conclusions
4 Introduction
- What is the feature subset selection problem?
- It occurs prior to the learning (induction) algorithm.
- It is the selection of the relevant features (variables) that influence the prediction of the learning algorithm.
5 Why is feature selection important?
- It may improve the performance of the learning algorithm.
- The learning algorithm may not scale up to the size of the full feature set, either in sample size or in time.
- It allows us to better understand the domain.
- It is cheaper to collect a reduced set of features.
6 Characterising features
- Generally, features are characterised as:
- Relevant: features that have an influence on the output and whose role cannot be assumed by the rest.
- Irrelevant: features that have no influence on the output; their values could be generated at random for each example.
- Redundant: a redundancy exists whenever a feature can take the role of another (perhaps the simplest way to model redundancy).
7 Typical Feature Selection - First step
[Diagram: the original feature set is fed to (1) Generation, which produces a candidate subset; (2) Evaluation measures the goodness of the subset; (3) a Stopping Criterion either loops back to Generation (No) or proceeds (Yes) to (4) Validation.]
8 Typical Feature Selection - Second step
- Measures the goodness of the subset.
- Compares it with the previous best subset; if found better, it replaces the previous best subset.
9 Typical Feature Selection - Third step
- Based on the generation procedure:
- a pre-defined number of features
- a pre-defined number of iterations
- Based on the evaluation function:
- whether the addition or deletion of a feature no longer produces a better subset
- whether an optimal subset according to some evaluation function has been reached
10 Typical Feature Selection - Fourth step
- Validation is basically not part of the feature selection process itself; it compares the results with already established results or with results from competing feature selection methods. A sketch of the whole selection loop follows below.
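- A minimal sketch (in Python) of the four-step loop above, with hypothetical generate and evaluate functions standing in for a concrete generation procedure and evaluation function; it illustrates the scheme only and is not the method used in the experiments.

def select_features(generate, evaluate, max_iterations=100):
    """Generic loop: (1) generation, (2) evaluation, (3) stopping criterion.
    (4) Validation of the returned subset happens outside this loop."""
    best_subset = frozenset()
    best_score = evaluate(best_subset)
    for _ in range(max_iterations):            # stop after a pre-defined number of iterations
        candidate = generate(best_subset)      # step 1: generate a candidate subset
        score = evaluate(candidate)            # step 2: measure the goodness of the subset
        if score > best_score:                 # keep the better of the two subsets
            best_subset, best_score = candidate, score
        else:
            break                              # step 3: stop when no better subset is produced
    return best_subset                         # step 4: validate this subset afterwards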
11 Categorization of feature selection techniques
- Feature selection methods are grouped into two broad groups:
- Filter methods, which take the set of data (features), attempt to trim some, and then hand the new set of features to the learning algorithm.
- Wrapper methods, which use the accuracy of the learning algorithm itself as the evaluation measure. Both styles of evaluation are sketched below.
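- A minimal sketch of the two evaluation styles, assuming a scikit-learn-style regressor; the filter score below uses the absolute correlation with the target (one common choice), and DecisionTreeRegressor is only a stand-in for the learners used in the study.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor  # illustrative stand-in for RepTree

def filter_score(X, y, feature):
    """Filter: score a single feature from the data alone (here, |correlation| with y)."""
    return abs(np.corrcoef(X[:, feature], y)[0, 1])

def wrapper_score(X, y, subset):
    """Wrapper: score a feature subset by the cross-validated accuracy of the learner itself."""
    model = DecisionTreeRegressor()
    # negative mean squared error, so that higher is better (as with the filter score)
    return cross_val_score(model, X[:, subset], y,
                           scoring="neg_mean_squared_error", cv=10).mean()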
12 Argument for wrapper methods
- The estimated accuracy of the learning algorithm is the best available heuristic for measuring the value of features.
- Different learning algorithms may perform better with different feature sets, even if they are using the same training set.
13 Wrapper selection algorithms (1)
- The simplest method is forward selection (FS). It starts with the empty set and greedily adds features one at a time (without backtracking), as sketched below.
- Backward stepwise selection (BS) starts with all features in the feature set and greedily removes them one at a time (without backtracking).
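- A minimal sketch of greedy forward selection, assuming the wrapper_score function from the earlier sketch (higher is better); BS is the mirror image, starting from the full set and removing one feature at a time.

def forward_selection(X, y, n_features):
    """Greedy FS: repeatedly add the single best feature until no addition improves the score."""
    selected = []
    remaining = list(range(n_features))
    best_score = float("-inf")
    while remaining:
        # score every one-feature expansion of the current subset
        scores = {f: wrapper_score(X, y, selected + [f]) for f in remaining}
        best_feature, score = max(scores.items(), key=lambda kv: kv[1])
        if score <= best_score:        # no expansion helps: stop (no backtracking)
            break
        selected.append(best_feature)
        remaining.remove(best_feature)
        best_score = score
    return selected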
14 Wrapper selection algorithms (2)
- The Best First search starts with an empty set of features and generates all possible single-feature expansions. The subset with the highest evaluation is chosen and is expanded in the same manner by adding single features (with backtracking). Best First search can be combined with forward (BFFS) or backward (BFBS) selection.
- Genetic algorithm selection (GS). A solution is typically a fixed-length binary string representing a feature subset; the value of each position in the string represents the presence or absence of a particular feature. The algorithm is an iterative process where each successive generation is produced by applying genetic operators such as crossover and mutation to the members of the current generation, as sketched below.
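- A minimal sketch of the binary-string representation and one generation step, again assuming wrapper_score as the fitness function; the population handling, crossover scheme and mutation rate are illustrative choices, not the settings used in the experiments.

import random

def random_individual(n_features):
    """A solution: fixed-length binary string, 1 = feature present, 0 = feature absent."""
    return [random.randint(0, 1) for _ in range(n_features)]

def crossover(a, b):
    """Single-point crossover of two parent strings."""
    point = random.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(individual, rate=0.05):
    """Flip each bit with a small probability."""
    return [1 - bit if random.random() < rate else bit for bit in individual]

def next_generation(population, X, y):
    """Produce the next generation from the fitter half of the current one."""
    def fitness(individual):
        subset = [i for i, bit in enumerate(individual) if bit]
        return wrapper_score(X, y, subset) if subset else float("-inf")
    parents = sorted(population, key=fitness, reverse=True)[: len(population) // 2]
    return [mutate(crossover(*random.sample(parents, 2))) for _ in range(len(population))]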
15 Experiments
- For the purpose of the present study, we used 4 well-known learning algorithms (RepTree, M5rules, K, SMOreg), the presented feature selection algorithms, and 12 datasets from the UCI repository.
16 Methodology of experiments
- The whole training set was divided into ten mutually exclusive and equal-sized subsets, and for each subset the learner was trained on the union of all the other subsets.
- The best features are selected according to the feature selection algorithm, and the performance of the subset is measured by how well it predicts the values of the test instances.
- This cross-validation procedure was run 10 times for each algorithm, and the average value of the 10 cross-validations was calculated (see the sketch below).
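- A minimal sketch of this protocol, assuming scikit-learn and re-using the hypothetical forward_selection and DecisionTreeRegressor stand-ins from the earlier sketches; mean absolute error is used only as an illustrative regression metric, since the slides do not state which error measure was reported.

import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeRegressor

def repeated_cv_error(X, y, n_repeats=10, n_folds=10):
    """10 runs of 10-fold CV: select features on each training split, test on the held-out fold."""
    errors = []
    for repeat in range(n_repeats):
        folds = KFold(n_splits=n_folds, shuffle=True, random_state=repeat)
        for train_idx, test_idx in folds.split(X):
            X_train, y_train = X[train_idx], y[train_idx]
            X_test, y_test = X[test_idx], y[test_idx]
            subset = forward_selection(X_train, y_train, X.shape[1])  # any selector could be plugged in
            model = DecisionTreeRegressor().fit(X_train[:, subset], y_train)
            errors.append(mean_absolute_error(y_test, model.predict(X_test[:, subset])))
    return float(np.mean(errors))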
17 Experiment with regression tree - RepTree
- BS is a slightly better feature selection method (on average) than the others for RepTree.
18 Experiment with rule learner - M5rules
- BS, BFBS and GS are the best feature selection methods (on average) for the M5rules learner.
19 Experiment with instance-based learner - K
- BS and BFBS are the best feature selection methods (on average) for the K algorithm.
20 Experiment with SMOreg
- All feature selection methods give similar results for SMOreg.
21 Conclusions
- None of the described feature selection algorithms is superior to the others on all data sets for a specific learning algorithm.
- None of the described feature selection algorithms is superior to the others on all data sets.
- Backward selection strategies are very inefficient for large-scale datasets, which may have hundreds of original features.
- Forward selection wrapper methods are less able to improve the performance of a given learner, but they are less expensive in terms of computational effort and use fewer features for the induction.
- Genetic selection typically requires a large number of evaluations to reach a minimum.
22 Future Work
- We will use a light filter feature selection procedure as a preprocessing step in order to reduce the computational cost of the wrapping procedure without harming accuracy.