Lecture 7
Discussion (KDD 2 of 3): Feature Selection for KDD
Tuesday, December 7, 1999
William H. Hsu
Department of Computing and Information Sciences, KSU
http://www.cis.ksu.edu/~bhsu
Readings: Paper 2; Liu and Motoda, Chapter 3

Lecture Outline

- Readings: Liu and Motoda
  - Feature Selection for Knowledge Discovery and Data Mining
  - Chapter 3: feature selection aspects
- What is Feature Selection?
- Generation Scheme
  - How do we generate subsets?
  - Forward, backward, bidirectional, random, opportunistic
- Evaluation Measure
  - How do we tell how good a candidate subset is?
  - Accuracy, consistency, scores (information gain, cross entropy, variance, etc.)
- Search Strategy
  - How do we systematically search for a good subset?
  - Blind (uninformed) search
  - Heuristic (informed) search
- Next Class: Presentation

What is Feature Selection?

- Problem: choosing inputs x for supervised learning
- Applications
  - Concept learning for monitoring
  - Extraction of temporal features
  - Sensor and data fusion
- Solutions
  - Decomposition of spatiotemporal data
  - Attribute-driven problem redefinition
  - Constructive induction
  - Model selection
- Approach
  - Hierarchy of temporal submodels
  - Probabilistic subnetworks
    - ANNs
    - Bayesian networks
  - Quantitative (metric-based) model selection

Issues: Generation Scheme and Evaluation Measure

- Generation Scheme
  - Directed subset construction
    - Forward: start with Ø and grow until U(S) is high enough (sketched in code below)
    - Backward: start with the full set F and shrink while U(S) is still high enough
    - Bidirectional: meet in the middle (S, F boundaries)
  - Random: iterative improvement (cf. simulated annealing) using F
  - Opportunistic: prior knowledge guides generation (compare heuristic search)
- Evaluation Measure
  - Correlation (MAXR): ρ(xi, y) = Cov(xi, y) / sqrt(Var(xi) · Var(y))
  - Accuracy
  - Consistency
  - Classical scores
    - Information gain
    - Cross entropy
    - Variance
    - Many others (Gini coefficient, dependence)
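
A minimal sketch of the forward generation scheme above. Here U(S) is taken to be the mean absolute correlation ρ(xi, y) over the chosen features, a simplification of the MAXR measure; the threshold tau and the stopping rule are illustrative assumptions, not from the slides.

    import numpy as np

    def corr(xi, y):
        # rho(xi, y) = Cov(xi, y) / sqrt(Var(xi) * Var(y))
        return np.cov(xi, y)[0, 1] / np.sqrt(np.var(xi, ddof=1) * np.var(y, ddof=1))

    def utility(X, y, S):
        # U(S): mean |rho| over the features in S (illustrative stand-in for MAXR)
        return 0.0 if not S else float(np.mean([abs(corr(X[:, j], y)) for j in S]))

    def forward_select(X, y, tau=0.9):
        S, F = [], set(range(X.shape[1]))     # Forward: start with the empty set
        while F and utility(X, y, S) < tau:   # grow until U(S) is high enough
            best = max(F, key=lambda j: utility(X, y, S + [j]))
            if utility(X, y, S + [best]) <= utility(X, y, S):
                break                         # no remaining feature improves U(S)
            S.append(best)
            F.remove(best)
        return S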

Search: Subset Inclusion State Space

- Poset relation: set inclusion, A ⊇ B (B is a subset of A)
- Up operator: DELETE (remove one feature)
- Down operator: ADD (add one feature; see the sketch below)
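
This state space is easy to make concrete. A minimal sketch, assuming states are frozensets of feature indices, with DELETE and ADD as the two move operators:

    # States in the subset-inclusion poset are feature subsets; ADD and
    # DELETE generate neighboring states, and the forward/backward
    # generation schemes are searches from the two ends of the lattice.
    def add_successors(state, all_features):
        return [state | {f} for f in all_features - state]  # ADD: grow by one feature

    def delete_successors(state):
        return [state - {f} for f in state]                 # DELETE: shrink by one

    empty = frozenset()          # forward search starts here
    full = frozenset(range(5))   # backward search starts here (5 features assumed)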

Feature Selection and Construction as Unsupervised Learning

- Unsupervised Learning in Support of Supervised Learning
  - Given: D ≡ labeled vectors (x, y)
  - Return: D' ≡ new training examples (x', y')
  - Constructive induction: transformation step in KDD
    - Feature construction: generic term
    - Cluster definition
- Feature Construction (Front End)
  - Synthesizing new attributes (see the sketch below)
    - Logical: x1 ∧ ¬x2; arithmetic: x1 + x5 / x2
    - Other synthetic attributes: f(x1, x2, …, xn), etc.
  - Dimensionality-reducing projection, feature extraction
  - Subset selection: finding relevant attributes for a given target y
  - Partitioning: finding relevant attributes for given targets y1, y2, …, yp
- Cluster Definition (Back End)
  - Form, segment, and label clusters to get intermediate targets y'
  - Change of representation: find good (x', y') for learning target y
- x ≡ (x1, …, xp)
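
For the front end, synthesizing attributes is literal column arithmetic. A minimal sketch, assuming a numeric matrix X with at least five columns and reading the slide's examples as the boolean test x1 ∧ ¬x2 and the arithmetic expression x1 + x5/x2 (the zero thresholds are illustrative):

    import numpy as np

    def construct_features(X):
        # Assumes X has >= 5 columns; x1, x2, x5 follow the slide's 1-based naming.
        x1, x2, x5 = X[:, 0], X[:, 1], X[:, 4]
        logical = (x1 > 0) & ~(x2 > 0)   # x1 AND (NOT x2), thresholded at 0
        arith = x1 + x5 / x2             # x1 + x5 / x2 (assumes x2 has no zeros)
        return np.column_stack([X, logical.astype(float), arith])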

Wrappers for Performance Enhancement

- Wrappers
  - Outer loops for improving inducers
  - Use inducer performance to optimize (see the sketch below)
- Applications of Wrappers
  - Combining knowledge sources
    - Committee machines (static): bagging, stacking, boosting
    - Other sensor and data fusion
  - Tuning hyperparameters
    - Number of ANN hidden units
    - GA control parameters
    - Priors in Bayesian learning
  - Constructive induction
    - Attribute (feature) subset selection
    - Feature construction
- Implementing Wrappers
  - Search [Kohavi, 1995]
  - Genetic algorithm
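
A minimal wrapper sketch in the spirit of the Kohavi (1995) search: the evaluation measure is the inducer's own cross-validated accuracy, and the outer loop hill-climbs with the ADD operator. The scikit-learn decision tree is a modern stand-in for the inducer (MLC++'s FSS wrapper, listed on the resources slide, played this role historically); any train/score learner would do.

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    def wrapper_accuracy(X, y, S):
        # U(S) = k-fold CV accuracy of the inducer restricted to features S
        return cross_val_score(DecisionTreeClassifier(), X[:, S], y, cv=5).mean()

    def wrapper_select(X, y):
        S, F, best = [], set(range(X.shape[1])), 0.0
        while F:
            scores = {j: wrapper_accuracy(X, y, S + [j]) for j in F}
            j, score = max(scores.items(), key=lambda kv: kv[1])
            if score <= best:
                break                 # stop when no ADD improves CV accuracy
            S.append(j)
            F.remove(j)
            best = score
        return S, best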

Supervised Learning Framework

Case Study: Automobile Insurance Risk Analysis

Terminology

- Supervised Learning
  - Inducer
  - Supervised inductive learning framework (L, H)
    - L: learning algorithm; H: hypothesis space (language)
  - Relevance determination: finding inputs that are important to the performance element (e.g., regression or classification)
- Feature Selection
  - Related terms: feature, attribute, variable
  - Definition: the problem of determining, for a given inducer, which features to use
  - Related problems: feature extraction, construction (synthesis), partitioning
- Methods for Feature Selection
  - Feature ranking (see the sketch below)
  - Subset selection: minimum subset (Min-Set)
  - Set generation (regression): sequential forward (forward selection), sequential backward (backward elimination), bidirectional, random
  - Search strategies: uninformed, informed
  - Filters vs. wrappers
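
In contrast with the wrapper above, a filter ranks features independently of any inducer. A minimal sketch of feature ranking by information gain, assuming discrete-valued attributes and class labels:

    import math
    from collections import Counter

    def entropy(labels):
        n = len(labels)
        return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

    def info_gain(column, labels):
        # Gain = H(labels) - sum over values v of P(v) * H(labels | column = v)
        n, rem = len(labels), 0.0
        for v in set(column):
            sub = [y for x, y in zip(column, labels) if x == v]
            rem += len(sub) / n * entropy(sub)
        return entropy(labels) - rem

    def rank_features(rows, labels):
        # rows: list of feature tuples; returns column indices, best gain first
        cols = list(zip(*rows))
        return sorted(range(len(cols)), key=lambda j: -info_gain(cols[j], labels))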

Summary Points

- Feature Selection and Knowledge Discovery in Databases (KDD)
  - Virtuous cycle of data mining: iterative refinement
  - Feedback from supervised learning
- Role of Feature Selection in Data Mining
  - Relevance determination
- Methodologies
  - Filters vs. wrappers
  - Generation scheme, evaluation measure, search strategy
- Resources Online
  - MLC++
    - FSS wrapper
    - Many inducers, including ID3, OC1
    - http://www.sgi.com/Technology/mlc
  - Jenesis
    - Part of NCSA D2K: http://lorax.ncsa.uiuc.edu
    - KSU KDD Group: http://www.kddresearch.org/Info
  - C4.5 / C5.0: http://www.rulequest.com