1
GAs and Feature Weighting
  • Rebecca Fiebrink
  • MUMT 611
  • 31 March 2005

2
Outline
  • Genetic algorithms
  • History of GAs
  • How GAs work
  • Simple model
  • Variations
  • Feature selection and weighting
  • Feature selection
  • Feature weighting
  • Using GAs for feature weighting
  • Siedlecki and Sklansky
  • Punch et al.
  • The ACE project

3
Part 1: Genetic Algorithms
4
History of GAs
  • 1859: Charles Darwin, On the Origin of Species
  • 1950s–60s: Computers used to simulate evolution
  • 1960s–70s: John Holland invents genetic algorithms
  • Adaptation in Natural and Artificial Systems, book published 1975
  • Context of adaptive systems, not just optimization
  • 1980s–present: Further exploration, widespread adoption
  • Kenneth De Jong
  • Goldberg: Genetic Algorithms in Search, Optimization, and Machine Learning (classic textbook, published 1989)
  • Related areas: evolutionary programming, genetic programming

(Carnegie Mellon GA online FAQ, Cantu-Paz 2000)
5
How GAs work
  • The problem is a search space
  • Variable parameters → dimensions
  • The problem has minima and maxima within this space
  • These minima and maxima are hard to find
  • Potential solutions describe points in this space
  • It is easy to calculate the goodness of any one solution and compare it to other solutions
  • Maintain a set of potential solutions
  • Guess a set of initial solutions
  • Combine these solutions with each other
  • Keep good solutions and discard bad ones
  • Repeat the combination and selection process for some time

6
A simple GA
Flow: Start → Evaluate Population → Terminate? (if yes, Stop) → Select Parents → Produce Offspring → Mutate Offspring → back to Evaluate Population
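
This flow maps directly onto a short loop. Below is a minimal sketch in Python, assuming a bit-string encoding, a fixed generation budget as the termination test, two-way tournament selection, single-point crossover, and bit-flip mutation; all function names and parameter values are illustrative, not from the presentation.

```python
import random

def run_ga(fitness, n_bits, pop_size=50, generations=100,
           crossover_rate=0.7, mutation_rate=0.01):
    """Minimal generational GA over bit-string chromosomes."""
    # Guess a set of initial solutions
    pop = [[random.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]
    for _ in range(generations):             # Terminate? (fixed budget here)
        scores = [fitness(c) for c in pop]   # Evaluate Population

        def select():                        # Select Parents (2-way tournament)
            a, b = random.sample(range(pop_size), 2)
            return pop[a] if scores[a] >= scores[b] else pop[b]

        nxt = []
        while len(nxt) < pop_size:           # Produce Offspring
            p1, p2 = select(), select()
            if random.random() < crossover_rate:
                cut = random.randrange(1, n_bits)   # single-point crossover
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = p1[:], p2[:]
            for c in (c1, c2):               # Mutate Offspring
                for i in range(n_bits):
                    if random.random() < mutation_rate:
                        c[i] ^= 1
            nxt.extend([c1, c2])
        pop = nxt[:pop_size]
    return max(pop, key=fitness)             # Stop: return the best survivor
```

In Part 2, `fitness` would train and validate an actual classifier on the feature subset or weighting that the chromosome encodes.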
7
GA terminology
  • Population: The set of all current potential solutions
  • Deme: A subset of the population that interbreeds (might be the whole population)
  • Chromosome: A solution structure (often a binary string) containing one or more genes
  • Gene: A feature or parameter
  • Allele: A specific value for a gene
  • Phenotype: A member of the population; a real-valued solution vector
  • Generation: A complete cycle of evaluation, selection, crossover, and mutation in a deme
  • Locus: A particular position (bit) on the chromosome string
  • Crossover: The process of combining two chromosomes; the simplest method is single-point crossover, where two chromosomes swap the parts on either side of a random locus
  • Mutation: A random change to a phenotype (i.e., changing an allele)

8
Crossover Illustrated
Parent 1: 1011010111101
Parent 2: 1101010010100

Single-point crossover at locus 6 swaps the tails:

Child 1:  101101 0010100  (head of Parent 1, tail of Parent 2)
Child 2:  110101 0111101  (head of Parent 2, tail of Parent 1)
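
A sketch of the operation in Python; the `locus=6` call reproduces the illustrated example exactly (the function name is ours, not the presentation's).

```python
import random

def single_point_crossover(p1, p2, locus=None):
    """Swap the parts of two chromosome strings on either side of a locus."""
    if locus is None:
        locus = random.randrange(1, len(p1))  # random cut point
    return p1[:locus] + p2[locus:], p2[:locus] + p1[locus:]

# Reproduces the slide's example (cut after the sixth bit):
c1, c2 = single_point_crossover("1011010111101", "1101010010100", locus=6)
assert c1 == "1011010010100" and c2 == "1101010111101"
```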
9
GA Variations
  • Variables
  • Population size
  • Selection method
  • Crossover method
  • Mutation rate
  • How to encode a solution in a chromosome
  • No single choice is best for all problems (Wolpert & Macready 1997, cited in Cantu-Paz 2000).

10
More variations: Parallel GAs
  • Categories
  • Single-population Master/Slave
  • Multiple population
  • Fine-grained
  • Hybrids

(Cantu-Paz 2000)
11
Applications of GAs
  • NP-hard problems (e.g., the Traveling Salesman Problem)
  • Airfoil design
  • Noise control
  • Fluid dynamics
  • Circuit partitioning
  • Image processing
  • Liquid crystals
  • Water networks
  • Music
  • (Miettinen et al. 1999, Coley 1999)

12
Part 2: Feature Selection and Weighting
13
Features
  • What are they?
  • A classifier's handle on the problem
  • Numerical or binary values representing independent or dependent attributes of each instance to be classified
  • What are they in music?
  • Spectral centroid
  • Number of instruments
  • Composer
  • Presence of banjo
  • Beat histogram
  • ...

14
Feature Selection: Why?
  • Curse of dimensionality: the size of the training set must grow exponentially with the dimensionality of the feature space
  • The set of best features to use may vary according to classification scheme, goal, or data set
  • We need a way to select only a good subset of all possible features
  • We may or may not be interested in the single best subset

15
Feature Selection: How?
  • Common approaches
  • Dimensionality reduction (principal component analysis or factor analysis)
  • Experimentation: choose a subset, train a classifier with it, and evaluate its performance
  • Example: <centroid, length, avg_dynamic>
  • A piece might have the value vector <3.422, 523, 98>
  • Try selections <1, 1, 0>, <0, 1, 1>, ... (1 = yes, 0 = no)
  • Which works best? (A sketch of this evaluation step follows below.)
  • Setbacks in experimentation
  • For n potential features, there are 2^n possible subsets: a huge search space!
  • Evaluating each classifier takes time
  • Choose vectors using sequential search, branch and bound, or GAs
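
One way to realize the evaluation step, as a sketch: apply the binary selection vector as a mask, then score a classifier on the surviving features. The classifier and helper names are placeholders, not from the slides; scikit-learn's k-NN and cross-validation are used only as plausible stand-ins.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def selection_fitness(mask, X, y):
    """Fitness of one candidate subset: 1 keeps a feature, 0 drops it."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():                  # empty subset: give the worst score
        return 0.0
    X_sub = X[:, mask]                  # e.g., <1, 1, 0> keeps the first two
    clf = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(clf, X_sub, y, cv=5).mean()
```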

16
Feature Weighting
  • Among a group of selected (useful) features, some may be more useful than others
  • Weight the features for optimal classifier performance
  • Experimental approach: similar to feature selection
  • Example: say <1, 0, 1> is the optimal selection for the <centroid, length, avg_dynamic> feature set
  • Try weights like <.5, 0, 1>, <1, 0, .234983>, ...
  • The practical and theoretical constraints of feature selection are magnified

17
Part 3: Using GAs for Feature Weighting
18
GAs and Feature Weighting
  • A chromosome is a vector of weights
  • Each gene corresponds to a feature weight
  • E.g., <1, 0, 1, 0, 1, 1, 1> for selection
  • E.g., <.3, 0, .89, 0, .39, .91, .03> for weighting
  • The fitness of a chromosome is a measure of its performance when used to train and validate an actual classifier (see the sketch below)
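
Weighting fits the same frame as selection: since k-NN distances depend directly on feature scale, multiplying each feature column by its weight "warps" the space the classifier sees, and a weight of 0 reproduces deselection. A sketch under the same assumptions as before (scikit-learn k-NN as a stand-in classifier; the names are ours):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def weighting_fitness(weights, X, y):
    """Fitness of a weight-vector chromosome: accuracy after stretching
    each feature axis by its weight (0 reproduces deselection)."""
    X_w = X * np.asarray(weights)       # scale each feature column
    clf = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(clf, X_w, y, cv=5).mean()
```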

19
Siedlecki and Sklansky
  • 1989: "A note on genetic algorithms for large-scale feature selection"
  • Choose a feature subset for a k-NN classifier
  • Propose using GAs when the feature set is larger than 20 features
  • Binary chromosomes (0/1 for each feature)
  • Goal: seek the smallest/least costly subset of features for which classifier performance stays above a certain level (one way to encode this goal as a fitness function is sketched below)
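
Their goal combines two objectives: meet a performance threshold, and among subsets that do, prefer smaller ones. A common penalty-based encoding of that goal, as a sketch (the constants and the penalty form are assumptions, not necessarily Siedlecki and Sklansky's exact formulation):

```python
def subset_fitness(mask, error_rate, max_error=0.05, penalty=1000.0):
    """Prefer the smallest feature subset whose classifier error stays
    below the threshold; a large penalty makes the constraint dominate."""
    n_selected = sum(mask)
    overshoot = max(0.0, error_rate - max_error)  # constraint violation
    return -(n_selected + penalty * overshoot)    # higher is better
```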

20
Results
  • Compared GAs with exhaustive search, branch and bound, and (p, q)-search on a sample problem
  • Branch and bound and (p, q)-search did not come close enough (within 1% error) to finding a set that performed optimally
  • Exhaustive search used 2^24 evaluations
  • The GA found optimal or near-optimal solutions with 1000-3000 evaluations, a huge savings over both exhaustive search and branch and bound
  • The GA exhibits a slightly more than linear increase in time complexity as features are added
  • The GA also outperforms the other methods on a real problem with 30 features

21
Punch et al.
  • 1993: "Further research on feature selection and classification using genetic algorithms"
  • Approach: use GAs for feature weighting for a k-NN classifier ("warping" the dimensions of the classifier's feature space)
  • Each chromosome is a vector of real-valued weights
  • Weights mapped exponentially to the range [0.01, 10), or to 0 (a sketch of one such mapping follows)
  • Also experimented with hidden features representing the combination (multiplication) of two other features
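
A plausible reading of that mapping, as a sketch: treat each gene as an exponent so the weights span [0.01, 10) multiplicatively, reserving the smallest gene values for switching a feature off entirely. The base, exponent, and threshold below are assumptions, not the paper's exact scheme.

```python
def gene_to_weight(g, zero_below=0.05):
    """Map a gene value g in [0, 1) exponentially onto [0.01, 10),
    reserving the smallest gene values for weight 0 (feature off).
    All constants here are illustrative assumptions."""
    if g < zero_below:
        return 0.0
    return 10 ** (3 * g - 2)    # g = 0 -> 0.01, g -> 1 -> 10
```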

22
Results
  • Feature weighting can outperform simple selection, especially on large, noisy data sets
  • Best performance when feature weighting follows binary selection
  • 14 days to compute results
  • Parallel version: nearly linear speedup when strings are passed to other processors for fitness evaluation

23
Recent work
  • Minaei-Bidgoli, Kortemeyer, and Punch, 2004: "Optimizing classification ensembles via a genetic algorithm for a web-based educational system"
  • Incorporates GA feature weighting in a multiple-classifier (voting) system
  • Results show over 10% improvement in the accuracy of the classifier ensemble when GA optimization is used

24
MIR Application
  • The ACE project
  • Accepts an arbitrary type and number of features, an arbitrary taxonomy, training and testing data, and a target classification time
  • Classifies data using the best means available, e.g., multiple classifiers and feature weighting
  • Parallel GAs for feature weighting can improve solution time and quality

25
Conclusions
  • GAs are powerful and interesting tools, but they are complex and not always well understood
  • Feature selection/weighting means selecting from a huge space of possibilities, but it is a necessary task
  • GAs are useful for tackling the feature weighting problem
  • Faster machines, parallel computing, and a better understanding of GA behavior can all benefit the feature weighting problem
  • This is relevant to MIR: we want accurate and efficient classification!