1
Evolutionary Search
  • Artificial Intelligence
  • CMSC 25000
  • January 25, 2007

2
Agenda
  • Motivation
  • Evolving a solution
  • Genetic Algorithms
  • Modelling search as evolution
  • Mutation
  • Crossover
  • Survival of the fittest
  • Survival of the most diverse
  • Conclusions

3
Motivation: Evolution
  • Evolution through natural selection
  • Individuals pass on traits to offspring
  • Individuals have different traits
  • Fittest individuals survive to produce more
    offspring
  • Over time, variation can accumulate
  • Leading to new species

4
Simulated Evolution
  • Evolving a solution
  • Begin with population of individuals
  • Individuals = candidate solutions (chromosomes)
  • Produce offspring with variation
  • Mutation: change features
  • Crossover: exchange features between individuals
  • Apply natural selection
  • Select best individuals to go on to next
    generation
  • Continue until satisfied with solution (see the
    sketch below)
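A minimal sketch of this loop, assuming integer-gene chromosomes, mutation-only
variation, and a toy fitness function (the names are illustrative, not the slides' code):

  import random

  def mutate(chromosome):
      # Randomly nudge one gene by +/-1, clipped to the 1-9 range
      genes = list(chromosome)
      i = random.randrange(len(genes))
      genes[i] = min(9, max(1, genes[i] + random.choice([-1, 1])))
      return tuple(genes)

  def evolve(fitness, init_pop, generations=100, pop_size=4, target=9):
      # Generic GA loop: produce offspring with variation, then select the fittest
      population = list(init_pop)
      for _ in range(generations):
          offspring = [mutate(ind) for ind in population]
          # Natural selection: best individuals go on to the next generation
          population = sorted(population + offspring, key=fitness, reverse=True)[:pop_size]
          if any(fitness(ind) >= target for ind in population):  # satisfied with solution
              break
      return population

  # Toy run: "quality" of a cookie chromosome = its smaller gene
  print(evolve(lambda c: min(c), init_pop=[(1, 1)]))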

5
Genetic Algorithms Applications
  • Search parameter space for optimal assignment
  • Not guaranteed to find optimal, but can approach
  • Classic optimization problems
  • E.g. Travelling Salesman Problem
  • Program design (Genetic Programming)
  • Aircraft carrier landings

6
Genetic Algorithm Example
  • Cookie recipes (Winston, AI, 1993)
  • As evolving populations
  • Individual = batch of cookies
  • Quality: 0-9
  • Chromosomes: 2 genes, 1 chromosome each
  • Flour quantity, Sugar quantity: 1-9
  • Mutation
  • Randomly select Flour/Sugar, +/- 1 (kept within 1-9)
  • Crossover (sketch below)
  • Split 2 chromosomes, rejoin, keeping both
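A hedged sketch of these two operators on the 2-gene cookie chromosome; with only
two genes, splitting and rejoining amounts to swapping one gene between parents
(names are illustrative):

  import random

  def mutate(chrom):
      # Randomly select flour or sugar, nudge it by +/-1, staying within 1-9
      flour, sugar = chrom
      if random.random() < 0.5:
          flour = min(9, max(1, flour + random.choice([-1, 1])))
      else:
          sugar = min(9, max(1, sugar + random.choice([-1, 1])))
      return (flour, sugar)

  def crossover(a, b):
      # Split both chromosomes at the gene boundary and rejoin, keeping both offspring
      return (a[0], b[1]), (b[0], a[1])

  print(mutate((1, 1)), crossover((1, 4), (3, 1)))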

7
Fitness
  • Natural selection: the most fit survive
  • Fitness = probability of survival to the next generation
  • Question: How do we measure fitness?
  • Standard method: relate fitness to quality
  • Quality scale (1-9) mapped to fitness scale (0-1)

Chromosome   Quality   Fitness
(1, 4)       4         0.4
(3, 1)       3         0.3
(1, 2)       2         0.2
(1, 1)       1         0.1
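A one-line version of the standard method behind this table, assuming fitness is
simply quality normalized over the population (a sketch, not the slides' exact formula):

  def standard_fitness(qualities):
      # Fitness of each chromosome = its quality divided by the total quality
      total = sum(qualities)
      return [q / total for q in qualities]

  print(standard_fitness([4, 3, 2, 1]))  # [0.4, 0.3, 0.2, 0.1]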
8
GA Design Issues
  • Genetic design
  • Identify sets of features (genes); constraints?
  • Population: How many chromosomes?
  • Too few -> inbreeding; too many -> too slow
  • Mutation: How frequent?
  • Too few -> slow change; too many -> wild
  • Crossover: Allowed? How selected?
  • Duplicates?

9
GA Design: Basic Cookie GA
  • Genetic design
  • Identify sets of features: 2 genes,
    flour and sugar, each 1-9
  • Population: How many chromosomes?
  • 1 initial, 4 max
  • Mutation: How frequent?
  • 1 gene randomly selected, randomly mutated
  • Crossover: Allowed? No
  • Duplicates? No
  • Survival: Standard method

10
Basic Cookie GA Results
  • Results are for 1000 random trials
  • Initial state: 1 chromosome, (1, 1), quality 1
  • On average, reaches max quality (9) in 16
    generations
  • Best max quality in 8 generations
  • Conclusion
  • Low dimensionality search
  • Successful even without crossover

11
Basic Cookie GA + Crossover: Results
  • Results are for 1000 random trials
  • Initial state: 1 chromosome, (1, 1), quality 1
  • On average, reaches max quality (9) in 14
    generations
  • Conclusion
  • Faster with crossover: combines the good value in
    each gene
  • Key: global max achievable by maximizing each
    dimension independently - reduces dimensionality

12
Solving the Moat Problem
  • Problem
  • No single-step mutation can reach optimal values
    using standard fitness (quality 0 -> probability 0)
  • Solution A
  • Crossover can combine fit parents in EACH gene
  • However, still slow: 155 generations on average

13
Questions
  • How can we avoid the 0 quality problem?
  • How can we avoid local maxima?

14
Rethinking Fitness
  • Goal: explicit bias toward the best
  • Remove implicit biases based on the quality scale
  • Solution: Rank method
  • Ignore actual quality values except for ranking
  • Step 1: Rank candidates by quality
  • Step 2: Probability of selecting the i-th candidate,
    given that candidates 1..i-1 were not selected, is
    a constant p
  • Step 2b: The last candidate is selected if no other
    has been
  • Step 3: Select candidates using these probabilities
    (see the sketch below)
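A small sketch of these steps, with p left as a parameter; p = 2/3 happens to
reproduce the rank-fitness column on the next slide:

  def rank_fitness(n, p=2/3):
      # P(select i-th ranked | higher ranks not selected) = p; last candidate takes the rest
      probs = []
      remaining = 1.0
      for _ in range(n - 1):
          probs.append(remaining * p)
          remaining *= (1 - p)
      probs.append(remaining)   # Step 2b: last candidate gets whatever probability is left
      return probs

  print([round(x, 3) for x in rank_fitness(5)])  # [0.667, 0.222, 0.074, 0.025, 0.012]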

15
Rank Method
Chromosome   Quality   Rank   Std. Fitness   Rank Fitness
(1, 4)       4         1      0.4            0.667
(1, 3)       3         2      0.3            0.222
(1, 2)       2         3      0.2            0.074
(5, 2)       1         4      0.1            0.025
(7, 5)       0         5      0.0            0.012

Results: Average over 1000 random runs on the Moat
problem - 75 generations (vs 155 for the standard
method). No 0-probability entries: selection is based
on rank, not absolute quality.
16
Diversity
  • Diversity
  • Degree to which chromosomes exhibit different
    genes
  • Rank and standard methods look only at quality
  • Need diversity to escape local optima and provide
    variety for crossover
  • As good to be different as to be fit

17
Rank-Space Method
  • Combines diversity and quality in fitness
  • Diversity measure
  • Sum of inverse squared distances in gene space
    to already-selected chromosomes (sketch below)
  • Diversity rank: avoids inadvertent bias
  • Rank-space
  • Sort on the sum of diversity AND quality ranks
  • Best = lower left: high diversity AND quality
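A minimal sketch of the diversity measure, assuming Euclidean distance in gene
space; with the single already-selected chromosome (5, 1) it reproduces the D
column on the next slide:

  def diversity(candidate, selected):
      # Sum of inverse squared distances to already-selected chromosomes.
      # Smaller values mean the candidate is farther from what we already have.
      total = 0.0
      for other in selected:
          d2 = sum((a - b) ** 2 for a, b in zip(candidate, other))
          total += 1.0 / d2   # assumes no exact duplicates (d2 > 0)
      return total

  selected = [(5, 1)]
  for c in [(1, 4), (3, 1), (1, 2), (1, 1), (7, 5)]:
      print(c, round(diversity(c, selected), 3))   # 0.04, 0.25, 0.059, 0.062, 0.05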

18
Rank-Space Method
Distances computed w.r.t. the highest-ranked chromosome (5, 1)

Chromosome   Q   D       D Rank   Q Rank   Comb Rank   R-S Fitness
(1, 4)       4   0.04    1        1        1           0.667
(3, 1)       3   0.25    5        2        4           0.025
(1, 2)       2   0.059   3        3        2           0.222
(1, 1)       1   0.062   4        4        5           0.012
(7, 5)       0   0.05    2        5        3           0.074

Diversity rank breaks ties. After selecting other
chromosomes, sum the distances to both.
Results: Average (Moat problem) 15 generations.
19
GAs and Local Maxima
  • Quality metrics only
  • Susceptible to local max problems
  • Quality + Diversity
  • Can populate all local maxima
  • Including the global max
  • Key: the population must be large enough

20
GA Discussion
  • Similar to stochastic local beam search
  • Beam = population size
  • Stochastic = selection and mutation
  • Local = each generation derived from the single
    previous one
  • Key difference: crossover draws on 2 sources!
  • Why crossover?
  • Schema = partial local subsolutions
  • E.g. 2 halves of TSP tour

21
Question
  • Traveling Salesman Problem
  • CSP-style: iterative refinement
  • Genetic Algorithm
  • N-Queens
  • CSP-style: iterative refinement
  • Genetic Algorithm

22
Iterative Improvement Example
  • TSP
  • Start with some valid tour
  • E.g. find greedy solution
  • Make incremental change to tour
  • E.g. hill-climbing - take change that produces
    greatest improvement
  • Problem: local minima
  • Solution: randomize to search other parts of the
    space
  • Other methods: simulated annealing, genetic
    algorithms

23
Machine Learning: Nearest Neighbor and Information
Retrieval Search
  • Artificial Intelligence
  • CMSC 25000
  • January 25, 2007

24
Agenda
  • Machine learning Introduction
  • Nearest neighbor techniques
  • Applications
  • Credit rating
  • Text Classification
  • K-nn
  • Issues
  • Distance, dimensions, irrelevant attributes
  • Efficiency
  • k-d trees, parallelism

25
Machine Learning
  • Learning: acquiring a function from inputs to
    values, based on past input-value pairs
  • Learn concepts, classifications, values
  • Identify regularities in data

26
Machine Learning Examples
  • Pronunciation
  • Spelling of word -> sounds
  • Speech recognition
  • Acoustic signals -> sentences
  • Robot arm manipulation
  • Target -> torques
  • Credit rating
  • Financial data -> loan qualification

27
Machine Learning Characterization
  • Distinctions
  • Are output values known for any inputs?
  • Supervised vs unsupervised learning
  • Supervised: training consists of inputs plus true
    output values
  • E.g. letters plus pronunciation
  • Unsupervised: training consists only of inputs
  • E.g. letters only
  • This course studies supervised methods

28
Machine Learning Characteristics
  • Many machine learning techniques
  • Supervised vs Unsupervised
  • Supervised: input plus true labels
  • Unsupervised: input ONLY
  • Classification vs Regression
  • Classification: output is from a finite label set
  • Regression: output is continuous-valued
  • Decision Boundary
  • What function is learned? (inductive bias)
  • Linear, rectangular, Voronoi diagram
  • Input features
  • Discrete? Continuous? Which ones? Scaling?

29
Machine Learning Characterization
  • Distinctions
  • Are output values discrete or continuous?
  • Discrete: classification
  • E.g. Qualified/Unqualified for a loan application
  • Continuous: regression
  • E.g. Torques for robot arm motion
  • Characteristic of task

30
Machine Learning Characterization
  • Distinctions
  • What form of function is learned?
  • Also called inductive bias
  • Graphically, decision boundary
  • E.g. Single, linear separator
  • Rectangular boundaries - ID trees
  • Voronoi spaces, etc.

31
Machine Learning Functions
  • Problem Can the representation effectively model
    the class to be learned?
  • Motivates selection of learning algorithm

For this function, a linear discriminant is
GREAT! Rectangular boundaries (e.g. ID trees):
TERRIBLE! Pick the right representation!

32
Machine Learning Features
  • Inputs
  • E.g. words, acoustic measurements, financial data
  • Vectors of features
  • E.g. word = letters
  • 'cat': L1 = c, L2 = a, L3 = t
  • Financial data: F1 = late payments/yr (integer);
    F2 = ratio of income to expenses (real)

33
Machine Learning Features
  • Question
  • Which features should be used?
  • How should they relate to each other?
  • Issue 1: How do we define relations in feature
    space if features have different scales?
  • Solution: scaling/normalization
  • Issue 2: Which features are important?
  • If instances differ only in an irrelevant feature,
    that difference should be ignored

34
Complexity and Generalization
  • Goal Predict values accurately on new inputs
  • Problem
  • Train on sample data
  • Can make arbitrarily complex model to fit
  • BUT, will probably perform badly on NEW data
  • Strategy
  • Limit complexity of model (e.g. degree of equation)
  • Split training and validation sets
  • Hold out data to check for overfitting

35
Nearest Neighbor
  • Memory- or case-based learning
  • Supervised method: Training
  • Record labeled instances and their feature-value
    vectors
  • For each new, unlabeled instance
  • Identify the nearest labeled instance
  • Assign the same label
  • Consistency heuristic: assume that a property is
    the same as that of the nearest reference case
    (see the sketch below)
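A minimal 1-nearest-neighbor sketch, assuming plain Euclidean distance over
numeric feature vectors (names are illustrative):

  import math

  def nearest_neighbor(query, instances):
      # instances: list of (feature_vector, label); return the label of the closest one
      def dist(a, b):
          return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
      _, label = min(instances, key=lambda inst: dist(query, inst[0]))
      return label

  # Training just records labeled feature vectors; classification finds the nearest
  training = [((0, 1.2), "G"), ((25, 0.4), "P"), ((5, 0.7), "G")]
  print(nearest_neighbor((6, 1.15), training))   # -> G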

36
Nearest Neighbor Example
  • Problem Robot arm motion
  • Difficult to model analytically
  • Kinematic equations
  • Relate joint angles and manipulator positions
  • Dynamics equations
  • Relate motor torques to joint angles
  • Difficult to achieve good results modeling
    robotic arms or human arm
  • Many factors and measurements

37
Nearest Neighbor Example
  • Solution
  • Move robot arm around
  • Record parameters and trajectory segment
  • Table of torques, positions, velocities, squared
    velocities, velocity products, accelerations
  • To follow a new path
  • Break into segments
  • Find closest segments in table
  • Get those torques (interpolate as necessary)

38
Nearest Neighbor Example
  • Issue: big table
  • First time with a new trajectory
  • Closest isn't close
  • Table is sparse - few entries
  • Solution: practice
  • As we attempt the trajectory, fill in more of the
    table
  • After a few attempts, very close

39
Nearest Neighbor Example
  • Credit Rating
  • Classifier: Good / Poor
  • Features
  • L = late payments/yr
  • R = Income/Expenses

Name L R G/P
A 0 1.2 G
B 25 0.4 P
C 5 0.7 G
D 20 0.8 P
E 30 0.85 P
F 11 1.2 G
G 7 1.15 G
H 15 0.8 P
40
Nearest Neighbor Example
Name   L    R     G/P
A      0    1.2   G
B      25   0.4   P
C      5    0.7   G
D      20   0.8   P
E      30   0.85  P
F      11   1.2   G
G      7    1.15  G
H      15   0.8   P

(Plot: instances A-H in feature space, with L on the
horizontal axis, ticks at 10/20/30, and R on the
vertical axis, tick at 1)
41
Nearest Neighbor Example
Name   L    R     G/P
I      6    1.15  G
J      22   0.45  P
K      15   1.2   ??

Distance measure:
sqrt((L1 - L2)^2 + sqrt(10) * (R1 - R2)^2) -
scaled distance

(Plot: new instances I, J, K plotted among A-H in the
same L-R feature space)
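A sketch of nearest-neighbor classification with this scaled distance, reading the
sqrt(10) factor as a weight on the squared R difference; the data are the labeled
instances from the earlier slide, and the helper names are illustrative:

  import math

  # Labeled instances from the credit-rating slide: (L, R) -> Good/Poor
  TRAINING = [((0, 1.2), "G"), ((25, 0.4), "P"), ((5, 0.7), "G"), ((20, 0.8), "P"),
              ((30, 0.85), "P"), ((11, 1.2), "G"), ((7, 1.15), "G"), ((15, 0.8), "P")]

  def scaled_distance(a, b):
      # sqrt((L1-L2)^2 + sqrt(10)*(R1-R2)^2): boost R so it is not swamped by L
      (l1, r1), (l2, r2) = a, b
      return math.sqrt((l1 - l2) ** 2 + math.sqrt(10) * (r1 - r2) ** 2)

  def classify(query):
      _, label = min(TRAINING, key=lambda inst: scaled_distance(query, inst[0]))
      return label

  for name, q in [("I", (6, 1.15)), ("J", (22, 0.45)), ("K", (15, 1.2))]:
      print(name, q, "->", classify(q))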
42
Nearest Neighbor Analysis
  • Problem
  • Ambiguous labeling, training noise
  • Solution
  • K-nearest neighbors
  • Not just the single nearest instance
  • Compare to the K nearest neighbors
  • Label according to the majority of the K
  • What should K be?
  • Often 3; can be tuned by training as well
    (sketch below)
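A minimal k-nearest-neighbors sketch with a majority vote, again assuming
Euclidean distance (illustrative names):

  import math
  from collections import Counter

  def knn_classify(query, instances, k=3):
      # Label the query by a majority vote over its k nearest labeled instances
      def dist(a, b):
          return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
      neighbors = sorted(instances, key=lambda inst: dist(query, inst[0]))[:k]
      votes = Counter(label for _, label in neighbors)
      return votes.most_common(1)[0][0]

  training = [((0, 1.2), "G"), ((5, 0.7), "G"), ((7, 1.15), "G"),
              ((25, 0.4), "P"), ((20, 0.8), "P"), ((15, 0.8), "P")]
  print(knn_classify((6, 1.0), training, k=3))   # majority of the 3 nearest -> G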

43
Text Classification
44
Matching Topics and Documents
  • Two main perspectives
  • Pre-defined, fixed, finite topics
  • Text Classification
  • Arbitrary topics, typically defined by statement
    of information need (aka query)
  • Information Retrieval

45
Vector Space Information Retrieval
  • Task
  • Document collection
  • Query specifies information need: free text
  • Relevance judgments: 0/1 for all docs
  • Word evidence: bag of words
  • No ordering information

46
Vector Space Model
(Figure: documents as vectors in a 3-D term space with
axes Tv, Program, Computer)

Two documents: "computer program" and "tv program".
The query "computer program" matches the 1st doc
exactly (distance 0, vs distance 2 to the other);
"educational program" matches both equally (distance 1).
47
Vector Space Model
  • Represent documents and queries as
  • Vectors of term-based features
  • Features tied to occurrence of terms in
    collection
  • E.g.
  • Solution 1: binary features, t_i = 1 if the term is
    present, 0 otherwise
  • Similarity = number of terms in common
  • Dot product (sketch below)
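A sketch of binary term vectors and their dot-product similarity, using a tiny
hypothetical vocabulary:

  def binary_vector(text, vocabulary):
      # t_i = 1 if the term occurs in the text, 0 otherwise
      words = set(text.lower().split())
      return [1 if term in words else 0 for term in vocabulary]

  def dot(u, v):
      # For binary vectors, the dot product counts terms in common
      return sum(a * b for a, b in zip(u, v))

  vocab = ["computer", "program", "tv", "educational"]
  doc1 = binary_vector("computer program", vocab)
  doc2 = binary_vector("tv program", vocab)
  query = binary_vector("computer program", vocab)
  print(dot(query, doc1), dot(query, doc2))   # 2 vs 1 terms in common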

48
Vector Space Model II
  • Problem: not all terms are equally interesting
  • E.g. 'the' vs 'dog' vs 'Levow'
  • Solution: replace binary term features with
    weights
  • Document collection: term-by-document matrix
  • View as vector in multidimensional space
  • Nearby vectors are related
  • Normalize for vector length

49
Vector Similarity Computation
  • Similarity: dot product (sketch below)
  • Normalization
  • Normalize weights in advance
  • Normalize post-hoc
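The slide's formulas did not survive the transcript; as a hedged sketch, one
standard version of this computation is the cosine, i.e. the dot product of
length-normalized vectors:

  import math

  def cosine_similarity(u, v):
      # Dot product divided by the vector lengths (normalizing post-hoc)
      dot = sum(a * b for a, b in zip(u, v))
      norm_u = math.sqrt(sum(a * a for a in u))
      norm_v = math.sqrt(sum(b * b for b in v))
      return dot / (norm_u * norm_v)

  print(round(cosine_similarity([1, 1, 0], [0, 1, 1]), 3))   # 0.5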

50
Term Weighting
  • Aboutness
  • To what degree is this term what the document is
    about?
  • Within-document measure
  • Term frequency (tf): occurrences of term t in doc j
  • Specificity
  • How surprised are you to see this term?
  • Collection frequency
  • Inverse document frequency (idf) (sketch below)
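A hedged sketch of combining the two measures as tf-idf; the log form of idf used
here is one common variant, not necessarily the slides' exact formula:

  import math

  def tf_idf(term, doc, collection):
      # Weight = term frequency in this doc * inverse document frequency in the collection
      tf = doc.count(term)                                   # occurrences of term in doc
      df = sum(1 for d in collection if term in d)           # docs containing the term
      idf = math.log(len(collection) / df) if df else 0.0    # rarer terms score higher
      return tf * idf

  docs = [["computer", "program"], ["tv", "program"], ["educational", "program"]]
  print(tf_idf("computer", docs[0], docs), tf_idf("program", docs[0], docs))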

51
Term Selection and Formation
  • Selection
  • Some terms are truly useless
  • Too frequent, no content
  • E.g. 'the', 'a', 'and', ...
  • Stop words: ignore such terms altogether
  • Creation
  • Too many surface forms for the same concepts
  • E.g. inflections of words: verb conjugations,
    plurals
  • Stem terms: treat all forms as the same underlying
    term

52
Efficient Implementations
  • Classification cost
  • Find nearest neighbor: O(n)
  • Compute distance between unknown and all
    instances
  • Compare distances
  • Problematic for large data sets
  • Alternative
  • Use binary search to reduce to O(log n)

53
Efficient Implementation: K-D Trees
  • Divide instances into sets based on features
  • Binary branching: e.g. > value
  • 2^d leaves with a split path of depth d: n = 2^d,
    so d = O(log n)
  • To split cases into sets,
  • If there is one element in the set, stop
  • Otherwise pick a feature to split on
  • Find average position of two middle objects on
    that dimension
  • Split remaining objects based on average position
  • Recursively split the subsets (sketch below)
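A minimal sketch of this construction, assuming round-robin feature choice and
nested dicts for internal nodes (the names and representation are illustrative,
not the slides' code):

  def build_kd_tree(instances, depth=0):
      # instances: list of (feature_vector, label); one element left -> leaf
      if len(instances) <= 1:
          return instances[0] if instances else None
      feature = depth % len(instances[0][0])            # pick a feature to split on
      instances = sorted(instances, key=lambda inst: inst[0][feature])
      mid = len(instances) // 2
      # average position of the two middle objects on that dimension
      threshold = (instances[mid - 1][0][feature] + instances[mid][0][feature]) / 2
      left = [i for i in instances if i[0][feature] <= threshold]
      right = [i for i in instances if i[0][feature] > threshold]
      return {"feature": feature, "threshold": threshold,
              "left": build_kd_tree(left, depth + 1),
              "right": build_kd_tree(right, depth + 1)}

  data = [((0, 1.2), "G"), ((25, 0.4), "P"), ((5, 0.7), "G"), ((20, 0.8), "P")]
  print(build_kd_tree(data))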

54
K-D Trees Classification
(Figure: a query is classified by answering the tree's
yes/no split questions down to a Good or Poor leaf)
55
Efficient Implementation: Parallel Hardware
  • Classification cost
  • Distance computations: constant time with O(n)
    processors
  • Cost of finding closest
  • Compute pairwise minimum, successively
  • O(log n) time

56
Nearest Neighbor Issues
  • Prediction can be expensive if many features
  • Affected by classification, feature noise
  • One entry can change prediction
  • Definition of distance metric
  • How to combine different features
  • Different types, ranges of values
  • Sensitive to feature selection

57
Nearest Neighbor Analysis
  • Issue
  • What is a good distance metric?
  • How should features be combined?
  • Strategy
  • (Typically weighted) Euclidean distance
  • Feature scaling: normalization
  • Good starting point (sketch below):
  • (Feature - Feature_mean) / Feature_std_dev
  • Rescales all values: centered on 0 with std dev 1
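A sketch of this normalization, using the sample standard deviation (the
population version differs only in the divisor):

  import statistics

  def z_score_normalize(values):
      # (feature - mean) / standard deviation: rescales to center 0, std dev 1
      mean = statistics.mean(values)
      std = statistics.stdev(values)
      return [(v - mean) / std for v in values]

  late_payments = [0, 25, 5, 20, 30, 11, 7, 15]   # the L feature from the credit example
  print([round(z, 2) for z in z_score_normalize(late_payments)])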

58
Nearest Neighbor Analysis
  • Issue
  • What features should we use?
  • E.g. credit rating: many possible features
  • Tax bracket, debt burden, retirement savings,
    etc.
  • Nearest neighbor uses ALL
  • Irrelevant feature(s) could mislead
  • Fundamental problem with nearest neighbor

59
Nearest Neighbor Advantages
  • Fast training
  • Just record the (feature vector, output value) set
  • Can model wide variety of functions
  • Complex decision boundaries
  • Weak inductive bias
  • Very generally applicable

60
Summary
  • Machine learning
  • Acquire function from input features to value
  • Based on prior training instances
  • Supervised vs Unsupervised learning
  • Classification and Regression
  • Inductive bias
  • Representation of function to learn
  • Complexity, Generalization, Validation