Title: Evolutionary Search
1. Evolutionary Search
- Artificial Intelligence
- CMSC 25000
- January 25, 2007
2. Agenda
- Motivation
- Evolving a solution
- Genetic Algorithms
- Modelling search as evolution
- Mutation
- Crossover
- Survival of the fittest
- Survival of the most diverse
- Conclusions
3. Motivation: Evolution
- Evolution through natural selection
- Individuals pass on traits to offspring
- Individuals have different traits
- Fittest individuals survive to produce more offspring
- Over time, variation can accumulate
- Leading to new species
4. Simulated Evolution
- Evolving a solution
- Begin with population of individuals
- Individuals = candidate solutions (chromosomes)
- Produce offspring with variation
- Mutation: change features
- Crossover: exchange features between individuals
- Apply natural selection
- Select best individuals to go on to next generation
- Continue until satisfied with solution
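The vary-then-select loop above can be sketched directly. This is a minimal illustration rather than any particular published GA: quality, mutate, crossover, and satisfied stand in for the problem-specific pieces defined on the following slides.

```python
def evolve(initial_pop, quality, mutate, crossover, max_pop, satisfied, max_gen=1000):
    """Simulated evolution (sketch): produce varied offspring, keep the fittest."""
    population = list(initial_pop)
    for _ in range(max_gen):
        # Produce offspring with variation: mutation and (optional) crossover
        offspring = [mutate(c) for c in population]
        if crossover is not None:
            offspring += [child
                          for a in population for b in population if a is not b
                          for child in crossover(a, b)]
        # Natural selection: best individuals go on to the next generation
        candidates = list(dict.fromkeys(population + offspring))  # assumes hashable chromosomes
        candidates.sort(key=quality, reverse=True)
        population = candidates[:max_pop]
        if satisfied(population[0]):   # continue until satisfied with the solution
            break
    return population[0]
```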
5. Genetic Algorithms: Applications
- Search parameter space for optimal assignment
- Not guaranteed to find optimal, but can approach
- Classic optimization problems
- E.g. Travelling Salesman Problem
- Program design (Genetic Programming)
- Aircraft carrier landings
6. Genetic Algorithm Example
- Cookie recipes (Winston, AI, 1993)
- As evolving populations
- Individual = batch of cookies
- Quality: 0-9
- Chromosomes: 2 genes, 1 chromosome each
- Flour quantity, sugar quantity: 1-9
- Mutation
- Randomly select flour/sugar, +/- 1, within 1-9
- Crossover
- Split 2 chromosomes and rejoin, keeping both halves
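As a concrete sketch of the two variation operators just described, with a chromosome represented as a (flour, sugar) pair and values clamped to 1-9:

```python
import random

def mutate(chrom):
    """Randomly select flour or sugar and move it +/- 1, staying within 1-9."""
    gene = random.randrange(2)
    new = list(chrom)
    new[gene] = min(9, max(1, new[gene] + random.choice((-1, 1))))
    return tuple(new)

def crossover(a, b):
    """Split two chromosomes between the genes and rejoin, keeping both children."""
    return [(a[0], b[1]), (b[0], a[1])]

# e.g. mutate((1, 1)) -> (2, 1) or (1, 2); crossover((1, 4), (3, 1)) -> [(1, 1), (3, 4)]
```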
7. Fitness
- Natural selection: most fit survive
- Fitness = probability of survival to the next generation
- Question: How do we measure fitness?
- Standard method: Relate fitness to quality
- Quality 1-9 mapped to fitness 0-1
Chromosome   Quality   Fitness
1 4          4         0.4
3 1          3         0.3
1 2          2         0.2
1 1          1         0.1
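The fitness column above is the standard method computed as quality divided by total quality; a one-function sketch:

```python
def standard_fitness(qualities):
    """Standard method: fitness = quality / total quality = probability of survival."""
    total = sum(qualities)
    return [q / total for q in qualities]

# Reproduces the table: standard_fitness([4, 3, 2, 1]) -> [0.4, 0.3, 0.2, 0.1]
```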
8. GA Design Issues
- Genetic design
- Identify sets of features (genes); constraints?
- Population: How many chromosomes?
- Too few -> inbreeding; too many -> too slow
- Mutation: How frequent?
- Too few -> slow change; too many -> wild
- Crossover: Allowed? How selected?
- Duplicates?
9. GA Design: Basic Cookie GA
- Genetic design
- Identify sets of features: 2 genes, flour/sugar 1-9
- Population: How many chromosomes?
- 1 initial, 4 max
- Mutation: How frequent?
- 1 gene randomly selected, randomly mutated
- Crossover: Allowed? No
- Duplicates? No
- Survival: Standard method
10. Basic Cookie GA Results
- Results are for 1000 random trials
- Initial state: 1 chromosome, 1-1, quality 1
- On average, reaches max quality (9) in 16 generations
- Best: max quality in 8 generations
- Conclusion
- Low dimensionality search
- Successful even without crossover
11. Basic Cookie GA + Crossover: Results
- Results are for 1000 random trials
- Initial state: 1 chromosome, 1-1, quality 1
- On average, reaches max quality (9) in 14 generations
- Conclusion
- Faster with crossover: combines good values in each gene
- Key: Global max achievable by maximizing each dimension independently - reduces dimensionality
12. Solving the Moat Problem
- Problem
- No single-step mutation can reach optimal values using standard fitness (quality 0 -> probability 0)
- Solution A
- Crossover can combine fit parents in EACH gene
- However, still slow: 155 generations on average
13. Questions
- How can we avoid the 0 quality problem?
- How can we avoid local maxima?
14. Rethinking Fitness
- Goal: Explicit bias to the best
- Remove implicit biases based on quality scale
- Solution: Rank method
- Ignore actual quality values except for ranking
- Step 1: Rank candidates by quality
- Step 2: Probability of selecting the ith candidate, given that candidates 1..(i-1) were not selected, is a constant p
- Step 2b: The last candidate is selected if no other has been
- Step 3: Select candidates using these probabilities
15. Rank Method
Chromosome   Quality   Rank   Std. Fitness   Rank Fitness
1 4          4         1      0.4            0.667
1 3          3         2      0.3            0.222
1 2          2         3      0.2            0.074
5 2          1         4      0.1            0.025
7 5          0         5      0.0            0.012

Results: Average over 1000 random runs on the Moat problem: 75 generations (vs 155 for the standard method). No 0-probability entries: based on rank, not absolute quality.
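The rank-fitness column is consistent with a constant selection probability p = 2/3 (inferred from the numbers above, not stated on the slide); a sketch of Steps 2-2b:

```python
def rank_fitness(n, p=2/3):
    """Rank method: P(rank i selected) = p * (1 - p)^(i - 1); last rank takes the remainder."""
    probs = [p * (1 - p) ** i for i in range(n - 1)]
    probs.append(1 - sum(probs))          # Step 2b: last candidate if no other selected
    return probs

# rank_fitness(5) -> [0.667, 0.222, 0.074, 0.025, 0.012] as in the table
```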
16. Diversity
- Diversity
- Degree to which chromosomes exhibit different genes
- Rank and standard methods look only at quality
- Need diversity: escape local maxima, variety for crossover
- As good to be different as to be fit
17. Rank-Space Method
- Combines diversity and quality in fitness
- Diversity measure
- Sum of inverse squared distances in gene space
- Diversity rank: Avoids inadvertent bias
- Rank-space
- Sort on sum of diversity AND quality ranks
- Best = lower left: high diversity AND quality
18. Rank-Space Method
W.r.t. highest ranked chromosome 5-1:

Chromosome   Q   D       D Rank   Q Rank   Comb Rank   R-S Fitness
1 4          4   0.04    1        1        1           0.667
3 1          3   0.25    5        2        4           0.025
1 2          2   0.059   3        3        2           0.222
1 1          1   0.062   4        4        5           0.012
7 5          0   0.05    2        5        3           0.074

Diversity rank breaks ties. After selecting others, sum distances to both. Results: Average (Moat problem): 15 generations.
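A sketch reproducing the table's combined ranking. It assumes, as the D column suggests, that diversity is the sum of inverse squared distances to the already-selected chromosomes, here just the highest-ranked chromosome (5, 1), and that a smaller sum means more diverse:

```python
def rank_space_order(chroms, quality, selected):
    """Rank-space method (sketch): sort on the sum of quality rank and diversity rank."""
    def inv_sq_dist_sum(c):
        # Sum of inverse squared distances to the already-selected chromosomes
        return sum(1.0 / sum((x - y) ** 2 for x, y in zip(c, s)) for s in selected)

    q_rank = {c: r for r, c in enumerate(sorted(chroms, key=lambda c: -quality[c]), 1)}
    d_rank = {c: r for r, c in enumerate(sorted(chroms, key=inv_sq_dist_sum), 1)}
    # Sum the two ranks; the diversity rank breaks ties
    return sorted(chroms, key=lambda c: (q_rank[c] + d_rank[c], d_rank[c]))

chroms = [(1, 4), (3, 1), (1, 2), (1, 1), (7, 5)]
quality = {(1, 4): 4, (3, 1): 3, (1, 2): 2, (1, 1): 1, (7, 5): 0}
print(rank_space_order(chroms, quality, selected=[(5, 1)]))
# -> [(1, 4), (1, 2), (7, 5), (3, 1), (1, 1)]: combined ranks 1..5 as in the table
```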
19. GAs and Local Maxima
- Quality metrics only
- Susceptible to local max problems
- Quality + Diversity
- Can populate all local maxima
- Including global max
- Key: Population must be large enough
20. GA Discussion
- Similar to stochastic local beam search
- Beam = population size
- Stochastic = selection and mutation
- Local = each generation produced from the single previous one
- Key difference: Crossover - 2 sources!
- Why crossover?
- Schema: Partial local subsolutions
- E.g. 2 halves of TSP tour
21. Question
- Traveling Salesman Problem
- CSP-style Iterative refinement
- Genetic Algorithm
- N-Queens
- CSP-style Iterative refinement
- Genetic Algorithm
22. Iterative Improvement Example
- TSP
- Start with some valid tour
- E.g. find greedy solution
- Make incremental change to tour
- E.g. hill-climbing: take the change that produces the greatest improvement
- Problem: Local minima
- Solution: Randomize to search other parts of the space
- Other methods: Simulated annealing, genetic algorithms
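A minimal sketch of this kind of iterative improvement for the TSP, using segment-reversal (2-opt style) moves as the incremental change and plain hill climbing; it stops at a local minimum, illustrating exactly the problem noted above:

```python
import itertools

def tour_length(tour, dist):
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def hill_climb_tsp(tour, dist):
    """Start from a valid tour; repeatedly take the segment reversal that gives
    the greatest improvement; stop when no change helps (a local minimum)."""
    improved = True
    while improved:
        improved = False
        best_len, best_tour = tour_length(tour, dist), tour
        for i, j in itertools.combinations(range(1, len(tour)), 2):
            candidate = tour[:i] + tour[i:j][::-1] + tour[j:]   # reverse one segment
            length = tour_length(candidate, dist)
            if length < best_len:
                best_len, best_tour, improved = length, candidate, True
        tour = best_tour
    return tour
```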
23. Machine Learning: Nearest Neighbor & Information Retrieval Search
- Artificial Intelligence
- CMSC 25000
- January 25, 2007
24. Agenda
- Machine learning: Introduction
- Nearest neighbor techniques
- Applications
- Credit rating
- Text Classification
- K-nn
- Issues
- Distance, dimensions, irrelevant attributes
- Efficiency
- k-d trees, parallelism
25. Machine Learning
- Learning: Acquiring a function, based on past inputs and values, from new inputs to values
- Learn concepts, classifications, values
- Identify regularities in data
26. Machine Learning Examples
- Pronunciation
- Spelling of word -> sounds
- Speech recognition
- Acoustic signals -> sentences
- Robot arm manipulation
- Target -> torques
- Credit rating
- Financial data -> loan qualification
27. Machine Learning: Characterization
- Distinctions
- Are output values known for any inputs?
- Supervised vs unsupervised learning
- Supervised: training consists of inputs + true output value
- E.g. letters + pronunciation
- Unsupervised: training consists only of inputs
- E.g. letters only
- Course studies supervised methods
28. Machine Learning: Characteristics
- Many machine learning techniques
- Supervised vs Unsupervised
- Supervised: Input + true labels
- Unsupervised: Input ONLY
- Classification vs Regression
- Classification: Output is from a finite label set
- Regression: Output is continuous-valued
- Decision Boundary
- What function is learned? Inductive Bias
- Linear, rectangular, Voronoi diagram
- Input features
- Discrete? Continuous? Which ones? Scaling?
29. Machine Learning: Characterization
- Distinctions
- Are output values discrete or continuous?
- Discrete: Classification
- E.g. Qualified/Unqualified for a loan application
- Continuous: Regression
- E.g. Torques for robot arm motion
- Characteristic of task
30. Machine Learning: Characterization
- Distinctions
- What form of function is learned?
- Also called inductive bias
- Graphically, decision boundary
- E.g. Single, linear separator
- Rectangular boundaries - ID trees
- Voronoi spaces, etc.
31. Machine Learning: Functions
- Problem: Can the representation effectively model the class to be learned?
- Motivates selection of learning algorithm
- For this function, a linear discriminant is GREAT! Rectangular boundaries (e.g. ID trees) are TERRIBLE!
- Pick the right representation!
32. Machine Learning: Features
- Inputs
- E.g. words, acoustic measurements, financial data
- Vectors of features
- E.g. word letters: "cat": L1 = c, L2 = a, L3 = t
- Financial data: F1 = late payments/yr (integer); F2 = ratio of income to expense (real)
33. Machine Learning: Features
- Question
- Which features should be used?
- How should they relate to each other?
- Issue 1: How do we define relations in feature space if features have different scales?
- Solution: Scaling/normalization
- Issue 2: Which ones are important?
- If instances differ only in an irrelevant feature, it should be ignored
34. Complexity & Generalization
- Goal Predict values accurately on new inputs
- Problem
- Train on sample data
- Can make arbitrarily complex model to fit
- BUT, will probably perform badly on NEW data
- Strategy
- Limit complexity of model (e.g. degree of equation)
- Split training and validation sets
- Hold out data to check for overfitting
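A small sketch of the hold-out strategy described above; the 0.2 split fraction is an arbitrary choice for illustration:

```python
import random

def holdout_split(examples, validation_fraction=0.2, seed=0):
    """Hold out part of the training data to check for overfitting on unseen inputs."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]   # (training set, validation set)
```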
35. Nearest Neighbor
- Memory- or case-based learning
- Supervised method: Training
- Record labeled instances and feature-value vectors
- For each new, unlabeled instance
- Identify nearest labeled instance
- Assign same label
- Consistency heuristic: Assume that a property is the same as that of the nearest reference case
36. Nearest Neighbor Example
- Problem: Robot arm motion
- Difficult to model analytically
- Kinematic equations
- Relate joint angles and manipulator positions
- Dynamics equations
- Relate motor torques to joint angles
- Difficult to achieve good results modeling robotic arms or the human arm
- Many factors and measurements
37. Nearest Neighbor Example
- Solution
- Move robot arm around
- Record parameters and trajectory segment
- Table: torques, positions, velocities, squared velocities, velocity products, accelerations
- To follow a new path
- Break into segments
- Find closest segments in table
- Get those torques (interpolate as necessary)
38. Nearest Neighbor Example
- Issue: Big table
- First time with new trajectory
- Closest isn't close
- Table is sparse - few entries
- Solution: Practice
- As attempt trajectory, fill in more of table
- After few attempts, very close
39. Nearest Neighbor Example
- Credit Rating
- Classifier: Good / Poor
- Features
- L: late payments/yr
- R: Income/Expenses
Name L R G/P
A 0 1.2 G
B 25 0.4 P
C 5 0.7 G
D 20 0.8 P
E 30 0.85 P
F 11 1.2 G
G 7 1.15 G
H 15 0.8 P
40. Nearest Neighbor Example

Name  L   R     G/P
A     0   1.2   G
B     25  0.4   P
C     5   0.7   G
D     20  0.8   P
E     30  0.85  P
F     11  1.2   G
G     7   1.15  G
H     15  0.8   P

[Plot: the labeled instances A-H in L-R feature space, L on the horizontal axis (0-30), R on the vertical axis]
41. Nearest Neighbor Example

Name  L   R     G/P
I     6   1.15  G
J     22  0.45  P
K     15  1.2   ??

[Plot: the new instances I, J, K plotted among the labeled cases A-H in L-R feature space]

Distance measure: sqrt((L1 - L2)^2 + (sqrt(10) * (R1 - R2))^2) - a scaled distance
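Putting the credit-rating example together, a sketch of the consistency heuristic using the scaled distance above; the labels for I and J match the table (K is left as the "??" case):

```python
from math import sqrt

# Labeled cases from the table: name -> (L, R, label)
cases = {"A": (0, 1.2, "G"), "B": (25, 0.4, "P"), "C": (5, 0.7, "G"),
         "D": (20, 0.8, "P"), "E": (30, 0.85, "P"), "F": (11, 1.2, "G"),
         "G": (7, 1.15, "G"), "H": (15, 0.8, "P")}

def scaled_distance(p, q):
    """The scaled distance above: sqrt((L1 - L2)^2 + (sqrt(10) * (R1 - R2))^2)."""
    return sqrt((p[0] - q[0]) ** 2 + (sqrt(10) * (p[1] - q[1])) ** 2)

def classify(query):
    """Consistency heuristic: copy the label of the nearest labeled case."""
    nearest = min(cases.values(), key=lambda c: scaled_distance(query, c))
    return nearest[2]

print(classify((6, 1.15)))   # I -> G
print(classify((22, 0.45)))  # J -> P
```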
42. Nearest Neighbor Analysis
- Problem
- Ambiguous labeling, training noise
- Solution
- K-nearest neighbors
- Not just the single nearest instance
- Compare to K nearest neighbors
- Label according to majority of K
- What should K be?
- Often 3; can also be tuned on training data
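A minimal K-nearest-neighbor sketch, assuming each case is a tuple of feature values followed by its label and reusing whatever distance function fits the features:

```python
from collections import Counter

def knn_classify(query, cases, distance, k=3):
    """Label the query by majority vote over its K nearest labeled instances."""
    nearest = sorted(cases, key=lambda case: distance(query, case[:-1]))[:k]
    votes = Counter(case[-1] for case in nearest)
    return votes.most_common(1)[0][0]
```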
43. Text Classification
44. Matching Topics and Documents
- Two main perspectives
- Pre-defined, fixed, finite topics
- Text Classification
- Arbitrary topics, typically defined by a statement of information need (aka query)
- Information Retrieval
45. Vector Space Information Retrieval
- Task
- Document collection
- Query specifies information need: free text
- Relevance judgments: 0/1 for all docs
- Word evidence: Bag of words
- No ordering information
46. Vector Space Model

[Figure: two documents and a query as vectors along the axes Tv, Program, Computer]

- Two documents: "computer program", "tv program"
- Query "computer program" matches the 1st doc exactly (distance 2 vs 0); query "educational program" matches both equally (distance 1)
47. Vector Space Model
- Represent documents and queries as
- Vectors of term-based features
- Features tied to occurrence of terms in collection
- E.g.
- Solution 1: Binary features: t = 1 if term present, 0 otherwise
- Similarity: number of terms in common
- Dot product
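A sketch of Solution 1, binary features with dot-product similarity; the toy vocabulary and documents echo the earlier "computer program" / "tv program" example:

```python
def binary_vector(text, vocabulary):
    """Binary term features: 1 if the term occurs in the text, 0 otherwise."""
    terms = set(text.lower().split())
    return [1 if term in terms else 0 for term in vocabulary]

def dot(u, v):
    """Similarity = number of terms in common = dot product of binary vectors."""
    return sum(a * b for a, b in zip(u, v))

vocab = ["computer", "educational", "program", "tv"]
doc1 = binary_vector("computer program", vocab)
doc2 = binary_vector("tv program", vocab)
query = binary_vector("computer program", vocab)
print(dot(query, doc1), dot(query, doc2))   # 2 1: the query matches doc1 better
```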
48. Vector Space Model II
- Problem: Not all terms are equally interesting
- E.g. "the" vs "dog" vs "Levow"
- Solution: Replace binary term features with weights
- Document collection: term-by-document matrix
- View as vectors in multidimensional space
- Nearby vectors are related
- Normalize for vector length
49. Vector Similarity Computation
- Similarity: Dot product
- Normalization
- Normalize weights in advance
- Normalize post-hoc
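A sketch of the post-hoc version, dividing the dot product by the vector lengths at comparison time; normalizing the weights in advance gives the same result:

```python
from math import sqrt

def cosine_similarity(u, v):
    """Dot product normalized for vector length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```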
50. Term Weighting
- Aboutness
- To what degree is this term what the document is about?
- Within-document measure
- Term frequency (tf): occurrences of t in doc j
- Specificity
- How surprised are you to see this term?
- Collection frequency
- Inverse document frequency (idf)
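The slides name tf and idf but not an exact formula; one common tf-idf formulation, sketched for illustration:

```python
from math import log

def tf_idf(term, doc, collection):
    """Weight = term frequency in the document * inverse document frequency.
    tf captures aboutness; idf captures specificity (rarer terms score higher)."""
    tf = doc.count(term)                              # occurrences of term in doc j
    df = sum(1 for d in collection if term in d)      # documents containing the term
    return tf * log(len(collection) / df) if df else 0.0

docs = [["computer", "program"], ["tv", "program"]]
print(tf_idf("computer", docs[0], docs))   # > 0: specific to one document
print(tf_idf("program", docs[0], docs))    # 0: appears in every document
```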
51. Term Selection & Formation
- Selection
- Some terms are truly useless
- Too frequent, no content
- E.g. the, a, and, ...
- Stop words: ignore such terms altogether
- Creation
- Too many surface forms for same concepts
- E.g. inflections of words: verb conjugations, plurals
- Stem terms: treat all forms as the same underlying term
52. Efficient Implementations
- Classification cost
- Find nearest neighbor: O(n)
- Compute distance between unknown and all instances
- Compare distances
- Problematic for large data sets
- Alternative
- Use binary search to reduce to O(log n)
53. Efficient Implementation: K-D Trees
- Divide instances into sets based on features
- Binary branching: E.g. > value
- 2^d leaves with d split path = n
- d = O(log n)
- To split cases into sets
- If there is one element in the set, stop
- Otherwise pick a feature to split on
- Find average position of the two middle objects on that dimension
- Split remaining objects based on average position
- Recursively split subsets
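A sketch of that splitting procedure; cycling through features by depth is an assumption here, since the slide only says "pick a feature to split on":

```python
def build_kd_tree(cases, n_features, depth=0):
    """Recursively split cases on one feature at a time (k-d tree construction sketch)."""
    if len(cases) <= 1:
        return {"leaf": cases}                           # one element: stop
    feature = depth % n_features                         # pick a feature to split on
    ordered = sorted(cases, key=lambda c: c[feature])
    mid = len(ordered) // 2
    # Average position of the two middle objects on that dimension
    threshold = (ordered[mid - 1][feature] + ordered[mid][feature]) / 2
    left = [c for c in ordered if c[feature] <= threshold]
    right = [c for c in ordered if c[feature] > threshold]
    if not left or not right:                            # all values equal on this feature
        return {"leaf": ordered}
    return {"feature": feature, "threshold": threshold,
            "left": build_kd_tree(left, n_features, depth + 1),
            "right": build_kd_tree(right, n_features, depth + 1)}
```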
54. K-D Trees: Classification

[Figure: classifying a query by walking a k-d tree of yes/no feature tests down to a leaf labeled Good or Poor]
55. Efficient Implementation: Parallel Hardware
- Classification cost
- Distance computations
- Constant time if O(n) processors
- Cost of finding closest
- Compute pairwise minimum, successively
- O(log n) time
56. Nearest Neighbor Issues
- Prediction can be expensive if many features
- Affected by classification, feature noise
- One entry can change prediction
- Definition of distance metric
- How to combine different features
- Different types, ranges of values
- Sensitive to feature selection
57. Nearest Neighbor Analysis
- Issue
- What is a good distance metric?
- How should features be combined?
- Strategy
- (Typically weighted) Euclidean distance
- Feature scaling: Normalization
- Good starting point: (Feature - Feature_mean) / Feature_standard_deviation
- Rescales all values: centered on 0 with std_dev 1
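That normalization as a small sketch:

```python
from statistics import mean, stdev

def normalize(values):
    """(feature - feature_mean) / feature_std_dev: rescale to mean 0, std dev 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]
```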
58. Nearest Neighbor Analysis
- Issue
- What features should we use?
- E.g. Credit rating: many possible features
- Tax bracket, debt burden, retirement savings, etc.
- Nearest neighbor uses ALL of them
- Irrelevant feature(s) could mislead
- Fundamental problem with nearest neighbor
59. Nearest Neighbor Advantages
- Fast training
- Just record feature vector / output value pairs
- Can model wide variety of functions
- Complex decision boundaries
- Weak inductive bias
- Very generally applicable
60. Summary
- Machine learning
- Acquire function from input features to value
- Based on prior training instances
- Supervised vs Unsupervised learning
- Classification and Regression
- Inductive bias
- Representation of function to learn
- Complexity, Generalization, Validation