Title: Evolutionary Search
1. Evolutionary Search
- Artificial Intelligence
- CMSC 25000
- January 25, 2007
2. Agenda
- Motivation
- Evolving a solution
- Genetic Algorithms
- Modelling search as evolution
- Mutation
- Crossover
- Survival of the fittest
- Survival of the most diverse
- Conclusions
3. Motivation: Evolution
- Evolution through natural selection
- Individuals pass on traits to offspring
- Individuals have different traits
- Fittest individuals survive to produce more offspring
- Over time, variation can accumulate
- Leading to new species
4. Simulated Evolution
- Evolving a solution
- Begin with population of individuals
- Individuals = candidate solutions (chromosomes)
- Produce offspring with variation
- Mutation: change features
- Crossover: exchange features between individuals
- Apply natural selection
- Select best individuals to go on to next generation
- Continue until satisfied with solution
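The vary-then-select loop above can be sketched directly. This is a minimal illustration rather than any particular published GA: quality, mutate, crossover, and satisfied stand in for the problem-specific pieces defined on the following slides.

```python
def evolve(initial_pop, quality, mutate, crossover, max_pop, satisfied, max_gen=1000):
    """Simulated evolution (sketch): produce varied offspring, keep the fittest."""
    population = list(initial_pop)
    for _ in range(max_gen):
        # Produce offspring with variation: mutation and (optional) crossover
        offspring = [mutate(c) for c in population]
        if crossover is not None:
            offspring += [child
                          for a in population for b in population if a is not b
                          for child in crossover(a, b)]
        # Natural selection: best individuals go on to the next generation
        candidates = list(dict.fromkeys(population + offspring))  # assumes hashable chromosomes
        candidates.sort(key=quality, reverse=True)
        population = candidates[:max_pop]
        if satisfied(population[0]):   # continue until satisfied with the solution
            break
    return population[0]
```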
5. Genetic Algorithms: Applications
- Search parameter space for optimal assignment
- Not guaranteed to find optimal, but can approach
- Classic optimization problems
- E.g. Travelling Salesman Problem
- Program design (Genetic Programming)
- Aircraft carrier landings
6. Genetic Algorithm Example
- Cookie recipes (Winston, AI, 1993)
- As evolving populations
- Individual = batch of cookies
- Quality: 0-9
- Chromosomes: 2 genes, 1 chromosome each
- Flour quantity, sugar quantity: 1-9
- Mutation
- Randomly select flour/sugar, +/- 1, within 1-9
- Crossover
- Split 2 chromosomes and rejoin, keeping both halves
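As a concrete sketch of the two variation operators just described, with a chromosome represented as a (flour, sugar) pair and values clamped to 1-9:

```python
import random

def mutate(chrom):
    """Randomly select flour or sugar and move it +/- 1, staying within 1-9."""
    gene = random.randrange(2)
    new = list(chrom)
    new[gene] = min(9, max(1, new[gene] + random.choice((-1, 1))))
    return tuple(new)

def crossover(a, b):
    """Split two chromosomes between the genes and rejoin, keeping both children."""
    return [(a[0], b[1]), (b[0], a[1])]

# e.g. mutate((1, 1)) -> (2, 1) or (1, 2); crossover((1, 4), (3, 1)) -> [(1, 1), (3, 4)]
```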
7. Fitness
- Natural selection: most fit survive
- Fitness = probability of survival to the next generation
- Question: How do we measure fitness?
- Standard method: Relate fitness to quality
- Quality 1-9 mapped to fitness 0-1
Chromosome   Quality   Fitness
1 4          4         0.4
3 1          3         0.3
1 2          2         0.2
1 1          1         0.1
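The fitness column above is the standard method computed as quality divided by total quality; a one-function sketch:

```python
def standard_fitness(qualities):
    """Standard method: fitness = quality / total quality = probability of survival."""
    total = sum(qualities)
    return [q / total for q in qualities]

# Reproduces the table: standard_fitness([4, 3, 2, 1]) -> [0.4, 0.3, 0.2, 0.1]
```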
8. GA Design Issues
- Genetic design
- Identify sets of features (genes); constraints?
- Population: How many chromosomes?
- Too few -> inbreeding; too many -> too slow
- Mutation: How frequent?
- Too few -> slow change; too many -> wild
- Crossover: Allowed? How selected?
- Duplicates?
9. GA Design: Basic Cookie GA
- Genetic design
- Identify sets of features: 2 genes, flour/sugar 1-9
- Population: How many chromosomes?
- 1 initial, 4 max
- Mutation: How frequent?
- 1 gene randomly selected, randomly mutated
- Crossover: Allowed? No
- Duplicates? No
- Survival: Standard method
10. Basic Cookie GA Results
- Results are for 1000 random trials
- Initial state: 1 chromosome, 1-1, quality 1
- On average, reaches max quality (9) in 16 generations
- Best: max quality in 8 generations
- Conclusion
- Low dimensionality search
- Successful even without crossover
11. Basic Cookie GA + Crossover: Results
- Results are for 1000 random trials
- Initial state: 1 chromosome, 1-1, quality 1
- On average, reaches max quality (9) in 14 generations
- Conclusion
- Faster with crossover: combines good values in each gene
- Key: Global max achievable by maximizing each dimension independently - reduces dimensionality
12. Solving the Moat Problem
- Problem
- No single-step mutation can reach optimal values using standard fitness (quality 0 -> probability 0)
- Solution A
- Crossover can combine fit parents in EACH gene
- However, still slow: 155 generations on average
13. Questions
- How can we avoid the 0 quality problem?
- How can we avoid local maxima?
14. Rethinking Fitness
- Goal: Explicit bias to the best
- Remove implicit biases based on quality scale
- Solution: Rank method
- Ignore actual quality values except for ranking
- Step 1: Rank candidates by quality
- Step 2: Probability of selecting the ith candidate, given that candidates 1..(i-1) were not selected, is a constant p
- Step 2b: The last candidate is selected if no other has been
- Step 3: Select candidates using these probabilities
15. Rank Method
Chromosome   Quality   Rank   Std. Fitness   Rank Fitness
1 4          4         1      0.4            0.667
1 3          3         2      0.3            0.222
1 2          2         3      0.2            0.074
5 2          1         4      0.1            0.025
7 5          0         5      0.0            0.012

Results: Average over 1000 random runs on the Moat problem: 75 generations (vs 155 for the standard method). No 0-probability entries: based on rank, not absolute quality.
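The rank-fitness column is consistent with a constant selection probability p = 2/3 (inferred from the numbers above, not stated on the slide); a sketch of Steps 2-2b:

```python
def rank_fitness(n, p=2/3):
    """Rank method: P(rank i selected) = p * (1 - p)^(i - 1); last rank takes the remainder."""
    probs = [p * (1 - p) ** i for i in range(n - 1)]
    probs.append(1 - sum(probs))          # Step 2b: last candidate if no other selected
    return probs

# rank_fitness(5) -> [0.667, 0.222, 0.074, 0.025, 0.012] as in the table
```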
16. Diversity
- Diversity
- Degree to which chromosomes exhibit different genes
- Rank and standard methods look only at quality
- Need diversity: escape local maxima, variety for crossover
- As good to be different as to be fit
17. Rank-Space Method
- Combines diversity and quality in fitness
- Diversity measure
- Sum of inverse squared distances in gene space
- Diversity rank: Avoids inadvertent bias
- Rank-space
- Sort on sum of diversity AND quality ranks
- Best = lower left: high diversity AND quality
18. Rank-Space Method
W.r.t. highest ranked chromosome 5-1:

Chromosome   Q   D       D Rank   Q Rank   Comb Rank   R-S Fitness
1 4          4   0.04    1        1        1           0.667
3 1          3   0.25    5        2        4           0.025
1 2          2   0.059   3        3        2           0.222
1 1          1   0.062   4        4        5           0.012
7 5          0   0.05    2        5        3           0.074

Diversity rank breaks ties. After selecting others, sum distances to both. Results: Average (Moat problem): 15 generations.
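A sketch reproducing the table's combined ranking. It assumes, as the D column suggests, that diversity is the sum of inverse squared distances to the already-selected chromosomes, here just the highest-ranked chromosome (5, 1), and that a smaller sum means more diverse:

```python
def rank_space_order(chroms, quality, selected):
    """Rank-space method (sketch): sort on the sum of quality rank and diversity rank."""
    def inv_sq_dist_sum(c):
        # Sum of inverse squared distances to the already-selected chromosomes
        return sum(1.0 / sum((x - y) ** 2 for x, y in zip(c, s)) for s in selected)

    q_rank = {c: r for r, c in enumerate(sorted(chroms, key=lambda c: -quality[c]), 1)}
    d_rank = {c: r for r, c in enumerate(sorted(chroms, key=inv_sq_dist_sum), 1)}
    # Sum the two ranks; the diversity rank breaks ties
    return sorted(chroms, key=lambda c: (q_rank[c] + d_rank[c], d_rank[c]))

chroms = [(1, 4), (3, 1), (1, 2), (1, 1), (7, 5)]
quality = {(1, 4): 4, (3, 1): 3, (1, 2): 2, (1, 1): 1, (7, 5): 0}
print(rank_space_order(chroms, quality, selected=[(5, 1)]))
# -> [(1, 4), (1, 2), (7, 5), (3, 1), (1, 1)]: combined ranks 1..5 as in the table
```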
19. GAs and Local Maxima
- Quality metrics only
- Susceptible to local max problems
- Quality + Diversity
- Can populate all local maxima
- Including global max
- Key: Population must be large enough
20. GA Discussion
- Similar to stochastic local beam search
- Beam = population size
- Stochastic = selection and mutation
- Local = each generation produced from the single previous one
- Key difference: Crossover - 2 sources!
- Why crossover?
- Schema: Partial local subsolutions
- E.g. 2 halves of TSP tour
21. Question
- Traveling Salesman Problem
- CSP-style Iterative refinement
- Genetic Algorithm
- N-Queens
- CSP-style Iterative refinement
- Genetic Algorithm
22. Iterative Improvement Example
- TSP
- Start with some valid tour
- E.g. find greedy solution
- Make incremental change to tour
- E.g. hill-climbing: take the change that produces the greatest improvement
- Problem: Local minima
- Solution: Randomize to search other parts of the space
- Other methods: Simulated annealing, genetic algorithms
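A minimal sketch of this kind of iterative improvement for the TSP, using segment-reversal (2-opt style) moves as the incremental change and plain hill climbing; it stops at a local minimum, illustrating exactly the problem noted above:

```python
import itertools

def tour_length(tour, dist):
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def hill_climb_tsp(tour, dist):
    """Start from a valid tour; repeatedly take the segment reversal that gives
    the greatest improvement; stop when no change helps (a local minimum)."""
    improved = True
    while improved:
        improved = False
        best_len, best_tour = tour_length(tour, dist), tour
        for i, j in itertools.combinations(range(1, len(tour)), 2):
            candidate = tour[:i] + tour[i:j][::-1] + tour[j:]   # reverse one segment
            length = tour_length(candidate, dist)
            if length < best_len:
                best_len, best_tour, improved = length, candidate, True
        tour = best_tour
    return tour
```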
23. Machine Learning: Nearest Neighbor & Information Retrieval Search
- Artificial Intelligence
- CMSC 25000
- January 25, 2007
24. Agenda
- Machine learning: Introduction
- Nearest neighbor techniques
- Applications
- Credit rating
- Text Classification
- K-nn
- Issues
- Distance, dimensions, irrelevant attributes
- Efficiency
- k-d trees, parallelism
25. Machine Learning
- Learning: Acquiring a function, based on past inputs and values, from new inputs to values
- Learn concepts, classifications, values
- Identify regularities in data
26. Machine Learning Examples
- Pronunciation
- Spelling of word -> sounds
- Speech recognition
- Acoustic signals -> sentences
- Robot arm manipulation
- Target -> torques
- Credit rating
- Financial data -> loan qualification
27. Machine Learning: Characterization
- Distinctions
- Are output values known for any inputs?
- Supervised vs unsupervised learning
- Supervised: training consists of inputs + true output value
- E.g. letters + pronunciation
- Unsupervised: training consists only of inputs
- E.g. letters only
- Course studies supervised methods
28. Machine Learning: Characteristics
- Many machine learning techniques
- Supervised vs Unsupervised
- Supervised: Input + true labels
- Unsupervised: Input ONLY
- Classification vs Regression
- Classification: Output is from a finite label set
- Regression: Output is continuous-valued
- Decision Boundary
- What function is learned? Inductive Bias
- Linear, rectangular, Voronoi diagram
- Input features
- Discrete? Continuous? Which ones? Scaling?
29. Machine Learning: Characterization
- Distinctions
- Are output values discrete or continuous?
- Discrete: Classification
- E.g. Qualified/Unqualified for a loan application
- Continuous: Regression
- E.g. Torques for robot arm motion
- Characteristic of task
30. Machine Learning: Characterization
- Distinctions
- What form of function is learned?
- Also called inductive bias
- Graphically, decision boundary
- E.g. Single, linear separator
- Rectangular boundaries - ID trees
- Voronoi spaces, etc.
31. Machine Learning: Functions
- Problem: Can the representation effectively model the class to be learned?
- Motivates selection of learning algorithm
- For this function, a linear discriminant is GREAT! Rectangular boundaries (e.g. ID trees) are TERRIBLE!
- Pick the right representation!
32. Machine Learning: Features
- Inputs
- E.g. words, acoustic measurements, financial data
- Vectors of features
- E.g. word letters: "cat": L1 = c, L2 = a, L3 = t
- Financial data: F1 = late payments/yr (integer); F2 = ratio of income to expense (real)
33. Machine Learning: Features
- Question
- Which features should be used?
- How should they relate to each other?
- Issue 1: How do we define relations in feature space if features have different scales?
- Solution: Scaling/normalization
- Issue 2: Which ones are important?
- If instances differ only in an irrelevant feature, it should be ignored
34. Complexity & Generalization
- Goal Predict values accurately on new inputs
- Problem
- Train on sample data
- Can make arbitrarily complex model to fit
- BUT, will probably perform badly on NEW data
- Strategy
- Limit complexity of model (e.g. degree of equation)
- Split training and validation sets
- Hold out data to check for overfitting
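A small sketch of the hold-out strategy described above; the 0.2 split fraction is an arbitrary choice for illustration:

```python
import random

def holdout_split(examples, validation_fraction=0.2, seed=0):
    """Hold out part of the training data to check for overfitting on unseen inputs."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]   # (training set, validation set)
```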
35. Nearest Neighbor
- Memory- or case-based learning
- Supervised method: Training
- Record labeled instances and feature-value vectors
- For each new, unlabeled instance
- Identify nearest labeled instance
- Assign same label
- Consistency heuristic: Assume that a property is the same as that of the nearest reference case
36. Nearest Neighbor Example
- Problem: Robot arm motion
- Difficult to model analytically
- Kinematic equations
- Relate joint angles and manipulator positions
- Dynamics equations
- Relate motor torques to joint angles
- Difficult to achieve good results modeling robotic arms or the human arm
- Many factors and measurements
37. Nearest Neighbor Example
- Solution
- Move robot arm around
- Record parameters and trajectory segment
- Table: torques, positions, velocities, squared velocities, velocity products, accelerations
- To follow a new path
- Break into segments
- Find closest segments in table
- Get those torques (interpolate as necessary)
38. Nearest Neighbor Example
- Issue: Big table
- First time with new trajectory
- Closest isn't close
- Table is sparse - few entries
- Solution: Practice
- As attempt trajectory, fill in more of table
- After few attempts, very close
39. Nearest Neighbor Example
- Credit Rating
- Classifier: Good / Poor
- Features
- L: late payments/yr
- R: Income/Expenses
Name L R G/P
A 0 1.2 G
B 25 0.4 P
C 5 0.7 G
D 20 0.8 P
E 30 0.85 P
F 11 1.2 G
G 7 1.15 G
H 15 0.8 P
40. Nearest Neighbor Example

Name  L   R     G/P
A     0   1.2   G
B     25  0.4   P
C     5   0.7   G
D     20  0.8   P
E     30  0.85  P
F     11  1.2   G
G     7   1.15  G
H     15  0.8   P

[Plot: the labeled instances A-H in L-R feature space, L on the horizontal axis (0-30), R on the vertical axis]
41. Nearest Neighbor Example

Name  L   R     G/P
I     6   1.15  G
J     22  0.45  P
K     15  1.2   ??

[Plot: the new instances I, J, K plotted among the labeled cases A-H in L-R feature space]

Distance measure: sqrt((L1 - L2)^2 + (sqrt(10) * (R1 - R2))^2) - a scaled distance
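Putting the credit-rating example together, a sketch of the consistency heuristic using the scaled distance above; the labels for I and J match the table (K is left as the "??" case):

```python
from math import sqrt

# Labeled cases from the table: name -> (L, R, label)
cases = {"A": (0, 1.2, "G"), "B": (25, 0.4, "P"), "C": (5, 0.7, "G"),
         "D": (20, 0.8, "P"), "E": (30, 0.85, "P"), "F": (11, 1.2, "G"),
         "G": (7, 1.15, "G"), "H": (15, 0.8, "P")}

def scaled_distance(p, q):
    """The scaled distance above: sqrt((L1 - L2)^2 + (sqrt(10) * (R1 - R2))^2)."""
    return sqrt((p[0] - q[0]) ** 2 + (sqrt(10) * (p[1] - q[1])) ** 2)

def classify(query):
    """Consistency heuristic: copy the label of the nearest labeled case."""
    nearest = min(cases.values(), key=lambda c: scaled_distance(query, c))
    return nearest[2]

print(classify((6, 1.15)))   # I -> G
print(classify((22, 0.45)))  # J -> P
```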
42. Nearest Neighbor Analysis
- Problem
- Ambiguous labeling, training noise
- Solution
- K-nearest neighbors
- Not just the single nearest instance
- Compare to K nearest neighbors
- Label according to majority of K
- What should K be?
- Often 3; can also be tuned on training data
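A minimal K-nearest-neighbor sketch, assuming each case is a tuple of feature values followed by its label and reusing whatever distance function fits the features:

```python
from collections import Counter

def knn_classify(query, cases, distance, k=3):
    """Label the query by majority vote over its K nearest labeled instances."""
    nearest = sorted(cases, key=lambda case: distance(query, case[:-1]))[:k]
    votes = Counter(case[-1] for case in nearest)
    return votes.most_common(1)[0][0]
```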
43. Text Classification
44. Matching Topics and Documents
- Two main perspectives
- Pre-defined, fixed, finite topics
- Text Classification
- Arbitrary topics, typically defined by a statement of information need (aka query)
- Information Retrieval
45. Vector Space Information Retrieval
- Task
- Document collection
- Query specifies information need: free text
- Relevance judgments: 0/1 for all docs
- Word evidence: Bag of words
- No ordering information
46. Vector Space Model

[Figure: two documents and a query as vectors along the axes Tv, Program, Computer]

- Two documents: "computer program", "tv program"
- Query "computer program" matches the 1st doc exactly (distance 2 vs 0); query "educational program" matches both equally (distance 1)
47. Vector Space Model
- Represent documents and queries as
- Vectors of term-based features
- Features tied to occurrence of terms in collection
- E.g.
- Solution 1: Binary features: t = 1 if term present, 0 otherwise
- Similarity: number of terms in common
- Dot product
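A sketch of Solution 1, binary features with dot-product similarity; the toy vocabulary and documents echo the earlier "computer program" / "tv program" example:

```python
def binary_vector(text, vocabulary):
    """Binary term features: 1 if the term occurs in the text, 0 otherwise."""
    terms = set(text.lower().split())
    return [1 if term in terms else 0 for term in vocabulary]

def dot(u, v):
    """Similarity = number of terms in common = dot product of binary vectors."""
    return sum(a * b for a, b in zip(u, v))

vocab = ["computer", "educational", "program", "tv"]
doc1 = binary_vector("computer program", vocab)
doc2 = binary_vector("tv program", vocab)
query = binary_vector("computer program", vocab)
print(dot(query, doc1), dot(query, doc2))   # 2 1: the query matches doc1 better
```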
48. Vector Space Model II
- Problem: Not all terms are equally interesting
- E.g. "the" vs "dog" vs "Levow"
- Solution: Replace binary term features with weights
- Document collection: term-by-document matrix
- View as vectors in multidimensional space
- Nearby vectors are related
- Normalize for vector length
49. Vector Similarity Computation
- Similarity: Dot product
- Normalization
- Normalize weights in advance
- Normalize post-hoc
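A sketch of the post-hoc version, dividing the dot product by the vector lengths at comparison time; normalizing the weights in advance gives the same result:

```python
from math import sqrt

def cosine_similarity(u, v):
    """Dot product normalized for vector length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```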
50. Term Weighting
- Aboutness
- To what degree is this term what the document is about?
- Within-document measure
- Term frequency (tf): occurrences of t in doc j
- Specificity
- How surprised are you to see this term?
- Collection frequency
- Inverse document frequency (idf)
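The slides name tf and idf but not an exact formula; one common tf-idf formulation, sketched for illustration:

```python
from math import log

def tf_idf(term, doc, collection):
    """Weight = term frequency in the document * inverse document frequency.
    tf captures aboutness; idf captures specificity (rarer terms score higher)."""
    tf = doc.count(term)                              # occurrences of term in doc j
    df = sum(1 for d in collection if term in d)      # documents containing the term
    return tf * log(len(collection) / df) if df else 0.0

docs = [["computer", "program"], ["tv", "program"]]
print(tf_idf("computer", docs[0], docs))   # > 0: specific to one document
print(tf_idf("program", docs[0], docs))    # 0: appears in every document
```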
51. Term Selection & Formation
- Selection
- Some terms are truly useless
- Too frequent, no content
- E.g. the, a, and, ...
- Stop words: ignore such terms altogether
- Creation
- Too many surface forms for same concepts
- E.g. inflections of words: verb conjugations, plurals
- Stem terms: treat all forms as the same underlying term
52. Efficient Implementations
- Classification cost
- Find nearest neighbor: O(n)
- Compute distance between unknown and all instances
- Compare distances
- Problematic for large data sets
- Alternative
- Use binary search to reduce to O(log n)
53. Efficient Implementation: K-D Trees
- Divide instances into sets based on features
- Binary branching: E.g. > value
- 2^d leaves with d split path = n
- d = O(log n)
- To split cases into sets
- If there is one element in the set, stop
- Otherwise pick a feature to split on
- Find average position of the two middle objects on that dimension
- Split remaining objects based on average position
- Recursively split subsets
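A sketch of that splitting procedure; cycling through features by depth is an assumption here, since the slide only says "pick a feature to split on":

```python
def build_kd_tree(cases, n_features, depth=0):
    """Recursively split cases on one feature at a time (k-d tree construction sketch)."""
    if len(cases) <= 1:
        return {"leaf": cases}                           # one element: stop
    feature = depth % n_features                         # pick a feature to split on
    ordered = sorted(cases, key=lambda c: c[feature])
    mid = len(ordered) // 2
    # Average position of the two middle objects on that dimension
    threshold = (ordered[mid - 1][feature] + ordered[mid][feature]) / 2
    left = [c for c in ordered if c[feature] <= threshold]
    right = [c for c in ordered if c[feature] > threshold]
    if not left or not right:                            # all values equal on this feature
        return {"leaf": ordered}
    return {"feature": feature, "threshold": threshold,
            "left": build_kd_tree(left, n_features, depth + 1),
            "right": build_kd_tree(right, n_features, depth + 1)}
```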
54. K-D Trees: Classification

[Figure: classifying a query by walking a k-d tree of yes/no feature tests down to a leaf labeled Good or Poor]
55. Efficient Implementation: Parallel Hardware
- Classification cost
- Distance computations
- Constant time if O(n) processors
- Cost of finding closest
- Compute pairwise minimum, successively
- O(log n) time
56. Nearest Neighbor Issues
- Prediction can be expensive if many features
- Affected by classification, feature noise
- One entry can change prediction
- Definition of distance metric
- How to combine different features
- Different types, ranges of values
- Sensitive to feature selection
57. Nearest Neighbor Analysis
- Issue
- What is a good distance metric?
- How should features be combined?
- Strategy
- (Typically weighted) Euclidean distance
- Feature scaling: Normalization
- Good starting point: (Feature - Feature_mean) / Feature_standard_deviation
- Rescales all values: centered on 0 with std_dev 1
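That normalization as a small sketch:

```python
from statistics import mean, stdev

def normalize(values):
    """(feature - feature_mean) / feature_std_dev: rescale to mean 0, std dev 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]
```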
58. Nearest Neighbor Analysis
- Issue
- What features should we use?
- E.g. Credit rating: many possible features
- Tax bracket, debt burden, retirement savings, etc.
- Nearest neighbor uses ALL of them
- Irrelevant feature(s) could mislead
- Fundamental problem with nearest neighbor
59. Nearest Neighbor Advantages
- Fast training
- Just record feature vector / output value pairs
- Can model wide variety of functions
- Complex decision boundaries
- Weak inductive bias
- Very generally applicable
60. Summary
- Machine learning
- Acquire function from input features to value
- Based on prior training instances
- Supervised vs Unsupervised learning
- Classification and Regression
- Inductive bias
- Representation of function to learn
- Complexity, Generalization, Validation