Evolutionary Search - PowerPoint PPT Presentation

Description:

Evolutionary Search Artificial Intelligence CSPP 56553 January 28, 2004 – PowerPoint PPT presentation



1
Evolutionary Search
  • Artificial Intelligence
  • CSPP 56553
  • January 28, 2004

2
Agenda
  • Motivation
  • Evolving a solution
  • Genetic Algorithms
  • Modeling search as evolution
  • Mutation
  • Crossover
  • Survival of the fittest
  • Survival of the most diverse
  • Conclusions

3
Genetic Algorithms Applications
  • Search parameter space for optimal assignment
  • Not guaranteed to find the optimum, but can approach it
  • Classic optimization problems
  • E.g. Traveling Salesman Problem
  • Program design (Genetic Programming)
  • Aircraft carrier landings

4
Genetic Algorithms Procedure
  • Create an initial population (1 chromosome)
  • Mutate one or more genes in one or more chromosomes
  • Produce one offspring for each chromosome
  • Mate one or more pairs of chromosomes with crossover
  • Add mutated and offspring chromosomes to the population
  • Create new population
  • Best randomly selected (biased by fitness)
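The procedure above can be sketched in Python for the cookie example on the following slides. This is a minimal sketch under stated assumptions: the quality function (a single peak of 9 at flour = sugar = 5, chosen only because it agrees with the quality values on the Fitness slide), the ±1 mutation step, and all function names are hypothetical.

```python
import random

GENE_MIN, GENE_MAX = 1, 9  # each gene (flour, sugar) takes values 1-9
MAX_POP = 4                # population: 1 initial chromosome, at most 4


def quality(chrom):
    # Hypothetical cookie quality: a single peak of 9 at (5, 5).
    flour, sugar = chrom
    return 9 - abs(flour - 5) - abs(sugar - 5)


def mutate(chrom, rng):
    # Mutate one randomly selected gene by +/-1 (assumed step), clamped to 1-9.
    genes = list(chrom)
    i = rng.randrange(len(genes))
    genes[i] = min(GENE_MAX, max(GENE_MIN, genes[i] + rng.choice((-1, 1))))
    return tuple(genes)


def crossover(a, b):
    # Cross at the middle: flour gene from a, sugar gene from b.
    return (a[0], b[1])


def select_survivors(candidates, rng):
    # Standard method: pick survivors one at a time, without replacement,
    # with probability proportional to quality (quality is always >= 1 here).
    pool, survivors = list(candidates), []
    while pool and len(survivors) < MAX_POP:
        pick = rng.choices(pool, weights=[quality(c) for c in pool])[0]
        survivors.append(pick)
        pool.remove(pick)
    return survivors


def run_ga(generations, seed=0):
    rng = random.Random(seed)
    population = [(1, 1)]          # initial population: one chromosome
    best = population[0]
    for _ in range(generations):
        offspring = [mutate(c, rng) for c in population]  # one mutant per chromosome
        if len(population) >= 2:
            a, b = rng.sample(population, 2)              # random mates
            offspring.append(crossover(a, b))
        candidates = list(dict.fromkeys(population + offspring))  # no duplicates
        population = select_survivors(candidates, rng)
        best = max(population + [best], key=quality)      # best seen so far
    return best


print(run_ga(200))
```

Survivor selection here is purely fitness-biased (no elitism), so the sketch tracks the best chromosome seen rather than assuming it stays in the population.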

5
Fitness
  • Natural selection: Most fit survive
  • Fitness: Probability of survival to next generation
  • Question: How do we measure fitness?
  • Standard method: Relate fitness to quality
  • $f_i = q_i / \sum_j q_j$, so fitness lies in 0-1 while quality lies in 1-9

Chromosome  Quality  Fitness
1 4         4        0.4
3 1         3        0.3
1 2         2        0.2
1 1         1        0.1
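The standard method is a single normalization step. A minimal sketch (the function name is hypothetical) that reproduces the fitness column above and also shows why a quality of 0 is a problem later on the Moat slides:

```python
def standard_fitness(qualities):
    # Standard method: fitness_i = q_i / sum_j q_j
    total = sum(qualities)
    return [q / total for q in qualities]


print(standard_fitness([4, 3, 2, 1]))     # [0.4, 0.3, 0.2, 0.1]
print(standard_fitness([4, 3, 2, 1, 0]))  # [0.4, 0.3, 0.2, 0.1, 0.0]
```

A chromosome with quality 0 gets survival probability 0, so it can never be chosen even if it would be a useful stepping stone.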
6
Crossover
  • Genetic design
  • Identify sets of features: 2 genes
    (flour, sugar), each with values 1-9
  • Population: How many chromosomes?
  • 1 initial, 4 max
  • Mutation: How frequent?
  • 1 gene randomly selected, randomly mutated
  • Crossover: Allowed? Yes; select random mates,
    cross at middle
  • Duplicates? No
  • Survival: Standard method

7
Basic Cookie GA + Crossover Results
  • Results are for 1000 random trials
  • Initial state: 1 chromosome, 1-1, quality 1
  • On average, reaches max quality (9) in 14
    generations
  • Conclusion
  • Faster with crossover: combines the good values in each gene
  • Key: Global max achievable by maximizing each
    dimension independently - reduces dimensionality

8
Solving the Moat Problem
  • Problem
  • No single-step mutation can reach the optimal values
    using standard fitness (quality = 0 -> probability
    = 0)
  • Solution A
  • Crossover can combine fit parents in EACH gene
  • However, still slow: 155 generations on average

9
Questions
  • How can we avoid the 0 quality problem?
  • How can we avoid local maxima?

10
Rethinking Fitness
  • Goal: Explicit bias to best
  • Remove implicit biases based on quality scale
  • Solution: Rank method
  • Ignore actual quality values except for ranking
  • Step 1: Rank candidates by quality
  • Step 2: Probability of selecting the ith candidate,
    given that candidates 1 to i-1 were not selected, is
    a constant p
  • Step 2b: Last candidate is selected if no other
    has been
  • Step 3: Select candidates using the probabilities

11
Rank Method
Chromosome  Quality  Rank  Std. Fitness  Rank Fitness
1 4         4        1     0.4           0.667
1 3         3        2     0.3           0.222
1 2         2        3     0.2           0.074
5 2         1        4     0.1           0.025
7 5         0        5     0.0           0.012

Results: Average over 1000 random runs on the Moat
problem: 75 generations (vs 155 for the standard
method). No 0-probability entries; based on rank,
not absolute quality.
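The rank method's Step 2 and Step 2b can be sketched as follows. Assumption: p = 2/3, which reproduces the Rank Fitness column in the table above; the function name is hypothetical.

```python
def rank_fitness(qualities, p=2/3):
    # Step 1: rank candidates by quality, best first
    order = sorted(range(len(qualities)), key=lambda i: qualities[i], reverse=True)
    fitness = [0.0] * len(qualities)
    remaining = 1.0  # probability that no higher-ranked candidate was selected
    for rank, idx in enumerate(order):
        if rank == len(order) - 1:
            fitness[idx] = remaining      # Step 2b: last candidate takes what is left
        else:
            fitness[idx] = remaining * p  # Step 2: constant conditional probability p
            remaining *= 1 - p
    return fitness


print([round(f, 3) for f in rank_fitness([4, 3, 2, 1, 0])])
```

Only the order of the qualities matters, so the quality-0 chromosome still gets a small nonzero probability.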
12
Diversity
  • Diversity
  • Degree to which chromosomes exhibit different
    genes
  • Rank & Standard methods look only at quality
  • Need diversity: escape local maxima, variety for
    crossover
  • As good to be different as to be fit

13
Rank-Space Method
  • Combines diversity and quality in fitness
  • Diversity measure
  • Sum of inverse squared distances in genes
  • Diversity rank: Avoids inadvertent bias
  • Rank-space
  • Sort on sum of diversity AND quality ranks
  • Best: lower left, i.e. high diversity & quality

14
Rank-Space Method
Diversity measured w.r.t. the highest-ranked chromosome, 5-1

Chromosome  Q  D      D Rank  Q Rank  Comb Rank  R-S Fitness
1 4         4  0.04   1       1       1          0.667
3 1         3  0.25   5       2       4          0.025
1 2         2  0.059  3       3       2          0.222
1 1         1  0.062  4       4       5          0.012
7 5         0  0.05   2       5       3          0.074

Diversity rank breaks ties. After selecting others,
sum the inverse squared distances to all selected
chromosomes (e.g. to both after the second pick).
Results: Average (Moat) 15 generations.
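The rank-space computation can be sketched as below. Assumptions: p = 2/3 as in the rank method, and a smaller inverse-squared-distance sum counts as more diverse (rank 1), which is what the D Rank column implies; the function name is hypothetical. Chromosomes are assumed distinct from the selected ones (the no-duplicates rule), so no distance is zero.

```python
def rank_space(chroms, qualities, selected, p=2/3):
    n = len(chroms)

    def inv_sq_sum(c):
        # Diversity measure: sum of inverse squared distances to selected chromosomes
        return sum(1 / ((c[0] - s[0]) ** 2 + (c[1] - s[1]) ** 2) for s in selected)

    div = [inv_sq_sum(c) for c in chroms]
    # Smaller sum = farther from what is already selected = more diverse = rank 1
    d_rank = {i: r + 1 for r, i in enumerate(sorted(range(n), key=lambda i: div[i]))}
    q_rank = {i: r + 1 for r, i in enumerate(sorted(range(n), key=lambda i: -qualities[i]))}
    # Sort on the sum of both ranks; the diversity rank breaks ties
    order = sorted(range(n), key=lambda i: (d_rank[i] + q_rank[i], d_rank[i]))
    comb_rank, fitness, remaining = [0] * n, [0.0] * n, 1.0
    for r, i in enumerate(order):
        comb_rank[i] = r + 1
        # Same rank-to-probability rule as the rank method
        fitness[i] = remaining if r == n - 1 else remaining * p
        remaining *= 1 - p
    return comb_rank, fitness


ranks, fit = rank_space([(1, 4), (3, 1), (1, 2), (1, 1), (7, 5)], [4, 3, 2, 1, 0], [(5, 1)])
print(ranks, [round(f, 3) for f in fit])
```

With the slide's five chromosomes measured against 5-1, this yields the Comb Rank and R-S Fitness columns of the table.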
15
Genetic Algorithms
  • Evolution mechanisms as search technique
  • Produce offspring with variation
  • Mutation, Crossover
  • Select fittest to continue to next generation
  • Fitness: Probability of survival
  • Standard: Quality values only
  • Rank: Quality rank only
  • Rank-space: Rank of sum of quality & diversity
    ranks
  • Large population can be robust to local maxima

16
Machine Learning: Nearest Neighbor; Information
Retrieval Search
  • Artificial Intelligence
  • CSPP 56553
  • January 28, 2004

17
Agenda
  • Machine learning: Introduction
  • Nearest neighbor techniques
  • Applications: Robotic motion, Credit rating
  • Information retrieval search
  • Efficient implementations
  • k-d trees, parallelism
  • Extensions: K-nearest neighbor
  • Limitations
  • Distance, dimensions, irrelevant attributes

18
Machine Learning
  • Learning: Acquiring a function, based on past
    inputs and values, from new inputs to values.
  • Learn concepts, classifications, values
  • Identify regularities in data

19
Machine Learning Examples
  • Pronunciation
  • Spelling of word -> sounds
  • Speech recognition
  • Acoustic signals -> sentences
  • Robot arm manipulation
  • Target -> torques
  • Credit rating
  • Financial data -> loan qualification

20
Machine Learning Characterization
  • Distinctions
  • Are output values known for any inputs?
  • Supervised vs unsupervised learning
  • Supervised: training consists of inputs + true
    output value
  • E.g. letters + pronunciation
  • Unsupervised: training consists only of inputs
  • E.g. letters only
  • Course studies supervised methods

21
Machine Learning Characterization
  • Distinctions
  • Are output values discrete or continuous?
  • Discrete: Classification
  • E.g. Qualified/Unqualified for a loan application
  • Continuous: Regression
  • E.g. Torques for robot arm motion
  • Characteristic of task

22
Machine Learning Characterization
  • Distinctions
  • What form of function is learned?
  • Also called inductive bias
  • Graphically, decision boundary
  • E.g. Single, linear separator
  • Rectangular boundaries - ID trees
  • Voronoi spaces, etc.

23
Machine Learning Functions
  • Problem: Can the representation effectively model
    the class to be learned?
  • Motivates selection of learning algorithm

For this function, a linear discriminant is
GREAT! Rectangular boundaries (e.g. ID
trees) are TERRIBLE! Pick the right representation!

24
Machine Learning Features
  • Inputs
  • E.g. words, acoustic measurements, financial data
  • Vectors of features
  • E.g. word: letters
  • cat: L1 = c, L2 = a, L3 = t
  • Financial data: F1 = # late payments/yr: Integer
  • F2 = Ratio of income to
    expense: Real

25
Machine Learning Features
  • Question
  • Which features should be used?
  • How should they relate to each other?
  • Issue 1: How do we define relations in feature
    space if features have different scales?
  • Solution: Scaling/normalization
  • Issue 2: Which ones are important?
  • If instances differ only in an irrelevant feature,
    that difference should be ignored

26
Complexity & Generalization
  • Goal: Predict values accurately on new inputs
  • Problem
  • Train on sample data
  • Can make an arbitrarily complex model to fit
  • BUT, it will probably perform badly on NEW data
  • Strategy
  • Limit complexity of model (e.g. degree of equation)
  • Split training and validation sets
  • Hold out data to check for overfitting

27
Nearest Neighbor
  • Memory- or case-based learning
  • Supervised method: Training
  • Record labeled instances and feature-value
    vectors
  • For each new, unlabeled instance
  • Identify nearest labeled instance
  • Assign same label
  • Consistency heuristic: Assume that a property is
    the same as that of the nearest reference case.
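A minimal sketch of the training/prediction cycle just described, assuming Euclidean distance (the slides leave the metric open until later); the function name and the toy data are hypothetical.

```python
import math


def nearest_neighbor(training, query):
    """Return the label of the training instance closest to `query`.

    training: list of (feature_vector, label) pairs recorded during training.
    """
    best_label, best_dist = None, math.inf
    for features, label in training:
        d = math.dist(features, query)  # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


training = [((0.0, 0.0), "red"), ((5.0, 5.0), "blue")]
print(nearest_neighbor(training, (1.0, 2.0)))  # red
```

"Training" is nothing more than building the `training` list; all the work happens at prediction time.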

28
Nearest Neighbor Example
  • Problem: Robot arm motion
  • Difficult to model analytically
  • Kinematic equations
  • Relate joint angles and manipulator positions
  • Dynamics equations
  • Relate motor torques to joint angles
  • Difficult to achieve good results modeling
    robotic arms or human arm
  • Many factors & measurements

29
Nearest Neighbor Example
  • Solution
  • Move robot arm around
  • Record parameters and trajectory segment
  • Table: torques, positions, velocities, squared
    velocities, velocity products, accelerations
  • To follow a new path
  • Break into segments
  • Find closest segments in table
  • Get those torques (interpolate as necessary)
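The table lookup can be sketched as below. The slide does not say how to interpolate, so this sketch assumes inverse-distance weighting over the k closest rows; the function name, k, and the toy rows are hypothetical.

```python
import math


def lookup_torques(table, state, k=2):
    """Estimate torques for `state` from the k closest recorded rows.

    table: list of (state_vector, torque_vector) rows recorded while moving the arm.
    Interpolation is inverse-distance weighting (an assumed choice).
    """
    closest = sorted(table, key=lambda row: math.dist(row[0], state))[:k]
    dists = [math.dist(s, state) for s, _ in closest]
    if dists[0] == 0:
        return list(closest[0][1])  # exact match: use the recorded torques
    weights = [1 / d for d in dists]
    total = sum(weights)
    n_joints = len(closest[0][1])
    return [sum(w * t[j] for w, (_, t) in zip(weights, closest)) / total
            for j in range(n_joints)]


table = [((0.0, 0.0), (1.0, 2.0)), ((2.0, 0.0), (3.0, 4.0)), ((9.0, 9.0), (0.0, 0.0))]
print(lookup_torques(table, (1.0, 0.0)))  # [2.0, 3.0], halfway between the first two rows
```

"Practice" then amounts to appending newly measured rows to `table`, so later lookups find closer entries.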

30
Nearest Neighbor Example
  • Issue: Big table
  • First time with new trajectory
  • Closest isn't close
  • Table is sparse - few entries
  • Solution: Practice
  • As the arm attempts the trajectory, fill in more of the table
  • After a few attempts, very close

31
Roadmap
  • Problem
  • Matching Topics and Documents
  • Methods
  • Classic Vector Space Model
  • Challenge I: Beyond literal matching
  • Expansion Strategies
  • Challenge II: Authoritative source
  • PageRank
  • Hubs & Authorities

32
Matching Topics and Documents
  • Two main perspectives
  • Pre-defined, fixed, finite topics
  • Text Classification
  • Arbitrary topics, typically defined by statement
    of information need (aka query)
  • Information Retrieval

33
Three Steps to IR
  • Three phases
  • Indexing: Build collection of document
    representations
  • Query construction
  • Convert query text to vector
  • Retrieval
  • Compute similarity between query and doc
    representation
  • Return closest match

34
Matching Topics and Documents
  • Documents are about some topic(s)
  • Question: Evidence of aboutness?
  • Words!!
  • Possibly also meta-data in documents
  • Tags, etc
  • Model encodes how words capture topic
  • E.g. Bag of words model, Boolean matching
  • What information is captured?
  • How is similarity computed?

35
Models for Retrieval and Classification
  • Plethora of models are used
  • Here
  • Vector Space Model

36
Vector Space Information Retrieval
  • Task
  • Document collection
  • Query: specifies information need in free text
  • Relevance judgments: 0/1 for all docs
  • Word evidence: Bag of words
  • No ordering information

37
Vector Space Model
[Figure: documents and queries as vectors on the axes Tv, Program, Computer]
Two documents: "computer program", "tv program".
Query "computer program" matches the 1st doc
exactly: distance 0 vs 2. Query "educational
program" matches both equally: distance 1.
38
Vector Space Model
  • Represent documents and queries as
  • Vectors of term-based features
  • Features tied to occurrence of terms in
    collection
  • E.g.
  • Solution 1: Binary features: t = 1 if present, 0
    otherwise
  • Similarity: number of terms in common
  • Dot product
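A minimal sketch of Solution 1 on the two-document "computer program" / "tv program" example. The three-term vocabulary (the figure's axes) is an assumption, which is why "educational" contributes nothing; the function names are hypothetical.

```python
VOCAB = ["tv", "program", "computer"]  # assumed vocabulary: the three axes of the figure


def binary_vector(text):
    # Solution 1: feature is 1 if the term is present, 0 otherwise
    words = set(text.lower().split())
    return [1 if term in words else 0 for term in VOCAB]


def dot(u, v):
    # Similarity: number of terms in common
    return sum(a * b for a, b in zip(u, v))


def mismatches(u, v):
    # Terms in one vector but not the other (= squared Euclidean distance for 0/1 vectors)
    return sum((a - b) ** 2 for a, b in zip(u, v))


docs = {"doc1": binary_vector("computer program"), "doc2": binary_vector("tv program")}
for query in ("computer program", "educational program"):
    q = binary_vector(query)
    print(query, {name: (dot(q, d), mismatches(q, d)) for name, d in docs.items()})
```

Note that "educational program" scores identically against both documents: a frequent word like "program" dominates, which motivates term weighting.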

39
Question
  • What's wrong with this?

40
Vector Space Model II
  • Problem: Not all terms equally interesting
  • E.g. the vs dog vs Levow
  • Solution: Replace binary term features with
    weights
  • Document collection: term-by-document matrix
  • View as vector in multidimensional space
  • Nearby vectors are related
  • Normalize for vector length

41
Vector Similarity Computation
  • Similarity: Dot product
  • Normalization
  • Normalize weights in advance
  • Normalize post-hoc
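Both normalization options can be sketched in a few lines: dividing the dot product by the vector lengths afterwards (cosine similarity), or scaling each vector to unit length beforehand. The function names and example weights are hypothetical.

```python
import math


def cosine(u, v):
    # Post-hoc normalization: divide the dot product by both vector lengths
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0


def to_unit(u):
    # Normalizing in advance: scale to length 1 so a plain dot product is the cosine
    length = math.sqrt(sum(a * a for a in u))
    return [a / length for a in u] if length else list(u)


doc, query = [2.0, 0.0, 1.0], [1.0, 0.0, 0.0]
print(cosine(doc, query))
print(sum(a * b for a, b in zip(to_unit(doc), to_unit(query))))  # same value, up to rounding
```

Normalizing in advance costs one pass over the collection but makes every later comparison a plain dot product.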

42
Term Weighting
  • Aboutness
  • To what degree is this term what document is
    about?
  • Within document measure
  • Term frequency (tf): # occurrences of t in doc j
  • Specificity
  • How surprised are you to see this term?
  • Collection frequency
  • Inverse document frequency (idf)
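A sketch of combining the two measures into a single weight. The slide does not give the formula, so this assumes one common tf-idf variant, weight = tf(t, j) × log(N / df(t)), where N is the number of documents and df(t) the number containing t; the function name and documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # documents containing each term
    weights = []
    for doc in docs:
        tf = Counter(doc)                                     # occurrences of t in doc j
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights


docs = [["the", "dog", "barked"], ["the", "cat"], ["the", "dog", "dog"]]
for w in tf_idf(docs):
    print(w)
```

A term like "the" that appears in every document gets weight 0 (no surprise), while a rarer term gets a higher weight.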

43
Term Selection & Formation
  • Selection
  • Some terms are truly useless
  • Too frequent, no content
  • E.g. the, a, and, ...
  • Stop words: ignore such terms altogether
  • Creation
  • Too many surface forms for same concepts
  • E.g. inflections of words: verb conjugations,
    plural
  • Stem terms: treat all forms as the same underlying form
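A toy sketch of stop-word removal followed by stemming. The stop list, the irregular-form table, and the strip-a-final-"s" rule are all hypothetical simplifications; a real system would use a proper stemmer (e.g. Porter's) and a larger stop list.

```python
STOP_WORDS = {"the", "a", "an", "and", "of", "to"}  # tiny illustrative stop list
IRREGULAR = {"saw": "see", "seen": "see"}           # hypothetical lookup table


def normalize(tokens):
    out = []
    for tok in (t.lower() for t in tokens):
        if tok in STOP_WORDS:
            continue                 # stop words: ignore altogether
        if tok in IRREGULAR:
            tok = IRREGULAR[tok]     # irregular forms need a table
        elif tok.endswith("s") and len(tok) > 3:
            tok = tok[:-1]           # crude plural / 3rd-person stripping, not a real stemmer
        out.append(tok)
    return out


print(normalize(["The", "dogs", "and", "a", "cat"]))  # ['dog', 'cat']
print(normalize(["sees", "seen", "saw", "see"]))      # ['see', 'see', 'see', 'see']
```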

44
Key Issue
  • All approaches operate on term matching
  • If a synonym, rather than original term, is used,
    approach fails
  • Develop more robust techniques
  • Match concept rather than term
  • Expansion approaches
  • Add in related terms to enhance matching
  • Mapping techniques
  • Associate terms to concepts
  • Aspect models, stemming

45
Expansion Techniques
  • Can apply to query or document
  • Thesaurus expansion
  • Use a linguistic resource (thesaurus, WordNet) to
    add synonyms/related terms
  • Feedback expansion
  • Add terms that should have appeared
  • User interaction
  • Direct or relevance feedback
  • Automatic: pseudo relevance feedback

46
Query Refinement
  • Typical queries: very short, ambiguous
  • Cat: animal/Unix command
  • Add more terms to disambiguate, improve
  • Relevance feedback
  • Retrieve with original queries
  • Present results
  • Ask user to tag relevant/non-relevant
  • Push toward relevant vectors, away from non-relevant
  • $\vec{q}_{i+1} = \vec{q}_i + \frac{\beta}{R}\sum_{j=1}^{R}\vec{r}_j - \frac{\gamma}{S}\sum_{k=1}^{S}\vec{s}_k$
  • β + γ = 1 (e.g. 0.75, 0.25); r: relevant docs, s: non-relevant docs
  • Rocchio expansion formula
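A minimal sketch of one Rocchio update with β = 0.75 and γ = 0.25. Vectors are plain lists of term weights; the function name and toy vectors are hypothetical, and variants that clip negative weights to 0 are not modeled.

```python
def rocchio(query, relevant, nonrelevant, beta=0.75, gamma=0.25):
    # q' = q + (beta/R) * sum of relevant vectors - (gamma/S) * sum of non-relevant vectors
    new = list(query)
    for docs, coef in ((relevant, beta), (nonrelevant, -gamma)):
        if docs:
            for j in range(len(new)):
                new[j] += coef * sum(d[j] for d in docs) / len(docs)
    return new


q = [1, 0, 0]
print(rocchio(q, relevant=[[0, 1, 0], [0, 1, 1]], nonrelevant=[[0, 0, 1]]))
```

The third term is shared by one relevant and one non-relevant document, so it gains only a little weight, while the second term, present in both relevant documents, gains the most.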

47
Compression Techniques
  • Reduce surface term variation to concepts
  • Stemming
  • Map inflectional variants to root
  • E.g. see, sees, seen, saw -> see
  • Crucial for highly inflected languages: Czech,
    Arabic
  • Aspect models
  • Matrix representations typically very sparse
  • Reduce dimensionality to small key aspects
  • Mapping contextually similar terms together
  • Latent semantic analysis

48
Authoritative Sources
  • Based on vector space alone, what would you
    expect to get searching for "search engine"?
  • Would you expect to get Google?

49
Issue
  • Text isn't always the best indicator of content
  • Example
  • "search engine"
  • Text search -> review of search engines
  • Term doesn't appear on search engine pages
  • Term probably appears on many pages that point to
    many search engines

50
Hubs & Authorities
  • Not all sites are created equal
  • Finding better sites
  • Question: What defines a good site?
  • Authoritative
  • Not just content, but connections!
  • One that many other sites think is good
  • Site that is pointed to by many other sites
  • Authority

51
Conferring Authority
  • Authorities rarely link to each other
  • Competition
  • Hubs
  • Relevant sites point to prominent sites on topic
  • Often not prominent themselves
  • Professional or amateur
  • Good hubs <-> good authorities (mutually reinforcing)

52
Computing HITS
  • Finding Hubs and Authorities
  • Two steps
  • Sampling
  • Find potential authorities
  • Weight-propagation
  • Iteratively estimate best hubs and authorities

53
Sampling
  • Identify potential hubs and authorities
  • Connected subsections of web
  • Select root set with standard text query
  • Construct base set
  • All nodes pointed to by root set
  • All nodes that point to root set
  • Drop within-domain links
  • 1000-5000 pages

54
Weight-propagation
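  • Weights
  • Authority weight: x_p
  • Hub weight: y_p
  • All weights are relative (normalized)
  • Updating: $x_p \leftarrow \sum_{q \to p} y_q$, $y_p \leftarrow \sum_{p \to q} x_q$
  • Converges
  • Pages with high x: good authorities; high y: good hubs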
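A minimal sketch of weight propagation on a small link graph. The graph, the iteration count, and unit-length normalization (one way to keep "all weights relative") are assumptions; the function name is hypothetical.

```python
import math


def hits(links, iterations=50):
    """links: dict mapping each page to the list of pages it points to."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    x = {p: 1.0 for p in pages}  # authority weights
    y = {p: 1.0 for p in pages}  # hub weights
    for _ in range(iterations):
        # Authority update: x_p = sum of hub weights of pages that point to p
        x = {p: 0.0 for p in pages}
        for q, targets in links.items():
            for p in targets:
                x[p] += y[q]
        # Hub update: y_p = sum of authority weights of pages that p points to
        y = {p: sum(x[q] for q in links.get(p, ())) for p in pages}
        # Only relative weights matter: rescale each vector to unit length
        for w in (x, y):
            norm = math.sqrt(sum(v * v for v in w.values())) or 1.0
            for p in w:
                w[p] /= norm
    return x, y


links = {"h1": ["a1", "a2"], "h2": ["a1", "a2"], "h3": ["a1"]}
authority, hub = hits(links)
print(max(authority, key=authority.get), max(hub, key=hub.get))
```

Here a1 becomes the top authority because every hub points to it, and h1/h2 outrank h3 as hubs because they point to more good authorities.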

55
Google's PageRank
  • Identifies authorities
  • Important pages are those pointed to by many
    other pages
  • Better pointers, higher rank
  • Ranks search results
  • $PR(A) = (1 - d) + d\left(\frac{PR(t_1)}{C(t_1)} + \cdots + \frac{PR(t_n)}{C(t_n)}\right)$
  • t: page pointing to A; C(t): number of outbound
    links of t
  • d: damping measure
  • Actual ranking on logarithmic scale
  • Iterate
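A minimal sketch of iterating the PageRank formula. Assumptions: d = 0.85 (a commonly used value), a fixed iteration count, and no special handling of pages without outbound links; the function name and graph are hypothetical.

```python
def pagerank(links, d=0.85, iterations=50):
    """links: dict mapping each page t to the list of pages it points to."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    pr = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # PR(A) = (1 - d) + d * sum over pages t linking to A of PR(t) / C(t)
        new = {p: 1 - d for p in pages}
        for t, targets in links.items():
            if targets:
                share = d * pr[t] / len(targets)  # C(t) = number of outbound links of t
                for a in targets:
                    new[a] += share
        pr = new
    return pr


ranks = pagerank({"a": ["b"], "b": ["a"], "c": ["a"]})
print(sorted(ranks, key=ranks.get, reverse=True))
```

Page a, pointed to by both b and c, ends up highest; c, which nothing points to, keeps only the baseline 1 - d.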

56
Contrasts
  • Internal links
  • Large sites carry more weight
  • If well-designed
  • H&A (hubs & authorities) ignores site internals
  • Outbound links explicitly penalized
  • Lots of tweaks.

57
Web Search
  • Search by content
  • Vector space model
  • Word-based representation
  • Aboutness and Surprise
  • Enhancing matches
  • Simple learning model
  • Search by structure
  • Authorities identified by link structure of web
  • Hubs confer authority

58
Nearest Neighbor Example II
  • Credit Rating
  • Classifier: Good / Poor
  • Features
  • L: # late payments/yr
  • R: Income/Expenses

Name  L   R     G/P
A     0   1.2   G
B     25  0.4   P
C     5   0.7   G
D     20  0.8   P
E     30  0.85  P
F     11  1.2   G
G     7   1.15  G
H     15  0.8   P
59
Nearest Neighbor Example II
(Same training data as the previous slide.)
[Figure: instances A-H plotted with L (late payments/yr; ticks 10, 20, 30) on the horizontal axis and R (income/expenses; tick 1) on the vertical axis]
60
Nearest Neighbor Example II
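New instances to classify:
Name  L   R     G/P
I     6   1.15  G
J     22  0.45  P
K     15  1.2   ??
[Figure: I, J, K plotted among training instances A-H on the same L (horizontal) vs R (vertical) axes]
Distance measure: scaled distance
$\sqrt{(L_1 - L_2)^2 + \sqrt{10}\,(R_1 - R_2)^2}$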
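A sketch of 1-nearest-neighbor classification on the credit data with a weighted (scaled) distance. Because the exact scaling in the source is hard to read, the weight on the squared R difference is exposed as a hypothetical parameter `r_weight` rather than fixed; the function names are also hypothetical.

```python
import math

# Credit-rating training data: name -> (L late payments/yr, R income/expenses, class)
TRAINING = {
    "A": (0, 1.2, "G"), "B": (25, 0.4, "P"), "C": (5, 0.7, "G"), "D": (20, 0.8, "P"),
    "E": (30, 0.85, "P"), "F": (11, 1.2, "G"), "G": (7, 1.15, "G"), "H": (15, 0.8, "P"),
}


def scaled_distance(l1, r1, l2, r2, r_weight=1.0):
    # r_weight multiplies the squared R difference so that R, whose values span
    # only about 0.4-1.2, is not completely swamped by L (0-30)
    return math.sqrt((l1 - l2) ** 2 + r_weight * (r1 - r2) ** 2)


def classify(l, r, r_weight=1.0):
    # 1-nearest neighbor: return the closest training instance and its class
    name = min(TRAINING, key=lambda n: scaled_distance(
        l, r, TRAINING[n][0], TRAINING[n][1], r_weight))
    return name, TRAINING[name][2]


for label, l, r in [("I", 6, 1.15), ("J", 22, 0.45)]:
    print(label, classify(l, r), classify(l, r, r_weight=math.sqrt(10)))
```

Changing `r_weight` changes which neighbor is "closest" for borderline points, which is exactly why the choice of scaling matters.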
61
Efficient Implementations
  • Classification cost
  • Find nearest neighbor: O(n)
  • Compute distance between unknown and all
    instances
  • Compare distances
  • Problematic for large data sets
  • Alternative
  • Use binary search to reduce to O(log n)

62
Efficient Implementation K-D Trees
  • Divide instances into sets based on features
  • Binary branching: E.g. > value
  • 2^d leaves with split paths of depth d; 2^d = n
  • d = O(log n)
  • To split cases into sets,
  • If there is one element in the set, stop
  • Otherwise pick a feature to split on
  • Find average position of two middle objects on
    that dimension
  • Split remaining objects based on average position
  • Recursively split subsets
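The splitting procedure above, plus a nearest-neighbor search that uses the tree, can be sketched as follows. Assumptions: the feature to split on simply cycles through the dimensions, and the search backtracks into the other subtree when the splitting line is closer than the best match so far (which keeps the answer exact). Function names are hypothetical.

```python
import math


def build_kd(points, depth=0):
    """points: non-empty list of (feature_tuple, label). Leaves are single instances."""
    if len(points) == 1:
        return points[0]                                  # one element: stop
    dim = depth % len(points[0][0])                       # assumed: cycle through features
    pts = sorted(points, key=lambda p: p[0][dim])
    mid = len(pts) // 2
    # Threshold = average position of the two middle objects on this dimension
    threshold = (pts[mid - 1][0][dim] + pts[mid][0][dim]) / 2
    return (dim, threshold, build_kd(pts[:mid], depth + 1), build_kd(pts[mid:], depth + 1))


def nearest_kd(node, query, best=None):
    if len(node) == 2:                                    # leaf: (features, label)
        if best is None or math.dist(node[0], query) < math.dist(best[0], query):
            return node
        return best
    dim, threshold, left, right = node                    # internal node
    near, far = (left, right) if query[dim] < threshold else (right, left)
    best = nearest_kd(near, query, best)
    # Visit the other side only if the splitting line is closer than the best so far
    if abs(query[dim] - threshold) < math.dist(best[0], query):
        best = nearest_kd(far, query, best)
    return best


data = [((0, 1.2), "G"), ((25, 0.4), "P"), ((5, 0.7), "G"), ((20, 0.8), "P"),
        ((30, 0.85), "P"), ((11, 1.2), "G"), ((7, 1.15), "G"), ((15, 0.8), "P")]
tree = build_kd(data)
print(nearest_kd(tree, (6, 1.15)))
```

For well-spread data most queries follow a single root-to-leaf path of length O(log n) and only occasionally backtrack.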

63
K-D Trees Classification
[Figure: k-d tree for the credit-rating data; internal nodes are Yes/No threshold tests on a feature, leaves are labeled Good or Poor]
64
Efficient Implementation: Parallel Hardware
  • Classification cost
  • # distance computations
  • Constant time if O(n) processors
  • Cost of finding closest
  • Compute pairwise minimum, successively
  • O(log n) time
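A sequential simulation of the pairwise-minimum scheme. Within a round all comparisons are independent, so with enough processors each round takes constant time, and about log2(n) rounds suffice. The function name is hypothetical; it returns the minimum and the number of rounds used.

```python
def tree_min(values):
    # Each round compares disjoint pairs; on parallel hardware a round is one step
    level, rounds = list(values), 0
    while len(level) > 1:
        nxt = [min(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element passes through to the next round
        level, rounds = nxt, rounds + 1
    return level[0], rounds


print(tree_min([5, 3, 8, 1, 9, 2, 7, 4]))  # (1, 3): 8 distances, 3 rounds
```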

65
Nearest Neighbor Issues
  • Prediction can be expensive if many features
  • Affected by classification, feature noise
  • One entry can change prediction
  • Definition of distance metric
  • How to combine different features
  • Different types, ranges of values
  • Sensitive to feature selection

66
Nearest Neighbor Analysis
  • Problem
  • Ambiguous labeling, training noise
  • Solution
  • K-nearest neighbors
  • Not just single nearest instance
  • Compare to K nearest neighbors
  • Label according to majority of K
  • What should K be?
  • Often 3; can be learned (tuned) as well
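A small sketch of majority voting among the K nearest neighbors, showing how one noisy training point flips the 1-NN answer but not the 3-NN answer. The data points are made up for illustration, Euclidean distance is assumed, and ties are not handled specially.

```python
import math
from collections import Counter


def knn(training, query, k=3):
    # Take the k closest labeled instances and return the majority label
    neighbors = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


training = [((0.0, 0.0), "G"), ((1.0, 0.0), "G"), ((0.5, 0.2), "P"), ((5.0, 5.0), "P")]
print(knn(training, (0.4, 0.1), k=1))  # P: the single noisy point wins
print(knn(training, (0.4, 0.1), k=3))  # G: outvoted by two good neighbors
```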

67
Nearest Neighbor Analysis
  • Issue
  • What is a good distance metric?
  • How should features be combined?
  • Strategy
  • (Typically weighted) Euclidean distance
  • Feature scaling: Normalization
  • Good starting point
  • (Feature - Feature_mean) / Feature_standard_deviation
  • Rescales all values: centered on 0 with std. dev. 1
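The normalization formula applied column by column, sketched with the standard library. Using the population standard deviation (`statistics.pstdev`) is an assumption; the sample standard deviation would work similarly. The function name is hypothetical.

```python
import statistics


def zscore_columns(rows):
    # (feature - feature_mean) / feature_standard_deviation, per feature column
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    stdevs = [statistics.pstdev(c) for c in cols]
    return [[(v - m) / s if s else 0.0 for v, m, s in zip(row, means, stdevs)]
            for row in rows]


# L and R values for four credit-rating instances: very different raw scales
scaled = zscore_columns([[0, 1.2], [25, 0.4], [5, 0.7], [20, 0.8]])
for row in scaled:
    print([round(v, 2) for v in row])
```

After rescaling, a one-unit difference means "one standard deviation" in every feature, so L no longer dominates the distance just because its numbers are larger.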

68
Nearest Neighbor Analysis
  • Issue
  • What features should we use?
  • E.g. Credit rating Many possible features
  • Tax bracket, debt burden, retirement savings,
    etc.
  • Nearest neighbor uses ALL
  • Irrelevant feature(s) could mislead
  • Fundamental problem with nearest neighbor

69
Nearest Neighbor Advantages
  • Fast training
  • Just record (feature vector, output value) pairs
  • Can model wide variety of functions
  • Complex decision boundaries
  • Weak inductive bias
  • Very generally applicable

70
Summary
  • Machine learning
  • Acquire function from input features to value
  • Based on prior training instances
  • Supervised vs Unsupervised learning
  • Classification and Regression
  • Inductive bias
  • Representation of function to learn
  • Complexity, Generalization, Validation

71
Summary Nearest Neighbor
  • Nearest neighbor
  • Training: record input vectors + output value
  • Prediction: closest training instance to new data
  • Efficient implementations
  • Pros: fast training, very general, little bias
  • Cons: distance metric (scaling), sensitivity to
    noise & extraneous features