Title: Artificial Intelligence: Informed search and exploration
1 Artificial Intelligence: Informed search and exploration
- Fall 2008
- Professor Luigi Ceccaroni
2 Heuristic search
- Heuristic function, h(n): it estimates the lowest cost from state n to a goal state.
- At each moment, the most promising states are selected.
- Finding a solution (if it exists) is not guaranteed.
- Types of heuristic search:
  - B&B (branch and bound), best-first search
  - A, A*
  - IDA*
  - Local search
    - hill climbing
    - simulated annealing
    - genetic algorithms
3 Branch and bound
- For each state, the cost of getting from the initial state to that state (g) is kept.
- The global minimum cost found so far is kept, too, and guides the search.
- A branch is abandoned if its cost is greater than the current minimum.
4 Best-first search
- Priority is given by the heuristic function (an estimation of the cost from a given state to a solution).
- At each iteration the node with the minimum heuristic value is chosen.
- An optimal solution is not guaranteed.
5 Greedy best-first search
6 Importance of the estimator
Operations:
- put a free block on the table
- put a free block on another free block
Estimator H1:
- add 1 for each block which is on the right block
- subtract 1 otherwise
Estimator H2:
- for each block, if the underlying structure is correct, add 1 for each block of that structure
- otherwise, subtract 1 for each block of the structure
Initial state: H1 = 4, H2 = -28
Final state: H1 = 8, H2 = 28 (= 7+6+5+4+3+2+1)
7 Initial state: H1 = 4, H2 = -28
[Figure: three successor states, with H1 and H2 values left to be computed.]
8 Initial state: H1 = 4, H2 = -28
[Figure: the three successors evaluated: H1 = 6, H2 = -21; H1 = 4, H2 = -15; H1 = 4, H2 = -16.]
9 Heuristic functions
Possible heuristic functions for the 8-puzzle:
- h(n) = w(n): number of misplaced tiles
- h(n) = p(n): sum of the distances of each tile to its final position
- h(n) = p(n) + 3·s(n), where s(n) is obtained by cycling over the non-central positions, adding 2 if a tile is not followed by the right one, and adding 1 if there is a tile in the center

Initial state:    Final state:
2 8 3             1 2 3
1 6 4             8 . 4
7 . 5             7 6 5
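As an illustration (not part of the original slides), here is a minimal Python sketch of the first two estimators, w(n) and p(n), assuming the board is encoded as a tuple of 9 entries read row by row, with 0 for the blank:

import itertools

GOAL = (1, 2, 3, 8, 0, 4, 7, 6, 5)  # final state from the slide

def w(state):
    """Number of misplaced tiles (the blank is not counted)."""
    return sum(1 for tile, goal in zip(state, GOAL)
               if tile != 0 and tile != goal)

def p(state):
    """Sum of Manhattan distances of each tile to its final position."""
    total = 0
    for i, tile in enumerate(state):
        if tile == 0:
            continue
        j = GOAL.index(tile)
        total += abs(i // 3 - j // 3) + abs(i % 3 - j % 3)
    return total

initial = (2, 8, 3, 1, 6, 4, 7, 0, 5)  # initial state from the slide
print(w(initial), p(initial))          # 4 misplaced tiles; Manhattan sum 5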
10 A and A* algorithms
11 A* algorithm
12 A* algorithm
- The priority is given by the estimation function f(n) = g(n) + h(n).
- At each iteration, the best estimated path is chosen (the first element in the queue).
- A* is an instance of the family of best-first search algorithms.
- A* is complete when the branching factor is finite and each operator has a constant, positive cost. (A minimal Python sketch follows.)
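The sketch below is one possible Python rendering of the above, under assumed interfaces: successors(s) yields (state, step_cost) pairs, states are hashable, and the structure of open nodes is a priority queue ordered by f(n) = g(n) + h(n). It also folds in the repeated-state treatment described on the next slide:

import heapq, itertools

def a_star(start, h, successors, is_goal):
    # The counter breaks ties in the heap so states need not be comparable.
    counter = itertools.count()
    open_heap = [(h(start), next(counter), start)]
    g = {start: 0}                        # best known g-cost per state
    parent = {start: None}
    closed = set()
    while open_heap:
        f, _, current = heapq.heappop(open_heap)
        if current in closed:
            continue                      # stale queue entry: skip it
        if is_goal(current):
            path = []
            while current is not None:    # rebuild the path via parents
                path.append(current)
                current = parent[current]
            return path[::-1]
        closed.add(current)
        for succ, cost in successors(current):
            new_g = g[current] + cost
            if new_g < g.get(succ, float("inf")):
                g[succ] = new_g           # cheaper path: (re)open the node
                parent[succ] = current
                closed.discard(succ)      # reopen closed nodes if g improved
                heapq.heappush(open_heap,
                               (new_g + h(succ), next(counter), succ))
    return None                           # no solution found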
13 A*: treatment of repeated states
- If the repeated state is in the structure of open nodes:
  - If its new cost (g) is lower, the new cost is used, possibly changing its position in the structure of open nodes.
  - If its new cost (g) is equal or greater, the node is forgotten.
- If the repeated state is in the structure of closed nodes:
  - If its new cost (g) is lower, the node is reopened and inserted in the structure of open nodes with the new cost. Nothing is done with its successors: they will be reopened if necessary.
  - If its new cost (g) is equal or greater, the node is forgotten.
14 Example: Romania with step costs in km
15 A* search example
16 A* search example
17 A* search example
18 A* search example
19 A* search example
20 A* search example
21 A* admissibility
- The A* algorithm, depending on the heuristic function, may or may not find an optimal solution.
- If the heuristic function is admissible, optimality is guaranteed.
- A heuristic function is admissible if it satisfies the following property: ∀n, 0 ≤ h(n) ≤ h*(n).
- h(n) has to be an optimistic estimator: it must never overestimate h*(n), the real cost of reaching a goal from n.
- Using an admissible heuristic function guarantees that a node on the optimal path never seems too bad and that it is considered at some point in time.
22Example no admissibility
- h3
- /\
- / \
- h2 h4
-
-
- h1 h1
-
-
- h1 goal
-
-
- h1
-
-
- goal
23 Optimality of A*
- A* expands nodes in order of increasing f value.
- It gradually adds "f-contours" of nodes.
- Contour i contains all nodes with f = fi, where fi < fi+1.
24 A1*, A2* admissible: ∀n ≠ final, 0 ≤ h2(n) < h1(n) ≤ h*(n) ⇒ A1* is more informed than A2*
26 More informed algorithms
- Compromise between:
  - Calculation time of h
    - h1(n) could require more calculation time than h2(n)!
  - Re-expansions
    - A1* may re-expand more nodes than A2*!
    - Not if A1* is consistent
    - Not if trees (instead of graphs) are considered
  - Loss of admissibility
    - Working with non-admissible heuristic functions can be interesting to gain speed.
27 [Figure: two 8-puzzle boards with tiles 1 3 2 / 4 · 8 / 5 6 7, used to trace the search with the heuristic estimators.]
29 [Figure: an 8-puzzle search trace, annotated "A3: not admissible".]
30 Memory-bounded search
- The A* algorithm solves problems in which it is necessary to find the best solution.
- Its cost in space and time, on average and if the heuristic function is adequate, is better than that of blind-search algorithms.
- There exist problems in which the size of the search space does not allow a solution with A*.
- There exist algorithms which make it possible to solve problems while limiting the memory used:
  - Iterative-deepening A* (IDA*)
  - Recursive best-first
  - Simplified memory-bounded A* (SMA*)
31 Iterative-deepening A* (IDA*)
- Iterative-deepening A* is similar to the iterative-deepening (ID) technique.
- In ID the limit is given by a maximum depth.
- In IDA* the limit is given by a maximum value of the f-cost.
- Important: the search is a standard depth-first search; the f-cost is used only to limit the expansion.
- Starting limit: f(initial).
[Figure: a search tree illustrating the deepening expansion limit; nodes are labeled with (g, h) pairs (0,2), (1,1), (1,2), (2,1), (2,1), (3,1), (3,1), (4,0), (4,1), (5,0), where the (4,0) and (5,0) nodes are goals.]
32 Iterative-deepening A* (IDA*)
[Figure: the same tree, now annotated with the iterations at which each node is expanded: the root at (1,3,8), its children at (2,4,9), (5,10) and (11), the next level at (6,12), (7,13), (14) and (15), showing the re-expansions across deepening rounds.]
33 IDA* algorithm
34 IDA* algorithm

Algorithm IDA*
  depth = f(Initial_state)
  Current = Initial_state
  While not is_final?(Current) do
    Open_states.insert(Initial_state)
    Current = Open_states.first()
    While not is_final?(Current) and not Open_states.empty?() do
      Open_states.delete_first()
      Closed_states.insert(Current)
      Successors = generate_successors(Current, depth)
      Successors = process_repeated(Successors, Closed_states, Open_states)
      Open_states.insert(Successors)
      Current = Open_states.first()
    eWhile
    depth = depth + 1
    Open_states.initialize()
  eWhile
eAlgorithm
- The function generate_successors only generates those successors with an f-cost less than or equal to the cutoff limit of the iteration.
- The OPEN structure is a stack (depth-first search).
- If repeated nodes are processed, there is no space saving.
- Only the current path (tree branch) is saved in memory.
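The following Python sketch mirrors the pseudocode under the same assumed interface as the A* sketch, with one common refinement: instead of increasing the limit by a fixed step, it raises it to the smallest f-cost that exceeded the cutoff:

def ida_star(start, h, successors, is_goal):
    limit = h(start)                      # starting limit: f(initial)
    while True:
        result, next_limit = _dfs(start, 0, limit, h, successors, is_goal)
        if result is not None:
            return result                 # path to a goal
        if next_limit == float("inf"):
            return None                   # search space exhausted
        limit = next_limit                # deepen to the smallest pruned f

def _dfs(state, g, limit, h, successors, is_goal):
    """Standard depth-first search; f is used only to limit the expansion."""
    f = g + h(state)
    if f > limit:
        return None, f                    # pruned: report f for the next round
    if is_goal(state):
        return [state], f
    smallest = float("inf")
    for succ, cost in successors(state):
        result, t = _dfs(succ, g + cost, limit, h, successors, is_goal)
        if result is not None:
            return [state] + result, t
        smallest = min(smallest, t)
    return None, smallest

Only the current path and its siblings' f-costs live on the call stack, which is what gives IDA* its linear space bound.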
35 Other memory-bounded algorithms
- IDA*'s re-expansions can represent a high temporal cost.
- There are algorithms which, in general, expand fewer nodes.
- Their functioning is based on eliminating less promising nodes and saving information which allows them to be re-expanded (if necessary).
- Examples:
  - Recursive best-first
  - Memory-bounded A* (MA*)
36 Recursive best-first
- It is a recursive implementation of best-first, with linear space cost.
- It forgets a branch when its cost is more than that of the best alternative.
- The cost of the forgotten branch is stored in the parent node as its new cost.
- The forgotten branch is re-expanded if its cost becomes the best one again. (A minimal Python sketch follows.)
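A minimal Python sketch of recursive best-first, under the same assumed interface as the earlier sketches; a branch is forgotten when its backed-up f-value exceeds the best alternative, and that value is stored in the parent:

import math

def rbfs(state, g, f_limit, h, successors, is_goal):
    """Returns (path, backed-up f-value); path is None on failure."""
    if is_goal(state):
        return [state], g + h(state)
    # each child is stored as [f, state, g]; f may be raised by backed-up costs
    children = [[max(g + c + h(s), g + h(state)), s, g + c]
                for s, c in successors(state)]
    if not children:
        return None, math.inf
    while True:
        children.sort(key=lambda child: child[0])
        best = children[0]
        if best[0] > f_limit:
            return None, best[0]          # forget this branch, back up its cost
        alternative = children[1][0] if len(children) > 1 else math.inf
        result, best[0] = rbfs(best[1], best[2], min(f_limit, alternative),
                               h, successors, is_goal)
        if result is not None:
            return [state] + result, best[0]

# Typical top-level call (hypothetical problem functions):
# path, _ = rbfs(start, 0, math.inf, h, successors, is_goal)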
37 Recursive best-first example
38 Recursive best-first example
39 Recursive best-first
- In general, it expands fewer nodes than IDA*.
- Since it cannot control repeated states, its cost in time can be high if there are loops.
- Sometimes, memory restrictions can be relaxed.
40 Memory-bounded A* (MA*)
- It imposes a memory limit: the number of nodes which can be stored.
- A* is used for exploration and nodes are stored while there is memory space.
- When there is no more space, the worst nodes are eliminated, keeping the best cost of the forgotten descendants.
- The forgotten branches are re-expanded if their cost becomes the best one again.
- MA* is complete if the solution path fits in memory.
41 Local search algorithms and optimization problems
- In local search (LS), from a (generally random) initial configuration, via iterative improvements (by applying operators), a state is reached from which no better state can be attained.
- LS algorithms (or meta-heuristics, or local optimization methods) are prone to finding local optima, which are not necessarily the best possible solution. The global optimum is generally impossible to reach in limited time.
- In LS, there is a function to evaluate the quality of the states, but this is not necessarily related to a cost.
42 Local search algorithms
43 Local search algorithms
- These algorithms do not systematically explore the whole state space.
- The heuristic (or evaluation) function is used to reduce the search space (states which are not worth exploring are not considered).
- The algorithms do not usually keep track of the path traveled, so the memory cost is minimal.
- This total lack of memory can be a problem (e.g., cycles).
44 Hill-climbing search
- Standard hill-climbing search algorithm:
  - It is a simple loop which searches for and selects any operation that improves the current state.
- Steepest-ascent hill climbing, or gradient search:
  - The best move (not just any one) that improves the current state is selected.
45 Steepest-ascent hill-climbing algorithm
46 Steepest-ascent hill-climbing algorithm

Algorithm Hill_Climbing
  Current = Initial_state
  end = false
  While not end do
    Children = generate_successors(Current)
    Children = order_and_eliminate_worse_ones(Children, Current)
    if not empty?(Children) then
      Current = Select_best(Children)
    else
      end = true
  eWhile
eAlgorithm
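The same loop as a minimal Python sketch, assuming successors(s) returns a list of states and that a higher evaluation is better:

def hill_climbing(initial, evaluate, successors):
    """Steepest-ascent hill climbing: always move to the best child that
    improves on the current state; stop at a local optimum or plateau."""
    current = initial
    while True:
        children = successors(current)
        if not children:
            return current
        best = max(children, key=evaluate)
        if evaluate(best) <= evaluate(current):
            return current          # local optimum (or plateau): stop
        current = best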
47 Hill climbing
- Children are considered only if their evaluation function is better than that of the parent (reduction of the search space).
- A stack could be used to save children which are better than the parent, to be able to backtrack, but in general the cost of this is prohibitive.
- The characteristics of the heuristic function determine the success and the speed of the search.
- It is possible that the algorithm does not find the best solution:
  - Local optima: no successor has a better evaluation.
  - Plateaux: all successors have the same evaluation.
48 Simulated annealing
- It is a stochastic hill-climbing algorithm (stochastic local search, SLS):
  - A successor is selected among all possible successors according to a probability distribution.
  - The successor can be worse than the current state.
  - Random steps are taken in the state space.
- It is inspired by the physical process of controlled cooling (crystallization, metal annealing):
  - A metal is heated up to a high temperature and then progressively cooled in a controlled way.
  - If the cooling is adequate, the minimum-energy structure (a global minimum) is obtained.
49 Simulated annealing
- Aim: to avoid local optima, which represent a problem in hill climbing.
50 Simulated annealing
- Solution: to take, occasionally, steps in a direction different from the one in which the increase (or decrease) of energy is maximum.
51 Simulated annealing: methodology
- Terminology from the physical problem is often used:
  - Temperature (T) is a control parameter.
  - Energy (f or E) is a heuristic function measuring the quality of a state.
  - A function F decides about the selection of a successor state and depends on T and on the difference in quality between the nodes (Δf): the lower the temperature, the lower the probability of selecting worse successors.
- The cooling strategy determines:
  - the maximum number of iterations in the search process
  - the temperature-decrease steps
  - the number of iterations for each step
52 Simulated annealing: basic algorithm
53 Simulated annealing: basic algorithm
- An initial temperature is defined.
- While the temperature is not zero do
  - /* Random moves in the state space */
  - For a predefined number of iterations do
    - Lnew = Generate_successor(Lcurrent)
    - if F(f(Lcurrent) - f(Lnew), T) > 0 then Lcurrent = Lnew
  - eFor
  - The temperature is decreased.
- eWhile
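A minimal Python sketch of this loop. The Metropolis rule exp(Δf/T) is an assumed choice for the slide's selection function F, and the parameter values are only illustrative:

import math, random

def simulated_annealing(initial, f, random_successor,
                        t_initial=100.0, cooling=0.95, iters_per_step=100):
    """Simulated annealing sketch (minimization of the energy f)."""
    current, t = initial, t_initial
    while t > 1e-3:                      # "while the temperature is not zero"
        for _ in range(iters_per_step):  # random moves in the state space
            new = random_successor(current)
            delta = f(current) - f(new)  # positive if the new state is better
            # always accept improvements; accept worse states with a
            # probability that shrinks as the temperature decreases
            if delta > 0 or random.random() < math.exp(delta / t):
                current = new
        t *= cooling                     # the temperature is decreased
    return current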
54 Simulated annealing
55 Simulated annealing
- Main idea: steps taken in random directions do not decrease (but actually increase) the ability to find a global optimum.
- Disadvantage: the very structure of the algorithm increases the execution time.
- Advantage: the random steps make it possible to avoid small hills.
- Temperature: it determines (through a probability function) the amplitude of the steps, long at the beginning and then shorter and shorter.
- Annealing: when the amplitude of the random step is sufficiently small not to allow the search to descend the hill under consideration, the result of the algorithm is said to be annealed.
56 Simulated annealing
- It is possible to demonstrate that, if the temperature of the algorithm is reduced very slowly, a global maximum will be found with a probability close to 1:
  - Let the value of the energy function (E) in the global maximum be m, and the value of E in the best local maximum be l < m.
  - There will be some temperature t high enough to allow the search to descend from the local maximum, but not from the global maximum.
  - If the temperature is reduced very slowly, the algorithm will work long enough with a temperature close to t, until it finds and ascends the global maximum; the solution will stay there because there won't be available steps long enough to descend from it.
57 Simulated annealing
- Conclusion: during the resolution of a search problem, a node should occasionally be explored which appears substantially worse than the best node in the L list of OPEN nodes.
58 Hill climbing and simulated annealing: applications
- Minimum spanning tree (MST, SST): a minimum-weight tree in a weighted graph which contains all of the graph's vertices.
- Traveling salesman problem (TSP): find a path through a weighted graph which starts and ends at the same vertex, includes every other vertex exactly once, and minimizes the total cost of the edges.
59 Simulated annealing applications: TSP
- Search space: N!
- Operators: define transformations between solutions (inversion, displacement, interchange).
- Energy function: sum of the distances between cities, ordered according to the solution.
- An initial temperature is defined. (It may need some experimentation.)
- The number of iterations per temperature value is defined.
- The way to decrease the temperature is defined.
60 Hill climbing and simulated annealing: applications
- Euclidean Steiner tree: a tree of minimum Euclidean distance connecting a set of points, called terminals, in the plane. This tree may include points other than the terminals.
- Steiner tree: a minimum-weight tree connecting a designated set of vertices, called terminals, in an undirected, weighted graph or points in a space. The tree may include non-terminals, which are called Steiner vertices or Steiner points.
61 Hill climbing and simulated annealing: applications
- Bin packing problem: determine how to put the most objects in the least number of fixed-space bins. More formally, find a partition and assignment of a set of objects such that a constraint is satisfied or an objective function is minimized (or maximized). There are many variants, such as 3D, 2D, linear, pack by volume, pack by weight, minimize volume, maximize value, fixed-shape objects.
- Knapsack problem: given items of different values and volumes, find the most valuable set of items that fit in a knapsack of fixed volume.
62 Simulated annealing: conclusions
- It is suitable for problems in which the global optimum is surrounded by many local optima.
- It is suitable for problems in which it is difficult to find a good heuristic function.
- Determining the values of the parameters can be a problem and requires experimentation.
63 Local beam search
- Keeping just one node in memory may be an extreme reaction to the problem of memory limits.
- Local beam search keeps track of k nodes (see the sketch below):
  - It starts with k randomly generated states.
  - At each step, all successors of the k states are generated.
  - It is checked whether some state is a goal.
  - If not, the k best successors from the complete list are selected and the process is repeated.
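A minimal Python sketch of local beam search, under an assumed interface where random_state() creates a state and a higher evaluate(s) is better:

def local_beam_search(k, random_state, evaluate, successors, is_goal,
                      max_steps=1000):
    """Keep the k best states at each step."""
    states = [random_state() for _ in range(k)]   # k random initial states
    for _ in range(max_steps):
        candidates = [succ for s in states for succ in successors(s)]
        if not candidates:
            break
        for c in candidates:
            if is_goal(c):
                return c
        # keep the k best successors from the complete list
        states = sorted(candidates, key=evaluate, reverse=True)[:k]
    return max(states, key=evaluate)              # best state found so far

Replacing the sorted-and-truncate step with a random choice weighted by the evaluation would give the stochastic variant described two slides below.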
64 Local beam search
- There is a difference between k random executions in parallel and in sequence!
- If a state generates various good successors, the algorithm quickly abandons unsuccessful searches and moves its resources to where major progress is being made.
- It can suffer from a lack of diversity among the k states (concentrated in a small region of the state space) and become an expensive version of hill climbing.
65 Stochastic beam search
- It is similar to local beam search.
- Instead of choosing the k best successors, k successors are chosen randomly:
  - The probability of choosing a successor grows with its fitness function.
- Analogy with the process of natural selection: successors (descendants) of a state (organism) populate the following generation as a function of their value (fitness, health, adaptability).
66 Genetic algorithms
- A genetic algorithm (GA) is a variant of stochastic beam search, in which two parent states are combined.
- Inspired by the process of natural selection:
  - Living beings adapt to the environment thanks to the characteristics inherited from their parents.
  - The possibilities of survival and reproduction are proportional to the goodness of these characteristics.
  - The combination of good individuals can produce better-adapted individuals.
67 Genetic algorithms
- Solving a problem via GAs requires:
  - A representation of the states (individuals):
    - Each individual is represented as a string over a finite alphabet (usually, a string of 0s and 1s).
  - A function which measures the fitness of the states.
  - Operators which combine states to obtain new states:
    - Cross-over and mutation operators.
  - The size of the initial population:
    - GAs start with a set of k randomly generated states.
  - A strategy to combine individuals.
68 Genetic algorithms: representation of individuals
- The coding of individuals defines the size of the search space and the kind of operators.
- If the state has to specify the position of n queens, each one in a column of n squares, n·log2(n) bits are required (24 bits for n = 8, as in the example).
- The state could also be represented as n digits in the range 1..n.
69 Genetic algorithms: fitness function
- A fitness function is calculated for each state.
- It should return higher values for better states.
- In the n-queens problem, the number of non-attacking pairs of queens can be used (a minimal sketch follows).
- In the case of 4 queens, the fitness function has a value of 6 for a solution.
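A minimal Python sketch of this fitness function, assuming the digit representation (one row index per column, e.g. (2, 4, 1, 3) for 4 queens):

from itertools import combinations

def fitness(state):
    """Number of non-attacking pairs of queens (higher is better)."""
    n = len(state)
    attacking = sum(1 for i, j in combinations(range(n), 2)
                    if state[i] == state[j]                 # same row
                    or abs(state[i] - state[j]) == j - i)   # same diagonal
    return n * (n - 1) // 2 - attacking

print(fitness((2, 4, 1, 3)))  # a 4-queens solution: 6 non-attacking pairs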
70 Genetic algorithms: operators
- The combination of individuals is carried out via cross-over operators.
- The basic operator consists of crossing over at one (random) point and combining using that point as a reference (see the sketch below).
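A minimal Python sketch of one-point crossover, together with the bit-flip mutation described on the next slide, for individuals represented as binary strings:

import random

def crossover(parent_a, parent_b):
    """Cross over at one random point, using that point as the reference."""
    point = random.randrange(1, len(parent_a))
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])

def mutate(individual, p_m=0.01):
    """Flip each bit with some (very low) probability p_m."""
    return "".join(bit if random.random() >= p_m else str(1 - int(bit))
                   for bit in individual)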
71 Genetic algorithms: operators
- Other possible exchange operators:
  - Crossing over at 2 points
  - Random exchange of bits
  - Ad hoc operators
- Mutation operators:
  - Analogy with gene combination: sometimes the information of part of the genes randomly changes.
  - A typical case (for binary strings) consists of flipping each bit with some (very low) probability.
72 Genetic algorithms: combination
- Each iteration of the search produces a new generation of individuals.
- The size of the population is in general constant (N).
- To get to the following population, the individuals that reproduce have to be chosen (the intermediate population).
73 Genetic algorithms: combination
- Selection of individuals:
  - Individuals are selected according to their fitness function value.
  - N random tournaments of pairs of individuals, choosing the winner of each one.
- There will always be individuals represented more than once, and other individuals not represented, in the intermediate population.
74 Genetic algorithms: algorithm
- Steps of the basic GA (see the sketch below):
  - N individuals from the current population are selected to form the intermediate population (according to some predefined criterion).
  - Individuals are paired, and for each pair:
    - The crossover operator is applied with probability P_c and two new individuals are obtained.
    - The new individuals are mutated with probability P_m.
  - The resulting individuals form the new population.
75 Genetic algorithms: algorithm
- The process is iterated until the population converges or a specific number of iterations has passed.
- The crossover probability influences the diversity of the new population.
- The mutation probability is always very low.
76 Genetic algorithms: 8-queens problem
- An 8-queens state must specify the positions of 8 queens, each in a column of 8 squares, and so requires 8 × log2(8) = 24 bits.
- Alternatively, the state can be represented as 8 digits, each in the range from 1 to 8.
- Each state is rated by the evaluation function, or fitness function.
- A fitness function should return higher values for better states; so, for the 8-queens problem, the number of non-attacking pairs of queens is used (28 for a solution).
77 Genetic algorithms: 8-queens problem
- The initial population in (a)
- is ranked by the fitness function in (b),
- resulting in pairs for mating in (c).
- They produce offspring in (d),
- which are subject to mutation in (e).
78 Genetic algorithms
- Like stochastic beam search, GAs combine:
  - an uphill tendency,
  - random exploration, and
  - an exchange of information among parallel search threads.
- The primary advantage of GAs comes from the crossover operation.
- Yet it can be shown mathematically that, if the genetic code is permuted initially in a random order, crossover conveys no advantage.
79 Genetic algorithms: applications
- In practice, GAs have had a widespread impact on optimization problems, such as:
  - circuit layout
  - scheduling
- At present, it is not clear whether the appeal of GAs arises:
  - from their performance, or
  - from their aesthetically pleasing origins in the theory of evolution.