Title: Heuristic Search
1 Heuristic Search
- Foundations of Artificial Intelligence
2 Announcement
- Mirror Site Now Available
- http://facweb.cs.depaul.edu/mobasher/classes/CS480/
- The old site on maya.cs.depaul.edu will soon be decommissioned.
3 Topics
- Heuristic Search
- What is a heuristic?
- Best-First Search and Hill-Climbing
- A* Search
4 Heuristic Search
- Problem with uniform-cost search
- We only consider the cost so far, not the expected cost of getting to the goal node
- But we don't know beforehand the cost of getting to the goal from a given state
- Solution
- We need to estimate, for each state, the cost of getting from there to a goal state
- Use heuristic information to guess which nodes to expand next
- A heuristic takes the form of an evaluation function based on domain-specific information related to the problem
- It gives us a way to evaluate a node locally, based on an estimate of the cost to get from the node to a goal node (the idea is to find the least-cost path to a goal node)
5 Evaluation Functions
- h(n): the heuristic function
- g(n): the cost of the best path found so far between the initial node and n
- f(n) = h(n) → greedy best-first search
- f(n) = g(n) + h(n) → A* search
6 Best-First Search
- Basic idea: always expand the node that minimizes (or maximizes) the evaluation function f(n)
- Greedy strategy: f(n) = h(n), where h(n) estimates the cost of getting from node n to the goal
- If we keep nodes in memory (on the queue) for backtracking, this is called (Greedy) Best-First search; if there is no queue and we stop as soon as f(n) is worse for the children than for the parent, this is called Hill-Climbing (both are sketched in code below)
- What happens if we always try, at each step, to move closer to the goal node? Best-first search will in this case find the longer solution path, since it begins by moving forward and is then committed to this choice. What about hill-climbing?
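To make the greedy strategy concrete, here is a minimal Python sketch; the interface (a successors(state) function yielding (cost, next_state) pairs, a heuristic h(state), and a goal_test) is illustrative and not fixed by the slides.

```python
import heapq
from itertools import count

def greedy_best_first_search(start, goal_test, successors, h):
    """Greedy best-first: always expand the frontier node with the smallest h.

    successors(state) -> iterable of (step_cost, next_state) pairs
    h(state)          -> estimated cost from state to a goal
    States must be hashable (they are stored in a visited set).
    """
    tie = count()                       # tie-breaker so the heap never compares states
    frontier = [(h(start), next(tie), start, [start])]
    visited = set()                     # simple repeated-state checking
    while frontier:
        _, _, state, path = heapq.heappop(frontier)
        if goal_test(state):
            return path
        if state in visited:
            continue
        visited.add(state)
        for _cost, nxt in successors(state):
            if nxt not in visited:
                heapq.heappush(frontier, (h(nxt), next(tie), nxt, path + [nxt]))
    return None                         # frontier exhausted, no solution found
```

Replacing h with any evaluation function f turns this loop into the general best-first scheme described on the following slides.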
7 Hill-Climbing
Hill-climbing is simply a loop that continually moves in the direction of the best value; no search tree is maintained. One important refinement: when there is more than one best successor to choose from, the algorithm can select among them at random.
This simple policy has three well-known drawbacks:
1. Local maxima: the search can get stuck at a local maximum, as opposed to the global maximum.
2. Plateaus: an area of the search space where the evaluation function is flat, forcing a random walk.
3. Ridges: the slopes are steep, but the search direction points toward the side rather than toward the top.
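For contrast with the queued version above, a minimal hill-climbing sketch; the names successors and value are illustrative. Here value is maximized and ties among equally good successors are broken at random, as the slide suggests.

```python
import random

def hill_climbing(start, successors, value):
    """Keep moving to the best-valued neighbor; stop when no neighbor improves.

    successors(state) -> iterable of neighboring states
    value(state)      -> evaluation to be maximized
    Subject to local maxima, plateaus, and ridges, as noted on the slide.
    """
    current = start
    while True:
        neighbors = list(successors(current))
        if not neighbors:
            return current
        best_value = max(value(n) for n in neighbors)
        if best_value <= value(current):       # no uphill move left
            return current                     # possibly only a local maximum
        best = [n for n in neighbors if value(n) == best_value]
        current = random.choice(best)          # break ties at random
```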
8 Best-First Search
- The evaluation function f maps each search node n to a positive real number f(n)
- Traditionally, the smaller f(n), the more promising n
- Best-first search sorts the search queue at each step in increasing order of f
- A random order is assumed among nodes with equal values of f
9 Best-First Search
- The evaluation function f maps each search node n to a positive real number f(n)
- Traditionally, the smaller f(n), the more promising n
- Best-first search sorts the search queue at each step in increasing order of f
- A random order is assumed among nodes with equal values of f
"Best" only refers to the value of f, not to the quality of the actual path. Best-first search does not generate optimal paths in general.
10 Best-First Search Example (Romania)
- Suppose we don't know the actual distances beforehand, but can figure out the straight-line distances from a map
11 Best-First Search Example (Romania)
- Suppose we don't know the actual distances beforehand, but can figure out the straight-line distances from a map
- Heuristic evaluation function
- h(n) = straight-line distance between n and Bucharest
- h(n) is a heuristic because it is an estimate of the actual cost of getting from n to the goal
- Note that h(goal) = 0 always
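A small sketch of this heuristic as a lookup table in Python, using the straight-line-distance values labeled on the next few slides (only the cities shown there are included); the names SLD_TO_BUCHAREST and h are illustrative.

```python
# Straight-line distances to Bucharest, as labeled on the following slides
SLD_TO_BUCHAREST = {
    "Arad": 366, "Sibiu": 253, "Timisoara": 329, "Zerind": 374,
    "Fagaras": 178, "Oradea": 380, "Rimnicu": 193, "Bucharest": 0,
}

def h(city):
    """Heuristic: straight-line distance from city to the goal (Bucharest)."""
    return SLD_TO_BUCHAREST[city]

assert h("Bucharest") == 0   # h(goal) = 0 always
# Greedy best-first starting at Arad first expands its lowest-h neighbor, Sibiu:
assert min(["Sibiu", "Timisoara", "Zerind"], key=h) == "Sibiu"
```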
12 Greedy Best-First Search
[Search tree: the root is Arad, h(n) = 366]
Queue: < Arad <
13 Greedy Best-First Search
[Search tree: Arad (h = 366) expanded; its children are Sibiu (h = 253), Timisoara (h = 329), Zerind (h = 374)]
Queue: < Sibiu, Timisoara, Zerind <
14 Greedy Best-First Search
[Search tree: Sibiu (h = 253) is expanded next; its children are Fagaras (h = 178), Rimnicu (h = 193), Arad (h = 366), Oradea (h = 380)]
Queue: < Fagaras, Rimnicu, Timisoara, Zerind, Oradea <
15 Greedy Best-First Search
[Search tree: Fagaras (h = 178) is expanded next; its children are Sibiu (h = 253) and Bucharest (h = 0)]
Queue: < Bucharest, Rimnicu, Timisoara, Zerind, Oradea <
16 Greedy Best-First Search
[Search tree: Bucharest (h = 0) is at the front of the queue, so the search stops]
Actual cost of the solution Arad → Sibiu → Fagaras → Bucharest: 140 + 99 + 211 = 450. But consider the path Arad → Sibiu → Rimnicu → Pitesti → Bucharest, with cost 418. So we got a suboptimal solution.
17 Heuristics for 8-Puzzle Problem
- In total, there are 9! = 362,880 possible states.
- However, with a good heuristic function, it is possible to reduce the number of states examined to fewer than 50.
- Some possible heuristics for the 8-puzzle (see the sketch after this slide):
- h1(n) = number of misplaced tiles
- may have many plateaus (indistinguishable states); doesn't capture how far each tile is from its correct place
- h2(n) = sum of Manhattan distances (i.e., the number of squares each tile is from its desired location)
- doesn't capture the importance of sequencing tiles (putting them in the right order)
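A minimal sketch of h1 and h2 in Python, assuming states are 9-tuples in row-major order with 0 for the blank; the goal layout and the sample start state below are illustrative, not the boards pictured on the next slide.

```python
GOAL = (1, 2, 3, 8, 0, 4, 7, 6, 5)   # one common 8-puzzle goal layout (assumed)

def h1(state, goal=GOAL):
    """Number of misplaced tiles (the blank, 0, is not counted)."""
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

def h2(state, goal=GOAL):
    """Sum of Manhattan distances of every tile from its goal position."""
    total = 0
    for idx, tile in enumerate(state):
        if tile == 0:
            continue
        goal_idx = goal.index(tile)
        total += abs(idx // 3 - goal_idx // 3) + abs(idx % 3 - goal_idx % 3)
    return total

# Hypothetical start state, just to exercise the two functions:
start = (2, 8, 3, 1, 6, 4, 7, 0, 5)
print(h1(start), h2(start))   # misplaced-tile count and summed tile distances
```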
18 Heuristics for 8-Puzzle Problem
[Figure: a start state s and a goal state g of the 8-puzzle]
- s = start state, g = goal state
- h1(s) = 7
- h2(s) = 4 + 2 + 2 + 3 + 3 + 0 + 2 + 2 = 18
19 Part of the search tree generated by Best-First search using h2 (sum of Manhattan distances)
20 Heuristic Search in 8-Puzzle
[Figure: part of the search tree generated by Best-First search using h2 (sum of Manhattan distances), from the initial node to the goal node]
What will happen with hill-climbing?
21 Properties of Best-First (Greedy) Search
- Complete?
- No: it can get stuck in loops, e.g., Iasi → Neamt → Iasi → Neamt → ...
- It is complete in finite spaces with repeated-state checking
- Time complexity
- In the worst case O(b^m), but a good heuristic can give dramatic improvement
- Space complexity
- In the worst case O(b^m): keeps all nodes in memory
- Optimal? No
22 A* Search (the most popular algorithm in AI)
- Basic idea: avoid expanding paths that are already expensive
- Evaluation function: f(n) = g(n) + h(n) (a short sketch in code follows this slide)
- g(n) = cost so far to reach n
- h(n) = estimated cost from n to the goal
- f(n) = estimated total cost of the path through n to the goal
- Admissible heuristics: h(n) ≤ h*(n) for all n, where h*(n) is the true cost from n
- Example: straight-line distance never overestimates the actual road distance
- Example: h1 and h2 in the 8-puzzle never overestimate the actual number of moves
- A* search is optimal (finds the lowest-cost solution) if h(n) is admissible
- However, the number of nodes expanded depends on how good the heuristic is
- Best case: h(n) = h*(n) for all n → A* finds the best solution with no search
- If h(n) > h*(n) for some n, A* might still find a solution, but it is no longer guaranteed to be optimal
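A minimal A* sketch in Python, using the same illustrative interface as the greedy sketch earlier (successors yields (cost, next_state) pairs, h is the heuristic). The frontier is ordered by f(n) = g(n) + h(n), and the cheapest known g per state is tracked so that costlier duplicate entries are skipped.

```python
import heapq
from itertools import count

def a_star_search(start, goal_test, successors, h):
    """A*: expand the frontier node with the smallest f(n) = g(n) + h(n).

    Returns (path, cost) for the goal found, or None if the frontier empties.
    """
    tie = count()                                   # tie-breaker for the heap
    frontier = [(h(start), next(tie), 0, start, [start])]
    best_g = {start: 0}                             # cheapest known cost to each state
    while frontier:
        f, _, g, state, path = heapq.heappop(frontier)
        if goal_test(state):
            return path, g
        if g > best_g.get(state, float("inf")):     # stale queue entry, skip it
            continue
        for step_cost, nxt in successors(state):
            g2 = g + step_cost
            if g2 < best_g.get(nxt, float("inf")):  # found a cheaper path to nxt
                best_g[nxt] = g2
                heapq.heappush(frontier, (g2 + h(nxt), next(tie), g2, nxt, path + [nxt]))
    return None
```

With an admissible heuristic, the first goal node popped from the frontier is an optimal solution, which is exactly the behavior traced on the Romania slides that follow.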
23 A* Search
[Search tree: Arad (f = 366) expanded; its children are Sibiu (g = 140, h = 253, f = 393), Timisoara (g = 118, h = 329, f = 447), Zerind (g = 75, h = 374, f = 449). Sibiu is expanded next; its children are Rimnicu (f = 413), Fagaras (f = 417), Arad (f = 646), Oradea (f = 671). Then Rimnicu is expanded; its children are Pitesti (f = 415), Craiova (f = 526), Sibiu (f = 553)]
24 A* Search
[Search tree continued: Pitesti (f = 415) is expanded next; its children are Bucharest (f = 418), Rimnicu (f = 607), Craiova (f = 615)]
25 A* Search
[Search tree continued: Fagaras (f = 417) is expanded next; its children are Bucharest (f = 450) and Sibiu (f = 591). The cheapest node on the frontier is now Bucharest reached via Pitesti (f = 418)]
26 A* Search
[Search tree: Bucharest reached via Pitesti (f = 418) is expanded next; it is the goal, so A* stops with the optimal path Arad → Sibiu → Rimnicu → Pitesti → Bucharest of cost 418]
27 A* search for an instance of the 8-puzzle with h1 (number of misplaced tiles). g(n) assumes each move has a cost of 1. Here we assume repeated-state checking.
[Figure: the search tree, with f(n) = g(n) + h(n) labels on each node]
28 A* search for an instance of the 8-puzzle with h1 (number of misplaced tiles). g(n) assumes each move has a cost of 1. Here we assume repeated-state checking.
[Figure: the search tree with the order of expansion marked; f(n) = g(n) + h(n)]
29-33 A* search for an instance of the 8-puzzle with h1 (number of misplaced tiles), continued.
[These slides build up the same search tree step by step; f(n) = g(n) + h(n)]
34 A* search for an instance of the 8-puzzle with h1 (number of misplaced tiles). g(n) assumes each move has a cost of 1. Here we assume repeated-state checking.
[Figure: the completed search tree; f(n) = g(n) + h(n)]
Note: at level 2 there are two nodes with f(n) = 5. Depending on which of them we put in front of the queue, the algorithm will expand either 6 or 7 nodes. Here we have assumed the worst case, so the tree shows 7 nodes expanded.
35 A* and Repeated States
- The heuristic h is clearly admissible
36 A* and Repeated States
[Figure: the search generates a new node that revisits a previously reached state]
If we discard this new node, then the search algorithm expands the goal node next and returns a non-optimal solution.
37 A* and Repeated States
[Figure: the same example, keeping the node that revisits the state]
Instead, if we do not discard nodes revisiting states, the search terminates with an optimal solution.
38 A* and Repeated States
- It is not harmful to discard a node revisiting a state if the new path to this state has a higher cost than the previous one
- A* remains optimal, but the size of the search tree can still be exponential in the worst case
- Fortunately, for a large family of admissible heuristics (consistent heuristics), there is a much easier way of dealing with revisited states
39 A* and Consistency (Monotonicity)
- A heuristic h(n) is consistent if, for every node n and every successor succ(n) of n:
- h(n) ≤ h(succ(n)) + cost(n → succ(n))
- i.e., the decrease in heuristic value due to an action is never more than the cost of the action (a small check is sketched below)
- All consistent heuristics are admissible
- For a consistent heuristic, the values of f(n) along any path are non-decreasing (monotonicity)
- A* expands nodes in non-decreasing order of f(n)
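A tiny sketch of how the consistency inequality can be checked over an explicit set of edges; the graph, heuristic values, and names here are purely illustrative.

```python
def is_consistent(h, edges):
    """Check h(n) <= h(succ) + cost(n -> succ) for every edge.

    h     : dict mapping state -> heuristic value
    edges : iterable of (n, succ, cost) triples
    """
    return all(h[n] <= h[succ] + cost for n, succ, cost in edges)

# Tiny illustrative example (hypothetical numbers):
h = {"A": 5, "B": 3, "C": 0}
edges = [("A", "B", 2), ("B", "C", 4), ("A", "C", 6)]
print(is_consistent(h, edges))   # True: no edge violates the inequality
```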
40 A* and Consistency (Monotonicity)
[Figure: an 8-puzzle state N and its goal state]
- h1(N) = number of misplaced tiles
- h2(N) = sum of the (Manhattan) distances of every tile to its goal position
- Both heuristics are consistent
41 A* and Repeated States
Theorem: If h is consistent, then whenever A* expands a node, it has already found an optimal path to this node's state.
- Dealing with repeated states (sketched in code below):
- If a newly generated state was previously expanded, discard the new state
- If multiple (unexpanded) instances of a state end up on the queue, keep only the instance with the smallest f value
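A sketch of this repeated-state policy grafted onto an A* loop, assuming a consistent heuristic and the same illustrative interface as the earlier sketches; with an admissible but inconsistent heuristic, never re-expanding a state can return a suboptimal path, which is exactly the pitfall of slides 36-37.

```python
import heapq
from itertools import count

def a_star_consistent(start, goal_test, successors, h):
    """A* with the repeated-state policy from this slide (h assumed consistent)."""
    tie = count()
    frontier = [(h(start), next(tie), 0, start, [start])]
    best_f = {start: h(start)}      # smallest f seen per state (queue filter)
    expanded = set()                # states already expanded: discard revisits
    while frontier:
        f, _, g, state, path = heapq.heappop(frontier)
        if state in expanded:
            continue                # previously expanded, so discard the new node
        if goal_test(state):
            return path, g
        expanded.add(state)
        for step_cost, nxt in successors(state):
            if nxt in expanded:
                continue
            f2 = g + step_cost + h(nxt)
            if f2 < best_f.get(nxt, float("inf")):   # keep only the smallest-f copy
                best_f[nxt] = f2
                heapq.heappush(frontier, (f2, next(tie), g + step_cost, nxt, path + [nxt]))
    return None
```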
42 A* and Informedness
- Finding good heuristics for a problem
- Relax restrictions on the operators: in general, the cost of an exact solution to a relaxed problem is a good heuristic for the original problem
- E.g., the sum of Manhattan distances in the 8-puzzle gives the exact solution cost for the relaxed version of the problem in which a tile can move in any direction, even onto occupied squares
- Use statistical information from training examples to predict the correct heuristic value for the nodes (this may result in an inadmissible heuristic function)
43 A* and Informedness (Cont.)
- Multiple admissible heuristics?
- Given admissible heuristics h1 and h2, if h1(n) ≥ h2(n) for all n, then h1 dominates h2
- The dominating admissible heuristic usually expands fewer nodes (it is more informed)
- If there are several admissible heuristics, none of which dominates the others, we can take the composite heuristic h(n) = max(h1(n), h2(n), ..., hk(n)) (sketched below)
- A* efficiency and informedness
- A* expands every node n for which f(n) < f*
- i.e., every node with h(n) < f* - g(n) will be expanded
- So, if h1(n) ≥ h2(n) for all n, every node expanded with h1 must also be expanded with h2
Moral of the story: heuristic functions with higher values work better, so long as they are admissible (in other words, we want our heuristic to be as close as possible to h*).
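A one-function sketch of the composite heuristic; composite_heuristic and the component heuristics are illustrative names.

```python
def composite_heuristic(*heuristics):
    """Combine several admissible heuristics by taking their pointwise max.

    The max of admissible heuristics is itself admissible and dominates
    each component, so it is at least as informed as any of them.
    """
    return lambda n: max(h(n) for h in heuristics)

# Usage with the 8-puzzle heuristics sketched earlier (illustrative):
# h = composite_heuristic(h1, h2)
```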
44 Efficiency of A*
Comparison of search costs and effective branching factors for Iterative Deepening search and for the A* algorithm with h1 and h2 on the 8-puzzle. d is the average depth of the search tree. Data are averaged over 100 instances of the problem, for various solution lengths.
45 IDA* Algorithm
- A potential problem with A* is memory
- Since it reduces to breadth-first (uniform-cost) search when h = 0, it will potentially use memory that is exponential in the depth of the optimal goal node
- Iterative deepening can again help, but now we prune the nodes for which the nearest goal node can be shown to lie below the cutoff depth
- Note that individual iterations perform a depth-first search: the heuristic function is used to prune nodes, but not to determine the order of node expansion
46 IDA* Algorithm
- Sketch of the IDA* algorithm (a Python rendering follows):
- 1. Set c = 1; this is the current cutoff value.
- 2. Set L to be the list of initial nodes.
- 3. If L is empty, increment c and return to step 2.
- 4. Let n be the first node on L.
- 5. If n is a goal node, stop and return the path from the initial node to n.
- 6. Otherwise, remove n from L. Add to the front of L every child node n' of n for which f(n') ≤ c. Return to step 3.
47 When to Use Search Techniques?
- The search space is small, and
- no other technique is available, or
- developing a more efficient technique is not worth the effort
- The search space is large, and
- no other technique is available, and
- good heuristics exist
48 Exercise
- Consider the problem of solving a crossword puzzle
- The initial state is an empty board with some cells possibly blocked
- A goal state is a board configuration filled in with legal English words
- How can this problem be viewed as a search problem? What are the operators? How can we measure path costs? Etc.
- Assuming we have a dictionary of 100,000 words, what would be a good (uninformed) search strategy to use? Why?
- What might be some good heuristics to use for this problem?
- How well might hill-climbing strategies work in solving this problem? How can we handle the local-minima problem? Propose a solution and discuss its effectiveness.