Title: Origin of Heuristic Functions
1. Chapter 6: Origin of Heuristic Functions
2. Heuristics from Relaxed Models
- A heuristic function returns the exact cost of reaching a goal in a simplified or relaxed version of the original problem.
- This means that we remove some of the constraints of the problem.
- Removing constraints = adding edges.
3. Example: Road navigation
- A good heuristic: the straight-line distance.
- We remove the constraint that we have to move along the roads.
- We are allowed to move in a straight line between two points.
- We get a relaxation of the original problem.
- In fact, we added the edges of the complete graph.
4. Example 2 - The TSP problem
- We can describe the problem as a graph with 3 constraints:
  - 1) Our tour covers all the cities.
  - 2) Every node has degree two:
    - an edge entering the node and
    - an edge leaving the node.
  - 3) The graph is connected.
- If we remove constraint 2:
  - We get a spanning graph, and the optimal solution to this problem is the MST (Minimum Spanning Tree).
- If we remove constraint 3:
  - Now the graph need not be connected, and the optimal solution to this problem is the solution to the assignment problem.
5. Example 3 - The Tile Puzzle problem
- One of the constraints in this problem is that a tile can only slide into the position occupied by the blank.
- If we remove this constraint, we allow any tile to be moved to any horizontally or vertically adjacent position.
- The resulting heuristic is the sum of the tiles' Manhattan distances to their goal locations.
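To make the relaxed-model heuristic of this slide concrete, here is a minimal sketch of the Manhattan-distance heuristic for the tile puzzle. The dictionary representation of a state (tile number to board position) is an illustrative assumption, not the slides' notation.

```python
# Manhattan distance for the (width x width) tile puzzle: the exact solution cost
# of the relaxed problem in which tiles may move to any adjacent position.

def manhattan_distance(state, goal, width=4):
    """state and goal map each tile number to its board position (0..width*width-1)."""
    total = 0
    for tile, pos in state.items():
        if tile == 0:                      # the blank is ignored, as usual
            continue
        gpos = goal[tile]
        total += abs(pos % width - gpos % width) + abs(pos // width - gpos // width)
    return total

# Example: tile 1 at position 5 with goal position 1 contributes |1-1| + |1-0| = 1.
```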
6. The STRIPS problem formulation
- We would like to derive such heuristics automatically.
- In order to do that, we need a formal description language that is richer than the problem-space graph.
- One such language is called STRIPS.
- In this language we have predicates and operators.
- Let's see a STRIPS representation of the Eight Puzzle problem.
7. STRIPS - Eight Puzzle example
- On(x,y): tile x is in location y.
- Clear(z): location z is clear.
- Adj(y,z): location y is adjacent to location z.
- Move(x,y,z): move tile x from location y to location z.
- In the language we have:
  - A precondition list - for example, to execute Move(x,y,z) we must have:
    - On(x,y)
    - Clear(z)
    - Adj(y,z)
  - An add list - predicates that weren't true before the operator and are true after the operator was executed.
  - A delete list - a subset of the preconditions that are no longer true after the operator was executed.
8. STRIPS - Eight Puzzle example
- Now, in order to construct a simplified or relaxed problem, we only have to remove some of the preconditions.
- For example, by removing Clear(z) we allow tiles to move to any adjacent location.
- In general, the hard part is to identify which relaxed problems have the property that their exact solution can be computed efficiently.
9. Admissibility and Consistency
- The heuristics that are derived by this method are both admissible and consistent.
- Note: the cost in the simplified graph should be as close as possible to the cost in the original graph.
- Admissibility means that the lowest-cost path in the simplified graph has an equal or lower cost than the lowest-cost path in the original graph.
- Consistency: a heuristic h is consistent if, for every node n and every neighbor n' of n,
  h(n) <= c(n,n') + h(n'),
  where h(n) is the actual optimal cost of reaching a goal in the graph of the relaxed problem.
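The consistency condition above can be checked edge by edge. The tiny helper below is an assumed illustration, not part of the slides; it simply verifies the inequality on a list of edges.

```python
# Check h(n) <= c(n, n') + h(n') on every edge of a (small, explicit) graph.

def is_consistent(h, edges):
    """edges is an iterable of (n, n_prime, cost) triples; h maps a node to its estimate."""
    return all(h(n) <= cost + h(n_prime) for n, n_prime, cost in edges)

# Tiny example graph: A --1--> B --1--> Goal
h = {"A": 2, "B": 1, "Goal": 0}.get
edges = [("A", "B", 1), ("B", "Goal", 1)]
assert is_consistent(h, edges)   # 2 <= 1 + 1 and 1 <= 1 + 0
```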
10. Method 2: pattern databases
- A different method for abstracting and relaxing the problem in order to get a simplified problem.
- Invented in 1996 by Culberson & Schaeffer.
11. Optimal path search algorithms
- For small graphs provided explicitly: algorithms such as Dijkstra's shortest path, Bellman-Ford or Floyd-Warshall. Complexity: O(n^2).
- For very large graphs, which are implicitly defined: the A* algorithm, which is a best-first search algorithm.
12. Best-first search schema
- Sorts all generated nodes in an OPEN-LIST and chooses the node with the best cost value for expansion.
- generate(x): insert x into the OPEN-LIST.
- expand(x): delete x from the OPEN-LIST and generate its children.
- BFS depends on its cost (heuristic) function. Different functions cause BFS to expand different nodes.
[Figure: an OPEN-LIST of generated nodes with cost values between 20 and 40]
13. Best-first search: cost functions
- g(x): the real distance from the initial state to x.
- h(x): the estimated remaining distance from x to the goal state.
  - Examples: air distance, Manhattan distance.
- Different cost combinations of g and h:
  - f(x) = level(x): Breadth-First Search.
  - f(x) = g(x): Dijkstra's algorithm.
  - f(x) = h(x): Pure Heuristic Search (PHS).
  - f(x) = g(x) + h(x): the A* algorithm (1968).
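To tie the schema of slide 12 to the cost functions above, here is a minimal generic best-first search sketch (assumed code, not the slides'): the same OPEN-LIST loop becomes Dijkstra, PHS or A* depending only on the function f that ranks nodes.

```python
import heapq, itertools

def best_first_search(start, goal, successors, f):
    """successors(s) yields (child, edge_cost); f(g, s) ranks nodes in the OPEN-LIST."""
    counter = itertools.count()          # tie-breaker so states themselves are never compared
    open_list = [(f(0, start), next(counter), 0, start, [start])]
    closed = {}
    while open_list:
        _, _, g, state, path = heapq.heappop(open_list)
        if state == goal:
            return g, path
        if state in closed and closed[state] <= g:
            continue
        closed[state] = g
        for child, cost in successors(state):          # expand(state): generate children
            heapq.heappush(open_list,
                           (f(g + cost, child), next(counter), g + cost, child, path + [child]))
    return None

# Example cost functions (h is any heuristic on states):
# dijkstra = lambda g, s: g
# phs      = lambda g, s: h(s)
# a_star   = lambda g, s: g + h(s)
```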
14. A* (and IDA*)
- A* is a best-first search algorithm that uses f(n) = g(n) + h(n) as its cost function.
- f(x) in A* is an estimation of the shortest path to the goal via x.
- A* is admissible, complete and optimally effective [Pearl 84].
- Result: any other optimal search algorithm will expand at least all the nodes expanded by A*.
[Figure: the search frontiers of Breadth-First Search vs. A*]
15. Domains
- 15 puzzle:
  - 10^13 states
  - First solved by [Korf 85] with IDA* and Manhattan distance
  - Takes 53 seconds
- 24 puzzle:
  - 10^24 states
  - First solved by [Korf 96]
  - Takes two days
16. Domains
- Rubik's cube:
  - 10^19 states
  - First solved by [Korf 97]
  - Takes 2 days to solve
17. (n,k) Pancake Puzzle
- An array of N tokens (pancakes).
- Operators: the first k consecutive tokens can be reversed, for any k.
- The 17-pancake version has 10^13 states.
- The 20-pancake version has 10^18 states.
18. (n,k) Top Spin Puzzle
- n tokens arranged in a ring.
- States: any possible permutation of the tokens.
- Operators: any k consecutive tokens can be reversed.
- The (17,4) version has 10^13 states.
- The (20,4) version has 10^18 states.
19. 4-peg Towers of Hanoi (TOH4)
- Harder than the well-known 3-peg Towers of Hanoi.
- There is a conjecture about the length of the optimal path, but it has not been proven.
- Size: 4^k states for k disks.
20. How to improve search?
- Enhanced algorithms:
  - Perimeter search [Delinberg and Nilson 95]
  - RBFS [Korf 93]
  - Frontier search [Korf and Zhang 2003]
  - Breadth-first heuristic search [Zhou and Hansen 04]
  - They all try to better explore the search tree.
- Better heuristics: more parts of the search tree will be pruned.
21. Better heuristics
- In the 3rd millennium we have very large memories.
- We can build large tables.
- For enhanced algorithms: large open-lists or transposition tables. They store nodes explicitly.
- A more intelligent way is to store general knowledge. We can do this with heuristics.
22. Subproblems - Abstractions
- Many problems can be abstracted into subproblems that must also be solved.
- A solution to the subproblem is a lower bound on the entire problem.
- Example: Rubik's cube [Korf 97]
  - Problem: the 3x3x3 Rubik's cube.
  - Subproblem: the 2x2x2 corner cubies.
23. Pattern Database heuristics
- A pattern database (PDB) is a lookup table that stores solutions to all configurations of the sub-problem (the patterns).
- This PDB is used as a heuristic during the search.
[Figure: a mapping/projection from the search space (10^19 states) to the pattern space (88 million states)]
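A pattern database of the kind described above is typically built by a backward breadth-first search in the pattern space. The sketch below is generic illustrative code under the assumptions of unit-cost, invertible operators; `pattern_predecessors` and `project` are assumed helper names.

```python
from collections import deque

def build_pdb(goal_pattern, pattern_predecessors):
    """Distances of every pattern from the projected goal, via backward BFS."""
    pdb = {goal_pattern: 0}
    queue = deque([goal_pattern])
    while queue:
        p = queue.popleft()
        for q in pattern_predecessors(p):
            if q not in pdb:                 # first visit -> shortest distance
                pdb[q] = pdb[p] + 1
                queue.append(q)
    return pdb

# During the search, the heuristic of a full state s is pdb[project(s)],
# where project() is the mapping from the search space to the pattern space.
```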
24. Non-additive pattern databases
- Fringe pattern database [Culberson & Schaeffer 1996].
- Has only 259 million states.
- Improvement of a factor of 100 over Manhattan distance.
25. Example - 15 puzzle
[Figure: a 15-puzzle state and the goal state, highlighting tiles 2, 3, 6, 7 in locations 8, 12, 13, 14]
- How many moves do we need to move tiles 2,3,6,7 from locations 8,12,13,14 to their goal locations?
- The solution to this is located in PDB[8,12,13,14] = 18.
26. Example - 15 puzzle
[Figure: only the pattern tiles 2, 3, 6, 7 are shown; the other tiles are ignored]
- How many moves do we need to move tiles 2,3,6,7 from locations 8,12,13,14 to their goal locations?
- The solution to this is located in PDB[8,12,13,14] = 18.
27. Disjoint Additive PDBs (DADB)
- If you have many PDBs, take their maximum.
- Values of disjoint databases can be added and are still admissible [Korf & Felner, AIJ-02; Felner, Korf & Hanan, JAIR-04].
- Additivity can be applied if the cost of a subproblem is composed only of the costs of the objects from the corresponding pattern.
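A short sketch of the two ways of combining several PDB lookups mentioned above; the helpers `pdbs` and `projections` are illustrative assumptions, not the papers' code.

```python
# Taking the maximum of several PDB lookups is always admissible.
def max_of_pdbs(state, pdbs, projections):
    return max(pdb[proj(state)] for pdb, proj in zip(pdbs, projections))

# Adding is admissible only for disjoint patterns where each PDB counts
# only the moves of its own pattern objects.
def sum_of_disjoint_pdbs(state, pdbs, projections):
    return sum(pdb[proj(state)] for pdb, proj in zip(pdbs, projections))
```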
28. DADB: Tile puzzles
Partitionings: 5-5-5, 6-6-3, 7-8 (15 puzzle) and 6-6-6-6 (24 puzzle). [Korf, AAAI 2005]

Memory | Time | Nodes | Value | Heuristic | Puzzle
3 terabytes | 28 days | 10^13 | - | Breadth-FS | 15
0 | 53.424 | 401,189,630 | 36.942 | Manhattan | 15
3,145 | 0.541 | 3,090,405 | 41.562 | 5-5-5 | 15
33,554 | 0.163 | 617,555 | 42.924 | 6-6-3 | 15
576,575 | 0.034 | 36,710 | 45.632 | 7-8 | 15
242,000 | 2 days | 360,892,479,671 | - | 6-6-6-6 | 24
29. Heuristics for TOH
- Infinite peg heuristic (INP): each disk moves to its own temporary peg.
- Additive pattern databases [Felner, Korf & Hanan, JAIR-04].
30. Additive PDBs for TOH4
- Partition the disks into disjoint sets.
- Store the cost of the complete pattern space of each set in a pattern database.
- Add values from these PDBs for the heuristic value.
- The n-disk problem contains 4^n states.
- The largest database that we stored was of 14 disks, which needed 4^14 = 256MB.
[Figure: a 16-disk problem partitioned into disjoint groups of 6 and 10 disks]
31. TOH4 results

16 disks:
Seconds | Nodes | Avg h | h(s) | Solution | Heuristic
(memory full) | - | - | - | - | Infinite peg
48 | 134,653,232 | 75.78 | 102 | 161 | Static 13-3
14 | 36,479,151 | 89.10 | 114 | 161 | Static 14-2
21 | 12,872,732 | 95.52 | 114 | 161 | Dynamic 14-2

17 disks:
2,501 | 238,561,590 | 97.05 | 116 | 183 | Dynamic 14-3
- The difference between static and dynamic partitioning is covered in [Felner, Korf & Hanan, JAIR-04].
32. Best usage of memory
- Given 1 gigabyte of memory, how do we best use it with pattern databases?
- [Holte, Newton, Felner, Meshulam and Furcy, ICAPS-2004] showed that it is better to use many small databases and take their maximum than to use one large database.
- We will present a different (orthogonal) method [Felner, Meshulam & Holte, AAAI-04].
33. Compressing pattern databases [Felner et al., AAAI-04]
- Traditionally, each configuration of the pattern had a unique entry in the PDB.
- Our main claim:
  - Nearby entries in PDBs are highly correlated!!
- We propose to compress nearby entries by storing their minimum in one entry.
- We show that:
  - most of the knowledge is preserved.
- Consequences: memory is saved, larger patterns can be used, and a speedup in search is obtained.
34. Cliques in the pattern space
- The values in a PDB for a clique are d or d+1.
- In permutation puzzles, cliques exist when only one object moves to another location.
[Figure: a clique whose nodes are at distance d or d+1 from the goal G]
- Usually they have nearby entries in the PDB.
- Example: a clique in TOH4 within the array A[4][4][4][4][4].
35. Compressing cliques
- Assume a clique of size K with values d or d+1.
- Store only one entry (instead of K) for the clique, with the minimum d. We lose at most 1.
  - A[4][4][4][4][4] -> A[4][4][4][4][1]
- Instead of 4^p we need only 4^(p-1) entries.
- This can be generalized to a set of nodes with diameter D (for cliques, D=1).
  - A[4][4][4][4][4] -> A[4][4][4][1][1]
- In general, compressing by k disks reduces the memory requirement from 4^p to 4^(p-k).
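A sketch of this min-compression for a flat TOH4 array, under the assumption that the positions of the k compressed disks are the least-significant base-4 digits of the index; this is illustrative code, not the paper's implementation.

```python
# Compress a PDB of 4**p entries down to 4**(p-k) entries by storing the minimum
# of each group of 4**k consecutive indices; the result is still a lower bound.

def compress_pdb(pdb, k):
    group = 4 ** k
    return [min(pdb[i:i + group]) for i in range(0, len(pdb), group)]

def compressed_lookup(cpdb, index, k):
    """Look up an original index in the compressed table (admissible, loses at most D)."""
    return cpdb[index // (4 ** k)]
```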
36. TOH4 results: 16 disks (14+2)

Mem (MB) | Time | Nodes | D | Avg H | H(s) | PDB
256 | 14.34 | 36,479,151 | 0 | 87.03 | 116 | 14/0 + 2
64 | 14.69 | 37,964,227 | 1 | 86.48 | 115 | 14/1 + 2
16 | 15.41 | 40,055,436 | 3 | 85.67 | 113 | 14/2 + 2
4 | 16.94 | 44,996,743 | 5 | 84.44 | 111 | 14/3 + 2
1 | 17.36 | 45,808,328 | 9 | 82.73 | 107 | 14/4 + 2
0.256 | 23.78 | 61,132,726 | 13 | 80.84 | 103 | 14/5 + 2
- Memory was reduced by a factor of 1000!!! at a
cost of only a factor of 2 in the search effort.
37. TOH4: larger versions
- Lossless compression is not efficient in this domain.
Mem | Time | Nodes | Avg H | Type | PDB | Size
256 | >421 | >393,887,923 | 81.5 | static | 14/0 + 3 | 17
256 | 2,501 | 238,561,590 | 87.0 | dynamic | 14/0 + 3 | 17
256 | 83 | 155,737,832 | 103.7 | static | 15/1 + 2 | 17
256 | 7 | 17,293,603 | 123.8 | static | 16/2 + 1 | 17
256 | 463 | 380,117,836 | 123.8 | static | 16/2 + 2 | 18
- For the 17-disk problem a speedup of 3 orders of magnitude is obtained!!!
- The 18-disk problem can be solved in 5 minutes!!
38. Tile Puzzles
[Figure: the goal state of the 15 puzzle and a clique in its pattern space]
- Storing PDBs for the tile puzzle:
  - (Simple mapping) A multi-dimensional array: A[16][16][16][16][16], size 1.04MB.
  - (Packed mapping) A one-dimensional array: A[16*15*14*13*12], size 0.52MB.
- Time versus memory tradeoff!!
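To illustrate the two mappings just described, here is a sketch of the index computation for a 5-tile pattern. The ordering of tiles within the index is an assumption made for the example.

```python
# "Simple" mapping: treat the 5 tile locations as independent base-16 digits (16**5 entries).
def simple_index(locations):                # locations of the 5 pattern tiles, each 0..15
    idx = 0
    for loc in locations:
        idx = idx * 16 + loc
    return idx                              # 0 .. 16**5 - 1

# "Packed" mapping: enumerate only the 16*15*14*13*12 placements with distinct locations.
def packed_index(locations):
    idx, used = 0, []
    for i, loc in enumerate(locations):
        rank = loc - sum(1 for u in used if u < loc)   # rank among the still-free locations
        idx = idx * (16 - i) + rank
        used.append(loc)
    return idx                              # 0 .. 16*15*14*13*12 - 1
```

The packed mapping halves the memory but costs extra arithmetic per lookup, which is the time/memory tradeoff the slide refers to.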
39. 15 puzzle results
- A clique in the tile puzzle is of size 2.
- We compressed the last index by two:
  - A[16][16][16][16][8]

Avg H | Mem | Time | Nodes | Compress | Type | Lookups | PDB
44.75 | 576,575 | 0.081 | 136,288 | No | packed | 1 | 7-8
45.63 | 576,575 | 0.034 | 36,710 | No | packed | 2 | 7-8
43.64 | 57,657 | 0.232 | 464,977 | No | packed | 1 | 7-7-1
43.64 | 536,870 | 0.058 | 464,977 | No | simple | 1 | 7-7-1
43.02 | 268,435 | 0.069 | 565,881 | Yes | simple | 1 | 7-7-1
43.98 | 536,870 | 0.021 | 147,336 | Yes | simple | 2 | 7-7-1
44.92 | 536,870 | 0.016 | 66,692 | Yes | simple | 2 | 7-7-1
40. Dual lookups in pattern databases [Felner et al., IJCAI-04]
41. Symmetries in PDBs
- Symmetric lookups were already performed in the first PDB paper by Culberson & Schaeffer 96.
- Examples:
  - Tile puzzles: reflect the tiles about the main diagonal.
  - Rubik's cube: rotate the cube.
- We can take the maximum among the different lookups.
- These are all geometrical symmetries.
- We suggest a new type of symmetry!!
[Figure: tiles 7 and 8 and their reflection about the main diagonal]
42. Regular and dual representations
- Regular representation of a problem:
  - Variables: objects (tiles, cubies, etc.)
  - Values: locations
- Dual representation:
  - Variables: locations
  - Values: objects
43. Regular vs. dual lookups in PDBs
- Regular question:
  - Where are tiles 2,3,6,7, and how many moves are needed to gather them to their goal locations?
- Dual question:
  - Who are the tiles in locations 2,3,6,7, and how many moves are needed to distribute them to their goal locations?
44. Regular and dual lookups
- Regular lookup: PDB[8,12,13,14]
- Dual lookup: PDB[9,5,12,15]
45. Regular and dual lookups in TopSpin
- Regular lookup for C: PDB[1,2,3,7,6]
- Dual lookup for C: PDB[1,2,3,8,9]
46. Dual lookups
- Dual lookups are possible when there is a symmetry between locations and objects:
  - Each object is in only one location, and each location holds only one object.
- Good examples: TopSpin, Rubik's cube.
- Bad example: Towers of Hanoi.
- Problematic example: Tile Puzzles.
47. Inconsistency of dual lookups
- Consistency of heuristics: h(a) - h(b) <= c(a,b)
- Both lookups for B: PDB[1,2,3,4,5] = 0
- Regular lookup for C: PDB[1,2,3,7,6] = 1
- Dual lookup for C: PDB[1,2,3,8,9] = 2

State | Regular | Dual
b | 0 | 0
c | 1 | 2
48. Traditional Pathmax
- Children inherit the f-value from their parents if it makes them larger.
[Figure: a parent with g=1, h=4, f=5; due to inconsistency its child has g=2, h=2, f=4; pathmax lifts the child to g=2, h=3, f=5]
49. Bidirectional pathmax (BPMX)
[Figure: BPMX example on h-values - before propagation: a parent with h=2 and children with h=5 and h=1; after propagation: parent h=4, children h=5 and h=3]
- Bidirectional pathmax: h-values are propagated in both directions, decreasing by 1 along each edge.
- If the IDA* threshold is 2, then with BPMX the right child will not even be generated!!
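Below is a minimal sketch of BPMX inside one depth-first iteration of an IDA*-style search, under the assumptions of positive edge costs and an undirected (invertible-operator) space; `successors` and `h` are placeholder names, and this is not the authors' implementation.

```python
# One threshold-bounded DFS with BPMX: a child's large h-value is propagated up to the
# parent (minus the edge cost) and the raised parent value is pushed down to later siblings.

def bpmx_dfs(state, g, threshold, h, successors, goal):
    if state == goal:
        return True, 0
    h_val = h(state)
    if g + h_val > threshold:
        return False, h_val
    for child, cost in successors(state):
        child_h = max(h(child), h_val - cost)        # downward propagation of the parent's h
        if g + cost + child_h > threshold:
            h_val = max(h_val, child_h - cost)       # upward propagation (BPMX cutoff)
            continue
        found, child_best = bpmx_dfs(child, g + cost, threshold, h, successors, goal)
        if found:
            return True, 0
        h_val = max(h_val, child_best - cost)        # upward propagation from the subtree
        if g + h_val > threshold:                    # the parent itself can now be pruned
            return False, h_val
    return False, h_val
```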
50. Results: (17,4) TopSpin puzzle

Regular | Dual | BPMX | Nodes | Time
1 | 0 | - | 40,019,429 | 67.76
0 | 1 | no | 7,618,805 | 15.72
0 | 1 | yes | 1,397,614 | 2.93
4 | 4 | yes | 82,606 | 0.94
17 | 17 | yes | 27,575 | 1.34

- Nodes improvement (17r + 17d): a factor of 1,451.
- Time improvement (4r + 4d): a factor of 72.
- We also solved the (20,4) TopSpin version.
51. Results: Rubik's cube
- Data on 1000 states with 14 random moves.
- PDB of 7 edge cubies.

Regular | Dual | BPMX | Nodes | Time
1 | 0 | - | 90,930,662 | 28.18
0 | 1 | no | 19,653,386 | 7.38
0 | 1 | yes | 8,315,116 | 3.24
4 | 4 | yes | 615,563 | 0.51
24 | 24 | yes | 362,927 | 0.90

- Nodes improvement (24r + 24d): a factor of 250.
- Time improvement (4r + 4d): a factor of 55.
52. Results: Rubik's cube
- With duals we improved Korf's results on random instances by a factor of 1.5, using exactly the same PDBs.
53. Results: tile puzzles

Heuristic | BPMX | Value | Nodes | Time
Manhattan | - | 36.94 | 401,189,630 | 53.424
R | - | 44.75 | 136,289 | 0.081
R + R | - | 45.63 | 36,710 | 0.034
R + R + D + D | yes | 46.12 | 18,601 | 0.022

- With duals, the time for the 24 puzzle drops from 2 days to 1 day.
54. Discussion
- Results for TopSpin and Rubik's cube are better than those for the tile puzzles.
- Dual PDB lookups and BPMX cutoffs are more effective when each operator changes a larger part of the state.
- This is because the identities of the objects being queried in consecutive states change dramatically.
55. Summary
- Dual PDB lookups
- BPMX cutoffs for inconsistent heuristics
- State of the art solvers.
56. Future work
- More compression
- Duality in search spaces
- Which and how many symmetries to use
- Other sources of inconsistencies
- Better ways for propagating inconsistencies
57. Duality - Motivation
- What is the relation between state S and states S1 and S2?
[Figure: S = (3,1,4,2,5) and S1 = (5,2,4,1,3)]
- Geometrical symmetries!! Reversing and/or
rotating
[Figure: S2 = (1,4,2,5,3)]
- And what is the relation between S and S^d?
[Figure: S^d = (2,4,1,3,5)]
58. Symmetries in PDBs
- Symmetric lookups:
  - Tile puzzles: reflect the tiles about the main diagonal.
  - Rubik's cube: rotate the cube.
- We can take the maximum among the different lookups.
- These are all geometrical symmetries.
- We suggest a new type of symmetry!!
[Figure: tiles 7 and 8 and their reflection about the main diagonal]
59. Duality: definition 1
- Let S be a state.
- Let pi be a permutation such that pi(S) = G.
- Define S^d = pi(G).
- Consequences:
  - pi^(-1)(S^d) = G
  - The length of the optimal path from S to G and from S^d to G is identical.
[Figure: S and S^d are both at distance d from the goal G, related by pi and pi^(-1)]
- An admissible heuristic for S is also admissible for S^d.
60. Regular and dual representations
- Regular representation of a problem:
  - Variables: objects (tiles, cubies, etc.)
  - Values: locations
- Dual representation:
  - Variables: locations
  - Values: objects
61. Duality: definition 2
- Definition 2: for a state S, we flip the roles of variables and objects.
- Assume the vector <3,1,4,2>:
  - Regular representation: (3, 1, 4, 2)
  - Dual representation: (2, 4, 1, 3)
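A tiny sketch of definition 2: the dual state is obtained by flipping the roles of variables and values, i.e. by inverting the permutation (1-based values as in the slide's vector).

```python
def dual(state):
    """Return the dual of a permutation state: d[object] = location of that object."""
    d = [0] * len(state)
    for location, obj in enumerate(state, start=1):
        d[obj - 1] = location          # object 'obj' sits in 'location'
    return d

assert dual([3, 1, 4, 2]) == [2, 4, 1, 3]     # the slide's example
```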
62. Duality
- Claim: definition 1 and definition 2 are equivalent.
- Proof: assume that in S, object j is in location i, and that pi(i) = j.
- Applying pi for the first time (on S) will move object j to location j.
- Applying pi for the second time (on G) will move object i to location j.
63. Using duality
- Dual lookup: we can take the heuristic of the dual state and use it for the regular state.
- In particular, we can perform a PDB lookup for the dual state.
- Dual search:
  - A novel search algorithm which can be constructed from any known search algorithm.
64. Dual search
- When the search arrives at a state S, we also look at its dual state S^d.
- We might consider JUMPing and continuing the search from S^d towards the goal.
- This is a novel version of bidirectional search.
65. Example
[Figure: (a) No Jumps - traditional search from S to G; (b) One Jump - the search jumps from S to S^d and continues to G, resembling bidirectional search]
Construction of the solution path is possible by
applying usual backtracking with some simple
modifications.
66. When to jump?
- At every node, a decision should be made whether to continue the search from S or to jump to S^d.
- Jumping policies:
  - JIL: Jump If Larger.
  - JOR: Jump Only at the Root.
  - J15, J24: special jumping policies for the 15 and 24 tile puzzles.
67. Experimental results
- Rubik's cube: 7-edge PDB, 1000 problem instances.

Heuristic | Search | Policy | Nodes | Time
r | IDA* | - | 90,930,662 | 28.18
d | IDA* | - | 8,315,116 | 3.24
max(r,d) | IDA* | - | 2,997,539 | 1.34
max(r,d) | DIDA* | JIL | 2,697,087 | 1.16
max(r,d) | DIDA* | JOR | 2,464,685 | 1.02
68. Experimental results
- 16 Pancake problem: 9-token PDB, 100 problem instances.

Heuristic | Search | Policy | Nodes | Time
r | IDA* | - | 342,308,368,717 | 284,054
d | IDA* | - | 14,387,002,121 | 12,485
max(r,d) | IDA* | - | 2,478,269,076 | 3,086
max(r,d) | DIDA* | JIL | 260,506,693 | 362
69. Experimental results
- 15 puzzle: 7-8 tiles PDB, 1000 problem instances from [Korf & Felner 2002].

Heuristic | Search | Policy | Value | Nodes | Time
r | IDA* | - | 44.75 | 136,289 | 0.081
max(r, r) | IDA* | - | 45.63 | 36,710 | 0.034
max(r, r, d, d) | IDA* | - | 46.12 | 18,601 | 0.022
max(r, r, d, d) | DIDA* | J15 | 46.12 | 13,687 | 0.018
70. Experimental results
- 24 puzzle: 6-6-6-6 tiles PDB, 50 problem instances from [Korf & Felner 2002].

Heuristic | Search | Policy | Nodes
max(r, r) | IDA* | - | 43,454,810,045
max(r, r, d, d) | IDA* | - | 13,549,943,868
max(r, d) | DIDA* | J24 | 8,248,769,713
max(r, r, d, d) | DIDA* | J24 | 3,948,614,947
71. Conclusions
- Duality in search spaces.
- Two ways to use duality:
  - 1) the dual heuristic
  - 2) the dual search
- Improvement in performance.
72. Discussion
- Why are these domains important??
- "The ideas presented in this paper are wonderful, but are they useful in real applications?"
  (An anonymous referee from IJCAI-05.)
73. Ongoing and future work: compressing PDBs
- An entry in the PDB of tiles (a,b,c,d) has the form <La, Lb, Lc, Ld> mapped to a distance value.
- Store the PDB in a trie:
  - A PDB of 5 tiles will have a level in the trie for each tile. The values will be in the leaves of the trie.
- This data structure will enable flexibility and will save memory, as subtrees of the trie can be pruned.
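The sketch below shows one possible trie layout for such a PDB: one level per pattern tile's location, with the distance values at the leaves. The class and function names are assumptions for illustration, not the paper's implementation; folded subtrees are represented by a leaf placed higher in the trie.

```python
class TrieNode:
    __slots__ = ("children", "value")
    def __init__(self):
        self.children = {}      # location of the next tile -> child node
        self.value = None       # distance value, stored only at (possibly folded) leaves

def insert(root, locations, distance):
    node = root
    for loc in locations:
        node = node.children.setdefault(loc, TrieNode())
    node.value = distance

def lookup(root, locations):
    node = root
    for loc in locations:
        if node.value is not None:      # reached a folded subtree early
            return node.value
        node = node.children[loc]
    return node.value
```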
74. Trie pruning
- Simple (lossless) pruning: fold leaves with exactly the same value.
- No data will be lost.
[Figure: five sibling leaves, all storing the value 2, are folded into a single leaf]
75. Trie pruning
- Intelligent (lossy) pruning:
  - Fold leaves/subtrees that are correlated with each other (many options for this!!).
- Some data will be lost.
- Admissibility is still kept.
[Figure: sibling leaves storing 2, 2, 2, 2, 4 are folded into a single leaf storing the minimum, 2]
76. Trie: initial results
- A 5-5-5 partitioning stored in a trie with simple folding.

Mem | Nodes/sec | Time | Nodes | H(s) | MD | PDB
3,145,728 | 5,150,676 | 0.6 | 3,090,405 | 41.56 | 36.94 | Simple
1,572,480 | 988,613 | 3.126 | 3,090,405 | 41.56 | 36.94 | Packed
765,778 | 1,191,826 | 2.593 | 3,090,405 | 41.56 | 36.94 | Trie
77. Neural Networks (NN)
- We can feed a PDB into a neural network engine; in particular, we learn the addition above MD.
- For each tile we focus on its dx and dy from its goal position (i.e., its MD).
- Linear conflict example:
  - dx1 = dx2 = 0
  - dy1 > dy2
- A NN can learn these rules.
[Figure: tiles 2 and 1 in the same column, with dy1 = 2 and dy2 = 0]
78. Neural network
- We train the NN by feeding it the entire pattern space (or part of it).
- For example, for a pattern of 5 tiles we have 10 features, 2 for each tile.
- During the search, given the locations of the tiles, we look them up in the NN.
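A small sketch of the feature extraction described above: two features per pattern tile, its dx and dy offsets from the goal position. The dictionary-based arguments are an assumption made for the example.

```python
def nn_features(tile_locations, goal_locations, width=4):
    """Both arguments map each pattern tile to its board position (0..width*width-1)."""
    features = []
    for tile, pos in tile_locations.items():
        gpos = goal_locations[tile]
        features.append(abs(pos % width - gpos % width))    # dx
        features.append(abs(pos // width - gpos // width))  # dy
    return features   # e.g. 10 features for a pattern of 5 tiles
```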
79. Neural network example
[Figure: network input layout for the pattern of tiles 4, 5 and 6 - the inputs are dx4, dy4, dx5, dy5, dx6, dy6]
80. Neural network: problems
- We face the problem of overestimating, and will have to bias the results towards underestimating.
- We keep the overestimating values in a separate hash table.
- Results are encouraging!!

Mem | Time | Nodes | H(s) | PDB
1,572,480 | 0.49 | 243,290 | 31.00 | Regular
33,611d472w | 69.75 | 454,262 | 29.67 | Neural Network
81. Ongoing and future work: Duality
- Definition of a dual state:
  - For a state S we flip the roles of variables and objects.
- A vector <3,1,4,2>:
  - Regular state S: (3, 1, 4, 2)
  - Dual state S^d: (2, 4, 1, 3)
82. Future of duality
- S --O--> G (an operator sequence O takes S to G)
- G --O--> S^d (the same sequence takes G to S^d)
- S^d --O^(-1)--> G (the inverse sequence takes S^d to G)
83. Workshop
- You are all welcome to the workshop on:
  - Heuristic Search, Memory-based Heuristics and Their Applications
- To be held at AAAI-06.
- See www.ise.bgu.ac.il/faculty/felner