Title: Origin of Heuristic Functions
1. Chapter 6: Origin of Heuristic Functions
2. Heuristics from Relaxed Models
- A heuristic function returns the exact cost of reaching a goal in a simplified or relaxed version of the original problem.
- This means that we remove some of the constraints of the problem.
- Removing constraints = adding edges.
3. Example: Road navigation
- A good heuristic: the straight-line distance.
- We remove the constraint that we have to move along the roads.
- We are allowed to move in a straight line between two points.
- We get a relaxation of the original problem.
- In fact, we added the edges of the complete graph.
4. Example 2 - The TSP problem
- We can describe the problem as a graph with 3 constraints:
  - 1) Our tour covers all the cities.
  - 2) Every node has degree two:
    - an edge entering the node and
    - an edge leaving the node.
  - 3) The graph is connected.
- If we remove constraint 2:
  - We get a spanning graph, and the optimal solution to this problem is the MST (Minimum Spanning Tree).
- If we remove constraint 3:
  - Now the graph need not be connected, and the optimal solution to this problem is the solution to the assignment problem.
5. Example 3 - The Tile Puzzle problem
- One of the constraints in this problem is that a tile can only slide into the position occupied by the blank.
- If we remove this constraint, we allow any tile to be moved to any horizontally or vertically adjacent position.
- The resulting heuristic is the sum of the tiles' Manhattan distances to their goal locations.
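To make the relaxed-model heuristic of this slide concrete, here is a minimal sketch of the Manhattan-distance heuristic for the tile puzzle. The dictionary representation of a state (tile number to board position) is an illustrative assumption, not the slides' notation.

```python
# Manhattan distance for the (width x width) tile puzzle: the exact solution cost
# of the relaxed problem in which tiles may move to any adjacent position.

def manhattan_distance(state, goal, width=4):
    """state and goal map each tile number to its board position (0..width*width-1)."""
    total = 0
    for tile, pos in state.items():
        if tile == 0:                      # the blank is ignored, as usual
            continue
        gpos = goal[tile]
        total += abs(pos % width - gpos % width) + abs(pos // width - gpos // width)
    return total

# Example: tile 1 at position 5 with goal position 1 contributes |1-1| + |1-0| = 1.
```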
6. The STRIPS problem formulation
- We would like to derive such heuristics automatically.
- In order to do that, we need a formal description language that is richer than the problem-space graph.
- One such language is called STRIPS.
- In this language we have predicates and operators.
- Let's see a STRIPS representation of the Eight Puzzle problem.
7. STRIPS - Eight Puzzle example
- On(x,y): tile x is in location y.
- Clear(z): location z is clear.
- Adj(y,z): location y is adjacent to location z.
- Move(x,y,z): move tile x from location y to location z.
- In the language we have:
  - A precondition list - for example, to execute Move(x,y,z) we must have:
    - On(x,y)
    - Clear(z)
    - Adj(y,z)
  - An add list - predicates that weren't true before the operator and are true after the operator was executed.
  - A delete list - a subset of the preconditions that are no longer true after the operator was executed.
8. STRIPS - Eight Puzzle example
- Now, in order to construct a simplified or relaxed problem, we only have to remove some of the preconditions.
- For example, by removing Clear(z) we allow tiles to move to any adjacent location.
- In general, the hard part is to identify which relaxed problems have the property that their exact solution can be computed efficiently.
9. Admissibility and Consistency
- The heuristics that are derived by this method are both admissible and consistent.
- Note: the cost in the simplified graph should be as close as possible to the cost in the original graph.
- Admissibility means that the lowest-cost path in the simplified graph has an equal or lower cost than the lowest-cost path in the original graph.
- Consistency: a heuristic h is consistent if, for every node n and every neighbor n' of n,
  h(n) <= c(n,n') + h(n'),
  where h(n) is the actual optimal cost of reaching a goal in the graph of the relaxed problem.
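The consistency condition above can be checked edge by edge. The tiny helper below is an assumed illustration, not part of the slides; it simply verifies the inequality on a list of edges.

```python
# Check h(n) <= c(n, n') + h(n') on every edge of a (small, explicit) graph.

def is_consistent(h, edges):
    """edges is an iterable of (n, n_prime, cost) triples; h maps a node to its estimate."""
    return all(h(n) <= cost + h(n_prime) for n, n_prime, cost in edges)

# Tiny example graph: A --1--> B --1--> Goal
h = {"A": 2, "B": 1, "Goal": 0}.get
edges = [("A", "B", 1), ("B", "Goal", 1)]
assert is_consistent(h, edges)   # 2 <= 1 + 1 and 1 <= 1 + 0
```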
10. Method 2: pattern databases
- A different method for abstracting and relaxing the problem in order to get a simplified problem.
- Invented in 1996 by Culberson & Schaeffer.
11. Optimal path search algorithms
- For small graphs provided explicitly: algorithms such as Dijkstra's shortest path, Bellman-Ford or Floyd-Warshall. Complexity: O(n^2).
- For very large graphs, which are implicitly defined: the A* algorithm, which is a best-first search algorithm.
12. Best-first search schema
- Sorts all generated nodes in an OPEN-LIST and chooses the node with the best cost value for expansion.
- generate(x): insert x into the OPEN-LIST.
- expand(x): delete x from the OPEN-LIST and generate its children.
- BFS depends on its cost (heuristic) function. Different functions cause BFS to expand different nodes.
[Figure: an OPEN-LIST of generated nodes with cost values between 20 and 40]
13. Best-first search: cost functions
- g(x): the real distance from the initial state to x.
- h(x): the estimated remaining distance from x to the goal state.
  - Examples: air distance, Manhattan distance.
- Different cost combinations of g and h:
  - f(x) = level(x): Breadth-First Search.
  - f(x) = g(x): Dijkstra's algorithm.
  - f(x) = h(x): Pure Heuristic Search (PHS).
  - f(x) = g(x) + h(x): the A* algorithm (1968).
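To tie the schema of slide 12 to the cost functions above, here is a minimal generic best-first search sketch (assumed code, not the slides'): the same OPEN-LIST loop becomes Dijkstra, PHS or A* depending only on the function f that ranks nodes.

```python
import heapq, itertools

def best_first_search(start, goal, successors, f):
    """successors(s) yields (child, edge_cost); f(g, s) ranks nodes in the OPEN-LIST."""
    counter = itertools.count()          # tie-breaker so states themselves are never compared
    open_list = [(f(0, start), next(counter), 0, start, [start])]
    closed = {}
    while open_list:
        _, _, g, state, path = heapq.heappop(open_list)
        if state == goal:
            return g, path
        if state in closed and closed[state] <= g:
            continue
        closed[state] = g
        for child, cost in successors(state):          # expand(state): generate children
            heapq.heappush(open_list,
                           (f(g + cost, child), next(counter), g + cost, child, path + [child]))
    return None

# Example cost functions (h is any heuristic on states):
# dijkstra = lambda g, s: g
# phs      = lambda g, s: h(s)
# a_star   = lambda g, s: g + h(s)
```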
14. A* (and IDA*)
- A* is a best-first search algorithm that uses f(n) = g(n) + h(n) as its cost function.
- f(x) in A* is an estimation of the shortest path to the goal via x.
- A* is admissible, complete and optimally effective [Pearl 84].
- Result: any other optimal search algorithm will expand at least all the nodes expanded by A*.
[Figure: the search frontiers of Breadth-First Search vs. A*]
15. Domains
- 15 puzzle:
  - 10^13 states
  - First solved by [Korf 85] with IDA* and Manhattan distance
  - Takes 53 seconds
- 24 puzzle:
  - 10^24 states
  - First solved by [Korf 96]
  - Takes two days
16. Domains
- Rubik's cube:
  - 10^19 states
  - First solved by [Korf 97]
  - Takes 2 days to solve
17. (n,k) Pancake Puzzle
- An array of N tokens (pancakes).
- Operators: the first k consecutive tokens can be reversed, for any k.
- The 17-pancake version has 10^13 states.
- The 20-pancake version has 10^18 states.
18. (n,k) Top Spin Puzzle
- n tokens arranged in a ring.
- States: any possible permutation of the tokens.
- Operators: any k consecutive tokens can be reversed.
- The (17,4) version has 10^13 states.
- The (20,4) version has 10^18 states.
19. 4-peg Towers of Hanoi (TOH4)
- Harder than the well-known 3-peg Towers of Hanoi.
- There is a conjecture about the length of the optimal path, but it has not been proven.
- Size: 4^k states for k disks.
20. How to improve search?
- Enhanced algorithms:
  - Perimeter search [Delinberg and Nilson 95]
  - RBFS [Korf 93]
  - Frontier search [Korf and Zhang 2003]
  - Breadth-first heuristic search [Zhou and Hansen 04]
  - They all try to better explore the search tree.
- Better heuristics: more parts of the search tree will be pruned.
21. Better heuristics
- In the 3rd millennium we have very large memories.
- We can build large tables.
- For enhanced algorithms: large open-lists or transposition tables. They store nodes explicitly.
- A more intelligent way is to store general knowledge. We can do this with heuristics.
22. Subproblems - Abstractions
- Many problems can be abstracted into subproblems that must also be solved.
- A solution to the subproblem is a lower bound on the entire problem.
- Example: Rubik's cube [Korf 97]
  - Problem: the 3x3x3 Rubik's cube.
  - Subproblem: the 2x2x2 corner cubies.
23. Pattern Database heuristics
- A pattern database (PDB) is a lookup table that stores solutions to all configurations of the sub-problem (the patterns).
- This PDB is used as a heuristic during the search.
[Figure: a mapping/projection from the search space (10^19 states) to the pattern space (88 million states)]
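A pattern database of the kind described above is typically built by a backward breadth-first search in the pattern space. The sketch below is generic illustrative code under the assumptions of unit-cost, invertible operators; `pattern_predecessors` and `project` are assumed helper names.

```python
from collections import deque

def build_pdb(goal_pattern, pattern_predecessors):
    """Distances of every pattern from the projected goal, via backward BFS."""
    pdb = {goal_pattern: 0}
    queue = deque([goal_pattern])
    while queue:
        p = queue.popleft()
        for q in pattern_predecessors(p):
            if q not in pdb:                 # first visit -> shortest distance
                pdb[q] = pdb[p] + 1
                queue.append(q)
    return pdb

# During the search, the heuristic of a full state s is pdb[project(s)],
# where project() is the mapping from the search space to the pattern space.
```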
24. Non-additive pattern databases
- Fringe pattern database [Culberson & Schaeffer 1996].
- Has only 259 million states.
- Improvement of a factor of 100 over Manhattan distance.
25. Example - 15 puzzle
[Figure: a 15-puzzle state and the goal state, highlighting tiles 2, 3, 6, 7 in locations 8, 12, 13, 14]
- How many moves do we need to move tiles 2,3,6,7 from locations 8,12,13,14 to their goal locations?
- The solution to this is located in PDB[8,12,13,14] = 18.
26. Example - 15 puzzle
[Figure: only the pattern tiles 2, 3, 6, 7 are shown; the other tiles are ignored]
- How many moves do we need to move tiles 2,3,6,7 from locations 8,12,13,14 to their goal locations?
- The solution to this is located in PDB[8,12,13,14] = 18.
27. Disjoint Additive PDBs (DADB)
- If you have many PDBs, take their maximum.
- Values of disjoint databases can be added and are still admissible [Korf & Felner, AIJ-02; Felner, Korf & Hanan, JAIR-04].
- Additivity can be applied if the cost of a subproblem is composed only of the costs of the objects from the corresponding pattern.
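A short sketch of the two ways of combining several PDB lookups mentioned above; the helpers `pdbs` and `projections` are illustrative assumptions, not the papers' code.

```python
# Taking the maximum of several PDB lookups is always admissible.
def max_of_pdbs(state, pdbs, projections):
    return max(pdb[proj(state)] for pdb, proj in zip(pdbs, projections))

# Adding is admissible only for disjoint patterns where each PDB counts
# only the moves of its own pattern objects.
def sum_of_disjoint_pdbs(state, pdbs, projections):
    return sum(pdb[proj(state)] for pdb, proj in zip(pdbs, projections))
```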
28. DADB: Tile puzzles
Partitionings: 5-5-5, 6-6-3, 7-8 (15 puzzle) and 6-6-6-6 (24 puzzle). [Korf, AAAI 2005]

Memory | Time | Nodes | Value | Heuristic | Puzzle
3 terabytes | 28 days | 10^13 | - | Breadth-FS | 15
0 | 53.424 | 401,189,630 | 36.942 | Manhattan | 15
3,145 | 0.541 | 3,090,405 | 41.562 | 5-5-5 | 15
33,554 | 0.163 | 617,555 | 42.924 | 6-6-3 | 15
576,575 | 0.034 | 36,710 | 45.632 | 7-8 | 15
242,000 | 2 days | 360,892,479,671 | - | 6-6-6-6 | 24
29. Heuristics for TOH
- Infinite peg heuristic (INP): each disk moves to its own temporary peg.
- Additive pattern databases [Felner, Korf & Hanan, JAIR-04].
30. Additive PDBs for TOH4
- Partition the disks into disjoint sets.
- Store the cost of the complete pattern space of each set in a pattern database.
- Add values from these PDBs for the heuristic value.
- The n-disk problem contains 4^n states.
- The largest database that we stored was of 14 disks, which needed 4^14 = 256MB.
[Figure: a 16-disk problem partitioned into disjoint groups of 6 and 10 disks]
31. TOH4 results

16 disks:
Seconds | Nodes | Avg h | h(s) | Solution | Heuristic
(memory full) | - | - | - | - | Infinite peg
48 | 134,653,232 | 75.78 | 102 | 161 | Static 13-3
14 | 36,479,151 | 89.10 | 114 | 161 | Static 14-2
21 | 12,872,732 | 95.52 | 114 | 161 | Dynamic 14-2

17 disks:
2,501 | 238,561,590 | 97.05 | 116 | 183 | Dynamic 14-3
- The difference between static and dynamic partitioning is covered in [Felner, Korf & Hanan, JAIR-04].
32. Best usage of memory
- Given 1 gigabyte of memory, how do we best use it with pattern databases?
- [Holte, Newton, Felner, Meshulam and Furcy, ICAPS-2004] showed that it is better to use many small databases and take their maximum than to use one large database.
- We will present a different (orthogonal) method [Felner, Meshulam & Holte, AAAI-04].
33. Compressing pattern databases [Felner et al., AAAI-04]
- Traditionally, each configuration of the pattern had a unique entry in the PDB.
- Our main claim:
  - Nearby entries in PDBs are highly correlated!!
- We propose to compress nearby entries by storing their minimum in one entry.
- We show that:
  - most of the knowledge is preserved.
- Consequences: memory is saved, larger patterns can be used, and a speedup in search is obtained.
34. Cliques in the pattern space
- The values in a PDB for a clique are d or d+1.
- In permutation puzzles, cliques exist when only one object moves to another location.
[Figure: a clique whose nodes are at distance d or d+1 from the goal G]
- Usually they have nearby entries in the PDB.
- Example: a clique in TOH4 within the array A[4][4][4][4][4].
35. Compressing cliques
- Assume a clique of size K with values d or d+1.
- Store only one entry (instead of K) for the clique, with the minimum d. We lose at most 1.
  - A[4][4][4][4][4] -> A[4][4][4][4][1]
- Instead of 4^p we need only 4^(p-1) entries.
- This can be generalized to a set of nodes with diameter D (for cliques, D=1).
  - A[4][4][4][4][4] -> A[4][4][4][1][1]
- In general, compressing by k disks reduces the memory requirement from 4^p to 4^(p-k).
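A sketch of this min-compression for a flat TOH4 array, under the assumption that the positions of the k compressed disks are the least-significant base-4 digits of the index; this is illustrative code, not the paper's implementation.

```python
# Compress a PDB of 4**p entries down to 4**(p-k) entries by storing the minimum
# of each group of 4**k consecutive indices; the result is still a lower bound.

def compress_pdb(pdb, k):
    group = 4 ** k
    return [min(pdb[i:i + group]) for i in range(0, len(pdb), group)]

def compressed_lookup(cpdb, index, k):
    """Look up an original index in the compressed table (admissible, loses at most D)."""
    return cpdb[index // (4 ** k)]
```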
36. TOH4 results: 16 disks (14+2)

Mem (MB) | Time | Nodes | D | Avg H | H(s) | PDB
256 | 14.34 | 36,479,151 | 0 | 87.03 | 116 | 14/0 + 2
64 | 14.69 | 37,964,227 | 1 | 86.48 | 115 | 14/1 + 2
16 | 15.41 | 40,055,436 | 3 | 85.67 | 113 | 14/2 + 2
4 | 16.94 | 44,996,743 | 5 | 84.44 | 111 | 14/3 + 2
1 | 17.36 | 45,808,328 | 9 | 82.73 | 107 | 14/4 + 2
0.256 | 23.78 | 61,132,726 | 13 | 80.84 | 103 | 14/5 + 2
- Memory was reduced by a factor of 1000!!! at a
cost of only a factor of 2 in the search effort.
37. TOH4: larger versions
- Lossless compression is not efficient in this domain.
Mem | Time | Nodes | Avg H | Type | PDB | Size
256 | >421 | >393,887,923 | 81.5 | static | 14/0 + 3 | 17
256 | 2,501 | 238,561,590 | 87.0 | dynamic | 14/0 + 3 | 17
256 | 83 | 155,737,832 | 103.7 | static | 15/1 + 2 | 17
256 | 7 | 17,293,603 | 123.8 | static | 16/2 + 1 | 17
256 | 463 | 380,117,836 | 123.8 | static | 16/2 + 2 | 18
- For the 17-disk problem a speedup of 3 orders of magnitude is obtained!!!
- The 18-disk problem can be solved in 5 minutes!!
38. Tile Puzzles
[Figure: the goal state of the 15 puzzle and a clique in its pattern space]
- Storing PDBs for the tile puzzle:
  - (Simple mapping) A multi-dimensional array: A[16][16][16][16][16], size 1.04MB.
  - (Packed mapping) A one-dimensional array: A[16*15*14*13*12], size 0.52MB.
- Time versus memory tradeoff!!
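To illustrate the two mappings just described, here is a sketch of the index computation for a 5-tile pattern. The ordering of tiles within the index is an assumption made for the example.

```python
# "Simple" mapping: treat the 5 tile locations as independent base-16 digits (16**5 entries).
def simple_index(locations):                # locations of the 5 pattern tiles, each 0..15
    idx = 0
    for loc in locations:
        idx = idx * 16 + loc
    return idx                              # 0 .. 16**5 - 1

# "Packed" mapping: enumerate only the 16*15*14*13*12 placements with distinct locations.
def packed_index(locations):
    idx, used = 0, []
    for i, loc in enumerate(locations):
        rank = loc - sum(1 for u in used if u < loc)   # rank among the still-free locations
        idx = idx * (16 - i) + rank
        used.append(loc)
    return idx                              # 0 .. 16*15*14*13*12 - 1
```

The packed mapping halves the memory but costs extra arithmetic per lookup, which is the time/memory tradeoff the slide refers to.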
39. 15 puzzle results
- A clique in the tile puzzle is of size 2.
- We compressed the last index by two:
  - A[16][16][16][16][8]

Avg H | Mem | Time | Nodes | Compress | Type | Lookups | PDB
44.75 | 576,575 | 0.081 | 136,288 | No | packed | 1 | 7-8
45.63 | 576,575 | 0.034 | 36,710 | No | packed | 2 | 7-8
43.64 | 57,657 | 0.232 | 464,977 | No | packed | 1 | 7-7-1
43.64 | 536,870 | 0.058 | 464,977 | No | simple | 1 | 7-7-1
43.02 | 268,435 | 0.069 | 565,881 | Yes | simple | 1 | 7-7-1
43.98 | 536,870 | 0.021 | 147,336 | Yes | simple | 2 | 7-7-1
44.92 | 536,870 | 0.016 | 66,692 | Yes | simple | 2 | 7-7-1
40. Dual lookups in pattern databases [Felner et al., IJCAI-04]
41. Symmetries in PDBs
- Symmetric lookups were already performed in the first PDB paper by Culberson & Schaeffer 96.
- Examples:
  - Tile puzzles: reflect the tiles about the main diagonal.
  - Rubik's cube: rotate the cube.
- We can take the maximum among the different lookups.
- These are all geometrical symmetries.
- We suggest a new type of symmetry!!
[Figure: tiles 7 and 8 and their reflection about the main diagonal]
42. Regular and dual representations
- Regular representation of a problem:
  - Variables: objects (tiles, cubies, etc.)
  - Values: locations
- Dual representation:
  - Variables: locations
  - Values: objects
43. Regular vs. dual lookups in PDBs
- Regular question:
  - Where are tiles 2,3,6,7, and how many moves are needed to gather them to their goal locations?
- Dual question:
  - Who are the tiles in locations 2,3,6,7, and how many moves are needed to distribute them to their goal locations?
44. Regular and dual lookups
- Regular lookup: PDB[8,12,13,14]
- Dual lookup: PDB[9,5,12,15]
45. Regular and dual lookups in TopSpin
- Regular lookup for C: PDB[1,2,3,7,6]
- Dual lookup for C: PDB[1,2,3,8,9]
46. Dual lookups
- Dual lookups are possible when there is a symmetry between locations and objects:
  - Each object is in only one location, and each location holds only one object.
- Good examples: TopSpin, Rubik's cube.
- Bad example: Towers of Hanoi.
- Problematic example: Tile Puzzles.
47. Inconsistency of dual lookups
- Consistency of heuristics: h(a) - h(b) <= c(a,b)
- Both lookups for B: PDB[1,2,3,4,5] = 0
- Regular lookup for C: PDB[1,2,3,7,6] = 1
- Dual lookup for C: PDB[1,2,3,8,9] = 2

State | Regular | Dual
b | 0 | 0
c | 1 | 2
48. Traditional Pathmax
- Children inherit the f-value from their parents if it makes them larger.
[Figure: a parent with g=1, h=4, f=5; due to inconsistency its child has g=2, h=2, f=4; pathmax lifts the child to g=2, h=3, f=5]
49. Bidirectional pathmax (BPMX)
[Figure: BPMX example on h-values - before propagation: a parent with h=2 and children with h=5 and h=1; after propagation: parent h=4, children h=5 and h=3]
- Bidirectional pathmax: h-values are propagated in both directions, decreasing by 1 along each edge.
- If the IDA* threshold is 2, then with BPMX the right child will not even be generated!!
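Below is a minimal sketch of BPMX inside one depth-first iteration of an IDA*-style search, under the assumptions of positive edge costs and an undirected (invertible-operator) space; `successors` and `h` are placeholder names, and this is not the authors' implementation.

```python
# One threshold-bounded DFS with BPMX: a child's large h-value is propagated up to the
# parent (minus the edge cost) and the raised parent value is pushed down to later siblings.

def bpmx_dfs(state, g, threshold, h, successors, goal):
    if state == goal:
        return True, 0
    h_val = h(state)
    if g + h_val > threshold:
        return False, h_val
    for child, cost in successors(state):
        child_h = max(h(child), h_val - cost)        # downward propagation of the parent's h
        if g + cost + child_h > threshold:
            h_val = max(h_val, child_h - cost)       # upward propagation (BPMX cutoff)
            continue
        found, child_best = bpmx_dfs(child, g + cost, threshold, h, successors, goal)
        if found:
            return True, 0
        h_val = max(h_val, child_best - cost)        # upward propagation from the subtree
        if g + h_val > threshold:                    # the parent itself can now be pruned
            return False, h_val
    return False, h_val
```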
50. Results: (17,4) TopSpin puzzle

Regular | Dual | BPMX | Nodes | Time
1 | 0 | - | 40,019,429 | 67.76
0 | 1 | no | 7,618,805 | 15.72
0 | 1 | yes | 1,397,614 | 2.93
4 | 4 | yes | 82,606 | 0.94
17 | 17 | yes | 27,575 | 1.34

- Nodes improvement (17r + 17d): a factor of 1,451.
- Time improvement (4r + 4d): a factor of 72.
- We also solved the (20,4) TopSpin version.
51. Results: Rubik's cube
- Data on 1000 states with 14 random moves.
- PDB of 7 edge cubies.

Regular | Dual | BPMX | Nodes | Time
1 | 0 | - | 90,930,662 | 28.18
0 | 1 | no | 19,653,386 | 7.38
0 | 1 | yes | 8,315,116 | 3.24
4 | 4 | yes | 615,563 | 0.51
24 | 24 | yes | 362,927 | 0.90

- Nodes improvement (24r + 24d): a factor of 250.
- Time improvement (4r + 4d): a factor of 55.
52. Results: Rubik's cube
- With duals we improved Korf's results on random instances by a factor of 1.5, using exactly the same PDBs.
53. Results: tile puzzles

Heuristic | BPMX | Value | Nodes | Time
Manhattan | - | 36.94 | 401,189,630 | 53.424
R | - | 44.75 | 136,289 | 0.081
R + R | - | 45.63 | 36,710 | 0.034
R + R + D + D | yes | 46.12 | 18,601 | 0.022

- With duals, the time for the 24 puzzle drops from 2 days to 1 day.
54. Discussion
- Results for TopSpin and Rubik's cube are better than those for the tile puzzles.
- Dual PDB lookups and BPMX cutoffs are more effective when each operator changes a larger part of the state.
- This is because the identities of the objects being queried in consecutive states change dramatically.
55. Summary
- Dual PDB lookups
- BPMX cutoffs for inconsistent heuristics
- State of the art solvers.
56. Future work
- More compression
- Duality in search spaces
- Which and how many symmetries to use
- Other sources of inconsistencies
- Better ways for propagating inconsistencies
57. Duality - Motivation
- What is the relation between state S and states S1 and S2?
[Figure: S = (3,1,4,2,5) and S1 = (5,2,4,1,3)]
- Geometrical symmetries!! Reversing and/or
rotating
[Figure: S2 = (1,4,2,5,3)]
- And what is the relation between S and S^d?
[Figure: S^d = (2,4,1,3,5)]
58. Symmetries in PDBs
- Symmetric lookups:
  - Tile puzzles: reflect the tiles about the main diagonal.
  - Rubik's cube: rotate the cube.
- We can take the maximum among the different lookups.
- These are all geometrical symmetries.
- We suggest a new type of symmetry!!
[Figure: tiles 7 and 8 and their reflection about the main diagonal]
59. Duality: definition 1
- Let S be a state.
- Let pi be a permutation such that pi(S) = G.
- Define S^d = pi(G).
- Consequences:
  - pi^(-1)(S^d) = G
  - The length of the optimal path from S to G and from S^d to G is identical.
[Figure: S and S^d are both at distance d from the goal G, related by pi and pi^(-1)]
- An admissible heuristic for S is also admissible for S^d.
60. Regular and dual representations
- Regular representation of a problem:
  - Variables: objects (tiles, cubies, etc.)
  - Values: locations
- Dual representation:
  - Variables: locations
  - Values: objects
61. Duality: definition 2
- Definition 2: for a state S, we flip the roles of variables and objects.
- Assume the vector <3,1,4,2>:
  - Regular representation: (3, 1, 4, 2)
  - Dual representation: (2, 4, 1, 3)
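A tiny sketch of definition 2: the dual state is obtained by flipping the roles of variables and values, i.e. by inverting the permutation (1-based values as in the slide's vector).

```python
def dual(state):
    """Return the dual of a permutation state: d[object] = location of that object."""
    d = [0] * len(state)
    for location, obj in enumerate(state, start=1):
        d[obj - 1] = location          # object 'obj' sits in 'location'
    return d

assert dual([3, 1, 4, 2]) == [2, 4, 1, 3]     # the slide's example
```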
62. Duality
- Claim: definition 1 and definition 2 are equivalent.
- Proof: assume that in S, object j is in location i, and that pi(i) = j.
- Applying pi for the first time (on S) will move object j to location j.
- Applying pi for the second time (on G) will move object i to location j.
63. Using duality
- Dual lookup: we can take the heuristic of the dual state and use it for the regular state.
- In particular, we can perform a PDB lookup for the dual state.
- Dual search:
  - A novel search algorithm which can be constructed from any known search algorithm.
64. Dual search
- When the search arrives at a state S, we also look at its dual state S^d.
- We might consider JUMPing and continuing the search from S^d towards the goal.
- This is a novel version of bidirectional search.
65. Example
[Figure: (a) No Jumps - traditional search from S to G; (b) One Jump - the search jumps from S to S^d and continues to G, resembling bidirectional search]
Construction of the solution path is possible by
applying usual backtracking with some simple
modifications.
66. When to jump?
- At every node, a decision should be made whether to continue the search from S or to jump to S^d.
- Jumping policies:
  - JIL: Jump If Larger.
  - JOR: Jump Only at the Root.
  - J15, J24: special jumping policies for the 15 and 24 tile puzzles.
67. Experimental results
- Rubik's cube: 7-edge PDB, 1000 problem instances.

Heuristic | Search | Policy | Nodes | Time
r | IDA* | - | 90,930,662 | 28.18
d | IDA* | - | 8,315,116 | 3.24
max(r,d) | IDA* | - | 2,997,539 | 1.34
max(r,d) | DIDA* | JIL | 2,697,087 | 1.16
max(r,d) | DIDA* | JOR | 2,464,685 | 1.02
68. Experimental results
- 16 Pancake problem: 9-token PDB, 100 problem instances.

Heuristic | Search | Policy | Nodes | Time
r | IDA* | - | 342,308,368,717 | 284,054
d | IDA* | - | 14,387,002,121 | 12,485
max(r,d) | IDA* | - | 2,478,269,076 | 3,086
max(r,d) | DIDA* | JIL | 260,506,693 | 362
69. Experimental results
- 15 puzzle: 7-8 tiles PDB, 1000 problem instances from [Korf & Felner 2002].

Heuristic | Search | Policy | Value | Nodes | Time
r | IDA* | - | 44.75 | 136,289 | 0.081
max(r, r) | IDA* | - | 45.63 | 36,710 | 0.034
max(r, r, d, d) | IDA* | - | 46.12 | 18,601 | 0.022
max(r, r, d, d) | DIDA* | J15 | 46.12 | 13,687 | 0.018
70. Experimental results
- 24 puzzle: 6-6-6-6 tiles PDB, 50 problem instances from [Korf & Felner 2002].

Heuristic | Search | Policy | Nodes
max(r, r) | IDA* | - | 43,454,810,045
max(r, r, d, d) | IDA* | - | 13,549,943,868
max(r, d) | DIDA* | J24 | 8,248,769,713
max(r, r, d, d) | DIDA* | J24 | 3,948,614,947
71. Conclusions
- Duality in search spaces.
- Two ways to use duality:
  - 1) the dual heuristic
  - 2) the dual search
- Improvement in performance.
72. Discussion
- Why are these domains important??
- "The ideas presented in this paper are wonderful, but are they useful in real applications?"
  (An anonymous referee from IJCAI-05.)
73. Ongoing and future work: compressing PDBs
- An entry in the PDB of tiles (a,b,c,d) has the form <La, Lb, Lc, Ld> mapped to a distance value.
- Store the PDB in a trie:
  - A PDB of 5 tiles will have a level in the trie for each tile. The values will be in the leaves of the trie.
- This data structure will enable flexibility and will save memory, as subtrees of the trie can be pruned.
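The sketch below shows one possible trie layout for such a PDB: one level per pattern tile's location, with the distance values at the leaves. The class and function names are assumptions for illustration, not the paper's implementation; folded subtrees are represented by a leaf placed higher in the trie.

```python
class TrieNode:
    __slots__ = ("children", "value")
    def __init__(self):
        self.children = {}      # location of the next tile -> child node
        self.value = None       # distance value, stored only at (possibly folded) leaves

def insert(root, locations, distance):
    node = root
    for loc in locations:
        node = node.children.setdefault(loc, TrieNode())
    node.value = distance

def lookup(root, locations):
    node = root
    for loc in locations:
        if node.value is not None:      # reached a folded subtree early
            return node.value
        node = node.children[loc]
    return node.value
```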
74. Trie pruning
- Simple (lossless) pruning: fold leaves with exactly the same value.
- No data will be lost.
[Figure: five sibling leaves, all storing the value 2, are folded into a single leaf]
75. Trie pruning
- Intelligent (lossy) pruning:
  - Fold leaves/subtrees that are correlated with each other (many options for this!!).
- Some data will be lost.
- Admissibility is still kept.
[Figure: sibling leaves storing 2, 2, 2, 2, 4 are folded into a single leaf storing the minimum, 2]
76. Trie: initial results
- A 5-5-5 partitioning stored in a trie with simple folding.

Mem | Nodes/sec | Time | Nodes | H(s) | MD | PDB
3,145,728 | 5,150,676 | 0.6 | 3,090,405 | 41.56 | 36.94 | Simple
1,572,480 | 988,613 | 3.126 | 3,090,405 | 41.56 | 36.94 | Packed
765,778 | 1,191,826 | 2.593 | 3,090,405 | 41.56 | 36.94 | Trie
77. Neural Networks (NN)
- We can feed a PDB into a neural network engine; in particular, we learn the addition above MD.
- For each tile we focus on its dx and dy from its goal position (i.e., its MD).
- Linear conflict example:
  - dx1 = dx2 = 0
  - dy1 > dy2
- A NN can learn these rules.
[Figure: tiles 2 and 1 in the same column, with dy1 = 2 and dy2 = 0]
78. Neural network
- We train the NN by feeding it the entire pattern space (or part of it).
- For example, for a pattern of 5 tiles we have 10 features, 2 for each tile.
- During the search, given the locations of the tiles, we look them up in the NN.
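A small sketch of the feature extraction described above: two features per pattern tile, its dx and dy offsets from the goal position. The dictionary-based arguments are an assumption made for the example.

```python
def nn_features(tile_locations, goal_locations, width=4):
    """Both arguments map each pattern tile to its board position (0..width*width-1)."""
    features = []
    for tile, pos in tile_locations.items():
        gpos = goal_locations[tile]
        features.append(abs(pos % width - gpos % width))    # dx
        features.append(abs(pos // width - gpos // width))  # dy
    return features   # e.g. 10 features for a pattern of 5 tiles
```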
79. Neural network example
[Figure: network input layout for the pattern of tiles 4, 5 and 6 - the inputs are dx4, dy4, dx5, dy5, dx6, dy6]
80. Neural network: problems
- We face the problem of overestimating, and will have to bias the results towards underestimating.
- We keep the overestimating values in a separate hash table.
- Results are encouraging!!

Mem | Time | Nodes | H(s) | PDB
1,572,480 | 0.49 | 243,290 | 31.00 | Regular
33,611d472w | 69.75 | 454,262 | 29.67 | Neural Network
81. Ongoing and future work: Duality
- Definition of a dual state:
  - For a state S we flip the roles of variables and objects.
- A vector <3,1,4,2>:
  - Regular state S: (3, 1, 4, 2)
  - Dual state S^d: (2, 4, 1, 3)
82. Future of duality
- S --O--> G (an operator sequence O takes S to G)
- G --O--> S^d (the same sequence takes G to S^d)
- S^d --O^(-1)--> G (the inverse sequence takes S^d to G)
83. Workshop
- You are all welcome to the workshop on:
  - Heuristic Search, Memory-based Heuristics and Their Applications
- To be held at AAAI-06.
- See www.ise.bgu.ac.il/faculty/felner