Title: Lookahead pathology in real-time pathfinding
Slide 1: Lookahead pathology in real-time pathfinding
- Mitja Luštrek, Jožef Stefan Institute, Department of Intelligent Systems
- Vadim Bulitko, University of Alberta, Department of Computing Science
Slide 2: Outline
- Introduction
- Problem
- Explanation
Slides 3-8: Agent-centered search (LRTS)
[Animated figure: the agent searches the lookahead area of depth d around its current state. For each frontier state, f = g + h, where g is the true shortest distance from the current state and h is the estimated shortest distance to the goal state. The agent selects the frontier state with the lowest f (f_opt), updates the heuristic h of the current state to f_opt, and moves toward the selected state.]
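A minimal sketch of one LRTS planning step in Python, assuming unit-cost moves, a user-supplied neighbors(state) function, and a dict h of current heuristic values (e.g. initialized with the octile distance). The identifiers are illustrative, not the actual LRTS/HOG code.

from collections import deque

def lrts_step(state, goal, h, neighbors, d):
    """One planning step: search the lookahead area to depth d, pick the
    frontier state with the lowest f = g + h, raise h(current) to f_opt."""
    g = {state: 0}      # true distances within the lookahead area
    frontier = []       # states on the edge of the area (or the goal)
    queue = deque([state])
    while queue:        # BFS gives exact g for unit-cost moves
        s = queue.popleft()
        if s == goal or g[s] == d:
            frontier.append(s)
            continue
        for t in neighbors(s):
            if t not in g:
                g[t] = g[s] + 1
                queue.append(t)
    best = min(frontier, key=lambda s: g[s] + h[s])
    f_opt = g[best] + h[best]          # lowest f among frontier states
    h[state] = max(h[state], f_opt)    # learning: update the heuristic
    return best                        # the agent moves towards this state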
Slide 9: Lookahead pathology
- It is generally believed that larger lookahead depths produce better solutions.
- Solution-length pathology: larger lookahead depths produce worse solutions.

Lookahead depth:  1   2   3   4   5   6   7
Solution length: 11  10   8  10   7   8   7

- Degree of pathology: 2 (the solution length increases twice as the depth grows: at depths 4 and 6).
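Counting those worsening steps is mechanical; a small illustrative helper (not from the talk; the t parameter anticipates the tolerance introduced on slide 19):

def degree_of_pathology(values, t=1.0):
    """Count the depth increases at which the measure (solution length
    or error, indexed by lookahead depth) gets worse by more than the
    tolerance factor t."""
    return sum(v2 > t * v1 for v1, v2 in zip(values, values[1:]))

degree_of_pathology([11, 10, 8, 10, 7, 8, 7])  # -> 2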
Slide 10: Lookahead pathology
- Pathology can also be measured on states that do not form a path.
- Error pathology: larger lookahead depths produce more suboptimal decisions.

Multiple states:
Depth:  1     2     3     4     5     6     7
Error:  0.31  0.25  0.21  0.24  0.18  0.23  0.12
Degree of pathology: 2

One state:
Depth  Decision
1      suboptimal
2      suboptimal
3      optimal
4      optimal
5      optimal
6      suboptimal
7      suboptimal
There is pathology (the decisions become suboptimal again at the largest depths).
Slide 11: Related: minimax pathology
- Minimax backs up heuristic values from the leaves of the game tree to the root.
- Research attempted to explain why backed-up heuristic values are better than static values.
- Theoretical analyses showed that they are in fact worse: the pathology [Nau 79, Beal 80].
- Explanations:
  - similarity of nearby positions in real games
  - realistic modeling of error
  - ...
- The focus was on why the pathology does not appear in practice.
Slide 12: Related: pathology in single-agent search
- Discovered on synthetic search trees [Bulitko et al. 03].
- Observed in the eight puzzle [Bulitko 03]:
  - appears with different evaluation functions
  - the benefit from knowing the optimal lookahead depth was shown to be large
- Explained on synthetic search trees [Luštrek 05]:
  - caused by certain properties of the trees
  - caused by inconsistent and inadmissible heuristics
- Unexplored in pathfinding.
Slide 13: Outline
- Introduction
- Problem
- Explanation
Slide 14: Our setting
- HOG (Hierarchical Open Graph) [Sturtevant et al.]
- Maps from commercial computer games (Baldur's Gate, Warcraft III)
- Initial heuristic: octile distance (the true distance assuming an empty map)
- 1,000 problems (map, start state, goal state)
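The octile distance has a standard closed form on an 8-connected grid where straight moves cost 1 and diagonal moves cost √2; a minimal sketch:

from math import sqrt

def octile(a, b):
    """Shortest distance between cells a and b on an empty 8-connected grid."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    return max(dx, dy) + (sqrt(2) - 1) * min(dx, dy)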
Slide 15: On-policy experiments
- The agent follows a path from the start state to the goal state, updating the heuristic along the way.
- Solution length and error over the whole path are computed for each lookahead depth -> pathology.
[Figure: paths followed by the agent at lookahead depths d = 1, d = 2 and d = 3.]
Slide 16: Off-policy experiments
- The agent spawns in a number of states.
- It takes one move towards the goal state.
- The heuristic is not updated.
- Error is computed from these first moves -> pathology.
[Figure: first moves chosen from several spawn states at depths d = 1, 2 and 3.]
Slide 17: Basic on-policy experiment

Degree of pathology:     0     1     2     3     4     5
Length (% of problems):  38.1  12.8  18.2  16.1  9.5   5.3
Error (% of problems):   38.5  15.1  20.3  17.0  7.6   1.5

- A lot of pathology: over 60%!
- First explanation: a lot of states are intrinsically pathological (off-policy mode).
- Not true: only 3.9% are.
- If the topology of the maps is not at fault, perhaps the algorithm is to blame?
Slide 18: Off-policy experiment on 188 states
- The comparison is not fair:
  - on-policy: pathology from error over a number of states
  - off-policy: pathologicalness of single states
- Fair: off-policy error over the same number of states as on-policy, 188 (chosen randomly).
- Off-policy, only error can be used: there is no solution length.

Degree of pathology: 0     1     2    3    4
Problems (%):        57.8  31.4  9.4  1.4  0.0

- Not much less pathology than on-policy: 42.2% vs. 61.5%.
Slide 19: Tolerance
- The first off-policy experiment showed little pathology, the second one quite a lot.
- Perhaps off-policy pathology is caused by minor differences in error, i.e. noise.
- Introduce a tolerance t:
  - an increase in error from depth d1 to a larger depth d2 counts towards the pathology only if error(d2) > t · error(d1)
  - set t so that the pathology in the off-policy experiment on 188 states is < 5%: t = 1.09
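In terms of the degree_of_pathology helper sketched earlier, the tolerance is simply its t parameter. For example, with the error column of slide 10:

errors = [0.31, 0.25, 0.21, 0.24, 0.18, 0.23, 0.12]
degree_of_pathology(errors, t=1.09)  # -> 2; both increases exceed 9%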
Slide 20: Experiments with t = 1.09

Degree of pathology:      0     1     2     3     4    5
On-policy (% of prob.):   42.3  19.7  21.2  12.9  3.6  0.3
Off-policy (% of prob.):  95.7  3.7   0.6   0.0   0.0  0.0

- On-policy changes little vs. t = 1: 57.7% vs. 61.9%.
- Apparently on-policy pathology is more severe than off-policy.
- Investigate why!
- These two experiments serve below as the basic on-policy experiment and the basic off-policy experiment.
Slide 21: Outline
- Introduction
- Problem
- Explanation
Slide 22: Hypothesis 1
- LRTS tends to visit pathological states with an above-average frequency.
- Test: compute pathology from the states visited on-policy instead of 188 random states.

Degree of pathology: 0     1    2    3    4
Problems (%):        93.6  5.3  0.9  0.2  0.0

- More pathology than in random states: 6.3% vs. 4.3%.
- Much less pathology than basic on-policy: 6.3% vs. 57.7%.
- Hypothesis 1 is correct, but it is not the main reason for on-policy pathology.
Slide 23: Is learning the culprit?
- There is learning (updating the heuristic) on-policy, but not off-policy.
- Learning is necessary on-policy; otherwise the agent gets caught in infinite loops.
- Test: traverse paths in the normal on-policy manner, but measure error without learning.

Degree of pathology: 0     1     2    3    4    5
Problems (%):        79.8  14.2  4.5  1.2  0.3  0.0

- Less pathology than basic on-policy: 20.2% vs. 57.7%.
- Still more pathology than basic off-policy: 20.2% vs. 4.3%.
- Learning is a reason, although not the only one.
Slide 24: Hypothesis 2
- A larger fraction of the lookahead area consists of already-updated states at smaller depths.
[Figure: current lookahead area with the updated states marked.]
Slide 25: Hypothesis 2
- Smaller lookahead depths benefit more from learning.
- This makes their decisions better than the mere depth suggests.
- Thus they are closer to larger depths.
- If they are closer to larger depths, cases where a larger depth happens to be worse than a smaller depth are more common.
- Test: equalize the depths by learning as much as possible in the whole lookahead area: uniform learning (illustrated on the following slides and sketched in code below).
Slides 26-34: Uniform learning
[Animated figure: the agent alternates search and update steps; unlike basic LRTS, each update raises the heuristic of every state in the lookahead area, not just the current state.]
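The slides do not spell out the uniform-learning update rule; the sketch below is one plausible reading of "learning as much as possible in the whole lookahead area", an assumption rather than the authors' exact method: raise the heuristic of every interior state to the tightest lower bound its frontier provides.

def uniform_update(interior, frontier, h, dist):
    """Uniform learning (assumed rule): every path to the goal must leave
    the lookahead area through some frontier state f, so the minimum of
    dist(s, f) + h(f) over f is a valid lower bound on h(s).
    `dist` gives true shortest distances within the lookahead area."""
    for s in interior:
        bound = min(dist(s, f) + h[f] for f in frontier)
        h[s] = max(h[s], bound)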
Slide 35: Pathology with uniform learning

Degree of pathology: 0     1     2     3     4    5
Problems (%):        40.9  20.2  22.1  12.3  4.2  0.3

- Even more pathology than basic on-policy: 59.1% vs. 57.7%.
- Is Hypothesis 2 wrong?
- Let us look at the volume of heuristic updates encountered per state generated during search.
- This seems to be the best measure of the benefit of learning.
Slide 36: Volume of updates encountered
[Figure: volume of heuristic updates encountered per state generated, by lookahead depth.]
- Hypothesis 2 is correct after all.
Slide 37: Hypothesis 3
- On-policy: one search every d moves, so fewer searches at larger depths.
- Off-policy: one search every move.
Slide 38: Hypothesis 3
- The difference between depths in the amount of search is smaller on-policy than off-policy.
- This makes the depths closer on-policy.
- If they are closer, cases where a larger depth happens to be worse than a smaller depth are more common.
- Test: search every move on-policy.
Slide 39: Pathology when searching every move

Degree of pathology: 0     1    2    3    4    5
Problems (%):        86.9  9.0  3.3  0.6  0.2  0.0

- Less pathology than basic on-policy: 13.1% vs. 57.7%.
- Still more pathology than basic off-policy: 13.1% vs. 4.3%.
- Hypothesis 3 is correct; the remaining pathology is due to Hypotheses 1 and 2.
- Further test: the number of states generated per move.
Slide 40: States generated per move
[Figure: number of states generated per move, by lookahead depth.]
- Hypothesis 3 is confirmed again.
Slide 41: Summary of explanation
- On-policy pathology is caused by different lookahead depths being closer to each other, in terms of decision quality, than the mere depths would suggest:
  - due to the volume of heuristic updates encountered per state generated
  - due to the number of states generated per move
- In addition, LRTS tends to visit pathological states with an above-average frequency.
Slide 42: Thank you.