1
Likely-Admissible Sub-symbolic Heuristics
26-08-2004 Valencia
  • Marco Ernandes
  • Cognitive Science PhD Student
  • Email: ernandes@dii.unisi.it
  • Web: www.dii.unisi.it/ernandes
  • Marco Gori
  • Professor of Computer Science
  • Email: marco@dii.unisi.it
  • Web: www.dii.unisi.it/marco

2
Heuristic Search
  • Search algorithms
  • A*, IDA*, BS*, …
  • Heuristic information
  • h(n) → typically an estimate of the distance from node n to the goal
  • Heuristic usage policy
  • How to combine h(n) and g(n) to obtain f(n) (see the sketch below)
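As a minimal illustration of the usage policy f(n) = g(n) + h(n), here is a generic A* loop in Python; the start state, goal test, expand and h callables are placeholders for this sketch, not taken from the slides.

    import heapq
    from itertools import count

    def a_star(start, is_goal, expand, h):
        # Frontier ordered by f(n) = g(n) + h(n); the counter breaks ties on equal f.
        tie = count()
        frontier = [(h(start), next(tie), 0, start, [start])]
        best_g = {start: 0}
        while frontier:
            f, _, g, state, path = heapq.heappop(frontier)
            if is_goal(state):
                return path                      # optimal if h never overestimates
            for succ, cost in expand(state):     # expand(state) -> [(successor, step_cost), ...]
                new_g = g + cost
                if new_g < best_g.get(succ, float("inf")):
                    best_g[succ] = new_g
                    heapq.heappush(frontier, (new_g + h(succ), next(tie), new_g, succ, path + [succ]))
        return None                              # no solution found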

3
Optimal Search for NP Problems
  • 2 approaches:
  • Rigid admissibility
  • requires optimistic heuristics
  • ALWAYS retrieves optimal solutions: C = C*
  • Relaxed admissibility
  • ε-admissible search (e.g. WA*)
  • retrieves solutions with bounded costs: C ≤ (1 + ε)C*
  • the problem is no longer NP-complete

4
Two families of heuristics
  • Online heuristics
  • The h(n) value is computed during search, when a node is visited.
  • An AI classic: the Manhattan Distance.
  • Memory-based heuristics
  • Offline phase: resolution of all possible subproblems and storage of all the results.
  • Online phase: decomposition of a node into subproblems and database querying.
  • Successfully used for rigid admissibility.

5
Online heuristic research
  • How to improve Manhattan estimations?
  • By working on its main bias: locality.
  • Manhattan considers each piece of the problem as completely independent from the rest.
  • Hence it has no way to determine how tiles influence each other: hM = h* − GAP
  • Manhattan does not consider the influence of the blank tile.

(Example figure: a configuration with hM = 3 but h* = 11, due to tile conflicts.)
6
Online heuristic research
  • How to improve Manhattan estimations?
  • 1) Manhattan Corrections (Hansson et al., 1992)
  • The idea is to increment the estimation with ad hoc techniques, maintaining admissibility.
  • 2) The ABSOLVER approach (Prieditis, 1989)
  • Automatically inventing admissible heuristics through constraint elimination.
  • 3) Higher-Order Heuristics (Korf, 1996)
  • Generalizing Manhattan by considering subproblems of a configuration rather than single elements.

7
Manhattan Corrections
  • Linear Conflicts
  • Corner Tiles
  • Last Moves
  • Non-Linear Conflicts
  • Corner Deduction
  • First Moves

(References: Hansson et al., 1992; Ernandes, 2003. The combination of all six techniques is Conflict Deduction, Ernandes, 2003.)
8
Examples
Linear Conflicts: computes conflicts on the same row/column.
Corner Tiles: computes conflicts thanks to corner properties.
Last Moves: computes the last two moves needed to complete the puzzle.
Non-Linear Conflicts: computes conflicts on a different row/column (two types).
Corner Deduction: like Corner Tiles but with correct tiles on the diagonal.
9
Conflict Deduction
  • It is more convenient to implement the various techniques separately.
  • We cannot simply add all the corrections together: inadmissibility!
  • If one tile is involved in more than one conflict, it counts only once.
  • To maximize the estimation we use, for each tile, the technique that gives the highest contribution (see the sketch below).
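A minimal sketch of this per-tile combination rule, assuming hypothetical helpers that report, for each tile, the extra moves a single technique attributes to it (none of these names appear in the slides):

    def conflict_deduction(state, tiles, manhattan, corrections):
        # manhattan(state): the plain Manhattan Distance of the configuration.
        # corrections: functions f(state, tile) -> extra moves that one technique
        # (linear conflicts, corner tiles, last moves, ...) attributes to 'tile'.
        extra = 0
        for tile in tiles:
            # A tile counts only once: keep its single best correction.
            extra += max(f(state, tile) for f in corrections)
        return manhattan(state) + extra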

10
Higher-Order Heuristics
  • Ad hoc techniques generate strongly problem-dependent heuristics.
  • They are not sufficient to attack bigger problems such as the 24-puzzle.
  • Manhattan has to be generalized differently, by considering the distance-to-goal of several elements (tiles) grouped together.
  • First example → Pairwise Distances: instead of computing the distance of one independent tile, we use pairs of tiles.

11
Higher-Order Heuristics: problems
  • We can strengthen the Pairwise Distance by computing it for all possible tile pairs and then seeking the combination that maximizes the estimation: a Maximum Weighted Matching Problem.
  • PD remains poorly informed. We would need triples of tiles, but then the matching problem becomes NP-complete (Korf, 1996).
  • Hence the only Higher-Order Heuristic that can be used efficiently online is the Pairwise Distance, which is too poor → less informed than Conflict Deduction!

12
From Higher-Order Heuristics to Memory-based Heuristics
  • Higher-Order Heuristics could ignore the maximization problem and consider pre-designed tile groups (and increase their size).
  • Solving subproblems of 3 or more tiles (patterns) is too expensive during search: we need to do this offline.

13
Disjoint Pattern Databases (Korf & Taylor, 2002)
  • Additive version of Pattern Databases (Culberson & Schaeffer, 1996) where patterns are considered independently.
  • Manhattan is the simplest Disjoint Pattern DB: 1 tile = 1 pattern. DPDBs, unlike PDBs, always dominate Manhattan.
  • On the 15-puzzle they perform 75 times faster than non-additive PDBs, and their DB generation is much easier because distances can be computed backwards by disarranging the patterns.
  • Different DPDBs can be combined by taking the argmax (see the sketch below): global speedup over Manhattan 2000, space reduction 13000.

(Figure: two disjoint pattern databases, DPDB 1 and DPDB 2.)
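A schematic sketch of how additive pattern costs are summed within one DPDB and how several DPDBs are combined by a max; the data layout (tuple states, dict lookup tables) is an assumption for illustration:

    def project(state, pattern_tiles):
        # Locations of the pattern tiles only; all other tiles are ignored.
        return tuple(state.index(t) for t in pattern_tiles)

    def dpdb_heuristic(state, dpdbs):
        # Each DPDB is a list of (pattern_tiles, cost_table) pairs whose tile sets
        # partition the board; within one DB the pattern costs are additive.
        # Different DPDBs are combined by taking the maximum estimate.
        return max(sum(table[project(state, tiles)] for tiles, table in db)
                   for db in dpdbs)
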
14
DPDBs and the 24-puzzle
  • This technique solved the 24-puzzle between 1,1 and 21 times faster than classic Higher-Order Heuristics (avg. 2 days).
  • But in many cases using more nodes!
  • This technique evidently does not scale with problem dimension.
  • Maintaining the same time complexity for the 35-puzzle would mean increasing the number of DB entries from 10^13 to 10^28.

15
Criticizing the classic approach
  • We believe that it is more sensible to investigate the combination of online heuristics and relaxed admissibility.
  • A) Because rigid admissibility does not give any chance to face problems of greater dimensions:
  • Online admissible heuristics → NP-hard in time
  • Memory-based admissible heuristics → NP-hard in space
  • B) Because admissibility is a sufficient condition for optimality, not a necessary one!

16
Admissible overestimations
  • Some overestimations obviously don't affect optimality:
  • Constant overestimations
  • Overestimations outside the optimal path
  • Optimal-path overestimations coupled with overestimations in sibling sub-branches
  • In some domains other overestimations are admissible:
  • Uniform-cost problems: h < h* + c (move games)
  • Orthogonal single-piece-move problems: h < h* + 2c (atomic Manhattan-space problems → like the sliding-tile puzzle)
  • Simple experiment with the 8-puzzle and A*:
  • Use the heuristic h = hM + s with s variable
  • If s > 0 and s < 2 search is optimal, but becomes increasingly inefficient as s → 2.
  • If s ≥ 2 search can be suboptimal, and regains space efficiency.

17
Likely-Admissible Search
  • We relax the optimality requirement in a probabilistic sense (not qualitatively, as ε-admissible search does).
  • Why is it a better approach than ε-admissibility?
  • It allows us to retrieve TRULY OPTIMAL solutions.
  • It still allows us to change the nature of the search complexity.
  • It allows us to study the complexity while pushing p asymptotically to 1.
  • Because search can rely on any heuristic, unlike ε-admissible search, which works only on already-proven-admissible ones.
  • Because we can better combine search with statistical machine learning techniques: using universal approximators we can automatically generate heuristics.

18
Likely-Admissible Search: A statistical framework
  • Any given non-admissible heuristic can be used. The only requisite is a prior statistical analysis of its overestimation frequencies.
  • We denote by P(h) the probability that heuristic h underestimates h* for any given state x ∈ X.
  • We denote by p_h the probability of optimally solving a problem using h and A*.
  • A main goal of the framework is to obtain p_h from P(h): WE WANT TO ESTIMATE OPTIMALITY FROM ADMISSIBILITY.

19
Likely-Admissible Search: Trivial case, single heuristic
  • The overestimations along the optimal path p affect optimality; hence, given solution depth d:
  • (eq. 1)  p_h ≥ P(h)^d
  • Considering the admissible overestimations theorem, in the sliding-tile puzzle domain:
  • (eq. 2)  p_h ≥ P(h < h* + 2)^d

20
Likely-Admissible Search: Effect of the Admissible Overestimations Theorem
  • Underestimating h* + 2 is MUCH EASIER than underestimating h*!
  • The best heuristic generated for the 8-puzzle overestimated h* in 28,4% of cases, but h* + 2 in only 1,9%!

21
Likely-Admissible Search: Multiple Heuristics
  • To enrich the heuristic information we can generate many heuristics and use them simultaneously.
  • With j different heuristics we can take, each time, the smallest evaluation, in order to stress admissibility.
  • Thus:
  • (eq. 3)  P(h_min) = 1 − ∏_(i=1..j) (1 − P(h_i))
  • (eq. 3b)  p_h ≥ [1 − ∏_(i=1..j) (1 − P(h_i))]^d

22
Likely-Admissible Search: Multiple Heuristics
  • A common problem: we desire an optimality p_H; how many heuristics do we have to use to obtain it?
  • We will consider, for simplicity, that all j heuristics have the same given P(h < h* + 2). Hence:
  • (eq. 4)  j = ⌈ log_(1−P) (1 − p_H^(1/d)) ⌉

j grows logarithmically with this term, which grows with both d and p_H, because d > 1 and p_H < 1.
23
Likely-Admissible Search: Some Examples
  • 8-puzzle: how many heuristics?
  • d ≈ 22
  • Desired optimality 99,9% → p_H = 0,999
  • Given heuristics with P(h < h* + 2) = 0,95
  • log_0,05 (1 − 0,999^(1/22)) = log_0,05 (0,0000455) ≈ 3,33 → ⌈3,33⌉ = 4
  • 15-puzzle: how many heuristics?
  • d ≈ 53
  • Same desired optimality
  • Given heuristics with P(h < h* + 2) = 0,93
  • log_0,07 (1 − 0,999^(1/53)) = log_0,07 (0,0000189) ≈ 4,1 → ⌈4,1⌉ = 5
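The two worked examples can be checked numerically; a small sketch assuming eq. 4 has the form j = ⌈log_(1−P)(1 − p_H^(1/d))⌉, which reproduces the figures above:

    from math import ceil, log

    def heuristics_needed(p_H, d, P):
        # eq. 4: smallest j such that [1 - (1 - P)^j]^d >= p_H,
        # where P is the per-heuristic probability of underestimating h* + 2.
        return ceil(log(1 - p_H ** (1.0 / d), 1 - P))

    print(heuristics_needed(0.999, 22, 0.95))   # 8-puzzle example  -> 4
    print(heuristics_needed(0.999, 53, 0.93))   # 15-puzzle example -> 5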

24
Likely-Admissible Search: Main Problems
  • Equations 3 and 3b assume:
  • INDEPENDENT PROBABILITY DISTRIBUTION: the overestimation probabilities of the competing heuristics h_j(x) have independent distributions over X.
  • Equation 2 assumes:
  • CONSTANT PROBABILITY: the underestimation probability P(h < h* + 2) is constant for all x, independently of h*(n).
  • All these assumptions are very strong:
  • We observed experimentally that ANN heuristics map X with similar overestimation probabilities.
  • We observed that the avg. error grows with h*, and thus P(h < h* + 2) is not constant either.

25
Likely-Admissible Search: Prediction capability
  • Eq. 3 is not usable, since it requires total independence.
  • Optimality growth seems more or less linear (not exponential) with the number of heuristics. It improves appreciably with learning over different datasets.
  • The trivial equation 2 gives a probabilistic lower bound on the effective search optimality:
  • Extremely precise if the estimation is over 80%.
  • Imprecise (but always pessimistic) for low predictions.
  • Optimistic predictions are very rare and depend on the CONSTANT PROBABILITY assumption.
  • Predictions are much more accurate than ε-admissible search predictions.

26
Likely-Admissible Search: Optimality prediction, 8-puzzle
27
Likely-Admissible Search: Optimality prediction, 15-puzzle
28
Sub-symbolic heuristics
We used standard MLP networks that output the estimate h(n).
(Figure: network architecture.)
29
Sub-symbolic heuristics: Are sub-symbolic heuristics online?
  • We believe so, even though there is an offline learning phase, for 2 reasons:
  • 1. Nodes visited during search are generally UNSEEN.
  • Exactly as humans often do with learned heuristics: we don't recover a heuristic value from a database, we compute it by applying the inner rules that the heuristic provides.
  • 2. The learned heuristic should be dimension-independent: learning over small problems could be used for bigger problems (e.g. 8-puzzle → 15-puzzle). This is not possible with memory-based heuristics.

30
Sub-symbolic heuristics: Outputs & Targets
  • Two options:
  • A) 1 linear output neuron
  • B) n 0/1 output neurons
  • A is much better.
  • Two possible targets:
  • A) direct target function → o(x) = h*(x)
  • B) gap target → o(x) = h*(x) − hM(x)
  • (which takes advantage of Manhattan too)
  • Experiments: B improves over A only in bigger problems such as the 15-puzzle.

31
Sub-symbolic heuristics: Input coding
  • A) one unit per (square k, value t) pair, high if square k is occupied by value t → N² inputs
    e.g. 000000100 001000000 000010000 …
  • B) row/column units for block k of value t are high if k is occupied by value t → 2N^(3/2) inputs
    e.g. 001 100 100 001 010 010 100 010 …
  • C) for each square, compute its horizontal and vertical distances → 2N inputs (see the sketch below)
    e.g. -2 0 0 1 -1 1 1 1 0 1 0 0 …
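As an illustration, a small sketch of what coding C might look like on the 8-puzzle, assuming signed row/column offsets of each square's tile from its goal position (the exact convention of the original coding is not recoverable here):

    def coding_c(board, goal, side=3):
        # board, goal: tuples of length side*side giving the tile in each square
        # (0 = blank). Returns 2N inputs: for every square, the signed vertical
        # and horizontal offset of its tile from that tile's goal position.
        goal_pos = {tile: (i // side, i % side) for i, tile in enumerate(goal)}
        inputs = []
        for i, tile in enumerate(board):
            gr, gc = goal_pos[tile]
            inputs.append(i // side - gr)    # vertical offset
            inputs.append(i % side - gc)     # horizontal offset
        return inputs
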
32
Sub-symbolic heuristics: Learning Algorithm
  • Backpropagation with a new error function, instead of the classic error E_d = o_d − t_d over example d.
  • We introduce a coefficient of asymmetry w in order to stress admissibility (see the sketch below):
  • E_d = (1 − w)(o_d − t_d)  if (o_d − t_d) < 0
  • E_d = (1 + w)(o_d − t_d)  if (o_d − t_d) > 0,  with 0 < w < 1
  • The modified backprop minimizes:
  • E(W) = ½ Σ_(d∈D) r_d (o_d − t_d)²,  with r_d = (1 + w) or r_d = (1 − w)
  • We used a dynamically decreasing w, in order to stress underestimations when learning is easy and to ease it later. Momentum α = 0,8 helped smoothness.
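A compact NumPy sketch of the asymmetric error and of its gradient with respect to the network outputs o (targets t, asymmetry coefficient w); illustrative only:

    import numpy as np

    def asymmetric_error(o, t, w):
        # Overestimations (o > t) are weighted by (1 + w), underestimations by (1 - w).
        r = np.where(o - t > 0, 1.0 + w, 1.0 - w)
        return 0.5 * np.sum(r * (o - t) ** 2)

    def asymmetric_error_grad(o, t, w):
        # dE/do: the error signal backpropagated through the network.
        r = np.where(o - t > 0, 1.0 + w, 1.0 - w)
        return r * (o - t)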

33
Sub-symbolic heuristics: Asymmetric Regression
(Figure: symmetric vs. asymmetric error functions.)
  • This is a general idea for backpropagation learning.
  • It can suit any regression problem where overestimations harm more than underestimations (or the contrary).
  • Heuristic machine learning is an ideal application field.

34
Sub-symbolic heuristics: Dataset Generation
  • Examples are previously optimally-solved configurations.
  • Few examples are sufficient for good learning: a few hundred already give faster search than Manhattan.
  • Experimental ideal: 8-puzzle set → 10000 examples, 15-puzzle → 25000 (1/(500×10^6) of the problem space!).
  • IMPORTANT: these examples have to be representative of the cases present in search trees, not of random cases! (see the 15-puzzle search-tree distribution)
  • Hence, avg. h should stay around d/2. Over 60% of 15-puzzle examples have d < 30, ≈ 80% have d < 45. Dataset generation is much easier than expected and is fully parallelizable.
  • Generating two 25000-example 15-puzzle datasets took 100 hours, half the learning time.

35
Sub-symbolic heuristics: Modifying estimations a posteriori
  • Using trunc() → mandatory for IDA* (see the sketch below).
  • Adapting the value to Manhattan's parity:
  • Increases IDA* efficiency by 30%.
  • Does not improve admissibility, due to the admissible overestimations theorem.
  • Shifting to Manhattan in search endings.
  • Maintaining dominance over Manhattan.
  • Arbitrary estimation reduction.
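A minimal sketch of the first adjustments above (truncation, parity matching, dominance over Manhattan); the rounding direction is an assumption, not taken from the slides:

    from math import trunc

    def adjust_estimate(h_net, h_manhattan):
        # Truncate the real-valued network output: IDA* needs integer thresholds.
        h = trunc(h_net)
        # In the sliding-tile puzzle h* always has the same parity as the Manhattan
        # Distance, so shift a wrong-parity value to the next matching one.
        if (h - h_manhattan) % 2 != 0:
            h += 1
        # Keep dominance over Manhattan.
        return max(h, h_manhattan)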

36
Experimental Results: 8-puzzle using A* and single heuristics
(Table: results for Manhattan, Conflict Deduction, 1 ANN, 1 ANN with asymmetric learning, and 1 ANN with a posteriori techniques; values range from 21,97 to 22,91.)
Test set: 2000 random configurations
37
Experimental Results: 8-puzzle using A* and multiple heuristics
Test set: 2000 random configurations
38
Experimental Results: 15-puzzle using IDA* and multiple heuristics
Test set: 700 random configurations (avg. d = 52,7; nodes expanded with Manhattan ≈ 3,7 × 10^8)
39
Experimental Results: Some comparisons
Try the demo at http://www.dii.unisi.it/ernandes/samloyd/
  • Compared to ε-admissible search:
  • WIDA* with w = 1,25 and h = Conflict Deduction: predicted d ≤ 66, actual d = 54,49, nodes visited 42374
  • IDA* with 1 ANN: actual d = 54,45, nodes 24711
  • Compared to Manhattan:
  • IDA* with 1 ANN (optimality ≈ 30%): 1/1000 execution time, 1/15000 nodes visited
  • IDA* with 2 ANN (opt. ≈ 50%): 1/500 time, 1/13000 nodes
  • IDA* with 4 ANN-1 (opt. ≈ 90%): 1/70 time, 1/2800 nodes
  • Compared to DPDBs:
  • IDA* with 1 ANN: between −17% and +13% nodes visited, between 1,4 and 3,5 times slower

40
Conclusions
  • We defined a new framework of relaxed-admissible search: likely-admissible search.
  • This statistical framework is more appealing than ε-admissibility:
  • it relaxes the quantity of the solutions, not the quality
  • it works with any non-admissible heuristic
  • it can exploit statistical learning techniques
  • Likely-admissible sub-symbolic heuristics:
  • can challenge DPDB heuristics in performance on the 15-puzzle
  • represent a way to speed up solving, avoid memory abuse and still retrieve optimal solutions.

41
Further Work
  • 1) Generalization of the input coding. Two goals:
  • A) reduce the dimension of the input representation
  • B) allow learning over different problem dimensions
  • An idea: using graphs and recurrent ANNs to generate heuristics.
  • 2) Auto-feed learning
  • The system should be able to generate its own dataset automatically during learning, increasing complexity gradually.
  • 3) Network specialization
  • Train and apply heuristics only over a certain domain of complexity (e.g. guided by the Manhattan Distance) during search.

42
Likely-Admissible Sub-symbolic Heuristics
26-08-2004 Valencia
  • THANK YOU FOR YOUR ATTENTION