Title: Ehsan Nazerfard
1Learning Search Control Rules for Plan-Space
Planner
Yong Qu, Suresh Katukam, Subbarao Kambhampati
- Presented by
- Ehsan Nazerfard
2Outline
- Related works
- Overview of UCPOP
- EBL in UCPOP
- Schematic of UCPOP+EBL
- Failures and Explanations
- Regression and Propagation
- Example
- Generalization
- Performance
3Related Works
- SNLP+EBL
- Suresh Katukam and Subbarao Kambhampati
- UCPOP+EBL
- Yong Qu and Subbarao Kambhampati
- GraphPlan+EBL
- ? and Subbarao Kambhampati
4Overview of UCPOP
- Flowchart of refinement process in UCPOP
5Overview of UCPOP
- The Briefcase Domain
- Classic Problem
- Goal
- Getting an empty briefcase to the office while leaving everything else at home
- Initial state
- Paycheck P is in the briefcase and the briefcase is at home
6Trace of UCPOP Solving the Problem
7Explanation Based Learning
- Explaining why something is a good idea (and generalizing from that)
- Construct an explanation
- Generalize the explanation (introduce variables into the explanation)
- Construct a new rule from the generalized explanation
- Utility problem
8EBL in UCPOP
- Learn control rules from search failures
- Analytical failure
- Cross depth limit failure
- To avoid failures in similar situations in future
- Online learning
- Offline learning
- Control rules
- In Selection form
- In Rejection form
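The two rule forms above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `ControlRule` class, the `filter_decisions` helper, and the representation of a plan as a set of constraint strings are hypothetical and are not taken from UCPOP. A rejection rule prunes a candidate decision when its condition holds; a selection rule commits to one.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List

# Hypothetical representation: a partial plan is a set of constraint strings.
Plan = FrozenSet[str]


@dataclass
class ControlRule:
    kind: str                          # "select" or "reject"
    decision: str                      # refinement decision the rule is about
    condition: Callable[[Plan], bool]  # when the rule applies to a plan


def filter_decisions(plan: Plan, candidates: List[str],
                     rules: List[ControlRule]) -> List[str]:
    """Return the candidate decisions that survive the control rules."""
    fired = [r for r in rules if r.condition(plan)]
    # A firing selection rule narrows the choice to the selected decisions.
    selected = [r.decision for r in fired
                if r.kind == "select" and r.decision in candidates]
    if selected:
        return selected
    # Otherwise, drop every decision named by a firing rejection rule.
    rejected = {r.decision for r in fired if r.kind == "reject"}
    return [d for d in candidates if d not in rejected]
```

In this sketch a selection rule takes priority over rejection rules; that ordering is a design choice of the example, not something the slides specify.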
9Schematic of UCPOP+EBL
- The explanation is a minimal set of constraints in P that are together inconsistent
- Regression and propagation are used to abstract failure information to higher-level plans in the search tree
- DFS is usually adopted
10Failures and Explanations
- Some are detected by consistency checks
- (s1 < s2 , s2 < s1) ? O
- (p@s1 , p@s1) ? A
- Implicit failures may only be detected by use of domain-specific axioms
- An object cannot be at two places at the same time
- Domain axiom in briefcase domain
- No block can have another block on it and be clear
- Domain axiom in blocks world domain
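The two kinds of failure above can be sketched as small checks. This is an illustrative sketch under stated assumptions, not UCPOP's implementation: `ordering_failure` and `location_failure` are hypothetical names, orderings are `(before, after)` pairs, and facts are `("at", object, place)` triples required to hold at the same step. Each check returns the small set of constraints that are jointly inconsistent, which is what a failure explanation needs.

```python
from typing import Dict, List, Optional, Set, Tuple

Ordering = Tuple[str, str]   # (a, b): step a must come before step b
Fact = Tuple[str, str, str]  # ("at", obj, place)


def ordering_failure(orderings: Set[Ordering]) -> Optional[List[Ordering]]:
    """Return a cycle of ordering constraints if one exists, else None."""
    succ: Dict[str, List[str]] = {}
    for a, b in sorted(orderings):
        succ.setdefault(a, []).append(b)

    def dfs(node, path, on_path):
        for nxt in succ.get(node, []):
            trail = path + [(node, nxt)]
            if nxt in on_path:
                # Keep only the edges that form the cycle itself.
                start = next(i for i, (a, _) in enumerate(trail) if a == nxt)
                return trail[start:]
            found = dfs(nxt, trail, on_path | {nxt})
            if found:
                return found
        return None

    for start in sorted(succ):
        cycle = dfs(start, [], {start})
        if cycle:
            return cycle
    return None


def location_failure(facts: Set[Fact]) -> Optional[Set[Fact]]:
    """Domain axiom check: an object cannot be at two places at once."""
    seen: Dict[str, Fact] = {}
    for fact in sorted(facts):
        pred, obj, place = fact
        if pred != "at":
            continue
        if obj in seen and seen[obj][2] != place:
            return {seen[obj], fact}
        seen.setdefault(obj, fact)
    return None
```

The ordering check needs no domain knowledge, while the location check encodes one briefcase-domain axiom by hand; that contrast is the point of the slide.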
11Regression and Propagation
- Initial failure explanation E is regressed up through the decision d leading to the failing plan
- Compute the condition E' that needs to be true in the plan before d such that failure will result after d
- It's possible to create a rule that rejects d whenever E' is true but ...
12Regression and Propagation
- For regression, it's useful to think of UCPOP
decisions as STRIPS operators
- Regressing constraints over decisions
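Treating a decision as a STRIPS-style operator, regression can be sketched as follows. The `Decision` class and `regress` function are hypothetical names, and constraints are plain strings. The sketch assumes that a decision which added none of the constraints in E leaves E unchanged, and that constraints it did add are replaced by the decision's preconditions. This mirrors ordinary STRIPS regression and may differ in detail from the planner's own definition.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Decision:
    name: str
    preconditions: FrozenSet[str]  # what must hold in the plan before d
    adds: FrozenSet[str]           # constraints d adds to the plan


def regress(explanation: FrozenSet[str], decision: Decision) -> FrozenSet[str]:
    """Compute E' that must hold before `decision` for E to hold after it."""
    if not (explanation & decision.adds):
        # The decision contributed nothing to the failure: E' is E itself.
        return explanation
    # Replace the constraints the decision added by its preconditions.
    return (explanation - decision.adds) | decision.preconditions
```

For example, regressing the explanation {s1<s2, s2<s1} over a decision that added s2<s1 leaves s1<s2 together with that decision's preconditions.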
13Regression and Propagation
- When all the branches under P are failing, we can construct an explanation for P itself
- Sometimes it's not necessary to wait for all branches to fail
- This is called dependency directed backtracking (ddb)
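Propagation and ddb can be sketched together. This is a simplified illustration under assumptions: `explain_node` is a hypothetical helper, each branch is given as the set of constraints its decision added plus the failure explanation already found for the child plan, and regression is reduced to removing the added constraints. If regression leaves a child's explanation unchanged, the decision played no part in the failure, so P fails for the same reason and the remaining branches need not be explored.

```python
from typing import FrozenSet, List, Tuple

Branch = Tuple[FrozenSet[str], FrozenSet[str]]  # (added constraints, child explanation)


def explain_node(flaw: FrozenSet[str],
                 branches: List[Branch]) -> Tuple[FrozenSet[str], int]:
    """Return (explanation for plan P, number of branches examined)."""
    collected = set()
    for examined, (adds, child_explanation) in enumerate(branches, 1):
        regressed = child_explanation - adds
        if regressed == child_explanation:
            # ddb: the failure does not depend on this decision,
            # so sibling branches would fail for the same reason.
            return frozenset(child_explanation), examined
        collected |= regressed
    # All branches failed: combine their regressed explanations with the
    # description of the flaw that the branches were trying to resolve.
    return frozenset(collected | flaw), len(branches)
```

In a real planner the child explanations would be produced lazily by searching each branch; passing them in as a list keeps the sketch short.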
14Trace of UCPOP+EBL Solving the Problem
15UCPOP+EBL solving the problem
- Explanation of failure for p7 that is regressed
over step-addition, take-out (p)
- E6, E5, E4, E3 and E2 are constructed in the same manner
- Explanation of failure for p3
16UCPOP+EBL solving the problem
- By regressing E3 over the establishment decision under p2
- This leads to a useful control rule
- Establishment of closed(B)@G should be avoided when the paycheck is in the office, the briefcase is not at the office, and we want the briefcase to be at the office and the paycheck to be left at home
17Generalization
- Replace any problem-specific constants with variables without affecting the rule's correctness
- Keep only those bindings that are forced by the initial explanation of the failure
- Step names
- Object names
- Final search control rule
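The variable-introduction step above can be sketched as follows. The `generalize` function, the tuple encoding of conditions, and the `keep` parameter are assumptions made for illustration. `keep` stands in for constants whose identity is forced by the initial failure explanation; every other step or object name is replaced by a fresh variable, and repeated occurrences of one constant share one variable.

```python
from typing import Dict, FrozenSet, List, Tuple

Condition = Tuple[str, ...]  # (predicate, arg1, arg2, ...)


def generalize(conditions: List[Condition],
               keep: FrozenSet[str] = frozenset()
               ) -> Tuple[List[Condition], Dict[str, str]]:
    """Replace constants by variables, except those listed in `keep`."""
    mapping: Dict[str, str] = {}
    general: List[Condition] = []
    for pred, *args in conditions:
        new_args = []
        for arg in args:
            if arg in keep:
                new_args.append(arg)  # binding forced by the failure
            else:
                # The same constant always maps to the same variable.
                mapping.setdefault(arg, f"?x{len(mapping) + 1}")
                new_args.append(mapping[arg])
        general.append((pred, *new_args))
    return general, mapping
```

Sharing one variable per constant preserves the co-designation constraints of the original explanation, which is what keeps the generalized rule correct.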
18Rule Storage
- Generalized rules are available to the planner via the rule corpus
- Learning phase
- Subsequent planning episodes
- Some checks are done by the planner on the rule corpus
19Performance
- Performance of UCPOP+EBL in the Blocks World and Briefcase domains
- 100 random problems were generated for each domain
20Other issues
- Learn useful rules from Domain axioms
- Expressive action representation can obviate the need for specific domain knowledge to some extent
- For example
21Conclusion
- Extend the control rule learning framework to UCPOP
- (previous work on SNLP+EBL)
- Explanation, regression, propagation and rule construction
- Solving classic problem from Briefcase Domain
- Reduce the need for domain-specific failure theories using expressive action representation
22Any questions are welcome
Thanks for your attention