Title: Learning to Support Constraint Programmers
1. Learning to Support Constraint Programmers
- Susan L. Epstein (1)
- Gene Freuder (2) and Rick Wallace (2)
- (1) Department of Computer Science, Hunter College and The Graduate Center of The City University of New York
- (2) Cork Constraint Computation Centre
2. Facts about ACE
- Learns to solve constraint satisfaction problems
- Learns search heuristics
- Can transfer what it learns on simple problems to solve more difficult ones
- Can export knowledge to ordinary constraint solvers
- Both a learner and a test bed
- Heuristic but complete: will find a solution, eventually, if one exists
- Guarantees high-quality, not optimal, solutions
- Begins with substantial domain knowledge
3. Outline
- The task: constraint satisfaction
- Performance results
- Reasoning mechanism
- Learning
- Representations
4. The Problem Space
- Constraint satisfaction problem: <X, D, C>
- Solution: assign a value to every variable, consistent with the constraints
- Many real-world problems can be represented and solved this way (design and configuration, planning and scheduling, diagnosis and testing)

Variables: A, B, C, D
Domains: A ∈ {1, 2, 3}, B ∈ {1, 2, 4, 5, 6}, C ∈ {1, 2}, D ∈ {1, 3}
Constraints: A = B, A > D, C ≠ D
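As a concrete rendering of this running example, here is a minimal brute-force sketch in Python. It is purely illustrative, not ACE's solver, and the encoding of constraints as predicates is an assumption:

```python
from itertools import product

# The running example from this slide; the dictionary encoding is illustrative.
domains = {"A": {1, 2, 3}, "B": {1, 2, 4, 5, 6}, "C": {1, 2}, "D": {1, 3}}
constraints = [
    lambda v: v["A"] == v["B"],  # A = B
    lambda v: v["A"] > v["D"],   # A > D
    lambda v: v["C"] != v["D"],  # C != D
]

def solve_brute_force(domains, constraints):
    """Enumerate every complete assignment; yield those satisfying all constraints."""
    names = sorted(domains)
    for values in product(*(sorted(domains[n]) for n in names)):
        assignment = dict(zip(names, values))
        if all(check(assignment) for check in constraints):
            yield assignment

print(list(solve_brute_force(domains, constraints)))
# -> [{'A': 2, 'B': 2, 'C': 2, 'D': 1}], the solution shown on a later slide
```

Brute force is exponential in the number of variables, which is exactly why the consistency, retraction, and ordering machinery on the following slides matters.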
5. A Challenging Domain
- Constraint solving is NP-hard
- Problem class parameters: <n, k, d, t>
- n: number of variables
- k: maximum domain size
- d: edge density (% of possible constraints)
- t: tightness (% of value pairs excluded)
- Complexity peak: values of d and t that make problems hardest
- Heavy-tailed difficulty distribution (Gomes et al., 2002)
- A problem may have multiple solutions or none
- Unexplored choices may be good
6. Finding a Path to a Solution
- A sequence of decision pairs (select a variable, assign it a value)
- Optimal length: 2n for n variables
- For n variables with domain size d, there are (d + 1)^n possible states, since each variable is either unassigned or holds one of its d values
7. Solution Method
Domains: A ∈ {1, 2, 3}, B ∈ {1, 2, 4, 5, 6}, C ∈ {1, 2}, D ∈ {1, 3}
Constraints: A = B, A > D, C ≠ D
- Search from the initial state to a goal state
8. Consistency Maintenance
- Some values may initially be inconsistent
- A value assignment can restrict other domains (sketched below)
Constraints: A = B, A > D, C ≠ D
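A minimal sketch of forward checking on the running example. The binary-constraint encoding, one predicate per variable pair, is an illustrative assumption, not ACE's representation:

```python
import copy

def forward_check(domains, constraints, var, value):
    """Tentatively assign var = value, then prune values of other variables
    that can no longer pair consistently with it. Returns the reduced
    domains, or None if some domain empties (a wipeout)."""
    pruned = copy.deepcopy(domains)
    pruned[var] = {value}
    for (x, y), ok in constraints.items():
        if x == var:
            pruned[y] = {w for w in pruned[y] if ok(value, w)}
            if not pruned[y]:
                return None
        elif y == var:
            pruned[x] = {w for w in pruned[x] if ok(w, value)}
            if not pruned[x]:
                return None
    return pruned

domains = {"A": {1, 2, 3}, "B": {1, 2, 4, 5, 6}, "C": {1, 2}, "D": {1, 3}}
constraints = {("A", "B"): lambda a, b: a == b,   # A = B
               ("A", "D"): lambda a, d: a > d,    # A > D
               ("C", "D"): lambda c, d: c != d}   # C != D
print(forward_check(domains, constraints, "A", 2))
# Assigning A = 2 restricts B to {2} and D to {1}.
```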
9. Retraction
- When an inconsistency arises, a retraction method removes a value and returns search to an earlier state
Domains: A ∈ {1, 2, 3}, B ∈ {1, 2, 4, 5, 6}, C ∈ {1, 2}, D ∈ {1, 3}
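Reusing the forward_check sketch above (and its domains and constraints), chronological retraction can be rendered as depth-first backtracking; the naive first-unassigned variable ordering here is a placeholder for the heuristics on the next two slides:

```python
def backtrack(domains, constraints, assignment=None):
    """Depth-first search with forward checking. On a wipeout the current
    value is abandoned; when all values fail, control returns to the
    caller, which retracts its own most recent assignment."""
    assignment = assignment or {}
    if len(assignment) == len(domains):
        return assignment
    var = next(v for v in domains if v not in assignment)  # naive ordering
    for value in sorted(domains[var]):
        reduced = forward_check(domains, constraints, var, value)
        if reduced is not None:  # no wipeout; deepen the search
            result = backtrack(reduced, constraints, {**assignment, var: value})
            if result is not None:
                return result
    return None

print(backtrack(domains, constraints))  # -> {'A': 2, 'B': 2, 'C': 2, 'D': 1}
```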
10. Variable Ordering
- A good variable ordering can speed search
11. Value Ordering
- A good value ordering can speed search too
Domains: A ∈ {1, 2, 3}, B ∈ {1, 2, 4, 5, 6}, C ∈ {1, 2}, D ∈ {1, 3}
Solution: A = 2, B = 2, C = 2, D = 1
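Two illustrative orderings in the style of the sketches above: MinDomain (the off-the-shelf variable ordering a later slide compares against) and the standard least-constraining-value rule. Neither is claimed to be ACE's formulation:

```python
def min_domain(domains, assigned):
    """MinDomain: choose the unassigned variable with the fewest
    remaining values (the classic fail-first idea)."""
    return min((v for v in domains if v not in assigned),
               key=lambda v: len(domains[v]))

def order_values(domains, constraints, var):
    """Least-constraining-value ordering: try first the values that rule
    out the fewest values in the other variables' domains."""
    def ruled_out(value):
        total = 0
        for (x, y), ok in constraints.items():
            if x == var:
                total += sum(1 for w in domains[y] if not ok(value, w))
            elif y == var:
                total += sum(1 for w in domains[x] if not ok(w, value))
        return total
    return sorted(domains[var], key=ruled_out)
```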
12. Constraint Solvers Know
- Several consistency methods
- Several retraction methods
- Many variable ordering heuristics
- Many value ordering heuristics
but the interactions among them are not well
understood, nor is one combination best for all
problem classes.
13. Goals of the ACE Project
- Characterize problem classes
- Learn to solve classes of problems well
- Evaluate mixtures of known heuristics
- Develop new heuristics
- Explore the role of planning in problem solving
14. Outline
- The task: constraint satisfaction
- Performance results: ACE
- Reasoning mechanism
- Learning
- Representation
15. Experimental Design
- Specify problem class, consistency and retraction methods
- Average performance across 10 runs
- Learn on L problems (halt at 10,000 steps)
- To-completion testing on T new problems
- During testing, use only heuristics judged accurate during learning
- Evaluate performance on:
- Steps to solution
- Constraint checks
- Retractions
- Elapsed time
16. ACE Learns to Solve Hard Problems
- <30, 8, .24, .66>, near the complexity peak
- Learn on 80 problems
- 10 runs, binned in sets of 10 learning problems
- Discards 26 of 38 heuristics
- Outperforms MinDomain, an off-the-shelf heuristic
[Chart: steps to solution per learning bin (1–8); means in blue, medians in red]
17. ACE Rediscovers the Brélaz Heuristic
- Graph coloring: assign different colors to adjacent nodes.
- Graph coloring is a kind of constraint satisfaction problem.
- Brélaz: minimize the dynamic domain; break ties with maximum forward degree.
- ACE learned this consistently on different classes of graph coloring problems.
Color each vertex red, blue, or green so that each pair of adjacent vertices gets different colors.
(Epstein & Freuder, 2001)
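A minimal sketch of the Brélaz ordering as stated above; the graph argument, mapping each variable to its set of neighbours, is an assumed representation:

```python
def brelaz(domains, graph, assigned):
    """Pick the unassigned variable with the smallest dynamic domain,
    breaking ties by maximum forward degree (edges to unassigned variables)."""
    def key(v):
        forward_degree = sum(1 for u in graph[v] if u not in assigned)
        return (len(domains[v]), -forward_degree)
    return min((v for v in domains if v not in assigned), key=key)
```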
18. ACE Discovers a New Heuristic
- Maximize the product of degree and forward degree at the top of the search tree
- Exported to several traditional approaches:
- Min Domain
- Min Domain/Degree
- Min Domain with degree preorder
- Learned on small problems but tested in 10 runs on n = 150, domain size 5, density .05, tightness .24
- Reduced search tree size by 25% to 96%
(Epstein, Freuder, Wallace, Morozov, & Samuels, 2002)
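The discovered rule, rendered as a one-step variable choice. The slide applies it only at the top of the search tree; this sketch leaves that staging decision to the caller:

```python
def degree_x_forward_degree(graph, assigned):
    """Choose the unassigned variable maximizing
    (static degree) * (forward degree); graph maps variables to neighbours."""
    def score(v):
        forward = sum(1 for u in graph[v] if u not in assigned)
        return len(graph[v]) * forward
    return max((v for v in graph if v not in assigned), key=score)
```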
19. Outline
- The task: constraint satisfaction
- Performance results
- Reasoning mechanism
- Learning
- Representation
20. Constraint-Solving Heuristic
- Uses domain knowledge
- What problem classes does it work well on?
- Is it valid throughout a single solution?
- Can its dual also be valid?
- How can heuristics be combined?
and where do new heuristics come from?
21. FORR (For the Right Reasons)
- General architecture for learning and problem solving
- Multiple learning methods, multiple representations, multiple decision rationales
- Specialized by domain knowledge
- Learns useful knowledge to support reasoning
- Specifies whether a rationale is correct or heuristic
- Learns to combine rationales to improve problem solving
(Epstein, 1992)
22. An Advisor Implements a Rationale
- A class-independent action-selection rationale
- Supports or opposes actions through comments
- Expresses the direction of its opinion through strengths
- A limitedly-rational procedure
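One possible rendering of an Advisor in Python. The class and method names, the strength scale, and the state attributes are assumptions for illustration, not FORR's actual interface:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Comment:
    action: str      # the candidate action commented on
    strength: float  # degree of support or opposition
    advisor: str     # the rationale that produced the comment

class Advisor:
    """Abstract Advisor: comments on candidate actions in the current state."""
    name = "abstract"
    def comments(self, state, actions) -> List[Comment]:
        raise NotImplementedError

class MaxDegreeAdvisor(Advisor):
    """Example rationale: support selecting high-degree variables. Assumes
    each candidate action is a variable name and that state.graph maps
    variables to their neighbours."""
    name = "MaxDegree"
    def comments(self, state, actions):
        return [Comment(v, float(len(state.graph[v])), self.name) for v in actions]
```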
23. Advisor Categories
- Tier 1: rationales that correctly select a single action
- Tier 2: rationales that produce a set of actions directed toward a subgoal
- Tier 3: heuristic rationales that select a single action
24. Choosing an Action
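The original slide presumably diagrammed this process. Here is a minimal sketch of the tiered control just described, under the same assumptions as the Advisor sketch above; decide and restrict are hypothetical method names:

```python
def choose_action(state, actions, tier1, tier2, tier3, weights):
    # Tier 1: rationales that correctly select a single action decide alone.
    for advisor in tier1:
        decision = advisor.decide(state, actions)   # hypothetical method
        if decision is not None:
            return decision
    # Tier 2: rationales may narrow the candidates toward a subgoal.
    for advisor in tier2:
        subset = advisor.restrict(state, actions)   # hypothetical method
        if subset:
            actions = subset
    # Tier 3: heuristic rationales vote; each comment's strength is
    # scaled by its Advisor's learned weight.
    scores = {a: 0.0 for a in actions}
    for advisor in tier3:
        for comment in advisor.comments(state, actions):
            if comment.action in scores:
                scores[comment.action] += weights[advisor.name] * comment.strength
    return max(scores, key=scores.get)
```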
25. ACE's Domain Knowledge
- Consistency maintenance methods: forward checking, arc consistency
- Backtracking methods: chronological
- 21 variable ordering heuristics
- 19 value ordering heuristics
- 3 languages whose expressions have interpretations as heuristics
- Graph theory knowledge, e.g., connected, acyclic
- Constraint solving knowledge, e.g., only one arc consistency pass is required on a tree
26. An Overview of ACE
- The task: constraint satisfaction
- Performance results: ACE
- Reasoning mechanism
- Learning
- Representation
27. What ACE Learns
- A weighted linear combination of comment strengths
- For voting in tier 3 only
- Includes only valuable heuristics
- Indicates the relative accuracy of valuable heuristics
- New, learned heuristics
- How to restructure tier 3
- When random choice is the right thing to do
- Knowledge that supports heuristics (e.g., typical solution path length)
28. Digression-Based Weight Learning
- Learn from the trace of each solved problem
- Reward decisions on the perfect solution path
- Shorter paths reward variable ordering
- Longer paths reward value ordering
- Blame digression-producing decisions in proportion to the error
- Valuable Advisors: weight > baseline
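A sketch of this update under stated assumptions: the slides do not give ACE's actual rule, so the learning rate and the on_perfect_path and error_size callables below are placeholders. Comments are the Comment records from the Advisor sketch above:

```python
def update_weights(weights, trace, on_perfect_path, error_size, rate=0.1):
    """trace: list of (decision, comments) pairs from one solved problem.
    Reward Advisors that supported decisions on the perfect solution path;
    blame those that supported digressions, in proportion to the error."""
    for decision, comments in trace:
        for comment in comments:
            if comment.action != decision or comment.strength <= 0:
                continue                  # only comments that supported the decision
            if on_perfect_path(decision):
                weights[comment.advisor] += rate                         # reward
            else:
                weights[comment.advisor] -= rate * error_size(decision)  # blame
    return weights
```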
29. Learning New Advisors
- An Advisor grammar over pairs of concerns:
- Maximize or minimize
- Product or quotient
- Stage
- Monitor all expressions
- Use good ones collectively
- Use the best ones individually
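The grammar can be enumerated mechanically. A minimal sketch with illustrative concern names; the grammar's stage element is omitted for brevity:

```python
from itertools import combinations

def candidate_expressions(concerns):
    """Enumerate candidate heuristics: for each pair of concerns, form the
    product and the quotient, each either maximized or minimized."""
    for f, g in combinations(concerns, 2):
        for op in ("*", "/"):
            for sense in ("maximize", "minimize"):
                yield f"{sense}({f} {op} {g})"

# e.g., with three illustrative concerns:
for expr in candidate_expressions(["degree", "forward_degree", "domain_size"]):
    print(expr)   # maximize(degree * forward_degree), minimize(degree * forward_degree), ...
```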
30. Outline
- The task: constraint satisfaction
- Performance results: ACE
- Reasoning mechanism
- Learning
- Representation
31. Representation of Experience
- A state describes variables and value assignments, impossible future values, the prior state, connected components, constraint checks incurred, dynamic edges, and trees
- The history of successful decisions, plus other significant decisions, becomes training examples
[Sample state: Is / Can be / Cannot be entries for A–D; 4 constraint checks incurred; 1 acyclic component {A, C, D}; dynamic edges A–D and C–D]
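An illustrative rendering of such a state as a Python dataclass; the field names are assumptions, not ACE's actual attribute names:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

@dataclass
class SearchState:
    assignments: Dict[str, int]          # variables and their value assignments
    domains: Dict[str, Set[int]]         # remaining possible values per variable
    impossible: Dict[str, Set[int]]      # impossible future values
    prior: Optional["SearchState"]       # the prior state
    components: List[Set[str]]           # connected components
    checks: int                          # constraint checks incurred
    dynamic_edges: Set[Tuple[str, str]]  # edges among unassigned variables
    trees: List[Set[str]]                # acyclic (tree) components
```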
32. Representation of Learned Knowledge
- Weights for Advisors
- Solution size distribution
- Latest error: the greatest number of variables bound at a retraction
33. ACE's Status Report
- 41 Advisors in tiers 1 and 3
- 3 languages in which to express additional Advisors
- 5 experimental planners
- Problem classes: random, coloring, geometric, logic, n-queens, small world, and quasigroup (with and without holes)
- Learns to solve hard problems
- Learns new heuristics
- Transfers to harder problems
- Divides and conquers problems
- Learns when not to reason
34. Current ACE Research
- Further weight-learning refinements
- Learn appropriate restart parameters
- More problem classes, consistency methods, retraction methods, planners, and Advisor languages
- Learn appropriate consistency checking methods
- Learn appropriate backtracking methods
- Learn to bias initial weights
- Metaheuristics to reformulate the architecture
- Modeling strategies
and, coming soon, ACE on the Web
35. Acknowledgements
- Continued thanks for their ideas and efforts go to:
- Diarmuid Grimes
- Mark Hennessey
- Tiziana Ligorio
- Anton Morozov
- Smiljana Petrovic
- Bruce Samuels
- Students of the FORR study group
- The Cork Constraint Computation Centre
- and, for their support, to
- The National Science Foundation
- Science Foundation Ireland
36. Is ACE Reinforcement Learning?
- Similarities:
- Unsupervised learning through trial and error
- Delayed rewards
- Learns a policy
- Primary differences:
- Reinforcement learning learns a policy represented as the estimated values of states it has experienced repeatedly, but ACE is unlikely to revisit a state; instead, it learns how to act in any state
- Q-learning learns state-action preferences, but ACE learns a policy that combines action preferences
37. How is ACE like STAGGER?
- Learns: STAGGER, a Boolean classifier; ACE, a search-control preference function for a sequence of decisions in a class of problems
- Represents: STAGGER, weighted Booleans; ACE, a weighted linear function
- Supervised: STAGGER, yes; ACE, no
- New elements: STAGGER, failure-driven; ACE, success-driven
- Initial bias: STAGGER, yes; ACE, under construction
- Real-valued attributes: STAGGER, yes; ACE, no
(Schlimmer, 1987)
38. How is ACE like SAGE.2?
- Both learn search control from unsupervised experience, reinforce decisions on a successful path, gradually introduce new factors, specify a threshold, and transfer to harder problems, but:
- Learns on: SAGE.2, the same task; ACE, different problems in a class
- Represents: SAGE.2, symbolic rules; ACE, a weighted linear function
- Reinforces: SAGE.2, repeating rules; ACE, correct comments
- Failure response: SAGE.2, revise; ACE, reduce weight
- Proportional to error: SAGE.2, no; ACE, yes
- Compares states: SAGE.2, yes; ACE, no
- Random benchmarks: SAGE.2, no; ACE, yes
- Subgoals: SAGE.2, no; ACE, yes
- Learns during search: SAGE.2, yes; ACE, no
(Langley, 1985)