Title: Combinatorial Problems I: Finding Solutions
1Combinatorial Problems I Finding Solutions
- Ashish Sabharwal
- Cornell University
- March 3, 2008
- 2nd Asian-Pacific School on Statistical Physics
and Interdisciplinary Applications
KITPC/ITP-CAS, Beijing, China
2Cross-fertilization of ideas for the study and design of Intelligent Systems
[Diagram of contributing disciplines: Computer Science, Engineering, Mathematics, Operations Research, Economics, Physics (phase transition), Cognitive Science]
Research part of Cornell's Intelligent Information Systems Institute (IISI); Director: Carla Gomes
3Combinatorial Problems
- Examples
- Routing: Given a partially connected network on N nodes, find the shortest path between X and Y
- Traveling Salesperson Problem (TSP): Given a partially connected network on N nodes, find a path that visits every node of the network exactly once (much harder!)
- Scheduling: Given N tasks with earliest start times, completion deadlines, and a set of M machines on which they can execute, schedule them so that they all finish by their deadlines
4Problem Instance, Algorithm
- Problem instance: specific instantiation of the problem
- E.g. three instances for the routing problem with N=8 nodes
- Objective: a single, generic algorithm for the problem that can solve any instance of that problem
- Algorithm: a sequence of steps, a recipe
5Measuring the Effectiveness of Algorithms
- Capture scaling with input size N, rather than runtime on specific instances
- The most common notion in Computer Science is worst-case complexity: What is the longest time (or number of steps) the algorithm might take on any input of size N?
- Perhaps only N steps, or 100N + 5√N: linear time, O(N)
- Maybe N^2 steps, or N^2 + 4N + 6: quadratic, O(N^2)
- Maybe N^3 + 1000 log N: cubic, O(N^3)
- Maybe 2^N, or 2^N + N^1000: exponential, O(2^N)
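To get a feel for how differently these classes scale, the following Python sketch tabulates step counts for a few input sizes. The function name `step_counts` and the chosen values of N are illustrative assumptions, not part of the original slides.

```python
def step_counts(n):
    """Return the dominant step count for each growth class at input size n."""
    return {
        "linear": n,            # O(N)
        "quadratic": n ** 2,    # O(N^2)
        "cubic": n ** 3,        # O(N^3)
        "exponential": 2 ** n,  # O(2^N)
    }

if __name__ == "__main__":
    for n in (10, 20, 40, 80):
        print(n, step_counts(n))
```

Doubling N multiplies the cubic count by 8, but it squares the exponential count.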
6Polynomial vs. Exponential Complexity
Polynomial time: tractable, can hope to solve very large problems with enough computing power. E.g. known routing / shortest path algorithms: O(N^3)
Exponential time: quickly run into scalability issues as N increases. E.g. best known algorithms for TSP
7Are some problems inherently harder than others?
A large amount of work on answering this question: computational complexity theory
8Computational Complexity Hierarchy
(Hard)
EXP-complete: games like Go, ...
EXP
PSPACE-complete: QBF, adversarial planning, chess (bounded), ...
PSPACE
#P-complete/hard: #SAT, sampling, probabilistic inference, ...
P^#P
PH
NP-complete: SAT, scheduling, graph coloring, puzzles, ...
NP
P-complete: circuit-value, ...
P
In P: sorting, shortest path, ...
(Easy)
Note: widely believed hierarchy; know P ≠ EXP for sure
9NP-Completeness
- P: class of problems for which a solution can be found in poly time. E.g. can find a shortest path in poly time
- NP: class of problems for which a solution can be verified in poly time. E.g. can't find a TSP solution in poly time (as far as we know) but, given a candidate solution (a witness), can verify the correctness of the witness in poly time
- N: non-deterministic, with the power of guessing; P: polynomial time
- NP-complete: the hardest problems within NP
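The asymmetry between finding and verifying can be illustrated with a short Python sketch. It checks a witness for the path problem described earlier (a path visiting every node exactly once). The function name `verify_path`, the node numbering 0..n-1, and the edge-list representation are assumptions made for this example.

```python
def verify_path(edges, n, path):
    """Check in polynomial time that `path` visits each of the n nodes
    exactly once and only moves along edges of the (undirected) network."""
    if sorted(path) != list(range(n)):
        return False  # some node is missing or repeated
    edge_set = {frozenset(e) for e in edges}
    return all(frozenset((u, v)) in edge_set for u, v in zip(path, path[1:]))

if __name__ == "__main__":
    network = [(0, 1), (1, 2), (2, 3), (0, 2)]
    print(verify_path(network, 4, [0, 1, 2, 3]))
    print(verify_path(network, 4, [0, 2, 1, 3]))
```

The check takes time polynomial in the size of the network, whereas no polynomial-time method is known for producing such a path in the first place.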
10NP-Completeness
- One of the biggest discoveries in Computer Science
- All NP-complete problems are equally hard! (worst-case complexity)
- An algorithm for any one NP-complete problem can be used to solve any other NP-complete problem with only a polynomial overhead!
- There are catalogues of 10,000s of such problems, e.g. Boolean satisfiability or SAT, TSP, scheduling, (bounded) planning, chip verification, 0-1 integer programming, graph coloring, logical inference, ...
- Similarly for PSPACE-complete, #P-complete, etc.
11Can one design a single algorithm that can
efficiently solve thousands of different problems
of interest?
12The Quest for Machine Reasoning
A cornerstone of Artificial Intelligence
Objective: Develop foundations and technology to enable effective, practical, large-scale automated reasoning.
Machine Reasoning (1960-90s): Computational complexity of reasoning appears to severely limit real-world applications
Current reasoning technology: Revisiting the challenge. Significant progress with new ideas / tools for dealing with complexity (scale-up), uncertainty, and multi-agent reasoning
13General Automated Reasoning
[Diagram: Problem instance -> Model Generator (Encoder) -> General Inference Engine -> Solution]
Model Generator (Encoder): domain-specific, e.g. logistics, chess, planning, scheduling, ...
General Inference Engine: generic, applicable to all domains within range of modeling language
Research objective: Better reasoning and modeling technology
Impact: Faster solutions in several domains
14Reasoning Complexity
- EXPONENTIAL COMPLEXITY: INHERENT
- A^N worst case
- N = No. of Variables/Objects, A = Object states
- TIME/SPACE
- Finer granularity: more object states
- Current implementations trade time with soundness
Search for rules to apply
For N variables: 2^N cases drive complexity!
Check contradictions
15Exponential Complexity Growth: The Challenge of Complex Domains
Note: rough estimates, for propositional reasoning
[Chart: case complexity (vertical axis) against number of variables and rules/constraints (horizontal axis)]
Domain: variables, rules (constraints)
- Car repair diagnosis: 100, 200
- Deep space mission control: 10K, 50K
- Chess (20 steps deep): 20K, 100K
- Military Logistics: 100K, 450K
- VLSI Verification: 0.5M, 1M
- War Gaming: 1M, 5M
Case complexity axis marks: 10^30, 10^47, 10^3010, 10^6020, 10^150,500, 10^301,020
Reference points on the chart: No. of atoms on the earth; Seconds until heat death of sun; Protein folding calculation (petaflop-year)
Horizontal axis marks (variables): 100, 10K, 20K, 100K, 1M
Credit: Kumar, DARPA. Cited in Computer World magazine
16Progress in Last 15 Years
- Focus: Combinatorial Search Spaces
- Specifically, the Boolean satisfiability problem, SAT
- Significant progress since the 1990s.
- How much?
- Problem size: We went from 100 variables, 200 constraints (early 90s) to 1,000,000 vars. and 5,000,000 constraints in 15 years. Search space: from 10^15 to 10^300,000. Aside: one can encode quite a bit in 1M variables.
- Tools: 50 competitive SAT solvers available
- Overview of the state of the art: Plenary talk at IJCAI-05 (Selman); Discrete App. Math. article (Kautz-Selman 06)
17How Large are the Problems?
A bounded model checking problem
18SAT Encoding
(automatically generated from problem
specification)
i.e., ((not x1) or x7) ((not x1) or x6)
etc.
x1, x2, x3, etc. are our Boolean variables (to be
set to True or False)
Should x1 be set to False??
19 10 Pages Later
i.e., (x177 or x169 or x161 or x153 or ... or x33 or x25 or x17 or x9 or x1 or (not x185))
clauses / constraints are getting more interesting
Note: x1
20 4,000 Pages Later
21Finally, 15,000 Pages Later
Search space of truth assignments
Current SAT solvers solve this instance in under
30 seconds!
22SAT Solver Progress
Solvers have continually improved over time
Source: Marques-Silva 2002
23How do SAT Solvers Keep Improving?
- From academically interesting to practically relevant.
- We now have regular SAT solver competitions.
- (Germany 89, Dimacs 93, China 96, SAT-02, SAT-03, ..., SAT-07)
- E.g. at SAT-2006 (Seattle, Aug 06):
- 35 solvers submitted, most of them open source
- 500 industrial benchmarks
- 50,000 benchmark instances available on the www
- This constant improvement in SAT solvers is the key to making, e.g., SAT-based planning very successful.
24Current Automated Reasoning Tools
- Most-successful fully automated methods: based on Boolean Satisfiability (SAT) / Propositional Reasoning
- Problems modeled as rules / constraints over Boolean variables
- SAT solver used as the inference engine
- Applications: single-agent search
- AI planning
- SATPLAN-06, fastest optimal planner, ICAPS-06 competition (Kautz and Selman 06)
- Verification: hardware and software
- Major groups at Intel, IBM, Microsoft, and universities such as CMU, Cornell, and Princeton. SAT has become the dominant technology.
- Many other domains: Test pattern generation, Scheduling, Optimal Control, Protocol Design, Routers, Multi-agent systems, E-Commerce (E-auctions and electronic trading agents), etc.
25Recall General Automated Reasoning
[Diagram: Problem instance -> Model Generator (Encoder, domain-specific) -> General Inference Engine (generic) -> Solution]
26Automated Reasoning with SAT
- A simple but useful modeling language: Boolean formulas
- Corresponding inference engine: Satisfiability or SAT algorithm (e.g. complete search, local search, message passing)
- Numerous applications: hardware and software verification, planning, scheduling, e-commerce, circuit design, open problems in algebra, ...
27Boolean Logic
- Defined over Boolean (binary) variables a, b, c, ...
- Each of these can be True (1, T) or False (0, F)
- Variables connected together with logic operators: and, or, not (denoted ¬)
- E.g. ((c ∧ ¬d) ∨ f) is True iff either c is True and d is False, or f is True
- Fact: All other Boolean logic operators can be expressed with and, or, not
- E.g. (a → b) same as (¬a or b)
- Boolean formula, e.g. F = (a or b) and ¬(a and (b or c))
- (Truth) Assignment: any setting of the variables to True or False
- Satisfying assignment: assignment where the formula evaluates to True
- E.g. F has 3 satisfying assignments: (0,1,0), (0,1,1), (1,0,0)
28Boolean Logic Example
- F = (a or b) and ¬(a and (b or c))
- Note: True often written as 1, False as 0
- There are 2^3 = 8 possible truth assignments to a, b, c
- (a=0,b=1,c=0) representing (a=False, b=True, c=False)
- (a=0,b=0,c=1)
- ...
- Exactly 3 truth assignments satisfy F
- (a=0,b=1,c=0)
- (a=0,b=1,c=1)
- (a=1,b=0,c=0)
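The count of satisfying assignments can be checked by brute force. The sketch below enumerates all 2^3 assignments of F = (a or b) and not (a and (b or c)); the helper name `satisfying_assignments` is an assumption for this example.

```python
from itertools import product

def F(a, b, c):
    """The example formula: (a or b) and not (a and (b or c))."""
    return (a or b) and not (a and (b or c))

def satisfying_assignments(formula, num_vars):
    """Try every 0/1 assignment and keep those making the formula True."""
    return [bits for bits in product((0, 1), repeat=num_vars) if formula(*bits)]

if __name__ == "__main__":
    print(satisfying_assignments(F, 3))
```

This truth-table approach is only practical for a handful of variables, since the number of rows doubles with each variable added.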
29Boolean Logic Expressivity
- All discrete single-agent search problems can be cast as a Boolean formula
- Variables a, b, c, ... often represent states of the system, events, actions, etc.
- (more on this later, using Planning as an example)
- Very general encoding language. E.g. can handle
- Numbers (k-bit binary representation)
- Floating-point numbers
- Arithmetic operators like +, x, exp(), log()
- ...
- SAT encodings (generated automatically from high level languages) routinely used in domains like planning, scheduling, verification, e-commerce, network design, ...
Recall Example
Variables (each an event, a state, or an action):
- X1 = email_received
- X2 = in_meeting
- X3 = urgent
- X4 = respond_to_email
- X5 = near_deadline
- X6 = postpone
- X7 = air_ticket_info_request
- X8 = travel_request
- X9 = info_request
Rules (each a constraint):
- 1. X1 and (not X2) and X3 → X4
- 2. X2 → not X4
- 3. X5 → X3 or X6
- 4. X7 → X8
- 5. X8 → X9
- 6. X8 → X5
- 7. X6 → not X9
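As a rough illustration of reasoning with these rules, the Python sketch below reads rule 1 as "X1 and (not X2) and X3 implies X4" and the other rules as simple implications, then enumerates all 2^9 assignments to see which variables are forced once X7 (air_ticket_info_request) is True. The helper names `implies`, `rules_hold`, `models_with`, and `forced_values` are assumptions for this example, and brute force stands in for a real SAT solver.

```python
from itertools import product

def implies(p, q):
    """Material implication p -> q."""
    return (not p) or q

def rules_hold(x1, x2, x3, x4, x5, x6, x7, x8, x9):
    """True iff all seven rules of the example are satisfied."""
    return all([
        implies(x1 and not x2 and x3, x4),  # rule 1
        implies(x2, not x4),                # rule 2
        implies(x5, x3 or x6),              # rule 3
        implies(x7, x8),                    # rule 4
        implies(x8, x9),                    # rule 5
        implies(x8, x5),                    # rule 6
        implies(x6, not x9),                # rule 7
    ])

def models_with(fixed):
    """All assignments satisfying the rules and the fixed values {index: bool}."""
    return [bits for bits in product((False, True), repeat=9)
            if all(bits[i - 1] == v for i, v in fixed.items()) and rules_hold(*bits)]

def forced_values(models):
    """Variables (1-based index) that take the same value in every model."""
    return {i + 1: models[0][i] for i in range(9)
            if len({m[i] for m in models}) == 1}

if __name__ == "__main__":
    print(forced_values(models_with({7: True})))
```

Under this reading, setting X7 forces X8, X9 and X5 to True, X6 to False (rule 7), and hence X3 to True (rule 3), while X1, X2 and X4 remain undetermined.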
30Boolean Logic Standard Representations
- Each problem constraint typically specified as (a set of) clauses (only or, not)
- E.g. (a or b), (c or d or ¬f), (¬a or c or d), ...
- Formula in conjunctive normal form, or CNF: a conjunction of clauses
- E.g. F = (a or b) and ¬(a and (b or c)) changes to
- F_CNF = (a or b) and (¬a or ¬b) and (b or ¬c)
- Alternative, useful for QBF: specify each constraint as a term (only and, not)
- E.g. (a and ¬d), (b and ¬a and f), (¬b and d and e), ...
- Formula in disjunctive normal form, or DNF: a disjunction of terms
- E.g. F_DNF = (¬a and b) or (a and ¬b and ¬c)
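That the CNF and DNF forms represent the same function as F can be confirmed with a truth table. The following Python sketch does so; the function names `F_cnf`, `F_dnf`, and `equivalent` are assumptions chosen for this example.

```python
from itertools import product

def F(a, b, c):
    return (a or b) and not (a and (b or c))

def F_cnf(a, b, c):
    # (a or b) and (not a or not b) and (b or not c)
    return (a or b) and (not a or not b) and (b or not c)

def F_dnf(a, b, c):
    # (not a and b) or (a and not b and not c)
    return (not a and b) or (a and not b and not c)

def equivalent(f, g, num_vars):
    """True iff f and g agree on every truth assignment."""
    return all(bool(f(*v)) == bool(g(*v))
               for v in product((False, True), repeat=num_vars))

if __name__ == "__main__":
    print(equivalent(F, F_cnf, 3), equivalent(F, F_dnf, 3))
```

Each term of the DNF directly describes a group of satisfying assignments, while each clause of the CNF rules out a group of falsifying ones.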
31Boolean Satisfiability Testing
- The Boolean Satisfiability Problem, or SAT
- Given a Boolean formula F,
- find a satisfying assignment for F
- or prove that no such assignment exists.
- A wide range of applications
- Relatively easy to test for small formulas (e.g. with a Truth Table)
- However, very quickly becomes hard to solve
- Search space grows exponentially with formula size (more on this next)
- SAT technology has been very successful in taming this exponential blow up!
32SAT Search Space
All vars free
- SAT Problem: Find a path to a True leaf node.
- For N Boolean variables, the raw search space is of size 2^N
- Grows very quickly with N
- Brute-force exhaustive search unrealistic without efficient heuristics, etc.
33SAT Solution
All vars free
- A solution to a SAT problem can be seen as a path in the search tree that leads to the formula evaluating to True at the leaf.
- Goal: Find such a path efficiently out of the exponentially many paths.
- Note: this is a 4 variable example. Imagine a tree for 1,000,000 variables!
34k-CNF, 3-CNF
- k-CNF: all clauses have k literals
- 1-CNF SAT: trivial
- 2-CNF SAT: solvable in O(N^2) time (N = num. of variables)
- 3-CNF SAT: NP-complete
- 4-CNF SAT: NP-complete
Note: Any Boolean formula can be converted into CNF, with or without extra variables (without extra variables: size increase)
35Worst-Case Complexity
- SAT is an NP-complete problem
- Worst-case believed to be exponential (roughly 2^N for N variables)
- 10,000 problems in CS are NP-complete (e.g. planning, scheduling, protein folding, reasoning)
- P vs. NP --- $1M Clay Prize
- However, real-world instances are usually not pathological and can often be solved very quickly with the latest technology!
- Typical-case complexity provides a more detailed understanding and a more positive picture.
36Exponential Complexity Growth
Planning (single-agent): find the right sequence of actions
HARD: 10 actions, 10! ≈ 3.6 x 10^6 possible plans
Contingency planning (multi-agent): actions may or may not produce the desired effect!
REALLY HARD: 10 x 9^2 x 8^4 x 7^8 x ... x 2^256 ≈ 10^224 possible contingency plans!
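Both counts can be reproduced with exact integer arithmetic. The sketch below assumes the product continues the visible pattern, i.e. the factor k is raised to the power 2^(10-k) for k = 10 down to 2; the variable names are illustrative.

```python
import math

# Single-agent planning: orderings of 10 actions.
sequential_plans = math.factorial(10)

# Contingency planning: 10 x 9^2 x 8^4 x 7^8 x ... x 2^256,
# i.e. factor k appears with exponent 2^(10 - k).
contingency_plans = 1
for k in range(10, 1, -1):
    contingency_plans *= k ** (2 ** (10 - k))

if __name__ == "__main__":
    print(sequential_plans)
    print(f"about 10^{len(str(contingency_plans)) - 1}")
```

Python integers have arbitrary precision, so the second number is computed exactly rather than approximated through floating point.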
37Typical-Case Complexity
A key hardness parameter for k-SAT: the ratio of clauses to variables
Problems that are not critically constrained tend to be much easier in practice than the relatively few critically constrained ones
Mitchell, Selman, and Levesque 92; Kirkpatrick and Selman, Science 94
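The effect of the clause-to-variable ratio can be explored on small random 3-SAT formulas. The sketch below is a toy experiment under stated assumptions: the generator, the brute-force satisfiability check, and the names `random_3sat`, `is_satisfiable`, and `fraction_satisfiable` are all illustrative, and the tiny formula sizes only hint at the behavior seen on large instances.

```python
import random
from itertools import product

def random_3sat(num_vars, ratio, rng):
    """Random 3-CNF with round(ratio * num_vars) clauses. Each clause uses
    3 distinct variables, each negated with probability 1/2.
    A literal +v stands for variable v, -v for its negation."""
    clauses = []
    for _ in range(round(ratio * num_vars)):
        variables = rng.sample(range(1, num_vars + 1), 3)
        clauses.append(tuple(v if rng.random() < 0.5 else -v for v in variables))
    return clauses

def is_satisfiable(clauses, num_vars):
    """Brute force over all 2^N assignments; only sensible for small N."""
    for bits in product((False, True), repeat=num_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return True
    return False

def fraction_satisfiable(num_vars, ratio, trials, seed=0):
    """Fraction of random formulas at this ratio that are satisfiable."""
    rng = random.Random(seed)
    hits = sum(is_satisfiable(random_3sat(num_vars, ratio, rng), num_vars)
               for _ in range(trials))
    return hits / trials

if __name__ == "__main__":
    for ratio in (1.0, 3.0, 4.3, 6.0, 10.0):
        print(ratio, fraction_satisfiable(10, ratio, trials=20))
```

At low ratios almost every formula is satisfiable and at high ratios almost none is; the critically constrained instances sit in between.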
38Typical-Case Complexity
SAT solvers continually getting close to tackling
problems in the hardest region!
SP (survey propagation) now handles 1,000,000 variables very near the phase transition region
39Tractable Sub-Structure Can Dominate and
Drastically Reduce Solution Cost!
2+p-SAT model: mix 2-SAT (tractable) and 3-SAT (intractable) clauses
[Plot: median runtime vs. number of variables]
> 40% 3-SAT: exponential scaling
≤ 40% 3-SAT: linear scaling!
(Monasson, Selman et al., Nature 99; Achlioptas 00)
40How are other NP-complete problems translated into SAT instances?
SAT encoding
41SAT Encoding Example Planning Domain
- Planning Problem → Propositional CNF formula, by axiom schemas
- Logistics planning: think of a number of trucks and planes that need to transport a bunch of packages from their origin to their destination
- Discrete time, modeled by integers
- state predicates: indexed by time at which they hold
- E.g. at_location(x,loc,i), free(x,i+1), route(cityA,cityB,i)
- action predicates: indexed by time at which action begins
- E.g. fly(cityA,cityB,i), pickup(x,loc,i), drive_truck(loc1,loc2,i)
- each action takes 1 time step
- many actions may occur at the same step
42Encoding Rules
- Actions imply preconditions and effects
- fly(x,y,i) → at(x,i) and route(x,y,i) and at(y,i+1)
- Conflicting actions cannot occur at same time (A deletes a precondition of B)
- fly(x,y,i) and y≠z → not fly(x,z,i)
- If something changes, an action must have caused it (Explanatory Frame Axioms)
- at(x,i) and not at(x,i+1) → ∃y . route(x,y) and fly(x,y,i)
- Initial and final states hold
- at(NY,0) and ... and at(LA,9) and ...
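To show how an axiom schema turns into propositional clauses, the sketch below instantiates the first rule, fly(x,y,i) → at(x,i) and route(x,y,i) and at(y,i+1), for a small set of cities and time steps. The clause representation (lists of signed atoms), the function name `fly_axioms`, and the city list are assumptions for this example, not the encoding used by any particular planner.

```python
from itertools import permutations

def fly_axioms(cities, horizon):
    """Instantiate fly(x,y,i) -> at(x,i) and route(x,y,i) and at(y,i+1).
    A literal is (sign, atom); a clause is a list of literals read as a
    disjunction. Each implication fly -> c becomes the clause (not fly or c)."""
    clauses = []
    for i in range(horizon):
        for x, y in permutations(cities, 2):
            fly = ("fly", x, y, i)
            consequences = (("at", x, i), ("route", x, y, i), ("at", y, i + 1))
            for atom in consequences:
                clauses.append([(False, fly), (True, atom)])
    return clauses

if __name__ == "__main__":
    for clause in fly_axioms(["NY", "LA"], 1):
        print(clause)
```

Each ground atom such as ("at", "NY", 0) would then be mapped to one Boolean variable before the clauses are handed to a SAT solver.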
43Using SAT Solvers for Planning
Modeling and Solving a Planning Problem
[Diagram: Problem description in high level language + axiom schemas + length -> instantiate -> instantiated propositional clauses -> SAT engine(s) -> satisfying model -> interpret (using the mapping) -> plan]
Labels in the diagram: (manual), (fully automatic)
44Planning Benchmark Complexity
- Logistics domain: a complex, highly-parallel transportation domain
- E.g. logistics.d problem:
- 2,165 possible actions per time slot
- optimal solution contains 74 distinct actions over 14 time slots
- (out of 5 x 10^46 possible sequential plans of length 14)
- Satplan (Selman et al.) approach is currently the fastest optimal planning approach. Winner of the ICAPS-05 and ICAPS-06 international planning competitions.
45Solution Approaches to SAT
46Solving SAT Systematic Search
- One possibility: enumerate all truth assignments one-by-one, test whether any satisfies F
- Note: testing is easy!
- But too many truth assignments (e.g. for N=1000 variables, have 2^1000 ≈ 10^300 truth assignments)
- 00000000
- 00000001
- 00000010
- 00000011
- ...
- 11111111
(2^N assignments in total)
47Solving SAT Systematic Search
- Smarter approach: the DPLL procedure (1960s)
- (Davis, Putnam, Logemann, Loveland)
- Assign values to variables one at a time (partial assignments)
- Simplify F
- If contradiction (i.e. some clause becomes False), backtrack, flip last unflipped variable's value, and continue search
- Extended with many new techniques -- 100s of research papers, yearly conference on SAT, e.g., extremely efficient data-structures (representation), randomization, restarts, learning reasons of failure
- Provides proof of unsatisfiability if F is unsat. (complete method)
- Forms the basis of dozens of very effective SAT solvers! e.g. minisat, zchaff, relsat, rsat, ... (open source, available on the www)
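The core assign / simplify / backtrack loop can be sketched in a few lines of Python. This is a bare-bones illustration and omits unit propagation, clause learning, restarts, and the efficient data structures that make real solvers fast. Literals are assumed to be nonzero integers (+v for a variable, -v for its negation), and the names `simplify` and `dpll` are chosen for this example.

```python
def simplify(clauses, literal):
    """Assume `literal` is True: drop satisfied clauses and remove the
    now-false opposite literal from the remaining ones."""
    result = []
    for clause in clauses:
        if literal in clause:
            continue
        result.append([l for l in clause if l != -literal])
    return result

def dpll(clauses, assignment=None):
    """Return a (possibly partial) satisfying assignment {var: bool},
    or None if the formula is unsatisfiable."""
    assignment = dict(assignment or {})
    if any(len(c) == 0 for c in clauses):
        return None              # contradiction: some clause became False
    if not clauses:
        return assignment        # every clause is satisfied
    var = abs(clauses[0][0])     # naive choice of branching variable
    for value in (True, False):  # try one value, then flip on failure
        literal = var if value else -var
        result = dpll(simplify(clauses, literal), {**assignment, var: value})
        if result is not None:
            return result
    return None                  # both values failed: backtrack

if __name__ == "__main__":
    # (a or b) and (not a or not b) and (b or not c) with a=1, b=2, c=3
    print(dpll([[1, 2], [-1, -2], [2, -3]]))
```

Because both values of every branching variable are explored before giving up, a None result amounts to a proof that no satisfying assignment exists.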
48Solving SAT Local Search
- Search space: all 2^N truth assignments for F
- Goal: starting from an initial truth assignment A0, compute assignments A1, A2, ..., As such that As is a satisfying assignment for F
- Ai+1 is computed by a local transformation to Ai, e.g. by flipping one bit per step:
- A1 = 000110111
- A2 = 001110111
- A3 = 001110101
- A4 = 101110101
- ...
- As = 111010000 (solution found!)
- No proof of unsatisfiability if F is unsat. (incomplete method)
- Several SAT solvers based on this approach, e.g. Walksat
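A simplified local search in the spirit of Walksat can be sketched as follows. It is a variant written for illustration, not the actual Walksat implementation: the greedy step minimizes the total number of unsatisfied clauses after a flip, and the parameter names `max_flips`, `noise`, and `seed` are assumptions. Literals are nonzero integers as in DIMACS-style encodings.

```python
import random

def walksat(clauses, num_vars, max_flips=10000, noise=0.5, seed=0):
    """Return a satisfying assignment {var: bool}, or None if none was
    found within max_flips (which proves nothing about unsatisfiability)."""
    rng = random.Random(seed)
    assign = {v: rng.random() < 0.5 for v in range(1, num_vars + 1)}

    def satisfied(clause):
        return any(assign[abs(l)] == (l > 0) for l in clause)

    def unsat_count_after_flip(v):
        assign[v] = not assign[v]
        count = sum(not satisfied(c) for c in clauses)
        assign[v] = not assign[v]  # undo the trial flip
        return count

    for _ in range(max_flips):
        unsatisfied = [c for c in clauses if not satisfied(c)]
        if not unsatisfied:
            return assign
        clause = rng.choice(unsatisfied)
        if rng.random() < noise:
            var = abs(rng.choice(clause))  # random walk step
        else:
            var = min((abs(l) for l in clause), key=unsat_count_after_flip)
        assign[var] = not assign[var]      # local move: flip one variable
    return None

if __name__ == "__main__":
    print(walksat([[1, 2], [-1, -2], [2, -3]], 3))
```

The random walk steps help the search escape local minima where every greedy flip would make things worse.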
49Solving SAT Decimation
- Search space: all 2^N truth assignments for F
- Goal: attempt to construct a solution in one-shot by very carefully setting one variable at a time
- Survey Inspired Decimation:
- Estimate certain marginal probabilities of each variable being True, False, or undecided in each solution cluster using Survey Propagation
- Fix the variable that is the most biased to its preferred value
- Simplify F and repeat
- A method rarely used by computer scientists
- But has had tremendous success in the physics community on random k-SAT: can easily solve random instances with 1M variables!
- No searching for solution
- No proof of unsatisfiability (incomplete method)
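The decimation loop itself (estimate marginals, fix the most biased variable, simplify, repeat) can be illustrated on tiny formulas. Note the substitution: the sketch below does not implement Survey Propagation; it computes exact marginals over all solutions by brute force, which is feasible only for a handful of variables. The names `solutions` and `decimate` are assumptions for this example.

```python
from itertools import product

def solutions(clauses, variables):
    """All satisfying assignments over `variables` (brute force)."""
    result = []
    for bits in product((False, True), repeat=len(variables)):
        a = dict(zip(variables, bits))
        if all(any(a[abs(l)] == (l > 0) for l in c) for c in clauses):
            result.append(a)
    return result

def decimate(clauses, num_vars):
    """Fix one variable at a time, always the most biased one.
    Exact marginals stand in for Survey Propagation estimates."""
    fixed = {}
    free = list(range(1, num_vars + 1))
    while free:
        sols = solutions(clauses, free)
        if not sols:
            return None
        # marginal[v] = fraction of remaining solutions in which v is True
        marginal = {v: sum(s[v] for s in sols) / len(sols) for v in free}
        var = max(free, key=lambda v: abs(marginal[v] - 0.5))
        value = marginal[var] >= 0.5
        fixed[var] = value
        literal = var if value else -var
        # simplify: drop satisfied clauses, remove the falsified literal
        clauses = [[l for l in c if l != -literal]
                   for c in clauses if literal not in c]
        free.remove(var)
    return fixed

if __name__ == "__main__":
    print(decimate([[1, 2], [-1, -2], [2, -3]], 3))
```

With exact marginals the preferred value always keeps at least one solution alive; with estimated marginals, as in Survey Inspired Decimation, a wrong fix cannot be undone, which is why the method is incomplete.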
50The Next Two Lectures
- Problems beyond SAT / searching for a single solution
- #P-complete: count the number of solutions of a SAT instance
- #P-hard: sample a solution uniformly at random for a SAT instance
- PSPACE-complete: quantified Boolean formula (QBF)
51Thank you for attending!
Slides: http://www.cs.cornell.edu/sabhar/tutorials/kitpc08-combinatorial-problems-I.ppt
Ashish Sabharwal: http://www.cs.cornell.edu/sabhar
Bart Selman: http://www.cs.cornell.edu/selman