Title: CS 4700: Foundations of Artificial Intelligence
1 CS 4700: Foundations of Artificial Intelligence
- Carla P. Gomes
- gomes_at_cs.cornell.edu
- Module
- Randomization in Complete Tree Search Algorithms
- Wrap-up of Search!
2 Randomization in Local Search
- Randomized strategies are very successful in the area of local search:
- Random hill climbing
- Simulated annealing
- Genetic algorithms
- Tabu search
- GSAT and variants
- Key limitation? The inherently incomplete nature of local search methods.
3 Randomization in Tree Search
Can we also add a stochastic element to a systematic (tree search) procedure without losing completeness?
- Introduce randomness into a tree search method, e.g., by randomly breaking ties in variable and/or value selection.
- Why would we do that?
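Randomized tie-breaking can be sketched in a few lines (the `score` heuristic and the variable names below are hypothetical, just to illustrate the idea):

```python
import random

def pick_variable(candidates, score, rng):
    # Take a best-scoring variable, but break ties at random instead of,
    # say, always taking the lowest-indexed one. With a fixed heuristic
    # this is enough to make two runs of the same search differ.
    best = max(score(v) for v in candidates)
    tied = [v for v in candidates if score(v) == best]
    return rng.choice(tied)
```

Because only ties are randomized, the heuristic's guidance is preserved and the tree search remains complete.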
4 Backtrack Search
(a OR NOT b OR NOT c) AND (b OR NOT c) AND (a OR c)
5 Backtrack Search: Two Different Executions
(a OR NOT b OR NOT c) AND (b OR NOT c) AND (a OR c)
6 The Fringe of the Search Space
7 Latin Square Completion: Randomized Backtrack Search
Easy instance: 15 pre-assigned cells
(Gomes et al. 97)
8 Erratic Mean Behavior
[Figure: sample mean of the runtime vs. number of runs (on the same instance); the sample mean fluctuates wildly, up to 3500, while the median stays at 1.]
9-10 [Figure: proportion of cases solved, F(x); roughly 75% of runs finish in under 30 backtracks, while 5% take over 100,000.]
11 Run Time Distributions
- The runtime distributions of some of the instances reveal interesting properties:
- I. Erratic behavior of the mean.
- II. Distributions have heavy tails.
12 Heavy-Tailed Distributions
- Infinite variance, possibly infinite mean.
- Introduced by Pareto in the 1920s as a probabilistic curiosity.
- Mandelbrot established the use of heavy-tailed distributions to model real-world fractal phenomena.
- Examples: stock market, earthquakes, weather, ...
13 Decay of Distributions
- Standard: exponential decay, e.g. Normal.
- Heavy-tailed: power-law decay, e.g. Pareto-Levy.
14 Normal, Cauchy, and Levy
15 Tail Probabilities (Standard Normal, Cauchy, Levy)
16 Fat-Tailed Distributions
17 Fat- and Heavy-Tailed Distributions
Exponential decay for standard distributions, e.g. Normal, Lognormal, Exponential.
Heavy-tailed: power-law decay, e.g. Pareto-Levy.
18 Pareto Distribution
(α > 0 is a shape parameter)
- Density function: f(x) = α / x^(α+1) for x ≥ 1
- Distribution function: F(x) = P[X ≤ x] = 1 − 1/x^α for x ≥ 1
- Survival function (tail probability): S(x) = 1 − F(x) = P[X > x] = 1/x^α for x ≥ 1
19 Pareto Distribution
- Moments: E(X^n) = α / (α − n) if n < α; E(X^n) = ∞ if n ≥ α.
- Mean: E(X) = α / (α − 1) if α > 1; E(X) = ∞ if α ≤ 1.
- Variance: var(X) = α / ((α − 1)^2 (α − 2)) if α > 2; var(X) = ∞ if α ≤ 2.
20 How to Check for Heavy Tails?
- Power-law decay of the tail:
- A log-log plot of the tail of the distribution (the survival function 1 − F(x); e.g., for the Pareto, S(x) = 1/x^α for x ≥ 1) should be approximately linear.
- The slope gives the value of α:
- α ≤ 1: infinite mean and infinite variance;
- α ≤ 2: infinite variance.
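This check can be sketched numerically: draw Pareto(α) samples by inverse-CDF sampling and estimate the slope of the empirical survival function on a log-log scale (a sketch; the sample size and the two evaluation points x = 10, 100 are arbitrary choices of mine):

```python
import math
import random

def sample_pareto(alpha, n, seed=0):
    # Inverse-CDF sampling: if U ~ Uniform(0,1), then U**(-1/alpha)
    # has survival function S(x) = x**(-alpha) for x >= 1, i.e. Pareto(alpha).
    rng = random.Random(seed)
    return [rng.random() ** (-1.0 / alpha) for _ in range(n)]

def survival(xs, x):
    # Empirical tail probability P[X > x].
    return sum(1 for v in xs if v > x) / len(xs)

xs = sample_pareto(1.0, 200_000, seed=42)
# On a log-log plot the tail is linear; the slope estimates -alpha.
slope = (math.log(survival(xs, 100)) - math.log(survival(xs, 10))) \
        / (math.log(100) - math.log(10))
```

For a lognormal, the same slope estimate keeps steepening as x grows; that is how the log-log plot separates fat tails from true power-law (heavy) tails.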
21 Pareto(1) vs. Lognormal(1,1)
[Figure: densities f(x) of the Lognormal(1,1) and Pareto(1) distributions vs. x.]
The Pareto(1) has infinite mean and infinite variance.
22 How to Visually Check for Heavy-Tailed Behavior
The log-log plot of the tail of the distribution exhibits linear behavior.
23 Survival Function: Pareto and Lognormal
24 Example of a Heavy-Tailed Model
- Random walk:
- start at position 0
- toss a fair coin:
- with each head, take a step up (+1)
- with each tail, take a step down (−1)
X: the number of steps the random walk takes to return to position 0.
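A quick simulation of this model makes the heavy tail visible (a sketch; the cap and sample count are arbitrary): P[X > n] decays only like n^(−1/2), so half the walks return immediately while a few run for an enormous number of steps and dominate the mean.

```python
import random

def return_time(rng, cap=10**6):
    # Number of +/-1 steps a fair random walk takes to first return to 0,
    # truncated at `cap` so the simulation always terminates.
    pos = 0
    for t in range(1, cap + 1):
        pos += 1 if rng.random() < 0.5 else -1
        if pos == 0:
            return t
    return cap

rng = random.Random(1)
times = [return_time(rng) for _ in range(2000)]
# Returns can only happen after an even number of steps; the shortest
# possible return (X = 2) occurs with probability 1/2.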
25 [figure only]
26 Heavy Tails vs. Non-Heavy Tails
[Figure: log-log plot of 1 − F(x) (unsolved fraction) vs. X, the number of steps the walk takes to return to zero, compared against Normal(2,1000000) and Normal(2,1); about 0.1% of walks take more than 200,000 steps.]
27 Heavy-Tailed Behavior in the Latin Square Completion Problem
[Figure: 1 − F(x) (log, unsolved fraction) vs. number of backtracks (log).]
28 How Toby Walsh Fried His PC (Graph Coloring)
29 To Be or Not To Be Heavy-Tailed
30 Random Binary CSP Models
Model E: ⟨N, D, p⟩
N: number of variables; D: size of the domains; p: proportion of forbidden pairs (out of D^2 · N(N − 1)/2)
N from 15 to 50
(Achlioptas et al. 2000)
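A Model E instance generator can be sketched under the reading above (the exact rounding of p · D^2 · N(N − 1)/2 to an integer count is my assumption):

```python
import itertools
import random

def model_e_instance(n, d, p, rng):
    # Model E <N, D, p>: from the D^2 * N(N-1)/2 possible nogoods
    # ((variable i, value a), (variable j, value b)) with i < j,
    # forbid a uniformly chosen fraction p.
    nogoods = [((i, a), (j, b))
               for i, j in itertools.combinations(range(n), 2)
               for a in range(d)
               for b in range(d)]
    return rng.sample(nogoods, round(p * len(nogoods)))

# 5 variables, domain size 3: 10 variable pairs * 9 value pairs = 90
# candidate nogoods, of which p = 0.1 -> 9 are forbidden.
inst = model_e_instance(5, 3, 0.1, random.Random(0))
```

Sweeping p while measuring solver cost reproduces the phase-transition picture on the next slide.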
31 Typical-Case Analysis: Model E
Phase transition phenomenon: discriminating easy vs. hard instances.
[Figure: % of solvable instances and mean computational cost vs. constrainedness.]
(Hogg et al. 96)
32 Runtime Distributions
33 [figure only]
34 Explaining and Exploiting Fat and Heavy-Tailed Distributions
35 Formal Models of Heavy and Fat Tails in Combinatorial Search
How to explain short runs? Heavy/fat tails correspond to a wide range of solution times: very short and very long runtimes.
36 Logistics Planning Instances with O(log(n)) Backdoors
37 Exploiting Backdoors
38 Algorithms
- Three kinds of strategies for dealing with backdoors:
- A complete deterministic backtrack-search algorithm.
- A complete randomized backtrack-search algorithm, with provably better performance than the deterministic one.
- A heuristically guided complete randomized backtrack-search algorithm, which assumes the existence of a good heuristic for choosing variables to branch on. We believe this is close to what happens in practice.
(Williams, Gomes, Selman 03/04)
39 Deterministic: Generalized Iterative Deepening
40 Generalized Iterative Deepening
[Figure: all possible trees of depth 1.]
41 Generalized Iterative Deepening
[Figure: level 2, branching on x1 = 0 / x1 = 1; all possible trees of depth 2.]
42 Generalized Iterative Deepening
[Figure: level 2, branching on xn−1 = 0 / xn−1 = 1; all possible trees of depth 2. Then level 3, level 4, and so on.]
43 Randomized Generalized Iterative Deepening
Assumption: there exists a backdoor whose size is bounded by a function of n (call it B(n)).
Idea: repeatedly choose random subsets of variables that are slightly larger than B(n), searching these subsets for the backdoor.
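The idea can be sketched as follows. The toy sub-solver here, which succeeds only once a particular backdoor variable is set correctly, is entirely hypothetical; it stands in for the polytime sub-solver of the real algorithm:

```python
import itertools
import random

def randomized_backdoor_search(variables, b, sub_solver, rng, max_rounds=1000):
    # Repeatedly pick a random subset of b variables and enumerate all 2^b
    # assignments to it; sub_solver(partial) plays the role of the polytime
    # sub-solver, returning a full solution or None.
    for _ in range(max_rounds):
        subset = rng.sample(variables, b)
        for values in itertools.product([0, 1], repeat=b):
            solution = sub_solver(dict(zip(subset, values)))
            if solution is not None:
                return solution
    return None

variables = [f"x{i}" for i in range(8)]

def toy_sub_solver(partial):
    # Hypothetical: the instance becomes easy once x0 = 1 and all assigned
    # variables agree with the (hidden) all-ones solution.
    if partial.get("x0") == 1 and all(v == 1 for v in partial.values()):
        return {v: 1 for v in variables}
    return None

solution = randomized_backdoor_search(variables, 2, toy_sub_solver,
                                      random.Random(3))
```

Each round costs 2^b sub-solver calls, and a round succeeds whenever the random subset happens to contain the backdoor, which is what makes slightly-larger-than-B(n) subsets enough.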
44 Deterministic versus Randomized
Suppose variables have 2 possible values (e.g. SAT).
For B(n) = n/k, the algorithm runtime is c^n.
[Figure: the constant c as a function of k for the deterministic and randomized strategies.]
45 Complete Randomized Depth-First Search with Heuristic
- Assume we have the following:
- DFS, a generic randomized depth-first backtrack-search solver with a (polytime) sub-solver A.
- A heuristic H that (randomly) chooses variables to branch on, in polynomial time; H has probability 1/h of choosing a backdoor variable (h is a fixed constant).
- Call this ensemble (DFS, H, A).
46 Polytime Restart Strategy for (DFS, H, A)
- Essentially: if there is a small backdoor, then (DFS, H, A) has a restart strategy that runs in polytime.
47 Runtime Table for Algorithms
[Table: runtimes of (DFS, H, A) as a function of B(n), an upper bound on the size of a backdoor given n variables.]
When the backdoor is a constant fraction of n, there is an exponential improvement of the randomized algorithm over the deterministic one.
(Williams, Gomes, Selman 03/04)
48 How to Avoid the Long Runs in Practice?
Use restarts or parallel / interleaved runs to exploit the extreme variance in performance. Restarts provably eliminate heavy-tailed behavior.
49 Restarts
[Figure: 1 − F(x) (unsolved fraction) vs. number of backtracks (log). With no restarts, 70% of runs remain unsolved; restarting every 4 backtracks leaves only 0.001% unsolved, after about 250 backtracks (62 restarts).]
50 Example of Rapid Restart Speedup (Planning)
[Figure: number of backtracks (log) vs. cutoff (log).]
51 Super-Linear Speedups
Interleaved (1 machine): 10 × 1 = 10 seconds; 5× speedup.
52 Sketch of Proof of Elimination of Heavy Tails
- Let's truncate the search procedure after m backtracks.
- Let p be the probability of solving the problem with the truncated version.
- Run the truncated procedure and restart it repeatedly: the number of restarts until success is geometric, so the tail of the total runtime decays exponentially.
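The argument can be illustrated with a toy simulation (the Pareto(1/2) runtime model and the cutoff of 10 are assumptions of mine, not from the slides): each truncated run succeeds with a fixed probability p, so even though a single unrestarted run has infinite expected runtime, the restarted procedure has a well-behaved mean.

```python
import random

def single_run_cost(rng, alpha=0.5):
    # Heavy-tailed model of "backtracks until solved": Pareto(alpha)
    # via inverse-CDF sampling; for alpha = 0.5 the mean is infinite.
    return rng.random() ** (-1.0 / alpha)

def cost_with_restarts(rng, cutoff):
    total = 0.0
    while True:
        t = single_run_cost(rng)
        if t <= cutoff:          # this run succeeds within the cutoff
            return total + t
        total += cutoff          # give up, pay the cutoff, restart

rng = random.Random(7)
costs = [cost_with_restarts(rng, cutoff=10.0) for _ in range(5000)]
mean_cost = sum(costs) / len(costs)
```

With cutoff 10 and α = 1/2, each run succeeds with probability p = 1 − 10^(−1/2) ≈ 0.68, so the expected total cost is finite (roughly 8 in this model) and P[total cost > k · cutoff] shrinks like (1 − p)^k.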
53 Y Does Not Have Heavy Tails
54 Paramedic Crew Assignment
Paramedic crew assignment is the problem of assigning paramedic crews from different stations to cover a given region, given several resource constraints.
55 Deterministic Search
56 Restarts
57 Restart Strategies
- Restart with increasing cutoff: e.g., used by the satisfiability and constraint programming communities; the cutoff increases linearly.
- Randomized backtracking (Lynce et al. 2001): randomizes the target decision points when backtracking (several variants).
- Random jumping (Zhang 2002): the solver randomly jumps to unexplored portions of the search space; jumping decisions are based on analyzing the ratio between the space searched and the remaining search space. Solved several open problems in combinatorics.
- Geometric restarts (Walsh 99): the cutoff is increased geometrically.
- Learning restart strategies (Kautz et al. 2001 and Ruan et al. 2002): results on optimal policies for restarts under particular scenarios. Huge area for further research.
- Universal restart strategies (Luby et al. 93): seminal paper on optimal restart strategies for Las Vegas algorithms (theoretical paper).
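For reference, the Luby et al. universal sequence (1, 1, 2, 1, 1, 2, 4, ...) can be generated with the standard recursive formulation; the cutoff for run i is this value times some base number of backtracks:

```python
def luby(i):
    # i-th term (1-indexed) of the Luby-Sinclair-Zuckerman universal
    # restart sequence: 1, 1, 2, 1, 1, 2, 4, 1, 1, 2, 1, 1, 2, 4, 8, ...
    k = 1
    while (1 << k) - 1 < i:
        k += 1
    if (1 << k) - 1 == i:                  # i = 2^k - 1: end of a block
        return 1 << (k - 1)
    return luby(i - (1 << (k - 1)) + 1)    # recurse into the repeated prefix

cutoffs = [luby(i) for i in range(1, 16)]
```

Luby et al. show this schedule is within a logarithmic factor of the optimal fixed-cutoff strategy in expectation, without any knowledge of the runtime distribution.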
58 Notes on Randomizing Backtrack Search
- Can we replay a randomized run? Yes: since we use pseudo-random numbers, if we save the seed we can repeat the run with the same seed.
- Deterministic randomization (Wolfram 2002): the behavior of some very complex deterministic systems is so unpredictable that it actually appears to be random (e.g., adding learned clauses or cutting constraints between restarts, as used in the satisfiability community).
- What if we cannot randomize the code?
- Randomize the input: randomly rename the variables (Motwani and Raghavan 95).
- Walsh (99) applied this technique to study the runtime distributions of graph coloring, using a deterministic algorithm based on DSATUR implemented by Trick.
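Randomly renaming variables can be sketched for a DIMACS-style CNF encoding (the encoding choice is mine; the slides do not fix one):

```python
import random

def rename_variables(clauses, rng):
    # Literals are nonzero ints; the sign is the polarity. Apply a random
    # permutation to the variable indices: the result is the same formula
    # up to relabeling, but a deterministic solver may branch differently.
    n = max(abs(lit) for clause in clauses for lit in clause)
    perm = list(range(1, n + 1))
    rng.shuffle(perm)
    return [[(1 if lit > 0 else -1) * perm[abs(lit) - 1] for lit in clause]
            for clause in clauses]

# The formula from the earlier slides: (a | ~b | ~c) & (b | ~c) & (a | c)
original = [[1, -2, -3], [2, -3], [1, 3]]
renamed = rename_variables(original, random.Random(0))
```

Running the (deterministic) solver on many renamings of the same instance yields a runtime distribution, exactly as in the Walsh (99) study mentioned above.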
59 Portfolios of Algorithms
60 Portfolio of Algorithms
- A portfolio of algorithms is a collection of algorithms running interleaved or on different processors.
- Goal: to improve the performance of the different algorithms in terms of:
- expected runtime
- risk (variance)
- Efficient set (or Pareto set): the set of portfolios that are best in terms of expected value and risk.
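A single-processor portfolio can be sketched by interleaving generator-based algorithms round-robin (the step-counting scheme and the toy "algorithms" below are illustrative assumptions):

```python
def interleave(algorithms):
    # Run one step of each algorithm in turn; the first to produce a
    # non-None result wins. Total work is at most (number of algorithms)
    # times the winner's own step count.
    runners = [alg() for alg in algorithms]
    steps = 0
    while True:
        for runner in runners:
            steps += 1
            result = next(runner)
            if result is not None:
                return result, steps

def fake_solver(n_steps, answer):
    # Toy algorithm: busy for n_steps - 1 steps, then returns `answer`.
    def gen():
        for _ in range(n_steps - 1):
            yield None
        while True:
            yield answer
    return gen

result, steps = interleave([fake_solver(100, "slow"), fake_solver(3, "fast")])
```

Interleaving pays a constant-factor overhead (here 2×) but caps the risk: the portfolio finishes as soon as its luckiest member does, which is exactly the expected-runtime/variance trade-off the slide describes.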
61 Branch & Bound for MIP: Depth-First vs. Best-Bound
Depth-first: average ≈ 18,000; st. dev. ≈ 30,000.
62 Heavy-Tailed Behavior of Depth-First
63 Portfolio for 6 Processors
[Figure: expected runtime vs. standard deviation of runtime for portfolios ranging from 0 DF / 6 BB to 6 DF / 0 BB.]
64 Portfolio for 20 Processors
[Figure: expected runtime vs. standard deviation of runtime for portfolios ranging from 0 DF / 20 BB to 20 DF / 0 BB.]
The optimal strategy is to run depth-first on all 20 processors!
Optimal collective behavior emerges from suboptimal individual behavior.
65 Compute Clusters and Distributed Agents
- With the increasing popularity of compute clusters and distributed problem-solving / agent paradigms, portfolios of algorithms, and flexible computation in general, are rapidly expanding research areas.
66 Summary
- Stochastic search methods (complete and incomplete) have been shown to be very effective.
- Restart strategies and portfolio approaches can lead to substantial improvements in expected runtime and variance, especially in the presence of heavy-tailed phenomena.
- Randomization is therefore a tool to improve algorithmic performance and robustness.
Take-home message: you should always randomize your complete search method.
67 Exploiting Structure Using Randomization: Summary
- A very exciting new research area with success stories:
- e.g., state-of-the-art complete SAT and CP solvers use randomization and restarts.
- Very effective when combined with learning.
- More later.
68 Local Search: Summary
- A surprisingly efficient search method.
- Wide range of applications: any type of optimization / search task.
- Handles search spaces that are too large for systematic search (e.g., 10^1000 states).
- Often the best available algorithm when global information is lacking.
- Formal properties remain largely elusive.
- The research area will most likely continue to thrive.
69 Summary: Search
- Uninformed search: DFS / BFS / uniform-cost search
- time / space complexity: search spaces up to approximately 10^11 nodes.
- Informed search: use a heuristic function to guide toward the goal
- Greedy best-first search
- A* search (provably optimal)
- Search spaces up to approximately 10^25 nodes.
70 Summary: Search (contd.)
Special case: constraint satisfaction problems (CSPs). A generic framework that uses a restricted, structured format for representing states and goals (variables + constraints): backtrack search (DFS) plus propagation (forward checking / arc consistency, global constraints, variable / value ordering, randomized backtrack search).
71 Summary: Search (contd.)
Local search: greedy / hill climbing, simulated annealing, genetic algorithms / genetic programming; search spaces of 10^100 to 10^1000 states.
Adversarial search / game playing: minimax, up to 10^10 nodes, 6-7 ply in chess; alpha-beta pruning, up to 10^20 nodes, 14 ply in chess, provably optimal.
72 Search and AI
- Why such a central role?
- Basically, because many tasks in AI are intractable. Search is the only way to handle them.
- Many applications of search, e.g., in learning / reasoning / planning / NLU / vision.
- The good news: much recent progress (10^30 nodes quite feasible; sometimes up to 10^1000).
- A qualitative difference from only a few years ago!