Title: CS 4700: Foundations of Artificial Intelligence
1 CS 4700: Foundations of Artificial Intelligence
- Carla P. Gomes
- gomes_at_cs.cornell.edu
- Module
- Randomization in Complete Tree Search Algorithms
- Wrap-up of Search!
2 Randomization in Local Search
- Randomized strategies are very successful in the area of local search:
- Random hill climbing
- Simulated annealing
- Genetic algorithms
- Tabu search
- GSAT and variants
- Key limitation? The inherently incomplete nature of local search methods.
3 Randomization in Tree Search
Can we also add a stochastic element to a systematic (tree search) procedure without losing completeness?
- Introduce randomness into a tree search method, e.g., by randomly breaking ties in variable and/or value selection.
- Why would we do that?
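Randomized tie-breaking can be sketched in a few lines (the `score` heuristic and the variable names below are hypothetical, just to illustrate the idea):

```python
import random

def pick_variable(candidates, score, rng):
    # Take a best-scoring variable, but break ties at random instead of,
    # say, always taking the lowest-indexed one. With a fixed heuristic
    # this is enough to make two runs of the same search differ.
    best = max(score(v) for v in candidates)
    tied = [v for v in candidates if score(v) == best]
    return rng.choice(tied)
```

Because only ties are randomized, the heuristic's guidance is preserved and the tree search remains complete.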
4 Backtrack Search
(a OR NOT b OR NOT c) AND (b OR NOT c) AND (a OR c)
5 Backtrack Search: Two Different Executions
(a OR NOT b OR NOT c) AND (b OR NOT c) AND (a OR c)
6 The Fringe of the Search Space
7 Latin Square Completion: Randomized Backtrack Search
Easy instance: 15 pre-assigned cells
(Gomes et al. 97)
8 Erratic Mean Behavior
[Figure: sample mean of the runtime vs. number of runs (on the same instance); the sample mean fluctuates wildly, up to 3500, while the median stays at 1.]
9-10 [Figure: proportion of cases solved, F(x); roughly 75% of runs finish in under 30 backtracks, while 5% take over 100,000.]
11 Run Time Distributions
- The runtime distributions of some of the instances reveal interesting properties:
- I. Erratic behavior of the mean.
- II. Distributions have heavy tails.
12 Heavy-Tailed Distributions
- Infinite variance, possibly infinite mean.
- Introduced by Pareto in the 1920s as a probabilistic curiosity.
- Mandelbrot established the use of heavy-tailed distributions to model real-world fractal phenomena.
- Examples: stock market, earthquakes, weather, ...
13 Decay of Distributions
- Standard: exponential decay, e.g. Normal.
- Heavy-tailed: power-law decay, e.g. Pareto-Levy.
14 Normal, Cauchy, and Levy
15 Tail Probabilities (Standard Normal, Cauchy, Levy)
16 Fat-Tailed Distributions
17 Fat- and Heavy-Tailed Distributions
Exponential decay for standard distributions, e.g. Normal, Lognormal, Exponential.
Heavy-tailed: power-law decay, e.g. Pareto-Levy.
18 Pareto Distribution
(α > 0 is a shape parameter)
- Density function: f(x) = α / x^(α+1) for x ≥ 1
- Distribution function: F(x) = P[X ≤ x] = 1 − 1/x^α for x ≥ 1
- Survival function (tail probability): S(x) = 1 − F(x) = P[X > x] = 1/x^α for x ≥ 1
19 Pareto Distribution
- Moments: E(X^n) = α / (α − n) if n < α; E(X^n) = ∞ if n ≥ α.
- Mean: E(X) = α / (α − 1) if α > 1; E(X) = ∞ if α ≤ 1.
- Variance: var(X) = α / ((α − 1)^2 (α − 2)) if α > 2; var(X) = ∞ if α ≤ 2.
20 How to Check for Heavy Tails?
- Power-law decay of the tail:
- A log-log plot of the tail of the distribution (the survival function 1 − F(x); e.g., for the Pareto, S(x) = 1/x^α for x ≥ 1) should be approximately linear.
- The slope gives the value of α:
- α ≤ 1: infinite mean and infinite variance;
- α ≤ 2: infinite variance.
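This check can be sketched numerically: draw Pareto(α) samples by inverse-CDF sampling and estimate the slope of the empirical survival function on a log-log scale (a sketch; the sample size and the two evaluation points x = 10, 100 are arbitrary choices of mine):

```python
import math
import random

def sample_pareto(alpha, n, seed=0):
    # Inverse-CDF sampling: if U ~ Uniform(0,1), then U**(-1/alpha)
    # has survival function S(x) = x**(-alpha) for x >= 1, i.e. Pareto(alpha).
    rng = random.Random(seed)
    return [rng.random() ** (-1.0 / alpha) for _ in range(n)]

def survival(xs, x):
    # Empirical tail probability P[X > x].
    return sum(1 for v in xs if v > x) / len(xs)

xs = sample_pareto(1.0, 200_000, seed=42)
# On a log-log plot the tail is linear; the slope estimates -alpha.
slope = (math.log(survival(xs, 100)) - math.log(survival(xs, 10))) \
        / (math.log(100) - math.log(10))
```

For a lognormal, the same slope estimate keeps steepening as x grows; that is how the log-log plot separates fat tails from true power-law (heavy) tails.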
21 Pareto(1) vs. Lognormal(1,1)
[Figure: densities f(x) of the Lognormal(1,1) and Pareto(1) distributions vs. x.]
The Pareto(1) has infinite mean and infinite variance.
22 How to Visually Check for Heavy-Tailed Behavior
The log-log plot of the tail of the distribution exhibits linear behavior.
23 Survival Function: Pareto and Lognormal
24 Example of a Heavy-Tailed Model
- Random walk:
- start at position 0
- toss a fair coin:
- with each head, take a step up (+1)
- with each tail, take a step down (−1)
X: the number of steps the random walk takes to return to position 0.
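A quick simulation of this model makes the heavy tail visible (a sketch; the cap and sample count are arbitrary): P[X > n] decays only like n^(−1/2), so half the walks return immediately while a few run for an enormous number of steps and dominate the mean.

```python
import random

def return_time(rng, cap=10**6):
    # Number of +/-1 steps a fair random walk takes to first return to 0,
    # truncated at `cap` so the simulation always terminates.
    pos = 0
    for t in range(1, cap + 1):
        pos += 1 if rng.random() < 0.5 else -1
        if pos == 0:
            return t
    return cap

rng = random.Random(1)
times = [return_time(rng) for _ in range(2000)]
# Returns can only happen after an even number of steps; the shortest
# possible return (X = 2) occurs with probability 1/2.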
25 [figure only]
26 Heavy Tails vs. Non-Heavy Tails
[Figure: log-log plot of 1 − F(x) (unsolved fraction) vs. X, the number of steps the walk takes to return to zero, compared against Normal(2,1000000) and Normal(2,1); about 0.1% of walks take more than 200,000 steps.]
27 Heavy-Tailed Behavior in the Latin Square Completion Problem
[Figure: 1 − F(x) (log, unsolved fraction) vs. number of backtracks (log).]
28 How Toby Walsh Fried His PC (Graph Coloring)
29 To Be or Not To Be Heavy-Tailed
30 Random Binary CSP Models
Model E: ⟨N, D, p⟩
N: number of variables; D: size of the domains; p: proportion of forbidden pairs (out of D^2 · N(N − 1)/2)
N from 15 to 50
(Achlioptas et al. 2000)
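A Model E instance generator can be sketched under the reading above (the exact rounding of p · D^2 · N(N − 1)/2 to an integer count is my assumption):

```python
import itertools
import random

def model_e_instance(n, d, p, rng):
    # Model E <N, D, p>: from the D^2 * N(N-1)/2 possible nogoods
    # ((variable i, value a), (variable j, value b)) with i < j,
    # forbid a uniformly chosen fraction p.
    nogoods = [((i, a), (j, b))
               for i, j in itertools.combinations(range(n), 2)
               for a in range(d)
               for b in range(d)]
    return rng.sample(nogoods, round(p * len(nogoods)))

# 5 variables, domain size 3: 10 variable pairs * 9 value pairs = 90
# candidate nogoods, of which p = 0.1 -> 9 are forbidden.
inst = model_e_instance(5, 3, 0.1, random.Random(0))
```

Sweeping p while measuring solver cost reproduces the phase-transition picture on the next slide.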
31 Typical-Case Analysis: Model E
Phase transition phenomenon: discriminating easy vs. hard instances.
[Figure: % of solvable instances and mean computational cost vs. constrainedness.]
(Hogg et al. 96)
32 Runtime Distributions
33 [figure only]
34 Explaining and Exploiting Fat and Heavy-Tailed Distributions
35 Formal Models of Heavy and Fat Tails in Combinatorial Search
How to explain short runs? Heavy/fat tails correspond to a wide range of solution times: very short and very long runtimes.
36 Logistics Planning Instances with O(log(n)) Backdoors
37 Exploiting Backdoors
38 Algorithms
- Three kinds of strategies for dealing with backdoors:
- A complete deterministic backtrack-search algorithm.
- A complete randomized backtrack-search algorithm, with provably better performance than the deterministic one.
- A heuristically guided complete randomized backtrack-search algorithm, which assumes the existence of a good heuristic for choosing variables to branch on. We believe this is close to what happens in practice.
(Williams, Gomes, Selman 03/04)
39 Deterministic: Generalized Iterative Deepening
40 Generalized Iterative Deepening
[Figure: all possible trees of depth 1.]
41 Generalized Iterative Deepening
[Figure: level 2, branching on x1 = 0 / x1 = 1; all possible trees of depth 2.]
42 Generalized Iterative Deepening
[Figure: level 2, branching on xn−1 = 0 / xn−1 = 1; all possible trees of depth 2. Then level 3, level 4, and so on.]
43 Randomized Generalized Iterative Deepening
Assumption: there exists a backdoor whose size is bounded by a function of n (call it B(n)).
Idea: repeatedly choose random subsets of variables that are slightly larger than B(n), searching these subsets for the backdoor.
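The idea can be sketched as follows. The toy sub-solver here, which succeeds only once a particular backdoor variable is set correctly, is entirely hypothetical; it stands in for the polytime sub-solver of the real algorithm:

```python
import itertools
import random

def randomized_backdoor_search(variables, b, sub_solver, rng, max_rounds=1000):
    # Repeatedly pick a random subset of b variables and enumerate all 2^b
    # assignments to it; sub_solver(partial) plays the role of the polytime
    # sub-solver, returning a full solution or None.
    for _ in range(max_rounds):
        subset = rng.sample(variables, b)
        for values in itertools.product([0, 1], repeat=b):
            solution = sub_solver(dict(zip(subset, values)))
            if solution is not None:
                return solution
    return None

variables = [f"x{i}" for i in range(8)]

def toy_sub_solver(partial):
    # Hypothetical: the instance becomes easy once x0 = 1 and all assigned
    # variables agree with the (hidden) all-ones solution.
    if partial.get("x0") == 1 and all(v == 1 for v in partial.values()):
        return {v: 1 for v in variables}
    return None

solution = randomized_backdoor_search(variables, 2, toy_sub_solver,
                                      random.Random(3))
```

Each round costs 2^b sub-solver calls, and a round succeeds whenever the random subset happens to contain the backdoor, which is what makes slightly-larger-than-B(n) subsets enough.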
44 Deterministic versus Randomized
Suppose variables have 2 possible values (e.g. SAT).
For B(n) = n/k, the algorithm runtime is c^n.
[Figure: the constant c as a function of k for the deterministic and randomized strategies.]
45 Complete Randomized Depth-First Search with Heuristic
- Assume we have the following:
- DFS, a generic randomized depth-first backtrack-search solver with a (polytime) sub-solver A.
- A heuristic H that (randomly) chooses variables to branch on, in polynomial time; H has probability 1/h of choosing a backdoor variable (h is a fixed constant).
- Call this ensemble (DFS, H, A).
46 Polytime Restart Strategy for (DFS, H, A)
- Essentially: if there is a small backdoor, then (DFS, H, A) has a restart strategy that runs in polytime.
47 Runtime Table for Algorithms
[Table: runtimes of (DFS, H, A) as a function of B(n), an upper bound on the size of a backdoor given n variables.]
When the backdoor is a constant fraction of n, there is an exponential improvement of the randomized algorithm over the deterministic one.
(Williams, Gomes, Selman 03/04)
48 How to Avoid the Long Runs in Practice?
Use restarts or parallel / interleaved runs to exploit the extreme variance in performance. Restarts provably eliminate heavy-tailed behavior.
49 Restarts
[Figure: 1 − F(x) (unsolved fraction) vs. number of backtracks (log). With no restarts, 70% of runs remain unsolved; restarting every 4 backtracks leaves only 0.001% unsolved, after about 250 backtracks (62 restarts).]
50 Example of Rapid Restart Speedup (Planning)
[Figure: number of backtracks (log) vs. cutoff (log).]
51 Super-Linear Speedups
Interleaved (1 machine): 10 × 1 = 10 seconds; 5× speedup.
52 Sketch of Proof of Elimination of Heavy Tails
- Let's truncate the search procedure after m backtracks.
- Let p be the probability of solving the problem with the truncated version.
- Run the truncated procedure and restart it repeatedly: the number of restarts until success is geometric, so the tail of the total runtime decays exponentially.
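The argument can be illustrated with a toy simulation (the Pareto(1/2) runtime model and the cutoff of 10 are assumptions of mine, not from the slides): each truncated run succeeds with a fixed probability p, so even though a single unrestarted run has infinite expected runtime, the restarted procedure has a well-behaved mean.

```python
import random

def single_run_cost(rng, alpha=0.5):
    # Heavy-tailed model of "backtracks until solved": Pareto(alpha)
    # via inverse-CDF sampling; for alpha = 0.5 the mean is infinite.
    return rng.random() ** (-1.0 / alpha)

def cost_with_restarts(rng, cutoff):
    total = 0.0
    while True:
        t = single_run_cost(rng)
        if t <= cutoff:          # this run succeeds within the cutoff
            return total + t
        total += cutoff          # give up, pay the cutoff, restart

rng = random.Random(7)
costs = [cost_with_restarts(rng, cutoff=10.0) for _ in range(5000)]
mean_cost = sum(costs) / len(costs)
```

With cutoff 10 and α = 1/2, each run succeeds with probability p = 1 − 10^(−1/2) ≈ 0.68, so the expected total cost is finite (roughly 8 in this model) and P[total cost > k · cutoff] shrinks like (1 − p)^k.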
53 Y Does Not Have Heavy Tails
54 Paramedic Crew Assignment
Paramedic crew assignment is the problem of assigning paramedic crews from different stations to cover a given region, given several resource constraints.
55 Deterministic Search
56 Restarts
57 Restart Strategies
- Restart with increasing cutoff: e.g., used by the satisfiability and constraint programming communities; the cutoff increases linearly.
- Randomized backtracking (Lynce et al. 2001): randomizes the target decision points when backtracking (several variants).
- Random jumping (Zhang 2002): the solver randomly jumps to unexplored portions of the search space; jumping decisions are based on analyzing the ratio between the space searched and the remaining search space. Solved several open problems in combinatorics.
- Geometric restarts (Walsh 99): the cutoff is increased geometrically.
- Learning restart strategies (Kautz et al. 2001 and Ruan et al. 2002): results on optimal policies for restarts under particular scenarios. Huge area for further research.
- Universal restart strategies (Luby et al. 93): seminal paper on optimal restart strategies for Las Vegas algorithms (theoretical paper).
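For reference, the Luby et al. universal sequence (1, 1, 2, 1, 1, 2, 4, ...) can be generated with the standard recursive formulation; the cutoff for run i is this value times some base number of backtracks:

```python
def luby(i):
    # i-th term (1-indexed) of the Luby-Sinclair-Zuckerman universal
    # restart sequence: 1, 1, 2, 1, 1, 2, 4, 1, 1, 2, 1, 1, 2, 4, 8, ...
    k = 1
    while (1 << k) - 1 < i:
        k += 1
    if (1 << k) - 1 == i:                  # i = 2^k - 1: end of a block
        return 1 << (k - 1)
    return luby(i - (1 << (k - 1)) + 1)    # recurse into the repeated prefix

cutoffs = [luby(i) for i in range(1, 16)]
```

Luby et al. show this schedule is within a logarithmic factor of the optimal fixed-cutoff strategy in expectation, without any knowledge of the runtime distribution.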
58 Notes on Randomizing Backtrack Search
- Can we replay a randomized run? Yes: since we use pseudo-random numbers, if we save the seed we can repeat the run with the same seed.
- Deterministic randomization (Wolfram 2002): the behavior of some very complex deterministic systems is so unpredictable that it actually appears to be random (e.g., adding learned clauses or cutting constraints between restarts, as used in the satisfiability community).
- What if we cannot randomize the code?
- Randomize the input: randomly rename the variables (Motwani and Raghavan 95).
- Walsh (99) applied this technique to study the runtime distributions of graph coloring, using a deterministic algorithm based on DSATUR implemented by Trick.
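Randomly renaming variables can be sketched for a DIMACS-style CNF encoding (the encoding choice is mine; the slides do not fix one):

```python
import random

def rename_variables(clauses, rng):
    # Literals are nonzero ints; the sign is the polarity. Apply a random
    # permutation to the variable indices: the result is the same formula
    # up to relabeling, but a deterministic solver may branch differently.
    n = max(abs(lit) for clause in clauses for lit in clause)
    perm = list(range(1, n + 1))
    rng.shuffle(perm)
    return [[(1 if lit > 0 else -1) * perm[abs(lit) - 1] for lit in clause]
            for clause in clauses]

# The formula from the earlier slides: (a | ~b | ~c) & (b | ~c) & (a | c)
original = [[1, -2, -3], [2, -3], [1, 3]]
renamed = rename_variables(original, random.Random(0))
```

Running the (deterministic) solver on many renamings of the same instance yields a runtime distribution, exactly as in the Walsh (99) study mentioned above.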
59 Portfolios of Algorithms
60 Portfolio of Algorithms
- A portfolio of algorithms is a collection of algorithms running interleaved or on different processors.
- Goal: to improve the performance of the different algorithms in terms of:
- expected runtime
- risk (variance)
- Efficient set (or Pareto set): the set of portfolios that are best in terms of expected value and risk.
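A single-processor portfolio can be sketched by interleaving generator-based algorithms round-robin (the step-counting scheme and the toy "algorithms" below are illustrative assumptions):

```python
def interleave(algorithms):
    # Run one step of each algorithm in turn; the first to produce a
    # non-None result wins. Total work is at most (number of algorithms)
    # times the winner's own step count.
    runners = [alg() for alg in algorithms]
    steps = 0
    while True:
        for runner in runners:
            steps += 1
            result = next(runner)
            if result is not None:
                return result, steps

def fake_solver(n_steps, answer):
    # Toy algorithm: busy for n_steps - 1 steps, then returns `answer`.
    def gen():
        for _ in range(n_steps - 1):
            yield None
        while True:
            yield answer
    return gen

result, steps = interleave([fake_solver(100, "slow"), fake_solver(3, "fast")])
```

Interleaving pays a constant-factor overhead (here 2×) but caps the risk: the portfolio finishes as soon as its luckiest member does, which is exactly the expected-runtime/variance trade-off the slide describes.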
61 Branch & Bound for MIP: Depth-First vs. Best-Bound
Depth-first: average ≈ 18,000; st. dev. ≈ 30,000.
62 Heavy-Tailed Behavior of Depth-First
63 Portfolio for 6 Processors
[Figure: expected runtime vs. standard deviation of runtime for portfolios ranging from 0 DF / 6 BB to 6 DF / 0 BB.]
64 Portfolio for 20 Processors
[Figure: expected runtime vs. standard deviation of runtime for portfolios ranging from 0 DF / 20 BB to 20 DF / 0 BB.]
The optimal strategy is to run depth-first on all 20 processors!
Optimal collective behavior emerges from suboptimal individual behavior.
65 Compute Clusters and Distributed Agents
- With the increasing popularity of compute clusters and distributed problem-solving / agent paradigms, portfolios of algorithms, and flexible computation in general, are rapidly expanding research areas.
66 Summary
- Stochastic search methods (complete and incomplete) have been shown to be very effective.
- Restart strategies and portfolio approaches can lead to substantial improvements in expected runtime and variance, especially in the presence of heavy-tailed phenomena.
- Randomization is therefore a tool to improve algorithmic performance and robustness.
Take-home message: you should always randomize your complete search method.
67 Exploiting Structure Using Randomization: Summary
- A very exciting new research area with success stories:
- e.g., state-of-the-art complete SAT and CP solvers use randomization and restarts.
- Very effective when combined with learning.
- More later.
68 Local Search: Summary
- A surprisingly efficient search method.
- Wide range of applications: any type of optimization / search task.
- Handles search spaces that are too large for systematic search (e.g., 10^1000 states).
- Often the best available algorithm when global information is lacking.
- Formal properties remain largely elusive.
- The research area will most likely continue to thrive.
69 Summary: Search
- Uninformed search: DFS / BFS / uniform-cost search
- time / space complexity: search spaces up to approximately 10^11 nodes.
- Informed search: use a heuristic function to guide toward the goal
- Greedy best-first search
- A* search (provably optimal)
- Search spaces up to approximately 10^25 nodes.
70 Summary: Search (contd.)
Special case: constraint satisfaction problems (CSPs). A generic framework that uses a restricted, structured format for representing states and goals (variables + constraints): backtrack search (DFS) plus propagation (forward checking / arc consistency, global constraints, variable / value ordering, randomized backtrack search).
71 Summary: Search (contd.)
Local search: greedy / hill climbing, simulated annealing, genetic algorithms / genetic programming; search spaces of 10^100 to 10^1000 states.
Adversarial search / game playing: minimax, up to 10^10 nodes, 6-7 ply in chess; alpha-beta pruning, up to 10^20 nodes, 14 ply in chess, provably optimal.
72 Search and AI
- Why such a central role?
- Basically, because many tasks in AI are intractable. Search is the only way to handle them.
- Many applications of search, e.g., in learning / reasoning / planning / NLU / vision.
- The good news: much recent progress (10^30 nodes quite feasible; sometimes up to 10^1000).
- A qualitative difference from only a few years ago!