1
CS 4700: Foundations of Artificial Intelligence
  • Carla P. Gomes
  • gomes@cs.cornell.edu
  • Module
  • Randomization in Complete Tree Search Algorithms
  • Wrap-up of Search!

2
Randomization in Local Search
  • Randomized strategies are very successful in
    the area of local search.
  • Random hill climbing
  • Simulated annealing
  • Genetic algorithms
  • Tabu search
  • GSAT and variants
  • Key limitation?

The inherent incompleteness of local search
methods.
3

Randomization in Tree Search
Can we also add a stochastic element to a
systematic (tree search) procedure without
losing completeness?
  • Introduce randomness in a tree search method,
    e.g., by randomly breaking ties in variable
    and/or value selection.
  • Why would we do that?

4
Backtrack Search
(a OR NOT b OR NOT c) AND (b OR NOT c) AND
(a OR c)
5
Backtrack Search Two Different Executions
(a OR NOT b OR NOT c) AND (b OR NOT c) AND
(a OR c)
6
The fringe of the search space
7
Latin Square Completion: Randomized Backtrack
Search
Easy instance: 15 pre-assigned cells
(Gomes et al. 97)
8
Erratic Mean Behavior
[Plot: sample mean of the runtime vs. number of
runs on the same instance; the sample mean
fluctuates erratically (reaching 3500) while the
median is 1.]
9
10
[Plot: proportion of cases solved, F(x), vs.
number of backtracks; 75% of the runs succeed
within 30 backtracks, while 5% take more than
100,000.]
11
Run Time Distributions
  • The runtime distributions of some of the
    instances reveal interesting properties:
  • I. Erratic behavior of the mean.
  • II. Distributions have heavy tails.

12
Heavy-Tailed Distributions
  • Infinite variance, possibly infinite mean.
  • Introduced by Pareto in the 1920s as a
    probabilistic curiosity.
  • Mandelbrot established the use of heavy-tailed
    distributions to model real-world fractal
    phenomena.
  • Examples: stock market, earthquakes, weather, ...

13
Decay of Distributions
  • Standard --- exponential decay:
  • e.g., normal; Pr(X > x) shrinks at least
    exponentially fast in x.
  • Heavy-tailed --- power-law decay:
  • e.g., Pareto-Lévy; Pr(X > x) ~ C / x^α.

14
Normal, Cauchy, and Lévy
15
Tail Probabilities (Standard Normal, Cauchy, Lévy)

16
Fat-Tailed Distributions
  • Kurtosis = fourth central moment divided by the
    square of the variance (μ₄ / σ⁴).
  • The normal distribution has kurtosis 3;
    distributions with kurtosis > 3 (e.g.,
    exponential, lognormal) are fat-tailed.

17
Fat and Heavy-Tailed Distributions
Exponential decay for standard distributions,
e.g., normal, lognormal, exponential.
Heavy-tailed: power-law decay, e.g., Pareto-Lévy.
18
Pareto Distribution
where α > 0 is a shape parameter
  • Density function: f(x) = α / x^(α+1), for x ≥ 1
  • Distribution function: F(x) = P(X ≤ x) =
    1 - 1/x^α, for x ≥ 1
  • Survival function (tail probability): S(x) =
    1 - F(x) = P(X > x) = 1/x^α, for x ≥ 1

19
Pareto Distribution
  • Moments:

E(X^n) = α / (α - n) if n < α; E(X^n) = ∞ if n ≥ α.
Mean: E(X) = α / (α - 1) if α > 1; E(X) = ∞ if α ≤ 1.
Variance: var(X) = α / ((α - 1)²(α - 2)) if α > 2;
var(X) = ∞ if α ≤ 2.
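To make the erratic-mean phenomenon of slide 8 concrete, here is a
minimal simulation sketch (assuming NumPy; the inverse-transform
sampler is an illustration, not part of the lecture): with α = 1 the
mean is infinite, so the running sample mean never converges, while
the median is stable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Inverse-transform sampling from Pareto(alpha): S(x) = x**(-alpha)
# for x >= 1, so X = U**(-1/alpha) with U ~ Uniform(0, 1].
alpha = 1.0                      # alpha <= 1: infinite mean
u = 1.0 - rng.random(100_000)    # shift (0,1) to (0,1] to avoid 1/0
samples = u ** (-1.0 / alpha)

# The running sample mean never settles: every so often a huge draw
# arrives and drags it back up -- the "erratic mean" of slide 8.
running_mean = np.cumsum(samples) / np.arange(1, samples.size + 1)
print(running_mean[[99, 999, 9_999, 99_999]])   # keeps drifting
print("median:", np.median(samples))            # stable, about 2
```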
20
How to Check for Heavy Tails?
  • Power-law decay of the tail:
  • a log-log plot of the tail of the distribution
    (the survival function 1 - F(x); e.g., for the
    Pareto, S(x) = 1/x^α for x ≥ 1)
  • should be approximately linear.
  • The slope gives the value of α:
  • 0 < α ≤ 1: infinite mean and infinite
    variance
  • 1 < α ≤ 2: infinite variance
  • (A plotting sketch follows.)

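A minimal sketch (assuming NumPy and Matplotlib) of the plotting
check just described: draw the empirical survival function on
log-log axes and look for an approximately straight line.

```python
import numpy as np
import matplotlib.pyplot as plt

def loglog_survival(samples):
    """Plot the empirical survival function 1 - F(x) on log-log axes.

    Approximately linear behavior indicates power-law (heavy-tailed)
    decay; the slope of the line estimates -alpha.
    """
    x = np.sort(np.asarray(samples))
    tail = 1.0 - np.arange(1, x.size + 1) / x.size  # P(X > x_i)
    plt.loglog(x[:-1], tail[:-1], ".", markersize=2)
    plt.xlabel("x")
    plt.ylabel("1 - F(x)")
    plt.show()

# Pareto(alpha = 1.5) samples: expect a line of slope about -1.5.
rng = np.random.default_rng(0)
loglog_survival((1.0 - rng.random(50_000)) ** (-1.0 / 1.5))
```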
21
Pareto(1) vs. Lognormal(1,1)
[Plot: densities f(x) of the Pareto(1) and
Lognormal(1,1) distributions. The Pareto(1) has
infinite mean and infinite variance.]
22
How to Visually Check for Heavy-Tailed Behavior
A log-log plot of the tail of the distribution
exhibits linear behavior.
23
Survival Function: Pareto and Lognormal

24
Example of Heavy Tailed Model
  • Random walk:
  • Start at position 0.
  • Toss a fair coin:
  • with each head, take a step up (+1);
  • with each tail, take a step down (-1).

X = the number of steps the random walk takes to
return to position 0. (A simulation sketch follows.)
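A minimal simulation sketch of this model (plain Python; the cap
max_steps is an added assumption so every call terminates): return
times obey P(X > x) ~ c / sqrt(x), so the median is tiny while the
mean diverges.

```python
import random

def return_time(max_steps=10**7):
    """Steps until a symmetric random walk started at 0 first
    returns to 0 (censored at max_steps)."""
    position = 0
    for step in range(1, max_steps + 1):
        position += 1 if random.random() < 0.5 else -1
        if position == 0:
            return step
    return max_steps  # censored: the walk had not returned yet

random.seed(0)
times = sorted(return_time() for _ in range(10_000))
print("median:", times[len(times) // 2])  # small: the median is 2
print("max:", times[-1])                  # occasionally enormous
```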
25
26
Heavy-Tails vs. Non-Heavy-Tails
[Log-log plot of the unsolved fraction 1 - F(x)
vs. X, the number of steps the walk takes to
return to zero: the walk's return-time
distribution stays linear (a 0.1 fraction of the
walks take more than 200,000 steps), while
Normal(2,1) and Normal(2,1000000) drop off
sharply.]
27
Heavy-Tailed Behavior in the Latin Square
Completion Problem
[Log-log plot of the unsolved fraction 1 - F(x)
vs. number of backtracks: approximately linear,
indicating heavy-tailed behavior.]
28
How Toby Walsh Fried His PC (Graph Coloring)
29
  • To Be or Not To Be Heavy-Tailed

30
Random Binary CSP Models

Model E: <N, D, p>
N = number of variables; D = size of the domains;
p = proportion of forbidden pairs (out of
D² N(N-1)/2)
N from 15 to 50
(Achlioptas et al. 2000)
31
Typical-Case Analysis: Model E
Phase transition phenomenon: discriminating
easy vs. hard instances.
[Plot: mean computational cost and % of solvable
instances vs. constrainedness.]
(Hogg et al. 96)
32
Runtime distributions
33
34
Explaining and Exploiting Fat and Heavy-Tailed
Distributions
35
Formal Models of Heavy and Fat Tails in
Combinatorial Search
How to explain short runs? Heavy/fat tails mean a
wide range of solution times: very short and very
long runtimes.
36
Logistics Planning instances with O(log(n))
backdoors
37
Exploiting Backdoors
38
Algorithms
  • Three kinds of strategies for dealing with
    backdoors:
  • A complete deterministic backtrack-search
    algorithm
  • A complete randomized backtrack-search algorithm
  • Provably better performance than the
    deterministic one
  • A heuristically guided complete randomized
    backtrack-search algorithm
  • Assumes the existence of a good heuristic for
    choosing variables to branch on
  • We believe this is close to what happens in
    practice

Williams, Gomes, Selman 03/04
39
Deterministic Generalized Iterative Deepening
40
Generalized Iterative Deepening
[Diagram: all possible trees of depth 1.]
41

Generalized Iterative Deepening
Level 2
[Diagram: branching on x1 = 0 and x1 = 1; all
possible trees of depth 2.]
42

Generalized Iterative Deepening
Level 2
[Diagram: branching on xn-1 = 0 and xn-1 = 1; all
possible trees of depth 2.]
Then level 3, level 4, and so on.
43
Randomized Generalized Iterative Deepening
Assumption: there exists a backdoor whose size is
bounded by a function of n (call it B(n)).
Idea: repeatedly choose random subsets of
variables that are slightly larger than B(n),
searching these subsets for the backdoor (see
the sketch below).
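A hedged sketch of this idea in code (the polytime subsolver
callable is hypothetical, standing in for the sub-solver A used
later; exhausting 2^|subset| assignments is affordable only because
the subsets stay small):

```python
import itertools
import random

def randomized_gid(variables, b_n, subsolver, max_tries=1000):
    """Randomized generalized iterative deepening (sketch).

    Repeatedly samples a random subset of variables slightly larger
    than the assumed backdoor bound B(n); for each subset, tries all
    assignments to it, handing the simplified problem to a polytime
    subsolver.  subsolver(assignment) returns a solution or None.
    """
    subset_size = min(len(variables), b_n + 1)  # slightly above B(n)
    for _ in range(max_tries):
        subset = random.sample(variables, subset_size)
        # Exhaust every 0/1 assignment to the chosen subset.
        for values in itertools.product([0, 1], repeat=subset_size):
            solution = subsolver(dict(zip(subset, values)))
            if solution is not None:
                return solution
    return None  # no backdoor of the assumed size was found
```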
44
Deterministic Versus Randomized
Suppose variables have 2 possible values (e.g.,
SAT).
For B(n) = n/k, the algorithm's runtime is c^n.
[Plot: the constant c as a function of k for the
deterministic and the randomized strategy.]
45
Complete Randomized Depth First Search with
Heuristic
  • Assume we have the following:
  • DFS, a generic randomized depth-first
    backtrack-search solver with a
    (polytime) sub-solver A
  • A heuristic H that (randomly) chooses variables
    to branch on, in polynomial time
  • H has probability 1/h of choosing a
    backdoor variable (h is a fixed constant)
  • Call this ensemble (DFS, H, A)

46
Polytime Restart Strategy for (DFS, H, A)
  • Essentially:
  • If there is a small backdoor, then (DFS, H, A)
    has a restart strategy that runs in polytime.

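A hedged reconstruction of the argument behind this claim (the slide
leaves it implicit): each branching picks a backdoor variable with
probability at least 1/h, so a single run cut off after B(n)
branchings hits the whole backdoor with probability at least
(1/h)^B(n), and restarting until success takes 1/p runs in
expectation:

```latex
\[
  p \;\ge\; \left(\tfrac{1}{h}\right)^{B(n)},
  \qquad
  \mathbb{E}[\#\text{restarts}] \;=\; \tfrac{1}{p} \;\le\; h^{B(n)} .
\]
% For B(n) = O(log n): h^{B(n)} = n^{O(log h)}, i.e., polynomially
% many restarts of a polynomial-time run -- a polytime strategy.
```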
47
Runtime Table for Algorithms
(DFS, H, A)
B(n) = upper bound on the size of a backdoor,
given n variables
When the backdoor is a constant fraction of n,
there is an exponential improvement of the
randomized algorithm over the deterministic one
Williams, Gomes, Selman 03/04
48
  • How to avoid the long runs in practice?

Use restarts or parallel / interleaved runs to
exploit the extreme variance in performance.
Restarts provably eliminate heavy-tailed
behavior. (A minimal restart wrapper is sketched
below.)
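A minimal sketch of the restart policy (randomized_solve is a
hypothetical solver call that returns None whenever the cutoff is
hit before a solution is found):

```python
import random

def solve_with_restarts(randomized_solve, cutoff, max_restarts=10_000):
    """Run a randomized backtrack search, abandoning any run that
    exceeds `cutoff` backtracks and restarting with a fresh seed."""
    for _ in range(max_restarts):
        solution = randomized_solve(cutoff, seed=random.randrange(2**32))
        if solution is not None:
            return solution
    return None  # all restarts hit the cutoff
```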
49
Restarts
[Log-log plot of unsolved fraction 1 - F(x) vs.
number of backtracks: with no restarts, 70% of
the runs remain unsolved; restarting every 4
backtracks drives the unsolved fraction down to
0.001 by about 250 backtracks (62 restarts).]
50
Example of Rapid Restart Speedup(planning)
[Log-log plot: number of backtracks as a function
of the restart cutoff.]
51
Super-linear Speedups
Interleaved (1 machine): 10 runs x 1 second = 10
seconds; a 5x speedup.
52
Sketch of proof of elimination of heavy tails
  • Let's truncate the search procedure after m
    backtracks.
  • Let p be the probability of solving the problem
    with the truncated version.
  • Run the truncated procedure and restart it
    repeatedly.

53

Y, the total number of backtracks over the
repeated runs, does not have heavy tails.
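A hedged reconstruction of the missing step: the restarted procedure
is still running after t restarts only if all t truncated runs
failed, so

```latex
\[
  \Pr[\,Y > t\,m\,] \;\le\; (1-p)^{t} \;=\; e^{\,t\,\ln(1-p)} .
\]
% The tail of Y decays exponentially, so Y cannot have a power-law
% (heavy) tail.
```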
54
Paramedic Crew Assignment
Paramedic crew assignment is the problem of
assigning paramedic crews from different
stations to cover a given region, given several
resource constraints.
55
Deterministic Search
56
Restarts
57
Restart Strategies
  • Restart with increasing cutoff: used, e.g., by
    the satisfiability and constraint programming
    communities; the cutoff increases linearly.
  • Randomized backtracking (Lynce et al. 2001):
    randomizes the target decision points when
    backtracking (several variants).
  • Random jumping (Zhang 2002): the solver randomly
    jumps to unexplored portions of the search space;
    jumping decisions are based on analyzing the
    ratio between the space searched and the
    remaining search space; solved several open
    problems in combinatorics.
  • Geometric restarts (Walsh 99): the cutoff is
    increased geometrically.
  • Learning restart strategies (Kautz et al. 2001
    and Ruan et al. 2002): results on optimal
    policies for restarts under particular scenarios;
    a huge area for further research.
  • Universal restart strategies (Luby et al. 93):
    seminal paper on optimal restart strategies for
    Las Vegas algorithms (theoretical paper); see
    the sketch below.

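A minimal sketch of the universal (Luby) cutoff sequence
1, 1, 2, 1, 1, 2, 4, ... referenced in the last bullet; Luby et al.
show its expected total runtime is within a logarithmic factor of
the best fixed-cutoff strategy.

```python
def luby(i):
    """i-th term (1-indexed) of the Luby restart sequence:
    1, 1, 2, 1, 1, 2, 4, 1, 1, 2, 1, 1, 2, 4, 8, ..."""
    k = 1
    while (1 << k) - 1 < i:              # find k with i <= 2^k - 1
        k += 1
    if i == (1 << k) - 1:                # end of a block: 2^(k-1)
        return 1 << (k - 1)
    return luby(i - (1 << (k - 1)) + 1)  # recurse into the block

print([luby(i) for i in range(1, 16)])
# [1, 1, 2, 1, 1, 2, 4, 1, 1, 2, 1, 1, 2, 4, 8]
```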
58
Notes on Randomizing Backtrack Search
  • Can we replay a randomized run? Yes: since we
    use pseudo-random numbers, if we save the seed
    we can repeat the run with the same seed.
  • Deterministic randomization (Wolfram 2002):
    the behavior of some very complex deterministic
    systems is so unpredictable that it actually
    appears to be random (e.g., adding learned
    clauses or cutting constraints between restarts,
    as used in the satisfiability community).
  • What if we cannot randomize the code?
  • Randomize the input:
  • randomly rename the variables
    (Motwani and Raghavan 95).
  • (Walsh (99) applied this technique to study
    the runtime distributions of graph coloring,
    using a deterministic algorithm based on DSATUR
    implemented by Trick.)

59
Portfolios of Algorithms
60
Portfolio of Algorithms
  • A portfolio of algorithms is a collection of
    algorithms running interleaved or on different
    processors.
  • Goal: to improve the performance of the
    different algorithms in terms of:
  • expected runtime
  • risk (variance)
  • Efficient set (Pareto set): the set of portfolios
    that are best in terms of expected value and risk.
    (A Monte Carlo sketch follows.)

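A hedged Monte Carlo sketch of how such a Pareto set can be traced
out (the runtime models for DF / depth-first and BB / best-bound are
stand-in assumptions, not the lecture's data); a portfolio finishes
as soon as its first member finishes:

```python
import numpy as np

def portfolio_stats(samplers, counts, trials=10_000, seed=0):
    """Expected runtime and standard deviation of a portfolio running
    counts[i] independent copies of algorithm i in parallel."""
    rng = np.random.default_rng(seed)
    runs = np.empty(trials)
    for t in range(trials):
        draws = [s(rng) for s, c in zip(samplers, counts)
                 for _ in range(c)]
        runs[t] = min(draws)          # first finisher ends the run
    return runs.mean(), runs.std()

# Stand-in runtime models: depth-first heavy-tailed (classical
# Pareto), best-bound lognormal.
df = lambda rng: rng.pareto(1.5) + 1.0
bb = lambda rng: rng.lognormal(3.0, 0.5)

for n_df in range(7):  # every mix of 6 processors
    mean, std = portfolio_stats([df, bb], [n_df, 6 - n_df])
    print(f"{n_df} DF / {6 - n_df} BB: mean={mean:7.2f} std={std:7.2f}")
```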
61
Branch & Bound for MIP: Depth-First vs.
Best-Bound
[Depth-first: average ≈ 18000; st. dev. ≈ 30000.]
62
Heavy-tailed behavior of Depth-first
63
Portfolio for 6 processors
[Plot: expected runtime vs. standard deviation of
runtime for portfolios ranging from 0 DF / 6 BB
to 6 DF / 0 BB.]
64
Portfolio for 20 processors
[Plot: expected runtime vs. standard deviation of
runtime for portfolios ranging from 0 DF / 20 BB
to 20 DF / 0 BB.]
The optimal strategy is to run depth-first on
all 20 processors!
Optimal collective behavior emerges from
suboptimal individual behavior.
65
Compute Clusters and Distributed Agents
  • With the increasing popularity of compute
    clusters and distributed problem solving / agent
    paradigms, portfolios of algorithms --- and
    flexible computation in general --- are rapidly
    expanding research areas.

66

Summary
  • Stochastic search methods (complete and
    incomplete) have been shown to be very effective.
  • Restart strategies and portfolio approaches can
    lead to substantial improvements in the expected
    runtime and variance, especially in the presence
    of heavy-tailed phenomena.
  • Randomization is therefore a tool to improve
    algorithmic performance and robustness.

Take-home message: you should always randomize
your complete search method.
67
Exploiting Structure Using Randomization: Summary
  • Very exciting new research area with success
    stories:
  • e.g., state-of-the-art complete SAT and CP
    solvers use randomization and restarts.
  • Very effective when combined with learning.
  • More later!

68
Local Search - Summary
  • Surprisingly efficient search method.
  • Wide range of applications:
  • any type of optimization / search task.
  • Handles search spaces that are too large
    (e.g., 10^1000 states) for systematic search.
  • Often the best available algorithm when global
    information is lacking.
  • Formal properties remain largely elusive.
  • The research area will most likely continue to
    thrive.

69
Summary Search
  • Uninformed search: DFS / BFS / uniform-cost
    search
  • time / space complexity; search spaces up to
    approx. 10^11 nodes.
  • Informed search: use a heuristic function to
    guide toward the goal
  • Greedy best-first search
  • A* search (provably optimal)
  • Search spaces up to approximately 10^25 nodes

70
Summary Search (contd.)
Special case: constraint satisfaction problems
(CSPs). A generic framework that uses a
restricted, structured format for representing
states and goals: variables and constraints;
backtrack search (DFS) plus propagation
(forward checking / arc consistency, global
constraints, variable / value ordering,
randomized backtrack search).
71
Summary Search (Contd)
Local search: greedy / hill climbing, simulated
annealing, genetic algorithms / genetic
programming; search spaces from 10^100 to
10^1000.
Adversarial search / game playing:
minimax: up to 10^10 nodes, 6-7 ply in chess;
alpha-beta pruning: up to 10^20 nodes, 14 ply in
chess, provably optimal.
72
Search and AI
  • Why such a central role?
  • Basically, because lots of tasks in AI are
    intractable. Search is the only way to
    handle them.
  • Many applications of search, e.g., in learning
    / reasoning / planning / NLU / vision.
  • The good news: much recent progress (10^30
    nodes quite feasible; sometimes up to 10^1000).
  • A qualitative difference from only a few years
    ago!