Title: Advanced Algorithms
1Advanced Algorithms
- Piyush Kumar
- (Lecture 12 Online Algorithms)
Welcome to COT5405
2On Bounds
- Worst case.
- Average case: running time over some distribution of inputs. (Quicksort)
- Amortized analysis: worst-case bound on a sequence of operations. (Bit increments, Union-Find)
- Competitive analysis: compare the cost of an on-line algorithm with that of an optimal prescient algorithm on any sequence of requests. (Today.)
3Problem 1
- The online dating game.
- You get to date a fixed number of partners.
- After each date, you either pick that partner or try your luck again.
- You cannot go back in time.
- What strategy would you use to pick?
4Problem 2.
- You like to ski.
- When weather AND mood permit, you go skiing.
- If you own the equipment, you take it with you; otherwise you rent.
- You can buy the equipment whenever you decide, but not while skiing.
5Costs
- 1 unit to rent, M units to buy.
- If you go skiing I times, what is OPT?
$\mathrm{OPT} = \min(I, M)$
What algorithm should you use to decide whether you should buy the equipment?
6Algorithms
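- Algorithm 1: Buy the equipment on the first day.
- Competitive algorithm: an algorithm is called $\alpha$-competitive if there exists some constant $b$ such that for every sequence of inputs $s$,
$\mathrm{Cost}_{ALG}(s) \le \alpha \cdot \mathrm{Cost}_{OPT}(s) + b$
- For Algorithm 1 with $I = 1$: $\mathrm{Cost}_{OPT}(s) = \min(I, M) = 1$ and $\mathrm{Cost}_{ALG}(s) = M$, so $\alpha \ge M$.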
7Algorithms
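- Algorithm 2: Rent for $M-1$ days and buy on the $M$th day. Here $L$ is the number of days you ski.
- $L < M$: $\mathrm{Cost}_{ALG}(s) = \mathrm{Cost}_{OPT}(s)$
- $L \ge M$: $\mathrm{Cost}_{ALG}(s) = 2M - 1$ and $\mathrm{Cost}_{OPT}(s) = M$
- Competitive ratio: $2 - 1/M$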
8Ski Rental
- Algorithm 3: Rent for $k$ days and buy on the $(k+1)$th day.
- $\mathrm{Cost}_{ALG}(s) = k + M$
- $\mathrm{Cost}_{OPT}(s) = \min(M, k)$
- Competitive ratio: 2?
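The three ski-rental strategies can be compared numerically. The following is a minimal sketch, not part of the original slides: the class and method names (`SkiRental`, `algCost`, `optCost`, `worstRatio`) are hypothetical, renting costs 1 per trip, buying costs M, and the offline optimum is taken to be min(I, M) as on the Costs slide. It searches over the number of trips for the worst ratio of a "rent for k trips, then buy" strategy.

```java
final class SkiRental {
    /** Cost of "rent for k trips, buy on trip k+1" when there are `trips` trips in total. */
    static long algCost(long k, long m, long trips) {
        return trips <= k ? trips : k + m;
    }

    /** Offline optimum as stated on the Costs slide: min(I, M). */
    static long optCost(long m, long trips) {
        return Math.min(trips, m);
    }

    /** Largest ratio ALG/OPT over all trip counts 1..maxTrips (brute force). */
    static double worstRatio(long k, long m, long maxTrips) {
        double worst = 0.0;
        for (long trips = 1; trips <= maxTrips; trips++) {
            worst = Math.max(worst, (double) algCost(k, m, trips) / optCost(m, trips));
        }
        return worst;
    }

    public static void main(String[] args) {
        long m = 10; // assumed purchase price, for illustration only
        for (long k = 0; k <= 2 * m; k += 3) {
            System.out.println("k = " + k + ", worst ratio = " + worstRatio(k, m, 1000));
        }
    }
}
```

With k = 0 this is Algorithm 1 (worst ratio M), and with k = M - 1 it is Algorithm 2 (worst ratio 2 - 1/M).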
9Problem 3 (1D): Monkey Looking for Food
What is the best competitive algorithm you can come up with? What is its competitive ratio?
10Problem 3. (3D)
11On Line Algorithms
- Work without full knowledge of the future
- Deal with a sequence of events
- Future events are unknown to the algorithm
- The algorithm has to deal with one event at a time. The next event happens only after the algorithm is done dealing with the previous event.
12On-Line versus off-line
- We compare the behavior of the on-line algorithm to an optimal off-line algorithm OPT, which is familiar with the entire sequence.
- The off-line algorithm knows the exact properties of all the events in the sequence.
13Absolute competitive ratio (for minimization problems)
- We measure the performance of an on-line algorithm by its competitive ratio.
- This is the ratio between what the on-line algorithm pays and what the optimal off-line algorithm pays.
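14- Formally, let $\mathrm{Cost}_{ALG}(\sigma)$ be the cost of the on-line algorithm on sequence $\sigma$, and let $\mathrm{Cost}_{OPT}(\sigma)$ be the optimal off-line cost on $\sigma$. Then the competitive ratio is
$\sup_{\sigma} \frac{\mathrm{Cost}_{ALG}(\sigma)}{\mathrm{Cost}_{OPT}(\sigma)}$
- From calculus: the supremum is similar to the maximum, but it may be attained only in the limit.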
15Problem 4 Caching
- k-competitive caching.
- Two-level memory model.
- If a page is not in the cache, a page fault occurs.
- A paging algorithm specifies which page to evict on a fault.
- Paging algorithms are online algorithms for cache replacement.
16Online Paging Algorithms
- Assumption: the cache can hold k pages.
- The CPU accesses memory through the cache.
- Each request specifies a page in the memory system.
- We want to minimize the number of page faults.
17A Lower bound
- Theorem: Let A be a deterministic online paging algorithm. If A is $\alpha$-competitive, then $\alpha \ge k$.
- Pf: Let $S = \{p_1, p_2, \ldots, p_{k+1}\}$ be a set of $k+1$ arbitrary memory pages. Assume w.l.o.g. that A and OPT initially have $p_1, \ldots, p_k$ in their caches.
- In the worst case, A has a page fault on every request $\sigma_t$.
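The adversary in this proof can be simulated. The sketch below makes several assumptions that are not in the slides: the deterministic algorithm A is instantiated as LRU, pages are the integers 1..k+1, the initial cache order 1..k is treated as the recency order, and the offline cost is computed with the farthest-in-future eviction rule. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class PagingLowerBound {
    /** Adversary: n times, request the one page (of k+1) that LRU does not hold. */
    static List<Integer> adversarialRequests(int k, int n) {
        LinkedHashSet<Integer> cache = new LinkedHashSet<>(); // iteration order: least recent first
        for (int p = 1; p <= k; p++) cache.add(p);
        List<Integer> requests = new ArrayList<>();
        for (int t = 0; t < n; t++) {
            int missing = 0;
            for (int p = 1; p <= k + 1; p++) if (!cache.contains(p)) missing = p;
            requests.add(missing);              // LRU faults here by construction
            Integer lru = cache.iterator().next();
            cache.remove(lru);                  // evict the least recently used page
            cache.add(missing);
        }
        return requests;
    }

    /** Faults of the offline farthest-in-future rule, starting with pages 1..k cached. */
    static int offlineFaults(List<Integer> requests, int k) {
        Set<Integer> cache = new HashSet<>();
        for (int p = 1; p <= k; p++) cache.add(p);
        int faults = 0;
        for (int t = 0; t < requests.size(); t++) {
            int page = requests.get(t);
            if (cache.contains(page)) continue;
            faults++;
            int victim = -1, farthest = -1;
            for (int q : cache) {
                int next = nextUse(requests, q, t + 1);
                if (next > farthest) { farthest = next; victim = q; }
            }
            cache.remove(victim);
            cache.add(page);
        }
        return faults;
    }

    /** Index of the next request for `page` at or after `from`, or MAX_VALUE if none. */
    static int nextUse(List<Integer> requests, int page, int from) {
        for (int j = from; j < requests.size(); j++) if (requests.get(j) == page) return j;
        return Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        int k = 3, n = 30;
        List<Integer> requests = adversarialRequests(k, n);
        System.out.println("LRU faults: " + n + ", offline faults: " + offlineFaults(requests, k));
    }
}
```

On such a sequence the online algorithm faults on all n requests, while the offline rule faults at most about once every k requests, which is where the factor k comes from.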
18Online Algorithm and Competitive Analysis
- Theorem. LRU is k-competitive.
- Proof: Let $\sigma'$ be a subsequence of $\sigma$ on which LRU faults exactly k times. Let p denote the page requested just before $\sigma'$.
- Case 1: LRU faults in $\sigma'$ on p.
- $\sigma'$ requests at least $k+1$ different pages $\Rightarrow$ MIN faults at least once.
- Case 2: LRU faults on some page, say q, at least twice in $\sigma'$.
- $\sigma'$ requests at least $k+1$ different pages $\Rightarrow$ MIN faults at least once.
LRU (least recently used): evicts the page whose most recent access was earliest.
19- Theorem. LRU is k-competitive.
- Proof: Let $\sigma'$ be a subsequence of $\sigma$ on which LRU faults exactly k times. Let p denote the page requested just before $\sigma'$.
- Case 3: LRU does not fault on p, nor on any page more than once.
- k different pages are accessed and faulted on, none of which is p.
- p is in MIN's cache at the start of $\sigma'$ $\Rightarrow$ MIN faults at least once.
[Figure: the request sequence $\sigma$ split into phases. LRU faults at most k times in the first phase and exactly k times in each later phase; MIN faults at least once in each later phase.]
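The phase argument can be turned into a small calculation. The sketch below is illustrative only, with hypothetical names (`LruPhases`, `lruFaults`, `impliedMinFaults`) and an initially empty cache as an added assumption. It counts LRU faults on a request sequence and derives the lower bound on MIN's faults that the proof gives: one fault per phase after the first.

```java
import java.util.LinkedHashSet;

final class LruPhases {
    /** Counts LRU faults on `requests` with an initially empty cache of size k. */
    static int lruFaults(int[] requests, int k) {
        LinkedHashSet<Integer> cache = new LinkedHashSet<>(); // iteration order: least recent first
        int faults = 0;
        for (int page : requests) {
            if (cache.remove(page)) {   // hit: re-insert to mark as most recently used
                cache.add(page);
                continue;
            }
            faults++;
            if (cache.size() == k) {    // cache full: evict the least recently used page
                Integer lru = cache.iterator().next();
                cache.remove(lru);
            }
            cache.add(page);
        }
        return faults;
    }

    /**
     * Lower bound on MIN's faults implied by the phase argument: split the sequence so
     * LRU faults at most k times in the first phase and exactly k times in each later
     * phase; MIN faults at least once in every later phase.
     */
    static int impliedMinFaults(int lruFaults, int k) {
        int phases = (lruFaults + k - 1) / k;   // ceil(lruFaults / k)
        return Math.max(0, phases - 1);
    }

    public static void main(String[] args) {
        int[] requests = {1, 2, 3, 1, 2, 3};
        int k = 2;
        int f = lruFaults(requests, k);
        System.out.println("LRU faults: " + f + ", MIN faults at least: " + impliedMinFaults(f, k));
    }
}
```

Rearranging the bound gives (LRU faults) ≤ k · (MIN faults) + k, which is the k-competitiveness statement with additive constant b = k.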
20Universal Hashing
21Dictionary Data Type
- Dictionary. Given a universe U of possible elements, maintain a subset $S \subseteq U$ so that inserting, deleting, and searching in S is efficient.
- Dictionary interface.
- Create(): Initialize a dictionary with $S = \emptyset$.
- Insert(u): Add element $u \in U$ to S.
- Delete(u): Delete u from S, if u is currently in S.
- Lookup(u): Determine whether u is in S.
- Challenge. Universe U can be extremely large, so defining an array of size $|U|$ is infeasible.
- Applications. File systems, databases, Google, compilers, checksums, P2P networks, associative arrays, cryptography, web caching, etc.
22Hashing
- Hash function. $h : U \to \{0, 1, \ldots, n-1\}$.
- Hashing. Create an array H of size n. When processing element u, access array element $H[h(u)]$.
- Collision. When $h(u) = h(v)$ but $u \ne v$.
- A collision is expected after $\Theta(\sqrt{n})$ random insertions. This phenomenon is known as the "birthday paradox."
- Separate chaining: $H[i]$ stores a linked list of the elements u with $h(u) = i$.
[Figure: separate chaining. H[1]: jocularly, seriously; H[2]: null; H[3]: suburban, untravelled, considerating; H[n]: browsing.]
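Separate chaining, together with the dictionary interface from the earlier slide, might be sketched as follows. This is a minimal illustration and not the lecture's implementation: the class name `ChainedSet` is hypothetical, elements are restricted to strings, and the slot is derived from Java's built-in `String.hashCode()`.

```java
import java.util.LinkedList;

final class ChainedSet {
    private final LinkedList<String>[] table;   // table[i] holds all elements u with slot(u) == i

    /** Create(): an empty dictionary with n slots. */
    @SuppressWarnings("unchecked")
    ChainedSet(int n) {
        table = new LinkedList[n];
        for (int i = 0; i < n; i++) table[i] = new LinkedList<>();
    }

    /** Slot index in [0, n); floorMod keeps it non-negative for negative hash codes. */
    private int slot(String u) {
        return Math.floorMod(u.hashCode(), table.length);
    }

    /** Lookup(u): scan only the chain that u hashes to. */
    boolean lookup(String u) {
        return table[slot(u)].contains(u);
    }

    /** Insert(u): add u to its chain unless it is already present. */
    void insert(String u) {
        if (!lookup(u)) table[slot(u)].add(u);
    }

    /** Delete(u): remove u from its chain if present. */
    void delete(String u) {
        table[slot(u)].remove(u);
    }

    public static void main(String[] args) {
        ChainedSet s = new ChainedSet(8);
        s.insert("jocularly");
        s.insert("browsing");
        System.out.println(s.lookup("jocularly") + " " + s.lookup("suburban"));
    }
}
```

Each operation costs time proportional to the length of one chain, which is why the later slides focus on keeping chains short.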
23Ad Hoc Hash Function
- Ad hoc hash function.
    // hash function a la the Java string library
    int h(String s, int n) {
        int hash = 0;
        for (int i = 0; i < s.length(); i++)
            hash = (31 * hash) + s.charAt(i);
        return Math.floorMod(hash, n);  // stays in [0, n) even if hash overflows to a negative value
    }
- Deterministic hashing. If $|U| \ge n^2$, then for any fixed hash function h, there is a subset $S \subseteq U$ of n elements that all hash to the same slot. Thus, $\Theta(n)$ time per search in the worst case.
- Q. But isn't an ad hoc hash function good enough in practice?
24Algorithmic Complexity Attacks
- When can't we live with an ad hoc hash function?
- Obvious situations: aircraft control, nuclear reactors.
- Surprising situations: denial-of-service attacks. A malicious adversary learns your ad hoc hash function (e.g., by reading the Java API) and causes a big pile-up in a single slot that grinds performance to a halt.
- Real-world exploits. [Crosby-Wallach 2003]
- Bro server: send carefully chosen packets to DOS the server, using less bandwidth than a dial-up modem.
- Perl 5.8.0: insert carefully chosen strings into an associative array.
- Linux 2.4.20 kernel: save files with carefully chosen names.
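To make the pile-up concrete for the Java-style string hash, here is a small sketch. It is an illustration rather than any of the exploits cited above, and the class name `CollidingKeys` is hypothetical. It relies on the documented `String.hashCode()` formula, under which "Aa" and "BB" have the same hash code (2112), so every string built by concatenating those two blocks collides with every other string of the same length.

```java
import java.util.ArrayList;
import java.util.List;

final class CollidingKeys {
    /** All 2^blocks strings made of the blocks "Aa" and "BB"; they share one hashCode(). */
    static List<String> generate(int blocks) {
        List<String> keys = new ArrayList<>();
        keys.add("");
        for (int b = 0; b < blocks; b++) {
            List<String> next = new ArrayList<>();
            for (String prefix : keys) {
                // hash(prefix + xy) = 31^2 * hash(prefix) + hash(xy), and hash("Aa") == hash("BB")
                next.add(prefix + "Aa");
                next.add(prefix + "BB");
            }
            keys = next;
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> keys = generate(10);
        System.out.println(keys.size() + " distinct keys, hash code " + keys.get(0).hashCode());
    }
}
```

Inserting all of these keys into a chained table that uses this hash puts them in a single chain, so each lookup degrades to a linear scan.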
25Hashing Performance
- Idealistic hash function. Maps m elements uniformly at random to n hash slots.
- Running time depends on the length of the chains.
- Average length of a chain: $\alpha = m / n$.
- Choose $n \approx m$ $\Rightarrow$ on average O(1) per insert, lookup, or delete.
- Challenge. Achieve idealized randomized guarantees, but with a hash function where you can easily find items where you put them.
- Approach. Use randomization in the choice of h. The adversary knows the randomized algorithm you're using, but doesn't know the random choices that the algorithm makes.
26Universal Hashing
- Universal class of hash functions. [Carter-Wegman 1980s]
- For any pair of elements $u, v \in U$ with $u \ne v$: $\Pr_{h \in H}[h(u) = h(v)] \le 1/n$, where h is chosen uniformly at random from H.
- Can select random h efficiently.
- Can compute h(u) efficiently.
- Ex. $U = \{a, b, c, d, e, f\}$, $n = 2$.

Not universal: $H = \{h_1, h_2\}$

    x       a  b  c  d  e  f
    h1(x)   0  1  0  1  0  1
    h2(x)   0  0  0  1  1  1

$\Pr_{h \in H}[h(a) = h(b)] = 1/2$, $\Pr_{h \in H}[h(a) = h(c)] = 1$, $\Pr_{h \in H}[h(a) = h(d)] = 0$, . . .

Universal: $H = \{h_1, h_2, h_3, h_4\}$

    x       a  b  c  d  e  f
    h1(x)   0  1  0  1  0  1
    h2(x)   0  0  0  1  1  1
    h3(x)   0  0  1  0  1  1
    h4(x)   1  0  0  1  1  0

$\Pr_{h \in H}[h(a) = h(b)] = 1/2$, $\Pr_{h \in H}[h(a) = h(c)] = 1/2$, $\Pr_{h \in H}[h(a) = h(d)] = 1/2$, $\Pr_{h \in H}[h(a) = h(e)] = 1/2$, $\Pr_{h \in H}[h(a) = h(f)] = 0$, . . .
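The two example families can be checked mechanically. The sketch below uses hypothetical names (`UniversalCheck`, `maxCollisionProbability`) and encodes each hash function as a row of values for a..f, copied from the example. It computes the largest collision probability over all pairs of distinct elements; a family is universal for n = 2 when this maximum is at most 1/2.

```java
final class UniversalCheck {
    // Rows are hash functions, columns are the elements a, b, c, d, e, f.
    static final int[][] NOT_UNIVERSAL = {
        {0, 1, 0, 1, 0, 1},   // h1
        {0, 0, 0, 1, 1, 1},   // h2
    };
    static final int[][] UNIVERSAL = {
        {0, 1, 0, 1, 0, 1},   // h1
        {0, 0, 0, 1, 1, 1},   // h2
        {0, 0, 1, 0, 1, 1},   // h3
        {1, 0, 0, 1, 1, 0},   // h4
    };

    /** Max over pairs u != v of the fraction of functions h in the family with h(u) == h(v). */
    static double maxCollisionProbability(int[][] family) {
        int universe = family[0].length;
        double worst = 0.0;
        for (int u = 0; u < universe; u++) {
            for (int v = u + 1; v < universe; v++) {
                int collisions = 0;
                for (int[] h : family) if (h[u] == h[v]) collisions++;
                worst = Math.max(worst, (double) collisions / family.length);
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        System.out.println("two functions:  " + maxCollisionProbability(NOT_UNIVERSAL));
        System.out.println("four functions: " + maxCollisionProbability(UNIVERSAL));
    }
}
```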
27Universal Hashing
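- Universal hashing property. Let H be a universal class of hash functions; let $h \in H$ be chosen uniformly at random from H; and let $u \in U$. For any subset $S \subseteq U$ of size at most n, the expected number of items in S that collide with u is at most 1.
- Pf. For any element $s \in S$, define the indicator random variable $X_s = 1$ if $h(s) = h(u)$ and 0 otherwise. Let X be a random variable counting the total number of collisions with u. Then
$E_{h \in H}[X] = E\left[\sum_{s \in S} X_s\right] = \sum_{s \in S} E[X_s] = \sum_{s \in S} \Pr[X_s = 1] \le \sum_{s \in S} \frac{1}{n} = \frac{|S|}{n} \le 1$
The second equality is linearity of expectation, the third holds because $X_s$ is a 0-1 random variable, and the inequality is universality (this assumes $u \notin S$).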
28Designing a Universal Family of Hash Functions
- Theorem. [Chebyshev 1850] There exists a prime between n and 2n.
- Modulus. Choose a prime number $p \approx n$ with $p \ge n$, as the theorem allows. (No need for randomness here.)
- Integer encoding. Identify each element $u \in U$ with a base-p integer of r digits: $x = (x_1, x_2, \ldots, x_r)$.
- Hash function. Let A = set of all r-digit, base-p integers. For each $a = (a_1, a_2, \ldots, a_r)$ where $0 \le a_i < p$, define
$h_a(x) = \left(\sum_{i=1}^{r} a_i x_i\right) \bmod p$
- Hash function family. $H = \{h_a : a \in A\}$.
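A hash family of this shape, $h_a(x) = (\sum_i a_i x_i) \bmod p$ over base-p digit vectors, might be coded as below. This is a sketch under stated assumptions: the class and method names (`DigitHash`, `hash`, `randomCoefficients`, `countCollisions`) are hypothetical, elements are already given as digit arrays with entries in [0, p), and the brute-force counter is only practical for tiny p and r.

```java
import java.util.Random;

final class DigitHash {
    /** h_a(x) = (sum of a_i * x_i) mod p, with all digits assumed to lie in [0, p). */
    static int hash(int[] a, int[] x, int p) {
        long sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum = (sum + (long) a[i] * x[i]) % p;
        }
        return (int) sum;
    }

    /** Picks a = (a_1, ..., a_r) uniformly at random, i.e. a random member of the family. */
    static int[] randomCoefficients(int r, int p, Random rng) {
        int[] a = new int[r];
        for (int i = 0; i < r; i++) a[i] = rng.nextInt(p);
        return a;
    }

    /** Counts the vectors a in [0, p)^r with h_a(x) == h_a(y), by enumerating all p^r of them. */
    static int countCollisions(int[] x, int[] y, int p) {
        int r = x.length, count = 0;
        int[] a = new int[r];
        while (true) {
            if (hash(a, x, p) == hash(a, y, p)) count++;
            int i = 0;
            while (i < r && ++a[i] == p) { a[i] = 0; i++; }   // odometer-style increment
            if (i == r) return count;                          // wrapped around: all vectors visited
        }
    }

    public static void main(String[] args) {
        int p = 5;
        int[] x = {1, 2}, y = {3, 2};
        int[] a = randomCoefficients(2, p, new Random());
        System.out.println("h_a(x) = " + hash(a, x, p) + ", h_a(y) = " + hash(a, y, p));
        System.out.println("colliding a: " + countCollisions(x, y, p) + " of " + (p * p));
    }
}
```

For distinct x and y the count comes out as $p^{r-1}$ of the $p^r$ coefficient vectors, a collision probability of 1/p.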
29Designing a Universal Class of Hash Functions
- Theorem. $H = \{h_a : a \in A\}$ is a universal class of hash functions.
- Pf. Let $x = (x_1, x_2, \ldots, x_r)$ and $y = (y_1, y_2, \ldots, y_r)$ be two distinct elements of U. We need to show that $\Pr[h_a(x) = h_a(y)] \le 1/n$.
- Since $x \ne y$, there exists an integer j such that $x_j \ne y_j$.
- We have $h_a(x) = h_a(y)$ iff
$a_j \underbrace{(y_j - x_j)}_{z} \equiv \underbrace{\sum_{i \ne j} a_i (x_i - y_i)}_{m} \pmod p$
- Can assume a was chosen uniformly at random by first selecting all coordinates $a_i$ where $i \ne j$, then selecting $a_j$ at random. Thus, we can assume $a_i$ is fixed for all coordinates $i \ne j$.
- Since p is prime, $a_j z \equiv m \pmod p$ has at most one solution among p possibilities (see lemma on next slide).
- Thus $\Pr[h_a(x) = h_a(y)] \le 1/p \le 1/n$.
30Number Theory Facts
- Fact. Let p be prime, and let $z \not\equiv 0 \pmod p$. Then $\alpha z \equiv m \pmod p$ has at most one solution $0 \le \alpha < p$.
- Pf.
- Suppose $\alpha$ and $\beta$ are two different solutions.
- Then $(\alpha - \beta) z \equiv 0 \pmod p$; hence $(\alpha - \beta) z$ is divisible by p.
- Since $z \not\equiv 0 \pmod p$, we know that z is not divisible by p; it follows that $(\alpha - \beta)$ is divisible by p.
- This implies $\alpha = \beta$, a contradiction.
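The fact is easy to check by brute force for small moduli. The sketch below is an illustration with hypothetical names (`ModularFact`, `countSolutions`): it counts the solutions $0 \le \alpha < p$ of $\alpha z \equiv m \pmod p$, and its `main` also tries a composite modulus to show why primality is needed.

```java
final class ModularFact {
    /** Number of alpha in [0, p) with alpha * z congruent to m modulo p. */
    static int countSolutions(int p, int z, int m) {
        int count = 0;
        for (int alpha = 0; alpha < p; alpha++) {
            if (Math.floorMod((long) alpha * z - m, (long) p) == 0) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int p = 7;
        for (int z = 1; z < p; z++) {
            for (int m = 0; m < p; m++) {
                System.out.println("p=7 z=" + z + " m=" + m + " solutions=" + countSolutions(p, z, m));
            }
        }
        // Composite modulus: 2 * alpha = 0 (mod 6) holds for both alpha = 0 and alpha = 3.
        System.out.println("p=6 z=2 m=0 solutions=" + countSolutions(6, 2, 0));
    }
}
```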
- Bonus fact. Can replace "at most one" with "exactly one" in the above fact.
- Pf idea. Euclid's algorithm.