Title: Estimating the longest increasing sequence in polylogarithmic time
1Estimating the longest increasing sequence in
polylogarithmic time
- C. Seshadhri (Sandia National Labs)
- Joint work with Michael Saks
- (Rutgers University)
Sandia National Laboratories is a multi-program
laboratory managed and operated by Sandia
Corporation, a wholly owned subsidiary of
Lockheed Martin Corporation, for the U.S.
Department of Energy's National Nuclear Security
Administration under contract DE-AC04-94AL85000
2The problem
Example array: 4 24 10 9 15 17 20 18 4 19 3
An LIS: 4 10 15 17 18 19
- Given array f: [n] → N, find (the length of) the Longest Increasing Subsequence (LIS)
- Rather self-explanatory
- By now, a textbook dynamic programming problem
- [CLRS 01] Chapter 15.4 (Longest Common Subsequence), Starred Problem 15.4-6
- [Schensted 61, Fredman 75] O(n log n) algorithm
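For reference, a minimal sketch of the classical O(n log n) computation, in the standard "smallest tail per length" formulation with binary search. The function name `lis_length` is illustrative, and the example array assumes the eleven values listed on this slide are read in the order they appear in this transcript.

```python
import bisect

def lis_length(f):
    """Length of the longest strictly increasing subsequence of f."""
    # tails[k] = smallest possible last value of an increasing
    # subsequence of length k + 1 seen so far (always sorted).
    tails = []
    for x in f:
        k = bisect.bisect_left(tails, x)  # first tail that is >= x
        if k == len(tails):
            tails.append(x)               # x extends the longest subsequence
        else:
            tails[k] = x                  # x is a smaller tail for length k + 1
    return len(tails)

example = [4, 24, 10, 9, 15, 17, 20, 18, 4, 19, 3]
print(lis_length(example))
```

This reads every array entry, which is exactly what the rest of the talk tries to avoid.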
3Too much to read
[Figure: the algorithm reads a few array positions (shown values: 5, 7, 4, 8, 9, 2) and outputs "LIS is in range [0.4n, 0.6n]"]
- Array f is extremely large, so we can't read all of it
- What can we say about the LIS length if we see very little?
- |LIS| = LIS length
- Read only poly(log n) positions
- Obviously randomized
4Uniform sample says nothing
Example array: 2 1 4 3 6 5 8 7 10 9
Sample: 4 9
- Choose a uniform random sample of poly(log n) size
- LIS = n/2, but the random sample is almost always increasing
- So it is not that easy to learn about the LIS
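A small sketch of why the uniform sample is uninformative on the array 2, 1, 4, 3, ...: the sample looks sorted unless it happens to hit both positions of one swapped pair. The helper names, the array size, the sample size and the seed are all illustrative assumptions.

```python
import bisect
import random

def lis_length(f):
    """Standard O(n log n) LIS length (strictly increasing)."""
    tails = []
    for x in f:
        k = bisect.bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    return len(tails)

def swapped_pairs(n):
    """The array 2, 1, 4, 3, ..., n, n-1 for even n."""
    out = []
    for a in range(1, n, 2):
        out.extend([a + 1, a])
    return out

def sample_is_increasing(f, positions):
    """True if the sampled values, taken in position order, are increasing."""
    values = [f[i] for i in sorted(positions)]
    return all(a < b for a, b in zip(values, values[1:]))

n = 100_000
f = swapped_pairs(n)
rng = random.Random(0)                 # fixed seed, for repeatability only
positions = rng.sample(range(n), 50)   # 50 stands in for poly(log n)

print("LIS length:", lis_length(f), "out of n =", n)
print("sample looks sorted:", sample_is_increasing(f, positions))
```

The sample fails to look sorted only when it contains both members of some pair, which is unlikely when the sample is much smaller than the square root of n.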
5Our result
[Figure: the algorithm outputs a range inside [1, n] that contains the LIS length]
- We want range to be small
6Our result
[Figure: the algorithm outputs a range of width δn inside [1, n] that contains the LIS length]
- We want range to be small
- This work: for any (constant) δ > 0
- Algorithm gives additive δn approximation to LIS
- Running time is (1/δ)^(1/δ) (log n)^c
7Our result
[Figure: number line from 1 to n with n/2 marked; the output range around the LIS has width δn. Callout: "Ad alert!"]
- [Ailon Chazelle Liu S 03], [Parnas Ron Rubinfeld 03]
- Previous best: δ = ½
8Our result
- We get a (1 + δ)-approximation to distance to monotonicity
- Previous best was a factor of 2
9Prelims the array in space
[Figure: the array values plotted as points in the plane]
10Prelims the array in space
[Figure labels: "Violation", "Increasing sequence"]
- Input is points in plane, given as array
- (LIS is longest chain in partial order)
11A hard example
[Figure: hard instance built from blocks of sizes k and 3k; labels: "10 points in each", "LIS 4k", "LIS 2k"]
- The decision for a point depends on small scale
properties of far away portions
12A hard example
[Figure: the same instance, blocks of sizes k and 3k]
- Random samples in neighborhoods of points are identical!
- Can we really estimate LIS in polylog time?
- Is it time for some heavy work?
- I mean, time for lbs (lower bounds).
13Outline (or lack thereof)
- Will I show proofs?
- No
- Will I show the algorithm?
- Maybe
- I will try to demonstrate the main insight
- By a series of thought experiments
14The dynamic program
Closest LIS point to left
Splitter
n/2
- Closest LIS point to the left gives the splitter
- Find the LIS in each blue region. Piece together!
- So we break up the original problem into subproblems
15The dynamic program
S
n/2
- But we don't know the right splitter.
- So try all possible! Only n different choices
- Choose the one that gives the largest sum of LISs
- max_S (LIS-below-S + LIS-above-S)
16The dynamic program
n/2
- If you know the LIS in all small boxes, you can build the LIS for bigger boxes
- Not the most efficient DP
- So our sublinear algo will mimic this process
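The splitter recursion from these three slides can be sketched as follows. This is one possible rendering under stated assumptions, not the talk's own code: a box is an index range [lo, hi) together with a value range (vlo, vhi], the vertical line sits at the middle index, and the candidate splitters S are the values of points in the left half of the box (plus "no point on the left"). All names are illustrative.

```python
from functools import lru_cache

def lis_by_splitters(f):
    """LIS length via max over splitters S of (LIS below-left of S) + (LIS above-right of S)."""
    n = len(f)
    if n == 0:
        return 0

    @lru_cache(maxsize=None)
    def box(lo, hi, vlo, vhi):
        # Best chain using indices in [lo, hi) and values in (vlo, vhi].
        if hi - lo == 0:
            return 0
        if hi - lo == 1:
            return 1 if vlo < f[lo] <= vhi else 0
        mid = (lo + hi) // 2
        # The true splitter is the value of the closest LIS point to the
        # left of mid; we do not know it, so we try every candidate.
        candidates = {vlo} | {f[i] for i in range(lo, mid) if vlo < f[i] <= vhi}
        return max(box(lo, mid, vlo, s) + box(mid, hi, s, vhi) for s in candidates)

    return box(0, n, min(f) - 1, max(f))

print(lis_by_splitters([4, 24, 10, 9, 15, 17, 20, 18, 4, 19, 3]))
```

As the slide says, this is far from the most efficient dynamic program; its only purpose is to expose the "find the splitter, then recurse on two boxes" structure.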
17The IP
Is this point on LIS?
LIS is in blue region
Splitter
n/2
Where is the splitter?
It is there.
18The IP
This point NOT on LIS
LIS is in blue region
n/2
Where is the splitter?
It is there.
19The IP
n/2
3n/4
I wish we knew the splitter in that region
It is there.
20The IP
n/2
3n/4
5n/8
I think I know what will happen next
You're lucky I'm here
23The interactive protocol
- If point stays in blue region till very end, then it is good (on LIS). Otherwise, bad.
- This takes (log n) steps, with the help of the wizard
24The interactive protocol
- If we could simulate the wizard
25The interactive protocol
What?? If you could simulate the wizard, you know
the LIS!
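To make the protocol concrete, here is a sketch in which the wizard cheats: it is given one fixed LIS (computed by reading the whole array) and answers each "where is the splitter?" question from it. The function names, the halving of the index range at each step, and the convention that the splitter is the value of the last LIS point strictly left of the midpoint are assumptions made for illustration.

```python
import bisect
import math

def lis_indices(f):
    """Indices of one longest strictly increasing subsequence (O(n log n))."""
    tails, tails_idx = [], []      # smallest tail value per length, and its index
    prev = [-1] * len(f)           # back-pointers for reconstruction
    for i, x in enumerate(f):
        k = bisect.bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
            tails_idx.append(i)
        else:
            tails[k] = x
            tails_idx[k] = i
        prev[i] = tails_idx[k - 1] if k > 0 else -1
    out = []
    i = tails_idx[-1] if tails_idx else -1
    while i != -1:
        out.append(i)
        i = prev[i]
    return out[::-1]

def stays_in_blue(f, i, lis_idx):
    """Run the protocol for point i; return (is_good, number_of_steps)."""
    lo, hi = 0, len(f)
    vlo, vhi = -math.inf, math.inf     # current box: values in (vlo, vhi]
    steps = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        # Wizard: value of the last LIS point with index < mid, if any.
        j = bisect.bisect_left(lis_idx, mid) - 1
        s = f[lis_idx[j]] if j >= 0 else -math.inf
        if i < mid:
            hi, vhi = mid, s           # keep the lower-left box
        else:
            lo, vlo = mid, s           # keep the upper-right box
        steps += 1
        if not (vlo < f[i] <= vhi):
            return False, steps        # point fell out of the blue region
    return True, steps

f = [4, 24, 10, 9, 15, 17, 20, 18, 4, 19, 3]
lis_idx = lis_indices(f)
good = [i for i in range(len(f)) if stays_in_blue(f, i, lis_idx)[0]]
print(good == lis_idx)
```

Each query takes at most about log2(n) rounds, but the wizard here reads the entire array, which is the objection raised on the slide.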
26Find a splitter
If very few LIS points outside blue, this is not
a bad splitter
n/2
- Finding the splitter may be hard, so try for approximate versions?
- But how do we determine the number of LIS points?
27Find a splitter
Total no. of points outside blue < µn
Conservative splitter
n/2
- If µ < 1/(100 log n), being (against health care) conservative is good enough
28Easy to check
n/2
- Count fraction of sample outside blue
- poly(log n) samples check this accurately
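A sketch of the sampling check, assuming the blue region for a splitter of value s on the vertical line at index mid is "left of the line and at most s" together with "right of the line and above s". The function names, the test array and the sample count are illustrative assumptions.

```python
import random

def is_outside_blue(f, i, mid, s):
    """Point i is outside blue if it is left-and-above or right-and-below the splitter."""
    return (f[i] <= s) != (i < mid)

def outside_fraction_exact(f, mid, s):
    """Reads the whole array; only used here to compare against the estimate."""
    return sum(is_outside_blue(f, i, mid, s) for i in range(len(f))) / len(f)

def outside_fraction_sampled(f, mid, s, num_samples, rng):
    """Estimate the fraction outside blue from uniformly random positions."""
    hits = 0
    for _ in range(num_samples):
        i = rng.randrange(len(f))
        if is_outside_blue(f, i, mid, s):
            hits += 1
    return hits / num_samples

# A sorted array with its first and last 200 entries exchanged:
# exactly 400 of the 10,000 points lie outside blue for the middle splitter.
n = 10_000
f = list(range(n))
for i in range(200):
    f[i], f[n - 1 - i] = f[n - 1 - i], f[i]

mid, s = n // 2, n // 2 - 1
rng = random.Random(1)
exact = outside_fraction_exact(f, mid, s)
estimate = outside_fraction_sampled(f, mid, s, 2000, rng)
print(exact, estimate)
```

The estimate only needs to separate "fewer than µn points outside" from "noticeably more", which is why a number of samples polynomial in log n (for µ around 1/log n) is enough.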
29Getting a conservative splitter
n/2
- We can sample (log n) different candidates and check which of them (disbelieves evolution) is conservative
- What if no conservative splitter exists?
30A liberal paradise
Choose any line
No. of points outside at least µn
n/2
- So we know that LIS < (1-µ)n
- Leads to the next idea. Boosting approximations!
- Given a δ-approx to LIS, can we improve to a smaller δ?
31Boosting approximations
[Figure: the real splitter on the line at n/2 defines two boxes containing n1 and n2 points; run the δn-approx on the points in each box. No. of points outside at least µn]
- Take sum of outputs as total LIS estimate
- LIS = LIS1 + LIS2, Est = Est1 + Est2
- |Est1 - LIS1| < δn1, |Est2 - LIS2| < δn2
- So |Est - LIS| < δ(n1 + n2)
- n1 + n2 < (1-µ)n, so |Est - LIS| < δ(1-µ)n !
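Spelled out as one chain of inequalities, writing δ for the approximation parameter of the two recursive calls; the only step not stated on the slide is the triangle inequality in the second line.

```latex
\begin{align*}
|\mathrm{Est} - \mathrm{LIS}|
  &= \bigl|(\mathrm{Est}_1 - \mathrm{LIS}_1) + (\mathrm{Est}_2 - \mathrm{LIS}_2)\bigr| \\
  &\le |\mathrm{Est}_1 - \mathrm{LIS}_1| + |\mathrm{Est}_2 - \mathrm{LIS}_2| \\
  &< \delta n_1 + \delta n_2 = \delta\,(n_1 + n_2) \\
  &< \delta\,(1 - \mu)\, n .
\end{align*}
```

The gain comes entirely from the last line: the points outside the two boxes are never charged any error.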
32Putting it together
Conservative splitter?
n/2
- Check if each is a conservative splitter
- If it is, we've found the right subproblems
- Otherwise
33Putting it together
[Figure: candidate splitter S on the line at n/2; run the δn-approx on the points in each of the two boxes]
- One of these is close enough to the real splitter
- Est(S) = Left-Est(S) + Right-Est(S)
34Putting it together
- Final estimate = max_S Est(S)
- Looks like a great idea!
- We go from δn to δ(1-µ)n. Recur to keep improving the approximation
35It fails, miserably
[Figure: recursion tree of Alg calls with 1/µ levels; approximation ½ at the bottom level, then δ2, δ1, and δ0 = δ1(1-µ) at the top]
- As we go up each level, approx gets better by (1-µ).
- So to get δ0 = ¼, how many levels are needed?
- ¼ = ½ (1-µ)^t, so t ≈ 1/µ
- We have running time at least 2^(1/µ).
- So, µ needs to be > 1/log log n.
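A quick numeric check of the level count, assuming (as the tree in the figure suggests) that every call makes two recursive calls, so a tree with t levels has 2^t leaves. The function name and the sample values of µ are illustrative.

```python
import math

def levels_needed(mu, start=0.5, target=0.25):
    """Smallest t with start * (1 - mu)**t <= target."""
    t = 0
    approx = start
    while approx > target:
        approx *= (1 - mu)   # one level up improves the approximation by (1 - mu)
        t += 1
    return t

for mu in (0.1, 0.01):
    t = levels_needed(mu)
    # Closed form: t = ln 2 / ln(1 / (1 - mu)), roughly (ln 2) / mu for small mu.
    closed_form = math.log(2) / -math.log(1 - mu)
    print(f"mu={mu}: t={t}, closed form={closed_form:.2f}, calls at the bottom level=2^{t}")
```

Since the number of calls is exponential in 1/µ, keeping it polylogarithmic in n forces 1/µ to be at most on the order of log log n.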
36Find a splitter
Total no. of points outside blue < µn
Conservative splitter
n/2
- If µ < 1/(100 log n), being (against health care) conservative is good enough
37The basic dichotomy
Continue IP
P
We find splitter
Cannot find splitter
The Interactive Protocol phase
The Dynamic Programming phase
- For IP, we need µ < 1/log n
- µn is error in each level of IP
- For DP, we need µ > 1/log log n
- (1-µ) is decrease in approximation
38The basic dichotomy
Strengthen
Continue IP
Weaken
P
We find splitter
Cannot find splitter
The Interactive Protocol phase
The Dynamic Programming phase
39Reducing to smaller DP!
n/(log n)
Run δ-approx to get LIS estimate inside box
n/(log n)
- Run δ-approx on all poly(log n) such boxes
40Reducing to smaller DP!
n/(log n)
- Run δ-approx on all poly(log n) such boxes
- Use a dynamic program to find the chain with the largest sum of estimates
- Longest path in DAG
- Can solve in poly(log n) time
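A sketch of the small dynamic program, under the simplifying assumption that the boxes are the cells of a grid and that a chain is a sequence of cells that moves strictly right and strictly up, so that increasing sequences from consecutive cells can be concatenated. The grid layout, the name `best_box_chain` and the sample estimates are illustrative, not taken from the talk.

```python
def best_box_chain(est):
    """est[c][r] = LIS estimate for the box in column c, row r.

    Returns the largest total estimate over chains of boxes with strictly
    increasing column and strictly increasing row (a longest path in the
    DAG whose edges go from a box to every box up and to the right of it).
    """
    cols = len(est)
    rows = len(est[0]) if cols else 0
    # prefix[c][r] = best chain value using only boxes with column < c and row < r.
    prefix = [[0] * (rows + 1) for _ in range(cols + 1)]
    for c in range(cols):
        for r in range(rows):
            ending_here = est[c][r] + prefix[c][r]   # best chain ending at box (c, r)
            prefix[c + 1][r + 1] = max(prefix[c][r + 1], prefix[c + 1][r], ending_here)
    return prefix[cols][rows]

estimates = [
    [3, 0, 1],
    [0, 4, 0],
    [2, 0, 5],
]
print(best_box_chain(estimates))
```

With poly(log n) boxes the table has poly(log n) entries, so this step fits within the polylogarithmic budget once the estimates are available.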
41Dichotomy theorem
Either it is easy to find the right subproblems
OR
One can go from δ-approx to (δ - δ²)-approx by a (log n) sized DP
42The algorithm, in one slide
Continue IP
P
We find splitter
Cannot find splitter
Make poly(log n) calls to δ-approx. Solve DP of poly(log n) size.
- Overall running time becomes (log n)^(1/δ)
- A miracle that the math works out
43The even better version
- Don't exactly solve this dynamic program!
- Use our sublinear algo to approximately solve it in (log log n) time. Then do it recursively
- It's painful
- It's all Greek: α β ? δ ε ? ? µ ?
- We had ?, but got rid of it
44What next?
- Sublinear dynamic programming!
- We get (1/δ)^(1/δ) (log n)^c time. Can we get (log n)/δ time?
- Would be extremely cool. Completely optimal
- Applications for other dynamic programs?
- How does one find the right subproblems in sublinear time?
- Generalize the dichotomy
- Longest common subsequence/edit distance?
45Ask and you shall know