On the Range Maximum-Sum Segment Query Problem

1
On the Range Maximum-Sum Segment Query Problem
  • Kuan-Yu Chen and Kun-Mao Chao
  • Department of Computer Science and Information
    Engineering,
  • National Taiwan University, Taiwan

2
Outline
  • Motivations
    - Problems arising from some bioinformatics
      applications
  • Defining the RMSQ problem
  • Our main idea
    - Reducing the RMSQ problem to the Range Minima
      Query (RMQ) problem
  • Conclusions and applications
    - Solving some relevant problems in O(n) time

3
Applications to Biomolecular Sequence Analysis
  • Locating conserved regions or GC-rich regions in
    a DNA sequence
  • Assign a real number (a score) to each
    residue
  • Look for the maximum-sum or maximum-average
    segments
  • Add length constraints or average constraints

4
An example: Locating GC-rich regions (1)
  • One reasonable scoring expression to measure the
    richness of a region is $x - pl$, where x is the
    CG count of the region, l is the length of the
    region, and p is a positive ratio constant.
  • The goal is to design an algorithm that reports the
    region maximizing the expression $x - pl$.

5
An example: Locating GC-rich regions (2)
  • Let x be the CG count of the region and y be
    the AT count of the region, so that l = x + y
  • Hence, we have
  • $x - pl = x - p(x + y) = (1 - p)x - py$
  • Therefore, to calculate the value of $x - pl$,
    one can assign
  • $w(G) = w(C) = 1 - p$
  • $w(A) = w(T) = -p$
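The weight assignment above can be checked with a small sketch. It assumes a region is given as a string over A, C, G, T, and the helper names `gc_scores` and `richness` are hypothetical. Summing the per-base scores gives the same value as evaluating $x - pl$ directly.

```python
# Sketch with hypothetical helper names; not code from the slides.
def gc_scores(region, p):
    """Per-base scores from the slide: G and C score 1 - p, A and T score -p."""
    weight = {"G": 1 - p, "C": 1 - p, "A": -p, "T": -p}
    return [weight[base] for base in region.upper()]

def richness(region, p):
    """Direct evaluation of x - p*l, with x the CG count and l the region length."""
    x = sum(base in "GC" for base in region.upper())
    return x - p * len(region)

region = "GCGATC"                    # example region chosen for illustration
print(sum(gc_scores(region, 0.5)))   # 1.0
print(richness(region, 0.5))         # 1.0
```

Because the score is additive over residues, the search for the richest region becomes a search for a maximum-sum segment of the score sequence.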

6
The Maximum-Sum Segment
  • Also called the maximum-sum interval or the
    maximum-scoring region
  • Given a sequence of numbers, the maximum-sum
    segment is simply the contiguous subsequence
    having the greatest total sum.
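  • $\langle 5, -5.1, 1, 3, -4, 2, 3, -4, 7 \rangle$

The maximum-sum segment has total sum 8.
Zero-sum prefixes or suffixes are possible, so the maximum-sum segment need not be unique: both $\langle 1, 3, -4, 2, 3, -4, 7 \rangle$ and $\langle 2, 3, -4, 7 \rangle$ sum to 8.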
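For reference, the maximum-sum segment of a whole sequence can be found in one linear scan. The sketch below uses the classic Kadane-style scan, which is not a method from these slides. The function name is hypothetical and indices are 0-based. The scan restarts whenever the running sum is not positive, so of the two segments with sum 8 it reports the shorter one, $\langle 2, 3, -4, 7 \rangle$.

```python
# Sketch with a hypothetical function name; classic Kadane-style scan.
def max_sum_segment(a):
    """Return (best_sum, start, end) with 0-based inclusive indices."""
    best = (a[0], 0, 0)
    cur, start = a[0], 0
    for t in range(1, len(a)):
        if cur <= 0:              # a nonpositive running sum never helps: restart at t
            cur, start = a[t], t
        else:
            cur += a[t]
        if cur > best[0]:         # strict: keep the earliest end on ties
            best = (cur, start, t)
    return best

seq = [5, -5.1, 1, 3, -4, 2, 3, -4, 7]
print(max_sum_segment(seq))   # (8, 5, 8)
```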
7
A Relevant Problem - RMQ
  • Range Minima (Maxima) Query Problem (also called
    Discrete Range Searching)
  • Given a sequence of numbers, by preprocessing the
    sequence we wish to retrieve the minimum
    (maximum) value within a given querying interval
    efficiently
  • $\langle 5, -5.1, 1, 3, -4, 2, 3, -4, 7 \rangle$

(Figure: the minimum and the maximum within a query interval are highlighted.)
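A simple way to answer range minima queries is a sparse table. This is a stand-in and not the ⟨O(n), O(1)⟩ structure the later slides assume: it needs O(n log n) preprocessing, but it still answers each query in O(1). Range maxima are obtained by negating the values. The class and method names are hypothetical, and indices are 0-based and inclusive.

```python
# Sketch with hypothetical names: sparse-table RMQ, O(n log n) preprocessing, O(1) query.
class SparseTableRMQ:
    def __init__(self, values):
        self.values = list(values)
        n = len(self.values)
        # table[k][i] = index of a minimum of values[i : i + 2**k]
        self.table = [list(range(n))]
        k = 1
        while (1 << k) <= n:
            prev, half = self.table[-1], 1 << (k - 1)
            row = []
            for i in range(n - (1 << k) + 1):
                l, r = prev[i], prev[i + half]
                row.append(l if self.values[l] <= self.values[r] else r)
            self.table.append(row)
            k += 1

    def argmin(self, lo, hi):
        """Index of a minimum of values[lo..hi] (inclusive), via two overlapping blocks."""
        k = (hi - lo + 1).bit_length() - 1
        l, r = self.table[k][lo], self.table[k][hi - (1 << k) + 1]
        return l if self.values[l] <= self.values[r] else r

seq = [5, -5.1, 1, 3, -4, 2, 3, -4, 7]
rmq_min = SparseTableRMQ(seq)
rmq_max = SparseTableRMQ([-v for v in seq])   # range maxima via negated values
print(seq[rmq_min.argmin(2, 6)])   # -4
print(seq[rmq_max.argmin(2, 6)])   # 3
```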
8
Range Maximum-Sum Segment Query Problem
  • Definition
  • The input is a sequence $\langle a_1, a_2, \ldots, a_n \rangle$ of real
    numbers, which is to be preprocessed.
  • A query consists of two intervals S and E.
  • Our goal is to return the maximum-sum segment
    whose starting index lies in S and whose ending
    index lies in E (the start may not come after the end).
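The definition can be pinned down with a brute-force reference, which is useful for checking faster methods. This sketch assumes 1-based, inclusive intervals given as pairs S = (s1, s2) and E = (e1, e2), and it requires i ≤ j. The function name is hypothetical. It costs O(|S|·|E|) time per query after an O(n) prefix-sum pass.

```python
# Sketch with a hypothetical name: brute-force reference for one RMSQ query.
def rmsq_bruteforce(a, S, E):
    """Best (sum, i, j) with i in S, j in E and i <= j (1-based); None if no such pair."""
    P = [0]
    for v in a:
        P.append(P[-1] + v)           # P[t] = a_1 + ... + a_t
    best = None
    for i in range(S[0], S[1] + 1):
        for j in range(max(i, E[0]), E[1] + 1):
            s = P[j] - P[i - 1]
            if best is None or s > best[0]:
                best = (s, i, j)
    return best

a = [9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3]
print(rmsq_bruteforce(a, (1, 15), (1, 15)))   # (9, 1, 1); a11..a13 also sums to 9
print(rmsq_bruteforce(a, (1, 3), (5, 7)))     # (6, 3, 5)
```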

9
A Nonoverlapping Example
  • Input Sequence
  • 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4,
    -5, 3

(Figure: a starting region and a disjoint end region are marked; the best segment between them has total sum 6.)
10
An Overlapping Example
  • Input Sequence
  • 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4,
    -5, 3

(Figure: the starting region and the end region overlap; the best segment has total sum 8.)
11
Our Results
  • We propose an algorithm that runs in O(n)
    preprocessing time and O(1) query time under the
    unit-cost RAM model.
  • In fact, we show that RMSQ and RMQ are
    computationally linearly equivalent.
  • We show that the RMSQ techniques yield
    alternative O(n)-time algorithms for the
    following problems:
  • The maximum-sum segment with length constraints
  • All maximal-sum segments

12
Strategy
  • Reduce the RMSQ to the RMQ problem
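  • Theorem. If there is an $\langle f(n), g(n) \rangle$-time solution
    for the RMQ problem, then there is an $\langle f(n) + O(n), g(n) + O(1) \rangle$-time
    solution for the RMSQ problem, where a pair gives
    (preprocessing time, query time).

(Diagram: RMSQ is reduced to RMQ with O(n) extra preprocessing time and O(1) extra query time.)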
13
Cumulative Sum / Prefix Sum
$\text{prefix-sum}(i) = a_1 + a_2 + \cdots + a_i$
14
Computing sum(i,j) in O(1) time
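  • $\text{prefix-sum}(i) = a_1 + a_2 + \cdots + a_i$, with $\text{prefix-sum}(0) = 0$
  • All n prefix sums are computable in O(n) time.
  • $\text{sum}(i, j) = \text{prefix-sum}(j) - \text{prefix-sum}(i-1)$

(Figure: on the prefix-sum curve, sum(i, j) is the rise from position i - 1 to position j.)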
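The prefix-sum trick fits in a few lines. This is a sketch with hypothetical helper names: i and j are 1-based, and prefix-sum(0) = 0 is stored at index 0.

```python
# Sketch with hypothetical helper names.
from itertools import accumulate

def prefix_sums(a):
    """P[0] = 0 and P[i] = a_1 + ... + a_i, computed in O(n) time."""
    return [0] + list(accumulate(a))

def seg_sum(P, i, j):
    """sum(i, j) = prefix-sum(j) - prefix-sum(i - 1) for 1 <= i <= j, in O(1) time."""
    return P[j] - P[i - 1]

a = [9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3]
P = prefix_sums(a)
print(P)                     # [0, 9, -1, 3, 1, 5, 0, 4, 1, 7, -4, 4, 1, 5, 0, 3]
print(seg_sum(P, 11, 13))    # 9, i.e. 8 - 3 + 4
```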
15
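Case 1: Nonoverlapping
  • $\text{sum}(i, j) = \text{prefix-sum}(j) - \text{prefix-sum}(i-1)$: to
    maximize it, maximize prefix-sum(j) over j in E and
    minimize prefix-sum(i - 1) over i in S.
  • Prefix-sum sequence of the input
  • 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4,
    -5, 3

Both are range minima (maxima) queries on the prefix sums: find the highest point over the end region, and the lowest point over the starting region shifted one position left.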
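Case 1 translates directly into two range queries on the prefix sums. In this sketch (hypothetical name, 1-based inclusive intervals), plain `max`/`min` scans over the ranges stand in for the O(1) RMQ lookups. The function assumes that every start precedes every end, that is, s2 ≤ e1.

```python
# Sketch with a hypothetical name; linear scans stand in for O(1) RMQ lookups.
def rmsq_nonoverlapping(a, S, E):
    """Case 1: S = (s1, s2), E = (e1, e2) with s2 <= e1. Returns (sum, start, end), 1-based."""
    (s1, s2), (e1, e2) = S, E
    if s2 > e1:
        raise ValueError("Case 1 needs s2 <= e1")
    P = [0]
    for v in a:
        P.append(P[-1] + v)
    # Highest point over the end region maximizes prefix-sum(j).
    j = max(range(e1, e2 + 1), key=lambda t: P[t])
    # Lowest point over positions i - 1 for i in S minimizes prefix-sum(i - 1).
    k = min(range(s1 - 1, s2), key=lambda t: P[t])
    return P[j] - P[k], k + 1, j

a = [9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3]
print(rmsq_nonoverlapping(a, (1, 3), (5, 7)))   # (6, 3, 5)
```

The intervals S = [1, 3] and E = [5, 7] are chosen here only for illustration. The answer is a3..a5 = ⟨4, -2, 4⟩.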
16
Case 2: Overlapping
  • Choosing the two points independently may fail
  • Prefix-sum sequence of the input
  • 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4,
    -5, 3

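(Figure: the lowest point found in the starting region can lie to the right of the highest point found in the end region. The stretch between them has a negative sum, so the two points do not form a valid segment.)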
17
Case 2: Overlapping
  • Divide the query into three possible cases
  • Prefix-sum sequence of the input
  • 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4,
    -5, 3

(Figure: two of the three cases are nonoverlapping. As in Case 1, they are answered by finding the highest and lowest points with range minima queries, with preprocessing time f(n) and query time g(n). In the remaining case, both ends lie in the overlap of the two regions. What should we do?)
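Here is one way to split an overlapping query. The intervals are first clipped so that s1 ≤ e1 and s2 ≤ e2, since starts after e2 and ends before s1 can never be used. If s1 ≤ e1 < s2 ≤ e2, every valid pair then falls into one of three cases. In (a), the start is in [s1, e1] and the end is in [e1, e2]. In (b), the start is in [e1, s2] and the end is in [s2, e2]. In (c), both the start and the end are in [e1, s2]. Cases (a) and (b) are nonoverlapping and are handled as in Case 1, while (c) is the single range query. This split is our assumption of what the three cases are. The helper names are hypothetical, and linear scans stand in for RMQ and for the single-range method.

```python
# Sketch with hypothetical helper names; the three-way split is an assumption.
def prefix(a):
    P = [0]
    for v in a:
        P.append(P[-1] + v)
    return P

def nonoverlapping(P, s1, s2, e1, e2):
    """Case 1 (s2 <= e1): pick the highest end and the lowest start independently."""
    j = max(range(e1, e2 + 1), key=lambda t: P[t])
    k = min(range(s1 - 1, s2), key=lambda t: P[t])
    return P[j] - P[k], k + 1, j

def single_range(P, lo, hi):
    """Both ends in [lo, hi]; a linear scan stands in for the RMQ-based method."""
    best, k = None, lo - 1             # k: lowest prefix-sum point seen so far
    for j in range(lo, hi + 1):
        if best is None or P[j] - P[k] > best[0]:
            best = (P[j] - P[k], k + 1, j)
        if P[j] < P[k]:
            k = j
    return best

def rmsq(a, S, E):
    """Maximum-sum segment with start in S and end in E (1-based, inclusive)."""
    P = prefix(a)
    (s1, s2), (e1, e2) = S, E
    s2, e1 = min(s2, e2), max(e1, s1)  # starts after e2 / ends before s1 are useless
    if s1 > s2 or e1 > e2:
        return None                    # no start can precede an end
    if s2 <= e1:
        return nonoverlapping(P, s1, s2, e1, e2)
    # Overlap, s1 <= e1 < s2 <= e2: three subproblems cover every valid pair.
    return max(nonoverlapping(P, s1, e1, e1, e2),   # (a) start in [s1, e1], end in [e1, e2]
               nonoverlapping(P, e1, s2, s2, e2),   # (b) start in [e1, s2], end in [s2, e2]
               single_range(P, e1, s2),             # (c) both ends in [e1, s2]
               key=lambda r: r[0])

a = [9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3]
print(rmsq(a, (1, 9), (5, 15)))   # (8, 3, 9)
```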
18
Dealing with the Special Case: Single Range Query
  • Input Sequence
  • 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4,
    -5, 3
  • Challenge: Can this special case be reduced to
    the RMQ problem?

(Figure: the maximum-sum segment within the query range has total sum 6.)
19
Reduction Procedure
  • Step 1. Find a partner for each index.
  • Step 2. Record the sum of each pair in an array
  • Step 3. Retrieve the maximum-sum pair by applying
    the RMQ techniques

20
Our First Attempt (1)
  • Step 1: For each index i, define the lowest
    point preceding i as its partner
  • Prefix-sum sequence

(Figure: partner(i) is the lowest point on the prefix-sum curve in the region to the left of i.)
21
Our First Attempt (2)
  • Step 2: Record sum(partner(i), i) in an array

(Figure: for each i, the rise from its lowest point partner(i) up to i is recorded as sum(partner(i), i).)
22
Our First Attempt (3)
  • Step 3: Apply RMQ techniques to the array

(Figure: applying RMQ to the sequence of sum(partner(i), i) values over the querying interval retrieves the maximum-sum pair.)
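The first attempt can be sketched directly. The function name is hypothetical. Positions are prefix-sum indices, so the pair (partner(i), i) stands for the segment from element partner(i) + 1 to element i, with sum P[i] - P[partner(i)]. The sketch answers whole-sequence queries correctly but shows the difficulty raised on the next slide.

```python
# Sketch with a hypothetical name: partner(i) = lowest prefix-sum point before i.
def first_attempt(a):
    P = [0]
    for v in a:
        P.append(P[-1] + v)
    partner = [None] * len(P)
    pair_sum = [None] * len(P)
    low = 0                                  # lowest point among positions 0 .. i-1
    for i in range(1, len(P)):
        partner[i] = low
        pair_sum[i] = P[i] - P[low]          # Step 2: record the pair's sum
        if P[i] < P[low]:
            low = i
    return P, partner, pair_sum

a = [9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3]
P, partner, pair_sum = first_attempt(a)
print(max(pair_sum[1:]))          # 9: maximum-sum segment of the whole sequence
# Query elements 12..15: the best stored pair there is i = 13, but partner(13) = 10
# would start the segment at element 11, outside the interval.
print(partner[13], pair_sum[13])  # 10 9
```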
23
Running into Difficulties
  • What if partner(i) lies outside the querying
    interval?

(Figure: when partner(i) lies outside the querying interval, the pair (partner(i), i) needs to be updated, and we might have to update every pair!)
24
A Better Partner
  • How? On the prefix-sum sequence:
  • Left_bound(i): find the nearest point to the left
    of i that is at least as large as prefix-sum(i)
  • New partner(i): find the lowest point between
    Left_bound(i) and i
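Left_bound(i) and the better partner can be computed for all i in one left-to-right pass with a stack. The sketch below is our own O(n) construction and not necessarily the authors'. The names are hypothetical, and positions are prefix-sum indices 0..n. When several points tie for lowest, the rightmost one is taken, which keeps the inequalities of the Minimal-Maximal Property strict. When nothing lies strictly between Left_bound(i) and i, the partner is i - 1, so the pair is just a_i.

```python
# Sketch with hypothetical names: stack-based Left_bound(i) and better partner(i).
def better_partners(a):
    """left_bound[i]: nearest k < i with P[k] >= P[i] (-1 if none).
    partner[i]: rightmost lowest point strictly between left_bound[i] and i, else i - 1."""
    n = len(a)
    P = [0] * (n + 1)
    for t in range(1, n + 1):
        P[t] = P[t - 1] + a[t - 1]
    left_bound = [-1] * (n + 1)
    partner = [-1] * (n + 1)
    # Each stack entry (s, m): m is the rightmost lowest point in (s', s],
    # where s' is the entry just below s on the stack (or -1).
    stack = [(0, 0)]
    for i in range(1, n + 1):
        best = None
        while stack and P[stack[-1][0]] < P[i]:   # pop points lower than P[i]
            _, m = stack.pop()
            # Entries popped later cover ranges further left; keep the right one on ties.
            if best is None or P[m] < P[best]:
                best = m
        left_bound[i] = stack[-1][0] if stack else -1
        partner[i] = i - 1 if best is None else best
        stack.append((i, i if best is None else best))
    return P, left_bound, partner

a = [9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3]
P, left_bound, partner = better_partners(a)
print(partner[1:])   # [0, 1, 2, 3, 2, 5, 6, 7, 2, 9, 10, 11, 10, 13, 14]
```

On this sequence, partner(15) is 14 (the pair is just a15 = 3), although the lowest point before 15 is 10. The longer segment a11..a15 (sum 7) is never needed, because stopping at a13 gives 9.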
25
Why Is It Better? (1)
  • It remains the best choice.
  • It saves many update steps.
  • It turns out that at most one partner needs to be
    updated per query.

26
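Why Is It Better? (2): Remains the Best
(Figure: Left_bound(i) is the nearest point to the left of i that is at least as large as prefix-sum(i), and partner(i) is the lowest point between them. Points before Left_bound(i) form an impossible region: from such a start, stopping at Left_bound(i) is never worse than continuing to i.)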
27
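Why Is It Better? (3): Minimal-Maximal Property
  • $\text{Height}(\text{partner}(i)) < \text{Height}(j) < \text{Height}(i)$ for all $\text{partner}(i) < j < i$,
    where Height is the prefix sum

(Figure: i is a maximal point and partner(i) a minimal point of the stretch between them. No point in between is higher than i or lower than partner(i). The next higher point to the left of i is Left_bound(i).)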
28
Why Is It Better? (4): Save Some Updates
  • Prefix-sum sequence

(Figure: within the querying interval, the points between partner(i) and i are all lower than i, so none of them can be the right end of the maximum-sum segment: extending such a segment to i increases its sum.)
29
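Why Is It Better? (5): Nesting Property
  • For two indices i < j, it cannot be the case that
    partner(i) < partner(j) < i < j: two pairs never
    cross, so they are either nested or side by side.

(Figure: the configuration in which the pairs (partner(i), i) and (partner(j), j) cross, each running from a minimal point to a maximal point, is impossible.)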
30
Why Is It Better? (6): An example
  • Crossing pairs are not allowed
  • 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4,
    -5, 3
  • Nesting Property: pairs may nest
  • 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4,
    -5, 3

31
When a Query Comes: Case 1, No Exceeding
  • The maximum pair (partner(i), i) lies in the
    querying interval

(Figure: retrieve the maximum pair within the querying interval; both i and partner(i) lie inside it.)
We are done. Output (partner(i), i).
32
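When a Query Comes: Case 2, Exceeding
  • The maximum pair (partner(i), i) goes beyond the
    querying interval
  • (partner(i), i) is the maximum pair retrieved, but
    partner(i) lies to the left of the querying interval.
  • Points between the left end of the interval and i
    are lower than i (Minimal-Maximal Property), so
    they cannot be the right end of the maximum-sum
    segment.
  • Update partner(i): new_partner(i) is the lowest
    point in the interval to the left of i.
  • By the Nesting Property and the maximality of
    (partner(i), i), every j to the right of i keeps a
    partner inside the interval; retrieve the maximum
    pair (partner(j), j) among them.
  • Compare (new_partner(i), i) and (partner(j), j).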
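Combining Case 1 and Case 2 gives a query procedure for a single range of elements x..y, where the start positions on the prefix-sum curve are x - 1 .. y - 1. In this sketch (hypothetical names, 1-based elements), `max`/`min` over ranges stand in for the O(1) RMQ lookups. Partners are computed with a stack as the rightmost lowest point between Left_bound(i) and i, so the block runs on its own.

```python
# Sketch with hypothetical names; linear scans stand in for O(1) RMQ lookups.
def preprocess(a):
    """Prefix sums P, better partners, and pair values V[i] = P[i] - P[partner[i]]."""
    n = len(a)
    P = [0] * (n + 1)
    for t in range(1, n + 1):
        P[t] = P[t - 1] + a[t - 1]
    partner = [-1] * (n + 1)
    stack = [(0, 0)]   # (point, rightmost lowest point in its stack segment)
    for i in range(1, n + 1):
        best = None
        while stack and P[stack[-1][0]] < P[i]:
            m = stack.pop()[1]
            if best is None or P[m] < P[best]:
                best = m
        partner[i] = i - 1 if best is None else best
        stack.append((i, i if best is None else best))
    V = [None] + [P[i] - P[partner[i]] for i in range(1, n + 1)]
    return P, partner, V

def single_range_query(P, partner, V, x, y):
    """Maximum-sum segment with both ends in elements x..y; returns (sum, start, end)."""
    i = max(range(x, y + 1), key=lambda t: V[t])    # retrieve the maximum pair
    if partner[i] >= x - 1:                          # Case 1: no exceeding
        return V[i], partner[i] + 1, i
    # Case 2: ends in x..i-1 are lower than i and cannot win; give i a new partner.
    k = min(range(x - 1, i), key=lambda t: P[t])
    best = (P[i] - P[k], k + 1, i)
    if i < y:
        # Every end j in i+1..y has partner(j) >= i (nesting), so its pair is valid.
        j = max(range(i + 1, y + 1), key=lambda t: V[t])
        if V[j] > best[0]:
            best = (V[j], partner[j] + 1, j)
    return best

a = [9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3]
P, partner, V = preprocess(a)
print(single_range_query(P, partner, V, 1, 15))    # (9, 1, 1)
print(single_range_query(P, partner, V, 12, 15))   # (4, 13, 13)
```

The second query illustrates Case 2. The best pair in 12..15 is i = 13, whose partner(13) = 10 lies outside the interval, so 13 gets the new partner 12 and the answer is a13 = 4 alone.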
33
Time Complexity
  • RMSQ can be reduced to the RMQ problem in O(n)
    time
  • Since there is an $\langle O(n), O(1) \rangle$-time solution for the RMQ
    problem under the unit-cost RAM model, there is an
    $\langle O(n), O(1) \rangle$-time solution for the RMSQ problem.
  • On the other hand, RMQ can be reduced to the RMSQ
    problem in O(n) time, too. (Range Maxima Query:
    between each two adjacent elements, we insert a
    negative number whose absolute value is larger
    than both of them.)

(Diagram: RMQ is reduced to RMSQ with O(n) extra preprocessing time and O(1) extra query time.)
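The reverse reduction can be sketched as follows. The names are hypothetical, and a Kadane-style linear scan stands in for an O(1) RMSQ single-range query. For simplicity, a single separator value -M, with M larger than every |a_i|, is used everywhere. In the augmented sequence, any segment spanning two or more original elements sums to less than its largest element, so the maximum-sum segment of a range is a single element: the range maximum.

```python
# Sketch with hypothetical names; a global M simplifies "larger than both neighbours".
def augment(a):
    """Insert -M between every two adjacent elements, with M > every |a_i|."""
    M = max(abs(v) for v in a) + 1
    out = [a[0]]
    for v in a[1:]:
        out += [-M, v]
    return out

def max_segment_sum(b, lo, hi):
    """Largest sum of a segment of b[lo..hi] (1-based, inclusive); stands in for RMSQ."""
    best = cur = b[lo - 1]
    for v in b[lo:hi]:
        cur = v if cur <= 0 else cur + v
        best = max(best, cur)
    return best

def range_max(b, lo, hi):
    """Maximum of a[lo..hi] (1-based): element a_t sits at position 2t - 1 of b."""
    return max_segment_sum(b, 2 * lo - 1, 2 * hi - 1)

a = [9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3]
b = augment(a)
print(range_max(b, 2, 9))   # 6
```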
34
Use RMSQ Techniques to Solve Two Relevant
Problems
  • 1. Finding the Maximum-Sum Segment with length
    constraints in O(n) time.
  • - Y.-L. Lin, T. Jiang, K.-M. Chao, 2002
  • - T.-H. Fan et al., 2003
  • 2. Finding all maximal scoring subsequences in
    O(n) time.
  • - W. L. Ruzzo and M. Tompa, 1999

35
Problem 1: The Maximum-Sum Segment with Length
Constraints
  • Lin, Jiang, and Chao (JCSS 2002) and Fan et al.
    (CIAA 2003) gave O(n)-time algorithms for this
    problem.
  • Length at least L and at most U

(Figure: candidate segments have lengths between L and U.)
36
Problem 1: Finding the Maximum-Sum Segment with
Length Constraints
  • Length at least L, at most U
  • For each index i, find the maximum-sum segment
    whose starting point lies in $[i - U + 1, i - L + 1]$
    and whose end point is i

(Figure: for each i, one RMSQ query with starting region [i - U + 1, i - L + 1] and end region {i}.)
Runs in O(n) time, since each of the n queries costs O(1) time
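Problem 1 needs only one query per end index i. Because the end region is the single index {i}, each query reduces to a range minimum of prefix sums over positions [i - U, i - L]. The sketch below answers those minima with a sliding-window monotonic deque instead of the general RMSQ structure. This is a swapped-in technique that also runs in O(n) overall. The names are hypothetical.

```python
# Sketch with a hypothetical name; a monotonic deque replaces the per-index RMSQ query.
from collections import deque

def max_sum_with_length(a, L, U):
    """Maximum-sum segment with length in [L, U] (1 <= L <= U <= len(a)).
    Returns (sum, start, end), 1-based."""
    n = len(a)
    P = [0] * (n + 1)
    for t in range(1, n + 1):
        P[t] = P[t - 1] + a[t - 1]
    best = None
    window = deque()                  # candidate k's, P[k] increasing front to back
    for i in range(L, n + 1):
        k_new = i - L                 # start position k = s - 1 allowed from end i on
        while window and P[window[-1]] >= P[k_new]:
            window.pop()
        window.append(k_new)
        while window[0] < i - U:      # segment would be longer than U
            window.popleft()
        k = window[0]                 # lowest prefix sum over [i - U, i - L]
        if best is None or P[i] - P[k] > best[0]:
            best = (P[i] - P[k], k + 1, i)
    return best

a = [9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3]
print(max_sum_with_length(a, 2, 3))   # (9, 11, 13)
```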
37
Problem 2 All Maximal-Sum Segments
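  • Ruzzo and Tompa (ISMB 1999) gave an O(n)-time
    algorithm for this problem.
  • Recursive definition.

(Figure: the maximum-sum segment S splits the sequence into a left part L(S) and a right part R(S), which are handled recursively.)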
38
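Problem 2: Finding All Maximal Scoring
Subsequences
  • Recursive calls.
  • Input sequence

(Figure: each recursive call issues one RMSQ query on its part of the sequence, L(S) or R(S), to find the next segment S.)
Runs in O(n) time, since O(n) queries are issued and each costs O(1) time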
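The recursion can be sketched as follows. The names are hypothetical, and a linear scan stands in for each O(1) RMSQ query, so this version is O(n²) in the worst case rather than O(n). It assumes that ties are broken toward a maximum-sum segment with no zero-sum proper prefix or suffix. That is our reading of the definition, not something the slides state. The example sequence is our own choice.

```python
# Sketch with hypothetical names; linear scans stand in for O(1) RMSQ queries.
def all_maximal_segments(a):
    """Take the maximum-sum segment S of the current range, recurse on L(S) and R(S),
    and stop when no positive-sum segment remains. Returns 0-based (start, end, sum)."""
    def best_in(lo, hi):
        # Kadane-style scan over a[lo:hi]. Restarting on a nonpositive running sum and
        # updating only on strict improvement avoids zero-sum proper prefixes/suffixes.
        best, cur, start = (a[lo], lo, lo), a[lo], lo
        for t in range(lo + 1, hi):
            if cur <= 0:
                cur, start = a[t], t
            else:
                cur += a[t]
            if cur > best[0]:
                best = (cur, start, t)
        return best

    out = []

    def recurse(lo, hi):
        if lo >= hi:
            return
        s, i, j = best_in(lo, hi)
        if s <= 0:
            return
        recurse(lo, i)              # L(S)
        out.append((i, j, s))
        recurse(j + 1, hi)          # R(S)

    recurse(0, len(a))
    return out

print(all_maximal_segments([4, -5, 3, -3, 1, 2, -2, 2, -2, 1, 5]))
# [(0, 0, 4), (2, 2, 3), (4, 10, 7)]
```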