A Table-Driven, Full-Sensitivity Similarity Search Algorithm

Slides: 44
Provided by: yufe1
1
A Table-Driven, Full-Sensitivity Similarity
Search Algorithm
  • Gene Myers and Richard Durbin
  • Presented by Wang, Jia-Nan and Huang, Yu-Feng

2
Outline
  • Introduction
  • Background
  • Preliminary
  • Method
  • Experiment

3
Introduction
  • Given a query and a database, perform local alignment
  • Smith-Waterman: guaranteed to find all local
    alignments, but expensive
  • BLAST
  • FASTA

4
Improvement
  • Hardware: invest more in computers and CPUs
  • Software
  • Phil Green's SWAT: appeals to sparsity and some
    machine-level coding tricks
  • 60% of the dynamic programming matrix has value 0
  • Avoid computing most of these unproductive
    entries

5
  • Focus on improving protein similarity searches
  • This approach examines and computes only 4% of the
    underlying dynamic programming matrix

6
Recall
  • Sequence alignment
  • Local sequence alignment
  • Global sequence alignment
  • Goal: find the matching path with the highest score
  • Table-based computation and dynamic programming

7
Dynamic Programming
  • Three basic components
  • Recurrence relation
  • Tabular computation
  • Traceback

8
Smith-Waterman Method
  • Dynamic programming algorithm
  • Find the most similar subsequences of two
    sequences
  • Problem
  • Lots of computation → the amount of work becomes astronomical
  • Programmer → goes crazy and gets excited
  • Why? → How can it be accelerated?
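
The dynamic programming behind Smith-Waterman can be sketched as below. This is a minimal illustration, not the presenters' code: the function name `smith_waterman` is hypothetical, and the default scores (match 8, mismatch -5, constant gap penalty -20) are assumed values taken from the "Simple Scoring Scheme" slide. Every cell of the table is computed, which is the cost the talk sets out to avoid.

```python
def smith_waterman(query, target, match=8, mismatch=-5, gap=-20):
    """Return (best_score, table) for local alignment with a constant gap penalty.

    Rows follow the target and columns follow the query; all names and default
    scores here are illustrative assumptions.
    """
    rows, cols = len(target) + 1, len(query) + 1
    table = [[0] * cols for _ in range(rows)]
    best = 0
    for j in range(1, rows):
        for i in range(1, cols):
            sub = match if query[i - 1] == target[j - 1] else mismatch
            table[j][i] = max(
                0,                          # start a new local alignment here
                table[j - 1][i - 1] + sub,  # align query[i-1] with target[j-1]
                table[j - 1][i] + gap,      # gap in the query
                table[j][i - 1] + gap,      # gap in the target
            )
            best = max(best, table[j][i])
    return best, table
```

Counting the zero entries of `table` for a pair of unrelated sequences gives a feel for how much of the matrix is unproductive.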

9
Background
  • Scoring System
  • Simple scoring scheme
  • Affine gap penalty scoring scheme
  • PAM120 (PAMn)
  • BLOSUM62 (BLOSUMn)

10
Simple Scoring Scheme
  • Match (e.g. 8)
  • Mismatch (e.g. -5)
  • Gap constant penalty (e.g. -20)
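
As a worked example of the simple scheme, the sketch below scores an alignment that is already written out as two equal-length strings with `-` marking a gap. The helper name `score_alignment` and the gapped-string representation are assumptions made for illustration; the default values are the example scores from the slide.

```python
def score_alignment(a, b, match=8, mismatch=-5, gap=-20):
    """Score two aligned, equal-length strings; '-' denotes a gap symbol."""
    if len(a) != len(b):
        raise ValueError("aligned strings must have the same length")
    total = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            total += gap        # constant penalty per gap symbol
        elif x == y:
            total += match
        else:
            total += mismatch
    return total
```

For instance, `score_alignment("ACGT", "AC-T")` adds three matches and one gap symbol.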

11
Affine Gap Penalty Scoring Scheme
  • Match (e.g. 8)
  • Mismatch (e.g. -5)
  • Gap symbol (e.g. -5)
  • Gap open penalty (e.g. -10)

12
PAM
  • PAM: Percent Accepted Mutation
  • Dayhoff et al. (1978)
  • PAM unit
  • Evolutionary time corresponding to an average of 1
    mutation per 100 residues → 1% accepted
  • PAMn
  • Relates to mutation probabilities in evolutionary
    interval of n PAM units

Some information from http://www.apl.jhu.edu/przytyck/CAMS_2004_1b.pdf
13
PAM120
Source: http://eta.embl-heidelberg.de:8000/misc/mat/pam120.html
14
BLOSUM62
  • BLOSUM: BLOcks SUbstitution Matrix
  • Steven Henikoff and Jorja G. Henikoff (1992)
  • Paper: "Amino acid substitution matrices from
    protein blocks" (PubMed)
  • BLOSUMn
  • Relates to substitution probabilities observed
    between pairs of related proteins; sequences
    above n% identity are clustered and counted as one

Some information from http://www.apl.jhu.edu/przytyck/CAMS_2004_1b.pdf
15
BLOSUM62
16
Preliminaries
  • Σ: the alphabet over which sequences are composed
  • Σ × Σ substitution matrix S giving the score of each letter pair
  • Uniform gap penalty g > 0
  • Query q_1 q_2 ... q_P of P letters
  • Target t_1 t_2 ... t_N of N letters
  • Threshold T > 0

17
Score Table → Edit Graph
Picture source: http://searchlauncher.bcm.tmc.edu/help/Pictures/S-Wexample.gif
19
Problem
  • Find a high-scoring local alignment between Query
    and Target whose path score is ≥ T
  • Edit graph: Figure 1
  • Limit our attention to prefix-positive paths
  • If there is a path of score T or greater in the
    edit graph, then there is a prefix-positive path
    of score T or greater
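
The last claim can be checked on a small example. The sketch below assumes that "prefix-positive" means every non-empty prefix of the path has a positive score, and it represents a path simply as a list of edge scores; both the representation and the function names are illustrative assumptions. Cutting the path at the last point where its running score is lowest leaves a suffix that is prefix-positive and scores at least as much as the whole path.

```python
from itertools import accumulate

def is_prefix_positive(edge_scores):
    """True if every non-empty prefix of the path has a positive score."""
    return all(total > 0 for total in accumulate(edge_scores))

def prefix_positive_suffix(edge_scores):
    """Drop the lowest-scoring prefix; what remains is prefix-positive."""
    running = [0] + list(accumulate(edge_scores))   # running[k] = score of first k edges
    lowest = min(running)
    # Cut at the LAST position attaining the minimum, so every later
    # running total is strictly larger than the total at the cut.
    cut = max(k for k, total in enumerate(running) if total == lowest)
    return edge_scores[cut:]
```

Because the running score at the cut is at most 0, removing that prefix can only raise the score, so a path of score T or greater yields a prefix-positive path of score T or greater.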

20
Definition
  • A set P of index-value pairs (i, v), with index i in [0, P]

21
The start and extension tables
  • Consider a vertex x in row j of the edit graph of
    Query vs. Target

23
Start Trimming
  • Limiting the dynamic programming to the
    startable vertices requires a table Start(w),
    where w ∈ Σ^(k_S)
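
One way such a table could be built is sketched below. The transcript does not give the definition of "startable", so this assumes a query position i is startable for a word w if some prefix-positive path leaves vertex i and consumes all k_S letters of w. The function names, the dictionary layout of the table, and the example scores are all hypothetical; the table in the original work may store more than a list of positions.

```python
from itertools import product

def simple_score(a, b):
    """Example substitution score (assumed values: match 8, mismatch -5)."""
    return 8 if a == b else -5

def reaches_last_row(query, word, start, score, gap):
    """True if a prefix-positive path leaves vertex `start` and consumes all of `word`."""
    row = {start: 0}                      # column -> best score; the empty path scores 0
    for ch in word:
        nxt = {}
        for i in range(start, len(query) + 1):
            candidates = []
            if i - 1 in row:              # substitution edge
                candidates.append(row[i - 1] + score(query[i - 1], ch))
            if i in row:                  # gap edge consuming a target letter
                candidates.append(row[i] - gap)
            if i - 1 in nxt:              # gap edge consuming a query letter
                candidates.append(nxt[i - 1] - gap)
            if candidates and max(candidates) > 0:
                nxt[i] = max(candidates)  # keep only positive prefixes
        row = nxt
    return bool(row)

def build_start_table(query, k, alphabet, score, gap):
    """Map every k-letter word to the query positions assumed startable for it."""
    table = {}
    for letters in product(alphabet, repeat=k):
        word = "".join(letters)
        table[word] = [i for i in range(len(query) + 1)
                       if reaches_last_row(query, word, i, score, gap)]
    return table
```

The table has one entry per word in Σ^(k_S), which is why its size grows quickly with k_S. During a search, the next k_S letters of the target would select the list of columns at which the dynamic programming is allowed to begin.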

24
Start Trimming
  • Worst case
  • Let α be the expected percentage of vertices that
    are seeds

25
Extension Trimming
  • A table that eliminates vertices that are not
    extendable
  • (i, j) is an extendable vertex iff
    C(i, j) > Extend(i, Target[j+1 .. j+k_E])

26
Extension Trimming
28
A Table-Driven Scheme for DP
  • Goal: restrict the SW computation to
    productive vertices
  • Jump table: captures the effect of Advance and
    Delete over k_J > 0 rows
  • Space → unmanageably large
  • But only record those for which

29
  • Jump table
  • Start table
  • Space-saving version for Jump and Start tables

30
  • Check for paths scoring T or more

32
Recall Affine Gap Penalty
  • Score
  • Match
  • Mismatch
  • Gap symbol - gsp
  • Gap open penalty - gop
  • Affine cost of gap of length k
  • g + kh, where g = gop and h = gsp
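
The affine cost is easy to tabulate. The sketch below treats gop and gsp as positive costs to be subtracted from an alignment score; the default values 10 and 5 mirror the magnitudes of the example penalties on the earlier affine scoring slide, and the function name is hypothetical.

```python
def affine_gap_cost(k, gop=10, gsp=5):
    """Cost g + k*h of a gap of length k, with g = gop and h = gsp."""
    if k < 0:
        raise ValueError("gap length must be non-negative")
    if k == 0:
        return 0                 # no gap, no cost
    return gop + k * gsp

# Opening a gap is charged once; each gap symbol is charged separately.
costs = [affine_gap_cost(k) for k in range(1, 5)]
```

Under these assumed values one gap of length 3 costs 25, whereas three separate gaps of length 1 cost 45, which is how the affine scheme favors a few long gaps over many short ones.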

33
Diagram of Affine Gap Penalty
Source: kmchao's lecture notes
34
  • Recurrence system - Gotoh
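
The slide's equations did not survive in the transcript, so the following is a standard textbook form of Gotoh's recurrences for local alignment with affine gaps, not a reconstruction of the slide. The table names C, D, and I, the function name `gotoh_local`, and the default costs are assumptions chosen for illustration.

```python
NEG_INF = float("-inf")

def gotoh_local(query, target, score, gop=10, gsp=5):
    """Best local alignment score where a gap of length k costs gop + k*gsp."""
    n, m = len(query), len(target)
    C = [[0] * (n + 1) for _ in range(m + 1)]        # best score ending at (i, j)
    D = [[NEG_INF] * (n + 1) for _ in range(m + 1)]  # ...ending in a gap that skips target letters
    I = [[NEG_INF] * (n + 1) for _ in range(m + 1)]  # ...ending in a gap that skips query letters
    best = 0
    for j in range(1, m + 1):
        for i in range(1, n + 1):
            # Extend an existing gap (pay gsp) or open a new one (pay gop + gsp).
            D[j][i] = max(D[j - 1][i] - gsp, C[j - 1][i] - gop - gsp)
            I[j][i] = max(I[j][i - 1] - gsp, C[j][i - 1] - gop - gsp)
            C[j][i] = max(
                0,
                C[j - 1][i - 1] + score(query[i - 1], target[j - 1]),
                D[j][i],
                I[j][i],
            )
            best = max(best, C[j][i])
    return best
```

Row j of D and C depends only on row j-1 of D and C, while I depends only on the current row, so a practical implementation need not keep the full tables in memory.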

35
The Case of Affine Gap Costs
  • Simple scoring scheme → affine gap penalty scheme
  • Affine edit graph and vertex structure
  • Question: how should the equations defined
    above be modified?

37
Recurrence System for Affine Gap Costs
  • Two observations
  • To compute the jth row from the (j-1)st requires
    knowing only the vectors of and values in
    row j-1, and not the values in that row
  • If then the value
    at vertex need not be
    recorded as any maximal path through its
    will have score less than the maximal path
    passing through the corresponding

38
Recurrence System
39
Results
40
Experiment
  • Method
  • Edit-graph-based approach vs. SWAT
  • Scoring matrix
  • PAM120
  • Affine gap cost
  • 8 + 4n
  • Database (target)
  • 3-million-residue subset of the PIR database
  • Query
  • A periodic clock protein of length 173 (pcp)
  • A lactate dehydrogenase of length 319 (dehydro)
  • A cGMP kinase of length 670 (kinase)
  • A growth factor of length 1210 (g factor)

41
PAM120, Gap Cost 8 + 4n
42
BLOSUM62, Gap Cost 8 + 2n
43
Ending
Thanks for Your Attention