Parallel DNA Alignment - PowerPoint PPT Presentation

Provided by: chatCa

Transcript and Presenter's Notes

1
Parallel DNA Alignment
  • Jiang Lu

2
DNA Pairwise Alignment
  • DNA stores genetic information as a sequence of the four nucleotides A, C, G, T.
  • Global alignment
  • For two similar sequences
  • Aligned end to end
  • Local alignment
  • For sequences that are not similar overall
  • Finds the best-matched part

Global:
  CTCTACGCCAGAG
  C--TAC-CCA-AG
Local:
  CTCTACGCC-AGAG
  --CTAC-CCAAG

3
Algorithm Development Timeline

  • DP: optimal but slow
  • Seed: fast, but with no guarantee of an optimal result

4
Dynamic Programming (global alignment)
[Figure: dynamic programming matrix, one axis labeled X1, X2, X3, ..., Xi-1, Xi]
  • If xi = yj, then the alignment is xi appended to the
    alignment of Xi-1 and Yj-1.
  • If xi ≠ yj, then the alignment is either the alignment
    of Xi-1 and Yj
  • or the alignment of Xi and Yj-1, whichever is better.
  • Demo: http://baba.sourceforge.net/
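
The recurrence on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: it assumes the simplest scoring (one point per matched pair, nothing for gaps), which makes the table identical to a longest-common-subsequence table. The function name `align` is hypothetical.

```python
def align(x, y):
    """Align x and y with the slide's recurrence (assumed scoring: +1 per match)."""
    n, m = len(x), len(y)
    # score[i][j] = best number of matched pairs when aligning x[:i] with y[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                # xi = yj: extend the alignment of X(i-1) and Y(j-1)
                score[i][j] = score[i - 1][j - 1] + 1
            else:
                # xi != yj: take the better of the two smaller alignments
                score[i][j] = max(score[i - 1][j], score[i][j - 1])

    # Trace back from the bottom-right corner to recover one best alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            top.append(x[i - 1])
            bottom.append(y[j - 1])
            i, j = i - 1, j - 1
        elif score[i - 1][j] >= score[i][j - 1]:
            top.append(x[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(y[j - 1])
            j -= 1
    # Whatever is left of either sequence is aligned against gaps.
    while i > 0:
        top.append(x[i - 1])
        bottom.append("-")
        i -= 1
    while j > 0:
        top.append("-")
        bottom.append(y[j - 1])
        j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), score[n][m]


if __name__ == "__main__":
    a, b, matches = align("CTCTACGCCAGAG", "CTACCCAAG")
    print(a)
    print(b)
    print("matched pairs:", matches)
```

The table has (n+1) x (m+1) cells and each cell is filled in constant time, which is why dynamic programming is optimal but slow for long sequences.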

[Figure: dynamic programming matrix, other axis labeled Y1, Y2, Y3, ..., Yj-1, Yj]
5
Fine-grained parallelism based on SIMD
  • Compute multiple rows/columns/blocks in parallel
  • Wavefront parallelism: compute along the anti-diagonals
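
To show why the diagonal order helps, here is a small sketch that fills the same kind of table one anti-diagonal at a time. It is plain sequential Python with no actual SIMD; the point is only that every cell on one anti-diagonal depends on the two previous diagonals and never on its neighbors, so a SIMD unit could compute them together. The scoring (+1 per match) and the function name are assumptions made for illustration.

```python
def table_by_wavefront(x, y):
    """Fill the alignment table anti-diagonal by anti-diagonal."""
    n, m = len(x), len(y)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    # Anti-diagonal d contains every cell (i, j) with i + j == d.
    for d in range(2, n + m + 1):
        i_lo = max(1, d - m)   # keeps j = d - i at most m
        i_hi = min(n, d - 1)   # keeps j = d - i at least 1
        # The iterations of this inner loop are independent of each other:
        # they read only diagonals d-1 and d-2, so they could run in parallel.
        for i in range(i_lo, i_hi + 1):
            j = d - i
            if x[i - 1] == y[j - 1]:
                score[i][j] = score[i - 1][j - 1] + 1
            else:
                score[i][j] = max(score[i - 1][j], score[i][j - 1])
    return score


if __name__ == "__main__":
    table = table_by_wavefront("CTCTACGCCAGAG", "CTACCCAAG")
    print("matched pairs:", table[-1][-1])
```

The result is the same table a row-by-row loop would produce; only the visiting order changes.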

6
BLAST: The most widely used tool for alignment
  • Good performance: 100 times faster than dynamic
    programming
  • Demo: http://blast.ncbi.nlm.nih.gov

7
Ordered Index Seed Algorithm (I): sequence indexing
  • A new seed-based algorithm, published by IEEE in 2008
  • It is claimed to be 528 times faster than BLAST
  • First step: encoding and indexing
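
The slides do not spell out the encoding, so the following is only a sketch of what an encode-and-index step could look like. It assumes a 2-bit code per nucleotide (A=0, C=1, G=2, T=3), seeds of length 8, and that "ordered" means shared seeds are visited in increasing code order. All function names are hypothetical.

```python
from collections import defaultdict

# Assumed 2-bit encoding; the original algorithm may use a different one.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(seed):
    """Pack a seed into an integer, two bits per nucleotide."""
    value = 0
    for base in seed:
        value = (value << 2) | CODE[base]
    return value


def build_index(sequence, k=8):
    """Map each encoded k-length seed to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(sequence) - k + 1):
        index[encode(sequence[pos:pos + k])].append(pos)
    return index


def common_seeds(index_a, index_b):
    """Yield seeds present in both indexes, in increasing code order."""
    for code in sorted(index_a.keys() & index_b.keys()):
        yield code, index_a[code], index_b[code]


if __name__ == "__main__":
    ia = build_index("GGAACTGTAACC")
    ib = build_index("TGAACTGTAACG")
    for code, pos_a, pos_b in common_seeds(ia, ib):
        print(code, pos_a, pos_b)
```

Each pair of positions sharing a seed becomes a starting point for the extension steps on the following slides.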

8
Ordered Index Seed Algorithm (II): Un-gapped extension
  • For each seed, compare the neighboring pairs in both
    directions
  • Example seed: AACTGTAA (seed length 8)
  • Skip duplicate extensions, e.g. seed AATTGCTC
  • Score each extension and keep those scoring higher than
    the threshold
  • Match: +5 points
  • Mismatch: -4 points
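
A rough sketch of un-gapped extension with the slide's scores (+5 for a match, -4 for a mismatch) follows. The slides do not say when an extension stops, so this example assumes an X-drop rule: stop once the running score falls more than `x_drop` below the best score seen so far. That rule, its default value, and the function names are assumptions.

```python
MATCH, MISMATCH = 5, -4  # scores from the slide


def extend_one_way(a, b, i, j, step, x_drop):
    """Extend from (i, j) in one direction; return (best score, length at best)."""
    score = best = 0
    length = best_len = 0
    while 0 <= i < len(a) and 0 <= j < len(b):
        score += MATCH if a[i] == b[j] else MISMATCH
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score > x_drop:
            break  # assumed X-drop stop rule, not taken from the slides
        i += step
        j += step
    return best, best_len


def ungapped_extension(a, b, pos_a, pos_b, k=8, x_drop=20):
    """Extend a k-length seed hit left and right without gaps."""
    left, left_len = extend_one_way(a, b, pos_a - 1, pos_b - 1, -1, x_drop)
    right, right_len = extend_one_way(a, b, pos_a + k, pos_b + k, +1, x_drop)
    # The seed itself is an exact match, so it contributes k * MATCH.
    return k * MATCH + left + right, left_len, right_len


if __name__ == "__main__":
    # Seed AACTGTAA starts at position 2 in both sequences.
    print(ungapped_extension("GGAACTGTAACC", "TGAACTGTAACG", 2, 2))
```

An extension would then be kept only if its total score exceeds the chosen threshold.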

9
Ordered Index Seed Algorithm (III): Gapped extension
  • During mutation, a gene sequence may be interrupted
  • Use dynamic programming to do gapped extension in the
    middle of the seed
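
As a hedged illustration of the dynamic programming used for gapped extension, the sketch below scores the best gapped alignment of two short regions. It reuses the +5/-4 scores from the previous slide and assumes a linear gap penalty of -8 per gap position; the slides give no gap penalty, so that value and the function name are illustrative only.

```python
def gapped_score(a, b, match=5, mismatch=-4, gap=-8):
    """Best end-to-end alignment score of a and b, allowing gaps."""
    # prev[j] = best score for aligning the processed prefix of a with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned against a gap
            left = curr[j - 1] + gap  # b[j-1] aligned against a gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(gapped_score("ACGT", "ACT"))
```

Only two rows of the table are kept at a time, since the score alone does not need a traceback.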

10
Coarse-grained parallel application: IBM's gene database
on Blue Gene/L
  • Processes 2 million alignments against 2.5 million
    sequences per day; the fastest when published
    in 2005
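
Coarse-grained parallelism means whole alignments are handed out to different workers, rather than splitting one alignment across lanes. The sketch below shows only that structure and says nothing about how the Blue Gene/L system was built. It uses a thread pool for portability, where a real deployment would more likely use separate processes or machines, and `similarity` is a deliberately trivial placeholder for a real alignment score. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def similarity(a, b):
    """Placeholder score: number of positions where the sequences agree."""
    return sum(1 for p, q in zip(a, b) if p == q)


def best_hit(query, database):
    """Return the database sequence that scores highest against the query."""
    return max(database, key=lambda seq: similarity(query, seq))


def align_all(queries, database, workers=4):
    """Give each query to a worker; the queries are independent of each other."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the queries.
        return list(pool.map(lambda q: best_hit(q, database), queries))


if __name__ == "__main__":
    db = ["ACGT", "TTTT", "GGCC"]
    print(align_all(["ACGA", "TTTA", "GGCA"], db))
```

Because no query depends on another, throughput scales with the number of workers until the database access becomes the bottleneck.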

11
Performance

12
Question Sheet
  • What are the two DNA alignment types?
  • What are the two main algorithm categories of
    pairwise DNA alignment?
  • What is the most widely used DNA alignment
    algorithm?