A Million-Fold Speed Improvement in Genomic Repeats Detection

1
A Million-Fold Speed Improvement in Genomic
Repeats Detection
  • John W. Romein
  • Jaap Heringa
  • Henri E. Bal
  • Vrije Universiteit
  • Faculty of Sciences, Department of Computer
    Science
  • Bio-Informatics Group, Computer Systems Group
  • Amsterdam, the Netherlands

Vrije Universiteit, Amsterdam
2
repeats in bio sequences
  • important to detect
  • essential for evolution
  • protein structure and function
  • diseases
  • hard to detect
  • any length
  • mutations
  • insertions/deletions → different fragment sizes
  • tandem and distant

3
repro
  • delineates repeats
  • sensitive
  • two phases
  • find top alignments (slow)
  • find repeats
  • replaced phase 1
  • old algorithm
  • O(n^4) → n < 2,000
  • new algorithm
  • O(n^3) → n < 60,000
  • 3-level parallel: SIMD, SMP, cluster

4
sidestep: sequence alignment
  • superpose two sequences (TATGCAG, TCTGAG)
  • match symbols vertically (good: 2, bad: -1)
  • allow gaps (-2 - 1 × length)
  • maximize score
  • compute matrix using dynamic programming
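The scoring scheme on this slide can be sketched as a small dynamic program. This is a minimal illustration, not the authors' code: the name `global_score` and its parameters are hypothetical, and the gap penalty is read as 2 + 1 × length. For readability it tries every gap length in every cell, which makes this naive form cubic; that choice is for clarity only.

```python
def global_score(a, b, match=2, mismatch=-1, gap_open=2, gap_extend=1):
    """Score of the best end-to-end alignment of a and b (hypothetical helper)."""

    def gap(length):
        # Assumed reading of the slide: a gap of this length costs 2 + 1 * length.
        return -(gap_open + gap_extend * length)

    n, m = len(a), len(b)
    # H[i][j] = best score for aligning a[:i] with b[:j].
    H = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        H[i][0] = gap(i)
    for j in range(1, m + 1):
        H[0][j] = gap(j)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Match or mismatch: the two symbols sit on top of each other.
            best = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Gap in b: the last k symbols of a[:i] face nothing.
            for k in range(1, i + 1):
                best = max(best, H[i - k][j] + gap(k))
            # Gap in a: the last k symbols of b[:j] face nothing.
            for k in range(1, j + 1):
                best = max(best, H[i][j - k] + gap(k))
            H[i][j] = best
    return H[n][m]


print(global_score("TATGCAG", "TCTGAG"))  # prints 6
```

The score 6 corresponds to superposing TATGCAG on TCTG-AG: five matches (+10), one mismatch (-1), and one gap of length 1 (-3).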

5
sidestep: local alignment
  • Find sub-sequences that match well
  • Ignores non-matching values before and after the
    subsequence (by disallowing negative values)
  • Constructing the actual alignment: O(n^3) time
  • Computing only the scores: O(n^2) time
  • (see paper)
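A score-only local alignment can be sketched as follows. This is a hedged illustration rather than the method from the paper: the name `local_score` is hypothetical, and the use of Gotoh-style recurrences for the gap penalty is an assumption about one way to reach quadratic time. Clamping every cell at zero is what lets the alignment ignore poorly matching material before and after the matching subsequences.

```python
def local_score(a, b, match=2, mismatch=-1, gap_open=2, gap_extend=1):
    """Best local alignment score of a and b in O(len(a) * len(b)) time."""
    NEG = float("-inf")
    m = len(b)
    prev_h = [0] * (m + 1)    # previous row of best scores
    prev_f = [NEG] * (m + 1)  # previous row: best score ending in a vertical gap
    best = 0
    for i in range(1, len(a) + 1):
        cur_h = [0] * (m + 1)
        cur_f = [NEG] * (m + 1)
        e = NEG  # best score in this row ending in a horizontal gap
        for j in range(1, m + 1):
            # Extend an open gap, or open a new one (cost 2 + 1 for length 1).
            e = max(e - gap_extend, cur_h[j - 1] - gap_open - gap_extend)
            cur_f[j] = max(prev_f[j] - gap_extend,
                           prev_h[j] - gap_open - gap_extend)
            diag = prev_h[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Disallowing negative values restarts the alignment at this cell.
            cur_h[j] = max(0, diag, e, cur_f[j])
            best = max(best, cur_h[j])
        prev_h, prev_f = cur_h, cur_f
    return best


print(local_score("TATGCAG", "TCTGAG"))                        # prints 6
print(local_score("XXXXTATGCAGYYYY", "ZZZTCTGAGWWW"))          # prints 6
```

Only two rows are kept, so memory is linear in the length of `b`. The second call pads both sequences with symbols that match nothing, and the score is unchanged.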

6
summary
  • (TATGCAG, TCTGAG) → 6
  • takes O(n^2) time
  • (TATGCAG, TCTGAG) → the actual alignment
  • takes O(n^3) time
  • Matching <junk1> TATGCAG <junk2> with
    <junk3> TCTGAG <junk4> gives the same result as
    matching only the substrings TATGCAG and TCTGAG

7
finding top alignments
  • red lines: top alignments
  • split sequence every possible way
  • align subsequence-pair
  • best is first top alignment
  • trick: find the next best (top) alignment using the
    O(n^2) algorithm n times; construct the top alignment
    using the O(n^3) algorithm
  • repeat while avoiding found top alignments
  • user typically wants 5-30 top alignments
  • ordered list, do most promising alignments first
  • realign 3-10
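The first steps of this slide (split the sequence every possible way, align each prefix against its suffix, keep the best) can be sketched as below. This is a simplified illustration under stated assumptions: `local_score` and `first_top_alignment` are hypothetical names, the scoring reuses match 2, mismatch -1, and gap penalty 2 + 1 × length, and only the score and split point of the first top alignment are found. Avoiding already found top alignments and the ordered list of promising splits are not implemented here.

```python
def local_score(a, b, match=2, mismatch=-1, gap_open=2, gap_extend=1):
    """Score-only local alignment (hypothetical helper, quadratic time)."""
    NEG = float("-inf")
    prev_h, prev_f = [0] * (len(b) + 1), [NEG] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur_h, cur_f, e = [0] * (len(b) + 1), [NEG] * (len(b) + 1), NEG
        for j in range(1, len(b) + 1):
            e = max(e - gap_extend, cur_h[j - 1] - gap_open - gap_extend)
            cur_f[j] = max(prev_f[j] - gap_extend,
                           prev_h[j] - gap_open - gap_extend)
            diag = prev_h[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur_h[j] = max(0, diag, e, cur_f[j])
            best = max(best, cur_h[j])
        prev_h, prev_f = cur_h, cur_f
    return best


def first_top_alignment(seq):
    """Return (score, split) for the best prefix/suffix pair of seq.

    Each of the len(seq) - 1 splits costs one quadratic-time scoring pass,
    so finding the best split is cubic overall.
    """
    return max((local_score(seq[:k], seq[k:]), k) for k in range(1, len(seq)))


print(first_top_alignment("ACGTACGT"))  # prints (8, 4)
```

For the toy sequence ACGTACGT the best split is in the middle, where the prefix ACGT matches the suffix ACGT exactly.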

8
performance old vs. new
  • sequence: longest known protein (titin)
  • speed improvement increases with sequence length

9
parallel alignment
  • parallelism within alignment
  • loop-carried dependency
  • concurrent alignments
  • speculative parallelism
  • good performance
  • three-level parallelism
  • SSE/SSE2 multimedia extensions (SIMD)
  • shared memory MIMD
  • distributed memory MIMD

10
SIMD parallelism
  • multimedia extensions
  • 4 (SSE) or 8 (SSE2) parallel operations on
    consecutive 2-byte words
  • compiler intrinsics
  • compute 4 (or 8) neighboring matrices
    concurrently
  • interleaved memory layout
  • use fine-grained hardware for coarse-grained
    computation
  • applicable to any program that does many
    alignments
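The idea of computing several matrices in lockstep can be illustrated without SSE intrinsics. The sketch below is a plain-Python emulation, not the authors' implementation: `batch_local_scores` is a hypothetical name, each "lane" is one list element rather than a 2-byte word in a SIMD register, and all pairs in a batch are assumed to have the same dimensions. Every DP cell stores one value per lane side by side, which mirrors the interleaved memory layout named on the slide.

```python
def batch_local_scores(pairs, match=2, mismatch=-1, gap_open=2, gap_extend=1):
    """Local alignment scores for several equally sized pairs, in lockstep."""
    lanes = len(pairs)
    n, m = len(pairs[0][0]), len(pairs[0][1])
    if any(len(a) != n or len(b) != m for a, b in pairs):
        raise ValueError("all pairs in a batch must have the same dimensions")

    NEG = -10**9  # stands in for "minus infinity" in every lane
    zero, neg = [0] * lanes, [NEG] * lanes

    def vmax(*vectors):
        # Lane-wise maximum, the role played by a SIMD MAX instruction.
        return [max(column) for column in zip(*vectors)]

    def vadd(vector, constant):
        # Lane-wise addition of the same constant to every lane.
        return [x + constant for x in vector]

    prev_h, prev_f = [zero] * (m + 1), [neg] * (m + 1)
    best = zero
    for i in range(1, n + 1):
        cur_h, cur_f, e = [zero], [neg], neg
        for j in range(1, m + 1):
            e = vmax(vadd(e, -gap_extend),
                     vadd(cur_h[j - 1], -(gap_open + gap_extend)))
            f = vmax(vadd(prev_f[j], -gap_extend),
                     vadd(prev_h[j], -(gap_open + gap_extend)))
            # One substitution score per lane for cell (i, j).
            subst = [match if a[i - 1] == b[j - 1] else mismatch
                     for a, b in pairs]
            diag = [h + s for h, s in zip(prev_h[j - 1], subst)]
            h = vmax(zero, diag, e, f)
            cur_h.append(h)
            cur_f.append(f)
            best = vmax(best, h)
        prev_h, prev_f = cur_h, cur_f
    return best


print(batch_local_scores([("TATGCAG", "TCTGAG"),
                          ("AAAAAAA", "CCCCCC"),
                          ("ACGTACG", "ACGTAC")]))  # prints [6, 0, 12]
```

The control flow is identical for every lane, which is why fine-grained vector hardware can be used for whole alignments: the loop structure is shared and only the data differ.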

11
SSE/SSE2 performance
  • speedups w.r.t. new algorithm
  • superlinear speedups
  • MAX operator
  • 8 extra mmx/xmm registers
  • scheduling
  • cache-aware alignment: 4 to 6.5 times faster

12
MIMD parallelism
  • SIMD (SSE) parallelism is speculative
  • If a matrix (alignment) is promising, its
    neighbors probably also are promising
  • MIMD parallelism
  • use dynamic task scheduling, selecting most
    promising tasks from a job queue
  • Shared memory (SMP): easy
  • Distributed memory: MPI, master/worker
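Dynamic task scheduling from a job queue can be sketched with a shared priority queue. This is an illustration of the scheduling policy only, not of the authors' SMP or MPI code: `run_most_promising_first`, its parameters, and the notion of a numeric "priority" per task are hypothetical, and Python threads are used here purely to show the queue discipline.

```python
import heapq
import threading


def run_most_promising_first(tasks, work, n_workers=2):
    """Run work(payload) for every (priority, payload) task.

    Workers repeatedly take the task with the highest priority from a
    shared queue. Returns a dict mapping task index to its result.
    """
    # heapq is a min-heap, so priorities are negated; the index breaks ties
    # and keeps payloads from ever being compared.
    heap = [(-priority, index, payload)
            for index, (priority, payload) in enumerate(tasks)]
    heapq.heapify(heap)
    lock = threading.Lock()
    results = {}

    def worker():
        while True:
            with lock:
                if not heap:
                    return
                _, index, payload = heapq.heappop(heap)
            result = work(payload)  # done outside the lock
            with lock:
                results[index] = result

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


done = run_most_promising_first([(1, "a"), (5, "b"), (3, "c")], str.upper)
print(sorted(done.items()))  # prints [(0, 'A'), (1, 'B'), (2, 'C')]
```

In a master/worker setting the same queue would live on the master, which hands the most promising remaining task to whichever worker becomes idle.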

13
total parallel performance
  • SMP: 2 CPUs → 2 2 times faster
  • cluster: 64×2 CPUs → 548- to 889-fold speedup
  • Up to 125x faster than SSE version on 1 CPU

14
conclusions
  • new algorithm: >> 100 times faster
  • much more for longer sequences
  • parallel: SSE(2), SMP, cluster
  • SSE(2) parallelism yields superlinear speedups
  • 128 CPUs: 548- to 889-fold speedup
  • 1,000,000-fold speed improvement