Accelerating Bioinformatics Algorithms with Reconfigurable Computing

Learn more at: http://klabs.org
1
Accelerating Bioinformatics Algorithms with
Reconfigurable Computing
  • Presentation to MAPLD Conference
  • September 2004

2
Overview
  • The Problem
      • Bioinformatics Algorithm: Smith-Waterman
      • Current Implementations
  • The Solution
      • Viva as a Reconfigurable Computing SW/HW Design
        Tool
      • Hypercomputer Architecture for High-End RC
        Applications
  • The Implementation
      • Smith-Waterman Viva Code
      • Smith-Waterman Pipeline Design
      • Smith-Waterman Pipeline Applied to Hypercomputer
        Architecture
      • Smith-Waterman Pipeline Primitives Inside the
        FPGA
  • The Results
      • Visualization of Rat vs. Human Genetic Code
      • Informal Benchmarks
  • Other Potential Applications
      • Seismic Data Processing, Weather Modeling, Image
        Rendering

3
The Problem: Enormous Biosciences Problems
  • Exploding Datasets in Biosciences
      • DNA Sequencing
      • Gene Expression
      • Protein Identification

4
The Need: High-Speed, High-Sensitivity Algorithms
  • High-Speed, High-Sensitivity DNA and Protein
    Searching Algorithms
      • Critical in virtually every branch of molecular
        biology.
  • Smith-Waterman
      • Theoretically optimal for sequence matching: it
        finds the best-scoring local alignment for a
        given scoring scheme.
      • BUT compute-intensive!
  • BLAST and FASTA
      • Heuristic approximations.
      • Faster than Smith-Waterman, but less sensitive.
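
For readers who have not seen the algorithm, the following is a minimal software sketch of the Smith-Waterman recurrence, not the Viva design described later. The scoring values (match=2, mismatch=-1, gap=-1) and the linear gap penalty are assumptions chosen for illustration; the slides do not state the scoring scheme used.

```python
def smith_waterman_score(query, reference, match=2, mismatch=-1, gap=-1):
    """Return (best_score, (row, col)) for the best local alignment.

    Scoring parameters are illustrative assumptions, not values from the slides.
    Rows index the query, columns index the reference (both 1-based in H).
    """
    rows, cols = len(query) + 1, len(reference) + 1
    # H[i][j] = best score of a local alignment ending at query[i-1], reference[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if query[i - 1] == reference[j - 1] else mismatch
            diag = H[i - 1][j - 1] + step   # Northwest neighbor
            up = H[i - 1][j] + gap          # North neighbor
            left = H[i][j - 1] + gap        # West neighbor
            H[i][j] = max(0, diag, up, left)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    return best, best_pos


if __name__ == "__main__":
    print(smith_waterman_score("ACGT", "TTACGTTT"))
```

Every cell of the len(query) × len(reference) matrix is visited, which is why the method is compute-intensive for genome-scale inputs.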

5
The Need: High-Speed, High-Sensitivity Algorithms
  • Comparative Genomics: comparing the genomes of
    related species
      • Identifying genes, defining gene structure,
        elucidating evolutionary change, identifying
        regulatory elements and revealing combinatorial
        control of gene regulation
  • Sequencing Effort
      • Human sequence is completed; other organisms are
        now being sequenced
      • Sequencing effort will require high-sensitivity
        DNA searches and alignments
      • Smith-Waterman is the preferred method: more
        accurate and more specific
      • NCBI BLAST and WU-BLAST are not effective in
        low-coverage DNA situations
  • RNA interference (RNAi): seeking novel therapies
    and developing new drugs
      • The process: choosing the correct genetic
        sequence to effectively block a targeted
        messenger RNA (mRNA) without silencing additional
        genes
      • Due to word-length limitations, BLAST algorithms
        can miss sequences that have one or more
        mismatches compared to the query siRNA sequence
  • Genome Annotation
      • BLAST does not allow for long introns or
        frameshifts
      • Smith-Waterman is both frameshift- and
        intron-tolerant.

6
The Need: High-Speed Smith-Waterman
  • Large Matrix comparison
  • Large datasets
  • High level of detail for each SW calculation
  • NOT heuristic approximations

7
The Need: High-Performance Biosciences Platform
  • Cluster Computing: the most widely used platform,
    BUT there are diminishing returns
      • Expensive to build, difficult to maintain
      • Requires significant power, air conditioning, and
        physical space
      • Architecture inherently limits scalability and
        performance
  • Reconfigurable Computing (RC): the promising
    alternative
      • Advantages of a Custom Chip
          • Implement algorithms directly in hardware
          • Performance advantages of an ASIC, but without
            chip development cost
      • Advantages of a General Purpose Platform
          • Development time comparable to software
            development
          • FPGAs can be reconfigured to perform other
            computational tasks.

8
The Solution: FPGA-Programming Environment Viva
  • VIVA GRAPHICAL LANGUAGE
      • Capture natively parallel code
      • Accommodate data of any type, size, or precision
      • Tune algorithms for speed of execution or
        conservation of hardware resources
  • VIVA EDITOR
      • Call Viva algorithms from legacy code such as C,
        C++, or Fortran
      • Interactively debug code
      • Import/Export EDIF files
  • VIVA COMPILER/SYNTHESIZER
      • Program multi-million gate designs
      • Compile hardware designs quickly for efficient
        development
  • VIVA LIBRARIES
      • Reuse flexible Viva objects which accept any data
        type or size
      • Target any hardware platform with a System
        Description
      • Prototype Viva on any x86-based Windows machine

9
The Solution: FPGA-based Hypercomputers
10
Structure of an FPGA Processing Element
11
Structure of a Processing Element Quad
12
Structure of a Hypercomputer Accelerator Board
13
The Prototype Implementation: Smith-Waterman in Viva Code
14
Smith-Waterman Program Flow
  • As the query sequence is loaded, the Init_Cells
    object creates our initial column and stores it
    in SW_Cell_Mem.
  • After this initialization period, SW_Cell_Mem
    provides a cell to the chain of SW_Iteration
    objects every clock cycle. It also writes a
    newly calculated cell every clock cycle.
  • The SW_Cell_Mem object stores every nth column,
    where n is the number of SW_Iteration objects.

15
Smith-Waterman Cells
  • There are as many cells as there are characters
    in the query sequence.
  • The array of cells represents a column of the
    scoring matrix.
  • The initial (zero) column is initialized and
    stored into the cell memory object, SW_Cell_Mem.
  • Each cell contains the following four parameters:
      • Pattern: a character from the query sequence
      • Score: the score of this cell at the current
        (i, j) position
      • PatternStart: the position in the query sequence
        from which the score was calculated
      • DataStart: the position in the reference
        sequence from which the score was calculated
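
As a software analogy for the cell record described above, the sketch below models one cell and the initial (zero) column. The Python names `Cell` and `init_cells` are hypothetical stand-ins for the Viva objects, and the initial values chosen for `pattern_start` and `data_start` are assumptions; the slides only say that the zero column is initialized and stored.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    pattern: str        # Pattern: one character from the query sequence
    score: int          # Score: score of this cell at the current (i, j)
    pattern_start: int  # PatternStart: query position where the scored match began
    data_start: int     # DataStart: reference position where the scored match began


def init_cells(query):
    """Build the initial (zero) column: one cell per query character.

    Assumption: every score starts at 0 and each cell's start positions
    point at its own row and at reference position 0.
    """
    return [Cell(pattern=ch, score=0, pattern_start=i, data_start=0)
            for i, ch in enumerate(query)]


if __name__ == "__main__":
    for cell in init_cells("ACGT"):
        print(cell)
```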

16
Cell Data Types
  • Data element size may be adjusted depending on
    usage
      • Pattern: contains as many bits as needed to
        encode characters from the sequences (4 bits for
        nucleotides).
      • Score and PatternStart: equal in size. Must be
        large enough to encode the number of entries in
        the query sequence.
      • DataStart: the largest field, as it must be able
        to encode any position in the reference
        sequence.
  • Right size for the job
      • Less circuitry is needed to calculate matches in
        smaller sequences.
      • Smaller sequences may exploit more parallelism.

17
Smith-Waterman Data Sets
In this example, our Pattern contains 4 bits, for
modeling nucleotides. The Score and PatternStart
parameters contain 26 bits, so our query sequence
may contain up to 67,108,864 characters. The
DataStart parameter contains 27 bits, meaning our
reference sequence may contain up to 134,217,728
characters.
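
The capacities quoted above follow directly from the field widths (2^26 and 2^27). The sketch below checks that arithmetic and shows one way the four fields could be packed into a single word. The field order and the `pack_cell`/`unpack_cell` helpers are assumptions made for illustration; the slides do not describe the actual bit layout.

```python
# Field widths from the example: 4 + 26 + 26 + 27 bits.
# The ordering (first field in the high bits) is an assumption.
FIELDS = (("pattern", 4), ("score", 26), ("pattern_start", 26), ("data_start", 27))
CELL_BITS = sum(bits for _, bits in FIELDS)


def capacity(bits):
    """Number of distinct values an unsigned field of this width can hold."""
    return 1 << bits


def pack_cell(values):
    """Pack a dict of field values into one integer word."""
    word = 0
    for name, bits in FIELDS:
        value = values[name]
        if not 0 <= value < capacity(bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
        word = (word << bits) | value
    return word


def unpack_cell(word):
    """Inverse of pack_cell: split a word back into its named fields."""
    values = {}
    for name, bits in reversed(FIELDS):
        values[name] = word & (capacity(bits) - 1)
        word >>= bits
    return values


if __name__ == "__main__":
    print(capacity(26), capacity(27), CELL_BITS)
```

Under these widths a cell occupies 83 bits in total.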
18
Smith-Waterman Iteration
19
SW_Iteration Object
  • Inputs
      • Matrix_In: receives a constant stream of cells.
        It is imperative for efficiency that the pipe
        remains full.
      • Data: receives a single character from the
        reference sequence. The cells computed will be
        for the column of the scoring matrix
        corresponding to the Data value.
      • CountBy: the radix of the algorithm (number of
        iteration objects)
      • Init_J_In: this iteration object's index in the
        chain of iteration objects
      • ClkG: system clock
      • Token_In: a token pulse precedes a set of cells,
        allowing the iteration object to clear out data
        from the previous set of cells
      • Init: initialization pulse, used only before the
        search commences
      • G: accompanies each valid cell

20
SW_Iteration Object
  • Outputs
      • Matrix_Out: newly computed cell
      • Token_Out: passes the token to the next iteration
        object
      • D: accompanies each newly computed cell
      • Init_J_Out: used by the next iteration object
      • I, J: current row and column, used to report
        results

21
Pipe Stages
  • The SW_Iteration object contains four pipe
    stages.
  • A cell is received by and produced by the
    SW_Iteration object every clock cycle.
  • When a cell enters, it is coming from the
    previous column, so its values are those of the
    West neighbor.
  • Since the cell in the row above any given cell is
    in the next pipe stage, the values of both the
    North and Northwest neighbors are also accessible.
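
The neighbor relationships described above can be illustrated in software. The sketch below computes one new column from the previous one, treating each incoming cell as the West neighbor and carrying the North and Northwest values along as it walks down the column. It is only an analogy for the four-stage hardware pipe: the tuple layout, the function names `next_column` and `scan`, and the scoring values are all assumptions.

```python
def next_column(prev_column, query, data_char, j, match=2, mismatch=-1, gap=-1):
    """Compute column j from column j-1.

    Each cell is a tuple (score, pattern_start, data_start).
    Scoring values are illustrative assumptions.
    """
    new_column = []
    north = (0, 0, j)       # boundary above row 0 of the new column
    northwest = (0, 0, j)   # boundary above row 0 of the previous column
    for i, west in enumerate(prev_column):
        step = match if query[i] == data_char else mismatch
        # A diagonal move out of a zero-score cell starts a new match at (i, j).
        diag_start = (northwest[1], northwest[2]) if northwest[0] > 0 else (i, j)
        candidates = [
            (northwest[0] + step, diag_start),
            (north[0] + gap, (north[1], north[2])),
            (west[0] + gap, (west[1], west[2])),
        ]
        score, (p_start, d_start) = max(candidates, key=lambda c: c[0])
        cell = (score, p_start, d_start) if score > 0 else (0, i, j)
        new_column.append(cell)
        # Slide the window down one row.
        northwest, north = west, cell
    return new_column


def scan(query, reference):
    """Stream the reference one character (column) at a time.

    Returns (score, i, j, pattern_start, data_start) of the best cell.
    """
    column = [(0, i, 0) for i in range(len(query))]
    best = (0, 0, 0, 0, 0)
    for j, ch in enumerate(reference):
        column = next_column(column, query, ch, j)
        for i, (score, p_start, d_start) in enumerate(column):
            if score > best[0]:
                best = (score, i, j, p_start, d_start)
    return best


if __name__ == "__main__":
    print(scan("ACGT", "TTACGTTT"))
```

Only one column of cells is kept at a time, which mirrors the role of a column memory feeding the iteration logic.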

22
Parallelism
  • If a given hardware system has enough physical
    resources to accommodate n SW_Iteration objects,
    the Smith Waterman program may operate on n
    columns in parallel.
  • Hence n cells are computed every clock cycle.
  • Each Virtex II 6000 can support 64 iteration
    objects
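
Because n cells are computed per clock cycle, an idealized search time can be estimated from the matrix size alone. The sketch below is a rough model under stated assumptions: it ignores pipeline fill, I/O, and memory stalls, and the helper names are hypothetical. The 25 MHz clock in the usage example is taken from the statistics slide; the query and reference lengths are made up for illustration.

```python
ITERATIONS_PER_FPGA = 64  # figure quoted for each Virtex II 6000


def iteration_objects(num_fpgas, per_fpga=ITERATIONS_PER_FPGA):
    """Total SW_Iteration objects available across the FPGAs."""
    return num_fpgas * per_fpga


def ideal_search_seconds(query_len, reference_len, n_iterations, clock_hz):
    """Idealized time to fill the whole scoring matrix.

    Assumes exactly n_iterations cells complete every clock cycle,
    with no pipeline fill or data-transfer overhead.
    """
    total_cells = query_len * reference_len
    cells_per_second = n_iterations * clock_hz
    return total_cells / cells_per_second


if __name__ == "__main__":
    n = iteration_objects(1)
    # Hypothetical workload: 1,000-character query, 1,000,000-character reference.
    print(ideal_search_seconds(1_000, 1_000_000, n, 25_000_000))
```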

23
The Implementation: Pipeline Primitives Inside the FPGA
24
The Implementation: Smith-Waterman Pipeline
[Diagram; labeled blocks: X86 System, Bus Controller, XPE Data Distribution, XPR Router, PE1 (Controller), PE2 through PE8]
25
The Results: Rat vs. Human Genetic Code
26
The Results: Bacteria to Bacteria Comparison
27
The Results: Informal Statistics
  • Total Operations / Second
      • 1 Smith-Waterman step includes:
          • 25 Logic Operations (adds, compares; mostly
            26- to 27-bit ops, some single-bit ops)
          • 13 Data Reorder Operations (Move, Combine)
          • 11 Data Store Operations (Assignment)
      • Logic Operations Only
          • 25 ops × 25 MHz × 448 Smith-Waterman kernels
            = 280 billion operations / second
      • Logic + Data Operations
          • 49 ops × 25 MHz × 448 Smith-Waterman kernels
            ≈ 550 billion operations / second
  • Total Aggregate Communications Bandwidth of
    Systolic Array
      • 12 × 88 × 25 MHz = 26.4 Gb/s, plus
        7 × 22 × 50 MHz = 7.7 Gb/s, for 34.1 Gb/s total
  • Resources Consumed / Resources Available
      • PE2-PE7: 60% to 70% consumed
      • PE1: 20% consumed; XPE: 5%; XPR: 0.1%
  • Compilation Time
      • Gates: 70 million total
      • Time to compile: 20 minutes
  • Power Consumption
      • Metered: 50 Watts
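
The operation and bandwidth totals on this slide can be reproduced with simple arithmetic. The sketch below multiplies the quoted factors; the variable names are mine, and the meaning of the individual bandwidth factors (12 × 88 and 7 × 22) is not stated on the slide, so they are treated as opaque multipliers.

```python
CLOCK_HZ = 25_000_000   # 25 MHz system clock quoted for the statistics
KERNELS = 448           # parallel Smith-Waterman kernels

LOGIC_OPS = 25          # logic operations per Smith-Waterman step
REORDER_OPS = 13        # data reorder operations per step
STORE_OPS = 11          # data store operations per step

# Logic operations only: 25 x 25 MHz x 448.
logic_ops_per_second = LOGIC_OPS * CLOCK_HZ * KERNELS

# Logic plus data operations: (25 + 13 + 11) = 49 ops per step.
all_ops_per_step = LOGIC_OPS + REORDER_OPS + STORE_OPS
all_ops_per_second = all_ops_per_step * CLOCK_HZ * KERNELS

# Aggregate bandwidth in bits per second; factor meanings are not given in the slide.
bandwidth_25mhz = 12 * 88 * 25_000_000
bandwidth_50mhz = 7 * 22 * 50_000_000
total_bandwidth = bandwidth_25mhz + bandwidth_50mhz

if __name__ == "__main__":
    print(logic_ops_per_second, all_ops_per_second, total_bandwidth)
```

The second total works out to 548.8 billion, which the slide rounds to 550 billion.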

28
Summary and Conclusions
  • This Viva prototype of the Smith-Waterman
    algorithm demonstrates that the algorithm can be
    parallelized for fast operation in an FPGA system,
    and validates the use of FPGAs to speed up the
    Smith-Waterman algorithm compared to clusters.
  • Speed of the Prototype
      • An HC-62 has the bandwidth to pass cells between
        7 FPGAs, allowing for 448 parallel SW_Iteration
        objects
      • At a conservative 30 MHz system clock speed, this
        gives 30 MHz × 448 = 13.4 billion Smith-Waterman
        steps/second.
  • Opportunities to further optimize the algorithm
    include:
      • Increasing the number of SW_Iteration objects
        that can run in parallel (up to 100 billion
        Smith-Waterman steps/second)
      • Increasing the clock speed of the hardware (up to
        1 trillion Smith-Waterman steps/second)
