Algorithms for Biochip Design and Optimization

1
Algorithms for Biochip Design and Optimization
  • Ion Mandoiu
  • Computer Science & Engineering Department
  • University of Connecticut

2
Overview
  • Physical design of DNA arrays
  • DNA tag set design
  • Digital microfluidic biochip testing
  • Conclusions

3
Driver Biochip Applications
  • Driver applications
  • Gene expression (transcription analysis)
  • SNP genotyping
  • CNP analysis
  • Genomic-based microorganism identification
  • Point-of-care diagnosis
  • healthcare, forensics, environmental monitoring, ...
  • As focus shifts from basic research to clinical
    applications, there are increasingly stringent
    design requirements on sensitivity, specificity,
    cost
  • Assay design and optimization become critical

4
Single Nucleotide Polymorphisms
  • Human genome ≈ 3 × 10^9 base pairs
  • Main form of variation between individual
    genomes: single nucleotide polymorphisms (SNPs)
  • Total SNPs ≈ 1 × 10^7
  • Difference between any two individuals ≈ 3 × 10^6
    SNPs (≈ 0.1% of entire genome)

ataggtccCtatttcgcgcCgtatacacgggTctata
ataggtccGtatttcgcgcAgtatacacgggActata
ataggtccCtatttcgcgcCgtatacacgggTctata
5
Watson-Crick Complementarity
  • Four nucleotide types: A, C, T, G
  • A's pair with T's (2 hydrogen bonds)
  • C's pair with G's (3 hydrogen bonds)
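
The pairing rules above can be sketched in a few lines of Python. This is only an illustration: the names `PAIR`, `H_BONDS`, `reverse_complement`, and `hydrogen_bonds` are hypothetical and do not come from the slides.

```python
# Watson-Crick pairing: A-T (2 hydrogen bonds), C-G (3 hydrogen bonds).
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}
H_BONDS = {"A": 2, "T": 2, "C": 3, "G": 3}

def reverse_complement(seq):
    """Return the strand that pairs perfectly with `seq`, read in the opposite direction."""
    return "".join(PAIR[b] for b in reversed(seq.upper()))

def hydrogen_bonds(seq):
    """Total number of hydrogen bonds in the perfect duplex formed by `seq`."""
    return sum(H_BONDS[b] for b in seq.upper())

print(reverse_complement("ACCTG"))  # CAGGT
print(hydrogen_bonds("ACCTG"))      # 13
```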

6
SNP genotyping via direct hybridization
Labeled sample
  • SNP1 with alleles T/G
  • SNP2 with alleles A/G

Array with 2 probes/SNP
Hybridization
7
In-Place Probe Synthesis
8
In-Place Probe Synthesis
9
In-Place Probe Synthesis
10
Simplified DNA Array Flow
  • Design: Probe Selection → Physical Design (Probe
    Placement & Embedding)
  • Manufacturing: Mask Manufacturing → Array
    Manufacturing
  • End User: Hybridization Experiment → Analysis of
    Hybridization Intensities (gene expression
    levels, SNP genotypes, ...)
11
Unwanted Illumination Effect
  • Unintended illumination during manufacturing ⇒
    synthesis of erroneous probes
  • Effect gets worse with technology scaling

12
Border Length Minimization Objective
  • Effects of unintended illumination ∝ border
    length

13
Synchronous Synthesis
  • Periodic deposition sequence, e.g., (ACTG)^k
  • Each probe grown by one nucleotide in each period

⇒ Number of border conflicts between adjacent probes = 2 ×
Hamming distance
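
A small sketch of why the factor of 2 appears, assuming the periodic deposition sequence (ACTG)^k and probes of equal length. The function names are hypothetical. A position where two adjacent probes differ produces one conflict when the first probe's nucleotide is deposited and another when the second probe's nucleotide is deposited.

```python
def synchronous_embedding(probe, period="ACTG"):
    """Deposition steps in which `probe` is exposed: the i-th nucleotide is
    synthesized in the i-th period, at the step matching its letter."""
    return {i * len(period) + period.index(b) for i, b in enumerate(probe)}

def border_conflicts(p, q, period="ACTG"):
    """Number of steps in which exactly one of two adjacent probes is exposed."""
    return len(synchronous_embedding(p, period) ^ synchronous_embedding(q, period))

def hamming(p, q):
    """Number of positions at which equal-length probes p and q differ."""
    return sum(a != b for a, b in zip(p, q))

p, q = "ACGT", "AGGA"
print(border_conflicts(p, q), 2 * hamming(p, q))  # 4 4
```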
14
2D Placement Problem
  • Find minimum cost mapping of the Hamming graph
    onto the grid graph
  • Special case of the Quadratic Assignment Problem

Edge cost = 2 × Hamming distance
15
2D Placement: Sliding-Window Matching
  • Proposed by [Doll et al. 94] in VLSI context
  • Slide window over entire chip
  • Repeat for a fixed number of iterations (⇒ O(N) time for
    fixed window size), or until improvement drops
    below a certain threshold

16
2D Placement: Epitaxial Growth
  • Proposed by [PreasL88, ShahookarM91] in VLSI
    context
  • Simulates crystal growth
  • Efficient row implementation
  • Use lexicographical sorting for initial ordering
    of probes
  • Fill cells row-by-row
  • Bound number of candidate probes considered when
    filling each cell
  • Constant number of lookahead rows ⇒ O(N^{3/2}) runtime
    for N probes
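
A minimal sketch of the row-by-row filling idea, not the authors' implementation. It assumes, as a simplification not stated on the slide, that the candidates for each cell are the next `lookahead` unplaced probes in lexicographic order. The names `row_epitaxial` and `border_length` are hypothetical.

```python
def hamming(p, q):
    return sum(a != b for a, b in zip(p, q))

def row_epitaxial(probes, rows, cols, lookahead=8):
    """Greedy row-by-row placement; returns a rows x cols grid of probes.

    Assumption (illustrative only): candidates for a cell are the next
    `lookahead` unplaced probes in the lexicographically sorted pool.
    """
    assert len(probes) == rows * cols
    pool = sorted(probes)                       # initial lexicographic ordering
    grid = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            def cost(p):
                # conflicts with the already placed left and upper neighbors
                left = hamming(p, grid[r][c - 1]) if c > 0 else 0
                up = hamming(p, grid[r - 1][c]) if r > 0 else 0
                return left + up
            window = range(min(lookahead, len(pool)))
            best = min(window, key=lambda i: cost(pool[i]))
            grid[r][c] = pool.pop(best)
    return grid

def border_length(grid):
    """Sum of 2 x Hamming distance over all pairs of adjacent cells."""
    total = 0
    for r, row in enumerate(grid):
        for c, p in enumerate(row):
            if c + 1 < len(row):
                total += 2 * hamming(p, row[c + 1])
            if r + 1 < len(grid):
                total += 2 * hamming(p, grid[r + 1][c])
    return total
```

Bounding the candidate window keeps the work per cell constant, which is the design point the slide highlights; a larger `lookahead` trades runtime for placement quality.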

17
2D Placement: Recursive Partitioning
  • Very effective in VLSI placement
    [AlpertK95, Caldwell et al. 00]
  • 4-way partition using linear-time clustering
  • Repeat until Row-Epitaxial can be applied

18
Asynchronous Synthesis
[Figure: deposition sequence and the probes embedded in it]
19
Optimal Single-Probe Re-Embedding
  • Efficient solution by dynamic programming
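
One possible dynamic program, sketched under assumptions that go beyond the slide: the neighbors' embeddings are fixed, `neighbor_exposed[j]` gives how many adjacent probes are exposed at deposition step `j`, and a conflict is counted whenever the probe and a neighbor differ in exposure at a step. All names are hypothetical.

```python
INF = float("inf")

def optimal_reembedding(probe, deposition, neighbor_exposed, num_neighbors):
    """Return (min conflicts, chosen steps) for embedding `probe` into `deposition`."""
    n, m = len(probe), len(deposition)
    # cost[i][j]: best cost of embedding probe[:i] using deposition[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    take = [[False] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0
    for j in range(1, m + 1):
        masked = neighbor_exposed[j - 1]                   # probe masked at step j-1
        exposed = num_neighbors - neighbor_exposed[j - 1]  # probe exposed at step j-1
        for i in range(0, min(n, j) + 1):
            best = cost[i][j - 1] + masked
            if i > 0 and probe[i - 1] == deposition[j - 1]:
                alt = cost[i - 1][j - 1] + exposed
                if alt < best:
                    best, take[i][j] = alt, True
            cost[i][j] = best
    if cost[n][m] == INF:
        raise ValueError("probe is not a subsequence of the deposition sequence")
    steps, i = [], n
    for j in range(m, 0, -1):                              # trace back the choices
        if take[i][j]:
            steps.append(j - 1)
            i -= 1
    return cost[n][m], steps[::-1]

# One neighbor, exposed at steps 2 and 3: aligning with it gives zero conflicts.
print(optimal_reembedding("AC", "ACAC", [0, 0, 1, 1], 1))  # (0, [2, 3])
```

The table has (len(probe)+1) × (len(deposition)+1) entries, so the running time is proportional to the product of the two lengths.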

20
In-Place Re-Embedding Algorithms
  • 2D placement fixed, allow only probe embeddings
    to change
  • Greedy: optimally re-embed the probe with the largest
    gain
  • Chessboard: alternate re-embedding of black/white
    cells
  • Sequential: re-embed probes row-by-row

Chip size   Greedy          Chessboard      Sequential
            LB      CPU     LB      CPU     LB      CPU
100         125.7   40      120.5   54      119.9   64
500         127.1   943     121.4   1423    120.9   1535
21
Integration with Probe Selection
Flow: Probe Selection → Probe Pools → Physical Design
(Placement & Embedding)

Pool Row-Epitaxial, chip size 100x100:

Pool size   Improv.   CPU sec.
1           -         217
2           4.3       1040
4           8.2       1796
8           11.8      3645
16          15.2      7515
22
Overview
  • Physical design of DNA arrays
  • DNA tag set design
  • Digital microfluidic biochip testing
  • Conclusions

23
Universal Tag Arrays
  • [Brenner 97, Morris et al. 98]
  • Array consisting of application-independent tags
  • Two-part reporter probes: application-specific
    primers ligated to antitags
  • Detection carried out by a sequence of reactions
    separately involving the primer and the antitag
    part of reporter probes

24
Universal Tag Array Advantages
  • Cost effective
  • Same tag array used for different analyses
  • ⇒ can be mass-produced
  • Only need to synthesize new set of reporter
    probes
  • More reliable!
  • Solution phase hybridization better understood
    than hybridization on solid support

25
SNP Genotyping with Tag Arrays
  1. Mix reporter probes with unlabeled genomic DNA
  2. Solution phase hybridization
  3. Single-Base Extension (SBE)
  4. Solid phase hybridization

[Figure labels: tag, primer, antitag]
26
Tag Set Design Problem
[Figure: tags t1 and t2]
  • (H1) Tags hybridize strongly to complementary
    antitags
  • (H2) No tag hybridizes to a non-complementary
    antitag

Tag Set Design Problem: Find a maximum
cardinality set of tags satisfying (H1)-(H2)
27
Hybridization Models
  • Melting temperature Tm: temperature at which 50%
    of duplexes are in hybridized state
  • 2-4 rule:
  • Tm = 2 × (number of A's and T's) + 4 × (number of C's and G's)
  • More accurate models exist, e.g., the
    nearest-neighbor model
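
The 2-4 rule is simple enough to state as code. The function name `tm_2_4` is hypothetical; the result is the usual rough estimate in degrees Celsius and is only meaningful for short oligonucleotides.

```python
def tm_2_4(seq):
    """Melting temperature estimate by the 2-4 rule:
    2 degrees per A or T, 4 degrees per C or G."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    cg = seq.count("C") + seq.count("G")
    return 2 * at + 4 * cg

print(tm_2_4("CCAGATT"))  # 20
```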

28
Hybridization Models (contd.)
  • Hamming distance model, e.g., [Marathe et al. 01]
  • Models rigid DNA strands
  • LCS/edit distance model, e.g., [Torney et al. 03]
  • Models infinitely elastic DNA strands
  • c-token model [Ben-Dor et al. 00]
  • Duplex formation requires formation of
    nucleation complex between perfectly
    complementary substrings
  • Nucleation complex must have weight ≥ c, where
    wt(A) = wt(T) = 1, wt(C) = wt(G) = 2 (2-4 rule)

29
c-h Code Problem
  • c-token: left-minimal DNA string of weight ≥ c,
    i.e.,
  • w(x) ≥ c
  • w(x') < c for every proper suffix x' of x
  • A set of tags is a c-h code if
  • (C1) Every tag has weight ≥ h
  • (C2) Every c-token is used at most once

c-h Code Problem [Ben-Dor et al. 00]: Given c and
h, find a maximum cardinality c-h code
30
Algorithms for c-h Code Problem
  • [Ben-Dor et al. 00]: approximation algorithm based
    on De Bruijn sequences
  • Alphabetic tree search algorithm
  • Enumerate candidate tags in lexicographic order,
    save tags whose c-tokens are not used by
    previously selected tags
  • Easily modified to handle various combinations
    of constraints
  • [MT 05, 06]: Optimum c-h codes can be computed in
    practical time for small values of c by using
    integer programming
  • Practical runtime using Garg-Konemann
    approximation and LP-rounding
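
The alphabetic tree search can be sketched as a depth-first enumeration that prunes a branch as soon as its newest c-token is already taken. This is an illustrative reading of the bullet above, not the authors' code. It assumes h ≥ c, the alphabet order A, C, G, T, and the weights wt(A) = wt(T) = 1, wt(C) = wt(G) = 2; all function names are hypothetical.

```python
WT = {"A": 1, "T": 1, "C": 2, "G": 2}

def weight(s):
    return sum(WT[b] for b in s)

def token_at_end(s, c):
    """The c-token ending at the last position of s, or None if s is too light."""
    w = 0
    for start in range(len(s) - 1, -1, -1):
        w += WT[s[start]]
        if w >= c:
            return s[start:]
    return None

def alphabetic_tree_search(c, h, used=None):
    """Greedy c-h code: visit candidate tags in lexicographic order and keep
    every tag of weight >= h whose c-tokens are all still unused."""
    used = set() if used is None else used
    tags = []

    def extend(prefix, prefix_tokens):
        for b in "ACGT":
            s = prefix + b
            tok = token_at_end(s, c)
            if tok is not None and (tok in used or tok in prefix_tokens):
                continue                       # c-token already taken: prune
            tokens = prefix_tokens | {tok} if tok else prefix_tokens
            if weight(s) >= h:
                tags.append(s)                 # complete tag: save it
                used.update(tokens)
            else:
                extend(s, tokens)
            if prefix_tokens & used:           # prefix now clashes with a saved tag
                return

    extend("", frozenset())
    return tags

print(alphabetic_tree_search(4, 10)[:5])
```

Passing a pre-populated `used` set lets the same search fill in additional tags after another method has already claimed some c-tokens.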

31
Token Content of a Tag
  • c = 4
  • Tag: CCAGATT
  • c-tokens: CC, CCA, CAG, AGA, GAT, GATT

Tag ⇒ sequence of c-tokens
End pos   2    3     4     5     6     7
c-token   CC → CCA → CAG → AGA → GAT → GATT
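
The token content of a tag can be computed by scanning left from each end position until the accumulated weight reaches c. A minimal sketch, with the hypothetical helper name `c_tokens` and the weights wt(A) = wt(T) = 1, wt(C) = wt(G) = 2:

```python
WT = {"A": 1, "T": 1, "C": 2, "G": 2}

def c_tokens(tag, c):
    """List of (end position, c-token) pairs for `tag`; end positions are 1-based."""
    out = []
    for end in range(1, len(tag) + 1):
        w = 0
        for start in range(end - 1, -1, -1):
            w += WT[tag[start]]
            if w >= c:          # shortest suffix of tag[:end] with weight >= c
                out.append((end, tag[start:end]))
                break
    return out

print(c_tokens("CCAGATT", 4))
# [(2, 'CC'), (3, 'CCA'), (4, 'CAG'), (5, 'AGA'), (6, 'GAT'), (7, 'GATT')]
```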
32
Layered c-token graph for length-l tags
[Figure: layered graph from source s to sink t; layers labeled c/2, (c/2)+1, ..., l-1, l; c-token vertices c1, ..., cN]
33
Integer Program Formulation [MPT05]
  • Maximum integer flow problem with set capacity
    constraints
  • O(hN) constraints and variables, where N = number of c-tokens

34
Packing LP Formulation
35
Garg-Konemann Algorithm
  1. x ← 0; y ← δ // y_i are variables of the dual LP
  2. Find min weight s-t path p, where weight(v) = y_i
     for every v ∈ V_i
  3. While weight(p) < 1 do
       M ← max_i |p ∩ V_i|
       x_p ← x_p + 1/M
       For every i, y_i ← y_i (1 + ε |p ∩ V_i| / M)
       Find min weight s-t path p, where weight(v) =
       y_i for every v ∈ V_i
  4. For every p, x_p ← x_p / (1 − log_{1+ε} δ)

[GK98] The algorithm computes a factor (1 − ε)^2
approximation to the optimal LP solution with
(N/ε) log_{1+ε} N shortest path computations
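
A toy sketch of the multiplicative-weights loop above. Two simplifications are assumptions of this example and not part of the slide: the candidate s-t paths are given as an explicit non-empty list (each path is a tuple with one set index per vertex), so the shortest-path step becomes a brute-force minimum, and δ is set to (1+ε)((1+ε)N)^(-1/ε), a value of the kind used in Garg and Konemann's analysis that may not match the authors' choice. The function name is hypothetical.

```python
import math
from collections import Counter

def garg_konemann(paths, num_sets, eps=0.1):
    """Approximate max sum_p x_p subject to sum_p |p ∩ V_i| x_p <= 1 for every set i.

    paths: non-empty list of non-empty tuples of set indices (explicit candidates).
    """
    delta = (1 + eps) * ((1 + eps) * num_sets) ** (-1 / eps)  # assumed choice of delta
    y = [delta] * num_sets                                     # dual variables
    x = [0.0] * len(paths)
    counts = [Counter(p) for p in paths]                       # |p ∩ V_i| per path

    def weight(k):
        return sum(a * y[i] for i, a in counts[k].items())

    while True:
        k = min(range(len(paths)), key=weight)   # brute-force "shortest path"
        if weight(k) >= 1:
            break
        M = max(counts[k].values())
        x[k] += 1 / M
        for i, a in counts[k].items():
            y[i] *= 1 + eps * a / M
    scale = 1 - math.log(delta, 1 + eps)         # final scaling restores feasibility
    return [v / scale for v in x]

paths = [(0, 1), (1, 2), (0, 2), (3,)]
x = garg_konemann(paths, num_sets=4)
print([round(v, 3) for v in x], round(sum(x), 3))
```

In the tag design setting the explicit path list would be replaced by a shortest-path computation in the layered c-token graph, which is what makes the method practical.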
36
LP Based Tag Set Design
  1. Run Garg-Konemann and store the minimum weight
    paths in a list
  2. Traversing the list in reverse order, pick tags
    corresponding to paths if they are feasible and
    do not share c-tokens with already selected tags
  3. Mark used c-tokens and run the alphabetic tree
    search algorithm to select additional tags

37
Periodic Tags [MT05]
  • Key observation: the c-token uniqueness constraint in
    the c-h code formulation is too strong
  • A c-token should not appear in two different
    tags, but can be repeated within a tag
  • A tag t is called periodic if it is a prefix of
    (α)^∞ for some period α
  • Periodic strings make best use of c-tokens
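
A short illustration of why periodic strings are economical with c-tokens: a tag built by repeating a period reuses the same few tokens. The helper names and the example period "AC" are hypothetical, and the tag is taken to be the shortest prefix of the repeated period with weight at least h.

```python
WT = {"A": 1, "T": 1, "C": 2, "G": 2}

def periodic_tag(period, h):
    """Shortest prefix of period + period + ... whose weight is at least h."""
    tag, w, i = [], 0, 0
    while w < h:
        b = period[i % len(period)]
        tag.append(b)
        w += WT[b]
        i += 1
    return "".join(tag)

def distinct_tokens(tag, c):
    """Set of c-tokens occurring in `tag`."""
    found = set()
    for end in range(1, len(tag) + 1):
        w = 0
        for start in range(end - 1, -1, -1):
            w += WT[tag[start]]
            if w >= c:
                found.add(tag[start:end])
                break
    return found

tag = periodic_tag("AC", 9)
print(tag, sorted(distinct_tokens(tag, 4)))  # ACACAC ['ACA', 'CAC']
```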

38
c-token factor graph, c = 4 (incomplete)
[Figure: partial graph; visible vertex labels include CC, AAG, AAC, AAAA, AAAT]
39
Vertex-disjoint Cycle Packing Problem
  • Given directed graph G, find maximum number of
    vertex disjoint directed cycles in G
  • [MT 05] APX-hard even for regular directed graphs
    with in-degree and out-degree 2
  • h − c/2 + 1 approximation factor for tag set design
    problem
  • [Salavatipour and Verstraete 05]
  • Quasi-NP-hard to approximate within Ω(log^{1−ε} n)
  • O(n^{1/2}) approximation algorithm

40
Cycle Packing Algorithm
  • Construct c-token factor graph G
  • T ← ∅
  • For all cycles C defining periodic tags, in
    increasing order of cycle length,
  • Add to T the tag defined by C
  • Remove C from G
  • Perform an alphabetic tree search and add to T
    tags consisting of unused c-tokens
  • Return T

41
Experimental Results
42
More Hybridization Constraints
[Figure: tags t1 and t2]
  • Enforced during tag assignment by
  • - Leaving some tags unassigned and distributing
    primers across multiple arrays [Ben-Dor et al.
    03]
  • - Exploiting availability of multiple primer
    candidates [MPT05]

43
Herpes B Gene Expression Assay
GenFlex Tags
                           500 tags             1000 tags            2000 tags
Tm   # pools   Pool size   # arrays   % Util.   # arrays   % Util.   # arrays   % Util.
60   1446      1           4          82.26     3          65.35     2          57.05
60   1446      5           4          88.26     3          70.95     2          63.55
67   1560      1           4          86.33     3          69.70     2          61.15
67   1560      5           4          91.86     3          76.00     2          67.20
70   1522      1           4          88.46     3          73.65     2          65.40
70   1522      5           4          92.26     2          91.10     2          70.30

Periodic Tags
                           500 tags             1000 tags            2000 tags
Tm   # pools   Pool size   # arrays   % Util.   # arrays   % Util.   # arrays   % Util.
60   1446      1           4          94.06     2          97.20     1          72.30
60   1446      5           4          96.13     2          100.00    1          72.30
67   1560      1           4          96.53     2          98.70     1          78.00
67   1560      5           4          98.00     2          99.90     1          78.00
70   1522      1           4          96.73     2          98.90     1          76.10
70   1522      5           4          97.80     2          99.80     1          76.10
44
Overview
  • Physical design of DNA arrays
  • DNA tag set design
  • Digital microfluidic biochip testing
  • Conclusions

45
Digital Microfluidic Biochips
[Srinivasan et al. 04]
  • Electrodes typically arranged in rectangular
    grid
  • Droplets moved by applying voltage to adjacent
    cell
  • Can be used for analyses of DNA, proteins,
    metabolites

46
Design Challenges
  • Testing
  • High electrode failure rate, but can reconfigure
    around failed electrodes
  • Performed both after manufacturing and concurrent
    with chip operation
  • Main objective is minimization of completion time
  • Module placement
  • Assay operations (mixing, amplification, etc.)
    can be mapped to overlapping areas of the chip if
    performed at different times
  • Droplet routing
  • When multiple droplets are routed simultaneously
    must prevent accidental droplet merging or
    interference







[Figure: droplet merging and interference]
47
Concurrent Testing Problem
  • GIVEN
  • Input/Output cells
  • Position of obstacles (cells in use by ongoing
    reactions)
  • FIND
  • Trajectories for test droplets such that
  • Every non-blocked cell is visited by at least one
    test droplet
  • Droplet trajectories meet non-merging and
    non-interference constraints
  • Completion time is minimized

Defect model: test droplet gets stuck at
defective electrode
[Su et al. 04]: ILP-based solution for the single
test droplet case; heuristic for multiple
input-output pairs with a single test droplet per pair
48
ILP Formulation for Unconstrained Number of Droplets
  • Each cell (i,j) visited at least once
  • Droplet conservation
  • No droplet merging
  • No droplet interference
  • Minimize completion time

49
Special Case








  • N×N chip
  • I/O cells in opposite corners
  • No obstacles
  • ⇒ Single-droplet solution needs N^2 cycles

50
Lower Bound
  • Claim: Completion time is at least 4N − 4 cycles

Proof: In each cycle, each of the k droplets
places 1 dollar in its current cell
  • ≥ 3k(k-1)/2 dollars paid waiting to depart
  • ≥ 1 dollar in each cell
  • ≥ k dollars in each diagonal
  • ≥ 3k(k-1)/2 dollars paid waiting for last droplet
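
One way the four counts might combine into a bound of the form 4N − 4 is sketched below. This is a reconstruction, not the authors' proof. It assumes that the counts are disjoint, that k ≤ N, that diagonal d of the N×N array has min(d, 2N − d) cells, and that the k droplets together pay kT dollars over T cycles.

```latex
\begin{align*}
kT &\ge \underbrace{\sum_{d=1}^{2N-1} \max\bigl(k,\ \min(d,\,2N-d)\bigr)}_{\text{cells and diagonals}}
      \;+\; \underbrace{2\cdot\frac{3k(k-1)}{2}}_{\text{waiting}} \\
   &= N^2 + k(k-1) + 3k(k-1) \;=\; N^2 + 4k(k-1), \\
T  &\ge \frac{N^2}{k} + 4k - 4 \;\ge\; 2\sqrt{4N^2} - 4 \;=\; 4N - 4 .
\end{align*}
```

The last inequality is the arithmetic-geometric mean bound, with equality at k = N/2.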
51
Stripe Algorithm with N/3 Droplets









52
Stripe Algorithm with Obstacles of width Q
  • Divide array into vertical stripes of width Q + 1
  • Use one droplet per stripe
  • All droplets visit cells in assigned stripes in
    parallel
  • In case of interference, droplet in left stripe
    waits for droplet in right stripe

53
Results for 120x120 Chip, 2x2 Obstacles
Obstacle    Average completion time (cycles)              k=40 vs. k=1
area (%)    k=1      k=12    k=20     k=30     k=40       speed-up
0           14400    1412    944      710      593        24x
1           14256    1420    953.4    715.2    598.8      24x
5           13680    1473    982.8    725      596.2      23x
10          12960    1490    1010.8   734.8    592.6      22x
15          12240    1501    1025.8   730.8    588.2      21x
20          11520    1501    1046.8   738.4    580.8      20x
25          10800    1501    1071     736.6    570        19x

≈ 20x decrease in completion time by using
multiple droplets
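
The speed-up column can be rechecked directly from the k=1 and k=40 columns. The snippet below is a sanity check, assuming the first column is the obstacle area in percent; the k=1 times then equal the number of free cells on the 120x120 chip. The variable and function names are hypothetical.

```python
# (obstacle area in %, completion time for k=1, completion time for k=40)
rows = [(0, 14400, 593), (1, 14256, 598.8), (5, 13680, 596.2), (10, 12960, 592.6),
        (15, 12240, 588.2), (20, 11520, 580.8), (25, 10800, 570)]

def speedups(rows):
    """Rounded ratio of single-droplet to 40-droplet completion time."""
    return [round(t1 / t40) for _, t1, t40 in rows]

def free_cells(area_percent, n=120):
    """Cells not covered by obstacles on an n x n chip."""
    return round(n * n * (100 - area_percent) / 100)

print(speedups(rows))                       # [24, 24, 23, 22, 21, 20, 19]
print([free_cells(a) for a, _, _ in rows])  # matches the k=1 column
```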
54
Overview
  • Physical design of DNA arrays
  • DNA tag set design
  • Digital microfluidic biochip testing
  • Conclusions

55
Conclusions
  • Biochip design is a fertile area of applications
  • Combinatorial optimization techniques can yield
    significant improvements in assay
    quality/throughput
  • Very dynamic area, driver applications and
    underlying technologies change rapidly

56
Acknowledgments
  • Physical design of DNA arrays: A.B. Kahng, P.
    Pevzner, S. Reda, X. Xu, A. Zelikovsky
  • Tag set design: D. Trinca
  • Testing of digital microfluidic biochips: R.
    Garfinkel, B. Pasaniuc, A. Zelikovsky
  • Financial support: UCONN Research Foundation, NSF
    awards 0546457 and 0543365

57
Questions?