Minimum PCR Primer Set Selection with Amplification Length and Uniqueness Constraints

1
Minimum PCR Primer Set Selection with
Amplification Length and Uniqueness Constraints
  • Ion Mandoiu
  • University of Connecticut
  • CS&E Department

2
Combinatorial Optimization Applications in
Bioinformatics
  • Fast-growing number of applications
  • Dynamic Programming and Integer Programming in
    sequence alignment
  • TSP and Euler paths in DNA sequencing
  • Integer Programming in haplotype inference
  • Integer Programming and approximation algorithms
    for efficient pathogen identification (string
    barcoding)

3
High-Throughput Assay Design
  • New source of combinatorial problems
    • Microarray probe selection
    • Mask design for Affy arrays
    • Universal tag arrays
    • Self-assembling microarrays
    • Quality control
  • This talk: multiplex PCR primer set selection
  • Optimization goals
    • Improved speed
    • High reliability
    • Reduced cost

4
Outline
  • Motivation and problem formulations
  • Greedy algorithm for primer set selection with
    amplification length constraints
  • LP-rounding algorithm for primer set selection
    with uniqueness constraints
  • Experimental results
  • Conclusions

5
Uniplex PCR

6
Primer Pair Selection Problem
[Figure: forward and reverse primers (5'/3' ends marked) hybridizing on either side of the amplification locus; distance labels ≤ L]
  • Given
    • Genomic sequence around the amplification locus
    • Primer length k
    • Amplification upper bound L
  • Find: forward and reverse primers of length k
    that hybridize within a distance of L of each
    other and optimize amplification efficiency
    (melting temperatures, secondary structure,
    cross-hybridization, etc.)
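A minimal Python sketch of the uniplex problem above, under simplifying assumptions that are ours, not the slides': the locus is a single index `pos` in `seq`, the amplicon runs from the forward primer's 5' end to the end of the reverse primer's binding site, and "amplification efficiency" is approximated only by the Wallace rule of thumb Tm = 2·(A+T) + 4·(G+C) °C, preferring pairs with matched Tm. The names `best_primer_pair`, `wallace_tm`, and `revcomp` are hypothetical.

```python
# Sketch only: real primer design also checks secondary structure,
# cross-hybridization, GC clamps, etc., which are ignored here.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return s.translate(COMPLEMENT)[::-1]


def wallace_tm(primer: str) -> int:
    """Rule-of-thumb melting temperature (degrees C) for short oligos."""
    gc = primer.count("G") + primer.count("C")
    return 2 * (len(primer) - gc) + 4 * gc


def best_primer_pair(seq: str, pos: int, k: int, L: int):
    """Return (forward, reverse, amplicon_length), or None if no pair fits L."""
    best = None
    # Forward binding sites seq[a:a+k] end before the locus.
    for a in range(0, pos - k + 1):
        fwd = seq[a:a + k]
        # Reverse binding sites seq[b-k:b] start after the locus.
        for b in range(pos + 1 + k, len(seq) + 1):
            length = b - a
            if length > L:
                break  # a larger b only lengthens the amplicon
            rev = revcomp(seq[b - k:b])
            # Prefer matched Tm, then the shorter amplicon.
            score = (abs(wallace_tm(fwd) - wallace_tm(rev)), length)
            if best is None or score < best[0]:
                best = (score, fwd, rev, length)
    return None if best is None else best[1:]
```

For example, with `seq = "A" * 10 + "G" + "C" * 10`, `pos = 10` and k = 4, the shortest possible amplicon is 2k + 1 = 9 bases, so L = 8 yields `None` and L = 9 yields `("AAAA", "GGGG", 9)`.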

7
Motivation for Primer Set Selection (1)
  • Spotted microarray synthesis [Fernandes and
    Skiena '02]
    • Need a unique pair for each amplification product,
      but primers can be reused to minimize cost
    • Potential to reduce the number of primers from
      $O(n)$ to $O(n^{1/2})$ for n products

8
Motivation for Primer Set Selection (2)
  • SNP genotyping
    • Thousands of SNPs must be genotyped using
      hybridization-based methods (e.g., SBE)
    • Selective PCR amplification is needed to improve
      the accuracy of detection steps (whole-genome
      amplification is not appropriate)
    • No need for unique amplification!
    • Primer minimization is critical
      • Fewer primers to buy
      • Fewer multiplex PCR reactions

9
Primer Set Selection Problem
  • Given
    • Genomic sequences around each amplification
      locus
    • Primer length k
    • Amplification upper bound L
  • Find
    • Minimum-size set S of primers of length k such
      that, for each amplification locus, there are two
      primers in S hybridizing to the forward and
      reverse sequences within a distance of L of each
      other
  • For some applications, S should contain a unique
    pair of primers amplifying each locus
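The feasibility condition of this problem can be sketched in Python. The input format is hypothetical: `hyb[i] = (fwd, rev)`, where `fwd` maps each primer that hybridizes to the forward sequence of locus i to its distance from the locus, and `rev` does the same for the reverse sequence. We assume "within a distance of L of each other" means d_fwd + d_rev ≤ L; the uniqueness variant is not modeled.

```python
from typing import Dict, List, Set, Tuple

# hyb[i] = (fwd, rev): primer -> distance from locus i (hypothetical format).
Hyb = List[Tuple[Dict[str, int], Dict[str, int]]]


def amplifying_pairs(S: Set[str], hyb: Hyb, L: int, i: int) -> List[Tuple[str, str]]:
    """(forward, reverse) primer pairs from S that amplify locus i."""
    fwd, rev = hyb[i]
    return [(f, r)
            for f, df in fwd.items() if f in S
            for r, dr in rev.items() if r in S and df + dr <= L]


def is_feasible(S: Set[str], hyb: Hyb, L: int) -> bool:
    """True if every locus has at least one amplifying pair in S."""
    return all(amplifying_pairs(S, hyb, L, i) for i in range(len(hyb)))
```

The same primer may appear in the forward map of one locus and the reverse map of another, which is what lets a small set S serve many loci.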

10
Previous Work (1)
  • [Pearson et al. '96], [Linhart and Shamir '02],
    [Souvenir et al. '03]
    • Separately select forward and reverse primers
    • To enforce the bound of L on amplification length,
      select only primers that are within a distance of
      L/2 of the target SNP
  • Ignores half of the feasible primer pairs
    • The solution size can grow by a factor of $O(n)$
      by ignoring them!
  • The greedy set cover algorithm gives an $O(\ln n)$
    approximation factor for this formulation
    • Cannot approximate better unless P = NP
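The fixed-window formulation can be sketched as classic greedy set cover in Python. Elements are (locus, side) pairs, and a primer may cover a side only if it binds within L/2 of the SNP. The `hyb[i] = (fwd, rev)` distance-map input and the name `greedy_fixed_window` are hypothetical, and this illustrates the formulation rather than reproducing the cited papers' code.

```python
def greedy_fixed_window(hyb, L):
    """Greedy set cover under the L/2 restriction; returns primers or None.

    hyb[i] = (fwd, rev): primer -> distance from locus i (hypothetical format).
    """
    covers = {}  # primer -> set of (locus, side) elements it may cover
    for i, (fwd, rev) in enumerate(hyb):
        for side, dist in (("f", fwd), ("r", rev)):
            for p, d in dist.items():
                if 2 * d <= L:  # keep only primers within L/2 of the SNP
                    covers.setdefault(p, set()).add((i, side))
    uncovered = {(i, s) for i in range(len(hyb)) for s in ("f", "r")}
    chosen = []
    while uncovered:
        # Primer covering the most uncovered elements (ties: first inserted).
        best = max(covers, key=lambda p: len(covers[p] & uncovered), default=None)
        if best is None or not covers[best] & uncovered:
            return None  # infeasible under the L/2 restriction
        chosen.append(best)
        uncovered -= covers[best]
    return chosen
```

On the one-locus instance `[({"a": 2}, {"b": 7})]` with L = 10 this returns `None`, even though the pair (a, b) has total distance 9 ≤ L: exactly the kind of pair the L/2 restriction discards.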

11
Previous Work (2)
  • [Fernandes and Skiena '02] model primer selection as
    a minimum multicolored subgraph problem
    • Vertices of the graph correspond to candidate
      primers
    • There is an edge of color i between primers u
      and v if they hybridize to the i-th forward and
      reverse sequences within a distance of L
    • The goal is to find a minimum-size set of vertices
      inducing edges of all colors
  • No non-trivial approximation factor was known
    previously
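For intuition on small cases, the minimum multicolored subgraph problem can be solved exactly by brute force. In this hypothetical Python sketch, `edges` is a list of `(u, v, color)` triples with colors `0..n_colors-1`; the search is exponential in the number of vertices, so it is only suitable for toy instances.

```python
from itertools import combinations


def induced_colors(S, edges):
    """Colors of the edges whose endpoints both lie in S."""
    return {c for u, v, c in edges if u in S and v in S}


def min_multicolored_subgraph(vertices, edges, n_colors):
    """Smallest vertex set inducing all n_colors colors, or None."""
    # Try candidate sets in order of increasing size.
    for size in range(len(vertices) + 1):
        for S in combinations(sorted(vertices), size):
            if len(induced_colors(set(S), edges)) == n_colors:
                return set(S)
    return None
```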

12
Selection w/o Uniqueness Constraints
  • Can be seen as a simultaneous set covering
    problem
    • The ground set is partitioned into n disjoint
      sets, each with 2L elements
    • The goal is to select a minimum number of sets
      (primers) that cover at least half of the
      elements in each partition
  • Naïve modifications of the greedy set cover
    algorithm do not work
  • Key idea: use a potential function $\Phi$ as a
    measure of infeasibility; for a partial solution P,
    $\Phi(P)$ is the minimum number of elements that
    still need to be covered,
    $\Phi(P) = \sum_{i=1}^{n} \max\{0,\, L - c_i(P)\}$,
    where $c_i(P)$ is the number of covered elements
    in partition i
    • Initially, $\Phi = nL$
    • For feasible solutions, $\Phi = 0$

13
Potential-Function Driven Greedy
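  • Select a primer that decreases the potential
    function $\Phi$ by the largest amount (breaking ties
    arbitrarily)
  • Repeat until feasibility is achieved
  • Lemma: Each greedy selection reduces $\Phi$ by a
    factor of at least $(1 - 1/\mathrm{OPT})$
  • Theorem: The number of primers selected by the
    greedy algorithm is at most $1 + \ln(nL)$ times
    the optimum
    • After t selections, $\Phi \le nL\,(1 - 1/\mathrm{OPT})^t \le nL\,e^{-t/\mathrm{OPT}}$;
      since $\Phi$ is an integer, it reaches 0 once $t > \mathrm{OPT}\ln(nL)$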
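A Python sketch of the potential-function driven greedy. The reduction from primers to element sets is our own illustrative construction, consistent with the 2L-element partitions of the previous slide: locus i has forward elements (i, "f", j) and reverse elements (i, "r", j) for j = 1..L, and a primer binding at distance d ≥ 1 on a side covers that side's elements with j ≤ L − d. Covering at least L of the 2L elements of locus i is then equivalent to having a pair with d_f + d_r ≤ L. The input format `hyb[i] = (fwd, rev)` and all function names are hypothetical; $\Phi$ is recomputed from scratch at every step for clarity rather than speed.

```python
def element_sets(hyb, L):
    """Map each primer to the elements (i, side, j) it covers.

    Illustrative reduction: a primer at distance d >= 1 on one side of
    locus i covers that side's elements j = 1..L - d (none if d >= L).
    """
    sets = {}
    for i, (fwd, rev) in enumerate(hyb):
        for side, dist in (("f", fwd), ("r", rev)):
            for p, d in dist.items():
                sets.setdefault(p, set()).update(
                    (i, side, j) for j in range(1, L - d + 1))
    return sets


def potential(covered, n, L):
    """Phi = sum over loci of max(0, L - number of covered elements)."""
    counts = [0] * n
    for i, _side, _j in covered:
        counts[i] += 1
    return sum(max(0, L - c) for c in counts)


def greedy_potential(hyb, L):
    """Potential-driven greedy; returns selected primers or None."""
    n, sets = len(hyb), element_sets(hyb, L)
    chosen, covered = [], set()
    phi = potential(covered, n, L)  # equals n * L before any selection
    while phi > 0:
        # Pick the primer whose addition lowers Phi the most
        # (ties: the first primer in insertion order).
        best, best_phi = None, phi
        for p, s in sets.items():
            new_phi = potential(covered | s, n, L)
            if new_phi < best_phi:
                best, best_phi = p, new_phi
        if best is None:
            return None  # no primer lowers Phi: instance is infeasible
        chosen.append(best)
        covered |= sets[best]
        phi = best_phi
    return chosen
```

Because a far-away primer covers only a few elements, the potential naturally rewards primers close to the SNP while still allowing any pair with d_f + d_r ≤ L, rather than fixing an L/2 window per side.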

14
Selection w/ Uniqueness Constraints
  • Can be modeled as a minimum multicolored subgraph
    problem: add an edge of color i between two
    primers if they amplify the i-th SNP and do not
    amplify any other SNP
  • Trivial approximation algorithm: select 2
    primers for each SNP
    • $O(n^{1/2})$ approximation, since at least $n^{1/2}$
      primers are required by every solution
  • Non-trivial approximation?
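The counting argument behind the $O(n^{1/2})$ bound can be written out as follows, allowing (as an assumption) a primer to pair with itself. Under uniqueness constraints the pair amplifying SNP i amplifies no other SNP, so n SNPs need n distinct pairs, while p primers form at most p(p+1)/2 pairs; the trivial algorithm uses at most 2n primers.

```latex
n \;\le\; \frac{p(p+1)}{2} \;\le\; p^{2}
\;\;\Longrightarrow\;\; \mathrm{OPT} \ge \sqrt{n},
\qquad
\frac{|S_{\text{trivial}}|}{\mathrm{OPT}} \;\le\; \frac{2n}{\sqrt{n}} \;=\; 2\sqrt{n} \;=\; O(n^{1/2})
```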

15
Integer Program Formulation
  • Variable $x_u$ for every vertex (candidate primer) u
    • $x_u$ is set to 1 if u is selected, and to 0 otherwise
  • Variable $y_e$ for every edge e
    • $y_e$ is set to 1 if the corresponding primer pair
      is selected to amplify one of the SNPs
  • Objective: minimize $\sum_u x_u$
  • Constraints
    • For each i, $\sum_{e \text{ amplifies SNP } i} y_e \ge 1$
    • $y_e \le x_u$ for every edge e incident to u

16
LP-Rounding Algorithm
  • Solve the linear programming relaxation
  • Select each node u with probability $x_u$
  • Theorem: With probability at least 1/3, the
    number of selected nodes is within a factor of
    $O(m^{1/2} \ln n)$ of the optimum, where m is the
    maximum number of edges sharing the same color
  • For primer selection, $m \le L^2$ (at most L forward
    and L reverse candidate positions per SNP), so the
    approximation factor is $O(L \ln n)$
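A Python sketch of the rounding step, assuming an optimal fractional solution `x` (vertex to value in [0, 1]) has already been computed by some LP solver, which is not shown. Each vertex is kept with probability x[u], as on the slide. The repair loop, which adds both endpoints of one edge for every color left uncovered, is our own assumption to guarantee a feasible output and is not stated on the slides; `round_lp` is a hypothetical name.

```python
import random


def round_lp(x, edges, n_colors, rng=random):
    """Randomized rounding sketch.

    x: vertex -> fractional LP value in [0, 1] (assumed precomputed).
    edges: list of (u, v, color) triples, colors 0..n_colors-1.
    """
    # Rounding step from the slide: keep vertex u with probability x[u].
    S = {u for u, xu in x.items() if rng.random() < xu}

    # Repair step (our assumption, not from the slides): for each color
    # still missing, add both endpoints of its first listed edge.
    # Raises StopIteration if some color has no edge (infeasible instance).
    for c in range(n_colors):
        if not any(col == c and u in S and v in S for u, v, col in edges):
            u, v, _ = next(e for e in edges if e[2] == c)
            S |= {u, v}
    return S
```

Since the theorem's size bound holds only with probability at least 1/3, one natural use is to run the rounding several times and keep the smallest set; that, too, is a suggestion rather than something the slides state.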

17
Experimental Setting
  • SNP sets extracted from NCBI databases and
    randomly generated
  • C/C++ code run on a 2.8 GHz Dell PowerEdge
    running Linux
  • Compared algorithms
    • G-FIX: greedy primer cover algorithm of Pearson
      et al.
      • Primers restricted to be within L/2 of
        amplified SNPs
    • G-VAR: naïve modification of G-FIX
      • For each SNP, the first selected primer can be L
        bases away from the SNP
      • If the first selected primer is $L_1$ bases away
        from the SNP, the opposite sequence is truncated
        to a length of $L - L_1$
    • G-POT: potential-function-driven greedy
      algorithm
    • MIPS-PT: iterative beam-search heuristic of
      Souvenir et al. (WABI '03)
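A Python sketch of the G-VAR truncation rule as we read it from the bullets above: greedy set cover over (locus, side) elements, where each side of a locus starts with a window of L bases and, once one side is covered by a primer $L_1$ bases away, the opposite side's window shrinks to $L - L_1$. This is an illustration, not the benchmarked code; `greedy_var` and the `hyb[i] = (fwd, rev)` distance-map input are hypothetical.

```python
def greedy_var(hyb, L):
    """G-VAR-style greedy (our reading of the slide); returns primers or None.

    hyb[i] = (fwd, rev): primer -> distance from locus i (hypothetical format).
    """
    def dist(i, s, p):
        # Distance of primer p on side s of locus i; L + 1 if p does not bind.
        return hyb[i][0 if s == "f" else 1].get(p, L + 1)

    sides = [(i, s) for i in range(len(hyb)) for s in ("f", "r")]
    window = {e: L for e in sides}  # allowed distance per (locus, side)
    uncovered = set(sides)
    primers = sorted({p for fwd, rev in hyb for p in (*fwd, *rev)})
    chosen = []

    def gain(p):
        return [e for e in uncovered if dist(*e, p) <= window[e]]

    while uncovered:
        best = max(primers, key=lambda p: len(gain(p)), default=None)
        if best is None or not gain(best):
            return None  # no primer fits the remaining windows
        chosen.append(best)
        # Forward before reverse, so a window shrunk in this step is respected.
        for i, s in sorted(gain(best)):
            d = dist(i, s, best)
            if d > window[(i, s)]:
                continue
            uncovered.discard((i, s))
            other = (i, "r" if s == "f" else "f")
            if other in uncovered:
                window[other] = L - d  # truncate the opposite sequence
    return chosen
```

Because the first primer chosen for a SNP fixes the opposite window, an early far-away choice can leave no usable primer on the other side, so this kind of naïve variant can fail on instances that are in fact feasible.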

18
Experimental Results, NCBI tests
19
Experimental Results, k = 8
20
Experimental Results, k = 10
21
Experimental Results, k = 12
22
Runtime, k = 10
23
Conclusions
  • New combinatorial optimization problems arising
    in the area of high-throughput assay design
  • Theoretical insights (such as approximation
    results) give algorithms with significant
    practical improvements
  • Choosing the proper problem model is critical to
    solution efficiency

24
Ongoing Work and Open Problems
  • Allow degenerate primers
  • Incorporate more biochemical constraints into the
    model (melting temperature, secondary structure,
    cross-hybridization, etc.)
  • Close the gap between the $\Omega(\ln n)$
    inapproximability bound and the $O(L \ln n)$
    approximation factor for the minimum multicolored
    subgraph problem
  • Approximation algorithms for partitioning into
    multiple multiplexed PCR reactions (Aumann et al.,
    WABI '03)

25
Acknowledgments
  • Kishori Konwar
  • Alex Russell
  • Alex Shvartsman
  • Financial support from UCONN Research Foundation