Title: Minimum PCR Primer Set Selection with Amplification Length and Uniqueness Constraints
1 Minimum PCR Primer Set Selection with Amplification Length and Uniqueness Constraints
- Ion Mandoiu
- University of Connecticut
- CSE Department
2 Combinatorial Optimization Applications in Bioinformatics
- Fast growing number of applications
- Dynamic programming and integer programming in sequence alignment
- TSP and Euler paths in DNA sequencing
- Integer programming in haplotype inference
- Integer programming and approximation algorithms for efficient pathogen identification (string barcoding)
3 High-Throughput Assay Design
- New source of combinatorial problems
- Microarray probe selection
- Mask design for Affy arrays
- Universal tag arrays
- Self-assembling microarrays
- Quality control
- This talk: multiplex PCR primer set selection
- Optimization goals
- Improved speed
- High reliability
- Reduced COST
4 Outline
- Motivation and problem formulations
- Greedy algorithm for primer set selection with amplification length constraints
- LP-rounding algorithm for primer set selection with uniqueness constraints
- Experimental results
- Conclusions
5 Uniplex PCR
6 Primer Pair Selection Problem
[Figure: forward and reverse primers on the two strands (5' to 3') flanking the amplification locus; distance labels ≤ L]
- Given
- Genomic sequence around amplification locus
- Primer length k
- Amplification upper bound L
- Find: forward and reverse primers of length k that hybridize within a distance of L of each other and optimize amplification efficiency (melting temperatures, secondary structure, cross-hybridization, etc.)
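The search space of this problem can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `candidate_primer_pairs`, the 0-based `locus` index, and the reading of "within a distance of L" as "amplicon length at most L" are all assumptions made for the example, and no amplification-efficiency criterion is modeled.

```python
from typing import List, Tuple

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def candidate_primer_pairs(seq: str, locus: int, k: int, L: int) -> List[Tuple[str, str, int]]:
    """Enumerate (forward primer, reverse primer, amplicon length) triples.

    Assumptions (not from the slides): `locus` is a 0-based index into `seq`,
    the forward primer site lies entirely before the locus, the reverse primer
    site lies entirely after it, and the amplicon spans from the start of the
    forward site to the end of the reverse site.
    """
    pairs = []
    for i in range(0, locus - k + 1):                 # forward site seq[i:i+k]
        for j in range(locus + 1, len(seq) - k + 1):  # reverse site seq[j:j+k]
            length = j + k - i
            if length <= L:
                forward = seq[i:i + k]
                reverse = reverse_complement(seq[j:j + k])
                pairs.append((forward, reverse, length))
    return pairs


if __name__ == "__main__":
    for pair in candidate_primer_pairs("ACGTACGTAC", locus=5, k=2, L=6):
        print(pair)
```

A real design tool would then rank these candidates by the efficiency criteria listed above; the sketch only enforces the length constraint.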
7 Motivation for Primer Set Selection (1)
- Spotted microarray synthesis (Fernandes and Skiena 02)
- Need unique pair for each amplification product, but primers can be reused to minimize cost
- Potential to reduce primers from O(n) to O(n^{1/2}) for n products
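A short counting argument, added here as an illustration, suggests where the n^{1/2} figure comes from. The symbol p (the number of primers in the set) is introduced for this sketch, and it is assumed that any two primers may in principle form a pair. Since each of the n products needs its own pair:

```latex
\[
  n \;\le\; \binom{p}{2} \;=\; \frac{p(p-1)}{2} \;<\; \frac{p^{2}}{2}
  \quad\Longrightarrow\quad
  p \;>\; \sqrt{2n} \;\ge\; n^{1/2}.
\]
```

So no primer set can be smaller than order n^{1/2}, and reuse can approach that size only when the sequences share enough candidate primers.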
8 Motivation for Primer Set Selection (2)
- SNP genotyping
- Thousands of SNPs that must be genotyped using hybridization-based methods (e.g., SBE)
- Selective PCR amplification needed to improve accuracy of detection steps (whole-genome amplification not appropriate)
- No need for unique amplification!
- Primer minimization is critical
- Fewer primers to buy
- Fewer multiplex PCR reactions
9 Primer Set Selection Problem
- Given
- Genomic sequences around each amplification locus
- Primer length k
- Amplification upper bound L
- Find
- Minimum size set S of primers of length k such that, for each amplification locus, there are two primers in S hybridizing to the forward and reverse sequences within a distance of L of each other
- For some applications, S should contain a unique pair of primers amplifying each locus
10 Previous Work (1)
- Pearson et al. 96, Linhart & Shamir 02, Souvenir et al. 03
- Separately select forward and reverse primers
- To enforce bound of L on amplification length, select only primers that are within a distance of L/2 of the target SNP
- Ignores half of the feasible primer pairs
- Solution can increase by a factor of O(n) by ignoring them!
- Greedy set cover algorithm gives O(ln n) approximation factor for this formulation
- Cannot approximate better unless P = NP
11 Previous Work (2)
- Fernandes & Skiena 02 model primer selection as a minimum multicolored subgraph problem
- Vertices of the graph correspond to candidate primers
- There is an edge colored by color i between primers u and v if they hybridize to the i-th forward and reverse sequences within a distance of L
- Goal is to find a minimum size set of vertices inducing edges of all colors
- No non-trivial approximation factor known previously
12 Selection w/o Uniqueness Constraints
- Can be seen as a simultaneous set covering problem
- The ground set is partitioned into n disjoint sets, each with 2L elements
- The goal is to select a minimum number of sets (primers) that cover at least half of the elements in each partition
- Naïve modifications of the greedy set cover algorithm do not work
- Key idea: use a potential function Φ as a measure of infeasibility; for a partial solution P, Φ is the minimum number of not-yet-covered elements that must still be covered
- Initially, Φ = nL
- For feasible solutions, Φ = 0
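One way to write the potential function down explicitly is sketched below. The notation is introduced for this illustration and is not from the slides: Φ denotes the potential function, and C_i(P) denotes the set of elements of the i-th partition covered by the primers in the partial solution P.

```latex
\[
  \Phi(P) \;=\; \sum_{i=1}^{n} \max\bigl\{\,0,\; L - |C_i(P)|\,\bigr\}
\]
\[
  \Phi(\emptyset) \;=\; nL,
  \qquad
  \Phi(P) = 0 \;\Longleftrightarrow\; |C_i(P)| \ge L \ \text{ for all } i = 1,\dots,n .
\]
```

Under this reading, each partition of 2L elements contributes its remaining deficit toward the "at least half covered" requirement, which matches the two boundary values stated above.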
13 Potential-Function Driven Greedy
- Select a primer that decreases the potential function Φ by the largest amount (breaking ties arbitrarily)
- Repeat until feasibility is achieved
- Lemma: each greedy selection reduces Φ by a factor of at least (1 - 1/OPT)
- Theorem: the number of primers selected by the greedy algorithm is at most a factor of ln(nL) larger than the optimum
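The greedy rule above can be sketched in Python on an abstract set-cover instance. This is a minimal illustration, not the authors' implementation: the names `gain` and `greedy_primer_cover`, the encoding of elements as `(partition index, element id)` pairs, and the `coverage` dictionary are all assumptions made for the example. The potential is taken to be the total remaining deficit toward covering half of each partition.

```python
from typing import Dict, Hashable, List, Set, Tuple

# Hypothetical encoding: an element is (partition index, element id).
Element = Tuple[int, Hashable]


def gain(elems: Set[Element], covered: List[Set[Hashable]], need: List[int]) -> int:
    """Decrease in the potential if a primer covering `elems` were added."""
    newly: Dict[int, int] = {}
    for part, elem in elems:
        if elem not in covered[part]:
            newly[part] = newly.get(part, 0) + 1
    total = 0
    for part, added in newly.items():
        deficit = max(0, need[part] - len(covered[part]))
        total += min(deficit, added)  # covering beyond the requirement earns nothing
    return total


def greedy_primer_cover(coverage: Dict[Hashable, Set[Element]],
                        part_sizes: List[int]) -> List[Hashable]:
    """Pick primers until every partition is at least half covered."""
    need = [(size + 1) // 2 for size in part_sizes]  # size 2L gives L
    covered: List[Set[Hashable]] = [set() for _ in part_sizes]
    phi = sum(need)                                   # initial potential
    chosen: List[Hashable] = []
    while phi > 0:
        remaining = [p for p in coverage if p not in chosen]
        best = max(remaining, key=lambda p: gain(coverage[p], covered, need),
                   default=None)
        best_gain = gain(coverage[best], covered, need) if best is not None else 0
        if best_gain == 0:
            raise ValueError("no feasible primer set for this instance")
        for part, elem in coverage[best]:
            covered[part].add(elem)
        chosen.append(best)
        phi -= best_gain
    return chosen


if __name__ == "__main__":
    toy = {
        "A": {(0, 0), (0, 1), (1, 0)},
        "B": {(1, 1), (1, 2)},
        "C": {(0, 2)},
        "D": {(1, 3)},
    }
    print(greedy_primer_cover(toy, part_sizes=[4, 4]))
```

Ties are broken by dictionary order here, which is one arbitrary choice among many; the sketch recomputes every gain in each round for clarity rather than speed.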
14 Selection w/ Uniqueness Constraints
- Can be modeled as a minimum multicolored subgraph problem: add an edge colored by color i between two primers if they amplify the i-th SNP and do not amplify any other SNP
- Trivial approximation algorithm: select 2 primers for each SNP
- O(n^{1/2}) approximation, since at least n^{1/2} primers are required by every solution
- Non-trivial approximation?
15 Integer Program Formulation
- Variable x_u for every vertex (candidate primer) u
- x_u set to 1 if u is selected, and to 0 otherwise
- Variable y_e for every edge e
- y_e set to 1 if the corresponding primer pair is selected to amplify one of the SNPs
- Objective: minimize the sum of the x_u's
- Constraints
- for each i, the sum of y_e over edges e amplifying SNP i is ≥ 1
- y_e ≤ x_u for every e incident to u
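Written out compactly, the formulation above might look as follows. The symbols V (candidate primers), E (edges), and E_i (edges of color i, i.e. primer pairs amplifying SNP i) are notation introduced for this sketch, and the inequality directions are reconstructed from the covering structure of the problem rather than copied from the slide.

```latex
\begin{align*}
  \text{minimize}   \quad & \sum_{u \in V} x_u \\
  \text{subject to} \quad & \sum_{e \in E_i} y_e \;\ge\; 1
                            && i = 1, \dots, n \\
                          & y_e \;\le\; x_u
                            && \text{for every } e \in E \text{ and every endpoint } u \text{ of } e \\
                          & x_u,\, y_e \in \{0, 1\}
                            && u \in V,\ e \in E
\end{align*}
```

Relaxing the last line to 0 ≤ x_u, y_e ≤ 1 gives the linear programming relaxation.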
16 LP-Rounding Algorithm
- Solve linear programming relaxation
- Select node u with probability x_u
- Theorem: with probability at least 1/3, the number of selected nodes is within a factor of O(m^{1/2} ln n) of the optimum, where m is the maximum number of edges sharing the same color
- For primer selection, m ≤ L^2, so the approximation factor is O(L ln n)
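The rounding step alone can be sketched as below. This is an illustration under assumptions: the fractional values `x` are taken as given (solving the LP is not shown), the function name `round_lp_solution` and the edge encoding `(u, v, color)` are hypothetical, and the sketch only reports which colors end up without a selected edge; the slides do not say how such colors are handled.

```python
import random
from typing import Dict, Hashable, List, Set, Tuple

# Hypothetical encoding: (primer u, primer v, color i).
Edge = Tuple[Hashable, Hashable, int]


def round_lp_solution(x: Dict[Hashable, float],
                      edges: List[Edge],
                      rng: random.Random) -> Tuple[Set[Hashable], Set[int]]:
    """Select each node u independently with probability x[u].

    Returns the selected nodes and the set of colors for which no edge has
    both endpoints selected.
    """
    selected = {u for u, value in x.items() if rng.random() < value}
    covered = {color for u, v, color in edges if u in selected and v in selected}
    all_colors = {color for _, _, color in edges}
    return selected, all_colors - covered


if __name__ == "__main__":
    toy_edges = [("a", "b", 0), ("b", "c", 1)]
    toy_x = {"a": 0.5, "b": 1.0, "c": 0.5}  # made-up fractional values
    print(round_lp_solution(toy_x, toy_edges, random.Random(0)))
```

Passing an explicit `random.Random` instance keeps runs reproducible, which is convenient when comparing repeated rounding attempts.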
17 Experimental Setting
- SNP sets extracted from NCBI databases and randomly generated
- C/C++ code run on a 2.8 GHz Dell PowerEdge running Linux
- Compared algorithms
- G-FIX: greedy primer cover algorithm of Pearson et al.
- Primers restricted to be within L/2 of amplified SNPs
- G-VAR: naïve modification of G-FIX
- For each SNP, the first selected primer can be up to L bases away from the SNP
- If the first selected primer is L_1 bases away from the SNP, the opposite sequence is truncated to a length of L - L_1
- G-POT: potential function driven greedy algorithm
- MIPS-PT: iterative beam-search heuristic of Souvenir et al. (WABI 03)
18 Experimental Results, NCBI tests
19 Experimental Results, k = 8
20 Experimental Results, k = 10
21 Experimental Results, k = 12
22 Runtime, k = 10
23 Conclusions
- New combinatorial optimization problems arising in the area of high-throughput assay design
- Theoretical insights (such as approximation results) give algorithms with significant practical improvements
- Choosing the proper problem model is critical to solution efficiency
24 Ongoing Work and Open Problems
- Allow degenerate primers
- Incorporate more biochemical constraints into the model (melting temperature, secondary structure, cross-hybridization, etc.)
- Close gap between O(ln n) inapproximability bound and O(L ln n) approximation factor for minimum multicolored subgraph problem
- Approximation algorithms for partition into multiple multiplexed PCR reactions (Aumann et al., WABI 03)
25 Acknowledgments
- Kishori Konwar
- Alex Russell
- Alex Shvartsman
- Financial support from UCONN Research Foundation