Minimum PCR Primer Set Selection with Amplification Length and Uniqueness Constraints

About This Presentation

Description:

Minimum PCR Primer Set Selection with Amplification Length and Uniqueness Constraints Ion Mandoiu University of Connecticut CS&E Department Combinatorial Optimization ... –

Slides: 25

Provided by: dnaEngrU

Learn more at: https://dna.engr.uconn.edu

Transcript and Presenter's Notes

1
Minimum PCR Primer Set Selection with
Amplification Length and Uniqueness Constraints

Ion Mandoiu
University of Connecticut
CS&E Department

2
Combinatorial Optimization Applications in Bioinformatics

Fast-growing number of applications
Dynamic programming and integer programming in sequence alignment
TSP and Euler paths in DNA sequencing
Integer programming in haplotype inference
Integer programming and approximation algorithms for efficient pathogen identification (string barcoding)

3
High-Throughput Assay Design

New source of combinatorial problems
Microarray probe selection
Mask design for Affy arrays
Universal tag arrays
Self-assembling microarrays
Quality control
This talk: multiplex PCR primer set selection
Optimization goals
Improved speed
High reliability
Reduced cost

4
Outline

Motivation and problem formulations
Greedy algorithm for primer set selection with
amplification length constraints
LP-rounding algorithm for primer set selection
with uniqueness constraints
Experimental results
Conclusions

5
Uniplex PCR

6
Primer Pair Selection Problem
[Figure: amplification locus on a double-stranded template with 5' and 3' ends labeled; a forward primer and a reverse primer, each annotated with a length bound ≤ L]

Given:
Genomic sequence around the amplification locus
Primer length k
Amplification length upper bound L
Find: forward and reverse primers of length k that hybridize within a distance of L of each other and optimize amplification efficiency (melting temperatures, secondary structure, cross-hybridization, etc.)

7
Motivation for Primer Set Selection (1)

Spotted microarray synthesis [Fernandes and Skiena 02]
Need a unique primer pair for each amplification product, but primers can be reused to minimize cost
Potential to reduce the number of primers from O(n) to O(n^{1/2}) for n products

8
Motivation for Primer Set Selection (2)

SNP Genotyping
Thousands of SNPs that must be genotyped using hybridization-based methods (e.g., SBE)
Selective PCR amplification needed to improve
accuracy of detection steps (whole-genome
amplification not appropriate)
No need for unique amplification!
Primer minimization is critical
Fewer primers to buy
Fewer multiplex PCR reactions

9
Primer Set Selection Problem

Given:
Genomic sequences around each amplification locus
Primer length k
Amplification length upper bound L
Find:
Minimum-size set S of primers of length k such that, for each amplification locus, there are two primers in S hybridizing to the forward and reverse sequences within a distance of L of each other
For some applications, S should contain a unique pair of primers amplifying each locus

10
Previous Work (1)

[Pearson et al. 96], [Linhart & Shamir 02], [Souvenir et al. 03]
- Separately select forward and reverse primers
- To enforce the bound of L on amplification length, select only primers that are within a distance of L/2 of the target SNP
Ignores half of the feasible primer pairs
Solution size can increase by a factor of O(n) by ignoring them!
The greedy set cover algorithm gives an O(ln n) approximation factor for this formulation
Cannot approximate better unless P = NP

11
Previous Work (2)

[Fernandes & Skiena 02] model primer selection as a minimum multicolored subgraph problem
Vertices of the graph correspond to candidate primers
There is an edge colored by color i between primers u and v if they hybridize to the i-th forward and reverse sequences within a distance of L
Goal is to find a minimum-size set of vertices inducing edges of all colors
No non-trivial approximation factor was previously known
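
The graph model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, none of which come from the slides: the function names are hypothetical, exact k-mer matching stands in for hybridization, each flanking sequence is written with index 0 nearest the locus, and a primer pair at offsets a and b is taken to give an amplification length of (a + k) + (b + k).

```python
from collections import defaultdict


def build_multicolored_graph(fwd, rev, k, L):
    """Return {color i: set of (forward primer, reverse primer) edges}.

    Assumptions (not from the slides): fwd[i] and rev[i] are the sequences on
    either side of locus i, each written so that index 0 is nearest the locus;
    a primer "hybridizes" where its k-mer occurs exactly; and a pair at
    offsets a and b yields an amplification length of (a + k) + (b + k).
    """
    edges = defaultdict(set)
    for i, (f, r) in enumerate(zip(fwd, rev)):
        for a in range(len(f) - k + 1):
            for b in range(len(r) - k + 1):
                if (a + k) + (b + k) <= L:
                    edges[i].add((f[a:a + k], r[b:b + k]))
    return edges


def induces_all_colors(selected, edges, n_colors):
    """True if every color has an edge with both endpoints in `selected`."""
    return all(
        any(u in selected and v in selected for u, v in edges[i])
        for i in range(n_colors)
    )


# Toy instance with two loci (made-up sequences, for illustration only).
fwd = ["ACGT", "ACTT"]
rev = ["TTGA", "TTCC"]
graph = build_multicolored_graph(fwd, rev, k=2, L=5)
print(induces_all_colors({"AC", "TT"}, graph, n_colors=2))
```

In this toy instance the two primers AC and TT induce an edge of each color, so a vertex set of size 2 covers both loci, whereas selecting a separate pair per locus would use more primers.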

12
Selection w/o Uniqueness Constraints

Can be seen as a simultaneous set covering problem
- The ground set is partitioned into n disjoint sets, each with 2L elements
The goal is to select a minimum number of sets (primers) that cover at least half of the elements in each partition
Naïve modifications of the greedy set cover algorithm do not work
Key idea: use a potential function Φ as a measure of infeasibility; for a partial solution P, Φ(P) is the minimum number of elements that must still be covered
Initially, Φ = nL
For feasible solutions, Φ = 0

13
Potential-Function Driven Greedy

Select a primer that decreases the potential function Φ by the largest amount (breaking ties arbitrarily)
Repeat until feasibility is achieved
Lemma: Each greedy selection reduces Φ by a factor of at least (1 − 1/OPT), i.e., Φ drops to at most (1 − 1/OPT) times its previous value
Theorem: The number of primers selected by the greedy algorithm is at most a factor of ln(nL) larger than the optimum
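
The greedy rule on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each primer is given as a set of (locus, element) pairs it covers, that each locus needs `need` covered elements (L in the slides' notation), and that the potential is the total number of elements still missing across loci. All names are hypothetical.

```python
def potential(covered, need):
    """Total number of elements still to be covered, summed over all loci."""
    return sum(max(0, need - len(c)) for c in covered)


def greedy_primer_cover(primers, n_loci, need):
    """Repeatedly pick the primer giving the largest drop in the potential.

    primers: dict mapping a primer name to the set of (locus, element)
             pairs it covers (an assumed encoding, not from the slides).
    """
    covered = [set() for _ in range(n_loci)]
    chosen = []
    phi = potential(covered, need)
    while phi > 0:
        best, best_phi = None, phi
        for name, elems in primers.items():
            if name in chosen:
                continue
            # Evaluate the potential if this primer were added.
            trial = [set(c) for c in covered]
            for locus, elem in elems:
                trial[locus].add(elem)
            p = potential(trial, need)
            if p < best_phi:  # ties keep the earlier candidate
                best, best_phi = name, p
        if best is None:
            raise ValueError("no primer reduces the potential; instance is infeasible")
        for locus, elem in primers[best]:
            covered[locus].add(elem)
        chosen.append(best)
        phi = best_phi
    return chosen


# Toy instance: 2 loci, each needing 2 covered elements, so the potential starts at 4.
primers = {
    "A": {(0, 0), (0, 1), (1, 0)},
    "B": {(0, 2), (0, 3)},
    "C": {(1, 1)},
    "D": {(1, 2), (1, 3)},
}
print(greedy_primer_cover(primers, n_loci=2, need=2))  # ['A', 'C']
```

Primer A is chosen first because it lowers the potential from 4 to 1. Primer B then contributes nothing, since locus 0 is already satisfied; this is the kind of over-counting that a plain "most newly covered elements" greedy would not detect.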

14
Selection w/ Uniqueness Constraints

Can be modeled as a minimum multicolored subgraph problem: add an edge colored by color i between two primers if they amplify the i-th SNP and do not amplify any other SNP
Trivial approximation algorithm: select 2 primers for each SNP
O(n^{1/2}) approximation, since at least n^{1/2} primers are required by every solution
Non-trivial approximation?
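
The n^{1/2} bound for the trivial algorithm follows from a short counting argument. The sketch below assumes that a set of p primers yields at most p^2 (forward, reverse) pairs, while uniqueness requires n distinct pairs; OPT and ALG (the optimum and the size of the trivial solution) are symbols introduced here for illustration.

```latex
\begin{align*}
n \le p^2 &\;\Longrightarrow\; \mathrm{OPT} \ge n^{1/2}, \\
\mathrm{ALG} \le 2n &\;\Longrightarrow\; \frac{\mathrm{ALG}}{\mathrm{OPT}} \le \frac{2n}{n^{1/2}} = 2\,n^{1/2} = O(n^{1/2}).
\end{align*}
```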

15
Integer Program Formulation

Variable x_u for every vertex (candidate primer) u
x_u is set to 1 if u is selected, and to 0 otherwise
Variable y_e for every edge e
y_e is set to 1 if the corresponding primer pair is selected to amplify one of the SNPs
Objective: minimize the sum of the x_u
Constraints:
For each i, the sum of y_e over all edges e amplifying SNP i is ≥ 1
y_e ≤ x_u for every edge e incident to u

16
LP-Rounding Algorithm

Solve the linear programming relaxation
Select node u with probability x_u
Theorem: With probability at least 1/3, the number of selected nodes is within a factor of O(m^{1/2} ln n) of the optimum, where m is the maximum number of edges sharing the same color.
For primer selection, m ≤ L^2 ⇒ the approximation factor is O(L ln n)
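
The rounding step can be sketched as below, taking a fractional LP solution as given rather than calling a solver. Several things here are assumptions for illustration and do not appear in the slides: the function name, the `scale` multiplier applied to the probabilities, the retry loop, and the feasibility check that every color keeps an edge with both endpoints selected.

```python
import random


def round_lp_solution(x, edges_by_color, scale=1.0, max_tries=100, seed=0):
    """Randomized rounding of fractional vertex values x[u] in [0, 1].

    Each node u is selected independently with probability min(1, scale * x[u]).
    The sample is returned if every color has an edge whose two endpoints were
    both selected; otherwise sampling is retried up to max_tries times.
    `scale`, `max_tries`, and `seed` are hypothetical knobs, not from the slides.
    """
    rng = random.Random(seed)
    for _ in range(max_tries):
        selected = {
            u for u, xu in x.items() if rng.random() < min(1.0, scale * xu)
        }
        feasible = all(
            any(u in selected and v in selected for u, v in edges)
            for edges in edges_by_color.values()
        )
        if feasible:
            return selected
    return None  # no feasible sample found within max_tries


# Made-up fractional values and edges, for illustration only.
edges_by_color = {0: {("AC", "TT"), ("CG", "TT")}, 1: {("AC", "TT")}}
x = {"AC": 1.0, "TT": 1.0, "CG": 0.0}
print(round_lp_solution(x, edges_by_color))
```

With integral values, as in this toy input, the rounding is deterministic and returns exactly the nodes with x_u = 1. With fractional values the result depends on the random draws, which is why the theorem is stated with a success probability.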

17
Experimental Setting

SNP sets extracted from NCBI databases, plus randomly generated sets
C/C++ code run on a 2.8 GHz Dell PowerEdge running Linux
Compared algorithms:
G-FIX: greedy primer cover algorithm of Pearson et al.
- Primers restricted to be within L/2 of amplified SNPs
G-VAR: naïve modification of G-FIX
- For each SNP, the first selected primer can be up to L bases away from the SNP
- If the first selected primer is L_1 bases away from the SNP, the opposite sequence is truncated to a length of L − L_1
G-POT: potential-function-driven greedy algorithm
MIPS-PT: iterative beam-search heuristic of Souvenir et al. (WABI 03)

18
Experimental Results, NCBI tests
19
Experimental Results, k = 8
20
Experimental Results, k = 10
21
Experimental Results, k = 12
22
Runtime, k = 10
23
Conclusions

New combinatorial optimization problems arising
in the area of high-throughput assay design
Theoretical insights (such as approximation
results) give algorithms with significant
practical improvements
Choosing the proper problem model is critical to
solution efficiency

24
Ongoing Work & Open Problems

Allow degenerate primers
Incorporate more biochemical constraints into the model (melting temperature, secondary structure, cross-hybridization, etc.)
Close the gap between the O(ln n) inapproximability bound and the O(L ln n) approximation factor for the minimum multicolored subgraph problem
Approximation algorithms for partition into multiple multiplexed PCR reactions (Aumann et al., WABI 03)

25
Acknowledgments