Probe design for microarrays using OligoWiz - PowerPoint PPT Presentation


About This Presentation

Title:

Probe design for microarrays using OligoWiz

Description:

for flexible probe design ... The Sequence we want to design a oligo for. Only BLAST hits that passed filtering are considered ... – PowerPoint PPT presentation


Slides: 28

Provided by: rasmuswe


Transcript and Presenter's Notes


1
Probe design for microarrays using OligoWiz
Rasmus Wernersson, Assistant Professor, Center for Biological Sequence Analysis, Technical University of Denmark
2
Probe design
for microarrays

What is a Probe
OligoWiz
Probe Design
Cross Hybridization and Complexity
Affinity
Position

3
The DNA Array Analysis Pipeline
Question / Experimental Design
Sample Preparation / Hybridization
Array design / Probe design
Buy Chip/Array
Image analysis
Normalization
Expression Index Calculation
Comparable Gene Expression Data
Statistical Analysis / Fit to Model (time series)
Advanced Data Analysis: Clustering, PCA, Classification, Promoter Analysis, Meta analysis, Survival analysis, Regulatory Network
4
An Ideal Probe
must

- Discriminate well between its intended target
and all other targets in the target pool
- Detect concentration differences under the
hybridization conditions

5
Probe Type
comparisons

6
OligoWiz: a tool
for flexible probe design
7
About OligoWiz
How and Who
OligoWiz 2.0 is a client-server application for designing oligonucleotides for microarrays. The OligoWiz client (the graphical interface) is written in Java 1.4 and runs on virtually all platforms. The OligoWiz server performs the heavy-duty computation and is hosted on a multi-CPU Altix server at CBS. OligoWiz was created by Henrik Bjørn Nielsen and Rasmus Wernersson, both at the Center for Biological Sequence Analysis (CBS) at the Technical University of Denmark.
8
About the OligoWiz scores

All scores are normalized to a value between 0.0 (worst) and 1.0 (best).
All scores are independent, and each is assigned a user-adjustable weight.
A total score is calculated as the sum of all weighted scores and is normalized to a value between 0.0 and 1.0.
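As a minimal sketch of how such a weighted total could be computed, the code below assumes that the normalization divides by the sum of the weights. The slide does not say how the normalization is done. The score names used in the example are hypothetical.

```python
def total_score(scores, weights):
    """Combine per-oligo scores (each already in [0.0, 1.0]) into one total.

    Assumption: the total is normalized by dividing the weighted sum by the
    sum of the weights, which keeps the result in [0.0, 1.0].
    """
    if set(scores) != set(weights):
        raise ValueError("scores and weights must have the same keys")
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    weight_sum = sum(weights.values())
    if weight_sum == 0:
        raise ValueError("at least one weight must be positive")
    weighted = sum(scores[name] * weights[name] for name in scores)
    return weighted / weight_sum


# Hypothetical score names and weights, for illustration only.
example = total_score(
    {"homology": 0.9, "delta_tm": 0.5, "position": 1.0},
    {"homology": 2.0, "delta_tm": 1.0, "position": 1.0},
)
```

With these numbers the total is (1.8 + 0.5 + 1.0) / 4 = 0.825. Because the scores are independent, changing one weight shifts the balance without rescaling the other scores.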

9
How to Avoid
cross-hybridization
From Kane et al. (2000) we learn that a 50mer probe can detect significant false signal from a target that has >75-80% homology to the 50mer oligo, or that contains a continuous stretch of >15 complementary bases. If we have substantial sequence information on the given organism, we can try to avoid this by choosing oligos that are not similar to any other expressed sequences.
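The code below is a minimal Python sketch of the two Kane et al. criteria used as a quick pre-check. It rests on two assumptions. First, both sequences are given in the same orientation, so identity to the other transcript stands in for complementarity to its labeled target. Second, an ungapped sliding comparison replaces a real alignment. The function names and cutoff parameters are hypothetical.

```python
def longest_common_substring(a, b):
    """Length of the longest exact shared stretch (classic O(len(a)*len(b)) DP)."""
    best = 0
    prev = [0] * (len(b) + 1)  # prev[j]: common suffix length of a[:i-1] and b[:j]
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def best_ungapped_identity(probe, other):
    """Highest fraction of identical bases over the full probe length, no gaps."""
    n = len(probe)
    best = 0.0
    for offset in range(len(other) - n + 1):
        window = other[offset:offset + n]
        same = sum(p == o for p, o in zip(probe, window))
        best = max(best, same / n)
    return best


def risks_cross_hybridization(probe, other, identity_cutoff=0.75, stretch_cutoff=15):
    """True if either criterion is met: >75% identity or a >15 bp exact stretch."""
    probe, other = probe.upper(), other.upper()
    return (best_ungapped_identity(probe, other) > identity_cutoff
            or longest_common_substring(probe, other) > stretch_cutoff)
```

An ungapped scan misses alignments that contain gaps. This is one reason a BLAST-based mapping, as on the following slides, is preferable for real designs.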
10
Probe Specificity
Hughes et al. 2001
11
Mapping Regions
without similarity to other transcripts
Figure: the sequence we want to design a probe for (5' to 3'), with BLAST hits of >75% identity longer than 15 bp marked along it; the regions not covered by such hits are suitable for probes.
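Once we have the query coordinates of the BLAST hits that pass the >75% / >15 bp cutoffs, the regions suitable for probes are the gaps between those hits. The sketch below assumes the hits are given as 0-based half-open (start, end) intervals on the query. The min_len parameter is hypothetical and drops gaps that are too short to hold a probe.

```python
def uncovered_regions(seq_len, hits, min_len=1):
    """Return (start, end) gaps of length >= min_len not covered by any hit.

    hits: 0-based half-open (start, end) intervals on the query, assumed to be
    only those BLAST hits that already passed the identity/length cutoffs.
    """
    regions = []
    pos = 0  # first base not yet covered by any hit processed so far
    for start, end in sorted(hits):
        if start - pos >= min_len:
            regions.append((pos, start))
        pos = max(pos, end)  # overlapping hits merge naturally
    if seq_len - pos >= min_len:
        regions.append((pos, seq_len))
    return regions
```

For example, a 100 bp query with hits at (10, 30), (25, 40) and (70, 80) leaves (0, 10), (40, 70) and (80, 100). Setting min_len=20 keeps only the last two.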
12
Filtering Self Detecting
BLAST hits out
Figure: the sequence we want to design an oligo for (5' to 3'), with BLAST hits of >75% identity longer than 15 bp marked; some hits come from sequences identical or very similar to the query sequence itself.
Therefore, BLAST hits with >97% identity and a hit length to query length ratio >0.8 are not considered.
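The self-hit rule works as a simple filter over the hit list. This sketch assumes each hit carries its percent identity and alignment length. The BlastHit class and its field names are hypothetical and do not describe a BLAST output format.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    subject_id: str
    identity: float  # percent identity, 0-100
    hit_length: int  # aligned length in bp


def is_self_hit(hit, query_length, max_identity=97.0, max_length_ratio=0.8):
    """A hit that is both >97% identical and covers >0.8 of the query is
    treated as the query's own transcript (or a near-identical copy)."""
    return (hit.identity > max_identity
            and hit.hit_length / query_length > max_length_ratio)


def filter_self_hits(hits, query_length):
    """Keep only hits that can represent genuine cross-hybridization."""
    return [h for h in hits if not is_self_hit(h, query_length)]
```

Both conditions must hold for a hit to be discarded. A short but nearly identical hit (for example, a duplicated domain) is still counted as a potential cross-hybridizer.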
13
Cross-hybridization
expressed as a homology score
Only BLAST hits that passed filtering are considered. Let $m$ be the number of BLAST hits considered at position $i$, let $h_i = (h^i_1, \ldots, h^i_m)$ be the BLAST hits at position $i$ in the oligo, and let $n$ be the length of the oligo.
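The scoring formula itself is not preserved in this transcript. The sketch below is a hypothetical aggregation only and is not necessarily the OligoWiz formula. At each position i it takes the strongest hit identity among h_i (0 if no hit covers that position). It then averages these values over the n positions and subtracts the result from 1, so that 1.0 means no similarity, in line with the 0.0 (worst) to 1.0 (best) convention above.

```python
def homology_score(n, hits):
    """Hypothetical per-oligo homology score in [0.0, 1.0].

    n:    oligo length
    hits: (start, end, identity) tuples, 0-based half-open coordinates on the
          oligo, identity as a fraction in [0.0, 1.0]; filtered hits only.
    """
    per_position_max = [0.0] * n  # strongest hit covering each position
    for start, end, identity in hits:
        for i in range(max(0, start), min(n, end)):
            per_position_max[i] = max(per_position_max[i], identity)
    return 1.0 - sum(per_position_max) / n
```

Taking the maximum per position means that several weak hits over the same stretch do not add up. Whether the real score combines hits this way is an assumption.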
14
Similar Affinity
for all oligos
Another way of ensuring optimal discrimination between target and non-target during hybridization is to design all the oligos on an array with similar affinity for their targets. This allows the experimentalist to optimize the hybridization conditions for all oligos at once by choosing the right hybridization temperature and salt concentration. The melting temperature (Tm) is commonly used as a measure of DNA/DNA or RNA/DNA hybrid affinity.
15
Melting Temperature
difference
Where ΔH (kcal/mol) is the sum of the nearest-neighbor enthalpy changes, A is a constant for helix initiation corrections, ΔS is the sum of the nearest-neighbor entropy changes, R is the gas constant (1.987 cal K⁻¹ mol⁻¹) and Ct is the total molar concentration of strands. For the Tm difference, N is the set of all oligos in all sequences.
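The slide's equations are not preserved in this transcript. A common nearest-neighbor form that uses exactly the variables defined above is Tm = ΔH / (A + ΔS + R ln(Ct/4)) − 273.15, with ΔH converted to cal/mol so that its units match R. The sketch assumes that form. It also assumes that ΔTm is each oligo's Tm minus the mean Tm over all N oligos. The default a_cal = −10.8 cal K⁻¹ mol⁻¹ is only a placeholder. Substitute the helix-initiation constant of whichever parameter set supplies your ΔH and ΔS sums.

```python
import math

R = 1.987  # gas constant, cal K^-1 mol^-1 (value from the slide)


def melting_temperature(delta_h_kcal, delta_s_cal, ct_molar, a_cal=-10.8):
    """Tm in degrees Celsius from nearest-neighbor sums (assumed formula).

    delta_h_kcal: sum of nearest-neighbor enthalpies, kcal/mol
    delta_s_cal:  sum of nearest-neighbor entropies, cal K^-1 mol^-1
    ct_molar:     total molar strand concentration
    a_cal:        helix initiation constant A (placeholder default)
    """
    if ct_molar <= 0:
        raise ValueError("strand concentration must be positive")
    denominator = a_cal + delta_s_cal + R * math.log(ct_molar / 4.0)
    return delta_h_kcal * 1000.0 / denominator - 273.15  # kcal -> cal, K -> C


def delta_tm(tms):
    """Deviation of each oligo's Tm from the mean Tm over all N oligos."""
    mean_tm = sum(tms) / len(tms)
    return [tm - mean_tm for tm in tms]
```

For instance, ΔH = −150 kcal/mol, ΔS = −400 cal K⁻¹ mol⁻¹ and Ct = 250 nM give a Tm of roughly 65 °C with these assumptions. Oligos with a small |ΔTm| would then be preferred, so that one hybridization temperature suits the whole array.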
16
Tm distributions
for 30mers and 50mers
17
ΔTm Distribution
for oligo length intervals
18
Avoid self annealing oligos
Sensitivity may be influenced
Probes that form strong hybrids with themselves, i.e. probes that fold, should be avoided. However, accurate folding algorithms such as those employed by mFOLD or RNAfold are too time-consuming for large-scale folding of oligos.
Time consumption of mFOLD: 2 sec per 30mer, or 16 min per gene (500 bp).
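The per-gene figure follows from the per-oligo time if we assume that every 30mer start position in the gene is folded. This assumes a one-base step, which the slide does not state.

```latex
\underbrace{(500 - 30 + 1)}_{\text{30mers in a 500\,bp gene}} \times 2\,\text{s}
  = 471 \times 2\,\text{s} = 942\,\text{s} \approx 15.7\,\text{min} \approx 16\,\text{min}
```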
19
Folding an oligonucleotide
an approximation

Dynamic programming alignment to the inverted self. The alignment is based on dinucleotides.
Figure: the dinucleotides of the oligo (AT TG CT ... CG GT TT) aligned against the dinucleotides of its inverted self.
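The idea can be sketched as a local (Smith-Waterman-style) alignment of the oligo against its reverse complement, scored on dinucleotide stacks. Several pieces are assumptions and are not taken from OligoWiz. The "inverted self" is taken to be the reverse complement. The stack scores, the mismatch and gap penalties, and the minimum hairpin loop of 3 bases are hypothetical choices.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def stack_score(dinucleotide):
    # Hypothetical stack scores: an A/T-only stack scores 2, and each G or C
    # in the dinucleotide adds 1 (so a G/C-only stack scores 4).
    return 2 + sum(base in "GC" for base in dinucleotide)


def self_fold_score(seq, mismatch=-3, gap=-4, min_loop=3):
    """Best local alignment score of seq against its reverse complement,
    scored per dinucleotide stack; higher means a stronger predicted hairpin."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    n = len(seq)
    # H[i][j]: best local score of an alignment ending at seq[i] / rc[j]
    H = [[0] * n for _ in range(n)]
    best = 0
    for i in range(1, n):
        for j in range(1, n):
            # rc[j] is the complement of seq[n-1-j], so this cell pairs base i
            # with base n-1-j; require >= min_loop unpaired bases between them.
            if (n - 1 - j) - i <= min_loop:
                continue
            if seq[i - 1:i + 1] == rc[j - 1:j + 1]:  # both bases of the stack pair
                diag = H[i - 1][j - 1] + stack_score(seq[i - 1:i + 1])
            else:
                diag = H[i - 1][j - 1] + mismatch
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```

A designed hairpin such as GGGCGCAAAAGCGCCC scores well above zero, while a poly-A oligo, which cannot pair with itself, scores 0. The O(n²) table is small for 30 to 70mers. This is why an alignment heuristic of this kind can be run over every candidate oligo, whereas full folding cannot.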
20
Folding a lot of oligos
a fast heuristic implementation
21
Reasonable folding prediction
compared to mFOLD
22
Probes With Very Common
subsequences may result in unspecific signal
If the subsequences of an oligo are very common, we define the oligo as low-complexity.
Oligo with low complexity: AAAAAAAGGAGTTTTTTTTCAAAAAACTTTTTAAAAAAGCTTTAGGTTTTTA (Human)
Oligo without low complexity: CGTGACTGACAGCTGACTGCTAGCCATGCAACGTCATAGTACGATGACT (Human)
23
Low-complexity
expressed as a score
For a given transcriptome, the information content of every word of length wl (8 bp) is calculated, where f(w) is the number of occurrences of a pattern w and tf(w) is the total number of patterns of length wl. A low-complexity score for a given oligo is then defined as
Low-complexity = 1 − norm(…)
where norm is a function that normalizes to a value between 0 and 1, L is the length of the oligo and $W_i$ is the pattern at position i.
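The slide's formulas are not preserved in this transcript. The sketch below assumes the standard information-content definition I(w) = −log2(f(w) / tf(w)) and averages it over the oligo's words. The pseudocount, which keeps unseen words finite, and the function names are hypothetical. The final 1 − norm(…) step is left out because its exact form is not given.

```python
import math
from collections import Counter


def word_counts(transcripts, wl=8):
    """Count every word of length wl across a set of transcripts."""
    counts = Counter()
    for t in transcripts:
        t = t.upper()
        for i in range(len(t) - wl + 1):
            counts[t[i:i + wl]] += 1
    return counts


def mean_information_content(oligo, counts, wl=8, pseudocount=1):
    """Average of -log2(f(w) / tf) over the oligo's words (assumed definition).

    Low values mean the oligo is built from common words (low complexity).
    """
    oligo = oligo.upper()
    if len(oligo) < wl:
        raise ValueError("oligo shorter than word length")
    total = sum(counts.values())  # tf: total number of words of length wl
    ics = []
    for i in range(len(oligo) - wl + 1):
        f = counts.get(oligo[i:i + wl], 0) + pseudocount
        ics.append(-math.log2(f / (total + pseudocount)))
    return sum(ics) / len(ics)
```

Applied to a real transcriptome, the poly-A/poly-T rich example oligo above would be expected to get a lower mean information content than the mixed-sequence example. This is the property the low-complexity score is built to penalize.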
24
Location of Oligo
within transcript

Labeling includes reverse transcription of the mRNA and is sensitive to:
- RNA degradation
- Premature termination of cDNA synthesis
- Premature termination of cRNA transcription (IVT)
A position score reflects this (eukaryotes):
Position score = $(1 - d_{rp})^{D_{3'end}}$
where $d_{rp}$ is the chance of labeling termination per base and $D_{3'end}$ is the distance from the 3' end of the transcript.
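The position score treats each base between the 3' end and the oligo as an independent chance for labeling to stop. Read this way, it is the probability that labeling gets past all of those bases. The sketch assumes that labeling proceeds from the 3' end, which is presumably why the slide limits the score to eukaryotes (poly-A tail). Any d_rp value you pass in is an illustration, not an OligoWiz default.

```python
def position_score(distance_to_3prime_end, drp):
    """(1 - drp) ** D: chance that labeling, moving base by base from the
    3' end, has not terminated before reaching the oligo."""
    if not 0.0 <= drp < 1.0:
        raise ValueError("drp must be in [0, 1)")
    if distance_to_3prime_end < 0:
        raise ValueError("distance must be non-negative")
    return (1.0 - drp) ** distance_to_3prime_end
```

An oligo at the very 3' end scores 1.0. The score then decays geometrically with distance, so even a small per-base termination chance strongly favors 3'-proximal probes on long transcripts.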

25
Species databases
215 species currently available

The species databases are built from complete genomic sequences, or from UniGene collections in the case of vertebrates.
The databases are used for:
- Cross-hybridization
- Low-complexity

26
Sequence Features
Intron/Exon structure, UTR regions etc.

Special purpose arrays
Example: detecting differential splicing

Figure: gene model with exons and an intron (Exon, Exon, Intron, Exon, Exon)
27
Annotation String
- single letter code
Single letter code.
Sequence:   ATGTCTACATATGAAGGTATGTAA
Annotation: (EEEEEEEEEEEEEE)DIIIIIII
E = Exon, I = Intron, ( = Start of exon, ) = End of exon, D = Donor site, A = Acceptor site
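Because the annotation string has exactly one character per base, features can be read off by position. The sketch below pulls out exon regions, treating '(', 'E' and ')' as exon codes and letting ')' close a region. The function name and the choice to exclude D and A from exons are assumptions.

```python
EXON_CODES = set("(E)")


def exon_regions(annotation):
    """Return 0-based half-open (start, end) exon regions of an annotation string."""
    regions = []
    start = None
    for i, code in enumerate(annotation):
        if code in EXON_CODES:
            if start is None:
                start = i
            if code == ")":  # explicit end of exon
                regions.append((start, i + 1))
                start = None
        elif start is not None:  # left an exon without a closing ')'
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(annotation)))
    return regions


sequence = "ATGTCTACATATGAAGGTATGTAA"
annotation = "(EEEEEEEEEEEEEE)DIIIIIII"
exons = [sequence[s:e] for s, e in exon_regions(annotation)]
```

For the example above, this yields a single exon, ATGTCTACATATGAAG, which is followed by the donor site and the intron. The same per-position lookup could be used to check whether a candidate probe spans an exon-exon junction.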
28
Extracting annotation
from GenBank files