Title: Probe design for microarrays using OligoWiz
1Probe design for microarrays using OligoWiz
Rasmus Wernersson, Assistant Professor, Center for
Biological Sequence Analysis, Technical University
of Denmark
2Probe design
for microarrays
- What is a Probe
- OligoWiz
- Probe Design
- Cross Hybridization and Complexity
- Affinity
- Position
3The DNA Array Analysis Pipeline
Question Experimental Design
Sample Preparation Hybridization
Array design Probe design
Buy Chip/Array
Image analysis
Normalization
Expression Index Calculation
Comparable Gene Expression Data
Statistical Analysis Fit to Model (time series)
Advanced Data Analysis: Clustering, PCA, Classification,
Promoter Analysis, Meta analysis, Survival analysis,
Regulatory Network
4An Ideal Probe
must
- - Discriminate well between its intended target and all other targets in the target pool
- - Detect concentration differences under the hybridization conditions
5Probe Type
comparisons
6OligoWiz a Tool
for flexible probe design
7About OligoWiz
How and Who
OligoWiz 2.0 is a client-server application for
designing oligonucleotides for microarrays.
The OligoWiz client (the graphical interface) is
written in Java 1.4 and runs on virtually all
platforms.
The OligoWiz server performs the heavy-duty
computation and is hosted on a multi-CPU Altix
server at CBS.
OligoWiz was created by Henrik Bjørn Nielsen and
Rasmus Wernersson, both at the Center for
Biological Sequence Analysis at the Technical
University of Denmark.
8About the OligoWiz scores
- All scores are normalized to a value between 0.0 (worst) and 1.0 (best).
- All scores are independent, and each is assigned a user-adjustable weight.
- A total score is calculated as the sum of all weighted scores and is normalized to a value between 0.0 and 1.0.
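
The weighting scheme can be sketched in a few lines of Python. This is only an illustration: the slide does not say how the total is normalized, so dividing by the sum of the weights is an assumption, and the score names and weight values are hypothetical.

```python
def total_score(scores, weights):
    """Weighted sum of per-criterion scores, rescaled to 0.0-1.0.

    Assumption: the total is normalized by dividing by the sum of the
    weights (the slide does not state the exact normalization).
    """
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")
    weighted_sum = sum(scores[name] * weights[name] for name in scores)
    return weighted_sum / total_weight


# Hypothetical score names and weights, for illustration only.
example_scores = {"cross_hyb": 0.9, "delta_tm": 0.8, "folding": 1.0}
example_weights = {"cross_hyb": 3.0, "delta_tm": 1.0, "folding": 1.0}
print(total_score(example_scores, example_weights))
```

Because each individual score lies between 0.0 and 1.0 and the weights are non-negative, the result of this sketch also stays between 0.0 and 1.0.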
9How to Avoid
cross-hybridization
From Kane et al. (2000) we learn that a 50mer
probe can detect significant false signal from a
target that has >75-80% homology to a 50mer
oligo or a continuous stretch of >15
complementary bases.
If we have substantial sequence information on
the given organism, we can try to avoid this by
choosing oligos that are not similar to any other
expressed sequences.
10Probe Specificity
Hughes et al. 2001
11Mapping Regions
without similarity to other transcripts
The sequence we want to design a probe for (5' to 3')
BLAST hits >75% and longer than 15 bp
Regions suitable for probes
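
The mapping in this figure can be sketched as interval arithmetic. This is a minimal sketch, not OligoWiz's implementation: the function name and the half-open `(start, end)` coordinates are assumptions, and it treats any overlap with a BLAST hit as disqualifying.

```python
def probe_regions(seq_len, hits, min_len):
    """Return half-open (start, end) regions not covered by any BLAST hit.

    hits: (start, end) intervals on the query, half-open, already
          limited to hits above the similarity and length thresholds.
    min_len: shortest region worth keeping (e.g. the probe length).
    """
    regions = []
    pos = 0  # first position not yet covered by a hit
    for start, end in sorted(hits):
        if start - pos >= min_len:
            regions.append((pos, start))
        pos = max(pos, end)
    if seq_len - pos >= min_len:
        regions.append((pos, seq_len))
    return regions


# A 500 bp transcript with three hits, looking for room for 50mers.
print(probe_regions(500, [(100, 150), (140, 200), (400, 430)], 50))
# → [(0, 100), (200, 400), (430, 500)]
```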
12Filtering Self Detecting
BLAST hits out
The sequence we want to design an oligo for (5' to 3')
BLAST hits >75% and longer than 15 bp
Sequence identical or very similar to the query
sequence
Therefore, BLAST hits with homology >97% and
with a hit length to query length ratio >0.8
are not considered.
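
A minimal sketch of this filter, using the thresholds from the slides (75% and 15 bp for cross-hybridization, 97% and a length ratio of 0.8 for self-detection). The `Hit` record and the function names are hypothetical and are not part of BLAST or OligoWiz.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    identity: float  # percent identity of the BLAST hit
    length: int      # aligned length in bp


def is_self_hit(hit, query_len, max_identity=97.0, max_len_ratio=0.8):
    """True if the hit looks like the query detecting itself."""
    return hit.identity > max_identity and hit.length / query_len > max_len_ratio


def cross_hyb_hits(hits, query_len, min_identity=75.0, min_len=15):
    """Keep hits that may cross-hybridize, dropping self-detecting ones."""
    return [
        h for h in hits
        if h.identity > min_identity
        and h.length > min_len
        and not is_self_hit(h, query_len)
    ]


hits = [Hit(99.0, 480), Hit(85.0, 60), Hit(70.0, 100), Hit(90.0, 12)]
print(cross_hyb_hits(hits, query_len=500))
# → [Hit(identity=85.0, length=60)]
```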
13Cross-hybridization
expressed as a homology score
Only BLAST hits that passed filtering are
considered.
Let m be the number of BLAST hits considered at
position i, and let $h = (h_1^i, \dots, h_m^i)$
be the BLAST hits at position i in the oligo,
where n is the length of the oligo.
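
The score formula itself is not preserved on this slide. The sketch below is one way to combine the definitions above and should be read as an assumption, not as the slide's equation: at each of the n positions it takes the highest percent identity among the hits covering that position, averages over the oligo, and subtracts the result from 1 so that 1.0 is best.

```python
def homology_score(hits_per_position):
    """Assumed form of a cross-hybridization (homology) score.

    hits_per_position[i] holds the percent identities (0-100) of the
    filtered BLAST hits covering position i of the oligo; the list has
    one entry per oligo position, so its length is n.
    """
    n = len(hits_per_position)
    if n == 0:
        raise ValueError("oligo must have at least one position")
    # Worst (highest) hit identity at each position; 0 when no hit covers it.
    worst = sum(max(h, default=0.0) for h in hits_per_position)
    return 1.0 - worst / (100.0 * n)


# Four-position toy oligo: positions 0 and 2 are covered by hits.
print(homology_score([[80.0, 90.0], [], [50.0], []]))
```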
14Similar Affinity
for all oligos
Another way of ensuring optimal discrimination
between target and non-target under hybridization
is to design all the oligos on an array with
similar affinity for their targets. This will
allow the experimentalist to optimize the
hybridization conditions for all oligos by
choosing the right hybridization temperature and
salt concentration. Commonly, melting temperature
(Tm) is used as a measure of DNA:DNA or RNA:DNA
hybrid affinity.
15Melting Temperature
difference
Where $\Delta H$ (kcal/mol) is the sum of the
nearest-neighbor enthalpy changes, A is a constant
for helix initiation corrections, $\Delta S$ is the
sum of the nearest-neighbor entropy changes, R is
the gas constant (1.987 cal deg$^{-1}$ mol$^{-1}$)
and $C_t$ is the total molar concentration of strands.
Where N is all oligos in all sequences.
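
The equations on this slide are not preserved. The symbols defined above match the commonly used nearest-neighbor form $T_m = \Delta H / (A + \Delta S + R \ln(C_t/4)) - 273.15$, and the sketch below assumes that form without a salt correction. The ΔTm function assumes the reference temperature is the mean Tm over all N oligos. The function names and numeric inputs are illustrative only.

```python
import math

R = 1.987  # gas constant, cal deg^-1 mol^-1 (value given on the slide)


def melting_temperature(dH_kcal, dS_cal, A_cal, Ct):
    """Tm in degrees Celsius from summed nearest-neighbor terms.

    Assumed form: Tm = dH / (A + dS + R * ln(Ct / 4)) - 273.15,
    with dH converted from kcal/mol to cal/mol. No salt correction.
    """
    dH_cal = dH_kcal * 1000.0
    tm_kelvin = dH_cal / (A_cal + dS_cal + R * math.log(Ct / 4.0))
    return tm_kelvin - 273.15


def delta_tm(tms):
    """Absolute difference between each oligo's Tm and the mean Tm.

    Assumption: the reference Tm is the mean over all N oligos.
    """
    mean_tm = sum(tms) / len(tms)
    return [abs(t - mean_tm) for t in tms]


# Illustrative inputs only, not measured values.
print(round(melting_temperature(-100.0, -270.0, -10.8, 1e-6), 1))
print(delta_tm([60.0, 70.0, 80.0]))
```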
16Tm distributions
for 30mers and 50mers
17ΔTm Distribution
for oligo length intervals
18Avoid self annealing oligos
Sensitivity may be influenced
Probes that form strong hybrids with themselves,
i.e. probes that fold, should be avoided. However,
accurate folding algorithms like those employed
by mFOLD or RNAfold are too time consuming for
large-scale folding of oligos.
Time consumption: mFOLD takes about 2 sec per 30mer,
which is about 16 min per gene (500 bp).
19Folding an oligonucleotide
an approximation
Dynamic programming alignment to inverted self
The alignment is based on dinucleotides (AT TG CT ... CG GT TT)
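
The idea of aligning an oligo to its inverted self can be sketched with a plain local alignment against the reverse complement. This is a simplification and not the OligoWiz heuristic: it scores single nucleotides rather than dinucleotides, ignores loop-size constraints, and uses arbitrary match, mismatch and gap values.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    comp = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(comp[base] for base in reversed(seq))


def self_anneal_score(oligo, match=1, mismatch=-1, gap=-2):
    """Best local alignment score of an oligo against its reverse complement.

    A high score means a long stretch of the oligo can base-pair with
    another stretch of the same oligo (or with a second copy of it).
    """
    a = oligo.upper()
    b = reverse_complement(a)
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(0, prev[j - 1] + step, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


print(self_anneal_score("GAATTC"))        # fully self-complementary → 6
print(self_anneal_score("AAAAAAAA"))      # cannot pair with itself → 0
print(self_anneal_score("GGGGAAAACCCC"))  # 4 bp stem → 4
```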
20Folding a lot of oligos
a fast heuristic implementation
21Reasonable folding prediction
compared to mFOLD
22Probes With Very Common
subsequences may result in unspecific signal
If the subsequences of an oligo are very common,
we define the oligo as low-complexity.
Oligo with low-complexity: AAAAAAAGGAGTTTTTTTTCAAAAAACTTTTTAAAAAAGCTTTAGGTTTTTA (Human)
Oligo without low-complexity: CGTGACTGACAGCTGACTGCTAGCCATGCAACGTCATAGTACGATGACT (Human)
23Low-complexity
expressed as a score
For a given transcriptome, a list of information
content for all words of length wl (8 bp) is
calculated,
where f(w) is the number of occurrences of a
pattern w and tf(w) is the total number of patterns
of length wl.
A low-complexity score for a given oligo is defined as
Low-complexity = 1 - norm(...)
where norm is a function that normalizes to
between 0 and 1, L is the length of the oligo and
w_i is the pattern at position i.
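
The exact information-content formula is not preserved on this slide, so the sketch below substitutes a simpler stand-in and should be read as an assumption: each 8 bp word is weighted by its relative frequency f(w)/tf(w) in the transcriptome, the weights are summed over the words in the oligo, and norm is taken to be min-max scaling across the candidate oligos. All function names are hypothetical.

```python
from collections import Counter


def word_frequencies(transcripts, wl=8):
    """Relative frequency f(w)/tf(w) of every word of length wl."""
    counts = Counter(
        seq[i:i + wl] for seq in transcripts for i in range(len(seq) - wl + 1)
    )
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def raw_commonness(oligo, freqs, wl=8):
    """Sum of word frequencies over all word positions in the oligo."""
    return sum(freqs.get(oligo[i:i + wl], 0.0) for i in range(len(oligo) - wl + 1))


def low_complexity_scores(oligos, freqs, wl=8):
    """1 - norm(raw), with norm assumed to be min-max scaling over the oligos."""
    raw = [raw_commonness(o, freqs, wl) for o in oligos]
    lo, hi = min(raw), max(raw)
    if hi == lo:
        return [1.0 for _ in raw]
    return [1.0 - (r - lo) / (hi - lo) for r in raw]


# Toy "transcriptome": one poly-A transcript and one mixed transcript.
transcripts = ["AAAAAAAAAAAAAAAAAAAA", "ACGTACGGTCAGCTAGCATG"]
freqs = word_frequencies(transcripts)
print(low_complexity_scores(["AAAAAAAAAAAA", "ACGTACGGTCAG"], freqs))
# → [0.0, 1.0]
```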
24Location of Oligo
within transcript
- Labeling includes reverse transcription of the mRNA and is sensitive to
- - RNA degradation
- - Premature termination of cDNA synthesis
- - Premature termination of cRNA transcription (IVT)
- A Position Score reflecting this (eukaryotes)
- - Position score = $(1 - drp)^{\Delta 3'end}$
- - Where drp is the chance of labeling termination per base and $\Delta 3'end$ is the distance in bases from the oligo to the 3' end of the transcript
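
A small worked example of the position score, reading the formula as (1 - drp) raised to the distance from the 3' end. The drp value of 0.01 is a hypothetical choice for illustration.

```python
def position_score(dist_to_3prime_end, drp):
    """Chance that labeling reaches an oligo this many bases from the 3' end.

    drp: chance of labeling termination per base (between 0 and 1).
    """
    if not 0.0 <= drp <= 1.0:
        raise ValueError("drp must be between 0 and 1")
    if dist_to_3prime_end < 0:
        raise ValueError("distance must be non-negative")
    return (1.0 - drp) ** dist_to_3prime_end


# With drp = 0.01 (hypothetical), the score falls off with distance.
for dist in (0, 100, 500):
    print(dist, round(position_score(dist, 0.01), 3))
```

The score is 1.0 at the 3' end and decays exponentially with distance, which is why probes near the 3' end are favoured for eukaryotic transcripts.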
25Species databases
215 species currently available
- The species databases are built from complete genomic sequences or UniGene collections in the case of Vertebrates.
- The databases are used for
- - Cross hybridization
- - Low-complexity
26Sequence Features
Intron/Exon structure, UTR regions etc.
- Special purpose arrays
- Example: Detecting differential splicing
(Figure: exon and intron layout of an example transcript)
27Annotation String
- single letter code
Sequence:   ATGTCTACATATGAAGGTATGTAA
Annotation: (EEEEEEEEEEEEEE)DIIIIIII
E: Exon
I: Intron
(: Start of exon
): End of exon
D: Donor site
A: Acceptor site
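
Because the annotation string has one character per base, features can be located by scanning it. The sketch below pulls exon spans out of the example above. The function name is hypothetical, and it assumes every exon is delimited by "(" and ")".

```python
def exon_spans(annotation):
    """Return half-open (start, end) spans of exons in an annotation string.

    Assumes each exon starts with "(" and ends with ")", as in the
    single letter code above.
    """
    spans = []
    start = None
    for i, code in enumerate(annotation):
        if code == "(":
            start = i
        elif code == ")" and start is not None:
            spans.append((start, i + 1))
            start = None
    return spans


sequence = "ATGTCTACATATGAAGGTATGTAA"
annotation = "(EEEEEEEEEEEEEE)DIIIIIII"
for start, end in exon_spans(annotation):
    print(start, end, sequence[start:end])
# → 0 16 ATGTCTACATATGAAG
```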
28Extracting annotation
from GenBank files
- FeatureExtract server
- www.cbs.dtu.dk/services/FeatureExtract
29Exercise
- Running OligoWiz 2.0
- Java 1.4.1 or better required
- Input data
- Sequence only (FASTA)
- Sequence and annotation
- Rule-based placement of multiple probes
- Distance criteria
- Annotation criteria
- Please go to the exercise web-page linked from
the course programme.