Title: PatternHunter: faster and more sensitive homology search
1PatternHunter: faster and more sensitive homology search
- By Bin Ma, John Tromp and Ming Li
B92902019, B92902033, B92902039, B92902072, B92902086, B92902087
2Agenda
- PatternHunter
- Spaced Seed
- Algorithm
- Performance
- PatternHunter II
- Algorithm
- Performance
- Translated PatternHunter
3PatternHunter Spaced Seed
4Outline
- A short review about BLAST.
- Some definition and background.
- What are the differences and similarities between BLAST and PatternHunter?
- Why is PatternHunter better?
- Nonconsecutive seeds
- Proof
5BLAST Algorithm
- Find seeded matches
- Extend them to HSPs (High-scoring Segment Pairs)
- Gapped extension by dynamic programming
- Report significant local alignments
6A short review about BLAST
- Find hits.
- BLAST first scans the database for words that
score at least T when aligned with some word
within the query sequence. Any aligned word pair
satisfying this condition is called a hit.
7A short review about BLAST
- Find HSPs
- An HSP (High-scoring Segment Pair) is much longer than a single word pair, and may therefore entail multiple hits on the same diagonal within a relatively short distance of one another.
8A short review about BLAST
- Generate gapped alignment
- Gapped alignment matters because two or more HSPs with scores well below 38 bits can, in combination, rise to statistical significance. If any one of these HSPs is missed, the combined result may be missed as well.
9A short review about BLAST
- In summary, the new gapped BLAST algorithm
requires two non-overlapping hits of score at
least T, within a distance A of one another, to
invoke an ungapped extension of the second hit.
If the resulting HSP has a normalized score of at least S_g bits, a gapped extension is triggered.
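The two-hit rule above can be sketched as follows. This is a simplified illustration, not NCBI's code: hits are assumed to arrive as (diagonal, query position) pairs with diagonal = database position minus query position, and the names `two_hit_triggers`, `word_len` and `window` (playing the role of A) are hypothetical.

```python
def two_hit_triggers(hits, word_len, window):
    """Return the hits that would trigger an ungapped extension under the two-hit rule.

    hits: iterable of (diagonal, query_pos) pairs.
    A hit triggers when an earlier, non-overlapping hit on the same diagonal
    lies at most `window` positions before it.
    """
    last = {}  # diagonal -> query position of the last non-overlapping hit
    triggers = []
    for diag, pos in sorted(hits, key=lambda h: h[1]):
        prev = last.get(diag)
        if prev is not None and pos - prev < word_len:
            continue  # overlaps the previous hit on this diagonal: ignore it
        if prev is not None and pos - prev <= window:
            triggers.append((diag, pos))  # second hit close enough: extend from here
        last[diag] = pos
    return triggers


hits = [(0, 5), (0, 20), (3, 7), (3, 9), (5, 2), (5, 100)]
print(two_hit_triggers(hits, word_len=11, window=40))  # [(0, 20)]
```

Only diagonal 0 has two non-overlapping hits within distance A; the pair on diagonal 3 overlaps and the pair on diagonal 5 is too far apart.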
10Some definitions and background
- Similarity
- How similar are two sequences?
- Usually means the probability that the two sequences have the same symbol at an aligned position.
- Sensitivity
- The probability of finding a true homologous local alignment.
- Specificity
- Among all reported local alignments, the fraction that are truly homologous.
11Define the Seed
Reference Bin Ma, John Tromp, Ming Li
Bioinformatics Vol. 18 no. 3 2002
- Defining the seed
- w -> weight, i.e., the number of positions that must match
- Blastn: 11; MegaBlast: 28
- model -> the relative positions of the w matching letters
- m -> length of the model window
12Seed Parameters
Reference Bin Ma, John Tromp, Ming Li
Bioinformatics Vol. 18 no. 3 2002
- Letters: 0, 1 (1 = exact match required; 0 = no match required, any value)
- Model: 1 1 1 0 1 0 0 1 0 1 0 0 1 1 0 1 1 1 (w = 11, m = 18)
- This is PatternHunter's most sensitive model
- The Blastn seed is all 1s
13Seed, Hit, Homology
Reference Bin Ma, John Tromp, Ming Li
Bioinformatics Vol. 18 no. 3 2002
- What is a seed?
- Seeds determine how an algorithm looks for hits
- What is a hit?
- Hits indicate a similarity that may signal homology
14Hit example
Reference Bin Ma, John Tromp, Ming Li
Bioinformatics Vol. 18 no. 3 2002
[Figure: a seed hit within an alignment of a human-mouse genome homology]
15Example
Reference Bin Ma, John Tromp, Ming Li
Bioinformatics Vol. 18 no. 3 2002
- Consider the following two sequences
- GAGTACTCAACACCAACATCAGTGGGCAATGGAAAAT
- GAATACTCAACAGCAACATCAATGGGCAGCAGAAAAT
- What is the difference between BLAST and PatternHunter in finding the seed?
16BLAST uses consecutive seeds
Reference Bin Ma, John Tromp, Ming Li
Bioinformatics Vol. 18 no. 3 2002
- In BLAST, we often use the consecutive model with weight 11.
- GAGTACTCAACACCAACATCAGTGGGCAATGGAAAAT
- GAATACTCAACAGCAACATCAATGGGCAGCAGAAAAT
- Seed: 11111111111, tried at every offset
- However, it fails to find the alignment between the two sequences: their longest run of consecutive matches is only 9.
17Consecutive seeds
Reference Bin Ma, John Tromp, Ming Li
Bioinformatics Vol. 18 no. 3 2002
- There is also a dilemma for BLAST-type searches
- Sensitivity needs shorter seeds
- but shorter seeds produce too many random hits and slow the computation
- Speed needs longer seeds
- but longer seeds lose distant homologies
18PatternHunter uses non-consecutive seed
Reference Bin Ma, John Tromp, Ming Li
Bioinformatics Vol. 18 no. 3 2002
- In PatternHunter, we often use the spaced model with weight 11 and length 18.
- GAGTACTCAACACCAACATCAGTGGGCAATGGAAAAT
- GAATACTCAACAGCAACATCAATGGGCAGCAGAAAAT
- Seed: 111010010100110111, which hits when placed at the 8th position
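The two slides above can be checked directly. The sketch below (helper names are hypothetical) compares the two example sequences position by position and lists every offset where a seed's 1-positions all land on matches, assuming the ungapped alignment shown on the slides.

```python
def match_string(a, b):
    """1 where the aligned letters agree, 0 where they differ (ungapped alignment)."""
    return [int(x == y) for x, y in zip(a, b)]


def seed_hit_offsets(matches, seed):
    """Offsets where every '1' position of the seed lands on a match."""
    care = [k for k, c in enumerate(seed) if c == "1"]
    return [o for o in range(len(matches) - len(seed) + 1)
            if all(matches[o + k] for k in care)]


S1 = "GAGTACTCAACACCAACATCAGTGGGCAATGGAAAAT"
S2 = "GAATACTCAACAGCAACATCAATGGGCAGCAGAAAAT"
m = match_string(S1, S2)

print(seed_hit_offsets(m, "1" * 11))              # []  : BLAST's weight-11 seed misses
print(seed_hit_offsets(m, "111010010100110111"))  # [7] : the spaced seed hits once
```

Both seeds require 11 matching positions, but the spaced seed tolerates the mismatches that break every run of 11 consecutive matches.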
19Consecutive vs. Nonconsecutive?
Reference Bin Ma, John Tromp, Ming Li
Bioinformatics Vol. 18 no. 3 2002
- The non-consecutive seed is the primary difference and strength of PatternHunter
- Blastn: 1 1 1 1 1 1 1 1 1 1 1
- PatternHunter: 1 1 1 0 1 0 0 1 0 1 0 0 1 1 0 1 1 1
20A trivial comparison between spaced and
consecutive seed
Reference Ming Li, NHC2005
- Consider the seeds 111 and 1101.
- To make seed 111 fail, we can use the region 110110110110
- which has 66.67% similarity
- But we can prove that seed 1101 hits every sufficiently long region with more than 61% similarity.
21Proof
Reference Ming Li, NHC2005
- Suppose there is a length-100 region that is not hit by 1101.
- We can break the region into blocks of the form 1^a 0^b.
- Except for the last block, every block falls into one of the following cases (a ≥ 4, or a ≥ 2 with b = 1, would create a hit):
- 10^b for b ≥ 1
- 110^b for b ≥ 2
- 1110^b for b ≥ 2
- In each block, the similarity is ≤ 3/5.
- The last block has at most 3 matches.
- So in total there are at most 3/5 × 97 + 3 ≈ 61.2, i.e. at most 61 matches in 100 positions. The similarity is ≤ 61%.
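The bound can be checked exhaustively with a small DP. This is an illustrative sketch with hypothetical function names: it finds the largest number of identities (1s) a region of a given length can contain while never being hit by a seed, tracking only the last len(seed) - 1 bits, since only those can take part in a future hit.

```python
def seed_hits_window(window, seed):
    """True if every '1' (must-match) position of the seed falls on a '1' of the window."""
    return all(w == "1" for w, s in zip(window, seed) if s == "1")


def max_unhit_identities(seed, length):
    """Largest number of 1s in a length-`length` 0/1 region that the seed never hits."""
    m = len(seed)
    best = {"": 0}  # last (m - 1) bits -> best identity count for regions ending this way
    for _ in range(length):
        nxt = {}
        for suffix, ones in best.items():
            for bit in "01":
                window = suffix + bit
                if len(window) == m and seed_hits_window(window, seed):
                    continue  # the seed would hit here, so this region is not a counterexample
                key = window[-(m - 1):] if m > 1 else ""
                nxt[key] = max(nxt.get(key, -1), ones + (bit == "1"))
        best = nxt
    return max(best.values())


print(max_unhit_identities("111", 12))    # 8  (e.g. 110110110110)
print(max_unhit_identities("1101", 100))  # 61 (matches the proof's bound)
```

So 111 can be dodged at 2/3 similarity, while no length-100 region with 62 or more identities escapes 1101.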
22Formalize
Reference Ming Li, NHC2005
- Given an i.i.d. sequence (homology region) with Pr(1) = p and Pr(0) = 1 - p for each bit, e.g.
- 1100111011101101011101101011111011101
- Which seed is more likely to hit this region?
- BLAST seed: 11111111111
- Spaced seed: 111010010100110111
23Expect Less, Get More
Reference Ming Li, NHC2005
- Lemma: the expected number of hits of a weight-W, length-M seed model within a length-L region with homology level p is $(L-M+1)p^W$
- Proof: $E(\#\text{hits}) = \sum_{i=1}^{L-M+1} p^W = (L-M+1)p^W$
- Example: in a region of length 64 with p = 0.7
- Pr(BLAST seed hits) = 0.3
- E(# of hits by BLAST seed) = 1.07
- Pr(optimal spaced seed hits) = 0.466, over 50% more
- E(# of hits by spaced seed) = 0.93, about 13% less
24Why Is Spaced Seed Better?
Reference Ming Li, NHC2005
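- A wrong, but intuitive, proof: for seed s, interval I, and similarity p,
$E(\#\text{hits}) = \Pr(s \text{ hits}) \cdot E(\#\text{hits} \mid s \text{ hits})$
- Thus
$\Pr(s \text{ hits}) = (L-M+1)p^W / E(\#\text{hits} \mid s \text{ hits})$
- For an optimized spaced seed, $E(\#\text{hits} \mid s \text{ hits})$ is small, because a shifted copy of the seed shares few positions with the original:
- shifted by 1: 6 non-overlapping positions, probability $p^6$
- shifted by 2: 6 non-overlapping positions, probability $p^6$
- shifted by 3: 6 non-overlapping positions, probability $p^6$
- shifted by 4: 7 non-overlapping positions, probability $p^7$
- ...
- For the spaced seed the divisor is $1 + p^6 + p^6 + p^6 + p^7 + \cdots$
- For the BLAST seed the divisor is bigger: $1 + p + p^2 + p^3 + \cdots$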
25Simulated sensitivity curves
Reference Ming Li, NHC2005
26Observations of spaced seeds
Reference Ming Li, NHC2005
- Seed models with different shapes can detect different homologies.
- Two consequences
- Some models may detect more homologies than others: more sensitive homology search
- PatternHunter I
- Several seed models can be used simultaneously to hit more homologies: approaching 100% sensitive homology search
- PatternHunter II
27PatternHunter Algorithm Performance
28Outline
- Hit generation
- Hit extension
- Gapped extension
- Performance
29Hit generation
- Index created for each position in the query
sequence
30Hit generation
- Similar to MegaBlast: hash tables
- Encode A, T, C, G into binary codes
- 00, 01, 10, 11 respectively
- Find every seed occurrence in one of the sequences and record its offset in the hash table
31Hit generation
- An example
- Now we want to find hits between sequences S and
T
32Spaced seed
- For sequence T, scan each window with the seed model
- Encoding: A = 00, T = 01, C = 10, G = 11
- Window: A T A T G C A T
- Model: 1 1 0 1 0 1 1 0
- Letters under the 1s: A T T C A -> 0001011000 = 88
- Weight 5 -> the value is between 0 and 2^10 - 1
33After filling in the hash table
- Hash table: int value -> positions in T
- 0 -> 10, 19, 34
- 1 -> (NULL)
- 2 -> 14
- 3 -> 10, 48, 134
- ...
- 87 -> ...
- 88 -> 2, 8, 33
- 1. For each position in S, calculate the int value
- 2. Find hits by looking up that value in the table
34Hash tables: space required
- The table is stored as lists over the positions in T, each list newest-first (e.g., 0 -> 34, 19, 10; 3 -> 134, 48, 10; 88 -> 33, 8, 2)
- 4^w integers (list heads) + |T| integers (links)
- Total: $4^{w+1} + 4|T|$ bytes
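The hit-generation steps above can be sketched as follows, assuming the head/link array layout implied by the memory formula: `head` holds 4^w integers (one per int value) and `nxt` holds |T| integers (one link per position in T). This is an illustration, not PatternHunter's code; `seed_key`, `build_index`, and `find_hits` are hypothetical names.

```python
CODE = {"A": 0b00, "T": 0b01, "C": 0b10, "G": 0b11}  # encoding from the slides


def seed_key(window, seed):
    """Concatenate the 2-bit codes of the letters under the seed's 1s."""
    key = 0
    for base, c in zip(window, seed):
        if c == "1":
            key = (key << 2) | CODE[base]
    return key


def build_index(t, seed):
    """head[key]: last position in t with this key; nxt[pos]: previous position with the same key."""
    w = seed.count("1")
    head = [-1] * (4 ** w)
    nxt = [-1] * len(t)
    for pos in range(len(t) - len(seed) + 1):
        key = seed_key(t[pos:pos + len(seed)], seed)
        nxt[pos] = head[key]  # chain to the earlier occurrence (lists end up newest-first)
        head[key] = pos
    return head, nxt


def find_hits(s, t, seed):
    """All (i, j) such that the seed matches the window of s at i against the window of t at j."""
    head, nxt = build_index(t, seed)
    hits = []
    for i in range(len(s) - len(seed) + 1):
        j = head[seed_key(s[i:i + len(seed)], seed)]
        while j != -1:  # walk the chain of positions sharing this key
            hits.append((i, j))
            j = nxt[j]
    return hits


seed = "11010110"                                   # the weight-5 model from the example
print(seed_key("ATATGCAT", seed))                   # 88
print(find_hits("ATCTGCAT", "GGATATGCATGG", seed))  # [(0, 2)]
```

In the last call the mismatch (C vs. A) falls under a 0 of the model, so the hit is still found.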
35Does building the hash table cost a lot?
- If many hits are found for each index, the cost of computing the index is relatively negligible.
36Hit extension
- HSP: High-scoring Segment Pair
- Scan those hits with a window, and choose the
highest-scored one.
37Hit extension
[Figure: the chosen hit in the S-T comparison grid]
38Hit extension
- Use the midpoint of the chosen hit as the cut point and split the grid into 4 parts
39Hit extension
40Hit extension
- Then run Smith-Waterman in 2 of the 4 parts, until the score drops off.
41Hit extension
[Figure: Smith-Waterman runs in two of the four parts; cost 1/2 O(mn)]
42Hit extension
- If the resulting segment pair scores below a certain minimum, ignore it.
- Otherwise, we obtain an HSP and proceed to the next step, gapped extension.
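To illustrate "extend until the score drops off, then keep it only above a minimum", here is a deliberately simpler stand-in: an ungapped X-drop extension to both sides of a seed hit. It is not PatternHunter's quadrant Smith-Waterman; the function names, `x_drop=2`, `min_score=5`, and the match/mismatch scores of +1/-1 are all assumptions for the example.

```python
def xdrop_extend(s, t, i, j, step, x_drop, match=1, mismatch=-1):
    """Extend in one direction from (i, j); return (best score, length reaching it).

    Stops once the running score falls more than `x_drop` below the best seen so far.
    """
    score = best = best_len = 0
    k = 0
    while 0 <= i + k * step < len(s) and 0 <= j + k * step < len(t):
        score += match if s[i + k * step] == t[j + k * step] else mismatch
        k += 1
        if score > best:
            best, best_len = score, k
        elif best - score > x_drop:
            break
    return best, best_len


def extend_hit(s, t, i, j, seed_len, x_drop=2, min_score=5, match=1, mismatch=-1):
    """Turn a seed hit at s[i:], t[j:] into an HSP (start in s, start in t, length, score), or None."""
    core = sum(match if a == b else mismatch
               for a, b in zip(s[i:i + seed_len], t[j:j + seed_len]))
    left, left_len = xdrop_extend(s, t, i - 1, j - 1, -1, x_drop, match, mismatch)
    right, right_len = xdrop_extend(s, t, i + seed_len, j + seed_len, 1, x_drop, match, mismatch)
    score = core + left + right
    if score < min_score:
        return None  # below the minimum: ignore this segment pair
    return i - left_len, j - left_len, left_len + seed_len + right_len, score


print(extend_hit("GGACGTACGTCC", "TTACGTACGTAA", 4, 4, 3))  # (2, 2, 8, 8)
```

The hit on "GTA" grows to the shared segment ACGTACGT; each direction keeps only its best-scoring prefix, so the flanking mismatches are trimmed.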
43Hit extension
- An open question: when extending in 2 directions, how should the scores be synchronized?
44Gapped Extension
- Goal: find the best way to extend an HSP to the left across gaps.
- To extend an HSP, we try all candidates from a diagonal-sorted set.
- Penalties for gap opening, gap extension, and cropping
45Gapped Extension
[Figure: the search front]
46From left to right
[Figure: HSPs marked "Optimal Left" and "Too Far Right"]
47From left to right
[Figure: HSPs marked "Optimal Left" and "Too Far Right"]
48Descriptions in the paper
- We use a red-black tree for this.
- An HSP is inserted when the optimal alignment to its left is found.
- It is retired from the tree once newly generated HSPs are too far beyond its right endpoint to make use of it.
49Thought 1
- Insert the first one found -> fast
50Thought 1
- May not find the best one
[Figure: extensions from Start to End, one better and one worse]
51Thought 2
- Insert an HSP when the optimal alignment to its left is found
[Figure: an incomplete HSP]
52Thought 2
- Insert both HSPs
[Figure: one HSP far but long (good), one close but short (bad), compared in the next turn]
53Thought 1
- Retired alignments are put into a priority queue according to their scores.
[Figure: Tree 1 and Tree 2]
54Performance
Ref. Altschul, S.F. et al. (1997) Nucleic Acids Res., 25, 3389-3402.
Ref. Bin Ma, John Tromp, Ming Li Bioinformatics
Vol. 18 no. 3 2002
55PatternHunter II
56Outline
- Overview
- PatternHunter II design
- Computing hit probability
- Finding seeds set
- Seed performance
- PHII performance
57Overview
- PatternHunter spaced seed
- PH II design goal, better sensitivity: achieve a sensitivity approaching that of Smith-Waterman with a speed similar to the default Blastn
- Extend the single spaced seed to multiple ones
- Two main problems
- Large memory required for multiple hash tables
- Complexity of finding the optimal seed combination
58PatternHunter II design
- A hash table is built for each seed
- All hits generated from all hash tables are used for gapped extension
- In two-hit mode, two nearby hits can come from different hash tables
59PatternHunter II design (cont.)
- Large memory problem
- Divide into smaller segments
- e.g., with k = 8, w = 11, and n = 32 × 10^6, the hash tables use about 256 MBytes of memory
- Extend alignments across division boundaries
- May still lose alignments
60Computing hit probability
- Use DP, extending the algorithm from a single seed to multiple seeds
- Definitions
- Homologous region R with length L
- The substring from i to j is denoted by $R[i:j]$
- A set of k seeds $A = \{a_1, \ldots, a_k\}$
- A hits R if there is an $a_i$ that hits R
- p is called the similarity level of R if each position of R is an identity with probability p
61Computing hit probability (cont.)
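- For a binary string b and $i \ge |b|$, define $f(i, b)$ as the probability that A hits a random region of length i whose last |b| positions are b
- The goal is to find $f(L, \epsilon)$, where $\epsilon$ is the empty string
- For any $i > |b|$, we have $f(i, b) = (1-p)\,f(i, 0b) + p\,f(i, 1b)$
- We can compute $f(i, b)$ from other values $f(i', b')$ computed earlier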
62Computing hit probability (cont.)
- Definition
- b is compatible with a seed a if $b[|b|-j] = 1$ whenever $a[|a|-j] = 1$, for $0 < j \le \min(|a|, |b|)$
- Define B as the set of binary strings that are not hit by A but are compatible with some a in A
- Let $B(x)$ denote the longest proper prefix of x in B
63Computing hit probability (cont.)
- First, $\epsilon$ is in B
- Suppose b is in B; then b is compatible with some a in A by definition. Therefore 1b is also compatible with some a in A
- If 1b is not in B, it must hit some a in A, so $f(i, 1b) = 1$
- If 0b is not in B, then, since it is not hit by A, it cannot be compatible with any a in A; hence $f(i, 0b) = f(i - |0b| + |b'|, b')$, where $b' = B(0b)$
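The multi-seed hit probability can be illustrated with a simpler relative of this DP. The sketch below is not the paper's Algorithm DP: instead of the compatible-string set B, its state is the last (longest seed length - 1) bits of the region, so the state space grows as 2^(m-1) and it is only practical for short seeds. The function names are hypothetical, and a brute-force enumeration is included to cross-check it on tiny regions.

```python
from itertools import product


def hits_ending_at(bits, seed, end):
    """True if `seed`, placed so that its last letter is at index `end`, hits `bits`."""
    start = end - len(seed) + 1
    return start >= 0 and all(bits[start + k] for k, c in enumerate(seed) if c == "1")


def hit_probability(seeds, length, p):
    """Pr(at least one seed in `seeds` hits an i.i.d. region with Pr(1) = p)."""
    keep = max(len(s) for s in seeds) - 1  # trailing bits that can still matter
    no_hit = {(): 1.0}                     # trailing bits -> Pr(no hit so far, ending this way)
    for _ in range(length):
        nxt = {}
        for suffix, prob in no_hit.items():
            for bit, q in ((1, p), (0, 1.0 - p)):
                window = suffix + (bit,)
                if any(hits_ending_at(window, s, len(window) - 1) for s in seeds):
                    continue  # some seed hits: this mass leaves the "no hit yet" states
                key = window[-keep:] if keep else ()
                nxt[key] = nxt.get(key, 0.0) + prob * q
        no_hit = nxt
    return 1.0 - sum(no_hit.values())


def hit_probability_brute_force(seeds, length, p):
    """Same quantity by enumerating all 2**length regions (tiny lengths only)."""
    total = 0.0
    for bits in product((0, 1), repeat=length):
        if any(hits_ending_at(bits, s, e) for s in seeds for e in range(length)):
            ones = sum(bits)
            total += p ** ones * (1.0 - p) ** (length - ones)
    return total


for seeds in (["1101"], ["111"], ["111", "1101"]):
    print(seeds, hit_probability(seeds, 10, 0.7), hit_probability_brute_force(seeds, 10, 0.7))
```

A set of seeds always hits at least as often as any of its members, which is the effect PatternHunter II exploits.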
64Computing hit probability (cont.)
Ref. Li, M. et al. (2004) Comput. Biol., 2, 417-440.
65Computing hit probability (cont.)
- Can also compute k-hits probability
- Change f(i,b) to f(i,b,k)
- We already have k = 1. By induction, compute each f(i, b, k) from f(i, b, k-1)
66Computing hit probability (cont.)
Ref. Li, M. et al. (2004) Comput. Biol., 2, 417-440.
67Computing hit probability (cont.)
- Complexity
- It is proved that computing the hit probability of multiple seeds is NP-hard
- The time complexity of the algorithm is ...
68Computing hit probability (cont.)
- Algorithm DP implemented on a PC
- It took 0.70 s to compute the hit probability of a set of 16 weight-11 seeds with length ≤ 21 on a random region of length 64
- It took only 0.37 s for the same number of seeds and the same length bound, with the weight changed to 12
- The running time largely depends on the maximum number of 0s in each seed
69Finding seeds set
- Algorithm DP cannot enumerate all possible seed sets
- Their number is exponential!
- Also, finding the optimal spaced seed set is proved NP-hard
- Use a greedy method
70Finding seeds set (cont.)
- Compute the first seed a1 that maximizes the hit probability of the set {a1}
- Then compute the second seed a2 for the set {a1, a2}, then a3, and so on
- Compute ai until we
- achieve the desired number of seeds, or
- achieve the desired hit probability
71Finding seeds set (cont.)
- May not optimize the hit probability
- It is still time-consuming
- e.g., it took 12 CPU days on a Pentium 4 3GHz PC to compute a set of 16 weight-11 seeds, each no longer than 21
- It takes much longer if the seeds become slightly longer
- Need a different approach
72Finding seeds set (cont.)
- Suppose we already have N seeds, and C is the candidate set for the (N+1)-th seed
- For each c in C, estimate the hit probability on m random region samples
- m is reasonably large, such as 500
- Remove the worst-performing half of C, and increase m to 2m
- Repeat until only one seed is left
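The halving procedure can be sketched with Monte Carlo estimates. This is an illustration under stated assumptions, not the PH II implementation: regions are i.i.d. with hypothetical defaults length 64 and p = 0.7, the random generator is seeded for reproducibility, all candidates in a round are scored on the same sampled regions, and `pick_next_seed` is a hypothetical name.

```python
import random


def seed_hits(seed, region):
    """True if the seed's 1-positions all land on matches at some offset in `region`."""
    care = [k for k, c in enumerate(seed) if c == "1"]
    return any(all(region[o + k] for k in care)
               for o in range(len(region) - len(seed) + 1))


def pick_next_seed(chosen, candidates, length=64, p=0.7, samples=500, rng=None):
    """Successive halving: score each candidate c by how many of `samples` random
    regions the set chosen + [c] hits, keep the better half, double `samples`, repeat."""
    rng = rng if rng is not None else random.Random(0)
    pool = list(candidates)
    while len(pool) > 1:
        regions = [[rng.random() < p for _ in range(length)] for _ in range(samples)]

        def score(c):
            return sum(any(seed_hits(s, r) for s in chosen + [c]) for r in regions)

        pool = sorted(pool, key=score, reverse=True)[: max(1, len(pool) // 2)]
        samples *= 2
    return pool[0]


print(pick_next_seed([], ["1111", "11111111", "1" * 12]))  # 1111
```

In the usage example the lightest seed must win, because any region hit by the longer consecutive seeds is also hit by 1111; real candidate sets would all share the target weight.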
73Seed performance
- Two ways to increase the sensitivity
- Increase the number of seeds
- Reduce the weight of a single seed
- Both increase the running time
- Doubling the number of seeds gives approximately the same sensitivity as reducing the weight of a single seed by 1
- At high similarity levels, doubling the number of seeds achieves better sensitivity
74Seed performance (cont.)
- From low to high
- Solid curves: the first k (k = 1, 2, 4, 8, 16) weight-11 seeds
- Dashed curves: a single optimal weight-w (w = 10, 9, 8, 7) seed
Ref. Li, M. et al. (2004) Comput. Biol., 2, 417-440.
75Comparison
- Sensitivity / Speed
- PatternHunter II
- Blast
- Smith-Waterman algorithm
- SSearch
76SSearch Configuration
- Smith-Waterman algorithm
- A sub-program in the FASTA package
- FASTA package
- ftp://ftp.virginia.edu/pub/FASTA/
77Common Environment
- Score scheme
- Match 1
- Mismatch -1
- Gapopen -5
- Gapextension -1
- Local alignment score > 16
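For reference, the kind of score SSearch computes under this scheme can be sketched with a plain Smith-Waterman using affine gaps (Gotoh's three-matrix recurrence). This is not SSearch's code. In particular, the gap convention is an assumption: here a gap of length k scores gap_open + k * gap_extend, so a single-base gap costs 6; other tools charge the first gap residue differently.

```python
def smith_waterman_affine(s, t, match=1, mismatch=-1, gap_open=-5, gap_extend=-1):
    """Best local alignment score with affine gaps.

    Assumed convention: a gap of length k scores gap_open + k * gap_extend.
    """
    neg = float("-inf")
    n, m = len(s), len(t)
    H = [[0] * (m + 1) for _ in range(n + 1)]    # best alignment ending at (i, j)
    E = [[neg] * (m + 1) for _ in range(n + 1)]  # best one ending with a gap in s
    F = [[neg] * (m + 1) for _ in range(n + 1)]  # best one ending with a gap in t
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(E[i][j - 1] + gap_extend, H[i][j - 1] + gap_open + gap_extend)
            F[i][j] = max(F[i - 1][j] + gap_extend, H[i - 1][j] + gap_open + gap_extend)
            diag = H[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            H[i][j] = max(0, diag, E[i][j], F[i][j])  # 0: a local alignment may start anywhere
            best = max(best, H[i][j])
    return best


s = "ACGT" * 5
t = s[:10] + s[11:]  # one base deleted
print(smith_waterman_affine(s, t))  # 13: 19 matches minus one gap of length 1 (5 + 1)
```

A pair would then count as a reported local alignment only if this score exceeds 16.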
78Common Environment
- DNA sequences
- 2 sets of human and mouse EST sequences
- ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/
- month.est_human.Z
- month.est_mouse.Z
- Pentium IV 3GHz Linux PC
79Term Explanation
- EST
- Expressed Sequence Tag
- A unique stretch of DNA within a coding region of a gene that is useful for identifying the gene
- A short sub-sequence of a transcribed sequence
80Term Explanation
- Coding Regions
- Regions of DNA/RNA sequences that code for proteins; they usually start with a start codon (ATG) and end with a stop codon
- The coding region of a gene is the portion of DNA that is transcribed into mRNA and translated into protein
81Repeat Masking
- Fact
- Sequences contain long runs of identical letters
- especially of As and Ts
- Example (shown later)
- Solution
- Turn every run of ten or more identical letters into Ns
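The masking rule above fits in one regular expression. A minimal sketch, assuming runs are replaced by the same number of Ns so sequence positions are preserved; `mask_repeats` is a hypothetical name.

```python
import re


def mask_repeats(seq, min_run=10):
    """Replace every run of `min_run` or more identical letters with Ns of the same length."""
    run = re.compile(r"(.)\1{%d,}" % (min_run - 1))  # a letter followed by >= min_run - 1 copies
    return run.sub(lambda m: "N" * len(m.group(0)), seq)


print(mask_repeats("ACG" + "A" * 12 + "T" * 9 + "C"))  # 'ACG' + 12 Ns + 9 Ts + 'C'
```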
82SSearch Result
- Number of human ESTs: 4
- Number of mouse ESTs: 2005
- EST example
Ref. Li, M. et al. (2004) Comput. Biol., 2, 417-440.
83Optimal Versus Sub-Optimal
- Neither PatternHunter nor Blast tries to compute the optimal alignments for the homologies they have found.
- Q: Why not find the optimal alignments?
- A: Use Blast or PH II to detect the homologies, then compute the optimal alignments.
84Found
- SSearch finds a local alignment with score x
- PatternHunter II finds a local alignment with score > x/2
- Then the homology is counted as found for that pair of ESTs
85Sensitivity Definition
- Smith-Waterman
- Finds y pairs of ESTs with local alignment score at least x
- Other programs
- y' of the y pairs can be found with alignment score > x/2
- Sensitivity = y' / y
86Blastn Configuration
- Version 2.2.6
- NCBI's website
- -F F option
- To turn off the low-complexity region filtering
- Weight 11 seeds
- 11111111111
87Speed comparison
Ref. Li, M. et al. (2004) Comput. Biol., 2, 417-440.
88Sensitivity comparison
- From low to high
- Dashed: Blastn, seed weight 11
- Solid: PH II, 1, 2, 4, 8 seeds of weight 11
89Compare with other seeds
- From left to right
- PH II, two weight 11 seeds
- PH II, one weight 10 seed
- 1101100101000101101
- HMM model
90Seed Selection
- Use heuristic or exponential time algorithms
- For general seed selection problem
- PTAS
- polynomial time approximation scheme
91Homology Search
- Time-consuming
- DNA-DNA searches
- Blastn
- translated DNA-protein searches
- tBlastx
- tPH
- protein-protein searches
- Small query and database sizes
92Conclusion
- Optimized spaced seeds
- Blastn vs. PH II
- Same sensitivity
- Speeds up by 5-100 times
- Optimized multiple spaced seeds
- PH II vs. Smith-Waterman
- Approximately the same sensitivity
- > 1000 times faster
93Translated PatternHunter
94Outline
- What's translated search?
- BLASTs translated search
- Translated Pattern Hunter
- Performance
95What's translated search?
- To translate a DNA sequence into a protein sequence for alignment with another protein sequence
- But what's translation?
96What's translation?
- In biology, translation means converting DNA into amino acids (AA) with the universal genetic code, one codon (3 nucleotides) at a time.
- The DNA sequence is first transcribed into an RNA sequence in which all Ts are replaced by Us
97The Genetic code
- We can use translation in homology search since the genetic code is universal
- Degeneracy: some codons map to the same AA
- They usually differ in the third base
- Translation is one-way: DNA -> protein
98Why we need translated search?
- When a DNA database or a protein database is not available
- Blastx: DNA query, protein database
- tBlastn: protein query, DNA database
- To find very distant homologies
- tBlastx: DNA query and database, both translated
- Slowest, but finds functional and structural homology in addition to sequence homology. Why?
99Substitution Matrix
- Some AAs are similar in their chemical or physical properties
- Substitution is no longer just match/mismatch!
- The stop codon is assigned the most negative score in BLAST and tPH
- PAM (Point Accepted Mutation)
- Based on global alignments of closely related proteins (1% divergence for PAM1)
- BLOSUM (BLOck SUbstitution Matrix)
- Based on local alignments of divergent proteins (62% similarity for BLOSUM 62)
100Substitution Matrix
- Short alignments need to be relatively strong to rise above background noise, so they can only detect closely related homologies
Query length   Substitution matrix   Gap costs
< 35           PAM-30                (9,1)
35-50          PAM-70                (10,1)
50-85          BLOSUM-80             (10,1)
> 85           BLOSUM-62             (10,1)
(adapted from NCBI's substitution matrix table)
101BLASTs translated search
- The same in Blastx, tBlastn, tBlastx
- Aligns the 6-frame translations of the DNA
sequence against another protein sequence
102Reading Frame of DNA Sequence
- The DNA sequence can be read in six reading
frames, three in the forward and three in the
reverse direction.
[Figure: open reading frames]
103BLASTs translated search
- Translate the DNA sequence into all 6 possible frames
- Align each frame against the protein sequence, just like BLASTp
- Report the pairs with significant scores
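The translation step can be sketched as follows, using the standard genetic code (codons enumerated in T, C, A, G order) with U treated as T. This is only an illustration of six-frame translation; the helper names are hypothetical, '*' marks a stop codon, and unknown codons become 'X'.

```python
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W"  # TTT ... TGG
         "LLLLPPPPHHQQRRRR"  # CTT ... CGG
         "IIIMTTTTNNKKSSRR"  # ATT ... AGG
         "VVVVAAAADDEEGGGG")  # GTT ... GGG
CODON = {a + b + c: AMINO[16 * i + 4 * j + k]
         for i, a in enumerate(BASES)
         for j, b in enumerate(BASES)
         for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate codon by codon; trailing bases that do not fill a codon are dropped."""
    dna = dna.upper().replace("U", "T")
    return "".join(CODON.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Three forward frames, then three frames of the reverse complement."""
    dna = dna.upper().replace("U", "T")
    rev = dna.translate(COMPLEMENT)[::-1]
    return [translate(seq[f:]) for seq in (dna, rev) for f in range(3)]


print(translate("AUGGCCUAA"))  # MA*
print(six_frames("ATGGCC"))    # ['MA', 'W', 'G', 'GH', 'A', 'P']
```

Each of the six strings would then be aligned against the protein sequence independently, which is exactly why a frameshift splits one alignment into pieces.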
104How good is significant?
- The expected number of alignments scoring S or greater between two sequences of lengths m and n is
$E = K m n e^{-\lambda S}$, or $E = m n 2^{-S'}$ with the normalized score $S' = (\lambda S - \ln K)/\ln 2$
- where K and λ, used for normalization, depend on the sequence composition
- Different K and λ are used for each frame
- Non-coding sequences tend to yield alignments of marginal significance
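The two forms of the E-value are the same quantity, as this short check shows. The K and λ values are illustrative placeholders, not parameters of any real scoring system or of BLAST's per-frame statistics.

```python
import math


def evalue(score, m, n, K, lam):
    """Expected number of alignments scoring >= score: E = K m n exp(-lam * score)."""
    return K * m * n * math.exp(-lam * score)


def bit_score(score, K, lam):
    """Normalized score S' = (lam * S - ln K) / ln 2, so that E = m n 2**(-S')."""
    return (lam * score - math.log(K)) / math.log(2)


K, lam = 0.1, 0.3  # illustrative values only
m, n, S = 1_000, 1_000_000, 60
e1 = evalue(S, m, n, K, lam)
e2 = m * n * 2 ** -bit_score(S, K, lam)
print(e1, e2)  # the two forms agree up to floating-point rounding
```

Using a different (K, λ) pair per frame changes the normalized score, and therefore the E-value, for the same raw score S.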
105Translated PatternHunter
- The version of PH for translated search
- Compared with PatternHunter, tPH uses very
different algorithms for hit generation and
gapped extensions
106Hit Generation in tPH
- Weight 5 instead of 11
- Space complexity: $20^5$ vs. $4^{11}$ in PH
- Length 6 or 7
- Does not require exact matches
- Hit: all five pairs have scores ≥ 0 and the total score is above a tolerance T
- Uses BLOSUM 62
- Multiple seeds are used
107Hit Generation in tPH
- Seed 1011, T = 7
[Figure: a query, all possible hits, and the indexed subject]
108Gapped Extension in tPH
- Is it the same as in BLAST?
- BLAST can't handle frameshift errors
- Huh?
109Frame Shift Error
- When a single DNA base is deleted or inserted, the reading frame shifts
[Figure: an RNA sequence whose reading frame shifts after a single-base indel]
- BLAST can't detect such variation
- It aligns the 6 frames with the subject independently
- In fact, most frameshift mutations can completely abolish the protein's function
- They are usually lethal
110Frame Shift Error
- In this example
- BLAST can only find at most two separate segments
- tPH can connect them with a single deletion of C
- How?
111Gapped Extension in tPH
- tPH regards the DNA sequence as a sequence of overlapping codons
- It uses a modified Smith-Waterman algorithm that takes frameshifts into account; S(i, j) is the maximum of (and 0, as in Smith-Waterman):
- Substitution: $S(i-1, j-3) + s(p_i, n_{j-2..j})$
- Insertion of DNA (1 base): $S(i, j-1) + \text{frameshift}$
- Insertion of DNA (2 bases): $S(i, j-2) + \text{frameshift}$
- Insertion of AA: $S(i, j-3) + \text{gap}$
- Deletion of AA: $S(i-1, j) + \text{gap}$
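The recurrences above can be sketched directly. Assumptions to note: the substitution score `toy_sub` is hypothetical (+3 if the codon encodes the residue, -1 otherwise) and stands in for tPH's BLOSUM62 lookup on the translated codon; the frameshift (-1) and gap (-2) penalties are taken from the scoring example on the next slide; proteins use one-letter codes.

```python
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON = {a + b + c: AMINO[16 * i + 4 * j + k]
         for i, a in enumerate(BASES)
         for j, b in enumerate(BASES)
         for k, c in enumerate(BASES)}


def toy_sub(aa, codon, match=3, mismatch=-1):
    """Hypothetical substitution score: +match if the codon encodes `aa`, else mismatch."""
    return match if CODON.get(codon) == aa else mismatch


def frameshift_align(protein, dna, sub=toy_sub, frameshift=-1, gap=-2):
    """Best local score of `protein` against `dna` with frameshift-aware recurrences."""
    dna = dna.upper().replace("U", "T")
    n, m = len(protein), len(dna)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(n + 1):
        for j in range(m + 1):
            cands = [0]  # local alignment: may start fresh anywhere
            if i >= 1 and j >= 3:
                cands.append(S[i - 1][j - 3] + sub(protein[i - 1], dna[j - 3:j]))  # substitution
            if j >= 1:
                cands.append(S[i][j - 1] + frameshift)  # insertion of DNA (1 base)
            if j >= 2:
                cands.append(S[i][j - 2] + frameshift)  # insertion of DNA (2 bases)
            if j >= 3:
                cands.append(S[i][j - 3] + gap)  # insertion of AA (a whole extra codon)
            if i >= 1:
                cands.append(S[i - 1][j] + gap)  # deletion of AA (an extra residue in the protein)
            S[i][j] = max(cands)
            best = max(best, S[i][j])
    return best


print(frameshift_align("MA", "ATGGCC"))   # 6: ATG -> M, GCC -> A
print(frameshift_align("MA", "ATGTGCC"))  # 5: the extra T costs one frameshift penalty
```

In the second call a frame-independent search would see M and A in different frames; the frameshift move joins them into one alignment.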
112Scoring Scheme
- n = GACACUAGAAUCG, P = Asp Arg Tyr Ser
[Figure: the DP matrix S for n and P]
- Alignment:
Query:   GAC ACU A-- GAA --- UCG
         Asp Thr --- Glu Tyr Ser
Subject: Asp --- --- Arg Tyr Ser
- Scores: $S(i-1, j-3) + s(p_i, n_{j-2..j})$; $S(i, j-1) + \text{frameshift}\ (-1)$; $S(i, j-2) + \text{frameshift}\ (-1)$; $S(i, j-3) + \text{gap}\ (-2)$; $S(i-1, j) + \text{gap}\ (-2)$
113Performance Evaluation
- 4407 human expressed sequence tag (EST) sequences
- Split in the middle into subject and query
114Number of Alignments Found
- T = 12 for BLAST
- 3x speed
- Higher sensitivity
Ref. Derek Kisman et al, Bioinformatics Vol. 21
no. 4 2005
115Unique Alignment Found
- Most contain frameshifts
Ref. Derek Kisman et al, Bioinformatics Vol. 21
no. 4 2005
116Using 4 Seeds
- Differs from PH2
- Short seeds
- High dependency between seeds
Ref. Derek Kisman et al, Bioinformatics Vol. 21
no. 4 2005
117Reference
- PatternHunter
- Bin Ma, John Tromp, Ming Li, Bioinformatics Vol. 18 no. 3, 2002
- Ming Li, NHC2005
- PatternHunter II
- Li, M., Ma, B., Kisman, D. and Tromp, J. (2004) Comput. Biol., 2, 417-440.
- NTU R94922059's PowerPoint slides
118Reference
- tPatternHunter
- Derek Kisman, Ming Li, Bin Ma, and Li Wang, Bioinformatics Vol. 21 no. 4, 2005
- Others
- Wikipedia http://en.wikipedia.org/wiki
- NCBI http://www.ncbi.nlm.nih.gov
119Thank you for your attention!