Title: Y.C. Wong Public Lecture 040611
4 1, 2, 3, and Beyond
- A slideshow for HKU Open Day in 1980
- I did the narration and background music
- The experience had a great impact on my journey
- Mathematics is beyond numbers
- We find it in buildings, banks, and supermarkets
- in atoms, molecules, and genes
6Outline
- DNA and RNA
- Genome, genes, and diseases
- Palindromes and replication origins in viral genomes
- Mathematics for prediction of replication origins
Cytomegalovirus (CMV) Particle
7DNA and RNA
- DNA is deoxyribonucleic acid, made up of 4 nucleotide bases: Adenine, Cytosine, Guanine, and Thymine.
- RNA is ribonucleic acid, made up of 4 nucleotide bases: Adenine, Cytosine, Guanine, and Uracil.
- For uniformity of notation, all DNA and RNA data sequences deposited in GenBank are represented as sequences of A, C, G, and T.
- The bases A and T form a complementary pair, as do C and G.
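The notation and pairing rules above can be sketched in a few lines of Python. This is only an illustration: the function names `to_dna_alphabet` and `complement` are made up for this example and are not taken from the lecture or from any GenBank tool.

```python
# Complementary pairs as stated on the slide: A<->T and C<->G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def to_dna_alphabet(seq: str) -> str:
    """Write an RNA or DNA sequence over the letters A, C, G, T (U becomes T)."""
    return seq.upper().replace("U", "T")


def complement(seq: str) -> str:
    """Replace every base by its complementary base."""
    return "".join(COMPLEMENT[base] for base in to_dna_alphabet(seq))


print(to_dna_alphabet("AUGGCU"))  # ATGGCT
print(complement("ATGGCT"))       # TACCGA
```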
8Genes and Genome
9Genes and Diseases
10Virus and Eye Diseases
CMV Particle
- CMV Retinitis
- inflammation of the retina
- triggered by CMV particles
- may lead to blindness
Genome size: 230 kbp
11Replication Origins and Palindromes
- A high concentration of palindromes exists around the replication origins of other herpesviruses
- Locating palindrome clusters on the CMV genome sequence might reveal likely locations of its replication origins.
12Palindromes in Letter Sequences
Odd Palindrome
A nut for a jar of tuna
ANUTFORA J AROFTUNA
Even Palindrome
Step on no pets
STEPON NOPETS
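A small Python sketch of the odd/even distinction for letter palindromes. The helper name `classify_palindrome` is hypothetical; it ignores spaces, punctuation, and letter case, which is an assumption about how the slide examples are meant to be read.

```python
def classify_palindrome(text: str) -> str:
    """Classify a phrase as an odd palindrome, an even palindrome, or neither."""
    # Keep letters only, so spaces and punctuation do not matter.
    letters = [ch.upper() for ch in text if ch.isalpha()]
    if letters != letters[::-1]:
        return "not a palindrome"
    # An odd palindrome has a single middle letter; an even one does not.
    return "odd palindrome" if len(letters) % 2 == 1 else "even palindrome"


print(classify_palindrome("A nut for a jar of tuna"))  # odd palindrome
print(classify_palindrome("Step on no pets"))          # even palindrome
```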
13DNA Palindromes
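A DNA palindrome is commonly defined as a stretch of DNA that equals its own reverse complement, so it always has even length. The sketch below assumes that definition. The function names and the `min_half_length` cutoff are illustrative choices, not taken from the lecture.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Complement every base and read the result backwards."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_dna_palindrome(seq):
    """True if the sequence equals its own reverse complement."""
    return len(seq) > 0 and seq == reverse_complement(seq)


def find_palindromes(seq, min_half_length=3):
    """Return (start, length) of each maximal palindrome, with 0-based starts."""
    hits = []
    n = len(seq)
    # A palindrome is centered on the gap between seq[center - 1] and seq[center].
    for center in range(1, n):
        k = 0
        while (center - 1 - k >= 0 and center + k < n
               and COMPLEMENT.get(seq[center - 1 - k]) == seq[center + k]):
            k += 1
        if k >= min_half_length:
            hits.append((center - k, 2 * k))
    return hits


print(is_dna_palindrome("GAATTC"))     # True
print(find_palindromes("CCGAATTCAA"))  # [(2, 6)]
```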
14Association of Palindrome Clusters with
Replication Origins
15Computational Prediction of Replication Origins
in DNA Viruses
- Palindrome distribution in a random sequence model
- Criterion for identifying statistically significant palindrome clusters
- Evaluate prediction accuracy
- Try to improve
16Random Sequence Model
- A mathematical model can be used to generate a DNA sequence
- A DNA molecule is made up of 4 types of bases
- It can be represented by a letter sequence with alphabet size 4
- Adenine
- Cytosine
- Guanine
- Thymine
Wheel of Bases (WOB)
17Random Sequence Model
Each type of base has its own chance (or probability) of being used, depending on the base composition of the DNA molecule.
- Adenine
- Cytosine
- Guanine
- Thymine
Wheel of Bases (WOB)
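A minimal sketch of the wheel-of-bases idea: each base is drawn independently with a probability given by the base composition. The probabilities in `base_probs` are invented for illustration and do not describe any real genome. The names `spin_wheel` and `generate_sequence` are likewise hypothetical.

```python
import random


def spin_wheel(probs, rng):
    """Spin one wheel: pick a base with the given probabilities."""
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]


def generate_sequence(length, probs, seed=None):
    """Generate a random sequence by spinning the same wheel `length` times."""
    rng = random.Random(seed)
    return "".join(spin_wheel(probs, rng) for _ in range(length))


# Hypothetical base composition (assumption, for illustration only).
base_probs = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
seq = generate_sequence(20, base_probs, seed=1)
print(seq)
```

Fixing `seed` makes the output reproducible, which helps when comparing palindrome counts across runs.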
19Poisson Process Approximation of Palindrome
Distribution
20Use of the Scan Statistic to Identify Clusters of
Palindromes
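In its usual form, the scan statistic is the largest number of events that fall inside any window of a fixed length as the window slides along the sequence. The sketch below computes that maximum for a list of palindrome positions. The positions and window length are made-up values, and the significance threshold derived from the Poisson approximation is not reproduced here.

```python
def scan_statistic(positions, window):
    """Return (max_count, start) for the densest window of the given length.

    max_count is the largest number of positions p with
    start <= p < start + window; start is None if there are no positions.
    """
    pts = sorted(positions)
    best, best_start, left = 0, None, 0
    for right, p in enumerate(pts):
        # Drop points that are too far behind p to share a window with it.
        while p - pts[left] >= window:
            left += 1
        count = right - left + 1
        if count > best:
            best, best_start = count, pts[left]
    return best, best_start


# Hypothetical palindrome positions along a genome (assumption).
palindrome_positions = [5, 100, 110, 120, 400, 900]
print(scan_statistic(palindrome_positions, window=50))  # (3, 100)
```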
21Measures of Prediction Accuracy
- Attempts to improve prediction accuracy by:
- Adopting the best possible approximation to the scan statistic distribution
- Taking the lengths of palindromes into consideration when counting palindromes
- Using a better random sequence model
22Markov Chain Sequence Models
- A more realistic random sequence model for DNA and RNA
- It allows neighbor dependence of bases (i.e., the present base affects the selection of the next base)
- A Markov chain of nucleotide bases can be generated using four WOBs in a Sequence Generator (SG)
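The sequence generator can be sketched as four wheels, one per current base, where the wheel that is spun next depends on the base just produced. The transition probabilities in `WHEELS` are invented for illustration. The function name and the choice of a fixed first base are also assumptions, not details from the lecture.

```python
import random

BASES = "ACGT"

# One wheel per current base; each row lists the probabilities of the next
# base being A, C, G, T. These numbers are hypothetical.
WHEELS = {
    "A": [0.4, 0.2, 0.2, 0.2],
    "C": [0.2, 0.4, 0.2, 0.2],
    "G": [0.2, 0.2, 0.4, 0.2],
    "T": [0.2, 0.2, 0.2, 0.4],
}


def generate_markov_sequence(length, wheels, first_base="A", seed=None):
    """Generate a first-order Markov chain of bases using four wheels."""
    rng = random.Random(seed)
    seq = [first_base]
    while len(seq) < length:
        current = seq[-1]
        # Spin the wheel that belongs to the current base.
        seq.append(rng.choices(BASES, weights=wheels[current], k=1)[0])
    return "".join(seq[:length])


print(generate_markov_sequence(30, WHEELS, seed=7))
```

When all four wheels are identical, this reduces to the independent single-wheel model.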
23Sequence Generator (SG)
Wheels of Bases (WOB)
36Results Obtained for Markov Sequence Models
- Probabilities of occurrences of single palindromes
- Probabilities of occurrences of overlapping palindromes
- Mean and variance of palindrome counts
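As a rough illustration of the first item, the probability that a fixed stretch of 2L bases forms a DNA palindrome under a first-order Markov model can be computed by brute-force enumeration. This is not the derivation used in the lecture. It assumes a palindrome equals its reverse complement, and the names `word_probability`, `palindrome_probability`, `start`, and `trans` are hypothetical.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def word_probability(word, start, trans):
    """Probability of a word: first-base probability times transition probabilities."""
    p = start[word[0]]
    for prev, nxt in zip(word, word[1:]):
        p *= trans[prev][nxt]
    return p


def palindrome_probability(half_length, start, trans):
    """Probability that 2 * half_length consecutive bases form a DNA palindrome."""
    total = 0.0
    for left in product(BASES, repeat=half_length):
        # The right half is forced: it is the reverse complement of the left half.
        right = [COMPLEMENT[b] for b in reversed(left)]
        total += word_probability(list(left) + right, start, trans)
    return total


# Sanity check with equally likely, independent bases: the answer is (1/4)**L.
uniform_start = {b: 0.25 for b in BASES}
uniform_trans = {a: {b: 0.25 for b in BASES} for a in BASES}
print(palindrome_probability(3, uniform_start, uniform_trans))  # 0.015625
```

The enumeration visits 4**L left halves, so it is practical only for short palindromes.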
37Related Work in Progress
- Finding the palindrome distribution on Markov random sequences
- Investigating other sequence patterns such as close repeats and inversions in relation to replication origins
38Other Mathematical Topics in Genes and Diseases
- Optimization Techniques: prediction of molecular structures
- Differential Equations: molecular dynamics
- Matrix Theory: analyzing gene expression data
- Fourier Analysis: proteomic data
39Acknowledgements
Collaborators
- Louis H. Y. Chen (National University of Singapore)
- David Chew (National University of Singapore)
- Kwok Pui Choi (National University of Singapore)
- Aihua Xia (University of Melbourne, Australia)
Funding Support
- NIH Grants S06GM08194-23, S06GM08194-24, and 2G12RR008124
- NSF DUE9981104
- W.M. Keck Center of Computational Struct. Biol. at Rice University
- National Univ. of Singapore ARF Research Grant (R-146-000-013-112)
- Singapore BMRC Grants 01/21/19/140 and 01/1/21/19/217
40St. Stephen's Girls' College
41University of Hong Kong
Department of Mathematics
A Beach Picnic
43Continuing to Find Mathematics in Genes and
Diseases
Ming-Ying Leung
Department of Mathematical Sciences
University of Texas at El Paso (UTEP)