Title: Computational Virology
1Computational Virology
Lectures in
Bioinformatic Studies on the Evolution Structure
and Function of RNA-based Life Forms
Marcella A. McClure, Ph.D. Department of
Microbiology and the Center for Computational
Biology Montana State University, Bozeman
MT mars_at_parvati.msu.montana.edu
2Summary Real Bioinformatic Research
- 1) Methods to test the hypothesis.
- 2) Testing the hypothesis.
- 3) Predicting protein contacts.
3Do RNA-Dependent Polymerases Share Common Ancestry?
4The World of Viruses
[Diagram: DNA viruses (ssDNA, dsDNA) and RNA viruses (ssRNA, dsRNA, - ssRNA); polymerases shown: RdDp, RdRp, host Pol II]
Does the RT domain of the RdDp share common
ancestry with the RdRp of negative and positive
polarity, single-stranded viruses?
5Retroid Agents
Retroviruses, retrotransposons,
pararetroviruses, retroposons, retroplasmids,
retrointrons, and retrons
reverse transcriptase mediated replication or
transposition
RNA viruses e.g., Ebola, rabies, influenza, polio
All cellular systems, most DNA viruses
RNA
DNA
transcription
Replication by DNA-dependent DNA polymerase
Replication by RNA-dependent RNA Polymerase
translation
snRNAs, ribozymes, tRNA, rRNA
PROTEIN SYNTHESIS
McClure, 2000
6Mononegavirales
OLD FOES: rabies (Rhabdoviridae); measles, RSV, mumps (Paramyxoviridae)
EMERGING THREATS: Ebola, Marburg (Filoviridae); equine morbillivirus, Nipah virus (Paramyxoviridae)
MODEL AGENT: vesicular stomatitis virus (Rhabdoviridae)
7Roles of Retroid Agents
1) Disease
   a) retroviruses
      1) exogenous, infectious: HIV, HTLV
      2) endogenous, associations: breast cancer, testicular tumors, insulin dependent diabetes, multiple sclerosis, rheumatoid arthritis, schizophrenia and systemic lupus erythematosus
   b) LINEs, insertional mutagenesis
      1) Hemophilia A
      2) muscular dystrophies: Duchenne and Fukuyama-type congenital
      3) X-linked disorders: Alport Syndrome-Diffuse Leiomyomatosis and Chronic Granulomatous Disease
2) Regulation of cellular genes and reproduction
3) Telomere maintenance
4) Repair of broken dsDNA
5) Exchange of genetic information among and between organisms
8Plus-strand RNA Virus Families and Human Diseases
Togaviridae - Rift Valley Fever
Flaviviridae - Dengue Fever virus, West Nile virus
Coronaviridae - Infectious Bronchitis
Caliciviridae - Hepatitis E virus
Picornaviridae - Human poliovirus, Hepatitis A
9VSV Transcription
[Diagram: VSV transcription (leader, N, read through; 5' and 3' ends; P) and VSV replication (L, P, co-assembly with N)]
10RNA Template
11Replication
12Model of a poliovirus polymerase-dsRNA complex
HIV-1 Reverse Transcriptase
Poliovirus Polymerase
Poliovirus Polymerase Oligomer
Model of a poliovirus polymerase-dsRNA complex
based on the structure of HIV-1 RT complexed to
dsDNA (Huang et al., 1998).
13 Rhabdoviridae Genome
Paramyxoviridae Genome
Filoviridae Genome
N VP35 VP40 G
VP30 VP24 RdRp
MMLV Genome
Picornaviridae Genome
RdRp
VPg
Poly(A)
L P4 P2 P3 P1 2A 2B 2C
3A 3B 3C 3D
14RdRp of Plus strand viruses
GDD
RdRp of Mononegavirales
GDNQ
RdDp
FADDM
RT
RH
HYPOTHESIS The Reverse Transcriptase domain of
the RNA-dependent DNA Polymerase shares common
ancestry with the RNA-dependent RNA Polymerase
of the Order Mononegavirales and Plus Strand RNA
viruses.
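The signature motifs named on this slide (GDD, GDNQ, FADDM) can be located in a protein sequence with a simple exact-match scan. The following is a minimal sketch only: the function name, the dictionary labels, and the toy sequences are hypothetical, and exact string matching is a simplification, since conserved motifs in distantly related polymerases are usually degenerate.

```python
import re

# Signature motifs named on the slide; the labels are chosen for this sketch.
MOTIFS = {
    "plus-strand RdRp": "GDD",
    "Mononegavirales RdRp": "GDNQ",
    "RdDp (RT domain)": "FADDM",
}


def find_motifs(sequence, motifs=MOTIFS):
    """Return {label: [0-based start positions]} for every exact motif match."""
    hits = {}
    for label, motif in motifs.items():
        # A lookahead pattern reports overlapping occurrences as well.
        pattern = re.compile("(?=" + re.escape(motif) + ")")
        hits[label] = [m.start() for m in pattern.finditer(sequence.upper())]
    return hits


# Hypothetical toy fragments, not real viral sequences.
toy_sequences = {
    "toy_plus": "MKLSYGDDVLIAT",
    "toy_mono": "QALGDNQTIVS",
    "toy_rt": "PLTFADDMLLAG",
}

for name, seq in toy_sequences.items():
    print(name, find_motifs(seq))
```

An exact scan like this only finds motifs that are perfectly conserved, which is one reason dedicated motif-detection programs are evaluated later in the lecture.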
15Biological Patterns
Whether randomness can be measured is a
difficult problem. One cannot judge the absence
of pattern without specifying which pattern, and
what is a pattern to you may not be a pattern to
me.
McClure, 2000
16Basic Strategy
Search Databases
Annotation and Preparation of Sequences
Multiple Alignment of Sequences
Refined Multiple Alignment
Analysis of Multiple Alignment
McClure, 2000
17Strategy for Assessing Protein Sequence Homology
Protein Sequence Data
SEQUENCE COMPARISON
>30% identical: homology
<30% identical
MOTIF DETECTION
Support for homology Statistical tests
OSM present functionally equivalent
likely homologue
Functional identification, Phylogenetic
analysis, Structural prediction
Support for homology Gene order and size,
common function
McClure, 2000
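The first decision in this strategy, comparing percent identity against the 30% level, can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the input is assumed to be a pair of already aligned sequences of equal length, and identity is computed over columns where neither sequence has a gap (other denominators are also in use).

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns (an assumption of this sketch)
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def next_step(aligned_a, aligned_b, threshold=30.0):
    """Apply the 30% rule of thumb from the slide to one aligned pair."""
    pid = percent_identity(aligned_a, aligned_b)
    if pid > threshold:
        return pid, "sequence comparison supports homology"
    return pid, "go on to motif detection"


# Hypothetical toy alignments.
print(next_step("MKV-LSGDDA", "MRVALT-DDA"))
print(next_step("ACDEFGHIKL", "AWWWWWWWWW"))
```

Pairs that fall below the threshold are not declared unrelated; they are passed to the motif-detection branch of the strategy.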
18Experimental Design for Testing Motif Detection
Methods
Methods: Appropriateness, Availability, Assumptions, Limitations, User-specific parameters
Benchmark Sequences: Biologically informative markers, Sequence length distribution, Evolutionary distribution, Set size
Parameter Range Tests
Types of Test Data
Evaluate Results for Correct Identification of
Biologically Informative Marker
Method(s) that Accurately Identify Biologically
Informative Marker
RdRp and RdDp sequences
Test hypothesis: RdRp share common ancestry with
RdDp
19Motif-detection Programs
Blockmaker, Matchbox, Meme, Pima, Pralign, SAM
20BLOCKMAKER implements the Motifj algorithm, which searches the sequences for conserved triplets of amino acids that are separated by a user-specified length.
INTERALIGN uses the symmetric-iterative protocol, which generates consensus sequences using the significant segment pair alignment method; regions of similarity are then derived from aligning those consensus sequences.
MATCHBOX implements a scanning algorithm, which uses a window of 9 amino acids that moves across the sequences in search of a match.
PIMA constructs a binary tree based on similarity scores. A common pattern node is generated by the Smith-Waterman (S-W) algorithm, and alignments are made based on that common node.
PROBE implements the S-W algorithm and performs transitive searches to find regions of sequence similarity, which are then aligned using the Gibbs sampling algorithm.
MEME locates motifs by estimating the parameters for a model that maximizes the likelihood of the data using the Expectation Maximization algorithm.
SAM is a linear HMM that implements the Baum-Welch algorithm. Once the model converges, an alignment can be generated.
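Two of the programs above, PIMA and PROBE, rely on the S-W (Smith-Waterman) local alignment algorithm. The sketch below shows only the core dynamic-programming recurrence and is not the code of either program. The scoring values (match, mismatch, linear gap penalty) and the function name are assumptions made for illustration; real protein comparisons normally use a substitution matrix and often affine gap penalties, and the traceback step that recovers the alignment itself is omitted.

```python
def smith_waterman_score(seq_a, seq_b, match=2, mismatch=-1, gap=-2):
    """Return (best local score, (end index in seq_a, end index in seq_b)).

    End indices are 1-based positions of the last aligned residues,
    or (0, 0) if no positive-scoring local alignment exists.
    """
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    # H[i][j] = best score of a local alignment ending at seq_a[i-1], seq_b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            diag = H[i - 1][j - 1] + pair   # align the two residues
            up = H[i - 1][j] + gap          # gap in seq_b
            left = H[i][j - 1] + gap        # gap in seq_a
            # Flooring at 0 is what makes the alignment local.
            H[i][j] = max(0, diag, up, left)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    return best, best_pos


# Hypothetical toy sequences sharing only a GDD segment.
print(smith_waterman_score("AAAGDDAAA", "CCCGDDCCC"))
```

With these toy inputs the only positive-scoring region is the shared GDD segment, which illustrates how a local method can pick out a short conserved block inside otherwise dissimilar sequences.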
21Data used to compare Motif finding methods
22Summary of small data set analysis
Summary of Large Data Set Analysis
23RdRp of Plus strand viruses
GDD
RdRp of Mononegavirales
GDNQ
RdDp
FADDM
RT
RH
HYPOTHESIS The Reverse Transcriptase domain of
the RNA-dependent DNA Polymerase shares common
ancestry with the RNA-dependent RNA Polymerase of
the Order Mononegavirales and Plus Strand RNA
viruses.
25Sequence Length, Percent Identity and Distance
Values
26Small Dataset Output
27Large Dataset Output
28Conclusion The methods are not rigorous enough
to test the hypothesis
29Can we predict amino acid contacts in protein
complexes without knowledge of structure?