Title: Computational Virology
1Computational Virology
Lectures in
Bioinformatic Studies on the Evolution, Structure,
and Function of RNA-based Life Forms
Marcella A. McClure, Ph.D., Department of
Microbiology and the Center for Computational
Biology, Montana State University, Bozeman,
MT mars_at_parvati.msu.montana.edu
2Summary Lecture II
- Introduction to Retroid Agents
- The Genome Parsing Suite
- Retroid Agents in the Human Genome
- Discovery-based Hypothesis Generation
3Retroid Agents
Retroid agents (retroviruses, retrotransposons, pararetroviruses, retroposons, retroplasmids, retrointrons, and retrons): reverse transcriptase mediated replication or transposition (RNA to DNA)
RNA viruses (e.g., Ebola, rabies, influenza, polio): replication by RNA-dependent RNA polymerase
All cellular systems and most DNA viruses: replication by DNA-dependent DNA polymerase
DNA to RNA: transcription
RNA to protein synthesis: translation
snRNAs, ribozymes, tRNA, rRNA
McClure, 2000
4Distribution of Retroid Agents among Eukaryotes
and Eubacteria
5Variable features of Retroid genomes
6Gene Maps
[Figure: gene maps of representative Retroid agents, arranged on a phylogenetic tree based on 65 RT sequences; scale in nucleotides, 1000 to 4000]
Groups and representatives shown: retroviruses (HIV-1); orphan class (DIRS-1); gypsy-like retrotransposons (17.6); caulimoviruses (CaMV); hepadnaviruses (HBV); copia-like retrotransposons (Copia); retroposons (LIN-H, CIN4, R2Bm, I-FAC, INGI); Group II introns (INT-SC1); plasmids (MAUP); retrons (MX65); TERT
Legend: RT, reverse transcriptase; RH, ribonuclease H; H-C/IN, integrase; PR, aspartic acid protease. Gene-map labels also include MA, C, and NC.
McClure, 2000
7RNA-dependent DNA Polymerase
[Figure: linear maps of the conserved motifs in the Retroid enzymes]
Reverse Transcriptase (motifs 1 to 6; fingers, palm, thumb, and connection regions) and Ribonuclease H (motifs 1 to 4); residue labels on the map: P, D, K, D E D, NX D (subscript 3)
Aspartic Acid Protease (motifs 1 to 3): DTG, G, ILG
Integrase (motifs 1 to 4; zinc-binding, core, and DNA-binding regions): Hx H, CX C (subscripts 4 and 2), D D E
8Roles of Retroid Agents
1) Disease
   a) Retroviruses
      1) Exogenous infectious: HIV, HTLV
      2) Endogenous associations: breast cancer, testicular tumors, insulin-dependent diabetes, multiple sclerosis, rheumatoid arthritis, schizophrenia, and systemic lupus erythematosus
   b) LINEs: insertional mutagenesis
      1) Hemophilia A
      2) Muscular dystrophies: Duchenne and Fukuyama-type congenital
      3) X-linked disorders: Alport Syndrome-Diffuse Leiomyomatosis and Chronic Granulomatous Disease
2) Regulation of cellular genes and reproduction
3) Telomere maintenance
4) Repair of broken dsDNA
5) Exchange of genetic information among and between organisms
9Possible function of HERV-W
10What is the host genomic environment of active
Retroid Agents?
Predicted functional RT
Predicted Retroid genome
Real Contig
Real Chromosome
What roles do Retroid Agents play in disease,
development, reproduction, and evolution
throughout the three domains of life?
11Mapping Genomic Retroid Agents
12The seven major steps of GPS
13RNA-dependent DNA Polymerase
[Figure: linear maps of the conserved motifs in the Retroid enzymes]
Reverse Transcriptase (motifs 1 to 6; fingers, palm, thumb, and connection regions) and Ribonuclease H (motifs 1 to 4); residue labels on the map: P, D, K, D E D, NX D (subscript 3)
Aspartic Acid Protease (motifs 1 to 3): DTG, G, ILG
Integrase (motifs 1 to 4; zinc-binding, core, and DNA-binding regions): Hx H, CX C (subscripts 4 and 2), D D E
14The score of a given motif is calculated by
M, M1, and M2 are based on the number of amino
acids in a motif found in common between a known
RT query sequence and the potential RT:
M is a count of amino acid identities
M1 is a count of conservative substitutions (ILMV, AG, ST, DE, NQ, FY, RK)
M2 accounts for older substitutions (LIMV, AGST, DENQ, FYW, RKH)
The overall OSM score is calculated by
T motifs is the number of motifs comprising the
OSM
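The scoring formulas themselves are not reproduced in this transcript, so the following is only a minimal sketch of how the three counts could be tallied for one aligned motif pair. The function names, the tiering (each position is counted once, in the strictest class it satisfies), the weights, and the averaging over T motifs are all assumptions made for illustration, not the GPS formulas.

```python
# Substitution classes as listed on the slide.
M1_GROUPS = ["ILMV", "AG", "ST", "DE", "NQ", "FY", "RK"]
M2_GROUPS = ["LIMV", "AGST", "DENQ", "FYW", "RKH"]


def same_group(a, b, groups):
    """True if residues a and b fall in the same substitution class."""
    return any(a in g and b in g for g in groups)


def motif_counts(query, candidate):
    """Return (M, M1, M2) for two aligned motifs of equal length.

    Assumption: a position is counted once, as an identity (M), else as an
    M1-class substitution, else as an M2-class substitution.
    """
    if not query or len(query) != len(candidate):
        raise ValueError("motifs must be non-empty and of equal length")
    m = m1 = m2 = 0
    for q, c in zip(query.upper(), candidate.upper()):
        if q == c:
            m += 1
        elif same_group(q, c, M1_GROUPS):
            m1 += 1
        elif same_group(q, c, M2_GROUPS):
            m2 += 1
    return m, m1, m2


def motif_score(query, candidate, weights=(1.0, 0.5, 0.25)):
    """Hypothetical weighted score in [0, 1]; the weights are NOT from GPS."""
    m, m1, m2 = motif_counts(query, candidate)
    w, w1, w2 = weights
    return (w * m + w1 * m1 + w2 * m2) / len(query)


def overall_score(motif_pairs):
    """Hypothetical overall score: mean motif score over the T motifs."""
    if not motif_pairs:
        raise ValueError("at least one motif pair is required")
    return sum(motif_score(q, c) for q, c in motif_pairs) / len(motif_pairs)


if __name__ == "__main__":
    # Illustrative four-residue motif with one conservative substitution (M -> V).
    print(motif_counts("YMDD", "YVDD"))
    print(motif_score("YMDD", "YVDD"))
```

Keeping the three counts separate from the weighting makes it easy to swap in the actual GPS weights once they are known.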
15Status of the Human Genome Project
- 3,200,000 Kbp of the euchromatic portion of the human chromosomes are being sequenced
- Heterochromatic portion is not being done
- As of January 5, 2003
- Non-redundant sequence only
- 98.8% of euchromatic portion has been done
- 3.0% is completed to the working draft level
- 95.8% has been completed to 99% accuracy
17Distribution of Significant Blast hits
Distribution per chromosome of RT-like sequences
in the April 2003 Freeze of the human genome.
Unique indicates all unique RT signals, Intact
includes all RTs with six motifs present and
Perfect indicates the number of RTs with all six
motifs present in order with no frame-shifts or
stop codons. Full LINE indicates LINEs that are
full length, but may or may not have stop codons
and/or frame-shifts. Perfect LINE indicates
LINEs that are full length and contain no
frame-shifts or stop codons
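The Unique, Intact, and Perfect categories defined in this caption are nested, which can be sketched as a small classifier. The `RTHit` record and its field names are hypothetical; GPS's actual data structures are not described in the slides.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List

N_MOTIFS = 6  # the caption refers to six RT motifs


@dataclass
class RTHit:
    """Hypothetical record for one unique RT-like hit."""
    motif_order: List[int]   # motif numbers in the order found along the hit
    frame_shifts: int = 0
    stop_codons: int = 0


def classify(hit):
    """Return the most specific category for a hit.

    unique  : any unique RT signal
    intact  : all six motifs present
    perfect : all six motifs present in order, no frame-shifts or stop codons
    """
    if not set(range(1, N_MOTIFS + 1)) <= set(hit.motif_order):
        return "unique"
    in_order = hit.motif_order == sorted(hit.motif_order)
    if in_order and hit.frame_shifts == 0 and hit.stop_codons == 0:
        return "perfect"
    return "intact"


def tally(hits):
    """Count hits per most-specific category."""
    return Counter(classify(h) for h in hits)
```

Because the categories are nested, the number of Intact hits reported in a table like the one described would be the sum of the `intact` and `perfect` tallies, and Unique would be the total.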
18Classification of 1877 whole LINEs
A total of 165 LINEs appear to be perfect, while
97 contain a single stop codon and 91 a single
frame-shift.
19Distribution of significant BLAST hits per query
sequence.
20Distribution of the 482 Low Frequency Reverse
Transcriptase hits
Distribution of the 482 Low Frequency Reverse
Transcriptase hits with remnants of at least one
motif. Number of Low Frequency hits/Number of
hits with a minimum of one recognizable motif.
Of the 482 hits, 108 have at least one
recognizable RT motif. The remaining 374 hits
have remnants of at least one motif and were
conserved enough to be scored by GPS.
21Chromosomes
[Table: low frequency RT hits per human chromosome (1 to 22, X, and Y) for the query sequences HIV, MPMV, Spuma, and TERT, broken down by the motifs K, D, QG, DD, G-K, and LG; cell entries are counts tagged R or C, for example 29R or 1C]
22Looking at the environment of each Retroid Agent
Truncated LINE inserted into Intron 6
Truncated L1MB1 inserted into Intron 6
Truncated L1PA5 inserted into Intron 8
Truncated LINE inserted into Intron 18
Chromosome 21 contig NT_029490
TPTE Gene
Figure 3 Looking at the environment of each
Retroid Genome. In this example, four truncated
LINEs are found within three different introns
(6, 8, and 18) of a putative Tyrosine Phosphatase
gene (TPTE). Insertions of Retroid genomes into
introns may have little effect on a gene, or may
allow for gene shuffling. In this case none of the
coding region for the gene was disrupted, which
suggests that Retroid sequence information may be
utilized to make introns, that selection favors
insertions that do not disrupt coding capacity, or
that introns provide the preferential target site
for transposition. The black lines
represent the exons of the TPTE gene.
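Deciding whether an insertion lies in an intron or disrupts an exon comes down to comparing coordinate intervals. The sketch below assumes 1-based inclusive coordinates on the contig and a gene annotated on the forward strand, so that intron i lies between exon i and exon i+1. The function name and the coordinates in the example are made up and are not the TPTE annotation.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based, inclusive


def locate_insertion(insertion: Interval, exons: List[Interval]) -> str:
    """Report where an inserted element lies relative to a gene's exons."""
    start, end = insertion
    ordered = sorted(exons)
    # Any overlap with an exon means coding sequence may be disrupted.
    for i, (ex_start, ex_end) in enumerate(ordered, start=1):
        if start <= ex_end and end >= ex_start:
            return f"exon {i}"
    # Otherwise look for the pair of neighbouring exons that flank it.
    for i in range(len(ordered) - 1):
        if ordered[i][1] < start and end < ordered[i + 1][0]:
            return f"intron {i + 1}"
    return "outside gene"


if __name__ == "__main__":
    # Hypothetical three-exon gene and two hypothetical insertions.
    exons = [(100, 200), (500, 600), (900, 1000)]
    for ins in [(250, 400), (550, 700)]:
        print(ins, "->", locate_insertion(ins, exons))
```

For a gene on the reverse strand the exon and intron numbering would need to be reversed before reporting.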
23RepeatMasker Information
24Distribution of Retroid Agents on Human
Chromosomes
(November, 2002 Freeze)
Query: 21 distinct reverse transcriptase sequences representing 18 subgroups were used to query the NCBI's Human Genome Database.
Results:
1) Retroid Agents are not randomly distributed on Human Chromosomes.
2) Chromosomes X and Y have the highest percent Retroid Agent sequence.
3) Of those remaining, Chromosome 4 has the most, while Chromosome 20 comprises the least percent Retroid Agents.
Only two chromosomes, 19 and 21, are without at least one intact and potentially active LINE.
Using exact sequence lengths for each hit of each category indicated in the table of data, the November freeze of the human genome contains at least 1.11% unique RT sequences, 0.40% full-length LINEs, and 0.035% active LINEs.
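The percentages above follow from summing exact hit lengths and dividing by sequence length. A minimal sketch of that arithmetic, per chromosome and genome-wide, is given here; the function names are illustrative, the hits are assumed not to overlap, and the numbers in the example are made up rather than taken from the November 2002 freeze.

```python
from typing import Dict, List


def percent_by_chromosome(hit_lengths: Dict[str, List[int]],
                          chrom_sizes: Dict[str, int]) -> Dict[str, float]:
    """Percent of each chromosome covered by hits (hits assumed non-overlapping)."""
    result = {}
    for chrom, size in chrom_sizes.items():
        if size <= 0:
            raise ValueError(f"invalid size for {chrom}")
        covered = sum(hit_lengths.get(chrom, []))
        result[chrom] = 100.0 * covered / size
    return result


def percent_of_genome(hit_lengths: Dict[str, List[int]],
                      chrom_sizes: Dict[str, int]) -> float:
    """Percent of the whole assembly covered by hits."""
    total = sum(chrom_sizes.values())
    if total <= 0:
        raise ValueError("genome size must be positive")
    covered = sum(sum(hit_lengths.get(c, [])) for c in chrom_sizes)
    return 100.0 * covered / total


if __name__ == "__main__":
    # Made-up lengths in base pairs, for illustration only.
    hits = {"chrA": [1000, 3000], "chrB": [500]}
    sizes = {"chrA": 100_000, "chrB": 50_000}
    per_chrom = percent_by_chromosome(hits, sizes)
    # Rank chromosomes from highest to lowest percent coverage.
    for chrom in sorted(per_chrom, key=per_chrom.get, reverse=True):
        print(chrom, per_chrom[chrom])
    print("genome", percent_of_genome(hits, sizes))
```

Running the same calculation separately for each category (unique RT sequences, full-length LINEs, active LINEs) would give one percentage per category, as in the summary.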
25New hypotheses from discovery-based research
- 1) Low frequency RT-like sequences (not from LINEs or ERVs) are discernible in the Human Genome.
- 2) Human low frequency RT-like sequences are remnants of ancient invasions.
- 3) Human low frequency RT-like sequences are remnants of failed invasions.
- 4) The pattern of low frequency RT-like sequences is unique in each organismal genome.
- 5) Both unique and trans-organismal patterns of low frequency RT-like sequences are found in Eukaryotes.
What mechanisms could be maintaining these signals?
- Gene conversion, an event without a mechanism.
- Transcriptional inactivation due to methylation of CpG regions.
- Translational recoding.
- Complementation.
26The McClure Lab
Rochelle Clinton, B.S., Programmer
Brad Crowther, B.S., Bioinformatician I/Lab
Manager
Hugh Richardson, Ph.D., M.S., Programmer
Travis Danielson, graduate student
Vijay Raghavan, graduate student
Kendall Harwood, Undergraduate
Crystal Hepp, Undergraduate
Aaron Juntunen, Undergraduate programmer
Angela Olson, Undergraduate
Dr. Marcella McClure, P.I. (Marcie)