Computational Virology - PowerPoint PPT Presentation

1
Computational Virology
Lectures in
Bioinformatic Studies on the Evolution, Structure
and Function of RNA-based Life Forms
Marcella A. McClure, Ph.D.
Department of Microbiology and the Center for Computational Biology
Montana State University, Bozeman, MT
mars_at_parvati.msu.montana.edu
2
Summary Lecture II
  • Introduction to Retroid Agents
  • The Genome Parsing Suite
  • Retroid Agents in the Human Genome
  • Discovery-based Hypothesis Generation

3
Retroid Agents
Retroviruses, retrotransposons, pararetroviruses, retroposons, retroplasmids,
retrointrons, and retrons
[Figure: flow of genetic information between DNA, RNA, and protein synthesis (McClure, 2000)]
  • DNA: replication by DNA-dependent DNA polymerase (all cellular systems, most DNA viruses); transcription to RNA
  • RNA: replication by RNA-dependent RNA polymerase (RNA viruses, e.g., Ebola, rabies, influenza, polio); translation; functional RNAs (snRNAs, ribozymes, tRNA, rRNA)
  • RNA to DNA: reverse transcriptase mediated replication or transposition (Retroid Agents)
4
Distribution of Retroid Agents among Eukaryotes
and Eubacteria
5
Variable features of Retroid genomes
6
Gene Maps and Phylogenetic Tree based on 65 RT sequences
[Figure: phylogenetic tree of 65 RT sequences shown beside gene maps of representative Retroid Agents, drawn on a nucleotide scale from 1000 to 4000 (McClure, 2000)]
  • retroviruses: HIV-1
  • orphan class: DIRS-1
  • gypsy-like retrotransposons: 17.6
  • caulimoviruses: CaMV
  • hepadnaviruses: HBV
  • copia-like retrotransposons: Copia
  • retroposons: LIN-H, CIN4, R2Bm, I-FAC, INGI
  • Group II introns: INT-SC1
  • plasmids: MAUP
  • retrons: MX65
  • TERT
Gene map labels: MA, C, NC, PR (aspartic acid protease), RT (reverse transcriptase), RH (ribonuclease H), H-C/IN (integrase)
7
[Figure: conserved motifs of the Retroid Agent proteins]
  • RNA-dependent DNA Polymerase (Reverse Transcriptase): motifs 1 to 6, spanning the fingers, palm, thumb, and connection regions
  • Ribonuclease H: motifs 1 to 4
  • Aspartic Acid Protease: motifs 1 to 3 (DTG, G, ILG)
  • Integrase: motifs 1 to 4, spanning the zinc-binding (H, H, C, C), core (D, D, E), and DNA-binding regions
8
Roles of Retroid Agents
1) Disease
   a) Retroviruses
      1) Exogenous, infectious: HIV, HTLV
      2) Endogenous, associations: breast cancer, testicular tumors,
         insulin-dependent diabetes, multiple sclerosis, rheumatoid arthritis,
         schizophrenia, and systemic lupus erythematosus
   b) LINEs, insertional mutagenesis
      1) Hemophilia A
      2) Muscular dystrophies: Duchenne and Fukuyama-type congenital
      3) X-linked disorders: Alport Syndrome-Diffuse Leiomyomatosis and
         Chronic Granulomatous Disease
2) Regulation of cellular genes and reproduction
3) Telomere maintenance
4) Repair of broken dsDNA
5) Exchange of genetic information among and between organisms
9
Possible function of HERV-W
10
What is the host genomic environment of active Retroid Agents?
[Figure labels: Predicted functional RT; Predicted Retroid genome; Real Contig; Real Chromosome]
What roles do Retroid Agents play in disease, development, reproduction and
evolution throughout the three domains of life?
11
Mapping Genomic Retroid Agents
12
The seven major steps of GPS

13
[Figure repeated from slide 7: conserved motifs of RNA-dependent DNA Polymerase (Reverse Transcriptase), Ribonuclease H, Aspartic Acid Protease, and Integrase]
14
The score of a given motif is calculated by (equation not in transcript)
M, M1 and M2 are based on the number of amino acids in a motif found in common
between a known RT query sequence and the potential RT:
  • M is a count of amino acid identities
  • M1 is a count of conservative substitutions (ILMV, AG, ST, DE, NQ, FY, RK)
  • M2 accounts for older substitutions (LIMV, AGST, DENQ, FYW, RKH)
The overall OSM score is calculated by (equation not in transcript)
T motifs is the number of motifs comprising the OSM
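The counting step described on this slide can be sketched as follows. This is a minimal illustration, not the GPS implementation: the function name `motif_counts` is hypothetical, and the sketch assumes that each aligned position is counted once, in the most stringent class it satisfies (identity, then the M1 groups, then the M2 groups). The scoring equations themselves do not appear in the transcript, so no weights or final score are computed here.

```python
# Substitution groups as listed on the slide.
M1_GROUPS = ["ILMV", "AG", "ST", "DE", "NQ", "FY", "RK"]
M2_GROUPS = ["LIMV", "AGST", "DENQ", "FYW", "RKH"]


def _same_group(a, b, groups):
    """Return True if residues a and b fall in the same substitution group."""
    return any(a in group and b in group for group in groups)


def motif_counts(query_motif, candidate_motif):
    """Count identities (M), M1 substitutions and M2 substitutions.

    Both motifs must be aligned and of equal length. Assumption: each
    position is counted in at most one class, and positions containing a
    non-letter (such as a gap character) are skipped.
    """
    if len(query_motif) != len(candidate_motif):
        raise ValueError("motifs must be aligned to the same length")
    m = m1 = m2 = 0
    for q, c in zip(query_motif.upper(), candidate_motif.upper()):
        if not (q.isalpha() and c.isalpha()):
            continue  # skip gaps and other non-residue symbols
        if q == c:
            m += 1
        elif _same_group(q, c, M1_GROUPS):
            m1 += 1
        elif _same_group(q, c, M2_GROUPS):
            m2 += 1
    return m, m1, m2
```

For example, comparing the hypothetical aligned motifs `"YMDD"` and `"YVDN"` gives two identities (Y, D), one M1 substitution (M/V) and one M2 substitution (D/N).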
15
Status of the Human Genome Project
  • 3,200,000 Kbp of the euchromatic portion of the
    human chromosomes is being sequenced
  • The heterochromatic portion is not being sequenced
  • As of January 5, 2003
  • Non-redundant sequence only
  • 98.8% of the euchromatic portion has been sequenced
  • 3.0% is completed to the working draft level
  • 95.8% has been completed to 99% accuracy
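As a quick consistency check on the figures above, the sketch below converts the reported fractions into Kbp. It assumes the three figures are percentages of the 3,200,000 Kbp euchromatic total; the constant and function names are illustrative only. Integer per-mille values are used to avoid floating-point rounding.

```python
EUCHROMATIC_KBP = 3_200_000   # euchromatic total from the slide
DRAFT_PER_MILLE = 30          # 3.0 percent at working draft level
FINISHED_PER_MILLE = 958      # 95.8 percent at the finished level


def sequenced_kbp(per_mille, total_kbp=EUCHROMATIC_KBP):
    """Convert a per-mille fraction of the euchromatic genome into Kbp."""
    return total_kbp * per_mille // 1000


# Draft plus finished should equal the 98.8 percent reported as sequenced.
total_per_mille = DRAFT_PER_MILLE + FINISHED_PER_MILLE

print(sequenced_kbp(DRAFT_PER_MILLE))      # Kbp at working draft level
print(sequenced_kbp(FINISHED_PER_MILLE))   # Kbp at finished level
print(sequenced_kbp(total_per_mille))      # Kbp sequenced in total
```

Under that assumption the draft and finished portions sum to the reported total, which supports reading the three numbers as percentages.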

16
(No Transcript)
17
Distribution of Significant BLAST hits
Distribution per chromosome of RT-like sequences in the April 2003 Freeze of
the human genome.
  • Unique indicates all unique RT signals.
  • Intact includes all RTs with six motifs present.
  • Perfect indicates the number of RTs with all six motifs present in order
    with no frame-shifts or stop codons.
  • Full LINE indicates LINEs that are full length, but may or may not have
    stop codons and/or frame-shifts.
  • Perfect LINE indicates LINEs that are full length and contain no
    frame-shifts or stop codons.
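The RT categories in this caption can be expressed as a small classifier. This is a sketch under stated assumptions, not the lab's code: the `RTHit` record and its fields are hypothetical, and the function returns only the most specific label, whereas in the slide's table every Perfect RT is also Intact and every Intact RT is also Unique.

```python
from dataclasses import dataclass

ALL_MOTIFS = (1, 2, 3, 4, 5, 6)  # the six RT motifs


@dataclass
class RTHit:
    """Hypothetical record for one unique RT-like hit."""
    motifs_found: tuple   # motif numbers in the order they occur in the hit
    frame_shifts: int     # number of frame-shifts in the RT region
    stop_codons: int      # number of stop codons in the RT region


def classify_rt(hit):
    """Return the most specific category: 'Perfect', 'Intact' or 'Unique'."""
    if not set(ALL_MOTIFS) <= set(hit.motifs_found):
        return "Unique"   # fewer than six motifs recognized
    in_order = tuple(hit.motifs_found) == ALL_MOTIFS
    if in_order and hit.frame_shifts == 0 and hit.stop_codons == 0:
        return "Perfect"  # six motifs, in order, open reading frame
    return "Intact"       # six motifs present, but disrupted or out of order
```

The same pattern would extend to the Full LINE and Perfect LINE categories given a full-length flag, which is omitted here because the transcript does not say how full length is determined.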
18
Classification of 1877 whole LINEs
A total of 165 LINEs appear to be perfect, while
97 contain a single stop codon and 91 a single
frame-shift.
19
Distribution of significant BLAST hits per query
sequence.
20
Distribution of the 482 Low Frequency Reverse
Transcriptase hits
Distribution of the 482 Low Frequency Reverse Transcriptase hits with remnants
of at least one motif. Table entries give the number of Low Frequency hits /
the number of hits with a minimum of one recognizable motif.
Of the 482 hits, 108 have at least one recognizable RT motif. The remaining
374 hits have remnants of at least one motif and were conserved enough to be
scored by GPS.
21
[Table: hits per chromosome (1 to 22, X, Y) for the HIV, MPMV, Spuma, and TERT query sequences, broken down by motif (K, D, QG, DD, G-K, LG). Cell entries such as 1C, 29R, and (1)C lost their column alignment in the transcript.]
22
Looking at the environment of each Retroid Agent
Truncated LINE inserted into Intron 6
Truncated L1MB1 inserted into Intron 6
Truncated L1PA5 inserted into Intron 8
Truncated LINE inserted into Intron 18

Chromosome 21 contig NT_029490
TPTE Gene
Figure 3. Looking at the environment of each Retroid Genome. In this example,
four truncated LINEs are found within three different introns (6, 8, and 18)
of a putative Tyrosine Phosphatase gene (TPTE). Insertions of Retroid genomes
into introns may have little effect on a gene, or may allow for gene
shuffling. In this case none of the coding region of the gene was disrupted,
which demonstrates that Retroid sequence information may be utilized to make
introns, that selection favors insertions that do not disrupt coding
capacity, or that introns may provide the preferential target site for
transposition. The black lines represent the exons of the TPTE gene.
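Deciding whether an insertion lies in an exon or an intron amounts to an interval lookup against the gene's exon coordinates. The sketch below shows one way to do this with a binary search. The function name `locate` and the coordinates in the usage note are made up for illustration; they are not the TPTE annotation.

```python
from bisect import bisect_right


def locate(position, exons):
    """Report where a position falls relative to a gene's exons.

    exons: list of (start, end) tuples, inclusive, sorted by start and
    non-overlapping. Exons and introns are numbered from 1 in list order.
    """
    starts = [start for start, _ in exons]
    i = bisect_right(starts, position) - 1   # last exon starting at or before position
    if i >= 0 and exons[i][0] <= position <= exons[i][1]:
        return f"exon {i + 1}"
    if 0 <= i < len(exons) - 1:
        return f"intron {i + 1}"             # between exon i+1 and exon i+2
    return "outside gene"
```

With hypothetical exons `[(100, 200), (500, 600), (900, 1000)]`, a position of 300 is reported as intron 1 and a position of 550 as exon 2. Checking both ends of each LINE in this way would confirm that an insertion stays within a single intron.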
23
RepeatMasker Information
24
Distribution of Retroid Agents on Human Chromosomes
(November, 2002 Freeze)
Query: 21 distinct reverse transcriptase sequences representing 18 subgroups
were used to query the NCBI's Human Genome Database.
Results:
1) Retroid Agents are not randomly distributed on Human Chromosomes.
2) Chromosomes X and Y have the highest percent Retroid Agent sequence.
3) Of those remaining, Chromosome 4 has the most, while Chromosome 20
   comprises the least percent Retroid Agents.
Only two chromosomes, 19 and 21, are without at least one intact and
potentially active LINE.
Using exact sequence lengths for each hit of each category indicated in the
table of data, the November freeze of the human genome contains at least
1.11% unique RT sequences, 0.40% full-length LINEs and 0.035% active LINEs.
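The genome fractions quoted above follow from summing the exact lengths of the hits in a category and dividing by the length of the sequence searched. A minimal sketch of that arithmetic is shown below. The function name and the example numbers are hypothetical and are not taken from the table of data.

```python
def percent_of_genome(hit_lengths_bp, genome_length_bp):
    """Percent of a genome (or chromosome) covered by a set of hits.

    hit_lengths_bp: iterable of hit lengths in base pairs; hits are assumed
    not to overlap, so their lengths can simply be summed.
    """
    if genome_length_bp <= 0:
        raise ValueError("genome length must be positive")
    return 100.0 * sum(hit_lengths_bp) / genome_length_bp
```

For instance, two hits of 6,000 bp and 4,000 bp on a 1,000,000 bp sequence cover 1.0 percent of it. Applying the same calculation per chromosome gives the percent Retroid Agent sequence used to compare chromosomes.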
25
New hypotheses from discovery-based research
  • 1) Low frequency RT-like sequences (not from
    LINEs or ERVs) are discernible in the Human
    Genome.
  • 2) Human low frequency RT-like sequences are
    remnants of ancient invasions.
  • 3) Human low frequency RT-like sequences are
    remnants of failed invasions.
  • 4) The pattern of low frequency RT-like sequences
    is unique in each organismal genome.
  • 5) Both unique and trans-organismal patterns of
    low frequency RT-like sequences are found in
    Eukaryotes.

What mechanisms could be maintaining these
signals?
  • Gene conversion, an event without a mechanism.
  • Transcriptional inactivation due to methylation
    of CpG regions.
  • Translational recoding.
  • Complementation.

26
The McClure Lab
Rochelle Clinton, B.S., Programmer
Brad Crowther, B.S., Bioinformatician I/Lab Manager
Hugh Richardson, Ph.D., M.S., Programmer
Travis Danielson, graduate student
Vijay Raghavan, graduate student
Kendall Harwood, Undergraduate
Crystal Hepp, Undergraduate
Aaron Juntunen, Undergraduate programmer
Angela Olson, Undergraduate

Dr. Marcella McClure, P.I. (Marcie)