What is functional genomics - PowerPoint PPT Presentation

Slides: 37
Provided by: Biol2
1
What is functional genomics?
2
Functional genomics is the determination of
genome function by whatever means necessary.
This area is described as 'the development and
application of global (genome-wide or
system-wide) experimental approaches to assess
gene function by making use of the information
and reagents provided by genome sequencing and
mapping' (Hieter and Boguski, Science, 1997).
3
To some it is only the use of microarray
technology.
4
To others it is the integration of an immense
amount of heterogeneous data to understand
the complexity of life.
5
To others it is a strategy of using wet and in
silico tools to address specific questions in a
given organism.
6
Bioinformatics and Functional Genomics, Jonathan Pevsner
Functional Genomics:
Chapter 6 Gene Expression
Chapter 7 Microarray Data Analysis
Chapter 8 Protein Analysis and Proteomics
Chapter 9 Protein Structure
Chapter 10 Multiple Sequence Alignment
Chapter 11 Molecular Phylogeny and Evolution
7
What is functional genomics?
One component is the prediction of protein
function, interaction and structure from
analyzing primary sequence information.
8
These studies in functional genomics reveal
duplication, divergence, rearrangement and
structure prediction all based on primary
sequence analysis.
McClure, M.A. (2001) "Evolution of the DUT Gene:
Horizontal Transfer between Host and Pathogen in
all Three Domains of Life," Current Protein and
Peptide Science, 2, 313-324.
Baldo, A.M. and McClure, M.A. (1999) "Evolutionary
History of the dUTPase Enzyme: Horizontal Exchange
between Viral Pathogens and Their Hosts," Journal
of Virology, 73, 7710-7721.
9
dUTPase reaction (diagram): dUTP → dUMP + 2 Pi
  • What is dUTPase?
  • Deoxyuridine triphosphatase converts dUTP into
    dUMP.
  • Why is dUTPase important?
  • Excessive amounts of dUTP cause the incorporation
    of uracil into growing DNA chains.
  • All organisms examined to date possess a
    uracil-DNA glycosylase (UNG) to excise uracil
    misincorporated into DNA.
  • In the presence of a large dUTP pool UNG repair
    causes DNA strand breakage, which can lead to
    cell death.

10
(No Transcript)
11
Questions regarding dUTPase
  • What are the relationships among, and the
    distribution of, dUTPases in viruses,
    eukaryotes, eubacteria, and archaea?
  • Is the genome location of dUTPase conserved?
  • What kinds of motif arrangements exist?
  • How might these arrangements assemble?

12
Three observed types of Motif Arrangements
  • Single gene copy: one ordered series of motifs,
    five conserved motifs (amino to carboxyl):
    I II III IV V
  • Tandem duplication: two ordered series of motifs,
    five conserved motifs (amino, middle, carboxyl):
    I II ? IV V
    ? ? III ? -
  • Tandem triplication: three ordered series of
    motifs, 15 conserved motifs:
    I II III IV V
    I II III IV V
    I II III IV V
13
Representative Sample of dUTPase HMM Alignment
Legend: Eukaryotes, Eubacteria, Archaea,
Retroviruses, Herpesviruses, Other DNA Viruses
14
Figure: position of dUTPase (dUTP) relative to neighboring genes
  • Retroviruses (gag gene, pol gene): HTLV-like
    viruses, RSV, other retroviruses, MMTV-like
    viruses, non-primate lentiviruses, HIV;
    neighboring domains PR, RT, RH, IN
  • Herpesviruses (alpha, gamma): dUTP, RRRP, PMASE
  • Poxviruses (Vaccinia, Suid Pox): dUTP, TIF, RR,
    UH, and unlabeled neighbors (?)
  • Bacteria (E. coli, P. aeruginosa, C. burnetii):
    dUTP, DSF, PPMM, PIB, and unlabeled neighbors (?)
15
Additive Distance Tree for 90 dUTPase sequences
Legend: Eukaryotes, Eubacteria, Archaea,
Retroviruses, Herpesviruses, Other DNA Viruses
Taxon labels shown: HSAP, ORFV, VARI, CPOX, VACW,
VACL, SPOX, AVAD, CELA, CELM, CELC, LESC, CALB,
SCER, PBCV, BPT5, CTRA, CDIF, MTUB, BPRT, BPSP,
MJAN, SIRV, DAMB, IHER, SHER, MIAP, JSRV, SARV,
HER1, MMTC, ASFV, ONPH, HH8A, AH1A, EH4A, SH1A
16
(No Transcript)
17
Assembly and Active Site location in H. sapiens
dUTPase
Mol, C. D. et al. (1996) Structure 4(9),
1077-1092
18
Hypothesized dUTPase folding and assembly
H. sapiens (single)
Herpes (double)
C. elegans (triple)
19
Summary
  • dUTPase is found in:
  • 8 virus families (retroviruses and DNA viruses)
  • Eukaryotes (multicellular and unicellular)
  • Eubacteria (Proteobacteria, Firmicutes,
    Chlamydiales, Spirochaetales)
  • Archaea
  • Three kinds of motif arrangements exist
    (single, double, and triple).
  • It is hypothesized that these arrangements
    assemble as follows:
  • Alpha and gamma herpesvirus dUTPases (double) as
    trimers
  • C. elegans dUTPases (triple) as monomers
  • The genome location of dUTPase is not conserved
    in viruses and eubacteria.
  • 6 virus groups have dUTPases similar to those of
    their hosts.

20
What is functional genomics?
Basically it is the prediction of protein
function and interaction from analyzing sequence
information. How can we know anything about
protein interactions without protein structure?
21
New work
A Functional Genomics Approach to Inferring Amino
Acid Contacts Among the L, P and N Proteins of
the Replication/Transcription Complex of the
Order Mononegavirales
1) Protein disorder
  • Low hydrophobicity and high mean net charge are
    good indicators of natively unfolded proteins.
  • Predictor of Naturally Disordered Regions
    (PONDR) uses neural networks to distinguish
    disordered from ordered regions.
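
The first indicator above (low hydrophobicity, high mean net charge) can be turned into a quick calculation. The sketch below is not PONDR, which relies on neural networks. It assumes the Kyte-Doolittle hydropathy scale rescaled to 0-1 and a linear charge-hydropathy boundary, R = 2.785 H - 1.151, as commonly attributed to Uversky and colleagues; both the boundary and the function names are assumptions made for illustration.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(seq):
    """Mean hydropathy H, with each residue rescaled from [-4.5, 4.5] to [0, 1]."""
    values = [(KD[aa] + 4.5) / 9.0 for aa in seq.upper() if aa in KD]
    if not values:
        raise ValueError("sequence contains no standard residues")
    return sum(values) / len(values)


def mean_net_charge(seq):
    """Mean net charge R: |(K + R) - (D + E)| divided by sequence length."""
    seq = seq.upper()
    positive = seq.count("K") + seq.count("R")
    negative = seq.count("D") + seq.count("E")
    return abs(positive - negative) / len(seq)


def looks_natively_unfolded(seq):
    """True if the sequence lies above the assumed boundary R = 2.785*H - 1.151."""
    return mean_net_charge(seq) > 2.785 * mean_hydropathy(seq) - 1.151


if __name__ == "__main__":
    for name, seq in [("acidic", "EEEEKEEEDDEEESSEEE"),
                      ("hydrophobic", "LLVVIIAALLVVIIAA")]:
        print(name, round(mean_hydropathy(seq), 3),
              round(mean_net_charge(seq), 3), looks_natively_unfolded(seq))
```

A whole-sequence call like this only flags proteins that are unfolded overall; locating disordered regions within a protein would need a windowed version or a dedicated predictor.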

2) Evolutionary dynamics approaches
A) Intermolecular compensatory mutations (Pazos
and Valencia): 1) predicting interacting partners;
2) detecting correlated mutations between two
interacting proteins; 3) extending to three
interacting partners
B) Evolutionary-Structure-Function (ESF) analysis
(Simon and Sidow): determines the number of amino
acid replacements given a fixed phylogenetic
topology, ranking constrained regions
C) Intramolecular compensatory mutations
(Pollock): calculates likelihood estimates,
allowing for rate variation, and robustly
discriminates coevolution of intramolecular sites
from random effects
3) Use experimental results to model and validate
expectations
4) Test the predicted structure for the Ebola
22
VSV Transcription and Replication (figure)
Labels: leader, N, P, L, 5', 3', read through,
co-assembly
23
Rhabdoviridae Genome
VSV
Paramyxoviridae Genome
Sendai
24
Heterogeneous Data to Infer Protein-Protein
Contacts
Inputs: multiple alignment of N, P and L
sequences; ALL experimental information regarding
positions of functions and interactions of L, N
and P
Predict regions of disorder: PONDR; calculate H/R
Evolutionary dynamics analysis: phylogenetic
reconstruction; inter-CM analysis; ESF analysis;
intra-CM analysis
Integration of heterogeneous data sources in a
Bayesian framework
Most probable amino acid contact points
25
N, P and L Proteins
required for replication
Domain maps (figure):
N protein: Sendai residues 1-524, VSV residues
1-422; labeled sites RNA-BS, PPBS, PCS
P protein: Sendai residues 1-568, VSV residues
1-265; labeled sites oligomerization domain, NPBS,
LPBS, RSR, RES, GTP binding
L protein: Sendai residues 1-2228, VSV residues
1-2109; conserved domains I-VI; labeled sites RSR,
PPBS, MT, RNA-BS
26
MTase of Ebola virus
27
VSV Transcription and Replication (figure, as on
slide 22)
28
Can data like these help infer L and P regions of
potential contact?
29
Can data like these help infer N and P potential
regions of contact?
30
Analysis of Evolutionary Dynamics
1) Evolutionary-Structure-Function (ESF) analysis
provides likelihood estimates of rates of change
within a protein. The approach relies on
maintaining the phylogenetic tree topology while
calculating the number of amino acid replacements
in a fixed-size window moved along the entire
multiple sequence alignment. Plotting these
values as a function of position provides a rate
profile for the protein. A heuristic algorithm
identifies and ranks the evolutionarily
constrained regions of the sequence, thereby
providing a relative measure of the importance of
each region. This approach has been demonstrated
to be consistent with known experimental data for
a number of proteins (Simon, A. L., Stone, E. A.
and Sidow, A. (2002). Inference of functional
regions in proteins by quantification of
evolutionary constraints. Proc Natl Acad Sci U S
A 99 (5), 2912-2917).
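
The windowed rate profile described above can be sketched in a few lines. This is a deliberate simplification and not the ESF method itself: instead of counting replacements on a fixed phylogenetic tree, it uses the number of distinct residues in a column minus one, which is only a lower bound on the replacements any tree would require. The function names and the toy alignment are hypothetical.

```python
def column_replacements(alignment):
    """Lower bound on replacements per column: distinct residues minus one.

    Gap characters ('-') are ignored. A real ESF analysis would count
    replacements along the branches of a fixed tree instead.
    """
    counts = []
    for column in zip(*alignment):
        residues = {c for c in column if c != "-"}
        counts.append(max(len(residues) - 1, 0))
    return counts


def rate_profile(alignment, window=3):
    """Mean replacement count in each fixed-size window along the alignment."""
    per_column = column_replacements(alignment)
    return [
        sum(per_column[i:i + window]) / window
        for i in range(len(per_column) - window + 1)
    ]


def rank_constrained(profile):
    """Window start positions ordered from most to least constrained."""
    return sorted(range(len(profile)), key=lambda i: profile[i])


if __name__ == "__main__":
    toy_alignment = ["MKVLAG", "MKVIAG", "MKVLSG", "MRVLTG"]
    profile = rate_profile(toy_alignment, window=3)
    print(profile)
    print(rank_constrained(profile))
```

Low values in the profile mark windows with few replacements, which is the sense in which a region is called evolutionarily constrained.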

2) Predicting intramolecular compensatory
mutations also uses likelihood estimates, and
does not rely on the accuracy of inferring
ancestral nodes in phylogenetic reconstruction.
This approach also allows for variation in the
rate of evolution along tree branches. Basically,
this method can robustly discriminate coevolution
of intramolecular sites from correlations due to
random effects (Pollock, D. D., Taylor, W. R. and
Goldman, N. (1999). Coevolving protein residues:
maximum likelihood identification and
relationship to structure. Journal of Molecular
Biology 287 (1), 187-198).
31
What about predicting intermolecular compensatory
mutations?
Two different methods exist to predict
protein-protein interactions from intermolecular
compensatory mutation analysis. The first
predicts the interacting partners (Pazos, F.,
Helmer-Citterich, M., Ausiello, G. and Valencia,
A. (1997). Correlated mutations contain
information about protein-protein interaction.
J. Mol. Biol. 271, 511-523), while the other
determines the actual interaction sites via
compensatory mutation analysis (Pazos et al.,
1997; Ouzounis, C., Perez-Iratxeta, C., Sander,
C. and Valencia, A. (1998). Are binding residues
conserved? Pac Symp Biocomput, 401-412).
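
As a rough illustration of what a correlated-mutation score between two proteins looks like, the sketch below compares one alignment column from protein A with one from protein B. It assumes both alignments list the same species in the same order, and it scores residue similarity by simple identity (1 if equal, 0 otherwise) rather than a substitution matrix; these choices and all function names are assumptions made for illustration, not the published procedure.

```python
from itertools import combinations
from math import sqrt


def pair_similarity(column):
    """Similarity of the residues at one position for every pair of sequences."""
    return [1.0 if a == b else 0.0 for a, b in combinations(column, 2)]


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a fully conserved column carries no covariation signal
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sqrt(sxx * syy)


def inter_correlations(aln_a, aln_b):
    """Matrix of correlations: rows are positions in A, columns positions in B."""
    cols_a = [pair_similarity(c) for c in zip(*aln_a)]
    cols_b = [pair_similarity(c) for c in zip(*aln_b)]
    return [[pearson(a, b) for b in cols_b] for a in cols_a]


if __name__ == "__main__":
    # Toy data: position 2 of A (K/E) changes in step with position 1 of B (D/R).
    protein_a = ["AK", "AK", "AE", "AE"]
    protein_b = ["DL", "DL", "RL", "RL"]
    for row in inter_correlations(protein_a, protein_b):
        print([round(v, 2) for v in row])
```

High-scoring position pairs are candidates for contact sites; with real data the scores would need to be judged against a background distribution, since shared phylogeny alone produces correlations.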
How can we use these methods?
32
What is a Bayesian Inference Network?
What is Bayesian Inference?
Bayesian inference dates back to the 18th century
(Bayes' essay was published in 1763).
Bayesian inference has always been controversial.
33
Bayesian inference is a different way of thinking
about probability: it is a subjective
interpretation of probability. When the
probability of an occurrence is unknown, an
opinion about the unknown can be expressed as a
prior probability. What is a prior probability?
It is the probability distribution that expresses
the belief an observer holds before seeing any
data. After observing data, one can revise the
values assigned in the prior probability. This
new probability distribution, called the
posterior distribution, is calculated by Bayes'
rule. All of the observer's knowledge about the
prior distribution is contained in the posterior
distribution, and statistical inferences are made
by summarizing this distribution. Bayes' rule
turns prior probabilities into posterior
probabilities. Posterior probabilities
incorporate what was observed in the data.
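
A small worked example may make the prior-to-posterior step concrete. The sketch below applies Bayes' rule to two competing hypotheses; the hypothesis names and all numbers are invented for illustration and do not come from the presentation.

```python
def posterior(prior, likelihood):
    """Apply Bayes' rule to a discrete set of hypotheses.

    prior:      dict mapping hypothesis -> P(hypothesis)
    likelihood: dict mapping hypothesis -> P(observed data | hypothesis)
    """
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())  # P(observed data)
    return {h: value / evidence for h, value in unnormalized.items()}


if __name__ == "__main__":
    # Hypothetical belief before seeing data: contacts are rare.
    prior = {"contact": 0.1, "no_contact": 0.9}
    # Hypothetical chance of seeing a correlated-mutation signal under each hypothesis.
    likelihood = {"contact": 0.8, "no_contact": 0.2}
    post = posterior(prior, likelihood)
    print(round(post["contact"], 3))  # 0.08 / 0.26, about 0.308
```

The data raise the belief in a contact from 0.1 to about 0.31, but the result still depends on the prior that was chosen.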
So what is so controversial about Bayesian
inference?
34
There is no agreement on how much weight should
be placed on beliefs and opinions about unknown
events. Furthermore, there is the issue of
whether a prior probability on an unknown event
can even exist. This is a philosophical
question, not a scientific one.
35
The Bayesian approach to heterogeneous data
integration tries to fit the data to a model
using a prior distribution of the values of what
is believed to be good data.
The Simple Network (figure)
Nodes: sequence-based experiments; xy contact in
virus 1
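
One simple way to picture the integration of several evidence sources into a single contact probability is to treat each source as a likelihood ratio and add up log-odds. The sketch below makes the strong assumption that the sources are conditionally independent (a naive Bayes simplification, not the network shown on the slides); the source names and numbers are hypothetical.

```python
from math import exp, log


def combine_evidence(prior_contact, likelihood_ratios):
    """Posterior probability of a contact after combining independent evidence.

    prior_contact:     prior probability that residues x and y are in contact
    likelihood_ratios: for each source, P(evidence | contact) / P(evidence | no contact)
    """
    log_odds = log(prior_contact / (1.0 - prior_contact))
    for ratio in likelihood_ratios:
        log_odds += log(ratio)  # independent sources add in log-odds space
    return 1.0 / (1.0 + exp(-log_odds))


if __name__ == "__main__":
    # Hypothetical likelihood ratios for three kinds of sequence-based evidence.
    evidence = {"inter_cm": 4.0, "esf_constraint": 2.0, "disorder": 0.5}
    print(round(combine_evidence(0.1, list(evidence.values())), 3))
```

A ratio above 1 pushes the posterior toward a contact and a ratio below 1 pushes it away; a full Bayesian network would additionally model dependencies between the sources.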
36
The Complexity of Two Viruses in the Network
(figure)
Nodes: sequence-based experiments; replication and
transcription (each shown twice); xy contact in
virus type 1; xy contact in virus type 2