Title: Day 7: Using genomics to predict new pathways
- Genome sequences allow us to interpret the function of proteins within the context in which they occur
- Reverse this process: predict the function of a protein from the context in which it tends to occur → prediction of protein function/pathways from genome sequences
What on earth does the 2-ketoglutarate:ferredoxin oxidoreductase do in P. abyssi when there are no connecting enzymes of the citric acid cycle?
2-Ketoglutarate is likely derived from glutamate
Succinyl-CoA can be broken down via methylmalonyl-CoA
Instead of interpreting, actually predicting protein function using genomic association
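[Figure: deoxyribonucleoside salvage pathway. Cdd converts deoxycytidine to deoxyuridine; DeoA (deoxyuridine, deoxythymidine) and DeoD (purine deoxyribonucleosides) yield deoxyribose-1-P; deoB (marked "?") converts deoxyribose-1-P to deoxyribose-5-P; deoC splits deoxyribose-5-P into glyceraldehyde-3-P and acetaldehyde. Gene order compared between M. genitalium and M. tuberculosis: deoD, deoC, deoA, cdd, pmm.]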
- The prediction that the pmm gene encodes a protein that (also) functions as a phosphoribomutase is based on:
  - its genomic association (operon) with genes involved in the nucleoside salvage pathway;
  - the conservation of this association among distantly related species;
  - substrate specificity being less conserved than catalytic function → what is conserved is the mutase function, what is altered is the substrate specificity (from mannose/glucose to ribose);
  - a phosphoribomutase being required, yet otherwise absent from the genome.
- Such predictions of course have to be confirmed by experimental research.
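The lines of evidence above can be combined mechanically: find a pathway step whose activity is missing from the genome, then look in the pathway's operon for a gene whose homology-based annotation has the right catalytic class but a different substrate. A minimal Python sketch; the annotations, the class mapping and the function name `candidates_for_gaps` are hypothetical toy inputs built around the gene names on the slides, not the procedure used in the original analysis.

```python
# Hypothetical toy data modelled on the deoxyribonucleoside salvage example.
# Required activity for each pathway step.
PATHWAY_STEPS = {
    "dR-1-P -> dR-5-P": "phosphopentomutase",
    "dR-5-P -> G3P + acetaldehyde": "deoxyribose-phosphate aldolase",
    "dThd -> dR-1-P": "pyrimidine nucleoside phosphorylase",
}

# Homology-based annotation: gene -> (specific activity, broad catalytic class).
ANNOTATION = {
    "deoC": ("deoxyribose-phosphate aldolase", "aldolase"),
    "deoA": ("pyrimidine nucleoside phosphorylase", "phosphorylase"),
    "deoD": ("purine nucleoside phosphorylase", "phosphorylase"),
    "cdd": ("cytidine deaminase", "deaminase"),
    "pmm": ("phosphomannomutase", "mutase"),
}

# Broad catalytic class of each required activity (an assumption).
ACTIVITY_CLASS = {
    "phosphopentomutase": "mutase",
    "deoxyribose-phosphate aldolase": "aldolase",
    "pyrimidine nucleoside phosphorylase": "phosphorylase",
}

OPERON = ["deoD", "deoC", "deoA", "cdd", "pmm"]


def candidates_for_gaps(steps, annotation, activity_class, operon):
    """Return {missing activity: [operon genes of the same catalytic class]}."""
    present = {activity for activity, _ in annotation.values()}
    result = {}
    for activity in steps.values():
        if activity in present:
            continue  # step already covered by a specific annotation
        wanted_class = activity_class[activity]
        # Keep the catalytic class, allow the substrate specificity to differ.
        result[activity] = [
            gene for gene in operon
            if annotation.get(gene, (None, None))[1] == wanted_class
        ]
    return result


gaps = candidates_for_gaps(PATHWAY_STEPS, ANNOTATION, ACTIVITY_CLASS, OPERON)
print(gaps)  # {'phosphopentomutase': ['pmm']}
```

Matching on catalytic class rather than exact activity encodes the argument that substrate specificity is less conserved than catalytic function.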
Defining "distantly related" species: remember the rapid shuffling of genomes (relative to 16S rRNA identity)
Variation in genome rearrangements depends on the relative direction of transcription of neighboring genes → hints at the operon organization of genes in prokaryotes
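Only genes transcribed in the same direction can belong to one prokaryotic operon, so a natural first step is to classify each pair of adjacent genes by relative orientation. A minimal Python sketch; the gene layout and the function names are invented for illustration.

```python
from typing import List, Tuple

# A gene on the chromosome: (name, strand), listed in chromosomal order.
Gene = Tuple[str, str]  # strand is "+" or "-"


def pair_orientation(left: Gene, right: Gene) -> str:
    """Classify two adjacent genes by their relative direction of transcription."""
    s1, s2 = left[1], right[1]
    if s1 == s2:
        return "co-directional"  # -> -> or <- <-: can share one operon
    if s1 == "-" and s2 == "+":
        return "divergent"       # <- ->: transcribed away from each other
    return "convergent"          # -> <-: transcribed towards each other


def classify_neighbors(genes: List[Gene]) -> List[Tuple[str, str, str]]:
    """Return (left, right, orientation) for every adjacent pair."""
    return [(a[0], b[0], pair_orientation(a, b)) for a, b in zip(genes, genes[1:])]


# Hypothetical gene layout (names and strands invented).
layout = [("geneA", "+"), ("geneB", "+"), ("geneC", "-"), ("geneD", "-"), ("geneE", "+")]
for row in classify_neighbors(layout):
    print(row)
```

Comparing how often each orientation class survives rearrangement between genomes is what points to co-directional pairs as operon candidates.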
Apart from the theoretical argument for proteins that are not only encoded in the same operon but whose operon organization is also conserved in evolution, we also need experimental benchmarks (compare benchmarking homology inferred from protein sequence similarity against protein structure).
Dandekar, Snel, Huynen and Bork, TIBS 1998. Conservation of gene order: a fingerprint of proteins that physically interact.
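The basic quantity behind conserved gene order is the number of genomes in which two genes are direct neighbors. A minimal Python sketch, assuming each genome is given as an ordered list of orthologous-group labels; strand and the circularity of the chromosome are ignored for simplicity, and the toy gene orders are invented.

```python
from typing import Dict, FrozenSet, List, Set


def adjacent_pairs(order: List[str]) -> Set[FrozenSet[str]]:
    """Unordered pairs of gene labels that are direct neighbors in one genome."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def conserved_pairs(genomes: Dict[str, List[str]], min_genomes: int = 2) -> Dict[FrozenSet[str], int]:
    """Count in how many genomes each neighboring pair occurs; keep the recurrent ones."""
    counts: Dict[FrozenSet[str], int] = {}
    for order in genomes.values():
        for pair in adjacent_pairs(order):  # a set, so each genome counts once per pair
            counts[pair] = counts.get(pair, 0) + 1
    return {pair: n for pair, n in counts.items() if n >= min_genomes}


# Invented toy gene orders (not real genome data).
toy_genomes = {
    "genome1": ["trpE", "trpG", "trpD", "trpC", "trpF", "trpB", "trpA"],
    "genome2": ["trpB", "trpA", "xyz1", "trpE", "trpG"],
    "genome3": ["abc1", "trpB", "trpA", "trpD"],
}

for pair, n in conserved_pairs(toy_genomes).items():
    print(sorted(pair), n)
```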
Benchmarking
Conservation of the tryptophan synthesis operon among the compared genomes
Types of Genomic Association for the Prediction
of Functional Interaction
- I gene fusion/fission
- II conservation of gene order (operons)
- III co-occurrence of genes in genomes
- IV shared regulatory elements
- V coexpression data
Gene fission in the evolution of carbamoyl phosphate synthase B (carB)
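Fusion/fission is usually detected when two proteins that are separate in one genome align to different, largely non-overlapping regions of a single protein in another genome. A minimal Python sketch, assuming the alignment coordinates have already been computed by a sequence-similarity search; the `Hit` record, the 20-residue overlap tolerance and the protein names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Hit:
    """A hypothetical similarity hit of a query protein onto a target protein."""
    query: str   # protein in genome A (e.g. a separate subunit)
    target: str  # protein in genome B (candidate fusion)
    start: int   # aligned region on the target, 1-based inclusive
    end: int


def fusion_links(hits: List[Hit], max_overlap: int = 20) -> List[Tuple[str, str, str]]:
    """Return (query1, query2, fused_target) when two different queries hit
    largely non-overlapping regions of the same target protein."""
    by_target: Dict[str, List[Hit]] = {}
    for h in hits:
        by_target.setdefault(h.target, []).append(h)
    links = []
    for target, group in by_target.items():
        for i in range(len(group)):
            for j in range(i + 1, len(group)):
                a, b = group[i], group[j]
                if a.query == b.query:
                    continue
                # Negative values mean a gap between the two aligned regions.
                overlap = min(a.end, b.end) - max(a.start, b.start) + 1
                if overlap <= max_overlap:
                    links.append((a.query, b.query, target))
    return links


hits = [
    Hit("protA", "fusedAB", 1, 540),
    Hit("protB", "fusedAB", 551, 1073),
    Hit("protC", "otherY", 1, 300),
]
print(fusion_links(hits))  # [('protA', 'protB', 'fusedAB')]
```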
Predicting functional interactions between proteins by the co-occurrence of their genes in genomes
Distribution of four M. genitalium genes among 25 genomes (1 = present, 0 = absent):
MG299 (pta):   0 0 0 1 1 0 0 0 0 1 1 0 1 0 1 1 0 0 0 1 0 1 1 1 1
MG357 (ackA):  0 0 0 1 1 0 0 0 0 1 1 0 1 0 1 1 0 0 0 1 0 1 1 1 1
MG019 (dnaJ):  0 0 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 0 0 1 1 1 1 1 1
MG305 (dnaK):  0 0 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 0 0 1 1 1 1 1 1
Using the mutual information between genes as a scoring heuristic for their co-occurrence:
- M(pta, ackA) = 0.69 (phosphotransacetylase, acetate kinase)
- M(dnaJ, dnaK) = 0.55 (heat shock proteins)
- M(dnaJ, ackA) = 0.19
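Entropy and mutual information
$$H(i) = -\sum_{x \in \{0,1\}} P_i(x) \log P_i(x)$$
$$H(j) = -\sum_{y \in \{0,1\}} P_j(y) \log P_j(y)$$
$$H(i,j) = -\sum_{x,y \in \{0,1\}} P_{ij}(x,y) \log P_{ij}(x,y)$$
where x (y) is 1 if gene i (j) is present in a genome and 0 if it is absent.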
Entropy (H) measures the disorder of the system: it is maximal when all states occur with equal frequency and minimal when one state dominates the distribution. For the distribution of genes, it is maximal when a gene occurs in 50% of the genomes.
$$M(i,j) = H(i) + H(j) - H(i,j)$$
Mutual information (M) is the sum of the individual entropies minus the combined entropy. It is maximal when the individual entropies are maximal (P = 0.5) and the combined entropy is minimal (of the four possible combinations 0 0, 0 1, 1 0 and 1 1, only two are occupied: either 0 0 and 1 1, or 0 1 and 1 0).
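The entropy and mutual-information formulas can be applied directly to the four presence/absence profiles listed for M. genitalium. A minimal Python sketch; the use of the natural logarithm is an assumption, but with it the code reproduces the three scores quoted for pta/ackA, dnaJ/dnaK and dnaJ/ackA.

```python
import math
from collections import Counter
from typing import Sequence


def entropy(states: Sequence) -> float:
    """H = -sum p log p over the observed states (natural log; absent states add 0)."""
    n = len(states)
    return -sum((c / n) * math.log(c / n) for c in Counter(states).values())


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """M(i,j) = H(i) + H(j) - H(i,j) for two presence/absence profiles."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


# Presence (1) / absence (0) of four M. genitalium genes in 25 genomes.
pta  = [0,0,0,1,1,0,0,0,0,1,1,0,1,0,1,1,0,0,0,1,0,1,1,1,1]
ackA = [0,0,0,1,1,0,0,0,0,1,1,0,1,0,1,1,0,0,0,1,0,1,1,1,1]
dnaJ = [0,0,1,1,1,1,1,1,0,1,1,1,1,0,1,1,1,0,0,1,1,1,1,1,1]
dnaK = [0,0,1,1,1,1,1,1,0,1,1,1,1,0,1,1,1,0,0,1,1,1,1,1,1]

print(round(mutual_information(pta, ackA), 2))  # 0.69
print(round(mutual_information(dnaJ, dnaK), 2))  # 0.55
print(round(mutual_information(dnaJ, ackA), 2))  # 0.19
```

Because pta and ackA (and likewise dnaJ and dnaK) have identical profiles, their combined entropy equals each individual entropy, so M reduces to H of one gene; pta/ackA scores higher simply because it occurs in 12 of 25 genomes, closer to the 50% optimum.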
Applicability of genomic context information to M. genitalium genes (480 genes in total)
- gene order: 215
- fusion: 27
- co-occurrence: 54
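As a quick worked check, the three counts correspond to the following fractions of the 480 M. genitalium genes; since a gene may be covered by more than one type of context, these fractions are not necessarily additive.

$$\frac{215}{480} \approx 44.8\%, \qquad \frac{27}{480} \approx 5.6\%, \qquad \frac{54}{480} \approx 11.3\%$$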
Selectivity of Genomic Context for function
prediction
Correlation between the strength of the genomic
and functional associations (operon)
Correlation between the strength of the genomic
and functional associations (fusion)
Correlation between the strength of the genomic
and functional associations (co-occurrence)
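Genomic context vs. homology-based function prediction in M. genitalium
[Figure: Venn diagram comparing genes with function predictions from genomic context (238) and from homology (368); the diagram also shows the values 21 and 26 and the label "Added info from genomic context".]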
Combining homology information with genomic association for function prediction
The repeated association of MG009, which encodes a phosphohydrolase that is one of the most widespread enzymes on earth, with thymidylate kinase (tmk) suggests a role for MG009 in pyrimidine metabolism.
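Combining the two sources can be sketched as "guilt by association": take the homology-based categories of a gene's genomic-context partners and let them vote. A minimal Python sketch; the partner list (including how often tmk recurs), the partner `geneX`, the category labels and the `min_support` threshold are all hypothetical, and only MG009 and tmk come from the slide.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical inputs: context partners of each gene, one entry per genome
# in which the association is observed, plus homology-based categories.
CONTEXT_PARTNERS: Dict[str, List[str]] = {
    "MG009": ["tmk", "tmk", "tmk", "geneX"],
}
CATEGORY: Dict[str, str] = {
    "tmk": "pyrimidine metabolism",
    "geneX": "translation",
}


def predict_category(gene: str, min_support: int = 2) -> Optional[str]:
    """Most common category among annotated context partners, if supported often enough."""
    votes = Counter(
        CATEGORY[p] for p in CONTEXT_PARTNERS.get(gene, []) if p in CATEGORY
    )
    if not votes:
        return None
    category, support = votes.most_common(1)[0]
    return category if support >= min_support else None


print(predict_category("MG009"))  # pyrimidine metabolism
```

Requiring repeated support mirrors the slide's emphasis on a *repeated* association rather than a single co-localization.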
Conservation of gene order of the hypothetical gene MG134 with dnaX and recR suggests a physical interaction between their gene products
From pairwise interactions to functional modules and pathways
The first iteration of trpB in M. jannaschii (MJ1038) retrieves trpA (MJ1037), with which trpB physically interacts
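Going from pairwise links to modules can be sketched as an iterative expansion from a seed gene, recording which genes each round adds. A minimal Python sketch, assuming that one "iteration" means one breadth-first step over predicted pairwise links (an interpretation, not a confirmed description of the original procedure); only the MJ1038 (trpB) to MJ1037 (trpA) link is from the slide, and `geneP`/`geneQ` are hypothetical.

```python
from typing import Dict, List, Set


def iterate_context(seed: str, links: Dict[str, Set[str]], iterations: int) -> List[Set[str]]:
    """Breadth-first expansion: element k holds the genes first reached in iteration k+1."""
    seen = {seed}
    frontier = {seed}
    rounds: List[Set[str]] = []
    for _ in range(iterations):
        new: Set[str] = set()
        for gene in frontier:
            new |= links.get(gene, set())
        new -= seen
        if not new:
            break  # the module is closed: no further genes reachable
        rounds.append(new)
        seen |= new
        frontier = new
    return rounds


# Hypothetical symmetric link table.
links = {
    "MJ1038": {"MJ1037"},
    "MJ1037": {"MJ1038", "geneP"},
    "geneP": {"MJ1037", "geneQ"},
    "geneQ": {"geneP"},
}
print(iterate_context("MJ1038", links, 5))
```

The union of all rounds plus the seed is the predicted module; limiting the number of iterations keeps the module from spreading through weak, indirect links.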
Genomic context indicates a link between the shikimate and tryptophan synthesis pathways
[Figure: network of genes linked by genomic context. Shikimate pathway: aroB, aroE, aroC; tryptophan synthesis pathway: trpE, trpG, trpD, trpC, trpF, trpA, trpB; also shown: tyrA, asd, truA, hemK, 2c-rr and two hypothetical genes (hyp).]
Modular gain and loss of genes in the Pyrococci
Enzymes that are encoded in conserved operons and that are lost/gained together catalyze reactions that are closer in metabolic space than enzymes that are encoded in conserved operons but are not gained/lost together
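"Closer in metabolic space" presupposes a distance between reactions. A minimal Python sketch, assuming reactions are nodes of an undirected graph joined when they share a metabolite, with distance taken as the shortest number of steps (breadth-first search); the toy chain R1 to R5 and the two pair groups are invented for illustration and are not data from the slides.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple


def reaction_distance(graph: Dict[str, List[str]], start: str, goal: str) -> Optional[int]:
    """Fewest steps between two reactions in an undirected reaction graph (BFS)."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # the two reactions are not connected


def mean_distance(graph: Dict[str, List[str]], pairs: List[Tuple[str, str]]) -> Optional[float]:
    """Average distance over the connected pairs of a group."""
    ds = [d for d in (reaction_distance(graph, a, b) for a, b in pairs) if d is not None]
    return sum(ds) / len(ds) if ds else None


# Invented linear pathway R1 - R2 - R3 - R4 - R5.
chain = {"R1": ["R2"], "R2": ["R1", "R3"], "R3": ["R2", "R4"], "R4": ["R3", "R5"], "R5": ["R4"]}
co_gained = [("R1", "R2"), ("R3", "R4")]    # operon and gained/lost together
operon_only = [("R1", "R4"), ("R2", "R5")]  # operon, but not gained/lost together
print(mean_distance(chain, co_gained), mean_distance(chain, operon_only))  # 1.0 3.0
```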
Limited Relevance of Gene Order for Functional Interaction in eukaryotes
- operons in nematodes
- gene-order conservation of co-expressed genes between the fungi C. albicans and S. cerevisiae
Divergently transcribed, co-regulated gene pairs tend to be conserved between S. cerevisiae and C. albicans
Finding interaction partners for a human disease gene: frataxin
- Friedreich's ataxia
- no (homolog with) known function
- no gene fusion or gene order conservation
[Figure: "Ancestor Proteobacteria", gene content traced over time; genes shown: fdx, IscS, IscU, IscR, RnaM.]
The mitochondrial HSP70 protein that is involved
in iron-sulfur cluster (isc) assembly in yeast is
derived from DnaK, rather than from HscA (the
proteobacterial isc HSP70), indicating a
paralogous switch in isc assembly from the
proteobacteria to the eukaryotes.
A system view of iron-sulfur cluster assembly based on comparative genome analysis
[Figure: components shown: Isa1,2p, IscR, Nfu1p, Nfs1p, Isu1p, Ssq1p, Jac1p, EC2524.]
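Mitochondrial iron-sulfur assembly
[Figure: scheme of mitochondrial Fe-S cluster assembly. NifS converts Cys to Ala, releasing S; NifU combines S and Fe into 2Fe2S clusters; electrons (e-) are supplied via Arh1/fpr and fdx; HscA/SSQ1, HscB and frataxin (role marked "?") participate; Atm1 is also shown; clusters go to targets such as fdx and Complex I.]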
Large-scale (omics) experimental approaches to physical interaction
- two-hybrid
- co-precipitation, mass spectrometry
Further Reading
- Genomic context: Huynen M, Snel B, Lathe W 3rd, Bork P. (2000) Predicting protein function by genomic context: quantitative evaluation and qualitative inferences. Genome Res. 10(8):1204-10.
- Genomic context: Gabaldon T, Huynen MA. (2004) Prediction of protein function and pathways in the genome era. Cell Mol Life Sci. 61:930-44.