Title: Evolutionary Systems Biology


1
Evolutionary Systems Biology
  • Eugene V. Koonin
  • National Center for Biotechnology Information
  • National Library of Medicine
  • NIH, Bethesda, MD

"Nothing in (systems) biology makes sense except in the light of evolution" (after Theodosius Dobzhansky, 1970)
2
Molecular evolution 1962-
  • Zuckerkandl, E., Pauling, L. 1962. Molecular
    evolution. In Horizons in Biochemistry, pp.
    189-225

The majority of studies focus on sequence evolution:
  • Phylogeny and taxonomy
  • Genomic effects of natural selection
  • Mechanisms of heredity, e.g., horizontal gene transfer
3
From Molecular Evolution to Evolutionary Systems Biology, 2001-
"Every complex system can be abstracted in a network"
(I. King Jordan, pers. comm.)
"Systems biology offers an opportunity to study how the phenotype is generated from the genotype and with it a glimpse of how evolution has crafted the phenotype." M. Kirschner, Cell, 2005
4
The links between evolution of mammalian
gene sequences
and gene expression networks
5
Gene sequences and expression in evolution
Evolution of expression regulation may be even
more important than sequence
evolution
Britten, R. J. & Davidson, E. H. Repetitive and non-repetitive DNA sequences and a speculation on the origins of evolutionary novelty. Q Rev Biol (1971) 46: 111-138
King, M.-C. & Wilson, A. C. Evolution at two levels in humans and chimpanzees. Science (1975) 188: 107-116

6
Gene sequences and expression in evolution
Now we can study both on a genome scale and at high resolution
Schena, M. et al. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science (1995) 270: 467-470
Fleischmann, R. D. et al. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science (1995) 269: 496-512
7
Motivation
Gene sequence and expression divergence
  • comparative analysis of human and rodent gene sequences: substitution rates, selection
  • comparative analysis of human and rodent gene expression profiles: correlations, co-expression network topology, organization
Questions
1 - Are sequence evolutionary rates related to expression levels and patterns?
2 - What are the topological properties of gene co-expression networks?
3 - How does gene co-expression network topology relate to sequence evolution?
4 - How does natural selection affect gene expression divergence (convergence)?
8
Sequence analysis
Mouse
Human
  • 1. Identify orthologs
  • 2. Align sequences
  • 3. Calculate substitution rates

gi|13376510|ref|NM_025000.1| NP_079276.1
CTTTGAGGTGTCATCCCTTTTGGAGAATGCTTTTCAGATTGGAGGCCATC---CTTGGCACTACATCGTC...
gi|37574097|ref|NM_198005.1| NP_932122.1
CTTTGAAGTTTCATCTCT---GGAGAACGCATTCCAGATCGGAGGCCATCAAACTTGGCACTACATCATC...
CDS: dN (nonsynonymous nucleotide substitution rate), dS (synonymous nucleotide substitution rate)
5' and 3' UTRs: d (nucleotide substitution rate)
5' promoters: nt word counts (cis-regulatory sequences)
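To illustrate step 3 on the two aligned fragments shown on this slide, the sketch below applies the Jukes-Cantor (JC69) correction to the ungapped columns. JC69 is an assumption: the slide does not name the model used, and dN and dS for CDS require a codon-aware method rather than this per-nucleotide correction. The fragments are truncated on the slide, so the resulting number says nothing about the full genes.

```python
import math

# The two aligned fragments shown on the slide (both are truncated there with "...").
SEQ_NM_025000 = "CTTTGAGGTGTCATCCCTTTTGGAGAATGCTTTTCAGATTGGAGGCCATC---CTTGGCACTACATCGTC"
SEQ_NM_198005 = "CTTTGAAGTTTCATCTCT---GGAGAACGCATTCCAGATCGGAGGCCATCAAACTTGGCACTACATCATC"


def jc69_distance(seq_a, seq_b):
    """Return (compared sites, differences, JC69 distance) for two aligned DNA sequences.

    Columns with a gap in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = diffs = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # gapped column: no substitution information in this simple model
        sites += 1
        diffs += a != b
    if sites == 0:
        raise ValueError("no ungapped columns to compare")
    p = diffs / sites  # uncorrected p-distance
    if p >= 0.75:
        raise ValueError("p-distance too large for the JC69 correction")
    # JC69 corrects p for multiple substitutions at the same site.
    return sites, diffs, -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    sites, diffs, d = jc69_distance(SEQ_NM_025000, SEQ_NM_198005)
    print(sites, diffs, round(d, 3))  # 64 8 0.137
```

Of the 70 alignment columns, 6 contain gaps, leaving 64 compared sites with 8 differences (p = 0.125).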
9
  • Expression profile analysis

Su, A. I. et al. Large-scale analysis of the human and mouse transcriptomes. PNAS (2002) 99: 4465
Su, A. I. et al. A gene atlas of the mouse and human protein-encoding transcriptomes. PNAS (2004) 101: 6062
10
  • Expression profile analysis

The Pearson correlation coefficient (PCC) was used as the measure of similarity/distance between expression profiles
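A minimal sketch of the PCC between two expression profiles (one value per tissue). Turning the similarity into a distance as 1 - r is a common convention and an assumption here; the slide does not say which transformation was used. The gene profiles in the usage example are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def pcc_distance(x, y):
    """Distance in [0, 2]: 0 = same profile shape, 2 = mirror-image profile."""
    return 1.0 - pearson(x, y)


if __name__ == "__main__":
    # Hypothetical expression levels of two genes across five tissues.
    gene_a = [5.0, 1.0, 3.0, 8.0, 2.0]
    gene_b = [10.0, 2.5, 6.0, 15.0, 4.0]
    print(round(pearson(gene_a, gene_b), 3), round(pcc_distance(gene_a, gene_b), 3))
```

Because the PCC is invariant to shifting and rescaling a profile, it compares the shape of expression across tissues rather than absolute levels.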
11
Results
  • Co-expression network

12
Results
Expression breadth vs. evolutionary rate
[Figure: expression breadth vs. evolutionary rate; four panels: P = 5.5e-7, P = 2.1e-5, P = n.s., P = 4.9e-5]
13
Results
Expression level vs. evolutionary rate
[Figure: expression level vs. evolutionary rate; four panels: P = 5.5e-7, P = 1.1e-4, P = n.s., P = 4.9e-5]
14
Expression vs. sequence divergence
  • 19 tissues shared between human and mouse
  • species-specific profiles of orthologous genes are strongly correlated

[Figure: cumulative frequency of r; curves labeled "within" and "between"]
15
Results
Expression divergence vs. sequence divergence
[Figure: correlation between expression and sequence divergence; four panels, P = n.s. in all]
However, divergence of expression profiles is not correlated with sequence divergence of orthologs
16
Sequence versus expression divergence
  • divergence of sequence and expression pattern is not correlated
  • but expression pattern divergence is not neutral
  • are distinct evolutionary mechanisms responsible?

sequence: purifying selection, divergence from CA
expression: adaptive selection, convergence to co-expression
[Figure: schematics of genes A, B, C (orders A B C and A C B) with CA nodes]
17
Results
  • Co-expression network

Network parameters
1 - node degree (k): number of links per node
2 - degree distribution P(k): probability that a node has k links
3 - clustering coefficient (C): ratio of the number of observed links connecting the k_i neighbors of node i to the possible number of links
Network models
Barabási & Oltvai (2004) Nat Rev Genet 5: 101
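The three network parameters can be computed directly from an undirected edge list. For node i with k_i neighbors and n_i observed links among them, C_i = 2 n_i / (k_i (k_i - 1)). A minimal sketch; reporting C = 0 for nodes with fewer than two neighbors and averaging C over all nodes are conventions assumed here, and the toy graph is hypothetical.

```python
from collections import Counter
from itertools import combinations


def adjacency(edges):
    """Undirected adjacency sets built from (u, v) edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_distribution(adj):
    """P(k): fraction of nodes that have exactly k links."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return {k: c / len(adj) for k, c in sorted(counts.items())}


def clustering(adj, node):
    """C_i = observed links among the k_i neighbours / possible links k_i (k_i - 1) / 2."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # undefined for k < 2; reported as 0 by convention
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Network-level C as the mean of the per-node coefficients."""
    return sum(clustering(adj, n) for n in adj) / len(adj)


# Hypothetical toy network: triangle A-B-C plus a pendant node D attached to C.
TOY_EDGES = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]

if __name__ == "__main__":
    adj = adjacency(TOY_EDGES)
    print(degree_distribution(adj), clustering(adj, "C"), average_clustering(adj))
```

In the toy network, C has three neighbors (A, B, D) but only A-B is linked, so C_C = 1/3, while A and B sit in a closed triangle with C = 1.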
18
Results
  • Co-expression network

19
http://www.cytoscape.org
20
Results
  • Co-expression network

Network models
Node degree distribution
  • The co-expression network is scale-free

Barabási & Oltvai (2004) Nat Rev Genet 5: 101
21
Results
  • Co-expression network

Network models
Clustering coefficient (C) vs. node degree
C is 22x higher than expected
  • The co-expression network is not hierarchical

Barabási & Oltvai (2004) Nat Rev Genet 5: 101
22
Results
Node degree vs. evolutionary rate
[Figure: number of co-expressed genes vs. evolutionary rate; four panels: P = 4.9e-5, P = 1.7e-2, P = n.s., P = 5.8e-2]
  • Co-expression network hubs evolve slowly

23
  • Network comparison: degree distribution

H: 158K edges, 7,208 vertices with non-zero degree (avg. degree 44)
Mouse: 178K edges, 7,730 vertices with non-zero degree (avg. degree 46)
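As a consistency check, the average degrees follow from the counts: each undirected edge adds 1 to the degree of both of its endpoints, so for E edges and N vertices the average degree is 2E/N. Using the edge counts rounded to thousands, and reading the first line as human and the second as mouse (the order of the slide's labels):

```latex
\langle k \rangle = \frac{2E}{N}
\qquad
\langle k \rangle_{\mathrm{human}} \approx \frac{2 \times 158{,}000}{7{,}208} \approx 43.8,
\qquad
\langle k \rangle_{\mathrm{mouse}} \approx \frac{2 \times 178{,}000}{7{,}730} \approx 46.1
```

Both agree with the rounded averages of 44 and 46.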
24
  • Network comparison: clustering coefficient

Human 0.41
Mouse 0.44
Networks contain very dense areas but no evidence of hierarchical structure
25
  • Network comparison: global vs. local properties

- Globally, the mouse and human networks are very similar for all networks we observe:
1 - power-law degree distributions
2 - high clustering coefficients
3 - many densely connected components
independent of the measure used to connect nodes (Euclidean distance, Manhattan distance, dot product, etc.)
- Question: how similar are they at the local level?
26
Network comparison: intersection graph (edges connecting orthologs)
Pearson correlation, cosine, Euclidean, Manhattan, Jensen-Shannon
  • only a small percentage of the edges is preserved
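The intersection graph keeps an edge only if the orthologs of its two endpoints are also linked in the other species' network. A minimal sketch, assuming a one-to-one ortholog map and treating both networks as undirected; the gene names and edges are hypothetical toy data.

```python
def conserved_edges(edges_1, edges_2, orthologs):
    """Edges of network 1 whose endpoints' orthologs are also linked in network 2.

    `orthologs` maps each network-1 gene to its (assumed one-to-one) network-2 ortholog.
    """
    linked_2 = {frozenset(e) for e in edges_2}  # undirected: order of endpoints ignored
    shared = set()
    for a, b in edges_1:
        if a in orthologs and b in orthologs:
            if frozenset((orthologs[a], orthologs[b])) in linked_2:
                shared.add(frozenset((a, b)))
    return shared


def fraction_preserved(edges_1, edges_2, orthologs):
    """Share of network-1 edges that are present in the intersection graph."""
    total = len({frozenset(e) for e in edges_1})
    return len(conserved_edges(edges_1, edges_2, orthologs)) / total


# Hypothetical toy networks and ortholog map.
HUMAN_TOY = [("h1", "h2"), ("h2", "h3"), ("h3", "h4")]
MOUSE_TOY = [("m2", "m1"), ("m3", "m1")]
ORTHO_TOY = {"h1": "m1", "h2": "m2", "h3": "m3", "h4": "m4"}

if __name__ == "__main__":
    print(fraction_preserved(HUMAN_TOY, MOUSE_TOY, ORTHO_TOY))
```

Here only h1-h2 maps onto a mouse edge (m1-m2), so one of the three human edges is preserved.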

27
Results
  • Network comparison: intersection graph (significance)

test against randomly rewired networks (with the degree distribution preserved)
[Figure panels: PCC, Euclidean distance]
  • the observed conservation is highly significant, even if low

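The significance test compares the observed number of conserved edges with randomly rewired networks that keep every node's degree. The slide does not say how the rewiring was done; a standard choice, assumed here, is repeated double edge swaps (a-b, c-d become a-d, c-b), rejecting swaps that would create self-loops or duplicate edges. The empirical P value uses the usual (1 + count) / (1 + N) form; the ring graph is a toy example.

```python
import random


def rewire(edges, n_swaps, seed=0, max_tries_factor=100):
    """Degree-preserving randomization of an undirected edge list by double edge swaps."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    swaps = tries = 0
    while swaps < n_swaps and tries < max_tries_factor * n_swaps:
        tries += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # consider both orientations of the second edge
        if len({a, b, c, d}) < 4:
            continue  # swap would create a self-loop or leave the edges unchanged
        new_1, new_2 = frozenset((a, d)), frozenset((c, b))
        if new_1 in present or new_2 in present:
            continue  # swap would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new_1, new_2}
        edges[i], edges[j] = (a, d), (c, b)  # each of a, b, c, d keeps its degree
        swaps += 1
    return edges


def empirical_p(observed, null_values):
    """One-sided empirical P value: (1 + #null values >= observed) / (1 + #null values)."""
    return (1 + sum(v >= observed for v in null_values)) / (1 + len(null_values))


# Toy graph: an 8-node ring with two chords.
RING = [(i, (i + 1) % 8) for i in range(8)] + [(0, 4), (2, 6)]

if __name__ == "__main__":
    print(rewire(RING, 20))
```

In a full test one network would be rewired many times (with different seeds), the conserved edges recounted for each randomized copy, and the observed count passed to `empirical_p` together with those null counts.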
28
Conclusions
  • Sequence evolution is strongly linked to
    expression network topology, e.g., highly
    connected genes evolve slowly
  • However, this relationship is not simple:
    - sequence divergence between orthologs is not
      correlated with expression divergence
    - expression networks seem to rewire rapidly
      during evolution despite conservation of the
      global structure
  • Thus, as opposed to sequence evolution, which is
    dominated by divergence, expression networks may
    evolve primarily by convergence of regulatory
    elements

Jordan IK, Mariño-Ramírez L, Wolf YI, Koonin EV. Conservation and coevolution in the scale-free human gene coexpression network. Mol Biol Evol. 2004 Nov;21(11):2058-70.
Jordan IK, Mariño-Ramírez L, Koonin EV. Evolutionary significance of gene expression divergence. Gene. 2005 Jan 17;345(1):119-26.
Tsaparas P, Jordan IK, Koonin EV, in preparation
29
Unifying measures of gene function and evolution
30
Evolutionary systems biology
  • In principle, we address the classical problem
    of the relationship between the (largely
    neutral?) evolution of the genome and the
    (largely adaptive) evolution of the phenotype
  • In practice, the progress of genomics and other
    'omics' allows us to measure, on a whole-genome
    scale, the effects of all kinds of molecular
    phenotypic characteristics (expression level,
    protein-protein interactions, etc.) on
    evolutionary rates
  • Can we synthesize these measurements to produce
    a coherent picture of the links between
    phenotypic and genomic evolution?

31
The Cautionary Tale
"It was six men of Indostan / To learning much inclined, / Who went to see the Elephant / (Though all of them were blind), / That each by observation / Might satisfy his mind." (J.G. Saxe)
32
The Cautionary Tale
"each was partly in the right / And all were in
the wrong"
(J.G. Saxe)
33
Evolution Rate and Fitness Effect
1974 Kimura & Ohta, 1976 Zuckerkandl, 1977 Wilson et al.: there should be a correlation between the evolution rate and the importance of a gene (knockout fitness effect)
1999 Hurst & Smith: no, there isn't (mammalian data)
2001 Hirsh & Fraser, 2002 Jordan et al.: yes, there is (the other guys had a small, biased dataset)
2003 Pál et al., 2004 Rocha & Danchin: no, there isn't (expression level determines both ER and KE)
2003 Hirsh & Fraser, 2003 Krylov et al.: yes, there is (we have double-checked and still there is)
2005 Wall et al., 2005 Zhang & He, 2005 Drummond et al.: there is a weak but highly significant correlation
Consensus, finally?
34
Different Faces of the Hypercube?
Pairwise correlations
Synthesis
35
Analysis of Multidimensional Data
36
Analysis of Multidimensional Data
37
Analysis of Multidimensional Data
[Figure: data cloud with principal axes PC1, PC2, PC3]
Principal Component Analysis (PCA) introduces a new orthogonal coordinate system whose axes are ranked by the fraction of the original variance they account for.
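The slides do not say which PCA implementation was used. A minimal dependency-free sketch, assuming PCA on standardized variables (that is, on the correlation matrix, since the variables have different units) and extracting the leading components by power iteration with deflation. It assumes complete data with no constant columns; missing values would have to be handled beforehand.

```python
import math
import random


def correlation_matrix(rows):
    """Correlation matrix of the columns of `rows` (samples x variables, no missing values)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) for j in range(p)]
    z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]  # standardize
    return [[sum(zr[a] * zr[b] for zr in z) / n for b in range(p)] for a in range(p)]


def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def principal_components(corr, k, iters=1000, tol=1e-12, seed=0):
    """Top-k (fraction of variance, loading vector) pairs of a symmetric PSD matrix."""
    rng = random.Random(seed)
    p = len(corr)
    total = sum(corr[i][i] for i in range(p))  # total variance = trace
    m = [row[:] for row in corr]  # working copy, deflated after each component
    result = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):  # power iteration converges to the largest eigenvector
            w = matvec(m, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-15:
                break  # no variance left to explain
            w = [x / norm for x in w]
            converged = max(abs(a - b) for a, b in zip(w, v)) < tol
            v = w
            if converged:
                break
        lam = sum(a * b for a, b in zip(v, matvec(m, v)))  # Rayleigh quotient = eigenvalue
        result.append((lam / total, v))
        # Deflation: subtract this component so the next pass finds the next-largest one.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return result
```

The loading vectors play the role of the PC.1, PC.2, PC.3 columns in the data-space PCA, and the variance fractions correspond to its "Var." row; power iteration is chosen here only to keep the sketch free of external libraries.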
38
The Data Set: KOGs
  • Ideally, we would like to obtain and synthesize
    data for individual genes in precise space-time
    coordinates (e.g., instantaneous evolutionary
    rates)
  • However,
    - some of the parameters (variables) are not
      easily measurable (if defined at all) for genes
      in extant species, e.g., rate of evolution
    - much of the data are inherently noisy, due
      either to technical problems or to true
      biological variation, e.g., fitness effect of
      gene disruption
  • Thus, we analyze orthologous protein sets, using
    the proteins from different species to derive
    some data and smooth out variations in others
  • Practically, this means using the KOG dataset
    (with additions): 10,058 KOGs from 15 species
  • Koonin EV et al. A comprehensive evolutionary
    classification of proteins encoded in complete
    eukaryotic genomes. Genome Biol. 2004;5(2):R7.

39
The Data Set: KOGs
[Figure: species in the data set, grouped as plants, Amoebozoa, Fungi, Animals]
Original KOGs for some species, "index orthologs" for others.
40
Variables: Gene Loss
Propensity for Gene Loss (PGL), introduced by Krylov et al. (Genome Res. 13, 2229-2235, 2003), is computed from the KOG phyletic pattern. It was originally an empirical measure (Dollo parsimony reconstruction of events; ratio of branch lengths); in this work, PGL is estimated with an Expectation-Maximization algorithm.
41
Variables: Gene Duplication
Number of Paralogs: average number observed for a given KOG. Example: KOG0417 (Ubiquitin-protein ligase) and KOG0424 (Ubiquitin-protein ligase).
42
Variables: Evolution Rate
Select a taxon; build an alignment (MUSCLE); compute a distance matrix (PAML); select the minimum distance between members of the two subtrees of the group.
Ascomycota: Sordariomycetes vs. Yeasts
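Once the distance matrix has been computed (with PAML, per the slide), the last step reduces to a minimum over all cross-subtree pairs. A minimal sketch; the matrix layout (a list of rows indexed like `names`), the protein names and the distances are hypothetical toy data, not PAML output.

```python
def min_between_subtrees(names, dist, subtree_a, subtree_b):
    """Smallest distance between any member of subtree_a and any member of subtree_b."""
    index = {name: i for i, name in enumerate(names)}
    return min(dist[index[a]][index[b]] for a in subtree_a for b in subtree_b)


# Hypothetical toy data: four proteins of one KOG, two per subtree.
TOY_NAMES = ["sord_1", "sord_2", "yeast_1", "yeast_2"]
TOY_DIST = [
    [0.0, 0.2, 0.9, 0.8],
    [0.2, 0.0, 0.7, 1.0],
    [0.9, 0.7, 0.0, 0.3],
    [0.8, 1.0, 0.3, 0.0],
]

if __name__ == "__main__":
    print(min_between_subtrees(TOY_NAMES, TOY_DIST, ["sord_1", "sord_2"], ["yeast_1", "yeast_2"]))
```

Within-subtree distances (0.2 and 0.3 here) are ignored; only the four cross pairs are compared, and the smallest of them, 0.7, is returned.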
43
Variables: Expression Level
Expression level data for S. cerevisiae, D. melanogaster and H. sapiens were downloaded from the UCSC Table Browser (hgFixed).
Organism  Table               exp.  probes  KOGs
Sacce     yeastChoCellCycle     17    6602  3030
Drome     arbFlyLifeAll        162    4921  2617
Homsa     gnfHumanAtlas2All    158   10197  3872
Standardized (μ = 0, σ = 1) log values; the maximum expression level among paralogs was used to represent a KOG.
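A minimal sketch of the two steps on this slide: standardize the log expression values to mean 0 and standard deviation 1, then represent each KOG by the maximum value among its paralogs. Standardizing across all genes of one organism and using the population standard deviation are assumptions; the gene names, levels and KOG membership are hypothetical.

```python
import math
import statistics


def standardized_log(levels):
    """z-scores (mean 0, sd 1) of log expression levels; all levels must be positive."""
    logs = {gene: math.log(value) for gene, value in levels.items()}
    values = list(logs.values())
    mu, sd = statistics.mean(values), statistics.pstdev(values)
    return {gene: (x - mu) / sd for gene, x in logs.items()}


def kog_expression(levels, kog_members):
    """Represent each KOG by the maximum standardized level among its paralogs."""
    z = standardized_log(levels)
    result = {}
    for kog, genes in kog_members.items():
        present = [z[g] for g in genes if g in z]
        if present:  # KOGs with no measured member are left out (missing data)
            result[kog] = max(present)
    return result


# Hypothetical toy input: raw expression levels and KOG membership.
TOY_LEVELS = {"g1": 1.0, "g2": 10.0, "g3": 100.0}
TOY_KOGS = {"K1": ["g1", "g3"], "K2": ["g2"], "K3": ["g9"]}

if __name__ == "__main__":
    print(kog_expression(TOY_LEVELS, TOY_KOGS))
```

The base of the logarithm does not matter here: changing it rescales every log value by the same factor, which standardization removes.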
44
Variables: Interactions
Physical protein-protein and genetic interaction (PPI and GI) data for S. cerevisiae, C. elegans and D. melanogaster were downloaded from the GRID FTP site. The maximum number of interaction partners among paralogs was used to represent a KOG.
45
Variables: Lethality
Gene knockout lethality data for S. cerevisiae were downloaded from the MIPS FTP site (0/1 values). Embryonic lethality data from RNA interference (RNAi) in C. elegans were taken from Kamath et al., 2003 (0/1 values).
46
Missing Data
Total: 32 variables in 10,058 KOGs, with lots of missing data. Complete data (all 34 parameters available): 381 KOGs, too few. Combined data: 7 variables, 3,724 KOGs (after removal of outliers).
Example: evolution rate

Raw values
                At.Os  Sc.Ca  Mg.Nc  Hs.Mm  Pl.MF
KOG0009           -    0.168  0.300    -    0.405
KOG0010         0.671  1.252  0.606  0.087  1.492
KOG0011         0.905  1.698  0.428  0.073  1.547
KOG0012           -    2.238  0.665  0.244    -
KOG0013         0.355    -      -    0.014  1.343
KOG0014         1.913  4.041    -    0.126  2.840
KOG0015           -    2.286  0.400  0.027    -
KOG0016           -      -    0.506  0.380    -
Column average  0.667  1.864  0.521  0.075  1.910

Divided by column average
                At.Os  Sc.Ca  Mg.Nc  Hs.Mm  Pl.MF  Average
KOG0009           -    0.090  0.575    -    0.212   0.293
KOG0010         1.006  0.672  1.162  1.166  0.781   0.957
KOG0011         1.358  0.911  0.821  0.984  0.810   0.977
KOG0012           -    1.201  1.275  3.275    -     1.917
KOG0013         0.532    -      -    0.181  0.703   0.472
KOG0014         2.869  2.168    -    1.692  1.487   2.054
KOG0015           -    1.227  0.767  0.365    -     0.786
KOG0016           -      -    0.970  5.087    -     3.028
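The numbers suggest how missing values were combined: each taxon-pair column is divided by its column average, and a KOG's combined rate is the mean of whatever normalized values it has. For KOG0009, (0.168/1.864 + 0.300/0.521 + 0.405/1.910) / 3 ≈ 0.293. This reading is inferred from the tabulated values, not stated on the slide; the sketch below reproduces it for three KOGs, with None marking a missing value.

```python
# Column averages from the "Column average" row of the raw-value table.
COLUMN_AVERAGES = {"At.Os": 0.667, "Sc.Ca": 1.864, "Mg.Nc": 0.521, "Hs.Mm": 0.075, "Pl.MF": 1.910}

# Raw evolution rates for three KOGs from the same table; None marks missing data.
RAW = {
    "KOG0009": {"At.Os": None, "Sc.Ca": 0.168, "Mg.Nc": 0.300, "Hs.Mm": None, "Pl.MF": 0.405},
    "KOG0010": {"At.Os": 0.671, "Sc.Ca": 1.252, "Mg.Nc": 0.606, "Hs.Mm": 0.087, "Pl.MF": 1.492},
    "KOG0016": {"At.Os": None, "Sc.Ca": None, "Mg.Nc": 0.506, "Hs.Mm": 0.380, "Pl.MF": None},
}


def combined_rate(rates, column_averages):
    """Scale each available rate by its column average, then average the scaled values."""
    scaled = [
        rates[col] / avg
        for col, avg in column_averages.items()
        if rates.get(col) is not None  # skip missing taxon pairs
    ]
    return sum(scaled) / len(scaled) if scaled else None  # None if nothing is available


if __name__ == "__main__":
    for kog, rates in RAW.items():
        print(kog, round(combined_rate(rates, COLUMN_AVERAGES), 3))
```

The results match the Average column to within about 0.01; the small differences come from the raw values being rounded to three decimals in the table. Dividing by the column average puts fast- and slow-evolving taxon pairs (e.g., Hs.Mm vs. Sc.Ca) on a common scale before averaging.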
47
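Correlations between variables
        NP      PPI     GI      PGL     ER      EL      KE
NP      -
PPI     0.205   -
GI      0.070   0.041   -
PGL     0.001  -0.117   0.006   -
ER     -0.065  -0.185   0.047   0.147   -
EL      0.329   0.222  -0.040  -0.113  -0.278   -
KE      0.017   0.219  -0.090  -0.193  -0.157   0.137   -
NP: number of paralogs; PPI: protein-protein interactions; GI: genetic interactions; PGL: propensity for gene loss; ER: evolution rate; EL: expression level; KE: knockout effect (lethality)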
48
Two Classes of Variables
Observation on the pattern of pairwise
relationships in the data "phenotypic" and
"evolutionary" variables.
49
Structure of the correlation table: phenotypic and evolutionary variables
        NP      PPI     GI      PGL     ER      EL      KE
NP      -
PPI     0.205   -
GI      0.070   0.041   -
PGL     0.001  -0.117   0.006   -
ER     -0.065  -0.185   0.047   0.147   -
EL      0.329   0.222  -0.040  -0.113  -0.278   -
KE      0.017   0.219  -0.090  -0.193  -0.157   0.137   -
GI: redundant pathways, backup
KE: essential function, no backup
50
PCA of the Data Space
        PC.1    PC.2    PC.3
NP      0.35    0.59    0.16
PPPI    0.46    0.09   -0.21
GPPI   -0.04    0.46   -0.79
PGL    -0.30    0.40    0.44
ER     -0.43    0.16   -0.12
EL      0.51    0.23    0.28
KE      0.37   -0.44   -0.15
-----------------------------
Var.   26.10   16.95   14.26
51
PCA of the Data Space
1st vs 2nd PC
52
PCA of the Data Space
2nd vs 3rd PC
53
Positive and negative contributions to PC1
PC1: status/importance of a gene
54
Positive and negative contributions to PC2
"adaptable"
"rigid"
PC2: a gene's adaptability
55
Positive and negative contributions to PC3
"edge"
"core"
PC3: a different (non-interactive) kind of adaptability
56
Interpretation of the first 3 PCs
PC3: "Adaptability 2"
PC2: "Adaptability 1"
PC1: "Status"
57
Prediction of the adaptability model: expression profile skew
[Figure: example expression profiles annotated with their skew relative to 0]
Human expression profiles
              PC2 LOW   PC2 HIGH   P
Status LOW      1.9       2.3      2.9E-07
Status HIGH     2.1       2.6      3.6E-12
PC2 is a measure of "Adaptability"
58
The status-adaptability model of the
phenotype-genotype-evolution relationship
59
Status and Adaptability of Gene Classes
Classification of KOGs into 4 major categories
60
Status and Adaptability of Gene Classes
[Figure: KOGs on the status/adaptability plane; regions labeled High Adaptability, neutral, High Status/Low Adaptability, Low Status]
Classification of KOGs into 4 major categories
61
Status and Adaptability of Genes
[Figure; labels: Low Status, High Status]
Replication and Repair KOGs
62
Status and Adaptability of Genes
[Figure; labels: Variable Repair, Core Replication]
Replication and Repair KOGs
63
Status and Adaptability of Genes
Cytoplasmic and Mitochondrial ribosomal proteins
64
Status and Adaptability of Genes
Replication Licensing Complex and Histones
65
Conclusions
  • Two composite variables "status" and
    "adaptability" dominate the multidimensional
    parameter space of quantitative genomics
  • The notion of status provides biologically
    relevant null hypotheses regarding the
    connections between various phenotypic and
    evolutionary variables
  • Breaks in the pattern may indicate non-trivial
    links - targets for further investigation
  • Functional groups of genes show distinctive
    patterns of status and adaptability
  • Wolf YI, Carmel L, Koonin EV, submitted

66
The Cautionary Tale
?
Wrong again?!
67
Acknowledgements
I. King Jordan
Liran Carmel
Yuri I. Wolf
Artwork: Olga Karengina (LSDN, Moscow)
Panayiotis Tsaparas
Leonardo Mariño-Ramírez