Title: Model-based investigation of bacterial metabolism using gene essentiality data.
1 Model-based investigation of bacterial metabolism using gene essentiality data
- PhD defense: Maxime Durot
- PhD prepared in the Computational Systems Biology Group at Genoscope
- under the supervision of Vincent Schachter and Jean Weissenbach
2 Motivation and goals of the thesis
3 Metabolism
Picture: Roche Applied Science
http://www.expasy.org/tools/pathways/
4 Information from two scales
[Figure: genome (molecular scale), metabolism, phenotype (cellular scale)]
5 Mutant phenotyping experiments
[Figure: wild-type bacterium (genome with gene) and its wild-type growth phenotype; knock-out mutant (deleted gene) and its mutant growth phenotype]
- Mutant phenotype
  - No growth: gene is essential in the tested environment
  - Growth: gene is dispensable in the tested environment
- Experiments are performed genome-wide for a growing number of organisms (Gerdes et al, Curr Opin Biotechnol 2006)
6 Confronting the two scales is complex
7 Modeling metabolism can help
(Stelling, Curr Opin Microbiol. 2004)
8 The constraint-based modeling framework
- Key concepts
  - variable of interest: reaction fluxes
[Figure: toy metabolic network with external metabolites A(ext), B(ext), P(ext), internal metabolites A, B, C, D, P, and reactions R1 to R9]
9 The constraint-based modeling framework
- Key concepts
  - variable of interest: reaction fluxes
[Figure: the same toy network, with an example flux value shown on each reaction (values between 0 and 1.5)]
10 The constraint-based modeling framework
- Key concepts
  - variable of interest: reaction fluxes
  - constraint-based approach: applying constraints to the model reduces the possible flux distributions
[Figure: toy network (reactions R1 to R9) next to a plot of the admissible flux distributions, with axes v1, v2, v3]
11 The constraint-based modeling framework
- Key concepts
  - variable of interest: reaction fluxes
  - constraint-based approach: applying constraints to the model reduces the possible flux distributions
- Classical constraints
  - metabolism in steady state: metabolite concentrations remain constant
  - some reactions are irreversible
  - flux values are bounded by a maximal value
- Applicable at genome scale
[Figure: toy network (reactions R1 to R9) next to its space of admissible flux distributions]
12 The constraint-based modeling framework
- Key concepts
  - variable of interest: reaction fluxes
  - constraint-based approach: applying constraints to the model reduces the possible flux distributions
  - explore the space of admissible flux distributions
- Classical constraints
  - metabolism in steady state: metabolite concentrations remain constant
  - some reactions are irreversible
  - flux values are bounded by a maximal value
- Applicable at genome scale
[Figure: toy network (reactions R1 to R9) next to its space of admissible flux distributions]
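The classical constraints listed on this slide are commonly written as a small linear system. The notation below is an assumption and does not come from the slides: S is the stoichiometric matrix, v the vector of reaction fluxes, and v_i^max the upper bound on flux i.

```latex
\begin{align}
  S\,v &= 0
    && \text{(steady state: metabolite concentrations remain constant)} \\
  v_i &\ge 0
    && \text{for each irreversible reaction } i \\
  v_i &\le v_i^{\max}
    && \text{for each reaction } i
\end{align}
```

The set of vectors v satisfying all three conditions is the space of admissible flux distributions.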
13 Models and gene essentiality datasets
- Constraint-based models can predict growth phenotypes for genetic and environmental perturbations (Price et al, Nat Rev Microbiol 2004) (Durot et al, FEMS Microbiol Rev 2009)
- Gene essentiality datasets have been used to provide rough assessments of metabolic models (Covert et al, Nature 2004) (Joyce et al, J Bacteriol 2006)
  - Compute predictive accuracy for gene essentiality prediction
  - List of inconsistencies, used as a starting point for curation
- Can gene essentiality datasets be used more systematically for metabolic model assessment and refinement?
14 Objectives of the thesis
- Develop a framework for the refinement of metabolic models using gene essentiality data
15 Context: the Metabolic Thesaurus project
- Acinetobacter baylyi ADP1
  - γ-proteobacteria, Pseudomonadales group
  - Nutritionally versatile, strictly aerobic
  - Non-pathogenic
  - Evidence of xenobiotic degradation capabilities
- Experimental context
  - Reliable genome annotation (Barbe et al, Nucleic Acids Res 2004)
  - Comprehensive knock-out mutant collection (de Berardinis et al, Mol Syst Biol 2008)
  - Phenotyping capability: complete conditional essentiality datasets on several media (de Berardinis et al, Mol Syst Biol 2008)
16 Objectives of the thesis
- Develop a framework for the refinement of metabolic models using gene essentiality data
- Application to Acinetobacter baylyi metabolism
  - reconstruct a global metabolic model from its genome annotation
  - assess and refine the model using mutant phenotypes
  - point out poorly understood metabolic events requiring further experimental investigation
17 Outline
- A/ A formal framework for comparing predicted and experimental gene essentialities
- B/ Reconstruction and refinement of A. baylyi metabolic model using mutant phenotypes
- C/ Automated reasoning with metabolic models and essentiality data
18 A/ A formal framework for comparing predicted and experimental gene essentialities
19 Model refinement using experimental data
Improved metabolic reconstruction
20 Formal representation of a metabolic model
- Model refinement using large-scale genetics data requires
  - Computer generation of variants of models
  - Understanding the impact of model variations on phenotype predictions
- Problem
  - Constraint-based models appear to be complex mathematical objects
- An appropriate representation of metabolic models is required to perform automated reasoning with essentiality
21 Formal representation of a metabolic model
- Boolean gene-reaction associations (GPR)
[Figure: genetic background -> GPR -> set of reactions fulfilling the modeling constraints]
[Figure: GPR example with four layers: Gene (g1, g2), Protein (p1, p2), Complex (c1), Reaction (r1, r2). Boolean rules: r1 = g1; r2 = g1 and g2]
22 Formal representation of a metabolic model
- Boolean gene-reaction associations (GPR)
- Set of metabolic reactions (NETWORK)
[Figure: genetic background -> GPR -> set of reactions fulfilling the modeling constraints; metabolites of the medium -> reactions -> producible metabolites]
23 Formal representation of a metabolic model
- Boolean gene-reaction associations (GPR)
- Set of metabolic reactions (NETWORK)
- List of essential biomass precursors (BIOMASS)
[Figure: genetic background -> GPR -> set of reactions fulfilling the modeling constraints; metabolites of the medium -> reactions -> producible metabolites, compared with the essential biomass precursors]
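The three components (GPR, NETWORK, BIOMASS) can be illustrated with a minimal Python sketch. It is not the thesis implementation: a simple reachability check stands in for the constraint-based producibility test, and reaction `r3`, gene `g3`, and the metabolite names are hypothetical. Only the rules `r1 = g1` and `r2 = g1 and g2` come from the slide.

```python
# Sketch only: a reachability check stands in for the constraint-based
# producibility test; r3, g3 and all metabolite names are hypothetical.

# GPR: reaction -> list of alternative gene sets (OR of ANDs, no negation).
GPR = {
    "r1": [{"g1"}],          # r1 = g1
    "r2": [{"g1", "g2"}],    # r2 = g1 and g2
    "r3": [{"g3"}],          # hypothetical side reaction
}
# NETWORK: reaction -> (substrates, products).
NETWORK = {
    "r1": ({"A_ext"}, {"B"}),
    "r2": ({"B"}, {"P"}),
    "r3": ({"B"}, {"C"}),
}
BIOMASS = {"P"}      # essential biomass precursors
MEDIUM = {"A_ext"}   # metabolites of the medium


def active_reactions(gpr, deleted):
    """Reactions that keep at least one fully intact gene set."""
    return {r for r, alternatives in gpr.items()
            if any(not (genes & deleted) for genes in alternatives)}


def producible(network, reactions, medium):
    """Metabolites reachable from the medium through the given reactions."""
    available = set(medium)
    changed = True
    while changed:
        changed = False
        for r in reactions:
            substrates, products = network[r]
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def predicts_growth(deleted=frozenset()):
    """Growth is predicted when every biomass precursor is producible."""
    reactions = active_reactions(GPR, set(deleted))
    return BIOMASS <= producible(NETWORK, reactions, MEDIUM)


for gene in ("g1", "g2", "g3"):
    status = "dispensable" if predicts_growth({gene}) else "essential"
    print(gene, status)
```

Storing each GPR as a list of gene sets keeps the rule free of negation, so deleting a gene can only switch reactions off, never on.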
24 Predicting mutant phenotypes
genetic perturbation
25 Confronting model predictions with experiments
- Comparison of predictions with experiments reveals inconsistencies
26 Classifying inconsistencies according to likely cause and correction type
Model component | False essential | False dispensable
GPR | decrease impact of gene deletion on reaction set: add an alternate enzyme; gene is a non-essential subunit of a complex; reaction may occur spontaneously | increase impact of gene deletion on reaction set: remove an isozyme; form a complex instead of isozyme; gene has an additional essential role
NETWORK | augment reaction set: add an alternate pathway | reduce reaction set: remove or block an alternate pathway
BIOMASS | reduce biomass requirements: remove a biomass precursor | augment biomass requirements: add a biomass precursor
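The two inconsistency types and their correction directions can be written down as a small lookup, sketched here in Python. The gene names and their predicted and observed statuses are hypothetical; the correction directions are copied from the slide.

```python
# Correction direction per model component, as listed on the slide.
CORRECTIONS = {
    "false essential": {
        "GPR": "decrease impact of gene deletion on reaction set",
        "NETWORK": "augment reaction set",
        "BIOMASS": "reduce biomass requirements",
    },
    "false dispensable": {
        "GPR": "increase impact of gene deletion on reaction set",
        "NETWORK": "reduce reaction set",
        "BIOMASS": "augment biomass requirements",
    },
}


def classify(predicted_essential, observed_essential):
    """Compare a model prediction with the experimental phenotype."""
    if predicted_essential and not observed_essential:
        return "false essential"
    if observed_essential and not predicted_essential:
        return "false dispensable"
    return "consistent"


# Hypothetical genes: (predicted essential?, observed essential?)
genes = {
    "geneA": (True, True),
    "geneB": (True, False),
    "geneC": (False, True),
    "geneD": (False, False),
}

inconsistencies = {}
for gene, (predicted, observed) in genes.items():
    kind = classify(predicted, observed)
    if kind != "consistent":
        inconsistencies[gene] = kind

for gene, kind in inconsistencies.items():
    print(gene, kind, "->", CORRECTIONS[kind]["GPR"])
```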
27 B/ Reconstruction and refinement of A. baylyi metabolic model using mutant phenotypes
28 A. baylyi model reconstruction
- Two-step process
  - Identify all metabolic reactions occurring in the cell
  - Adapt representation to modeling requirements
29 1/ Metabolic network reconstruction
30 2/ Adapt to modeling requirements
- Specific developments made for A. baylyi model
  - Automated expansion of generic pathways
  - Inference of enzyme complexes by homology to E. coli
31 Initial model reconstruction
- 859 reactions using 697 metabolites, linked with 787 genes
- 109 metabolites that are exchangeable with the environment
32 Evidence supporting the enzymatic function of model genes
33 Experimental datasets
Dataset 1
- Growth phenotypes of wild-type strain on 190 carbon sources
- Results
  - Growth on 45 carbon sources
  - No growth on remaining 145 carbon sources
Dataset 2
- Genome-wide gene essentialities from A. baylyi mutant collection construction
  - Selection on succinate minimal medium
- Gene essentiality results
(de Berardinis et al, Mol Syst Biol 2008)
34 Iterative refinement of A. baylyi model
[Figure: refinement workflow]
- Initial reconstruction: from genome annotation, pathway databases, literature
- Dataset 1: growth phenotypes of wild-type strain on 190 carbon sources (1 strain x 190 media)
- Model: iAbaylyi v1
35 Model refinement using dataset 1
iAbaylyi v1
86% overall prediction accuracy
24 / 45 (53%) correctly predicted carbon sources
140 / 145 (97%) correctly predicted non carbon sources
36 Iterative refinement of A. baylyi model
[Figure: refinement workflow]
- Initial reconstruction: from genome annotation, pathway databases, literature
- Dataset 1: growth phenotypes of wild-type strain on 190 carbon sources (1 strain x 190 media)
- Model: iAbaylyi v1
- Model accuracy: 88% on dataset 1
37 Iterative refinement of A. baylyi model
[Figure: refinement workflow]
- Initial reconstruction: from genome annotation, pathway databases, literature
- Dataset 1: growth phenotypes of wild-type strain on 190 carbon sources (1 strain x 190 media)
- Model: iAbaylyi v1
- Model accuracy: 88% on dataset 1
- Dataset 2: genome-wide gene essentialities from A. baylyi mutant collection construction (3093 strains x 1 medium)
Gene       Status
ACIAD0001  NA
ACIAD0002  Essential
ACIAD0003  Dispensable
ACIAD0004  Essential
ACIAD0005  Dispensable
ACIAD0006  Dispensable
38 Model refinement using dataset 2
iAbaylyi v2
88% overall prediction accuracy
187 / 251 (75%) correctly predicted essential genes
489 / 516 (95%) correctly predicted dispensable genes
39 Iterative refinement of A. baylyi model
[Figure: same refinement workflow as on slide 37 (initial reconstruction, Dataset 1, iAbaylyi v1 with 88% accuracy on dataset 1, Dataset 2 with its gene essentiality table)]
40 Iterative refinement of A. baylyi model
[Figure: refinement workflow]
- Initial reconstruction: from genome annotation, pathway databases, literature
- Dataset 1: growth phenotypes of wild-type strain on 190 carbon sources (1 strain x 190 media)
- Model: iAbaylyi v1
- Model accuracy: 88% on dataset 1
- Dataset 2: genome-wide gene essentialities from A. baylyi mutant collection construction (3093 strains x 1 medium)
Gene       Status
ACIAD0001  NA
ACIAD0002  Essential
ACIAD0003  Dispensable
ACIAD0004  Essential
ACIAD0005  Dispensable
ACIAD0006  Dispensable
- Dataset 3: growth phenotypes of A. baylyi mutant collection on 8 minimal media, quantitative growth measure (2350 strains x 8 media)
41 Model refinement using dataset 3
iAbaylyi v3
93% overall prediction accuracy
16 / 36 (44%) correctly predicted gene phenotypes with at least 1 essentiality
406 / 419 (97%) correctly predicted gene phenotypes with no essentiality
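The accuracy figures quoted on the three "Model refinement" slides can be rechecked from the counts they give. The short Python sketch below assumes that the overall accuracy pools both categories of each dataset; that assumption is not stated on the slides, but it reproduces the quoted percentages.

```python
# (correct, total) counts as quoted on the slides for each dataset.
RESULTS = {
    "dataset 1": {"carbon sources": (24, 45),
                  "non carbon sources": (140, 145)},
    "dataset 2": {"essential genes": (187, 251),
                  "dispensable genes": (489, 516)},
    "dataset 3": {"at least 1 essentiality": (16, 36),
                  "no essentiality": (406, 419)},
}


def percent(correct, total):
    """Accuracy as a rounded percentage."""
    return round(100 * correct / total)


def overall(counts):
    """Pooled accuracy over all categories (assumed definition)."""
    correct = sum(c for c, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return percent(correct, total)


for name, counts in RESULTS.items():
    details = ", ".join(f"{label}: {percent(c, t)}%"
                        for label, (c, t) in counts.items())
    print(f"{name}: overall {overall(counts)}% ({details})")
```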
42 Iterative refinement of A. baylyi model
[Figure: same refinement workflow as on slide 40 (initial reconstruction, Dataset 1, iAbaylyi v1 with 88% accuracy on dataset 1, Dataset 2 with its gene essentiality table, Dataset 3)]
43 GPR correction example
- ACIAD0661 (hisG) and ACIAD1257 (hisZ) were initially assigned as isozymes of the ATP phosphoribosyltransferase reaction.
- The observed essentiality of both genes suggests that both are necessary for the activity.
- Further examination of the literature confirms that both proteins form an enzymatic complex (Sissler et al, PNAS 1999).
[Figure: pathway PRPP -> phosphoribosyl-ATP -> ... -> histidine -> protein; the ATP phosphoribosyltransferase step carries the GPR "ACIAD0661 OR ACIAD1257". Legend: essential gene or reaction; dispensable gene or reaction; biomass precursor]
44 GPR correction example
[Figure: pathway PRPP -> phosphoribosyl-ATP -> ... -> histidine -> protein, shown before and after correction. The GPR of the ATP phosphoribosyltransferase step changes from "ACIAD0661 OR ACIAD1257" to "ACIAD0661 AND ACIAD1257". Legend: essential gene or reaction; dispensable gene or reaction; biomass precursor]
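The effect of replacing OR with AND in this example can be checked with a few lines of Python. The gene identifiers come from the slide; the rule encoding (a list of gene sets, any one of which suffices) and the helper name `reaction_active` are assumptions made for illustration.

```python
def reaction_active(rule, deleted):
    """rule is a list of gene sets; the reaction stays active if at least
    one gene set contains no deleted gene."""
    return any(not (genes & deleted) for genes in rule)


# Initial assignment: isozymes (ACIAD0661 OR ACIAD1257).
isozymes = [{"ACIAD0661"}, {"ACIAD1257"}]
# Corrected assignment: enzymatic complex (ACIAD0661 AND ACIAD1257).
enzyme_complex = [{"ACIAD0661", "ACIAD1257"}]

for gene in ("ACIAD0661", "ACIAD1257"):
    print(gene,
          "| OR rule, reaction active:", reaction_active(isozymes, {gene}),
          "| AND rule, reaction active:", reaction_active(enzyme_complex, {gene}))
```

Under the OR rule each single deletion leaves the reaction active, so both genes are predicted dispensable. Under the AND rule either deletion inactivates the reaction, which matches the observed essentiality of both genes.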
45 Network correction example
- ACIAD0822-0824 (gatABC) annotated as an aspartyl/glutamyl-tRNA amidotransferase
- gatABC are essential: only way to produce asparagine.
- ACIAD1920 (glnS) catalyzes direct charging of glutamine on its tRNA
- Essentiality of ACIAD1920 suggests that the gatABC pathway is not effective for glutamine
[Figure: routes to charged tRNAs feeding protein synthesis. aspartate -(ACIAD0609)-> aspartate-tRNA(asn) -(ACIAD0822 AND ACIAD0823 AND ACIAD0824)-> asparagine-tRNA(asn) -> protein; glutamate -(ACIAD3371 OR ACIAD0272)-> glutamate-tRNA(gln) -(ACIAD0822 AND ACIAD0823 AND ACIAD0824)-> glutamine-tRNA(gln) -> protein; glutamine -(ACIAD1920)-> glutamine-tRNA(gln). Legend: essential gene or reaction; dispensable gene or reaction; biomass precursor]
46 Network correction example
[Figure: the network of the previous slide shown before and after correction. In the corrected network the route glutamate -(ACIAD3371 OR ACIAD0272)-> glutamate-tRNA(gln) -(ACIAD0822 AND ACIAD0823 AND ACIAD0824)-> glutamine-tRNA(gln) no longer appears; the aspartate route via ACIAD0609 and gatABC and the direct glutamine charging by ACIAD1920 are kept. Legend: essential gene or reaction; dispensable gene or reaction; biomass precursor]
47 A. baylyi model refinement
48 Online prediction of mutant phenotypes
(Le Fèvre et al, Bioinformatics 2009)
49 C/ Automated reasoning with metabolic models and essentiality data
50 Automated reasoning on gene-reaction associations
GPR
- Use phenotypes as specifications for gene-reaction associations
  - Assume NETWORK and BIOMASS parts of the model are correct
- For each inconsistency
  - search all GPRs compatible with experimental data
51 1/ Deduce impact scenarios from phenotypes
- Equivalent view of gene-reaction associations
  - Deletion impact
  - Impact (deletion of G1,...,Gn) = R1,...,Rp inactivated
- Key idea
  - Phenotypes of reaction deletions can be predicted
  - Compatible deletion impacts must follow the rules
    - lethal gene deletions must impact an essential reaction set
    - viable gene deletions must not impact any essential reaction set
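The two compatibility rules can be sketched in Python as follows. This is an illustration under stated assumptions, not the method of the thesis: an "essential reaction set" is taken to be a set of reactions whose joint inactivation abolishes growth, "impacting" it is taken to mean inactivating all of its reactions, and the reaction names are hypothetical.

```python
from itertools import combinations


def is_lethal(inactivated, essential_sets):
    """Assumption: an impact is lethal when it covers at least one whole
    essential reaction set."""
    return any(ess <= inactivated for ess in essential_sets)


def compatible(impact, observed_lethal, essential_sets):
    """Lethal deletions must impact an essential reaction set;
    viable deletions must not impact any."""
    return is_lethal(impact, essential_sets) == observed_lethal


def impact_scenarios(candidates, observed_lethal, essential_sets):
    """All subsets of the candidate reactions that agree with the phenotype."""
    ordered = sorted(candidates)
    scenarios = []
    for size in range(len(ordered) + 1):
        for subset in combinations(ordered, size):
            if compatible(frozenset(subset), observed_lethal, essential_sets):
                scenarios.append(frozenset(subset))
    return scenarios


# Hypothetical model: R1 alone is essential; R2 and R3 back each other up.
ESSENTIAL_SETS = [frozenset({"R1"}), frozenset({"R2", "R3"})]

print(impact_scenarios({"R2", "R3"}, True, ESSENTIAL_SETS))
print(impact_scenarios({"R2", "R3"}, False, ESSENTIAL_SETS))
```

For a gene whose candidate reactions are R2 and R3, a lethal phenotype leaves a single scenario (both reactions inactivated), while a viable phenotype leaves three (no impact, R2 only, R3 only).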
52 1/ Deduce impact scenarios from phenotypes
- For each inconsistency, generate all possible impact scenarios
- Closed-world assumption
  - the set of genes potentially linked to a reaction is known
53 1/ Deduce impact scenarios from phenotypes
[Figure: impact scenario 1]
54 1/ Deduce impact scenarios from phenotypes
[Figure: impact scenario 2]
55 1/ Deduce impact scenarios from phenotypes
[Figure: impact scenario 3]
56 1/ Deduce impact scenarios from phenotypes
[Figure: impact scenario 4]
57 2/ Implement proposed impacts with GPR
- Choose an impact scenario
- For each reaction, find Boolean rules implementing the impacts
  - analogy to logic circuit design
- GPR specificity: no negation rule
  - monotonic increasing Boolean function (F(0,0) ≤ F(1,0) ≤ F(1,1))
  - constrains the possible implementations
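The monotonicity condition can be tested mechanically. The Python sketch below checks that switching any single input from 0 to 1 never switches the output from 1 to 0; the function name `is_monotone` and the example rules are assumptions made for illustration.

```python
from itertools import product


def is_monotone(rule, n_genes):
    """True if adding a gene (0 -> 1) never deactivates the reaction."""
    for x in product((0, 1), repeat=n_genes):
        for i in range(n_genes):
            if x[i] == 0:
                y = x[:i] + (1,) + x[i + 1:]   # same input with gene i present
                if bool(rule(*x)) and not bool(rule(*y)):
                    return False
    return True


print(is_monotone(lambda g1, g2: g1 and g2, 2))       # complex
print(is_monotone(lambda g1, g2: g1 or g2, 2))        # isozymes
print(is_monotone(lambda g1, g2: g1 and not g2, 2))   # uses negation
```

Rules built only from AND and OR pass the check, whereas the rule with a negation fails it, which is why negation-free GPRs restrict the possible implementations.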
58 2/ Implement proposed impacts with GPR
- Specifications for R1 (scenario 1)
  - G1 deletion does not impact R1
  - G2 deletion does not impact R1
  - G3 deletion does impact R1
[Figure: genes G1 to G4 linked to reactions R1 and R2]
Truth table from the specifications (? = not yet determined):
G1 G2 G3 GPR
0 0 0 ?
1 0 0 ?
0 1 0 ?
1 1 0 0
0 0 1 ?
1 0 1 1
0 1 1 1
1 1 1 ?
Truth table after applying monotony:
G1 G2 G3 GPR
0 0 0 0
1 0 0 0
0 1 0 0
1 1 0 0
0 0 1 ?
1 0 1 1
0 1 1 1
1 1 1 1
59 2/ Implement proposed impacts with GPR
- Generate all possible cases
G1 G2 G3 GPR
0 0 0 0
1 0 0 0
0 1 0 0
1 1 0 0
0 0 1 ?
1 0 1 1
0 1 1 1
1 1 1 1
GPR = G3 (if ? = 1)
GPR = G3 and (G1 or G2) (if ? = 0)
60 2/ Implement proposed impacts with GPR
- Generate all possible cases
- Choose closest behavior to the original GPR
- Propose experiment to fully determine the Boolean rule
  - G1, G2 double deletion here (the undetermined row 0 0 1)
G1 G2 G3 GPR
0 0 0 0
1 0 0 0
0 1 0 0
1 1 0 0
0 0 1 ?
1 0 1 1
0 1 1 1
1 1 1 1
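The completion of the truth table can be reproduced by brute force. The Python sketch below takes the three specifications for R1, enumerates every way of filling the remaining rows, and keeps only the monotone tables. It is a sketch of the idea, not the AutoGPR implementation, and all function names are assumptions.

```python
from itertools import product

INPUTS = list(product((0, 1), repeat=3))   # (G1, G2, G3): 1 = gene present


def leq(x, y):
    """x <= y componentwise: y has at least the genes of x."""
    return all(a <= b for a, b in zip(x, y))


def monotone(table):
    """Adding genes must never switch the reaction off."""
    return all(table[x] <= table[y]
               for x in INPUTS for y in INPUTS if leq(x, y))


def compatible_tables(specs):
    """All monotone truth tables that agree with the specifications."""
    free = [x for x in INPUTS if x not in specs]
    tables = []
    for values in product((0, 1), repeat=len(free)):
        table = dict(specs)
        table.update(zip(free, values))
        if monotone(table):
            tables.append(table)
    return tables


# Specifications for R1: G1 or G2 deletion has no impact, G3 deletion does.
SPECS = {(0, 1, 1): 1, (1, 0, 1): 1, (1, 1, 0): 0}

tables = compatible_tables(SPECS)
undetermined = [x for x in INPUTS if len({t[x] for t in tables}) > 1]

print(len(tables), "compatible monotone tables")
print("undetermined rows:", undetermined)
```

Two tables survive, corresponding to the rules G3 and G3 and (G1 or G2). They differ only on the row (0, 0, 1), where G1 and G2 are both absent.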
61 Comparing AutoGPR proposals with expert interpretations
- Comparison with manual corrections of A. baylyi model
62 Comparing AutoGPR proposals with expert interpretations
- Comparison for S. cerevisiae model
  - iND750 model predictions compared with gene essentiality data on 8 environments (Duarte et al, Genome Res 2004)
  - Inconsistent predictions were manually interpreted (not corrected)
63 Number of generated proposals for A. baylyi
64 Reducing complexity
- First, simply test the existence of GPR corrections
- Require similar reactions to have similar GPRs
65 Examining corrections across environments
- GPR corrections can contradict each other across
environments
(Durot et al, BMC Syst Biol 2008)
- Possible interpretations
- Inconsistencies between experimental conditions
- Error in NETWORK or BIOMASS model components
- GPR are not constant across environments
- Conditional expression of genes
- Regulatory interactions intervene
66 Conclusion and perspectives
67 Main contributions
- Reconstruction of a global metabolic model of A. baylyi
- Development of a framework for interpreting inconsistent growth phenotype predictions
- Systematic interpretation of A. baylyi mutant phenotypes using its metabolic model
- Design of an automated method to reason on GPR corrections from gene essentialities
68 Perspectives
- A. baylyi metabolic model
  - Tool to integrate further experimental data
  - RNA-seq, metabolomics on A. baylyi and mutants
- Metabolic model reconstruction
  - Automate the reconstruction process from genome annotation
  - Systematically assess model correctness using high-throughput experimental data
  - => Microme European project to be started
69 Acknowledgments
Supervisors
Vincent Schachter, Jean Weissenbach
Metabolic Thesaurus experimental work
Marcel Salanoubat, Véronique de Berardinis, Alain Perret, Marielle Besnard, Christophe Lechaplais, Agnès Pinet
Acinetobacter baylyi annotation
Claudine Médigue, David Vallenet, Valérie Barbe, Georges Cohen, Nuria Fonknechten, Annett Kreimeyer
Computational Systems Biology group
François Le Fèvre, Gilles Vieira, Richard Baran, Pierre-Yves Bourguignon, Serge Smidtas (former members)
70 Discussion