Title: The Bacterial Pan-Genome, C. Donati, Novartis Vaccines
1The Bacterial Pan-Genome
C. Donati
Novartis Vaccines & Diagnostics
2Outline
- Conventional approach to vaccine development
- The reverse vaccinology approach: from sequence to vaccine candidates
- When a single sequence is not sufficient: the GBS case
- An example of what you can discover by comparing more strains: the pili in Streptococci
- How many distinct genes can you find in a single species?
- The bacterial pan-genome
- Implications for vaccine design
- Population-based vaccine design
3Conventional vaccine development
Killed vaccines
Cultivate Microorganism
Live attenuated vaccines
Subunit vaccines
Test immunogenicity
Test Convalescent sera
Antigen selection
Purify components
5-15 years
Identify components
Clone genes
5-15 Years
Immunogenicity testing in animal models
VACCINE DEVELOPMENT
Vaccine
4Current approaches to a novel MenB vaccine
5The reverse vaccinology approach: the MenB vaccine
- 2,272,325 bp
- GC content 51.3%
- 2,158 predicted genes (83% of genome)
- 1,158 assigned putative functions (54%)
6Computer assisted antigen prediction
Proteins are selected on the basis of their
localization. Surface-associated or secreted
proteins are considered good vaccine candidates,
due to their exposure to the host immune system.
- Secreted proteins
- Outer membrane proteins
- Porin-like structures
- Lipoproteins
- Periplasmic proteins
- Inner membrane proteins
7Selection procedure flowchart
ORF prediction on the partial genomic
sequence (Glimmer)
Homology search for all the predicted
ORFs (PSI-BLAST, FASTA)
Hits found (function assigned)
No hits found (hypothetical proteins)
Localization prediction (PSORT, SignalP, TMPRED)
SELECTED
DISCARDED
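The flowchart above can be read as a filter over predicted ORFs. The following is a minimal Python sketch of that reading, not the pipeline actually used. It assumes each ORF has already been annotated upstream; the `Orf` record, the localization labels, and the `EXPOSED` set are hypothetical names modeled on the localization classes listed on slide 6, not output formats of Glimmer, PSI-BLAST, or PSORT. It also assumes that ORFs with and without homology hits both proceed to localization prediction.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical labels, modeled on the localization classes named on slide 6.
EXPOSED = {
    "secreted", "outer_membrane", "porin_like",
    "lipoprotein", "periplasmic", "inner_membrane",
}

@dataclass
class Orf:
    name: str
    homolog_function: Optional[str]  # None means no hit (hypothetical protein)
    localization: str                # label assigned by a localization predictor

def select_candidates(orfs: List[Orf]) -> Tuple[List[Orf], List[Orf]]:
    """Split ORFs into (selected, discarded) by predicted localization only."""
    selected, discarded = [], []
    for orf in orfs:
        if orf.localization in EXPOSED:
            selected.append(orf)
        else:
            discarded.append(orf)
    return selected, discarded

# Toy input: made-up ORFs, not real MenB annotations.
example_orfs = [
    Orf("orf1", "porin", "outer_membrane"),
    Orf("orf2", None, "secreted"),
    Orf("orf3", "ribosomal protein", "cytoplasmic"),
    Orf("orf4", None, "inner_membrane"),
]
selected, discarded = select_candidates(example_orfs)
print([o.name for o in selected], [o.name for o in discarded])
```

Note that a hypothetical protein (no homology hit) can still be selected, since in this sketch the decision depends on localization alone.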
8MenB vaccine: the genomic approach
- 600 potential vaccine candidates identified
- 350 proteins successfully expressed in E. coli
- 344 proteins purified and used to immunize mice
- 355 sera tested
- 91 novel surface-exposed proteins identified
- 28 novel proteins have bactericidal activity
Clinical trial
5 vaccine candidates
9Reverse vaccinology is now a routine discovery
approach
Group A Streptococcus
Group B Streptococcus
Chlamydia trachomatis and C. pneumoniae
Pneumococcus
Tuberculosis
Yersinia pestis
Gonococcus
Porphyromonas gingivalis
ExPEC (extraintestinal pathogenic E. coli)
Staphylococcus
Malaria
10Genetic variability of GBS isolates compared to a
reference strain
CGH against multiple isolates shows that
approximately 18% of the reference genome is
absent from at least one strain
11The eight sequenced GBS strains
12Comparative analysis of the eight GBS genomes
13Lessons learned from genomics
- Genetic variability within a bacterial species is much larger than expected.
- The genome of each isolate of a bacterial population contains some features that are not shared by other strains.
- How many genomes are necessary to define the genetic content of a bacterial species?
- How large is the genetic repertoire of a single bacterial species?
- How can we classify the isolates of a bacterial species?
14How many new genes will you find if you sequence
one more strain?
Combinations: $\frac{8!}{(n-1)!\,(8-n)!}$ (56 for n = 2, 168 for n = 3, 280 for n = 4)
[Plot: number of new genes discovered vs. number of genomes sequenced, n]
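The combination count on this slide can be checked numerically. One reading of the formula, offered here as an assumption, is that it counts every way of choosing one "new" genome out of the 8 and comparing it against a set of n-1 "previous" genomes drawn from the other 7. The Python sketch below computes the count and enumerates the pairs; the function names are illustrative.

```python
from itertools import combinations
from math import factorial

def n_measurements(total: int, n: int) -> int:
    """total! / ((n-1)! (total-n)!): ways to add an n-th genome to n-1 others."""
    return factorial(total) // (factorial(n - 1) * factorial(total - n))

def enumerate_measurements(strains, n):
    """Yield (new_strain, previous_strains) pairs for the n-th genome added."""
    for new in strains:
        others = [s for s in strains if s != new]
        for previous in combinations(others, n - 1):
            yield new, previous

strains = list(range(8))  # placeholder labels for the eight sequenced strains
for n in (2, 3, 4):
    count = n_measurements(8, n)
    # The closed formula and the explicit enumeration must agree.
    assert count == sum(1 for _ in enumerate_measurements(strains, n))
    print(n, count)
```

For n = 2, 3, 4 this gives 56, 168 and 280, the values shown on the plot.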
15The number of new genes is well described by an
exponentially decaying function
16The number of new genes does not go to zero
as the number of genomes increases
- Average number of new genes found in each new genome sequenced: 33 ± 10 (95% CL)
17The GBS Pangenome
Since each new strain introduces new genes, the
size of the pan-genome grows linearly with the
number of the sequenced strains.
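A short Python sketch can illustrate why a decaying new-gene count with a non-zero asymptote leads to linear growth. Only the asymptote of 33 new genes per genome comes from the slides; the amplitude, the decay constant, and the size of the first genome are hypothetical values chosen for illustration, not fitted parameters.

```python
import math

def new_genes(n, asymptote=33.0, amplitude=400.0, decay=1.5):
    """Expected number of new genes contributed by the n-th genome (n >= 2).

    asymptote is taken from the slides; amplitude and decay are made up.
    """
    return asymptote + amplitude * math.exp(-n / decay)

def pan_genome_size(n, first_genome=2000.0):
    """Genes in the first genome (hypothetical) plus new genes from genomes 2..n."""
    return first_genome + sum(new_genes(k) for k in range(2, n + 1))

for n in (2, 8, 50, 51):
    print(n, round(pan_genome_size(n), 1))

# Once the exponential term has died out, each extra genome adds ~33 genes,
# so the cumulative curve becomes a straight line with slope equal to the asymptote.
print(round(pan_genome_size(51) - pan_genome_size(50), 3))
```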
18The Core Genome
- The core genome of a species is the set of genes common to all isolates
- The core genome of GBS includes 1800 distinct genes
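In set terms, the core genome is the intersection of the gene sets of all isolates and the pan-genome is their union. A minimal Python sketch, assuming genes have already been clustered into shared identifiers (the strain names and gene labels below are toy values, not GBS data):

```python
def pan_core_dispensable(genomes):
    """genomes: dict mapping strain name -> set of gene identifiers (non-empty dict).

    Returns (pan, core, dispensable) as sets.
    """
    gene_sets = list(genomes.values())
    pan = set().union(*gene_sets)            # genes found in at least one strain
    core = set.intersection(*gene_sets)      # genes found in every strain
    dispensable = pan - core                 # genes missing from at least one strain
    return pan, core, dispensable

# Toy example with made-up gene labels.
toy_genomes = {
    "strainA": {"g1", "g2", "g3", "g4"},
    "strainB": {"g1", "g2", "g3", "g5"},
    "strainC": {"g1", "g2", "g6"},
}
pan, core, dispensable = pan_core_dispensable(toy_genomes)
print(sorted(pan), sorted(core), sorted(dispensable))
```

In practice the hard part is deciding when two genes from different strains count as "the same" gene; this sketch takes that clustering as given.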
19Functional categories encoded by shared and
unique genes
- Core genes encode the main metabolic functions
- The dispensable genome contains a high fraction of proteins of unknown function or associated with mobile elements
20Functional classes in pan-genome components
Mobile extrachromosomal elements, hypothetical.
Cell envelope, mobile elements, unknown function.
Housekeeping, regulatory functions, cell envelope, transport and binding.
21Universal protection of a protein-based GBS
vaccine
The protective vaccine was formulated by
including 4 antigens, none of which is shared by
all strains
22Two of the protective antigens were found to be
components of pili structures
- Presence of pili in pathogenic streptococci was unknown (two papers published in Science, July 1st 2005)
- The GBS program allowed the discovery of pili also in GAS (group A streptococcus) and pneumococcus
23Pili in pneumococcus
24Streptococcus Pathogenicity Islands encoding pili
25The pan-genome in different species
An open pan-genome: E. coli
A closed pan-genome: B. mallei
Exponential model
Power-law model
Power-law model
- Recent data show that many species (B. pseudomallei, E. coli, GAS, GBS, S. aureus, Synechococcus) have a pan-genome much larger than the genome of single strains, while the pan-genome of some species (B. anthracis, B. mallei, Chlamydia) can be determined by sequencing a small number of strains
- For large numbers of strains, the pan-genome probably grows more slowly than previously estimated
26Open and closed Pangenome
27Are closed species real species?
28Homogeneous vs heterogeneous species
The choice of the sample can cause the appearance
of characteristic features in the pangenome plot
29Source of genetic diversity
30Future developments: population-driven reverse vaccinology
- If new vaccines are going to be formulated using non-core antigens, we need to understand the population structure of the pathogen, in order to select representative antigens from populations relevant for disease
- Genetic markers of bacterial population structure
  - Serotype
    - Different serotypes are determined by variations in a small number of genes in the capsule biosynthesis locus
    - New serotypes arise by a large number of distinct mechanisms
  - MLST type
    - MLST type depends on the allelic profile of a small number of housekeeping genes
    - Single Locus Variants represent the evolution of an ancestral clone
31Strain clustering based on gene content
[Figure labels: strains; absent / present; dispensable genes; core genes]
- The genetic content of the dispensable genome
does not correlate with serotype
32Distribution of rlrA-positive Pneumococcus by
Serotype
- Classification of strains in serotypes does not correlate with pilus presence
- There are some serotypes that are always pilus-negative, but no serotype that is always pilus-positive
33Clonal Complexes
Each isolate has an allelic profile, i.e. a
combination of alleles at the 7 loci, which
defines a sequence type (ST). Two STs are
connected when they differ at a single locus.
A group of connected STs forms a clonal complex.
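The definition above amounts to finding connected components in a graph whose nodes are STs and whose edges join single locus variants. A minimal Python sketch using a small union-find structure follows; the ST names and allele numbers are invented for illustration, and real MLST tools may apply additional rules (for example about founders) that are not modeled here.

```python
from itertools import combinations

def differing_loci(profile_a, profile_b):
    """Number of loci at which two allelic profiles carry different alleles."""
    return sum(a != b for a, b in zip(profile_a, profile_b))

def clonal_complexes(profiles):
    """profiles: dict mapping ST name -> tuple of allele numbers (7 loci).

    Returns a list of sets; each set is one group of connected STs.
    """
    parent = {st: st for st in profiles}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for st_a, st_b in combinations(profiles, 2):
        if differing_loci(profiles[st_a], profiles[st_b]) == 1:
            parent[find(st_a)] = find(st_b)  # merge the two groups

    groups = {}
    for st in profiles:
        groups.setdefault(find(st), set()).add(st)
    return list(groups.values())

# Toy allelic profiles (made up): ST1-ST2 and ST2-ST3 are single locus variants.
toy_profiles = {
    "ST1": (1, 1, 1, 1, 1, 1, 1),
    "ST2": (1, 1, 1, 1, 1, 1, 2),
    "ST3": (1, 1, 1, 1, 1, 2, 2),
    "ST4": (5, 5, 5, 5, 5, 5, 5),
}
print(sorted(sorted(group) for group in clonal_complexes(toy_profiles)))
```

ST1 and ST3 differ at two loci, yet they end up in the same complex because both are connected to ST2.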
34MLST describes the micro-evolution patterns of
bacterial species
35Distribution of Pneumococcus isolates by Clonal
Complex
There is a good correlation between clonal
complex and pilus presence
36Distribution of rlrA islet Pneumococcus in
Clonal Complex 176
37Conclusions and open questions
- The reverse vaccinology approach is a powerful tool to identify new candidates for vaccine development
- A single strain is not sufficient to exhaust the variability of a bacterial species
- The pan-genome of a bacterial species represents its genetic repertoire
- In many cases, a good level of protection can be obtained only by a combination of antigens not belonging to the core genome
- Understanding of the population structure of a bacterial species is key to the rational formulation of a protective vaccine
- What is the best way to describe the population structure of bacterial species?
- Do genes exposed to the immune system behave like neutral loci?
- How important are the effects of recombination?
38Acknowledgements
Novartis Vaccines & Diagnostics
- Rino Rappuoli, Duccio Medini, Vega Masignani, Antonello Covacci: bioinformatics and pan-genome
- Guido Grandi, John Telford: GBS
- Michéle Barocchi, Ilaria Ferlenghi: Pneumo
- Mariagrazia Pizza: MenB
The Institute for Genomic Research
- Claire Fraser, Hervé Tettelin, Samuel Angiuoli, Brett Whitty: MenB, GBS sequencing and pan-genome
Harvard University
- Michael Cieslewicz, Michael Wessels, Dennis Kasper: GBS
Karolinska Institutet
- Normark S., Henriques-Normark B.: Pneumo
39Different models of species variability
[Diagram: genomic diversity in closed and open pan-genome species]
CLOSED pan-genome species: core, dispensable
OPEN pan-genome species: core, dispensable, genomic halo (strain-specific)
40The Pan-genome / Dark-matter analogy
Less than thirty years ago astrophysicists
discovered that every galaxy core is surrounded
by a dark-matter halo. Now we know that dark
matter accounts for more than 98% of all the
matter in the universe, and that these galactic
halos are likely to be composed of different
kinds of still unknown matter. This was the
biggest revolution in recent cosmology and
astrophysics.
In the case of bacterial species with an open
pan-genome, today we observe the presence of a
small fraction of strain-specific genes that
increases the pan-genome size as long as new
independent strains are available: a kind of
genomic halo. Also, while some of these genes
are related to phages, we find that some are
probably not: they are brand new genes of
unknown function.
41More examples of Pangenome E.coli
Exponential model: $A + B\,e^{-n/C}$
Power-law model: $A + B\,n^{-C}$
For larger numbers of strains, a power law
describes the data better. Its asymptotic value
is smaller, but still different from zero.
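The two functional forms on this slide can be compared directly. The sketch below assumes the lost operators are additions, so that A is the asymptotic number of new genes per genome in both models; all parameter values are hypothetical and chosen only to show the qualitative behavior, not taken from the E. coli fit.

```python
import math

def exponential_model(n, A, B, C):
    """New genes at the n-th genome: A + B * exp(-n / C)."""
    return A + B * math.exp(-n / C)

def power_law_model(n, A, B, C):
    """New genes at the n-th genome: A + B * n**(-C)."""
    return A + B * n ** (-C)

# Hypothetical parameters; the power law is given the smaller asymptote A.
exp_params = dict(A=40.0, B=500.0, C=2.0)
pow_params = dict(A=10.0, B=500.0, C=0.8)

for n in (2, 5, 10, 50, 200):
    print(n,
          round(exponential_model(n, **exp_params), 1),
          round(power_law_model(n, **pow_params), 1))
```

Both curves level off at their own A, but the power law approaches it much more slowly, which is why the two models are hard to tell apart until many strains have been sequenced.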
42Localization of RrgA
43Pneumococcus pili
44Pilus distribution in S. pneumoniae determined by
CGH
45The Pasteur principles to develop vaccines
Isolate, inactivate, inject the causative
organism
46REVERSE VACCINOLOGY: GROUP B STREPTOCOCCUS
Group B Streptococcus (GBS)
- Responsible for most meningitis in newborn infants (0-2 months)
- Most of the infections occur during delivery
- Capsular polysaccharides elicit protective antibodies which, if present in the mother, are transferred to the baby through the placenta and prevent infection
Problem associated with vaccine development
- Nine major serotypes (different capsules) so far classified in GBS
Objective of the project
- Identify a few conserved protein antigens capable of eliciting protective (opsonic) antibodies
47Pneumococcus pili are important in virulence
48Gene arrangement in MenB MC58
49Genetic distance based on gene content
The genetic distance between strains of the same
serogroup is not different from the distance
between serogroups
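The slide does not say how the distance was computed. As an illustration only, the Python sketch below uses the Jaccard distance between gene sets, a common choice for presence/absence data; the choice of metric, the function name, and the toy gene labels are all assumptions.

```python
def gene_content_distance(genes_a, genes_b):
    """Jaccard distance: share of the combined gene set found in only one strain."""
    union = genes_a | genes_b
    if not union:
        return 0.0  # two empty gene sets are treated as identical
    return len(genes_a ^ genes_b) / len(union)

# Toy gene sets with made-up labels.
strain_x = {"g1", "g2", "g3"}
strain_y = {"g2", "g3", "g4"}
print(gene_content_distance(strain_x, strain_y))
```

Comparing the distribution of such distances within and between serogroups is one way to test the statement on this slide.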
50Reverse Vaccinology approach to GBS vaccine
- A serotype V GBS strain (2603) was sequenced
- 2175 ORFs predicted
- 665 proteins were predicted to be surface exposed
- 473/665 were predicted to contain fewer than 5 transmembrane regions
- 356 proteins were successfully expressed and purified
- 296/356 were purified as soluble proteins
- Groups of 4 CD1 mice were immunized with all 356 purified proteins
51How to sample genetic variability: Multi Locus
Sequence Typing (MLST)
- Genome variability is described by clonal complexes, i.e. groups of isolates that originated from a founder clone
- Clonal complexes are defined by sequencing a small number (typically 7) of housekeeping genes
52Open questions
53MenB MC58 Main features
- 2,272,325 bp
- GC content 51.3%
- 2,158 predicted genes (covering 83% of the genome)
- 1,158 assigned putative functions (54%)
- Genes of unknown function in MenB
  - 245 (16%) predicted genes matched gene products of unknown function in other species
  - 532 (25%) have no homology to any sequence present in the database
54Future developments: population-driven reverse
vaccinology
If new vaccines are going to be formulated using
non-core antigens, we need to understand the
population structure of the pathogen, in order to
select representative antigens from populations
relevant for disease
Panmictic (N. gonorrhoeae)
Clonal (Salmonella)
Epidemic (N. meningitidis)
55Genetic markers of bacterial population structure
- Serotype
  - Different serotypes are determined by variations in a small number of genes in the capsule biosynthesis locus
  - New serotypes arise by a large number of distinct mechanisms
- MLST type
  - MLST type depends on the allelic profile of a small number of housekeeping genes
  - Single Locus Variants represent the evolution of an ancestral clone