Title: Pathogenomics Project
1. Pathogenomics Project
- Cross-Domain Horizontal Gene Transfer Analysis
- Horizontal Gene Transfer: Identifying Pathogenicity Islands
2. Pathogenomics
Goal: Identify previously unrecognized mechanisms of microbial pathogenicity using a combination of informatics, evolutionary biology, microbiology and genetics.
3. Explosion of data
- 26 of the 36 publicly available bacterial genome sequences are for pathogens
- Approximately 24,000 pathogen genes with no known function!
- 177 bacterial genome projects in progress
Data as of June 2001
4. Bacterial Pathogenicity
- Processes of microbial pathogenicity at the molecular level are still minimally understood.
- Pathogen proteins have been identified that manipulate host cells by interacting with, or mimicking, host proteins.
5. Yersinia Type III secretion system
6. Approach
Idea: Could we identify novel virulence factors by identifying bacterial pathogen genes that are more similar to host genes than you would expect based on phylogeny?
7. Approach
- Search pathogen genes against databases. Identify those with eukaryotic similarity.
- Modify screening method/algorithm.
- Evolutionary significance: Horizontal transfer? Similar by chance?
- Prioritize for biological study:
  - Previously studied in the laboratory?
  - Can UBC microbiologists study it?
  - C. elegans homolog?
8. Genome data for
Anthrax; Cat scratch disease; Chancroid; Chlamydia; Cholera; Dental caries; Diarrhea (E. coli etc.); Diphtheria; Epidemic typhus; Mediterranean fever; Gastroenteritis; Gonorrhea; Legionnaires' disease; Leprosy; Leptospirosis; Listeriosis; Lyme disease; Melioidosis; Meningitis; Necrotizing fasciitis; Paratyphoid/enteric fever; Peptic ulcers and gastritis; Periodontal disease; Plague; Pneumonia; Salmonellosis; Scarlet fever; Shigellosis; Strep throat; Syphilis; Toxic shock syndrome; Tuberculosis; Tularemia; Typhoid fever; Urethritis; Urinary tract infections; Whooping cough; Hospital-acquired infections
9. Bacterial Pathogens
- Chlamydophila psittaci: Respiratory disease, primarily in birds
- Mycoplasma mycoides: Contagious bovine pleuropneumonia
- Mycoplasma hyopneumoniae: Pneumonia in pigs
- Pasteurella haemolytica: Cattle shipping fever
- Pasteurella multocida: Cattle septicemia, pig rhinitis
- Ralstonia solanacearum: Plant bacterial wilt
- Xanthomonas citri: Citrus canker
- Xylella fastidiosa: Pierce's disease (grapevines)
Bacterial wilt
10. Approach
- Prioritized candidates
- Study function of homolog in model host (C. elegans)
- Study function of gene in bacterium; infection of mutant in model host
- Collaborations with others
[Diagram labels: C. elegans, DATABASE, World Research Community]
11. Interdisciplinary group
- Informatics/Bioinformatics
  - BC Genome Sequence Centre
  - Centre for Molecular Medicine and Therapeutics
- Evolutionary Theory
  - Dept. of Zoology
  - Dept. of Botany
  - Canadian Institute for Advanced Research
- Coordinator
- Pathogen Functions
  - Dept. of Microbiology
  - Biotechnology Laboratory
  - Dept. of Medicine
  - BC Centre for Disease Control
- Host Functions
  - Dept. of Medical Genetics
  - C. elegans Reverse Genetics Facility
  - Dept. of Biological Sciences, SFU
12. Development of first database: Sequence similarity-based approach
- For each complete bacterial and eukaryote genome: BLASTP (and MSP Crunch) of all deduced proteins against the non-redundant SWALL database
- Overlay NCBI taxonomy information to form an ACEDB database
- Query the database for bacterial proteins whose top-scoring hit is eukaryotic (and eukaryotic proteins whose top hit is bacterial)
- Perform a similar query, but filtering different taxonomic groups from the analysis
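The top-hit query described on this slide can be illustrated with a minimal Python sketch. It is not the project's ACEDB implementation: the `Hit` record, its field names, and the toy scores are all hypothetical, and it assumes each BLASTP hit has already been annotated with the domain and taxonomic group of its subject sequence.

```python
from collections import namedtuple

# Hypothetical record for one BLASTP hit; field names are illustrative only.
Hit = namedtuple("Hit", ["query", "subject", "subject_domain", "subject_taxon", "score"])


def top_hits(hits, exclude_taxa=frozenset()):
    """Return {query: best-scoring Hit}, ignoring hits from excluded taxa."""
    best = {}
    for hit in hits:
        if hit.subject_taxon in exclude_taxa:
            continue  # taxonomic group filtered out of the analysis
        current = best.get(hit.query)
        if current is None or hit.score > current.score:
            best[hit.query] = hit
    return best


def cross_domain_queries(hits, target_domain="Eukaryota", exclude_taxa=frozenset()):
    """List queries whose top-scoring remaining hit falls in target_domain."""
    best = top_hits(hits, exclude_taxa)
    return sorted(q for q, h in best.items() if h.subject_domain == target_domain)


# Toy hits for two bacterial query proteins (made-up names and scores).
hits = [
    Hit("protA", "euk1", "Eukaryota", "Viridiplantae", 310.0),
    Hit("protA", "bac1", "Bacteria", "Proteobacteria", 250.0),
    Hit("protB", "bac2", "Bacteria", "Firmicutes", 400.0),
    Hit("protB", "euk2", "Eukaryota", "Metazoa", 120.0),
]

print(cross_domain_queries(hits))
print(cross_domain_queries(hits, exclude_taxa={"Firmicutes"}))
```

In a real run one would presumably also exclude the query organism's own taxon, since otherwise every protein's top hit is itself or a close relative; that is one use of the `exclude_taxa` filter in this sketch.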
13. BAE-watch Database: Bacterial proteins with unusual similarity to eukaryotic proteins
14. Problem: Proteins highly conserved in the three domains of life
A top hit to a protein from another domain may occur by chance. The StepRatio score helps detect these cases.
Example: Glucose-6-Phosphate Reductase
15. Example of a case with a high StepRatio: Enoyl ACP reductase
16. BAE-watch Database: Bacterial proteins with unusual similarity to eukaryotic proteins
17. Haemophilus influenzae Rd-KW20 proteins most strongly matching eukaryotic proteins
18. PhyloBLAST: a tool for analysis
Brinkman et al. (2001) Bioinformatics 17:385-387.
20. Trends in this Sequence-based Analysis
- Identifies the strongest cases of lateral gene transfer between bacteria and eukaryotes
- Most common cross-domain horizontal transfers:
  - Between bacteria and unicellular eukaryotes
- Identifies nuclear genes with potential organelle origins
- A control: the method identifies all previously reported Chlamydia trachomatis plant-like genes.
21. First case: Bacterium to Eukaryote Lateral Transfer
N-acetylneuraminate lyase (NanA) of the protozoan Trichomonas vaginalis is 92-95% similar to NanA of Pasteurellaceae bacteria.
de Koning et al. (2000) Mol. Biol. Evol. 17:1769-1773
22. N-acetylneuraminate lyase: role in pathogenicity?
- Pasteurellaceae
  - Mucosal pathogens of the respiratory tract
- T. vaginalis
  - Mucosal pathogen, causative agent of the STD trichomoniasis
23. N-acetylneuraminate lyase (sialic acid lyase, NanA)
Involved in sialic acid metabolism:
- Sialidase: hydrolysis of glycosidic linkages of terminal sialic residues in glycoproteins and glycolipids, yielding free sialic acid
- Transporter: free sialic acid
- NanA: N-acetyl-D-mannosamine + pyruvate
Role in bacteria: Proposed to parasitize the mucous membranes of animals for nutritional purposes
Role in Trichomonas: ?
24. Another case: A Sensor Histidine Kinase for a Two-component Regulation System
Signal transduction:
- Histidine kinases are common in bacteria
- Ser/Thr/Tyr kinases are common in eukaryotes
However, a histidine kinase was recently identified in fungi, including the pathogens Fusarium solani and Candida albicans. How did it get there?
[Image: Candida]
25. Streptomyces Histidine Kinase: The Missing Link?
Brinkman et al. (2001) Infection and Immunity. In press.
[Figure: phylogenetic tree of sensor histidine kinases with bootstrap support values on the branches; scale bar 0.1. Clade and branch annotations in the figure: "Fungi", "virulence factor ?", "virulence factor". Sequences shown, in figure order:]
- Pseudomonas aeruginosa PhoQ
- Xanthomonas campestris RpfC
- Vibrio cholerae TorS
- Escherichia coli TorS
- Escherichia coli RcsC
- Candida albicans CaNIK1
- Neurospora crassa NIK-1
- Fusarium solani FIK1
- Fusarium solani FIK2
- Streptomyces coelicolor SC4G10.06c
- Streptomyces coelicolor SC7C7.03
- Pseudomonas aeruginosa GacS
- Pseudomonas fluorescens GacS / ApdA
- Pseudomonas tolaasii RtpA / PheN
- Pseudomonas syringae GacS / LemA
- Pseudomonas viridiflava RepA
- Azotobacter vinelandii GacS
- Erwinia carotovora RpfA / ExpS
- Escherichia coli BarA
- Salmonella typhimurium BarA
26. Plant-like genes in Chlamydia
- Chlamydiaceae: Obligate intracellular pathogens of humans
- Proteins: Unusually high number most similar to plant proteins
- Previous proposal: Obtained genes from a plant-like amoebal host? (A relative of Chlamydiaceae infects Acanthamoeba.)
29. Chlamydiaceae share an ancestral relationship with Cyanobacteria and Chloroplast
[Figure: phylogenetic tree with bootstrap support values (out of 1000) on the branches; scale bar 0.1. Sequences shown, in figure order:]
- Pyrococcus furiosus (Archaea)
- Thermotoga maritima
- Aquifex pyrophilus
- Bacillus subtilis
- Chlamydiaceae: Chlamydophila pneumoniae, Chlamydophila psittaci, Chlamydia muridarum, Chlamydia trachomatis
- Chloroplasts: Chlamydomonas reinhardtii, Klebsormidium flaccidum, Zea mays, Nicotiana tabacum
- Cyanobacteria: Synechococcus PCC6301, Synechocystis PCC6803, Microcystis viridis
- Escherichia coli
- Zea mays mitochondrion
- Rickettsia prowazekii
- Caulobacter crescentus
30. Chlamydiaceae share an ancestral relationship with Cyanobacteria and Chloroplast
[Figure: comparison across Escherichia, Bacillus, Thermotoga, Synechocystis and Chlamydia. Ribosomal protein labels in the figure: S10, L23, L29, L22, L16, L14, L24, S14, L18, L30, L15, S19, S17, S3, S8, S5, L3, L4, L2, L5, L6.]
Unique shared-derived characters unite Chlamydiaceae and Synechocystis
31. Chlamydiaceae plant-like genes reflect an ancestral relationship with Cyanobacteria and Chloroplast
- Chlamydia do not appear to be exchanging DNA with their hosts
- Existing knowledge of Cyanobacteria may stimulate ideas about the function and control of pathogenic Chlamydia?
Non-unique shared characters include a multistage developmental lifecycle, storage of glucose primarily as glycogen, and non-flagellar motility
32. Expanding the Cross-Domain Analysis
- Identify cross-domain lateral gene transfer between bacteria, archaea and eukaryotes
- No obvious correlation seen with protein functional classification
- Most cases: no obvious correlation seen between organisms involved in potential lateral transfer
- Exceptions:
  - Unicellular eukaryotes
  - Organelle-functioning proteins in Rickettsia, Synechocystis, and Chlamydiaceae
33. Horizontal Gene Transfer and Bacterial Pathogenicity
- Pathogenicity Islands: Uro/Entero-pathogenic E. coli, Salmonella typhimurium, Yersinia spp., Helicobacter pylori, Vibrio cholerae
- Transposons: ST enterotoxin genes in E. coli
- Prophages: Shiga-like toxins in EHEC, Diphtheria toxin gene, Cholera toxin, Botulinum toxins
- Plasmids: Shigella, Salmonella, Yersinia
34. Pathogenicity Islands
- Associated with:
  - Atypical %G+C
  - tRNA sequences
  - Transposases, integrases and other mobility genes
  - Flanking repeats
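As a small illustration of the "atypical GC" criterion, the sketch below computes the %G+C of each gene and the mean and standard deviation across genes. It is only a sketch under stated assumptions: the function name `gc_percent` and the three toy sequences are made up, and the use of a population standard deviation is a choice made here, not something the slides specify.

```python
from statistics import mean, pstdev


def gc_percent(seq):
    """Percentage of G and C bases in a nucleotide sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


# Toy sequences invented for illustration; they are not real genes.
genes = {
    "geneA": "ATGGCGCCGGGCTAA",
    "geneB": "ATGAAATTTATATAA",
    "geneC": "ATGGCCGATCGCTGA",
}

gc_by_gene = {name: gc_percent(seq) for name, seq in genes.items()}
genome_mean = mean(list(gc_by_gene.values()))
genome_sd = pstdev(list(gc_by_gene.values()))  # population SD: an assumption

for name, gc in gc_by_gene.items():
    print(f"{name}: {gc:.2f}% GC")
print(f"mean {genome_mean:.2f}, SD {genome_sd:.2f}")
```

A gene whose %G+C lies far from the genome mean, relative to the standard deviation, is a candidate for atypical composition.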
35. IslandPath: Identifying Pathogenicity Islands
Legend:
- Yellow circle: high GC
- Pink circle: low GC
- tRNA gene lies between the two dots
- rRNA gene lies between the two dots
- Both tRNA and rRNA lie between the two dots
- Dot is named a transposase
- Dot is named an integrase
36. Neisseria meningitidis serogroup B strain MC58
Mean %GC: 51.37    Std. dev.: 7.57

%GC    SD  Location          Strand  Product
39.95  -1  1834676..1835113          virulence associated pro. homolog
51.96      1835110..1835211  -       cryptic plasmid A-related
39.13  -1  1835357..1835701          hypothetical
40.00  -1  1836009..1836203          hypothetical
42.86  -1  1836558..1836788          hypothetical
34.74  -2  1837037..1837249          hypothetical
43.96      1837432..1838796          conserved hypothetical
40.83  -1  1839157..1839663          conserved hypothetical
42.34  -1  1839826..1841079          conserved hypothetical
47.99      1841404..1843191  -       put. hemolysin activ. HecB
45.32      1843246..1843704  -       put. toxin-activating
37.14  -1  1843870..1844184  -       hypothetical
31.67  -2  1844196..1844495  -       hypothetical
37.57  -1  1844476..1845489  -       hypothetical
20.38  -2  1845558..1845974  -       hypothetical
45.69      1845978..1853522  -       hemagglutinin/hemolysin-rel.
51.35      1854101..1855066          transposase, IS30 family
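The SD column on this slide appears to mark genes whose %GC lies at least one (-1) or two (-2) standard deviations below the genome mean. The sketch below reproduces that flagging from the mean (51.37) and standard deviation (7.57) given on the slide. The rule is inferred from the listed values, not taken from IslandPath documentation; the function name `gc_sd_flag` and the cap at 2 are assumptions.

```python
def gc_sd_flag(gc, mean, sd, cap=2):
    """Whole number of SDs that gc lies from mean, signed and capped at +/-cap.

    Inferred rule (an assumption): 0 means within one SD of the mean,
    -1 means at least one SD below it, -2 means at least two SDs below it.
    """
    deviation = (gc - mean) / sd
    magnitude = min(int(abs(deviation)), cap)
    return magnitude if deviation > 0 else -magnitude


GENOME_MEAN, GENOME_SD = 51.37, 7.57  # values from the slide

# A few %GC values taken from the listing above.
for gc in (39.95, 51.96, 34.74, 43.96, 20.38):
    print(f"{gc:6.2f} -> {gc_sd_flag(gc, GENOME_MEAN, GENOME_SD)}")
```

Under this rule every row of the listing gets the flag shown on the slide, including the 20.38% gene, which is more than four standard deviations below the mean but is still marked -2.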
37. Variance of the Mean GC for all Genes in a Genome: Correlation with the bacteria's clonal nature
[Figure labels: non-clonal, clonal]
38. Pathogenomics Project: Future Developments
- Identify eukaryotic motifs and domains in pathogen genes
- Threader: Detect proteins with similar tertiary structure
- Identify more motifs associated with:
  - Pathogenicity islands
  - Virulence determinants
- Functional tests for new predicted virulence factors
- Expand analysis to include viral genomes
39. Acknowledgements
- Jeff Blanchard (National Centre for Genome Resources, New Mexico)
- Olof Emanuelsson (Stockholm Bioinformatics Center)
- Genome Sequence Centre, BC Cancer Agency
40. Pathogenomics group
- Ann M. Rose, Yossef Av-Gay, David L. Baillie, Fiona S. L. Brinkman, Robert Brunham, Artem Cherkasov, Rachel C. Fernandez, B. Brett Finlay, Hans Greberg, Robert E.W. Hancock, Steven J. Jones, Patrick Keeling, Audrey de Koning, Don G. Moerman, Sarah P. Otto, B. Francis Ouellette, Nancy Price, Ivan Wan.
- www.pathogenomics.bc.ca