Title: Molecular biology in silico


1
Molecular biology in silico
  • Mikhail Gelfand
  • Research and Training Center Bioinformatics,
  • Institute for Information Transmission Problems,
    RAS
  • AlBio06, Moscow, July 2006

2
Propaganda
Red: papers (experiments). Blue: sequence fragments.
3
Complete genomes
GOLD db. (III.2006): 361 complete genomes.
Incomplete (in the process): 952 bacteria, 58 archaea,
607 eukaryotes (incl. ESTs), 46 metagenomes
4
More propaganda
  • Most genes will never be studied in experiment
  • Even in E. coli only 20-30 new genes per year
    (hundreds are still uncharacterized)
  • Bioinformatics: molecular biology in silico
  • 2% of all recent papers in biological journals
  • Essential component of biological research
  • Make predictions about function and regulation of
    genes (many quite reliable!)
  • Metabolic reconstruction and prediction of
    phenotype given genome
  • Identify really interesting cases, fill gaps in
    knowledge
  • Universally missing genes: not a single known
    gene even for 10 reactions of central
    metabolism. No genes for >40 reactions overall
  • Conserved hypothetical genes (5-15% of any
    bacterial genome): essential, but unknown
    function

5
Haemophilus influenzae, 1995
6
Vibrio cholerae, 2000
7
How? Similarity to known proteins
  • Useful for many purposes (allows one to annotate
    50-75% of genes in a bacterial genome)
  • Necessary first step
  • May be automated to some extent; in particular,
    care is needed to avoid too specific predictions
  • Problem: propagation of annotation errors
  • Boring (nothing new)

8
Noradrenaline transporter in an archaeon?
SOURCE      Methanococcus jannaschii.
  ORGANISM  Methanococcus jannaschii
            Archaea; Euryarchaeota; Methanococcales;
            Methanococcaceae; Methanococcus.
FEATURES             Location/Qualifiers
     source          1..492
                     /organism="Methanococcus jannaschii"
                     /db_xref="taxon:2190"
     Protein         1..492
                     /product="sodium-dependent
                     noradrenaline transporter"
     CDS             1..492
                     /gene="MJ1319"
                     /note="similar to EGAD:HI0736 percent identity:
                     38.5; identified by sequence
                     similarity; putative"
                     /coded_by="U67572:71..1549"
                     /transl_table=11
  • Now corrected: Hypothetical sodium-dependent
    transporter MJ1319.

9
Similarity to hypothetical proteins: somebody
else's errors
The correct annotation
10
Genes with curious functional assignments
  • C75604 Probable head morphogenesis protein,
    Deinococcus radiodurans
  • O05360 Automembrane protein H, Yersinia
    enterocolitica
  • Q8TID9 Benzodiazepine (valium) receptor TspO,
    Methanosarcina acetivorans
  • NP_069403 DR-beta chain MHC class II,
    Archaeoglobus fulgidus

11
Errors in experimental papers
SwissProt:
DEFINITION  Hypothetical 43.6 kDa protein.
ACCESSION   P48012
...
KEYWORDS    Hypothetical protein.
SOURCE      Debaryomyces occidentalis
  ORGANISM  Debaryomyces occidentalis
            Eukaryota; Fungi; Ascomycota; Saccharomycotina;
            Saccharomycetes; Saccharomycetales;
            Saccharomycetaceae; Debaryomyces.
CAUTION     Was originally (Ref.1) thought to be
            3-isopropylmalate dehydrogenase (LEU2).
PIR:
DEFINITION  3-isopropylmalate dehydrogenase (EC 1.1.1.85)
            - yeast (Schwanniomyces occidentalis).
ACCESSION   S55845
KEYWORDS    oxidoreductase.
12
SwissProt entry DSDX_ECOLI
-!- CAUTION: An ORF called dsdC was originally
(Ref.3) assigned to the wrong DNA strand and
thought to be a D-serine deaminase activator, it
was then resequenced by Ref.2 and still thought
to be "dsdC", but this time to function as a
D-serine permease. It is Ref.1 that showed that
dsdC is another gene and that this sequence
should be called dsdX. It should also be noted
that the C-terminal part of dsdX (from 338
onward) was also sequenced (Ref.6 and Ref.7) and
was thought to be a separate ORF (don't worry, we
also had difficulties understanding what
happened!).
13
Positional clustering
  • Genes that are located in immediate proximity
    tend to be involved in the same metabolic pathway
    or functional subsystem
  • mainly in prokaryotes, very weak in eukaryotes
  • caused by operon structure, but not only
  • horizontal transfer of loci containing several
    functionally linked operons
  • compartmentalisation of products in the cytoplasm
  • very weak evidence
  • stronger if observed in many unrelated genomes
  • May be measured
  • e.g. the STRING database/server (P.Bork, EMBL)
  • and other sources
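
One way such a measurement could look, as a minimal sketch: count in how many genomes two gene families are immediate neighbours. The gene orders below are hypothetical toy data, and the function names are made up for illustration; this is not the scoring scheme used by STRING.

```python
from collections import Counter

def neighbor_pairs(genome, max_gap=1):
    """Unordered pairs of gene families lying within max_gap positions."""
    pairs = set()
    for i, a in enumerate(genome):
        for b in genome[i + 1 : i + 1 + max_gap]:
            if a != b:
                pairs.add(frozenset((a, b)))
    return pairs

def clustering_support(genomes, max_gap=1):
    """For every family pair, count the genomes in which it is adjacent."""
    support = Counter()
    for genome in genomes:
        support.update(neighbor_pairs(genome, max_gap))
    return support

# Hypothetical gene orders in three unrelated genomes (toy data).
genomes = [
    ["trpA", "trpB", "trpC", "rpoB"],
    ["gyrA", "trpB", "trpA", "recA"],
    ["trpC", "recA", "trpA", "trpB"],
]
support = clustering_support(genomes)
print(support[frozenset(("trpA", "trpB"))])  # adjacent in all 3 genomes
print(support[frozenset(("trpB", "trpC"))])  # adjacent in 1 genome only
```

A pair seen together in one genome is very weak evidence; support that grows with the number of unrelated genomes is what makes the link stronger.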

14
STRING: trpB positional clusters
15
Functionally dependent genes tend to cluster on
chromosomes in many different organisms
Vertical axis: number of gene pairs with
association score exceeding a threshold. Control:
same graph, random re-labeling of vertices
16
More genomes (stronger links) => highly
significant clustering
17
Especially in linear pathways (right)
18
Fusions
  • If two (or more) proteins form a single
    multidomain protein in some organism, they all
    are likely to be tightly functionally related
  • Very useful for the analysis of eukaryotes
  • Sometimes useful for the analysis of prokaryotes

19
STRING: trpB fusions
20
Phyletic patterns
  • Functionally linked genes tend to occur together
  • Enzymes with the same function (isozymes) have
    complementary phyletic profiles

21
STRING: trpB co-occurrence (phyletic profiles)
22
Phyletic profiles in the Phe/Tyr pathway:
shikimate kinase
23
Archaeal shikimate-kinase
Chorismate biosynthesis pathway (E. coli)
24
Arithmetics of phyletic patterns
Shikimate dehydrogenase (EC 1.1.1.25), AroE, COG0169:
    aompkzyqvdrlbcefghsnuj-i--
5-enolpyruvylshikimate 3-phosphate synthase (EC 2.5.1.19), AroA, COG0128:
    aompkzyqvdrlbcefghsnuj-i--
Chorismate synthase (EC 4.2.3.5), AroC, COG0082:
    aompkzyqvdrlbcefghsnuj-i--
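
The "arithmetics" can be sketched with plain set operations, assuming each letter in a COG-style pattern stands for one genome and "-" marks absence. The three Aro patterns are taken from the slide; the isozyme profiles and the six-genome universe are hypothetical, and the helper names are made up for illustration.

```python
def presence(pattern):
    """Set of genome codes in which the gene is present ('-' = absent)."""
    return {c for c in pattern if c != "-"}

def jaccard(p, q):
    """Similarity of two phyletic patterns: 1.0 means identical distribution."""
    a, b = presence(p), presence(q)
    return len(a & b) / len(a | b) if a | b else 0.0

def complementary(p, q, universe):
    """True if the genes never co-occur and together cover every genome."""
    a, b = presence(p), presence(q)
    return not (a & b) and (a | b) == set(universe)

# Patterns from the slide: three steps of the same pathway.
aroE = "aompkzyqvdrlbcefghsnuj-i--"
aroA = "aompkzyqvdrlbcefghsnuj-i--"
aroC = "aompkzyqvdrlbcefghsnuj-i--"
print(jaccard(aroE, aroA), jaccard(aroA, aroC))  # identical profiles

# Hypothetical isozymes over a toy universe of six genomes.
universe = "abcdef"
iso1 = "abc---"
iso2 = "---def"
print(complementary(iso1, iso2, universe))
```

Identical patterns suggest functionally linked genes; complementary patterns are the signature expected for two non-homologous enzymes with the same function.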
25
Distribution of association scores (monotonic
for subunits, bimodal for isozymes)
26
E.g. transporters
  • Transporters of end products of metabolic
    pathways may substitute the entire pathway
  • Transporters of compounds for catabolic pathways
    co-occur with pathways
  • Transporters for intermediates substitute
    upstream parts of pathways

27
Example: bioY
28
Other approaches to phyletic patterns
  • Gene signatures of lifestyles
  • e.g. thermophily: reverse gyrase is the only gene
    specific to all hyperthermophiles (bacterial and
    archaeal)
  • see COGs
  • Regulators and signals

29
Example: bioR. Gene: black arrow; candidate
site: red dot
30
Comparative analysis of regulation
  • Phylogenetic footprinting: regulatory sites are
    more conserved than non-coding regions in general
    and are often seen as conserved islands in
    alignments of gene upstream regions
  • Consistency filtering: regulons (sets of
    co-regulated genes) are conserved =>
  • true sites occur upstream of orthologous genes
  • false sites are scattered at random
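
Consistency filtering can be sketched as a simple vote across genomes, assuming candidate sites have already been mapped to ortholog groups. The genome names, the group labels abc7, def3 and xyz1, and the threshold are hypothetical; the sketch only illustrates the idea and is not the procedure used in the talk.

```python
from collections import Counter

def consistent_regulon(candidates, min_genomes=2):
    """Keep ortholog groups with a candidate site in at least min_genomes genomes.

    candidates: dict genome -> set of ortholog groups that have a
    candidate site in the upstream region in that genome.
    """
    counts = Counter()
    for groups in candidates.values():
        counts.update(set(groups))
    return {g for g, n in counts.items() if n >= min_genomes}

# Hypothetical scan results: true sites recur upstream of orthologs,
# false positives (xyz1, abc7, def3) appear in one genome only.
candidates = {
    "genome1": {"ribA", "ribB", "ypaA", "xyz1"},
    "genome2": {"ribA", "ypaA", "abc7"},
    "genome3": {"ribA", "ribB", "ypaA", "def3"},
}
print(sorted(consistent_regulon(candidates, min_genomes=3)))
print(sorted(consistent_regulon(candidates, min_genomes=2)))
```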

31
Enzymes
  • Identification of a gap in a pathway (universal,
    taxon-specific, or in individual genomes)
  • Search for candidates assigned to the pathway by
    co-localization and co-regulation (in many
    genomes)
  • Prediction of general biochemical function from
    (distant) similarity and functional patterns
  • Tentative filling of the gap
  • Verification by analysis of phylogenetic
    patterns
  • Absence in genomes without this pathway
  • Complementary distribution with known enzymes for
    the same function

32
Transporters
  • Identification of candidates assigned to the
    pathway by co-localization and co-regulation (in
    many genomes)
  • Prediction of general function by analysis of
    transmembrane segments and similarity
  • Prediction of specificity by analysis of
    phylogenetic patterns
  • End product: if present in genomes lacking this
    pathway (substituting the biosynthetic pathway
    for an essential compound)
  • Input metabolite: if absent in genomes without the
    pathway (catabolic, also precursors in
    biosynthetic pathways)
  • Entry point in the middle: if substituting an
    upper or side part of the pathway in some genomes
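
The three specificity rules above can be written as a small decision function. This is a sketch under stated assumptions: the pathway is modelled as an ordered list of steps, the genomes and step names are hypothetical, and the priority order between the rules is a choice made here for illustration.

```python
def predict_specificity(steps, genomes, with_transporter):
    """Guess what a candidate transporter carries.

    steps: pathway steps in order (steps[0] is the first step).
    genomes: dict genome -> set of pathway steps found in that genome.
    with_transporter: genomes that encode the candidate transporter.
    """
    roles = set()
    for name in with_transporter:
        present = genomes[name]
        if not present:
            roles.add("end product")       # transporter replaces the whole pathway
        elif steps[0] not in present:
            roles.add("intermediate")      # upstream part of the pathway is missing
        else:
            roles.add("input metabolite")  # transporter seen only with the pathway
    # Priority order is an assumption of this sketch.
    for role in ("end product", "intermediate", "input metabolite"):
        if role in roles:
            return role
    return "undetermined"

# Hypothetical three-step pathway in four genomes.
steps = ["s1", "s2", "s3"]
genomes = {
    "g1": {"s1", "s2", "s3"},
    "g2": set(),
    "g3": {"s2", "s3"},
    "g4": {"s1", "s2", "s3"},
}
print(predict_specificity(steps, genomes, ["g1", "g2"]))
print(predict_specificity(steps, genomes, ["g1", "g3"]))
print(predict_specificity(steps, genomes, ["g1", "g4"]))
```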

33
5' UTR regions of riboflavin genes from bacteria
34
Conserved secondary structure of the RFN-element
Capitals: invariant (absolutely conserved)
positions. Lower case letters: strongly
conserved positions. Dashes and stars:
obligatory and facultative base pairs.
Degenerate positions: R = A or G; Y = C or U;
K = G or U; B = not A; V = not U;
N = any nucleotide; X = any nucleotide
or deletion
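
The degenerate codes in this legend map directly onto regular-expression character classes. A minimal sketch, assuming RNA letters (A, C, G, U) and ignoring base pairing and the capital/lower-case distinction; the consensus string used here is a hypothetical example, not the RFN consensus.

```python
import re

# Degenerate codes from the legend, written as regex fragments.
CODES = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]", "Y": "[CU]", "K": "[GU]",
    "B": "[CGU]",    # not A
    "V": "[ACG]",    # not U
    "N": "[ACGU]",   # any nucleotide
    "X": "[ACGU]?",  # any nucleotide or deletion
}

def consensus_to_regex(consensus):
    """Compile a degenerate consensus string into a regular expression."""
    return re.compile("".join(CODES[c] for c in consensus.upper()))

pattern = consensus_to_regex("GRYNXB")  # hypothetical consensus
for seq in ("GACUAG", "GACUG", "GACUA"):
    print(seq, bool(pattern.fullmatch(seq)))
```

The second sequence matches with the X position deleted; the third fails because the final B position may not be A.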
35
RFN: the mechanism of regulation
  • Transcription attenuation
  • Translation attenuation

36
Early observation: an uncharacterized gene (ypaA)
with an upstream RFN element
37
Phylogenetic tree of RFN-elements (regulation of
riboflavin biosynthesis)
no riboflavin biosynthesis
duplications
no riboflavin biosynthesis
38
YpaA: riboflavin (vitamin B2) transporter in
Gram-positive bacteria
  • 5 predicted transmembrane segments => a
    transporter
  • Upstream RFN element (likely co-regulation with
    riboflavin genes) => transport of riboflavin or
    a precursor
  • S. pyogenes, E. faecalis, Listeria sp.: ypaA, no
    riboflavin pathway => transport of riboflavin
  • Prediction: YpaA is a riboflavin transporter
    (Gelfand et al., 1999)
  • Validation:
  • YpaA transports flavins (riboflavin, FMN, FAD)
    (by genetic analysis, Kreneva et al., 2000)
  • ypaA is regulated by riboflavin (by microarray
    expression study, Lee et al., 2001)
  • via attenuation of transcription (and to some
    extent inhibition of translation) (Winkler et
    al., 2003)

39
A new family of nickel/cobalt transporters
  • No experimental data
  • No structural data
  • Specificity predicted by comparative genomics
  • and then validated in experiment
  • Mutational analysis under way

40
Conserved signal upstream of nrd genes
41
Identification of the candidate regulator by the
analysis of phyletic patterns
  • COG1327: the only COG with exactly the same
    phylogenetic pattern as the signal
  • large scale on the level of major taxa
  • small scale within major taxa
  • absent in small parasites among alpha- and
    gamma-proteobacteria
  • absent in Desulfovibrio spp. among
    delta-proteobacteria
  • absent in Nostoc sp. among cyanobacteria
  • absent in Oenococcus and Leuconostoc among
    Firmicutes
  • present only in Treponema denticola among four
    spirochetes

42
COG1327: Predicted transcriptional regulator,
consists of a Zn-ribbon and ATP-cone domains.
Regulator of the riboflavin pathway?
43
Additional evidence
  • sometimes clustered with nrd genes or with
    replication genes dnaB, dnaI, polA
  • candidate signals upstream of other
    replication-related genes
  • dNTP salvage
  • topoisomerase I, replication initiator dnaA,
    chromosome partitioning, DNA helicase II
  • experimental confirmation in Streptomyces
    (Borovok et al., 2004)

44
Multiple sites (nrd genes): FNR, DnaA, NrdR
45
Mode of regulation
  • Repressor (overlaps with promoters)
  • Co-operative binding
  • most sites occur in tandem (> 90 cases)
  • the distance between the copies (centers of
    palindromes) equals an integer number of DNA
    turns
  • mainly (94) 30-33 bp, in 84 31-32 bp: 3 turns
  • 21 bp (2 turns) in Vibrio spp.
  • 41-42 bp (4 turns) in some Firmicutes
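
The turn arithmetic can be checked quickly. This sketch assumes a helical repeat of about 10.5 bp per turn for B-DNA (a textbook value, not stated on the slide) and an arbitrary tolerance for calling a distance "an integer number of turns"; the 26 bp case is a made-up counter-example.

```python
HELICAL_REPEAT_BP = 10.5  # approximate B-DNA repeat; assumption of this sketch

def turns(distance_bp):
    """Distance between site centers expressed in helical turns."""
    return distance_bp / HELICAL_REPEAT_BP

def same_face(distance_bp, tolerance=0.15):
    """True if the distance is close to a whole number of turns,
    i.e. both sites lie on roughly the same face of the helix."""
    t = turns(distance_bp)
    return abs(t - round(t)) <= tolerance

for d in (21, 31, 32, 41, 42, 26):
    print(d, round(turns(d), 2), same_face(d))
```

Distances of 21, 31-32 and 41-42 bp all come out near 2, 3 and 4 turns, which fits co-operative binding of two repressor copies on the same side of the DNA; 26 bp would put them on opposite faces.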

46
Combined regulatory network for iron homeostasis
genes in α-proteobacteria.
[Figure: network diagram; node labels include Fe and
FeS status of cell]
The connecting lines denote regulatory
interactions, with the thickness reflecting the
frequency of the interaction in the analyzed
genomes. The suggested negative or positive mode
of operation is shown by a dead end or an arrow end
of the line.
47
Distribution of Irr, Fur/Mur, MntR, RirA,
and IscR regulons in α-proteobacteria
'?' in the RirA column denotes the absence of the
rirA gene in an unfinished genomic sequence and
the presence of candidate RirA-binding sites
upstream of the iron uptake genes.
48
Phylogenetic tree of the Fur family of
transcription factors in α-proteobacteria - I
Fur in γ- and β-proteobacteria
Fur in ε-proteobacteria
Fur in Firmicutes
in α-proteobacteria
Regulator of manganese uptake genes (sit, mntH)
in α-proteobacteria
Regulator of iron uptake and metabolism genes
α-proteobacteria
49
Sequence logos for the identified Fur-binding
sites in the D group of α-proteobacteria
Species labels: Erythrobacter litoralis,
Caulobacter crescentus, Novosphingobium
aromaticivorans, Zymomonas mobilis, Sphingopyxis
alaskensis, Oceanicaulis alexandrii,
Rhodospirillum rubrum, Gluconobacter oxydans,
Magnetospirillum magneticum, Parvularcula
bermudensis
Identified Mur-binding sites: the A, B, and C
groups of α-proteobacteria
Sequence logos for the known Fur-binding sites
in Escherichia coli and Bacillus subtilis
50
Phylogenetic tree of the Fur family of
transcription factors in α-proteobacteria - II
Fur in γ- and β-proteobacteria
Fur in ε-proteobacteria
Fur in Firmicutes
α-proteobacteria
Irr in α-proteobacteria: regulator of
iron homeostasis
51
Sequence logos for the identified Irr binding
sites in α-proteobacteria.
The A group (8 species) - Irr
The B group (4 species) - Irr
The C group (12 species) - Irr
52
Phylogenetic tree of the Rrf2 family of
transcription factors in α-proteobacteria
Tree labels:
  • Nitrite/NO-sensing regulator NsrR (Nitrosomonas
    europaea, Escherichia coli)
  • Iron repressor RirA (Rhizobium leguminosarum)
  • Cysteine metabolism repressor CymR (Bacillus
    subtilis)
  • Cytochrome complex regulator Rrf2 (Desulfovibrio
    vulgaris)
  • Iron-sulfur cluster synthesis repressor
    IscR (Escherichia coli)
Positional clustering of rrf2-like genes with:
  • iron uptake and storage genes
  • Fe-S cluster synthesis operons
  • genes involved in nitrosative stress protection
  • sulfate uptake/assimilation genes
  • thioredoxin reductase
  • carboxymuconolactone decarboxylase-family genes
  • hmc cytochrome operon
Legend:
  • proteins with the conserved C-X(6-9)-C-X(4-6)-C
    motif within effector-responsive domain
  • proteins without a cysteine triad motif
53
Sequence logos for the identified RirA-binding
sites in α-proteobacteria
The A group (8 species) - RirA
The C group (12 species) - RirA
54
Distribution of the conserved members of the Fe-
and Mn-responsive regulons and the predicted
RirA, Fur/Mur, Irr, and DtxR binding sites in
α-proteobacteria
Table columns: Genes, Functions
Functional categories: Iron uptake; Iron storage;
FeS synthesis; Iron usage; Heme biosynthesis;
Regulatory genes; Manganese uptake
55
An attempt to reconstruct the history
56
Acknowledgements
  • Dmitry Rodionov (comparative genomics)
  • Andrei Mironov (software)
  • Alexei Vitreschak (riboswitches)
  • Slides:
  • Michael Galperin (NCBI, Bethesda)
  • Andrei Osterman (Burnham Institute, San Diego)
  • Collaboration:
  • Thomas Eitinger (Humboldt University, Berlin):
    Co/Ni transporters
  • Andy Johnston (University of East Anglia): Fe in
    alphas
  • Funding:
  • Howard Hughes Medical Institute
  • Russian Fund of Basic Research
  • RAS, program Molecular and Cellular Biology
  • INTAS