Automated Annotation of Microbial Genomes, Opportunities and Pitfalls

About This Presentation
Description:

Automated Annotation of Microbial Genomes, Opportunities and Pitfalls. Margie Romine, Pacific Northwest National Laboratory, Richland, Washington.

Slides: 23
Provided by: MRo128

Transcript and Presenter's Notes


1
Automated Annotation of Microbial Genomes,
Opportunities and Pitfalls
  • Margie Romine
  • Pacific Northwest National Laboratory
  • Richland, Washington

2
Shewanella oneidensis MR-1
  • Breathes Mn, Fe, and other metals, thereby
    changing their solubility
  • Also reduces radionuclides and hence impacts
    their mobility at contaminated sites
  • Genome sequenced by The Institute for Genomic
    Research (TIGR) in 2002 (funded by DOE-OBER)
  • Can we now better determine how this organism
    interacts with metals and radionuclides?

3
Shewanella spp. Inhabit Many Niches
Two more were sequenced by DOE's Joint Genome
Institute and 14 more are under way!
  • Energy rich - fermentation is occurring and
    energy is continuously being deposited via
    sedimentation
  • Rapidly changing redox conditions/dominant
    electron acceptors
  • Microbial partners are present to remove the
    acetate produced via anaerobic respiration.

4
Bacterial Genome Sequencing Explodes
  • 341 completed genomes, 976 ongoing
  • JGI now releases partial genome sequences
    within days!
  • How do we use sequence information to understand
    how all these organisms function in the
    environment?
  • Annotation is the key, but it is now largely
    automated and hence of lower quality

5
What is Annotation?
AGCTTAACTGGGATACGACGACCAGTAGACAGGTRTACGATGAGATATATAT
Locate genes
Translate to proteins
MASDLKKIYTRPRPDSAWQECVAALFDGHSKDKLACNDDL
Gather evidence of function
Assign putative functions
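
The first two steps on this slide (locate genes, translate to proteins) can be illustrated with a minimal Python sketch. It is not the method used by any particular annotation pipeline: it assumes the standard genetic code, scans only the forward strand, and treats every ATG-to-stop stretch as a candidate gene. The function names `translate` and `find_orfs` and the `min_codons` parameter are hypothetical, chosen for illustration.

```python
import itertools

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(itertools.product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate DNA from its first base until a stop codon or the end.

    Codons containing ambiguity codes (such as R) become 'X'.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def find_orfs(dna, min_codons=3):
    """Return (start, end, protein) for forward-strand ATG..stop ORFs.

    Assumption: a toy gene finder. Real gene callers also score the
    reverse strand, alternative start codons, and ribosome binding sites.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":
                for j in range(i, len(dna) - 2, 3):
                    if CODON_TABLE.get(dna[j:j + 3]) == "*":
                        if (j - i) // 3 >= min_codons:
                            orfs.append((i, j + 3, translate(dna[i:j])))
                        i = j  # resume scanning after this stop codon
                        break
            i += 3
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGGCTTCTGATTAACC"))
```

Because the start codon is chosen naively here, a wrong start shifts the predicted N-terminus, which is one reason start-site errors affect downstream localization predictions.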
6
Annotation Drives Post-genomic Research
Function predictions
Methodologies
Data
Interpretation
Gene predictions
DNA microarrays
mRNA expression
Metabolic modeling
ChIP-chip
DNA binding sites
Protein predictions
Proteomics
Protein expression
Hypothesis
Targeted gene knock-outs
7
Annotation with Gnare/Puma2
  • Developed at Argonne National Laboratory by
    Natalia Maltsev, Mark D'Souza, Elizabeth Glass,
    Dina Sulakhe, Mustafa Syed, Pavan Anumula
  • http://compbio.mcs.anl.gov/puma2/cgi-bin/index.cgi
  • Gnare: private genome sequences
  • Puma2: public genome sequences

8
Types of Functional Descriptors
  • Hypothetical protein
  • Conserved hypothetical protein
  • Conserved domain protein
  • Function associated protein
  • Class specific enzyme
  • Specific function predicted
  • Function validated

9
Checking Functions Where No Domain Hit Occurs
type IV secretion outer membrane protein, PilW?
10
MKNCQKG
11
Clues in InterPro Domain Descriptor
This is a family of hypothetical proteins. A
number of the sequence records state they are
transmembrane proteins or putative permeases. It
is not clear what source suggested that these
proteins might be permeases and this information
should be treated with caution.
autoinducer-2 transport protein, TqsA
2.A.86 The Autoinducer-2 Exporter (AI-2E) Family
The AI-2E family (UPF0118) is a large family of
prokaryotic proteins derived from a variety of
bacteria and archaea. Those examined are about 350
residues in length, and the couple that have been
examined exhibit 7 putative transmembrane
α-helical spanners (TMSs). E. coli, B. subtilis
and several other prokaryotes have multiple
paralogues encoded within their genomes. Herzberg
et al. (2006) have presented strong evidence for a
role of an AI-2E family homologue, YdgG (renamed
TqsA), as an exporter of the E. coli autoinducer-2
(AI-2) (Camilli and Bassler, 2006; Chen et al.,
2002). AI-2 is a proposed signalling molecule for
interspecies communication in bacteria. It is a
furanosyl borate diester (Chen et al., 2002).
12
(No Transcript)
13
Clusters with N-acetylglucosamine catabolic enzymes
Hypothesis experimentally validated!
14
(No Transcript)
15
Relevant abstracts mentioning your query species
(Shewanella oneidensis)
sulfite dehydrogenase catalytic molybdopterin
subunit, SorA
16
Mistake in InterPro Database found!
17
More Automation in Evidence Collecting Needed
18
Protein Location Linked to Function
19
Multiple Routes of Secretion
(Figure labels: LepB, LspA, PilD, C39; other labels: F E, G, GG)
20
Bioinformatics Tools for Localization Prediction
  • Incorrect start sites have strong impact on
    predictions!
  • Different tools have unique specialties
  • No one tool provides good predictions for all
    proteins

LepB: Psort, LipoP, Predsi, Phobius, SignalP, TatP
IM TM: Sosui, TmHMM, Phobius, Psort, HMMTOP
LspA: LipoP, Lipo, Psort
β barrel: ProfTMB, Bomp, BBTM
SubLoc, Cello, Psort, Secretome
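
Since no single tool predicts well for all proteins, one simple way to combine several is a majority vote that flags disagreement for manual review. The sketch below is only an illustration of that idea, not a procedure described in the presentation. It assumes each tool's output has already been parsed into a plain location label; the function name `consensus_localization`, the label strings, and the `min_agreement` threshold are all hypothetical.

```python
from collections import Counter


def consensus_localization(predictions, min_agreement=0.6):
    """Combine per-tool location labels by majority vote.

    predictions: dict mapping tool name -> predicted label, or None when
    the tool made no call. Returns (label, agreement fraction); the label
    is 'ambiguous' when the top label falls below min_agreement.
    """
    votes = Counter(
        label for label in predictions.values() if label is not None
    )
    if not votes:
        return ("unknown", 0.0)
    label, count = votes.most_common(1)[0]
    agreement = count / sum(votes.values())
    if agreement < min_agreement:
        return ("ambiguous", agreement)
    return (label, agreement)


if __name__ == "__main__":
    # Hypothetical, already-parsed outputs for one protein.
    calls = {"Psort": "periplasm", "Cello": "periplasm", "SubLoc": "cytoplasm"}
    print(consensus_localization(calls))
```

A vote like this treats all tools as equally reliable, which the slide suggests is not the case; weighting each tool by its specialty would be a natural refinement.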
21
Example: c-type cytochromes
  • Contain a CXXCH motif for binding heme
  • So do some other proteins that are not c-type
    cytochromes?
  • All are secreted across the inner membrane and
    then assembled
  • 60 proteins in MR-1 have CXXCH
  • Only 43 have a leader peptide and are predicted
    to be c-type cytochromes
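
The filtering logic on this slide (find CXXCH, then require a leader peptide) can be sketched in Python as follows. This is a minimal illustration, not the workflow actually used for MR-1. The function names are hypothetical, and the `has_leader` flags are assumed to come from an external signal peptide predictor whose output has already been parsed.

```python
import re

# C, any two residues, C, H. The lookahead lets overlapping motifs be found.
HEME_MOTIF = re.compile(r"(?=C..CH)")


def heme_motif_positions(protein):
    """Return 0-based start positions of every CXXCH motif."""
    return [m.start() for m in HEME_MOTIF.finditer(protein.upper())]


def candidate_c_type_cytochromes(proteins, has_leader):
    """Names of proteins with at least one CXXCH motif and a leader peptide.

    proteins: dict mapping name -> amino acid sequence.
    has_leader: dict mapping name -> bool (assumed external prediction).
    """
    return [
        name
        for name, seq in proteins.items()
        if heme_motif_positions(seq) and has_leader.get(name, False)
    ]


if __name__ == "__main__":
    # Made-up sequences for illustration only.
    toy = {"a": "MKCAACHGG", "b": "MKCAACHGG", "c": "MKAAAA"}
    print(candidate_c_type_cytochromes(toy, {"a": True, "b": False}))
```

The motif match alone over-predicts, which is the point of the slide: the leader peptide requirement is what separates the 43 predicted c-type cytochromes from the 60 proteins that merely contain CXXCH.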

22
Future Needs in Annotation Automation
  • Current methods of automated annotation will lead
    to propagation of annotation errors and burying
    of useful evidence
  • But manual annotation cannot keep up with the
    rate at which sequences are produced
  • Additional automation is needed for:
  • Protein localization
  • Specialty database mining (TCDB, MEROPS, etc.)
  • Experimental data mining (appropriate databases
    don't exist)