The Genome Properties System

1
The Genome Properties System
  • Applications in genome annotation and
    comparative genomics

Jeremy D. Selengut, Daniel H. Haft, Owen White
The Institute for Genomic Research, Rockville, MD
2
(No Transcript)
3
What is a Genome Property?
  • An ATTRIBUTE of biological organisms that is
    rigorously defined such that assertion of its
    absence, presence, or quantitative extent can be
    made (either automatically or manually) in a
    self-consistent manner.
  • Examples include biological processes such as
    metabolic pathways, observable phenotypes, and
    quantitative measures of genomic content.

www.tigr.org/Genome_Properties
4
What is a Genome Property?
  • A standardized bioinformatic analysis applied
    over all sequenced genomes yielding discrete
    assertions with controlled vocabularies and
    linked to traceable evidence.
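To make this definition concrete, here is a minimal Python sketch of a property assertion as a record that can only take controlled-vocabulary values and carries its own evidence. The class, field, and function names are hypothetical; the state values YES, NO, PARTIAL and CRYPTIC are the ones shown in the species summaries on slide 37, and the evidence "kinds" are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class PropertyState(Enum):
    """Controlled vocabulary for property assertions (values seen on the slides)."""
    YES = "YES"
    NO = "NO"
    PARTIAL = "PARTIAL"
    CRYPTIC = "CRYPTIC"


@dataclass(frozen=True)
class Evidence:
    kind: str       # hypothetical labels, e.g. "HMM" or "literature"
    reference: str  # e.g. an HMM accession or a PubMed ID


@dataclass(frozen=True)
class Assertion:
    genome: str
    property_name: str
    state: PropertyState
    evidence: tuple[Evidence, ...] = ()

    def is_traceable(self) -> bool:
        """True if the assertion cites at least one piece of evidence."""
        return len(self.evidence) > 0


def parse_state(text: str) -> PropertyState:
    """Map free text onto the vocabulary; unknown values raise ValueError."""
    return PropertyState(text.strip().upper())
```

For example, `Assertion("E. coli K12", "fucose catabolism", parse_state("yes"), (Evidence("HMM", "TIGR01086"),))` is accepted and traceable, while `parse_state("MAYBE")` raises `ValueError`. That assertion is illustrative only, not a curated result.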

5
(No Transcript)
6
Quantitative measures
Quantitative data is calculated directly from the
genome sequence or from the set of predicted genomic
features. Other examples: number of predicted
proteins, average intergenic length, amino acid
abundances.
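A minimal Python sketch of three such measures, computed from a genome sequence, gene coordinates, and predicted protein sequences. The function names and the input layout (1-based inclusive coordinates on a single linear molecule) are assumptions for illustration, not the system's actual implementation.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T); other symbols are ignored."""
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return 100.0 * (s.count("G") + s.count("C")) / acgt


def average_intergenic_length(genes: list[tuple[int, int]]) -> float:
    """Mean gap between neighbouring genes given as 1-based inclusive (start, end).

    Genes are sorted by start; overlapping neighbours count as a gap of 0.
    One linear molecule is assumed (no wrap-around on circular replicons).
    """
    ordered = sorted(genes)
    gaps = [max(0, next_start - prev_end - 1)
            for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps) if gaps else 0.0


def amino_acid_abundance(proteins: list[str]) -> dict[str, float]:
    """Fraction of each residue letter across all predicted protein sequences."""
    counts = Counter("".join(proteins).upper())
    total = sum(counts.values())
    return {aa: n / total for aa, n in sorted(counts.items())} if total else {}
```

`gc_content("ATGCNN")` gives 50.0 because the ambiguous N bases are skipped, and genes at (1, 100), (151, 300), (301, 400) give an average intergenic length of 25.0. The count of predicted proteins is simply `len(proteins)`.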
7
Phenotypic Data
Example values: WHITE; Natural transformation YES
Phenotypic data must be curated manually from
literature citations and expert knowledge.
Phenotypic data is therefore SPARSE compared
with data calculated from genomic content.
8
Taxonomic data
Taxonomic information is included in the system
and can be queried to separate data by phylum,
class, order, etc. Examples: Actinobacteria
(high-GC Gram-positive bacteria), Gammaproteobacteria
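As a sketch of how such a query could separate data by taxon, the function below tallies the states of one property per taxon at a chosen rank. The record layout and function name are hypothetical; the phylum and Flagella values come from the species summaries on slide 37, where no Flagella value is shown for Shigella flexneri, so it is counted as not assessed.

```python
from collections import defaultdict


def tally_by_rank(genomes: list[dict], rank: str, prop: str) -> dict[str, dict[str, int]]:
    """Count the states of one property within each taxon at the given rank.

    Each genome is a dict with a "taxonomy" mapping (rank -> taxon) and a
    "properties" mapping (property -> state); this layout is hypothetical.
    Genomes lacking the property are counted as "NOT ASSESSED".
    """
    table: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for genome in genomes:
        taxon = genome["taxonomy"].get(rank, "unclassified")
        state = genome["properties"].get(prop, "NOT ASSESSED")
        table[taxon][state] += 1
    return {taxon: dict(states) for taxon, states in table.items()}


# Phylum and Flagella values taken from the species summaries on slide 37.
GENOMES = [
    {"name": "Helicobacter pylori", "taxonomy": {"phylum": "Proteobacteria"},
     "properties": {"Flagella": "YES"}},
    {"name": "Porphyromonas gingivalis", "taxonomy": {"phylum": "Bacteroidetes"},
     "properties": {"Flagella": "NO"}},
    {"name": "Yersinia pestis", "taxonomy": {"phylum": "Proteobacteria"},
     "properties": {"Flagella": "CRYPTIC"}},
    {"name": "Shigella flexneri", "taxonomy": {"phylum": "Proteobacteria"},
     "properties": {}},
]

print(tally_by_rank(GENOMES, "phylum", "Flagella"))
# {'Proteobacteria': {'YES': 1, 'CRYPTIC': 1, 'NOT ASSESSED': 1}, 'Bacteroidetes': {'NO': 1}}
```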
9
Component-based Properties
  • Metabolic pathways and other biological processes
    are carried out by combinations of genetically
    encoded COMPONENTS (usually proteins).
  • Many components are both biologically required
    and broadly (and accurately) detectable by
    homology methods.
  • Detection of the complete set of these components
    implies the presence of the process.
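The rule that detecting the complete set of components implies the process can be sketched as follows. The three-state outcome (YES when every required step is detected, PARTIAL when only some are, NO when none are) is an assumption modeled on the PARTIAL value seen in the species summaries; each step accepts any one of several identifiers (HMM accessions or EC numbers). For brevity the example encodes only the three enzymatic steps of the fucose catabolism example, leaving out the transporter, dehydrogenases and regulators.

```python
def evaluate_property(steps: dict[str, set[str]], detected: set[str]) -> str:
    """Assign YES / PARTIAL / NO to a component-based property.

    steps: required step name -> identifiers (HMM accessions, EC numbers),
           any one of which counts as evidence that the step is present.
    detected: identifiers found in one genome.
    Assumed rule: all steps found -> YES, some -> PARTIAL, none -> NO.
    """
    if not steps:
        raise ValueError("a property needs at least one required step")
    found = [name for name, ids in steps.items() if ids & detected]
    if len(found) == len(steps):
        return "YES"
    return "PARTIAL" if found else "NO"


# Illustrative core of the fucose catabolism example (EC numbers and the
# TIGR01086 equivalog are taken from the slides).
FUCOSE_CORE = {
    "L-fucose isomerase": {"5.3.1.25"},
    "L-fuculokinase": {"2.7.1.51"},
    "L-fuculose-phosphate aldolase": {"4.1.2.17", "TIGR01086"},
}

print(evaluate_property(FUCOSE_CORE, {"5.3.1.25", "2.7.1.51", "TIGR01086"}))  # YES
print(evaluate_property(FUCOSE_CORE, {"TIGR01086"}))                          # PARTIAL
```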

10
Example: Fucose catabolism
Chen YM, Zhu Y, Lin EC. The organization of
the fuc regulon specifying L-fucose dissimilation
in Escherichia coli K12 as determined by gene
cloning. Mol Gen Genet. 1987 Dec;210(2):331-7.
PMID: 3325779
17
Example: Fucose catabolism
  • Transporter to import L-fucose
  • L-fucose to L-fuculose by L-fucose ketol-isomerase
    (5.3.1.25)
  • then to L-fuculose-1-P by L-fuculokinase (2.7.1.51)
  • then to L-lactaldehyde and glycerone-P by
    L-fuculose-P aldolase (4.1.2.17)
  • Plus alcohol and/or aldehyde dehydrogenases
  • Plus transcription factors
18
Detection of components via Equivalog HMMs
  • Hidden Markov Models (HMMs) allow automated
    assignment of sequences to homology families.
    (TIGRFAMs, Pfam)
  • Equivalog-type HMMs are designed to detect only
    members of families having the same function.
    (1219 equivalogs in TIGRFAMs, 439 built into
    Genome Properties thus far)
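A minimal sketch of equivalog assignment, assuming TIGRFAMs-style trusted and noise bit-score cutoffs: a hit at or above the trusted cutoff assigns the protein to the equivalog, a hit between the two cutoffs is flagged for review, and a protein with several trusted hits keeps the best-scoring one (that tie-breaking rule is an assumption). The cutoff values and gene IDs are invented for illustration.

```python
def classify_hit(score: float, trusted: float, noise: float) -> str:
    """Place a bit score relative to an HMM's trusted and noise cutoffs."""
    if score >= trusted:
        return "trusted"
    if score >= noise:
        return "uncertain"  # between the cutoffs: needs curator review
    return "below noise"


def assign_equivalogs(hits, cutoffs) -> dict[str, str]:
    """Assign each protein to its best-scoring equivalog above the trusted cutoff.

    hits: iterable of (protein_id, hmm_accession, bit_score)
    cutoffs: hmm_accession -> (trusted, noise) bit-score cutoffs
    """
    best: dict[str, tuple[str, float]] = {}
    for protein, hmm, score in hits:
        trusted, noise = cutoffs[hmm]
        if classify_hit(score, trusted, noise) != "trusted":
            continue
        if protein not in best or score > best[protein][1]:
            best[protein] = (hmm, score)
    return {protein: hmm for protein, (hmm, _) in best.items()}


# Cutoff values and gene IDs below are invented for illustration.
CUTOFFS = {"TIGR01086": (250.0, 150.0)}
HITS = [("geneA", "TIGR01086", 310.2), ("geneB", "TIGR01086", 180.0)]

print(assign_equivalogs(HITS, CUTOFFS))  # {'geneA': 'TIGR01086'}
```

Here geneB scores between the two cutoffs, so it is not assigned automatically; a generic family model or manual review would be needed to place it.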

19
Example: Fucose catabolism
  • Transporter to import L-fucose
  • L-fucose to L-fuculose by L-fucose ketol-isomerase
    (5.3.1.25)
  • then to L-fuculose-1-P by L-fuculokinase (2.7.1.51)
  • then to L-lactaldehyde and glycerone-P by
    L-fuculose-P aldolase (4.1.2.17)
      Specific equivalog: TIGR01086 L-fuculose phosphate aldolase
  • Plus alcohol and/or aldehyde dehydrogenases
  • Plus transcription factors
20
Detection of components via less specific
family-level HMMs
  • Homology families which include gene products
    with a range of substrate specificities.
  • These can detect proteins which may be involved
    in a particular process.
  • We can require such hits to lie in close genomic
    proximity to genes already assigned to the process.
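The proximity requirement can be sketched as a filter over gene order: a generic family hit is kept only if it lies within a few genes of a gene already assigned by a specific equivalog. The window of 3 genes, the single linear contig, and the gene IDs are hypothetical choices for illustration.

```python
def accept_family_hits(gene_order: list[str], family_hits: set[str],
                       anchor_genes: set[str], max_distance: int = 3) -> set[str]:
    """Keep generic family-HMM hits that lie within max_distance genes of an anchor.

    gene_order: gene IDs in chromosomal order on one linear contig.
    family_hits: genes matched only by a family-level HMM (e.g. PF00596).
    anchor_genes: genes already assigned to the process by specific equivalogs.
    """
    position = {gene: i for i, gene in enumerate(gene_order)}
    anchors = [position[g] for g in anchor_genes if g in position]
    return {
        gene for gene in family_hits
        if gene in position
        and any(abs(position[gene] - a) <= max_distance for a in anchors)
    }


ORDER = [f"g{i}" for i in range(1, 9)]  # g1 ... g8, hypothetical IDs
print(accept_family_hits(ORDER, {"g3", "g8"}, {"g2"}))  # {'g3'}
```

Counting distance in genes rather than base pairs keeps the rule independent of gene length; a real implementation might instead use predicted operon membership.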

21
Example: Fucose catabolism
  • Transporter to import L-fucose
  • L-fucose to L-fuculose by L-fucose ketol-isomerase
    (5.3.1.25)
  • then to L-fuculose-1-P by L-fuculokinase (2.7.1.51)
  • then to L-lactaldehyde and glycerone-P by
    L-fuculose-P aldolase (4.1.2.17)
      Specific equivalog: TIGR01086 L-fuculose phosphate aldolase
      Generic family: PF00596 Class II Aldolase
  • Plus alcohol and/or aldehyde dehydrogenases
  • Plus transcription factors
22
User Interface Query Page
23
(No Transcript)
24
25
(No Transcript)
27
Detection of components via regular expression
text searches combined with operon prediction
  • Primary search terms (and exclusionary terms)
      • YES: osidase, anase, arabinase, dextrinase,
        osaminidase
      • NO: capsul, export, cyanase, biosynthe,
        nucleosidase, tryptophanase
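The primary and exclusionary terms can be applied to gene annotations roughly as follows. Treating each term as a literal, case-insensitive substring (escaped before being joined into one regular expression) is an assumption; the term lists themselves are copied from the slide. The exclusions matter because several excluded names contain primary terms: "nucleosidase" contains "osidase", and "cyanase" and "tryptophanase" contain "anase".

```python
import re

# Term lists copied from the slide; treating them as literal substrings is an assumption.
PRIMARY_TERMS = ["osidase", "anase", "arabinase", "dextrinase", "osaminidase"]
EXCLUDED_TERMS = ["capsul", "export", "cyanase", "biosynthe", "nucleosidase", "tryptophanase"]


def compile_terms(terms: list[str]) -> re.Pattern:
    """Join literal terms into one case-insensitive alternation."""
    return re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)


PRIMARY_RE = compile_terms(PRIMARY_TERMS)
EXCLUDED_RE = compile_terms(EXCLUDED_TERMS)


def is_primary_match(annotation: str) -> bool:
    """True if the annotation hits a primary term and no exclusionary term."""
    return (PRIMARY_RE.search(annotation) is not None
            and EXCLUDED_RE.search(annotation) is None)


for name in ["beta-glucosidase", "endo-1,4-beta-xylanase", "purine nucleosidase", "cyanase"]:
    print(name, is_primary_match(name))
```

The first two names are accepted as glycoside hydrolases; "purine nucleosidase" and "cyanase" match a primary term but are rejected by the exclusion list.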

29
Detection of components via regular expression
text searches combined with operon prediction
  • Primary search terms (and exclusionary terms)
  • Secondary (optional) search terms
  • Selection rules
      • 2 or more primary matches
      • 5 or more primary + secondary matches
      • Threshold percentage of genes in the operon
        matching search terms
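The selection rules can be sketched as a score over the annotations of one predicted operon. How the rules combine is not stated on the slide, so this sketch assumes (2 or more primary matches OR 5 or more primary + secondary matches) AND a minimum fraction of matching genes, with a hypothetical threshold of 50%. The `is_primary` and `is_secondary` predicates are simplified stand-ins, and the secondary terms are invented.

```python
from dataclasses import dataclass


@dataclass
class OperonScore:
    primary: int     # genes matching a primary term (after exclusions)
    secondary: int   # genes matching only a secondary term
    fraction: float  # share of genes in the operon matching any term


def is_primary(annotation: str) -> bool:
    """Stand-in for the primary/exclusionary term search (hypothetical)."""
    return "osidase" in annotation.lower()


def is_secondary(annotation: str) -> bool:
    """Hypothetical secondary terms, for illustration only."""
    text = annotation.lower()
    return "transporter" in text or "permease" in text


def score_operon(annotations: list[str]) -> OperonScore:
    primary = sum(1 for a in annotations if is_primary(a))
    secondary = sum(1 for a in annotations if not is_primary(a) and is_secondary(a))
    fraction = (primary + secondary) / len(annotations) if annotations else 0.0
    return OperonScore(primary, secondary, fraction)


def operon_selected(score: OperonScore, min_fraction: float = 0.5) -> bool:
    # Assumed combination of the three rules on the slide.
    enough_matches = score.primary >= 2 or score.primary + score.secondary >= 5
    return enough_matches and score.fraction >= min_fraction
```

An operon annotated as beta-glucosidase, alpha-glucosidase, sugar ABC transporter permease and hypothetical protein scores 2 primary and 1 secondary match with a fraction of 0.75, so it is selected; a lone beta-glucosidase among unmatched genes is not.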

30
Genome Properties - Content
  • biological niche
      • animal pathogen
      • human pathogen
      • optimum salinity
      • optimal growth temperature
      • optimal pH
      • oxygen requirement
      • plant pathogen
      • temperature environment
  • cell surface component
      • capsule
      • flagella
      • outer membrane
      • peptidoglycan (murein) biosynthesis
      • S-layer
      • type IV pilus
  • cellular growth, organization and division
      • cell shape
      • minCDE system
      • mreBCD system
  • protein transport
      • Sec-system protein translocase
      • Tat (Sec-independent) protein export
      • type I secretion
      • type II secretion
      • type III secretion
      • type IV secretion
  • quantitative content
      • amino acid abundance
      • count of DNA molecules
      • count of predicted proteins
      • count of tRNAs
      • DNA dinucleotide thermophily metric RRYY-RY-YR
      • DNA GC content
      • DNA size (megabases)
      • functional gene clustering - property level
      • protein average length
  • selfish genetic elements
      • CRISPR region
      • group I intron
      • group II intron
      • inteins
  • metabolism (subcategories)
      • biosynthesis
      • catabolism
      • central intermediary metabolism
      • energy metabolism
      • nucleic acid metabolism
      • protein modification and cofactors
      • storage polymer systems
31
(No Transcript)
32
(No Transcript)
33
(No Transcript)
34
Data mining -- Clustering
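Clustering genomes by their property profiles can be sketched as below: the distance between two genomes is the fraction of shared properties whose states differ, and genomes are grouped by single linkage at a distance threshold. Both the distance measure and the linkage method are assumptions, not necessarily what the authors used, and the profiles are toy data.

```python
def profile_distance(a: dict[str, str], b: dict[str, str]) -> float:
    """Fraction of properties asserted in both genomes whose states differ."""
    shared = a.keys() & b.keys()
    if not shared:
        return 1.0  # nothing to compare: treat as maximally distant
    return sum(a[k] != b[k] for k in shared) / len(shared)


def single_linkage(profiles: dict[str, dict[str, str]], threshold: float) -> list[set[str]]:
    """Group genomes connected by any chain of pairs at distance <= threshold."""
    parent = {name: name for name in profiles}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    names = list(profiles)
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            if profile_distance(profiles[x], profiles[y]) <= threshold:
                parent[find(x)] = find(y)

    groups: dict[str, set[str]] = {}
    for name in names:
        groups.setdefault(find(name), set()).add(name)
    return sorted(groups.values(), key=sorted)


# Toy profiles (hypothetical genomes and properties).
PROFILES = {
    "G1": {"p1": "YES", "p2": "YES", "p3": "NO"},
    "G2": {"p1": "YES", "p2": "YES", "p3": "YES"},
    "G3": {"p1": "NO", "p2": "NO", "p3": "YES"},
    "G4": {"p1": "NO", "p2": "NO", "p3": "YES"},
}
print(single_linkage(PROFILES, 0.34))
```

At a threshold of 0.34 this yields two groups, G1 with G2 and G3 with G4; raising the threshold to 0.7 merges all four, which illustrates single linkage's tendency to chain clusters together.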
35
Applications
  • Catalyst for the generation of new TIGRFAMs HMMs
  • Improvements in HMM quality
  • Greater richness of annotation
  • Quality control on gene calling
  • Gene-name independent annotations
  • Genome process-level summarization
  • Phylogenetic profiling

36
Acknowledgements
Lauren Brinkac
Tanja Davidsen
Dan Haft
Owen White
Nikhat Zafar
Funding: U.S. Dept. of Energy
U.S. National Science Foundation
37
Species: Helicobacter pylori (2 strains sequenced)
  • Phylum: Proteobacteria
  • Cell shape: Rod
  • GC content (average): 39.0
  • Oxygen requirement: Microaerophilic
  • Human pathogen: YES
  • TCA cycle: PARTIAL
  • Flagella: YES
Species: Porphyromonas gingivalis (1 strain sequenced)
  • Phylum: Bacteroidetes
  • Cell shape: Rod
  • GC content: 48.3
  • Oxygen requirement: Anaerobic
  • Human pathogen: YES
  • Selenocysteine: NO
  • Outer membrane: YES
  • Flagella: NO
Species: Yersinia pestis (2 strains sequenced)
  • Phylum: Proteobacteria
  • Cell shape: Coccus
  • GC content: 47.6
  • Oxygen requirement: Aerobic
  • Human pathogen: YES
  • Capsule: proteinaceous
  • Polyketide Natural Products: YES (1)
  • Glycine betaine biosynthesis: YES
  • Flagella: CRYPTIC
Species: Shigella flexneri (2 strains sequenced)
  • Phylum: Proteobacteria
  • Cell shape: Rod
  • GC content: 50.9
  • Oxygen requirement: Facultative anaerobic
  • Human pathogen: YES
  • Selenocysteine: YES
  • Polyketide or NRPS Natural Products: NO
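The per-species summaries above can be held as simple records and queried by property value. This is a minimal sketch with a hypothetical `query` helper; the values are a subset copied from the slide (GC content, strain counts and the polyketide entry for Yersinia pestis are omitted).

```python
SPECIES: dict[str, dict[str, str]] = {
    "Helicobacter pylori": {
        "Phylum": "Proteobacteria", "Cell shape": "Rod",
        "Oxygen requirement": "Microaerophilic", "Human pathogen": "YES",
        "TCA cycle": "PARTIAL", "Flagella": "YES",
    },
    "Porphyromonas gingivalis": {
        "Phylum": "Bacteroidetes", "Cell shape": "Rod",
        "Oxygen requirement": "Anaerobic", "Human pathogen": "YES",
        "Selenocysteine": "NO", "Outer membrane": "YES", "Flagella": "NO",
    },
    "Yersinia pestis": {
        "Phylum": "Proteobacteria", "Cell shape": "Coccus",
        "Oxygen requirement": "Aerobic", "Human pathogen": "YES",
        "Capsule": "proteinaceous", "Glycine betaine biosynthesis": "YES",
        "Flagella": "CRYPTIC",
    },
    "Shigella flexneri": {
        "Phylum": "Proteobacteria", "Cell shape": "Rod",
        "Oxygen requirement": "Facultative anaerobic", "Human pathogen": "YES",
        "Selenocysteine": "YES", "Polyketide or NRPS Natural Products": "NO",
    },
}


def query(db: dict[str, dict[str, str]], criteria: dict[str, str]) -> list[str]:
    """Return species whose records match every (property, value) pair in criteria."""
    return sorted(
        name for name, props in db.items()
        if all(props.get(prop) == value for prop, value in criteria.items())
    )


print(query(SPECIES, {"Human pathogen": "YES", "Flagella": "NO"}))
# ['Porphyromonas gingivalis']
print(query(SPECIES, {"Phylum": "Proteobacteria", "Cell shape": "Rod"}))
# ['Helicobacter pylori', 'Shigella flexneri']
```

A property missing from a record never matches, so absence of an assertion is kept distinct from an explicit NO.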