Title: An Ontological Approach for Describing Phospho-proteins in Rhodococcus
1An Ontological Approach for Describing
Phospho-proteins in Rhodococcus
- Dept. of Computer Science,
- University of British Columbia.
- Dennis Wang, Gavin Ha, Jennifer Chen, Nancy Wang
- CPSC 445, April 5th, 2007
2What is an ontology?
- Purpose
- Knowledge representation and reasoning
- Facilitates knowledge sharing and reuse
- Definition
- A data model that represents a set of concepts
within a domain and the relationships between
those concepts.
- It is used to reason about the objects within
that domain.
- Describes individuals (instances), classes
(concepts), attributes, relations and axioms
- Uses
- AI, information architecture, semantic web,
software engineering
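The definition above can be made concrete with a small sketch. The code below is a minimal illustration, not a real ontology toolkit: the `Ontology` class, its method names, and the example classes and individuals (`Protein`, `Enzyme`, `Phosphatase`, `protein_A`, `protein_B`) are all assumptions introduced here. It stores classes, "is a" axioms, individuals and relations, and answers the simplest kind of reasoning question: which individuals belong to a class, directly or through a subclass.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    # class name -> set of direct parent class names ("is a" axioms)
    subclass_of: dict = field(default_factory=dict)
    # individual name -> set of asserted class names
    types: dict = field(default_factory=dict)
    # (subject, relation, object) facts between individuals
    relations: set = field(default_factory=set)

    def add_class(self, name, parents=()):
        self.subclass_of.setdefault(name, set()).update(parents)
        for parent in parents:
            self.subclass_of.setdefault(parent, set())

    def add_individual(self, name, *classes):
        self.types.setdefault(name, set()).update(classes)

    def ancestors(self, cls):
        """All classes reachable from cls by following subclass axioms."""
        seen = set()
        stack = [cls]
        while stack:
            for parent in self.subclass_of.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def instances_of(self, cls):
        """Individuals whose asserted type is cls or a subclass of cls."""
        return sorted(
            ind for ind, asserted in self.types.items()
            if any(c == cls or cls in self.ancestors(c) for c in asserted)
        )


# Hypothetical example content, for illustration only.
onto = Ontology()
onto.add_class("Protein")
onto.add_class("Enzyme", parents=["Protein"])
onto.add_class("Phosphatase", parents=["Enzyme"])
onto.add_individual("protein_A", "Phosphatase")
onto.add_individual("protein_B", "Protein")

print(onto.instances_of("Enzyme"))
```

Here `protein_A` is only asserted to be a `Phosphatase`, yet it is returned for `Enzyme` because the subclass axioms are followed upward.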
3Problems in biology
- Biology is knowledge based
- Uses prior knowledge to infer new knowledge
- Biology is data rich
- Biologists need extensive prior knowledge to
analyze the data obtained
- The pace of data production is beyond one's ability to
acquire knowledge
- Need an automated system to apply domain experts'
knowledge to biological data
4Solution: ontology and bioinformatics
- Joint effort of biologists and computer scientists
- Build ontologies using domain knowledge
- Rapid classification of large datasets
- Allows queries to find instances of a class
- Create controlled vocabularies for shared use
across different biological and medical domains.
- In bioinformatics, an ontology can make knowledge
available to the community and its applications.
5Example: Gene Ontology (GO)
- Provides structured, controlled vocabularies
and classifications that cover several domains of
molecular biology
- Uses
- Annotation of large data sets
- The ability to group gene products under some
high-level term
- Computational (putative) assignments of molecular
function based on sequence similarity to
annotated genes or sequences.
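Grouping gene products under a high-level term amounts to walking up the "is a" links from each annotation. The sketch below assumes a toy, single-parent hierarchy; the term names are simplified stand-ins rather than the real GO graph (where a term may have several parents), and the gene product names are hypothetical.

```python
from collections import defaultdict

# Toy "is a" links (child -> parent). Illustrative only: a simplified
# stand-in, not the real GO graph.
IS_A = {
    "protein tyrosine phosphatase activity": "phosphatase activity",
    "protein tyrosine kinase activity": "kinase activity",
    "phosphatase activity": "catalytic activity",
    "kinase activity": "catalytic activity",
    "transporter activity": "molecular function",
    "catalytic activity": "molecular function",
}


def lineage(term):
    """Return the term followed by all of its ancestors, nearest first."""
    chain = [term]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain


def group_by(annotations, high_level_terms):
    """Group gene products under the first high-level term in their lineage."""
    groups = defaultdict(list)
    for product, term in annotations.items():
        for ancestor in lineage(term):
            if ancestor in high_level_terms:
                groups[ancestor].append(product)
                break
    return dict(groups)


# Hypothetical annotations, for illustration only.
annotations = {
    "gene_product_1": "protein tyrosine phosphatase activity",
    "gene_product_2": "protein tyrosine kinase activity",
    "gene_product_3": "transporter activity",
}
groups = group_by(annotations, {"catalytic activity", "transporter activity"})
print(groups)
```

The two enzyme annotations end up together under "catalytic activity", even though neither was annotated with that term directly.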
6How are ontologies built?
- There is no standardized methodology
- But there are efforts to make more comprehensive
guidelines
- In general
- Informal Stage
- natural language
- Formal Stage
- formal knowledge representation language
7Ontology-building life cycle
- Inspired by software engineering.
- User Model (Biologist)
- 1) Identification of the purpose and scope of the ontology
- 2) Acquisition of domain knowledge
(Life-cycle diagram labels: Identify purpose and scope; Knowledge Acquisition)
8Ontology-building life cycle
- Conceptualization Model (Bioinformatician/Biologist)
- 3) Identifying key concepts in the domain
- 4) Integration by using and incorporating other existing ontologies
(Life-cycle diagram labels: Identify purpose and scope; Knowledge Acquisition; Building; Conceptualization; Integrating existing ontologies)
9Ontology-building life cycle
- Implementation Model (Bioinformatician)
- 5) Representing concepts with a formal language
- 6) Documenting informal and formal definitions
- 7) Evaluation of the appropriateness of the ontology for its intended application
(Life-cycle diagram labels: Identify purpose and scope; Available Development Tools; Knowledge Acquisition; Language Representation; Building; Conceptualization; Integrating existing ontologies; Encoding; Evaluation)
10Describing Phospho-Proteins using the Phosphabase
Ontology
- Can we use the Phosphabase ontology to describe
phospho-proteins discovered by the Rhodococcus
Genome Project?
11Web Ontology Language (OWL)
- XML syntax
- OWL-DL (Description Logic): certain restrictions
to guarantee decidability based on description
logic
- OWL uses the Resource Description Framework (RDF)
- Subject, Predicate, Object
- Basic components in OWL
- Classes
- Individuals
- Properties
Individual: Anne Condon
Individual: Jennifer Chen
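The subject/predicate/object structure can be sketched with plain Python tuples, without an RDF library. In this sketch `rdf:type`, `owl:Class` and `owl:ObjectProperty` are standard vocabulary terms; the `Person` and `Course` classes, the `presentsIn` property and the `match` helper are assumptions made up for illustration, built around the two individuals named on the slide.

```python
# Each statement is a (subject, predicate, object) triple, as in RDF.
triples = {
    # Classes (hypothetical)
    ("Person", "rdf:type", "owl:Class"),
    ("Course", "rdf:type", "owl:Class"),
    # Property (hypothetical)
    ("presentsIn", "rdf:type", "owl:ObjectProperty"),
    # Individuals
    ("Anne_Condon", "rdf:type", "Person"),
    ("Jennifer_Chen", "rdf:type", "Person"),
    ("CPSC_445", "rdf:type", "Course"),
    # A property assertion linking two individuals
    ("Jennifer_Chen", "presentsIn", "CPSC_445"),
}


def match(pattern, store):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if all(p is None or p == v for p, v in zip(pattern, t))
    )


people = [s for s, _, _ in match((None, "rdf:type", "Person"), triples)]
print(people)
```

Classes, individuals and properties all live in the same set of triples; what distinguishes them is only the object of their `rdf:type` statement.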
12Phosphabase Ontology
Wolstencroft et al., 2006
- Biological Motivation
- Driven by protein domain architecture to describe
signalling protein families
- Background knowledge required for construction
- Signal protein domains
- Presence of protein domains within signal
proteins
- OWL Ontology
- Ontology uses OWL-DL
- Description logic can be applied to classify
proteins using reasoners
- Many different ways to represent this knowledge
in OWL
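The idea of classifying a protein from its domain architecture can be sketched as a subset check. This is a deliberately simplified, closed-world stand-in for description-logic classification, not what a reasoner actually does, and the class definitions below are invented for illustration; they are not taken from Phosphabase. Only the `LMWPc` domain name comes from the Pfam annotation in the instance data.

```python
# Hypothetical class definitions: each class is defined by the set of
# domains a protein must contain. These are NOT Phosphabase's definitions.
CLASS_DEFINITIONS = {
    "TyrosinePhosphataseLike": {"LMWPc"},
    "TwoDomainExample": {"DomainA", "DomainB"},
}


def classify(domains, definitions=CLASS_DEFINITIONS):
    """Return every class whose required domains are all present."""
    present = set(domains)
    return sorted(
        name for name, required in definitions.items()
        if required <= present
    )


print(classify(["LMWPc"]))
print(classify(["DomainA"]))
print(classify(["DomainA", "DomainB", "LMWPc"]))
```

A protein with only part of a required architecture (the second call) matches no class, which is one way a query can come back empty.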
13Phosphabase.owl
14OWL DL Reasoners: Pellet
- Input
- Ontology: OWL-DL format
- Axioms about classes go into the TBox
- Type and property assertions (individuals) go into the
ABox
- Query: RDQL (SPARQL) format
- Instance data (individuals)
- Tableau Reasoner
- Checks satisfiability of an ABox with respect to
a TBox
- Tests for knowledge base consistency
Parsia and Sirin, ISWC 2004
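The TBox/ABox split and the notion of consistency can be illustrated with a toy check. This is not a tableau algorithm and not how Pellet is implemented; it only closes each individual's asserted types under subclass axioms and looks for a clash with a disjointness axiom. All class names, axioms and individuals below are hypothetical.

```python
# TBox: axioms about classes (hypothetical example axioms).
SUBCLASS_OF = {"TyrosinePhosphatase": "Phosphatase", "TyrosineKinase": "Kinase"}
DISJOINT = [("Phosphatase", "Kinase")]


def inferred_types(asserted):
    """Close a set of asserted classes under the subclass axioms."""
    result = set(asserted)
    frontier = list(asserted)
    while frontier:
        parent = SUBCLASS_OF.get(frontier.pop())
        if parent is not None and parent not in result:
            result.add(parent)
            frontier.append(parent)
    return result


def inconsistencies(abox):
    """List (individual, class_a, class_b) clashes with disjointness axioms."""
    clashes = []
    for individual, asserted in sorted(abox.items()):
        types = inferred_types(asserted)
        for a, b in DISJOINT:
            if a in types and b in types:
                clashes.append((individual, a, b))
    return clashes


# ABox: type assertions about individuals (hypothetical individuals).
abox_ok = {"protein_1": {"TyrosinePhosphatase"}, "protein_2": {"TyrosineKinase"}}
abox_bad = {"protein_3": {"TyrosinePhosphatase", "TyrosineKinase"}}

print(inconsistencies(abox_ok))
print(inconsistencies(abox_bad))
```

The second ABox is inconsistent with the TBox only after inference: neither asserted type is declared disjoint from the other, but their superclasses are.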
15Instance Data
Locus ID: RHA1_ro01186
Strain: Rhodococcus sp. RHA1
Replicon: Chromosome (Refseq NC_008268)
Start: 1260414, Stop: 1260866
Protein / Product Name: protein-tyrosine-phosphatase
Refseq GI Number: 111018199
Category: Protein
Localization: Cytoplasmic (Class 3)
Transposon Mutant Available?: No transposon mutant available yet
COG predictions: COG0394 Wzb, Protein-tyrosine-phosphatase. Signal transduction mechanisms.
EC Number: 3.1.3.48
PFAM predictions: PF01451 LMWPc, Low molecular weight phosphotyrosine protein phosphatase.
go_function: protein tyrosine phosphatase activity (goid 0004725)
Gene Name, Alternate gene name(s), Alternate product name(s), PseudoCAP, Comments: (blank)
16Query Result
17Instance Data
Locus ID: RHA1_ro05453
Strain: Rhodococcus sp. RHA1
Replicon: Chromosome (Refseq NC_008268)
Start: 5845588, Stop: 5847288
Protein / Product Name: probable protein-tyrosine kinase
Refseq GI Number: 111022419
Category: Protein
Localization: Cytoplasmic Membrane (Class 3)
Transposon Mutant Available?: No transposon mutant available yet
COG predictions: COG0489 Mrp, ATPases involved in chromosome partitioning. Cell division and chromosome partitioning.
EC Number: 2.7.10.1
TIGRFAM predictions: TIGR01007 eps_fam, capsular exopolysaccharide family (6.7e-46). Role: Transport and binding proteins. Sub Role: Carbohydrates, organic alcohols, and acids.
PFAM predictions: PF02706 Wzz, Chain length determinant protein. This family includes proteins involved in lipopolysaccharide (lps) biosynthesis. This family comprises the whole length of chain length determinant protein (or wzz protein) that confers a modal distribution of chain length on the O-antigen component of lps. This region is also found as part of bacterial tyrosine kinases.
go_component: signal recognition particle (sensu Eukaryota) (goid 0005786)
Gene Name, Alternate gene name(s), Alternate product name(s), PseudoCAP, Comments: (blank)
18Query Result
19Instance Data
Locus ID: RHA1_ro05554
Strain: Rhodococcus sp. RHA1
Replicon: Chromosome (Refseq NC_008268)
Start: 5971327, Stop: 5972865
Protein / Product Name: probable alkaline phosphatase
Refseq GI Number: 111022520
Category: Protein
Localization: Unknown (This protein may have multiple localization sites) (Class 3)
Transposon Mutant Available?: No transposon mutant available yet
COG predictions: COG3540 PhoD, Phosphodiesterase/alkaline phosphatase D. Inorganic ion transport and metabolism.
PFAM predictions: PF00245 Alk_phosphatase, Alkaline phosphatase.
go_component: organelle inner membrane (goid 0019866)
Gene Name, Alternate gene name(s), Alternate product name(s), TIGRFAM predictions, Comments: (blank)
No Result
20Conclusions
- Ontologies can be used as a standard model for
the exchange of biological information
- Building ontologies can get very complicated
- Biologists with little description logic training
- Computer scientists with little knowledge of
biology
- Need more bioinformaticians
- Ontologies can facilitate automated annotation of
genes / gene products
- Difficult to read and infer from ontologies
- Ontologies can get very big (Phosphabase is only a
small example)
- Reasoners are sometimes slow and inaccurate
www.quicklybored.com
21Acknowledgements
- Rhodococcus sp. RHA1 data
- Eltis Lab: Dr. Lindsay Eltis, Dept. Microbiology and
Biochemistry
- Phosphabase Ontology
- Wolstencroft Lab, University of Manchester, UK
- Bioinformatics paper: Wolstencroft et al., 2006
- Phosphabase Ontology processing
- Benjamin Good, iCAPTURE Centre, Vancouver