A flexible graph-based controlled vocabulary engine

1
A flexible graph-based controlled vocabulary engine
  • Johann Visagie
  • <johann_at_egenetics.com>

2
Background
  • Implementation of a controlled vocabulary engine
  • Basis for a more complex profiling system that
    will aid in the identification of disease gene
    candidates by integrating:
      • transcript information
      • a standardised controlled vocabulary of
        expression terms
      • genomic sequence
      • genetic mapping information

3
Structure of the Vocabulary
  • Orthogonal set of hierarchical schemas (trees)
  • Each schema describes an expression domain, e.g.
    Anatomical site, Pathology, Development Stage,
    Cell Type
  • A tree's nodes are associated with terms
    describing expression states in that tree's
    domain
  • Mapped 6937 cDNA libraries (incl. dbEST, SAGE),
    each to one or more nodes in as many trees as
    possible
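
The structure on this slide can be sketched roughly as follows. This is a minimal illustration and not the engine's actual code: the `Node` class, the `annotate` and `libraries_under` helpers, the example terms and the library IDs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One term in a hierarchical schema (hypothetical class, for illustration)."""
    term: str
    children: list = field(default_factory=list)
    libraries: set = field(default_factory=set)

    def add_child(self, term):
        child = Node(term)
        self.children.append(child)
        return child

    def subtree(self):
        """Yield this node and all of its descendants."""
        yield self
        for child in self.children:
            yield from child.subtree()


# One tree per expression domain; the trees are independent of each other.
anatomy = Node("anatomical site")
liver = anatomy.add_child("liver")

pathology = Node("pathology")
cancer = pathology.add_child("cancer")
carcinoma = cancer.add_child("carcinoma")


def annotate(library_id, *nodes):
    """Map a cDNA library to one node in each of several trees."""
    for node in nodes:
        node.libraries.add(library_id)


def libraries_under(node):
    """Collect the libraries mapped to a node or to any of its descendants."""
    return set().union(*(n.libraries for n in node.subtree()))


# Made-up library IDs: one library annotated in two domains, one in a single domain.
annotate("lib_001", liver, carcinoma)
annotate("lib_002", liver)
```

Because each library is attached to nodes in several independent trees, a lookup on a broad term such as `cancer` also picks up libraries annotated with a narrower term below it.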

4

5
Graph-based implementation
  • 2nd iteration
  • Python modules implementing hierarchical data
    structures, based on generalised graph library
  • Flexible enough for future experimentation
    (different data structures, multiple relationship
    types, etc.)
  • All operations in-memory
  • Overcomes most limitations of the prior
    implementation (forced unique terms, limited to
    pure trees, speed issues, database-centricity)
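
One way to picture a hierarchy layered on a generalised in-memory graph is sketched below. It is an assumption-laden illustration, not the modules described on the slide: the `Graph` class, its method names, the relation names and the example terms are all hypothetical. Nodes are keyed by ID rather than by term, so terms need not be unique, and a node may have several parents or several relationship types.

```python
from collections import defaultdict
from itertools import count


class Graph:
    """Minimal in-memory directed graph with typed edges (hypothetical sketch)."""

    def __init__(self):
        self._ids = count(1)
        self.terms = {}                 # node id -> term; terms may repeat
        self.edges = defaultdict(set)   # (source id, relation) -> target ids

    def add_node(self, term):
        node_id = next(self._ids)
        self.terms[node_id] = term
        return node_id

    def add_edge(self, source, target, relation="child"):
        self.edges[(source, relation)].add(target)

    def descendants(self, start, relation="child"):
        """Return every node reachable from start along one relation type."""
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for target in self.edges.get((node, relation), ()):
                if target not in seen:
                    seen.add(target)
                    stack.append(target)
        return seen


g = Graph()
root = g.add_node("anatomy")
liver = g.add_node("liver")
digestive = g.add_node("digestive system")
duct_a = g.add_node("duct")   # the same term used for two distinct nodes
duct_b = g.add_node("duct")

g.add_edge(root, liver)
g.add_edge(root, digestive)
g.add_edge(liver, duct_a)
g.add_edge(digestive, duct_a)   # a second parent, so no longer a pure tree
g.add_edge(digestive, duct_b)
g.add_edge(duct_a, liver, relation="part_of")   # a second relationship type
```

Keeping everything in plain dictionaries and sets means traversals run without any database round trips, which is the kind of trade-off the slide alludes to.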

6
Query language
  • Parser for a simplistic Boolean query language, e.g.
    pathology:cancer AND (anatomy:liver OR
    anatomy:stomach)
  • Implicit "query sets"
  • Tool for the power user
  • Each query term resolves to a set of nodes in a
    tree (the node matching the term, and all its
    children), which maps to a set of cDNA libraries
  • Note: Multiple orthogonal classification domains
    allow for construction of almost arbitrary query
    resolution
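
A small recursive-descent evaluator shows how such a query could reduce to set operations on library IDs. This is a sketch under stated assumptions, not the engine's parser: the `domain:term` syntax, the rule that AND binds tighter than OR, the `evaluate` and `resolve` names, and the `INDEX` contents are all hypothetical. `resolve` stands in for the step that expands a term to its node, all its children, and their libraries.

```python
import re


def evaluate(query, resolve):
    """Evaluate a Boolean query to a set of library IDs (hypothetical sketch)."""
    tokens = re.findall(r"\(|\)|[^\s()]+", query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = peek()
        if token is None:
            raise ValueError("unexpected end of query")
        pos += 1
        return token

    def parse_or():
        result = parse_and()
        while peek() == "OR":
            take()
            result = result | parse_and()   # OR is set union
        return result

    def parse_and():
        result = parse_atom()
        while peek() == "AND":
            take()
            result = result & parse_atom()  # AND is set intersection
        return result

    def parse_atom():
        token = take()
        if token == "(":
            result = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return result
        if token in (")", "AND", "OR"):
            raise ValueError(f"unexpected token {token!r}")
        domain, _, term = token.partition(":")
        return set(resolve(domain, term))

    result = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return result


# Made-up index: (domain, term) -> libraries under that term's subtree.
INDEX = {
    ("pathology", "cancer"): {"lib_001", "lib_003", "lib_004"},
    ("anatomy", "liver"): {"lib_001", "lib_002"},
    ("anatomy", "stomach"): {"lib_003"},
}


def resolve(domain, term):
    return INDEX.get((domain, term), set())


hits = evaluate("pathology:cancer AND (anatomy:liver OR anatomy:stomach)", resolve)
```

With this made-up index, `hits` contains the libraries that are both under `cancer` and under either `liver` or `stomach`, which illustrates how independent domains combine into narrower result sets.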

7
Interfaces
  • Python API
  • Under development
  • SOAP v1.1
  • DAS v1.5 (under investigation)
  • wxPython-based GUI
  • Curation
  • Query interface for users

8
Application
  • Proved its worth in a number of SANBI research
    projects
  • Components of controlled vocabulary system are in
    use by a number of groups

9
Acknowledgements
  • Soraya Bardien-Kruger
  • Alan Christoffels
  • Tania Hide
  • Winston Hide
  • Paul Hüsler
  • Janet Kelso
  • Damian Smedley