caBIG Architecture Workspace - PowerPoint PPT Presentation

About This Presentation
Title:

caBIG Architecture Workspace

Slides: 11
Provided by: scott136

Transcript and Presenter's Notes


1
caBIG Architecture Workspace
  • Common Query Language SIG
  • Scott Oster

2
Agenda
  • Introductions
      • Introduce who is here.
      • Are there any caBIG representatives not present
        who should be involved in this SIG?
  • Summary of SIG Goals: an overview of what we are
    attempting to do and what is out of scope.
  • Use Cases: examples of some already-identified
    query use cases.
  • Requirements: details of existing architectural
    decisions that place some requirements on the
    query language.
  • Existing Landscape: what languages and
    technologies are out there as starting points?
      • Standards
      • Implementations
  • Plan of Attack: how we plan to achieve our
    goals, and what the next steps are.

3
Introductions
4
SIG Goals
  • Come to a consensus on the following aspects of
    the query language that will be used to query
    caBIG grid data resources:
      • Requirements
      • Properties
      • Details
  • Define the features and requirements of a query
    engine capable of performing the caBIG query use
    cases, and create or identify a query language that
    meets these criteria.
      • e.g., some engine features will create language
        requirements.
  • Fully understand and document the ramifications
    of a query language choice for the other
    components of caBIG that will be affected.

5
Example Queries
  • Query: I want to collect all microarray data
    (Affy only) available from all cancer centers
    from patients with bladder or ovarian cancer who
    were part of any clinical trial protocol using
    cisplatin within the past five years. In
    addition, I want to know all available tissue
    samples, cancerous and non-cancerous (normal),
    localized within 10 mm of the tumor site from
    this patient group, so that I can perform Affy
    gene expression studies to include with the
    previously performed studies identified by the
    query. Finally, I need all severe adverse
    events for the identified group of patients that
    had a severity rating of 3-4 and are likely
    linked to cisplatin administration.
  • Query: I want all solid tumors, specifically
    lung cancer, that have a diagnosis based on tumor
    pathology. Each diagnosis must have an image of
    the tumor that allows independent verification
    of the diagnosis. Each record retrieved must also
    include either proteomics marker data or
    microarray data (Affy or two-color) so that
    different molecular techniques can be correlated
    with the tumor pathology. In addition, I want all
    protein annotations for the markers and genes
    associated with the proteomics and microarray
    data so that I can perform meta-analyses.
  • Query: I want to retrieve a dataset for all
    patients who have been in at least two clinical
    trials at any cancer center in the US and who
    have had two separate cancer diagnoses (not the
    same cancer diagnosed twice, but two different
    cancers). I also need a comprehensive treatment
    history for each patient. If a patient's
    treatment history is not complete, I do not want
    that patient included in the dataset.

Courtesy of U Penn.
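To make the third use case concrete, here is a minimal sketch of its selection logic evaluated over an in-memory object model. The classes `Patient` and `Treatment`, their fields, and the `complete` flag used to decide whether a treatment history is "complete" are all hypothetical simplifications; the real query would run across many cancer centers' data resources, not a single Python list.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified object model; class and field names are
# illustrative assumptions, not the caBIG/caBIO domain model.
@dataclass
class Treatment:
    description: str
    complete: bool  # assumption: True if this treatment record has no known gaps

@dataclass
class Patient:
    patient_id: str
    trial_ids: set[str] = field(default_factory=set)     # clinical trials enrolled in
    diagnoses: list[str] = field(default_factory=list)   # cancer diagnoses over time
    treatments: list[Treatment] = field(default_factory=list)

def matches_query_three(p: Patient) -> bool:
    """At least two trials, two *different* cancers, complete treatment history."""
    in_two_trials = len(p.trial_ids) >= 2
    # The same cancer diagnosed twice counts only once.
    two_distinct_cancers = len(set(p.diagnoses)) >= 2
    # An empty history is treated as incomplete.
    history_complete = bool(p.treatments) and all(t.complete for t in p.treatments)
    return in_two_trials and two_distinct_cancers and history_complete

def query_three(patients: list[Patient]) -> list[str]:
    return [p.patient_id for p in patients if matches_query_three(p)]

if __name__ == "__main__":
    patients = [
        Patient("p1", {"T1", "T2"}, ["bladder", "ovarian"], [Treatment("cisplatin", True)]),
        Patient("p2", {"T1", "T3"}, ["lung", "lung"], [Treatment("surgery", True)]),
        Patient("p3", {"T2", "T4"}, ["lung", "breast"], [Treatment("radiation", False)]),
        Patient("p4", {"T1"}, ["lung", "breast"], [Treatment("surgery", True)]),
    ]
    print(query_three(patients))  # ['p1']
```

Writing the use case this way exposes decisions the query language must be able to express: counting over an association (trials), distinctness (different cancers), and a universal condition (every treatment record complete).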
6
Existing Language Requirements
  • Expressible in XML
  • Consistent with result sets being XML
  • Capable of representing queries to object model
    views of data
  • Capable of representing complex joins
  • Data resource agnostic (should not have special
    dependencies on a particular underlying storage
    technology)
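As an illustration of how these requirements fit together, the sketch below parses a query written in XML that targets an object-model view (a class, attribute constraints, and an association that acts as a join) and evaluates it against objects held as plain dictionaries. The XML vocabulary (`Query`, `Target`, `Attribute`, `Association`) and the `_class` key are invented for this example and are not a caBIG specification. Because the evaluator only sees object views, the same parsed criteria could in principle be translated for a relational, XML, or other back end.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical XML query format: element and attribute names are
# illustrative assumptions, not a caBIG specification.
QUERY_XML = """
<Query>
  <Target class="Patient">
    <Attribute name="diagnosis" value="bladder cancer"/>
    <Association role="trials" class="ClinicalTrial">
      <Attribute name="agent" value="cisplatin"/>
    </Association>
  </Target>
</Query>
"""

@dataclass
class ObjectCriteria:
    class_name: str
    equals: dict = field(default_factory=dict)        # attribute name -> required value
    associations: dict = field(default_factory=dict)  # role name -> ObjectCriteria

def parse_criteria(elem: ET.Element) -> ObjectCriteria:
    """Turn a <Target> or <Association> element into nested criteria."""
    crit = ObjectCriteria(elem.get("class"))
    for child in elem:
        if child.tag == "Attribute":
            crit.equals[child.get("name")] = child.get("value")
        elif child.tag == "Association":
            crit.associations[child.get("role")] = parse_criteria(child)
    return crit

def matches(obj: dict, crit: ObjectCriteria) -> bool:
    """Evaluate criteria against an object view stored as a dict.

    An association matches if at least one related object matches
    (an existential join); storage details are never consulted.
    """
    if obj.get("_class") != crit.class_name:
        return False
    if any(obj.get(k) != v for k, v in crit.equals.items()):
        return False
    return all(
        any(matches(related, sub) for related in obj.get(role, []))
        for role, sub in crit.associations.items()
    )

if __name__ == "__main__":
    criteria = parse_criteria(ET.fromstring(QUERY_XML.strip()).find("Target"))
    patients = [
        {"_class": "Patient", "id": "p1", "diagnosis": "bladder cancer",
         "trials": [{"_class": "ClinicalTrial", "agent": "cisplatin"}]},
        {"_class": "Patient", "id": "p2", "diagnosis": "bladder cancer",
         "trials": [{"_class": "ClinicalTrial", "agent": "carboplatin"}]},
    ]
    print([p["id"] for p in patients if matches(p, criteria)])  # ['p1']
```

Only equality constraints and existential associations are modeled here; the "complex joins" requirement would need richer constructs than this sketch provides.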

7
Existing Landscape
  • Existing standards (and some not-so-standard ones)
      • XPath / XQuery / XQL (XML)
      • OQL / SQL (object and relational data)
      • RQL / RDQL / RDFQL (RDF)
      • caBIO prototype
      • Any others worth investigating?
  • Existing technologies
      • DQP (OGSA-DAI's Distributed Query Processor,
        which uses OQL)
  • There is plenty of other related work; let's
    discuss what's relevant.
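XPath is one of the candidate standards above, and the requirements call for XML result sets. The sketch below shows what a path-style query over such a result set might look like, using the limited XPath subset supported by Python's `xml.etree.ElementTree` (attribute-equality predicates work; XPath functions and most axes do not). The result-set layout (`ResultSet`, `Patient`, `Sample` and their attributes) is a hypothetical example, not a caBIG schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML result set; element and attribute names are
# illustrative assumptions, not a caBIG result schema.
RESULT_XML = """
<ResultSet>
  <Patient id="p1" diagnosis="lung cancer">
    <Sample id="s1" type="tumor"/>
    <Sample id="s2" type="normal"/>
  </Patient>
  <Patient id="p2" diagnosis="bladder cancer">
    <Sample id="s3" type="tumor"/>
  </Patient>
</ResultSet>
"""

def normal_samples_for(root: ET.Element, diagnosis: str) -> list[str]:
    """Return ids of normal-tissue samples for patients with a diagnosis.

    ElementTree implements only a subset of XPath 1.0. The diagnosis is
    interpolated unescaped, so it must not contain a single quote.
    """
    path = f".//Patient[@diagnosis='{diagnosis}']/Sample[@type='normal']"
    return [s.get("id") for s in root.findall(path)]

if __name__ == "__main__":
    root = ET.fromstring(RESULT_XML.strip())
    print(normal_samples_for(root, "lung cancer"))     # ['s2']
    print(normal_samples_for(root, "bladder cancer"))  # []
```

A full XQuery engine would add joins, constructors, and FLWOR expressions over such documents; the subset here only navigates a single tree.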

8
Short Term Plan of Attack
  • White paper describing the concept of a common
    query language, the system impact a particular
    choice will have, and a description of the
    relationship between clients, query engines, the
    language, and the grid infrastructure.
  • Clearly define the scope of this SIG.
  • Preliminary white paper describing desired
    language features and corresponding requirements.
  • White paper identifying existing candidates, and
    a preliminary tradeoff analysis.
  • Development of a preliminary Strawman proposal,
    which we will iterate on.
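The first white paper is to describe the relationship between clients, query engines, the language, and the grid infrastructure. One possible shape for that relationship, echoing the separation of query interface from execution engine raised at the joint meeting, is sketched here: clients submit a query object to an engine, the engine produces a (deliberately trivial) plan over registered data sources, and each source implements a common access interface. Every class and method name below (`Query`, `DataSource`, `InMemorySource`, `QueryEngine`) is a hypothetical illustration, not a caBIG or OGSA-DAI API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

# All names are hypothetical illustrations of the client / engine /
# data-source split; they are not caBIG or OGSA-DAI interfaces.

@dataclass(frozen=True)
class Query:
    """A tiny stand-in for a query in the common language."""
    target_class: str
    attribute: str
    value: str

class DataSource(ABC):
    """Data-source access interface: each storage back end implements this."""

    @abstractmethod
    def supports(self, target_class: str) -> bool: ...

    @abstractmethod
    def fetch(self, query: Query) -> list[dict]: ...

class InMemorySource(DataSource):
    def __init__(self, name: str, records: dict[str, list[dict]]):
        self.name = name
        self._records = records  # class name -> list of object views

    def supports(self, target_class: str) -> bool:
        return target_class in self._records

    def fetch(self, query: Query) -> list[dict]:
        return [r for r in self._records.get(query.target_class, [])
                if r.get(query.attribute) == query.value]

class QueryEngine:
    """Middleware: maps a common-language query to a plan over sources."""

    def __init__(self, sources: list[DataSource]):
        self._sources = sources

    def plan(self, query: Query) -> list[DataSource]:
        # Trivial plan: send the query to every source exposing the class.
        return [s for s in self._sources if s.supports(query.target_class)]

    def execute(self, query: Query) -> list[dict]:
        results: list[dict] = []
        for source in self.plan(query):
            results.extend(source.fetch(query))
        return results

if __name__ == "__main__":
    center_a = InMemorySource("A", {"Patient": [{"id": "a1", "diagnosis": "lung cancer"}]})
    center_b = InMemorySource("B", {"Patient": [{"id": "b1", "diagnosis": "lung cancer"},
                                                {"id": "b2", "diagnosis": "bladder cancer"}]})
    engine = QueryEngine([center_a, center_b])
    hits = engine.execute(Query("Patient", "diagnosis", "lung cancer"))
    print([h["id"] for h in hits])  # ['a1', 'b1']
```

The client depends only on `Query` and `QueryEngine.execute`, so the planning strategy and the set of back ends can change without affecting client code.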

9
(No Transcript)
10
Recap from Arch/VCDE Joint Face-to-Face Meeting
  • A single common language for clients, instead of
    multiple specialized languages
  • A language based on object-oriented views of
    data sources
  • Need to look at how the ability to query over
    semantic/ontology information will be integrated
    into the language
      • A separate set of constructs and language
        extensions may be needed
  • Separation of the query interface from the query
    execution engine
      • A well-defined set of interfaces for clients to
        use
      • A well-defined set of interfaces for data source
        access
      • Middleware service to map queries defined in the
        common language to an execution plan
  • Need to look at how to access analytical services
  • Joins will be important
      • Use of foreign keys and identifiers to support
        efficient execution of equality joins
      • Do not want to limit ourselves to equi-joins only
  • Technologies to start from
      • caBIO
      • XQuery/XPath
      • OQL
      • DQP
      • RDQL (for semantic information and ontologies)
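The joins discussion contrasts equality joins on identifiers with more general joins. The sketch below shows why identifiers help: a hash equi-join indexes one side by key and probes it, costing roughly linear time in the inputs plus output, while an arbitrary-predicate (theta) join falls back to a nested loop that compares every pair. These are textbook join strategies chosen for illustration; the record layouts and the 30-day adverse-event window are hypothetical.

```python
from collections import defaultdict
from typing import Callable

Row = dict

def hash_equi_join(left: list[Row], right: list[Row],
                   left_key: str, right_key: str) -> list[tuple[Row, Row]]:
    """Equality join on an identifier/foreign key.

    Build a hash index on the right side, then probe it once per left row.
    """
    index: defaultdict = defaultdict(list)
    for r in right:
        index[r[right_key]].append(r)
    # .get() avoids inserting empty lists for keys with no match.
    return [(l, r) for l in left for r in index.get(l[left_key], [])]

def nested_loop_join(left: list[Row], right: list[Row],
                     predicate: Callable[[Row, Row], bool]) -> list[tuple[Row, Row]]:
    """General (theta) join: any predicate, but compares every pair of rows."""
    return [(l, r) for l in left for r in right if predicate(l, r)]

def within_30_days(treatment: Row, event: Row) -> bool:
    """Hypothetical non-equality condition: event 0-30 days after treatment."""
    return 0 <= event["day"] - treatment["day"] <= 30

if __name__ == "__main__":
    patients = [{"id": "p1"}, {"id": "p2"}]
    samples = [{"sample": "s1", "patient_id": "p1"},
               {"sample": "s2", "patient_id": "p1"},
               {"sample": "s3", "patient_id": "p2"}]
    eq = hash_equi_join(patients, samples, "id", "patient_id")
    print([(p["id"], s["sample"]) for p, s in eq])
    # [('p1', 's1'), ('p1', 's2'), ('p2', 's3')]

    treatments = [{"tx": "cisplatin", "day": 10}, {"tx": "surgery", "day": 50}]
    events = [{"event": "neutropenia", "day": 25}, {"event": "rash", "day": 90}]
    theta = nested_loop_join(treatments, events, within_30_days)
    print([(t["tx"], e["event"]) for t, e in theta])
    # [('cisplatin', 'neutropenia')]
```

For an equality predicate both functions return the same pairs; the difference is cost, which is why supporting identifiers for equi-joins matters even if the language also allows general join conditions.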