Adapting an Existing Data Service to be caBIG
Silver-level Compliant
Peter Hussey, LabKey Software, Inc., Seattle, WA, USA
Contact: peter_at_labkey.com
Abstract
The National Cancer Institute's caBIG initiative
aims for interoperability of bioinformatics
applications. caBIG envisions that this will be
achieved by encouraging all applications to
implement a standard programming interface and to
register their terms and data objects with a
centralized service. The required programming
interface is essentially defined in terms of the
behavior of applications built using the caCORE
Software Development Kit (SDK). The caCORE SDK is
designed and documented for building a new
application from scratch. Little is documented on
how one might achieve caBIG silver-level
compliance in an application not built with the
caCORE SDK. This poster describes the caCORE SDK
development and build process and how the LabKey
team adapted it to work with their existing
proteomics platform software. The LabKey/CPAS
solution creates a parallel web application that
supports the caBIG programming interface and
accesses LabKey/CPAS data through a SQL View
layer.

caCORE SDK Development Process
There are three phases in the caCORE development
paradigm:
  • Create the UML model elements using the UML
    modeling tool. This is a painstaking task for any
    moderately complex real-world application. The
    application object model is essentially specified
    twice: as a UML Class model and as a UML Data
    model. The Class model corresponds to the objects
    in the application that a developer will
    ultimately use to access the data service. The
    Data model describes the implementation of those
    classes in a relational database. In most cases
    there is a single SQL table that corresponds to a
    single Class object. The data objects are linked
    together through a set of specific relationships
    and attribute values that must all match exactly,
    but are each specified and visible on separate
    property dialogs within Enterprise Architect.
    (Note: the 4.0 SDK has added a very useful
    validation step to the build process that should
    make it much easier to track down and fix
    inconsistencies and omissions in the UML models
    than it was for the LabKey/CPAS team.)
    Figure 2 shows a small subset of the LabKey/CPAS
    UML model in a diagram that combines some of the
    class elements and the data elements.
  • Register the classes and attributes of the UML
    model objects with NCI's Enterprise Vocabulary
    Services (EVS) and the Cancer Data Standards
    Repository (caDSR). The common data element
    identifiers resulting from this step are
    incorporated into the class model objects as
    additional tagged values.
  • Run the SDK build process, creating three runtime
    entities from the model (Figure 2).

Challenges in Adapting an Existing Application to
caCORE
Most large-scale, team-built applications are not
designed using an application generator approach.
LabKey/CPAS is one such application. Yet
LabKey/CPAS still needs to participate in the
interoperability of caBIG. For these situations,
the caCORE SDK can be used to generate a web
application that runs in parallel to an existing
application and exposes a caBIG silver-compliant
programming interface over the data managed by
the non-caCORE application. The main prerequisite
for this architecture is that the data to expose
is held in a relational database. We also made
the significant simplification that the
caCORE-generated web application would expose
read-only interfaces, which is allowed and
appropriate for caBIG compliance. Within this
simplified target, we still encountered
difficulties in the following areas:
  • SQL schema implementation differences from
    caCORE. The caCORE SDK makes several assumptions
    regarding the database schema that may not be
    true for an existing application:
      • A class in the object model to be exposed
        corresponds 1-to-1 with a table in the SQL
        schema.
      • The object identifier maps to a single
        integer primary key in the corresponding
        relational table.
      • A relationship between Class objects
        corresponds to a foreign key in the SQL
        tables.
  • Security integration. An existing application
    will likely have some security implementation
    that logically should extend to the caBIG
    interface. The caCORE SDK documentation, however,
    discusses only the implementation of security in
    a new application, not integration with an
    existing security model.
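One way to find out where an existing schema departs from the single-integer-key assumption is to inspect the table metadata. The following is a minimal sketch only: it uses an in-memory SQLite database as a stand-in for the real one, and the table names (runs, objectproperties), their columns, and the helper functions are hypothetical, chosen for illustration.

```python
import sqlite3


def primary_key_columns(conn, table):
    """Return (name, declared_type) for each primary-key column of `table`."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk);
    # pk is the 1-based position in the primary key, or 0 if not part of it.
    keyed = sorted((r for r in rows if r[5] > 0), key=lambda r: r[5])
    return [(r[1], r[2].upper()) for r in keyed]


def has_single_integer_key(conn, table):
    """True if the table has exactly one primary-key column of integer type."""
    cols = primary_key_columns(conn, table)
    return len(cols) == 1 and cols[0][1] in ("INTEGER", "INT")


def build_schema_demo():
    """Create two hypothetical tables: one conforming, one with a two-column key."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE runs (
            runid INTEGER PRIMARY KEY,
            description TEXT
        );
        CREATE TABLE objectproperties (
            objectid INTEGER NOT NULL,
            propertyid INTEGER NOT NULL,
            value TEXT,
            PRIMARY KEY (objectid, propertyid)
        );
    """)
    return conn


if __name__ == "__main__":
    conn = build_schema_demo()
    for table in ("runs", "objectproperties"):
        print(table, has_single_integer_key(conn, table))
```

Tables that fail such a check are candidates for special handling before the generated web application can read them.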

Introduction
In 2007, the LabKey/CPAS development team set out
to achieve caBIG silver-level compliance for
the MS2 proteomics data managed by CPAS, our
application used by several large cancer center
clients. Achieving compliance proved difficult
because caBIG compliance for a data service is
defined in terms of the behavior of applications
built with the caCORE SDK. LabKey/CPAS was not
designed or built with any reference to the
caCORE SDK. The caBIG compliance guidelines
suggest that building an application with the
caCORE SDK was just one possible implementation
of silver compliance. We found, however, no
precise definition or test for what caBIG silver
compliance meant, in particular what queries a
silver-compliant service needed to support. Our
challenge became finding a way to incorporate the
caCORE runtime architecture into our existing
application with minimal impact on existing code.
The LabKey/CPAS Solution
LabKey/CPAS resolved these challenges through the
creation of a SQL View layer. In our solution,
the UML Data model defines a virtual schema in a
database schema named cabig. We then created a
set of SQL views with the same names and same
columns as the tables in the UML Data model. The
caCORE-generated web application interacts with
these views as if they were tables. The web
application cannot tell the difference. Under the
covers, the view layer passes the queries through
to the original base tables (managed by the
non-caCORE application) and fixes up the
differences along the way. We wrapped the cabig
view definition scripts into a new module of
LabKey/CPAS and included a small set of UI
changes that configure and test caBIG access
for a given folder.
The SDK build process creates three runtime
entities from the model:
  • Database definition scripts, in the form of SQL
    CREATE TABLE commands.
  • A web application that implements the UML Class
    model and can translate requests for objects into
    SQL commands.
  • A set of programming interface libraries that
    enable applications to query, insert, update and
    delete application objects over several different
    communication channels, including local Java
    applications and web service calls.

The caCORE Application Paradigm
The caCORE SDK is based on a software development
paradigm that starts with an abstract model of
the entities represented in a particular
application. Real-world examples of such entities
include identified peptides in an MS2 run or
microarray test results. Entities are usually
related to other entities in known ways. For
example, a single MS2 run entity must have one or
more FASTA databases and may have zero or more
identified peptides. Generally, the
interesting entities in an application are those
stored in the database. There is often a close
correspondence between a row (record) in a SQL
table in the database used by an application and
an instance (single entity) of a class of similar
entities to be exposed by the application. The
caCORE SDK architecture is based largely on the
1-to-1 correspondence between an application
class and a SQL table.
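The entity relationships in the MS2 example can be sketched as a small class model. This is an illustration only: the class and attribute names (Ms2Run, FastaDatabase, Peptide, file_name, sequence) are hypothetical and are not taken from the LabKey/CPAS UML model. Under the 1-to-1 paradigm, each class would correspond to one SQL table and each instance to one row.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FastaDatabase:
    id: int          # single integer identifier, as caCORE expects
    file_name: str


@dataclass
class Peptide:
    id: int
    sequence: str


@dataclass
class Ms2Run:
    id: int
    # Cardinality 1..*: a run must reference at least one FASTA database.
    fasta_databases: List[FastaDatabase]
    # Cardinality 0..*: a run may have no identified peptides.
    peptides: List[Peptide] = field(default_factory=list)

    def __post_init__(self):
        if not self.fasta_databases:
            raise ValueError("an MS2 run needs at least one FASTA database")


if __name__ == "__main__":
    run = Ms2Run(id=1, fasta_databases=[FastaDatabase(10, "example.fasta")])
    print(len(run.fasta_databases), len(run.peptides))
```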
The view layer solves the issues described above:
  • Security integration. Since data access in
    LabKey/CPAS is granted on a folder-by-folder
    basis, we wanted to enable or disable caBIG
    access by folder. We added a single true/false
    caBIGPublished column to our existing
    core.Containers table. This bit is turned on and
    off by the Publish button accessible on a
    project's Permissions page. The corresponding
    Containers view in the cabig schema includes the
    restriction WHERE caBIGPublished = true. All of
    the other view definitions in the cabig schema
    include an inner join to the cabig.Containers
    view. As a result, the caBIG interface sees only
    data in those containers that have been
    published.
  • Data model compliance. Most of the underlying
    CPAS tables have a single integer primary key,
    but a few had two-column integer keys. To meet
    caCORE's requirement for a single-column key,
    the SQL View definition computes a sum that
    combines both key columns:
    SELECT ((4294967296 * op.propertyid) + op.objectid) AS id, ...
    As a second example, the PeptidesData table in
    CPAS is used to store score values from different
    search engines in generically named ScoreX
    columns. For caBIG, we chose to represent the
    scores for different engines as different objects
    (preserving the 1-to-1 paradigm). We handled this
    difference in the view layer by creating a view
    per search engine, with the appropriate filter.
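The two view-layer techniques above can be tried out in miniature. The sketch below makes several assumptions: SQLite stands in for the production database, schema-qualified names such as core.Containers are flattened to core_containers and cabig_containers, the base table objectproperties and its columns are hypothetical, and the id expression is assumed to multiply propertyid by 4294967296 (2^32) and add objectid.

```python
import sqlite3

KEY_SHIFT = 4294967296  # 2**32, the multiplier used in the id expression


def build_view_demo():
    """Create simplified base tables, the cabig-style views, and sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Base tables owned by the existing application (simplified).
        CREATE TABLE core_containers (
            containerid INTEGER PRIMARY KEY,
            name TEXT,
            caBIGPublished INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE objectproperties (
            objectid INTEGER NOT NULL,
            propertyid INTEGER NOT NULL,
            containerid INTEGER NOT NULL
                REFERENCES core_containers (containerid),
            value TEXT,
            PRIMARY KEY (objectid, propertyid)
        );

        -- Security: only published containers are visible.
        CREATE VIEW cabig_containers AS
            SELECT containerid, name
            FROM core_containers
            WHERE caBIGPublished = 1;

        -- Every other view inner-joins to cabig_containers and exposes
        -- a single-column id derived from the two-column key.
        CREATE VIEW cabig_objectproperty AS
            SELECT ((4294967296 * op.propertyid) + op.objectid) AS id,
                   op.value,
                   c.containerid
            FROM objectproperties op
            INNER JOIN cabig_containers c
                ON c.containerid = op.containerid;

        INSERT INTO core_containers VALUES (1, 'published', 1);
        INSERT INTO core_containers VALUES (2, 'private', 0);
        INSERT INTO objectproperties VALUES (7, 3, 1, 'visible');
        INSERT INTO objectproperties VALUES (8, 3, 2, 'hidden');
    """)
    return conn


def split_id(combined):
    """Recover (propertyid, objectid) from a combined id.

    Only valid when objectid is non-negative and below 2**32.
    """
    return divmod(combined, KEY_SHIFT)


if __name__ == "__main__":
    conn = build_view_demo()
    for row in conn.execute("SELECT id, value FROM cabig_objectproperty"):
        print(row, split_id(row[0]))
```

The combined id is unique only while objectid stays below 2^32, so the same limit would apply to any real table handled this way.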

Figure 2. The caCORE SDK Build process
caCORE Runtime Architecture
In a software application based on the caCORE
design, developers write web pages and
program-to-program applications using the API
generated by the SDK build process. The web
application handles both read and write access to
the underlying SQL database in order to support
the creation and management of application
objects.
  • At the core of the generated caCORE web
    application is Hibernate, an open source
    middleware layer for mapping Java programming
    objects into SQL table objects and vice versa
    (Figure 3). The caCORE SDK build process
    translates the UML model into configuration files
    that allow Hibernate to construct complex queries
    by translating relationships between objects into
    SQL JOIN constructs. Hibernate allows
    programmers to issue database queries in a simple
    Query By Example (QBE) format. The use of
    Hibernate in the caCORE runtime yields several
    benefits:
  • It avoids mixing SQL commands into application
    code, a common source of bugs in web database
    applications.
  • It is highly configurable, allowing the developer
    to tune the way Hibernate translates object
    access into SQL.
  • It supports a standardized Hibernate Query
    Language (HQL) that looks like SQL but works
    unchanged across all supported relational
    databases, allowing the developer to issue more
    complex queries than can be expressed via the
    standard QBE mechanism.
  • The caCORE SDK allows a developer or analyst to
    turn application model knowledge into a
    working web database application that would
    otherwise be very difficult and expensive to
    build from scratch.
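The idea behind Query By Example can be illustrated with a toy translator: the caller supplies a partially filled example object, and every attribute that is set becomes an equality condition. This sketch is not Hibernate's API and does not reproduce how Hibernate generates SQL. The peptide table, its columns, and the function names are hypothetical, and SQLite is used only so the example runs on its own.

```python
import sqlite3

# Hypothetical mapping from exposed tables to their queryable columns.
QUERYABLE = {"peptide": ("id", "sequence", "charge")}


def query_by_example(conn, table, example):
    """Return rows of `table` matching every non-None attribute in `example`."""
    columns = QUERYABLE[table]  # raises KeyError for tables not exposed
    criteria = {k: v for k, v in example.items() if v is not None}
    unknown = set(criteria) - set(columns)
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if criteria:
        # Values are bound as parameters; only vetted names are interpolated.
        sql += " WHERE " + " AND ".join(f"{name} = ?" for name in criteria)
    sql += " ORDER BY id"
    return conn.execute(sql, list(criteria.values())).fetchall()


def build_peptide_demo():
    """Create a small hypothetical peptide table with three rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE peptide (
            id INTEGER PRIMARY KEY,
            sequence TEXT,
            charge INTEGER
        );
        INSERT INTO peptide VALUES (1, 'PEPTIDEK', 2);
        INSERT INTO peptide VALUES (2, 'SAMPLER', 2);
        INSERT INTO peptide VALUES (3, 'PEPTIDEK', 3);
    """)
    return conn


if __name__ == "__main__":
    conn = build_peptide_demo()
    print(query_by_example(conn, "peptide", {"sequence": "PEPTIDEK", "charge": None}))
```

The application code never assembles SQL text from user values, which is the separation the first benefit above refers to.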

Figure 4. The caCORE implementation for CPAS
Conclusion
Our efforts to adapt our existing comprehensive
proteomics application to achieve caBIG silver
compliance proved successful once we decided on
the basic approach of running the caCORE
SDK-generated web application in parallel to
LabKey/CPAS. In our design, the SDK-generated
application accesses the relational data through
a set of views that handle some of the tricky
mapping and security problems. The views also act
as a buffer between the underlying base tables
and the web application, allowing names to change
in one place without affecting the other. In the
future, it will be relatively easy for LabKey to
expand the scope of our caBIG interface to
incorporate any data managed by LabKey Server. In
fact, putting the data into LabKey may well be
the fastest way for a developer to achieve silver
compliance for a data service, while at the same
time gaining many of the data analysis and
management features that are built into the
LabKey platform.
One of the design goals of the caCORE
architecture is to create an interoperability
standard that is not tied to a single programming
language. So in the caCORE development paradigm,
the developer describes objects and their
relationships in Unified Modeling Language
(UML). UML is a high-level, primarily graphical
approach to defining a programming project. UML
is implemented by a number of tools including
Enterprise Architect and ArgoUML, the two tools
supported by the current caCORE SDK (version
4.0). UML modeling, however, is only partly
standardized. It is difficult, for example, to
transfer a model between tools without losing
information in the transfer.
Figure 3. The caCORE runtime architecture