Title: caBIG Overview
1. caGrid 0.5
caGrid Team Mission: Define the caBIG system architecture that satisfies the requirements of the caBIG Community
August 2005
2. caGrid
[Diagram labels: Grid-Enabled Client, Grid Portal, Analytical Service, Grid Data Service, Research Center (two), NCICB, Tool 1, Tool 2, Tool 3, Tool 4]
3. Architectural Considerations
- Requirements
  - Support scientific requirements: use cases from the cancer research community
  - Support functional requirements: identifiers, workflow, query, etc.
  - Support non-functional requirements: security, reliability, performance, open source, etc.
- Principles
  - Driven by cancer research community requirements
  - Focus on solving a business problem, not a technology problem
  - Services-Oriented Architecture
  - Metadata driven and implements virtualization
  - Expose objects, not backend databases (like RDBMS)
  - Standards, compatibility, and community acceptance
  - OGSA / OGSI
4. Architectural Considerations
- Characteristics
  - caGrid presents an object-oriented view of data
  - Data types are well-defined and registered in a repository
  - Defined by XSD and ISO/IEC 11179
  - Described by UML and semantic ontologies
  - Formal harmonization and curation process
  - Standardized metadata facilitates discovery
- Leverage existing technologies
  - caDSR, EVS, Mobius GME: common data elements, controlled vocabularies, schema management
  - Globus Toolkit (currently version 3.2.1)
    - Core grid services infrastructure
    - Service deployment, service registry, invocation, secure communication
  - OGSA-DAI (currently version 5.0)
    - Core support for data services
5. caGrid 0.5 Architecture
[Layered architecture diagram. Labels: Functions, Quality of Service, Business Process, Semantic service, ID Resolution, GUMS, Analytical, UI, Security, Resource Management, caDSR, Service Registry, Service, GSI, OGSA-DAI, GT3, GME, Index, Service Description, Grid Communication Protocol, Globus Toolkit, CAMS, Transport, EVS]
6. Grid View
[Diagram labels: caBIO, caArray, rProteomics, Other caBIG Data Resource, Other caBIG Analysis Tool, Resource API, caBIG Data Resource, caBIG Analytical Service, GRAM, Security, Identifiers, OGSA-DAI, caDSR, EVS, Query, Invocation, Globus, Registry, Grid client API, GUI, Admin]
- Data source exposed as objects
- Well-defined objects using caDSR / EVS
- Mobius GME for schemas
- Metadata identifies services, objects exposed, relationships between objects, and relationships between services
- Standard Grid interfaces
- Standard query language and interface
- Advertisement and Discovery
- Security
- Invocation / Schedule
- Execution / coordination
7. caGrid
caGrid Toolkit/Infrastructure
8. caGrid Metadata and Data Description
- Client and service APIs are object oriented, and operate over well-defined and curated data types
- Objects are defined in UML and converted into Administered Components, which are in turn registered in the Cancer Data Standards Repository (caDSR)
- Object definitions draw from vocabulary registered in the Enterprise Vocabulary Services (EVS), and their relationships are thus semantically described
- XML serialization of objects adheres to XML schemas registered in the Global Model Exchange (GME)
- All data in caGrid travel between services, and between clients and services, as XML documents that conform to well-defined schemas stored in GME
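The idea of objects traveling as schema-conformant XML can be sketched as follows. This is a minimal illustration, not caGrid code: the `Gene` type, the `urn:example:cabig/gene` namespace, and the `SCHEMA_REGISTRY` dictionary are all assumptions, and the dictionary only stands in for real XML Schemas held in GME (it checks element names and order, not full XSD validity).

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, asdict

# Hypothetical namespace; real caGrid schemas are registered in the GME.
NS = "urn:example:cabig/gene"

# Stand-in for a schema registry: namespace -> required child elements, in order.
SCHEMA_REGISTRY = {NS: ("symbol", "taxon")}


@dataclass
class Gene:
    """Illustrative data type; not an actual caDSR-registered class."""
    symbol: str
    taxon: str


def to_xml(gene: Gene) -> str:
    """Serialize a Gene as a namespace-qualified XML document."""
    root = ET.Element(f"{{{NS}}}Gene")
    for name, value in asdict(gene).items():
        ET.SubElement(root, f"{{{NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


def conforms(xml_text: str) -> bool:
    """Check the document's child elements against the registered element list."""
    root = ET.fromstring(xml_text)
    namespace = root.tag[1:].split("}")[0] if root.tag.startswith("{") else ""
    expected = SCHEMA_REGISTRY.get(namespace)
    if expected is None:
        return False  # unknown or missing namespace: no schema to check against
    actual = tuple(child.tag.split("}")[-1] for child in root)
    return actual == expected


doc = to_xml(Gene(symbol="TP53", taxon="Homo sapiens"))
print(conforms(doc))
```

A receiving service would run the equivalent of `conforms` before deserializing, so that only documents matching a registered schema are accepted.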
9. caGrid 0.5 Services -- Metadata and Registry
- Metadata and Registry Services
  - Support for Advertisement and Discovery processes
  - Metadata and registry services maintain metadata associated with data and analytical services
  - All services register information to an Index Service
  - Services can be discovered using semantics of their data types
- Three types of Service Metadata
  - Common Metadata describes generic information about the Cancer Center providing the service
  - Data Service Metadata describes the data exposed using terminology and objects from caDSR/EVS
  - Analytical Service Metadata describes the supported operations and their inputs and outputs using terminology and objects from caDSR/EVS
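The registration and discovery flow above can be sketched with a toy in-memory registry. Everything here is an assumption made for illustration: the `ServiceEntry` and `IndexService` classes, their field names, and the example URLs are hypothetical and do not reflect the actual Index Service interface.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceEntry:
    """One advertised service; field names are illustrative, not caGrid's."""
    url: str
    common: dict                                    # Common Metadata: providing center
    data_types: set = field(default_factory=set)    # Data Service Metadata
    operations: dict = field(default_factory=dict)  # Analytical: name -> (input, output)


class IndexService:
    """Toy in-memory stand-in for the grid Index Service."""

    def __init__(self):
        self._entries = []

    def register(self, entry: ServiceEntry) -> None:
        self._entries.append(entry)

    def discover(self, type_name: str) -> list:
        """Return URLs of services whose metadata mentions the given data type."""
        hits = []
        for entry in self._entries:
            operation_types = {t for io in entry.operations.values() for t in io}
            if type_name in entry.data_types or type_name in operation_types:
                hits.append(entry.url)
        return hits


index = IndexService()
index.register(ServiceEntry(
    url="https://example.org/services/genes",
    common={"center": "Example Cancer Center"},
    data_types={"Gene", "Protein"},
))
index.register(ServiceEntry(
    url="https://example.org/services/cluster",
    common={"center": "Example Cancer Center"},
    operations={"cluster": ("ExpressionMatrix", "ClusterResult")},
))
print(index.discover("Gene"))
```

Discovery here matches on plain type names; in caGrid the match would be made on the semantics of registered caDSR/EVS types rather than on strings.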
10. caGrid 0.5 Services -- Data and Analytical Services
- Data Services
  - Data services present an object view of data sources
  - Objects exposed as data services will comply with common data elements registered in the caDSR/EVS
  - Data Services leverage OGSA-DAI 5.0
  - Currently query only (no update, insert, or delete)
- Analytical Services
  - Analytical Services are base OGSI services
  - Required to be strongly typed with respect to input and output
  - Analytical services input and output objects conforming to registered classes in caDSR
  - Graphical tool to automatically create source code, configuration files, and build process for new analytical services
  - Input and output parameters can be discovered from GME
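The strong-typing requirement on analytical services can be sketched as a wrapper that rejects inputs and outputs whose classes are not registered. This is only an illustration under stated assumptions: `ExpressionMatrix`, `ClusterResult`, the `REGISTERED_TYPES` set, and the `analytical_operation` decorator are hypothetical names, and the set stands in for classes registered in caDSR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpressionMatrix:
    """Illustrative input type: rows of numeric values."""
    values: tuple


@dataclass(frozen=True)
class ClusterResult:
    """Illustrative output type: one label per input row."""
    labels: tuple


# Stand-in for the set of classes registered in caDSR.
REGISTERED_TYPES = {ExpressionMatrix, ClusterResult}


def analytical_operation(input_type, output_type):
    """Declare an operation's input and output types and enforce them at call time."""
    if input_type not in REGISTERED_TYPES or output_type not in REGISTERED_TYPES:
        raise TypeError("operation types must be registered")

    def wrap(func):
        def checked(arg):
            if not isinstance(arg, input_type):
                raise TypeError(f"expected {input_type.__name__}")
            result = func(arg)
            if not isinstance(result, output_type):
                raise TypeError(f"expected {output_type.__name__} result")
            return result

        # Expose the declared signature so it could be advertised as metadata.
        checked.signature = (input_type.__name__, output_type.__name__)
        return checked

    return wrap


@analytical_operation(ExpressionMatrix, ClusterResult)
def threshold_cluster(matrix):
    # Toy "analysis": label each row 1 if its mean exceeds 0.5, else 0.
    return ClusterResult(tuple(int(sum(r) / len(r) > 0.5) for r in matrix.values))


result = threshold_cluster(ExpressionMatrix(((0.1, 0.2), (0.9, 0.8))))
print(result.labels, threshold_cluster.signature)
```

Declaring the types up front is what allows an operation's inputs and outputs to be described in service metadata and discovered by clients.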
11. caGrid 0.5 Services -- Query
- Query services
  - Federated and semantic queries
  - Once the data sources are identified, the researcher can submit queries to data services using the web-based and Windows-based GUIs
  - The researcher specifies the query in a standard way regardless of the data source. The syntax of the query is represented in XML
  - Metadata extracted from caDSR provides information regarding objects exposed
  - Result sets can be transformed and redirected anywhere in the grid. Developers can use the API to implement applications
  - Currently using a custom query language implemented as an activity
  - Queries and results are contained in OGSA-DAI Activities, Perform, and Response Documents
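A source-independent XML query of the kind described above might look like the following sketch. The slides do not show the custom query language, so the element and attribute names (`query`, `target`, `attribute`, `predicate`) and the helper functions `build_query` and `run_query` are invented for illustration; the in-memory object list stands in for a data service backend.

```python
import xml.etree.ElementTree as ET


def build_query(target: str, attribute: str, value: str) -> str:
    """Build a hypothetical XML query: objects of `target` where attribute == value."""
    query = ET.Element("query")
    tgt = ET.SubElement(query, "target", {"name": target})
    ET.SubElement(
        tgt,
        "attribute",
        {"name": attribute, "predicate": "EQUAL_TO", "value": value},
    )
    return ET.tostring(query, encoding="unicode")


def run_query(xml_text: str, objects: list) -> list:
    """Evaluate the query against in-memory objects (dicts carrying a '_type' key)."""
    target = ET.fromstring(xml_text).find("target")
    attr = target.find("attribute")
    return [
        obj
        for obj in objects
        if obj.get("_type") == target.get("name")
        and str(obj.get(attr.get("name"))) == attr.get("value")
    ]


# Stand-in for the objects a data service would expose.
DATA = [
    {"_type": "Gene", "symbol": "TP53"},
    {"_type": "Gene", "symbol": "BRCA1"},
    {"_type": "Protein", "symbol": "TP53"},
]

query_xml = build_query("Gene", "symbol", "TP53")
print(run_query(query_xml, DATA))
```

Because the query names only an object type and its attributes, the same document could be sent to any data service exposing that type, whatever its backend.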
12. caGrid 0.5 Services -- Security
- Secure Communication
  - Authentication: Parties involved can be assured of one another's identity
  - Message Integrity: A message sent by either party is guaranteed to be the same message when it is received
  - Privacy: Communication between the two parties can only be interpreted by the two parties
- Single Sign On
  - Users and Grid Services should have one method of authenticating themselves to the grid, and all services in the grid should accept this method
- Access Control on caBIG Services
  - caBIG services determine which users or services may access them
- User/Organizational Attribute Management
  - Services should have a method for determining the attributes of a requesting party. Such attributes may be needed to service the request; for example, a username and password are needed to perform a query on a relational database on the party's behalf
  - Attributes should be standardized such that they may be used across institutional and application boundaries
- Delegation
  - caBIG services can interact with other caBIG services on a user's behalf
- User/Organization Management
13. caGrid 0.5 Services -- Security
- Core Components
  - Globus Security Infrastructure (GSI)
    - Core security infrastructure
  - Grid User Management Service (GUMS)
    - Grid Service for the management and creation of grid users and grid user credentials
  - Attribute Management Service (CAMS)
    - Grid Service for the management of user/virtual organization attributes
  - Authorization Manager
    - A general interface that a caBIG service calls to determine if a user is authorized to perform operation X on resource Y
    - Can be used to integrate grid security with external authentication/authorization systems
- External Components
  - Local Authentication/Authorization Systems
  - General Authorization Systems (e.g., PERMIS)
  - Grid Authorization Services
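The Authorization Manager idea, a single interface answering "may user U perform operation X on resource Y?", can be sketched as below. The interface name is taken from the slide, but the method name `is_authorized`, the `AclAuthorizationManager` class, the `perform` helper, and the example identities are assumptions, not the caGrid API.

```python
from typing import Protocol


class AuthorizationManager(Protocol):
    """Hypothetical shape of the general interface a service would call."""

    def is_authorized(self, user: str, operation: str, resource: str) -> bool:
        ...


class AclAuthorizationManager:
    """One possible backend: a local access control list of explicit grants."""

    def __init__(self, grants):
        self._grants = set(grants)  # (user, operation, resource) triples

    def is_authorized(self, user: str, operation: str, resource: str) -> bool:
        return (user, operation, resource) in self._grants


def perform(manager: AuthorizationManager, user: str, operation: str, resource: str) -> str:
    """Service-side guard: consult the manager before doing any work."""
    if not manager.is_authorized(user, operation, resource):
        raise PermissionError(f"{user} may not {operation} {resource}")
    return f"{operation} on {resource} performed for {user}"


# Example identities are made up; grid users are typically named by certificate subject.
manager = AclAuthorizationManager({("/O=Example/CN=Alice", "query", "caBIO")})
print(perform(manager, "/O=Example/CN=Alice", "query", "caBIO"))
```

Because the service depends only on the interface, the ACL backend could be swapped for an adapter to a local or external authorization system without changing the service code.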
14. caGrid 0.5 Services
- Portal, GUIs, and Client API
  - Web-based UI
  - Graphical User Interfaces
  - Programmatic access to the grid
15. Deployment and Advertising
caGrid provides an easy way to expose data services in the grid. When the API is generated with the caCORE SDK, no code is required to expose the new data service. The researcher specifies the index service (virtual organization) where the service will be registered.
16. Discovery
Discovery enables researchers to find service providers in the grid. caGrid 0.5 provides web-based and Windows-based discovery applications. The same functionality is available through the API.
17. Analytical Service Creation Tool
- Developer defines the operations of the service and only has to focus on implementing them
- Input and output parameters can be discovered from GME
- Schema types can be automatically downloaded and configured as operation parameters
- Specified types are used to create the necessary Java objects using Globus behind the scenes
18. Test Bed Infrastructure
19. Acknowledgements: caGrid Development Team
- SAIC
- William Sanchez
- Tara Akhavan
- Manav Kher
- Rouwei Wu
- Jijin Yan
- OSU
- Scott Oster
- Shannon Hastings
- Steve Langella
- Tahsin Kurc
- Joel Saltz
- Panther
- Brian Gilman
- Nick Encina
- Oracle
- Ram Chilukuri
- TerpSys
- Gavin Brennan
- Troy Smith
- Wei Lu
- Doug Kanoza
- BAH
- Arumani Manisundaram
- Mike Keller
- Brian Davis
- NCICB
- Peter Covitz
- Avinash Shanbhag
- George Komatsoulis
- Denise Warzel
- Frank Hartel
20. Acknowledgements: Reference Implementations
- Georgetown PIR
- Baris Suzek
- Scott Shung
- Georgetown - caArray
- Colin Freas
- Nick Marcou
- Arnie Miles
- DUKE - rProteomics
- Patrick McConnell
- UPMC - caTIES
- Rebecca Crawley
- Kevin Mitchell
- SAIC
- John Moy (caArray)
- Sumeet Muju (caArray)
- Juergen Lorenz (caArray)
- Andrew Shinohara (Test)
- Mike Connelly (caBIO)
- Jennifer Zeng (caBIO)
- Nafis Zebarjadi (SDK)
21. End of Talk