Title: caBIG Overview
1Data Grid Services Design caGrid 0.5
Manav Kher, Ruowei Wu, Jijin Yan, Ram Chilukuri
August, 2005
2Outline
- Data service architecture.
- Silver compliant data services.
- Silver data services and the grid.
- caBIG Data resources.
- caBIG XML query activity.
- Query language and perform document.
- Data service configuration files.
- Deployment diagram.
- Data services and Grid security
- Future actions.
3Grid Data Service Architecture
Diagram from OGSA-DAI
4Grid Data Service Architecture (Cont.)
Data layer: The data layer consists of caBIG silver data resources (server side).

Business logic layer: This layer encapsulates the core functionality. This includes:
- Execution of Perform documents, which specify sequences of data resource queries and updates, and data transformation and delivery operations.
- Preparation of responses to client requests for data resource query, update, transformation and delivery activities. Responses include execution status information and can also include data. Responses are in the form of Response documents.
- Data transformation and delivery management.
- caBIG data resource and SDK activity.

Presentation layer / business logic layer interface: This interface communicates information between the presentation and business logic layers. It supports invocation of OGSA-DAI functionality within the business logic layer in a way that is independent of any Web or Grid environment, i.e. a way that also allows non-Web-enabled clients to access OGSA-DAI functionality directly.

Presentation layer: This layer encapsulates the functionality relating to exposing OGSA-DAI to a Grid via Web- or Grid-enabled interfaces. For each realization there is associated WSDL and XML Schema describing the Web- or Grid-enabled interfaces. The following presentation layer interfaces are supported:
- OGSA-DAI OGSI-compliant services based on the Globus Toolkit 3.2.
- OGSA-DAI WS-RF-compliant services based on the Globus Toolkit 4.0.
- OGSA-DAI WS-I-compliant services based on Apache Axis 1.2.

Client: OGSA-DAI provides a Client Toolkit which offers a higher level of interaction with OGSA-DAI services than that supported by exchanging Perform and Response documents.
5caGrid Layers
Data Layer - caBIG Object Resource
Metadata and Semantic connector Layer caDSR and
EVS
Grid Layer GT3 and OGSA-DAI
6Silver compliant data services
Standard API
Client Generated
caDSR
CDE
Object Model Generated Server
EVS
Standard Vocabulary
Data
7Silver data resource in the caGrid infrastructure
Standard data grid interface
EVS
caBIG Grid Infrastructure
Query
Silver compliant data services
8caBIG Data Resource - caGrid OGSA-DAI extensions.
DataResourceMediator (from ogsa-dai)
- Reference implementations extend the OGSA-DAI DataResourceMediator abstract class:
SDKDataResourceMediator
CaArrayDataResourceMediator
9Interacting with data resources
OGSA-DAI and the caGrid extensions support interaction with data resources, and other data manipulation operations, via a document-oriented interface:
- Activities are the data resource manipulation, data transformation and delivery actions that a client wants an OGSA-DAI service to perform. Activities are the basic building blocks of Perform documents.
- Perform documents are used by clients to specify to OGSA-DAI services the data resource query and update, data transformation and data delivery activities they want executed.
- Response documents are used by OGSA-DAI services to inform clients of the status of execution of their Perform documents and, often, to return data to the client.
10Activities
Activities include data resource manipulation, data transformation and delivery actions that a client wants an OGSA-DAI service to perform. Some activities are data resource-specific (e.g. relational or XML query activities); others (e.g. delivery and data transformation) are generic. Activities are the basic building blocks of Perform documents.

Activities are designed to inter-operate. For example, the output of a caGridQuery can be directed to a deliverToURL activity, thereby allowing data to be delivered to third parties. To support inter-operation, an activity can have zero or more inputs and zero or more outputs. These outputs can be given specific names and are termed streams.
11caBIG Activity - caGrid OGSA-DAI extensions.
Activity (from ogsa-dai)
- Extends the OGSA-DAI Activity class.
- Query language implementation which represents the data source API.
- When code is generated with the SDK, no additional code is required to expose the data service on the grid.
- One query language regardless of the data source.
CaBIGXMLQueryActivity
SDKXMLQueryActivity
CaArrayXMLQueryActivity
12Perform Document
Perform documents are used by clients to specify
to OGSA-DAI services the data resource query and
update, data transformation and data delivery
activities they want executed. A Perform document
specifies an inter-connected collection of one or
more activities. Activities are connected by
ensuring that the output stream of one activity
is named as the input stream of another activity.
Any activity whose output stream is not referenced by another activity's input stream will have its output inserted into the Response document.
<gridDataServicePerform xsi:schemaLocation="http://ogsadai.org.uk/namespaces/2003/07/gds/types">
  <documentation>This example demonstrates how to parameterise an caBIO</documentation>
  <caBIGXMLQuery name="MyQueryTest1">
    <Target name="gov.nih.nci.cabio.domain.Taxon"
            path="gov.nih.nci.cabio.domain.Taxon">
      <Objects name="gov.nih.nci.cabio.domain.impl.Gene">
        <Property name="id" predicate="equal" value="2"/>
      </Objects>
    </Target>
    <webRowSetStream name="myQueryOutput"/>
  </caBIGXMLQuery>
  <deliverToGDT name="deliverQueryResults">
    <fromLocal from="myQueryOutput"/>
    <toGDT streamId="otherServiceInput" mode="full">http://localhost:8080/www.Georgetown.edu</toGDT>
  </deliverToGDT>
</gridDataServicePerform>
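The stream-wiring rule described on this slide can be checked mechanically. The following is a minimal Python sketch, not part of OGSA-DAI or caGrid. It rests on simplifying assumptions that are not stated in the slides: the Perform document is parsed without namespaces, an output is any element whose tag ends in "Stream" and carries a name attribute, and an input is any element carrying a from attribute. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free Perform document modelled on the slide's example.
PERFORM = """
<gridDataServicePerform>
  <caBIGXMLQuery name="MyQueryTest1">
    <Target name="gov.nih.nci.cabio.domain.Taxon">
      <Objects name="gov.nih.nci.cabio.domain.impl.Gene">
        <Property name="id" predicate="equal" value="2"/>
      </Objects>
    </Target>
    <webRowSetStream name="myQueryOutput"/>
  </caBIGXMLQuery>
  <deliverToGDT name="deliverQueryResults">
    <fromLocal from="myQueryOutput"/>
    <toGDT streamId="otherServiceInput" mode="full">http://localhost:8080/www.Georgetown.edu</toGDT>
  </deliverToGDT>
</gridDataServicePerform>
"""


def stream_wiring(perform_xml):
    """Return (produced, consumed): stream name -> name of the owning activity."""
    root = ET.fromstring(perform_xml)
    produced, consumed = {}, {}
    for activity in root:  # each direct child is treated as one activity
        activity_name = activity.get("name")
        for element in activity.iter():
            # Assumption: output elements have a tag ending in "Stream".
            if element.tag.endswith("Stream") and element.get("name"):
                produced[element.get("name")] = activity_name
            # Assumption: inputs reference a stream through a "from" attribute.
            if element.get("from"):
                consumed[element.get("from")] = activity_name
    return produced, consumed


def unreferenced_outputs(perform_xml):
    """Streams no activity consumes; per the slide, these end up in the Response document."""
    produced, consumed = stream_wiring(perform_xml)
    return sorted(name for name in produced if name not in consumed)


def dangling_inputs(perform_xml):
    """Inputs that name a stream no activity produces (a wiring mistake)."""
    produced, consumed = stream_wiring(perform_xml)
    return sorted(name for name in consumed if name not in produced)


if __name__ == "__main__":
    print("returned in response:", unreferenced_outputs(PERFORM))
    print("dangling inputs:", dangling_inputs(PERFORM))
```

In the slide's example the query output is consumed by the delivery activity, so under these assumptions no stream would be left over for the Response document.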
13Sample XML Query Language
Description Run a caGrid query on a caBIG data
resource API.
<caBIGXMLQuery name="MyQueryTest7">
  <Target name="gov.nih.nci.cabio.domain.Agent">
    <Objects name="gov.nih.nci.cabio.domain.impl.Target">
      <Group LogicRelation="OR">
        <Objects name="gov.nih.nci.cabio.domain.impl.Gene">
          <Property name="id" predicate="equal" value="2"/>
        </Objects>
        <Objects name="gov.nih.nci.cabio.domain.impl.Gene">
          <Property name="symbol" predicate="like" value="Nat"/>
        </Objects>
      </Group>
    </Objects>
  </Target>
</caBIGXMLQuery>
14Query language Specification
- Element caBIGXMLQuery - This represents the activity name.
  - Attribute name - Gives the query a name. The attribute is not currently processed, so the name can be arbitrary.
- Element Target - The target object of a search query.
  - Attribute name - Name of the target object. It should be the fully package-qualified name.
  - Attribute path - Gives the association between the target object and the criteria object. Different paths may produce different search results.
- Element Objects - A search criteria object. In the NCICB object-oriented data model, search criteria are themselves objects.
  - Attribute name - Name of the search criteria object. It should also be the fully package-qualified name.
- Element Group - Under this element, a list or collection of different search objects is composed.
  - Attribute LogicRelation - Currently the only values are "OR" and "AND". "OR" represents a list and "AND" represents a collection.
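The element structure described above can also be assembled programmatically. The following is a minimal Python sketch that rebuilds the sample query from slide 13 with the standard library. The helper names (prop, objects, group, query, slide_13_query) are hypothetical and are not part of the caGrid client API; the sketch only mirrors the elements and attributes shown on the slides.

```python
import xml.etree.ElementTree as ET


def prop(name, predicate, value):
    """One <Property> criterion, e.g. id equal 2."""
    return ET.Element("Property",
                      {"name": name, "predicate": predicate, "value": str(value)})


def objects(class_name, *children):
    """An <Objects> criteria object identified by its package-qualified name."""
    element = ET.Element("Objects", {"name": class_name})
    element.extend(children)
    return element


def group(logic, *children):
    """A <Group> combining criteria; the slides list only OR and AND."""
    if logic not in ("OR", "AND"):
        raise ValueError("LogicRelation must be 'OR' or 'AND'")
    element = ET.Element("Group", {"LogicRelation": logic})
    element.extend(children)
    return element


def query(name, target, *children, path=None):
    """A <caBIGXMLQuery> with a single <Target>; path is optional."""
    root = ET.Element("caBIGXMLQuery", {"name": name})
    attributes = {"name": target}
    if path is not None:
        attributes["path"] = path
    target_element = ET.SubElement(root, "Target", attributes)
    target_element.extend(children)
    return root


def slide_13_query():
    """Rebuild the sample query shown on slide 13."""
    gene = "gov.nih.nci.cabio.domain.impl.Gene"
    return query(
        "MyQueryTest7",
        "gov.nih.nci.cabio.domain.Agent",
        objects(
            "gov.nih.nci.cabio.domain.impl.Target",
            group(
                "OR",
                objects(gene, prop("id", "equal", 2)),
                objects(gene, prop("symbol", "like", "Nat")),
            ),
        ),
    )


if __name__ == "__main__":
    print(ET.tostring(slide_13_query(), encoding="unicode"))
```

Keeping each element behind a small function makes the nesting of Target, Objects and Group explicit and rejects an unsupported LogicRelation before the document is sent.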
15Response document Specification
- Element request (one or more) - the status of the request (the status of execution of the Perform document).
  - Attribute status (required) - the status of the request. This takes one of the following values:
    - PROCESSING - the request is still running.
    - COMPLETED - the request has successfully completed.
    - TERMINATED - the request has been terminated.
    - ERROR - the request encountered a problem.
  - Attribute cause (zero or one) - if the status is ERROR, this attribute holds the name of the activity that caused the error.
- Element result (one or more) - the status of an activity plus, depending upon the activity, any results or other information.
  - Attribute name (required) - the name of the activity. This corresponds to the value of the name attribute of an activity specified within the Perform document.
  - Attribute status (required) - the status of the activity. This takes one of the following values:
    - UNSTARTED - the activity has not yet been started.
    - PROCESSING - the activity is still running.
    - COMPLETED - the activity has successfully completed.
    - ERROR - the activity encountered a problem.
  - Zero or more XML elements containing the results of an activity, if applicable.
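A client typically reduces a Response document to the request status plus the per-activity statuses. The following is a minimal Python sketch of that step under stated assumptions: the root element name (response), the absence of namespaces and the sample content are all illustrative and not taken from the OGSA-DAI schema; only the request and result elements and their attributes follow the specification above.

```python
import xml.etree.ElementTree as ET

REQUEST_STATES = {"PROCESSING", "COMPLETED", "TERMINATED", "ERROR"}
ACTIVITY_STATES = {"UNSTARTED", "PROCESSING", "COMPLETED", "ERROR"}

# Hypothetical, namespace-free Response document used only for illustration.
RESPONSE = """
<response>
  <request status="ERROR" cause="deliverQueryResults"/>
  <result name="MyQueryTest1" status="COMPLETED"/>
  <result name="deliverQueryResults" status="ERROR"/>
</response>
"""


def summarize(response_xml):
    """Return the request status, the failing activity (if any) and activity statuses."""
    root = ET.fromstring(response_xml)
    request = root.find("request")  # the sketch reads the first request element only
    if request is None:
        raise ValueError("Response document has no request element")
    status = request.get("status")
    if status not in REQUEST_STATES:
        raise ValueError("unexpected request status: %r" % status)

    activities = {}
    for result in root.findall("result"):
        activity_status = result.get("status")
        if activity_status not in ACTIVITY_STATES:
            raise ValueError("unexpected activity status: %r" % activity_status)
        activities[result.get("name")] = activity_status

    return {"status": status, "cause": request.get("cause"), "activities": activities}


if __name__ == "__main__":
    print(summarize(RESPONSE))
```

Validating the status values against the two lists above makes a malformed or unexpected response fail early instead of being silently treated as success.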
16Data resource configuration
- Tomcat / Axis
  - Server-config.wsdd
- OGSA-DAI
  - DataResourceConfigCABIG.xml
  - ActivityConfigCABIG.xml
- Index service
  - caGrid-SDE-registration.xml
  - caGrid-SDE-config.xml
- Metadata
  - caGrid-common-metadata.xml
  - cadsr-metadata-extract.xml
17Server-config.wsdd
<service name="cagrid/caBIO" provider="Handler" style="wrapped" use="literal">
  <parameter name="instance-dai.version" value="OGSI 5.0"/>
  <parameter name="instance-schemaPath" value="schema/ogsadai/gds/gds_service.wsdl"/>
  <parameter name="ogsadai.gdsf.config.xml.file" value="/caGrid05/jakarta-tomcat-5.0.30/webapps/ogsa/WEB-INF/etc/_cagrid_caBIO/dataResourceConfigCABIG.xml"/>
  <parameter name="className" value="gov.nih.nci.cagrid.data.stubs.CaGridDataServiceFactoryPortType"/>
  <parameter name="ogsadai.gdsf.activity.xml.file" value="/caGrid05/jakarta-tomcat-5.0.30/webapps/ogsa/WEB-INF/etc/_cagrid_caBIO/activityConfigCABIG.xml"/>
  <parameter name="operationProviders" value="org.globus.ogsa.impl.base.providers.servicedata.ServiceDataProviderManager org.globus.ogsa.impl.core.registry.RegistryPublishProvider org.globus.ogsa.impl.ogsi.NotificationSourceProvider org.globus.ogsa.impl.ogsi.FactoryProvider"/>
  <parameter name="dai.version" value="OGSI 5.0"/>
  <parameter name="baseClassName" value="uk.org.ogsadai.service.gdsf.impl.GridDataServiceFactory"/>
  <parameter name="instance-baseClassName" value="uk.org.ogsadai.service.gds.impl.GridDataService"/>
  <parameter name="serviceConfig" value="etc/_cagrid_caBIO/caGrid-SDE-config.xml"/>
  <parameter name="allowedMethods" value=""/>
  <parameter name="instance-operationProviders" value="org.globus.ogsa.impl.ogsi.NotificationSourceProvider"/>
  <parameter name="registrationConfig" value="etc/_cagrid_caBIO/caGrid-SDE-registration.xml"/>
  <parameter name="schemaPath" value="schema/cagrid/cagdsf/caGridDataServiceFactoryPortType_service.wsdl"/>
  <parameter name="instance-name" value="Data Service"/>
  <parameter name="persistent" value="true"/>
  <parameter name="instance-className" value="uk.org.ogsadai.service.gds.GDSPortType"/>
  <parameter name="activateOnStartup" value="true"/>
  <parameter name="handlerClass" value="org.globus.ogsa.handlers.RPCURIProvider"/>
  <parameter name="factoryCallback" value="uk.org.ogsadai.service.gdsf.impl.GridDataServiceFactoryCallback"/>
  <parameter name="name" value="caGrid Data Service Factory"/>
</service>
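Because this service entry is what ties the other configuration files together, it can be useful to list which files a deployment actually points at. The following is a minimal Python sketch that reads the parameters of a service element like the one on this slide. It assumes a namespace-free fragment as shown here (a complete server-config.wsdd would normally declare the Axis WSDD namespace), and the function names are hypothetical.

```python
import posixpath
import xml.etree.ElementTree as ET

# Abbreviated copy of the slide's service entry (no namespace declared).
WSDD_FRAGMENT = """
<service name="cagrid/caBIO" provider="Handler" style="wrapped" use="literal">
  <parameter name="ogsadai.gdsf.config.xml.file" value="/caGrid05/jakarta-tomcat-5.0.30/webapps/ogsa/WEB-INF/etc/_cagrid_caBIO/dataResourceConfigCABIG.xml"/>
  <parameter name="ogsadai.gdsf.activity.xml.file" value="/caGrid05/jakarta-tomcat-5.0.30/webapps/ogsa/WEB-INF/etc/_cagrid_caBIO/activityConfigCABIG.xml"/>
  <parameter name="serviceConfig" value="etc/_cagrid_caBIO/caGrid-SDE-config.xml"/>
  <parameter name="registrationConfig" value="etc/_cagrid_caBIO/caGrid-SDE-registration.xml"/>
  <parameter name="instance-operationProviders" value="org.globus.ogsa.impl.ogsi.NotificationSourceProvider"/>
</service>
"""

# Parameters whose value is the path of another configuration file.
FILE_PARAMETERS = (
    "ogsadai.gdsf.config.xml.file",
    "ogsadai.gdsf.activity.xml.file",
    "serviceConfig",
    "registrationConfig",
)


def service_parameters(wsdd_xml):
    """Map each parameter name of the service element to its value."""
    service = ET.fromstring(wsdd_xml)
    return {p.get("name"): p.get("value", "") for p in service.findall("parameter")}


def referenced_files(params):
    """File names (without directories) that the service entry refers to."""
    return {name: posixpath.basename(params[name])
            for name in FILE_PARAMETERS if name in params}


def providers(params, key="operationProviders"):
    """Split a whitespace-separated provider list into class names."""
    return params.get(key, "").split()


if __name__ == "__main__":
    parameters = service_parameters(WSDD_FRAGMENT)
    for name, file_name in referenced_files(parameters).items():
        print(name, "->", file_name)
    print(providers(parameters, "instance-operationProviders"))
```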
18ActivityConfigCABIG.xml
<activityConfiguration xmlns="http://ogsadai.org.uk/namespaces/2004/05/gdsf/config"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://ogsadai.org.uk/namespaces/2004/05/gdsf/config http://localhost:8080/ogsa/schema/cagrid/xsd/cabig_activity_config.xsd">
  <!-- Location of the base perform document schema -->
  <basePerformDocumentSchema location="http://localhost:8080/ogsa/schema/cagrid/types/grid_data_service_types.xsd"/>
  <activityMap schemaBase="http://localhost:8080/ogsa/schema/cagrid/xsd/activities/">
    <!-- caGrid specific activities -->
    <activity name="caBIGXMLQuery"
        implementation="gov.nih.nci.cagrid.activity.caBIOXMLQueryActivity"
        schema="caBIG_xml_query.xsd">
      <description>caGrid XML query implementation.</description>
    </activity>
    <!-- Delivery activities -->
  </activityMap>
</activityConfiguration>
19caGrid-SDE-registration.xml
<?xml version="1.0" encoding="UTF-8"?>
<serviceConfiguration xmlns:ogsi="http://www.gridforum.org/namespaces/2003/03/OGSI"
    xmlns:aggr="http://www.globus.org/namespaces/2003/09/data_aggregator"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <registrations>
    <registration registry="http://cagrid-registry.nci.nih.gov:8080/ogsa/services/base/index/IndexService"
        keepalive="true" lifetime="1200" remove="true">
      <aggr:DataAggregation>
        <ogsi:params>
          <aggr:AggregationSubscription>
            <ogsi:serviceDataNames>
              <ogsi:name xmlns:data="http://cagrid.nci.nih.gov/1/caDSRMetadata">data:caDSRMetadata</ogsi:name>
              <ogsi:name xmlns:com="http://cagrid.nci.nih.gov/1/CommonServiceMetadata">com:CommonServiceMetadata</ogsi:name>
            </ogsi:serviceDataNames>
            <aggr:lifetime>60000</aggr:lifetime>
          </aggr:AggregationSubscription>
        </ogsi:params>
      </aggr:DataAggregation>
    </registration>
  </registrations>
</serviceConfiguration>
20caGrid-SDE-config.xml
<?xml version="1.0" encoding="UTF-8"?>
<serviceConfiguration xmlns:ogsi="http://www.gridforum.org/namespaces/2003/03/OGSI"
    xmlns:aggregator="http://www.globus.org/namespaces/2003/09/data_aggregator"
    xmlns:provider-exec="http://www.globus.org/namespaces/2003/04/service_data_provider_execution"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <installedProviders>
    <providerEntry class="org.globus.ogsa.impl.base.providers.servicedata.impl.AsyncDocumentProvider"/>
  </installedProviders>
  <executedProviders>
    <provider-exec:ServiceDataProviderExecution>
      <provider-exec:serviceDataProviderName>AsyncDocument</provider-exec:serviceDataProviderName>
      <provider-exec:serviceDataProviderImpl>org.globus.ogsa.impl.base.providers.servicedata.impl.AsyncDocumentProvider</provider-exec:serviceDataProviderImpl>
      <provider-exec:serviceDataProviderArgs>-i 60000 -f etc/_cagrid_caBIO/caGrid-common-metadata.xml</provider-exec:serviceDataProviderArgs>
      <provider-exec:refreshFrequency>360</provider-exec:refreshFrequency>
      <provider-exec:async>true</provider-exec:async>
    </provider-exec:ServiceDataProviderExecution>
    <provider-exec:ServiceDataProviderExecution>
      <provider-exec:serviceDataProviderName>AsyncDocument</provider-exec:serviceDataProviderName>
      <provider-exec:serviceDataProviderImpl>org.globus.ogsa.impl.base.providers.servicedata.impl.AsyncDocumentProvider</provider-exec:serviceDataProviderImpl>
      <provider-exec:serviceDataProviderArgs>-i 60000 -f etc/_cagrid_caBIO/cadsr-metadata-extract.xml</provider-exec:serviceDataProviderArgs>
      <provider-exec:refreshFrequency>360</provider-exec:refreshFrequency>
      <provider-exec:async>true</provider-exec:async>
    </provider-exec:ServiceDataProviderExecution>
  </executedProviders>
</serviceConfiguration>
21caGrid-common-metadata.xml
<CommonServiceMetadata xmlns="http://cagrid.nci.nih.gov/1/CommonServiceMetadata">
  <researchCenterInfo>
    <researchCenterBioDataType>Biology</researchCenterBioDataType>
    <researchCenterName>Research Center</researchCenterName>
    <researchCenterType>XYZ</researchCenterType>
    <researchCenterAddress>6116 Executive Dr.</researchCenterAddress>
    <researchCenterPhone>301-451-1234</researchCenterPhone>
    <researchCenterFax>301-451-1234</researchCenterFax>
    <researchCenterPOCName>John Brown</researchCenterPOCName>
    <researchCenterDescription>Good</researchCenterDescription>
    <researchCenterComments>Testing caGrid</researchCenterComments>
  </researchCenterInfo>
</CommonServiceMetadata>
22Semantic Metadata UML to caDSR Mapping
- UML Class is mapped to an Object Class
- Attribute of a UML Class is mapped to a Property
- Combination of UML Class and Attribute is mapped to a Data Element Concept
- Combination of UML Class, Attribute and its Datatype is mapped to a Data Element (CDE)
- UML Class and its attributes are based on EVS concepts
- UML Model/Project is mapped to a Classification Scheme
- Packages are mapped to classification scheme items
23Domain Model Metadata - caDSR
- Unique identifier: public ID and version
- Short name: acronym of the project to which the domain model belongs
- Long name: full name of the project
- Detailed description of the project
24Domain Object Metadata - caDSR
- Unique identifier consisting of public ID and version
- Package-qualified domain object name, long name and description based on EVS concepts
- EVS concepts it is based on
  - Concept Code, Concept Preferred Name, Concept Definition
25Domain Object Attribute Metadata - caDSR
- Unique identifier consisting of public ID (CDE ID) and version
- Attribute name, long name and description based on EVS concepts
- EVS concept codes it is based on
- Value Domain information
- Datatype
- Permissible Values
- Concept codes
- List contained within each Domain Object
26Domain Object Association Metadata - caDSR
- Describes a named relationship (source -> target) between two Domain Objects
- Uses references to the domain object unique identifier (public ID and version) instead of a value copy
- List contained within each Domain Object
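Storing associations as references rather than copies means a consumer has to resolve each (public ID, version) pair against the domain objects it already holds. The following is a minimal Python sketch of that lookup. The class and field names are hypothetical, and the identifiers in the sample data are made up for illustration; they are not taken from caDSR.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Ref = Tuple[str, str]  # (public ID, version)


@dataclass
class Association:
    name: str
    source: Ref  # unique identifier of the owning domain object
    target: Ref  # unique identifier of the related domain object


@dataclass
class DomainObject:
    public_id: str
    version: str
    class_name: str
    # Each domain object carries its own list of associations.
    associations: List[Association] = field(default_factory=list)

    @property
    def ref(self) -> Ref:
        return (self.public_id, self.version)


def resolve_associations(domain_objects: List[DomainObject]):
    """Turn (source, name, target) references into class names via an index."""
    index: Dict[Ref, DomainObject] = {obj.ref: obj for obj in domain_objects}
    resolved = []
    for obj in domain_objects:
        for association in obj.associations:
            target = index[association.target]  # KeyError if the reference dangles
            resolved.append((obj.class_name, association.name, target.class_name))
    return resolved


def sample_model() -> List[DomainObject]:
    """Two domain objects with made-up identifiers and one association."""
    taxon = DomainObject("1002", "1.0", "Taxon")
    gene = DomainObject("1001", "1.0", "Gene")
    gene.associations.append(Association("taxon", gene.ref, taxon.ref))
    return [gene, taxon]


if __name__ == "__main__":
    for source, name, target in resolve_associations(sample_model()):
        print(source, "-[" + name + "]->", target)
```

Because the association holds only the identifier, updating a domain object's description does not require touching every association that points at it.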
27caDSR-metadata-extract.xml
<?xml version="1.0" encoding="UTF-8"?>
<caDSRMetadata xmlns="http://cagrid.nci.nih.gov/1/caDSRMetadata"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://cagrid.nci.nih.gov/1/caDSRMetadata http://localhost/ogsa/schema/cagrid/types/Common/caDSRMetadata.xsd">
  <domain-model id="2262164" version="3.0">
    <short-name>caCORE</short-name>
    <long-name>caCORE</long-name>
    <description>caCORE Description</description>
    <domain-object id="2223329" version="1.0">
      <full-name>
        <package-name>gov.nih.nci.cabio.domain</package-name>
        <class-name>DiseaseOntologyRelationship</class-name>
      </full-name>
      <long-name>DiseaseOntologyRelationship</long-name>
      <short-name>C45371</short-name>
      <concept-codes-list>
        <concept-element order="0">
          <concept-code>C45371</concept-code>
          <concept-preferred-name>DiseaseOntologyRelationship</concept-preferred-name>
          <concept-definition>The disease relationship specifies the relationship among diseases.</concept-definition>
        </concept-element>
      </concept-codes-list>
      <attributes-list>
        <attribute id="2223846" version="3.0">
          <name>id</name>
          <long-name>Identifier</long-name>
          <short-name>C25364</short-name>
          <concept-codes-list>
            <concept-element order="0">
28Data service deployment Diagram
Research Center Grid Node
caGrid Service
Service metadata
Application server
Data Base
Data grid Service I.
Service Configuration files
. . .
caGrid Service
Service metadata
Application server
Data grid Service N.
Data Base
Service Configuration files
- Tomcat.
- Axis.
- Globus.
- OGSA-DAI
29Data service and security
GUMS (User Management)
CAMS (Attribute Management)
Index Service (Service Registry)
Data Service Client
30Data service and security
GUMS (User Management)
CAMS (Attribute Management)
Index Service (Service Registry)
caArray Data Service (Secure)
Data Service Client
31Data service and security
GUMS (User Management)
CAMS (Attribute Management)
User Login
Index Service (Service Registry)
caArray Data Service (Secure)
Data Service Client
32Data service and security
GUMS (User Management)
CAMS (Attribute Management)
User Login
Index Service (Service Registry)
caArray Data Service (Secure)
Retrieve Proxy Certificate
Data Service Client
33Data service and security
GUMS (User Management)
CAMS (Attribute Management)
User Login
Index Service (Service Registry)
caArray Data Service (Secure)
Discovery
Retrieve Proxy Certificate
Data Service Client
34Data service and security
GUMS (User Management)
CAMS (Attribute Management)
User Login
Index Service (Service Registry)
caArray Data Service (Secure)
Retrieve Proxy Certificate
Query (Secure)
Data Service Client
35Data service and security
GUMS (User Management)
CAMS (Attribute Management)
caArray username/password
User Login
Index Service (Service Registry)
caArray Data Service (Secure)
Retrieve Proxy Certificate
Query (Secure)
Data Service Client
36Data service and security
GUMS (User Management)
CAMS (Attribute Management)
User Login
Index Service (Service Registry)
caArray Data Service (Secure)
Retrieve Proxy Certificate
Response
Data Service Client
37Future actions
- Federated query engine.
- Implement process for schema management.
- Test performance and large datasets
- Extend query language to support writeable APIs.