Title: Austrian Grid OGSA-DAI Tutorial
1Austrian Grid OGSA-DAI Tutorial
- Alexander Wöhrer
- Institute of Scientific Computing
- University of Vienna
- woehrer_at_par.univie.ac.at
- based on the
- OGSA-DAI Tutorial
- given at GGF13 in Seoul
2Agenda
- Theoretical Part about OGSA-DAI
- Overview
- Architecture
- Engine
- Activities
- Data Resource Configuration
- Lunch break (1h)
- Practical Part
3OGSA-DAI Facts
- Current Release 5
- First release 2001
- De-facto standard for database access on the Grid
- Support by OGSA-DAI Team
- previously by UK Grid Support Centre
- Contribute to standardisation efforts
- Input into Global Grid Forum DAIS Working Group and other groups
- Provide a reference implementation of DAIS spec
- Included in OMII (provides the UK e-Science Grid Infrastructure Bundle)
- Partners all over UK, including IBM and Oracle
- Funded by UK e-Science Grid Core program
- Roadmap available up to Release 7
- http://www.ogsadai.org.uk/docs/OtherDocs/OGSA-DAI
RoadmapV2.0.pdf
4OGSA-DAI Overview
- Terminology
- WS Architecture → OGSA-DAI Architecture
- OGSA-DAI Services
- GDS
- GDSF
- DAISGR
- Supported Data resources
- High level design
5OGSA-DAI Overview
- Terminology Data
- Data resource
- any object that can act as a sink or source of data
- currently databases in scope
- Data service
- common interface to a data resource
- exposes capabilities of data resource
- may provide additional capabilities
- OGSA-DAI
- Open Grid Services Architecture Data Access and Integration
- Reference implementation of Grid Data Service
6OGSA-DAI Overview
OGSA-DAI Service Architecture
Web Service Architecture
7OGSA-DAI Services
- Three main services
- DAISGR (registry) for discovery
- GDSF (factory) to represent a data resource
- GDS (data service) to access a data resource
8GDSF and GDS
- Grid Data Service Factory (GDSF)
- Represent a data resource
- Persistent service
- Exposes capabilities and metadata
- May register with a DAISGR
- Grid Data Service (GDS)
- Created by GDSF
- Transient service
- Required to access data resource
- Holds the client session
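The relationship between a persistent factory and the transient services it creates can be sketched in plain Java. This is only a model of the lifecycle described above: `FactorySketch`, `DataServiceFactory`, `DataService` and all their methods are hypothetical stand-ins, not the OGSA-DAI classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins that model the GDSF/GDS lifecycle only.
class FactorySketch {

    /** Transient service: holds one client session until destroy() is called. */
    static final class DataService {
        private final String resource;
        private boolean alive = true;

        DataService(String resource) { this.resource = resource; }

        String resource() { return resource; }
        boolean isAlive() { return alive; }
        void destroy() { alive = false; }
    }

    /** Persistent factory: represents exactly one data resource. */
    static final class DataServiceFactory {
        private final String resource;
        private final List<DataService> created = new ArrayList<>();

        DataServiceFactory(String resource) { this.resource = resource; }

        /** Every client session gets its own transient service. */
        DataService createDataService() {
            DataService service = new DataService(resource);
            created.add(service);
            return service;
        }

        /** Number of created services that have not been destroyed yet. */
        long liveServices() {
            return created.stream().filter(DataService::isAlive).count();
        }
    }

    public static void main(String[] args) {
        DataServiceFactory factory = new DataServiceFactory("littleblackbook");
        DataService session = factory.createDataService();
        System.out.println("live services: " + factory.liveServices());
        session.destroy();
        System.out.println("live services: " + factory.liveServices());
    }
}
```

The factory outlives every service it creates, which mirrors the persistent/transient split on this slide.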
9DAISGR
- DAI Service Group Registry (DAISGR)
- Persistent service
- Based on OGSI Service Groups
- GDSFs may register with DAISGR
- Since R5 services no longer automatically register
- Clients access DAISGR to discover
- Resources
- Services (may need specific capabilities)
- support for a given portType or activity
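Discovery by capability might look roughly like the sketch below. It assumes a registry that simply maps a factory handle to the set of activity names the factory advertises; `RegistrySketch`, its methods and the example handles are hypothetical and do not reflect the DAISGR interface.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory registry; not the DAISGR API.
class RegistrySketch {
    // factory handle -> names of the activities that factory advertises
    private final Map<String, Set<String>> entries = new LinkedHashMap<>();

    /** A factory registers itself together with its capabilities. */
    void register(String factoryHandle, String... activities) {
        entries.put(factoryHandle, new HashSet<>(Arrays.asList(activities)));
    }

    /** A client asks for all factories that support a given activity. */
    List<String> findSupporting(String activity) {
        List<String> handles = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : entries.entrySet()) {
            if (entry.getValue().contains(activity)) {
                handles.add(entry.getKey());
            }
        }
        return handles;
    }

    public static void main(String[] args) {
        RegistrySketch registry = new RegistrySketch();
        registry.register("http://example.org/gdsf/relational", "sqlQueryStatement", "xslTransform");
        registry.register("http://example.org/gdsf/xml", "xPathStatement");
        System.out.println(registry.findSupporting("xPathStatement"));
    }
}
```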
10Why OGSA-DAI?
- Why use OGSA-DAI over JDBC?
- Can embed additional functionality at the service end
- Transformations, compressions
- Third party delivery
- The extensible activity framework
- Avoiding unnecessary data movement
- Common interface to heterogeneous data resources
- Relational, XML databases, and files
- Usefulness of the Registry for service discovery
- Dynamic service binding process
- Provision of good meta-data is necessary
- Language independence at the client end
- Do not need to use Java
- Platform independence
- Do not have to worry about connection technology,
drivers, etc
11Supported Data Resources
12High level design to support Multiple Service
Interfaces
13Multiple Service Interfaces
- OGSI
- OGSI 1.0
- GT 3
- WS-I
- WS-I 1.0
- WS-Security, UDDI, SOAP, WSDL, Axis 1.2
- WS-RF
- WS-Addressing, WS-RF
- GT 4
14OGSA-DAI Architecture
- GDS internals
- Engine
- Perform document
- Activities
15OGSA-DAI Architecture
- Low-level components of a Grid Data Service
- Engine
- Activities
- Data Resource Implementations
- Role Mappers
- Extensibility of OGSA-DAI architecture
- Interfaces
- Abstract classes
- Implementations
16Grid Data Service
- GDS has a document based interface
- Consumes perform documents
- Produces response documents
- More sophisticated behaviour possible
- Third party data delivery
- get data
- talk to other GDSs,
- Motivation for using a document interface
- Change in behaviour does not require an interface change
- Reduce number of operation calls
- Extensible
17Grid Data Service internals
18The GDS Engine
- Engine is the central GDS component
- Dictates behaviour when perform documents are submitted
- Parses and validates perform document
- Identifies required activity implementations
- Processes activities
- Composes response document
- Returns response document to GDS
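The engine steps listed above (validate, look up implementations, process, compose a response) can be illustrated with a very small model. Everything here is an assumption made for illustration: `EngineSketch`, the `Activity` interface and the `ERROR` status are hypothetical, and a request is reduced to an ordered list of activity names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the engine's control flow; not the OGSA-DAI engine.
class EngineSketch {

    /** Minimal activity contract assumed for this sketch. */
    interface Activity {
        void process() throws Exception;
    }

    // activity name -> implementation made available by configuration
    private final Map<String, Activity> implementations = new LinkedHashMap<>();

    void install(String activityName, Activity implementation) {
        implementations.put(activityName, implementation);
    }

    /** Returns activity name -> status, in request order. */
    Map<String, String> perform(List<String> requestedActivities) {
        // 1. Validate: every requested activity must be supported.
        for (String name : requestedActivities) {
            if (!implementations.containsKey(name)) {
                throw new IllegalArgumentException("Unsupported activity: " + name);
            }
        }
        // 2.-4. Look up each implementation, process it, record its status.
        Map<String, String> response = new LinkedHashMap<>();
        for (String name : requestedActivities) {
            try {
                implementations.get(name).process();
                response.put(name, "COMPLETED");
            } catch (Exception e) {
                response.put(name, "ERROR"); // assumed status name
            }
        }
        return response;
    }
}
```

Validation happens before any activity runs, so an unsupported activity rejects the whole request rather than leaving it half processed.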
19Perform documents
- Perform documents encapsulate a serialization of multiple interactions with a service into a single interaction
- Abstract each interaction into an activity
- Data can flow from one activity to another
- No control constructs present
- no conditionals, loops or variables
- Not intended for human consumption
- Generated and processed by client toolkit
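To give a feel for what "generated by the client toolkit" means, the sketch below builds a two-activity perform document with the standard Java DOM API. The element and attribute names follow the perform document example in this tutorial; the class `PerformDocumentSketch` is hypothetical, and namespaces and schema locations are left out because the tutorial does not spell them out.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper; the real client toolkit generates this XML for you.
class PerformDocumentSketch {

    /** Builds a query activity whose output feeds a delivery activity. */
    static String build(String sql, String targetUrl) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element perform = doc.createElement("perform");
            doc.appendChild(perform);

            // Activity 1: run the query and name its output stream "output".
            Element query = doc.createElement("sqlQueryStatement");
            query.setAttribute("name", "statement");
            Element expression = doc.createElement("expression");
            expression.setTextContent(sql);
            query.appendChild(expression);
            Element stream = doc.createElement("resultSetStream");
            stream.setAttribute("name", "output");
            query.appendChild(stream);
            perform.appendChild(query);

            // Activity 2: deliver whatever arrives on "output" to a URL.
            Element deliver = doc.createElement("deliverToURL");
            deliver.setAttribute("name", "deliverOutput");
            Element from = doc.createElement("fromLocal");
            from.setAttribute("from", "output");
            deliver.appendChild(from);
            Element to = doc.createElement("toURL");
            to.setTextContent(targetUrl);
            deliver.appendChild(to);
            perform.appendChild(deliver);

            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build perform document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("select * from littleblackbook where id=10",
                                 "ftp://ftp.example.com/home"));
    }
}
```

Data flow is expressed purely by matching names (`output` here); there is no loop or conditional anywhere in the document.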
20Perform/Response Document
<perform xmlns="..." xmlns:xsi="..." xsi:schemaLocation="...">
  <sqlQueryStatement name="statement">
    <expression>select * from littleblackbook where id=10</expression>
    <resultSetStream name="output"/>
  </sqlQueryStatement>
  <deliverToURL name="deliverOutput">
    <fromLocal from="output"/>
    <toURL>ftp://anon:frog@ftp.example.com/home</toURL>
  </deliverToURL>
</perform>

<gridDataServiceResponse xmlns="...">
  <result name="deliverOutput" status="COMPLETED"/>
  <result name="statement" status="COMPLETED"/>
</gridDataServiceResponse>
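A response document like the one above can be read with the standard Java XML parser. The following is a minimal sketch that extracts the status of each named activity; the class `ResponseStatusSketch` is hypothetical, and it assumes the response uses `result` elements with `name` and `status` attributes as shown on this slide.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper; the client toolkit normally hides this parsing.
class ResponseStatusSketch {

    /** Returns activity name -> status in document order. */
    static Map<String, String> statuses(String responseXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(responseXml)));
            NodeList results = doc.getElementsByTagName("result");
            Map<String, String> byActivity = new LinkedHashMap<>();
            for (int i = 0; i < results.getLength(); i++) {
                Element result = (Element) results.item(i);
                byActivity.put(result.getAttribute("name"), result.getAttribute("status"));
            }
            return byActivity;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a well-formed response document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<gridDataServiceResponse>"
                + "<result name=\"deliverOutput\" status=\"COMPLETED\"/>"
                + "<result name=\"statement\" status=\"COMPLETED\"/>"
                + "</gridDataServiceResponse>";
        System.out.println(statuses(xml));
    }
}
```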
21Activities
- An Activity dictates an action to be performed
- Query a data resource
- Transform data
- Deliver results
- Engine processes a sequence of activities
- Subset of activities available to a GDS
- Specified in GDSF Configuration
- Data can flow between activities
22Activity Taxonomy
- Activities fall into three main categories
- Statement
- Interact with the data resource, e.g. direct an SQL query to a DBMS
- Delivery
- Deliver data in various ways, e.g. to a third party
- Transform
- Perform transformations on data, e.g. XSL Transform, compression
[Figure: Activity hierarchy with the subtypes Statement, Delivery and Transform]
23Predefined Activities (supported)
- Relational: sqlQueryStatement, sqlUpdateStatement, sqlStoredProcedure, sqlBulkLoadRowset, relationalResourceManager
- XML: xPathStatement, xQueryStatement, xUpdateStatement, xmlCollectionManagement, xmlResourceManagement
- Delivery: DeliveryToUrl, DeliveryFromUrl, DeliveryToGFTP, DeliveryFromGFTP, DeliveryToGDT, DeliveryFromGDT, DeliveryToStream, DeliveryToSMTP, inputStream, outputStream
- Transform: xslTransform, zipArchive, gzipCompression
24Custom Activities
- Users can develop additional activities
- To support different query languages
- e.g. OQL
- To perform different kinds of transformation
- e.g. STX
- To deliver results using a different mechanism
- e.g. WebDAV
- An activity requires
- XSD schema myActivity.xsd
- Java implementation myActivity.java
25Activity In/Outputs
- Activities read and write blocks of data
- Allows efficient streaming between activities
- Reduces memory overhead
- A block is a Java Object
- Untyped, but usually a String or byte array
- Interfaces for reading and writing
- BlockReader and BlockWriter
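Block-wise streaming between activities can be sketched as follows. The names `BlockReader` and `BlockWriter` come from the slide, but the method signatures (`next`, `put`), the `Pipe` class and the upper-casing transform are assumptions made for this example and are not the OGSA-DAI interfaces.

```java
import java.util.ArrayDeque;
import java.util.Locale;
import java.util.Queue;

// Simplified, hypothetical versions of the block interfaces.
class BlockStreamingSketch {

    /** Assumed contract: returns the next block, or null when exhausted. */
    interface BlockReader {
        Object next();
    }

    /** Assumed contract: accepts one untyped block at a time. */
    interface BlockWriter {
        void put(Object block);
    }

    /** Connects one activity's output to the next activity's input. */
    static final class Pipe implements BlockReader, BlockWriter {
        private final Queue<Object> blocks = new ArrayDeque<>();

        @Override public Object next() { return blocks.poll(); }
        @Override public void put(Object block) { blocks.add(block); }
    }

    /**
     * Toy transform activity: upper-cases String blocks and passes any
     * other block (e.g. a byte array) through unchanged. Only one block
     * is held in memory at a time.
     */
    static void upperCase(BlockReader in, BlockWriter out) {
        Object block;
        while ((block = in.next()) != null) {
            if (block instanceof String) {
                out.put(((String) block).toUpperCase(Locale.ROOT));
            } else {
                out.put(block);
            }
        }
    }

    public static void main(String[] args) {
        Pipe queryOutput = new Pipe();
        Pipe transformOutput = new Pipe();
        queryOutput.put("first row");
        queryOutput.put("second row");
        upperCase(queryOutput, transformOutput);
        System.out.println(transformOutput.next());
        System.out.println(transformOutput.next());
    }
}
```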
26OGSA-DAI Configuration
- Interaction between
- GDSF, GDS and data resource
- What can be configured?
- Where?
- Tools
27Data Resource and GDSF
28GDSF Configuration
29How a GDS works
30What do you need?
- Database Driver
- Name of the driver class
- Is the JAR file in the library path of the container?
- data resource configuration document
- Database specifics
- Relational or XML database?
- Metadata you'd like to publish
- Vendor and version of your database
- data resource configuration document
31What do you need?
- Functionalities of your GDS
- specifying information about the activities a client can execute
- activity configuration document
- Authorisation
- Grid credentials: how do distinguished names map to database roles?
- role map document
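The idea behind a role map, turning the distinguished name from a Grid credential into a database login, can be modelled with a simple lookup table. `RoleMapSketch`, `DatabaseRole` and the example distinguished name are hypothetical; the real mapping is defined in the role map document, whose format is not shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory role mapper; not the OGSA-DAI role map format.
class RoleMapSketch {

    /** Database login that a Grid identity is mapped to. */
    static final class DatabaseRole {
        final String user;
        final String password;

        DatabaseRole(String user, String password) {
            this.user = user;
            this.password = password;
        }
    }

    private final Map<String, DatabaseRole> byDistinguishedName = new LinkedHashMap<>();

    void grant(String distinguishedName, String user, String password) {
        byDistinguishedName.put(distinguishedName, new DatabaseRole(user, password));
    }

    /** Empty result means the caller has no database role and is refused. */
    Optional<DatabaseRole> roleFor(String distinguishedName) {
        return Optional.ofNullable(byDistinguishedName.get(distinguishedName));
    }

    public static void main(String[] args) {
        RoleMapSketch roleMap = new RoleMapSketch();
        roleMap.grant("/C=AT/O=Example/CN=Alice", "reader", "secret");
        System.out.println(roleMap.roleFor("/C=AT/O=Example/CN=Alice").isPresent());
        System.out.println(roleMap.roleFor("/C=AT/O=Example/CN=Mallory").isPresent());
    }
}
```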
32Configuration of a GDSF
- 3 XML files for configuration
- Command line tools to create them
- we will use them in the hands-on part
- For tuning or tailoring a GDSF it is good to know where to look
33OGSA-DAI Client Toolkit
- Why?
- Important objects
- Service Fetcher
- Simple requests
- Complex requests
- Processing results
34Why use a Client Toolkit?
- Ease of use
- Nobody wants to write XML!
- Encapsulates connection mechanism
- Familiar interfaces to results
- Protection from changes
- A client API can hide changes in the service
architecture and implementation
35Service Fetcher
- OGSA-DAI uses three main service types
- DAISGR (registry) for discovery
- GDSF (factory) to represent a data resource
- GDS (data service) to access a data resource
- The ServiceFetcher class creates service objects from a URL
- ServiceGroupRegistry registry = ServiceFetcher.getRegistry( registryHandle )
- GridDataServiceFactory factory = ServiceFetcher.getFactory( factoryHandle )
- GridDataService service = ServiceFetcher.getGridDataService( handle )
36Creating/destroying GDS
- A factory object can create a new Grid Data Service.
- GridDataService service = factory.createGridDataService()
- Grid Data Services are transient (i.e. have finite lifetime) so they can be destroyed by the user.
- service.destroy()
37Interaction with a GDS
- Client sends a request to a data service
- A request contains a set of activities
- The Data service processes the request
- Returns a response document with a result for
each activity
38Example Activities
- SQLQuery
- SQLQuery query = new SQLQuery("select * from littleblackbook where id='3475'")
- XPathQuery
- XPathQuery query = new XPathQuery( "/entry[@id<10]" )
- XSLTransform
- XSLTransform transform = new XSLTransform()
- DeliverToGFTP
- DeliverToGFTP deliver = new DeliverToGFTP("ogsadai.org.uk", 8080, "myresults.txt" )
39Simple Requests
- Simple requests consist of only one activity
- Send the activity directly to the perform method
- SQLQuery query = new SQLQuery("select * from littleblackbook where id='3475'")
- Response response = service.perform( query )
40Complex Requests
41Constructing a complex request
- ActivityRequest request = new ActivityRequest()
- request.add( query )
- request.add( transform )
- request.add( deliver )
42Connecting activities
- ActivityRequest request = new ActivityRequest()
- request.add( query )
- request.add( transform )
- request.add( deliver )
- transform.setInput( query.getOutput() )
- deliver.setInput( transform.getOutput() )
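What `setInput( getOutput() )` achieves can be shown with small stand-in classes: each activity owns a named output, and connecting two activities just records which output the downstream activity reads from. `ChainingSketch`, `Activity` and `Output` below are hypothetical and only imitate the shape of the client toolkit calls.

```java
// Hypothetical stand-ins that imitate the wiring calls on this slide.
class ChainingSketch {

    /** A named output that another activity can consume. */
    static final class Output {
        final String name;

        Output(String name) { this.name = name; }
    }

    static final class Activity {
        private final String name;
        private final Output output;
        private Output input; // null until connected

        Activity(String name) {
            this.name = name;
            this.output = new Output(name + ".output");
        }

        Output getOutput() { return output; }

        void setInput(Output source) { this.input = source; }

        /** Human-readable description of where this activity reads from. */
        String describe() {
            return (input == null ? "(no input)" : input.name) + " -> " + name;
        }
    }

    public static void main(String[] args) {
        Activity query = new Activity("query");
        Activity transform = new Activity("transform");
        Activity deliver = new Activity("deliver");

        // Same wiring order as on the slide: query -> transform -> deliver.
        transform.setInput(query.getOutput());
        deliver.setInput(transform.getOutput());

        System.out.println(query.describe());
        System.out.println(transform.describe());
        System.out.println(deliver.describe());
    }
}
```

The order of `request.add(...)` calls does not create the pipeline; the explicit input/output connections do.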
43Perform Request
- Finally perform the request!
- Response response = service.perform( request )
- The response contains status and results of each activity in the request.
- System.out.println( response.getAsString() )
44Processing Results
- Varying formats of output data
- SQLQuery (JDBC ResultSet)
- ResultSet rs = query.getResultSet()
- SQLUpdate (Integer)
- int rows = update.getModifiedRows()
- XPathQuery (XMLDB ResourceSet)
- ResourceSet results = query.getResourceSet()
- Output can always be retrieved as a String
- String output = myactivity.getOutput().getData()
45Delivery Methods
46Client driven data integration scenario
47Thanks
- Thank you for your attention!
- Further questions?
48References
- OGSA-DAI Tutorial at GGF 13 in Seoul
- http://www.ogsadai.org/courses/GGF13/index.html
- OGSA-DAI Project page
- http://www.ogsadai.org