Title: Applying Grid Technologies to Distributed Data Mining
1 Enabling Access to Federated Grid Databases: An OGSA-DAI ODBC Driver
Michael J. Jackson¹, Ashley D. Lloyd², Terence M. Sloan¹
¹EPCC ²Curtin Business School, Edinburgh University Management School
2 Overview
- Why develop an OGSA-DAI ODBC driver?
- ODBC
- OGSA-DAI
- Design and Development
- What does an OGSA-DAI ODBC driver give us?
- Issues and Concerns
3 Why?
- Facilitate use of standard data analysis tools in a Grid environment
- Remove need for Grid awareness
- Allow use of existing data analysis skills in a Grid environment
- Improve rate of adoption of Grid technologies
- Data analysis tools
- SPSS, SAS
- How can standard data analysis tools access Grid-enabled databases?
- An ODBC driver for OGSA-DAI
4 Open DataBase Connectivity (ODBC)
[Diagram labels: ODBC data source; database API; ODBC API (twice); data source name; reside on same host]
5 ODBC Advantages
- Application developers
- Applications can be database-independent
- No need to compile against database-specific libraries
- Call-level interface: execute SQL generated at run-time
- Change a database => only change driver and configuration
- Database manufacturers
- An ODBC-compliant driver allows the database to be a back end for any ODBC-compliant application
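The database-independence point can be illustrated with a minimal sketch. It assumes Python's DB-API as a stand-in for the ODBC call-level interface and uses the standard-library `sqlite3` module in place of a real ODBC driver; `fetch_rows`, `connect_demo` and the `customers` table are hypothetical names introduced only for this illustration.

```python
import sqlite3

def fetch_rows(connect, data_source, sql):
    """Application code: runs SQL built at run-time via whichever driver is supplied."""
    connection = connect(data_source)
    try:
        cursor = connection.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        connection.close()

def connect_demo(data_source):
    """Stand-in 'driver': opens SQLite and loads a tiny table so the sketch runs."""
    connection = sqlite3.connect(data_source)
    connection.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    connection.executemany(
        "INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")]
    )
    return connection

# The table name is chosen at run-time; nothing in fetch_rows is database-specific.
table = "customers"
rows = fetch_rows(connect_demo, ":memory:", f"SELECT id, name FROM {table} ORDER BY id")
print(rows)
```

Changing the database would mean passing a different `connect` callable and data source string; `fetch_rows` itself would stay as it is.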
6 OGSA-DAI
- Open Grid Services Architecture Data Access and Integration
- Extensible framework for data access and integration
- Expose heterogeneous data resources to a Grid through web services
- Data operations
- Access, update, management and integration: relational, XML, files
- Compression and transformation
- Delivery to URLs, FTP, GridFTP, mail, other services
- Base for developing higher-level services
- Data federation and distributed query processing
- Data mining
- Data visualisation
7 Accessing Data Resources via OGSA-DAI
[Diagram labels: OGSA-DAI Perform document; JDBC API; OGSA-DAI Response document]
8 An ODBC Driver for OGSA-DAI
[Diagram labels: ODBC API (twice); data source name; OGSA-DAI Response document; OGSA-DAI Perform document; JDBC API]
9 A Simple Scenario
- Data analysis
- ODBC view
- Connect to OGSA-DAI ODBC data source
- Submit a SELECT * FROM table query
- Get back the results
- Disconnect from the data source
- OGSA-DAI view
- Connect to an OGSA-DAI data service
- Construct a Perform document holding the query
- Send it to the service
- Receive a Response document from the service
- Parse it to get the results
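The OGSA-DAI view of the scenario can be sketched as follows. This is a minimal illustration, not the real OGSA-DAI schema or client toolkit: the element names (`perform`, `sqlQueryStatement`, `expression`, `response`, `row`, `column`) are simplified assumptions, and `fake_service` is a hypothetical stand-in for sending the document to a data service.

```python
import xml.etree.ElementTree as ET

def build_perform_document(sql):
    """Wrap an SQL query in a simplified, hypothetical Perform document."""
    perform = ET.Element("perform")
    statement = ET.SubElement(perform, "sqlQueryStatement", name="statement")
    ET.SubElement(statement, "expression").text = sql
    return ET.tostring(perform, encoding="unicode")

def parse_response_document(xml_text):
    """Extract rows from a simplified, hypothetical Response document."""
    response = ET.fromstring(xml_text)
    return [
        tuple(column.text for column in row.findall("column"))
        for row in response.iter("row")
    ]

def fake_service(perform_xml):
    """Stand-in for the data service: checks the XML parses, returns fixed rows."""
    ET.fromstring(perform_xml)
    return (
        "<response><result name='statement'>"
        "<row><column>1</column><column>Ada</column></row>"
        "<row><column>2</column><column>Grace</column></row>"
        "</result></response>"
    )

# Construct the Perform document, "send" it, then parse the Response document.
perform_xml = build_perform_document("SELECT * FROM customers")
rows = parse_response_document(fake_service(perform_xml))
print(rows)
```

An ODBC driver for OGSA-DAI would carry out these steps behind the ODBC calls, so the data analysis tool would see only the connect, query, fetch and disconnect steps of the ODBC view.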
10 Development Options
- Implement an OGSA-DAI ODBC driver
- From scratch
- Use an open source ODBC driver
- Extract a data resource-independent skeleton
- Customise it to OGSA-DAI
- Use an ODBC SDK
- OpenAccess
- Simba
- Syware
11 Using an SDK
- Proof of concept
- Prototype within a tight time-scale
- OpenAccess SDK
- 30 day evaluation licence
- Provides an ODBC driver
- Developer codes an Interface Provider (IP)
- Supports Java development => exploit OGSA-DAI's client toolkit
12 An ODBC Driver for OGSA-DAI using OpenAccess
[Diagram labels: OGSA-DAI Perform document; OGSA-DAI CTk API; OpenAccess API; OGSA-DAI Response document; data resource configuration (e.g. service URL)]
13 Testing
- OpenAccess ODBC SQL query tool
- Submit SQL statements to an ODBC data source
- Present the results
- EPCC
- OGSA-DAI ODBC data source on a PC
- ODBC driver OGSA-DAI service URL
- Curtin Business School
- OGSA-DAI server and services
- Database
14 What does this give us?
- Transparency
- Database location
- Changes are restricted to the OGSA-DAI server
- Client applications are unaffected
- Database product
- Global access of data
- Publish service URL
- Security
- Database user names and passwords reside on OGSA-DAI server
- Clients can be required to provide credentials to connect to OGSA-DAI services
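The transparency and security points can be sketched with two small configurations. Everything here is a hypothetical illustration: the registry layout, host names, database products, DSN and service URL are assumptions, not taken from the prototype.

```python
# Hypothetical server-side registry: database details live only on the server.
server_resources = {
    "sales": {"product": "MySQL", "host": "db1.example.org",
              "user": "db_user", "password": "db_secret"},
}

# Hypothetical client-side data source: a name and the published service URL.
client_data_source = {"dsn": "GridSales",
                      "service_url": "https://example.org/ogsadai/sales"}

def relocate(resources, name, product, host):
    """Move a database to another product or host; a server-side change only."""
    resources[name].update(product=product, host=host)

before = dict(client_data_source)
relocate(server_resources, "sales", product="PostgreSQL", host="db2.example.org")

# The client configuration is untouched and never held database credentials.
print(client_data_source == before)
print("password" in client_data_source)
```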
15 Data Federation
[Diagram labels: ODBC API (twice); data source name; OGSA-DAI documents; virtual database]
16 Issues and Concerns
- OGSA-DAI WSI / WSRF compliance
- Prototype developed using OGSA-DAI OGSI
- Data source includes OGSA-DAI factory service URL
- OGSA-DAI WSI or WSRF: data service URL and resource ID
- Driver development
- Complete the OpenAccess IP
- Write a pure OGSA-DAI ODBC driver from scratch
- ODBC conformance
- Cursors, sessions, transactions, timeouts, meta-data
- Analysis of SAS or SPSS ODBC usage
- Efficiency
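The compliance issue affects what a data source must record. As a rough sketch, assuming the data source settings are held as an ODBC-style connection string of semicolon-separated `key=value` pairs, a driver might tell the two cases apart as below. The keys `FACTORY_URL`, `SERVICE_URL` and `RESOURCE_ID` are hypothetical and are not taken from the prototype.

```python
def parse_connection_string(text):
    """Split 'KEY=value;KEY=value' into a dictionary with upper-case keys."""
    pairs = (item.split("=", 1) for item in text.split(";") if "=" in item)
    return {key.strip().upper(): value.strip() for key, value in pairs}

def describe_target(settings):
    """Return (flavour, URL, resource ID) for the configured OGSA-DAI service."""
    if "RESOURCE_ID" in settings:
        # WSI or WSRF: a data service URL plus a resource ID.
        return ("WSI/WSRF", settings["SERVICE_URL"], settings["RESOURCE_ID"])
    # OGSI: only a factory service URL is recorded.
    return ("OGSI", settings["FACTORY_URL"], None)

ogsi = describe_target(parse_connection_string(
    "DSN=GridSales;FACTORY_URL=https://example.org/ogsadai/factory"))
wsrf = describe_target(parse_connection_string(
    "DSN=GridSales;SERVICE_URL=https://example.org/ogsadai;RESOURCE_ID=sales"))
print(ogsi)
print(wsrf)
```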
17 Questions