Title: Using OGSA-DAI in a commercial environment
1 Using OGSA-DAI in a commercial environment
Terry Sloan, EPCC
Telephone: +44 131 650 5155
Email: tsloan_at_epcc.ed.ac.uk
- FirstDIG
- Outstanding issues raised by these projects
3 First Data Investigation on the Grid (FirstDIG)
http://www.epcc.ed.ac.uk/firstdig/
- Few UK e-Science projects involve service companies such as First plc
- First plc
- Operates worldwide in a variety of transport sectors
- Over 10,000 vehicles in the UK, 23% of the market
- UK's largest operator
- The challenge for First
- Meeting the needs of the travelling public whilst making money
- Data integration and mining may assist, but there is a huge range of fragmented data sources
5 Data Sources in the Bus Industry
- Many different kinds of data involved with running a bus company
- Mileage, revenue, customer contact, schedule, fuel consumption, vehicle maintenance, routes
- Many means to collect data
- Manually entered data at depot
- Data collected on buses from ticket machines
- Data collected on buses from GPS systems
- GPS system notes when bus passes through a
predefined footprint and records the time at
which this happens
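The footprint mechanism described above can be sketched as a simple geofence check. This is only an illustration, not the system First uses: the circular footprint shape, the names `Footprint` and `footprint_entries`, the coordinates, and the sample track are all assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Footprint:
    """Hypothetical circular footprint around a timing point."""
    name: str
    lat: float
    lon: float
    radius_m: float


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres (haversine formula)."""
    earth_radius_m = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def footprint_entries(track, footprints):
    """Return (footprint name, timestamp) each time the bus enters a footprint.

    track is an iterable of (timestamp, lat, lon) GPS fixes in time order.
    """
    inside = set()   # footprints the bus is currently within
    events = []
    for ts, lat, lon in track:
        for fp in footprints:
            within = distance_m(lat, lon, fp.lat, fp.lon) <= fp.radius_m
            if within and fp.name not in inside:
                inside.add(fp.name)
                events.append((fp.name, ts))   # record time of entry only
            elif not within:
                inside.discard(fp.name)
    return events


# Made-up footprint and GPS fixes, for illustration only.
stop = Footprint("High Street", 53.3811, -1.4701, 50.0)
track = [
    ("08:00:00", 53.3900, -1.4701),  # well outside the footprint
    ("08:01:30", 53.3812, -1.4701),  # first fix inside the footprint
    ("08:01:45", 53.3811, -1.4702),  # still inside, no new event
    ("08:03:00", 53.3700, -1.4701),  # outside again
]
events = footprint_entries(track, [stop])
print(events)
```

Recording only the entry time keeps one row per pass through a footprint, which is the granularity a lateness comparison against the schedule would need.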
6 Answering Business Questions
- Want to combine data from more than one source
- Complaints versus Lateness
- Revenue versus Lost Miles
- Complaints versus Lost Miles
- Want data aggregated in some way
- By Service
- By Day
- Want to consider subsets of the data
- e.g. weekdays only
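The three requirements above (combine sources, aggregate by service and day, restrict to a subset) can be sketched in a few lines. The records, service numbers, and the function name `complaints_vs_lost_miles` are invented for illustration and do not come from First's data.

```python
from collections import defaultdict
from datetime import date

# Made-up records standing in for two separate sources.
complaints = [  # (service, date of complaint)
    ("52", date(2004, 3, 1)),    # Monday
    ("52", date(2004, 3, 1)),
    ("52", date(2004, 3, 6)),    # Saturday
    ("120", date(2004, 3, 2)),   # Tuesday
]
lost_miles = [  # (service, date, miles lost)
    ("52", date(2004, 3, 1), 12.5),
    ("52", date(2004, 3, 6), 3.0),
    ("120", date(2004, 3, 2), 0.0),
    ("120", date(2004, 3, 3), 4.0),  # Wednesday
]


def is_weekday(d):
    """Monday to Friday only."""
    return d.weekday() < 5


def complaints_vs_lost_miles(complaints, lost_miles, keep=is_weekday):
    """Aggregate both sources by (service, day), keeping only days that pass `keep`."""
    table = defaultdict(lambda: {"complaints": 0, "lost_miles": 0.0})
    for service, day in complaints:
        if keep(day):
            table[(service, day)]["complaints"] += 1
    for service, day, miles in lost_miles:
        if keep(day):
            table[(service, day)]["lost_miles"] += miles
    return dict(table)


summary = complaints_vs_lost_miles(complaints, lost_miles)
for (service, day), row in sorted(summary.items()):
    print(service, day.isoformat(), row["complaints"], row["lost_miles"])
```

Passing the subset rule as the `keep` argument means a different subset, such as a single depot or month, needs no change to the aggregation itself.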
7 Disparate Databases
- Data is typically stored in disparate databases
- Various reasons for this, e.g. incremental construction of systems
- Not a problem for day-to-day running and querying, but
- Introduces challenges for data analysis
- Systems introduced at different times
- Different database engines
- Different front-ends
- Different operating systems
- Different physical locations
- Different ways of representing data
- These issues are NOT unique to buses
- Open Grid Services Architecture Data Access and Integration (OGSA-DAI)
- Potentially provides a solution
- Need business users to make the transition from science to commerce
- Grid middleware
- Assists with the access and integration of data from separate data sources via the Grid
- Represents databases as Grid Services
- Enables secure access from other machines
9 FirstDIG Achievements
- Deployment at First South Yorkshire
- Combined two databases to answer real business questions
- The Customer Contact System
- Microsoft Access
- Information on customer complaints, e.g. time, service, nature
- The Mileage database
- Information on bus mileage e.g. lost miles
- Produced generic Grid Data Service Browser
- SQL access including joins across the databases
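To illustrate what a SQL join across two separate databases looks like, here is a minimal sketch using SQLite's `ATTACH DATABASE` from Python's standard library. It is a stand-in only: it does not use OGSA-DAI, Grid Data Services, or Microsoft Access, and the table names, columns, and rows are hypothetical.

```python
import sqlite3

# Two independent in-memory databases: "main" plays the Customer Contact
# System, "mileage" plays the Mileage database (both schemas are assumed).
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS mileage")

conn.executescript("""
    CREATE TABLE main.complaints (service TEXT, day TEXT, nature TEXT);
    CREATE TABLE mileage.lost (service TEXT, day TEXT, lost_miles REAL);

    INSERT INTO main.complaints VALUES
        ('52',  '2004-03-01', 'late'),
        ('52',  '2004-03-01', 'did not run'),
        ('120', '2004-03-02', 'late');

    INSERT INTO mileage.lost VALUES
        ('52',  '2004-03-01', 12.5),
        ('120', '2004-03-02', 0.0);
""")

# One query spanning both databases: complaints per service and day,
# alongside the lost miles recorded for the same service and day.
rows = conn.execute("""
    SELECT c.service, c.day, COUNT(*) AS complaints, m.lost_miles
    FROM main.complaints AS c
    JOIN mileage.lost AS m
      ON m.service = c.service AND m.day = c.day
    GROUP BY c.service, c.day, m.lost_miles
    ORDER BY c.service, c.day
""").fetchall()

for row in rows:
    print(row)
```

Here both databases live in one process and one engine, which is what makes the join easy. The deployment described above faced different engines on different machines, and that is the gap the middleware is meant to bridge.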
10 First Grid Data Service Browser
11 Informing Business and Regional Policy: Grid-enabled fusion of global data and local knowledge (INWA)
http://www.epcc.ed.ac.uk/inwa/
- An e-Social Science demonstrator
- Demonstrates how grid technologies can improve business
- Combining private and public data sources
- Finance and Telecommunications
- Uses many grid technologies
- TOG from Sun DCG provides access to remote HPC resource
- OGSA-DAI provides access control and discovery of distributed heterogeneous data resources
- FirstDIG grid data service browser provides SQL access to OGSA-DAI enabled resources
- Globus Toolkit 2 and 3
13 INWA Grid Infrastructure
[Figure: INWA grid infrastructure diagram. Labels: Grid Engine, Globus Grid, Bank data, Telco data]
- http://www.epcc.ed.ac.uk/
- FirstDIG
- http://www.epcc.ed.ac.uk/firstdig/
- http://www.ogsadai.org.uk
- http://www.epcc.ed.ac.uk/inwa
- Sun Data Compute Grids
- http://www.epcc.ed.ac.uk/sungrid/
- Transfer-queue Over Globus (TOG)
- http://gridengine.sunsource.net/project/gridengin
15 Outstanding issues raised by FirstDIG and INWA
16 Outstanding Issues: Usability
- OGSA-DAI is middleware; the client toolkit helps
- Incorporation of the demo First browser is somewhat helpful
- But really want
- Interfaces to real data analysis/DBMS packages, e.g. SPSS
- Otherwise users could end up building applications that replicate these, e.g. the First Grid Data Service Browser
- Want to be able to point Access, Excel, etc. at a grid data source and examine it
17 Outstanding Issues: Data
- CSV (comma-separated value) data sources are common, but current JDBC-ODBC drivers do not have sufficient functionality (NOT an OGSA-DAI issue per se)
- No support for BIT type field
- And others, e.g. BOOLEAN, BINARY, etc.
- Certain characters (e.g. &, >) are not handled by the OGSA-DAI XML parser
- Company names often have & in them
- Dates from certain sources not handled properly
- First Grid Data Service has to handle this
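The kind of clean-up a client has to perform for the character and date issues above can be sketched as follows. This is not how OGSA-DAI or the First Grid Data Service actually handles them: the row layout, the helper names `normalise_date` and `row_to_xml`, and the list of accepted date formats are assumptions chosen for illustration.

```python
from datetime import datetime
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# Assumed input formats; a real deployment would list whatever its sources emit.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def normalise_date(text):
    """Return an ISO 8601 date string, or raise ValueError if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")


def row_to_xml(company, contact_date):
    """Build one XML row, escaping &, < and > in the company name."""
    return "<row><company>{}</company><date>{}</date></row>".format(
        escape(company), normalise_date(contact_date)
    )


xml_row = row_to_xml("Smith & Sons <Sheffield>", "01/03/2004")
print(xml_row)

# A standard XML parser now accepts the row and recovers the original text.
parsed = ET.fromstring(xml_row)
print(parsed.find("company").text, parsed.find("date").text)
```

Escaping at the point where the row is serialised means the parser never sees a bare `&`, and normalising dates to a single format before they leave the source avoids per-client special cases.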
18 Outstanding Issues: Miscellaneous
- Security
- Rolemap file is not encrypted
- If one GDS accesses another GDS, the user's security credentials are not passed on, so the access fails
- Installation and Testing
- Install and Set-up
- Well explained, but still a fair amount of user effort involved
- Lack of an example OGSA-DAI site to point at to test that your OGSA-DAI installation works
19 Outstanding Issues: Miscellaneous
- Installation and Testing
- Lack of an example OGSA-DAI site to point at to test that your OGSA-DAI installation works
- Large result sets
- Can increase JVM size, but this is not scalable
- This occurred on most datasets
- Integration
- DQP is a start (Linux, OQL)
- Why use OGSA-DAI?
- Easysoft etc.
- http://www.easysoft.com/products/2001/main.phtml
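On the large result sets issue listed above, the general alternative to enlarging the JVM is to consume results in fixed-size batches so that memory use does not grow with the size of the result. The sketch below shows that technique with Python's standard `sqlite3` module. It says nothing about OGSA-DAI internals, and the table, the helper `iter_rows`, and the batch size are made up for the example.

```python
import sqlite3


def iter_rows(cursor, batch_size=1000):
    """Yield rows one at a time, holding at most batch_size rows in memory."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            return
        yield from batch


# Made-up table with enough rows to make batching visible.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE journeys (id INTEGER PRIMARY KEY, miles REAL)")
conn.executemany(
    "INSERT INTO journeys (miles) VALUES (?)",
    ((1.5,) for _ in range(10000)),
)

# Aggregate while streaming instead of loading the whole result first.
total = 0.0
count = 0
cursor = conn.execute("SELECT id, miles FROM journeys")
for _id, miles in iter_rows(cursor, batch_size=500):
    total += miles
    count += 1

print(count, total)
```

The caller sees an ordinary iterator, so analysis code does not need to know the batch size, and that size can be tuned to the memory available.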
20 Why use OGSA-DAI?
An RDBMS engine that appears to client apps as a fully conformant ODBC 3.5 data source. It can be used to provide real-time, heterogeneous access to multiple target data sources.