OGSA-DQP: A Service-Based Distributed Query Processor for the Grid

1
OGSA-DQP: A Service-Based Distributed Query
Processor for the Grid
  • Arijit Mukherjee
  • School of Computing Science
  • University of Newcastle

2
People and partners
Dr. Jim Smith
Arijit Mukherjee
Dr. Savas Parastatidis
Prof. Paul Watson
Dr. Alvaro A. A. Fernandes
Prof. Norman Paton
Dr. M. Nedim Alpdemir
Dr. Rizos Sakellariou
Anastasios Gounaris
3
Acknowledgement
  • Much of the content in many of the slides was
    authored by co-workers, especially M.N.
    Alpdemir, Alvaro A.A. Fernandes, A. Gounaris and
    N.W. Paton, for the presentation at the 1st
    International Conference on Service Oriented
    Computing, Trento, Italy, 15th-18th December
    2003.
  • Errors are mine, of course.

4
Outline
  • Introduction
  • DQP Approaches
  • Goals
  • Innovations
  • Architecture
  • Initialisation and setup
  • Query Evaluation
  • Performance
  • Future work
  • Summary

5
Service-based in what sense?
  • OGSA-DQP is service-based in two orthogonal
    senses:
  • It supports querying over data storage and
    analysis resources made available as services,
    hence resource virtualisation via SOA.
  • The construction of distributed query plans and
    their execution over the Grid are factored out as
    services.

6
Background: federation
Query
Results
  • One DQP approach is to federate.
  • This is complex to manage because there is always
    an autonomous DBMS at every node.
  • Resource allocation is static.
  • Query optimisation cannot take evolving
    circumstances into account.

7
Background: mediation
Query
Results
  • Another approach is to use mediator/wrapper
    middleware.
  • Autonomy is now less of an issue because the
    wrappers reconcile differences and impose a
    global schema.
  • Often there is only one mediator in one fixed
    location, which is limiting.
  • Query optimisation is not as hard.

[Diagram labels: mediator, wrapper, wrapper, DBMS, data]
8
OGSA-DQP Approach
Query
Results
  • OGSA-DQP uses a middleware approach.
  • It can be seen as a mediator over OGSA-DAI
    wrappers.
  • It promises bottom-line benefits regarding:
  • efficiency: leave it to OGSA-DQP to schedule in
    parallel
  • effectiveness: leave it to OGSA-DQP to orchestrate
    your services
  • usability: use it as a Grid data service.

[Diagram labels: OGSA-DQP, OGSA-DAI, OGSA-DAI, DBMS, data]
9
OGSA-DQP Goals
  • To benefit from homogeneous access to
    heterogeneous data sources (OGSA-DAI).
  • To benefit from Grid abstractions for on-demand
    allocation of resources required for a task
    (OGSA/OGSI/GT3).
  • To provide transparent, implicit support for
    parallelism and distribution (Polar).
  • To orchestrate the composition of data retrieval
    and analysis services.
  • To expose this orchestration capability as a Grid
    data service.

10
OGSA-DQP Innovations
  • OGSA-DQP dynamically allocates evaluators to do
    work on behalf of the mediator.
  • This allows for runtime circumstances to be taken
    into account when the optimiser decides how to
    partition and schedule.
  • OGSA-DQP uses a parallel physical algebra; most
    mediator-based query processors do not.
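
To illustrate what a parallel physical algebra involves, here is a minimal sketch of an exchange-style operator that hash-partitions tuples across evaluators, so that each evaluator can join its own share independently. The function names and the sample rows are hypothetical and are not taken from OGSA-DQP itself.

```python
from collections import defaultdict

def exchange(rows, key, n_consumers):
    """Route each row to one of n_consumers buckets by hashing its key."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[hash(row[key]) % n_consumers].append(row)
    return [buckets[i] for i in range(n_consumers)]

def hash_join(left, right, key):
    """Build a hash table on `left`, then probe it with `right`."""
    table = defaultdict(list)
    for l in left:
        table[l[key]].append(l)
    return [{**l, **r} for r in right for l in table.get(r[key], [])]

def parallel_hash_join(left, right, key, n_evaluators):
    """Partition both inputs the same way, join each pair of partitions."""
    left_parts = exchange(left, key, n_evaluators)
    right_parts = exchange(right, key, n_evaluators)
    out = []
    for l_part, r_part in zip(left_parts, right_parts):
        # In a real system each pair would run on a different evaluator.
        out.extend(hash_join(l_part, r_part, key))
    return out

# Made-up sample data, for illustration only.
terms = [{"proteinId": i} for i in (1, 3, 5)]
proteins = [{"proteinId": i, "sequence": "S" * i} for i in range(1, 7)]

serial = hash_join(terms, proteins, "proteinId")
parallel = parallel_hash_join(terms, proteins, "proteinId", 3)
```

Because both inputs are partitioned with the same hash function, rows that can join always land in the same partition, so the partitioned join returns the same rows as the single-node join (possibly in a different order).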

11
OGSA-DQP Architecture
  • Extends OGSA-DAI with two new services (and
    their corresponding factories):
  • Grid Distributed Query Service (GDQS)
  • Exposed to the client
  • Finds and retrieves service descriptions
  • Parses, compiles, optimises and schedules query
    execution plans over a union of distributed data
    resources
  • Grid Query Evaluation Service (GQES)
  • Not exposed to the client
  • Implements the physical query algebra
  • Implements the query execution model and
    semantics
  • Evaluates a partition of the query execution plan
    generated by the GDQS
  • Interacts with other GQESs/GDSs/Web Services

12
Initialisation and Setup
  • To generate an execution plan, a GDQS:
  • Retrieves information about the data and
    computational services deemed of interest by the
    requestor
  • Interacts with GDS factories, obtains WSDL
    documents of analysis services and (in future)
    Index Services to acquire relevant metadata
  • Compiles, optimises, partitions and schedules the
    query execution.
  • In the query execution phase, the GDQS:
  • Interacts with GDS factories to create the leaf
    services (i.e. data source wrappers) in the plan
  • Creates instances of evaluators as specified by
    the query plan
  • Allocates partitions to service instances (i.e.
    computational nodes) for execution according to
    the schedule.
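
The execution-phase steps above can be sketched roughly as follows: one evaluator is created per partition of the plan and is handed the partition scheduled for it. The `Partition` type, the `install` function, the evaluator identifiers and the node names are all hypothetical; the actual services exchange documents rather than Python objects.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Partition:
    name: str
    operators: tuple
    node: str  # computational node picked by the scheduler (assumed)

def install(partitions):
    """Create one evaluator per partition and hand it its partition."""
    evaluators = {}
    for index, partition in enumerate(partitions):
        evaluator_id = f"evaluator-{index}"
        evaluators[evaluator_id] = {
            "node": partition.node,
            "partition": partition,
        }
    return evaluators

# A made-up three-partition plan, for illustration only.
plan = [
    Partition("scan-terms", ("table_scan", "select", "exchange"), "node-a"),
    Partition("scan-proteins", ("table_scan", "select", "exchange"), "node-b"),
    Partition("join-and-call", ("hash_join", "operation_call", "select"), "node-c"),
]
evaluators = install(plan)
```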

13
Example Query Plan
  • select p.proteinId, Blast(p.sequence) from p in
    protein, t in proteinTerm
    where t.termId = 'GO:0008372' and
    p.proteinId = t.proteinId

[Figure (c): partitioned plan. Operators, from the root
down: select (p.proteinId, blast); operation_call
(blast(p.sequence)); exchange; hash_join
(p.proteinId = t.proteinId); exchange; exchange;
select (proteinId); select (proteinId, sequence);
table_scan (proteinTerms) (termedABCD); table_scan
(proteins). Node labels shown in the figure: "4, 5",
"3, 6", "2, 3", "6".]
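
As a rough illustration of how the operators in this plan compose, the sketch below evaluates the example query over a few in-memory rows on a single machine, ignoring the exchanges. The sample rows are made up, the operator functions are hypothetical, and `fake_blast` is only a stand-in for the remote Blast analysis service.

```python
def table_scan(table, predicate=lambda row: True):
    """Return the rows of a table that satisfy the predicate."""
    return [row for row in table if predicate(row)]

def project(rows, columns):
    """Keep only the named columns of each row."""
    return [{c: row[c] for c in columns} for row in rows]

def hash_join(build, probe, key):
    """Build a hash index on one input and probe it with the other."""
    index = {}
    for row in build:
        index.setdefault(row[key], []).append(row)
    return [{**b, **p} for p in probe for b in index.get(p[key], [])]

def operation_call(rows, fn, arg, out):
    """Call an external operation once per row and attach its result."""
    return [{**row, out: fn(row[arg])} for row in rows]

def fake_blast(sequence):
    # Placeholder for the Blast service: it only reports the length.
    return f"hits-for-{len(sequence)}-residues"

# Made-up sample data, for illustration only.
proteins = [
    {"proteinId": "P1", "sequence": "MKT"},
    {"proteinId": "P2", "sequence": "GAVLI"},
]
protein_terms = [
    {"proteinId": "P1", "termId": "GO:0008372"},
    {"proteinId": "P2", "termId": "GO:0000001"},
]

terms = project(
    table_scan(protein_terms, lambda r: r["termId"] == "GO:0008372"),
    ["proteinId"],
)
prots = project(table_scan(proteins), ["proteinId", "sequence"])
joined = hash_join(terms, prots, "proteinId")
called = operation_call(joined, fake_blast, "sequence", "blast")
result = project(called, ["proteinId", "blast"])
```

Only protein P1 is annotated with the selected term, so it is the only row that reaches the Blast call; filtering before the join keeps the expensive operation call off rows that would be discarded anyway.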
14
Query Evaluation
  • Query installation stage
  • As many GQES instances are created as there are
    partitions specified.
  • Each partition is sent to the GQES instance it is
    scheduled for.
  • Query evaluation stage
  • Each GQES evaluates its partition using an
    iterator model.
  • Queries execute under pipelined and partitioned
    parallelism.
  • Results are conveyed to the client.
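
The iterator model mentioned above is commonly realised with an open/next/close interface, where each operator pulls rows from its child one at a time. The following is a minimal single-process sketch of that idea; the class names are hypothetical and it leaves out distribution between evaluators entirely.

```python
class Scan:
    """Leaf operator: yields the rows of an in-memory list."""
    def __init__(self, rows):
        self.rows = rows
    def open(self):
        self._it = iter(self.rows)
    def next(self):
        return next(self._it, None)  # None signals end of stream
    def close(self):
        self._it = None

class Select:
    """Passes on only the rows that satisfy a predicate."""
    def __init__(self, child, predicate):
        self.child, self.predicate = child, predicate
    def open(self):
        self.child.open()
    def next(self):
        while (row := self.child.next()) is not None:
            if self.predicate(row):
                return row
        return None
    def close(self):
        self.child.close()

class Apply:
    """Transforms each row with a function as it flows through."""
    def __init__(self, child, fn):
        self.child, self.fn = child, fn
    def open(self):
        self.child.open()
    def next(self):
        row = self.child.next()
        return None if row is None else self.fn(row)
    def close(self):
        self.child.close()

def run(root):
    """Drive the pipeline from the root, one row at a time."""
    root.open()
    out = []
    while (row := root.next()) is not None:
        out.append(row)
    root.close()
    return out

pipeline = Apply(Select(Scan([1, 2, 3, 4]), lambda x: x % 2 == 0),
                 lambda x: x * 10)
rows = run(pipeline)
```

Each call to `next` on the root pulls exactly one row through the whole chain, which is what makes pipelined execution possible: no operator has to materialise its full input before the next one starts.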

15
OGSA-DQP Execution Flow
16
What we provide
  • Resource virtualisation through a
    service-oriented architecture
  • Data Resource Discovery using service registries
  • Computational Resource Discovery via Index
    Services (not implemented yet)
  • Reliance on GDSs for metadata and data access
  • Coarse-grained services with document-oriented
    interfaces
  • By acquiring and manipulating data in a data-flow
    architecture that is constructed dynamically,
    OGSA-DQP constructs, on-the-fly, a lightweight
    DQPE.

17
OGSA-DQP Performance
  • One performance bottleneck is the lack of an
    efficient data transfer mechanism.
  • As OGSA-DQP is essentially a data-flow system,
    efficient data movement is critical. Currently the
    underlying technology is not mature enough to
    provide this capability.
  • For example, block delivery does not work
    efficiently, because:
  • the block size is not configurable (until OGSA-DAI
    4.0)
  • the data is transferred as XML documents using
    SOAP over HTTP.
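
To see why a configurable block size matters, consider a simple cost model in which every message carries a fixed overhead (envelope, HTTP round trip) on top of a per-row cost. The sketch below is only an assumption-laden illustration: the functions, the cost model and the numbers are hypothetical and are not measurements of OGSA-DAI or OGSA-DQP.

```python
def blocks(rows, block_size):
    """Split rows into consecutive blocks of at most block_size rows."""
    for start in range(0, len(rows), block_size):
        yield rows[start:start + block_size]

def transfer_cost(n_rows, block_size, per_message_overhead, per_row_cost):
    """Total cost in arbitrary units under a fixed-overhead model."""
    n_messages = -(-n_rows // block_size)  # ceiling division
    return n_messages * per_message_overhead + n_rows * per_row_cost

# Hypothetical numbers: 1000 rows, overhead 50 units, 1 unit per row.
one_row_per_message = transfer_cost(1000, 1, 50, 1)
hundred_rows_per_message = transfer_cost(1000, 100, 50, 1)
```

Under these assumed numbers, sending one row per message costs 51000 units, while blocks of 100 rows cost 1500 units, because the fixed overhead is paid 10 times instead of 1000 times.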

18
Possible future work
  • More friendly (!) - Use SQL
  • More portable - Support Cygwin, Solaris for the
    compiler/optimiser
  • Better performer - We are working on it
  • More functional - Semi-structured data; streams.
  • More dynamic - Use Index Services; dynamically
    install services
  • More application test-beds - Sensor networks.
  • More adaptive - Queries may be long running and
    the environment is constantly changing, so static
    optimisation is likely to become stale fast.
    Monitor, assess and respond (e.g. switch
    operators/algorithms, spawn more copies,
    relocate).
  • More deployable - E.g. as a web service.

19
Summary
  • OGSA-DQP is a service-based distributed query
    processor for the Grid that is:
  • Exposed as a service
  • Implemented as an orchestration of services.
  • OGSA-DQP is an enactor of declarative Grid
    service orchestrations that:
  • Improves on Grid portals when only retrieval and
    analysis is involved
  • Fills the gap left by the lack of a service
    orchestration framework in the OGSA.

20
Where to find out more: papers
  • M N Alpdemir, A Mukherjee, A Gounaris, A A A
    Fernandes, N W Paton, P Watson, J Smith.
    Service-Based Distributed Querying on the Grid. 1st
    International Conference on Service Oriented
    Computing, 2003, LNCS 2910.
  • M N Alpdemir, A Mukherjee, A Gounaris, A A A
    Fernandes, N W Paton, P Watson, J Smith.
    OGSA-DQP: A Service for Distributed Querying on
    the Grid. In Proceedings of Advances in
    Database Technology - EDBT 2004, LNCS 2992.
  • M N Alpdemir, A Mukherjee, A Gounaris, A A A
    Fernandes, N W Paton, P Watson, J Smith. An
    Experience Report on Designing and Building
    OGSA-DQP: A Service Based Distributed Query
    Processor for the Grid. GGF9 Workshop on
    Designing and Building Grid Services, 2003.
  • M N Alpdemir, A Mukherjee, A Gounaris, N W Paton,
    P Watson, A A A Fernandes, J Smith. OGSA-DQP: A
    Service-Based Distributed Query Processor for the
    Grid. 2nd UK e-Science All Hands Meeting, 2003.
  • J Smith, A Gounaris, P Watson, N W Paton, A A A
    Fernandes, R Sakellariou. Distributed Query
    Processing on the Grid. GRID 2002, LNCS 2536.
  • (Papers available from
    http://www.cs.ncl.ac.uk/research/pubs/authors/byType.php?id=110)

21
Where to find out more: software
  • OGSA-DQP
  • Grid middleware to query distributed data sources
  • www.ogsadai.org.uk/dqp
  • 508 downloads as of 10:15 AM today!
  • OGSA-DAI
  • Grid middleware to interface with data(bases)
  • www.ogsadai.org.uk/
  • Globus Toolkit
  • Open-source implementation of OGSA/OGSI
  • www.globustoolkit.org/

22
Thank You
  • ?

23
Importing the schema
24
What happens behind the scenes (1)
25
What happens behind the scenes (2)