R-GMA - PowerPoint PPT Presentation

1
R-GMA: DataGrid's Monitoring System, 1/7/2003
Werner Nutt (Heriot-Watt University) <w.nutt_at_hw.ac.uk>
2
R-GMA: Relational Grid Monitoring Architecture
  • Grid Monitoring and Information System
    developed within DataGrid (Work Package 3)
  • Based on the Grid Monitoring Architecture
    of the Global Grid Forum
  • Code is open source and freely available
  • Homepage: type "wp3" into Google

3
Contributors
  • Heriot-Watt, Edinburgh: Andrew Cooke, Alasdair Gray,
    Lisha Ma, Werner Nutt
  • IBM-UK: James Magowan, Manfred Oevers, Paul Taylor
  • Queen Mary, University of London: Roney Cordenonsi
  • CCLRC/PPARC: Rob Byrom, Laurence Field, Steve Hicks,
    Manish Soni, Antony Wilson, Jason Leake,
    Linda Cornwall, Abdeslem Djaoui, Steve Fisher,
    Robin Middleton
  • SZTAKI, Hungary: Peter Kacsuk, Norbert Podhorszki
  • Trinity College Dublin: Brian Coghlan, Stuart Kenny,
    David O'Callaghan

4
Overview
  • Grid monitoring: requirements
  • The R-GMA approach: a virtual monitoring
    database
  • Components of R-GMA
  • Schema
  • Producers and Consumers
  • Registry
  • Republishers
  • Query Planning

5
Major Components of DataGrid
6
WP7 R-GMA Collects Network Monitoring Data
7
The Grid Monitoring Problem
  • In a Grid we have
  • Computers
  • Storage elements
  • Network nodes and connections
  • Application programmes, ...
  • Monitoring:
  • What is the current state of the system?
  • How did the system behave in the past?

8
Monitoring Data Come in two Kinds
  • A Grid monitoring system makes available two
    kinds of data
  • static data pools, e.g., databases on
  • network topology, nodes connected
  • applications available (versions, licences, ...)
  • streams of data, e.g.,
  • sensor data (cpu load, network traffic, ...)
  • Data streams may give rise to data pools if they
    are archived
  • Today: R-GMA is tailored towards streams,
    but not pools

9
Examples of Monitoring Queries
  • Show me the (average) cpu-load of computers at
    Heriot-Watt!
  • Between which nodes was yesterday's average
    transportation time for 1 MB packets higher
    than 0. seconds?
  • For every computing element CE, how many
    computers of CE currently have a cpu-load of no
    more than 30?

10
Grid Monitoring Requirements
  • Support for publishing data pools and streams
  • Support for locating data sources
    (automatic, if possible)
  • Queries with different temporal interpretations
    (continuous, latest state, history)
  • Scalability
    (there may be thousands of data sources)
  • Resilience to failure
    (data sources may become unavailable)
  • Flexibility
    (we don't know which queries will be posed)

11
Architecture Approach 1: A Monitoring Data
Warehouse
  • Idea:
  • store all data about the Grid status in a huge
    database
  • and query it
  • Not realistic:
  • Loading takes time
  • Data occupy space
  • Connections to the warehouse may fail
  • Monitoring data often flow as data streams, and
    queries ask for data streams as output

12
Approach 2: Monitoring with a
Multi-agent System
  • The Grid Monitoring Architecture (GMA) of the
    Global Grid Forum distinguishes between
  • Consumers of information
  • Producers of information
  • Directory Service
  • Producers register their supply
  • Consumers register their demand

Directory Service mediates between producers and
consumers
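
The mediation role of the Directory Service can be pictured with a minimal sketch. It assumes that supply and demand are registered under plain topic names; the class `DirectoryService` and its method names are hypothetical and are not part of GMA or R-GMA.

```python
from collections import defaultdict


class DirectoryService:
    """Toy directory: remembers who supplies and who demands each topic."""

    def __init__(self):
        self._supply = defaultdict(set)  # topic -> names of producers
        self._demand = defaultdict(set)  # topic -> names of consumers

    def register_producer(self, producer, topic):
        """Record the supply and return the consumers already waiting for it."""
        self._supply[topic].add(producer)
        return sorted(self._demand[topic])

    def register_consumer(self, consumer, topic):
        """Record the demand and return the producers that can serve it."""
        self._demand[topic].add(consumer)
        return sorted(self._supply[topic])


directory = DirectoryService()
directory.register_producer("P1", "cpuLoad")
directory.register_producer("P2", "cpuLoad")
matches = directory.register_consumer("C1", "cpuLoad")
print(matches)
```

Matching on bare topic names is the simplest possible registration language; richer registrations need a richer matching step.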
13
Questions about GMA
  • Which kinds of producers and consumers are there?
  • In which language do producers register their
    supply and consumers their demand?
  • What is the meaning of a registration?
  • How does a consumer find suitable producers?
    And how does a producer find suitable
    consumers?
  • Producers have different capabilities to answer
    queries (e.g. selections, joins, ...).
    Which of them should they register?

14
R-GMA: A Virtual Monitoring Data Warehouse
  • Language of producers and consumers:
    relational queries (SQL)
  • Vocabulary: relations in a global schema
  • Consumer: poses queries over the
    global schema
  • Producer:
  • has a type (stream producer, database producer)
  • publishes relations R1, ..., Rk
  • for every R, registers a simple view V on the
    global schema

15
Schema Contributions

CPULoad (Global Schema)
Country  Site  Facility  Load  Timestamp
UK       RAL   CDF       0.3   19055711022002
UK       RAL   ATLAS     1.6   19055611022002
UK       GLA   CDF       0.4   19055811022002
UK       GLA   ALICE     0.5   19055611022002
CH       CERN  ALICE     0.9   19055611022002
CH       CERN  CDF       0.6   19055511022002

CPULoad (Stream Producer 1)
UK       RAL   CDF       0.3   19055711022002
UK       RAL   ATLAS     1.6   19055611022002

CPULoad (Stream Producer 2)
UK       GLA   CDF       0.4   19055811022002
UK       GLA   ALICE     0.5   19055611022002

CPULoad (Stream Producer 3)
CH       CERN  ATLAS     1.6   19055611022002
CH       CERN  CDF       0.6   19055511022002

16
Contributions are Views

CPULoad (Producer 1)
UK  RAL  CDF    0.3  19055711022002
UK  RAL  ATLAS  1.6  19055611022002
SELECT * FROM cpuLoad WHERE country = 'UK' AND site = 'RAL'

CPULoad (Producer 2)
UK  GLA  CDF    0.4  19055811022002
UK  GLA  ALICE  0.5  19055611022002
SELECT * FROM cpuLoad WHERE country = 'UK' AND site = 'GLA'
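
The idea that each contribution is a view on the global relation can be tried out with a small sketch. It uses an in-memory SQLite database purely as a stand-in; the table names `p1_cpuLoad` and `p2_cpuLoad` are hypothetical, and R-GMA itself does not store the global relation anywhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
columns = "(country TEXT, site TEXT, facility TEXT, load REAL, timestamp TEXT)"

# One local table per producer (hypothetical names).
conn.execute(f"CREATE TABLE p1_cpuLoad {columns}")
conn.execute(f"CREATE TABLE p2_cpuLoad {columns}")
conn.executemany("INSERT INTO p1_cpuLoad VALUES (?, ?, ?, ?, ?)", [
    ("UK", "RAL", "CDF", 0.3, "19055711022002"),
    ("UK", "RAL", "ATLAS", 1.6, "19055611022002"),
])
conn.executemany("INSERT INTO p2_cpuLoad VALUES (?, ?, ?, ?, ?)", [
    ("UK", "GLA", "CDF", 0.4, "19055811022002"),
    ("UK", "GLA", "ALICE", 0.5, "19055611022002"),
])

# The global relation is virtual: the union of all contributions.
conn.execute(
    "CREATE VIEW cpuLoad AS "
    "SELECT * FROM p1_cpuLoad UNION ALL SELECT * FROM p2_cpuLoad"
)

# Producer 1's registered view, evaluated over the global relation,
# gives back exactly Producer 1's own tuples.
ral_rows = conn.execute(
    "SELECT * FROM cpuLoad WHERE country = 'UK' AND site = 'RAL'"
).fetchall()
p1_rows = conn.execute("SELECT * FROM p1_cpuLoad").fetchall()
all_rows = conn.execute("SELECT * FROM cpuLoad").fetchall()
print(sorted(ral_rows) == sorted(p1_rows))
```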
17
Keys in the Global Schema
  • Network throughput:
  • tp(src, dest, method, pcktSize, timestamp,
    time)
  • Intuitively, tp has the primary key
  • (src, dest, method, pcktSize, timestamp).
  • We need to know the primary keys
  • to understand the global schema
  • to answer latest snapshot queries
  • Primary keys are declared, but not enforced!
  • However, they sometimes hold globally if
    they hold locally!
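
Because keys are declared but not enforced, a check has to be run explicitly if one wants to know whether a key holds. The following is a minimal sketch of such a check for tp; the helper names `key_violations` and `tp` and all sample values are made up for illustration.

```python
from collections import defaultdict

# Declared primary key of tp, as given on the slide.
KEY = ("src", "dest", "method", "pcktSize", "timestamp")


def tp(src, dest, method, pckt_size, timestamp, time):
    """Build one tp tuple as a dictionary."""
    return {"src": src, "dest": dest, "method": method,
            "pcktSize": pckt_size, "timestamp": timestamp, "time": time}


def key_violations(tuples, key=KEY):
    """Return the key values that occur with more than one 'time' value."""
    times_per_key = defaultdict(set)
    for t in tuples:
        times_per_key[tuple(t[f] for f in key)].add(t["time"])
    return {k for k, times in times_per_key.items() if len(times) > 1}


# Each producer publishes only tuples with its own src value.
p1 = [tp("hw", "ral", "ping", 10, 1, 0.4), tp("hw", "ral", "ping", 10, 2, 0.5)]
p2 = [tp("ral", "hw", "ping", 10, 1, 0.3)]
# A second producer for src = 'hw' that disagrees with p1.
rogue = [tp("hw", "ral", "ping", 10, 1, 0.9)]

print(key_violations(p1), key_violations(p2))  # key holds locally
print(key_violations(p1 + p2))                 # and also globally
print(key_violations(p1 + rogue))              # overlapping views break it
```

In this sketch the key carries over from the producers to the union because src is part of the key and the two views select disjoint src values. Once two views overlap, local consistency no longer guarantees global consistency.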

18
Metaphor: Roles and Agents
  • R-GMA clients: Grid components or Grid
    applications
  • Clients can play the roles of producers or
    consumers
  • A client would need special capabilities for a
    role
  • Clients are supported in their roles by agents
  • Implementation:
  • APIs for client roles: new
    StreamProducer()
  • Agents are objects on a Web server
19
Primary Producers
  • Database producer
  • supports queries over fixed set of tuples (static
    queries)
  • can be used to publish a database
  • Stream producer
  • supports queries over changing set of tuples
    (continuous queries)
  • supports latest snapshot queries
  • offers up-to-date values for each primary key in
    a db
  • Today: DatabaseProducers and StreamProducers
    in R-GMA are different from the
    above!

20
Communication Modes of Stream Producers
  • Stream Producers may offer two communication
    modes for continuous queries
  • lossless (but tuples could become stale)
  • lossy (but tuples are fresh)

Today: R-GMA's StreamProducers are resilient and
support lossless communication
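
The trade-off between the two modes can be illustrated with a buffer sitting between a producer and a slow consumer. This is a sketch under the assumption that "lossy" means dropping the oldest undelivered tuples; the class `StreamBuffer` is hypothetical and does not reflect how R-GMA's StreamProducers are built.

```python
from collections import deque


class StreamBuffer:
    """Holds tuples that have been published but not yet delivered."""

    def __init__(self, lossy, capacity=3):
        # A deque with maxlen silently discards its oldest item when full;
        # maxlen=None gives an unbounded (lossless) queue.
        self._pending = deque(maxlen=capacity if lossy else None)

    def publish(self, tup):
        self._pending.append(tup)

    def drain(self):
        """Deliver everything that is still pending, oldest first."""
        delivered = list(self._pending)
        self._pending.clear()
        return delivered


lossless = StreamBuffer(lossy=False)
lossy = StreamBuffer(lossy=True, capacity=3)
for timestamp in range(5):  # the consumer is not reading during this time
    lossless.publish(("cpuLoad", timestamp))
    lossy.publish(("cpuLoad", timestamp))

lossless_out = lossless.drain()  # all five tuples, the first ones now stale
lossy_out = lossy.drain()        # only the three most recent tuples
print(len(lossless_out), len(lossy_out))
```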
21
Republishers Publish Query Answers
  • Archiver shows the history of a stream.
  • Stream Republisher enables
  • merging,
  • thinning,
  • summarising of streams
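
The three stream operations named above can be sketched with plain Python iterators. The sketch assumes a stream is a sequence of (timestamp, load) pairs sorted by timestamp; the function names and sample values are illustrative only.

```python
import heapq
from itertools import islice


def merge(*streams):
    """Merge several timestamp-sorted streams into one sorted stream."""
    return heapq.merge(*streams, key=lambda tup: tup[0])


def thin(stream, n):
    """Keep every n-th tuple of a stream, starting with the first."""
    return islice(stream, 0, None, n)


def summarise(stream, window):
    """Replace each full window of tuples by one tuple with the average load."""
    loads = []
    for timestamp, load in stream:
        loads.append(load)
        if len(loads) == window:
            yield timestamp, sum(loads) / window
            loads = []


s1 = [(1, 0.25), (3, 0.5)]
s2 = [(2, 0.75), (4, 1.0)]

merged = list(merge(s1, s2))
thinned = list(thin(merged, 2))
summary = list(summarise(merged, 2))
print(merged, thinned, summary)
```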

22
Republishers in R-GMA Today
  • Republishers are called archivers
    (although some of them don't archive
    anything)
  • An archiver (= republisher)
  • is defined by a query
  • consumes only from stream producers
  • publishes the query result according to its
    type, using
  • a stream producer, or
  • a latest snapshot producer, or
  • a database producer (which keeps an
    archive)
  • Republishers are used to answer complex queries!

23
The Next Step: Hierarchies of Stream Republishers
24
Republisher Hierarchies: The Issues
  • Republishers are defined by queries; hierarchies
    have to be maintained automatically
  • new stream producers must only be added
    to republishers at the lowest level
  • hierarchy has to be replanned if a republisher
    fails
  • difficult transition from one plan to the other
    without loss of tuples
  • How well can we describe the content of a
    stream? Possibly a need for descriptions that join
  • stream relations: CPULoad(machineID, load,
    timestamp)
  • static relations: locatedAt(machineID,
    site)

25
What is the Meaning of a Query in R-GMA?
  • Assumption: the views of (primary) producers are
    selections on a single relation, i.e., queries of
    the form
  • SELECT *
  • FROM cpu_load
  • WHERE machine_id = 'AB123' AND loc = 'hw'
    (each producer contributes its part
    of a relation)
  • The virtual database contains the union of
    the data of all the primary producers
  • Conceptually, a query is evaluated
  • over the entire virtual db

26
Stream Queries can have Various Temporal
Interpretations
  • Consider a query over the relation Transport
    Time
  • tt(src, dest, pcktSize, method, timestamp, time)
  • SELECT * FROM tt
  • WHERE src = 'ral' AND dest = 'bologna'
  • What is meant? Measurements
  • from now? (Continuous Query)
  • up until now? (History Query)
  • right now? (Latest Snapshot Query)
  • Today: queries can
    be flagged with their type
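
The three interpretations can be made concrete by running one and the same selection against a small in-memory stream. This is a sketch only: the class `StreamRelation`, the helper `measurement` and the sample values are hypothetical, and the latest snapshot is taken per key without the timestamp, which is an assumption.

```python
class StreamRelation:
    """Toy stream relation that can answer one predicate in three ways."""

    def __init__(self, key_fields):
        self._key_fields = key_fields
        self._tuples = []          # everything seen so far
        self._subscriptions = []   # (predicate, result list) pairs

    def insert(self, tup):
        self._tuples.append(tup)
        for predicate, sink in self._subscriptions:
            if predicate(tup):
                sink.append(tup)

    def continuous(self, predicate):
        """Measurements from now on: a list that grows as tuples arrive."""
        sink = []
        self._subscriptions.append((predicate, sink))
        return sink

    def history(self, predicate):
        """Measurements up until now."""
        return [t for t in self._tuples if predicate(t)]

    def latest_snapshot(self, predicate):
        """The most recent measurement for each key value."""
        latest = {}
        for t in self._tuples:
            if predicate(t):
                key = tuple(t[f] for f in self._key_fields)
                if key not in latest or t["timestamp"] > latest[key]["timestamp"]:
                    latest[key] = t
        return list(latest.values())


def measurement(timestamp, time):
    return {"src": "ral", "dest": "bologna", "pcktSize": 1,
            "method": "ping", "timestamp": timestamp, "time": time}


def query(t):
    return t["src"] == "ral" and t["dest"] == "bologna"


tt = StreamRelation(("src", "dest", "pcktSize", "method"))
tt.insert(measurement(1, 0.5))
live = tt.continuous(query)      # posed after the first measurement
tt.insert(measurement(2, 0.75))

past = tt.history(query)
snapshot = tt.latest_snapshot(query)
print(len(past), len(snapshot), len(live))
```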

27
Advanced Queries: Mixing Temporal Query Types
  • Which connections currently have a
    transportation time that is higher than last
    week's average?
    (latest snapshot and history)
  • Show me the cpu load of those machines where it
    is lower than yesterday's load average!
    (continuous and history)
  • We do not intend to support such
    queries in R-GMA!

28
In R-GMA, Query Answering Needs Mediation
  • Suppose P1, P2 publish for tp (throughput)
  • P1: WHERE src = 'hw'
  • P2: WHERE src = 'ral' AND pcktSize > 20
  • A global consumer poses its query over global
    relations
  • SELECT * FROM tp WHERE pcktSize > 10
  • A mediator translates this into queries over
    local relations
  • SELECT * FROM P1.tp WHERE pcktSize > 10
  • UNION
  • SELECT * FROM P2.tp
  • Today: R-GMA's mediator handles simple queries
    like the one above
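
The translated query can be checked against the global query on sample data. The sketch below uses SQLite with one attached in-memory database per producer, only so that the names P1.tp and P2.tp resolve; the sample tuples are made up and respect each producer's registered view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One attached in-memory database per producer, so P1.tp and P2.tp exist.
conn.execute("ATTACH DATABASE ':memory:' AS P1")
conn.execute("ATTACH DATABASE ':memory:' AS P2")

schema = ("(src TEXT, dest TEXT, method TEXT, "
          "pcktSize INTEGER, timestamp INTEGER, time REAL)")
conn.execute(f"CREATE TABLE P1.tp {schema}")
conn.execute(f"CREATE TABLE P2.tp {schema}")

# P1 publishes src = 'hw'; P2 publishes src = 'ral' AND pcktSize > 20.
conn.executemany("INSERT INTO P1.tp VALUES (?, ?, ?, ?, ?, ?)", [
    ("hw", "ral", "ping", 5, 1, 0.1),
    ("hw", "ral", "ping", 50, 2, 0.9),
])
conn.executemany("INSERT INTO P2.tp VALUES (?, ?, ?, ?, ?, ?)", [
    ("ral", "hw", "ping", 30, 1, 0.7),
    ("ral", "hw", "ftp", 100, 2, 3.0),
])

# Mediated query: P2 needs no filter, since pcktSize > 20 implies pcktSize > 10.
mediated = conn.execute(
    "SELECT * FROM P1.tp WHERE pcktSize > 10 "
    "UNION SELECT * FROM P2.tp"
).fetchall()

# Reference: the global query evaluated over the union of all contributions.
reference = conn.execute(
    "SELECT * FROM (SELECT * FROM P1.tp UNION ALL SELECT * FROM P2.tp) "
    "WHERE pcktSize > 10"
).fetchall()
print(sorted(mediated) == sorted(reference))
```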

29
Global and Local Consumers
  • Global consumers pose queries over global
    relations
  • SELECT * FROM tp WHERE pcktSize > 10,
  • which are translated into queries over local
    relations
  • SELECT * FROM P1.tp WHERE pcktSize > 10
  • UNION
  • SELECT * FROM P2.tp
  • Local consumers pose queries over local
    relations directly
  • SELECT * FROM P1.tp WHERE method = 'ping'
  • Today: a consumer can be global or local,
    but local relations cannot be
    referred to explicitly

30
How does the Mediator Find Suitable Publishers?
  • P1, P2, P3 publish for tt (Transport Time)
  • P1: src = 'hw'
  • P2: src = 'ral' AND pcktSize > 20
  • P3: src = 'ral' AND method = 'ping'
  • Q: SELECT * FROM tt WHERE src = 'ral' AND
    method = 'ping'
  • We see P1 is not suitable for Q, but P2 and P3
    are. Why?
  • src = 'hw' AND src = 'ral' AND method = 'ping'
    is never true
  • src = 'ral' AND pcktSize > 20 AND ...
    is sometimes true
  • Satisfiability Test!
  • Today: implemented
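
A satisfiability test for conditions of this shape fits in a few lines. The sketch below assumes conditions are conjunctions of `attribute = constant` and `attribute > constant` only, written as (attribute, operator, value) triples; this encoding and the function name are made up for illustration and say nothing about R-GMA's actual implementation.

```python
def satisfiable(*conjunctions):
    """Can all given conjunctions be true for one and the same tuple?"""
    equals = {}  # attribute -> the value it must equal
    lower = {}   # attribute -> the largest lower bound seen
    for conjunction in conjunctions:
        for attr, op, value in conjunction:
            if op == "=":
                if attr in equals and equals[attr] != value:
                    return False  # e.g. src = 'hw' AND src = 'ral'
                equals[attr] = value
            elif op == ">":
                lower[attr] = max(value, lower.get(attr, value))
            else:
                raise ValueError(f"unsupported operator: {op}")
    # A fixed value must also respect every lower bound on its attribute.
    return all(equals[a] > bound for a, bound in lower.items() if a in equals)


P1 = [("src", "=", "hw")]
P2 = [("src", "=", "ral"), ("pcktSize", ">", 20)]
P3 = [("src", "=", "ral"), ("method", "=", "ping")]
Q = [("src", "=", "ral"), ("method", "=", "ping")]

suitable = [name for name, view in (("P1", P1), ("P2", P2), ("P3", P3))
            if satisfiable(view, Q)]
print(suitable)
```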

31
So Which Publishers Should the Mediator
Ask?
  • P2: src = 'ral' AND pcktSize > 20
  • P3: src = 'ral' AND method = 'ping'
  • Q: SELECT * FROM tt WHERE src = 'ral' AND
    method = 'ping'
  • All answers to Q returned by P2 are also returned
    by P3
  • whenever
  • src = 'ral' AND pcktSize > 20 AND src = 'ral' AND
    method = 'ping'
  • is true, then
  • src = 'ral' AND method = 'ping' AND src = 'ral' AND
    method = 'ping'
  • is true.
  • Hence, R-GMA only needs to ask P3
  • Entailment Test!
  • Needed for Republisher Hierarchies!
    (not yet implemented)
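
The entailment test can be sketched in the same restricted setting: conjunctions of `attribute = constant` and `attribute > constant`, written as (attribute, operator, value) triples. The encoding and the function name are illustrative assumptions, and the premise is assumed to be satisfiable.

```python
def entails(premise, conclusion):
    """Does every tuple satisfying the premise also satisfy the conclusion?

    Both arguments are conjunctions; the premise is assumed satisfiable.
    """
    equals = {attr: value for attr, op, value in premise if op == "="}
    lower = {}
    for attr, op, value in premise:
        if op == ">":
            lower[attr] = max(value, lower.get(attr, value))

    def implied(attr, op, value):
        if op == "=":
            return attr in equals and equals[attr] == value
        if op == ">":
            if attr in equals:
                return equals[attr] > value
            return attr in lower and lower[attr] >= value
        raise ValueError(f"unsupported operator: {op}")

    return all(implied(*condition) for condition in conclusion)


P2 = [("src", "=", "ral"), ("pcktSize", ">", 20)]
P3 = [("src", "=", "ral"), ("method", "=", "ping")]
Q = [("src", "=", "ral"), ("method", "=", "ping")]

# Whatever P2 can contribute to Q, P3 can contribute as well ...
p2_covered_by_p3 = entails(P2 + Q, P3 + Q)
# ... but not the other way round, so asking P3 alone is enough.
p3_covered_by_p2 = entails(P3 + Q, P2 + Q)
print(p2_covered_by_p3, p3_covered_by_p2)
```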

32
But What Did the Producers Promise?
  • P registers view V
  • Does P promise
  • some of V? (sound description)
  • all of V? (sound and
    complete description)
  • The Entailment Test only makes sense when the
    registered views are sound and complete
    descriptions
  • Producers should register completeness flags

33
Why May a Producer not be Complete?
  • The language of views is more restricted than the
    language of queries. Hence republishers may be
    unable to say exactly what they
    publish
  • Archivers may archive in lossy mode
  • Producers may lose tuples
  • A producer may not know everything
    about the real world
  • Open to debate
34
Summary (1)
  • Monitoring data come in Pools and Streams
  • Global Schema
  • primary keys
  • Types of Stream Queries
  • continuous vs. history vs. latest snapshot
  • Producers
  • DB producers: publish a database
  • stream producers: lossless vs.
    lossy communication modes