Title: R-GMA
1 R-GMA: DataGrid's Monitoring System 1/7/2003
Werner Nutt (Heriot-Watt University) <w.nutt_at_hw.ac.uk>
2 R-GMA: Relational Grid Monitoring Architecture
- Grid Monitoring and Information System developed within DataGrid (Work Package 3)
- Based on the Grid Monitoring Architecture of the Global Grid Forum
- Code is open source and freely available
Homepage: type "wp3" into Google
3 Contributors
- Heriot-Watt, Edinburgh
- Andrew Cooke, Alasdair Gray, Lisha Ma, Werner Nutt
- IBM-UK
- James Magowan, Manfred Oevers, Paul Taylor
- Queen Mary, University of London
- Roney Cordenonsi
- CCLRC/PPARC
- Rob Byrom, Laurence Field, Steve Hicks, Manish Soni, Antony Wilson, Jason Leake
- Linda Cornwall, Abdeslem Djaoui, Steve Fisher, Robin Middleton
- SZTAKI, Hungary
- Peter Kacsuk, Norbert Podhorszki
- Trinity College Dublin
- Brian Coghlan, Stuart Kenny, David O'Callaghan
4 Overview
- Grid monitoring: Requirements
- The R-GMA approach: A virtual monitoring database
- Components of R-GMA
- Schema
- Producers and Consumers
- Registry
- Republishers
- Query Planning
5 Major Components of DataGrid
6 WP7 R-GMA Collects Network Monitoring Data
7 The Grid Monitoring Problem
- In a Grid we have
- Computers
- Storage elements
- Network nodes and connections
- Application programmes
- Monitoring
- What is the current state of the system?
- How did the system behave in the past?
8 Monitoring Data Come in Two Kinds
- A Grid monitoring system makes available two kinds of data
- static data pools, e.g., databases on
- network topology, nodes connected
- applications available (versions, licences, ...)
- streams of data, e.g.,
- sensor data (cpu load, network traffic, ...)
- Data streams may give rise to data pools if they are archived
- Today: R-GMA is tailored towards streams, but not pools
9 Examples of Monitoring Queries
- Show me the (average) cpu-load of computers at Heriot-Watt!
- Between which nodes was yesterday's average transportation time for 1 MB packets higher than 0. seconds?
- For every computing element CE, how many computers of CE currently have a cpu-load of no more than 30?
10 Grid Monitoring Requirements
- Support for publishing data pools and streams
- Support for locating data sources (automatic, if possible)
- Queries with different temporal interpretations (continuous, latest state, history)
- Scalability (there may be thousands of data sources)
- Resilience to failure (data sources may become unavailable)
- Flexibility (we don't know which queries will be posed)
11 Architecture Approach 1: A Monitoring Data Warehouse
- Idea
- store all data about the Grid status in a huge database
- and query it
- Not realistic
- Loading takes time
- Data occupy space
- Connections to the warehouse may fail
- Often monitoring data flow as data streams, and queries ask for data streams as output
12 Approach 2: Monitoring with a Multi-agent System
- The Grid Monitoring Architecture (GMA) of the Global Grid Forum distinguishes between
- Consumers of information
- Producers of information
- Directory Service
- Producers register their supply
- Consumers register their demand
- Directory Service mediates between producers and consumers
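The three GMA roles can be pictured with a toy directory service. This is only a sketch: the class and method names (`DirectoryService`, `register_producer`, `register_consumer`) are invented for illustration and are not the GMA or R-GMA API, and supply and demand are matched by a plain topic name rather than by queries.

```python
from collections import defaultdict


class DirectoryService:
    """Toy directory: matches producers and consumers by topic name (hypothetical API)."""

    def __init__(self):
        self.supply = defaultdict(set)  # topic -> names of registered producers
        self.demand = defaultdict(set)  # topic -> names of registered consumers

    def register_producer(self, producer, topic):
        """Record the supply and return the consumers already asking for the topic."""
        self.supply[topic].add(producer)
        return sorted(self.demand[topic])

    def register_consumer(self, consumer, topic):
        """Record the demand and return the producers already offering the topic."""
        self.demand[topic].add(consumer)
        return sorted(self.supply[topic])


directory = DirectoryService()
directory.register_producer("P1", "cpuLoad")
directory.register_producer("P2", "cpuLoad")
matches = directory.register_consumer("C1", "cpuLoad")
print(matches)  # ['P1', 'P2']
```

Matching on a bare topic name leaves the language and meaning of a registration unspecified; the sketch makes no attempt to settle either.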
13 Questions about GMA
- Which kinds of producers and consumers are there?
- In which language do producers register their supply and consumers their demand?
- What is the meaning of a registration?
- How does a consumer find suitable producers? And how does a producer find suitable consumers?
- Producers have different capabilities to answer queries (e.g. selections, joins, ...). Which of them should they register?
14 R-GMA: A Virtual Monitoring Data Warehouse
- Language of producers and consumers: relational queries (SQL)
- Vocabulary: Relations in a global schema
- Consumer: poses queries over global schema
- Producer
- has a type (stream p., database p.)
- publishes relations R1, ..., Rk
- for every R, registers a simple view V on the global schema
15 Schema Contributions
CPULoad (Global Schema)
Country  Site  Facility  Load  Timestamp
UK       RAL   CDF       0.3   19055711022002
UK       RAL   ATLAS     1.6   19055611022002
UK       GLA   CDF       0.4   19055811022002
UK       GLA   ALICE     0.5   19055611022002
CH       CERN  ALICE     0.9   19055611022002
CH       CERN  CDF       0.6   19055511022002
CPULoad (Stream Producer 1)
UK       RAL   CDF       0.3   19055711022002
UK       RAL   ATLAS     1.6   19055611022002
CPULoad (Stream Producer 2)
UK       GLA   CDF       0.4   19055811022002
UK       GLA   ALICE     0.5   19055611022002
CPULoad (Stream Producer 3)
CH       CERN  ATLAS     1.6   19055611022002
CH       CERN  CDF       0.6   19055511022002
16 Contributions are Views
CPULoad (Producer 1)
UK       RAL   CDF       0.3   19055711022002
UK       RAL   ATLAS     1.6   19055611022002
SELECT * FROM cpuLoad WHERE country = 'UK' AND site = 'RAL'
CPULoad (Producer 2)
UK       GLA   CDF       0.4   19055811022002
UK       GLA   ALICE     0.5   19055611022002
SELECT * FROM cpuLoad WHERE country = 'UK' AND site = 'GLA'
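A small sketch of the idea that each contribution is a selection view: every producer holds only tuples that satisfy its registered view, and the virtual relation is the union of the contributions. The data structures and helper names (`producer_rows`, `views`, `satisfies`, `virtual_relation`) are assumptions made for this illustration, not part of R-GMA.

```python
COLUMNS = ("country", "site", "facility", "load", "timestamp")

# Local tuples of the two producers, copied from the slide.
producer_rows = {
    "Producer 1": [("UK", "RAL", "CDF", 0.3, "19055711022002"),
                   ("UK", "RAL", "ATLAS", 1.6, "19055611022002")],
    "Producer 2": [("UK", "GLA", "CDF", 0.4, "19055811022002"),
                   ("UK", "GLA", "ALICE", 0.5, "19055611022002")],
}

# Registered views, written as attribute = value conditions joined by AND.
views = {
    "Producer 1": {"country": "UK", "site": "RAL"},
    "Producer 2": {"country": "UK", "site": "GLA"},
}


def satisfies(row, view):
    """True if the row meets every attribute = value condition of the view."""
    record = dict(zip(COLUMNS, row))
    return all(record[attr] == value for attr, value in view.items())


def virtual_relation():
    """Union of all contributions, checking each tuple against its producer's view."""
    rows = []
    for name, local in producer_rows.items():
        assert all(satisfies(row, views[name]) for row in local)
        rows.extend(local)
    return rows


cpu_load = virtual_relation()
ral_rows = [row for row in cpu_load if satisfies(row, views["Producer 1"])]
```

Selecting from the virtual relation with Producer 1's view returns exactly Producer 1's tuples, which is what it means for the view to describe the contribution.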
17 Keys in the Global Schema
- Network throughput
- tp(src, dest, method, pcktSize, timestamp, time)
- Intuitively, tp has the primary key
- (src, dest, method, pcktSize, timestamp).
- We need to know the primary keys
- to understand the global schema
- to answer latest snapshot queries
- Primary keys are declared, but not enforced!
- Although, sometimes they hold globally if they hold locally!
18 Metaphor: Roles and Agents
- R-GMA Clients: Grid components or Grid applications
- Clients can play the roles of producers or consumers
- A client would need special capabilities for a role
- Clients are supported in their roles by agents
- Implementation
- APIs for client roles: new StreamProducer()
- Agents are objects on a Web server
19 Primary Producers
- Database producer
- supports queries over fixed set of tuples (static queries)
- can be used to publish a database
- Stream producer
- supports queries over changing set of tuples (continuous queries)
- supports latest snapshot queries
- offers up-to-date values for each primary key in a db
- Today: DatabaseProducers and StreamProducers in R-GMA are different from the above!
20 Communication Modes of Stream Producers
- Stream Producers may offer two communication modes for continuous queries
- lossless (but tuples could become stale)
- lossy (but tuples are fresh)
Today: R-GMA's StreamProducers are resilient and support lossless communication
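One way to picture the trade-off is with two buffers between a producer and a slow consumer. This is an interpretation, not R-GMA's implementation: the function `make_buffer` and the capacity of 3 are assumptions. The lossless buffer keeps every tuple, so the consumer may read old (stale) ones; the lossy buffer keeps only the newest tuples and silently drops the oldest.

```python
from collections import deque


def make_buffer(mode, capacity=3):
    """Return a buffer for the given communication mode (hypothetical helper)."""
    if mode == "lossless":
        return deque()                  # unbounded: nothing is ever dropped
    if mode == "lossy":
        return deque(maxlen=capacity)   # bounded: oldest tuples fall out
    raise ValueError(f"unknown mode: {mode}")


lossless = make_buffer("lossless")
lossy = make_buffer("lossy", capacity=3)

# The producer emits six tuples before the consumer reads anything.
for seq in range(1, 7):
    lossless.append(seq)
    lossy.append(seq)

print(list(lossless))  # [1, 2, 3, 4, 5, 6]
print(list(lossy))     # [4, 5, 6]
```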
21 Republishers Publish Query Answers
- Archiver: shows the history of a stream.
- Stream Republisher: enables
- merging,
- thinning,
- summarising of streams
22 Republishers in R-GMA Today
- Republishers are called archivers (although some of them don't archive anything)
- An archiver (= republisher)
- is defined by a query
- consumes only from stream producers
- publishes the query result according to its type, using
- a stream producer, or
- a latest snapshot producer, or
- a database producer (which keeps an archive)
- Republishers are used to answer complex queries!
23 The Next Step: Hierarchies of Stream Republishers
24 Republisher Hierarchies: The Issues
- Republishers are defined by queries; hierarchies have to be maintained automatically
- new stream producers must only be added to republishers at lowest level
- hierarchy has to be replanned if a republisher fails
- difficult: transition from one plan to the other without loss of tuples
- How well can we describe the content of a stream? Possibly need for descriptions that join
- stream relations: CPULoad(machineID, load, timestamp)
- static relations: locatedAt(machineID, site)
25 What is the Meaning of a Query in R-GMA?
- Assumption: the views of (primary) producers are selections on a single relation, i.e., queries of the form
- SELECT *
- FROM cpu_load
- WHERE machine_id = 'AB123' AND loc = 'hw'
(each producer contributes its parts of a relation)
- The virtual database contains the union of the data of all the primary producers
- Conceptually, a query is evaluated
- over the entire virtual db
26 Stream Queries can have Various Temporal Interpretations
- Consider a query over the relation Transport Time
- tt(src, dest, pcktSize, method, timestamp, time)
- SELECT * FROM tt
- WHERE src = 'ral' AND dest = 'bologna'
- What is meant? Measurements
- from now? (Continuous Query)
- up until now? (History Query)
- right now? (Latest Snapshot Query)
- Today: Queries can be flagged with their type
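The three readings of the same selection can be sketched as three functions. Everything here is illustrative: the sample measurements are invented, the function names are not R-GMA's, and the latest snapshot is assumed to mean "the most recent tuple for each combination of the key attributes other than timestamp".

```python
def matches(t):
    """The selection condition of the example query."""
    return t["src"] == "ral" and t["dest"] == "bologna"


def history(archive):
    """History query: all matching measurements up until now."""
    return [t for t in archive if matches(t)]


def latest_snapshot(archive):
    """Latest snapshot query: newest matching tuple per key (assumed: key minus timestamp)."""
    latest = {}
    for t in archive:
        if not matches(t):
            continue
        key = (t["src"], t["dest"], t["pcktSize"], t["method"])
        if key not in latest or t["timestamp"] > latest[key]["timestamp"]:
            latest[key] = t
    return list(latest.values())


def continuous(stream):
    """Continuous query: yield matching measurements as they arrive from now on."""
    for t in stream:
        if matches(t):
            yield t


def tt(src, dest, timestamp, time):
    """Build one invented tt tuple (packet size and method fixed for brevity)."""
    return {"src": src, "dest": dest, "pcktSize": 1, "method": "ping",
            "timestamp": timestamp, "time": time}


archive = [tt("ral", "bologna", 1, 0.4), tt("ral", "bologna", 2, 0.5), tt("hw", "ral", 2, 0.2)]
future = iter([tt("hw", "ral", 3, 0.2), tt("ral", "bologna", 3, 0.6)])

past = history(archive)
now = latest_snapshot(archive)
upcoming = list(continuous(future))
```

With this data, the history query returns two tuples, the latest snapshot query returns only the one with timestamp 2, and the continuous query returns the single matching tuple that arrives later.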
27 Advanced Queries: Mixing Temporal Query Types
- Which connections currently have a transportation time that is higher than last week's average? (latest snapshot and history)
- Show me the cpu load of those machines where it is lower than yesterday's load average! (continuous and history)
- We do not intend to support such queries by R-GMA!
28 In R-GMA, Query Answering Needs Mediation
- Suppose P1, P2 publish for tp (throughput)
- P1: WHERE src = 'hw'
- P2: WHERE src = 'ral' AND pcktSize > 20
- A global consumer poses its query over global relations
- SELECT * FROM tp WHERE pcktSize > 10
- A mediator translates this into queries over local relations
- SELECT * FROM P1.tp WHERE pcktSize > 10
- UNION
- SELECT * FROM P2.tp
- Today: R-GMA's mediator handles simple queries like the one above
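The translated query for P2 needs no WHERE clause because P2's view already guarantees pcktSize > 20, which implies pcktSize > 10. A minimal sketch of such a rewriting follows, assuming conditions are conjunctions of atoms (attribute, operator, value) with only `=` and `>`. The function names are invented for illustration and this is not R-GMA's mediator code; it also does not skip producers whose view contradicts the query.

```python
def atom_entails(view_atom, query_atom):
    """True if the view condition guarantees the query condition on the same attribute."""
    (v_attr, v_op, v_val), (q_attr, q_op, q_val) = view_atom, query_atom
    if v_attr != q_attr:
        return False
    if q_op == "=":
        return v_op == "=" and v_val == q_val
    if q_op == ">":
        # x = v implies x > q iff v > q;  x > v implies x > q iff v >= q
        return v_val > q_val if v_op == "=" else v_val >= q_val
    return False


def fmt(value):
    """Render a value as an SQL literal (strings quoted, numbers bare)."""
    return f"'{value}'" if isinstance(value, str) else str(value)


def local_query(producer, relation, view, query):
    """Query over one producer's local relation, dropping conditions the view guarantees."""
    remaining = [q for q in query if not any(atom_entails(v, q) for v in view)]
    sql = f"SELECT * FROM {producer}.{relation}"
    if remaining:
        sql += " WHERE " + " AND ".join(f"{a} {op} {fmt(v)}" for a, op, v in remaining)
    return sql


def mediate(relation, views, query):
    """Union of the local queries over all registered producers."""
    return "\nUNION\n".join(local_query(p, relation, v, query) for p, v in views.items())


views = {
    "P1": [("src", "=", "hw")],
    "P2": [("src", "=", "ral"), ("pcktSize", ">", 20)],
}
plan = mediate("tp", views, [("pcktSize", ">", 10)])
print(plan)
```

Run on the slide's example, the sketch prints the same three-line UNION query that the slide shows.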
29 Global and Local Consumers
- Global consumers pose queries over global relations
- SELECT * FROM tp WHERE pcktSize > 10,
- which are translated into queries over local relations
- SELECT * FROM P1.tp WHERE pcktSize > 10
- UNION
- SELECT * FROM P2.tp
- Local consumers pose queries over local relations directly
- SELECT * FROM P1.tp WHERE method = 'ping'
- Today: a consumer can be global or local, but local relations cannot be referred to explicitly
30 How does the Mediator Find Suitable Publishers?
- P1, P2, P3 publish for tt (Transport Time)
- P1: src = 'hw'
- P2: src = 'ral' AND pcktSize > 20
- P3: src = 'ral' AND method = 'ping'
- Q: SELECT * FROM tt WHERE src = 'ral' AND method = 'ping'
- We see: P1 is not suitable for Q, but P2 and P3 are. Why?
- src = 'hw' AND src = 'ral' AND method = 'ping' is never true
- src = 'ral' AND pcktSize > 20 AND ... is sometimes true
- Satisfiability Test!
- Today: implemented
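A satisfiability test for this restricted language can be sketched in a few lines. The sketch assumes that conditions are conjunctions of atoms (attribute, operator, value) with only `=` and `>`, and that values of an attribute are mutually comparable. The function `satisfiable` is an illustration, not R-GMA's code. A conjunction fails if an attribute is required to equal two different values, or to equal a value that does not exceed one of its lower bounds.

```python
def satisfiable(atoms):
    """True if some tuple can satisfy all atoms (only '=' and '>' are supported)."""
    equal = {}  # attribute -> value it must equal
    lower = {}  # attribute -> largest strict lower bound seen so far
    for attr, op, value in atoms:
        if op == "=":
            if attr in equal and equal[attr] != value:
                return False  # e.g. src = 'hw' AND src = 'ral'
            equal[attr] = value
        elif op == ">":
            lower[attr] = max(lower.get(attr, value), value)
        else:
            raise ValueError(f"unsupported operator: {op}")
    # An equality must lie strictly above every lower bound on the same attribute.
    return all(equal[a] > lower[a] for a in equal if a in lower)


P1 = [("src", "=", "hw")]
P2 = [("src", "=", "ral"), ("pcktSize", ">", 20)]
P3 = [("src", "=", "ral"), ("method", "=", "ping")]
Q = [("src", "=", "ral"), ("method", "=", "ping")]

# A publisher is suitable if its view and the query condition can hold together.
suitable = [name for name, view in (("P1", P1), ("P2", P2), ("P3", P3))
            if satisfiable(view + Q)]
print(suitable)  # ['P2', 'P3']
```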
31 So Which Publishers Should the Mediator Ask?
- P2: src = 'ral' AND pcktSize > 20
- P3: src = 'ral' AND method = 'ping'
- Q: SELECT * FROM tt WHERE src = 'ral' AND method = 'ping'
- All answers to Q returned by P2 are also returned by P3
- whenever
- src = 'ral' AND pcktSize > 20 AND src = 'ral' AND method = 'ping'
- is true, then
- src = 'ral' AND method = 'ping' AND src = 'ral' AND method = 'ping'
- is true.
- Hence, R-GMA only needs to ask P3
- Entailment Test!
- Needed for Republisher Hierarchies! (not yet implemented)
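The entailment check on this slide can be sketched under the same restrictions: conjunctions of atoms (attribute, operator, value) with only `=` and `>`, and a premise that is itself satisfiable. The function `entails` is an assumption made for illustration; the slide states that R-GMA does not yet implement such a test.

```python
def entails(premise, conclusion):
    """True if every atom of the conclusion is guaranteed by some atom of the premise."""
    def implied(atom):
        attr, op, value = atom
        for p_attr, p_op, p_value in premise:
            if p_attr != attr:
                continue
            if op == "=" and p_op == "=" and p_value == value:
                return True
            if op == ">" and p_op == "=" and p_value > value:
                return True   # x = p and p > value
            if op == ">" and p_op == ">" and p_value >= value:
                return True   # x > p and p >= value
        return False

    return all(implied(atom) for atom in conclusion)


V2 = [("src", "=", "ral"), ("pcktSize", ">", 20)]
V3 = [("src", "=", "ral"), ("method", "=", "ping")]
Q = [("src", "=", "ral"), ("method", "=", "ping")]

# Every answer to Q that P2 can return is also an answer that P3 can return ...
p2_covered_by_p3 = entails(V2 + Q, V3 + Q)
# ... but not the other way round: P3 may hold tuples with pcktSize <= 20.
p3_covered_by_p2 = entails(V3 + Q, V2 + Q)
print(p2_covered_by_p3, p3_covered_by_p2)  # True False
```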
32 But What Did the Producers Promise?
- P registers view V
- Does P promise
- some of V? (sound description)
- all of V? (sound and complete description)
- The Entailment Test only makes sense when the registered views are sound and complete descriptions
- Producers should register completeness flags
33 Why May a Producer not be Complete?
- The language of views is more restricted than the language of queries. Hence republishers may be unable to say exactly what they publish
- Archivers may archive in lossy mode
- Producers may lose tuples
- A producer may not know everything about the real world
Open to debate
34 Summary (1)
- Monitoring data come in Pools and Streams
- Global Schema
- primary keys
- Types of Stream Queries
- continuous vs. history vs. latest snapshot
- Producers
- DB producers publish database
- stream producers: lossless vs. lossy communication modes