Scoped and Approximate Queries in a Relational Grid Information Service


1
Scoped and Approximate Queries in a Relational
Grid Information Service

Dong Lu, Peter A. Dinda, Jason A. Skicewicz
Prescience Lab, Dept. of Computer Science
Northwestern University, Evanston, IL 60201

2
Outline

Introduction and motivation
Powerful queries, but expensive to execute
Trade-off between result size and query time
Our solutions: Scoped query, Approximate query, Scoped Approximate query
Nondeterministic query (SC talk on Tuesday)
Performance evaluation

3
What is RGIS?

GIS: A Grid Information Service stores information about the resources and services in a distributed computing environment and answers queries about them.
RGIS: A Grid Information Service based on the relational data model.

4
Why RGIS?

RGIS can answer complex compositional queries
Relational algebra (SQL)
Joins
Difficult in a hierarchical model (directory service)
Other reasons:
Indexes separate from the data model
Schema evolution
Transactional insert/update/delete
Consistency

5
RGIS Model of a Grid

Annotated network topology graph
Annotation examples:
Hosts: memory, disk, OS, NICs, etc.
Routers/switches: backplane bandwidth, ports
Links: latency and bandwidth
Highly dynamic data lives in streams, not in the DB
Virtualization, futures, leases
Virtual machines

[Figure: layered grid model with Software, Network, Data link and Physical layers; entities shown: endpoint, router, iplink, host, maclink, macswitch, connectorswitch, connectorlink]
6
The RGIS Design (Per Site)
7
Challenge/Trade-off

Complex queries to a relational database can take a long time:
hours, days, or even weeks, when we want answers in seconds.
Typically, the returned result set is unnecessarily large:
we get back all results.
We need mechanisms to trade off query time against the size of the result set.

8
Challenge/Trade-off
All results
Approximate results
Nondeterministic results
Scoped results
9
Example: Cluster Finder
Find N hosts connected to the same router, with total memory of at least N×512 MB, all running Linux, and with the bisection bandwidth of the cluster no less than 100 Mbit/s.
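To make the Cluster Finder criteria concrete, here is a minimal sketch that checks whether a candidate set of hosts satisfies them. The `Host` record, its field names, and the `is_cluster` helper are hypothetical; in addition, the bisection-bandwidth bound is approximated by requiring each host-to-router link to carry at least 100 Mbit/s, which mirrors the per-link bandwidth predicates in the SQL but is our simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """Hypothetical host record; field names loosely follow the RGIS columns."""
    distip: str
    router: str           # router the host is attached to
    os: str
    mem_mb: int
    link_bw_mbit: float   # bandwidth of the host-to-router link, Mbit/s


def is_cluster(hosts: list[Host], n: int) -> bool:
    """Check the Cluster Finder criteria for a candidate set of n hosts."""
    if len(hosts) != n or len({h.distip for h in hosts}) != n:
        return False  # need n distinct hosts
    if len({h.router for h in hosts}) != 1:
        return False  # all connected to the same router
    if any(h.os.lower() != "linux" for h in hosts):
        return False  # all running Linux
    if sum(h.mem_mb for h in hosts) < n * 512:
        return False  # total memory of at least n*512 MB
    # Assumption: stand in for the bisection-bandwidth bound by requiring
    # every host-to-router link to carry at least 100 Mbit/s.
    return all(h.link_bw_mbit >= 100 for h in hosts)


a = Host("10.0.0.1", "r1", "linux", 256, 1000.0)
b = Host("10.0.0.2", "r1", "linux", 768, 1000.0)
print(is_cluster([a, b], 2))  # True: 256 + 768 = 1024 = 2 * 512
```

Note that the check is cheap for one candidate; the expense in the database comes from enumerating candidates, which is what the joins on the following slides do.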
10
Original SQL for 2-Host Cluster Finder

    -- scoped-approx, SCOPED BY and WITHIN are extensions, not standard SQL
    SELECT scoped-approx h1.distip, h2.distip
    FROM hosts h1, hosts h2, iplinks l1, iplinks l2, routers r
    WHERE h1.mem_mb + h2.mem_mb > 1024
      AND h1.os = 'linux' AND h2.os = 'linux'
      AND ((l1.src = r.distip AND l2.src = r.distip
            AND l1.dest = h1.distip AND l2.dest = h2.distip)
        OR (l1.dest = r.distip AND l2.dest = r.distip
            AND l1.src = h1.distip AND l2.src = h2.distip))
      AND h1.distip <> h2.distip
      AND l1.bw_mbs > 100 AND l2.bw_mbs > 100
    SCOPED BY r.distip = X WITHIN 100 seconds
11
Original SQL for Cluster Finder

It is a (2N+1)-way join to look for an N-node cluster. Not scalable.
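Counting relations makes the scaling problem concrete. The sketch below (both function names are hypothetical) counts the relations joined by the standard query, N `hosts` aliases, N `iplinks` aliases and one `routers` table, and compares that with the three-table join used by the scoped approximate variant.

```python
def standard_join_width(n: int) -> int:
    """Relations joined by the standard Cluster Finder for an n-node cluster:
    n aliases of hosts, n aliases of iplinks, and one routers table."""
    return 2 * n + 1


def scoped_approx_join_width(n: int) -> int:
    """The scoped approximate finder joins hosts, iplinks and routers once
    each, independent of the cluster size n."""
    return 3


for n in (2, 4, 8, 16, 32, 64):
    print(f"N={n:2d}: standard {standard_join_width(n):3d}-way, "
          f"scoped approximate {scoped_approx_join_width(n)}-way")
```

For N = 2 this gives the 5-way join of the query above; for N = 64 it is already a 129-way join.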

[Figure: routers, IP links and hosts, with two candidate clusters (Cluster 1, Cluster 2) highlighted]
12
Scoped Cluster Finder
Query the hosts around a random router.

[Figure: routers, IP links and hosts]
13
Scoped Cluster Finder
14
Approximate Cluster Finder

When searching for N hosts with total memory of at least N×512 MB, we can approximate the query with a search for N hosts that each have more than 512 MB of memory.
This reduces the number of joins, or avoids them entirely.
However, this won't find, say, N/2 hosts with 256 MB and N/2 hosts with 768 MB, even though their total is exactly N×512 MB.
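The approximation replaces one constraint on the sum with a stronger per-host constraint, so every approximate answer is also a valid exact answer, but not vice versa. A minimal sketch on a plain Python list of memory sizes (hypothetical data, not the database) shows the missed case from the slide:

```python
from itertools import combinations
from typing import Optional


def exact_clusters(mem_mb: list[int], n: int) -> list[tuple[int, ...]]:
    """All n-host groups (as index tuples) with total memory >= n*512 MB.
    Enumerating combinations mirrors the cost of the n-way self-join."""
    return [c for c in combinations(range(len(mem_mb)), n)
            if sum(mem_mb[i] for i in c) >= n * 512]


def approx_cluster(mem_mb: list[int], n: int) -> Optional[tuple[int, ...]]:
    """First n hosts that each have more than 512 MB: one scan, no join."""
    picked = tuple(i for i, m in enumerate(mem_mb) if m > 512)[:n]
    return picked if len(picked) == n else None


mem = [256, 256, 768, 768]      # N/2 hosts at 256 MB, N/2 at 768 MB (N = 4)
print(exact_clusters(mem, 4))   # [(0, 1, 2, 3)]: 2048 MB = 4 * 512 MB
print(approx_cluster(mem, 4))   # None: only two hosts exceed 512 MB
```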

15
Approximate Cluster Finder
    SELECT R.DISTIP, H1.DISTIP
    FROM HOSTS H1, IPLINKS L1, ROUTERS R
    WHERE H1.MEM_MB > 512 AND H1.OS = 'LINUX'
      AND L1.BW_MBS > 100
      AND ((L1.SRC = R.DISTIP AND L1.DEST = H1.DISTIP)
        OR (L1.DEST = R.DISTIP AND L1.SRC = H1.DISTIP))
      AND R.DISTIP IN (
        SELECT R.DISTIP
        FROM HOSTS H1, IPLINKS L1, ROUTERS R
        WHERE H1.MEM_MB > 512 AND H1.OS = 'LINUX'
          AND L1.BW_MBS > 100
          AND ((L1.SRC = R.DISTIP AND L1.DEST = H1.DISTIP)
            OR (L1.DEST = R.DISTIP AND L1.SRC = H1.DISTIP))
        GROUP BY R.DISTIP
        HAVING COUNT(*) > 2)
    ORDER BY R.DISTIP
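The approximate query can be tried outside Oracle. The sketch below loads a toy schema into Python's built-in `sqlite3` module. The table layouts (`routers(distip)`, `hosts(distip, mem_mb, os)`, `iplinks(src, dest, bw_mbs)`), the helper names and the sample rows are assumptions inferred from the column names in the query, not the actual RGIS schema. Two deliberate changes: the `HAVING` threshold is a bound parameter `n` (read as "at least n qualifying hosts", whereas the slide hard-codes `COUNT(*) > 2`), and `ORDER BY` also sorts by host so the output is deterministic.

```python
import sqlite3

# Hypothetical toy schema; column names follow the slide's query, types are guesses.
SCHEMA = """
CREATE TABLE routers (distip TEXT PRIMARY KEY);
CREATE TABLE hosts   (distip TEXT PRIMARY KEY, mem_mb INTEGER, os TEXT);
CREATE TABLE iplinks (src TEXT, dest TEXT, bw_mbs REAL);
"""

# Approximate Cluster Finder: each host must individually exceed 512 MB, so the
# query joins hosts, iplinks and routers once each. The subquery keeps only
# routers with at least n qualifying hosts (n is the bound parameter).
QUERY = """
SELECT r.distip, h1.distip
FROM hosts h1, iplinks l1, routers r
WHERE h1.mem_mb > 512 AND h1.os = 'LINUX' AND l1.bw_mbs > 100
  AND ((l1.src = r.distip AND l1.dest = h1.distip)
    OR (l1.dest = r.distip AND l1.src = h1.distip))
  AND r.distip IN (
    SELECT r.distip
    FROM hosts h1, iplinks l1, routers r
    WHERE h1.mem_mb > 512 AND h1.os = 'LINUX' AND l1.bw_mbs > 100
      AND ((l1.src = r.distip AND l1.dest = h1.distip)
        OR (l1.dest = r.distip AND l1.src = h1.distip))
    GROUP BY r.distip
    HAVING COUNT(*) >= ?)
ORDER BY r.distip, h1.distip
"""


def approximate_cluster_finder(conn: sqlite3.Connection, n: int) -> list[tuple[str, str]]:
    """Return (router, host) pairs for routers with at least n qualifying hosts."""
    return conn.execute(QUERY, (n,)).fetchall()


def demo_db() -> sqlite3.Connection:
    """Build an in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO routers VALUES (?)", [("r1",), ("r2",)])
    conn.executemany("INSERT INTO hosts VALUES (?, ?, ?)", [
        ("h1", 1024, "LINUX"), ("h2", 768, "LINUX"), ("h3", 2048, "LINUX"),
        ("h4", 256, "LINUX"),   # fails the per-host 512 MB approximation
        ("h5", 1024, "LINUX"),
    ])
    conn.executemany("INSERT INTO iplinks VALUES (?, ?, ?)", [
        ("r1", "h1", 1000), ("h2", "r1", 1000), ("r1", "h3", 1000),
        ("r2", "h4", 1000), ("r2", "h5", 1000),
    ])
    return conn


print(approximate_cluster_finder(demo_db(), 3))
# [('r1', 'h1'), ('r1', 'h2'), ('r1', 'h3')]
```

The link from `h2` is stored in the reverse direction (`h2` to `r1`), which exercises the second branch of the `OR`; router `r2` is filtered out by the subquery because only one of its hosts qualifies.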
16
Scoped Approximate Cluster Finder

Combine approximate query with scoped query.
Scoped to one randomly chosen router at a time; if no results are found, choose another random router and repeat the query.
Approximate the N-host join for N×512 MB of total memory with a search for N hosts, each with more than 512 MB.
Always a three-way join, regardless of the size of the cluster being searched for, and thus very scalable.
May need to search multiple routers.
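The retry loop described above can be sketched with Python's built-in `sqlite3`. The toy schema, sample rows and the `scoped_approx_finder` helper are hypothetical, and visiting routers via `random.sample` (a random order without repeats) is our choice. Binding the router in the `WHERE` clause is what scopes each attempt: it is still a three-way join of `hosts`, `iplinks` and `routers`, but only links touching that router can contribute rows, and `LIMIT` stops once n hosts are found.

```python
import random
import sqlite3
from typing import Optional

# Hypothetical toy schema (column names follow the slide queries).
SCHEMA = """
CREATE TABLE routers (distip TEXT PRIMARY KEY);
CREATE TABLE hosts   (distip TEXT PRIMARY KEY, mem_mb INTEGER, os TEXT);
CREATE TABLE iplinks (src TEXT, dest TEXT, bw_mbs REAL);
"""

# Approximate query scoped to one router: a three-way join restricted by r.distip = ?.
SCOPED_QUERY = """
SELECT h.distip
FROM hosts h, iplinks l, routers r
WHERE r.distip = ?
  AND h.mem_mb > 512 AND h.os = 'LINUX' AND l.bw_mbs > 100
  AND ((l.src = r.distip AND l.dest = h.distip)
    OR (l.dest = r.distip AND l.src = h.distip))
ORDER BY h.distip
LIMIT ?
"""


def scoped_approx_finder(conn: sqlite3.Connection, n: int,
                         rng: random.Random) -> Optional[tuple[str, list[str]]]:
    """Try routers in random order; return the first one with n qualifying hosts."""
    routers = [row[0] for row in conn.execute("SELECT distip FROM routers")]
    for router in rng.sample(routers, len(routers)):
        hosts = [row[0] for row in conn.execute(SCOPED_QUERY, (router, n))]
        if len(hosts) == n:
            return router, hosts
    return None  # no single router has n qualifying hosts


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO routers VALUES (?)", [("r1",), ("r2",)])
conn.executemany("INSERT INTO hosts VALUES (?, ?, ?)",
                 [("h1", 1024, "LINUX"), ("h2", 768, "LINUX"), ("h3", 1024, "LINUX")])
conn.executemany("INSERT INTO iplinks VALUES (?, ?, ?)",
                 [("r1", "h1", 1000), ("r1", "h2", 1000), ("r2", "h3", 1000)])
print(scoped_approx_finder(conn, 2, random.Random(0)))  # ('r1', ['h1', 'h2'])
```

Returning `None` when every router has been tried corresponds to the limitation noted later in the talk: the scoped approximate query can report no results even when the original query would eventually find some.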

17
Scoped Approximate Cluster Finder
The scoped approximate cluster finder has a fixed
number of joins.
18
Time bounded queries

The query rewriter starts the query as a child process.
The parent kills the child process if no results are returned within the deadline.
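A minimal sketch of the deadline mechanism in Python, assuming the query can be launched as a separate command. `subprocess.run` with `timeout=` starts the child and waits up to the deadline; on expiry it kills the child, waits for it, and raises `TimeoutExpired`. `fake_query` is a hypothetical stand-in (a Python child that sleeps, then prints host names); the actual RGIS query rewriter and its Oracle client are not modeled.

```python
import subprocess
import sys
from typing import Optional


def run_with_deadline(argv: list[str], deadline_s: float) -> Optional[str]:
    """Run a query command as a child process and return its output, or None
    if it produces no result before the deadline (the child is then killed)."""
    try:
        done = subprocess.run(argv, capture_output=True, text=True,
                              timeout=deadline_s, check=True)
    except subprocess.TimeoutExpired:
        return None  # subprocess.run has already killed and reaped the child
    return done.stdout.strip()


def fake_query(sleep_s: float) -> list[str]:
    """Hypothetical stand-in for a query: sleeps, then prints two host names."""
    code = f"import time; time.sleep({sleep_s}); print('h1 h2')"
    return [sys.executable, "-c", code]


print(run_with_deadline(fake_query(0.1), deadline_s=10.0))   # h1 h2
print(run_with_deadline(fake_query(30.0), deadline_s=0.5))   # None
```

Using a separate process rather than a thread matters here: a blocked thread cannot be forcibly stopped in Python, while a child process can be killed cleanly by the parent.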

19
Limitations of Scoped and Approximate queries

The returned results are a subset of the original query's results, and it is possible to report no results even though the original query could return results after running for a long time.
Not all queries can be written as Scoped or Approximate queries.
It is hard to automate Scoped and Approximate query rewriting.

20
Performance Evaluation

Need to populate the database with a large amount of data.
Computational grids are still in early stages.
No large data sets available.
Use Smith MDS data for memory
We generate synthetic grids that are
representative of the Internet.
Can generate very large grids

21
GridG Generated Synthetic Grids

Three-level network: WAN, MAN, LAN. Nodes on the WAN and MAN are routers, while nodes on the LAN are hosts.
Links: IP links annotated with bandwidth and latency.
Hosts: annotated with memory size, architecture, number of processors, CPU clock rate, disk size, etc.
The user can control all the distributions and the size of the network.

22
GridG: Synthesizing Realistic Computational Grids
SC talk on Tuesday!
http://www.cs.northwestern.edu/urgis/GridG
23
Experimental Setup

Dell PowerEdge 4400: dual 1 GHz Xeon processors, 2 GB memory, 240 GB RAID 5 storage system.
Oracle 9i Enterprise Edition, Red Hat Linux 7.1.
Each test is repeated either 25 or 100 times, and we report the average value.

24
Performance of Various Query Techniques with Cluster Finder

Cluster size   Standard   Scoped    Approx   Scoped Approx
2              21.44      2.27      7.62     1.16
4              >7200      2047.9    7.48     1.32
8              >9000      >3600     7.46     1.43
16             N/A        >3600     7.51     1.45
32             N/A        >3600     7.65     5.96
64             N/A        >3600     >120     9.58

(Time to run the query, in seconds; ">" marks a lower bound.)
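Where both times are measured exactly (cluster size 2), the table supports a direct speedup ratio; where the standard time is a ">" entry, the ratio is only a lower bound. A quick check of those ratios, using numbers copied from the table (the dictionary and function names are ours):

```python
# Query times in seconds, copied from the table. Entries written ">T" are
# stored as T and treated as lower bounds; N/A entries are omitted.
STANDARD = {2: 21.44, 4: 7200.0, 8: 9000.0}
SCOPED_APPROX = {2: 1.16, 4: 1.32, 8: 1.43, 16: 1.45, 32: 5.96, 64: 9.58}
LOWER_BOUND = {4, 8}  # cluster sizes whose standard time is a ">" entry


def speedups() -> dict[int, float]:
    """Ratio of standard to scoped approximate time for each cluster size."""
    return {n: STANDARD[n] / SCOPED_APPROX[n] for n in STANDARD}


for n, ratio in speedups().items():
    qualifier = "at least " if n in LOWER_BOUND else ""
    print(f"N={n}: scoped approximate is {qualifier}{ratio:.0f}x faster")
```

This prints a speedup of about 18x for N = 2, and lower bounds of about 5455x for N = 4 and 6294x for N = 8.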
25
Performance of Scoped Approximate Queries

Cluster Finder: Find N hosts, each running Linux, with total memory of at least N×512 MB, all connected to the same router, with a bisection bandwidth of at least 100 Mbit/s.
Our running example
Non-network query: Find N hosts with total memory of at least N×512 MB.
No joins needed at all

26
Performance of Scoped Approximate Queries (2)

Scalability with database size.
Scalability with the complexity of queries.
Scalability with concurrent users and update load.

27
Performance of Scoped Approximate Query (9.8K hosts, Cluster Finder)
28
Performance of Scoped Approximate Query (101K hosts, Cluster Finder)
29
Performance of Scoped Approximate Query (980K hosts, Cluster Finder)
30
Performance of Scoped Approximate Query (9.8K hosts, Non-network query)
31
Performance of Scoped Approximate Query (101K hosts, Non-network query)
32
Performance of Scoped Approximate Query (980K hosts, Non-network query)
33
Scalability with multiple concurrent users and background load

Other research has shown that GIS servers undergo frequent updates while serving requests.
GIS servers serve multiple concurrent users.
Evaluate scoped approximate queries with concurrent users and update load.
Concurrent users execute queries repeatedly.
The update load executes transactional updates on randomly selected hosts as fast as possible.
About 200 updates/second

34
Performance of Scoped Approximate Query (9.8K hosts, Cluster Finder, with concurrent users, looking for 64 nodes)
35
Performance of Scoped Approximate Query (9.8K hosts, Non-network query, with concurrent users, looking for 64 nodes)
36
Conclusions

Described and evaluated two query techniques that trade off query time against the size of the result set: Scoped and Approximate queries.
Combining Scoped and Approximate queries can dramatically reduce response time and server load.

37
For more information

GridG and related paper: http://www.cs.northwestern.edu/urgis/GridG
Synthesizing Realistic Computational Grids, in Proceedings of SC03.
RGIS and related paper: http://www.cs.northwestern.edu/urgis/
Nondeterministic Queries in a Relational Grid Information Service, in Proceedings of SC03.
