Scoped and Approximate Queries in a Relational Grid Information Service


1
Scoped and Approximate Queries in a Relational
Grid Information Service
  • Dong Lu , Peter A. Dinda , Jason A. Skicewicz
  • Prescience Lab, Dept. of Computer Science
  • Northwestern University, Evanston, IL 60201

2
Outline
  • Introduction and motivation
  • Powerful queries, but expensive to execute
  • Trade off between result size and query time
  • Our solutions: scoped query, approximate query,
    scoped approximate query
  • Nondeterministic query (SC Talk on Tuesday)
  • Performance Evaluation

3
What is RGIS?
  • GIS: A Grid Information Service stores
    information about the resources and services in a
    distributed computing environment and answers
    queries about them.
  • RGIS: a Grid Information Service based on the
    relational data model.

4
Why RGIS?
  • RGIS can answer complex compositional queries
  • Relational algebra (SQL)
  • Joins
  • Difficult in a hierarchical model (directory
    service)
  • Other reasons
  • Indexes separate from data model
  • Schema evolution
  • Transactional insert/update/delete
  • Consistency

5
RGIS Model of a Grid
  • Annotated network topology graph
  • Annotation examples
  • Hosts: memory, disk, OS, NICs, etc.
  • Router/Switch: backplane bandwidth, ports
  • Link: latency and bandwidth
  • Highly dynamic data in streams, not DB
  • Virtualization, Futures, Leases
  • Virtual machines

[Figure: layered model of a grid. Layer labels: Software, Network, Data link, Physical. Entity labels: module, endpoint, router, iplink, host, maclink, macswitch, connectorswitch, connectorlink.]
6
The RGIS Design (Per Site)
7
Challenge/Trade off
  • Complex queries to a relational database can take
    a long time:
  • Hours, days, or even weeks when we want seconds.
  • Typically, the returned result set is unnecessarily
    big.
  • Get back all results
  • We need mechanisms to trade off query time
    against the size of the result set.

8
Challenge/Trade off
[Figure: labels All results, Approximate results, Nondeterministic results, Scoped results.]
9
Example: Cluster Finder
Find N hosts connected to the same router, with
total memory of at least N × 512 MB, all running Linux, and
with a cluster bisection bandwidth of no
less than 100 Mbits/sec.
10
Original SQL for 2 Host Cluster Finder
SELECT scoped-approx h1.distip, h2.distip
FROM hosts h1, hosts h2, iplinks l1, iplinks l2, routers r
WHERE h1.mem_mb + h2.mem_mb >= 1024
  AND h1.os = 'linux' AND h2.os = 'linux'
  AND ((l1.src = r.distip AND l2.src = r.distip
        AND l1.dest = h1.distip AND l2.dest = h2.distip)
    OR (l1.dest = r.distip AND l2.dest = r.distip
        AND l1.src = h1.distip AND l2.src = h2.distip))
  AND h1.distip <> h2.distip
  AND l1.bw_mbs >= 100 AND l2.bw_mbs >= 100
SCOPED BY r.distip = X
WITHIN 100 seconds
11
Original SQL for Cluster Finder
  • It is a (2N+1)-way join to look for an N-node
    cluster. Not scalable.

[Figure: routers, IP links, and hosts, with Cluster 1 and Cluster 2 marked.]
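
The (2N+1)-way join count can be checked by generating the query text for an arbitrary N. The sketch below is a hypothetical helper, not part of RGIS: `cluster_finder_sql` and its parameters are names assumed here, the table and column names follow the two-host query, and the pairwise-distinct host predicate for N > 2 is an assumption.

```python
def cluster_finder_sql(n, mem_per_host_mb=512, min_bw_mbs=100):
    """Build the exact, unscoped cluster-finder query for n hosts.

    Returns (sql_text, number_of_joined_tables). Hypothetical helper.
    """
    if n < 2:
        raise ValueError("a cluster needs at least two hosts")
    hosts = [f"h{i}" for i in range(1, n + 1)]
    links = [f"l{i}" for i in range(1, n + 1)]
    # n host aliases + n link aliases + 1 router alias = 2n + 1 tables
    tables = ([f"hosts {h}" for h in hosts]
              + [f"iplinks {l}" for l in links]
              + ["routers r"])

    total_mem = " + ".join(f"{h}.mem_mb" for h in hosts)
    conds = [f"{total_mem} >= {n * mem_per_host_mb}"]
    conds += [f"{h}.os = 'linux'" for h in hosts]

    pairs = list(zip(hosts, links))
    # Each host hangs off the same router, in either link direction.
    down = " AND ".join(
        f"{l}.src = r.distip AND {l}.dest = {h}.distip" for h, l in pairs)
    up = " AND ".join(
        f"{l}.dest = r.distip AND {l}.src = {h}.distip" for h, l in pairs)
    conds.append(f"(({down}) OR ({up}))")

    # Assumption: all hosts must be pairwise distinct.
    conds += [f"{a}.distip <> {b}.distip"
              for i, a in enumerate(hosts) for b in hosts[i + 1:]]
    conds += [f"{l}.bw_mbs >= {min_bw_mbs}" for l in links]

    select = ", ".join(f"{h}.distip" for h in hosts)
    sql = f"SELECT {select} FROM {', '.join(tables)} WHERE {' AND '.join(conds)}"
    return sql, len(tables)


if __name__ == "__main__":
    for n in (2, 4, 8):
        _, table_count = cluster_finder_sql(n)
        print(n, table_count)  # table_count equals 2 * n + 1
```

The number of joined tables grows linearly with N, and the number of candidate row combinations grows far faster, which is the scalability problem the slide points at.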
12
Scoped Cluster Finder
  • Query the hosts around a random router.

[Figure: routers, IP links, and hosts.]
13
Scoped Cluster Finder
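
A minimal sketch of scoping, under stated assumptions: it uses an in-memory SQLite database in place of Oracle, a toy grid invented for illustration, and an ordinary `r.distip = ?` predicate in place of the RGIS `SCOPED BY` extension. The helper names (`build_toy_grid`, `scoped_cluster_finder`) are hypothetical, and retrying on further routers is an assumption about how the scoping is driven.

```python
import random
import sqlite3

SCHEMA = """
CREATE TABLE routers (distip TEXT PRIMARY KEY);
CREATE TABLE hosts   (distip TEXT PRIMARY KEY, os TEXT, mem_mb INTEGER);
CREATE TABLE iplinks (src TEXT, dest TEXT, bw_mbs INTEGER);
"""

# Two-host cluster finder restricted to a single router.
TWO_HOST_SCOPED = """
SELECT h1.distip, h2.distip
FROM hosts h1, hosts h2, iplinks l1, iplinks l2, routers r
WHERE h1.mem_mb + h2.mem_mb >= 1024
  AND h1.os = 'linux' AND h2.os = 'linux'
  AND ((l1.src = r.distip AND l2.src = r.distip
        AND l1.dest = h1.distip AND l2.dest = h2.distip)
    OR (l1.dest = r.distip AND l2.dest = r.distip
        AND l1.src = h1.distip AND l2.src = h2.distip))
  AND h1.distip <> h2.distip
  AND l1.bw_mbs >= 100 AND l2.bw_mbs >= 100
  AND r.distip = ?
"""


def build_toy_grid():
    """Create a tiny made-up grid: two routers with two hosts each."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    hosts = [
        ("10.0.1.254", "10.0.1.1", "linux", 768),
        ("10.0.1.254", "10.0.1.2", "linux", 256),
        ("10.0.2.254", "10.0.2.1", "linux", 512),
        ("10.0.2.254", "10.0.2.2", "windows", 2048),
    ]
    for router in ("10.0.1.254", "10.0.2.254"):
        conn.execute("INSERT INTO routers VALUES (?)", (router,))
    for router, host, os_name, mem in hosts:
        conn.execute("INSERT INTO hosts VALUES (?, ?, ?)", (host, os_name, mem))
        conn.execute("INSERT INTO iplinks VALUES (?, ?, 100)", (router, host))
    return conn


def scoped_cluster_finder(conn, rng):
    """Try routers in random order; return the first one that yields results."""
    routers = [row[0] for row in
               conn.execute("SELECT distip FROM routers ORDER BY distip")]
    rng.shuffle(routers)
    for router in routers:
        rows = conn.execute(TWO_HOST_SCOPED, (router,)).fetchall()
        if rows:
            return router, rows
    return None, []


if __name__ == "__main__":
    print(scoped_cluster_finder(build_toy_grid(), random.Random()))
```

Each scoped query still performs the same five-table join, but only over the rows reachable from one router, so the result is a subset of what the unscoped query would return.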
14
Approximate Cluster Finder
  • When searching for N hosts with total memory of at least
    N × 512 MB, we can approximate the query with a search
    for N hosts that each have at least 512 MB.
  • This reduces the number of joins or avoids them entirely.
  • However, this won't find, say, N/2 hosts with 256
    MB and N/2 hosts with 768 MB.
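
Written out, with m_i denoting the memory of host i in MB (a symbol introduced here for illustration), the per-host condition is sufficient for the total-memory condition but not necessary. The same argument holds if the per-host bound is strict.

```latex
\[
  \bigl(\forall i \in \{1,\dots,N\}:\; m_i \ge 512\bigr)
  \;\Longrightarrow\;
  \sum_{i=1}^{N} m_i \;\ge\; 512\,N .
\]
The converse fails. For $N = 2$ with $m_1 = 256$ and $m_2 = 768$,
\[
  m_1 + m_2 = 1024 \;\ge\; 2 \cdot 512,
  \qquad\text{but}\qquad m_1 = 256 < 512 .
\]
```

So every answer to the approximate query is also an answer to the original query, while some answers to the original query are missed.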

15
Approximate Cluster Finder
SELECT R.DISTIP, H1.DISTIP
FROM HOSTS H1, IPLINKS L1, ROUTERS R
WHERE H1.MEM_MB >= 512 AND H1.OS = 'LINUX'
  AND L1.BW_MBS >= 100
  AND ((L1.SRC = R.DISTIP AND L1.DEST = H1.DISTIP)
    OR (L1.DEST = R.DISTIP AND L1.SRC = H1.DISTIP))
  AND R.DISTIP IN
    (SELECT R.DISTIP
     FROM HOSTS H1, IPLINKS L1, ROUTERS R
     WHERE H1.MEM_MB >= 512 AND H1.OS = 'LINUX'
       AND L1.BW_MBS >= 100
       AND ((L1.SRC = R.DISTIP AND L1.DEST = H1.DISTIP)
         OR (L1.DEST = R.DISTIP AND L1.SRC = H1.DISTIP))
     GROUP BY R.DISTIP
     HAVING COUNT(*) >= 2)
ORDER BY R.DISTIP
16
Scoped Approximate Cluster Finder
  • Combine approximate query with scoped query.
  • Scoped to one randomly chosen router at a time;
    if no results are found, choose another random router
    and repeat the query.
  • Approximate the N-host join for N × 512 MB of total memory with
    a search for N hosts that each have at least 512 MB.
  • Always a three-way join,
    regardless of the size of the cluster being
    searched for. Thus very scalable.
  • May need to search multiple routers.

17
Scoped Approximate Cluster Finder
The scoped approximate cluster finder has a fixed
number of joins.
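
The combination can be sketched as a loop around a single three-way join. This is an illustration under assumptions, not the RGIS implementation: SQLite stands in for Oracle, the toy data is invented, the scope is expressed as a plain `r.distip = ?` predicate, and `build_toy_grid` and `scoped_approx_cluster` are hypothetical names.

```python
import random
import sqlite3

# Three-way join (hosts, iplinks, routers) regardless of cluster size n;
# n only appears as the LIMIT.
SCOPED_APPROX = """
SELECT DISTINCT h.distip
FROM hosts h, iplinks l, routers r
WHERE h.mem_mb >= 512 AND h.os = 'linux' AND l.bw_mbs >= 100
  AND ((l.src = r.distip AND l.dest = h.distip)
    OR (l.dest = r.distip AND l.src = h.distip))
  AND r.distip = ?
ORDER BY h.distip
LIMIT ?
"""


def build_toy_grid():
    """Made-up grid: one router with 2 qualifying hosts, one with 4."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE routers (distip TEXT PRIMARY KEY);
        CREATE TABLE hosts   (distip TEXT PRIMARY KEY, os TEXT, mem_mb INTEGER);
        CREATE TABLE iplinks (src TEXT, dest TEXT, bw_mbs INTEGER);
    """)
    for router, count, mem in [("10.0.1.254", 2, 1024), ("10.0.2.254", 4, 512)]:
        conn.execute("INSERT INTO routers VALUES (?)", (router,))
        prefix = router.rsplit(".", 1)[0]
        for i in range(1, count + 1):
            host = f"{prefix}.{i}"
            conn.execute("INSERT INTO hosts VALUES (?, 'linux', ?)", (host, mem))
            conn.execute("INSERT INTO iplinks VALUES (?, ?, 100)", (router, host))
    return conn


def scoped_approx_cluster(conn, n, rng):
    """Return (router, hosts, routers_tried) for the first router with n hosts."""
    routers = [row[0] for row in
               conn.execute("SELECT distip FROM routers ORDER BY distip")]
    rng.shuffle(routers)
    tried = 0
    for router in routers:
        tried += 1
        hosts = [row[0] for row in conn.execute(SCOPED_APPROX, (router, n))]
        if len(hosts) == n:
            return router, hosts, tried
    return None, [], tried


if __name__ == "__main__":
    print(scoped_approx_cluster(build_toy_grid(), 3, random.Random()))
```

How many routers must be tried depends on the random order and on how common suitable routers are, which matches the caveat that multiple routers may need to be searched.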
18
Time bounded queries
  • The query rewriter starts the query as a
    child process.
  • The parent kills the child process if no results
    are returned within the deadline.
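
The parent/child deadline pattern can be sketched with Python's standard `subprocess` module. This is only an illustration of the idea; the slides do not say how the RGIS query rewriter implements it, and `run_with_deadline` is a hypothetical name. The child commands below are stand-ins for a real query client.

```python
import subprocess
import sys


def run_with_deadline(argv, deadline_s):
    """Run argv as a child process.

    Returns the child's stdout, or None if it did not finish in time.
    """
    try:
        done = subprocess.run(argv, capture_output=True, text=True,
                              timeout=deadline_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before re-raising,
        # so no stray query process is left behind.
        return None
    return done.stdout


if __name__ == "__main__":
    # Stand-in for a query that answers quickly.
    fast = [sys.executable, "-c", "print('10.0.1.1 10.0.1.2')"]
    # Stand-in for a query that would run far past the deadline.
    slow = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_deadline(fast, 30))
    print(run_with_deadline(slow, 1))
```

Returning `None` on timeout mirrors the behaviour on the slide: a query that misses its deadline reports no results, even if it would eventually have found some.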

19
Limitations of Scoped and Approximate queries
  • The returned results are a subset of those of the original
    query, and it is possible to report no results
    when the original query would return results
    after running for a long time.
  • Not all queries can be written as scoped or
    approximate queries.
  • It is hard to automate scoped and approximate
    query rewriting.

20
Performance Evaluation
  • Need to populate the database with a large amount
    of data.
  • Computational grids are still in early stages.
  • No large data sets are available.
  • Use Smith MDS data for memory
  • We generate synthetic grids that are
    representative of the Internet.
  • Can generate very large grids

21
GridG Generated Synthetic Grids
  • Three-level network: WAN, MAN, LAN. Nodes on the WAN and
    MAN are routers, while nodes on the LAN are hosts.
  • Links: IP links annotated with bandwidth and
    latency.
  • Hosts: annotated with memory size, architecture,
    number of processors, CPU clock rate, disk size,
    etc.
  • The user can control all the distributions and the
    size of the network.

22
GridG: Synthesizing Realistic Computational Grids
SC talk on Tuesday!
http://www.cs.northwestern.edu/urgis/GridG
23
Experimental Setup
  • Dell PowerEdge 4400: dual Xeon 1 GHz processors,
    2 GB memory, 240 GB RAID 5 storage system.
  • Oracle 9i Enterprise Edition, Red Hat Linux 7.1.
  • Each test is repeated either 25 or 100 times, and
    we report the average value.

24
Performance of Various Query Techniques with
Cluster Finder

Cluster size   Standard   Scoped    Approx   Scoped Approx
2              21.44      2.27      7.62     1.16
4              >7200      2047.9    7.48     1.32
8              >9000      >3600     7.46     1.43
16             N/A        >3600     7.51     1.45
32             N/A        >3600     7.65     5.96
64             N/A        >3600     >120     9.58

(Time to run query, in seconds)
25
Performance of Scoped Approximate Queries
  • Cluster Finder: Find N hosts, each running
    Linux, with total memory of at least N × 512 MB, all
    connected to the same router, with a bisection bandwidth
    of at least 100 Mbits/sec.
  • Our running example
  • Non-network query: Find N hosts with total
    memory of at least N × 512 MB.
  • No joins needed at all

26
Performance of Scoped Approximate Queries (2)
  • Scalability with database size.
  • Scalability with the complexity of queries.
  • Scalability with concurrent users and update load.

27
Performance of Scoped Approximate Query (9.8K
hosts, Cluster Finder)
28
Performance of Scoped Approximate Query (101K
hosts, Cluster Finder)
29
Performance of Scoped Approximate Query (980K
hosts, Cluster Finder)
30
Performance of Scoped Approximate Query (9.8K
hosts, Non-network query)
31
Performance of Scoped Approximate Query (101K
hosts, Non-network query)
32
Performance of Scoped Approximate Query (980K
hosts, Non-network query)
33
Scalability with multiple concurrent users and
background load
  • Other research has shown that GIS servers
    handle frequent updates while serving
    requests.
  • GIS servers serve multiple concurrent users.
  • Evaluate scoped approximate queries with
    concurrent users and update load.
  • Concurrent users execute queries repeatedly
  • The update load executes transactional updates on
    randomly selected hosts as fast as possible.
  • About 200 updates/second

34
Performance of Scoped Approximate Query (9.8K
hosts, Cluster Finder, with Concurrent Users,
looking for 64 nodes)
35
Performance of Scoped Approximate Query (9.8K
hosts, Non-network query, with Concurrent Users,
looking for 64 nodes)
36
Conclusions
  • Described and evaluated two query techniques to
    trade off query time against the size of the result set:
    scoped and approximate queries.
  • Combining scoped and approximate queries can
    dramatically reduce response time and server load.

37
For more information
  • GridG and related paper: http://www.cs.northwestern.edu/urgis/GridG
  • Synthesizing Realistic Computational Grids,
    in Proceedings of SC03.
  • RGIS and related paper: http://www.cs.northwestern.edu/urgis/
  • Nondeterministic Queries in a Relational Grid
    Information Service, in Proceedings of SC03.