1
Nondeterministic Queries in a Relational Grid
Information Service
  • Peter A. Dinda
  • Dong Lu
  • Prescience Lab
  • Department of Computer Science
  • Northwestern University
  • http://plab.cs.northwestern.edu

2
Overview
  • RGIS: a Grid Information Service (GIS) based on
    the relational data model, using SQL
  • Complex compositional queries can be posed
    • "Find me 16 hosts on the same LAN that together
      have 32 GB of RAM"
  • Such queries can be very expensive to answer
    • Joins: worst case O(n^m) for m tables of size n
  • Introduce nondeterminism
    • User gets a random sample of the result set
    • Automated query transformation
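
To put the join cost in numbers, here is a small sketch (not from the talk) that counts the row combinations an m-way join over n hosts may have to consider, and the expected count when each input table is first sampled with probability p. The function names and the value p = 0.01 are illustrative assumptions.

```python
def candidate_tuples(n: int, m: int) -> int:
    """Worst-case number of row combinations for an m-way join over tables of n rows."""
    return n ** m


def sampled_candidate_tuples(n: int, m: int, p: float) -> float:
    """Expected number of combinations when each input table is independently
    sampled with probability p before the join."""
    return (p * n) ** m


if __name__ == "__main__":
    # Grid sizes are taken from the test grids later in the talk; p is assumed.
    for n in (50_000, 500_000, 5_000_000):
        full = candidate_tuples(n, 2)
        sampled = sampled_candidate_tuples(n, 2, 0.01)
        print(f"n={n:>9,}  full={full:.3e}  sampled(p=0.01)={sampled:.3e}")
```

For a two-table join, sampling each input at 1% cuts the expected number of combinations by a factor of 10,000, which is the lever the rest of the talk uses to bound query time.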

3
Outline
  • Overview
  • Model
  • Implementation
  • Nondeterministic queries
  • Performance evaluation
  • Related work
  • Conclusions

D. Lu and P. Dinda, Synthesizing Realistic Computational Grids, SC 2003
D. Lu, J. Skicewicz, and P. Dinda, Scoped and Approximate Queries in a Relational Grid Information Service, Grid 2003
4
RGIS Model of a Grid
  • Annotated network topology graph
  • Annotation examples
    • Hosts: memory, disk, OS, NICs, etc.
    • Router/Switch: backplane bandwidth, ports
    • Link: latency and bandwidth
  • Highly dynamic data in streams, not DB
  • Virtualization, Futures, Leases
  • Virtual machines

[Figure: layered model of a grid. Layer labels: Software, Network, Data link, Physical. Object types shown: module, endpoint, router, iplink, host, maclink, macswitch, connectorswitch, connectorlink.]
5
Outline
  • Overview
  • Model
  • Implementation
  • Nondeterministic queries
  • Performance evaluation
  • Related work
  • Conclusions

6
[Figure: diagram with labels Software, Network, Data Link, Physical, Metadata, Types, Security.]
7
(No Transcript)
8
RGIS Design (Per Site)
9
RGIS Design (Intersite)
  • Site RGIS server pushes local updates to friend
    sites
  • Site RGIS server consolidates updates from site
    and friend sites
  • Site RGIS server answers all queries originating
    from its site

[Figure: sites A, B, and C, each with an RGIS Server, connected by arrows labeled "Update Push To Friend Site".]
10
Insert/Update/Delete
Dual Xeon 1 GHz, 2 GB, 8x36 GB RAID5, Oracle 9i
11
  • 2,700 lines of authored SQL
  • 4,000 lines of generated PL/SQL
  • 22,000 lines of authored Perl
  • Main dependencies
    • DBI to Oracle 9i
    • SOAP::Lite
    • CGI
  • Not finished yet!

12
RGIS Design (Per Site)
This talk
13
Outline
  • Overview
  • Model
  • Implementation
  • Nondeterministic queries
  • Performance evaluation
  • Related work
  • Conclusions

14
Motivation
  • Queries for compositions of resources are easily
    expressed in SQL
  • But such queries can be very expensive to execute
  • However, we typically don't need the entire
    result set, just some rows, and not always the
    same ones
  • And we need them in a bounded amount of time

Example: find 2 hosts with Linux that together have
more than 3 GB of RAM

select h1.insertid, h2.insertid
from hosts h1, hosts h2
where h1.os = 'LINUX' and h2.os = 'LINUX'
  and h1.mem_mb + h2.mem_mb > 3072
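
The motivating query can be tried on a toy table. The sketch below uses SQLite through Python's standard library rather than Oracle; the table contents, the memory sizes, and the helper names are made up for illustration, and only the column names follow the slide.

```python
import random
import sqlite3

# The motivating query: pairs of Linux hosts with more than 3 GB of RAM in total.
PAIR_QUERY = """
SELECT h1.insertid, h2.insertid
FROM hosts h1, hosts h2
WHERE h1.os = 'LINUX' AND h2.os = 'LINUX'
  AND h1.mem_mb + h2.mem_mb > 3072
"""


def build_hosts(conn: sqlite3.Connection, n: int, seed: int = 0) -> None:
    """Fill a toy hosts table with n synthetic rows (values are invented)."""
    rng = random.Random(seed)
    conn.execute(
        "CREATE TABLE hosts (insertid INTEGER PRIMARY KEY, os TEXT, mem_mb INTEGER)"
    )
    rows = [
        (i, rng.choice(["LINUX", "WINDOWS"]), rng.choice([256, 512, 1024, 2048]))
        for i in range(n)
    ]
    conn.executemany("INSERT INTO hosts VALUES (?, ?, ?)", rows)


def count_pairs(n: int) -> int:
    """Run the pair query over n synthetic hosts and return the result set size."""
    conn = sqlite3.connect(":memory:")
    build_hosts(conn, n)
    result = conn.execute(PAIR_QUERY).fetchall()
    conn.close()
    return len(result)


if __name__ == "__main__":
    for n in (100, 200, 400):
        print(n, count_pairs(n))
```

Note that, as written, the query also pairs a host with itself and returns each pair in both orders; the result set grows roughly quadratically with the number of qualifying hosts.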
15
Why Not Just Limit?
  • Oracle rownum, MySQL limit clause
    • Return first k rows of result set
  • Problem: Always get the SAME answer
  • Problem: May STILL take a long time
    • Results not discovered until near the end
  • Problem: Query time related to DATA as well as k
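
The first problem is easy to reproduce. This sketch uses SQLite's LIMIT clause (standing in for Oracle rownum or MySQL limit) on an invented table and runs the same limited query twice; the schema and data are assumptions made for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hosts (insertid INTEGER PRIMARY KEY, os TEXT, mem_mb INTEGER)"
)
# 100 identical synthetic Linux hosts, so every pair satisfies the predicate.
conn.executemany(
    "INSERT INTO hosts VALUES (?, 'LINUX', ?)",
    [(i, 2048) for i in range(100)],
)

LIMITED = """
SELECT h1.insertid, h2.insertid
FROM hosts h1, hosts h2
WHERE h1.mem_mb + h2.mem_mb > 3072
LIMIT 3
"""

# Same data, same query text: compare the rows returned by two runs.
first = conn.execute(LIMITED).fetchall()
second = conn.execute(LIMITED).fetchall()
print(first, second, first == second)
```

Every caller who issues this query against unchanged data is steered to the same few hosts, which is the behavior the nondeterministic queries are meant to avoid.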

16
Query Approaches
  • All results
  • Scoped results (available in Grid 2003 paper)
  • Approximate results (available in Grid 2003 paper)
  • Nondeterministic results (this paper)
    • Return random sample of result set
17
Nondeterministic Version of Query
select nondeterministically h1.insertid, h2.insertid
from hosts h1, hosts h2
where h1.os = 'LINUX' and h2.os = 'LINUX'
  and h1.mem_mb + h2.mem_mb > 3072
within 2 seconds
18
Implementing non-deterministic queries
select nondeterministically h1.insertid, h2.insertid
from hosts h1, hosts h2
where h1.os = 'LINUX' and h2.os = 'LINUX'
  and h1.mem_mb + h2.mem_mb > 3072
within 2 seconds

Query Manager and Rewriter, using Oracle-specific extensions:

SELECT H1.INSERTID, H2.INSERTID
FROM HOSTS H1 SAMPLE(P), HOSTS H2 SAMPLE(P)
WHERE (H1.OS = 'LINUX' AND H2.OS = 'LINUX'
  AND H1.MEM_MB + H2.MEM_MB > 3072)

Random sample of input tables, with selection probability P determined by time constraint and server load
19
Implementing non-deterministic queries
select nondeterministically h1.insertid, h2.insertid
from hosts h1, hosts h2
where h1.os = 'LINUX' and h2.os = 'LINUX'
  and h1.mem_mb + h2.mem_mb > 3072
within 2 seconds

Query Manager and Rewriter, using our schema (not Oracle-specific; rest of talk):

SELECT H1.INSERTID, H2.INSERTID
FROM HOSTS H1, HOSTS H2, INSERTIDS TEMP_H1, INSERTIDS TEMP_H2
WHERE (H1.OS = 'LINUX' AND H2.OS = 'LINUX'
  AND H1.MEM_MB + H2.MEM_MB > 3072)
  AND (H1.INSERTID = TEMP_H1.INSERTID
    AND TEMP_H1.rand > 982663452.975047
    AND TEMP_H1.rand < 1025613125.93505)
  AND (H2.INSERTID = TEMP_H2.INSERTID
    AND TEMP_H2.rand > 1877769069.94039
    AND TEMP_H2.rand < 1920718742.90039)

Random sample of input tables, with selection probability P determined by time constraint and server load
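
A rewriter of this kind can be sketched in a few lines. The code below is an illustration under assumptions, not the RGIS implementation: the function names are hypothetical, N = 2^32 is an assumed upper bound for the rand column, and the demonstration runs on SQLite with synthetic data. It appends one INSERTIDS alias and one random range per input table, in the shape of the rewritten query above.

```python
import random
import sqlite3

N = 2 ** 32  # assumption: rand values are drawn uniformly from [0, N)


def sample_predicate(alias, p, rng):
    """Return (FROM item, WHERE clause) restricting `alias` to a random rand range."""
    width = p * N
    start = rng.uniform(0, N - width)  # random starting point, no wrap-around
    end = start + width
    temp = f"TEMP_{alias}"
    where = (
        f"({alias}.INSERTID = {temp}.INSERTID"
        f" AND {temp}.rand > {start!r} AND {temp}.rand < {end!r})"
    )
    return f"INSERTIDS {temp}", where


def rewrite(select_list, aliases, condition, p, rng):
    """Build the sampled form of a self-join over HOSTS (hypothetical helper)."""
    from_items = [f"HOSTS {a}" for a in aliases]
    clauses = [f"({condition})"]
    for alias in aliases:
        from_item, where = sample_predicate(alias, p, rng)
        from_items.append(from_item)
        clauses.append(where)
    return (
        f"SELECT {select_list} FROM {', '.join(from_items)} "
        f"WHERE {' AND '.join(clauses)}"
    )


def demo(n_hosts=500, p=0.2, seed=1):
    """Run the full and the sampled query on synthetic data; return both result sets."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hosts (insertid INTEGER PRIMARY KEY, os TEXT, mem_mb INTEGER)")
    conn.execute("CREATE TABLE insertids (insertid INTEGER PRIMARY KEY, rand REAL)")
    for i in range(n_hosts):
        conn.execute(
            "INSERT INTO hosts VALUES (?, ?, ?)",
            (i, rng.choice(["LINUX", "WINDOWS"]), rng.choice([1024, 2048])),
        )
        conn.execute("INSERT INTO insertids VALUES (?, ?)", (i, rng.uniform(0, N)))
    condition = "H1.OS = 'LINUX' AND H2.OS = 'LINUX' AND H1.MEM_MB + H2.MEM_MB > 3072"
    full_sql = f"SELECT H1.INSERTID, H2.INSERTID FROM HOSTS H1, HOSTS H2 WHERE {condition}"
    sampled_sql = rewrite("H1.INSERTID, H2.INSERTID", ["H1", "H2"], condition, p, rng)
    full = set(conn.execute(full_sql).fetchall())
    sampled = set(conn.execute(sampled_sql).fetchall())
    conn.close()
    return full, sampled, sampled_sql
```

Calling `demo()` returns the complete result set, the sampled one, and the generated SQL. The sampled rows are always a subset of the full result, and a fresh pair of ranges on each call is what makes repeated queries return different rows.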
20
Implementing non-deterministic queries
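[Figure: Host table with columns insertid and random_number, where random_number ranges from 0 to N. A query picks a random starting point x and selects the rows whose random_number lies between x and x + y, where y = P * N.]
Reshuffling requirement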
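
The range bounds in the rewritten query shown earlier can be checked against y = P * N. Both ranges are about 42,949,672.96 wide, which equals 0.01 * 2^32. The talk does not state P or N, so the values below are an inference from those numbers, not a documented setting.

```python
import math

# Range bounds copied from the rewritten query earlier in the talk.
H1_RANGE = (982663452.975047, 1025613125.93505)
H2_RANGE = (1877769069.94039, 1920718742.90039)

N = 2 ** 32  # assumption: random_number is drawn from [0, 2^32)
P = 0.01     # assumption: selection probability behind the example


def width(bounds):
    """Width y of a (low, high) random_number range."""
    low, high = bounds
    return high - low


def consistent_with(bounds, p=P, n=N):
    """True if the range width matches y = p * n up to the slide's rounding."""
    return math.isclose(width(bounds), p * n, abs_tol=1e-3)


if __name__ == "__main__":
    print(width(H1_RANGE), width(H2_RANGE), P * N)
    print(consistent_with(H1_RANGE), consistent_with(H2_RANGE))
```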
21
Deadlines
  • Hard-limiting
    • Time-limited thread or process forked
  • Climbing
    • Start with low probability p and issue the query;
      if there are no results, double the probability
      and try again; keep going until time runs out or
      there are results
  • Estimation
    • Like climbing, but do polynomial estimation over
      previous runs to estimate whether the next run
      will exceed the deadline
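
The climbing strategy, with an optional estimation step, can be sketched as follows. This is a minimal illustration under assumptions: `run_query` is a hypothetical callable that executes the sampled query at probability p and returns its rows, and the estimation model (doubling p multiplies the run time of an n_tables-way join by 2^n_tables) is a simple stand-in for the polynomial estimation mentioned above. Hard-limiting is not covered.

```python
import time


def climb(run_query, deadline_s, p0=0.001, n_tables=2, estimate=False,
          clock=time.monotonic):
    """Double p until the query returns rows or the deadline is reached.

    Returns (rows, p), where rows is empty if nothing was found in time.
    """
    start = clock()
    p = p0
    last_run_s = None
    while True:
        remaining = deadline_s - (clock() - start)
        if remaining <= 0:
            break  # out of time
        if estimate and last_run_s is not None:
            # Assumed cost model: run time grows like p ** n_tables.
            predicted = last_run_s * 2 ** n_tables
            if predicted > remaining:
                break  # next run is predicted to overshoot the deadline
        t0 = clock()
        rows = run_query(p)
        last_run_s = clock() - t0
        if rows:
            return rows, p
        if p >= 1.0:
            break  # already sampling everything
        p = min(1.0, 2 * p)
    return [], p


if __name__ == "__main__":
    # Stand-in for a real query: results only appear once p reaches 1%.
    def fake_query(p):
        return [(1, 2)] if p >= 0.01 else []

    print(climb(fake_query, deadline_s=2.0))
```

With `estimate=False` this is plain climbing; with `estimate=True` a run is skipped when the model predicts it cannot finish before the deadline.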

22
Outline
  • Overview
  • Model
  • Implementation
  • Nondeterministic queries
  • Performance evaluation
  • Related work
  • Conclusions

23
GridG: Synthesizing Realistic Computational Grids
  • Generates a Grid as an annotated layer 3 topology
    • Hosts, routers, links
  • Graph conforms to power laws of Internet topology
  • Annotations include
    • memory, clock speed, CPU type, number of CPUs,
      operating system type, link bandwidths, router
      bandwidths, etc.
  • Memory distribution according to Smith study of
    MDS contents

http://www.cs.northwestern.edu/urgis/GridG
24
Test Grids
Grid Size (Hosts)   Query
50,000              Find n hosts with 3 GB of memory
500,000             Find n hosts with 3 GB of memory
5,000,000           Find n hosts with 3 GB of memory
10,000              Find 2 close hosts
50,000              Find 2 close hosts
100,000             Find 2 close hosts
25
Nondeterministic query performance
Select two hosts that together have >3 GB of RAM
Meaningful tradeoff between query processing time
and result set size is possible
26
Nondeterministic query performance
Select n hosts that together have >3 GB of RAM,
holding query time constant
Can use tradeoff to control query time
independent of query complexity
27
Deadlines
Find 2 hosts with collective 600 GB RAM (VERY
RARE) in 50K host grid
[Figure: plot with Max and Min labels.]
28
Extending RGIS to Support Grid Computing On
Virtual Machines
  • Virtuals
    • Each RGIS object has a unique id
    • Virtualization table associates unique id of
      virtual resources with unique ids of their
      constituent physical resources
    • Virtual nature of resource is hidden unless query
      explicitly requests it
  • Futures
    • An RGIS object that does not exist yet
    • Futures table of unique ids
    • Future nature of resource hidden unless query
      explicitly requests it
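
One way to picture these tables is the following SQLite sketch. All table names, column names, and rows here are hypothetical; the talk only says that a virtualization table maps virtual ids to physical ids and that a futures table holds unique ids.

```python
import sqlite3

SCHEMA = """
CREATE TABLE hosts (insertid INTEGER PRIMARY KEY, name TEXT, mem_mb INTEGER);
-- Maps the unique id of a virtual resource to each constituent physical resource.
CREATE TABLE virtualization (virtual_id INTEGER, physical_id INTEGER);
-- Unique ids of objects that do not exist yet.
CREATE TABLE futures (insertid INTEGER PRIMARY KEY);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO hosts VALUES (?, ?, ?)",
    [(1, "phys-a", 4096), (2, "vm-1", 1024), (3, "planned-b", 8192)],
)
conn.execute("INSERT INTO virtualization VALUES (2, 1)")  # vm-1 runs on phys-a
conn.execute("INSERT INTO futures VALUES (3)")            # planned-b is a future

# Ordinary query: virtual and future hosts look like any other host.
all_hosts = conn.execute("SELECT name FROM hosts ORDER BY insertid").fetchall()

# Query that explicitly asks about the virtual nature of a resource.
virtual_hosts = conn.execute("""
    SELECT v.name, p.name
    FROM hosts v
    JOIN virtualization m ON m.virtual_id = v.insertid
    JOIN hosts p ON p.insertid = m.physical_id
""").fetchall()

# Query that explicitly asks which hosts are futures.
future_hosts = conn.execute(
    "SELECT h.name FROM hosts h JOIN futures f ON f.insertid = h.insertid"
).fetchall()

print(all_hosts, virtual_hosts, future_hosts)
```

In this reading, the virtual or future nature of a host only shows up when a query joins against the extra tables, which matches the "hidden unless explicitly requested" behavior described above.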

29
Related Work
  • SLP, X.500, LDAP
  • Condor ClassAds
  • MDS
  • R-GMA
  • Redline
  • Random sampling from databases
    • Olsen, others

30
Conclusions
  • GIS system based on relational data model
  • Powerful queries, but expensive to execute
  • Nondeterminism to control query time
  • Can be implemented without RDBMS support
  • Automated query translation in RGIS
  • Several techniques to implement deadlines for
    queries

31
People and Acknowledgements
  • Students
    • Jason Skicewicz, Andrew Weinrich (Web Soap),
      Jack Lange (CDN)
  • Collaborator
    • Relational Grid Resources Project at Indiana
    • Beth Plale
    • http://www.cs.indiana.edu/plale/projects/RGR
  • Funder
    • NSF

32
For More Information
  • URGIS Site
    • http://www.cs.northwestern.edu/urgis
  • Prescience Lab
    • http://plab.cs.northwestern.edu
  • Join The User Comfort Study!
    • http://comfort.cs.northwestern.edu
