Title: Nondeterministic Queries in a Relational Grid Information Service
1Nondeterministic Queries in a Relational Grid Information Service
- Peter A. Dinda
- Dong Lu
- Prescience Lab
- Department of Computer Science
- Northwestern University
- http://plab.cs.northwestern.edu
2Overview
- RGIS: GIS system based on the relational data model using SQL
- Complex compositional queries can be posed
- "Find me 16 hosts on the same LAN that together have 32 GB of RAM"
- Can be very expensive to answer
- Joins: worst case O(n^m) for m tables of size n
- Introduce nondeterminism
- User gets random sample of result set
- Automated query transformation
3Outline
- Overview
- Model
- Implementation
- Nondeterministic queries
- Performance evaluation
- Related work
- Conclusions
D. Lu and P. Dinda, Synthesizing Realistic Computational Grids, SC 2003
D. Lu, J. Skicewicz, and P. Dinda, Scoped and Approximate Queries in a Relational Grid Information Service, Grid 2003
4RGIS Model of a Grid
- Annotated network topology graph
- Annotation examples
- Hosts: memory, disk, OS, NICs, etc.
- Router/Switch: backplane bandwidth, ports
- Link: latency and bandwidth
- Highly dynamic data in streams, not DB
- Virtualization, Futures, Leases
- Virtual machines
[Figure: layered grid model. Layer labels: Software, Network, Data link, Physical. Object labels: module, endpoint, router, iplink, host, maclink, macswitch, connectorswitch, connectorlink]
6[Figure; labels: Software, Metadata, Network, Types, Data Link, Security, Physical]
8RGIS Design (Per Site)
9RGIS Design (Intersite)
[Figure: sites A, B, and C, each with an RGIS Server; arrows labeled "Update Push To Friend Site"]
- Site RGIS server pushes local updates to friend sites
- Site RGIS server consolidates updates from site and friend sites
- Site RGIS server answers all queries originating from its site
10Insert/Update/Delete
Dual Xeon 1 GHz, 2 GB, 8x36 GB RAID5, Oracle 9i
11- 2,700 lines of authored SQL
- 4,000 lines of generated PL/SQL
- 22,000 lines of authored Perl
- Main dependencies
- DBI to Oracle 9i
- SOAP::Lite
- CGI
- Not finished yet!
12RGIS Design (Per Site)
This talk
14Motivation
- Queries for compositions of resources easily expressed in SQL
- But such queries can be very expensive to execute
- However, we typically don't need the entire result set, just some rows, and not always the same ones
- And we need them in a bounded amount of time
select h1.insertid, h2.insertid
from hosts h1, hosts h2
where h1.os='LINUX' and h2.os='LINUX'
and h1.mem_mb+h2.mem_mb>=3072
"Find 2 hosts with Linux that together have 3 GB of RAM"
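A minimal sketch of the query above, run from Python against an in-memory SQLite database. The column names (`insertid`, `os`, `mem_mb`) come from the slide; the four sample rows, the `>=` comparison, and the use of SQLite instead of Oracle 9i are assumptions made only for illustration.

```python
import sqlite3

# Hypothetical data: four hosts with made-up memory sizes (MB).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hosts (insertid INTEGER PRIMARY KEY, os TEXT, mem_mb INTEGER)")
conn.executemany(
    "INSERT INTO hosts VALUES (?, ?, ?)",
    [(1, "LINUX", 2048), (2, "LINUX", 1024), (3, "WINDOWS", 4096), (4, "LINUX", 512)],
)

# The self-join considers every ordered pair of hosts: n * n candidate rows.
pairs = conn.execute(
    """
    SELECT h1.insertid, h2.insertid
    FROM hosts h1, hosts h2
    WHERE h1.os = 'LINUX' AND h2.os = 'LINUX'
      AND h1.mem_mb + h2.mem_mb >= 3072
    ORDER BY h1.insertid, h2.insertid
    """
).fetchall()
print(pairs)  # [(1, 1), (1, 2), (2, 1)]
```

As written, the join also pairs a host with itself and returns both orderings of each pair, so the result set grows quickly with the number of hosts.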
15Why Not Just Limit?
- Oracle rownum, MySQL limit clause
- Return first k rows of result set
- Problem: Always get the SAME answer
- Problem: May STILL take a long time
- Results not discovered until near the end
- Problem: Query time related to DATA as well as k
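The first problem can be seen in a small sketch. SQLite's `LIMIT` stands in here for Oracle `rownum` or the MySQL `limit` clause, and the table contents are hypothetical. SQL promises no particular order without `ORDER BY`, but an unchanged engine and table tend to produce the same first k rows on every run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hosts (insertid INTEGER PRIMARY KEY, mem_mb INTEGER)")
# Hypothetical data: 100 hosts with 1 to 4 GB of memory.
conn.executemany(
    "INSERT INTO hosts VALUES (?, ?)",
    [(i, 1024 * (1 + i % 4)) for i in range(1, 101)],
)

query = """
SELECT h1.insertid, h2.insertid
FROM hosts h1, hosts h2
WHERE h1.mem_mb + h2.mem_mb >= 3072
LIMIT 5
"""
first = conn.execute(query).fetchall()
second = conn.execute(query).fetchall()

# Both runs return the same five pairs, so every caller is steered
# toward the same hosts.
print(first == second)
```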
16Query Approaches
- All results
- Scoped results (available in Grid 2003 paper)
- Approximate results (available in Grid 2003 paper)
- Nondeterministic results (this paper): return random sample of result set
17Nondeterministic Version of Query
select nondeterministically h1.insertid, h2.insertid
from hosts h1, hosts h2
where h1.os='LINUX' and h2.os='LINUX'
and h1.mem_mb+h2.mem_mb>=3072
within 2 seconds
18Implementing non-deterministic queries
select nondeterministically h1.insertid, h2.insertid
from hosts h1, hosts h2
where h1.os='LINUX' and h2.os='LINUX'
and h1.mem_mb+h2.mem_mb>=3072
within 2 seconds
Using Oracle-Specific Extensions
SELECT H1.INSERTID, H2.INSERTID
FROM HOSTS H1 SAMPLE(P), HOSTS H2 SAMPLE(P)
WHERE (H1.OS='LINUX' AND H2.OS='LINUX' AND H1.MEM_MB+H2.MEM_MB>=3072)
Query Manager and Rewriter
Random sample of input tables, with selection probability P determined by time constraint and server load
19Implementing non-deterministic queries
select nondeterministically h1.insertid, h2.insertid
from hosts h1, hosts h2
where h1.os='LINUX' and h2.os='LINUX'
and h1.mem_mb+h2.mem_mb>=3072
within 2 seconds
Using Our Schema (Not Oracle-Specific): Rest of Talk
SELECT H1.INSERTID, H2.INSERTID
FROM HOSTS H1, HOSTS H2, INSERTIDS TEMP_H1, INSERTIDS TEMP_H2
WHERE (H1.OS='LINUX' AND H2.OS='LINUX' AND H1.MEM_MB+H2.MEM_MB>=3072)
AND (H1.INSERTID=TEMP_H1.INSERTID AND TEMP_H1.rand > 982663452.975047 AND TEMP_H1.rand < 1025613125.93505)
AND (H2.INSERTID=TEMP_H2.INSERTID AND TEMP_H2.rand > 1877769069.94039 AND TEMP_H2.rand < 1920718742.90039)
Query Manager and Rewriter
Random sample of input tables, with selection probability P determined by time constraint and server load
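A toy sketch of the kind of transformation the rewriter performs, written in Python. It is not the RGIS rewriter (which the slides describe as Perl): the function name `rewrite`, the regular expression, and the restriction to a single `select ... from ... where ... within K seconds` shape are all assumptions. The range `N = 2 ** 32` is also an inference: each `rand` window in the rewritten query above is about 42,949,673 wide, which matches 0.01 × 2^32, but the slides do not state the range or P.

```python
import random
import re

N = 2 ** 32  # assumed range of INSERTIDS.rand (an inference, see lead-in)

def rewrite(query, p, rng=random):
    """Rewrite one 'select nondeterministically' query into plain SQL.

    Each table alias gets a companion INSERTIDS alias restricted to a
    random window of width p * N. Returns (sql, deadline_seconds).
    """
    m = re.fullmatch(
        r"\s*select\s+nondeterministically\s+(?P<cols>.+?)\s+from\s+(?P<tables>.+?)"
        r"\s+where\s+(?P<cond>.+?)\s+within\s+(?P<secs>\d+(?:\.\d+)?)\s+seconds\s*",
        query,
        flags=re.IGNORECASE | re.DOTALL,
    )
    if m is None:
        raise ValueError("unsupported query shape")
    # "hosts h1, hosts h2" -> ["h1", "h2"]
    aliases = [t.split()[-1] for t in m["tables"].split(",")]
    temp_tables, temp_conds = [], []
    for a in aliases:
        lo = rng.uniform(0, N * (1 - p))  # random starting point
        hi = lo + p * N                   # window covers a fraction p of the range
        temp_tables.append(f"insertids temp_{a}")
        temp_conds.append(
            f"({a}.insertid = temp_{a}.insertid"
            f" AND temp_{a}.rand > {lo!r} AND temp_{a}.rand < {hi!r})"
        )
    sql = (
        f"SELECT {m['cols']} FROM {m['tables']}, {', '.join(temp_tables)} "
        f"WHERE ({m['cond']}) AND {' AND '.join(temp_conds)}"
    )
    return sql, float(m["secs"])

sql, deadline = rewrite(
    "select nondeterministically h1.insertid, h2.insertid "
    "from hosts h1, hosts h2 "
    "where h1.os='LINUX' and h2.os='LINUX' and h1.mem_mb+h2.mem_mb>=3072 "
    "within 2 seconds",
    p=0.01,
    rng=random.Random(1),
)
print(deadline)
print(sql)
```

A production rewriter would need a real SQL parser; the regular expression here only handles the single query shape shown on the slide.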
20Implementing non-deterministic queries
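[Figure: Host table with columns insertid and random_number; the random numbers lie on a line from 0 to N, and the query selects the window from x to x+y]
- Random starting point x
- Window width y = PN
- Reshuffling requirement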
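The window selection on this slide can be sketched as follows. The table and column names follow the slide labels; `N = 2 ** 32`, the uniform assignment of random numbers, the choice to keep the window from wrapping past N, and the helper name `pick_window` are assumptions. The reshuffling requirement is not modeled.

```python
import random
import sqlite3

N = 2 ** 32  # assumed upper end of the random_number range

def pick_window(p, rng):
    """Return (x, x + y): a random starting point x and width y = p * N.

    Assumption: x is drawn from [0, N - y] so the window never wraps past N.
    """
    y = p * N
    x = rng.uniform(0, N - y)
    return x, x + y

rng = random.Random(42)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE host (insertid INTEGER PRIMARY KEY, random_number REAL)")
# Hypothetical data: 1000 hosts, each tagged with a uniform random number.
conn.executemany(
    "INSERT INTO host VALUES (?, ?)",
    [(i, rng.uniform(0, N)) for i in range(1, 1001)],
)

# Selecting the rows whose number falls in the window yields roughly a
# fraction p of the table, and a different subset for each new window.
lo, hi = pick_window(0.1, rng)
sample = conn.execute(
    "SELECT insertid FROM host WHERE random_number >= ? AND random_number < ?",
    (lo, hi),
).fetchall()
print(len(sample), "of 1000 rows selected")
```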
21Deadlines
- Hard-limiting
- Time-limited thread or process forked
- Climbing
- Start with low probability p, issue query; if no results, double probability and try again; keep going until no more time or have results
- Estimation
- Like climbing, but do polynomial estimation over previous runs to estimate if next run will exceed deadline
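A minimal sketch of climbing, with estimation as an option. `run_query` is a hypothetical callable that executes the sampled query at selection probability p and returns its rows. The slides say only "polynomial estimation"; using the Lagrange polynomial through the last three runs is an assumption. The sketch never interrupts a query already in flight, which is what hard-limiting would add.

```python
import time

def predict_runtime(history, p_next):
    """Extrapolate run time at p_next from (p, seconds) pairs with distinct p,
    using the Lagrange polynomial through those points."""
    total = 0.0
    for i, (p_i, t_i) in enumerate(history):
        term = t_i
        for j, (p_j, _) in enumerate(history):
            if i != j:
                term *= (p_next - p_j) / (p_i - p_j)
        total += term
    return total

def climb(run_query, deadline_s, p_start=0.001, estimate=False):
    """Retry with doubled p until rows arrive, p reaches 1, or time runs out."""
    start = time.monotonic()
    p, history = p_start, []
    while True:
        elapsed = time.monotonic() - start
        if elapsed >= deadline_s:
            return p, []
        if estimate and len(history) >= 2:
            # Estimation: skip the run if the extrapolated time would overrun.
            if elapsed + predict_runtime(history[-3:], p) > deadline_s:
                return p, []
        t0 = time.monotonic()
        rows = run_query(p)
        history.append((p, time.monotonic() - t0))
        if rows or p >= 1.0:
            return p, rows
        p = min(1.0, 2 * p)

def fake_query(p):
    # Stand-in for a real database call: finds a result once p reaches 0.05.
    return [("h1", "h2")] if p >= 0.05 else []

p, rows = climb(fake_query, deadline_s=2.0, p_start=0.01)
print(p, rows)  # p climbs 0.01, 0.02, 0.04, 0.08 and stops at 0.08
```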
23GridG: Synthesizing Realistic Computational Grids
- Generates a Grid as an annotated layer 3 topology
- Hosts, routers, links
- Graph conforms to power laws of Internet topology
- Annotations include
- memory, clock speed, CPU type, number of CPUs, operating system type, link bandwidths, router bandwidths, etc.
- Memory distribution according to Smith study of MDS contents
http://www.cs.northwestern.edu/urgis/GridG
24Test Grids
Grid Size (Hosts)    Query
50,000               Find n hosts with 3 GB of memory
500,000              Find n hosts with 3 GB of memory
5,000,000            Find n hosts with 3 GB of memory
10,000               Find 2 close hosts
50,000               Find 2 close hosts
100,000              Find 2 close hosts
25Nondeterministic query performance
Select two hosts that together have >=3 GB of RAM
Meaningful tradeoff between query processing time and result set size is possible
26Nondeterministic query performance
Select n hosts that together have >=3 GB of RAM, holding query time constant
Can use tradeoff to control query time independent of query complexity
27Deadlines
[Figure: plot with Max and Min labels]
Find 2 hosts with collective 600 GB RAM (VERY RARE) in 50K host grid
28Extending RGIS to Support Grid Computing On Virtual Machines
- Virtuals
- Each RGIS object has a unique id
- Virtualization table associates unique id of virtual resources with unique ids of their constituent physical resources
- Virtual nature of resource is hidden unless query explicitly requests it
- Futures
- An RGIS object that does not exist yet
- Futures table of unique ids
- Future nature of resource hidden unless query explicitly requests it
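One way the two side tables could look, sketched with SQLite from Python. The slides give only the idea; the table layouts, the column names `virtual_id` and `physical_id`, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE hosts (insertid INTEGER PRIMARY KEY, os TEXT, mem_mb INTEGER);
    -- Hypothetical layout: one row per (virtual, constituent physical) pair.
    CREATE TABLE virtualization (virtual_id INTEGER, physical_id INTEGER);
    -- Hypothetical layout: unique ids of objects that do not exist yet.
    CREATE TABLE futures (insertid INTEGER PRIMARY KEY);

    INSERT INTO hosts VALUES (1, 'LINUX', 4096);  -- physical host
    INSERT INTO hosts VALUES (2, 'LINUX', 1024);  -- virtual machine on host 1
    INSERT INTO hosts VALUES (3, 'LINUX', 8192);  -- host that does not exist yet
    INSERT INTO virtualization VALUES (2, 1);
    INSERT INTO futures VALUES (3);
    """
)

# Ordinary query: virtual and future hosts look like any other host.
plain = conn.execute(
    "SELECT insertid FROM hosts WHERE os = 'LINUX' ORDER BY insertid"
).fetchall()

# Query that explicitly asks about virtual or future status.
detailed = conn.execute(
    """
    SELECT h.insertid,
           v.physical_id,
           f.insertid IS NOT NULL AS is_future
    FROM hosts h
    LEFT JOIN virtualization v ON v.virtual_id = h.insertid
    LEFT JOIN futures f ON f.insertid = h.insertid
    ORDER BY h.insertid
    """
).fetchall()
print(plain)     # [(1,), (2,), (3,)]
print(detailed)  # [(1, None, 0), (2, 1, 0), (3, None, 1)]
```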
29Related Work
- SLP, X.500, LDAP
- Condor ClassAds
- MDS
- R-GMA
- Redline
- Random sampling from databases
- Olken, others
30Conclusions
- GIS system based on relational data model
- Powerful queries, but expensive to execute
- Nondeterminism to control query time
- Can be implemented without RDBMS support
- Automated query translation in RGIS
- Several techniques to implement deadlines for queries
31People and Acknowledgements
- Students
- Jason Skicewicz, Andrew Weinrich (Web Soap), Jack Lange (CDN)
- Collaborator
- Relational Grid Resources Project at Indiana
- Beth Plale
- http://www.cs.indiana.edu/plale/projects/RGR
- Funder
- NSF
32For More Information
- URGIS Site
- http://www.cs.northwestern.edu/urgis
- Prescience Lab
- http://plab.cs.northwestern.edu
- Join The User Comfort Study!
- http://comfort.cs.northwestern.edu
Special Advertising Section