Title: ObjectGlobe: Open, Secure, and QoS-enhanced Distributed Query Processing
1 ObjectGlobe: Open, Secure, and QoS-enhanced Distributed Query Processing
- Donald Kossmann
- Technical University of Munich
- http://www3.in.tum.de
- Joint work with Alfons Kemper (Passau) and others
2 Outline
- Background
- The ObjectGlobe Lookup Service
- (Security Aspects)
- QoS Management
- Summary
3 Query Processing on the Internet
- Web servers, relational databases on the Web: centralized or limited query capabilities
- Middleware systems: a great deal of data shipping
- Goals of ObjectGlobe
- integrate any kind of data
- integrate any kind of query processing capabilities
- bring query processing capabilities to the data
4 Middleware for Query Processing
[Figure: query plan with user-defined operators (thumbnail) and wrappers (wrap_S) over a source S at Data-Provider A and Data-Provider B; heavy data shipping]
5 Open Query Processing (Step 1)
[Figure: Data-Provider A, Data-Provider B, and a Fct-Provider; labels: Load functions, thumbnail, wrap_S, S]
6 Open Query Processing (Step 2)
[Figure: Data-Provider A, Data-Provider B, and a Fct-Provider; labels: Load functions, thumbnail, wrap_S, S]
7 Traveling from M to UCB
[Figure: example query plan with the operators Top N, Route, and two Selections; participants: a cycle provider, a function provider (route planner), and two data providers (flights, rental cars)]
8 Open QP with ObjectGlobe
- Create an open marketplace for
- data providers
- cycle providers
- function providers
- Requirements
- wrappers exist for all data of data providers
- JVM runs on all cycle providers
- fixed interface for operators of function providers
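The fixed operator interface is not spelled out on the slide. The following is a minimal sketch, assuming an open/next/close iterator model (the Monitoring Operators slide distinguishes an open and a next phase). ObjectGlobe itself loads Java byte code, so this Python rendering and the names `Operator`, `Scan`, `Select`, and `run` are purely illustrative.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Iterable, Optional


class Operator(ABC):
    """Hypothetical fixed interface that every loaded operator implements."""

    @abstractmethod
    def open(self) -> None:
        """Prepare the operator and its inputs."""

    @abstractmethod
    def next(self) -> Optional[Any]:
        """Return the next tuple, or None when the input is exhausted."""

    @abstractmethod
    def close(self) -> None:
        """Release resources."""


class Scan(Operator):
    """Stands in for a wrapper around a data provider's source."""

    def __init__(self, rows: Iterable[Any]):
        self._rows = rows
        self._it = None

    def open(self) -> None:
        self._it = iter(self._rows)

    def next(self) -> Optional[Any]:
        return next(self._it, None)

    def close(self) -> None:
        self._it = None


class Select(Operator):
    """A user-defined selection that can be shipped to the data."""

    def __init__(self, child: Operator, predicate: Callable[[Any], bool]):
        self._child = child
        self._predicate = predicate

    def open(self) -> None:
        self._child.open()

    def next(self) -> Optional[Any]:
        while (row := self._child.next()) is not None:
            if self._predicate(row):
                return row
        return None

    def close(self) -> None:
        self._child.close()


def run(plan: Operator) -> list:
    """Drive a plan to completion and collect its output."""
    plan.open()
    out = []
    while (row := plan.next()) is not None:
        out.append(row)
    plan.close()
    return out
```

Because every operator exposes the same three calls, a cycle provider can compose operators from different function providers without knowing their internals, e.g. `run(Select(Scan(rows), predicate))`.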
9 Scenarios
- Free Internet: everything is free and available for everybody
- Restricted Internet: charge according to usage, quality, and timeliness; restrictions (e.g., age)
- Intranet: everything is free and available for insiders
- Outsourcing: charge for certain services (e.g., backup, business analyses)
10 Challenges
- Lookup Service
- Find the relevant services
- Security
- Protect data and cycle providers from bad code
- Quality of Service
- What you pay is what you get
11 ObjectGlobe Lookup-Service
[Figure: providers register with the Lookup-Service; applications/users browse and search it; the Lookup-Service supplies authorisation data, statistics, cost information, ... to the parser, optimizer, and execution engine]
12 Description of Services
- Providers register RDF or XML documents
- There is a pre-defined schema to describe services
- Data Providers
- Theme (e.g., Hotel)
- Attributes (e.g., rate, location, category)
- Access paths and wrappers
- Characteristics of the server (e.g., availability)
- Information for authorization
- Statistics
- ...
13 Description of Services (continued)
- Function Provider
- Signature (e.g., foo(int, int) -> int)
- Information for authorization
- Hardware requirements (e.g., 30 MB main memory)
- Size of Java byte code
- ...
- Cycle Provider
- Hardware (e.g., 1 GB main memory)
- Location and network connections / bandwidth
- Information for authorization
- ...
14 XML Description of a Data Provider
<DataProvider>
  <id> 4711 </id>
  <theme>
    <name> Hotel </name>
    <desc> All hotels you ever want </desc>
  </theme>
  <Attribute>
    <topic> city </topic>
    <type> string </type>
  </Attribute>
  ...
15 Lookup Query
- Data Providers for Hotels that return the City and Rate of each hotel
search DataProvider d
select d.uniqueId, d.attr.*
where d.theme.name = "hotel" and
      d.attr.?.topic = "city" and
      d.attr.?.topic = "rate"
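To make the semantics of the lookup query concrete, here is a small in-memory sketch. It assumes that `d.attr.?.topic` means "some attribute of d has this topic"; that reading, the helper names `has_topic` and `lookup`, and the sample provider records are assumptions for illustration, not part of the Lookup Service.

```python
# Made-up provider descriptions, shaped like the XML description of a data provider.
providers = [
    {"uniqueId": 4711, "theme": {"name": "hotel"},
     "attr": [{"topic": "city", "type": "string"},
              {"topic": "rate", "type": "int"}]},
    {"uniqueId": 4712, "theme": {"name": "hotel"},
     "attr": [{"topic": "city", "type": "string"}]},
    {"uniqueId": 4713, "theme": {"name": "flight"},
     "attr": [{"topic": "city", "type": "string"},
              {"topic": "rate", "type": "int"}]},
]


def has_topic(provider, topic):
    """Assumed meaning of d.attr.?.topic: any attribute carries the topic."""
    return any(a["topic"] == topic for a in provider["attr"])


def lookup(providers, theme, topics):
    """Return the ids of providers with the given theme and all requested topics."""
    return [p["uniqueId"] for p in providers
            if p["theme"]["name"] == theme
            and all(has_topic(p, t) for t in topics)]


matches = lookup(providers, "hotel", ["city", "rate"])
```

Only provider 4711 qualifies: 4712 lacks a rate attribute and 4713 has the wrong theme.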
16 Three-tier Architecture
- Local Lookup-Servers
- Keep copies of meta-data of services that are relevant for a particular organization or subsidiary
- Evaluate Lookup requests for that organization
- Relevance is determined by subscription rules (queries)
- Public Lookup-Servers (Backbone)
- Store all (public) meta-data
- Store subscription rules of local Lookup-Servers
- Notify local Lookup-Servers of changes
- Users can browse in the public info of the backbone
17 Three-tier Architecture
[Figure: clients exchange queries and answers with local Lookup-Servers (Local LS); local Lookup-Servers send new rules to the public Lookup-Servers and receive answers; updates and inserts arrive at the public Lookup-Servers]
18 Three-tier Architecture (continued)
- Processing Lookup Requests
- Local Lookup-Servers store meta-data in RDBMS
- Translate Lookup request into SQL
- Registering new services
- Public Lookup-Servers store meta-data in RDBMS
- Public Lookup-Servers store rules in RDBMS
- Apply filter algorithm using RDBMS in order to find relevant local Lookup-Servers
- Deletes and updates of services
- Apply filter algorithm to find affected local Lookup-Servers (more complicated, however)
- Principle: Map everything to RDBMS
19 Storing XML Data in an RDBMS
<person id="4711">
  <name> Lilly Potter </name>
  <child>
    <person id="314">
      <name> Harry Potter </name>
    </person>
  </child>
</person>
<person id="666">
  <name> James Potter </name>
  <child> 314 </child>
</person>
[Figure: document graph. Root node 0 has person edges to nodes 4711 and 666; 4711 has a name edge to "Lilly Potter" and 666 a name edge to "James Potter"; both have a child edge to person node 314 (i314), whose name is "Harry Potter"]
20 Edge Approach
Edge Table
Source  Label   Target
0       person  4711
0       person  666
4711    name    v1
4711    child   i314
666     name    v2
666     child   i314

Value Table (String)
Id  Value
v1  Lilly Potter
v2  James Potter
v3  Harry Potter

Value Table (Integer)
Id  Value
v4  12
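The edge approach can be sketched with an in-memory SQLite database. This is a simplified variant, not the shredder used in ObjectGlobe: it handles only elements with an `id` attribute and text leaves, leaves out the child references of the slide, and the table names `edge` and `value_string` as well as the function `shred` are assumptions.

```python
import itertools
import sqlite3
import xml.etree.ElementTree as ET

DOC = """<db>
  <person id="4711"><name>Lilly Potter</name></person>
  <person id="666"><name>James Potter</name></person>
</db>"""


def shred(xml_text, conn):
    """Store every parent-child relationship as one row of the edge table."""
    conn.execute("CREATE TABLE edge (source TEXT, label TEXT, target TEXT)")
    conn.execute("CREATE TABLE value_string (id TEXT PRIMARY KEY, value TEXT)")
    fresh = itertools.count(1)

    def visit(elem, source):
        for child in elem:
            is_leaf = len(child) == 0
            if is_leaf:
                # Text leaf: the edge points at a row of the value table.
                target = f"v{next(fresh)}"
                conn.execute("INSERT INTO value_string VALUES (?, ?)",
                             (target, (child.text or "").strip()))
            else:
                # Inner node: reuse its id, or invent one if it has none.
                target = child.get("id") or f"n{next(fresh)}"
            conn.execute("INSERT INTO edge VALUES (?, ?, ?)",
                         (source, child.tag, target))
            if not is_leaf:
                visit(child, target)

    visit(ET.fromstring(xml_text), "0")  # the document root becomes node 0


conn = sqlite3.connect(":memory:")
shred(DOC, conn)
edges = conn.execute(
    "SELECT source, label, target FROM edge ORDER BY rowid").fetchall()
```

The resulting rows mirror the first rows of the tables above: two person edges from node 0 and one name edge per person pointing into the string value table.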
21 XML Queries
- Find the name of all persons that like to play Quidditch and are younger than 18 years
select n
where <person> <name> n </name> <age> a </age> <hobby> Quidditch </hobby> </person>, a < 18
- Carry out pattern matching with document graph
22 Translation to SQL
SELECT nv.value
FROM Edge p, Edge n, Edge h, Value nv, Value hv
WHERE p.label = 'person' AND
      p.target = n.source AND
      n.label = 'name' AND
      n.target = nv.id AND
      p.target = h.source AND
      h.label = 'hobby' AND
      h.target = hv.id AND
      hv.value = 'Quidditch'
Works essentially in the same way for the query language of our Lookup service.
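The translated query can be tried out against a tiny edge database. The sketch below uses SQLite; the hobby rows are made-up sample data, and a single `Value` table stands in for the typed value tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Edge (source TEXT, label TEXT, target TEXT);
CREATE TABLE Value (id TEXT PRIMARY KEY, value TEXT);
""")
# Two persons, each with a name edge and a hobby edge (sample data).
conn.executemany("INSERT INTO Edge VALUES (?, ?, ?)", [
    ("0", "person", "314"), ("314", "name", "v1"), ("314", "hobby", "v2"),
    ("0", "person", "666"), ("666", "name", "v3"), ("666", "hobby", "v4"),
])
conn.executemany("INSERT INTO Value VALUES (?, ?)", [
    ("v1", "Harry Potter"), ("v2", "Quidditch"),
    ("v3", "James Potter"), ("v4", "Chess"),
])

# One Edge alias per edge of the query pattern, one Value alias per leaf.
QUERY = """
SELECT nv.value
FROM Edge p, Edge n, Edge h, Value nv, Value hv
WHERE p.label = 'person' AND
      p.target = n.source AND n.label = 'name' AND n.target = nv.id AND
      p.target = h.source AND h.label = 'hobby' AND h.target = hv.id AND
      hv.value = 'Quidditch'
"""
names = [row[0] for row in conn.execute(QUERY)]
```

Each edge in the query pattern costs one self-join on the edge table, which is where the many joins of this approach come from.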
23 Publish & Subscribe Algorithm
- Decompose subscription rules and store them in RDBMS of Public Lookup-Servers
- SQL Join-Queries in order to match sub-rules with meta-data objects (Recall: meta-data is decomposed, too)
- SQL Join-Queries in order to re-construct matching subscription rules from sub-rules
24 Decomposition of Subscription Rules
- Data Providers for Stock Market Information that cost less than 500 Dollars
search DataProvider d
where d.theme.name = "Stock Market" and d.cost < 500
- Decomposition into three atomic rules
R1: search Theme t where t.name = "Stock Market"
R2: search DataProvider d where d.cost < 500
R3: search R1 a, R2 b where b.theme = a
- Store these rules in RDBMS
Rule  Class      Operator  Attribute  Value
R1    Theme      =         name       Stock Mkt
R2    DataProv.  <         cost       500
25 Matching
Rule  Class      Operator  Attribute  Value
R1    Theme      =         name       Stock Mkt
R2    DataProv.  <         cost       500

Object  Type       Attribute    Value
O1      Theme      name         Stock Mkt
O1      Theme      description  SE InfoSys
O2      DataProv.  theme        O1
O2      DataProv.  attr         O3 (kurs)
O2      DataProv.  attr         O4 (wkn)
O2      DataProv.  cost         70
Result of Join: (R1, O1), (R2, O2)
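The matching step can be written as a single join between the rule table and the decomposed meta-data. This is a sketch in SQLite, not the query used in the Public Lookup-Server; the table names `atomic_rule` and `meta_object`, the column names, and the handling of only the operators `=` and `<` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE atomic_rule (rule_id TEXT, class TEXT, operator TEXT,
                          attribute TEXT, value TEXT);
CREATE TABLE meta_object (object_id TEXT, type TEXT,
                          attribute TEXT, value TEXT);
""")
conn.executemany("INSERT INTO atomic_rule VALUES (?, ?, ?, ?, ?)", [
    ("R1", "Theme", "=", "name", "Stock Mkt"),
    ("R2", "DataProv.", "<", "cost", "500"),
])
conn.executemany("INSERT INTO meta_object VALUES (?, ?, ?, ?)", [
    ("O1", "Theme", "name", "Stock Mkt"),
    ("O1", "Theme", "description", "SE InfoSys"),
    ("O2", "DataProv.", "theme", "O1"),
    ("O2", "DataProv.", "attr", "O3"),
    ("O2", "DataProv.", "attr", "O4"),
    ("O2", "DataProv.", "cost", "70"),
])

# Join rules and objects on class and attribute, then apply the rule's operator.
MATCH = """
SELECT DISTINCT r.rule_id, o.object_id
FROM atomic_rule r
JOIN meta_object o ON r.class = o.type AND r.attribute = o.attribute
WHERE (r.operator = '=' AND o.value = r.value)
   OR (r.operator = '<' AND CAST(o.value AS REAL) < CAST(r.value AS REAL))
ORDER BY r.rule_id
"""
pairs = conn.execute(MATCH).fetchall()
```

With the rows from the slide, the join yields the pairs (R1, O1) and (R2, O2), the same result as stated above.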
26 Re-constructing Subscription Rules from matching atomic sub-rules
- Store decomposition graph in RDBMS
- higher-level and atomic rules are vertices
- Top-level rules are so-called triggering rules: if they are affected, notify the local Lookup-Server (LLS)
- Walk bottom up through decomposition graph
- SQL-Join Query: for each pair of matching rules, find out whether they have a common parent
- N.B.: the decomposition graph is a binary directed, acyclic graph
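The bottom-up walk can be illustrated in memory. The slides perform this step with SQL join queries; the sketch below swaps in plain Python dictionaries to show the logic only. The graph encoding, the subscriber name `local-LS-1`, and the function `reconstruct` are hypothetical.

```python
# Decomposed meta-data objects from the Matching slide (only the parts used here).
objects = {
    "O1": {"type": "Theme", "name": "Stock Mkt"},
    "O2": {"type": "DataProv.", "theme": "O1", "cost": 70},
}

# Binary decomposition graph: (left child, right child) -> (parent, join predicate).
# R3: search R1 a, R2 b where b.theme = a
GRAPH = {
    ("R1", "R2"): ("R3", lambda a, b: objects[b].get("theme") == a),
}

# Triggering (top-level) rules and the local Lookup-Server that subscribed to them.
TRIGGERING = {"R3": "local-LS-1"}


def reconstruct(atomic_matches):
    """Propagate matches of atomic rules upwards until nothing changes."""
    matched = {rule: set(objs) for rule, objs in atomic_matches.items()}
    changed = True
    while changed:
        changed = False
        for (left, right), (parent, pred) in GRAPH.items():
            found = {b
                     for a in matched.get(left, ())
                     for b in matched.get(right, ())
                     if pred(a, b)}
            if not found <= matched.get(parent, set()):
                matched.setdefault(parent, set()).update(found)
                changed = True
    # Only triggering rules lead to a notification.
    return {TRIGGERING[r]: sorted(objs)
            for r, objs in matched.items() if r in TRIGGERING}


notifications = reconstruct({"R1": ["O1"], "R2": ["O2"]})
```

Here R1 matched O1 and R2 matched O2; since O2's theme is O1, the common parent R3 fires and its subscriber is notified about O2.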
27 Preliminary Experiments
- Synthetic benchmark database with 100,000 (different) subscription rules
- Oracle 8i used in the Public Lookup Server
New providers  Proc. time (PLS)
1              250 msecs
100 (batch)    5000 msecs
Batch updates are crucial: 50 msecs per provider in a batch of 100, versus 250 msecs for a single registration
28 Summary
- Basic Principle: decompose rules and data
- Advantages
- Generic, independent of schema
- Very easy to implement, no administration needed
- Exploit query capabilities of RDBMS
- Need not worry about document boundaries
- Finding common sub-rules is trivial
- Disadvantage
- Sub-optimal query performance (many Joins), but probably sufficient if updates are batched
29 Related Work
- Lookup Services: Jini, UDDI, Plug & Play
- Publish & Subscribe
- IR world
- SIFT (Stanford)
- XFilter (Berkeley)
- LeSelect (INRIA)
- Continuous Queries (Niagara, ...)
- Storing and Indexing XML Data ...
30 Outline
- Background
- The ObjectGlobe Lookup Service
- (Security Aspects)
- QoS Management
- Summary
31 Security Requirements in ObjectGlobe
- Protection of Data and Cycle Providers
- Secure Communication
- use SSL connections (authenticated and encrypted)
- Authentication of Clients
- passwords / certificates
- digitally signed requests (query subplans)
- Authorization control
- data/cycle providers are autonomous
- but register user privileges in lookup service
32 Security of Data/Cycle Providers
[Figure: secure sandbox on a data/cycle provider. The ObjectGlobe runtime system uses an internal class loader; Query 1, Query 2, and Query 3, arriving from the Internet, each have their own class loader]
33 Privileged Built-in Operators for Disk or Network Access
[Figure: an external operator inside the sandbox and a privileged internal operator that accesses a tmpfile]
34 Other Potential Risks
- Malicious cycle providers
- security classification of cycle providers
- authorization for data- and function code-flow
- providers analyze the safety of the entire QEP
- Denial of Service Attack
- monitor resource consumption of external operators
- quality assertion of external operators
- authenticate external operators
35 QoS Management
- State of the Art: best-effort
- Goal: users should be able to constrain
- Cost of execution
- Running time
- Quality of the results
- Initial approach (to get a feeling)
- extended query optimization
- Admission control
- Monitoring and plan adaptions at execution time
- Real solution ???
36 Quality Parameters
- Cost of execution
- Running time
- First tuple, last tuple, Nth tuple
- Quality of the results
- Number of results
- Coverage: number (or %) of data sources queried
- Staleness of data
- Cost as a function of coverage (-> Mariposa)
- Cost as a function of wheels (Mercedes)
37 Quality of Service Parameters
[Figure: the desired space for query plans, bounded by a maximum response time, a maximum cost, and a minimum completeness]
38 Extended Query Optimization
- Bottom-up dynamic programming query optimizer, standard costing etc., and the following extensions
- Generate alternatives for each operator
- Consider classes of equivalent providers
- Extended Pruning, Heuristics for choosing a Winner
- Enumerate incomplete UNIONs
- Initialize QoS-Accounts
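The slide does not define extended pruning. One plausible reading, sketched here as an assumption, is that a plan can no longer be pruned on cost alone: plans that violate a QoS bound are illegal, and among the legal ones a plan is dropped only if another plan is at least as good in every quality dimension. The plan names, the numbers, and the functions `legal`, `dominates`, and `prune` are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    cost: float          # lower is better
    time: float          # response time, lower is better
    completeness: float  # fraction of data sources covered, higher is better


def legal(p, max_cost, max_time, min_completeness):
    """A plan is legal if it respects all user-given QoS bounds."""
    return (p.cost <= max_cost and p.time <= max_time
            and p.completeness >= min_completeness)


def dominates(a, b):
    """a dominates b if it is no worse everywhere and better somewhere."""
    no_worse = (a.cost <= b.cost and a.time <= b.time
                and a.completeness >= b.completeness)
    better = (a.cost < b.cost or a.time < b.time
              or a.completeness > b.completeness)
    return no_worse and better


def prune(plans, max_cost, max_time, min_completeness):
    """Drop illegal plans, then drop plans dominated by another legal plan."""
    ok = [p for p in plans if legal(p, max_cost, max_time, min_completeness)]
    return [p for p in ok if not any(dominates(q, p) for q in ok)]


plans = [
    Plan("P", cost=2.0, time=10.0, completeness=0.5),
    Plan("Q", cost=4.0, time=8.0, completeness=0.8),
    Plan("R", cost=9.0, time=5.0, completeness=1.0),   # too expensive
    Plan("S", cost=4.5, time=9.0, completeness=0.7),   # dominated by Q
]
survivors = [p.name for p in
             prune(plans, max_cost=5.0, max_time=12.0, min_completeness=0.4)]
```

P and Q both survive because neither dominates the other (P is cheaper, Q is faster and more complete), so a heuristic is still needed to choose the winner.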
39 Query Optimization: Quality of Service Considerations
[Figure: plans P, Q, and R plotted by cost against completeness, with a region of illegal QEPs]
40 QoS-Annotated Query Plan
[Figure: query plan annotated with hosts and QoS accounts: display at the client, thumbnail and scan operators at hostA.com and hostB.com]
41 Optimization: Open Questions
- Revisit heuristics to choose winning plan
- Dynamic heuristics depending on workload and/or feedback
- Reverse engineering a plan
- How much data should a plan read if the cost should be 5.00?
- Does query optimization matter?
42 Admission Control & Monitoring
- Admission Control
- Check assumptions of optimizer
- Carried out at plan instantiation time for each plan fragment (set of operators at one site)
- Monitoring
- Predict quality of results at the end of execution
- Carried out by special Monitoring operators
- Take actions if violations are detected
- ECA rules specify actions
43 Monitoring Operators
- at the end of pipelines
- are non-blocking / low cost
- above receive ops
- keep statistics for predictions
- differentiate between open and next phase
- Communicate with each other for liveness
[Figure: query plan across sites A and B with a Join, send and receive operators, and monitor operators]
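A monitoring operator can be sketched as a pass-through iterator that counts tuples and extrapolates the running time. This is an assumption about how such an operator might look, not ObjectGlobe's implementation; the class `Monitor`, the linear extrapolation, and the injected clock are illustrative, and the sketch does not separate the open phase from the next phase.

```python
import itertools
import time
from typing import Callable, Iterable, Iterator, Optional


class Monitor:
    """Non-blocking pass-through that keeps statistics for predictions."""

    def __init__(self, child: Iterable, expected_rows: int,
                 clock: Callable[[], float] = time.monotonic):
        self.child = child
        self.expected_rows = expected_rows  # the optimizer's cardinality estimate
        self.clock = clock
        self.rows = 0
        self.started: Optional[float] = None
        self.last: Optional[float] = None

    def __iter__(self) -> Iterator:
        self.started = self.clock()
        for row in self.child:
            self.rows += 1
            self.last = self.clock()
            yield row  # the tuple is forwarded unchanged

    def predicted_total_time(self) -> Optional[float]:
        """Extrapolate linearly from the tuples seen so far."""
        if self.rows == 0:
            return None
        return (self.last - self.started) * self.expected_rows / self.rows


# A fake clock makes the example deterministic: one time unit per tuple.
ticks = iter([0.0, 1.0, 2.0, 3.0])
monitor = Monitor(range(10), expected_rows=10, clock=lambda: next(ticks))
first_three = list(itertools.islice(monitor, 3))
prediction = monitor.predicted_total_time()
```

After three of ten expected tuples in three time units, the monitor predicts a total of ten time units, which could then be compared against the user's response time bound.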
44 Plan Adaptions
- General: Abort, Restart / Reoptimize
- Response Time Violation
- compressConnection
- movePlan (w/wo state)
- increasePriority
- removeTempResults, ...
- Coverage / Result Quality Violation
- addSubPlan
- Cost Violation
- movePlan, decreasePriority, ...
45 ECA Rules for Adaptions
- if cost is high and coverage is low then abort
- if cost is high and coverage is high then delResults
- if rt is high and cost is low and network is critical then compress
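The three rules above can be encoded as condition/action pairs. This is a minimal sketch: the dictionary of already classified statistics and the function `actions` are assumptions, and how raw measurements are mapped to "high", "low", or "critical" is left open.

```python
from typing import Callable, Dict, List, Tuple

Stats = Dict[str, str]  # e.g. {"cost": "high", "coverage": "low", ...}

# Condition/action pairs taken from the three rules on the slide.
RULES: List[Tuple[Callable[[Stats], bool], str]] = [
    (lambda s: s["cost"] == "high" and s["coverage"] == "low", "abort"),
    (lambda s: s["cost"] == "high" and s["coverage"] == "high", "delResults"),
    (lambda s: s["rt"] == "high" and s["cost"] == "low"
               and s["network"] == "critical", "compress"),
]


def actions(stats: Stats) -> List[str]:
    """Return the actions of all rules whose condition holds."""
    return [action for condition, action in RULES if condition(stats)]


slow_but_cheap = {"cost": "low", "coverage": "low",
                  "rt": "high", "network": "critical"}
triggered = actions(slow_but_cheap)
```

For a plan that is cheap but slow over a critical network link, only the third rule fires and the connection is compressed.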
46 Plan Adaptions: Open Questions
- What is the right mix of actions?
- What are the right thresholds for the rules?
- How to avoid the pork cycle (Schweinezyklus)?
- How to draw the right conclusions from the statistics produced by Monitoring?
- What is the right granularity of actions? Plan vs. Operator vs. Tuple
47 Project Status
- First demo presented at SIGMOD 99
- Travel information
- Four Web data sources (hotels, sights, train conns)
- One function provider (travel routes, top N)
- Three cycle providers (two in Europe, one in US)
- Online demo: http://db.fmi.uni-passau.de/projects/OG
- Current work: more experiments
- Problem: getting data from Web sources is sloooow