Title: Mapping Data to Queries
Systems Group, ETH Zurich
"..., but the real advantage of XML is precisely that it allows you to go from Point A to destinations unknown."
-- Larry O'Brien, Microsoft
Goals
- Integrate data from various data feeds
- Light-weight: mapping rules
- Easy to use: based on a common language (XQuery)
- Fast: implements research ideas (YFilter)
Targets
- Health care
  - Electronic health records (Health Level 7)
- Finance
  - Exchange of financial data (XBRL)
- Web services
  - News feeds
  - Weather
- Every domain that uses several data sources
Example
- Find the most powerful car
- Apply standard XQuery
Data:
  <db> <car> <name>Ford</name> <hp>130</hp> </car> </db>
  <daten> <auto> <name>VW Golf</name> <ps>150</ps> </auto> </daten>
Mapping rules:
  daten is-a db
  auto is-a car
  ps is-a hp
Query:
  let $max := max(//hp)
  for $car in //car
  where $car/hp = $max
  return $car
Result:
  <auto> <name>VW Golf</name> <ps>150</ps> </auto>
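The example can be reproduced with a small Python sketch. It does not use an XQuery engine; it hand-codes the query with `xml.etree.ElementTree` and treats each is-a rule as a name substitution during lookup. The helper names (`substitutable`, `descendants`, `children`) and the one-target-per-name rule table are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# The two source documents from the example.
SOURCES = [
    "<db><car><name>Ford</name><hp>130</hp></car></db>",
    "<daten><auto><name>VW Golf</name><ps>150</ps></auto></daten>",
]

# Mapping rules from the example (lhs is-a rhs); assumes one target per name.
IS_A = {"daten": "db", "auto": "car", "ps": "hp"}


def substitutable(tag, name):
    """True if an element called `tag` may stand in for `name` (follows rule chains)."""
    seen = set()
    while tag is not None and tag not in seen:
        if tag == name:
            return True
        seen.add(tag)
        tag = IS_A.get(tag)
    return False


def descendants(root, name):
    """Rough equivalent of //name: the root and all descendants matching name."""
    return [e for e in root.iter() if substitutable(e.tag, name)]


def children(elem, name):
    """Rough equivalent of elem/name: direct children matching name."""
    return [c for c in elem if substitutable(c.tag, name)]


docs = [ET.fromstring(s) for s in SOURCES]

# let $max := max(//hp)
max_hp = max(float(hp.text) for d in docs for hp in descendants(d, "hp"))

# for $car in //car where $car/hp = $max return $car
result = [
    car
    for d in docs
    for car in descendants(d, "car")
    if any(float(hp.text) == max_hp for hp in children(car, "hp"))
]

for car in result:
    print(ET.tostring(car, encoding="unicode"))
```

This prints `<auto><name>VW Golf</name><ps>150</ps></auto>`: the matching element keeps its own name, `auto`, rather than being renamed to `car`.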
Usage Scenarios
- Continuous query processing
  [Figure: a DSMS evaluates queries and mapping rules over streaming input events and emits streaming output events]
Usage Scenarios
- Publish/subscribe systems
  [Figure: publishers send data to an enhanced broker, which uses subscriptions and mapping rules to deliver data to subscribers]
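One way to picture the enhanced broker is as a router that evaluates subscriptions under the mapping rules, so a subscriber asking for `car` also receives messages that only contain `auto`. The sketch is hypothetical: the subscriber names, the subscription format (a single element name) and the `route` function are illustrative assumptions, not the system's interface.

```python
import xml.etree.ElementTree as ET

# Hypothetical broker state: mapping rules (lhs is-a rhs) and subscriptions
# (subscriber -> element name it is interested in).
RULES = {"auto": "car", "ps": "hp"}
SUBSCRIPTIONS = {"alice": "car", "bob": "truck"}


def is_a(tag, name):
    """Follow is-a rules starting at tag; True if name is reached."""
    seen = set()
    while tag is not None and tag not in seen:
        if tag == name:
            return True
        seen.add(tag)
        tag = RULES.get(tag)
    return False


def route(message):
    """Return the subscribers whose element name occurs in the message under the rules."""
    root = ET.fromstring(message)
    return sorted(
        subscriber
        for subscriber, name in SUBSCRIPTIONS.items()
        if any(is_a(e.tag, name) for e in root.iter())
    )


print(route("<daten><auto><name>VW Golf</name><ps>150</ps></auto></daten>"))
```

Publishers keep their own vocabulary; only the broker's rule set grows when a new source joins.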
Usage Scenarios
[Figure: data from Source 1, Source 2, ..., Source x flows through a data handler that applies mapping rules and stores homogeneous data in the company's data store]
The Is-A Rule
car is-a vehicle
- Map XML elements
- Expresses a substitutability relationship
  - Like in object-oriented design
  - A car can be used wherever a vehicle is expected
- It follows that //vehicle also returns car elements
  - Returned as car
  - Not transformed into vehicle
  - Consistent with the OO approach
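The substitutability semantics can be sketched as a graph search over rules: an element matches a requested name if following is-a edges from its own name reaches that name, and it is returned unchanged. Treating is-a as transitive, as subtyping is in OO design, is an assumption here, and the `truck is-a vehicle` rule and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import deque

# Rules as a graph: name -> set of names it is-a. "car is-a vehicle" is from the
# slide; "truck is-a vehicle" is a hypothetical second rule.
RULES = {"car": {"vehicle"}, "truck": {"vehicle"}}


def is_a(tag, name):
    """Breadth-first search over is-a edges (assumes is-a is transitive)."""
    seen, queue = {tag}, deque([tag])
    while queue:
        current = queue.popleft()
        if current == name:
            return True
        for parent in RULES.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return False


def descendant_query(root, name):
    """Sketch of //name under is-a: returns the matching elements themselves, not renamed."""
    return [e for e in root.iter() if is_a(e.tag, name)]


doc = ET.fromstring(
    "<fleet><car><name>Ford</name></car><truck><name>MAN</name></truck>"
    "<bike><name>BMX</name></bike></fleet>"
)
print([e.tag for e in descendant_query(doc, "vehicle")])
```

This prints `['car', 'truck']`: both elements answer a query for `vehicle` but keep their original names.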
The Is-A Rule
german/car is-a auto
auto is-a german/car
- Map path expressions
  - XPath path expressions
  - Left-hand side may include predicates
car[@ps < 100] is-a slow/vehicle
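Path rules can be sketched by matching a rule's left-hand side from the element upward through its ancestors and then checking the predicate. In this sketch the predicate is a Python function standing in for the XPath predicate, and the rule encoding, the `id` attributes and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding of the slide's rules as (lhs steps, predicate, rhs):
#   german/car      is-a auto
#   car[@ps < 100]  is-a slow/vehicle
RULES = [
    (["german", "car"], None, "auto"),
    (["car"], lambda e: float(e.get("ps", "inf")) < 100, "slow/vehicle"),
]


def lhs_matches(elem, parents, steps, pred):
    """True if elem is reached by the child-axis path `steps` and satisfies pred."""
    node = elem
    for step in reversed(steps):  # check the last step first, then walk up
        if node is None or node.tag != step:
            return False
        node = parents.get(node)
    return pred is None or pred(elem)


def applicable_rules(root):
    """List (id, lhs path, rhs) for every element matched by a rule's left-hand side."""
    parents = {child: parent for parent in root.iter() for child in parent}
    hits = []
    for elem in root.iter():
        for steps, pred, rhs in RULES:
            if lhs_matches(elem, parents, steps, pred):
                hits.append((elem.get("id"), "/".join(steps), rhs))
    return hits


doc = ET.fromstring(
    '<cars><german><car id="golf" ps="150"/></german><car id="fiesta" ps="60"/></cars>'
)
print(applicable_rules(doc))
```

Only the car nested under `german` matches the first rule, and only the car with fewer than 100 PS matches the second.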
The Is-A Rule
car in cars[@country="Germany"] is-a auto
- Specify contexts
  - Element names may be used differently in different contexts
  - Scope the applicability of rules
  - Further refinement
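A context clause can be sketched as an ancestor test: the rule fires for a `car` only if some enclosing element matches the context. Reading `in` as "has an ancestor matching" is an assumption, as are the helper `in_context` and the sample inventory.

```python
import xml.etree.ElementTree as ET


def in_context(elem, parents, ctx_tag, ctx_pred):
    """True if some ancestor of elem is a ctx_tag element satisfying ctx_pred."""
    node = parents.get(elem)
    while node is not None:
        if node.tag == ctx_tag and ctx_pred(node):
            return True
        node = parents.get(node)
    return False


def is_german(cars):
    """Context predicate standing in for cars[@country="Germany"]."""
    return cars.get("country") == "Germany"


doc = ET.fromstring(
    '<inventory>'
    '<cars country="Germany"><car id="golf"/></cars>'
    '<cars country="USA"><car id="mustang"/></cars>'
    '</inventory>'
)
parents = {c: p for p in doc.iter() for c in p}

# Hypothetical encoding of: car in cars[@country="Germany"] is-a auto
autos = [c.get("id") for c in doc.iter("car") if in_context(c, parents, "cars", is_german)]
print(autos)
```

Only the Golf is treated as an `auto`; the American `car` element is left outside the rule's scope.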
The Is-A Rule
auto as $a is-a <car> <kw>{$a/ps * 0.74}</kw> </car>
- Element construction
  - Map elements
  - Transform data, e.g., for the integration of very diverse data
<car> <name>Ford</name> <kw>100</kw> </car>
<auto> <name>VW Golf</name> <ps>150</ps> </auto>
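The constructor rule can be sketched as a function that builds a new `car` element from a bound `auto` element, converting PS to kW with the factor 0.74 (1 PS is about 0.735 kW). The function name `construct_car` and the one-decimal formatting are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def construct_car(a):
    """Sketch of: auto as $a is-a <car><kw>{$a/ps * 0.74}</kw></car>."""
    car = ET.Element("car")
    kw = ET.SubElement(car, "kw")
    kw.text = f"{float(a.findtext('ps')) * 0.74:.1f}"  # PS -> kW, one decimal place
    return car


auto = ET.fromstring("<auto><name>VW Golf</name><ps>150</ps></auto>")
print(ET.tostring(construct_car(auto), encoding="unicode"))
```

This prints `<car><kw>111.0</kw></car>`, which can now be compared with the Ford's `<kw>100</kw>`. Unlike a plain is-a mapping, the result is a new element rather than the original one.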
Implementation
- Several possibilities
- MDQ approach
  - Native approach, novel MDQ data model
  - Allows lazy execution
- Query rewrite
  - E.g., //(car | auto | vehicle | ...)
  - Does not scale
- Data translation
  - Translate input data
  - Big overhead
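The query-rewrite alternative can be sketched as expanding a name step into a union of every name that is-a that name, transitively. The function `rewrite` and the hypothetical `truck is-a vehicle` rule are assumptions; `car is-a vehicle` and `auto is-a car` come from the slides.

```python
from collections import defaultdict


def rewrite(name, rules):
    """Expand //name into a union of every element name that is-a name (transitively)."""
    subs = defaultdict(set)  # reverse edges: target -> names declared is-a target
    for lhs, rhs in rules:
        subs[rhs].add(lhs)
    names, stack = {name}, [name]
    while stack:
        for lhs in subs[stack.pop()]:
            if lhs not in names:
                names.add(lhs)
                stack.append(lhs)
    return "//(" + " | ".join(sorted(names)) + ")"


rules = [("car", "vehicle"), ("auto", "car"), ("truck", "vehicle")]
print(rewrite("vehicle", rules))
```

This prints `//(auto | car | truck | vehicle)`. Every name step in a query is expanded this way, so the unions grow with each added rule, which is one way to read "does not scale".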
MDQ Data Model
<daten> <auto> <name>Golf</name> <ps>150</ps> </auto> </daten>
[Figure: the document as a tree with nodes daten, auto, name and ps, and text values Golf and 150]
MDQ Data Model
- MDQ data model
  - Move names from nodes to edges
<daten> <auto> <name>Golf</name> <ps>150</ps> </auto> </daten>
[Figure: the same tree with the element names daten, auto, name and ps as edge labels; leaves Golf and 150]
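Moving names onto edges can be sketched with a small tree class where each edge carries a set of names. Using a set (rather than one name) is an assumption that anticipates rules adding alternative names later; the classes and helpers (`Node`, `build`, `to_model`, `paths`) are hypothetical.

```python
import xml.etree.ElementTree as ET


class Node:
    """Node in a hypothetical edge-labeled tree: element names live on incoming edges."""

    def __init__(self, text=None):
        self.text = text
        self.edges = []  # list of (set of names, child Node)


def build(elem):
    node = Node((elem.text or "").strip() or None)
    for child in elem:
        node.edges.append(({child.tag}, build(child)))
    return node


def to_model(root_elem):
    """Wrap the document in an unnamed root so the top element also hangs off an edge."""
    root = Node()
    root.edges.append(({root_elem.tag}, build(root_elem)))
    return root


def paths(node, prefix=""):
    """Yield (label path, text) for every edge, joining alternative names with '|'."""
    for labels, child in node.edges:
        path = prefix + "/" + "|".join(sorted(labels))
        yield path, child.text
        yield from paths(child, path)


model = to_model(ET.fromstring("<daten><auto><name>Golf</name><ps>150</ps></auto></daten>"))
for path, text in paths(model):
    print(path, text)
```

Nodes now hold only values (`Golf`, `150`); every name lives on the edge leading to a node.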
MDQ Data Model
- Application of mapping rules
<daten> <auto> <name>Golf</name> <ps>150</ps> </auto> </daten>
Rules: daten is-a db, auto is-a car, ps is-a hp
[Figure: the edge-labeled tree after applying the rules; edges carry daten and db, auto and car, ps and hp, plus name; leaves Golf and 150]
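Applying rules in this model can be sketched as adding each rule's right-hand name to every edge that already carries its left-hand name, so the original name stays and the element is still returned as itself. For clarity this sketch applies the rules eagerly to the whole tree, whereas the Lazy Evaluation, YFilter slide describes applying them only when needed; all names here are hypothetical.

```python
import xml.etree.ElementTree as ET


class Node:
    """Node in a hypothetical edge-labeled tree: element names live on incoming edges."""

    def __init__(self, text=None):
        self.text = text
        self.edges = []  # list of (set of names, child Node)


def build(elem):
    node = Node((elem.text or "").strip() or None)
    for child in elem:
        node.edges.append(({child.tag}, build(child)))
    return node


def apply_rules(node, rules):
    """Add each rule's right-hand name to every edge that carries its left-hand name."""
    for labels, child in node.edges:
        changed = True
        while changed:  # repeat so chained rules (a is-a b, b is-a c) also fire
            changed = False
            for lhs, rhs in rules:
                if lhs in labels and rhs not in labels:
                    labels.add(rhs)
                    changed = True
        apply_rules(child, rules)


def descendants(node, name):
    """Rough equivalent of //name: (labels, node) pairs whose incoming edge carries name."""
    found = []
    for labels, child in node.edges:
        if name in labels:
            found.append((labels, child))
        found.extend(descendants(child, name))
    return found


doc = ET.fromstring("<daten><auto><name>Golf</name><ps>150</ps></auto></daten>")
model = Node()
model.edges.append(({doc.tag}, build(doc)))  # unnamed root above the top element
apply_rules(model, [("daten", "db"), ("auto", "car"), ("ps", "hp")])

for labels, node in descendants(model, "hp"):
    print(sorted(labels), node.text)
```

This prints `['hp', 'ps'] 150`: a query for `hp` finds the `ps` value without copying or translating the data.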
Lazy Evaluation, YFilter
R1: daten is-a db
R2: auto is-a car
R3: ps is-a hp
- Built from the left-hand sides of the rules
- Nondeterministic finite automaton (NFA)
- Main idea
  - Evaluate the XQuery program
  - Iterate through the data model
  - Report to YFilter
  - Apply rules only when an accepting state is reached
[Figure: NFA with transitions on daten (accepting R1), auto (accepting R2) and ps (accepting R3), plus a transition marked ?]
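The YFilter idea can be sketched as one shared NFA built from the rules' left-hand sides: while the document is traversed, a set of active states is kept per element, and a rule is reported only when its accepting state is reached. This is a simplified sketch: only child-axis paths anchored anywhere in the document are handled (YFilter itself also supports `//` steps and wildcards inside paths), right-hand sides are reported rather than applied, and the class and function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

WILDCARD = "*"


class PathNFA:
    """Shared NFA over rule left-hand sides, in the spirit of YFilter (simplified)."""

    def __init__(self):
        self.trans = [{WILDCARD: 0}]    # state 0 loops on any name, modelling a leading //
        self.accept = defaultdict(list)  # accepting state -> ids of rules it completes

    def add(self, path, rule_id):
        """Insert a child-axis path such as "auto" or "german/car"; prefixes share states."""
        state = 0
        for step in path.split("/"):
            nxt = self.trans[state].get(step)
            if nxt is None:
                self.trans.append({})
                nxt = len(self.trans) - 1
                self.trans[state][step] = nxt
            state = nxt
        self.accept[state].append(rule_id)

    def step(self, states, name):
        """Move every active state on an element name (plus wildcard transitions)."""
        out = set()
        for s in states:
            table = self.trans[s]
            if name in table:
                out.add(table[name])
            if WILDCARD in table:
                out.add(table[WILDCARD])
        return out


def report(nfa, root):
    """Depth-first walk keeping active states per element; collect (name, rule id)
    pairs whenever an accepting state is reached."""
    hits = []

    def visit(elem, states):
        active = nfa.step(states, elem.tag)
        for s in sorted(active):
            for rule_id in nfa.accept.get(s, []):
                hits.append((elem.tag, rule_id))
        for child in elem:
            visit(child, active)

    visit(root, {0})
    return hits


nfa = PathNFA()
for rule_id, lhs in [("R1", "daten"), ("R2", "auto"), ("R3", "ps")]:
    nfa.add(lhs, rule_id)

doc = ET.fromstring("<daten><auto><name>Golf</name><ps>150</ps></auto></daten>")
print(report(nfa, doc))
```

This prints `[('daten', 'R1'), ('auto', 'R2'), ('ps', 'R3')]`; the `name` element reaches no accepting state, so no rule is touched for it.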
Experiment Throughput
- Complex query (multiple scans, joins)
  - Query rewrite (QR): too many unions; data translation (DT): overhead of translation
Experiment Throughput
- Simple query
  - Fewer unions for QR; DT still has the overhead of translation
Experiment Throughput
- 1 input message, bundle of queries evaluated at once
  - QR: even more unions; DT: less overhead, since it transforms the input message only once
Advantages (Again)
- Performance
  - Novel data model, lazy execution
- Light-weight
  - Mapping rules are small units
- Extensibility
  - Add more rules as new sources are adopted
- Flexibility
  - Complex mappings through element constructors
The End
- Visit our website, LIVE DEMO!
  - http://fifthelement.inf.ethz.ch:8080/rules
- Write us, please!
  - hemartin@inf.ethz.ch