Title: Piazza: Data Management Infrastructure for Semantic Web Applications
1 Piazza: Data Management Infrastructure for Semantic Web Applications
- Alon Y. Halevy, Zachary G. Ives, Peter Mork, Igor Tatarinov
Speaker: Sergey Chernov. Tutor: Jens Graupmann
2 Outline
- INTRODUCTION. SEMANTIC WEB.
- PIAZZA SYSTEM OVERVIEW
- IMPLEMENTATION DETAILS
- 3.1 MAPPING LANGUAGE
- 3.2 QUERY ANSWERING ALGORITHM
- CONCLUSIONS.
3 Introduction
- Goal
- Data Integration and Knowledge Management
- Problem
- Web data lacks machine-understandable semantics
- Solution
- Semantic Web?
4 The Semantic Web
- Web sites include structural annotations
- You can pose meaningful queries on them.
- Ontologies provide the semantic glue.
- Internal implementation of web sites left open.
- Agents perform tasks
- Query one or more web sites
- Perform updates (e.g., set schedules)
- Coordinate actions
- Trust each other (or not).
- I.e., agents operating on a gigantic heterogeneous distributed database.
(View by A. Halevy)
5 General requirements
- Robust infrastructure for querying
- Peer data management systems.
- Facilitate mapping between different structures.
- Need tools for locating relevant structures
- Easily joining the semantic web.
- Get data into structured form
- Should we worry about the legacy web?
6 Using views for specifying mappings
- Local-As-View (LAV).
- Data sources can be described as views over the mediated schema.
- Global-As-View (GAV).
- Mediated schema can be described as a set of views over the data sources.
[Figure: two diagrams, each labeled Mediated Schema, Site A, Site B, Site C]
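The difference between the two styles can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the mediated relation `Book(title, year)`, the two sources, and the year-interval descriptions are all hypothetical and do not come from the slides. Under GAV the mediated relation is computed by a view over the sources; under LAV each source is described by a view over the mediated relation, and the description is used to decide which sources can contribute to a query.

```python
# Toy data: every name and value here is hypothetical, for illustration only.
site_a = [("Dune", 1965), ("Emma", 1815)]   # source A: (title, year)
site_b = [("Ulysses", 1922)]                # source B: (title, year)
sources = {"site_a": site_a, "site_b": site_b}

# GAV: the mediated relation Book is defined as a view over the sources.
def book_gav():
    return sorted(set(site_a) | set(site_b))

# LAV: each source is described as a view over the mediated relation Book.
# Here a description is a year interval: "this source holds Books with
# year in [lo, hi]".
lav_views = {"site_a": (1800, 1999), "site_b": (1900, 1999)}

def answer_lav(lo, hi):
    """Answer 'Books with lo <= year <= hi' using only sources whose
    description overlaps the requested range."""
    result = set()
    for name, (vlo, vhi) in lav_views.items():
        if vlo <= hi and lo <= vhi:  # description overlaps the query range
            result |= {(t, y) for (t, y) in sources[name] if lo <= y <= hi}
    return sorted(result)

print(book_gav())
print(answer_lav(1900, 1950))
```

In the GAV function, adding a source means editing the view definition; in the LAV part, adding a source only means adding one more description.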
7 Mapping
- Mapping AB specifies how structured data in the schema of node A is represented in the schema of node B
[Figure: Mediated Schema with Site A, Site B, and Site C; mappings labeled A-MS, MS-A, C-MS, MS-C, AB, BA, BC, CB]
8 Piazza Peer Data-Management System
- Goal
- Large scale autonomous sharing of structured data
- Peer data management system (PDMS)
- Autonomous Peers export data in their own schemas
- Pair-wise mappings between peers
- Generalization of a Data Integration system
- NOT a P2P file sharing system
9 Relationship of PDMS to
- P2P overlay networks (the Structured World)
- Data integration systems (no central logical mediated schema)
- Federated databases (scale, ad-hoc nature)
- Distributed databases (no central administration)
10 Representing Data
- A spectrum of possibilities
- Relational: tables, some integrity constraints
- XML: can encode relational, hierarchical
- XQuery: emerging standard query language (SQL for XML)
- RDF: XML on drugs.
- Sees only the logic, ignores other aspects.
- DAML+OIL
- Full-blown knowledge representation language.
- They all have semantics, just different expressive powers.
- We keep the data simple. Mappings between data at different peers are more complex.
11 Peer Data Management
[Figure: peers labeled DB Projects, MIT, UW, UCB, Stanford]
- Mappings are query expressions
- DbResearcher(x) ? Researcher(x), Area(x, DB)
- DbResearcher(x), Office(x, DBLab) = DbLabMember(x)
12 Piazza mapping language (1)
<pubs>
  <book>
    {: $a IN document("source.xml")/authors/author,
       $t IN $a/publication/title,
       $typ IN $a/publication/pub-type
       WHERE $typ = "book" :}
    <title> {$t} </title>
    <author>
      <name> {: $a/full-name :} </name>
    </author>
  </book>
</pubs>
Target: pubs > book > (title, author > name, publisher > name)
Source: authors > author > (full-name, publication > (title, pub-type))
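One way to read the mapping on this slide is as nested loops over the source document that emit target elements. The Python sketch below follows that reading using the standard `xml.etree.ElementTree` module. It is not the Piazza implementation: the sample document and the function name `apply_mapping` are hypothetical, and the sketch pairs each title with the pub-type of the same publication, which is a simplifying assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample instance of the source schema (not from the slides).
SOURCE_XML = """
<authors>
  <author>
    <full-name>Ann Lee</full-name>
    <publication><title>XML Basics</title><pub-type>book</pub-type></publication>
    <publication><title>A Short Note</title><pub-type>article</pub-type></publication>
  </author>
  <author>
    <full-name>Bo Chen</full-name>
    <publication><title>XML Basics</title><pub-type>book</pub-type></publication>
  </author>
</authors>
"""

def apply_mapping(source_root):
    """Emit one <book> per (author, publication) pair whose pub-type is 'book'."""
    pubs = ET.Element("pubs")
    for a in source_root.findall("author"):        # a IN /authors/author
        for pub in a.findall("publication"):
            if pub.findtext("pub-type") != "book":  # WHERE typ = "book"
                continue
            book = ET.SubElement(pubs, "book")
            ET.SubElement(book, "title").text = pub.findtext("title")
            author = ET.SubElement(book, "author")
            ET.SubElement(author, "name").text = a.findtext("full-name")
    return pubs

target = apply_mapping(ET.fromstring(SOURCE_XML))
books = [(b.findtext("title"), b.findtext("author/name"))
         for b in target.findall("book")]
print(books)
```

With this sample data the same title appears in two separate book elements, one per author, because nothing in this version merges bindings that share a title.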
13 Piazza mapping language (2)
<pubs>
  <book piazza:id={$t}>
    {: $a IN document("source.xml")/authors/author,
       $t IN $a/publication/title,
       $typ IN $a/publication/pub-type
       WHERE $typ = "book" :}
    <title piazza:id={$t}> {$t} </title>
    <author piazza:id={$t}>
      <name> {: $a/full-name :} </name>
    </author>
  </book>
</pubs>
Target: pubs > book > (title, author > name, publisher > name)
Source: authors > author > (full-name, publication > (title, pub-type))
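The slide does not spell out what the piazza id attribute does. Assuming it acts as a grouping key, so that all bindings with the same id value contribute to a single output element, the effect can be sketched as follows. The binding tuples and the function name `group_by_id` are hypothetical.

```python
# Hypothetical binding tuples (title, author full-name) for publications
# whose pub-type is "book".
bindings = [
    ("XML Basics", "Ann Lee"),
    ("XML Basics", "Bo Chen"),
    ("Peer Data", "Ann Lee"),
]

def group_by_id(bindings):
    """Merge bindings that share the same id (here: the title) into one book.

    Assumption: the id behaves like a grouping key, so a title shared by
    several authors yields one book with several author entries.
    """
    books = {}
    for title, author in bindings:
        book = books.setdefault(title, {"title": title, "authors": []})
        if author not in book["authors"]:
            book["authors"].append(author)
    return list(books.values())

print(group_by_id(bindings))
```

Compared with emitting one book per binding, the grouped output has one entry per distinct title.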
14 Piazza mapping language (3)
<pubs>
  <book piazza:id={$t}>
    {: $a IN document("source.xml")/authors/author,
       $t IN $a/publication/title,
       $typ IN $a/publication/pub-type
       WHERE $typ = "book"
       PROPERTY $t >= "A" AND $t < "B" :}
    <publisher>
      <name> {: PROPERTY $this IN {"PrintersInc", "PubsInc"} :} </name>
    </publisher>
  </book>
</pubs>
Target: pubs > book > (title, author > name, publisher > name)
Source: authors > author > (full-name, publication > (title, pub-type))
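The slides do not explain how PROPERTY clauses are used. Assuming they declare constraints known to hold for the mapped data, one plausible use is to skip a source whose declared properties contradict a query. The sketch below illustrates only that idea. The function name `may_contribute` is hypothetical, and treating the lower title bound as inclusive is an assumption.

```python
# Declared properties of the mapped data. The values mirror the slide; using
# them for pruning, and the inclusive lower bound, are assumptions.
TITLE_RANGE = ("A", "B")                  # "A" <= title < "B"
PUBLISHERS = {"PrintersInc", "PubsInc"}

def may_contribute(title=None, publisher=None):
    """Return False when a query's equality conditions contradict the
    declared properties, so the source can be skipped."""
    if title is not None and not (TITLE_RANGE[0] <= title < TITLE_RANGE[1]):
        return False
    if publisher is not None and publisher not in PUBLISHERS:
        return False
    return True

print(may_contribute(title="Algebra"))    # title falls in the declared range
print(may_contribute(publisher="Acme"))   # publisher is not declared
```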
15 Query Answering Algorithm
- Problem
- Evaluate query Q at P1 given a network of mappings
- Reformulate the query over all relevant peers
- Chaining of mappings using a combination of query composition and query rewriting
- QP1(x) :- DbResearcher(x)
- Query Composition
- M: DbResearcher(x) ? Researcher(x), Area(x, DB)
- => QP2(x) ? Researcher(x), Area(x, DB)
- Query Rewriting
- M: DbResearcher(x), Office(x, DBLab) = DbLabMember(x)
- => QP3(x) ? DbLabMember(x)
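The two reformulation steps on this slide can be sketched over a very small datalog-like representation. This is a simplified illustration, not the Piazza algorithm: it treats both mappings as exact definitions, it does no variable renaming or unification, and the names `compose` and `rewrite` are hypothetical.

```python
# Atoms are (predicate, args) tuples; a conjunctive query body is a list of atoms.
query_p1 = [("DbResearcher", ("x",))]

# Mapping used for composition: the left-hand atom is expressed by a body
# over another peer's schema.
compose_map = {
    ("DbResearcher", ("x",)): [("Researcher", ("x",)), ("Area", ("x", "DB"))],
}

# Mapping used for rewriting: DbLabMember is a view whose body is written
# over P1's schema.
view_head = ("DbLabMember", ("x",))
view_body = [("DbResearcher", ("x",)), ("Office", ("x", "DBLab"))]

def compose(body, mapping):
    """Unfold: replace each atom that has a definition with that definition."""
    out = []
    for atom in body:
        out.extend(mapping.get(atom, [atom]))
    return out

def rewrite(body, head, vbody):
    """Use the view when its body contains every query atom.

    Simplification: atoms must match literally (same variable names).
    """
    return [head] if all(atom in vbody for atom in body) else None

print(compose(query_p1, compose_map))
print(rewrite(query_p1, view_head, view_body))
```

Composition pushes the query through a mapping in the direction it was written; rewriting uses a mapping written in the opposite direction, which is why a view over P1's schema can still yield a query over P3.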
16 Query Reformulation (1)
Mapping:
<S2>
  <people> {: $people = /S1/people :}
    <faculty> {: $name = $people/faculty/name/text() :}
      {$name} </faculty>
    <student> {: $student = $people/student/text() :}
      <name> {$student} </name>
      <advisor> {: $faculty = $people/faculty,
                   $name = $faculty/name/text(),
                   $advisee = $faculty/advisee/text()
                   where $advisee = $student :}
        {$name} </advisor>
    </student>
  </people>
</S2>
Query:
<result> {
  for $faculty in /S1/people/faculty,
      $name in $faculty/name/text(),
      $advisee in $faculty/advisee/text()
  where $name = "Ullman"
  return <student> {$advisee} </student> }
</result>
17 Query Reformulation (2)
Query:
<result> {
  for $faculty in /S1/people/faculty,
      $name in $faculty/name/text(),
      $advisee in $faculty/advisee/text()
  where $name = "Ullman"
  return <student> {$advisee} </student> }
</result>
[Figure: query tree pattern and mapping tree pattern; node labels: S2, S1, people, faculty, name, advisee, advisee = student, advisor, name]
18 Query Reformulation (3)
Query:
<result> {
  for $student in /S2/people/student,
      $advisor in $student/advisor/text(),
      $name in $student/name/text()
  where $advisor = "Ullman"
  return <student> {$name} </student> }
</result>
[Figure: query tree pattern and mapping tree pattern; node labels: S2, S1, people, faculty, name, advisee, advisee = student, advisor, name]
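As a sanity check on the reformulation shown on these slides, the sketch below builds a small S1 document, derives an S2 document by following the mapping, and runs the original query over S1 and the reformulated query over S2. The sample data (apart from the name "Ullman") and all function names are hypothetical. The code uses the standard `xml.etree.ElementTree` module rather than an XQuery engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical S1 instance: faculty have name and advisee children,
# students are plain text elements.
S1 = ET.fromstring(
    "<S1><people>"
    "<faculty><name>Ullman</name><advisee>Ann</advisee><advisee>Bo</advisee></faculty>"
    "<faculty><name>Widom</name><advisee>Cy</advisee></faculty>"
    "<student>Ann</student><student>Bo</student><student>Cy</student>"
    "</people></S1>"
)

def map_s1_to_s2(s1):
    """Build an S2 document from S1, following the mapping on the slides."""
    s2 = ET.Element("S2")
    people_out = ET.SubElement(s2, "people")
    people = s1.find("people")
    for fac in people.findall("faculty"):
        ET.SubElement(people_out, "faculty").text = fac.findtext("name")
    for stu in people.findall("student"):
        s = ET.SubElement(people_out, "student")
        ET.SubElement(s, "name").text = stu.text
        for fac in people.findall("faculty"):
            # where advisee = student
            if stu.text in [a.text for a in fac.findall("advisee")]:
                ET.SubElement(s, "advisor").text = fac.findtext("name")
    return s2

def query_s1(s1, who):
    """Original query: advisees of the faculty member named `who`."""
    return sorted(a.text for f in s1.findall("people/faculty")
                  if f.findtext("name") == who
                  for a in f.findall("advisee"))

def query_s2(s2, who):
    """Reformulated query: names of students whose advisor is `who`."""
    return sorted(s.findtext("name") for s in s2.findall("people/student")
                  if who in [a.text for a in s.findall("advisor")])

S2 = map_s1_to_s2(S1)
print(query_s1(S1, "Ullman"), query_s2(S2, "Ullman"))
```

On this sample both queries return the same students, which is the behavior the reformulation is meant to preserve.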
19 Reformulation times
- Table 1: The test queries and their respective running times.
20 Current and the Future
- Current status
- Demo scenario using XML
- Looking at real domains (Bio dbs, NASA dbs)
- Future Work
- More efficient reformulation algorithm
- Semantic network analysis: eliminate redundant mappings and inconsistent mappings
- Query caching to speed up query evaluation
21 Conclusions
- Mapping language for mapping between sets of XML source nodes with different document structures
- Architecture that uses the transitive closure of mappings to answer queries
- Algorithm for query answering over this transitive closure of mappings, which is able to follow mappings in both forward and reverse directions
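The idea of answering queries over the transitive closure of mappings, followed in both directions, can be illustrated with a plain graph search. The sketch below only computes which peers are reachable; it does not reformulate queries. The peer names, the mapping list, and the function name `reachable_peers` are hypothetical.

```python
from collections import deque

# Hypothetical directed mappings between peers: (source_schema, target_schema).
mappings = [("P1", "P2"), ("P3", "P2"), ("P3", "P4"), ("P5", "P6")]

def reachable_peers(start, mappings):
    """Peers reachable from `start` when each mapping can be followed in
    either direction (breadth-first search over the mapping graph)."""
    neighbours = {}
    for a, b in mappings:
        neighbours.setdefault(a, set()).add(b)  # forward direction
        neighbours.setdefault(b, set()).add(a)  # reverse direction
    seen, queue = {start}, deque([start])
    while queue:
        peer = queue.popleft()
        for nxt in sorted(neighbours.get(peer, ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(seen - {start})

print(reachable_peers("P1", mappings))
```

In this example P3 and P4 are reachable from P1 only because the mapping from P3 to P2 is also followed in reverse.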
22 Thank You!
23 Further literature
- Alon Y. Halevy, Zachary G. Ives, Dan Suciu, Igor Tatarinov: Schema Mediation for Large-Scale Semantic Data Sharing
- Igor Tatarinov, Zachary Ives, Jayant Madhavan, Alon Halevy, Dan Suciu, Nilesh Dalvi, Xin (Luna) Dong, Yana Kadiyska, Gerome Miklau, Peter Mork: The Piazza Peer Data Management Project
- Alon Y. Halevy, Zachary G. Ives, Dan Suciu, Igor Tatarinov: Schema Mediation in Peer Data Management Systems
- Alon Halevy, Oren Etzioni, AnHai Doan, Zachary Ives, Jayant Madhavan, Luke McDowell, Igor Tatarinov: Crossing the Structure Chasm
- Madhan Arumugam, Amit Sheth, and I. Budak Arpinar: Towards Peer-to-Peer Semantic Web: A Distributed Environment for Sharing Semantic Knowledge on the Web
- Hendler J., Berners-Lee T., Miller E.: Integrating Applications on the Semantic Web