Title: P2P and databases
P2P and databases
- Ilya Zaihrayeu
- 27/11/03
- joint (current) work with F. Giunchiglia, G. Kuper, E. Franconi, A. Lopatenko and A. Ivanyukovich
Index
- P2P and databases
- Database coordination model
- JXTA
- Implementation
- Demo
P2P and databases
DB Systems: autonomy and centralization
- Centralized database
  - single DBMS
  - manages a single DB on the same computer
- Distributed database
  - manages multiple databases interconnected by a computer network
  - single DDBMS
  - global schema
  - homogeneous
  - queries and updates to one database
  - databases share all the data
DB Systems: autonomy and centralization, contd
- Federated database
  - global schema
  - single FDBMS
  - local schemas and global schemas coexist
  - queries and updates are possible both locally and globally
  - allows partial sharing of data
  - data location transparency
- Multi-database
  - no global schema
  - DBS structure is fixed
  - database is seen as several heterogeneous units
  - local DBs may have different DBMSs
  - a uniform language to access the DB
  - (generally) heterogeneous
  - (generally) autonomous
  - the locations of the data are visible to the application
Web P2P DBS
- Postulates
  - decentralization
  - heterogeneity
  - total autonomy
  - quality of connections, location and availability of data can't be known a priori
  - locality: nodes do not know all participating DBs
- Benefits
  - scalability
  - reliability
  - graceful degradation
- Drawbacks
  - QoS from the network (response time, results, etc.) may vary greatly
  - metadata distributed over the network is hard to maintain
- Key solutions should address
  - effective query answering algorithms
  - transitive query propagation
  - efficient ways to organize, maintain and manage metadata
  - possible partial centralization (mainly to support metadata)
  - coordination, not integration!
- ... Coordination is managing dependencies between interacting databases at run time
A Motivating Scenario
- A patient may be described in several DBs, which use different patient id formats, disease descriptions, etc.
- When a patient is admitted to the hospital, H becomes acquainted with D
- The acquaintance is dropped when the treatment is over
- When the doctor prescribes a drug, D becomes acquainted with P
- A patient is injured skiing, so more DBs get involved
[Figure: acquaintances among the scenario's databases, including the Ski Clinic]
The Three Variances
- We have (at least) three kinds of unpredictable run-time factors that influence the answer to a given query in a P2P network, namely:
  - Network (dependent) variance: the network changes over time
  - Database (dependent) variance: for any given P2P network, different databases, even if asked the same query at the same time, will provide different answers
  - Query (dependent) variance: different queries, even if posed to the same database, will impose different points of view on the network
Database coordination model
Interest Groups
- In most cases, nodes know very little about the other nodes of the P2P network, and in particular about the topics on which their peers are able to answer queries
- Intuitively, medical care, tourism, and tourism in Trentino are all possible topics
- A topic could be formalized as keywords, a schema, an ontology, or a context
- An Interest Group is a set of nodes that are able to answer queries about a certain topic
- Each group has a node, called the Group Manager (GM), which is in charge of managing the metadata needed to run the group
- The main goal of the Interest Group is computing, for any given input query, the Query Scope (QS): the set of nodes the query should be propagated to
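As a minimal sketch of the GM's role, assume the simplest of the topic formalizations listed on this slide, namely keywords. The class `GroupManager`, its methods and the node names are hypothetical; the QS is taken to be the set of members whose keywords overlap the keywords of the query topic, which is only one possible way of computing it.

```python
from dataclasses import dataclass, field


@dataclass
class GroupManager:
    """Hypothetical GM of one Interest Group; topics are formalized as keywords."""
    topic: str
    members: dict[str, set[str]] = field(default_factory=dict)  # node id -> keywords

    def join(self, node_id: str, keywords: set[str]) -> None:
        # Metadata the GM keeps in order to run the group.
        self.members[node_id] = {k.lower() for k in keywords}

    def leave(self, node_id: str) -> None:
        self.members.pop(node_id, None)

    def query_scope(self, query_keywords: set[str]) -> set[str]:
        """QS: members whose keywords overlap the keywords of the query topic."""
        wanted = {k.lower() for k in query_keywords}
        return {node for node, kws in self.members.items() if kws & wanted}


gm = GroupManager(topic="medical care")
gm.join("hospital", {"patient", "admission"})
gm.join("pharmacy", {"drug", "prescription"})
gm.join("ski_clinic", {"patient", "injury"})
print(sorted(gm.query_scope({"Patient"})))  # ['hospital', 'ski_clinic']
```

Because membership is just an entry in the GM's table, nodes can join and leave the group at run time without affecting the other members.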
Acquaintances
- Acquaintances are nodes that a node knows about and that hold data that can be used to answer a specific query (called an acquaintance query)
- If a node is an acquaintance, then there must be a way to compute how to propagate a query to it, how to propagate results back, and how to reconcile them with the results coming from the other acquaintances
Coordination Rules
- Each acquaintance may be associated with one or more Coordination Rules
- At run time, nodes use coordination rules, which specify under what conditions, when, how and where to propagate queries or updates
- A proposed implementation of coordination rules is as Event-Condition-Action (ECA) rules
  - Event: an update or a query coming from the user or from another node
  - Condition: refers to properties of the update or query (e.g., the type of query and/or which data items are referenced by the query)
  - Action: e.g., the translation and propagation of a given update or query to a particular acquaintance
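The ECA structure can be sketched as follows. This is a minimal illustration, assuming that events carry the set of relations they reference and that an action returns the text to send; the names `Event`, `CoordinationRule` and `fire`, and the prescription example, are hypothetical. A real action would typically call correspondence rules to translate the query before sending it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Event:
    kind: str             # "query" or "update"
    source: str           # "user" or the id of the node that sent it
    relations: set[str]   # data items (relations) referenced by the event
    text: str             # the query or update itself


@dataclass
class CoordinationRule:
    acquaintance: str                    # where to propagate
    on: str                              # Event: which kind of event triggers the rule
    condition: Callable[[Event], bool]   # Condition: test on the event's properties
    action: Callable[[Event], str]       # Action: translate the query/update


def fire(rules: list[CoordinationRule], event: Event,
         send: Callable[[str, str], None]) -> list[str]:
    """Run every rule whose event kind and condition match; return the
    acquaintances the event was propagated to."""
    targets = []
    for rule in rules:
        if rule.on == event.kind and rule.condition(event):
            send(rule.acquaintance, rule.action(event))
            targets.append(rule.acquaintance)
    return targets


outbox: list[tuple[str, str]] = []
rules = [
    # Propagate queries about prescriptions to acquaintance P, unchanged.
    CoordinationRule("P", "query", lambda e: "Prescription" in e.relations,
                     lambda e: e.text),
    # Propagate every update to acquaintance H.
    CoordinationRule("H", "update", lambda e: True, lambda e: e.text),
]
event = Event("query", "user", {"Prescription"}, "SELECT * FROM Prescription")
fired = fire(rules, event, lambda to, msg: outbox.append((to, msg)))
print(fired, outbox)  # ['P'] [('P', 'SELECT * FROM Prescription')]
```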
Correspondence Rules
- Each acquaintance is associated with one or more Correspondence Rules
- Correspondence Rules deal with the semantic heterogeneity problem and are used to translate queries and query results
- They are implemented as rewrite rules and are called by coordination rules in the code implementing their condition and action components
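A correspondence rule can be pictured as a pair of translations, one for outgoing queries and one for incoming results. The sketch below assumes the simplest kind of heterogeneity from the motivating scenario: different relation and attribute names, and a different patient id format. `CorrespondenceRule`, the relation names and the "H-42" id format are all hypothetical, and the rule only handles simple selection queries, not a general rewrite system.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


def identity(value: Any) -> Any:
    return value


@dataclass
class CorrespondenceRule:
    """Hypothetical rewrite rule between a local relation and an
    acquaintance's relation with different names and value formats."""
    local_rel: str
    remote_rel: str
    attr_map: dict[str, str]                                      # local -> remote attribute
    to_remote: dict[str, Callable] = field(default_factory=dict)  # keyed by local attribute
    to_local: dict[str, Callable] = field(default_factory=dict)   # keyed by local attribute

    def translate_query(self, rel: str, selection: dict) -> tuple[str, dict]:
        """Rewrite a selection query on the local relation for the acquaintance."""
        if rel != self.local_rel:
            raise ValueError(f"rule does not apply to relation {rel!r}")
        return self.remote_rel, {
            self.attr_map[a]: self.to_remote.get(a, identity)(v)
            for a, v in selection.items()
        }

    def translate_result(self, row: dict) -> dict:
        """Rewrite a result tuple coming back from the acquaintance."""
        inverse = {remote: local for local, remote in self.attr_map.items()}
        return {
            inverse[a]: self.to_local.get(inverse[a], identity)(v)
            for a, v in row.items()
        }


rule = CorrespondenceRule(
    "Patient", "Patients",
    {"pid": "patient_no", "disease": "diagnosis"},
    to_remote={"pid": lambda v: int(v.removeprefix("H-"))},
    to_local={"pid": lambda v: f"H-{v}"},
)
print(rule.translate_query("Patient", {"pid": "H-42"}))
# ('Patients', {'patient_no': 42})
print(rule.translate_result({"patient_no": 42, "diagnosis": "fracture"}))
# {'pid': 'H-42', 'disease': 'fracture'}
```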
Query Propagation Algorithm
- User submits query Q
- Node defines the query topic
- Node sends the Group Manager (GM) a request to define the Query Scope (QS)
- GM computes QS and sends it back
- Node 1 sends the query to its acquaintances in QS, and reports this fact to GM
- Nodes 2 and 4 send answers to node 1
- Nodes propagate the query to their acquaintances from QS and report this fact to GM
- And so on
- Nodes that do not propagate any further report this fact to GM
- Propagation stops when GM has received a "no more propagation" report from all boundary nodes
[Figure: example run over nodes 1 to 11. (1) The user poses Q to node 1; (2) Q is annotated with its topic; (3) node 1 asks the GM for the QS; (4) the GM returns QS = (2, 4, 6, 8, 9, 11); (5) nodes 2 and 4 are reached and return Res2 and Res4; nodes 6 and 8 are reached next; nodes 8 and 9 report no more propagation.]
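The propagation loop can be simulated in a few lines. This is a sketch under simplifying assumptions: message passing is replaced by a breadth-first traversal, every reached node answers immediately, a node never forwards the query to a node that has already been reached, and the acquaintance graph below is hypothetical (only the QS (2, 4, 6, 8, 9, 11) is taken from the example). The function name `propagate` is illustrative.

```python
from collections import deque


def propagate(origin, acquaintances, scope, answers):
    """Simulate QS-bounded, node-to-node query propagation.

    acquaintances: node -> list of acquainted nodes
    scope:         the Query Scope computed by the GM
    answers:       node -> list of local results
    Returns the results sent back to the origin (the origin's own local
    answer is not included) and the reports received by the GM."""
    gm_log = []
    reached = {origin}
    frontier = deque([origin])
    results = []
    # The loop ends once every reached node has reported to the GM.
    while frontier:
        node = frontier.popleft()
        targets = [a for a in acquaintances.get(node, [])
                   if a in scope and a not in reached]
        if targets:
            gm_log.append((node, "propagated", tuple(targets)))
        else:
            gm_log.append((node, "no more propagation", ()))
        for t in targets:
            reached.add(t)
            frontier.append(t)
            results.extend(answers.get(t, []))
    return results, gm_log


# Hypothetical acquaintance graph; nodes 3, 5, 7 and 10 are outside the QS.
acq = {1: [2, 4, 3], 2: [6, 10], 4: [8, 5], 6: [9], 8: [11], 9: [7]}
scope = {2, 4, 6, 8, 9, 11}
answers = {n: [f"Res{n}"] for n in scope}
results, log = propagate(1, acq, scope, answers)
print(results)  # ['Res2', 'Res4', 'Res6', 'Res8', 'Res9', 'Res11']
```

Restricting forwarding to the QS is what keeps nodes 3, 5, 7 and 10 out of the run even though they are acquaintances of reached nodes.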
JXTA
- JXTA provides a common set of open protocols and an open-source reference implementation for developing peer-to-peer applications
- JXTA-powered applications can
  - Find other peers on the network with dynamic discovery across firewalls
  - Easily share documents with anyone across the network
  - Find up-to-the-minute content at network sites
  - Create a group of peers that provide a set of services
  - Monitor peer activities remotely
  - Securely communicate with other peers on the network
- Platform independence, namely
  - Programming languages (e.g. C or Java)
  - System platforms (e.g. Microsoft Windows or UNIX)
  - Networking platforms (Bluetooth, TCP/IP, etc.)
- Ubiquity: PCs, PDAs, consumer electronics, etc.
JXTA Concepts
- Peer: any networked device that implements one or more of the JXTA protocols
- Rendezvous peers: provide other peers with network locations for discovering network resources
- Peer Group: a collection of peers that have agreed upon a common set of services
- Network Services: peers cooperate and communicate to publish, discover, and invoke network services
- Pipes: asynchronous and unidirectional message transfer mechanism used for service communication
- Messages: objects sent between JXTA peers; the basic unit of data exchange between peers
- Advertisements: XML documents describing network resources such as peers, peer groups, pipes, etc.
- IDs: for peers (IP-independent), pipes, etc.
Network Services
- Services can be either pre-installed onto a peer or loaded from the network
- Examples of core JXTA services: Discovery Service, Membership Service, Pipe Service
- JXTA protocols recognize two levels of network services
  - Peer Service
  - Peer Group Service
- JXTA allows custom services to be specified in order to provide peers with the desired functionality
Implementation
Basic notions
- We define an acquaintance query as a conjunctive query (CQ)
  - q(X) :- r1(X1), ..., rn(Xn)
  - q(X) is the head; r1(X1), ..., rn(Xn) are the subgoals of the body
  - subgoals are either relational (R) or comparison (C) subgoals
  - X, X1, ..., Xn are tuples of variables or constants; if a variable x occurs in C, then x also occurs in R
- Example: SQL to CQ
  - Relations: Museums (mName, country), Paintings (title, author, mName, cost)
  - SQL: select P.title, M.mName from Paintings P, Museums M where M.mName = P.mName and M.country = 'Italy' and P.cost > 10000
  - CQ: Q(T, MN) :- Museums (MN, CTR), Paintings (T, A, MN, C), CTR = 'Italy', C > 10000
- Condition part of a coordination rule (at the moment): the head is equal to a relation being queried
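To make the CQ notation concrete, here is a naive nested-loop evaluator for conjunctive queries over in-memory relations, applied to the museum example. The representation (a `Var` class for variables, comparison subgoals as `(variable, operator, constant)` triples) and the sample rows are hypothetical; the evaluator relies on the condition above that every variable of a comparison subgoal also occurs in a relational subgoal, so all comparisons can be checked once the relational subgoals are matched.

```python
import operator
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}


def unify(args, row, binding):
    """Match a subgoal's terms against a tuple; return the extended binding or None."""
    new = dict(binding)
    for term, value in zip(args, row):
        if isinstance(term, Var):
            if new.setdefault(term.name, value) != value:
                return None          # variable already bound to another value
        elif term != value:
            return None              # constant does not match
    return new


def evaluate(head, relational, comparisons, db):
    """Naive evaluation of q(head) :- relational subgoals, comparison subgoals."""
    def extend(i, binding):
        if i == len(relational):
            # Every comparison variable occurs in a relational subgoal, so it is bound.
            if all(OPS[op](binding[v.name], c) for v, op, c in comparisons):
                yield tuple(binding[v.name] for v in head)
            return
        rel, args = relational[i]
        for row in db[rel]:
            new = unify(args, row, binding)
            if new is not None:
                yield from extend(i + 1, new)
    return set(extend(0, {}))


# Hypothetical rows for Museums (mName, country) and Paintings (title, author, mName, cost)
db = {
    "Museums": [("Museum X", "Italy"), ("Museum Y", "France")],
    "Paintings": [("Painting A", "Author A", "Museum X", 50000),
                  ("Painting B", "Author B", "Museum X", 5000),
                  ("Painting C", "Author C", "Museum Y", 90000)],
}
T, A, MN, C, CTR = (Var(n) for n in ("T", "A", "MN", "C", "CTR"))
# Q(T, MN) :- Museums(MN, CTR), Paintings(T, A, MN, C), CTR = 'Italy', C > 10000
answer = evaluate([T, MN],
                  [("Museums", (MN, CTR)), ("Paintings", (T, A, MN, C))],
                  [(CTR, "=", "Italy"), (C, ">", 10000)],
                  db)
print(answer)  # {('Painting A', 'Museum X')}
```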
Implementing P2P DBs in JXTA
- We implement peers as JXTA peers, and Interest Groups as JXTA groups
- We extend the standard JXTA peer advertisement to encapsulate the schema information of a peer
- We extend the peer group advertisement to encapsulate group topic information
- We encode database-related functionality into a set of custom services (DB-related services)
[Figure: DB-related services, divided into node-level and group-level services: Queries and results handler, QS handler, Screening service, GM service]
GM service, extended
- In order to compute query scopes (QS), the Group Manager (GM) maintains an explicit representation of a topic as a global schema (GS)
- Local schemas are mapped onto GS via peer mappings
- Schemas of local databases are described in terms of GS, i.e. LAV
Data integration
- A data integration system consists of
  - a Global Schema (GS)
  - a set of Local Schemas (LS)
  - a set of mappings M between GS and LS
- LS describes real data
- GS is a unified, integrated view of LS and appears to the user as one virtual database (VDB)
- Queries are posed against the relations of VDB and then reformulated with respect to LS using M
[Figure: a query Q posed against GS (the VDB), which is mapped onto local schemas LS1, ..., LSn over databases DB1, ..., DBn]
Data integration, contd
- Mappings M
  - LS is defined in terms of GS: local-as-view (LAV)
    - LAV pro: handles dynamics in local sources
    - LAV contra: complex query reformulation
  - GS is defined in terms of LS: global-as-view (GAV)
    - GAV pro: easy query reformulation
    - GAV contra: dynamics in local sources are difficult to handle
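Why GAV reformulation is easy can be seen in a small sketch: when each GS relation is defined as a view over the local schemas, reformulating a query amounts to unfolding, i.e. replacing every GS atom with the body of its definition. The mapping below, which defines a global Movie relation by joining the demo relations Movie2 and Genre, is hypothetical, and the sketch ignores constants and comparisons. Under LAV the same step requires answering queries using views, which is the complex query reformulation listed as the LAV contra.

```python
import itertools


def unfold(query_body, gav):
    """Rewrite a CQ body over the global schema into one over the local schemas.

    query_body: list of (gs_relation, args) atoms
    gav:        gs_relation -> (head_vars, body), where body is a list of
                (local_relation, args) atoms; in this sketch all terms are variables."""
    fresh = itertools.count()
    rewritten = []
    for rel, args in query_body:
        head_vars, body = gav[rel]
        subst = dict(zip(head_vars, args))  # view head variables -> query arguments
        n = next(fresh)
        for local_rel, local_args in body:
            rewritten.append((local_rel, tuple(
                # existential variables of the view get fresh names per use
                subst.get(v, f"{v}_{n}") for v in local_args)))
    return rewritten


# Hypothetical GAV mapping: Movie(T, Y, G) :- Movie2(T, Y, D), Genre(T, G)
gav = {"Movie": (("T", "Y", "G"), [("Movie2", ("T", "Y", "D")), ("Genre", ("T", "G"))])}
print(unfold([("Movie", ("t", "y", "g"))], gav))
# [('Movie2', ('t', 'y', 'D_0')), ('Genre', ('t', 'g'))]
```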
QS computation scenario
- Query Q is posed locally, with respect to a local schema
- Q is reformulated into QGS with respect to GS using GAV
- At the GM, QGS is evaluated using the peer mappings to designate destination DBs
- The LAV contra (complex query reformulation) is not a contra for us!
  - No complete query reformulation is needed
- Once a query scope has been computed, the query is propagated on a node-to-node basis
[Figure: at the node, Q → QGS (GAV); at the GM, QGS is evaluated against GS via the (LAV) peer mappings]
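Because the GM only has to designate destination DBs rather than produce a complete rewriting, a relation-level check over the LAV peer mappings can already yield a query scope, possibly an over-approximation of the peers that can actually contribute answers. This check is a simplifying assumption of this sketch, not necessarily what the implementation does; the function name, peer names and mappings below are hypothetical.

```python
def query_scope(qgs_relations, peer_mappings):
    """Designate destination peers for QGS without rewriting it.

    qgs_relations: GS relations referenced by QGS
    peer_mappings: peer -> {local relation: GS relations used in its LAV view}
    A peer is in the scope if one of its local relations is defined over a
    GS relation that QGS uses."""
    wanted = set(qgs_relations)
    return {peer for peer, views in peer_mappings.items()
            if any(gs_rels & wanted for gs_rels in views.values())}


# Hypothetical peer mappings (LAV): each local relation is a view over the GS.
peer_mappings = {
    "A": {"Movie": {"Movie"}},
    "B": {"Credits": {"Credits"}},
    "C": {"Genre": {"Movie"}, "Cast": {"Movie", "Credits"}},
}
print(sorted(query_scope({"Credits"}, peer_mappings)))  # ['B', 'C']
```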
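First level architecture
[Figure: a P2P database network and the architecture of one node. Components: User Interface (UI), Query Manager (QM), P2P Layer, JXTA Layer, Wrapper, Local Database (LDB) and its DBS; the node is used by User-1, User-2, ..., User-n and connected to the other nodes of the P2P network.]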
Second level architecture
Demo
Demo (a very simple example)
- Relations
  - Movie (title, year, genre)
  - Credits (name, title, role)
  - Movie2 (title, year, director)
  - Genre (title, genre)
[Figure: demo network of peers 0 to 5, including a rendezvous peer and a mediator peer]
Query example 1
- Names of actors playing in action movies in 2003
- Q(n) :- Movie (t, y, g), Credits (n, t, r), r = 'Actor', g = 'Action', y = 2003
Query example 2
- Titles of drama movies released after 1995
- Q(t) :- Movie (t, y, g), g = 'Drama', y > 1995
Query example 3
- List titles of movies featuring Tom Hanks
- Q(t) :- Credits (n, t, r), n = 'Tom Hanks'
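The three demo queries translate directly to SQL, in the same way as the Museums/Paintings example. The sketch below runs them with Python's built-in sqlite3 module against a single in-memory database; the rows are hypothetical, and the P2P part of the demo (propagating each query to the peers in its QS) is not modelled.

```python
import sqlite3

# SQL translations of the three demo conjunctive queries.
QUERIES = {
    # Q(n) :- Movie(t, y, g), Credits(n, t, r), r = 'Actor', g = 'Action', y = 2003
    "example_1": """SELECT DISTINCT c.name FROM Movie m JOIN Credits c ON m.title = c.title
                    WHERE c.role = 'Actor' AND m.genre = 'Action' AND m.year = 2003""",
    # Q(t) :- Movie(t, y, g), g = 'Drama', y > 1995
    "example_2": "SELECT DISTINCT title FROM Movie WHERE genre = 'Drama' AND year > 1995",
    # Q(t) :- Credits(n, t, r), n = 'Tom Hanks'
    "example_3": "SELECT DISTINCT title FROM Credits WHERE name = 'Tom Hanks'",
}

# Hypothetical rows for a single peer.
MOVIES = [("Movie A", 2003, "Action"), ("Movie B", 1999, "Drama"),
          ("Movie C", 1990, "Drama"), ("Movie D", 2003, "Comedy")]
CREDITS = [("Actor X", "Movie A", "Actor"), ("Person Y", "Movie A", "Director"),
           ("Tom Hanks", "Movie B", "Actor"), ("Tom Hanks", "Movie C", "Actor"),
           ("Actor Z", "Movie D", "Actor")]


def run_demo_queries():
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Movie (title TEXT, year INTEGER, genre TEXT)")
        conn.execute("CREATE TABLE Credits (name TEXT, title TEXT, role TEXT)")
        conn.executemany("INSERT INTO Movie VALUES (?, ?, ?)", MOVIES)
        conn.executemany("INSERT INTO Credits VALUES (?, ?, ?)", CREDITS)
        # Each query has a single output column; collect it as a set.
        return {name: {row[0] for row in conn.execute(sql)}
                for name, sql in QUERIES.items()}
    finally:
        conn.close()


results = run_demo_queries()
for name, rows in results.items():
    print(name, sorted(rows))
# example_1 ['Actor X']
# example_2 ['Movie B']
# example_3 ['Movie B', 'Movie C']
```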
Some references
- F. Giunchiglia and I. Zaihrayeu. Making peer databases interact: a vision for an architecture supporting data coordination. 6th International Workshop on Cooperative Information Agents (CIA-2002), Madrid, Spain, September 18-20, 2002.
- F. Giunchiglia and I. Zaihrayeu. Implementing database coordination in P2P networks. DIT technical report DIT-03-035, University of Trento, Italy, November 2003.
- P. Bernstein, F. Giunchiglia, A. Kementsietsidis, J. Mylopoulos, L. Serafini, and I. Zaihrayeu. Data management for peer-to-peer computing: a vision. WebDB, 2002.
- A. Halevy, Z. Ives, D. Suciu, and I. Tatarinov. Schema mediation in a peer data management system. ICDE, 2003.
- A. Halevy. Answering queries using views: a survey. VLDB Journal, 2001.
- V. Kantere, I. Kiringa, J. Mylopoulos, A. Kementsietsidis, and M. Arenas. Coordinating peer databases using ECA rules. DBISP2P, September 2003.
- JXTA project, see http://www.jxta.org