1
P2P and databases

Ilya Zaihrayeu
27/11/03
- joint (current) work with F. Giunchiglia, G. Kuper, E. Franconi, A. Lopatenko and A. Ivanyukovich

2
Index

P2P and databases
Database coordination model
JXTA
Implementation
Demo

3
P2P and databases
4
DB systems: autonomy and centralization

Centralized database
single DBMS
manages a single DB on the same computer

Distributed database
manages multiple databases interconnected by a computer network
single DDBMS
global schema
homogeneous
queries and updates are issued against one logical database
databases share all the data

5
DB systems: autonomy and centralization (cont'd)

Federated database
global schema
single FDBMS
local schemas and the global schema coexist
queries and updates are possible both locally and globally
allows partial sharing of data
data location transparency

Multi-database
no global schema
DBS structure is fixed
the database is seen as several heterogeneous units
local DBs may have different DBMSs
a uniform language to access the DB
(generally) heterogeneous
(generally) autonomous
the locations of the data are visible to the application

6
Web P2P DBS

Postulates
decentralized
heterogeneity
total autonomy
quality of connections, location and availability of data can't be known a priori
locality: nodes do not know all participating DBs
Benefits
scalability
reliability
graceful degradation

Drawbacks
QoS from the network (response time, results, etc.) may vary greatly
metadata distributed on the network is hard to maintain
Key solutions should address
effective query answering algorithms
transitive query propagation
efficient ways to organize, maintain and manage metadata
possible partial centralization (mainly to support metadata)

Coordination, not integration!
... Coordination is managing dependencies between interacting databases at run time

7
A Motivating Scenario

A patient may be described in several DBs, which use different patient id formats, disease descriptions, etc.
When a patient is admitted to the hospital, H becomes acquainted with D
The acquaintance is dropped when the treatment is over
When the doctor prescribes a drug, D becomes acquainted with P
A patient is injured while skiing, so more DBs get involved

[Figure: the scenario's DBs H, D and P, plus a Ski Clinic]
8
The Three Variances

We have (at least) three kinds of unpredictable run-time factors that influence the answer to a given query in a P2P network, namely:
Network (dependent) variance: the network changes over time
Database (dependent) variance: for any given P2P network, different databases, even if asked the same query at the same time, will provide different answers
Query (dependent) variance: different queries, even if posed to the same database, will impose different points of view on the network

9
Database coordination model
10
Interest Groups

In most cases, nodes know very little about the other nodes of the P2P network, and in particular about the topics on which their peers are able to answer queries
Intuitively, medical care, tourism and tourism in Trentino are all possible topics
A topic could be formalized as keywords, a schema, an ontology, or a context
An Interest Group is a set of nodes which are able to answer queries about a certain topic
Each group has a node, called the Group Manager (GM), which is in charge of managing the metadata needed to run the group
The main goal of the Interest Group is to compute, for any given input query, the Query Scope (QS): the set of nodes the query should be propagated to
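The Interest Group idea can be sketched in a few lines. This is a minimal sketch, assuming a topic is formalized as a plain set of keywords (the simplest of the options above) and that the GM computes the QS by keyword overlap; the class, method and node names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class GroupManager:
    """Toy Group Manager (GM) of one Interest Group.

    Assumption: a topic is a set of keywords; schemas, ontologies or contexts
    would need a richer matching procedure than set overlap.
    """

    topic: frozenset
    # node id -> keywords describing what the node can answer queries about
    members: dict = field(default_factory=dict)

    def join(self, node: str, keywords: set) -> bool:
        # A node is admitted only if it shares at least one keyword with the topic.
        if self.topic & keywords:
            self.members[node] = frozenset(keywords)
            return True
        return False

    def query_scope(self, query_keywords: set) -> set:
        # Query Scope (QS): the members whose keywords overlap the query's keywords.
        return {n for n, kw in self.members.items() if kw & query_keywords}


# Hypothetical group about tourism in Trentino.
gm = GroupManager(frozenset({"tourism", "trentino", "skiing"}))
gm.join("hotel_db", {"tourism", "trentino"})
gm.join("ski_school_db", {"skiing"})
gm.join("pharmacy_db", {"drugs"})  # rejected: no shared keyword
print(gm.query_scope({"trentino"}))  # {'hotel_db'}
```

Keyword overlap is deliberately crude; it only illustrates that the GM, not the querying node, holds the metadata needed to compute the QS.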

11
Acquaintances

Acquaintances are nodes that a node knows about and that have data that can be used to answer a specific query (called an acquaintance query)
If a node is an acquaintance, then there must be a way to compute how to propagate a query to it, how to propagate the results back, and how to reconcile them with the results coming from the other acquaintances
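Reconciling results from several acquaintances can be sketched as translate-then-union. The sketch assumes each acquaintance comes with a function that converts its rows into the querying node's format; the DB names, patient id formats and rows are hypothetical, and conflicting values are not resolved.

```python
from typing import Callable, Iterable


def reconcile(results: dict, translators: dict) -> list:
    """Merge answers from several acquaintances into one duplicate-free list.

    results:     acquaintance id -> rows in that acquaintance's own format
    translators: acquaintance id -> function mapping a row to the local format
    """
    seen: set = set()
    merged: list = []
    for acq, rows in results.items():
        translate: Callable = translators[acq]
        for row in rows:
            local_row = translate(row)
            if local_row not in seen:  # same fact reported twice: keep one copy
                seen.add(local_row)
                merged.append(local_row)
    return merged


# Hypothetical: the hospital DB uses ids like "H-0042", the local DB plain integers.
translators = {
    "hospital": lambda r: (int(r[0].removeprefix("H-")), r[1]),
    "ski_clinic": lambda r: r,
}
answers: dict = {
    "hospital": [("H-0042", "fracture"), ("H-0007", "flu")],
    "ski_clinic": [(42, "fracture"), (13, "allergy")],
}
merged = reconcile(answers, translators)
print(merged)  # [(42, 'fracture'), (7, 'flu'), (13, 'allergy')]
```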

12
Coordination Rules

Each acquaintance may be associated with one or more Coordination Rules
At run time, nodes use coordination rules, which specify under what conditions, when, how and where to propagate queries or updates
A proposed implementation of coordination rules is as Event-Condition-Action (ECA) rules
Event: an update or a query coming from the user or from another node
Condition: refers to properties of the update or query (e.g., the type of query and/or which data items are referenced by the query)
Action: can be the translation and propagation of a given update or query to a particular acquaintance
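A minimal sketch of coordination rules as ECA rules, assuming each event references a single relation and that the action returns the translated query or update to be sent to the acquaintance (sending itself is left out). Rule contents, relation names and the `fire` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Event:
    kind: str      # "query" or "update"
    source: str    # "user" or the id of the node it came from
    relation: str  # relation referenced by the query/update (simplified to one)
    payload: str   # text of the query/update


@dataclass
class CoordinationRule:
    """An Event-Condition-Action rule attached to one acquaintance."""
    acquaintance: str
    on: str                             # event kind that triggers the rule
    condition: Callable[[Event], bool]  # test on properties of the query/update
    action: Callable[[Event], str]      # e.g. translate it for the acquaintance


def fire(rules: list, event: Event) -> list:
    """Return the (acquaintance, message) pairs to send for every rule that fires."""
    return [(r.acquaintance, r.action(event))
            for r in rules
            if r.on == event.kind and r.condition(event)]


# Hypothetical rules of node D for its acquaintances P and H.
rules = [
    CoordinationRule("P", "query",
                     lambda e: e.relation == "Prescription",
                     lambda e: e.payload.replace("Prescription", "Rx")),
    CoordinationRule("H", "update",
                     lambda e: e.source == "user",
                     lambda e: e.payload),
]
query_out = fire(rules, Event("query", "user", "Prescription",
                              "SELECT drug FROM Prescription"))
update_out = fire(rules, Event("update", "user", "Admission",
                               "INSERT INTO Admission VALUES (42)"))
print(query_out)  # [('P', 'SELECT drug FROM Rx')]
```

The string replacement in the first action stands in for a proper translation step; in the slides that job belongs to correspondence rules.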

13
Correspondence Rules

Each acquaintance is associated with one or more Correspondence Rules
Correspondence Rules address the semantic heterogeneity problem and are used to translate queries and query results
They are implemented as rewrite rules and are called by coordination rules, in the code implementing their condition and action components
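A sketch of correspondence rules as rewrite rules, assuming the simplest kind of semantic heterogeneity: a relation renamed and its attributes permuted. It shows both directions, rewriting query subgoals for the acquaintance and translating result rows back. `CorrespondenceRule`, `Patient`, `Pat` and `Ward` are hypothetical.

```python
from dataclasses import dataclass

Atom = tuple  # (relation name, tuple of argument terms)


@dataclass(frozen=True)
class CorrespondenceRule:
    """Hypothetical rewrite rule mapping a local relation to an acquaintance's.

    positions[i] is the index of the local argument that fills argument i of
    the acquaintance's relation (assumed to be a permutation).
    """
    local: str
    remote: str
    positions: tuple

    def rewrite_atom(self, atom: Atom) -> Atom:
        _, args = atom
        return (self.remote, tuple(args[p] for p in self.positions))

    def translate_row(self, row: tuple) -> tuple:
        # Inverse direction: put a result row back into local column order.
        local = [None] * len(row)
        for remote_idx, local_idx in enumerate(self.positions):
            local[local_idx] = row[remote_idx]
        return tuple(local)


def rewrite_query(body: list, rules: list) -> list:
    """Rewrite each subgoal whose relation has a rule; leave the others unchanged."""
    by_local = {r.local: r for r in rules}
    return [by_local[a[0]].rewrite_atom(a) if a[0] in by_local else a for a in body]


# Local Patient(id, name, disease) corresponds to the acquaintance's Pat(name, pid, diag).
rule = CorrespondenceRule("Patient", "Pat", (1, 0, 2))
remote_body = rewrite_query([("Patient", ("X", "N", "D")), ("Ward", ("X", "W"))], [rule])
print(remote_body)  # [('Pat', ('N', 'X', 'D')), ('Ward', ('X', 'W'))]
print(rule.translate_row(("Rossi", 42, "flu")))  # (42, 'Rossi', 'flu')
```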

14
Query Propagation Algorithm

The user submits a query Q
The node determines the query topic
The node asks the Group Manager (GM) to compute the Query Scope (QS)
The GM computes the QS and sends it back
Node 1 sends the query to its acquaintances in the QS and reports this to the GM
Nodes 2 and 4 send their answers to node 1
These nodes propagate the query to their own acquaintances in the QS and report this to the GM
And so on
Nodes which do not propagate any further report this to the GM
Propagation stops when "no more propagation" has been received from all boundary nodes

[Figure: example run on nodes 1 to 11 with the GM. 1. Q is submitted at node 1; 2. node 1 determines the topic of Q; 3. node 1 asks the GM for QS(topic); 4. the GM returns QS = (2, 4, 6, 8, 9, 11); 5. nodes 2 and 4 are reached and return results Res2 and Res4 to node 1; nodes 6 and 8 are reached next; nodes 8 and 9 report "no more propagation"]
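The propagation steps above can be simulated as a breadth-first traversal restricted to the QS. This is a sketch, assuming each node forwards the query only to QS acquaintances that have not received it yet and that the GM simply logs the reports; the acquaintance graph is hypothetical (only the QS comes from the example), and answers flowing back are not modelled.

```python
from collections import deque


def propagate(origin: str, acquaintances: dict, query_scope: set):
    """Simulate node-to-node propagation of one query inside its Query Scope.

    Returns (nodes reached, in order) and the list of reports sent to the GM.
    """
    gm_log: list = []
    reached: list = []
    visited = {origin}
    frontier = deque([origin])
    while frontier:
        node = frontier.popleft()
        targets = sorted((acquaintances.get(node, set()) & query_scope) - visited)
        if targets:
            # The node forwards the query and reports this fact to the GM.
            gm_log.append(f"{node} -> {', '.join(targets)}")
            for t in targets:
                visited.add(t)
                reached.append(t)
                frontier.append(t)
        else:
            # Boundary node: it reports that it will not propagate any further.
            gm_log.append(f"no more propagation from {node}")
    # The loop ends once every reached node has reported to the GM.
    return reached, gm_log


# Hypothetical acquaintance graph; nodes 3, 5 and 10 are outside the QS.
ACQ = {"1": {"2", "3", "4"}, "2": {"6", "10"}, "4": {"5", "8"},
       "6": {"9"}, "8": {"11"}}
QS = {"2", "4", "6", "8", "9", "11"}
reached, log = propagate("1", ACQ, QS)
print(reached)  # ['2', '4', '6', '8', '9', '11']
```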
15
JXTA
16
JXTA

JXTA provides a common set of open protocols and an open-source reference implementation for developing peer-to-peer applications
JXTA-powered applications can
Find other peers on the network with dynamic discovery across firewalls
Easily share documents with anyone across the network
Find up-to-the-minute content at network sites
Create a group of peers that provides a set of services
Monitor peer activities remotely
Securely communicate with other peers on the network
Platform independence, namely:
Programming languages (e.g., C or Java)
System platforms (e.g., Microsoft Windows or UNIX)
Networking platforms (Bluetooth, TCP/IP, etc.)
Ubiquity: PCs, PDAs, consumer electronics, etc.

17
JXTA Concepts

Peer: any networked device that implements one or more of the JXTA protocols
Rendezvous peers: provide other peers with network locations for discovering network resources
Peer Group: a collection of peers that have agreed upon a common set of services
Network Services: peers cooperate and communicate to publish, discover, and invoke network services
Pipes: an asynchronous and unidirectional message transfer mechanism used for service communication
Messages: objects sent between JXTA peers; a message is the basic unit of data exchange between peers
Advertisements: XML documents describing network resources such as peers, peer groups, pipes, etc.
IDs: for peers (IP-independent), pipes, etc.
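To make the pipe concept concrete, here is a toy model of an asynchronous, unidirectional pipe built on a thread-safe queue. It is not the JXTA API; the `Pipe` class, the `listener` helper and the message dictionaries are hypothetical.

```python
import queue
import threading


class Pipe:
    """Toy model of a JXTA-style pipe; this is NOT the JXTA API.

    Unidirectional: messages flow only from the caller of send() to the caller
    of receive(); a reply would need a second pipe. Asynchronous: send()
    enqueues and returns without waiting for the receiver.
    """

    def __init__(self) -> None:
        self._messages: queue.Queue = queue.Queue()

    def send(self, message: dict) -> None:
        self._messages.put(message)  # never blocks: the queue is unbounded

    def receive(self, timeout: float = 1.0) -> dict:
        # Blocks until a message arrives (raises queue.Empty after `timeout` s).
        return self._messages.get(timeout=timeout)


received = []


def listener(pipe: Pipe, count: int) -> None:
    for _ in range(count):
        received.append(pipe.receive())


pipe = Pipe()
worker = threading.Thread(target=listener, args=(pipe, 2))
worker.start()
pipe.send({"type": "query", "body": "Q(t) :- Movie(t, y, g), g = 'Drama'"})
pipe.send({"type": "result", "body": [("Cast Away",)]})
worker.join()
print([m["type"] for m in received])  # ['query', 'result']
```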

18
Network Services

Services can be either pre-installed onto a peer or loaded from the network
Examples of core JXTA services: Discovery Service, Membership Service, Pipe Service
JXTA protocols recognize two levels of network services:
Peer Service
Peer Group Service
JXTA allows custom services to be specified in order to provide peers with the desired functionality

19
Implementation
20
Basic notions

We define an acquaintance query as a conjunctive query (CQ):
q(X) :- r1(X1), ..., rn(Xn)
q(X) is the head; r1(X1), ..., rn(Xn) are the subgoals of the body; subgoals are either relational (R) or comparisons (C)
X, X1, ..., Xn are tuples of variables or constants; if a variable x occurs in C, then x must also occur in R
Example: SQL → CQ
Relations: Museums (mName, country), Paintings (title, author, mName, cost)
SQL: select P.title, M.mName from Paintings P, Museums M where M.mName = P.mName and M.country = 'Italy' and P.cost > 10000
CQ: Q(T, MN) :- Museums (MN, CTR), Paintings (T, A, MN, C), CTR = 'Italy', C > 10000
Condition part of a coordination rule (at the moment): the head is equal to a relation being queried
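The safety condition and the join semantics of the CQ above can be checked with a naive evaluator. This is a sketch under stated simplifications: constants appear only in comparison subgoals (as in the example), evaluation is a nested-loop join over in-memory lists, and the museum and painting rows are hypothetical.

```python
import operator
from itertools import product

OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}


def evaluate_cq(head, relational, comparisons, db):
    """Naively evaluate  q(head) :- relational subgoals, comparison subgoals.

    head:        tuple of variables, e.g. ("T", "MN")
    relational:  list of (relation name, tuple of variables)
    comparisons: list of (variable, operator, constant), e.g. ("C", ">", 10000)
    db:          dict mapping a relation name to a list of tuples
    """
    # Safety: variables in C (and, as usual, in the head) must occur in R.
    r_vars = {v for _, args in relational for v in args}
    if not all(v in r_vars for v, _, _ in comparisons) or not set(head) <= r_vars:
        raise ValueError("unsafe conjunctive query")

    answers = []
    # One tuple per relational subgoal: a nested-loop join.
    for combo in product(*(db[name] for name, _ in relational)):
        binding, consistent = {}, True
        for (_, args), tup in zip(relational, combo):
            for var, value in zip(args, tup):
                # A repeated variable (e.g. MN) acts as an equality join condition.
                if binding.setdefault(var, value) != value:
                    consistent = False
        if consistent and all(OPS[op](binding[v], c) for v, op, c in comparisons):
            row = tuple(binding[v] for v in head)
            if row not in answers:  # set semantics
                answers.append(row)
    return answers


# Hypothetical sample data for Museums(mName, country), Paintings(title, author, mName, cost).
db = {
    "Museums": [("Uffizi", "Italy"), ("Louvre", "France")],
    "Paintings": [("Primavera", "Botticelli", "Uffizi", 50000),
                  ("Study", "Anonymous", "Uffizi", 5000),
                  ("Mona Lisa", "Leonardo", "Louvre", 90000)],
}
answers = evaluate_cq(("T", "MN"),
                      [("Museums", ("MN", "CTR")), ("Paintings", ("T", "A", "MN", "C"))],
                      [("CTR", "=", "Italy"), ("C", ">", 10000)],
                      db)
print(answers)  # [('Primavera', 'Uffizi')]
```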

21
Implementing P2P DBs in JXTA

We implement peers as JXTA peers, and Interest Groups as JXTA groups
We extend the standard JXTA peer advertisement to encapsulate the schema information of a peer
We extend the peer group advertisement to encapsulate group topic information
We encode database-related functionality into a set of custom services (DB-related services)

DB-related services
Node-level services
Group-level services

Queries and results handler
QS handler
Screening service
GM service
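The advertisement extension can be illustrated with Python's standard `xml.etree.ElementTree`. This is a sketch only: the element names are hypothetical and do not reproduce the actual JXTA advertisement format. It shows a peer publishing its relations and another party (e.g. the GM) reading them back.

```python
import xml.etree.ElementTree as ET


def peer_advertisement(peer_name: str, schema: dict) -> str:
    """Build an illustrative XML peer advertisement extended with schema info.

    Element names (PeerAdvertisement, Name, DBSchema, Relation, Attribute)
    are hypothetical, not the real JXTA advertisement format.
    """
    adv = ET.Element("PeerAdvertisement")
    ET.SubElement(adv, "Name").text = peer_name
    db_schema = ET.SubElement(adv, "DBSchema")  # the extension
    for rel, attrs in schema.items():
        relation = ET.SubElement(db_schema, "Relation", name=rel)
        for attr in attrs:
            ET.SubElement(relation, "Attribute").text = attr
    return ET.tostring(adv, encoding="unicode")


def relations_of(xml_text: str) -> dict:
    """Read the schema back out of an advertisement."""
    root = ET.fromstring(xml_text)
    return {r.get("name"): [a.text for a in r.findall("Attribute")]
            for r in root.find("DBSchema").findall("Relation")}


adv = peer_advertisement("peer-1", {"Movie": ["title", "year", "genre"],
                                    "Credits": ["name", "title", "role"]})
print(relations_of(adv)["Movie"])  # ['title', 'year', 'genre']
```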
22
GM service, extended

In order to compute query scopes (QS), the Group Manager (GM) maintains an explicit representation of a topic as a global schema (GS)
Local schemas are mapped onto the GS via peer mappings
Schemas of local databases are described in terms of the GS, i.e., LAV

23
Data integration

A data integration system consists of:
a Global Schema (GS)
a set of Local Schemas (LS)
a set of mappings M between GS and LS
LS describe the real data
GS is a unified, integrated view of the LS and appears to the user as one virtual database (VDB)
Queries are posed against the relations of the VDB and then reformulated with respect to the LS using M

[Figure: query Q posed against the GS (the VDB), which is mapped onto local schemas LS1, ..., LSn of databases DB1, ..., DBn]
24
Data integration, cont'd

Mappings M
LS defined in terms of GS: local-as-view (LAV)
LAV pro: allows dynamics in local sources
LAV contra: complex query reformulation
GS defined in terms of LS: global-as-view (GAV)
GAV pro: easy query reformulation
GAV contra: difficult to handle dynamics in local sources
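Why GAV reformulation is easy: it reduces to unfolding. The sketch below assumes each global relation is defined as a union of local relations with the same attribute order (real GAV views can be arbitrary queries); the `GAV` mapping and the `DB1.Movie`-style names are hypothetical.

```python
from itertools import product

# Hypothetical GAV mapping: each GS relation is a union of local relations
# (qualified by their source DB) that share its attribute order.
GAV = {
    "Movie": ["DB1.Movie", "DB3.Film"],
    "Credits": ["DB2.Credits"],
}


def unfold(body: list, gav: dict) -> list:
    """Reformulate a CQ over the GS into a union of CQs over local schemas.

    Every global atom is replaced by each of its definitions, and the
    alternatives for different atoms are combined by a cartesian product.
    """
    choices = [[(local, args) for local in gav[rel]] for rel, args in body]
    return [list(combo) for combo in product(*choices)]


q_gs = [("Movie", ("T", "Y", "G")), ("Credits", ("N", "T", "R"))]
for cq in unfold(q_gs, GAV):
    print(cq)
```

With LAV, by contrast, local relations are views over the GS, so reformulation becomes answering queries using views; that is the "complex query reformulation" listed as the LAV contra.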

25
QS computation scenario

Query Q is posed locally with respect to a local schema
Q is reformulated into QGS with respect to the GS using GAV
At the GM, QGS is evaluated using the peer mappings to designate the destination DBs
The LAV contra (complex query reformulation) is not a contra for us!
No complete query reformulation is needed
Once the query scope is computed, the query is propagated on a node-to-node basis

[Figure: at the node, Q → QGS (GAV); at the GM, QGS is evaluated against the GS using the LAV peer mappings]
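The GM-side step can be sketched as a relevance test rather than a full rewriting, in line with the point that no complete reformulation is needed. Assumptions: the querying node's GAV step is a simple renaming of local relations into GS relations, and each peer's LAV mapping is summarized by the set of GS relations its views mention; all names are hypothetical.

```python
# Hypothetical renaming used by the querying node (local name -> GS relation).
LOCAL_TO_GS = {"Film": "Movie", "Cast": "Credits"}

# Hypothetical LAV peer mappings held by the GM: peer -> GS relations its views use.
PEER_MAPPINGS = {
    "peer1": {"Movie"},
    "peer2": {"Movie", "Credits"},
    "peer3": {"Credits"},
    "peer4": {"Genre"},
}


def to_global(local_relations: set, renaming: dict) -> set:
    """Node side: Q -> QGS, reduced here to renaming the relations Q uses."""
    return {renaming[r] for r in local_relations}


def query_scope(q_gs_relations: set, mappings: dict) -> set:
    """GM side: every peer whose views mention a relation used by QGS."""
    return {peer for peer, rels in mappings.items() if rels & q_gs_relations}


print(sorted(query_scope(to_global({"Cast"}, LOCAL_TO_GS), PEER_MAPPINGS)))
# ['peer2', 'peer3']
```

The test over-approximates: a peer that covers only part of a join is still included, which is acceptable when the goal is choosing where to send the query rather than rewriting it completely.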
26
First level architecture
[Figure: a P2P database network. Each node serves users User-1, ..., User-n through a User Interface (UI) and contains a Query Manager (QM), a P2P Layer, a JXTA Layer and a Wrapper over its Local Database (LDB) in the DBS]
27
Second level architecture
28
Demo
29
(A very simple example) Demo
Relations
Movie (title, year, genre)
Credits (name, title, role)
Movie2 (title, year, director)
Genre (title, genre)

[Figure: demo network of peers 0 to 5, including a rendezvous peer and a mediator peer; query Q is posed in the network]
30
Query example 1

Names of actors playing in action movies in 2003
Q(n) :- Movie (t, y, g), Credits (n, t, r), r = 'Actor', g = 'Action', y = 2003

31
Query example 2

Titles of drama movies released after 1995
Q(t) :- Movie (t, y, g), g = 'Drama', y > 1995

32
Query example 3

List titles of movies featuring Tom Hanks
Q(t) :- Credits (n, t, r), n = 'Tom Hanks'

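The three demo queries can be checked on a single local database with Python's built-in `sqlite3`, translating each CQ to SQL. This only evaluates the queries locally; it does not reproduce the P2P propagation of the demo, and the rows are hypothetical sample data following the demo's `Movie` and `Credits` relations.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Movie   (title TEXT, year INTEGER, genre TEXT);
    CREATE TABLE Credits (name TEXT, title TEXT, role TEXT);
    INSERT INTO Movie VALUES ('Cast Away', 2000, 'Drama'),
                             ('The Matrix Reloaded', 2003, 'Action'),
                             ('Road to Perdition', 2002, 'Drama');
    INSERT INTO Credits VALUES ('Tom Hanks', 'Cast Away', 'Actor'),
                               ('Keanu Reeves', 'The Matrix Reloaded', 'Actor'),
                               ('Tom Hanks', 'Road to Perdition', 'Actor');
""")

# Example 1: Q(n) :- Movie(t, y, g), Credits(n, t, r), r = 'Actor', g = 'Action', y = 2003
q1 = con.execute("""SELECT DISTINCT C.name FROM Movie M, Credits C
                    WHERE M.title = C.title AND C.role = 'Actor'
                      AND M.genre = 'Action' AND M.year = 2003""").fetchall()

# Example 2: Q(t) :- Movie(t, y, g), g = 'Drama', y > 1995
q2 = con.execute("""SELECT DISTINCT title FROM Movie
                    WHERE genre = 'Drama' AND year > 1995
                    ORDER BY title""").fetchall()

# Example 3: Q(t) :- Credits(n, t, r), n = 'Tom Hanks'
q3 = con.execute("""SELECT DISTINCT title FROM Credits
                    WHERE name = ? ORDER BY title""", ("Tom Hanks",)).fetchall()

print(q1)  # [('Keanu Reeves',)]
print(q2)  # [('Cast Away',), ('Road to Perdition',)]
```

`DISTINCT` mirrors the set semantics of CQs; the shared variable t in example 1 becomes the join condition `M.title = C.title`.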
33
Some references

F. Giunchiglia and I. Zaihrayeu. Making peer databases interact: a vision for an architecture supporting data coordination. 6th International Workshop on Cooperative Information Agents (CIA-2002), Madrid, Spain, September 18-20, 2002.
F. Giunchiglia and I. Zaihrayeu. Implementing database coordination in P2P networks. DIT technical report DIT-03-035, University of Trento, Italy, November 2003.
P. Bernstein, F. Giunchiglia, A. Kementsietsidis, J. Mylopoulos, L. Serafini, and I. Zaihrayeu. Data management for peer-to-peer computing: a vision. WebDB, 2002.
A. Halevy, Z. Ives, D. Suciu, and I. Tatarinov. Schema mediation in a peer data management system. ICDE, 2003.
A. Halevy. Answering queries using views: a survey. VLDB Journal, 2001.
V. Kantere, I. Kiringa, J. Mylopoulos, A. Kementsietsidis, and M. Arenas. Coordinating peer databases using ECA rules. DBISP2P, September 2003.
JXTA project, see http://www.jxta.org