P2P and databases - PowerPoint PPT Presentation

1
P2P and databases
  • Ilya Zaihrayeu
  • 27/11/03
  • - joint (current) work with F. Giunchiglia, G.
    Kuper, E. Franconi, A. Lopatenko and A.
    Ivanyukovich

2
Index
  • P2P and databases
  • Database coordination model
  • JXTA
  • Implementation
  • Demo

3
P2P and databases
4
DB Systems autonomy and centralization
  • Centralized database
  • single DBMS
  • manages a single DB on the same computer
  • Distributed database
  • manages multiple databases interconnected by a
    computer network
  • single DDBMS
  • global schema
  • homogeneous
  • queries and updates to one database
  • databases share all the data

5
DB Systems autonomy and centralization, contd
  • Federated database
  • global schema
  • single FDBMS
  • local schemas and global schemas coexist
  • queries and updates are possible both locally and
    globally
  • allows partial sharing of data
  • data location transparency
  • Multi-database
  • no global schema
  • DBS structure is fixed
  • database is seen as several heterogeneous units
  • local DBs may have different DBMS
  • a uniform language to access the DB
  • (generally) heterogeneous
  • (generally) autonomous
  • the locations of the data are visible to the
    application

6
WEB P2P DBS
  • Postulates
  • decentralized
  • heterogeneity
  • total autonomy
  • quality of connections, location and availability
    of data cannot be known a priori
  • locality: nodes do not know all participating
    DBs
  • Benefits
  • scalability
  • reliability
  • graceful degradation
  • Drawbacks
  • QoS from the network (response time, results,
    etc) may vary greatly
  • hard to maintain metadata distributed on the
    network
  • Key solutions should address
  • effective query answering algorithms
  • transitive query propagation
  • efficient ways to organize, maintain and manage
    metadata
  • possible partial centralization (mainly to
    support metadata)
  • coordination, not integration!
  • ... Coordination is managing dependencies
    between interacting databases at run time

7
A Motivating Scenario
  • A patient may be described in several DBs, which
    use different patient id formats, disease
    descriptions, etc.
  • When a patient is admitted to the hospital, H
    becomes acquainted with D
  • The acquaintance is dropped when treatment is
    over
  • When the doctor prescribes a drug, D becomes
    acquainted with P
  • A patient is injured skiing, so more DBs get
    involved

[Figure: databases involved in the scenario, including a Ski Clinic]
8
The Three Variances
  • We have (at least) three kinds of unpredictable
    run time factors, which influence the answer to a
    given query in a P2P network, namely
  • Network (dependent) variance: the network changes
    over time
  • Database (dependent) variance: for any given P2P
    network, different databases, even if asked the
    same query at the same time, will provide
    different answers
  • Query (dependent) variance: different queries,
    even if posed to the same database, will impose
    different points of view on the network

9
Database coordination model
10
Interest Groups
  • In most cases, nodes know very little of the
    other nodes of the P2P network, and in particular
    about the topics about which their peers are able
    to answer queries
  • Intuitively, medical care, tourism, tourism in
    Trentino, are all possible topics
  • A topic could be formalized as keywords, a
    schema, an ontology, or as a context
  • An Interest Group is a set of nodes which are able
    to answer queries about a certain topic
  • Each group has a node, called the Group Manager
    (GM), which is in charge of managing the
    metadata needed to run the group
  • The main goal of the Interest Group is computing,
    for any given input query, the Query Scope (QS):
    the set of nodes a query should be propagated to

11
Acquaintances
  • Acquaintances are nodes that a node knows about
    and that have data that can be used to answer a
    specific query (called acquaintance query)
  • If a node is an acquaintance, then there must be
    a way to compute how to propagate a query, to
    propagate results back, and to reconcile them
    with the results coming from the other
    acquaintances

12
Coordination Rules
  • Each acquaintance may be associated with one or
    more Coordination Rules
  • At run time, nodes use coordination rules which
    specify under what conditions, when, how and
    where to propagate queries or updates
  • A proposed implementation of coordination rules
    is as Event-Condition-Action (ECA) rules
  • Event can be an update or a query coming from the
    user or from another node
  • Condition refers to properties of the update or
    query (e.g., the type of query and/or which data
    items are referenced by the query)
  • Action can be the translation and propagation of
    a given update or query to a particular
    acquaintance

13
Correspondence Rules
  • Each acquaintance is associated with one or more
    Correspondence Rules
  • Correspondence Rules take care of the semantic
    heterogeneity problem and are used for the
    translation of queries and query results
  • They are implemented as rewrite rules and are
    called by coordination rules, in the body of the
    code implementing their action and condition
    components
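
A correspondence rule of this kind could look roughly as follows. The sketch assumes a very simple form of semantic heterogeneity (different relation and attribute names only); the `Patient`/`Person` schemas and all attribute names are invented for illustration.

```python
from typing import Dict, List, Tuple


class CorrespondenceRule:
    """Rewrites queries into an acquaintance's vocabulary and results back."""

    def __init__(self, relation_map: Dict[str, str], attribute_map: Dict[str, str]):
        self.relation_map = relation_map
        self.attribute_map = attribute_map
        # Inverse mapping, used to translate results back to local names.
        self.reverse_attribute_map = {v: k for k, v in attribute_map.items()}

    def translate_query(self, relation: str, attributes: List[str]) -> Tuple[str, List[str]]:
        return (
            self.relation_map[relation],
            [self.attribute_map[a] for a in attributes],
        )

    def translate_results(self, rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
        return [
            {self.reverse_attribute_map[k]: v for k, v in row.items()}
            for row in rows
        ]


# Hypothetical mapping between a local Patient relation and a remote Person relation.
rule = CorrespondenceRule(
    relation_map={"Patient": "Person"},
    attribute_map={"pid": "ssn", "name": "full_name"},
)

remote_query = rule.translate_query("Patient", ["pid", "name"])
local_rows = rule.translate_results([{"ssn": "123", "full_name": "A. Rossi"}])
```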

14
Query Propagation Algorithm
  • User submits query Q (?)
  • Node defines query topic
  • Node sends to Group Manager (GM) request to
    define Query Scope (QS)
  • GM computes and sends back QS
  • Node 1 sends query to acquaintances in QS, and
    reports this fact to GM
  • Nodes 2 and 4 send answer to node 1
  • Nodes propagate the query to their acquaintances
    from QS and report this fact to GM
  • And so on
  • Nodes which do not propagate any further report
    this fact to GM
  • Propagation stops when "no more propagation"
    reports have been received from all boundary
    nodes

[Figure: query propagation example with nodes 1-11 and the GM. Message labels: 1. Q (?); 2. Q (?, topic); 3. QS (?, topic) ?; 4. QS (?, topic) (2, 4, 6, 8, 9, 11); 5. nodes 2 and 4 are reached; node 6 is reached; node 8 is reached; no more propagation from 8; no more propagation from 9; Res2; Res4]
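
The propagation steps above can be simulated with a small breadth-first traversal. This is a sketch under stated assumptions: the query scope (2, 4, 6, 8, 9, 11) is taken from the slide, but the acquaintance topology, the topic name, and the synchronous single-process execution are hypothetical simplifications.

```python
from collections import deque
from typing import Dict, List, Set


class GroupManager:
    def __init__(self, topic_members: Dict[str, Set[int]]):
        self.topic_members = topic_members
        self.log: List[str] = []  # reports received from nodes

    def query_scope(self, topic: str) -> Set[int]:
        """Return the set of nodes a query on this topic may be sent to."""
        return set(self.topic_members.get(topic, set()))

    def report(self, message: str) -> None:
        self.log.append(message)


def propagate(start: int, topic: str,
              acquaintances: Dict[int, List[int]], gm: GroupManager) -> Set[int]:
    scope = gm.query_scope(topic)
    reached = {start}
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        # Only acquaintances inside the query scope and not yet reached.
        targets = [n for n in acquaintances.get(node, [])
                   if n in scope and n not in reached]
        if not targets:
            gm.report(f"no more propagation from {node}")
            continue
        for n in targets:
            reached.add(n)
            gm.report(f"node {n} is reached")
            frontier.append(n)
    return reached - {start}


# Hypothetical acquaintance links; the scope is the one shown on the slide.
acquaintances = {1: [2, 3, 4], 2: [6], 4: [8], 6: [9], 3: [5]}
gm = GroupManager({"topic": {2, 4, 6, 8, 9, 11}})
reached = propagate(1, "topic", acquaintances, gm)
```

With this topology node 3 is skipped because it is outside the scope, node 11 is in the scope but has no acquaintance path leading to it, and the traversal ends once the boundary nodes 8 and 9 have reported that they do not propagate further.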
15
JXTA
16
JXTA
  • JXTA provides a common set of open protocols and
    an open source reference implementation for
    developing peer-to-peer applications
  • JXTA-powered applications can
  • Find other peers on the network with dynamic
    discovery across firewalls
  • Easily share documents with anyone across the
    network
  • Find up-to-the-minute content at network sites
  • Create a group of peers that provide a set of
    services
  • Monitor peer activities remotely
  • Securely communicate with other peers on the
    network
  • Platform independence, namely
  • Programming languages (e.g. C or Java)
  • System platforms (e.g. Microsoft Windows or UNIX)
  • Networking platforms (Bluetooth, TCP/IP, etc)
  • Ubiquity: PCs, PDAs, consumer electronics, etc.

17
JXTA Concepts
  • Peer: any networked device that implements one
    or more of the JXTA protocols
  • Rendezvous peers: provide other peers with
    network locations for discovering network
    resources
  • Peer Group: a collection of peers that have
    agreed upon a common set of services
  • Network Services: peers cooperate and
    communicate to publish, discover, and invoke
    network services
  • Pipes: asynchronous and unidirectional message
    transfer mechanism used for service communication
  • Messages: objects sent between JXTA peers; a
    message is the basic unit of data exchange
    between peers
  • Advertisements: XML documents describing network
    resources such as peers, peer groups, pipes, etc.
  • IDs: for peers (IP independent), pipes, etc.

18
Network Services
  • Services can be either pre-installed onto a peer
    or loaded from the network
  • Examples of core JXTA services: Discovery
    Service, Membership Service, Pipe Service
  • JXTA protocols recognize two levels of network
    services
  • Peer Service
  • Peer Group Service
  • JXTA allows for specification of custom services
    in order to provide peers with desirable
    functionality

19
Implementation
20
Basic notions
  • We define an acquaintance query as a conjunctive
    query
  • q(X) :- r1(X1), ..., rn(Xn)
  • q(X) is the head; r1(X1), ..., rn(Xn) are the
    subgoals of the body; subgoals are relation (R)
    or comparison (C) subgoals
  • X, X1, ..., Xn are variables or constants; if
    x ∈ C then x ∈ R
  • Example: SQL → CQ
  • Relations: Museums (mName, country),
    Paintings (title, author, mName, cost)
  • SQL: select P.title, M.mName from Paintings P,
    Museums M where M.mName = P.mName and M.country
    = 'Italy' and P.cost > 10000
  • CQ: Q(T, MN) :- Museums (MN, CTR), Paintings
    (T, A, MN, C), CTR = 'Italy', C > 10000
  • Condition part of a coordination rule (at the
    moment): the head is equal to a relation being
    queried
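
To make the museum example concrete, the conjunctive query can be evaluated over a toy instance. The tuples below are invented placeholders, and `is_safe` reads the condition on comparison variables as the usual safety requirement (every variable used in a comparison also occurs in a relational subgoal), which is an interpretation rather than something the slide spells out.

```python
from typing import Iterable, List, Tuple

# Toy instance; all values are made up for illustration.
museums = [("M1", "Italy"), ("M2", "France")]          # (mName, country)
paintings = [                                          # (title, author, mName, cost)
    ("P1", "A1", "M1", 50_000),
    ("P2", "A2", "M1", 5_000),
    ("P3", "A3", "M2", 80_000),
]

# Q(T, MN) :- Museums(MN, CTR), Paintings(T, A, MN2, C), MN = MN2,
#             CTR = 'Italy', C > 10000
result = [
    (t, mn)
    for (mn, ctr) in museums
    for (t, a, mn2, c) in paintings
    if mn == mn2 and ctr == "Italy" and c > 10_000
]


def is_safe(relational_subgoals: List[Tuple[str, Tuple[str, ...]]],
            comparison_vars: Iterable[str]) -> bool:
    """Check that every comparison variable occurs in a relational subgoal."""
    bound = {v for _, args in relational_subgoals for v in args}
    return set(comparison_vars) <= bound


body = [("Museums", ("MN", "CTR")), ("Paintings", ("T", "A", "MN", "C"))]
safe = is_safe(body, ["CTR", "C"])
unsafe = is_safe(body, ["CTR", "Z"])  # Z is not bound by any relation
```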

21
Implementing P2P DBs in JXTA
  • We implement peers as JXTA peers, and Interest
    groups as JXTA groups
  • We extend standard JXTA peer advertisement to
    encapsulate the schema information of a peer
  • We extend peer group advertisement to encapsulate
    group topic information
  • We encode database related functionalities into a
    set of custom services (DB-related services)

[Figure: DB-related services, split into node-level services and group-level services. Labels: Queries and results handler, QS handler, Screening service, GM service]
22
GM service, extended
  • In order to compute query scopes (QS), the Group
    Manager (GM) maintains an explicit representation
    of a topic as a global schema (GS)
  • Local schemas are mapped onto GS via peer
    mappings
  • Schemas of local databases are described in terms
    of GS, i.e. local-as-view (LAV)

23
Data integration
  • A data integration system
  • A Global Schema (GS)
  • A set of Local Schemas (LS)
  • A set of mappings M between GS and LS
  • LS describes real data
  • GS is a unified integrated view of LS and appears
    for the user as one virtual database (VDB)
  • Queries are posed against the relations of VDB
    and then reformulated with respect to LS using M

[Figure: query Q posed against GS (the VDB), which is mapped to local schemas LS1 to LSn describing databases DB1 to DBn]
24
Data integration, contd
  • Mappings M
  • LS is defined in terms of GS: local-as-view (LAV)
  • LAV pro: allows dynamics in local sources
  • LAV contra: complex query reformulation
  • GS is defined in terms of LS: global-as-view
    (GAV)
  • GAV pro: easy query reformulation
  • GAV contra: difficult to handle dynamics in local
    sources

25
QS computation scenario
  • Query Q is posed locally with respect to a local
    schema
  • Q is reformulated to QGS with respect to GS using
    GAV
  • At GM, QGS is evaluated using peer mappings to
    designate destination DBs
  • LAV contra (complex query reformulation) is not a
    contra for us!
  • No complete query reformulation is needed
  • Once a query scope is computed, the query is
    propagated on a node-to-node basis

[Figure: Q is reformulated to QGS (GAV) at the node; the GM holds GS and the peer mappings (LAV)]
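
One way to read "no complete query reformulation" is that the GM only has to decide which peers are relevant, not how to rewrite the query for each of them. The sketch below assumes a deliberately coarse relevance test: a peer is in the query scope if one of its LAV views mentions a GS relation that also appears in QGS. The node names and the contents of the mappings are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical LAV peer mappings: for each node, the GS relations that
# appear in the body of each of its view definitions.
peer_mappings: Dict[str, List[Set[str]]] = {
    "node1": [{"Movie", "Credits"}],
    "node2": [{"Movie"}],
    "node3": [{"Genre"}],
}


def query_scope(query_relations: Set[str],
                mappings: Dict[str, List[Set[str]]]) -> List[str]:
    """Return the nodes with at least one view overlapping the query."""
    scope = []
    for node, views in mappings.items():
        if any(view & query_relations for view in views):
            scope.append(node)
    return sorted(scope)


# QGS mentions the GS relations Movie and Credits.
qs_movies = query_scope({"Movie", "Credits"}, peer_mappings)
qs_genre = query_scope({"Genre"}, peer_mappings)
```

A stricter test could also compare selection conditions in the views with those in the query, at the cost of moving closer to full LAV reformulation.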
26
First level architecture
[Figure: a P2P database network. Users (User-1, User-2, ..., User-n) access a node; labels inside the node: User Interface (UI), Query Manager (QM), P2P Layer, JXTA Layer, Wrapper, Local Database (LDB), DBS; the node connects to the other nodes of the P2P network]
27
Second level architecture
28
Demo
29
(A very simple example) Demo
  • Relations
  • Movie (title, year, genre)
  • Credits (name, title, role)
  • Movie2 (title, year, director)
  • Genre (title, genre)

[Figure: demo network of nodes 0-5, with labels Q, Rendezvous peer and Mediator peer]
30
Query example 1
  • Names of actors playing in action movies in
    2003
  • Q(n) :- Movie (t, y, g), Credits (n, t, r),
    r = 'Actor', g = 'Action', y = 2003

31
Query example 2
  • Titles of drama movies released after 1995
  • Q(t) :- Movie (t, y, g), g = 'Drama', y > 1995

32
Query example 3
  • List titles of movies featuring Tom Hanks
  • Q(t) :- Credits (n, t, r), n = 'Tom Hanks'

33
Some references
  • F. Giunchiglia and I. Zaihrayeu. Making peer
    databases interact - a vision for an architecture
    supporting data coordination. 6th International
    Workshop on Cooperative Information Agents
    (CIA-2002), Madrid, Spain, September 18-20,
    2002.
  • F. Giunchiglia and I. Zaihrayeu. Implementing
    database coordination in P2P networks. DIT
    technical report DIT-03-035, University of
    Trento, Italy, November 2003.
  • P. Bernstein, F. Giunchiglia, A. Kementsietsidis,
    J. Mylopoulos, L. Serafini, and I. Zaihrayeu.
    Data management for peer-to-peer computing: A
    vision. WebDB, 2002.
  • A. Halevy, Z. Ives, D. Suciu, and I. Tatarinov.
    Schema mediation in a peer data management
    system. ICDE, 2003.
  • A. Halevy. Answering queries using views: a
    survey. VLDB Journal, 2001.
  • V. Kantere, I. Kiringa, J. Mylopoulos, A.
    Kementsietsidis, and M. Arenas. Coordinating
    peer databases using ECA rules. DBISP2P,
    September 2003.
  • JXTA project, see http://www.jxta.org