Title: Distributed Data Bases:
1Lecture 7
- Distributed Data Bases
- Principles and Architectures
2Distributed Data Base (DDB) Definition
- A logically interrelated collection of shared
data, physically distributed over a computer
network. - Implies data description at two levels
- Global (the view of the whole)
- Local (where data is actually held).
3DDBMS (Distributed DBMS)
- The software system that permits the management
of the distributed data base and makes the
distribution transparent to users - Transparent users are unaware of the underlying
local structure - Data requests do not specify distribution sites
- but they may notice performance differences (e.g.
if local data has to be moved to another site
along a slow line).
4Characteristics of a DDB
- Collection of logically related shared data.
- Data is split into a number of fragments
(horizontal or vertical restrict or project). - Fragments may be replicated.
- Fragments / replicas are allocated to sites.
- Fragments are in effect views
- Replicas are duplicates only acceptable if
redundancy is controlled.
5Why distribute?
- Natural match of data with location
- Can have each division, department or office hold
its own data with some degree of autonomy. - Autonomy to have control (self-determination,
self-rule) - Users can decide policies locally (devolved).
- Still need global DBA to ensure entire system
works.
6Why distribute? (cont.)
- More flexible operation
- Improved availability
- one node failure does not bring the whole system
down. - Improved reliability
- replication ensures that copies of data are still
available if a node fails. - Improved performance
- accessing most data locally reduces network
overheads. - Readily handle expansion
- can add new nodes with local schema
- followed by simple adjustments to global
definition.
7Problems with Distribution
- Complexity
- Global and local schema must be integrated.
- Design techniques involve more stages.
- Replications rigorously handled.
- Network needs to be robust.
- Costs
- More people effort needed to handle the
complexity - although cheaper to buy power with smaller
machines rather than larger ones.
8Problems with Distribution (cont.)
- Security
- Many more potential access points for would-be
violators. - Integrity
- Need to ensure that combination of local and
global constraints gives the required effect. - Experience
- Fairly immature technology
- not yet translated to standards.
9Homogeneous and Heterogeneous DDBMS
- A homogeneous DDBMS uses the same database
product at all sites. - A heterogeneous DDBMS uses different data base
products at various sites - may arise from corporate mergers.
10Degrees of Heterogeneity
- Same software, different hardware can be handled
fairly easily. - Oracle 9i Oracle 8i differences slight.
- Oracle 9i SQL Server same underlying
relational model, different syntax in places. - Oracle 9i Objectivity object-relational
(SQL-1999) and ODMG respectively, different
underlying model.
11Interoperability
- Ability to work with each other.
- In a loosely coupled environment
- Full details of each system not needed
- BUT need to have interfaces for reliably
exchanging messages without error or
misunderstanding - Solutions
- Standardized specifications
- Mediation.
- Differences in implementation
- may still lead to breakdowns in communication.
12Simple Problem in Interoperability - 1
- Two schemas in SQL-1999
- A B
- author varchar2(50), author_surname
varchar2(40), -
author, initials varchar2(10), - title varchar2(300), title varchar2(200),
- keyword1 varchar2(30), keywd keywordarr
- keyword2 varchar2(30)
- CREATE TYPE keywordarr AS
- VARRAY(8) OF varchar2(30)
- Note homogeneous model both SQL-1999 but
difficulties.
13Different Standards
- e.g. Names
- Person (surname, first_name, )
- or Person (first_name, surname, )
- or Person (name, )
- First two may easily be made equivalent but
convention in third needs to be understood. - Note also possibilities of A.N.Other, AN Other,
A N Other, etc.
14Possible Solutions
- In schema B, define a function which amalgamates
the two parts of author into one value. - Will need to look manually at format of author in
schema A. - If format inconsistent, need some pre-processing.
- Other inconsistencies require decisions
- Fixed two entries for keyword vs. array dimension
8. - Different name for keyword attribute.
- Different size for title fields (presumably adopt
higher). - In a heterogeneous environment, we need also to
relate schema constructions, e.g. is CLASS the
same as TABLE?
15Simple Problem in Interoperability - 2
- Homogeneous Models
- The same information may be held as attribute
name, relation name or a value in different data
bases. - e.g. library fines
- could be held in a dedicated relationFine
(amount, borrower_id) - or as an attribute of Loan (id, isbn, date_out,
fine) - or as a value Charge (1.25, fine).
16Architectures for Interoperability 1
- 1. Global schema integration
- Produces a single new schema (C) for the
different information systems with schemas (A,
B). -
C
A
B
17Global Schema Integration
- Advantages
- Transparent to end users appears as a single
information system. - Disadvantages
- Difficult to perform integration needs human
understanding. - Local autonomy lost.
- Static does not evolve automatically.
- Tightly-coupled.
18Architectures for Interoperability 2
- 2. Federated Data Base Systems
- Less tightly coupled schema than in 1.
- Each service specifies sharable objects
through an export schema. - Common data model.
- Internal command language.
- Decentralised control (local autonomy).
- 5-level architecture for federated system.
- e.g. Objectivity as Federated OODBMS
19FDBMS Terminology
- IS Internal Schema defining layout on disk of a
conceptual schema - CS Conceptual Schema defining logical data base
(e.g. relational tables, attributes, domains). - ES External Schema defining views on conceptual
schema.
20Federated DBMS 5-level Architecture
Global ES
Global ES
Global CS
Local ES
Local ES
Local CS
Local CS
Local IS
Local IS
DB
DB
21Federated Data BaseLoosely coupled
- Created by users.
- AE, BE are export schemas.
- V is a view.
- A, B are base schemas, retaining autonomy over
those parts not exported.
V
B
A
BE
AE
22Federated Data BasesTightly coupled
- Created by administrators.
- Global schema integration on all export schemas.
- More formal than loosely-coupled.
- Much effort to resolve semantic inconsistencies.
23Federated Data Base Systems General Advantages
- Local autonomy preserved.
- Not all data needs to be integrated.
- Provide meta-data structures for views (external
and export schema, data dictionary).
24Federated Database Systems - Disadvantages by
Approach
- Loosely coupled
- Duplication by different users in building views.
- Updating data defined in views can be difficult.
- Tightly coupled
- Similar to global schema integration
- Complex, difficult to make changes dynamically.
- Much effort needed to resolve semantic
inconsistencies.
25Multi-Data-Base Language Approach
- No attempt at schema integration.
- All sites maintain complete autonomy.
- The various schemas can be heterogeneous,
inconsistent w.r.t. services provided, and
duplicate information in different ways. - Language (e.g. MSQL) is used to integrate data
bases at run time. - Relational model used as Common Data Model.
26Multi-Data-Base Language Approach
- A, B are schemas.
- MSQL is the run-time language.
MSQL
B
A
27Multi-Data-Base Language Approach Advantages
- No preparatory work to understand semantics of
schema. - Dynamic access latest versions.
- Very skilled users can succeed in reaching their
goals. - Interesting work on multi-data-base dependencies.
28An Example Multi-Data-Base Language
- MSQL (Multidatabase SQL)
- Biased towards the relational model.
- Illustrates problems.
- Consider 2 data bases
- Each on publications of a computing society
- and query
- What is the name, e-mail address, title for
each publication of an author appearing in both
of the societys data bases?
29MSQL Schema
- Schema 1 (for AIIA Database)
- Contacts (PersonID, Name, Email, )
- Conference (Name, Type, )
- Attendees (ID, Conf_ID, Speaker, )
- Publ_Papers (P_ID, Title, Author_ID, )
- Schema 2 (for IFIP Database)
- Member_Socs (Soc_Name, )
- Conf (Conf_ID, )
- Publ_Papers (P_Ref, Title, Conf_Ref, )
- Authors (Name, Email, Paper_ID, )
- Underlined attributes are primary keys.
- Attributes in italics are foreign keys.
30MSQL for Query
- USE AIIA, IFIP
- SELECT Name, Email, Title
- FROM Authors,
- IFIP.Publ_Papers IFIP_Paper,
- Contacts,
- AIIA.Publ_papers AIIA_Paper
- WHERE Authors.Name Contacts.Name
- AND Contacts.Person_ID AIIA_Paper.Author_ID
- AND Authors.Paper_ID IFIP_Paper.P_Ref
- The USE clause declares the multi-data-bases
which are used as qualifiers in the FROM clause
to distinguish tables with the same name
(thereafter distinguished by aliasing). - Retrieves Name, E-mail address and Title from
both data bases.
31Potential Problems with MSQL
- Are names and domains of corresponding attributes
the same? - Can use LET command to create equivalences of
names, but this does not solve domain
incompatibility. - What if one schema is not relational? The E-R
model is often used as a neutral schema for
translation and comparison of heterogeneous
features.
32Multi-data-base Language Disadvantages in
General
- Distribution is not transparent.
- Users must resolve inconsistencies themselves.
- Common language may restrict scope of
heterogeneity (relational bias). - Local autonomous systems may change their schema
freely (so existing queries fail).
33Comparison of Approaches
- By coupling
- How tightly is the interoperable system connected
to its underlying systems? - By adaptability
- How freely can the interoperable system evolve in
line with the underlying schema? - By transparency
- How much understanding of the interoperable
system do end-users need to have?
34Comparison of Approaches
- Approach Coupling Adaptability Transparency
- Global Schema Tight Low High
- Integration
- Federated Medium Medium Medium
- Data Bases
- Multi-data-base Loose High Low
- Languages
35Summary
- Trend
- From Global Schema Integration
- through Federated Data Bases
- to Multi-data-base Language.
- Towards looser coupling, higher adaptability, and
lower transparency.
36Further Reading
- Management of Heterogeneous and Autonomous
Database Systems - Elmagarmid, Ahmed
- Rusinkiewicz, Marek
- Sheth, Amit
-
- Morgan Kaufmann (1999).