Distributed Data Bases: - PowerPoint PPT Presentation

1 / 36

About This Presentation

Title:

Distributed Data Bases:

Description:

A logically interrelated collection of shared data, physically distributed over ... (e.g. if local data has to be moved to another site along a slow line) ... – PowerPoint PPT presentation

Number of Views:40

Avg rating:3.0/5.0

Slides: 37

Provided by: akhta

Category:

more less

Transcript and Presenter's Notes

Title: Distributed Data Bases:

1
Lecture 7

Distributed Data Bases
Principles and Architectures

2
Distributed Data Base (DDB) Definition

A logically interrelated collection of shared
data, physically distributed over a computer
network.
Implies data description at two levels
Global (the view of the whole)
Local (where data is actually held).

3
DDBMS (Distributed DBMS)

The software system that permits the management
of the distributed data base and makes the
distribution transparent to users
Transparent users are unaware of the underlying
local structure
Data requests do not specify distribution sites
but they may notice performance differences (e.g.
if local data has to be moved to another site
along a slow line).

4
Characteristics of a DDB

Collection of logically related shared data.
Data is split into a number of fragments
(horizontal or vertical restrict or project).
Fragments may be replicated.
Fragments / replicas are allocated to sites.
Fragments are in effect views
Replicas are duplicates only acceptable if
redundancy is controlled.

5
Why distribute?

Natural match of data with location
Can have each division, department or office hold
its own data with some degree of autonomy.
Autonomy to have control (self-determination,
self-rule)
Users can decide policies locally (devolved).
Still need global DBA to ensure entire system
works.

6
Why distribute? (cont.)

More flexible operation
Improved availability
one node failure does not bring the whole system
down.
Improved reliability
replication ensures that copies of data are still
available if a node fails.
Improved performance
accessing most data locally reduces network
overheads.
Readily handle expansion
can add new nodes with local schema
followed by simple adjustments to global
definition.

7
Problems with Distribution

Complexity
Global and local schema must be integrated.
Design techniques involve more stages.
Replications rigorously handled.
Network needs to be robust.
Costs
More people effort needed to handle the
complexity
although cheaper to buy power with smaller
machines rather than larger ones.

8
Problems with Distribution (cont.)

Security
Many more potential access points for would-be
violators.
Integrity
Need to ensure that combination of local and
global constraints gives the required effect.
Experience
Fairly immature technology
not yet translated to standards.

9
Homogeneous and Heterogeneous DDBMS

A homogeneous DDBMS uses the same database
product at all sites.
A heterogeneous DDBMS uses different data base
products at various sites
may arise from corporate mergers.

10
Degrees of Heterogeneity

Same software, different hardware can be handled
fairly easily.
Oracle 9i Oracle 8i differences slight.
Oracle 9i SQL Server same underlying
relational model, different syntax in places.
Oracle 9i Objectivity object-relational
(SQL-1999) and ODMG respectively, different
underlying model.

11
Interoperability

Ability to work with each other.
In a loosely coupled environment
Full details of each system not needed
BUT need to have interfaces for reliably
exchanging messages without error or
misunderstanding
Solutions
Standardized specifications
Mediation.
Differences in implementation
may still lead to breakdowns in communication.

12
Simple Problem in Interoperability - 1

Two schemas in SQL-1999
A B
author varchar2(50), author_surname
varchar2(40),
author, initials varchar2(10),
title varchar2(300), title varchar2(200),
keyword1 varchar2(30), keywd keywordarr
keyword2 varchar2(30)
CREATE TYPE keywordarr AS
VARRAY(8) OF varchar2(30)
Note homogeneous model both SQL-1999 but
difficulties.

13
Different Standards

e.g. Names
Person (surname, first_name, )
or Person (first_name, surname, )
or Person (name, )
First two may easily be made equivalent but
convention in third needs to be understood.
Note also possibilities of A.N.Other, AN Other,
A N Other, etc.

14
Possible Solutions

In schema B, define a function which amalgamates
the two parts of author into one value.
Will need to look manually at format of author in
schema A.
If format inconsistent, need some pre-processing.
Other inconsistencies require decisions
Fixed two entries for keyword vs. array dimension
8.
Different name for keyword attribute.
Different size for title fields (presumably adopt
higher).
In a heterogeneous environment, we need also to
relate schema constructions, e.g. is CLASS the
same as TABLE?

15
Simple Problem in Interoperability - 2

Homogeneous Models
The same information may be held as attribute
name, relation name or a value in different data
bases.
e.g. library fines
could be held in a dedicated relationFine
(amount, borrower_id)
or as an attribute of Loan (id, isbn, date_out,
fine)
or as a value Charge (1.25, fine).

16
Architectures for Interoperability 1

1. Global schema integration
Produces a single new schema (C) for the
different information systems with schemas (A,
B).

C
A
B
17
Global Schema Integration

Advantages
Transparent to end users appears as a single
information system.
Disadvantages
Difficult to perform integration needs human
understanding.
Local autonomy lost.
Static does not evolve automatically.
Tightly-coupled.

18
Architectures for Interoperability 2

2. Federated Data Base Systems
Less tightly coupled schema than in 1.
Each service specifies sharable objects
through an export schema.
Common data model.
Internal command language.
Decentralised control (local autonomy).
5-level architecture for federated system.
e.g. Objectivity as Federated OODBMS

19
FDBMS Terminology

IS Internal Schema defining layout on disk of a
conceptual schema
CS Conceptual Schema defining logical data base
(e.g. relational tables, attributes, domains).
ES External Schema defining views on conceptual
schema.

20
Federated DBMS 5-level Architecture
Global ES
Global ES
Global CS
Local ES
Local ES
Local CS
Local CS
Local IS
Local IS
DB
DB
21
Federated Data BaseLoosely coupled

Created by users.
AE, BE are export schemas.
V is a view.
A, B are base schemas, retaining autonomy over
those parts not exported.

V
B
A
BE
AE
22
Federated Data BasesTightly coupled

Created by administrators.
Global schema integration on all export schemas.
More formal than loosely-coupled.
Much effort to resolve semantic inconsistencies.

23
Federated Data Base Systems General Advantages

Local autonomy preserved.
Not all data needs to be integrated.
Provide meta-data structures for views (external
and export schema, data dictionary).

24
Federated Database Systems - Disadvantages by
Approach

Loosely coupled
Duplication by different users in building views.
Updating data defined in views can be difficult.
Tightly coupled
Similar to global schema integration
Complex, difficult to make changes dynamically.
Much effort needed to resolve semantic
inconsistencies.

25
Multi-Data-Base Language Approach

No attempt at schema integration.
All sites maintain complete autonomy.
The various schemas can be heterogeneous,
inconsistent w.r.t. services provided, and
duplicate information in different ways.
Language (e.g. MSQL) is used to integrate data
bases at run time.
Relational model used as Common Data Model.

26
Multi-Data-Base Language Approach

A, B are schemas.
MSQL is the run-time language.

MSQL
B
A
27
Multi-Data-Base Language Approach Advantages

No preparatory work to understand semantics of
schema.
Dynamic access latest versions.
Very skilled users can succeed in reaching their
goals.
Interesting work on multi-data-base dependencies.

28
An Example Multi-Data-Base Language

MSQL (Multidatabase SQL)
Biased towards the relational model.
Illustrates problems.
Consider 2 data bases
Each on publications of a computing society
and query
What is the name, e-mail address, title for
each publication of an author appearing in both
of the societys data bases?

29
MSQL Schema

Schema 1 (for AIIA Database)
Contacts (PersonID, Name, Email, )
Conference (Name, Type, )
Attendees (ID, Conf_ID, Speaker, )
Publ_Papers (P_ID, Title, Author_ID, )
Schema 2 (for IFIP Database)
Member_Socs (Soc_Name, )
Conf (Conf_ID, )
Publ_Papers (P_Ref, Title, Conf_Ref, )
Authors (Name, Email, Paper_ID, )
Underlined attributes are primary keys.
Attributes in italics are foreign keys.

30
MSQL for Query

USE AIIA, IFIP
SELECT Name, Email, Title
FROM Authors,
IFIP.Publ_Papers IFIP_Paper,
Contacts,
AIIA.Publ_papers AIIA_Paper
WHERE Authors.Name Contacts.Name
AND Contacts.Person_ID AIIA_Paper.Author_ID
AND Authors.Paper_ID IFIP_Paper.P_Ref
The USE clause declares the multi-data-bases
which are used as qualifiers in the FROM clause
to distinguish tables with the same name
(thereafter distinguished by aliasing).
Retrieves Name, E-mail address and Title from
both data bases.

31
Potential Problems with MSQL

Are names and domains of corresponding attributes
the same?
Can use LET command to create equivalences of
names, but this does not solve domain
incompatibility.
What if one schema is not relational? The E-R
model is often used as a neutral schema for
translation and comparison of heterogeneous
features.

32
Multi-data-base Language Disadvantages in
General

Distribution is not transparent.
Users must resolve inconsistencies themselves.
Common language may restrict scope of
heterogeneity (relational bias).
Local autonomous systems may change their schema
freely (so existing queries fail).

33
Comparison of Approaches

By coupling
How tightly is the interoperable system connected
to its underlying systems?
By adaptability
How freely can the interoperable system evolve in
line with the underlying schema?
By transparency
How much understanding of the interoperable
system do end-users need to have?

34
Comparison of Approaches