Distributed Data Bases: - PowerPoint PPT Presentation

1 / 36
About This Presentation
Title:

Distributed Data Bases:

Description:

A logically interrelated collection of shared data, physically distributed over ... (e.g. if local data has to be moved to another site along a slow line) ... – PowerPoint PPT presentation

Number of Views:40
Avg rating:3.0/5.0
Slides: 37
Provided by: akhta
Category:

less

Transcript and Presenter's Notes

Title: Distributed Data Bases:


1
Lecture 7
  • Distributed Data Bases
  • Principles and Architectures

2
Distributed Data Base (DDB) Definition
  • A logically interrelated collection of shared
    data, physically distributed over a computer
    network.
  • Implies data description at two levels
  • Global (the view of the whole)
  • Local (where data is actually held).

3
DDBMS (Distributed DBMS)
  • The software system that permits the management
    of the distributed data base and makes the
    distribution transparent to users
  • Transparent users are unaware of the underlying
    local structure
  • Data requests do not specify distribution sites
  • but they may notice performance differences (e.g.
    if local data has to be moved to another site
    along a slow line).

4
Characteristics of a DDB
  • Collection of logically related shared data.
  • Data is split into a number of fragments
    (horizontal or vertical restrict or project).
  • Fragments may be replicated.
  • Fragments / replicas are allocated to sites.
  • Fragments are in effect views
  • Replicas are duplicates only acceptable if
    redundancy is controlled.

5
Why distribute?
  • Natural match of data with location
  • Can have each division, department or office hold
    its own data with some degree of autonomy.
  • Autonomy to have control (self-determination,
    self-rule)
  • Users can decide policies locally (devolved).
  • Still need global DBA to ensure entire system
    works.

6
Why distribute? (cont.)
  • More flexible operation
  • Improved availability
  • one node failure does not bring the whole system
    down.
  • Improved reliability
  • replication ensures that copies of data are still
    available if a node fails.
  • Improved performance
  • accessing most data locally reduces network
    overheads.
  • Readily handle expansion
  • can add new nodes with local schema
  • followed by simple adjustments to global
    definition.

7
Problems with Distribution
  • Complexity
  • Global and local schema must be integrated.
  • Design techniques involve more stages.
  • Replications rigorously handled.
  • Network needs to be robust.
  • Costs
  • More people effort needed to handle the
    complexity
  • although cheaper to buy power with smaller
    machines rather than larger ones.

8
Problems with Distribution (cont.)
  • Security
  • Many more potential access points for would-be
    violators.
  • Integrity
  • Need to ensure that combination of local and
    global constraints gives the required effect.
  • Experience
  • Fairly immature technology
  • not yet translated to standards.

9
Homogeneous and Heterogeneous DDBMS
  • A homogeneous DDBMS uses the same database
    product at all sites.
  • A heterogeneous DDBMS uses different data base
    products at various sites
  • may arise from corporate mergers.

10
Degrees of Heterogeneity
  • Same software, different hardware can be handled
    fairly easily.
  • Oracle 9i Oracle 8i differences slight.
  • Oracle 9i SQL Server same underlying
    relational model, different syntax in places.
  • Oracle 9i Objectivity object-relational
    (SQL-1999) and ODMG respectively, different
    underlying model.

11
Interoperability
  • Ability to work with each other.
  • In a loosely coupled environment
  • Full details of each system not needed
  • BUT need to have interfaces for reliably
    exchanging messages without error or
    misunderstanding
  • Solutions
  • Standardized specifications
  • Mediation.
  • Differences in implementation
  • may still lead to breakdowns in communication.

12
Simple Problem in Interoperability - 1
  • Two schemas in SQL-1999
  • A B
  • author varchar2(50), author_surname
    varchar2(40),

  • author, initials varchar2(10),
  • title varchar2(300), title varchar2(200),
  • keyword1 varchar2(30), keywd keywordarr
  • keyword2 varchar2(30)
  • CREATE TYPE keywordarr AS
  • VARRAY(8) OF varchar2(30)
  • Note homogeneous model both SQL-1999 but
    difficulties.

13
Different Standards
  • e.g. Names
  • Person (surname, first_name, )
  • or Person (first_name, surname, )
  • or Person (name, )
  • First two may easily be made equivalent but
    convention in third needs to be understood.
  • Note also possibilities of A.N.Other, AN Other,
    A N Other, etc.

14
Possible Solutions
  • In schema B, define a function which amalgamates
    the two parts of author into one value.
  • Will need to look manually at format of author in
    schema A.
  • If format inconsistent, need some pre-processing.
  • Other inconsistencies require decisions
  • Fixed two entries for keyword vs. array dimension
    8.
  • Different name for keyword attribute.
  • Different size for title fields (presumably adopt
    higher).
  • In a heterogeneous environment, we need also to
    relate schema constructions, e.g. is CLASS the
    same as TABLE?

15
Simple Problem in Interoperability - 2
  • Homogeneous Models
  • The same information may be held as attribute
    name, relation name or a value in different data
    bases.
  • e.g. library fines
  • could be held in a dedicated relationFine
    (amount, borrower_id)
  • or as an attribute of Loan (id, isbn, date_out,
    fine)
  • or as a value Charge (1.25, fine).

16
Architectures for Interoperability 1
  • 1. Global schema integration
  • Produces a single new schema (C) for the
    different information systems with schemas (A,
    B).

C
A
B
17
Global Schema Integration
  • Advantages
  • Transparent to end users appears as a single
    information system.
  • Disadvantages
  • Difficult to perform integration needs human
    understanding.
  • Local autonomy lost.
  • Static does not evolve automatically.
  • Tightly-coupled.

18
Architectures for Interoperability 2
  • 2. Federated Data Base Systems
  • Less tightly coupled schema than in 1.
  • Each service specifies sharable objects
    through an export schema.
  • Common data model.
  • Internal command language.
  • Decentralised control (local autonomy).
  • 5-level architecture for federated system.
  • e.g. Objectivity as Federated OODBMS

19
FDBMS Terminology
  • IS Internal Schema defining layout on disk of a
    conceptual schema
  • CS Conceptual Schema defining logical data base
    (e.g. relational tables, attributes, domains).
  • ES External Schema defining views on conceptual
    schema.

20
Federated DBMS 5-level Architecture
Global ES
Global ES
Global CS
Local ES
Local ES
Local CS
Local CS
Local IS
Local IS
DB
DB
21
Federated Data BaseLoosely coupled
  • Created by users.
  • AE, BE are export schemas.
  • V is a view.
  • A, B are base schemas, retaining autonomy over
    those parts not exported.

V
B
A
BE
AE
22
Federated Data BasesTightly coupled
  • Created by administrators.
  • Global schema integration on all export schemas.
  • More formal than loosely-coupled.
  • Much effort to resolve semantic inconsistencies.

23
Federated Data Base Systems General Advantages
  • Local autonomy preserved.
  • Not all data needs to be integrated.
  • Provide meta-data structures for views (external
    and export schema, data dictionary).

24
Federated Database Systems - Disadvantages by
Approach
  • Loosely coupled
  • Duplication by different users in building views.
  • Updating data defined in views can be difficult.
  • Tightly coupled
  • Similar to global schema integration
  • Complex, difficult to make changes dynamically.
  • Much effort needed to resolve semantic
    inconsistencies.

25
Multi-Data-Base Language Approach
  • No attempt at schema integration.
  • All sites maintain complete autonomy.
  • The various schemas can be heterogeneous,
    inconsistent w.r.t. services provided, and
    duplicate information in different ways.
  • Language (e.g. MSQL) is used to integrate data
    bases at run time.
  • Relational model used as Common Data Model.

26
Multi-Data-Base Language Approach
  • A, B are schemas.
  • MSQL is the run-time language.

MSQL
B
A
27
Multi-Data-Base Language Approach Advantages
  • No preparatory work to understand semantics of
    schema.
  • Dynamic access latest versions.
  • Very skilled users can succeed in reaching their
    goals.
  • Interesting work on multi-data-base dependencies.

28
An Example Multi-Data-Base Language
  • MSQL (Multidatabase SQL)
  • Biased towards the relational model.
  • Illustrates problems.
  • Consider 2 data bases
  • Each on publications of a computing society
  • and query
  • What is the name, e-mail address, title for
    each publication of an author appearing in both
    of the societys data bases?

29
MSQL Schema
  • Schema 1 (for AIIA Database)
  • Contacts (PersonID, Name, Email, )
  • Conference (Name, Type, )
  • Attendees (ID, Conf_ID, Speaker, )
  • Publ_Papers (P_ID, Title, Author_ID, )
  • Schema 2 (for IFIP Database)
  • Member_Socs (Soc_Name, )
  • Conf (Conf_ID, )
  • Publ_Papers (P_Ref, Title, Conf_Ref, )
  • Authors (Name, Email, Paper_ID, )
  • Underlined attributes are primary keys.
  • Attributes in italics are foreign keys.

30
MSQL for Query
  • USE AIIA, IFIP
  • SELECT Name, Email, Title
  • FROM Authors,
  • IFIP.Publ_Papers IFIP_Paper,
  • Contacts,
  • AIIA.Publ_papers AIIA_Paper
  • WHERE Authors.Name Contacts.Name
  • AND Contacts.Person_ID AIIA_Paper.Author_ID
  • AND Authors.Paper_ID IFIP_Paper.P_Ref
  • The USE clause declares the multi-data-bases
    which are used as qualifiers in the FROM clause
    to distinguish tables with the same name
    (thereafter distinguished by aliasing).
  • Retrieves Name, E-mail address and Title from
    both data bases.

31
Potential Problems with MSQL
  • Are names and domains of corresponding attributes
    the same?
  • Can use LET command to create equivalences of
    names, but this does not solve domain
    incompatibility.
  • What if one schema is not relational? The E-R
    model is often used as a neutral schema for
    translation and comparison of heterogeneous
    features.

32
Multi-data-base Language Disadvantages in
General
  • Distribution is not transparent.
  • Users must resolve inconsistencies themselves.
  • Common language may restrict scope of
    heterogeneity (relational bias).
  • Local autonomous systems may change their schema
    freely (so existing queries fail).

33
Comparison of Approaches
  • By coupling
  • How tightly is the interoperable system connected
    to its underlying systems?
  • By adaptability
  • How freely can the interoperable system evolve in
    line with the underlying schema?
  • By transparency
  • How much understanding of the interoperable
    system do end-users need to have?

34
Comparison of Approaches
  • Approach Coupling Adaptability Transparency
  • Global Schema Tight Low High
  • Integration
  • Federated Medium Medium Medium
  • Data Bases
  • Multi-data-base Loose High Low
  • Languages

35
Summary
  • Trend
  • From Global Schema Integration
  • through Federated Data Bases
  • to Multi-data-base Language.
  • Towards looser coupling, higher adaptability, and
    lower transparency.

36
Further Reading
  • Management of Heterogeneous and Autonomous
    Database Systems
  • Elmagarmid, Ahmed
  • Rusinkiewicz, Marek
  • Sheth, Amit
  • Morgan Kaufmann (1999).
Write a Comment
User Comments (0)
About PowerShow.com