Distributed Databases - PowerPoint PPT Presentation

1 / 24
About This Presentation
Title:

Distributed Databases

Description:

A distributed database system consists of loosely coupled sites that share no ... is used for hotel and airline reservations, stock market transactions, banking ... – PowerPoint PPT presentation

Number of Views:58
Avg rating:3.0/5.0
Slides: 25
Provided by: ben78
Category:

less

Transcript and Presenter's Notes

Title: Distributed Databases


1
Distributed Databases
  • by Chien-Pin Hsu
  • CS157B Section 1
  • Nov 11, 2004
  • Dr. Sin-Min Lee

2
Distributed Database System
  • A distributed database system consists of loosely
    coupled sites that share no physical component
  • Appears to user as a single system
  • Database systems that run on each site are
    independent of each other
  • Processing maybe done at a site other than the
    initiator of request

3
Homogenous Distributed Database Systems
  • All sites have identical software
  • They are aware of each other and agree to
    cooperate in processing user requests
  • It appears to user as a single system

4
An Homogenous Distributed Database Systems
example
  • A distributed system connects three databases
    hq, mfg, and sales
  • An application can simultaneously access or
    modify the data in several databases in a single
    distributed environment.

5
What can we do?
  • A single query from a Manufacturing client on
    local database mfg can retrieve joined data from
    the products table on the local database and the
    dept table on the remote hq database.
  • For a client application, the location and
    platform of the databases are transparent.

6
Makes life easier!!
  • For example, if you are connected to database mfg
    but want to access data on database hq, creating
    a synonym on mfg for the remote dept table
    enables you to issue this query
  • SELECT
  • FROM dept
  • In this way, a distributed system gives the
    appearance of native data access.
  • Users on mfg do not have to know that the data
    they access resides on remote databases.

7
Heterogeneous Distributed Database System
  • In a heterogeneous distributed database system,
    at least one of the databases uses different
    schemas and software.
  • A database system having different schema may
    cause a major problem for query processing.
  • A database system having different software may
    cause a major problem for transaction processing.

8
Distributed Data Storage
  • Replication
  • System maintains multiple copies of data, stored
    in different sites, for faster retrieval and
    fault tolerance.
  • Fragmentation
  • Relation is partitioned into several fragments
    stored in distinct sites
  • Replication and fragmentation can be combined
  • Relation is partitioned into several fragments
    system maintains several identical replicas of
    each such fragment.

9
Advantages of Replication
  • Availability failure of site containing
    relation r does not result in unavailability of r
    is replicas exist.
  • Parallelism queries on r may be processed by
    several nodes in parallel.
  • Reduced data transfer relation r is available
    locally at each site containing a replica of r.

10
Disadvantages of Replication
  • Increased cost of updates each replica of
    relation r must be updated.
  • Increased complexity of concurrency control
    concurrent updates to distinct replicas may lead
    to inconsistent data unless special concurrency
    control mechanisms are implemented.
  • One solution choose one copy as primary copy and
    apply concurrency control operations on primary
    copy.

11
Fragmentation
  • Data can be distributed by storing individual
    tables at different sites
  • Data can also be distributed by decomposing a
    table and storing portions at different sites
    called Fragmentation
  • Fragmentation can be horizontal or vertical

12
Why use Fragmentation?
  • Usage - in general applications use views so its
    appropriate to work with subsets
  • Efficiency - data stored close to where it is
    most frequently used
  • Parallelism - a transaction can divided into
    several sub-queries to increase degree of
    concurrency
  • Security - data more secure - only stored where
    it is needed

Disadvantages Performance - may be
slower Integrity - more difficult
13
Horizontal Fragmentation
  • Each fragment, Ti , of table T contains a subset
    of the rows
  • Each tuple of T is assigned to one or more
    fragments.
  • Horizontal fragmentation is lossless

14
Horizontal Fragmentation Example
  • A bank account schema has a relation
  • Account-schema (branch-name, account-number,
    balance).
  • It fragments the relation by location and stores
    each fragment locally rows with branch-name
    Hillside are stored in the Hillside in a
    fragment

15
Vertical Fragmentation
  • Each fragment, Ti, of T contains a subset of the
    columns, each column is in at least one fragment,
    and each fragment includes the key
  • Ti ?attr_listi (T)
  • T T1 T2 .. Tn
  • All schemas must contain a common candidate key
    (or superkey) to ensure lossless join property.
  • A special attribute, the tuple-id attribute may
    be added to each schema to serve as a candidate
    key.

16
Vertical Fragmentation Example
  • A employee-info schema has a relation
  • employee-info schema (designation, name,
    Employee-id, salary).
  • It fragments the relation to put information in
    two tables for security concern.

17
Commit Protocols
  • Commit protocols are used to ensure atomicity
    across sites
  • Atomicity states that database modifications must
    follow an all or nothing rule.
  • a transaction which executes at multiple sites
    must either be committed at all the sites, or
    aborted at all the sites.

18
The Two-Phase Commit (2 PC) Protocol
  • What is this?
  • Two-phase commit is a transaction protocol
    designed for the complications that arise with
    distributed resource managers.
  • Two-phase commit technology is used for hotel and
    airline reservations, stock market transactions,
    banking applications, and credit card systems.
  • With a two-phase commit protocol, the distributed
    transaction manager employs a coordinator to
    manage the individual resource managers. The
    commit process proceeds as follows

19
Phase1 Obtaining a Decision
  • Step 1 ? Coordinator asks all participants to
    prepare to commit transaction Ti.
  • Ci adds the records ltprepare Tgt to the log and
    forces log to stable storage (a log is a file
    which maintains a record of all changes to the
    database)
  • sends prepare T messages to all sites where T
    executed

20
Phase1 Making a Decision
  • Step 2 ? Upon receiving message, transaction
    manager at site determines if it can commit the
    transaction
  • if not
  • add a record ltno Tgt to the log and send abort
    T message to Ci
  • if the transaction can be committed, then
  • 1). add the record ltready Tgt to the log
  • 2). force all records for T to stable storage
  • 3). send ready T message to Ci

21
Phase 2 Recording the Decision
  • Step 1 ? T can be committed of Ci received a
    ready T message from all the participating sites
    otherwise T must be aborted.
  • Step 2 ? Coordinator adds a decision record,
    ltcommit Tgt or ltabort Tgt, to the log and forces
    record onto stable storage. Once the record is in
    stable storage, it cannot be revoked (even if
    failures occur)
  • Step 3 ? Coordinator sends a message to each
    participant informing it of the decision (commit
    or abort)
  • Step 4 ? Participants take appropriate action
    locally.

22
Two-Phase Commit Diagram
23
Costs and Limitations
  • There have been two performance issues with two
    phase commit
  • If one database server is unavailable, none of
    the servers gets the updates.
  • This is correctable through network tuning and
    correctly building the data distribution through
    database optimization techniques.

24
(No Transcript)
Write a Comment
User Comments (0)
About PowerShow.com