Introduction to Distributed Databases - PowerPoint PPT Presentation

1
Introduction to Distributed Databases
  • Yiwei Wu

2
Introduction
  • A distributed database is a database in which
    portions of the database are stored on multiple
    computers within a network.

[Figure: a distributed DB contrasted with a centralized DB]

3
Introduction Cont.
  • Disadvantages
    • Complexity
    • Economics
    • Security
    • Difficult to maintain integrity
    • Inexperience
  • Advantages
    • Reflects organizational structure
    • Local autonomy
    • Improved availability
    • Improved performance
    • Economics
    • Modularity

4
Types of DDBS
  • Homogeneous
    • Uses one DBMS for all the servers in the system
      (e.g., Oracle or MS-SQL).
  • Heterogeneous
    • Uses two or more different DBMSs for different
      database servers (e.g., Oracle, MS-SQL, and
      PostgreSQL).

5
Data Fragmentation
  • Horizontal fragments
    • Subsets of tuples (rows) from a relation (table).
  • Vertical fragments
    • Subsets of attributes (columns) from a relation
      (table).
  • Mixed fragments
    • Fragments that are both horizontally and vertically
      fragmented.
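
The three kinds of fragment can be illustrated on a toy relation. This is a minimal sketch: the rows, the column values, and the helper names `horizontal_fragment` and `vertical_fragment` are all made up for illustration and do not come from the slides.

```python
# Hypothetical Employee relation; rows and values are illustrative only.
employee = [
    {"SSN": "111", "Fname": "Ann", "Lname": "Lee", "Dno": 1},
    {"SSN": "222", "Fname": "Bob", "Lname": "Ray", "Dno": 2},
    {"SSN": "333", "Fname": "Cid", "Lname": "Fox", "Dno": 1},
]


def horizontal_fragment(rows, predicate):
    """Keep whole rows that satisfy the predicate (a subset of tuples)."""
    return [dict(r) for r in rows if predicate(r)]


def vertical_fragment(rows, columns):
    """Keep only the named columns of every row (a subset of attributes)."""
    return [{c: r[c] for c in columns} for r in rows]


# Horizontal: only the employees of department 1.
dept1 = horizontal_fragment(employee, lambda r: r["Dno"] == 1)

# Vertical: only the name columns, keeping the key SSN so that
# the fragment could later be rejoined with the other columns.
names = vertical_fragment(employee, ["SSN", "Fname", "Lname"])

# Mixed: a vertical cut applied to a horizontal fragment.
mixed = vertical_fragment(dept1, ["SSN", "Lname"])
```

Keeping the key column in every vertical fragment is a common design choice, since without it the original rows cannot be reconstructed by a join.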

6
Replication
  • Full replication
    • The whole database is replicated at every site in
      the distributed system.
  • No replication
    • Each fragment is stored at exactly one site.
  • Partial replication
    • Some fragments of the database may be replicated
      whereas others may not.
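
One way to make the three cases concrete is to describe an allocation as a map from fragment to the set of sites holding a copy. The sketch below assumes that representation; the function name `replication_kind` and the site and fragment names are hypothetical.

```python
def replication_kind(allocation, sites):
    """Classify an allocation.

    allocation maps a fragment name to the set of sites holding a copy;
    sites lists every site in the system. Both are illustrative inputs.
    """
    all_sites = set(sites)
    if all(copies == all_sites for copies in allocation.values()):
        return "full"      # every fragment is stored at every site
    if all(len(copies) == 1 for copies in allocation.values()):
        return "none"      # every fragment is stored at exactly one site
    return "partial"       # some fragments are replicated, others are not


sites = ["S1", "S2"]
full = replication_kind({"F1": {"S1", "S2"}, "F2": {"S1", "S2"}}, sites)
none = replication_kind({"F1": {"S1"}, "F2": {"S2"}}, sites)
partial = replication_kind({"F1": {"S1", "S2"}, "F2": {"S2"}}, sites)
```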

7
Query Processing
  • Site 1: Employee
    • 10,000 records, 100 bytes each
    • Employee(Fname, Lname, SSN, ..., Dno)
  • Site 2: Department
    • 100 records, 35 bytes each
    • Department(Dnumber, Dname, ...)
  • Site 3: Result
  • Q

8
Distributed Query
  • Transfer Employee to Site 3
  • Transfer Department to Site 3
  • Perform the join at Site 3
  • Cost: 1,000,000 + 3,500 = 1,003,500 bytes
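
The transfer cost of this strategy follows directly from the relation sizes given for Site 1 and Site 2. A small sketch of the arithmetic, with constant and function names chosen here for illustration:

```python
# Sizes taken from the Query Processing slide.
EMPLOYEE_RECORDS, EMPLOYEE_RECORD_BYTES = 10_000, 100
DEPARTMENT_RECORDS, DEPARTMENT_RECORD_BYTES = 100, 35


def relation_bytes(records, record_bytes):
    """Bytes needed to ship a whole relation to another site."""
    return records * record_bytes


employee_bytes = relation_bytes(EMPLOYEE_RECORDS, EMPLOYEE_RECORD_BYTES)
department_bytes = relation_bytes(DEPARTMENT_RECORDS, DEPARTMENT_RECORD_BYTES)

# Shipping both relations to Site 3 costs the sum of their sizes.
ship_all_cost = employee_bytes + department_bytes
print(ship_all_cost)
```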

9
Semijoin
  • The idea of the semijoin operation is to reduce the
    number of tuples in a relation before transferring it
    to another site.
  • Project the join attribute of Department at Site 2 and
    transfer it to Site 1.
    • Cost: 4 × 100 = 400 bytes
  • Join with Employee at Site 1, and transfer the result
    to Site 3.
    • Cost: 34 × 10,000 = 340,000 bytes
  • Total cost: 340,400 bytes
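
The semijoin steps can be sketched on toy data, together with the cost arithmetic from the slide. The sample rows and the helper name `semijoin_cost` are hypothetical; the 4-byte and 34-byte widths are read off the slide's own figures.

```python
def semijoin_cost(join_attr_bytes, dept_records, shipped_tuple_bytes, shipped_tuples):
    """Bytes moved by the two transfers of the semijoin strategy."""
    step1 = join_attr_bytes * dept_records        # projected join column to Site 1
    step2 = shipped_tuple_bytes * shipped_tuples  # joined tuples shipped onward
    return step1 + step2


# Toy relations (illustrative values only).
department = [{"Dnumber": 1, "Dname": "R&D"}, {"Dnumber": 2, "Dname": "Sales"}]
employee = [
    {"Fname": "Ann", "Lname": "Lee", "Dno": 1},
    {"Fname": "Bob", "Lname": "Ray", "Dno": 2},
    {"Fname": "Cid", "Lname": "Fox", "Dno": 9},  # no matching department
]

# Step 1: project the join attribute at Site 2 and ship only that column.
dnumbers = {d["Dnumber"] for d in department}

# Step 2: at Site 1, keep only the employees whose Dno appears in the
# shipped column; non-matching tuples are never transferred.
reduced = [e for e in employee if e["Dno"] in dnumbers]

slide_total = semijoin_cost(4, 100, 34, 10_000)
```

With the slide's numbers, `slide_total` is 340,400 bytes, roughly a third of the cost of shipping both relations whole.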

10
Transaction
  • Two-phase commit protocol
  • Phase 1: Obtaining a Decision
    • The coordinator Ci asks all participants to prepare
      to commit transaction T.
      • Ci adds the record <prepare T> to the log and
        forces the log to stable storage.
      • Ci sends <prepare T> messages to all sites at
        which T executed.
    • Upon receiving the message, the transaction manager
      at each site determines whether it can commit the
      transaction.
      • If not, it adds a record <no T> to the log and
        sends an abort T message to Ci.
      • If the transaction can be committed, it
        • adds the record <ready T> to the log,
        • forces all records for T to stable storage, and
        • sends a ready T message to Ci.

11
Two-phase commit protocol Cont.
  • Phase 2: Recording the Decision
    • T can be committed if Ci received a ready T message
      from all the participating sites; otherwise T must
      be aborted.
    • The coordinator adds a decision record, <commit T>
      or <abort T>, to the log and forces the record onto
      stable storage. Once the record reaches stable
      storage it is irrevocable (even if failures occur).
    • The coordinator sends a message to each participant
      informing it of the decision (commit or abort).
    • Participants take appropriate action locally.
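
The two phases can be sketched as a single-process simulation. This is only an illustration under strong assumptions: there is no network, no failure handling, and "forcing to stable storage" is modeled as appending to an in-memory list. The class and function names are hypothetical.

```python
class Participant:
    """A site at which transaction T executed (simulated in memory)."""

    def __init__(self, name, can_commit):
        self.name = name
        self.can_commit = can_commit
        self.log = []          # stands in for the site's stable log
        self.outcome = None

    def prepare(self, t):
        # Phase 1: decide locally and log the vote before replying.
        if not self.can_commit:
            self.log.append(f"<no {t}>")
            return "abort"
        self.log.append(f"<ready {t}>")
        return "ready"

    def finish(self, t, decision):
        # Phase 2: act on the coordinator's decision.
        self.log.append(f"<{decision} {t}>")
        self.outcome = decision


def two_phase_commit(t, participants, coordinator_log):
    # Phase 1: log <prepare T>, then collect a vote from every site.
    coordinator_log.append(f"<prepare {t}>")
    votes = [p.prepare(t) for p in participants]

    # Phase 2: commit only if every participant voted ready.
    decision = "commit" if all(v == "ready" for v in votes) else "abort"
    coordinator_log.append(f"<{decision} {t}>")  # the decision is now fixed
    for p in participants:
        p.finish(t, decision)
    return decision


log = []
sites = [Participant("S1", True), Participant("S2", False)]
result = two_phase_commit("T", sites, log)
```

In this run `S2` cannot commit, so the coordinator logs `<abort T>` and every participant aborts. A real implementation would also need timeouts and a recovery procedure that replays the logs.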

12
Concurrency Control Algorithms
  • Pessimistic
    • Synchronize the execution of user requests before
      the transaction starts.
    • E.g., two-phase locking protocol, timestamp ordering
      protocol
  • Optimistic
    • Execute the requests and then perform a validation
      check to ensure that the execution has not
      compromised the consistency of the database.
    • E.g., locking-based and timestamp-ordering-based
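
The optimistic pattern (execute first, validate at the end) can be sketched with per-item version numbers. This is a simplified stand-in chosen for illustration, not one of the specific protocols named on the slide; `VersionedStore` and its methods are hypothetical.

```python
class VersionedStore:
    """A toy single-site store with a version counter per item."""

    def __init__(self, data):
        self.data = dict(data)
        self.version = {key: 0 for key in data}

    def read(self, key):
        # Return the value together with the version that was seen.
        return self.data[key], self.version[key]

    def commit(self, read_versions, writes):
        # Validation: every item read must still be at the version seen.
        for key, seen in read_versions.items():
            if self.version[key] != seen:
                return False  # another transaction changed it; abort
        # Write phase: install the updates and bump their versions.
        for key, value in writes.items():
            self.data[key] = value
            self.version[key] = self.version.get(key, 0) + 1
        return True


store = VersionedStore({"x": 10})
value1, seen1 = store.read("x")   # transaction 1 reads x
value2, seen2 = store.read("x")   # transaction 2 reads the same version
first = store.commit({"x": seen1}, {"x": value1 + 1})
second = store.commit({"x": seen2}, {"x": value2 + 5})
```

The first commit passes validation; the second fails because `x` changed after it was read, so that transaction would have to restart.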

13
Concurrency Control Replication
  • Primary site technique
    • A simple extension of the centralized locking
      approach.
  • Primary site with backup site
    • All locking information is maintained at both the
      primary and the backup sites.
  • Primary copy technique
    • Failure of one site affects only transactions that
      are accessing locks on items whose primary copies
      reside at that site; other transactions are not
      affected.
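
The failure behavior of the primary copy technique can be illustrated with a small lookup. The item names, site names, and the function `affected_transactions` are assumptions made for this sketch.

```python
# Hypothetical placement: which site holds the primary copy of each item.
primary_copy_site = {"x": "S1", "y": "S2", "z": "S1"}

# Hypothetical workload: the items each transaction needs to lock.
transaction_items = {
    "T1": {"x"},
    "T2": {"y"},
    "T3": {"y", "z"},
}


def affected_transactions(failed_site, tx_items, placement):
    """Transactions needing a lock on an item whose primary copy is at failed_site."""
    return sorted(
        tx
        for tx, items in tx_items.items()
        if any(placement[item] == failed_site for item in items)
    )


blocked = affected_transactions("S1", transaction_items, primary_copy_site)
```

If `S1` fails, only `T1` and `T3` are affected; `T2` locks only `y`, whose primary copy is at `S2`, and can proceed.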

14
Deadlock Handling
  • Centralized approach
    • A global wait-for graph is constructed and
      maintained at a single site, which is the
      deadlock-detection coordinator.
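
A minimal sketch of what the coordinator does: merge the local wait-for graphs into a global one and look for a cycle. Graphs are represented here as dictionaries mapping a waiting transaction to the set of transactions it waits for; that representation and the function names are assumptions of this example.

```python
def merge_wait_for(local_graphs):
    """Union the edges of all local wait-for graphs into one global graph."""
    merged = {}
    for graph in local_graphs:
        for waiter, holders in graph.items():
            merged.setdefault(waiter, set()).update(holders)
    return merged


def has_cycle(graph):
    """Depth-first search; reaching a node still on the current path means a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}

    def visit(node):
        color[node] = GRAY
        for nxt in graph.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GRAY:
                return True
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(graph))


site1 = {"T1": {"T2"}}   # at site 1, T1 waits for T2
site2 = {"T2": {"T1"}}   # at site 2, T2 waits for T1
global_graph = merge_wait_for([site1, site2])
deadlocked = has_cycle(global_graph)
```

Neither local graph contains a cycle on its own, but the merged graph does, which is why the detection has to be done on the global graph.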

[Figure: local wait-for graphs and the global wait-for graph]

15
Recovery
  • It is quite difficult to determine whether a site
    is down without exchanging numerous messages with
    other sites.
  • When a transaction is updating data at several
    sites, it cannot commit until it is sure that the
    effect of the transaction on every site cannot be
    lost.
  • The two-phase commit protocol is often used to
    ensure the correctness of distributed commit.

16
3-tier Client-Server Architecture
  • The first, or presentation, tier (the client or
    front end) handles interaction with the user.
  • The second, or application, tier (the middle tier)
    processes the requests of all clients.
  • The third, or database, tier contains the database
    management system that manages all persistent
    data.
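
The separation of the three tiers can be sketched as three functions that only talk to their neighbor. Everything here is hypothetical: a dictionary stands in for the DBMS, and the function names and request format are invented for the example.

```python
# Database tier: a dict stands in for the DBMS (illustrative data only).
DATABASE = {"111": {"Fname": "Ann", "Dno": 1}}


def database_tier_lookup(ssn):
    """Return the stored row for an SSN, or None if there is none."""
    return DATABASE.get(ssn)


def middle_tier_handle(request):
    """Middle tier: validate the client request, then call the database tier."""
    ssn = request.get("ssn", "")
    if not ssn.isdigit():
        return {"ok": False, "error": "invalid ssn"}
    row = database_tier_lookup(ssn)
    if row is None:
        return {"ok": False, "error": "not found"}
    return {"ok": True, "employee": row}


def presentation_tier_render(response):
    """Presentation tier: turn the response into text for the user."""
    if not response["ok"]:
        return f"Error: {response['error']}"
    emp = response["employee"]
    return f"{emp['Fname']} works in department {emp['Dno']}"


page = presentation_tier_render(middle_tier_handle({"ssn": "111"}))
```

The presentation function never touches `DATABASE` directly, which is the point of the layering: the database tier can change without affecting the client.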

17
3-tier Architecture Cont.

18
Summary
  • Distributed DBMSs offer site autonomy and
    distributed administration.
  • Storage techniques, concurrency control, and
    recovery must be revisited for the distributed
    setting.

19
  • Q&A

20
  • Thank You