Title: Distributed Databases
1Distributed Databases
- by Chien-Pin Hsu
- CS157B Section 1
- Nov 11, 2004
- Dr. Sin-Min Lee
2Distributed Database System
- A distributed database system consists of loosely
coupled sites that share no physical component - Appears to user as a single system
- Database systems that run on each site are
independent of each other - Processing maybe done at a site other than the
initiator of request
3Homogenous Distributed Database Systems
- All sites have identical software
- They are aware of each other and agree to
cooperate in processing user requests - It appears to user as a single system
4 An Homogenous Distributed Database Systems
example
- A distributed system connects three databases
hq, mfg, and sales - An application can simultaneously access or
modify the data in several databases in a single
distributed environment.
5What can we do?
- A single query from a Manufacturing client on
local database mfg can retrieve joined data from
the products table on the local database and the
dept table on the remote hq database. - For a client application, the location and
platform of the databases are transparent.
6Makes life easier!!
- For example, if you are connected to database mfg
but want to access data on database hq, creating
a synonym on mfg for the remote dept table
enables you to issue this query - SELECT
- FROM dept
- In this way, a distributed system gives the
appearance of native data access. - Users on mfg do not have to know that the data
they access resides on remote databases.
7Heterogeneous Distributed Database System
- In a heterogeneous distributed database system,
at least one of the databases uses different
schemas and software. - A database system having different schema may
cause a major problem for query processing. - A database system having different software may
cause a major problem for transaction processing.
8Distributed Data Storage
- Replication
- System maintains multiple copies of data, stored
in different sites, for faster retrieval and
fault tolerance. - Fragmentation
- Relation is partitioned into several fragments
stored in distinct sites - Replication and fragmentation can be combined
- Relation is partitioned into several fragments
system maintains several identical replicas of
each such fragment.
9Advantages of Replication
- Availability failure of site containing
relation r does not result in unavailability of r
is replicas exist. - Parallelism queries on r may be processed by
several nodes in parallel. - Reduced data transfer relation r is available
locally at each site containing a replica of r.
10Disadvantages of Replication
- Increased cost of updates each replica of
relation r must be updated. - Increased complexity of concurrency control
concurrent updates to distinct replicas may lead
to inconsistent data unless special concurrency
control mechanisms are implemented. - One solution choose one copy as primary copy and
apply concurrency control operations on primary
copy.
11Fragmentation
- Data can be distributed by storing individual
tables at different sites - Data can also be distributed by decomposing a
table and storing portions at different sites
called Fragmentation - Fragmentation can be horizontal or vertical
12Why use Fragmentation?
- Usage - in general applications use views so its
appropriate to work with subsets - Efficiency - data stored close to where it is
most frequently used - Parallelism - a transaction can divided into
several sub-queries to increase degree of
concurrency - Security - data more secure - only stored where
it is needed
Disadvantages Performance - may be
slower Integrity - more difficult
13Horizontal Fragmentation
- Each fragment, Ti , of table T contains a subset
of the rows - Each tuple of T is assigned to one or more
fragments. - Horizontal fragmentation is lossless
14Horizontal Fragmentation Example
- A bank account schema has a relation
- Account-schema (branch-name, account-number,
balance). - It fragments the relation by location and stores
each fragment locally rows with branch-name
Hillside are stored in the Hillside in a
fragment
15Vertical Fragmentation
- Each fragment, Ti, of T contains a subset of the
columns, each column is in at least one fragment,
and each fragment includes the key - Ti ?attr_listi (T)
- T T1 T2 .. Tn
- All schemas must contain a common candidate key
(or superkey) to ensure lossless join property. - A special attribute, the tuple-id attribute may
be added to each schema to serve as a candidate
key.
16Vertical Fragmentation Example
- A employee-info schema has a relation
- employee-info schema (designation, name,
Employee-id, salary). - It fragments the relation to put information in
two tables for security concern.
17Commit Protocols
- Commit protocols are used to ensure atomicity
across sites - Atomicity states that database modifications must
follow an all or nothing rule. - a transaction which executes at multiple sites
must either be committed at all the sites, or
aborted at all the sites. -
18The Two-Phase Commit (2 PC) Protocol
- What is this?
-
- Two-phase commit is a transaction protocol
designed for the complications that arise with
distributed resource managers. - Two-phase commit technology is used for hotel and
airline reservations, stock market transactions,
banking applications, and credit card systems. - With a two-phase commit protocol, the distributed
transaction manager employs a coordinator to
manage the individual resource managers. The
commit process proceeds as follows -
19Phase1 Obtaining a Decision
- Step 1 ? Coordinator asks all participants to
prepare to commit transaction Ti. - Ci adds the records ltprepare Tgt to the log and
forces log to stable storage (a log is a file
which maintains a record of all changes to the
database) -
- sends prepare T messages to all sites where T
executed
20Phase1 Making a Decision
- Step 2 ? Upon receiving message, transaction
manager at site determines if it can commit the
transaction - if not
- add a record ltno Tgt to the log and send abort
T message to Ci - if the transaction can be committed, then
- 1). add the record ltready Tgt to the log
- 2). force all records for T to stable storage
- 3). send ready T message to Ci
21Phase 2 Recording the Decision
- Step 1 ? T can be committed of Ci received a
ready T message from all the participating sites
otherwise T must be aborted. - Step 2 ? Coordinator adds a decision record,
ltcommit Tgt or ltabort Tgt, to the log and forces
record onto stable storage. Once the record is in
stable storage, it cannot be revoked (even if
failures occur) - Step 3 ? Coordinator sends a message to each
participant informing it of the decision (commit
or abort) - Step 4 ? Participants take appropriate action
locally.
22Two-Phase Commit Diagram
23Costs and Limitations
- There have been two performance issues with two
phase commit - If one database server is unavailable, none of
the servers gets the updates. - This is correctable through network tuning and
correctly building the data distribution through
database optimization techniques.
24(No Transcript)