Distributed Databases - PowerPoint PPT Presentation

1 / 36

About This Presentation

Title:

Distributed Databases

Description:

Snapshot Replication ... Replication with periodic snapshot update. ... Oracle (Table Snapshot, Symmetric Replication) Computer Associates (Ingress/Replicator) ... – PowerPoint PPT presentation

Number of Views:17

Avg rating:3.0/5.0

Slides: 37

Provided by: paulml

Category:

more less

Transcript and Presenter's Notes

Title: Distributed Databases

1
Distributed Databases

Chapter 11
MIS 3413 Advanced Data Base Concepts
Dr. Segall
Spring 2002

2
Definitions

Distributed Database A single logical database
that is spread physically across computers in
multiple locations that are connected by a data
communications link.
Decentralized Database A collection of
independent databases on non-networked computers.

3
Reasons forDistributed Database

Local business units want control over data.
Consolidate data across local databases for
integrated decision making.
Reduce telecommunications costs.
Reduce the risk of telecommunications failures.

4
Distributed Database Environments
5
Distributed Database Options

Homogeneous - Same DBMS at each node.
Autonomous - Independent DBMSs.
Non-autonomous - Central , coordinating DBMS.
Heterogeneous - Different DBMSs at different
nodes.
Gateways - Simple paths are created to other
databases without the benefits of one logical
database.
Systems - (continued on next screen)

6
Distributed Database Options

Systems - Supports some or all of the
functionality of one logical database.
Full DBMS Functionality - All dist. Db functions.
Partial-Multi-database - Some dist. Db functions.
Federated - Supports local databases for unique
data requests.
Loose Integration - Local dbs have their own
schemas.
Tight Integration - Local dbs use common schema.
Unfederated - Requires all access to go through a
central, coordinating module.

7
Homogeneous, Non-Autonomous Database

Data is distributed across all the nodes.
Same DBMS at each node.
All data is managed by the distributed DBMS (no
exclusively local data.)
All access is through one, global schema.
The global schema is the union of all the local
schema.

8
Homogeneous, Non-Autonomous Database
9
Heterogeneous Environment

Data distributed across all the nodes.
Different DBMSs may be used at each node.
Local access is done using the local DBMS and
schema.
Remote access is done using the global schema.

10
Heterogeneous Environment
11
Objectives and Trade-offs

Location Transparency - User does not have to
know the location of the data.
Local Autonomy - Local site can operate with its
database when central site is down.
Synchronous Distributed Database - All copies of
the same data are always identical.
Asynchronous Distributed Database - Some data
inconsistency is tolerated.

12
Advantages ofDistributed Database

Increased reliability and availability.
Local control over data.
Modular growth.
Lower communication costs.
Faster response for certain queries.

13
Disadvantages ofDistributed Database

Software cost and complexity.
Processing overhead.
Data integrity exposure.
Slower response for certain queries.

14
Options forDistributing a Database

Data replication.
Horizontal partitioning.
Vertical partitioning.
Combinations of the above.

15
Data Replication

Advantages -
Reliability.
Fast response.
May avoid complicated distributed transaction
integrity routines (if replicated data is
refreshed at scheduled intervals.)
De-couples nodes (transactions proceed even if
some nodes are down.)
Reduced network traffic at prime time (if updates
can be delayed.)

16
Data Replication

Disadvantages -
Additional requirements for storage space.
Additional time for update operations.
Complexity and cost of updating.
Integrity exposure of getting incorrect data if
replicated data is not updated simultaneously.
Therefore, better when used for non-volatile data.

17
Types of Data Replication

Snapshot Replication -
Changes are periodically sent to a master site
which sends an updated snapshot out to the other
sites.
Near Real-Time Replication -
Broadcast update orders without requiring
confirmation.
Pull Replication -
Each site controls when it wants updates.

18
When to Use Data Replication

Data timeliness.
If DBMS cannot reference data from more than one
node.
Batched updates can cause performance problems.
Updates complicated with heterogeneous DBMSs or
database design.
Telecommunications speeds limit mass updates.

19
Horizontal Partitioning

Different records of a file at different sites.
Advantages -
Efficiency Data stored close to where it is
used.
Local access optimization.
Security.
Ease of querying
Disadvantages
Inconsistent access speed when accessing data
across partitions.
Backup vulnerablility no data replication.

20
Vertical Partitioning

Different columns of a file at different sites.
Advantages and disadvantages are the same as for
horizontal partitioning except that combining
data across partitions is more difficult because
it requires joins.

21
Distributed processing system for a manufacturing
company
22
Five Distributed Database Strategies

Centralized database, distributed access.
Replication with periodic snapshot update.
Replication with near real-time synchronization
of updates.
Partitioned, one logical database.
Partitioned, independent, non-integrated segments.

23
Choice of Distributed Strategy
24
Choice of Distributed Strategy

Funding, autonomy, security.
Site data referencing patterns.
Growth and expansion needs.
Technological capabilities.
Costs of managing complex technologies.
Need for reliable service.

25
Requirements for a Distributed DBMS

Ability to locate data with a distributed data
dictionary.
Determine the location from which to retrieve
data and the location at which to process each
part of a distributed query.
Heterogeneous DBMS translation.
Security, concurrency, query optimization,
failure recovery.
Consistency of replicated data.

26
Distributed DBMS Data Reference

Local Transaction - references local data.
In a distributed database, a transaction that
requires reference only to data that are stored
at the site where transaction originates.
Global Transaction - references non-local data.
In a distributed database, a transaction that
requires reference only to data at one or more
non-local sites to satisfy the request.

27
Distributed DBMS architecture
28
Distributed DBMS Transparency Objectives

Location Transparency
Distributed data may be treated as if it were at
one location
Replication Transparency (fragmentation
transparency)
Replicated data may be treated as if it were one
item
Failure Transparency
All or none of the actions of a transaction are
committed.
Each site has a Transaction Manager.
Logs transactions and before and after images.
Concurrency control scheme to ensure data
integrity.

29
Distributed DBMS Transparency Objectives

Commit Protocol Ensures that a global
transaction is either successfully completed at
each site or else aborted.
Two-Phase Commit An algorithm for coordinating
updates in a distributed database.

30
Two-Phase Commit

Prepare Phase
Coordinator receives a commit request.
Coordinator instructs all resource managers to
get ready to go either way on the transaction.
Each resource manager writes all updates from
that transaction to its own physical log.
Coordinator receives replies from all resource
managers. If all are ok, it writes commit to its
own log if not then it writes rollback to its
log.

31
Two-Phase Commit

Commit Phase
Coordinator then informs each resource manager of
its decision and broadcasts a message to either
commit or rollback (abort.) If the message is
commit, then each resource manager transfers the
update from its log to its database.
A failure during the commit phase puts a
transaction in limbo. This has to be tested
for and handled with timeouts or polling.

32
Distributed DBMS Transparency Objectives

Concurrency Transparency
Design goal for distributed database,
although a distributed system runs many
transactions, it appears that a given transaction
is the only activity in the system.
Timestamping
Concurrency control mechanism.
Assigns a globally unique timestamp to each
transaction.
alternative to locks in distributed databases.

33
Query Optimization

In a query involving a multi-site join and,
possibly, a distributed database with replicated
files, the distributed DBMS must decide where to
access the data and how to proceed with the join.
Three step process
Query decomposition - rewritten and simplified
Data localization - query fragmented so that
fragments reference data at only one site.
Global optimization -
Order in which to execute query fragments.
Data movement between sites.
Where parts of the query will be executed.
Semijoin only the joining attribute from one
site is transmitted.

34
Query Optimization - C.J. Date Example

SUPPLIER (Supplier No., City) - 10K recs in
Detroit
PART (Part No., Color) - 100K recs in Chicago
SHIPMENT (Supplier No., Part No.) - 1mil recs in
Det.
10 red parts 100K shipments from Cleveland
SELECT SUPPLIER.SUPPLIER_NO
FROM SUPPLIER, SHIPMENT, PART
WHERE SUPPLIER.CITY Cleveland
AND SHIPMENT.PART_NO PART.PART_NO
AND SHIPMENT.SUPPLIER_NO SUPPLIER.SUPPLIER_NO
AND PART.COLOR Red

35
Evolution of Distributed DBMS

Unit of Work - All of a transactions
instructions.
Remote Unit of Work
SQL statements at one location can be executed as
a single unit of work on a single remote DBMS.
Distributed Unit of Work
Different statements in a unit of work may refer
to different remote sites.
All databases in an SQL statement must be at a
single site.
Distributed Request
A single SQL statement may refer to tables in
more than one remote site.
May not support replication transparency or
failure transparency.