Distributed Databases - PowerPoint PPT Presentation

1 / 36
About This Presentation
Title:

Distributed Databases

Description:

Snapshot Replication ... Replication with periodic snapshot update. ... Oracle (Table Snapshot, Symmetric Replication) Computer Associates (Ingress/Replicator) ... – PowerPoint PPT presentation

Number of Views:17
Avg rating:3.0/5.0
Slides: 37
Provided by: paulml
Category:

less

Transcript and Presenter's Notes

Title: Distributed Databases


1
Distributed Databases
  • Chapter 11
  • MIS 3413 Advanced Data Base Concepts
  • Dr. Segall
  • Spring 2002

2
Definitions
  • Distributed Database A single logical database
    that is spread physically across computers in
    multiple locations that are connected by a data
    communications link.
  • Decentralized Database A collection of
    independent databases on non-networked computers.

3
Reasons forDistributed Database
  • Local business units want control over data.
  • Consolidate data across local databases for
    integrated decision making.
  • Reduce telecommunications costs.
  • Reduce the risk of telecommunications failures.

4
Distributed Database Environments
5
Distributed Database Options
  • Homogeneous - Same DBMS at each node.
  • Autonomous - Independent DBMSs.
  • Non-autonomous - Central , coordinating DBMS.
  • Heterogeneous - Different DBMSs at different
    nodes.
  • Gateways - Simple paths are created to other
    databases without the benefits of one logical
    database.
  • Systems - (continued on next screen)

6
Distributed Database Options
  • Systems - Supports some or all of the
    functionality of one logical database.
  • Full DBMS Functionality - All dist. Db functions.
  • Partial-Multi-database - Some dist. Db functions.
  • Federated - Supports local databases for unique
    data requests.
  • Loose Integration - Local dbs have their own
    schemas.
  • Tight Integration - Local dbs use common schema.
  • Unfederated - Requires all access to go through a
    central, coordinating module.

7
Homogeneous, Non-Autonomous Database
  • Data is distributed across all the nodes.
  • Same DBMS at each node.
  • All data is managed by the distributed DBMS (no
    exclusively local data.)
  • All access is through one, global schema.
  • The global schema is the union of all the local
    schema.

8
Homogeneous, Non-Autonomous Database
9
Heterogeneous Environment
  • Data distributed across all the nodes.
  • Different DBMSs may be used at each node.
  • Local access is done using the local DBMS and
    schema.
  • Remote access is done using the global schema.

10
Heterogeneous Environment
11
Objectives and Trade-offs
  • Location Transparency - User does not have to
    know the location of the data.
  • Local Autonomy - Local site can operate with its
    database when central site is down.
  • Synchronous Distributed Database - All copies of
    the same data are always identical.
  • Asynchronous Distributed Database - Some data
    inconsistency is tolerated.

12
Advantages ofDistributed Database
  • Increased reliability and availability.
  • Local control over data.
  • Modular growth.
  • Lower communication costs.
  • Faster response for certain queries.

13
Disadvantages ofDistributed Database
  • Software cost and complexity.
  • Processing overhead.
  • Data integrity exposure.
  • Slower response for certain queries.

14
Options forDistributing a Database
  • Data replication.
  • Horizontal partitioning.
  • Vertical partitioning.
  • Combinations of the above.

15
Data Replication
  • Advantages -
  • Reliability.
  • Fast response.
  • May avoid complicated distributed transaction
    integrity routines (if replicated data is
    refreshed at scheduled intervals.)
  • De-couples nodes (transactions proceed even if
    some nodes are down.)
  • Reduced network traffic at prime time (if updates
    can be delayed.)

16
Data Replication
  • Disadvantages -
  • Additional requirements for storage space.
  • Additional time for update operations.
  • Complexity and cost of updating.
  • Integrity exposure of getting incorrect data if
    replicated data is not updated simultaneously.
  • Therefore, better when used for non-volatile data.

17
Types of Data Replication
  • Snapshot Replication -
  • Changes are periodically sent to a master site
    which sends an updated snapshot out to the other
    sites.
  • Near Real-Time Replication -
  • Broadcast update orders without requiring
    confirmation.
  • Pull Replication -
  • Each site controls when it wants updates.

18
When to Use Data Replication
  • Data timeliness.
  • If DBMS cannot reference data from more than one
    node.
  • Batched updates can cause performance problems.
  • Updates complicated with heterogeneous DBMSs or
    database design.
  • Telecommunications speeds limit mass updates.

19
Horizontal Partitioning
  • Different records of a file at different sites.
  • Advantages -
  • Efficiency Data stored close to where it is
    used.
  • Local access optimization.
  • Security.
  • Ease of querying
  • Disadvantages
  • Inconsistent access speed when accessing data
    across partitions.
  • Backup vulnerablility no data replication.

20
Vertical Partitioning
  • Different columns of a file at different sites.
  • Advantages and disadvantages are the same as for
    horizontal partitioning except that combining
    data across partitions is more difficult because
    it requires joins.

21
Distributed processing system for a manufacturing
company
22
Five Distributed Database Strategies
  • Centralized database, distributed access.
  • Replication with periodic snapshot update.
  • Replication with near real-time synchronization
    of updates.
  • Partitioned, one logical database.
  • Partitioned, independent, non-integrated segments.

23
Choice of Distributed Strategy
24
Choice of Distributed Strategy
  • Funding, autonomy, security.
  • Site data referencing patterns.
  • Growth and expansion needs.
  • Technological capabilities.
  • Costs of managing complex technologies.
  • Need for reliable service.

25
Requirements for a Distributed DBMS
  • Ability to locate data with a distributed data
    dictionary.
  • Determine the location from which to retrieve
    data and the location at which to process each
    part of a distributed query.
  • Heterogeneous DBMS translation.
  • Security, concurrency, query optimization,
    failure recovery.
  • Consistency of replicated data.

26
Distributed DBMS Data Reference
  • Local Transaction - references local data.
  • In a distributed database, a transaction that
    requires reference only to data that are stored
    at the site where transaction originates.
  • Global Transaction - references non-local data.
  • In a distributed database, a transaction that
    requires reference only to data at one or more
    non-local sites to satisfy the request.

27
Distributed DBMS architecture
28
Distributed DBMS Transparency Objectives
  • Location Transparency
  • Distributed data may be treated as if it were at
    one location
  • Replication Transparency (fragmentation
    transparency)
  • Replicated data may be treated as if it were one
    item
  • Failure Transparency
  • All or none of the actions of a transaction are
    committed.
  • Each site has a Transaction Manager.
  • Logs transactions and before and after images.
  • Concurrency control scheme to ensure data
    integrity.

29
Distributed DBMS Transparency Objectives
  • Commit Protocol Ensures that a global
    transaction is either successfully completed at
    each site or else aborted.
  • Two-Phase Commit An algorithm for coordinating
    updates in a distributed database.

30
Two-Phase Commit
  • Prepare Phase
  • Coordinator receives a commit request.
  • Coordinator instructs all resource managers to
    get ready to go either way on the transaction.
    Each resource manager writes all updates from
    that transaction to its own physical log.
  • Coordinator receives replies from all resource
    managers. If all are ok, it writes commit to its
    own log if not then it writes rollback to its
    log.

31
Two-Phase Commit
  • Commit Phase
  • Coordinator then informs each resource manager of
    its decision and broadcasts a message to either
    commit or rollback (abort.) If the message is
    commit, then each resource manager transfers the
    update from its log to its database.
  • A failure during the commit phase puts a
    transaction in limbo. This has to be tested
    for and handled with timeouts or polling.

32
Distributed DBMS Transparency Objectives
  • Concurrency Transparency
  • Design goal for distributed database,
  • although a distributed system runs many
    transactions, it appears that a given transaction
    is the only activity in the system.
  • Timestamping
  • Concurrency control mechanism.
  • Assigns a globally unique timestamp to each
    transaction.
  • alternative to locks in distributed databases.

33
Query Optimization
  • In a query involving a multi-site join and,
    possibly, a distributed database with replicated
    files, the distributed DBMS must decide where to
    access the data and how to proceed with the join.
    Three step process
  • Query decomposition - rewritten and simplified
  • Data localization - query fragmented so that
    fragments reference data at only one site.
  • Global optimization -
  • Order in which to execute query fragments.
  • Data movement between sites.
  • Where parts of the query will be executed.
  • Semijoin only the joining attribute from one
    site is transmitted.

34
Query Optimization - C.J. Date Example
  • SUPPLIER (Supplier No., City) - 10K recs in
    Detroit
  • PART (Part No., Color) - 100K recs in Chicago
  • SHIPMENT (Supplier No., Part No.) - 1mil recs in
    Det.
  • 10 red parts 100K shipments from Cleveland
  • SELECT SUPPLIER.SUPPLIER_NO
  • FROM SUPPLIER, SHIPMENT, PART
  • WHERE SUPPLIER.CITY Cleveland
  • AND SHIPMENT.PART_NO PART.PART_NO
  • AND SHIPMENT.SUPPLIER_NO SUPPLIER.SUPPLIER_NO
  • AND PART.COLOR Red

35
Evolution of Distributed DBMS
  • Unit of Work - All of a transactions
    instructions.
  • Remote Unit of Work
  • SQL statements at one location can be executed as
    a single unit of work on a single remote DBMS.
  • Distributed Unit of Work
  • Different statements in a unit of work may refer
    to different remote sites.
  • All databases in an SQL statement must be at a
    single site.
  • Distributed Request
  • A single SQL statement may refer to tables in
    more than one remote site.
  • May not support replication transparency or
    failure transparency.

36
Distributed DBMS Products
  • IBM (DB2 - DataPropragator Relational, DRDA)
  • Sybase (Replication Server, SQL Anywhere,
    OmniSQL)
  • Oracle (Table Snapshot, Symmetric Replication)
  • Computer Associates (Ingress/Replicator)
  • Microsoft (SQL Server)
Write a Comment
User Comments (0)
About PowerShow.com