Distributed Database Management Systems

Provided by: course250

1
157337
Distributed Database Management Systems
Database Development Week 11
2
Topics
  • What a distributed database management system
    (DDBMS) is and what its components are
  • How database implementation is affected by
    different levels of data and process distribution
  • How transactions are managed in a distributed
    database environment
  • How database design is affected by the
    distributed database environment

3
The Evolution of Distributed Database Management
Systems
  • Distributed database management system (DDBMS)
  • Governs storage and processing of logically
    related data over interconnected computer systems
    in which both data and processing functions are
    distributed among several sites

4
The Evolution of Distributed Database Management
Systems (continued)
  • Centralized databases required that corporate data
    be stored at a single central site
  • A dynamic business environment and the shortcomings
    of centralized databases spawned a demand for
    applications based on data access from different
    sources at multiple locations

6
DDBMS Advantages and Disadvantages
  • Advantages include
  • Data are located near greatest demand site
  • Faster data access
  • Faster data processing
  • Growth facilitation
  • Improved communications

7
DDBMS Advantages and Disadvantages (continued)
  • Advantages include (continued)
  • Reduced operating costs
  • User-friendly interface
  • Less danger of a single-point failure
  • Processor independence

8
DDBMS Advantages and Disadvantages (continued)
  • Disadvantages include
  • Complexity of management and control
  • Security
  • Lack of standards
  • Increased storage requirements
  • Increased training cost

12
Characteristics of Distributed Management Systems
  • Application interface
  • Validation
  • Transformation
  • Query optimization
  • Mapping
  • I/O interface

13
Characteristics of Distributed Management Systems
(continued)
  • Formatting
  • Security
  • Backup and recovery
  • DB administration
  • Concurrency control
  • Transaction management

14
Characteristics of Distributed Management Systems
(continued)
  • Must perform all the functions of centralized
    DBMS
  • Must handle all necessary functions imposed by
    distribution of data and processing
  • Must perform these additional functions
    transparently to the end user

16
DDBMS Components
  • Must include (at least) the following components
  • Computer workstations
  • Network hardware and software
  • Communications media
  • Transaction processor (TP), also called application
    processor or transaction manager: software
    component found in each computer that requests
    data

17
DDBMS Components (continued)
  • Must include (at least) the following components
    (continued)
  • Data processor (DP) or data manager: software
    component residing on each computer that stores
    and retrieves data located at the site; may be a
    centralized DBMS

19
Levels of Data and Process Distribution
20
Single-Site Processing, Single-Site Data (SPSD)
  • All processing is done on a single CPU or host
    computer (mainframe, midrange, or PC)
  • All data are stored on the host computer's local
    disk
  • Processing cannot be done on the end user's side of
    the system

21
Single-Site Processing, Single-Site Data (SPSD)
(continued)
  • Typical of most mainframe and midrange computer
    DBMSs
  • DBMS is located on host computer, which is
    accessed by dumb terminals connected to it
  • Also typical of first generation of single-user
    microcomputer databases

23
Multiple-Site Processing, Single-Site Data (MPSD)
  • Multiple processes run on different computers
    sharing single data repository
  • MPSD scenario requires network file server
    running conventional applications that are
    accessed through LAN
  • Many multiuser accounting applications, running
    under personal computer network, fit such a
    description

25
Multiple-Site Processing, Multiple-Site Data
(MPMD)
  • Fully distributed database management system with
    support for multiple data processors and
    transaction processors at multiple sites
  • Classified as either homogeneous or heterogeneous
  • Homogeneous DDBMSs integrate only one type of
    centralized DBMS over a network

26
Multiple-Site Processing, Multiple-Site Data
(MPMD) (continued)
  • Heterogeneous DDBMSs integrate different types of
    centralized DBMSs over a network
  • Fully heterogeneous DDBMSs support different DBMSs
    that may even support different data models
    (relational, hierarchical, or network) running
    under different computer systems, such as
    mainframes and microcomputers

28
Distributed Database Transparency Features
  • Allow end user to feel like the database's only user
  • Features include
  • Distribution transparency
  • Transaction transparency
  • Failure transparency
  • Performance transparency
  • Heterogeneity transparency

29
Distribution Transparency
  • Allows management of physically dispersed
    database as though it were a centralized database
  • Following three levels of distribution
    transparency are recognized
  • Fragmentation transparency
  • Location transparency
  • Local mapping transparency
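
The slide names the three levels without defining them. Under the usual textbook definitions (an assumption here, since the slides do not spell them out), fragmentation transparency lets the user name only the global table, location transparency requires fragment names but not sites, and local mapping transparency requires both. The sketch below builds the query text a user would have to write at each level. The fragment names, site names, `EMPLOYEE` table, and the `NODE` keyword are all hypothetical; `NODE` is illustrative pseudo-SQL, not standard syntax.

```python
from typing import Dict

# Hypothetical catalog: fragment name -> site where the fragment is stored.
FRAGMENTS: Dict[str, str] = {"E1": "NY", "E2": "ATL", "E3": "MIA"}
CONDITION = "EMP_DOB < '1960-01-01'"


def fragmentation_transparent() -> str:
    # Highest level: the user names only the global table.
    return f"SELECT * FROM EMPLOYEE WHERE {CONDITION}"


def location_transparent() -> str:
    # Middle level: the user names every fragment, but not its site.
    return " UNION ".join(
        f"SELECT * FROM {frag} WHERE {CONDITION}" for frag in FRAGMENTS
    )


def local_mapping_transparent() -> str:
    # Lowest level: the user names every fragment and its site.
    # "NODE" is made-up pseudo-SQL used only to show the extra burden.
    return " UNION ".join(
        f"SELECT * FROM {frag} NODE {site} WHERE {CONDITION}"
        for frag, site in FRAGMENTS.items()
    )
```

The lower the level of transparency, the more of the catalog's contents leak into every query the user writes.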

32
Transaction Transparency
  • Ensures database transactions will maintain the
    distributed database's integrity and consistency

33
Distributed Requests and Distributed Transactions
  • Distributed transaction: can update or request
    data from several different remote sites on the
    network
  • Remote request: lets a single SQL statement access
    data to be processed by a single remote database
    processor
  • Remote transaction: accesses data at a single
    remote site

34
Distributed Requests and Distributed Transactions
(continued)
  • Distributed transaction: allows a transaction to
    reference several different (local or remote) DP
    sites
  • Distributed request: lets a single SQL statement
    reference data located at several different local
    or remote DP sites
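
One way to keep the four terms apart is to look at which DP sites each SQL statement touches. The sketch below is a hypothetical classifier, not part of any DBMS: each statement is represented only by the set of site names it references, and the function returns the most demanding label that applies. It assumes a one-statement unit of work at one site is called a remote request.

```python
from typing import List, Set


def classify(statements: List[Set[str]]) -> str:
    """Classify a unit of work by the DP sites its SQL statements reference.

    Each element of `statements` is the set of site names that one
    SQL statement touches (site names are illustrative).
    """
    if not statements or any(not sites for sites in statements):
        raise ValueError("every statement must reference at least one site")
    # One statement spanning several sites is a distributed request.
    if any(len(sites) > 1 for sites in statements):
        return "distributed request"
    # Each statement stays at one site, but the sites differ.
    all_sites = set().union(*statements)
    if len(all_sites) > 1:
        return "distributed transaction"
    # Everything happens at a single site.
    return "remote request" if len(statements) == 1 else "remote transaction"
```

For example, `classify([{"B"}, {"C"}])` yields a distributed transaction, while `classify([{"B", "C"}])` yields a distributed request because a single statement spans both sites.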

40
Distributed Concurrency Control
  • Multisite, multiple-process operations are much
    more likely to create data inconsistencies and
    deadlocked transactions than are single-site
    systems

42
Two-Phase Commit Protocol
  • Distributed databases make it possible for a
    transaction to access data at several sites
  • Final COMMIT must not be issued until all sites
    have confirmed that they are prepared to commit
    their parts of the transaction
  • Two-phase commit protocol requires that each
    individual DP's transaction log entry be written
    before the database fragment is actually updated
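
The two phases can be sketched as a small in-memory simulation. This is a minimal illustration under stated assumptions, not a production protocol: the `DataProcessor` class, its `can_commit` flag, and the log strings are all hypothetical, and the sketch ignores timeouts, coordinator failure, and durable storage.

```python
from typing import List


class DataProcessor:
    """Hypothetical DP site that holds one database fragment."""

    def __init__(self, name: str, can_commit: bool = True):
        self.name = name
        self.can_commit = can_commit  # simulates a site that must vote "no"
        self.log: List[str] = []      # stand-in for the transaction log
        self.value = None             # committed state of the fragment
        self.pending = None           # change waiting for the final decision

    def prepare(self, new_value) -> bool:
        # Phase 1: write the log entry BEFORE the fragment is updated.
        if not self.can_commit:
            self.log.append("ABORT-VOTE")
            return False
        self.log.append(f"PREPARED {new_value}")
        self.pending = new_value
        return True

    def commit(self) -> None:
        # Phase 2 (success): apply the logged change to the fragment.
        self.value = self.pending
        self.pending = None
        self.log.append("COMMIT")

    def abort(self) -> None:
        # Phase 2 (failure): discard the pending change.
        self.pending = None
        self.log.append("ABORT")


def two_phase_commit(sites: List[DataProcessor], new_value) -> bool:
    # Phase 1: the coordinator collects a vote from every site.
    votes = [site.prepare(new_value) for site in sites]
    # Phase 2: commit only if every site voted yes, otherwise abort all.
    if all(votes):
        for site in sites:
            site.commit()
        return True
    for site in sites:
        site.abort()
    return False
```

With two healthy sites, `two_phase_commit` returns `True` and both fragments hold the new value; if any site is created with `can_commit=False`, every site aborts and no fragment changes.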

43
Performance Transparency and Query Optimization
  • Objective of query optimization routine is to
    minimize total cost associated with execution of
    request
  • Costs associated with request are function of
  • Access time (I/O) cost
  • Communication cost
  • CPU time cost
  • Must provide distribution transparency as well as
    replica transparency
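
As a rough illustration of the cost idea, the sketch below scores candidate execution plans and picks the cheapest. The slide only says cost is a function of the three components; treating it as a plain sum is an assumption made here for simplicity, and the `Plan` class and the plan names in the usage note are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Plan:
    """Hypothetical execution plan with its three cost components."""
    name: str
    io_cost: float
    communication_cost: float
    cpu_cost: float

    @property
    def total_cost(self) -> float:
        # Assumption: total cost is the unweighted sum of the components.
        return self.io_cost + self.communication_cost + self.cpu_cost


def cheapest(plans: List[Plan]) -> Plan:
    # The optimizer's objective: minimize total cost over candidate plans.
    return min(plans, key=lambda p: p.total_cost)
```

For instance, given `Plan("ship whole table", 10, 50, 5)` and `Plan("filter at remote site", 12, 8, 6)`, `cheapest` selects the second plan: its slightly higher I/O and CPU costs are outweighed by the much lower communication cost.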

44
Performance Transparency and Query Optimization
(continued)
  • Replica transparency: DDBMS's ability to hide the
    existence of multiple copies of data from the user
  • Query optimization techniques include
  • Manual or automatic
  • Static or dynamic
  • Statistically based or rule-based algorithms

45
Distributed Database Design
  • Data fragmentation: how to partition the database
    into fragments
  • Data replication: which fragments to replicate
  • Data allocation: where to locate those fragments
    and replicas

46
Data Fragmentation
  • Breaks single object into two or more segments or
    fragments
  • Each fragment can be stored at any site over
    computer network
  • Information about data fragmentation is stored in
    distributed data catalog (DDC), from which it is
    accessed by TP to process user requests

47
Data Fragmentation (continued)
  • Strategies
  • Horizontal fragmentation: division of a relation
    into subsets (fragments) of tuples (rows)
  • Vertical fragmentation: division of a relation
    into attribute (column) subsets
  • Mixed fragmentation: combination of horizontal and
    vertical strategies
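
The three strategies can be sketched on a small in-memory table. The `customers` rows, column names, and helper functions are made up for illustration. Each vertical fragment keeps the key column so that the original rows can be rebuilt by joining the fragments.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def horizontal_fragment(rows: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    # Horizontal: keep whole rows that satisfy the predicate.
    return [dict(r) for r in rows if predicate(r)]


def vertical_fragment(rows: List[Row], key: str, columns: List[str]) -> List[Row]:
    # Vertical: keep a subset of columns, always including the key.
    return [{c: r[c] for c in [key, *columns]} for r in rows]


# Hypothetical sample relation.
customers: List[Row] = [
    {"id": 1, "name": "Ann", "state": "TN", "balance": 100},
    {"id": 2, "name": "Bob", "state": "FL", "balance": 250},
    {"id": 3, "name": "Cy", "state": "TN", "balance": 0},
]

# Horizontal fragment: only the Tennessee customers.
tn_rows = horizontal_fragment(customers, lambda r: r["state"] == "TN")

# Vertical fragment: billing columns for every customer.
billing = vertical_fragment(customers, "id", ["balance"])

# Mixed fragment: billing columns for Tennessee customers only.
tn_billing = vertical_fragment(tn_rows, "id", ["balance"])
```

Here `tn_rows` holds the full rows for customers 1 and 3, `billing` holds `id` and `balance` for all three customers, and `tn_billing` applies both cuts at once.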

55
Data Replication
  • Storage of data copies at multiple sites served
    by computer network
  • Fragment copies can be stored at several sites to
    serve specific information requirements
  • Can enhance data availability and response time
  • Can help to reduce communication and total query
    costs

57
Data Replication (continued)
  • Replication scenarios
  • Fully replicated database: stores multiple copies
    of each database fragment at multiple sites; can
    be impractical due to amount of overhead
  • Partially replicated database: stores multiple
    copies of some database fragments at multiple
    sites; most DDBMSs are able to handle the
    partially replicated database well

58
Data Replication (continued)
  • Replication scenarios (continued)
  • Unreplicated database: stores each database
    fragment at a single site, with no duplicate
    database fragments
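
The three scenarios differ only in how many sites hold each fragment, which the sketch below makes explicit. The allocation map (fragment name to the set of sites storing a copy) and the function name are hypothetical. "Fully replicated" is read here, following the slide wording, as every fragment having copies at more than one site.

```python
from typing import Dict, Set


def replication_scenario(allocation: Dict[str, Set[str]]) -> str:
    """Name the replication scenario for a fragment -> sites map."""
    if not allocation or any(not sites for sites in allocation.values()):
        raise ValueError("every fragment must be stored at one or more sites")
    copies = [len(sites) for sites in allocation.values()]
    if all(n > 1 for n in copies):
        return "fully replicated"      # every fragment has multiple copies
    if any(n > 1 for n in copies):
        return "partially replicated"  # only some fragments have copies
    return "unreplicated"              # each fragment lives at one site
```

For example, `{"F1": {"A", "B"}, "F2": {"C"}}` is partially replicated because only `F1` has more than one copy.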

59
Data Allocation
  • Deciding where to locate data
  • Allocation strategies
  • Centralized data allocation: entire database is
    stored at one site
  • Partitioned data allocation: database is divided
    into several disjoint parts (fragments) and
    stored at several sites

60
Data Allocation (continued)
  • Allocation strategies (continued)
  • Replicated data allocation: copies of one or more
    database fragments are stored at several sites
  • Data distribution over computer network is
    achieved through data partition, data
    replication, or combination of both

61
Client/Server vs. DDBMS
  • Client/server architecture: way in which computers
    interact to form a system
  • Features a user of resources, or client, and a
    provider of resources, or server
  • Can be used to implement a DBMS in which the client
    is the TP and the server is the DP

62
Client/Server vs. DDBMS (continued)
  • Client/server advantages
  • Less expensive than alternate minicomputer or
    mainframe solutions
  • Allow end user to use microcomputer's GUI,
    thereby improving functionality and simplicity
  • More people in job market have PC skills than
    mainframe skills
  • PC is well established in workplace

63
Client/Server vs. DDBMS (continued)
  • Client/server advantages (continued)
  • Numerous data analysis and query tools exist to
    facilitate interaction with DBMSs available in PC
    market
  • Considerable cost advantage to offloading
    applications development from mainframe to
    powerful PCs

64
Client/Server vs. DDBMS (continued)
  • Client/server disadvantages
  • Creates more complex environment
  • Different platforms (LANs, operating systems, and
    so on) are often difficult to manage
  • An increase in number of users and processing
    sites often paves the way for security problems

65
Client/Server vs. DDBMS (continued)
  • Client/server disadvantages (continued)
  • Possible to spread data access to much wider
    circle of users
  • Increases demand for people with broad knowledge
    of computers and software
  • Increases burden of training and cost of
    maintaining the environment

66
C. J. Date's Twelve Commandments for Distributed
Databases
  • Local site independence
  • Central site independence
  • Failure independence
  • Location transparency
  • Fragmentation transparency
  • Replication transparency

67
C. J. Date's Twelve Commandments for Distributed
Databases (continued)
  • Distributed query processing
  • Distributed transaction processing
  • Hardware independence
  • Operating system independence
  • Network independence
  • Database independence

68
Summary
  • Distributed database stores logically related
    data in two or more physically independent sites
    connected via computer network
  • Distributed processing is division of logical
    database processing among two or more network
    nodes
  • Distributed databases require distributed
    processing
  • Main components of DDBMS are transaction
    processor and data processor

69
Summary (continued)
  • Current database systems can be classified by
    extent to which they support processing and data
    distribution
  • Homogeneous distributed database system
    integrates only one particular type of DBMS over
    computer network
  • Heterogeneous distributed database system
    integrates several different types of DBMSs over
    computer network

70
Summary (continued)
  • DDBMS characteristics are best described as set
    of transparencies
  • Transaction is formed by one or more database
    requests
  • Distributed concurrency control is required in
    network of distributed databases
  • Distributed DBMS evaluates every data request to
    find optimum access path in distributed database

71
Summary (continued)
  • The design of distributed database must consider
    fragmentation and replication of data
  • Database can be replicated over several different
    sites on computer network
  • Client/server architecture refers to way in which
    two computers interact over computer network to
    form a system

72
For exam
  • Revise from these slides and Chapter 12 of
    textbook
  • Questions?