Title: Distributed Database Management Systems
Chapter 12
Distributed Database Management Systems
Database Systems: Design, Implementation, and Management, Seventh Edition, Rob and Coronel
In this chapter, you will learn
- What a distributed database management system (DDBMS) is and what its components are
- How database implementation is affected by different levels of data and process distribution
- How transactions are managed in a distributed database environment
- How database design is affected by the distributed database environment
The Evolution of Distributed Database Management Systems
- Distributed database management system (DDBMS)
  - Governs storage and processing of logically related data over interconnected computer systems in which both data and processing functions are distributed among several sites
The Evolution of Distributed Database Management Systems (continued)
- Centralized databases required that corporate data be stored at a single central site
- A dynamic business environment and the shortcomings of centralized databases spawned a demand for applications based on data access from different sources at multiple locations
DDBMS Advantages and Disadvantages
- Advantages include
  - Data are located near greatest demand site
  - Faster data access
  - Faster data processing
  - Growth facilitation
  - Improved communications
DDBMS Advantages and Disadvantages (continued)
- Advantages include (continued)
  - Reduced operating costs
  - User-friendly interface
  - Less danger of a single-point failure
  - Processor independence
DDBMS Advantages and Disadvantages (continued)
- Disadvantages include
  - Complexity of management and control
  - Security
  - Lack of standards
  - Increased storage requirements
  - Increased training cost
Characteristics of Distributed Management Systems
- Application interface
- Validation
- Transformation
- Query optimization
- Mapping
- I/O interface
Characteristics of Distributed Management Systems (continued)
- Formatting
- Security
- Backup and recovery
- DB administration
- Concurrency control
- Transaction management
Characteristics of Distributed Management Systems (continued)
- Must perform all the functions of a centralized DBMS
- Must handle all necessary functions imposed by distribution of data and processing
- Must perform these additional functions transparently to the end user
DDBMS Components
- Must include (at least) the following components
  - Computer workstations
  - Network hardware and software
  - Communications media
  - Transaction processor (TP), also called application processor or transaction manager
    - Software component found in each computer that requests data
DDBMS Components (continued)
- Must include (at least) the following components (continued)
  - Data processor (DP) or data manager
    - Software component residing on each computer that stores and retrieves data located at the site
    - May be a centralized DBMS
Levels of Data and Process Distribution
Single-Site Processing, Single-Site Data (SPSD)
- All processing is done on a single CPU or host computer (mainframe, midrange, or PC)
- All data are stored on the host computer's local disk
- Processing cannot be done on the end user's side of the system
Single-Site Processing, Single-Site Data (SPSD) (continued)
- Typical of most mainframe and midrange computer DBMSs
- DBMS is located on the host computer, which is accessed by dumb terminals connected to it
- Also typical of the first generation of single-user microcomputer databases
Multiple-Site Processing, Single-Site Data (MPSD)
- Multiple processes run on different computers sharing a single data repository
- MPSD scenario requires a network file server running conventional applications that are accessed through a LAN
- Many multiuser accounting applications, running under a personal computer network, fit such a description
Multiple-Site Processing, Multiple-Site Data (MPMD)
- Fully distributed database management system with support for multiple data processors and transaction processors at multiple sites
- Classified as either homogeneous or heterogeneous
- Homogeneous DDBMSs
  - Integrate only one type of centralized DBMS over a network
Multiple-Site Processing, Multiple-Site Data (MPMD) (continued)
- Heterogeneous DDBMSs
  - Integrate different types of centralized DBMSs over a network
- Fully heterogeneous DDBMS
  - Supports different DBMSs that may even support different data models (relational, hierarchical, or network) running under different computer systems, such as mainframes and microcomputers
Distributed Database Transparency Features
- Allow the end user to feel like the database's only user
- Features include
  - Distribution transparency
  - Transaction transparency
  - Failure transparency
  - Performance transparency
  - Heterogeneity transparency
Distribution Transparency
- Allows management of a physically dispersed database as though it were a centralized database
- The following three levels of distribution transparency are recognized
  - Fragmentation transparency
  - Location transparency
  - Local mapping transparency
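The three levels differ in how much the person writing a query has to know. The sketch below is a minimal illustration, not part of the original slides: the EMPLOYEE table, the fragment names E1 and E2, the site names, and the `targets` function are all hypothetical. With fragmentation transparency the caller names only the table, with location transparency the caller names the fragments, and with local mapping transparency the caller names both fragments and sites.

```python
# Hypothetical catalog: fragment name -> site that stores it (illustrative only).
FRAGMENT_SITES = {"E1": "NewYork", "E2": "Atlanta"}
# Hypothetical mapping: logical table -> its fragments.
TABLE_FRAGMENTS = {"EMPLOYEE": ["E1", "E2"]}


def targets(level, table=None, fragments=None, sites=None):
    """Return the (fragment, site) pairs a query touches.

    The higher the transparency level, the less the caller has to supply.
    """
    if level == "fragmentation":
        # Caller names only the logical table; the system finds fragments and sites.
        return [(f, FRAGMENT_SITES[f]) for f in TABLE_FRAGMENTS[table]]
    if level == "location":
        # Caller names the fragments; the system finds their sites.
        return [(f, FRAGMENT_SITES[f]) for f in fragments]
    if level == "local_mapping":
        # Caller must name both the fragments and the sites.
        return list(zip(fragments, sites))
    raise ValueError(f"unknown transparency level: {level}")
```

For example, `targets("fragmentation", table="EMPLOYEE")` resolves both fragments and their sites from the table name alone.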
Transaction Transparency
- Ensures database transactions will maintain the distributed database's integrity and consistency
Distributed Requests and Distributed Transactions
- Distributed transaction
  - Can update or request data from several different remote sites on a network
- Remote request
  - Lets a single SQL statement access data to be processed by a single remote database processor
- Remote transaction
  - Composed of several requests; accesses data at a single remote site
Distributed Requests and Distributed Transactions (continued)
- Distributed transaction
  - Allows a transaction to reference several different (local or remote) DP sites
- Distributed request
  - Lets a single SQL statement reference data located at several different local or remote DP sites
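The four categories above can be told apart by two questions: how many sites does each SQL statement touch, and how many sites does the whole unit of work touch? The following is a minimal sketch under a simplifying assumption: each statement is represented as a set of site names, and `classify` is a hypothetical helper, not a DDBMS API. A unit of work that contains a multi-site statement is labeled by that statement here.

```python
def classify(statements):
    """Classify a unit of work by the DP sites its SQL statements reference.

    `statements` is a list of sets; each set holds the site names that one
    SQL statement touches (a hypothetical representation for this sketch).
    """
    if not statements or any(not s for s in statements):
        raise ValueError("each statement must reference at least one site")
    if any(len(s) > 1 for s in statements):
        # Some single statement spans several sites.
        return "distributed request"
    all_sites = set().union(*statements)
    if len(all_sites) > 1:
        # Every statement stays on one site, but the sites differ.
        return "distributed transaction"
    # Everything happens at one site: one statement or several.
    return "remote request" if len(statements) == 1 else "remote transaction"
```

For instance, `classify([{"B"}, {"C"}])` describes two statements that each use one site but together use two, which the sketch labels a distributed transaction.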
Distributed Concurrency Control
- Multisite, multiple-process operations are much more likely to create data inconsistencies and deadlocked transactions than are single-site systems
Two-Phase Commit Protocol
- Distributed databases make it possible for a transaction to access data at several sites
- Final COMMIT must not be issued until all sites have committed their parts of the transaction
- Two-phase commit protocol requires that each individual DP's transaction log entry be written before the database fragment is actually updated
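The protocol can be sketched as a coordinator that first asks every DP to prepare and then commits only if all of them agree. This is a simplified illustration under stated assumptions: the `DataProcessor` class, its in-memory log, and the `two_phase_commit` function are hypothetical, and the sketch ignores timeouts, crashes, and recovery. It does follow the rule above that each DP writes its log entry before its fragment is updated.

```python
class DataProcessor:
    """A hypothetical DP that keeps a transaction log and one data fragment."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # False simulates a site that votes to abort
        self.log = []                 # transaction log entries, always written first
        self.fragment = {}            # the locally stored data
        self.pending = None

    def prepare(self, updates):
        # Phase 1: record the intended change in the log before touching data.
        if not self.can_commit:
            self.log.append(("ABORT-VOTE", updates))
            return False
        self.log.append(("PREPARED", updates))
        self.pending = updates
        return True

    def commit(self):
        # Phase 2: log the decision, then apply the change to the fragment.
        self.log.append(("COMMIT", self.pending))
        self.fragment.update(self.pending)
        self.pending = None

    def abort(self):
        # Phase 2 (failure path): log the decision and discard the pending change.
        self.log.append(("ABORT", self.pending))
        self.pending = None


def two_phase_commit(participants, updates_by_site):
    """Coordinator: commit everywhere only if every DP votes yes."""
    votes = [dp.prepare(updates_by_site[dp.name]) for dp in participants]
    if all(votes):
        for dp in participants:
            dp.commit()
        return "COMMITTED"
    for dp in participants:
        dp.abort()
    return "ABORTED"
```

If any single DP votes no in the first phase, no fragment at any site is changed, which is what keeps the sites consistent with one another.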
Performance Transparency and Query Optimization
- Objective of the query optimization routine is to minimize the total cost associated with execution of a request
- Costs associated with a request are a function of
  - Access time (I/O) cost
  - Communication cost
  - CPU time cost
- Must provide distribution transparency as well as replica transparency
Performance Transparency and Query Optimization (continued)
- Replica transparency
  - DDBMS's ability to hide the existence of multiple copies of data from the user
- Query optimization techniques include
  - Manual or automatic
  - Static or dynamic
  - Statistically based or rule-based algorithms
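As a rough illustration of cost-based selection, the sketch below compares candidate execution plans by the three cost components listed above. The `Plan` class, the `cheapest` function, and the cost figures are hypothetical, and the sketch assumes the components simply add up; a real optimizer may weight them differently.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    """One candidate way to run a request (all figures are made-up units)."""
    name: str
    io_cost: float             # access time (I/O) cost
    communication_cost: float  # cost of moving data between sites
    cpu_cost: float            # CPU time cost

    def total_cost(self):
        # Assumption: the three components are summed without weighting.
        return self.io_cost + self.communication_cost + self.cpu_cost


def cheapest(plans):
    """Pick the plan with the lowest total cost."""
    return min(plans, key=Plan.total_cost)
```

With replicas available, a plan that reads a local copy can win even when its I/O cost is higher, because it avoids the communication cost of shipping data from a remote site.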
Distributed Database Design
- Data fragmentation
  - How to partition database into fragments
- Data replication
  - Which fragments to replicate
- Data allocation
  - Where to locate those fragments and replicas
Data Fragmentation
- Breaks a single object into two or more segments or fragments
- Each fragment can be stored at any site over a computer network
- Information about data fragmentation is stored in the distributed data catalog (DDC), from which it is accessed by the TP to process user requests
Data Fragmentation (continued)
- Strategies
  - Horizontal fragmentation
    - Division of a relation into subsets (fragments) of tuples (rows)
  - Vertical fragmentation
    - Division of a relation into attribute (column) subsets
  - Mixed fragmentation
    - Combination of horizontal and vertical strategies
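The three strategies can be sketched on a small in-memory relation. The CUSTOMER rows, attribute names, and helper functions below are hypothetical and exist only for illustration. The vertical fragment repeats the key attribute, an assumption made here so that fragments could be joined back together.

```python
# Hypothetical CUSTOMER relation: each dict is one tuple (row).
CUSTOMER = [
    {"cus_num": 10, "cus_name": "Sinex", "cus_state": "TN", "cus_balance": 1090.0},
    {"cus_num": 11, "cus_name": "Martin", "cus_state": "FL", "cus_balance": 340.5},
    {"cus_num": 12, "cus_name": "Mac", "cus_state": "TN", "cus_balance": 0.0},
]


def horizontal_fragment(rows, predicate):
    """Keep whole rows that satisfy the predicate (a subset of tuples)."""
    return [dict(r) for r in rows if predicate(r)]


def vertical_fragment(rows, key, attributes):
    """Keep a subset of columns; the key is repeated so fragments can be rejoined."""
    return [{a: r[a] for a in (key, *attributes)} for r in rows]


def mixed_fragment(rows, predicate, key, attributes):
    """Horizontal selection followed by vertical projection."""
    return vertical_fragment(horizontal_fragment(rows, predicate), key, attributes)
```

For example, `horizontal_fragment(CUSTOMER, lambda r: r["cus_state"] == "TN")` yields the rows for one state, which could then be stored at the site that serves that state.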
Data Replication
- Storage of data copies at multiple sites served by a computer network
- Fragment copies can be stored at several sites to serve specific information requirements
- Can enhance data availability and response time
- Can help to reduce communication and total query costs
Data Replication (continued)
- Replication scenarios
  - Fully replicated database
    - Stores multiple copies of each database fragment at multiple sites
    - Can be impractical due to amount of overhead
  - Partially replicated database
    - Stores multiple copies of some database fragments at multiple sites
    - Most DDBMSs are able to handle the partially replicated database well
Data Replication (continued)
- Replication scenarios (continued)
  - Unreplicated database
    - Stores each database fragment at a single site
    - No duplicate database fragments
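The three scenarios differ only in how many fragments have more than one copy. A minimal sketch, assuming a hypothetical placement map from fragment name to the set of sites holding a copy (the `replication_scenario` function is illustrative, not a DDBMS feature):

```python
def replication_scenario(placement):
    """Classify a placement map of fragment name -> set of sites holding a copy."""
    if not placement or any(not sites for sites in placement.values()):
        raise ValueError("every fragment must be stored at one or more sites")
    # A fragment counts as replicated when more than one site holds it.
    replicated = [len(sites) > 1 for sites in placement.values()]
    if all(replicated):
        return "fully replicated"
    if any(replicated):
        return "partially replicated"
    return "unreplicated"
```

For instance, a map in which fragment F1 lives at sites A and B while F2 lives only at A is classified as partially replicated.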
Data Allocation
- Deciding where to locate data
- Allocation strategies
  - Centralized data allocation
    - Entire database is stored at one site
  - Partitioned data allocation
    - Database is divided into several disjoint parts (fragments) and stored at several sites
Data Allocation (continued)
- Allocation strategies (continued)
  - Replicated data allocation
    - Copies of one or more database fragments are stored at several sites
- Data distribution over a computer network is achieved through data partition, data replication, or a combination of both
Client/Server vs. DDBMS
- Client/server architecture is the way in which computers interact to form a system
- Features a user of resources (the client) and a provider of resources (the server)
- Can be used to implement a DBMS in which the client is the TP and the server is the DP
Client/Server vs. DDBMS (continued)
- Client/server advantages
  - Less expensive than alternative minicomputer or mainframe solutions
  - Allows the end user to use the microcomputer's GUI, thereby improving functionality and simplicity
  - More people in the job market have PC skills than mainframe skills
  - PC is well established in the workplace
Client/Server vs. DDBMS (continued)
- Client/server advantages (continued)
  - Numerous data analysis and query tools exist to facilitate interaction with DBMSs available in the PC market
  - Considerable cost advantage to offloading applications development from mainframe to powerful PCs
Client/Server vs. DDBMS (continued)
- Client/server disadvantages
  - Creates a more complex environment
  - Different platforms (LANs, operating systems, and so on) are often difficult to manage
  - An increase in the number of users and processing sites often paves the way for security problems
Client/Server vs. DDBMS (continued)
- Client/server disadvantages (continued)
  - Possible to spread data access to a much wider circle of users
  - Increases demand for people with broad knowledge of computers and software
  - Increases the burden of training and the cost of maintaining the environment
C. J. Date's Twelve Commandments for Distributed Databases
- Local site independence
- Central site independence
- Failure independence
- Location transparency
- Fragmentation transparency
- Replication transparency
C. J. Date's Twelve Commandments for Distributed Databases (continued)
- Distributed query processing
- Distributed transaction processing
- Hardware independence
- Operating system independence
- Network independence
- Database independence
Summary
- A distributed database stores logically related data in two or more physically independent sites connected via a computer network
- Distributed processing is the division of logical database processing among two or more network nodes
- Distributed databases require distributed processing
- Main components of a DDBMS are the transaction processor and the data processor
Summary (continued)
- Current database systems can be classified by the extent to which they support processing and data distribution
- A homogeneous distributed database system integrates only one particular type of DBMS over a computer network
- A heterogeneous distributed database system integrates several different types of DBMSs over a computer network
Summary (continued)
- DDBMS characteristics are best described as a set of transparencies
- A transaction is formed by one or more database requests
- Distributed concurrency control is required in a network of distributed databases
- A distributed DBMS evaluates every data request to find the optimum access path in a distributed database
Summary (continued)
- The design of a distributed database must consider fragmentation and replication of data
- A database can be replicated over several different sites on a computer network
- Client/server architecture refers to the way in which two computers interact over a computer network to form a system