Title: L02: DDBMS Architecture
1L02 DDBMS Architecture
- DDBMS Architecture
- Architectural Models of DDBMS
- More on Transparency
2DDBMS Architecture
- Architecture defines a systems structure with
- Componets
- Functions of components, and
- Their interactions
- Purpose of reference architecture
- A framework for discussion
- Standardiztion
3Approaches of Reference Models
- Based on components
- modularity - unclear functionality
- Based on functions (e.g. ISO/OSI model)
- clear interfaces - unclear complexity
- Based on data (e.g. ANSI/SPARC model)
- data-centric - unclear functionality
- Need an integration of the three approaches
4ANSI/SPARC Architecture
Users
External Schema
External View
External View
External View
Conceptual Schema
Conceptual View
Internal Schema
Internal View
5Conceptual Schema Definition (1)
- RELATION PAY
- KEY TITLE
- ATTIRBUTES
- TITLE CHAR (10)
- SAL NUMERIC(6)
-
-
- RELATION EMP
- KEY ENO
- ATTIBUTES
- ENO CHAR(9)
- ENAME CHAR(15)
- TITLE CHAR(10)
-
6Conceptual Schema Definition (2)
- RELATION ASG
- KEY ENO, PNO
- ATTIRBUTES
- ENO CHAR(9)
- PNO CHAR(7)
- RESP CHAR (10)
- DUR NUMERIC(3)
-
-
- RELATION PROJ
- KEY PNO
- ATTIBUTES
- PNO CHAR(7)
- PNAME CHAR(20)
- BUDGET NUMERIC(7)
- LOC CHAR(15)
-
7Internal Schema Definition
- RELATION EMP
- KEY ENO
- ATTIBUTES
- ENO CHAR(9)
- ENAME CHAR(15)
- TITLE CHAR(10)
-
- INTERNA_REL E
- INDEX ON E
- FIELD
- E BYTE(9)
- ENAME BYTE(15)
- TIT BYTE(10)
-
-
8External View Definition
- Create a BUDGET view from PROJ
- CREATE VIEW BUDGET (PNAME,BUD)
- AS SELECT PNAME, BUDGET
- FROM PROJ
- Create a PAYROLL view from EMP and PAY
- CREATE VIEW PAYROLL ( EMP_NO, EMP_NAME, SAL)
- AS SELECT EMP.ENO, EMP.ENAME, PAY.SAL
- FROM EMP, PAY
- WHERE EMP.TITLE PAY.TITLE
9The Classical DDBMS Architecture
Global Schema
Site Independent Schemas
Fragmentation Schema
Allocation Schema
Local Mapping Schema
Local Mapping Schema
Other sites
DBMS 1
DBMS 2
LOCAL DB 2
Site 1
Site 2
10DDBMS Schemas
- Global Schema a set of global relations as if
database were not distributed at all - Fragmentation Schema global relation is split
into non-overlapping (logical) fragments. 1n
mapping from relation R to fragments Ri. - Allocation Schema 11 or 1n (redundant) mapping
from fragments to sites. All fragments
corresponding to the same relation R at a site j
constitute the physical image Rj. A copy of a
fragment is denoted by Rji. - Local Mapping Schema a mapping from physical
images to physical objects, which are manipulated
by local DBMSs.
11Global Relations, Fragments and Physical Images
12Motivation for this Architecture
- Separating the concept of data fragmentation from
the concept of data allocation - fragmentation transparency
- location transparency
- Explicit control of redundancy
- Independence from local databases allows local
mapping transparency
13Some Aspects of the Classical Architecture
- Distributed database technology is an add-on
technology, most users already have populated
centralized DBMSs. Whereas top down design
assumes implementation of new DDBMS from scratch. - In many application environments, such as
semi-structured databases, continuous multimedia
data, the notion of fragment is difficult to
define. - Current relational DBMS products provide for some
form of location transparency (such as, by using
nicknames).
14L02 DDBMS Architecture
- DDBMS Architecture
- Architectural Models of DDBMS
- More on Transparency
15Bottom Up Architecture - Present Future
- Possible ways in which multiple databases may be
put together for sharing by multiple DBMSs, which
are characterized according to - Autonomy (A) degree to which individual DBMSs
can operate independently. - 0 - Tightly coupled - integrated (A0),
- 1 - Semiautonomous - federated (A1),
- 2 - Total Isolation - multidatabase systems(A2)
- Distribution (D)
- 0 - no distribution - single site (D0),
- 1 - client-server - distribution of DBMS
functionality (D1), - 2 - full distribution P2P distributed
architecture(D2) - Heterogeneity (H)
- 0 - homogeneous (H0)
- 1 - heterogeneous (H1)
16Architectural Alternatives
Distribution
(A1, D2, H0)
(A0, D2, H0)
(A2, D2, H0)
(A2, D2, H1)
(A0, D2, H1)
(A2, D1, H0)
(A2, D1, H1)
(A0, D1, H1)
(A2, D0, H0)
(A0, D0, H0)
Autonomy
(A0, D0, H1)
(A2, D0, H1)
(A1, D0, H1)
Heterogeneity
17Autonomy
- Autonomy refers to the distribution of control,
not of data. It indicates the degree to which
individual DBMSs can operate independently - Exchanging information among sites?
- Independently executing transactions?
- Modifying the component systems?
- Requirements of an autonomous system
- The local operations of the individual DBMSs are
not affected by their participation in the DDBS. - The individual DBMSs query processing and
optimization should not be affected by the
execution of global queries that access multiple
databases - System consistency or operation should not be
compromised when individual DBMSs join or leave
the distributed database confederation.
18Dimensions of Autonomy
- Design Autonomy
- Freedom for individual DBMSs to use data models
and transaction management techniques they prefer
- Communication Autonomy
- Freedom for individual DBMSs to decide what
information (data control) is to be exported - Execution Autonomy
- Freedom for individual DBMSs to execute
transactions submitted in any way that it wants
to
19Alternatives of Autonomy
- A0 Tight integration
- A single image of the entire database to any user
- A1 Semiautonomous
- Can operate independently
- Have made some parts of their local data
shareable - Need to be modified to enable exchanging
information - A2 Total isolation
- Stand alone DBMS at each site, no global control
- Know nothing about other databases
20Distribution
- Deals with data, not of control
- Physical distribution of data over sites
- Two classes of distribution alternatives
- Client/Server (different from network C/S)
- Clients Apps, Server Data management
- Peer-to-Peer (or full)
- Each machine has full DBMS functionality
- Main focus of the textbook
21Heterogeneity
- Various forms of heterogeneity
- Hardware, networking protocol, data manager
- Focus of this course
- Data models, query languages, xact mgmt
22DDBMSs Implementation Alternatives
P2P homogeneous DBMS
P2P homogeneous federated system
P2P homogeneous multidatabase system
P2P Heterogeneous Federated DBMS
D
P2P heterogeneous DBMS
P2P heterogeneous Multidatabase system
Logically integrated and homogeneous multiple
DBMSs
A
Heterogeneous Integrated DBMS
Multidatabase System
H
Single site homogeneous federated DBMS
Heterogeneous multidatabase system
Single Site heterogeneous federated
23Taxonomy of Distributed Databases
- Composite DBMSs -tight integration
- single image of entire database is available to
any user - can be single or multiple sites
- can be homogeneous or heterogeneous
- Federated DBMSs - semiautonomous
- DBMSs that can operate independently, but have
decided to make some parts of their local data
shareable - can be single or multiple sites.
- they need to be modified to enable them to
exchange information - Multidatabase Systems - total isolation
- individual systems are stand alone DBMSs, which
know neither the existence of other databases or
how to communicate with them - no global control over the execution of
individual DBMSs. - can be single or multiple sites
- homogeneous or heterogeneous
24Client Server Architectures
- Client/Server (Ax, D1, Hy)
- Multiple clients, single server
- Similar data management as in centralized db
- Multiple clients, multiple servers
- Clients connect to servers as needed, or
- Clients connect to their home server only
25Components of Client/Server System
26DDBS Reference Architecture
ES1
ES2
ESn
External Schema
GCS
Global Conceptual Schema
LCS1
LCS2
LCSn
Local Conceptual Schema
LIS1
LIS2
LISn
Local Internal Schema
It is logically integrated. Provides for the
levels of transparency
27Components of a DDBS Site
DATA PROCESSOR
USER PROCESSOR
Local Internal Schema
Local Conceptual Schema
log
External Schema
Global Schema
Runtime Support
Local Query Processor
Local Recovery Manager
28DDBMS Components USER PROCESSOR
- User interface handler interprets user commands
and formats the result data as it is sent to the
user - Semantic data controller checks the integrity
constraints and authorization requirements - Global query optimizer and decomposer determines
execution strategy, translates global queries to
local queries, and generates strategy for
distributed join operations - Global execution monitor (distributed transaction
manager) coordinates the distributed execution of
the user request
29DDBMS Components DATA PROCESSOR
- Local query processor selects the access path and
is involved in local query optimization and join
operations - Local recovery manager maintains local database
consistency - Run-time support processor physically accesses
the database. It is the interface to the OS and
contains database buffer manager
30MDBS Architecture
- Heterogeneous MDBS-Unilingual
- Requires users to utilize different data models
and languages when both a local and global
database need to be accessed - Applications accessing multiple databases should
use external views defined on GCS - A single application can have a LES on LCS and
GES on GCS - Heterogeneous MDBS-Multilingual
- Users access global database by means of external
schema defined using the language of user's local
DBMS - Queries against global databases are made using
language of local DBMS - Deal with translation of queries at runtime
31MDBS Architecture (with Global Schema)
Global External Schema
Local External Schema
(A2, Dx, Hy) GCS generated bottom-up
- The GCS is generated by integrating LES's or
LCS's - The users of a local DBMS can maintain their
autonomy - Design of GCS is bottom-up
32MDBS with GCS
- Uni-lingual Heterogeneous MDBS
- Accessing multiple databases through GES on GCS
- LES on LCS and GES on GCS co-exist
- Multilingual Heterogeneous MDBS
- GES defined using the language of local DBMS
- Queries over GES in the language of local DBMS
- Easier for users to query multiple databases
- The system deals with translation of queries at
runtime
33MDBS without Global Conceptual Schema
Multidatabase Layer
Local Database System Layer
- Local system layer consists of several DBMSs
which present to multidatabase layer part of
their databases - The shared database is either local conceptual
schema or external schema (Not shown in the
figure above) - External views on one or more LCSs.
- Access to multiple databases through application
programs
34MDBS without Global Schema
- Federated Database Systems
- Do not use global conceptual schema
- Each local DBMS defines export schema
- Global database is a union of export schemas
- Each application accesses global database through
import schema (external view) - MDBSs components architecture
- Existence of full fledged local DBMSs
- MDBS is a layer on top of individual DBMSs that
support access to different databases - The complexity of the layer depends on existence
of GCS and heterogeneity
35MDBS Components Model
- Full-fledged DBMS at each site
- A layer runs on top of individual DBMSs
- The complexity of the layer depends on
- Existence of GCS and heterogeneity
36Components of an MDBMS
User
User
User
User Requests
System Responses
Multi-DBMS Layer
Transaction Manager
Transaction Manager
User Interface
User Interface
Query Processor
Query Processor
Scheduler
Scheduler
Recovery Manager
Recovery Manager
Query Optimizer
Query Optimizer
Runtime Sup. Processor
Runtime Sup. Processor
Database
Database
37Global Directory
- Directory is itself a database that contains
meat-data about the actual data stored in the
database. It includes the support for
fragmentation transparency for the classical
DDBMS architecture. - Directory can be local or distributed.
- Directory can be replicated and/or partitioned.
- Directory issues are very important for large
multi-database applications, such as digital
libraries.
38L02 DDBMS Architecture
- DDBMS Architecture based on data
- Architectural Models
- More on Transparency
39More on Transparency
- X transparency means the existence of X is not
known to users. - Closely related to architecture issues.
- The level of transparency is inevitably a
compromise between ease of use and the difficulty
and overhead cost of providing high levels of
transparency - DDBMS provides location transparency and
fragmentation transparency, OS provides network
transparency, and DBMS provides data independence
40On Distribution Transparency
- Higher levels of distribution transparency
require appropriate DDBMS support, but makes
end-application developers work easy. - Less distribution transparency the more the
end-application developer needs to know about
fragmentation and allocation schemes, and how to
maintain database consistency. - There are tough problems in query optimization
and transaction management that need to be
tackled (in terms of system support and
implementation) before fragmentation transparency
can be supported.
41Levels of Distribution Transparency
- Fragmentation Transparency
- Just like using global relations.
- Location Transparency
- Need to know fragmentation schema but no need
not know where fragments are located - Applications access fragments (no need to
specify sites where fragments are located). - Local Mapping Transparency
- Need to know both fragmentation and allocation
schema no need to know what the underlying local
DBMSs are. - Applications access fragments explicitly
specifying where the fragments are located. - No Transparency
- Need to know local DBMS query languages, and
write applications using functionality provided
by the Local DBMS
42Distribution Transparency
- We analyze the different levels of distribution
transparency which can be provided by DDBMS for
applications. - A Simple Application
- Supplier(SNum, Name, City)
- Horizontally fragmented into
- Supplier 1 ? City HK'' Supplier at Site1
- Supplier2 ? City ! HK Supplier at Site2,
Site3 - Application
- Read the supplier number from the terminal and
return the name of the supplier with that number
43Types of Data Fragmentation
- Vertical Fragmentation
- Projection on relation (subset of attributes)
- Reconstruction by join
- Updates require no tuple migration
- Horizontal Fragmentation
- Selection on relation (subset of tuples)
- Reconstruction by union
- Updates may requires tuple migration
- Mixed Fragmentation
- A fragment is a Select-Project query on relation.
44Horizontal Fragmentation
- Partitioning the tuples of a global relation into
subsets - Example
- Supplier(SNum, Name, City)
- Horizontal Fragmentation can be
- Supplier 1 ? City ! HK Supplier
- Supplier2 ? City ! HK Supplier
- Reconstruction is possible
- Supplier Supplier1 ? Supplier2
- The set of predicates defining all the fragments
must be complete, and mutually exclusive
45Derived Horizontal Fragmentation
- The horizontal fragmentation is derived from the
horizontal fragmentation of another relation - Example
- Supply (SNum, PNum, DeptNum, Quan)
- SNum is a supplier number
- Supply1 Supply SNumSNum Supplier1
- Supply2 Supply SNumSNum Supplier2
- The predicates defining derived horizontal
fragments are - Supply.SNum Supplier.SNum and Supplier. City
HK'' - Supply.SNum Supplier.SNum and Supplier. City !
HK''
46Vertical Fragmentation
- The vertical fragmentation of a global relation
is the subdivision of its attributes into groups
fragments are obtained by projecting the global
relation over each group - Example
- EMP (ENum,Name,Sal,Tax,MNum,DNum)
- A vertical fragmentation can be
- EMP1 ? ENum, Name, MNum, DNum EMP
- EMP2 ? ENum, Sal, Tax EMP
- Reconstruction
47Rules for Data Fragmentation
- Completeness
- All the data of the global relation must be
mapped into the fragments - Reconstruction
- It must always be possible to reconstruct each
global relation from its fragments - Disjointedness
- it is convenient that fragments be disjoint, so
that the replication of data can be controlled
explicitly at the allocation level
48Level 4 of Distribution Transparency
read(terminal, SNum) SupIMS(Snum,Name,Found
) at S1 If not FOUND then SupCODASYL(Snum,Nam
e,Found) at S3 write(terminal, Name).
S1
S3
49Level 3 of Distribution Transparency
- Local Mapping Transparency
The applications have to specify both the
fragment names and the sites where they are
located. The mapping of database operations
specified in applications to those in DBMSs at
sites is transparent
50Level 2 of Distribution Transparency
The application is independent from changes in
allocation schema, but not from changes to
fragmentation schema
51Level 1 of Distribution Transparency
- Fragmentation transparency
- The DDBMS interprets the database operation by
accessing the databases at different sites in a
way which is completely determined by the system
52Distribution Transparency for Updates
EMP1 ?ENum,Name,Sal,Tax?DNum?10 (EMP) EMP2
?ENum,MNum,DNum?DNum?10 (EMP) EMP3
?ENum,Name,DNum?Dnumgt10 (EMP) EMP4
?ENum,MNum,Sal,Tax?Dnumgt10 (EMP)
- Difficult
- broadcasting updates to all copies
- migration of tuples because of change of
fragment defining attributes
EMP1
EMP2
Update Dnum15 for Employee with Enum100
EMP4
EMP3
53An Update Application
With Level 2 Location Transparency only
UPDATE Emp SET DNum 15 WHERE ENum 100
Select Name, Tax, Sal into Name, Sal, Tax
From EMP 1 Where ENum 100 Select MNum into
MNum From Emp 2 Where ENum 100 Insert into
EMP 3 (ENum, Name, DNum) (100, Name, 15)
Insert into EMP 4 (ENum, Sal, Tax, MNum) (100,
Sal, Tax, MNum) Delete EMP 1 where ENum
100 Delete EMP 2 where ENum 100
With Level 1 Fragmentation Transparency
54Summary
- Approaches of Reference Architectures
- Data-based Architecture of DBMS
- External, Conceptual, and Internal Schema
- DDBMS Models
- Autonomy, Distribution, and Heterogeneity
- Architectural Alternatives of DDBMS
- Client/Server, Peer-to-Peer, and Multi-database