Title: Designing Distributed Databases
1Chapter 3
- Designing Distributed Databases
2Designing Distributed Databases
- Data Models
- Choosing a Standard Data Model
- Data Model Translators
- Schema Integration
- Summary
3What Is a Data Model?
- Data model Style of describing and manipulating
data - Data model components include
- Data description data manipulation
- Semantic integrity constraints
4Why Are Data Models Important?
- Different DBMSs use different data models
- File
- Relational
- Hierarchical
- Network
- Translators are necessary to support
heterogeneous DBMSs - Translate data manipulation commands
- Translate data structures
5File Data Model
- Data description exists in applications
- Sample data description
- Sample file
01 Department 03 DepartmentFlag Char(1) 03
DepartmentName Char(10) 03 DepartmentBudget Inte
ger.01 Employee 03 EmployeeFlag Char(1) 03
EmployeeName Char(10) 03 EmployeeSalary Integer.
More
6File Data Model, cont.
- Data manipulation is a single record per command
Sequential Direct open open read read write
insert close delete modify close
More
7File Data Model, cont.
- Semantic integrity constraints include
- Type-checking
- Integer
- Character
- Decimal
- File data model is weak in semantic integrity
constraints up to 50 of the application is
devoted to enforcing semantic integrity
constraints.
8Relational Data Model
- Data description exists in the data dictionary,
not in the application - No visible pointers
Table Department,Column DepartmentName Char(10)
not null,Column DepartmentBudget Integer Table
Employee,Column EmployeeName Char(10) not
null,Column EmployeeSalary Integer,Column
DepartmentName Char(10)
EmployeeEmployee Name Employee Salary
Department NameAckman 5000 Toy
Baker 4500 ToyCarson 4800
CarDavis 5100 Car
More
9Relational Data Model, cont.
- Data manipulation is multiple records per command
- Update commands
- Insert
- Delete
- Update
More
10Relational Data Model, cont.
- Semantic integrity constraints include
- Type-checking
- Unique table identifiers
- Referential integrity Each foreign key value
must exist as a value of the foreign tables
primary key or be null.
Primary keys
Foreign key
11Hierarchical Data Model
- Data description exists in the data dictionary
and is copied to the application.
More
12Hierarchical Data Model, cont.
- Data manipulation is a single record per command
More
13Hierarchical Data Model, cont.
- Semantic integrity constraints include
- Type-checking
- Integer
- Character
- Decimal
- Range-checking
- Parent-child constraints
14Data Models Summary
15Designing Distributed Databases
- Data Models
- Choosing a Standard Data Model
- Data Model Translators
- Schema Integration
- Summary
16Choosing a Standard Data Model
User Interface Global Transaction UI
Translator Global Transaction Global Request
Optimizer Global Execution Plan Distributed
Execution Manager Subtransaction Communication
Subsystem Subtransaction Local Execution
Manager Subtransaction Translator Subtransaction L
ocal Optimizer Physical Commands Runtime Support
Processor Database
UI Schema
Allocation Schema
Local Standard Schema
Allocation Schema
17Why Choose a Single Data Model?
Twelve Translations
Six Translations
18Relational Is Choice for Distributed DBMS
- Minimizes communication costs
- Avoids trouble with pointers
- Weak in semantic integrity constraints
19RelationalMinimizes Communication Costs
- Relational data model supports multiple-records-pe
r-command queries. - Fewer transmissions requesting data
- Fewer (but larger) data communications
20RelationalAvoids Troubles With Pointers
- Relational is oriented to flat files with no
pointers. - No dangling pointers when files are transmitted
- No pointers to be misunderstood by the target
DBMS - Can transfer parts of complex structures
21RelationalWeak in Semantic Integrity Constraints
- Limited to few semantic integrity constraints
- Type constraints
- Unique identifier
- Not null
- Referential integrity constraints
- Needed, more constraints
- Triggers and asserts
- Semantic data models
- All commercial systems have chosen the relational
data model
22Designing Distributed Databases
- Data Models
- Choosing a Standard Data Model
- Data Model Translators
- Schema Integration
- Summary
23Data Model Translators
UI Schema
Allocation Schema
Local Standard Schema
Allocation Schema
24Data Manipulation Commands Translating
Nonprocedural to Procedural Commands
- Transform local DBMS data structures into tables
- Convert relational data manipulation commands
into commands supported by the local DBMS
25Why Data Model Translations?
- Users data model is different from the standard
data model. - Data model used by the local DBMS is different
from the standard data model.
26Translating File Data Structures to Relational
Data Structures
File Data Definition Relational Data
Descriptionrecord type tablefield columnuni
que identifier primary key
27Concept Correspondence Example
EmployeeEmployee Name Employee Salary
Department NameAckman 5000 Toy
Baker 4500 ToyCarson 4800
CarDavis 5100 Car
28Problem Areas
- No unique identifier
- Implicit information in ordering
- Repeating groups
- Multiple record types
29Problem AreaNo Unique Identifier
- Problem Some files do not have records with
unique identifiers. - Solution Generate a new field that contains a
unique identifier.
30No Unique Identifier Example
31Problem AreaImplicit Information in Ordering
- Problem Files in which record order implies
information that is not represented in field
values e.g., - Order implies waiting list, ranking, etc.
- Solution Generate a new field whose value shows
the order e.g., - New field containing a value representing the
position in the file
32Implicit Information in OrderingExample
33Problem Area Repeating Groups
- Example file
- Solution Build new table for repeating group
34Problem Area Multiple Record Types
- Example file
- Solution Split into two tables
D Car 45000000E Ackman 00005000E Baker 000045
00D Toy 48000000E Carson 00004800E Davis 000
05100
DepartmentDName BudgetCar 45000000
Toy 48000000
EmployeeDept EName
SalaryToy Ackman 5000 Toy Baker
4500Car Carson
4800Car Davis 5100
35Translating Hierarchical Data Structures to
Relational Data Structures
File Data Definition Relational Data
Descriptionrecord type tabledata
item columnrecord key primary
keyparent-child foreign key
36Concept CorrespondenceExample
37Problem Areas
- Implicit information in ordering
- Repeating groups
- Solution Same as in file systems
38Problem Area Duplicate Segments
- Solution Duplicate information in multiple rows
and make foreign key part of primary key
39Concept Correspondence Example
40Problem Areas
- Files in which the record order implies
information that is not represented in field
values. - Repeating groups
- Multiple record types
- Ordering of the members of a set implies
information - Solution Same solutions as in file-to-relational
data structure translation
41Exercise Translating File Structures to
Relational Data Structures
- Convert the following files to relational
database tablesThere are two record types in
the file, Employee and Dependent. The Employee
record contains Employee flag (E), EmpName,
EmpNumber, NumberOfSkills, Skill1, Skill2, and
Skill3. Following each Employee record are the
records of the employees dependents
DependentFlag (D), DependentName, and
Birthdate - A sample of the file follows E Ackman 555-55-555
5 3 Fortran COBOL Pascal D Sally 1979 D Susan 19
83 D Fred 1984 E Baker 666-66-6666 2 Pascal Lisp
E Carson 777-77-7777 3 Prolog C Forth D Sam 198
8 D George 1989
42Translating Nonprocedural Data Manipulation
Commands to Procedural Data Manipulation Commands
- Used for all procedural data models
- Hierarchical
- File
- Consider data structures and access techniques
- Indexes
- Physical ordering
- Hash files
- Etc.
43Translating Nonprocedural to Procedural Data
Manipulation Commands Example
- Exercise Write a flowchart that performs this
SQL statement assuming Department is a sequential
file with no index
44Translating Nonprocedural to Procedural Data
Manipulation Commands Another Example
- Exercise Write a flowchart that performs this
SQL statement assuming that both Department and
Employee are unordered sequential files
45Designing Distributed Databases
- Data Models
- Choosing a Standard Data Model
- Data Model Translators
- Schema Integration
- Summary
46Schema Integration
UI Schema
Allocation Schema
Local Standard Schema
Allocation Schema
47Why Integrate Schemas?
- To build a global conceptual schema that
describes data in multiple databases - The relationship between original schemas and the
global logical schema is represented in the
allocation schema.
48Schema Integration Problems
- Differences in data structures
- Differences in database values
49Differences in Data Structures Example
- Engineer Name,EmpId,Salary,NumberOfPatents
- Scientist Name,EmpId,Salary,HighestDegree
- Secretary Name,EmpId,Salary,WordsPerMin
- Equipment Name,Inventory,Location
- Employee Name,EmpId,Salary,HireDate
50Integrating Two Tables
- Determine the relationship of the tables rows
- Containment
- Overlap
- Disjoint
- Determine how many tables to create
- Single table
- Joins the columns
- Set noncommon attributes to null
- Multiple tables
- One table contains common columns
- Other tables contain key and noncommon columns
A
B
51Strict Containment Single Table Approach
- Each scientist is also an employee Scientist
Name,EmpId,Salary,HighestDegree in
DBMS1 Employee Name,EmpId,Salary,HireDate in
DBMS2 - Union of attributes Employee Name,EmpId,Salary,
HireDate, HighestDegree - HighestDegree has value only for Scientist,
otherwise null - HighestDegree column serves two purposes
- Contains the value of highest degree for
scientists - Indicates whether row represents Scientist or
Employee
52Strict Containment Multiple Table Approach
- Each scientist is also an employee. Scientist
Name,EmpId,Salary,HighestDegree in
DBMS1 Employee Name,EmpId,Salary,HireDate in
DBMS2 - Create two tables Employee Name,EmpId,Salary,Hir
eDate Scientist EmpId,HighestDegree - Scientist.EmpId is the foreign key
53Strict Containment Single Table Disadvantages
- Multiple attributes must either all be null or
all have values. - For example, if both HighestDegree and School are
attributes of Scientist but not
Employee Scientist Name,EmpId,Salary,HighestDegr
ee,School in DBMS1 Employee Name,EmpId,Salary,
HireDate in DBMS2 Employee
Name,EmpId,Salary,HireDate,HighestDegree,School - Either HighestDegree and School must both have
values or neither can have values
54Strict Containment Multiple Table Disadvantages
- Data about one entity in two places
- Expensive for retrieval
- Updates are complex
- For example, Employee Name,EmpId,Salary,HireDate
Scientist EmpId,HighestDegree - Employee and Scientist must be joined to access
all of Scientists attributes. - Whenever a scientist resigns, a row must be
deleted from both the Scientist and Employee
tables.
55Overlap Single Table Approach
- Engineer Name,EmpId,Salary,NumberOfPatents
- Scientist Name,EmpId,Salary,HighestDegree
- Some engineers are also scientists, but some are
not. - Some scientists are also engineers, but some are
not. - Union of attributes
- RDStaff Name,EmpId,Salary, HighestDegree,Numbe
rOfPatents - Only engineers have values for NumberOfPatents.
- Only scientist have values for HighestDegree.
56Overlap Multiple Table Approach
- Engineer Name,EmpId,Salary,NumberOfPatents
- Scientist Name,EmpId,Salary,HighestDegree
- Some engineers are also scientists, but some are
not. - Some scientists are also engineers, but some are
not. - Union of attributes
- RDStaff Name,EmpId,Salary Scientist
EmpId, HighestDegree Engineer EmpId,
NumberOfPatents - Only engineers have values for NumberOfPatents.
- Only scientist have values for HighestDegree.
57Disjoint
- Equipment Name,Inventory,Location
- Employee Name,EmpId,Salary,HireDate
- No Employee is an Equipment.
- No Equipment is an Employee.
- Leave Equipment and Employee as separate tables.
58Disjoint But Merge
- Scientist Name,EmpId,Salary,HighestDegree
- Secretary Name,EmpId,Salary,WordsPerMin
- Merge is similar to overlap.
59General Approach for Schema Integration
- For each pair of Tables A and B, determine
whether - A is contained in B
- B is contained in A
- A is equivalent to B
- A overlaps B
- A is disjoint from B, but integrates anyway
- A is disjoint from B, but does not integrate
- Modify tables if table integration is warranted.
- If the single table approach is used, remember to
specify which groups of attributes must all be
null or nonnull. - If the multiple table approach is used, remember
to specify new foreign key constraints.
60Schema Integration Exercise
- Create a schema for the following tables
- Engineer Name,EmpId,Salary,NumberOfPatents
- Scientist Name,EmpId,Salary,HighestDegree
- Secretary Name,EmpId,Salary,WordsPerMin
- Equipment Name,Inventory,Location
- Employee Name,EmpId,Salary,HireDate
- Assume that the Employee file is the master file
maintained by the personnel office and that it is
used for generating paychecks that the Equipment
file is maintained by the building maintenance
office and that the remaining files are
maintained by department heads
61Data Integration Problems
- Corresponding columns of two table rows may have
different values. - Domains of corresponding columns may have
- Different precision
- Different semantic integrity constraints
- Different coding structures
Domain Set of values contained in a column
62Corresponding Columns of Two Table Rows That Have
Different Values
63Resolving Data Value Differences
- If one database contains value x and the other
contains value y, what value does the user see? - Automatically resolve the difference
- Min (x,y)
- Choose from the table judged to be most reliable
- Null, not applicable, dont know
- User resolves the difference
- Retain both values
- Go back to original source
64Handling Data Value Differences Example
Let user resolvedifferences
Resolve Differences
65Resolving Data Value Differences Exercise
- Integrate the following tables and resolve data
differences
66Schema Integration Summary
- Types of integration
- Schema
- Create a schema describing data in multiple
databases - Data
- Merge corresponding records from different
databases - Schema and data must be integrated if the
distributed database is to support a unified
database. - Otherwise, users must resolve schema and data
inconsistencies.
67Designing Distributed Databases
- Data Models
- Choosing a Standard Data Model
- Data Model Translators
- Schema Integration
- Summary
68Summary
- Several data models are used for centralized
DBMSs. - Relational
- Hierarchical
- Network
- Files
- The relational data model is frequently chosen
for distributed DBMSs. - Translators are needed for user interfaces.
- Translators are needed for heterogeneous local
DBMSs. - Data model translators are complicated by
differences in the descriptive capabilities of
the various data models. - Nonprocedural-to-procedural translations use
local data structures - Procedural-to-nonprocedural are very difficult.
More
69Summary, cont.
- To integrate data, two tasks must be performed
- Schema integration, to build a global schema
- Data value inconsistency resolution, to remove
data discrepancies - Files should be fragmented based on use by
important and frequently executed applications. - The original file can be constructed from
fragments. - Fragments should be allocated based on efficiency
considerations.