Physical%20Database%20Design - PowerPoint PPT Presentation

About This Presentation

Title:

Physical%20Database%20Design

Description:

Example code-look-up table (Pine Valley Furniture Company) Data Format decisions (integrity) ... record in an overflow area. Long synonym chains degrade ... – PowerPoint PPT presentation

Number of Views:108

Avg rating:3.0/5.0

Slides: 55

Provided by: traci5

Learn more at: https://www.ou.edu

Category:

more less

Transcript and Presenter's Notes

Title: Physical%20Database%20Design

1
Chapter 6

Physical Database Design

2
Physical Design

The purpose of the physical design process is to
translate the logical description of the data
into technical specifications for storing and
retrieving data
Goal create a design that will provide adequate
performance and insure database integrity,
security, and recoverability
Decisions made in this phase have a major impact
on data accessibility, response times, security,
and user friendliness.

3
Required Inputs

Normalized relations
Data volume and use estimates
Attribute definitions
Descriptions of where and when data are used
Expectations for response time, data security,
backup, recovery, retention, and integrity
Description of chosen technology

4
Determining volume and usage

Data volume statistics represent the size of the
business
calculated assuming business growth over a period
of several years
Usage is estimated from the timing of events,
transaction volumes, and reporting and query
activity.
Less precise than volume statistics

5
Composite usage map (Pine Valley Furniture
Company)
6
Physical Design Decisions

Specify the data type for each attribute from the
logical data model
minimize storage space and maximize integrity
Specify physical records by grouping attributes
from the logical data model
Specify the file organization technique to use
for physical storage of data records
Specify indexes to optimize data retrieval
Specify query optimization strategies

7
Data Format

Data type selection goals
minimize storage
represent all possible values
eliminate illegal values
improve integrity
support manipulation
Note these have different relative importance

8
Data format decisions (coding)

e.g., AH(Adams Hall), B(Buchanan) , etc
implement by creating a look-up table
there is a trade-off in that you must create and
store a second table and you must access this
table to look up the code value
consider using when a field has a limited number
of possible values, each of which occupies a
relatively large amount of space, and the number
of records is large and/or the number of record
accesses is small

9
Example code-look-up table (Pine Valley Furniture
Company)
10
Data Format decisions (integrity)

Data integrity controls
default value
Range control
Null value control
Referential integrity
Missing data
substitute an estimate
report missing data
Sensitivity testing

11
For example...

Suppose you were designing the age field in a
student record at your university. What decisions
would you make about
data type
integrity (range, default, null)
How might your decision vary by other
characteristics about the student such as degree
sought?

12
Attribute groupings

Physical Record A group of fields stored in
adjacent memory locations and retrieved together
as a unit.
Page The amount of data read or written in one
I/O operation.
Blocking Factor The number of physical records
per page.

13
Database Access Model
The goal in structuring physical records is to
minimize performance bottlenecks resulting from
disk accesses (accessing data from disk is slow
compared to main memory)
14
Attribute groupingDenormalization

Process of transforming normalized relations into
denormalized physical record specifications
may partition a relation into more than one
physical record
may combine attributes from different relations
into one physical record

15
Denormalization

Involves a trade-off
Reduced disk accesses and greater performance
(due, for example, to fewer table joins)
- But -
Introduction of anomalies (and thus redundancies)
that will necessitate extra data maintenance
increase chance of errors and force reprogramming
when business rules change
may optimize certain tasks at the expense of
others (if activities change, benefits may no
longer exist)

16
Denormalization opportunities

11 relationship
MM associative entity with non-key attributes
reference data

17
A possible denormali-zation situation reference
data
18
More denormalization options

Horizontal Partitioning Distributing the rows of
a table into several separate files.
Vertical Partitioning Distributing the columns
of a table into several separate files.
The primary key must be repeated in each file.
Combination of both

19
Partitioning

Advantages of Partitioning
Records used together are grouped together
Each partition can be optimized for performance
Security and recovery
Partitions stored on different disks less
contention
Parallel processing capability
Disadvantages of Partitioning
Slower retrievals when across partitions
Complexity for application programmers
Anomalies and extra storage space requirements
due to duplication of data across partitions

20
How about this..

Consider the following normalized relations
STORE(Store_Id, Region, Manager_Id, Square_Feet)
EMPLOYEE(Emp_Id, Store_Id, Name, Address)
DEPARTMENT(Dept, Store_ID, Manager_Id,
Sales_Goal)
SCHEDULE(Dept, Emp_Id, Date, hours)
What opportunities might exist for
denormalization?

21
Physical Files

Physical File A file as stored on disk
Constructs to link two pieces of data
Sequential storage
Pointers
File Organization How the files are arranged on
the disk (more on this later)
Access Method How the data can be retrieved
based on the file organization
Relative - data accessed as an offset from the
most recently referenced point in secondary
memory
Direct - data accessed as a result of a
calculation to generate the beginning address of
a record

22
File Organizations

A technique for physically arranging the records
of a file on secondary storage devices.
Goals in selecting (trade-offs exist, of
course)
Fast data retrieval
High throughput for input and maintenance
Efficient use of storage space
Protection from failures or data loss
Minimal need for reorganization
Accommodation for growth
Security from unauthorized use

23
File Organizations

Sequential
Indexed
Indexed Sequential
Indexed Nonsequential
Hashed (also called Direct)
See Table 6-3 for comparison

24
Sequential File Organization

Records of the file are stored in sequence by the
primary key field values

25
Comparisons of file organizations (a) Sequential
26
Sequential Retrieval

Consider a file of 10,000 records each occupying
1 page
Queries that require processing all records will
require 10,000 accesses
e.g., Find all items of type 'E'
Many disk accesses are wasted if few records meet
the condition
However, very effective if most or all records
will be accessed (e.g., payroll)

27
Indexed File Organization

Index concept is like index in a book
Indexed-sequential file organization The
records are stored sequentially by primary key
values and there is an index built on the primary
key field (and possibly indexes built on other
fields, also)

28
(b) Indexed
29
Hashed File Organization

Hashing Algorithm Converts a primary key value
into a record address
Division-remainder method is common hashing
algorithm(more to come on this)

30
(c ) Hashed
31
Hashing

A technique for reducing disk accesses for direct
access
Avoids an index
Number of accesses per record can be close to one
The hash field is converted to a hash address by
a hash function

32
Hashing
33
Shortcomings of Hashing

Different hash fields may convert to the same
hash address
these are called Synonyms
store the colliding record in an overflow area
Long synonym chains degrade performance
There can be only one hash field per record
The file can no longer be processed sequentially

34
Clustering

In some relational DBMS, related records from
different tables that are often retrieved
together can be stored close together on disk
Because the related records are stored close to
one another on the physical disk, less time is
needed to retrieve the data
E.g., Customer data and Order data may frequently
be retrieved together
Can require substantial maintenance if the
clustered data changes frequently

35
Indexing

An index is a table file that is used to
determine the location of rows in another file
that satisfy some condition

36
Querying with an Index

Read the index into memory
Search the index to find records meeting the
condition
Access only those records containing required
data
Disk accesses are substantially reduced when the
query involves few records

37
Maintaining an Index

Adding a record requires at least two disk
accesses
Update the file
Update the index
Trade-off
Faster queries
Slower maintenance (additions, deletions, and
updates of records)
Thus, more static databases benefit more overall

38
Rules of Thumbfor Using Indexes

1. Indexes are most useful on larger tables
2. Index the primary key of each table(may be
automatic, as in Access)
3. Indexes are useful on search fields (WHERE)
4. Indexes are also useful on fields used for
sorting (ORDER BY) and categorizing (GROUP BY)
5. Most useful to index on a field when there are
many different values for that field

39
Rules of Thumbfor Using Indexes

6. Find out the limits placed on indexing by your
DBMS (Access allows 32 indexes per table, and no
index may contain more than 10 fields)
7. Depending on the DBMS, null values may not be
referenced from an index (thus, rows with a null
value in the field that is indexed may not be
found by a search using the index)

40
Rules for Adding Derived Columns

Use when aggregate values are regularly
retrieved.
Use when aggregate values are costly to
calculate.
Permit updating only of source data.
Create triggers to cascade changes from source
data.

41
One Other Rule of Thumbfor Increasing Performance

Consider contriving a shorter field or selecting
another candidate key to substitute for a long,
multi-field primary key (and all associated
foreign keys)

42
Query Optimizer Factors

Type of Query
Highly selective.
All or most of the records of a file.
Unique fields
Size of files
Indexes
Join Method
Nested-Loop
Merge-Scan (Both files must be ordered or indexed
on the join columns.)

43
More practice

Draw a composite usage map for the following
PERSON(person_ID, name, address, DOB)
PATIENT(PA_person_ID, Contact)
PHYSICIAN(PH_person_ID, specialty)
PERFORMANCE(PA_person_ID, PH_person_ID,
Treatment, Treatment_date, Treatment_time)
CONSUMPTION(PA_person_ID, Item, Date, Quantity)
ITEM(Item, Description)
Make recommendations about denormalizing,
partitioning, file organization, and indexing

44
RAID

Redundant Arrays of Inexpensive Disks
Exploits economies of scale of disk manufacturing
for the computer market
Can give greater security
Increases fault tolerance of systems
Not a replacement for regular backup

45
RAID

The operating system sees a set of physical
drives as one logical drive
Data are distributed across physical drives
All levels, except 0, have data redundancy or
error-correction features
Parity codes or redundant data are used for data
recovery

46
Mirroring

Write
Identical copies of file are written to each
drive in array
Read
Alternate pages are read simultaneously from each
drive
Pages put together in memory
Access time is reduced by approximately the
number of disks in the array
Read error
Read required page from another drive
Tradeoffs
Provides data security
Reduces access time
Uses more disk space

47
Mirroring
Complete Data Set
Complete Data Set
No parity
48
Striping

Three drive model
Write
Half of file to first drive
Half of file to second drive
Parity bit to third drive
Read
Portions from each drive are put together in
memory
Read error
Lost bits are reconstructed from third drives
parity data
Tradeoffs
Provides data security
Uses less storage space than mirroring
Not as fast as mirroring

49
Striping
One-Half Data Set
One-Half Data Set
Parity Codes
50
Database Architectures

Hierarchical
Network
Relational
Object-oriented
Multidimensional

51
Hierarchical models

Type of logical database model in which data is
organized in a tree structure.
Records are divided into segments which are
linked using pointers.
Parent and child data are stored together.

52
Sample hierarchical model
ROOT/PARENT
Employee
Job
Benefits
Compensation
FIRST CHILD
Assignments
2nd CHILD
Ratings
Salary
Pension
Insurance
Health
53
Network model