Physical Database Design - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Chapter 6
  • Physical Database Design

2
Physical Design
  • The purpose of the physical design process is to
    translate the logical description of the data
    into technical specifications for storing and
    retrieving data
  • Goal: create a design that will provide adequate
    performance and ensure database integrity,
    security, and recoverability
  • Decisions made in this phase have a major impact
    on data accessibility, response times, security,
    and user friendliness.

3
Required Inputs
  • Normalized relations
  • Data volume and use estimates
  • Attribute definitions
  • Descriptions of where and when data are used
  • Expectations for response time, data security,
    backup, recovery, retention, and integrity
  • Description of chosen technology

4
Determining volume and usage
  • Data volume statistics represent the size of the
    business
  • Calculated assuming business growth over a period
    of several years
  • Usage is estimated from the timing of events,
    transaction volumes, and reporting and query
    activity.
  • Less precise than volume statistics

5
Composite usage map (Pine Valley Furniture
Company)
6
Physical Design Decisions
  • Specify the data type for each attribute from the
    logical data model
  • minimize storage space and maximize integrity
  • Specify physical records by grouping attributes
    from the logical data model
  • Specify the file organization technique to use
    for physical storage of data records
  • Specify indexes to optimize data retrieval
  • Specify query optimization strategies

7
Data Format
  • Data type selection goals
  • minimize storage
  • represent all possible values
  • eliminate illegal values
  • improve integrity
  • support manipulation
  • Note: these goals have different relative importance

8
Data format decisions (coding)
  • e.g., AH (Adams Hall), B (Buchanan), etc.
  • implement by creating a look-up table
  • there is a trade-off in that you must create and
    store a second table and you must access this
    table to look up the code value
  • consider using when a field has a limited number
    of possible values, each of which occupies a
    relatively large amount of space, and the number
    of records is large and/or the number of record
    accesses is small
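The trade-off above can be sketched in a few lines. This is a minimal illustration, not from the slides; the table and field names (building codes for rooms) are assumptions chosen to match the AH/B example:

```python
# Sketch of a code look-up table. Short codes are stored in the main
# table; the full values live in a separate look-up table that must be
# accessed whenever the full value is needed (the storage/access trade-off).

building_codes = {          # the look-up table: code -> full value
    "AH": "Adams Hall",
    "B": "Buchanan",
}

# Main records store only the short code, saving space in every row.
rooms = [
    {"room": "101", "building": "AH"},
    {"room": "220", "building": "B"},
]

def building_name(record):
    """Resolve a code via the look-up table (costs one extra access)."""
    return building_codes[record["building"]]

print(building_name(rooms[0]))  # Adams Hall
```

The space saving grows with the number of records, while the extra look-up cost is paid only when the full value is actually displayed.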

9
Example code-look-up table (Pine Valley Furniture
Company)
10
Data format decisions (integrity)
  • Data integrity controls
  • default value
  • Range control
  • Null value control
  • Referential integrity
  • Missing data
  • substitute an estimate
  • report missing data
  • Sensitivity testing

11
For example...
  • Suppose you were designing the age field in a
    student record at your university. What decisions
    would you make about
  • data type
  • integrity (range, default, null)
  • How might your decision vary by other
    characteristics about the student such as degree
    sought?
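One possible set of answers for the age exercise can be sketched as follows. The specific choices (integer type, default of 18, range 0-120, no nulls) are illustrative assumptions, not prescribed by the slides:

```python
# Sketch of field-level integrity controls for a hypothetical student
# "age" attribute: default value, null value, and range controls.

def validate_age(value, default=18, low=0, high=120, allow_null=False):
    """Apply default-value, null-value, and range controls to one value."""
    if value is None:
        if allow_null:
            return None
        return default          # default value control
    if not (low <= value <= high):
        # range control: reject illegal values outright
        raise ValueError(f"age {value} outside range {low}-{high}")
    return value

print(validate_age(None))   # 18 (default applied)
print(validate_age(21))     # 21
```

For a different student population (e.g., a degree program admitting only adults), the same structure applies with different bounds, which is the point of the exercise.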

12
Attribute groupings
  • Physical Record: A group of fields stored in
    adjacent memory locations and retrieved together
    as a unit.
  • Page: The amount of data read or written in one
    I/O operation.
  • Blocking Factor: The number of physical records
    per page.
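The relationship between these terms is simple arithmetic; a quick sketch with made-up example sizes:

```python
# Blocking factor = physical records per page.
# Page and record sizes below are illustrative example values.

page_size = 4096        # bytes transferred in one I/O operation
record_size = 512       # bytes per physical record

blocking_factor = page_size // record_size
print(blocking_factor)  # 8 records retrieved per disk access
```

A larger blocking factor means more records arrive per I/O, which is why grouping frequently co-accessed attributes into one physical record reduces disk accesses.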

13
Database Access Model
The goal in structuring physical records is to
minimize performance bottlenecks resulting from
disk accesses (accessing data from disk is slow
compared to main memory)
14
Attribute grouping: Denormalization
  • Process of transforming normalized relations into
    denormalized physical record specifications
  • may partition a relation into more than one
    physical record
  • may combine attributes from different relations
    into one physical record

15
Denormalization
  • Involves a trade-off
  • Reduced disk accesses and greater performance
    (due, for example, to fewer table joins)
  • - But -
  • Introduction of anomalies (and thus redundancies)
    that will necessitate extra data maintenance
  • increase chance of errors and force reprogramming
    when business rules change
  • may optimize certain tasks at the expense of
    others (if activities change, benefits may no
    longer exist)

16
Denormalization opportunities
  • 1:1 relationship
  • M:M associative entity with non-key attributes
  • reference data

17
A possible denormalization situation: reference
data
18
More denormalization options
  • Horizontal Partitioning: Distributing the rows of
    a table into several separate files.
  • Vertical Partitioning: Distributing the columns
    of a table into several separate files.
  • The primary key must be repeated in each file.
  • Combination of both
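Both options can be sketched on a small in-memory table. The table and column names are illustrative assumptions; the key point is which axis gets split and that the primary key is repeated in every vertical partition:

```python
# Sketch of horizontal vs. vertical partitioning of a table held as a
# list of row dictionaries.

customers = [
    {"id": 1, "region": "East", "name": "Ann", "balance": 100},
    {"id": 2, "region": "West", "name": "Bob", "balance": 250},
]

# Horizontal partitioning: distribute ROWS into separate files by region.
east = [r for r in customers if r["region"] == "East"]
west = [r for r in customers if r["region"] == "West"]

# Vertical partitioning: distribute COLUMNS into separate files;
# the primary key ("id") must be repeated in each partition so the
# original rows can be reassembled by joining on it.
names    = [{"id": r["id"], "name": r["name"]} for r in customers]
balances = [{"id": r["id"], "balance": r["balance"]} for r in customers]

print(len(east), len(west))   # 1 1
print(names[0])               # {'id': 1, 'name': 'Ann'}
```

Queries confined to one partition touch less data; queries spanning partitions pay a reassembly cost, which is the trade-off listed on the next slide.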

19
Partitioning
  • Advantages of Partitioning
  • Records used together are grouped together
  • Each partition can be optimized for performance
  • Security and recovery
  • Partitions stored on different disks: less
    contention
  • Parallel processing capability
  • Disadvantages of Partitioning
  • Slower retrievals when across partitions
  • Complexity for application programmers
  • Anomalies and extra storage space requirements
    due to duplication of data across partitions

20
How about this...
  • Consider the following normalized relations
  • STORE(Store_Id, Region, Manager_Id, Square_Feet)
  • EMPLOYEE(Emp_Id, Store_Id, Name, Address)
  • DEPARTMENT(Dept, Store_ID, Manager_Id,
    Sales_Goal)
  • SCHEDULE(Dept, Emp_Id, Date, hours)
  • What opportunities might exist for
    denormalization?

21
Physical Files
  • Physical File: A file as stored on disk
  • Constructs to link two pieces of data
  • Sequential storage
  • Pointers
  • File Organization: How the files are arranged on
    the disk (more on this later)
  • Access Method: How the data can be retrieved
    based on the file organization
  • Relative - data accessed as an offset from the
    most recently referenced point in secondary
    memory
  • Direct - data accessed as a result of a
    calculation to generate the beginning address of
    a record

22
File Organizations
  • A technique for physically arranging the records
    of a file on secondary storage devices.
  • Goals in selecting (trade-offs exist, of course):
  • Fast data retrieval
  • High throughput for input and maintenance
  • Efficient use of storage space
  • Protection from failures or data loss
  • Minimal need for reorganization
  • Accommodation for growth
  • Security from unauthorized use

23
File Organizations
  • Sequential
  • Indexed
  • Indexed Sequential
  • Indexed Nonsequential
  • Hashed (also called Direct)
  • See Table 6-3 for comparison

24
Sequential File Organization
  • Records of the file are stored in sequence by the
    primary key field values

25
Comparisons of file organizations (a) Sequential
26
Sequential Retrieval
  • Consider a file of 10,000 records each occupying
    1 page
  • Queries that require processing all records will
    require 10,000 accesses
  • e.g., Find all items of type 'E'
  • Many disk accesses are wasted if few records meet
    the condition
  • However, very effective if most or all records
    will be accessed (e.g., payroll)

27
Indexed File Organization
  • Index concept is like index in a book
  • Indexed-sequential file organization: The
    records are stored sequentially by primary key
    values and there is an index built on the primary
    key field (and possibly indexes built on other
    fields, also)

28
(b) Indexed
29
Hashed File Organization
  • Hashing Algorithm: Converts a primary key value
    into a record address
  • Division-remainder method is a common hashing
    algorithm (more to come on this)

30
(c) Hashed
31
Hashing
  • A technique for reducing disk accesses for direct
    access
  • Avoids an index
  • Number of accesses per record can be close to one
  • The hash field is converted to a hash address by
    a hash function

32
Hashing
33
Shortcomings of Hashing
  • Different hash field values may convert to the
    same hash address
  • these are called Synonyms
  • store the colliding record in an overflow area
  • Long synonym chains degrade performance
  • There can be only one hash field per record
  • The file can no longer be processed sequentially
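The division-remainder method and the overflow-area idea from the previous bullets can be sketched together. This is a deliberately tiny model (7 buckets, one record per bucket); real DBMS implementations chain overflow records by pointer:

```python
# Sketch of division-remainder hashing with an overflow area for
# synonyms (different keys that hash to the same address).

N_BUCKETS = 7

def hash_address(key):
    """Division-remainder: the record address is key mod N_BUCKETS."""
    return key % N_BUCKETS

buckets = [None] * N_BUCKETS     # primary area: one record per address
overflow = []                    # colliding records go to the overflow area

def store(key):
    addr = hash_address(key)
    if buckets[addr] is None:
        buckets[addr] = key
    else:
        overflow.append(key)     # synonym: same address as an earlier key

store(10)   # 10 % 7 == 3 -> stored in the primary area
store(17)   # 17 % 7 == 3 -> synonym, stored in the overflow area
print(buckets[3], overflow)      # 10 [17]
```

Every record that lands in the overflow area costs extra accesses to find, which is why long synonym chains degrade performance.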

34
Clustering
  • In some relational DBMS, related records from
    different tables that are often retrieved
    together can be stored close together on disk
  • Because the related records are stored close to
    one another on the physical disk, less time is
    needed to retrieve the data
  • E.g., Customer data and Order data may frequently
    be retrieved together
  • Can require substantial maintenance if the
    clustered data changes frequently

35
Indexing
  • An index is a table (file) used to determine the
    location of rows in another file that satisfy
    some condition

36
Querying with an Index
  • Read the index into memory
  • Search the index to find records meeting the
    condition
  • Access only those records containing required
    data
  • Disk accesses are substantially reduced when the
    query involves few records
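The three steps above can be sketched with an in-memory index; the record layout and the index-as-dictionary representation are illustrative assumptions:

```python
# Sketch of querying via an index: build an index on "type", then
# access only the records the index points at.

records = [                       # the data file; assume one record per page
    {"id": 1, "type": "E"},
    {"id": 2, "type": "F"},
    {"id": 3, "type": "E"},
]

# Index on "type": field value -> positions of matching records.
index = {}
for pos, rec in enumerate(records):
    index.setdefault(rec["type"], []).append(pos)

# Query: find all items of type 'E'.
# Only the 2 matching records are accessed instead of scanning all 3.
hits = [records[pos] for pos in index.get("E", [])]
print([r["id"] for r in hits])   # [1, 3]
```

With 10,000 records and a handful of matches, the same pattern replaces 10,000 sequential accesses with a few index reads plus one access per hit.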

37
Maintaining an Index
  • Adding a record requires at least two disk
    accesses
  • Update the file
  • Update the index
  • Trade-off
  • Faster queries
  • Slower maintenance (additions, deletions, and
    updates of records)
  • Thus, more static databases benefit more overall

38
Rules of Thumb for Using Indexes
  • 1. Indexes are most useful on larger tables
  • 2. Index the primary key of each table (may be
    automatic, as in Access)
  • 3. Indexes are useful on search fields (WHERE)
  • 4. Indexes are also useful on fields used for
    sorting (ORDER BY) and categorizing (GROUP BY)
  • 5. Most useful to index on a field when there are
    many different values for that field

39
Rules of Thumb for Using Indexes
  • 6. Find out the limits placed on indexing by your
    DBMS (Access allows 32 indexes per table, and no
    index may contain more than 10 fields)
  • 7. Depending on the DBMS, null values may not be
    referenced from an index (thus, rows with a null
    value in the field that is indexed may not be
    found by a search using the index)

40
Rules for Adding Derived Columns
  • Use when aggregate values are regularly
    retrieved.
  • Use when aggregate values are costly to
    calculate.
  • Permit updating only of source data.
  • Create triggers to cascade changes from source
    data.

41
One Other Rule of Thumb for Increasing Performance
  • Consider contriving a shorter field or selecting
    another candidate key to substitute for a long,
    multi-field primary key (and all associated
    foreign keys)

42
Query Optimizer Factors
  • Type of Query
  • Highly selective.
  • All or most of the records of a file.
  • Unique fields
  • Size of files
  • Indexes
  • Join Method
  • Nested-Loop
  • Merge-Scan (Both files must be ordered or indexed
    on the join columns.)
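The two join methods named above can be contrasted on toy tables. The tables (stores and employees) and their contents are illustrative assumptions; the sketch sorts its inputs itself, standing in for the "ordered or indexed" requirement of merge-scan:

```python
# Sketch of nested-loop join vs. merge-scan join on the join column.

stores = [(1, "East"), (2, "West")]       # (store_id, region)
emps   = [(10, 1), (11, 1), (12, 2)]      # (emp_id, store_id)

# Nested-loop: for each row of one table, scan the other table.
nested = [(e, s) for e in emps for s in stores if e[1] == s[0]]

# Merge-scan: advance cursors over both inputs sorted on the join key,
# so each input is read once instead of once per outer row.
def merge_join(left, right, lkey, rkey):
    left = sorted(left, key=lkey)
    right = sorted(right, key=rkey)
    out, j = [], 0
    for l in left:
        while j < len(right) and rkey(right[j]) < lkey(l):
            j += 1
        k = j
        while k < len(right) and rkey(right[k]) == lkey(l):
            out.append((l, right[k]))
            k += 1
    return out

merged = merge_join(emps, stores, lambda e: e[1], lambda s: s[0])
print(len(nested), len(merged))   # 3 3
```

Both produce the same pairs; the optimizer's choice between them depends on file sizes and whether sorted order or indexes already exist.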

43
More practice
  • Draw a composite usage map for the following
  • PERSON(person_ID, name, address, DOB)
  • PATIENT(PA_person_ID, Contact)
  • PHYSICIAN(PH_person_ID, specialty)
  • PERFORMANCE(PA_person_ID, PH_person_ID,
    Treatment, Treatment_date, Treatment_time)
  • CONSUMPTION(PA_person_ID, Item, Date, Quantity)
  • ITEM(Item, Description)
  • Make recommendations about denormalizing,
    partitioning, file organization, and indexing

44
RAID
  • Redundant Arrays of Inexpensive Disks
  • Exploits economies of scale of disk manufacturing
    for the computer market
  • Can give greater security
  • Increases fault tolerance of systems
  • Not a replacement for regular backup

45
RAID
  • The operating system sees a set of physical
    drives as one logical drive
  • Data are distributed across physical drives
  • All levels, except 0, have data redundancy or
    error-correction features
  • Parity codes or redundant data are used for data
    recovery

46
Mirroring
  • Write
  • Identical copies of file are written to each
    drive in array
  • Read
  • Alternate pages are read simultaneously from each
    drive
  • Pages put together in memory
  • Access time is reduced by a factor of roughly
    the number of disks in the array
  • Read error
  • Read required page from another drive
  • Tradeoffs
  • Provides data security
  • Reduces access time
  • Uses more disk space

47
Mirroring
[Diagram: each drive in the array holds a complete
copy of the data set; no parity]
48
Striping
  • Three drive model
  • Write
  • Half of file to first drive
  • Half of file to second drive
  • Parity bit to third drive
  • Read
  • Portions from each drive are put together in
    memory
  • Read error
  • Lost bits are reconstructed from the third
    drive's parity data
  • Tradeoffs
  • Provides data security
  • Uses less storage space than mirroring
  • Not as fast as mirroring
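One common realization of the parity idea in the three-drive model is bytewise XOR; a minimal sketch (the drive contents are made-up bytes, and real RAID stripes at the block level):

```python
# Sketch of parity-based reconstruction: the parity drive holds the
# XOR of the two data drives, so a failed drive's bytes can be rebuilt
# from the surviving data drive plus the parity drive.

drive1 = bytes([0x41, 0x42])            # first half of the data set
drive2 = bytes([0x43, 0x44])            # second half of the data set
parity = bytes(a ^ b for a, b in zip(drive1, drive2))   # written to drive 3

# Read error on drive2: reconstruct it from drive1 and the parity data.
rebuilt = bytes(a ^ p for a, p in zip(drive1, parity))
print(rebuilt == drive2)                # True
```

This is why striping needs only one extra drive's worth of storage for redundancy, versus a full duplicate copy under mirroring.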

49
Striping
[Diagram: each of two data drives holds one half of
the data set; a third drive holds parity codes]
50
Database Architectures
  • Hierarchical
  • Network
  • Relational
  • Object-oriented
  • Multidimensional

51
Hierarchical models
  • Type of logical database model in which data is
    organized in a tree structure.
  • Records are divided into segments which are
    linked using pointers.
  • Parent and child data are stored together.

52
Sample hierarchical model
[Diagram: Employee as the root/parent segment, with
first- and second-level child segments including Job
Assignments, Compensation, Benefits, Ratings, Salary,
Pension, Insurance, and Health]
53
Network model
  • Extends the hierarchical model by allowing a
    child to have zero, one, or many parents
  • No real theoretical base
  • Never widely used

54
Comparing the alternatives
OODBMSs are marketed as easy to use, but the
transition from an RDBMS is difficult