Title: Normalization of Database Tables
1Lesson 5
- Normalization of Database Tables
2In this lesson, you will learn
- What normalization is and what role it plays in
the database design process - About the normal forms 1NF, 2NF, 3NF
- Transforming from lower normal forms to higher
normal forms - What and when to perform Denormlization
3Database Tables and Normalization
- Normalization
- Process for evaluating and correcting table
structures to minimize data redundancies - helps eliminate data anomalies
- Works through a series of stages called normal
forms - Normal form (1NF)
- Second normal form (2NF)
- Third normal form (3NF)
4Database Tables and Normalization (continued)
- 2NF is better than 1NF 3NF is better than 2NF
- For most business database design purposes, 3NF
is highest we need to go in the normalization
process - Highest level of normalization is not always most
desirable (sometime lower level needed, more
redundancy, less join faster performance)
5The Need for Normalization
- Example company that manages building projects
- Charges its clients by billing hours spent on
each contract - Hourly billing rate is dependent on employees
position - Periodically, a report is generated that contains
information displayed in Table 5.1
6A Sample Report Layout
7A Table in the Report Format
8The Need for Normalization (continued)
- Structure of data set in Figure 5.1 does not
handle data very well - The table structure appears to work report is
generated with ease - Unfortunately, the report may yield different
results, depending on what data anomaly has
occurred
9Data Anomalies
- Update
- modifying an attribute/value of an entity,
require many alterations (where that entity
exist) - eg)) changing EMP_NAME of an employee, changing
JOB_CLASS of employee EMP_NUM 15 - Insertion
- adding a new record, all attributes must be added
or dummy value assigned - eg)) new employee added, not yet assigned to any
project, need to assigned to dummy project record - Deletion
- losing of vital data due to deletion of an entity
- eg)) if emp_num 103 quits, deleting from employee
record, need to assign new employee to project
that emp 103 work on (otherwise lost of data,
causes effects to the charge)
10Conversion to First Normal Form
- Relational table must not contain repeating
groups - Repeating group
- when a group of multiple (related) entries can
exist for any single key attribute occurrence(a
field of a record referring to more than one
occurrence in a record) - 3 steps procedure
- Step 1 Eliminate the Repeating Groups
- Step 2 Identify Primary key
- Step 3 Identify all Dependencies
11Step 1 Eliminate the Repeating Groups
- Present data in a tabular format, where each cell
has a single value and there are no repeating
groups - Eliminate repeating groups by eliminating nulls,
making sure that each repeating group attribute
contains an appropriate data value
12Data Organization First Normal Form
To normalized diagram
To 1NF summary
To Dependency
To 3NF
13Step 2 Identify Primary key
- Primary key/Prime attribute? attribute that
uniquely identify a record - Old key of PROJ_NUM not unique
- New key must be composed
- What should be the key?
PROJ_NUM EMP_NUM
14Step 3 Identify all Dependencies
- Dependency diagram
- Depicts all dependencies found within a given
table structure - Providing birds-eye view of all relationships
among a tables attributes - Partial Dependency
- when an attribute depend on only part of the
primary key - exist only for table with composite key
- eg)) PROJ_NAME depend only on PROJ_NUM
- Transitive Dependency
- dependency of nonprime attribute on another
nonprime attribute - eg)) CHG_HOUR depend on JOB_CLASS
Back to PROJECT table
15A Dependency Diagram First Normal Form (1NF)
16A Dependency Diagram First Normal Form (1NF)
171NF
- Tabular format in which
- All key attributes are defined
- There are no repeating groups in the table
- All attributes are dependent on primary key
- All relational tables satisfy 1NF requirements
Back to PROJECT table
18Conversion to Second Normal Form
- Table is in second normal form (2NF) if
- It is in 1NF and
- It includes no partial dependencies
- No attribute is dependent on only a portion of
the primary key - NOTE a table with only one attribute as primary
key is already in 2NF
19Conversion to Second Normal Form
- 2 steps
- Step 1 Identify All Key Components and Separate
into different tables - Step 2 Identify the Dependent Attributes and
remove partial dependency
20Step 1 Identify All Key Components
- Write each key component on separate line, and
then write the original (composite) key on the
last line - Each component will become the key in a new table
- PROJ_NUM
- EMP_NUM
- PROJ_NUM, EMP_NUM
Back to Dependency Diagram
21Step 2 Identify the Dependent Attributes
- Remove partial dependency
- PROJECT (PROJ_NUM, PROJ_NAME)
- EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR)
- ASSIGNMENT (PROJ_NUM, EMP_NUM, ASSIGN_HOUR)
To 3 NF
22Second Normal Form (2NF) Conversion Results
23Conversion to Third Normal Form
- A table is in third normal form (3NF) if
- It is in 2NF and
- It contains no transitive dependencies
- Transitive Dependency
- dependency of nonprime attribute on another
nonprime attribute
24Conversion to Third Normal Form
- 3 Steps
- Step 1 Identify Each New Determinant (transitive
dependency) - Step 2 Identify the Dependent Attributes
- Step 3 Remove the Dependent Attributes from
Transitive Dependencies
25Step 1 Identify Each New Determinant
- For every transitive dependency, write its
determinant as a PK for a new table - Determinant
- Any attribute whose value determines other values
within a row - Eg)) CHG_HOUR depends on JOB_CLASS
- JOB_CLASS ? CHG_HOUR
- JOB_CLASS
26Step 2 Identify the Dependent Attributes
- List the attributes dependent on each determinant
identified in Step 1 and identify the dependency - Name the table to reflect its contents and
function - JOB(JOB_CLASS,CHG_HOUR)
Back to PROJECT table
27Step 3 Remove the Dependent Attributes from
Transitive Dependencies
- Eliminate all dependent attributes in transitive
relationship(s) from each table - Draw a new dependency diagram to show all tables
defined in Steps 13 - JOB (JOB_CLASS, CHG_HOUR)
- PROJECT (PROJ_NUM, PROJ_NAME)
- EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS)
- ASSIGNMENT (PROJ_NUM, EMP_NUM, ASSIGN_HOUR)
Back to 2 NF
28Third Normal Form (3NF) Conversion Results
EMPLOYEE table
Back to PROJECT table (Compare attribute name)
ASSIGN table
29Improving the Design (continued)
- the following changes were made
- PK assignment (Add a new attribute, JOB_CODE as
PK) - Naming conventions (change attribute name to
match with table) - Adding attributes PROJECT (PROJ_NUM,PROJ_NAME,EMP_
NUM) as project leader - Adding relationships (new relationship affected
by adding EMP_NUM to PROJECT table) - Attribute atomicity (split necessary composite
attribute, EMP_NAME ? EMP_FNAME, EMP_LNAME)
PROJECT JOB EMPLOYEE ASSIGNMENT
30Improving the Design (continued)
- the following changes were made
- Refining PKs (Add a new attribute, ASSIGN_NUM as
PK) - Maintaining historical accuracy (add a new
attribute, ASSIGN_CHG_HOUR, if change of CHG_HOUR
historical record not affected) - Using derived attributes (add a new attribute,
TOTAL_ASSIGN) in ASSIGN table, even though a
derived attribute should not be stored but this
will help in faster report generation time)
PROJECT JOB EMPLOYEE ASSIGNMENT
31The Completed Database
Improvements
32Back normalized diagram
Improvements
33Improvements
Back normalized diagram
New PK
34Limitations on System-Assigned Keys
- System-assigned primary key may not prevent
confusing entries - Data entries in Table 5.2 are inappropriate
because they duplicate existing records - Yet there has been no violation of either entity
integrity or referential integrity - Solved by making Index Key
35Duplicate Entries in the JOB Table
36Normalization and Database Design
- Normalization should be part of design process
- Make sure that proposed entities meet required
normal form before table structures are created - Database design process (like a loop) redesign
and modification occur
37Normalization and Database Design (continued)
- ER diagram
- Provides the big picture, or macro view, of an
organizations data requirements and operations - Created through an iterative process
- Identifying relevant entities, their attributes
and their relationship - Use results to identify additional entities and
attributes
38Normalization and Database Design
- Normalization procedures
- Focus on the characteristics of specific entities
- A micro view of the entities within the ER
diagram - Difficult to separate normalization process from
ER modeling process - Two techniques should be used concurrently
39Overall Review for Contract Company
- company manage many projects
- each project requires many employees
- each employee may be assigned to many projects
- some employees are not assigned to a project
- each employee has only one major job
classification, which determine the hourly
billing rate - many employee can have the same job
classification
What are entities?
PROJECT, EMPLOYEE, JOB
40Overall Review for Contract Company
PROJECT requires MN EMPLOYEE PROJECT
requires 1M ASSIGNMENT EMPLOYEE enters 1M AS
SIGNMENT JOB assigns 1M EMPLOYEE
PROJECT(PROJ_NUM, PROJ_NAME) EMPLOYEE(EMP_NUM,
EMP_LNAME, EMP_FNAME, EMP_INITIAL,
JOB_DESCRIPTION, JOB_CHG_HOUR)
41The Initial ERD for a Contracting Company
1NF? 2NF? (any partial) 3NF? (any transitive)
42The Modified ERD for a Contracting Company
43The Incorrect Representation of a MN
Relationship
44The Correct Representation of a MN Relationship
45The Final (Implementable) ERD for a Contracting
Company
Is ASSIGNMENT a weak entity?
NO. ASSIGNMENT is not weak entity because it does
not inherit PK from EMPLOYEE and PROJECT to be
used as its PK but only used as FK
46Denormalization
- normalization ? decomposed to many tables
- many join between tables
- many disk I/O operations and processing, reduce
speed - therefore some may need denormalization to
increase processing speed - speed must be balanced against data anomalies
EG)) customer address (ZIP_CODE,CITY)