Title: Normalization
1Normalization
- enables the analysis of the design of a
relational database - gives a method for identifying the existence of
potential problems in the design - involves various types of normal forms
2First, Second, and Third Normal Forms
- 1NF - Boyce-Codd (1972)
- 2NF - Boyce-Codd (1972)
- 3NF - Boyce-Codd (1972)
- BCNF - Boyce-Codd (1974)
3Repeating Groups
SOFTWARE (PACKID, TAGNUM, INSTDATE, SOFTCOST)
4First Normal Form
- definition - no repeating groups
- general solution - expand the primary key to
include multiple attributes - original primary key
- the attribute that differentiates the repeating
group occurrences
5Example of 1NF
FD PACKID --gt PACKNAMETAGNUM --gt COMPID PACKID,
TAGNUM --gt PACKNAME, COMPID, INSTDATE, SOFTCOST
6Problems with 1NF
- update anomalies
- update
- inconsistent data
- insertions
- deletions
- occur because we have attributes that are
dependent on only a portion of the primary key - non-key attribute
- an attribute that is not part of the primary key
7Second Normal Form
- already 1NF
- no non-key attribute is dependent on only a
portion of the primary key. - no partial dependencies (dependencies on less
than the primary key) - solution - partial dependencies are pulled out of
the table and placed in a table that reflects
their dependency
8Example Solution for 2NF
- Step 1 - List the primary key and all possible
subsets as new relations - (PACKID,
- (TAGNUM,
- (PACKID, TAGNUM,
9Example Solution for 2NF
- Step 2 - Place each attribute with the minimal
primary key on which it is dependent - SOFTWARE (PACKID, PACKNAME)
- PC (TAGNUM, COMPID)
- PACKAGE (PACKID, TAGNUM, INSTDATE, SOFTCOST)
10Another Sample of 2NF
- PC (TAGNUM, COMPID, EMPNUM, EMPNAME, LOCATION)
- FD
- TAGNUM --gt COMPID, EMPNUM, EMPNAME, LOCATION
- EMPNUM --gt EMPNAME
11Table for Sample of 2NF
12Problems with 2NF
- update anomalies
- update
- inconsistent data
- insertions
- deletions
- problems occur because EMPNUM, a non-primary key,
determines EMPNAME - since EMPNUM is not the primary key it can appear
in multiple rows as can the EMPNAME
13Problems with 2NF
- determinant - any attribute or collection of
attributes that determines another attribute
14Third Normal Form
- Definition - any attribute (or collection of
attributes) that determines anouther attribute is
called a determinant. - Definition - a relation is in third normal form
(3NF) if it is in second normal form and if the
only determinants it contains are candidate keys.
15Sample Table in 2NF
PC (TAGNUM, COMPID, EMPNUM, EMPNAME, LOCATION)
16Functional Dependencies for the Table
- TAGNUM --gt COMPID, EMPNUM, EMPNAME, LOCATION
- EMPNUM --gt EMPNAME
17Dependency Diagram
18General Solution to Decompose from 2NF to 3NF
- for each determinant that is not a candidate key,
remove the attributes that depend on the
determinant. - create a new relation containing all the
attributes that depend on the determinant. - make the determinant the primary key for the new
relation. - (Very similar to decomposition from 1NF to 2NF)
19Solution for our 2NF Table
- Two tables
- PC (TAGNUM, COMPID, EMPNUM, LOCATION)
- EMPLOYEE (EMPNUM, EMPNAME)
202 Tables
EMPLOYEE
21Incorrect Decompositions
- If the original 2NF relation is decomposed
incorrectly, the problems will still exist with
redundant data and update anomalies - never split a functional dependence across 2
relations - always preserve the original information
22Application to Database Design
- 3NF is all that is needed
- instances exist where it is unwise to put a
relation into 3NF - splitting of relations requires the use of
interrelational constraints