Title: Introduction to Normalization
1Introduction to Normalization
- CPSC 356 Database
- Ellen Walker
- Hiram College
2Building a Schema
- Start with a list of all attributes, considered
as if you had a giant flat database (one
relation) with all possible information in one
place - Divide the attributes into multiple relations
- Intuitively
- According to formal rules (normalization)
- This is a formalized alternative to the
algorithms we learned before
3What Makes A Good Schema?
- Each relation should have clear semantics, i.e.
can be easily described in a few words - Try to avoid redundancy (to minimize storage
space, but also to avoid anomalies) - Avoid a design that encourages too many NULL
values in a relation. NULL can be ambiguous N/A
vs. unknown vs. not-yet-entered, etc. - Dont split related attributes so that the
relationship between them is lost (e.g. make sure
LastName and UserID are both in the same relation)
4Tracking Real Estate Staff
- Consider a single relation for real estate
- It contains branch name, branch number, staff
name, staff number, staff salary, etc. - One entry for each staff member of each branch
- Branch information is repeated for different
staff (REDUNDANCY!) - Staff information is repeated if they work in
multiple branches (REDUNDANCY!) - This is an example of what NOT to do
5Redundancy-caused Anomalies
- Insertion Anomalies
- A branch with no staff has many NULLs
- Entering a new staff member has NULL branch info
- But branch number and staff number are both part
of primary key! (Why) - Deletion Anomalies
- When the last staff member at a branch is
deleted, the branch info is lost - Update Anomalies
- If we make a change in branch info once, it must
be changed in all copies (for all staff).
6Solving Redundancy Problems
- Decompose the relation into multiple relations
- Use Foreign Keys so the complete relation can be
reconstructed through a join - Branch has branch number branch info
- Staff has staff number, staff info branch
number as foreign key - Foreign Keys are exactly the attributes that are
in the primary key of the other relation - Insertion, deletion update anomalies are gone!
- Consider add branch with no staff, remove last
staff member, update branch info
7How to Decompose?
- Decompositions are not (always) intuitively
obvious - Codd discovered mathematical properties (called
Normal Forms) that describe goodness of
decomposition - First, Second, Third normal forms decrease
redundancy without loss of information - BCNF, Fourth and Fifth normal forms potentially
introduce information loss (we will see) - To understand normal forms, start with functional
dependencies
8Functional Dependencies
- If A and B are attributes, and every value of A
is associated with exactly one value of B (so
knowing A predicts B), then B is functionally
dependent on A (We write this as A-gtB) - Functional dependency is based on the semantics
(meaning) of the attributes. - A-gtB and B-gtA are two different constraints
- Email -gt first name is a valid dependency
- First name -gt Email is not a valid dependency
9Examples of Functional Dependencies
- US Zip Code -gt State
- US Area Code -gt State
- Email -gt Firstname, lastname
- HotelNo, RoomNo -gt Price
- JobTitle, ServiceLength -gt Salary
10What are the dependencies?
- item place customer-name
- ring Kay jewelers prince charming
- ring walmart miss piggy
- Place -gt item?
- Item -gt place?
11Finding Dependencies in Data
- If a value of attribute A is associated with two
or more values of B, then it is not true that
A-gtB. - If a value of attribute A is associated with
exactly one value of B, then it might be true
that A-gtB. - Only when every possible value of attribute A is
associated with exactly one value of B is it true
that A-gtB.
12Characteristics of Functional Dependencies for
Normalization
- For any given values of the attributes on the
left, there is exactly one possible attribute on
the right - No future data will ever invalidate the
dependency - Dependency is nontrivial -- no attributes from
the left are repeated on the right
13Keys Functional Dependency
- Remember, a candidate key is a subset of
attributes that is (guaranteed) unique for every
tuple - Therefore, a valid candidate key determines all
other attributes in the tuple - Therefore, there is a functional dependency from
the candidate key to all other non-key attributes
of the relation. - (Since the primary key is a candidate key, these
arguments can also be made for primary keys)
14Manipulating Functional Dependencies
- Given a set of dependencies, derive more
dependencies using inference rules - The closure X of a set of dependencies is the
set of all possible dependencies that can be
derived from it.
15Armstrongs Inference Rules for Manipulating
Dependencies
- if Y is a subset of X, then X -gt Y
- Alternatively X,Y -gt X (Reflexive)
- If X-gtY then X,Z-gtY,Z (Augmentation)
- If X-gtY and Y-gtZ then X-gtZ (Transitive)
16Additional Inference Rules
- A-gtA (Self-determination)
- If A-gtB,C then A-gtB and A-gtC (Decomposition)
- If A-gtB and A-gtC then A-gtB,C (Union)
- If A-gtB and C-gtD then A,C -gt B,D (Composition)
17When are two sets of FDs equivalent?
- When we can use inference rules to transform A to
B , then A and B are equivalent - Problem it might take a long time to find the
right set of inference rules - What we need is a standard form of FDs - then
we can just compare
18Finding the Closure
- F is a set of functional dependencies (e.g. the
obvious ones from primary keys) We want to find
X, which is the set of all attributes that are
dependent on X (based on F). - X X
- repeat
- for each dependency Y-gtZ in F do
- if Y is a subset of X then X X union Z
- until no more can be added to X
19Closure Example
- F is the following set of dependencies
- A-gtB,C C-gtD A,D -gt F
- What is A (all attributes that can be derived
from A)? - Initialize A A
- Because A-gtB,C add B,C to A
- Because C is in A and C-gtD, add D to A
- Because A and D are in A, add F to A
- Therefore A is A,B,C,D,F
20Equivalence Test
- Are the following sets of FDs equivalent?
- AB-gtC, D-gtE, AE-gtG, GD-gtH, ID-gtJ
- ABD-gtC, ABE-gtG, GD-gtEH, IE-gtJ
- Compute closures for each, if any two are
different, they are not equivalent - You will need to consider every left side
21Finding a Key
- Given a relation with attributes ABCDEFGHIJ and
the following FDs, find a candidate key for the
relation - AB-gtC, D-gtE, AE-gtG, GD-gtH, ID-gtJ
- A candidate key is a subset of attributes that
has the entire set of attributes as its closure. - Lets try ABD
22What is Normalization?
- Formal technique for analyzing relations based on
primary key (or candidate keys) and functional
dependencies - Series of tests (normal forms), each of which is
harder to pass - Normal forms 1NF, 2NF, 3NF, BCNF depend on
functional dependencies - Higher forms (4NF, 5NF) based on other
dependencies - To avoid update anomalies without loss, normalize
to 3NF.