Title: Relational database
1Lecture 3 on Data Normalization
Study the methods of first, second, third,
Boyce-Codd, fourth and fifth normal form for
relational database design, in order to eliminate
data redundancy and update abnormality.
2Normalization TheoryRefine database design
to eliminate abnormalities (irregularities) of
manipulating database
31NF, 2NF and 3NF
- Built around the concept of normal forms
- Normal form Contains atomic values only
- All normalized relations are in 1NF
- 2NF is the subset of 1NF, 3NF is the subset of
2NF and so on - 3NF is more desirable than 2NF, 2NF is more
desirable than 1NF
4BCNF, 4NF and 5NF(PJNF)
- Boyce-Codd Normal Form
- A stronger form of 3NF
- Every BCNF is also 3NF, but some 3NF are not BCNF
- 4NF and 5NF
- Defined recently
- Deal with multi-valued dependency (MVD) and join
dependency (JD)
5Relationship between Normal Forms
Universe of relations
1NF relations
2NF relations
3NF relations
BCNF relations
4NF relations
5NF/PJNF relations
6First Normal Form
- A relation is in 1NF if each attribute contains
only one value (not a set of values) - The primary key (PK) can not be null
7First Normal Form
S S-name Enrollments
S1 Brown C1 Math
S1 Brown C2 Chem
S1 Brown C3 Phys
S2 Smith C2 Chem
S2 Smith C3 Phys
S2 Smith C4 Math
S3 Brown C2 Chem
S3 Brown C3 Phys
Is this relation in 1NF?
Relation STUDENT-A
8First Normal Form
S S-name Enrollments
S1 Brown C1 Math
S1 Brown C2 Chem
S1 Brown C3 Phys
S2 Smith C2 Chem
S2 Smith C3 Phys
S2 Smith C4 Math
S3 Brown C2 Chem
S3 Brown C3 Phys
- NO!!!
- Elements in the domain Enrollments are not atomic
- Could be split into two domains C and C-Name
Relation STUDENT-B
9First Normal Form
- Enrollments is split into C and C-Name
- Use S and C as a compound PK
- A student may attend several courses and a course
may have several students - So S and C has a mn mapping
S S-Name C C-Name
S1 Brown C1 Math
S1 Brown C2 Chem
S1 Brown C3 Phys
S2 Smith C2 Chem
S2 Smith C3 Phys
S2 Smith C4 Math
S3 Brown C2 Chem
S3 Brown C3 Phys
Relation STUDENT-B
10Functional Dependency (FD)
- Attribute Y of relation R is functionally
dependent on attribute X of R ? each value of X
is associated with exactly one value of Y - Denoted by X ? Y
- In the relation STUDENT-B
- S ? S-Name
- C ? C-Name
- S, C ? 0
11Anomalies using 1NF
- 1NF relations require less complicated
application to operate as opposed to unnormalized
relations - Anomalies in insert
- Since PK is composed of C and S, both details
of student and course must be known before
inserting a entry - Eg to add a course, at least one student is
enrolled
12Anomalies using 1NF
- Anomalies in delete
- If all students attending a particular course are
deleted, the course will not be found in the
database - Anomalies in update
- Redundancy of S-Name and C-Name
- Increase storage space and effort to modify data
item - If a course is modified, all tuples containing
that course must be updated
13Second Normal Form
- A relation is in 2NF if it is in 1NF and every
non-PK attribute is fully functionally dependant
on the PK - In the relation STUDENT-B
- PK C, S
- Non-PK attribute C-Name, S-Name
- C, S ? S-Name
- S ? S-Name
- Since S-Name is only partially dependent on the
PK, relation Student-B is not in 2NF
14Second Normal Form
S S-Name
S1 Brown
S2 Smith
S3 Brown
C C-Name
C1 Math
C2 Chem
C3 Phys
C4 Math
S C
S1 C1
S1 C2
S1 C3
S2 C2
S2 C3
S2 C4
S3 C2
S3 C3
Relation SC
Relation STUDENT
Relation COURSE
- All of them are in 2NF as none of them has
partial dependency - Original information can be reconstructed by
natural join operation
15Anomalies in 2NF
- Suppose we have the relations PRODUCT, MACHINE
and EMPLOYEE - P ? M
- P ? E
- M ? E
- The tuple (P1, M1, E1) means product P1 is
manufactured on machine M1 which is operated by
employee E1
16Anomalies in 2NF
- Anomalies in insert
- It is not possible to store the fact that which
machine is operated by which employee without
knowing at least one product produced by this
machine - Anomalies in delete
- If an employee is fired the fact that which
machine he operated and what product that machine
produced are also lost
17Anomalies in 2NF
- Anomalies in update
- If one employee is assigned to operate another
machine then several tuples have to be updated as
well
18Third Normal Form
- A relation is in 3NF if it is in 2NF and no
non-PK attributes is transitively dependent on
the PK - In the manufacture relations
- P ? M and M ? E implies P ? E
- So P ? E is a transitive dependency
19Third Normal Form
P M E
P1 M1 E1
P2 M2 E3
P3 M1 E1
P4 M1 E1
P5 M3 E2
P6 M4 E1
P M
P1 M1
P2 M2
P3 M1
P4 M1
P5 M3
P6 M4
M E
M1 E1
M2 E3
M3 E2
M4 E1
- No loss of information
- Insert, delete and update anomalies are eliminated
R2
R1
MANUFACTURE
20Boyce/Codd Normal Form
- A relation is BCNF ? every determinant is a
candidate key - A determinant is an attribute, possibly
composite, on which some other attribute is fully
functionally dependent
21Boyce/Codd Normal Form
S J T
Smith Math Prof. White
Smith Physics Prof. Green
Jones Math Prof. White
Jones Physics Prof. Brown
- For each subject (J), each student (S) of that
subject taught by only one teacher (T) FD S, J
? T - Each teacher (T) teaches only one subject (J)
FD T ? J - Each subject (J) is taught by several teacher
MVD J? ? T
Relation SJT
- There exists a relation SJT with attributes S
(student), J (subject) and T (teacher). The
meaning of SJT tuple is that the specified
student is taught the specified subject by the
specified teacher.
22Boyce/Codd Normal Form
- There are two determinants (S, J) and T in
functional dependency - Anomalies in update
- If the fact that Jones studies physics is
deleted, the fact that Professor Brown teaches
physics is also lost. It is because T is a
determinant but not a candidate key
23Boyce/Codd Normal Form
S J
Smith Math
Smith Physics
Jones Math
Jones Physics
T J
Prof. White Math
Prof. Green Physics
Prof. Brown Physics
Relation TJ
Relation ST
Relations (S, J) and (T, J) are in BCNF because
all determinants are candidate keys.
24Multi-valued Dependency
- Given a relation R with attributes A, B and C.
The multi-valued dependence R.A ?? R.B holds ?
the set of B-values matching a given (A-value,
C-value) pair in R depends only on the A-value
and is independent of the C-value
25Fourth Normal Form
- A relation is in 4NF ? whenever there exists an
multi-valued dependence (MVD), say A ?? B, then
all attributes are also functionally dependent on
A, i.e. A ? X for all attribute X of the relation
26Fourth Normal Form
Course Teacher Text
Physics Prof. Green Basic Mechanics
Physics Prof. Green Principles of Optics
Physics Prof. Brown Basic Mechanics
Physics Prof. Brown Principles of Optics
Physics Prof. Black Basic Mechanics
Physics Prof. Black Principles of Optics
Math Prof. White Modern Algebra
Math Prof. White Projective Geometry
Relation CTX (not in 4NF)
27Fourth Normal Form
- A tuple (C, T, X) appears in CTX ? course C can
be taught by teacher T and uses X as a reference.
For a given course, all possible combinations of
teacher and text appear that is, CTX satisfies
the constraint if tuples (C, T1, X1), (C, T2,
X2) both appears, then tuples (C, T1, X2), (C,
T2, X1) both appears also
28Fourth Normal Form
- CTX contains redundancy
- CTX is in BCNF as there are no other functional
determinants - But CTX is not in 4NF as it involves an MVD that
is not an FD at all, let alone an FD in which the
determinant is a candidate key
29Anomalies in insert
- For example, to add the information that the
physics course uses a new text called Advanced
Mechanism, it is necessary to create three new
tuples, one for each of the three teachers.
30Fourth Normal Form
Course Teacher
Physics Prof. Green
Physics Prof. Brown
Physics Prof. Black
Math Prof. White
Course Text
Physics Basic Mechanics
Physics Principles of Optics
Math Modern Algebra
Math Projective Geometry
Relation CT
Relation CX
- 4NF is an improvement over BCNF, in that it
eliminates another form of undesirable structure
31Fifth Normal Form
- Join dependency relation R satisfies the JD (X,
Y,Z) ? it is the join of its projections on X,
Y,Z where X, Y,Z are subsets of the set of
attributes of R - A relation is in 5NF/PJNF (Projection-join normal
form) ? every join dependency in R is implied by
the candidate keys of R - 5NF is the ultimate normal form with respect to
projection and join
32Fifth Normal Form
S P J
S1 P1 J2
S1 P2 J1
S2 P1 J1
S1 P1 J1
S P
S1 P1
S1 P2
S2 P1
J S
J2 S1
J1 S1
J1 S2
P J
P1 J2
P2 J1
P1 J1
SP
Join over P
JS
PJ
Relation SPJ
S P J
S1 P1 J2
S1 P1 J1
S1 P2 J1
S2 P1 J2
S2 P1 J1
Join over (J, S)
- SPJ is the join of all of its three projections,
- not of any two!
Spurious
33Join Dependence constraint
- Condition JD(join dependence) in relation R(S,
P, J) - Constraint if R1(S, P), R2(P, J) and R3(J,
S) exists - then R(S, P, J) exists
34Connection Trap
- Condition Without JD(join dependence) in
relation (S, P, J) - Connect trap if R1(S, P), R2(P, J) and
R3(J, S) exists - then R(S, P, J) may not exist and R1, R2 and
R3 may not be able to be connected
35Abnomalies in insert with JD
- If insert (S1, P1, J2), (S1, P2, J1), and
- (S2, P1, J1)
- Then (S1, P1, J1) must also be inserted
- On the other hand, if one of (S1, P1, J2), (S1,
P2, J1) and (S2, P1, J1) is deleted, then (S1,
P1, J1) must also be deleted.
36Fifth Normal Form (5NF)
JS
PJ
SP
S P
S1 P1
S1 P2
S2 P1
J S
J2 S1
J1 S1
J1 S2
P J
P1 J2
P2 J1
P1 J1
37Steps in normalization
Unnormalized form
- Decompose all data structures that are not 2D
into 2D relations of segments - Eliminate any partial dependency
- Eliminate any transitive dependency
- Eliminate any remaining FD in which determinant
is not a candidate key - Eliminate any MVD
- Eliminate any JD that are implied by candidate
keys
1NF
2NF
3NF
BCNF
4NF
5NF/PJNF
38Lecture Summary
- The 1NF, 2NF, 3NF, BCNF, 4NF and 5NF are to split
the unnormalized table into normalized table(s),
and which can eliminate data redundancy and
update abnormality. The higher norm form implies
the lower norm form.
39Review Question
- Explain the differences between Third Normal Form
and Boyce Codd Normal Form with respect to
functional dependencies. - Why Boyce Codd is called Strong third normal
form? - How can one normalize relations of Third Normal
Form into Boyce Codd Normal Form?
40Tutorial Question
- Describe and derive the unnormal, first, second
and third normal form for the following unnormal
form including 12 data fields with 4 of them are
in repeating groups in a table. Identify the
functional dependencies of each normal form.
41Reading Assignment
- Chapter 10 Functional Dependencies and
Normalization for Relational Databases and
Chapter 11 Relational Database Design Algorithms
and Further Dependencies of Fundamentals of
Database Systems fifth edition, by Elmasri
Navathe, Pearson, 2007.