Title: Normalization 1NF, 2NF, 3NF, BCNF
1Normalization 1NF, 2NF, 3NF, BCNF
2Outline
- Introduction
- Nonloss Decomposition and Functional Dependencies
- First, Second, and Third Normal Forms
- Dependency Preservation
- Boyce/Codd Normal Form
- A Note on Relation-Valued Attributes
3Normalization
- Normalized and 1 NF are the same thing typically
normalized refers incorrectly to 3NF - Normalization helps control redundancy
- Normalization is reversible i.e. nonloss, or
information preserving - Six normal forms are discussed 1 through 5, and
Boyce-Codd Normal Form (BCNF), which is an
improvement on 3NF
4Levels of Normalization
5Nonloss Decomposition and Functional Dependencies
6Overview
- Nonloss decomposition the normalization
procedure involves decomposing a given relvar
into other relvars, and moreover that the
decomposition is required to be reversible, so
that no information is lost in the process - Example
- Case a is an nonloss decomposition
- If you join SST and SC back together again, you
get back to S - Case b is a lossy decomposition
- Some additional spurious tuples will appear
7Sample value for relvar S and two corresponding
decompositions
8Nonloss Decomposition and Functional Dependencies
- Normalization uses a process of projection to
decompose relvars - Re-composition is a process of joins
- Heaths theorem Let RA, B, C be a relvar,
where A, B, and C are sets of attributes. If R
satisfies the FD A?B, then R is equal to the join
of its projection on A, B and A, C - The decomposition of relvar R into projections
R1Rn is nonloss if R the join of R1Rn
9First, Second, and Third Normal Forms
- Throughout this section, we assume for simplicity
that each relvar has exactly one candidate key,
which we further assume is the primary key
10Third Normal Form
- A relvar is in 3NF if and only if the nonkey
attributes are both mutually independent and
irreducibly dependent on the primary key - Two or more attributes are mutually independent
if none of them is FD on any combination of the
others. Such independence implies that each
attribute can be updated independently of the
rest - Example the parts relvar P is in 3NF
- A relvar is in 3NF if and only if, for all time,
each tuple consists of a primary key value that
identifies some entity, together with a set of
zero or more mutually independent attribute
values that describe that entity in some way
11First Normal Form
- A relvar is in 1NF if and only if in every legal
value of that relvar, every tuple contains
exactly one value for each attribute - In this way, relvars are always in 1NF
- A relvar in 1NF may display functional
dependencies other than those emanating from the
primary key - Such non-primary-key dependencies promote a
miasma of update anomalies difficulties with
the update operations INSERT, DELETE, and UPDATE
12Example
- FIRST S, STATUS, CITY, P, QTY PRIMARY KEY
S, P - One more constraint CITY ? STATUS
- A suppliers status is determined by the location
of that supplier (e.g. all London suppliers must
have a status of 20) - Non3NF diagram has arrows out of candidate keys
together with certain additional arrows
13Sample Value for relvar FIRST
Note relvar FIRST violates both conditions a and
b in the preliminary 3NF definition the nonkey
attributes are not all mutually independent, and
they are not all irreducible dependent on the
primary key
14The Suppliers-and-Parts Database (Sample Values)
S (suppliers) and P (parts) are entities, and SP
(shipment) is a relationship between S and P (a
relationship is a special case of an entity) In
RDB, entities and relationships are also
represented in the same uniform way
15Update Anomalies For FIRST
- Focus on the FD S ? CITY
- INSERT we cannot insert the fact that a
particular supplier is located in a particular
city until that supplier supplies at least one
part (e.g. supplier S5 is located in Athens) - DELETE if we delete the sole FIRST tuple for a
particular supplier, we delete not only the
shipment concerning that supplier to a particular
part but also the information that the supplier
is located in a particular city (e.g. the S3
tuple) - UPDATE the city value for a given supplier
appears in FIRST many times, in general
16Solution for FIRST
- The real problem is that relvar FIRST contains
too much information (shipment, and supplier) all
bundled together - If we delete a tuple, we delete too much
- The solution to this problem is to un-bundle
- Replace relvar FIRST by the two relvars
- SECOND S, STATUS, CITY
- SP S, P, QTY
- The decomposition of FIRST into SECOND and SP
eliminates the dependencies that were not
irreducible - The attribute CITY in FIRST did not describe the
entity described by the primary key instead, it
described the supplier involved in that shipment
17Relvars SECOND and SP
18Step Summarization
- The first step in the normalization procedure is
to take projections to eliminate
non-irreducible FD - Before normalization
- R A, B, C, D PRIMARY KEY A, B /
assume A ? D holds / - After normalization
- R1 A, D PRIMARY KEY A
- R2 A, B, C PRIMARY KEY A, B
FOREIGN KEY A REFERENCES R1
19Second Normal Form
- A relvar is in 2NF if and only if it is in 1NF
and every nonkey attribute is irreducibly
dependent on the primary key - Assumes only one candidate key
- A relvar in 2NF is less susceptible to update
anomalies, but may still exhibit transitive
dependencies - Both attributes in a transitive dependency are
irreducibly implied by the primary key, and each
implies the other
20Update Anomalies For SECOND
- Focus on the FD CITY ? STATUS
- S ? CITY CITY ? STATUS, then S ? STATUS
- INSERT we cannot insert the fact that a
particular city has a particular status until we
have some suppliers actually located in that city
(e.g. city ROME has a status of 50) - DELETE if we delete the sole SECOND tuple for a
particular city, we delete not only the supplier
concerned but also the information that that
city has that particular status (e.g. the S5
tuple) - UPDATE the status value for a given city appears
in SECOND many times, in general
21Solution for SECOND
- The real problem is that relvar SECOND contains
too much information (supplier, and city) all
bundled together - If we delete a tuple, we delete too much
- The solution to this problem is to un-bundle
- Replace relvar SECOND by the two relvars
- SC S, CITY
- CS CITY, STATUS
- The decomposition of SECOND into SC and CS
eliminates the transitive dependencies - The attribute STATUS in SECOND did not describe
the entity described by the primary key instead,
it described the city involved in which that
supplier happened to be located
22Relvars SC and CS
23Step Summarization
- The second step in the normalization procedure is
to take projections to eliminate transitive FD - Before normalization
- R A, B, C PRIMARY KEY A / assume
B ? C holds / - After normalization
- R1 B, C PRIMARY KEY B
- R2 A, B PRIMARY KEY A FOREIGN KEY
B REFERENCES R1
24Third Normal Form
- A relvar is in 3NF if and only if it is in 2NF
and every nonkey attribute is nontransitively
dependent on the primary key - Assumes only one candidate key
25Dependency Preservation
- It is frequently the case that a given relvar can
be nonloss-decomposed in a variety of different
ways - Example
- SC S, CITY CS CITY, STATUS
- SC S, CITY SS S, STATUS
- Dependency preservation refers to a specific case
of nonloss decomposition, such that the
normalized relvars are independent of each other - Updates can be made to either one without regard
for others
Which decomposition is better ?
26Dependency Preservation (Cont.)
- Projections R1 and R2 of a relvar R are
independent in the foregoing sense if and only if
both of the following are true - Every FD in R is a logical consequence of those
in R1 and R2 - The common attributes of R1 and R2 form a
candidate key for at least one of the pair - More on dependency preservation pp. 366-367
27Boyce/Codd Normal Form
28Overview
- Codds original definition of 3NF did not
adequately deal with the case of a relvar that - Had two or more candidate keys, such that
- The candidate keys were composite, and
- The are overlapped (i.e. had at least one
attribute in common) - Definition of Boyce/Codd normal form
- A relvar is in BCNF if and only if every
nontrivial, left-irreducible FD has a candidate
key as its determinant - A relvar is in BCNF if and only if every
determinant is a candidate key (informal
definition) - The only arrows in the FD diagram are arrows out
of candidate keys
29Overview (Cont.)
- Examples
- FIRST and SECOND, which were not in 3NF, are not
in BCNF either - SP, SC, and CS, which were in 3NF, are also in
BCNF - Another example involving two disjoint
candidate keys - Suppose S and SNAME are both candidate keys, and
STATUS and CITY are mutually independent ? BCNF
30BCNF Example I
- SSP S, SNAME, P, QTY
- Candidate key S, P , SNAME, P
- Non-BCNF determinants S and SNAME are not
candidate keys - Suffer the same kind of problems as FIRST and
SECOND did - SSP is in 3NF by the old definition (not the
simplified version) - That definition did not require an attribute to
be irreducibly dependent on each candidate key if
it was itself a component of some candidate key
of the relvar - The fact that SNAME is notirreducibly dependent
on S, P was ignored
31Solution for SSP
- Break SSP into two projections
- SS S, SNAME
- SP S, P, QTY
- OR
- SS S, SNAME
- SP SNAME, P, QTY
S
P
QTY
SNAME
P
32BCNF Example II
- SJT (student, subject, teacher)
- The meaning of an SJT tuple (s, j, t) is that a
student s is taught subject j by teacher t - Constraints
- For each subject, each student of that subject is
taught by only one teacher - Each teacher teaches only one subject (but one
subject is taught by several teachers)
33BCNF Example II (Cont.)
- SJT (Cont.)
- Two overlapping candidate keys, S, J and S,
T - SJT is in 3NF and not BCNF
- Attribute T is a determinant but not a candidate
key - Solution for SJT break SJT into
- ST S, T and TJ T, J
34BCNF Example II (Cont.)
- However, the two projections ST and TJ are not
independent - FD S, J ? T cannot be deduced from the FD T?J
- Example an attempt to insert a tuple for Smith
and Prof. Brown into ST must be rejected Why ??? - The twin objectives of (a) decomposing a relvar
into BCNF components, and (b) decomposing it into
independent components, can occasionally be in
conflict
35BCNF Example III
- EXAM
- The meaning of an EXAM tuple (s, j, p) is that a
student s was examined in subject j and achieved
position p - Constraint no two students obtained the same
position in the same subject - Two overlapping candidate keys, S, J , and
J, P - They are the only determinants
- EXAM is in BCNF
36Relation-Valued Attributes
- A relation may include attributes whose values
are relations - Traditionally this would be seen to violate 1NF,
which was held to prohibit repeating groups - Now they are theoretically sound, but in practice
you should avoid them because they have
complicated predicates
GET S for suppliers who supply part P1 ( SPQ
WHERE TUPLE P P (P1) ? PQ P ) S
GET P for parts supplied by supplier S1 ( SPQ
WHERE S S (S1) ) UNGROUP PQ ) P
Matters are worse for update operations
37Conclusions
- The whole point of normalization theory is to try
to identify commonsense principles and formalize
them - In this way, we can mechanize those principles by
an algorithm - BCNF and indeed 5NF is always achievable