Title: CHAPTER 19 SCHEMA, REFINEMENT AND NORMAL FORMS
1CHAPTER 19SCHEMA, REFINEMENT AND NORMAL FORMS
2INTRODUCTION
- Having defined all the relational schemas that
we want in our database, the next step is to
refine them for so as to (near)optimize them with
respect to (memory) space and time (of response
to queries). - Much of this optimization can be performed by
the optimizer software within the DBMS, but the
actual schema cannot be modified by the
optimizer, and that is where important
improvement can be achieved as we shall find out
in this new chapter.
3REDUNDANCY AND ITS PROBLEMS (1)
- Consider the following instance of the
Hourly_Emps relation which we have seen earlier
It has been assumed in this relation that the
rating implies the hourly wages. This is a
special constraint called a functional dependency
(FD) which is denoted as follows rating ?
hourly_wages. As a result of this constraint, the
above relation contains a certain amount of
redundancy and creates what are called anomalies.
4REDUNDANCY AND ITS PROBLEMS (2)
- - Redundancy It will be seen, in the above
instance, that an hourly_wages of 10 for a
rating of 8 is specified three times similarly
an hourly_wages of 7 for a rating of 5 is
specified twice. - Such redundancy represent wasted memory space
i.e. poor design. - - Anomalies There are three kinds of anomalies
- (1) Update Anomalies If we want to change the
hourly_wages in the first tuple, we must perform
similar changes in the second and fifth tuples.
We could specify an integrity constraint for the
DBMS to perform such additional changes
automatically, but it still requires extra
(wasted) time to perform these additional
updates. - (2) Insertion Anomalies We cannot insert a (new)
employee tuple unless we know the hourly_wages
for his rating. - (3) Deletion Anomalies If we delete all tuples
with a given rating, we lose the information that
this rating corresponds to a certain
hourly_wages.
5REDUNDANCY AND ITS PROBLEMS (3)
- The Possibility of Using Null Values
- - The use of null values cannot resolve the
problems of redundant storage and update
anomalies. - - null value can sometimes help in insertion and
delete anomalies, at least when the null value is
not forced on a prime key. - - Since null values also introduce their own
problems it is preferable to avoid them if at all
possible.
6DECOMPOSITION
- - All the problems we have described will vanish
if we decompose our original relation into two
relations as shown below
N. B. (1) Note that some extra space is now
needed to accommodate the second relation and
that the ratings show up in two places, but the
net saving in a relation containing some hundreds
or thousand of tuples is quite significant. (2)
It should be immediately apparent that it is
generally more efficient to store entity sets
separately from the relations in which they
participate, unless certain other constraints are
present.
7PREVIEW OF THIS CHAPTER
- Our goal in this chapter is to develop the tools
that will - (1) permit us to determine when a relational
schema is subject to the redundancies and
anomalies we have just witnessed. - (2) allow us to refine such deficient schema by
suitable decomposition, thereby avoiding
redundancies and anomalies without loss of
information.
8THE PROGRAM FOR THIS CHAPTER
- In order to achieve our stated goals we shall
proceed as follows - (1) Define a new type of ICs called Functional
Dependencies (FD) which can be used to detect the
presence of redundancies and anomalies (R/A) in
relational schemas (henceforth we shall simply
say schemas). - (2) Define a number of normal forms which can be
used to classify relations with respect to their
liability to incorporate R/A. - (3) Introduce a decomposition technique that
permits us to replace a relations containing RA
by equivalent ones devoid of R/A. - (4) Discuss other potential problems (loss of FD
characterization) that may plague the resulting
relations, and how to deal with such problems
9FUNCTIONAL DEPENDENCIES
- - Functional Dependencies (FDs) are ICs that
characterize relations by generalizing the
concept of keys. - - A functional dependency, X?Y, characterizing a
relation R (where X and Y are sets of attributes
of R) states that the attribute set Y is
completely determined (we say functionally
determined) by the attribute set X. - - Thus if X is a candidate key (or a superkey) Y
is the entire set of attributes of R. - - There can be any number and any mixture of FDs
that characterize a relation. - - The FDs characterizing relations must be
specified as part of the semantics of these
relations. - - The presence of certain FDs in a relation is
responsible for the R/A plaguing that relation. - - Our job is to locate such FDs and render them
harmless.
10EXAMPLE OF FUNCTIONAL DEPENDENCIES
- Consider a relation Contract described by the
schema - Contracts (contractid, supplierid, projectid,
depid, partid, qty, value) and denoted CSJDPQV - where the meaning of a tuple is that the contract
identified by - contractid C is an agreement that supplier S
(supplierid) will supply - Q items of part P (partid) to project J (projid)
associated with - department D (deptid), the value V ot this
contract being (value). - 1. The fact that the key of this relation is C
is denoted by the FD - C?CSJDPQV which is really an abbreviation for
the seven FDs - (i) C?C, (ii) C?S, (iii) C?J, (iv) C?D, (v) C?P,
(vi) C?Q, (vii) C?V - N. B. The FD (i) C?C is called a trivial FD since
every attribute is trivially implied by itself. - Now suppose we wish to add the following company
rules - 2. A project purchases a given part using a
single contract JP?C. - 3. A department purchases at most one part from a
supplier SD?P. - Thus relation C is characterized by the set of
FDs C? CSJDPQV, JP? C, SD? P
11CLOSURE OF A SET OF FDs ARMSTRONG AXIOMS
- - The set of all FDs implied by a given set F of
FDs is called the - closure of F, denoted F.
- - In order to check for the presence of R/A we
need to ascertain the possible - presence of other FDs implied by those stated
explicitly. - This means that we have to calculate the closure
F. - - This may be done utilizing the Armstrong Axioms
which may be stated as follows letting X, Y, Z
denote sets of attributes of a relation R, - Reflexivity If X?Y (i.e. X contains Y) then X?Y.
(This rule really generates only trivial FDs). - Augmentation If X?Y, then XZ?YZ for any Z.
- Transitivity If X?Y and Y?Z, then, X?Z.
- It may be proven that
- (1) Armstrongs axioms are sound, i.e. they
generate only FDs in F. - (2)Armstrongs axioms are complete, i. e. they
generate all the FDs in F. - It is convenient to add the following additional
rules which may even be considered as denotation
rules - Union If X?Y and X?Z, then X?YZ.
- Decomposition If X?YZ, then X?Y and X?Z.
- N. B. Note that these axioms do not imply that
you may cancel attributes appearing on both
sides. Thus if AB? BC, then you may not
conclude that A?B.
12EXAMPLES OF APPLICATION OF ARMSTRONG AXIOMS (1)
- Consider the relation ABC with FDs (i) A?B and
(ii) B?C. - 1. From Reflexivity we get all the trivial FDs
which are of the form - X?Y, where Y ? X, X ? ABC and Y ? ABC.
- 2. Applying transitivity to (i) and (ii) we get A
? C. - 3. From augmentation we get
- AC ? BC, AB ? AC, AB ?CB.
- Thus the closure of the set F of given FDs is
(apart from trivial FDs) - F A?B, B?C, A ? C, AC ? BC, AB ? AC, AB
?CB
13EXAMPLES OF APPLICATION OF ARMSTRONG AXIOMS (2)
- Consider the previous relation Contracts that is
characterized by the set of FDs (i) C ?
CSJDPQV, (ii) JP ?C, (iii) SD ? P. - 1. From (ii), (i), and transitivity we get (iv)
JP ? CSJDPQV. - 2. From (iii) and augmentation we get (v) SDJ ?
JP. - 3. From (v), (iv) and transitivity we get (vi)
SDJ ? CSDJPQV. - 4. From (i) and decomposition we can get
- C ?S, C ? J, C ? D, C ? P,C ? Q, C ? V.
- N. B. We have not included trivial dependencies
in the above - derivations.
14ATTRIBUTE CLOSURE
- Constructing the closure of a set of FDs may be
fairly laborious. It - may be avoided when one wishes to check what are
the possible - right-hand sides of an FD X ? Y, for a given X,
by means of the - following algorithm which calculates the
so-called attribute closure, - denoted X, of a set X A1, A2, ,An of
attributes, with respect - to the set F of FDs.
- 1. Let X be a set of attributes that eventually
will become the closure. First we initialize X to
be A1, A2, ,An. - 2. We repeatedly search for some FD B1 B2 Bm ?
C such that all of B1 B2 Bm are in the set of
attributes X, but C is not. We then add C to the
set X. - 3. Repeat step 2 as many times as necessary until
no more new - attributes can be added to X.
- 4. The final set X is the correct value of A1,
A2, ,An.
15EXAMPLE OF ATTRIBUTE CLOSURE COMPUTATION
- Given the previous Contracts relation
characterized by the FDs - (i) C ? CSJDPQV, (ii) JP?C, (iii)
SD?P - Suppose we wish to get the attribute closure of
JP, i.e. (JP) - 1. Initialize the closure (X) as JP.
- 2. (i) does not satisfy the requirement that the
left side be in JP. - (ii) does, therefore we set (X) (X) ? C
JPC. - (iii) does not.
- We now repeat step 2
- 2. (i) now does satisfy the requirement that the
left side be in JP, therefore we set (X) (X)
? CSJDPQV JPCSDQV. - (ii) and (iii) add nothing new. Repeating step 2
does not change (X). - Therefore we stop having obtained (JP)
JPCSDQV.
16NORMAL FORMS (1)
- The following Normal Forms are used in the RMD to
characterize relations - First Normal Form (1NF).
- Second Normal Form (2NF).
- Third Normal Form (3NF).
- Boyce-Codd Normal Form (BCNF).
- Later we shall also consider
- Fourth Normal Form (4NF)
- Fifth Normal Form (5NF)
17NORMAL FORMS (2)
- First Normal Form (1NF)
- - A relation is said to be in 1NF if it has only
single-valued attributes. - (This is such a pillar of the RMD that we shall
assume it is always true for all the relations we
encounter). - Second Normal form (2NF)
- - A relation is said to be in 2NF if it is in 1NF
and contains partial dependencies - (Actually, 2NF is really of historic interest
only). - Third Normal Form (3NF)
- - A relation is said to be in 3NFif it is in 2NF
and, for every FD X?A in F, one of the following
statements is true - A?X i. e. it is a trivial FD, or
- X is a superkey (a superkey is a set of
attributes which contains a key), or - A is part of some key for R.
- Boyce-Codd Normal Form (BCNF)
- - A relation is said to be in BCNF if it is in
2NF and one of the following two statements is
true - A?X i. e. it is a trivial FD, or
- X is a superkey.
18NORMAL FORMS (3)
- The successive containment of Normal Forms may be
illustrated by the following diagram
BCNF
3NF
2NF
1NF
19DIAGRAMATIC VIEW OF FDs
PARTIAL DEPENDENCIES ? IN 2NF, BUT NOT 3NF
TRANSITIVE DEPENDENCIES ? IN 3NF BUT NOT IN BCNF
X IS A SUPERKEY ? IN BCNF
20DECOMPOSITION THE REQUIREMENTS
- - In order to remove the R/A from a relation it
is necessary to decompose it into two or more
relations utilizing one of the FDs responsible
for the R/A as a pivot point of this
decomposition. - - The decomposition must be lossless, i.e. the
final relations must contain exactly the
information contained in the original relation,
without losing any nor adding any other ones. - - The decomposition must be dependency-preserving,
i.e. the final relations must be characterized
by the same FDs as the original relation so as to
permit their verification when modifications are
requested. - - The above demands lead to the requirement that
the final relations be in BCNF if at all
possible, or in 3NF otherwise. - - We shall now examine some examples drawn from
odd-numbered problems in the text.
21THE DECOMPOSITION ALGORITHM
- 1. Check which normal form the relation is in.
- 2. Locate the FDs responsible for it not being in
BCNF. - 3. Select one of the responsible FDs.
- 4. Decompose the relation using the selected
responsible FD, say A?B, as the pivot of the
decomposition the decomposition yields two
separate relations, say R1 and R2R1 has for
attributes all the attributes occurring in the
pivot FD i. e. R1(AB) and no other R2 has for
attributes the set A. - 5. Check whether the resulting relations are in
BCNF. If they are, go to step 6 otherwise repeat
from step 2 on. - 6. Check whether the final relations are
dependency preserving. If they are, stop if they
are not, you may have several choices - (i) The problem may disappear if you select some
alternate responsible FD try all permissible
combinations. - (ii) If the problem persist you may choose to
leave it in 3NF, or - (iii) you may decide to add some other relation
to those you have obtained in order to preserve
dependencies,
22EXAMPLE 1 (19.7-1)
- Consider a relation R (A B C D) with FDs
- (i) C ? D, (ii) C ? A, (iii) B ? C
- - The only candidate key is B.
- - (i) and (ii) are partial dependencies, so R is
not even in 2NF. - - pick (ii) as the pivot the decomposition is
now as follows - R (A B C D )
C ?A
R1(CA) R2(BCD) with C?A
with C?D and B?C
R2 is not in BCNF ?decompose with (ii) as pivot.
C?D
R21(CD) R22(BC) with C?D
with B?C
R1, R21, and R22 are now in BCNF ?lossless
decomposition and dependencies are preserved.
23EXAMPLE 1 (19.7-1) (contd)
- Suppose we now use (i) as the first pivot. We
would get - R(ABCD)
C?D
R1(CD) R2(ABC) with
C?D with C?A and B?C
B?C
R21(CA) R22(BC) with C?A
with B?C
Thus, in this case, the final results are similar.
24EXAMPLE 2 (19.7-3)
- Consider R(ABCD) with (i) ABC?D and (ii) D?A.
- -The candidate key is (ABC)
- - R is in 3NF because of (ii)
- - Decomposing R using (ii) as pivot
- R(ABCD)
D?A
R1(DA) R2(BCD) with
D?A
The final relations R1 and R2 are both in BCNF.
However, the decomposition is not
dependency-preserving for FD (i) is not
preserved whenever we make a change in the
database we have to re-joinR1 and R2 to be able
to verify that (i) is preserved. Therefore we
have the following choice (a) leave R in 3NF as
it was originally with its R/A or (b) decompose
it into R1 and R2 and join them whenever, there
is a change.
25EXAMPLE 3 (19.7-5)
- Consider R(ABCD) with (i) AB?C, (ii) AB?D, (iii)
C?A, (iv) D ?B - - The candidate keys are now AB, BC, CD, AD.
- - Because of (iii) and (iv), R is in 3NF but not
in BCNF. - - Decompose using (iii) as pivot
- R(ABCD)
C?A
R1(CA) R2(BCD) with
C?A with D?B
D?B
R21(DB) R22(CD) with
D?B
The three final relations are in BCNF, but the
decomposition is not dependencypreserving since
(i) and (ii) cannot be associated with any of the
three final relations. The text suggests adding
ABC and ABD to R1, R21 and R22. That is an
expensive solution!
26EXAMPLE 3 (19.10-1)
- Consider R(ABCD) with (i) B?C, (ii) D?A.
- Questions (a) Candidate keys? (b) Is
decomposition into BC and AD good or bad? Why? - Answers (a) BD
- (b) Both (i) and (ii) are partial dependencies.
Thus R is in 2NF. - Therefore a decomposition is in order.
- - Following our decomposition algorithm, we
should decompose using either (i) or (ii) as
pivots. - - Using (i) as pivot we get R(ABCD)
B?C
R1(BC) R2(ABD) with B?C
with D?A
D?A
This decomposition is good It is lossless and
dependency- preserving.
R21(DA) R22(BD) with D?A
27EXAMPLE 3 (19.10-1) (contd)
- However, decomposition into BC and AD is
unsatisfactory because it - is not lossless it is lossy! Indeed when we
join BC and AD we get - their cartesian product because they have no
attribute in common, - and cartesian products are generally much larger
than join. - Example Let R
A B C D
a1 b1 c1 d1
a2 b2 c2 d2
R(BC)
B C
b1 c1
b2 c2
R(AD)
A D
a1 d1
a2 d2
Not equal
R(AD) ? R(BC)
A B C D
a1 b1 c1 d1
a1 b1 c2 d2
a2 b2 c1 d1
a2 b2 c2 d2
28LOSSLESS DECOMPOSITIONS
- A useful test for lossless decomposition is given
in the text by - Theorem 3 (p.620)Let R be a relation and F be a
set of FDs that hold over R. The decomposition
into relations with attribute sets R1 and R2 is
lossless if and only if F contains - either the FD R1 n R2 ? R1 or the FD R1 n R2 ?
R2. - This theorem yields the following lemma which we
have been using in our decomposition - Lemma (p. 620) If an FD X?Y holds over a
relation R and X n Y is empty, the decomposition
of R into R Y and XY is lossless. - N.B. Do not assume that a lossless decomposition
and a decomposition yielding BCNF relations are
synonymous.
29EXAMPLE 3 (19.10-2)
- Consider R(ABCD) with (i) AB?C, (ii) C?A,
(iii) C?D. - Questions (a) Candidate keys (b) Is
decomposition into - R1 ACD and R2 BC
- good or not? Why?
- Answers
- (a) Candidate keys AB, BC.
- (b) Consider the losslessness decomposition test
- Here R1 ACD and R2 BC ?R1 n R2 C thus R1 n
R2 ? R1 - corresponds to C ? ACD which is in F since it is
derivable from - C?C (trivial FD), C?A (ii), C?D (iii).
- Thus theorem 3 is satisfied and the proposed
decomposition is lossless. - - Note that this is a BCNF decomposition because
- - R1 is characterized by the FDs (ii) C?A and
(iii) C?D so that C - is the key in R1 and the two FDs follow BCNF
requirements. - - R2 has the key BC and thus also satisfies BCNF
conditions,
30EXAMPLE 3 (19.10-2) )contd)
- The decomposition is thus as follows
-
R(ABCD) - with AB?C, C?A, C?D
N.B. No pivot FD.
R1(ACD) R2(BC)
with C?A and C?D
However, the decomposition is not
dependency-preserving since the FD AB?C cannot
be included in either relation. The text suggests
adding the relation ABC to avoid repetitive
construction of joins to check for violations of
the missing FD.
31DEPENDENCY-PRESERVING DECOMPOSITIONS
- - Definition of the projection of a set F of
dependencies - Let R be a relation decomposed into two schemas
with attribute sets - X and Y, and let F be a set of FDs over R.
- The projection of F on X is the set of FDs in F
that involve only - attributes in X. It is denoted as FX .
- - Definition of dependency-preserving
decomposition - The decomposition of relation R with the set F of
FDs into two relations with attribute sets X and
Y is dependency-preserving if - (FX ? FY) F .
32ANOTHER LOOK AT DECOMPOSITION
- So far we have seen lossless-join decomposition
that were not dependency-preserving, The opposite
can also easily occur dependency-preserving
decomposition which are not lossless-join. - Example Consider the relation R (ABC) with FD
A?B. - If we decompose it into R1(AB) and R2(BC), we
have a dependency-preserving decomposition,
however it is not lossless-join. - Note that has the key AC
- To find a lossless-join decomposition we can use
our pivot point algorithm -
R(ABC) -
with A?B
A?B
R1(AB) R2(AC)
with A?B
33DECOMPOSITION INTO 3NF (1)
- The decomposition techniques we have developed
for BCNF also work for 3NF. However, here we can
develop an algorithm to decompose a relation into
a collection of 3NF relations which are both
lossless-join and dependency-preserving. - The desired algorithm depends on the use of a
minimal cover for a set of FDs, which is defined
as follows - Definition
- A minimal cover for a set F of FDs is a set G of
FDs such that - 1. Every FD in G is of the form X?A, where A is a
single attribute. - 2. The closure of F and G are equal F G .
- 3. If we obtain a se H of FDs from G by deleting
one or more FDs, or by deleting attributes from
an FD in G, then, F ? H.
34DECOMPOSITION INTO 3NF (2)
- Example
- Let F (i) A?B, (ii) ABCD?E, (iii) EF?G, (iv)
EF?H, (v) ACDF?EG. - 1. rewrite (v) as (v) ACDF?E, (vi) ACDF?G.
- 2. Consider (vi). It is implied by (i), (ii), and
(iii) ?it can be deleted. - Similarly we ca delete (v).
- 3. Consider (ii) since (i) holds, we can replace
(ii) with (vii) ACD?E - Finally, a minimal cover for F is the set
- A?B, ACD?E, EF?G, EF?H.
- These operations are easily transformed into an
algorithm which appears on p. 626. - We can now give an algorithm for putting
relations into 3NF with lossless-join and
dependency preservation.
35DECOMPOSITION INTO 3NF (3)
- Algorithm for obtaining a lossless-join,
dependency preserving - decomposition into 3NF relations from a relation
R with a set F of - FDs that is a minimal cover
- Let R1, R2, , Rn be the desired decomposition
where Ri is in 3NF and let Fi denote the
projection of F onto the attributes of Ri. - 1.Identify the set N of FDs in F that is not
preserved, i. e. not included in the closure of
the union of Fis. - 2.For each FD X?A in N, create a relation XA and
add it to the decomposition of R.
36EXAMPLE OF DEPENDENCY PRESERVATION
- Example
- - Consider a relation R (A, B, C) with F A ?
B , B ? C, C ? A - Now, we decompose R into two relations R1 and
R2 using A ? B - R1 (A, B) with FD A ? B and
R2 (B, C) with FD B ? C - Question Is this decomposition
dependency-preserving? - - At first sight, it may appear that the answer
is 'no' for the FD, C ? A, is associated with
neither R1 nor R2. - - However, if we apply the test described above,
we find - FAB A ? B , B ? A and FBC B ? C
, C ? B , - as the second element in each set is implied by
application of transitivity to the original FDs. - Thus, (FAB ? FBC) F. Therefore, the
decomposition is dependency-preserving. -
- - But then, where is the FD, C ? A, in the above
decomposition? The answer is that we should have
really written the decomposed schemas as - R1 (A, B) with FAB A ? B, B ? A and R2
(B, C) with FBC B ? C, C ? B - and C ? A is automatically preserved when the
other FDs are preserved.