CS 541 Keys, Dependencies, and Relational Normalization - PowerPoint PPT Presentation

1 / 39

About This Presentation

Title:

CS 541 Keys, Dependencies, and Relational Normalization

Description:

Important point: 'key' in a relation refers to tuples, not the entities ... Or, we could assert some FD's and deduce one or more keys by the formal definition. ... – PowerPoint PPT presentation

Number of Views:31

Avg rating:3.0/5.0

Slides: 40

Provided by: clif8

Category:

more less

Transcript and Presenter's Notes

Title: CS 541 Keys, Dependencies, and Relational Normalization

1
CS 541Keys, Dependencies,and Relational
Normalization

September 11, 2007

2
Functional Dependencies

X ? A assertion about a relation R that
whenever two tuples agree on all the attributes
of X, then they must also agree on attribute A
Why do we care?
Knowing functional dependencies provides a formal
mechanism to divide up relations (normalization)
Saves space
Prevents storing data that violates dependencies

3
Example

Drinkers(name, addr, beersLiked, manf,
favoriteBeer)
Reasonable FD's to assert
1. name ? addr
2. name ? favoriteBeer
3. beersLiked ? manf

Shorthand combine FD's with common left side by
concatenating their right sides.
Sometimes, several attributes jointly determine
another attribute, although neither does by
itself. Example
beer bar ? price

5
Keys of Relations

K is a key for relation R if
1. K ? all attributes of R. (Uniqueness)
2. For no proper subset of K is (1) true.
(Minimality)
If K at least satisfies (1), then K is a
superkey.
Conventions
Pick one key underline key attributes in the
relation schema.
X, etc., represent sets of attributes A etc.,
represent single attributes.
No set formers in FDs, e.g., ABC instead ofA,
B, C.

6
Example

Drinkers(name, addr, beersLiked, manf,
favoriteBeer)
name, beersLiked FDs all attributes, as seen.
Shows name, beersLiked is a superkey.
name ? beersLiked is false, so name not a
superkey.
beersLiked ? name also false, so beersLiked not
a superkey.
Thus, name, beersLiked is a key.
No other keys in this example.
Neither name nor beersLiked is on the right of
any observed FD, so they must be part of any
superkey.
Important point key in a relation refers to
tuples, not the entities they represent. If an
entity is represented by several tuples, then
entity-key will not be the same as relation-key.

7
Example 2
Lastname Firstname Student ID
Major Key Key (2
attributes)
Superkey
Note There are alternate keys

Keys are Lastname, Firstname and StudentID

8
Who Determines Keys/FDs?

We could assert a key K.
Then the only FDs asserted are that K ? A for
every attribute A.
No surprise K is then the only key for those
FDs, according to the formal definition of
key.
Or, we could assert some FDs and deduce one or
more keys by the formal definition.
E/R diagram implies FDs by key declarations and
many-one relationship declarations.
Rule of thumb FDs either come from keyness,
many-1 relationship, or from physics.
E.g., no two courses can meet in the same room
at the same time yields room time ? course.

9
Functional Dependencies (FDs)and Many-One
Relationships

Consider R(A1,, An) and X is a keythen X ? Y
for any attributes Y in A1,, Aneven if they
overlap with X. Why?
Suppose R is used to represent a many ? one
relationship E1 entity set ? E2 entity
setwhere X key for E1, Y key for E2,Then, X ? Y
holds,And Y ? X does not hold unless the
relationship is one-one.
What about many-many relationships?

10
Inferring FDs

And this is important because
When we talk about improving relational designs,
we often need to ask does this FD hold in this
relation?
Given FDs X1? A1, X2 ? A2,, Xn ? An, does FD Y
? B necessarily hold in the same relation?
Start by assuming two tuples agree in Y. Use
given FDs to infer other attributes on which
they must agree. If B is among them, then yes,
else no.

11
Algorithm

Define Y closure of Y set of attributes
functionally determined by Y
Basis YY.
Induction If X ? Y, and X ? A is a given FD,
then add A to Y.
End when Y cannot be changed.

12
Example

A ? B, BC ? D.
A AB.
CC.
(AC) ABCD.

13
Given Versus Implied FDs

Typically, we state a few FDs that are known to
hold for a relation R.
Other FDs may follow logically from the given
FDs these are implied FDs.
We are free to choose any basis for the FDs of R
a set of FDs that imply all the FDs that hold
for R.

14
Finding All Implied FDs

Motivation Suppose we have a relation ABCD with
some FDs F. If we decide to decompose ABCD into
ABC and AD, what are the FDs for ABC, AD?
Example F AB ? C, C ? D, D ? A. It looks like
just AB ? C holds in ABC, but in fact C ? A
follows from F and applies to relation ABC.
Problem is exponential in worst case.

15
Algorithm

For each set of attributes X compute X.
But skip X ?, X all attributes.
Add X ? A for each A in XX.
Drop XY ? A if X ? A holds.
Consequence If X is all attributes, then there
is no point in computing closure of supersets of
X.
Finally, project the FDs by selecting only those
FDs that involve only the attributes of the
projection.
Notice that after we project the discovered FDs
onto some relation, the eliminated FDs can be
inferred in the projected relation.

16
Example

F AB ? C, C ? D, D ? A. What FDs follow?
A A BB (nothing).
CACD (add C ? A).
DAD (nothing new).
(AB)ABCD (add AB ? D skip all supersets of
AB).
(BC)ABCD (nothing new skip all supersets of
BC).
(BD)ABCD (add BD ? C skip all supersets of
BD).
(AC)ACD (AD)AD (CD)ACD (nothing new).
(ACD)ACD (nothing new).
All other sets contain AB, BC, or BD, so skip.
Thus, the only interesting FDs that follow from
F areC ? A, AB ? D, BD ? C.

17
Example 2

Set of FDs in ABCGHI
A ? BA ? CCG ? HCG ? IB ? H
Compute (CG), (BG), (AG)

18
Example 3

In ABC with FDs A ? B, B ? C, project onto AC.
A ABC yields A ? B, A ? C.
B BC yields B ? C.
AB ABC yields AB ? C
drop in favor of A ? C
AC ABC yields AC ? B
drop in favor of A ? B
C C and BC BC adds nothing.
Resulting FDs A ? B, A ? C, B ? C.
Projection onto AC A ? C.

19
CS 541Normalization

September 11, 2007

20
FDs Armstrongs Axioms

Reflexivity
If B1, B2, , Bm ? A1, A2, , An ? A1A2An
? B1B2Bm
Also called trivial FDs
Augmentation
A1A2An ? B1B2Bm ?A1A2AnC1C2Ck ?
B1B2BmC1C2Ck
Transitivity
A1A2An ? B1B2Bm and B1B2Bm ? C1C2Ck ?
A1A2An ? C1C2Ck

21
Review Functional Dependencies

In ABC with FDs A ? B, B ? C, project onto AC.
A ABC yields A ? B, A ? C.
B BC yields B ? C.
AB ABC yields AB ? C drop in favor of A ? C.
AC ABC yields AC ? B drop in favor of A ? B.
C C and BC BC adds nothing.
Resulting FDs A ? B, A ? C, B ? C.
Projection onto AC A ? C.

22
Normalization

Goal BCNF Boyce-Codd Normal Form
all FDs follow from the fact key ?
everything.
Formally, R is in BCNF if for every nontrivial FD
for R, say X ? A, then X is a superkey.
Nontrivial right-side attribute not in left
side.
Why?
1. Guarantees no redundancy due to FDs.
2. Guarantees no update anomalies one
occurrence of a fact is updated, not all.
3. Guarantees no deletion anomalies valid fact
is lost when tuple is deleted.

23
Example of Problems

Drinkers(name, addr, beersLiked, manf,
favoriteBeer)
FDs
1. name ? addr
2. name ? favoriteBeer
3. beersLiked ? manf
???s are redundant, since we can figure them out
from the FDs.
Update anomalies If Janeway gets transferred to
the Intrepid,will we change addr in each of her
tuples?
Deletion anomalies If nobody likes Bud, we lose
track of Buds manufacturer.

Each of the given FDs is a BCNF violation
Key name, beersLiked
Each of the given FDs has a left side that is a
proper subset of the key.
Another Example
Beers(name, manf, manfAddr).
FDs name ? manf, manf ? manfAddr.
Only key is name.
Manf ? manfAddr violates BCNF with a left side
unrelated to any key.

25
CS 541Normalization

September 13, 2007

26
Decomposition to Reach BCNF

Setting relation R, given FDs F.
Suppose relation R has BCNF violation X ? B.
We need only look among FDs of F for a BCNF
violation, not those that follow from F.
Proof If Y ? A is a BCNF violation and follows
from F, then the computation of Y used at least
one FD X ? B from F.
X must be a subset of Y.
Thus, if Y is not a superkey, X cannot be a
superkey either, and X ? B is also a BCNF
violation.

1. Compute X.
Cannot be all attributes why?
2. Decompose R into X and (RX) ? X.
3. Find the FDs for the decomposed relations.
Project the FDs from F calculate all
consequents of F that involve only attributes
from X or only from (R?X) ? X.

R
X
X
28
Example

R Drinkers(name, addr, beersLiked, manf,
favoriteBeer)
F
1. name ? addr
2. name ? favoriteBeer
3. beersLiked ? manf
Pick BCNF violation name ? addr.
Close the left side name name addr
favoriteBeer.
Decomposed relations
Drinkers1(name, addr, favoriteBeer)
Drinkers2(name, beersLiked, manf)
Projected FDs (skipping a lot of work that leads
nowhere interesting)
For Drinkers1 name ? addr and name ?
favoriteBeer.
For Drinkers2 beersLiked ? manf.

(Repeating)
Decomposed relations
Drinkers1(name, addr, favoriteBeer)
Drinkers2(name, beersLiked, manf)
Projected FDs
For Drinkers1 name ? addr and name ?
favoriteBeer.
For Drinkers2 beersLiked ? manf.
BCNF violations?
For Drinkers1, name is key and all left sides of
FDs are superkeys.
For Drinkers2, name, beersLiked is the key, and
beersLiked ? manf violates BCNF.

30
Decompose Drinkers2

First set of decomposed relations
Drinkers1(name, addr, favoriteBeer)
Drinkers2(name, beersLiked, manf)
Close beersLiked beersLiked, manf.
Decompose Drinkers2 into
Drinkers3(beersLiked, manf)
Drinkers4(name, beersLiked)
Resulting relations are all in BCNF
Drinkers1(name, addr, favoriteBeer)
Drinkers3(beersLiked, manf)
Drinkers4(name, beersLiked)

31
3NF

One FD structure causes problems
If you decompose, you cant check all the FDs
only in the decomposed relations.
If you dont decompose, you violate BCNF.
Abstractly AB ? C and C ? B.
Example 1 title city ? theatre and theatre ?
city.
Example 2 street city ? zip,zip ? city.
Keys A, B and A, C, but C ? B has a left
side that is not a superkey.
Suggests decomposition into BC and AC.
But you cant check the FD AB ? C in only these
relations.

32
Example

A street, B city, C zip.
Join

zip ? city
street city ? zip
33
Elegant Workaround

Define the problem away.
A relation R is in 3NF iff (if and only if)for
every nontrivial FD X ? A, either
1. X is a superkey, or
2. A is prime member of at least one key.
Thus, the canonical problem goes away you dont
have to decompose because all attributes are
prime.

34
What 3NF Gives You

There are two important properties of a
decomposition
We should be able to recover from the decomposed
relations the data of the original.
Recovery involves projection and join, which we
shall defer until weve discussed relational
algebra.
We should be able to check that the FDs for the
original relation are satisfied by checking the
projections of those FDs in the decomposed
relations.
Without proof, we assert that it is always
possible to decompose into BCNF and satisfy (1).
Also without proof, we can decompose into 3NF and
satisfy both (1) and (2).
But it is not possible to decompose into BNCF and
get both (1) and (2).
Street-city-zip is an example of this point.

35
CS 541Multivalued Dependencies

September 11, 2007

36
Multivalued Dependencies

The multivalued dependency X ?? Y holds in a
relation R if whenever we have two tuples of R
that agree in all the attributes of X, then we
can swap their Y components and get two new
tuples that are also in R.
X Y others

37
Example

Drinkers(name, addr, phones, beersLiked) with MVD
Name ?? phones. If Drinkers has the two tuples
name addr phones beersLiked
sue a p1 b1
sue a p2 b2
it must also have the same tuples with phones
components swapped
name addr phones beersLiked
sue a p2 b1
sue a p1 b2
Note we must check this condition for all pairs
of tuples that agree on name, not just one pair.

38
MVD Rules

1. Every FD is an MVD.
Because if X ?Y, then swapping Ys between tuples
that agree on X doesnt create new tuples.
Example, in Drinkers name ?? addr.
2. Complementation if X ?? Y, then X ?? Z, where
Z is all attributes not in X or Y.
Example since name ?? phonesholds in
Drinkers, so doesname ?? addr beersLiked.

39
Splitting Doesnt Hold

Sometimes you need to have several attributes on
the right of an MVD. For example
Drinkers(name, areaCode, phones, beersLiked,
beerManf)
name areaCode phones beersLiked beerManf
Sue 831 555-1111 Bud A.B.
Sue 831 555-1111 Wicked Ale Petes
Sue 408 555-9999 Bud A.B.
Sue 408 555-9999 Wicked Ale Petes
name ?? areaCode phones holds, but neither
name ?? areaCode nor name ?? phones do.

40
4NF

Eliminate redundancy due to multiplicative effect
of MVDs.
Roughly treat MVDs as FD's for decomposition,
but not for finding keys.
Formally R is in Fourth Normal Form if whenever
MVDX ?? Y is nontrivial (Y is not a subset of X,
and X ? Y is not all attributes), then X is a
superkey.
Remember, X ? Y implies X ?? Y, so 4NF is more
stringentthan BCNF.
Decompose R, using4NF violation X ?? Y,into XY
and X ? (RY).