Title: CPSC 310 Database Systems
1CPSC 310 Database Systems
- Lecturer Anxiao (Andrew) Jiang
- Lecture Four Normalization
2Normalization
- Anomalies
- Boyce-Codd Normal Form
- 3rd Normal Form
Slides by Jeffrey Ullman
3Anomalies
- Goal of relational schema design is to avoid
anomalies and redundancy. - Update anomaly one occurrence of a fact is
changed, but not all occurrences. - Deletion anomaly a valid fact is lost when a
tuple is deleted.
4Example of Bad Design
Consumers(name, addr, candiesLiked, manf,
favCandy) name addr candiesLiked manf
favCandy Janeway Voyager Twizzlers Hershey
Smarties Janeway ??? Smarties Petes
??? Spock Enterprise Twizzlers ??? Twizzlers
Data is redundant, because each of the ???s can
be figured out by using the FDs name -gt addr
favCandy and candiesLiked -gt manf.
5This Bad Design Also Exhibits Anomalies
name addr candiesLiked manf
favCandy Janeway Voyager Twizzlers Hershey
Smarties Janeway Voyager Smarties Nestle
Smarties Spock Enterprise Twizzlers Hershey
Twizzlers
- Update anomaly if Janeway is transferred to
Intrepid, - will we remember to change each of her tuples?
- Deletion anomaly If nobody likes Twizzlers, we
lose track - of the fact that Hershey manufactures Twizzlers.
6Boyce-Codd Normal Form
- We say a relation R is in BCNF if whenever X
-gtA is a nontrivial FD that holds in R, X is a
superkey. - Remember nontrivial means A is not a member of
set X. - Remember, a superkey is any superset of a key
(not necessarily a proper superset).
7Example
- Consumers(name, addr, candiesLiked, manf,
favCandy) - FDs name-gtaddr favCandy, candiesLiked-gtmanf
- Only key is name, candiesLiked.
- In each FD, the left side is not a superkey.
- Any one of these FDs shows, Consumers is not in
BCNF
8Another Example
- Candies(name, manf, manfAddr)
- FDs name-gtmanf, manf-gtmanfAddr
- Only key is name .
- name-gtmanf does not violate BCNF, but
manf-gtmanfAddr does.
9Decomposition into BCNF
- Given relation R with set of FDs F.
- Compute keys for R (sets of attributes that
functionally determine all other attributes) - Look among the given FDs for a BCNF violation X
-gtB. - If any FD following from F violates BCNF, then
there will surely be an FD in F itself that
violates BCNF. - Compute X .
- Won't be all attributes, or else X would be a
superkey and X -gtB would not be a violation
10Decompose R Using X -gt B
- Replace R by relations R1 and R2 with schemas
- R1 X (all attributes "reachable" from X)
- R2 R (X X ) (X plus all attributes not
"reachable" from X) - Project given FDs F onto the two new relations
(ignore FD's containing attributes no longer in
the new relation). - Check if procedure must be repeated with R1
and/or R2.
11Decomposition Picture
R1
R-X
X
X -X
R2
R
12Example
- Consumers(name, addr, candiesLiked, manf,
favCandy) - F name-gtaddr, name -gt favCandy, candiesLiked-gt
manf - Pick BCNF violation name-gtaddr.
- Close the left side name name, addr,
favCandy. - Decomposed relations
- Consumers1(name, addr, favCandy)
- Consumers2(name, candiesLiked, manf)
13Example, Continued
- We are not done we need to check Consumers1 and
Consumers2 for BCNF. - Projecting FDs is easy here.
- For Consumers1(name, addr, favCandy), relevant
FDs are name-gtaddr and name-gtfavCandy. - Thus, name is the only key and Consumers1 is in
BCNF.
14Example, Continued
- For Consumers2(name, candiesLiked, manf), the
only FD is candiesLiked -gt manf, and the only key
is name, candiesLiked. - Violation of BCNF.
- candiesLiked candiesLiked, manf, so we
decompose Consumers2 into - Consumers3(candiesLiked, manf)
- Consumers4(name, candiesLiked)
15Example, Concluded
- The resulting decomposition of Consumers
- Consumers1(name, addr, favCandy)
- Consumers3(candiesLiked, manf)
- Consumers4(name, candiesLiked)
- Notice Consumers1 tells us about consumers,
Consumers3 tells us about candies, and
Consumers4 tells us the relationship between
consumers and the candies they like.
16Third Normal Form - Motivation
- There is one structure of FDs that causes
trouble when we decompose. - AB -gtC and C -gtB.
- Example A street address, B city,
- C zip code.
- There are two keys, A,B and A,C .
- Think about why.
- C -gtB is a BCNF violation (why?), so we must
decompose into AC, BC.
17We Cannot Enforce FDs
- The problem is that if we use AC and BC as our
database schema, we cannot enforce the FD AB -gtC
by checking FDs in these decomposed relations. - Example with A street, B city, and C zip on
the next slide.
18An Unenforceable FD
- Suppose we have relation Addr with FD's
- street city -gt zip
- zip -gt city
- Keys are street, city and street, zip
- After decomposing, we have relations
- Addr1(street,zip) with no FD (!)
- Addr2(city,zip) with FD zip -gt city
- Consider instances on next slide
19An Unenforceable FD (cont'd)
Addr1
Addr2
street zip 545 Tech Sq. 02138 545 Tech
Sq. 02139
city zip Cambridge 02138 Cambridge 02139
Although no FDs were violated in the decomposed
relations, FD street city -gt zip is violated by
the database as a whole.
203NF Lets Us Avoid This Problem
- 3rd Normal Form (3NF) modifies the BCNF condition
so we do not have to decompose in this problem
situation. - An attribute is prime if it is a member of any
key. - X -gtA violates 3NF if and only if X is not a
superkey, and also A is not prime.
21Example
- In our problem situation with FDs AB -gtC and C
-gtB, we have keys AB and AC. - Thus A, B, and C are each prime.
- Although C -gtB violates BCNF, it does not
violate 3NF. - So no need to decompose.
22What 3NF and BCNF Give You
- There are two important properties of a
decomposition - Recovery it should be possible to project the
original relations onto the decomposed schema,
and then reconstruct the original. - Dependency Preservation it should be possible
to check in the projected relations whether all
the (original) given FDs are satisfied.
233NF and BCNF, Continued
- We can get (1) with a BCNF decomposition.
- Explanation needs to wait for relational algebra.
- We can get both (1) and (2) with a 3NF
decomposition. - But we cant always get (1) and (2) with a BCNF
decomposition. - street-city-zip is an example.