Title: Boyce-Codd NF
1Boyce-Codd NF Lossless Decomposition
CS157A L16
2Armstrongs Axioms
- For computing the set of FDs that follow a given
FD, the - following rules called Armstrongs axioms are
useful - Reflexivity If B ? A, then A ? B
- Augmentation If A ? B, then A ? C ? B ? C
Note also that if A ? B, then A ? C ? B for any
set of attributes C. - Transitivity If A ? B and B ? C then A ? C
3Example of 3NF but not BCNF
- R(A B C D)
- 1 2 1 2
- 1 3 2 1
- 2 1 1 1
- 1 3 2 2
- 2 1 3 2
- Is this table BCNF?
4- FD graph of R is
- A B C D
- B ? A
- AC ? B
- BD ? A, C
- CD ? A, B
- Prime Attribute key attributes are B,C,D
- B ? A and B is not a key, So R is not 3NF
- R is not BCNF
5Projecting FDs
- Given a relation R (A,B,C,D) and F(R) A?B,
B?C, C?D. - Suppose S is projected from R as S(A,C,D). What
is F(S). - To compute F(S), start by computing the closures
of all attributes - in S.
- In R, A A?B, A?C, A?D
- In S, A A?C, A?D
- C C?D and
- D D
- Since A contains all attributes of S, it is not
required to compute - (AC), (AD) or (ACD).
6Inference Rules for FDs
A1, A2, , An ? B1, B2, , Bm
Splitting rule and Combining rule
Is equivalent to
A1 ... Am B1 ... Bm
A1, A2, , An ? B1 A1, A2, , An ? B2 . . . .
. A1, A2, , An ? Bm
7Inference Rules for FDs(continued)
Trivial Rule
A1, A2, , An ? Ai
where i 1, 2, ..., n
A1 Am
Why ?
8Inference Rules for FDs(continued)
Transitive Closure Rule
A1, A2, , An ? B1, B2, , Bm
If
and
B1, B2, , Bm ? C1, C2, , Cp
A1, A2, , An ? C1, C2, , Cp
then
Why ?
9A1 Am B1 Bm C1 ... Cp
10Example (continued)
1. name ? color 2. category ? department 3.
color, category ? price
Start from the following FDs
Infer the following FDs
Inferred FD Which Ruledid we apply ?
4. name, category ? name
5. name, category ? color
6. name, category ? category
7. name, category ? color, category
8. name, category ? price
11Another Rule
Augmentation
A1, A2, , An ? B
If
then
A1, A2, , An , C1, C2, , Cp ? B
Augmentation follows from trivial rules and
transitivityHow ?
12Problem infer ALL FDs
- Given a set of FDs, infer all possible FDs
- How to proceed ?
- Try all possible FDs, apply all 3 rules
- E.g. R(A, B, C, D) how many FDs are possible ?
- Drop trivial FDs, drop augmented FDs
- Still way too many
- Better use the Closure Algorithm (next)
13Closure of a set of Attributes
Given a set of attributes A1, , An The
closure, A1, , An , is the set of attributes
Bs.t. A1, , An ? B
name ? color category ? department color,
category ? price
Example
Closures name name, color
name, category name, category, color,
department, price color color
14Closure Algorithm
Start with XA1, , An. Repeat until X doesnt
change do if B1, , Bn ? C is a FD
and B1, , Bn are all in X then
add C to X.
Example
name ? color category ? department color,
category ? price
name, category name, category,
color, department, price
15Example
A, B ? C A, D ? E B ? D A, F ? B
R(A,B,C,D,E,F)
Compute A,B X A, B,
Compute A, F X A, F,
16Using Closure to Infer ALL FDs
Example
A, B ? CA, D ? B B ? D
Step 1 Compute X, for every X
A A, B BD, C C, D D AB ABCD,
AC AC, AD ABCD ABC ABD ACD ABCD
(no need to compute why ?) BCD BCD, ABCD
ABCD
Step 2 Enumerate all FDs X ? Y, s.t. Y ? X and
X?Y ?
AB ? CD, AD?BC, ABC ? D, ABD ? C, ACD ? B
17Problem Finding FDs
- Approach 1 During Database Design
- Designer derives them from real-world knowledge
of users - Problem knowledge might not be available
- Approach 2 From a Database Instance
- Analyze given database instance and find all FDs
satisfied by that instance - Useful if designers dont get enough information
from users - Problem FDs might be artifical for the given
instance
18Find All FDs
Student Dept Course Room
Alice CSE C 020
Bob CSE C 020
Alice EE HW 040
Carol CSE DB 045
Dan CSE Java 050
Elsa CSE DB 045
Frank EE Circuits 020
Do all FDsmake sensein practice ?
19Answer
Course ? Dept, Room Dept, Room ? Course Student,
Dept ? Course, Room Student, Course ? Dept,
Room Student, Room ? Dept, Course
Do all FDsmake sensein practice ?
20Keys
- A key is a set of attributes A1, ..., An s.t. for
any other attribute B, we have A1, ..., An ? B - A minimal key is a set of attributes which is a
key and for which no subset is a key - Note book calls them superkey and key
21Computing Keys
- Compute X for all sets X
- If X all attributes, then X is a key
- List only the minimal keys
- Note there can be many minimal keys !
- Example R(A,B,C), AB?C, BC?AMinimal keys AB
and BC
22Examples of Keys
- Product(name, price, category, color)
- name, category ? price
- category ? color
- Keys are name, category and all supersets
- Enrollment(student, address, course, room, time)
- student ? address
- room, time ? course
- student, course ? room, time
- Keys are
23Relational Schema Design(or Logical Schema
Design)
- Main idea
- Start with some relational schema
- Find out its FDs
- Use them to design a better relational schema
24Data Anomalies
- When a database is poorly designed we get
anomalies - Redundancy data is repeated
- Update anomalies need to change in several
places - Delete anomalies may lose data when we dont want
25Relational Schema Design
Example Persons with several phones
Name SSN PhoneNumber City
Fred 123-45-6789 206-555-1234 Seattle
Fred 123-45-6789 206-555-6543 Seattle
Joe 987-65-4321 908-555-2121 Westfield
SSN ? Name, City
but not SSN ? PhoneNumber
- Anomalies
- Redundancy repeat data
- Update anomalies Fred moves to Bellevue
- Deletion anomalies Joe deletes his phone
number what is his city ?
26Relation Decomposition
Break the relation into two
Name SSN PhoneNumber City
Fred 123-45-6789 206-555-1234 Seattle
Fred 123-45-6789 206-555-6543 Seattle
Joe 987-65-4321 908-555-2121 Westfield
Name SSN City
Fred 123-45-6789 Seattle
Joe 987-65-4321 Westfield
SSN PhoneNumber
123-45-6789 206-555-1234
123-45-6789 206-555-6543
987-65-4321 908-555-2121
- Anomalies have gone
- No more repeated data
- Easy to move Fred to Bellevue (how ?)
- Easy to delete all Joes phone number (how ?)
27Relational Schema Design
Conceptual Model
Relational Model plus FDs
Normalization Eliminates anomalies
28Decompositions in General
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
R1(A1, ..., An, B1, ..., Bm)
R2(A1, ..., An, C1, ..., Cp)
R1 projection of R on A1, ..., An, B1, ..., Bm
R2 projection of R on A1, ..., An, C1, ..., Cp
29Decomposition
Name Price Category
Gizmo 19.99 Gadget
OneClick 24.99 Camera
Gizmo 19.99 Camera
Name Price
Gizmo 19.99
OneClick 24.99
Gizmo 19.99
Name Category
Gizmo Gadget
OneClick Camera
Gizmo Camera
Lossless decomposition
30Incorrect Decomposition
Name Price Category
Gizmo 19.99 Gadget
OneClick 24.99 Camera
Gizmo 19.99 Camera
Whatsincorrect ??
Name Category
Gizmo Gadget
OneClick Camera
Gizmo Camera
Price Category
19.99 Gadget
24.99 Camera
19.99 Camera
Lossy decomposition
31Decompositions in General
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
R1(A1, ..., An, B1, ..., Bm)
R2(A1, ..., An, C1, ..., Cp)
If A1, ..., An ? B1, ..., Bm Then the
decomposition is lossless
Note dont need necessarily A1, ..., An ? C1,
..., Cp
Example name ? price, hence the first
decomposition is lossless
32Normal Forms
- First Normal Form all attributes are atomic
- Second Normal Form (2NF) old and obsolete
- Third Normal Form (3NF) this lecture
- Boyce Codd Normal Form (BCNF) this lecture
- Others...
33(No Transcript)
34- R (J, K, L)
- F (JK ? L, L ? K)
- Two candidate keys JK and JL
- R is in 3NF
- JK ? L JK is a superkey
- L ? K K is prime
- BCNF decomposition yields
- R1 (L,K), R2 (L,J)
- testing for JK ? L requires a join
- There is some redundancy in R
35Boyce-Codd Normal Form
A simple condition for removing anomalies from
relations
A relation R is in BCNF if If A1, ..., An ?
B is a non-trivial dependency in R , then
A1, ..., An is a key for R
In English (though a bit vague) Whenever a
set of attributes of R is determining another
attribute, it should determine all the
attributes of R.
36(No Transcript)
37(No Transcript)
38BCNF Decomposition Algorithm
Repeat choose A1, , Am ? B1, , Bn that
violates the BNCF condition split R into
R1(A1, , Am, B1, , Bn) and R2(A1, , Am,
others) continue with both R1 and R2Until
no more violations
Is there a 2-attribute relation that is not in
BCNF ?
As
Bs
Others
R1
R2
39Example
Name SSN PhoneNumber City
Fred 123-45-6789 206-555-1234 Seattle
Fred 123-45-6789 206-555-6543 Seattle
Joe 987-65-4321 908-555-2121 Westfield
Joe 987-65-4321 908-555-1234 Westfield
What are the dependencies? SSN ? Name,
City What are the keys? SSN, PhoneNumber Is it
in BCNF?
40Decompose it into BCNF
Name SSN City
Fred 123-45-6789 Seattle
Joe 987-65-4321 Westfield
SSN ? Name, City
- Lets check anomalies
- Redundancy ?
- Update ?
- Delete ?
SSN PhoneNumber
123-45-6789 206-555-1234
123-45-6789 206-555-6543
987-65-4321 908-555-2121
987-65-4321 908-555-1234
41Summary of BCNF Decomposition
Find a dependency that violates the BCNF
condition
A1, A2, , An ? B1, B2, , Bm
Heuristics choose B , B , B as large as
possible
1
2
m
Continue until there are no BCNF violations left.
Decompose
Others
As
Bs
2-attribute relations are BCNF
R1
R2
42Example Decomposition
Person(name, SSN, age, hairColor,
phoneNumber) SSN ? name, age age ? hairColor
Decompose in BCNF (in class) Step 1 find all
keys (How ? Compute S, for various sets
S) Step 2 now decompose
43Other Example
- R(A,B,C,D) A ? B, B ? C
- Key AD
- Violations of BCNF A ? B, A? C, A?BC
- Pick A? BC split into R1(A,BC) R2(A,D)
- What happens if we pick A ? B first ?
44Lossless Decompositions
- A decomposition is lossless if we can recover
- R(A,B,C)
- R1(A,B) R2(A,C)
- R(A,B,C) should be the same
as R(A,B,C)
Decompose
Recover
R is in general larger than R. Must ensure R
R
45Lossless Decompositions
- Given R(A,B,C) s.t. A?B, the decomposition into
R1(A,B), R2(A,C) is lossless
463NF A Problem with BCNF
Unit Company
Product
FDs Unit ? Company Company, Product ?
Unit So, there is a BCNF violation, and we
decompose.
Unit Company
Unit ? Company
Unit Product
No FDs
Notice we loose the FD Company, Product ? Unit
47So Whats the Problem?
Unit Company
Unit Product
Galaga99 UW Galaga99
databases Bingo UW
Bingo databases
No problem so far. All local FDs are
satisfied. Lets put all the data back into a
single table again (anomalies?)
Unit Company
Product
Galaga99 UW
databases Bingo UW
databases
Violates the dependency company, product -gt
unit!
48Solution 3rd Normal Form (3NF)
A simple condition for removing anomalies from
relations
A relation R is in 3rd normal form if Whenever
there is a nontrivial dependency A1, A2, ..., An
? Bfor R , then A1, A2, ..., An is a key
for R, or B is part of a key.
Tradeoff BCNF no anomalies, but may lose some
FDs 3NF keeps all FDs, but may have some
anomalies
49Purpose of Normalization
- To reduce the chances for anomalies to occur in a
database. - normalization prevents the possible corruption of
databases stemming from what are called
insertion anomalies," "deletion anomalies," and
"update anomalies."
50Insertion Anomaly
- A failure to place a new database entry into all
the places in the database where that new entry
needs to be stored. - In a properly normalized database, a new entry
needs to be inserted into only one place in the
database
51Deletion Anomaly
- A failure to remove an existing database entry
when it is time to remove that entry. - In a properly normalized database, an old,
to-be-gotten-rid-of entry needs to be deleted
from only one place in the database
52Update anomaly
- An update of a database involves modifications
that may be additions, deletions, or both. Thus
"update anomalies" can be either of the kinds of
anomalies discussed above.
53(No Transcript)
54(No Transcript)
55(No Transcript)
56(No Transcript)
57(No Transcript)
58(No Transcript)
59(No Transcript)
60(No Transcript)
61(No Transcript)
62(No Transcript)
63(No Transcript)
64(No Transcript)