Title: Dr. A.I. Cristea
1CS 319 Theory of Databases
- Dr. A.I. Cristea
- http//www.dcs.warwick.ac.uk/acristea/
2 previous Armstrong axioms
3Content
- Generalities DB
- Integrity constraints (FD revisited)
- LLJ, DP and applications
- Relational Algebra (revisited)
- Query optimisation
- Temporal Data
- The Askew Wall
- Tuple calculus
- Domain calculus
- Query equivalence
4Lossless Join Decomposition
- Lossless Join Definition
- Let R1 , R2 be a decomposition of R (meaning
that R1 ? R2 R) the decomposition is
lossless if for every legal instance r of R - r ?R1(r) ?? ?R2(r)
- What is wrong with the following decomposition?
- R A,B,C and F A ? B, C ? B and we
replace R by R1 , R2 where R1 A,B and R2
C,B.
5Sufficient Condition for Lossless Join
- Lossless Join means
- Let R1 , R2 be a decomposition of R (meaning
that R1 ? R2 R) - Prove that for all legal instances r
- r ? ?R1(r) ?? ?R2(r)
- Prove that this decomposition is lossless if
R1 ? R2 ? R1 or R1 ? R2 ? R2 - Can you give an example of a lossless join
decomposition (instance) when neither - R1 ? R2 ? R1 nor R1 ? R2 ? R2 hold?
6Boyce-Codd Normal Form (BCNF)
- A relation scheme R is in BCNF if (and only if)
for every non-trivial fd X ? Y ? F, X is a
superkey (for R). - A database scheme D R1,..., Rn is in BCNF if
(and only if) ?i ? 1,...,n Ri is in BCNF. - Let R A,B,C and F A ? B, C ? B and let
us decompose R into by R1 , R2 where R1
A,B and R2 C,B. Is this
decomposition in BCNF? Is this the best
decomposition in BCNF? (Can you find a better
one?)
7BCNF Decomposition Algorithm
- result R
- done false
- compute F
- while (not done) do
- if (there is a schema Ri in result that is not
in BCNF) - then begin
- let a?ß be a nontrivial functional dependency
- that holds on Ri such that a?Ri is not in F,
- and a?ß?
- result (result Ri) ? (Ri ß) ? (a, ß)
- end
- else done true
8Dependencies in a decomposition
- Which dependencies hold in R1 and R2?
- R A,B,C and F A ? B, B ? C and we
replace R by R1 , R2 where R1 A,B and R2
B,C. - R A,B,C and F A ? B, C ? B and we
replace R by R1 , R2 where R1 A,B and R2
A,C. - R A,B,C and F A ? B, B ? C and we
replace R by R1 , R2 where R1 A,B and R2
A,C
9Third Normal Form (3NF)
- Third Normal Form
- Informal Presentation
- Example and Discussion
- Formal Definition
- 3NF Decomposition Algorithm
- Principle and Properties
- Lossless-join, dependency-preserving
decomposition into 3NF - Proof of Correctness
- Example of 3NF Decomposition
- Third Normal Form and Boyce-Codd Normal Form
10Informal Presentation
- Motivation
- There are some situations where
- BCNF decomposition is not dependency preserving,
and - Efficient checking for FD violation on updates is
important - Solution
- Define a weaker normal form, called Third Normal
Form - FDs can be checked on individual relations
without computing a join - There is always a lossless-join,
dependency-preserving decomposition into 3NF
11Informal Presentation
- Motivation
- Sometimes a relational schema and its FDs are not
in BCNF but one does not want to decompose it
further - Example
- Relation Bookings with attributes
- title, the name of the performance
- theater, the name of the theater where the
performance is being shown - city, where the theater is located
- FDs are theater ? city, title city ?
theater - Is there a BCNF violation?
12Informal Presentation
- Motivation
- Decomposition to get to BCNF may not always be
desirable - BCNF decomposition is not dependency preserving,
and - Efficient checking for FD violation on updates is
important - 3NF relaxes BCNF to allow relations that cannot
be decomposed into BCNF relations without losing
ability to check each FD - Informal Definition of 3NF
- A relation R is in third normal form if
As for BCNF
13Informal Presentation
- Informal Definition of 3NF
- A relation R is in third normal form if
- The difference between BCNF and 3NF
- B is a member of some candidate key
- Previous example schema is in 3NF
- Candidate keys here are (title, city) and
(theater, title) - Theater is not a superkey but city is a member of
a candidate key - What is the problem with this schema?
14Informal Presentation
- Informal Definition of 3NF
- Previous example schema is in 3NF
- What is the problem with this schema?
- The schema contains redundant information
15Formal Definition 3NF
- Definition
- A relation schema R is in third normal form (3NF)
if - for all functional dependencies in F of the form
? ? ?,where ? ? R and ? ? R, at least one of the
following holds - ? ? ? is a trivial functional dependency (? ? ?)
- ? contains a key for R
- every B ? ? is part of some candidate key of R
- BCNF and 3NF
- A BCNF relation is in 3NF
- A 3NF relation is not necessary in BCNF
16Formal Presentation
- Example
- Consider the two relational schemas
- R1 (cust-num, name, house-num, street, city,
state) - cust-num ? name, house-num, street, city, state
-
- R2 (house-num, street, city, state, zip)
- house-num, street, city, state ? zip
- zip ? state
- Are these relations in 3NF?
17Formal Presentation
- Example in 3NF?
- For R1
- The only nontrivial functional dependencies in F
are those with cust-num as a member of the
left-side of the FD - As cust-num is a superkey of R1, these functional
dependencies satisfy the second condition for 3NF
18Formal Presentation
- Example in 3NF?
- For R2
- There are two kinds of nontrivial functional
dependencies in F - Those with (house-num, street, city, state) as a
subset of the left hand side of the FD As
(house-num, street, city, state) is a superkey
for R2, these functional dependencies satisfy the
second condition for 3NF - Those of the form ? ? zip ? ? ? state where ?
? ? - For any such functional dependency
- (? ? state) (? ? zip) state (or ?)
- Because state is part of a candidate key of R2,
such functional dependencies satisfy the third
condition for 3NF
19 Decomposition into 3NF
- Principles
- Input/Output
- Input
- A set of functional dependencies F
- A relation schema R
- Output
- A lossless-join, dependency-preserving
decomposition in 3NF - Canonical Cover
- The set of dependencies Fc in the algorithm is a
canonical cover of the functional dependencies
20Fc definition
- a canonical cover Fc for F is a set of
dependencies Fc for which - Fc ltgt F
- no fd in Fc is superfluous
- no fd in Fc contains extraneous attrs
- each left side of fd in Fc is unique
21Extraneous attribute A in a?ß in R
- A?a F gt F a?ß ? (a-A)?ß
- A?ß F a?ß ? a?(ß -A) gt F
- Computed via attribute closures
22Fc computation algorithm
- Fc F
- Repeat
- apply union rule (right side of fd)
- find fd with extraneous attrs (left/right side)
- delete these
- Until Fc doesnt change
23Decomposition into 3NF
- Principles
- The algorithm takes a set of dependencies and
adds one schema at a time, instead of decomposing
the initial schema repeatedly - The result is not uniquely defined since
- A set of functional dependencies can have more
than one canonical cover - In some cases, the result of the algorithm
depends on the order which it considers the
dependencies in Fc(minor bug in the algorithm,
see later)
24 Decomposition into 3NF
- Decomposition
- Given relation R, set F of functional
dependencies - Find decomposition of R into a set of 3NF
relation Ri - Algorithm (sketch, real algorithm on next
slides) - Decomposition produces a lossless join and
preserves dependencies - Prove !
25 Decomposition Algorithm into 3NF
- Let Fc be the canonical cover of F
- j 0
- for each dependency a ? ß in Fc
- if none of schemes in Ri (i1, 2, , j)
contains aß then - j j1
- Rj aß
- end-if
- if any of the schemes in Ri (i1, 2, , j-1) is
contained in Rj - remove Ri
- end-if
- end-for
- if none of the schemes Ri (i1, 2, , j) contains
a candidate key for R then - j j 1
- Rj any candidate key for R
- end-if
- return (R1, R2, , Rj)
26 Decomposition into 3NF
- Example
- Semester database of a university
- Relational schema R(L, I, T, R, S, G)
- Attributes
- L Lecture R Room
- I Instructor S Student
- G Grade T Time
- Functional Dependencies
- L ? I, TR ? L, TI ? R, LS ? G, TS ? R, TRI ? LR
27Decomposition into 3NF
- Example
- R(L, I, T, R, S, G)
- F L ? I, TR ? L, TI ? R, LS ? G, TS ? R, TRI ?
LR - Are all FDs necessary? No !
- TR ? L, TI ? R then TRI ? LR
- Canonical cover of F
- Fc L ? I, TR ? L, TI ? R, TS ? R, LS ? G
- Key (ST)
- Key attributes S, T
28 Decomposition into 3NF
- Example
- R (L, I, T, R, S, G)
- Fc L ? I, TR ? L, TI ? R, TS ? R, LS ? G
- Key attributes S, T
- Decomposition in 3NF
- R1 (L, I) R2 (T, R, L)
- R3 (T, I, R) R4 (L, S, G)
- R5 (S, T, R)
(2) Create a relation Ri XA for each FD X ? A
in Fc
29Decomposition into 3NF
- 3NF Decomposition Algorithm
- Proof of Correctness
- 3NF decomposition algorithm is lossless join,
dependency preserving decomposition into 3NF - Dependency preserving
- Lossless join
- 3NF
30Proof Decomposition into 3NF is dependency
preserving
- 3NF Decomposition Algorithm
- Decomposition is dependency preserving
- 3NF decomposition algorithm is dependency
preserving since there is a relation for every FD
in Fc.
31 Proof Decomposition into 3NF is a lossless join
- 3NF Decomposition Algorithm
- Decomposition is lossless join
- Lossless join decomposition
- A decomposition R1, R2 is a lossless-join
decompositionif R1 ? R2 ? R1 or R1 ? R2 ? R2 - Idea
- A candidate key (K) is in one of the relations Ri
in decomposition (last step of algorithm
guarantees this) - Closure of candidate key under Fc must contain
all attributes in R (definition of candidate key) - Follow the steps of attribute closure algorithm
(Fig. 7.9)to show that the sufficient lossless
join condition is satisfied for K.
32 Proof Decomposition into 3NF is actually 3NF!
- 3NF Decomposition Algorithm
- Decomposition into 3NF
- Claim
- If a relation Ri is in the decomposition
generated by the synthesis algorithm, then Ri is
in 3NF - Idea
- To test for 3NF, it is sufficient to consider the
functional dependencies whose right-hand side is
a single attribute - Therefore to see that Ri is in 3NF, we must show
that any functional dependency ? ? B that holds
in Ri, satisfies the definition of 3NF
33Proof Decomposition into 3NF is actually 3NF!
- 3NF Decomposition Algorithm
- Decomposition into 3NF
- Demonstration
- Assume ? ? ? is the dependency that generated Ri
in the algorithm - B must be in ? or ?, since B is in Ri and ? ? ?
generated Ri - Let us consider two possible cases
- B is in ? but not ?
- B is in ? but not ?
34Proof 3NF Decomposition is 3NF!
- 3NF Decomposition Algorithm
- Decomposition into 3NF
- Demonstration
- B is in ? but not in ?
- ? must be superkey (why?)
- The second condition of 3NF is satisfied
- B is in ? but not in ?
- ? is a candidate key
- The third alternative in the definition of 3NF is
satisfied - Note we cannot show that ? is a superkey. This
shows exactly why the third alternative is
present in the definition of 3NF
35Decomposition into 3NF
- B is in ?
- Assume ? is not a superkey
- ? must contain some attribute not in ?
- Since ? ? B is in F it must be derivable from
Fc, by using attribute closure on ? - Attribute closure cannot have used ? ??
- if it had been used, ? must be contained in the
attribute closure of ?, which is not possible,
since we assumed ? is not a superkey - Now, using ?? (?- B) and ? ? B, we can derive
? ?B (since ? ? ? ?, and B ? ? since ? ? B is
non-trivial) - Then, B is extraneous in the right-hand side of ?
?? which is not possible since ? ?? is in Fc
(contradiction!) - Thus, if B is in ? then ? must be a superkey
36 Comparison of BCNF and 3NF
- BCNF or 3NF?
- Relations in BCNF and 3NF
- Relations in BCNF no repetition of information
- Relations in 3NF problem of repetition of
information - Decomposition in BCNF and in 3NF
- It is always possible to decompose a relation
into relations in 3NF and - the decomposition is lossless
- dependencies are preserved
- It is always possible to decompose a relation
into relations in BCNF and - the decomposition is lossless
- the information is not repeated
37 Compare BCNF and 3NF
- To summarize
- Design Goals
- Goal for a relational database design is
- BCNF (no redundant information)
- Lossless join
- Dependency preservation
- If we cannot achieve this, we accept
- 3NF (possible repetition of information)
- Lossless join
- Dependency preservation
38Summary
- We have learned
- LLJ
- DP
- BCNF algorithm
- 3rd NF algorithm
39 to follow Relational Algebra, revisited
40Questions?