Title: Department of Computer Science and Engineering, HKUST
1Comp 231 Database Management Systems
7. Relational Database Design
2Pitfalls in Relational Database Design
- Relational database design requires that we find
a good collection of relation schemas. A bad
design may lead to - Repetition of information.
- Inability to represent certain information.
- Design Goals
- Avoid redundant data
- Ensure that relationships among attributes are
represented - Facilitate the checking of updates for violation
of database integrity constraints
3Example
- Consider the relation schemaLending-schemabranc
h-name, branch-city, assets, customer-name,
loan-number, amount
- Redundancy
- Data for branch-name, branch-city, assets are
repeated for each loan that a branch makes - Waste space and complicates updating
- Null values
- cannot store information about a branch if no
loans exist - Can use null values, but they are difficult to
handle
4Decomposition
- Decompose the relation schema Lending-schema
intoBranch-customer-schemabranch-name,
branch-city, assets, customer-nameCustomer-loan-s
chemacustomer-name, loan-number, amount - All attributes of an original schema (R) must
appear in the decomposition (R1,R2) R R1 ?
R2 - Lossless-join decompositionFor all possible
relations r on schema R r ?R1 (r ) ?R2
(r )
5Example of a Non Lossless-Join Decomposition
- Decompose R (A,B,C) into R1 (A,B) and R2
(B,C)
It is a lossy decomposition An extraneous tuple
is obtained. You get more, not less!!
6Goal - Devise a Theory for the Following
- Decide whether a particular relation R is in
good form. - In the case that a relation R is not in good
form, decompose it into a set of relations R1,
R2, ,Rn such that - each relation is in good form
- the decomposition is a lossless-join
decomposition - Our theory is based on
- functional dependencies
- multivalued dependencies
7Why are FDs involved?
- We cant tell if a relation scheme is good or not
without first knowing the functional
dependencies.
Lending-schema(branch-name, branch-city, assets,
customer-name, loan-number, amount)
How do you know this scheme is not good?
Because you know the functional dependencies.
Try to name a few of them.
8Normalization using Functional Dependencies
- When we decompose a relation schema R with a set
of functional dependencies F into R1 and R2 we
want - Lossless-join decomposition At least one of the
following dependencies is in F - R1 ? R2 ? R1
- R1 ? R2 ? R2
- No redundancy The relations R1 and R2 preferably
should be in either Boyce-Codd Normal Form or
Third Normal Form. - Dependency preservation Let Fi be the set of
dependencies in F that include only attributes
in Ri. Test to see if - (F1 ? F2 ) Fotherwise, checking updates for
violation of functional dependencies is expensive.
The attributes with which you can join R1 and R2
is either a key of R1 or R2
9Example
- R (A, B, C) F A ? B, B ? C
- R1 (A, B), R2 (B, C)
- Lossless-join decomposition R1 ? R2 B and
B ? R2(BC) - Dependency preservingF1 A ? B F2 B ? C ?
(F1 ? F2 ) F - R1 (A, B), R2 (A, C)
- Lossless-join decomposition R1 ? R2 A and
A ? R1(AB) - Not dependency preserving F1 A ? B F2 ? ?
B ? C and A ? C are lost (cannot check B ? C
without computing R1 R2)
10Boyce-Codd Normal Form
- A relation schema R is in BCNF with respect to a
set of F of functional dependencies if for all
functional dependencies in F of the form ? ? ?,
where ? ? R and ? ? R, at least one of the
following holds - ? ? ? is trivial (i.e., ? ? ?)
- ? is a superkey for R
11Example
- R (A, B, C)F A ? B B ? CKey A
- R is not in BCNF. Why?
- Decomposition R1 (A,B), R2 (B,C)
- R1 and R2 in BCNF
- Lossless-join decomposition
- Dependency preserving
B ? C where B is not a superkey
12BCNF Decomposition Algorithm
result R done false compute F while
(not done) do if (there is a schema Ri in
result that is not in BCNF) then begin let ? ?
? be a nontrivial functional dependency that
holds on Ri such that ? ? Ri is not in F,
and ? ? ? ? result (result - Ri) ?
(Ri - ?) ? (?, ?) end else donetrue
Remove ? from the original scheme and include a
new scheme R(??)
Each Ri is in BCNF, and decomposition is
lossless-join.
13Example of BCNF Decomposition
- R (branch-name, branch-city, assets, customer-n
ame, loan-number, amount)F branch-name ?
assets, branch-city loan-number ? amount,
branch-nameKey loan-number, customer-name - Decomposition
- First FD violates BCNF
- R1 (branch-name, branch-city, assets)
- R2 (branch-name, customer-name, loan-number,
amount) - Second FD violates BCNF in R2
- R3 (branch-name, loan-number, amount)
- R4 (customer-name, loan-number)
- Final decomposition R1, R3, R4
14BCNF and Dependency Preservation
- It is not always possible to get a BCNF
decomposition that is dependency preserving. - R (J, K, L)F JK ? L L ? KTwo candidate
keys JK and JL - R is not in BCNF decompose into R1(J, L) and
R2(L,K) - Any decomposition of R will fail to preserve
JK ? L
So, sometimes we need to step back to a weaker
requirement
15Third Normal Form
- A relation schema R is in third normal form (3NF)
if for all ? ? ? in Fat least one of the
following holds - ? ? ? is trivial (i.e., ? ? ?)
- ? is superkey for R
- Each attribute A in ? ? ? is contained in a
candidate key of R. - If a relation is in BCNF it is in 3NF (since in
BCNF one of the first two conditions above must
hold).
16Third Normal Form
- Same example as in BCNF
- R (J, K, L)F JK ? L, L ? K
- Two candidate keys JK and JL
- R is in 3NF JK ? L JK is a superkey L
? K K is contained in a candidate key - Algorithm to decompose a relation schema R into a
set of relation schemas R1, R2,, Rn such that - each relation schema Ri is in 3NF
- lossless-join decomposition
- dependency preserving
173NF Decomposition Algorithm
- Let Fc be a canonical cover for Fi 0for
each functional dependency ? ? ? in Fc do if
none of the schemas Rj, 1lt j lt i contains
?? then begin ii1 Rj ??
endif none of the schemas Rj, 1lt j lt i
contains a candidate key for R then
begin ii1 Ri any candidate key for R
end return (R1, R2, , Ri)
18Example
- Relation schemaBanker-info-schemabranch-name,
customer-name, banker-name, office-number - The functional dependencies for this relation
schema are banker-name ? branch-name,
office-number customer-name, branch-name ?
banker-name - The key is customer-name, branch-name
19Applying 3NF to banker - info - schema
- Go through the for loop in the algorithmbanker-
name ? branch-name, office-numberis not in any
decomposed relation (no decomposed relation so
far) Create a new relationBanker-office-schema
( banker-name, branch-name, office-number
)customer-name, branch-name ? banker-nameis
not in any decomposed relation (one decomposed
relation so far) Create a new relationBanker-sch
ema ( customer-name, branch-name, banker-name ) - Since Banker-schema contains a candidate key for
Banker-info-schema, we are done with the
decomposition process.
20Comparison of BCNF and 3NF
- It is always possible to decompose a relation
into relations in 3NF and - the decomposition is lossless
- dependencies are preserved
- It is always possible to decompose a relation
into relations in BCNF and - the decomposition is lossless
- it may not be possible to preserve dependencies
21Comparison of BCNF and 3NF
- Prof-office ( Department, Prof-name , Room)
- F Department, Prof-name ? Room Room ?
Prof-name
A professor may be affiliated with more than one
department and assigned more than one
office. Prof-name -?-? Room Prof-name -?-?
Dept Nearby departments may give him the same
office, whereas department far away may give him
a different one. Room -?-? Dept Office wont be
shared Room ? Prof-name Each department will
assign only one office to a professor Department,
Prof-name ? Room
- Key Department, Prof-name
- The relation is in 3NF but not in BCNF.
- repetition of Prof/room information
- need to use null values if a Prof has a room but
no department assigned
22Problems with Decomposition
- Algorithms need to identify all candidate keys
and the canonical cover it is a rather difficult
process - Decomposition algorithms are not deterministic.
E.g., if there are several functional
dependencies violating the normal form, the order
of selecting the problem FD for decomposition may
give different relation schemes - Algorithms may result in relation schemes which
are not intuitive
23Relational Database Design Review
- Two approaches to DB design
- 1) Design ER model, then translate to relation
schemes - 2) Put every attribute together in one relation,
identify all the functional dependencies, and
then decompose into 3NF at least. - The first approach is more popular, but
relational theory helps formalizing some concepts
such as key (what does it mean by A key uniquely
identifies the tuples?) - Identifying the FDs is part of the DB design
process it helps you understand the requirements
better.
24An Example of Lossy Decomposition
It is clearly a bad decomposition since Sem is
not a foreign key of any table. How would you
decompose it???
25An Example of Bad Relation Scheme
- Project ( Emp-no, Proj-no, Emp-name, Hours )
- From ER point of view, it is bad since it
embodies an entity type and an NM relationship
type - From a relational theory point of view, you know
- Emp-no ? Emp-name
- Emp-no, Proj-no ? Hours
- Emp-no and Proj-no is the only key in Project
- Project is not in 3NF, why? Decompose
- Employee ( Emp-no, Emp-name )
- Works-on ( Emp-no, Proj-no, Hours )
26A Difficult Example for ER Approach
Toyota Camry, 2.2, Japan, 5600
- Cars_all ( Make, Engine-size, Origin, Fee )
- Difficult to see how many entities or
relationships are there - From a relational theory point of view, you know
- Make, Engine-size ? Origin
- Engine-size ? Fee
- Make, Engine-size is the only key in Cars_all
- Cars_all is not in 3NF, why? Decompose
- Cars ( Make, Engine-size, Origin )
- License ( Engine-size, Fee )
27Another Difficult Example
- Cars2 ( Make, Engine-size, Plant)
- From a relational theory point of view, you know
- Make, Engine-size ? Plant
- Plant ? Make (A plant makes the same engine of a
given model) - Make, Engine-size is the only key in Cars2
- Cars2 is in 3NF but not in BCNF, why? Decompose
- Car_plant ( Make, Plant )
- Car_engine( Make, Engine-size)
Question but what have we lost in BCNF?
28First Normal Form
- Atomicity is actually a property of how the
elements of the domain are used. - E.g. Strings would normally be considered
indivisible - Suppose that students are given roll numbers
which are strings of the form CS0012 or EE1127 - If the first two characters are extracted to find
the department, the domain of roll numbers is not
atomic. - Doing so is a bad idea leads to encoding of
information in application program rather than in
the database.