Department of Computer Science and Engineering, HKUST - PowerPoint PPT Presentation

1 / 28
About This Presentation
Title:

Department of Computer Science and Engineering, HKUST

Description:

E.g. Strings would normally be considered indivisible. Suppose that students are given roll numbers which are strings of the form CS0012 or EE1127 ... – PowerPoint PPT presentation

Number of Views:22
Avg rating:3.0/5.0
Slides: 29
Provided by: Ale8279
Category:

less

Transcript and Presenter's Notes

Title: Department of Computer Science and Engineering, HKUST


1
Comp 231 Database Management Systems
7. Relational Database Design
2
Pitfalls in Relational Database Design
  • Relational database design requires that we find
    a good collection of relation schemas. A bad
    design may lead to
  • Repetition of information.
  • Inability to represent certain information.
  • Design Goals
  • Avoid redundant data
  • Ensure that relationships among attributes are
    represented
  • Facilitate the checking of updates for violation
    of database integrity constraints

3
Example
  • Consider the relation schemaLending-schemabranc
    h-name, branch-city, assets, customer-name,
    loan-number, amount
  • Redundancy
  • Data for branch-name, branch-city, assets are
    repeated for each loan that a branch makes
  • Waste space and complicates updating
  • Null values
  • cannot store information about a branch if no
    loans exist
  • Can use null values, but they are difficult to
    handle

4
Decomposition
  • Decompose the relation schema Lending-schema
    intoBranch-customer-schemabranch-name,
    branch-city, assets, customer-nameCustomer-loan-s
    chemacustomer-name, loan-number, amount
  • All attributes of an original schema (R) must
    appear in the decomposition (R1,R2) R R1 ?
    R2
  • Lossless-join decompositionFor all possible
    relations r on schema R r ?R1 (r ) ?R2
    (r )

5
Example of a Non Lossless-Join Decomposition
  • Decompose R (A,B,C) into R1 (A,B) and R2
    (B,C)

It is a lossy decomposition An extraneous tuple
is obtained. You get more, not less!!
6
Goal - Devise a Theory for the Following
  • Decide whether a particular relation R is in
    good form.
  • In the case that a relation R is not in good
    form, decompose it into a set of relations R1,
    R2, ,Rn such that
  • each relation is in good form
  • the decomposition is a lossless-join
    decomposition
  • Our theory is based on
  • functional dependencies
  • multivalued dependencies

7
Why are FDs involved?
  • We cant tell if a relation scheme is good or not
    without first knowing the functional
    dependencies.

Lending-schema(branch-name, branch-city, assets,
customer-name, loan-number, amount)
How do you know this scheme is not good?
Because you know the functional dependencies.
Try to name a few of them.
8
Normalization using Functional Dependencies
  • When we decompose a relation schema R with a set
    of functional dependencies F into R1 and R2 we
    want
  • Lossless-join decomposition At least one of the
    following dependencies is in F
  • R1 ? R2 ? R1
  • R1 ? R2 ? R2
  • No redundancy The relations R1 and R2 preferably
    should be in either Boyce-Codd Normal Form or
    Third Normal Form.
  • Dependency preservation Let Fi be the set of
    dependencies in F that include only attributes
    in Ri. Test to see if
  • (F1 ? F2 ) Fotherwise, checking updates for
    violation of functional dependencies is expensive.

The attributes with which you can join R1 and R2
is either a key of R1 or R2
9
Example
  • R (A, B, C) F A ? B, B ? C
  • R1 (A, B), R2 (B, C)
  • Lossless-join decomposition R1 ? R2 B and
    B ? R2(BC)
  • Dependency preservingF1 A ? B F2 B ? C ?
    (F1 ? F2 ) F
  • R1 (A, B), R2 (A, C)
  • Lossless-join decomposition R1 ? R2 A and
    A ? R1(AB)
  • Not dependency preserving F1 A ? B F2 ? ?
    B ? C and A ? C are lost (cannot check B ? C
    without computing R1 R2)

10
Boyce-Codd Normal Form
  • A relation schema R is in BCNF with respect to a
    set of F of functional dependencies if for all
    functional dependencies in F of the form ? ? ?,
    where ? ? R and ? ? R, at least one of the
    following holds
  • ? ? ? is trivial (i.e., ? ? ?)
  • ? is a superkey for R

11
Example
  • R (A, B, C)F A ? B B ? CKey A
  • R is not in BCNF. Why?
  • Decomposition R1 (A,B), R2 (B,C)
  • R1 and R2 in BCNF
  • Lossless-join decomposition
  • Dependency preserving

B ? C where B is not a superkey
12
BCNF Decomposition Algorithm
result R done false compute F while
(not done) do if (there is a schema Ri in
result that is not in BCNF) then begin let ? ?
? be a nontrivial functional dependency that
holds on Ri such that ? ? Ri is not in F,
and ? ? ? ? result (result - Ri) ?
(Ri - ?) ? (?, ?) end else donetrue
Remove ? from the original scheme and include a
new scheme R(??)
Each Ri is in BCNF, and decomposition is
lossless-join.
13
Example of BCNF Decomposition
  • R (branch-name, branch-city, assets, customer-n
    ame, loan-number, amount)F branch-name ?
    assets, branch-city loan-number ? amount,
    branch-nameKey loan-number, customer-name
  • Decomposition
  • First FD violates BCNF
  • R1 (branch-name, branch-city, assets)
  • R2 (branch-name, customer-name, loan-number,
    amount)
  • Second FD violates BCNF in R2
  • R3 (branch-name, loan-number, amount)
  • R4 (customer-name, loan-number)
  • Final decomposition R1, R3, R4

14
BCNF and Dependency Preservation
  • It is not always possible to get a BCNF
    decomposition that is dependency preserving.
  • R (J, K, L)F JK ? L L ? KTwo candidate
    keys JK and JL
  • R is not in BCNF decompose into R1(J, L) and
    R2(L,K)
  • Any decomposition of R will fail to preserve
    JK ? L

So, sometimes we need to step back to a weaker
requirement
15
Third Normal Form
  • A relation schema R is in third normal form (3NF)
    if for all ? ? ? in Fat least one of the
    following holds
  • ? ? ? is trivial (i.e., ? ? ?)
  • ? is superkey for R
  • Each attribute A in ? ? ? is contained in a
    candidate key of R.
  • If a relation is in BCNF it is in 3NF (since in
    BCNF one of the first two conditions above must
    hold).

16
Third Normal Form
  • Same example as in BCNF
  • R (J, K, L)F JK ? L, L ? K
  • Two candidate keys JK and JL
  • R is in 3NF JK ? L JK is a superkey L
    ? K K is contained in a candidate key
  • Algorithm to decompose a relation schema R into a
    set of relation schemas R1, R2,, Rn such that
  • each relation schema Ri is in 3NF
  • lossless-join decomposition
  • dependency preserving

17
3NF Decomposition Algorithm
  • Let Fc be a canonical cover for Fi 0for
    each functional dependency ? ? ? in Fc do if
    none of the schemas Rj, 1lt j lt i contains
    ?? then begin ii1 Rj ??
    endif none of the schemas Rj, 1lt j lt i
    contains a candidate key for R then
    begin ii1 Ri any candidate key for R
    end return (R1, R2, , Ri)

18
Example
  • Relation schemaBanker-info-schemabranch-name,
    customer-name, banker-name, office-number
  • The functional dependencies for this relation
    schema are banker-name ? branch-name,
    office-number customer-name, branch-name ?
    banker-name
  • The key is customer-name, branch-name

19
Applying 3NF to banker - info - schema
  • Go through the for loop in the algorithmbanker-
    name ? branch-name, office-numberis not in any
    decomposed relation (no decomposed relation so
    far) Create a new relationBanker-office-schema
    ( banker-name, branch-name, office-number
    )customer-name, branch-name ? banker-nameis
    not in any decomposed relation (one decomposed
    relation so far) Create a new relationBanker-sch
    ema ( customer-name, branch-name, banker-name )
  • Since Banker-schema contains a candidate key for
    Banker-info-schema, we are done with the
    decomposition process.

20
Comparison of BCNF and 3NF
  • It is always possible to decompose a relation
    into relations in 3NF and
  • the decomposition is lossless
  • dependencies are preserved
  • It is always possible to decompose a relation
    into relations in BCNF and
  • the decomposition is lossless
  • it may not be possible to preserve dependencies

21
Comparison of BCNF and 3NF
  • Prof-office ( Department, Prof-name , Room)
  • F Department, Prof-name ? Room Room ?
    Prof-name

A professor may be affiliated with more than one
department and assigned more than one
office. Prof-name -?-? Room Prof-name -?-?
Dept Nearby departments may give him the same
office, whereas department far away may give him
a different one. Room -?-? Dept Office wont be
shared Room ? Prof-name Each department will
assign only one office to a professor Department,
Prof-name ? Room
  • Key Department, Prof-name
  • The relation is in 3NF but not in BCNF.
  • repetition of Prof/room information
  • need to use null values if a Prof has a room but
    no department assigned

22
Problems with Decomposition
  • Algorithms need to identify all candidate keys
    and the canonical cover it is a rather difficult
    process
  • Decomposition algorithms are not deterministic.
    E.g., if there are several functional
    dependencies violating the normal form, the order
    of selecting the problem FD for decomposition may
    give different relation schemes
  • Algorithms may result in relation schemes which
    are not intuitive

23
Relational Database Design Review
  • Two approaches to DB design
  • 1) Design ER model, then translate to relation
    schemes
  • 2) Put every attribute together in one relation,
    identify all the functional dependencies, and
    then decompose into 3NF at least.
  • The first approach is more popular, but
    relational theory helps formalizing some concepts
    such as key (what does it mean by A key uniquely
    identifies the tuples?)
  • Identifying the FDs is part of the DB design
    process it helps you understand the requirements
    better.

24
An Example of Lossy Decomposition
It is clearly a bad decomposition since Sem is
not a foreign key of any table. How would you
decompose it???
25
An Example of Bad Relation Scheme
  • Project ( Emp-no, Proj-no, Emp-name, Hours )
  • From ER point of view, it is bad since it
    embodies an entity type and an NM relationship
    type
  • From a relational theory point of view, you know
  • Emp-no ? Emp-name
  • Emp-no, Proj-no ? Hours
  • Emp-no and Proj-no is the only key in Project
  • Project is not in 3NF, why? Decompose
  • Employee ( Emp-no, Emp-name )
  • Works-on ( Emp-no, Proj-no, Hours )

26
A Difficult Example for ER Approach
Toyota Camry, 2.2, Japan, 5600
  • Cars_all ( Make, Engine-size, Origin, Fee )
  • Difficult to see how many entities or
    relationships are there
  • From a relational theory point of view, you know
  • Make, Engine-size ? Origin
  • Engine-size ? Fee
  • Make, Engine-size is the only key in Cars_all
  • Cars_all is not in 3NF, why? Decompose
  • Cars ( Make, Engine-size, Origin )
  • License ( Engine-size, Fee )

27
Another Difficult Example
  • Cars2 ( Make, Engine-size, Plant)
  • From a relational theory point of view, you know
  • Make, Engine-size ? Plant
  • Plant ? Make (A plant makes the same engine of a
    given model)
  • Make, Engine-size is the only key in Cars2
  • Cars2 is in 3NF but not in BCNF, why? Decompose
  • Car_plant ( Make, Plant )
  • Car_engine( Make, Engine-size)

Question but what have we lost in BCNF?
28
First Normal Form
  • Atomicity is actually a property of how the
    elements of the domain are used.
  • E.g. Strings would normally be considered
    indivisible
  • Suppose that students are given roll numbers
    which are strings of the form CS0012 or EE1127
  • If the first two characters are extracted to find
    the department, the domain of roll numbers is not
    atomic.
  • Doing so is a bad idea leads to encoding of
    information in application program rather than in
    the database.
Write a Comment
User Comments (0)
About PowerShow.com