Chapter 10. Database Design & Normalisation - Lecture Material - PowerPoint PPT Presentation

1 / 41
About This Presentation
Title:

Chapter 10. Database Design & Normalisation - Lecture Material

Description:

Chapter 10. Database Design & Normalisation - Lecture Material This chapter covers database design and normalisation. 10-1. Introduction to Database Design – PowerPoint PPT presentation

Number of Views:122
Avg rating:3.0/5.0
Slides: 42
Provided by: computin68
Category:

less

Transcript and Presenter's Notes

Title: Chapter 10. Database Design & Normalisation - Lecture Material


1
  • Chapter 10. Database Design Normalisation -
    Lecture Material
  • This chapter covers database design and
    normalisation.
  • 10-1. Introduction to Database Design
  • 10-2. 3NF, 2NF and 1NF
  • 10-3. BCNF
  • 10-3-1. BCNF Example 1
  • 10-3-2. BCNF Example 2
  • 10-3-3. BCNF Example 3
  • 10-3-4. BCNF Example 4
  • 10-4. 4NF
  • 10-5. 5NF
  • 10-6. Database Design
  • Sources Elmasri Navathe pp 391 - 445

2
  • 10-1. Introduction to Database Design
  • An important part of database design is deciding
    on a suitable logical structure or schema to
    implement ... called database design.
  • Considering supplier parts example (S,P,SP) there
    is a feeling of correctness.

3
Introduction to Database Design
  • Good database design means
  • Attributes with meaning
  • Limited redundant information
  • Limited NULL values
  • Disallow possibility of spurious tuples

4
  • 10-1. Introduction to Database Design
  • Normalisation theory is a formalism of simple
    ideas with a practical application in logical
    database schema design.
  • Normalisation theory should allow us to recognise
    relations with undesirable properties, tell us
    what is "wrong" how to "correct" it.

5
  • 10-1. Introduction to Database Design
  • Normalisation theory is built around normal forms
    - each normal form has a set of satisfiable
    criteria.
  • Normal forms exist in a hierarchy
  • 1NF? 2NF? 3NF? BCNF? 4NF? PJ/NF (5NF)
  • Codd defined 1NF, 2NF, 3NF in 1972.
  • 3NF had inadequacies so it was revised in 1974 by
    Boyce/Codd (BCNF).
  • 1977 Fagin defined 4NF, 1979 defined 5NF.
  • 6NF,7NF ?... dependencies theory suggests there
    may be higher NFs but not practicable in database
    environment.

6
  • 10-1. Introduction to Database Design
  • DB designers should aim for higher NFs but this
    is not law - just recommended as normalisation
    simply provides guidelines for database design.
  • There are often good reason for not using
    normalisation theory.
  • In order to describe the various normal forms we
    must first introduce some definitions

7
  • 10-1. Introduction to Database Design
  • Functional DependencyGiven relation R, attribute
    Y of R is functionally dependent on X of R, R.X
    -gt R.Y, or R.X functionally determines R.Y ...
  • ... iff each R.X value has associated with it
    precisely one R.Y value, where X and/or Y may be
    composite.
  • S.SNAME, S.STATUS and S.CITY are each
    functionally dependent on S.S
  • If R.X is a candidate key or if R.X is the
    primary key, then all R.Y must be functionally
    dependent on R.X
  • In SP we have a composite primary key so
  • SP.(S,P) ? SP.QTY

8
  • 10-1. Introduction to Database Design
  • There is no requirement in the definition of
    functional dependence that R.X be a candidate
    key, thus
  • R.X ? R.Y iff whenever 2 tuples of R.X are the
    same then the corresponding R.Y values are also
    the same.
  • R.Y is fully functionally dependent on R.X iff it
    is functionally dependent on R.X and not fully
    functionally dependent on any subset of R.X
  • S.(S,STATUS) ? S.CITY is true but not full
    functional dependence as S.S -gt S.CITY
  • If R.X ? R.Y but not fully then R.X must be
    composite

9
Normalisation Normal Forms
  • Define 3NF where we want to get to (1972)
  • Demonstrate how 3NF defn is broken and the
    problems with this 1NF
  • Fix problems by defining the FD and breaking up
    table 2NF
  • Problems with 2NF
  • Fix problems with 3NF

10
3NF, 2NF and 1NF
  • 1NF
  • Nearly assumed Disallow multi-valued or
    composite attributes
  • 2NF
  • Every non-key attribute is fully functionally
    dependent on the primary key
  • City ? Status

11
  • 10-2. 3NF, 2NF and 1NF
  • Definition 1 3NFA relation R is in 3NF iff the
    nonkey attributes of R are mutually independent
    and fully dependent on the primary key of R
  • Nonkey in this sense means not part of the
    primary key and mutually independent means none
    of the attributes are functionally dependent on
    any others.
  • P(P,PNAME,COLOUR,WEIGHT) is 3NF because we can
    change nonkey attributes independently and all
    are functionally dependent on P.

12
  • In order to show 2NF let us unite S and SP to
    get
  • FIRST(S,STATUS,CITY,P,QTY)
  • We also introduce a new constraint such that
    STATUS is functionally dependent on CITY, eg
  • London suppliers have status 10, alwaysParis
    suppliers have status 20, alwaysMunich suppliers
    have status 20, always etc ...
  • Primary key is (S,P) and the functional
    dependency diagram is ...

13
  • A relation in 3NF has arrows out of primary key
    only.
  • In FIRST, additional arrows cause trouble as the
    nonkey attributes are not mutually independent
    and not all attributes are dependent on the
    primary key.
  • What are the difficulties with this relation
    anyway ?
  • The problem with FIRST is that it stores
    redundant information which can lead to update
    anomalies as follows

14
  • INSERT Cannot insert the fact that a supplier
    exists until that supplier actually makes a
    shipment
  • DELETE Deleting the last tuple based on S,P
    could lose the information that S3 is located in
    CITY
  • UPDATE CITY values occur for each shipment thus
    an update of CITY is unnecessarily expensive.

15
  • One solution is to replace FIRST by
  • SECOND(S,STATUS,CITY) and SP(S,P,QTY)
  • This yields the following FD diagram

This is appealing as follows INSERT can enter
the fact that S5 is in ATHENS without S5 actually
having to make a shipment DELETE can delete
shipment tuples and not lose location
information UPDATE information appears once only
thus updating is more efficient
16
  • 10-2. 3NF, 2NF and 1NF
  • Definition of 2NFA relation R is in 2NF iff it
    is in 1NF and every nonkey attribute is fully
    dependent on the primary key.
  • A relation in 1NF and not in 2NF can be reduced
    to a collection of 2NF relations by replacing by
    an equivalent collection of projections.
  • Equivalence here means 1NF form can be obtained
    from the 2NF forms by taking the natural joins so
    the decomposition is non-loss and reversible.
  • What makes the 2NF form more desirable is that it
    can represent information which is not
    representable in the 1NF form
  • SECOND(S,STATUS,CITY) and SP(S,P,QTY)

17
  • 10-2. 3NF, 2NF and 1NF
  • The relation SECOND still has some problems
    however, it lacks mutual independence among the
    nonkey attributes.

18
  • 10-2. 3NF, 2NF and 1NF
  • The dependency of STATUS on S is fully
    functional but is transitive via CITY and this
    can lead to update anomalies as follows
  • INSERT cannot enter status of a supplier in Rome
    as 50 until we have some supplier in Rome
  • DELETE if we delete the only SECOND tuple of a
    city we destroy the status of that city
  • UPDATE STATUS of a CITY appears many times

19
  • 10-2. 3NF, 2NF and 1NF
  • So we replace SECOND by
  • SC(S,CITY) and CS(CITY,STATUS)
  • This gives the following FD diagram
  • In SECOND, STATUS did not describe the entity
    identified by the primary key (S).

20
  • 10-2. 3NF, 2NF and 1NF
  • Definition 4 3NFRelation R is in 3NF iff it is
    in 2NF and every nonkey attribute is
    non-transitively dependent on the primary key.
  • A relation in 2NF and not in 3NF can be
    (reversibly) non-loss reduced to an equivalent
    collection of 3NF relations able to contain
    information not representable in the 2NF form.

21
  • 10-2. 3NF, 2NF and 1NF
  • Normalisation is a semantic notion, not to do
    with the values that happen to exist so it is not
    possible to look at the values in a table and say
    it is in 3NF without knowing about dependencies
    and meaning of data.

22
  • 10-3. BCNF
  • 3NF has the following inadequacies in that it
    cannot handle cases of relations with
  • multiple candidate keys where
  • candidate keys are composite
  • candidate keys overlap
  • The above combination of events do not occur very
    often in practice, but they are not contrived and
    they do exist.
  • BCNF was defined to address the above and the
    definition of BCNF is stronger than that of 3NF.

23
  • 10-3. BCNF
  • A functional determinant is an attribute on which
    some other attribute is fully functionally
    dependent.
  • Definition of BCNFA relation R is in BCNF iff
    every determinant is a candidate key ... not just
    primary keys!
  • This is a simpler definition than 3NF with no
    references to 1NF or 2NF or transitive
    dependencies.

24
  • 10-3. BCNF
  • Now for some confusion ...
  • Textbooks sometimes differ in their definitions
    of 3NF and whether a relation in 3NF is also in
    BCNF !
  • The precise and exact definitions do not assume
    that each R has exactly one CK, the PK, as is
    done in most textbooks, and those definitions are
    as follows
  • 2NF 1NF and each non-prime attribute is FFD on
    each CK
  • 3NF 2NF and none of the non-prime attributes
    are transitively dependent on any CKs
  • Here non-prime is not part of any candidate key.
  • However many textbooks simplify the definitions
    by assuming that each R has one CK which is the
    PK !

25
  • 10-3. BCNF
  • Any given relation can be non-loss decomposed
    into an equivalent collection of BCNF relations.
  • FIRST,SECOND are not in BCNF
  • SP,SC,CS are in BCNF
  • Lets illustrate BCNF with a set of examples, some
    in BCNF, some not.

26
  • 10-3-1. BCNF Example 1
  • S(S, SNAME, STATUS, CITY)
  • S and SNAME are both candidate keys, i.e.
    numbers and names of suppliers are both unique.
  • STATUS and CITY are mutually independent, with
    the FD diagram ...
  • S is in BCNF as the only determinants are
    candidate keys.
  • In S, candidate keys are atomic and thus
    non-overlapping

27
  • 10-3-2. BCNF Example 2
  • SSP(S, SNAME, P, QTY)
  • Candidate keys are (S, P) and (SNAME, P), say
    1st is primary key with FD diagram ...
  • Not in BCNF as 2 determinants, S,SNAME are not
    candidate keys so the table will contain
    redundancies and have certain update anomalies.

28
  • 10-3-3. BCNF Example 3
  • SJT(S, J, T)
  • Here student S is taught subject J by teacher T
    with the following constraints
  • 1. For each subject each student is taught by
    only 1 teacher2. Each teacher teaches only 1
    subject3. Each subject taught by several
    teachers
  • This is a bit like secondary school, with the
    following FD diagram ...

29
  • Here we have two overlapping candidate keys (S,
    J) and (S, T) and SJT is in 3NF but it is not in
    BCNF so we could get update anomalies caused by T
    being a determinant but not a CK (Candidate Key).
  • Solution replace SJT by 2 projections
  • ST(S, T) and TJ(T, J)

30
  • 10-3-4. BCNF Example 4
  • EXAM(S, J, P)
  • Here student S was examined in subject J and
    achieved rank position P in the class with the
    constraint that there are no ties for positions.
  • This yields the following FD diagram ....
  • Here we have composite and overlapping candidate
    keys (S, J) and (J, P) but just because we have
    such a situation does not mean we need to
    normalise because EXAM is already in BCNF !

31
  • 10-4. 4NF
  • Consider the following ...
  • course (C) taught by one of a set of teachers (T)
  • for each course there is a repeating set of
    recommended textbooks (X)
  • for each course there may be any numbers of
    teachers and any numbers of recommended texts
  • teachers and texts are independent
  • teachers can be associated with any number of
    courses
  • This corresponds closely with a large secondary
    school with DoE recommended textbooks, teachers
    doubling up and many teachers.
  • We could "flatten" this information into a 1NF
    relation called CTX.

32
  • 10-4. 4NF
  • There are no FDs in data so no basis for
    decomposition but there is still some redundancy
    in CTX.
  • If (C1, T1, X1) and (C1, T2, X2) then there must
    also be the following tuples present ... (C1, T1,
    X2) and (C1, T2, X1) !
  • This is redundancy and thus we can have update
    anomalies.
  • For example, if we add (Geography, Ryan, Holland)
    and (Geography, Scott, Gaines) then we must also
    add (Geography, Ryan, Gaines) and (Geography,
    Scott, Holland).
  • Examining the criteria for normal forms however
    we find CTX is (trivially) in BCNF as the 3
    attributes make up the sole CK !

33
  • It would be desirable to decompose CTX into
  • CT(Course, Teacher) and CX(Course, Text)
  • Both of these are in BCNF as both are "all key".

So CTX would be represented as CV
CT
CX
34
  • 10-4. 4NF
  • This decomposition is based on Fagin's
    multi-valued dependencies (MVDs).
  • course -gt-gt teachercourse -gt-gt text
  • A course does not have a single corresponding
    teacher, it has a well-defined set of teachers
    and for a course c and text x the set of teachers
    depends on the value of c, independent of x.
  • Definition of 4NFA relation R is in 4NF if it is
    in BCNF and all MVDs are FDs.
  • CTX is not in 4NF CT, CX are in 4NF 4NF is
    more desirable as it eliminates redundancies

35
4NF
  • For R with attributes A, B C (which may be
    composite !)
  • R.A -gt-gt R.B
  • iff the set of R.B values match (Avalue, Cvalue)
    in R and this depends on A, independent of C
  • R must have at least 3 attributes and for R(A, B,
    C) then R.A -gt-gt R.B holds iff
  • R.A -gt-gt R.C also holds
  • MVDs always go in pairs
  • If R.A -gt-gt R.B R.C then R can be non-loss
    decomposed into R1(A, B) and R2(A, C)

36
  • 10-5. 5NF
  • Some relations cannot be non-loss decomposed by
    projection into 2 relations but can be composed
    by projection into 3 relations ... called
    n-decomposable for n gt 2.
  • 5NF is more theoretical than real ... anyway here
    is an example
  • SPJ is a relation about suppliers, parts and
    projects.
  • 1. Smith supplies wrenches2. wrenches are used
    in Block23. Smith supplies Block2
  • If 1,2 3 hold then Smith supplies wrenches to
    Block2 also holds as true.

37
5NF
  • Normally this implication does not hold, but if
    it does we call it a JOIN dependency and SPJ is a
    JD over (SP,PJ,JS) and should be decomposed into
    3 relations yielding three relations, all in 5NF.
  • SPJ is not in 5NF because it has a join
    dependency but discovering such JDs is not easy
    ... this is because FDs and even MVDs have a
    straightforward real-world interpretation whereas
    a JD does not.
  • If R is in 5NF then it is also in 4NF.

38
  • 10-6. Database Design
  • Database design is all about designing a schema
    of tables which captures all information needs
    from the portion of the real world being
    modelled, in such a way that no unnecessary
    redundancies are stored which could lead to
    update anomalies.
  • Reasoning about the normal forms of tables in a
    schema helps us determine if update anomalies can
    occur in theory.
  • Database design is a give-and-take task, fluid,
    revised continuously as users needs change and
    the information being modelled changes.
  • A database design is never complete, it is always
    evolving.
  • The task of database design is separate but
    related to the task of the DBA.

39
  • 10-6. Database Design
  • If the design of a database has commenced with
    the construction of an E-R diagram, then this can
    be used to determine a first version of the
    relational schema, but only a first version.
  • Turning E-R entities into relational tables is
    easy
  • Turning E-R 11 relationships into tables is also
    easy by storing the PK of one entity as an FK
    attribute of the other ... which one to embed in
    the other affects performance of queries, choice
    of database designer
  • Turning E-R 1N relationships into tables is done
    by placing the PK of the relation representing
    the parent entity as a FK in the relation
    representing the child entity ... unlike 11, it
    does matter which is FK embedded in the other.
  • Turning E-R MN relationships into tables is done
    by creating an additional table, an intersection
    relation, to represent the relationship itself
    ... i.e. decompose the MN into two 1N
    relationships. The PK of the new relation is the
    combination of PKs of its "parents".
  • Representing recursive relationships, which can
    be 11, 1N or NM, is done by embedding key for
    one in itself (11 and 1N) or creating an
    additional table (NM) ... so the fact that it is
    recursive is actually not important.

40
Database Design
  • Having gone through the effort of an E-R
    modelling exercise and then the generation of a
    first approximation at a database schema, this
    first version may then be refined or
    de-normalised.

41
  • 10-6. Database Design
  • Given R in 1NF and FDs, MVDs and JDs, we
    systematically reduce R to a collection of
    smaller relations which are "more desirable", by
    taking projections in order to eliminate
    redundancy and the possibility of update
    anomalies.
  • But these are guidelines only and don't always
    have to be followed often we want to
    de-normalise a database design.
  • NADDR(name, street, city, state, zip)
  • Name is PK and all others are FDs on name ...
    assume also that zip -gt (city, state), i.e. city
    names are unique within a state ...

42
  • NADDR is in 2NF and normalisation guidelines say
    to decompose into nsz(name, street, zip) and
    zcs(zip, city, state).
  • Both of these would be in 3NF but this is
    impractical in practice.
  • The most regular use of the data is to get access
    to all field values in one go and having to join
    nsz and zcs for every address lookup is
    inefficient ... so ... de-normalise.
Write a Comment
User Comments (0)
About PowerShow.com