Normalization 1NF, 2NF, 3NF, BCNF - PowerPoint PPT Presentation

1 / 37
About This Presentation
Title:

Normalization 1NF, 2NF, 3NF, BCNF

Description:

Nonloss Decomposition and Functional Dependencies. First, Second, and Third Normal Forms ... key dependencies promote a miasma of update anomalies difficulties with ... – PowerPoint PPT presentation

Number of Views:5318
Avg rating:3.0/5.0
Slides: 38
Provided by: Richar502
Category:
Tags: 1nf | 2nf | 3nf | bcnf | miasma | normalization

less

Transcript and Presenter's Notes

Title: Normalization 1NF, 2NF, 3NF, BCNF


1
Normalization 1NF, 2NF, 3NF, BCNF
2
Outline
  • Introduction
  • Nonloss Decomposition and Functional Dependencies
  • First, Second, and Third Normal Forms
  • Dependency Preservation
  • Boyce/Codd Normal Form
  • A Note on Relation-Valued Attributes

3
Normalization
  • Normalized and 1 NF are the same thing typically
    normalized refers incorrectly to 3NF
  • Normalization helps control redundancy
  • Normalization is reversible i.e. nonloss, or
    information preserving
  • Six normal forms are discussed 1 through 5, and
    Boyce-Codd Normal Form (BCNF), which is an
    improvement on 3NF

4
Levels of Normalization
5
Nonloss Decomposition and Functional Dependencies
6
Overview
  • Nonloss decomposition the normalization
    procedure involves decomposing a given relvar
    into other relvars, and moreover that the
    decomposition is required to be reversible, so
    that no information is lost in the process
  • Example
  • Case a is an nonloss decomposition
  • If you join SST and SC back together again, you
    get back to S
  • Case b is a lossy decomposition
  • Some additional spurious tuples will appear

7
Sample value for relvar S and two corresponding
decompositions
8
Nonloss Decomposition and Functional Dependencies
  • Normalization uses a process of projection to
    decompose relvars
  • Re-composition is a process of joins
  • Heaths theorem Let RA, B, C be a relvar,
    where A, B, and C are sets of attributes. If R
    satisfies the FD A?B, then R is equal to the join
    of its projection on A, B and A, C
  • The decomposition of relvar R into projections
    R1Rn is nonloss if R the join of R1Rn

9
First, Second, and Third Normal Forms
  • Throughout this section, we assume for simplicity
    that each relvar has exactly one candidate key,
    which we further assume is the primary key

10
Third Normal Form
  • A relvar is in 3NF if and only if the nonkey
    attributes are both mutually independent and
    irreducibly dependent on the primary key
  • Two or more attributes are mutually independent
    if none of them is FD on any combination of the
    others. Such independence implies that each
    attribute can be updated independently of the
    rest
  • Example the parts relvar P is in 3NF
  • A relvar is in 3NF if and only if, for all time,
    each tuple consists of a primary key value that
    identifies some entity, together with a set of
    zero or more mutually independent attribute
    values that describe that entity in some way

11
First Normal Form
  • A relvar is in 1NF if and only if in every legal
    value of that relvar, every tuple contains
    exactly one value for each attribute
  • In this way, relvars are always in 1NF
  • A relvar in 1NF may display functional
    dependencies other than those emanating from the
    primary key
  • Such non-primary-key dependencies promote a
    miasma of update anomalies difficulties with
    the update operations INSERT, DELETE, and UPDATE

12
Example
  • FIRST S, STATUS, CITY, P, QTY PRIMARY KEY
    S, P
  • One more constraint CITY ? STATUS
  • A suppliers status is determined by the location
    of that supplier (e.g. all London suppliers must
    have a status of 20)
  • Non3NF diagram has arrows out of candidate keys
    together with certain additional arrows

13
Sample Value for relvar FIRST
Note relvar FIRST violates both conditions a and
b in the preliminary 3NF definition the nonkey
attributes are not all mutually independent, and
they are not all irreducible dependent on the
primary key
14
The Suppliers-and-Parts Database (Sample Values)
S (suppliers) and P (parts) are entities, and SP
(shipment) is a relationship between S and P (a
relationship is a special case of an entity) In
RDB, entities and relationships are also
represented in the same uniform way
15
Update Anomalies For FIRST
  • Focus on the FD S ? CITY
  • INSERT we cannot insert the fact that a
    particular supplier is located in a particular
    city until that supplier supplies at least one
    part (e.g. supplier S5 is located in Athens)
  • DELETE if we delete the sole FIRST tuple for a
    particular supplier, we delete not only the
    shipment concerning that supplier to a particular
    part but also the information that the supplier
    is located in a particular city (e.g. the S3
    tuple)
  • UPDATE the city value for a given supplier
    appears in FIRST many times, in general

16
Solution for FIRST
  • The real problem is that relvar FIRST contains
    too much information (shipment, and supplier) all
    bundled together
  • If we delete a tuple, we delete too much
  • The solution to this problem is to un-bundle
  • Replace relvar FIRST by the two relvars
  • SECOND S, STATUS, CITY
  • SP S, P, QTY
  • The decomposition of FIRST into SECOND and SP
    eliminates the dependencies that were not
    irreducible
  • The attribute CITY in FIRST did not describe the
    entity described by the primary key instead, it
    described the supplier involved in that shipment

17
Relvars SECOND and SP
18
Step Summarization
  • The first step in the normalization procedure is
    to take projections to eliminate
    non-irreducible FD
  • Before normalization
  • R A, B, C, D PRIMARY KEY A, B /
    assume A ? D holds /
  • After normalization
  • R1 A, D PRIMARY KEY A
  • R2 A, B, C PRIMARY KEY A, B
    FOREIGN KEY A REFERENCES R1

19
Second Normal Form
  • A relvar is in 2NF if and only if it is in 1NF
    and every nonkey attribute is irreducibly
    dependent on the primary key
  • Assumes only one candidate key
  • A relvar in 2NF is less susceptible to update
    anomalies, but may still exhibit transitive
    dependencies
  • Both attributes in a transitive dependency are
    irreducibly implied by the primary key, and each
    implies the other

20
Update Anomalies For SECOND
  • Focus on the FD CITY ? STATUS
  • S ? CITY CITY ? STATUS, then S ? STATUS
  • INSERT we cannot insert the fact that a
    particular city has a particular status until we
    have some suppliers actually located in that city
    (e.g. city ROME has a status of 50)
  • DELETE if we delete the sole SECOND tuple for a
    particular city, we delete not only the supplier
    concerned but also the information that that
    city has that particular status (e.g. the S5
    tuple)
  • UPDATE the status value for a given city appears
    in SECOND many times, in general

21
Solution for SECOND
  • The real problem is that relvar SECOND contains
    too much information (supplier, and city) all
    bundled together
  • If we delete a tuple, we delete too much
  • The solution to this problem is to un-bundle
  • Replace relvar SECOND by the two relvars
  • SC S, CITY
  • CS CITY, STATUS
  • The decomposition of SECOND into SC and CS
    eliminates the transitive dependencies
  • The attribute STATUS in SECOND did not describe
    the entity described by the primary key instead,
    it described the city involved in which that
    supplier happened to be located

22
Relvars SC and CS
23
Step Summarization
  • The second step in the normalization procedure is
    to take projections to eliminate transitive FD
  • Before normalization
  • R A, B, C PRIMARY KEY A / assume
    B ? C holds /
  • After normalization
  • R1 B, C PRIMARY KEY B
  • R2 A, B PRIMARY KEY A FOREIGN KEY
    B REFERENCES R1

24
Third Normal Form
  • A relvar is in 3NF if and only if it is in 2NF
    and every nonkey attribute is nontransitively
    dependent on the primary key
  • Assumes only one candidate key

25
Dependency Preservation
  • It is frequently the case that a given relvar can
    be nonloss-decomposed in a variety of different
    ways
  • Example
  • SC S, CITY CS CITY, STATUS
  • SC S, CITY SS S, STATUS
  • Dependency preservation refers to a specific case
    of nonloss decomposition, such that the
    normalized relvars are independent of each other
  • Updates can be made to either one without regard
    for others

Which decomposition is better ?
26
Dependency Preservation (Cont.)
  • Projections R1 and R2 of a relvar R are
    independent in the foregoing sense if and only if
    both of the following are true
  • Every FD in R is a logical consequence of those
    in R1 and R2
  • The common attributes of R1 and R2 form a
    candidate key for at least one of the pair
  • More on dependency preservation pp. 366-367

27
Boyce/Codd Normal Form
28
Overview
  • Codds original definition of 3NF did not
    adequately deal with the case of a relvar that
  • Had two or more candidate keys, such that
  • The candidate keys were composite, and
  • The are overlapped (i.e. had at least one
    attribute in common)
  • Definition of Boyce/Codd normal form
  • A relvar is in BCNF if and only if every
    nontrivial, left-irreducible FD has a candidate
    key as its determinant
  • A relvar is in BCNF if and only if every
    determinant is a candidate key (informal
    definition)
  • The only arrows in the FD diagram are arrows out
    of candidate keys

29
Overview (Cont.)
  • Examples
  • FIRST and SECOND, which were not in 3NF, are not
    in BCNF either
  • SP, SC, and CS, which were in 3NF, are also in
    BCNF
  • Another example involving two disjoint
    candidate keys
  • Suppose S and SNAME are both candidate keys, and
    STATUS and CITY are mutually independent ? BCNF

30
BCNF Example I
  • SSP S, SNAME, P, QTY
  • Candidate key S, P , SNAME, P
  • Non-BCNF determinants S and SNAME are not
    candidate keys
  • Suffer the same kind of problems as FIRST and
    SECOND did
  • SSP is in 3NF by the old definition (not the
    simplified version)
  • That definition did not require an attribute to
    be irreducibly dependent on each candidate key if
    it was itself a component of some candidate key
    of the relvar
  • The fact that SNAME is notirreducibly dependent
    on S, P was ignored

31
Solution for SSP
  • Break SSP into two projections
  • SS S, SNAME
  • SP S, P, QTY
  • OR
  • SS S, SNAME
  • SP SNAME, P, QTY

S
P
QTY
SNAME
P
32
BCNF Example II
  • SJT (student, subject, teacher)
  • The meaning of an SJT tuple (s, j, t) is that a
    student s is taught subject j by teacher t
  • Constraints
  • For each subject, each student of that subject is
    taught by only one teacher
  • Each teacher teaches only one subject (but one
    subject is taught by several teachers)

33
BCNF Example II (Cont.)
  • SJT (Cont.)
  • Two overlapping candidate keys, S, J and S,
    T
  • SJT is in 3NF and not BCNF
  • Attribute T is a determinant but not a candidate
    key
  • Solution for SJT break SJT into
  • ST S, T and TJ T, J

34
BCNF Example II (Cont.)
  • However, the two projections ST and TJ are not
    independent
  • FD S, J ? T cannot be deduced from the FD T?J
  • Example an attempt to insert a tuple for Smith
    and Prof. Brown into ST must be rejected Why ???
  • The twin objectives of (a) decomposing a relvar
    into BCNF components, and (b) decomposing it into
    independent components, can occasionally be in
    conflict

35
BCNF Example III
  • EXAM
  • The meaning of an EXAM tuple (s, j, p) is that a
    student s was examined in subject j and achieved
    position p
  • Constraint no two students obtained the same
    position in the same subject
  • Two overlapping candidate keys, S, J , and
    J, P
  • They are the only determinants
  • EXAM is in BCNF

36
Relation-Valued Attributes
  • A relation may include attributes whose values
    are relations
  • Traditionally this would be seen to violate 1NF,
    which was held to prohibit repeating groups
  • Now they are theoretically sound, but in practice
    you should avoid them because they have
    complicated predicates

GET S for suppliers who supply part P1 ( SPQ
WHERE TUPLE P P (P1) ? PQ P ) S
GET P for parts supplied by supplier S1 ( SPQ
WHERE S S (S1) ) UNGROUP PQ ) P
Matters are worse for update operations
37
Conclusions
  • The whole point of normalization theory is to try
    to identify commonsense principles and formalize
    them
  • In this way, we can mechanize those principles by
    an algorithm
  • BCNF and indeed 5NF is always achievable
Write a Comment
User Comments (0)
About PowerShow.com