Normalization of Database Tables - PowerPoint PPT Presentation

About This Presentation
Title:

Normalization of Database Tables

Description:

Chapter 3.6-7 Normalization of Database Tables – PowerPoint PPT presentation

Number of Views:87
Avg rating:3.0/5.0
Slides: 48
Provided by: Arthu163
Learn more at: http://cms.dt.uh.edu
Category:

less

Transcript and Presenter's Notes

Title: Normalization of Database Tables


1
Chapter 3.6-7
  • Normalization of Database Tables

2
Normalization
  • Normalization is process for assigning attributes
    to entities
  • Reduces data redundancies
  • Helps eliminate data anomalies
  • Produces controlled redundancies to link tables
  • Normalization stages
  • 1NF - First normal form
  • 2NF - Second normal form
  • 3NF - Third normal form
  • 4NF - Fourth normal form

3
Example
Starts-in
Stars
Movies
Owns
Studios
4
Problem Example
Movies
  • Update anomalies If Harrison Fords phone
    changes, must change it in each of his tuples. If
    Length value of Star Wars needs to be changed,
    must change all occurrences
  • Deletion anomalies If we delete Waynes World
    entries from database, we also loose all info
    about Dana Carvey Mike Meyers

5
Conversion to 1NF
  • Repeating groups must be eliminated
  • Proper primary key developed
  • Uniquely identifies each tuple
  • Dependencies can be identified
  • undesirable dependencies allowed
  • Partial
  • based on part of composite primary key
  • Transitive
  • one nonprime attribute depends on another
    nonprime attribute

6
Conversion to 1NF Cont.
  • An attribute that is at least part of a key is
    known as a prime attribute or key attribute or
    primary key.

7
Example
  • Projects assigned to employees
  • Each project has a number and a name
  • Each employee has a number, a name, a job class
  • Each employee working on a project, need to keep
    number of hours spent on project, and hourly
    rate.
  • Project Assignments Table
  • ( PROJ_NUM, PROJ_NAME, EMP_NUM, EMP_NAME,
    JOB_CLASS, CHG_HOUR, HOURS)
  • Whats the Key for this relation?

8
Data Organization 1NF
9
Dependency Diagram (1NF)
PROJ_ NUM, EMP_NUM --gt PROJ_NAME, EMP_NAME,
JOB_CLASS, CHG_HOUR, HOURS PROJ_NUM --gt
PROJ_NAME EMP_NUM--gtEMP_NAME, JOB_CLASS,
CHG_HOURS JOB_CLASS --gt CHG_HOUR
10
1NF Summarized
  • All key attributes defined
  • Primary Key identified
  • No repeating groups in table
  • All attributes dependent on
  • primary key

11
2NF Summarized
  • In 1NF, but
  • Includes no partial dependencies
  • Partial dependency
  • An attribute is functionally dependent on a
    portion of the primary key.
  • Example
  • PROJ_NUM ? PROJ_NAME
  • EMP_NUM--gtEMP_NAME, JOB_CLASS, CHG_HOURS

12
Conversion to 2NF
  1. Start with 1NF format
  2. Write each key component on a separate line
  3. Write dependent attributes after each key
    component
  4. Write original key on last line
  5. Write any remaining attributes after original key
  6. Each component is new table

PROJECT (PROJ_NUM, PROJ_NAME) EMPLOYEE (EMP_NUM,
EMP_NAME, JOB_CLASS, CHG_HOUR) ASSIGN (PROJ_NUM,
EMP_NUM, HOURS)
13
2NF Conversion Results
14
2NF Summarized
  • In 1NF
  • Includes no partial dependencies
  • Still possible to exhibit transitive dependency
  • Attributes may be functionally dependent on
    non-key attributes

15
Conversion to 3NF
  • decompose table(s) to eliminate transitive
    functional dependencies

PROJECT (PROJ_NUM, PROJ_NAME) ASSIGN (PROJ_NUM,
EMP_NUM, HOURS) EMPLOYEE (EMP_NUM, EMP_NAME,
JOB_CLASS) JOB (JOB_CLASS, CHG_HOUR)
16
3NF Summarized
  • In 2NF
  • Contains no transitive dependencies

17
Additional DB Enhancements
  • Associate Manager with each project
  • Job_Code is more concise and less error prone
    that job class
  • Use naming convention to name attributes e.g.
    job_code, job_description, Job_chg_hour etc.

18
Boyce-Codd Normal Form (BCNF)
  • Formally, R is in BCNF if for every nontrivial FD
    for R, say X ? A, X is a superkey.
  • Nontrivial right-side attribute not in left
    side.
  • Trivial FDs examples
  • A?A
  • AB?A
  • Informally the only arrows in the FD diagram are
    arrows out of superkeys
  • Note BCNF violation when relation has more than
    one superkey that overlap

19
3NF Table Not in BCNF
What normal form?
20
Decomposition of Table Structure to Meet BCNF
21
Decomposition to Reach BCNF
  • Setting relation R with FDs F.
  • Suppose relation R has BCNF violation X ? B and X
    not a superkey.

22
  • 1. Compute X.
  • Cannot be all attributes why?
  • 2. Decompose R into X and (RX) ? X.
  • 3. Find the FDs for the decomposed relations.
  • Project the FDs from F calculate all
    consequents of F that involve only attributes
    from X or only from (R?X) ? X.

R
X
X
23
Decomposition to Reach BCNF
  • Identify the violating FD
  • E.g. X1X2Xn ? B1B2Bm
  • Add to the right-hand side of FD as many
    attributes as are functionally determined by
    (X1X2Xn)
  • Decompose relation R into two relations
  • One relation has all attributes Xs Bs
  • Second relation has the Xs plus any other
    remaining attributes from R other than Bs

24
BCNF--Example
  • Assume R(S, J, T)
  • S Student
  • J subject
  • T Teacher
  • Student S is taught subject J by teacher T.
  • Constraints
  • For each subject, each student of that subject is
    taught by only one teacher
  • Each teacher teaches only one subject (but each
    subject is taught by several teachers)

25
BCNF--Example
S J T
Smith Math Prof. White
Smith Physics Prof. Green
Jane Math Prof. White
Jane Physics Prof. Brown
  • Functional Dependencies
  • S , J ? T
  • T ? J

26
BCNF--Example
  • Candidate keys S, J and S, T
  • 3NF but not in BCNF
  • Update anomaly if we delete the info that Jane
    is studying Physics we also loose the info that
    Prof. Brown teaches Physics
  • Solution two relations R1S, T R2T, J

S
T
J
27
Decomposition Based on BCNF is Necessarily
Correct
Attributes A, B, C. FD B ?
C Relations R1A,B R2B, C Tuples
in R (a, b, c) Tuples in R1 (a, b) Tuples in
R2 (b, c) Natural join of R1 and R2 (a, b,
c) ? original relation R can be reconstructed
by forming the natural join of R1 and R2.
28
Decomposition Based on BCNF is Necessarily
Correct
Attributes A, B, C. FD B ?
C Relations R1A,B R2B, C Tuples
in R (a, b, c) , (d, b, e) Tuples in R1 (a,
b), (d, b) Tuples in R2 (b, c), (b, e) Tuples
in the natural join of R1 and R2 (a,b,c),
(a,b, e) (d, b, c), (d, b, e) Can (a,b,e), (d,
b, c) be a bogus tuples?
29
Decomposition Based on BCNF is Necessarily Correct
  • Answer No
  • Because B ? C i.e. if 2 tuples have same B
    attribute then they must have the same C
    attribute. ? (b,c) (b,e)
  • ? (a, b,e) (a, b,c) and (d, b, c) (d, b, e)

30
Theorem
  • Any two-attribute relation is in BCNF.

31
Decomposition Theorem
  • Suppose we decompose a relation R(X, Y, Z) into
    R1(X, Y) and R2(X,Z) and project the R onto R1
    and R2.
  • Then join(R1, R2) is guaranteed to reconstruct R
    if and only if X?Y or X?Z
  • Notice that whenever we decompose because of a
    BNCF violation, one of the above FDs holds.

32
3NF
  • One FD structure causes problems in BCNF
  • If you decompose, you cant recover all of the
    original FDs.
  • If you dont decompose, you violate BCNF.
  • Abstractly AB ? C and C ? B.
  • Example street city ? zip, and zip ? city.
  • BCNF violation C ? B has a left side that is not
    a superkey.
  • Based on previous algorithm, decompose into BC
    and AC.
  • But the FD AB ? C does not hold in new tables.

33
Example
  • A street, B city, C zip.

zip ? city BCNF violation
Decompose
It is a bad idea to decompose relation because
you loose the ability to check the dependency
street city ? zip
34
Example
zip ? city
Decompose
It is a bad idea to decompose relation because
you loose the ability to check the dependency
street city ? zip
35
Elegant Workaround
  • Define the problem away.
  • A relation R is in 3NF iff (if and only if)for
    every nontrivial FD X ? A, either
  • 1. X is a superkey, or
  • 2. A is prime member of at least one key.
  • Thus, if we just normalize to the 3NF, the
    problem goes away.

36
What 3NF and BCNF Give You
  • There are two important properties of a
    decomposition
  • Recovery it should be possible to project the
    original relations onto the decomposed schema,
    and then reconstruct the original.
  • Dependency Preservation it should be possible
    to check in the projected relations whether all
    the given FDs are satisfied.

37
3NF and BCNF, Continued
  • We can get (1) with a BCNF decomposition.
  • We can get both (1) and (2) with a 3NF
    decomposition.
  • But we cant always get (1) and (2) with a BCNF
    decomposition.
  • street-city-zip is an example.

38
Mutli-valued Dependencies
  • Fourth Normal Form

39
Definition of MVD
  • A multivalued dependency is an assertion that two
    attributes (sets of attributes) are independent
    of one another.
  • Formally A multivalued dependency (MVD) on R, X
    -gt-gtY , says that if two tuples of R agree on
    all the attributes of X, then their components in
    Y may be swapped, and the result will be two
    tuples that are also in the relation.

40
Example
  • Actors(name, addr, phones, cars) with MVD
  • Name ?? phones.
  • name addr phones cars
  • sue a p1 b1
  • sue a p2 b2
  • it must also have the same tuples with phones
    components swapped
  • name addr phones cars
  • sue a p2 b1
  • sue a p1 b2
  • Note we must check this condition for all pairs
    of tuples that agree on name, not just one pair.

41
Example 2
  • An actor may have more than one address
  • Key? What normal form?
  • Note the redundancies
  • MVD name ?? street city
  • read name determines 1 or more street city
    independent of all other attributes

42
MVD Rules
  • 1. Every FD is an MVD.
  • Because if X ?Y, then swapping Ys between tuples
    that agree on X doesnt create new tuples.
  • Example, in Actors name ?? addr.
  • Note the opposite is not true i.e. not every MVD
    is a FD
  • 2. Complementation if X ?? Y, then X ?? Z, where
    Z is all attributes not in X or Y.
  • Example since name ?? phones holds in Actors,
    the name ?? addr cars.

43
Splitting Doesnt Hold
  • name ?? street city holds, but
  • name ?? street does not hold
  • Name does not determine 1 or more street
    independent of city.
  • name ?? city does not hold

44
Example 2
  • An actor may have more than one address
  • MVD name ?? street city
  • read name determines 1 or more street city
    independent of all other attributes
  • Also (complement MVD) name ?? title year

45
Fourth Normal Form
  • The redundancy that comes from MVDs is not
    removable by putting the database schema in BCNF.
  • There is a stronger normal form, called 4NF, that
    (intuitively) treats MVDs as FDs when it comes
    to decomposition, but not when determining keys
    of the relation.

46
4NF
  • Eliminate redundancy due to multiplicative effect
    of MVDs.
  • Roughly treat MVDs as FD's for decomposition,
    but not for finding keys.
  • Formally R is in Fourth Normal Form if whenever
    MVDX ?? Y is nontrivial (Y is not a subset of X,
    and X ? Y is not all attributes), then X is a
    superkey.
  • Remember, X ? Y implies X ?? Y, so 4NF is more
    stringentthan BCNF.
  • Decompose R, using4NF violation X ?? Y,into XY
    and X ? (RY).

R
Y
X
47
Example
  • Drinkers(name, addr, phones, cars)
  • FD name ? addr
  • Nontrivial MVDs name ?? phones
  • name ?? cars.
  • Only key name, phones, cars
  • All three dependencies above violate 4NF. Why?
  • Successive decomposition yields 4NF relations
  • D1(name, addr)
  • D2(name, phones)
  • D3(name, cars)

48
Example 2
  • name ?? street city
  • Decompose into
  • R1(name, street, city)
  • R2(name, title, year)
Write a Comment
User Comments (0)
About PowerShow.com