Title: Normalization of Database Tables
1Chapter 3.6-7
- Normalization of Database Tables
2Normalization
- Normalization is process for assigning attributes
to entities - Reduces data redundancies
- Helps eliminate data anomalies
- Produces controlled redundancies to link tables
- Normalization stages
- 1NF - First normal form
- 2NF - Second normal form
- 3NF - Third normal form
- 4NF - Fourth normal form
3Example
Starts-in
Stars
Movies
Owns
Studios
4Problem Example
Movies
- Update anomalies If Harrison Fords phone
changes, must change it in each of his tuples. If
Length value of Star Wars needs to be changed,
must change all occurrences - Deletion anomalies If we delete Waynes World
entries from database, we also loose all info
about Dana Carvey Mike Meyers
5Conversion to 1NF
- Repeating groups must be eliminated
- Proper primary key developed
- Uniquely identifies each tuple
- Dependencies can be identified
- undesirable dependencies allowed
- Partial
- based on part of composite primary key
- Transitive
- one nonprime attribute depends on another
nonprime attribute
6Conversion to 1NF Cont.
- An attribute that is at least part of a key is
known as a prime attribute or key attribute or
primary key.
7Example
- Projects assigned to employees
- Each project has a number and a name
- Each employee has a number, a name, a job class
- Each employee working on a project, need to keep
number of hours spent on project, and hourly
rate. - Project Assignments Table
- ( PROJ_NUM, PROJ_NAME, EMP_NUM, EMP_NAME,
JOB_CLASS, CHG_HOUR, HOURS) - Whats the Key for this relation?
8Data Organization 1NF
9Dependency Diagram (1NF)
PROJ_ NUM, EMP_NUM --gt PROJ_NAME, EMP_NAME,
JOB_CLASS, CHG_HOUR, HOURS PROJ_NUM --gt
PROJ_NAME EMP_NUM--gtEMP_NAME, JOB_CLASS,
CHG_HOURS JOB_CLASS --gt CHG_HOUR
101NF Summarized
- All key attributes defined
- Primary Key identified
- No repeating groups in table
- All attributes dependent on
- primary key
112NF Summarized
- In 1NF, but
- Includes no partial dependencies
- Partial dependency
- An attribute is functionally dependent on a
portion of the primary key. - Example
- PROJ_NUM ? PROJ_NAME
- EMP_NUM--gtEMP_NAME, JOB_CLASS, CHG_HOURS
12Conversion to 2NF
- Start with 1NF format
- Write each key component on a separate line
- Write dependent attributes after each key
component - Write original key on last line
- Write any remaining attributes after original key
- Each component is new table
PROJECT (PROJ_NUM, PROJ_NAME) EMPLOYEE (EMP_NUM,
EMP_NAME, JOB_CLASS, CHG_HOUR) ASSIGN (PROJ_NUM,
EMP_NUM, HOURS)
132NF Conversion Results
142NF Summarized
- In 1NF
- Includes no partial dependencies
- Still possible to exhibit transitive dependency
- Attributes may be functionally dependent on
non-key attributes
15Conversion to 3NF
- decompose table(s) to eliminate transitive
functional dependencies
PROJECT (PROJ_NUM, PROJ_NAME) ASSIGN (PROJ_NUM,
EMP_NUM, HOURS) EMPLOYEE (EMP_NUM, EMP_NAME,
JOB_CLASS) JOB (JOB_CLASS, CHG_HOUR)
163NF Summarized
- In 2NF
- Contains no transitive dependencies
17Additional DB Enhancements
- Associate Manager with each project
- Job_Code is more concise and less error prone
that job class - Use naming convention to name attributes e.g.
job_code, job_description, Job_chg_hour etc.
18Boyce-Codd Normal Form (BCNF)
- Formally, R is in BCNF if for every nontrivial FD
for R, say X ? A, X is a superkey. - Nontrivial right-side attribute not in left
side. - Trivial FDs examples
- A?A
- AB?A
- Informally the only arrows in the FD diagram are
arrows out of superkeys - Note BCNF violation when relation has more than
one superkey that overlap
193NF Table Not in BCNF
What normal form?
20Decomposition of Table Structure to Meet BCNF
21Decomposition to Reach BCNF
- Setting relation R with FDs F.
- Suppose relation R has BCNF violation X ? B and X
not a superkey.
22- 1. Compute X.
- Cannot be all attributes why?
- 2. Decompose R into X and (RX) ? X.
- 3. Find the FDs for the decomposed relations.
- Project the FDs from F calculate all
consequents of F that involve only attributes
from X or only from (R?X) ? X.
R
X
X
23Decomposition to Reach BCNF
- Identify the violating FD
- E.g. X1X2Xn ? B1B2Bm
- Add to the right-hand side of FD as many
attributes as are functionally determined by
(X1X2Xn) - Decompose relation R into two relations
- One relation has all attributes Xs Bs
- Second relation has the Xs plus any other
remaining attributes from R other than Bs
24BCNF--Example
- Assume R(S, J, T)
- S Student
- J subject
- T Teacher
- Student S is taught subject J by teacher T.
- Constraints
- For each subject, each student of that subject is
taught by only one teacher - Each teacher teaches only one subject (but each
subject is taught by several teachers)
25BCNF--Example
S J T
Smith Math Prof. White
Smith Physics Prof. Green
Jane Math Prof. White
Jane Physics Prof. Brown
- Functional Dependencies
- S , J ? T
- T ? J
26BCNF--Example
- Candidate keys S, J and S, T
- 3NF but not in BCNF
- Update anomaly if we delete the info that Jane
is studying Physics we also loose the info that
Prof. Brown teaches Physics - Solution two relations R1S, T R2T, J
S
T
J
27Decomposition Based on BCNF is Necessarily
Correct
Attributes A, B, C. FD B ?
C Relations R1A,B R2B, C Tuples
in R (a, b, c) Tuples in R1 (a, b) Tuples in
R2 (b, c) Natural join of R1 and R2 (a, b,
c) ? original relation R can be reconstructed
by forming the natural join of R1 and R2.
28Decomposition Based on BCNF is Necessarily
Correct
Attributes A, B, C. FD B ?
C Relations R1A,B R2B, C Tuples
in R (a, b, c) , (d, b, e) Tuples in R1 (a,
b), (d, b) Tuples in R2 (b, c), (b, e) Tuples
in the natural join of R1 and R2 (a,b,c),
(a,b, e) (d, b, c), (d, b, e) Can (a,b,e), (d,
b, c) be a bogus tuples?
29Decomposition Based on BCNF is Necessarily Correct
- Answer No
- Because B ? C i.e. if 2 tuples have same B
attribute then they must have the same C
attribute. ? (b,c) (b,e) - ? (a, b,e) (a, b,c) and (d, b, c) (d, b, e)
30Theorem
- Any two-attribute relation is in BCNF.
31Decomposition Theorem
- Suppose we decompose a relation R(X, Y, Z) into
R1(X, Y) and R2(X,Z) and project the R onto R1
and R2. - Then join(R1, R2) is guaranteed to reconstruct R
if and only if X?Y or X?Z - Notice that whenever we decompose because of a
BNCF violation, one of the above FDs holds.
323NF
- One FD structure causes problems in BCNF
- If you decompose, you cant recover all of the
original FDs. - If you dont decompose, you violate BCNF.
- Abstractly AB ? C and C ? B.
- Example street city ? zip, and zip ? city.
- BCNF violation C ? B has a left side that is not
a superkey. - Based on previous algorithm, decompose into BC
and AC. - But the FD AB ? C does not hold in new tables.
33Example
zip ? city BCNF violation
Decompose
It is a bad idea to decompose relation because
you loose the ability to check the dependency
street city ? zip
34Example
zip ? city
Decompose
It is a bad idea to decompose relation because
you loose the ability to check the dependency
street city ? zip
35Elegant Workaround
- Define the problem away.
- A relation R is in 3NF iff (if and only if)for
every nontrivial FD X ? A, either - 1. X is a superkey, or
- 2. A is prime member of at least one key.
- Thus, if we just normalize to the 3NF, the
problem goes away.
36What 3NF and BCNF Give You
- There are two important properties of a
decomposition - Recovery it should be possible to project the
original relations onto the decomposed schema,
and then reconstruct the original. - Dependency Preservation it should be possible
to check in the projected relations whether all
the given FDs are satisfied.
373NF and BCNF, Continued
- We can get (1) with a BCNF decomposition.
- We can get both (1) and (2) with a 3NF
decomposition. - But we cant always get (1) and (2) with a BCNF
decomposition. - street-city-zip is an example.
38Mutli-valued Dependencies
39Definition of MVD
- A multivalued dependency is an assertion that two
attributes (sets of attributes) are independent
of one another. - Formally A multivalued dependency (MVD) on R, X
-gt-gtY , says that if two tuples of R agree on
all the attributes of X, then their components in
Y may be swapped, and the result will be two
tuples that are also in the relation.
40Example
- Actors(name, addr, phones, cars) with MVD
- Name ?? phones.
- name addr phones cars
- sue a p1 b1
- sue a p2 b2
- it must also have the same tuples with phones
components swapped - name addr phones cars
- sue a p2 b1
- sue a p1 b2
- Note we must check this condition for all pairs
of tuples that agree on name, not just one pair.
41Example 2
- An actor may have more than one address
- Key? What normal form?
- Note the redundancies
- MVD name ?? street city
- read name determines 1 or more street city
independent of all other attributes
42MVD Rules
- 1. Every FD is an MVD.
- Because if X ?Y, then swapping Ys between tuples
that agree on X doesnt create new tuples. - Example, in Actors name ?? addr.
- Note the opposite is not true i.e. not every MVD
is a FD - 2. Complementation if X ?? Y, then X ?? Z, where
Z is all attributes not in X or Y. - Example since name ?? phones holds in Actors,
the name ?? addr cars.
43Splitting Doesnt Hold
- name ?? street city holds, but
- name ?? street does not hold
- Name does not determine 1 or more street
independent of city. - name ?? city does not hold
44Example 2
- An actor may have more than one address
- MVD name ?? street city
- read name determines 1 or more street city
independent of all other attributes - Also (complement MVD) name ?? title year
45Fourth Normal Form
- The redundancy that comes from MVDs is not
removable by putting the database schema in BCNF. - There is a stronger normal form, called 4NF, that
(intuitively) treats MVDs as FDs when it comes
to decomposition, but not when determining keys
of the relation.
464NF
- Eliminate redundancy due to multiplicative effect
of MVDs. - Roughly treat MVDs as FD's for decomposition,
but not for finding keys. - Formally R is in Fourth Normal Form if whenever
MVDX ?? Y is nontrivial (Y is not a subset of X,
and X ? Y is not all attributes), then X is a
superkey. - Remember, X ? Y implies X ?? Y, so 4NF is more
stringentthan BCNF. - Decompose R, using4NF violation X ?? Y,into XY
and X ? (RY).
R
Y
X
47Example
- Drinkers(name, addr, phones, cars)
- FD name ? addr
- Nontrivial MVDs name ?? phones
- name ?? cars.
- Only key name, phones, cars
- All three dependencies above violate 4NF. Why?
- Successive decomposition yields 4NF relations
- D1(name, addr)
- D2(name, phones)
- D3(name, cars)
48Example 2
- name ?? street city
- Decompose into
- R1(name, street, city)
- R2(name, title, year)