Title: Database Management
1Database Management Fall 2003 Normalization C
hapter 8
2Why normalization?
o An old approach. (Some history ) o There
are people out there who still talk and
think in these terms and so your knowledge
of normalization may be taken as one
measure of your preparedness in the database
area. o Normalization is part of the theory
behind relational databases. Understanding it
(and other aspects of the theory) will give
you a firmer grasp of the entire
topic. o Normalization remains a useful form
of cross-checking, when reviewing data models
for completeness and consistency. However,
3If the principles of data modeling are followed
faithfully, then the outcome should be a
high-fidelity model and a normalized database.
In other words, if you model data correctly, you
create a normalized design. (Watson, p. 206)
4Functional dependency
- Core Concept
- A relationship between attributes in an entity
- One or more attributes determine the value of
another attribute - An identifier functionally determines all the
attributes of an entity - stock code ? firm name, stock price, stock
quantity, stock dividend - If we know stock code we know the value of firm
name, etc. - Multivalued dependency
- Formulae
- (stock dividend, stock price) ? yield
5Full functional dependency
- Yield is fully functionally dependent on stock
dividend and stock price because both of these
attributes are required to determine the value of
yield - (stock dividend, stock price) ? yield
- Determinant
- An attribute that fully functionally determines
another attribute - e.g., stock code determines stock PE
- Think identifier (primary key)
6In short,
it is by understanding the relationships betwe
en data elements i.e., attributes that we
determine functional dependency. (Watson, p. 207)
Accordingly, normalization is something we do at
the attribute level, in order to help us confirm
that our entities, their lists of attributes, and
their inter-relationships are properly drawn.
7The normal forms an overview
Normalizing a relation (entity, table) means
getting it into normal form which entails o
first normal form (1NF) o second normal form
(2NF) o third normal form (3NF) o Boyce-Codd
normal form (BCNF) o fourth normal form (4NF) o
fifth normal form (5NF) o domain/key normal
form (DK/NF)
The normal forms are nested. That is, if its
in 2NF, its also in 1NF. If its in 3NF, then
its in both 2NF and 1NF. And so on
8Normal Forms are nested
1st NF
2nd NF
3rd NF
BC NF
4TH NF
5TH NF
9First normal form (1NF)
- A relation is in first normal form if and only
if all columns are single-valued. In other
words, no multi-valued attributes! - All rows must have the same number of columns
10Second normal form (2NF)
A relation is in second normal form if and only
if it is in first normal form and all non-key
columns are dependent on the entire key. In
other words, no attribute should be determined by
just part of the primary key.
11Second normal form (2NF)
- Violated when a nonkey column is a fact about
part of the primary key - A column is not fully functionally dependent on
the primary key - customer-credit in this case
order
itemno customerid quantity customer-credit
12 57 25 OK
34 679 3 POOR
ITEM itemno
CUSTOMER customerid customer-credit
ORDER quantity custid (FK) itemno (FK)
12Third normal form (3NF)
A relation is in third normal form if and only if
it is in second normal form and has no transitive
dependencies. This means that no non-key
attribute should be determined by another non-key
attribute.
13Third normal form (3NF)
- Violated when a non-key column is a fact about
another non-key column - A column is not fully functionally dependent on
the primary key - EXCHANGE RATE in this case
stock
stock code nation exchange rate
MG USA 0.67
IR AUS 0.46
STOCK
NATION
stock code
nation code
firm name
nation name
stock price
exchange rate
stock quantity
stock dividend
stock PE
natcode (FK)
14Summary of Normal Forms 1-3
- Every attribute must depend on
- the key
- the whole key
- and nothing but the key
- so help me Codd!
15Boyce-Codd normal form (BCNF)
- Arises when a table
- has multiple candidate keys
- the candidate keys are composite
- the candidate keys overlap
16Fourth normal form (4NF)
- A row should not contain two or more multivalued
independent facts
student
studentid sport subject
50 Football English
50 Football Music
50 Tennis Botany
50 Karate Botany
STUDENT-
STUDENT
ENROLMENT
SPORT
studentid
sectionno
proficiency
SUBJECT
SPORT
subjectid
sportcode
17A relation is in Boyce-Codd normal form if and
only if every determinant is a candidate
key. This means that no key attribute should be
determined by a non-key attribute or another key
attribute.
Boyce-Codd normal form (BCNF)
Adapted from McFadden et al., 1999
18BCNF The problem resolved (although it appears
to create a new problem)
19Fifth normal form (5NF)
- A table can be reconstructed from other tables
- There exists some rule that enables a relation to
be inferred
20- Agent is assigned to customer
21As noted, theres also a domain/key normal form
(DK/NF). However . . . Relations in third
normal form (3NF) are sufficient for most
practical database applications.
Moreover Violations of 5NF occur very rarely
and are difficult to detect in practice.
And The practical utility of DK/NF is
quite limited. (McFadden et al., 1999)
22So, the overall point of normalization . . .
to eliminate problems in the design of database
tables that can lead to modification anomalies
(insertion, deletion, update). Which is great,
but remember if you model data correctly,
you create a normalized design. (Watson, p.
206) (Note that resolving violations of normal
form generally involves adding entities which
makes a strong case for a top-down, business-dr
iven approach to data modeling.)