Title: Database Design: Normalization
1Database Design Normalization
2In this lecture you will learn
- Mathematical notions behind relational model
- Anomalies
- How tables that contain redundant data can suffer
from anomalies, which can introduce
inconsistencies into a database. - Normalization
- The rules associated with the most commonly used
normal forms, namely first (1NF), second (2NF),
and third (3NF).
3Introduction
- Relations derived from ER model may be faulty
- May cause data redundancy
- May cause insert/delete/update anomalies
- We use some mathematical properties of relations
to - locate these faults
- fix them
4Mathematical notions behind relational model
- Set a collection of objects characterized by
some defining property - E.g. a column in a database table such as last
names of all staff - Cross Product of sets one of the operations (X)
on sets - E.g. consider two sets, set of all first names
and set of all last names in the staff table - fName Mary, David
- lName Howe, Ford
- fNameXlName (Mary,Howe), (Mary,Ford), (David,
Howe), (David, Ford) - Relation defined between two sets and is a
subset of cross product between those two sets - E.g. FirstNameOf (Mary, Howe), (David, Ford)
5Relational model
- The name relational model comes from this
mathematical notion of relation - Where a relation is a set (collection) of tuples
that have related objects such as first name and
last name of the same person - E.g. (fName, lName) is a relation
- We can have relations over any number of sets
- E.g. (staffNo, fName, lName, position)
- In general we can denote a relation as
(A,B,C,D,.,Z) where A, B, C and Z are all its
attribute sets
6Function
- A function is a special kind of relation
- In a relation (X,Y), if every value of X is
associated with exactly one value of Y, then we
say Y is a function of X. - E.g. the relation (1,2),(2,4),(3,6),(4,8) is a
function, Y 2X for 0ltXlt5
Y
X
Only one arrow can start from any single value in
X
1 2 3 4
2 4 6 8
7Functional Dependency
- If Y is a function of X
- Y is dependent on X
- there is a relationship of functional dependency
between Y and X - If we know the value of X, we find only one value
of Y in all the records that has this value of X,
at any moment in time. - In databases, we work with relations in general
form (A,B,C,D,,Z)
8Functional Dependency
- Functional Dependency
- Describes relationship between
- attributes in a relation
- or columns in a table.
-
- If A and B are columns of table R
- B is functionally dependent on A
- if each value of A in R is associated with
exactly one value of B in R. - We are interested in finding such functional
dependencies among database relations
9Functional Dependency
- Functional dependency is a property of the
meaning of the attributes in a relation (or the
columns in a table). - Diagrammatic representation
10Functional Dependency
- Determinant of a functional dependency refers to
attribute or group of attributes on left-hand
side of the arrow. - If the determinant can maintain the functional
dependency with a minimum number of attributes,
then we call it full functional dependency
11Data Redundancy
- Major aims of relational database design
- to group columns into tables to minimize data
redundancy - to reduce file storage space required by base
relations.
12Data Redundancy
- Tables that contain redundant information may
potentially suffer from update anomalies - Insert anomalies
- Delete anomalies
- Update anomalies
- We illustrate these anomalies with an example.
13Data redundancy and anomalies
Staff_Branch
14Data redundancy and update anomalies
- StaffBranch table has redundant data
- details of a branch are repeated for every member
of staff. - In contrast, consider Branch and Staff tables
- branch information appears only once for each
branch in the Branch table - only the branch number (branchNo) is repeated in
the Staff table, to represent where each member
of staff is located.
15Data redundancy and anomalies
Staff
Branch
16Insert Anomalies
- Try to insert details for a new member of staff
into StaffBranch. - You also need to insert branch details that are
consistent with existing details for the same
branch. - Hard to maintain data consistency with StaffBranch
17Delete Anomalies
- Try to delete details for a member of staff from
StaffBranch. - You also lose branch details in that row (tuple).
18Update Anomalies
- Try to update the value of one of the attributes
of a branch. - You also need to update that information in all
the rows about the same branch.
19Decomposition of Relations
- Staff and Branch relations which are obtained by
decomposing StaffBranch do not suffer from these
anomalies. - Two important properties of decomposition
- Lossless-join property enables us to find any
instance of original relation from corresponding
instances in the smaller relations. - Dependency preservation property enables us to
enforce a constraint on original relation by
enforcing some constraint on each of the smaller
relations.
20Normalization
- A technique for producing a set of tables with
desirable properties that support the
requirements of a user or company.
21The Process of Normalization
- Formal technique for analyzing a relation based
on - its primary key
- functional dependencies between attributes.
- Often executed as a series of steps.
- Each step corresponds to a specific normal form,
which has known properties. - As normalization proceeds, relations become
- progressively more restricted in format
- less vulnerable to update anomalies.
22The Process of Normalization
- Given a relation, use the following cycle
- Find out what normal form it is in
- Transform the relation to the next higher form by
decomposing it to form simpler relations - You may need to refine the relation further if
decomposition resulted in undesirable properties
23Normalization Flow
UNF
Remove repeating groups
1NF
Remove partial dependencies
2NF
Remove transitive dependencies
3NF
More normalized forms
24Unnormalized Form (UNF)
- A table that contains one or more repeating
groups. - To create an unnormalized table
- transform data from information source (e.g.
form) into table format with columns and rows.
Example 1 address and name fields are composite
25Another example of UNF
Example 2 repeating columns for each client
composite name field
26UNF to 1NF
- Remove repeating group by
- entering appropriate data into the empty columns
of rows containing repeating data (flattening
the table). - Or by
- placing repeating data along with copy of the
original key attribute(s) into a separate table.
27First Normal Form (1NF)
- A relation in which intersection of each row and
column contains one and only one value. - Steps from UNF to 1NF
- Nominate an attribute or group of attributes to
act as the key for the unnormalized table. - Identify repeating group(s) in unnormalized table
which repeats for the key attribute(s).
28Branch table in UNF
29Converting Branch table to 1NF
30Second Normal Form (2NF)
- Based on concept of full functional dependency
- A and B are attributes of a relation R,
- B is fully dependent on A (denoted A-gtB) if B is
functionally dependent on A but not on any proper
subset of A. - 2NF only applies to tables with composite primary
keys. - 2NF is
- A table that is in 1NF and
- In which the values of each non-primary-key
column can be worked out from the values in all
the columns that make up the primary key.
311NF to 2NF
- Steps
- Identify primary key for the 1NF relation.
- Identify functional dependencies in the relation.
- If partial dependencies exist on the primary key
remove them by placing them in a new relation
along with copy of their determinant.
32TempStaffAllocation table is not in 2NF
33Converting TempStaffAllocation table to 2NF
34Third normal form (3NF)
- A table that is
-
- in 1NF and 2NF and
- in which all non-primary-key column can be worked
out from only the primary key column(s) and no
other columns.
35Third Normal Form (3NF)
- Based on concept of transitive dependency
- A, B and C are attributes of a relation such that
if A ? B and B ? C, - then C is transitively dependent on A via B.
(Provided that A is not functionally dependent on
B or C). - 3NF
- A relation that is in 1NF and 2NF and
- no non-primary-key attribute is transitively
dependent on the primary key.
362NF to 3NF
- Steps
- Identify the primary key in the 2NF relation.
- Identify functional dependencies in the relation.
- If transitive dependencies exist on the primary
key, remove them by placing them in a new
relation along with copy of their determinant.
37StaffBranch table is not in 3NF
38Converting the StaffBranch table to 3NF
39Review Normalization Flow
UNF
Remove repeating groups
1NF
Remove partial dependencies
2NF
Remove transitive dependencies
3NF
More normalized forms
40Conclusion
- Quality of the relations derived from ER models
is unknown. - Normalization is a systematic process of either
assessing or converting these relations into
progressively stricter normal forms. - Advanced normal forms such as Boyce-Codd normal
form (BNCF), 4NF and 5NF exist.