Title: Data Normalization
1Data Normalization
- A formal process for deciding which attributes
should be grouped together in a relation - Primarily a tool to validate and improve a
logical design so that it satisfies certain
constraints that avoid unnecessary duplication of
data - The process of decomposing relations with
anomalies to produce smaller, well-structured
relations
2Well-Structured Relations
- A relation that contains minimal data redundancy
and allows users to insert, delete, and update
rows without causing data inconsistencies - Goal is to avoid anomalies
- Insertion Anomalyadding new rows forces user to
create duplicate data - Deletion Anomalydeleting rows may cause a loss
of data that would be needed for other future
rows - Modification Anomalychanging data in a row
forces changes to other rows because of
duplication
General rule of thumb A table should not pertain
to more than one entity type
3ExampleFigure 5-2b
QuestionIs this a relation?
AnswerYes Unique rows and no multivalued
attributes
QuestionWhats the primary key?
AnswerComposite Emp_ID, Course_ID
4Anomalies in this Table
- Insertioncant enter a new employee without
having the employee take a class - Deletionif we remove employee 140, we lose
information about the existence of a Tax Acc
class - Modificationgiving a salary increase to employee
100 forces us to update multiple records
Why do these anomalies exist? Because there are
two themes (entity types) in this one relation.
This results in data duplication and an
unnecessary dependency between the entities
5Objective of normalization
- Minimize data redundancy, avoiding anomalies and
conserving storage space - Simplify the enforcement of referential integrity
constraint - Make it easy to maintain data
- Provide a better design
6Steps in normalization
- Normal form a state of a relation that can be
determined by applying simple rules regarding
functional dependencies (or relationships between
attributes) - First normal form (multivalued attributes)
- Second normal form (partial functional
dependency) - Third normal form (transitive functional
dependency)
7Figure 5.22 Steps in normalization
8Functional dependency
- Functional dependency is a particular
relationship between two attributes or two sets
of attributes The value of one attribute (the
determinant) determines the value of other
attributes - For any relation R, the attribute B is
functionally dependent on A if for every instance
of A, that value of A uniquely determines the
value of B - Represented as A ? B
- Normalization is based on the analysis of
functional dependency
9Determinant
- Determinant The attribute on the left-hand side
of the arrow in a functional dependency
- A primary key is always a determinant,
- while a determinant may or may not be
- a primary key
10Examples of Functional Dependency
- SSN ? NAME, ADDRESS, BIRTHDATE
- A persons name, address and birthdate are
functionally dependent on that persons social
security number - VIN ? MAKE, MODEL, COLOR
- The make, model and color of a vehicle are
functionally dependent on the vehicle
identification number - ISBN ? TITLE
- The title of a book is functionally dependent on
the books international standard book number
(ISBN)
11Candidate Keys
- Candidate Key
- An attribute, or combination of attributes, that
uniquely identifies a row in a relation - Unique identification
- Nonredundancy
- One of the candidate keys will become the primary
key - Each non-key field is functionally dependent on
every candidate key
12Key and functional dependency
13First Normal Form
- No multivalued attributes (or no repeating group)
- Every attribute value is atomic
- Primary key has been defined
- Fig. 5-25 is not in 1st Normal Form (multivalued
attributes) ? it is not a relation - Fig. 5-26 is in 1st Normal form
- All relations are in 1st Normal Form
14Case of Pine valley furniture
15Table with multivalued attributes, not in 1st
normal form
Note this is NOT a relation
16Table with no multivalued attributes and unique
rows, in 1st normal form
Note this is relation, but not a well-structured
one
17Functional dependency for invoice
18Anomalies in this Table
- Insertionif new product is developed and has not
yet been ordered by any customer, what if you
want to insert the record of this product into
the table? - Deletionif we delete the Dining Table from Order
1006, we lose information concerning this item's
finish and price - Updatechanging the price of product ID 4
requires update in several records
Why do these anomalies exist? Because there are
multiple themes (entity types) in one relation.
This results in duplication and an unnecessary
dependency between the entities
19Second Normal Form
- 1NF PLUS every non-key attribute is fully
functionally dependent on the ENTIRE primary key - Every non-key attribute must be defined by the
entire key, not by only part of the key - No partial functional dependencies--a nonkey
attribute is functionally dependent on part of
the primary key
20Figure 5-27 Functional dependency diagram for
INVOICE
Order_ID ? Order_Date, Customer_ID,
Customer_Name, Customer_Address
Customer_ID ? Customer_Name, Customer_Address
Product_ID ? Product_Description, Product_Finish,
Unit_Price
Order_ID, Product_ID ? Order_Quantity
Therefore, NOT in 2nd Normal Form
21Convert to 2NF
- Create a new relation for each primary key
attribute that is a determinant in a partial
dependency. That attribute is the primary key in
the new relation - Move the nonkey attributes that are dependent on
this primary key from the old relation to the new
relation
22Figure 5-28 Removing partial dependencies
Getting it into Second Normal Form
Partial dependencies are removed, but there are
still transitive dependencies
23Third Normal Form
- 2NF PLUS no transitive dependencies (functional
dependencies on non-primary-key attributes) - Note This is called transitive, because the
primary key is a determinant for another
attribute, which in turn is a determinant for a
third - Solution Non-key determinant with transitive
dependencies go into a new table non-key
determinant becomes primary key in the new table
and stays as foreign key in the old table
24Figure 5-28 Removing partial dependencies
Getting it into Third Normal Form
Transitive dependencies are removed
25Relational scheme for invoice
26Additional Normal Forms
- Relations in third normal form are sufficient for
most practical database applications - However, 3NF does not guarantee that all
anomalies have been removed - There are additional normal forms to remove them
- Boyce-Codd Normal Form
- Fourth Normal Form
- Fifth Normal Form
- Domain Key Normal Form
27final words on normalization
- The key
- The whole key
- Nothing but the key
28Merging Relations(View Integration Problems)
- As part of the logical design process, normalized
relations may have been created from a number of
separate E-R diagrams and (possibly) other user
views. - Some of the relations may be redundant that is,
they may refer to the same entities. - If so, we should merge those relations to remove
the redundancy.
29Merging Relations(View Integration Problems)
- Synonyms Different names, same
meaning(e.g., Student_No and Student_ID) - STUDENT1 (Student_ID, Name)
- STUDENT2 (Student , Name, Address)
- Homonyms Same name, different
meanings(e.g., Address - Campus address or
permanent address?) - STUDENT1 (Student_ID, Name, Address)
- STUDENT2 (Student , Name, Phone, Address)
- STUDENT (Student_ID, Name, Phone_No,
Campus_Address, Permanent_Address)
- STUDENT (SSN, Name, Address)
30Merging Relations(View Integration Problems)
- Transitive Dependencies may create transitive
dependencies when merging two 3NF relations
together - STUDENT1 (Student_ID, Major)
- STUDENT2 (Stu_ID, Advisor))
- STUDENT (Student_ID, Major, Advisor)
- STUDENT (Student_ID, Major)
- MAJOR ADVISOR (Major, Advisor)
-
31Merging Relations(View Integration Problems)
- Supertype/Subtype These relationships may be
hidden in user views or relations - PATIENT1 (Patient_ID, Name, Address)
- PATIENT2 (Patient_ID, Room_No)
- Two different types of patients PATIENT1
contains attributes common to all patients.
PATIENT2 contains an attribute (Room_No) that is
a characteristic only of inpatients. - Create supertype/subtype relationships for these
entities - PATIENT1 (Patient_ID, Name, Address)
- INPATIENT (Patient_ID, Room_No)
- OUTPATIENT (Patient_ID, Date_Treated)