Data Normalization - PowerPoint PPT Presentation

1 / 31
About This Presentation
Title:

Data Normalization

Description:

Primarily a tool to validate and improve a logical design so ... Homonyms: Same name, different meanings (e.g., Address - Campus address or permanent address? ... – PowerPoint PPT presentation

Number of Views:53
Avg rating:3.0/5.0
Slides: 32
Provided by: miche244
Category:

less

Transcript and Presenter's Notes

Title: Data Normalization


1
Data Normalization
  • A formal process for deciding which attributes
    should be grouped together in a relation
  • Primarily a tool to validate and improve a
    logical design so that it satisfies certain
    constraints that avoid unnecessary duplication of
    data
  • The process of decomposing relations with
    anomalies to produce smaller, well-structured
    relations

2
Well-Structured Relations
  • A relation that contains minimal data redundancy
    and allows users to insert, delete, and update
    rows without causing data inconsistencies
  • Goal is to avoid anomalies
  • Insertion Anomalyadding new rows forces user to
    create duplicate data
  • Deletion Anomalydeleting rows may cause a loss
    of data that would be needed for other future
    rows
  • Modification Anomalychanging data in a row
    forces changes to other rows because of
    duplication

General rule of thumb A table should not pertain
to more than one entity type
3
ExampleFigure 5-2b
QuestionIs this a relation?
AnswerYes Unique rows and no multivalued
attributes
QuestionWhats the primary key?
AnswerComposite Emp_ID, Course_ID
4
Anomalies in this Table
  • Insertioncant enter a new employee without
    having the employee take a class
  • Deletionif we remove employee 140, we lose
    information about the existence of a Tax Acc
    class
  • Modificationgiving a salary increase to employee
    100 forces us to update multiple records

Why do these anomalies exist? Because there are
two themes (entity types) in this one relation.
This results in data duplication and an
unnecessary dependency between the entities
5
Objective of normalization
  • Minimize data redundancy, avoiding anomalies and
    conserving storage space
  • Simplify the enforcement of referential integrity
    constraint
  • Make it easy to maintain data
  • Provide a better design

6
Steps in normalization
  • Normal form a state of a relation that can be
    determined by applying simple rules regarding
    functional dependencies (or relationships between
    attributes)
  • First normal form (multivalued attributes)
  • Second normal form (partial functional
    dependency)
  • Third normal form (transitive functional
    dependency)

7
Figure 5.22 Steps in normalization
8
Functional dependency
  • Functional dependency is a particular
    relationship between two attributes or two sets
    of attributes The value of one attribute (the
    determinant) determines the value of other
    attributes
  • For any relation R, the attribute B is
    functionally dependent on A if for every instance
    of A, that value of A uniquely determines the
    value of B
  • Represented as A ? B
  • Normalization is based on the analysis of
    functional dependency

9
Determinant
  • Determinant The attribute on the left-hand side
    of the arrow in a functional dependency
  • A primary key is always a determinant,
  • while a determinant may or may not be
  • a primary key

10
Examples of Functional Dependency
  • SSN ? NAME, ADDRESS, BIRTHDATE
  • A persons name, address and birthdate are
    functionally dependent on that persons social
    security number
  • VIN ? MAKE, MODEL, COLOR
  • The make, model and color of a vehicle are
    functionally dependent on the vehicle
    identification number
  • ISBN ? TITLE
  • The title of a book is functionally dependent on
    the books international standard book number
    (ISBN)

11
Candidate Keys
  • Candidate Key
  • An attribute, or combination of attributes, that
    uniquely identifies a row in a relation
  • Unique identification
  • Nonredundancy
  • One of the candidate keys will become the primary
    key
  • Each non-key field is functionally dependent on
    every candidate key

12
Key and functional dependency
13
First Normal Form
  • No multivalued attributes (or no repeating group)
  • Every attribute value is atomic
  • Primary key has been defined
  • Fig. 5-25 is not in 1st Normal Form (multivalued
    attributes) ? it is not a relation
  • Fig. 5-26 is in 1st Normal form
  • All relations are in 1st Normal Form

14
Case of Pine valley furniture
15
Table with multivalued attributes, not in 1st
normal form
Note this is NOT a relation
16
Table with no multivalued attributes and unique
rows, in 1st normal form
Note this is relation, but not a well-structured
one
17
Functional dependency for invoice
18
Anomalies in this Table
  • Insertionif new product is developed and has not
    yet been ordered by any customer, what if you
    want to insert the record of this product into
    the table?
  • Deletionif we delete the Dining Table from Order
    1006, we lose information concerning this item's
    finish and price
  • Updatechanging the price of product ID 4
    requires update in several records

Why do these anomalies exist? Because there are
multiple themes (entity types) in one relation.
This results in duplication and an unnecessary
dependency between the entities
19
Second Normal Form
  • 1NF PLUS every non-key attribute is fully
    functionally dependent on the ENTIRE primary key
  • Every non-key attribute must be defined by the
    entire key, not by only part of the key
  • No partial functional dependencies--a nonkey
    attribute is functionally dependent on part of
    the primary key

20
Figure 5-27 Functional dependency diagram for
INVOICE
Order_ID ? Order_Date, Customer_ID,
Customer_Name, Customer_Address
Customer_ID ? Customer_Name, Customer_Address
Product_ID ? Product_Description, Product_Finish,
Unit_Price
Order_ID, Product_ID ? Order_Quantity
Therefore, NOT in 2nd Normal Form
21
Convert to 2NF
  • Create a new relation for each primary key
    attribute that is a determinant in a partial
    dependency. That attribute is the primary key in
    the new relation
  • Move the nonkey attributes that are dependent on
    this primary key from the old relation to the new
    relation

22
Figure 5-28 Removing partial dependencies
Getting it into Second Normal Form
Partial dependencies are removed, but there are
still transitive dependencies
23
Third Normal Form
  • 2NF PLUS no transitive dependencies (functional
    dependencies on non-primary-key attributes)
  • Note This is called transitive, because the
    primary key is a determinant for another
    attribute, which in turn is a determinant for a
    third
  • Solution Non-key determinant with transitive
    dependencies go into a new table non-key
    determinant becomes primary key in the new table
    and stays as foreign key in the old table

24
Figure 5-28 Removing partial dependencies
Getting it into Third Normal Form
Transitive dependencies are removed
25
Relational scheme for invoice
26
Additional Normal Forms
  • Relations in third normal form are sufficient for
    most practical database applications
  • However, 3NF does not guarantee that all
    anomalies have been removed
  • There are additional normal forms to remove them
  • Boyce-Codd Normal Form
  • Fourth Normal Form
  • Fifth Normal Form
  • Domain Key Normal Form

27
final words on normalization
  • The key
  • The whole key
  • Nothing but the key

28
Merging Relations(View Integration Problems)
  • As part of the logical design process, normalized
    relations may have been created from a number of
    separate E-R diagrams and (possibly) other user
    views.
  • Some of the relations may be redundant that is,
    they may refer to the same entities.
  • If so, we should merge those relations to remove
    the redundancy.

29
Merging Relations(View Integration Problems)
  • Synonyms Different names, same
    meaning(e.g., Student_No and Student_ID)
  • STUDENT1 (Student_ID, Name)
  • STUDENT2 (Student , Name, Address)
  • Homonyms Same name, different
    meanings(e.g., Address - Campus address or
    permanent address?)
  • STUDENT1 (Student_ID, Name, Address)
  • STUDENT2 (Student , Name, Phone, Address)
  • STUDENT (Student_ID, Name, Phone_No,
    Campus_Address, Permanent_Address)
  • STUDENT (SSN, Name, Address)

30
Merging Relations(View Integration Problems)
  • Transitive Dependencies may create transitive
    dependencies when merging two 3NF relations
    together
  • STUDENT1 (Student_ID, Major)
  • STUDENT2 (Stu_ID, Advisor))
  • STUDENT (Student_ID, Major, Advisor)
  • STUDENT (Student_ID, Major)
  • MAJOR ADVISOR (Major, Advisor)

31
Merging Relations(View Integration Problems)
  • Supertype/Subtype These relationships may be
    hidden in user views or relations
  • PATIENT1 (Patient_ID, Name, Address)
  • PATIENT2 (Patient_ID, Room_No)
  • Two different types of patients PATIENT1
    contains attributes common to all patients.
    PATIENT2 contains an attribute (Room_No) that is
    a characteristic only of inpatients.
  • Create supertype/subtype relationships for these
    entities
  • PATIENT1 (Patient_ID, Name, Address)
  • INPATIENT (Patient_ID, Room_No)
  • OUTPATIENT (Patient_ID, Date_Treated)
Write a Comment
User Comments (0)
About PowerShow.com