Introduction to Normalization - PowerPoint PPT Presentation

1 / 22
About This Presentation
Title:

Introduction to Normalization

Description:

Start with a list of all attributes, considered as if you had a ... ring Kay jewelers prince charming. ring walmart miss piggy. Place - item? Item - place? ... – PowerPoint PPT presentation

Number of Views:66
Avg rating:3.0/5.0
Slides: 23
Provided by: ellenw4
Learn more at: http://cs.hiram.edu
Category:

less

Transcript and Presenter's Notes

Title: Introduction to Normalization


1
Introduction to Normalization
  • CPSC 356 Database
  • Ellen Walker
  • Hiram College

2
Building a Schema
  • Start with a list of all attributes, considered
    as if you had a giant flat database (one
    relation) with all possible information in one
    place
  • Divide the attributes into multiple relations
  • Intuitively
  • According to formal rules (normalization)
  • This is a formalized alternative to the
    algorithms we learned before

3
What Makes A Good Schema?
  • Each relation should have clear semantics, i.e.
    can be easily described in a few words
  • Try to avoid redundancy (to minimize storage
    space, but also to avoid anomalies)
  • Avoid a design that encourages too many NULL
    values in a relation. NULL can be ambiguous N/A
    vs. unknown vs. not-yet-entered, etc.
  • Dont split related attributes so that the
    relationship between them is lost (e.g. make sure
    LastName and UserID are both in the same relation)

4
Tracking Real Estate Staff
  • Consider a single relation for real estate
  • It contains branch name, branch number, staff
    name, staff number, staff salary, etc.
  • One entry for each staff member of each branch
  • Branch information is repeated for different
    staff (REDUNDANCY!)
  • Staff information is repeated if they work in
    multiple branches (REDUNDANCY!)
  • This is an example of what NOT to do

5
Redundancy-caused Anomalies
  • Insertion Anomalies
  • A branch with no staff has many NULLs
  • Entering a new staff member has NULL branch info
  • But branch number and staff number are both part
    of primary key! (Why)
  • Deletion Anomalies
  • When the last staff member at a branch is
    deleted, the branch info is lost
  • Update Anomalies
  • If we make a change in branch info once, it must
    be changed in all copies (for all staff).

6
Solving Redundancy Problems
  • Decompose the relation into multiple relations
  • Use Foreign Keys so the complete relation can be
    reconstructed through a join
  • Branch has branch number branch info
  • Staff has staff number, staff info branch
    number as foreign key
  • Foreign Keys are exactly the attributes that are
    in the primary key of the other relation
  • Insertion, deletion update anomalies are gone!
  • Consider add branch with no staff, remove last
    staff member, update branch info

7
How to Decompose?
  • Decompositions are not (always) intuitively
    obvious
  • Codd discovered mathematical properties (called
    Normal Forms) that describe goodness of
    decomposition
  • First, Second, Third normal forms decrease
    redundancy without loss of information
  • BCNF, Fourth and Fifth normal forms potentially
    introduce information loss (we will see)
  • To understand normal forms, start with functional
    dependencies

8
Functional Dependencies
  • If A and B are attributes, and every value of A
    is associated with exactly one value of B (so
    knowing A predicts B), then B is functionally
    dependent on A (We write this as A-gtB)
  • Functional dependency is based on the semantics
    (meaning) of the attributes.
  • A-gtB and B-gtA are two different constraints
  • Email -gt first name is a valid dependency
  • First name -gt Email is not a valid dependency

9
Examples of Functional Dependencies
  • US Zip Code -gt State
  • US Area Code -gt State
  • Email -gt Firstname, lastname
  • HotelNo, RoomNo -gt Price
  • JobTitle, ServiceLength -gt Salary

10
What are the dependencies?
  • item place customer-name
  • ring Kay jewelers prince charming
  • ring walmart miss piggy
  • Place -gt item?
  • Item -gt place?
  • oil walmart tin man

11
Finding Dependencies in Data
  • If a value of attribute A is associated with two
    or more values of B, then it is not true that
    A-gtB.
  • If a value of attribute A is associated with
    exactly one value of B, then it might be true
    that A-gtB.
  • Only when every possible value of attribute A is
    associated with exactly one value of B is it true
    that A-gtB.

12
Characteristics of Functional Dependencies for
Normalization
  • For any given values of the attributes on the
    left, there is exactly one possible attribute on
    the right
  • No future data will ever invalidate the
    dependency
  • Dependency is nontrivial -- no attributes from
    the left are repeated on the right

13
Keys Functional Dependency
  • Remember, a candidate key is a subset of
    attributes that is (guaranteed) unique for every
    tuple
  • Therefore, a valid candidate key determines all
    other attributes in the tuple
  • Therefore, there is a functional dependency from
    the candidate key to all other non-key attributes
    of the relation.
  • (Since the primary key is a candidate key, these
    arguments can also be made for primary keys)

14
Manipulating Functional Dependencies
  • Given a set of dependencies, derive more
    dependencies using inference rules
  • The closure X of a set of dependencies is the
    set of all possible dependencies that can be
    derived from it.

15
Armstrongs Inference Rules for Manipulating
Dependencies
  • if Y is a subset of X, then X -gt Y
  • Alternatively X,Y -gt X (Reflexive)
  • If X-gtY then X,Z-gtY,Z (Augmentation)
  • If X-gtY and Y-gtZ then X-gtZ (Transitive)

16
Additional Inference Rules
  • A-gtA (Self-determination)
  • If A-gtB,C then A-gtB and A-gtC (Decomposition)
  • If A-gtB and A-gtC then A-gtB,C (Union)
  • If A-gtB and C-gtD then A,C -gt B,D (Composition)

17
When are two sets of FDs equivalent?
  • When we can use inference rules to transform A to
    B , then A and B are equivalent
  • Problem it might take a long time to find the
    right set of inference rules
  • What we need is a standard form of FDs - then
    we can just compare

18
Finding the Closure
  • F is a set of functional dependencies (e.g. the
    obvious ones from primary keys) We want to find
    X, which is the set of all attributes that are
    dependent on X (based on F).
  • X X
  • repeat
  • for each dependency Y-gtZ in F do
  • if Y is a subset of X then X X union Z
  • until no more can be added to X

19
Closure Example
  • F is the following set of dependencies
  • A-gtB,C C-gtD A,D -gt F
  • What is A (all attributes that can be derived
    from A)?
  • Initialize A A
  • Because A-gtB,C add B,C to A
  • Because C is in A and C-gtD, add D to A
  • Because A and D are in A, add F to A
  • Therefore A is A,B,C,D,F

20
Equivalence Test
  • Are the following sets of FDs equivalent?
  • AB-gtC, D-gtE, AE-gtG, GD-gtH, ID-gtJ
  • ABD-gtC, ABE-gtG, GD-gtEH, IE-gtJ
  • Compute closures for each, if any two are
    different, they are not equivalent
  • You will need to consider every left side

21
Finding a Key
  • Given a relation with attributes ABCDEFGHIJ and
    the following FDs, find a candidate key for the
    relation
  • AB-gtC, D-gtE, AE-gtG, GD-gtH, ID-gtJ
  • A candidate key is a subset of attributes that
    has the entire set of attributes as its closure.
  • Lets try ABD

22
What is Normalization?
  • Formal technique for analyzing relations based on
    primary key (or candidate keys) and functional
    dependencies
  • Series of tests (normal forms), each of which is
    harder to pass
  • Normal forms 1NF, 2NF, 3NF, BCNF depend on
    functional dependencies
  • Higher forms (4NF, 5NF) based on other
    dependencies
  • To avoid update anomalies without loss, normalize
    to 3NF.
Write a Comment
User Comments (0)
About PowerShow.com