Title: Normalization of Tables
1Normalization of Tables
Between two evils, choose neither between two
goods, choose both. Tryon Edwards
2Steps to E-R Transformation
- 1. Identify entities
- 2. Identify relationships
- 3. Determine relationship type
- 4. Determine level of participation
- 5. Assign an identifier for each entity
- 6. Draw completed E-R diagram
- 7. Deduce a set of preliminary skeleton tables
along with a proposed primary key for each table
(using rules provided) - 8. Develop a list of all attributes of interest
(not already listed and systematically assign
each to a table in such a way to achieve a 3NF
design (i.e., no repeating groups, no partial
dependencies, and no transitive dependencies)
3Tables
- Database design is the process of separating
information into multiple tables that are related
to each other - Single table designs work only for the simplest
of situations in which data integrity problems
are easy to correct - Anomalies (abnormalities) often arise in single
table designs as a result of inserting, deleting,
or updating records - Some tables are better structured than others
(i.e., result in fewer anomalies)
4Redundancy
- Unnecessary repetition or duplication of data
- increases likelihood of errors due to keying
inconsistencies
5Multi-valued Problems
- Solution 1? Include all authors names in a
single field - Difficult to search for a single authors name or
create an alphabetical list of authors
6Multi-valued Problems
- Solution 2? Add multiple columns, one for each
value - empty fields waste storage space
- awkward to search across fields (e.g., Any books
by Snoopy? Must search Author1, Author2, etc.) - necessitates the creation of a new column every
time a book has an additional author
7Multi-valued Problems
- Solution 3? Add multiple rows, one for each
value - Data about a book must be repeated for as many
times as there are authors of a book (also
creates redundancy which lead to keying errors
and unnecessarily wasting storage space with
large files) - count of total of books or from each
publisher would be wrong
8Update Anomalies
- Update Anomalies
- To update an agents telephone number, each
instance must be changed - if we miss an item or enter it incorrectly we
create an unreliable table
- An update anomaly occurs when multiple record
changes for a single attribute are necessary.
9Deletion Anomalies
- Deletion anomalies
- What happens if a customer record is deleted?
- What happens if an agent record is deleted?
- A deletion anomaly occurs when the removal of a
record results in the unintended loss of
important information.
10Insertion Anomalies
- Insertion anomalies
- What happens if we want to enter information
regarding an agent for whom we do not have a
customer? - Do we add null values (blanks) for the other
fields?
- An insertion anomaly occurs when there is not a
reasonable place to assign attributes and
attribute values to records.
11The Problem with Nulls
1. Nulls used in mathematical expressions -
unknown quantity leads to unknown total value -
misleading value of all inventory
2. Nulls used in aggregate functions - blanks
exist under category - cannot be counted because
they dont exist!
12Database Design Problems
- Use of the relational database model removes some
database anomalies - Further removal of database anomalies relies on a
structured technique called normalization - Presence of some of these anomalies is sometimes
justified in order to enhance performance - Thus, database design consists of balancing the
art of design with the science of design
13Normalization
- Goal in database design to create well-structured
tables - Transform E-R models to tables following the
rules provided - Assuring tables are well-structured with minimal
problems (redundancy, multi-valued attributes,
update anomalies, insertion anomalies, deletion
anomalies) is achieved using structured technique
called normalization - Normalization is the structured decomposition of
one table into two or more tables using a
procedure designed to determine the most
appropriate split - Normalization our method of making sure the E-R
design was correct in the first place - Normalization refers to a series of forms we
will cover 1NF to 3NF, which is usually
sufficient. Note that there are also 4NF,
Boyce-Codd Normal Form (BCNF), Fifth Normal Form
(5NF) and Domain-Key Normal Form (DKNF)
14First Normal Form
- A table is in first normal form if it meets the
following criteria The data are stored in a
two-dimensional table with no two rows identical
and there are no repeating groups. - The following table in NOT in first normal form
because it contains a multi-valued attribute (an
attribute with more than one value in each row).
15Handling multi-valued attributes Incorrect
Solutions
16Handling multi-valued attributes Correct Solution
- Create another entity (table) to handle multiple
instances of the repeating group. This second
table is then linked to the original table with
an identifier (i.e., foreign key). This solution
has the following advantages - no limit to the number of hobbies per member
- no waste of disk space
- searching becomes much easier within a column
(e.g., who likes hiking?)
17Handling Repeating Groups
- An attribute can have a group of several data
entries. Repeating groups can be removed by
creating another table which holds those
attributes that repeat. This second table
(validation table) is then linked to the original
table with an identifier (i.e., foreign key) - Advantages fewer characters tables reduces
miskeying, update anomalies
18Second Normal Form
- A table is in second normal form if it meets the
following criteria The relation is in first
normal form, and, all nonkey attributes are
functionally dependent on the entire primary key. - Applies only to tables that have a composite
primary key. - In the following table, both the EmpID and
Training (composite primary key) determine Date,
whereas, only EmpID (part of the primary key)
determines Dept.
19Removing Partial Dependencies
- Remove partial dependencies by separating the
relation into two relations. Reduces the
problems of - update anomalies
- delete anomalies
- insert anomalies
- redundancies
20Third Normal Form
- A table is in third normal form if it meets the
following criteria The relation is in second
normal form, and, a nonkey field is not
functionally dependent on another nonkey field. - The following table is in second normal form but
NOT in third normal form because Member_Id (the
primary key) does not determine every attribute
(does not determine RegistrationFee).
RegistrationFee is determined by Sport.
Member ID ? FName, LName, Lesson Lesson ? Cost
21Removing non-key Transitive Dependencies
- Remove transitive dependencies by placing
attributes involved in a new relational table.
Reduces the problems of - update anomalies
- delete anomalies
- insert anomalies
- redundancies
22Normalization Example Video Store
- A video rental shop tracks all of their
information in one table. There are now 20,000
records in it. Is it possible to achieve a more
efficient design? (They charge 10/movie/day.)
VIDEO (Cust_name, Cust_address, Cust_phone,
Rental_date, Video_1, Video_2, Video_3,
VideoType_1, VideoType_2, VideoType3,
Return_date, Total_Price, Paid?)
23Is the Video store in 1NF?
- No attributes should form repeating groups -
remove them by creating another table. There are
repeating groups for videos and customers.
CUSTOMER (Cust_Num, Cust_Name, Cust_address_Cust_p
hone
VIDEO (VideoNum, VideoName, VideoType
RENTAL (Cust_num, VideoNum, Rental_date,
Return_date, TotalPrice, Paid?)
24Video Store 1NF (contd)
- Have not yet removed all repeating groups - video
is a multi-valued attribute - move to another
table.
RENTALDETAILS (RentalNum, VideoNum)
RENTAL (RentalNum, Cust_Num, Rental_date,
Return_Date, TotalPrice, Paid?)
25The Video Store is now in 1NF
CUSTOMER (Cust_Num, Cust_Name, Cust_address,
Cust_phone
VIDEO (VideoNum, VideoName, VideoType
RENTALDETAILS (RentalNum, VideoNum)
RENTAL (RentalNum, Cust_Num, Rental_date,
Return_Date, TotalPrice, Paid?)
26Is the Video Store in 2NF?
- The only table that has a composite primary key
has no other fields, therefore, yes.
CUSTOMER (Cust_Num, Cust_Name, Cust_address,
Cust_phone
VIDEO (VideoNum, VideoName, VideoType
RENTALDETAILS (RentalNum, VideoNum)
RENTAL (RentalNum, Cust_Num, Rental_date,
Return_Date, TotalPrice, Paid?)
27Is the Video Store in 3NF?
- Does each attribute in each table depend upon the
primary key?
28The Video Store is now in 3NF
- Yes, because in each table, every attribute
depends on the primary key and not on any other
key.
CUSTOMER (Cust_Num, Cust_Name, Cust_address,
Cust_phone
VIDEO (VideoNum, VideoName, VideoType
RENTAL (RentalNum, Cust_Num, Rental_date)
RENTALDETAILS (RentalNum, VideoNum, ReturnDate,
Amt_Paid)
29Conflicting Goals of Design
- Database design must reconcile the following
requirements - Design elegance requires that the design must
adhere to design rules concerning nulls, derived
attributes, redundancies, relationship types,
etc. - Information requirements are dictated by the end
users - Operational (transaction) speed requirements are
also dictated by the end users - Clearly, an elegant database design that fails to
address end user information requirements or one
that forms the basis for an implementation whose
use progresses at a snail's pace has little
practical use.