Title: Week 10 October 31
1Week 10October 31
- Extended ERD
- Data Normalization
2Problems with ER modeling
- Fan traps - Pathway between two entities is
ambiguous - Chasm traps - Pathway does not exist between
certain entity occurrences - Inheritance - An entity receives its attributes
from a class of attributes
Extended Entity Relationship (ERR) modeling
3Connection Trap Fan Trap
What products belong to which product
categories? Which products have restricted use
aboard a plane?
Merchandise Lines Merchandise_line Description
Product Categories Product_category Merchandise_l
ine
Classify
Have
Products Product_code Description Merchandise_lin
e
4Connection Trap Fan Trap
What products belong to which product
categories? Which products have restricted use
aboard a plane?
Merchandise Lines Merchandise_line Description
Product Categories Product_category Merchandise_l
ine
Classify
Have
Products Product_code Description Merchandise_lin
e
To satisfy these queries, we need to form a
relationship
5Connection Trap Chasm Trap
What products belong to the same merchandise
line? Which products require a UL listing?
Merchandise Lines Merchandise_line Description UL
_listing
Product Categories Product_category Merchandise_l
ine
Classify
Have
Products Product_code Description Product_categor
y
To satisfy these queries, we need to form a
relationship
Known What merchandise lines are composed of
what products
6EER modelingSuperclass and Subclass Entity Types
- Superclass - Higher order of classification or
categorization - Subclass - A member of a superclass that provides
specification
Electronic Merchandise
Music and Videos
Superclasses
Audio
Visual
CD
Receiver
Cassette
attributes of CD
attributes of cassette decks
attributes of receivers
7EER modelingSuperclass and Subclass Entity Types
- Specialization - top-down
- Maximizing differences between members by
identifying distinguishing characteristics - Generalization - bottom-up
General
Electronic Merchandise
Audio
Visual
CD
Receiver
Cassette
Specific
attributes of CD
attributes of cassette decks
attributes of receivers
8EER modelingSuperclass and Subclass Entity Types
- Specialization - top-down
- Generalization - bottom-up
- Minimizing differences between entities by
identifying common features
General
Electronic Merchandise
Audio
Visual
CD
Receiver
Cassette
Specific
attributes of CD
attributes of cassette decks
attributes of receivers
9EER modelingAttribute Inheritance
CDP-325 (Sony CD Changer)
Attributes common to all audio merchandise are
inherited
General
Electronic Merchandise
Audio
Visual
CD
Receiver
Cassette
Specific
attributes of CD
attributes of cassette decks
attributes of receivers
10EER Diagram
Product_code
Prod_descip
1
M
Products
Manufacturers
Sells
1
Disjoint constraint
Produces
Superclass/subclass
d
1
CD
Receiver
Cassette
Db range
Flutter
Watts
11Constraints
- Disjoint (d, o)
- Entity can be a member of only one of the
subclasses of specialization - Under non-disjoint, an entity can be a member of
more than one subclass of specialization - Participation (partial or total)
- Total - every entity in the superclass must be a
member of a subclass in specialization - Partial - An entity need not belong to any of the
subclasses of specialization
12Data Normalization
- The process of decomposing complex data
structures into simple relations according to a
set of dependency rules. McFadden and Hoffer
13Data Normalization
- The purpose of normalization is to produce a
stable set of relations that is a faithful model
of the operations of the enterprise. - Achieve a design that is highly flexible
- Reduce redundancy
- Ensure that the design is free of certain update,
insertion and deletion anomalies Catherine
Richardo, 1990
14Normalization
1NF
2NF
Progressively putting the relation into a higher
normal form
3NF
BCNF
4NF
1510001
Order No.
Stereos To Go Invoice
6 15 99
Date / /
Go, Hogs
0000-000-0000-0
Account No.
John Smith
Customer
2036-26 Street
Address
1/05
Sacramento CA 95819
City
State
Zip Code
6 18 99
Date Shipped / /
Item
Product
Product Description/Manufacturer
Qty
Price
Number
Code
1
SAGX730 Pioneer Remote A/V Receiver
1 1 1
56995 35995 39995
2
AT10 Cervwin Vega Loudspeakers
CDPC725 Sony Disc-Jockey CD Changer
3
4
5
132985 10000 10306 153291
Subtotal Shipping Handling Sales Tax Total
16File-Based System
Invoice Program
Invoices
Customer Orders
Customer Account Program
Account
Customer Accounts
Report
File
Customer Mailings Program
Customer Mailing List
Mailing List
File
17Data Redundancy
- Customer Order File
- PO number
- Customer account number
- Customer name, address, city, state, zip code
- Order date
- Product code, product description, price, unit
- Customer Account File
- Account Number
- Customer name, mailing address, city, state, zip
code - Customer Mailing List File
- Customer name, mailing address, city, state, zip
code
18Unnormalized Relation
(Invoice_number, Invoice_date, Date_delivered,
Cust_account Cust_name Cust_addr Cust_city
Cust_state Zip_code, Item1 Item1_descrip
Item1_qty Item1_price, Item2 Item2_descrip
Item2_qty Item2_price, . . . , Item7
Item7_descrip Item7_qty Item7_price)
How would a program process the data to recreate
the invoice?
19First Normal Form (1NF)
- A relation is in first normal form if and only if
every attribute is single-valued for each tuple. - Remove all repeating groups
- Create a flat file
20Unnormalized to 1NF
(Invoice_number, Invoice_date, Date_delivered,
Cust_account Cust_name Cust_addr Cust_city
Cust_state Zip_code, Item1, Item1_descrip,
Item1_qty, Item1_price, Item2, Item2_descrip,
Item2_qty, Item2_price, . . . , Item7,
Item7_descrip, Item7_qty, Item7_price)
Repeating groups
A flat file places all the data of a transaction
into a single record.
This is reminiscent of a COBOL or BASIC program
processing a single transaction with one read
statement.
21Unnormalized to 1NF
(Invoice_number, Invoice_date, Date_delivered,
Cust_account, Cust_name, Cust_addr, Cust_city,
Cust_state, Zip_code, Item, Item_descrip,
Item_qty, Item_price)
Nominated group of attributes to serve as the
key (form a unique combination)
- Eliminate the repeating groups.
- Each row retains data for one item.
- If a person bought 5 items, we would have five
tuples
221NF
Flat File
Invoice number
Account number
Customer name
Item Quantity
Item Price
Description
Item
Account number
23Second Normal Form (2NF)
- A relation is in second normal form if and only
if it is in first normal form and the nonkey
attributes are fully functionally dependent on
the key. - Full functional dependency
- B is functionally dependent on A if each value of
A is associated with exactly one value of B
Attribute B
Attribute A
Determinant
242NF
- Second normal form applies to relations with
composite keys (i.e., a primary key composed of
two or more attributes) - A relation with a single attribute primary key is
automatically in at least 2NF
25 From 1NF to 2NF
(Invoice_number, Invoice_date, Date_delivered,
Cust_account, Cust_name, Cust_addr, Cust_city,
Cust_state, Zip_code, Item, Item_descrip,
Item_qty, Item_price)
What attribute(s) can be used to uniquely
identify a tuple?
If the primary key consisted of invoice_number
and item (i.e., composite key), we would need to
remove the partial dependencies.
26 From 1NF to 2NF
(Invoice_number, Invoice_date, Date_delivered,
Cust_account, Cust_name, Cust_addr, Cust_city,
Cust_state, Zip_code, Item, Item_descrip,
Item_qty, Item_price)
Using Invoice number and Item as the key...
Some of the attributes are dependent upon
invoice_number for their values and others on
item. In either case, they are not functionally
dependent on the entire key.
27 From 1NF to 2NF
Which attributes are functionally dependent on
which keys?
Invoice_date, Date_delivered, Cust_account,
Cust_name, Cust_addr, Cust_city, Cust_state,
Zip_code, tem_descrip, Item_qty, Item_price
?
Invoice_number Vs. Item
28 From 1NF to 2NF
(Invoice_number, Invoice_date, Date_delivered,
Cust_account, Cust_name, Cust_addr, Cust_city,
Cust_state, Zip_code)
(Item, Item_descrip, Item_qty, Item_price)
Is this unique by itself? What happens if the
item is purchased more than once?
29 From 1NF to 2NF
(Invoice_number, Invoice_date, Date_delivered,
Cust_account, Cust_name, Cust_addr, Cust_city,
Cust_state, Zip_code)
Partial dependency
(Invoice_number, Item, Item_descrip, Item_qty,
Item_price)
Composite key (forms a unique combination)
30 From 1NF to 2NF
(Invoice_number, Invoice_date, Date_delivered,
Cust_account, Cust_name, Cust_addr, Cust_city,
Cust_state, Zip_code)
(Invoice_number, Item, Item_qty, Item_price)
(Item, Item_descrip)
31 From 1NF to 2NF
In contrast...
(Invoice_number, Invoice_date, Date_delivered, Cus
t_account Cust_name Cust_addr Cust_city
Cust_state Zip_code, Item Item_descrip
Item_qty Item_price)
If the primary key consisted of invoice_number
and Invoice_date (i.e., composite key), we would
NOT have partial dependencies. Thus, the
relation would be in 2NF.
32Third Normal Form (3NF)
- A relation is in third normal form if it is in
second normal form and no nonkey attribute is
transitively dependent on the key. - Remove transitive dependencies
- Each nonkey attribute must depend upon the key,
the whole key, and nothing but key. Kent,
1978
33 From 2NF to 3NF
(Invoice_number, Invoice_date, Date_delivered,
Cust_account, Cust_name, Cust_addr, Cust_city,
Cust_state, Zip_code)
(Invoice_number, Item, Item_qty, Item_price)
(Item, Item_descrip)
Which attributes are dependent on others? Is
there a problem?
34Transitive Dependencies and Anomalies
- Insertion anomalies
- To add a new row, all customer (name, address,
city, state, zip code, phone) and products
(description) must be consistent with previous
entries - Deletion anomalies
- By deleting a row, a customer or product may
cease to exist - Modification anomalies
- To modify a customers or products data in one
row, all modifications must be carried out to all
others
35Insertion and Modification AnomaliesFor example
Insert a new Panasonic product
Product_code
Manufacturer_name
DVD-A110 Panasonic PV-4210 Panasonic PV-4250 Panas
onic
CT-32S35 PAN
Inconsistency
DVD-A110 Panasonic PV-4210 PanaSonic PV-4250 Pana
Sonic CT-32S35 PAN
Change all Panasonic products manufacturer name
to Panasonic USA
36Deletion AnomalyFor Example
4377182 John Smith ??? Sacramento CA 95831 4398711
Arnold S ??? Davis CA 95691 4578461 Gray
Davis ??? Sacramento CA 95831 4873179 Lisa
Carr ??? Reno NV 89557
By deleting customer Arnold S, we would also be
deleting Davis, California.
37Invoice_number Invoice_date Date_delivered Cust_ac
count Cust_name Cust_addr Cust_city Cust_state Zip
_code Item Item_descrip Invoice_numberItem Item_q
ty Item_price
Transitive Dependencies
- A condition where A, B, C are attributes of a
relation such that if A ? B and B ? C, then C
is transitively dependent on A via B (provided
that A is not functionally dependent on B or C).
38Why Should City and State Be Separated from
Customer Relation?
- City and state are dependent on zip code for
their values and not the customers identifier
(i.e., key). Zip_code ? City, State - Otherwise, Cust_account ? Cust_addr,
Zip_code ? City, StateIn which case, you have
transitive dependency.
393NF
Invoice Relation (Invoice_number, Invoice_date,
Date_delivered, Cust_account)
Customer Relation (Cust_account, Cust_name,
Cust_addr, Zip_code)
Zip_code Relation (Zip_code, City, State)
Invoice_items Relation (Invoice_number, Item,
Item_qty, Item_price)
Items Relation (Item, Item_descrip)
403NF
Invoice Relation (Invoice_number, Invoice_date,
Date_delivered, Cust_account)
Customer Relation (Cust_account, Cust_name,
Cust_addr, Zip_code)
Zip_code Relation (Zip_code, City, State)
Invoice_items Relation (Invoice_number, Item,
Item_qty, Item_price)
Items Relation (Item, Item_descrip)
Manufacturers Relation (Manuf_code, Manuf_name)
Since the Items relation contains the
manufacturers name in the description, a
separate Manufacturers relation can be created
41First to Third Normal Form(1NF - 3NF)
- 1NF A relation is in first normal form if and
only if every attribute is single-valued for each
tuple (remove the repeating or multi-value
attributes and create a flat file) - 2NF A relation is in second normal form if and
only if it is in first normal form and the nonkey
attributes are fully functionally dependent on
the key (remove partial dependencies) - 3NF A relation is in third normal form if it is
in second normal form and no nonkey attribute is
transitively dependent on the key (remove
transitive dependencies)
42(No Transcript)