Title: Optimizing the Relational Schema
1Optimizing the Relational Schema
- How can the Relational Schema be optimized?
- What is normalization?
- Is normalization necessary?
- Is normalization sufficient to guarantee adequate
quality of the Relational Schema?
2New Terms
- Base relation
- Functional Dependency
- Legal Extension
- Minimal Set of Functional Dependencies
- Normal Form
- Update Anomalies
- Insertion Anomalies
- Modification Anomaly
- Deletion Anomaly
- 1st, 2nd, 3rd, Boyce-Codd Normal Form
- Lossless/Non-additive Join
- Prime attribute
- Spurious tuple
- Dependency Preservation
3Concepts
- Functional dependency
- Disney Movie gt Humor, Sentimental, No Adult
language - James Bond Movie gt Gadgets, Violence, Women
- Deletion Anomaly
- If husband dies, wife is burnt alive
- Item vanishes from Catalog/Menu when no new
orders come in - Spurious Tuple
- Assigned bad credit history because of name
mismatch - Frequent security checks while boarding aircraft
- Modification Anomaly
- Why PDAs are popular
- Insertion Anomaly
- Can we produce a chicken without an egg? And vice
versa?
4Guidelines for a good relational model
- Design a good ER/EER model
- Conserve space (Redundancy, Nulls)
- Clarity of Relational Schema (Design should be
self-evident) - Group attributes logically, with no lumping
- Minimize work in updates
- Specify functional dependencies explicitly
- Preserve integrity of information
5Update Anomalies
- Insertion Anomaly (Figure 14.3)
- Due to wrong coupling of groups of attributes
- Being forced to enter nulls for inapplicable
information - Being forced to enter redundant information
repeatedly - Being forced to violate Key constraint or Entity
integrity - Deletion Anomaly (Figure 14.3)
- Due to wrong coupling of groups of attributes
- Being forced to delete information as a side
effect - Modification Anomaly (Fig 14.3)
- Being forced to do extra work every time an
update is made (In effect, maintaining
consistency among multiple copies of information)
6Spurious Tuples (Figure 14.5/6)
- Result of unwise naming of attributes wrong
splitting of a Relation - Natural Join results in nonsense tuples
- Analogous to non-existent tuples generated by
Cartesian Product - Solution
- Have common attribute names for Foreign Keys
- Have different names for all other attributes,
even when they refer to the same attribute
7Functional Dependency
- X ?Y (Given value of attribute set X, value of
attribute set Y can be deduced unidirectional,
opposite not necessarily true) - Functional Dependency Information (F) needs to be
provided explicitly, in addition to the
Relational Schema - The Minimal set of functional dependencies
(complete, single attribute on right side for
every dependency) for a Relation is sufficient to
derive all existing dependencies by using
Armstrongs inference rules
8Inference Rules for determining X (closure of X
under F)
- IR1 (reflexive rule (Note 7)) If X Â Y, then X
â Y. - IR2 (augmentation rule (Note 8)) X â Y  XZ
â YZ. - IR3 (transitive rule) X â Y, Y âZ  X â Z.
- IR4 (decomposition, or projective, rule) X â
YZ  X â Y. - IR5 (union, or additive, rule) X â Y, XâZ Â
X â YZ. - IR6 (pseudotransitive rule) X â Y, WY â Z Â
WX â Z. - First 3 are Armstrongs rules and sufficient by
themselves
9Using inference Rules to determine F from F
F SSN â ENAME, PNUMBER â PNAME,
PLOCATION, SSN, PNUMBER â HOURS
(Armstrongs rules)
SSN SSN, ENAME PNUMBER
PNUMBER, PNAME, PLOCATION SSN, PNUMBER
SSN, PNUMBER, ENAME, PNAME, PLOCATION, HOURS
10Normalization
Normal Form of Relational Schema
Initial Relational Schema
Normalization
Minimal Set of Functional Dependencies
11Normalization
- Applied to Relation schemas, one at a time and
independently - Need at least 3rd Normal Form (May be
automatically achieved if EER model is sound) - Primarily addresses
- Update anomalies
- Considerations of Space
- Normalization does NOT
- eliminate the possibility of spurious tuples,
- guarantee to preserve all functional dependencies
(i.e., information about the original design may
still be lost) - guarantee the best performance
12Normal Forms
- First Normal Form (1NF) (déjà vu!) (Figure 14.9)
- Only atomic attributes permitted
- Replace any multivalued, composite, nested or
composite attributes with atomic attribute
representations - Best done by creating a separate Relation for
each of the concerned attributes - Second Normal Form (2NF)
- Only Full Functional Dependencies allowed (No
Partial dependencies) (Figure 14.10a) - Third Normal Form (3NF)
- No transitive dependencies allowed (Only primary
key can be function argument) (Figure 14.10b)