Title: Lecture 3 Functional Dependency and Normal Forms
1Lecture 3 Functional Dependency and Normal Forms
CS157B
- Prof. Sin-Min Lee
- Department of Computer Science
2Database Design Process
Application 1
Application 2
Application 3
Application 4
External Model
External Model
External Model
External Model
Application 1
Conceptual requirements
Application 2
Conceptual Model
Logical Model
Conceptual requirements
Internal Model
Application 3
Conceptual requirements
Application 4
Conceptual requirements
3Relational Database Model
Relations
Source ESRI Advanced ArcInfo
4Source ESRI Advanced ArcInfo
5Source ESRI Advanced ArcInfo
6Source ESRI Advanced ArcInfo
7Georelational Database Model
8Attribute Relationships
Functional Dependency refers to the
relationships between attributes within a
relation. If the value of attribute A
determines the value of attribute B, then
attribute B is functionally dependent upon
attribute A.
9Source ESRI Advanced ArcInfo
10Â Functional Dependencies
- X -gt Y means
- X functionally determines Y
- Y depends on X
- Values of Y component depend on, determined
by values of X component
11Functional Dependencies
- Given t1 and t2
- if t1X t2 X then t1Y t2 Y (1)
- In other words if the values of X are equal, then
Y value are equal - Values of X component uniquely (functionally)
determine values of Y component iff (1)
12Data Normalization
- Primarily a tool to validate and improve a
logical design so that it satisfies certain
constraints that avoid unnecessary duplication of
data. - The process of decomposing relations with
anomalies to produce smaller, well-structured
relations. - Primary Objective Reduce Redundancy,Reduce
nulls, - Improve modify activities
- insert,
- update,
- delete,
- but not read
- Price degraded query, display, reporting
13Normal Forms
- First Normal Form (1NF)
- Second Normal Form (2NF)
- Third Normal Form (3NF)
- Boyce-Codd Normal Form (BCNF)
- Fourth Normal Form (4NF)
- Fifth Normal Form (5NF)
14Normalization
15Unnormalized Relations
- First step in normalization is to convert the
data into a two-dimensional table - In unnormalized relations data can repeat within
a column
16Unnormalized Relation
17First Normal Form
- To move to First Normal Form a relation must
contain only atomic values at each row and
column. - No repeating groups
- A column or set of columns is called a Candidate
Key when its values can uniquely identify the row
in the relation.
18First Normal Form
19Second Normal Form
- A relation is said to be in Second Normal Form
when every nonkey attribute is fully functionally
dependent on the primary key. - That is, every nonkey attribute needs the full
primary key for unique identification
20Second Normal Form
21Second Normal Form
22Second Normal Form
23Third Normal Form
- A relation is said to be in Third Normal Form if
there is no transitive functional dependency
between nonkey attributes - When one nonkey attribute can be determined with
one or more nonkey attributes there is said to be
a transitive functional dependency. - The side effect column in the Surgery table is
determined by the drug administered - Side effect is transitively functionally
dependent on drug so Surgery is not 3NF
24Third Normal Form
25Third Normal Form
26Functional Dependency and Keys
- Functional Dependency The value of one attribute
(the determinant) determines the value of another
attribute. - Candidate Key Each non-key field is functionally
dependent on every candidate key.
27Steps in Normalization
28Normalization most used
- Four most commonly used normal forms are first
(1NF), second (2NF) and third (3NF) normal forms,
and BoyceCodd normal form (BCNF). - Based on functional dependencies among the
attributes of a relation. - A relation can be normalized to a specific form
to prevent possible occurrence of update
anomalies.
29First Normal Form
- No multi-valued attributes.
- Every attribute value is atomic.
- Why are the following tables not in 1NF
- Employee (ssn, Name, Salary, Address,
ListOfSkills) - Department (Did, Dname, ssn)
30Second Normal Form
- 1NF and every non-key attribute is fully
functionally dependent on the primary key. - Every non-key attribute must be defined by the
entire key, not by only part of the key. - No partial functional dependencies.
- Assuming that we have a composite PK
(LicensePlate, OwnerSSN) for the Vechicle - Table below, why is the table not in 2NF ?
- Vehicle (LicensePlate, Brand, Model,
PurchasePrice, Year, OwnerSSN, OwnerName)
31Third Normal Form BCNF
- 2NF and no transitive dependencies (functional
dependency between non-key attributes BCNF) - Why are the following tables not in 3NF or BCNF ?
- Why is Employee ssn, name, salary, did, dname
- Customer
323NF BCNF
- It is very rare for a Table to be in 3NF and not
be in BCNF (violation of BCNF). - Given a Relation R with attributes A, B and C
where A and B are together the composite PK, - IF A, B -gt C and C -gt B
- THEN R is in 3NF and is not in BCNF
- Example Student, course -gt Instructor
- Instructor -gt Course
-
33Steps in Normalization
- 1NF a table, without multivalued attributes
- if not, then decompose
- 2NF 1NF and every non-key attribute is fully
functionally dependent on the primary key - if not, then decompose
- 3NF 2NF and no transitive dependencies
- if not, then decompose
- GENERAL
- Each table should describe a single theme
- Modification anomalies are minimized
- Hint THE KEY, THE WHOLE KEY AND NOTHING BUT THE
KEY
34- EXAMPLE - OBTAIN CANDIDATE KEYS
- Consider the following scheme from an airline
database system - ( P (pilot) , F (flight ), D (date), T
(scheduled time to depart) ) - We have the following FD's
- F ----gt T PDT ----gt F FD ----gt
P - Provide some superkeys
- PDT is a superkey, and FD is a superkey.
- Is PDT a candidate key?
- PD is not a superkey, nor is DT, nor is PT.
- So, PDT is a candidate key.
- FD is also a candidate key, since neither F or D
are superkeys.
35- If F is a set of functional dependencies for a
relation R, the set of all functional
dependencies that can be derived from F, denoted
by F, is called the CLOSURE of F. - We can use Armstrong's axioms, and the 3 derived
rules, to compute the closure of F, F.
36- WORKING TO GET THE CLOSURE F
- GIVEN scheme (A, B, C, G, H, I)
- GIVEN FD set (A---gtB, A---gtC, CG---gtH, CG---gtI,
B---gtH) - Some members of F are
- A---gtH Transitivity Rule applied to A---gtB and
B---gtH) - CG---gtHI Union Rule applied to CG---gtH and
CG---gtI - AG---gtI By Augmentation Rule, AG---gtCG then
Transitivity
37- THE CLOSURE OF A SET OF ATTRIBUTES
- GIVEN FD set F and a given attribute A (or set
of attributes A) - FIND The set of attributes functionally
dependent on A, called the closure of A, and
denoted by A - IMPORTANT USE FOR THIS To determine if A is a
superkey, we compute A, the set of attributes
functionally dependent on A. If A consists of
ALL the attributes in the relation, then A is a
superkey - HOW DO WE FIND A? The following algorithm does
the trick!
38- ALGORITHM TO FIND THE CLOSURE OF ATTRIBUTE A,
DENOTED BY A
result A while result changes for each
functional dependency B---gtC begin if B is
contained in result, then result result U C
' end endwhile
A result
39- EXAMPLE TO FIND THE CLOSURE A OF AN ATTRIBUTE A
- GIVEN Relation R with attributes W, X, Y, Z and
FD's W ---gt Z YZ ---gt X WZ ---gt Y - FIND WZ
- PSEUDO TRACE OF THE ALGORITHM
- result WZ
- from first 2 FD's, no change to "result"
- from WZ ---gt Y, since WZ is contained in result,
we - get result WZY
- since YZ is contained in result, we get result
WZYX - Thus, every attribute in R is in WZ, so WZ is a
superkey!
40Normalization
- Normalization of data - method for analyzing
schemas - Unsatisfactory schemas decomposed into smaller
ones with desirable properties - Objectives of normalization
- good relation schemas disallowing update
anomalies
41Formal framework
- database normalized to any degree (1, 2, 3, 4, 5,
etc.) - normalization is not done in isolation
- need
- lossless join
- dependency preservation
- additional normal forms meet other desirable
criteria
42Normal Forms
- 1st, 2nd, 3rd, BCNF consider only FD and key
constraints - constraints must not be hard to understand or
detect - need not normalize to highest form (e.g. for
performance reasons)
431NF - 1st normal form
- part of the formal definition of a relation
- disallow multivalued attributes, composite
attributes and their combination - In 1NF single (atomic, indivisible) values
44Normalize into 1NF?
- How to normalize nested relations into 1NF?
- Remove nested relation attributes into new
relation - propagate PK
- combine PK and partial PK
- recursively unnest - multilevel nesting
- useful in converting hierarchical schemes into
1NF
45Â Difficulties with 1NF
- insert, delete, update
- Â Determine if describe entity identified by PK?
- If not, called non-full FDs
- we need full FDs for good inserts, deletes,
updates
46Â Second Normal Form - 2NF
- Uses the concepts of FDs, PKs and this
definition - An FD is a Full functional dependency if
- given Y -gt Z
- Removal of any attribute from Y means the FD does
not hold any more
472NF
- A relation schema R is in 2NF if
- Relation is in 1NF
- Every non-prime attribute A in R is fully
functionally dependent on the primary key - Prime attribute - attribute that is a member of
the primary key K - R can be decomposed into 2NF relations via the
process of 2NF normalization - Remove partial dependencies
- create new relations where partials are full
48Simplifying Functional Dependencies through
Normalization
Normalization the identification of functional
dependencies and the modifications required to
structurally change the database to remove
undesirable dependencies
49Source ESRI Advanced ArcInfo
50Source ESRI Advanced ArcInfo
51Source ESRI Advanced ArcInfo
52Source ESRI Advanced ArcInfo
53Source ESRI Advanced ArcInfo
54Source ESRI Advanced ArcInfo
55Reading Assignment
September 2 ,2004 Read the following
articleIBM's early relational database
scientistshttp//www.mcjones.org/System_R/SQL_Re
union_95/sqlr95.html Chapter 3 3.1. And Chapter
7,7.1-7.3.2 Work on problems 7.12.7.13,7.14,7.15