Database Theory and Methodology

About This Presentation

Title:

Database Theory and Methodology

Description:

So far we have not developed any measure of 'goodness' to measure the quality of ... We need formal concepts to define 'goodness' ... The Evils of Redundancy ... – PowerPoint PPT presentation

Number of Views:92

Avg rating:3.0/5.0

Slides: 38

Provided by: nihankes

Category:

more less

Transcript and Presenter's Notes

Title: Database Theory and Methodology

1
Database Theory and Methodology
2
The Good and the Bad

So far we have not developed any measure of
goodness to measure the quality of the design,
other than our intuition.
Bad database design
Redundant Information
Update anomalies
Wasting storage with null values
Generation of invalid data during joins
Good database design
Easy to explain its semantics
No redundancy (or at least reduced redundancy)
No information loss during joins
We need formal concepts to define goodness and
badness of relational schemas.

3
The Evils of Redundancy

Redundancy is at the root of several problems
associated with relational schemas
Redundant storage
Insertion anomalies
Deletion anomalies
Update anomalies

4
Example Bad Design

redundancy airline name repeated for the same
flight
inconsistency when airline name for a flight
changes, it must be changed in many places

5
Bad Database Design

insertion anomalies how do we represent that
SK912 is flown by Scandinavian without there
being a date and a plane assigned?
deletion anomalies cancelling AA411 on 10/22/00
makes us lose that it is flown by American.
update anomalies if DL242 is flown by Sabena, we
must change it everywhere.

6
Bad Database Design- decomposition
7
Bad Database Design- information loss

information loss we polluted the database with
false facts we cant find the true facts.

8
Good Database Design

no redundancy of FACT (!)
no inconsistency
no insertion, deletion or update anomalies
no information loss

9
Goal Devise a Theory for the Following

Decide whether a particular relation R is in
good form.
In the case that a relation R is not in good
form, decompose it into a set of relations R1,
R2, ..., Rn such that
each relation is in good form
the decomposition is a lossless-join
decomposition
Our theory is based on
functional dependencies
multivalued dependencies

10
Functional Dependencies (FDs)

A functional dependency is a constraint between
two sets of attributes from the database.
Definition A functional dependency, denoted by X
? Y, holds over relation R if, for every
allowable instance r of R
t1 ? r, t2 ? r, t1.X t2.X implies t1.Y t2.Y
i.e., given two tuples in r, if the X values
agree, then the Y values must also agree. (X and
Y are sets of attributes.)
An FD is a statement about all allowable
relations.
Must be identified based on semantics of
application.
Given some allowable instance r1 of R, we can
check if it violates some FD f, but we cannot
tell if f holds over R!
If X ? Y in R, this does not say whether or not Y
? X.

11
Functional Dependencies and Keys

A key constraint is a special case of an FD.
Note the following
If X is a candidate key, X ? Y for any subset of
attributes Y of R.
X ? Y does not require that the set X be minimal
the additional minimality condition must be met
for X to be a key.
If X ? Y holds where Y is the set of all
attributes in R, and there is some subset V of X
s.t. V ? Y, then X is a superkey.

12
Example Constraints on Entity Set

Consider relation obtained from Hourly_Emps
Hourly_Emps (ssn, name, lot, rating, hrly_wages,
hrs_worked)
Notation We will denote this relation schema by
listing the attributes SNLRWH
This is really the set of attributes
S,N,L,R,W,H.
Sometimes, we will refer to all attributes of a
relation by using the relation name. (e.g.,
Hourly_Emps for SNLRWH)
Some FDs on Hourly_Emps
ssn is the key S ? SNLRWH
rating determines hrly_wages R ? W

13
Example (Contd.)

Problems due to R ? W
Update anomaly Can we change W in
just the 1st tuple of SNLRWH?
Insertion anomaly What if we want to insert an
employee and dont know the hourly wage for his
rating?
Deletion anomaly If we delete all employees with
rating 5, we lose the information about the wage
for rating 5!

Will two smaller tables be better?
14
Closure of a set of FDs

Given some FDs, we can usually infer additional
FDs
ssn ? did, did ? lot implies ssn ? lot
mgr_ssn ? mgr_phone, dept_no ? mgr_ssn implies
dept_no ? mgr_phone
An FD f is implied by a set of FDs F if f holds
whenever all FDs in F hold.
Formally, the set of all dependencies that
include F as well as all dependencies that can be
inferred from F is called the closure of F it is
denoted by F.

15
Inference Rules for FDs

Armstrongs Axioms (X, Y, Z are sets of
attributes)
Reflexivity If Y ? X then X ? Y.
Augmentation If X ? Y, then XZ ? YZ for
any Z.
Transitivity If X ? Y and Y ? Z, then X
? Z.
These are sound and complete inference rules for
FDs!
By sound we mean that they generate only FDs in
F when applied to a set F of FDs.
By complete we mean that repeated application of
these rules will generate all FDs in the closure
of F.

16
Derived inference rules

Union if XY and XZ, then XYZ.
Decomposition if XYZ, then XY and XZ.
Pseudotransitivity if XY and WYZ, then WXZ.
These additional rules are not essential their
soundness can be proved using Armstrongs Axioms.
Exercise Prove rules Union, Decomposition and
Pseudotransitivity using A.A.

17
Example

Consider the following schema
Contracts(cid,sid,jid,did,pid,qty,value)
C is the key C ? CSJDPQV
Project purchases each part using single
contract JP ? C
Dept purchases at most one part from a supplier
SD ? P
JP ? C, C ? CSJDPQV imply JP?CSJDPQV
SD?P implies SDJ ? JP
SDJ ? JP, JP ? CSJDPQV imply SDJ ?
CSJDPQV
We can infer several additional FDs that are in
the closure by using augmentation or
decomposition. e.g. C ? CSJDPQV implies C ? C,
C? S,C ? J, C? D, C? P, C? Q, C? V.
We have also a number of trivial FDs from
reflexivity rule.

18
Reasoning About FDs

Computing the closure of a set of FDs can be
expensive. (Size of closure is exponential in
attrs!)
Typically, we just want to check if a given FD X
? Y is in the closure of a set of FDs F. An
efficient check
Compute attribute closure of X (denoted X) wrt
F
Set of all attributes A such that X ? A is in F.
There is a linear time algorithm to compute this.
Check if Y is in X.
Does F A ?B, B ? C, CD ? E imply A ? E?
i.e, is A ?E in the closure F ?
(Equivalently, is E in X)?

19
Attribute Closure

X X
repeat
oldX X
for each functional dependency Y ? Z in F do
if Y ? X then X X ? Z
until X oldX

20
Example

Given the following set of FDs
F Ssn ? Ename,
Pnumber ? Pname, Plocation,
Ssn,Pnumber ? Hours
The attribute closures
Ssn Ssn, Ename
Pnumber Pnumber, Pname,Plocation
Ssn,Pnumber Ssn, Pnumber, Ename, Pname,
Plocation, Hours

21
Example

R (A, B, C, G, H, I)
F A ? B A ? C CG ? H CG ? I B ? H
(AG)
result AG.
result ABCG (A ? C and A ? B)
result ABCGH (CG ? H and CG ? AGBC)
4. result ABCGHI (CG ? I and CG ? AGBCH)
Is AG a candidate key?
Is AG a super key?
Does AG ? R? Is (AG) ? R
Is any subset of AG a superkey?
Does A ? R? Is (A) ? R
Does G ? R? Is (G) ? R

22
Finding a Key for a Relation

Algorithm Finding a Key K for R, given a set of
FDs
Set K R.
For each attribute A in K
Compute (K-A) wrt F
If (K-A) contains all the attributes in R,
then
set K K A

23
Example

Consider the following schema and the set of FDs
R (Ssn, Pnumber, Ename, Pname, Plocation,
Hours)
F Ssn ? Ename, Pnumber ? Pname, Plocation,
Ssn,Pnumber ? Hours
The Key is Ssn,Pnumber, since
Ssn,Pnumber Ssn, Pnumber, Ename, Pname,
Plocation, Hours

24
Equivalence of two sets of FDs

Definition A set of FDs F covers another set of
FDs E if every FD in E is also in F.
Definition F and E are equivalent if F E
(i.e. F covers E and E covers F).
We can determine whether F covers E by
calculating X with respect to F for each FD XY
in E, and then checking whether this X includes
the attributes in Y. If this is the case for
every FD in E, then F covers E.

25
Minimal Cover

Sets of functional dependencies may have
redundant dependencies that can be inferred from
the others
Eg A ? C is redundant in A ? B, B ? C,
A ? C
Parts of a functional dependency may be redundant
E.g. on RHS A ? B, B ? C, A ? CD can
be simplified to A ?
B, B ? C, A ? D
E.g. on LHS A ? B, B ? C, AC ? D can
be simplified to A ?
B, B ? C, A ? D
Intuitively, a canonical cover of F is a
minimal set of functional dependencies
equivalent to F, having no redundant dependencies
or redundant parts of dependencies

26
Minimal Sets of FDs

A set of FDs F is minimal if
Every dependency in F has a single attribute for
its right-hand side.
We cant replace any dependency X? A in F with a
dependency Y?A where Y ? X and still have a set
of dependencies equivalent with F.
This ensures that there are no redundancies by
having redundant attributes on the left-hand side
of a dependency.
We cant remove any dependency from F and still
have a set of dependencies equivalent with F.
This ensures that there are no redundancies by
having a dependency that can be inferred from the
remaining FDs in F.

27
Minimal Cover of a set of FDs

A minimal cover of a set of FDs E is a minimal
set of dependencies F that is equivalent to E.
There can be several minimal covers for a set of
FDs.
Algorithm Finding Minimal Cover F for a set of
Fds E.
1. Set F E.
2. Replace each Fd X ? A1,..., An in F by the n
Fds X?A1,..., X?An.
3. For each Fd X?A in F
For each attribute B?X
If F - X?A ? (X - B) ?A ? F
Then replace X?A with (X - B) ?A in F.
4. For each remaining Fd X?A in F
If F - X?A?F, then remove X?A from F.

28
Examples

E A ? B, ABCD ? E, EF ? GH, ACDF ? EG
has the following minimal cover
F A ? B, ACD ? E, EF ? G, EF ? H
E Ssn ? Ename, Pnumber ? Pname, Plocation,
Ssn,Pnumber ? Hours
F Ssn ? Ename, Pnumber ? Pname, Pnumber ?
Plocation, Ssn,Pnumber ? Hours

29
Normal Forms

Returning to the issue of schema refinement, the
first question to ask is whether any refinement
is needed!
If a relation is in a certain normal form (BCNF,
3NF etc.), it is known that certain kinds of
problems are avoided/minimized. This can be used
to help us decide whether decomposing the
relation will help.
Role of FDs in detecting redundancy
Consider a relation R with 3 attributes, ABC.
No FDs hold There is no redundancy here.
Given A ? B Several tuples could have the
same A value, and if so, theyll all have the
same B value!

30
Normalization

Normalization of data is a process of analyzing
the given relational schemas based on their FDs
and primary keys to achieve the desirable
properties of
minimizing redundancy
minimizing update, insert and delete anomalies.
The process of normalization through
decomposition must also confirm two additional
properties
Lossless join property (nonadditive join
property)
Dependency preservation property

31
Overview of NFs
NF2 1NF 2NF 3NF BCNF
32
Normal Forms

NF2 non-first normal form.
1NF R is in 1NF iff all domain values are
atomic.
2NF R is in 2. NF iff R is in 1NF and every
non-key attribute is fully dependent on the key.
3NF R is in 3NF iff R is 2NF and every nonkey
attribute is non-transitively dependent on the
key
BCNF R is in BCNF iff every determinant is a
candidate key
Determinant an attribute on which some other
attribute(s) is fully functionally dependent.

33
1NF Atomic Values
Attributes must be defined over domains with
atomic values
34
2NF Full Functional Dependency

An FD X ? Y is a full functional dependency if
any attribute A ? X , X A does not
functionally determine Y.
An FD X ? Y is a partial functional dependency if
some attribute A ? X , (X A) ? Y.
E.g. ssn, proj_num ? hours is a full FD.
ssn, proj_num ? ename is a partial FD.
Definition A relation schema R is in 2NF if R
is in 1NF and every nonprime attribute A is fully
functionally dependent on the primary key.

35
Third Normal Form (3NF)

Relation R with FDs F is in 3NF if, for all X ?
A in F
A ? X (called a trivial FD), or
X contains a key for R, or
A is part of some key for R.
Minimality of a key is crucial in third condition
above!
If R is in 3NF, obviously in 2NF.
If R is in 3NF, some redundancy is still
possible. It is a compromise, used when BCNF not
achievable (e.g., no good decomposition, or
performance considerations).
Lossless-join, dependency-preserving
decomposition of R into a collection of 3NF
relations always possible.

36
What Does 3NF Achieve?

If 3NF is violated by X ? A, one of the following
holds
X is a subset of some key K
We store (X, A) pairs redundantly.
X is not a proper subset of any key.
There is a chain of FDs K ? X ? A, which means
that we cannot associate an X value with a K
value unless we also associate an A value with an
X value.
But even if relation is in 3NF, these problems
could arise.
e.g., Reserves SBDC, S ? C, C ? S is in
3NF, but for each reservation of sailor S, same
(S, C) pair is stored.

37
Boyce-Codd Normal Form (BCNF)

Reln R with FDs F is in BCNF if, for all X ? A
in F
A X (called a trivial FD), or
X contains a key for R.
In other words, R is in BCNF if the only
non-trivial FDs that hold over R are key
constraints.
No dependency in R that can be predicted using
FDs alone.
If we are shown two tuples that agree upon the X
value, we cannot infer the A value in
one tuple from
the A value in the other.
If example relation is in BCNF, then 2 tuples
must be identical
(since X is a key).

Write a Comment

User Comments (0)

About PowerShow.com

Database Theory and Methodology - PowerPoint PPT Presentation

Database Theory and Methodology

So far we have not developed any measure of 'goodness' to measure the quality of ... We need formal concepts to define 'goodness' ... The Evils of Redundancy ... – PowerPoint PPT presentation