Database Design Theory - PowerPoint PPT Presentation

1 / 41

About This Presentation

Title:

Database Design Theory

Description:

One solution store everything in one big table. Appl(sid,name,addr, cid,title, ... Attribute (set) Y is functionally dependent on attribute (set) X if, whenever ... – PowerPoint PPT presentation

Number of Views:21

Avg rating:3.0/5.0

Slides: 42

Provided by: hanrei

Category:

more less

Transcript and Presenter's Notes

Title: Database Design Theory

1
Database Design Theory

Which tables to have in a database
Normalization

2
Database Design Theory

Given some body of data to be represented in a
database, as modelled in an E-R diagram, what is
the most suitable logical structure for that
data?
How do we decide on the appropriate tables and
the attributes of the tables?

3
Requirements

Accommodate data integrity.
General integrity constraints
e.g. referential integrity
Domain specific integrity constraints
e.g. no user can borrow more than 4 books.
Robust in the sense that the design should be
application independent.
We try to achieve this through the elimination of
redundancy.

4
The Danger of Redundancy

Consider the example
For students, we want to know student ID, name
and address.
For courses, we need to know course ID, title and
lecturer.
For employees, we need to know the employee ID,
name and department.
For each department, we need to know the
department ID, the name and the location.
For each enrollment, we need to know the grade.

5
The Danger of Redundancy Continued

One solution store everything in one big table
Appl(sid,name,addr,
cid,title,
eid, ename,
deptid, dname, loc,
grade)
Clearly, this leads to redundancy.
For example, we need to store the students
address for every course they have been
registered for.

6
The Danger of Redundancy. Conclusion

If everything in one table, then
Greater space requirements
Insertion anomalies
Cannot store information on student who has not
passed a course yet.
Deletion anomalies
We may want to delete a course but some student
may be registered only for that course.
Update anomalies
If a student changes their address, many tuples
need to be updated.
Danger of inconsistency in database.

7
Good Database Design

The basic idea
A good database is one in which each table
consists of a primary key and a set of mutually
independent attributes.
Strategy for achieving a good database design
Identify undesirable dependencies in a table and
decompose by projection.

8
Functional Dependencies (FDs)

Attribute (set) Y is functionally dependent on
attribute (set) X if, whenever two tuples have
the same value for X, they also have the same
value for Y.
Notation X ? Y
X is called the determinant.
An FD A? B is non-trivial if and only if B ? A
and B?A.

9
Functional Dependencies in Our Example.

If everything is in one table, then these FDs
exist
sid ? name, addr
cid ? title
eid ? ename
deptid ? dname, loc
sid, cid ? grade
Note we also have
sid, cid, eid, deptid ?
all other attributes

10
Keys Again

A set of attributes X in a relation R is a
superkey if every attribute in R is functionally
dependent on X.
A candidate key is a minimal superkey.
Alternate keys are candidate keys that have not
been selected as primary keys.
A prime attribute is a member of a candidate key.

11
Armstrongs Axioms

Let X,Y and Z be sets of attributes of a relation
R
Reflexivity (X ?Y) ? (X ? Y)
Augmentation
(X ? Y) ? (XZ ? YZ)
Transitivity
((X ?Y) (Y? Z)) ? (X ?Z)
Axioms are sound and complete
Can derive all FDs that follow from a given set
of FDs.
Derive only true FDs

12
Some Consequences of Armstrongs Axioms

The following are implied by Armstrongs axioms
Decomposition
(X ? YZ) ?(X ? Y)
Union
((X ? Y)(X ? Z)) ? (X ? YZ)
Pseudo transitivity
((X ? Y)(WY ? Z)) ? (WX ? Z)

13
Closure of a Set of Functional Dependencies

If F is a set of functional dependencies, the
closure of F, F, is the set of all functional
dependencies logically implied by those in F.
Useful since it allows us to determine candidate
keys (there must be functional dependency to all
other attributes), but very expensive to compute.

14
Closure Under a Set of Functional Dependencies

Since F is too expensive to compute, we use
closure of X under a set of functional
dependencies, X.
(X ? Y) in F if and only if
Y ? X.
Since X is relatively easy to compute, we can
now verify whether X is a superkey.

15
Computing X.

To compute X under a set of FDs F
INPUT X, F
OUTPUT X
S X
WHILE
there is a (Z?Y) in F
with Z ? S and Y ? S
DO S SY
ENDWHILE
X S

16
Decomposition

Recall that having identified undesirable FDs, we
now need to decompose.
Decomposition
Let U be a relation scheme. A set of R1,..,Rn
of relation schemes is a decomposition of U if
R1 ? ? Rn U
Every attribute of U occurs in at least one Ri.

17
Desirable Properties of a Decomposition

Decompositions should be
Lossless
Dependency preserving
No redundancy
Minimal number of tables
Sometimes, not all properties can be achieved
simultaneously.

18
Lossless Decomposition

Let
R1,..,Rn a decomposition of U
u relation instance over U
Pi ?Ri(u) for i from 1 to n
Then
R1,..,Rn is a lossless decomposition if
u P1 P n
In other words, the original relation can be
reconstructed.

19
Dependency Preserving Decompositions

In decomposing a table, ensure that any FDs are
easily enforceable.
Example
Relation U(A,B,C)
FDs A ? B, A ? C, B ? C
If we decompose U into R(A,B) and S(B,C), then A
? B, B ? C can be easily enforced when changing R
or S.
Because of transitivity, A ? C is automatically
enforced.

20
Non-Dependency Preserving Decomposition

If we decompose U into R(A,B) and S(A,C), then
enforcing A ? B and A ? C is easy.
However, B ? C becomes an interrelational
constraint and can only be enforced through a
join.
This decomposition is not dependency preserving.

21
Normalization

Normal forms, as defined in relational database
theory, are guidelines for the design of the
tables in the database.
Normalization reduces redundancy.
Important to remember why we want to avoid
redundancy
Space requirements
Insertion, deletion and update anomalies.

22
The Normal Forms

First normal form
Second normal form
Third normal form
Boyce-Codd normal form
Fourth normal form
Fifth normal form
The normal forms are ordered in that everything
in 2NF is also in 1NF.
We ignore 5NF, as violations hardly occur in
practice.

23
First Normal Form

A relation is in 1NF iff the value of each
attribute in a tuple is atomic.
A relation which is not in 1NF
SID CID GRADE
123 CS33Q A
CS35A B
234 CS33Q C
CS34A B
CS36Q B

24
Getting Tables into 1NF

Normalizing a table which is not in 1NF is easy
Simply repeat the other fields.
Thus
SID CID GRADE
123 CS33Q A
123 CS35A B
234 CS33Q C
234 CS34A B
234 CS36Q B

25
Second and Third Normal Form

Second and third normal form concern relationship
between non-key and prime attributes.
Recall that a prime attribute is a member of a
candidate key.
Under 2NF and 3NF, a non-key attribute value must
provide a fact about the key, the whole key and
nothing but the key.
Every non-prime attribute must be fully
functionally dependent on a candidate key.

26
Second Normal Form

2NF is violated when a non-key attribute depends
on a proper subset of a candidate key.
The following violates 2NF
Result(cid, sid, name, grade)
as name is functionally dependent on sid alone.

27
Dangers of Violating 2NF

Note that name is repeated for every course that
a student has a grade for.
Problems
Danger of inconsistency if a student changes
their name, e.g., by getting married.
If a student has not passed any courses yet, then
the students name cannot be stored.

28
Getting Tables into 2NF

Decompose the table into
Result(cid, sid, grade)
Student(sid, name)
This decomposition leads to longer retrieval
times for queries which involve joins.
Normalization is necessary to avoid anomalies
which arise because of changes to attributes.
If little chance of changes, then sometimes do
not normalize.

29
Third Normal Form

Third normal form is violated when a non-prime
attribute depends on another non-prime attribute.
The following violates 3NF
Empl(eid, dept, loc)
loc is a fact about dept.
Danger same as violation of 2NF.

30
Getting Tables into 3NF

Again, decompose
Empl(eid, dept)
Department(dept, loc)
We can always restore 3NF through a lossless and
dependency preserving decomposition.

31
Boyce-Codd Normal Form (BCNF)

A relation scheme R is in BCNF if every
determinant of a FD over R is a candidate key.
In other words, the determinant of every FD is a
superkey.
Violation of BCNF
R(A,B,C,D,E,F)
A ? BC, D ? AEF
D ABCDEF
D is a good primary key.
A ABC

32
Another Violation of BCNF

Assume that we give each registration for a
course a unique registration number
Reg(rid, sid, cid, sname, grade)
FDs
rid ? sid, cid
sid, cid ? rid, grade
sid ? sname
rid all attributes

33
Getting Tables into BCNF

Decompose according to the FD whose determinant
is not a superkey.
In our example, sid ? sname
This gives
Reg(rid, sid, cid, grade)
Stud(sid, sname)
Not always possible to get tables into BCNF while
preserving all functional dependencies.

34
Example where BCNF is not possible

Consider
R(A,B,C)
AB ? C, C ? B
Not in BCNF because C is not a superkey.
However, every decomposition of R fails to be
dependency preserving as we have to split up the
attributes in AB ? C
Have to settle for 3NF.

35
Multivalued Dependencies (MVDs)

In an FD, X ? Y, knowing the value of X means
that you know the unique value for Y.
In an MVD, X ? ? Y, knowing the value of X means
that you know the set of values from which Y can
come.

36
Example of MVD

Assume we have two streams for some course,
taught by different instructors, and that for
each course, we use two textbooks.
Example
course instructor text
CS35A Rao Date
Harold Korth
CS34A Rao Jackson
Mugisa Rich

37
Example of MVD Continued