Database Design Theory - PowerPoint PPT Presentation

1 / 41
About This Presentation
Title:

Database Design Theory

Description:

One solution store everything in one big table. Appl(sid,name,addr, cid,title, ... Attribute (set) Y is functionally dependent on attribute (set) X if, whenever ... – PowerPoint PPT presentation

Number of Views:21
Avg rating:3.0/5.0
Slides: 42
Provided by: hanrei
Category:
Tags: big | database | design | theory

less

Transcript and Presenter's Notes

Title: Database Design Theory


1
Database Design Theory
  • Which tables to have in a database
  • Normalization

2
Database Design Theory
  • Given some body of data to be represented in a
    database, as modelled in an E-R diagram, what is
    the most suitable logical structure for that
    data?
  • How do we decide on the appropriate tables and
    the attributes of the tables?

3
Requirements
  • Accommodate data integrity.
  • General integrity constraints
  • e.g. referential integrity
  • Domain specific integrity constraints
  • e.g. no user can borrow more than 4 books.
  • Robust in the sense that the design should be
    application independent.
  • We try to achieve this through the elimination of
    redundancy.

4
The Danger of Redundancy
  • Consider the example
  • For students, we want to know student ID, name
    and address.
  • For courses, we need to know course ID, title and
    lecturer.
  • For employees, we need to know the employee ID,
    name and department.
  • For each department, we need to know the
    department ID, the name and the location.
  • For each enrollment, we need to know the grade.

5
The Danger of Redundancy Continued
  • One solution store everything in one big table
  • Appl(sid,name,addr,
  • cid,title,
  • eid, ename,
  • deptid, dname, loc,
  • grade)
  • Clearly, this leads to redundancy.
  • For example, we need to store the students
    address for every course they have been
    registered for.

6
The Danger of Redundancy. Conclusion
  • If everything in one table, then
  • Greater space requirements
  • Insertion anomalies
  • Cannot store information on student who has not
    passed a course yet.
  • Deletion anomalies
  • We may want to delete a course but some student
    may be registered only for that course.
  • Update anomalies
  • If a student changes their address, many tuples
    need to be updated.
  • Danger of inconsistency in database.

7
Good Database Design
  • The basic idea
  • A good database is one in which each table
    consists of a primary key and a set of mutually
    independent attributes.
  • Strategy for achieving a good database design
  • Identify undesirable dependencies in a table and
    decompose by projection.

8
Functional Dependencies (FDs)
  • Attribute (set) Y is functionally dependent on
    attribute (set) X if, whenever two tuples have
    the same value for X, they also have the same
    value for Y.
  • Notation X ? Y
  • X is called the determinant.
  • An FD A? B is non-trivial if and only if B ? A
    and B?A.

9
Functional Dependencies in Our Example.
  • If everything is in one table, then these FDs
    exist
  • sid ? name, addr
  • cid ? title
  • eid ? ename
  • deptid ? dname, loc
  • sid, cid ? grade
  • Note we also have
  • sid, cid, eid, deptid ?
  • all other attributes

10
Keys Again
  • A set of attributes X in a relation R is a
    superkey if every attribute in R is functionally
    dependent on X.
  • A candidate key is a minimal superkey.
  • Alternate keys are candidate keys that have not
    been selected as primary keys.
  • A prime attribute is a member of a candidate key.

11
Armstrongs Axioms
  • Let X,Y and Z be sets of attributes of a relation
    R
  • Reflexivity (X ?Y) ? (X ? Y)
  • Augmentation
  • (X ? Y) ? (XZ ? YZ)
  • Transitivity
  • ((X ?Y) (Y? Z)) ? (X ?Z)
  • Axioms are sound and complete
  • Can derive all FDs that follow from a given set
    of FDs.
  • Derive only true FDs

12
Some Consequences of Armstrongs Axioms
  • The following are implied by Armstrongs axioms
  • Decomposition
  • (X ? YZ) ?(X ? Y)
  • Union
  • ((X ? Y)(X ? Z)) ? (X ? YZ)
  • Pseudo transitivity
  • ((X ? Y)(WY ? Z)) ? (WX ? Z)

13
Closure of a Set of Functional Dependencies
  • If F is a set of functional dependencies, the
    closure of F, F, is the set of all functional
    dependencies logically implied by those in F.
  • Useful since it allows us to determine candidate
    keys (there must be functional dependency to all
    other attributes), but very expensive to compute.

14
Closure Under a Set of Functional Dependencies
  • Since F is too expensive to compute, we use
    closure of X under a set of functional
    dependencies, X.
  • (X ? Y) in F if and only if
  • Y ? X.
  • Since X is relatively easy to compute, we can
    now verify whether X is a superkey.

15
Computing X.
  • To compute X under a set of FDs F
  • INPUT X, F
  • OUTPUT X
  • S X
  • WHILE
  • there is a (Z?Y) in F
  • with Z ? S and Y ? S
  • DO S SY
  • ENDWHILE
  • X S

16
Decomposition
  • Recall that having identified undesirable FDs, we
    now need to decompose.
  • Decomposition
  • Let U be a relation scheme. A set of R1,..,Rn
    of relation schemes is a decomposition of U if
  • R1 ? ? Rn U
  • Every attribute of U occurs in at least one Ri.

17
Desirable Properties of a Decomposition
  • Decompositions should be
  • Lossless
  • Dependency preserving
  • No redundancy
  • Minimal number of tables
  • Sometimes, not all properties can be achieved
    simultaneously.

18
Lossless Decomposition
  • Let
  • R1,..,Rn a decomposition of U
  • u relation instance over U
  • Pi ?Ri(u) for i from 1 to n
  • Then
  • R1,..,Rn is a lossless decomposition if
  • u P1 P n
  • In other words, the original relation can be
    reconstructed.

19
Dependency Preserving Decompositions
  • In decomposing a table, ensure that any FDs are
    easily enforceable.
  • Example
  • Relation U(A,B,C)
  • FDs A ? B, A ? C, B ? C
  • If we decompose U into R(A,B) and S(B,C), then A
    ? B, B ? C can be easily enforced when changing R
    or S.
  • Because of transitivity, A ? C is automatically
    enforced.

20
Non-Dependency Preserving Decomposition
  • If we decompose U into R(A,B) and S(A,C), then
    enforcing A ? B and A ? C is easy.
  • However, B ? C becomes an interrelational
    constraint and can only be enforced through a
    join.
  • This decomposition is not dependency preserving.

21
Normalization
  • Normal forms, as defined in relational database
    theory, are guidelines for the design of the
    tables in the database.
  • Normalization reduces redundancy.
  • Important to remember why we want to avoid
    redundancy
  • Space requirements
  • Insertion, deletion and update anomalies.

22
The Normal Forms
  • First normal form
  • Second normal form
  • Third normal form
  • Boyce-Codd normal form
  • Fourth normal form
  • Fifth normal form
  • The normal forms are ordered in that everything
    in 2NF is also in 1NF.
  • We ignore 5NF, as violations hardly occur in
    practice.

23
First Normal Form
  • A relation is in 1NF iff the value of each
    attribute in a tuple is atomic.
  • A relation which is not in 1NF
  • SID CID GRADE
  • 123 CS33Q A
  • CS35A B
  • 234 CS33Q C
  • CS34A B
  • CS36Q B

24
Getting Tables into 1NF
  • Normalizing a table which is not in 1NF is easy
    Simply repeat the other fields.
  • Thus
  • SID CID GRADE
  • 123 CS33Q A
  • 123 CS35A B
  • 234 CS33Q C
  • 234 CS34A B
  • 234 CS36Q B

25
Second and Third Normal Form
  • Second and third normal form concern relationship
    between non-key and prime attributes.
  • Recall that a prime attribute is a member of a
    candidate key.
  • Under 2NF and 3NF, a non-key attribute value must
    provide a fact about the key, the whole key and
    nothing but the key.
  • Every non-prime attribute must be fully
    functionally dependent on a candidate key.

26
Second Normal Form
  • 2NF is violated when a non-key attribute depends
    on a proper subset of a candidate key.
  • The following violates 2NF
  • Result(cid, sid, name, grade)
  • as name is functionally dependent on sid alone.

27
Dangers of Violating 2NF
  • Note that name is repeated for every course that
    a student has a grade for.
  • Problems
  • Danger of inconsistency if a student changes
    their name, e.g., by getting married.
  • If a student has not passed any courses yet, then
    the students name cannot be stored.

28
Getting Tables into 2NF
  • Decompose the table into
  • Result(cid, sid, grade)
  • Student(sid, name)
  • This decomposition leads to longer retrieval
    times for queries which involve joins.
  • Normalization is necessary to avoid anomalies
    which arise because of changes to attributes.
  • If little chance of changes, then sometimes do
    not normalize.

29
Third Normal Form
  • Third normal form is violated when a non-prime
    attribute depends on another non-prime attribute.
  • The following violates 3NF
  • Empl(eid, dept, loc)
  • loc is a fact about dept.
  • Danger same as violation of 2NF.

30
Getting Tables into 3NF
  • Again, decompose
  • Empl(eid, dept)
  • Department(dept, loc)
  • We can always restore 3NF through a lossless and
    dependency preserving decomposition.

31
Boyce-Codd Normal Form (BCNF)
  • A relation scheme R is in BCNF if every
    determinant of a FD over R is a candidate key.
  • In other words, the determinant of every FD is a
    superkey.
  • Violation of BCNF
  • R(A,B,C,D,E,F)
  • A ? BC, D ? AEF
  • D ABCDEF
  • D is a good primary key.
  • A ABC

32
Another Violation of BCNF
  • Assume that we give each registration for a
    course a unique registration number
  • Reg(rid, sid, cid, sname, grade)
  • FDs
  • rid ? sid, cid
  • sid, cid ? rid, grade
  • sid ? sname
  • rid all attributes

33
Getting Tables into BCNF
  • Decompose according to the FD whose determinant
    is not a superkey.
  • In our example, sid ? sname
  • This gives
  • Reg(rid, sid, cid, grade)
  • Stud(sid, sname)
  • Not always possible to get tables into BCNF while
    preserving all functional dependencies.

34
Example where BCNF is not possible
  • Consider
  • R(A,B,C)
  • AB ? C, C ? B
  • Not in BCNF because C is not a superkey.
  • However, every decomposition of R fails to be
    dependency preserving as we have to split up the
    attributes in AB ? C
  • Have to settle for 3NF.

35
Multivalued Dependencies (MVDs)
  • In an FD, X ? Y, knowing the value of X means
    that you know the unique value for Y.
  • In an MVD, X ? ? Y, knowing the value of X means
    that you know the set of values from which Y can
    come.

36
Example of MVD
  • Assume we have two streams for some course,
    taught by different instructors, and that for
    each course, we use two textbooks.
  • Example
  • course instructor text
  • CS35A Rao Date
  • Harold Korth
  • CS34A Rao Jackson
  • Mugisa Rich

37
Example of MVD Continued
  • Putting table in 1NF gives
  • Course Instructor Text
  • CS35A Rao Date
  • CS35A Rao Korth
  • CS35A Harold Date
  • CS35A Harold Korth
  • CS34A Rao Jackson
  • CS34A Rao Rich
  • CS34A Mugisa Jackson
  • CS34A Mugisa Rich
  • With primary key
  • Course, Instructor, Text
  • Since no FD, in BCNF.

38
Redundancy because of MVDs
  • However, still redundancy in the table because
  • if ltc,p,xgt and ltc,p,xgt in table ltc,p,xgt and
    ltc,p,xgt in table too.
  • The table contains two multivalued dependencies
  • Course ? ? Instructor
  • Course ? ? Text
  • Danger of insertion and update anomalies

39
Fourth Normal Form
  • Under 4NF, a relation should not contain two or
    more independent MVDs.
  • In other words, if there is a MVD, X ? ? Y, then
    X should be a superkey.

40
Getting Tables into 4NF
  • Again, get a table into 4NF through decomposition
    so that each MVD is captured in a separate table.
  • Example
  • CP(Course, Instructor)
  • CT(Course, Text)

41
Normalization Reconsidered
  • Normalization helps avoid
  • Insertion anomalies
  • Update anomalies
  • Deletion anomalies
  • Normalization increases retrieval time for some
    queries.
Write a Comment
User Comments (0)
About PowerShow.com