Database Theory and Methodology - PowerPoint PPT Presentation

1 / 37
About This Presentation
Title:

Database Theory and Methodology

Description:

So far we have not developed any measure of 'goodness' to measure the quality of ... We need formal concepts to define 'goodness' ... The Evils of Redundancy ... – PowerPoint PPT presentation

Number of Views:92
Avg rating:3.0/5.0
Slides: 38
Provided by: nihankes
Category:

less

Transcript and Presenter's Notes

Title: Database Theory and Methodology


1
Database Theory and Methodology
2
The Good and the Bad
  • So far we have not developed any measure of
    goodness to measure the quality of the design,
    other than our intuition.
  • Bad database design
  • Redundant Information
  • Update anomalies
  • Wasting storage with null values
  • Generation of invalid data during joins
  • Good database design
  • Easy to explain its semantics
  • No redundancy (or at least reduced redundancy)
  • No information loss during joins
  • We need formal concepts to define goodness and
    badness of relational schemas.

3
The Evils of Redundancy
  • Redundancy is at the root of several problems
    associated with relational schemas
  • Redundant storage
  • Insertion anomalies
  • Deletion anomalies
  • Update anomalies

4
Example Bad Design
  • redundancy airline name repeated for the same
    flight
  • inconsistency when airline name for a flight
    changes, it must be changed in many places

5
Bad Database Design
  • insertion anomalies how do we represent that
    SK912 is flown by Scandinavian without there
    being a date and a plane assigned?
  • deletion anomalies cancelling AA411 on 10/22/00
    makes us lose that it is flown by American.
  • update anomalies if DL242 is flown by Sabena, we
    must change it everywhere.

6
Bad Database Design- decomposition
7
Bad Database Design- information loss
  • information loss we polluted the database with
    false facts we cant find the true facts.

8
Good Database Design
  • no redundancy of FACT (!)
  • no inconsistency
  • no insertion, deletion or update anomalies
  • no information loss

9
Goal Devise a Theory for the Following
  • Decide whether a particular relation R is in
    good form.
  • In the case that a relation R is not in good
    form, decompose it into a set of relations R1,
    R2, ..., Rn such that
  • each relation is in good form
  • the decomposition is a lossless-join
    decomposition
  • Our theory is based on
  • functional dependencies
  • multivalued dependencies

10
Functional Dependencies (FDs)
  • A functional dependency is a constraint between
    two sets of attributes from the database.
  • Definition A functional dependency, denoted by X
    ? Y, holds over relation R if, for every
    allowable instance r of R
  • t1 ? r, t2 ? r, t1.X t2.X implies t1.Y t2.Y
  • i.e., given two tuples in r, if the X values
    agree, then the Y values must also agree. (X and
    Y are sets of attributes.)
  • An FD is a statement about all allowable
    relations.
  • Must be identified based on semantics of
    application.
  • Given some allowable instance r1 of R, we can
    check if it violates some FD f, but we cannot
    tell if f holds over R!
  • If X ? Y in R, this does not say whether or not Y
    ? X.

11
Functional Dependencies and Keys
  • A key constraint is a special case of an FD.
  • Note the following
  • If X is a candidate key, X ? Y for any subset of
    attributes Y of R.
  • X ? Y does not require that the set X be minimal
    the additional minimality condition must be met
    for X to be a key.
  • If X ? Y holds where Y is the set of all
    attributes in R, and there is some subset V of X
    s.t. V ? Y, then X is a superkey.

12
Example Constraints on Entity Set
  • Consider relation obtained from Hourly_Emps
  • Hourly_Emps (ssn, name, lot, rating, hrly_wages,
    hrs_worked)
  • Notation We will denote this relation schema by
    listing the attributes SNLRWH
  • This is really the set of attributes
    S,N,L,R,W,H.
  • Sometimes, we will refer to all attributes of a
    relation by using the relation name. (e.g.,
    Hourly_Emps for SNLRWH)
  • Some FDs on Hourly_Emps
  • ssn is the key S ? SNLRWH
  • rating determines hrly_wages R ? W

13
Example (Contd.)
  • Problems due to R ? W
  • Update anomaly Can we change W in
    just the 1st tuple of SNLRWH?
  • Insertion anomaly What if we want to insert an
    employee and dont know the hourly wage for his
    rating?
  • Deletion anomaly If we delete all employees with
    rating 5, we lose the information about the wage
    for rating 5!

Will two smaller tables be better?
14
Closure of a set of FDs
  • Given some FDs, we can usually infer additional
    FDs
  • ssn ? did, did ? lot implies ssn ? lot
  • mgr_ssn ? mgr_phone, dept_no ? mgr_ssn implies
    dept_no ? mgr_phone
  • An FD f is implied by a set of FDs F if f holds
    whenever all FDs in F hold.
  • Formally, the set of all dependencies that
    include F as well as all dependencies that can be
    inferred from F is called the closure of F it is
    denoted by F.

15
Inference Rules for FDs
  • Armstrongs Axioms (X, Y, Z are sets of
    attributes)
  • Reflexivity If Y ? X then X ? Y.
  • Augmentation If X ? Y, then XZ ? YZ for
    any Z.
  • Transitivity If X ? Y and Y ? Z, then X
    ? Z.
  • These are sound and complete inference rules for
    FDs!
  • By sound we mean that they generate only FDs in
    F when applied to a set F of FDs.
  • By complete we mean that repeated application of
    these rules will generate all FDs in the closure
    of F.

16
Derived inference rules
  • Union if XY and XZ, then XYZ.
  • Decomposition if XYZ, then XY and XZ.
  • Pseudotransitivity if XY and WYZ, then WXZ.
  • These additional rules are not essential their
    soundness can be proved using Armstrongs Axioms.
  • Exercise Prove rules Union, Decomposition and
    Pseudotransitivity using A.A.

17
Example
  • Consider the following schema
  • Contracts(cid,sid,jid,did,pid,qty,value)
  • C is the key C ? CSJDPQV
  • Project purchases each part using single
    contract JP ? C
  • Dept purchases at most one part from a supplier
    SD ? P
  • JP ? C, C ? CSJDPQV imply JP?CSJDPQV
  • SD?P implies SDJ ? JP
  • SDJ ? JP, JP ? CSJDPQV imply SDJ ?
    CSJDPQV
  • We can infer several additional FDs that are in
    the closure by using augmentation or
    decomposition. e.g. C ? CSJDPQV implies C ? C,
    C? S,C ? J, C? D, C? P, C? Q, C? V.
  • We have also a number of trivial FDs from
    reflexivity rule.

18
Reasoning About FDs
  • Computing the closure of a set of FDs can be
    expensive. (Size of closure is exponential in
    attrs!)
  • Typically, we just want to check if a given FD X
    ? Y is in the closure of a set of FDs F. An
    efficient check
  • Compute attribute closure of X (denoted X) wrt
    F
  • Set of all attributes A such that X ? A is in F.
  • There is a linear time algorithm to compute this.
  • Check if Y is in X.
  • Does F A ?B, B ? C, CD ? E imply A ? E?
  • i.e, is A ?E in the closure F ?
    (Equivalently, is E in X)?

19
Attribute Closure
  • X X
  • repeat
  • oldX X
  • for each functional dependency Y ? Z in F do
  • if Y ? X then X X ? Z
  • until X oldX

20
Example
  • Given the following set of FDs
  • F Ssn ? Ename,
  • Pnumber ? Pname, Plocation,
  • Ssn,Pnumber ? Hours
  • The attribute closures
  • Ssn Ssn, Ename
  • Pnumber Pnumber, Pname,Plocation
  • Ssn,Pnumber Ssn, Pnumber, Ename, Pname,
    Plocation, Hours

21
Example
  • R (A, B, C, G, H, I)
  • F A ? B A ? C CG ? H CG ? I B ? H
  • (AG)
  • result AG.
  • result ABCG (A ? C and A ? B)
  • result ABCGH (CG ? H and CG ? AGBC)
  • 4. result ABCGHI (CG ? I and CG ? AGBCH)
  • Is AG a candidate key?
  • Is AG a super key?
  • Does AG ? R? Is (AG) ? R
  • Is any subset of AG a superkey?
  • Does A ? R? Is (A) ? R
  • Does G ? R? Is (G) ? R

22
Finding a Key for a Relation
  • Algorithm Finding a Key K for R, given a set of
    FDs
  • Set K R.
  • For each attribute A in K
  • Compute (K-A) wrt F
  • If (K-A) contains all the attributes in R,
    then
  • set K K A

23
Example
  • Consider the following schema and the set of FDs
  • R (Ssn, Pnumber, Ename, Pname, Plocation,
    Hours)
  • F Ssn ? Ename, Pnumber ? Pname, Plocation,
    Ssn,Pnumber ? Hours
  • The Key is Ssn,Pnumber, since
  • Ssn,Pnumber Ssn, Pnumber, Ename, Pname,
    Plocation, Hours

24
Equivalence of two sets of FDs
  • Definition A set of FDs F covers another set of
    FDs E if every FD in E is also in F.
  • Definition F and E are equivalent if F E
    (i.e. F covers E and E covers F).
  • We can determine whether F covers E by
    calculating X with respect to F for each FD XY
    in E, and then checking whether this X includes
    the attributes in Y. If this is the case for
    every FD in E, then F covers E.

25
Minimal Cover
  • Sets of functional dependencies may have
    redundant dependencies that can be inferred from
    the others
  • Eg A ? C is redundant in A ? B, B ? C,
    A ? C
  • Parts of a functional dependency may be redundant
  • E.g. on RHS A ? B, B ? C, A ? CD can
    be simplified to A ?
    B, B ? C, A ? D
  • E.g. on LHS A ? B, B ? C, AC ? D can
    be simplified to A ?
    B, B ? C, A ? D
  • Intuitively, a canonical cover of F is a
    minimal set of functional dependencies
    equivalent to F, having no redundant dependencies
    or redundant parts of dependencies

26
Minimal Sets of FDs
  • A set of FDs F is minimal if
  • Every dependency in F has a single attribute for
    its right-hand side.
  • We cant replace any dependency X? A in F with a
    dependency Y?A where Y ? X and still have a set
    of dependencies equivalent with F.
  • This ensures that there are no redundancies by
    having redundant attributes on the left-hand side
    of a dependency.
  • We cant remove any dependency from F and still
    have a set of dependencies equivalent with F.
  • This ensures that there are no redundancies by
    having a dependency that can be inferred from the
    remaining FDs in F.

27
Minimal Cover of a set of FDs
  • A minimal cover of a set of FDs E is a minimal
    set of dependencies F that is equivalent to E.
  • There can be several minimal covers for a set of
    FDs.
  • Algorithm Finding Minimal Cover F for a set of
    Fds E.
  • 1. Set F E.
  • 2. Replace each Fd X ? A1,..., An in F by the n
    Fds X?A1,..., X?An.
  • 3. For each Fd X?A in F
  • For each attribute B?X
  • If F - X?A ? (X - B) ?A ? F
  • Then replace X?A with (X - B) ?A in F.
  • 4. For each remaining Fd X?A in F
  • If F - X?A?F, then remove X?A from F.

28
Examples
  • E A ? B, ABCD ? E, EF ? GH, ACDF ? EG
    has the following minimal cover
  • F A ? B, ACD ? E, EF ? G, EF ? H
  • E Ssn ? Ename, Pnumber ? Pname, Plocation,
    Ssn,Pnumber ? Hours
  • F Ssn ? Ename, Pnumber ? Pname, Pnumber ?
    Plocation, Ssn,Pnumber ? Hours

29
Normal Forms
  • Returning to the issue of schema refinement, the
    first question to ask is whether any refinement
    is needed!
  • If a relation is in a certain normal form (BCNF,
    3NF etc.), it is known that certain kinds of
    problems are avoided/minimized. This can be used
    to help us decide whether decomposing the
    relation will help.
  • Role of FDs in detecting redundancy
  • Consider a relation R with 3 attributes, ABC.
  • No FDs hold There is no redundancy here.
  • Given A ? B Several tuples could have the
    same A value, and if so, theyll all have the
    same B value!

30
Normalization
  • Normalization of data is a process of analyzing
    the given relational schemas based on their FDs
    and primary keys to achieve the desirable
    properties of
  • minimizing redundancy
  • minimizing update, insert and delete anomalies.
  • The process of normalization through
    decomposition must also confirm two additional
    properties
  • Lossless join property (nonadditive join
    property)
  • Dependency preservation property

31
Overview of NFs
NF2 1NF 2NF 3NF BCNF
32
Normal Forms
  • NF2 non-first normal form.
  • 1NF R is in 1NF iff all domain values are
    atomic.
  • 2NF R is in 2. NF iff R is in 1NF and every
    non-key attribute is fully dependent on the key.
  • 3NF R is in 3NF iff R is 2NF and every nonkey
    attribute is non-transitively dependent on the
    key
  • BCNF R is in BCNF iff every determinant is a
    candidate key
  • Determinant an attribute on which some other
    attribute(s) is fully functionally dependent.

33
1NF Atomic Values
Attributes must be defined over domains with
atomic values
34
2NF Full Functional Dependency
  • An FD X ? Y is a full functional dependency if
    any attribute A ? X , X A does not
    functionally determine Y.
  • An FD X ? Y is a partial functional dependency if
    some attribute A ? X , (X A) ? Y.
  • E.g. ssn, proj_num ? hours is a full FD.
  • ssn, proj_num ? ename is a partial FD.
  • Definition A relation schema R is in 2NF if R
    is in 1NF and every nonprime attribute A is fully
    functionally dependent on the primary key.

35
Third Normal Form (3NF)
  • Relation R with FDs F is in 3NF if, for all X ?
    A in F
  • A ? X (called a trivial FD), or
  • X contains a key for R, or
  • A is part of some key for R.
  • Minimality of a key is crucial in third condition
    above!
  • If R is in 3NF, obviously in 2NF.
  • If R is in 3NF, some redundancy is still
    possible. It is a compromise, used when BCNF not
    achievable (e.g., no good decomposition, or
    performance considerations).
  • Lossless-join, dependency-preserving
    decomposition of R into a collection of 3NF
    relations always possible.

36
What Does 3NF Achieve?
  • If 3NF is violated by X ? A, one of the following
    holds
  • X is a subset of some key K
  • We store (X, A) pairs redundantly.
  • X is not a proper subset of any key.
  • There is a chain of FDs K ? X ? A, which means
    that we cannot associate an X value with a K
    value unless we also associate an A value with an
    X value.
  • But even if relation is in 3NF, these problems
    could arise.
  • e.g., Reserves SBDC, S ? C, C ? S is in
    3NF, but for each reservation of sailor S, same
    (S, C) pair is stored.

37
Boyce-Codd Normal Form (BCNF)
  • Reln R with FDs F is in BCNF if, for all X ? A
    in F
  • A X (called a trivial FD), or
  • X contains a key for R.
  • In other words, R is in BCNF if the only
    non-trivial FDs that hold over R are key
    constraints.
  • No dependency in R that can be predicted using
    FDs alone.
  • If we are shown two tuples that agree upon the X
    value, we cannot infer the A value in
    one tuple from
    the A value in the other.
  • If example relation is in BCNF, then 2 tuples
    must be identical
    (since X is a key).
Write a Comment
User Comments (0)
About PowerShow.com