Title: CS 541 Review
1CS 541Review
2Course Outline
- Course Introduction
- Relational Data Model
- Data Modeling
- Entity-Relationship
- Constraints
- Relational Theory
- Relational Algebra
- Keys and Dependencies
- Normalization
- Using a Relational Database
- SQL, Views, Constraints, Triggers
- Storage mechanisms
- I/O cost estimation
- Putting the Data on Disk
- Block/record
- Buffer management
- Indexing
- Dense vs. Sparse
- Index Sequential files
- B-Trees
- Hashing / Bitmap Indexes
- Query Processing
- Query Optimization
- Handling Failure
- Concurrency Control
- Transaction Management
- Research topics (not on final)
3Entity/Relationship Model
- Diagrams to represent designs.
- Entity like object, thing.
- Entity set like class set of similar
entities/objects. - Attribute property of entities in an entity
set, similar to fields of a struct. - In diagrams, entity set ? rectangleattribute ?
oval.
name
phone
ID
Students
height
4- Complex Example
- Keys
- One to many, many to many, one to one
relationships - Weak entity sets
UPC
Buyer
Product
Ordered
OB
OB
OP
OP
Name
Qty Ordered
ID
Shipment
Part of
Part-of ismany-many and not a weak relationship!
Qty Shipped
5Design Principles
- Setting client has (possibly vague) idea of what
he/she wants. You must design a database that
represents these thoughts and only these
thoughts. - Avoid redundancy
- saying the same thing more than once.
- Wastes space and encourages inconsistency.
- Example
- Good
name
addr
name
ManfBy
Beers
Manfs
6Use Schema to Enforce Constraints
- The design schema should enforce as many
constraints as possible. - Don't rely on future data to follow assumptions.
- Example
- If registrar wants to associate only one
instructor with a course, don't allow sets of
instructors and count on departments to enter
only one instructor per course.
7 Intuitive Rule for E.S. Vs. Attribute
- Make an entity set only if it either
- Is more than a name of something i.e., it has
nonkey attributes or relationships with a number
of different entity sets, or - Manfs deserves to be an E.S. because we record
addr, a nonkey attribute. - Is the many in a many-one relationship The
following design illustrates both points - Beers deserves to be an E.S. because it is at the
many end.
name
addr
name
ManfBy
Beers
Manfs
8 Don't Overuse Weak E.S.
- There is a tendency to feel that no E.S. has its
entities uniquely determined without following
some relationships. - However, in practice, we almost always create
unique ID's to compensate social-security
numbers, VIN's, etc. - The only times weak E.S.'s seem necessary are
when - We can't easily create such ID's e.g., no one is
going to accept a species ID as part of the
standard nomenclature (species is a weak E.S.
supported by membership in a genus). - There is no global authority to create them,
e.g., crews and studios.
9Relational Model
- Table relation.
- Column headers attributes.
- Row tuple
- Beers
- Relation schema name(attributes) other
structure info.,e.g., keys, other constraints.
Example Beers(name, manf) - Order of attributes is arbitrary, but in practice
we need to assume the order given in the relation
schema. - Relation instance is current set of rows for a
relation schema. - Database schema collection of relation schemas.
name manf WinterBrew Petes BudLite A.B.
10Relational Data Model
- Set theoretic
- Domain set of values
- like a data type
- Cartesian product (or product)
- D1 ??D2 ??... ? Dn
- n-tuples (V1,V2,...,Vn)
- s.t., V1 ??D1, V2 ??D2,...,Vn ??Dn
- Relation-subset of cartesian product
- of one or more domains
- FINITE only empty set allowed
- Tuples members of a relation inst.
- Arity number of domains
- Components values in a tuple
- Domains corresp. with attributes
- Cardinality number of tuples
Relation as table Rows tuples Columns
components Names of columns attributes Set of
attribute names schema REL (A1,A2,...,An)
A1 A2 A3 ... An a1 a2 a3 an b1 b2
a3 cn a1 c3 b3 bn . . . x1 v2 d3
wn
Attributes
C a r d i n a l i t y
Tuple
Component
Arity
11Relational Design
- Simplest approach (not always best) convert each
E.S. to a relation and each relationship to a
relation. - Entity Set ? Relation
- E.S. attributes become relational attributes.
- Becomes
- Beers(name, manf)
name
manf
Beers
12Functional Dependencies
- X ? A assertion about a relation R that
whenever two tuples agree on all the attributes
of X, then they must also agree on attribute A - Why do we care?
- Knowing functional dependencies provides a formal
mechanism to divide up relations (normalization) - Saves space
- Prevents storing data that violates dependencies
13Example Functional Dependencies
- In ABC with FDs A ? B, B ? C, project onto AC.
- A ABC yields A ? B, A ? C.
- B BC yields B ? C.
- AB ABC yields AB ? C drop in favor of A ? C.
- AC ABC yields AC ? B drop in favor of A ? B.
- C C and BC BC adds nothing.
- Resulting FDs A ? B, A ? C, B ? C.
- Projection onto AC A ? C.
14Normalization
- Goal BCNF Boyce-Codd Normal Form
- all FDs follow from the fact key ?
everything. - Formally, R is in BCNF if for every nontrivial FD
for R, say X ? A, then X is a superkey. - Nontrivial right-side attribute not in left
side. - Why?
- 1. Guarantees no redundancy due to FDs.
- 2. Guarantees no update anomalies one
occurrence of a fact is updated, not all. - 3. Guarantees no deletion anomalies valid fact
is lost when tuple is deleted.
15Decomposition to Reach BCNF
- 1. Compute X.
- Cannot be all attributes why?
- 2. Decompose R into X and (RX) ? X.
- 3. Find the FDs for the decomposed relations.
- Project the FDs from F calculate all
consequents of F that involve only attributes
from X or only from (R?X) ? X.
R
X
X
163NF
- One FD structure causes problems
- If you decompose, you cant check all the FDs
only in the decomposed relations. - If you dont decompose, you violate BCNF.
- Abstractly AB ? C and C ? B.
- Example 1 title city ? theatre and theatre ?
city. - Example 2 street city ? zip,zip ? city.
- Keys A, B and A, C, but C ? B has a left
side that is not a superkey. - Suggests decomposition into BC and AC.
- But you cant check the FD AB ? C in only these
relations.
17Elegant Workaround
- Define the problem away.
- A relation R is in 3NF iff (if and only if)for
every nontrivial FD X ? A, either - 1. X is a superkey, or
- 2. A is prime member of at least one key.
- Thus, the canonical problem goes away you dont
have to decompose because all attributes are
prime.
18What 3NF Gives You
- There are two important properties of a
decomposition - We should be able to recover from the decomposed
relations the data of the original. - Recovery involves projection and join, which we
shall defer until weve discussed relational
algebra. - We should be able to check that the FDs for the
original relation are satisfied by checking the
projections of those FDs in the decomposed
relations. - Without proof, we assert that it is always
possible to decompose into BCNF and satisfy (1). - Also without proof, we can decompose into 3NF and
satisfy both (1) and (2). - But it is not possible to decompose into BNCF and
get both (1) and (2). - Street-city-zip is an example of this point.
19Multivalued Dependencies
- The multivalued dependency X ?? Y holds in a
relation R if whenever we have two tuples of R
that agree in all the attributes of X, then we
can swap their Y components and get two new
tuples that are also in R. - X Y others
20MVD Rules
- 1. Every FD is an MVD.
- Because if X ?Y, then swapping Ys between tuples
that agree on X doesnt create new tuples. - Example, in Drinkers name ?? addr.
- 2. Complementation if X ?? Y, then X ?? Z, where
Z is all attributes not in X or Y. - Example since name ?? phonesholds in
Drinkers, so doesname ?? addr beersLiked.
214NF
- Eliminate redundancy due to multiplicative effect
of MVDs. - Roughly treat MVDs as FD's for decomposition,
but not for finding keys. - Formally R is in Fourth Normal Form if whenever
MVDX ?? Y is nontrivial (Y is not a subset of X,
and X ? Y is not all attributes), then X is a
superkey. - Remember, X ? Y implies X ?? Y, so 4NF is more
stringentthan BCNF. - Decompose R, using4NF violation X ?? Y,into XY
and X ? (RY).
R
Y
X
22Why Decomposition Works?
- What does it mean to work? Why cant we just
tear sets of attributes apart as we like? - Answer the decomposed relations need to
represent the same information as the original. - We must be able to reconstruct the original from
the decomposed relations. - Projection and Join Connect the Original and
Decomposed Relations - Suppose R is decomposed into S and T. We project
R onto S and onto T.
23Theorem
- Suppose we decompose a relation with schema XYZ
into XY and XZ and project the relation for XYZ
onto XY and XZ. Then XY XZ is guaranteed to
reconstruct XYZ if and only if X ??Y (or
equivalently, X ?? Z). - Usually, the MVD is really a FD, X ? Y or X ?Z.
- BCNF When we decompose XYZ into XY and XZ, it is
because there is a FD X ? Y or X ? Z that
violates BCNF. - Thus, we can always reconstruct XYZ from its
projections onto XY and XZ. - 4NF when we decompose XYZ into XY and XZ, it is
because there is an MVD X ?? Y or X ?? Z that
violates 4NF. - Again, we can reconstruct XYZ from its
projections onto XY and XZ.
24- Relational Algebra
- limited expressive power (subset of possible
queries) - good optimizer possible
- rich enough language to express enough useful
things - Finiteness
- ? SELECT
- p PROJECT
- X CARTESIAN PRODUCT
FUNDAMENTAL - U UNION BINARY
- SET-DIFFERENCE
- ? SET-INTERSECTION
- ? THETA-JOIN
CAN BE DEFINED - NATURAL JOIN
IN TERMS OF - DIVISION or QUOTIENT
FUNDAMENTAL OPS
UNARY
25Extended (Nonclassical)Relational Algebra
- Adds features needed for SQL, bags.
- Duplicate-elimination operator ?.
- Extended projection.
- Sorting operator ?.
- Grouping-and-aggregation operator ?.
- Outerjoin operator o .
26SQL
- DML
- select, from, where, renaming
- set operations
- ordering
- aggregate functions
- nested subqueries
- other parts DDL, embedded SQL, auth etc
27Constraints
- Commercial relational systems allow much more
fine-tuning - of constraints than do the modeling languages we
learned earlier. - In essence SQL programming is used to describe
constraints. - Outline
- Primary key declarations.
- Foreign-keys referential integrity constraints.
- Attribute- and tuple-based checks constraints
within relations. - SQL Assertions global constraints.
- Not found in Oracle.
- Oracle Triggers.
- A substitute for assertions.
28Triggers (Oracle Version)
- Often called event-condition-action rules.
- Event a class of changes in the DB, e.g.,
insertions into Beers. - Condition a test as in a where-clause for
whether or not the trigger applies. - Action one or more SQL statements.
- Differ from checks or SQL assertions in that
- Triggers invoked by the event the system doesnt
have to figure out when a trigger could be
violated. - Condition not available in checks.
29 Views
- An expression that describes
- a table without creating it.
- View definition form is
- CREATE VIEW ltnamegt AS ltquerygt
30 Semantics of View Use
31Modification to Views Via Triggers
- Oracle allows us to intercept a modification to
a view through an instead-of trigger. - Example
- Likes(drinker, beer)
- Sells(bar, beer, price)
- Frequents(drinker, bar)
- CREATE VIEW Synergy AS
- SELECT Likes.drinker, Likes.beer,
- Sells.bar
- FROM Likes, Sells, Frequents
- WHERE Likes.drinker Frequents.drinker AND
- Likes.beer Sells.beer AND
- Sells.bar Frequents.bar
32Cursors
- Declare by
- CURSOR ltnamegt IS
- select-from-where statement
- Cursor gets each tuple from the relation produced
by the select-from-where, in turn, using a fetch
statement in a loop. - Fetch statement
- FETCH ltcursor namegt INTO
- variable list
- Break the loop by a statement of the form
- EXIT WHEN ltcursor namegt NOTFOUND
- True when there are no more tuples to get.
- Open and close the cursor with OPEN and CLOSE.
33Authorization in SQL
- File systems identify certain access privileges
on files, e.g., read, write, execute. - In partial analogy, SQL identifies six access
privileges on relations, of which the most
important are - 1. SELECT the right to query the relation.
- 2. INSERT the right to insert tuples into the
relation may refer to one attribute, in which
case the privilege is to specify only one column
of the inserted tuple. - 3. DELETE the right to delete tuples from the
relation. - 4. UPDATE the right to update tuples of the
relation may refer to one attribute.
34Storage
- Secondary storage, mainly disks
- I/O times
- Time Seek Time
- Rotational Delay
- Transfer Time
- Other
- I/Os should be avoided,
- especially random ones..
35- How to lay out data on disk
- Data Items
- Records
- Blocks
- Files
- Memory
- DBMS
-
36Indexing
- Index sequential file
- Search key ( ? primary key)
- Primary index (on Sequencing field)
- Secondary index
- Dense index (all Search Key values in)
- Sparse index
- Multi-level index
- Insertion/deletion
37BTree n3
100
120 150 180
30
3 5 11
120 130
180 200
100 101 110
150 156 179
30 35
38B tree rules
- (1) All leaves at same lowest level (balanced
tree) - (2) Pointers in leaves point to records except
for sequence pointer - Use at least
- Non-leaf ?(n1)/2? pointers
- Leaf ?(n1)/2? pointers to data
39CS 541Review
- Undo/Redo Logging?Normalization?Multiple
Granularity Locking? - December 6, 2002
40undo/redo logging
- Update ? ltTi, Xid, New X val, Old X valgt
- page X
- Page X can be flushed before or after Ti commit
- Log record flushed before corresponding updated
page (WAL) - Flush at commit (log only)
41Non-quiesce checkpoint
- L
- O
- G
- for
- undo dirty buffer
- pool pages
- flushed
Start-ckpt active TR Ti,T2,...
end ckpt
...
...
...
...
42Examples what to do at recovery time?
T1,- a
...
Ckpt T1
...
Ckpt end
...
T1- b
...
? Undo T1 (undo a,b)
43Example
...
T1 a
...
...
T1 b
...
...
T1 c
...
T1 cmt
...
ckpt- end
ckpt-s T1
? Redo T1 (redo b,c)
44Recovery process
- Backwards pass (end of log ? latest checkpoint
start) - construct set S of committed transactions
- undo actions of transactions not in S
- Undo pending transactions
- follow undo chains for transactions in
(checkpoint active list) - S - Forward pass (latest checkpoint start ? end of
log) - redo actions of S transactions
backward pass
start check- point
forward pass
45Undo/Redo Logging,17.4.5 (c)Where could End CKPT
be written?
- ltStart Sgt
- ltS, A, 60, 61gt
- ltCommit Sgt
- ltStart Tgt
- ltT, A, 61, 62gt
- ltStart Ugt
- ltU, B, 20, 21gt
- ltstart checkpointgt
- ltT, C, 30, 31gt
- ltStart Vgt
- ltU, D, 40, 41gt
- ltV, F, 70, 71gt
- ltCommit Ugt
- ltT, E, 50, 51gt
- ltCommit Tgt
- ltV, B, 21, 22gt
- ltCommit Vgt
46Normalization (3.6.1 (e))
- R(A,B,C,D,E)
- FDs AB?C, C?D, D?B, D?E
- BCNF Violations
- BCNF
- 3NF
47Multiple Granularity Locking
- Comp Requestor
- IS IX S SIX X
- IS
- Holder IX
- S
- SIX
- X
T
T
T
T
F
F
F
F
T
T
F
F
T
F
T
F
F
F
F
T
F
F
F
F
F
48- Parent Child can be
- locked in locked in
- IS
- IX
- S
- SIX
- X
P
IS, S IS, S, IX, X, SIX S, IS not necessary X,
IX, SIX none
C
49Rules
- (1) Follow multiple granularity comp function
- (2) Lock root of tree first, any mode
- (3) Node Q can be locked by Ti in S or IS only if
- parent(Q) locked by Ti in IX or IS
- (4) Node Q can be locked by Ti in X,SIX,IX only
- if parent(Q) locked by Ti in IX,SIX
- (5) Ti is two-phase
- (6) Ti can unlock node Q only if none of Qs
- children are locked by Ti
50Exercise
- Can T2 access object f2.2 in X mode? What locks
will T2 get?
T1(IX)
R1
t1
t4
t2
T1(IX)
t3
f2.1
f2.2
f3.1
f3.2
T1(X)
51Exercise
- Can T2 access object f2.2 in X mode? What locks
will T2 get?
T1(IX)
R1
t1
t4
t2
T1(X)
t3
f2.1
f2.2
f3.1
f3.2
52Exercise
- Can T2 access object f3.1 in X mode? What locks
will T2 get?
T1(IS)
R1
t1
t4
t2
T1(S)
t3
f2.1
f2.2
f3.1
f3.2
53Exercise
- Can T2 access object f2.2 in S mode? What locks
will T2 get?
T1(SIX)
R1
t1
t4
t2
T1(IX)
t3
f2.1
f2.2
f3.1
f3.2
T1(X)
54Exercise
- Can T2 access object f2.2 in X mode? What locks
will T2 get?
T1(SIX)
R1
t1
t4
t2
T1(IX)
t3
f2.1
f2.2
f3.1
f3.2
T1(X)