Title: Dr' Alexandra I' Cristea
1CS 319 Theory of Databases C4
- Dr. Alexandra I. Cristea
- http//www.dcs.warwick.ac.uk/acristea/
2(provisionary) Content
- Generalities DB
- Integrity constraints (FD revisited)
- Relational Algebra (revisited)
- Query optimisation
- Tuple calculus
- Domain calculus
- Query equivalence
- LLJ, DP and applications
- Temporal Data
- The Askew Wall
3 previous FD
4Relational model
- E.F. Codd check wikipedia!
- Papers
- http//www.cl.cam.ac.uk/Teaching/2003/Databases/co
dd.pdf - Reprint at A relational Model of Data for Large
Shared Data Banks - Relational Completeness of Data Base Sublanguages
5Relational Model
- Structure of Relational Databases
- Fundamental Relational-Algebra-Operations
- Additional Relational-Algebra-Operations
based on Silberschatz, Korth, Sudarshan Database
System Concepts 5th Edition, Chapter 2
6Example of a Relation Instance
7Basic Structure
- Formally, given sets D1, D2, . Dn a relation r
is a subset of D1 x D2 x x DnThus,
a relation is a set of n-tuples (a1, a2, , an)
where each ai ? Di
8Example
- If
- customer_name Jones, Smith, Curry, Lindsay,
/ Set of all customer names / - customer_street Main, North, Park, / set
of all street names/ - customer_city Harrison, Rye, Pittsfield,
/ set of all city names / - Then r (Jones, Main, Harrison),
(Smith, North, Rye),
(Curry, North, Rye),
(Lindsay, Park, Pittsfield) is a relation
over - customer_name x customer_street x
customer_city
9Attribute Types
- Each attribute of a relation has a name
- The set of allowed values for each attribute is
called the domain of the attribute - Attribute values are (normally) required to be
atomic that is, indivisible - E.g. the value of an attribute can be an account
number, but cannot be a set of account numbers - Domain is said to be atomic if all its members
are atomic - The special value null is a member of every
domain - The null value causes complications in the
definition of many operations - We shall ignore the effect of null values in our
main presentation and consider their effect later
10Relation Schema
- A1, A2, , An are attributes
- R (A1, A2, , An ) is a relation schema
- Example
- Customer_schema (customer_name,
customer_street, customer_city) - r(R) denotes a relation r on the relation schema
R - Example
- customer (Customer_schema)
11Relation Instance
- The current values (relation instance) of a
relation are specified by a table - An element t of r is a tuple, represented by a
row in a table
attributes (or columns)
customer_name
customer_street
customer_city
Jones Smith Curry Lindsay
Main North North Park
Harrison Rye Rye Pittsfield
tuples (or rows)
customer
12Relations are Unordered
- Order of tuples is irrelevant (tuples may be
stored in an arbitrary order) - Example account relation with unordered tuples
13Database
- A database consists of multiple relations
- Information about an enterprise is broken up into
parts, with each relation storing one part of
the information - Storing all information as a single relation
such as bank(account_number, balance,
customer_name, ..)results in - repetition of information
- e.g.,if two customers own an account (What gets
repeated?) - the need for null values
- e.g., to represent a customer without an account
- Normalization theory deals with how to design
relational schemas
14The customer Relation
15The depositor Relation
16Keys
- Let K ? R
- K is a superkey of R if values for K are
sufficient to identify a unique tuple of each
possible relation r(R) - Example customer_name, customer_street and
customer_name are both superkeys
of Customer, if no two customers can possibly
have the same name
17Keys (Cont.)
- K is a candidate key if K is minimal
- Example customer_name superkey no subset
of it is a superkey. - Primary key a candidate key chosen as the
principal means of identifying tuples within a
relation
18Foreign Keys
- A relation schema may have an attribute that
corresponds to the primary key of another
relation. The attribute is called a foreign key.
19Query Languages
- Language in which user requests information from
the database. - Categories of languages
- Procedural
- Non-procedural, or declarative
- Pure languages
- Relational algebra
- Tuple relational calculus
- Domain relational calculus
- Pure languages form underlying basis of query
languages that people use.
20Relational Algebra
- Procedural language
- Six basic operators
- select ?
- project ?
- union ?
- set difference
- Cartesian product x
- rename ?
- The operators take one or two relations as
inputs and produce a new relation as a result.
21Select Operation Example
A
B
C
D
? ? ? ?
? ? ? ?
1 5 12 23
7 7 3 10
A
B
C
D
? ?
? ?
1 23
7 10
22Select Operation
- Notation ? p(r)
- p is called the selection predicate
- Defined as ?p(r) t t ? r and p(t)
- Where p is a formula in propositional calculus
consisting of terms connected by ? (and), ?
(or), ? (not)Each term is one of - ltattributegt op ltattributegt or ltconstantgt
- where op is one of , ?, gt, ?. lt. ?
- Example of selection ? branch_namePerryridge
(account)
23Project Operation Example
A
B
C
? ? ? ?
10 20 30 40
1 1 1 2
A
C
A
C
?A,C (r)
? ? ? ?
1 1 1 2
? ? ?
1 1 2
24Project Operation
- Notation where A1, A2 are attribute names and r
is a relation name. - The result is defined as the relation of k
columns obtained by erasing the columns that are
not listed - Duplicate rows removed from result, since
relations are sets - Example To eliminate the branch_name attribute
of account ?account_number, balance
(account)
25Union Operation Example
A
B
A
B
? ? ?
1 2 1
? ?
2 3
s
r
A
B
? ? ? ?
1 2 1 3
26Union Operation
- Notation r ? s
- Defined as r ? s t t ? r or t ? s
- For r ? s to be valid.
- 1. r, s must have the same arity (same number
of attributes) - 2. The attribute domains must be compatible
(example 2nd column of r deals with the
same type of values as does the 2nd column
of s) - Example to find all customers with either an
account or a loan - ?customer_name (depositor) ? ?customer_name
(borrower)
27Set Difference Operation Example
A
B
A
B
? ? ?
1 2 1
? ?
2 3
s
r
A
B
? ?
1 1
28Set Difference Operation
- Notation r s
- Defined as
- r s t t ? r and t ? s
- Set differences must be taken between compatible
relations. - r and s must have the same arity
- attribute domains of r and s must be compatible
29Cartesian-Product Operation Example
A
B
C
D
E
? ?
1 2
? ? ? ?
10 10 20 10
a a b b
r
s
A
B
C
D
E
? ? ? ? ? ? ? ?
1 1 1 1 2 2 2 2
? ? ? ? ? ? ? ?
10 10 20 10 10 10 20 10
a a b b a a b b
30Cartesian-Product Operation
- Notation r x s
- Defined as
- r x s t q t ? r and q ? s
- Assume that attributes of r(R) and s(S) are
disjoint. (That is, R ? S ?). - If attributes of r(R) and s(S) are not disjoint,
then renaming must be used.
31Composition of Operations
- Can build expressions using multiple operations
- Example ?AC(r x s)
- r x s
- ?AC(r x s)
A
B
C
D
E
? ? ? ? ? ? ? ?
1 1 1 1 2 2 2 2
? ? ? ? ? ? ? ?
10 10 20 10 10 10 20 10
a a b b a a b b
A
B
C
D
E
? ? ?
? ? ?
10 10 20
a a b
1 2 2
32Rename Operation
- Allows us to refer to results of RA expressions
to refer to a relation by more than one name. - Example ? x (E)
- returns the expression E under the name X
- expression E under name X, with attributes
renamed to A1 , A2 , ., An .
33 34Relational Algebra Operators thus
- Procedural language with six basic operators
- select ?
- project ?
- union ?
- set difference
- Cartesian product x
- rename ?
- The operators take one or two relations as
inputs and produce a new relation as a result.
35Banking Example
- branch (branch_name, branch_city, assets)
- customer (customer_name, customer_street,
customer_city) - account (account_number, branch_name, balance)
- loan (loan_number, branch_name, amount)
- depositor (customer_name, account_number)
- borrower (customer_name, loan_number)
36(No Transcript)
37Example Queries
- Find all loans of over 1200
-
Find the loan number for each loan of an amount
greater than 1200
- ?loan_number (?amount gt 1200 (loan))
Find the names of all customers who have a loan,
an account, or both, from the bank
- ?customer_name (borrower) ? ?customer_name
(depositor)
38Example Queries
- Find the names of all customers who have a loan
at the Perryridge branch.
?customer_name (?branch_namePerryridge
(?borrower.loan_number loan.loan_number(borrower
x loan)))
Find the names of all customers who have a loan
at the Perryridge branch but do not have an
account at any branch of the bank.
?customer_name (?branch_name Perryridge
(?borrower.loan_number loan.loan_number(borrowe
r x loan))) ?customer_name(de
positor)
39Example Queries
- Find the names of all customers who have a loan
at the Perryridge branch.
- Query 1
- ?customer_name (?branch_name Perryridge
- (?borrower.loan_number loan.loan_number
(borrower x loan)))
- Query 2
- ?customer_name(?loan.loan_number
borrower.loan_number ( (?branch_name
Perryridge (loan)) x borrower))
40Example Queries
- Find the largest account balance
- Strategy
- Find those balances that are not the largest
- Rename account relation as d so that we can
compare each account balance with all others - Use set difference to find those account balances
that were not found in the earlier step. - The query is
-
?balance(account) - ?account.balance
(?account.balance lt d.balance (account x rd
(account)))
41Formal Definition
- A basic expression in the relational algebra
consists of either one of the following - A relation in the database
- A constant relation
- Let E1 and E2 be relational-algebra expressions
the following are all relational-algebra
expressions - E1 ? E2
- E1 E2
- E1 x E2
- ?p (E1), P is a predicate on attributes in E1
- ?s(E1), S is a list consisting of some of the
attributes in E1 - ? x (E1), x is the new name for the result of E1
42Additional Operations
- We define additional operations that do not add
any power to the - relational algebra, but that simplify common
queries. - Set intersection
- Natural join
- Division
- Assignment
43Set-Intersection Operation
- Notation r ? s
- Defined as
- r ? s t t ? r and t ? s
- Assume
- r, s have the same arity
- attributes of r and s are compatible
- Note r ? s r (r s)
44Set-Intersection Operation Example
A B
A B
? ? ?
1 2 1
? ?
2 3
s
r
A B
? 2
45Natural-Join Operation
- Let r and s be relations on schemas R and S
respectively. Then, r s is a relation on
schema R ? S obtained as follows - Consider each pair of tuples tr from r and ts
from s. - If tr and ts have the same value on each of the
attributes in R ? S, add a tuple t to the
result, where - t has the same value as tr on r
- t has the same value as ts on s
46Natural joint example
- Example
- R (A, B, C, D)
- S (E, B, D)
- Result schema (A, B, C, D, E)
- r s is defined as
- ?r.A, r.B, r.C, r.D, s.E (?r.B s.B ? r.D s.D
(r x s))
47Natural Join Operation Example
B
D
E
A
B
C
D
1 3 1 2 3
a a a b b
? ? ? ? ?
? ? ? ? ?
1 2 4 1 2
? ? ? ? ?
a a b a b
r
s
A
B
C
D
E
? ? ? ? ?
1 1 1 1 2
? ? ? ? ?
a a a a b
? ? ? ? ?
48Division Operation
r ? s
- Notation
- Suited to queries that include the phrase for
all. - Let r and s be relations on schemas R and S
respectively where - R (A1, , Am , B1, , Bn )
- S (B1, , Bn)
- The result of r ? s is a relation on schema
- R S (A1, , Am)
- r ? s t t ? ? R-S (r) ? ? u ? s ( tu ?
r ) - Where tu means the concatenation of tuples t and
u to produce a single tuple
49Division Operation Example
A
B
B
? ? ? ? ? ? ? ? ? ? ?
1 2 3 1 1 1 3 4 6 1 2
1 2
s
A
r
? ?
50Another Division Example
A
B
C
D
E
D
E
? ? ? ? ? ? ? ?
a a a a a a a a
? ? ? ? ? ? ? ?
a a b a b a b b
1 1 1 1 3 1 1 1
a b
1 1
s
r
A
B
C
? ?
a a
? ?
51Division Operation (Cont.)
- Property
- Let q r ? s
- Then q is the largest relation satisfying q x s
? r - Definition in terms of the basic algebra
operationLet r(R) and s(S) be relations, and let
S ? R - r ? s ?R-S (r ) ?R-S ( ( ?R-S (r ) x s )
?R-S,S(r )) - To see why
- ?R-S,S (r) simply reorders attributes of r
- ?R-S (?R-S (r ) x s ) ?R-S,S(r) ) gives those
tuples t in ?R-S (r ) such that for some tuple
u ? s, tu ? r.
52Assignment Operation
- The assignment operation (?) provides a
convenient way to express complex queries. - Write query as a sequential program consisting
of - a series of assignments
- followed by an expression whose value is
displayed as a result of the query. - Assignment must always be made to a temporary
relation variable. - Example Write r ? s as
- temp1 ? ?R-S (r ) temp2 ? ?R-S ((temp1 x s
) ?R-S,S (r )) result temp1 temp2 - The result to the right of the ? is assigned to
the relation variable on the left of the ?. - May use variable in subsequent expressions.
53Bank Example Queries
- Find the names of all customers who have a loan
and an account at the bank.
?customer_name (borrower) ? ?customer_name
(depositor)
- Find the name of all customers who have a loan at
the bank and the loan amount
?customer_name, loan_number, amount (borrower
loan)
54Bank Example Queries
- Find all customers who have an account from at
least the Downtown and the Uptown branches.
- Query 1
- ?customer_name (?branch_name Downtown
(depositor account )) ? - ?customer_name (?branch_name Uptown
(depositor account))
- Query 2
- ?customer_name, branch_name (depositor
account) ? ?temp(branch_name)
((Downtown ), (Uptown )) - Note that Query 2 uses a constant relation.
55Bank Example Queries
- Find all customers who have an account at all
branches located in Brooklyn city.
?customer_name, branch_name (depositor
account) ? ?branch_name (?branch_city
Brooklyn (branch))
56Library Case
reservation
book
author
name
department
Initials
title
name
publisher
date
year
cancelled
copy
borrow
name
department
department
from
cpYear
to
present
57Library database
- book ( ISBN, title, publisher, year )
- author (ISBN, initials, name )
- copy (barcode, ISBN, department, cpYear, present
) - reservation (name, department, ISBN, date,
cancelled ) - borrow (name, department, barcode, from, to )
58Library Questions (RA)
- List all the authors of books of which the
library has a copy that has never been borrowed. - List all the authors of books that have never
been borrowed. - List all the authors of which no book has ever
been borrowed.
59(simple) Employee database
- employee(person_name, street, city)
- works(person_name, company_name, salary)
- company(company_name, city)
- manages(person_name, manager_name)
60Exercise 2.1.c
- Find the names of all employees who earn more
than every employee of SBC. - The more than every... clause cannot be
expressed directly in the relational algebra. - For all other employees there is an employee of
SBC earning more. This can be described using a
selection on a cartesian product (of works with
works). - We then use the difference to find the correct
result.
61Solution 2.1.c
- ?person-name (W)
- - ?person-name( (?w.salaryltw2.salary
w2.company-name SBC (W x ?W2(W) ))
62Exercise 2.5.e
- Find all companies located in every city in which
SBC is located.
63Case 1 1 city per company
- ?C.company-name (
- (?C.citySBC.city
- (?C (company ) x (?SBC.company-nameSBC (?SBC
(company ) ) ) - Trivial!
64Case 2 multiple cities per company
- Find all companies located in every city in which
SBC is located. - The every city... clause cannot be expressed
directly in the relational algebra (except for
division). - Lets see
65Case 2a multiple cities division
- All cities where SBC is located
- SBCCity ?city (?company-nameSBC (company ) )
- Is SBCCity constant?
- Yes!
- So we can use it on the right hand side of the
division. - companies located in every city in which SBC is
located - company ? SBCCity
- Still quite neat!!
66Case 2b multiple cities no division
- Find all companies located in every city in which
SBC is located. - The every city... clause cannot be expressed
directly in the relational algebra (except for
division). - For the other companies there is a city in which
SBC is located and the other company is not. - The cities where a company is not located are all
the cities except the ones where the company is
located. - We thus need 2 times a difference in this query.
- Can also be formulated using a difference operator
67Case 2b multiple cities no division
- For the other companies there is a city in which
SBC is located and the other company is not
IntRes. - E ?company-name (company ) x SBCCity
- IntRes E company
- We want not (IntRes)
- ?company-name (company ) - ?company-name (IntRes)
68- What is the meaning of
- SBCCity ?
69Recognizing Types of Queries
- Identify the type of the following queries, and
afterwards also translate them to the algebra
(PSJ, union, intersection, difference, division) - Give the name of customers that have a loan with
a branch where they also have an account. - Give the name of customers who have a loan at a
branch where they do not have an account. - Give the name of customers who have a loan at
every branch where they have an account. - Give the name of customers who have loans only at
branches where they have an account.
70Reading Queries
- 5. ?customer-name(?balancegtamount(
- account ?? depositor ?? borrower ?? loan))
- 6. ?branch-name(branch) ?
- ?branch-name(?branch-city ? customer-city(
- branch ?? account ?? depositor ?? customer))
- 7. ?X.customer-name
(?X.account-number Y.account-number ?
X.customer-name ? Y.customer-name
(?X(depositor) ? ?Y(depositor)))
71Beer Database
- visits(drinker, bar)
- serves(bar, beer)
- likes(drinker, beer).
72Beer questions with a difference
- Give all drinkers that visit bars that dont
serve any beer they like - Give all drinkers that only visit bars that serve
a beer they like - Give all drinkers that only visit bars that serve
no beer they like - Give all drinkers that only visit bars that serve
all beers they like (and maybe other beers as
well) - Give all drinkers that only visit bars that only
serve beers they like (and thus serve nothing
else)
73Summary
- We have learned RA
- We have learned to perform simple and more
complex queries in RA
74 to follow Query optimisation