Title: Relational Algebra
1Relational Algebra Calculus
- Zachary G. Ives
- University of Pennsylvania
- CIS 550 Database Information Systems
- September 21, 2004
Some slide content courtesy of Susan Davidson
Raghu Ramakrishnan
2Administrivia
3A Set of Logical Operations The Relational
Algebra
- Six basic operations
- Projection ?? (R)
- Selection ?? (R)
- Union R1 R2
- Difference R1 R2
- Product R1 R2
- (Rename) ??-gtb (R)
- And some other useful ones
- Join R1 ?? R2
- Semijoin R1 ?? R2
- Intersection R1 Å R2
- Division R1 R2
4Data Instance for Operator Examples
STUDENT
COURSE
Takes
PROFESSOR
Teaches
5Mini-Quiz
- Try writing queries for these
- The names of students named Bob
- The names of students expecting an A
- The names of students in Amir Roths 501 class
- The sids and names of students not enrolled in
any courses
6The Big Picture SQL to Algebra toQuery Plan to
Web Page
Web Server / UI / etc
Query Plan anoperator tree
Execution Engine
Optimizer
Storage Subsystem
SELECT FROM STUDENT, Takes, COURSE
WHERE STUDENT.sid Takes.sID AND
Takes.cID cid
7Optimization Is Based on Algebraic Equivalences
- Relational algebra has laws of commutativity,
associativity, etc. that imply certain
expressions are equivalent in semantics - They may be different in cost of evaluation!
?c Ç d(R) ?c(R) ?d(R)
?c (R1 R2) R1 ?c R2
?c Ç d (R) ?c (?d (R))
- Query optimization finds the most efficient
representation to evaluate (or one thats not bad)
8Switching Gears An Equivalent, ButVery
Different, Formalism
- Codd invented a relational calculus that he
proved was equivalent in expressiveness to the
rel. algebra - Based on a subset of first-order logic
declarative, without an implicit order of
evaluation - Tuple relational calculus
- Domain relational calculus
- More convenient for certain kinds of
manipulations - The database uses the relational algebra
internally - But query languages (e.g., SQL) are mostly based
on the relational calculus
9Domain Relational Calculus
- Queries have form
- ltx1,x2, , xngt p
- Predicate Boolean expression over x1,x2, , xn
- Precise operations depend on the domain and query
language may include special functions, etc. - Assume the following at minimum
- ltxi,xj,gt ? R X op Y X op const const op X
- where op is ?, ?, ?, ?, ?, ?
- xi,xj, are domain variables
domain variables
predicate
10More Complex Predicates
- Starting with these atomic predicates, build up
new predicates by the following rules - Logical connectives If p and q are predicates,
then so are p ? q, p ? q, ?p, and p ? q - (xgt2) ? (xlt4)
- (xgt2) ? ?(xgt0)
- Existential quantification If p is a predicate,
then so is ?x.p - ?x. (xgt2) ?(xlt4)
- Universal quantification If p is a predicate,
then so is ?x.p - ?x.xgt2
- ?x. ?y.ygtx
11Some Examples
- Faculty ids
- Subjects for courses with students expecting a
C - All course numbers for which there exists a
smaller course number
12Logical Equivalences
- There are two logical equivalences that will be
heavily used - p ? q ? ?(p ? q) (Whenever p is true, q must
also be true.) - ?x. p(x) ? ??x. ?p(x) (p is true for all x)
- The second can be a lot easier to check!
- Example
- The highest course number offered
13Terminology Free and Bound Variables
- A variable v is bound in a predicate p when p is
of the form ?v or ?v - A variable occurs free in p if it occurs in a
position where it is not bound by an enclosing ?
or ? - Examples
- x is free in x gt 2
- x is bound in ?x. x gt y
14Can Rename Bound Variables Only
- When a variable is bound one can replace it with
some other variable without altering the meaning
of the expression, providing there are no name
clashes - Example ?x. x gt 2 is equivalent to ?y. y gt 2
- Otherwise, the variable is defined outside our
scope
15Safety
- Pitfall in what we have done so far how do we
interpret - ltsid,namegt ?ltsid,namegt ? STUDENT
- Set of all binary tuples that are not students
an infinite set (and unsafe query) - A query is safe if no matter how we instantiate
the relations, it always produces a finite answer - Domain independent answer is the same regardless
of the domain in which it is evaluated - Unfortunately, both this definition of safety
and domain independence are semantic conditions,
and are undecidable
16Safety and Termination Guarantees
- There are syntactic conditions that are used to
guarantee safe formulas - The definition is complicated, and we wont
discuss it you can find it in Ullmans
Principles of Database and Knowledge-Base Systems - The formulas that are expressible in real query
languages based on relational calculus are all
safe - Many DB languages include additional features,
like recursion, that must be restricted in
certain ways to guarantee termination and
consistent answers
17Mini-Quiz
- How do you write
- Which students have taken more than one course
from the same professor?
18Translating from RA to DRC
- Core of relational algebra ?, ?, ?, x, -
- We need to work our way through the structure of
an RA expression, translating each possible form. - Let TRe be the translation of RA expression e
into DRC. - Relation names For the RA expression R, the DRC
expression is ltx1,x2, , xngt ltx1,x2, , xngt
? R
19Selection TR?? R
- Suppose we have ??(e), where e is another RA
expression that translates as - TRe ltx1,x2, , xngt p
- Then the translation of ?c(e) is
- ltx1,x2, , xngt p??where ? is obtained
from ? by replacing each attribute with the
corresponding variable - Example TR?12 ?4gt2.5R (if R has arity 4)
is - ltx1,x2, x3, x4gt lt x1,x2, x3, x4gt ? R ?
x1x2 ? x4gt2.5
20Projection TR?i1,,im(e)
- If TRe ltx1,x2, , xngt p then
TR?i1,i2,,im(e) ltx i1,x i2, , x im gt ?
xj1,xj2, , xjk.p, where xj1,xj2, , xjk are
variables in x1,x2, , xn that are not in x i1,x
i2, , x im - Example With R as before,?1,3 (R)ltx1,x3gt?
x2,x4. ltx1,x2, x3,x4gt ?R
21Union TRR1 ? R2
- R1 and R2 must have the same arity
- For e1 ? e2, where e1, e2 are algebra expressions
- TRe1ltx1,,xngtp and TRe2lty1,yngtq
- Relabel the variables in the second
- TRe2lt x1,,xngtq
- This may involve relabeling bound variables in q
to avoid clashes - TRe1?e2ltx1,,xngtp?q.
- Example TRR1 ? R2 lt x1,x2, x3,x4gt
ltx1,x2, x3,x4gt?R1 ? ltx1,x2, x3,x4gt?R2
22Other Binary Operators
- Difference The same conditions hold as for union
- If TRe1ltx1,,xngtp and TRe2lt
x1,,xngtq - Then TRe1- e2 ltx1,,xngtp??q
- Product
- If TRe1ltx1,,xngtp and TRe2lt
y1,,ymgtq - Then TRe1? e2 ltx1,,xn, y1,,ym gt p?q
- Example TRR?S ltx1,,xn, y1,,ym gt
ltx1,,xngt ?R ? lty1,,ym gt ?S
23What about the Tuple Relational Calculus?
- Weve been looking at the Domain Relational
Calculus - The Tuple Relational Calculus is nearly the same,
but variables are at the level of a tuple, not an
attribute - Q 9 S ? COURSES, 9 T 2 Takes (S.cid T.cid Æ
Q.cid S.cid Æ Q.exp-grade T.exp-grade)
24Limitations of the Relational Algebra / Calculus
- Cant do
- Aggregate operations
- Recursive queries
- Complex (non-tabular) structures
- Most of these are expressible in SQL, OQL, XQuery
using other special operators - Sometimes we even need the power of a
Turing-complete programming language
25Summary
- Can translate relational algebra into relational
calculus - DRC and TRC are slightly different syntaxes but
equivalent - Given syntactic restrictions that guarantee
safety of DRC query, can translate back to
relational algebra - These are the principles behind initial
development of relational databases - SQL is close to calculus query plan is close to
algebra - Great example of theory leading to practice!