Title: Relational Algebra and Relational Calculus
1Chapter 6
- Relational Algebra and Relational
Calculus
2Introduction
- Relational algebra and relational calculus are
formal query languages of the relational model. - Relational algebra is a procedural language.
- Relational calculus is a non-procedural language.
- Both are equivalent to one another.
- Both have formal strong foundation on logic.
- Query languages ! programming languages
- The result of an operation is a new relation.
- Using one or more input relations.
- Operation does not change original relations.
3Relational Algebra
- Other characteristics of Relational algebra
- Set language All tuples are manipulated without
looping. - Output from one operation can become input to
another operation. - It allows operations to be nested, just as in
arithmetic. This property is called closure. - The sequence of relational algebra operations
forms a relational algebra expression - The result is also a relation.
4Relational Algebra Operations
- Unary relational operations Selection,
Projection, and Rename. - Set theory operations Union, Intersection,
Difference, and Cartesian Product. - Binary relational operations Join, and Division.
- Additional relational operations Outer Joins,
Outer Union, and Semijoin.
5Relational Algebra Operations
6Relational Algebra Operations
7COMPANY database schema
8COMPANY database state
9Selection (or Restriction)
- ?predicate (R)
- It selects a subset of the tuples (rows) of R
that satisfy the specified condition (predicate). - No duplicates in the result (Why?)
- The selection condition acts as a filter.
- Tuples that make the condition false are filtered
out. - Schema of result is identical to the schema of
the input relation. - Degree(Result) Degree(R)
- Cardinality(Result) Cardinality(R)
10Example - Selection (or Restriction)
- List the employees whose department is 4
- ?dno 4 (Employee)
- List all employees whose salary is greater than
30,000 - ?salary gt 10000 (Employee)
11Selection (contd.)
- SELECTION Operation Properties
- The Selection operation ? ltselection
conditiongt(R) produces a relation S that has the
same schema (same attributes) as R - Selection ? is commutative
- ? ltcondition1gt(? lt condition2gt (R)) ?
ltcondition2gt (? lt condition1gt (R)) - Because of commutativity property, a cascade
(sequence) of Selection operations may be applied
in any order - ?ltcond1gt(?ltcond2gt (?ltcond3gt (R)) ?ltcond2gt
(?ltcond3gt (?ltcond1gt ( R))) - A cascade of Selection operations may be replaced
by a single selection with a conjunction of all
the conditions - ?ltcond1gt(?lt cond2gt (?ltcond3gt(R)) ? ltcond1gt AND
lt cond2gt AND lt cond3gt(R))) - The number of tuples in the result of a Selection
is less than (or equal to) the number of tuples
in the input relation R.
12Projection
- ?col1, . . . , coln(R)
- It defines a relation that contains a vertical
subset of R, extracting the values of specified
attributes. - It creates a vertical partitioning of R.
- It eliminates duplicates. Why?
- Some DBMSs do not eliminate duplicates, unless
the user asks for it. - Example List each employees first and last name
and salary - ?lname, fname, salary(Employee)
13Projection (contd.)
- Projection Operation Properties
- The number of tuples in the result of projection
?ltlistgt(R) is always less or equal to the number
of tuples in R. - If the list of attributes includes a key of R,
then the number of tuples in the result of
Projection is equal to the number of tuples in R. - Projection is not commutative.
- ? ltlist1gt (? ltlist2gt (R) ) ? ltlist1gt (R) as
long as ltlist2gt contains the attributes in
ltlist1gt. - Degree(Result) Degree(R)
- Cardinality(Result) Cardinality(R)
14Applying Selection and Projection
15Single relational expression
- To retrieve the first name, last name, and salary
of all employees who work in department number 5 - We can write a single relational expression
- ?lname, fname, salary(?dno 5(Employee))
- Or a sequence of relational operations, giving a
name to each intermediate relations - Dept5Emps ?dno 5(Employee)
- Result ?lname, fname, salary(Dept5Emps)
16The Rename operation
- Denoted by ? (rho), it is used to
- assign names to intermediate relations, or
- rename a relation and its attributes.
- General form
- ?S (B1, B2, , Bn )(R) changes both
- the relation name to S, and
- the column (attribute) names to B1, B1, ..Bn
- ?S(R) changes
- the relation name only to S
- ?(B1, B2, , Bn )(R) changes
- the column (attribute) names only to B1, B1, ..Bn
17The Rename operation (contd.)
- For convenience, we also use a shorthand for
renaming attributes in an intermediate relation - If we write
- RESULT ? ? FNAME, LNAME, SALARY (DEP5_EMPS)
- RESULT will have the same attribute names as
DEP5_EMPS (same attributes as EMPLOYEE) - If we write
- RESULT (F, M, L, S, B, A, SX, SAL, SU, DNO) ?
- ? RESULT (F,M,L.S.B,A,SX,SAL,SU, DNO)(DEP5_EMPS)
- The 10 attributes of DEP5_EMPS are renamed to F,
M, L, S, B, A, SX, SAL, SU, DNO, respectively.
18Binary Union operation
- R ? S
- Union of two relations R and S defines a relation
that contains all the tuples of R, or S, or both
R and S, duplicate tuples being eliminated. - R and S must be union compatible.
- Same number of attributes
- Corresponding attributes have the same or
compatible domain (type compatible). - Schema of the result is 1st relation schema.
19Example - Union
- To retrieve the social security numbers of all
employees who either work in department 5
(RESULT1 below) or directly supervise an employee
who works in department 5 (RESULT2 below) - We can use the UNION operation as follows
- DEP5_EMPS ? ?dno5 (EMPLOYEE)
- RESULT1 ? ? SSN(DEP5_EMPS)
- RESULT2(SSN) ? ?SUPERSSN(DEP5_EMPS)
- RESULT ? RESULT1 ? RESULT2
- The union operation produces the tuples that are
in either RESULT1 or RESULT2 or both
20Example of Union (contd.)
21Set Difference (Minus or Except)
- R S
- Defines a relation consisting of the tuples that
are in relation R, but not in S. - R and S must be union compatible.
- Schema of the result is 1st relation schema.
- Same definition applies to all the set operations.
22Intersection
- R ? S
- Defines a relation consisting of the set of all
tuples that are in both R and S. - R and S must be union compatible.
- Expressed using basic operations
- R ? S R (R S)
23Examples Set Theory operations
24Properties of Union, intersect, and difference
- Notice that both union and intersection are
commutative operations that is - R ? S S ? R, and R ? S S ? R
- Both union and intersection can be treated as
n-ary operations applicable to any number of
relations as both are associative operations - R ? (S ? T) (R ? S) ? T
- (R ? S) ? T R ? (S ? T)
- The minus operation is not commutative that is
in general - R S ? S R
25Cartesian (or cross) product
- R X S
- Defines a relation that is the concatenation of
every tuple of relation R with every tuple of
relation S. - Each row of R is paired with each row of S.
- Relations do not have to be type compatible.
- Result schema has one field for each field of R
and S.
26Cartesian Product (contd.)
- Generally, CROSS PRODUCT is not a meaningful
operation - Can become meaningful when followed by other
operations - Example (not meaningful)
- FEMALE_EMPS ? ? sexF(EMPLOYEE)
- EMPNAMES ? ? fname, lname, SSN (FEMALE_EMPS)
- EMP_DEPENDENTS ? EMPNAMES x DEPENDENT
- EMP_DEPENDENTS will contain every combination of
EMPNAMES and DEPENDENT - whether or not they are actually related
27Example Cartesian Product
28Join Operation
- Join is a derivative of Cartesian product.
- Equivalent to performing a Selection, using join
predicate as selection formula, over Cartesian
product of the two operand relations. - Very important and of the most difficult
operation - Hard to implement efficiently in an RDBMS.
- It allows us combine related tuples from various
relations. - General form (called theta join) R ltjoin
conditiongt S - Where R and S can be any relations.
- Join condition can be any general Boolean
expression.
29Join Operation (contd.)
- Example Suppose that we want to retrieve the
name of the manager of each department. - To get the managers name, we need to combine
each DEPARTMENT tuple with the EMPLOYEE tuple
whose SSN value matches the MGRSSN value in the
department tuple. - We do this by using the join operation.
- DEPT_MGR ? DEPARTMENT MGRSSNSSN EMPLOYEE
- MGRSSNSSN is the join condition.
- Combines each department record with the employee
who manages the department - The join condition can also be specified as
DEPARTMENT.MGRSSN EMPLOYEE.SSN
30Example of the Join Operation
DEPT_MGR ? DEPARTMENT MGRSSNSSN
EMPLOYEE
31Some properties of Join
- Consider the following JOIN operation
- R(A1, A2, . . ., An) S(B1, B2,
. . ., Bm) - R.AiS.Bj
- Result is a relation Q with degree n m
attributes - Q(A1, A2, . . ., An, B1, B2, . . ., Bm), in that
order. - The resulting relation state has one tuple for
each combination of tuplesr from R and s from S,
but only if they satisfy the join condition
rAisBj - Hence, if R has nR tuples, and S has nS tuples,
then the join result will generally have less
than nR nS tuples. - Only related tuples (based on the join condition)
will appear in the result
32Other Join Operations
- In addition to the Theta Join
- Equijoin (a particular type of Theta Join)
- Natural join
- Outer join
- Semijoin
33Equijoin
- A theta join where join condition contains only
equality (). - The result of an equijoin always have one or more
pair of attributes (with identical names or not)
that have identical values in every tuple. - The join for the creation of DEPT_MGR was an
equijoin. - Same degree and cardinality of a Theta join
- Degree(Result) Degree(R) Degree(S)
- Card(Result) Card(R) x Card(S)
34Natural Join
- R S
- An Equijoin of the two relations R and S over all
common attributes x. One occurrence of each
common attribute is eliminated from the result. - If attributes have different names, a renaming
operations is applied first.
35Example - Natural Join
- To apply a natural join on the DNUMBER attributes
of DEPARTMENT and DEPT_LOCATIONS, it is
sufficient to write - DEPT_LOCS ? DEPARTMENT DEPT_LOCATIONS
- Only attribute with the same name is DNUMBER
- An implicit join condition is created based on
this attribute - DEPARTMENT.DNUMBERDEPT_LOCATIONS.DNUMBER
- Another example Q ? R(A,B,C,D) S(C,D,E)
- The implicit join condition includes each pair of
attributes with the same name, ANDed together - R.CS.C AND R.D.S.D
- Result keeps only one attribute of each such
pair - Q(A,B,C,D,E)
36Another example of Natural Join
37Complete set of Relational Operations
- The set of operations including Selection ?,
Projection ? , Union ?, Difference -, and
Cartesian Product X is called a complete set
because any other relational algebra expression
can be expressed by a combination of these five
basic operations. - For example
- R ? S (R ? S ) ((R - S) ? (S - R))
- R ltjoin conditiongtS ? ltjoin conditiongt (R
X S)
38Division
- R(x,y) ? S(y)
- Not supported as a primitive operator
- Let R have 2 fields, x and y S has only field y
- R(x,y) ? S(y) is the set of all x values in R
such that the y - values associated with an x value in R
contains all y - values in S.
- In general, x and y can be any lists of fields.
- Expressed using basic operations
- T1 ? ?x(R)
- T2 ? ?x((S X T1) R)
- T ? T1 T2
39Example - Division
40Query Tree Notation
- Query Tree
- An internal data structure to represent a query.
- Standard technique for estimating the work
involved in executing the query, the generation
of intermediate results, and the optimization of
execution. - Nodes stand for operations like selection,
projection, join, renaming, division, . - Leaf nodes represent base relations.
- A tree gives a good visual feel of the complexity
of the query and the operations involved. - Algebraic Query Optimization consists of
rewriting the query or modifying the query tree
into an equivalent tree. - (see Chapter 15)
41Example of Query Tree
42Outer joins
- Use Outer joins if you want to display rows in
the result that do not have matching values in
the join column. - Outer joins Left ( ), Right ( ), and Full
( ) - Left Outer Join R S
- (Left) outer join is a join in which tuples from
R that do not have matching values in common
columns of S are also included in result
relation. - Attribute of S in the unmatched tuples are filled
with null values.
43Outer joins (contd.)
- Right outer join is similar to left outer join.
- It keeps unmatching tuples of the right relation
in the result of R S. - R attributes in the unmatched tuples are filled
with null values. - The full outer joins keeps all unmatching tuples
of the left and right relations of the operation. - It is a combination of a left- and a right outer
join.
44Example - Left Outer join
- Temp ?(Employee) SSNMGR_SSN ( Department)
Result ? ?fname,minit,lname,dname(Department)
45Outer Union Operations
- OUTER UNION Operations
- It was developed to take the union of tuples from
two relations if the relations are not type
compatible. - This operation will take the union of tuples in
two relations R(X, Y) and S(X, Z) that are
partially compatible, meaning that only some of
their attributes, say X, are type compatible. - The attributes that are type compatible are
represented only once in the result, and those
attributes that are not type compatible from
either relation are also kept in the result
relation T(X, Y, Z).
46Outer Union (contd.)
- Example Outer union of relations Student(name,
SSN, department, advisor) Instructor(name, SSN,
department, rank) - Tuples are matched by the common attributes
name, SSN, and department. - If a student is also an instructor, both advisor
and rank will have values otherwise, one of
these attributes will be null. - Resulting schema
- Stu_Or_Inst(name, SSN, department, advisor, rank)
47Semijoin
- R FS
- Defines a relation that contains the tuples of R
that participate in the join of R with S. - Result schema is the schema of the first
relation. - Can rewrite Semijoin using Projection and Join
- R F S ?R.A(R F S)
- Example List employees data for employees who
work at the Research department - Employee dnodnumber (sdnameResearch(
Department))
48Examples of queries in RA Procedural form
- Q1 Retrieve the name and address of all
employees who work for the Research department. - RESEARCH_DEPT ? ? DNAMEResearch (DEPARTMENT)
- RESEARCH_EMPS ? (RESEARCH_DEPT DNUMBER
DNOEMPLOYEEEMPLOYEE) - RESULT ? ? FNAME, LNAME, ADDRESS (RESEARCH_EMPS)
- Q6 Retrieve the names of employees who have no
dependents. - ALL_EMPS ? ? SSN(EMPLOYEE)
- EMPS_WITH_DEPS(SSN) ? ? ESSN(DEPENDENT)
- EMPS_WITHOUT_DEPS ? (ALL_EMPS - EMPS_WITH_DEPS)
- RESULT ? ? LNAME, FNAME (EMPS_WITHOUT_DEPS
EMPLOYEE)
49Examples of queries in RA Single expression
- As a single expression, these queries become
- Q1 Retrieve the name and address of all
employees who work for the Research department. - ? Fname, Lname, Address (s Dname Research
- (DEPARTMENT DnumberDno(EMPLOYEE))
- Q6 Retrieve the names of employees who have no
dependents. - ? Lname, Fname((? Ssn (EMPLOYEE) - ? Ssn (?
Essn (DEPENDENT))) EMPLOYEE)
50Relational Calculus
- Two versions tuple relational calculus (TRC) and
domain relational calculus (DRC). - Calculus uses variables, constants, operators
(comparison and logical), and quantifiers. - TLC variables range over tuples (rows).
- DRC variables range over domain elements
(columns). - Expressions in relational calculus are called
formulas (or predicates). - Usually an answer to a formula is a set of tuples
that make the formula evaluate to true.
51Tuple Relational Calculus
- Query has the form t p(t)
- t is a tuple variable and p(t) is a formula.
- It finds the set of all tuples t such that p(t)
is true. - Tuple variable is a variable that ranges over a
named relation ie., variable whose only
permitted values are tuples of the relation. - Example Specify range of a tuple variable t of
the Employee relation as Employee(t) - t Employee(t) ? Get all Employee tuples
52Example - Tuple Relational Calculus
- Example To find the first and last names of all
employees whose salary is above 50,000, we can
write the following tuple calculus expression - t.fname, t.lname Employee(t) AND
t.salarygt50000 - The first and last name (Projection ?fname,
lname) of each EMPLOYEE tuple t that satisfies
the condition t.salarygt50000 (Selection ? SALARY
gt50000) will be retrieved. - Queries are evaluated on instances of Employee.
53Quantifiers in Relational calculus
- Can use two quantifiers to tell how many
instances the formula applies to - Existential quantifier (there exists)
- Universal quantifier " (for all)
- Tuple variables qualified by " or are called
bound variables, otherwise called free variables. - Only free variables should appear to the left of
the bar . - Otherwise, the answer is either True or False.
- If F(t) is a formula, then so are (t)(F(t)) and
("t)(F(t)), where t is a tuple variable.
54Quantifiers (contd.)
- The formula (? t)(F) is true if the formula F
evaluates to true for some (at least one) tuple
assigned to free occurrences of t in F otherwise
(? t)(F) is false. - Example Retrieve the name and address of all
employees who work for the Research department - t.fname,t.lname,t.address Employee(t) Ù
(d) (Department(d) Ù (d.name Research) Ù
(d.dnumber t.dno))
55Quantifiers (contd.)
- The formula (? t)(F) is true if the formula F
evaluates to true for every tuple (in the
universe) assigned to free occurrences of t in F
otherwise (? t)(F) is false. - (? t) is used in in statements about every
instance - Example For all Project tuples, the location is
not Houston. - (? t)(p.plocation ? Houston)
- Can also use (p) (p.plocation Houston)
which means There are no locations in Houston. - ("t) (P(t)) ? (t) (P(t))
56Quantifiers (contd.)
- Find the names of employees who work on all the
projects controlled by department number 5. The
query can be - e.lname, e.fname Employee(e) Ù ( (?
x)((Project(x)) V (x.dnum5) V - ( (? w)(Works_On(w) Ù w.ESSNe.SSN Ù
x.pnumberw.pno)))) - Exclude from the universal quantification all
tuples that we are not interested in by making
the condition true for all such tuples. - The first tuples to exclude (by making them
evaluate automatically to true) are those that
are not in the relation R of interest. - In query above, using the expression
(Project(x)) inside the universally quantified
formula evaluates to true all tuples x that are
not in the Project relation. - Then we exclude the tuples we are not interested
in from R itself. The expression not(x.dnum5)
evaluates to true all tuples x that are in the
project relation but are not controlled by
department 5. - Finally, we specify a condition that must hold on
all the remaining tuples in R. - ( (? w)(Works_On(w) Ù w.ESSNe.SSN Ù
x.Pnumberw.pno)