Title: CIS 328 Database Systems I Chapter 3 Relational Model
1CIS 328 Database Systems (I) Chapter 3
Relational Model
2Relational Model Concepts (1)
- Table is called relation
- Row (record) is called tuple
- Column header is called attribute
-
- Students
- SSN Name Age GPA
- 123456789 John 20 3.2
- 23456789 Mary 18 2.9
- 345678912 Bill 19 2.7
Name of the relation
Attributes
head
tuples (rows)
columns
3Relational Model Concepts (2)
- Given a tuple t and an attribute A in a relation
R, - tA represents the value of t under A in R
- Example If t is the second tuple in Students
- tName Mary
- tAge 18
- tName, Age (Mary, 18)
4Domain of an Attribute
- Definition
- The set of values that an attribute can take on
is the domain of the attribute - dom(A) --- the domain of attribute A
- A domain is usually represented by a type
- Examples
- SSN char(9) --- character string of length 9
- Name varchar(30) --- character string of variable
length up to 30 characters - Age number --- a number
5Relational Model Concepts (3)
- Two aspects of a relation
- Schema --- the set of attributes of R.
- State (or contents) --- the CURRENT set of tuples
of R (denoted by r(R)). - Schema of a relation rarely changes. Some
possible changes are - Rename an attribute
- Delete an attribute
- Add an attribute
- Delete the schema
6Relational Model Concepts (4)
- The state of a relation may change frequently.
Some possible changes are - Modify some attribute values
- Delete an existing tuple
- Insert a new tuple
- A given schema may have different states at
different times.
7An Example Database
- Students
Departments - SSN Name Major GPA
Name Location
Chairperson - 1234 Jeff CS 3.2
CS N18 EB
Aggarwal - 2345 Mary Math 3.0
EE Q4 EB
Sackman - 3456 Bob CS 2.7
Math LN2200
Hanson - 4567 Wang EE 2.9
Biology 210 S3
Smith - Courses
Sections - Name Course CreditHours Dept
Course Section Semester
Instructor - Database CS432 4
CS CS432 01
Fall98 Meng - Database CS532 4
CS CS532 01
Fall98 Meng - Dis. Math Math314 4
Math Math314 02 Fall
97 Hanson - Lin. Alg. Math304 4
Math Math304 01
Spring97 Brown
8Relation Schema
- A relation schema is used to describe a relation
- Denoted by R(A1, A2, A3, , An), where
- R Relation schema name
- A1, , An attributes of R
- The degree of a relation is the number of
attributes in a relation schema.
9Examples
- STUDENT(Name, SSN, HomePhone, Address,
OfficePhone, Age, GPA) - Degree(STUDENT) 7
- dom(Name) all names which consists of at most
30 characters. - dom (SSN) is the set of valid 9-digit social
security numbers. - dom(HomePhone) local phone numbers.
10Relation Instance (1)
- A relation (or relation instance) r of the
relation schema R(A1, A2, , An), denoted by
r(R), is a set of n-tuples - r t1, t2, , tm
- each n-tuple ti is an ordered list of n values,
- ti ltv1, v2, , vn) where
- each value vi, 1 ? i ? n, is an element of
dom(Ai) or a special null value.
11Relation Instance (2)
- A relation r(R) is a mathematical relation of
degree n on the domains dom(A1), dom(A2), , and
dom(An), which is a subset of the Cartesian
product of the domains that define R - r(R) ? dom(A1) ? dom(A2) ? ? dom(An)
- We denote the number of values or cardinality of
a domain D by ?D? - The total number of tuples in the Cartesian
product is - ?dom(A1)? ?dom(A2)? ?dom(An)?
12Ordering of Tuples in a Relation
- Entries in a single column (its heading is called
attribute) are of the same type and meaning - Attribute name must be unique (order is
irrelevant) - Tuples have no order among them
13Rules of Relational Databases
- Rule 1 (Domain Constraint)
- No multi-valued attributes are allowed in a table
- That is, for any tuple t and attribute A in a
table, tA must be a single atomic value - entries in the table are single-valued (atomic).
Therefore composite and multi-valued attributes
are not directly represented in the relational
model
14Examples of Multi-valued Attributes
- Employees
- SSN Name Age
Dependents - 123456789 Bob 34 Allen,
Ann - 234567891 Mary 42 Kathy
- 345678912 Bill 47 Mike,
Susan, David - Other examples
- Attribute Authors of relation Books
- Attribute Reference_Books of relation Courses
- Attribute of Hobbies of relation Employees
15Key Constraints
- Rule 2 (The Unique Row Constraint)
- All tuples in a relation must be distinct
- No two rows in the same table can be identical at
any given time. That is, each tuple in a table is
unique - This rule comes from the mathematical definition
that a relation is a set of tuples and the fact
that a set never contains two identical elements - This rule has serious implications on the
performance of relational database systems. - When a new tuple is inserted to a relation, the
system has to make sure that the new tuple is
different from all existing tuples in the relation
16Superkey (1)
- Definition
- A superkey (SK) of a relation is a set of one or
more attributes whose values uniquely identify
every tuple of the relation - Superkey a subset of attributes whose values are
distinct for each tuple in R - Superkeys may contain redundant attributes
- Examples
- Attribute SSN is a SK of relation Students
- SSN, Name is also a SK
- The set of attributes Name, Birthdate,
Home_Address is a SK of Students
17Superkey (2)
- Is the set of all attributes of a relation a
superkey of the relation? - In the following relation, is attribute A a
superkey? - How about B, C?
- A B C
D - a1 b1 c1
d1 - a1 b2 c2
d1 - a2 b2 c1
d1 - a2 b1 c2
d1
18Superkey (3)
- Some claims
- Every relation has at least one superkey
- Any superset of a superkey is a superkey
- From a given state of a relation, we may
determine whether a set of attributes of the
relation does not form a superkey, but we can not
determine if a set of attributes forms a superkey
19 Key (1)
- Key a subset of attributes in R whose values are
unique for each tuple in r(R), but with no
redundant attributes. - Definition A set of attributes is a key of a
relation if - (1) it is a superkey of the relation, and
- (2) no proper subset of it is a superkey of the
relation - A Key of any relation is a minimal super key
- Student ID is candidate key for Student, since it
is a superkey, and no subset of it is a superkey.
20Key (2)
- If any attribute is removed from a key, then the
remaining attributes no longer form a key
(minimality property) - Example
- Students(SSN, Name, Home_Address, Birthdate,
GPA), - SSN is a key.
- SSN, Name is a superkey but not a key
- Name, Home_Address, Birthdate is also a key
21Key (3)
- Every relation has at least one key
- A relation may have more than one key
-
- Keys of a relation are also known as candidate
keys of the relation. - Candidate key a subset of attributes which can
be used as a key - Customer-id is candidate key of customer
- account-number is candidate key of account
22Candidate Key
- Example Find all possible candidate keys for the
following relation based on its current tuples. - A B C D
- a1 b1 c1 d1
- a1 b2 c2 d1
- a2 b2 c1 d1
- a2 b1 c2 d1
- Answer A, B, A, C, B, C
- Although several candidate keys may exist, one of
the candidate keys is selected to be the primary
key
23Primary Key (1)
- Definition
- A primary key of a relation is the candidate key
chosen by the database designer for a particular
application. - The primary key of each relation is chosen and
declared at the time when the relation is defined
Once chosen, it cannot be changed - The primary key is usually chosen to be the
candidate key that has the smallest number of
attributes to improve both storage efficiency and
query processing efficiency
24Primary Key (2)
- With the primary key defined, only the values
under the attributes in the primary key need to
be checked for identifying duplicate when new
tuples are inserted (index is often used) - The primary key of a relation is often used in
references from other relations
25Example
- STUDENT(ST-NO, SSN, Name, Age, GPA, )
- Superkey
- ST-No, Name
- SSN, Age, Address
- Candidate keys
- SSN
- ST-NO
- Primary Key
- ST-NO
26Null Value
- For a given tuple t and a given attribute A of a
relation R, the following cases may occur when t
is to be inserted into R. - tA is unknown
- tA is yet to be assigned
- tA is inapplicable
- When one of the above cases occurs, assign a null
value to tA - tA null
27Constraints on Null
- Rule 3 (Entity Integrity Constraint)
- No attribute in the primary key can take on null
values - Note A null value is different from either a 0
or a space - No primary key value can be null
- i.e. null is not allowed as a value for a primary
key
28Foreign Key (1)
- Definition
- A set of attributes of relation R1 is a foreign
key FK in R1 if it satisfies the following two
conditions - There is a relation R2 with the primary key PK
such that FK and PK have the same number of
attributes with compatible domains - For any tuple ti in R1, either there exists a
tuple tj in R2 such that tiFK tjPK or
tiFK is null
29Foreign Key (2)
- R1(PK1, A1, A2, , An, FK)
- R2(PK2, B1, B2, , Bm)
- FK is called a foreign key iff
- Attributes in FK have the same domain as PK2
- A value of FK in a tuple ti of R1 either occurs
as a value of PK2 for some tuple tj in R2 or
null - tiFK tjPK2
- we say that ti refers to (references) tj
- R1 and R2 in the definition could be the same
relation - Employees(SSN, Age, Salary, Position, Manager_SSN)
30Foreign Key (3)
- Employee SSN Name Age
Dept-Name - 123456789 John
45 Sales - 234567891 Mary 42
Service - 345678912 Bob
39 null - Department Name Location
Manager - Sales
Binghamton Bill - Inventory Vestal
Charles - Service
Vestal Maria - Dept_Name of Employee is a foreign key
referencing Name of Department
31Foreign Key (4)
- Rule 4 (Referential Integrity Constraint)
- No relation is allowed to contain unmatched
foreign key values - States that a tuple in one relation which refers
to another relation must refer to an existing
tuple in that relation - Using a foreign key of a relation to reference
the (primary) key of another relation is THE WAY
used by the relational data model to establish
relationships among different relations
32Foreign Key (5)
STNO
CSNO
Name
Ruba 123 cs111
Ali 222 cs210
CSNO Name Hrs
Cs111 Intro to com 3
Cs210 C programming 3
33Semantic Integrity Constraints
- GPA grater than or equal to 0 and less than or
equal to 4 - Age greater than 0
- Grade greater than or equal 35 and less than or
equal 100
34Update Operations on Relations
- Updates
- insert, or
- delete, or
- Modify
- Retrievals
- Queries
35The Insert Operation
- Provides a list of attribute values for a new
tuple (t) which is to be inserted into a relation
r(R) - When inserting a new tuple in r(R), we should
make sure that the values preserve all constraint
types we studied before - May violate all types of constraint
- InsertltCecilia, F, Kolonsky, null,
1960-04-05,3rd Street, Katy, TX, F, 28000,
null, 4gt into EMPLOYEE gt - Rejected, Primary Key is null (Entity Integrity
Constraint)
36The Insert Operation
- InsertltCecilia, F, Kolonsky, 99988777,
1960-04-05,3rd Street, Katy, TX, F, 28000,
null, 4gt into EMPLOYEE gt - Rejected, Primary Key is duplicate (Key
Constraint) - InsertltCecilia, F, Kolonsky, 677678989,
1960-04-05,3rd Street, Katy, TX, F, 28000,
null, 7gt into EMPLOYEE gt - Rejected, DEPARTMENT 7 does not exist.
(Referential Integrity Constraint) - InsertltCecilia, F, Kolonsky, 677678989,
1960-04-05,3rd Street, Katy, TX, F, 28000,
null, 4gt into EMPLOYEE gt - Accepted
37The Delete Operation
- Deletes a tuple or tuples from r(R)
- It can only violate referential integrity
constraint. - Delete from WORKS_ON where ESSN 999887777
and PNO 10 - Accepted
- Delete from EMPLOYEE where SSN 999887777
- Rejected. Tuples in WORK_ON refer to this tuple,
if the tuple is deleted, referential integrity
violations will result - Delete from EMPLOYEE where SSN 333445555
- Rejected. Tuples in EMPLOYEE, DEPARTMENT,
WORK_ON, and DEPENDENT refer to this tuple, if
the tuple is deleted, referential integrity
violations will result
38The Modify Operation
- It is used to change the values of one or more
attributes in a tuple or more of some relation
r(R) - May violate all constraints
- Update EMPLOYEE, set Salary 29000 where SSN
999887777 - Accepted
- Update EMPLOYEE, set DNO 1 where SSN
999887777 - Accepted
39The Modify Operation
- Update EMPLOYEE, set DNO 7 where SSN
999887777 - Rejected, it violates referential integrity
constraint - Update EMPLOYEE, set SSN 980000000 where SSN
999887777 - Rejected, it violates referential integrity
constraint - Update EMPLOYEE, set SSN 987654321 where SSN
999887777 - Rejected, it violates unique row (key) and
referential integrity constraints
40Basic Relational Algebra Operations
- Is a collection of operators that are used to
manipulate entire relations. The result of each
operation is a new relation which can be further
manipulated.
41SELECT Operation
- ? --- Select (sigma)
- Format ?selection-condition(R)
- Semantics
- returns all tuples of relation R that satisfy
the selection-condition - Select operation is unary. It applies to a single
relation - is used to select a subset of the tuples in a
relation that satisfy a selection condition.
42Formats of Selection Conditions
- (a) A op v A is an attribute, op is an operator
(, ?, lt, ?, gt, ?), and v is a constant. - Age ? 20, Name Bill'
- (b) A op B A and B are two attributes in R.
- Persons(SSN, Name, Birthplace, Residence)
- Birthplace Residence
- (c) Combinations of (a) and (b) connected by and,
or or not. - Age ? 20 and Birthplace Residence
43An Example of SELECT (1)
- Example Find all students who are 20 years old
or younger, and whose birthplace is the same as
his/her residence. - ?Age ? 20 and Birthplace Residence(Students)
44An Example of Select (2)
- If the current Students is
- SSN Name Age GPA
Birthplace Residence - 123456789 John 20 3.2
Vestal Vestal - 234567891 Mary 18 2.9 Binghamton
Vestal - 345678912 Bill 19 2.7
Endwell Endwell - 456789123 Nancy 24 3.6 Binghamton
NYC - then the result is a new relation
- SSN Name Age GPA
Birthplace Residence - 123456789 John 20 3.2
Vestal Vestal - 345678912 Bill 19 2.7
Endwell Endwell
45SELECT Operation
- Commutativity of select
- ?condition-1(?condition-2(R))
- ?condition-2(?condition-1(R))
- ?condition-1 and condition-2(R)
- ?city Irbid AND GPA gt 65 (STUDENT) or
- ?city Irbid (?GPA gt 65 (STUDENT)
46PROJECT Operation
- ? --- project (pi)
- Format
- ?attribute-list(R),
- where attribute-list is a subset of all
attributes in R - Semantics
- Returns all tuples of relation R but for each
tuple, only values under attribute-list are
returned - Project removes duplicate tuples automatically
47Project (2)
- Selects certain columns from the table and
discards other columns - ? ltattribute-listgt (ltrelation-namegt)
- degree of resulting relation is equal to the
number of attributes in the ltattribute-listgt - The number of tuples of the result of project is
less than or equal to the number of tuples in the
original relation. (It removes the duplicates)
48Project (3)
- Example Find the name and GPA of all students.
- ?Name,GPA(Students)
- Students
- SSN Name Age GPA Name
GPA - 123456789 John 20 3.2 John
3.2 - 234567891 Mary 18 2.9 Mary
2.9 - 345678912 John 19 3.2
- Input Relation
Output Relation
49Project (4)
- If attribute-list-1 ? attribute-list-2,
- then
- ?attribute-list-1(?attribute-list-2(R))
- ?attribute-list-1(R)
- The Project operation is not commutative
- Retrieve all student numbers and names who live
in Amman. - ?STNO, ST-Name(?City Amman(STUDENT))
50Project (5)
- In complex queries, it becomes necessary to store
intermediate results, therefore we should know
how to give names to relations and attributes - Amman-students ?city Amman (STUDENT)
- Result ? STNO, ST-Name (Amman-Students)
- or
- Result(Number,Name) ? STNO, ST-Name
(Amman-Students)
Renaming of attributes
51Select and Project
- Example
- Find the name and GPA of all students who are 20
years old or younger and whose birthplace and
residence are the same - ?Name, GPA(?Age?20 and BirthplaceResidence(Stud
ents))
52RENAME Operation
- ? --- rename (rho)
- Format ?S(R)
- Semantics
- Make a copy of relation R and name the copy as S
- ?S(R)
- Rename R only
- ?S(B1, B2, , Bn)(R)
- Rename R and its attributes
- ?(B1, B2, , Bn)(R)
- Rename attributes only
53Set Theoretic Operations
- UNION, INTERSECTION, DIFFERENCE.
- binary (applied to two relations at a time)
- To apply any of these operators to relations,
relations should be union-compatible. - Two relations R(A1, A2, , An) and S(B1, B2, ,
Bm) are said to be union-compatible if - they have the same degree (n m) and
- dom(Ai) dom(Bi) for 1 ? i ? n.
- Both R and S have the same number of attributes
and the corresponding attributes have the same
domain
54Union (1)
- ? --- union
- Format
- R1 ? R2
- Semantics
- Returns all tuples that belong to either R1 or
R2. - Formally
- R1 ? R2 t t ? R1 or t ? R2
- Condition of union
- R1 and R2 must be union compatible.
- The union operator removes duplicate tuples
automatically.
55Union (2)
- Example
- R1 R2
R1 ? R2 - A B C A B C A B
C - a1 b1 c1 a0 b0 c0 a1 b1
c1 - a2 b2 c2 a1 b1 c1 a2 b2
c2 - a3 b3 c3 a2 b2 c2 a3 b3
c3 - a4 b4 c4 a0
b0 c0 -
a4 b4 c4
56Set Difference (1)
- - set difference
- Format
- R1 - R2
- Semantics
- Returns all tuples that belong to R1 but not R2.
- Formally
- R1 - R2 t t ? R1 and t ? R2
- Set difference also requires union compatibility
between R1 and R2
57Set Difference (2)
- Example
- R1 R2 R1
- R2 - A B C A B C A B
C - a1 b1 c1 a0 b0 c0 a3 b3
c3 - a2 b2 c2 a1 b1 c1
- a3 b3 c3 a2 b2 c2
- a4 b4 c4
58INTERSECTION
- ? --- set intersection
- Format
- R1 ? R2
- Semantics
- Returns all tuples that belong to both R1 and R2.
- Formally
- R1 ? R2 t t ? R1 and t ? R2
- Derivation from existing operators
- R1 ? R2 R1 - (R1 - R2) R2 - (R2 - R1)
59INTERSECTION
- Union and Intersection are commutative operations
- R ? S S ? R and
- R ? S S ? R
- Union and Intersection can be applied to any
number of relations and both are associative - R ? (S ? Q) (S ? R) ? Q
- R ? (S ? Q) (S ? R) ? Q
- Difference operator is not commutative
- R - S ? S - R in general.
60Cartesian Product (1)
- ? --- Cartesian product
- Format
- R1 ? R2
- Semantics
- Returns every tuple that can be formed by
concatenating a tuple in R1 with a tuple in R2 - Binary operation, but the relations on which it
is applied do not have to be union compatible
61Cartesian Product (2)
- Example R1 A B C R2 B
D E - a1 b1 c1
b1 d1 e1 - a2 b2 c2
b2 d2 e2 - a3 b3 c3
- R1 ? R2 A R1.B C R2.B
D E - a1 b1 c1
b1 d1 e1 - a1 b1 c1
b2 d2 e2 - a2 b2 c2
b1 d1 e1 - a2 b2 c2
b2 d2 e2 - a3 b3 c3
b1 d1 e1 - a3 b3 c3
b2 d2 e2 -
62Cartesian Product (3)
- If R1 and R2 have common attributes, then the
full names of these attributes must be used - Example
- Use R.A instead of A
- To prevent identical attribute names from
occurring in the same relation schema, R ? R is
not allowed. However, R ? ?S(R) is allowed - Commutativity R1 ? R2 R2 ? R1
63Cartesian Product (4)
- Given R(A1, A2, , An) and S(B1, B2, , Bm)
- R?S Q(A1, A2, , An, B1, B2, , Bm)
- degree of Q n m
- If R1 has N tuples and R2 has M tuples, then
- R1 ? R2 has NM tuples
- Cartesian product is extremely expensive
- If R1 and R2 are both large, then each relation
may need to be scanned many times to perform the
Cartesian product. - Writing out the result can be very expensive due
to the large size of the result
64Example
- Retrieve for each female employee a list of names
of her dependents - FEMALE-EMPS ? Sex F (EMPLOYEE)
- EMP-NAMES ?Fname,Lname,SSN(FEMALE-EMPS)
- EMP-DEPENDENTS EMP-NAMES ? DEPENDENT
- ACTUAL-DEP ?SSN ESSN (EMP-DEPENDENTS)
- RESULT ?Fname,Lname,Dependent-name(ACTUAL-DEP)
65Relational Algebra Example (1)
- Example Find the names of each employee and
his/her manager - Employees SSN Name Age
Dept-Name - 123456789 John 34
Sales - 234567891 Mary 42
Service - 345678912 Bill
39 null - Departments Name Location
Manager - Sales
XYZ Bill - Inventory YZX
Charles - Service ZXY
Maria
66Relational Algebra Example (2)
- A relational algebra expression is
- ?Employees.Name, Departments.Manager
- (?Employees.Dept_Name Departments.Name
- (Employees ? Departments))
- A simplified version (don't use full name when
you don't have to) - ?Employees.Name, Manager
- (?Dept_Name Departments.Name
- (Employees ?Departments))
67Relational Algebra Example (3)
- Use assignment operator ( ) to save the
intermediate result into a temporary relation - Example The following expression
- ?Employees.Name, Manager(?Dept_Name
Departments.Name - (Employees ? Departments) )
- is equivalent to the following series of
expressions - TEMP1 Employees ? Departments
- TEMP2 ?Dept_NameDepartments.Name(TEMP1)
- RESULT ?Employees.Name, Manager (TEMP2)
68Relational Algebra Example (4)
- Example Find the names of all students who have
the highest GPA - STUDENTS
- SSN Name GPA
- 123456789 John 3.8
- 234567891 Maria 3.2
- 345678912 Mike 3.0
69Relational Algebra Example (5)
- Step 1 Find the GPAs that are not the highest
- TEMP1 ?Students.GPA(?Students.GPA lt S2.GPA
(Students ? ?S2(Students))) - Students ? ?S2(Students)
- SSN Name GPA
S2.SSN S2.Name S2.GPA - 123456789 John 3.8
123456789 John 3.8 - 123456789 John 3.8
234567891 Maria 3.2 - 123456789 John 3.8
345678912 Mike 3.0 - 234567891 Maria 3.2
123456789 John 3.8 - 234567891 Maria 3.2
234567891 Maria 3.2 - 234567891 Maria 3.2
345678912 Mike 3.0 - 345678912 Mike 3.0
123456789 John 3.8 - 345678912 Mike 3.0
234567891 Maria 3.2 - 345678912 Mike 3.0
345678912 Mike 3.0
70Relational Algebra Example (6)
- Step 2 Find the highest GPA
- TEMP2 ?GPA(Students) - TEMP1
- Step 3 Find the names of students who have the
highest GPA - RESULT ?Name(?Students.GPA EMP2.GPA
(Students?TEMP2)) -
71Join (1)
- ? --- join
- Format
- R1 ?join-condition R2
- Semantics
- Returns all tuples in R1 ? R2 which satisfy the
join condition - Derivation from existing operators
- R1 ?join-condition R2 ?join-condition(R1 ? R2)
- Format of join condition
- R1.A op R2.B
- R1.A1 op R2.B1 and R1.A2 op R2.B2 . . .
- Tuples whose join attributes are NULL do not
appear in the result
72Join (2)
- Example Find the names of all employees and
their department locations - Employees SSN Name Age
Dept-Name - 123456789 John 34
Sales - 234567891 Mary 42
Service - 345678912 Bill
39 null - Departments Name Location
Manager - Sales
Binghamton Bill - Inventory Endicott
Charles - Service
Vestal Maria
73Join (3)
- ?Employees.Name, Location (Employees
- ?Dept-Name Departments.Name Departments)
- Result Name Location
- John
Binghamton - Mary
Vestal
74Join (4)
- Example Find the names of all employees who earn
more than his/her manager - Employees SSN Name Salary
Manager-SSN - 123456789 John
34k 234567891 - 234567891 Bill
40k null - 345678912 Mary
38k null - 456789123 Mike
41k 345678912 - ?Employees.Name (Employees ?Employees.Manager-SSN
EMP.SSN and Employees.Salary gt EMP.Salary - ?EMP(Employees))
75Equijoin
- Definition A join is called an equijoin if only
equality operator is used in all join conditions.
- R1 R2 R1 ?R1.B R2.B R2
- A B B C A R1.B R2.B C
- a b b c a b
b c - d b c d d b b
c - b c a d b c
c d - Most joins in practice are equijoins.
76Natural Join (1)
- Definition
- A join between R1 and R2 is a natural join if
- There is an equality comparison between every
pair of identically named attributes from the two
relations - Among each pair of identically named attributes
from the two relations, only one remains in the
result - Natural join is denoted by ? with no join
conditions explicitly specified
77Natural Join (2)
- Example
- R1(A, B, C) ? R2(A, C, D)
- has attributes (A, B, C, D) in the result
- Questions
- How to express natural join in terms of equijoin
and other relational operator? - R1 ? R2 ?R1.A, B, R2.C, D
- (R1? R1.AR2.A and R1.C R2.C R2)
78A Complete Set of Relational Algebra Operations
- The relational algebra is a set of expressions as
defined below - A relation is an expression.
- If E1 and E2 are expressions, so are
- ?P(E1), ?A(E1), ?S(E1), E1?E2, E1-E2, E1?E2
- That is, any expression that can be formed from
base relations and the six relational operators
is a relational algebra expression
79Division (1)
- A motivating example
- Emp Proj Proj
- 1 10 10
- 1 20 20
- 1 30 30
- 2 20
- 3 10
- 3 20
- Which employee participates in all projects?
80Division (2)
- ? --- division
- Format
- R1 ? R2
- Restriction
- Every attribute in R2 is in R1
- Semantics
- Consider
- R1(A1, ..., An, B1, ..., Bm) ? R2(B1, ..., Bm)
- Let T ? A1, ..., An (R1)
- Returns those tuples in T such that for every
tuple t returned, the concatenation of t with
every tuple in R2 is in R1 - R1 ? R2
- t t ? ?A1, ..., An(R1) ? ? u ? R2, tu ? R1
81Division (3)
- E.g. R1 R2 R1
? R2 - A B C D C D A B
- a b c d c d a
b - a b e f e f
e d - b c e f
- e d c d
- e d e f
- a b d e
- Derivation from existing operators
- R1 ? R2 T - ?A1, ..., An ((T ? R2) - R1)
82Division (4)
- Example
- Find the names and GPAs of all students who take
all courses taken by a student with SSN
123456789 - Students (SSN, Name, GPA)
- Takes (SSN, Course, GRADE)
- Step 1 Find all courses that are taken by the
student with SSN 123456789. - TEMP1 ?Course(?SSN 123456789(Takes))
83Division (5)
- Step 2 Find the SSNs of those students who take
all courses in TEMP1. - TEMP2 Takes ? TEMP1
- Step 3 Obtain the final result.
- RESULT ?Name, GPA(Students ? TEMP2)
84Division (6)
- Find the names of employees who participate in
every (all) project - Employees(SSN, Name, Department)
- Projects(Proj, Name, Budget)
- Participation(SSN, Proj, Hours)
- ?Name(Employees?(Participation??Proj(Projects)))
- Can we replace Projects by Participation?
Only when Project has total participation
85Aggregate Functions Grouping
- Well know Aggregate Functions
- SUM, AVERAGE, MAXIMUM, MINIMUM, COUNT
- Format ? (Script F)
- ltgrouping attributesgt ? ltfunction listgt (R)
- ltgrouping attributesgt is a list of attributes in
R - ltfunction listgt is a list of
- (ltfunctiongt ltattributegt) pairs
- ltfunctiongt is one of aggregate functions
- ltattributegt is an attribute in R
86Examples
- Retrieve each department number, the number of
employees in the department, and their average
salary - DNO ?COUNT( SSN), AVERAGE(SALARY) (EMPLOYEE)
- DNO COUNT_SSN AVERAGE_SALARY
- 1 1 55000
- 4 3 31000
- 5 4 33250
-
87Examples
- Retrieve for each department number, the number
of employees in the department, and their average
salary - ?R(DNO, NO_OF_EMP, AVERAGE_SAL)(DNO ? COUNT(
SSN), - AVERAGE(SALARY) (EMPLOYEE))
- DNO NO_OF_EMP AVERAGE_SAL
- 1 1 55000
- 4 3 31000
- 5 4 33250
-
88Examples
- If no grouping attributes are specified, the
functions are applied to attribute values of all
the tuples in the relation - Retrieve the number of employees, and their
average salary - ?COUNT( SSN), AVERAGE(SALARY) (EMPLOYEE))
- COUNT_SSN AVERAGE_SALARY
- 8 35125
89c
- R1 R2
R1 ? R2 - A B C C D E A B
C D E - a1 b1 c1 c1 d1 e1 a1 b1
c1 d1 e1 - a4 b3 c2 c6 d3 e2
- The second tuples of R1 and R2 are not present in
the result (called dangling tuples) - Applications exist that require to retain
dangling tuples.
90Outerjoin (2)
- ?O --- outer join
- Format R1 ?O R2
- Semantics like join except
- it retains dangling tuples from both R1 and R2
- it uses null to fill out missing entries
- R1 ?O R2
- A B C D
E - a1 b1 c1 d1
e1 - a4 b3 c2 null
null - null null c6 d3
e2
91Left Outerjoin and Right Outerjoin
- ?LO --- left outer join
- Format R1 ?LO R2
- Semantics like outerjoin but retains only
dangling tuples of the relation on the left - ?RO --- right outer join
- Format R1 ?RO R2
- Semantics like outerjoin but retains only
dangling tuples of the relation on the right
92Left Outerjoin and Right Outerjoin (2)
- R1 R2
R1 ? R2 - A B C C D E A B
C D E - a1 b1 c1 c1 d1 e1 a1 b1
c1 d1 e1 - a4 b3 c2 c6 d3 e2
93Left Outerjoin and Right Outerjoin (3)
-
- R1 ?LO R2
- A B C D
E - a1 b1 c1 d1
e1 - a4 b3 c2 null
null -
-
- R1 ?RO R2
- A B C D
E - a1 b1 c1 d1
e1 - null null c6 d3
e2 -
94Recursive Closure Operations
- This operation is applied to a recursive
relationship - Such as the relationship between an employee and
a supervisor - The relationship is described by the foreign key
SUPERSSN of the EMPLOYEE relation - Example
- Retrieve all supervisees of an employee e at all
levels - All employees e directly supervised by employee
e - All employees e directly supervised by each
employee e - All employees e directly supervised by each
employee e and so on
95Recursive Closure Operations
- Retrieve the SSNs of all employees e directly
supervised at level one - by the employee e
whose name is James Borg - Borg_SSN
- ?SSN(?FNAMEJames AND LNAMEBorg(EMPLOYEE))
- SUPERVISSION(SSN1, SSN2)
- ?SSN, SUPERSSN(EMPLOYEE)
- RESULT1(SSN)
- ?SSN1(SUPERVISION ?SSN2SSN BORG_SSN)
96Recursive Closure Operations
- To retrieve all employees supervised by Borg at
level 2 that is, all employees e supervised
by some employee e whose is directly supervised
by Borg - RESULT2(SSN)
- ?SSN1(SUPERVISION ?SSN2SSN RESULT1)
- To get both sets of employees supervised at
levels 1 and 2 by James Borg - RESULT RESULT1 ? RESULT2
97Examples of Queries in Relational Algebra (1)
- Many relational algebra queries can be expressed
using selection, projection and join operators by
following steps - (1) Determine necessary relations to answer the
query. - If R1, ..., Rn are all the relations needed,
- P are all conditions and
- T are all (target) attributes to be output,
- then form the initial query
- ?T(?P(R1 ? ... ? Rn))
98Relational Algebra Example (2)
- (2) If P contains a condition, say Ci, that
involves only attributes in Ri - replace Ri by ?Ci(Ri) and remove Ci from P
- (3) If P contains a condition, say C, that
involves attributes from both Ri and Rj - replace Ri ? Rj by Ri ?C Rj (or a natural
join) and remove C from P
99Relational Algebra Example (3)
- Consider the following database schema
- Students(SSN, Name, GPA, Age, Dept-Name)
- Enrollment(SSN, Course, Grade)
- Courses(Course, Title, Dept-Name)
- Departments(Name, Location, Phone)
100Relational Algebra Example (5)
- Query Find the SSNs and names of all students
who are CS major and who take CIS328 - (1) Relations Students and Enrollment are needed
- T Students.SSN, Students.Name
- P Students.Dept-Name CS,
Enrollment.Course CIS328,
Students.SSN Enrollment.SSN - The initial relational algebra query is
- ?Students.SSN, Name
- (?Dept-NameCS and CourseCIS328 and
- Students.SSN Enrollment.SSN
- (Students ? Enrollment))
101Relational Algebra Example (6)
- (2) Replace
- Students by
- ?Dept-Name CS(Students) and
- Enrollment by
- ?CourseCIS328(Enrollment)
- Remove the two conditions from the initial
expression
102Relational Algebra Example (7)
- (3) Replace
- Students ? Enrollment by
- Students ? Enrollment and
- remove Students.SSN Enrollment.SSN from the
initial expression. - The final expression
- ?Students.SSN,Name(?Dept-Name CS(Students) ?
- ? Course CIS328 (Enrollment))
103Relational Algebra Example (8)
- Query Find the SSN and name of each student who
is CS major together with the titles of the
courses taken by the student - ? Students.SSN, Name, Title
- (?Students.Dept-Name CS and Students.SSN
Enrollment.SSN and Enrollment.Course
Courses.Course - (Students ? Enrollment ? Courses))
104Relational Algebra Example (9)
- ? Students.SSN, Name, Title
- (((?Students.Dept-Name CS (Students)) ?
Enrollment) ? Courses) - ? Students.SSN, Name, Title ((?
Students.Dept-Name CS (Students)) ?
(Enrollment ? Courses))
105Relational Algebra Summary (1)
- Relational algebra operators
- Fundamental operators
- ?C(R), ?A(R), ?S(R), R1 ? R2,
- R1 - R2, R1 ? R2
- Other traditional operators
- R1 ? R2, R1 ?C R2, R1 ? R2
106Relational Algebra Summary (2)
- Some identities
- ?C1(?C2(R)) ?C2(?C1(R)) ?C1 and C2(R)
- ?L1(?L2(R)) ?L1(R) , if L1 ? L2
- R1 ? R2 R2 ? R1
- R1 ? (R2 ? R3) (R1 ? R2) ? R3
- R1 ? R2 R2 ? R1
- R1 ? (R2 ? R3) (R1 ? R2) ? R3
107Extra RA Examples
- www.geocities.com/cu_just/database/RA_Examples.zip
- Chapter 3 slides from Book Web-Site
- Chapter 3 Exercises
-
-
- Good Luck