Title: The Relational Data Model
1The Relational Data Model
- (Based on Chapter 5 in Fundamentals of Database
Systems - by Elmasri and Navathe, Ed. 4)
2Contents
- 1 Relational Model Concepts2 Characteristics of
Relations3 Relational Integrity
Constraints 3.1 Key Constraints 3.2 Entity
Integrity Constraints 3.3 Referential Integrity
Constraints4 Update Operations on
Relations5 Relational Algebra Operations 5.1 SEL
ECT s and PROJECT P 5.2 Set
Operations 5.3 JOIN Operations 5.4 Additional
Relational Operations
31 Relational Model Concepts
4BASIS OF THE MODEL
- The relational Model of Data is based on the
concept of a Relation. - A Relation is a mathematical concept based on
the ideas of sets. - The strength of the relational approach to data
management comes from the formal foundation
provided by the theory of relations. - We review the essentials of the relational
approach in this chapter.
5BASIS OF THE MODEL
- The relational model is covered in Chapter 5 of
the book Fundamentals of Database Systems, by
Elmasri and S.B. Navathe, Ed. 4, 2004. - The model was first proposed by Dr. E.F. Codd of
IBM in 1970 in the following paper - "A
Relational Model for Large Shared Data Banks,"
Communications of the ACM, June 1970.
6INFORMAL DEFINITIONS
- RELATION
- A table of values
- A relation may be thought of as a set of rows.
- A relation may alternately be though of as a set
of columns. - Each row of the relation may be given an
identifier. - Each column typically is called by its column
name or column header or attribute name.
7FORMAL DEFINITIONS
- A Relation may be defined in multiple ways.
- The Schema of a Relation R (A1, A2,
.....An)Relation R is defined over attributes
A1, A2, ..,An
8FORMAL DEFINITIONS (contd.)
- For Example
- CUSTOMER (Cust-id, Cust-name, Address, Phone)
- Here, CUSTOMER is a relation defined over the
four attributes Cust-id, Cust-name, Address,
Phone, each of which has a domain or a set of
valid values. For example, the domain of Cust-id
is 6 digit numbers.
9FORMAL DEFINITIONS (contd.)
- A tuple is an ordered set of values
- Each value is derived from an appropriate domain.
- Each row in the CUSTOMER table may be called as a
tuple in the table and would consist of four
values.
10FORMAL DEFINITIONS (contd.)
- For example
- lt632895, "John Smith", "101 Main St. Atlanta, GA
30332", "(404) 894-2000"gt is a triple belonging
to the CUSTOMER relation.
11FORMAL DEFINITIONS (contd.)
- A relation may be regarded as a set of tuples
(rows). - Columns in a table are also called as attributes
of the relation.
12FORMAL DEFINITIONS (contd.)
- The relation is formed over the Cartesian product
of the sets - each set has values from a domain
- that domain is used in a specific role which is
conveyed by the attribute name. - For example, attribute Cust-name is defined over
the domain of strings of 25 characters. The role
these strings play in the CUSTOMER relation is
that of the name of customers.
13FORMAL DEFINITIONS (contd.)
- Formally, Given R(A1, A2, .........., An) r(R)
is subset-of - dom (A1) X dom (A2) X ...X dom(An)
- R schema of the relation
- r of R a specific "value" or population of R.
- R is also called the intension of a relation
- r is also called the extension of a relation
14FORMAL DEFINITIONS (contd.)
- For example
- Let S1 0,1
- Let S2 a,b,c
- Let R subset-of S1 X S2
- r(R) lt0,agt , lt0,bgt , lt1,cgt
15The attributes and tuples of a relation STUDENT.
16DEFINITION SUMMARY
- Informal Terms
- Table
- Column
- Row
- Value in a column
- Table Definition
- Populated Table
- Formal Terms
- Row
- Attribute
- Tuple
- Domain
- Schema of a relation
- Extension
17Notes
- Whereas languages like SQL use the informal terms
of TABLE (e.g. CREATE TABLE), COLUMN (e.g.
SYSCOLUMN variable), the relational database
textbooks present the model and operations on it
using the formal terms.
182 Characteristics of Relations
- Ordering of tuples in a relation r(R)
- The tuples are not considered to be ordered,
even though they appear to be in the tabular form.
192 Characteristics of Relations
- Ordering of attributes in a relation schema R
(and of values within each tuple) - We will consider the attributes in R(A1, A2, ...,
An) and the values in tltv1, v2, ..., vngt to be
ordered . - (However, a more general alternative definition
of relation does not require this ordering).
202 Characteristics of Relations
- Values in a tuple
- All values are considered atomic (indivisible).
- A special null value is used to represent values
that are unknown or inapplicable to certain
tuples.
21Notation
- We refer to component values of a tuple t by
tAi vi (the value of attribute Ai for tuple
t). - Similarly, tAu, Av, ..., Aw refers to the
subtuple of t containing the values of attributes
Au, Av, ..., Aw, respectively.
22The relation STUDENT from Figure 5.1 with a
different order of tuples.
233 Relational Integrity Constraints
- Constraints are conditions that must hold on all
valid relation instances. - There are five main types of constraints
- NULL constraints
- Domain constraints
- Key constraints,
- Entity integrity constraints, and
- Referential integrity constraints
243.0 NULL Constraints
- Domain constraints specify that the value of each
attribute A can not hold NULL value.
253.0 Domain Constraints
- Domain constraints specify that the value of each
attribute A must be atomic value from domain
dom(A). - See Section 5.1.1
263.1 Key Constraints
- Superkey of R
- A set of attributes SK of R such that no two
tuples in any valid relation instance r(R) will
have the same value for SK. That is, for any
distinct tuples t1 and t2 in r(R), t1SK ltgt
t2SK. - Key (Candidate key) of R
- A "minimal" superkey
- a superkey K such that removal of any attribute
from K results in a set of attributes that is not
a superkey.
273.1 Key Constraints
- Example
- The CAR relation schemaCAR(State, Reg,
SerialNo, Make, Model, Year) - has two keys
- Key1 State, Reg,
- Key2 SerialNo,
- which are also superkeys.
- SerialNo, Make is a superkey but not a key.
283.1 Key Constraints
- If a relation has several candidate keys, one is
chosen arbitrarily to be the primary key. - The primary key attributes are underlined.
29INSERT FIGURE 7.4
30Candidate keys of the CAR relation
- The CAR relation, with two candidate keys
LicenseNumber and EngineSerialNumber
313.2 Entity Integrity
- Relational Database Schema
- A set S of relation schemas that belong to the
same database. - S is the name of the database.
- S R1, R2, ..., Rn
32Entity Integrity
- tPK ltgt null for any tuple t in r(R)
- The primary key attributes PK of each relation
schema R in S cannot have null values in any
tuple of r(R). - This is because primary key values are used to
identify the individual tuples.
33Note
- Other attributes of R may be similarly
constrained to disallow null values, even though
they are not members of the primary key.
343.3 Referential Integrity
- A constraint involving two relations (the
previous constraints involve a single relation). - Used to specify a relationship among tuples in
two relations the referencing relation and the
referenced relation.
35Foreign key
- Tuples in the referencing relation R1 have
attributes FK (called foreign key attributes)
that reference the primary key attributes PK of
the referenced relation R2. - A tuple t1 in R1 is said to reference a tuple t2
in R2 if t1FK t2PK. - A referential integrity constraint can be
displayed in a relational database schema as a
directed arc from R1.FK to R2.
36Schema diagram for the COMPANY relational
database schema.
37One possible database state for the COMPANY
database schema.
38Referential integrity contstraints displayed on
the COMPANY relational database schema.
394 Update Operations on Relations
- There are three basic update operations on
relations - INSERT a tuple.
- DELETE a tuple.
- MODIFY a tuple.
- Integrity constraints should not be violated by
the update operations.
404 Update Operations on Relations
- Several update operations may have to be grouped
together. - Updates may propagate to cause other updates
automatically. This may be necessary to maintain
integrity constraints.
414 Update Operations on Relations
- In case of integrity violation, several actions
can be taken - cancel the operation that causes the violation
(REJECT option) - perform the operation but inform the user of the
violation - trigger additional updates so the violation is
corrected (CASCADE option, SET NULL option) - execute a user-specified error-correction routine
425 The Relational Algebra
- Operations to manipulate relations.
- Used to specify retrieval requests (queries).
- Query result is in the form of a relation.
435 The Relational Algebra
- Relational Operations
- 5.1 SELECT s and PROJECT P operations.
- 5.2 Set operations
- These include UNION ?, INTERSECTION n,
DIFFERENCE -, CARTESIAN PRODUCT ?. - 5.3 JOIN operations .
- 5.4 Other relational operations
- DIVISION, OUTER JOIN, AGGREGATE FUNCTIONS.
445.1 SELECT operation (denoted by s )
- Selects the tuples (rows) from a relation R that
satisfy a certain selection condition c - Form of the operation sc(R)
- The condition c is an arbitrary Boolean
expression on the attributes of R
455.1 SELECT operation (denoted by s )
- Resulting relation has the same attributes as R
- Resulting relation includes each tuple in r(R)
whose attribute values satisfy the condition c
46Boolean expression
- The Boolean expression specified in ltselection
conditiongt is made up of a number of clauses of
the form - ltattribute namegt ltcomparison opgt ltconstant
valuegt, or - ltattribute namegt ltcomparison opgt ltattribute namegt
- Where ltattribute namegt is the name of the
attribute of R, - ltcomparison opgt is normally one of the operators
, lt, ?, gt, ?, ?, and - ltconstant valuegt is a constant value from
attribute domain.
47Boolean expression
- Clauses can be arbitrarily connected by the
Boolean operators AND, OR, and NOT to form a
general selection condition.
48Examples
- s DNO4(EMPLOYEE)
- s SALARYgt30000(EMPLOYEE)
- s (DNO4 AND SALARYgt25000) OR (EMPLOYEE)
- (DNO5 AND SALARYgt30000)
495.1PROJECT operation (denoted by P )
- Keeps only certain attributes (columns) from a
relation R specified in an attribute list L - Form of operation P L(R)
- Resulting relation has only those attributes of R
specified in L - P NAME,LNAME,SALARY(EMPLOYEE)
50Eliminates duplicate tuples
- Duplicate tuples are eliminated by the P
operation. - The PROJECT operation eliminates duplicate tuples
in the resulting relation so that it remains a
mathematical set (no duplicate elements)
51Example
- P SEX,SALARY(EMPLOYEE)
- If several male employees have salary 30000, only
a single tuple ltM, 30000gt is kept in the
resulting relation.
52INSERT FIGURE 7.8
53Sequences of operations
- Several operations can be combined to form a
relational algebra expression (query)
54Example
- Retrieve the names and salaries of employees who
work in department 4 - P FNAME,LNAME,SALARY (sDNO4 (EMPLOYEE) )
55Sequences of operations
- Alternatively, we specify explicit intermediate
relations for each step - DEPT4_EMPS ? s DNO4(EMPLOYEE)
- R ? P FNAME,LNAME,SALARY(DEPT4_EMPS)
56Sequences of operations
- Attributes can optionally be renamed in the
resulting left-hand-side relation - (this may be required for some operations that
will be presented later) - DEPT4_EMPS ? s DNO4(EMPLOYEE)
- R(FIRSTNAME,LASTNAME,SALARY) ? P
FNAME,LNAME,SALARY (DEPT4_EMPS)
57INSERT FIGURE 7.9
585.2 Set Operations
- Binary operations from mathematical set theory
- UNION R1 ? R2,
- INTERSECTION R1 ? R2,
- SET DIFFERENCE R1 ? R2,
- CARTESIAN PRODUCT R1 ? R2.
59Union compatibility
- For ?, ?, ? , the operand relations R1(A1, A2,
..., An) and R2(B1, B2, ..., Bn) - must have the same number of attributes, and
- the domains of corresponding attributes must be
compatible - that is, dom(Ai)dom(Bi) for i1, 2, ..., n.
- This condition is called union compatibility.
60Union compatibility
- The resulting relation for ?, ? , or ? has the
same attribute names as the first operand
relation R1 (by convention).
61INSERT FIGURE 7.11
62CARTESIAN PRODUCT
- CARTESIAN PRODUCT
- R(A1, A2, ..., Am, B1, B2, ..., Bn) ?
R1(A1, A2, ..., Am) X R2 (B1, B2, ..., Bn) - A tuple t exists in R for each combination of
tuples t1 from R1 and t2 from R2 such that - tA1, A2, ..., Amt1 and tB1, B2, ..., Bnt2
- If R1 has n1 tuples and R2 has n2 tuples, then R
will have n1n2 tuples.
63CARTESIAN PRODUCT
- CARTESIAN PRODUCT is a meaningless operation on
its own. - It can combine related tuples from two relations
if followed by the appropriate SELECT operation .
64CARTESIAN PRODUCT
- Example
- Combine each DEPARTMENT tuple with the EMPLOYEE
tuple of the manager. - DEP_EMP ? DEPARTMENT ? EMPLOYEE
- DEPT_MANAGER ? s MGRSSNSSN(DEP_EMP)
65INSERT FIGURE 7.12
665.3 JOIN Operations
- THETA Join
- Equijoin
- Outer Join
67THETA JOIN
- Similar to a CARTESIAN PRODUCT followed by a
SELECT. - The condition c is called a join condition.
- R(A1, A2, ..., Am, B1, B2, ..., Bn) ?
R1(A1, A2, ..., Am) c R2 (B1, B2, ..., Bn)
68JOIN condition
- A join condition is of the form
- ltconditiongt AND ltconditiongt AND AND
ltconditiongt - Where each condition is of the form Ai ? Bj ,
- Ai is an attribute of R, Bj is an attribute of S,
- Ai and Bj have the same domain, and
- ? is one of the comparison operators , lt, ?, gt,
?, ?
69EQUIJOIN
- The join condition c includes one or more
equality comparisons involving attributes from
R1 and R2. - That is, c is of the form(AiBj) AND ... AND
(AhBk) - 1lti,hltm, 1ltj,kltn
70EQUIJOIN
- In the above EQUIJOIN operation
- Ai, ..., Ah are called the join attributes of R1
- Bj, ..., Bk are called the join attributes of R2
71Example of using EQUIJOIN
- Retrieve each DEPARTMENT's name and its manager's
name - T ? DEPARTMENT MGRSSNSSN EMPLOYEE
- RESULT ? P DNAME,FNAME,LNAME(T)
72NATURAL JOIN ()
- In an EQUIJOIN R ? R1 c R2, the join
attribute of R2 appear redundantly in the result
relation R. - In a NATURAL JOIN, the redundant join attributes
of R2 are eliminated from R. - The equality condition is implied and need not
be specified. - R ? R1 (join attributes of R1),(join attributes
of R2) R2
73Example
- Retrieve each EMPLOYEE's name and the name of the
DEPARTMENT he/she works for - T ? EMPLOYEE (DNO),(DNUMBER) DEPARTMENT
- RESULT ? P FNAME,LNAME,DNAME(T)
74NATURAL JOIN ()
- If the join attributes have the same names in
both relations, - they need not be specified and
- we can write R ? R1 R2.
75Example
- Retrieve each EMPLOYEE's name and the name of
his/her SUPERVISOR - SUPERVISOR(SUPERSSN,SFN,SLN) ?
- P SSN,FNAME,LNAME(EMPLOYEE)
- T ? EMPLOYEE SUPERVISOR
- RESULT ? P FNAME,LNAME,SFN,SLN(T)
76Note
- In the original definition of NATURAL JOIN, the
join attributes were required to have the same
names in both relations.
77Note
- There can be a more than one set of join
attributes with a different meaning between the
same two relations.
78 For example
- JOIN ATTRIBUTES
- EMPLOYEE.SSN DEPARTMENT.MGRSSN
- EMPLOYEE.DNO DEPARTMENT.DNUMBER
- RELATIONSHIP
- EMPLOYEE manages the DEPARTMENT
- EMPLOYEE works for the DEPARTMENT
79Example
- Retrieve each EMPLOYEE's name and the name of the
DEPARTMENT he/she works for - T ? EMPLOYEE DNODNUMBER DEPARTMENT
- RESULT ? P FNAME,LNAME,DNAME(T)
80Recursive closure operation
- A relation can have a set of join attributes to
join it with itself
- RELATIONSHIP
- EMPLOYEE(2) supervises EMPLOYEE(1)
- JOIN ATTRIBUTES
- EMPLOYEE(1).SUPERSSN EMPLOYEE(2).SSN
81Recursive closure operation
- One can think of this as joining two distinct
copies of the relation, although only one
relation actually exists - In this case, renaming can be useful
82Example
- Retrieve each EMPLOYEE's name and the name of
his/her SUPERVISOR - SUPERVISOR(SSSN,SFN,SLN) ? P
SSN,FNAME,LNAME(EMPLOYEE) - T ? EMPLOYEE SUPERSSNSSSN SUPERVISOR
- RESULT? P FNAME,LNAME,SFN,SLN(T)
83Complete Set of Relational Algebra Operations
- All the operations discussed so far can be
described as a sequence of only the operations
SELECT, PROJECT, UNION, SET DIFFERENCE, and
CARTESIAN PRODUCT. - Hence, the set s ,P , ?, ? , ? is called a
complete set of relational algebra operations. - Any query language equivalent to these
operations is called relationally complete.
84Additional Operations
- For database applications, additional operations
are needed that were not part of the original
relational algebra. - These include
- Aggregate functions and grouping.
- OUTER JOIN and OUTER UNION.
85Division Operation
- Example
- Retrieve the names of employees who work on all
the projects that John Smith work on. - SMITH ? s FNAMEJohn AND LNAMESmith(EMPLOYEE)
- SMITH_PNOS ? p PNO(WORK_ON ESSNSSN SMITH)
- SSN_PNOS ? p PNO,ESSN(WORK_ON)
- SSNS(SSN) ? SSN_PNOS SMITH_PNOS
- RESULT ? p FNAME,LNAME(SSNS EMPLOYEE)
865.4 Additional Relational Operations
- Aggregate functions and grouping.
- OUTER JOIN and OUTER UNION
87AGGREGATE FUNCTIONS
- Functions such as SUM, COUNT, AVERAGE, MIN, MAX
are often applied to sets of values or sets of
tuples in database applicationsltgrouping
attributesgt Fltfunction listgt (R) - The grouping attributes are optional
88AGGREGATE FUNCTIONS
- Example 1
- Retrieve the average salary of all employees (no
grouping needed) - R(AVGSAL) ? F AVERAGE SALARY (EMPLOYEE)
89AGGREGATE FUNCTIONS
- Example 2
- For each department, retrieve the department
number, the number of employees, and the average
salary (in the department) - R(DNO,NUMEMPS,AVGSAL) ? DNO F COUNT SSN,
AVERAGE SALARY (EMPLOYEE) - DNO is called the grouping attribute in the
above example
90INSERT FIGURE 7.16
91OUTER JOIN
- In a regular EQUIJOIN or NATURAL JOIN operation,
tuples in R1 or R2 that do not have matching
tuples in the other relation do not appear in the
result - Some queries require all tuples in R1 (or R2 or
both) to appear in the result - When no matching tuples are found, nulls are
placed for the missing attributes
92OUTER JOIN
- LEFT OUTER JOIN
- R1 R2 lets every tuple in R1 appear in
the result - RIGHT OUTER JOIN
- R1 R2 lets every tuple in R2 appear in
the result - FULL OUTER JOIN
- R1 R2 lets every tuple in R1 or R2
appear in the result
93The left outer join operator
94Example of Queries (1)
- Query 1
- Retrieve the name and address of all employees
who work for the Research department.
95Answers of queries (1)
- RESEARCH_DEPT ? sDNAMEResearch(DEPARTMENT)
- RESEARCH_DEPT_EMPS ? (RESEARCH_DEPT
DNUMBERDNO EMPLOYEE) - RESULT ? pFNAME,LNAME,ADDRESS(RESEARCH_DEPT_EMPS)
96Example of Queries (2)
- Query 2
- For every project located in Stafford, list the
project number, the controlling department
number, and the department manages last name,
address, and birthdate.
97Answers of queries (2)
- STAFFFORD_PROJS ?
- sPLOCATIONSTAFFORD (PROJECT)
- CONTR_DEPT ? (STAFFORD_PROJS DNUMDNUMBER
DEPARTMENT) - PROJ_DEPT_MGR ?
- (CONTR_DEPT MGRSSNSSN EMPLOYEE)
- RESULT ? pPNUMBER,DNUM,LNAME,ADDRESS,BDATE(PROJ_DE
PT_MGR)
98Example of Queries (3)
- Query 3
- Find the names of employees who work on all the
projects controlled by department number 5.
99Answers of queries (3)
- DEPT5_PROJS(PNO) ? pPNUMBER(sDNUM5(PROJECT))
- EMP_PROJ(SSN, PNO) ? pESSN,PNO(WORKS_ON)
- RESULT_EMP_SSNS ? EMP_PROJ ? DEPT5_PROJS
- RESULT ? pLNAME,FNAME(RESULT_EMP_SSNSEMPLOYEE)
100Example of Queries (4)
- Query 4
- Make a list of project numbers for projects that
involve an employee whose last name is Smith,
either as a worker or a manager of the department
that controls the project.
101Answers of queries (4)
- SMITHS(ESSN) ? pPUNMBER(sLNAMESmith(EMPLOYEE))
- SMITH_WORKER_PROJS ? pPNO(WORKS_ONSMITHS)
- MGRS ?
- pLNAME,DNUMBER(EMPLOYEE SSNMGRSSNDEPARTMENT)
- SMITH_MGRS ? sLNAMESmith(MGRS)
- SMITH_MANAGED_DEPTS(DNUM) ? pDNUMBER(SMITH_MGRS)
- SMITH_MGR_PROJS(PNO) ? pPNUMBER(SMITH_MANAGED_DEPT
SPROJECT) - RESULT ? (SMITH_WORKER_PROJS?SMITH_MGR_PROJS)
102Example of Queries (5)
- Query 5
- List the names of all employees with two or more
dependents.
103Answers of queries (5)
- T1?(SSN, NO_OF_DEPS)
- ? ESSNFCOUNT DEPENDENT_NAME(DEPENDENT)
- T2 ? sNO_OF_DEPS?2(T1)
- RESULT ? pLNAME,FNAME(T2EMPLOYEE)
104Example of Queries (6)
- Query 6
- Retrieve the names of employees who have no
dependents.
105Answers of queries (6)
- ALL_EMPS ? pSSN(EMPLOYEE)
- EMPS_WITH_DEPS(SSN) ? pESSN(DEPENDENT)
- EMPS_WITHOUT_DEPS ?
- (ALL_EMPS ? EMPS_WITH_DEPS)
- RESULT ?
- pLNAME,FNAME(EMPS_WITHOUT_DEPS EMPLOYEE)
106Example of Queries (7)
- Query 7
- List the names of managers who have at least one
dependent.
107Answers of queries (7)
- MGRS(SSN) ? pMGRSSN(DEPARTMENT)
- EMPS_WITH_DEPS(SSN) ? pESSN(DEPENDENT)
- MGRS_WITH_DEPS ? (MGRS ? EMPS_WITH_DEPS)
- RESULT ?
- pLNAME,FNAME(MGRS_WITH_DEPS EMPLOYEE)