Title: CSC228H File Structures and Data Management
1CSC228H - File Structures and Data Management
The Relational model
- Proposed in 1970 by E.F. Codd in A Relational
Model of Data for - Large Shared Data Banks, Communications of the
ACM, Vol. 13, - No. 6, June 1970, pp. 377-387
- It was intended to form the basis for a high
level data language that - would yield maximum independence between the
organization and - representation of data, and the programs that
use and manipulate - this data.
- Additionally, it provides a basis for checking
consistency, and for - eliminating redundancy within the data
2CSC228H - File Structures and Data Management
The Relational model
- There are several components in this model
- - A set of relations (commonly referred to as
tables) - - A set of operations that act upon and
manipulate relations - - A set of rules for maintaining integrity
within the database - A relation (which is the relational model
equivalent of an entity - in Entity-Relationship diagrams) is a table
with rows and columns, - it has two components a relation schema, and a
relation instance - The schema defines the structure of the
relation, it describes the - relations name, and the names and domains of
each column - (attribute, or field) in the table.
3CSC228H - File Structures and Data Management
The Relational model
- As an example, consider a database that stores
information about - cartoon characters, lets say that there are
two entities in this - database, Cartoon, and Character, which can be
represented - in an E-R diagram as follows
Cartoon
Character
cartoon_id name network country
c_name shows
has
appears in
- Assume for simplicity that the characters names
are unique, and that - each character can appear in only one cartoon
(hence the one to - many relationship between them).
4CSC228H - File Structures and Data Management
The Relational model
- From this diagram we can obtain the relation
schema for the - Cartoon entity
- Cartoon(cartoon_idstring, namestring,
networkstring, countrystring) - The schema specifies what type of data will be
stored in each - column (in this case, all are strings).
- An instance of this relation would be a table in
which each row - has the same number of fields as the relation
schema, and fields - have the same domains. Rows are referred to as
tuples.
5CSC228H - File Structures and Data Management
The Relational model
- A particular instance of the Cartoon relation
would look like
- An instance of a relation must have the
following properties - - Attributes are atomic, they do not represent
a group of values or an array - - Column values are of the same kind
- - Each row is unique
- - The order of columns is unimportant
- - The order of rows is unimportant
- - Each column must have a unique name
6CSC228H - File Structures and Data Management
The Relational model
- Another requirement is that there must be an
attribute or group of - attributes that uniquely identify any row
within a table. Such a group - of attributes is called the primary key of the
relation - Every row in the table must contain a primary
key (it can not be NULL) - If we have parent-child relationships between
our entities, we must - include the primary key of the parent entity in
all its children within - each child, the primary key of the parent
becomes a foreign key. - Foreign keys are used to cross-reference values
across different - relations, and provide a way of checking and
enforcing integrity - within the database.
7CSC228H - File Structures and Data Management
The Relational model
- In our previous example, notice that the
Character entity is a child - of the Cartoon entity (the relationship between
Character and - Cartoon defines this).
Cartoon
Character
cartoon_id name network country
c_name shows
has
appears in
- Notice that from the attributes of both
entities, there is no way to - determine which character appears in which
cartoon. - We have to add the primary key of Cartoon to our
Character entity - as a foreign key.
8CSC228H - File Structures and Data Management
The Relational model
- Hence we have the following situation
Cartoon
Character
cartoon_id name network country
cartoon_id c_name shows
has
appears in
- We can declare the relation schema for the
Character relation as - Now it is possible to determine which characters
appear in which - cartoon by matching the value of the cartoon_id
attribute between the - two relations.
Character(carton_id string, c_name string,
shows string)
9CSC228H - File Structures and Data Management
The Relational model
- Our tables look like this now, we can
cross-reference values between - both by matching cartoon_id
Now we can determine that Bart Simpson, Homer
Simpson, and Lisa Simpson appear in The Simpsons
cartoon
10CSC228H - File Structures and Data Management
The Relational model
- Notice that c_name is still the primary key of
the Character relation - This is not the only possible outcome of
introducing a foreign key into - a relation
- - It is possible for the foreign key to become
part of the primary key - for the child relation
- - It is also possible for the foreign key to
become the primary key - for the child relation
- The particular situation that will occur depends
on the structure of the - entities, relationships, and attributes in the
database
11CSC228H - File Structures and Data Management
The Relational model
- There is another situation that requires
especial attention, the relational - model does not allow the direct specification
of many to many - relationships, if such relationships are found
within our E-R diagram, - they must be resolved by introducing a new,
intermediate entity. - Suppose that we change our E-R diagram slightly,
so that characters - can now appear in many different cartoons.
Cartoon
Character
cartoon_id name network country
cartoon_id c_name shows
has
appears in
12CSC228H - File Structures and Data Management
The Relational model
- We must resolve the situation by converting the
relationship - appears in into an entity called Appearance.
And specifying the - relationships between this new entity and the
original ones.
Character
Cartoon
cartoon_id name network country
c_name shows
Notice that the Character entity no
longer contains the cartoon_id field as a
foreign key! This is because Cartoon and
Character and no longer directly related (the
parent- child relationship does not exist).
Instead, the primary keys of both Cartoon and
Character are added as foreign keys to
the Appearance entity, and become part of its
primary key.
Makes an
Has an
Appearance
cartoon_id c_name
13CSC228H - File Structures and Data Management
The Relational model
- A table is not allowed to exist unrelated
(otherwise it would not be - possible to navigate between tables and
cross-reference values). - The only exception is the case in which we have
only one entity, - and thus a one-table database.
- We define the cardinality of a table as the
number of rows contained - in it.
- The degree, or arity of a relation is the number
of attributes it has. - A relational database is a group of relations
with different names, - the schema of the database is the set of
schemas for the relations - it contains.
14CSC228H - File Structures and Data Management
The Relational model
- An instance of a relational database is a
collection of tables, one - for each relation that is part of the database.
Relational database instance
Relational database schema
Cartoon(cartoon_idstring, namestring,
networkstring, countrystring)
Character(carton_id string, c_name string,
creator string)
15CSC228H - File Structures and Data Management
The Relational model
- The relational model defines integrity
constraints that ensure the - consistency of data in a database, there are
two integrity rules that - must be enforced
- - Entity Integrity Specifies that for every
instance of an entity, a - primary key must be defined, be unique, and
cannot be null. - - Referential Integrity Specifies that for
every foreign key value, - the value must match a primary key value in
an associated table. - This constraint ensures that we can correctly
cross-reference - information across different tables
16CSC228H - File Structures and Data Management
The Relational model
- Additionally, it is common practice that a DBMS
will enforce attribute - integrity, thus ensuring that every column
inside each table of the - database contains values that are appropriate
to the relevant attribute, - this means validating the following
-
- - Data Type
- - Length
- - Date Format
- - Range
- - Constraints
- - Null Support
- - Default Value
- These constraints must be enforced when the
contents of the database - are changed (i.e. during insertion, deletion,
or updating of data)
17CSC228H - File Structures and Data Management
The relational model
- Finally, it is important to remark that a
relationship between 2 tables is - made by matching the values of the primary key
in one, and the - corresponding secondary key in the other. We
can cross-reference many - tables in this way.
- The next slide shows an example of how primary
and foreign keys are - used to cross-reference information across
several tables to answer - a query. In this case, we want to know which
publisher(s) print books - written by Ernest Hemingway
18CSC228H - File Structures and Data Management
The relational model
Book
Author
Notice that the primary key for Author_title is
the combination of BOTH foreign keys, and that
this table was created to resolve the
many-to-many relationship between books and
authors.
Publisher
Author_title
19CSC228H - File Structures and Data Management
Query languages
- We will now focus on the data manipulation
component of the relational - model. Query languages are design to allow the
user to ask questions - about the contents of the database. There are
two main formal query - languages associated with the relational model
Relational Algebra, - and Relational Calculus.
- Relational Algebra is based on a set of
operators that perform - specific functions on the contents of a table
or group of tables, queries - expressed using relational algebra consist of a
series of steps that - constitute the procedure for computing the
answer to the query.
20CSC228H - File Structures and Data Management
Query languages (continued)
- Relational Calculus expresses queries as
mathematical formulas that - describe the answer to the queries, but not the
way in which the answer - is computed. It will not be discussed here.
Note
- Queries operate on tables. The input to a query
is a set of instances - of relations, and its output is an instance of
a relation (the - schema of the output relation is determined by
the query itself)
21CSC228H - File Structures and Data Management
Relational Algebra
- Expresses queries as a sequence of operators.
Each of these - operators takes as input one or two relation
instances, and produces - a new relation instance as the result.
- A relational algebra expression is defined
recursively as - - a relation
- - a unary operator applied to a single
expression, or - - a binary operator applied to two expressions
- A relational algebra expression describes the
procedure that is followed - to compute the answer to a query, based on the
order in which operators - are evaluated within the query.
22CSC228H - File Structures and Data Management
Relational Algebra
- The relational algebra operators are
- - Selection (?)
- - Projection (?)
- - Union (?)
- - Intersection (?)
- - Set-difference (-)
- - Cross-product (?)
- - Renaming (?)
- - Join ( )
- - Division (/)
23CSC228H - File Structures and Data Management
Relational algebra
We will use the following relation instances
A1
B1
X1
P1
24CSC228H - File Structures and Data Management
Selection
- The selection operator (?) is used to pick
particular rows from a table - base on the value of one or more of its
attributes, for example, we - can select the records for all books with
price less than 15.00 dollars - by using the following expression
?price
B1
The selection operator produces as output a new
relation with the same attributes as B1, but
which contains only those rows that satisfy
the selection criteria
?price
Q1
25CSC228H - File Structures and Data Management
Projection
- The projection operator (?) picks particular
attributes from a table - and produces a new relation containing only the
specified attributes, - we could use it so that the result of our
previous query contains - only the name of the book and its price
?title,price(?price
Q1
Q2
Notice that the projection operator was applied
to the relation instance that results from the
selection operator
?title,price(?price
26CSC228H - File Structures and Data Management
Set Operations
- Relational algebra includes 4 set operations,
the first 3 (union, - intersection, and difference) require that the
relations they are - operating on be union-compatible
- Two relations are union-compatible if they have
the same number - of fields, and if corresponding fields (taken
from left to right in the - the schema definitions) have the same domain.
- For union, intersection, and difference, the
resulting relation will have - the same fields as the first input relation.
27CSC228H - File Structures and Data Management
Set operations
B1
B2
- The union operator (?) takes two relation
instances and produces a - new table that contains the rows that appear
in either of them.
(B1) ? (B2)
28CSC228H - File Structures and Data Management
Set operations
- The intersection operator (?) produces a table
that contains the rows - that appear in both of its input tables.
(B1) ? (B2)
- The expression (R) - (S) produces a table that
contains the rows - that appear in R but not in S.
(B1) - (B2)
29CSC228H - File Structures and Data Management
Set operations
- The expression (R)?(S) produces a relation that
contains all the fields - in R, all the fields in S, and one row for
each possible combination of - a row in R with a row in S. Each row in
(R)?(S) is the concatenation - of a row from R with a row from S.
- Assume we have the following relations
C1
F1
30CSC228H - File Structures and Data Management
Set operations
- The expression (F1)?(C1) would produce the
following relation
Q3
31CSC228H - File Structures and Data Management
Set operations
- The cross-product does not require the relations
to be - union-compatible
- It is possible that the input tables to the
cross-product may contain - attributes with the same name (for example, if
we perform the cross - product of a table with itself), this creates a
naming conflict. In such - situations the resulting fields in the output
relation are in parenthesis, - and are referred to by their position within
the table.
32CSC228H - File Structures and Data Management
Renaming
- The renaming operator (?) is used to resolve
naming conflicts, and - to give names to tables that are the result of
other relational algebra - expressions, the format of this operator is
?(R(F),E) - The operator takes input table E (which is the
result of the evaulation - of some relational algebra expression), and
produces an output - table called R, with the same rows of E, and
the same fields, except - that some fields have been renamed as
specified in the renaming list - F.
- The renaming list contains entries of the type
oldname newname, or - position newname.
33CSC228H - File Structures and Data Management
Renaming
- ?(Training_Plan(2 a_name,instructor prof,
), (F1)?(C1) ) yields
Training_Plan
Notice the name of the output relation, the
second fields name is changed to a_name, and the
attribute previously named instructor, is renamed
to prof
34CSC228H - File Structures and Data Management
Joins
- Are the most common way to combine information
from different - tables, you can think of a join as a
cross-product followed by one - or many selection operations and projections,
however in practice, - the full cross product is rarely computed. The
selection operations - are applied as the answer to the query is being
generated. - Conditional Joins are defined as follows
- R c S ?c(R?S)
- The selection condition may apply to columns of
both R and S, - fields are referenced using the notation
Table.attribute, or - Table.position.
35CSC228H - File Structures and Data Management
Conditional Join example
- F1 C1.instructorJim Phelps C1, would yield
Q4
- Compare this with the complete cross-product of
F1 and C1, also, - note that it is not necessary to compute the
complete cross product - to arrive at this answer. The appropriate
attributes in F1 and C1 are - evaluated against the join conditions, and
only those rows that - meet the conditions are combined.
36CSC228H - File Structures and Data Management
Equijoin
- Is a particular form of join, in which the
selection conditions are of - the form R.nameS.name, that is, a join that
involves only rows that - have matching values between a field in R and a
field in S. Notice that - since the values of R.name and S.name are the
same, one of these - fields is redundant. Hence, the equijoin
operation includes an additional - projection operation to remove the column
S.name from the output - relation.
37CSC228H - File Structures and Data Management
Equijoin example
B1
- B1 B1.publ_idP1.publ_id P1
P1
Q5
Notice that the duplicated publ_id field has been
dropped from the resulting relation.
38CSC228H - File Structures and Data Management
Natural Join
- Is a special case of equijoin in which equality
is required on all - fields that have the same name between R and S.
The previous - slide is in fact an example of a natural join,
since publ_id is the - only field that B1 and P1have in common.
- If the tables have no matching field names, the
result of the natural - join is simply the cross-product of the two.
39CSC228H - File Structures and Data Management
Division
- The division operator takes two relation
instances, one that has exactly - 2 fields, and the other with exactly one field.
It then builds a relation - containing all the values in the first column
of the first table that match - in the second column ALL values of the second
relation.
40CSC228H - File Structures and Data Management
Division example
Completed
Given these relation instances, one which lists
the courses that have been completed by each
agent, and 4 which list the courses that make up
a given level of training, we can use the
division operator to find out which agents have
completed which training levels.
Completed / Level_100 Completed /
Level_200 Completed / Level_300 Completed
/ Advanced
Level_100
Level_300
Level_200
Advanced
41CSC228H - File Structures and Data Management
Relational Algebra examples
A1
B1
X1
P1
42CSC228H - File Structures and Data Management
Relational Algebra examples
- For all books in the database, get the name,
price, and publisher.
?title, price, p_name(B1 B1.publ_idP1.publ_id
P1)
43CSC228H - File Structures and Data Management
- Get the title and price of all books written by
Ernest Hemingway.
X1
Lets do this one by steps, first, notice that
the author data and the book data are not
directly related, instead we have an auxiliary
table called X1. The first step in out process is
to join X1 and A1 to get all the book_id values
for Hemingways books. There are several ways to
do this, for this example, lets use the
following
A1
B1
?(Temp_1,((?last_nameHemingway,
first_nameErnestA1) X1))
Temp_1
44CSC228H - File Structures and Data Management
- Get the title and price of all books written by
Ernest Hemingway (cont.)
Temp_1
Now we must join that with B1 matching the
book_id, and project the fields we want into the
final output.
?title,price(B1 Temp_1)
B1
Result
45CSC228H - File Structures and Data Management
- Get the author, title, publisher, and price of
all books that are less than - 15.00
Lets start by joining P1 and B1, and selecting
those books that have a price of less than 15.00
B1
?(Temp_2, (?price
P1
Temp_2
46CSC228H - File Structures and Data Management
- Get the author, title, publisher, and price of
all books that are less than - 15.00 (cont).
Now, we have to match books to authors
through X1, so we join Temp_2 with X1, and the
result of that we join with A1. Finally, we
project the attributes we are interested in.
Temp_2
?last_name,first_name,title,p_name, price(A1
(X1 Temp_2))
X1
A1
Result