CSC228H File Structures and Data Management - PowerPoint PPT Presentation

1 / 46
About This Presentation
Title:

CSC228H File Structures and Data Management

Description:

cartoon characters, let's say that there are two entities in this ... Cartoon(cartoon_id:string, name:string, network:string, country:string) ... – PowerPoint PPT presentation

Number of Views:133
Avg rating:3.0/5.0
Slides: 47
Provided by: sciencela
Category:

less

Transcript and Presenter's Notes

Title: CSC228H File Structures and Data Management


1
CSC228H - File Structures and Data Management
The Relational model
  • Proposed in 1970 by E.F. Codd in A Relational
    Model of Data for
  • Large Shared Data Banks, Communications of the
    ACM, Vol. 13,
  • No. 6, June 1970, pp. 377-387
  • It was intended to form the basis for a high
    level data language that
  • would yield maximum independence between the
    organization and
  • representation of data, and the programs that
    use and manipulate
  • this data.
  • Additionally, it provides a basis for checking
    consistency, and for
  • eliminating redundancy within the data

2
CSC228H - File Structures and Data Management
The Relational model
  • There are several components in this model
  • - A set of relations (commonly referred to as
    tables)
  • - A set of operations that act upon and
    manipulate relations
  • - A set of rules for maintaining integrity
    within the database
  • A relation (which is the relational model
    equivalent of an entity
  • in Entity-Relationship diagrams) is a table
    with rows and columns,
  • it has two components a relation schema, and a
    relation instance
  • The schema defines the structure of the
    relation, it describes the
  • relations name, and the names and domains of
    each column
  • (attribute, or field) in the table.

3
CSC228H - File Structures and Data Management
The Relational model
  • As an example, consider a database that stores
    information about
  • cartoon characters, lets say that there are
    two entities in this
  • database, Cartoon, and Character, which can be
    represented
  • in an E-R diagram as follows

Cartoon
Character
cartoon_id name network country
c_name shows
has
appears in
  • Assume for simplicity that the characters names
    are unique, and that
  • each character can appear in only one cartoon
    (hence the one to
  • many relationship between them).

4
CSC228H - File Structures and Data Management
The Relational model
  • From this diagram we can obtain the relation
    schema for the
  • Cartoon entity
  • Cartoon(cartoon_idstring, namestring,
    networkstring, countrystring)
  • The schema specifies what type of data will be
    stored in each
  • column (in this case, all are strings).
  • An instance of this relation would be a table in
    which each row
  • has the same number of fields as the relation
    schema, and fields
  • have the same domains. Rows are referred to as
    tuples.

5
CSC228H - File Structures and Data Management
The Relational model
  • A particular instance of the Cartoon relation
    would look like
  • An instance of a relation must have the
    following properties
  • - Attributes are atomic, they do not represent
    a group of values or an array
  • - Column values are of the same kind
  • - Each row is unique
  • - The order of columns is unimportant
  • - The order of rows is unimportant
  • - Each column must have a unique name

6
CSC228H - File Structures and Data Management
The Relational model
  • Another requirement is that there must be an
    attribute or group of
  • attributes that uniquely identify any row
    within a table. Such a group
  • of attributes is called the primary key of the
    relation
  • Every row in the table must contain a primary
    key (it can not be NULL)
  • If we have parent-child relationships between
    our entities, we must
  • include the primary key of the parent entity in
    all its children within
  • each child, the primary key of the parent
    becomes a foreign key.
  • Foreign keys are used to cross-reference values
    across different
  • relations, and provide a way of checking and
    enforcing integrity
  • within the database.

7
CSC228H - File Structures and Data Management
The Relational model
  • In our previous example, notice that the
    Character entity is a child
  • of the Cartoon entity (the relationship between
    Character and
  • Cartoon defines this).

Cartoon
Character
cartoon_id name network country
c_name shows
has
appears in
  • Notice that from the attributes of both
    entities, there is no way to
  • determine which character appears in which
    cartoon.
  • We have to add the primary key of Cartoon to our
    Character entity
  • as a foreign key.

8
CSC228H - File Structures and Data Management
The Relational model
  • Hence we have the following situation

Cartoon
Character
cartoon_id name network country
cartoon_id c_name shows
has
appears in
  • We can declare the relation schema for the
    Character relation as
  • Now it is possible to determine which characters
    appear in which
  • cartoon by matching the value of the cartoon_id
    attribute between the
  • two relations.

Character(carton_id string, c_name string,
shows string)
9
CSC228H - File Structures and Data Management
The Relational model
  • Our tables look like this now, we can
    cross-reference values between
  • both by matching cartoon_id

Now we can determine that Bart Simpson, Homer
Simpson, and Lisa Simpson appear in The Simpsons
cartoon
10
CSC228H - File Structures and Data Management
The Relational model
  • Notice that c_name is still the primary key of
    the Character relation
  • This is not the only possible outcome of
    introducing a foreign key into
  • a relation
  • - It is possible for the foreign key to become
    part of the primary key
  • for the child relation
  • - It is also possible for the foreign key to
    become the primary key
  • for the child relation
  • The particular situation that will occur depends
    on the structure of the
  • entities, relationships, and attributes in the
    database

11
CSC228H - File Structures and Data Management
The Relational model
  • There is another situation that requires
    especial attention, the relational
  • model does not allow the direct specification
    of many to many
  • relationships, if such relationships are found
    within our E-R diagram,
  • they must be resolved by introducing a new,
    intermediate entity.
  • Suppose that we change our E-R diagram slightly,
    so that characters
  • can now appear in many different cartoons.

Cartoon
Character
cartoon_id name network country
cartoon_id c_name shows
has
appears in
12
CSC228H - File Structures and Data Management
The Relational model
  • We must resolve the situation by converting the
    relationship
  • appears in into an entity called Appearance.
    And specifying the
  • relationships between this new entity and the
    original ones.

Character
Cartoon
cartoon_id name network country
c_name shows
Notice that the Character entity no
longer contains the cartoon_id field as a
foreign key! This is because Cartoon and
Character and no longer directly related (the
parent- child relationship does not exist).
Instead, the primary keys of both Cartoon and
Character are added as foreign keys to
the Appearance entity, and become part of its
primary key.
Makes an
Has an
Appearance
cartoon_id c_name
13
CSC228H - File Structures and Data Management
The Relational model
  • A table is not allowed to exist unrelated
    (otherwise it would not be
  • possible to navigate between tables and
    cross-reference values).
  • The only exception is the case in which we have
    only one entity,
  • and thus a one-table database.
  • We define the cardinality of a table as the
    number of rows contained
  • in it.
  • The degree, or arity of a relation is the number
    of attributes it has.
  • A relational database is a group of relations
    with different names,
  • the schema of the database is the set of
    schemas for the relations
  • it contains.

14
CSC228H - File Structures and Data Management
The Relational model
  • An instance of a relational database is a
    collection of tables, one
  • for each relation that is part of the database.

Relational database instance
Relational database schema
Cartoon(cartoon_idstring, namestring,
networkstring, countrystring)
Character(carton_id string, c_name string,
creator string)
15
CSC228H - File Structures and Data Management
The Relational model
  • The relational model defines integrity
    constraints that ensure the
  • consistency of data in a database, there are
    two integrity rules that
  • must be enforced
  • - Entity Integrity Specifies that for every
    instance of an entity, a
  • primary key must be defined, be unique, and
    cannot be null.
  • - Referential Integrity Specifies that for
    every foreign key value,
  • the value must match a primary key value in
    an associated table.
  • This constraint ensures that we can correctly
    cross-reference
  • information across different tables

16
CSC228H - File Structures and Data Management
The Relational model
  • Additionally, it is common practice that a DBMS
    will enforce attribute
  • integrity, thus ensuring that every column
    inside each table of the
  • database contains values that are appropriate
    to the relevant attribute,
  • this means validating the following
  • - Data Type
  • - Length
  • - Date Format
  • - Range
  • - Constraints
  • - Null Support
  • - Default Value
  • These constraints must be enforced when the
    contents of the database
  • are changed (i.e. during insertion, deletion,
    or updating of data)

17
CSC228H - File Structures and Data Management
The relational model
  • Finally, it is important to remark that a
    relationship between 2 tables is
  • made by matching the values of the primary key
    in one, and the
  • corresponding secondary key in the other. We
    can cross-reference many
  • tables in this way.
  • The next slide shows an example of how primary
    and foreign keys are
  • used to cross-reference information across
    several tables to answer
  • a query. In this case, we want to know which
    publisher(s) print books
  • written by Ernest Hemingway

18
CSC228H - File Structures and Data Management
The relational model
Book
Author
Notice that the primary key for Author_title is
the combination of BOTH foreign keys, and that
this table was created to resolve the
many-to-many relationship between books and
authors.
Publisher
Author_title
19
CSC228H - File Structures and Data Management
Query languages
  • We will now focus on the data manipulation
    component of the relational
  • model. Query languages are design to allow the
    user to ask questions
  • about the contents of the database. There are
    two main formal query
  • languages associated with the relational model
    Relational Algebra,
  • and Relational Calculus.
  • Relational Algebra is based on a set of
    operators that perform
  • specific functions on the contents of a table
    or group of tables, queries
  • expressed using relational algebra consist of a
    series of steps that
  • constitute the procedure for computing the
    answer to the query.

20
CSC228H - File Structures and Data Management
Query languages (continued)
  • Relational Calculus expresses queries as
    mathematical formulas that
  • describe the answer to the queries, but not the
    way in which the answer
  • is computed. It will not be discussed here.

Note
  • Queries operate on tables. The input to a query
    is a set of instances
  • of relations, and its output is an instance of
    a relation (the
  • schema of the output relation is determined by
    the query itself)

21
CSC228H - File Structures and Data Management
Relational Algebra
  • Expresses queries as a sequence of operators.
    Each of these
  • operators takes as input one or two relation
    instances, and produces
  • a new relation instance as the result.
  • A relational algebra expression is defined
    recursively as
  • - a relation
  • - a unary operator applied to a single
    expression, or
  • - a binary operator applied to two expressions
  • A relational algebra expression describes the
    procedure that is followed
  • to compute the answer to a query, based on the
    order in which operators
  • are evaluated within the query.

22
CSC228H - File Structures and Data Management
Relational Algebra
  • The relational algebra operators are
  • - Selection (?)
  • - Projection (?)
  • - Union (?)
  • - Intersection (?)
  • - Set-difference (-)
  • - Cross-product (?)
  • - Renaming (?)
  • - Join ( )
  • - Division (/)

23
CSC228H - File Structures and Data Management
Relational algebra
We will use the following relation instances
A1
B1
X1
P1
24
CSC228H - File Structures and Data Management
Selection
  • The selection operator (?) is used to pick
    particular rows from a table
  • base on the value of one or more of its
    attributes, for example, we
  • can select the records for all books with
    price less than 15.00 dollars
  • by using the following expression
    ?price

B1
The selection operator produces as output a new
relation with the same attributes as B1, but
which contains only those rows that satisfy
the selection criteria
?price
Q1
25
CSC228H - File Structures and Data Management
Projection
  • The projection operator (?) picks particular
    attributes from a table
  • and produces a new relation containing only the
    specified attributes,
  • we could use it so that the result of our
    previous query contains
  • only the name of the book and its price
    ?title,price(?price

Q1
Q2
Notice that the projection operator was applied
to the relation instance that results from the
selection operator
?title,price(?price
26
CSC228H - File Structures and Data Management
Set Operations
  • Relational algebra includes 4 set operations,
    the first 3 (union,
  • intersection, and difference) require that the
    relations they are
  • operating on be union-compatible
  • Two relations are union-compatible if they have
    the same number
  • of fields, and if corresponding fields (taken
    from left to right in the
  • the schema definitions) have the same domain.
  • For union, intersection, and difference, the
    resulting relation will have
  • the same fields as the first input relation.

27
CSC228H - File Structures and Data Management
Set operations
B1
B2
  • The union operator (?) takes two relation
    instances and produces a
  • new table that contains the rows that appear
    in either of them.

(B1) ? (B2)
28
CSC228H - File Structures and Data Management
Set operations
  • The intersection operator (?) produces a table
    that contains the rows
  • that appear in both of its input tables.

(B1) ? (B2)
  • The expression (R) - (S) produces a table that
    contains the rows
  • that appear in R but not in S.

(B1) - (B2)
29
CSC228H - File Structures and Data Management
Set operations
  • The expression (R)?(S) produces a relation that
    contains all the fields
  • in R, all the fields in S, and one row for
    each possible combination of
  • a row in R with a row in S. Each row in
    (R)?(S) is the concatenation
  • of a row from R with a row from S.
  • Assume we have the following relations

C1
F1
30
CSC228H - File Structures and Data Management
Set operations
  • The expression (F1)?(C1) would produce the
    following relation

Q3
31
CSC228H - File Structures and Data Management
Set operations
  • The cross-product does not require the relations
    to be
  • union-compatible
  • It is possible that the input tables to the
    cross-product may contain
  • attributes with the same name (for example, if
    we perform the cross
  • product of a table with itself), this creates a
    naming conflict. In such
  • situations the resulting fields in the output
    relation are in parenthesis,
  • and are referred to by their position within
    the table.

32
CSC228H - File Structures and Data Management
Renaming
  • The renaming operator (?) is used to resolve
    naming conflicts, and
  • to give names to tables that are the result of
    other relational algebra
  • expressions, the format of this operator is
    ?(R(F),E)
  • The operator takes input table E (which is the
    result of the evaulation
  • of some relational algebra expression), and
    produces an output
  • table called R, with the same rows of E, and
    the same fields, except
  • that some fields have been renamed as
    specified in the renaming list
  • F.
  • The renaming list contains entries of the type
    oldname newname, or
  • position newname.

33
CSC228H - File Structures and Data Management
Renaming
  • ?(Training_Plan(2 a_name,instructor prof,
    ), (F1)?(C1) ) yields

Training_Plan
Notice the name of the output relation, the
second fields name is changed to a_name, and the
attribute previously named instructor, is renamed
to prof
34
CSC228H - File Structures and Data Management
Joins
  • Are the most common way to combine information
    from different
  • tables, you can think of a join as a
    cross-product followed by one
  • or many selection operations and projections,
    however in practice,
  • the full cross product is rarely computed. The
    selection operations
  • are applied as the answer to the query is being
    generated.
  • Conditional Joins are defined as follows
  • R c S ?c(R?S)
  • The selection condition may apply to columns of
    both R and S,
  • fields are referenced using the notation
    Table.attribute, or
  • Table.position.

35
CSC228H - File Structures and Data Management
Conditional Join example
  • F1 C1.instructorJim Phelps C1, would yield

Q4
  • Compare this with the complete cross-product of
    F1 and C1, also,
  • note that it is not necessary to compute the
    complete cross product
  • to arrive at this answer. The appropriate
    attributes in F1 and C1 are
  • evaluated against the join conditions, and
    only those rows that
  • meet the conditions are combined.

36
CSC228H - File Structures and Data Management
Equijoin
  • Is a particular form of join, in which the
    selection conditions are of
  • the form R.nameS.name, that is, a join that
    involves only rows that
  • have matching values between a field in R and a
    field in S. Notice that
  • since the values of R.name and S.name are the
    same, one of these
  • fields is redundant. Hence, the equijoin
    operation includes an additional
  • projection operation to remove the column
    S.name from the output
  • relation.

37
CSC228H - File Structures and Data Management
Equijoin example
B1
  • B1 B1.publ_idP1.publ_id P1

P1
Q5
Notice that the duplicated publ_id field has been
dropped from the resulting relation.
38
CSC228H - File Structures and Data Management
Natural Join
  • Is a special case of equijoin in which equality
    is required on all
  • fields that have the same name between R and S.
    The previous
  • slide is in fact an example of a natural join,
    since publ_id is the
  • only field that B1 and P1have in common.
  • If the tables have no matching field names, the
    result of the natural
  • join is simply the cross-product of the two.

39
CSC228H - File Structures and Data Management
Division
  • The division operator takes two relation
    instances, one that has exactly
  • 2 fields, and the other with exactly one field.
    It then builds a relation
  • containing all the values in the first column
    of the first table that match
  • in the second column ALL values of the second
    relation.

40
CSC228H - File Structures and Data Management
Division example
Completed
Given these relation instances, one which lists
the courses that have been completed by each
agent, and 4 which list the courses that make up
a given level of training, we can use the
division operator to find out which agents have
completed which training levels.
Completed / Level_100 Completed /
Level_200 Completed / Level_300 Completed
/ Advanced
Level_100
Level_300
Level_200
Advanced
41
CSC228H - File Structures and Data Management
Relational Algebra examples
A1
B1
X1
P1
42
CSC228H - File Structures and Data Management
Relational Algebra examples
  • For all books in the database, get the name,
    price, and publisher.

?title, price, p_name(B1 B1.publ_idP1.publ_id
P1)
43
CSC228H - File Structures and Data Management
  • Get the title and price of all books written by
    Ernest Hemingway.

X1
Lets do this one by steps, first, notice that
the author data and the book data are not
directly related, instead we have an auxiliary
table called X1. The first step in out process is
to join X1 and A1 to get all the book_id values
for Hemingways books. There are several ways to
do this, for this example, lets use the
following
A1
B1
?(Temp_1,((?last_nameHemingway,
first_nameErnestA1) X1))
Temp_1
44
CSC228H - File Structures and Data Management
  • Get the title and price of all books written by
    Ernest Hemingway (cont.)

Temp_1
Now we must join that with B1 matching the
book_id, and project the fields we want into the
final output.
?title,price(B1 Temp_1)
B1
Result
45
CSC228H - File Structures and Data Management
  • Get the author, title, publisher, and price of
    all books that are less than
  • 15.00

Lets start by joining P1 and B1, and selecting
those books that have a price of less than 15.00
B1
?(Temp_2, (?price
P1
Temp_2
46
CSC228H - File Structures and Data Management
  • Get the author, title, publisher, and price of
    all books that are less than
  • 15.00 (cont).

Now, we have to match books to authors
through X1, so we join Temp_2 with X1, and the
result of that we join with A1. Finally, we
project the attributes we are interested in.
Temp_2
?last_name,first_name,title,p_name, price(A1
(X1 Temp_2))
X1
A1
Result
Write a Comment
User Comments (0)
About PowerShow.com