Title: Relational Model
1Relational Model Algebra
- Zachary G. Ives
- University of Pennsylvania
- CIS 550 Database Information Systems
- October 31, 2013
Some slide content courtesy of Susan Davidson
Raghu Ramakrishnan
2Recall Our Initial Discussion
- There are a variety of ways of representing data,
each with trade-offs - Free text often need a human
- Shapes/points in space
-
- Objects with properties
- In general, our emphasis will be on the last item
- though there are spatial databases, OO
databases, text databases, and the like
3The Relational Data Model (1970)
- Lessons from the Codd paper
- Lets separate physical implementation from
logical - Model the data independently from how it will be
used (accessed, printed, etc.) - Describe the data minimally and mathematically
- A relation describes an association between data
items tuples with attributes - We generally think of tables and rows, but thats
somewhat imprecise - Use standard mathematical (logical) operations
over the data these are the relational algebra
or relational calculus - How does this model relate to objects,
properties? What are its abilities and
limitations?
4Why Did It Take So Many Years to Implement
Relational Databases?
- Codds original work 1969-70
- Earliest relational database research 1976
- Oracle 2.0 1979
- Why the gap?
- You could do the same thing in other ways
- Nobody wants to write math formulas
- Why would I turn my data into tables?
- It wont perform well
- What do you think?
5Getting More Concrete Buildinga Database and
Application
- Start with a conceptual model
- On paper using certain techniques well discuss
next week - We ignore low-level details focus on logical
representation - Design implement schema
- Design and codify (in SQL) the relations/tables
- Do physical layout indexes, etc.
- Import the data
- Write applications using DBMS and other tools
- Many of the hard problems are taken care of by
other people (DBMS, API writers, library authors,
web server, etc.)
6Conceptual Design for CIS Student Course Survey
Whos taking what, and what grade do they
expect?
PROFESSOR
fid
name
This design is independent ofthe final form of
the report!
Teaches
Takes
STUDENT
COURSE
cid
name
semester
sid
name
exp-grade
7Example Schema
STUDENT
COURSE
Takes
- Our focus now relational schema set of tables
- Can have other kinds of schemas XML, object,
cid subj sem
550-0109 DB F09
520-1009 AI S09
501-0109 Arch F09
sid name
1 Jill
2 Qun
3 Nitin
sid exp-grade cid
1 A 550-0109
1 A 520-1009
3 C 500-0109
PROFESSOR
Teaches
fid name
1 Ives
2 Taskar
8 Martin
fid cid
1 550-0109
2 520-1009
8 501-0109
8Some Terminology
- Columns of a relation are called attributes or
fields - The number of these columns is the arity of the
relation - The rows of a relation are called tuples
- Each attribute has values taken from a domain,
e.g., subj has domain string - Theoretically a relation is a set of tuples no
tuple can occur more than once - Real systems may allow duplicates for efficiency
or other reasons well ignore this for now - Objects and XML may also have the same content
with different identity
9Describing Relations
- A schema can be represented many ways
- In relational DBs, we use relation(attributedomai
n) - To the DBMS, use data definition language (DDL)
like programming language type definitions
STUDENT(sidint, namestring) Takes(sidint,
exp-gradechar2, cidstring) COURSE(cidstring,
subjstring, semchar3) Teaches(fidint,
cidstring) PROFESSOR(fidint, namestring)
10More on Attribute Domains
- Relational DBMSs have very limited built-in
domains either tables or scalar attributes
int, string, byte sequence, date, etc. - But more generally
- We can have nested relations
- Object-oriented, object-relational systems allow
complex, user-defined domains lists, classes,
etc. - XML systems allow for XML trees (or lists of
trees) that follow certain structural constraints - Database people, when they are discussing design,
often assume domains are evident to the
readerSTUDENT(sid, name)
11Integrity Constraints
- Domains and schemas are one form of constraint on
a valid data instance - Other important constraints include
- Key constraints
- Subset of fields that uniquely identifies a
tuple, and for which no subset of the key has
this property - May have several candidate keys one is chosen as
the primary key - A superkey is a subset of fields that includes a
key - Inclusion dependencies (referential integrity
constraints) - A field in one relation may refer to a tuple in
another relation by including its key - The referenced tuple must exist in the other
relation for the database instance to be valid
12SQL Structured Query Language
- The standard language for relational data
- Invented by folks at IBM, esp. Don Chamberlin
- Actually not a particularly elegant language
- Beat a more elegant competing standard, QUEL,
from Berkeley - Separated into a DML (data manipulation language)
DDL - DML based on relational algebra (mostly)
calculus, which we discuss this week - Later well see how its embedded in a host
language
13Table DefinitionSQL-92 DDL and Constraints
CREATE TABLE Takes (sid INTEGER, exp-grade
CHAR(2), cid STRING(8), PRIMARY KEY (sid,
cid), FOREIGN KEY (sid) REFERENCES
STUDENT, FOREIGN KEY (cid) REFERENCES COURSE
)
CREATE TABLE STUDENT (sid INTEGER, name
CHAR(20), )
14Example Data Instance
STUDENT
COURSE
Takes
cid subj sem
550-0109 DB F09
520-1009 AI S09
501-0109 Arch F09
sid name
1 Jill
2 Qun
3 Nitin
sid exp-grade cid
1 A 550-0109
1 A 520-1009
3 C 501-0109
PROFESSOR
Teaches
fid cid
1 550-0109
2 700-1009
8 501-0109
fid name
1 Ives
2 Taskar
8 Martin
15From Tables ? SQL ? Web Application
- lthtmlgt
- ltbodygt
- lt!-- hypotheticalEmbeddedSQL
- SELECT FROM STUDENT, Takes, COURSE
- WHERE STUDENT.sid Takes.sID
- AND Takes.cID cid
- --gt
- lt/bodygt
- lt/htmlgt
C -gt machine code sequence -gt microprocessor Java
-gt bytecode sequence -gt JVM SQL -gt relational
algebra expression -gt query execution engine
16Codds Relational Algebra
- A set of mathematical operators that compose,
modify, and combine tuples within different
relations - Relational algebra operations operate on
relations and produce relations (closure) - f Relation ? Relation f Relation x Relation ?
Relation
17Codds Logical Operations The Relational Algebra
- Six basic operations
- Projection ?? (R)
- Selection ?? (R)
- Union R1 R2
- Difference R1 R2
- Product R1 R2
- (Rename) ???b (R)
- And some other useful ones
- Join R1 ?? R2
- Semijoin R1 ?? R2
- Intersection R1 Å R2
- Division R1 R2
18Data Instance for Operator Examples
STUDENT
COURSE
Takes
cid subj sem
550-0109 DB F09
520-1009 AI S09
501-0109 Arch F09
sid name
1 Jill
2 Qun
3 Nitin
4 Marty
sid exp-grade cid
1 A 550-0109
1 A 520-1009
3 A 520-1009
3 C 501-0109
4 C 501-0109
PROFESSOR
Teaches
fid name
1 Ives
2 Taskar
8 Martin
fid cid
1 550-0109
2 520-1009
8 501-0109
19Projection, ??
20Selection, ??
21Product X
22Join, ?? A Combination of Productand Selection
23Union ?
24Difference
25Rename, ?a?b
- The rename operator can be expressed several
ways - The book has a very odd definition thats not
algebraic - An alternate definition
- ?a?b(x) Takes the relation with schema
? Returns a relation with the attribute list ? - Rename isnt all that useful, except if you join
a relation with itself - Why would it be useful here?
26Mini-Quiz
- This completes the basic operations of the
relational algebra. We shall soon find out in
what sense this is an adequate set of operations.
- Try writing queries for these
- The names of students named Bob
- The names of students expecting an A
- The names of students in Milo Martins 501 class
- The sids and names of students not enrolled
27Deriving Intersection
- Intersection as with set operations, derivable
from difference
A Å B
(A B) (A B) (B A) (A B) (B A)
A-B
B-A
A B
28Division
- A somewhat messy operation that can be expressed
in terms of the operations we have already
defined - Used to express queries such as The fid's of
faculty who have taught all subjects - Paraphrased The fids of professors for which
there does not exist a subject that they havent
taught
29Division R1 R2
- Requirement schema(R1) ¾ schema(R2)
- Result schema schema(R1) schema(R2)
- Professors who have taught all courses
- What about Courses that have been taught by all
faculty?
?fid (?fid,subj(Teaches ? COURSE) ?subj(COURSE))
30Division Using Our Existing Operators
- All possible teaching assignments Allpairs
- NotTaught, all (fid,subj) pairs for which
professor fid has not taught subj - Answer is all faculty not in NotTaught
?fid,subj (PROFESSOR ?subj(COURSE))
Allpairs - ?fid,subj(Teaches ? COURSE)
?fid(PROFESSOR) - ?fid(NotTaught)
?fid(PROFESSOR) - ?fid(
?fid,subj (PROFESSOR ?subj(COURSE)) -
?fid,subj(Teaches ? COURSE))
31The Big Picture SQL to Algebra toQuery Plan to
Web Page
Web Server / UI / etc
Query Plan anoperator tree
Execution Engine
Optimizer
Storage Subsystem
SELECT FROM STUDENT, Takes, COURSE
WHERE STUDENT.sid Takes.sID AND
Takes.cID cid
32Hint of Future Things OptimizationIs Based on
Algebraic Equivalences
- Relational algebra has laws of commutativity,
associativity, etc. that imply certain
expressions are equivalent in semantics - They may be different in cost of evaluation!
?c Ç d(R) ?c(R) ?d(R)
?c (R1 R2) R1 ?c R2
?c Ç d (R) ?c (?d (R))
- Query optimization finds the most efficient
representation to evaluate (or one thats not bad)
33Next Time An Equivalent, ButVery Different,
Formalism
- Codd invented a relational calculus that he
proved was equivalent in expressiveness - Based on a subset of first-order logic
declarative, without an implicit order of
evaluation - More convenient for describing certain things,
and for certain kinds of manipulations - And, in fact, the basis of SQL!