Title: CS 405G: Introduction to Database Systems
1CS 405G Introduction to Database Systems
- Lecture 4 Relational Model
- Instructor Chen Qian
2Review
- A data model is
- a group of concepts for describing data.
- What are the two terms used by ER model to
describe a miniworld? - Entity
- Relationship
- What makes a good conceptual database design
?
10/22/2015
2
2
10/22/2015
3Todays Outline
- Relational Model
- Relational Model and Relational Database Schemas
- Informal definition, not so formal, and formal
- Relational Model Constraints
4Why Study the Relational Model?
- Most widely used model.
- Legacy systems in older models
- e.g., IBMs IMS
- Object-oriented concepts merged in
- Object-Relational model
- Early work done in POSTGRES research project at
Berkeley - XML features in most relational systems
- Can export XML interfaces
- Can embed XML inside relational fields
5Historically
- The model was first proposed by Dr. E.F. Codd of
IBM in 1970 in the following paper"A Relational
Model for Large Shared Data Banks,"
Communications of the ACM, June 1970. - The above paper caused a major revolution in the
field of Database management and earned Ted Codd
the coveted ACM Turing Award.
The picture is from wikipedia
6Database Design
7Relational Model Concepts
- Relational database a set of relations.
- Relation made up of 2 parts
- Schema specifies name of relation, plus the
name and type of each attribute. - E.g. Students(sid string, name string, login
string, age integer, gpa real) - Instance a table, with rows and columns.
- rows cardinality
- fields degree / arity
8Relation
- RELATION A table of values
- A relation may be thought of as a set of rows
(table view). - Each row represents a fact that corresponds to a
real-world entity or relationship. - Each row has a value of an item or set of items
that uniquely identifies that row in the table.
9Relation
- Sometimes row-ids or sequential numbers are
assigned to identify the rows in the table. - A relation may alternately be thought of as a set
of columns (schema view). - Each column typically is called by its column
name or column header or attribute name.
10A (Slightly) Formal Definition
- A database is a collection of relations (or
tables) - Each relation is identified by a name and a list
of attributes (or columns) - Each attribute has a name and a domain (or type)
- Such as SIDstring
- Set-valued attributes not allowed
- Simplicity is a virtue!
11Schemas
- Relation schema relation name attributes
types of attributes - in order
- Example Beers(name, manf) or Beers(name string,
manf string) - Database collection of relations.
- Database schema set of all relation schemas in
the database.
12Schema versus instance
- Schema (metadata)
- Students(sid string, name string, login
string, age integer, gpa real) - Specification of how data is to be structured
logically - Defined at set-up Rarely changes
- Instance
- Content
- Changes rapidly, but always conforms to the
schema - Compare to type and objects of type in a
programming language - Entity and entity type?
13Example
- Schema
- Student (SID integer, name string, age integer,
GPA float) - Course (CID string, title string)
- Enroll (SID integer, CID integer)
- Instance
- 142, Amy, 20, 3.3, 123, Bob, 22, 3.1, ...
- CS405G, Intro. to Database Systems, ...
- 142, CS405G, 142, CS314, ...
14Formal Definition (Set Theory)
- Formally, given sets D1, D2, . Dn a relation r
is a subset of D1 x D2 x x Dn - x Cartesian product
- For sets A and B, the Cartesian product A B is
the set of all ordered pairs (a, b) where a ? A
and b ? B. - Thus, a relation is a set of n-tuples (a1, a2, ,
an) where each ai ? Di
15Example
- Example If
- customer_name Jones, Smith, Curry, Lindsay,
- customer_street Main, North, Park,
- customer_city Harrison, Rye, Pittsfield,
- Then r (Jones, Main, Harrison), (Smith,
North, Rye), (Curry, North, Rye),
(Lindsay, Park, Pittsfield) is a relation
over - customer_name customer_street customer_city
16Attribute Types
- Each attribute of a relation has a name,
designating the role of the attribute - The set of allowed values for each attribute is
called the domain of the attribute - Attribute values (domain members) are required to
be atomic that is, indivisible - E.g. the value of an attribute can be an account
number, but cannot be a set of account numbers - Domain is said to be atomic if all its members
are atomic - The special value null is a member of every
domain
17Relation Schema
- A1, A2, , An are attributes
- R (A1, A2, , An ) is a relation schema
- Example
- Customer_schema (customer_name,
customer_street, customer_city) - r(R) denotes a relation r on the relation schema
R - Example
- customer (Customer_schema)
18Relation Instance
- The current values (relation instance) of a
relation are specified by a table - An element t of r is a tuple, represented by a
row in a table
attributes (or columns)
customer_name
customer_street
customer_city
Jones Smith Curry Lindsay
Main North North Park
Harrison Rye Rye Pittsfield
tuples (or rows)
customer
19Definition Summary
Informal Terms Formal Terms
Table Relation
Column Attribute/Domain
Row Tuple
Values in a column Domain
Table Definition Schema of a Relation
Populated Table Extension
20Characteristics of Relation
- The tuples in a ration r(R) are not considered to
be ordered, even though they appear to be in the
tabular form. - We consider the attributes in R(A1, A2, ..., An)
and the values in tltv1, v2, ..., vngt to be
ordered . -
- All values are considered atomic (indivisible).
A special null value is used to represent values
that are unknown or inapplicable to certain
tuples.
21Characteristics of Relation
- Notation we refer to component values of a tuple
t by tAi vi (the value of attribute Ai for
tuple t). -
- Similarly, tAu, Av, ..., Aw refers to the
subtuple of t containing the values of attributes
Au, Av, ..., Aw, respectively.
22Relational Integrity Constraints
- Integrity Constraints are conditions that must
hold on all valid relation instances. - There are four main types of constraints
- Domain constraints
- The value of a attribute must come from its
domain - Key constraints
- Entity integrity constraints
- Referential integrity constraints
23Primary Key Constraints
- A set of fields is a candidate key (abbreviated
as key) for a relation if - 1. No two distinct tuples can have same values in
all key fields, and - 2. Property 1 is not true for any subset of the
key. - What if Part 2 is false? A super key a set of
fields that contains a key. - If there are multiple keys for a relation, one of
the keys is chosen (by DBA) to be the primary key.
24Key Example
- E.g., given a schema Student(sid string, name
string, gpa float) we have - sid is a key for Students. (What about name?)
The set sid, gpa is a superkey. - CAR (licence_num string, Engine_serial_num
string, make string, model string, year
integer) - What is the candidate key(s)
- Which one you may use as a primary key
- What are the super keys
25Entity Integrity
- Entity Integrity The primary key attributes (PK)
of each relation schema R cannot have null values
in any tuple of r(R). - Other attributes of R may be similarly
constrained to disallow null values, even though
they are not members of the primary key.
26Foreign Keys, Referential Integrity
- Foreign key Set of fields in one relation that
is used to refer to a tuple in another
relation. (Must correspond to primary key of the
second relation.) Like a logical pointer. - Foreign key constraint The foreign key in the
referencing relation must match the primary key
of the referenced relation. - E.g. sid is a foreign key referring to Students
- Student(sid string, name string, gpa float)
- Enrolled(sid string, cid string, grade string)
- If all foreign key constraints are enforced,
referential integrity is achieved, i.e., no
dangling references.
27Foreign Key constraints
- Only students listed in the Students relation
should be allowed to enroll for courses.
Enrolled
Students
- Possible violation Add lt50000, History105, Bgt to
Enrolled. - Possible violation delete lt53650, Smith, gt from
Students.
28Other Types of Constraints
- Semantic Integrity Constraints
- based on application semantics and cannot be
expressed by the model per se - e.g., the max. no. of hours per employee for all
projects he or she works on is 56 hrs per week - A constraint specification language may have to
be used to express these - SQL-99 allows triggers and ASSERTIONS to allow
for some of these
29Next class
- Logical design of databases using the relational
model.
30