Title: Database Theory
1Database Theory
2Outline
- Basic Concepts
- Database and Database Users (Chapter 1)
- Database System Concepts and Architecture
(Chapter 2) - Database Design
- Database Design Process (Chapter 16)
- Entity-Relationship (ER) Modeling (Chapter3)
- Functional Dependencies and Normalization for
Relational Database (Chapter 14) - Relational Design Algorithms (Chapter 15)
- Relational Data Model Mapping (Chapter 9)
- Relational Database
- The Relational Data Model (Chapter 7)
- Relational Algebra (Chapter 7)
- SQL A Relational Database Language (Chapter 8)
- Relational Calculus (Chapter 9)
- Database Implementation
- Transaction Processing (Chapter 19)
- Concurrency Control (Chapter 20)
- Database Recovery (Chapter 21)
- Advanced Topics
3Database and Database Users
- Basic Concepts
- Main Characteristics of Database Technology
- Classes of Database Users
- Additional Database Characteristics
- When not to use a DBMS
4Basic Concepts
- Database
- A collection of related data.
- Data
- Known facts that can be recorded and have
implicit meaning. - Mini-world
- Some part of the real world about which data is
stored in database. - Database Management System (DBMS)
- A software package to facilitate the creation and
maintenance of a computerized database. - Database System
- The DBMS software together with the data itself.
5Main Characteristics of Database Technology
- Self-contained nature of a database system
- A DBMS catalog stores the description
(meta-data) of the database. This allows the DBMS
software to work with different databases. - Insulations between program and data
- Data abstractions
- A data model is used to hide storage details and
present the user with a conceptual view of the
database. - Program-data independence
- Allows changing data storage structures without
having to change the DBMS access programs. - Program-operation independence
- Allows changing operation implementation without
having to change the DBMS access programs. - Support of multiple views of data
6Additional Characteristics of Database Technology
- Controlling data redundancy
- Restricting unauthorized access to data.
- Providing persistent storage for program objects
and data structure. - Providing multiple interfaces to different
classes of users. - Representing complex relationships among data.
- Enforcing integrity constraints on the database.
- Providing backup and recovery services.
- Potential for enforcing standards.
- Flexibility to change data structures.
- Reduced application development time.
- Availability of up-to-date information.
- Economies of scale.
7Classes of Database Users
- Workers on the scene Persons whose job involves
daily use of a large database. - Database administrators (DBAs) Responsible for
managing the database system. - Database designers Responsible for designing
the database. - End users Access the database for querying ,
updating , generating reports, etc. - Casual end users Occasional users.
- Parametric (or naive) end users They use
pre-programmed canned transactions to interact
continuously with the database. For example, bank
tellers or reservation clerks. - Sophisticated end users Use full DBMS
capabilities for implementing complex
applications. - System Analysts/Application programmers Design
and implement canned transactions for Parametric
users. - Workers behind the scene Persons whose job
involves design , development , operation, and
maintenance of the DBMS software and system
environment. - DBMS designers and implementers Design and
implement the DBMS software package itself. - Tool developers Design and implement tools that
facilitate the use of DBMS software. Tools
include design tools , performance tools ,
special interfaces , etc. - Operators and maintenance personnel Work on
running and maintaining the hardware and software
environment for the database system.
8When not to Use a DBMS
- Main costs of using a DBMS
- High initial investment and possible need for
additional hardware. - Overhead for providing generality, security,
recovery, integrity, and concurrency control. - When DBMS may be unnecessary
- If the database and applications are simple, well
defined and not expected to change. - If there are stringent real-time requirements
that may not be met because of the DBMS overhead - If access to data by multiple users is not
required.
9Database System Concepts and Architecture
- Data Models
- Three-Schema Architecture
- Data Independence
- DBMS Languages
- DBMS Interfaces
- DBMS Architecture
- Database System Utilities
- Classification of DBMS
10Data Models
- Data Model
- A set of concepts to describe the structure of a
database, and certain constraints that the
database should obey. - Data Model Operations
- Operations for specifying database retrievals and
updates by referring to the concepts of data
model. - Categories of data models
- Conceptual (high-level, semantic) data models
Provide concepts that are close to the way many
users perceive data. (Also called entity-based or
object-based data models) - Physical (low-level, internal) data models
Provide concepts that describe the details of how
data is stored in the computer. - Implementation (record-oriented) data models
Provide concepts that fall between above two,
balancing user views with some computer storage
details.
11Data Models
- Database Schema
- The description of database. Includes description
of database structure and the constraints that
should hold on the database. - Database catalog
- Stores database schema.
- Schema Diagram
- A diagrammatic display of ( some aspects of) a
database schema. - Database Instance
- The actual data stored in a database at a
particular moment in time. Also called database
state (or occurrence) - The database schema changes very infrequently.
The database state changes every time the
database is updated. Schema is also called
intension, whereas the state is called extension.
12Three Schema Architecture
- Internal schema at the internal level to describe
data storage structures and access paths.
Typically uses a physical data model. - Conceptual schema at the conceptual level to
describe the structure and constraints for the
whole database. Uses a conceptual or an
implementation data model . - External schemas at the external level to
describe the various user views. Usually uses the
same data model as the conceptual level. - Mappings transform requests and results between
levels.
13Database System Architecture
External View
External View
External Level
external/conceptual mapping
Conceptual Schema
Conceptual Level
conceptual/internal Mapping
Internal Schema
Internal Level
Stored Databases
14Data Independence
- Logical Data Independence
- The capacity to change the conceptual schema
without having to change the external schemas and
their application programs. - Physical Data Independence
- The capacity to change the internal schema
without having to change the conceptual schema. - When a schema at a lower level is changed, only
the mappings between this schema and higher level
schemas need to be changed in a DBMS that fully
supports data independence. - Mappings create overhead
15Database System Languages
- Data Definition Language (DDL)
- Used by the DBA and database designers to
specify the conceptual schema of a database. In
many DBMSs, the DDL is also used to define
internal and external schemas(views). In some
DBMSs, separate storage definition language (SDL)
and view definition language (VDL) are used to
define internal and external schemas. - Data Manipulation Language (DML)
- Used to specify database retrievals and updates.
- High-level (nonprocedural) DML can be used on its
own to specify database operations. - Low-level (procedural) DML retrieves a record at
a time and must be embedded in a general-purpose
programming language. - When DML is embedded in a general-purpose
programming language (host language), it is
called data sublanguage. - When DML is used in a stand-alone interactive
manner, it is called query language
16DBMS Interfaces
- Stand-alone query language interfaces.
- Programmer interfaces for embedding DML in
programming languages - Pre-compiler Approach
- Procedure (Subroutine) Call Approach
- Menu-based
- Graphic-based
- Forms-based
- Natural language
- Combination of above
- Parametric interfaces using function keys
- Report generation languages
- Interfaces for DBA
- Creating accounts, granting authorizations
- Setting system parameters
- Changing schemas or access path
17Database System Utilities
- Loading data stored in files into a database.
- Backing up the database periodically on tape.
- Reorganizing database file structures.
- Generating Report.
- Performance monitoring.
- Sorting files.
- User monitoring.
- Data compression.
18Classification of DBMSs
- Based on the data model used
- Relational
- Multidimensional
- Network
- Hierarchical.
- Object-oriented
- Semantic
- Entity-Relationship
- Other Classifications
- Single-user vs. multi-user
- Centralized vs. distributed
- Homogeneous vs. Heterogeneous
- OLTP vs. OLAP
19Database Design
- Goals of Database Design
- Satisfy the information content requirements of
the specified users and applications - Provide natural and easy-to-understand
structuring of information - Support processing requirements and any
performance objectives - Database Design Process
- Requirement collection and analysis
- Conceptual database design
- Choice of DBMS
- Data model mapping (Logical database design)
- Physical database design
- Database system implementation and tuning
20Requirement Collection and Analysis
- The major application areas and user groups that
will use the database or whose work will be
affected by it are identified. Key individuals
and committees within each group are chosen to
carry out subsequent steps of requirement
analysis - Existing documentations concerning the
applications is studied and analyzed. - The current operating environment and planned use
of the information is studied. - Written responses to sets of questions are
sometimes collected from potential users or user
groups. Key individuals may be interviewed to
help in assessing the worth of information and in
setting up of priorities.
21Conceptual Database Design
- Conceptual Schema Design
- Choice of high-level conceptual data model such
as ER model and dimensional model - Approaches to conceptual schema design
- centralized schema design approach
- view integration approach
- Strategies for conceptual schema design
- top-down strategy
- bottom-up strategy
- inside-out strategy
- mixed strategy
- Transaction Design
22Physical Database Design
- Criteria for guiding the physical database design
- Response time
- Space utilization
- Transaction throughput
- Physical database design in relational database
- Factors that influent the physical database
design - Analyzing the database queries and transactions
- Analyzing the expected frequencies of invocation
of queries and transactions - Analyzing the time constraints for queries and
transactions - Analyzing the expected frequencies of update
operations - Analyzing the uniqueness constraints on
attributes - Physical database design decisions
- Indexing
- De-normalization
- Storage design
23Database Tuning in Relational Database
- Goals
- Make application run fast
- lower the response time of queries and
transactions - improve the overall throughput of transactions
- Tuning indexes
- Some queries may take too long for lack of an
index - Some indexes may not get utilized
- Some indexes may causing excessive overhead
- Tuning database design
- De-normalization
- Table partition
- Duplicate attributes
- Tuning queries
24Automated Design Tools
- Database Design Tools
- Erwin
- Rational Rose
- Power Designer
- Schema Diagram Notation
- UML (Unified Modeling Language)
- IDEF1X (Integration Definition for Information
Modeling) - IE (Information Engineering)
- CHEN's ERD Notation
25Entity-Relationship (ER) Modeling
- Example Database Application (COMPANY)
- ER Model Concepts
- Entities and Attributes
- Entity Types, Value Sets, and Key Attributes
- Relationships and Relationship Types
- Structural Constraints and Roles
- Weak Entity Types
- ER Diagrams Notation
- Relationships of Higher Degree
- Enhanced ER Modeling
26Example of COMPANY Database
- Requirements for the COMPANY Database
- The company is organized into departments. Each
department has a name, number, and a employee who
manages the department. We keep track of the
start date of the department manager. A
department may have several locations. - Each department controls a number of projects.
Each project has a name, number, and is located
at a single location. - We store each employee's social security number,
address, salary, sex and birth date. Each
employee works for one department but may work on
several projects. We keep track of the number of
hours per week that an employee currently works
on each project. We also keep track of the direct
supervisor of each employee. - each employee may have a number of dependents.
For each dependent, we keep their name, sex,
birth date, and relationship to the employee.
27ER Model Concepts Entities and Attributes
- Entities
- Entities are specific objects or things in the
mini-world that are represented in the database
for example, the EMPLOYEE John Smith, the
Research DEPARTMENT, the ProductX PROJECT. - Attributes
- Attributes are properties used to describe an
entity for example, an EMPLOYEE entity may have
a Name, SSN, Address, Sex, BirthDate. A specific
entity will have a value for each of its
attributes for example a specific employee
entity may have Name 'John Smith', SSN
'123456789', Address '731 Fondren , Houston,
TX', Sex 'M', BirthDate '09-JAN-55'. - Attribute Types
- Simple each entity has a single atomic value for
the attribute for example SSN or Sex. - Composite Attribute may be composed of several
components for example Name(FirstName,
MiddleName, LastName). Composition may form a
hierarchy where some components are themselves
composite. - Multi-Valued An entity may have multiple values
for that attribute for example Color of a CAR or
PreviousDegrees of a STUDENT. Denoted as Color
or PreviousDegrees.
28ER Model Concept Entity Types and Key Attributes
- Entity Type
- Entity type defines a set of entities that have
the same attributes. For example, the EMPLOYEE
entity type or the PROJECT entity type. - Key Attribute
- An attribute of an entity type for which each
entity must have a unique value is called a key
attribute of the entity type. For example, SSN of
EMPLOYEE. - A key attribute may be composite. For example,
VehicleRegistrationNumber is a key of the CAR
entity type with components(Number, State). - An entity type may have more than one key. For
example, the CAR entity type may have two keys
VehicleIdentificationNumber and
VehicleRegistrationNumber(Number, State). - Domains (Value Sets) of Attributes
- Each simple attribute of an entity type is
associates with a domain, which specifies the set
of values the may be assigned to that attribute
for each individual entity.
29ER Model Concepts Relationships and Relationship
Types
- Relationship
- A relationship relates two or more distinct
entities with a specific meaning for example,
EMPLOYEE John Smith works on the ProductX PROJECT
or EMPLOYEE Franklin Wong manages the Research
DEPARTMENT. - Relationship Type
- Relationship of the same type are grouped or
typed into a relationship type. For example, the
WORKS_ON relationship type in which EMPLOYEEs and
PROJECTs participate, or the MANAGEs relationship
type in which EMPLOYEEs and DEPARTMENTs
participate. - More than one relationship type can exist with
the same participating entity types for example,
MANAGES and WORKS_FOR are distinct relationships
between EMPLOYEE and DEPARTMENT participate. - The degree of a relationship type
- The degree of a relationship type is the number
of participating entity types. binary
relationships, ternary relationship, n-ary
relationship
30ER Model Concepts Structural Constraints and
roles
- A relationship can relate two entities of the
same entity type for example, a SUPERVISION
relationship type relates one EMPLOYEE ( in the
role of supervisee) to another EMPLOYEE ( in the
role of supervisor). This is called a recursive
relationship type. - A relationship type can have attributes for
example, HoursPerWeek of WORKS_ON its value for
each relationship instance describes the number
of hours per week that an EMPLOYEE works on a
PROJECT. - Structural constraints on relationships
- Cardinality ratio ( of a binary relationship)
11, 1N, N1, or MN. - Participation constraint (on each participating
entity type) total (called existence dependency)
or partial.
31ER Model Concepts Weak Entity Types
- An entity type that does not have a key attribute
- A weak entity type must participate in an
identifying relationship type with an owner or
identifying entity type - Entities are identified by the combination of a
partial key of the weak entity type and the key
of the identifying entity type. - Example
- Suppose that a DEPENDENT entity is identified by
the dependents first name and birth date, and
the specific EMPLOYEE that the dependent is
related to. DEPENDENT is a weak entity type with
EMPLOYEE as its identifying entity type via the
identifying relationship type DEPENDENT_OF.
32Conceptual Design of COMPANY Database
- Entity types
- DEPARTMENT
- PROJECT
- EMPLOYEE
- DEPENDENT
- Relationship types
- Manage (11)
- Work_for (1n)
- Supervision (1n)
- Controls (1n)
- Works_on (mn)
- Has_dependent (1n)
33ER Diagram of COMPANY Database
34Alternative Notation for Relationship Structural
Constraints
- Specified on each participation of an entity type
E in a relationship type R. - Specifies that each entity e in E participates in
at least min and at most max relationship
instances in R. - Default(no constraint) min 0, max n.
- Must have min ? max, min ? 0, max ? 1.
- Examples
- A department has exactly one manager and an
employee can manage at most one department. - Specify (1,1) for participation of DEPARTMENT in
MANAGES - Specify (0,1) for participation of EMPLOYEE in
MANAGES - An employee can work for exactly one department
but a department can have any number of
employees. - Specify (1,1) for participation of EMPLOYEE in
WORKS_FOR - Specify (0,n) for participation of DEPARTMENT in
WORKS_FOR
35ER Diagram of COMPANY Database
Works_for
0N
11
Manages
11
01
StartDate
36Enhanced Entity-Relationship and Object Modeling
- Subclass, Superclass and Inheritance
- Specialization and Generalization
- Disjoin/Overlapping
- Total/Partial
- Union/Categories
37The Relational Data Model
- Relational Model Concepts
- Characteristics of Relations
- Relational Integrity Constraints
- Domain Constraints
- Key Constraints
- Entity Integrity Constraints
- Referential Integrity Constraints
- Update Operations on Relations
38Relational Model Concepts
39Relational Model Concepts
- Relation ( informally).
- A table of values. Each column in the table has a
column header called an attribute. Each row is
called a tuple. - Formal relational concepts.
- Domain A set of atomic (indivisible) values.
- Attribute A name to suggest the meaning that a
domain plays in a particular relation. Each
attribute Ai has a domain Dom(Ai). - Relation schema A relation name R and a set of
attributes Ai that define the relation. Denoted
by R(A1, A2, ... ,an). For example student(name,
SSN, BirthDate, Addr). - Relational Database Schema A set S of relation
schemas that belong to the same database. S is
the name of the database. S R1, R2, ...,Rn. - Degree of a relation its number of attributes n.
- Tuple t of R(A1, A2,....,An) a (ordered) set of
values t lt v1, v2, ..., vngt where each value vi
is an element of Dom(Ai). Also called a n-tuple. - Relation instance r(r) A set of tuples r(r)
t1, t2,...,Tm, or alternatively r(r) ?dom(a1) ?
dom(a2) ? ... ? dom(an).
40Characteristics of Relations
- The tuples are not considered to be ordered, even
though they appear to be in the tabular form. - We will consider the attributes in R(A1, A2,
...,An) and the values in t lt v1, v2, .., vngt
to be orderd.( However, a more general
alternative definition of relation does not
require this ordering). - All values are considered atomic (indivisible). A
special null value is used to represent values
that are unknown or inapplicable to certain
tuples. - Notation
- We refer to component values of a tuple t by
tAi vi (the value of attribute Ai for tuple
t) - Similarly, tAu, Av, ..., Aw refer to the
sub-tuple of t containing the values of
attributes Au, Av, ..., Aw, respectively.
41Relational Constraints
42Relational Constraints
- Constraints are conditions that must hold on all
valid relation instances. There are three main
types of constraints - Domain Constraints
- Values of each attribute must be atomic.
- Key Constraints
- Superkey of R A set of attributes SK of R such
that no two tuples in any valid relation instance
r(R) will have the same value for SK. That is,
for any distinct tuples t1 and t2 in r(R), t1SK
? t2SK - Key (candidate key) of R A minimal superkey
that is, a superkey K such that removal of any
attribute form K results in a set of attributes
that is not a superkey. - Example The CAR relation schema
- CAR(State, Reg, SerialNo, make, Model, Year) has
two keys Key1 State, Reg, Key2 SerialNo
which are also superkeys. SerialNo, Make is a
superkey but not a key. - If a relation has several candidate keys, one is
chosen arbitrarily to be the primary key. The
primary key attributes are underlined.
43Relational Constraints
- Entity Integrity
- The primary key attributes PK of each relation
schema R in S can not have null values in any
tuple of r(R). This is because primary key values
are used to identify the individual tuples. tPK
? null for any tuple t in r(R). - Referential Integrity
- Referential integrity constraint is used to
specify a relationship among tuples in two
relations the referencing relation and
referenced relation. It involves two relations.
Tuples in the referencing relation R1 have
attributes FK (called foreign key attributes)
that reference the primary key attributes PK of
the referenced relation R2. A tuple t1 in R1 is
said to reference a tuple t2 in R2 if t1FK t2
PK. A referential integrity constraint can be
displayed in a relational database schema as a
directed arc from R1.FK to R2.
44Operations
20.00
25.00
18.00
22.00
45Update Operations on Relations
- Update Operations
- INSERT a tuple
- DELETE a tuple
- MODIFY a tuple
- Integrity constraints should not be violated by
the update operations. - Insert operation could violate any constraint.
- Delete operation could violate referential
constraints. - Modify a primary key or foreign key attribute is
equivalent to delete one tuple and insert
another. Modify other attributes cause no
problems. - Several update operations may have to be grouped
together. - Updates may propagate to cause other updates
automatically. This may be necessary to maintain
integrity constraints. - In case of integrity violation, several actions
can be taken - cancel the operation that causes the violation
- perform the operation but inform the user of
violation - trigger additional updates so the violation is
corrected - execute a user-specified error-correction routine
46Data Model Mapping
- ER-to-Relational Mapping
- EER-to-Relational Mapping
47Relational Model of COMPANY Database
48ER-to-Relational Mapping
- STEP 1 For each regular (strong) entity type E
in the ER schema, create a relation R that
includes all the simple attributes of E. Include
only the simple component attributes of a
composite attribute. Choose one of the key
attributes of E as primary key for R. If the
chosen key of E is composite, the set of simple
attributes that form it will together form the
primary key of R. - STEP 2 For each weak entity type W in the ER
schema with owner entity type E, create a
relation R, and include all simple attributes (or
simple components of composite attributes) of W
as attributes of R. In addition, include as
foreign key attributes of R the primary key
attribute(s) of the relation(s) that correspond
to the owner entity type(s) this takes care of
the identifying relationship type of W. The
primary key of R is the combination of the
primary key(s) of the owner(s) and the partial
key of the weak entity type W, if any. - STEP 3 For each binary 11 relationship type R
in the ER schema, identify the relations S and T
that correspond to the entity types participating
in R. Choose one of the relationsS, sayand
include as foreign key in S the primary key of T.
It is better to choose an entity type with total
participation in R in the role of S. Include all
the simple attributes (or simple components of
composite attributes) of the 11 relationship
type R as attributes of S. - STEP 4 For each regular binary 1N relationship
type R, identify the relation S that represents
the participating entity type at the N-side of
the relationship type. Include as foreign key in
S the primary key of the relation T that
represents the other entity type participating in
R. Include any simple attributes (or simple
components of composite attributes) of the 1N
relationship type as attributes of S.
49ER-to-Relational Mapping
- STEP 5 For each binary MN relationship type R,
create a new relation S to represent R. Include
as foreign key attributes in S the primary keys
of the relations that represent the participating
entity types their combination will form the
primary key of S. Also include any simple
attributes of the MN relationship type (or
simple components of composite attributes) as
attributes of S. Notice that we cannot represent
an MN relationship type by a single foreign key
attribute in one of the participating
relationsas we did for 11 or 1N relationship
typesbecause of the MN cardinality ratio. - STEP 6 For each multivalued attribute A, create
a new relation R. This relation R will include an
attribute corresponding to A, plus the primary
key attribute Kas a foreign key in Rof the
relation that represents the entity type or
relationship type that has A as an attribute. The
primary key of R is the combination of A and K.
If the multivalued attribute is composite, we
include its simple components. - STEP 7 For each n-ary relationship type R, where
n gt 2, create a new relation S to represent R.
Include as foreign key attributes in S the
primary keys of the relations that represent the
participating entity types. Also include any
simple attributes of the n-ary relationship type
(or simple components of composite attributes) as
attributes of S. The primary key of S is usually
a combination of all the foreign keys that
reference the relations representing the
participating entity types. However, if the
cardinality constraints on any of the entity
types E participating in R is 1, then the primary
key of S should not include the foreign key
attribute that references the relation E
corresponding to E. This concludes the mapping
procedure.
50EER-to-Relational Mapping
- STEP 8 Convert each specialization with m
subclasses S1, S2, . . ., Sm and (generalized)
superclass C, where the attributes of C are k,
a1, . . ., an and k is the (primary) key, into
relation schemas using one of the four following
options - Option 8A Create a relation L for C with
attributes Attrs(L) k, a1, . . ., an and
PK(L) k. Create a relation Li for each subclass
Si, 1 1 i 1 m, with the attributes Attrs(Li)
kD attributes of Si and PK(Li) k. - Option 8B Create a relation Li for each subclass
Si, 1 1 i 1 m, with the attributes Attrs(Li)
attributes of SiD k, a1, . . ., an and PK(Li)
k. - Option 8C Create a single relation L with
attributes Attrs(L) k, a1, . . ., an D
attributes of S1 D . . . D attributes of Sm D
t and PK(L) k. This option is for a
specialization whose subclasses are disjoint, and
t is a type (or discriminating) attribute that
indicates the subclass to which each tuple
belongs, if any. This option has the potential
for generating a large number of null values. - Option 8D Create a single relation schema L with
attributes Attrs(L) k, a1, . . ., an D
attributes of S1 D . . . D attributes of Sm D
t1, t2, . . ., tm and PK(L) k. This option is
for a specialization whose subclasses are
overlapping (but will also work for a disjoint
specialization), and each ti, 1 1 i 1 m, is a
Boolean attribute indicating whether a tuple
belongs to subclass Si.
51ER-to-Relational Mapping
ER Model Relational Model
Entity Entity Relation
11 and 1N relationship type Foreign key
MN relationship type Relationship relation and two foreign keys
N-ary relationship type Relationship relation and n foreign keys
Simple attribute Attribute
Composite attribute Set of component attributes
Multi-valued attributes Relation and foreign keys
Value set (Domain) Domain
Key attribute Primary (or candidate) key
52The Relational Algebra
- Relational algebra is a collection of operations
to manipulate relations - Query result is in the form of a relation
- Relational Operations
- SELECT ?
- PROJECT ? operations
- Sequences of operations and renaming of
attributes - Set operations
- UNION ?
- INTERSECTION ?
- DIFFERENCE ?
- CARTESIAN PRODUCT ?
- JOIN operations
- Other relational operations
- DIVISION
- OUTER JOIN
- AGGREGATE FUNCTIONS.
53Relational Operations
- SELECT operation (denoted by ?)
- Selects the tuples (rows) from a relation R that
satisfy a certain selection condition c - Form of the operation ?c(R)
- The condition c is an arbitrary Boolean
expression on the attributes of R - Resulting relation has the same attributes as R
- Resulting relation includes each tuple in r(R)
whose attribute values satisfy condition c - Examples
- ? DNO 4 (EMPLOYEE)
- ? SALARY gt 30333(EMPLOYEE)
- ? (( DNO 4 AND SALARY gt 25000 ) OR DNO 5)
(EMPLOYEE)
54Relational Operations
- Examples
- DNO 4 (EMPLOYEE)
- Jennifer S Wallace 987654321 1941-06-20
Berry, Bellaire, TX F 43000.00 888665555 4 - Ahmad V Jabbar 987987987 1969-03-29
Dallas, Huston, TX M 25000.00 987654321 4 - Alicia J Zelaya 999887777 1968-07-19
Castle, Spring, TX F 25000.00 987654321 4 - SALARY gt 30333(EMPLOYEE)
- Franklin T Wong 333445555 1955-12-08 638
Voss, Huston, TX M 40000.00 888665555 5 - Ramesh K Narayan 666884444 1962-09-15 975 Fire
Oak, Humble, TX M 38000.00 333445555 5 - James E Borg 888665555 1937-11-10 450
Stone, Huston, TX M 55000.00 null 1 - Jennifer S Wallace 987654321 1941-06-20 291
Berry, Bellaire, TX F 43000.00 888665555 4 - (( DNO 4 AND SALARY gt 25000 ) OR DNO 5)
(EMPLOYEE) - Franklin T Wong 333445555 1955-12-08 638
Voss, Huston, TX M 40000.00 888665555 5 - Ramesh K Narayan 666884444 1962-09-15 975 Fire
Oak, Humble, TX M 38000.00 333445555 5 - Jennifer S Wallace 987654321 1941-06-20 291
Berry, Bellaire, TX F 43000.00 888665555 4
55Relational Operations
- PROJECT operation(denoted by ?)
- Keeps only certain attributes (columns) from a
relation R specified in an attribute list L - Form of operation ? L(R)
- Resulting relation has only those attributes of R
specified in L - The PROJECT operation eliminates duplicate tuples
in the resulting relation so that it remains a
mathematical set ( no duplicate elements) - Example
- ? FNAME,LNAME, SALARY(EMPLOYEE)
- ? SEX, SALARY(EMPLOYEE)
56Relational Operations
- Example
- FNAME,LNAME, SALARY(EMPLOYEE)
- John Smith 30000.00
- Franklin Wong 40000.00
- Joice English 25000.00
- Ramesh Narayan 38000.00
- James Borg 55000.00
- Jennifer Wallace 43000.00
- Ahmad Jabbar 25000.00
- Alicia Zelaya 25000.00
- SEX, SALARY(EMPLOYEE)
- F 25000.00
- F 43000.00
- M 25000.00
- M 30000.00
- M 38000.00
- M 40000.00
- M 55000.00
57Relational Operations
- Sequence of operations
- Several operations can be combined to form a
relational algebra expression (query) - Example
- Retrieve the names and salaries of employees who
work in department 4. - FNAME, LNAME, SALARY(? DNO4 (EMPLOYEE))
- Jennifer Wallace 43000.00
- Ahmad Jabbar 25000.00
- Alicia Zelaya 25000.00
- Alternatively we specify explicit intermediate
relations for each step - DEPT4_EMPS ? ?DNO 4 (EMPLOYEE)
- R ? ? FNAME, LNAME, SALLRY(DEPT4_EMPS)
- Attributes can optionally be renamed in the
resulting left-hand side relation(this may be
required for some operations that will be
presented later) - DEPT4_MPS ? ? DNO 4 (EMPLOYEE)
- R(FIRSTNAME,LASTNAME,SALARY) ?
- ? FNAME,LNAME,SALARY(DEPT4_EMPS)
58Relational Operations
- Set Operations
- UNION R1 ? R2,
- INTERSECTION R1 ? R2
- SET DIFFERENCE R1 ? R2
- CARTESIAN PRODUCT R1 ? R2
- For ?, ?, ?, the operand relations R1(A1, A2,
...,An) and R2(B1, B2, ...,Bn) must have the same
number of attributes, and the domains of
corresponding attributes must be compatible that
is dom(Ai) dom(Bi) for i 1,2,..,n. This
condition is called union compatibility. - The resulting relation for ?, ? or ? has the
same attribute names as the first operand
relation R1(by convention).
59Relational Operations
- Cartesian product
- R(A1, A2,....,Am,B1,B2,...,Bn) ? R1(A1,
A2,....,Am) ? R2(B1,B2,...,Bn) - A tuple t exists in R for each combination of
tuples t1 from R1 and t2 from R2 such that tA1,
A2,....,Am t1 and tB1,B2,...,Bn t2 - If R1 has n1 tuples and R2 has n2 tuples, then R
will have n1n2 tuples - CARTESIAN PRODUCT is a meaningless operation on
its own. It can combine related tuples from two
relations if followed y the appropriate SELECT
operation. - Example
- Combine each DEPARTMENT tuple with the EMPLOYEE
tuple of the manager. - DEP_EMP ? DEPARTMENT ? EMPLOYEE
- DEPT_MANAGER ? ?MGRSSN SSN(DEP_EMP)
- James E Borg 888665555 Headquarters 1
888665555 1981-06-19 - Jennifer S Wallace 987654321 Administration 4
987654321 1995-01-01 - Franklin T Wong 333445555 Research 5
333445555 1988-05-22
60Relational Operations
- JOIN operation
- THETA JOIN
- Similar to CARTESIAN PRODUCT followed by a
SELECT. The condition c is called the join
condition. - R(A1, A2,....,Am,B1,B2,...,Bn) ? R1(A1,
A2,....,Am) c R2(B1,B2,...,Bn) - c is in the form of ltconditiongt AND ltconditiongt
AND . . . AND ltconditiongt , where each condition
is of the form Ai ? Bj, Ai is an attribute of R,
Bj is an attribute of S, Ai and Bj have the same
domain, and ? (theta) is one of the comparison
operators , lt, ?, gt, ?, ?. - EQUIJOIN
- The condition c uses only operator ''.
- The attributes appear in condition c are called
join attributes - Examples Retrieve each DEPARTMENTs name and its
managers name - T ? DEPARTMENT MGRSSN SSN EMPLOYEE
- RESULT ? ? DNAME,FNAME,LNAME (T)
61Relational Operations
- JOIN operations
- NATURAL JOIN()
- In an EQUIJOIN R ? R1 cR2, the join attributes
of R2 appear redundantly in the result relation
R. In a NATURAL JOIN, the redundant join
attributes of R2 are eliminated from R. The
equality condition is implied and need not be
specified. - R ? R1 ( join attributes of R1), (join
attributes of R2) R2 - If the join attributes have the same names in
both relations, they need not be specified and we
can write R? R1R2. - Examples
- Retrieve each EMPLOYEEs name and the name of the
DEPARTMENT he/she works for - T ? EMPLOYEE (DNO),(DNUMBER)DEPARTMENT
- RESULT ? ? FNAME, LNAME, DNAME (T)
- retrieve each EMPLOYEEs name and the name of
his/ser SUPERVISOR - SUPERVISOR (SUPERSSN,SFN,SLN) ? ? SSN,FNAME,LNAME
(EMPLOYEE) - T ? EMPLOYEESUPERVISOR
- RESULT ? ? FNAME,LNAME,SFN,SLN(T)
62Relational Operations
- Complete Set of Relational Algebra Operations
- All the operations discussed so far can be
described as a sequence of only the operations
SELECT, PROJECT, UNION, SET DIFFERENCE, and
CARTESIAN PRODUCT. - Hence, the set ?, ?, ?, -, ? is called a
complete set of relational algebra operations.
Any query language equivalent to these operations
is called relationally complete. - For database applications, additional operations
are needed that were not part of the original
relational algebra. These include - Aggregate functions and grouping
- OUTER JOIN and OUTER UNION.
63Relational Operations
- Aggregate Functions
- Functions such as SUM, COUNT, AVERAGE, MIN, MAX
are often applied to sets of values or sets of
tuples in database applications - lt grouping attributesgt ? ltfunction listgt (R)
- The grouping attributes are optional
- Example 1 retrieve the average salary of all
employees ( no grouping needed) - R(AVGSAL) ? ? AVERAGE SALARY (EMPLOYEE)
- 35125.000000
- Example 2 For each department, retrieve the
department number, the number of employees , and
the average salary ( in the department) - R(DNO,NUMEMPS,AVGSAL) ?
- DNO ? COUNT SSN,AVERAGE SALARY (EMPLOYEE)
- 1 1 55000.000000
- 4 3 31000.000000
- 5 4 33250.000000
64Relational Operations
- OUTER JOIN
- In a regular EQUIJOIN or NATURAL JOIN operation,
tuples in R1 or R2 that do not have matching
tuples in other relation do not appear in the
result. Some queries require all tuples in R1 (or
R2 or both) to appear in the result. When no
matching tuples are found, nulls are placed for
the missing attributes - LEFT OUTER JOIN R1 R2 lets every tuple in R1
appear in the result - RIGHT OUTER JOIN R1 R2 lets every tuple in
R2 appear in the result - FULL OUTER JOIN R1 R2 lets every tuple in
R1 or R2 appear in the result
65SQL - A Relational Database Language
- Basic Concepts
- Data Definition in SQL
- Retrieval Queries in SQL
- Specifying Updates in SQL
- Relational Views in SQL
- Creating Indexes in SQL
- Embedding SQL in a Programming Language
- Recent Advances in SQL
66Basic Concept
- Catalog A collection of schemas
- Schema A collections of tables and other
constructs such as constraints. - Table Represents a relation. It includes base
tables and views. - Column Represents an attribute.
- Name Space Hierarchy catalog -gt schema -gt table
-gt column - Qualified Name
- catalog_name.schema_name.table_name.column_name
67Data Definition in SQL
- CREATE TABLE
- Specifies a new base relation by giving it a
name and specifying each of its attributes and
their data types (INTEGER, FLOAT , DECIMAL (i,j),
CHAR(n), VARCHAR(n)). A constraint NOT NULL may
be specified on an attribute . - Example
- CREATE TABLE DEPARTMENT
- ( DNAME VARCHAR(15) NOT NULL ,
- DNUMBER INT NOT NULL ENIQUE,
- MGRSSN CHAR(9) NOT NULL,
- MGRSTARTDATE DATETIME,
- PRIMARY KEY(DNUMBER),
- FOREIGN KEY (MGRSSN) REFERENCES EMPLOYEE
- )
68Data Definition in SQL
- DROP TABLE
- Used to remove a relation (base table) and its
definition. The relation can no longer be used in
queries , updates or any other commands since
its description no longer exists. - Example
- DROP TABLE DEPENDENT
- ALTER TABLE
- Used to add an attribute to one of the base
relations. The new attribute will have NULLs in
all the tuples of the relation right after the
command is executed hence, the NOT NULL
constraint is not allowed for such an attribute. - Example
- ALTER TABLE EMPLOYEE ADD JOB VARCHAR(12)
- The database users must still enter a value for
the new attribute JOB for each EMPLOYEE tuple.
This can be done using the UPDATE command.
69DDL for COMPANY Database
- CREATE TABLE EMPLOYEE
- ( FNAME VARCHAR(15) NOT NULL,
- MINIT CHAR,
- LNAME VARCHAR(15) NOT NULL,
- SSN CHAR(9) NOT NULL,
- BDATE DATETIME,
- ADDRESS VARCHAR(30),
- SEX CHAR,
- SALARY DECIMAL(19,2),
- SUPERSSN CHAR(9),
- DNO INT NOT
NULL, - PRIMARY KEY(SSN),
- FOREIGN KEY (SUPERSSN) REFERENCES
EMPLOYEE(SSN), - )
70DDL for COMPANY Database
- CREATE TABLE DEPARTMENT
- ( DNAME VARCHAR(15) NOT
NULL , - DNUMBER INT
NOT NULL UNIQUE, - MGRSSN CHAR(9)
NOT NULL, - MGRSTARTDATE DATETIME,
- PRIMARY KEY(DNUMBER),
- FOREIGN KEY (MGRSSN) REFERENCES EMPLOYEE
- )
- ALTER TABLE EMPLOYEE
- ADD FOREIGN KEY (DNO)
- REFERENCES DEPARTMENT(DNUMBER)
71DDL for COMPANY Database
- CREATE TABLE DEPT_LOCATIONS
- (DNUMBER INT NOT NULL,
- DLOCATION VARCHAR(15) NOT NULL,
- PRIMARY KEY (DNUMBER, DLOCATION),
- FOREIGN KEY (DNUMBER) REFERENCES
DEPARTMENT(DNUMBER) - )
- CREATE TABLE PROJECT
- (PNAME VARCHAR(15) NOT NULL,
- PNUMBER INT NOT NULL,
- PLOCATION VARCHAR(15),
- DNUM INT NOT NULL,
- PRIMARY KEY (PNUMBER),
- FOREIGN KEY (DNUM) REFERENCES DEPARTMENT(DNUMBER)
- )
72DDL for COMPANY Database
- CREATE TABLE WORKS_ON
- (ESSN CHAR(9) NOT NULL,
- PNO INT NOT NULL,
- HOURS DECIMAL(3,1) NOT NULL,
- PRIMARY KEY (ESSN, PNO),
- FOREIGN KEY (ESSN) REFERENCES EMPLOYEE(SSN),
- FOREIGN KEY (PNO) REFERENCES PROJECT(PNUMBER)
- )
- CREATE TABLE DEPENDENT
- (ESSN CHAR(9)
NOT NULL, - DEPENDENT_NAME VARCHAR(15) NOT NULL,
- SEX CHAR,
- BDATE DATETIME,
- RELATIONSHIP VARCHAR(8),
- PRIMARY KEY (ESSN,DEPENDENT_NAME),
- FOREIGN KEY (ESSN) REFERENCES EMPLOYEE(SSN)
- )
73Basic Queries in SQL
- SQL has one basic statement for retrieving
information from a database the SELECT statement - This is not the same as the SELECT operation of
the relational algebra - Important distinction between SQL and the formal
relational model SQL allows a table (relation)
to have two or more tuples that are identical in
all their attribute values. - SQL relations can be constrained to be sets by a
key constraint, or by using the DISTINCT option
in the SELECT statement.
74The SELECT Statement
- Basic form of the SQL SELECT statement is called
a mapping or a SELECT-FROM-WHERE block - SELECT ltattribute-listgt
- FROM lttable listgt
- WHERE ltconditiongt
- where ltattribute listgt is a list of attribute
names whose values are to be retrieved by the
query. lt table listgt is al list of the relation
names required to process the query. lt conditiongt
is a conditional (Boolean) expression that
identifies the tuples to be retrieved by the
query - Basic SQL queries correspond to using the SELECT,
PROJECT, and JOIN operations of the relational
algebra.
75Sample Basic Queries
- Query 0 Retrieve the birth date and address of
the employee whose name is 'John B. Smith'. - Q0 SELECT BDATE, ADDRESS
- FROM EMPLOYEE
- WHERE FNAME 'John' AND MINIT 'B' AND LNAME
'Smith' - Query 1 Retrieve the name and address of all
employees who work for the 'Research' department. - Q1 SELECT FNAME, LNAME, ADDRESS
- FROM EMPLOYEE, DEPARTMENT
- WHERE DNAME'Research' AND DNUMBER DNO
- Query 2 For every project located in 'Stafford'
, list the project number, the controlling
department number, and the department manager's
last name, address and birth date. - Q2 SELECT PNUMBER, DNUM,LNAME, BDATE, ADDRESS
- FROM PROJECT, DEPARTMENT, EMPLOYEE
- WHERE DNUMDNUMBER AND MGRSSN SSN AND
PLOCATION 'Stafford' - Q2x SELECT PNUMBER, DNUM,LNAME, BDATE, ADDRESS
- FROM DEPARTMENT JOIN PROJECT ON (DNUMDNUMBER)
-
JOIN EMPLOYEE ON (MGRSSN SSN) - WHERE PLOCATION 'Stafford'
76The SELECT Statement Aliases
- ALIASES
- Some queries need to refer to the same relation
twice. In this case, aliases are given to the
relation name - Query 8 For each employee, retrieve the
employees name, and the name of his/her
immediate supervisor. - Q8 SELECT E.FNAME, E.LNAME, S.FNAME, S.LNAME
- FROM EMPLOYEE E, EMPLOYEE S
- WHERE E.SUPERSSN S.SSN
- Renaming attributes
- Q8A SELECT E.FNAME as "Employee First Name",
E.LNAME as "Employee Last Name", - S.FNAME as
"Supervisor First Name" , S.LNAME as "Supervisor
Last Name" - FROM EMPLOYEE E, EMPLOYEE S
- WHERE E.SUPERSSN S.SSN
- Q8B (SQL Server) SELECT E.LNAME ', '
E.FNAME as "Employee Name", -
S.LNAME ', ' S.FNAME as "Supervisor Name" - FROM EMPLOYEE E,
EMPLOYEE S - WHERE E.SUPERSSN
S.SSN - In Q8, the alternate relation names E and S are
called aliases for the EMPLOYEE relation - We can think of E and S as two different copies
of the EMPLOYEE relation E represents employee
in the role of supervisees and S represents
employees in the role of supervisors - Aliasing can also be used in any SQL query for
convenience
77The SELECT Statement Unspecified WHERE-clause
- Unspecified WHERE-clause
- A missing WHERE-clause indicates no condition
hence, all tuples of the relations in the
FROM-clause are selected. This is equivalent to
the condition WHERE TRUE - Query 9 Retrieve the ssn values of all
employees. - Q9 SELECT SSN
- FROM EMPLOYEE
- If more than one relation is specified in the
FROM-clause and there is no join condition,
then the CARTESIAN PRODUCT of tuples is selected - Q10 SELECT SSN, DNAME
- FROM EMPLOYEE, DEPARTNEMT
- It is extremely important not to overlook
specifying any selection and join conditions in
the WHERE-clause otherwise, incorrect and very
large relations may result
78The SELECT Statement DISTINCT and
- Use of
- To retrieve all the attribute values of the
selected tuples, a is used, which stands for
all the attributes. - Q1C SELECT
- FROM EMPLOYEE
- WHERE DNO 5
- Q1D SELECT
- FROM EMPLOYEE , DEPARTMENT
- WHERE DNAME 'Research' AND DNO DNUMBER
- Tables as Set
- SQL does not treat a relation as a set duplicate
tuples can appear. To eliminate duplicate tuples,
the keyword DISTINCT is used. - Q11 SELECT SALARY
- FROM EMPLOYEE
- Q11A SELECT DISTINCT SALARY
- FROM EMPLOYEE
79The SELECT Statement Set Operations
- Set Operations
- SQL has directly incorporated some set
operations. There is a union operation (UNION),
and in some versions of SQL there are set
difference (MINUS) and intersection (INTERSECT)
operations - The resulting relations of these set operations
are sets of tuples duplicate tuples are
eliminated from tuples - The set operations apply only to union compatible
relations the two relations must have the same
attributes and the attributes must appear in the
same order
80The SELECT Statement Set Operations
- Set Operations
- Example
- Query 4 Make a list of all project numbers for
projects that involve an employee whose last name
is 'Smith' as a worker or as a manager of the
department that controls the project - Q4 (SELECT DISTINCT PNAME
- FROM PROJECT, DEPARTMENT, EMPLOYEE
- WHERE DNUMDNUMBER AND MGRSSN SSN AND LNAME
'Smith') - UNION
- (SELECT PNAME
- FROM PROJECT, WORKS_ON, EMPLOYEE
- WHERE PNUMBER PNO AND ESSN SSN AND
- LNAME 'Smith')
81The SELECT Statement Substring Comparision
- Substring Comparison
- The LIKE comparison operator is used to compare
partial strings - Two reserved characters are used '' (