Title: Introduction to Database Systems
1Introduction to Database Systems
- Yuri Breitbart
- MW 445 600pm
- Fall 2004
2Course Goals
- This course is an introduction to the design, use
and internal - workings of database systems. We consider here
systems that - are based on relational model that is, users
data is - represented as a set of two dimensional tables.
During the - class we learn the ways to organize the data so
that the user - applications may work concurrently and get data
from - database quickly and reliably. We first study
the data - modeling techniques and how to convert a data
model into a - set of relations. We then study SQL query
language. Finally, - we study database internal organization of data
a - concurrency control and recovery issues in
database systems.
3References
- A. Silberschatz, H. F. Korth, S Sudarshan,
- Database System Concepts, 4th Ed., McGrow Hill,
2002 - http//www.bell-labs.com/topic/books/db-book
- Hector Garcia-Molina, Jeffrey D. Ullman, Jennifer
Widom, - Database Systems, The Complete Book, Prentice
Hall, 2002 - http//www-db.stanford.edu/ullman/dscb.html
- Class notes
4Prerequisites
- CS 33001 Data Structures
- CS31011 Discrete Structures
- Structured Programming Language (C)
- Software engineering topics related to project
documentation and project design
5Workload Requirements
- 4 Homeworks
- Project
- Midterm and Final Exams
- Homeworks 20 of the final grade
- Project 35 of the final grade
- Midterm 20 of the final grade
- Final 25 of the final grade
- No late homeworks and/or projects are accepted
- A 91 100 B 80-90 C 7079 D - gt64
6Exam, Project, and Homeworks Due Dates
- Homework 1 Out 9/08/2004 In 9/13/2004
- Homework 2 Out 9/13/2004 In 9/17/2004
- Homework 3 Out 9/20/2004 In 9/24/2004
- Homework 4 - Out 9/27/2004 In 10/04/2004
- Midterm 10/13/2004
- Final 12/13/2004
- Project Out 10/18/2004 In 12/01/2004
- Exams are all inclusive
7Class Schedule
- Week 8 - SQL
- Week 9 Constraints Triggers
- Week 10 Data Storage
- Week 11 Indexes
- Week 12 - Query Processing
- Week 13 Transactions
- Week 14 Recovery
-
- Week 1 Database Overview
- Week 2 ER model
- Week 3 Relational Model
- Week 4 Relational Model
- Week 5 OO Model and XML
- Week 6 OO Model and XML
- Week 7 Relational Algebra
8Project
- You will build a database application using
Oracle. The project - consists of several parts
- ER diagram for the application,
- conversion of the ER diagram into a set of
relations, - normalizing a set of relations,
- creating database under ORACLE,
- loading database,
- creating ORACLE application programs,
- testing the system.
- Programming should be done in C. Project
documentation should include - application description,
- ER diagram,
- normalized set of relation,
- description of each application program,
- sample data for each relation, and
- description on how to install and run your
application. -
9Database Overview
- File Management vs Database Management
- Advantages of Database systems storage
persistence, - programming interface, transaction management
- Three level Data Model
- DBMS Architecture
- Database System Components
- Users classification
10File Management Systems
- File is uninterpreted, unstructured collection of
information - File operations delete, catalog, create, rename,
open, close, read, write, find, - Access methods Algorithms to implement
operations along with internal file organization - Examples File of Customers, File of Students
Access method implementation of a set of
operations on a file of students or customers.
11File Management System Problems
- Data redundancy
- Data Access New request-new program
- Data is not isolated from the access
implementation - Format incompatible data
- Concurrent program execution on the same file
- Difficulties with security enforcement
- Integrity issues
12Concurrent Program Execution
- What is the final value of the account AC?
Program1
ACAC-50
AC 103 450
Program2
ACAC-100
13Security Problems
- Allow access to the file only to the authorized
personnel - Ability to restrict access to parts of the record
- Ability to control operation usage by different
users - Protection from unauthorized use
- Protection from the derivation of unauthorized
information
14Data Integrity
- A database constraint is a logical constraint
about the data expressed in a logical language. - STUDENT.AGE gt15
- If (STUDENT.CLASS cs43005) then
(STUDENT.PRIOR_CLASS cs31001) - Database is consistent if data at each time
satisfies all integrity constraints. - Input to any application is a set of consistent
data. An application output is a set of
consistent data.
15Collection of Files
60s 70's 80's 90s now
Hierarchical
Network
Relational
Choice for most new applications
Object Bases
Knowledge Bases
16Advantages of Databases
- Persistent Storage Database not only provides
persistent storage but also efficient access to
large amounts of data - Programming Interface Database allows users to
access and modify data using powerful query
language. It provides flexibility in data
management - Transaction Management Database supports a
concurrent access to the data
17Early Database Applications
- Airline Reservation Systems Data items are
single passenger reservations Information about
flights and airports Information about ticket
prices and tickets restrictions. - Banking Systems Data items are accounts,
customers, loans, mortgages, balances, etc.
Failures are not tolerable. Concurrent access
must be provided - Corporate Records Data items are sales,
accounts, bill of materials records, employee and
their dependents
18Modern Database Applications
- Client Server architecture
- DBMS serves as a server and client queries are
sent to servers - Where to locate servers
- Multimedia Applications
- Multidatabase Applications
- Data Warehouses
19Three Aspects to Studying DBMS's
- 1. Modeling and design of databases.
- Allows exploration of issues before committing to
an implementation. - 2. Programming queries and DB operations like
update. - SQL intergalactic dataspeak.
- 3. DBMS implementation.
- .
20Definitions
- A database is a collection of stored operational
data used by various applications and/or users by
some particular enterprise or by a set of outside
authorized applications and authorized users - A DataBase Management System (DBMS) is a software
system that manages execution of users
applications to access and modify database data
so that the data security, data integrity, and
data reliability is guaranteed for each
application and each application is written with
an assumption that it is the only application
active in the database.
21What Is Data ?
- Different view points
- A sequence of characters stored in computer
memory or storage - Interpreted sequence of characters stored in
computer memory or storage - Interpreted set of objects
22Data Levels and their Roles
- Physical corresponds to the first view of data
How data is stored, how is it accessed, how data
is modified, is data ordered, how data is
allocated to computer memory and/or peripheral
devices, how data items are actually represented
(ASCI, EBCDIC,) - Conceptual corresponds to the second view of
data What we want the data to express and what
relationships between data we must express, what
story data tells, are all data necessary for
the story are discussed. - View corresponds to the third view of dataWhat
part of the data is seen by a specific application
23Physical Data - Example
10
3
6
10
3
6
J
james
3
000375
0000035000
. . . . . . . .
.
benjamin
63
24Examples
- Conceptual
- 1 TA
- 2 Name char(10),
- 2 Age char (3),
- 2 Salary Fixed Dec(6)
- 1 Student
- 2 Name char(10),
- 2 Year-of_study char(3)
- 2 GPA Fixed Dec(5,2)
25Examples
- 1 STUDENTS-TA
- 2 Name char(25),
- 2 Age char (3),
- 2 Salary Fixed Dec(8,2),
- 2 Year-of_study char(3)
- 2 GPA Fixed Dec(3,2)
A view
26Three Level Data View Data Abstractions
. . . . .
View1
View k
Conceptual View Of Data
Phyisal Data Storage
27DBMS Architecture
28Logical and Physical Database Components
- Data Definition Language (DDL)
- Data Manipulation Language (DML)
- Host Language Interface
- Data Administrator
- Users
- Query Processor
- Compiler
- Optimizer
- Management
- Transaction Manager
- File Manager
- Buffer Manager
- Authorization and Integrity Manager
Logical
Physical
29Database Languages
Department
Faculty
Dept
Chair
Name
Dept
SQL
- SELECT Chair
- FROM Faculty, DepartmentWHERE Faculty.name
Ken Noname AND Faculty.Dept
Department.Dept - Data definition language (DDL) like type
definitions in C or C - Data Manipulation Language (DML) Query
(SELECT) UPDATE lt relation name gt SET
ltattributegt lt new-valuegt WHERE ltconditiongt
30Data Definition Language
- Specification notation for defining the database
schema - E.g. create table account (
account-number char(10), balance
integer) - DDL compiler generates a set of tables stored in
a data dictionary - Data dictionary contains metadata (i.e., data
about data) - database schema
- Data storage and definition language
- language in which the storage structure and
access methods used by the database system are
specified - Usually an extension of the data definition
language
31Data Manipulation Language
- Language for accessing and manipulating the data
organized by the appropriate data model - Two classes of languages
- Procedural user specifies what data is required
and how to get those data - Nonprocedural user specifies what data is
required without specifying how to get those data - SQL is the most widely used query language
32 Database Host Languages
C, C, Fortran, Lisp, COBOL
Application prog.
DBMS
Calls to DB
Local Vars
(Memory)
(Storage)
- Host language is completely general
- Query languageless general "non procedural" and
- optimizable
33Data Administrator
- Coordinates all the activities of the database
system the database administrator has a good
understanding of the enterprises information
resources and needs. - Database administrator's duties include
- Schema definition
- Storage structure and access method definition
- Schema and physical organization modification
- Granting user authority to access the database
- Specifying integrity constraints
- Acting as liaison with users
- Monitoring performance and responding to changes
in requirements
34Database Users
- Naïve do not know about database too much,
invoke application programs that are prepared
already - Application Programmers know how to interact
with the system but may not know how DBMS is
designed - Sophisticated users that know advanced use of the
system and can use the system and packages on the
top of the system - DBMS system users write specialized database
applications that do not fit into the traditional
data processing framework
35Query Processor
- Compiler verifies whether a program or query is
written in accordance with DDL and DML rules - Optimizer Finds the most effective way to
access the required data and supply it in a user
requested form. Monitors the query execution and
modifies a query evaluation plan if necessary.
36 Transaction Management
- A transaction is a collection of operations that
performs a single logical function in a database
application - Transaction-management component ensures that the
database remains in a consistent (correct) state
despite system failures (e.g., power failures and
operating system crashes) and transaction
failures. - Concurrency-control manager controls the
interaction among the concurrent transactions, to
ensure the consistency of the database.
37Storage Management
- Storage manager is a program module that provides
the interface between the low-level data stored
in the database and the application programs and
queries submitted to the system. - The storage manager is responsible to the
following tasks - interaction with the file manager
- efficient storing, retrieving and updating of data
38File Manager
- File Manager is responsible for mapping logical
database units (objects, relations, etc.) into a
set of low level files. - It is responsible for maintenance of files and
indexes on them. It should be able to create and
destroy index and collect unused storage space to
eliminate an unneeded gaps on disks.
39Buffer Manager
- Buffer Manager is responsible for the allocation
and maintenance buffer space in a memory to
facilitate processing database data by several
concurrent applications. - Buffer Manager decides when to load data from a
buffer to a database or discard the data and
under what conditions a new data should be put
into a buffer
40Authorization and Integrity Manager
- This manager is responsible for granting an
access to database or portions thereof only to
authorized users and preventing the access to
unauthorized users - Integrity manager must assure data integrity
during normal database operations as well as
during the database failures
41The DBMS Marketplace
- Relational DBMS companies Oracle, Sybase are
among the largest software companies in the
world. - IBM offers its relational DB2 system. With IMS,
a nonrelational system, IBM is by some accounts
the largest DBMS vendor in the world. - Microsoft offers SQL-Server, plus Microsoft
Access for the cheap DBMS on the desktop,
answered by lite systems from other
competitors. - Relational companies also challenged by
object-oriented DB companies. - But countered with object-relational systems,
which retain the relational core while allowing
type extension as in OO systems.
42Logical Data Models
- A collection of tools for describing
- data
- data relationships
- data semantics
- data constraints
- Value based models ER Model, OO Model
- Record Based Models Relational Model
43Entity-Relationship Model
- The enterprise data can be described as a set of
entities and a set of relationships between them. - Entity a data that pertains to, or describes
some component of the enterprise - Each entity is characterized by a set of
attributes - Relationship describes an interconnection
between different entities - Entity Set a set of entities that are
characterized by the same entity definition - Relationship Set a set of relationships of the
same type
44Entity-Relationship Model
- Example of schema in the entity-relationship model
45Object Oriented Model
- An enterprise is described as a collection of
objects and a collection of algorithms that work
with objects - Example Person is an object.
- Object is characterized by a set of public
attributes. Applications may refer only to public
attributes private attributes . Algorithms that
implement the object may refer to private
attributes a set of protected attributes and a
set of methods - Attribute of an object can be another object
- Objects are nested into a hierarchy and can
inherit attributes of their parents
46Object Oriented Model
- OBJECT DATA MODEL
- 1. Complex Objects Nested Structure (pointers
or references) - 2. Encapsulation, set of Methods/Access functions
- 3. Object Identity
- 4. Inheritance Defining new classes like old
classes - Object model usually find objects via explicit
navigation - Also query language in some systems
47Example
- Class Person
- public
- Person()
- Person()
- float GetSalary()
- float PutSalary(float)
- string Name
- int SSN
- date BirthDate
- private
- float salary
-
48Object-Oriented ModelData Encapsulation
- An object contains both data and methods to work
with the data - The physical data representation is visible only
to the object creator. - The implementation details of methods are not
visible to object users - An interface of the object consists of public
attributes and methods - Each object is characterized by an object identity
49Relational Model
- An enterprise is represented as a set of
relations - Domain is a set of atomic values. Each domain
has a NULL value. - Data type Description of a form that domain
values can be represented. - Relation is a subset of a cartesian product of
one or more domains - The elements of relations are called tuples. Each
element in the cartesian product is called
attribute.
50- Relational model is good for
- Large amounts of data gt simple operations
- Navigate among small number of relations
- Difficult Applications for relational model
- VLSI Design (CAD in general)
- CASE
- Graphical Data
ALU
ADDER
CPU
A
FA
Adder
ALU
ADDER
Bill of Materials or transitive closure
51Relational Model
Attributes
Street
City
gpa
Name
Student-id
- Example of tabular data in the relational model
Johnson Smith Johnson Jones Smith
192-83-7465 019-28-3746 192-83-7465 321-12-3123
019-28-3746
Alma North Alma Main North
3.6 2.7 3.2 4.0 3.45
Palo Alto Rye Palo Alto Harrison Rye