Title: CS4432: Database Systems II
1CS4432 Database Systems II
- Course Introduction
- Mohamed Eltabakh
2What Is a Relational Database Management System ?
- Database Management System DBMS
- Relational DBMS RDBMS
- A collection of files that store the data
- But Files that we do not directly access or read
- A big C program written by someone else that
accesses and updates those files for you - But Huge program containing 100s of 1000s of
lines
3Where are RDBMS used ?
- Backend for traditional database applications
- Backend for large Websites
- Backend for Web services
4Example of a Traditional Database Application
University registration
- Suppose we are building a system
- to store the information about
- students
- courses
- professors
- who takes what, who teaches what
5Can we do it without a DBMS ?
- Sure we can! Start by storing the data in files
- students.txt courses.txt
professors.txt - Now write C or Java programs to implement
specific tasks
File System Approach
6Doing it without a DBMS...
- Enroll Mary Johnson in CSE444
Write a C program to do the following
Read students.txt Read courses.txt Findupdate
the record Mary Johnson Findupdate the record
CSE444 Write students.txt Write courses.txt
7What Can Go Wrong
- Several drawbacks of using file systems
- Data redundancy and inconsistency
- Multiple file formats, duplication of information
in different files - Multiple records formats within the same file
- No order enforced between fields
- Difficulty in accessing data
- Need to write a new program to carry out each new
task - No indexes, always scan the entire file
- Integrity problems
- Modify one file (or field in a file), and not
changing the dependent fields or files - Integrity constraints (e.g., account balance gt 0)
become buried in program code rather than being
stated explicitly
8What Can Go Wrong
- Concurrent access by multiple users
- Many users need to access/update the data at the
same time (concurrent access) - Uncontrolled concurrent access can lead to
inconsistencies - Example Two people are updating the same bank
account at the same time - Security problems
- Hard to provide user access to some, but not all,
data - Recovery from crashes
- While updating the data the system crashes
- Maintenance problems
- Hard to search for or update a field
- Hard to add new fields
9Enters a DMBS
Two tier database system
Applications
Direct SQL
Database server(someone elsesC program)
Data files
10Functionality of a DBMS
- The programmer sees SQL, which has two
components - Data Definition Language - DDL
- Data Manipulation Language DML
- Behind the scene the DBMS has
- Query Optimizer
- Query Engine
- Storage Management
- Transaction Management (concurrency, recovery)
Frontend (CS3431)
Backend (CS4432)
11How the Programmer Sees the DBMS
Frontend
- Start with DDL to create tables
- Continue with DML to populate tables
CREATE TABLE Students ( Name CHAR(30) SSN
CHAR(9) PRIMARY KEY NOT NULL, Category
CHAR(20) ) . . .
INSERT INTO Students VALUES(Charles,
123456789, undergraduate) . . . .
12How the Programmer Sees the DBMS
Frontend
Students
Takes
Courses
data independence separate logical view from
physical implementation
13What is Hidden ???
Backend
CREATE TABLE Students ( Name CHAR(30) SSN
CHAR(9) PRIMARY KEY NOT NULL, Category
CHAR(20) ) . . .
Creating file (Data)
Updating catalog tables
May create indexes
14Queries
Frontend
- Find all courses that Mary takes
- We did not specify how to execute
- We did not specify how to optimize
SELECT C.nameFROM Students S, Takes T,
Courses CWHERE S.nameMary and
S.ssn T.ssn and T.cid C.cid
15What is Hidden ???
Backend
Imperative query execution plan
Declarative SQL query
SELECT C.name FROM Students S, Takes T, Courses
C WHERE S.nameMary and S.ssn
T.ssn and T.cid C.cid
16Transactions
Frontend
- Enroll Mary Johnson in CSE444
BEGIN TRANSACTION INSERT INTO Takes SELECT
Students.SSN, Courses.CID FROM Students,
Courses WHERE Students.name Mary Johnson
and Courses.name
CSE444 -- More updates here.... IF
everything-went-OK THEN COMMIT ELSE
ROLLBACK
If system crashes, the transaction is still
either committed or aborted
17Transactions
- A transaction sequence of statements that
either all succeed, or all fail - Basic unit of processing
- Transactions have the ACID properties
- A atomicity
- C consistency
- I independence (Isolation)
- D durability
18Transaction ACID Properties
T1
T2
T3
T4
- Each transaction has a Start and End and
does many things in between - A ? Atomic Either the entire transaction is
done (all its actions) or none. - C ? Consistency A transaction must move the DB
from one consistent state to another consistent
state
19Transaction ACID Properties (Contd)
T1
T2
T3
T4
- What about interaction
- Can T2 read what T1 is writing?
- Can T3 read what T1 is reading?
- Can T4 read what T1 wrote?
- I ? Isolation Although running concurrently,
they should appear as if they run is a certain
serial order
20Transaction ACID Properties (Contd)
T1
T2
T3
T4
- If T1 failed T2 completed ? This means what?
- T1 ? Rolledback T2 ? Committed
- D ? Durability The effect of a committed
transaction must be persistent (not lost)
21Transactions
Backend
22DBMS Backend Components
- We will cover several of these components
Chapter 1 in textbook
23Topics To Be Covered
- File System Structure
- Records in blocks, dictionary, buffer
management, - Indexing Hashing
- B-Trees, hashing,
- Query Processing
- Query costs, join strategies,
- Crash Recovery
- Failures, stable storage,
- Concurrency Control
- Correctness, locks,
- Transaction Processing
- Logs, deadlocks,
24Database Material at WPI
CS 3431
B, C terms
you are here
CS 4432
D term (alternate)
CS 542
CS 561
CS525
MQP
DSRG
Grad. DB
Advanced DB
Selected Topics
Selected DB Project
DB Research at WPI
Varies
Any time
year round
DONT TAKE!
25Database Systems
- The big commercial database vendors
- Oracle
- IBM (with DB2) bought Informix recently
- Microsoft (SQL Server)
- Sybase
- Some free database systems (Unix)
- Postgres
- Mysql
- Predator
- In CS4432 we use Oracle SimpleDB!