Title: This Week
1This Week
- Introduction to Databases
- Course Information
- Grading and Other Things
- Questionnaire
- The E/R Data Model
2Textbooks for COSC 6340
- Required Text Raghu Ramakrishnan and Johannes
Gehrke, Data Management Systems, McGraw Hill,
Third Edition, 2002 (complication the chapter
numbers in the new edition are different!!) - Recommended Jiawei Han and Micheline Kamber,
Data Mining Concepts and Techniques, Morgan
Kaufman Publishers, 2001, ISBN 1-55860-489-8 (4
chapters will be covered) - Other books with relevant material Ramez Elmasri
and Shamkant Navathe, Fundamentals of Database
Systems, Third Edition Addison Wesley ISBN
0-8053-1755-4 -
3Lectures in COSC 6340
- I Basic Database Management Concepts --- Review
of basic database concepts, techniques, and
languages (9 classes, Chapters 1-5, 8-11, and 16
of the textbook). - III Introduction to KDD and Data Warehousing
centering on data warehouses, OLAP, and data
mining moreover, more detailed coverage of
querying and mining data streams and database
clustering (5 classes Chapters 1, 2, 6, and 7 of
the Han/Kamber book chapter 25 of our textbook
and additional material) - III Relational Database Design (2 classes,
chapters 19) - IV Implementation of Relational Operators, Query
Optimization, and Physical Database Design
(Chapters 121420, 3-4 classes) - V Internet Databases and XML (1 class, chapter
27 of the textbook) - VI Other discussion of home works and exams,
student presentations discussion of course
projects (3 classes)
4Tentative Schedule COSC 6340
- Jan. 18 Introduction to COSC 6340
- Fast Review of Undergraduate Material (Jan.
20-Feb. 15) - Jan. 20 Entity-Relationship Data Model ? more
detailed than textbook - Jan. 25 Entity-Relationship Data Model
- Jan. 27 Relational Data Model
- Feb. 1 Mapping E/R Diagrams to Relations
- Feb. 8/10/15 Index storage structures,
B-trees, and hashing, PDBD - Feb. 15/17 Relational Algebra
- Feb. 22 Writing SQL Queries (somewhat short)
- Feb. 24 Leftovers / Review
- March 1 Exam0 (Undergraduate Review Exam)
5Tentative Schedule COSC 6340 Part2
- II KDD and Data Warehousing (approx. 5.5
classes) - March 3 Introduction to KDD
- March 8 Similarity Assessment
- March 10 Clustering
- March 22 Association Rule Mining
- March 29 Spatial Databases
- March 31 Data Warehouses and OLAP
- April 8 (30 minutes) Spatial Data Mining
- April 578 III Relational Database Design
(2.5 classes) - April 14 Midterm Exam (30 minutes review on
April 12, 2005) - April 192628 Student Presentations
- IV Physical Database Design and Query
Optimization (3 classes) - April 8(makeup) Implementation of Relational
Operators - April 8(makeup)/12 Introduction to Query
Optimization - April 12/19 Physical Database Design II
- V Internet Databases (1 class)
- April 21 Introduction to XML and Semi-Structured
Data - April 28 Course Summary Teaching Evaluation
Leftovers
6Exams and Homeworks COSC 6340
- Tu., March 1 Undergraduate Material Review Exam
- Th., April 14 Midterm Exam
- 3 graded homeworks deadlines Feb. 17, April 11,
April 28 - Ungraded Homeworks
- Final Exam Tu/TH., May 10, 11a-130p
- Qualifying Exam Part2 Fr., May 13, 10-1130a
7Why are integrated databases popular?
- Avoidance of uncontrolled redundancy
- Making knowledge accessible that would otherwise
not be accessible - Standardization --- uniform representation of
data facilitating import and export - Reduction of software development (though the
availability of data management systems)
Bookkeeping Device
Integrated Database
Car Salesman
8Popular Topics in Databases
- Efficient algorithms for data collections that
reside on disks (or which are distributed over
multiple disk drives, multiple computers or over
the internet). - Study of data models (knowledge representation,
mappings, theoretical properties) - Algorithms to run a large number of transactions
on a database in parallel finding efficient
implementation for queries that access large
databases database backup and recovery, - Database design
- How to use database management systems as an
application programmer / end user. - How to use database management systems as
database administrator - How to implement database management systems
- Data summarization, knowledge discovery, and data
mining - Special purpose databases (genomic, geographical,
internet,)
9Data Model
Data Model
is used to define
Schema (defines a set of database states)
Current Database State
10Schema for the Library Example using the E/R
Data Model
author
B
Book
(0,35)
(0,1)
Many-to-Many
1-to-1
1-to Many
Many-to-1
11Relational Schema for Library Example in SQL/92
CREATE TABLE Book (B INTEGER, title
CHAR(30), author CHAR(20), PRIMARY KEY
(B))
CREATE TABLE Person (ssn CHAR(9), name
CHAR(30), phone INTEGER, PRIMARY KEY
(ssn))
CREATE TABLE Checkout( book INTEGER,
person CHAR(9), since DATE, PRIMARY KEY
(book), FOREIGN KEY (book) REFERENCES Book,
FOREIGN KEY (person) REFERENCES Person))
12Referential Integrity in SQL/92
- SQL/92 supports all 4 options on deletes and
updates. - Default is NO ACTION (delete/update is
rejected) - CASCADE (also delete all tuples that refer to
deleted tuple) - SET NULL / SET DEFAULT (sets foreign key value
of referencing tuple)
CREATE TABLE Enrolled (sid CHAR(20), cid
CHAR(20), grade CHAR(2), PRIMARY KEY
(sid,cid), FOREIGN KEY (sid) REFERENCES
Students ON DELETE CASCADE ON UPDATE SET
DEFAULT )
13Example of an Internal Schemafor the Library
Example
- INTERNAL Schema Library12 references Library.
- Book is stored sequentially,
- index on B using hashing,
- index on Author using hashing.
- Person is stored using hashing on ssn.
- Check_out is stored sequentially,
- index on since using B-tree.
14Example Stored Database
Index on B Block B mod 10
Index on Author
Relation Book
1 11 51
20 30
0
W,
(1, C,W)
(20, Y,W) (51, C, B)
1
(11, Y,W) (30, Z, B)
Relation Checkout
(101,)
(200,) (500,)
0
Index on since
Relation Person Block sss mod 10
1
15Modern Relational DBMS
Support for Web-Interfaces, XML, and Data Exchange
Transaction Concepts capability of running
many transactions in parallel support
for backup and recovery.
Modern DBMS
Support for OO capability to store operations
Support for data- driven computing
Efficient Implementation of Queries (Query
Optimization, Join Selection Indexing
techniques)
Support for Data Mining operations
Support for OLAP and Data Warehousing
Support for special Data-types long
fields, images, html-links, DNA-sequences, spatial
information,
Support for higher level user interfaces graphica
l, natural language, form-based,
16Disks and Files
- DBMS stores information on (hard) disks.
- This has major implications for DBMS design!
- READ transfer data from disk to main memory
(RAM). - WRITE transfer data from RAM to disk.
- Both are high-cost operations, relative to
in-memory operations, so must be planned
carefully!
17Why Not Store Everything in Main Memory?
- Costs too much. 100 will buy you either 512MB
of RAM or 50GB of disk today --- that is disk
storage 100 times cheaper (but it is approx.
10000 times slower). - Main memory is volatile. We want data to be
saved between runs. (Obviously!) - Typical storage hierarchy
- Main memory (RAM) for currently used data.
- Disk for the main database (secondary storage).
- Tapes for archiving older versions of the data
(tertiary storage).
Remark All reported disk performance/prize data
are as of middle of 2003
18Components of a Disk
Spindle
Disk head
- The platters spin (say, 90rps).
- The arm assembly is moved in or out to position
a head on a desired track. Tracks under heads
make a cylinder (imaginary!).
Sector
Platters
- Only one head reads/writes at any one time.
- Block size is a multiple of sector
size (which is fixed).
19Accessing a Disk Page
- Time to access (read/write) a disk block
- seek time (moving arms to position disk head on
track) - rotational delay (waiting for block to rotate
under head) - transfer time (actually moving data to/from disk
surface) - Seek time and rotational delay dominate.
- Seek time varies from about 1 to 20msec
- Rotational delay varies from 0 to 10msec
- Transfer rate is about 1msec per 32KB page
20Review The ACID properties
- A tomicity All actions in the Xact happen, or
none happen. - C onsistency If each Xact is consistent, and
the DB starts consistent, it ends up consistent. - I solation Execution of one Xact is isolated
from that of other Xacts. - D urability If a Xact commits, its effects
persist. - The Recovery Manager guarantees Atomicity
Durability.
21Example
- Consider two transactions (Xacts)
T1 BEGIN AA100, BB-100 END T2 BEGIN
A1.06A, B1.06B END
- Intuitively, the first transaction is
transferring 100 from Bs account to As
account. The second is crediting both accounts
with a 6 interest payment. - There is no guarantee that T1 will execute before
T2 or vice-versa, if both are submitted together.
However, the net effect must be equivalent to
these two transactions running serially in some
order.
22Atomicity of Transactions
- A transaction might commit after completing all
its actions, or it could abort (or be aborted by
the DBMS) after executing some actions. - A very important property guaranteed by the DBMS
for all transactions is that they are atomic. - DBMS logs all actions so that it can undo the
actions of aborted transactions and redo the
actions of successful transactions.
23Concurrency in a DBMS
- Users submit transactions, and can think of each
transaction as executing by itself. - Concurrency is achieved by the DBMS, which
interleaves actions (reads/writes of DB objects)
of various transactions. - Each transaction must leave the database in a
consistent state if the DB is consistent when the
transaction begins. - DBMS will enforce some ICs, depending on the ICs
declared in CREATE TABLE statements. - Beyond this, the DBMS does not really understand
the semantics of the data. (e.g., it does not
understand how the interest on a bank account is
computed). - Issues Effect of interleaving transactions, and
crashes.
24Example (Contd.)
- Consider a possible interleaving (schedule)
T1 AA100, BB-100 T2
A1.06A, B1.06B
- This is OK. But what about
T1 AA100, BB-100 T2
A1.06A, B1.06B
- The DBMSs view of the second schedule
T1 R(A), W(A), R(B), W(B) T2
R(A), W(A), R(B), W(B)
25Summary
- Concurrency control and recovery are among the
most important functions provided by a DBMS. - Users need not worry about concurrency.
- System automatically inserts lock/unlock requests
and schedules actions of different transactions
in such a way as to ensure that the resulting
execution is equivalent to executing the
transactions one after the other in some order. - Write-ahead logging (WAL) is used to undo the
actions of aborted transactions and to restore
the system to a consistent state after a crash.