This Week - PowerPoint PPT Presentation

About This Presentation
Title:

This Week

Description:

... 0-8053-1755-4 Lectures in COSC 6340 I: Basic Database Management Concepts --- Review of basic database concepts, techniques, and languages ... – PowerPoint PPT presentation

Number of Views:121
Avg rating:3.0/5.0
Slides: 26
Provided by: Christop296
Learn more at: https://www2.cs.uh.edu
Category:
Tags: week

less

Transcript and Presenter's Notes

Title: This Week


1
This Week
  1. Introduction to Databases
  2. Course Information
  3. Grading and Other Things
  4. Questionnaire
  5. The E/R Data Model

2
Textbooks for COSC 6340
  • Required Text Raghu Ramakrishnan and Johannes
    Gehrke, Data Management Systems, McGraw Hill,
    Third Edition, 2002 (complication the chapter
    numbers in the new edition are different!!)
  • Recommended Jiawei Han and Micheline Kamber,
    Data Mining Concepts and Techniques, Morgan
    Kaufman Publishers, 2001, ISBN 1-55860-489-8 (4
    chapters will be covered)
  • Other books with relevant material Ramez Elmasri
    and Shamkant Navathe, Fundamentals of Database
    Systems, Third Edition Addison Wesley ISBN
    0-8053-1755-4

3
Lectures in COSC 6340
  • I Basic Database Management Concepts --- Review
    of basic database concepts, techniques, and
    languages (9 classes, Chapters 1-5, 8-11, and 16
    of the textbook).
  • III Introduction to KDD and Data Warehousing
    centering on data warehouses, OLAP, and data
    mining moreover, more detailed coverage of
    querying and mining data streams and database
    clustering (5 classes Chapters 1, 2, 6, and 7 of
    the Han/Kamber book chapter 25 of our textbook
    and additional material)
  • III Relational Database Design (2 classes,
    chapters 19)
  • IV Implementation of Relational Operators, Query
    Optimization, and Physical Database Design
    (Chapters 121420, 3-4 classes)
  • V Internet Databases and XML (1 class, chapter
    27 of the textbook)
  • VI Other discussion of home works and exams,
    student presentations discussion of course
    projects (3 classes)

4
Tentative Schedule COSC 6340
  • Jan. 18 Introduction to COSC 6340
  • Fast Review of Undergraduate Material (Jan.
    20-Feb. 15)
  • Jan. 20 Entity-Relationship Data Model ? more
    detailed than textbook
  • Jan. 25 Entity-Relationship Data Model
  • Jan. 27 Relational Data Model
  • Feb. 1 Mapping E/R Diagrams to Relations
  • Feb. 8/10/15 Index storage structures,
    B-trees, and hashing, PDBD
  • Feb. 15/17 Relational Algebra
  • Feb. 22 Writing SQL Queries (somewhat short)
  • Feb. 24 Leftovers / Review
  • March 1 Exam0 (Undergraduate Review Exam)

5
Tentative Schedule COSC 6340 Part2
  • II KDD and Data Warehousing (approx. 5.5
    classes)
  • March 3 Introduction to KDD
  • March 8 Similarity Assessment
  • March 10 Clustering
  • March 22 Association Rule Mining
  • March 29 Spatial Databases
  • March 31 Data Warehouses and OLAP
  • April 8 (30 minutes) Spatial Data Mining
  • April 578 III Relational Database Design
    (2.5 classes)
  • April 14 Midterm Exam (30 minutes review on
    April 12, 2005)
  • April 192628 Student Presentations
  • IV Physical Database Design and Query
    Optimization (3 classes)
  • April 8(makeup) Implementation of Relational
    Operators
  • April 8(makeup)/12 Introduction to Query
    Optimization
  • April 12/19 Physical Database Design II
  • V Internet Databases (1 class)
  • April 21 Introduction to XML and Semi-Structured
    Data
  • April 28 Course Summary Teaching Evaluation
    Leftovers

6
Exams and Homeworks COSC 6340
  • Tu., March 1 Undergraduate Material Review Exam
  • Th., April 14 Midterm Exam
  • 3 graded homeworks deadlines Feb. 17, April 11,
    April 28
  • Ungraded Homeworks
  • Final Exam Tu/TH., May 10, 11a-130p
  • Qualifying Exam Part2 Fr., May 13, 10-1130a

7
Why are integrated databases popular?
  • Avoidance of uncontrolled redundancy
  • Making knowledge accessible that would otherwise
    not be accessible
  • Standardization --- uniform representation of
    data facilitating import and export
  • Reduction of software development (though the
    availability of data management systems)

Bookkeeping Device
Integrated Database
Car Salesman
8
Popular Topics in Databases
  • Efficient algorithms for data collections that
    reside on disks (or which are distributed over
    multiple disk drives, multiple computers or over
    the internet).
  • Study of data models (knowledge representation,
    mappings, theoretical properties)
  • Algorithms to run a large number of transactions
    on a database in parallel finding efficient
    implementation for queries that access large
    databases database backup and recovery,
  • Database design
  • How to use database management systems as an
    application programmer / end user.
  • How to use database management systems as
    database administrator
  • How to implement database management systems
  • Data summarization, knowledge discovery, and data
    mining
  • Special purpose databases (genomic, geographical,
    internet,)

9
Data Model
Data Model
is used to define
Schema (defines a set of database states)
Current Database State
10
Schema for the Library Example using the E/R
Data Model
author
B
Book
(0,35)
(0,1)
Many-to-Many
1-to-1
1-to Many
Many-to-1
11
Relational Schema for Library Example in SQL/92
CREATE TABLE Book (B INTEGER, title
CHAR(30), author CHAR(20), PRIMARY KEY
(B))
CREATE TABLE Person (ssn CHAR(9), name
CHAR(30), phone INTEGER, PRIMARY KEY
(ssn))
CREATE TABLE Checkout( book INTEGER,
person CHAR(9), since DATE, PRIMARY KEY
(book), FOREIGN KEY (book) REFERENCES Book,
FOREIGN KEY (person) REFERENCES Person))
12
Referential Integrity in SQL/92
  • SQL/92 supports all 4 options on deletes and
    updates.
  • Default is NO ACTION (delete/update is
    rejected)
  • CASCADE (also delete all tuples that refer to
    deleted tuple)
  • SET NULL / SET DEFAULT (sets foreign key value
    of referencing tuple)

CREATE TABLE Enrolled (sid CHAR(20), cid
CHAR(20), grade CHAR(2), PRIMARY KEY
(sid,cid), FOREIGN KEY (sid) REFERENCES
Students ON DELETE CASCADE ON UPDATE SET
DEFAULT )
13
Example of an Internal Schemafor the Library
Example
  • INTERNAL Schema Library12 references Library.
  • Book is stored sequentially,
  • index on B using hashing,
  • index on Author using hashing.
  • Person is stored using hashing on ssn.
  • Check_out is stored sequentially,
  • index on since using B-tree.

14
Example Stored Database
Index on B Block B mod 10
Index on Author
Relation Book
1 11 51
20 30
0
W,
(1, C,W)
(20, Y,W) (51, C, B)
1
(11, Y,W) (30, Z, B)
Relation Checkout
(101,)
(200,) (500,)
0
Index on since
Relation Person Block sss mod 10
1
15
Modern Relational DBMS
Support for Web-Interfaces, XML, and Data Exchange
Transaction Concepts capability of running
many transactions in parallel support
for backup and recovery.
Modern DBMS
Support for OO capability to store operations
Support for data- driven computing
Efficient Implementation of Queries (Query
Optimization, Join Selection Indexing
techniques)
Support for Data Mining operations
Support for OLAP and Data Warehousing
Support for special Data-types long
fields, images, html-links, DNA-sequences, spatial
information,
Support for higher level user interfaces graphica
l, natural language, form-based,
16
Disks and Files
  • DBMS stores information on (hard) disks.
  • This has major implications for DBMS design!
  • READ transfer data from disk to main memory
    (RAM).
  • WRITE transfer data from RAM to disk.
  • Both are high-cost operations, relative to
    in-memory operations, so must be planned
    carefully!

17
Why Not Store Everything in Main Memory?
  • Costs too much. 100 will buy you either 512MB
    of RAM or 50GB of disk today --- that is disk
    storage 100 times cheaper (but it is approx.
    10000 times slower).
  • Main memory is volatile. We want data to be
    saved between runs. (Obviously!)
  • Typical storage hierarchy
  • Main memory (RAM) for currently used data.
  • Disk for the main database (secondary storage).
  • Tapes for archiving older versions of the data
    (tertiary storage).

Remark All reported disk performance/prize data
are as of middle of 2003
18
Components of a Disk
Spindle
Disk head
  • The platters spin (say, 90rps).
  • The arm assembly is moved in or out to position
    a head on a desired track. Tracks under heads
    make a cylinder (imaginary!).

Sector
Platters
  • Only one head reads/writes at any one time.
  • Block size is a multiple of sector
    size (which is fixed).

19
Accessing a Disk Page
  • Time to access (read/write) a disk block
  • seek time (moving arms to position disk head on
    track)
  • rotational delay (waiting for block to rotate
    under head)
  • transfer time (actually moving data to/from disk
    surface)
  • Seek time and rotational delay dominate.
  • Seek time varies from about 1 to 20msec
  • Rotational delay varies from 0 to 10msec
  • Transfer rate is about 1msec per 32KB page

20
Review The ACID properties
  • A tomicity All actions in the Xact happen, or
    none happen.
  • C onsistency If each Xact is consistent, and
    the DB starts consistent, it ends up consistent.
  • I solation Execution of one Xact is isolated
    from that of other Xacts.
  • D urability If a Xact commits, its effects
    persist.
  • The Recovery Manager guarantees Atomicity
    Durability.

21
Example
  • Consider two transactions (Xacts)

T1 BEGIN AA100, BB-100 END T2 BEGIN
A1.06A, B1.06B END
  • Intuitively, the first transaction is
    transferring 100 from Bs account to As
    account. The second is crediting both accounts
    with a 6 interest payment.
  • There is no guarantee that T1 will execute before
    T2 or vice-versa, if both are submitted together.
    However, the net effect must be equivalent to
    these two transactions running serially in some
    order.

22
Atomicity of Transactions
  • A transaction might commit after completing all
    its actions, or it could abort (or be aborted by
    the DBMS) after executing some actions.
  • A very important property guaranteed by the DBMS
    for all transactions is that they are atomic.
  • DBMS logs all actions so that it can undo the
    actions of aborted transactions and redo the
    actions of successful transactions.

23
Concurrency in a DBMS
  • Users submit transactions, and can think of each
    transaction as executing by itself.
  • Concurrency is achieved by the DBMS, which
    interleaves actions (reads/writes of DB objects)
    of various transactions.
  • Each transaction must leave the database in a
    consistent state if the DB is consistent when the
    transaction begins.
  • DBMS will enforce some ICs, depending on the ICs
    declared in CREATE TABLE statements.
  • Beyond this, the DBMS does not really understand
    the semantics of the data. (e.g., it does not
    understand how the interest on a bank account is
    computed).
  • Issues Effect of interleaving transactions, and
    crashes.

24
Example (Contd.)
  • Consider a possible interleaving (schedule)

T1 AA100, BB-100 T2
A1.06A, B1.06B
  • This is OK. But what about

T1 AA100, BB-100 T2
A1.06A, B1.06B
  • The DBMSs view of the second schedule

T1 R(A), W(A), R(B), W(B) T2
R(A), W(A), R(B), W(B)
25
Summary
  • Concurrency control and recovery are among the
    most important functions provided by a DBMS.
  • Users need not worry about concurrency.
  • System automatically inserts lock/unlock requests
    and schedules actions of different transactions
    in such a way as to ensure that the resulting
    execution is equivalent to executing the
    transactions one after the other in some order.
  • Write-ahead logging (WAL) is used to undo the
    actions of aborted transactions and to restore
    the system to a consistent state after a crash.
Write a Comment
User Comments (0)
About PowerShow.com