CS 4420 Database System Implementation - PowerPoint PPT Presentation

1 / 42
About This Presentation
Title:

CS 4420 Database System Implementation

Description:

Hector Garcia-Molina, Jeffrey D. Ullman, and Jennifer Widom, Database System ... http://www-db.stanford.edu/~hector/cs245/notes.htm ... – PowerPoint PPT presentation

Number of Views:88
Avg rating:3.0/5.0
Slides: 43
Provided by: ling89
Category:

less

Transcript and Presenter's Notes

Title: CS 4420 Database System Implementation


1
CS 4420 Database System Implementation
  • Ling Liu
  • Associate Professor
  • College of Computing, Georgia Tech

2
Lection 1 Outline
  • A Brief Review of Syllable
  • Lecture, Grading Scheme, Projects, Exams
  • Implementing a Database System through H.G.-M.s
    Strawman DBMS
  • Is it simple?
  • Why is it not?
  • What are the important issues?
  • Issues to be covered in this course

3
Course Documents
  • Required textbook
  • Hector Garcia-Molina, Jeffrey D. Ullman, and
    Jennifer Widom, Database System Implementation,
    Prentice Hall, 2000
  • Lecture Notes
  • http//www-db.stanford.edu/hector/cs245/notes.ht
    m
  • Additional notes online at http//www.cc.gatech.e
    du/classes/AY2004/cs4420_spring/lectures.html

4
Administravia
  • Office Hours
  • Tuesday Thursday 430 530 pm in CCB 102
  • Grading
  • Midterm 1 20
  • Midterm 2 20
  • Project 40
  • Final 20
  • Announcements
  • In class course material will be available
    electronically
  • Newsgroup
  • git.cc.class.cs4420 -- make sure you check this
    regularly
  • TA
  • Weiling Zhu
  • office hours 2-4pm Friday or by appointment,
    location TBA
  • James Caverlee ()
  • office hours 1-3pm Tuesday or by appointment,
    location CCB225B

5
Mechanics
  • Cs4420 home http//www.cc.gatech.edu/classes/AY200
    4/cs4420_spring/
  • Important
  • All handouts assignments will be posted on our
    Web site at http//www.cc.gatech.edu/cla
    sses/AY2003/cs4420_spring/
  • Please check it periodically for last minute
    announcements.

6
Course Project
  • Goal
  • Design a small database management system
  • Team Project 3-4 members per group
  • Two phase deliverables
  • Phase I design report -- 35
  • Phase II Application Programs Final project
    report -- 65
  • Two Alternative Projects (Choose one)

7
Project Option 1
  • Phase 1
  • Implement Storage Manager to handle
  • 1.1)   creation of relations and indexes
  • 1.2)  inserting records into a file (sorted
    sequential file)
  • 1.3)  inserting key values and addresses into the
    index
  • 1.4)  fetching a page
  • Phase 2
  • Implement Query Engine that has 4 components
  • 2.1) scanner/parser/validator
  • 2.2) query optimizer
  • 2.3) query code generator
  • 2.4) runtime database processor

8
Project Option 2
  • Phase 1
  • Study an open-source DB software package such as
    tinySQL (java, primitive and simple)
  • Understand implementation details, such as how to
    handle
  • 1.1)  creation of relations and indexes if any
  • 1.2)  inserting records into a file
  • 1.3)  inserting key values and addresses into the
    index
  • 1.4)  fetching a page
  • 1.5) Basic query processing steps
  • Phase 2
  • Improving the package in three aspects. Examples
    include
  • 2.1) query processor (add join, aggregate, or
    nested queries)
  • 2.2) support larger data files
  • 2.3) add (additional) query optimization methods
    (incl. Index methods)
  • 2.4) support online aggregation/approximate
    queries

9
Lecture 1
  • Strawman DBMS from Stanford
  • Focus on process from Relation ? SQL ? Result
  • Many problems with simple implementation
  • Topics Covered in this course

10
H.G.-M.s Strawman DBMS
  • MEGATRON 3000 DBMS

11
Isnt Implementing a Database System Simple?
Consider a prototype DBMS used in the CS database
courses in Stanford Megatron In fact, our
course project option 1 is a simpler version of
the Megatron 3000.
Note by Hector Garcia-Molina
12
Introducing the
MEGATRON 3000
Database Management System
  • The latest from Megatron Labs
  • Incorporates latest relational technology
  • UNIX compatible

Note by Hector Garcia-Molina
13
Megatron 3000 Implementation Details
  • Relations stored in files (ASCII)
  • e.g., relation R is in /usr/db/R

Smith 123 CS Jones 522 EE
.
.
.
Note by Hector Garcia-Molina
14
Megatron 3000 Implementation Details
  • Directory file (ASCII) in /usr/db/directory

Students name STR id INT dept STR
Depts name STR office STR
.
.
.
Note by Hector Garcia-Molina
15
Megatron 3000Sample Sessions
MEGATRON3000 Welcome to MEGATRON 3000!
quit
.
.
.
Note by Hector Garcia-Molina
16
Megatron 3000Sample Sessions
select from Students Relation
Students Name id dept SMITH
123 CS Johnson 522 EE
Note by J. Ullman
17
Megatron 3000Sample Sessions
select id, office from Students, Dept where
Students.dept Dept.name and id gt 100
id Office 123 CCB
522 MiRC
Note by Hector Garcia-Molina
18
Megatron 3000Sample Sessions
select from Students lpr
Result sent to lpr (printer).
Note by J. Ullman
19
Megatron 3000Sample Sessions
select from Students where id lt 500 R

New relation R created.
Note by J. Ullman
20
Megatron 3000
  • To execute select from R where condition
  • (1) Read schema file to determine attributes of
    R and their types
  • (2) Read R file, for each line (record)
  • (a) Check condition
  • (b) Display if true

Note by Hector Garcia-Molina
21
Megatron 3000
  • To execute select from R where
    condition T
  • (1) Process select as before
  • (2) Write results to new file T
  • (3) Append entry for T to the schema file

Note by Hector Garcia-Molina
22
Megatron 3000
  • To execute
  • select office
  • from Students, Dept
  • where Students.name Smith
  • AND Students.dept Depts.name
  • (1) Read schema file to determine attributes of
    Students and Dept and their types
  • (2) Read Students file, for each line (record)
  • (a) Read Dept file, for each line (record)
  • (i) Create join tuple
  • (ii) Check condition over the joined tuple
  • (iii) Display if true

Note by Hector Garcia-Molina
23
Whats wrong with the Megatron 3000 DBMS?
Note by Hector Garcia-Molina
24
Whats wrong with the Megatron 3000 DBMS?
  • Tuple layout on disk no flexibility for DB
    modification
  • - Change string from EE to ECON and the
    entire file has to be to rewritten
  • - ASCII storage is expensive
  • - Deletions are expensive

Note by Hector Garcia-Molina
25
Whats wrong with the Megatron 3000 DBMS?
  • Search expensive no indexes always read entire
    relation
  • e.g., - Cannot find tuple with given key quickly
  • - Always have to read full relation

Note by Hector Garcia-Molina
26
Whats wrong with the Megatron 3000 DBMS?
  • Brute force query processing
  • Did we need to look at all pairs of student-dept
    tuples?
  • e.g., select
  • from R,S
  • where R.A S.A and S.B gt 1000
  • - Do select first?
  • - More efficient join?

Note by Hector Garcia-Molina
27
Whats wrong with the Megatron 3000 DBMS?
  • No buffer manager Everything comes off of disk
    all the time
  • e.g., Need caching

Note by Hector Garcia-Molina
28
Whats wrong with the Megatron 3000 DBMS?
  • No concurrency control
  • Several users can modify a file at the same time
    with unpredictable results

Note by Hector Garcia-Molina
29
Whats wrong with the Megatron 3000 DBMS?
  • No reliability
  • e.g., - Can lose data in a crash or
  • - Can leave operations half done

Note by Hector Garcia-Molina
30
Whats wrong with the Megatron 3000 DBMS?
  • Little security
  • e.g., - File system insecure
  • - File system protection too coarse

Note by Hector Garcia-Molina
31
Whats wrong with the Megatron 3000 DBMS?
  • No GUI

Note by Hector Garcia-Molina
32
Course Overview
  • File System Structure
  • Records in blocks, dictionary, buffer
    management,
  • Indexing Hashing
  • B-Trees, hashing,
  • Query Processing
  • Query costs, join strategies,
  • Crash Recovery
  • Failures, stable storage,

Note by Hector Garcia-Molina
33
Course Overview
  • Concurrency Control
  • Correctness, locks,
  • Transaction Processing
  • Logs, deadlocks,
  • Security Integrity
  • Authorization, encryption,
  • Distributed Databases
  • Interoperation, distributed recovery,

Note by Hector Garcia-Molina
34
System Structure
Query Parser
User
Strategy Selector
User Transaction
Transaction Manager
Buffer Manager
Recovery Manager
Concurrency Control
File Manager
Log
Lock Table
M.M. Buffer
Statistical Data
Indexes
User Data
System Data
Note by Hector Garcia-Molina
35
GT Database Courses
CS 4400
Fall, Spring, Summer
you are here
CS 4420
Spring
CS 6400
CS 6411
CS 6421
CS 6430
CS 8803
Database Systems Concepts and Design
Object-Oriented Database Models and Systems
Active and Dynamic Database Management Systems
Parallel and Distributed Database Systems and
Applications
Special Topic Courses
36
Some Terms
  • Database system
  • Transaction processing system
  • File access system
  • Information retrieval system

37
Next time
  • Hardware Data Storage and Accessing Disk Blocks
  • Reading Assignment chapters 1 and 2

38
Lecture 2
  • Hardware Data Storage
  • Memory Hierarchy
  • Volatile Storage v.s. Non-volatile Storage
  • DBMSs cannot allow data to be lost, ever
  • Disk
  • Disk Head, cylinder, track, sector, disk
    controller
  • Accessing Disk Blocks
  • Seek time
  • Time to move heads to the proper cylinder
  • Rotational Time
  • Time for desired block to come under head
  • Transfer Time
  • Time during which block passes under head

39
Mechanics
  • http//www.cc.gatech.edu/classes/AY2003/cs4420_spr
    ing/
  • Important
  • All handouts assignments will be posted on our
    Web site at http//www.cc.gatech.edu
    /classes/AY2003/cs4420_spring/
  • Please check it periodically for last minute
    announcements.

40
Tentative Syllabus
  • DATE CHAPTER
    TOPIC
  • Tuesday January 7
    Introduction
  • Thursday January 9 Ch. 2
    Hardware
  • Tuesday January 14 Ch. 3
    File and System Structure
  • Thursday January 16 Ch. 3
    File and System Structure
  • Tuesday January 21 Ch. 4
    Indexing and Hashing
  • Thursday January 23 Ch. 4
    Indexing and Hashing
  • Tuesday January 28 Ch. 5
    Indexing and Hashing
  • Thursday January 30 Ch. 6
    Query Processing
  • Tuesday February 4 Ch. 6
    Query Processing
  • Thursday February 6 Midterm
  • Tuesday February 11 Ch7
    Query Optimization
  • Thursday February 13 Ch. 8
    Crash Recovery
  • Tuesday February 18 Ch. 8
    Crash Recovery
  • Thursday February 20 Ch. 9
    Concurrency Control
  • Tuesday February 25 Ch. 9
    Concurrency Control
  • Thursday February 27 Ch. 9
    Concurrency Control
  • Tuesday March 11 Ch. 10
    Transaction Processing
  • Thursday March 13 Ch. 10
    Transaction Processing

41
Today -- Lecture 3
  • Disk Organization
  • How to layout data on disk
  • how to move it to memory
  • Good DBMS Algorithms
  • Try to make sure that if we read a block, we use
    much of the data on the block
  • Try to put blocks that are accessed together on
    the same cylinder
  • Try to buffer commonly used blocks in main memory

42
Tentative Syllabus
  • DATE CHAPTER
    TOPIC
  • Tuesday week 1
    Introduction
  • Thursday Week1 Ch. 2
    Hardware
  • Tuesday week 2 Ch. 3 File
    and System Structure
  • Thursday January 16 Ch. 3
    File and System Structure
  • Tuesday January 21 Ch. 4
    Indexing and Hashing
  • Thursday January 23 Ch. 4
    Indexing and Hashing
  • Tuesday January 28 Ch. 5
    Indexing and Hashing
  • Thursday January 30 Ch. 6
    Query Processing
  • Tuesday February 4 Ch. 6
    Query Processing
  • Thursday February 6 Midterm
  • Tuesday February 11 Ch7
    Query Optimization
  • Thursday February 13 Ch. 8
    Crash Recovery
  • Tuesday February 18 Ch. 8
    Crash Recovery
  • Thursday February 20 Ch. 9
    Concurrency Control
  • Tuesday February 25 Ch. 9
    Concurrency Control
  • Thursday February 27 Ch. 9
    Concurrency Control
  • Tuesday March 11 Ch. 10
    Transaction Processing
  • Thursday March 13 Ch. 10
    Transaction Processing
Write a Comment
User Comments (0)
About PowerShow.com