CSC 485E/SENG 480D/CSC 571 Advanced Databases Introduction

Slides: 13
Provided by: scie232
1
CSC 485E/SENG 480D/CSC 571 Advanced Databases
Introduction
2
DB and DBMS
  • Database (DB): a collection of information that
    exists over a long period of time.
  • Database Management System (DBMS): complex
    software for handling large amounts of data
    efficiently and safely.

3
DBMS
  • Allows users to create new databases and specify
    their schema, using a data-definition language.
  • Enables users to query and modify the data, using
    a query and data-manipulation language.
  • Supports intelligent storage of very large
    amounts of data.
  • Protects data from accidents or improper use.
  • Example: We can require the DBMS to reject the
    insertion of two different people with the same
    SIN.
  • Allows efficient access to the data for queries
    and modifications.
  • Example: Indexes over specified fields.
  • Controls access to data from many users at once
    (concurrency), without allowing bad
    interactions that can corrupt the data
    accidentally.
  • Recovers from software failures and crashes.
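
As a small illustration of the SIN example above, here is a sketch using SQLite through Python's standard sqlite3 module. SQLite is only a stand-in for a generic DBMS, and the table name People, its columns, and the function name are assumptions made for this example.

```python
import sqlite3


def try_insert_duplicate_sin():
    """Return the error text raised when a second person reuses a SIN."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    # Data-definition language: declare the schema and the uniqueness rule.
    # "People", "name" and "sin" are hypothetical names for this sketch.
    conn.execute("CREATE TABLE People (name TEXT, sin INTEGER UNIQUE)")
    # Data-manipulation language: the first insertion is accepted.
    conn.execute("INSERT INTO People VALUES ('Smith', 123456789)")
    try:
        # A different person with the same SIN violates the constraint.
        conn.execute("INSERT INTO People VALUES ('Jones', 123456789)")
    except sqlite3.IntegrityError as err:
        return str(err)
    finally:
        conn.close()
    return None


if __name__ == "__main__":
    print(try_insert_duplicate_sin())
```

The DBMS, not the application, rejects the second row, which is the kind of protection from improper use the slide describes.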

4
Database Studies
  • Design of databases.
  • What kinds of information go into the database?
  • How is the information structured?
  • How do data items connect?
  • Database programming.
  • How does one express queries on the database?
  • How is database programming combined with
    conventional programming?
  • Database system implementation.
  • How does one build a DBMS, including such matters
    as query processing, transaction processing and
    organizing storage for efficient access?

We'll focus on this part
5
Fictitious Megatron DBMS
  • Stores relations as Unix files
  • Students(name, sid, dept) is stored in the file
    /home/megatron/students as
  • Smith#123#CS
  • Jones#533#EE
  • Schemas are stored in /home/megatron/schemas, e.g.
  • Students#name#STR#id#INT#dept#STR
  • Depts#name#STR#office#STR
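
The file layout above can be sketched in a few lines of Python. This is an illustration only: it assumes fields are separated by '#' characters and that a schema line alternates attribute names and types after the relation name. The helper names parse_schemas and parse_tuples are hypothetical.

```python
def parse_schemas(text):
    """Map each relation name to a list of (attribute, type) pairs."""
    schemas = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.strip().split("#")
        name, rest = fields[0], fields[1:]
        # The remaining fields alternate: attribute, type, attribute, type...
        schemas[name] = list(zip(rest[0::2], rest[1::2]))
    return schemas


def parse_tuples(text, schema):
    """Turn '#'-separated lines into dicts, converting INT fields to int."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        values = line.strip().split("#")
        row = {}
        for (attr, typ), value in zip(schema, values):
            row[attr] = int(value) if typ == "INT" else value
        rows.append(row)
    return rows


if __name__ == "__main__":
    schemas = parse_schemas("Students#name#STR#id#INT#dept#STR\n"
                            "Depts#name#STR#office#STR\n")
    print(parse_tuples("Smith#123#CS\nJones#533#EE\n", schemas["Students"]))
```

Note that nothing in the data file itself says which field is which; the tuples are only interpretable together with the schema file.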

6
Megatron sample session
  • mayne megatron
  • WELCOME TO MEGATRON 2006
  • megaSQL SELECT * FROM Students
  • Name id dept
  • -------------------------------------
  • Smith 123 CSC
  • Johnson 522 ECE
  • megaSQL

7
Megatron sample session II
  • megaSQL SELECT * FROM Students
  • WHERE id > 500
  • Johnson#522#EE
  • megaSQL quit
  • THANK YOU FOR USING MEGATRON 2006
  • mayne

8
Megatron Implementation
  • To execute SELECT * FROM R WHERE <COND>
  • Read the schema file to get the attributes of R
  • Check that <COND> is semantically valid for R
  • Read file R,
  • for each line
  • check the condition
  • if it holds, display the line
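
The steps above can be sketched as a single Python function. This is a minimal illustration, not Megatron's actual code: it assumes '#'-separated files, a schema file named schemas, a data file named after the lower-cased relation, and conditions of the simple form "attribute operator constant". The name megatron_select and its parameters are hypothetical.

```python
import operator
import os

# Comparison operators supported by this sketch (an assumption).
OPS = {"<": operator.lt, ">": operator.gt, "=": operator.eq}


def megatron_select(home, relation, attr, op, constant):
    """Run SELECT * FROM relation WHERE attr op constant by a full scan."""
    # Step 1: read the schema file to get the attributes of the relation.
    schema = None
    with open(os.path.join(home, "schemas")) as f:
        for line in f:
            fields = line.strip().split("#")
            if fields[0] == relation:
                schema = list(zip(fields[1::2], fields[2::2]))
    if schema is None:
        raise ValueError(f"unknown relation {relation!r}")

    # Step 2: check that the condition is semantically valid.
    types = dict(schema)
    if attr not in types:
        raise ValueError(f"{relation} has no attribute {attr!r}")
    if op not in OPS:
        raise ValueError(f"unsupported operator {op!r}")
    expected = int if types[attr] == "INT" else str
    if not isinstance(constant, expected):
        raise TypeError(f"{attr} is {types[attr]}, got {constant!r}")

    # Step 3: read the whole file; test each line; keep the matches.
    position = [a for a, _ in schema].index(attr)
    result = []
    with open(os.path.join(home, relation.lower())) as f:
        for line in f:
            if not line.strip():
                continue
            values = line.strip().split("#")
            value = values[position]
            if types[attr] == "INT":
                value = int(value)
            if OPS[op](value, constant):
                result.append(values)
    return result
```

Every query reads the schema file and then every line of the relation, whether or not the line matches.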

9
What's wrong with Megatron?
  • Tuple layout on disk: no flexibility for DB
    modifications.
  • Change CSC to ECON and the entire file has to be
    rewritten.
  • Search is expensive: with no indexes, the entire
    relation is always read.
  • Brute-force query processing.
  • No buffer manager: everything comes off of disk
    all the time.
  • No concurrency control: several users can modify
    a file at the same time with unpredictable
    results.
  • No reliability: can lose data in a crash or leave
    operations half done.
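
To see why changing CSC to ECON forces a rewrite, the sketch below tracks the byte offset at which each line starts. It assumes '#'-separated, newline-terminated ASCII lines; the helper names line_offsets and replace_dept are hypothetical.

```python
def line_offsets(text):
    """Return the starting offset of every line in the file contents."""
    offsets, pos = [], 0
    for line in text.splitlines(keepends=True):
        offsets.append(pos)
        pos += len(line)  # ASCII assumed, so characters equal bytes
    return offsets


def replace_dept(text, name, new_dept):
    """Rebuild the whole file with one student's dept field changed."""
    lines = []
    for line in text.splitlines():
        fields = line.split("#")
        if fields[0] == name:
            fields[2] = new_dept
        lines.append("#".join(fields))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    before = "Smith#123#CSC\nJohnson#522#ECE\n"
    after = replace_dept(before, "Smith", "ECON")
    print(line_offsets(before))  # [0, 14]
    print(line_offsets(after))   # [0, 15]
```

Because the new value is one character longer, every line after the modified one starts at a different offset, so the tail of the file cannot be left in place.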

10
Architecture of a DBMS
  • The cylindrical component contains not only
    data, but also metadata, i.e., information about
    the structure of the data.
  • If the DBMS is relational, the metadata includes
  • names of relations,
  • names of attributes of those relations, and
  • data types for those attributes (e.g., integer or
    character string).
  • A database also maintains indexes for the data.
  • Indexes are part of the stored data.
  • Description of which attributes have indexes is
    part of the metadata.
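
As an illustration of metadata in a real relational system, the sketch below asks SQLite (through Python's standard sqlite3 module) for its catalog. SQLite is used here only as a convenient example; the index name students_by_id and the function name are assumptions.

```python
import sqlite3


def describe_metadata():
    """Return (catalog objects, column types, indexed attributes)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Students (name TEXT, id INTEGER, dept TEXT)")
    # "students_by_id" is a hypothetical index name for this sketch.
    conn.execute("CREATE INDEX students_by_id ON Students (id)")

    # sqlite_master is SQLite's catalog: names of relations and indexes.
    objects = conn.execute(
        "SELECT type, name FROM sqlite_master ORDER BY type, name"
    ).fetchall()
    # table_info rows are (cid, name, type, notnull, dflt_value, pk).
    columns = [(row[1], row[2])
               for row in conn.execute("PRAGMA table_info(Students)")]
    # index_info rows are (seqno, cid, name): which attributes are indexed.
    indexed = [row[2]
               for row in conn.execute("PRAGMA index_info(students_by_id)")]
    conn.close()
    return objects, columns, indexed


if __name__ == "__main__":
    print(describe_metadata())
```

The index contents are stored data, while the fact that id has an index is metadata, matching the distinction drawn above.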

11
What will be covered
  • Secondary Storage Management
  • Disk Mechanics
  • Disk Computation Model
  • Handling disk failures
  • Index Structures
  • B-Trees, Extensible Hash Tables, etc.
  • Multidimensional Indexes (for GIS and OLAP)
  • Query Execution
  • Algorithms for relational operators
  • Join methods.
  • Query Compiler
  • Algebraic laws for improving query plans.
  • Cost-based plan selection
  • Join orders

12
What will be covered
  • Parallel and Distributed Databases
  • Parallel algorithms on relations
  • Distributed query processing
  • Distributed transactions
  • Google's Map-Reduce framework
  • Peer-to-peer distributed search
  • Data Mining
  • Frequent-Itemset Mining
  • Finding similar items
  • Clustering of large-scale data
  • Databases and the Internet
  • Search engines
  • PageRank
  • Data streams
  • Data mining of streams