Storage and File Organization - PowerPoint PPT Presentation

1 / 36
About This Presentation
Title:

Storage and File Organization

Description:

Storage and File Organization General Overview - rel. model Relational model - SQL Formal & commercial query languages Functional Dependencies Normalization Physical ... – PowerPoint PPT presentation

Number of Views:68
Avg rating:3.0/5.0
Slides: 37
Provided by: csBuEduf
Learn more at: https://www.cs.bu.edu
Category:

less

Transcript and Presenter's Notes

Title: Storage and File Organization


1
Storage and File Organization
2
General Overview - rel. model
  • Relational model - SQL
  • Formal commercial query languages
  • Functional Dependencies
  • Normalization
  • Physical Design
  • Indexing

3
Review
  • DBMSs store data on disk
  • Disk characteristics
  • 2 orders of magnitude slower than MM
  • Unit of read/write operations a block/page
    (multiple of sectors)
  • Access time seek time rotation time
    transfer time
  • Sequential I/O much faster than random I/O
  • Database Systems try to minimize the overhead of
    moving data from and to disk

4
Review
  • Methods to improve MM to/from disk transfers
  • Improve the disk technology
  • Use faster disks (more RPMs)
  • Parallelization (RAID) some redundancy
  • Avoid unnecessary reads from disk
  • Buffer management go to buffer (MM) instead of
    disk
  • Good file organization
  • Other (OS based improvements) disk scheduling
    (elevator algorithm, batch writes, etc)

5
Buffer manager- Context
6
Buffer Management
  • Keep pages in a part of memory (buffer), read
    directly from there
  • What happens if you need to bring a new page into
    buffer and buffer is full you have to evict one
    page
  • Replacement policy
  • LRU Least Recently Used (CLOCK)
  • MRU Most Recently Used
  • Toss-immediate remove a page if you know that
    you will not need it again
  • Pinning (needed in recovery, index based
    processing,etc)
  • Other DB specific RPs DBMIN, LRU-k, 2Q

7
Buffer Management in a DBMS
Page Requests from Higher Levels
BUFFER POOL
disk page
free frame
MAIN MEMORY
DISK
choice of frame dictated by replacement policy
  • Data must be in RAM for DBMS to operate on it!
  • Buffer Mgr hides the fact that not all data is in
    RAM

8
When a Page is Requested ...
  • Buffer pool information table contains
    ltframe,
    pageid, pin_count, dirtygt
  • If requested page is not in pool and no free
    frame
  • Choose a frame for replacement.Only un-pinned
    pages are candidates!
  • If frame is dirty, write it to disk
  • Read requested page into chosen frame
  • Pin the page and return its address.
  • If requests can be predicted (e.g., sequential
    scans)
  • pages can be pre-fetched several pages at a
    time!

9
More on Buffer Management
  • Requestor of page must eventually unpin it, and
    indicate whether page has been modified
  • dirty bit is used for this.
  • Page in pool may be requested many times,
  • a pin count is used.
  • To pin a page, pin_count
  • A page is a candidate for replacement iff pin
    count 0 (unpinned)
  • CC recovery may entail additional I/O when a
    frame is chosen for replacement.
  • Write-Ahead Log protocol more later!

10
Buffer Replacement Policy
  • Frame is chosen for replacement by a replacement
    policy
  • Least-recently-used (LRU), MRU, Clock, etc.
  • Policy can have big impact on of I/Os depends
    on the access pattern.

Why MRU? Consider the following access pattern
(page ids) 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,1
,2,3,4,5,6,7,8,.
Assume a buffer of 10 pages After the first 10
pages, MRU will replace 10 with 11, 11 with 12,
12 with 13, 13 with 14, 14 with 15. Next 9 pages
will be found in and we need no disk reads for
them. Compare this approach against LRU.
11
LRU Replacement Policy
  • Least Recently Used (LRU)
  • for each page in buffer pool, keep track of time
    when last unpinned
  • replace the frame which has the oldest (earliest)
    time
  • very common policy intuitive and simple
  • Works well for repeated accesses to popular pages
  • Problems?
  • Problem Sequential flooding
  • LRU repeated sequential scans.
  • buffer frames lt pages in file means each page
    request causes an I/O.

12
Clock Replacement Policy
  • An approximation of LRU
  • Arrange frames into a cycle, store one reference
    bit per frame
  • Can think of this as the 2nd chance bit
  • When pin count reduces to 0, turn on ref. bit
  • When replacement necessary do for each page in
    cycle if (pincount 0 ref bit is
    on) turn off ref bit else if (pincount 0
    ref bit is off) choose this page for
    replacement until a page is chosen

13
DBMS vs. OS File System
  • OS does disk space buffer mgmt why not let
    OS manage these tasks?
  • Some limitations, e.g., files cant span disks.
  • Buffer management in DBMS requires ability to
  • pin a page in buffer pool, force a page to disk
    order writes (important for implementing CC
    recovery)
  • adjust replacement policy, and pre-fetch pages
    based on access patterns in typical DB operations.

14
File Organization
  • Basics
  • A database is a collection of files, file is a
    collection of records, record (tuple) is a
    collection of fields (attributes)
  • Files are stored on Disks (that use blocks to
    read and write)
  • Two important issues
  • Representation of each record
  • Grouping/Ordering of records and storage in
    blocks

15
File Organization
  • Goal and considerations
  • Compactness
  • Overhead of insertion/deletion
  • Retrieval speed sometime we prefer to bring more
    tuples than necessary in MM and use CPU to filter
    out the unnecessary ones!

16
Record Representation and Page Formats
  • Fixed-Length Records
  • Example
  • Account( acc-number char(10), branch-name
    char(20), balance real)
  • Each record is 38 bytes.
  • Store them sequentially, one after the other
  • Record1 at position 0, record2 at position 38,
    record3 at position 76 etc

Compactness (350 bytes)
17
Fixed-Length Records
  • Simple approach
  • Store record i starting from byte n ? (i 1),
    where n is the size of each record.
  • Record access is simple but records may cross
    blocks
  • Modification do not allow records to cross block
    boundaries
  • Insertion of record i Add at the end
  • Deletion of record i Two alternatives
  • move records
  • i 1, . . ., n to i, . . . , n 1
  • record n to i
  • do not move records, but link all free records
    on a free list

18
Free Lists
  • 2nd approach FLR with Free Lists
  • Store the address of the first deleted record in
    the file header.
  • Use this first record to store the address of the
    second deleted record, and so on
  • Can think of these stored addresses as pointers
    since they point to the location of a record.
  • More space efficient representation reuse space
    for normal attributes of free records to store
    pointers. (No pointers stored in in-use
    records.)

Better handling ins/del
Less compact 420 bytes
19
Page Formats Fixed Length Records
Slot 1
Slot 2
Free Space
. . .
Slot N
Slot M
M
1
0
. . .
1
1
M ... 3 2 1
number of slots
UNPACKED, BITMAP
20
Variable-Length Records
  • 3rd approach Variable-length records arise in
    database systems in several ways
  • Storage of multiple record types in a file.
  • Record types that allow variable lengths for one
    or more fields.
  • Record types that allow repeating fields or
    multivalued attribute.
  • Byte string representation
  • Attach an end-of-record (?) control character to
    the end of each record
  • Difficulty with deletion (leaves holes)
  • Difficulty with growth

4
?
?
?
R1
Field Count
R2
R3
21
Variable-Length Records Slotted Page Structure
  • 4th approach VLR-SP
  • Slotted page header contains
  • number of record entries
  • end of free space in the block
  • location and size of each record
  • Records stored at the bottom of the page
  • External tuple pointers point to record ptrs
    rec-id ltpage-id, slotgt

22
Rid (i,N)
Page i
Rid (i,2)
Rid (i,1)
N
Pointer to start of free space
20
16
24
N . . . 2 1
slots
SLOT DIRECTORY
Insertion 1) Use FP to find space and insert
2) Find available ptr in the
directory (or create a new one)
3) adjust FP and number of records
Deletion ?
23
Variable-Length Records (Cont.)
  • Fixed-length representation
  • reserved space
  • pointers
  • 5th approach Fixed Limit Records (for VLR)
  • Reserved space can use fixed-length records of
    a known maximum length unused space in shorter
    records filled with a null or end-of-record
    symbol.

24
Pointer Method
  • 6th approach Pointer method
  • Pointer method
  • A variable-length record is represented by a list
    of fixed-length records, chained together via
    pointers.
  • Can be used even if the maximum record length is
    not known

25
Pointer Method (Cont.)
  • Disadvantage to pointer structure space is
    wasted in all records except the first in a a
    chain.
  • Solution is to allow two kinds of block in file
  • Anchor block contains the first records of
    chain
  • Overflow block contains records other than
    those that are the first records of chairs.

26
Record Formats Fixed Length
F1
F2
F3
F4
L1
L2
L3
L4
Base address (B)
Address BL1L2
  • Information about field types same for all
    records in a file stored in system catalogs.
  • Finding ith field does not require scan of
    record.

27
Record Formats Variable Length
  • Two alternative formats ( fields is fixed)

F1 F2 F3
F4
Fields Delimited by Special Symbols
Field Count
F1 F2 F3 F4
Array of Field Offsets
  • Second offers direct access to ith field,
    efficient storage
  • of nulls (special dont know value) small
    directory overhead.

28
Column Stores
  • Another way to store records is to store them
    column-wise!
  • Store each column in different files.
  • Advantages?
  • Disadvantages?

29
Ordering and Grouping records
  • Issue 1
  • In what order we place records in a block?
  • Heap technique assign anywhere there is space
  • Ordered technique maintain an order on some
    attribute
  • So, we can use binary search if selection on this
    attribute.

30
File organization
  • Issue 2 In which blocks should records be
    placed
  • Many alternatives exist, each ideal for some
    situation , and not so good in others
  • Heap files Add at the end of the file.Suitable
    when typical access is a file scan retrieving all
    records.
  • Sorted FilesKeep the pages ordered. Best if
    records must be retrieved in some order, or only
    a range of records is needed.
  • Hashed Files Good for equality selections.
    Assign records to blocks according to their value
    for some attribute

31
Data Dictionary Storage
Data dictionary (also called system catalog)
stores metadata that is, data about data, such
as
  • Information about relations
  • names of relations
  • names and types of attributes of each relation
  • names and definitions of views
  • integrity constraints
  • User and accounting information, including
    passwords
  • Statistical and descriptive data
  • number of tuples in each relation
  • Physical file organization information
  • How relation is stored (sequential/hash/)
  • Physical location of relation
  • operating system file name or
  • disk addresses of blocks containing records of
    the relation
  • Information about indices

32
Data dictionary storage
  • Stored as tables!!
  • E-R diagram?
  • Relations, attributes, domains
  • Each relation has name, some attributes
  • Each attribute has name, length and domain
  • Also, views, integrity constraints, indices
  • User info (authorizations etc)
  • statistics

33
A-name
name
position
1
N
has
relation
attribute
domain
34
Data Dictionary Storage (Cont.)
  • A possible catalog representation

Relation-metadata (relation-name,
number-of-attributes,
storage-organization, location)Attribute-
metadata (attribute-name, relation-name,
domain-type, position, length) User-metadata
(user-name, encrypted-password,
group) Index-metadata (index-name,
relation-name, index-type, index-attributes) Vie
w-metadata (view-name, definition)
35
Large Objects
  • Large objects binary large objects (blobs) and
    character large objects (clobs)
  • Examples include
  • text documents
  • graphical data such as images and computer aided
    designs audio and video data
  • Large objects may need to be stored in a
    contiguous sequence of bytes when brought into
    memory.
  • If an object is bigger than a page, contiguous
    pages of the buffer pool must be allocated to
    store it.
  • May be preferable to disallow direct access to
    data, and only allow access through a
    file-system-like API, to remove need for
    contiguous storage.

36
Modifying Large Objects
  • If the application requires insert/delete of
    bytes from specified regions of an object
  • B-tree file organization can be modified to
    represent large objects
  • Each leaf page of the tree stores between half
    and 1 page worth of data from the object
  • Special-purpose application programs outside the
    database are used to manipulate large objects
  • Text data treated as a byte string manipulated by
    editors and formatters.
  • Graphical data and audio/video data is typically
    created and displayed by separate application
  • checkout/checkin method for concurrency control
    and creation of versions
Write a Comment
User Comments (0)
About PowerShow.com