FILE - PowerPoint PPT Presentation

Provided by: RaminSh9
Transcript and Presenter's Notes

Title: FILE


1
FILE SYSTEM STRUCTURE (CHAPTER 11)
2
TERMINOLOGY
  • Computers represent data as a sequence of zeros
    and ones, termed bits
  • A byte is eight contiguous bits

0101111110011010101010110000000000..00000
0101111110011010101010110000000000..00000
5
TERMINOLOGY
  • Computers represent data as a sequence of zeros
    and ones, termed bits
  • A byte is eight contiguous bits
  • 1024 bytes is one Kilobyte (KB), e.g., your
    textbook.
  • 1024 KB is one Megabyte (MB), e.g., a high
    resolution photograph.
  • 1024 MB is one Gigabyte (GB), e.g., a DVD quality
    movie.
  • 1024 GB is one Terabyte (TB), e.g., all text in
    the Library of Congress.
  • 1024 TB is one Petabyte (PB), e.g., the entire
    multimedia collection at the LoC.
  • 1024 PB is one Exabyte (EB), e.g., a record of
    all phone conversations in a year.
  • 1024 EB is one Zettabyte (ZB), e.g., all
    uncompressed medical data.

0101111110011010101010110000000000..00000
0101111110011010101010110000000000..00000
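
The unit ladder above can be sketched in a few lines of Python. This is only an illustration: the names `UNITS`, `to_bytes`, and `humanize` are made up for the example, and every step is taken to be a factor of 1024, as on the slides.

```python
# Binary storage units, each 1024 times the previous one.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]

def to_bytes(value, unit):
    """Convert a value expressed in the given unit to a number of bytes."""
    return value * 1024 ** UNITS.index(unit)

def humanize(num_bytes):
    """Express a byte count in the largest unit that keeps the value below 1024."""
    value = float(num_bytes)
    for unit in UNITS:
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:.1f} {unit}"
        value /= 1024

print(to_bytes(1, "GB"))        # 1073741824
print(humanize(5 * 1024 ** 4))  # 5.0 TB
```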
12
HOW MUCH DATA IS THERE?
  • Approximately 5000 films are made each year
    (worldwide)
  • Two hours of display time each @ 240 Mbps = 900 TB
  • Approximately 52 billion photographs are taken
    each year
  • @ 10 KB per photograph = 520 PB
  • Library of Congress
  • 20 million books @ 1 MB = 20 TB
  • 15 million photographs @ 1 MB = 13 TB
  • 4 million maps @ 100 MB = 400 TB
  • 500,000 movies @ 10 GB = 5 PB
  • 3.5 million sound recordings @ 1 audio CD
    each = 2 PB
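
The estimates on this slide are simple products of a count and a per-item size, so they are easy to recompute. The sketch below does that for a few of the lines, assuming 1024-based units and reading "240 Mbps" as 240 million bits per second (both assumptions, not stated on the slide). The results are rough: the film total comes out near 1000 TB, and the photograph line only reaches the petabyte range if each photograph is about 10 MB; at 10 KB per photograph the product is roughly 500 TB.

```python
KB, MB, GB, TB, PB = (1024 ** n for n in range(1, 6))

# Assumption: 240 Mbps means 240 * 10**6 bits per second.
film_bytes = 240e6 / 8 * 2 * 3600      # one two-hour film
films_total = 5000 * film_bytes        # 5000 films per year

photos_at_10kb = 52e9 * 10 * KB        # per-photo size as printed on the slide
photos_at_10mb = 52e9 * 10 * MB        # per-photo size that yields petabytes

loc_books = 20e6 * 1 * MB
loc_maps = 4e6 * 100 * MB
loc_movies = 500_000 * 10 * GB

print(f"films per year:     {films_total / TB:8.0f} TB")
print(f"photos @ 10 KB:     {photos_at_10kb / TB:8.0f} TB")
print(f"photos @ 10 MB:     {photos_at_10mb / PB:8.0f} PB")
print(f"LoC books:          {loc_books / TB:8.1f} TB")
print(f"LoC maps:           {loc_maps / TB:8.0f} TB")
print(f"LoC movies:         {loc_movies / PB:8.1f} PB")
```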

13
Physical Storage Media
  • A system consists of several forms of storage
  • Cache: the fastest and most costly form of
    storage; volatile; managed by the computer
    system hardware.
  • Main memory
  • fast access (10 ns to 100 ns; 1 nanosecond =
    $10^{-9}$ seconds)
  • generally too small (or too expensive) to store
    the entire database
  • capacities of up to a few Gigabytes widely used
    currently
  • Capacities have gone up and per-byte costs have
    decreased steadily and rapidly (roughly a factor
    of 2 every 2 to 3 years)
  • Volatile: contents of main memory are usually
    lost if a power failure or system crash occurs.

14
Physical Storage Media (Cont.)
  • Magnetic disk
  • Data is stored on a spinning disk, and
    read/written magnetically
  • Primary medium for the long-term storage of data;
    typically stores the entire database.
  • Data must be moved from disk to main memory for
    access, and written back for storage
  • Much slower access than main memory (more on this
    later)
  • Direct access: it is possible to read data on
    disk in any order, unlike magnetic tape
  • Capacities range up to roughly ? GB currently
  • Much larger capacity and much lower cost/byte
    than main memory
  • Capacity is growing constantly and rapidly with
    technology improvements (factor of 2 to 3 every
    2 years)
  • Survives power failures and system crashes
  • Disk failure can destroy data, but is very rare

15
Magnetic Hard Disk Mechanism
NOTE: Diagram is schematic, and simplifies the
structure of actual disk drives
16
Magnetic Disks
  • Read-write head
  • Positioned very close to the platter surface
    (almost touching it)
  • Reads or writes magnetically encoded information.
  • Surface of platter divided into circular tracks
  • Over 16,000 tracks per platter on typical hard
    disks
  • Each track is divided into sectors.
  • A sector is the smallest unit of data that can be
    read or written.
  • Sector size: typically 512 bytes
  • Typical sectors per track: 200 (on inner tracks)
    to 400 (on outer tracks)
  • To read/write a sector
  • disk arm swings to position head on the right
    track
  • platter spins continually; data is read/written
    as the sector passes under the head
  • Head-disk assemblies
  • multiple disk platters on a single spindle
    (typically 2 to 4)
  • one head per platter, mounted on a common arm.
  • Cylinder i consists of the ith track of all the
    platters
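
The geometry figures above are enough for a back-of-the-envelope capacity estimate. The sketch below multiplies them out; `disk_capacity_bytes` and `cylinder` are hypothetical helper names, the average of 300 sectors per track is an assumed midpoint of the 200 to 400 range, and one recording head per platter is taken from the slide (many real drives record on both surfaces of a platter).

```python
def disk_capacity_bytes(platters, tracks_per_platter, avg_sectors_per_track,
                        sector_size=512):
    """Capacity = platters x tracks x sectors per track x bytes per sector."""
    return platters * tracks_per_platter * avg_sectors_per_track * sector_size

def cylinder(i, platters):
    """Cylinder i is the set of (platter, track) pairs with track number i."""
    return [(p, i) for p in range(platters)]

capacity = disk_capacity_bytes(platters=4, tracks_per_platter=16_000,
                               avg_sectors_per_track=300)
print(capacity)                          # 9830400000
print(f"{capacity / 1024 ** 3:.2f} GB")  # 9.16 GB
print(cylinder(42, platters=4))
```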

17
Magnetic Disks (Cont.)
  • Earlier generation disks were susceptible to
    head crashes
  • Surface of earlier generation disks had
    metal-oxide coatings which would disintegrate on
    a head crash and damage all data on the disk
  • Current generation disks are less susceptible to
    such disastrous failures, although individual
    sectors may get corrupted
  • Disk controller: interfaces between the computer
    system and the disk drive hardware.
  • accepts high-level commands to read or write a
    sector
  • initiates actions such as moving the disk arm to
    the right track and actually reading or writing
    the data
  • Computes and attaches a checksum to each sector
    to verify that data is read back correctly
  • If data is corrupted, with very high probability
    the stored checksum won't match the recomputed
    checksum
  • Ensures successful writing by reading back the
    sector after writing it
  • Performs remapping of bad sectors
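
The checksum idea can be illustrated with a small sketch. It is not how any particular controller works: CRC-32 from Python's `zlib` module stands in for whatever code the hardware actually uses, and `write_sector` and `read_sector` are hypothetical names.

```python
import zlib

SECTOR_SIZE = 512

def write_sector(data: bytes) -> bytes:
    """Return the stored form of a sector: payload followed by a 4-byte CRC-32."""
    if len(data) != SECTOR_SIZE:
        raise ValueError("a sector holds exactly 512 bytes")
    return data + zlib.crc32(data).to_bytes(4, "big")

def read_sector(stored: bytes) -> bytes:
    """Recompute the checksum and compare it with the stored one."""
    payload, stored_crc = stored[:-4], int.from_bytes(stored[-4:], "big")
    if zlib.crc32(payload) != stored_crc:
        raise IOError("checksum mismatch: sector is corrupted")
    return payload

sector = write_sector(bytes(range(256)) * 2)
assert read_sector(sector) == bytes(range(256)) * 2

damaged = bytearray(sector)
damaged[10] ^= 0x01                  # flip one bit to simulate corruption
try:
    read_sector(bytes(damaged))
except IOError as err:
    print(err)
```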

18
Disk Subsystem
  • Multiple disks connected to a computer system
    through a controller
  • Controller functionality (checksums, bad-sector
    remapping) is often carried out by the individual
    disks, which reduces load on the controller
  • Disk interface standards families
  • ATA (AT Attachment) range of standards
  • SCSI (Small Computer System Interface) range
    of standards
  • Several variants of each standard (different
    speeds and capabilities)

19
Performance Measures of Disks
  • Access time: the time it takes from when a read
    or write request is issued to when data transfer
    begins. Consists of
  • Seek time: the time it takes to reposition the
    arm over the correct track.
  • Average seek time is 1/2 the worst case seek
    time.
  • Would be 1/3 if all tracks had the same number of
    sectors, and we ignore the time to start and stop
    arm movement
  • 4 to 10 milliseconds on typical disks
  • Rotational latency: the time it takes for the
    sector to be accessed to appear under the head.
  • Average latency is 1/2 of the worst case
    latency.
  • Worst case (one full rotation) is 4 to 11
    milliseconds on typical disks (15000 down to
    5400 r.p.m.), so the average is about 2 to 5.5
    milliseconds
  • Data-transfer rate: the rate at which data can
    be retrieved from or stored to the disk.
  • 4 to 8 MB per second is typical
  • Multiple disks may share a controller, so the
    rate that the controller can handle is also
    important
  • E.g. ATA-5: 66 MB/second, SCSI-3: 40 MB/s
  • Fiber Channel: 256 MB/s
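
Rotational latency follows directly from the spindle speed: one rotation takes 60/rpm seconds, and on average the head waits half a rotation. The sketch below works this out and adds it to a seek time; the function names and the sample values (8 ms seek, 4 KB block, 4 MB/s) are illustrative assumptions.

```python
def rotation_time_ms(rpm):
    """Time for one full rotation, which is the worst-case rotational latency."""
    return 60_000 / rpm

def avg_rotational_latency_ms(rpm):
    """On average the wanted sector is half a rotation away."""
    return rotation_time_ms(rpm) / 2

def access_time_ms(avg_seek_ms, rpm):
    """Access time = seek time + rotational latency."""
    return avg_seek_ms + avg_rotational_latency_ms(rpm)

def transfer_time_ms(num_bytes, rate_mb_per_s):
    """Time to move num_bytes at the given data-transfer rate."""
    return num_bytes / (rate_mb_per_s * 1024 * 1024) * 1000

for rpm in (5400, 15000):
    print(rpm, round(rotation_time_ms(rpm), 2), round(avg_rotational_latency_ms(rpm), 2))

print(access_time_ms(avg_seek_ms=8, rpm=15000))   # 10.0
print(transfer_time_ms(4096, rate_mb_per_s=4))
```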

20
Optimization of Disk-Block Access
  • Block: a contiguous sequence of sectors from a
    single track
  • data is transferred between disk and main memory
    in blocks
  • sizes range from 512 bytes to several kilobytes
  • Smaller blocks: more transfers from disk
  • Larger blocks: more space wasted due to
    partially filled blocks
  • Typical block sizes today range from 4 to 16
    kilobytes
  • Disk-arm-scheduling algorithms order pending
    accesses to tracks so that disk arm movement is
    minimized
  • Elevator algorithm: move the disk arm in one
    direction (from outer to inner tracks or vice
    versa), processing the next request in that
    direction, till there are no more requests in
    that direction, then reverse direction and repeat
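
A minimal sketch of the elevator algorithm for a fixed batch of pending requests follows. Track numbers stand in for arm positions, the request list is a made-up example, and real schedulers also have to handle requests that arrive while the arm is moving, which this sketch ignores.

```python
def elevator_order(requests, head, direction="up"):
    """Service order: sweep in one direction, then reverse for the rest."""
    if direction == "up":
        first = sorted(t for t in requests if t >= head)
        second = sorted((t for t in requests if t < head), reverse=True)
    else:
        first = sorted((t for t in requests if t <= head), reverse=True)
        second = sorted(t for t in requests if t > head)
    return first + second

def arm_movement(order, head):
    """Total number of tracks the arm crosses when servicing in this order."""
    total = 0
    for track in order:
        total += abs(track - head)
        head = track
    return total

pending = [98, 183, 37, 122, 14, 124, 65, 67]   # hypothetical track requests
order = elevator_order(pending, head=53, direction="up")
print(order)                          # [65, 67, 98, 122, 124, 183, 37, 14]
print(arm_movement(order, head=53))   # 299
print(arm_movement(pending, head=53)) # 640, servicing in arrival order
```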

21
Optimization of Disk Block Access (Cont.)
  • File organization: optimize block access time by
    organizing the blocks to correspond to how data
    will be accessed
  • E.g. store related information on the same or
    nearby cylinders.
  • Files may get fragmented over time
  • E.g. if data is inserted into/deleted from the
    file
  • Or free blocks on disk are scattered, and a newly
    created file has its blocks scattered over the
    disk
  • Sequential access to a fragmented file results in
    increased disk arm movement
  • Some systems have utilities to defragment the
    file system, in order to speed up file access

22
FILE SYSTEM STRUCTURE (Cont)
  • A database system is organized as several layers
    of software
  • Query parser: translates a higher level query
    language to an internal representation
  • Query optimizer: transforms the internal
    representation into an efficient execution plan
  • Concurrency control and crash recovery: ensure
    consistency of data in the presence of multiple
    concurrent update operations and system crashes.
  • Index methods: provide efficient access to
    records for fast retrieval and update operations
  • Abstraction of records: implements the concept
    of multiple records stored on a disk page.

23
BIG PICTURE
SELECT SS FROM emp WHERE sal > 50K
DBMS
24
Overall Organization
SELECT SS FROM emp WHERE sal > 50K
Relational Algebra operators ?, ?, ?, ?, ?, ?,
?, ?, ?
25
Overall Organization
SELECT SS FROM emp WHERE sal > 50K
Query Parser
$\pi_{SS}(\sigma_{sal > 50K}(emp))$
Relational Algebra operators ?, ?, ?, ?, ?, ?,
?, ?, ?
26
QUERY TREE
  • $\pi_{SS}(\sigma_{sal > 50K}(emp))$ becomes a query tree

[Figure: query tree, from root to leaf: Computer
Screen, $\pi_{SS}$, TMP File1, $\sigma_{sal > 50K}$, emp]
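
One way to picture how such a tree is evaluated is to treat each node as a function over rows: a scan of emp at the leaf, a selection on sal above it, and a projection on SS at the root. The sketch below does this in Python; the `emp` rows and the helper names `scan`, `select`, and `project` are invented for the illustration, and the intermediate list plays the role of the temporary file in the tree.

```python
# Hypothetical contents of the emp relation.
emp = [
    {"name": "Ann", "SS": "111", "sal": 60_000},
    {"name": "Bob", "SS": "222", "sal": 40_000},
    {"name": "Cho", "SS": "333", "sal": 75_000},
]

def scan(table):
    """Leaf node: produce every row of the table."""
    yield from table

def select(predicate, child):
    """Selection: keep only the rows that satisfy the predicate."""
    return (row for row in child if predicate(row))

def project(attributes, child):
    """Projection: keep only the listed attributes of each row."""
    return ({a: row[a] for a in attributes} for row in child)

# Evaluate bottom-up: selection writes a temporary result, projection reads it.
tmp_file1 = list(select(lambda row: row["sal"] > 50_000, scan(emp)))
result = list(project(["SS"], tmp_file1))
print(result)   # [{'SS': '111'}, {'SS': '333'}]
```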
27
Overall Organization
Query Parser
Query Optimizer
Query Interpreter
Relational Algebra operators ?, ?, ?, ?, ?, ?,
?, ?, ?
Index structures
Abstraction of records
Buffer Pool Manager
File System
28
FILE SYSTEM STRUCTURE (Cont)
  • Buffer manager: maintains a portion of memory
    that is organized as disk page frames. It keeps
    track of which disk pages are memory resident.
    It also implements a replacement policy to swap
    a page out in favor of another disk page that is
    being referenced. Replacement is needed because
    the number of memory page frames is significantly
    smaller than the number of disk pages.
  • File manager: provides the following services:
    create a file, delete a file, read a disk page
    into a specific memory address given the physical
    address of the disk page on the secondary storage
    device, write a disk page from a memory address
    to the appropriate physical disk address,
    insert a page into a file, modify a page, and
    delete a page from a file.

29
FILE SYSTEM STRUCTURE (Cont)
  • When a program requests a disk page (by
    specifying its address), the buffer manager takes
    the following steps
  • Check if the page is in the buffer.
  • If it is, then pass its address to the calling
    program.
  • Otherwise, read the page from the disk into the
    buffer, possibly replacing some other page, and
    then pass its address to the calling program.
  • Pinned blocks: occasionally, the DBMS needs to
    indicate that some blocks have to be kept in the
    buffer until they are released by unpinning
    them. These blocks are termed pinned.
  • Forced writing of blocks to disk: to preserve
    the consistency of the database during
    crash recovery, the DBMS might force the buffer
    manager to flush some blocks to disk.
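
The request steps, pinning, and forced writes can be combined in a small sketch. Everything here is an assumption made for illustration: the class name `BufferManager`, least-recently-used replacement (the slides do not name a policy), pinning on every `get_page` call, and the `read_page`/`write_page` callables that stand in for the file manager.

```python
from collections import OrderedDict

class BufferManager:
    def __init__(self, capacity, read_page, write_page):
        self.capacity = capacity
        self.read_page = read_page      # callable: page_id -> bytearray
        self.write_page = write_page    # callable: (page_id, data) -> None
        self.frames = OrderedDict()     # page_id -> [data, pin_count, dirty], LRU first

    def get_page(self, page_id):
        """Return the page, reading it from disk on a miss, and pin it."""
        if page_id in self.frames:
            self.frames.move_to_end(page_id)        # mark as most recently used
        else:
            if len(self.frames) >= self.capacity:
                self._evict()
            self.frames[page_id] = [self.read_page(page_id), 0, False]
        frame = self.frames[page_id]
        frame[1] += 1
        return frame[0]

    def unpin(self, page_id, dirty=False):
        """Release one pin; remember whether the caller modified the page."""
        frame = self.frames[page_id]
        frame[1] -= 1
        frame[2] = frame[2] or dirty

    def flush(self, page_id):
        """Forced write: push a modified page to disk now."""
        frame = self.frames[page_id]
        if frame[2]:
            self.write_page(page_id, frame[0])
            frame[2] = False

    def _evict(self):
        """Replace the least recently used page that is not pinned."""
        victim = next((pid for pid, f in self.frames.items() if f[1] == 0), None)
        if victim is None:
            raise RuntimeError("all buffer frames are pinned")
        self.flush(victim)
        del self.frames[victim]

# A dictionary plays the role of the disk in this example.
disk = {n: bytearray(f"page {n}", "ascii") for n in range(4)}
reads = []

def read_page(n):
    reads.append(n)
    return bytearray(disk[n])

def write_page(n, data):
    disk[n] = bytearray(data)

bm = BufferManager(2, read_page, write_page)
bm.get_page(0); bm.unpin(0)
bm.get_page(1); bm.unpin(1)
bm.get_page(0); bm.unpin(0)     # already buffered: no disk read
bm.get_page(2); bm.unpin(2)     # evicts page 1, the least recently used
print(reads, list(bm.frames))   # [0, 1, 2] [0, 2]
```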

30
PHYSICAL ORGANIZATION OF RECORDS AND BLOCKS
  • Key issues in organizing a file into blocks and
    records
  • Formatting fields within a record.
  • Formatting records within a block.
  • Assigning records into blocks.

31
PHYSICAL ORGANIZATION OF RECORDS AND BLOCKS
(Cont)
  • Formatting fields within a record
  • Fixed length fields stored in a specific order
  • Address of attribute i = $\beta + \sum_{k=1}^{i-1} L_k$,
    where $\beta$ is the start address of the record
    and $L_k$ is the length of field k
  • Fixed length fields stored on an indexed heap
  • Fields may be stored in an arbitrary manner
  • There is exactly one pointer in the header for
    each field, whether it is present or not.
  • The order of pointers is fixed and specifies the
    order of attributes for all records.

[Figure: two record layouts with fields Name, SS,
age, salary; the first starts at address $\beta$]
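
For fixed length fields stored in a fixed order, the address of a field is the start address of the record plus the lengths of all fields before it. The sketch below computes these offsets for the Name, SS, age, salary record; the byte lengths are hypothetical, chosen only to make the arithmetic visible.

```python
from itertools import accumulate

# Field names from the slide; the byte lengths are assumed for illustration.
FIELDS = [("name", 20), ("SS", 9), ("age", 4), ("salary", 8)]

def field_offsets(fields, base=0):
    """Address of field i = base + sum of the lengths of fields 1 .. i-1."""
    lengths = [length for _, length in fields]
    starts = accumulate([base] + lengths[:-1])
    return {name: start for (name, _), start in zip(fields, starts)}

print(field_offsets(FIELDS))
# {'name': 0, 'SS': 20, 'age': 29, 'salary': 33}
print(field_offsets(FIELDS, base=100)["salary"])   # 133
```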
32
PHYSICAL ORGANIZATION OF RECORDS AND BLOCKS
(Cont)
  • Variable length fields delimited by special
    symbols
  • Variable length fields delimited by length

[Figure: record layout; values shown: 32, 4, 4, 4]
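
Both ways of delimiting variable length fields can be sketched briefly. The encoding details are assumptions made for the example: a 2-byte big-endian length prefix in the length-based variant, and a `|` separator (which must then never occur inside a field) in the symbol-based variant.

```python
import struct

def encode_by_length(fields):
    """Store each field as a 2-byte length followed by the field bytes."""
    out = bytearray()
    for value in fields:
        data = value.encode("utf-8")
        out += struct.pack(">H", len(data)) + data
    return bytes(out)

def decode_by_length(buf):
    """Walk the buffer, reading a length and then that many bytes."""
    fields, pos = [], 0
    while pos < len(buf):
        (length,) = struct.unpack_from(">H", buf, pos)
        pos += 2
        fields.append(buf[pos:pos + length].decode("utf-8"))
        pos += length
    return fields

def encode_by_symbol(fields, delimiter="|"):
    """Separate fields with a special symbol that cannot appear in the data."""
    if any(delimiter in value for value in fields):
        raise ValueError("delimiter occurs inside a field")
    return delimiter.join(fields).encode("utf-8")

def decode_by_symbol(buf, delimiter="|"):
    return buf.decode("utf-8").split(delimiter)

record = ["Smith", "123456789", "42"]
packed = encode_by_length(record)
print(len(packed), decode_by_length(packed))
print(encode_by_symbol(record))   # b'Smith|123456789|42'
```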
33
PHYSICAL ORGANIZATION OF RECORDS AND BLOCKS
(Cont)
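  • Once the structure of a record is defined, it
    must be mapped to a disk page. Consider fixed
    length records only.
  • Fixed-length: store records contiguously within
    the block.
  • record i is located at
  • $R_i = \beta + (i-1)L$, where $\beta$ is the start
    address of the block and L is the record length

[Figure: block starting at $\beta$ holding records 1, 2, 3, ..., n]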
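
With fixed length records stored back to back, record i (counting from 1) starts at the block's start address plus (i - 1) times the record length. A small sketch using Python's `struct` module follows; the record layout (20-byte name, 9-byte SS, 4-byte age, 8-byte salary) and the 4096-byte block are assumptions for the example.

```python
import struct

# Assumed layout: name (20 bytes), SS (9 bytes), age (int), salary (double).
RECORD = struct.Struct("<20s9sid")
BLOCK_SIZE = 4096

def record_offset(i, base=0, length=RECORD.size):
    """Byte address of record i (1-based): base + (i - 1) * length."""
    return base + (i - 1) * length

def write_record(block, i, name, ss, age, salary):
    RECORD.pack_into(block, record_offset(i), name, ss, age, salary)

def read_record(block, i):
    name, ss, age, salary = RECORD.unpack_from(block, record_offset(i))
    return name.rstrip(b"\x00"), ss, age, salary

block = bytearray(BLOCK_SIZE)
write_record(block, 3, b"Ann", b"123456789", 30, 55000.0)
print(RECORD.size, record_offset(3))   # 41 82
print(BLOCK_SIZE // RECORD.size)       # 99 whole records fit in one block
print(read_record(block, 3))
```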
34
PHYSICAL ORGANIZATION OF RECORDS AND BLOCKS
(Cont)
  • Disadvantage
  • Records may span multiple disk pages
  • Solution: don't allow spanning, at the cost of
    disk fragmentation
  • Insertion and deletion become complicated
  • How do you utilize space that was unallocated?
  • Page reorganization affects external pointers


35
PHYSICAL ORGANIZATION OF RECORDS AND BLOCKS
(Cont)
  • Indexed Heap
  • Each page contains an array of pointers; each
    pointer points to a record within the block.
  • A record is located by providing its block number
    and its index in the pointer array. This
    combination is called a TID (tuple identifier)
    or an RID (record identifier).
  • Insertion and deletion are easy, accomplished by
    manipulating the pointer array.
  • The contents of a block may be reorganized
    without affecting external pointers pointing to
    records. The RID does not change when records
    are moved around within a block.
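
The indexed heap can be sketched as a page object holding a pointer array and a byte area. This is an illustrative model only: the class name `Page`, the (offset, length) form of each pointer, and block number 7 in the example RID are all assumptions.

```python
class Page:
    def __init__(self):
        self.slots = []           # pointer array: index -> (offset, length) or None
        self.data = bytearray()   # record bytes stored in the block

    def insert(self, record: bytes) -> int:
        """Append the record and return the index of the pointer that refers to it."""
        entry = (len(self.data), len(record))
        self.data += record
        for index, used in enumerate(self.slots):
            if used is None:                  # reuse a pointer freed by a delete
                self.slots[index] = entry
                return index
        self.slots.append(entry)
        return len(self.slots) - 1

    def delete(self, index):
        """Deleting only clears the pointer; the bytes are reclaimed by compact()."""
        self.slots[index] = None

    def read(self, index) -> bytes:
        offset, length = self.slots[index]
        return bytes(self.data[offset:offset + length])

    def compact(self):
        """Move live records together; pointer indexes (and so RIDs) stay the same."""
        new_data = bytearray()
        for index, entry in enumerate(self.slots):
            if entry is not None:
                offset, length = entry
                self.slots[index] = (len(new_data), length)
                new_data += self.data[offset:offset + length]
        self.data = new_data

page = Page()
a = page.insert(b"alice")
b = page.insert(b"bob")
c = page.insert(b"carol")
rid = (7, c)                      # RID = (block number, index in pointer array)

page.delete(b)
page.compact()                    # "carol" moves inside the block
print(rid, page.read(rid[1]))     # (7, 2) b'carol'
```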