1
13.1 - 13.2
SECONDARY STORAGE MANAGEMENT
202 Sanuja Dabade
2
13.1.1 Memory Hierarchy
  • Devices for data storage have different data capacities available.
  • The cost per byte to store data also varies.
  • The devices with the smallest capacity offer the fastest speed at the highest cost per bit.

3
Memory Hierarchy Diagram
(diagram: the memory hierarchy, from bottom to top: cache; main memory (main-memory DBMSs); disk, serving as virtual memory and file system; tertiary storage; with programs and the DBMS at the top)
4
13.1.1 Memory Hierarchy
  • Cache
  • The lowest level of the memory hierarchy, but the fastest.
  • Data items are copies of certain locations of main memory.
  • Sometimes, values in the cache are changed and the corresponding changes to main memory are delayed.
  • The machine looks in the cache for instructions, as well as for the data those instructions use.
  • Holds a limited amount of data.

5
13.1.1 Memory Hierarchy (cont)
  • In a single-processor computer, the data in main memory doesn't have to be updated immediately.
  • In a multi-processor computer, the data is written to main memory immediately;
  • this is called write-through.

6
Main Memory
  • Everything that happens in the computer (instruction execution, data manipulation) works on information that is resident in main memory.
  • Main memories are random access: one can obtain any byte in the same amount of time.

7
Secondary storage
  • Used to store data and programs when they are not
    being processed
  • More permanent than main memory, as data and
    programs are retained when the power is turned
    off
  • E.g. magnetic disks, hard disks

8
13.2
SECONDARY STORAGE MANAGEMENT
201 Eilbroun Benjamin
9
Tertiary Storage
  • Holds data volumes in terabytes
  • Used for databases much larger than what can be
    stored on disk

10
13.1.2 Transfer of Data Between Levels
  • Data moves between adjacent levels of the hierarchy.
  • At the secondary or tertiary level, accessing the desired data or finding the desired place to store the data takes a lot of time.
  • The disk is divided into blocks.
  • Entire blocks are moved to and from a region of main memory called a buffer.

11
13.1.2 Transfer of Data Between Levels (cont'd)
  • A key technique for speeding up database operations is to arrange the data so that when one piece of data on a block is needed, it is likely that other data on the same block will be needed at the same time.
  • The same idea applies to the other hierarchy levels.

12
13.1.3 Volatile and Non-Volatile Storage
  • A volatile device forgets what data is stored on it when the power goes off.
  • A non-volatile device holds its data even when the device is turned off.
  • All secondary and tertiary devices are non-volatile; main memory is volatile.

13
13.1.4 Virtual Memory
  • Typical software executes in a virtual-memory address space.
  • The address space is typically 32 bits, i.e., 2^32 bytes or 4 GB.
  • Transfer between memory and disk is in terms of blocks.

14
13.2.1 Mechanics of Disks
  • The use of secondary storage is one of the important characteristics of a DBMS.
  • A disk consists of two moving pieces:
  • 1. the disk assembly
  • 2. the head assembly
  • The disk assembly consists of one or more platters.
  • Platters rotate around a central spindle.
  • Bits are stored on the upper and lower surfaces of the platters.

15
13.2.1 Mechanics of Disks
  • Each surface is divided into tracks.
  • Each track is divided into sectors.
  • Sectors are segments of a track separated by gaps.
  • The tracks that are at a fixed radius from the center, across all surfaces, form one cylinder.

17
13.2.2 Disk Controller
  • One or more disks are controlled by a disk controller.
  • Disk controllers are capable of:
  • controlling the mechanical actuator that moves the head assembly
  • selecting a sector from among all those in the cylinder at which the heads are positioned
  • transferring bits between the desired sector and main memory
  • possibly buffering an entire track

18
13.2.3 Disk Access Characteristics
  • Accessing (reading/writing) a block requires 3 steps:
  • The disk controller positions the head assembly at the cylinder containing the track on which the block is located. This is the seek time.
  • The disk controller waits while the first sector of the block moves under the head. This is the rotational latency.
  • All the sectors and the gaps between them pass under the head while the disk controller reads or writes data in these sectors. This is the transfer time.

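To make the three components concrete, here is a minimal sketch in Python. The sectors-per-track value is an illustrative assumption; the seek and rotation numbers are the ones used in the examples that follow:

    # Sketch: block access time = seek time + rotational latency + transfer time.
    AVG_SEEK_MS = 6.46         # average seek time, as in the later examples
    ROTATION_MS = 8.33         # one full rotation (7200 RPM)
    SECTORS_PER_TRACK = 256    # assumed geometry

    def block_access_ms(sectors_per_block: int) -> float:
        """Average read time: seek + half a rotation + transfer of the sectors."""
        rotational_latency = ROTATION_MS / 2
        transfer = ROTATION_MS * sectors_per_block / SECTORS_PER_TRACK
        return AVG_SEEK_MS + rotational_latency + transfer

    print(round(block_access_ms(16), 2))   # about 11 ms for a 16-sector block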
19
13.3 Accelerating Access to Secondary Storage
  • Several approaches for accessing data in secondary storage more efficiently:
  • Place blocks that are accessed together in the same cylinder.
  • Divide the data among multiple disks.
  • Mirror disks.
  • Use disk-scheduling algorithms.
  • Prefetch blocks into main memory.
  • Scheduling latency: the added delay in accessing data caused by a disk-scheduling algorithm.
  • Throughput: the number of disk accesses per second that the system can accommodate.

20
13.3.1 The I/O Model of Computation
  • The number of block accesses (disk I/Os) is an important approximation to the running time of an algorithm.
  • This should be minimized.
  • Ex 13.3: You want an index on R to identify the block on which the desired tuple appears, but not where on the block it resides.
  • For the Megatron 747 (M747) example, it takes 11 ms to read a 16K block.
  • A standard microprocessor can execute millions of instructions in 11 ms, making any delay in searching for the desired tuple within the block negligible.

21
13.3.2 Organizing Data by Cylinders
  • If we read all blocks on a single track or cylinder consecutively, then we can neglect all but the first seek time and the first rotational latency.
  • Ex 13.4: We request 1024 blocks of the M747.
  • If the data is randomly distributed, the average latency is 10.76 ms by Ex 13.2, making the total latency about 11 s.
  • If all blocks are stored consecutively on 1 cylinder:
  • 6.46 ms + (8.33 ms × 16) ≈ 139 ms
  • (1 average seek) + (time per rotation) × (number of rotations)

22
13.3.3 Using Multiple Disks
  • If we have n disks, read/write performance will increase by a factor of n.
  • Striping: distributing a relation across multiple disks following this pattern:
  • data on disk 1: R1, R1+n, R1+2n, …
  • data on disk 2: R2, R2+n, R2+2n, …
  • data on disk n: Rn, Rn+n, Rn+2n, …
  • Ex 13.5: We request 1024 blocks with n = 4:
  • 6.46 ms + (8.33 ms × (16/4)) ≈ 39.8 ms
  • (1 average seek) + (time per rotation) × (number of rotations)

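The striping pattern above amounts to assigning block i of the relation to disk ((i-1) mod n) + 1; a minimal sketch:

    # Sketch: round-robin striping of a relation's blocks across n disks.
    def disk_for_block(block_no: int, n_disks: int) -> int:
        """Block numbers and disk numbers are 1-based, as on the slide.
        Blocks b, b+n, b+2n, ... all land on the same disk."""
        return (block_no - 1) % n_disks + 1

    for b in range(1, 9):
        print(f"block R{b} -> disk {disk_for_block(b, 4)}")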
23
13.3.4 Mirroring Disks
  • Mirroring disks: having 2 or more disks hold identical copies of the data.
  • Benefit 1: if n disks are mirrors of each other, the system can survive the crash of up to n-1 disks.
  • Benefit 2: with n disks, read performance increases by a factor of n.
  • Performance increases further by having the controller select, for each read, the disk whose head is currently closest to the desired data block.

24
13.3.5 Disk Scheduling and the Elevator Problem
  • The disk controller can run this algorithm to select which of several requests to process first.
  • Pseudocode (a runnable sketch follows below):

    requests = []        // array of all non-processed data requests
    upon receiving a new data request:
        requests.add(new request)
    while requests is not empty:
        move head to next location
        if head location is at data in requests:
            retrieve data
            remove data from requests
        if head reaches end:
            reverse head direction

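A runnable version of that pseudocode, as a sketch. The one-time-unit-per-cylinder cost model and the request set are illustrative assumptions (requests arriving mid-sweep, as in the worked example below, could be modeled by mutating the pending set):

    # Sketch of the elevator (SCAN) disk-scheduling loop.
    def elevator(start: int, pending: set, lo: int, hi: int):
        """Serve pending cylinder requests; returns (cylinder, time) in service order."""
        head, direction, t = start, 1, 0
        served = []
        while pending:
            if head in pending:              # head is at a requested cylinder
                pending.discard(head)
                served.append((head, t))
            if head >= hi or head <= lo:     # head reaches an end: reverse
                direction = -direction
            head += direction                # move head one cylinder
            t += 1                           # one time unit per cylinder moved
        return served

    print(elevator(32000, {8000, 24000, 56000}, lo=0, hi=64000))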
25
13.3.5 Disk Scheduling and the Elevator Problem
(cont)
(figure: the head sweeps back and forth across cylinders 8000-64000 while requests arrive; each request is served as the head passes its cylinder)

Events: head at starting point; request data at 8000; request data at 24000; request data at 56000; get data at 8000; request data at 16000; get data at 24000; request data at 64000; get data at 56000; request data at 40000; get data at 64000; get data at 40000; get data at 16000.

Resulting service order:
  data     time
  8000     4.3
  24000    13.6
  56000    26.9
  64000    34.2
  40000    45.5
  16000    56.8
26
13.3.5 Disk Scheduling and the Elevator Problem
(cont)
Elevator Algorithm            FIFO Algorithm
  data     time                 data     time
  8000     4.3                  8000     4.3
  24000    13.6                 24000    13.6
  56000    26.9                 56000    26.9
  64000    34.2                 16000    42.2
  40000    45.5                 64000    59.5
  16000    56.8                 40000    70.8
27
13.3.6 Prefetching and Large-Scale Buffering
  • If, at the application level, we can predict the order in which blocks will be requested, we can load them into main memory before they are needed.

28
13.4
WAYS IN WHICH DISKS FAIL AND THEIR MITIGATION
203 Priya Gangaraju
29
Ways in which disks can fail:
  • Intermittent failure.
  • Media Decay.
  • Write failure.
  • Disk Crash.

30
Intermittent Failures
  • A read or write operation on a sector succeeds, but only after repeated tries.
  • The most common form of failure.
  • Parity checks can be used to detect this kind of failure.

31
Media Decay
  • A serious form of failure.
  • A bit or bits are permanently corrupted.
  • It becomes impossible to read the sector correctly even after many trials.
  • The stable-storage technique for organizing a disk is used to cope with this failure.

32
Write failure
  • An attempt to write a sector is not possible.
  • An attempt to retrieve the previously written sector is unsuccessful.
  • Possible reason: a power outage while the sector was being written.
  • The stable-storage technique can be used to cope with this.

33
Disk Crash
  • Most serious form of disk failure.
  • Entire disk becomes unreadable, suddenly and
    permanently.
  • RAID techniques can be used for coping with disk
    crashes.

34
More on Intermittent failures
  • We try to read a sector, but the correct content of that sector is not delivered to the disk controller.
  • If the controller has a way to tell whether the sector read is good or bad (checksums), it can re-issue the read request when bad data is read.

35
More on Intermittent Failures..
  • The controller can attempt to write a sector, but the contents of the written sector may not be what was intended.
  • The only way to check for this kind of failure is to read the sector again.
  • One way to perform the check is to read the sector back and compare it with the sector we intended to write.

36
Cont'd..
  • Instead of performing the complete comparison at the disk controller, a simpler way is to read the sector back and see whether a good sector was read.
  • If it is a good sector, then the write was correct; otherwise the write was unsuccessful and must be repeated.

37
Checksums
  • A technique used to determine the good/bad status of a sector.
  • Each sector has some additional bits, called the checksum, that are set depending on the values of the data bits in that sector.
  • If the checksum is not proper on reading, then there is an error in reading.

38
Checksums(contd..)
  • There is a small chance that the block was not
    read correctly even if the checksum is proper.
  • The probability of correctness can be increased
    by using many checksum bits.

39
Checksum calculation.
  • The checksum is based on the parity of all bits in the sector.
  • If there is an odd number of 1s among a collection of bits, the bits are said to have odd parity; a parity bit 1 is added.
  • If there is an even number of 1s, then the collection of bits is said to have even parity; a parity bit 0 is added.

40
Checksum calculation (cont'd..)
  • The number of 1s among a collection of bits and their parity bit is always even.
  • During a write operation, the disk controller calculates the parity bit and appends it to the sequence of bits written in the sector.
  • Every sector will thus have even parity.

41
Examples
  • The sequence of bits 01101000 has an odd number of 1s. The parity bit will be 1, so the sequence with the parity bit is 011010001.
  • The sequence of bits 11101110 has even parity, as it has an even number of 1s. So with the parity bit 0, the sequence will be 111011100.

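A minimal sketch of the parity computation in these examples:

    # Sketch: even-parity checksum for a string of bits.
    def add_parity(bits: str) -> str:
        """Append a parity bit so the total number of 1s becomes even."""
        return bits + str(bits.count("1") % 2)

    def looks_good(sector: str) -> bool:
        """A sector passes the check if its 1s (data + parity) are even in number."""
        return sector.count("1") % 2 == 0

    print(add_parity("01101000"))    # -> 011010001
    print(add_parity("11101110"))    # -> 111011100
    print(looks_good("011010001"))   # True; any single-bit error makes this False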
42
Checksum calculation (cont'd..)
  • Any one-bit error in reading or writing the bits results in a sequence of bits with odd parity.
  • The disk controller can count the number of 1s and, by finding odd parity, determine that an error occurred.

43
Odds
  • There is a chance that more than one bit is corrupted and the error goes unnoticed.
  • Increasing the number of parity bits increases the chance of detecting errors.
  • In general, if there are n independent bits as checksum, the chance of missing an error is one in 2^n.

44
Stable Storage
  • Checksums can detect an error but cannot correct it.
  • Sometimes we overwrite the previous contents of a sector and yet cannot read the new contents correctly.
  • To deal with these problems, the stable-storage policy can be implemented on the disks.

45
Stable Storage (cont'd..)
  • Sectors are paired, and each pair represents one sector-contents X.
  • The left copy of the sector is referred to as XL, and the right copy as XR.

46
Assumptions
  • We assume that copies are written with a sufficient number of parity bits to reduce the chance that a bad sector looks good when the parity checks are considered.
  • Also, if the read function returns a good value w for either XL or XR, then it is assumed that w is the true value of X.

47
Stable-Storage Writing Policy
  1. Write the value of X into XL. Check that the value has status good, i.e., that the parity-check bits are correct in the written copy. If not, repeat the write. If, after a set number of write attempts, we have not successfully written X into XL, assume that there is a media failure in this sector. A fix-up, such as substituting a spare sector for XL, must be adopted.
  2. Repeat (1) for XR.

48
Stable-Storage Reading Policy
  • The policy is to alternate trying to read XL and XR until a good value is returned.
  • If a good value is not returned after a pre-chosen number of tries, then it is assumed that X is truly unreadable.

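A sketch of the two policies. The write_sector and read_sector primitives are hypothetical; read_sector is assumed to return (value, good), where good means the parity checks passed:

    # Sketch of the stable-storage writing and reading policies.
    MAX_TRIES = 5   # the "set number of write attempts" / "pre-chosen number of tries"

    def stable_write(x, write_sector, read_sector):
        """Write X into the left copy and verify it, then repeat for the right copy."""
        for copy in ("XL", "XR"):
            for _ in range(MAX_TRIES):
                write_sector(copy, x)
                _, good = read_sector(copy)
                if good:
                    break
            else:   # media failure: a fix-up such as a spare sector is needed
                raise IOError(f"media failure writing {copy}")

    def stable_read(read_sector):
        """Alternate reading XL and XR until a good value is returned."""
        for attempt in range(2 * MAX_TRIES):
            value, good = read_sector("XL" if attempt % 2 == 0 else "XR")
            if good:
                return value
        raise IOError("X is truly unreadable")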
49
Error-Handling Capabilities
  • Media failures:
  • If, after storing X in sectors XL and XR, one of them undergoes a media failure and becomes permanently unreadable, we can read from the other one.
  • If both sectors have failed, then X cannot be read.
  • The probability of both failing is extremely small.

50
Error-Handling Capabilities (cont'd..)
  • Write failure:
  • While writing X, if there is a system failure (like a power outage), the X in main memory is lost and the copy of X being written may be erroneous.
  • Half of the sector may be written with part of the new value of X, while the other half remains as it was.

51
Error-Handling Capabilities (cont'd..)
  • The possible cases when the system becomes available again:
  • The failure occurred while writing XL. Then XL is considered bad. Since XR was never changed, its status is good. We can copy XR into XL, restoring the old value of X.
  • The failure occurred after XL was written (while writing XR). Then XL has good status and holds the new value, while XR, which may be partly overwritten, has bad status. We can copy the new value of X from XL to XR.

52
Recovery from Disk Crashes
  • To reduce data loss from disk crashes, schemes that involve redundancy, extending the idea of parity checks or duplicated sectors, can be applied.
  • The term used for these strategies is RAID: Redundant Arrays of Independent Disks.
  • In general, if the mean time to failure of disks is n years, then in any given year 1/nth of the surviving disks fail.

53
Recovery from Disk Crashes (cont'd..)
  • Each of the RAID schemes has data disks and redundant disks.
  • Data disks are one or more disks that hold the data.
  • Redundant disks are one or more disks that hold information completely determined by the contents of the data disks.
  • When any disk crashes, the other disks can be used to restore the failed disk, avoiding a permanent loss of information.

54
13.4
DISK FAILURES
204 Xiaqing He
55
1) Mirroring
  • The simplest scheme for recovery from disk crashes.
  • How does mirroring work?
  • -- make two or more copies of the data and save them on different disks
  • Benefits:
  • -- the data survives if one disk fails
  • -- having the same data on several disks lets several blocks be accessed at once

56
1) Mirroring (cont)
  • For mirroring, when can the data be lost?
  • -- the only way data can be lost is if both disks crash (the second failing while the first is being replaced)
  • Probability:
  • Suppose:
  • mean time to failure of one disk = 10 years
  • mean time until one of the two disks fails = 5 years
  • the process of replacing the failed disk takes 3 hours = 1/2920 of a year
  • So:
  • the probability that the mirror disk fails during the replacement = 1/10 × 1/2,920 = 1/29,200
  • the probability of data loss by mirroring = 1/5 × 1/29,200 = 1/146,000 (per year)

57
2) Parity Blocks
  • Why change?
  • -- disadvantage of mirroring: it uses too many redundant disks
  • What's new?
  • -- RAID level 4 uses only one redundant disk
  • How does this one redundant disk work?
  • -- modulo-2 sum
  • -- the jth bit of the redundant disk is the modulo-2 sum of the jth bits of all the data disks
  • Example

58
2) Parity Blocks (cont)___Example
  • Data disks:
  • Disk 1: 11110000
  • Disk 2: 10101010
  • Disk 3: 00111000
  • Redundant disk:
  • Disk 4: 01100010

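The redundant block is the bitwise modulo-2 sum (XOR) of the data blocks; a sketch that reproduces the example:

    # Sketch: compute the RAID 4 redundant block from the data blocks.
    def mod2_sum(blocks):
        """jth output bit = modulo-2 sum of the jth bits of all input blocks."""
        return "".join(
            str(sum(int(block[j]) for block in blocks) % 2)
            for j in range(len(blocks[0]))
        )

    data = ["11110000", "10101010", "00111000"]   # disks 1-3
    print(mod2_sum(data))                         # -> 01100010 (disk 4)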
59
2) RAID 4 (cont)
  • Reading
  • -- similar to reading blocks from any single disk
  • Writing
  • 1) change the data disk
  • 2) change the corresponding block of the redundant disk
  • Why?
  • -- the redundant disk must hold the parity checks for the corresponding blocks of all the data disks

60
2) RAID 4 (cont) _ writing
  • For a total of N data disks:
  • 1) naïve way:
  • read the N data disks and compute the modulo-2 sum of the corresponding blocks
  • rewrite the redundant disk according to that modulo-2 sum
  • 2) better way:
  • take the modulo-2 sum of the old and new versions of the data block that was rewritten
  • flip the bits of the redundant disk at the positions that are 1 in that modulo-2 sum

61
2) RAID 4 (cont) _ writing_Example
  • Data disks:
  • Disk 1: 11110000
  • Disk 2: 10101010 → 01100110
  • Disk 3: 00111000
  • To do:
  • modulo-2 sum of the old and new versions of disk 2: 11001100
  • so we need to flip positions 1, 2, 5, 6 of the redundant disk
  • Redundant disk:
  • Disk 4: 01100010 → 10101110

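A sketch of that "better way": XOR the old and new versions of the rewritten block, then flip exactly those bits of the redundant block:

    # Sketch: update the redundant block without reading the other data disks.
    def xor(a: str, b: str) -> str:
        return "".join(str(int(x) ^ int(y)) for x, y in zip(a, b))

    old_d2, new_d2 = "10101010", "01100110"
    delta = xor(old_d2, new_d2)      # -> 11001100: positions 1, 2, 5, 6 changed
    print(xor("01100010", delta))    # old redundant block -> 10101110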
62
2) RAID 4 (cont) _ failure recovery
  • Redundant disk crashes:
  • -- swap in a new one and recompute its contents from all the data disks
  • One of the data disks crashes:
  • -- swap in a new one
  • -- recompute its data from the other disks, including the remaining data disks and the redundant disk
  • How to recompute? (the same rule; that's why this works)
  • -- take the modulo-2 sum of all the corresponding bits of all the other disks

63
3) An Improvement: RAID 5
  • Why is an improvement needed?
  • -- shortcoming of RAID level 4: it suffers from a bottleneck (every update to a data disk must also read and write the one redundant disk)
  • Principle of RAID level 5 (RAID 5):
  • -- treat each disk as the redundant disk for some of the blocks
  • Why is this feasible?
  • the failure-recovery rule for the redundant disk and for a data disk is the same:
  • take the modulo-2 sum of all the corresponding bits of all the other disks
  • so there is no need to fix one disk as the redundant disk and the others as data disks

64
3) RAID 5 (cont)
  • How do we recognize which blocks of each disk treat this disk as the redundant disk?
  • -- if there are n+1 disks, labeled from 0 to n, then we can treat the ith cylinder of disk j as redundant if j is the remainder when i is divided by n+1
  • Example
  • Example

65
3) RAID 5 (cont)_example
  • n = 3 (4 disks)
  • The first disk, labeled 0, is redundant for cylinders 4, 8, 12, …
  • The second disk, labeled 1, is redundant for cylinders 1, 5, 9, …
  • The third disk, labeled 2, is redundant for cylinders 2, 6, 10, …
  • …
  • Suppose all 4 disks are equally likely to be written; for one of the 4 disks, the probability of being involved in a given write is
  • 1/4 + (3/4 × 1/3) = 1/2
  • In general, with n = m disks: 1/m + ((m-1)/m × 1/(m-1)) = 2/m

66
4) Coping with multiple disk crashes
  • RAID 6
  • -- deals with any number of disk crashes, if enough redundant disks are used
  • Example
  • a system of seven disks (four data disks, numbered 1-4, and three redundant disks, numbered 5-7)
  • How do we set up this 3×7 matrix?
  • (why 3 rows? because there are 3 redundant disks)
  • 1) every column holds three values, 0s and 1s, but a column of all three 0s is not allowed
  • 2) the column for a redundant disk has a single 1
  • 3) the column for a data disk has at least two 1s

67
4) Coping with multiple disk crashes (cont)
  • Reading
  • read from the data disks and ignore the redundant disks
  • Writing
  • change the data disk
  • change the corresponding bits of all the redundant disks

68
4) Coping with multiple disk crashes (cont)
  • In this system, which has 4 data disks and 3 redundant disks, how can up to 2 disk crashes be corrected?
  • Suppose disks a and b fail:
  • find some row r (in the 3×7 matrix) in which the columns for a and b are different (suppose a's entry is 0 and b's entry is 1)
  • compute the correct b by taking the modulo-2 sum of the corresponding bits from all the disks other than b that have 1 in row r
  • after getting the correct b, compute the correct a using all the other disks, which are now available
  • Example

69
4) Coping with multiple disk crashes
(cont)_example
  • The 3×7 matrix:

                  data disks    redundant disks
    disk number:  1  2  3  4     5  6  7
    row 1:        1  1  1  0     1  0  0
    row 2:        1  1  0  1     0  1  0
    row 3:        1  0  1  1     0  0  1
70
4) Coping with multiple disk crashes
(cont)_example
  • First block of all the disks
  • disk contents
  • 1) 11110000
  • 2) 10101010
  • 3) 00111000
  • 4) 01000001
  • 5) 01100010
  • 6) 00011011
  • 7) 10001001

71
4) Coping with multiple disk crashes
(cont)_example
  • Two disks crash:
  • disk) contents
  • 1) 11110000
  • 2) ????????
  • 3) 00111000
  • 4) 01000001
  • 5) ????????
  • 6) 00011011
  • 7) 10001001

72
4) Coping with multiple disk crashes
(cont)_example
  • In that 37 matrix, find in row 2, disk 2
    and 5 have different value and disk 2s value is
    1 and 5s value is 0.
  • so compute the first block of disk 2 by
    modulo-2 sum of all the corresponding bits of
    disk 1,4,6
  • then compute the first block of disk 2 by
    modulo-2 sum of all the corresponding bits of
    disk 1,2,3
  • 1) 11110000
  • 2) ????????? gt
    00001111
  • 3) 00111000
  • 4) 01000001
  • 5) ????????? gt
    01100010
  • 6) 00011011
  • 7) 10001001

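A sketch of the whole recovery, using the 3×7 matrix and the block contents from the slides (disk numbering is 1-based):

    # Sketch: recover crashed disks 2 and 5 in the RAID 6 example.
    ROWS = [            # the 3x7 matrix; entry [r][d-1] is row r+1, disk d
        [1, 1, 1, 0, 1, 0, 0],
        [1, 1, 0, 1, 0, 1, 0],
        [1, 0, 1, 1, 0, 0, 1],
    ]
    disks = {1: "11110000", 3: "00111000", 4: "01000001",
             6: "00011011", 7: "10001001"}      # disks 2 and 5 have crashed

    def mod2_sum(blocks):
        return "".join(str(sum(int(b[j]) for b in blocks) % 2) for j in range(8))

    def recover(target, row):
        """Modulo-2 sum of the surviving disks with a 1 in this row."""
        others = [d for d in range(1, 8) if ROWS[row][d - 1] == 1 and d != target]
        return mod2_sum([disks[d] for d in others])

    disks[2] = recover(2, row=1)   # row 2: disk 2 has a 1, disk 5 has a 0
    disks[5] = recover(5, row=0)   # now row 1 recovers disk 5
    print(disks[2], disks[5])      # -> 10101010 01100010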
73
13.5
ARRANGING DATA ON DISK
205 Meghna Jain
74
Data elements are represented as records, which are stored in consecutive bytes in the same disk block.
Basic layout technique for storing data: fixed-length records.
Allocation criterion: data should start at a word boundary.
A fixed-length record header contains:
1. A pointer to the record schema.
2. The length of the record.
3. Timestamps indicating the last modification or last read.
75
Example
  • CREATE TABLE employee(
  • name CHAR(30) PRIMARY KEY,
  • address VARCHAR(255),
  • gender CHAR(1),
  • birthdate DATE
  • )
  • The record should start at a word boundary and contains a header and four fields: name, address, gender, and birthdate.

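A sketch of how such a fixed-length record could be laid out, assuming a 4-byte word boundary and a hypothetical 12-byte header (neither number comes from the slides):

    # Sketch: word-aligned offsets for the employee record's fields.
    HEADER_BYTES = 12   # assumed: schema pointer, record length, timestamp
    WORD = 4

    def align(n: int) -> int:
        """Round n up to the next multiple of the word size."""
        return -(-n // WORD) * WORD

    offset = align(HEADER_BYTES)
    for field, size in [("name", 30), ("address", 255), ("gender", 1), ("birthdate", 10)]:
        print(f"{field:9s} starts at byte {offset}")
        offset = align(offset + size)
    print("total record length:", offset)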
76
Packing Fixed-Length Records into Blocks
  • Records are stored in the form of blocks on the disk, and they move into main memory when we need to update or access them.
  • A block header is written first, followed by a series of records.

77
Block header contains the following information:
  • links to one or more other blocks that are part of a network of blocks
  • information about the role played by this block in such a network
  • information about the relation to which the tuples in this block belong
  • a "directory" giving the offset of each record in the block
  • timestamp(s) indicating the time of the block's last modification and/or access

78
Example
  • Along with the header, we can pack as many records as will fit into one block, as shown in the figure; the remaining space is unused.

79
13.6
REPRESENTING BLOCK AND RECORD ADDRESSES
206 Ramya Karri
80
INTRODUCTION
  • Addresses of a block and a record:
  • In main memory:
  • the address of the block is the virtual-memory address of its first byte
  • the address of a record within the block is the virtual-memory address of the first byte of the record
  • In secondary memory: a sequence of bytes describes the location of the block in the overall system:
  • the device ID for the disk, the cylinder number, etc.

81
ADDRESSES IN CLIENT-SERVER SYSTEMS
  • The addresses in an address space are represented in two ways:
  • Physical addresses: byte strings that determine the place within the secondary storage system where the record can be found.
  • Logical addresses: arbitrary strings of bytes of some fixed length.
  • Physical-address bits are used to indicate:
  • the host to which the storage is attached
  • an identifier for the disk
  • the number of the cylinder
  • the number of the track
  • the offset of the beginning of the record

82
Addresses in Client-Server Systems (Contd..)
  • Map table: relates logical addresses to physical addresses.

(table: columns Logical and Physical; each row pairs a logical address with its physical address)
83
LOGICAL AND STRUCTURED ADDRESSES
  • Purpose of a logical address?
  • Gives more flexibility when we:
  • move the record around within the block
  • move the record to another block
  • Gives us an option of deciding what to do when a record is deleted.

(figure: a block whose header offset table points to Record 1 through Record 4)
84
POINTER SWIZZLING
  • Having pointers is common in object-relational database systems.
  • It is important to learn about the management of pointers.
  • Every data item (block, record, etc.) has two addresses:
  • its database address: the address on disk
  • its memory address, if the item is currently in virtual memory

85
POINTER SWIZZLING (CONT'D)
  • Translation table: maps database addresses to memory addresses.
  • All addressable items in the database have entries in the map table, while only those items currently in memory are mentioned in the translation table.

(table: columns Dbaddr and Mem-addr, one row per item currently in memory)
86
POINTER SWIZZLING (CONT'D)
  • A pointer consists of the following two fields:
  • a bit indicating the type of address
  • the database or memory address itself
  • Example 13.17

(figure: Block 1 copied from disk into memory; a swizzled pointer in the memory copy points directly into memory, while an unswizzled pointer still refers to Block 2 on disk)
87
EXAMPLE 13.17
  • Block 1 has a record with pointers to a second record on the same block and to a record on another block.
  • If Block 1 is copied to memory:
  • the first pointer, which points within Block 1, can be swizzled so it points directly to the memory address of the target record
  • since Block 2 is not in memory, we cannot swizzle the second pointer

88
POINTER SWIZZLING (CONT'D)
  • Three types of swizzling:
  • Automatic swizzling:
  • as soon as a block is brought into memory, swizzle all relevant pointers
  • Swizzling on demand:
  • only swizzle a pointer if and when it is actually followed
  • No swizzling:
  • pointers are not swizzled; they are accessed using the database address

89
PROGRAMMER CONTROL OF SWIZZLING
  • Unswizzling:
  • when a block is moved from memory back to disk, all pointers must go back to database (disk) addresses
  • use the translation table again
  • it is important to have an efficient data structure for the translation table

90
PINNED RECORDS AND BLOCKS
  • A block in memory is said to be pinned if it cannot be written back to disk safely.
  • If block B1 has a swizzled pointer to an item in block B2, then B2 is pinned.
  • To unpin a block, we must unswizzle any pointers to it:
  • keep in the translation table the places in memory holding swizzled pointers to that item
  • unswizzle those pointers (use the translation table to replace the memory addresses with database (disk) addresses)

91
13.7
VARIABLE LENGTH DATA AND RECORDS
221 Eswara Satya Pavan Rajesh Pinapala
92
Example
(Fig 1: a movie-star record with four fields: name at offset 0, address at offset 30, gender at offset 286, and birth date at offset 287; the record ends at offset 297)
93
Records with Variable-Length Fields
  • An effective way to represent variable-length records is as follows:
  • fixed-length fields are kept ahead of the variable-length fields
  • the record header contains:
  • the length of the record
  • pointers to the beginnings of all variable-length fields except the first one

94
Records with Variable Length Fields
(Figure 2: a movie-star record with name and address implemented as variable-length character strings; the header holds other header information, the record length, and a pointer to the address field, followed by the name, address, gender, and birth-date fields)
95
Records with Repeating Fields
  • A record may contain a variable number of occurrences of a field F.
  • All occurrences of field F are grouped together, and the record header contains a pointer to the first occurrence of field F.
  • L bytes are devoted to each instance of field F.
  • Locating an occurrence of field F within the record:
  • add to the offset of the first occurrence of F the integer multiples of L, starting with 0, L, 2L, 3L, and so on;
  • we stop upon reaching the offset of the field following F.

96
Records with Repeating Fields
(Figure 3: a record with a repeating group of references to movies; the header holds other header information, the record length, a pointer to the address field, and a pointer to the movie pointers, followed by the name, the address, and the pointers to movies)
97
Records with Repeating Fields
(Figure 4: storing variable-length fields separately from the record; the record holds header information, the length of and pointer to the name, the length of and pointer to the address, and the number of and pointer to the movie references, with the name and address stored outside the record)
98
Records with Repeating Fields
  • Advantage:
  • keeping the record itself fixed-length allows records to be searched more efficiently, minimizes the overhead in block headers, and allows records to be moved within or among blocks with minimum effort.
  • Disadvantage:
  • storing variable-length components on another block increases the number of disk I/Os needed to examine all components of a record.

99
Records with Repeating Fields
  • A compromise strategy is to allocate a fixed portion of the record for the repeating fields:
  • if the number of repeating fields is less than the allocated space, there will be some unused space;
  • if the number of repeating fields is greater than the allocated space, the extra fields are stored in a different location, and
  • a pointer to that location and a count of the additional occurrences are stored in the record.

100
Variable Format Records
  • Records that do not have a fixed schema.
  • Variable-format records are represented by a sequence of tagged fields.
  • Each tagged field consists of:
  • the attribute or field name
  • the type of the field
  • the length of the field
  • the value of the field
  • Why use tagged fields?
  • information-integration applications
  • records with a very flexible schema

101
Variable Format Records
(Fig 5: a record with tagged fields; the first field has code N for name, code S for string type, length 14, and value Clint Eastwood; the second has code R for restaurant owned, code S for string type, length 16, and value Hog's Breath Inn)
102
13.8
RECORD MODIFICATIONS
222 Neha Samant
103
Records that do not fit in a block
  • When the length of a record is greater than the block size, the record is divided and placed into two or more blocks.
  • The portion of the record in each block is referred to as a RECORD FRAGMENT.
  • A record with two or more fragments is called a SPANNED RECORD.
  • A record that does not cross a block boundary is called an UNSPANNED RECORD.

104
Spanned Records
  • Spanned records require the following extra
    header information
  • A bit indicates whether it is fragment or not
  • A bit indicates whether it is first or last
    fragment of
  • a record
  • Pointers to the next or previous fragment for
    the
  • same record

105
Records that do not fit in a block
(Figure 6: storing spanned records across blocks; block 1 holds record 1 and fragment 2-a of record 2, and block 2 holds fragment 2-b and record 3, each with block and record headers)
106
BLOBS
  • Large binary objects, e.g. audio and video files, are called BLOBs.
  • Storage of BLOBs
  • Retrieval of BLOBs

107
Modification types
  • Insertion
  • Deletion
  • Update

108
Insertion
  • Insertion of records without order:
  • records can be placed in a block's free space or in a new block.
  • Insertion of records in a fixed order:
  • space available in the block
  • no space available in the block (outside the block):
  • structured address
  • pointer to a record from outside the block

109
Insertion in fixed order
  • Space available within the block:
  • use an offset table in the header of each block, with pointers to the location of each record in the block;
  • the records are slid within the block and the pointers in the offset table are adjusted.

110
Insertion in fixed order
  • No space available within the block (outside the block):
  • Find space on a nearby block:
  • if no space is available on that block, look at the following block in the sorted order of blocks;
  • if space is available there, move the highest records of block 1 to block 2 and slide the records around on both blocks.
  • Create an overflow block:
  • records can be stored in the overflow block;
  • each block has a place in its header for a pointer to an overflow block;
  • the overflow block can point to a second overflow block, and so on.

111
Deletion
  • Recovering space after deletion:
  • when using an offset table, the records can be slid around the block so there is a single unused region in the center that can be recovered;
  • in case we cannot slide records, an available-space list can be maintained in the block header;
  • the list head goes in the block header, and the available regions hold the links in the list.

112
Deletion
  • Use of tombstones:
  • a tombstone is placed where the deleted record was, so that pointers to the deleted record do not end up pointing to a new record;
  • the tombstone is permanent until the entire database is reconstructed;
  • if pointers go to fixed locations from which the location of the record is found, then we put the tombstone in that fixed location (see examples);
  • where a tombstone is placed depends on the nature of the record pointers;
  • a map table is used to translate logical record addresses to physical addresses.

113
Deletion
  • Use of tombstones:
  • if we need to replace records by tombstones, place the bit that serves as the tombstone at the beginning of the record;
  • this bit remains at the record location, and subsequent bytes can be reused for another record.

(figure: record 1 has been replaced, but its tombstone remains; record 2 has no tombstone and can be seen when we follow a pointer to it)
114
Update
  • Fixed-length update:
  • no effect on the storage system, as the record occupies the same space as before the update.
  • Variable-length update:
  • longer length
  • shorter length

115
Update
  • Variable-length update (longer length):
  • stored on the same block:
  • sliding records
  • creation of an overflow block
  • stored on another block:
  • move records around on that block
  • create a new block for storing variable-length fields

116
Update
  • Variable-length update (shorter length):
  • same as deletion:
  • recover space
  • consolidate space

117
14.2
B-Trees and Bitmap Indexes
207 Maciej Kicinski
118
Structure
  • A balanced tree, meaning that all paths from the root to a leaf have the same length.
  • There is a parameter n associated with each B-tree block: each block has space for n search keys and n+1 pointers.
  • The root may hold as few as 1 key, but all other blocks must be at least half full.

119
Structure
  • A typical node holds n search keys and n+1 pointers.
  • A typical interior node's pointers point to nodes at the next level down, without record values.
  • A typical leaf's pointers point to records.
120
Application
  • The search key of the B-tree is the primary key for the data file.
  • The data file is sorted by its primary key.
  • Or the data file is sorted by an attribute that is not a key, and this attribute is the search key for the B-tree.

121
Lookup
  • If at an interior node, choose the correct pointer to follow. This is done by comparing the keys to the search value.

122
Lookup
  • If at a leaf node, choose the key that matches what you are looking for; the pointer for that key leads to the data.

123
Insertion
  • When inserting, choose the correct leaf node in which to put the pointer to the data.
  • If the node is full, create a new node and split the keys between the two.
  • Recursively move up: if a pointer to the new node cannot be added because the parent is full, split the parent as well.
  • This could end with creating a new root node, if the current root was full.

124
Deletion
  • Perform a lookup to find the node holding the key, and delete the key.
  • If the node is no longer half full, either join it with an adjacent node and recursively delete up, or move a key over from an adjacent node if that node is more than half full, and recursively adjust the pointers up.

125
Efficiency
  • B-trees allow lookup, insertion, and deletion of records using very few disk I/Os.
  • Each level of a B-tree requires one read; you then follow the pointer from that read down to the next level, to the final read.

126
Efficiency
  • Three levels are sufficient for most B-trees: with each block holding 255 pointers, 255^3 is about 16.6 million.
  • You can reduce disk I/Os further by keeping a level of the B-tree in main memory. Keeping the root block with its 255 pointers reduces the reads to 2, and it may even be possible to keep the 255 second-level blocks in memory to reduce the reads to 1.

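The arithmetic behind those claims, as a quick check:

    # Sketch: how many records a B-tree with 255 pointers per block can reach.
    POINTERS = 255
    for levels in (1, 2, 3):
        print(levels, "levels:", POINTERS ** levels)   # 3 levels: 16,581,375
    # Caching the root block in memory saves one disk I/O per lookup;
    # caching the 255 second-level blocks as well saves two.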
127
14.7
B-Trees and Bitmap Indexes
221 Deepti Kundu
128
Definition
  • A bitmap index for a field F is a collection of bit-vectors of length n, one for each possible value that may appear in the field F. [1]

129
What does that mean?
  • Assume a relation R with
  • 2 attributes A and B.
  • Attribute A is of type integer and B is of type string.
  • 6 records, numbered 1 through 6, as shown.

A B
1 30 foo
2 30 bar
3 40 baz
4 50 foo
5 40 bar
6 30 baz
130
Example Continued
  • A bitmap for attribute B is

A B
1 30 foo
2 30 bar
3 40 baz
4 50 foo
5 40 bar
6 30 baz
Value Vector
foo 100100
bar 010010
baz 001001
131
Where do we reach?
  • A bitmap index is a special kind of database index that uses bitmaps. [2]
  • Bitmap indexes have traditionally been considered to work well for data such as gender, which has a small number of distinct values (e.g., male and female) but many occurrences of those values. [2]

132
A little more
  • A bitmap index for attribute A of relation R is:
  • a collection of bit-vectors;
  • the number of bit-vectors = the number of distinct values of A in R;
  • the length of each bit-vector = the cardinality of R;
  • the bit-vector for value v has 1 in position i if the ith record has v in attribute A, and 0 there if not. [3]
  • Records are allocated permanent numbers. [3]
  • There is a mapping between record numbers and record addresses. [3]

133
Motivation for Bitmap Indexes
  • Very efficient when used for partial-match queries. [3]
  • They offer the advantage of buckets, [2]
  • where we find tuples with several specified attributes without first retrieving all the records that matched each attribute.
  • They can also help answer range queries. [3]

134
Another Example
  • A multidimensional array of values of several types:
  • (5,d),(79,t),(4,d),(79,d),(5,t),(6,a)
  • 5: 100010
  • 79: 010100
  • 4: 001000
  • 6: 000001
  • d: 101100
  • t: 010010
  • a: 000001

135
Example Continued
  • (5,d),(79,t),(4,d),(79,d),(5,t),(6,a)
  • Searching for items is easy: just AND the vectors together.
  • To search for (5,d):
  • 5: 100010
  • d: 101100
  • 100010 AND 101100 = 100000

The location of the record has been traced!
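A sketch of building the bit-vectors and running that AND, reproducing the numbers above:

    # Sketch: bitmap indexes over both positions of the (integer, letter) pairs.
    records = [(5, "d"), (79, "t"), (4, "d"), (79, "d"), (5, "t"), (6, "a")]

    def bitmap(values):
        """One bit-vector per distinct value; bit i is 1 iff record i has that value."""
        return {v: [int(x == v) for x in values] for v in dict.fromkeys(values)}

    nums = bitmap([r[0] for r in records])
    lets = bitmap([r[1] for r in records])
    hits = [a & b for a, b in zip(nums[5], lets["d"])]
    print(hits)   # -> [1, 0, 0, 0, 0, 0]: record 1 is the only (5, d)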
136
Compressed Bitmaps
  • Assume:
  • the number of records in R is n;
  • attribute A has m distinct values in R.
  • The size of a bitmap index on attribute A is m·n bits.
  • If m is large, then the fraction of 1s is around 1/m:
  • an opportunity to encode.
  • A common encoding approach is called run-length encoding. [1]

137
Run-length encoding
  • Run-length encoding represents runs:
  • a run, i.e., a sequence of i 0s followed by a 1, is represented by some suitable binary encoding of the integer i.
  • A run of i 0s followed by a 1 is encoded by:
  • first computing how many bits are needed to represent i, say k,
  • then representing the run by k-1 1s and a single 0, followed by the k bits that represent i in binary.
  • The encoding for i = 1 is 01 (k = 1).
  • The encoding for i = 0 is 00 (k = 1).
  • We concatenate the codes for each run together, and the resulting sequence of bits is the encoding of the entire bit-vector.

138
Understanding with an Example
  • Let us decode the sequence 11101101001011.
  • Starting at the beginning (leftmost bit):
  • First run: the first 0 is at position 4, so k = 4. The next 4 bits are 1101, so the first integer is i = 13.
  • Second run: 001011 remains; k = 1, and the next bit is 0, so i = 0.
  • Last run: 1011; k = 2, and the next 2 bits are 11, so i = 3.
  • The run lengths are thus 13, 0, 3; hence the bit-vector is
  • 0000000000000110001

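A sketch of this encoding scheme, decoding the example above:

    # Sketch: run-length encoding of bit-vectors as described on these slides.
    def encode_run(i: int) -> str:
        """A run of i 0s then a 1 -> (k-1) 1s, a 0, then i in k binary bits."""
        k = max(i.bit_length(), 1)
        return "1" * (k - 1) + "0" + format(i, f"0{k}b")

    def decode(code: str):
        """Recover the run lengths from a concatenation of encoded runs."""
        runs, pos = [], 0
        while pos < len(code):
            k = code.index("0", pos) - pos + 1     # leading 1s plus the 0
            runs.append(int(code[pos + k: pos + 2 * k], 2))
            pos += 2 * k
        return runs

    print(decode("11101101001011"))   # -> [13, 0, 3]
    print("".join("0" * i + "1" for i in decode("11101101001011")))
    # -> 0000000000000110001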
139
Managing Bitmap Indexes
  • 1) How do you find a specific bit-vector for a value efficiently?
  • 2) After selecting results that match, how do you retrieve the results efficiently?
  • 3) When data is changed, how do you alter the bitmap index?

140
1) Finding bit vectors
  • Think of each bit-vector as a key to a value. [1]
  • Any secondary storage technique will be efficient in retrieving the values. [1]
  • Create a secondary index with the attribute value as the search key: [3]
  • B-tree
  • Hash

141
2) Finding Records
  • Create a secondary index with the record number as the search key. [3]
  • Or in other words:
  • once you learn that you need record k, you can find it via a secondary index using the kth position as a search key. [1]

142
3) Handling Modifications
  • Two things to remember

Record numbers must remain fixed once assigned
Changes to data file require changes to bitmap
index
143
  • Deletion
  • Tombstone replaces deleted record
  • Corresponding bit is set to 0

144
  • Insertion
  • The record is assigned the next record number.
  • A bit of value 0 or 1 is appended to each bit-vector.
  • If the new record contains a new value of the attribute, add one bit-vector.

145
  • Modification
  • Change the bit corresponding to the old value of
    the modified record to 0
  • Change the bit corresponding to the new value of
    the modified record to 1
  • If the new value is a new value of A, then insert
    a new bit-vector.

146
18.1 - 18.2
CONCURRENCY CONTROL
210 Chiu Luk
147
Concurrency Control
  • Concurrency control in database management systems (DBMS) ensures that database transactions are performed concurrently without violating the data integrity of the database.
  • Executed transactions should follow the ACID rules. The DBMS must guarantee that only serializable (unless serializability is intentionally relaxed) and recoverable schedules are generated.
  • It also guarantees that committed transactions won't be lost, and that aborted (rolled-back) transactions won't remain in the database.

148
Transaction ACID rules
Atomicity - Either the effects of all or none of its operations remain when a transaction is completed; in other words, to the outside world the transaction appears indivisible, atomic.
Consistency - Every transaction must leave the database in a consistent state.
Isolation - Transactions cannot interfere with each other. Providing isolation is the main goal of concurrency control.
Durability - Successful transactions must persist through crashes.
149
Serial and Serializable Schedules
  • In the field of databases, a schedule is a list of actions (i.e., reading, writing, aborting, committing) from a set of transactions.
  • In this example, schedule D is over the set of 3 transactions T1, T2, T3. The schedule describes the actions of the transactions as seen by the DBMS. T1 reads and writes object X, then T2 reads and writes object Y, and finally T3 reads and writes object Z. This is an example of a serial schedule, because the actions of the 3 transactions are not interleaved.

150
Serial and Serializable Schedules
  • A schedule that is equivalent to a serial
    schedule has the serializability property.
  • In schedule E, the order in which the actions of the transactions are executed is not the same as in D, but in the end E gives the same result as D.

151
Serial Schedule: T1 precedes T2
  T1: Read(A); A := A+100; Write(A)
  T1: Read(B); B := B+100; Write(B)
  T2: Read(A); A := A*2; Write(A)
  T2: Read(B); B := B*2; Write(B)

152
Serial Schedule: T2 precedes T1
  T2: Read(A); A := A*2; Write(A)
  T2: Read(B); B := B*2; Write(B)
  T1: Read(A); A := A+100; Write(A)
  T1: Read(B); B := B+100; Write(B)

153
A serializable, but not serial, schedule
  T1: Read(A); A := A+100; Write(A)
  T2: Read(A); A := A*2; Write(A)
  T1: Read(B); B := B+100; Write(B)
  T2: Read(B); B := B*2; Write(B)
  In notation: r1(A) w1(A) r2(A) w2(A) r1(B) w1(B) r2(B) w2(B)

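A quick check of why either serial order is an acceptable outcome (the initial values are assumptions; consistency here means A = B):

    # Sketch: final states produced by the two serial orders of T1 and T2.
    def t1(db):   # T1 adds 100 to A and to B
        db["A"] += 100
        db["B"] += 100

    def t2(db):   # T2 doubles A and B
        db["A"] *= 2
        db["B"] *= 2

    for order, name in (((t1, t2), "T1 then T2"), ((t2, t1), "T2 then T1")):
        db = {"A": 25, "B": 25}     # assumed consistent initial state (A = B)
        for txn in order:
            txn(db)
        print(name, db)             # 250/250 and 150/150: both consistent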
154
A nonserializable schedule
  T1: Read(A); A := A+100; Write(A)
  T2: Read(A); A := A*2; Write(A)
  T2: Read(B); B := B*2; Write(B)
  T1: Read(B); B := B+100; Write(B)

155
A schedule that is serializable only because of the detailed behavior of the transactions
  T1: Read(A); A := A+100; Write(A)
  T2: Read(A); A := A*1; Write(A)
  T2: Read(B); B := B*1; Write(B)
  T1: Read(B); B := B+100; Write(B)
  Regardless of the (consistent) initial state, the final state will be consistent.

156
Non-Conflicting Actions
Two actions are non-conflicting if, whenever they occur consecutively in a schedule, swapping them does not affect the final state produced by the schedule. Otherwise, they are conflicting.
157
Conflicting Actions General Rules
  • Two actions of the same transaction conflict, because their order is fixed:
  • r1(A) w1(B)
  • Two actions over the same database element conflict if one of them is a write:
  • r1(A) w2(A)
  • w1(A) w2(A)

158
Conflict actions
  • Two or more actions are said to be in conflict if:
  • the actions belong to different transactions;
  • at least one of the actions is a write operation;
  • the actions access the same object (read or write).
  • The following set of actions is conflicting:
  • T1: R(X), T2: W(X), T3: W(X)
  • While the following sets of actions are not:
  • T1: R(X), T2: R(X), T3: R(X)
  • T1: R(X), T2: W(Y), T3: R(X)

159
Conflict Serializable
  • We may take any schedule and make as many non-conflicting swaps as we wish,
  • with the goal of turning the schedule into a serial schedule.
  • If we can do so, then the original schedule is serializable, because its effect on the database state remains the same as we perform each of the non-conflicting swaps.

160
Conflict Serializable
  • A schedule is said to be conflict-serializable when the schedule is conflict-equivalent to one or more serial schedules.
  • Another definition: a schedule is conflict-serializable if and only if there exists an acyclic precedence graph (serializability graph) for the schedule.
  • (The example schedule below is conflict-equivalent to the serial schedule <T1,T2>, but not to <T2,T1>.)

161
Conflict equivalent / conflict-serializable
  • Let Ai and Aj be consecutive non-conflicting actions that belong to different transactions. We can swap Ai and Aj without changing the result.
  • Two schedules are conflict-equivalent if they can be turned one into the other by a sequence of non-conflicting swaps of adjacent actions.
  • We shall call a schedule conflict-serializable if it is conflict-equivalent to a serial schedule.

162
conflict-serializable

(table: an interleaved schedule of T1 and T2 read/write actions on A and B)
163
conflict-serializable

(table: the same schedule after swapping a pair of adjacent non-conflicting actions)
164
conflict-serializable

(table: a pair of conflicting actions is reached. Can't swap this pair!)
165
conflict-serializable

(table: after all legal swaps, the actions of T1 precede those of T2: a serial schedule)
166
18.3 - 18.4
CONCURRENCY CONTROL
211 Donavon Norwood
167
INTRODUCTION
  • Enforcing serializability by locks
  • Locks
  • Locking scheduler
  • Two phase locking
  • Locking systems with several lock modes
  • Shared and exclusive locks
  • Compatibility matrices
  • Upgrading/updating locks
  • Incrementing locks

168
Locks
  • It works as follows:
  • a transaction issues a request;
  • the scheduler checks the lock table;
  • a serializable schedule of actions is generated.

169
Consistency of transactions
  • Actions and locks must relate to each other:
  • a transaction can read or write an element only if it holds a lock on it and has not yet released the lock;
  • unlocking an element is compulsory.
  • Legality of schedules:
  • no two transactions can hold the lock on the same element without the prior one releasing it.

170
Locking scheduler
  • Grants a lock request only if the resulting schedule is legal.
  • The lock table stores the information about the current locks on the elements.

171
The locking scheduler (contd.)
  • A legal schedule of consistent transactions can, unfortunately, fail to be serializable.

172
Locking sched