Title: Chapter 11: Storage and File Structure
1. Chapter 11: Storage and File Structure
- Overview of Physical Storage Media
- Redundant Arrays of Independent Disks (RAID)
- Buffer Management
- File Organization
2. Properties of Physical Storage Media
- Types of storage media differ in terms of
- Speed of data access
- Cost per unit of data
- Reliability
- Data loss on power failure or system (software) crash
- Physical failure of the storage device
- → What is the value of the data (e.g., Maxine's Salon)?
3. Classification of Storage Media
- Storage can be classified as
- Volatile
- Content is lost when power is switched off
- Includes primary storage (cache, main-memory)
- Non-volatile
- Contents persist even when power is switched off
- Includes secondary and tertiary storage, as well
as battery-backed up RAM
4. Storage Hierarchy, Cont.
- Storage can also be classified as
- Primary Storage
- Fastest media
- Volatile
- Includes cache and main memory
- Secondary Storage
- Moderately fast access time
- Non-volatile
- Includes flash memory and magnetic disks
- Also called on-line storage
- Tertiary Storage
- Slow access time
- Non-volatile
- Includes magnetic tape and optical storage
- Also called off-line storage
5. Primary Storage
- Cache
- Volatile
- Fastest and most costly form of storage
- Managed by the computer system hardware
- Main memory (also known as RAM)
- Volatile
- Fast access (10s to 100s of nanoseconds; 1 nanosecond = 10⁻⁹ seconds)
- Generally too small (or too expensive) to store the entire database
- Capacities of up to a few gigabytes
- Capacities have gone up, while costs have gone down
- → The above comments notwithstanding, main-memory databases do exist
6. Physical Storage Media, Cont.
- Flash memory
- Non-volatile
- Widely used in embedded devices such as digital cameras
- Data can be written at a location only once, but the location can be erased and written to again
- Can support only a limited number of write/erase cycles
- Erasing of memory has to be done to an entire bank of memory
- Reads are roughly as fast as main memory
- Writes are slow (a few microseconds), erase is slower
- Cost per unit of storage roughly similar to main memory
- Also known as Electrically Erasable Programmable Read-Only Memory (EEPROM)
7. Physical Storage Media, Cont.
- Magnetic disk
- Non-volatile
- Survives power failures and system crashes
- Disk failure can destroy data, but this is very rare
- Primary medium for database storage
- Typically stores the entire database
- Capacities range up to roughly 100 GB currently
- Much larger capacity than main or flash memory
- Growing constantly and rapidly with technology improvements
- Direct-access, i.e., data on disk can be accessed in any order, unlike magnetic tape; also called random-access
8. Physical Storage Media, Cont.
- Optical storage
- Non-volatile; data is read optically from a spinning disk using a laser
- CD-ROM (640 MB) and DVD (4.7 to 17 GB) are the most popular forms
- Write-once, read-many (WORM) optical disks used for archival storage (CD-R and DVD-R)
- Multiple-write versions available (CD-RW, DVD-RW, and DVD-RAM)
- Reads and writes are slower than with magnetic disk
9. Physical Storage Media, Cont.
- Tape storage
- Non-volatile
- Sequential-access; much slower than a disk
- Very high capacity (40 to 300 GB tapes available)
- Storage costs are much cheaper than for a disk, but high-quality drives can be very expensive
- Used primarily for backup (to recover from disk failure), archiving, and transfer of large quantities of data
10. Physical Storage Media, Cont.
- Juke-boxes
- For storing massive amounts of data
- Hundreds of terabytes (1 terabyte = 10¹² bytes), petabytes (1 petabyte = 10¹⁵ bytes) or exabytes (1 exabyte = 10¹⁸ bytes)
- Large number of removable disks or tapes
- A mechanism for automatic loading/unloading of disks or tapes
11. Storage Hierarchy
12. Magnetic Hard Disk Mechanism
NOTE: The diagram is schematic and simplifies the structure of actual disk drives.
13. Magnetic Disks
- Disk assembly consists of
- A single spindle that spins continually (typically at 7,500 or 10,000 RPM)
- Multiple disk platters (typically 2 to 4)
- Surface of each platter is divided into circular tracks
- Over 16,000 tracks per platter on typical hard disks
- Each track is divided into sectors
- A sector is the smallest unit of data that can be read or written
- Sector size is typically 512 bytes
- Typical sectors per track: 200 (on inner tracks) to 400 (on outer tracks)
- Head-disk assemblies
- One head per platter, mounted on a common arm, each very close to its platter
- Reads or writes magnetically encoded information
- Cylinder i consists of the ith track of all the platters
14. Magnetic Disks, Cont.
- To read/write a sector
- disk arm swings to position the head on the right track
- sector rotates under the read/write head
- data is read/written as the sector passes under the head
- → Disks are the primary performance bottleneck in a database system, in part because of the need for physical movement
15. Performance Measures of Disks
- Access time - the time it takes from when a read or write request is issued to when data transfer begins
- Access time consists of two parts
- Seek time - the time it takes to reposition the arm over the correct track
- 4 to 10 milliseconds on typical disks
- Rotational latency - the time it takes for the sector to be accessed to appear under the head
- 4 to 11 milliseconds on typical disks (5,400 to 15,000 r.p.m.)
16. Performance Measures of Disks, Cont.
- Data transfer rate - the rate at which data can be retrieved from or stored to the disk
- 25 to 100 MB per second is typical
- Multiple disks may share an interface controller, so the rate at which the interface controller can handle data is also important
- ATA-5: 66 MB/s, SCSI-3: 40 MB/s, Fibre Channel: 256 MB/s
17. Performance Measures (Cont.)
- Mean time to failure (MTTF) - the average time the disk is expected to run continuously without any failure
- Typically 3 to 5 years
- Probability of failure of new disks is quite low
- MTTF of 30,000 to 1,200,000 hours for a new disk
- An MTTF of 1,200,000 hours for a new disk means that given 1,000 relatively new disks, on average one will fail every 1,200 hours
- MTTF decreases as the disk ages
18. Magnetic Disks (Cont.)
- The term controller is used (primarily) in two ways, both in the book and in other literature; the book does not distinguish between the two very well.
- Disk controller (use 1)
- Packaged within the disk
- Accepts high-level commands to read or write a sector
- Initiates actions such as moving the disk arm to the right track and actually reading or writing the data
- Computes and attaches checksums to each sector for reliability
- Performs re-mapping of bad sectors
19. Disk Subsystem
- Interface controller (use 2)
- Multiple disks are typically connected to the system bus through an interface controller (a.k.a. a host adapter)
- Many functions (checksums, bad-sector re-mapping) are often carried out by individual disk controllers, which reduces the load on the interface controller
20. Disk Subsystem
- The distribution of work between the disk controller and the interface controller depends on the interface standard.
- Disk interface standard families
- ATA (AT Attachment)/IDE
- SCSI (Small Computer System Interface)
- Fibre Channel, etc.
- Several variants of each standard (different speeds and capabilities)
- One computer can have many interface controllers, of the same or different types
- → Like disks, interface controllers can also be a performance bottleneck.
21. Techniques for Optimization of Disk-Block Access
- Several techniques are employed to minimize the negative effects of disk and controller bottlenecks
- File organization
- Organize data on disk based on how it will be accessed
- Store related information (e.g., data in the same file) on the same or nearby cylinders
- A disk may get fragmented over time
- As data is inserted into and deleted from the disk, free slots on the disk tend to become scattered
- Data items (e.g., files) are then broken up and scattered over the disk
- Sequential access to fragmented data results in increased disk arm movement
- Most DBMSs have utilities to de-fragment the file system
22. Techniques for Optimization of Disk-Block Access
- Block size adjustment
- A block is a contiguous sequence of sectors from a single track
- A DBMS transfers data between disk and main memory in blocks
- Sizes range from 512 bytes to several kilobytes
- Smaller blocks: more transfers from disk
- Larger blocks may waste space due to partially filled blocks
- Typical block sizes range from 4 to 16 kilobytes
- Block size can be adjusted to accommodate the workload
- Disk-arm-scheduling algorithms
- Order pending accesses so that disk arm movement is minimized
- One example is the elevator algorithm (see the sketch below)
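A minimal sketch (not from the slides) of the elevator (SCAN) idea: pending track requests are serviced in the direction the arm is already moving, then the arm reverses. The track numbers and request queue are made-up illustrative values.

def elevator_order(pending_tracks, head, moving_up=True):
    """Return the order in which pending track requests are serviced:
    sweep in the current direction, then reverse and sweep back."""
    up = sorted(t for t in pending_tracks if t >= head)
    down = sorted((t for t in pending_tracks if t < head), reverse=True)
    return up + down if moving_up else down + up

# Example: head at track 50, arm currently moving toward higher tracks.
print(elevator_order([98, 27, 64, 12, 55], head=50))
# -> [55, 64, 98, 27, 12]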
23. Techniques for Optimization of Disk-Block Access, Cont.
- Log disk
- A disk devoted to the transaction log
- Writes to a log disk are very fast since no seeks are required
- Often provided with battery back-up
- Non-volatile write buffers/RAM
- Battery-backed-up RAM or flash memory
- Typically associated with a disk or storage device
- Blocks to be written are first written to the non-volatile RAM buffer
- The disk controller then writes the data to disk whenever the disk has no other requests, or when the request has been pending for some time
- Allows processes to continue without waiting on write operations
- Data is safe even in the event of power failure
- Allows writes to be reordered to minimize disk arm movement
24. RAID
- Redundant Arrays of Independent Disks (RAID)
- Disk organization techniques that exploit a collection of disks
- Speed, reliability, or both are increased
- The collection appears as a single disk
- Originally, the "I" stood for inexpensive disks
- Using multiple disks increases the risk of failure
- A system with 100 disks, each with an MTTF of 100,000 hours (approx. 11 years), will have a system MTTF of 1,000 hours (approximately 41 days)
- Techniques that use redundancy to avoid data loss are therefore critical
25. Improvement of Reliability via Redundancy
- RAID improves reliability by using two techniques
- Parity information
- Mirroring
- Parity information (basic idea)
- Makes use of the exclusive-or (XOR) operator, e.g.:
- 1 ⊕ 1 ⊕ 0 ⊕ 1 = 1
- 0 ⊕ 1 ⊕ 1 ⊕ 0 = 0
- Enables the detection of single-bit errors (a sketch of XOR parity follows below)
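A minimal sketch (not from the slides) of how XOR parity is used in RAID: the parity block is the bytewise XOR of the data blocks, and any one lost block can be rebuilt by XORing the parity with the surviving blocks. The block contents are made up.

def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)

data = [b"disk0 data", b"disk1 data", b"disk2 data"]   # made-up contents
parity = xor_blocks(data)

# Simulate losing disk 1 and recovering its contents from the other disks.
recovered = xor_blocks([data[0], data[2], parity])
assert recovered == data[1]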
26. Improvement of Reliability via Redundancy
- Mirroring
- Duplicate every disk
- Every write is carried out on both disks
- If one disk in a pair fails, the data is still available
- Reads can take place from either disk and, in particular, from both disks at the same time
- Data loss would occur only if a disk fails, and its mirror disk also fails before the system is repaired
- The probability of both events is very small.
- Mean time to data loss depends on the mean time to failure and the mean time to repair; for a mirrored pair it is roughly MTTF² / (2 × MTTR).
- If MTTF is 100,000 hours and mean time to repair is 10 hours, then mean time to data loss is 100,000² / (2 × 10) = 500 × 10⁶ hours (about 57,000 years) for a mirrored pair of disks.
27. Improvement in Performance via Parallelism
- RAID improves performance mainly through the use of parallelism.
- Two main goals of parallelism in a disk system
- Load-balance multiple small accesses to increase throughput (I/O rate).
- Parallelize large accesses to reduce response time (transfer rate).
- Parallelism is achieved through striping.
- Striping can be done at varying levels
- bit-level
- block-level (variable block size)
28. Improvement in Performance via Parallelism
- Bit-level striping
- Split the bits of each byte across multiple disks
- In an array of eight disks, write bit i of each byte to disk i.
- In theory, each access can read data at eight times the rate of a single disk.
- But seek/access time is worse than for a single disk.
- Not commonly used, in particular as described.
- Block-level striping
- With n disks, block i of a file goes to disk (i mod n) + 1 (see the sketch below)
- Requests for different blocks can run in parallel if the blocks reside on different disks (improves throughput).
- A request for a long sequence of blocks can use all disks in parallel (improves response time).
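A minimal sketch of the block-level striping rule on this slide: with n disks, block i of a file is mapped to disk (i mod n) + 1. Disk numbering starting at 1 and block numbering at 0 are assumptions for illustration.

def disk_for_block(i, n):
    """Disk (1..n) holding logical block i under block-level striping."""
    return (i % n) + 1

n_disks = 4
for block in range(8):
    print(f"block {block} -> disk {disk_for_block(block, n_disks)}")
# Blocks 0..7 map to disks 1, 2, 3, 4, 1, 2, 3, 4 - so a long sequential
# read touches all four disks in parallel.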
29. RAID Levels
- Different RAID organizations, or RAID levels, have differing cost, performance, and reliability characteristics.
- Levels vary by source and vendor
- Standard levels: 0, 1, 2, 3, 4, 5, 6
- Nested levels: 01, 10, 50, 51
- Several other, non-standard, vendor-specific levels exist as well.
- Our book: 0, 1, 2, 3, 4, 5
- Performance and reliability are not linear in the level number.
- It is helpful to compare each level to every other level, and to the single-disk option.
30. RAID Levels
- RAID Level 0
- Block striping.
- Used in high-performance applications where data loss is not critical.
- RAID Level 1
- Mirrored disks with block striping.
- Offers the best write performance.
- Popular for applications such as storing log files.
- Sometimes called RAID 0+1, 1+0, or 10.
31. RAID Levels, Cont.
- RAID Level 2
- Memory-style error-correcting codes (ECC) with bit-level striping.
- Parity information is used to detect, locate, and correct errors.
- Outdated: disks contain embedded functionality to detect and locate sector errors, so only a single parity disk is needed (for correction).
32. RAID Levels, Cont.
- RAID Level 3
- Bit-level striping, with parity.
- To recover data on a damaged disk, compute the XOR of bits from the other disks (including the parity disk).
- Faster data transfer than with a single disk, but fewer I/Os per second since every disk has to participate in every I/O.
- When writing data, the corresponding parity bits must also be computed and written to the parity disk.
- Subsumes Level 2 (provides all its benefits, at lower cost).
33. RAID Levels (Cont.)
- RAID Level 4
- Block-level striping; keeps a parity block on a separate disk for corresponding blocks from the N other disks.
- To recover data on a damaged disk, compute the XOR of bits from the other disks (including the parity disk).
- When writing a data block, the corresponding block of parity bits must also be recomputed and written to the parity disk
- Can be done by using the old parity block, the old value of the current block, and the new value of the current block (2 block reads + 2 block writes), as sketched below
- Or by re-computing the parity value using the new values of all blocks corresponding to the parity block
- The latter is more efficient when writing large amounts of data sequentially
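A minimal sketch of the small-write parity update just described (applies to Levels 4 and 5): new parity = old parity XOR old data XOR new data, i.e. 2 block reads (old data, old parity) and 2 block writes (new data, new parity). The 4-byte blocks are made up.

def update_parity(old_parity, old_block, new_block):
    """Recompute the parity block after overwriting one data block."""
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_block, new_block))

d0, d1 = b"\x0f\x0f\x0f\x0f", b"\xf0\xf0\xf0\xf0"
parity = bytes(a ^ b for a, b in zip(d0, d1))          # parity over d0, d1
new_d1 = b"\xaa\xaa\xaa\xaa"
parity = update_parity(parity, d1, new_d1)             # small-write update
assert parity == bytes(a ^ b for a, b in zip(d0, new_d1))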
34. RAID Levels (Cont.)
- RAID Level 4 (Cont.)
- The parity disk becomes a bottleneck for independent block writes.
- Provides higher I/O rates for independent block reads than Level 3.
- Provides higher transfer rates for large, multi-block reads compared to a single disk.
35. RAID Levels (Cont.)
- RAID Level 5
- Block-level striping with distributed parity; partitions data and parity among all N + 1 disks, rather than storing data on N disks and parity on 1 disk.
- For example, with 5 disks, the parity block for the ith set of N blocks is stored on disk (i mod 5) + 1, with the data blocks stored on the other 4 disks.
36. RAID Levels (Cont.)
- RAID Level 5 (Cont.)
- Provides the same benefits as Level 4, but minimizes the parity-disk bottleneck.
- Higher I/O rates than Level 4.
- RAID Level 6
- P+Q redundancy scheme; similar to Level 5, but uses additional redundant information to guard against multiple disk failures.
- Better reliability than Level 5 at a higher cost; not used as widely.
37. Choice of RAID Level
- Factors in choosing a RAID level
- Monetary cost of disks, controllers, or RAID storage devices
- Reliability requirements
- Performance requirements
- Throughput vs. response time
- Short vs. long I/O
- Reads vs. writes
- During failure and rebuilding of a failed disk
38. Choice of RAID Level, Cont.
- RAID 0 is used only when (data) reliability is not important.
- Levels 2 and 4 are never used since they are subsumed by 3 and 5.
- Level 3 is not used (typically) since bit-level striping forces single-block reads to access all disks, wasting disk arm movement.
- Sometimes vendors advertise a version of Level 3 using byte-level striping.
- Level 6 is rarely used since Levels 1 and 5 offer adequate safety for almost all applications.
- So the competition is between 1 and 5 only.
39. Choice of RAID Level (Cont.)
- Level 1 provides much better write performance than Level 5.
- Level 5 requires at least 2 block reads and 2 block writes to write a single block, whereas Level 1 only requires 2 block writes.
- Level 1 is preferred for high-update environments.
- Level 1 has a higher storage cost than Level 5, however.
- I/O requirements have increased greatly, e.g., for Web servers.
- When enough disks have been bought to satisfy the required I/O rate, they often have spare storage capacity
- In that case there is often no extra monetary cost for Level 1!
- Level 5 is preferred for applications with a low update rate and large amounts of data.
- Level 1 is preferred for all other applications.
40. Other RAID Issues
- Hardware vs. software RAID.
- Local storage buffers.
- Hot-swappable disks.
- Spare components: disks, fans, power supplies, controllers.
- Battery backup.
41. Storage Access
- A DBMS will typically have several files allocated to it for storage, which are (usually) formatted and managed by the DBMS.
- In the simplest case, each file maps to either a disk or a disk partition.
- Each file is partitioned into blocks (a.k.a. pages).
- A block consists of one or more contiguous sectors.
- A block is the smallest unit of DBMS storage allocation.
- A block is the smallest unit of DBMS data transfer.
- Each block is partitioned into records.
- Each record is partitioned into fields.
42. Buffer Management
- The DBMS transfers blocks of data between RAM and disk, in a manner similar to an operating system.
- The DBMS seeks to minimize the number of block transfers between the disk and memory.
- The portion of main memory available to store copies of disk blocks is called the buffer.
- The DBMS subsystem responsible for allocating buffer space in main memory is called the buffer manager.
43. Buffer Manager Algorithm
- Programs call the buffer manager when they need a (disk) block.
- If a requested block is already in the buffer, then the requesting program is given the address of the block in main memory.
- If the block is not in the buffer, then the buffer manager
- Allocates space in the buffer for the block, throwing out some other block if required.
- Writes the thrown-out block to disk if it has been modified since the last time it was retrieved or written.
- Reads the block from disk into the buffer once space is available, and passes the address of the block in main memory to the requester.
- A sketch of this algorithm appears below.
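A minimal sketch (assumed names, not the book's code) of the buffer manager algorithm above: return an in-memory copy of a requested disk block, evicting and writing back a dirty block when the buffer is full.

class BufferManager:
    def __init__(self, capacity, disk):
        self.capacity = capacity          # max number of blocks in the buffer
        self.disk = disk                  # dict: block_id -> bytes (toy "disk")
        self.buffer = {}                  # block_id -> {"data": ..., "dirty": bool}

    def get_block(self, block_id):
        if block_id in self.buffer:                      # already buffered
            return self.buffer[block_id]["data"]
        if len(self.buffer) >= self.capacity:            # need to evict
            victim = self._choose_victim()               # replacement policy
            if self.buffer[victim]["dirty"]:             # write back if modified
                self.disk[victim] = self.buffer[victim]["data"]
            del self.buffer[victim]
        self.buffer[block_id] = {"data": self.disk[block_id], "dirty": False}
        return self.buffer[block_id]["data"]

    def _choose_victim(self):
        # Placeholder policy: evict an arbitrary block; a real buffer manager
        # would use LRU, MRU, toss-immediate, etc. (see the next slides).
        return next(iter(self.buffer))

disk = {1: b"block one", 2: b"block two", 3: b"block three"}   # toy disk
bm = BufferManager(capacity=2, disk=disk)
print(bm.get_block(1), bm.get_block(2), bm.get_block(3))       # 3rd read evicts a block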
44. Buffer-Replacement Policies
- How is a block selected for replacement?
- Most operating systems use Least Recently Used (LRU).
- The past pattern of block references is used to predict future references.
- In a DBMS, LRU can be a bad strategy for certain access patterns.
- Queries have well-defined access patterns (such as sequential scans), and so a DBMS can predict future references (an OS cannot).
- Mixed strategies are common.
- Statistical information can be exploited.
- Examples of statistical information:
- The data dictionary is frequently accessed, so keep those blocks in the buffer.
- Index blocks are used frequently, so keep them in the buffer.
- Store and use histograms on the distribution of column values.
45. Buffer-Replacement Policies, Cont.
- Some terminology...
- A memory block that has been loaded into the buffer from disk, and is not allowed to be replaced, is said to be pinned.
- At any given time a buffer block is in one of the following states:
- Unused (free): does not contain the copy of a block from disk.
- Used, but not pinned: contains the copy of a block from disk, which is available for replacement.
- Used and pinned: contains the copy of a block from disk, which is not available for replacement.
46. Buffer-Replacement Policies, Cont.
- Page replacement strategies
- Toss-immediate: frees the space in the buffer occupied by a block, typically done when the final tuple of the block has been used; writes the block out if necessary.
- Most recently used (MRU): the moment a disk block in the buffer becomes unpinned, it becomes the most recently used block; the most recently used unpinned block is the one replaced.
- Least recently used (LRU): of all unpinned disk blocks in the buffer, replace the one referenced furthest back in time.
47. Buffer-Replacement Policies, Cont.
- Consider the following query and query plan, chosen by the query optimizer, for implementing a join query between the borrower and customer relations.
- select customer-name, loan-number, street, city
- from borrower b, customer c
- where b.customer-name = c.customer-name
- for each tuple b of borrower do // scan borrower sequentially
- for each tuple c of customer do // scan customer sequentially
- if b.customer-name = c.customer-name then begin
- x.customer-name := b.customer-name
- x.loan-number := b.loan-number
- x.street := c.street
- x.city := c.city
- include x in the result of the query
- end if
- end for
- end for
48. Buffer-Replacement Policies, Cont.
- Suppose that borrower consists of blocks b1, b2, and b3, that customer consists of blocks c1, c2, c3, c4, and c5, and that the buffer can fit five (5) blocks total.
- LRU example
- Read in a block of borrower, and keep it in the buffer until the last tuple in that block has been processed (pinned).
- Once all of the tuples from that block have been processed, free up the buffer space used by that block immediately (toss-immediate).
- If a customer block is to be brought into the buffer, and another block must be moved out in order to make room, then move out the least recently used block (borrower or customer).
49. Buffer-Replacement Policies, Cont.
- Applying the LRU replacement algorithm results in the following buffer contents for the first two tuples of b1 (each row is one buffer slot, listed from the point at which it is first filled):
- b1 b1 b1 b1 b1 b1 b1 b1 b1 b1 b1
- c1 c1 c1 c1 c5 c5 c5 c5 c4 c4
- c2 c2 c2 c2 c1 c1 c1 c1 c5 .......
- c3 c3 c3 c3 c2 c2 c2 c2
- c4 c4 c4 c4 c3 c3 c3
- Notice that each time a customer block is read into the buffer, the block replaced is the next block required!
- This results in a total of six (6) page replacements.
50. Buffer-Replacement Policies, Cont.
- MRU example
- Read in a block of borrower, and keep it in the buffer until the last tuple in that block has been processed (pinned).
- Once all of the tuples from that block have been processed, free up the buffer space used by that block immediately (toss-immediate).
- If a customer block is to be brought into the buffer, and another block must be moved out in order to make room, then move out the most recently used block (borrower or customer).
51. Buffer-Replacement Policies, Cont.
- Applying the MRU replacement algorithm results in the following buffer contents for the first two tuples of b1 (each row is one buffer slot, listed from the point at which it is first filled):
- b1 b1 b1 b1 b1 b1 b1
- c1 c1 c1 c1 c1 c1
- c2 c2 c2 c2 c2 .......
- c3 c3 c3 c4
- c4 c5 c5
- This results in a total of two (2) page replacements (a small simulation sketch follows below).
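A minimal sketch (not from the slides) that simulates the scenario above: a 5-block buffer, b1 pinned, and customer blocks c1..c5 scanned once per tuple of b1. It counts page replacements (evictions) under LRU and MRU for the first two tuples of b1.

def count_replacements(policy, passes=2, capacity=5):
    pinned = {"b1"}
    buffer = {"b1": 0}            # block -> time of last use
    time, replacements = 0, 0
    refs = [f"c{i}" for i in range(1, 6)] * passes   # c1..c5 per tuple of b1
    for block in refs:
        time += 1
        if block not in buffer:                       # miss
            if len(buffer) >= capacity:               # need to evict
                candidates = {b: t for b, t in buffer.items() if b not in pinned}
                pick = max if policy == "MRU" else min
                victim = pick(candidates, key=candidates.get)
                del buffer[victim]
                replacements += 1
        buffer[block] = time                          # load / touch block
    return replacements

print(count_replacements("LRU"))   # -> 6, as on the LRU slide
print(count_replacements("MRU"))   # -> 2, as on the MRU slide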
52. File Organization
- Several standard low-level storage DBMS issues
- Avoiding large numbers of block reads
- Algorithmically, the bar has been raised
- Records crossing block boundaries
- Dangling pointers
- Disk fragmentation
- Allocating free space
- Other issues exist.
- Partial solutions to the above will be presented
in the following.
53. File Organization
- Recall
- A database is stored as a collection of files.
- A file consists of a sequence of blocks.
- A block consists of a sequence of records.
- A record consists of a sequence of fields.
- Initial assumptions
- Record size is fixed.
- Each file has records of one particular type only.
- Different files are used for different relations.
- Tuples are stored in an arbitrary order, i.e., unsorted.
- This case is the easiest to deal with; variable-length records will be considered later.
54. Fixed-Length Records
- Simple approach
- Store record i starting from byte n × (i − 1), where n is the size of each record.
- Record access is simple.
- Technically, records could cross block boundaries, but this can be fixed.
- Insertion is easy
- Add each new record to the end
- Deletion of record i, options:
- Move records i + 1, ..., n to i, ..., n − 1 (shift)
- Move record n to i
- What problems arise with each?
- → Don't move records, but link all free records on a free list (see the sketch after the next slide).
55. Free Lists
- Store the address, i.e., a pointer, of the first deleted record in the file header.
- Store a pointer in that first deleted record to the second deleted record, etc.
- Improvement: reuse the space of a normal attribute in the deleted record to store the pointers.
- Records in the file could also be connected with pointers for efficient scanning.
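A minimal sketch (assumed structure, not the book's code) of a fixed-length record file with a free list: deleted slots are chained together from a header pointer, and insertions reuse the first free slot instead of moving records.

class FixedLengthFile:
    FREE = object()                       # marker for a deleted slot

    def __init__(self):
        self.records = []                 # slot i holds record i (fixed size)
        self.free_head = None             # header: index of first free slot
        self.free_next = {}               # slot -> next free slot (reused attribute space)

    def delete(self, i):
        self.records[i] = self.FREE       # don't move records
        self.free_next[i] = self.free_head
        self.free_head = i                # new head of the free list

    def insert(self, record):
        if self.free_head is None:        # no free slot: append at the end
            self.records.append(record)
            return len(self.records) - 1
        slot = self.free_head             # reuse the first free slot
        self.free_head = self.free_next.pop(slot)
        self.records[slot] = record
        return slot

f = FixedLengthFile()
for name in ["Adams", "Brown", "Curry"]:
    f.insert(name)
f.delete(1)
print(f.insert("Davis"))   # -> 1, the freed slot is reused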
56. Variable-Length Records
- Variable-length records arise in databases in several ways
- Storage of multiple record types in a file
- Record types that allow variable lengths for one or more fields
- Record types that allow repeating fields
- Options for storing variable-length records
- Byte-string
- Reserved space
- Pointers
- Slotted page structure
57. Variable-Length Records, Cont.
- Byte-string representation
- Attach an end-of-record character to the end of each record.
- Could also combine this with pointers for free and non-free records.
- Deletion and insertion result in fragmentation.
- Difficulty with individual record growth.
- Reserved space: can use fixed-length records of a known maximum length; unused space in shorter records is filled with a null or end-of-record symbol.
- Once again, could combine this with pointers for free and non-free records.
58. Variable-Length Records, Cont.
- Pointer method
- A variable-length record is represented by a list of fixed-length records, chained together via pointers.
- Can be used even if the maximum record length is not known.
- Once again, could combine this with pointers for free and non-free records.
59. Variable-Length Records, Cont.
- Disadvantages of the pointer method
- Records are more likely to cross block boundaries
- Space is wasted in all records except the first in a chain
- A partial solution is to allow two kinds of blocks in a file
- Anchor blocks: contain the first record of each chain
- Overflow blocks: contain records other than those that are first in a chain
- Records are still more likely to cross block boundaries.
60. Variable-Length Records: the Slotted Page Structure
- The slotted page (i.e., block) header contains
- Number of record entries
- Pointer to the end of free space in the block
- Location and size of each record
- Records can be moved within a page to eliminate empty space
- The entry in the header must then be updated.
- Pointers from outside the block do not point directly to a record, but rather to the corresponding entry in the header (see the sketch below).
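A minimal sketch (assumed layout, not the book's code) of a slotted page: the header holds the entry count, the end of free space, and an (offset, size) slot per record; record data grows from the end of the block toward the header, and external pointers refer to slot numbers rather than byte offsets, so records can be moved within the page by updating only the header.

class SlottedPage:
    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.free_end = block_size        # end of free space (records grow downward)
        self.slots = []                   # header entries: (offset, size) per record

    def insert(self, record: bytes) -> int:
        """Store a record and return its slot number (the external pointer)."""
        header_size = 8 + 8 * (len(self.slots) + 1)     # rough header estimate
        if self.free_end - len(record) < header_size:
            raise ValueError("page full")
        self.free_end -= len(record)
        self.slots.append((self.free_end, len(record)))
        return len(self.slots) - 1

    def locate(self, slot: int):
        """Return (offset, size) of the record from the header entry."""
        return self.slots[slot]

page = SlottedPage()
s = page.insert(b"Perryridge|A-102|400")
print(page.locate(s))     # e.g. (4075, 21)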
61. Organization of Records in Files
- How are records assigned to blocks within a specific file?
- Heap: a record is placed anywhere in the file where there is space.
- Sequential: records are stored in sequential order, based on the value of the search key of each record.
- Hashing: a hash value is computed from some attribute(s) of each record, which specifies the block where the record is placed (see the sketch below).
- Clustering: records of different tables are stored in the same file.
- The motivation is to store related records on the same block to minimize I/O.
- The preceding slides covered heap organization.
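A minimal sketch (assumed scheme, not the book's code) of hashing file organization: a hash of the search-key attribute determines the block in which a record is stored, so an equality lookup goes straight to one block. The number of blocks and key values are made up.

import hashlib

NUM_BLOCKS = 16   # assumed number of blocks in the file

def block_for(search_key: str) -> int:
    """Block number where records with this search-key value are placed."""
    digest = hashlib.sha256(search_key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_BLOCKS

print(block_for("Perryridge"))   # every record with this key value
print(block_for("Brighton"))     # hashes to the same block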
62. Sequential File Organization
- Suitable for applications that require sequential processing of the entire file.
- The records in the file are ordered by a search key.
63. Sequential File Organization (Cont.)
- Deletion: must maintain pointer chains.
- Insertion: locate the position where the record is to be inserted
- if there is free space, insert there
- if there is no free space, insert the record in an overflow block
- In either case, the pointer chain must be updated
- Need to reorganize the file from time to time to restore sequential order.
64. Clustering File Organization
- A simple file structure stores each relation in a separate file
- A clustering file organization stores several relations in one file
- Example: clustered organization of customer and depositor
- Good for queries involving the join of depositor and customer, and for queries involving one single customer and his accounts
- Bad for queries involving only customer
- Results in variable-size records
65. Data Dictionary Storage
- The data dictionary (a.k.a. system catalog) stores metadata
- Information about relations
- names of relations
- names and types of attributes
- names and definitions of views
- integrity constraints
- Physical file organization information
- How relation is stored (sequential, hash, etc.)
- Physical location of relation
- operating system file name, or
- disk addresses of blocks containing the records
- Statistical and descriptive data
- number of tuples in each relation
- User and accounting information, including passwords
- Information about indices (Chapter 12)
66. Data Dictionary Storage (Cont.)
- Catalog structure can use either
- specialized data structures designed for efficient access
- a set of relations, with existing system features used to ensure efficient access
- The latter alternative is usually preferred
- A possible catalog representation:
- Relation-metadata (relation-name, number-of-attributes, storage-organization, location)
- Attribute-metadata (attribute-name, relation-name, domain-type, position, length)
- User-metadata (user-name, encrypted-password, group)
- Index-metadata (index-name, relation-name, index-type, index-attributes)
- View-metadata (view-name, definition)
67. Large Objects
- Large objects
- text documents
- images
- computer aided designs
- audio and video data
- Large objects may need to be stored in a contiguous sequence of bytes when brought into memory.
- If an object is bigger than a block, contiguous blocks in the buffer must be allocated to store it.
- It might be preferable to disallow direct access to the data, and only allow access through a file-system-like API, to remove the need for contiguous storage.
- Most vendors provide supporting types
- binary large objects (blobs)
- character large objects (clobs)
- text, image, etc.
68. Modifying Large Objects
- If the application requires insertion or deletion of bytes at specified regions of an object:
- B-tree file organization (described later, in Chapter 12) can be modified to represent large objects
- Each leaf block of the tree stores between half a block and one block worth of data from the object
- Special-purpose application programs outside the database are frequently used to manipulate large objects
- Text data is treated as a byte string manipulated by editors and formatters.
- Graphical data and audio/video data are typically created and displayed by separate applications.
69. End of Chapter