Relational Database M 358 Block 5 - PowerPoint PPT Presentation

1 / 29
About This Presentation
Title:

Relational Database M 358 Block 5

Description:

Disk platters - One of more disk platters rotate continually about the central ... A disk platter would consists of two disk surfaces. ... – PowerPoint PPT presentation

Number of Views:31
Avg rating:3.0/5.0
Slides: 30
Provided by: Ange325
Category:

less

Transcript and Presenter's Notes

Title: Relational Database M 358 Block 5


1
Relational Database (M 358)Block 5
  • Tang Kwok Wa, Billy

2
Storage Device
Disk Architecture Disk platters - One of more
disk platters rotate continually about the
central axis at constant speed, each is accessed
by a read/write head. A disk platter would
consists of two disk surfaces. A track - Data is
stored along a number of tracks around the disk A
sector - A contiguous sequence of bytes within a
single track of one surface of a disk platter. A
sector of data is the minimum amount of data
transferred between the disk and the memory A
cylinder - At any moment, heads are positioned on
the equivalent track on each of the disk
surfaces. Each cylinder consists of all these
equivalent tracks Example of calculation for
disk capacity 2000 cylinder 10 disk surfaces 8
sectors per track 4K bytes per sector Disk
capacity 4K x 8 x 2000 x 10 640000K bytes
3
Storage Device
  • Accessing Data Stored on a Disk
  • Time to access (read or write) a sector of data
    on a disk
  • Seek time - Time for the access mechanism to move
    the read/write heads from current position to the
    cylinder where the required sector resides -
    increases as the distance the arms must move
    increases.
  • Rotational delay - Time it takes waiting for the
    start of the required sector to come round and
    position itself at the read/ write head.
  • Data transfer time - is the time taken to
    transfer the contents of a sector on the disk to
    the memory
  • If two rows of data reside within the same disk
    sector, a single disk access is sufficient to
    transfer data to memory.
  • If two rows reside in different cylinders, then
    two disk accesses would be required.

4
Storage Model
Storage model - the software components that form
the interface between the DBMS and the operating
system. Two approaches for internal file
management 1. DBMS use the file organization and
access methods provided by the underlying
operating system for file management. 2. DBMS
itself manages the internal file organization -
adopted by most large scale DBMS. Necessary
software components for the second approach 1.
File manager - view database as a logical file
- a collection of blocks in memory, where each
block corresponds to a particular sector on
disk. - determine which block (disk sector)
contains the desired rows. - instruct buffer
manager to retrieve rows from buffer if they
exist in buffer. - instruct disk manager to
retrieve rows from disk if they only exist in
disk.
5
Storage Model
6
Storage Model
2. Disk manager - views database as physical
file - a collection of sectors on the
disk. - determine the physical disk address of a
required block. - retrieve rows from disk and
place them in buffer by buffer manager. 3. Buffer
manager - maintain as many blocks as possible in
buffer memory to minimize no. of disk
accesses. - manage the allocation of space in the
memory for storing the blocks , e.g., by least
recently used (LRU) scheme - replace blocks which
is least recently referenced by a user
process. - schema blocks and index blocks in the
buffer are not replaced (due to frequent
access). - support transaction uncommitted data
- are not allowed to write back to disk
(pinned). committed data - are written back to
disk (forced output). rolled back data - are
removed from buffer without writing back to
disk. - support concurrency - lock active
transaction in buffer.
7
Storage Structures
Fixed length storage records in a block -
simple structure for management. - block header
includes the block number of the block. - block
footer contains the location of a particular row
within the block. - record prefix - control
information , includes the row identifier. - when
a record is being retrieved, record prefix is
stripped off before passing to DBMS. - when a
record is being stored, record prefix is added to
the data and placed into a block - when a row is
deleted, the record prefix contains a field which
is used to indicate that the storage record is
logically deleted. Variable length storage
records in a block - column values are stored
using minimum number of bytes required (e.g.
trailing blanks of a string is not
stored). - occupy less storage space. -
drawbacks not easy to reuse space formerly
occupied by a deleted record. require
additional field in the record prefix to indicate
the no. of bytes of row data.
8
Storage Structures
Row Identifier (RID) - for locating a row of
stored records - contains a block number
and an offset from end of block (within the
block footer). It locates a field containing an
offset from the beginning of the block that
locates the storage record within that
block. - RIDs need not to be changed even if
records are moved up and down within the block,
only the offset values at the block footer have
to change.
9
Storage Structures
Pointer Chains - Store two or more tables in a
database file. - At the record prefix, store the
RID of the next storage record, forming a pointer
chain. - The location of the first storage record
in each chain needs to be held in a known
position, typically the RIDs of the first storage
record of each table will be held in contents
block (usually the first block of the database
file).
10
Storage Structures
Clustering - store records that are logically
related (or frequently used together) together
within the same or adjacent block - to minimize
the disk accesses. - clustering scheme can vary
at different time, according to the frequency of
different type of data request.
11
Storage Structures
Indexing E.g. SELECT name FROM student WHERE
student_id s05 - If whole student table is
loaded into memory for searching, it will involve
many disk accesses. - To minimize the disk
access, build an index file containing a mapping
between student_ids and RIDs. - Search this
index file (suppose small and in memory) to find
out the RID of student_id s05. Then use just
a single disk access to retrieve the data of
student_id s05 according to the searched RID.
- Indexing overhead - when new rows are
inserted, a new entry will also have to be added
to the index.
12
Storage Structures
  • Indexing facilitates
  • Direct access to individual rows of a table (just
    like the previous example)
  • Sequential access to all, or some, rows of a
    table where the sequence is that defined by the
    ordering of the values in the index file.
  • Multi-level (B tree) index
  • - for more efficient index searching

13
Storage Structures
  • Hashing
  • By computation to determine the location of a
    desired row from the column value
  • To retrieve a row of a given column value, the
    DBMS performs the computation and instructs the
    file manager to fetch the corresponding record at
    the computed position.
  • Two hashing schemes
  • 1. Suppose 13 rows of data per block. Divide the
    column value by 13 to obtain the location
    information. E.g student_id s38
  • 38 / 13 2 remainder 12,
  • then retrieve s38 data from block number 2 1
    3, and offset from end of block is 12
  • data may not uniformly distribute and waste
    space
  • 2. Suppose use 3 blocks to store 30 students.
    Divide the column value by 3 to obtain the block
    number. Offset value is determine by file
    manager.
  • 38/3 12 remainder 2, block number is 2 1
    3 and offset value is assigned by file
    manager, say the next available slot.
  • less space. But space indicated by computed
    blocked number might be fully occupied, need
    extra algorithm to handle such situation

14
Comparison of Indexing and Hashing
15
Distributed Data
Database divided into a number of regional
databases
user process b
user process a
Region 2
Region 1
Region 3
Some form of communication network must be
provided to join the local databases and allow
access from both local and distributed users
Region 2 students
Region 1 students
Region 3 students
Region 2 staff
Region 1 staff
Region 3 staff
Local users - users who require access to only
the data stored at a specific location. Distribute
d users - users who require access to data stored
at multiple locations.
16
Distributed Data
  • Three approaches for distribution of processing
    and data
  • 1. Client multi-server requires explicit
    navigation by the user process between multiple
    database servers that are independent of each
    other.
  • 2. Distributed database hide the location of the
    underlying data and how it is distributed so that
    it appears as a single database from users point
    of view.
  • 3. Replication systems makes use of data
    replication, distributing the copies to allow
    local processing of data within each local
    database system.

17
Client Multi-server Systems
user process
client
server
DBMS
DBMS
database
schema
database
schema
Region 1 Student Database
Region 2 Student Database
  • Need to ensure integrity of data among different
    database. E.g. need to keep student_id in Region
    1 and Region 2 databases above be unique -
    possibly assigning a particular range of
    student_id for each region.
  • User needs to know which database he is dealing
    with -gt Connection management
  • Establish connection CONNECT
  • Make current connection for a dormant connection
    SET CONNECTION
  • Close down communication DISCONNECT

18
Client Multi-server Systems
  • Transaction Management - two phase commit
  • E.g. System A performs a transaction including an
    INSERT on System B and a DELETE on System C

insert
DELETE GET READY COMMIT
INSERT DELETE COMMIT first phase second
phase done
INSERT GET READY COMMIT
delete
get ready
get ready
request system B to prepare to commit/rollback
transaction
time out
commit the local transaction
yes
yes
commit
commit
When both system B and C all reply yes
final commit of transaction
OK
OK
System C
System A
System B
19
Distributed Databases
network
  • Local users can interface with DBMS. Distributed
    users must interface with DDBMS.
  • Location independence Users do not have to know
    anything about where data is stored nor how it is
    processed to meet their requirement.

20
Distributed Databases
  • Global schema schema describing all the data for
    a distributed user (like logical schema)
  • Distributed schema schema defining where data
    is located, i.e. in which local database (like
    storage schema).
  • Location of global and distribution schema

21
Distributed Databases
  • Data fragmentation
  • Horizontal fragmentation a subset of complete
    rows of a table is stored in one location.
  • Vertical fragmentation some columns of a table
    are stored in one location and other columns in
    another.
  • A combination of horizontal and vertical
    fragmentation.
  • Processing queries in distributed databases
  • Subsidiary query - retrieve some part of the
    required data from a given location.
  • Combine intermediate tables resulting from
    subsidiary queries from different locations to
    obtain final results. Combining tables involves
    movement of data among locations, performed by
  • UNION, for horizontally fragmented database
  • JOIN, for vertically fragmented database

22
Replication Databases
  • Replication is a way of providing copies of data
    to other database servers so that each user
    process can perform all processing locally.
  • minimizing the use of remote connections.
  • reducing the dangers of remote site failure or
    connection failure.
  • Advantages
  • shorter response time.
  • higher availability.
  • presence of data snapshots can facilitate for
    auditing purpose or trend analysis.

23
Replication Databases
network
  • Replication server
  • to receive or collect updates from its local
    database and pass copies on to other sites that
    require them
  • to receive or collect updates from remote sites,
    via their replications servers and apply them to
    the local database.
  • Ways for transfer replicates
  • Sender pushes replicates to receiver, whenever
    there is data change.
  • Receiver requests, from time to time, replicates
    from sender (pull method).

24
Replication Databases
  • Ways of replication
  • Consolidation
  • collection of data fragments from a number of
    locations, into one location to construct a
    global view.
  • Dissemination
  • one database and copies data fragments for
    distribution of different locations.
  • A primary source provide copies of data to
    different locations, the data copies are often
    the subset of primary copy.

primary
primary B
primary A
copy A copy B
copy A
copy B
25
Data Warehouse
  • Data warehouse
  • extracting data from one or more databases to
    build up a large historical database, enabling
    users to look for trends or patterns within the
    data stored by an organization.
  • Separate from operational system (e.g. order
    processing system) which usually holds data only
    necessary for supporting day-to-day operations.
    Data values held in operational system is
    constantly changing and older data is archived or
    even deleted. Data warehouses are simplified
    structures which have not necessarily been
    normalized in order to ease for making queries.
  • Characteristics of data warehouse
  • Subject oriented - data is organized around
    subjects (such as sales) rather than operation
    applications (such as order processing).
  • Non-volatile - data, once placed in the
    warehouse, is usually not subject to change.
  • Integrated - data is consistent. Data comes from
    various systems of different technologies. Their
    format and meaning should be consistent when
    putting into warehouse.
  • Time variant - historical data is recorded.

26
Data Warehouse
  • Dimensions When performing data analysis in
    data warehouse, the decision maker would make
    analysis on different dimensions, say sales by
    product, sales by member (product and member are
    then the dimensions). Time is always one of the
    dimensions in a data warehouse.
  • Dimensional analysis is to identify what is the
    subject (e.g. sales) for the data warehouse, and
    what are the dimensions in the data warehouse,
    which finally often depicted in star schema.
  • Facts of the subject - they are the attributes
    that can be analyzed across the dimensions, e.g.
    cost, selling price (for the subject sales).
    Facts must can be totaled over a given time
    period.

subject
dimension 1
time
dimension 2
27
Data Warehouse Model
Extraction component - copies data from the
existing data sources to allow them to put into
the warehouse database. Integration component -
ensures the consistency of data from different
data sources, which includes format integration
- logically similar data types (e.g. dates of
different formats ) are converted so that they
have the same physical data format. Presence of
a set of integration rules as a kind of map which
specifies how information has to be converted
before it is allowed into the data
warehouse. semantic integration - to make
consistent for the meaning of information. Each
kind of data should be understood by everyone -gt
requires metadata to describe about the data.
28
Data Warehouse Model
Data warehouse component - stores and manages
data. - in star schema. - the fact table (the
centre of the star) usually has a large number of
rows and a small number columns. - fact columns
should be summable which facilitates the
production of summary tables for management
decision. - the dimension tables (the points of
the star) are joined to the fact table through
foreign keys. Aggregates - in order to avoid
full scan of fact tables to process users
queries every time, some aggregated fact tables
(e.g. summing data by day or by month) are
prepared in the DBMS so that users can access
those if they happen to satisfy the requirements
of the query. Aggregate navigation component -
receives user query, analyzes which level of
aggregate tables can be applied, and if
applicable, directs the query automatically to
the aggregate tables for execution. - for
improving the query execution time. Presentation
tools - tools presenting data to users. - many
client tools available for presenting forms,
charts etc for users.
29
THE END
Write a Comment
User Comments (0)
About PowerShow.com