Relational Database M 358 Block 5

About This Presentation

Title:

Relational Database M 358 Block 5

Description:

Disk platters - One of more disk platters rotate continually about the central ... A disk platter would consists of two disk surfaces. ... – PowerPoint PPT presentation

Number of Views:31

Avg rating:3.0/5.0

Slides: 30

Provided by: Ange325

Category:

more less

Transcript and Presenter's Notes

Title: Relational Database M 358 Block 5

1
Relational Database (M 358)Block 5

Tang Kwok Wa, Billy

2
Storage Device
Disk Architecture Disk platters - One of more
disk platters rotate continually about the
central axis at constant speed, each is accessed
by a read/write head. A disk platter would
consists of two disk surfaces. A track - Data is
stored along a number of tracks around the disk A
sector - A contiguous sequence of bytes within a
single track of one surface of a disk platter. A
sector of data is the minimum amount of data
transferred between the disk and the memory A
cylinder - At any moment, heads are positioned on
the equivalent track on each of the disk
surfaces. Each cylinder consists of all these
equivalent tracks Example of calculation for
disk capacity 2000 cylinder 10 disk surfaces 8
sectors per track 4K bytes per sector Disk
capacity 4K x 8 x 2000 x 10 640000K bytes
3
Storage Device

Accessing Data Stored on a Disk
Time to access (read or write) a sector of data
on a disk
Seek time - Time for the access mechanism to move
the read/write heads from current position to the
cylinder where the required sector resides -
increases as the distance the arms must move
increases.
Rotational delay - Time it takes waiting for the
start of the required sector to come round and
position itself at the read/ write head.
Data transfer time - is the time taken to
transfer the contents of a sector on the disk to
the memory
If two rows of data reside within the same disk
sector, a single disk access is sufficient to
transfer data to memory.
If two rows reside in different cylinders, then
two disk accesses would be required.

4
Storage Model
Storage model - the software components that form
the interface between the DBMS and the operating
system. Two approaches for internal file
management 1. DBMS use the file organization and
access methods provided by the underlying
operating system for file management. 2. DBMS
itself manages the internal file organization -
adopted by most large scale DBMS. Necessary
software components for the second approach 1.
File manager - view database as a logical file
- a collection of blocks in memory, where each
block corresponds to a particular sector on
disk. - determine which block (disk sector)
contains the desired rows. - instruct buffer
manager to retrieve rows from buffer if they
exist in buffer. - instruct disk manager to
retrieve rows from disk if they only exist in
disk.
5
Storage Model
6
Storage Model
2. Disk manager - views database as physical
file - a collection of sectors on the
disk. - determine the physical disk address of a
required block. - retrieve rows from disk and
place them in buffer by buffer manager. 3. Buffer
manager - maintain as many blocks as possible in
buffer memory to minimize no. of disk
accesses. - manage the allocation of space in the
memory for storing the blocks , e.g., by least
recently used (LRU) scheme - replace blocks which
is least recently referenced by a user
process. - schema blocks and index blocks in the
buffer are not replaced (due to frequent
access). - support transaction uncommitted data
- are not allowed to write back to disk
(pinned). committed data - are written back to
disk (forced output). rolled back data - are
removed from buffer without writing back to
disk. - support concurrency - lock active
transaction in buffer.
7
Storage Structures
Fixed length storage records in a block -
simple structure for management. - block header
includes the block number of the block. - block
footer contains the location of a particular row
within the block. - record prefix - control
information , includes the row identifier. - when
a record is being retrieved, record prefix is
stripped off before passing to DBMS. - when a
record is being stored, record prefix is added to
the data and placed into a block - when a row is
deleted, the record prefix contains a field which
is used to indicate that the storage record is
logically deleted. Variable length storage
records in a block - column values are stored
using minimum number of bytes required (e.g.
trailing blanks of a string is not
stored). - occupy less storage space. -
drawbacks not easy to reuse space formerly
occupied by a deleted record. require
additional field in the record prefix to indicate
the no. of bytes of row data.
8
Storage Structures
Row Identifier (RID) - for locating a row of
stored records - contains a block number
and an offset from end of block (within the
block footer). It locates a field containing an
offset from the beginning of the block that
locates the storage record within that
block. - RIDs need not to be changed even if
records are moved up and down within the block,
only the offset values at the block footer have
to change.
9
Storage Structures
Pointer Chains - Store two or more tables in a
database file. - At the record prefix, store the
RID of the next storage record, forming a pointer
chain. - The location of the first storage record
in each chain needs to be held in a known
position, typically the RIDs of the first storage
record of each table will be held in contents
block (usually the first block of the database
file).
10
Storage Structures
Clustering - store records that are logically
related (or frequently used together) together
within the same or adjacent block - to minimize
the disk accesses. - clustering scheme can vary
at different time, according to the frequency of
different type of data request.
11
Storage Structures
Indexing E.g. SELECT name FROM student WHERE
student_id s05 - If whole student table is
loaded into memory for searching, it will involve
many disk accesses. - To minimize the disk
access, build an index file containing a mapping
between student_ids and RIDs. - Search this
index file (suppose small and in memory) to find
out the RID of student_id s05. Then use just
a single disk access to retrieve the data of
student_id s05 according to the searched RID.
- Indexing overhead - when new rows are
inserted, a new entry will also have to be added
to the index.
12
Storage Structures

Indexing facilitates
Direct access to individual rows of a table (just
like the previous example)
Sequential access to all, or some, rows of a
table where the sequence is that defined by the
ordering of the values in the index file.
Multi-level (B tree) index
- for more efficient index searching

13
Storage Structures

Hashing
By computation to determine the location of a
desired row from the column value
To retrieve a row of a given column value, the
DBMS performs the computation and instructs the
file manager to fetch the corresponding record at
the computed position.
Two hashing schemes
1. Suppose 13 rows of data per block. Divide the
column value by 13 to obtain the location
information. E.g student_id s38
38 / 13 2 remainder 12,
then retrieve s38 data from block number 2 1
3, and offset from end of block is 12
data may not uniformly distribute and waste
space
2. Suppose use 3 blocks to store 30 students.
Divide the column value by 3 to obtain the block
number. Offset value is determine by file
manager.
38/3 12 remainder 2, block number is 2 1
3 and offset value is assigned by file
manager, say the next available slot.
less space. But space indicated by computed
blocked number might be fully occupied, need
extra algorithm to handle such situation

14
Comparison of Indexing and Hashing
15
Distributed Data
Database divided into a number of regional
databases
user process b
user process a
Region 2
Region 1
Region 3
Some form of communication network must be
provided to join the local databases and allow
access from both local and distributed users
Region 2 students
Region 1 students
Region 3 students
Region 2 staff
Region 1 staff
Region 3 staff
Local users - users who require access to only
the data stored at a specific location. Distribute
d users - users who require access to data stored
at multiple locations.
16
Distributed Data

Three approaches for distribution of processing
and data
1. Client multi-server requires explicit
navigation by the user process between multiple
database servers that are independent of each
other.
2. Distributed database hide the location of the
underlying data and how it is distributed so that
it appears as a single database from users point
of view.
3. Replication systems makes use of data
replication, distributing the copies to allow
local processing of data within each local
database system.

17
Client Multi-server Systems
user process
client
server
DBMS
DBMS
database
schema
database
schema
Region 1 Student Database
Region 2 Student Database

Need to ensure integrity of data among different
database. E.g. need to keep student_id in Region
1 and Region 2 databases above be unique -
possibly assigning a particular range of
student_id for each region.
User needs to know which database he is dealing
with -gt Connection management
Establish connection CONNECT
Make current connection for a dormant connection
SET CONNECTION
Close down communication DISCONNECT

18
Client Multi-server Systems

Transaction Management - two phase commit
E.g. System A performs a transaction including an
INSERT on System B and a DELETE on System C

insert
DELETE GET READY COMMIT
INSERT DELETE COMMIT first phase second
phase done
INSERT GET READY COMMIT
delete
get ready
get ready
request system B to prepare to commit/rollback
transaction
time out
commit the local transaction
yes
yes
commit
commit
When both system B and C all reply yes
final commit of transaction
OK
OK
System C
System A
System B
19
Distributed Databases
network

Local users can interface with DBMS. Distributed
users must interface with DDBMS.
Location independence Users do not have to know
anything about where data is stored nor how it is
processed to meet their requirement.

20
Distributed Databases

Global schema schema describing all the data for
a distributed user (like logical schema)
Distributed schema schema defining where data
is located, i.e. in which local database (like
storage schema).
Location of global and distribution schema

21
Distributed Databases

Data fragmentation
Horizontal fragmentation a subset of complete
rows of a table is stored in one location.
Vertical fragmentation some columns of a table
are stored in one location and other columns in
another.
A combination of horizontal and vertical
fragmentation.
Processing queries in distributed databases
Subsidiary query - retrieve some part of the
required data from a given location.
Combine intermediate tables resulting from
subsidiary queries from different locations to
obtain final results. Combining tables involves
movement of data among locations, performed by
UNION, for horizontally fragmented database
JOIN, for vertically fragmented database

22
Replication Databases

Replication is a way of providing copies of data
to other database servers so that each user
process can perform all processing locally.
minimizing the use of remote connections.
reducing the dangers of remote site failure or
connection failure.
Advantages
shorter response time.
higher availability.
presence of data snapshots can facilitate for
auditing purpose or trend analysis.

23
Replication Databases
network

Replication server
to receive or collect updates from its local
database and pass copies on to other sites that
require them
to receive or collect updates from remote sites,
via their replications servers and apply them to
the local database.
Ways for transfer replicates
Sender pushes replicates to receiver, whenever
there is data change.
Receiver requests, from time to time, replicates
from sender (pull method).

24
Replication Databases

Ways of replication
Consolidation
collection of data fragments from a number of
locations, into one location to construct a
global view.
Dissemination
one database and copies data fragments for
distribution of different locations.
A primary source provide copies of data to
different locations, the data copies are often
the subset of primary copy.

primary
primary B
primary A
copy A copy B
copy A
copy B
25
Data Warehouse

Data warehouse
extracting data from one or more databases to
build up a large historical database, enabling
users to look for trends or patterns within the
data stored by an organization.
Separate from operational system (e.g. order
processing system) which usually holds data only
necessary for supporting day-to-day operations.
Data values held in operational system is
constantly changing and older data is archived or
even deleted. Data warehouses are simplified
structures which have not necessarily been
normalized in order to ease for making queries.
Characteristics of data warehouse
Subject oriented - data is organized around
subjects (such as sales) rather than operation
applications (such as order processing).
Non-volatile - data, once placed in the
warehouse, is usually not subject to change.
Integrated - data is consistent. Data comes from
various systems of different technologies. Their
format and meaning should be consistent when
putting into warehouse.
Time variant - historical data is recorded.

26
Data Warehouse

Dimensions When performing data analysis in
data warehouse, the decision maker would make
analysis on different dimensions, say sales by
product, sales by member (product and member are
then the dimensions). Time is always one of the
dimensions in a data warehouse.
Dimensional analysis is to identify what is the
subject (e.g. sales) for the data warehouse, and
what are the dimensions in the data warehouse,
which finally often depicted in star schema.
Facts of the subject - they are the attributes
that can be analyzed across the dimensions, e.g.
cost, selling price (for the subject sales).
Facts must can be totaled over a given time
period.

subject
dimension 1
time
dimension 2
27
Data Warehouse Model
Extraction component - copies data from the
existing data sources to allow them to put into
the warehouse database. Integration component -
ensures the consistency of data from different
data sources, which includes format integration
- logically similar data types (e.g. dates of
different formats ) are converted so that they
have the same physical data format. Presence of
a set of integration rules as a kind of map which
specifies how information has to be converted
before it is allowed into the data
warehouse. semantic integration - to make
consistent for the meaning of information. Each
kind of data should be understood by everyone -gt
requires metadata to describe about the data.
28
Data Warehouse Model
Data warehouse component - stores and manages
data. - in star schema. - the fact table (the
centre of the star) usually has a large number of
rows and a small number columns. - fact columns
should be summable which facilitates the
production of summary tables for management
decision. - the dimension tables (the points of
the star) are joined to the fact table through
foreign keys. Aggregates - in order to avoid
full scan of fact tables to process users
queries every time, some aggregated fact tables
(e.g. summing data by day or by month) are
prepared in the DBMS so that users can access
those if they happen to satisfy the requirements
of the query. Aggregate navigation component -
receives user query, analyzes which level of
aggregate tables can be applied, and if
applicable, directs the query automatically to
the aggregate tables for execution. - for
improving the query execution time. Presentation
tools - tools presenting data to users. - many
client tools available for presenting forms,
charts etc for users.
29
THE END

Write a Comment

User Comments (0)