Database management systems - PowerPoint PPT Presentation

1 / 38

About This Presentation

Title:

Database management systems

Description:

Sandra. Lee. 009. HR. James. Johnson. 004. IT. Bill. Williams. 010. Accounting ... Division-remainder uses mod: 0110 mod 100. 10 is address of the disk page ... – PowerPoint PPT presentation

Number of Views:34

Avg rating:3.0/5.0

Slides: 39

Provided by: thebutl

Category:

more less

Transcript and Presenter's Notes

Title: Database management systems

1
Database management systems
2
Physical Storage

Managed by DBMS, OS or DBMS/OS
Includes
Primary Storage fast and expensive
Secondary Storage cheap and slow
The combination so must be optimized
DBMS makes request to buffer manager need a
block is needed from disk. If block is already
in buffer, buffer manager passes address to
requestor. If the block isnt in buffer, space
is allocated (possibly removing some other
block).

3
File operations

Retrieval or update
DBMS decomposes selects to basic operation, then
Open allocates buffers, retrieves file header,
sets pointer
Find Searches for first true, transfers block
to buffer, sets file pointer
Read Copies record from buffer to program
variable
FindNext Searches for next true, transfers to
buffer
Close releases buffers, closes file

4
DBMS Storage Hierarchy

Page
aka, block, data block, blocking unit, control
interval, row group
a storage location for rows of data, typically
the same size across the storage medium
multiples of 1024 bytes
contain an integral number of rows
rows in a page preferably come from the same
table
used for
a minimal unit for disk i/o (but can read
multiple pages)
locking
caching to a buffer pool
housekeeping (includes header information)
Extent
a group of contiguous, stored pages (not for I/O
read groups are for that)
provides for allocation performance improvement
File
a group of contiguous extents
Partition
also a group of contiguous extent, but may be
part of a file or multiple files

5
Record blocking

Fixed length
x Records/Block (per floor function)
Unused space b (bfr R) bytes
Variable length
Spanning provides pointer at end of first block
to surface, track, block location of remainder of
record
b r/bfr blocks (per ceiling function)
Slotted-page structure
Block Header Records

6
Records in a block

How to store records in blocks?
number of records r
block size B
record size R
blocking factor bf number of records in a
block
Bf ?B/R? (spanned, unspanned)
number of blocks needed b
b ?r/bf?

7
File header

Contains description used for access of records
in the file
Disk addresses of blocks
Record formats
Field lengths
Field order (fixed length)
Field type
Separators
Codes (variable length)

8
Physical Storage

Goal minimize block transfers
Heap file
Ordered file
Hashed file
Indexes
Hashed
Trees
Bitmaps

Physical Record Storage (Heap File)

Block 1
Block 2
10
Heap Files

Unordered
Easy writes
New records inserted at end of file
Once a block is full, pointer set to new block
and rest of record is written
Requires a linear search for anything
Large number/size of records decrease speed
Deletes require write to buffer, mark record as
deleted, write back to disk with deleted space
left in place.
Modifications force periodic file reorganization
to recover disk space
Fixed-length fields contiguously allocated in
unspanned blocks improves search latency (i.e.,
i/bfr denotes block, i mod bfr denotes location
in block)

11
Ordered Files

One field (attribute) selected for ordering
If a key field, data is key-sequenced
Allows binary searches for faster retrieval
(always retrieves mid-page between upper and
lower limits until correct page is found), since
log2(B) blocks accessed
Inserts and deletes require ordering to be
maintained (may require writing all pages above
affected record)
Overflow (transaction) file will help to reduce
this problem
Typically only used if a primary index is applied
No gain for non-ordered fields
Typically requires indexed file access path

12
Indexes

Ordered indices values in sorted fashion
Hash indices values distributed across
buckets by using a function
An index record consists of a value and pointers
to one or more records with that value. Can be
Dense every value group indexed
Sparse only some values are indexed
Include
Compound indexes (values from more than one data
column)
Covering index (uses values in the index for the
SELECT clause)
Unique index
Clustering indexes (stores similar data rows near
each other)
Bitmap indexes (assigns 1 if a value is true, 0
if false)

13
B-Trees

Well established as the most common structures
for indexes
Multi-level
d is the order of the tree it is a measure of
the tree node capacity
Every node except the root contains m entries,
where d/2 lt m lt d
The root node contains 1 lt m lt d entries
Non-leaf nodes with m index entries contain m1
pointers to children
Pointer Pi points to a subtree with K values such
that Ki-1 lt K lt Ki

14
B-Trees (order 2)

Query find all values with a pointer value of P
If search value is lt SearchKey value, go left
otherwise, go right

15
B Tree index/sequence sets
16
B Trees datapage leaf
Sort order Anizy,Apach,Apensen,Ardwick,Arnham,Ath
ens
17
B Tree Example

Rules for this example
d is the order of the tree it is a measure of
the capacity of child nodes
Every node except the root contains m entries,
where d/2 lt m lt d
The root node contains 1 lt m lt d entries
Non-leaf nodes with m index entries contain
between (m1)/2 and m1 pointers to children
Pointer Pi points to a subtree with K values such
that Ki-1 lt K lt Ki
Search uses pointer to the right for greater than
or equal to in non-leaf nodes, greater than in
leaf nodes until equal to is found or not found

18
Beginning tree(order of 4)
d 4, so root contains between 1 and 4 values
and 2 and 5 node pointers, non-leaf nodes have
between 2 and 4 values and 3 and 5 node pointers
and leaf nodes have between 2 and 4 entries
gt look right algorithm
19
Insert 28
20
Insert 70
21
Insert 95
22
Delete 70
23
Delete 25
24
Delete 60
25
B-trees performance impact

A 4k page can many records per page
((4 b/pointer 4b/field)n, 4b/pointer) order
of 512
Root 511 records
Level 1 261,632 records
Level 2 133,955,584 records
Total 134,217,727 records
Shallow is better

26
B-trees performance impact

1000000 records of 300B (including header)
Search key is a 4 byte int a pointer requires 4
bytes
4KB blocks, no block header, random placement,
avg. retrieval time 5.6 ms
No time for memory reads
13.6 records/block 76924 blocks to store data
512 indexes/block 1954 blocks to store index
No index
(76924/2) 38462 block accesses (avg.)
Time to find a record 38462 5.6 ms 215.4 s
Indexed, binary search
log(1954) 1 11 1 12 block accesses
(maximum)time to find a record 12 5.6 ms
67.2 ms
Indexing increased speed by 3205 times.

27
Hash Files

Records written in non-sequential order
Hash function calculates address of the page
where record is stored
based on a one or more base fields (hash field)
If a key field, called hash key
Hash function creates even spread of records
across file
Folding applies math to different parts of the
has field (empID 0110 could become (01)10. 11
is address of disk page
Division-remainder uses mod 0110 mod 100. 10 is
address of the disk page

28
Database management systems

Database Planning

29
Database management

The database application lifecycle
Planning and defining
Requirements collection and analysis
Design
Prototyping
Implementation
Loading
Testing
Maintenance

30
Database management

Planning and defining
Create a mapping between business goals and
information needs
Mission statement (Strategy)
Mission objectives (Tactics)
Describe the scope and boundaries of uses for the
projected future, particularly in regard to
What data are needed
How the data are used
Describe the current IT infrastructure
Compare strengths and weakness of existing
systems with regard to those goals
Compare strengths and weakness of existing
systems to commercially available systems

31
Database management

Requirements collection and analysis for each
major user view
Requirements (functional) specifications
A description of the data used
Details of the process
Details of constraints on the process and data
Consolidate multiple user views
Centralized 1 set of specifications, 1 global
data model
Integrated multiple sets of specifications and
data models
Mixed a combination of centralized and
integrated

32
Database management

Design (Data modeling)
Top-down
from conceptual entities and relationships to
logical relations to data
Entity-Relationship (ER) concepts are applied
Bottom-up
from data to attributes to logical relations to
relationships
Functional dependencies and Normalization
concepts are applied
Inside-out
Focus on the most important entities (top-down),
implement, and grow
Mixed
A combination of both top-down and bottom-up

33
Database management

Data modeling
Easy to comprehend
Provides a mechanism for agreement
Assists in delegation of authority and
responsibility
Remains independent of physical implementation

34
Database management

Data modeling criteria
Structural validity consistent with the
business operation
Simplicity for both technical and operational
personnel
Expressibility distinguishes between different
data, relationships and constraints
Nonredundancy no extraneous or repetitive
information
Shareability not limited by technology
Extensibility evolves easily to meet new needs
Integrity ensures consistent use of information
Diagrammatic representation visually
understandable

35
Database management

Design phases
Conceptual
Close interaction between IT and Operations staff
No physical considerations
Identifies entities, relationships, attributes,
cardinality ratios
Logical
Mapped from the conceptual model
Typically a specified (e.g., relational) data
model
Considers the needs of an unspecified DBMS using
that data model
Utilizes the normalization process
Physical
Mapped from the logical model
Considers the needs of a specified DBMS
Describes base relations and constraints, file
organizations, indexes, secondary storage
requirements, security requirements
The most volatile of the three phases