Database management systems - PowerPoint PPT Presentation

1 / 38
About This Presentation
Title:

Database management systems

Description:

Sandra. Lee. 009. HR. James. Johnson. 004. IT. Bill. Williams. 010. Accounting ... Division-remainder uses mod: 0110 mod 100. 10 is address of the disk page ... – PowerPoint PPT presentation

Number of Views:34
Avg rating:3.0/5.0
Slides: 39
Provided by: thebutl
Category:

less

Transcript and Presenter's Notes

Title: Database management systems


1
Database management systems
2
Physical Storage
  • Managed by DBMS, OS or DBMS/OS
  • Includes
  • Primary Storage fast and expensive
  • Secondary Storage cheap and slow
  • The combination so must be optimized
  • DBMS makes request to buffer manager need a
    block is needed from disk. If block is already
    in buffer, buffer manager passes address to
    requestor. If the block isnt in buffer, space
    is allocated (possibly removing some other
    block).

3
File operations
  • Retrieval or update
  • DBMS decomposes selects to basic operation, then
  • Open allocates buffers, retrieves file header,
    sets pointer
  • Find Searches for first true, transfers block
    to buffer, sets file pointer
  • Read Copies record from buffer to program
    variable
  • FindNext Searches for next true, transfers to
    buffer
  • Close releases buffers, closes file

4
DBMS Storage Hierarchy
  • Page
  • aka, block, data block, blocking unit, control
    interval, row group
  • a storage location for rows of data, typically
  • the same size across the storage medium
  • multiples of 1024 bytes
  • contain an integral number of rows
  • rows in a page preferably come from the same
    table
  • used for
  • a minimal unit for disk i/o (but can read
    multiple pages)
  • locking
  • caching to a buffer pool
  • housekeeping (includes header information)
  • Extent
  • a group of contiguous, stored pages (not for I/O
    read groups are for that)
  • provides for allocation performance improvement
  • File
  • a group of contiguous extents
  • Partition
  • also a group of contiguous extent, but may be
    part of a file or multiple files

5
Record blocking
  • Fixed length
  • x Records/Block (per floor function)
  • Unused space b (bfr R) bytes
  • Variable length
  • Spanning provides pointer at end of first block
    to surface, track, block location of remainder of
    record
  • b r/bfr blocks (per ceiling function)
  • Slotted-page structure
  • Block Header Records

6
Records in a block
  • How to store records in blocks?
  • number of records r
  • block size B
  • record size R
  • blocking factor bf number of records in a
    block
  • Bf ?B/R? (spanned, unspanned)
  • number of blocks needed b
  • b ?r/bf?

7
File header
  • Contains description used for access of records
    in the file
  • Disk addresses of blocks
  • Record formats
  • Field lengths
  • Field order (fixed length)
  • Field type
  • Separators
  • Codes (variable length)

8
Physical Storage
  • Goal minimize block transfers
  • Heap file
  • Ordered file
  • Hashed file
  • Indexes
  • Hashed
  • Trees
  • Bitmaps

9
  • Physical Record Storage (Heap File)

Block 1
Block 2
10
Heap Files
  • Unordered
  • Easy writes
  • New records inserted at end of file
  • Once a block is full, pointer set to new block
    and rest of record is written
  • Requires a linear search for anything
  • Large number/size of records decrease speed
  • Deletes require write to buffer, mark record as
    deleted, write back to disk with deleted space
    left in place.
  • Modifications force periodic file reorganization
    to recover disk space
  • Fixed-length fields contiguously allocated in
    unspanned blocks improves search latency (i.e.,
    i/bfr denotes block, i mod bfr denotes location
    in block)

11
Ordered Files
  • One field (attribute) selected for ordering
  • If a key field, data is key-sequenced
  • Allows binary searches for faster retrieval
    (always retrieves mid-page between upper and
    lower limits until correct page is found), since
    log2(B) blocks accessed
  • Inserts and deletes require ordering to be
    maintained (may require writing all pages above
    affected record)
  • Overflow (transaction) file will help to reduce
    this problem
  • Typically only used if a primary index is applied
  • No gain for non-ordered fields
  • Typically requires indexed file access path

12
Indexes
  • Ordered indices values in sorted fashion
  • Hash indices values distributed across
    buckets by using a function
  • An index record consists of a value and pointers
    to one or more records with that value. Can be
  • Dense every value group indexed
  • Sparse only some values are indexed
  • Include
  • Compound indexes (values from more than one data
    column)
  • Covering index (uses values in the index for the
    SELECT clause)
  • Unique index
  • Clustering indexes (stores similar data rows near
    each other)
  • Bitmap indexes (assigns 1 if a value is true, 0
    if false)

13
B-Trees
  • Well established as the most common structures
    for indexes
  • Multi-level
  • d is the order of the tree it is a measure of
    the tree node capacity
  • Every node except the root contains m entries,
    where d/2 lt m lt d
  • The root node contains 1 lt m lt d entries
  • Non-leaf nodes with m index entries contain m1
    pointers to children
  • Pointer Pi points to a subtree with K values such
    that Ki-1 lt K lt Ki

14
B-Trees (order 2)
  • Query find all values with a pointer value of P
  • If search value is lt SearchKey value, go left
    otherwise, go right

15
B Tree index/sequence sets
16
B Trees datapage leaf
Sort order Anizy,Apach,Apensen,Ardwick,Arnham,Ath
ens
17
B Tree Example
  • Rules for this example
  • d is the order of the tree it is a measure of
    the capacity of child nodes
  • Every node except the root contains m entries,
    where d/2 lt m lt d
  • The root node contains 1 lt m lt d entries
  • Non-leaf nodes with m index entries contain
    between (m1)/2 and m1 pointers to children
  • Pointer Pi points to a subtree with K values such
    that Ki-1 lt K lt Ki
  • Search uses pointer to the right for greater than
    or equal to in non-leaf nodes, greater than in
    leaf nodes until equal to is found or not found

18
Beginning tree(order of 4)
d 4, so root contains between 1 and 4 values
and 2 and 5 node pointers, non-leaf nodes have
between 2 and 4 values and 3 and 5 node pointers
and leaf nodes have between 2 and 4 entries
gt look right algorithm
19
Insert 28
20
Insert 70
21
Insert 95
22
Delete 70
23
Delete 25
24
Delete 60
25
B-trees performance impact
  • A 4k page can many records per page
  • ((4 b/pointer 4b/field)n, 4b/pointer) order
    of 512
  • Root 511 records
  • Level 1 261,632 records
  • Level 2 133,955,584 records
  • Total 134,217,727 records
  • Shallow is better

26
B-trees performance impact
  • 1000000 records of 300B (including header)
  • Search key is a 4 byte int a pointer requires 4
    bytes
  • 4KB blocks, no block header, random placement,
    avg. retrieval time 5.6 ms
  • No time for memory reads
  • 13.6 records/block 76924 blocks to store data
  • 512 indexes/block 1954 blocks to store index
  • No index
  • (76924/2) 38462 block accesses (avg.)
  • Time to find a record 38462 5.6 ms 215.4 s
  • Indexed, binary search
  • log(1954) 1 11 1 12 block accesses
    (maximum)time to find a record 12 5.6 ms
    67.2 ms
  • Indexing increased speed by 3205 times.

27
Hash Files
  • Records written in non-sequential order
  • Hash function calculates address of the page
    where record is stored
  • based on a one or more base fields (hash field)
  • If a key field, called hash key
  • Hash function creates even spread of records
    across file
  • Folding applies math to different parts of the
    has field (empID 0110 could become (01)10. 11
    is address of disk page
  • Division-remainder uses mod 0110 mod 100. 10 is
    address of the disk page

28
Database management systems
  • Database Planning

29
Database management
  • The database application lifecycle
  • Planning and defining
  • Requirements collection and analysis
  • Design
  • Prototyping
  • Implementation
  • Loading
  • Testing
  • Maintenance

30
Database management
  • Planning and defining
  • Create a mapping between business goals and
    information needs
  • Mission statement (Strategy)
  • Mission objectives (Tactics)
  • Describe the scope and boundaries of uses for the
    projected future, particularly in regard to
  • What data are needed
  • How the data are used
  • Describe the current IT infrastructure
  • Compare strengths and weakness of existing
    systems with regard to those goals
  • Compare strengths and weakness of existing
    systems to commercially available systems

31
Database management
  • Requirements collection and analysis for each
    major user view
  • Requirements (functional) specifications
  • A description of the data used
  • Details of the process
  • Details of constraints on the process and data
  • Consolidate multiple user views
  • Centralized 1 set of specifications, 1 global
    data model
  • Integrated multiple sets of specifications and
    data models
  • Mixed a combination of centralized and
    integrated

32
Database management
  • Design (Data modeling)
  • Top-down
  • from conceptual entities and relationships to
    logical relations to data
  • Entity-Relationship (ER) concepts are applied
  • Bottom-up
  • from data to attributes to logical relations to
    relationships
  • Functional dependencies and Normalization
    concepts are applied
  • Inside-out
  • Focus on the most important entities (top-down),
    implement, and grow
  • Mixed
  • A combination of both top-down and bottom-up

33
Database management
  • Data modeling
  • Easy to comprehend
  • Provides a mechanism for agreement
  • Assists in delegation of authority and
    responsibility
  • Remains independent of physical implementation

34
Database management
  • Data modeling criteria
  • Structural validity consistent with the
    business operation
  • Simplicity for both technical and operational
    personnel
  • Expressibility distinguishes between different
    data, relationships and constraints
  • Nonredundancy no extraneous or repetitive
    information
  • Shareability not limited by technology
  • Extensibility evolves easily to meet new needs
  • Integrity ensures consistent use of information
  • Diagrammatic representation visually
    understandable

35
Database management
  • Design phases
  • Conceptual
  • Close interaction between IT and Operations staff
  • No physical considerations
  • Identifies entities, relationships, attributes,
    cardinality ratios
  • Logical
  • Mapped from the conceptual model
  • Typically a specified (e.g., relational) data
    model
  • Considers the needs of an unspecified DBMS using
    that data model
  • Utilizes the normalization process
  • Physical
  • Mapped from the logical model
  • Considers the needs of a specified DBMS
  • Describes base relations and constraints, file
    organizations, indexes, secondary storage
    requirements, security requirements
  • The most volatile of the three phases

36
Database management
  • DBMS selection
  • Identify the top three
  • Rate functionality with weighted requirement
    specifications
  • Rank each product, noting limitations for any
    requirement specifications

37
Database management
  • Application design
  • Three main types of transactions
  • Retrieval
  • Update
  • mixed
  • For each transaction
  • Identify data used
  • Identify functional characteristics
  • Describe output
  • Calibrate rate of use
  • Indicate significance to users

38
Database management
  • Fact-finding techniques
  • Documentation
  • Forms
  • Standards
  • Procedures
  • Interviews
  • Group
  • Individual
  • Observation
  • Time-motion studies
  • Process-control analyses
  • Research
  • Internal
  • external
  • Questionnaires
  • Closed
  • Open-ended
Write a Comment
User Comments (0)
About PowerShow.com