DATABASE - PowerPoint PPT Presentation

1 / 56
About This Presentation
Title:

DATABASE

Description:

DATABASE PHYSICAL DESIGN Chandra S. Amaravadi * – PowerPoint PPT presentation

Number of Views:66
Avg rating:3.0/5.0
Slides: 57
Provided by: wiu58
Learn more at: http://faculty.wiu.edu
Category:
Tags: database | hashing

less

Transcript and Presenter's Notes

Title: DATABASE


1
DATABASE PHYSICAL DESIGN
Chandra S. Amaravadi
2
INTRODUCTION
3
PHYSICAL DATABASE DESIGN
Physical database design is concerned with issues
revolving around data base implementation
  • Implementation design
  • Database storage, access location
  • File organization constraints

4
THE THREE FORMS OF DATA
External
Cust Name Address Balance
100 Gordon 110 Oak Street 400
200 Prasad 22 Birch place 2500
300 . .......
Conceptual/ Base table
Internal/ Hardware level
These three levels provide logical and physical
data independence
5
THE THREE TYPES OF MODELS
External
Create view Drop view
Views
Conceptual
Create table Alter table
Schemas
File Organizations
Create index drop index
Internal
Models
Facilities
6
DATABASE PHYSICAL DESIGN
Inputs?
7
COMPONENTS OF PHYSICAL DESIGN
1. Implementation design 2. Storage, access
distribution strategies 3. File organizations
4. Specifications for integrity constraints
(later)
8
Implementation design
Concerned with taking the results of
normalization and designing tables, attributes,
data types for implementation.
  • Decide on tables (de-normalization)
  • Decide on primary and cross reference keys (not
    discussed further)
  • Decide on attribute data types (not discussed
    further)
  • E.g. fixed vs variable length fields
  • integer vs double integer
  • Design reports and forms (not discussed further)

Field Name Data type Description Length Decimals
Prod Numeric Unique prod code 6 0
Descr Text Short prod description 25 0
Price Currency Product price 6 2
9
DECIDING ON TABLES
Denormalization is going back in the normal forms
to reduce schema overhead
Denormalization Example (for 11)
Parts(Part, PartName, ) Container (ContainerID,
fin, needed, Part)
Parts(Part, PartName, ContainerID, fin,
needed)
10
DECIDING ON TABLES..
Denormalization Example (for MN)
Ord
Ord_dt
Prod
Descr.
ORDERS
PRODUCTS
Are for
Qty
What tables does normalization result in?
11
DENORMALIZATION
Orders(ord, ord_dt, ..) Product(prod., descr,
..) Orders for prod (prod., ord, qty)
Orders(ord, ord_dt, ..) Product(prod., ord,
descr., qty..)
12
COMPONENTS OF PHYSICAL DESIGN..
1. Implementation design 2. Storage and access
strategies 3. Distribution strategies 4. File
organizations 5. Specifications for integrity
constraints (later)
13
STORAGE ACCESS STRATEGIES
ALSO CALLED VOLUME USAGE ANALYSIS
OBJECTIVES
  • Estimate storage requirements (Volume analysis)
  • Determine media to be used (not discussed)
  • Study how data is being acccessed (Usage
    analysis)
  • Use these to develop file organization (later)

Volume and Usage analysis is carried out with a
composite usage map.
14
COMPOSITE USAGE MAP
A composite usage map is simply an ER chart
(without attr), that shows the number of records,
and the frequency/pattern with which they are
accessed.
  • Used for volume usage analysis ? file org.
  • Superimposed on ER Chart
  • Attributes are not shown
  • Shows estimated number of records (volume)
  • Shows type of access (dotted lines )

15
VOLUME USAGE ANALYSIS
  • Equipment, Parts and PE tables
  • Equipment 100
  • Parts12,000
  • PE 10,000
  • 20 inquiries per hour to Equipment
  • 300 inquiries per hour on Parts table
  • 70 of these inquiries also need to know
    Equipment info.

Draw a composite usage map, estimate storage
requirements and develop a suitable file
organization
16
COMPOSITE USAGE MAP
20
EQUIPMENT
(100)
ARE FOR
PE
???
(10,000)
PARTS
(12,000)
????
17
FOR DISCUSSION
How can one estimate the size of a database?
18
ESTIMATING STORAGE REQMTS. FOR PARTS AND EQUIPMENT
7 10
12 2 1 1 EQUIPMENT
(Model, Descr, Mfr., Price, HP, WT)
1 10 12 2
PARTS(Part, Descr, Mfr, Price) 7
1 1 PE (Model, Part, Qty)
Equipment table 71012211 33
bytes/record Parts table ?? PE table ??
Total storage requirements ??
19
A MORE ELABORATE EXAMPLE
70
40
  • Parts are manufactured parts and purchased parts
  • Parts 1,000 Suppliers50 Quotations 2,500
  • Total of 200 parts inquiries
  • 60 direct inquiries to purchased parts
  • Of the purchased parts inquiries, 80 are also to
  • quotation
  • Of these 80, 70 are to supplier as well.
  • 75 direct queries to supplier
  • Of these 40 are for quotation
  • All of these are also for parts

20
ANOTHER EXAMPLE..
200
75
SUPPLIER
PART
70
60
(50)
(1000)
Is-a
80
140
40
70
40
MANU- FACTURED
PURCH- ASED
QUOTA- TION
(700)
(400)
(2500)
40
80
A COMPOSITE USAGE MAP
Note of records are in red the of accesses
are in blue
21
STORAGE REQUIREMENTS
PART TABLE
PART_NO (5) DESCRIPTION (15) LOCATION (10) QUAN
TITY (1) RECORD SIZE 31 FILE SIZE
31 1100 34,300 Bytes
QUOTATION TABLE
Estimated record size 150 Estimated
file size 1502500
375,000 Bytes
Note This is done similarly for other tables.
22
COMPONENTS OF PHYSICAL DESIGN..
1. Implementation design 2. Storage access
strategies 3. Distribution strategies 4. File
organizations 5. Specifications for integrity
constraints (later)
23
DISTRIBUTION STRATEGIES
Distribution strategies are concerned with where
the files are physically located.
1. Centralized
Replicated (not discussed)
2. Distributed
Partitioned
24
DISTRIBUTION STRATEGIES
Centralized -- All the data is stored in one
physical
location. Distributed -- The data is stored in
multiple physical
locations. Replicated -- The database is
duplicated in multiple
locations. Partitioned -- The database is divided
into fragments and each
fragment is stored in a different
location.
25
CENTRALIZED VS DISTRIBUTED
  • Which is bottleneck?
  • Which causes security problems?
  • Which method may be required for business
    reasons?
  • In which setup is data more accessible?
  • Which provides better performance?

26
CENTRALIZED STRATEGY
General Principle
Maximize local access, minimize remote access
S1
S2
100
600
S3
WHERE SHOULD WE LOCATE THE DATABASE? S1, S2 or S3
500
27
This slide is blank
28
DISTRIBUTED DATABASE
partitioning
EID Name City
2356 Armstrong LA
3286 Nickerson SF
3356 Forrester MPLS
MPLS
LA
SF
29
COMPONENTS OF PHYSICAL DESIGN..
1. Implementation design 2. Storage access
strategies 3. Distribution strategies 4. File
organizations 5. Specifications for integrity
constraints (later)
30
FILE ORGANIZATION
File 1



Rec. 1,2..
Tracks
Sectors
How records are arranged on secondary storage
or mapping between ____ and ______?
31
DATA ACCESS (FYI)
O/S
DBMS
Requests
Consults
FAT/NTFS
Directory tables
Generates instructions to IOP
Hard drive
Partition
IOP
RAM
Database storage
32
FILE ORGANIZATION
Selection Criteria
  • Retrieval time (disk access)
  • Access type (direct, sequential)
  • Storage space
  • Maintenance effort

33
OVERVIEW OF FILE ORGANIZATIONS
  • Sequential
  • Hashed
  • Indexed

ISAM
VSAM
34
OVERVIEW OF FILE ORGANIZATIONS..
Sequential -- Records are stored one after
another in pkey sequence. Hashed
-- Record address is determined by subjecting
pkey to hashing algorithm. Indexed -- Same as
sequential except that there is an index file
which places keys into a separate file for
ease of searching.
35
THE SEQUENTIAL ORGANIZATION
  • Records in Pkey sequence
  • Access only sequential
  • Insertions/Deletions in sequential order
  • Simple organization
  • good for batch updates

Part Descr.
100 Aux. motors
120 Scrapers
124 Rotors
..... ............
36
THE HASHING ORGANIZATION
A type of file organization where record
addresses are generated by subjecting primary
keys to a hashing routine, usually by dividing by
a prime
Hashing Algorithm
Pkey
Hash Address

Address of Starting Block
REM (Pkey)/(Prime)
3432
37
HASHING CONCEPTS
3432
Following are important concepts in hashing
Record address hash address physical addr
  • Hashing algorithm
  • Hash address
  • Buckets Bucket size
  • Slots
  • Collisions/overflows
  • Load factor
  • Search length

43
1 2 3 4 5 6 7 .. n
File space
Pkey 43 Hash address (43 remainder 7)
1 Record address 3432 1 3433
38
HASHING CONCEPTS..
Hashing algorithm the formula used to calculate
a record address Hash address an address
(within block) where a hashed record is
stored Buckets storage area for a group of
records bucket size refers to of slots. Slots
storage area for an individual record Collision
when two records hash to the same address Load
factor is the ratio of of records to the
total space allocated Average search length is
the time it takes to retrieve a record on the
avg. (usually expressed in terms of disk
accesses) Disk access every time a disk is
accessed for getting a record (if the record is
stored in its hardware address, one access
otherwise it depends on record location)
39
HASHING ALGORITHM
  • Choose load factor
  • Identify of buckets to be allocated
  • Select a prime close to this number
  • Divide each pkey by prime
  • Remainder record address
  • Sequentially number the buckets
  • Place each record to its address
  • If there are overflows, use Open

40
HASHING CONCEPTS..
Collision When two keys hash to the same address
1 2 3 4 5 6 7 .. n
  • Open overflow(store in unallocated slots)
  • Chained overflow(a separate area)

OVERFLOWS
41
HASHING EXAMPLE
Given Parts 100 Gears 120 Scrapers 130 Aux
motors 140 Crankshafts 145 Cylinder
heads 150 Pistons
100 Mod 7 2 120 Mod 7 1 130 Mod 7 4
140 Mod 7 0 145 Mod 7 5 150 Mod 7 3
  • assume 8 buckets (0..7)
  • assume 1 slot per bucket
  • assume disk access time of 20 ms

42
HASHING EXAMPLE..
Bucket size ?
0
140 Crankshaft
120 Scrapers
1
Insert 135 Shovel? 135 Mod 7 2
2
100 Gears
150 Pistons
3
130 Aux. motor
4
Average search length? 6 records -gt 1 access 1
record -gt 2 accesses
145 Cylinders
5
6
7
Load factor ?
FILE LOADINGS
43
THE HASHING ORGANIZATION
EVALUATION
  • H(pkey) --gt record address
  • Records in hash sequence
  • Need to allocate extra space
  • Load factor between 60-80
  • Good for low activity (FAR) files
  • Real-time and OO applns.

44
DISCUSSION
A parts file with Part as the pkey includes
records with the following part values
23,37,46,48, 56,18, 10, 71, 16, 24, 39, 47 and
69. The file uses 8 buckets numbered 0 to 7.
Each bucket holds two records. Load these
records into the file in the given order using
the hash function h(K) K mod 8. Calculate the
average search length in terms of of disk
accesses.
45
INDEXED ORGANIZATION
A method of file organization where a subset of
key values are stored in an index. Types are
  • Primary key
  • Secondary key
  • Clustered

46
THE INDEXED ORGANIZATION
(ISAM)
  • Records are in pkey sequence (master file)
  • But are organized into groups
  • Grouping information is stored in
  • index file
  • Records can be inserted at random
  • Records can be accessed in sequence or at random

47
THE INDEXED ORGANIZATION
Master file (sequence set)
Index file (index set)
Emp ID
48
THE INDEXED ORGANIZATION
CYLINDER2
CYLINDER1
TRACKS
CYLINDER2
CYLINDER1
49
THE ISAM ORGANIZATION
Track index
Cylinder index
87 189 300
Index Set
43 69 87
136 150
.
122 136
.
24 32 43
141 150 172
Sequence Set
45 62 69
.
250 300
74 77 87
175 181 189
278 281 300
CYLINDER1
CYLINDER N..
Overflow tracks
Note Assume that the corresponding HW addresses
are stored along with the pkeys
50
INSERTIONS IN ISAM
  • Identify track where record needs
  • to be inserted
  • If the track is full, insert in overflow area
  • If the track has room insert pkey in sequence
  • Update track index and cylinder index if necessary

51
ISAM ADVANTAGES AND DISADVANTAGES
  • Access is direct or sequential?
  • Access time dependent on?
  • Rewrite sequentially
  • Retrieval time uniform
  • Suitable for volatile files?
  • Workhorse organization used in
  • most apps.

52
SECONDARY KEY INDEX
E_TITLE REC
Analyst 2,5,6 Manager
4,7 Programmer 1,3
EMPLOYEE
REC E_SSN E_NAME
E_TITLE E_SALARY
1. 456-34-8895 Smith
Programmer 35,000 2. 459-66-6785
Johnson Analyst
27,000 3. 467-89-8898 Weintraub
Programmer 60,000 4. 478-64-8005
Dickson Manager
64,000 5. 489-12-5575 Holland Analyst
47,000 6. 492-93-4438
Rao Analyst 71,000
7. 537-89-8898 McDonald Manager
85,000
53
CLUSTERED INDEX
E_TITLE REC
Also known as Inverted file organization
Analyst 1 Manager
4 Programmer 6
EMPLOYEE
REC E_SSN E_NAME
E_TITLE E_SALARY
1. 459-66-6785 Johnson Analyst
27,000 2. 489-12-5575
Holland Analyst 47,000
3. 492-93-4438 Rao Analyst
71,000 4. 478-64-8005
Dickson Manager
64,000 5. 467-89-8898 McDonald
Manager 85,000 6. 467-89-8898
Weintraub Programmer
60,000 7. 456-34-8895 Smith
Programmer 35,000
54
INDEXING STRATEGIES
  • Index if you must
  • Index on pkey
  • Index on foreign keys
  • Index on secondary key
  • (depending on query frequency)

55
DISCUSSION
  • What activities are part of identifying storage
    strategies?
  • How is denormalization carried out for MN
    relationships?
  • How many indexes can you have per table?
  • How many clustered indexes?
  • Can we sequentially update all records in
  • a) hashing organization? b) in indexing?
  • Is indexing suitable for volatile files?
  • If an index consists of 3 levels of indexes with
    the
  • main index in RAM, and a disk access time of 20
    MS,
  • how long on the average does it take to retrieve
    a record?
  • What problems do overflow records cause in
    hashing?

56
THE END!
Write a Comment
User Comments (0)
About PowerShow.com