Title: DATABASE
1DATABASE PHYSICAL DESIGN
Chandra S. Amaravadi
2INTRODUCTION
3PHYSICAL DATABASE DESIGN
Physical database design is concerned with issues
revolving around data base implementation
- Implementation design
- Database storage, access location
- File organization constraints
4THE THREE FORMS OF DATA
External
Cust Name Address Balance
100 Gordon 110 Oak Street 400
200 Prasad 22 Birch place 2500
300 . .......
Conceptual/ Base table
Internal/ Hardware level
These three levels provide logical and physical
data independence
5THE THREE TYPES OF MODELS
External
Create view Drop view
Views
Conceptual
Create table Alter table
Schemas
File Organizations
Create index drop index
Internal
Models
Facilities
6DATABASE PHYSICAL DESIGN
Inputs?
7COMPONENTS OF PHYSICAL DESIGN
1. Implementation design 2. Storage, access
distribution strategies 3. File organizations
4. Specifications for integrity constraints
(later)
8Implementation design
Concerned with taking the results of
normalization and designing tables, attributes,
data types for implementation.
- Decide on tables (de-normalization)
- Decide on primary and cross reference keys (not
discussed further) - Decide on attribute data types (not discussed
further) - E.g. fixed vs variable length fields
- integer vs double integer
- Design reports and forms (not discussed further)
Field Name Data type Description Length Decimals
Prod Numeric Unique prod code 6 0
Descr Text Short prod description 25 0
Price Currency Product price 6 2
9DECIDING ON TABLES
Denormalization is going back in the normal forms
to reduce schema overhead
Denormalization Example (for 11)
Parts(Part, PartName, ) Container (ContainerID,
fin, needed, Part)
Parts(Part, PartName, ContainerID, fin,
needed)
10DECIDING ON TABLES..
Denormalization Example (for MN)
Ord
Ord_dt
Prod
Descr.
ORDERS
PRODUCTS
Are for
Qty
What tables does normalization result in?
11DENORMALIZATION
Orders(ord, ord_dt, ..) Product(prod., descr,
..) Orders for prod (prod., ord, qty)
Orders(ord, ord_dt, ..) Product(prod., ord,
descr., qty..)
12COMPONENTS OF PHYSICAL DESIGN..
1. Implementation design 2. Storage and access
strategies 3. Distribution strategies 4. File
organizations 5. Specifications for integrity
constraints (later)
13STORAGE ACCESS STRATEGIES
ALSO CALLED VOLUME USAGE ANALYSIS
OBJECTIVES
- Estimate storage requirements (Volume analysis)
- Determine media to be used (not discussed)
- Study how data is being acccessed (Usage
analysis) - Use these to develop file organization (later)
Volume and Usage analysis is carried out with a
composite usage map.
14COMPOSITE USAGE MAP
A composite usage map is simply an ER chart
(without attr), that shows the number of records,
and the frequency/pattern with which they are
accessed.
- Used for volume usage analysis ? file org.
- Superimposed on ER Chart
- Attributes are not shown
- Shows estimated number of records (volume)
- Shows type of access (dotted lines )
15VOLUME USAGE ANALYSIS
- Equipment, Parts and PE tables
- Equipment 100
- Parts12,000
- PE 10,000
- 20 inquiries per hour to Equipment
- 300 inquiries per hour on Parts table
- 70 of these inquiries also need to know
Equipment info.
Draw a composite usage map, estimate storage
requirements and develop a suitable file
organization
16COMPOSITE USAGE MAP
20
EQUIPMENT
(100)
ARE FOR
PE
???
(10,000)
PARTS
(12,000)
????
17FOR DISCUSSION
How can one estimate the size of a database?
18ESTIMATING STORAGE REQMTS. FOR PARTS AND EQUIPMENT
7 10
12 2 1 1 EQUIPMENT
(Model, Descr, Mfr., Price, HP, WT)
1 10 12 2
PARTS(Part, Descr, Mfr, Price) 7
1 1 PE (Model, Part, Qty)
Equipment table 71012211 33
bytes/record Parts table ?? PE table ??
Total storage requirements ??
19A MORE ELABORATE EXAMPLE
70
40
- Parts are manufactured parts and purchased parts
- Parts 1,000 Suppliers50 Quotations 2,500
- Total of 200 parts inquiries
- 60 direct inquiries to purchased parts
- Of the purchased parts inquiries, 80 are also to
- quotation
- Of these 80, 70 are to supplier as well.
- 75 direct queries to supplier
- Of these 40 are for quotation
- All of these are also for parts
20ANOTHER EXAMPLE..
200
75
SUPPLIER
PART
70
60
(50)
(1000)
Is-a
80
140
40
70
40
MANU- FACTURED
PURCH- ASED
QUOTA- TION
(700)
(400)
(2500)
40
80
A COMPOSITE USAGE MAP
Note of records are in red the of accesses
are in blue
21STORAGE REQUIREMENTS
PART TABLE
PART_NO (5) DESCRIPTION (15) LOCATION (10) QUAN
TITY (1) RECORD SIZE 31 FILE SIZE
31 1100 34,300 Bytes
QUOTATION TABLE
Estimated record size 150 Estimated
file size 1502500
375,000 Bytes
Note This is done similarly for other tables.
22COMPONENTS OF PHYSICAL DESIGN..
1. Implementation design 2. Storage access
strategies 3. Distribution strategies 4. File
organizations 5. Specifications for integrity
constraints (later)
23DISTRIBUTION STRATEGIES
Distribution strategies are concerned with where
the files are physically located.
1. Centralized
Replicated (not discussed)
2. Distributed
Partitioned
24DISTRIBUTION STRATEGIES
Centralized -- All the data is stored in one
physical
location. Distributed -- The data is stored in
multiple physical
locations. Replicated -- The database is
duplicated in multiple
locations. Partitioned -- The database is divided
into fragments and each
fragment is stored in a different
location.
25CENTRALIZED VS DISTRIBUTED
- Which is bottleneck?
- Which causes security problems?
- Which method may be required for business
reasons? - In which setup is data more accessible?
- Which provides better performance?
26CENTRALIZED STRATEGY
General Principle
Maximize local access, minimize remote access
S1
S2
100
600
S3
WHERE SHOULD WE LOCATE THE DATABASE? S1, S2 or S3
500
27This slide is blank
28DISTRIBUTED DATABASE
partitioning
EID Name City
2356 Armstrong LA
3286 Nickerson SF
3356 Forrester MPLS
MPLS
LA
SF
29COMPONENTS OF PHYSICAL DESIGN..
1. Implementation design 2. Storage access
strategies 3. Distribution strategies 4. File
organizations 5. Specifications for integrity
constraints (later)
30FILE ORGANIZATION
File 1
Rec. 1,2..
Tracks
Sectors
How records are arranged on secondary storage
or mapping between ____ and ______?
31DATA ACCESS (FYI)
O/S
DBMS
Requests
Consults
FAT/NTFS
Directory tables
Generates instructions to IOP
Hard drive
Partition
IOP
RAM
Database storage
32FILE ORGANIZATION
Selection Criteria
- Retrieval time (disk access)
- Access type (direct, sequential)
- Storage space
- Maintenance effort
33OVERVIEW OF FILE ORGANIZATIONS
- Sequential
- Hashed
- Indexed
ISAM
VSAM
34OVERVIEW OF FILE ORGANIZATIONS..
Sequential -- Records are stored one after
another in pkey sequence. Hashed
-- Record address is determined by subjecting
pkey to hashing algorithm. Indexed -- Same as
sequential except that there is an index file
which places keys into a separate file for
ease of searching.
35THE SEQUENTIAL ORGANIZATION
- Records in Pkey sequence
- Access only sequential
- Insertions/Deletions in sequential order
- Simple organization
- good for batch updates
Part Descr.
100 Aux. motors
120 Scrapers
124 Rotors
..... ............
36THE HASHING ORGANIZATION
A type of file organization where record
addresses are generated by subjecting primary
keys to a hashing routine, usually by dividing by
a prime
Hashing Algorithm
Pkey
Hash Address
Address of Starting Block
REM (Pkey)/(Prime)
3432
37HASHING CONCEPTS
3432
Following are important concepts in hashing
Record address hash address physical addr
- Hashing algorithm
- Hash address
- Buckets Bucket size
- Slots
- Collisions/overflows
- Load factor
- Search length
43
1 2 3 4 5 6 7 .. n
File space
Pkey 43 Hash address (43 remainder 7)
1 Record address 3432 1 3433
38HASHING CONCEPTS..
Hashing algorithm the formula used to calculate
a record address Hash address an address
(within block) where a hashed record is
stored Buckets storage area for a group of
records bucket size refers to of slots. Slots
storage area for an individual record Collision
when two records hash to the same address Load
factor is the ratio of of records to the
total space allocated Average search length is
the time it takes to retrieve a record on the
avg. (usually expressed in terms of disk
accesses) Disk access every time a disk is
accessed for getting a record (if the record is
stored in its hardware address, one access
otherwise it depends on record location)
39HASHING ALGORITHM
- Choose load factor
- Identify of buckets to be allocated
- Select a prime close to this number
- Divide each pkey by prime
- Remainder record address
- Sequentially number the buckets
- Place each record to its address
- If there are overflows, use Open
40HASHING CONCEPTS..
Collision When two keys hash to the same address
1 2 3 4 5 6 7 .. n
- Open overflow(store in unallocated slots)
- Chained overflow(a separate area)
OVERFLOWS
41HASHING EXAMPLE
Given Parts 100 Gears 120 Scrapers 130 Aux
motors 140 Crankshafts 145 Cylinder
heads 150 Pistons
100 Mod 7 2 120 Mod 7 1 130 Mod 7 4
140 Mod 7 0 145 Mod 7 5 150 Mod 7 3
- assume 8 buckets (0..7)
- assume 1 slot per bucket
- assume disk access time of 20 ms
42HASHING EXAMPLE..
Bucket size ?
0
140 Crankshaft
120 Scrapers
1
Insert 135 Shovel? 135 Mod 7 2
2
100 Gears
150 Pistons
3
130 Aux. motor
4
Average search length? 6 records -gt 1 access 1
record -gt 2 accesses
145 Cylinders
5
6
7
Load factor ?
FILE LOADINGS
43THE HASHING ORGANIZATION
EVALUATION
- H(pkey) --gt record address
- Records in hash sequence
- Need to allocate extra space
- Load factor between 60-80
- Good for low activity (FAR) files
- Real-time and OO applns.
44DISCUSSION
A parts file with Part as the pkey includes
records with the following part values
23,37,46,48, 56,18, 10, 71, 16, 24, 39, 47 and
69. The file uses 8 buckets numbered 0 to 7.
Each bucket holds two records. Load these
records into the file in the given order using
the hash function h(K) K mod 8. Calculate the
average search length in terms of of disk
accesses.
45INDEXED ORGANIZATION
A method of file organization where a subset of
key values are stored in an index. Types are
- Primary key
- Secondary key
- Clustered
46THE INDEXED ORGANIZATION
(ISAM)
- Records are in pkey sequence (master file)
- But are organized into groups
- Grouping information is stored in
- index file
- Records can be inserted at random
- Records can be accessed in sequence or at random
47THE INDEXED ORGANIZATION
Master file (sequence set)
Index file (index set)
Emp ID
48THE INDEXED ORGANIZATION
CYLINDER2
CYLINDER1
TRACKS
CYLINDER2
CYLINDER1
49THE ISAM ORGANIZATION
Track index
Cylinder index
87 189 300
Index Set
43 69 87
136 150
.
122 136
.
24 32 43
141 150 172
Sequence Set
45 62 69
.
250 300
74 77 87
175 181 189
278 281 300
CYLINDER1
CYLINDER N..
Overflow tracks
Note Assume that the corresponding HW addresses
are stored along with the pkeys
50INSERTIONS IN ISAM
- Identify track where record needs
- to be inserted
- If the track is full, insert in overflow area
- If the track has room insert pkey in sequence
- Update track index and cylinder index if necessary
51ISAM ADVANTAGES AND DISADVANTAGES
- Access is direct or sequential?
- Access time dependent on?
- Rewrite sequentially
- Retrieval time uniform
- Suitable for volatile files?
- Workhorse organization used in
- most apps.
52SECONDARY KEY INDEX
E_TITLE REC
Analyst 2,5,6 Manager
4,7 Programmer 1,3
EMPLOYEE
REC E_SSN E_NAME
E_TITLE E_SALARY
1. 456-34-8895 Smith
Programmer 35,000 2. 459-66-6785
Johnson Analyst
27,000 3. 467-89-8898 Weintraub
Programmer 60,000 4. 478-64-8005
Dickson Manager
64,000 5. 489-12-5575 Holland Analyst
47,000 6. 492-93-4438
Rao Analyst 71,000
7. 537-89-8898 McDonald Manager
85,000
53CLUSTERED INDEX
E_TITLE REC
Also known as Inverted file organization
Analyst 1 Manager
4 Programmer 6
EMPLOYEE
REC E_SSN E_NAME
E_TITLE E_SALARY
1. 459-66-6785 Johnson Analyst
27,000 2. 489-12-5575
Holland Analyst 47,000
3. 492-93-4438 Rao Analyst
71,000 4. 478-64-8005
Dickson Manager
64,000 5. 467-89-8898 McDonald
Manager 85,000 6. 467-89-8898
Weintraub Programmer
60,000 7. 456-34-8895 Smith
Programmer 35,000
54INDEXING STRATEGIES
- Index if you must
- Index on pkey
- Index on foreign keys
- Index on secondary key
- (depending on query frequency)
55DISCUSSION
- What activities are part of identifying storage
strategies? - How is denormalization carried out for MN
relationships? - How many indexes can you have per table?
- How many clustered indexes?
- Can we sequentially update all records in
- a) hashing organization? b) in indexing?
- Is indexing suitable for volatile files?
- If an index consists of 3 levels of indexes with
the - main index in RAM, and a disk access time of 20
MS, - how long on the average does it take to retrieve
a record? - What problems do overflow records cause in
hashing?
56THE END!