MASSIVE ARRAYS OF IDLE DISKS FOR STORAGE ARCHIVES

Provided by: jeha1

Learn more at: https://www2.cs.uh.edu

1
MASSIVE ARRAYS OF IDLE DISKS FOR STORAGE ARCHIVES

D. Colarelli
D. Grunwald
U. Colorado, Boulder

2
Highlights

Paper proposes
To replace tape libraries by large non-redundant
arrays of disks
To cache on active drives
Files that have been recently accessed
Update logs for other files
To keep other drives mostly inactive by spinning
them down between accesses

3
Introduction (I)

Robotic tape libraries are now the standard
solution for archiving very large amounts of data
Disadvantages include
Slow access times: average search time of 41 s for
T9940 drives
Not much cheaper than disk drives
Could we replace them by massive arrays of hard
drives?

4
Introduction (II)

Major limitation of hard drive solution is power
consumption
Almost ten times that of equivalent tape library
Could power down disks that are not currently
accessed
50% of data are likely never to be accessed
25% of data are likely to be accessed once

5
Introduction (III)

Must be at least as reliable as tape libraries
No need to use a redundant scheme
Solution is a Massive Array of Idle Disks (MAID)
Paper investigates design issues through
trace-driven simulations

6
Design Issues

Two major design decisions
Data migration or duplication (caching)
File system or block-level interface

7
Migration or caching

Migration would move hot data to active drives
Migration uses disk space more efficiently
Requires a map or directory mechanism that maps
the storage across all drives

Caching would cache read data and act as a write
log for write data
Keeps two copies of all cached files
Maps or directories are proportional to size of
cache

8
File system or block interface

File system interface could use file system
information to cache entire files
Would probably perform better
Would require system modifications

Block-level interface would work with existing
systems

9
MAID with caching
Figure components: Virtualization Manager, Cache
Manager, Passive Drive Manager, active drives
(always on), passive drives (spin up/down)

10
Design choices (I)

Compared MAID-cache and MAID-no cache
MAID-cache
Caches reads and writes on active drives
Caching unit is chunk of 64 sectors
Cache policy is LRU
All writes are placed in the cache write-log
where they wait to be committed to the
non-active (passive) drives
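
The cache policy above can be sketched as a small LRU cache keyed by chunk number. This is an illustrative sketch only, not the authors' simulator: the class name `ChunkLRUCache`, its methods, and the capacity parameter are assumptions; only the 64-sector chunk size and the LRU policy come from the slide.

```python
from collections import OrderedDict

SECTORS_PER_CHUNK = 64  # caching unit stated on the slide


class ChunkLRUCache:
    """LRU cache of 64-sector chunks (hypothetical helper, not from the paper)."""

    def __init__(self, capacity_chunks):
        self.capacity = capacity_chunks
        self.chunks = OrderedDict()  # chunk number -> cached data

    @staticmethod
    def chunk_of(sector):
        # Map a sector number to the chunk that contains it.
        return sector // SECTORS_PER_CHUNK

    def get(self, sector):
        chunk = self.chunk_of(sector)
        if chunk not in self.chunks:
            return None  # cache miss
        self.chunks.move_to_end(chunk)  # mark as most recently used
        return self.chunks[chunk]

    def put(self, sector, data):
        chunk = self.chunk_of(sector)
        self.chunks[chunk] = data
        self.chunks.move_to_end(chunk)
        if len(self.chunks) > self.capacity:
            self.chunks.popitem(last=False)  # evict least recently used chunk
```

With a capacity of two chunks, touching chunk 0 and then inserting a third chunk evicts chunk 1, the least recently used one.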

11
Design choices (II)

Must always check write log before reading data
from the cache or the passive drives
Passive drives remain on standby until
A cache miss occurs
The write log becomes too long
Return to standby when spin-down inactivity time
limit is reached
Varying time limit is primary way to affect
system performance and energy consumption
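
A minimal sketch of the read path and the fixed spin-down timer described above. All names (`PassiveDrive`, `read_block`, `flush_if_long`, `max_log_entries`) are hypothetical, plain dictionaries stand in for the write log, the cache, and the passive drive's contents, and time is a simple number supplied by the caller.

```python
class PassiveDrive:
    """Drive that spins up on demand and spins down after an idle delay."""

    def __init__(self, spin_down_delay):
        self.spin_down_delay = spin_down_delay  # inactivity time limit
        self.spinning = False
        self.last_access = None
        self.spin_ups = 0

    def tick(self, now):
        # Return to standby once the inactivity limit has been reached.
        if self.spinning and now - self.last_access >= self.spin_down_delay:
            self.spinning = False

    def access(self, now):
        self.tick(now)
        if not self.spinning:
            self.spinning = True
            self.spin_ups += 1
        self.last_access = now


def read_block(block, now, write_log, cache, drive, disk_data):
    # The write log holds the newest data, so it is always checked first.
    if block in write_log:
        return write_log[block]
    if block in cache:
        return cache[block]
    # Cache miss: only now is the passive drive touched (and spun up).
    drive.access(now)
    data = disk_data[block]
    cache[block] = data
    return data


def flush_if_long(now, write_log, drive, disk_data, max_log_entries):
    # Assumed trigger: commit the log once it exceeds a fixed entry count.
    if len(write_log) > max_log_entries:
        drive.access(now)
        disk_data.update(write_log)
        write_log.clear()
```

Raising `spin_down_delay` trades energy for latency: the drive stays spinning longer, so fewer requests pay the spin-up cost.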

12
Simulation parameters

Power management policy
Always on
Fixed-delay spin-down
Adaptive spin-down
Data layout
Linear: keeps successive blocks on the same drive
Striped: spreads successive blocks across drives
Caching/No caching
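
The two data layouts can be illustrated by how each maps a block number to a drive. The function names and the parameters `blocks_per_drive`, `num_drives`, and `stripe_blocks` are assumptions made for this sketch; the slide does not give the stripe unit used in the simulations.

```python
def linear_drive(block, blocks_per_drive):
    """Linear layout: fill one drive completely before moving to the next."""
    return block // blocks_per_drive


def striped_drive(block, num_drives, stripe_blocks=1):
    """Striped layout: rotate successive stripe units across all drives."""
    return (block // stripe_blocks) % num_drives


# Four consecutive blocks on a 4-drive array with 100 blocks per drive.
linear = [linear_drive(b, 100) for b in range(4)]   # [0, 0, 0, 0]
striped = [striped_drive(b, 4) for b in range(4)]   # [0, 1, 2, 3]
```

A sequential read touches a single drive under the linear layout but every drive under the striped one, which is why the layout interacts with the spin-down policy.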

13
Simulation results

Based on a supercomputer center workload
All MAID configurations achieve similar power
consumptions
15 to 16% of that of the always-on configuration
MAID configurations w/o cache have average
response times comparable to that of always on
configuration
Workload had little locality

14
Simulation results (II)

Average response times of MAID configurations
with cache much worse than that of always on
configuration
0.680 to 0.720 s compared to 0.303 s
Striped configuration with fixed spin-down delay
has lowest average response time of all MAID
configurations
0.309 s
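
To put the response times quoted above in proportion, the short calculation below expresses each one as a multiple of the always-on figure. The helper name `slowdown` is an assumption; the numbers are the ones on the slide.

```python
ALWAYS_ON_S = 0.303  # average response time of the always-on configuration


def slowdown(response_time_s, baseline_s=ALWAYS_ON_S):
    """Response time as a multiple of the always-on baseline."""
    return response_time_s / baseline_s


cached_low = round(slowdown(0.680), 2)    # 2.24
cached_high = round(slowdown(0.720), 2)   # 2.38
best_striped = round(slowdown(0.309), 2)  # 1.02
```

The cached configurations are roughly 2.2 to 2.4 times slower than always on, while the striped fixed-delay configuration is within about 2% of it.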

15
Conclusion

MAID can achieve average response times
comparable to that of an always on configuration
with a much lower power consumption

IMPORTANT: In a more recent paper, the authors
found that cached configurations worked much
better for workloads exhibiting more locality of
accesses than their supercomputer center workload
