MASSIVE ARRAYS OF IDLE DISKS FOR STORAGE ARCHIVES - PowerPoint PPT Presentation

Slides: 16
Provided by: jeha1
Learn more at: https://www2.cs.uh.edu

1
MASSIVE ARRAYS OF IDLE DISKS FOR STORAGE ARCHIVES
  • D. Colarelli
  • D. Grunwald
  • U. Colorado, Boulder

2
Highlights
  • Paper proposes
  • To replace tape libraries by large non-redundant
    arrays of disks
  • To cache on active drives
  • Files that have been recently accessed
  • Update logs for other files
  • To keep other drives mostly inactive by spinning
    them down between accesses

3
Introduction (I)
  • Robotic tape libraries are now the standard
    solution for archiving very large amounts of data
  • Disadvantages include
  • Slow access times: average search time of 41 s
    for T9940 drives
  • Not much cheaper than disk drives
  • Could we replace them by massive arrays of hard
    drives?

4
Introduction (II)
  • Major limitation of hard drive solution is power
    consumption
  • Almost ten times that of equivalent tape library
  • Could power down disks that are not currently
    accessed
  • 50% of data are likely to be never accessed
  • 25% of data are likely to be accessed once

5
Introduction (III)
  • Must be at least as reliable as tape libraries
  • No need to use a redundant scheme
  • Solution is a Massive Array of Idle Disks (MAID)
  • Paper investigates design issues through
    trace-driven simulations

6
Design Issues
  • Two major design decisions
  • Data migration or duplication (caching)
  • File system or block-level interface

7
Migration or caching
  • Migration would move hot data to active drives
  • Migration uses disk space more efficiently
  • Requires a map or directory mechanism that maps
    the storage across all drives
  • Caching would cache read data and act as a write
    log for write data
  • Keeps two copies of all cached files
  • Maps or directories are proportional to size of
    cache

8
File system or block interface
  • A file system interface could use file system
    information to cache entire files
  • Would probably perform better
  • Would require system modifications
  • A block-level interface would work with existing
    systems

9
MAID with caching
[Figure: MAID-cache architecture showing the
Virtualization Manager, Cache Manager, Passive Drive
Manager, active drives (always on), and passive
drives (spin up/down)]

10
Design choices (I)
  • Compared MAID-cache and MAID-no cache
  • MAID-cache
  • Caches reads and writes on active drives
  • Caching unit is chunk of 64 sectors
  • Cache policy is LRU
  • All writes are placed in the cache write-log
    where they wait to be committed to the
    non-active (passive) drives
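
The MAID-cache choices above can be sketched in a few lines of Python. This is only an illustration, not the authors' simulator: the class name `MaidCache`, the `read_from_passive` callback, and the simplification that each write replaces a whole chunk are all assumptions made for the example. Only the 64-sector chunk size, the LRU policy, and the write log come from the slides.

```python
from collections import OrderedDict

SECTORS_PER_CHUNK = 64  # caching unit given in the slides


class MaidCache:
    """Toy LRU chunk cache with a write log (hypothetical names throughout)."""

    def __init__(self, capacity_chunks):
        self.capacity = capacity_chunks
        self.chunks = OrderedDict()  # chunk id -> data, least recently used first
        self.write_log = {}          # chunk id -> data not yet committed

    @staticmethod
    def chunk_of(sector):
        # Map a sector number to the 64-sector chunk that contains it.
        return sector // SECTORS_PER_CHUNK

    def write(self, sector, data):
        # Writes go to the log on the active drives; no passive drive is touched.
        # Assumption: a write replaces the whole chunk.
        self.write_log[self.chunk_of(sector)] = data

    def read(self, sector, read_from_passive):
        chunk = self.chunk_of(sector)
        if chunk in self.write_log:      # the newest data lives in the write log
            return self.write_log[chunk]
        if chunk in self.chunks:         # cache hit: refresh the LRU position
            self.chunks.move_to_end(chunk)
            return self.chunks[chunk]
        data = read_from_passive(chunk)  # cache miss: a passive drive is needed
        self.chunks[chunk] = data
        if len(self.chunks) > self.capacity:
            self.chunks.popitem(last=False)  # evict the least recently used chunk
        return data
```

Sectors 0 and 63 fall in the same chunk, so a read of sector 63 right after a read of sector 0 is served from the active drives without calling `read_from_passive` again.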

11
Design choices (II)
  • Must always check write log before reading data
    from the cache or the passive drives
  • Passive drives remain on standby until
  • A cache miss occurs
  • The write log becomes too long
  • Return to standby when spin-down inactivity time
    limit is reached
  • Varying time limit is primary way to affect
    system performance and energy consumption
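
To make the role of the spin-down time limit concrete, here is a minimal sketch of a fixed-delay policy for one passive drive. The function name, the trace format (sorted arrival times in seconds), and the decision to ignore spin-up latency are assumptions for illustration, not details from the paper.

```python
def fixed_delay_spin_down(access_times, delay):
    """Return (seconds spent spun up, number of spin-ups) for one passive drive.

    access_times: sorted request arrival times in seconds (hypothetical trace).
    delay: spin-down inactivity time limit in seconds.
    Spin-up latency is ignored to keep the sketch short.
    """
    spin_ups = 0
    spun_up = 0.0
    standby_at = None  # time at which the drive will return to standby
    for t in access_times:
        if standby_at is None or t >= standby_at:
            spin_ups += 1     # drive was on standby: this access spins it up
            spun_up += delay  # it then stays up for at least `delay` seconds
        else:
            spun_up += t + delay - standby_at  # access extends the current window
        standby_at = t + delay
    return spun_up, spin_ups
```

For the made-up trace `[0, 5, 100]`, a 10 s limit gives 25 s spun up and two spin-ups, while a 200 s limit gives 300 s spun up and a single spin-up. A longer limit spares requests the spin-up wait at the cost of more time spent drawing power.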

12
Simulation parameters
  • Power management policy
  • Always on
  • Fixed-delay spin-down
  • Adaptive spin-down
  • Data layout
  • Linear: keeps successive blocks on the same drive
  • Striped: the opposite, successive blocks are
    spread across drives
  • Caching/No caching
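
The two data layouts differ only in how a block number maps to a drive. The sketch below assumes a stripe unit of one block and uses hypothetical function and parameter names; the slides do not give the stripe size used in the simulations.

```python
def linear_drive(block, blocks_per_drive):
    # Linear layout: fill one drive completely before moving to the next.
    return block // blocks_per_drive


def striped_drive(block, num_drives):
    # Striped layout: successive blocks rotate across the drives
    # (assumed stripe unit: one block).
    return block % num_drives


def drives_touched(blocks, drive_of):
    # Number of distinct drives that must be spun up to serve `blocks`.
    return len({drive_of(b) for b in blocks})
```

With four drives of four blocks each, a sequential read of blocks 0 to 3 touches one drive under the linear layout and all four under the striped one, which is the trade-off between keeping drives idle and spreading the load.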

13
Simulation results
  • Based on a supercomputer center workload
  • All MAID configurations achieve similar power
    consumptions
  • 15 to 16% of that of always on configuration
  • MAID configurations w/o cache have average
    response times comparable to that of always on
    configuration
  • Workload had little locality

14
Simulation results (II)
  • Average response times of MAID configurations
    with cache much worse than that of always on
    configuration
  • 0.680 to 0.720 s compared to 0.303 s
  • Striped configuration with fixed spin-down delay
    has lowest average response time of all MAID
    configurations
  • 0.309 s
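
As a quick check on the figures above, the response times can be expressed as slowdown factors relative to the always on configuration. The dictionary labels are ours; the timings are the ones quoted on this slide.

```python
ALWAYS_ON = 0.303  # average response time of the always on configuration, in s

reported = {
    "with cache, low end": 0.680,
    "with cache, high end": 0.720,
    "striped, fixed spin-down delay": 0.309,
}

# Slowdown factor of each configuration relative to always on.
slowdown = {name: round(t / ALWAYS_ON, 2) for name, t in reported.items()}

for name, factor in slowdown.items():
    print(f"{name}: {factor}x")
```

The cached configurations come out at roughly 2.2 to 2.4 times the always on response time, while the best MAID configuration is within about 2% of it.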

15
Conclusion
  • MAID can achieve average response times
    comparable to that of an always on configuration
    with a much lower power consumption

IMPORTANT: In a more
recent paper, the authors found that cached
configurations worked much better for workloads
exhibiting more locality of accesses than their
supercomputer center workload