Storage Growth is Exponential

Slides: 3
Provided by: ArieSh

1
Storage Growth is Exponential
  • Unlike compute and network resources, storage resources are not reusable
    • Unless data is explicitly removed
  • Need to use storage wisely
    • Checkpointing, etc.
    • Time-consuming, tedious tasks
  • Data growth will scale with compute scaling
    • Storage will grow even with good practices (such as eliminating unnecessary replicas)
    • Likely to be faster than historical growth
  • Not necessarily on supercomputers
    • but on user/group machines
    • and archival storage
  • Storage cost is a consideration
    • Has to be part of science growth cost
    • But storage costs are going down at a rate similar to data growth
    • Need continued investment in new storage technologies

Storage Growth 1998-2006 at ORNL (rate 2X / year)
Storage Growth 1998-2006 at NERSC (rate 1.7X / year)
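
To give a feel for what these yearly rates imply, the following is a minimal sketch that compounds them over the charted period. It assumes, for illustration only, that each rate held constant and that 1998-2006 counts as 8 yearly steps; the function names are hypothetical and not from the slides.

```python
import math


def growth_factor(rate_per_year: float, years: int) -> float:
    """Total multiplicative growth after compounding a yearly rate."""
    return rate_per_year ** years


def doubling_time_years(rate_per_year: float) -> float:
    """Years needed for storage to double at a constant yearly rate."""
    return math.log(2) / math.log(rate_per_year)


if __name__ == "__main__":
    # Rates quoted on the slide; the 8-year span is an assumption (1998 to 2006).
    for site, rate in (("ORNL", 2.0), ("NERSC", 1.7)):
        total = growth_factor(rate, 8)
        doubling = doubling_time_years(rate)
        print(f"{site}: about {total:.0f}x over 8 years, "
              f"doubling roughly every {doubling:.2f} years")
```

Under these assumptions, 2X per year compounds to 256X over the period and 1.7X per year to roughly 70X, which is why even a modest difference in yearly rate matters for planning.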
The challenges are in managing the data
2
Data and Storage Challenges: End-to-End (3 Phases of Scientific Investigation)
  • Data production phase
    • Data movement
      • I/O to parallel file system
      • Moving data out of supercomputer storage
      • Sustain data rates of GB/sec
    • Observe data during production
    • Automatic generation of metadata
  • Post-processing phase
    • Large-scale (entire datasets) data processing
      • Summarization / statistical properties
      • Reorganization / transposition
      • Generate data at different granularity
    • On-the-fly data processing
      • Computations for visualization / monitoring
  • Data extraction / analysis phase
    • Automate data distribution / replication
      • Synchronize replicated data
      • Data lifetime management to unclog storage
    • Extract subsets efficiently
      • Avoid reading unnecessary data
      • Efficient indexes for fixed content data
      • Automated use of metadata
    • Parallel analysis tools
      • Statistical analysis tools
      • Data mining tools
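
The "extract subsets efficiently" items can be illustrated with a small sketch. It is not the method the presentation refers to; it is a simple binned index, chosen here as one plausible way to avoid reading unnecessary data when the content is fixed (written once, never updated). All names and the bin width are hypothetical.

```python
from collections import defaultdict


def build_bin_index(values, bin_width):
    """Map each bin number to the positions of records whose value falls in it.

    Built once; this works because the data is assumed to be fixed content.
    """
    index = defaultdict(list)
    for pos, value in enumerate(values):
        index[int(value // bin_width)].append(pos)
    return index


def select_range(values, index, bin_width, lo, hi):
    """Return positions with lo <= value < hi, visiting only candidate bins."""
    first_bin, last_bin = int(lo // bin_width), int(hi // bin_width)
    hits = []
    for b in range(first_bin, last_bin + 1):
        for pos in index.get(b, ()):
            # Records in bins outside [first_bin, last_bin] are never read.
            if lo <= values[pos] < hi:
                hits.append(pos)
    return sorted(hits)


if __name__ == "__main__":
    temperatures = [0.5, 12.0, 7.3, 25.1, 9.9, 18.4]  # made-up sample values
    idx = build_bin_index(temperatures, bin_width=10)
    print(select_range(temperatures, idx, 10, lo=5, hi=15))
```

In this sketch the query for values in [5, 15) touches only the two bins that overlap the range, so the record in the third bin is never examined. A production index would also skip the per-record check for bins that lie entirely inside the range.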

General Data Challenges
  • Multiple parallel file systems
  • A common data model
  • Coordinated scheduling of resources
  • Reservations and workflow management
  • Multiple data formats
  • Running coupled codes
  • Coordinated data movement (not just files)
  • Data reliability / monitoring / recovery
  • Tracking data for long-running jobs
  • Security: authentication and authorization