HDF5 - PowerPoint PPT Presentation

About This Presentation
Title:

HDF5

Slides: 19
Provided by: HDF6

Transcript and Presenter's Notes

Title: HDF5


1
HDF5
  • A new file format & software for high performance
    scientific data management

2
High performance data requirements
  • larger datasets (> terabyte)
  • bigger, faster machines and storage systems
  • varied architectures and I/O paradigms
  • parallel computing environments
  • complex subsetting
  • complex data

3
HDF5 based on lessons learned from
  • Existing standards: HDF, PDB, AIO, netCDF, MPI-IO and others
  • Computer science
  • ASCI physics applications and users
  • Earth science applications and users
  • Other users

4
and ASCI Requirements
  • Compatibility with vector bundle model
  • Collective access
  • MPI-IO
  • Transform data between memory and storage
  • Parallel file systems: PIOFS, HPSS, etc.

5
Data model
  • Datatypes (array elements): integer and float; strings and pointers;
    compound (record structures)
  • Aggregate object types: dataset (multidimensional array); grouping
    structure
  • Each object has a name and attributes

6
Basic data object: array of records
[Figure: an array with dimensionality 5 x 3; each element is a record whose fields have number types int8, int4, int16 and float32]
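As a rough illustration of an array of records, the sketch below packs a 5 x 3 array of fixed-size records with Python's standard `struct` module. It is not the HDF5 API. The three-field layout (int8, int16, float32) and the helper names are assumptions made for this example, and the slide's int4 field is left out because `struct` has no matching type.

```python
import struct

# Hypothetical record layout: one int8, one int16 and one float32 field,
# packed little-endian with no padding ("<" disables alignment).
RECORD = struct.Struct("<bhf")

ROWS, COLS = 5, 3  # dimensionality 5 x 3, as on the slide


def pack_array(records):
    """Pack a ROWS x COLS nested list of (int8, int16, float32) tuples row by row."""
    return b"".join(RECORD.pack(*rec) for row in records for rec in row)


def read_record(buf, i, j):
    """Return the record stored at row i, column j of the packed array."""
    offset = (i * COLS + j) * RECORD.size
    return RECORD.unpack_from(buf, offset)


records = [[(i, 100 * i + j, 0.5 * j) for j in range(COLS)] for i in range(ROWS)]
buf = pack_array(records)
```

Because every record has the same size, any element can be located by arithmetic on its (row, column) index, which is what `read_record` does.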
7
Storage Capacity
                                     HDF4                     HDF5
  • Store large objects              Limit: 2 gigabytes       no limit
  • Store large numbers of objects   Limit: 20,000 objects    no limit
8
Dataset components
  • a multidimensional array of data elements
  • header with metadata: datatype, dataspace, attributes, storage info

[Figure: dataset "Fred" shown as a metadata header plus data. The header holds the datatype (int16), the dataspace (rank 2, with dimensions), the storage info (chunked, compressed) and the attributes.]
9
Groups
  • Group structure for organizing the file
  • Every file starts with a root group
  • Like directories in a file system
  • Groups have attributes

/
/foo
/foo/bar
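The directory-like group structure can be sketched as a small tree of named containers. This is a conceptual model only, not the HDF5 library API: the `Group` class, its methods and the `units` attribute are hypothetical names chosen for illustration.

```python
class Group:
    """A named container holding child groups and its own attributes."""

    def __init__(self):
        self.children = {}
        self.attrs = {}

    def create_group(self, path):
        """Create every missing group along a '/'-separated path; return the last one."""
        node = self
        for name in path.strip("/").split("/"):
            if name:
                node = node.children.setdefault(name, Group())
        return node

    def lookup(self, path):
        """Resolve a path relative to this group; raise KeyError if a component is missing."""
        node = self
        for name in path.strip("/").split("/"):
            if name:
                node = node.children[name]
        return node


root = Group()                      # every file starts with a root group "/"
bar = root.create_group("/foo/bar")
bar.attrs["units"] = "m"            # hypothetical attribute on a group
```

Looking up "/" returns the root itself, and "/foo/bar" is reached by walking one path component at a time, much as a file system resolves a directory path.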
10
Special Storage Options
  • chunked: improves subsetting access time
  • compressed: improves storage efficiency
  • extendable: arrays can be extended individually
  • split file: metadata in one file, raw data in another
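To see why chunking can improve subsetting access time, the sketch below counts how many chunks a read has to touch under two different chunk shapes. It is plain index arithmetic, not the HDF5 API; the dataset size, chunk shapes and function names are assumptions made for this example.

```python
from itertools import product


def chunk_of(index, chunk_shape):
    """Return (chunk coordinates, offset inside that chunk) for one element index."""
    chunk = tuple(i // c for i, c in zip(index, chunk_shape))
    offset = tuple(i % c for i, c in zip(index, chunk_shape))
    return chunk, offset


def chunks_touched(start, count, chunk_shape):
    """List the chunks that intersect the box starting at `start` with extent `count`."""
    ranges = [
        range(s // c, (s + n - 1) // c + 1)
        for s, n, c in zip(start, count, chunk_shape)
    ]
    return list(product(*ranges))


# Hypothetical 1000 x 1000 dataset; read a 100-element segment of column 7.
column = dict(start=(0, 7), count=(100, 1))
as_tiles = chunks_touched(chunk_shape=(100, 100), **column)   # 100 x 100 tiles
as_rows = chunks_touched(chunk_shape=(1, 1000), **column)     # one chunk per row
```

With square tiles the column segment falls inside a single chunk, while a row-per-chunk layout forces the read to visit 100 chunks.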
11
The HDF5 Library
  • New API and programming model
  • Smaller, better, faster
  • Able to support parallel I/O better
  • OO compatible
  • C and Fortran still primary, others considered
  • I/O performance emphasized
  • Current platforms: ASCI IBM SP2, SGI Origin 2000, Intel Teraflop;
    Solaris, Linux, HPUX, IRIX, NT

12
Sub-selection Options
  • Flexibility in mappings between data in memory
    and object in file
  • Selection regions can be points, hyperslabs, or unions of hyperslabs
  • Selection region in memory can be a different shape
    from the selection in file
  • Supports I/O needs for parallel computation
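A hyperslab can be pictured as a regular pattern of blocks described per dimension by a start, a stride, a count and a block size, the same four parameters the HDF5 library uses to describe hyperslabs. The sketch below only enumerates the selected coordinates in plain Python; the functions `hyperslab` and `union` are hypothetical helpers, not library calls.

```python
from itertools import product


def hyperslab(start, stride, count, block):
    """Enumerate the coordinates selected by a hyperslab, in row-major order.

    Along each dimension the selection holds `count` blocks of `block`
    consecutive elements; successive blocks begin `stride` elements apart.
    """
    axes = [
        [s + k * st + b for k in range(c) for b in range(bl)]
        for s, st, c, bl in zip(start, stride, count, block)
    ]
    return list(product(*axes))


def union(*selections):
    """Merge selections, dropping duplicate coordinates and keeping first-seen order."""
    return list(dict.fromkeys(coord for sel in selections for coord in sel))


a = hyperslab(start=(0, 0), stride=(1, 1), count=(2, 2), block=(1, 1))  # 2 x 2 corner
b = hyperslab(start=(1, 1), stride=(1, 1), count=(2, 2), block=(1, 1))  # overlaps a at (1, 1)
both = union(a, b)
```

The two 2 x 2 selections share one element, so their union holds seven coordinates rather than eight.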

13
Mappings between file dataspaces/selections and
memory dataspaces/selections.
(a) A hyperslab from a 2D array to the corner of
a smaller 2D array
(b) A regular series of blocks from a 2D array
to a contiguous sequence at a certain offset in a
1D array
(c) A sequence of points from a 2D array to a
sequence of points in a 3D array.
(d) Union of hyperslabs in file to union of
hyperslabs in memory. Number of elements must be
equal.
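Mapping (b) can be sketched as a copy between two coordinate lists of equal length: a regular series of blocks in a 2D "file" array and a contiguous run at an offset in a 1D "memory" buffer. This is a toy model in plain Python, not the HDF5 API; the array sizes, the offset and the `transfer` helper are assumptions made for this example.

```python
def transfer(src, src_coords, dst, dst_coords):
    """Copy src[s] to dst[d] for each pair of coordinates, in selection order."""
    if len(src_coords) != len(dst_coords):
        raise ValueError("selections must contain the same number of elements")
    for s, d in zip(src_coords, dst_coords):
        dst[d] = src[s]


# "File" side: a hypothetical 5 x 6 array stored as a dict keyed by (row, col).
file_data = {(r, c): 10 * r + c for r in range(5) for c in range(6)}

# File selection: 2 x 2 blocks starting at rows 0 and 3, columns 0 and 3,
# listed in row-major order.
file_sel = [
    (r0 + dr, c0 + dc)
    for r0 in (0, 3) for dr in range(2)
    for c0 in (0, 3) for dc in range(2)
]

# Memory side: a 1D buffer of 20 elements; the selection is contiguous from offset 3.
memory = [None] * 20
mem_sel = list(range(3, 3 + len(file_sel)))

transfer(file_data, file_sel, memory, mem_sel)
```

The shapes of the two selections differ, but the transfer is well defined because both contain 16 elements; a mismatch raises an error, mirroring the rule that the number of elements must be equal.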
14
HDF5 Raw Data Pipeline
  • Handles all aspects of data storage and transfer
    of data between file and application.
  • Deals with multiple storage options: chunking, compression,
    number conversion,...
  • Optimized performance for common usage
  • Hooks for new filters: compression schemes, encryption, checksum,...;
    user-specified filters
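One way to picture the filter hooks is a list of filters applied in order when a chunk is written and undone in reverse order when it is read. The sketch below uses Python's standard `zlib` for a compression filter and a CRC-32 checksum filter. The class and function names are hypothetical and this is not how the HDF5 library itself is implemented.

```python
import zlib


class Deflate:
    """Compression filter backed by zlib."""

    def encode(self, data):
        return zlib.compress(data)

    def decode(self, data):
        return zlib.decompress(data)


class Checksum:
    """Appends a CRC-32 on write and verifies it on read."""

    def encode(self, data):
        return data + zlib.crc32(data).to_bytes(4, "big")

    def decode(self, data):
        payload, stored = data[:-4], int.from_bytes(data[-4:], "big")
        if zlib.crc32(payload) != stored:
            raise ValueError("checksum mismatch")
        return payload


def write_chunk(raw, filters):
    """Run a chunk through the filters in order on its way to storage."""
    for f in filters:
        raw = f.encode(raw)
    return raw


def read_chunk(stored, filters):
    """Undo the filters in reverse order on the way back to the application."""
    for f in reversed(filters):
        stored = f.decode(stored)
    return stored


pipeline = [Deflate(), Checksum()]   # a user-specified filter would be one more entry
chunk = bytes(1000)                  # 1000 zero bytes, which compress well
stored = write_chunk(chunk, pipeline)
```

Any object with matching `encode` and `decode` methods can be added to the list, which is the sense in which such a pipeline offers hooks for new filters.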

15
Performance tuning
  • Facilities for performance measurement: timing tests in test suite,
    Pablo instrumentation
  • Caching: app can set cache size for metadata and chunks
  • Parallel optimizations
  • efficient metadata management
  • chunking
  • can control placement on physical media
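The effect of an application-chosen cache size can be sketched with a small chunk cache. The least-recently-used eviction policy, the `ChunkCache` class and the access pattern are all assumptions made for this example; the slides do not say which policy the library uses.

```python
from collections import OrderedDict


class ChunkCache:
    """Least-recently-used cache with an application-chosen capacity (in chunks)."""

    def __init__(self, capacity, load):
        self.capacity = capacity
        self.load = load              # callable that fetches a chunk from storage
        self.entries = OrderedDict()
        self.misses = 0

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)      # mark as most recently used
            return self.entries[key]
        self.misses += 1
        value = self.load(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict the least recently used
        return value


cache = ChunkCache(capacity=2, load=lambda key: f"chunk{key}")
for key in [(0, 0), (0, 1), (0, 0), (1, 0), (0, 1)]:
    cache.get(key)
```

With room for only two chunks, four of the five accesses above go to storage; a larger capacity would turn the last access into a hit.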

16
HDF5 and ASCI Applications
  • Multi-lab collaboration: DOE Tri-lab (Livermore, Sandia, Los Alamos);
    NCSA, Limit Point Systems
  • Motivation: data sharability, application interoperability
  • Leverage experiences: EXODUS (SNL), SILO and PDB (LLNL),
    HDF (NCSA), netCDF (UCAR)

17
ASCI DMF Data Abstraction
  • Objectives
  • Sound data model with robust data abstractions
  • Computational mechanics data: meshes and fields
  • Based on mathematical field of fiber bundles
  • Common format allows common tools and sharing
  • Common API shields apps from model complexities

APPLICATION
Mesh APIs (SNL/LANL)
Fiber Bundle Kernel (LLNL)
Data Structure Layer (LLNL)
HDF5 (NCSA)
MPI IO (ANL)
18
HDF5 driver projects
Project | Application | Types of data
ASCI | Computational mechanics | Fields on meshes: structured, unstructured, hierarchical
CANIS UIUC Digital Library Project | Concept space analysis of medical abstracts | Object store for large collection of small objects (noun phrases)
TRAPPIST non-destructive testing consortium | Non-destructive testing | NDT experiment data: tomography and radiology
NASA Earth Observing System | Earth Science data management | Remote sensing: swath, grid and point data