Title: What's New with HDF?
1 What's New with HDF?
- Mike Folk
- mfolk@ncsa.uiuc.edu
- http://hdf.ncsa.uiuc.edu
2 Outline
- HDF project overview
- HDF5 Data Model
- HDF5 Library and tools
- HDF Info Center
3 What is HDF?
4 Why HDF?
- Big data
- Need to manage large complex collections of data
- Need a variety of data types and structures
- Large data structures and objects
- Metadata in a variety of forms
- Availability of data
- Need to move from place to place
- Need to share data
- Open standard encourages wide use
5 Why HDF?
- Ease of access
- Software has to work on many machines
- I/O library, as well as tools
- Efficiency
- Fast I/O
- Efficient storage
HDF was created to address these concerns so that others don't have to.
6 What is HDF?
- Flexible, self-describing file format
- Datatypes and data objects for scientific data
- Software libraries and tools
7 Two HDFs
- HDF4: original version of HDF
- HDF5: new format and library
- http://hdf.ncsa.uiuc.edu/HDF5
- Why?
- bigger, faster machines and storage systems
- larger datasets
- new I/O paradigms
- parallel computing and I/O
- complex data structures
- complex subsetting
- thread safety
8 Example HDF5 file
[Figure: file tree with "/" (root) and a group "/foo", holding a table, two raster images, and a 2-D array. The table reads:]
lat  lon  temp
---  ---  ----
12   23   3.1
15   24   4.2
17   21   3.6
9 HDF Software
10 HDF Applications Software
- Free software
- NCSA HDF library and utilities
- Other software
- Commercial/other software that understands
- most of HDF (Noesys, IDL, HDF Explorer)
- certain HDF objects (MATLAB, WebWinds)
- HDF applications
- http://hdf.ncsa.uiuc.edu/tools.html
11 Major Project 1: EOSDIS
Earth Observing System Data and Information System
- Open standard for exchange of remote-sensed data
- Scores of instruments and datasets
- 1 terabyte per day per platform
- HDF Requirements
- Earth science data types
- Swath, grid, point data
- Efficient storage and access
- Support for scientists, data producers, archiving, etc.
- HDF tools, utilities, access software
12 EOS Constellation
13 HDF-EOS Swath profile
[Figure: swath profile with geolocation fields (Time, Latitude, Longitude), a data field (Brightness Temperature), and a dimension named Geotrack of size 21]
14 Major Project 2: ASCI
- ASCI Data Models and Formats (DMF) Group
- Open standard exchange format and I/O library for ASCI
- DOE tri-lab ASCI applications
- HDF requirements
- large datasets (more than a terabyte)
- ASCI data types, especially meshes
- good performance in massive parallel environments
16 ASCI DMF Data Abstraction
- Objectives
- Sound data model with robust data abstractions
- Computational mechanics data: meshes, fields
- Based on the mathematical field of fiber bundles
- Common format allows common tools and sharing
- Common API shields apps from model complexities
[Figure: software layers, top to bottom]
APPLICATION
Mesh APIs (SNL/LANL)
Fiber Bundle Kernel (LLNL)
Data Structure Layer (LLNL)
HDF5 (NCSA)
MPI IO (ANL)
17 HDF5
18 New HDF5 Features
- More scalable
- Larger arrays and files
- More objects
- Improved data model
- New datatypes
- Single comprehensive dataset object
- Improved software
- More flexible, robust library
- More flexible API
- More I/O options
19 HDF5 file structure
File header info (version, etc.)
User block
Root group "/"
Other objects (datasets, groups, etc.)
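One way to picture this layout: the published HDF5 file format specification puts an 8-byte signature at the start of the file header, at byte offset 0 or, when a user block comes first, at offset 512, 1024, 2048, and so on. The sketch below searches a byte string for that signature. It is plain Python, not the HDF5 library; the function name `find_header_offset` and the fake file contents are assumptions made up for illustration.

```python
# Sketch only: locate the HDF5 signature that follows an optional user block.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # 8-byte signature that opens the file header


def find_header_offset(data: bytes, max_offset: int = 1 << 20):
    """Return the byte offset of the signature, or None if it is not found.

    Candidate offsets are 0, 512, 1024, 2048, ... (each double the previous),
    so a user block of one of those sizes may precede the header.
    """
    offset = 0
    while offset <= max_offset and offset + len(HDF5_SIGNATURE) <= len(data):
        if data[offset:offset + len(HDF5_SIGNATURE)] == HDF5_SIGNATURE:
            return offset
        offset = 512 if offset == 0 else offset * 2
    return None


# A fake "file" (not a real HDF5 file): 512-byte user block, then the signature.
user_block = b"my application notes".ljust(512, b"\0")
fake_file = user_block + HDF5_SIGNATURE + b"\0" * 16
header_offset = find_header_offset(fake_file)
```

Because the library skips the user block, an application can keep its own bytes there without disturbing the header that follows.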
20 HDF5 data model
- Two primary objects
- Dataset
- multidimensional array of elements
- rich variety of datatypes
- Group
- directory-like structure
- contains datasets, groups, other objects
21 Dataset components
- a multidimensional array of data elements
- header with metadata
- datatype
- dataspace
- attributes
- storage info
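The components listed above can be pictured as one record: the array itself plus a header holding the datatype, dataspace, attributes, and storage info. The following is a toy model in plain Python, not the HDF5 API; the class name, field names, and the "contiguous" default are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class Dataset:
    """Toy dataset: header metadata plus a flat, row-major list of elements."""
    datatype: str                 # a label such as "float32" (not enforced here)
    dataspace: tuple              # dimension sizes, e.g. (3, 3)
    data: list                    # elements in row-major order
    attributes: dict = field(default_factory=dict)  # small user metadata
    storage: str = "contiguous"   # storage info, e.g. "contiguous" or "chunked"

    def __post_init__(self):
        if len(self.data) != prod(self.dataspace):
            raise ValueError("element count does not match dataspace")

    def element(self, *index):
        """Return one element using row-major index arithmetic."""
        if len(index) != len(self.dataspace):
            raise IndexError("wrong number of indices")
        flat = 0
        for i, size in zip(index, self.dataspace):
            if not 0 <= i < size:
                raise IndexError(index)
            flat = flat * size + i
        return self.data[flat]


# Sample values: three rows of (lat, lon, temp)
temp = Dataset("float32", (3, 3),
               [12, 23, 3.1, 15, 24, 4.2, 17, 21, 3.6],
               attributes={"columns": "lat lon temp"})
```

Calling `temp.element(1, 2)` reads the third value of the second row; the dataspace is what turns the flat element list into a multidimensional array.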
22 Simple datatypes
- The usual scalars: integer, float
- user-defined scalars (e.g. 13-bit integers)
- variable length (e.g. strings)
- pointers to objects or regions of datasets
- enumeration
- opaque
23 Compound datatypes
- User-defined
- Comparable to C structs
- Members can be simple or compound types
- Members can be multidimensional
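Since compound datatypes are comparable to C structs, Python's `ctypes` module gives a rough picture of the idea. This is only an analogy under assumed names (`Position`, `Reading`, and their members are hypothetical); it does not use or reproduce the HDF5 datatype interface.

```python
import ctypes


class Position(ctypes.Structure):
    """A small compound type used as a member of another compound type."""
    _fields_ = [("lat", ctypes.c_int16), ("lon", ctypes.c_int16)]


class Reading(ctypes.Structure):
    """Compound type with a simple member, a compound member, and an array member."""
    _fields_ = [
        ("flag", ctypes.c_int8),                 # simple member
        ("where", Position),                     # compound member
        ("grid", (ctypes.c_float * 3) * 2),      # 2 x 3 array of 32-bit floats
    ]


# A 5 x 3 array whose elements are Reading records (zero-initialized)
Records5x3 = (Reading * 3) * 5
records = Records5x3()

rec = records[4][2]        # refers to the record stored in the array
rec.flag = 1
rec.where.lat = 12
rec.where.lon = 23
rec.grid[1][2] = 1.5
```

Each element of the 5 x 3 array carries all of its members together, which is the sense in which a dataset is an array of compound elements.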
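24 HDF5 dataset: array of elements
[Figure: a 5 x 3 array in which every element is a compound record]
Dimensionality: 5 x 3
Datatype: compound, with members int8, int4, int16, and a 2x3 array of float32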
25 Groups
- A mechanism for collections of related objects
- Every file starts with a root group
- Similar to UNIX directories
- Can have attributes
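The directory analogy can be sketched as a tree of named members resolved by path. This is a toy model in plain Python, not the HDF5 group interface; the `Group` class, its methods, and the sample member are assumptions for illustration.

```python
class Group:
    """Toy group: named members (groups or other objects) plus attributes."""

    def __init__(self):
        self.members = {}      # name -> Group or any other object
        self.attributes = {}   # groups can carry attributes too

    def create_group(self, name):
        group = Group()
        self.members[name] = group
        return group

    def lookup(self, path):
        """Resolve a UNIX-style path such as '/foo/x' starting from this group."""
        node = self
        for part in path.strip("/").split("/"):
            if part:                      # an empty path means this group itself
                node = node.members[part]
        return node


root = Group()                            # every file starts with a root group
foo = root.create_group("foo")
foo.members["x"] = [[1, 2], [3, 4]]       # stand-in for a 2-D array dataset
foo.attributes["note"] = "example"
```

With this structure, `root.lookup("/foo/x")` walks from the root group through `foo` to the array, the same way a path names a file inside nested directories.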
26 Example HDF5 file
[Figure: file tree with object paths labeled: "/" (root), /a, /b, /c, /foo, /foo/b, /foo/x, /foo/y, /foo/z. The objects shown are a table, two raster images, and a 2-D array. The table reads:]
lat  lon  temp
---  ---  ----
12   23   3.1
15   24   4.2
17   21   3.6
27 Special Storage Options
- chunked: better subsetting access time; extendable
- compressed: improves storage efficiency, transmission speed
- extendable: arrays can be extended in any direction
- split file: metadata in one file, raw data in another
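A rough illustration of why chunking helps subsetting and extension, and why compression saves space: with chunked storage, a read only needs the chunks its selection overlaps, and growing the array only adds chunks. The sketch is plain Python with `zlib` standing in for whatever compression a real file would use; the function names, array shape, and chunk size are assumptions.

```python
import zlib


def chunk_grid(shape, chunk):
    """Number of chunks along each dimension (ceiling division)."""
    return tuple(-(-n // c) for n, c in zip(shape, chunk))


def chunks_touched(start, count, chunk):
    """Chunk indices overlapped by a 2-D block of `count` elements at `start`."""
    rows = range(start[0] // chunk[0], (start[0] + count[0] - 1) // chunk[0] + 1)
    cols = range(start[1] // chunk[1], (start[1] + count[1] - 1) // chunk[1] + 1)
    return [(r, c) for r in rows for c in cols]


shape, chunk = (1000, 1000), (100, 100)

# Reading rows 250..349 and columns 950..999 touches only a few chunks.
touched = chunks_touched(start=(250, 950), count=(100, 50), chunk=chunk)

# Extending the array by one row adds a row of chunks; existing chunks stay put.
grid_before = chunk_grid(shape, chunk)
grid_after = chunk_grid((1001, 1000), chunk)

# Each chunk can be compressed on its own; a chunk of zero bytes shrinks a lot.
one_chunk = bytes(100 * 100)
compressed = zlib.compress(one_chunk)
```

Here the subset overlaps 2 of the 100 chunks, so the other 98 never need to be read or decompressed.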
28 The HDF5 Library
29 Features
- Support for high performance applications
- Ability to create complex data structures
- Complex subsetting
- Flexible, efficient I/O (parallel, remote, etc.)
- Support for key language models
- OO compatible
- C and Fortran primarily
- Also Java, C++
30 Subsetting and subsampling
Mappings between file arrays/selections and memory arrays/selections.
(a) A hyperslab from a 2D array to the corner of a smaller 2D array
(b) A regular series of blocks from a 2D array to a contiguous sequence at a certain offset in a 1D array
(c) A sequence of points from a 2D array to a sequence of points in a 3D array
(d) Union of hyperslabs in file to union of hyperslabs in memory. Number of elements must be equal.
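Mappings (a) and (b) can be sketched with a regular hyperslab described by start, stride, count, and block per dimension. The parameter names follow those of the HDF5 C call `H5Sselect_hyperslab`, but everything else here is plain Python written for illustration: the helper functions, the 6 x 8 sample array, and the memory buffer are assumptions, and no HDF5 library is involved.

```python
from itertools import product


def hyperslab(start, stride, count, block):
    """Coordinates selected by a regular hyperslab, in row-major order.

    Per dimension: `count` blocks of `block` elements each, with consecutive
    blocks beginning `stride` apart and the first block beginning at `start`.
    """
    per_dim = []
    for s, st, c, b in zip(start, stride, count, block):
        per_dim.append([s + i * st + j for i in range(c) for j in range(b)])
    return product(*per_dim)


def read_selection(array2d, coords):
    """Gather the selected elements into a contiguous 1-D list."""
    return [array2d[r][c] for r, c in coords]


# 6 x 8 "file" array whose element at (r, c) is 10*r + c
file_array = [[10 * r + c for c in range(8)] for r in range(6)]

# (a) a plain 2 x 3 hyperslab starting at row 1, column 2
slab = read_selection(file_array, hyperslab((1, 2), (1, 1), (2, 3), (1, 1)))

# (b) a regular series of 1 x 2 blocks (rows 0 and 3, every 4th column),
#     copied as a contiguous run starting at offset 5 of a 1-D memory array
blocks = read_selection(file_array, hyperslab((0, 0), (3, 4), (2, 2), (1, 2)))
memory = [None] * 16
memory[5:5 + len(blocks)] = blocks
```

The file selection and the memory destination have different shapes; what must match is the number of selected elements, as item (d) notes.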
31 Files needn't be files - Virtual File Layer
VFL: a public API for writing I/O drivers
[Figure: layered diagram, top to bottom]
File handle (hid_t)
VFL: Virtual File I/O Layer
I/O drivers: memory, mpio, stdio, network
Storage: memory, network, files
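The layering can be sketched as a handle that talks to interchangeable drivers through one small interface. This is a conceptual model in plain Python, not the VFL API itself; the class names, method names, and the two drivers shown are assumptions chosen to mirror the "memory" and "stdio" boxes in the diagram.

```python
import io
import os
import tempfile


class StreamDriver:
    """Common read/write-at-offset interface that every driver provides."""

    def __init__(self, stream):
        self._stream = stream

    def write(self, offset, data):
        self._stream.seek(offset)
        self._stream.write(data)

    def read(self, offset, size):
        self._stream.seek(offset)
        return self._stream.read(size)

    def close(self):
        self._stream.close()


class MemoryDriver(StreamDriver):
    """Keeps the whole 'file' in memory."""

    def __init__(self):
        super().__init__(io.BytesIO())


class StdioDriver(StreamDriver):
    """Stores the bytes in an ordinary file on disk."""

    def __init__(self, path):
        super().__init__(open(path, "w+b"))


class FileHandle:
    """What the application holds; it does not know which driver is underneath."""

    def __init__(self, driver):
        self._driver = driver

    def write_at(self, offset, data):
        self._driver.write(offset, data)

    def read_at(self, offset, size):
        return self._driver.read(offset, size)

    def close(self):
        self._driver.close()


def roundtrip(driver):
    """Write four bytes at offset 8 and read them back through the handle."""
    handle = FileHandle(driver)
    handle.write_at(8, b"temp")
    data = handle.read_at(8, 4)
    handle.close()
    return data


from_memory = roundtrip(MemoryDriver())
with tempfile.TemporaryDirectory() as tmp:
    from_disk = roundtrip(StdioDriver(os.path.join(tmp, "demo.bin")))
```

The application code in `roundtrip` is identical for both drivers, which is the point of putting a driver layer between the file handle and the storage.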
32 HDF5 tools
- Current
- hdf5ls - lists contents of HDF5 file
- h5dumper - higher level view
- HDF5-to-HDF4 converter
- VisAD data adapter
- Future
- Convert HDF5 to ASCII, binary, GIFF, etc.
- Convert HDF4 to HDF5
- Java tools
- XML-based tools
33 HDF5 Information
- HDF website
- http://hdf.ncsa.uiuc.edu/
- HDF5 Information Center
- http://hdf.ncsa.uiuc.edu/HDF5/
- HDF Help email address
- hdfhelp@ncsa.uiuc.edu
- HDF users mailing list
- hdfnews@ncsa.uiuc.edu