Title: HDF Update
1. HDF Update
- Mike Folk
- The HDF Group
- HDF and HDF-EOS Workshop X
- November 29, 2006
2. Outline
- Organizational info
- HDF Software Update
- Other Activities of Interest
3. Organizational info
4. The HDF Group (THG)
Founded Dec. 2006
Went solo July 15, 2006
Non-profit
5. THG mission
To support the vast community of HDF users and to ensure the sustainable development of HDF technologies and the ongoing accessibility of HDF-stored data.
6. The HDF Team
Frank Baker, Christian Chilan, Peter Cao, Vailin Choi, Mike Folk, Anne Jennings, Barbara Jones, Quincey Koziol, James Laird, Raymond Lu, John Mainzer, Matthew Needham, Pedro Nunes, Tammi O'Neill, Elena Pourmal, Binh-minh Ribler, Randy Ribler, Rishi Sinha, Kent Yang
And all those wonderful folks out there who contribute ideas, requests, bug reports, code, and support.
7. Who is supporting HDF?
- Organizations providing broad support
  - NASA, DOE, Boeing
- Agencies supporting R&D (2006)
  - NASA, NARA, DOE, NCSA, Agilent, Aberdeen Test Center, DD(X)
- Collaborators who make in-kind contributions
  - Cactus, PyTables, NeXUS, CGNS, many others
8. HDF Software Update
9. HDF4 update
10. Platforms to be dropped
- Operating systems
  - HPUX 11.00
  - Crays SV1 and TS IEEE
  - AIX 5.1 and 5.2
  - SGI IRIX64-6.5
  - Linux 2.4
  - Solaris 2.7, 2.8, 2.9
  - Windows 2000
  - Mac OS X 10.3
- Compilers
  - GNU C compilers older than 3.4 (Linux)
  - Intel 8.
  - PGI V. 5., 6.0
11. Platforms to be added
- Systems
  - Mac OS X 10.4 (Intel)
  - Solaris 2. on Intel
  - Cray XT3
  - Windows 64-bit (?)
  - Linux 2.6
  - HPUX 11.23
  - IBM Power 5
- Compilers
  - g95
  - PGI V. 6.1
  - Intel 9.
12. New features
- Configuration
  - Switched to the F77_FUNC macro for better Fortran support (no hard-coded compilers anymore!)
  - Support for shared libraries
- Library
  - No hard-coded limit on number of opened files
  - New APIs to control number of files opened by application
  - Fortran support for SZIP compression
13. Bug fixes
- Tools
  - Many improvements to the hdp, hrepack, hdiff and hdfimport utilities based on user feedback
- Library
  - Fixed a data corruption bug affecting several open SDSs with unlimited dimensions
  - Better handling of SDSs with duplicated names in SDgetdimscale and more
14. HDF5 update
15. No new releases!
- Focus on HDF5 release 1.8
- HDF5-1.8.0 Alpha 5 release is available from hdf.ncsa.uiuc.edu/HDF5/release/alpha/obtain518.html
16. Platforms to be dropped
- Operating systems
  - HPUX 11.00
  - Mac OS X 10.3
  - AIX 5.1 and 5.2
  - SGI IRIX64-6.5
  - Linux 2.4
  - Solaris 2.8 and 2.9
- Compilers
  - GNU C compilers older than 3.4 (Linux)
  - Intel 8.
  - PGI V. 5., 6.0
  - MPICH 1.2.5
http://www.hdfgroup.org/HDF5/release/alpha/obtain518.html
17. Platforms to be added
- Systems
  - Alpha Open VMS
  - Mac OS X 10.4 (Intel)
  - Solaris 2. on Intel (?)
  - Cray XT3
  - Windows 64-bit (32-bit binaries)
  - Linux 2.6
  - BG/L
- Compilers
  - g95
  - PGI V. 6.1
  - Intel 9.
  - MPICH 1.2.7
  - MPICH2
18. New Features in HDF5 1.8
19. HDF5 1.8 new library features
- Datatype and dataspace features
  - Serialized dataspaces and datatypes
  - Ability to create data type from text description
  - Integer to float conversions during I/O
  - Revised exception handling during type conversion
  - Compact storage for N-bit data types
  - Offset+size storage filter, saving space
  - Null dataspace: datasets with no elements
  - Data transformation filter
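The space-saving storage filters listed above rest on a simple idea: values that cluster in a narrow range can be stored as small offsets from their minimum, using only as many bits as the range needs. The sketch below illustrates that idea in plain Python for integer data. The function names are hypothetical, and this is not the HDF5 filter's actual on-disk encoding.

```python
def pack_offset_size(values):
    """Pack integers as (offset, bits_per_value, packed_int).

    Illustration only: each value is stored as (value - minimum) in the
    smallest fixed bit width that can hold the largest difference.
    """
    if not values:
        return 0, 0, 0
    offset = min(values)
    span = max(values) - offset
    bits = max(1, span.bit_length())      # at least one bit per value
    packed = 0
    for i, v in enumerate(values):
        packed |= (v - offset) << (i * bits)
    return offset, bits, packed


def unpack_offset_size(offset, bits, packed, count):
    """Reverse pack_offset_size for `count` values."""
    mask = (1 << bits) - 1
    return [((packed >> (i * bits)) & mask) + offset for i in range(count)]


readings = [1000, 1003, 1001, 1007, 1002]
offset, bits, packed = pack_offset_size(readings)
restored = unpack_offset_size(offset, bits, packed, len(readings))
print(offset, bits, restored == readings)   # 1000 3 True
```

Here five readings that would normally take 32 or 64 bits each fit in 3 bits each plus one stored offset.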
20. HDF5 1.8 new library features
- Group revisions
  - Creation order access
  - Compact groups: small groups take less space
  - Large group storage improvements
  - Intermediate group creation
21. HDF5 1.8 new library features
- Link improvements
  - External links: can refer to objects in another file
  - User-defined links: apps create their own kinds of links
- Attribute improvements
  - Storage improvements for large numbers of attributes
  - Iterate or look up by creation order
22. HDF5 1.8 new library features
- Support for Unicode UTF-8 character set
- Shared header info: duplicate header info shared, possibly saving space
- Metadata cache improvements: faster I/O on files with many objects
- Data transformation filter
- Stackable Virtual File Drivers
- Better UNIX/Linux portability
23. HDF5 1.8 new APIs
- New extendible error-handling API
- New APIs to copy objects between files quickly
- Dimension scale model and API
- HDFpacket API to read/write packets efficiently
24. HDF5 1.8 backward and forward compatibility
25. HDF5 1.8 vs. 1.6.5
- Differences between 1.8 and 1.6.5
  - Some file format changes
  - Several new routines added
  - Old APIs deprecated, to be removed in a later release
- Consequences
  - Applications requiring 1.8 format changes will write objects that the 1.6.5 library cannot read
  - To exploit 1.8 changes, apps need to be rewritten
26. Principle of maximum file format compatibility
- Unless instructed otherwise, the HDF5 library will write objects using the earliest version of the format possible for describing the information. This assures forward compatibility with older versions whenever possible: objects in new files can be read with old libraries if those objects are known to the old libraries.
27. Example: Datatype header message
- Compound datatype encoding
  - Version 1, used by 1.6.5 and earlier, encodes compound datatypes with explicit array fields
  - Version 2, used for 1.8.0, has a new encoding, reducing storage overhead for compound data
  - By default 1.8.0 writes compound data in a format compatible with 1.4.0 to 1.6.X libraries
  - But if the feature is requested, compound data created by 1.8.0 will not be readable by earlier versions
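The rule above (write the earliest format version that can describe the data, and move to a newer one only when a requested feature needs it) can be sketched as a small decision function. This is a conceptual illustration in plain Python, not HDF5 library code. The feature names and the two "library" constants are hypothetical stand-ins.

```python
# Hypothetical feature table: the lowest message version able to encode each feature.
MIN_VERSION_FOR_FEATURE = {
    "compound_basic": 1,     # describable with the older version 1 encoding
    "compound_compact": 2,   # requires the newer, smaller version 2 encoding
}


def choose_message_version(requested_features):
    """Return the earliest version that can describe every requested feature."""
    return max((MIN_VERSION_FOR_FEATURE[f] for f in requested_features), default=1)


def can_read(library_max_version, message_version):
    """A reader understands a message only up to the newest version it knows."""
    return message_version <= library_max_version


OLD_LIBRARY_MAX = 1   # stands in for a 1.6.x-era reader (assumption)
NEW_LIBRARY_MAX = 2   # stands in for a 1.8.0 reader (assumption)

default_version = choose_message_version(["compound_basic"])
compact_version = choose_message_version(["compound_basic", "compound_compact"])

print(default_version, can_read(OLD_LIBRARY_MAX, default_version))   # 1 True
print(compact_version, can_read(OLD_LIBRARY_MAX, compact_version))   # 2 False
print(can_read(NEW_LIBRARY_MAX, default_version))                    # True
```

The default path stays readable by the old reader. Only the explicit request for the newer encoding produces an object the old reader cannot handle, while the new reader handles both.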
28. HDF5 Forward Compatibility
- Format
  - Can old libraries access files made by new library?
  - Old library versions will read all objects in a file created by a newer library if objects are known to the old library
- API
  - Can old applications link with the new library?
  - Applications written to work with an older version of library will compile, link and run as expected with a newer version
29. HDF5 Backward Compatibility
- File Format
  - Can new library access files made by old library?
  - Newer version of the library will always read files created with an older version
- Library APIs
  - Can new applications link with the older libraries?
  - Applications written for the newer version will compile and link with the older library unless new features are used
30. HDF5 Compatibility information
- Backward and forward compatibility issues
  - http://hdfgroup.org/HDF5/faq/bkfwd-compat.html
- API changes from release to release
  - http://hdfgroup.org/HDF5/doc_1.8pre/doc/ADGuide/Changes.html
- File Format changes
  - http://hdfgroup.org/HDF5/doc/H5.format.html
31. Command line tools
32. New features for old tools
- h5dump
  - Dump data in binary format
- h5diff
  - Compare dataset regions
- Parallel h5diff (ph5diff)
  - Compare two files in MPI parallel environment
- h5repack
  - Efficient data copy using H5Gcopy()
  - Able to handle big datasets
33. New HDF5 Tools
- h5copy
  - Copies a group, dataset or named datatype from one location to another location
  - Copies within a file or across files
- h5check
  - Verifies an HDF5 file against the defined HDF5 File Format Specification
- h5stat
  - Reports statistics about a file and objects in a file
34. HDF Java Products
35. HDFView changes
- Quality improvements for HDF-java package
- Full documentation of hdf-java object package
- Test suite for hdf-java object package
- Support 64-bit Java on Linux and Solaris
- Many new features, including
- Change font size easily
- Grab and move image
- Create new table (compound dataset) from template
- Filter out fill value for image creation
- -geometry option for very high resolution displays
36. Future work for Java
- Update HDF5 JNI APIs for HDF5 1.8 release
- Release HDFView 2.4 with bug fixes/new features with HDF5 1.8 release
- New GUI features dealing with table, image and animation
- Writing capability for HDF5-SRB model
37. Website Development for HDF-EOS Tools Information Center
38. Website for HDF-EOS Tools
- THG now manages HDF-EOS web site
  - Registered domain names hdfeos.net/.org/.com
  - Re-implemented major topic areas
  - Re-designed interface
  - Registered with Google search
  - Will continue maintenance
- Phase two
  - Host mailing list
  - Support simple forum features
39. Website for HDF-EOS Tools
40. Other Activities of Interest
41. Performance R&D
42. HDF5 - PnetCDF performance comparison
uP Power 5
I/O performance of PnetCDF is comparable with
parallel HDF5 when the libraries are used in
similar manners.
43. PnetCDF4 - PnetCDF comparison
I/O performance of parallel NetCDF4 is comparable with PnetCDF, about 15% slower on average for the output of a ROMS history file.
44. Collective I/O improvements
- HDF5 supports collective IO for non-regular selections
- Collective IO for chunked storage is not trivial.
- Non-regular selection performance optimizations
- Added IO options to achieve good collective IO performance
- Added APIs for applications to participate in the optimization process
- See the poster
45. DOE Labs
Lawrence Livermore National Laboratory
Sandia National Laboratory
46. DOE ASC and Others
- Support HDF5 on major systems at Sandia and Lawrence Livermore National Laboratories
- R&D efforts underway
- File recovery after a crash
- Very fast write speed: goal is 300 MB/sec
- Read-while-writing capability
- Java library and HDFView improvements
Advanced Scientific Computing project
47. Flight test
48. Flight test: collect, then process
49. Boeing: HDF5 for flight test data
- Boeing 787 active archive
  - 10 TB per flight-test day
  - Must handle raw, real-time data
  - High speed ingest, by packet
  - Post-processing, by time-history
- Boeing High Level APIs
  - HDFpacket: released with HDF5 1.8
  - HDFtime_history: new, open version likely
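The two access patterns above differ in shape: data arrives packet by packet, but analysis wants one time series per measured parameter. The sketch below shows that regrouping in plain Python. The packet layout, parameter names and function name are assumptions for illustration, and no HDFpacket or HDFtime_history calls are used.

```python
from collections import defaultdict


def packets_to_time_histories(packets):
    """Regroup (timestamp, {parameter: value}) packets into per-parameter series."""
    histories = defaultdict(list)
    for timestamp, samples in packets:          # ingest order: one packet at a time
        for parameter, value in samples.items():
            histories[parameter].append((timestamp, value))
    # Sort each series by time in case packets arrived out of order.
    return {name: sorted(series) for name, series in histories.items()}


# Hypothetical packets; the second and third arrive out of time order.
packets = [
    (0.00, {"airspeed": 250.0, "altitude": 10000.0}),
    (0.02, {"altitude": 10010.0}),
    (0.01, {"airspeed": 251.5}),
]

histories = packets_to_time_histories(packets)
print(histories["airspeed"])   # [(0.0, 250.0), (0.01, 251.5)]
print(histories["altitude"])   # [(0.0, 10000.0), (0.02, 10010.0)]
```

At 10 TB per day this regrouping cannot be done in memory as shown here. The point is only that the write-side layout (packets) and the read-side layout (time histories) are different organizations of the same samples.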
50. Product data
STEP
51. Bioinformatics
Managing genomic data
52. C# HDF5 API for Agilent
53. Agilent C# project
- Why?
  - Heavy use of C# at Agilent
  - Compatibility with Matlab
  - Other interest in HDF5 at Agilent
- What?
  - Prototype API in C# for Windows XP
  - Basic functions to create, open, close, read, write
  - Limited datatypes, no partial I/O
- When?
  - March 2007
54. HDF5 Software
Fortran
C++
Java
C#
C API
HDF I/O Library
HDF File
55. NetCDF 4
56. NetCDF 4 project
- Enhanced NetCDF-4 Interface to HDF5
- Combine features of netCDF and HDF5
- Take advantage of their separate strengths
- Collaboration between NCSA, THG, Unidata
- Currently in Alpha Release
- Waiting for beta release
57. NetCDF-4 Architecture
netCDF-3 applications
netCDF-4 applications
HDF5 applications
netCDF files
netCDF-4 Library
netCDF-4 HDF5 files
HDF5 files
HDF5 Library
- Supports access to netCDF files and HDF5 files created through netCDF-4 interface
58. Archival formats
- Proposal to NOAA Scientific Data Stewardship program
- Will investigate use of OAIS Archive Information Package standard with HDF5
- PI: Ruth Duerr (NSIDC) and Kent Yang
OAIS: Open Archival Information System
59. Asymmetries between collecting and accessing data
- Huge streams of data collected
- To be accessed in little bits
61. Challenge: efficient remote access
- How do we efficiently find and access data from distributed repositories, when the data are big and complex?
- Storage Resource Broker (SRB)
  - Efficient access to HDF5 objects in repository
- OPeNDAP
  - Powerful protocol for remote querying and subsetting of scientific data
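Why subsetting matters for remote access can be shown with a small calculation: if a client asks for two rows of a large array, only those rows' bytes need to move. The sketch below uses an in-memory buffer as a stand-in for a remote file. It is plain Python, not SRB or OPeNDAP code, and the array layout, sizes and function names are assumptions.

```python
import io
import struct

ROWS, COLS = 1000, 4
ITEM = struct.Struct("<d")            # one little-endian 8-byte float per element


def build_store():
    """Create a row-major ROWS x COLS array of doubles in an in-memory 'file'."""
    buf = io.BytesIO()
    for r in range(ROWS):
        for c in range(COLS):
            buf.write(ITEM.pack(r * COLS + c))
    return buf


def read_rows(store, first_row, row_count):
    """Read only the bytes for the requested rows; return (rows, bytes_read)."""
    row_bytes = COLS * ITEM.size
    store.seek(first_row * row_bytes)
    raw = store.read(row_count * row_bytes)
    values = [v for (v,) in ITEM.iter_unpack(raw)]
    rows = [values[i:i + COLS] for i in range(0, len(values), COLS)]
    return rows, len(raw)


store = build_store()
subset, bytes_read = read_rows(store, first_row=10, row_count=2)
total_bytes = ROWS * COLS * ITEM.size

print(subset[0])                 # [40.0, 41.0, 42.0, 43.0]
print(bytes_read, total_bytes)   # 64 32000
```

Fetching the subset touches 64 bytes instead of 32,000. The same ratio applies whether the bytes cross a disk bus or a network.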
62. Example: Storage resource broker
- Storage Resource Broker repository for heterogeneous data collections
- Simplifies storage, query and access to massive amounts of scientific data
- Has data in HDF5, netCDF, other formats
63. Normal SRB configuration
client
HDF5 File (whole file or a sequence of bytes)
SRB Server
MCAT
64. OPeNDAP-HDF5 project
- OPeNDAP
  - Powerful protocol for remote querying and subsetting of scientific data
  - Replaces direct file access with remote query and access
  - Widely used in Earth Sciences
65. OPeNDAP HDF5 Project
- A NASA ROSES NRA project
- Tasks
  - HDF5-DAP2 server (now a prototype)
  - HDF5-DAP4 server
  - DAP4 to HDF5 conversion utility
  - Investigate integrated DAP-aware HDF5 library
66. SQL Server and HDF5 with Microsoft
67. SQL Server and HDF5
- Microsoft dream environment for scientists
  - Combine data management, computing
- SQL Server 2005 solution
  - Combine RDBMS with scientific analysis tools, together in one integrated system.
  - HDF5 and other formats manage scientific objects
68. HDF5 in SQL server
OLAP and Data Mining
Libraries (MATLAB,)
Web Services (XML, REST, RSS)
Visualization
Reporting
.NET Languages with Language Integrated Query
Entity Framework (EDM, eSQL, O-R mapping)
HDF5 EDM model
SQL Server
69. Thank you all, and thank you NASA!
70. Acknowledgement
- This report is based upon work supported in part
by a Cooperative Agreement with NASA under NASA
NNG05GC60A. Any opinions, findings, and
conclusions or recommendations expressed in this
material are those of the author(s) and do not
necessarily reflect the views of the National
Aeronautics and Space Administration.
71. Questions/comments?
72. Information Sources
- HDF website
  - http://hdfgroup.org/
- HDF5 Information Center
  - http://hdfgroup.org/HDF5/
- HDF Helpdesk
  - hdfhelp@hdfgroup.org
- HDF users mailing list
  - hdfnews@ncsa.uiuc.edu (coming soon: news@hdfgroup.org)