Title: Mike Folk
1 HDF Update
- Mike Folk
- National Center for Supercomputing Applications
- HDF and HDF-EOS Workshop VIII
- October 27, 2004
2 Topics
- HDF Team and Supporters
- HDF software update
- Other Activities of Interest
3 The HDF Team
Xuan Bai, Frank Baker, Peter Cao, Vailin Choi, Mike Folk, Barbara Jones, Quincey Koziol, James Laird, Raymond Lu,
John Mainzer, Robert McGrath, Pedro Nunes, Elena Pourmal, Binh-minh Ribler, Eric Shapiro, Rishi Sinha, Kent Yang
And all those wonderful folks out there who
contribute ideas, requests, bug reports, code,
and support.
4 Organization
HDF Project
Basic library development
Support, doc, QA, maintenance
Tools and Java
Parallel I/O, Grid, big machines
- Staff breakdown
- User support, documentation
- QA, maintenance, testing
- Software development
- System administration
- Management
- See Thursday tutorial on HDF Software Process
5 Who is supporting HDF?
- Organizations and communities with institutional and financial commitment to HDF
- NCSA, NASA, State of IL, DOE, Boeing
- Agencies supporting R&D
- NCSA, NASA, NARA, DOE, NSF, ONR
- Collaborators who make in-kind contributions
- Cactus, PyTables, NeXUS, CGNS, many others
6 HDF Software Update
7 HDF software milestones in FY 2004
[Timeline figure, Dec 2003 through Oct 2004. Milestones shown: HDF 4.2r0; HDF5 1.6.2; HDF5 Java 2.0; HDF5 High Level; Flexible parallel HDF5 (Alpha); HDF5 1.6.3]
8 HDF4.2 Release 0, Dec. 2003
- Bug fixes
- New features
- Support for new platforms and compilers
9 HDF4.2 Release 0: Bug fixes
- Support for reading NetCDF 3.5 files with multiple unlimited dimensions
- Multiple bug fixes and improvements to HDF4 dumper utility hdp
- Improvements to HDF-to-GIF converter hdf2gif
10 HDF4.2r0: New Features
- Tools (per DAAC and Instrument Team requests)
- hdfimport converts float/integer data to SDS/raster
- Replaces fp2hdf
- hdiff compares two HDF4 files
- Revision of earlier hdfdiff tool
- hrepack makes a copy of an HDF4 file
- Optionally rewrites objects with compression, chunking, etc.
- h4cc, h4fc, h4redeploy
- Helper scripts to facilitate compilation and installation
11 HDF4.2r0: New Features
- Szip compression
- Fast compression method
- Available on all platforms except Crays
- NCSA distributes Szip source and binaries
- HDF Library binaries come with SZIP enabled
- SZIP documentation available from http://hdf.ncsa.uiuc.edu/SZIP
12 HDF4.2r0: New Configuration
- Addressing key needs
- Porting to new platforms
- New versions of JPEG and ZLIB libraries
- Optional SZIP compression
- Many features were hard-coded that could instead be selected at configuration time
13 HDF4.2r0: New Compilers and Platforms
- New compilers
- Intel C and Fortran
- Portland Group Compilers (C only for now)
- New OS
- Mac OSX
- RedHat 8/9
- AIX 5.1 64-bit
- OSF1
- Linux 64 (SuSE and RH8) (JPL machines)
- Altix (Aura Team)
14 HDF5 1.6.2, Feb. 2004
- New functions
- better user control over open/close objects
- Bug fixes
- Parallel improvements
- h5pcc, h5pfc helper scripts for parallel compiles
- Configure improvements
- Improved parallel performance
- Speed improvements of data conversion routines
- Some SZIP improvements
15 HDF5 1.6.2
- Support for new compilers and platforms
- IBM Fortran on MacOS X
- Support for gcc 3.3.4
- Linux 64 (SuSE and RH) at JPL
- Altix (Aura team) including parallel C and Fortran libraries
- Investigated SX-6 (NEC) port
16 HDF5 1.6.3, Oct. 2004
- Windows
- Improvements to the build, test, and installation
- New API routines
- H5Fget_filesize: returns the size of an opened file
- H5Fget_name: returns the name of the file, given an object ID
- Some F90 and C routines added
17 HDF5 1.6.3
- Utilities
- h5repack utility (new)
- Regenerates an HDF5 file from another HDF5 file
- Optionally applies filters, chunking to new file
- h5dump utility improvements
- Prints new info, such as dataset filters, storage layout, fill value info
18 Szip in HDF5 1.6.3
- HDF5 can now include SZIP compression with or without Szip's encoder
- Encoder required to create SZIP-compressed files
- Encoder not required to read SZIP-compressed files
- Info on Szip and Szip licensing
- http://hdf.ncsa.uiuc.edu/doc_resource/SZIP/
19 HDF5 1.6.3: New platforms and compilers
- PGI Fortran for Linux64 (x86-64)
- Absoft F95 for Linux 2.4 (32-bit)
- IBM XL Fortran and Absoft F95 for Mac OS X
20 HDF Java Products 2.0, March 2004
- Tested with HDF5-1.6.2
- Platforms
- Windows (98/NT/2000/XP)
- Solaris
- Linux
- AIX
- IRIX 6.5
- Mac OSX
- OSF1
- http://hdf.ncsa.uiuc.edu/hdf-java-html/
21 Modular HDFView
Modular HDFView: an improved HDFView in which the I/O and GUI components are replaceable modules.
Application (HDFView)
- Replaceable modules
- File I/O (file/data format)
- Tree view (show file structure)
- Table view (spreadsheet-like)
- Text view (view/edit text dataset)
- Image view (view/process image)
- Palette view (view/change palette)
- Metadata (attribute) view
- http://hdf.ncsa.uiuc.edu/hdf-java-html/hdfview/
Interfaces: I/O, TreeView, TableView, etc.
Default Implementation
User Implementation
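The replaceable-module idea on this slide can be sketched as an interface with a default implementation that a user implementation may replace. This is only a minimal illustration in Python, not HDFView's actual Java code; the names `TableView`, `DefaultTableView`, `CsvTableView`, and `Application` are hypothetical.

```python
from abc import ABC, abstractmethod


class TableView(ABC):
    """Hypothetical interface for a spreadsheet-like view module."""

    @abstractmethod
    def render(self, rows):
        """Return a text rendering of a list of rows."""


class DefaultTableView(TableView):
    """Default implementation: tab-separated columns."""

    def render(self, rows):
        return "\n".join("\t".join(str(v) for v in row) for row in rows)


class CsvTableView(TableView):
    """User implementation that replaces the default module."""

    def render(self, rows):
        return "\n".join(",".join(str(v) for v in row) for row in rows)


class Application:
    """The application depends only on the interface, not on a concrete view."""

    def __init__(self, table_view=None):
        # Fall back to the default module when the user supplies none.
        self.table_view = table_view or DefaultTableView()

    def show(self, rows):
        return self.table_view.render(rows)


data = [[1, 2.5], [2, 3.5]]
default_output = Application().show(data)
custom_output = Application(CsvTableView()).show(data)
```

Because the application only calls `render`, swapping the view requires no change to the application itself, which is the property the slide attributes to the modular design.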
22 HDFView Web Browser Plug-in
- Goal: click-and-view HDF files remotely and locally from popular web browsers.
- See poster.
23 Parallel HDF5 in 2004
- A few performance improvements
- MPICH/MPE instrumentation feature added
- Gives users performance analysis tools for their MPI programs
- Flexible parallel HDF5 programming model
- More flexible model for parallel HDF5
- Other options currently under investigation
24 Parallel HDF5 developments
- New parallel platforms supported
- Solaris 2.8 (32- and 64-bit)
- OSF 5.1
- Cray T3E, SV1, T90
- HPUX 11.0
- FreeBSD
25 Flexible Parallel HDF5 (FPHDF5)
- Problem
- Parallel computation requires a consistent view of file metadata across all processes
- Parallel HDF5 does this by requiring all operations that modify metadata to be executed collectively
- This is clumsy at best.
- E.g. suppose each of 1,000 processes needs to create its own dataset.
- Then there must be 1,000 collective creations, each requiring the participation of all processes.
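The cost of the collective requirement can be made concrete with simple counting. The sketch below is not HDF5 code; it only counts how many process participations are needed when every dataset creation is collective, compared with a hypothetical model in which each process creates its own dataset independently. Both function names are made up for illustration.

```python
def collective_participations(num_processes, datasets_per_process=1):
    """Every creation is collective, so all processes take part in each one."""
    total_creations = num_processes * datasets_per_process
    return total_creations * num_processes


def independent_participations(num_processes, datasets_per_process=1):
    """Hypothetical independent model: only the creating process takes part."""
    return num_processes * datasets_per_process


# The slide's example: 1,000 processes, one dataset each.
collective = collective_participations(1000)
independent = independent_participations(1000)
```

With 1,000 processes the collective model needs 1,000 x 1,000 participations, while the independent model needs 1,000.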
26 Flexible Parallel HDF5 (FPHDF5)
- Approach
- Allow individual processes to modify the file metadata without explicit, application-level synchronization between processes.
- Use a "Set Aside Process" (SAP) to set up a shared metadata cache, allowing individual processes to read or write lock individual pieces of metadata as required.
- Easier to program, simpler to understand.
- http://hdf.ncsa.uiuc.edu/Parallel_HDF/PHDF5/FPH5/
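The shared-cache approach can be sketched with threads standing in for MPI processes. This is a simplified illustration under stated assumptions, not the FPHDF5 implementation: the `MetadataCache` class and `create_dataset` helper are hypothetical, and a single exclusive lock per metadata item stands in for separate read and write locks.

```python
import threading
from collections import defaultdict


class MetadataCache:
    """Stand-in for the shared cache managed by the set-aside process."""

    def __init__(self):
        self._entries = {}
        self._locks = defaultdict(threading.Lock)  # one lock per metadata item
        self._table_lock = threading.Lock()        # guards the lock table itself

    def _lock_for(self, key):
        with self._table_lock:
            return self._locks[key]

    def write(self, key, value):
        with self._lock_for(key):  # lock only this piece of metadata
            self._entries[key] = value

    def read(self, key):
        with self._lock_for(key):
            return self._entries.get(key)

    def keys(self):
        with self._table_lock:
            return sorted(self._entries)


def create_dataset(cache, rank):
    # Each worker registers its own dataset; no other worker takes part.
    cache.write(f"/dataset_{rank}", {"owner": rank, "shape": (100,)})


cache = MetadataCache()
workers = [threading.Thread(target=create_dataset, args=(cache, r)) for r in range(8)]
for w in workers:
    w.start()
for w in workers:
    w.join()

created = cache.keys()
```

Each worker touches only its own entry, so no barrier or collective call appears in the application code. Every request still passes through the one cache object.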
27 Flexible Parallel HDF5 (FPHDF5)
- New problem
- The cache is managed by a single process (SAP), and the metadata is accessed frequently.
- The SAP becomes a bottleneck, affecting performance and scalability.
- Currently investigating other solutions.
28 Other Activities of Interest
29 DOE/ASCI
ASCI provides the integrating simulation and
modeling capabilities and technologies needed
for future design assessment and certification
of nuclear weapons and their components
- Massively parallel computing and I/O
- Complex data models and big data
- HDF5 a standard format for ASCI apps
Advanced Simulation and Computing Program
30 Boeing: HDF5 for real-time flight test data
- Needed for flight test data systems
- Must handle raw, real-time data
- Implemented API to read/write data
- Based on HDF5 table API
- Challenge: variable-length data
- Possible Boeing-wide standard
- Potential applications to many domains
- See poster
31 NCASSR: Indexing and viewing tables
- Opportunities arising from Boeing work
- Make test-data features widely available
- Common data model and API for tabular data in HDF5
- Indexing for post-processing
- Viewing capabilities
- Tasks
- Identify apps to study and gather requirements
- Develop data model and API for tabular data
- Include general purpose indexing structures and API
- Implement prototype API and viewer
National Center for Advanced Secure Systems Research
32 National Archives and Records Administration (NARA)
- Investigate HDF5 as format for records archiving
- Focus on geospatial data
- Images (e.g. elevation models, aerial photography)
- Features (e.g. boundaries, roads, rivers)
- Results so far
- HDF5 data model handles all data types
- Feature (vector) data present access and size challenges
- Work is leading to good performance lessons
- See poster about study of vector data
33 SciDAC/PMODEL: Arithmetic Data Transform
- Apply algebraic operations to a dataset during read/write.
- Initial goal
- Transform individual elements (e.g., x * 1.8 + 32).
- During reads, the transform applies to the result in memory. During writes, the data stored in the file is changed.
- Implemented in HDF5 v1.7, to be released in v1.8
- Future
- Transformations on attributes or multiple datasets (e.g. (A + B) / 2.0)
- http://hdf.ncsa.uiuc.edu/PMODELS/datatransform/
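The read/write semantics described on this slide can be sketched in plain Python. This is not the HDF5 API; `TransformedDataset` is a hypothetical in-memory stand-in for a dataset in a file, and the example assumes the slide's element-wise expression is the Celsius-to-Fahrenheit conversion x * 1.8 + 32.

```python
class TransformedDataset:
    """Hypothetical in-memory stand-in for a dataset stored in a file."""

    def __init__(self, stored):
        self.stored = list(stored)  # values as they sit "in the file"

    def read(self, transform=None):
        """Apply the transform to the result in memory; stored data is untouched."""
        if transform is None:
            return list(self.stored)
        return [transform(x) for x in self.stored]

    def write(self, values, transform=None):
        """Apply the transform on the way in, so the stored data changes."""
        if transform is None:
            self.stored = list(values)
        else:
            self.stored = [transform(x) for x in values]


def celsius_to_fahrenheit(x):
    # Assumed reading of the slide's example expression.
    return x * 1.8 + 32


ds = TransformedDataset([0.0, 100.0])
read_result = ds.read(celsius_to_fahrenheit)  # transformed copy in memory
stored_after_read = list(ds.stored)           # unchanged by the read

ds.write([0.0, 100.0], celsius_to_fahrenheit)
stored_after_write = list(ds.stored)          # transformed values are stored
```

The asymmetry is the point: a transform on read leaves the stored values alone, while a transform on write determines what is stored.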
34 Weather Research and Forecasting (WRF) Model
- WRF: NCAR community standard model
- HDF5 I/O module for NCAR's WRF
- HDF5-WRF parallel I/O studies
- Improved performance for computations with large I/O
- Sequential HDF5-WRF studies
- Compression can save disk space
- See the poster
- And see http://hdf.ncsa.uiuc.edu/apps/WRF-ROMS
35 netCDF-HDF Project
- Enhanced NetCDF-4 Interface to HDF5
- Combine features of netCDF and HDF5
- Take advantage of their separate strengths
- Collaboration between NCSA and Unidata
- See poster: "Merging the netCDF and HDF5 libraries to achieve gains in performance and interoperability"
36 OPeNDAP, netCDF, HDF5
- OPeNDAP
- A system for transmitting data across the Internet
- Supports selection of data using constraint expressions
- Can translate data from one format to another
- NetCDF and HDF5
- Formats of major interest to the OPeNDAP community
- All three are in heavy use in the earth sciences
- So the question is ...
37 Are the planets finally aligned?
To harmonize OPeNDAP, netCDF, and HDF5?
[Figure labels: HDF5, netCDF, OPeNDAP]
38 OPeNDAP/netCDF/HDF5 Harmonization
- Opportunity
- Unidata is creating netCDF-4
- Existing OPeNDAP work with netCDF and HDF5
- OPeNDAP project working on a new spec (4.0)
- John Caron working on new java-netCDF library (2.2)
- Creates a "common data model" which is more-or-less a union of the 3 models.
- But there are important differences
- Different ecological niches
- Some very different object types
- So a union of all the models is unlikely
39 OPeNDAP/netCDF/HDF5 Harmonization
- Goal: map between the three models, and possibly tweak the models so that they harmonize better.
- Tackle certain important differences
- OPeNDAP Sequences
- Hard to represent in the netCDF API
- But seems like they might work in HDF5.
- HDF5 attributes
- Hard to represent in the DAP.
- Also perhaps devise a formal mapping between the
three models
40 Thank you
Acknowledgements: This report is based upon work
supported in part by a Cooperative Agreement with
NASA under NASA grants NAG 5-2040 and NAG
NCCS-599. Any opinions, findings, and
conclusions or recommendations expressed in this
material are those of the author(s) and do not
necessarily reflect the views of the National
Aeronautics and Space Administration. Other
support provided by NCSA and other sponsors and
agencies (http://hdf.ncsa.uiuc.edu/acknowledge.html).
Made on location in Champaign, Illinois. To
the best of our knowledge, no animals were abused
in the making of these slides.
41 Questions/comments?
42 Information Sources
- HDF website
- http://hdf.ncsa.uiuc.edu/
- HDF5 Information Center
- http://hdf.ncsa.uiuc.edu/HDF5/
- HDF Helpdesk
- hdfhelp_at_ncsa.uiuc.edu
- HDF users mailing list
- hdfnews_at_ncsa.uiuc.edu