Introduction to HDF5 - PowerPoint PPT Presentation

1
Introduction to HDF5
  • HDF and HDF-EOS Workshop XII
  • October 15, 2008

2
Topics Covered
  • Introduce HDF5
  • Describe HDF5 Data and Programming Models
  • Walk Through Example Code

3
For More Information
  • All workshop slides will be available from http://hdfeos.org/workshops/ws12/workshop_twelve.php

4
What is HDF5?
  • HDF: Hierarchical Data Format
  • Data model, library and file format for managing
    data
  • Tools for accessing data in the HDF5 format

5
Brief History of HDF
  • 1987: At NCSA (University of Illinois), a task force formed to create an architecture-independent format and library
    • AEHOO (All Encompassing Hierarchical Object Oriented format)
    • Became HDF
  • Early 1990s: NASA adopted HDF for the Earth Observing System project
  • 1996: DOE's ASC (Advanced Simulation and Computing) Project began collaborating with the HDF group (NCSA) to create "Big HDF" (the increase in computing power of DOE systems at LLNL, LANL, and Sandia National Labs required bigger, more complex data files)
    • Big HDF became HDF5
  • 1998: HDF5 was released with support from the National Labs, NASA, and NCSA
  • 2006: The HDF Group spun off from the University of Illinois as a non-profit corporation

6
Why HDF5?
  • In one sentence ...

7
Answering big questions
8
involves big data
9
varied data
[Image: LCI Tutorial. Thanks to Mark Miller, LLNL]
10
and complex relationships
[Figure: related data items labeled SNP Score, Contig Summaries, Discrepancies, Contig Qualities, Coverage Depth, Trace, Reads, Aligned bases, Read quality, Contig, Percent match]
11
on big computers
12
How do we
  • Describe our data?
  • Read it? Store it? Find it? Share it? Mine it?
  • Move it into, out of, and between computers and
    repositories?
  • Achieve storage and I/O efficiency?
  • Give applications and tools easy access to our data?

13
Solution: HDF5!
  • Can store all kinds of data in a variety of ways
  • Runs on most systems
  • Lots of tools to access data
  • Emphasis on standards (HDF-EOS, CGNS)
  • Library and format emphasis on I/O efficiency and
    storage

14
Structure of HDF5 Library (layers, top to bottom)
  • Applications
  • Object API (C, F90, C++, Java)
  • Library internals
  • Virtual file I/O
  • File or other storage
15
HDF Tools
  • HDFView and Java products
  • Command-line utilities (h5dump, h5ls, h5cc, h5diff, h5repack)

16
HDF5 Application Domains (layered diagram)
  • Communities: HDF-EOS, CGNS, ASC
  • HDF5 Data Model API
  • Virtual File Layer (I/O drivers): Stdio, Custom, Split Files, MPI I/O
  • Storage (HDF5 format): file; user-defined device; split metadata and raw data files; file on parallel file system
17
Lots of Layers in HDF5!
Ogres are like onions.
Shrek: HDF5 "Monster"?
Just like Shrek, once you get to know HDF5 you
will really like it!!
18
The HDF5 Format
19
An HDF5 file is a container into which you can put your data objects.

    lat | lon | temp
    ----+-----+-----
     12 |  23 |  3.1
     15 |  24 |  4.2
     17 |  21 |  3.6
20
HDF5 Structures for Organizing Objects
21
HDF5 Data Model
  • Primary Objects
    • Groups
    • Datasets
  • Additional ways to organize and annotate data
    • Attributes
    • Storage and access properties

Everything else is built from these parts.
22
HDF5 Dataset
23
Dataspaces
  • Two roles:
    • Dataspace contains spatial info about a dataset stored in a file
      • Rank and dimensions
      • Permanent part of dataset definition
    • Partial I/O: dataspace describes the application's data buffer and the data elements participating in I/O

[Figure: rank = 2, dimensions = 4x6; rank = 1, dimension = 10]
24
Write from memory to disk
[Figure: data moving from memory to disk]
25
Partial I/O
Move just part of a dataset
[Figure: disk and memory; (a) slab from a 2D array to the corner of a smaller 2D array]
The number of elements selected in each must be the same.
26
Datatypes (array elements)
  • Datatype: how to interpret a data element
  • Permanent part of the dataset definition
  • Two classes: atomic and compound

27
Datatypes
  • HDF5 atomic types include:
    • integer, float
    • user-definable (e.g., 13-bit integer)
    • variable length types (e.g., strings)
    • references to objects/dataset regions
    • enumeration - names mapped to integers
  • HDF5 compound types
    • Comparable to C structs (records)
    • Members can be atomic or compound types

28
HDF5 dataset: array of records
[Figure: a dataset of dimensionality 5 x 3; its datatype is a record whose members are int8, int4, int16, and a 2x3x2 array of float32]
29
Properties
  • Properties are characteristics of HDF5 objects
    that can be modified
  • Default properties handle most needs
  • By changing properties, one can take advantage of the more powerful features in HDF5

30
Special Storage Properties
31
Attributes (optional)
  • Attribute: data of the form "name = value", attached to an object
  • Operations similar to dataset operations, but
    • Not extensible
    • No compression or partial I/O
  • Can be overwritten, deleted, added during the life of a dataset

32
HDF5 Dataset (again)
33
Groups
  • A mechanism for organizing collections
  • Every file starts with a root group
  • Similar to UNIX directories
  • Can have attributes

[Figure: group hierarchy starting at the root group "/", with members labeled A, B, C, k, l, m]
34
Path to HDF5 Object in a File
[Figure: object tree; root "/" contains x and foo, foo contains temp and bar, bar contains temp]
Paths: / (root), /x, /foo, /foo/temp, /foo/bar/temp
35
Shared Objects
[Figure: root "/" with groups A, B, C whose members point to shared objects; paths shown: /A/P, /B/R, /C/P]
36
Questions So Far?
37
Useful Tools For New Users
  • h5dump: tool to dump or display contents of HDF5 files
  • h5cc, h5c++, h5fc: scripts to compile applications
  • HDFView: Java browser to view HDF4 and HDF5 files
38
H5dump Command-line Utility To View HDF5 File
h5dump [--header] [-a <names>] [-d <names>] [-g <names>] [-l <names>] [-t <names>] [-p] <file>
  • --header: Display header only; no data is displayed.
  • -a <names>: Display the specified attribute(s).
  • -d <names>: Display the specified dataset(s).
  • -g <names>: Display the specified group(s) and all the members.
  • -l <names>: Display the value(s) of the specified soft link(s).
  • -t <names>: Display the specified named datatype(s).
  • -p: Display properties.
<names> is one or more appropriate object names.
39
Example of h5dump Output
    HDF5 "dset.h5" {
    GROUP "/" {
       DATASET "dset" {
          DATATYPE  H5T_STD_I32BE
          DATASPACE  SIMPLE { ( 4, 6 ) / ( 4, 6 ) }
          DATA {
             1, 2, 3, 4, 5, 6,
             7, 8, 9, 10, 11, 12,
             13, 14, 15, 16, 17, 18,
             19, 20, 21, 22, 23, 24
          }
       }
    }
    }
40
HDF5 Compile Scripts
  • h5cc: HDF5 C compiler command
  • h5fc: HDF5 F90 compiler command
  • h5c++: HDF5 C++ compiler command
  • To compile:
    • h5cc h5prog.c
    • h5fc h5prog.f90

41
Compile option -show
  • -show displays the compiler commands and
    options without executing them

h5cc -show Sample_c.c
gcc -I/home/packages/hdf5_1.6.6/Linux_2.6/include -UH5_DEBUG_API -DNDEBUG
  -I/home/packages/szip/static/encoder/Linux2.6-gcc/include
  -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64
  -D_POSIX_SOURCE -D_BSD_SOURCE -std=c99 -Wno-long-long -O
  -fomit-frame-pointer -finline-functions -c Sample_c.c
gcc -std=c99 -Wno-long-long -O -fomit-frame-pointer -finline-functions
  -L/home/packages/szip/static/encoder/Linux2.6-gcc/lib Sample_c.o
  -L/home/packages/hdf5_1.6.6/Linux_2.6/lib
  /home/packages/hdf5_1.6.6/Linux_2.6/lib/libhdf5_hl.a
  /home/packages/hdf5_1.6.6/Linux_2.6/lib/libhdf5.a -lsz -lz -lm
  -Wl,-rpath -Wl,/home/packages/hdf5_1.6.6/Linux_2.6/lib
42
Browsing HDF5 Files with HDFView
43
HDFView
[Screenshot callouts: Structure of File; Contents of Dataset]
44
HDFView File Menu
46
Simple HDF5 File in HDFView
[Screenshot callouts: right-click and select Open; right-click and select Show Properties]
47
Simple HDF5 File in HDFView
48
HDF-EOS5 File in HDFView
49
[Screenshot callout: right-click and select Open As]
50
  • What you can't see with slides:
    • Picture displayed instantly
    • File size is 906,229,176


51
Introduction to HDF5 Programming Model and APIs

52
Operations Supported by the API
  • Create objects (groups, datasets, attributes, complex datatypes, ...)
  • Assign storage and I/O properties to objects
  • Perform complex subsetting during read/write
  • Use a variety of I/O devices (parallel, remote, etc.)
  • Transform data during I/O
  • Make inquiries on file and object structure, content, properties

53
General Programming Paradigm
  • Properties of an object are optionally defined
    • Creation properties
    • Access property lists
  • Object is opened or created
  • Object is accessed, possibly many times
  • Object is closed

54
Order of Operations
  • An order is imposed on operations by argument dependencies
    • For example, a file must be opened before a dataset, because the dataset open call requires a file handle as an argument.
  • Objects can be closed in any order.

55
The General HDF5 API
  • Currently C, Fortran 90, Java, and C++ bindings
  • C routines begin with the prefix H5*, where * is a character corresponding to the type of object the function acts on

Example functions:
  • H5D: Dataset interface, e.g., H5Dread
  • H5F: File interface, e.g., H5Fopen
  • H5S: dataSpace interface, e.g., H5Sclose
56
HDF5 Defined Types
For portability, the HDF5 library has its own defined types:
  • hid_t: object identifiers (native integer)
  • hsize_t: size used for dimensions (unsigned long or unsigned long long)
  • hssize_t: for specifying coordinates and sometimes for dimensions (signed long or signed long long)
  • herr_t: function return value
  • hvl_t: variable length datatype
For C, include hdf5.h in your HDF5 application.
57
The HDF5 API
  • For flexibility, the API is extensive
  • 300 functions
  • This can be daunting but there is hope
  • A few functions can do a lot
  • Start simple
  • Build up knowledge as more features are needed

[Image: Victorinox Swiss Army CyberTool 34]
58
Basic Functions
  • H5Fcreate (H5Fopen): create (open) File
  • H5Screate_simple: create dataSpace
  • H5Dcreate (H5Dopen): create (open) Dataset
  • H5Dread, H5Dwrite: access Dataset
  • H5Dclose: close Dataset
  • H5Sclose: close dataSpace
  • H5Fclose: close File

59
Other Common Functions
  • DataSpaces: H5Sselect_hyperslab (Partial I/O), H5Sselect_elements (Partial I/O)
  • Groups: H5Gcreate, H5Gopen, H5Gclose
  • Attributes: H5Acreate, H5Aopen_name, H5Aclose, H5Aread, H5Awrite
  • Property lists: H5Pcreate, H5Pclose, H5Pset_chunk, H5Pset_deflate

60
High Level APIs
  • Included along with the HDF5 library
  • Simplify steps for creating, writing, and reading
    objects
  • Do not entirely wrap the HDF5 library

61
Example HDF5 Code
62
Steps to Create a File
  • Decide on special properties the file should have
    • Creation properties, like size of user block
    • Access properties, such as metadata cache size
    • Use default properties (H5P_DEFAULT)
  • Create property lists, if necessary
  • Create the file
  • Close the file and the property lists, as needed

63
Code Create a File

    hid_t  file_id;
    herr_t status;

    file_id = H5Fcreate("file.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    status  = H5Fclose(file_id);

[Figure: resulting file containing only / (root)]
Note: Return codes are not checked for errors in the code samples.
64
Dataset Components
65
Steps to Create a Dataset
  • 1. Define dataset characteristics
    • Dataspace: 4x6
    • Datatype: integer
    • Properties, if needed, or use H5P_DEFAULT
  • 2. Decide where to put it
    • Obtain location ID
      • Group ID puts it in a group
      • File ID puts it in the root group
  • 3. Create dataset in file
  • 4. Close everything

[Figure: / (root); label: 3-D array of floats]
66
HDF5 Pre-defined Datatype Identifiers
  • HDF5 defines a set of datatype identifiers per HDF5 session.
  • For example:

    C Type   HDF5 File Type                    HDF5 Memory Type
    int      H5T_STD_I32BE, H5T_STD_I32LE      H5T_NATIVE_INT
    float    H5T_IEEE_F32BE, H5T_IEEE_F32LE    H5T_NATIVE_FLOAT
    double   H5T_IEEE_F64BE, H5T_IEEE_F64LE    H5T_NATIVE_DOUBLE

  • The value of a datatype identifier is NOT fixed (it may differ from one session to the next).

67
Pre-defined File Datatype Identifiers
Examples:
  • H5T_IEEE_F64LE: eight-byte, little-endian, IEEE floating-point
  • H5T_STD_I32LE: four-byte, little-endian, signed two's complement integer
Name structure: H5T_<architecture>_<programming type>, e.g., IEEE (architecture) and F64LE (programming type)
NOTE: This is what you see in the file. The name is the same everywhere and explicitly defines a datatype.
STD: an architecture with a semi-standard type, like 2's complement integer or unsigned integer
68
Pre-defined Native Datatypes
Examples of predefined native types in C:
  • H5T_NATIVE_INT (int)
  • H5T_NATIVE_FLOAT (float)
  • H5T_NATIVE_UINT (unsigned int)
  • H5T_NATIVE_LONG (long)
  • H5T_NATIVE_CHAR (char)
NOTE: Memory types. Different for each machine. Used for reading/writing.
69
Dataset Creation Property List
Dataset creation property list: information on how to organize data in storage.
[Figure: storage layouts: H5P_DEFAULT (contiguous), chunked, chunked and compressed]
70
Code Create a Dataset
    hid_t   file_id, dataset_id, dataspace_id;
    hsize_t dims[2];
    herr_t  status;

    file_id = H5Fcreate("file.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

    /* Create a dataspace */
    dims[0] = 4;
    dims[1] = 6;
    dataspace_id = H5Screate_simple(2, dims, NULL);

    /* Create a dataset. HDF5 1.6 signature: with 1.8+ either build with
       -DH5_USE_16_API or call H5Dcreate2(file_id, "A", H5T_STD_I32BE,
       dataspace_id, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT). */
    dataset_id = H5Dcreate(file_id, "A", H5T_STD_I32BE, dataspace_id, H5P_DEFAULT);

    /* Terminate access to dataset, dataspace, file */
    status = H5Dclose(dataset_id);
    status = H5Sclose(dataspace_id);
    status = H5Fclose(file_id);
71
Example Code - H5Dwrite
    status = H5Dwrite(dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_data);

  • dataset_id: dataset identifier from H5Dcreate or H5Dopen
  • H5T_NATIVE_INT: memory datatype

72
Example Code H5Dwrite
    status = H5Dwrite(dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_data);

  • First H5S_ALL: memory dataspace
  • Second H5S_ALL: file dataspace
  • H5S_ALL selects the entire dataspace
  • H5P_DEFAULT: data transfer property list (MPI I/O, transformations, ...)
73
Partial I/O
  • Memory dataspace and file dataspace (disk): replace the two H5S_ALL arguments
  • Get a dataspace: H5Screate_simple, H5Dget_space
  • Modify a dataspace: H5Sselect_hyperslab, H5Sselect_elements
74
Example Code H5Dread
    status = H5Dread(dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_rdata);

75
High Level APIs HDF5 Lite (H5LT)
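    #include "H5LT.h"

    file_id = H5Fcreate("file.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    /* The type describes both the new dataset and the memory buffer,
       so it must be the native type of data */
    status = H5LTmake_dataset(file_id, "A", 2, dims, H5T_NATIVE_INT, data);
    status = H5Fclose(file_id);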

76
High Level APIs
  • HDF5 Lite
  • HDF5 Image
  • HDF5 Table
  • HDF5 Dimension Scales
  • HDF5 Packet Table

77
Example: Create a Group
[Figure: file.h5 with root group "/" containing a 4x6 array of integers]
78
Steps to Create a Group
  • Decide where to put it: root group
    • Obtain location ID
  • Decide name: B
  • Create group in file
  • (Eventually) close the group.

79
Code Create a Group
    hid_t file_id, group_id;
    ...
    /* Open file.h5 */
    file_id = H5Fopen("file.h5", H5F_ACC_RDWR, H5P_DEFAULT);

    /* Create group "/B" in file. The third argument is a size hint for the
       number of bytes to store names of objects (0 = default).
       HDF5 1.6 signature; 1.8+ provides
       H5Gcreate2(file_id, "B", H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT). */
    group_id = H5Gcreate(file_id, "B", 0);

    /* Close group and file. */
    status = H5Gclose(group_id);
    status = H5Fclose(file_id);
80
Thank you!
  • This work was supported by the Cooperative
    Agreement with the National Aeronautics and Space
    Administration (NASA) under NASA grant NNX06AC83A
    and NNX08A077A. Any opinions, findings,
    conclusions or recommendations expressed in this
    material are those of the author(s) and do not
    necessarily reflect the views of NASA.