Introduction to NetCDF4

Slides: 46
Learn more at: http://hdfeos.org

1
Introduction to NetCDF4
  • MuQun Yang
  • The HDF Group

2
Notes
  • Requires basic knowledge of HDF5 and netCDF3
  • Covers general NetCDF4 concepts
  • - Several new features and their performance
  • Covers some NetCDF4 APIs but won't review all new APIs
  • Is not a netCDF3 tutorial

3
Contents
  • History review
  • Overview of NetCDF4 features, builds, etc.
  • Performance issues
  • Suggestions for users

4
History Review
  • Funded by NASA ESTO AIST Program
  • Joint project between Unidata and HDF Group
  • Used HDF5 as the storage layer of NetCDF

5
NetCDF-4/HDF5 Goals
  • Combine desirable characteristics of netCDF and
    HDF5, while taking advantage of their separate
    strengths
  • - Widespread use and simplicity of netCDF
  • - Generality and performance of HDF5
  • Preserve format and API compatibility for netCDF
    users
  • Demonstrate benefits of combination in advanced
    Earth science modeling efforts

(From Russ Rew et al.'s talk at the VII HDF and HDF-EOS workshop)
6
NetCDF-4 Architecture
(From Russ Rew et al.'s talk at the VII HDF and HDF-EOS workshop)
7
(No Transcript)
8
Contents
  • History review
  • Overview of NetCDF4 features, builds, etc.
  • Performance issues
  • Suggestions for users

9
Current Status
  • http://www.unidata.ucar.edu/software/netcdf/netcdf-4/
  • 4.0 beta 1, based on HDF5 1.8 beta 1, released in April 2007
  • 4.0 beta 2 release is coming soon

10
Compilers, platforms, and language support
  • Platforms
  • - Linux, IBM AIX, Sun OS, HP-UX, OSF1, IRIX, Cygwin
  • Programming languages
  • - C/C++ and Fortran
  • Compilers
  • - Vendor compilers on the supported platforms
  • Watch for snapshots
  • - http://www.unidata.ucar.edu/software/netcdf/builds/snapshot/netcdf-4

11
Configuration
  • Only NetCDF3 will be built if you just type ./configure
  • Before building NetCDF4, one must
  • - install HDF5 1.8 beta 1 or later (note: parallel HDF5 needs a separate build)
  • - install the zlib library if using data compression
  • To build the sequential version
  • - ./configure --enable-netcdf-4 --with-hdf5=/HDF5path --with-zlib=/zlibpath
  • To build the parallel version
  • - ./configure --enable-netcdf-4 --enable-parallel --disable-shared --with-hdf5=/parallelHDF5path --with-zlib=/zlibpath
  • Parallel NetCDF4 needs more work. It has been tested on IBM AIX.

12
API Changes
  • Existing APIs
  • - Essentially no differences, but with new flags
  • - NetCDF3: nc_create(FILE_NAME, NC_NOCLOBBER, &ncid)
  • - NetCDF4: nc_create(FILE_NAME, NC_NETCDF4, &ncid)
  • New APIs are added for new features, such as
  • - nc_def_var_deflate(ncid, varid, shuffle, deflate, deflate_level)
  • Hereafter, blue color in APIs indicates an output parameter

13
Overview of NetCDF4 new features
  • Data types
  • - Compound data type
  • - Variable length type
  • Groups
  • Multiple unlimited dimensions
  • Compression
  • Parallel I/O

14
A compound datatype example
types:
  compound wind_vector_t {
    float eastward ;
    float northward ;
  }
dimensions:
  lat = 18 ;
  lon = 36 ;
  pres = 15 ;
  time = 4 ;
variables:
  wind_vector_t gwind(time, pres, lat, lon) ;
    gwind:long_name = "geostrophic wind vector" ;
    gwind:standard_name = "geostrophic_wind_vector" ;
data:
  gwind = {1, -2.5}, {-1, 2}, {20, 10}, {1.5, 1.5}, ... ;
15
Variable length type
Simple example: ragged array
types:
  float(*) row_of_floats ;
dimensions:
  m = 50 ;
variables:
  row_of_floats ragged_array(m) ;
16
An example: variable length and compound datatypes
#include <netcdf.h>

/* FILE_NAME, DIM_NAME, and DIM_LEN are assumed to be defined elsewhere. */
struct sea_sounding {
    int sounding_no;
    nc_vlen_t temp_vl;
} data[DIM_LEN];
int ncid, dimid, varid;
nc_type temp_typeid, sounding_typeid;

/* 1. Create a netcdf-4 file. */
nc_create(FILE_NAME, NC_NETCDF4, &ncid);

/* 2. Create the vlen type, with a float base type. */
nc_def_vlen(ncid, "temp_vlen", NC_FLOAT, &temp_typeid);

/* 3. Create the compound type to hold a sea sounding. */
nc_def_compound(ncid, sizeof(struct sea_sounding), "sea_sounding",
                &sounding_typeid);
nc_insert_compound(ncid, sounding_typeid, "sounding_no",
                   NC_COMPOUND_OFFSET(struct sea_sounding, sounding_no),
                   NC_INT);
nc_insert_compound(ncid, sounding_typeid, "temp_vl",
                   NC_COMPOUND_OFFSET(struct sea_sounding, temp_vl),
                   temp_typeid);

/* 4. Define a dimension, and a 1D var of sea sounding compound type. */
nc_def_dim(ncid, DIM_NAME, DIM_LEN, &dimid);
nc_def_var(ncid, "fun_soundings", sounding_typeid, 1, &dimid, &varid);

/* 5. Write our array of sounding data to the file, all at once. */
nc_put_var(ncid, varid, data);

/* 6. Close the file. */
nc_close(ncid);
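The compound-type calls above pass the byte offset of each struct member. The sketch below uses only standard C to show what those offsets look like for a struct shaped like sea_sounding. It assumes that NC_COMPOUND_OFFSET behaves like the standard offsetof macro and that nc_vlen_t is a length plus a pointer; vlen_standin_t, field_info, and describe_sea_sounding are hypothetical names introduced only for this illustration and are not part of the netCDF API.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for netCDF's nc_vlen_t (assumed layout: a length plus a pointer). */
typedef struct {
    size_t len; /* number of elements in this variable-length entry */
    void *p;    /* pointer to the elements */
} vlen_standin_t;

struct sea_sounding {
    int sounding_no;
    vlen_standin_t temp_vl;
};

/* One member of a compound type: name, byte offset, and byte size. */
struct field_info {
    const char *name;
    size_t offset;
    size_t size;
};

/* Fills out[0..1] with the member layout of struct sea_sounding and
 * returns the number of members described. */
static int describe_sea_sounding(struct field_info out[2])
{
    assert(out != NULL);
    out[0].name = "sounding_no";
    out[0].offset = offsetof(struct sea_sounding, sounding_no);
    out[0].size = sizeof(int);
    out[1].name = "temp_vl";
    out[1].offset = offsetof(struct sea_sounding, temp_vl);
    out[1].size = sizeof(vlen_standin_t);
    return 2;
}
```

Because the compiler may insert padding between members, the offset of temp_vl is generally not sizeof(int); taking offsets from the struct itself rather than hard-coding them avoids that mistake.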
17
Group
  • Use of Groups is optional, with backward
    compatibility maintained by putting everything in
    the top-level unnamed Group.
  • Unlike HDF5, netCDF-4 requires that Groups form a
    strict hierarchy.
  • Potential uses for Groups include
  • - Factoring out common information
  • - Containers for data within regions, ensembles
  • - Organizing a large number of variables
  • - Providing name spaces for multiple uses of same names for dimensions, variables, attributes
  • - Modeling large hierarchies

18
Group APIs
  • APIs for creating groups (define APIs)
  • - nc_def_grp(parent_group_id, group_name, &group_id)
  • Examples
  • - nc_def_grp(ncid, HENRY_VII, &henry_vii_id)
  • - nc_def_grp(henry_vii_id, MARGARET, &margaret_id)
  • APIs for inquiring information from a group (inquiry APIs)
  • - Number of child groups: nc_inq_grps(group_id, &num_grps, NULL)
  • - Child group ID list: nc_inq_grps(group_id, NULL, group_id_list)
  • - Child group name: nc_inq_grpname(group_id_list[0], children_group_name)

19
Multiple Unlimited Dimension APIs
  • APIs for defining multiple unlimited dimensions
  • - Old API with the same flag: nc_def_dim(ncid, dimension_name, NC_UNLIMITED, int *idp)
  • Examples
  • - nc_def_dim(ncid, dimname_1, NC_UNLIMITED, &dimid[0])
  • - nc_def_dim(ncid, dimname_2, NC_UNLIMITED, &dimid[1])
  • APIs for inquiring multiple unlimited dimensions
  • - Old API: nc_inq_unlimdim(ncid, int *idp)
  • - New API: nc_inq_unlimdims(ncid, int *nunlimdims_in, int *unlimdimid)
  • How to use the new API
  • 1) First obtain the number of unlimited dimensions
  • - nc_inq_unlimdims(ncid, &nunlimdims, NULL)
  • 2) Then obtain the list of unlimited dimension IDs
  • - nc_inq_unlimdims(ncid, &nunlimdims, unlimdimid)

20
Compression
  • Deflate now
  • Scaleoffset, N-bit, and maybe szip in the future
  • Only need to add one routine
  • - nc_def_var_deflate(int netcdf_id, int variable_id, int shuffle, int deflate, int deflate_level)

21
Compression example code
  • ----- Data writing -----
  • 1. Define variable
  • - nc_def_var(ncid, VAR_BYTE_NAME, NC_BYTE, 2, dimids, &byte_varid)
  • 2. Set deflate compression
  • - nc_def_var_deflate(ncid, byte_varid, 0, 1, DEFLATE_LEVEL_3)
  • 3. Write the data
  • - nc_put_var_schar(ncid, byte_varid, (signed char *)byte_out)
  • ----- Data reading -----
  • - nc_get_var_schar(ncid, byte_varid, (signed char *)byte_in)
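
The third argument of nc_def_var_deflate is the shuffle flag, which the example above sets to 0. As a rough illustration of what a byte-shuffle step does before deflate, the sketch below regroups the bytes of fixed-size elements so that byte 0 of every element comes first, then byte 1, and so on. It is a minimal standard-C illustration of the general technique, not the HDF5 filter itself; shuffle_bytes and unshuffle_bytes are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Byte-shuffle: gather byte 0 of every element, then byte 1, and so on.
 * `in` and `out` must be distinct buffers of n_elems * elem_size bytes. */
static void shuffle_bytes(const unsigned char *in, unsigned char *out,
                          size_t n_elems, size_t elem_size)
{
    assert(in != out);
    for (size_t i = 0; i < n_elems; i++)
        for (size_t j = 0; j < elem_size; j++)
            out[j * n_elems + i] = in[i * elem_size + j];
}

/* Inverse of shuffle_bytes: restores the original element-by-element order. */
static void unshuffle_bytes(const unsigned char *in, unsigned char *out,
                            size_t n_elems, size_t elem_size)
{
    assert(in != out);
    for (size_t i = 0; i < n_elems; i++)
        for (size_t j = 0; j < elem_size; j++)
            out[i * elem_size + j] = in[j * n_elems + i];
}
```

For multi-byte numeric data whose high-order bytes change slowly, this regrouping tends to produce longer runs of similar bytes, which is why it is often paired with deflate. For single-byte data such as NC_BYTE it changes nothing.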

22
Parallel IO
  • Supports either collective or independent access
  • Supports MPI-IO or MPI-POSIX I/O via parallel HDF5
  • Special functions are used to create/open a netCDF file in parallel.

23
New APIs to do parallel IO
  • nc_create_par
  • - nc_create_par(const char *path, int mode, MPI_Comm comm, MPI_Info info, int *ncidp)
  • - mode must be NC_NETCDF4|NC_MPIIO or NC_NETCDF4|NC_MPIPOSIX
  • nc_var_par_access
  • - nc_var_par_access(int ncid, int var_id, int data_access)
  • - data_access can be either NC_COLLECTIVE or NC_INDEPENDENT
  • nc_open_par
  • - nc_open_par(const char *path, int mode, MPI_Comm comm, MPI_Info info, int *ncidp)
  • - mode must be either NC_MPIIO or NC_MPIPOSIX

24
Parallel IO Programming Model
  • Data writing
  • /* 1. Initialize MPI. */
  • MPI_Init(&argc, &argv)
  • /* 2. Create a parallel netcdf-4 file. */
  • nc_create_par(FILE, NC_NETCDF4|NC_MPIIO, comm, info, &ncid)
  • nc_var_par_access(ncid, v1id, NC_COLLECTIVE)
  • /* 3. Write data. */
  • nc_put_vara_int(ncid, v1id, start, count, data)
  • /* 4. Close the file. */
  • nc_close(ncid)
  • /* 5. Shut down MPI. */
  • MPI_Finalize()
  • Data reading
  • Use nc_open_par instead of nc_create_par
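
The write step above passes start and count to nc_put_vara_int, but the slide does not show how each process chooses them. One common approach, sketched here in standard C under the assumption of a simple one-dimensional block decomposition, gives each MPI rank a contiguous range of the dimension. block_range is a hypothetical helper, not a netCDF or MPI function.

```c
#include <assert.h>
#include <stddef.h>

/* Splits n elements across nprocs ranks as evenly as possible.
 * The first (n % nprocs) ranks receive one extra element.
 * On return, *start is the first index owned by `rank` and *count is
 * the number of elements it owns. */
static void block_range(size_t n, int nprocs, int rank,
                        size_t *start, size_t *count)
{
    assert(nprocs > 0 && rank >= 0 && rank < nprocs);
    size_t base = n / (size_t)nprocs;
    size_t extra = n % (size_t)nprocs;
    size_t r = (size_t)rank;
    *count = base + (r < extra ? 1 : 0);
    *start = r * base + (r < extra ? r : extra);
}
```

With 10 elements and 4 ranks, this yields ranges starting at 0, 3, 6, and 8 with counts 3, 3, 2, and 2, so the ranges tile the dimension without gaps or overlap.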

25
Other features
  • Datatypes
  • - More atomic datatypes: unsigned integers (1, 2, 4, and 8 bytes)
  • - Strings replace character arrays
  • - Enum and opaque types
  • - User-defined datatypes
  • Fletcher32 checksum filter
  • UTF-8 support
  • Reader-makes-right conversion
  • Uses HDF5 dimension scales

26
Contents
  • History review
  • Overview of NetCDF4 features, builds, etc.
  • Performance issues
  • Suggestions for users

27
NetCDF4 Data Compression Size
28
NetCDF4 Data Compression: Data Write Time
29
NetCDF4 Data Compression: Data Read Time
30
WRF Output in HDF5: File Size
31
WRF Output in HDF5: Data Writing Time
32
EUMETNET OPERA Report in 2006
They evaluated the following data formats:
  • FM 92 GRIB, NORDRAD, Universal Format,
  • netCDF, HDF4, HDF5,
  • XML and Scalable Vector Graphics (SVG), and GeoTIFF

Their Recommendation
  • Based on the results of the detailed evaluation,
    HDF5 is recommended for consideration as an
    official European standard format for weather
    radar data and products.

Why?
  • Compared to other formats, HDF5's compression algorithm (ZLIB) is more efficient
  • A file format with efficient compression and
    platform independence is essential

PyTables
"One of the beauties of PyTables is that it supports compression on tables and arrays"
33
Evaluation of Parallel NetCDF4 Performance
  • Regional Oceanographic Modeling System (ROMS)
  • - History file writer in parallel NetCDF4 (PnetCDF4)
  • - History file writer in parallel NetCDF from Argonne (PnetCDF)
  • Data
  • - 60 1D-4D double-precision float and integer arrays

34
PnetCDF4 and PnetCDF performance comparison
[Figure: bandwidth (MB/s, 0 to 160) versus number of processors (0 to 144) for PNetCDF collective and NetCDF4 collective]
  • Fixed problem size: 995 MB
  • Performance of PnetCDF4 is close to PnetCDF

35
ROMS Output with Parallel NetCDF4
  • I/O performance improves as the file size increases.
  • It can provide decent I/O performance for big problem sizes.

36
Chunking
  • Using chunking wisely
  • Review chunking tips for HDF5
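
One consideration when choosing chunk shapes is how many chunks a typical read or write touches, since chunked storage is generally accessed a whole chunk at a time. The sketch below counts the chunks intersected by a rectangular selection; it is a plain-C illustration under that assumption, and chunks_touched and chunks_touched_1d are hypothetical helpers rather than netCDF or HDF5 calls.

```c
#include <assert.h>
#include <stddef.h>

/* Number of chunks intersected along one dimension by the index range
 * [start, start + count). */
static size_t chunks_touched_1d(size_t start, size_t count, size_t chunk_len)
{
    assert(chunk_len > 0);
    if (count == 0)
        return 0;
    size_t first = start / chunk_len;
    size_t last = (start + count - 1) / chunk_len;
    return last - first + 1;
}

/* Number of chunks intersected by an ndims-dimensional selection described
 * by start[] and count[], for a dataset chunked with shape chunk[]. */
static size_t chunks_touched(int ndims, const size_t *start,
                             const size_t *count, const size_t *chunk)
{
    size_t total = 1;
    for (int d = 0; d < ndims; d++)
        total *= chunks_touched_1d(start[d], count[d], chunk[d]);
    return total;
}
```

For example, reading one full row of a 1000 x 1000 variable touches 10 chunks with 100 x 100 chunks, 1 chunk with 1 x 1000 chunks, and 1000 chunks with 1000 x 1 chunks, which is why the chunk shape should follow the dominant access pattern.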

37
Contents
  • History review
  • Overview of NetCDF4 features, builds, etc.
  • Performance issues
  • Suggestions for users

38
NetCDF Classic Model

39
Using the NetCDF Classic Model
  • NetCDF-4 files can be created with the CLASSIC_MODEL flag. This enforces the rules of the classic netCDF data model on this file.
  • - nc_create(FILE_NAME, NC_NETCDF4|NC_CLASSIC_MODEL, &ncid)
  • Once a classic model file, always a classic model file. This sticks with the file and there is no way to change it within the netCDF API.
  • Classic model files don't use any elements of the expansion of the data model in netCDF-4. They don't have groups, user-defined types, multiple unlimited dimensions, or the new atomic types.
  • Since they conform to the classic model, they can be read and understood by any existing netCDF software (as soon as that software upgrades to netCDF-4 and HDF5 1.8.0).
  • NetCDF-4 features which don't affect the data model are still available: compression, parallel I/O.

40
HDF5 Features not in current NetCDF4.0
  • No Scaleoffset, N-bit, or szip filters (planned for the 4.1 release)
  • No support for user-defined filters
  • Can only read HDF5 files that have dimension scales
  • Can only write data in chunked storage
  • No Fortran 90 APIs
  • No corresponding APIs for optimizations
  • - cache, MPI-IO

41
NetCDF 4.1 Plan
  • http://www.unidata.ucar.edu/software/netcdf/netcdf-4/req_4_1.html

42
NetCDF4 or HDF5: which one should I use?
Evaluate the following:
  • Familiarity
  • Features
  • Performance
  • Compatibility
  • Release/feature lags

43
Based on stability of NetCDF4
Priority | Recommendation
High performance, many advanced HDF5 features | HDF5 definitely
Care about performance, possibly need to use many new advanced features | HDF5 maybe
NetCDF4: avoid transition cost from NetCDF to HDF5 | NetCDF4 maybe
1. Just need one or two HDF5 features for intensive NetCDF applications: NetCDF4/CLASSIC_MODEL (compression, parallel I/O); 2. Existing NetCDF software or applications that don't care about performance | NetCDF4 definitely
44
More NetCDF4 information
  • Release and snapshot: http://www.unidata.ucar.edu/software/netcdf/netcdf-4/
  • Tutorial in 2007 NetCDF workshop
  • - http://www.unidata.ucar.edu/software/netcdf/workshops/2007/
  • Paper in 2006 AMS annual meeting
  • - http://www.unidata.ucar.edu/software/netcdf/papers/2006-ams.pdf

45
Acknowledgements
  • Thanks to Russ Rew and Ed Hartnett from Unidata for generously allowing me to use their slides and sharing their compression performance results in this workshop
  • Some content describing new features of NetCDF4 is copied from the 2007 Unidata NetCDF workshop
  • The radar NetCDF data compression performance results are provided by Ed Hartnett at Unidata