1
Better data management through NetCDF
Jaison Kurian
CAOS IISc
2
Introduction
There are many different formats for geoscience data:
HDF, CDF, NetCDF, Binary, GRIB, ASCII, and many more.
Is NetCDF the best one?
3
Let us start with a few examples.
The story of a rainfall dataset.
4
Features of NetCDF
1. Self-describing: all data and metadata are encapsulated in one file.
2. Machine independent: works on almost all platforms.
3. Direct access: subsets of large datasets can be read efficiently.
4. Appendable: data can be quickly added to old files.
5. Sharable: one writing process and several reading processes can occur at once.
6. Easy to learn.
7. Freely available, well documented and well supported.
8. Supported by a variety of data analysis, processing, and visualization tools.
5
NetCDF conventions
There are different conventions; COARDS is the most widely accepted one.
COARDS (Cooperative Ocean/Atmosphere Research Data Service)
Filename extension: .cdf or .nc? Use .nc.
6
Commands: ''ncdump'' and ''ncgen''
ncdump can be used to get a CDL (network Common Data form Language) file.
A CDL file is the ASCII representation of a NetCDF file.
ncdump and the CDL file provide an easy way to look at the structure and contents of a NetCDF file.
    user@machine$ ncdump -c ncfile.nc | less
OR
    user@machine$ ncdump -c ncfile.nc > ncfile.cdl
    user@machine$ man ncdump
7
ncgen can be used
1. to check the syntax of an input CDL file.
2. to make a Fortran/C program that writes the NetCDF file described in the input CDL file.
3. to make a binary NetCDF file from a given CDL file.
We will see the details later.
8
coards
Components of a NetCDF file are
1. Header part
   1.1 Dimensions
   1.2 Variables
   1.3 Attributes
2. Data part
9
Rules
1. A dimension/variable name should start with a letter and can contain digits and '_'.
   ''temp'' is a valid name; ''1temp'' is not.
2. Names are case sensitive:
   ''temp'', ''Temp'' and ''TEMP'' are three different names.
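The two rules above can be checked mechanically. The following is a minimal sketch in Python that encodes only the rule as stated on this slide (a letter first, then letters, digits or '_'); the helper name is_valid_name is hypothetical and is not part of any NetCDF tool, and the real library may accept a wider set of names.

```python
import re

# Rule from the slide: start with a letter, then letters, digits or '_'.
# (Assumption: this is the slide's rule, not the library's full rule.)
_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_name(name: str) -> bool:
    """Return True if `name` follows the naming rule stated on the slide."""
    return _NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("temp", "1temp", "Temp", "TEMP", "wind_speed"):
        print(candidate, is_valid_name(candidate))
    # Names are case sensitive, so a set keeps all three spellings apart.
    print(len({"temp", "Temp", "TEMP"}))
```

Running the script prints one line per candidate name, which makes it easy to try names before putting them into a CDL file.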
10
dimensions
1.1 Dimensions
Maximum number of dimensions for a file is 512 (netcdf-3.5.1)
    dimensions:
        time = 31 ;
        height = 1 ;
        latitude = 122 ;
        longitude = 182 ;
        ...
Each entry gives the dimension name (e.g. time) followed by its length/size (e.g. 31).
Maximum number of dimensions for a variable is 4
11
dimensions
Unlimited dimension?
    dimensions:
        time = UNLIMITED ; // (31 currently)
        height = 1 ;
        latitude = 122 ;
        longitude = 182 ;
        ...
A NetCDF dataset can have at most one unlimited dimension, but need not have any.
The NetCDF model does not cater for variables with several changeable dimension sizes.
Variables should have rectangular shapes.
12
variables
1.2 Variables
Maximum number of variables for a file is 4096
(netcdf-3.5.1)
13
variables
Variable data types
    Type           Fortran             NetCDF       Bits
    byte           BYTE                NF_BYTE      8
    char           CHARACTER           NF_CHAR      8
    short          INTEGER*2           NF_SHORT     16
    long           INTEGER*4           NF_LONG      32
    float (real)   REAL*4              NF_FLOAT     32
                                       NF_REAL      32
    double         DOUBLE PRECISION    NF_DOUBLE    64
                   REAL*8                           64
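The bit widths in the table make it easy to estimate how much disc space the data part of a variable needs. The sketch below is only an illustration: the function name variable_bytes and the TYPE_BITS dictionary are made up for this example, and the estimate ignores the header and any padding the library may add.

```python
from math import prod

# Bits per value, copied from the data type table above.
TYPE_BITS = {
    "byte": 8,
    "char": 8,
    "short": 16,
    "long": 32,
    "float": 32,
    "double": 64,
}


def variable_bytes(nc_type, dim_lengths):
    """Rough size in bytes of one variable's data (header not included)."""
    return prod(dim_lengths) * TYPE_BITS[nc_type] // 8


if __name__ == "__main__":
    # wind_speed(time, height, latitude, longitude) with the sizes used in
    # the dimension examples of this talk: 31 x 1 x 122 x 182.
    shape = (31, 1, 122, 182)
    print("short :", variable_bytes("short", shape), "bytes")
    print("float :", variable_bytes("float", shape), "bytes")
```

Storing the same field as short instead of float halves the data size, which is the motivation for the ''packing'' attributes discussed later.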
14
variables
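Which dimension varies fastest?
CDL / C (the last dimension varies fastest):
    short wind_speed(time, height, latitude, longitude)
    time is the slowest varying dim, longitude is the fastest varying dim
Fortran (the first dimension varies fastest):
    INTEGER*2 wind_speed(longitude, latitude, height, time)
    time is the slowest varying dim, longitude is the fastest varying dim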
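To see that the CDL/C declaration and the Fortran declaration describe the same layout, one can compute the position of an element in the flat data stream for both orderings. This is a sketch under stated assumptions: c_offset and fortran_offset are hypothetical helper names, C indices are taken as 0-based and Fortran indices as 1-based.

```python
def c_offset(index, shape):
    """Row-major (CDL / C) offset: the last index varies fastest."""
    offset = 0
    for i, n in zip(index, shape):
        offset = offset * n + i
    return offset


def fortran_offset(index, shape):
    """Column-major offset for 1-based indices: the first index varies fastest."""
    offset = 0
    for i, n in zip(reversed(index), reversed(shape)):
        offset = offset * n + (i - 1)
    return offset


if __name__ == "__main__":
    c_shape = (31, 1, 122, 182)   # (time, height, latitude, longitude)
    f_shape = (182, 122, 1, 31)   # (longitude, latitude, height, time)
    # One step in longitude moves one element; one step in time moves
    # a whole height x latitude x longitude block.
    print(c_offset((0, 0, 0, 1), c_shape))
    print(c_offset((1, 0, 0, 0), c_shape))
    # The same element addressed from C and from Fortran.
    print(c_offset((3, 0, 5, 7), c_shape), fortran_offset((8, 6, 1, 4), f_shape))
```

The last line prints the same offset twice, which is why a Fortran program declares the dimensions in the reverse of the order shown by ncdump.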
15
[Figure: data cube with axes Lon (X), Lat (Y), Depth / Height and Time; the time axis runs from T0 (1900-01-01 00:00:00) through Tn to t_end]
16
variables
Coordinate / Independent Variables (with the same name as dims)
    dimensions:
        time = 31 ;
        height = 1 ;
        latitude = 122 ;
        longitude = 182 ;
    variables:
        float time(time) ;
            time:units = "hours since 1900-1-1 0:0:0" ;
            time:time_of_day = "1200" ;
        float height(height) ;
            height:units = "meters" ;
            height:positive = "up" ;
        float latitude(latitude) ;
            latitude:units = "degrees_N" ;
        float longitude(longitude) ;
            longitude:units = "degrees_E" ;
        short wind_speed(time, height, latitude, longitude) ;
17
variables
Coordinate variables have no special meaning to the NetCDF library, but each one
typically defines a physical coordinate corresponding to its dimension for the
software using the library.
Software packages that make use of coordinate variables commonly assume they are
numeric vectors and strictly monotonic:
    all values are different,
    either increasing or decreasing,
    and with no missing/fill values.
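Before handing a file to an analysis tool it can help to test a coordinate vector against these assumptions. A minimal sketch follows; the function name is_strictly_monotonic and the optional fill_value argument are illustrative choices, not part of the NetCDF library.

```python
def is_strictly_monotonic(values, fill_value=None):
    """True if values strictly increase or strictly decrease, with no fill value."""
    values = list(values)
    if fill_value is not None and fill_value in values:
        return False
    pairs = list(zip(values, values[1:]))
    increasing = all(a < b for a, b in pairs)
    decreasing = all(a > b for a, b in pairs)
    return increasing or decreasing


if __name__ == "__main__":
    latitude = [30.25, 29.75, 29.25, 28.75]    # decreasing: acceptable
    longitude = [29.75, 30.25, 30.75, 31.25]   # increasing: acceptable
    repeated = [10.0, 20.0, 20.0, 30.0]        # repeated value: not acceptable
    for axis in (latitude, longitude, repeated):
        print(axis, is_strictly_monotonic(axis))
```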
18
variables
Primary / Dependent Variables
    dimensions:
        time = 31 ;
        depth = 1 ;
        latitude = 122 ;
        longitude = 182 ;
    variables:
        ...
        short wind_speed(time, depth, latitude, longitude) ;
            wind_speed:long_name = "wind speed" ;
            ...
        short zonal_wind_speed(time, depth, latitude, longitude) ;
            zonal_wind_speed:long_name = "zonal wind speed" ;
            ...
19
attributes
1.3 Attributes
Variable attributes: provide information about a particular variable.
    short wind_speed(time, depth, latitude, longitude) ;
        wind_speed:long_name = "wind speed" ;
        wind_speed:missing_value = 32767s ;
Each attribute line has the form  variable name : attr. name = attr. data ;
The attr. data can be a character string ("wind speed") or a numeric value (32767s).
20
attributes
Character Variable Attributes
    short wind_speed(time, depth, latitude, longitude) ;
        wind_speed:long_name = "wind speed" ;   // Title
        wind_speed:units = "m/s" ;              // OR "ms-1"
long_name and units are recognized by tools like Ferret.
We can add any other attributes if needed, but these will not be recognized by
any tools. Example:
        wind_speed:var_desc = "scalar wind speed" ;
        wind_speed:dataset = "quikscat_01_2001.nc" ;
        wind_speed:level_desc = "Surface" ;
        wind_speed:statistic = "3 day Mean" ;
        wind_speed:parent_stat = "Satellite Observation" ;
        wind_speed:history = "no processing" ;
21
attributes
Character Variable Attributes
    float latitude(latitude) ;
        latitude:long_name = "Latitude in Degrees" ;
        latitude:units = "degrees_N" ;
        latitude:point_spacing = "even" ;    // performance improvement
    float longitude(longitude) ;
        longitude:long_name = "Longitude in Degrees" ;
        longitude:units = "degrees_E" ;
        longitude:modulo = " " ;
        longitude:point_spacing = "even" ;
Accepted spellings of the units:
    degrees_E / degrees_east / degree_E / degree_east
    degrees_N / degrees_north / degree_N / degree_north
22
attributes
Character Variable Attributes
For the ocean:
    float depth(depth) ;
        depth:long_name = "Depth wrt sea surface" ;
        depth:units = "meters" ;
        depth:positive = "down" ;
For the atmosphere:
    float height(height) ;
        height:long_name = "Height wrt Ground" ;
        height:units = "meters" ;
        height:positive = "up" ;
23
attributes
Character Variable Attributes
    float time(time) ;
        time:long_name = "Time" ;
        time:units = "hours since 1900-1-1 0:0:0" ;
        time:time_of_day = "1200" ;
        time:calendar = "JULIAN" ;    // OR calendar_type
Recommended time units are seconds, minutes, hours and days.
Months and years are not of equal length.
calendar (tool specific)
    Calendar                  Days per year   Notes
    GREGORIAN or STANDARD     365.2425        default calendar
    JULIAN                    365.25          with leap years
    NOLEAP or COMMON_YEAR     365             no leap years
    360_DAY                   360             each month is 30 days
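A time value only becomes a date once it is combined with the units string. The sketch below decodes "hours since 1900-1-1" values with Python's standard datetime module; the helper name decode_hours_since is made up for this example, and the approach is only valid for the GREGORIAN/STANDARD calendar on modern dates (it does not handle JULIAN, NOLEAP or 360_DAY).

```python
from datetime import datetime, timedelta


def decode_hours_since(hours, reference=datetime(1900, 1, 1)):
    """Convert 'hours since <reference>' to a date (standard calendar only)."""
    return reference + timedelta(hours=hours)


if __name__ == "__main__":
    # 24 hours after the reference is the next day.
    print(decode_hours_since(24.0))
    # First time value of the data example shown later in the talk.
    print(decode_hours_since(902891.9))
```

Tools such as Ferret do this conversion internally, which is why the units attribute has to follow the "unit since date" pattern.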
24
attributes
Character Variable Attributes
Climatological time axis:
    float time(time) ;
        time:long_name = "Climatological Time" ;
        time:units = "hours since 0000-1-1 0:0:0" ;
        time:modulo = " " ;
25
attributes
Numeric Variable Attributes
    short wind_speed(time, depth, latitude, longitude) ;
        wind_speed:long_name = "wind speed" ;
        ...
        wind_speed:valid_min = 0.f ;
        wind_speed:valid_max = 60.f ;
    OR
        wind_speed:valid_range = 0.f, 60.f ;
The numeric ''type'' of the attribute should be the same as that of the variable.
26
attributes
Numeric Variable Attributes
    short wind_speed(time, depth, latitude, longitude) ;
        wind_speed:long_name = "wind speed" ;
        ...
        wind_speed:scale_factor = 0.01f ;
        wind_speed:add_offset = 0.f ;
scale_factor and add_offset together offer ''packing'' of data.
When a tool ''reads'' packed data: first multiply by scale_factor, then add add_offset.
While ''packing'' data: first subtract add_offset, then divide by scale_factor.
scale_factor and add_offset should be of the type of the unpacked data (float or double).
''Packed'' data is typically of type byte or short.
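The packing arithmetic is short enough to write out. The following sketch uses the scale_factor (0.01) and add_offset (0.0) from the wind_speed example; the function names pack and unpack are illustrative, and rounding to the nearest integer when packing is an assumption of this example rather than something the slide prescribes.

```python
def pack(value, scale_factor, add_offset):
    """Packing: subtract add_offset, divide by scale_factor, round to an integer."""
    return round((value - add_offset) / scale_factor)


def unpack(packed, scale_factor, add_offset):
    """Reading: multiply by scale_factor, then add add_offset."""
    return packed * scale_factor + add_offset


if __name__ == "__main__":
    scale_factor, add_offset = 0.01, 0.0
    stored = pack(12.34, scale_factor, add_offset)
    print("stored as short:", stored)
    print("read back as   :", unpack(stored, scale_factor, add_offset))
```

With a scale_factor of 0.01 a short can hold wind speeds to two decimal places, at half the disc space of a float.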
27
attributes
Numeric Variable Attributes
    short wind_speed(time, depth, latitude, longitude) ;
        wind_speed:long_name = "wind speed" ;
        ...
        wind_speed:missing_value = 32767s ;
        wind_speed:_FillValue = 32767s ;
_FillValue: value used to pre-fill disk space allocated to the variable
    (scalar, same ''type'' as the variable).
missing_value: value/values indicating missing data
    (scalar/vector, same ''type'' as the variable).
These values should all be outside the valid_range.
If the variable is ''packed'', the missing_value/_FillValue flags are likewise packed.
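Because the flags are stored in packed form, a reader has to compare against them before applying scale_factor and add_offset. A minimal sketch, assuming the flag value 32767 from the example above; the helper name unpack_with_mask is hypothetical, and marking missing points with None is just one possible choice.

```python
def unpack_with_mask(packed_values, scale_factor, add_offset, missing=(32767,)):
    """Unpack values, returning None wherever a packed missing/fill flag is found."""
    result = []
    for packed in packed_values:
        if packed in missing:
            # Compare in packed form: the flags are stored packed as well.
            result.append(None)
        else:
            result.append(packed * scale_factor + add_offset)
    return result


if __name__ == "__main__":
    raw = [1234, 32767, 250, 32767]
    print(unpack_with_mask(raw, 0.01, 0.0))
```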
28
attributes
Global Attributes
- provide information about the NetCDF dataset as a whole, such as title,
  processing history, instrument, ...
- can be of character / numeric type
- a good option to store all the necessary details about the dataset to make it
  ''really self-describing''
29
attributes
Global Attributes
    // global attributes:
        :WOCE_Version = "3.0" ;
        :CONVENTIONS = "COARDS/WOCE" ;
        :long_name = "QuikSCAT daily mean wind fields" ;
        :producer_agency = "IFREMER" ;
        :producer_institution = "CERSAT" ;
        :product_version = "1.0" ;
        :time_resolution = "one day mean" ;
        :spatial_resolution = "0.5 degrees" ;
        :platform_id = "QuikSCAT" ;
        :instrument = "QuikSCAT" ;
        :objective_method = "kriging" ;
        :data_processing = "data missing dates are filled with dummy _FillValue-s" ;
        :time_modification = "to avoid the problems with 1200 hrs in ferret" ;
30
data
2. Data
    data:
        time = 902891.9, 902915.9, 902939.9, 902963.9, 902987.9, 903011.9,
            903035.9, 903059.9, 903083.9, 903107.9, 903131.9, 903155.9, ... ;
        height = 10 ;
        latitude = 30.25, 29.75, 29.25, 28.75, 28.25, 27.75, 27.25, 26.75,
            26.25, ... ;
        longitude = 29.75, 30.25, 30.75, 31.25, 31.75, 32.25, 32.75, 33.25,
            33.75, ... ;
        wind_speed = -129, -129, -129, -129, -129, -129, -129, -129, -129,
            -129, -129, -129, ... ;
31
Fortran Interface
How to read/write NetCDF files using Fortran?
- Use the ''include'' header file to define NetCDF related variables:
      INCLUDE 'netcdf.inc'
- Explicitly specify the NetCDF ''include'' and ''lib'' directories if the files
  ''netcdf.inc'' and ''libnetcdf.a'' are not in the default search directories
  for the compiler (like /usr/include and /usr/lib):
    user@machine$ f77 mync_pgm.f -I/home/pkgs/netcdf-3.5.1/include -L/home/pkgs/netcdf-3.5.1/lib -lnetcdf
32
Fortran Interface
Steps to create a new NetCDF file
1. Open a new NetCDF file
      err = NF_CREATE('let_me_learn.nc', NF_CLOBBER, ncid)
2. Define all the required dimensions
      err = NF_DEF_DIM(ncid, 'latitude', 180, dimid_lat)
3. Define all the required variables
      err = NF_DEF_VAR(ncid, 'latitude', NF_REAL, 1, dimid_lat, varid_lat)
4. Define all attributes
      err = NF_PUT_ATT_TEXT(ncid, varid_lat, 'units', 9, 'degrees_N')
5. Leave define mode (and enter ''data'' mode). Very important!
      err = NF_ENDDEF(ncid)
6. Write data
      err = NF_PUT_VARA_REAL(ncid, varid_lat, 1, 180, lat)
7. Close the NetCDF file
      err = NF_CLOSE(ncid)
33
Fortran Interface
Steps to read an existing NetCDF file
1. Open the existing NetCDF file
      err = NF_OPEN('let_me_learn.nc', NF_NOWRITE, ncid)
2. Get all the required variable ''id''s
      err = NF_INQ_VARID(ncid, 'latitude', varid_lat)
3. Get the variable ''data''
      err = NF_GET_VARA_REAL(ncid, varid_lat, start, count, lat)
4. Close the NetCDF file
      err = NF_CLOSE(ncid)
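The start and count arguments in step 3 select which part of the variable is read: start is the index of the first value (counted from 1 in Fortran) and count is the number of values. The sketch below imitates that selection on a plain Python list so the arithmetic can be tried without a NetCDF file; get_vara is a hypothetical helper for illustration only, not a NetCDF routine.

```python
def get_vara(values, start, count):
    """Imitate a 1-D start/count read with a 1-based, Fortran-style start."""
    if start < 1 or count < 0 or start - 1 + count > len(values):
        raise IndexError("start/count fall outside the variable")
    first = start - 1                     # convert to a 0-based Python index
    return values[first:first + count]


if __name__ == "__main__":
    latitude = [30.25, 29.75, 29.25, 28.75, 28.25]
    print(get_vara(latitude, 1, 5))   # the whole variable
    print(get_vara(latitude, 2, 3))   # three values starting at the second
```

Reading only the subset that is needed is what the ''direct access'' feature listed at the beginning of the talk refers to.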
34
Fortran Interface
OMODE Flags
    NF_CLOBBER     overwrite any existing dataset with the same file name
    NF_NOCLOBBER   do not overwrite (clobber) an existing dataset
    NF_WRITE       open dataset with read-write access
                   - add/change dim, var, att and data
                   - delete att
    NF_SHARE       same as NF_WRITE
                   - one process may be writing the dataset and one or more
                     other processes reading the dataset concurrently
    NF_NOWRITE     open dataset with read-only access
35
Fortran Interface
How to write the program in an efficient way?
1. Use the IMPLICIT NONE option.
2. Use a HANDLE_ERR subroutine:
      err = NF_CREATE('let_me_learn.nc', NF_CLOBBER, ncid)
      if (err .NE. NF_NOERR) call HANDLE_ERR(err)

      SUBROUTINE HANDLE_ERR(ERR)
      IMPLICIT NONE
      INCLUDE 'netcdf.inc'
      INTEGER ERR
      PRINT *, 'netcdf error: ', NF_STRERROR(ERR)
      STOP 'Stopped'
      END
3. On ''Segmentation fault (core dumped)'': check the number of arguments.
36
Fortran Interface
Let us see a few examples.
37
ncgen
1. to check the syntax of an input CDL file.
2. to make a Fortran/C program that writes the NetCDF file described in the input CDL file.
3. to make a binary NetCDF file from a given CDL file.
    user@machine$ man ncgen
38
installation
Installing NetCDF (netcdf-3.5.1, RH Linux-9)
Get netcdf-3.5.1.tar.Z from Unidata's download site.
Log in as root (if needed).
Set the following environment variables ('setenv' in csh, 'export' in the bash shell):
    export CC=/usr/bin/c99
    export CPPFLAGS='-DNDEBUG -Df2cFortran'
    export CFLAGS=-O
    export FC=/usr/bin/f77
    export FFLAGS='-O -w'
    export CXX=/usr/bin/c++
Then unpack, build, test and install:
    root@machine# zcat ./netcdf-3.5.1.tar.Z | tar xvf -
    root@machine# ./configure
    root@machine# make
    root@machine# make test
    root@machine# make install
    root@machine# make clean
39
Installing NetCDF
Let us test whether this new library is working fine.
40
What about curvilinear data ???
41
Limitations of NetCDF
1. File size increases in some cases even with missing data.
2. Only one UNLIMITED dimension is possible.
3. Limited number of external data types, leading to inefficient use of disc space.
4. File size maximum of 2GB.
5. The extent to which data can be completely self-describing is limited: there
   is always some assumed context without which sharing and archiving data
   would be impractical.
6. No support for multiple concurrent writers.
7. Dimensions, variables and data cannot be DELETED!
42
So ..... how is NetCDF ???????
43
Good ?????
44
Beware !!!!!!!!!!!!!!
Data being in NetCDF format does not guarantee that it is better than having the
data in other formats (unless it is supplied in proper shape with all necessary
details/information).
Here is an example from the Argo data archive.
45
Questions Please ..............
46
Some useful sites
NetCDF home page:
    http://my.unidata.ucar.edu/content/software/netcdf/index.html
Why NetCDF:
    http://www.cgd.ucar.edu/ccr/bettge/CSM-netCDF/csm_why_netcdf.html
Software for Manipulating or Displaying netCDF Data:
    http://www.unidata.ucar.edu/packages/netcdf/software.html
Documentation:
    http://www.unidata.ucar.edu/packages/netcdf/docs.html
COARDS NetCDF Convention:
    http://ferret.wrc.noaa.gov/noaa_coop/coop_cdf_profile.html
47
NetCDF man pages
    user@machine$ man ncdump
    user@machine$ man ncgen
    user@machine$ man netcdf
48
THANK YOU