Title: Using HDF5 for Geospatial Vector Data
Different HDF5 arrangements of vertices
Question: How suitable is a general-purpose format like HDF5 for storing and accessing geospatial feature data?
HDF5 1-D dataset, each element is a
variable-length array containing all vertices for
a shape.
- Ragged array: 1-D array of variable-length data types.
- Index: array of offsets to data values in a single linear array. Similar to Shapefiles.
- 2-D array: one shape per row; multiple arrays when shape sizes vary.
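The index arrangement in the list above can be illustrated in plain Python, without any HDF5 calls. This is a minimal sketch: the function names, the coordinates, and the extra trailing offset (which marks the end of the last shape) are assumptions made for illustration and do not come from the poster.

```python
from itertools import accumulate


def pack_shapes(shapes):
    """Flatten per-shape vertex lists into one linear array plus an offset index."""
    vertices = [vertex for shape in shapes for vertex in shape]
    # offsets[i] is the position of shape i's first vertex. The final entry
    # (an assumption of this sketch) holds the total vertex count so that the
    # end of the last shape can be found the same way as the others.
    offsets = [0] + list(accumulate(len(shape) for shape in shapes))
    return vertices, offsets


def get_shape(vertices, offsets, i):
    """Return the vertices of shape i by slicing the linear array."""
    return vertices[offsets[i]:offsets[i + 1]]


# Illustrative data only: five shapes with 2, 1, 3, 1 and 2 vertices.
shapes = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(2.0, 2.0)],
    [(3.0, 0.0), (4.0, 1.0), (5.0, 0.0)],
    [(6.0, 6.0)],
    [(7.0, 0.0), (8.0, 1.0)],
]
vertices, offsets = pack_shapes(shapes)
third_shape = get_shape(vertices, offsets, 2)
```

Because every shape is recovered with a single slice, the vertex array can be stored as one fixed-type dataset and the index as another, which avoids variable-length types entirely.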
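[Figure: ragged array of five shapes. Each element holds the shape's metadata followed by its (x, y) vertices.]
1: metadata, (x, y), (x, y)
2: metadata, (x, y)
3: metadata, (x, y), (x, y), (x, y)
4: metadata, (x, y)
5: metadata, (x, y), (x, y)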
Feature data example
HDF5 1-D dataset containing all vertices, in order
[Figure: linear vertex array with index. The vertex dataset holds nine (x, y) pairs in shape order; the index holds one record per shape.]
Index (shape: metadata, offset of first vertex)
1: metadata, 0
2: metadata, 2
3: metadata, 3
4: metadata, 6
5: metadata, 7
Test case: ESRI Shapefiles
- Store geometry and attribute information for spatial features as shapes with vector coordinates.
- Support point, line, and area features.
- Widely used file format for geospatial feature data.
HDF5 2-D datasets, each row containing all
vertices for a shape
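To see why a 2-D layout may need more than one array when shape sizes vary, the following sketch pads each shape to a fixed row width and counts the padding slots. It is plain Python with no HDF5 calls; the helper names, the `None` padding marker, the size threshold, and the sample shapes are all assumptions made for illustration.

```python
PAD = None  # hypothetical padding marker; a real dataset would use a fill value


def pack_rows(shapes, width):
    """Pad every shape to `width` vertices so that each shape fills one row."""
    rows = []
    for shape in shapes:
        if len(shape) > width:
            raise ValueError("shape has more vertices than the row width")
        rows.append(list(shape) + [PAD] * (width - len(shape)))
    return rows


def split_by_size(shapes, threshold):
    """Separate shapes into small (<= threshold vertices) and large groups."""
    small = [s for s in shapes if len(s) <= threshold]
    large = [s for s in shapes if len(s) > threshold]
    return small, large


def unused_slots(shapes, width):
    """Count the padding slots left when the shapes are stored `width` wide."""
    return sum(row.count(PAD) for row in pack_rows(shapes, width))


# Illustrative data only: four small shapes and one 40-vertex shape.
shapes = [[(float(i), float(i)) for i in range(n)] for n in (2, 1, 3, 1, 40)]

single_array_waste = unused_slots(shapes, 40)
small, large = split_by_size(shapes, threshold=3)
split_array_waste = unused_slots(small, 3) + unused_slots(large, 40)
```

In this example a single 40-wide array leaves most of its slots empty, while one narrow array for small shapes and one wide array for large shapes leaves only a few.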
[Figure: two 2-D datasets, one for large shapes and one for small shapes.]
Data compression recovers unused space
[Figure: distribution of vertices per shape.]
HDF5 example (1 file)
Shapefile format (3 files)
.shp
Main file: each record describes a shape with a list of its vertices.
.shx
Index file: each record contains the offset of the corresponding main file record.
Results: Comparing Shapefile and HDF5
File size
- Overhead for variable-length structures (ragged array) is high. The HDF5 file is always bigger than the Shapefile.
- HDF5 linear array with index is comparable to Shapefile.
- Compression
  - HDF5 linear array with index saves up to 40% vs. Shapefile.
  - HDF5 2-D arrays are comparable to Shapefile when compression is used. Without compression, HDF5 files are much larger.
Access time
- I/O overhead from variable-length and compound types significantly slows access in HDF5 (HDF5 reads are 5-20 times slower than Shapefile reads).
  - Can be improved considerably by turning off internal free lists.
- When compound and variable-length types are not used, HDF5 access time is comparable to Shapefile access time.
.dbf
A dBASE table: feature attributes for each record.
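The role of the Shapefile index file can be made concrete with a short reader. The sketch below relies on the published Shapefile layout: a 100-byte header beginning with file code 9994, followed by 8-byte records that each hold a big-endian offset and content length measured in 16-bit words. The function name and the synthetic test bytes are assumptions made for illustration.

```python
import struct

SHX_HEADER_SIZE = 100             # the index header is 100 bytes long
SHX_RECORD = struct.Struct(">ii")  # big-endian offset and content length
SHAPEFILE_CODE = 9994             # file code stored in the first 4 bytes


def read_shx_index(data):
    """Return (byte_offset, content_bytes) for each main-file record."""
    if len(data) < SHX_HEADER_SIZE:
        raise ValueError("index file is shorter than its 100-byte header")
    (file_code,) = struct.unpack_from(">i", data, 0)
    if file_code != SHAPEFILE_CODE:
        raise ValueError("not a Shapefile index: unexpected file code")
    entries = []
    last_start = len(data) - SHX_RECORD.size
    for pos in range(SHX_HEADER_SIZE, last_start + 1, SHX_RECORD.size):
        offset_words, length_words = SHX_RECORD.unpack_from(data, pos)
        # Both fields count 16-bit words, so double them to get bytes.
        entries.append((offset_words * 2, length_words * 2))
    return entries


# Synthetic index for two records (illustrative bytes, not a real file).
header = struct.pack(">i", SHAPEFILE_CODE) + bytes(SHX_HEADER_SIZE - 4)
records = SHX_RECORD.pack(50, 10) + SHX_RECORD.pack(64, 24)
index = read_shx_index(header + records)
```

For a file on disk, the same function could be given the bytes read from the index file opened in binary mode; each returned offset is then the position to seek to in the main file.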
Shapefiles tested
ESRI: Environmental Systems Research Institute, Inc.