Title: OPeNDAP and THREDDS: Access and Discovery of Distributed Scientific Data
- Yuan Ho
- Ethan Davis
- UCAR Unidata
Access and Discovery of Distributed Scientific Data
- OPeNDAP: access to scientific data, but no standard inventory or discovery mechanisms
- THREDDS: cataloging, describing, and discovery of scientific data
What is OPeNDAP?
- OPeNDAP (Open-source Project for a Network Data Access Protocol) is a protocol for accessing distributed scientific data (also known as the DODS DAP).
- OPeNDAP is a generic data exchange mechanism that lies at the core of a variety of discipline data systems.
- OPeNDAP is two reference implementations of the protocol (C and Java).
- OPeNDAP is a software framework that simplifies all aspects of scientific data networking, allowing simple access to remote data.
- OPeNDAP is a community of users and developers.
- OPeNDAP is a non-profit corporation called OPeNDAP, Inc.
Design Principles
- Users should be able to share their data over the network via OPeNDAP (server).
- Users should be able to use their own application packages to examine or analyze the data of interest (client).
Client/Server Interaction
- Data access (client)
  - Access to remote data from the user's normal application, e.g.:
    - IDL (Win32)
    - Matlab
    - Ferret
    - GrADS
    - Any netCDF application
    - Excel
  - No need to know the format in which the data are stored
  - Can access data subsets
- Data publishing (server)
  - Network interface via HTTP
  - The DAP provides a common network representation for data
  - Can serve data stored in various formats:
    - netCDF
    - HDF
    - SQL
    - FreeForm
    - JGOFS
    - DSP
  - Allows subsetting of data
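As a rough illustration of subset access over HTTP, the sketch below builds request URLs assuming the widely deployed DAP2 conventions: a response suffix such as `.dds`, `.das`, `.dods`, or `.ascii` is appended to the dataset URL, and a constraint of the form `var[start:stride:stop]` follows a `?`. The server URL and the variable `u` are hypothetical, and DAP 4.0 expresses constraints differently (see Requests/Responses).

```python
from typing import Sequence, Tuple

# Hypothetical server and dataset, used only for illustration.
BASE_URL = "http://example.org/opendap/data/wind.nc"


def hyperslab(name: str, ranges: Sequence[Tuple[int, int, int]]) -> str:
    """Format a DAP2-style projection such as u[0:1:15][0:1:16][0:1:0].

    Each (start, stride, stop) triple selects one dimension; stop is inclusive.
    """
    return name + "".join(f"[{start}:{stride}:{stop}]" for start, stride, stop in ranges)


def subset_url(base: str, suffix: str, projections: Sequence[str]) -> str:
    """Append a response suffix (e.g. .dds, .das, .dods, .ascii) and a constraint."""
    url = base + suffix
    if projections:
        url += "?" + ",".join(projections)
    return url


# All 16 latitudes and 17 longitudes of a hypothetical variable "u",
# but only the first time step.
u_slab = hyperslab("u", [(0, 1, 15), (0, 1, 16), (0, 1, 0)])
print(subset_url(BASE_URL, ".ascii", [u_slab]))
# → http://example.org/opendap/data/wind.nc.ascii?u[0:1:15][0:1:16][0:1:0]
```

Some HTTP clients require the square brackets to be percent-encoded; the sketch leaves them raw for readability.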
OPeNDAP Status
- OPeNDAP/DODS 3.4 release
- OPeNDAP Java 1.1.3
- OPeNDAP Data Connector 2.3X
- OPeNDAP DAP Specification 4.0
OPeNDAP Data Objects
- Three important OPeNDAP data objects:
  - DDX: an XML representation of the structure of all or part of a dataset, as well as a description of the variables within that dataset.
  - Blob: binary data transferred from the data source to the client. The Blob contains the serialized data represented by the DDX.
  - ErrorX: an XML document containing information about any errors the server may have encountered while processing a request.
DDX Example
    <Dataset name="fnoc1.nc"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns="http://www.opendap.org/ns/OPeNDAP"
        xsi:schemaLocation="http://www.opendap.org/ns/OPeNDAP
            http://dods.coas.oregonstate.edu:8080/opendap/opendap.xsd">
      <Attribute name="Description" type="String">
        <value>Fleet Numerical Wind Data</value>
      </Attribute>
      <Array name="u">
        <Attribute name="long_name" type="String">
          <value>U_Wind_Vector</value>
        </Attribute>
        <Float32/>
        <dimension size="16" name="latitude"/>
        <dimension size="17" name="longitude"/>
        <dimension size="21" name="time"/>
      </Array>
    </Dataset>
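A client reading a DDX like the one above mainly needs each variable's type, shape, and attributes. The sketch below extracts these with Python's standard `xml.etree.ElementTree`, assuming the element layout shown in the example (element type as a child element, followed by `dimension` elements); it is not the parser used by the OPeNDAP libraries.

```python
import xml.etree.ElementTree as ET

NS = {"dap": "http://www.opendap.org/ns/OPeNDAP"}

# The DDX example above, with xsi:schemaLocation dropped for brevity.
DDX = """<Dataset name="fnoc1.nc" xmlns="http://www.opendap.org/ns/OPeNDAP">
  <Attribute name="Description" type="String">
    <value>Fleet Numerical Wind Data</value>
  </Attribute>
  <Array name="u">
    <Attribute name="long_name" type="String">
      <value>U_Wind_Vector</value>
    </Attribute>
    <Float32/>
    <dimension size="16" name="latitude"/>
    <dimension size="17" name="longitude"/>
    <dimension size="21" name="time"/>
  </Array>
</Dataset>"""


def local_name(tag):
    """Strip the "{namespace}" prefix that ElementTree puts on qualified tags."""
    return tag.split("}", 1)[-1]


def describe_arrays(ddx_text):
    """Return {array name: (element type, [(dimension, size), ...], {attribute: value})}."""
    root = ET.fromstring(ddx_text)
    arrays = {}
    for array in root.findall("dap:Array", NS):
        dims = [(d.get("name"), int(d.get("size")))
                for d in array.findall("dap:dimension", NS)]
        attrs = {a.get("name"): a.findtext("dap:value", default="", namespaces=NS)
                 for a in array.findall("dap:Attribute", NS)}
        # Assumption taken from the example: the element type is the one child
        # that is neither an Attribute nor a dimension (here <Float32/>).
        elem_type = next(local_name(c.tag) for c in array
                         if local_name(c.tag) not in ("Attribute", "dimension"))
        arrays[array.get("name")] = (elem_type, dims, attrs)
    return arrays


info = describe_arrays(DDX)
print(info["u"])
# → ('Float32', [('latitude', 16), ('longitude', 17), ('time', 21)], {'long_name': 'U_Wind_Vector'})
```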
Variables and Attributes
- Each variable consists of a name, a type, a value, and a collection of attributes.
- Atomic variables: atomic data types are indivisible.
  - Integer, floating-point, string, and binary images.
  - Example:
        <Float64 name="Depth"/>
        <Binary name="sound_sample" size="17623"/>
- Constructor variables: a constructor variable is assembled from collections of other variables, including both atomic and constructor types.
  - Array, structure, grid, and sequence.
  - Example:
        <Array name="temp">
          <Byte/>
          <dimension size="5" name="lon"/>
          <dimension size="3" name="lat"/>
        </Array>
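The atomic/constructor distinction maps naturally onto a small recursive data model. The sketch below is a hypothetical Python model (class names chosen for illustration, Grid, Sequence, and Binary omitted) that counts how many atomic values one instance of a variable holds; the `temp` array above has 5 × 3 = 15 Byte values.

```python
from dataclasses import dataclass, field
from math import prod
from typing import List, Tuple, Union


@dataclass
class Atomic:
    """An indivisible variable such as Float64 "Depth"."""
    name: str
    dap_type: str  # e.g. "Byte", "Int32", "Float64", "String"


@dataclass
class Array:
    """A constructor: one template variable repeated over named dimensions."""
    name: str
    template: "Variable"
    dims: List[Tuple[str, int]]


@dataclass
class Structure:
    """A constructor: an ordered collection of member variables of any kind."""
    name: str
    members: List["Variable"] = field(default_factory=list)


Variable = Union[Atomic, Array, Structure]


def count_atomic_values(var: Variable) -> int:
    """Number of atomic values needed to hold one instance of var."""
    if isinstance(var, Atomic):
        return 1
    if isinstance(var, Array):
        return prod(size for _, size in var.dims) * count_atomic_values(var.template)
    return sum(count_atomic_values(m) for m in var.members)


# The "temp" array from the example: Bytes over lon (5) x lat (3).
temp = Array("temp", Atomic("temp", "Byte"), [("lon", 5), ("lat", 3)])
# A hypothetical structure combining an atomic variable and a constructor.
record = Structure("record", [Atomic("Depth", "Float64"), temp])
print(count_atomic_values(temp), count_atomic_values(record))
# → 15 16
```

Because constructors may contain other constructors, the count is naturally recursive, mirroring the definition on the slide.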
Variables and Attributes
- An attribute is composed of a name, a type, and a value.
- Each variable may have zero or more attributes.
- Types: Boolean, Byte, IntXX, UIntXX, FloatXX, String, URL.
- Example:
    <Dataset name="test">
      <Structure name="measurement">
        <Attribute name="data" type="String">
          <value>18 Mar 03</value>
        </Attribute>
        <Attribute name="other" type="Structure">
          <Attribute name="satellite_name" type="String">
            <value>GOES</value>
          </Attribute>
          <Attribute name="experiment number" type="Int32">
            <value>898976</value>
          </Attribute>
        </Attribute>
        <Float64 name="value"/>
        <Array name="time_series">
          ...
        </Array>
      </Structure>
    </Dataset>
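Attributes of type Structure act as containers for further attributes, as `other` does in the example. A minimal sketch of walking such nested containers, assuming a hypothetical dotted-name convention for the flattened keys (not something the DAP specifies):

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Attribute:
    """Name, type, and value; a "Structure" attribute holds nested attributes."""
    name: str
    dap_type: str
    value: Any  # List[Attribute] when dap_type == "Structure"


def flatten(attrs: List[Attribute], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested attribute containers into dotted names (hypothetical convention)."""
    out: Dict[str, Any] = {}
    for attr in attrs:
        key = prefix + attr.name
        if attr.dap_type == "Structure":
            out.update(flatten(attr.value, key + "."))
        else:
            out[key] = attr.value
    return out


# The attributes of "measurement" from the example above.
measurement = [
    Attribute("data", "String", "18 Mar 03"),
    Attribute("other", "Structure", [
        Attribute("satellite_name", "String", "GOES"),
        Attribute("experiment number", "Int32", 898976),
    ]),
]
print(flatten(measurement))
# → {'data': '18 Mar 03', 'other.satellite_name': 'GOES', 'other.experiment number': 898976}
```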
Requests/Responses
- Responses: four categories of information pass from the server to the client:
  - Information about the data (DDX)
  - The data (Blob)
  - Error messages (ErrorX object)
  - Information about the server: version messages and the server capabilities document
- Requests: a constraint expression provides a way for a client to request certain information from a dataset, such as certain variables or parts of certain variables.
  - Projection clause: a collection of one or more project elements.
  - Selection clause: one or more select elements.
  - Example:
        <Constraint>
          <Project variable="/sample/temp"/>
          <Project variable="/sample/salt"/>
          <Select condition="/sample/salt>34.0" target="sample"/>
        </Constraint>
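The effect of the example constraint can be illustrated on an in-memory copy of the sequence `sample`: the selection keeps only rows whose `salt` exceeds 34.0, and the projection keeps only the `temp` and `salt` fields. This sketch shows the semantics only; the sample values and the `depth` field are hypothetical, and it is not how a server evaluates constraints.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]


def apply_constraint(rows: List[Row], project: List[str],
                     select: Callable[[Row], bool]) -> List[Row]:
    """Keep rows that satisfy the selection, then keep only the projected fields."""
    return [{name: row[name] for name in project} for row in rows if select(row)]


# Hypothetical contents of the sequence "sample".
sample = [
    {"temp": 12.1, "salt": 33.8, "depth": 5.0},
    {"temp": 10.4, "salt": 34.2, "depth": 50.0},
    {"temp": 8.9, "salt": 34.6, "depth": 200.0},
]

# Project /sample/temp and /sample/salt; select rows where /sample/salt > 34.0.
result = apply_constraint(sample, ["temp", "salt"], lambda r: r["salt"] > 34.0)
print(result)
# → [{'temp': 10.4, 'salt': 34.2}, {'temp': 8.9, 'salt': 34.6}]
```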
Problems of Searching and Retrieving Datasets from OPeNDAP Servers
- Metadata
  - Use metadata: metadata at the data level
  - Search metadata: metadata at the directory level
- OPeNDAP has been built from the data level up; its functionality is strongest at the data acquisition level.
- OPeNDAP AIS (Ancillary Information Service) adds metadata information to the OPeNDAP data stream. The role of the ancillary data is to support the translation of, and access to, the data.
- ODC (OPeNDAP Data Connector) is more of a directory service, with limited data searching functionality.
Summary of OPeNDAP
- The OPeNDAP data delivery architecture provides remote access to data via the Internet.
- OPeNDAP uses HTTP (or FTP, GridFTP, Telnet, et cetera) to transport its data objects.
- OPeNDAP has proved very versatile.
- XML is used for the persistent form of the data objects.
- OPeNDAP is a data access tool; it needs a data discovery tool to complement it.
THREDDS Project
- Develop a framework to bridge the gap between data providers and data users, making scientific data discoverable and usable, as well as referenceable from scientific publications and educational materials.
- The framework should be:
  - Scalable for large and small projects
  - Easy to use, yet powerful and flexible
  - Capable of supporting various user interfaces
THREDDS Catalogs
THREDDS catalogs are for communicating information about a collection of datasets:
- Hierarchical structure of datasets
- Dataset access methods
- Structure on which to hang (reference) metadata
THREDDS Catalogs
    <catalog version="0.6">
      <dataset name="Unidata IDD Model Data">
        <dataset name="NCEP Eta 80km CONUS model data">
          <metadata metadataType="DublinCore"
                    xlink:href="http://server/dods/eta.xml" />
          <dataset name="NCEP Eta 80km CONUS 2003-09-24 12Z">
            <access serviceType="DODS"
                    urlPath="http://server/dods/2003092412_eta.nc" />
          </dataset>
        </dataset>
      </dataset>
    </catalog>
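Because a catalog is a hierarchy of datasets, a client can list everything it offers with a depth-first walk. The sketch below uses Python's `xml.etree.ElementTree` on the example, with the standard XLink namespace declared so the document parses. It assumes no default namespace on the catalog elements; a real catalog may declare one, in which case the lookups would need to be namespace-qualified.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

# The catalog example above, with the xlink prefix declared so it parses.
# Assumption: no default namespace on catalog elements.
CATALOG = f"""<catalog version="0.6" xmlns:xlink="{XLINK}">
  <dataset name="Unidata IDD Model Data">
    <dataset name="NCEP Eta 80km CONUS model data">
      <metadata metadataType="DublinCore" xlink:href="http://server/dods/eta.xml"/>
      <dataset name="NCEP Eta 80km CONUS 2003-09-24 12Z">
        <access serviceType="DODS" urlPath="http://server/dods/2003092412_eta.nc"/>
      </dataset>
    </dataset>
  </dataset>
</catalog>"""


def walk(elem, depth=0, out=None):
    """Depth-first walk recording (depth, name, access URLs, metadata links) per dataset."""
    if out is None:
        out = []
    for ds in elem.findall("dataset"):
        urls = [a.get("urlPath") for a in ds.findall("access")]
        links = [m.get(f"{{{XLINK}}}href") for m in ds.findall("metadata")]
        out.append((depth, ds.get("name"), urls, links))
        walk(ds, depth + 1, out)
    return out


for depth, name, urls, links in walk(ET.fromstring(CATALOG)):
    print("  " * depth + name, urls, links)
```

Only the leaf dataset carries an access URL here, while the metadata link hangs off the intermediate collection, which is the "structure on which to hang metadata" role described earlier.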
THREDDS Catalogs
    <dc:title>NCEP Eta 80km CONUS model data</dc:title>
    <dc:creator>NOAA/NCEP</dc:creator>
    <dc:subject>NCEP Eta Model data Real-time data</dc:subject>
    <dc:description>
      This collection of real-time NOAA/NCEP Eta model data contains five days'
      worth of data. The data is on an 80km CONUS grid (GRIB grid 211). Daily 00Z
      and 12Z runs are available, where each dataset includes analysis data and
      forecast data from a single Eta run. Each dataset contains forecasts for
      every 6 hours going out two and a half days (60 hrs) from the run time.
    </dc:description>
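A metadata harvester that follows the catalog's `xlink:href` to a Dublin Core document needs to turn it into searchable fields. The sketch below assumes the standard Dublin Core element namespace and wraps the elements above (description shortened) in a hypothetical `<record>` root; values are collected into lists because Dublin Core elements may repeat.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

# The Dublin Core elements above (description shortened), wrapped in a
# hypothetical <record> root element so that they form one XML document.
RECORD = f"""<record xmlns:dc="{DC}">
  <dc:title>NCEP Eta 80km CONUS model data</dc:title>
  <dc:creator>NOAA/NCEP</dc:creator>
  <dc:subject>NCEP Eta Model data Real-time data</dc:subject>
  <dc:description>
    This collection of real-time NOAA/NCEP Eta model data
    contains five days' worth of data.
  </dc:description>
</record>"""


def read_dublin_core(xml_text):
    """Map each Dublin Core element name to its list of values (elements may repeat)."""
    prefix = "{" + DC + "}"
    fields = {}
    for child in ET.fromstring(xml_text):
        if child.tag.startswith(prefix):
            name = child.tag[len(prefix):]
            # Collapse the whitespace introduced by line wrapping.
            fields.setdefault(name, []).append(" ".join((child.text or "").split()))
    return fields


record = read_dublin_core(RECORD)
print(record["title"], record["creator"])
# → ['NCEP Eta 80km CONUS model data'] ['NOAA/NCEP']
```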
THREDDS DQC (Dataset Query Capabilities)
- THREDDS DQC documents describe how a subset of a data collection can be requested.
- Large and time-varying data collections are cumbersome to view as a hierarchical structure.
- THREDDS DQC documents describe the set of requests that can be made to one or more DQC services, and the form of those requests.
- THREDDS DQC documents are an abstract representation of a collection of datasets.
THREDDS DQC: Subsetting Large Collections
THREDDS DQC
    <?xml version="1.0" encoding="UTF-8"?>
    <queryCapability name="Unidata IDD NEXRAD Level 3 Radar Data" version="0.2">
      <query base="http://motherlode.ucar.edu/cgi-bin/thredds/RadarServer.pl"
             construct="append" returns="catalog"/>
      <selectStation id="station" title="Stations" multiple="true" required="true">
        <station name="ANCHORAGE/Bethel AK" value="ABC">
          <location latitude="60.78" longitude="-161.87"/>
        </station>
      </selectStation>
      <selectList id="product" title="Products" multiple="true" required="true">
        <choice name=".5 reflectivity .54nm res" value="N0R"
                description=".5 reflectivity .54nm res 16 levels id 19/r"/>
      </selectList>
      <selectList id="time" title="Times" required="true">
        <choice name="Latest" value="latest"/>
      </selectList>
    </queryCapability>
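A DQC-aware client can validate a user's choices against the selectors and then build the request. The sketch below checks `required` and `multiple` and rejects values that are not offered. How the request is assembled is an assumption: `construct="append"` is read here as appending `id=value` query parameters to the `base` URL, which may not match the actual DQC semantics.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Trimmed copy of the DQC example above.
DQC = """<queryCapability name="Unidata IDD NEXRAD Level 3 Radar Data" version="0.2">
  <query base="http://motherlode.ucar.edu/cgi-bin/thredds/RadarServer.pl"
         construct="append" returns="catalog"/>
  <selectStation id="station" title="Stations" multiple="true" required="true">
    <station name="ANCHORAGE/Bethel AK" value="ABC">
      <location latitude="60.78" longitude="-161.87"/>
    </station>
  </selectStation>
  <selectList id="product" title="Products" multiple="true" required="true">
    <choice name=".5 reflectivity .54nm res" value="N0R"/>
  </selectList>
  <selectList id="time" title="Times" required="true">
    <choice name="Latest" value="latest"/>
  </selectList>
</queryCapability>"""


def build_query(dqc_text, chosen):
    """Validate {selector id: [values]} against the DQC and build a request URL.

    Assumption: construct="append" means "append id=value query parameters".
    """
    root = ET.fromstring(dqc_text)
    query = root.find("query")
    if query.get("construct") != "append":
        raise NotImplementedError("only construct='append' is sketched here")
    params = []
    for sel in root:
        if sel.tag not in ("selectStation", "selectList"):
            continue
        sel_id = sel.get("id")
        allowed = {e.get("value") for e in sel if e.tag in ("station", "choice")}
        values = chosen.get(sel_id, [])
        if sel.get("required") == "true" and not values:
            raise ValueError(f"missing required selection: {sel_id}")
        if sel.get("multiple") != "true" and len(values) > 1:
            raise ValueError(f"only one value allowed for: {sel_id}")
        for value in values:
            if value not in allowed:
                raise ValueError(f"{value!r} is not offered for {sel_id}")
            params.append((sel_id, value))
    return query.get("base") + "?" + urlencode(params)


print(build_query(DQC, {"station": ["ABC"], "product": ["N0R"], "time": ["latest"]}))
# → http://motherlode.ucar.edu/cgi-bin/thredds/RadarServer.pl?station=ABC&product=N0R&time=latest
```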
THREDDS Services
- THREDDS catalogs are sources of information about a collection of data, on top of which complex services can be built, for instance tools that:
  - Provide interoperability with GIS systems
  - Supply external discovery systems with needed information (e.g., Dublin Core, DIF, FGDC)
  - Supply information to improve data display and analysis, e.g., geolocation information
THREDDS and Discovery Systems
- To supply external discovery services with the information they require, we need:
  - The proper information added to a catalog, e.g., title and description of a dataset, spatial and temporal ranges, parameters, dataset ID
  - A service to provide metadata in the desired encoding
  - A service to feed information to discovery systems
- Use discovery systems to search for data
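A "service to provide metadata in the desired encoding" can be as simple as mapping catalog fields onto Dublin Core elements. The sketch below uses Python's `xml.etree.ElementTree`; the mapping table, the `<record>` wrapper, and the identifier value are hypothetical, while `title`, `creator`, `description`, `identifier`, and `coverage` are standard Dublin Core element names.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)

# Hypothetical mapping from catalog metadata keys to Dublin Core element names.
CATALOG_TO_DC = {
    "title": "title",
    "creator": "creator",
    "description": "description",
    "id": "identifier",
    "coverage": "coverage",  # spatial and temporal ranges
}


def to_dublin_core(meta):
    """Encode catalog-level metadata as a Dublin Core record (wrapper element is hypothetical)."""
    record = ET.Element("record")
    for key, dc_name in CATALOG_TO_DC.items():
        if key in meta:
            ET.SubElement(record, f"{{{DC}}}{dc_name}").text = meta[key]
    return ET.tostring(record, encoding="unicode")


xml_text = to_dublin_core({
    "title": "NCEP Eta 80km CONUS model data",
    "creator": "NOAA/NCEP",
    "id": "hypothetical-eta-80km-id",
})
print(xml_text)
```

Keeping the mapping in a table means supporting another encoding (e.g., DIF or FGDC) would be a matter of adding a second table and serializer rather than changing the catalog.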
THREDDS and Discovery Systems
[Figure: Communicate with Discovery Systems. Components: THREDDS services with data server (Dublin Core Generator, Metadata Harvester), Catalog, Metadata Repository, Data server, and a Discovery System (e.g., DLESE); connections labeled Reads, Writes, Searches, and References.]
Search and Discovery Services
THREDDS Status
- Working on new versions of the catalog and DQC schemas
- Working on updating existing tools to use the new schemas
- Working with UCAR DMWG and NCAR CDP on enhancing descriptive metadata
- Working with OPeNDAP developers on integrating THREDDS and OPeNDAP
OPeNDAP and THREDDS
- Enhance the OPeNDAP C implementation to serve THREDDS catalogs
- Use THREDDS DQC to replace OPeNDAP File Servers
OPeNDAP and THREDDS: More Information
- OPeNDAP Web page: http://www.unidata.ucar.edu/packages/dods/
- OPeNDAP email list: dods@unidata.ucar.edu; subscribe at http://www.unidata.ucar.edu/packages/dods/home/mailLists/
- THREDDS email list: thredds@unidata.ucar.edu; subscribe at http://www.unidata.ucar.edu/projects/THREDDS/maillists/
- THREDDS Web page: http://www.unidata.ucar.edu/projects/THREDDS/
- Support questions: support@unidata.ucar.edu