Provided by: Kei183

Title: LEAD Tools THRUST Group


1
LEAD Tools THRUST Group
Slides and Background Materials for LEAD All Hands Meeting, June 2004
Presentation at the meeting to be given by Sara Graves and Mohan Ramamurthy
2
Need more than high level concepts?
3
Main Issues for Tools Thrust
  • Settling on the overall system architecture
    (driven by scenarios) with emphasis on component
    dependencies
  • Agreeing on a priority order and schedule for
    implementing the tools at each testbed
  • Defining input/output interfaces among the tools
    in order to develop an interoperable set
  • Establishing a mechanism for turning tools into
    web/grid services and then creating those
    services
  • Implementing the workflow orchestration that
    chains the tools together into useful end-to-end
    systems

4
Key Tools
  • ADaM for data mining
  • ADAS for data assimilation
  • Decoders for format transformation
  • IDV visualization
  • LDM/IDD for real-time data delivery
  • OPeNDAP, ESML/OPeNDAP, ADDE data services
  • THREDDS catalog generation and services
  • WRF model
  • gridFTP for file transfer

5
Comprehensive Mining Testbed Components
Dynamic event detection and response using LEAD testbed technologies
[Diagram with numbered steps 1-8. Labels: Local NWS radar, METAR, NCEP Models, Other data, LDM, Decoded, NEXRAD Cache, Mining, Mesocyclone Detection, Event (Yes), Notify (email), Store Events In Data Pool, Online Data Pool, ESML, DODS Server, THREDDS Catalog (generate; Data File Locations and other metadata), IDV, Application Data Access, Data Pool Order and FTP Access, Data Access, WCS, Subsetted Data, Map Access, WMS, Maps, OGC Viewer]
6
Mining Testbed Component Explanations
  • Test beds receive LEAD data sets via LDM from
    upstream sites.
  • The LDM nodes are configured to decode and cache
    the incoming data streams to the site's online
    data archive.
  • Mesocyclone detection and other ADaM mining
    algorithms are run on the NEXRAD incoming stream
    in near-real time in an effort to quickly target
    developing weather situations. A notification
    service is triggered by the event detection to
    alert listeners and possibly automatically send
    them subscribed data sets based on the ongoing
    event.
  • The UAH Data Pool provides online access to
    global passive microwave data and LEAD regional
    data. The Data Pool provides OPeNDAP (DODS), FTP
    and HTTP access to the data sets.
  • Applications will primarily utilize OPeNDAP
    protocols for data access.
  • THREDDS catalogs are generated daily (or more
    often if necessary) to provide metadata to
    applications on the location of data sets.
  • The Data Visualization and Access workflow
    contains OpenGIS-compliant data services that
    provide public data access and visualization
    capabilities to data pool contents.
  • Users will receive notification of detected
    events and will be able to access the data
    through the usual FTP/HTTP, through applications
    such as IDV using OPeNDAP protocols, and through
    OpenGIS protocols such as WMS and WCS.
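The detect-and-notify flow described in the bullets above can be sketched as a small publish/subscribe loop. This is only an illustration under stated assumptions: `NotificationService`, `detect_mesocyclone`, the `max_shear` field, and the threshold value are hypothetical stand-ins, not the ADaM algorithm or the actual LEAD notification service.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Event:
    """A detected weather event (hypothetical structure)."""
    kind: str
    radar_site: str
    scan_time: str


@dataclass
class NotificationService:
    """Keeps a list of listeners and calls each one when an event is published."""
    listeners: List[Callable[[Event], None]] = field(default_factory=list)

    def subscribe(self, listener: Callable[[Event], None]) -> None:
        self.listeners.append(listener)

    def publish(self, event: Event) -> None:
        for listener in self.listeners:
            listener(event)


def detect_mesocyclone(scan: Dict, shear_threshold: float = 0.01) -> Optional[Event]:
    """Placeholder detector: flags a scan whose peak shear reaches a threshold.

    The real mesocyclone algorithm is not reproduced here; the threshold and
    the 'max_shear' field are assumptions made for illustration only.
    """
    if scan["max_shear"] >= shear_threshold:
        return Event("mesocyclone", scan["site"], scan["time"])
    return None


def process_stream(scans, service: NotificationService) -> List[Event]:
    """Run detection on each incoming scan and notify listeners of any events."""
    events = []
    for scan in scans:
        event = detect_mesocyclone(scan)
        if event is not None:
            events.append(event)    # stands in for "store events in data pool"
            service.publish(event)  # alert subscribed listeners
    return events
```

A listener can be any callable, for example one that sends email or pushes a subscribed data set to the user.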

7
Phased Implementation Needed
But with concurrent work on some aspects of each phase
  • Define high level, comprehensive architecture(s)
    such as the mining diagram shown earlier
  • Start by building a minimal end-to-end system
    (shown later)
  • Define interfaces among tools to facilitate
    parallel work
  • Add remaining key tools
  • Convert tools to web services
  • Develop workflow orchestration for existing tools
  • Integrate tools and services into LEAD portal and
    MyLEAD
  • Incorporate web services and workflow into GRID
    framework

8
Initial End-to-End System
Walk before we try to run
[Diagram. Labels: LDM, Eta to WRF input Decoder, WRF Regional Model, WRF output to IDV input Decoder, IDV]
9
Minimal System Component Explanations
  • Initial (3 month) Components
      • Test beds receive the LEAD NCEP ETA data set
        via LDM.
      • The LDM nodes are configured to decode the
        incoming data streams into a form suitable for
        initializing the WRF regional model and to
        store the decoded datasets in the site's
        online data archive.
      • The WRF regional model reads the decoded data
        from the testbed archive and runs.
      • WRF regional model output is stored in the
        testbed data store.
      • Applications will initially utilize OPeNDAP
        protocols for data access.
      • THREDDS catalogs are generated for all
        datasets on an ongoing basis.
      • Initial visualization of datasets is via the
        IDV.
  • Next Steps (1 year)
      • Building on experience with the minimal
        system, define interfaces for additional tools
        and datasets
      • Incorporate ADDE and ESML/OPeNDAP for serving
        other data types
      • Replace NCEP ETA initialization for WRF with
        an ADAS-based true data assimilation system
      • Utilize ADaM data mining as a model trigger
        and guidance mechanism (as shown in the
        Comprehensive diagram)
      • Develop LEAD-specific visualization facilities
        for all data types
      • Construct an orchestrated workflow for the
        minimal system components
      • Select several tools and develop prototype
        web service versions
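The initial end-to-end chain (decode, model run, output decode, with results archived along the way) can be sketched as a list of stages that each take the previous stage's output. Every function below is a hypothetical stub: none of them calls the real LDM, decoders, WRF, or IDV software, and the `format` labels are invented for the example.

```python
from typing import Callable, Dict, List

Stage = Callable[[Dict], Dict]


def decode_eta(product: Dict) -> Dict:
    """Stub for the Eta-to-WRF-input decoder."""
    return {**product, "format": "wrf-input"}


def run_wrf(product: Dict) -> Dict:
    """Stub for a WRF regional model run."""
    if product["format"] != "wrf-input":
        raise ValueError("WRF stub expects decoded input")
    return {**product, "format": "wrf-output"}


def decode_wrf_output(product: Dict) -> Dict:
    """Stub for the WRF-output-to-IDV-input decoder."""
    return {**product, "format": "idv-input"}


def run_pipeline(product: Dict, stages: List[Stage], archive: List[Dict]) -> Dict:
    """Pass a product through each stage, archiving every intermediate result."""
    for stage in stages:
        product = stage(product)
        archive.append(product)  # stands in for the site's online data archive
    return product


# Stage order of the minimal system as described on the slide.
MINIMAL_SYSTEM = [decode_eta, run_wrf, decode_wrf_output]
```

Keeping each stage behind the same call signature is one way to let tools be swapped later, for example replacing the Eta decoder stage with an ADAS-based one.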

10
Advantages of Minimalist Approach
  • Gets end-to-end system running soon (can
    demonstrate now with Workstation ETA in place of
    WRF)
  • Clarifies remaining tool interface work
  • Provides examples of working tools for
    conversion to Web services
  • Establishes working testbed of tools for workflow
    orchestration group
  • Gives us working system to demonstrate and
    experiment with
  • Allows each group to make progress in own area of
    expertise in parallel

11
Unidata Focus
  • Tailor IDV for LEAD
  • Replace Unidata Workstation ETA with WRF
  • Modify current ETA decoder to create WRF
    initialization data
  • Create WRF output decoder for CF conventions
  • Support other testbeds implementing LDM,
    decoders, THREDDS, OPeNDAP, and ADDE
  • Work with OU on ADAS for initialization
  • Work with UAH to incorporate ADaM and
    ESML/OPeNDAP
  • Work with CS experts on conversion of tools to
    orchestrated services

12
Next Steps
  • Discuss and refine the suggested approach to
    tools planning
  • Agree on overall plan
  • Develop very specific plans and commitments for
    each set of tools developers for 3-month time
    frame
  • Develop specific plans for 1-year time frame
  • Get realistic tools implementation commitments
    from testbed sites
  • At 3-month intervals, each tools developer and
    testbed site reports on progress and revised
    goals for next 3 months
  • Based on progress reports and revised goals for
    each individual group, publish a revised overall
    plan annually

13
Component Tools and Services
  • Infrastructure Tools and Services
      • Data formats
      • RT data transport
      • On-Demand data transport
      • Metadata formats
      • Catalog services
      • Portal
  • Modeling and Analysis Tools
      • Assimilation
      • Models
      • Mining
      • Visualization

14
Infrastructure Tools and Services
  • Data formats
  • RT data transport
  • On-Demand data transport
  • Metadata formats
  • Catalog services
  • Portal

15
Data Formats and Representations
Tool: netCDF
Desc: Network Common Data Form
Main Institution(s): Unidata
Availability: Now
Expected installations: Some test sites
Prerequisites: none

Tool: McIDAS AREA files
Main Institution(s): Unidata?
Availability: Now?
Expected installations: Some test sites
Prerequisites: none

Tool: NcML
Desc: netCDF metadata in XML
Main Institution(s): Unidata
Availability: Now
Expected installations: Some test sites
Prerequisites: none

Tool: GML
Main Institution(s): OGC
Availability: Now
Expected installations: Some archives
Prerequisites: none
16
Initial Test Bed Data Sets
  • METARS (Meteorology Aviation Routine report)
  • 12-hourly upper-air balloon soundings (aka
    rawinsondes, radiosondes)
  • 5-minute ACARS
  • NEXRAD Level II
  • NEXRAD Level III
  • (GOES) visible and infrared imagery
  • Eta Forecast Model

Expect all products available via LDM.
Individual sites will decide how much of each to
cache.

LDM ID        Data Set
----------    ------------------------------
IDS/DDPLUS    METARS
IDS/DDPLUS    12-hourly balloon soundings
PCWS          5-minute ACARS
NEXRD2        NEXRAD Level II
NNEXRAD       NEXRAD Level III
UNIWISC       (GOES) visible and infrared
CONDUIT       Eta Forecast Model
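A site deciding what to cache needs to know which LDM feed carries each data set. The lookup below copies the table above into a dictionary and groups a wish list by feed. It is a minimal sketch: the function name is hypothetical, and it does not generate real LDM configuration entries.

```python
from collections import defaultdict

# Data set -> LDM feed ID, copied from the table above.
FEED_FOR_DATASET = {
    "METARS": "IDS/DDPLUS",
    "12-hourly balloon soundings": "IDS/DDPLUS",
    "5-minute ACARS": "PCWS",
    "NEXRAD Level II": "NEXRD2",
    "NEXRAD Level III": "NNEXRAD",
    "(GOES) visible and infrared": "UNIWISC",
    "Eta Forecast Model": "CONDUIT",
}


def feeds_to_request(wanted_datasets):
    """Group the wanted data sets by the LDM feed that carries them."""
    by_feed = defaultdict(list)
    for name in wanted_datasets:
        by_feed[FEED_FOR_DATASET[name]].append(name)
    return dict(by_feed)
```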
17
NetCDF
NetCDF (network Common Data Form) is an interface
for array-oriented data access and a library that
provides an implementation of the interface. The
netCDF library also defines a machine-independent
format for representing scientific data.
Together, the interface, library, and format
support the creation, access, and sharing of
scientific data. The netCDF software was
developed at the Unidata Program Center in
Boulder, Colorado.
18
NetCDF Markup Language (NcML)
NcML is an XML representation of netCDF metadata:
(roughly) the header information one gets from a
netCDF file with the "ncdump -h" command. NcML is
similar to the netCDF CDL (network Common Data
form Language), except, of course, that it uses
XML syntax.
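To make the "header as XML" idea concrete, the sketch below builds an NcML-style document from plain Python header information using only the standard library. The element and attribute names (`netcdf`, `dimension`, `variable`, `attribute`) and the namespace URI follow the NcML 2.2 schema as assumed here; that schema postdates this presentation, so treat the details as assumptions to verify. The function name and sample variable are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed NcML namespace (NcML 2.2); verify against the current schema.
NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"


def header_to_ncml(dimensions, variables):
    """Build an NcML-style XML string from header information.

    dimensions: dict of dimension name -> length
    variables:  list of (name, type, shape, attributes) tuples, where shape
                is a list of dimension names and attributes is a dict
    """
    root = ET.Element("netcdf", {"xmlns": NCML_NS})
    for name, length in dimensions.items():
        ET.SubElement(root, "dimension", {"name": name, "length": str(length)})
    for name, vtype, shape, attrs in variables:
        var = ET.SubElement(
            root, "variable",
            {"name": name, "type": vtype, "shape": " ".join(shape)},
        )
        for key, value in attrs.items():
            ET.SubElement(var, "attribute", {"name": key, "value": value})
    return ET.tostring(root, encoding="unicode")


ncml = header_to_ncml(
    {"time": 4},
    [("temperature", "float", ["time"], {"units": "K"})],
)
```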
19
McIDAS AREA
20
Geography Markup Language (GML)
Geography Markup Language is an XML grammar
written in XML Schema for the modelling,
transport, and storage of geographic information.
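As a small illustration of geographic information encoded in GML, the sketch below writes a single point location with the standard library. It assumes the GML 2-style `Point` and `coordinates` elements in the `http://www.opengis.net/gml` namespace; the function name and the coordinate values are hypothetical.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"
ET.register_namespace("gml", GML_NS)  # serialize with the usual gml: prefix


def gml_point(lon, lat, srs_name="EPSG:4326"):
    """Return a GML 2-style Point element as an XML string."""
    point = ET.Element("{%s}Point" % GML_NS, {"srsName": srs_name})
    coords = ET.SubElement(point, "{%s}coordinates" % GML_NS)
    coords.text = "%s,%s" % (lon, lat)
    return ET.tostring(point, encoding="unicode")
```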
21
Real Time Data Systems and Transport Services
Tools: LDM/IDD/Decoders
Desc: Data Streaming/Transport
Main Institution(s): Unidata
Availability: Now
Expected installations: All testbed sites
Prerequisites: None
22
IDD/LDM
23
On-Demand Web Data Transport Services
Tool: OPeNDAP
Main Institution(s): URI
Availability: Now
Expected installations: All archive sites
Prerequisites: Web server

Tool: ADDE
Main Institution(s): ?
Availability: Now
Expected installations: Where necessary
Prerequisites: ?

Tool: OPeNDAP-ESML Server
Main Institution(s): UAH
Availability: Now (beta test)
Expected installations: Data Archives
Prerequisites: OPeNDAP Server
24
OPeNDAP (DODS) Server Architecture
[Diagram. Labels: DODS Client (Data Analysis Application with DODS Lib), Local Dataset, Internet via HTTP, Data Set Specific DODS Server (two instances, each with its own Dataset)]
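The client side of this exchange comes down to HTTP requests against a dataset URL. The sketch below builds such URLs following the DAP2 convention of a response suffix (`.dds` for structure, `.das` for attributes, `.dods` for binary data) plus an optional constraint expression that selects a variable and index ranges. The server address and variable name are hypothetical, and no request is actually sent.

```python
from urllib.parse import quote


def dap_url(dataset_url, response="dods", variable=None, slices=()):
    """Build a DAP2 request URL.

    response: "dds" (structure), "das" (attributes) or "dods" (binary data)
    slices:   (start, stride, stop) tuples, one per array dimension
    """
    if response not in ("dds", "das", "dods"):
        raise ValueError("unknown DAP2 response type: %r" % (response,))
    url = "%s.%s" % (dataset_url, response)
    if variable is not None:
        hyperslab = "".join("[%d:%d:%d]" % s for s in slices)
        # Keep the bracket and colon syntax of the constraint readable.
        url += "?" + quote(variable + hyperslab, safe="[]:,")
    return url


# Hypothetical server and variable, for illustration only.
example = dap_url(
    "http://example.org/dods/eta.nc", "dods", "temp", [(0, 1, 3), (10, 2, 20)]
)
```

Requesting only a slice of one variable is what lets an application such as the IDV avoid transferring whole files.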
25
DODS-ESML Server Architecture
[Diagram. Labels: DODS Client (Data Analysis Application with DODS Lib), Local Dataset, Internet via HTTP, DODS Server with ESML and ESML Descriptions (serving several Datasets), Data Set Specific DODS Server (with its own Dataset)]
26
ADDE
27
Metadata Formats and Tools
Tool: ESML
Desc: External Structural Metadata and tools
Main Institution(s): UAH
Availability: Now
Expected installations: Some test sites, other tools
Prerequisites: none

Tool: THREDDS
Desc: Data discovery and access tools
Main Institution(s): Unidata
Availability: Now
Expected installations: Data Archives, other tools
Prerequisites: none

Tool: FGDC
Desc: Content metadata for geographic data
Main Institution(s): USGS
Availability: Now
Expected installations: Data Archives
Prerequisites: none
28
Thematic Realtime Environmental Distributed Data
Services
The mission of THREDDS is for students, educators
and researchers to publish, contribute, find, and
interact with data relating to the Earth system
in a convenient, effective, and integrated
fashion. Just as the World Wide Web and
digital-library technologies have simplified the
process of publishing and accessing multimedia
documents, THREDDS is building infrastructure
needed for publishing and accessing scientific
data in a similarly convenient fashion.
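As a rough picture of what publishing data through a catalog involves, the sketch below writes a small catalog-like XML document and then resolves each dataset's access URL as service base plus dataset path. The element and attribute names (`catalog`, `service`, `dataset`, `serviceType`, `base`, `urlPath`, `serviceName`) and the namespace are assumptions modeled on the THREDDS InvCatalog schema; the output is not guaranteed to validate against it. Server addresses and file names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of the THREDDS InvCatalog 1.0 schema.
CATALOG_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"


def build_catalog(name, service_base, datasets):
    """Build a simplified catalog listing (title, urlPath) pairs."""
    root = ET.Element("catalog", {"xmlns": CATALOG_NS, "name": name})
    ET.SubElement(
        root, "service",
        {"name": "dods", "serviceType": "OPENDAP", "base": service_base},
    )
    for title, url_path in datasets:
        ET.SubElement(
            root, "dataset",
            {"name": title, "urlPath": url_path, "serviceName": "dods"},
        )
    return ET.tostring(root, encoding="unicode")


def access_urls(catalog_xml):
    """Resolve each dataset to its access URL: service base + urlPath."""
    ns = {"t": CATALOG_NS}
    root = ET.fromstring(catalog_xml)
    bases = {s.get("name"): s.get("base") for s in root.findall("t:service", ns)}
    return {
        d.get("name"): bases[d.get("serviceName")] + d.get("urlPath")
        for d in root.findall("t:dataset", ns)
    }
```

An application only needs the catalog to find data; the transfer itself goes through the listed service.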
29
THREDDS Support for Distributed Data Servers
30
Distributed THREDDS Catalogs
31
Interoperability Problem
[Diagram. Labels: DATA FORMAT 1, DATA FORMAT 2, DATA FORMAT 3, READER 1, READER 2, FORMAT CONVERTER, APPLICATION]
  • Requires specialized code for every format
  • Difficult to assimilate new data types
  • Makes applications tightly coupled to data
  • One possible solution - enforce a Standard Data
    Format
  • Not practical for legacy datasets

32
Interchange Technology Solution
[Diagram. Labels: DATA FORMAT 1, DATA FORMAT 2, DATA FORMAT 3, ESML FILE 1, ESML FILE 2, ESML FILE 3, ESML LIBRARY, APPLICATION]
  • ESML (external metadata) files containing the
    structural description of the data format
  • Applications utilize these descriptions to
    interpret how to read data files resulting in
    data interoperability for applications
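The idea of an external structural description driving a generic reader can be sketched in a few lines. Note the assumptions: real ESML descriptions are XML files read by the ESML library, whereas this example uses a plain Python dictionary and the standard `struct` module purely for illustration. The field names, type names, and record layout are hypothetical.

```python
import struct

# Hypothetical external description of one fixed-size binary record.
# Real ESML files are XML; a dict is used here only to keep the sketch short.
DESCRIPTION = {
    "byte_order": "big",
    "fields": [
        ("station_id", "int32"),
        ("temperature", "float32"),
        ("pressure", "float32"),
    ],
}

# Map the description's type names onto struct format codes.
_TYPE_CODES = {"int16": "h", "int32": "i", "float32": "f", "float64": "d"}


def read_records(raw, description):
    """Decode raw bytes into a list of dicts using only the description."""
    order = ">" if description["byte_order"] == "big" else "<"
    fmt = order + "".join(_TYPE_CODES[t] for _, t in description["fields"])
    names = [name for name, _ in description["fields"]]
    record = struct.Struct(fmt)
    return [dict(zip(names, values)) for values in record.iter_unpack(raw)]
```

Supporting a new format then means writing a new description rather than new reader code, which is the point of the interchange approach.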

33
What is ESML?
  • It is a specialized markup language for Earth
    Science structural metadata based on XML
  • It is a machine-readable and -interpretable
    representation of the structure of any data file,
    regardless of data format (machine readable
    README)
  • ESML description files contain external metadata
    that can be generated by either data producer or
    data consumer (at collection, data set, and/or
    granule level)
  • ESML provides the benefits of a standard,
    self-describing data format (like HDF, HDF-EOS,
    netCDF, geoTIFF, ...) without the cost of data
    conversion
  • ESML is the basis for core Interchange Technology
    that allows data/application interoperability

34
Catalog Services
Tool: THREDDS
Main Institution(s): Unidata
Availability: Now
Expected installations: Data Archives
Prerequisites: none

Tool: MyLEAD
Main Institution(s): IU
Availability: ?
Expected installations: ?
Prerequisites: ?

Tool: MCS
Main Institution(s): ?
Availability: Now
Expected installations: ?
Prerequisites: ?
35
Thematic Realtime Environmental Distributed Data
Services
The mission of THREDDS is for students, educators
and researchers to publish, contribute, find, and
interact with data relating to the Earth system
in a convenient, effective, and integrated
fashion. Just as the World Wide Web and
digital-library technologies have simplified the
process of publishing and accessing multimedia
documents, THREDDS is building infrastructure
needed for publishing and accessing scientific
data in a similarly convenient fashion.
36
myLEAD
myLEAD is an active catalog for scientific
metadata, with specialized facilities for
searching, content storage, data object
cataloging, and active engagement.
37
MCS
38
Portal Tools
Tool: LEAD Portal
Desc: Web based Workflow Application
Main Institution(s): IU
Availability: ?
Expected installations: portal host
Prerequisites: none
39
LEAD Portal
40
Modeling and Analysis Tools
  • Assimilation
  • Models
  • Mining
  • Visualization

41
Data Assimilation
Tools: ADAS
Desc: Assimilation Transformations
Main Institution(s): OU
Availability: Now
Expected installations: Data Modeling sites
Prerequisites: Domain knowledge
42
ARPS Data Analysis System (ADAS)
ADAS, the ARPS Data Analysis System, is a
3-dimensional weather analysis program. This
implementation uses Rapid Update Cycle (RUC)
forecasts from the National Centers for
Environmental Prediction (NCEP) as background
fields. Oklahoma Mesonet, surface airways, NOAA
wind profiler, and radiosonde data are used in
the analysis.
43
Models
Tool: WRF
Desc: Meteorological Model
Main Institution(s): OU
Availability: Now
Expected installations: Modeling sites
Prerequisites: none?
44
Data Mining
Tools: ADaM
Desc: Data Mining/Image Processing Toolkit
Main Institution(s): UAH
Availability: Now
Expected installations: Data analysis sites
Prerequisites: Mining expertise
45
ADaM System Overview
  • Developed by the Information Technology and
    Systems Center at the University of Alabama in
    Huntsville
  • Consists of over 75 interoperable mining and
    image processing components
  • Each component is provided with a C application
    programming interface (API) and an executable in
    support of scripting tools (e.g. Perl, Python,
    Tcl, Shell)
  • ADaM components are lightweight and autonomous,
    and have been used successfully in a grid
    environment
  • ADaM has several translation components that
    provide data level interoperability with other
    mining systems (such as WEKA and Orange), and
    point tools (such as libSVM and svmLight)
  • Components include Python wrappers; web
    service interfaces are planned

46
Visualization
Tool: IDV
Desc: Web based visualization
Main Institution(s): Unidata
Availability: Now
Expected installations: Client sites
Prerequisites: none

Tool: OGC Web Map Services
Desc: Web based visualization
Main Institution(s): Unidata, UAH
Availability: Now
Expected installations: Data Servers
Prerequisites: none
47
Integrated Data Viewer
The Integrated Data Viewer (IDV) from Unidata is
a Java(TM)-based software framework for analyzing
and visualizing geoscience data. The IDV brings
together the ability to display and work with
satellite imagery, gridded data, surface
observations, balloon soundings, NWS WSR-88D
Level II and Level III RADAR data, and NOAA
National Profiler Network data, all within a
unified interface.
48
OGC Web Mapping Services
A Web Map Service (WMS) produces maps of
geo-referenced data. A particular WMS provider
in a distributed WMS network need only be the
steward of its own data collection. This stands
in contrast to vertically-integrated web mapping
sites that gather in one place all of the data to
be made accessible by their own private
interface.
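A client asks a WMS provider for a map with a GetMap request, which is an ordinary URL with a fixed set of query parameters. The sketch below assembles one using the WMS 1.1.1 parameter names; the server address, layer name, and bounding box are hypothetical, and no request is sent.

```python
from urllib.parse import urlencode


def getmap_url(base_url, layers, bbox, width, height,
               srs="EPSG:4326", image_format="image/png"):
    """Build a WMS 1.1.1 GetMap URL.

    layers: list of layer names
    bbox:   (min_x, min_y, max_x, max_y) in the units of the given SRS
    """
    params = [
        ("SERVICE", "WMS"),
        ("VERSION", "1.1.1"),
        ("REQUEST", "GetMap"),
        ("LAYERS", ",".join(layers)),
        ("STYLES", ""),  # empty value requests the default style
        ("SRS", srs),
        ("BBOX", ",".join(str(v) for v in bbox)),
        ("WIDTH", str(width)),
        ("HEIGHT", str(height)),
        ("FORMAT", image_format),
    ]
    return base_url + "?" + urlencode(params)


# Hypothetical server and layer, for illustration only.
url = getmap_url("http://example.org/wms", ["radar"], (-100, 30, -90, 40), 512, 512)
```

Because every provider answers the same request form, a viewer can overlay maps from several independent servers without holding any of the data itself.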