Title: IRIS/UNAVCO Web Services Workshop
1. QuakeSim/iSERVO
- IRIS/UNAVCO Web Services Workshop
- Andrea Donnellan, Jet Propulsion Laboratory
- June 8, 2005
2. QuakeSim
- Under development in collaboration with researchers at JPL, UC Davis, USC, UC Irvine, and Brown University.
- Geoscientists develop simulation codes, analysis tools, and visualization tools.
- Need a way to bind distributed codes, tools, and data sets.
- Need a way to deliver these to a larger audience.
- Instead of downloading and installing the code, use it as a remote service.
3. Objective
Develop a real-time, large-scale, data assimilation grid implementation for the study of earthquakes that will:
- Assimilate distributed data sources and complex models into a parallel high-performance earthquake simulation and forecasting system
- Simplify data discovery, access, and usage from the scientific user's point of view
- Provide capabilities for efficient data mining
4. QuakeSim Portal Examples
5. Philosophy
- Store simulated and observed data.
- Archive simulation data with the original simulation code and analysis tools.
- Access heterogeneous distributed data through cooperative federated databases.
- Couple distributed data sources, applications, and hardware resources through an XML-based Web Services framework.
- Users access the services (and thus the distributed resources) through Web browser-based Problem Solving Environment clients.
- The Web services approach defines standard, programming-language-independent application programming interfaces, so non-browser client applications may also be built.
6. Five Components of QuakeSim
- Web Services
  - Indiana University (Geoffrey Fox and Marlon Pierce)
- Metadata Services and Federated Database System
  - USC (Dennis McLeod)
- Data Assimilation Infrastructure
  - JPL (Jay Parker, Greg Lyzenga), UC Davis (John Rundle)
- Data Mining Infrastructure
  - JPL (Robert Granat), UC Davis (John Rundle)
- High Performance Modeling Software (FEM, BEM)
  - JPL (Jay Parker, Greg Lyzenga, Charles Norton)
  - UC Davis (John Rundle)
  - Brown (Terry Tullis)
7. Geographic Distribution
Indiana U. (Web Services):
- complexity.ucs.indiana.edu: 8-processor Sun Sunblade server; runs the portal.
- kamet, danube, darya.ucs.indiana.edu: dual-processor Linux hosts with various code services (GeoFEST, patterninfo, RDAHMM, Virtual California).
- gf1.ucs.indiana.edu: 4-processor Linux server; hosts the current QuakeTables DB and the Web Feature Service.
- gf2.ucs.indiana.edu: 4-processor Linux server; hosts various code services.
- grids.ucs.indiana.edu: Sun server that runs the Disloc and Simplex services.
UC Davis (Data Assimilation):
- CSEBEO: parallel Beowulf cluster that currently has 22 Opteron nodes; runs Virtual California for data assimilation, as well as other codes.
JPL (Lead):
- jabba.jpl.nasa.gov: 8-processor SGI; runs RIVA and web services for making movies.
- orion.jpl.nasa.gov: 64-processor Linux cluster; runs GeoFEST.
- losangeles.jpl.nasa.gov: 8-processor host; runs GeoFEST.
USC (Federated Database):
- siro-lab.usc.edu: information management development and storage platform; online June 2005.
- infogroup.usc.edu: the former database server.
8. Web Services
- Build clients in the following styles:
  - Portal clients: ubiquitous, and can be combined.
  - Fancier GUI client applications.
  - Web service client stubs (library routines) embedded in application code, so the code can make direct calls to remote data sources, etc. (a minimal client sketch follows this list).
- Regardless of the client one builds, the services are the same in all cases.
  - My portal and your application code may each use the same service to talk to the same database.
- So we need to concentrate on services and let clients bloom as they may.
  - Client applications (portals, GUIs, etc.) will have a much shorter lifecycle than service interface definitions, if we do our job correctly.
  - Client applications that are locked into particular services, or that use proprietary data formats and wire protocols, are at risk.
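As a concrete illustration of the language-independent interface idea, here is a minimal sketch of a SOAP client using the standard Java SAAJ API. The operation name (getFault), its namespace, and the parameter are hypothetical placeholders rather than the actual SERVO service contract; the endpoint is pointed at the WFS host from slide 28 purely for illustration.

```java
import javax.xml.namespace.QName;
import javax.xml.soap.*;

public class SoapClientSketch {
    public static void main(String[] args) throws Exception {
        SOAPConnection conn = SOAPConnectionFactory.newInstance().createConnection();
        SOAPMessage request = MessageFactory.newInstance().createMessage();

        // Build the request body: one operation element with one parameter.
        // The namespace and operation are illustrative, not the real WSDL.
        SOAPBody body = request.getSOAPBody();
        SOAPElement op = body.addChildElement(
                new QName("http://servo.example.org/faults", "getFault", "ns"));
        op.addChildElement("faultName").addTextNode("Northridge");
        request.saveChanges();

        // Any client style (portlet, GUI, or batch code) issues the same message.
        SOAPMessage response = conn.call(request,
                "http://gf1.ucs.indiana.edu:7474/axis/services/wfs");
        response.writeTo(System.out);
        conn.close();
    }
}
```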
9. SERVO Grid
Solid Earth Research Virtual Observatory: using grid technologies and high-end computers.
[Architecture diagram: repositories and federated databases, sensor nets, and streaming data feed loosely coupled filters (coarse graining), analysis and visualization, and closely coupled compute nodes.]
10. NASA's Main Interest
- Developing the necessary data assimilation and modeling infrastructure for future InSAR missions.
- InSAR is the fourth component of EarthScope.
11. iSERVO Web Services
- Job submission: supports remote batch and shell invocations (a hypothetical interface sketch follows this list).
  - Used to execute simulation codes (the VC suite, GeoFEST, etc.), mesh generation (Akira/Apollo), and visualization packages (RIVA, GMT).
- File management:
  - Uploading, downloading, and backend crossloading (i.e., moving files between remote servers).
  - Remote copies, renames, etc.
- Job monitoring.
- Apache Ant-based remote service orchestration:
  - For coupling related sequences of remote actions, such as RIVA movie generation.
- Database services support SQL queries.
- Data services support interactions with XML-based fault and surface observation data:
  - For simulation-generated faults (i.e., from Simplex).
  - An XML data model is being adopted for common formats, with translation services to legacy formats.
  - Migrating to Geography Markup Language (GML) descriptions.
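To make the shape of these services concrete, here is a hypothetical Java interface covering the job submission and file management operations described above. The method names and signatures are illustrative only; they are not the actual iSERVO WSDL.

```java
// Hypothetical sketch of the job submission / file management operations
// listed above; names and signatures are illustrative, not the real API.
public interface JobSubmissionService {

    /** Submit a batch run of a simulation code (e.g., GeoFEST) on a remote host. */
    String submitBatch(String host, String executable, String[] args);

    /** Poll the status of a previously submitted job ("job monitoring"). */
    String getStatus(String jobId);

    /** Upload or download a file between the client and a remote server. */
    void transfer(String localPath, String remoteHost, String remotePath, boolean upload);

    /** Crossload: move a file directly between two remote servers. */
    void crossload(String srcHost, String srcPath, String dstHost, String dstPath);
}
```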
12. Our Approach to Building Grid Services
- There are several competing visions for Grid Web Services.
  - WSRF (US) and WS-I (UK) are the most prominent.
- We follow the WS-I approach:
  - Build services on proven basic standards (WSDL, SOAP, UDDI).
  - Expand this core as necessary:
    - GIS standards implemented as Web Services.
    - Service orchestration and lightweight metadata management.
13. Grid Services Approach
- We stress innovative implementations.
  - Web Services are essentially message-based.
  - SERVO applications require non-trivial data management (both archives and real-time streams).
- We can support both streams and events through NaradaBrokering messaging middleware.
  - HPSearch uses and manages NaradaBrokering events and data streams for service orchestration.
  - Upcoming improvements to the Web Feature Service will be based on streaming to improve performance.
  - Sensor Grid work is being based on NaradaBrokering.
- Core NaradaBrokering development stresses support for Web Service standards:
  - WS-Reliability, WS-Eventing, WS-Security.
14. NaradaBrokering: Managing Streams
- NaradaBrokering (a publish/subscribe sketch follows this list):
  - Messaging infrastructure for collaboration, peer-to-peer, and Grid applications.
  - Implements high-performance protocols (message transit time of 1 to 2 ms per hop).
  - Order-preserving, optimized message transport with QoS and security profiles for sent and received messages.
  - Supports different underlying protocols such as TCP, UDP, Multicast, and RTP.
  - Discovery Service to locate the nearest brokers.
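The topic-based publish/subscribe model at the heart of NaradaBrokering can be sketched with the standard javax.jms API (NaradaBrokering exposes, among other interfaces, a JMS-compliant one). The topic name and payload below are illustrative, and the connection factory is assumed to come from the broker's own bootstrap code.

```java
import javax.jms.*;

public class TopicSketch {
    // The factory is broker-specific; assume the broker bootstrap supplies it.
    static void run(TopicConnectionFactory factory) throws JMSException {
        TopicConnection conn = factory.createTopicConnection();
        TopicSession session = conn.createTopicSession(false, Session.AUTO_ACKNOWLEDGE);
        Topic topic = session.createTopic("servo/gps/stream"); // illustrative name

        // Subscriber: receives every event published on the topic.
        session.createSubscriber(topic).setMessageListener(
                msg -> System.out.println("event received on servo/gps/stream"));
        conn.start();

        // Publisher: each data buffer becomes one event on the topic.
        session.createPublisher(topic)
               .publish(session.createTextMessage("one GPS sample"));
    }
}
```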
15. HPSearch: Architecture Diagram
[Architecture diagram: the HPSearch Kernel connects to files, sockets, and topics over network protocols, to databases via JDBC, and to Web Services via SOAP/HTTP. HPSearch control events travel over publish/subscribe on a predefined topic; data buffers are sent and received as Narada events.]
16. Problem Solving Environment
High-level architecture showing grids, portals, and grid computing environments: loosely coupled systems that use asynchronous message exchanges between distributed services.
17. SERVOGrid Application Descriptions
- Codes range from simple rough-estimate codes to parallel, high-performance applications.
- Disloc: handles multiple arbitrarily dipping dislocations (faults) in an elastic half-space.
- Simplex: inverts surface geodetic displacements for fault parameters using simulated annealing downhill residual minimization.
- GeoFEST: three-dimensional viscoelastic finite element model for calculating nodal displacements and tractions. Allows for realistic fault geometry and characteristics, material properties, and body forces.
- Virtual California: program to simulate interactions between vertical strike-slip faults using an elastic layer over a viscoelastic half-space.
- RDAHMM: time series analysis program based on Hidden Markov Modeling. Produces feature vectors and probabilities for transitioning from one class to another.
- PARK: boundary element program to calculate fault slip velocity history based on fault frictional properties; a model for unstable slip on a single earthquake fault.
- Preprocessors and mesh generators.
- Visualization tools: RIVA, GMT.
18. SERVOGrid Behind the Scenes
[Workflow diagram for a Pattern Informatics (PI) run:
- WS-Context (Tambora): data can be stored in and retrieved from this third-party repository (the Context Service).
- GPS Database (Gridfarm001) holds the input data.
- NaradaBroker network: used by the HPSearch engines as well as for data transfer.
- The WMS submits a script execution request (URI of the script, plus parameters); HPSearch hosts an AXIS service for remote deployment of scripts.
- PI Code Runner (Danube): accumulate data, run the PI code, create the graph, convert RAW to GML.
- Data Filter (Danube) and GML output (Danube) stages sit on the data path.
- HPSearch controls the Web services; the final output is pulled by the WMS. The diagram distinguishes the virtual data flow from the actual data flow.]
19. Federated Database
- Understand the meaning and format of heterogeneous data sources and the requirements of simulation and analysis codes.
- Desire to interoperate various codes with various information sources (subject to security).
- Problem of semantic and naming conflicts between various federated datasets.
- Discovery, management, integration, and use of data are difficult.
- Presence of many large federated datasets in seismology.
- Different interpretations and analyses of the same datasets by different experts.
Ontology-based federated information management.
20. Database Goal
- Support interoperation of data and software.
- Support data discovery.
- Semi-automatically extract ontologies from federated datasets.
  - Ontology concepts and inter-relationships.
- Mine for patterns in data to discover new concepts in these federated ontologies.
- Employ a generalized geoscience ontology.
  - Sources: Geon, GSL, Intellisophic, ...
21. Database Approach
- A semi-automatic methodology for extracting ontologies from the federated relational database schemas.
- Devising a semi-automated lexical database system to obtain inter-relationships with users' feedback.
- Providing tools to mine for new concepts and inter-relationships.
- Ontronic: a tool for federated ontology-based management, sharing, and discovery.
  - Interface to the Scientist Portal.
22. Evaluation Plan
- Initially employ three datasets:
  - QuakeTables Fault Database (QuakeSim)
  - The Southern California Earthquake Data Center (SCEDC)
  - The Southern California Seismology Network (SCSN)
- From these large-scale, inherently heterogeneous, federated databases we are evaluating:
  - Semi-automatic extraction
  - Checking correctness
  - Evaluating the mapping algorithm
23. Where Is the Data?
- QuakeTables Fault Database
  - SERVO's fault repository for California.
  - Compatible with GeoFEST, Disloc, and Virtual California.
  - http://infogroup.usc.edu:8080/public.html
- GPS data sources and formats (RDAHMM and others)
  - JPL: ftp://sideshow.jpl.nasa.gov/pub/mbh
  - SOPAC: ftp://garner.ucsd.edu/pub/timeseries
  - USGS: http://pasadena.wr.usgs.gov/scign/Analysis/plotdata/
- Seismic event data (RDAHMM and others)
  - SCSN: http://www.scec.org/ftp/catalogs/SCSN
  - SCEDC: http://www.scecd.scec.org/ftp/catalogs/SCEC_DC
  - Dinger-Shearer: http://www.scecdc.org/ftp/catalogs/dinger-shearer/dinger-shearer.catalog
  - Hauksson: http://www.scecdc.scec.org/ftp/catalogs/hauksson/Socal
- This is the raw material for our data services in SERVO.
24. Geographical Information System Services as a Data Grid
- The Data Grid components of SERVO are implemented using standard GIS services.
- Use Open Geospatial Consortium standards:
  - Maximize reusability in future SERVO projects.
  - Provide downloadable GIS software to the community as a side effect of SERVO research.
- Implemented two cornerstone standards:
  - Web Feature Service (WFS): a data service for storing abstract map features.
    - Supports queries (see the query sketch after this list).
    - Faults, GPS, seismic records.
  - Web Map Service (WMS): generates interactive maps from WFSs and other WMSs.
    - Maps are overlays.
    - Can also extract features (faults, seismic events, etc.) from user GUIs to drive problems such as the PI code and (in the near future) GeoFEST and VC.
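An OGC WFS accepts a standard GetFeature request over HTTP. SERVO's WFS is exposed through SOAP (see the WSDL URL on slide 28), so the plain-HTTP key-value form below is a sketch of the OGC standard rather than the exact SERVO deployment; the typeName "fault" is illustrative.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class WfsQuerySketch {
    public static void main(String[] args) throws Exception {
        // Standard OGC WFS 1.0 GetFeature request; typeName is illustrative.
        String url = "http://gf1.ucs.indiana.edu:7474/axis/services/wfs"
                + "?service=WFS&version=1.0.0&request=GetFeature&typeName=fault";
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new URL(url).openStream()))) {
            String line;
            while ((line = in.readLine()) != null)
                System.out.println(line);   // the response is a GML feature collection
        }
    }
}
```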
25. Geographical Information System Services as a Data Grid (continued)
- Built these as Web Services:
  - WSDL and SOAP programming interfaces and messaging formats.
  - You can work with the data and map services through programming APIs as well as browser interfaces.
- Running demos and downloadable code are available from www.crisisgrid.org.
- We are currently working on these steps:
  - Improving WFS performance.
  - Integrating WMS clients with more applications.
  - Making WMS clients publicly available and downloadable (as portlets).
  - Implementing SensorML for streaming, real-time data.
26. Screen Shot From the WMS Client
27. When you select (i) and click on a feature in the map
28. WFS by the Numbers
- The following data is available in the SERVO Web Feature Services:
  - These were collected from public sites.
  - We have reformatted them to GML.
- Data:
  - Filtered GPS archive (297 stations): 48.02 MB
  - Point GPS archive (766 stations): 42.94 MB
  - SCEDC seismic archive: 34.83 MB
  - SCSN seismic archive: 26.34 MB
  - California faults (from QuakeTables Fault DB): 62 KB
  - CA fault segments (from QuakeTables Fault DB): 41 KB
  - Boundaries of major European cities: 12.7 KB
  - European map data: 636 KB
  - Global seismic events: 14.8 MB
  - US rivers: 11 KB
  - US map (state borders): 1.13 MB
  - US state capitals: 5.75 KB
- WFS URLs:
  - http://gf1.ucs.indiana.edu:7474/axis/services/wfs?wsdl
  - http://gf1.ucs.indiana.edu:7474/wfs/testwfs.jsp
29. GeoFEST Northridge Earthquake Example
- Select faults from database
- Generate and refine mesh
- Run finite element code
- Receive e-mail with URL of movie when run is
complete
30. GeoFEST FEM and Mesh Decomposition
The 1992 Landers earthquake finite element mesh, decomposed using PYRAMID. Colors indicate the partitioning among processors (64 in this run). Partitions cluster near the domain center because of the high mesh density used near the faults.
GeoFEST has been run with 60 million elements on 1024 processors (and is capable of larger problems).
31. Virtual California
Simulations show b-values and clustering of earthquakes in space and time similar to what is observed. Studying the behavior of the system will require numerous runs on high-performance computers. Accessible through the portal.
[Figure: 1000 years of simulated earthquakes.]
32. QuakeSim Users
- http://quakesim.jpl.nasa.gov
- Click on the QuakeSim Portal tab.
- Create an account.
- Documentation can be found off the QuakeSim page.
We are looking for friendly users for beta testing (e-mail andrea.donnellan@jpl.nasa.gov if interested).
Coming soon: tutorial classes (quakesim@list.jpl.nasa.gov).
33. Ontronic Architecture
[Architecture diagram: diverse information sources (SCSN, SCEDC, QuakeTables) feed an Ontology Extractor that produces ontology trees and DAGs. A Metadata Manager and an Ontology Mapper update metadata and add inter-relationships, backed by a lexical database (WordNet wrapper plus LexicalDB wrapper) and the Jena API. On the server side sit the Ontronic database, WordNet, and RDF files for import/export; on the client side, a Java applet visualizes the ontology through an Ontology Visualization API.]
34. Mapping Algorithm
Ontologies extracted from the federated datasets are denoted by Fi; the global ontology is denoted by G.

For each Fi:
    For each concept Ci in Fi:
        Try an exact string match of Ci against each concept in G
        If no match was found:
            Look up Ci in the lexical database
            If no result was found:
                Look up WordNet for synonyms Si of Ci
                Find the closest synonym to Ci in Si by string matching
                If no synonym was found:
                    Ask for user input on this mapping
        Store the resulting mapping in the lexical database
- A standard ontology for the domain
- Extracting ontologies from the federated datasets
  - e.g., using relational metadata to extract the table and column names (or file structures)
- Mapping and storing relationships
  - Mapping algorithm (a Java sketch follows)
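A compact Java sketch of the algorithm above, under simplifying assumptions: the "closest synonym by string matching" step is reduced to an exact membership test, and the WordNet wrapper, lexical database, and user dialog are stubbed out as placeholder methods rather than Ontronic's real components.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ConceptMapper {
    private final Map<String, String> lexicalDb = new HashMap<>();

    /** Map one local concept Ci onto the global ontology G. */
    String map(String concept, Set<String> globalOntology) {
        if (globalOntology.contains(concept))         // 1. exact string match in G
            return store(concept, concept);
        if (lexicalDb.containsKey(concept))           // 2. previously stored mapping
            return lexicalDb.get(concept);
        for (String syn : wordNetSynonyms(concept))   // 3. WordNet synonyms of Ci
            if (globalOntology.contains(syn))
                return store(concept, syn);
        return store(concept, askUser(concept));      // 4. fall back to the expert
    }

    private String store(String local, String global) {
        lexicalDb.put(local, global);                 // persist in the lexical DB
        return global;
    }

    // Placeholders standing in for the WordNet wrapper and the user dialog.
    private List<String> wordNetSynonyms(String c) { return Collections.emptyList(); }
    private String askUser(String c) { return c; }
}
```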
35. Mapping Process
[Process diagram: concepts are extracted from each federated source (database1, database2, database3, ..., databasen; relational or file-based). Ontronic discovers the best matches between each local concept name and the concept names of the global (standard) ontology, using the WordNet API and our lexical database, then maps local concepts and inter-relationships to the standardized ontology. A domain expert verifies the mapping.]
36. Visual Ontology Manager in Ontronic
37. Metadata and Information Services
- We like the OGC, but its metadata and information services are too specialized to GIS data.
  - Web Service standards should be used instead.
- For basic information services, we developed an enhanced UDDI.
  - UDDI provides a registry for service URLs and queryable metadata.
  - We extended its data model to include GIS capabilities.xml files, so you can query the capabilities of services.
  - We added leasing to services: obsolete entries are cleaned up when the lease expires (see the sketch after this list).
- We are also implementing WS-Context:
  - Store and manage short-lived metadata and state information.
  - Store personalized metadata for specific users and groups.
  - Used to manage shared state information in distributed applications.
  - See http://grids.ucs.indiana.edu/maktas/fthpis/
38. Service Orchestration with HPSearch
- GIS data services, code execution services, and information services need to be connected into specific aggregate application services.
- HPSearch is CGL's project to implement this service management.
  - It uses NaradaBrokering to manage events and stream-based data flow.
- HPSearch and SERVO applications:
  - We have integrated HPSearch with RDAHMM and Pattern Informatics; these are classic workflow chains.
  - UC Davis has re-designed the Manna code to use HPSearch for distributed worker management as a prototype.
  - More interesting work will be to integrate HPSearch with VC.
- This is described in greater detail in the performance analysis presentation and related documents; see also the supplemental slides.
39. HPSearch and NaradaBrokering
- HPSearch uses NaradaBrokering to route data streams.
  - Each stream is represented by a topic name.
  - Components subscribe and publish to the specified topic.
- The WSProxy component automatically maps topics to input/output streams.
  - Each write(byte[] buffer) and byte[] read() call is mapped to a NaradaBrokering event (see the sketch below).
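The write-side of that mapping can be pictured as a java.io.OutputStream whose every write() publishes the buffer as one event on the stream's topic. The publish() method below stands in for the NaradaBrokering send call; it is not the broker's real API, and this is a sketch of the idea rather than WSProxy itself.

```java
import java.io.OutputStream;
import java.util.Arrays;

public class TopicOutputStream extends OutputStream {
    private final String topic;

    public TopicOutputStream(String topic) { this.topic = topic; }

    // Single-byte writes are funneled through the buffer-based write().
    @Override public void write(int b) { write(new byte[] {(byte) b}, 0, 1); }

    @Override public void write(byte[] buf, int off, int len) {
        // One write() call becomes one event on the stream's topic.
        publish(topic, Arrays.copyOfRange(buf, off, off + len));
    }

    private void publish(String topic, byte[] payload) {
        // Hand the buffer to the messaging substrate here (broker call omitted).
    }
}
```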
40. In Progress
- Integrate HPSearch with Virtual California for a loosely coupled grid application parameter-space study.
  - HPSearch is designed to handle and manage multiple loosely coupled processes communicating with millisecond-or-longer latencies.
- Improve the performance of data services.
  - This is the current bottleneck: GIS data services have problems with non-trivial data transfers.
  - Streaming approaches and data/control channel separation can dramatically improve this.
- Provide support for higher-level data products and federated data storage.
  - CGL does not try to resolve format issues among different data providers (see the backup slides for a list of GPS and seismic event formats); GML is not enough.
  - USC's Ontronic system researches these issues.
- Provide real-time data access to GPS and other sources.
  - Implement SensorML over NaradaBrokering messaging.
  - Do preliminary integration with RDAHMM.
- Improve WMS clients to support sophisticated visualization.