Title: Developing SERVOGrid: eScience for Earthquake Simulation
1. Developing SERVOGrid: e-Science for Earthquake Simulation
Marlon Pierce, Community Grids Lab, Indiana University
2. Some slides to introduce myself
3. What is Informatics?
- Informatics is...
  - understanding the impact technology has on people.
  - the development of new uses for technology.
  - the application of information technology in the context of another field.
- http://www.informatics.indiana.edu/overview/what_is_informatics.asp
4. A Personal Example
- My graduate training is in computational condensed matter physics.
- I developed quantum Monte Carlo codes for simulating helium physically adsorbed on graphite.
- My problems are not so much parallel computing, but:
  - Finding enough computing resources for parameter space studies.
  - Managing lots and lots of data files and their metadata.
  - How can I keep track of all the information I am generating?
5. A Personal Example, Cont.
- Price of success: codes used by others in my adviser's group in their Ph.D. work, but:
  - How do we remove quirks of file names, parameter settings?
  - How can we simplify running the applications and reduce the learning time?
  - How can we avoid wasted computing with incorrect settings?
    - Using He-3 mass with He-4 potentials
  - How can I share results with collaborators?
- These problems became more interesting to me and have been the themes of my postgraduate career.
6. What Have I Learned?
- Science Informatics should apply appropriate information technologies and other tools to science problems.
  - Working scientists vary widely in their expertise in tools.
- Some technologies we will examine:
  - Web services, Web portals, Semantic Web
- My emphasis is on application of technologies.
  - Must have a broad knowledge of available technologies.
  - Must avoid reinventions.
- Avoid the Hammer A and Hammer B anti-patterns.
  - Hammer Fallacy A: all problems are nails.
  - Hammer Fallacy B: always use the big hammer.
7. Now introduce SERVOGrid as a science informatics application/challenge
8. SERVOGrid: Solid Earth Research Virtual Observatory
- Grid Services and Portals to Support Earthquake Science
9. First, explain the problems, give background
10. Solid Earth Research Virtual Observatory (iSERVO)
- Web-services (portal) based Problem Solving Environment (PSE)
- Couples data with simulation, pattern recognition software, and visualization software
- Enables investigators to seamlessly merge multiple data sets and models, and create new queries.
- Data
  - Space-based observational data
  - Ground-based sensor data (GPS, seismicity)
  - Simulation data
  - Published/historical fault measurements
- Analysis Software
  - Earthquake fault
  - Lithospheric modeling
  - Pattern recognition software
11. Philosophy
- Store simulated and observed data
- Archive simulation data with original simulation code and analysis tools
- Access heterogeneous distributed data through cooperative federated databases
- Couple distributed data sources, applications, and hardware resources through an XML-based Web Services framework.
- Users access the services (and thus distributed resources) through Web browser-based Problem Solving Environment clients.
- The Web services approach defines standard, programming language-independent application programming interfaces, so non-browser client applications may also be built.
12. SERVOGrid Basics
- Under development in collaboration with researchers at JPL, UC-Davis, USC, and Brown University.
- Geoscientists develop simulation codes, analysis and visualization tools.
- We need a way to bind distributed codes, tools, and data sets.
  - This is referred to as a Grid.
- We need a way to deliver it to a larger audience.
  - Instead of downloading and installing the code, use it as a remote service.
13. SERVOGrid Codes, Relationships
Figure: diagram relating the SERVOGrid codes. Labels: Elastic Dislocation Inversion; Viscoelastic FEM; Viscoelastic Layered BEM; Elastic Dislocation; Pattern Recognizers; Fault Model BEM
14. SERVOGrid Application Descriptions
- Codes range from simple rough estimate codes to parallel, high performance applications.
- Disloc: handles multiple arbitrarily dipping dislocations (faults) in an elastic half-space.
- Simplex: inverts surface geodetic displacements for fault parameters using simulated annealing downhill residual minimization.
- GeoFEST: three-dimensional viscoelastic finite element model for calculating nodal displacements and tractions. Allows for realistic fault geometry and characteristics, material properties, and body forces.
- Virtual California: program to simulate interactions between vertical strike-slip faults using an elastic layer over a viscoelastic half-space.
- RDAHMM: time series analysis program based on Hidden Markov Modeling. Produces feature vectors and probabilities for transitioning from one class to another.
- PARK: boundary element program to calculate fault slip velocity history based on fault frictional properties; a model for unstable slip on a single earthquake fault.
- Preprocessors, mesh generators
- Visualization tools: RIVA, GMT
15. Problems: Data Access and Sharing, Code Integration
- Codes all use custom text formats for describing input and output.
- Input and output data often combined with code-specific information.
  - Number of iterations, array sizes, etc.
- Data files often created by hand from journals, online repositories.
- Online repositories themselves use differing formats.
- Challenges are to develop common data formats, access services, and client query tools.
16. Data Formats
- Faults, GPS or seismic data used in this project are retrieved from different servers.
- Supported seismic data formats
  - SCSN
  - SCEDC
  - Dinger-Shearer
  - Hauksson
- Supported GPS data formats
  - JPL
  - SOPAC
  - USGS
17. Next, present an overall architecture
18. SERVOGrid Architecture
- Challenging problems like SERVOGrid are solved by starting with the right architecture.
- Implementations and tools may change.
- Having the right architecture and vision of the solution allows flexibility with point solutions.
19. Service Oriented Architectures
- SERVOGrid is built around the Service Oriented Architecture Model.
  - W3C
- Constituent pieces
  - Remotely accessible services
    - Capabilities are defined through interface definition languages (WSDL).
    - Accessible through messages and protocols (SOAP).
    - Implementations may change but interfaces must remain the same.
  - Client applications access remote services.
  - Client hosting environments
    - Web Portals are an example.
- Going beyond services
  - Semantic descriptions for service and information modeling.
  - Programming/orchestration tools for connecting distributed services.
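The point that implementations may change while interfaces stay fixed can be illustrated with a minimal Python sketch. Everything here is hypothetical: `FaultService`, its two implementations, and the region and fault names are invented for illustration and are not SERVOGrid interfaces. In the deck's architecture the contract would be a WSDL description reached over SOAP rather than a Python class.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class FaultService(ABC):
    """Hypothetical service interface: the only thing clients depend on."""

    @abstractmethod
    def get_faults(self, region: str) -> List[str]:
        """Return the names of faults known for a region."""


class InMemoryFaultService(FaultService):
    """One implementation: data held in a dictionary."""

    def __init__(self, table: Dict[str, List[str]]) -> None:
        self._table = table

    def get_faults(self, region: str) -> List[str]:
        return sorted(self._table.get(region, []))


class FlatFileFaultService(FaultService):
    """Another implementation: data parsed from 'region:fault' text lines."""

    def __init__(self, text: str) -> None:
        self._rows = [line.split(":", 1) for line in text.splitlines() if ":" in line]

    def get_faults(self, region: str) -> List[str]:
        return sorted(name.strip() for reg, name in self._rows if reg.strip() == region)


def client_report(service: FaultService, region: str) -> str:
    """A client written against the interface only."""
    return f"{region}: {', '.join(service.get_faults(region))}"


# Made-up data; both implementations answer the same client call identically.
memory_service = InMemoryFaultService({"Region A": ["Fault Y", "Fault X"]})
file_service = FlatFileFaultService("Region A: Fault X\nRegion A: Fault Y\nRegion B: Fault Z")
report_from_memory = client_report(memory_service, "Region A")
report_from_file = client_report(file_service, "Region A")
```

The client function never learns which implementation it is talking to, which is the property that lets a service be reimplemented without breaking its clients.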
20. Figure: architecture diagram. Labels: Browser Interface; JSP Client Stubs; DB Service 1; Job Sub/Mon and File Services; Viz Service; JDBC; DB; Operating and Queuing Systems; RIVA; Host 1; Host 2; Host 3
21. Web Services
- Web services are the fundamental pieces of distributed Service Oriented Architectures.
- We should define lots of useful services that are remotely available.
  - Archival data access services supporting queries, real time sensor access, and mesh generation all seem to be popular choices.
- Web services have two important parts:
  - Distributed services
  - Client applications
- These two pieces are decoupled: one can build clients to remote services without caring about the programming language implementation of the remote service.
  - Java, C, Python
22. Web Services, Continued
- Clients can be built in any number of styles.
  - We build portal clients: ubiquitous, can combine
  - One can build fancier GUI client applications.
  - You can even embed Web service client stubs (library routines) in your application code, so that your code can make direct calls to remote data sources, etc.
- Regardless of the client one builds, the services are the same in all cases.
  - My portal and your application code may each use the same service to talk to the same database.
- So we need to concentrate on services and let clients bloom as they may.
  - Client applications (portals, GUIs, etc.) will have a much shorter lifecycle than service interface definitions, if we do our job correctly.
  - Client applications that are locked into particular services, use proprietary data formats and wire protocols, etc., are at risk.
23. SERVOGrid Required Services
- Computing Grid services
  - Remote command execution/job submission, file transfer, job monitoring.
  - These services
  - We may develop these using any number of toolkits
    - Globus, Apache Axis, gSOAP.
- Data Grid services
  - Access databases and other data sources (faults, GPS, seismic records).
- Information Grid services
  - Metadata management
24. Here follow some descriptions about building services
25. Execution Grid Service Examples (with Ahmet Sayar)
- Simplest of these just run remote execution calls.
- More interesting: combining several services into a single meta-service.
  - Run Disloc, when done move the output from darya to danube, generate a PDF image of the output using GMT, then pull the output back to the client browser for display.
- Expressing these workflows in languages is an active area.
- Simple solution: Apache Ant build tool.
  - Not a full-fledged programming language, but it can do most of the workflow problems I encounter, and is easy to extend.
  - Tasks are expressible in XML, so you can build authoring tools to hide Ant-isms and validate scripts.
  - Open source and, because it is generally applicable, likely to outlive most workflow tools.
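The Disloc meta-service described above is a fixed sequence of steps in which each step runs only if the previous one succeeded. The sketch below mimics that in Python rather than Ant, using an in-memory dictionary as a stand-in for the hosts darya and danube. The step functions, file names, and file contents are all assumptions made for illustration; nothing here calls Disloc, GMT, or a real file transfer service.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for remote hosts: host name -> {file name: contents}.
Hosts = Dict[str, Dict[str, str]]


def run_disloc(hosts: Hosts) -> None:
    """Stub for remote execution of Disloc on darya."""
    hosts["darya"]["disloc.out"] = "surface displacements"


def move_output(hosts: Hosts) -> None:
    """Stub for the file transfer from darya to danube."""
    hosts["danube"]["disloc.out"] = hosts["darya"].pop("disloc.out")


def plot_with_gmt(hosts: Hosts) -> None:
    """Stub for generating a PDF image of the output with GMT on danube."""
    hosts["danube"]["disloc.pdf"] = "PDF of " + hosts["danube"]["disloc.out"]


def fetch_to_client(hosts: Hosts) -> None:
    """Stub for pulling the image back to the client for display."""
    hosts["client"]["disloc.pdf"] = hosts["danube"]["disloc.pdf"]


def run_workflow(steps: List[Callable[[Hosts], None]], hosts: Hosts) -> List[str]:
    """Run steps in order; an exception in any step aborts the remaining ones."""
    completed = []
    for step in steps:
        step(hosts)
        completed.append(step.__name__)
    return completed


hosts: Hosts = {"darya": {}, "danube": {}, "client": {}}
completed = run_workflow([run_disloc, move_output, plot_with_gmt, fetch_to_client], hosts)
```

An Ant build file expresses the same chain declaratively, as targets that depend on one another, which is what makes it possible to author and validate such workflows with XML tools.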
26. Templating Applications and Generating Interfaces
- Users fill in Ant templates through web forms.
- Ant execution services then invoke scripts.
- Ant is a good way to wrap applications.
- Ant template authoring tools simplify deployment of new wrapped services.
- Ant scripts also can be used to automate user interface generation.
Figure Here
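As a rough illustration of filling an Ant template from web form values, the sketch below substitutes form fields into a small build file and checks that the result is well-formed XML before it would be handed to an execution service. The template, the field names, and the paths are assumptions, not the project's actual templates. The `project`, `target`, `exec`, and `arg` elements are standard Ant, but the layout shown is only one plausible way to wrap an application.

```python
from string import Template
from typing import Dict
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# Hypothetical Ant build template wrapping a single command-line application.
ANT_TEMPLATE = Template("""<project name="$app" default="run">
  <target name="run">
    <exec executable="$executable" dir="$workdir" output="$output">
      <arg value="$input"/>
    </exec>
  </target>
</project>
""")


def fill_template(form: Dict[str, str]) -> str:
    """Substitute form values into the template, escaping XML special characters."""
    safe = {key: escape(str(value), {'"': "&quot;"}) for key, value in form.items()}
    return ANT_TEMPLATE.substitute(safe)  # raises KeyError if a field is missing


def validate(build_xml: str) -> ET.Element:
    """Parse the generated script; raises ParseError if it is not well-formed."""
    return ET.fromstring(build_xml)


# Values a user might have typed into a web form (all made up).
form = {
    "app": "disloc",
    "executable": "./disloc",
    "workdir": "/tmp/run1",
    "input": "fault&slip.input",
    "output": "disloc.out",
}
build_xml = fill_template(form)
project = validate(build_xml)
exec_task = project.find("target/exec")
```

Because the template is itself data, the same field list could drive generation of the web form, which is the interface-generation idea mentioned in the last bullet.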
27. Some Screen Shots of Prototype
28. SERVOGrid Data Services (with Galip Aydin)
- SERVO applications need real data sources.
  - Online GPS and Seismic Activity catalogs
  - Lots of different formats.
- Typically, a geoscientist downloads a catalog by hand, prunes out the undesired parts with scripts, and then runs analysis code.
- Data services that unify formats and support database queries are obviously useful.
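A data service of this kind has two jobs: translate each catalog format into one common record, then answer queries against the common records. The Python sketch below shows the idea with two invented line formats. They are not the actual SCSN, SCEDC, or other catalog layouts, and the record fields and sample event are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class SeismicEvent:
    """Common record that every catalog format is translated into."""
    time: str          # ISO 8601 timestamp
    latitude: float
    longitude: float
    magnitude: float


def parse_format_a(line: str) -> SeismicEvent:
    """Hypothetical format A: 'YYYY/MM/DD HH:MM:SS lat lon mag', space-delimited."""
    date, clock, lat, lon, mag = line.split()
    stamp = datetime.strptime(f"{date} {clock}", "%Y/%m/%d %H:%M:%S")
    return SeismicEvent(stamp.isoformat(), float(lat), float(lon), float(mag))


def parse_format_b(line: str) -> SeismicEvent:
    """Hypothetical format B: 'mag, lat, lon, ISO-timestamp', comma-delimited."""
    mag, lat, lon, stamp = (field.strip() for field in line.split(","))
    parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S")
    return SeismicEvent(parsed.isoformat(), float(lat), float(lon), float(mag))


PARSERS: Dict[str, Callable[[str], SeismicEvent]] = {
    "format_a": parse_format_a,
    "format_b": parse_format_b,
}


def load_catalog(fmt: str, lines: Iterable[str]) -> List[SeismicEvent]:
    """Translate one catalog into common records, skipping blank lines."""
    parser = PARSERS[fmt]
    return [parser(line) for line in lines if line.strip()]


def query(events: Iterable[SeismicEvent], min_magnitude: float) -> List[SeismicEvent]:
    """A simple query that works the same regardless of the source format."""
    return [event for event in events if event.magnitude >= min_magnitude]


# The same made-up event written in both formats, plus a smaller one.
catalog_a = load_catalog("format_a", ["2003/10/01 12:00:00 34.05 -118.25 4.2"])
catalog_b = load_catalog("format_b", ["4.2, 34.05, -118.25, 2003-10-01T12:00:00",
                                      "2.1, 33.90, -117.80, 2003-10-02T08:30:00"])
large_events = query(catalog_a + catalog_b, min_magnitude=4.0)
```

This replaces the by-hand download and pruning step with a format-independent query; a production service would put the common records in a database rather than a list.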
29. Data Sources
- A summary of all supported formats can be found here: http://grids.ucs.indiana.edu/gaydin/servo
- Information about supported seismicity catalog formats can be found at http://www.scecdc.scec.org/catalogs.html
- Information about supported GPS data formats can be found at http://www.scign.org
- Future step: directly grab data from sensors
  - Ric McMullen, Knowledge Acquisition Lab
30. GML Schemas as Data Models for Services
- Fault and GPS Schemas are based on GML-Feature object.
- Seismicity Schema is based on GML-Observation object.
- Working schema available from http://grids.ucs.indiana.edu/gaydin/schemas/
31. Metadata Management
- Common problems in computational science
  - Where are the input and output files? When was this created? What parameters did I use to create this output? What version of the code? Is there a validation scenario for this code?
- These are all metadata problems.
32. Context Management Service
- Metadata may be organized into tree-like structures (see figure).
  - Context nodes hold one or more leaves and nodes.
  - Leaves are name/value pairs.
- We usually need to create arbitrary trees.
  - Represent with recursive XML schema.
  - Search with XPath.
- Context data storage is implementation dependent but service interface is independent.
Figure here
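The context tree idea (nodes holding nodes and name/value leaves, stored as recursive XML, searched with XPath) can be sketched with Python's standard `xml.etree.ElementTree`, which supports a limited XPath subset. The element names `context` and `leaf`, and the sample user, run, and file names, are assumptions for illustration and are not the service's actual schema.

```python
import xml.etree.ElementTree as ET


def add_context(parent: ET.Element, name: str) -> ET.Element:
    """Create a child context node; context nodes may nest to any depth."""
    return ET.SubElement(parent, "context", {"name": name})


def add_leaf(context: ET.Element, name: str, value: str) -> ET.Element:
    """Attach a name/value leaf to a context node."""
    leaf = ET.SubElement(context, "leaf", {"name": name})
    leaf.text = value
    return leaf


# Build a small tree: root -> user -> run, with metadata leaves on the run.
root = ET.Element("context", {"name": "root"})
user = add_context(root, "user1")
run = add_context(user, "disloc-run-1")
add_leaf(run, "inputFile", "run1.input")
add_leaf(run, "codeVersion", "1.0")

# XPath-style search: which code version produced this run?
hits = root.findall(".//context[@name='disloc-run-1']/leaf[@name='codeVersion']")
versions = [hit.text for hit in hits]

# Serialized form, as it might be stored in a file system or XML database.
serialized = ET.tostring(root, encoding="unicode")
```

Because the same element type nests inside itself, a single recursive definition covers arbitrary trees, and the search expression does not need to know how deep the run sits.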
33. Context Manager Service Architecture
Figure: architecture diagram. Labels: Client; SOAP/HTTP; Axis Servlet; Shared WSDL Interface; Context Manager; Internal Communication; Context Data; FS; XMLDB
34. Now describe portals
35. Grid Client Environments
- The services we have previously described are headless.
- WSDL descriptions are all you need to create client stubs (if not client applications).
- Clients to services can be built with anything.
  - Java, Python, .NET GUIs
  - Browser clients are an extremely common example.
- Web Portals
  - Client Hosting Environments
36. Computational Web Portal Stack
- Web service dream is that core services, service aggregation, and user interface development are decoupled.
- How do I manage all those user interfaces?
  - Use portlets.
Figure: stack diagram. Layers: Aggregate Portals; User Interfaces; Application Web Services and Workflow; Core Web Services
37. Portal Architecture
Figure: portal architecture diagram (hierarchical arrangement). Recoverable labels:
- Layer names: Clients; Portal Portlets (Jetspeed); Libraries; Services; Resources
- Clients (Pure HTML, Java Applet ..)
- Aggregation and Rendering; Jetspeed Internal Services
- Local Portlets; Remote or Proxy Portlets
- Portlet classes: WebForm, IFramePortlet, JspPortlet, VelocityPortlet
- Gateway (IU); GridPort etc.; (Java) COG Kit
- Web/Grid service: Computing; Web/Grid service: Data Stores; Web/Grid service: Instruments
38. Open Grid Computing Environment Collaboratory Members
- University of Chicago
  - Gregor von Laszewski
- University of Illinois/NCSA
  - Jay Alameda
  - Joe Futrelle
- Indiana University/Community Grids Lab and CS
  - Marlon Pierce
  - Geoffrey Fox
  - Dennis Gannon
  - Beth Plale
- University of Michigan
  - Charles Severance
  - Joseph Hardin
- University of Texas/TACC
  - Mary Thomas
  - Jay Boisseau
39. What Are Grid Portals?
- Computing portals provide ubiquitous, browser-based access to grid resources.
  - No special client software or platform needed
  - Access information in visually intuitive form
- Provide services to support user interactions
  - Job archiving, portal metadata management services
- Combine core grid services into custom services
  - Launch multistage jobs with dependencies
  - Couple execution, file transfer, visualization/analysis
- Many, many such projects
  - A Concurrency and Computation: Practice and Experience special issue described more than two dozen in 2001.
  - GCE Research Group of the GGF is the community forum.
    - Thomas, Gannon, and Fox are chairs.
40. General Portal Architectures
41. What Are the Problems?
- NMI team members have worked in various combinations on other projects:
  - Alliance Portal (Gannon, PI)
  - SciDAC Fusion Portal (Thomas, PI)
  - DOD Computing Portal (Thomas, PI)
  - SciDAC CMCS (contributions from Severance, Hardin, and von Laszewski).
- Problems are always the same:
  - How do we share portal services?
  - How do we reuse components between projects and groups?
  - Can we provide a standard abstraction for portal services and interfaces?
  - Can we provide an architecture that allows services and user interface components to be added in a standard way?
- Need to shorten the standard service deployment phase so that we can concentrate on harder problems, specific sophisticated services.
  - Fusion Grid needs very interactive, visual interface for setting up problems.
  - Need to be able to deploy standard components like MyProxy, GridFTP, etc. interfaces quickly.
42. Portlets and Containers
- Provide a portal container/component system
  - Portal components are called portlets
- Create a packaged, easy-to-install, customizable portal system with standard, useful components.
  - Pick and choose from available functionality
- Support community extensions
  - Plug in useful contributions from other groups
- We base our system on Jakarta's Jetspeed project
- JSR 168 (released this summer) standardizes portlet systems
  - Commercial and open source implementations should interoperate
- WSRP will provide standard ways to build remote portlets
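The container/component split can be shown with a toy Python sketch: each portlet renders only its own fragment, and the container aggregates whatever portlets have been registered into one page. This is an illustration of the pattern only. The class and method names are invented and do not reproduce the Jetspeed or JSR 168 APIs, which are Java.

```python
from abc import ABC, abstractmethod
from html import escape
from typing import List


class Portlet(ABC):
    """Hypothetical portlet contract: a title plus a rendered HTML fragment."""

    title = "Untitled"

    @abstractmethod
    def render(self) -> str:
        """Return this component's HTML fragment."""


class MessageOfTheDayPortlet(Portlet):
    title = "MOTD"

    def __init__(self, message: str) -> None:
        self.message = message

    def render(self) -> str:
        return f"<p>{escape(self.message)}</p>"


class JobListPortlet(Portlet):
    title = "Jobs"

    def __init__(self, jobs: List[str]) -> None:
        self.jobs = jobs

    def render(self) -> str:
        items = "".join(f"<li>{escape(job)}</li>" for job in self.jobs)
        return f"<ul>{items}</ul>"


class PortalContainer:
    """Aggregates registered portlets into a single page."""

    def __init__(self) -> None:
        self._portlets: List[Portlet] = []

    def register(self, portlet: Portlet) -> None:
        self._portlets.append(portlet)

    def render_page(self) -> str:
        sections = [
            f"<section><h2>{escape(p.title)}</h2>{p.render()}</section>"
            for p in self._portlets
        ]
        return "<html><body>" + "".join(sections) + "</body></html>"


container = PortalContainer()
container.register(MessageOfTheDayPortlet("Maintenance <tonight>"))
container.register(JobListPortlet(["disloc-run-1", "geofest-run-7"]))
page = container.render_page()
```

Adding a community-contributed component means registering one more object that satisfies the contract; the container and the other portlets are untouched, which is the "pick and choose" property the slide asks for.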
43. OGCE Initial Architecture
44. Evolving Portal Architecture
46. What's In the Release?
- A component-based portal container
  - Jetspeed with CHEF enhancements, patches
  - Will evolve to JSR 168 standards
- Portlet components and services
  - Discussion boards, MOTDs, message boards, chat
  - Calendar tools
  - Newsgroups and citation/reference managers
  - Grid information services (LDAP-based, GPIR)
  - Portlet interfaces to MyProxy credential management
  - Portlet interfaces to GridFTP
- Scheduled for release, SC2003
47. Deliverables: Science Portal Tools
- Will concentrate on science applications
- Provide services and examples for building Science Portals
- Deliverables include:
  - Application Manager Web Service with sample application
  - Portlets for IU Extreme Lab's application tools: Application Factories, XEvents, Xdirectory, Xbooks
  - Portlets and services for QuakeSim Earthquake simulation
  - Metadata repository user interfaces and services
48. Deliverables: Portal Collaboratory
- We will use our own tools to provide a community portal
  - Provide information, collaboration for portal building community
  - Demonstrate capabilities
- Provide a repository for community contributions
  - CMCS, NEESGrid, GridLab, and others
49. QuakeSim Portal Shots
50. Making SERVO Semantic
- Application of Semantic Web tools and concepts to SERVOGrid
51. Where Is the Semantic Web? (with Mehmet Aktas)
- Last summer I went on a quest to find this somewhat elusive entity.
- What I found: lots of great ideas, even a few implementations, but
  - Too much semantic, not enough web.
- My conclusion: it really needs some driving applications and more distributed computing infrastructure.
- Driving application: Scientific Metadata
52. Semantic Web in One Slide
Figure: RDF graph. Node and edge labels: http://.../CMCS/Entry/1; dc:creator; http://.../People/DrY; dc:title; H20; vcard:EMAIL; dry_at_stateu.edu; vcard:N; vcard:Family; vcard:Given
- RDF provides a subject/predicate/value syntax.
- Predicates and values are URIs.
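The subject/predicate/value syntax is easy to mimic with plain tuples. In the Python sketch below the `example.org` URIs, the title, and the e-mail address are made-up stand-ins, not the elided URIs or values from the slide, and the `dc:` and `vcard:` strings abbreviate the Dublin Core and vCard vocabularies. A real application would use an RDF library rather than tuple matching.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, value)

# Hypothetical URIs standing in for a metadata entry and its creator.
ENTRY = "http://example.org/entries/1"
PERSON = "http://example.org/people/author"

TRIPLES: List[Triple] = [
    (ENTRY, "dc:title", "Sample entry"),
    (ENTRY, "dc:creator", PERSON),
    (PERSON, "vcard:EMAIL", "author@example.org"),
]


def match(triples: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          value: Optional[str] = None) -> List[Triple]:
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (value is None or t[2] == value)
    ]


# Two-step query: find the entry's creator, then that creator's e-mail.
creator = match(TRIPLES, subject=ENTRY, predicate="dc:creator")[0][2]
emails = [t[2] for t in match(TRIPLES, subject=creator, predicate="vcard:EMAIL")]
```

The second query works because the value of one triple is the subject of another, which is how separate pieces of metadata get linked into a graph.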
53. Semantic Needs for SERVOGrid
- SERVOGrid has many types of metadata: little ontologies
  - Computing resources
  - Applications
  - Data
  - Services
- I have designed XML schemas and built services for this sort of metadata before, but they were too monolithic.
- RDF has an interesting way of expressing linkages between different RDF fragments.
  - If we can exploit this, it will make for much more flexible metadata services.
54. A SERVOGrid Ontology
55. Making It Work
- One of the problems we encountered with processing RDF metadata is that tools assume all data is local.
- What we really have, though, are metadata fragments scattered throughout SERVOGrid.
- Need ways of processing RDF triples when predicate values are not local.
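One possible way to handle non-local values is to treat a URI-valued triple as a pointer and fetch the fragment that describes it, up to a bounded number of hops. The Python sketch below simulates this with a dictionary in place of remote hosts. The hosts, URIs, `ex:` predicates, and the `fetch` function are all hypothetical, and this is offered as one approach under those assumptions, not as the method SERVOGrid adopted.

```python
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, value)

APP = "http://host-a.example.org/apps/disloc"
HOST = "http://host-b.example.org/hosts/h1"

# Hypothetical metadata fragments held at different locations, keyed by subject URI.
REMOTE_FRAGMENTS: Dict[str, List[Triple]] = {
    APP: [(APP, "ex:name", "Disloc"), (APP, "ex:runsOn", HOST)],
    HOST: [(HOST, "ex:queue", "batch")],
}


def fetch(uri: str) -> List[Triple]:
    """Stand-in for a remote call that returns the fragment describing a URI."""
    return REMOTE_FRAGMENTS.get(uri, [])


def collect(start: str, fetcher: Callable[[str], List[Triple]], max_hops: int = 3) -> List[Triple]:
    """Gather triples about start, following URI-valued links up to max_hops away."""
    seen: Set[str] = set()
    triples: List[Triple] = []
    frontier = [start]
    for _ in range(max_hops + 1):
        next_frontier: List[str] = []
        for uri in frontier:
            if uri in seen:
                continue  # guards against cycles between fragments
            seen.add(uri)
            for triple in fetcher(uri):
                triples.append(triple)
                if triple[2].startswith("http://") and triple[2] not in seen:
                    next_frontier.append(triple[2])
        frontier = next_frontier
        if not frontier:
            break
    return triples


local_only = collect(APP, fetch, max_hops=0)
with_links = collect(APP, fetch)
```

With `max_hops=0` only the application's own fragment is seen, which is the "all data is local" assumption; allowing hops brings in the host's queue information from the second fragment.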