Title: Developing SERVOGrid: eScience for Earthquake Simulation
1Developing SERVOGrid e-Science for Earthquake
Marlon Pierce Community Grids Lab Indiana
2Some slides to introduce myself
3What is Informatics?
- Informatics is...
- understanding the impact technology has on people.
- the development of new uses for technology.
- the application of information technology in the context of another field.
- http://www.informatics.indiana.edu/overview/what_i
4A Personal Example
- My graduate training is in computational condensed matter physics.
- I developed quantum Monte Carlo codes for simulating helium physically adsorbed on graphite.
- My problems are not so much parallel computing, but
- Finding enough computing resources for parameter space studies.
- Managing lots and lots of data files and their metadata.
- How can I keep track of all the information I am
5A Personal Example, Cont.
- Price of success: codes used by others in my adviser's group in their Ph.D. work, but
- How do we remove quirks of file names, parameter settings?
- How can we simplify running the applications and reduce the learning time?
- How can we avoid wasted computing with incorrect settings?
- Example: using He-3 mass with He-4 potentials
- How can I share results with collaborators?
- These problems became more interesting to me and have been the themes of my postgraduate career.
6What Have I Learned?
- Science Informatics should apply appropriate information technologies and other tools to science problems.
- Working scientists vary widely in their expertise in tools.
- Some technologies we will examine:
- Web services, Web portals, Semantic Web
- My emphasis is on application of technologies.
- Must have a broad knowledge of available technologies.
- Must avoid reinventions.
- Avoid the Hammer A and Hammer B anti-patterns.
- Hammer Fallacy A: all problems are nails.
- Hammer Fallacy B: always use the big hammer.
7Now introduce servogrid as a science informatics
8SERVOGrid Solid Earth Research Virtual
- Grid Services and Portals to Support Earthquake
9First, explain the problems, give background
10Solid Earth Research Virtual Observatory (iSERVO)
- Web-services (portal) based Problem Solving Environment (PSE)
- Couples data with simulation, pattern recognition software, and visualization software
- Enables investigators to seamlessly merge multiple data sets and models, and create new queries.
- Data
- Space-based observational data
- Ground-based sensor data (GPS, seismicity)
- Simulation data
- Published/historical fault measurements
- Analysis Software
- Earthquake fault
- Lithospheric modeling
- Pattern recognition software
- Store simulated and observed data
- Archive simulation data with original simulation code and analysis tools
- Access heterogeneous distributed data through cooperative federated databases
- Couple distributed data sources, applications, and hardware resources through an XML-based Web Services framework.
- Users access the services (and thus distributed resources) through Web browser-based Problem Solving Environment clients.
- The Web services approach defines standard, programming language-independent application programming interfaces, so non-browser client applications may also be built.
12SERVOGrid Basics
- Under development in collaboration with researchers at JPL, UC-Davis, USC, and Brown University.
- Geoscientists develop simulation codes, analysis and visualization tools.
- We need a way to bind distributed codes, tools, and data sets.
- This is referred to as a Grid.
- We need a way to deliver it to a larger audience.
- Instead of downloading and installing the code, use it as a remote service.
13SERVOGrid Codes, Relationships
Figure here (box labels: Elastic Dislocation Inversion; Viscoelastic FEM; Viscoelastic Layered BEM; Elastic Dislocation; Pattern Recognizers; Fault Model BEM)
14SERVOGrid Application Descriptions
- Codes range from simple rough-estimate codes to parallel, high-performance applications.
- Disloc: handles multiple arbitrarily dipping dislocations (faults) in an elastic half-space.
- Simplex: inverts surface geodetic displacements for fault parameters using simulated annealing downhill residual minimization.
- GeoFEST: three-dimensional viscoelastic finite element model for calculating nodal displacements and tractions. Allows for realistic fault geometry and characteristics, material properties, and body forces.
- Virtual California: program to simulate interactions between vertical strike-slip faults using an elastic layer over a viscoelastic half-space.
- RDAHMM: time series analysis program based on Hidden Markov Modeling. Produces feature vectors and probabilities for transitioning from one class to another.
- PARK: boundary element program to calculate fault slip velocity history based on fault frictional properties; a model for unstable slip on a single earthquake fault.
- Preprocessors, mesh generators
- Visualization tools: RIVA, GMT
15Problems Data Access and Sharing, Code
- Codes all use custom text formats for describing input and output.
- Input and output data often combined with code-specific information.
- Number of iterations, array sizes, etc.
- Data files often created by hand from journals, online repositories
- Online repositories themselves use differing formats
- Challenges are to develop common data formats, access services, and client query tools.
16Data Formats
- Fault, GPS, and seismic data used in this project are retrieved from different servers.
- Supported seismic data formats
- Dinger-Shearer
- Hauksson
- Supported GPS data formats
17Next, present an overall architecture
18SERVOGrid Architecture
- Challenging problems like SERVOGrid are solved by starting with the right architecture.
- Implementations and tools may change.
- Having the right architecture and vision of the solution allows flexibility with point solutions.
19Service Oriented Architectures
- SERVOGrid is built around the Service Oriented Architecture Model.
- W3C
- Constituent pieces
- Remotely accessible services
- Capabilities are defined through interface definition languages (WSDL).
- Accessible through messages and protocols (SOAP).
- Implementations may change but interfaces must remain the same.
- Client applications access remote services.
- Client hosting environments
- Web Portals are an example.
- Going beyond services
- Semantic descriptions for service and information modeling.
- Programming/orchestration tools for connecting distributed services.
20Browser Interface
Figure here (box labels: JSP Client Stubs; DB Service 1; Job Sub/Mon and File Services; Viz Service; Operating and Queuing Systems; Host 1; Host 2; Host 3)
21Web Services
- Web services are the fundamental pieces of distributed Service Oriented Architectures.
- We should define lots of useful services that are remotely available.
- Archival data access services supporting queries, real-time sensor access, and mesh generation all seem to be popular choices.
- Web services have two important parts:
- Distributed services
- Client applications
- These two pieces are decoupled: one can build clients to remote services without caring about the programming language implementation of the remote service.
- Java, C, Python
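As a rough illustration of this decoupling, the sketch below builds a SOAP 1.1 request by hand with Python's standard library. The operation name `runDisloc`, the namespace `urn:example:servogrid`, and the `inputFile` parameter are hypothetical and are not taken from the SERVOGrid interfaces. The point is only that a client needs the message format, not the service's implementation language.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:servogrid"  # hypothetical service namespace


def build_request(operation, params):
    """Return a SOAP 1.1 envelope (as a string) for one remote call."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        # Each parameter becomes a child element of the operation element.
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical operation and parameter, for illustration only.
request_xml = build_request("runDisloc", {"inputFile": "fault.input"})
```

In practice a toolkit would generate such client stubs from the WSDL description, so this hand-built message is only a way to see what travels over the wire.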
22Web Services, Continued
- Clients can be built in any number of styles.
- We build portal clients: ubiquitous, can combine
- One can build fancier GUI client applications.
- You can even embed Web service client stubs (library routines) in your application code, so that your code can make direct calls to remote data sources, etc.
- Regardless of the client one builds, the services are the same in all cases.
- My portal and your application code may each use the same service to talk to the same database.
- So we need to concentrate on services and let clients bloom as they may.
- Client applications (portals, GUIs, etc.) will have a much shorter lifecycle than service interface definitions, if we do our job correctly.
- Client applications that are locked into particular services, use proprietary data formats and wire protocols, etc., are at risk.
23SERVOGrid Required Services
- Computing Grid services
- Remote command execution/job submission, file transfer, job monitoring.
- These services
- We may develop these using any number of toolkits
- Globus, Apache Axis, gSOAP.
- Data Grid services
- Access databases and other data sources (faults, GPS, seismic records).
- Information Grid services
- Metadata management
24Here follows some descriptions about building
25Execution Grid Service Examples (with Ahmet Sayar)
- Simplest of these just run remote execution calls.
- More interesting: combining several services into a single meta-service.
- Run Disloc, when done move the output from darya to danube, generate a PDF image of the output using GMT, then pull the output back to the client browser for display.
- Expressing these workflows in languages is an active area.
- Simple solution: Apache Ant build tool.
- Not a full-fledged programming language, but it can do most of the workflow problems I encounter, and is easy to extend.
- Tasks are expressible in XML, so you can build authoring tools to hide Ant-isms and validate scripts.
- Open source and, because it is generally applicable, likely to outlive most workflow tools.
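The Disloc workflow described on this slide could be expressed as a chain of Ant targets linked by `depends`. The sketch below generates such a build file from Python. The target names, script names, and file names are placeholders invented for illustration, and only the standard Ant `project`, `target`, `exec`, and `arg` elements are assumed.

```python
import xml.etree.ElementTree as ET

# Each step: (target name, target it depends on, executable, arguments).
# Executables and file names are placeholders, not the project's real scripts.
STEPS = [
    ("run-disloc", None, "disloc", ["fault.input", "disloc.out"]),
    ("move-output", "run-disloc", "move-to-danube.sh", ["disloc.out"]),
    ("make-pdf", "move-output", "gmt-plot.sh", ["disloc.out", "disloc.pdf"]),
    ("fetch-result", "make-pdf", "fetch-to-client.sh", ["disloc.pdf"]),
]


def build_ant_project(name, steps):
    """Build an Ant <project> whose targets are chained with 'depends'."""
    project = ET.Element("project", name=name, default=steps[-1][0])
    for target_name, depends, executable, args in steps:
        attrs = {"name": target_name}
        if depends:
            attrs["depends"] = depends
        target = ET.SubElement(project, "target", attrs)
        # failonerror stops the chain if a step exits with a non-zero status.
        run = ET.SubElement(target, "exec", executable=executable, failonerror="true")
        for arg in args:
            ET.SubElement(run, "arg", value=arg)
    return project


build_xml = ET.tostring(build_ant_project("disloc-workflow", STEPS), encoding="unicode")
```

Because the default target is the last step, asking Ant to run the project pulls in every earlier step in dependency order.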
26Templating Applications and Generating Interfaces
- Users fill in Ant templates through web forms.
- Ant execution services then invoke scripts.
- Ant is a good way to wrap applications.
- Ant template authoring tools simplify deployment of new wrapped services.
- Ant scripts can also be used to automate user interface generation.
Figure Here
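One way to picture the template step is shown below: form values are substituted into an Ant target template, and an incomplete form is rejected before anything runs. This is a sketch, not the project's template format. The template text and the field names `app`, `executable`, and `input_file` are hypothetical.

```python
from string import Template
from xml.sax.saxutils import escape

# Hypothetical Ant target template; ${...} placeholders come from a web form.
ANT_TEMPLATE = Template(
    '<target name="run-${app}">\n'
    '  <exec executable="${executable}" failonerror="true">\n'
    '    <arg value="${input_file}"/>\n'
    '  </exec>\n'
    '</target>\n'
)


def fill_template(template, form_fields):
    """Substitute form values into the template, escaping XML metacharacters."""
    safe = {key: escape(str(value), {'"': "&quot;"}) for key, value in form_fields.items()}
    # substitute() raises KeyError if the form left any placeholder unfilled,
    # so an incomplete script is never handed to the execution service.
    return template.substitute(safe)


script = fill_template(
    ANT_TEMPLATE,
    {"app": "disloc", "executable": "disloc", "input_file": "fault.input"},
)
```

Rejecting incomplete input at this stage is one inexpensive guard against running a code with missing settings.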
27Some Screen Shots of Prototype
28SERVOGrid Data Services (with Galip Aydin)
- SERVO applications need real data sources
- Online GPS and Seismic Activity catalogs
- Lots of different formats.
- Typically, a geoscientist downloads a catalog by hand, prunes out the undesired parts with scripts, and then runs analysis code.
- Data services that unify formats and support database queries are obviously useful.
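To make the contrast with hand pruning concrete, here is a minimal sketch of a query over a catalog that has already been loaded into one common layout. It uses an in-memory SQLite table. The column names are assumptions, and the sample rows are made up for illustration and are not real catalog entries.

```python
import sqlite3

# Made-up rows for illustration only; these are not real catalog entries.
SAMPLE_EVENTS = [
    ("2000-01-01T00:00:00", 34.10, -118.20, 3.2),
    ("2000-01-02T00:00:00", 35.70, -117.50, 5.1),
    ("2000-01-03T00:00:00", 36.90, -121.60, 4.4),
]


def load_catalog(rows):
    """Load events into an in-memory table with one common column layout."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (time TEXT, lat REAL, lon REAL, magnitude REAL)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
    return conn


def query_events(conn, min_magnitude, lat_range, lon_range):
    """Return events at or above a magnitude inside a latitude/longitude box."""
    sql = (
        "SELECT time, lat, lon, magnitude FROM events "
        "WHERE magnitude >= ? AND lat BETWEEN ? AND ? AND lon BETWEEN ? AND ? "
        "ORDER BY time"
    )
    return conn.execute(sql, (min_magnitude, *lat_range, *lon_range)).fetchall()


conn = load_catalog(SAMPLE_EVENTS)
selected = query_events(conn, 4.0, (35.0, 37.0), (-122.0, -117.0))
```

A data service would sit in front of such a query, so the client asks for a region and magnitude range instead of downloading and filtering whole catalogs.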
29Data Sources
- A summary of all supported formats can be found here: http://grids.ucs.indiana.edu/gaydin/servo
- Information about supported seismicity catalog formats can be found at http://www.scecdc.scec.org/catalogs.html
- Information about supported GPS data formats can be found at http://www.scign.org
- Future step: directly grab data from sensors
- Ric McMullen, Knowledge Acquisition Lab
30GML Schemas as Data Models for Services
- Fault and GPS Schemas are based on GML-Feature object.
- Seismicity Schema is based on GML-Observation object.
- Working schema available from http://grids.ucs.ind
31Metadata Management
- Common problems in computational science
- Where are the input and output files? When was this created? What parameters did I use to create this output? What version of the code? Is there a validation scenario for this code?
- These are all metadata problems.
32Context Management Service
- Metadata may be organized into tree-like structures (see figure).
- Context nodes hold one or more leaves and nodes.
- Leaves are name/value pairs.
- We usually need to create arbitrary trees.
- Represent with recursive XML schema.
- Search with XPath.
- Context data storage is implementation dependent but service interface is independent.
Figure here
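The context tree on this slide can be sketched as nested XML elements searched with XPath. The element names `context` and `leaf`, and the example names and values, are assumptions made for illustration and are not the project's actual schema. Python's ElementTree supports only a subset of XPath, which is enough for the attribute predicates used here.

```python
import xml.etree.ElementTree as ET


def add_context(parent, name):
    """Create a child context node; contexts may nest to any depth."""
    return ET.SubElement(parent, "context", name=name)


def add_leaf(context, name, value):
    """Attach a name/value leaf to a context node."""
    ET.SubElement(context, "leaf", name=name).text = value


# Hypothetical tree: a study that contains one run with two metadata leaves.
root = ET.Element("context", name="root")
study = add_context(root, "disloc-study")
run = add_context(study, "run-001")
add_leaf(run, "inputFile", "fault.input")
add_leaf(run, "codeVersion", "1.0")

# Find one leaf in one run, then every input file anywhere in the tree.
version_leaf = root.find(".//context[@name='run-001']/leaf[@name='codeVersion']")
all_inputs = [node.text for node in root.findall(".//leaf[@name='inputFile']")]
```

Because callers only see the add and search operations, the tree could be kept in files or a database without changing the service interface.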
33Context Manager Service Architecture
Figure here (box labels: Axis Servlet; Shared WSDL Interface; Context Manager; Internal Communication; Context Data)
34Now Describe portals
35Grid Client Environments
- The services we have previously described are headless.
- WSDL descriptions are all you need to create client stubs (if not client applications).
- Clients to services can be built with anything
- Java, Python, .NET GUIs
- Browser clients are an extremely common example.
- Web Portals
- Client Hosting Environments
36Computational Web Portal Stack
- The Web service dream is that core services, service aggregation, and user interface development are decoupled.
- How do I manage all those user interfaces?
- Use portlets.
Figure here (stack layers as listed: Aggregate Portals; User Interfaces; Application Web Services and Workflow; Core Web Services)
37Portal Architecture
Figure here (labels: Clients (Pure HTML, Java Applet, ...); Aggregation and Rendering; Portlet Class: WebForm; Gateway (IU); Remote or Proxy Portlets; Portlet Class: IFramePortlet; Data Stores; Portlet Class: JspPortlet; GridPort etc.; Portlet Class: VelocityPortlet; (Java) COG Kit; Hierarchical arrangement; Jetspeed Internal Services; Portal Portlets)
38Open Grid Computing Environment Collaboratory
- University of Chicago
- Gregor von Laszewski
- University of Illinois/NCSA
- Jay Alameda
- Joe Futrelle
- Indiana University/Community Grids Lab and CS
- Marlon Pierce
- Geoffrey Fox
- Dennis Gannon
- Beth Plale
- University of Michigan
- Charles Severance
- Joseph Hardin
- University of Texas/TACC
- Mary Thomas
- Jay Boisseau
39What Are Grid Portals?
- Computing portals provide ubiquitous, browser-based access to grid resources.
- No special client software or platform needed
- Access information in visually intuitive form
- Provide services to support user interactions
- Job archiving, portal metadata management services
- Combine core grid services into custom services
- Launch multistage jobs with dependencies
- Couple execution, file transfer, visualization/analysis
- Many, many such projects
- Concurrency and Computation: Practice and Experience special issue described more than two dozen in 2001.
- GCE Research Group of the GGF is the community forum.
- Thomas, Gannon, and Fox are chairs.
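Launching a multistage job with dependencies amounts to finding an order in which every stage runs after the stages it needs. The sketch below does this with `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). The stage names are illustrative only and are not taken from any particular portal.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each stage maps to the set of stages that must finish first.
# Stage names are illustrative placeholders.
STAGES = {
    "stage-input": set(),
    "run-simulation": {"stage-input"},
    "transfer-output": {"run-simulation"},
    "visualize": {"transfer-output"},
    "archive-metadata": {"run-simulation"},
}


def launch_order(stages):
    """Return one valid execution order; raises CycleError on circular dependencies."""
    return list(TopologicalSorter(stages).static_order())


order = launch_order(STAGES)
```

A portal service built this way can check a job description for circular dependencies before submitting anything to a queue.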
40General Portal Architectures
41What Are the Problems?
- NMI team members have worked in various combinations on other projects
- Alliance Portal (Gannon, PI)
- SciDAC Fusion Portal (Thomas, PI)
- DOD Computing Portal (Thomas, PI)
- SciDAC CMCS (contributions from Severance, Hardin, and von Laszewski).
- Problems are always the same
- How do we share portal services?
- How do we reuse components between projects and groups?
- Can we provide a standard abstraction for portal services and interfaces?
- Can we provide an architecture that allows services and user interface components to be added in a standard way?
- Need to shorten the standard service deployment phase so that we can concentrate on harder problems, specific sophisticated services
- Fusion Grid needs very interactive, visual interface for setting up problems
- Need to be able to deploy interfaces to standard components like MyProxy, GridFTP, etc. quickly
42Portlets and Containers
- Provide a portal container/component system
- Portal components are called portlets
- Create a packaged, easy-to-install, customizable portal system with standard, useful components.
- Pick and choose from available functionality
- Support community extensions
- Plug useful contributions from other groups
- We base our system on Jakarta's Jetspeed project
- JSR 168 (released this summer) standardizes portlet systems
- Commercial and open source implementations should interoperate
- WSRP will provide standard ways to build remote
43OGCE Initial Architecture
44Evolving Portal Architecture
46What's In the Release?
- A component-based portal container
- Jetspeed with CHEF enhancements, patches
- Will evolve to JSR 168 standards
- Portlet components and services
- Discussion boards, MOTDs, message boards, chat
- Calendar tools
- Newsgroups and citation/reference managers
- Grid information services (LDAP-based, GPIR)
- Portlet interfaces to MyProxy credential management
- Portlet interfaces to GridFTP
- Scheduled for release, SC2003
47Deliverables: Science Portal Tools
- Will concentrate on science applications
- Provide services and examples for building Science Portals
- Deliverables include
- Application Manager Web Service with sample application
- Portlets for IU Extreme Labs Application tools: Application Factories, XEvents, Xdirectory, Xbooks
- Portlets and services for QuakeSim Earthquake simulation
- Metadata repository user interfaces and services
48Deliverables: Portal Collaboratory
- We will use our own tools to provide a community portal
- Provide information, collaboration for portal building community
- Demonstrate capabilities
- Provide a repository for community contributions
- CMCS, NEESGrid, GridLab, and others
49QuakeSim Portal Shots
50Making SERVO Semantic
- Application of Semantic Web tools and concepts to
51Where Is the Semantic Web? (with Mehmet Aktas)
- Last summer I went on a quest to find this somewhat elusive entity.
- What I found: lots of great ideas, even a few implementations, but
- Too much semantic, not enough web.
- My conclusion: it really needs some driving applications and more distributed computing infrastructure.
- Driving application: Scientific Metadata
52Semantic Web in One Slide
RDF provides a subject/predicate/value syntax.
Predicates and values are URIs.
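A minimal sketch of the subject/predicate/value idea, using plain Python tuples instead of an RDF library. The `http://example.org/servo#` namespace and the predicate names `runsOn` and `hasInputFormat` are hypothetical and do not come from a published SERVOGrid vocabulary.

```python
from typing import NamedTuple

EX = "http://example.org/servo#"  # hypothetical vocabulary namespace


class Triple(NamedTuple):
    subject: str
    predicate: str
    value: str


# Hypothetical statements about applications and hosts.
TRIPLES = [
    Triple(EX + "disloc", EX + "runsOn", EX + "danube"),
    Triple(EX + "disloc", EX + "hasInputFormat", EX + "faultFile"),
    Triple(EX + "geofest", EX + "runsOn", EX + "danube"),
]


def match(triples, subject=None, predicate=None, value=None):
    """Return triples matching the given fields; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (value is None or t.value == value)
    ]


apps_on_danube = [
    t.subject for t in match(TRIPLES, predicate=EX + "runsOn", value=EX + "danube")
]
```

Because every field is a URI, a value in one statement can be the subject of another, which is what lets separate fragments link to each other.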
53Semantic Needs for SERVOGrid
- SERVOGrid has many types of metadata: little ontologies
- Computing resources
- Applications
- Data
- Services
- I have designed XML schemas and built services for this sort of metadata before, but they were too monolithic.
- RDF has an interesting way of expressing linkages between different RDF fragments.
- If we can exploit this, it will make for much more flexible metadata services.
54A SERVOGrid Ontology
55Making It Work
- One of the problems we encountered with processing RDF metadata is that tools assume all data is local.
- What we really have, though, are metadata fragments scattered throughout SERVOGrid.
- Need ways of processing RDF triples when predicate values are not local.
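One possible approach, sketched under stated assumptions, is to treat each value URI as a pointer to another fragment and fetch fragments on demand while walking the graph. Here `fetch_fragment` is a stand-in for a network call, the fragments live in a local dictionary, and every URI and predicate name is hypothetical.

```python
from collections import deque

EX = "http://example.org/servo#"  # hypothetical vocabulary namespace

# Stand-in for remote metadata stores: each URI maps to the fragment
# (a list of (subject, predicate, value) triples) that describes it.
REMOTE_FRAGMENTS = {
    EX + "disloc": [(EX + "disloc", EX + "runsOn", EX + "danube")],
    EX + "danube": [(EX + "danube", EX + "hasQueue", EX + "batchQueue")],
    EX + "batchQueue": [],
}


def fetch_fragment(uri):
    """Placeholder for a network fetch; unknown URIs yield an empty fragment."""
    return REMOTE_FRAGMENTS.get(uri, [])


def gather(start_uri, max_fragments=100):
    """Collect triples reachable from start_uri by following value URIs."""
    seen, queue, collected = {start_uri}, deque([start_uri]), []
    fetched = 0
    while queue and fetched < max_fragments:
        fetched += 1
        for triple in fetch_fragment(queue.popleft()):
            collected.append(triple)
            value = triple[2]
            if value not in seen:  # fetch each fragment at most once
                seen.add(value)
                queue.append(value)
    return collected


graph = gather(EX + "disloc")
```

The `max_fragments` bound is a design choice for this sketch: it keeps a walk over widely linked metadata from fetching without limit.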