Title: SCEC Community Modeling Environment CME System Architecture Discussion
1. SCEC Community Modeling Environment (CME) System Architecture Discussion
Phil Maechling, 7 May 2003, maechlin_at_usc.edu, http://www.scec.org/cme
2. Agenda
- Goals of meeting
- Computational Pathway Concepts
- System Requirements (Use Case)
- SCEC/CME Application Architecture
- Managing Computational Pathways
- Design Issues Overview
- Design Issue Details
3. Goals of Meeting
- Interested in dialogue with the group on design issues relating to the SCEC/CME.
- Want to focus on the SCEC/CME system's use of technologies, as opposed to the technologies themselves.
- Trying to take advantage of the knowledge and experience of collaborators. Want the group's knowledge, experience, opinions, options, and open issues in handling the design issues we face.
- Agenda asks questions, but dialogue can extend outside of these questions.
- Will present many candidate solutions. Looking for better solutions within the scope, cost, and staffing constraints of the project.
- Want to go away with more knowledge of technological issues, references to work we should know about, examples of similar work, and new issues to consider, but not necessarily decisions on SCEC/CME system design.
4. SCEC/CME Data Processing Model
Example of SCEC/CME Computational Pathway
Steps: AWM Solver, Extract Seismogram, Find Peak Acceleration, Plot Hazard Map
Data:
- AWM Model Config File (5 KB)
- 3 files of 4D wavefield data (3 x 4 GB = 12 GB)
- 9000 seismogram files (9000 x 0.5 MB = 4.5 GB)
- 1 XML file with 3000 data points (100 KB)
- 1 JPEG (38 KB)
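The data volumes quoted on this slide can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming decimal units (1 GB = 1000 MB); the dictionary keys and the `total_gb` helper are illustrative names, not part of the SCEC/CME system.

```python
# Sizes in megabytes, taken from the slide; decimal units are an assumption.
KB = 0.001
GB = 1000.0

products = {
    "AWM model config file": 5 * KB,
    "4D wavefield data": 3 * 4 * GB,   # 3 files x 4 GB
    "seismogram files": 9000 * 0.5,    # 9000 files x 0.5 MB
    "XML peak-value file": 100 * KB,
    "hazard map JPEG": 38 * KB,
}


def total_gb(sizes_mb):
    """Return the combined size of all data products in gigabytes."""
    return sum(sizes_mb.values()) / GB


if __name__ == "__main__":
    for name, size_mb in products.items():
        print(f"{name:24s} {size_mb / GB:12.6f} GB")
    print(f"{'total':24s} {total_gb(products):12.6f} GB")
```

The intermediate wavefield and seismogram files dominate the total, while the final products (XML file and JPEG) are several orders of magnitude smaller.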
5. Computational Step Concept
A Computational Step is modeled like a dataflow diagram: datatype inputs, a transforming program, datatype outputs.
Example: Hazard Map Value Dataset -> GMT Script -> GRD File
Pattern: Resource Type -> Predicate Type -> Resource Type
Resource: anything you can reference with a URI
Predicate: anything that can produce a Resource
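The resource and predicate concepts can be written down as a small data model. This is a minimal sketch under stated assumptions: the class and field names are hypothetical, and the reading of the example as "GMT Script takes a Hazard Map Value Dataset and produces a GRD File" is inferred from the slide's diagram labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceType:
    """A kind of dataset, e.g. 'GRD File'."""
    name: str


@dataclass(frozen=True)
class PredicateType:
    """A program that turns input resource types into output resource types."""
    name: str
    inputs: tuple
    outputs: tuple


@dataclass(frozen=True)
class ResourceInstance:
    """One concrete dataset; by the slide's definition it is referenced by a URI."""
    uri: str
    resource_type: ResourceType


# The computational step from the slide, expressed with the types above.
hazard_values = ResourceType("Hazard Map Value Dataset")
grd_file = ResourceType("GRD File")
gmt_script = PredicateType("GMT Script", inputs=(hazard_values,), outputs=(grd_file,))
```

Keeping `ResourceType` separate from `ResourceInstance` mirrors the type-versus-instance distinction the deck draws later for the RDBMS.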
6. Computational Pathway Concept
A Computational Pathway is a series of Computational Steps connected together.
Example: Hazard Map Value Dataset -> GMT Script -> GRD File -> Grd-Image -> JPEG Image
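Connecting steps into a pathway implies a consistency rule: the output type of one step must be the input type of the next. The sketch below checks that rule for the example on this slide. The tuple layout and the function name `validate_pathway` are assumptions made for illustration, and each step is simplified to a single input and a single output.

```python
def validate_pathway(steps):
    """Check that each step's output type matches the next step's input type.

    Each step is an (input_type, predicate, output_type) tuple of strings.
    Returns the predicate names in execution order.
    """
    for (_, pred, out_type), (next_in, next_pred, _) in zip(steps, steps[1:]):
        if out_type != next_in:
            raise ValueError(
                f"{pred} produces {out_type!r} but {next_pred} expects {next_in!r}"
            )
    return [pred for _, pred, _ in steps]


# The two-step pathway from the slide.
pathway = [
    ("Hazard Map Value Dataset", "GMT Script", "GRD File"),
    ("GRD File", "Grd-Image", "JPEG Image"),
]
```

Calling `validate_pathway(pathway)` returns the predicates in order; swapping the two steps raises a `ValueError` because a JPEG Image is not a Hazard Map Value Dataset.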
7. Example Scientist Use Case
- User selects the calculation program (called a predicate within the SCEC/CME) that they wish to run.
- SCEC/CME system displays a list of the input datatypes (called resource types by SCEC/CME) required by the predicate.
- For each input resource type, the system displays all known instances of resources of that type. (Presumably the resource instances are stored in a digital library.)
- User selects the resource instances they wish to use for the calculation.
- User specifies Run.
- SCEC/CME system calls the web service-based predicate program, passing the URLs of the input resources as parameters.
- Predicate program performs the calculation, creates an output dataset (output resource instance), and registers the output resource with the digital library.
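The use case above can be walked through in code. This is a minimal sketch, not the SCEC/CME implementation: the dictionaries stand in for the RDBMS and digital library, the resource type names and `example.org` URLs are invented, and the web-service call is replaced by minting an output URL.

```python
import itertools

# Hypothetical in-memory stand-ins for the SCEC/CME RDBMS and digital library.
PREDICATE_INPUTS = {"Plot Hazard Map": ["Peak Acceleration Dataset"]}
PREDICATE_OUTPUT = {"Plot Hazard Map": "Hazard Map Image"}
LIBRARY = {
    "Peak Acceleration Dataset": [
        "http://example.org/peaks/1",
        "http://example.org/peaks/2",
    ],
}
_output_ids = itertools.count(1)


def required_inputs(predicate):
    """List the input resource types a predicate needs."""
    return list(PREDICATE_INPUTS[predicate])


def known_instances(resource_type):
    """List the URLs of all known instances of a resource type."""
    return list(LIBRARY.get(resource_type, []))


def run(predicate, selected):
    """Check the user's selection, 'run' the predicate, and register the output.

    `selected` maps each input resource type to the URL the user picked.
    """
    for resource_type in required_inputs(predicate):
        if selected.get(resource_type) not in known_instances(resource_type):
            raise ValueError(f"no valid instance selected for {resource_type!r}")
    # A real predicate would be a web-service call; here we only mint an output URL.
    output_type = PREDICATE_OUTPUT[predicate]
    output_url = f"http://example.org/output/{next(_output_ids)}"
    LIBRARY.setdefault(output_type, []).append(output_url)
    return output_url
```

After `run("Plot Hazard Map", {"Peak Acceleration Dataset": "http://example.org/peaks/1"})`, the new output appears in `known_instances("Hazard Map Image")`, so it can serve as an input to a later step.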
8. SCEC/CME Application Architecture
Components shown in the architecture diagram:
- SCEC/CME Testbed Grid
- User's Computer
- SCEC/CME Testbed Portal
- Apache AXIS
- Apache Httpd
- Apache Tomcat
- Apache Tomcat
- Web Service Implementations
- HTML/HTTP
- XML/SOAP
- Apache Struts
- Browser-Based User Interface
- JSPs
- Digital Library I/F
- Java Web-Service Client Action Methods
- Globus Grid Scheduler
- Digital Library I/F
- MySQL RDBMS
- Application Programs
9. SCEC/CME Management of Pathways Using RDBMS
- SCEC/CME RDBMS contains lists of all known PredicateTypes and ResourceTypes. (Note the key distinction between ResourceInstance and ResourceType.)
- SCEC/CME RDBMS contains information on which input and output resource types are required by each predicate type.
- SCEC/CME RDBMS contains information about each resource instance in the system.
- SCEC/CME RDBMS contains information about metadata for each resource instance in the system.
- SCEC/CME RDBMS contains processing status information, updated as computational steps are completed in a computational pathway calculation.
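The five kinds of information listed above map naturally onto a handful of tables. The sketch below is one hypothetical layout, not the actual SCEC/CME schema: all table and column names are assumptions, and SQLite (via Python's standard `sqlite3` module) stands in for the MySQL server so that the example is self-contained.

```python
import sqlite3

# Hypothetical schema: one table per kind of information named in the bullets.
SCHEMA = """
CREATE TABLE resource_type (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE predicate_type (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE predicate_io (
    predicate_type_id INTEGER NOT NULL REFERENCES predicate_type(id),
    resource_type_id  INTEGER NOT NULL REFERENCES resource_type(id),
    direction TEXT NOT NULL CHECK (direction IN ('input', 'output'))
);
CREATE TABLE resource_instance (
    id INTEGER PRIMARY KEY,
    resource_type_id INTEGER NOT NULL REFERENCES resource_type(id),
    uri TEXT UNIQUE NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending'
);
CREATE TABLE resource_metadata (
    resource_instance_id INTEGER NOT NULL REFERENCES resource_instance(id),
    name TEXT NOT NULL,
    value TEXT
);
"""


def open_catalog():
    """Create an empty in-memory catalog with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def inputs_for(conn, predicate_name):
    """Return the names of the input resource types a predicate requires."""
    rows = conn.execute(
        """SELECT rt.name FROM predicate_io io
           JOIN predicate_type pt ON pt.id = io.predicate_type_id
           JOIN resource_type rt ON rt.id = io.resource_type_id
           WHERE pt.name = ? AND io.direction = 'input'
           ORDER BY rt.name""",
        (predicate_name,),
    )
    return [name for (name,) in rows]
```

The `status` column on `resource_instance` is where processing status would be recorded as each computational step completes.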
10. SCEC/CME RDBMS Schema
11. SCEC/CME Design Issues Overview
- Design Issues Discussion
- Resource Identification
- Metadata Content
- Metadata Handling
- Web Services
- Grid Portal
12. SCEC/CME Digital Resource IDs
- Unique Identifiers for Digital Resources
- We use the concept and term URI to mean a WWW unique identifier for both a resource (dataset) and a predicate (program). DOIs are a type of URI.
- We plan to store URIs for all resources and all predicates in our RDBMS.
- We will generate our own URIs for each new resource using the convention www.scec.org/ResourceTypeName/PK. The design allows us to use other URIs if appropriate.
- We will maintain a URI-to-URL mapping table. The URL may be updated as an object is moved.
- Not sure how to maintain a URI-to-file mapping except by naming the file using the URI. Concerned about overwriting a non-unique data file and breaking the URI-URL mapping.
- Interested in standard methods (Global Persistent Identifiers, Handles, DOIs, or other approaches) if there are advantages over generating our own URIs.
- Interested in use of a URL to identify predicates (programs). Is this possible?
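The URI convention and the URI-to-URL mapping table described above can be sketched as a small registry. This is illustrative only: the class `UriRegistry`, its method names, and the per-type counter used as the primary key (PK) are assumptions, and percent-encoding the URI is just one possible way to name a file after its URI.

```python
from urllib.parse import quote


class UriRegistry:
    """Mints URIs as <base>/<ResourceTypeName>/<PK> and tracks their current URLs."""

    def __init__(self, base="www.scec.org"):
        self.base = base
        self._next_pk = {}   # resource type name -> last PK issued
        self._url_for = {}   # URI -> current URL (the mapping table)

    def mint(self, resource_type_name, url):
        """Create a new URI for a resource stored at `url`."""
        pk = self._next_pk.get(resource_type_name, 0) + 1
        self._next_pk[resource_type_name] = pk
        uri = f"{self.base}/{resource_type_name}/{pk}"
        self._url_for[uri] = url
        return uri

    def move(self, uri, new_url):
        """Record that the object behind `uri` now lives at `new_url`."""
        if uri not in self._url_for:
            raise KeyError(uri)
        self._url_for[uri] = new_url

    def resolve(self, uri):
        """Return the current URL for a URI."""
        return self._url_for[uri]

    @staticmethod
    def filename_for(uri):
        """Derive a file name from a URI by percent-encoding every reserved character."""
        return quote(uri, safe="")
```

Because the URI stays fixed while `move` changes only the mapped URL, anything that stores the URI (for example a metadata file) keeps working after the data is relocated.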
13. SCEC/CME Metadata Content
- Metadata Content
- We are adopting a name=value approach similar to Java properties files, including dot notation to get sub-sections of info. Names, descriptions, examples, types, and ranges are defined in a data dictionary. We believe this format can be converted to other formats (e.g. XML) easily.
- SeismicSimulation.SimulationSpace.Origin.Latitude = 34.11
- SeismicSimulation.SimulationSpace.Origin.Longitude = -118.47
- SeismicSimulation.SimulationSpace.Origin.Depth = 0
- SeismicSimulation.SimulationSpace.Origin.Datum = WGS-84
- SeismicSimulation.SimulationSpace.CoordinateSystem.Description = right-handed cartesian
- SeismicSimulation.SimulationSpace.CoordinateSystem.PositiveX
- SeismicSimulation.SimulationSpace.CoordinateSystem.PositiveY
- SeismicSimulation.SimulationSpace.CoordinateSystem.PositiveZ
- SeismicSimulation.SimulationSpace.XDimension = 20000
- SeismicSimulation.SimulationSpace.YDimension = 15000
- SeismicSimulation.SimulationSpace.ZDimension = 5000
- SeismicSimulation.PointsPerMinSWavelenth = 5
- How can this type of metadata be stored in a digital library?
- A section (e.g. person info) is repeated frequently. Can we reference other metadata documents? How is the reference handled? By URI?
- How can we determine if a metadata definition is complete/useful/well-done?
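To illustrate the claim that the dotted name/value format converts easily to XML, here is a minimal sketch. It assumes names and values are separated by `=` as in Java properties files, and that no name is both a leaf and a section prefix; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_properties(text):
    """Parse 'dotted.name = value' lines into a nested dict."""
    tree = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, value = line.partition("=")
        *sections, leaf = name.strip().split(".")
        node = tree
        for section in sections:
            node = node.setdefault(section, {})
        node[leaf] = value.strip()
    return tree


def to_xml(name, node):
    """Convert a nested dict (or a leaf string) into an XML element."""
    elem = ET.Element(name)
    if isinstance(node, dict):
        for child_name, child in node.items():
            elem.append(to_xml(child_name, child))
    else:
        elem.text = node
    return elem


SAMPLE = """
SeismicSimulation.SimulationSpace.Origin.Latitude = 34.11
SeismicSimulation.SimulationSpace.Origin.Longitude = -118.47
SeismicSimulation.SimulationSpace.Origin.Datum = WGS-84
"""

metadata = parse_properties(SAMPLE)
root = to_xml("SeismicSimulation", metadata["SeismicSimulation"])
```

`ET.tostring(root, encoding="unicode")` then yields the nested XML, with each dotted section becoming an enclosing element.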
14. SCEC/CME Metadata Handling
- Metadata Handling
- Typically, every resource instance has metadata. Metadata is saved as an external file. Frequently the resource is a binary data file, so combining metadata and data is not possible.
- We plan to generate a metadata file for every resource instance. We will maintain a URI and a URI-to-URL mapping for every metadata file.
- If metadata is stored in the resource file, the metadata URI points to the resource file.
- Metadata file contains its own URI. Metadata file contains the URI of the resource that it describes.
- How can a program that reads a metadata file know how to reach a URI-to-URL resolver?
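One possible answer to the resolver question, offered only as a sketch, is to write the resolver's address into the metadata file alongside the two URIs. Everything here is an assumption made for illustration: the `Metadata.*` key names, the `=` separator, and the idea of embedding a resolver URL are not taken from the SCEC/CME design.

```python
def write_metadata(metadata_uri, resource_uri, resolver_url, fields):
    """Build metadata file text that carries its own URI, the URI of the
    resource it describes, and (hypothetically) the resolver to ask for URLs."""
    lines = [
        f"Metadata.URI = {metadata_uri}",
        f"Metadata.ResourceURI = {resource_uri}",
        f"Metadata.ResolverURL = {resolver_url}",
    ]
    lines += [f"{name} = {value}" for name, value in sorted(fields.items())]
    return "\n".join(lines) + "\n"


def read_header(text):
    """Extract the Metadata.* entries from metadata file text."""
    header = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip().startswith("Metadata."):
            header[name.strip()] = value.strip()
    return header
```

A reading program would call `read_header`, then send `Metadata.ResourceURI` to the address in `Metadata.ResolverURL` to obtain the current URL. The trade-off is that the resolver address itself becomes something that must stay stable.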
15. SCEC/CME Web Services
- Current concept is that every predicate is defined behind a web service. This allows us to specify a predicate with a URL.
- Predicate services will store resulting resources in the digital library. They will also update the SCEC/CME RDBMS with the status of resource instances.
- Predicate web services must schedule jobs to a Globus Grid. How can this be done?
- Web-based assembly tool should not block while computations are done. We need non-blocking calls to web services, not JAX-RPC.
- Plan to pass URLs of input resources to predicate web services. Is there a reason to pass URIs instead?
- Web-service inputs may be binary types. Web services may return binary types (e.g. JPEG map). How can we pass binary types? What technologies do we need in addition to the ones we've specified?
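The non-blocking requirement is commonly met with a submit-then-poll pattern: the service returns a job ID immediately and the caller asks for status later. The sketch below shows that pattern in-process with a thread pool. It is not a web service or a Globus client, and the class `JobService` and its method names are hypothetical.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class JobService:
    """Accepts work, returns a job ID at once, and reports status on request."""

    def __init__(self, max_workers=2):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs = {}  # job ID -> Future

    def submit(self, func, *args):
        """Queue `func(*args)` and return immediately with a job ID."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._pool.submit(func, *args)
        return job_id

    def status(self, job_id):
        """Return 'running' (queued or executing), 'done', or 'failed'."""
        future = self._jobs[job_id]
        if not future.done():
            return "running"
        return "failed" if future.exception() is not None else "done"

    def result(self, job_id, timeout=None):
        """Block until the job finishes and return its result."""
        return self._jobs[job_id].result(timeout=timeout)

    def shutdown(self):
        """Wait for outstanding jobs and release the worker threads."""
        self._pool.shutdown(wait=True)
```

In the pathway setting, the assembly tool would call `submit` when the user specifies Run, show the job as running, and poll `status` (or read the status recorded in the RDBMS) until the output resource is registered.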
16. SCEC/CME Grid Portal
- SCEC/CME requires a web-based user interface.
- SCEC/CME must queue jobs to a Globus Grid in order to meet computational requirements.
- SCEC/CME must use Grid authentication prior to running computational jobs, or prior to providing access to resources.
- What are the current Grid Portal technologies?
- What are the dependencies of Grid Portal technologies?
17. SCEC/CME Design Discussion
- Conclusions
- Action Items
- Milestones
- Demonstrations
- Meeting Notes
- Adjournment