Title: AHM04: Sep 2004 Nottingham
Slide 1
eMinerals: Environment from the Molecular Level
Managing simulation data
- Lisa Blanshard
- e-Science Data Management Group
- CCLRC Daresbury Laboratory, UK
Slide 2
University of Reading
Royal Institution
Slide 4
Discover
Search for a crystal structure
Retrieve
Download crystal structure data
Transform
Convert crystal data into format suitable for application
Transfer
Transfer crystal structure to compute node
Analysis
Run calculation to perform some analysis on crystal
Transform Results
Convert results into format suitable for storage
Results Storage
Transfer results to permanent archive
Catalogue
Catalogue the results
Publish
Make results available online
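The nine stages above form an ordered pipeline. The minimal Python sketch below assumes that each stage is a function which takes and returns a dict of run data. `Run` and `run_workflow` are hypothetical names and are not part of any eMinerals tool. A stage with no handler is recorded as skipped, so gaps in the tool chain become visible.

```python
from dataclasses import dataclass, field

# The nine workflow stages, in the order listed on the slide.
STAGES = [
    "Discover", "Retrieve", "Transform", "Transfer", "Analysis",
    "Transform Results", "Results Storage", "Catalogue", "Publish",
]


@dataclass
class Run:
    """One pass through the workflow (hypothetical helper)."""
    data: dict = field(default_factory=dict)
    history: list = field(default_factory=list)


def run_workflow(handlers, run):
    """Apply each stage's handler in order and record what happened.

    `handlers` maps a stage name to a function that takes and returns the
    run's data dict. A stage without a handler is recorded as "skipped".
    """
    for stage in STAGES:
        handler = handlers.get(stage)
        if handler is None:
            run.history.append((stage, "skipped"))
            continue
        run.data = handler(run.data)
        run.history.append((stage, "done"))
    return run


if __name__ == "__main__":
    # Hypothetical handlers for the first two stages only.
    handlers = {
        "Discover": lambda d: {**d, "structure": "calcite"},
        "Retrieve": lambda d: {**d, "raw_file": "calcite.cif"},
    }
    result = run_workflow(handlers, Run())
    for stage, status in result.history:
        print(f"{stage}: {status}")
```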
Slide 5
CCLRC Data Portal
- search many data resources simultaneously
- uses CCLRC standard for scientific metadata (XML on the wire)
- download scientific datasets directly to own machine for preparation
- transfer datasets to compute node
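The portal sends one query to many data resources at once. The minimal sketch below shows that fan-out pattern. It assumes each resource is wrapped as a Python callable that returns a list of record dicts. The resource functions are hypothetical stand-ins and do not represent the Data Portal's actual interface.

```python
from concurrent.futures import ThreadPoolExecutor


def search_all(resources, keyword):
    """Send one keyword query to every resource in parallel and merge the hits.

    `resources` maps a resource name to a callable that takes a keyword and
    returns a list of record dicts. Each hit is tagged with its source. A
    resource that raises is listed in `failed` instead of aborting the search.
    """
    hits, failed = [], []
    with ThreadPoolExecutor(max_workers=max(1, len(resources))) as pool:
        # Submit all queries first so that they run concurrently.
        futures = {name: pool.submit(query, keyword) for name, query in resources.items()}
        # Then collect the results in a fixed (insertion) order.
        for name, future in futures.items():
            try:
                for record in future.result():
                    hits.append({**record, "source": name})
            except Exception as exc:
                failed.append((name, repr(exc)))
    return hits, failed


if __name__ == "__main__":
    # Hypothetical stand-ins: two catalogues and one unavailable resource.
    def catalogue_a(keyword):
        return [{"title": f"{keyword} structure, run 1"}]

    def catalogue_b(keyword):
        return []

    def offline(keyword):
        raise ConnectionError("resource unavailable")

    hits, failed = search_all({"A": catalogue_a, "B": catalogue_b, "C": offline}, "calcite")
    print(hits)
    print(failed)
```

With this design, one unavailable resource degrades the result list but does not hide the hits from the others.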
Slide 6
- data not necessarily in correct format for application
  - cut and paste
  - conversion code
  - common format, e.g. CML
- some codes on project now produce CML
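The sketch below illustrates the "conversion code" option: it reads a plain-text structure in XYZ format and writes it as CML. It assumes Cartesian coordinates in ångströms and uses only the basic CML `molecule`, `atomArray` and `atom` elements. It omits unit-cell parameters and fractional coordinates, which real crystal data need. The XYZ input is a hypothetical stand-in for whatever text format a given code reads.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"


def parse_xyz(text):
    """Read XYZ text: an atom count, a comment line, then 'element x y z' lines."""
    lines = text.strip().splitlines()
    count = int(lines[0])
    atoms = []
    for line in lines[2:2 + count]:
        element, x, y, z = line.split()[:4]
        atoms.append((element, float(x), float(y), float(z)))
    return atoms


def to_cml(atoms, molecule_id="m1"):
    """Write (element, x, y, z) tuples as a minimal CML molecule.

    Cartesian coordinates go in the x3/y3/z3 attributes of each <atom>.
    """
    ET.register_namespace("", CML_NS)
    molecule = ET.Element(f"{{{CML_NS}}}molecule", id=molecule_id)
    atom_array = ET.SubElement(molecule, f"{{{CML_NS}}}atomArray")
    for i, (element, x, y, z) in enumerate(atoms, start=1):
        ET.SubElement(
            atom_array, f"{{{CML_NS}}}atom",
            id=f"a{i}", elementType=element,
            x3=f"{x:.6f}", y3=f"{y:.6f}", z3=f"{z:.6f}",
        )
    return ET.tostring(molecule, encoding="unicode")


if __name__ == "__main__":
    xyz = "3\nhypothetical SiO2 fragment\nSi 0.0 0.0 0.0\nO 1.61 0.0 0.0\nO -0.54 1.52 0.0\n"
    print(to_cml(parse_xyz(xyz)))
```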
Slide 7
Storage Resource Broker (SRB)
- each institution wants to manage its own files
- however, shared access is desirable within the project
- deployed SRB vaults at several locations
- coordinating SRB and database at CCLRC
- provides a virtual file system; each user has a home directory
- many different interfaces and APIs
- provides a personal workspace, independent of the computational grid, where users can upload their input files
- SRB vaults are professionally managed and backed up, preventing loss of data
- SRB provides a sophisticated access control system
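The core SRB idea on this slide is one logical file system whose files physically live in several vaults, with per-user home directories and access control. The toy Python model below illustrates only that idea. It is not SRB's catalogue schema or API. The `/home/<user>` layout, the vault names and all class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualFileSystem:
    """Toy model of one logical namespace spread across several storage vaults.

    Users see a single path tree with a home directory each. A catalogue maps
    every logical path to the vault and physical path that hold the bytes,
    and records who owns the file and who may read it.
    """
    vaults: dict = field(default_factory=dict)     # vault -> {physical path: bytes}
    catalogue: dict = field(default_factory=dict)  # logical path -> (vault, physical path)
    owners: dict = field(default_factory=dict)     # logical path -> owning user
    readers: dict = field(default_factory=dict)    # logical path -> set of users

    @staticmethod
    def home(user):
        return f"/home/{user}"  # hypothetical layout

    def put(self, user, name, data, vault):
        """Store `data` in `vault` and register it under the user's home directory."""
        logical = f"{self.home(user)}/{name}"
        physical = f"{user}/{name}"
        self.vaults.setdefault(vault, {})[physical] = data
        self.catalogue[logical] = (vault, physical)
        self.owners[logical] = user
        self.readers[logical] = {user}
        return logical

    def grant(self, owner, logical, user):
        """Let the file's owner give another user read access."""
        if self.owners.get(logical) != owner:
            raise PermissionError(f"{owner} does not own {logical}")
        self.readers[logical].add(user)

    def get(self, user, logical):
        """Return the file's bytes, wherever they are stored, if `user` may read them."""
        if user not in self.readers.get(logical, set()):
            raise PermissionError(f"{user} may not read {logical}")
        vault, physical = self.catalogue[logical]
        return self.vaults[vault][physical]


if __name__ == "__main__":
    fs = VirtualFileSystem()
    path = fs.put("alice", "input.dat", b"...", vault="vault-a")
    fs.grant("alice", path, "bob")
    print(path, fs.get("bob", path))
```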
Slide 8
- designated job submission nodes
- allow users to create simple scripts to download input from SRB, run the job on the minigrid and transfer results back to SRB
- use Condor-G as a client to Globus running on the compute clusters
- use SRB S-commands to download and upload files
  - so results are automatically in the permanent archive
  - however, results are stored as generated
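The job scripts described above can be sketched as two generated files:

- a shell wrapper that runs on the compute node and calls the SRB S-commands;
- a Condor-G submit description that sends the wrapper to a Globus gatekeeper.

This minimal Python sketch is not the project's actual scripts. It assumes the SRB client (`Sinit`, `Sget`, `Sput`, `Sexit`) is installed and configured on the compute node, and it uses the Condor-G `globus` universe syntax of that period. It only generates text and runs nothing. The gatekeeper contact string, the paths and the function names are hypothetical.

```python
import posixpath
import shlex


def wrapper_script(srb_input, executable, outputs, srb_results):
    """Shell script for the compute node: fetch input from SRB, run, store results."""
    local_input = posixpath.basename(srb_input)
    lines = [
        "#!/bin/sh",
        "set -e  # stop at the first failing command",
        "Sinit",
        f"Sget {shlex.quote(srb_input)} .",
        f"{shlex.quote(executable)} {shlex.quote(local_input)}",
    ]
    # Upload each output file to the results collection in SRB.
    lines += [f"Sput {shlex.quote(name)} {shlex.quote(srb_results)}" for name in outputs]
    lines.append("Sexit")
    return "\n".join(lines) + "\n"


def submit_description(gatekeeper, wrapper_name):
    """Condor-G submit description that sends the wrapper to a Globus gatekeeper."""
    return "\n".join([
        "universe        = globus",
        f"globusscheduler = {gatekeeper}",
        f"executable      = {wrapper_name}",
        "output          = job.out",
        "error           = job.err",
        "log             = job.log",
        "queue",
    ]) + "\n"


def submission_commands(submit_file="job.sub", log_file="job.log"):
    """Commands for the submission node: submit, then wait for the job to finish."""
    return [["condor_submit", submit_file], ["condor_wait", log_file]]


if __name__ == "__main__":
    print(wrapper_script("/home/alice/run1/input.cml", "/opt/codes/model",
                         ["output.cml"], "/home/alice/run1/results"))
    print(submit_description("cluster.example.ac.uk/jobmanager-pbs", "run_job.sh"))
    for cmd in submission_commands():
        print(" ".join(cmd))  # each could be passed to subprocess.run(cmd, check=True)
```

Because the upload happens inside the wrapper, results land in the archive as soon as the job ends. As the slide notes, they are still stored exactly as the code wrote them.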
Slide 9
Metadata Editor
- forms-based web application to manually create annotation for groups of files
- files are grouped into datasets, and datasets into studies
- each study holds details of investigators, a description of the study, dates, and keywords or topics
- datasets hold the location of a directory of files in SRB or on another file system
- once entered, the metadata and files are available via the Data Portal
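The study/dataset hierarchy described above can be modelled with two record types and a keyword lookup. This minimal sketch includes only the fields named on the slide. The class names, the field names and the `search_by_keyword` helper are hypothetical and do not reproduce the CCLRC Scientific Metadata Model.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Dataset:
    name: str
    location: str  # directory of files in SRB or on another file system


@dataclass
class Study:
    title: str
    investigators: list
    description: str
    start: date
    end: date
    keywords: list = field(default_factory=list)
    datasets: list = field(default_factory=list)


def search_by_keyword(studies, keyword):
    """Return (study title, dataset location) for each dataset in a matching study."""
    wanted = keyword.lower()
    return [
        (study.title, dataset.location)
        for study in studies
        if wanted in {k.lower() for k in study.keywords}
        for dataset in study.datasets
    ]


if __name__ == "__main__":
    study = Study(
        title="Hypothetical sorption study",
        investigators=["A. N. Other"],
        description="Example record only",
        start=date(2004, 1, 1),
        end=date(2004, 6, 30),
        keywords=["sorption", "calcite"],
        datasets=[Dataset("run1", "/home/alice/run1")],
    )
    print(search_by_keyword([study], "Calcite"))
```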
Slide 10
Discover
Data Portal searches across data resources simultaneously, but more resources have to be linked up
Retrieve
Download data from Data Portal or Storage Resource Broker
Transform
Probably have to change file format manually; CML used for input/output of some codes to address this
Transfer
Transfer file to SRB manually via SRB tools
Analysis
Script downloads input from SRB and runs job on grid using Condor-G
Transform Results
Not yet tackled, although some output is in CML
Results Storage
Script transfers results to SRB; results are stored as generated, in either text files or CML
Catalogue
Metadata Editor: catalogue the results via web form; need to generate metadata automatically
Publish
Results then available online via Data Portal
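Generating metadata automatically could start from facts already present in the result files. This minimal sketch assumes the results sit in one local directory, for example after download from SRB. For each file it records the name, size and modification time. For CML output, it also records the element types of the atoms. The record fields and the `harvest` name are hypothetical. Mapping the records onto the catalogue's metadata format is not shown.

```python
import os
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

CML_NS = "http://www.xml-cml.org/schema"


def harvest(directory):
    """Build one metadata record per regular file in `directory`, with no manual entry."""
    records = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        st = os.stat(path)
        record = {
            "file": name,
            "size_bytes": st.st_size,
            "modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat(),
        }
        if name.endswith(".cml"):
            # Collect the distinct element types of all CML <atom> elements.
            try:
                root = ET.parse(path).getroot()
                elements = {a.get("elementType") for a in root.iter(f"{{{CML_NS}}}atom")}
                record["elements"] = sorted(e for e in elements if e)
            except ET.ParseError:
                record["elements"] = None  # file is not well-formed XML
        records.append(record)
    return records


if __name__ == "__main__":
    for rec in harvest("."):
        print(rec)
```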
Slide 11
Particular successes
- deployment of distributed data resources via SRB
- set up project RDBMS for metadata/catalogue information, with interfaces to add/edit metadata and to search
- used CCLRC Multi-disciplinary Scientific Metadata Format for transport of metadata
- use of CML to format input/output of some codes
- integration of data and computation: dedicated nodes to submit jobs via Condor-G, with scripts to download input from and upload results to SRB
Slide 12
Issues to overcome
- many codes still read and write proprietary text formats
- auto-generate metadata
- results are stored as generated; need to consider more sophisticated data storage
- further use of CML
- integrated portals for compute and data to manage the whole workflow
- integrate more data resources for discovery
Slide 13
Further Information
Environment from the Molecular Level
http://www.e-science.clrc.ac.uk/web/projects/eminerals
http://eminerals.org/
UK CCLRC e-Science Centre
http://www.e-science.clrc.ac.uk