Title: The Chemical Knowledge Cycle
1The Chemical Knowledge Cycle and its ramifications for e-Science
Or the other way round
- Jeremy Frey
- School of Chemistry
- University of Southampton, UK
2Talk
- The Comb-e-Chem Project
- Smart Lab
- National Crystallography Service
- Cluster Computing
- Dissemination & Publication
3The Comb-e-Chem Project
- The exponential world of Combinatorial Synthesis and High throughput analysis meets the exponentially growing power of computing
- Funding: EPSRC, JISC, IBM, GSK, AZ, Southampton
4 5People
- Chemistry (Southampton & Bristol): Mike Hursthouse, Chris Frampton, Jon Essex, Jeremy Frey, Guy Orpen, Stephan Christensen, Thomas Gelbrich, Sam Peppe, Hongchen Fu, Graham Tizard, Suzanna Ward, Lefteris Danos, Jamie Robinson, Kieron Taylor, Chris Woods, Rob Gledhill
- National Crystallography Service (NCS): Simon Coles, Mark Light, Ann Bingham, Peter Horton
- Electronics and Computer Science (Southampton): Dave De Roure, Luc Moreau, Mike Luck, Hugo Mills, Graham Smith, Simon Miles, Nicky Harding, Gareth Hughes, Nick Humphries, monica schraefel, Terry Payne
- IT Innovation (Southampton): Mike Surridge, Ken Meacham, Steve Taylor, Daren Marvin
- Statistics (Southampton): Alan Welsh, Sue Lewis, Ralph Manson, Dave Woods
- Rutherford Appleton Laboratory, Atlas Data Centre
- IBM: Colin Bird, Syd Chapman
6Design (statistics)
Plan Access to data
Experiments Smart Labs
CombeChem Data and Knowledge Cycle End-to-End Management
Literature
High Throughput measurement
Analysis Statistics
Dissemination E-Bank Data
7Plans
Small set of fixed plans
NCS
Variable plans, written by chemist (difficult!)
Continuum of plan types
Tea
Ad-hoc, implied by process execution
SHG
8A chemistry lab is a hostile environment without much room to maneuver
What can be captured automatically with sensors? What must rely on manual annotation?
The chemist
The fume cupboard
9Competition for space
very precise scales - but not connected to any
recording device
10Industrial support
Big block to publication_at_source: if it's not digital, it's difficult to share
critical data entry
11By Making Tea!
Getting not just the what and how, but the why
12Making Tea: design elicitation through analogy
- Developed and validated the analogy with chemists
- Gave us a way to ask questions that would not otherwise have been possible
- Let us maximize observation
- Gave us repeatability
- Derived rudiments of a process model, too
- Provided lingua franca with chemists
13Pervasive Grid Smart Flight
Tablet?
14Results
"I can go anywhere and it's, like, this is me and my data. It's all there! Bang!"
- In real use, chemists were able to record their experiments
- After about ten minutes of use, they forgot about it as a new thing, and just used it
15Data model
- Plan: intended actions; guide to chemist, or later workflow
- Process record: measurements, processes, annotations
- Provenance record: service invocations, secure time-stamps, etc.
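The three records on this slide (plan, process record, provenance record) can be pictured as plain data structures. The following is a minimal sketch only, not the CombeChem schema: the class names, field names, and the `tablet-ui` service label are all assumptions made for illustration, and the timestamp is an ordinary UTC clock reading rather than the secure time-stamp the slide calls for.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class PlanStep:
    """An intended action: a guide for the chemist, or later a workflow step."""
    description: str


@dataclass
class ProcessEntry:
    """What was actually done: a process, a measurement or an annotation."""
    kind: str    # "process", "measurement" or "annotation"
    detail: str


@dataclass
class ProvenanceEntry:
    """A service invocation and the time it was logged (not a secure stamp)."""
    service: str
    timestamp: datetime


@dataclass
class Experiment:
    plan: List[PlanStep] = field(default_factory=list)
    process_record: List[ProcessEntry] = field(default_factory=list)
    provenance_record: List[ProvenanceEntry] = field(default_factory=list)

    def record(self, kind: str, detail: str, service: str) -> None:
        """Append to the process record and log which service captured it."""
        self.process_record.append(ProcessEntry(kind, detail))
        self.provenance_record.append(
            ProvenanceEntry(service, datetime.now(timezone.utc))
        )


# Hypothetical usage, borrowing the tea-making analogy from the earlier slides.
exp = Experiment(plan=[PlanStep("Boil water"), PlanStep("Add tea bag")])
exp.record("process", "Boiled water", service="tablet-ui")
exp.record("annotation", "Kettle slow today", service="tablet-ui")
```

Keeping the plan separate from the process record means the record can grow beyond the plan without rewriting it, and every recorded entry gets a matching provenance entry.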
16Databases
- Database will become the key method of handling all data
- Metadata must be generated at inception and added as data traverses the workflow
- Version control, audit and backup handled at the database level.
19Databases - Our experience
- What do you do when the actual users keep changing their mind?
- Is a traditional relational database suitable?
- Danger of reinforcing scientific bias against relational database for laboratory data.
- RDF and RDFS!
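One reason a triple-based model such as RDF copes with users who keep changing their mind is that a new kind of statement needs no schema change. The sketch below illustrates that idea with plain Python tuples; it is not an RDF library and not the project's store, and the `TripleStore` class, the `ex:` predicates and the `sample:42` identifier are all hypothetical.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class TripleStore:
    """A toy in-memory store of subject-predicate-object statements."""

    def __init__(self) -> None:
        self.triples: List[Triple] = []

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.append((s, p, o))

    def match(
        self,
        s: Optional[str] = None,
        p: Optional[str] = None,
        o: Optional[str] = None,
    ) -> List[Triple]:
        """Return triples matching the given parts; None acts as a wildcard."""
        return [
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        ]


store = TripleStore()
store.add("sample:42", "ex:hasMass", "1.25 g")
store.add("sample:42", "ex:measuredBy", "person:chemist1")
# A requirement that appears later is just another statement, with no
# table to alter and no existing rows to migrate.
store.add("sample:42", "ex:storedIn", "fumeCupboard:3")

about_sample = store.match(s="sample:42")
```

In a fixed relational table the third statement would have needed a new column or a new table; here it is simply one more row of the same shape.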
23Lessons
- We need two related ontologies
- Plan: what is going to be done
- Record: what was done
- Not necessarily the same thing
- Steps are added/repeated during the experiment
- Different annotations required for each
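The gap between plan and record can be made concrete by comparing the two step lists. This is an illustrative sketch under simple assumptions: steps are identified by plain strings, and the function name `compare_plan_to_record` and the example steps are invented here, not taken from the project.

```python
from collections import Counter
from typing import Dict, List


def compare_plan_to_record(plan: List[str], record: List[str]) -> Dict[str, List[str]]:
    """Report steps that were added or repeated relative to the plan."""
    planned = Counter(plan)
    done = Counter(record)
    return {
        # Steps that appear in the record but were never planned.
        "added": sorted(step for step in done if step not in planned),
        # Planned steps that were carried out more often than planned.
        "repeated": sorted(
            step for step in done
            if step in planned and done[step] > planned[step]
        ),
    }


plan = ["weigh sample", "dissolve", "heat", "filter"]
record = ["weigh sample", "dissolve", "heat", "stir", "heat", "filter"]
diff = compare_plan_to_record(plan, record)
```

Here `stir` is reported as added and `heat` as repeated, which is the kind of divergence that argues for annotating the plan and the record separately.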
24Process Record Ontology
25NCS Grid Service Architecture
26The Grid Zone
- Security is fundamental
- Who is using our experiments?
- Insulate them from each other and from the rest of our institution
- Process and role-based security
- Use DMZ
- This combination creates a Grid Zone
29Cluster Computation
- Needed for Design of Experiments
- Stats computationally intensive
- Simulations
- Protein dynamics
- Clusters, Cycle Stealing
- Schools engagement: e-Malaria
30What do we want to Compute?
- Combechem is compiling a large database of
molecules. The database contains the properties
of these molecules, e.g. their crystal structure
or solvent accessible surface area (SASA). Some
of these properties are measured from experiment
while others are calculated from simulations run
on the GRID.
- Comberobots continually scan the database for
empty fields. They can automatically submit
simulations to calculate any unknown properties.
These simulations run on the GRID by stealing the
spare cycles of a heterogeneous network of
computers.
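The scan for empty fields described above can be sketched with an in-memory SQLite table. Everything specific here is an assumption for illustration: the table layout, the column names, the molecule names and stored values are placeholders, and the "queue" is just a Python list standing in for whatever job submission the real Comberobots perform.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical property columns; the real database schema is not shown in the talk.
PROPERTIES = ("crystal_structure", "sasa")


def find_missing(conn: sqlite3.Connection) -> List[Tuple[str, str]]:
    """Return (molecule, property) pairs whose value is still empty (NULL)."""
    jobs: List[Tuple[str, str]] = []
    for prop in PROPERTIES:
        # Column names come from the fixed tuple above, never from user input.
        rows = conn.execute(
            f"SELECT name FROM molecules WHERE {prop} IS NULL ORDER BY name"
        )
        jobs.extend((name, prop) for (name,) in rows)
    return jobs


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE molecules "
    "(name TEXT PRIMARY KEY, crystal_structure TEXT, sasa REAL)"
)
conn.executemany(
    "INSERT INTO molecules VALUES (?, ?, ?)",
    [
        ("mol-A", "placeholder-structure", None),  # SASA not yet known
        ("mol-B", None, None),                     # nothing known yet
        ("mol-C", "placeholder-structure", 1.0),   # complete, placeholder values
    ],
)

# Each entry is a simulation that a robot would submit to spare machines.
queue = find_missing(conn)
```

A real robot would then hand each queued pair to a scheduler and write the result back, so that the next scan no longer finds the field empty.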
32The reality of working on the GRID
33Dissemination & Publication
- A different approach is required to provide data to the community
- The grid provides the necessary medium
- What and how do we want to make available?
34Publication_at_Source Dissemination
Bibliography
Student
Journal
Professional Body Archive
Institution
Laboratory
35The Data Trail
Workflow
Raw data
Process
Derive
Model
Plot
Provenance
The graphical model of the workflow used as the
front end of a typical workflow enactor can also
act as the navigation tool for the provenance
publication.
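Using the workflow as a navigation tool for provenance amounts to walking back from a published item to the raw data it came from. The sketch below assumes the stages on this slide form a simple linear chain (Raw data, Process, Derive, Model, Plot); that ordering and the `trail` function are assumptions, since the slide only lists the labels.

```python
from typing import Dict, List, Optional

# Each stage maps to the stage it was derived from (None marks the origin).
derived_from: Dict[str, Optional[str]] = {
    "Plot": "Model",
    "Model": "Derive",
    "Derive": "Process",
    "Process": "Raw data",
    "Raw data": None,
}


def trail(item: str, links: Dict[str, Optional[str]]) -> List[str]:
    """Follow the derived-from links from an item back to its origin."""
    path: List[str] = []
    current: Optional[str] = item
    while current is not None:
        path.append(current)
        current = links[current]
    return path


plot_trail = trail("Plot", derived_from)
```

A reader who starts at the plot can step back through each stage to the raw data, which is the navigation the graphical workflow front end would offer.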
36The need for xtl-Prints
100s of structures
How do we disseminate?
National Crystallography Service
37The need for xtl-Prints
Combechem
DATA
PUBLICATION
DISSEMINATION
Combichem
38Crystallographic e-Prints
39Crystallographic e-Prints
40Direct access to data
41Direct access to data
42Dolphin RDF Browser
43SVG active graphics
44e-worries
WSRF
Must ensure this is not a problem for applications
GTi
45The Semiotic Web
- Chemists use signs and symbols as much as, if not more than, words
- Icons have a great significance: the Periodic Table
- People and computers need to communicate with each other as well as among themselves
- Need a more powerful (general) concept than the semantic web / grid.
46E-Lab
[Diagram. Labels: three E-Labs (Combinatorial Synthesis, Properties Measurement, X-Ray Crystallography), each with Laboratory Processes; Samples; Structures DB; Properties DB; Quantum Mechanical Analysis; Properties Prediction; Data Mining, QSAR, etc; Design of Experiment; Data Provenance; Data Streaming; Visualisation; Authorship/Submission; Agent Assistant]