The Chemical Knowledge Cycle - PowerPoint PPT Presentation

1
The Chemical Knowledge Cycle and its ramifications for e-Science,
or the other way round
  • Jeremy Frey
  • School of Chemistry
  • University of Southampton, UK

2
Talk
  • The Comb-e-Chem Project
  • Smart Lab
  • National Crystallography Service
  • Cluster Computing
  • Dissemination & Publication

3
The Comb-e-Chem Project
  • The exponential world of Combinatorial Synthesis
    and High throughput analysis meets the
    exponentially growing power of computing
  • Funding: EPSRC, JISC, IBM, GSK, AZ, Southampton

4
Comb-e-Chem Partners
  • IT Innovation
  • IBM
  • NCS
  • CCDC
  • Bristol
  • Chemistry
  • ECS
  • Chemistry
  • Pfizer
  • Combi Centre
  • Stats
  • GSK
  • AZ
  • Southampton

5
People
  • Chemistry (Southampton & Bristol)
  • Mike Hursthouse, Chris Frampton, Jon Essex,
    Jeremy Frey, Guy Orpen, Stephan Christensen,
    Thomas Gelbrich, Sam Peppe, Hongchen Fu, Graham
    Tizard, Suzanna Ward, Lefteris Danos, Jamie
    Robinson, Kieron Taylor, Chris Woods, Rob
    Gledhill
  • National Crystallography Service (NCS)
  • Simon Coles, Mark Light, Ann Bingham, Peter
    Horton
  • Electronics and Computer Science (Southampton)
  • Dave De Roure, Luc Moreau, Mike Luck, Hugo
    Mills, Graham Smith, Simon Miles, Nicky Harding,
    Gareth Hughes, Nick Humphries, monica schraefel,
    Terry Payne
  • IT Innovation (Southampton)
  • Mike Surridge, Ken Meacham, Steve Taylor, Daren
    Marvin
  • Statistics (Southampton)
  • Alan Welsh, Sue Lewis, Ralph Manson, Dave Woods
  • Rutherford Appleton Laboratory, Atlas Data Centre
  • IBM: Colin Bird, Syd Chapman

6
CombeChem Data and Knowledge Cycle: End-to-End Management
  • Design (statistics)
  • Plan: Access to data
  • Experiments: Smart Labs
  • Literature
  • High Throughput measurement
  • Analysis: Statistics
  • Dissemination: E-Bank Data

7
Plans
Continuum of plan types
  • Small set of fixed plans, e.g. NCS
  • Variable plans, written by chemist (difficult!), e.g. Tea
  • Ad-hoc, implied by process execution, e.g. SHG

8
A chemistry lab is a hostile environment without
much room to maneuver
What can be captured automatically with sensors?
What must rely on manual annotation?
The chemist
The fume cupboard
9
Competition for space
Very precise scales, but not connected to any
recording device
10
Industrial support
Big block to Publication@Source: if it's not
digital, it's difficult to share
Critical data entry
11
By Making Tea!
Getting not just the what and how, but the why
12
Making Tea: design elicitation through analogy
  • Developed and validated the analogy with chemists
  • Gave us a way to ask questions that would not
    otherwise have been possible
  • Let us maximize observation
  • Gave us repeatability
  • Derived rudiments of a process model, too
  • Provided lingua franca with chemists

13
Pervasive Grid Smart Flight
Tablet?
14
Results
"I can go anywhere and it's, like, this is me and
my data. It's all there! Bang!"
  • In real use, chemists were able to record their
    experiments
  • After about ten minutes of use, they forgot about
    it as a new thing, and just used it

15
Data model
  • Plan: intended actions; guide to chemist, or later
    workflow
  • Process record: measurements, processes, annotations
  • Provenance record: service invocations, secure
    time-stamps, etc.

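
The three layers on this slide can be pictured as three linked records. The following is a minimal sketch only: every class and field name is an assumption made for illustration, the sample values are invented, and an ordinary UTC clock stands in for the secure time-stamp service.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class PlanStep:
    """One intended action: a guide to the chemist, or later a workflow step."""
    description: str


@dataclass
class ProcessEntry:
    """What was actually done: a measurement, a process, or an annotation."""
    kind: str    # "measurement", "process" or "annotation" (assumed labels)
    detail: str


@dataclass
class ProvenanceEntry:
    """A service invocation plus a time-stamp (plain UTC clock, not secure)."""
    service: str
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class Experiment:
    """Groups the plan, the process record and the provenance record."""
    plan: List[PlanStep] = field(default_factory=list)
    process_record: List[ProcessEntry] = field(default_factory=list)
    provenance_record: List[ProvenanceEntry] = field(default_factory=list)


# Invented example values, for illustration only.
exp = Experiment(plan=[PlanStep("weigh sample"), PlanStep("dissolve in solvent")])
exp.process_record.append(ProcessEntry("measurement", "mass = 0.52 g"))
exp.process_record.append(ProcessEntry("annotation", "balance re-zeroed"))
exp.provenance_record.append(ProvenanceEntry("balance-reader"))
```

Keeping the plan separate from the process record lets the record grow beyond what was planned without rewriting the plan itself.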
16
Databases
  • Databases will become the key method of handling
    all data
  • Metadata must be generated at inception and added
    as data traverses the workflow
  • Version control, audit and backup handled at the
    database level.
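
One way to picture metadata being generated at inception and added as data traverses the workflow is sketched below. This is an in-memory illustration under stated assumptions, not the project's database design: the function names, the metadata keys and the sample values are all hypothetical, and a real system would do the versioning and audit inside the database.

```python
from copy import deepcopy
from datetime import datetime, timezone


def create_record(data, creator):
    """Attach metadata at the moment the data is created (inception)."""
    return {
        "data": data,
        "metadata": {
            "creator": creator,
            "created": datetime.now(timezone.utc).isoformat(),
        },
        "history": [],  # audit trail: one entry per workflow stage
    }


def apply_stage(record, stage_name, transform, **extra_metadata):
    """Run one workflow stage and return a NEW record.

    The input record is left untouched, so earlier versions survive.
    """
    new = deepcopy(record)
    new["data"] = transform(record["data"])
    new["metadata"].update(extra_metadata)
    new["history"].append(stage_name)
    return new


raw = create_record([1.0, 2.0, 3.0], creator="chemist-a")
scaled = apply_stage(raw, "scale", lambda xs: [2 * x for x in xs], units="mV")
```

Because each stage returns a copy, `raw` still holds the original data and metadata after `scaled` is produced.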

19
Databases - Our experience
  • What do you do when the actual users keep
    changing their minds?
  • Is a traditional relational database suitable?
  • Danger of reinforcing scientific bias against
    relational databases for laboratory data.
  • RDF & RDFS!
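
Why a triple-based model copes with users who keep changing their minds can be shown with a toy store. The sketch below uses plain Python tuples rather than an RDF library, and the identifiers (`sample:1`, `ex:mass_g`, `ex:colour`) and values are hypothetical.

```python
# Each fact is a (subject, predicate, object) triple, as in RDF.
triples = {
    ("sample:1", "rdf:type", "ex:Sample"),
    ("sample:1", "ex:mass_g", "0.52"),
}

# A user later decides colour matters: add a triple, no schema change needed.
triples.add(("sample:1", "ex:colour", "pale yellow"))


def match(store, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


about_sample = match(triples, s="sample:1")
```

In a fixed relational table the new property would need a new column; here it is just one more row in the same shape.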

23
Lessons
  • We need two related ontologies
  • Plan: what is going to be done
  • Record: what was done
  • Not necessarily the same thing
  • Steps are added/repeated during the experiment
  • Different annotations required for each
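
The gap between plan and record can be made concrete by comparing the two step lists. This is a small sketch using the tea-making analogy from the earlier slides; the step names are invented, and counting step names is only one assumed way of comparing a plan with a record.

```python
from collections import Counter

# Invented example steps, following the tea-making analogy.
planned = ["boil water", "add tea bag", "pour water", "add milk"]
recorded = ["boil water", "add tea bag", "pour water", "stir", "stir", "add milk"]

plan_counts = Counter(planned)
record_counts = Counter(recorded)

# Steps that appear in the record but were never planned.
added = [s for s in record_counts if s not in plan_counts]

# Steps carried out more than once and more often than the plan called for.
repeated = [
    s for s, n in record_counts.items()
    if n > max(plan_counts.get(s, 0), 1)
]

# Planned steps that were never carried out.
skipped = [s for s in plan_counts if s not in record_counts]
```

Here "stir" is both added and repeated, which is why the record needs its own structure rather than being a ticked-off copy of the plan.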

24
Process Record Ontology
25
NCS Grid Service Architecture
26
The Grid Zone
  • Security is fundamental
  • Who is using our experiments?
  • Insulate them from each other and from the rest
    of our institution
  • Process & role-based security
  • Use DMZ
  • This combination creates a Grid Zone
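
Role-based security that also insulates users from each other might look roughly like the check below. It is a hedged sketch, not the NCS implementation: the role names, action names and the ownership rule are all assumptions.

```python
# Hypothetical roles and the actions each one may perform.
ROLE_PERMISSIONS = {
    "submitter": {"submit_sample", "view_own_results"},
    "operator": {"submit_sample", "view_own_results", "run_experiment"},
}


def is_allowed(role, action, owner, requester):
    """Return True if the role grants the action.

    Viewing results additionally requires that the requester owns them,
    which keeps one user's data invisible to another.
    """
    if action not in ROLE_PERMISSIONS.get(role, set()):
        return False
    if action == "view_own_results":
        return owner == requester
    return True
```

For example, `is_allowed("submitter", "view_own_results", "alice", "bob")` is rejected because the results belong to a different user, even though the role permits the action.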

29
Cluster Computation
  • Needed for Design of Experiments
  • Stats computationally intensive
  • Simulations
  • Protein dynamics
  • Clusters, Cycle Stealing
  • Schools engagement: e-Malaria

30
What do we want to Compute?
  • Combechem is compiling a large database of
    molecules. The database contains the properties
    of these molecules, e.g. their crystal structure
    or solvent accessible surface area (SASA). Some
    of these properties are measured from experiment
    while others are calculated from simulations run
    on the GRID.
  • Comberobots continually scan the database for
    empty fields. They can automatically submit
    simulations to calculate any unknown properties.
    These simulations run on the GRID by stealing the
    spare cycles of a heterogeneous network of
    computers.
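
The scan-and-submit behaviour described for the Comberobots can be sketched as a loop over the database that queues a job for each empty field. This is an illustration under assumptions: the dictionary stands in for the real database, the molecule identifiers and values are invented, and the queue stands in for submission to the GRID.

```python
from collections import deque

# Toy stand-in for the molecule database; None marks an empty field.
molecules = {
    "mol-001": {"crystal_structure": "known", "sasa": None},
    "mol-002": {"crystal_structure": None, "sasa": 312.4},
}


def find_empty_fields(db):
    """Yield (molecule_id, property_name) for every missing property."""
    for mol_id, props in sorted(db.items()):
        for name, value in sorted(props.items()):
            if value is None:
                yield mol_id, name


def scan_and_submit(db, queue, already_queued):
    """One robot pass: queue a simulation for each newly seen gap."""
    for job in find_empty_fields(db):
        if job not in already_queued:
            already_queued.add(job)
            queue.append(job)


job_queue = deque()
seen = set()
scan_and_submit(molecules, job_queue, seen)
scan_and_submit(molecules, job_queue, seen)  # second pass adds nothing new
```

Tracking what has already been queued matters because the robots scan continually; without it every pass would resubmit the same simulations.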

32
The reality of working on the GRID
33
Dissemination & Publication
  • A different approach is required to provide data
    to the community
  • The grid provides the necessary medium
  • What & how do we want to make available?

34
Publication@Source: Dissemination
  • Bibliography
  • Student
  • Journal
  • Professional Body Archive
  • Institution
  • Laboratory
35
The Data Trail
  • Workflow: Raw data, Process, Derive, Model, Plot
  • Provenance: the graphical model of the workflow used
    as the front end of a typical workflow enactor can
    also act as the navigation tool for the provenance
    publication.
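
Using the workflow graph to navigate provenance amounts to walking from a published result back to the raw data. The sketch below assumes a simple linear trail; the artefact names and the `derived_from` mapping are hypothetical.

```python
# Each artefact maps to the artefacts it was derived from (assumed trail).
derived_from = {
    "plot": ["model"],
    "model": ["derived data"],
    "derived data": ["processed data"],
    "processed data": ["raw data"],
    "raw data": [],
}


def trail(artefact, graph):
    """Walk back from an artefact to its origins, visiting each node once."""
    seen, order, stack = set(), [], [artefact]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        stack.extend(graph.get(node, []))
    return order
```

Calling `trail("plot", derived_from)` lists every artefact a reader would pass through on the way from the plot back to the raw data.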
36
The need for xtl-Prints
100s of structures
How do we disseminate?
National Crystallography Service
37
The need for xtl-Prints
Combechem
DATA
PUBLICATION
DISSEMINATION
Combichem
38
Crystallographic e-Prints
39
Crystallographic e-Prints
40
Direct access to data
  • DERIVED DATA

41
Direct access to data
  • RAW DATA

42
Dolphin RDF Browser
43
SVG active graphics
44
e-worries
WSRF
Must ensure this is not a problem for applications
GTi
45
The Semiotic Web
  • Chemists use signs and symbols as much as, if not
    more than words
  • Icons have a great significance: the Periodic
    Table
  • People & computers need to communicate with each
    other as well as themselves
  • Need a more powerful (general) concept than the
    semantic web & grid.

46
  • Changing the way we work

[Diagram labels: E-Lab (x3): Combinatorial Synthesis, Properties Measurement, X-Ray Crystallography; Samples; Laboratory Processes; Structures DB; Properties DB; Quantum Mechanical Analysis; Properties Prediction; Data Mining, QSAR, etc; Design of Experiment; Data Provenance; Data Streaming; Authorship/Submission; Visualisation; Agent Assistant]