Smart Storage for Physical Properties

Provided by: combe5
Learn more at: http://www.combechem.org

1
Smart Storage for Physical Properties
  • Or
  • How on Earth do we Store this Stuff?

Kieron Taylor with Jeremy Frey and Jonathan Essex
2
What makes up chemical data?
  • Numbers - big, small, precise and vague
  • Circumstances - How hot? What pressure?
  • Assumptions
  • This is pretty pure, let's say it's pure
  • Standard conditions? More or less
  • That peak on the spectrum isn't important

3
Using the Data: QSPR
  • Take lots of data
  • Magical statistics occur
  • Validate results
  • Predictive model
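
The four steps above can be sketched in a few lines. This is only an illustration of the fit-then-validate loop, not the statistics used in the talk: the single descriptor, the straight-line model and the leave-one-out check are all assumptions, and the numbers are synthetic values invented for the example rather than chemical measurements.

```python
# Synthetic (descriptor, property) pairs, invented purely for illustration.
data = [(1.0, 3.1), (2.0, 4.9), (3.0, 7.2), (4.0, 8.8), (5.0, 11.1), (6.0, 12.9)]


def fit(points):
    """Ordinary least-squares line through the points; returns a predictor."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def leave_one_out_rmse(points):
    """Validate by refitting without each point, then predicting that point."""
    squared_errors = []
    for i, (x, y) in enumerate(points):
        model = fit(points[:i] + points[i + 1:])
        squared_errors.append((model(x) - y) ** 2)
    return (sum(squared_errors) / len(squared_errors)) ** 0.5


model = fit(data)                 # "magical statistics"
rmse = leave_one_out_rmse(data)   # "validate results"
print(f"prediction at x=7: {model(7.0):.2f}, leave-one-out RMSE: {rmse:.2f}")
```

The validation step only means something if the input values are trustworthy, which is why the rest of the talk is about storing where each number came from.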
4
So What is Real Data like?
  • Bad - take the commercial Physprop Database
  • Can we handle these melting points?

5
Let's Make a Database
  • One data source is not enough
  • Good(?) data isn't free
  • Different sources have varied style of content
  • Most database software not suited to data mining
  • We cannot simply plumb these varied sources for data; we
    must reconcile them first to make sensible statistics
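
One small part of reconciling sources is bringing values onto a common unit before they are compared. The sketch below assumes three hypothetical sources that report the same melting point in different temperature units; the `RawValue` class, the `to_celsius` helper and the source names are made up for illustration and do not describe any particular database.

```python
from dataclasses import dataclass


@dataclass
class RawValue:
    source: str   # hypothetical source label
    value: float
    unit: str     # unit as written by the source: "C", "K" or "F"


def to_celsius(raw):
    """Convert a temperature to degrees Celsius; fail loudly on unknown units."""
    unit = raw.unit.strip().upper()
    if unit == "C":
        return raw.value
    if unit == "K":
        return raw.value - 273.15
    if unit == "F":
        return (raw.value - 32.0) * 5.0 / 9.0
    raise ValueError(f"unknown temperature unit {raw.unit!r} from {raw.source}")


# The same melting point, -31 C, as three sources might write it.
reports = [
    RawValue("source A", -31.0, "C"),
    RawValue("source B", 242.15, "K"),
    RawValue("source C", -23.8, "F"),
]
celsius = [to_celsius(r) for r in reports]
print(celsius)
```

Raising an error on an unrecognised unit, rather than guessing, keeps a badly labelled value out of the statistics.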

6
Relational Design
For one molecule: Cyclohexanone

Property       Value  Units
Solubility     2500   mg/L
Melting point  -31    C
Boiling point  155.4  C

Property       Value  Error   Units  Source
Solubility     2500   +/-50   mg/L   Physprop
               2650   +/-60   mg/L   Our lab
Melting point  -31    +/-0.1  C      Detherm
Boiling point  155.4  +/-0.5  C      Merck Index

Property       Value  Error   Units  Source       Method      Author
Solubility     2500   +/-50   mg/L   Physprop     Laboratory  ...
               2650   +/-60   mg/L   Southampton  Simulation  Me
Melting point  -31    +/-0.1  C      Detherm      Laboratory  ...
Boiling point  155.4  +/-0.5  C      Merck Index  Laboratory  ...

Property       Value  Error   Units  Source       Method        Author  Note
Solubility     2500   +/-50   mg/L   Physprop     Laboratory    ...
               2650   +/-60   mg/L   Southampton  Simulation    Me      Superseded
               2599   +/-25   mg/L   Southampton  Simulation B  Me
Melting point  -31    +/-0.1  C      Detherm      Laboratory    ...
Boiling point  155.4  +/-0.5  C      Merck Index  Laboratory    ...     Decomposing

Arbitrary numbers of points are hard to store in relational databases.
We're not done yet. We still have to account for multiple experimental
conditions, statements of validity and molecules.
Provenance = Senary relational model?
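
To make the growth of the table concrete, here is a minimal sketch of the last version of the cyclohexanone table as a single relational table, using Python's built-in sqlite3 module. The table and column names are assumptions chosen to mirror the slide headings, and authors shown as "..." on the slide are stored as NULL rather than guessed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE measurement (
        molecule TEXT, property TEXT, value REAL, error REAL,
        units TEXT, source TEXT, method TEXT, author TEXT, note TEXT
    )
    """
)

# Rows copied from the slide; None marks cells that are empty or elided there.
rows = [
    ("Cyclohexanone", "Solubility", 2500, 50, "mg/L", "Physprop", "Laboratory", None, None),
    ("Cyclohexanone", "Solubility", 2650, 60, "mg/L", "Southampton", "Simulation", "Me", "Superseded"),
    ("Cyclohexanone", "Solubility", 2599, 25, "mg/L", "Southampton", "Simulation B", "Me", None),
    ("Cyclohexanone", "Melting point", -31, 0.1, "C", "Detherm", "Laboratory", None, None),
    ("Cyclohexanone", "Boiling point", 155.4, 0.5, "C", "Merck Index", "Laboratory", None, "Decomposing"),
]
conn.executemany("INSERT INTO measurement VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", rows)

# Solubility values that have not been marked as superseded.
current = conn.execute(
    "SELECT value, error, source FROM measurement "
    "WHERE property = 'Solubility' "
    "AND (note IS NULL OR note != 'Superseded') "
    "ORDER BY value"
).fetchall()
print(current)
```

This works for the columns shown, but every further kind of provenance or experimental condition means another column or another joined table, which is the difficulty the slide points at.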
7
RDF Triplestore is the Solution
  • RDF describes trees and networks of entities
  • Data of this complexity lends itself well to a
    tree representation
  • RDF trees enable additional clever things
  • Triplestores provide persistent RDF models
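
As a rough picture of what "trees of entities" means here, the sketch below writes one solubility measurement as plain (subject, predicate, object) tuples and walks the resulting tree. It uses no RDF library, and the `ex:` names are hypothetical identifiers invented for this example, not the vocabulary used in the project.

```python
# Each statement is a (subject, predicate, object) triple.
triples = [
    ("ex:cyclohexanone", "ex:hasMeasurement", "ex:m1"),
    ("ex:m1", "ex:property", "Solubility"),
    ("ex:m1", "ex:value", 2500),
    ("ex:m1", "ex:error", 50),
    ("ex:m1", "ex:units", "mg/L"),
    ("ex:m1", "ex:source", "Physprop"),
    ("ex:m1", "ex:method", "Laboratory"),
]


def describe(subject, triples, depth=0):
    """Return indented lines for everything stated below a subject.

    Assumes the statements form a tree (no cycles), as in this example.
    """
    subjects = {s for s, _, _ in triples}
    lines = []
    for s, p, o in triples:
        if s == subject:
            lines.append("  " * depth + f"{p} -> {o}")
            if o in subjects:
                lines.extend(describe(o, triples, depth + 1))
    return lines


lines = describe("ex:cyclohexanone", triples)
print("\n".join(lines))
```

Adding a new kind of fact, such as a note or an experimental condition, is one more triple on the measurement node rather than a change to a table definition.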

8
(No Transcript)
9
(No Transcript)
10
What can we do with this?
  • Store almost any chemical data as normal
  • Track the where, when and how of each and every
    data point
  • Filter values down by whether they are real or simulated,
    old or new, from a particular source, or produced by a
    particular person.
  • Bolt on RDF schemas such as FOAF and our units
    system.
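
The filtering described above can be sketched with simple pattern matching over triples. This is plain Python rather than a real triplestore query, which would go through the store's own query language; the `ex:` predicate names and the `match` and `measurements_with` helpers are hypothetical, and the values are the solubility figures from the relational design slide.

```python
triples = [
    ("ex:m1", "ex:value", 2500),
    ("ex:m1", "ex:source", "Physprop"),
    ("ex:m1", "ex:method", "Laboratory"),
    ("ex:m2", "ex:value", 2650),
    ("ex:m2", "ex:source", "Southampton"),
    ("ex:m2", "ex:method", "Simulation"),
    ("ex:m2", "ex:note", "Superseded"),
    ("ex:m3", "ex:value", 2599),
    ("ex:m3", "ex:source", "Southampton"),
    ("ex:m3", "ex:method", "Simulation B"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def measurements_with(triples, predicate, obj):
    """Set of measurement nodes that carry the given predicate and object."""
    return {s for s, _, _ in match(triples, p=predicate, o=obj)}


# Southampton values that have not been superseded.
keep = (measurements_with(triples, "ex:source", "Southampton")
        - measurements_with(triples, "ex:note", "Superseded"))
values = sorted(o for s, _, o in match(triples, p="ex:value") if s in keep)
print(values)
```

Filtering by person would follow the same pattern with an author predicate in place of the source.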

11
What have we done with this?
  • http://green.chem.soton.ac.uk/triangle/query.html

12
Thanks to
  • AKT and Steve Harris for 3store
  • Rob Gledhill for web tech and discussion
  • Perl for s/ / /g