Title: An evaluation of representing proteomics experiments within FuGE-OM
1 An evaluation of representing proteomics experiments within FuGE-OM
- Andrew Jones
- Department of Computer Science, University of Manchester
- Angel Pizarro
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania
2 Requirements of functional genomics
- FG produces large, complex data sets (lots of metadata!)
- Various standard formats being developed (PSI, MGED, SMRS)
- Most investigation types have a hypothesis, biological sample, basic lab procedures etc.
- Similar analyses performed over different types of FG data set
- Queries, statistics and so on
3 Benefits of shared model components
- Queries over common annotation
- Samples, hypotheses, protocols
- Shared software for experimental annotation and analysis
- Microarrays, proteomics and metabolomics performed in same lab
- Developing standards for each technique is a hard problem
- Shared resources could alleviate problems
4 The Functional Genomics Experiment Object Model (FuGE-OM)
- Aim
- Model to support generation of functional genomics data formats
- Learn from experience of MAGE-OM (and PEDRo)
- Divide into generic and technology-specific
- Result
- FuGE-OM is far simpler than MAGE-OM
- Covers a wide range of use cases due to its generic structure
- 2 namespaces in core: Bio and Common
- MAGE extension in development
5 Purpose of this Evaluation
- Test if proteome experiments could be modelled within FuGE-OM
- To verify if Common and Bio are generic to functional genomics
- Demonstrate correspondence with PSI-OM (and PEDRo) for experiment, lab procedures etc.
- Does FuGE-OM give additional features to the models?
- Gain feedback from microarray / proteome community on modelling issues
6 FuGE.Common
FuGE.Bio
- Audit
- Description
- Ontology
- Protocol
- Reference
- Experiment
- Material
- BioSequence
- Data
Common does not rely upon Bio; it can be used independently
7 Common.Protocol
- Action
- Ordered list of atomic actions (simple protocol steps)
- Plain text or standard term from CV
- Allows nesting of protocols
- (Complex actions become a nested protocol)
- Association to Equipment and Software
- Parameters and default values can be captured
The class Protocol describes a standard protocol
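A minimal sketch of how the Protocol structure described above could look in code. The class and field names (Parameter, Action.child, Protocol.flatten) and the example protocols are assumptions made for illustration, not the FuGE-OM class definitions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Parameter:
    name: str
    default: Optional[str] = None       # default value captured with the protocol


@dataclass
class Action:
    text: str                           # plain text or a term from a CV
    child: Optional["Protocol"] = None  # a complex action becomes a nested protocol


@dataclass
class Protocol:
    name: str
    actions: list = field(default_factory=list)     # ordered list of Action
    parameters: list = field(default_factory=list)  # list of Parameter
    equipment: list = field(default_factory=list)
    software: list = field(default_factory=list)

    def flatten(self, depth=0):
        """Return (depth, text) pairs in order, expanding nested protocols."""
        steps = []
        for action in self.actions:
            steps.append((depth, action.text))
            if action.child is not None:
                steps.extend(action.child.flatten(depth + 1))
        return steps


# Hypothetical example protocols
staining = Protocol("Gel staining", actions=[Action("Fix gel"), Action("Stain")])
gel_run = Protocol(
    "2D gel run",
    actions=[
        Action("Load sample"),
        Action("Run first dimension"),
        Action("Stain gel", child=staining),
    ],
    parameters=[Parameter("voltage", "200 V")],
)
```

Calling gel_run.flatten() lists the top-level actions in order, with the steps of the nested staining protocol following the action that refers to it.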
8 Common.Protocol
- Date performed
- If parameter values differ from default
- Particular instrument used e.g. from a pool
- Protocol operator (Person)
ProtocolApplication is an instance of a Protocol
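The relationship between a Protocol and its ProtocolApplication can be sketched as follows. This is an illustration under assumptions: the field names, the operator string standing in for a Person, and the example date and values are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Protocol:
    name: str
    defaults: dict = field(default_factory=dict)   # parameter name -> default value


@dataclass
class ProtocolApplication:
    protocol: Protocol
    performed: date                                # date performed
    operator: str                                  # stands in for a Person
    instrument: str = ""                           # particular instrument used
    overrides: dict = field(default_factory=dict)  # only values that differ from default

    def effective_parameters(self):
        """Protocol defaults overlaid with the deviations recorded for this run."""
        return {**self.protocol.defaults, **self.overrides}


# Hypothetical example values
separation = Protocol("Gel separation", {"voltage_V": 200, "time_min": 60})
run = ProtocolApplication(
    separation,
    performed=date(2005, 1, 1),
    operator="A. Operator",
    instrument="Gel tank 2",
    overrides={"time_min": 75},
)
```

Only the deviation (time_min) is stored on the application; every other parameter is read from the standard protocol.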
9 Common.Protocol
1. Material package
- Convert one type of substance to another, e.g. protein solubilisation
2. Data package
- Acquire data by analysing a particular material, e.g. gel scanning
3. Data package
- Create a new data set by transforming an existing data set, e.g. gel image analysis
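The three kinds of protocol application can be sketched as typed steps: material to material, material to data, and data to data. MaterialTreatment and DataAcquisition are names used elsewhere in these slides; DataTransformation, the provenance helper and the example items are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Material:
    name: str


@dataclass(frozen=True)
class Data:
    name: str


@dataclass(frozen=True)
class MaterialTreatment:      # Material package: substance -> substance
    source: Material
    product: Material


@dataclass(frozen=True)
class DataAcquisition:        # Data package: material -> data
    source: Material
    product: Data


@dataclass(frozen=True)
class DataTransformation:     # Data package: data -> data (assumed name)
    source: Data
    product: Data


def provenance(item, steps):
    """Walk back from an item to its starting material; return names in forward order."""
    by_product = {step.product: step for step in steps}
    chain = [item.name]
    while item in by_product:
        item = by_product[item].source
        chain.append(item.name)
    return list(reversed(chain))


extract = Material("protein extract")
solubilised = Material("solubilised protein")
gel = Material("separated gel")
image = Data("gel image")
spots = Data("spot list")

steps = [
    MaterialTreatment(extract, solubilised),   # protein solubilisation
    MaterialTreatment(solubilised, gel),       # gel separation
    DataAcquisition(gel, image),               # gel scanning
    DataTransformation(image, spots),          # gel image analysis
]
```

Here provenance(spots, steps) traces the spot list back through the image and gel to the original extract.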
10 Bio.Experiment
- Package can describe a wide range of investigation types
- Class names are open for discussion!
11 PSI-OM Study
- Experimental Factors can be biological or technical
- e.g. Time course
- Factor Value captures difference between factors
- e.g. Time points / strains / treatments
- Link to Data dimension
Experiment similar to Project, ExperimentDesign similar to Study
12 Experiment: Find gene protein expression in three different mouse strains
Description: Replicate design, Quality control etc.
OT: OntologyTerm
- Experiment
- ExperimentDesign, DesignType: BiologicalProperty (OT)
- ExperimentalFactor, FactorType: BioSourceType (OT)
- FactorValue: Strain A (OT), bioSource: Material
- FactorValue: Strain B (OT), bioSource: Material
- FactorValue: Strain C (OT), bioSource: Material
- Data of Interest
Link from each FactorValue to corresponding data values
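The three-strain example can be written out as nested objects. This is a sketch under assumptions: the classes are simplified stand-ins for the Bio.Experiment classes, ontology terms and materials are plain strings, and the material names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FactorValue:
    term: str        # ontology term, e.g. "Strain A"
    material: str    # bioSource link, reduced to a material name here


@dataclass
class ExperimentalFactor:
    factor_type: str                              # ontology term
    values: list = field(default_factory=list)    # list of FactorValue


@dataclass
class ExperimentDesign:
    design_type: str                              # ontology term
    factors: list = field(default_factory=list)   # list of ExperimentalFactor


@dataclass
class Experiment:
    hypothesis: str
    designs: list = field(default_factory=list)   # list of ExperimentDesign


def materials_by_factor_value(experiment):
    """Map each factor value term to the material it points at."""
    return {
        value.term: value.material
        for design in experiment.designs
        for factor in design.factors
        for value in factor.values
    }


strain_factor = ExperimentalFactor(
    "BioSourceType",
    [FactorValue(f"Strain {s}", f"mouse sample {s}") for s in "ABC"],
)
experiment = Experiment(
    "Find gene protein expression in three different mouse strains",
    [ExperimentDesign("BiologicalProperty", [strain_factor])],
)
```

The time-point and mass-spectrometer examples would follow the same pattern, changing only the design type, factor type and factor values.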
13 Experiment: Find gene protein expression sampling from a population at 4 time points
Description: Replicate design, Quality control etc.
OT: OntologyTerm
- Experiment
- ExperimentDesign, DesignType: MethodologicalDesign (OT)
- ExperimentalFactor, FactorType: TimePoint (OT)
- FactorValue: 1 hour (OT)
- FactorValue: 2 hour (OT)
- FactorValue: 4 hours (OT)
- FactorValue: 24 hours (OT)
- bioSource: Material
- Data of Interest
Information about how sampling was done in Material package
14 Experiment: Find difference in protein expression detected using two types of mass spectrometer
Description: Replicate design, Quality control etc.
OT: OntologyTerm
- Experiment
- ExperimentDesign, DesignType: MethodologicalDesign (OT)
- ExperimentalFactor, FactorType: hardwareVariation (OT)
- FactorValue: Micromass (OT)
- FactorValue: Q-star (OT)
- bioSource: Material
- Data of Interest
Difference between workflows only in the Data package: different DataAcquisitions
15 Bio.Material
- Materials have a type and various characteristics, described using an ontology
Can have sub-components e.g. Plate and Plate Well
Used for describing a cycle of treatments (in conjunction with Protocol)
16 Proteome workflow
Diagram: Material and Material Treatment alternate (Material, Material Treatment, Material, Material Treatment, Material), with Material Measurement on the links
- Gel2D (Material): Dimensions, Composition etc.
- Gel Separation (Material Treatment): Voltage, Time
- Separated Gel (Material)
- Spot Picking (Material Treatment): X/Y Coords
- Create subclasses of Material and MaterialTreatment to specify attributes about gels and separation
- Allows use of other Bio and Common features: ontology references (for Material), link to Experiment, auditing etc.
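Subclassing Material and MaterialTreatment for gel work might look like the sketch below. The attribute names and units (width_mm, voltage_V and so on) and the example values are assumptions; the slides only name the kinds of attribute (dimensions, composition, voltage, time, X/Y coordinates).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Material:
    name: str
    material_type: Optional[str] = None   # ontology reference in the model


@dataclass
class MaterialTreatment:
    source: Material
    product: Material


@dataclass
class Gel2D(Material):
    width_mm: float = 0.0
    height_mm: float = 0.0
    composition: str = ""


@dataclass
class SeparatedGel(Material):
    pass


@dataclass
class GelSeparation(MaterialTreatment):
    voltage_V: float = 0.0
    time_min: float = 0.0


@dataclass
class SpotPicking(MaterialTreatment):
    x: float = 0.0
    y: float = 0.0


# Hypothetical example values
gel = Gel2D("gel 1", width_mm=180, height_mm=200, composition="12% acrylamide")
separated = SeparatedGel("gel 1 (separated)")
separation = GelSeparation(gel, separated, voltage_V=200, time_min=300)
spot = Material("spot 17")
picking = SpotPicking(separated, spot, x=10.5, y=42.0)
```

Because the subclasses are still Materials and MaterialTreatments, any generic code that follows source and product links works on them unchanged.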
17 PSI-OM Assay
Diagram: the same chain of Material and Material Treatment (with Material Measurement), shown against Gel2D, Gel Separation, Separated Gel and Spot Picking
18 Proteome workflow
Diagram: Material and Material Treatment alternate (Material, Material Treatment, Material, Material Treatment, Material), with Material Measurement on the links
- Column: Type, Size, Beads etc.
- Column Separation: Time, Flow rate
- Collect Fraction: startPoint, endPoint
- Extensions allow specific attributes to be captured about column, separation and collection of fractions
- End Material can go to any other kind of treatment e.g. mass spec / gel etc.
19 2D LC-MS use case
Yeast culture
1. Extract Proteins
2. Digest into peptides
Ion exchange
Reverse phase
Mass Spectrometry
Peak lists
MassSpec Informatics
Protein list: ABC1 DFR2 DFF .
20 2D LC-MS use case
Diagram elements: Protocol Application, CollectFraction, mzData, mzIdent, IdentifiedProtein (1 to n association)
Legend: Bio or Common; PSI extension
Various ways mzData / mzIdent could reference a FuGE entry
21 DIGE Use Case
1. Extract protein (two samples)
2. Pool
3. Labelling (Cy2, Cy3, Cy5)
4. Pool
5. Gel Electrophoresis
6. Imaging
22 DIGE Use Case
- Material (Sample A), Material (Sample B): described using ontologies
- MaterialTreatment (ProteinExtraction) on each sample gives Material (Protein mix.)
- Mat. Treat. gives Material (pooled)
- Material (Cy2), Material (Cy3), Material (Cy5)
- MaterialTreatment (Labelling) for each dye gives Material (Cy2 Labelled), Material (Cy3 Labelled), Material (Cy5 Labelled)
- MaterialTreatment (ProteinExtraction) is shown as a Protocol Application of a Protocol
23 DIGE Use Case
- Material (Cy2 Labelled), Material (Cy3 Labelled), Material (Cy5 Labelled)
- Mat. Treat. (pooling) gives Material (Pooled mix.)
- Gel2D (gel params), GelSeparation (separation params), SeparatedGel
- DataAcquisition (Scanning) gives ImageData
Legend: Bio or Common; PSI extension
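Tracking materials through extraction, labelling and pooling can be sketched as a walk over treatments that may have several inputs. The Treatment class and origins function are hypothetical helpers. The assignment of dyes to samples (Cy2 on the pooled standard, Cy3 on sample A, Cy5 on sample B) is an assumption for the example, not something the slides state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Treatment:
    kind: str        # e.g. "ProteinExtraction", "Labelling", "Pooling"
    sources: tuple   # names of the input materials
    product: str     # name of the output material


def origins(material, treatments):
    """Return the set of starting materials that contributed to a material."""
    by_product = {t.product: t for t in treatments}
    if material not in by_product:
        return {material}   # nothing produced it, so it is a starting material
    found = set()
    for source in by_product[material].sources:
        found |= origins(source, treatments)
    return found


treatments = [
    Treatment("ProteinExtraction", ("Sample A",), "Protein mix A"),
    Treatment("ProteinExtraction", ("Sample B",), "Protein mix B"),
    Treatment("Pooling", ("Protein mix A", "Protein mix B"), "Pooled standard"),
    Treatment("Labelling", ("Pooled standard", "Cy2"), "Cy2 Labelled"),
    Treatment("Labelling", ("Protein mix A", "Cy3"), "Cy3 Labelled"),
    Treatment("Labelling", ("Protein mix B", "Cy5"), "Cy5 Labelled"),
    Treatment("Pooling", ("Cy2 Labelled", "Cy3 Labelled", "Cy5 Labelled"), "Pooled mix"),
]
```

Asking for origins("Pooled mix", treatments) recovers both samples and all three dyes, while origins("Cy3 Labelled", treatments) recovers only Sample A and Cy3.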
24 mzData
- Could be linked by mapping mzData back to UML
- OR
- Use an XLink to associate a FuGE-ML entry with an mzData doc
- (for sample processing / experiment description etc.)
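The XLink option could be sketched as below, using Python's standard xml.etree.ElementTree to attach a simple XLink to an element. The element names (DataAcquisition as an XML element, ExternalData), the identifier and the file name are assumptions for illustration; no FuGE-ML schema is given in these slides. Only the XLink namespace and the xlink:type and xlink:href attributes come from the W3C XLink specification.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("xlink", XLINK)   # serialise with the conventional prefix


def link_to_mzdata(parent, href):
    """Add a child element carrying a simple XLink to an mzData document."""
    ref = ET.SubElement(parent, "ExternalData")   # hypothetical element name
    ref.set(f"{{{XLINK}}}type", "simple")
    ref.set(f"{{{XLINK}}}href", href)
    return ref


# Hypothetical FuGE-ML fragment and file name
acquisition = ET.Element("DataAcquisition", {"identifier": "acq-1"})
link_to_mzdata(acquisition, "run01.mzData.xml")
xml_text = ET.tostring(acquisition, encoding="unicode")
```

With this approach the mzData document stays unchanged, and the sample processing and experiment description live in the FuGE-ML entry that points at it.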
25 Issues for community involvement
- How can FuGE help PSI model development process?
- Does Experiment package capture all proteome / RSBI use cases?
- How could mzData be integrated with FuGE?
- E.g. XLink, UML level (shared model components)
26 Conclusions
- Common and Bio allow descriptions of experiment, lab procedures, data dimensions
- Could give additional annotation capabilities to current proteome models
- PSI could extend components to make explicit proteome concepts to be captured
- Mass spec formats could reference a FuGE entry for samples, hypothesis, experiment etc.
27 Plan for future work
- XML Schema generated in near future for creating test data sets
- Use cases defined and made available to demonstrate correct use of model
- Weekly conference call to solve modelling issues
- We would like to encourage PSI involvement in further development of Bio and Common
http://fuge.sourceforge.net
28 Acknowledgements
- Modelling
- Paul Spellman, Chris Taylor, Norman Paton and many others at MGED meetings
http://fuge.sourceforge.net
29 ICAT Use Case
30 DIGE Use Case
Tracking materials through separations / labelling / pooling
31 FactorValue
Diagram: Data held in a Matrix with Dimensions made up of DimensionElements
- Dimension: DimensionElements for Strain A, Strain B, Strain C, each linked from a FactorValue
- Dimension: DimensionElements for Array Spot 1 / Gel Spot 1, Array Spot 2 / Gel Spot 2, Array Spot 3 / Gel Spot 3, ...
- DimensionElements for Measure 1, Measure 2, Measure 3
32 Common.Data
- Ordered set of Dimensions
- Data stored in Matrix
- Matrix must be extended with subclasses
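A minimal sketch of an ordered set of Dimensions with values held in a Matrix that is extended by a typed subclass. The method names (put, get, check) and the FloatMatrix subclass are assumptions for illustration; the element labels are taken from the strain and gel spot examples in these slides, and the stored value is made up.

```python
from dataclasses import dataclass, field


@dataclass
class Dimension:
    name: str
    elements: list                      # ordered DimensionElement labels


@dataclass
class Matrix:
    dimensions: list                    # ordered list of Dimension
    cells: dict = field(default_factory=dict)   # index tuple -> value

    def _index(self, labels):
        if len(labels) != len(self.dimensions):
            raise ValueError("one label per dimension is required")
        return tuple(
            dim.elements.index(label) for dim, label in zip(self.dimensions, labels)
        )

    def put(self, labels, value):
        self.cells[self._index(labels)] = self.check(value)

    def get(self, labels):
        return self.cells[self._index(labels)]

    def check(self, value):
        # The base class stores nothing itself: a subclass decides the value type.
        raise NotImplementedError("extend Matrix with a typed subclass")


class FloatMatrix(Matrix):
    def check(self, value):
        return float(value)


strains = Dimension("strain", ["Strain A", "Strain B", "Strain C"])
spots = Dimension("spot", ["Gel Spot 1", "Gel Spot 2", "Gel Spot 3"])
matrix = FloatMatrix([strains, spots])
matrix.put(("Strain B", "Gel Spot 3"), 1250)   # hypothetical measurement
```

Because the dimensions are ordered, a value is addressed by one element per dimension, which is how a FactorValue such as Strain B can be linked to its corresponding data values.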
33 Common.Ontology
- Source of term captured in DatabaseEntry
- Term stored in name
- Value: some ontology concepts need a value supplied by user
- Self-association for nested terms
34 class: Age
namespace: http://mged.sourceforge.net/ontologies/MGEDOntology.daml
documentation: The time period elapsed since an identifiable point in the life cycle of an organism. If a developmental stage is specified, the identifiable point would be the beginning of that stage. Otherwise the identifiable point must be specified such as planting (e.g. 3 days post planting).
constraints:
- restriction: has_measurement has-class Measurement
- restriction: has_initial_time_point has-class InitialTimePoint
Example: 3 days post planting
- OntologyTerm (Name: Age, Value: null)
- hasMeasurement: OntologyTerm (Name: Measurement, Value: 3)
- hasUnit: OntologyTerm (Name: Unit, Value: null), nesting OntologyTerm (Name: days, Value: null)
- hasInitialTimePoint: OntologyTerm (Name: InitialTimePoint, Value: null), nesting OntologyTerm (Name: planting, Value: null)
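The "3 days post planting" example can be built from one self-associating term class. This is a sketch under assumptions: the children list, the add and walk helpers, and the use of None for unnamed nesting links are illustrative choices, not the FuGE-OM definitions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OntologyTerm:
    name: str
    value: Optional[str] = None                     # only some concepts need a value
    children: list = field(default_factory=list)    # (relation, OntologyTerm) pairs

    def add(self, relation, term):
        """Nest a term under this one and return the nested term."""
        self.children.append((relation, term))
        return term


def walk(term, depth=0):
    """Yield (depth, name, value) for a term and everything nested in it."""
    yield depth, term.name, term.value
    for _relation, child in term.children:
        yield from walk(child, depth + 1)


age = OntologyTerm("Age")
measurement = age.add("hasMeasurement", OntologyTerm("Measurement", "3"))
unit = measurement.add("hasUnit", OntologyTerm("Unit"))
unit.add(None, OntologyTerm("days"))
start = age.add("hasInitialTimePoint", OntologyTerm("InitialTimePoint"))
start.add(None, OntologyTerm("planting"))
```

Six OntologyTerm instances are enough to capture the example, and only the Measurement term carries a user-supplied value.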
35 Common.Reference
Subclasses of Identifiable can be linked to bibliographic or database entries
36 Common.Description
- Many classes inherit from Describable
- Link to Audit / Security details
- URI and text description
37 Common.Audit
- Manages changes to the document
- Linked to Contacts