Title: A Systematic Approach to Modelling, Capturing and Disseminating Proteomics Experimental Data
1 Promoting Coherent Minimum Reporting Guidelines for Biological and Biomedical Investigations: The MIBBI Project
Chris Taylor, EMBL-EBI & NEBC
chris.taylor_at_ebi.ac.uk
MIBBI: www.mibbi.org
HUPO Proteomics Standards Initiative: psidev.sf.net
Research Information Network: www.rin.ac.uk
2 On standards bodies
- What defines a standards-generating body?
- A beer and an airline (Zappa)
- Formats, reporting guidelines, controlled vocabularies
- Regular open attendance meetings, discussion lists, etc.
- e.g., MGED (transcriptomics), PSI (proteomics), GSC (genomics)
- Hugely dependent on their respective communities
- Requirements gathering (What are we doing and why?)
- Development (By the people, for the people)
- Testing (No it isn't finished, but yes I'd like you to use it)
- Uptake by stakeholders
- Publishers, funders, vendors, tool/database developers
- The user community (capture, store, search, analyse)
3 Modelling the biosciences
[Figure: technology labels include MS, Gels, NMR, Arrays, Columns, FTIR, Scanning]
4 Modelling the biosciences (slightly differently)
Investigation: Medical syndrome, environmental effect, etc.
Study: Toxicology, environmental science, etc.
Assay: Omics and miscellaneous techniques
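The three-level view above (an investigation contains studies, and a study contains assays) can be sketched as a small data structure. This is only an illustration under stated assumptions: the class and field names below are hypothetical and are not taken from the ISA infrastructure or any other specification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    technique: str  # hypothetical field, e.g. an omics technique


@dataclass
class Study:
    discipline: str  # hypothetical field, e.g. "toxicology"
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    topic: str  # hypothetical field, e.g. a syndrome or environmental effect
    studies: List[Study] = field(default_factory=list)

    def techniques(self) -> List[str]:
        """Return every assay technique used anywhere in the investigation."""
        return [a.technique for s in self.studies for a in s.assays]


# Illustrative values only, loosely following the examples on the slide.
investigation = Investigation(
    topic="environmental effect",
    studies=[
        Study("toxicology", [Assay("proteomics"), Assay("metabolomics")]),
        Study("environmental science", [Assay("transcriptomics")]),
    ],
)
print(investigation.techniques())
```

Nesting the levels this way makes it easy to ask which techniques, and therefore which reporting guidelines, are relevant to a whole investigation.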
5 Multiply all that by three (kinds of standard)
6 What biologists need
7 Well-oiled cogs meshing perfectly (would be nice)
- How well are things working?
- Cue the Tower of Babel analogy
- Situation is improving with respect to standards
- But few tools, fewer carrots (though some sticks)
- Why do we care about that..?
- Data exchange
- Comprehensibility of work
- Scope for reuse (parallel or orthogonal)
8 Rise of the Metaprojects
- Investigation / Study / Assay (ISA) Infrastructure
- http://isatab.sourceforge.net/
- Ontology of Biomedical Investigations (OBI)
- http://obi.sourceforge.net/
- Functional Genomics Experiment (FuGE)
- http://fuge.sourceforge.net/
9 Reporting guidelines: a case in point
- MIAME, MIAPE, MIAPA, MIACA, MIARE, MIFACE, MISFISHIE, MIGS, MIMIx, MIQAS, MIRIAM, (MIAFGE, MIAO), My Goodness
- MI checklists usually developed independently, by groups working within particular biological or technological domains
- Difficult to obtain an overview of the full range of checklists
- Tracking the evolution of single checklists is non-trivial
- Checklists are inevitably partially redundant one against another
- Where they overlap, arbitrary decisions on wording and sub-structuring make integration difficult
- Significant difficulties for those who routinely combine information from multiple biological domains and technology platforms
- Example: An investigation looking at the impact of toxins on a sentinel species using proteomics (eco-toxico-proteomics)
- What reporting standard(s) should they be using?
10 The MIBBI Project (mibbi.org)
- International collaboration between communities developing Minimum Information (MI) checklists
- Two distinct goals (Portal and Foundry)
- Raise awareness of various minimum reporting specifications
- Promote gradual integration of checklists
- Lots of enthusiasm (drafters, users, funders, journals)
- 31 projects committed (to the portal) to date, including:
- MIGS, MINSEQE & MINIMESS (genomics, sequencing)
- MIAME (µarrays), MIAPE (proteomics), CIMR (metabolomics)
- MIGen & MIQAS (genotyping), MIARE (RNAi), MISFISHIE (in situ)
11 Nature Biotechnol 26(8), 889-896 (2008)
http://dx.doi.org/10.1038/nbt.1411
12 The MIBBI Project (www.mibbi.org)
13 The MIBBI Project (www.mibbi.org)
14 The MIBBI Project (www.mibbi.org)
Interaction graph for projects (line thickness and colour saturation show similarity)
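The slides do not say how similarity between projects is measured. As a rough sketch only, one plausible measure is the Jaccard index over the items each checklist asks for; both the measure and the item sets below are assumptions made for illustration, not MIBBI's actual method or real checklist content.

```python
from itertools import combinations

# Hypothetical, heavily simplified item sets; real checklists are far longer.
checklists = {
    "MIAME": {"sample source", "experimental design", "array design", "data processing"},
    "MIAPE": {"sample source", "experimental design", "instrument settings", "data processing"},
    "MIGS": {"sample source", "sequencing method", "geographic location"},
}


def jaccard(a, b):
    """Shared items divided by all distinct items across both sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Pairwise scores; in a graph these could drive edge thickness or saturation.
similarity = {
    (x, y): jaccard(checklists[x], checklists[y])
    for x, y in combinations(sorted(checklists), 2)
}

for (x, y), score in sorted(similarity.items(), key=lambda kv: -kv[1]):
    print(f"{x} / {y}: {score:.2f}")
```

A score near 1 would indicate two checklists that largely request the same information, which is where integration effort is most likely to pay off.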
15 The MIBBI Project (www.mibbi.org)
17 MICheckout: Supporting Users
19 The objections to fuller reporting
- Why should I dedicate resources to providing data to others?
- Pro bono arguments have no impact
- Sticks from funders and publishers get the bare minimum
- This is just a make-work scheme for bioinformaticians
- Bioinformaticians get a buzz out of having big databases
- Bioinformaticians benefitting from others' work
- I don't trust anyone else's data; I'd rather repeat work
- Problems of quality, which are justified to an extent
- But what of people lacking resource for this, or people who want to refer to proteomics data but don't do proteomics?
- How on earth am I supposed to do this anyway..?
- Perception that there is no money to pay for this
- No mature free tools; Excel sheets are no good for HT
- Worries about vendor support, legacy systems (business models)
20 Credit where credit's due
- Data sharing is more or less a given now, and tools are emerging
- Lots of sticks, but they only get the bare minimum
- How to get the best out of data generators?
- Only meaningful credit will work
- Need central registries of data sets that can record reuse
- Well-presented, detailed papers get cited more frequently
- The same principle should apply to data sets
- So, OpenIDs for people, DOIs for data?
- Side-benefits, challenges
- Would also clear up problems around paper authorship
- Would enable other kinds of credit (training, curation, etc.)
- May have to be self-policing: researchers own their credit portfolio (though an enforcement body would also be useful)
- Problem of micro data sets and legacy data
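The idea of a central registry that records reuse, with identifiers for both people and data sets, could look something like the sketch below. Everything here is hypothetical: the class, its methods, and the placeholder identifiers are invented for illustration and do not describe any existing registry or service.

```python
from collections import defaultdict


class DataSetRegistry:
    """Toy in-memory registry linking data sets to contributors and reuse."""

    def __init__(self):
        self._owners = {}                 # data set ID -> contributor ID
        self._reuses = defaultdict(list)  # data set ID -> list of reusers

    def register(self, dataset_id, contributor_id):
        if dataset_id in self._owners:
            raise ValueError(f"{dataset_id} is already registered")
        self._owners[dataset_id] = contributor_id

    def record_reuse(self, dataset_id, reused_by):
        if dataset_id not in self._owners:
            raise KeyError(f"{dataset_id} is not registered")
        self._reuses[dataset_id].append(reused_by)

    def credit(self, contributor_id):
        """Count recorded reuses across all of a contributor's data sets."""
        return sum(
            len(self._reuses.get(dataset_id, []))
            for dataset_id, owner in self._owners.items()
            if owner == contributor_id
        )


# Placeholder identifiers, not real DOIs or OpenIDs.
registry = DataSetRegistry()
alice = "https://openid.example/alice"
registry.register("doi:10.0000/example.1", alice)
registry.register("doi:10.0000/example.2", alice)
registry.record_reuse("doi:10.0000/example.1", "https://openid.example/bob")
registry.record_reuse("doi:10.0000/example.2", "https://openid.example/carol")
print(registry.credit(alice))
```

Counting reuse per contributor is the simplest possible credit measure; a real scheme would also need to handle the micro data sets and legacy data mentioned above.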