Title: UKOLN is supported by:
1Monica Duke, m.duke_at_ukoln.ac.uk, Project Manager, SageCite Project
http://blogs.ukoln.ac.uk/sagecite / sagecite
Developing Data Attribution and Citation Practices and Standards: An International Symposium and Workshop, August 22-23, 2011
2- Citation in the domain of disease network modelling
- Funded August 2010 to July 2011
3SageCite project overview
- Review of data citation (issues, technology)
- Understanding the domain
- Sage Bionetworks partners in project
- Site visit
- Documenting processes (workflow tools)
4SageCite project overview
- Demonstrator
- Adding support for data citation
- Using DataCite services
- Working with publishers
- Benefits analysis KRDS Taxonomy
6www.sagebase.org
- US-based non-profit organisation
- Creating a resource for community-based, data-intensive biological discovery
- Community-based analysis is required to build accurate models
9Slide by Lara Mangravite, Sage Bionetworks
10Sage data and processes
- Idealised 7-stage process
- A combination of phenotypic, genetic, and expression data is processed to determine a list of genes associated with diseases
- Different people are responsible for different stages of the modelling process. One person oversees the whole process.
11- Stage 1 Data Curation
- basic data validation to ensure integrity and completeness
- datasets include microarray data and clinical data.
- ensures that the format of the data is understood and the required metadata is present.
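A minimal sketch of what a Stage 1 check of this kind might look like, assuming a dataset is held as a metadata dictionary plus a list of row dictionaries. The field names in `REQUIRED_METADATA`, the `CurationReport` class and the `curate` function are hypothetical illustrations, not part of the Sage Bionetworks process or tooling.

```python
from dataclasses import dataclass, field

# Hypothetical required fields; the slides do not list the actual ones.
REQUIRED_METADATA = ("title", "creator", "format")


@dataclass
class CurationReport:
    """Collects the problems found during a basic curation pass."""
    missing_metadata: list = field(default_factory=list)
    incomplete_rows: list = field(default_factory=list)

    @property
    def ok(self):
        # The dataset passes only if nothing was flagged.
        return not self.missing_metadata and not self.incomplete_rows


def curate(metadata, rows, columns):
    """Check that required metadata is present and every row is complete."""
    report = CurationReport()
    for key in REQUIRED_METADATA:
        if not metadata.get(key):
            report.missing_metadata.append(key)
    for index, row in enumerate(rows):
        # A row is incomplete if any expected column is absent or empty.
        if any(row.get(col) in (None, "") for col in columns):
            report.incomplete_rows.append(index)
    return report
```

Calling `curate(metadata, rows, ["sample", "value"])` returns a report whose `ok` property is true only when no metadata field is missing and no row has an empty value in the listed columns.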
13Agreeing standards to support sharing
- Derry J et al. Developing predictive Molecular Maps of Human Disease through Community-based Modeling.
- http://precedings.nature.com/documents/5883/version/1/files/npre20115883-1.pdf
14Workflow capture using Taverna http://www.vimeo.com/27287109
- Documenting data processes through workflow tools
- supports better citation
- makes the cited resource more re-usable
- strengthens the reproducibility and validation of the research.
15Data Citation Purposes
- For attribution
- Leading to credit and reward
- For reproducibility
- Supports validation, re-use
- Eric Schadt at Sage Bionetworks Congress 2011
- http://fora.tv/2011/04/16/Eric_Schadt_Map_Building (start at 4.28)
16Open challenges: attribution
- Preserving link with original data
- Some discipline-based repositories have their own identifiers
- Bi-directional links
- Attributing data creators
- including individuals?
- Defining creation of new intellectual object, e.g. curated dataset?
- Cultural challenge in recognising non-standard contributions: microattribution
- New metrics
- Identification of contributors
17Open challenges: reproducibility
- Identification and granularity
- Discipline identifiers, global identifiers
- How much value has been added since the data entered the workflow?
- Identifying processes and software
18Acknowledgements
- University of Manchester
- Carole Goble
- Peter Li
- British Library
- Max Wilkinson
- Tom Pollard
- Sage Bionetworks
- UKOLN
- Liz Lyon
- Monica Duke
- Nature Genetics
- Myles Axton
- PLoS Comp Bio
- Phil Bourne