Title: Crosslinking and Referencing Data and Publications in CLADDIER
1Cross-linking and Referencing Data and
Publications in CLADDIER
- Brian Matthews,
- E-Science Centre,
- STFC Rutherford Appleton Laboratory
2About CLADDIER
Citation, Location and Deposition in Discipline
and Institutional Repositories
Funded via a JISC grant through the Digital
Repositories programme, July 2005 to Oct 2007
- Bryan Lawrence (PI, BADC)
- Sam Pepler (Project Manager, BADC)
- Sue Latham (BADC)
- Pauline Simpson (NOCS)
- Jessie Hey (Southampton)
- Brian Matthews (STFC)
- Catherine Jones (STFC)
- Alistair Miles (STFC)
- Katie Portwin (STFC)
- Shoaib Sufi (STFC)
- Kevin ONeil (STFC)
- Katherine Bouton (Reading, NCAS)
4Citation and linking in repositories
- To achieve this scenario we need to provide a set of key mechanisms:
- Publishing of data
- Conventions for the citation of data
- Data citations can then be treated in a similar way to publication citations
- Browsing and searching
- across different repositories
- across data and publications
- Cross-citation of data and publications
- forward and backward citation
- need to keep citation links current
- A simple mechanism to push citation information between repositories
- A practical look at citation of data and how repositories could communicate citation information.
5Data Publication
- In this context publication is defined as the
process through which data is fixed and made
retrievable over the long term, and may imply
that there has been some quality control process.
- Defining data: fixing and encapsulating a meaningful data set
- Quality control: publishers, data centres
Natural Environment Research Council, Mesosphere-Stratosphere-Troposphere Radar Facility [Thomas, L.; Vaughan, G.]. Mesosphere-Stratosphere-Troposphere Radar Facility at Aberystwyth, [Internet]. Version 2, Cartesian products. British Atmospheric Data Centre (BADC), 1990- [cited 2006 Apr 25]. Available from http://badc.nerc.ac.uk/data/mst.
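As a rough illustration of what a citation convention for data might look like in practice, the sketch below assembles the MST radar citation from separate fields. The field names and the `DataCitation` class are hypothetical, and the bracket placement ("[Internet]", "[cited ...]") is an assumption based on common citation styles, not a format defined by CLADDIER.

```python
from dataclasses import dataclass


@dataclass
class DataCitation:
    # All field names are illustrative assumptions, not a CLADDIER schema.
    funder: str
    facility: str
    authors: list
    title: str
    version: str
    publisher: str
    start_year: str
    cited_on: str
    url: str

    def format(self) -> str:
        """Render the fields as a single human-readable citation string."""
        authors = "; ".join(self.authors)
        return (
            f"{self.funder}, {self.facility} [{authors}]. "
            f"{self.title}, [Internet]. {self.version}. "
            f"{self.publisher}, {self.start_year}- [cited {self.cited_on}]. "
            f"Available from {self.url}."
        )


mst = DataCitation(
    funder="Natural Environment Research Council",
    facility="Mesosphere-Stratosphere-Troposphere Radar Facility",
    authors=["Thomas, L.", "Vaughan, G."],
    title="Mesosphere-Stratosphere-Troposphere Radar Facility at Aberystwyth",
    version="Version 2, Cartesian products",
    publisher="British Atmospheric Data Centre (BADC)",
    start_year="1990",
    cited_on="2006 Apr 25",
    url="http://badc.nerc.ac.uk/data/mst",
)
print(mst.format())
```

Keeping the version and the "cited" date as separate fields reflects the point that a published data set is fixed: the citation says which version was used and when it was retrieved.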
6Browsing and Searching
- Browsing and searching
- across different repositories
- across data and publication
- CLADDIER has provided a harvesting and search
tool to support cross-repository searching
7Discovery Service
- The Discovery Service gives a broad-brush search
- Returns both publications and data sets
- indexed by keyword
- "Google" across repositories
- Uses OAI-PMH: a conventional approach
- Simple but it works!
- Simple keyword searching
- Three participating repositories in the pilot: BADC, STFC ePubs, SOTON ePrints
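A minimal sketch of the harvest-and-index idea, assuming each repository exposes Dublin Core records over OAI-PMH. It is not the CLADDIER tool itself: the base URL, the sample response and the helper names are invented for illustration, and the response is parsed from a string so that the example runs without network access.

```python
from collections import defaultdict
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Standard OAI-PMH and Dublin Core XML namespaces.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL for a repository."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


# Hypothetical response; a real harvester would fetch this over HTTP.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:data-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>MST Radar wind profiles</dc:title>
          <dc:type>Dataset</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example.org:pub-7</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Analysis of MST radar observations</dc:title>
          <dc:type>Text</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text):
    """Return (identifier, title, type) tuples from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        kind = record.findtext(".//dc:type", default="", namespaces=NS)
        records.append((identifier, title, kind))
    return records


def build_index(records):
    """Map each lower-cased title word to the identifiers that contain it."""
    index = defaultdict(set)
    for identifier, title, _kind in records:
        for word in title.lower().split():
            index[word].add(identifier)
    return index


def search(index, keyword):
    """Return matching identifiers in sorted order."""
    return sorted(index.get(keyword.lower(), set()))


index = build_index(parse_records(SAMPLE_RESPONSE))
print(list_records_url("http://example.org/oai"))
print(search(index, "radar"))
```

A keyword hit returns the data set and the paper side by side, which is the broad-brush behaviour described above; it says nothing about whether the two are actually related.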
8Adding Cross-Citations
- A keyword search cannot tell whether the data and publication are actually related.
- what data and publications inspire a piece of work (generating a new data set)
- what publications arise from a data set
- We need to exploit the concept of cross-citation to see whether items are actually related.
Traditional Citation
Cross Citation
9Maintaining Links
- Ideally the archives holding the datasets and publications would be notified that a paper citing them had been submitted.
- Metadata associated with those records would be updated to reflect the citations.
- The metadata in the publication repository should also link to the metadata in the data archives and vice versa.
- It would be great if this notification could be done automatically.
- Tedious to enter citations
- forward citations (cited-by) are hard to track
- We adapted a protocol from the world of blogging
- Trackback
- Designed to allow cross-referencing of blog articles
- Extended to allow richer metadata
10Trackback Protocol
11Sender Publication
This publication has a citation to a technical
report
12Adds Citation
Sends trackback call to this URI
13Embedded Metadata
Trackback URI
Formats accepted
14After Trackback: cited-by link added
Receiver Publication
Added this cited-by link
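The exchange shown on these slides can be sketched with the standard blog Trackback ping: a form-encoded HTTP POST to the receiver's trackback URI, answered by a small XML document with an error flag. The sketch below only builds the request body and interprets a response, so it runs offline. The standard fields are `url`, `title`, `excerpt` and `blog_name`; the `extra` argument standing in for the richer citation metadata is a hypothetical placeholder, since the actual CLADDIER field names are not given here.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def build_ping(url, title, excerpt="", repository="", extra=None):
    """Build the POST body and headers for a Trackback ping.

    `extra` is a hypothetical hook for additional citation metadata;
    its keys are not defined by the standard Trackback specification.
    """
    fields = {
        "url": url,                # the citing item (sender)
        "title": title,
        "excerpt": excerpt,
        "blog_name": repository,   # reused here for the sending repository
    }
    fields.update(extra or {})
    body = urlencode({k: v for k, v in fields.items() if v})
    headers = {"Content-Type": "application/x-www-form-urlencoded; charset=utf-8"}
    return body.encode("utf-8"), headers


def parse_response(xml_text):
    """Return (ok, message) from a Trackback response document."""
    root = ET.fromstring(xml_text)
    error = root.findtext("error", default="1").strip()
    message = root.findtext("message", default="")
    return error == "0", message


body, headers = build_ping(
    url="http://example.org/pub/1",
    title="A paper citing a technical report",
    repository="Example ePubs",
)
print(body)
print(parse_response("<response><error>0</error></response>"))
```

In a real deployment the body would be POSTed to the trackback URI advertised in the receiver's embedded metadata, and the receiver would add the cited-by link once the ping is accepted.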
15Notes on Trackback
- A simple existing protocol
- P2P: loosely federates repositories
- Extended to carry metadata of the citation
- To add cited-by links
- Can also indicate which metadata is expected
- Simple Dublin Core
- ePrints Application Profile
- Can also use the metadata of the receiver
- Improves the citation metadata
- Implemented in ePubs
- Also partially in BADC
- Receiver only: sends email to admin.
- Some problems or extensions are under consideration
- Link to metadata, not full text
- Spamming: anyone could send trackbacks
- Whitelists
- Administrator intervention
- Multiple entries
- Same citation multiple times
- Same citation in different repositories
- Retraction of citation
- A delete protocol
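The open issues listed above (spam, repeated entries, retraction) suggest some receiver-side bookkeeping. The following is one possible sketch, not the ePubs or BADC implementation: the `CitedByRegistry` class, its method names and the host-based whitelist are all assumptions made for illustration. It does not address the same citation arriving from different repositories.

```python
from urllib.parse import urlparse


class CitedByRegistry:
    """Hypothetical receiver-side store of cited-by links."""

    def __init__(self, trusted_hosts):
        self.trusted_hosts = set(trusted_hosts)  # whitelist of sender hosts
        self.cited_by = {}   # target id -> set of normalised source URLs
        self.pending = []    # pings held for administrator intervention

    @staticmethod
    def _normalise(url):
        # Lower-case scheme and host and drop any trailing slash so that
        # trivially different spellings of one URL compare equal.
        parts = urlparse(url.strip())
        path = parts.path.rstrip("/")
        return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}"

    def receive(self, target_id, source_url):
        """Handle a ping: returns 'added', 'duplicate' or 'pending'."""
        source = self._normalise(source_url)
        if urlparse(source).hostname not in self.trusted_hosts:
            self.pending.append((target_id, source))
            return "pending"
        links = self.cited_by.setdefault(target_id, set())
        if source in links:
            return "duplicate"
        links.add(source)
        return "added"

    def retract(self, target_id, source_url):
        """Remove a cited-by link; returns True if a link was removed."""
        source = self._normalise(source_url)
        links = self.cited_by.get(target_id, set())
        if source in links:
            links.remove(source)
            return True
        return False


registry = CitedByRegistry({"epubs.example.org"})
print(registry.receive("data-1", "http://epubs.example.org/pub/1"))
print(registry.receive("data-1", "http://EPUBS.example.org/pub/1/"))
print(registry.receive("data-1", "http://spam.example.com/x"))
```

Holding pings from unknown hosts in a pending queue, instead of rejecting them, leaves room for the administrator intervention mentioned above.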
16Conclusions
- CLADDIER supports the scientific process with federated repositories
- This requires a cross-linked network of information objects
- which needs to be stored, maintained and searched
- Now doing some user testing
- Tools and ideas relatively straightforward
- Lots of gluing of existing components
- Keep it simple so it will get used
- http://claddier.badc.ac.uk/