Title: Tomorrow, and tomorrow, and tomorrow: the players on the curation stage

Slides: 64
Provided by: oclcOrgco
Learn more at: https://www.oclc.org
Transcript and Presenter's Notes

1
Tomorrow, and tomorrow, and tomorrow: the
players on the curation stage
  • Chris Rusbridge
  • Presentation at OCLC

2
  • "To-morrow, and to-morrow, and to-morrow,
  • Creeps in this petty pace from day to day,
  • To the last syllable of recorded time;
  • And all our yesterdays have lighted fools
  • The way to dusty death.
  • Out, out, brief candle!
  • Life's but a walking shadow; a poor player,
  • That struts and frets his hour upon the stage,
  • And then is heard no more: it is a tale
  • Told by an idiot, full of sound and fury,
  • Signifying nothing."
  • Shakespeare, Macbeth

3
  • Dunsinane Hill
  • Photo by Fabrice

4
(No Transcript)
5
(No Transcript)
6
Contents
  • Curation and the Digital Curation Centre
  • Science and Data Citations
  • The poor players of data curation
  • Sustainability of curated data
  • Macbeth again

7
Curation
  • Data increasingly important as evidence
  • Experimental verifiability (the basis of science)
  • Unrepeatable observations & experiments
    (particularly environmental, in the broadest sense)
  • Legal, compliance transactions
  • Cultural resources
  • Preservation view vs Publishing view

8
Lynch remarks
  • Closing the Curation Conference
  • 3 views of digital curation
  • Finite process, handover to preservation
  • Whole life process, evolving object(s)
  • Collection as a living thing

9
Digital curation?
For later use
Static
Digital preservation
10
Digital curation?
For later use
In use now (and the future)
Static
Dynamic Long-term
Digital preservation
Digital curation
11
Digital curation
For later use
In use now (and the future)
Static
Dynamic Long-term
Digital curation preservation
maintaining and adding value to a trusted body
of digital information for current and future
use
12
Mission
  • The over-riding purpose of the DCC is to
    support and promote continuing improvement in the
    quality of data curation, and of associated
    digital preservation

13
Organisation to Engage & Collaborate
curation organisations, e.g. DPC
communities of practice & users
community support & outreach
service definition & delivery
management & admin support
Associates Network
research collaborators
research
development co-ordination
testbeds & tools
Industry
standards bodies
14
Organisation to Engage & Collaborate: Leads
curation organisations, e.g. DPC
communities of practice & users
Bath
Associates Network
research collaborators
Glasgow
Edinburgh
Edinburgh
CCLRC
testbeds & tools
Industry
standards bodies
15
Associated work
  • DCC LOCKSS Technical Support Service
  • (Lots of Copies Keep Stuff Safe)
  • DCC SCARP Project
  • Disciplinary approaches to sharing, curation,
    re-use and preservation
  • EU projects associated
  • CASPAR
  • Digital Preservation Europe
  • PLANETS

16
Phase 2
  • Externally-moderated, reflective self-evaluation
    completed
  • Phase 2 proposal (2007/10) to JISC
  • Accepted: focus on science data, reduced scale
  • EPSRC-funded Research continues until 2007/8

17
2nd International Digital Curation Conference
  • Research invited presentations
  • Glasgow, 21/22 November, 2006
  • Please register at http://www.dcc.ac.uk/events/dcc-2006/

18
(No Transcript)
19
Data resource stages
  • Curated data is created
  • Observations? Fixed!
  • Or Acquired
  • Data brought/bought from outside
  • Ingest
  • Development
  • Derived, refined, combined, processed data
  • Potentially many stages

20
2MASS (Infrared)
SDSS (Visual)
Slide from Rajendra Bose
21
Slide from Rajendra Bose
22
New discovery
  • National Virtual Observatory
  • Johns Hopkins press release: "Scientists working
    to create the NVO, an online portal for
    astronomical research unifying dozens of large
    astronomical databases, confirmed discovery of
    a new brown dwarf recently. The star emerged
    from a computerized search of information on
    millions of astronomical objects in two separate
    astronomical databases. Thanks to an NVO
    prototype, that search, formerly an endeavor
    requiring weeks or months of human attention,
    took approximately two minutes."

23
Context
  • Data meaningless without context
  • Linkage
  • Metadata of many kinds
  • Workflow!
  • Provenance
  • Computational lineage
  • Authenticity

24
NASA
research group3
University research group1
local decision-making body
University research group2
Slide from Rajendra Bose
25
Access and re-use
  • Ethics and rights control access
  • Weak in expressing this long-term
  • Collaboration tools
  • Annotation, discussion, review
  • Re-use leading to change and development
  • Publication
  • Not just in print
  • Underlying data should be published, too
  • Citation

26
CLADDIER citation investigation
  • My last example was an MST data set held at the
    BADC, and I was suggesting something like this
    (for a citation)
  • <Citation><Author> Natural Environment Research
    Council </Author>
  • <Title> Mesosphere-Stratosphere-Troposphere Radar
    at Aberystwyth </Title>
  • <Medium> Internet </Medium>
  • <Publisher> British Atmospheric Data Centre
    (BADC) </Publisher>
  • <PublicationDate status="ongoing">
    1990</PublicationDate>
  • <Identifier> badc.nerc.ac.uk/data/mst/v3/upd15032006</Identifier>
  • <Feature><FeatureType>http://featuretype.registry/verticalProfile</FeatureType><LocalID>200409031205</LocalID></Feature>
  • <AccessDate> Sep 21 2006 </AccessDate>
  • <AvailableAt><url>http://badc.nerc.ac.uk/data/mst/v3/</url></AvailableAt>
  • </Citation>
  • (Made up tags!)
  • Bryan Lawrence Weblog
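
A minimal sketch of how the made-up citation record above could be assembled and read back programmatically. It uses Python's standard `xml.etree.ElementTree`; the function name `build_citation` and its parameters are illustrative assumptions, and the tag names are the slide's own invented ones, not a published schema.

```python
import xml.etree.ElementTree as ET


def build_citation(author, title, publisher, start_year, identifier,
                   feature_type, local_id, access_date, url):
    """Assemble a <Citation> element using the slide's made-up tag names."""
    cit = ET.Element("Citation")
    ET.SubElement(cit, "Author").text = author
    ET.SubElement(cit, "Title").text = title
    ET.SubElement(cit, "Medium").text = "Internet"
    ET.SubElement(cit, "Publisher").text = publisher
    # status="ongoing" marks a dataset that is still being added to
    pub = ET.SubElement(cit, "PublicationDate", status="ongoing")
    pub.text = str(start_year)
    ET.SubElement(cit, "Identifier").text = identifier
    # The Feature pins the citation to one object inside the dataset
    feature = ET.SubElement(cit, "Feature")
    ET.SubElement(feature, "FeatureType").text = feature_type
    ET.SubElement(feature, "LocalID").text = local_id
    ET.SubElement(cit, "AccessDate").text = access_date
    available = ET.SubElement(cit, "AvailableAt")
    ET.SubElement(available, "url").text = url
    return cit


citation = build_citation(
    author="Natural Environment Research Council",
    title="Mesosphere-Stratosphere-Troposphere Radar at Aberystwyth",
    publisher="British Atmospheric Data Centre (BADC)",
    start_year=1990,
    identifier="badc.nerc.ac.uk/data/mst/v3/upd15032006",
    feature_type="http://featuretype.registry/verticalProfile",
    local_id="200409031205",
    access_date="Sep 21 2006",
    url="http://badc.nerc.ac.uk/data/mst/v3/",
)

xml_text = ET.tostring(citation, encoding="unicode")
parsed = ET.fromstring(xml_text)  # round-trip to confirm it is well-formed
```

Keeping the Identifier, the Feature/LocalID pair and the AvailableAt URL as separate elements mirrors the slide's point that each one implies a different commitment from the publisher.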

27
CLADDIER 2: Version of record
  • Role of Publisher: add value
  • provision of catalogue metadata
  • some commitment to maintenance of the resource at
    the AvailableAt url
  • some commitment to the resource being conformant
    to the description of the Feature
  • some commitment to the maintenance of the mapping
    between the identifier LocalID and the
    resource.
  • Bryan Lawrence Weblog

28
CLADDIER 3: persistence
  • Wayback Machine
  • Only snapshots (e.g. only 2004 version of Bryan's
    home page!)
  • WebCite
  • allows the creator of content to submit URLs for
    archiving, thus ensuring when one writes an
    academic document, the material will be archived,
    and the citation will be persistent
  • But no real help for data
  • only allow data citation when we believe in
    the persistence of the organisation making the
    data available
  • Bryan Lawrence Weblog

29
(No Transcript)
30
Citation
  • Needs a stable resource to cite

OWL Web Ontology Language Reference
W3C Proposed Recommendation 15 December 2003
This version: http://www.w3.org/TR/2003/PR-owl-ref-20031215/
Latest version: http://www.w3.org/TR/owl-ref/
Previous version: http://www.w3.org/TR/2003/CR-owl-ref-2003081
  • (FRBR works & expressions?)

31
Citation
  • The date alone (as in common web citation
    approaches) is not enough!
  • Cited object likely to have changed
  • Citation should link to the cited object as it
    was!
  • [6] The CIA World Factbook.
  • www.cia.gov/cia/publications/factbook/.
  • Retrieved on 8 Jan 2006.

32
Citation needs
  • An efficient way to reference and access
    archived past states of a changing dataset
    (work in progress, Buneman et al)
  • Not important for original observations
  • Don't mess with those data
  • Less important for incremental datasets
  • Later stuff should not invalidate earlier
  • Very important for revisable datasets
  • E.g. Genomics: datasets that result from the
    combined work of curators, or contain opinions or
    facts likely to change
  • E.g. Mapping: OS maps represent a huge database
    that changes on a daily basis
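
To make the idea of citing "past states of a changing dataset" concrete, here is a toy sketch of an archive that stores only the changes between versions and can rebuild the dataset as it stood on a cited date. It is an illustration under simplifying assumptions (flat key/value records, ISO date strings), not the method of Buneman et al. or of XMLArch; the class name `VersionedDataset` and the sample entries are hypothetical.

```python
from bisect import bisect_right


class VersionedDataset:
    """Toy archive that can return a dataset as it stood on a cited date."""

    def __init__(self):
        self._dates = []     # commit dates (ISO strings), strictly increasing
        self._history = {}   # key -> list of (date, value); None means "removed"

    def commit(self, date, records):
        """Record a full snapshot, storing only the entries that changed."""
        if self._dates and date <= self._dates[-1]:
            raise ValueError("commits must arrive in increasing date order")
        previous = self.as_of(self._dates[-1]) if self._dates else {}
        for key in previous.keys() | records.keys():
            new_value = records.get(key)
            if previous.get(key) != new_value:
                self._history.setdefault(key, []).append((date, new_value))
        self._dates.append(date)

    def as_of(self, date):
        """Rebuild the dataset as it was on `date` (inclusive)."""
        state = {}
        for key, changes in self._history.items():
            # Find the last change made on or before the requested date
            idx = bisect_right([d for d, _ in changes], date)
            if idx and changes[idx - 1][1] is not None:
                state[key] = changes[idx - 1][1]
        return state


archive = VersionedDataset()
archive.commit("2006-01-08", {"entry_1": "first annotation", "entry_2": "stable"})
archive.commit("2006-09-21", {"entry_1": "revised annotation", "entry_2": "stable"})

cited = archive.as_of("2006-03-01")    # what a citation dated March 2006 saw
latest = archive.as_of("2006-12-31")   # the current state
```

A citation then needs only the dataset identifier plus a date; the archive resolves it to the object "as it was" rather than to whatever the dataset has since become.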

33
XMLArch System Architecture
Pre-processor
Version Merger
  • Carwyn Edwards

34
Who are the curation players?
35
Curation: Individual
  • Small science: 2-3 times more data than Big
    science, but much more at risk
  • PhD student? RA? PI? Administrator? IT support?
  • Data potentially on local hard drives, or at best
    shared network drives
  • May be inadequately protected
  • Liable for policy-led deletion on resignation
  • Individual knows too much
  • Documentation/metadata unlikely to be adequate
  • Tomorrow: gone!

36
Department: eCrystals
  • Specialist department archive ( national
    service)
  • Workflow recording of lab parameters (R4L)
  • Public & private elements
  • Trying to build eCrystals federation (eBank 3)
  • But ReciprocalNet? French COD efforts?
    Fragmented discipline!
  • Tomorrow: likely to continue

37
Institution: Cambridge Chemistry
  • 175,000 small molecule structures in CML
  • Alongside Archaeology, Manuscripts, Learning
    Materials, etc
  • No library curation skills: dependent on research
    group enthusiast
  • Collection isolated from other Chemistry
  • Tomorrow: assured

38
Community: CDL
  • Shared effort from group of institutions
  • Comparison: OhioLink?
  • Document tradition, not data
  • Passive role re collections
  • Rely on departmental domain expertise
  • Tomorrow: assured

39
Community: SDSC?
  • Data specialists
  • Multiple disciplines
  • Distinct from domains: curation dependent on
    external expertise
  • Research ethos
  • Tomorrow: dependent on grant/contract income &
    research priorities

40
Community: LOCKSS?
  • Self-selected group of collectors: closest to
    genuine open activity (despite Alliance)?
  • Traditionally libraries collecting eJournals
  • Model respects IPR
  • No domain expertise: rely on origins
  • Data limitations
  • Tomorrow: potentially very persistent (low cost,
    high reliability, attack resistance, distributed)
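
The "lots of copies" idea behind that persistence can be illustrated with a much-simplified sketch: several holders hash their copy of the same item, and the majority digest identifies which copies are damaged. This is an assumption-laden toy, not the LOCKSS polling protocol; the function name `audit_copies` and the holder names are hypothetical.

```python
import hashlib
from collections import Counter


def audit_copies(copies):
    """Return (majority_digest, sorted names of copies that disagree with it)."""
    digests = {name: hashlib.sha256(data).hexdigest()
               for name, data in copies.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(copies) / 2:
        # Without a strict majority there is no trustworthy reference copy
        raise ValueError("no majority among copies")
    damaged = sorted(name for name, digest in digests.items() if digest != winner)
    return winner, damaged


copies = {
    "library_a": b"article text",
    "library_b": b"article text",
    "library_c": b"article t3xt",   # simulated bit rot or tampering
}
majority_digest, damaged = audit_copies(copies)
```

A damaged holder can then repair its copy from any holder whose digest matches the majority, which is why independent, distributed copies resist both decay and attack.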

41
Discipline: Archaeology
  • Staffed by archaeologist curators
  • Understand special legal issues
  • Strong relationship with community peers
  • Internationally still fragmented?
  • Tomorrow: dependent on research council grants &
    deposit funding

42
Discipline: Astronomy
  • Part of major international effort
  • Expensive shared facilities, global reach
  • Well integrated into community
  • Enable new science
  • Tomorrow: assured by community (another large
    facility)

43
Discipline: Atmosphere
  • Strong believer in need for domain scientists as
    curators
  • Significant participant in community proxy
    agenda-setting activities
  • Internationally fragmented resources
  • Tomorrow: mostly dependent on grant funding (but
    strong commitment)

44
Discipline: Pharmacology
  • International Scientific Union
  • Attempting to build credit for data contributions
  • DB ownership rotates
  • Tomorrow: extremely limited funding

45
Discipline: Social Sciences
  • Mature!
  • Staffed by Social Science curators
  • Alert to opportunities
  • Able to appraise material offered
  • Strong relationship to discipline
  • Tomorrow: assured through broad mix of funding
    streams

46
Publisher: Crystallography
  • Publisher and Scientific Union
  • Created key domain crystallographic standard
    (CIF)
  • Strong motivator for deposit of structure data
  • Consistent quality checks
  • DOIs used for structure data
  • Tomorrow: publishing business model
  • Slide from IUCr

47
National bodies: British Library
  • Serious and robust approach
  • Legal deposit powers & responsibilities as driver
  • Oriented primarily towards cultural heritage
    (broadly interpreted)
  • Little data, no science domain experience
  • Tomorrow: strong future commitment

48
National bodies: TNA/NDAD
  • Specialist archive for government datasets
  • Understand government regulations, dynamics &
    requirements
  • Subject generalists: disconnected from associated
    science
  • Technology specialists (understand databases)
  • Tomorrow: likely to pass eventually to The
    National Archives

49
National bodies: NOAA (etc)
  • Government body making serious data available
  • Domain scientists curate data
  • Operates in current political context (!)
  • Tomorrow: reasonably assured, but some un-funded
    mandates?

50
3rd parties: OCLC?
  • Should this be community?
  • Demand driven
  • No domain science expertise: rely on origins
  • Tomorrow: business case

51
3rd parties: Portico
  • Specific area: eJournals
  • Depends on publisher agreements
  • No data or domain science expertise
  • Tomorrow: commitment from Mellon, publishers,
    subscriptions; good funding mix

52
3rd parties: Iron Mountain
  • Records management IS a curation problem
  • Organisations like this very likely to branch out
  • No domain science expertise
  • Tomorrow: business case, viability, stock market

53
Institutions & the network
  • Institutions have some fundamental sustainability
  • Disciplines live in the network: sustainability
    is an issue
  • Can we get the best of both?

54
Intersections
Institution 1 Institution 2 Institution 3 etc
Discipline 1 X X
Discipline 2 X X
Discipline 3 X X
etc
55
Who are the curation players again?
56
Project StORe findings
  • Discipline commonality from survey (Miller, UKDA,
    2006)
  • 2-way links between data & publication useful
  • Barriers to actual deposit of data/outputs
  • Sharing data important, likely between colleagues
  • Perceived inconsistency across repositories
  • Most common searching: Google-type
  • Researchers favour self-reliance rather than
    library support
  • Recognise need for common minimum metadata
  • Aim for pilot linking middleware demonstrator
  • "Creating small scale silos of information with
    institutional repositories is not a compelling
    information management strategy in the Google
    age" (Heery & Anderson for JISC, 2005)

57
Sustainability: tomorrow is the emerging worry
  • Sustainability work package in DCC (new grant!)
  • JISC/NDIIPP meeting addressed it
  • AHRC report draft soon
  • Research Information Network report draft
  • JISC study on sustainable IT systems for HE
  • Recent ARL/NSF workshop, NSF strategy

58
Sustainability of what?
  • Repository as an organisation
  • Repository as a service
  • Repository as a system
  • Repositories as a network (federation?)
  • Collections and objects supported by repositories
  • Commit to collection: contract the manager!

59
Social factors
  • Commitment essential: much more than anything
    else (cf persistent identifiers)
  • Funder requirements express social determination
  • Policy: grant application forms, selection
    criteria
  • Monitoring essential
  • Legal, ethical, IPR impacts all significant
  • Public good questions
  • Academic credit (citations?)
  • Free-loaders (embargos?)
  • Disciplines are different!
  • Workforce skills: researcher, data
    librarian/scientist

60
Sustainability: a function of...
  • Commitment
  • Goals
  • Value and cost
  • Business model
  • Time
  • Environment
  • Domain knowledge and information
  • Dimensions (how much stuff)
  • Technical approaches
  • Usage

61
So, tomorrow
  • Digital data repositories already sustained > 30
    years
  • How?
  • Vision, leadership, commitment
  • Libraries, archives, museums sustained 100s of
    years
  • How?
  • Aggregate value proposition
  • Perception now under threat!
  • Collectively we need to identify the next steps
    toward digital data sustainability, for tomorrow,
    and tomorrow, and tomorrow!

62
Macbeth again
  • "To-morrow, and to-morrow, and to-morrow,
  • Creeps in this petty pace from day to day,
  • To the last syllable of recorded time;
  • ... it is a tale
  • Told by an idiot, full of sound and fury,
  • Signifying nothing."

63
Mission (impossible?)
  • To that last syllable of recorded time
  • Keep our tales forever full of significance!
  • Thank you