The Role of Libraries in Data Curation

Learn more at: https://www.oclc.org

1
The Role of Libraries in Data Curation

Or: How do we even get started?
John MacColl, European Director, RLG Partnership
9 June 2010
2
What I want to talk about
  • The importance of data
  • Institutional vs domain solutions
  • Skills needs
  • Our project
  • Reward structures

3
The importance of data
4
It's the data, stupid
  • "astronomers are just as likely to point a
    software query tool at a digital sky survey as to
    point a telescope at the stars" (The Economist,
    Feb 2010)
  • "It's like the invention of the telescope,"
    Franco Moretti, a Stanford professor of English
    and comparative literature, says of Google Books.
    "All of a sudden, an enormous amount of matter
    becomes visible." (The Chronicle, "The humanities
    go Google", May 28 2010)

5
DataVerse (Gary King, 2007)
  • Data sometimes exist on individual researchers'
    Web sites, without professional backups, off-site
    replication, plans for format conversion and
    migration, or professional cataloging.

6
Pious hopes (Carole Palmer)
  • 60% archive generated or collected data (no
    offsite backup)
  • 61% expect to keep more than 10 years

7
Data lost, and data never born (U Wisconsin
Summary Report of the Research Data Management
Study Group (2009))
  • In some cases, inadequate storage capacity is
    leading to loss of data, forcing some researchers
    to discard data from past experiments in order to
    make room for current ones, or to avoid certain
    types of experiments and research altogether.

8
Data and their uses
  • Freely available
  • Locked away
  • Embargoed
  • Shared with collaborators
  • Secondary artifacts: statistical and pattern
    analyses, subset extractions, visualisations,
    simulations, discovery environments,
    transformations
  • Primary data: sensory, numeric, digitised,
    geospatial, etc.
  • Ancillary data: questionnaires, fieldnotes, lab
    notebooks, data dictionaries, annotations,
    lecture notes, etc.

9
Don't try this at home?
10
Institutional vs domain solutions
11
Blue Ribbon Task Force on Sustainable Digital
Preservation and Access, on aggregation
  • Creating economies of scale among archives when
    possible is always desirable, and may be critical
    when the materials under stewardship require
    particular kinds of expertise that are scarce.
    This is the case for much scientific data.

12
Qualified gravitational pull (Green and Gutmann)
  • Most institutional repositories do not and
    cannot offer support for managing dataset formats
    over time. Policies for long-term stewardship
    vary among institutions, but many have developed
    a sliding scale of preservation promises

13
Oxford University, Research data management
services: findings of the consultation with
service providers (September 2008)
14
Cornell DataStaR: a staging repository
15
Datasets in Cornell IR
16
Monash approach (institutional) (Treloar)
17
U Wisconsin proposal
  • Solutions comprised solely of expensive
    technology will fail, because of the underlying
    need to establish long-lasting cultural stability
    within and between the research, library, and IT
    communities on campus.

18
Curation responsibilities (Carlson, The
Chronicle, 2006)
Data from Big Science is easier to handle,
understand and archive. Small Science is
horribly heterogeneous and far more vast. In time
Small Science will generate 2-3 times more data
than Big Science.
big science data
domain?
institution?
small science data
19
Experiments and failures
  • NSF DataNet Data Conservancy project. $20m
    awarded. Led by JHU. Includes social sciences.
  • U. Va. Mellon grant $870k. Programmers and
    archivists. Includes Stanford, Yale and Hull. To
    create a model for digital collection management
    that can be easily shared among research
    libraries.
  • UKRDS ?

20
Meanwhile
21
Specialist data archives
22
Skills needs
23
Is this possible (Gabridge)?
  • libraries can develop existing liaisons with
    interest, passion, and strong analytical skills
    or they can recruit domain experts, and teach
    them about excellent information science
    practices.

24
ARL study: Scott Brandt
25
Our project
26
Joint OCLC Research-LIBER
  • Binghamton
  • Brigham Young
  • Cambridge
  • Leeds
  • Melbourne
  • Nijmegen
  • Oxford

27
Deliverables
  • Desk research
  • Case studies
  • Interviews with researchers
  • Report and recommendations

28
Project Aim
  • It has been frequently asserted in the
    literature on data curation that there are new
    service roles for research libraries emerging.
    This project will seek to test this hypothesis by
    considering the data curation requirements of a
    number of recently completed research projects in
    a sample group of North American and European
    universities

29
Method
  • Each university partner will produce two or
    three case studies of projects in which data has
    been generated, and consider the data curation
    implications of these. The project will conclude
    with an assessment of the potential role of the
    research library in general in relation to such
    datasets, based on the examples of good practice
    discovered via the case studies.

30
Project Approach
  • The proposed project will adopt a bottom-up
    approach and be grounded in the realities of data
    storage and preservation behaviour as exemplified
    in a number of real instances

31
Scale again
  • We consider that the question of how to arrive
    at an articulation between the institutional
    library and domain or funder data archives is one
    of the most urgent requirements in this area, and
    the project will explore it carefully.

32
Environments data
33
Timescapes (Leeds)
34
Nyman/Jones Archive (Leeds)
35
The Australian Women's Register (Melbourne)
36
Life Patterns (Melbourne)
37
Incremental Project (Cambridge)
38
What do we expect?
  • Not a great deal!
  • Need to adjust our timescales?
  • Signs of progress?
  • Indications of favourable organisational
    frameworks?
  • Indications of favourable policies?
  • A taking of stock

39
Reward structures
40
Day's understatement
41
Being excited about being cited (DataVerse, King)
  • Articles with accessible data are cited twice
    as often as otherwise equivalent articles that do
    not provide data access.
  • Articles in journals with replication policies
    that make data available are cited thrice as
    frequently as otherwise equivalent articles
    without accessible data

42
Library neutrality (Steinhart, 2007)?
  • There is ample evidence that even when
    appropriate data repositories exist for a
    particular discipline, researchers often fail to
    take full advantage of them. This lack of
    participation in data sharing and archival
    activities suggests an opportunity for academic
    libraries to provide a much-needed service

43
Thinning the library
  • No longer just about capture of outputs at the
    endpoint
  • The library has to be involved in the whole
    process of research and scholarship, throughout
    its lifecycle
  • This involves thinning out the library
  • Rethinking the point of engagement
  • The library becomes engineering
  • and people

44
Ten Questions to Begin a Conversation With Your
Faculty About Data Curation (Witt and Carlson)
  1. What is the story of your data?
  2. What form and format are the data in?
  3. What is the expected lifespan of your data?
  4. How could your data be used, reused, and
    repurposed?
  5. How large is your dataset, and what is its rate
    of growth?
  6. Who are potential audiences for your data?
  7. Who owns the data?
  8. Does the dataset include any sensitive
    information?
  9. What publications or discoveries have resulted
    from the data?
  10. How should the data be made accessible?

45
Repositories at present are the wrong model
(Green and Gutmann)
  • repositories position themselves at or near the
    end of the scientific research life cycle. Their
    goal is less to partner with researchers or with
    domain-specific repositories throughout the
    research life cycle than to garner the value of
    the institution's productivity

46
Appraisal (Cornell)
  • The archivist can no longer wait passively at
    the end of the life cycle for records to arrive
    at the archives when their creators no longer
    wanted them or were dead (Cook 2000).

47
Discussion!
  • John MacColl

48
Next up
  • Lunch and then
  • 1:00
  • Framing Libraries and the Environment
  • Lorcan Dempsey, OCLC Research
  • Buckingham