The Role of Libraries in Data Curation

Slides: 49

Provided by: robins197

Learn more at: https://www.oclc.org

Transcript and Presenter's Notes

1
The Role of Libraries in Data Curation

Or: How do we even get started?
John MacColl, European Director, RLG Partnership
9 June 2010
2
What I want to talk about

The importance of data
Institutional vs domain solutions
Skills needs
Our project
Reward structures

3
The importance of data
4
It's the data, stupid

"astronomers are just as likely to point a
software query tool at a digital sky survey as to
point a telescope at the stars" (The Economist,
Feb 2010)
"It's like the invention of the telescope,"
Franco Moretti, a Stanford professor of English
and comparative literature, says of Google Books.
"All of a sudden, an enormous amount of matter
becomes visible." (The Chronicle, "The humanities
go Google", May 28 2010)

5
DataVerse (Gary King, 2007)

Data sometimes exist on individual researchers'
Web sites, without professional backups, off-site
replication, plans for format conversion and
migration, or professional cataloging.

6
Pious hopes (Carole Palmer)

60 archive generated or collected data (no
offsite backup)
61 expect to keep more than 10 years

7
Data lost, and data never born (U Wisconsin
Summary Report of the Research Data Management
Study Group (2009))

In some cases, inadequate storage capacity is
leading to loss of data, forcing some researchers
to discard data from past experiments in order to
make room for current ones, or to avoid certain
types of experiments and research altogether.

8
Data and their uses
Freely available
Locked away
Embargoed
Shared with collaborators
Secondary artifacts: statistical and pattern
analyses, subset extractions, visualisations,
simulations, discovery environments,
transformations
Primary data: sensory, numeric, digitised,
geospatial, etc.
Ancillary data: questionnaires, fieldnotes, lab
notebooks, data dictionaries, annotations,
lecture notes, etc.
9
Don't try this at home?
10
Institutional vs domain solutions
11
Blue Ribbon Task Force on Sustainable Digital
Preservation and Access, on aggregation

Creating economies of scale among archives when
possible is always desirable, and may be critical
when the materials under stewardship require
particular kinds of expertise that are scarce.
This is the case for much scientific data.

12
Qualified gravitational pull (Green and Gutmann)

Most institutional repositories do not and
cannot offer support for managing dataset formats
over time. Policies for long-term stewardship
vary among institutions, but many have developed
a sliding scale of preservation promises.

13
Oxford University Research Data Management
Services: findings of the consultation with
service providers (September 2008)
14
Cornell DataStaR: a staging repository
15
Datasets in Cornell IR
16
Monash approach (institutional) (Treloar)
17
U Wisconsin proposal

Solutions comprised solely of expensive
technology will fail, because of the underlying
need to establish long-lasting cultural stability
within and between the research, library, and IT
communities on campus.

18
Curation responsibilities (Carlson, The
Chronicle, 2006)
Data from Big Science is easier to handle,
understand and archive. Small Science is
horribly heterogeneous and far more vast. In time
Small Science will generate 2-3 times more data
than Big Science.
big science data
domain?
institution?
small science data
19
Experiments and failures

NSF DataNet Data Conservancy project: $20m
awarded. Led by JHU. Includes social sciences.
U. Va.: Mellon grant of $870k. Programmers and
archivists. Includes Stanford, Yale and Hull. To
create a model for digital collection management
that can be easily shared among research
libraries.
UKRDS ?

20
Meanwhile
21
Specialist data archives
22
Skills needs
23
Is this possible (Gabridge)?

libraries can develop existing liaisons with
interest, passion, and strong analytical skills
or they can recruit domain experts, and teach
them about excellent information science
practices.

24
ARL study (Scott Brandt)
25
Our project
26
Joint OCLC Research-LIBER

Binghamton
Brigham Young
Cambridge
Leeds
Melbourne
Nijmegen
Oxford

27
Deliverables

Desk research
Case studies
Interviews with researchers
Report and recommendations

28
Project Aim

It has been frequently asserted in the
literature on data curation that there are new
service roles for research libraries emerging.
This project will seek to test this hypothesis by
considering the data curation requirements of a
number of recently completed research projects in
a sample group of North American and European
universities

29
Method

Each university partner will produce two or
three case studies of projects in which data has
been generated, and consider the data curation
implications of these. The project will conclude
with an assessment of the potential role of the
research library in general in relation to such
datasets, based on the examples of good practice
discovered via the case studies.

30
Project Approach

The proposed project will adopt a bottom-up
approach and be grounded in the realities of data
storage and preservation behaviour as exemplified
in a number of real instances

31
Scale again

We consider that the question of how to arrive
at an articulation between the institutional
library and domain or funder data archives is one
of the most urgent requirements in this area, and
the project will explore it carefully.

32
Environments data
33
Timescapes (Leeds)
34
Nyman/Jones Archive (Leeds)
35
The Australian Women's Register (Melbourne)
36
Life Patterns (Melbourne)
37
Incremental Project (Cambridge)
38
What do we expect?

Not a great deal!
Need to adjust our timescales?
Signs of progress?
Indications of favourable organisational
frameworks?
Indications of favourable policies?
A taking of stock

39
Reward structures
40
Day's understatement
41
Being excited about being cited (DataVerse, King)

Articles with accessible data are cited twice
as often as otherwise equivalent articles that do
not provide data access.
Articles in journals with replication policies
that make data available are cited thrice as
frequently as otherwise equivalent articles
without accessible data

42
Library neutrality (Steinhart, 2007)?

There is ample evidence that even when
appropriate data repositories exist for a
particular discipline, researchers often fail to
take full advantage of them. This lack of
participation in data sharing and archival
activities suggests an opportunity for academic
libraries to provide a much-needed service.

43
Thinning the library

No longer just about capture of outputs at the
endpoint
The library has to be involved in the whole
process of research and scholarship, throughout
its lifecycle
This involves thinning out the library
Rethinking the point of engagement
The library becomes engineering
and people

44
Ten Questions to Begin a Conversation With Your
Faculty About Data Curation (Witt and Carlson)

What is the story of your data?
What form and format are the data in?
What is the expected lifespan of your data?
How could your data be used, reused, and
repurposed?
How large is your dataset, and what is its rate
of growth?
Who are potential audiences for your data?
Who owns the data?
Does the dataset include any sensitive
information?
What publications or discoveries have resulted
from the data?
How should the data be made accessible?

45
Repositories at present are the wrong model
(Green and Gutmann)

repositories position themselves at or near the
end of the scientific research life cycle. Their
goal is less to partner with researchers or with
domain-specific repositories throughout the
research life cycle than to garner the value of
the institution's productivity

46
Appraisal (Cornell)

The archivist can no longer wait passively at
the end of the life cycle for records to arrive
at the archives when their creators no longer
wanted them or were dead (Cook 2000).

47
Discussion!