Title: Attitudes and aspirations in a diverse world


1
Attitudes and aspirations in a diverse world
  • The Project StORe perspective on scientific
    repositories
  • Graham Pryor, 22nd November 2006

Digital Data Curation in Practice - 2nd
International Digital Curation Conference,
Glasgow 21-22 November 2006
2
StORe Guide
  • What's in StORe?
  • Curation and preservation issues
  • Attitudes and aspirations
  • Research data
  • Repositories
  • Metadata
  • Ownership and support
  • Too huge to handle?

4
Digital Data Curation
  • Definition: the actions needed to maintain
    digital data and other digital materials over
    their entire life-cycle and indefinitely for
    current and future generations of users. These
    actions include not only the processes of digital
    archiving and preservation but also all of the
    processes that are essential to good data
    creation and management, as well as the capacity
    to add value to data to generate new sources of
    information and knowledge.

5
What's in StORe? Aims 1
  • Attach new value to the intellectual products of
    academic research by providing two-way links
    between source and output repositories

6
What's in StORe? Aims 2
  • Surveys to identify workflows and norms, problems
    and desirable enhancements to source/output
    repositories
  • A generic technical specification for functional
    enhancements to source and output repositories
  • Pilot middleware that demonstrates a
    bi-directional link
  • Independent evaluations of the pilot middleware
    and recommendations for future development as a
    generic platform for linking repositories
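
As an illustration only, the bi-directional link named above could be modelled as a registry that records every source/output pairing in both directions, so a reader can start from a dataset and find the papers, or start from a paper and find the data. This is a minimal sketch and not the StORe middleware: the class name `LinkRegistry`, its methods and the identifier strings are all hypothetical.

```python
from collections import defaultdict


class LinkRegistry:
    """Hypothetical in-memory registry of source <-> output links."""

    def __init__(self):
        # One index per direction, so lookups work from either side.
        self._outputs_by_source = defaultdict(set)
        self._sources_by_output = defaultdict(set)

    def link(self, source_id, output_id):
        # Record the pairing in both indexes at once.
        self._outputs_by_source[source_id].add(output_id)
        self._sources_by_output[output_id].add(source_id)

    def outputs_for(self, source_id):
        # Publications linked to a given source dataset (sorted for stability).
        return sorted(self._outputs_by_source.get(source_id, ()))

    def sources_for(self, output_id):
        # Source datasets linked to a given publication.
        return sorted(self._sources_by_output.get(output_id, ()))


# Made-up identifiers, for illustration only.
registry = LinkRegistry()
registry.link("dataset:001", "paper:A")
registry.link("dataset:002", "paper:A")
registry.link("dataset:001", "paper:B")
```

Keeping two indexes trades a little storage for symmetric lookups, which is the property a two-way link between repositories needs.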

7
What's in StORe? Survey
  • Genuine desire to contribute to the wealth of
    knowledge
  • Awareness of the critical need to assign and
    maintain appropriate metadata
  • Dual deposit of data and publications already an
    accepted concept
  • International strategies for data deposit and
    data preservation

8
What's in StORe? Survey
  • Cultural and organisational barriers to deposit
    of research data in repositories
  • Inherent culture of self-sufficiency in the
    generation and organisation of data
  • Limited inclination towards voluntary deposit in
    open access source repositories
  • Institutional output repositories not on the
    agenda of most researchers

9
Research Data
  • Features of source data:
  • Often large and complex
  • Can be impenetrable without local tools
  • May seem ambiguous to project outsiders
  • Are frequently held on standalone equipment
  • Commonly comprise several data formats
  • From the StORe survey:
  • Physics: raw data sets as large as petabytes
    (10^15 bytes) may be generated or analysed using
    software written within the project
  • Biosciences: need to describe how data were
    produced, the laboratory conditions and
    methodology
  • 70% of bioscience source data are not networked
  • Chemists: data stored in numerous sub-folders
    (spectra, images, etc.) describing one process

10
Research Data
  • "...it would have to be everything associated
    with that compound. There is no point having an
    NMR without a picture of what it is. Then it's
    useful to have a synthesis scenario and say oh
    that could fit with that but I want proof and
    then that really is a paper. You know you can
    waste a lot of time trying to follow what people
    have done before that isn't properly published
    and never have worth. It's not always, but is it
    worth the risk of wasting too much of your time?"
  • Chemistry data sets: links between complex
    clusters

11
Research Data
  • Physics data types

12
Research Data
  • Archaeology file types

15
Repositories
  • Source repository development is discipline-led
  • Large number of established services
  • - we suggested Archaeology Data Service,
    Brookhaven National Laboratories, CERN, GenBank,
    National Crystallography Service, NERC Data
    Centres, Protein Structures Database,
    SuperCOSMOS, UK Data Archive, UniProt
  • - to which were added 99 others
  • Some international strategies/mandates/dual
    deposit
  • - Astronomy (Virtual Observatory)
  • - Biosciences (sequence data)
  • - Chemistry (Crystallographic Data Centre)

18
Repositories
  • Low awareness of repositories
  • Low volume of repository use
  • 65% of the chemists surveyed had not used a
    repository and were not familiar with the idea of
    open access repositories
  • Many social scientists did not associate
    repositories with their research agenda
  • Repositories are only one of many potential data
    sources/archives used by researchers

20
Repositories
  • Low awareness of repositories
  • Low volume of repository use
  • Low rate of source data deposit
  • Output repositories:
  • prefer publisher over institutional
  • prefer Google-type searching

21
Metadata
  • All disciplines: an awareness of the importance
    of appropriate metadata
  • Improvements to source repositories? Better
    metadata ranked highest
  • Metadata assignment considered challenging
    intellectually and in the demands on one's time
  • Yet:
  • Evidence of lack of standard structures
  • Metadata assignment often almost an afterthought
  • One third of StORe respondents believed no
    metadata were being assigned

22
Metadata assignment
23
Metadata
  • Where researchers are familiar with metadata they
    possess an in-depth knowledge of its use,
    applications and functions
  • The assignment of metadata automatically (or by a
    process that relieves the depositor of doing it)
    is preferred
  • Quote from theoretical chemistry interview:
  • "Well, there's lots of different types of
    metadata. There is metadata for discovery, there
    is metadata for semantics, there is metadata for
    intellectual property and so on and so forth.
    They are all important. If I find some piece of
    information and it's not on open access then I
    can't use it. If I find some piece of metadata
    and it's in a language that my machine does not
    understand and there is no metadata, then it is
    uninterpretable, I cannot use it. If I am
    particularly concerned about the quality of data
    I need provenance metadata. So there are
    different needs for different people..."
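
To make the interviewee's categories concrete, the sketch below groups a dataset's metadata into the four kinds mentioned in the quote (discovery, semantics, intellectual property, provenance) and reports which kinds are empty. It is an illustration under stated assumptions, not a StORe schema: the class `DatasetMetadata`, the field names and the sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    """Hypothetical record with one slot per kind of metadata in the quote."""

    discovery: dict = field(default_factory=dict)   # how the data can be found
    semantics: dict = field(default_factory=dict)   # what the values mean
    rights: dict = field(default_factory=dict)      # intellectual property, access
    provenance: dict = field(default_factory=dict)  # how the data were produced

    def missing_kinds(self):
        # List the kinds of metadata that have not been filled in at all.
        kinds = ("discovery", "semantics", "rights", "provenance")
        return [kind for kind in kinds if not getattr(self, kind)]


# Made-up sample values, for illustration only.
record = DatasetMetadata(
    discovery={"title": "Example spectra", "keywords": ["NMR"]},
    rights={"access": "open"},
)
```

A check of this kind is one way a repository could prompt depositors for whichever kinds of metadata are absent, rather than leaving assignment as an afterthought.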

24
Metadata
  • Need for improved and universal standards
    acknowledged
  • A clear link identified between the condition of
    metadata used and the level of support from
    information specialists
  • Recognition of the need for different metadata
    for different phases of research lifecycle (raw,
    processed, published data and beyond) and to
    assist cross-discipline interpretation

25
Ownership and Support
  • Working culture: self-reliance and a constant
    pressure to deliver
  • Qualified enthusiasm for deposit in source
    repositories: producer or consumer?
  • Anxiety over predatory access and IPR
  • Storage methods: protectionism?
  • Provision of specialist support: less a case of
    unavailability than of not being sought

26
Too Huge to Handle?
  • One of the aspects that the Chemistry
    interviewees commented upon was that there should
    be a wider organisational/institutional
    requirement that supports and manages the
    repositories, should they be source, output or
    institutional.
  • "sustainability depends on a business model.
    And it's a major problem that confronts everybody
    at the moment in aggregating data, whether it be
    raw data, processed data, metadata, primary
    publications, abstracts, things like that"

28
Too Huge to Handle?
  • Embedding of data management expertise within
    domains
  • Expensive?
  • Interventionist?
  • Too large and too difficult?
  • Poor investment decisions can have major
    implications on how much information can be
    preserved, and how effectively
  • Chris Rusbridge, http://www.ariadne.ac.uk/issue46/

29
Too Huge to Handle?
  • MRC £1 million data sharing and preservation
    initiative
  • - http://www.mrc.ac.uk/strategy-data_sharing.htm
  • - initial focus on 4 to 6 unique datasets of
    long term value
  • - engage community support, longer term business
    plan
  • Virtual Observatory
  • "Exploit information management and curation
    experience in the university libraries and build
    on long-term institutional commitments to
    preservation" (Bob Hanisch)
  • http://www.arl.org/sparc/meetings/ala06/HanischPPT.pdf

30
END
http://jiscstore.jot.com/SurveyPhase