Adding Value to Data - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Adding Value to Data


1
Data Repositories and JISC Repository Landscape
Mahendra Mahey, Repositories Research Officer,
Repositories Research Team, UKOLN
GRADE Project Meeting (all partners), Edinburgh,
30 October 2006.
UKOLN is supported by
This work is licensed under a Creative Commons
Licence: Attribution-ShareAlike 2.0
2
Data Repositories Landscape
  • Disconnected landscape

[Diagram labels: Institutions; Data Centre; ?]
3
JISC Funds
  • Data Centres
    • MIMAS
    • AHDS
    • UK Data Archive
    • EDINA

Also receive funding from Research Councils UK
4
JISC Information Environment Architecture
(Idealised) Technical Infrastructure for Services
Andy Powell, 2005
5
Institutional Repositories Holding Research Data
  • Very few around the world are doing this, and are
    they up to the job?
    • Versioning
    • Authentication at individual asset level
  • Other methods are being used: informal, ad hoc,
    with lots of data slipping through the net
  • Repositories offer a better way to do this?
  • Different data types lead to problems with
    existing software
  • Data cluster projects
    • eBank
    • SPECTRa
    • GRADE
    • CLADDIER
    • ARROW DART
  • The idea of linking papers to the underlying data
    of experiments and research is very appealing:
    stORe project and Open Access!
  • Can do some (orphaned) but not all; still a role
    for data centres

6
Data Centres
  • Have been storing data for years and predate the
    trendy 'r' word; experts
  • They can teach institutions many lessons
  • A lot of mystery and suspicion between data
    centres and institutions: communication and
    dialogue needed between the two, and
    interdisciplinary
  • Time and money saving?
  • Data centres argue that subject-specific is a
    good thing; rationalising?
  • Storing and curation has become a science in its
    own right, e.g. bioinformatics
  • Offer
    • Databases
    • Web access
    • Tools to explore the information
    • Systems to capture the information
  • Service centres
  • Custodianship, acquisition and ownership
  • Depend on goodwill of community
  • Add value, service and organisation; require lots
    of money to continue

7
Data Centre Infrastructure Can be Complex!
  • Reactome
  • EMBL-Bank: DNA sequences
  • EnsEMBL: Genome Annotation
  • UniProt: Protein Sequences
  • ArrayExpress: Microarray Expression Data
  • EMSD: Macromolecular Structure Data
  • IntAct: Protein Interactions
8
Institutional and Data Centre practice exist
[Diagram labels: Laboratory repository;
Institutional data repositories; Deposit;
Validation; Search, harvest; Aggregator services;
Data discovery, linking, citation; Data analysis,
transformation, mining, modelling; Presentation
services / portals; Publication; Publishers:
peer-review journals, conference proceedings, etc.]
9
DRP Projects
  • GRADE
  • R4L
  • SPECTRa
  • CLADDIER
  • stORe
  • eBank

[Diagram labels: Data Cluster; Meetings; Road Map
Required; Briefing Paper; Workshop; Interviews and
Surveys; Road Map for Digital Repository /
Preservation Projects Focusing on Data; 06/09 Call]
10
UKOLN - Data Repositories Research (Consultancy)
  • To define how institutions (collectively and
    individually) and scientific data centres can
    together effectively achieve:
    • Preservation
    • Access: Managed and Open
    • Reuse: Data Citation, Data Mining and
      Reinterpretation
  • To identify the mechanisms, business processes
    and good practice by which these functions can be
    achieved
  • To facilitate dialogue between data centres,
    institutions and other key players and to define
    a collaborative way forward

Dr Liz Lyon
11
Identifying and defining inter-relationships
  • Socio-cultural, organisational, legal
  • Technical interoperability
  • Roles and responsibilities
  • Access
  • Preservation
  • Re-use
  • See briefing paper produced for workshop

12
Socio-cultural, organisational, political and
legal issues
  • highly diverse in awareness, practice and skills
  • need to understand the full spectrum of research
    practice
  • workflows and associated data flows
  • both within and between disciplines/sub-disciplines

13
Hierarchy of Drivers
  • Level 0: deliver project.
  • Level 1: meet good scientific practice.
  • Level 2: support own science.
  • Level 3: employer's requirements.
  • Level 4: funder's requirements.
  • Level 5: public policy requirements.

Slide from Mark Thorley, NERC
14
RC UK - Funding Body
15
Socio-legal conclusions
  • Use a questionnaire and send it to data centres;
    disciplines will be different.
  • Promote interoperability through metadata
    standards. Resource discovery standards should be
    promoted and developed by learned societies
    (membership arms) and subject communities, by
    discipline (not by data curators). Bottom up
    rather than top down.
  • Education: recognise the very wide range of
    understanding amongst disciplines of the value of
    data curation. Centres/IRs/archives need to go
    out and promote why they exist and why they
    should be used. Focus at community level.
  • Each research council should have a written,
    meaty data policy, disseminated and policed.
  • Legal issues: the JISC legal centre is valuable,
    but there is a lack of clarity and guidance where
    law exists on the use of digital objects, IP etc.
    Need clarity of law and guidance on how best to
    interpret it: straightforward answers to
    straightforward questions. Model licences for
    use, interpretation, confidentiality, disclosure.
  • Academics and data centres need to be told the
    differences between data banks/data centres etc.
    and IRs. IRs have not had enough institutional
    buy-in yet.
  • JISC could investigate why subject repositories
    are more successful than IRs. JISC policy should
    reflect what is happening on the ground.
  • JISC should help sell IRs better.

16
Technical Interoperability
  • Federation models
  • interoperability and inter-relationships between
    repositories

17
Open Access
  • Good thing, but are the tools up to the job?
  • OAI-PMH
  • Dublin Core
  • Use METS as packaging standard; momentum
    building?
  • Papers, not data
  • For data, do these map to other metadata schemas
    that have been developed, or to extensions to DC?
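
As an illustration of the tools named on this slide, the following is a minimal sketch of how a harvester might ask a repository for records over OAI-PMH and read the unqualified Dublin Core it gets back. The base URL, the record identifier and the sample response are hypothetical and made up for the example; only the `verb` and `metadataPrefix` request parameters and the three XML namespaces come from the OAI-PMH and Dublin Core specifications. No network call is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hypothetical response for a single dataset record (not real data).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:1234</identifier>
        <datestamp>2006-10-30</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset</dc:title>
          <dc:type>Dataset</dc:type>
          <dc:relation>http://repository.example.org/papers/99</dc:relation>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text):
    """Return a list of {identifier, dc} dicts from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall(".//oai:record", NS):
        fields = {}
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc is not None:
            for el in dc:
                # Strip the "{namespace}" prefix to get the bare element name.
                name = el.tag.split("}", 1)[1]
                fields.setdefault(name, []).append(el.text)
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        records.append({"identifier": identifier, "dc": fields})
    return records


if __name__ == "__main__":
    print(list_records_url("http://repository.example.org/oai"))
    for record in parse_records(SAMPLE_RESPONSE):
        print(record["identifier"], record["dc"])
```

Unqualified Dublin Core has only fifteen free-text elements, so in this sketch the link from a dataset to its paper can only be carried as a plain `dc:relation` string. That limitation is one way to read the "papers, not data" point above.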

18
Federation
  • Monolithic solutions fail
  • Aggregation of institutional repositories is
    essential

Data Centres View
19
Technical
  • Need to define what is meant by semantics of
    structured data and publish guidelines at the
    levels of metadata, classification/subject
    areas/factual names, with agreed conventions
    layered on top, e.g. identifiers.
  • Application profiles: who should be the keeper
    of those definitions (e.g. registries), and who
    funds and owns them?
  • Scientists concentrate on narrow areas, but
    connections are to other, wider areas.
  • Time series data are different: how to discover
    and use them? It is more difficult to define
    discovery metadata for time series. Data might
    not be logically the same.
  • Data curation responsibility at institutional
    level/data centre: data curation requires
    specialisms, and data centres could feed this
    expertise back to institutions. Need a flow of
    expertise from data centres to institutions.
  • Invitations to work in a data centre for a week
    are happening in Australia.
  • A mixed economy of organisational responsibility
    is inevitable; some federation will be there.
  • How to express quality: a role for provenance and
    audit as a means to express quality, also ranking
    and annotation.
  • Curation of data is of more interest to
    scientists than interoperability as a means of
    marketing/selling it.

20
Roles, Rights and Responsibilities
  • Scientist: Creation and use of data.
  • Data centre: Curation of and access to data.
  • User: Use of 3rd party data.
  • Funder: Set / react to public policy drivers.
  • Publisher: Maintain integrity of the scientific
    record.

From Mark Thorley, NERC
21
Roles and Responsibilities
  • Individual scientists to deposit data using
    domain standards of an acceptable quality
  • Re-user should acknowledge where data came from
    and if it is appropriate to improve the quality
    of the data.
  • Institution should have policies that mandate
    data deposit in an appropriate place not
    necessarily an IR.
  • Publishers/journals/editors should mandate open
    deposit of data.
  • Curators who collect, describe and connect data,
    idea of community proxy role - define standards
    for domain working, in and with the scientists
  • Funders should enforce their data deposit
    policies where possible
  • Funders should recognise the emerging need for
    new infrastructure and provide appropriate
    funding for this infrastructure and for the
    resulting actions
  • Users and funders should feed back views on the
    data stored to the data centre manager
  • The Click-Use licence says that if you enhance
    the data you must give it back, but how can a
    data centre police that policy? Versioning is an
    issue here.
  • Value of good enough versus completely
    comprehensive descriptions (Graham C)
  • Who is responsible for ownership of the data to
    make changes? If there are multiple versions, the
    last one is not necessarily the best.
  • Competitive views: risk of sabotage of other
    groups' work is possible.
  • Who checks provenance of anything new? Curators?

22
Small Science vs Big Science
Data from Big Science is easier to handle,
understand and archive. Small Science is horribly
heterogeneous and far more vast. In time Small
Science will generate 2-3 times more data than
Big Science.
"Lost in a Sea of Science Data", S. Carlson,
The Chronicle of Higher Education (23/06/2006)
23
Dataset publishing
  • Re-examine the concept of Dataset Publishing
    (Callahan, Johnson, and Shelley 1996)
  • analogous to publishing papers
  • rewards for publishing datasets (e.g. promotion,
    RAE)
  • procedures (e.g. standards to use, peer review)
    resources to manage procedures
  • Should minimise time and effort required
  • need tools to assist in creation, maintenance and
    dissemination of dataset descriptions
  • Means of putting into a public/community
  • Deposit and Share are too cosy
  • to publicate, to issue
  • Terms of access and use
  • Open?
  • Privilege of membership
  • Payment of money

Taken from Peter Burnhill
24
Spatial is Special
  • Why?
  • Geo research data are not being deposited; lots
    of data slipping through the net, not falling
    under RC remit. Data being lost or shared
    informally; may be a case for a national
    repository?
  • Fears about legality of resources, e.g. OS data;
    researchers really want to share in a big way
  • Should data be deposited in data centres?
  • Academics not comfortable about sharing on a
    larger scale?
  • IRs not geared up to handle data?
  • DSpace does not allow editing of metadata
  • Problem with the ISO standard used for geo data
    (ISO 19115) and DC
  • Mapping done, further work needed: from wing
    mirror to Smart Car?
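
To make the ISO 19115 and DC problem concrete, here is a rough sketch of what a field-by-field mapping (crosswalk) can look like, and what it leaves behind. It is not the mapping referred to on the slide: the flat field names on the ISO side (`abstract`, `bounding_box`, `lineage` and so on) are simplified, hypothetical stand-ins for the real ISO 19115 structure, and the choice of target DC elements is an assumption made for illustration. The bounding box string loosely follows the DCMI Box convention.

```python
# Illustrative crosswalk: simplified geospatial field -> Dublin Core element.
# Field names on the left are hypothetical, not real ISO 19115 element paths.
CROSSWALK = {
    "title": "title",
    "abstract": "description",
    "topic_category": "subject",
    "bounding_box": "coverage",
    "date": "date",
}


def to_dublin_core(geo_record):
    """Map a simplified geospatial record to DC; report fields with no target."""
    dc = {}
    unmapped = []
    for field, value in geo_record.items():
        target = CROSSWALK.get(field)
        if target is None:
            # No DC element in this crosswalk, so the information is dropped.
            unmapped.append(field)
            continue
        if field == "bounding_box":
            west, south, east, north = value
            value = (
                f"northlimit={north}; eastlimit={east}; "
                f"southlimit={south}; westlimit={west}"
            )
        dc.setdefault(target, []).append(value)
    return dc, sorted(unmapped)


if __name__ == "__main__":
    # Made-up example record.
    record = {
        "title": "Sample survey area",
        "abstract": "Illustrative record only.",
        "bounding_box": (-3.4, 55.8, -3.0, 56.0),  # west, south, east, north
        "lineage": "Digitised from paper maps",
        "spatial_resolution": "1:50000",
    }
    dc, unmapped = to_dublin_core(record)
    print(dc)
    print("No DC target for:", unmapped)
```

In this sketch the descriptive fields survive, the extent is flattened into a single coverage string, and fields such as lineage and spatial resolution have nowhere to go. Losses of that kind are one plausible reason a mapping can exist and still need further work.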

25
Responsibility of Data Providers
  • Responsibility of publicly funded research to
    share data
  • Free Our Data (Guardian) work
  • INSPIRE work

26
GRADE's input
  • Important that GRADE inputs into this work as it
    will set direction of research and focus on
    GEOSPATIAL DATA Repository work
  • Interviews held with Rebecca and David

27
DRP Projects
  • GRADE
  • R4L
  • SPECTRa
  • CLADDIER
  • stORe
  • eBank

[Diagram labels: Data Cluster; Meetings; Road Map
Required; Briefing Paper; Workshop; Interviews and
Surveys; Road Map for Digital Repository /
Preservation Projects Focusing on Data; 06/09 Call]
28
We need your input!
l.lyon_at_ukoln.ac.uk m.mahey_at_ukoln.ac.uk