Liz Lyon Associate Director, Outreach - PowerPoint PPT Presentation

About This Presentation
Description:

Liz Lyon Associate Director, Outreach. Chris Rusbridge, DCC ... MRC HGU. Kyoto. USC. INRIA. GSK. Roslin. IBM. Almaden. JHU. CSIRO. Caltech. JHU. CSIRO. CDS. ESO ... – PowerPoint PPT presentation

Slides: 45
Provided by: ukol

Transcript and Presenter's Notes

Title: Liz Lyon Associate Director, Outreach


1
Digital Curation Centre
a centre of support for data curation and
preservation
UK Digital Curation Centre: One Year On
  • Liz Lyon Associate Director, Outreach
  • Chris Rusbridge, DCC Director

2
Overview
  • Why is digital curation important?
  • What are the challenges that the DCC faces?
  • About the people and our collaborative approach
  • Addressing the issues
  • How can you contribute to the DCC?

3
Curation?
  • maintaining and adding value to a trusted body
    of digital information for current and future
    use

4
Digital curation continuum
For later use? In use now (and the future)?
Static
Dynamic
Data preservation
Data curation
5
Assuring permanent access to the records of
science and the humanities?
  • Long term access to primary data
  • Increasing data volumes from eScience and
    Grid-enabled / cyberinfrastructure applications
  • Changing research paradigm: data-driven science,
    big science
  • Observational data, simulations, large-scale
    experimentation
  • Multi-media resources, statistical data,
    surveys, geo-spatial data

6
(No Transcript)
7
Facilitate post-processing and knowledge
extraction
  • Enable the acquisition of newly-derived
    information and knowledge
  • Run complex algorithms over primary datasets
  • Mining (data, text, structures)
  • Modelling (economic, climate, mathematical,
    biological)
  • Analysis (statistical, lexical, pattern
    matching, gene)
  • Presentation (visualisation, rendering)

8
(No Transcript)
9
Provide additional functionality beyond digital
preservation processes
  • Annotations
  • Gene and protein sequences
  • e-Lab books (Smart Tea Project in chemistry)

10
The scholarly knowledge cycle: linking research
data to publications
eBank UK Project: http://www.ukoln.ac.uk/projects/ebank-uk/
  • Presentation services: subject, media-specific,
    data, commercial portals
  • Searching, harvesting, embedding
  • Resource discovery, linking, embedding
  • Data creation / capture / gathering: laboratory
    experiments, Grids, fieldwork, surveys, media
  • Aggregator services: national, commercial
  • Data analysis, transformation, mining, modelling
  • Harvesting metadata
  • Research and e-Science workflows
  • Repositories: institutional, e-prints, subject,
    data, learning objects
  • Deposit / self-archiving
  • Validation
  • Validation
  • Publication
  • Linking
  • Emerging policy on open access to data
  • Data curation: databases and databanks
  • Peer-reviewed publications: journals, conference
    proceedings
11
DCC people (some of them)
  • Management and Co-ordination
  • Director: Chris Rusbridge (University of
    Edinburgh)
  • Community Support and Outreach
  • Led by Dr Liz Lyon (UKOLN, University of Bath)
  • Service Definition and Delivery
  • Led by Professor Seamus Ross (HATII and ERPANET,
    University of Glasgow)
  • Development
  • Led by Dr David Giaretta (Astronomical Software
    Services, CCLRC)
  • Research
  • Led by Professor Peter Buneman (Informatics,
    University of Edinburgh)

12
The challenges we face
  • Standards
  • Interoperability issues: technical, hopefully
    soluble
  • Scale
  • Volume and diversity of datasets
  • Culture
  • Bringing communities together
  • Library/information science/archives: document
    tradition
  • Domain research (chemists, astronomers,
    biologists)
  • Computer science (databases)
  • Commercial suppliers (storage technology)

13
More challenges
  • Process
  • Highly-distributed organisation: use
    collaborative tools
  • Skills
  • Distributed amongst the 4 partners and beyond
  • Engagement
  • Lots of existing work and many significant
    players
  • Impact
  • Visible and measurable, in the short and long term
  • Meeting expectations (which are high..)
  • Of the community and our funders

14
User requirements analysis
  • Commissioned study
  • Leona Carpenter
  • Reporting now
  • Desk-based research
  • Focus groups
  • Interviews
  • Results will inform research, development, service
    definition / delivery and outreach
  • Recommendations and priority tasks

15
Some sound bytes
  • R&D issues: Annotation services, Ontology
    development, Automating metadata creation, Tools
    and toolkits, Data Format Description Language,
    Identifiers, Registries, Economic and
    cost-benefits studies
  • Advisory services: Ask-a-Curator, FAQs, reports,
    briefings, awareness-raising materials, best
    practice guidance, Storage media, Like ERPANET,
    advise Government, Research Councils, funding
    bodies
  • Professional development: Short courses,
    conferences, seminars, workshops, secondments to
    DCC and to working repository services
  • Outreach: Leadership for the future, case studies,
    sharing solutions, collaboration with other
    partners, international peers, industry links
  • Taxonomy of Users
16
Outline Taxonomy of digital curation users by role
  • 1. Data Creators
  • 2. Data Curators
  • 3. Data Re-users
  • 4. Policy makers
  • funding bodies
  • other leaders
17
Outline Taxonomy of digital curation users by role
  • 1. Data Creators
  • 2. Data Curators
  • 3. Data Re-users
  • 4. Policy makers
  • funding bodies
  • other leaders
  • Data Preservers
  • Data publishers
18
Outline Taxonomy by significant function of
organisational entity
  • 1. Research
  • 2. Service provision
  • 3. Learning and teaching
  • 4. Funders
  • 5. Policy / strategy makers
  • Designated communities
19
Outline Taxonomy by significant function of
organisational entity
  • 1. Research
  • 2. Service provision
  • 3. Learning and teaching
  • 4. Funders
  • 5. Policy / strategy makers
  • Commercial
  • Designated communities
20
Service definition and delivery
  • Advisory services
  • Responses to queries, from legal to technical
    guidance: HELPDESK_at_dcc.ac.uk
  • Site visits (National Institute of Environmental
    eScience)
  • Information Services
  • Briefing Documents - Freedom of Information by
    Mags McGinley
  • DIGITAL CURATION MANUAL
  • 20 chapters written by community experts, e.g.
    Metadata, written by Michael Day, UKOLN
  • Peer-reviewed
  • Checklist for Compliance with best practices and
    standards
  • Technology Watch

21
Services: workshops
  • 2005 Programme
  • Preservation of medical databases: 24-25 May
    at the Gulbenkian Institute, Lisbon, in
    collaboration with ERPANET and the Wellcome
    Trust
  • Institutional repositories: 6 July at the
    University of Cambridge, UK, in collaboration with
    DSpace
  • Cost models: in collaboration with the Digital
    Preservation Coalition, July at British Library
  • Persistent identifiers: liaising with NISO,
    summer, UK location tbc

22
Development approach
  • OAIS (Open Archival Information System) linkage:
    focus on representation information
  • link to global work on format registries?
  • Concentrate on scientific data formats?
  • Repository
  • Representation Information
  • Standards and Tools
  • Aim for OAIS compliance
  • Persistent identifiers
  • Certification: RLG task force
  • Open development: wiki and email list

23
OAIS Reference Model: Functional Model
How relevant to curation?
24
Representation Net

25
Representation Information: More detail
How does this relate to format registries?
26
High Level View
Example of use of Representation Information
Labelling
27
Registry issues?
  • Trusted repository of Representation Information
  • Authenticity of information
  • Access control
  • Certificates/Digests (are they trustable over
    the long term?)
  • Findability
  • Persistent IDs
  • What can we rely on?
  • Labels (to support automated processing)
  • Extensibility
  • Distributed
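
The authenticity, digest and persistent-ID points above can be pictured as a registry entry that stores a digest next to the Representation Information it describes. What follows is only a minimal Python sketch under stated assumptions: the `RegistryEntry` class, the `register` and `verify` helpers and the identifier `ri:example:0001` are hypothetical names invented for illustration, not the DCC registry design, and SHA-256 is chosen arbitrarily.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistryEntry:
    """Hypothetical registry record; not the DCC registry schema."""
    persistent_id: str  # illustrative identifier; its format is an assumption
    label: str          # machine-readable label to support automated processing
    digest: str         # SHA-256 hex digest of the stored bytes


def register(registry: dict, persistent_id: str, label: str, content: bytes) -> RegistryEntry:
    """Store the content's digest under a persistent ID and return the entry."""
    entry = RegistryEntry(persistent_id, label, hashlib.sha256(content).hexdigest())
    registry[persistent_id] = entry
    return entry


def verify(registry: dict, persistent_id: str, content: bytes) -> bool:
    """Return True only if the ID is known and the content matches its digest."""
    entry = registry.get(persistent_id)
    if entry is None:
        return False
    return hashlib.sha256(content).hexdigest() == entry.digest


registry = {}
spec = b"example format description bytes"  # made-up Representation Information
register(registry, "ri:example:0001", "format-description", spec)

print(verify(registry, "ri:example:0001", spec))                 # unchanged content
print(verify(registry, "ri:example:0001", spec + b" (edited)"))  # altered content
```

A digest of this kind only detects change. Whether it can still be trusted over the long term is the open question the slide raises.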

28
Registry development
  • Simple PHP prototype
  • Scoping study: unification
  • Formats, standards, tools
  • More robust prototype in development
  • Based on ebXML JAXR
  • Potentially distributed, cooperative maintenance
    model

29
Development Roadmap
  • Registry: complete prototype, link to PRONOM,
    GDFR etc, handover to service
  • Representation information: describe CCLRC
    (science) data using EAST, etc
  • Certification work continues
  • Additional tools: metadata extraction
  • Testbeds, interactions with others

30
Research approaches
  • Publishing and integrating scientific databases
  • Archiving past states of volatile databases
  • Database provenance and annotation
  • Organisational dynamics of trusted repositories
  • Automating metadata extraction
  • Cost-benefit analysis of data curation
  • Rights and responsibilities

31
The database picture
Curated data: classified, cleaned, annotated,
integrated, cross-linked
Source data
32
Curated Databases are Central
  • Much/most scientific data is now in databases
  • They often do not contain source experimental
    data. Sometimes just annotation/metadata
  • They borrow extensively from, and refer to, other
    databases
  • You are now judged by your data as well as your
    (paper) publications!!
  • These databases are built and maintained with a
    great deal of human or computational effort.
  • What makes a database?
  • It has internal structure or it changes.
  • Size alone doesn't qualify

33
Archiving (preserving) volatile databases
  • How do you preserve something that changes every
    hour or minute?
  • Important for the scientific record: someone
    might have cited your data at time t.
  • Current practice
  • Create versions (how often?)
  • Log changes
  • Use diffs
  • Do nothing (common!)
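
The "log changes" option above can be sketched as an append-only change log that is replayed to rebuild the database as it stood at a cited time t. This is a minimal Python illustration under assumptions, not a description of any existing archive: the `ChangeLogArchive` class, its method names, the integer timestamps and the sample keys are all hypothetical.

```python
from bisect import bisect_right


class ChangeLogArchive:
    """Hypothetical append-only log of (time, key, value) changes."""

    def __init__(self):
        self._times = []    # change timestamps, kept in ascending order
        self._changes = []  # (key, value) pairs; value None marks a deletion

    def record(self, t, key, value):
        """Append a change; timestamps must not go backwards."""
        if self._times and t < self._times[-1]:
            raise ValueError("changes must be logged in time order")
        self._times.append(t)
        self._changes.append((key, value))

    def state_at(self, t):
        """Replay every change with timestamp <= t to rebuild the database."""
        state = {}
        upto = bisect_right(self._times, t)
        for key, value in self._changes[:upto]:
            if value is None:
                state.pop(key, None)
            else:
                state[key] = value
        return state


archive = ChangeLogArchive()
archive.record(1, "gene:42", "annotation v1")
archive.record(5, "gene:42", "annotation v2")
archive.record(7, "gene:99", "new entry")
archive.record(9, "gene:99", None)  # entry removed

print(archive.state_at(4))  # what a citation made at time 4 would have seen
print(archive.state_at(8))
```

Replaying the whole log costs time proportional to its length. Combining it with periodic full versions (the first option on the slide) would bound that cost while using more storage.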

34
Curated databases: some issues
  • Integrating and publishing data so that someone
    else can use it.
  • Annotating existing data and moving annotations
    to other databases
  • Provenance: where did this data come from?
  • Archiving: how do you preserve something that is
    constantly changing?

35
How do we cite data?
  • A URL or citation to an article is already
    unsatisfactory.
  • DCC client complaint: "I spend a lot of time
    searching electronic documents for the part
    that is relevant to the citation."
  • The problem is much worse when you are citing
    something in a very large database.
  • How do you use a citation to locate data?
  • How do you ensure that the citation persists?
  • Connections with DB archiving and DOIs
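
One way to read the questions above is that a data citation needs to name the database, the archived version that was cited, and a path to the item within it. The Python sketch below illustrates that idea only. The `DataCitation` class, the `resolve` function, the database name `exampledb` and the sample records are all hypothetical, and real schemes (including DOIs) work differently in detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCitation:
    """Hypothetical citation to one item in a versioned database."""
    database: str  # illustrative database name
    version: str   # label of the archived version that was cited
    path: tuple    # keys leading from the root to the cited item


def resolve(citation, archive):
    """Follow the citation's path inside the archived version it names."""
    node = archive[citation.database][citation.version]
    for key in citation.path:
        node = node[key]
    return node


# Toy archive: database -> version label -> nested records (all made up).
archive = {
    "exampledb": {
        "2005-01": {"proteins": {"P1": {"name": "kinase A", "length": 310}}},
        "2005-06": {"proteins": {"P1": {"name": "kinase A", "length": 312}}},
    }
}

cite = DataCitation("exampledb", "2005-01", ("proteins", "P1", "length"))
print(resolve(cite, archive))  # value as it stood in the cited version
```

Because the citation carries its version label, it keeps resolving to the originally cited value even after the live database changes, provided the cited version has been archived.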

36
Research approaches
  • Publishing and integrating scientific databases
  • Archiving past states of volatile databases
  • Database provenance and annotation
  • Organisational dynamics of trusted repositories
  • Automating metadata extraction
  • Cost-benefit analysis of data curation
  • Rights and responsibilities
  • Public domain, public interest, public funding:
    paper (Waelde and McGinley)

37
www.dcc.ac.uk
38
  • www.ijdc.net
  • Launch planned June/July
  • Peer-reviewed contributions
  • Peter Buneman Editor (research)
  • Production editor: Philip Hunter

39
Sample issue: Full papers, Invited articles, News
and views. Papers for submission are very welcome!
40
1st DCC International Conference
  • Location - Bath UK
  • 29-30 September 2005
  • Keynote speakers
  • Cliff Lynch, CNI
  • Graham Cameron, European Bio-informatics
    Institute
  • DCC Research update
  • Social highlights

41
Associates Network
  • Goals: Develop understanding, share best practice,
    advance research, promote recognition, develop
    consensus
  • Membership: International groups, national
    bodies, industry partners, funders, research
    groups, HEIs, FEIs, individuals
  • Benefits: Early access to R&D outputs, advisory
    services, training, input to definition and
    design, community participation
  • Discussion Forum
www.dcc.ac.uk
Please join us!
42
CCLRC
UKOLN
DELOS
DPC
DLI (US)
NeSC
UofG
UofE
43
Acknowledgements
  • Slides from Peter Buneman, David Giaretta and
    others used with thanks.

44
How you can help us
  • How does OAIS relate to curation?
  • How do format registries relate to representation
    information?
  • Who else is working across these areas?
  • What outcomes would you like to see?