The future of the DCC

About This Presentation
Title:

The future of the DCC

Description:

... under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 ... send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, ...

Slides: 21
Provided by: digit3
Tags: dcc | commons | future


Transcript and Presenter's Notes

1
The future of the DCC
  • Chris Rusbridge
  • E-Science Workshop April 2009

2
Contents
  • Curation and integrated science
  • Poetry and Philosophy of D H Rumsfeld
  • Designated Community and Knowledge Base
  • DCC services
  • Future of the DCC

3
Curation
  • Wikipedia:
  • Curator: a content specialist responsible for an
    institution's collections and, together with a
    publications specialist, their associated
    collections catalogs.
  • Digital curation: the curation, preservation,
    maintenance, collection and archiving of digital
    assets.
  • Sheer curation: an approach to digital curation
    where curation activities are quietly integrated
    into the normal workflow of those creating and
    managing data and other digital assets.
  • DCC: digital curation is maintaining and adding
    value to a trusted body of digital information
    for current and future use.

4
Integrated Science
  • The application of multiple scientific
    disciplines to one or more core scientific
    challenges
  • Examples of integrated sciences?
  • Archaeology
  • Environmental sciences

5
Integrated Science implications
  • Scientists will be using unfamiliar data,
    therefore
  • Data curators and managers must make their data
    available for unfamiliar users!
  • And now for something unfamiliar?

6
Poetry and Philosophy of D H Rumsfeld
  • Hart Seely, April 2, 2003,
  • Slate, http://www.slate.com/id/2081042/

7
A Confession
  • Once in a while,
  • I'm standing here, doing something.
  • And I think,
  • "What in the world am I doing here?"
  • It's a big surprise.
  • May 16, 2001, interview with the New York Times

8
Clarity
  • I think what you'll find,
  • I think what you'll find is,
  • Whatever it is we do substantively,
  • There will be near-perfect clarity
  • As to what it is.
  • And it will be known,
  • And it will be known to the Congress,
  • And it will be known to you,
  • Probably before we decide it,
  • But it will be known.
  • Feb. 28, 2003, Department of Defense briefing

9
The Unknown
  • As we know,
  • There are known knowns.
  • There are things we know we know.
  • We also know
  • There are known unknowns.
  • That is to say
  • We know there are some things
  • We do not know.
  • But there are also unknown unknowns,
  • The ones we don't know
  • We don't know.
  • Feb. 12, 2002, Department of Defense news
    briefing

10
The 4th Rumsfeld?
  • 3 epistemological classes (???)
  • Known knowns
  • Known unknowns
  • Unknown unknowns
  • 4th class?
  • Unknown knowns?
  • Critical issue for cross-disciplinary sciences

11
Some OAIS Concepts?
  • Knowledge Base: allows a consumer to understand
    something
  • Designated Community: the set of consumers for
    whom the archive curates something
  • Representation Information: helps you interpret a
    data object, yielding an information object
  • The amount and nature of RepInfo required is
    dependent on the Knowledge Base of the Designated
    Community
  • If you curate for project colleagues in the short
    term, little if any RepInfo required
  • If you curate for those unfamiliar with the data,
    more RepInfo is needed
  • (All broadly interpreted!)
  • CCSDS (2002). Reference Model for an Open
    Archival Information System (OAIS).
    Retrieved from
    http://public.ccsds.org/publications/archive/650x0b1.pdf

12
Time
  • KB is f1(DC, t)
  • DC is f2(t)
  • RepInfo needed is f3(f1(DC, t), f2(t))
  • (but none of these concepts can be precisely
    defined!)
  • If DC is small and t is short (months to year or
    so), then both may be ignored, and RepInfo be
    assumed part of the KB
  • If DC is extensive (eg cross-discipline) and t is
    long (5 years to 25 plus), then RepInfo must be
    articulated
  • If t is very long, most bets are off (post-hoc
    reconstruction likely to be needed)
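
The rules of thumb on this slide can be sketched as a small function. This is only an illustration: the function name, the return labels and all three numeric thresholds are assumptions (the slide gives "months to year or so" and "5 years to 25 plus" but no figure for "very long"), and, as the slide says, none of these concepts can be precisely defined.

```python
def repinfo_guidance(dc_is_extensive: bool, t_years: float,
                     short_years: float = 1.0,
                     long_years: float = 5.0,
                     very_long_years: float = 50.0) -> str:
    """Rough reading of the slide's heuristic for how much RepInfo is needed.

    dc_is_extensive: True for a broad (e.g. cross-discipline) Designated Community.
    t_years: how long the data must remain usable.
    The three thresholds are hypothetical defaults, not values from the talk or OAIS.
    """
    if t_years >= very_long_years:
        # "most bets are off": expect post-hoc reconstruction
        return "reconstruct"
    if dc_is_extensive and t_years >= long_years:
        # RepInfo must be written down explicitly
        return "articulate"
    if not dc_is_extensive and t_years <= short_years:
        # RepInfo can be assumed to be part of the Knowledge Base
        return "implicit"
    # The slide does not cover the mixed cases
    return "judgement call"


if __name__ == "__main__":
    print(repinfo_guidance(dc_is_extensive=False, t_years=0.5))
    print(repinfo_guidance(dc_is_extensive=True, t_years=10))
```

The mixed cases (a small community over a long period, or a broad one over a short period) are deliberately left as a judgement call, because the slide does not address them.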

13
What might RepInfo include
  • Structure information: file format definitions,
    etc
  • Semantic information: data dictionaries, code
    books etc
  • Robust methods (working code?)
  • Not to mention many kinds of metadata,
    provenance, documentation of hidden assumptions,
    etc
  • Cross-domain schemas: one approach to articulating
    RepInfo?
  • (Never perfect, of course)
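
As a minimal sketch of how semantic information such as a data dictionary and code book turns a data object into an information object, consider the following. Every field name, unit and code here is invented for illustration; nothing in it comes from the talk or from a real dataset.

```python
# Hypothetical data dictionary with an embedded code book.
# All field names, units and codes are made up for this example.
DATA_DICTIONARY = {
    "site": {"description": "Sampling site", "codes": {"1": "river", "2": "lake"}},
    "temp": {"description": "Water temperature", "unit": "degC"},
}


def interpret(record: dict) -> dict:
    """Label and decode a raw record using the data dictionary (the RepInfo)."""
    out = {}
    for field, raw in record.items():
        entry = DATA_DICTIONARY.get(field)
        if entry is None:
            # Without RepInfo the field cannot be interpreted
            raise KeyError(f"no RepInfo for field {field!r}")
        # Decode via the code book if there is one; otherwise keep the raw value
        value = entry.get("codes", {}).get(raw, raw)
        label = entry["description"]
        if "unit" in entry:
            label += f" ({entry['unit']})"
        out[label] = value
    return out


if __name__ == "__main__":
    print(interpret({"site": "2", "temp": "14.5"}))
```

A project colleague may carry this dictionary in their head; an unfamiliar user sees only `{"site": "2", "temp": "14.5"}` unless the RepInfo travels with the data.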

14
What about Rumsfeld 4?
  • Biggest concern with unfamiliar user is clashing
    concepts, eg different baselines, units,
    geographies, granularity
  • Especially where terms are ambiguous or
    differently interpreted
  • The KBs of two DCs conflict, potentially silently
  • Happens all the time, of course
  • The unspoken tacit knowledge, unknown knowns!
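
A silent clash of units between two communities can be illustrated as follows. This is a hedged sketch: the `Measurement` class, both function names and the sample depths are hypothetical, chosen only to show how tacit knowledge ("our depths are in metres") goes wrong once data is merged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    value: float
    unit: str  # the unit made explicit, instead of left as tacit knowledge


def mean_naive(values):
    """Average bare numbers; any unit mismatch goes unnoticed."""
    return sum(values) / len(values)


def mean_checked(measurements):
    """Average only if every measurement declares the same unit."""
    units = {m.unit for m in measurements}
    if len(units) != 1:
        raise ValueError(f"conflicting units: {sorted(units)}")
    return sum(m.value for m in measurements) / len(measurements)


if __name__ == "__main__":
    # Community A records depth in metres, community B in feet (invented values)
    print(mean_naive([10.0, 12.0, 33.0]))  # runs without complaint, result is meaningless
    try:
        mean_checked([Measurement(10.0, "m"), Measurement(12.0, "m"),
                      Measurement(33.0, "ft")])
    except ValueError as err:
        print(err)
```

The naive version fails silently, which is the slide's concern; the checked version fails loudly only because the unit was articulated rather than assumed.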

15
Timing
  • Curation starts before creation
  • Before project proposal!
  • Data acquisition should not happen at the end
  • Continuous acquisition much better?
  • Enforcement or credit for data?

16
Other curation issues of concern
  • Sustainability (work on your survival)
  • Succession (what happens to your data if you
    don't)
  • Data audit (know what you've got)
  • Data risk assessment (assess your chances of
    loss)
  • Repository external audit???
  • Provenance and computational lineage
  • Archiving database changes
  • Community proxy roles: help your communities
    develop data standards and data practices
  • DCC has tools and support for some of these

17
and Research Outputs?
  • Need more semantically aware texts to support
    cross-community understanding
  • Coded up (cf microformats, RDFa)
  • People
  • Citations and references
  • Science features (eg chemicals, reactions)
  • Graphs, spectra, tables linking to
  • Supplementary data
  • PDF is pretty bad at this

18
DCC Phase 3
  • Post January 2010?
  • Smaller (2/3 budget if we're lucky)
  • Joint planning with JISC
  • More tightly managed (hub and spoke)
  • No development (says JISC)
  • Core services plus optional additional services
  • 1st draft seen by JSR
  • Feedback session next week

19
Proposed core services
  • Reference Resources and Exemplars
  • Training and Staff Development
  • Expertise, Advice, Consultancy and Hands-on
    Support
  • Community-building and Information-sharing
    activities
  • Data Management and Sharing Plans
  • Policy and Strategic Development
  • Providing Access to Tools and Toolkits

20
Possible additional services
  • Development of Tools, Toolkits, Wizards and
    Templates
  • Infrastructure Services
  • Model licences for data
  • Data citation guidelines

21
What do you want from the DCC?