DCC: Persistent Identifiers for Representation Information

Slides: 41
Provided by: davidgi9

Transcript and Presenter's Notes

1
DCC: Persistent Identifiers for Representation
Information
Digital Curation Centre
a centre of expertise in data curation and
preservation
  • D Giaretta
  • http://www.dcc.ac.uk
  • http://dev.dcc.ac.uk

Funders
2
Outline
  • DCC Development work
  • Beginning with OAIS Reference Model
  • Motivation for use of Persistent IDs
  • Simple case!
  • Discussion of some possible Persistent ID systems
  • Conclusions

3
OAIS Reminder
  • OAIS is a standard about the long-term
    preservation of information
  • An Information Object is made up of a Data
    Object plus its accompanying Representation
    Information (RepInfo)

4
Information Objects
5
Representation Information
  • The Data Object is interpreted using the
    RepInfo
  • The Reference Model is designed to ensure that an
    OAIS is NOT set the impossible task of having to
    provide ALL possible RepInfo immediately
  • Hence: take account of the Designated Community
    and its associated Knowledge Base

6
Representation Information
  • The information that maps a Data Object into more
    meaningful concepts. An example is the ASCII
    definition that describes how a sequence of bits
    (i.e., a Data Object) is mapped into a symbol.
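
The ASCII example can be sketched in a few lines of Python. This is only an illustration: the function name `bits_to_text` and the choice of 8-bit groups are assumptions, not something the slides specify.

```python
def bits_to_text(bits: str) -> str:
    """Interpret a string of '0'/'1' characters as 8-bit ASCII codes."""
    if len(bits) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    # Each group of 8 bits becomes one integer code, then one ASCII symbol.
    codes = [int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)]
    return bytes(codes).decode("ascii")


# The Data Object: a bare sequence of bits with no meaning of its own.
data_object = "01001111" + "01000001" + "01001001" + "01010011"

# The ASCII definition (the RepInfo) is what turns it into symbols.
text = bits_to_text(data_object)
```

Without the ASCII definition the same bits could just as well be read as a 32-bit integer, which is why the RepInfo has to travel with, or be findable from, the Data Object.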

7
Representation Information
  • The Representation Information accompanying a
    physical object, like a moon rock, may give
    additional meaning
  • It typically is a result of some analysis of the
    physically observable attributes of the rock
  • The Representation Information accompanying a
    digital object, or sequence of bits, is used to
    provide additional meaning.
  • It typically maps the bits into commonly
    recognized data types such as character, integer,
    and real and into groups of these data types.
  • It associates these with higher level meanings
    which can have complex inter-relationships that
    are also described
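
A minimal sketch of how structure information maps bits into character, integer and real types, using Python's standard `struct` module. The record layout and the field names (`name`, `count`, `value`) are hypothetical and chosen only for illustration.

```python
import struct

# Hypothetical structure RepInfo (an assumption): big-endian record holding
# a 4-character name, a 32-bit integer and a 64-bit real.
RECORD_LAYOUT = ">4sid"


def interpret(raw: bytes) -> dict:
    """Map a raw byte sequence into typed, named values using the layout."""
    name, count, value = struct.unpack(RECORD_LAYOUT, raw)
    return {"name": name.decode("ascii"), "count": count, "value": value}


# Build a sample Data Object, then interpret it with the same layout.
raw = struct.pack(RECORD_LAYOUT, b"ROCK", 3, 2.5)
record = interpret(raw)
```

The layout string plays the role of structure information. What `count` or `value` actually mean (the semantic information) still has to be described elsewhere.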

8
Recursive Nature ofRepresentation Information
  • Structure Information
  • Semantic Information
  • Other Representation Information

9
Examples (cont)
  • 504b0304140000000800f696.
  • This is a ZIP file which contains Word files,
    each of which contains an encoded message which
    needs the key !DGAJUKI to decode it using
    encryption method SHA7
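
As a hedged illustration of the first layer of RepInfo in this example, the sketch below checks the leading bytes against the ZIP local file header signature and reads the next three header fields. The field layout follows the published ZIP format; the function names are hypothetical.

```python
import struct

ZIP_LOCAL_HEADER = bytes.fromhex("504b0304")  # the bytes "PK\x03\x04"


def looks_like_zip(hex_prefix: str) -> bool:
    """True if the hex-encoded bytes start with the ZIP local header signature."""
    return bytes.fromhex(hex_prefix).startswith(ZIP_LOCAL_HEADER)


def read_header_fields(hex_prefix: str) -> dict:
    """Decode the three little-endian 16-bit fields that follow the signature."""
    raw = bytes.fromhex(hex_prefix)
    _, version, flags, method = struct.unpack("<4sHHH", raw[:10])
    return {"version_needed": version, "flags": flags, "compression": method}


sample = "504b0304140000000800f696"  # the leading bytes shown on the slide
is_zip = looks_like_zip(sample)
header = read_header_fields(sample)
```

Recognising the container is only the first step: the Word format, the encoding of the message and the key are further layers of RepInfo that this sketch does not touch.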

10
Examples (cont)
  • LaTeX file containing an EPS (Encapsulated
    PostScript) version of an image
  • Web page containing Java Applet generating random
    numbers
  • SWISS-PROT data
  • Foreign Language emails

11
Further RepInfo Classification
12
Why classify?
  • This is a Word file
  • This is a ZIP file which contains Word files
  • This is a ZIP file which contains Word files,
    each of which contains an encoded message which
    needs the key !DGAJUKI to decode it using
    encryption method SHA7
  • This is a ZIP file which contains Word files,
    each of which contains an encoded message which
    needs the key !DGAJUKI to decode it using
    encryption method SHA8
  • To avoid repetition
  • To facilitate automation

13
Structure including Formats
  • Distinguish:
      • formats which are used mainly for rendering, to
        be followed by human inspection, and
      • formats used for automated processing
  • Distinguish:
      • Things with unknown structure, which need software
          • proprietary software, e.g. MS Word
          • Open Source software, e.g. CDF
      • Things with known structure
          • ASCII file, FITS file etc.
  • Document the format
      • Use a description language if possible, e.g. EAST
      • The EAST tools are themselves Representation
        Information which in due course will have to be
        fully defined; the closure of their
        Representation Nets will be the EAST standard
  • Higher level definitions should include useful
    scientific objects and humanities objects

14
Layered Model from OAIS
15
Semantics
  • Meaning/ Relationships
  • Hard problem
  • Probably start with Data Dictionaries
  • Add RDF etc

16
Time Dependent Information
  • Many, perhaps most, datasets change over time and
    the state at each particular moment in time may
    be important. It may be useful to break the issue
    into separate parts.
  • at each moment in time we could, in principle,
    take a snapshot and store it. That snapshot has
    its associated Representation Net.
  • efficient storage of a series of snapshots may
    lead one to store differences or include time
    tags in the data
  • Additional Representation Information would be
    needed which describes how to get to a particular
    time's snapshot from the efficiently encoded
    version.
  • Also applies to ANNOTATION: who said what about
    which, and when did they say it
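
The idea of storing differences with time tags, plus the extra RepInfo needed to get back to a particular snapshot, might look like the following sketch. The change-record shape `(timestamp, key, value)` and the convention that `None` means deletion are assumptions made for this example.

```python
def snapshot_at(base: dict, changes: list, when: int) -> dict:
    """Rebuild the dataset state at time `when` from a base snapshot
    plus a list of time-tagged changes (timestamp, key, value)."""
    state = dict(base)  # copy so the stored base snapshot is never modified
    for timestamp, key, value in sorted(changes, key=lambda c: c[0]):
        if timestamp > when:
            break  # later changes do not belong to this snapshot
        if value is None:
            state.pop(key, None)  # assumed convention: None marks a deletion
        else:
            state[key] = value
    return state


base = {"a": 1}
changes = [(10, "b", 2), (20, "a", 5), (30, "b", None)]
state_at_25 = snapshot_at(base, changes, 25)
```

The function itself is the "additional Representation Information" in miniature: without a description of how the differences are applied, the efficiently encoded version cannot be turned back into a snapshot.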

17
Actions and Processes (Behaviour)
  • Some information has, as an integral part of its
    content, an implicit or explicit process
    associated with it
  • An example of this is a database or other time
    dependent or reactive system such as a Neural
    Net.
  • Emulations
  • Universal Virtual Computer (UVC)
  • A very well specified VM e.g. JVM

18
Is saying it's XML enough?
<?xml version='1.0'?>
<VOTABLE version="1.1"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://www.ivoa.net/xml/VOTable/v1.1 http://www.ivoa.net/xml/VOTable/v1.1"
 xmlns="http://www.ivoa.net/xml/VOTable/v1.1">
<!--
 !  VOTable written by uk.ac.starlink.votable.VOTableWriter
 !-->
<RESOURCE>
<TABLE name="6dfgs_E7_subset" nrows="875">
<PARAM arraysize="*" datatype="char" name="Original Source" value="http://www-wfau.roe.ac.uk/6dFGS/6dfgs_E7.fld.gz">
<DESCRIPTION>URL of data file used to create this table.</DESCRIPTION>
</PARAM>
<PARAM arraysize="*" datatype="char" name="Credits" value="Column explanations provided by Mike Read (ROE) from 6dfGS project."/>
<PARAM arraysize="*" datatype="char" name="Conversion" value="Converted from 6dfgs_E7.fld.gz by Mark Taylor (Starlink) using STIL."/>
<PARAM arraysize="*" datatype="char" name="Comment" value="Cut down 6dfGS dataset for TOPCAT demo usage."/>
<FIELD arraysize="15" datatype="char" name="TARGET">
<DESCRIPTION>Target name</DESCRIPTION>

NO!
19
Why not embed in the object?
  • Do we have to repeat things each time?
  • Does every archive have to do everything?
  • What happens when the Designated Community
    Knowledge Base changes?

20
Registries
  • A place to register something
  • A place to look something up (find something)

21
Examples
  • http://www.loc.gov/film/nfr2004.html
  • http://hul.harvard.edu/gdfr/
  • http://sunsite.berkeley.edu/rbeaubie/metsimpl/
  • http://metadata.net/registries.html
  • http://uddi.microsoft.com/default.aspx

22
Simplest cases
  • Data object has an identifier pointing to
    Representation Information (RepInfo)
  • Services: given an identifier, return the
    associated contents of the Repository
  • Writer of RepInfo needs to be able to find
    related stuff (i.e. has someone already done the
    work?)
  • Services must be able to SEARCH registry in
    various ways
  • Updater of RepInfo: someone/something needs to
    be able to add, extend (add RepInfo for the
    RepInfo), correct
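
The services listed above (look up by identifier, search, add, extend, correct) can be sketched as a small in-memory registry. All class, method and field names here are hypothetical; this is not the DCC registry's actual interface.

```python
from dataclasses import dataclass, field


@dataclass
class RepInfoEntry:
    identifier: str
    description: str
    content: str
    # Identifiers of further RepInfo needed to understand this RepInfo.
    repinfo_ids: list = field(default_factory=list)


class Registry:
    def __init__(self):
        self._entries = {}

    def register(self, entry: RepInfoEntry) -> None:
        """Add a new entry; identifiers must stay unique."""
        if entry.identifier in self._entries:
            raise KeyError(f"identifier already registered: {entry.identifier}")
        self._entries[entry.identifier] = entry

    def lookup(self, identifier: str) -> RepInfoEntry:
        """Given an identifier, return the associated entry."""
        return self._entries[identifier]

    def search(self, text: str) -> list:
        """Find entries whose description mentions `text` (case-insensitive)."""
        needle = text.lower()
        return [e for e in self._entries.values()
                if needle in e.description.lower()]

    def extend(self, identifier: str, repinfo_id: str) -> None:
        """Attach RepInfo for the RepInfo."""
        self._entries[identifier].repinfo_ids.append(repinfo_id)

    def correct(self, identifier: str, content: str) -> None:
        """Replace the content of an existing entry."""
        self._entries[identifier].content = content


registry = Registry()
registry.register(RepInfoEntry("ri-ascii", "ASCII character encoding", "bits to symbols"))
registry.register(RepInfoEntry("ri-fits", "FITS file format", "header and data units"))
registry.extend("ri-fits", "ri-ascii")  # FITS headers are themselves ASCII text
```

Searching before writing is how a RepInfo author finds out whether someone has already done the work; `extend` is what lets Representation Networks grow over time.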

23
High Level Conceptual View
The Digital Object could have RepInfo packed with
it
Example of use of Representation Information
Labelling
24
Possible ways to attach ID
  • DOI metadata
  • SRB attribute
  • METS/XFDU attribute
  • Object-based Storage Devices (OSD) attribute
  • NB local caching is possible
  • Simple buy-in

25
Example Label
26
Persistent ID Digital Objects
  • Persistent Identifiers of Persistent Objects
  • Uniqueness (over time)
  • Actionable i.e. actually allows one to get
    something
  • Bootstrap step
  • Sequence of resolutions
  • Terminal step

27
Uniqueness
  • Hierarchy of name spaces
  • In each namespace
  • Unique (how many?)
  • Final namespace e.g.
  • Unique (probably out of a larger number)
  • Repository assigned e.g. Sequential, Hashed etc
  • Repository or Depositor assigned or Distributed
    system e.g. UUID based
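
The three assignment styles named above (sequential, hashed, UUID based) can be sketched as follows. The `namespace/token` string form and all names are assumptions for illustration only.

```python
import hashlib
import itertools
import uuid


class SequentialAssigner:
    """Repository-assigned identifiers: unique only within one namespace."""

    def __init__(self, namespace: str):
        self.namespace = namespace
        self._counter = itertools.count(1)

    def next_id(self) -> str:
        return f"{self.namespace}/{next(self._counter)}"


def hashed_id(namespace: str, content: bytes) -> str:
    """Content-derived token: the same bytes always give the same identifier."""
    return f"{namespace}/{hashlib.sha256(content).hexdigest()}"


def uuid_id(namespace: str) -> str:
    """Distributed assignment: a random UUID needs no central counter."""
    return f"{namespace}/{uuid.uuid4()}"


repo = SequentialAssigner("repo-a")
first, second = repo.next_id(), repo.next_id()
```

Sequential tokens are unique only because the enclosing namespace is, whereas hashed and UUID tokens are drawn from a much larger space, which matches the "probably out of a larger number" remark.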

28
Resolvability
  • BsXsYs(Z)T
  • B: Bootstrap step
  • s: Separators (may be different)
  • X, Y: Sequence of intermediate resolver steps
  • Z: (implicit) terminal resolver service
  • T: terminal token
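
A toy sketch of resolving an identifier of the form BsXsYs(Z)T: split on the separators, then follow one resolver step per component until the terminal token yields the object. The nested-dictionary resolver tables and the identifier `boot:X/Y/T42` are invented for this example.

```python
import re

# Hypothetical resolver tables: the bootstrap step B selects the first table,
# X and Y are intermediate steps, and the innermost table stands in for the
# implicit terminal resolver Z, mapping the token T to the object.
RESOLVERS = {
    "boot": {"X": {"Y": {"T42": "digital object 42"}}},
}


def resolve(identifier: str, root: dict = RESOLVERS, separators: str = ":/"):
    """Follow the chain of resolver steps named by the identifier."""
    pattern = f"[{re.escape(separators)}]"
    parts = [p for p in re.split(pattern, identifier) if p]
    node = root
    for part in parts:
        if not isinstance(node, dict) or part not in node:
            raise LookupError(f"cannot resolve step {part!r}")
        node = node[part]
    return node


obj = resolve("boot:X/Y/T42")  # separators may differ between steps
```

Every dictionary in the chain must still exist for the lookup to succeed, so the persistence of the identifier is only as good as its least persistent resolver step.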

29
Persistence Requirements
  • External to Repository
  • Bootstrap step
  • Each resolver step
  • Within the control of a Repository
  • Terminal resolver
  • The Digital Object

30
Bootstrap step
  • Fixed root
  • ISO based
  • ISO/IEC 6523-1:1998 (rolodex?)
  • ISO/IEC 8824-1:1998
  • DOI
  • Handle
  • PURL
  • Mutable root
  • ARK
  • http://NMAH/ark:/NAAN/Name
  • URN
  • LSID

31
Two Forms of ISO Highest-Level Identifiers - from
ISO 8824
  • 1. iso(1) standard(0)
  • and
  • 2. iso(1) identified-organization(3)
  • Form 1
  • Requirements on the standard, if any, are not
    currently known
  • Can standard simply define procedures for ID
    assignments?
  • Must standard explicitly give all identifiers
    to be used?
  • Form 2
  • Identified Organization is to be identified
    using ISO 6523
  • ISO 6523-1,-2 (1998) extensively revised from
    1984 version

32
ISO 6523-1 (1998)
  • Rules for ICD registration, and usage of 3
    additional fields
  • ICD identifies organization registration system,
    1-4 characters (e.g., ICD 112 is system for
    registering top level standards organizations)
  • Organization Identifier (OI), up to 35 characters
    (e.g., 4 assigned to CCSDS)
  • Optional Organization Part Identifier (OPI), up
    to 35 characters identifying sub-org., services,
    or entity (e.g., 1 could be assigned to CCSDS
    CA Agent)
  • Optional OPI Source (OPIS), 1 character,
    identifies who assigned the OPI (e.g., 1 says
    identified organization (CCSDS) assigned the
    OPI)
  • Interpretation of identification string under
    6523 requires full knowledge of context of usage
  • Fields can be in any order
  • No syntax specified
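
Since ISO 6523 fixes the fields and their lengths but no syntax or order, any concrete use has to add its own conventions. The sketch below only encodes the length rules stated on this slide, using named fields to sidestep ordering. The class name is hypothetical and the sample values are the slide's own examples.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Iso6523Id:
    icd: str                     # registration system, 1-4 characters
    oi: str                      # Organization Identifier, up to 35 characters
    opi: Optional[str] = None    # Organization Part Identifier, up to 35
    opis: Optional[str] = None   # OPI Source, exactly 1 character

    def __post_init__(self):
        if not 1 <= len(self.icd) <= 4:
            raise ValueError("ICD must be 1-4 characters")
        if not 1 <= len(self.oi) <= 35:
            raise ValueError("OI must be 1-35 characters")
        if self.opi is not None and not 1 <= len(self.opi) <= 35:
            raise ValueError("OPI must be 1-35 characters")
        if self.opis is not None and len(self.opis) != 1:
            raise ValueError("OPIS must be exactly 1 character")


# Values taken from the slide's examples: ICD 112, OI 4 (CCSDS), OPI 1, OPIS 1.
ccsds_agent = Iso6523Id(icd="112", oi="4", opi="1", opis="1")
```

Two parties exchanging such identifiers as plain strings would still have to agree on a serialisation, which is exactly the missing context of usage the slide points to.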

33
Implications for Registered Identifier Usage
  • All identifiers are ambiguous without context of
    usage
  • No string is globally unique
  • Need syntax specification including meaning of
    included fields
  • In most contexts of usage, full iso string not
    needed
  • Sending and receiving parties understand context
  • May need to broaden context of usage in some
    cases
  • Can employ full string
  • Map into new identifier string syntax and
    semantics - not automatic

34
Investigation Status
  • ICD 112 has been obtained by ISO for
    identification of standards developing
    organizations
  • ICD 112 is under control of ISO JTC1/SC32 (in
    2000 the contact was)

35
Potential ID Construction (abstract level)
  • Using ISO/ICDs
  • x distinguishes among CCSDS defined domains
    (TBD)
  • Maintained by CCSDS Secretariat

[Diagram: identifier fields ICD, OI, OPI, OPIS. Labels: P2 CA ADID services; 1 NSSD 0233 (Panel 2); P3 SLE services; 1 5 2. (Panel 3)]
  • Using ISO/ISO Standards
  • x is number of ISO standard

[Diagram labels: 1, 0, X, ID; 13764 NSSD 0233 (Panel 2); P3 Top Level SLE Standard; 5 2. (Panel 3)]
36
ISO 8824-1 Naming Tree
37
Persistent ID - roles
  • Who/What (role?) can update a Registry entry in
    the long term?
  • Who/What (role?) can access a Registry entry?
  • Authorisation?
  • Encryption keys?

38
What can be relied on?
  • Organisational/ Procedural/ Sociological issues
    are important
  • What can be relied on?
  • Organisations?
  • Internet? DNS?
  • Nothing

39
Example
<reference>
  <identifier>
    <value>e1fe9271-cd48-4418-a63e-b112ebf792c7</value>
    <resolver resolverType="ark">http://foobar.zaf.org/ark:/64269/</resolver>
    <resolver resolverType="doi">10.123456/</resolver>
  </identifier>
  <description>For example something registered with both ARK and DOI</description>
</reference>
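
A sketch of how a client might read such a reference with Python's standard `xml.etree.ElementTree`. The XML literal is a reconstruction of the slide's example, assuming the stripped punctuation was `<`, `>`, `=` and `:`. Treating each resolver's text as a prefix to which the value is appended is likewise an assumption, not something the slide states.

```python
import xml.etree.ElementTree as ET

REFERENCE_XML = """
<reference>
  <identifier>
    <value>e1fe9271-cd48-4418-a63e-b112ebf792c7</value>
    <resolver resolverType="ark">http://foobar.zaf.org/ark:/64269/</resolver>
    <resolver resolverType="doi">10.123456/</resolver>
  </identifier>
  <description>For example something registered with both ARK and DOI</description>
</reference>
"""


def candidate_resolutions(xml_text: str) -> list:
    """Return (resolver type, full identifier) pairs, one per resolver."""
    root = ET.fromstring(xml_text.strip())
    value = root.findtext("identifier/value").strip()
    results = []
    for resolver in root.findall("identifier/resolver"):
        # Assumed convention: resolver text is a prefix for the value.
        prefix = (resolver.text or "").strip()
        results.append((resolver.get("resolverType"), prefix + value))
    return results


candidates = candidate_resolutions(REFERENCE_XML)
```

Listing several resolvers for one value lets a client fall back to another system if one of them stops resolving, which fits the conclusion that no single system can be trusted in the really long term.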

40
Conclusions
  • There is a need for Persistent Identifiers for
    persistent objects
  • There are many systems; some may be more
    believable than others
  • None can actually be trusted in the really long
    term