Title: FILES, BITSTREAMS AND THE ONION MODEL
1FILES, BITSTREAMS AND THE ONION MODEL
2Files again
- FILE: a named and ordered sequence of bytes that
is known by an operating system.
- chapter1.pdf
- photo.tiff
- mapofGlasgow.jp2
- Can be zero or more bytes
- Has a file format
- Has access permissions and file system statistics
such as size and modification date
3Bitstreams again
- BITSTREAM: contiguous or non-contiguous data
within a file that has meaningful common
properties for preservation purposes.
- the video stream within an AVI file
- an image within a TIFF file
- Not known to operating system
- Can be located by starting position within the
file
- Cannot stand alone as a file without the
addition of a header, other structure, or
reformatting
4But some files aren't that simple
chapter1.pdf, compressed with the Unix gzip utility, becomes chapter1.gz
chapter1.gz
- format: gzip
- size: 324,876 bytes
- messageDigest: something else
chapter1.pdf
- format: PDF
- size: 500,000 bytes
- messageDigest: something
5Composition level
- How to describe layers of encodings so they can
be correctly reversed?
- Treat each layer as a composition level
- Repeat description of object characteristics for
each composition level
- A file with no compression and no encryption has
compositionLevel 0 (zero)
- Each layer of encoding results in new format and
incremented compositionLevel
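The layering described above can be sketched in a few lines of Python. This is only an illustration, not a PREMIS implementation: the `CompositionLevel` class, its field names, the placeholder payload, and the choice of SHA-256 for the message digest are all assumptions made for the example.

```python
import gzip
import hashlib
from dataclasses import dataclass


@dataclass
class CompositionLevel:
    """Object characteristics recorded for one layer of encoding (hypothetical model)."""
    composition_level: int
    format_name: str
    size: int            # size in bytes
    message_digest: str  # SHA-256 hex digest; the algorithm is an assumption


def describe(data: bytes, level: int, format_name: str) -> CompositionLevel:
    """Repeat the same characteristics for whichever layer is passed in."""
    return CompositionLevel(level, format_name, len(data), hashlib.sha256(data).hexdigest())


# Stand-in payload; a real repository would read chapter1.pdf from storage.
pdf_bytes = b"%PDF-1.4 placeholder content\n" * 100
gz_bytes = gzip.compress(pdf_bytes)

levels = [
    describe(pdf_bytes, 0, "PDF"),   # no compression, no encryption
    describe(gz_bytes, 1, "gzip"),   # one layer of encoding on top
]

# Reversing the encodings means peeling layers from the highest level down to 0.
recovered = gz_bytes
for layer in sorted(levels, key=lambda lvl: lvl.composition_level, reverse=True):
    if layer.format_name == "gzip":
        recovered = gzip.decompress(recovered)
```

After the loop, `recovered` holds the level 0 bytes again, and each entry in `levels` carries its own format, size, and digest.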
6compositionLevel
chapter1.pdf.gz
chapter1.pdf
7OK, but what if you have this?
package.tar
Inside the TAR file, file1 and file2 are simple
PDF files. Neither the containing TAR nor the
contained PDFs are encrypted or compressed.
file1.pdf
file2.pdf
8Then you have 3 objects!
package.tar is a file object with
compositionLevel 0 and a storageLocation in the
file system.
file1.pdf is a file object with
compositionLevel 0 and a storageLocation as an
offset in package.tar.
file2.pdf is a file object with
compositionLevel 0 and a storageLocation as an
offset in package.tar.
package.tar
file1.pdf
file2.pdf
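As a rough illustration of "storageLocation as an offset", the sketch below builds a small uncompressed TAR in memory and lists the three file objects using Python's standard `tarfile` module. The helper names `build_tar` and `list_file_objects`, the dictionary keys, and the placeholder PDF payloads are assumptions for the example; `TarInfo.offset_data` is the standard attribute giving the position where a member's data starts.

```python
import io
import tarfile
from typing import Dict, List


def build_tar(members: Dict[str, bytes]) -> bytes:
    """Build an uncompressed, unencrypted TAR archive in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def list_file_objects(tar_bytes: bytes, container: str) -> List[dict]:
    """Describe the container and each contained file as separate file objects."""
    objects = [{"name": container, "compositionLevel": 0,
                "storageLocation": "file system"}]
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as tar:
        for member in tar.getmembers():
            objects.append({
                "name": member.name,
                "compositionLevel": 0,  # the members carry no extra encoding
                "storageLocation": f"offset {member.offset_data} in {container}",
                "offset": member.offset_data,
                "size": member.size,
            })
    return objects


# Placeholder payloads standing in for real PDF files.
payloads = {"file1.pdf": b"%PDF-1.4 first", "file2.pdf": b"%PDF-1.4 second"}
tar_bytes = build_tar(payloads)
objects = list_file_objects(tar_bytes, "package.tar")
for obj in objects:
    print(obj["name"], obj["compositionLevel"], obj["storageLocation"])
```

Slicing `tar_bytes[offset:offset + size]` for either member returns exactly the original payload, which is what makes an offset usable as a storage location.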
9In conclusion
- Remember: composition level increments only when
you have a single file object with multiple
successive encodings.
- Bonus question: why aren't the PDF files within
package.tar considered bitstream objects?
10AGENTS, RIGHTS, EVENTS
11Agents
- The Agent entity aggregates information about
agents (persons, organizations, or software)
associated with rights management and/or
preservation events in the life of an object.
- Intended only to identify the agent
unambiguously, and to allow linking from other
entity types.
- Repositories are encouraged to use any richer scheme
that may be appropriate.
- agentIdentifier (mandatory)
- agentIdentifierType (mandatory)
- agentIdentifierValue (mandatory)
- agentName (optional)
- agentType (optional)
12Examples of agents
- agentIdentifier
- agentIdentifierType: lcnaf
- agentIdentifierValue: oca05896076
- agentName: Caplan, Priscilla
- agentType: person
- agentIdentifier
- agentIdentifierType: repositoryX
- agentIdentifierValue: 57
- agentName: Timberline Publishing Company
- agentType: organization
- agentIdentifier
- agentIdentifierType: fda
- agentIdentifierValue: daitss1.4.14
- agentName:
- agentType: software
13Rights
- The Rights entity aggregates information about
statements of permissions
- PREMIS addresses only a narrow scope: what
permissions have been granted to the repository
itself to carry out actions related to objects
within the repository
- permissionGranted (mandatory)
- act (mandatory)
- restriction (optional)
- termOfGrant (mandatory)
- startDate (mandatory)
- endDate (mandatory)
- permissionNote (optional)
14permissionGranted.act
- The action the repository is granted permission
to take
- Suggested values:
- replicate: make an exact copy
- migrate: make a copy identical in content in a
different file format
- modify: make a version different in content
- use: read without copying or modifying (e.g., to
validate a file or run a program)
- disseminate: create a DIP for use outside of the
preservation repository
- delete: remove from the repository
15permissionGranted.restriction
- A condition or limitation on permissionGranted.act
- For example:
- act: replicate
- restriction: no more than 3 copies at any time
- act: disseminate
- restriction: rightsholder must be notified after
the fact
- Repeat if there are multiple conditions/limitations
- How to make this actionable?
16permissionGranted.termOfGrant
- Beginning and ending dates of permission granted
- ISO 8601 format recommended
- Examples:
- termOfGrant
- startDate: 20050101
- endDate: 20150101
- termOfGrant
- startDate: 1900
- endDate: 9999
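One way a repository might test whether a grant is in force on a given day is sketched below. It handles only the two shapes seen in the examples (a bare `YYYY` year and a basic-format `YYYYMMDD` date). The function names, and the rule that a bare year is widened to 1 January for a start date and 31 December for an end date, are assumptions of this sketch and not something PREMIS prescribes.

```python
from datetime import date


def parse_grant_date(value: str, end: bool = False) -> date:
    """Parse a basic-format ISO 8601 date given as YYYY or YYYYMMDD.

    Assumption: a bare year means 1 January when used as a start date
    and 31 December when used as an end date.
    """
    if len(value) == 4 and value.isdigit():
        year = int(value)
        return date(year, 12, 31) if end else date(year, 1, 1)
    if len(value) == 8 and value.isdigit():
        return date(int(value[:4]), int(value[4:6]), int(value[6:]))
    raise ValueError(f"unsupported date format: {value!r}")


def grant_in_force(start_date: str, end_date: str, on: date) -> bool:
    """Return True if `on` falls within the term of grant, bounds included."""
    return parse_grant_date(start_date) <= on <= parse_grant_date(end_date, end=True)


# The two termOfGrant examples from the slide.
print(grant_in_force("20050101", "20150101", date(2010, 6, 1)))
print(grant_in_force("1900", "9999", date(2024, 1, 1)))
```

Treating `9999` as an effectively open-ended grant works here because 31 December 9999 is the largest value Python's `date` type can hold.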
17permissionGranted.permissionNote
- Defined as additional information about the
permission
- Possible use for rights information that does not
narrowly fit the definition of permission?
- Examples:
- no contact information for rightsholder found
- public domain
18Other permissionStatement information
- permissionStatementIdentifier (mandatory)
- permissionStatementIdentifierType (mandatory)
- permissionStatementIdentifierValue (mandatory)
- linkingObject (mandatory)
- grantingAgent (optional)
- grantingAgreement (optional)
- grantingAgreementIdentification (optional)
- grantingAgreementInformation (optional)
- Granting agreement is the formal documentation (e.g.,
a contract) behind the statement of permission.
19Why are PREMIS rights so narrow?
- Implementation survey report showed little
understanding of rights needed for preservation
and no vocabulary for expressing preservation
rights
- Wanted rights information to be actionable
- Did not want to develop or endorse a rights
expression language
- Thought a more thorough investigation of rights
would be a good activity for a successor group
- Library of Congress commissioned Karen Coyle
paper as basis for further work
20Events
- The Events entity aggregates information about an
action involving one or more Objects
- Recording events can be very important:
- to demonstrate digital provenance
- to prove that rights have not been violated
- as an audit trail
- for problem solving if something goes wrong
- for billing or reporting
- Judgement calls
- what exactly are the boundaries of an Event?
- what actions are worth recording as Events?
21High level semantic units
- eventIdentifier (mandatory)
- eventType (mandatory)
- eventDateTime (mandatory)
- eventDetail (optional)
- eventOutcomeInformation (optional)
- linkingAgentIdentifier (optional)
- linkingObjectIdentifier (optional)
22eventType
- Names the event
- From a controlled vocabulary
- Could use coded values
- Granularity is implementation-specific
23eventDetail
- Additional information about the event
- Not necessarily intended to be machine-processable,
but could be structured to allow this
- For example:
- eventType: dissemination
- eventDetail: A001923WS20060413T071530-0500
- the agent requesting the dissemination, a
dissemination type code, and the date/time of the
request (which could be different from the time
of the actual dissemination itself)
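To show what "structured to allow this" could look like, here is a hedged sketch that splits the example eventDetail into its three parts. The field widths (a letter plus six digits for the agent, two letters for the type code, then a basic-format ISO 8601 timestamp with UTC offset) are inferred from the single example above and are assumptions; a real repository would define its own layout. `DisseminationDetail` and `parse_event_detail` are hypothetical names.

```python
import re
from datetime import datetime, timedelta
from typing import NamedTuple


class DisseminationDetail(NamedTuple):
    agent: str             # agent requesting the dissemination
    type_code: str         # dissemination type code
    requested_at: datetime  # date/time of the request


# Assumed layout: agent (letter + 6 digits), type code (2 letters), timestamp.
_PATTERN = re.compile(
    r"^(?P<agent>[A-Z]\d{6})(?P<type>[A-Z]{2})(?P<ts>\d{8}T\d{6}[+-]\d{4})$"
)


def parse_event_detail(detail: str) -> DisseminationDetail:
    """Split a structured eventDetail string into its assumed components."""
    match = _PATTERN.match(detail)
    if match is None:
        raise ValueError(f"eventDetail does not match the assumed layout: {detail!r}")
    requested_at = datetime.strptime(match.group("ts"), "%Y%m%dT%H%M%S%z")
    return DisseminationDetail(match.group("agent"), match.group("type"), requested_at)


detail = parse_event_detail("A001923WS20060413T071530-0500")
print(detail.agent, detail.type_code, detail.requested_at.isoformat())
```

Keeping the request time inside eventDetail, separate from eventDateTime, preserves the distinction the slide draws between when a dissemination was requested and when it actually happened.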
24eventOutcomeInformation
- eventOutcomeInformation
- eventOutcome: intended to be coded
- eventOutcomeDetail: not necessarily
machine-processable
- Examples:
- eventOutcomeInformation
- eventOutcome: 00 (means ok)
- eventOutcomeDetail: new file successfully
created
- eventOutcomeInformation
- eventOutcome: FV-S (means file validation
successful)
- eventOutcomeDetail: A4,A14,A19 (coded list of
validation errors found)
25linking Events with Agents and Objects
- linkingAgentIdentifier
- linkingAgentIdentifierType
- linkingAgentIdentifierValue
- linkingAgentRole (because there may be several
agents associated with the event)
- linkingObjectIdentifier
- linkingObjectIdentifierType
- linkingObjectIdentifierValue
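The linking units above can be pictured as a small data model. The sketch below is not a PREMIS schema binding: the class names, the role strings, and the event and object identifier values are hypothetical, while the two agent identifiers are reused from the agent examples earlier in the deck.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Identifier:
    """A type/value pair, the shape shared by all the identifier units."""
    type: str
    value: str


@dataclass(frozen=True)
class LinkingAgent:
    identifier: Identifier
    role: Optional[str] = None  # distinguishes several agents on one event


@dataclass
class Event:
    event_identifier: Identifier
    event_type: str
    event_date_time: str
    linking_agents: List[LinkingAgent] = field(default_factory=list)
    linking_objects: List[Identifier] = field(default_factory=list)

    def agents_with_role(self, role: str) -> List[LinkingAgent]:
        """Return the linked agents that played the given role."""
        return [agent for agent in self.linking_agents if agent.role == role]


event = Event(
    event_identifier=Identifier("repositoryX", "event-0001"),  # hypothetical value
    event_type="dissemination",
    event_date_time="2006-04-13T07:15:30-05:00",
    linking_agents=[
        LinkingAgent(Identifier("lcnaf", "oca05896076"), role="requester"),      # hypothetical role
        LinkingAgent(Identifier("fda", "daitss1.4.14"), role="executing program"),  # hypothetical role
    ],
    linking_objects=[Identifier("repositoryX", "object-42")],  # hypothetical value
)

requesters = event.agents_with_role("requester")
print(requesters[0].identifier.value)
```

Because the event stores only identifiers, the full Agent and Object descriptions stay in their own records and are looked up through the type/value pair.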
26Event Example