Preserving access: Making more informed - PowerPoint PPT Presentation

1
Preserving access: Making more informed guesses
about what works
  • Prepared by Maxine Davis, Collaboration
    Research Officer
  • Presented by David Pearson, Acting Director
  • Web Archiving and Digital Preservation,
    National Library of Australia
  • IIPC Open Day, San Francisco, 7 October 2009

2
Presentation Outline
  • The problem
  • Case study: PANDORA Web Archive
  • Some approaches and options
  • Approach 1: Unified Digital Format Registry
    (UDFR)
  • Approach 2: Wikipedia
  • Approach 3: Another way, documenting what web
    archives actually use or used

3
The problem
  • The World Wide Web is constantly evolving
  • Rendering web content requires combinations of
    software and hardware
  • But what is used for creation and access changes
    over time
  • Web archives:
      • Contain snapshots of websites taken at different
        times (different sites, or the same sites
        multiple times)
      • Lots of files, many file formats, various
        versions
      • Aim for ongoing access

4
Process of version creep in the archive
  • Mixed accessibility resulting from:
      • Different browsers, plug-ins and operating
        systems in use (then and now)
      • Backwards compatibility not guaranteed
      • Changes in standards and coding practices
        (deprecated, dead and non-standard tags)
      • Obsolescence of file formats and renderers
  • Changes to access paths:
      • Incremental loss of access not directly obvious
      • Alternative access paths not specified

5
Case study: PANDORA, Australia's Web Archive (1)
  • Selective archive, collecting since 1996
      • Sites individually selected by NLA and partners
      • As at July 2009: over 70.6 million files
      • Accessible over the web using a standard web
        browser
  • .au whole-domain harvests
      • 4 annual harvests (2005-2008) completed; 2009
        underway with Internet Archive
      • Combined 2005-2008 harvests: 2.3 billion files
      • Not currently publicly available

6
Case study: PANDORA, Australia's Web Archive (2)

7
IIPC Preservation Working Group discussions
  • Need for documenting the technical environment
  • Support required for alternative preservation
    action strategies
  • Emulation of past environments
  • Migration to standard formats
  • Risk notification
  • Recording conversion and alternate access paths
  • Exploring different approaches
  • Sharing information is sensible

8
Technical information of interest
  • Browsers and plug-ins/helper applications:
    versions and dependencies
  • Used approximately when?
  • Appropriate for which individual file format,
    type of file format, or whole archive?
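
The questions on this slide suggest the fields a shared record might need to carry. A minimal Python sketch, assuming a hypothetical `SoftwareRecord` structure; the field names and the example values are illustrative assumptions, not taken from the presentation or from any existing registry:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SoftwareRecord:
    """Hypothetical record for one browser, plug-in or helper application."""
    name: str
    version: str
    kind: str  # "browser", "plug-in" or "helper application"
    depends_on: List[str] = field(default_factory=list)  # other software it needs
    used_from: Optional[int] = None   # approximate first year of use, if known
    used_until: Optional[int] = None  # approximate last year of use; None if open-ended
    applies_to: List[str] = field(default_factory=list)  # file formats, or ["*"] for whole archive

    def in_use(self, year: int) -> bool:
        """Return True if the software was (approximately) in use in `year`."""
        started = self.used_from is None or self.used_from <= year
        not_ended = self.used_until is None or year <= self.used_until
        return started and not_ended


# Placeholder values for illustration only.
example = SoftwareRecord(
    name="Example Player",
    version="1",
    kind="plug-in",
    used_from=2000,
    used_until=2005,
    applies_to=["example/format"],
)
print(example.in_use(2003), example.in_use(2009))
```

Keeping the dates optional reflects the "approximately when?" question: a record can still be useful before its period of use is known.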

9
Already documented?
  • Manufacturer/vendor websites
  • Developer networks, forums, blogs, etc.
  • File format registries
  • File extension resources
  • Software archives/download sites
  • Internet history websites
  • Internet statistics websites
  • Wikipedia

10
Possible Approach 1: UDFR
  • Digital format registry that will result from the
    proposed merger of PRONOM and GDFR
  • Pros:
      • Considerable intellectual investment already made
      • Could be used for general digital preservation
        and potential interaction with other tools
  • Cons:
      • Under development
      • Web archive requirements need to be specified,
        use cases developed, the data model changed,
        and the registry populated with relevant data
        and regularly updated
      • Temporal aspect not currently catered for
  • Entry point: individual file format or software
    type (could be a pro?)

11
Possible Approach 2: Wikipedia (1)
  • Pros:
      • Existing free, web-based, collaborative,
        multilingual project
      • Draws together a rich set of information:
          • browsers, layout engines, plug-ins and
            software, statistics, creators, standards, etc.
          • lists, history, comparisons, timelines, links
            to internal and external references
      • Updated by many voluntary contributors

12
Possible Approach 2: Wikipedia (2)
  • Cons:
      • General audience, not specific to web archive
        requirements or to a specific web archive
      • Amount of detail varies (between different
        language versions and articles)
      • Can be edited by multiple users (+/-)
      • Not designed to interact with other digital
        preservation tools, as UDFR has the potential
        to do

13
Extract example

14
Possible Approach 3: Documenting what web
archives are using/used
  • Pros:
      • Time-based software suite approach
      • Starting point for:
          • Potential UDFR seed list
          • Identifying commonly used software
          • Inferring additional software requirements
          • Identifying alternate access paths
  • Cons:
      • Easier to document current versions
      • Obscure/obsolete material in our collections may
        be unknown

15
Individual web archives as sources of information
  • Analysis of archive contents and harvesting
    statistics
  • Web archivists' observations and records
      • UK Web Archive Technology Watch blog
  • Website usage statistics
      • Browser versions and operating systems
      • Indicative of popularity
  • Archived sites
      • Plug-in requirements, file type information
      • May include useful information websites
  • Internet Archive: complementary collection
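
One way to read "analysis of archive contents" is a per-year tally of file types, which gives a rough picture of when formats enter or leave a collection. A minimal Python sketch, assuming a hypothetical list of `(harvest_year, mime_type)` pairs as input; the sample values are placeholders, not PANDORA figures:

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def tally_by_year(files: Iterable[Tuple[int, str]]) -> Dict[int, Counter]:
    """Count how many files of each MIME type were harvested in each year."""
    tallies: Dict[int, Counter] = defaultdict(Counter)
    for year, mime_type in files:
        tallies[year][mime_type] += 1
    return dict(tallies)


# Placeholder input for illustration only.
sample = [
    (2005, "text/html"),
    (2005, "application/x-shockwave-flash"),
    (2008, "text/html"),
    (2008, "text/html"),
]
tallies = tally_by_year(sample)
for year in sorted(tallies):
    print(year, dict(tallies[year]))
```

A format that appears in early years and drops out of later ones is a candidate for checking whether current access software still renders it.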

16
Example: NLA web archiving software environment,
July 2009
  • Operating system: Windows XP
  • Computer: Windows PC, Intel Pentium 4
  • Browser: Internet Explorer 7 (main browser), IE8,
    Firefox 3.0
  • Additional software:
      • Adobe Reader 8
      • Adobe Shockwave Player
      • Adobe Flash Player 10
      • Real Player 10
      • Apple QuickTime 7
      • Windows Media Player 11
      • Java 6 Update 11
      • JavaScript enabled
      • Word, Excel, PowerPoint 2003
      • WinZip
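
An environment like this one could be kept as a dated snapshot, so that the software in use when a site was archived can be looked up later. A minimal Python sketch: the values are transcribed from the slide, while the dictionary layout and the `environment_for` helper are assumptions made for illustration, not an NLA or IIPC format:

```python
from typing import Dict, List, Optional, Tuple

# Values transcribed from the slide; the structure itself is an assumption.
NLA_JULY_2009: Dict = {
    "recorded": (2009, 7),  # (year, month)
    "operating_system": "Windows XP",
    "computer": "Windows PC, Intel Pentium 4",
    "main_browser": "Internet Explorer 7",
    "other_browsers": ["Internet Explorer 8", "Firefox 3.0"],
    "javascript_enabled": True,
    "additional_software": [
        "Adobe Reader 8",
        "Adobe Shockwave Player",
        "Adobe Flash Player 10",
        "Real Player 10",
        "Apple QuickTime 7",
        "Windows Media Player 11",
        "Java 6 Update 11",
        "Word, Excel, PowerPoint 2003",
        "WinZip",
    ],
}


def environment_for(date: Tuple[int, int], snapshots: List[Dict]) -> Optional[Dict]:
    """Return the most recent snapshot recorded on or before `date`, if any."""
    candidates = [s for s in snapshots if s["recorded"] <= date]
    return max(candidates, key=lambda s: s["recorded"], default=None)


snapshot = environment_for((2009, 10), [NLA_JULY_2009])
print(snapshot["main_browser"] if snapshot else "no snapshot recorded yet")
```

With earlier snapshots added to the list, the same lookup would return whichever environment was current at a given harvest date.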

17
Example: Earlier NLA software environment

18
Example: Comparison of NLA and BnF software
environments

19
Going forward
  • Is it worth pursuing approach 3?
  • If so, where would we record it (IIPC PWG wiki?
    other suggestions?)
  • Interested in contributing?

20
Questions?
  • Contact:
      • David Pearson: dapearson_at_nla.gov.au
      • Maxine Davis: madavis_at_nla.gov.au
  • Report to IIPC PWG by end of October 2009

Everything, for Everyone Forever