VRC: Preservation Risk Management for Web Resources

1
VRC: Preservation Risk Management for Web Resources
Nancy Y. McGovern, ECURE 2004

2
VRC Funding
  • Part of a 4(5)-year NSF-funded project
  • supported by the Digital Libraries Initiative,
    Phase 2 (Grant No. IIS-9905955, the Prism
    Project)
  • Also partially funded by a grant from
    The Andrew W. Mellon Foundation
  • Political Communications Web Archiving:
    http://www.crl.edu/content/PolitWeb.htm
  • For updates:
    http://irisresearch.library.cornell.edu/VRC/

3
Current Team
  • Anne R. Kenney, Advisor
  • Nancy Y. McGovern, Project Manager
  • Richard Entlich, Sr. Researcher
  • William R. Kehoe, Technology Coordinator
  • Ellie Buckley, Digital Research Specialist
  • Erica Olsen (recent)
  • Carl Lagoze, CIS PI

4
Research Scope
  • See "Preservation Risk Management for Web
    Resources: Virtual Remote Control in Cornell's
    Project Prism"
  • by Anne R. Kenney, Nancy Y. McGovern, Peter
    Botticelli, Richard Entlich, Carl Lagoze, and
    Sandra Payette
  • in D-Lib Magazine, January 2002
  • http://www.dlib.org/dlib/january02/kenney/01kenney.html

5
Virtual
  • because VRC develops models to represent
    essential features of selected Web sites
  • that enable ongoing monitoring over time
  • to identify, respond to, and mitigate potential
    risks to the site integrity and longevity

6
Remote
  • because VRC is intended for use by cultural
    heritage institutions
  • interested in the longevity of Web resources
  • residing on remote servers
  • not owned or managed by the monitoring
    institution

7
Control
  • because at the most proactive end of the VRC
    approach
  • a monitoring organization may act to protect
    another organization's resources
  • by agreement or implicit consent
  • through notification and/or action

8
Purpose
  • Develop a model for research libraries (adaptable
    to other contexts)
  • Support spectrum from passive monitoring to
    active capture
  • Lifecycle support: selection to capture
  • Understand nature of Web resources
  • Promulgate good practice

9
Types of Web Resources
  • Two types of initiatives for monitoring and/or
    capture of:
  • Web-based publications: Web site as a means
  • All of (or a subset of) a Web site, consisting of
    pages within a boundary defined by a URL (or a
    portion of one): Web site as an end (VRC)

10
Nature of Risks
  • Two perspectives on Web-based risk:
  • potential liability of an institution based upon
    the content of its Web site, or a Web site for
    which it is responsible
  • potential threats to the integrity and longevity
    of a Web resource (VRC)

11
Types of Risks
  • Include
  • technological obsolescence
  • security weaknesses and breaches
  • human error in developing/maintaining sites
  • organizational issues: benign neglect
  • power and technology failures
  • inadequate backup and secondary systems

12
Risk Factors
  • Organizational Context
  • Combination of indicators
  • Monitoring (change/loss over time)
  • Triggers (events, organizational, upgrades)
  • Degradation of site management indicators

13
VRC Stages
  1. Identification
  2. Analysis
  3. Appraisal
  4. Strategy
  5. Detection
  6. Response

14
Human/Tool Scenario
  • 1. Identification
  • Human: identify Web resources of interest
  • Tool: verify list, expand list
  • 2. Analysis
  • Tool: crawl sites, generate characterizations
  • Human: accept/revise characterizations
  • 3. Appraisal
  • Human: define/review attributes of value
  • Tool: support appraisal, capture results

15
Human/Tool Scenario
  • 4. Strategy
  • Human: develop/review strategies
  • Tool: plot appraisals, compile strategies
  • 5. Detection
  • Human: define risk parameters
  • Tool: identify/assess risks, propose responses
  • 6. Response
  • Tool: propose risk response based on rules;
    automatic response for some risk categories
  • Human: monitor automated responses; select
    response based on recommended actions
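
The Response stage above splits the work between rule-based proposals from the tool and review by a human. A minimal sketch of that split follows. The risk categories, actions, and the `propose_response` helper are hypothetical names chosen for illustration; the slides do not specify a rule format.

```python
from dataclasses import dataclass

# Hypothetical rule tables, for illustration only.
# Categories listed here are handled without waiting for a human.
AUTOMATIC = {"broken_link": "notify site maintainer"}

# Categories listed here yield recommendations that a human selects from.
RECOMMENDED = {
    "server_outdated": ["notify site maintainer", "increase monitoring frequency"],
    "content_loss": ["capture site", "notify site maintainer"],
}


@dataclass
class Proposal:
    risk: str
    actions: list
    automatic: bool


def propose_response(risk):
    """Tool step: propose a response for a detected risk category."""
    if risk in AUTOMATIC:
        return Proposal(risk, [AUTOMATIC[risk]], automatic=True)
    # Unknown categories go to human review with no recommendation.
    return Proposal(risk, RECOMMENDED.get(risk, []), automatic=False)


if __name__ == "__main__":
    for category in ("broken_link", "content_loss", "unlisted_risk"):
        print(propose_response(category))
```

Keeping the automatic and recommended tables separate makes it easy for the human reviewer to see, and change, which risk categories the tool may act on by itself.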

16
Contextual Layers

17
Server-level Monitoring
  • Potential multi-site impact
  • Server vulnerabilities put site content at risk
  • deletion or modification
  • Patches and new versions of Microsoft IIS and
    Apache server released frequently
  • Apache HTTP Server 1.3 security updates:
  • to version 1.3.26 on June 18, 2002
  • to version 1.3.27 on October 3, 2002
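
One way to sketch the server-level check described above is to compare the version a server reports in its HTTP `Server` header against the latest known release. The helper names and the default `latest` value below are assumptions for illustration. Servers can also suppress or alter this header, so a result like this is an indicator rather than proof.

```python
import re


def apache_version(server_header):
    """Return the Apache version in a Server header as a tuple of ints, or None."""
    match = re.search(r"Apache/(\d+(?:\.\d+)*)", server_header)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def is_behind(server_header, latest=(1, 3, 27)):
    """True if the header reports an Apache version older than `latest`.

    `latest` defaults to 1.3.27, the October 3, 2002 release named on the slide.
    Headers that do not mention Apache return False (nothing to compare).
    """
    version = apache_version(server_header)
    return version is not None and version < latest


if __name__ == "__main__":
    print(is_behind("Apache/1.3.26 (Unix) mod_ssl/2.8.10"))
    print(is_behind("Apache/1.3.27 (Unix)"))
```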

18
Server-level Monitoring

19
VRC Toolkit
  • Identify tools for each stage (adopt, adapt,
    define, devise)
  • Leverage existing tools: apply to longevity
  • Analyze steps - automated and manual
  • Formalize protocol
  • Provide a framework to map existing, plug gaps
    with developments

20
VRC Toolkit
  • Development steps
  • extensive literature review
  • development of tool categories
  • definition of categories and test protocols
  • survey existing tools for evaluation
  • select representative for testing
  • highlight findings in category summaries

21
Web Crawling
  • traversing Web sites via links
  • a capability common to most tools, but with
    different purposes and results
  • the VRC toolkit needs more than just Web crawlers
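
Traversing a site via links, limited to pages within a URL-defined boundary, can be sketched as a breadth-first walk. This is an illustrative sketch, not the VRC toolkit's crawler: `crawl`, `LinkParser`, and the `fetch` callback are hypothetical names, and the example pages are held in memory so the block runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, boundary, fetch):
    """Breadth-first traversal of pages whose URL starts with `boundary`.

    `fetch` is a caller-supplied function that returns HTML text for a URL,
    or None if the page cannot be retrieved.
    """
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before comparing.
            target = urldefrag(urljoin(url, href)).url
            if target.startswith(boundary) and target not in seen:
                seen.add(target)
                queue.append(target)
    return visited


if __name__ == "__main__":
    # Hypothetical in-memory site standing in for real HTTP requests.
    pages = {
        "http://example.org/site/": '<a href="a.html">A</a> <a href="http://other.example/">out</a>',
        "http://example.org/site/a.html": '<a href="./">home</a> <a href="b.html#top">B</a>',
        "http://example.org/site/b.html": "no links here",
    }
    print(crawl("http://example.org/site/", "http://example.org/site/", pages.get))
```

Passing `fetch` in as a parameter keeps the traversal logic separate from retrieval, so the same walk could feed a link checker, a monitor, or a capture tool.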

22
Tool Categories
  • Link checkers
  • Web site monitors
  • Web crawlers
  • Site management
  • Change Management
  • Site Mapping (includes visualization)

26
OAIS Issues
  • Pre-Ingest: Selection options
  • Ingest: Capture vs. monitoring
  • Targets, level and frequency
  • Archival Storage: Formats
  • Access: Site(s) vs. Page(s)
  • AIP: Metadata issues

27
Management Issues
  • frequency of capture determined by:
  • nature of sites/pages
  • events: technological, organizational
  • resources
  • well-informed crawling
  • valuable vs. archival

28
Mandate
  • to fully document the site by capturing all
    changes to the pages/sites
  • to capture significant changes to pages/sites
  • to record periodic versions of the site
  • to capture one-time copy of pages/sites
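
The first two mandates above depend on detecting change between two retrieved versions of a page. A minimal sketch follows, assuming page content is compared as text. The function names, the mandate labels, and the 0.9 similarity threshold are hypothetical; the slides do not define what counts as a "significant" change.

```python
import difflib
import hashlib


def has_changed(old, new):
    """True if the two versions differ at all (compared by SHA-256 digest)."""
    old_digest = hashlib.sha256(old.encode("utf-8")).digest()
    new_digest = hashlib.sha256(new.encode("utf-8")).digest()
    return old_digest != new_digest


def is_significant(old, new, threshold=0.9):
    """True if similarity falls below `threshold` (an assumed cut-off)."""
    similarity = difflib.SequenceMatcher(None, old, new).ratio()
    return similarity < threshold


def should_capture(mandate, old, new):
    """Decide whether a new version warrants capture under a given mandate.

    Periodic and one-time capture are driven by schedule rather than content,
    so they are not handled here.
    """
    if mandate == "all_changes":
        return has_changed(old, new)
    if mandate == "significant_changes":
        return has_changed(old, new) and is_significant(old, new)
    raise ValueError("mandate is not content-driven: " + mandate)


if __name__ == "__main__":
    before = "hello world"
    after = "hello world!"
    print(should_capture("all_changes", before, after))
    print(should_capture("significant_changes", before, after))
```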

29
Current Activities
  • VRC Preservation Risk Management Program
  • Map stages to tool requirements
  • Apply to potential organizational scenarios
  • Enable risk/response scenario development
  • Toolkit
  • Revise and populate tool inventory
  • VRC Control Site

30
Future Projects
  • Develop an approach for building a human sexuality
    collection, capturing Web blogs and other
    Internet communications
  • State Government Web site case study
  • Demonstrators for toolkit scenarios

31
For Discussion
  • What would the VRC approach have to address to
    be of interest, value, and/or potential impact
    for archivists and records managers?