Web Storage: Permanence of Information

1
Web Storage: Permanence of Information
  • By Laura Milodin
  • IST 497

2
Overview
  • Introduction
  • Problems
  • Solutions
  • Some alternatives
  • Conclusion

3
Introduction
  • Through publication we preserve and transmit our
    knowledge and culture.
  • Electronic media offer a clear advantage for
    transmitting knowledge.
  • However, preserving our knowledge poses some new
    challenges.

4
The Problem
  • At some time in the future, we may want some
    information that we have today.
  • We want that information to be efficiently
    available to us in the future.

5
Web permanence
  • Narrow sense -- we have control of a particular
    piece of web content which we want to remain
    accessible to Web users.
  • Broader sense -- we want to save everything of
    value on the Web in order to preserve our
    culture.
  • This presentation looks at the problem in the
    narrow sense of Web permanence.

6
Actions we might have to take
  • 1. We have to protect against unexpected disaster.
  • 2. We have to protect against known, slower-acting
    deterioration.
  • 3. We have to keep the content accessible to
    users.
  • 4. We have to maintain, not distort, the
    importance of the stored content.

7
Importance of the content
  • Whatever value society places on something should
    not be affected by how we store it.
  • We might however store something differently
    because of its importance.
  • The storage process itself should do nothing that
    would favor recovery of less important content
    over more important content.

8
Distortion of importance
  • Hardware needed for access may become less
    available over time, thus favoring content that
    is read by newer equipment.
  • Some content might be easier to access than other
    more important content.
  • Accidental discovery of some content might be
    much more likely than accidental discovery of
    other, more important content.

9
Solutions
  • Intermemory is a noncommercial, decentralized
    concept using a scheme similar to barter.
  • An alternative to commercial archives or archival
    services offered by large libraries.

10
What is Intermemory barter?
  • There is no central organization, rather each
    subscriber donates a certain amount of storage
    space for a limited time and in return receives
    the right to archive a much smaller amount.
  • Hard numbers are difficult to calculate due to
    many parameters.

11
Intermemory basics
  • The Intermemory is a very large, distributed,
    self-organizing memory consisting of the combined
    memory of all subscribers that is addressed by a
    single addressing scheme.
  • The addresses correspond to blocks of N words
    each of w bits each.
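
The block layout can be pictured with a small sketch. It is only an illustration: the function name, the zero padding, the big-endian word order, and the restriction to whole-byte word sizes are assumptions, not details from the Intermemory design.

```python
def split_into_blocks(data: bytes, n_words: int = 4, word_bits: int = 16) -> dict[int, list[int]]:
    """Map consecutive addresses to blocks of n_words words of word_bits bits."""
    if word_bits % 8 != 0:
        raise ValueError("this sketch only handles whole-byte word sizes")
    word_bytes = word_bits // 8
    block_bytes = n_words * word_bytes
    # Assumption: zero-pad so the data fills a whole number of blocks.
    padded = data + b"\x00" * (-len(data) % block_bytes)
    memory = {}
    for address, start in enumerate(range(0, len(padded), block_bytes)):
        chunk = padded[start:start + block_bytes]
        # Each word is read as an unsigned big-endian integer.
        memory[address] = [
            int.from_bytes(chunk[i:i + word_bytes], "big")
            for i in range(0, block_bytes, word_bytes)
        ]
    return memory
```

With N = 4 and w = 16, ten bytes of input occupy two addresses, and the second block is padded with zero words.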

12
How It Works
  • Redundancy and dispersal provide the protection
    from unexpected disaster.
  • When a subscriber leaves, its data is rebuilt by
    a new subscriber, which also keeps the equipment
    of the whole system up to date. If one processor
    fails, its data is reconstructed automatically
    by a new or existing subscriber.

13
Redundancy and Dispersal
  • A particular data block exists in its entirety
    only at a single processor.
  • The address of this block is used to retrieve the
    data under normal circumstances.
  • Portions of the data in this block are also
    dispersed among many other processors; they are
    used to rebuild the original data in case of
    failure.

14
Space-optimal dispersal
  • The mathematics behind the dispersal algorithm
    involves polynomial evaluation and interpolation.
    Every block of N words is associated with a
    polynomial of degree at most N-1. The values
    this polynomial takes at N distinct points
    uniquely identify it.

15
Retrieval
  • We calculate more points than we need, say 2N
    points.
  • We disperse these 2N points among many
    processors.
  • If we lose the original word block, we only need
    to recover any N of the 2N points to reconstruct
    the polynomial and find the original block that
    corresponds to it.
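
The dispersal and retrieval steps can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Intermemory implementation: it treats the N words as polynomial coefficients, works modulo the prime 65537 (chosen here only because it fits 16-bit words), and evaluates at x = 1..2N. The function names are hypothetical, and the real system may use a different field and different evaluation points.

```python
P = 65537  # assumption: a prime modulus larger than any 16-bit word


def evaluate(coeffs, x):
    """Evaluate the polynomial (coefficients in ascending order) at x, mod P."""
    result = 0
    for c in reversed(coeffs):  # Horner's rule, highest coefficient first
        result = (result * x + c) % P
    return result


def disperse(block, n_points):
    """Treat the block's words as coefficients and evaluate at x = 1..n_points."""
    return [(x, evaluate(block, x)) for x in range(1, n_points + 1)]


def recover(points, n_words):
    """Rebuild the n_words coefficients from any n_words distinct points."""
    if len(points) < n_words:
        raise ValueError("not enough points to recover the block")
    points = points[:n_words]
    coeffs = [0] * n_words
    for i, (xi, yi) in enumerate(points):
        basis = [1]  # running product of (x - xj) for j != i
        denom = 1    # the same product evaluated at x = xi
        for j, (xj, _) in enumerate(points):
            if j == i:
                continue
            shifted = [0] * (len(basis) + 1)
            for k, b in enumerate(basis):
                shifted[k] = (shifted[k] - xj * b) % P
                shifted[k + 1] = (shifted[k + 1] + b) % P
            basis = shifted
            denom = denom * (xi - xj) % P
        # Lagrange term: yi * basis(x) / basis(xi), using a modular inverse.
        scale = yi * pow(denom, -1, P) % P
        for k, b in enumerate(basis):
            coeffs[k] = (coeffs[k] + scale * b) % P
    return coeffs
```

For a block of N = 4 words, `disperse(block, 8)` yields the 2N points, and `recover` applied to any four of them returns the original words.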

16
Space requirement
  • At the first level of replication, the dispersed
    data takes twice the original block size. At the
    second level of replication, it takes four times
    the original block size.
  • The total space requirement is 9 times the
    original.
17
Degree of dispersal
  • The degree of maximum dispersal is a key variable
    here.
  • Dispersal on the scale proposed in the previous
    slide would not be needed for a model in which
    processor failures were independent.
  • This model assumes the possibility of software
    bugs, viruses, and overt adversarial action.

18
Uniform Resource Locators
  • Another alternative is the URL.
  • The goal here is not to physically maintain
    hardware, or a readable copy of content on that
    hardware, but rather to maintain a link.
  • We assume that a readable copy exists; we want
    to be able to link to it even when its physical
    location or file structure changes.

19
Uniform Resource Name, URN
  • The general solution to this problem is the
    development of Uniform Resource Names or URNs.
  • A parallel situation exists with books in a
    library. We could attempt to describe a book as
    on the fourth floor, third aisle from west end,
    top shelf, second from end. Or we could give the
    book a name and maintain some way of resolving
    the name into its location.

20
Persistent URL (PURL)
  • PURLs are one possible solution to the problem
    of developing URNs.
  • PURLs look and function much like URLs, but
    instead of pointing directly to the location of
    an Internet resource, a PURL points to an
    intermediate resolution service.
  • Ex. PURL: http://purl.oclc.org/NET/frankdill/index.html
  • URL: http://members.aol.com/aiaio/index.html

21
PURL Server
  • The process of resolution now involves one extra
    step in which a PURL server associates the PURL
    with a unique URL which is returned to the
    client.
  • The extra step is an HTTP redirect.
  • The key to the PURL server is indirection, not
    redirection: naming a resource separates its
    identification from its location.
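
The resolution step can be sketched as a tiny lookup service. This is a hedged illustration, not the OCLC PURL server: the table, the `resolve` and `serve` names, the 302 status code, and the 404 for unknown names are assumptions. The single table entry reuses the example PURL and URL from the previous slide.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical PURL table: persistent path -> current URL.
PURL_TABLE = {
    "/NET/frankdill/index.html": "http://members.aol.com/aiaio/index.html",
}


def resolve(purl_path):
    """Return (status, headers) for a PURL lookup."""
    target = PURL_TABLE.get(purl_path)
    if target is None:
        return 404, {}  # unknown PURL: nothing to redirect to
    return 302, {"Location": target}  # the client follows this redirect


class PurlHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The extra step: answer with a redirect instead of the content.
        status, headers = resolve(self.path)
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()


def serve(port=8080):
    """Run the resolver locally; the host and port are arbitrary choices."""
    HTTPServer(("localhost", port), PurlHandler).serve_forever()
```

Calling `serve()` starts the resolver; a request for `/NET/frankdill/index.html` is then answered with a redirect to the current URL, and the content itself is never stored on the PURL server.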

22
PURL database
  • PURLs must be maintained.
  • A change to the PURL database is required if the
    owner of a file moves that file.
  • If the file is completely removed, of course, the
    PURL is as useless as the URL.
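
The maintenance burden can be made concrete with a small sketch. The class, its method names, and the example.org addresses are hypothetical; the point is only that the name stays fixed while the owner updates the location behind it.

```python
class PurlDatabase:
    """Hypothetical table of persistent names and their current locations."""

    def __init__(self):
        self._targets = {}

    def register(self, purl, url):
        """Create a new persistent name pointing at url."""
        self._targets[purl] = url

    def move(self, purl, new_url):
        """Record that the file behind an existing PURL has moved."""
        if purl not in self._targets:
            raise KeyError(f"unknown PURL: {purl}")
        self._targets[purl] = new_url

    def remove(self, purl):
        """Drop the entry; the PURL no longer resolves to anything."""
        self._targets.pop(purl, None)

    def resolve(self, purl):
        """Return the current URL, or None if the PURL is unknown."""
        return self._targets.get(purl)


db = PurlDatabase()
db.register("/NET/example/paper", "http://example.org/old/paper.html")
db.move("/NET/example/paper", "http://example.org/new/paper.html")
current = db.resolve("/NET/example/paper")  # the name is unchanged; only the target moved
```

If the owner forgets the `move` step, the PURL keeps pointing at the old location and fails exactly as a plain URL would.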

23
Some alternatives
  • Meanwhile, there is a website that archives web
    pages posted on the World Wide Web.
  • It is called the Internet Archive.
  • Its URL is http://www.archive.org.

24
Conclusion
  • Introduction of web storage
  • Problems
  • Solutions
  • Some alternatives for right now

25
References
  • Towards an Archival Intermemory, by Goldberg &
    Yianilos
    http://citeseer.nj.nec.com/goldberg98towards.html
  • Introduction to Persistent Uniform Resource
    Locators, by Shafer, Weibel, Jul & Fausey
    http://purl.oclc.org/docs/inet96.html
  • Web Storage: The Permanence of Information, by
    F. Dill
    http://external.nj.nec.com/homepages/krovetz/timetable.html

27
Any Questions?