Web Storage: Permanence of Information

Provided by: ist6

1
Web Storage: Permanence of Information

By Laura Milodin
IST 497

2
Overview

Introduction
Problems
Solutions
Some alternatives
Conclusion

3
Introduction

Through publication we preserve and transmit our
knowledge and culture.
Electronic media offer a clear advantage for
transmitting knowledge.
However, preserving our knowledge poses some new
challenges.

4
The Problem

At some time in the future, we may want some
information that we have today.
We want that information to be efficiently
available to us in the future.

5
Web permanence

Narrow sense -- we have control of a particular
piece of web content which we want to remain
accessible to Web users.
Broader sense -- we want to save everything of
value on the Web in order to preserve our
culture.
This presentation looks at the problem in the
narrow sense of Web permanence.

6
Actions we might have to take

1. We have to protect the content from unexpected
disaster.
2. We have to protect it from known, slower-acting
deterioration.
3. We have to keep the content accessible to
users.
4. We have to maintain, not distort, the importance
of the content stored.

7
Importance of the content

Whatever value society places on something should
not be affected by how we store it.
We might, however, store something differently
because of its importance.
The storage process itself should do nothing that
would favor the recovery of less important
content over more important content.

8
Distortion of importance

Hardware needed for access may become less
available over time, thus favoring content that
is read by newer equipment.
Some content might be easier to access than
other, more important content.
Accidental discovery of some content might be
much more likely than discovery of other, more
important content.

9
Solutions

Intermemory is a noncommercial, decentralized
concept using a scheme similar to barter.
It is an alternative to commercial archives or
archival services offered by large libraries.

10
What is Intermemory barter?

There is no central organization; rather, each
subscriber donates a certain amount of storage
space for a limited time and in return receives
the right to archive a much smaller amount.
Hard numbers are difficult to calculate because
they depend on many parameters.

11
Intermemory basics

The Intermemory is a very large, distributed,
self-organizing memory consisting of the combined
memory of all subscribers that is addressed by a
single addressing scheme.
The addresses correspond to blocks of N words
of w bits each.

12
How It Works

Redundancy and dispersal provide the protection
from unexpected disaster.
When a new subscriber rebuilds the data of a
discontinued subscriber, the equipment of the
whole system is updated along the way. If
one processor fails, its data is reconstructed
automatically by a new or another existing
subscriber.

13
Redundancy and Dispersal

A particular data block exists in its entirety
only at a single processor.
The address of this block is used to retrieve the
data under normal circumstances.
Portions of the data in this block are also
dispersed among many other processors; these
portions are used to rebuild the original data
in case of failure.

14
Space-optimal dispersal

The mathematics behind the dispersal algorithm
involves polynomial evaluation and interpolation.
It is based on the idea of associating every
block of N words with a polynomial of degree N-1.
The values assumed by this polynomial at N
distinct points uniquely identify it.

15
Retrieval

We calculate more points than we need, say 2N
points.
We disperse these 2N points among many
processors.
If we lose the original word block, we only need
to recover N of the possible 2N points to
recover the polynomial and find the original
block that corresponds to it.
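
The evaluate-and-interpolate idea on the last two slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Intermemory implementation: the arithmetic is done modulo a prime P chosen here for convenience, the N words are taken to be the polynomial's values at x = 0..N-1, and the names interpolate_at, disperse, and recover are hypothetical.

```python
# Assumption: arithmetic modulo a prime; every word must be smaller than P.
P = 2**31 - 1

def interpolate_at(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    that passes through the given (xi, yi) points, modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def disperse(block):
    """Turn a block of N words into 2N extra points of its polynomial.
    Assumption: word k is the polynomial's value at x = k."""
    n = len(block)
    base = list(enumerate(block))
    return [(x, interpolate_at(base, x)) for x in range(n, 3 * n)]

def recover(points, n):
    """Rebuild the original N words from any N of the dispersed points."""
    if len(points) < n:
        raise ValueError("need at least N points to recover the block")
    subset = points[:n]
    return [interpolate_at(subset, k) for k in range(n)]

block = [10, 20, 30, 40]      # N = 4 words
points = disperse(block)      # 2N = 8 points, to be spread over processors
restored = recover(points[1::2], len(block))  # any 4 of the 8 suffice
```

In this sketch, losing any half of the dispersed points still leaves enough to rebuild the block.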

16
Space requirement

Looking at space requirements: at the first level
of replication, dispersal takes twice the
original block size. At the second level of
replication, it takes four times the original
block size.
The total space requirement is 9 times the original.

17
Degree of dispersal

The degree of maximum dispersal is a key variable
here.
Dispersal on the scale proposed in the previous
slide would not be needed for a model where
processor failures were independent.
This model assumes the possibility of software
bugs, viruses, and overt adversarial action.
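
To see why independent failures alone would not call for wide dispersal, here is a rough sketch of the loss probability for one block. It assumes each of the 2N dispersed points sits on a different processor and that each processor fails independently with the same probability; the function name and the parameter values are hypothetical.

```python
from math import comb

def loss_probability(n, p_fail):
    """Probability that fewer than n of the 2n dispersed points survive,
    when each holder fails independently with probability p_fail."""
    total = 2 * n
    return sum(
        comb(total, k) * (1 - p_fail) ** k * p_fail ** (total - k)
        for k in range(n)  # k = number of surviving points, too few to rebuild
    )

# Illustrative values only: compare narrow and wider dispersal.
narrow = loss_probability(1, 0.1)
wide = loss_probability(4, 0.1)
```

Under the independence assumption the loss probability falls quickly as N grows. Correlated failures such as bugs, viruses, or attacks break that assumption, which is the case the design guards against.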

18
Uniform Resource Locators

Another alternative is the URL.
The goal here is not to physically maintain
hardware, or a readable copy of content on that
hardware, but rather to maintain a link.
We assume that a readable copy exists; we
want to be able to link to it even when its
physical location or file structure changes.

19
Uniform Resource Name, URN

The general solution to this problem is the
development of Uniform Resource Names or URNs.
A parallel situation exists with books in a
library. We could attempt to describe a book as
being on the fourth floor, third aisle from the
west end, top shelf, second from the end. Or we
could give the book a name and maintain some way
of resolving the name into its location.

20
Persistent URL (PURL)

PURLs are one possible solution to the problem
of developing URNs.
PURLs look and function much like URLs, but
instead of pointing directly to the location of
an Internet resource, a PURL points to an
intermediate resolution service.
Ex. PURL: http://purl.oclc.org/NET/frankdill/index.html
URL: http://members.aol.com/aiaio/index.html

21
PURL Server

The process of resolution now involves one extra
step in which a PURL server associates the PURL
with a unique URL which is returned to the
client.
The extra step is an HTTP redirect.
The key to a PURL server is indirection, not
redirection: naming something separates its
identification from its location.

22
PURL database

PURLs must be maintained.
A change to the PURL database is required if the
owner of a file moves that file.
If the file is completely removed, of course, the
PURL is as useless as the URL.
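
The resolution step and the maintenance burden can be sketched as a small lookup table plus a redirect. This is a minimal illustration, not the OCLC PURL server: the table contents reuse the example from the PURL slide, the 302 status code is an assumption (the slides only say "HTTP redirect"), and PURL_TABLE, resolve, move, and PurlHandler are hypothetical names.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical PURL database: persistent path -> current URL.
PURL_TABLE = {
    "/NET/frankdill/index.html": "http://members.aol.com/aiaio/index.html",
}

def resolve(path):
    """Return (status, location) for a PURL path.
    Assumption: a 302 redirect for known PURLs, 404 otherwise."""
    url = PURL_TABLE.get(path)
    return (302, url) if url else (404, None)

def move(path, new_url):
    """Maintenance step: record that the file behind a PURL has moved."""
    if path not in PURL_TABLE:
        raise KeyError(path)
    PURL_TABLE[path] = new_url

class PurlHandler(BaseHTTPRequestHandler):
    """Answers a GET for a PURL with a redirect to the current URL."""
    def do_GET(self):
        status, location = resolve(self.path)
        self.send_response(status)
        if location:
            self.send_header("Location", location)
        self.end_headers()
```

The handler could be served with http.server.HTTPServer. The point of the sketch is that only the table entry changes when a file moves, while the PURL that users hold stays the same.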

23
Some alternatives

Meanwhile, there is a website that archives
web pages posted on the World Wide Web.
It is called the Internet Archive.
Its URL is http://www.archive.org.

24
Conclusion

Introduction of web storage
Problems
Solutions
Some alternatives for right now

25
References

Towards an Archival Intermemory
by Goldberg & Yianilos
http://citeseer.nj.nec.com/goldberg98towards.html
Introduction to Persistent Uniform Resource
Locators
by Shafer, Weibel, Jul & Fausey
http://purl.oclc.org/docs/inet96.html
Web Storage: The Permanence of Information
by F. Dill
http://external.nj.nec.com/homepages/krovetz/timetable.html

27
Any Questions?