Title: Awareness Services for Digital Libraries
1Awareness Services for Digital Libraries
- Arturo Crespo
- Hector Garcia-Molina
- Stanford University
2Awareness Services for Digital Libraries
- Digital library repository
- Data store
- Other components
- Indexers
- Name manager
- Replica manager
- Etc
3Data Stores and Clients
- Data stores: DB Tech Reports, AI Tech Reports, HCI Tech Reports
- Clients: DB Indexer, CS Indexer
4Data Store Services
- Object access
- Via a handle
- Object awareness
- Clients must be aware of changes at the store
5A Case Study: CS-TR and SIFT
- SIFT: a selective dissemination service
- CS-TR: a digital library of technical reports from about 50 universities
- Awareness based on timestamps
- Problems
- File system timestamps
- Application timestamps
- Deletions
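The deletion problem can be illustrated with a minimal sketch. It assumes a toy store modelled as a dictionary from handle to (modification time, content); the function name `changed_since` and the data are hypothetical and do not come from SIFT or CS-TR.

```python
# Hypothetical model: the store maps handle -> (modification_time, content).
def changed_since(store, last_poll):
    """Return handles whose timestamp is newer than the client's last poll."""
    return sorted(h for h, (mtime, _) in store.items() if mtime > last_poll)

store = {"tr-1": (10, "v1"), "tr-2": (12, "v1")}
client_view = set(store)           # the client indexed everything at time 15
last_poll = 15

store["tr-1"] = (20, "v2")         # an update is stamped with a new time
del store["tr-2"]                  # a deletion leaves no timestamp behind

updates = changed_since(store, last_poll)
# Only comparing against a full listing of the store reveals the deletion.
stale = sorted(client_view - set(store))
```

In this sketch the timestamp query reports the update to `tr-1` but says nothing about `tr-2`, which the client would keep indexing unless it also compared full listings.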
6Contributions
- Survey of the spectrum of awareness options
- Advantages and disadvantages of each one
- All mechanisms can be captured by a single algorithm: the UNI-AWARE algorithm
- Enhancements for signature-based schemes
- Reduced computation
- Reduced communication costs
7Related Work
- Database replica maintenance
- Remote file comparison
- Deployment of programs over the network
8The Client-store Design Space
- Push vs. Pull
- Stateful versus stateless stores and clients
- Cognizant clients and sources
- Number of clients per data store
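As a rough illustration of the push and pull options, the sketch below models a toy store that supports both. The class `DataStore`, its `subscribers` list, and the version counter are assumptions made for this example, not an interface from the talk.

```python
class DataStore:
    """Toy store; keeping `subscribers` makes it stateful in the push model."""

    def __init__(self):
        self.objects = {}          # handle -> (version, content)
        self.version = 0
        self.subscribers = []      # callbacks registered by clients (push)

    def put(self, handle, content):
        self.version += 1
        self.objects[handle] = (self.version, content)
        for notify in self.subscribers:    # push: the store initiates contact
            notify(handle)

    def changes_since(self, version):      # pull: the client initiates contact
        return sorted(h for h, (v, _) in self.objects.items() if v > version)

store = DataStore()
pushed = []
store.subscribers.append(pushed.append)    # a push client registers a callback
store.put("tr-1", "a")
store.put("tr-2", "b")
pulled = store.changes_since(1)            # a pull client that last saw version 1
```

With push, the store must remember its clients; with pull, each client must remember where it left off, which is one way the stateful versus stateless choice shows up.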
9The UNI-AWARE Algorithm
- A unified algorithm that covers known schemes
- Snapshot algorithm
- Timestamps and versions
- Logs
- Triggers
- Signatures
- Algorithm is tailored to a specific scheme
through the definition of custom functions
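The idea of tailoring one algorithm through custom functions can be sketched as follows. This is not the authors' UNI-AWARE algorithm, whose details are not given on the slides; it is a minimal comparison loop in which a hypothetical `summarize` function selects the scheme (snapshot, timestamp, or signature).

```python
import zlib

def detect_changes(store, client_state, summarize):
    """Return (changed_or_new, deleted, new_state) by comparing summaries."""
    current = {h: summarize(obj) for h, obj in store.items()}
    changed = sorted(h for h, s in current.items() if client_state.get(h) != s)
    deleted = sorted(h for h in client_state if h not in current)
    return changed, deleted, current

# Hypothetical custom functions, one per scheme.
def snapshot(obj):
    return obj["content"]                        # compare full copies

def timestamp(obj):
    return obj["mtime"]                          # compare timestamps or versions

def signature(obj):
    return zlib.crc32(obj["content"].encode())   # compare signatures

store = {"a": {"content": "x1", "mtime": 1}, "b": {"content": "y1", "mtime": 1}}
_, _, state = detect_changes(store, {}, signature)    # initial synchronization
store["a"] = {"content": "x2", "mtime": 2}
del store["b"]
changed, deleted, state = detect_changes(store, state, signature)
```

Swapping `signature` for `timestamp` or `snapshot` changes what is stored and compared, while the surrounding loop stays the same.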
10UNI-AWARE Signature Algorithm
- Signature: a token associated with each document that has a high probability of being unique and changes when the content of the object changes
- Example: CRC, checksums
- Advantages
- Robust, as it does not require metadata maintenance
- Easy to manage consistently when the store fails or an object migrates
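A small sketch of how such signatures might be computed, using Python's standard `zlib.crc32` for the CRC case. The SHA-256 variant is an assumption added for comparison and is not mentioned on the slides; the function names are illustrative.

```python
import hashlib
import zlib

def crc_signature(content: bytes) -> int:
    """32-bit CRC of the content (compact, but collisions are possible)."""
    return zlib.crc32(content)

def digest_signature(content: bytes) -> str:
    """SHA-256 hex digest (larger token, far lower collision probability)."""
    return hashlib.sha256(content).hexdigest()

original = b"Technical report, revision 1"
revised = b"Technical report, revision 2"
migrated_copy = bytes(original)    # the same bytes held by another store

same_after_migration = crc_signature(migrated_copy) == crc_signature(original)
changed_after_edit = crc_signature(revised) != crc_signature(original)
```

Because the signature is derived only from the content, it can be recomputed at any time from the object itself, which is what makes it independent of separately maintained metadata.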
11UNI-AWARE Signature Algorithm
- Diagram: data store and client, each holding documents and signatures
- All signatures transferred
- Request documents
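The exchange on this slide can be sketched as a flat comparison of signatures. The sketch assumes the store sends its full signature list to the client, which then requests the documents whose signatures differ; the function names and the dictionary representation are hypothetical.

```python
import zlib

def signatures(docs):
    """Map each handle to the CRC of its content."""
    return {h: zlib.crc32(c.encode()) for h, c in docs.items()}

def client_sync(store_docs, client_docs, client_sigs):
    """Bring client_docs up to date; return (requested handles, new signatures)."""
    store_sigs = signatures(store_docs)        # step 1: all signatures transferred
    wanted = sorted(h for h, s in store_sigs.items() if client_sigs.get(h) != s)
    for h in wanted:                           # step 2: request documents
        client_docs[h] = store_docs[h]
    for h in set(client_sigs) - set(store_sigs):
        client_docs.pop(h, None)               # no signature at the store: deleted
    return wanted, store_sigs

store_docs = {"a": "v2", "b": "v1", "c": "v1"}
client_docs = {"a": "v1", "b": "v1", "d": "v1"}
wanted, client_sigs = client_sync(store_docs, client_docs, signatures(client_docs))
```

Every synchronization transfers one signature per document, even when nothing has changed, which is the cost a hierarchical scheme tries to avoid.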
12DIST-UNI-AWARE Algorithm
- Objective: reduce the amount of data exchanged between data store and clients
- DIST-UNI-AWARE
- Unified algorithm that can be tailored to different schemes
- Hierarchical signatures
- Hierarchical timestamps
13DIST-UNI-AWARE
- Diagram: data store and client, each holding documents and signatures
- Signatures of buckets transferred
- Request more signatures
- Request documents
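A minimal sketch of a bucket-based exchange, under several assumptions not stated on the slides: both sides hold the same ordered set of handles, a bucket signature is a CRC over its members' signatures, and each mismatching bucket is split in two. The names `bucket_sig`, `split_bucket`, and `find_changed` are hypothetical.

```python
import zlib

def bucket_sig(sigs, lo, hi):
    """Signature of a bucket: CRC over the member signatures (assumed scheme)."""
    return zlib.crc32(b"".join(s.to_bytes(4, "big") for s in sigs[lo:hi]))

def split_bucket(lo, hi, k=2):
    """Split positions lo..hi-1 into at most k contiguous buckets."""
    step = -(-(hi - lo) // k)                  # ceiling division
    return [(i, min(i + step, hi)) for i in range(lo, hi, step)]

def find_changed(store_sigs, client_sigs):
    """Return (positions to request, rounds); lists have equal length >= 1."""
    pending, changed, rounds = split_bucket(0, len(store_sigs)), [], 0
    while pending:
        rounds += 1                            # signatures of `pending` are sent
        nxt = []
        for lo, hi in pending:
            if bucket_sig(store_sigs, lo, hi) == bucket_sig(client_sigs, lo, hi):
                continue                       # bucket unchanged, nothing to ask
            if hi - lo == 1:
                changed.append(lo)             # request this document
            else:
                nxt.extend(split_bucket(lo, hi))   # request more signatures
        pending = nxt
    return changed, rounds

client = [zlib.crc32(f"doc {i} v1".encode()) for i in range(8)]
store = list(client)
store[5] = zlib.crc32(b"doc 5 v2")             # one document changed at the store
result = find_changed(store, client)
```

With eight documents and one change, the sketch narrows the search to a single document in three rounds, and an unchanged collection is confirmed in one round.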
14Advantages of Signature Algorithms
- Support the push and pull models
- No need for reliable storage of additional data structures: if signatures are lost or corrupted, they can be recomputed
- Efficient in usage of network resources, clients, and data stores
- Scales well in number of clients and documents
15DIST-UNI-AWARE Performance
- Performance depends on number of changes
- No changes: only one round is required
- Single change: log2(n) rounds
- 2 changes: log2(n) rounds, but twice as much data
- Eventually, DIST-UNI-AWARE starts behaving worse
than UNI-AWARE
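These trends can be reproduced with a simple cost model. The sketch assumes binary splitting by default, bucket signatures that never collide, and a first round that already sends the two top-level buckets; `hierarchical_cost` is a hypothetical helper, and the flat scheme is taken to cost one round and n signatures.

```python
def split_bucket(lo, hi, k):
    """Split positions lo..hi-1 into at most k contiguous buckets."""
    step = -(-(hi - lo) // k)                  # ceiling division
    return [(i, min(i + step, hi)) for i in range(lo, hi, step)]

def hierarchical_cost(n, changed, k=2):
    """Return (rounds, bucket signatures sent) for n >= 1 objects."""
    pending, rounds, sent = split_bucket(0, n, k), 0, 0
    while pending:
        rounds += 1
        sent += len(pending)
        nxt = []
        for lo, hi in pending:
            dirty = any(lo <= c < hi for c in changed)
            if dirty and hi - lo > 1:          # mismatch: descend one level
                nxt.extend(split_bucket(lo, hi, k))
        pending = nxt
    return rounds, sent

n = 1024
no_change = hierarchical_cost(n, [])           # one round
one_change = hierarchical_cost(n, [0])         # log2(1024) = 10 rounds
two_changes = hierarchical_cost(n, [0, 1023])  # same rounds, about twice the data
all_changed = hierarchical_cost(n, range(n))   # more signatures than the flat n
```

In this model, two widely separated changes cost 38 signatures against 20 for one change, and when every object has changed the hierarchy sends 2046 signatures over 10 rounds, compared with 1024 signatures in a single round for the flat scheme.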
16DIST-UNI-AWARE Enhancements
- Increase group split factor
- Client sends additional information at split time
- Clustering of changed objects
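The effect of a larger split factor can be estimated with a rough calculation. The sketch assumes a single changed object, k bucket signatures sent per round, and no collisions; the function name and the cost model are illustrative assumptions, not measurements from the talk.

```python
def rounds_and_signatures(n, k):
    """Single change among n objects with split factor k: (rounds, signatures)."""
    rounds, size = 0, n
    while size > 1:
        size = -(-size // k)       # each round shrinks the suspect bucket k-fold
        rounds += 1
    return rounds, rounds * k      # assumed: k signatures are sent per round

binary = rounds_and_signatures(4096, 2)
wide = rounds_and_signatures(4096, 16)
wider = rounds_and_signatures(4096, 64)
```

Under these assumptions a larger split factor trades fewer rounds for more signatures per round: 12 rounds and 24 signatures with k = 2, against 3 rounds and 48 signatures with k = 16.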
17Conclusions
- Awareness mechanism for digital libraries
- Separation of storage functionality and other services
- Awareness schemes must be resilient to computer environment changes and bugs
- UNI-AWARE and DIST-UNI-AWARE