Title: Stanford InterLib Technologies
1Stanford InterLib Technologies
- Hector Garcia-Molina
- and the Stanford DigLib Team
2Stanford Digital Libraries Team
- Faculty
- Dan Boneh, Hector Garcia-Molina, Terry Winograd
- Research Scientist
- Andreas Paepcke
- Librarians
- Vicky Reich, Rebecca Wesley
- Partners
- InterLib Partners, ACM, Dialog, Hitachi, IBM, Intel, Microsoft, NASA Ames Library, Stanford Libraries, SUL HighWire Press, Xerox
3Barriers to Effective DLs
Physical Barriers
Economic Concerns
Information Loss
Information Overload
Service Heterogeneity
4Thrusts
Physical Barriers
Economic Concerns
Information Loss
Information Overload
Service Heterogeneity
5DL Interoperability Challenges
- Growing number of players, formats, countries,...
- Repositories → Services
- Dynamic artifacts
- Reliability
6DL Interoperability Challenges
- Growing number of players, formats, countries,...
- Repositories → Services
- Dynamic artifacts
- Reliability
Solution: InfoBus → InterServ
7InfoBus Example
Q: Find Ti distributed (W) systems
Query Trans
Meta Data
Contracts
DLite
Gloss
U-Pai
Dialog Proxy
Folio Proxy
DigiCash Proxy
F.V. Proxy
F.V.
Folio
Dialog
DigiCash
8InfoBus Example
Q: Find Ti distributed (W) systems
Suggested: Folio, Dialog
Query Trans
Meta Data
Contracts
DLite
Gloss
U-Pai
Dialog Proxy
Folio Proxy
DigiCash Proxy
F.V. Proxy
F.V.
Folio
Dialog
DigiCash
9InfoBus Example
Q: Find Ti distributed (W) systems
Query Translation
Query Trans
Meta Data
Contracts
DLite
Gloss
U-Pai
Dialog Proxy
Folio Proxy
DigiCash Proxy
F.V. Proxy
F.V.
Folio
Dialog
DigiCash
Q: Find Ti distributed AND systems
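The translation step on this slide, where the adjacency query `distributed (W) systems` becomes `distributed AND systems`, can be sketched roughly as follows. This is a minimal illustration, not the InfoBus translator itself: the capability table, the assignment of operators to Dialog and Folio, and the function names `translate_query` and `adjacent` are all assumptions made for the example. Because AND is weaker than adjacency, the sketch also flags that results need a post-filter.

```python
import re

# Hypothetical capability table (an assumption, not taken from the slides):
# which Boolean/proximity operators each source is taken to support.
SOURCE_OPERATORS = {
    "Dialog": {"W", "AND"},  # assumed to accept the (W) adjacency operator
    "Folio": {"AND"},        # assumed to lack adjacency, so (W) is relaxed
}

def translate_query(query: str, source: str) -> tuple[str, bool]:
    """Return (translated query, needs_post_filter) for the given source."""
    supported = SOURCE_OPERATORS[source]
    if "W" in supported or "(W)" not in query:
        return query, False
    # Relax adjacency to conjunction. The source may now return a superset
    # of the wanted results, so the caller should re-check adjacency.
    return query.replace("(W)", "AND"), True

def adjacent(title: str, first: str, second: str) -> bool:
    """Post-filter: True if `first` is immediately followed by `second`."""
    words = re.findall(r"\w+", title.lower())
    return any(a == first and b == second for a, b in zip(words, words[1:]))
```

A caller would send the relaxed query to the source and then keep only the titles for which `adjacent(title, "distributed", "systems")` holds.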
10InfoBus Example
Q: Find Ti distributed (W) systems
Pay per View
Query Trans
Meta Data
Contracts
DLite
Gloss
U-Pai
Dialog Proxy
Folio Proxy
DigiCash Proxy
F.V. Proxy
F.V.
Folio
Dialog
DigiCash
11InterServ
Dynamic Artifacts
Services
Sophistication
Perpetual Activity
InfoBus
InfoBus Pro
Maturity
12Perpetual Activity Service
Service
register
P.A.S.
User Request
state plans
13Perpetual Activity Service
Service
register
restart service, use alternate
P.A.S.
check
check
User Request
restore state, try alternatives
state plans
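The register / check / restart cycle on this slide can be sketched as a small monitor. This is only an illustration under stated assumptions: the class name `PerpetualActivityService`, its methods, and the idea of passing recovery actions as callables are hypothetical, and a real service would run the checks on a schedule and persist state and plans.

```python
from typing import Callable, Dict, List

class PerpetualActivityService:
    """Toy monitor: services register a health check plus recovery actions.

    Hypothetical API for illustration; not the actual P.A.S. interface.
    """

    def __init__(self) -> None:
        self._checks: Dict[str, Callable[[], bool]] = {}
        self._recoveries: Dict[str, List[Callable[[], bool]]] = {}

    def register(self, name: str, check: Callable[[], bool],
                 recoveries: List[Callable[[], bool]]) -> None:
        """Record how to check a service and what to try if it is down."""
        self._checks[name] = check
        self._recoveries[name] = list(recoveries)

    def check_all(self) -> Dict[str, str]:
        """Run every check once; on failure try the recoveries in order."""
        report: Dict[str, str] = {}
        for name, check in self._checks.items():
            if check():
                report[name] = "ok"
                continue
            report[name] = "failed"
            for recover in self._recoveries[name]:
                # e.g. restart the service, then fall back to an alternate
                if recover():
                    report[name] = f"recovered via {recover.__name__}"
                    break
        return report
```

Listing the recovery actions in order (restart first, alternate second) mirrors the "restart service, use alternate" annotation on the slide.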
14SDLIP
- Simple Digital Library Interoperability Protocol
- Goal: get InterLib (and DLI2) to interoperate!
15Search Protocol Initial Goals
- Trivial to implement!
- Works over CORBA/COM, DASL/HTTP
- Use XML
- Does not prescribe query format
- Does not prescribe result format
- Small footprint (Desktop/Laptop/PDA)
- Allows for stateful or stateless operation
But lets you say what you're using
16Interface Consists of Four Components
17SDLIP Status
- Design Meeting June 22, 1999
18SDLIP Status
- Design Meeting June 22, 1999
- Client and Server Toolkits Available
- Extensive Documentation
- See http://www-diglib.Stanford.EDU/testbed/doc2/SDLIP/
19Current SDLIP Sources
- Some Web sources
- People Lookup: www.switchboard.com
- Altavista
- IMDB (movies)
- NCSTRL services: www.ncstrl.org
- Dienst compliant services, e.g., CoRR?
- Z39.50 servers
- e.g., Library of Congress
- Stanford WebBase
- CDL
- e.g., MELVYL gateway
- DASL-compliant servers
20Existing Clients
- Java
- command line
- applet
- C
- Palm Pilot
- TCL (Ray Larson)
- DASL-compliant clients
21Filtering Challenges
- Too much information
- Not controlled
22Current Filtering
textual similarity
23Page Rank Filtering
textual similarity
page rank (Google)
24Initial Page Rank
[Figure: link graph with initial page rank values]
25Recursive Page Rank
[Figure: link graph annotated with recursively computed page rank values]
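The recursive idea behind this slide, that a page's rank depends on the ranks of the pages linking to it, can be sketched as a simple iteration. This is a textbook-style version under stated assumptions, not the formulation used in the talk: the damping factor of 0.85, the fixed iteration count, and the function name `page_rank` are choices made for the example.

```python
def page_rank(links, damping=0.85, iterations=50):
    """Iteratively compute ranks; `links` maps a page to the pages it links to.

    The damping factor and iteration count are assumptions for this sketch.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # initial rank: uniform
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank evenly among its out-links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank
```

For example, with `{"a": ["c"], "b": ["c"], "c": ["a"]}`, page `c` ends up ranked highest because two pages link to it, and `a` outranks `b` because its single in-link comes from the highly ranked `c`.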
26Value Filtering
access
textual similarity
opinions
page rank
context
geography
27Value Filtering Challenges
- Collection of Value Information
- Scalability
- Privacy of Value Information
- Understanding Page Rank
- Searching Non-Text Objects
- Combining Value Information
- HCI Aspects
28WebBase Goals
- Manage very large collections of Web pages
- Enable large-scale Web-related research
- Locally provide a significant portion of the Web
- Efficient wide-area Web data distribution
29Challenges
- Huge information space
- Wide area distribution
- URL space (to remember while crawling)
- Web content (to store)
- Limited resources
- Disk
- Time
- Memory
- Bandwidth
- Server administrator tolerance
- Continuous evolution
- More pages
- Pages change/disappear
- Mirror sites installed
- Keeping data fresh
- Crawling issues
- Data fiefdoms: firewalls, access permissions, load controls
- Overhead per site: DNS lookups, processing robots.txt
- Parallelization
- Ability to interrupt/restart
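Two of the crawling issues above, per-site overhead for robots.txt and respecting load controls, can be sketched with Python's standard `urllib.robotparser`. This is a minimal illustration and not the WebBase crawler: the class name `PoliteFrontier`, the injected `fetch_robots` callable, and the fixed per-host delay are assumptions made for the example.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

class PoliteFrontier:
    """Decides whether a URL may be fetched now (hypothetical helper)."""

    def __init__(self, user_agent, fetch_robots, min_delay=1.0):
        self.user_agent = user_agent
        self.fetch_robots = fetch_robots  # callable: host -> robots.txt text
        self.min_delay = min_delay        # assumed minimum seconds between hits
        self._robots = {}                 # host -> parsed rules, fetched once per site
        self._next_ok = {}                # host -> earliest time of next request

    def _rules(self, host):
        # Cache the parsed robots.txt so the per-site overhead is paid once.
        if host not in self._robots:
            parser = RobotFileParser()
            parser.parse(self.fetch_robots(host).splitlines())
            self._robots[host] = parser
        return self._robots[host]

    def may_fetch(self, url, now):
        host = urlsplit(url).netloc
        if not self._rules(host).can_fetch(self.user_agent, url):
            return False                  # excluded by the site's robots.txt
        if now < self._next_ok.get(host, 0.0):
            return False                  # too soon; respect server tolerance
        self._next_ok[host] = now + self.min_delay
        return True
```

Passing `now` explicitly keeps the sketch deterministic; a real crawler would read a clock and would also checkpoint `_next_ok` and its URL queue so that a crawl can be interrupted and restarted.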
30WebBase Architecture
Client
Client
Webbase API
WWW
Retrieval Indexes
Feature Repository
Repository
Multicast Engine
Client
Client
Client
Client
31Mobile Access Challenges
- Limited Resources
- Transitions Between Devices
- Exploiting Context
32Mobile Access Challenges
- Limited Resources
- Transitions Between Devices
- Exploiting Context
- Solutions
- Power Browsing
- Information Tiles
- Information Paging
33Power Browsing
34Power Browsing
- Techniques
- Show only text headers
- Show URLs, anchors, titles
- Order URLs by page rank
- Summarize text
- Summarize set of pages
- Low-resolution pictures
- Display relevant text
- ...
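Three of the techniques listed above (show only text headers; show URLs, anchors, and titles; order URLs by page rank) can be sketched with Python's standard `html.parser`. This is an illustrative sketch, not the PowerBrowser implementation: `OutlineParser`, `summarize`, and the `rank` dictionary of precomputed scores are hypothetical, and nested tags such as a link inside a header are not handled.

```python
from html.parser import HTMLParser

class OutlineParser(HTMLParser):
    """Collects the title, header text, and (href, anchor text) pairs."""

    HEADERS = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.title, self.headers, self.links = "", [], []
        self._tag, self._href, self._text = None, None, []

    def handle_starttag(self, tag, attrs):
        if tag == "title" or tag == "a" or tag in self.HEADERS:
            self._tag, self._text = tag, []
            self._href = dict(attrs).get("href") if tag == "a" else None

    def handle_data(self, data):
        if self._tag:
            self._text.append(data)  # body text outside tracked tags is dropped

    def handle_endtag(self, tag):
        if tag != self._tag:
            return
        text = " ".join("".join(self._text).split())  # normalize whitespace
        if tag == "title":
            self.title = text
        elif tag == "a":
            if self._href:
                self.links.append((self._href, text))
        else:
            self.headers.append(text)
        self._tag = None

def summarize(html, rank):
    """Return (title, headers, links), with links ordered by descending rank."""
    parser = OutlineParser()
    parser.feed(html)
    links = sorted(parser.links, key=lambda l: rank.get(l[0], 0.0), reverse=True)
    return parser.title, parser.headers, links
```

On a small screen, the returned title, headers, and ranked links could be shown in place of the full page, with the body text fetched only on demand.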
35PowerBrowser - Start Screen
36PowerBrowser - Hypertext View
37PowerBrowser - Text View
38PowerBrowser - History
39IP Management Challenges
- Heterogeneity
- Complexity of Interactions
- Varied Information Appliances
- Mobile Access
- Security/Privacy
40Fundamental Problem
- Safeguards (security, privacy, authentication, payment, non-repudiation...) are an afterthought
- Spaghetti code for safeguards
- Experience at Stanford
- InterPay, CommPacts, Copy Detection
- Goal was interoperability
- Correctness, complexity were problems
41Example Simple Pay Per View
transfer(amt, account, libAccount)
patron
library
bank
view(docId, account, amt)
42Example Simple Payment
transfer(amt, account, libAccount)
patron
library
bank
view(docId, account, amt)
- Goals
- Do not want others to see data
- Do not want library to see account number
- Need receipt from bank
43Example Simple Payment
transfer(amt, account, libAccount)
patron
library
bank
view(docId, account, amt)
- Goals
- Do not want others to see data
- Do not want library to see account number
- Need receipt from bank
Result: A Mess!!
44Declarative Safeguards for DLs
- Safeguards built in at system design time
- Declare goals, not mechanisms
- Players, data, ...
- Who can see what, who can do what, ... (Note: access information can also be protected)
Secure DLs
Components: IP Mgmt, Wallets, ...
Declarative Infrastructure
45Solution
- Extended Interface Definition Language
- CORBA- or DCOM-like
- Example
class artRecord authorized(policy)
    setOwner(encrypted string ownerName,
             encrypted(bank) int price,
             picture pic)
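The idea of declaring who may see each parameter, and leaving the mechanism to the infrastructure, can be sketched in Python. This is a loose illustration under stated assumptions, not the extended IDL or its runtime: the `Param` type, the `view_for` function, the party names, and the reading of plain `encrypted` as "readable only by the receiver" are all hypothetical, and the `"<encrypted>"` placeholder stands in for real cryptography.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

@dataclass(frozen=True)
class Param:
    """One declared parameter and the parties allowed to read it."""
    name: str
    readers: Optional[FrozenSet[str]] = None  # None means anyone may read

# Declaration mirroring setOwner(...); the reader sets are assumptions.
SET_OWNER = [
    Param("ownerName", frozenset({"receiver"})),  # `encrypted string`
    Param("price", frozenset({"bank"})),          # `encrypted(bank) int`
    Param("pic"),                                 # unprotected `picture`
]

def view_for(party, declaration, args):
    """Return the call arguments as `party` is allowed to see them."""
    view = {}
    for p in declaration:
        allowed = p.readers is None or party in p.readers
        # Placeholder only: a real system would encrypt for the readers.
        view[p.name] = args[p.name] if allowed else "<encrypted>"
    return view
```

The point of the sketch is that `SET_OWNER` states goals (who can see what) once, and a single generic function enforces them, instead of hand-written safeguard code in every call path.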
46Declarative Safeguards for DLs
Secure DLs
Components IP Mgmt, Wallets, ...
Declarative Infrastructure
47Information Preservation Challenges
- Preserving the Bits
- Evolving hardware
- Evolving software
- Evolving organizations
- Preserving the Meaning
48Stanford Archival Repository
- Object Identifier = Signature
handle
- No Deletions (never ever!)
set
new version?
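The two properties on this slide, identifiers derived from a signature of the content and no deletions, can be sketched as an append-only, content-addressed store. This is a minimal sketch, not the Stanford Archival Repository: the class `ArchivalStore`, the use of SHA-256 as the signature, and the per-name version list are assumptions made for the example.

```python
import hashlib

class ArchivalStore:
    """Append-only store: an object's identifier is the hash of its bytes."""

    def __init__(self):
        self._objects = {}   # signature -> bytes; entries are never removed
        self._versions = {}  # name -> list of signatures, oldest first

    def put(self, name: str, data: bytes) -> str:
        """Store `data` and return its signature (SHA-256 is an assumption)."""
        signature = hashlib.sha256(data).hexdigest()
        self._objects.setdefault(signature, data)
        history = self._versions.setdefault(name, [])
        if not history or history[-1] != signature:
            history.append(signature)  # an update adds a version, never overwrites
        return signature

    def get(self, signature: str) -> bytes:
        data = self._objects[signature]
        # Recomputing the signature detects silent corruption of stored bits.
        if hashlib.sha256(data).hexdigest() != signature:
            raise ValueError("stored object does not match its signature")
        return data

    def versions(self, name: str):
        """Return the signatures of all versions stored under `name`."""
        return list(self._versions.get(name, []))
```

There is deliberately no delete method: changing an object means storing a new version under a new signature while the old one remains retrievable.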
49Repository Layers
Intellectual Property
Indexing, Naming
Reliability
Complex Objects
Identity
Object Store
50Archiving the Web - Problem
users
Web Server
File System
51Archiving the Web - One Solution
users
Web Server
Archival Repository
File System
52Archiving the Web - Our Solution
users
users
Web Server
Archival Repository
InfoMonitor
File System
53InfoMonitor History View
54InfoMonitor Snapshot View
55Stanford InterLib Technologies
Physical Barriers
Economic Concerns
Information Loss
Information Overload
Service Heterogeneity