Peer-to-peer archival data trading - PowerPoint PPT Presentation

Peer-to-peer archival data trading Brian Cooper Joint work with Hector Garcia-Molina (and others) Stanford University – PowerPoint PPT presentation

Transcript and Presenter's Notes

Title: Peer-to-peer archival data trading


1
Peer-to-peer archival data trading
  • Brian Cooper
  • Joint work with Hector Garcia-Molina
  • (and others)
  • Stanford University

2
Problem: Fragile Data
  • Data easy to create, hard to preserve
  • Broken tapes
  • Human deletions
  • Going out of business

3
Replication-based preservation
4
Replication-based preservation
5
Motivation
  • Several systems use replication
  • Preserve digital collections
  • SAV, others
  • Archival part of digital library
  • Individual organizations cooperate
  • Not a lot of money to spend

6
Goal
  • Reliable replication of digital collections
  • Given that
  • Resources are limited
  • Sites are autonomous
  • Not all sites are equal
  • Traditional methods
  • Central control
  • Random
  • Replicate popular
  • Metric
  • Reliability
  • Not necessarily efficiency

7
Our solution
  • Data trading
  • I'll store a copy of your collection if you'll
    store a copy of mine
  • Sites make local decisions
  • Who to trade with
  • How many copies to make
  • How much space to provide
  • Etc.

8
Trading network
  • A series of binary, peer-to-peer trading links

9
Architecture
[Figure: a local archive and a remote archive, each with users, a reliability layer, and a SAV Archive; the diagram also labels a service layer, InfoMonitor, and a filesystem]
This architecture developed with Arturo Crespo
10
Overview
  • Trading model
  • Trading algorithm
  • Optimizing (and simulating) trading
  • Some results
  • Some stuff we are still working on

11
Trading model
12
Trading model
  • Archive site: an autonomous archiving provider

13
Trading model
  • Archive site: an autonomous archiving provider
  • Digital collection: a set of related digital
    materials

14
Trading model
  • Archive site: an autonomous archiving provider
  • Digital collection: a set of related digital
    materials
  • Archival storage: stores locally and remotely
    owned digital collections

15
Trading model
  • Archive site: an autonomous archiving provider
  • Digital collection: a set of related digital
    materials
  • Archival storage: stores locally and remotely
    owned digital collections
  • Archiving client: deposits and retrieves materials

16
Trading model
  • Archive site: an autonomous archiving provider
  • Digital collection: a set of related digital
    materials
  • Archival storage: stores locally and remotely
    owned digital collections
  • Archiving client: deposits and retrieves materials
  • Data reliability: probability that data is not
    lost

17
Deeds
  • A right to use space at another site
  • Bookkeeping mechanism for trades
  • Used, saved, split, or transferred
  • Trading algorithm
  • Sites trade deeds
  • Sites exercise deeds to replicate collections

Deed for space
For use by Library of Congress or for transfer
623 gigabytes
Stanford University
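
The deed described on this slide can be sketched as a small record with the operations the slide names (used, split, transferred; "saved" is simply holding on to it). This is an illustrative sketch only: the class, field, and method names are assumptions, not taken from the SAV implementation, and "Another Site" is a hypothetical third party.

```python
from dataclasses import dataclass


@dataclass
class Deed:
    """A right to use `size_gb` of space at `issuer`, held by `holder`."""
    issuer: str      # site that provides the space
    holder: str      # site currently entitled to use it
    size_gb: float   # space the deed still covers

    def split(self, amount_gb: float) -> "Deed":
        """Carve a new deed of `amount_gb` out of this one."""
        if not 0 < amount_gb < self.size_gb:
            raise ValueError("split amount must be smaller than the deed")
        self.size_gb -= amount_gb
        return Deed(self.issuer, self.holder, amount_gb)

    def transfer(self, new_holder: str) -> None:
        """Hand the whole deed to another site."""
        self.holder = new_holder

    def use(self, collection_gb: float) -> None:
        """Exercise the deed to store a collection replica at the issuer."""
        if collection_gb > self.size_gb:
            raise ValueError("collection does not fit in the deed")
        self.size_gb -= collection_gb


# The example deed from the slide: 623 GB at Stanford for the Library of Congress.
deed = Deed(issuer="Stanford University", holder="Library of Congress", size_gb=623)
part = deed.split(200)          # carve off 200 GB, keeping 423 GB
part.transfer("Another Site")   # give the 200 GB deed to a hypothetical third site
deed.use(100)                   # replicate a 100 GB collection at Stanford
```

Keeping the issuer on every split piece means a transferred deed still points back at the site that owes the space.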
18
Deed trading
19
The challenge
20
The challenge
21
Alternative solutions
  • Are there other ways besides trading?

22
Other solutions: central control
23
Other solutions: client-based
24
Other solutions: random
25
Why is trading good?
  • High reliability
  • Framework for replication
  • Site autonomy
  • Make local decisions
  • No submission to external authority
  • Fairness
  • Contribute more, get more reliability
  • Must contribute resources

26
Decisions facing an archive
  • Who to trade with
  • How much to trade
  • When to ask for a trade
  • Providing space
  • Advertising space
  • Picking a number of copies
  • Coping with varying site reliabilities
  • What to do with acquired resources
  • How to deliver other services

Many, many degrees of freedom!
27
Our approach
  • Define a basic trading protocol
  • Deed trading
  • Assume all sites follow same rules
  • Basic system for trading
  • Extend: not all sites are equal
  • Some are more reliable or trusted
  • Extend: sites have freedom to negotiate
  • Bid trading
  • Extend: some sites are malicious
  • Ensure documents survive despite evildoers
  • For each model, what policies are best?

28
How do we evaluate policies?
  • Trading simulator
  • Generate scenario
  • Simulate trading with different policies
  • Evaluate reliability for each policy
  • Compare each policy

29
Simulation parameters
Number of sites: 2 to 15
Site reliability: 0.5 to 0.8
Collections per site: 4 to 25
Data per collection: 50 GB to 1000 GB
Space per site: 2x data to 7x data
Replication goal: 2 to 15 copies
Scenarios per simulation: 200
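
A scenario generator over these ranges might look like the sketch below. The slides give only the ranges, so the uniform sampling, the per-site independence, and all names (`Site`, `generate_scenario`) are assumptions made for illustration; the actual simulator may have swept or fixed these parameters differently.

```python
import random
from dataclasses import dataclass


@dataclass
class Site:
    reliability: float     # probability the site does not fail
    collections_gb: list   # size of each locally owned collection
    space_gb: float        # total storage the site provides


def generate_scenario(rng: random.Random) -> list:
    """Draw one scenario from the parameter ranges on the slide."""
    sites = []
    for _ in range(rng.randint(2, 15)):                      # number of sites
        collections = [rng.uniform(50, 1000)                 # GB per collection
                       for _ in range(rng.randint(4, 25))]   # collections per site
        data_gb = sum(collections)
        sites.append(Site(
            reliability=rng.uniform(0.5, 0.8),
            collections_gb=collections,
            space_gb=data_gb * rng.uniform(2, 7),            # 2x to 7x local data
        ))
    return sites


# 200 scenarios per simulation, seeded so that runs are repeatable.
scenarios = [generate_scenario(random.Random(seed)) for seed in range(200)]
```

Seeding each scenario separately lets different policies be compared on identical inputs.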
30
Reliability
  • Site reliability
  • Will a site fail?
  • Example: 0.9 = 10% chance of failure
  • Data reliability
  • How safe is the data?
  • Despite site failures
  • Example: 320-year MTTF
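
To make the link between site reliability and data reliability concrete, here is a minimal sketch that assumes sites fail independently and that data is lost only when every site holding a copy fails. Both assumptions, and the function name, are illustrative; the slides do not state the model used.

```python
from math import prod


def data_reliability(site_reliabilities):
    """Probability that at least one copy survives, given independent failures."""
    # The data is lost only if every hosting site fails.
    return 1 - prod(1 - r for r in site_reliabilities)


one_copy = data_reliability([0.9])               # a single site at 0.9
three_copies = data_reliability([0.9, 0.9, 0.9])  # three sites at 0.9 each
```

Under these assumptions, three copies on sites with reliability 0.9 are lost only with probability 0.1 × 0.1 × 0.1 = 0.001, which is why extra replicas on even moderately reliable sites help so much.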

31
Basic trading approach
  • How does trading work?
  • Assuming all sites follow the rules
  • Example: advertising policy

B
A
Let's trade. How much space do you have?
32
Advertising policy
[Figure: three panels, each showing sites A and B]
  • First panel (no policy name shown): 120 GB; "I have 120 GB"
  • Space fractional policy: 60 GB; "I have 60 GB"
  • Data proportional policy: 40 GB data; "I have 40 GB" (40 GB)
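
The three panels on this slide can be read as three rules for how much space a site advertises when asked. The sketch below reproduces the numbers shown (120, 60, and 40 GB) under assumptions the transcript does not confirm: that the first panel advertises all free space, that the fractional policy uses a fraction of 0.5, and that the proportional policy uses a multiple of 1.0 of some amount of data. The slides do not say whose data the proportional policy is keyed on, so `data_gb` is left generic. All function and parameter names are hypothetical.

```python
def advertise_all(free_space_gb):
    """Advertise every free gigabyte (assumed reading of the first panel)."""
    return free_space_gb


def advertise_space_fractional(free_space_gb, fraction=0.5):
    """Advertise a fixed fraction of the free space."""
    return free_space_gb * fraction


def advertise_data_proportional(data_gb, free_space_gb, multiple=1.0):
    """Advertise space proportional to an amount of data, capped by free space."""
    return min(data_gb * multiple, free_space_gb)


offers = (
    advertise_all(120),
    advertise_space_fractional(120),
    advertise_data_proportional(40, 120),
)
```

With 120 GB free and 40 GB of data, the three rules yield the 120, 60, and 40 GB offers from the figure.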
33
Result
34
Extend: some sites > others
  • May prefer certain sites
  • More reliable
  • Better reputation
  • Part of same system
  • Example: who to trade with?

A
?
?
?
35
Who to trade with?
36
Extend: freedom to negotiate
  • Bid for trades

A: How much do I pay for 100 GB of your space?
Bids: 120 GB, 80 GB, 95 GB
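
One plausible reading of this slide is that site A calls an auction for 100 GB and three other sites each name a price in space. Under that reading, a simple rule for A is to accept the lowest price, as sketched below. The site names B, C, and D, the mapping of the three figures to bidders, and the lowest-price rule are all assumptions for illustration, not the policy evaluated in the talk.

```python
def pick_winning_bid(bids):
    """Return the (site, price_gb) pair with the lowest asking price."""
    return min(bids.items(), key=lambda item: item[1])


# Asking prices, in GB of A's space, for 100 GB at each bidder (hypothetical names).
bids = {"B": 120, "C": 80, "D": 95}
winner = pick_winning_bid(bids)
```

A cheapest-bid rule ignores how reliable each bidder is, which is one reason the bidding questions on the next slide are not trivial.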
37
Bid trading
  • Questions
  • When do I call auctions?
  • How much do I bid?
  • Can I take advantage of the system by being
    clever?

38
Extend: some sites are malicious
  • Secure services
  • Publish: make copies to survive failures
  • Search: find documents
  • Retrieve: get a copy of a document
  • Challenges
  • Attacker may delete copy
  • Attacker may provide fake search results
  • Attacker may provide altered document
  • Attacker may disrupt message routing
  • Joint work with Mayank Bawa and Neil Daswani
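
One common defense against the "altered document" attack is to name each object by a cryptographic hash of its content, so a retrieved copy can be checked against its own name. The sketch below uses SHA-256 from Python's standard `hashlib`; whether the SAV project uses this particular hash or scheme is not stated in the slides, so treat the functions as a hypothetical illustration.

```python
import hashlib


def object_name(content: bytes) -> str:
    """Derive an object's name from its content (assumed SHA-256 scheme)."""
    return hashlib.sha256(content).hexdigest()


def verify(name: str, content: bytes) -> bool:
    """Check that a retrieved copy still matches the name it was published under."""
    return object_name(content) == name


name = object_name(b"annual report, final version")
genuine_ok = verify(name, b"annual report, final version")
tampered_ok = verify(name, b"annual report, altered version")
```

This only detects alteration. Deleted copies, fake search results, and disrupted routing need other mechanisms.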

39
Current and future work
  • Access
  • Support searching over collections
  • Distribute indexes via trading
  • Prototype implementation
  • Basic SAV architecture implemented
  • Trading protocol/policies must be added
  • Develop security techniques further

40
Current and future work
  • Other topics of interest
  • Designing peer-to-peer primitives
  • Building other p2p services
  • Other ways of acquiring data
  • How to archive active systems
  • Semantic archiving
  • Managing format obsolescence
  • Finding data once it is archived

41
Other parts of SAV project
  • SAV data model
  • Write-once objects
  • Signature-based naming
  • How to get objects into SAV
  • InfoMonitor filesystem
  • Other inputs (Web, DBMS, etc.)
  • Modeling archival repositories
  • Arturo Crespo
  • Choose best components and design

42
Related work
  • Peer-to-peer replication
  • SAV, Intermemory, LOCKSS, OceanStore
  • Fault tolerant systems
  • RAID, mirrored disks, replicated databases
  • Caching systems (Andrew, Coda)
  • Deep storage (Tivoli)
  • Barter/auction based systems
  • ContractNet
  • Distributed resource allocation
  • File Allocation Problem

43
Conclusion
  • Important, exciting area
  • Preservation critical
  • Difficult to accomplish
  • Many decisions are ad hoc today
  • An effective framework is needed
  • Scientific evaluation of decisions
  • Trading networks replicate data
  • Model for trading networks
  • Trading algorithm
  • Simulation results

44
For more information
  • cooperb_at_stanford.edu
  • http://www-diglib.stanford.edu/