Title: P2P Apps
- Chandrasekar Ramachandran and Rahul Malik
- Papers
- 1. Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility
- 2. Colyseus: A distributed architecture for online multiplayer games
- 3. OverCite: A Distributed, Cooperative CiteSeer
CS525
02/19/2008
Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility
- Antony Rowstron (Microsoft Research) and Peter Druschel (Rice University)
Contents
- Introduction
- Background
- An Overview of PAST
- Pastry
- Operations
- Improvements
- Storage Management
- Caching
- Experimental Evaluation
- Setup
- Results
- Conclusions
Introduction - Focus and Common Themes
- Recent Focus
- Decentralized Control
- Self-Organization
- Adaptability/Scalability
- P2P Utility Systems
- Large-Scale
- Common Themes in P2P Systems
- Symmetric Communication
- Nearly-Identical Capabilities
Source1
Background
- Characteristic Features of the Internet
- Geography
- Ownership
- Administration
- Jurisdiction
- Need for Strong Persistence and High Availability
- Obviates
- Physical Transport of Storage Media
- Mirroring
- Sharing of Storage and Bandwidth
Source2
An Overview of PAST
- Any host connected to the Internet can be a PAST node
- Overlay Network
- PAST Node = Access Point for a User
- Operations Exported to Clients
- Insert
- Lookup
- Reclaim
- Terms: NodeId, FileId
- NodeId: 128-bit SHA-1 Hash of the Node's Public Key
Source3
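To make the identifier scheme concrete, here is a minimal Python sketch of how a nodeId and fileId could be derived (helper names are hypothetical; PAST hashes the node's public key, and the file name plus the owner's public key and a random salt, with SHA-1; the 128-bit truncation here is a simplification of how identifiers are compared):

```python
import hashlib
import os

ID_BITS = 128  # PAST compares nodeIds and fileIds on 128 bits

def node_id(public_key: bytes) -> int:
    """nodeId: SHA-1 hash of the node's public key (truncated to 128 bits here)."""
    digest = hashlib.sha1(public_key).digest()
    return int.from_bytes(digest[:ID_BITS // 8], "big")

def file_id(file_name: str, owner_public_key: bytes, salt: bytes) -> int:
    """fileId: SHA-1 hash of the file name, the owner's public key and a salt."""
    digest = hashlib.sha1(file_name.encode() + owner_public_key + salt).digest()
    return int.from_bytes(digest[:ID_BITS // 8], "big")

print(hex(file_id("paper.pdf", b"owner-public-key", salt=os.urandom(8))))
```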
Pastry - Overview and Routing Table
- P2P Routing Substrate
- Given a Message and a FileId
- Routes to the Node with NodeId Numerically Closest to the 128 msb of the FileId
- In Fewer than ⌈log_{2^b} N⌉ Steps
- Eventual Delivery Guaranteed
- Routing Table
- ⌈log_{2^b} N⌉ Rows with 2^b − 1 Entries Each
- Each Entry
- A NodeId Sharing the Appropriate Prefix with the Local Node
- Leaf Set and Neighborhood Set
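The prefix-routing step can be sketched as follows; this is a simplification assuming b = 4 (hex digits) and ids as 32-character hex strings, with the leaf-set fall-back omitted:

```python
B = 4                        # Pastry digit size in bits; ids are hex strings here
DIGITS = 128 // B            # 32 hex digits in a 128-bit id

def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading hex digits the two ids have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def next_hop(local_id: str, key: str, routing_table):
    """Row r of the table holds nodeIds sharing r digits with local_id; the
    column is the key's next digit.  A miss (None) would fall back to the
    leaf set, which is not shown."""
    r = shared_prefix_len(local_id, key)
    if r == DIGITS:
        return local_id                  # this node is responsible for the key
    return routing_table[r][int(key[r], 16)]

# toy example: the only filled entry routes on the key's first digit
table = [[None] * 16 for _ in range(DIGITS)]
table[0][0xd] = "d" + "0" * 31
print(next_hop("a" + "3" * 31, "d" + "7" * 31, table))
```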
Basic PAST Operations
- Insert
- Store File on the k PAST Nodes with NodeIds Closest to the 128 msb of the FileId
- Balance Storage Utilization
- Uniform Distribution of the Sets of NodeIds and FileIds
- Storage Quota → Debited
- Store Receipts
- Routing via Pastry
- Lookup
- Nodes Respond with Content and the Stored File Certificate
- Data Usually Found Near Client. Why?
- Proximity Metric
- Reclaim
- Reclaim Certificate
- Reclaim Receipt
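A toy sketch of the insert path under these assumptions: quota accounting, certificates, and the circular id space are omitted, and `stores` is a hypothetical map from nodeId to that node's local store.

```python
def k_closest_nodes(fid: int, node_ids, k: int):
    """The k nodes whose nodeIds are numerically closest to the fileId;
    in PAST these hold the file's replicas."""
    return sorted(node_ids, key=lambda n: abs(n - fid))[:k]

def insert(fid: int, contents: bytes, stores: dict, k: int):
    """Place one replica on each of the k closest nodes; the (node, fileId)
    pairs stand in for the signed store receipts returned to the client."""
    receipts = []
    for node in k_closest_nodes(fid, stores, k):
        stores[node][fid] = contents
        receipts.append((node, fid))
    return receipts

# toy network: nodeId -> that node's local store
network = {0x10: {}, 0x42: {}, 0x77: {}, 0xd3: {}}
print(insert(0x40, b"data", network, k=2))   # replicas land on 0x42 and 0x10
```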
PAST - Security
- Smartcards
- Private/Public Key Pair
- Certificates
- Storage Quotas
- Assumptions
- Computationally Infeasible to Break the Cryptographic Functions
- Most Nodes Well Behaved
- Attackers Cannot Compromise the Smartcards
- Features
- Integrity Maintained
- Store Receipts
- Randomized Pastry Routing Scheme
- Routing Information Redundant
Source4
Ingenious?
Storage Management - Overview
- Aims
- High Global Storage Utilization
- Graceful Degradation as Maximum Utilization Is Approached
- Rely on Local Coordination
- Why is Storage Not Always uniform?
- Statistical Variations
- Size Distribution of Files
- Different Storage Capacities
- How much can a node store?
- Capacities Differ by No More than Two Orders of Magnitude
- Compare Advertised Storage Capacity with the Leaf Set
- Use Cheap Hardware (60 GB Average)
- Node Too Large?
- Split It into Multiple Virtual Nodes
Storage Management - Replication
- Replica Diversion
- Purpose? Balance remaining free storage
- Store Success?
- Forward to k-1 nodes
- Store Receipt
- Store Fail?
- Choose a Node B from the Leaf Set (Not Among the k Closest)
- B Stores the Replica; A Keeps a Pointer to B
- Replacement Replicas
- Policies
- Acceptance of Replicas Locally
- Selection of Replica Nodes
- Decisions, Decisions
- File Diversion
- Balance Free Storage in NodeId Space
- Retry Insert Operation
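The replica- and file-diversion policies boil down to a local acceptance test plus a fall-back. A minimal sketch, assuming the thresholds t_pri and t_div from the paper and hypothetical free-space bookkeeping:

```python
T_PRI = 0.1    # threshold used by the k numerically closest nodes
T_DIV = 0.05   # stricter threshold used for diverted replicas

def accepts(file_size: int, free_space: int, diverted: bool) -> bool:
    """PAST's local acceptance test: reject if SD/FN exceeds the threshold."""
    t = T_DIV if diverted else T_PRI
    return free_space > 0 and file_size / free_space <= t

def choose_store(file_size: int, free_space_a: int, leaf_set_free: dict):
    """Replica diversion sketch: node A stores the replica itself if it can;
    otherwise it diverts to a leaf-set node B (not among the k closest) and
    keeps only a pointer to B.  Returning None stands for file diversion,
    i.e. the client retries the insert with a new fileId."""
    if accepts(file_size, free_space_a, diverted=False):
        return "A"
    for node_b, free in leaf_set_free.items():
        if accepts(file_size, free, diverted=True):
            return node_b
    return None

print(choose_store(file_size=10, free_space_a=50,
                   leaf_set_free={"B1": 40, "B2": 900}))
# A rejects (10/50 > 0.1), B1 rejects (10/40 > 0.05), B2 accepts -> "B2"
```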
Storage Management - Maintenance
- Maintenance
- K Copies of Inserted File
- Leaf Set
- Failures?
- Keep-Alive Messages
- Adjustments in Leaf-sets
- Nodes Please Give Me Replicas of All Files!
- Not Possible
- Time-Consuming and Inefficient
- Solutions
- Use Pointers to FileIds
- Assumption
- Total Amount of Storage in the System Never Decreases
Caching
- Goal
- Minimize Client Access Latencies
- Balance Query Load
- Maximize Query Throughput
- Creating Additional Replicas
- Where do you Cache?
- Use unused disk space
- Evict Cached Copies when necessary
- Insert into Cache If
- Size Is Less than a Fraction c of the Node's Storage Capacity
- GreedyDual-Size (GD-S) Eviction Policy
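A minimal sketch of a GreedyDual-Size cache as it might be used here; the cost term is fixed at 1 and the per-node fraction-c admission test is reduced to a single capacity check, both simplifications:

```python
class GreedyDualSizeCache:
    """GreedyDual-Size eviction sketch: each object gets a credit
    H = L + cost/size; the object with the smallest H is evicted and the
    global inflation value L rises to that H, ageing older entries."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.used = 0
        self.L = 0.0
        self.entries = {}        # file_id -> (H, size)

    def insert(self, file_id, size, cost=1.0):
        if size > self.capacity:          # oversized files are simply not cached
            return
        while self.used + size > self.capacity:
            victim = min(self.entries, key=lambda f: self.entries[f][0])
            self.L = self.entries[victim][0]
            self.used -= self.entries.pop(victim)[1]
        self.entries[file_id] = (self.L + cost / size, size)
        self.used += size

cache = GreedyDualSizeCache(capacity=100)
cache.insert("f1", size=60)
cache.insert("f2", size=60)     # evicts f1 to make room; L rises to f1's H
print(list(cache.entries))      # ['f2']
```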
Performance Evaluation
- Implemented in Java
- Configured to Run in Single Java VM
- Hardware
- Compaq AlphaServer ES40
- Tru64 UNIX
- 6 GB Main Memory
- Data
- 8 Web Proxy Logs from NLANR
- 4 Million Entries
- 18.7 GB Content
- Institutional File Systems
- 2 Million files
- 167 GB
Results
- A node rejects a replica when SD/FN > t (t_pri for primary replicas, t_div for diverted replicas)
- Storage
- Number of files stored increases with lower t_pri
- Storage utilization drops
- Higher rate of insertion failure
- Number of diverted replicas small at high utilization
- Caching
- Global cache hit ratio decreases as storage utilization increases
- Results shown for t_pri = 0.1 and t_div = 0.05
References
- Images
- bahaiviews.blogspot.com/2006_02_01_archive.html
- http://images.jupiterimages.com/common/detail/21/05/22480521.jpg
- http://www.masternewmedia.org/news/images/p2p_swarming.jpg
- http://www.theage.com.au/news/national/smart-card-back-on-the-agenda/2006/03/26/1143330931688.html
Discussion
- Comparison with CFS and Ivy
- How can external factors, such as globally known information, help in local coordination?
Colyseus: A Distributed Architecture for Online Multiplayer Games
Ashwin Bharambe, Jeffrey Pang, Srini Seshan
ACM/USENIX NSDI 2006
Networked Games Are Rapidly Evolving
[Chart: MMOG growth, from www.mmogchart.com]
Centralized Scheme
Slow-paced games with less interaction between server and client may scale well
- Not true of FPS games (e.g. Quake)
- Demand high interactivity
- Need a single game world
- High outgoing traffic at server
- Common shared state between clients
Game Model
- Game state is a set of objects (player, ammo, monsters, game status) with mutable state
- Each object has a think function that runs its game logic
[Figure: screenshot of Serious Sam annotated with object types]
Distributed Architecture
- Create the replicas
- Discovery of objects
[Figure: objects and their replicas spread across nodes]
Replication
- Each object has a primary copy that resides on exactly one node
- The primary executes the think function for the object
- Replicas are read-only
- Updates are serialized at the primary
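A small sketch of the primary/replica split; the state and update shapes are hypothetical and far simpler than Colyseus's real object interface:

```python
class GameObject:
    """One node holds the primary copy and runs the think function;
    replicas elsewhere are read-only and apply the updates it produces."""

    def __init__(self, state: dict, primary: bool):
        self.state = state
        self.primary = primary

    def think(self):
        assert self.primary, "replicas are read-only"
        self.state["ticks"] = self.state.get("ticks", 0) + 1   # toy game logic
        return dict(self.state)            # update shipped to the replicas

    def apply_update(self, update: dict):
        assert not self.primary
        self.state = update

primary = GameObject({"hp": 100}, primary=True)
replica = GameObject({"hp": 100}, primary=False)
replica.apply_update(primary.think())
print(replica.state)   # {'hp': 100, 'ticks': 1}
```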
Object Location
- Subscription: find objects in the range [x1, x2] × [y1, y2] × [z1, z2]
- Publication: my location is (x, y, z)
- Challenge: overcome the delay between a subscription and the reception of a matching publication
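A sketch of how a rendezvous point might match area-of-interest subscriptions against position publications; the box/point representation is an assumption for illustration:

```python
def matches(subscription, publication) -> bool:
    """A subscription is an axis-aligned box ((x1,x2),(y1,y2),(z1,z2));
    a publication carries an object's position (x, y, z).  The rendezvous
    node delivers the publication to every overlapping subscription."""
    return all(lo <= coord <= hi
               for (lo, hi), coord in zip(subscription, publication))

# a player subscribes to the area it can see and others publish their locations:
area_of_interest = ((0, 10), (0, 10), (0, 5))
print(matches(area_of_interest, (3, 7, 1)))    # True
print(matches(area_of_interest, (30, 7, 1)))   # False
```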
Distributed Hash Tables (DHT)
[Figure: Chord-style ring of nodes 0x00-0xf0 with finger pointers; lookups take O(log n) hops]
Using DHTs for Range Queries
- No cryptographic hashing for the key → identifier mapping
- Example query: 6 ≤ x ≤ 13
- With hashing the keys scatter (key 6 → 0xab, key 7 → 0xd3, key 13 → 0x12); without it the range maps to a contiguous arc of the ring
[Figure: ring of nodes 0x00-0xf0 with the query range highlighted]
Using DHTs for Range Queries
- Nodes in popular regions can be overloaded
- Load imbalance!
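The load imbalance comes from placing keys on the ring without hashing. A toy comparison of hashed versus order-preserving placement, with an 8-bit identifier space and made-up key bounds:

```python
import hashlib

ID_SPACE = 2 ** 8     # toy 8-bit identifier space, like the 0x00..0xf0 ring above

def hashed_id(key: int) -> int:
    """Conventional DHT placement: hash the key onto the ring."""
    return hashlib.sha1(str(key).encode()).digest()[0]   # first byte of SHA-1

def order_preserving_id(key: int, key_min=0, key_max=100) -> int:
    """Map keys to identifiers without hashing so consecutive keys stay adjacent."""
    return (key - key_min) * (ID_SPACE - 1) // (key_max - key_min)

print([hashed_id(k) for k in range(6, 14)])            # scattered around the ring
print([order_preserving_id(k) for k in range(6, 14)])  # one contiguous arc
```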
DHTs with Load Balancing
- Load balancing strategy
- Re-adjust responsibilities
- Range ownerships are skewed!
DHTs with Load Balancing
[Figure: ring with many nodes crowded into the popular region]
- Finger pointers get skewed!
- Each routing hop may not reduce the node space by half!
- ⇒ no log(n) hop guarantee
Ideal Link Structure
[Figure: the same ring, with fingers spaced by node count rather than identifier distance, so each hop still halves the remaining nodes]
- Need to establish links based on node distance
[Figure: values (e.g. v4, v8) mapped to the 4th and 8th nodes]
- If we had the above information
- For finger i
- Estimate the value v for which the 2^i-th node is responsible
Histogram Maintenance
- Measure node density locally
- Gossip about it!
[Figure: nodes on the ring exchanging (range, density) samples on request]
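Given such a (range, node-density) histogram, a node can estimate where the 2^i-th node away lies in value space. A sketch with a made-up histogram:

```python
def value_at_node_distance(histogram, start_value: float, hops: int) -> float:
    """Walk a (range_start, range_end, nodes_per_unit_value) histogram and
    return the value reached after crossing `hops` nodes, starting from
    `start_value`.  This is the value a finger to the 2^i-th node would target."""
    remaining = hops
    for lo, hi, density in histogram:
        if hi <= start_value or density <= 0:
            continue
        lo = max(lo, start_value)
        nodes_in_bucket = (hi - lo) * density
        if remaining <= nodes_in_bucket:
            return lo + remaining / density
        remaining -= nodes_in_bucket
    return histogram[-1][1]      # ran off the end of the sampled value space

# dense region between 40 and 60, sparse elsewhere:
hist = [(0, 40, 0.05), (40, 60, 0.5), (60, 100, 0.05)]
print(value_at_node_distance(hist, start_value=0.0, hops=4))   # e.g. finger i = 2
```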
Load Balancing
[Figure: load histogram across the nodes' ranges]
- Basic idea: leave-join
- Light nodes leave
- Re-join near heavy nodes, splitting the range of the heavier node
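A toy model of one leave-join step; loads stand in for the ranges the nodes own, whereas the real protocol moves range boundaries rather than abstract load numbers:

```python
def leave_join(loads: dict) -> dict:
    """The most lightly loaded node leaves (its small range is absorbed by a
    neighbour, ignored here) and re-joins next to the most heavily loaded
    node, splitting that node's range and load in half."""
    light = min(loads, key=loads.get)
    heavy = max(loads, key=loads.get)
    if loads[heavy] <= 2 * loads[light]:
        return loads                       # already balanced enough
    new = dict(loads)
    new.pop(light)                         # light node gives up its old range
    half = loads[heavy] / 2
    new[heavy] = half                      # heavy node keeps half its range...
    new[light] = half                      # ...the re-joining node takes the rest
    return new

print(leave_join({"a": 1, "b": 3, "c": 20}))   # {'b': 3, 'c': 10.0, 'a': 10.0}
```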
Prefetching
- On-demand object discovery can cause stalls or render an incorrect view
- So, use game physics for prediction
- Predict which areas the player will move to and subscribe to objects from those areas
Proactive Replication
- Standard object discovery and replica instantiation are slow for short-lived objects
- Uses the observation that most objects originate close to their creator
- Piggyback object-creation messages on updates of other objects
Soft State Storage
- Objects need to tailor their publication rate to their speed
- Ammo or health packs don't move much
- Add TTLs to subscriptions and publications
- Both are stored at the rendezvous node(s); stored publications act like triggers for incoming subscriptions
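A minimal soft-state rendezvous sketch with TTLs on both publications and subscriptions; predicates stand in for the range-matching logic and the names are illustrative:

```python
import time

class RendezvousStore:
    """Publications and subscriptions are stored with a TTL; an arriving
    message is matched against the still-live entries of the other kind,
    and both sides expire on their own without explicit teardown."""

    def __init__(self):
        self.subs = []    # (expiry_time, subscriber, predicate)
        self.pubs = []    # (expiry_time, publication)

    def _prune(self):
        now = time.time()
        self.subs = [s for s in self.subs if s[0] > now]
        self.pubs = [p for p in self.pubs if p[0] > now]

    def subscribe(self, subscriber, predicate, ttl):
        self._prune()
        self.subs.append((time.time() + ttl, subscriber, predicate))
        # a new subscription also sees still-live publications
        return [pub for _, pub in self.pubs if predicate(pub)]

    def publish(self, publication, ttl):
        self._prune()
        self.pubs.append((time.time() + ttl, publication))
        return [sub for _, sub, pred in self.subs if pred(publication)]

store = RendezvousStore()
store.subscribe("player1", lambda pos: pos[0] < 10, ttl=5)
print(store.publish((3, 4, 0), ttl=2))   # ['player1']
```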
Experimental Setup
- Emulab-based evaluation
- Synthetic game
- Workload based on Quake III traces
- P2P scenario
- 1 player per server
- Unlimited bandwidth
- Modeled end-to-end latencies
- More results, including a Quake II evaluation, in the paper
Evaluation
[Graph: per-node bandwidth scaling]
View Inconsistency
Discussion
- Bandwidth costs scale well with the number of nodes
- Compared to the single-server model, more feasible for P2P deployment
- However, overall bandwidth costs are 4-5x higher, so there is overhead
- View inconsistency is small and gets repaired quickly
Discussion Questions
- Avenues for cheating
- Nodes can modify objects in local storage
- Nodes can withhold publications
- Nodes can subscribe to regions of the world they should not see
- How scalable is the architecture?
- Feasibility in the real world
OverCite: A Distributed, Cooperative CiteSeer
- Jeremy Stribling, Jinyang Li, Isaac G. Councill, M. Frans Kaashoek, and Robert Morris
Contents
- Introduction
- Characteristics of CiteSeer
- Problems and Possible Solutions
- Structure of OverCite
- Experimental Evaluation
Introduction
- What is CiteSeer?
- Online Repository of Papers
- Crawls, Indexes, Links, Ranks Papers
- Periodically Updates Its Index with Newly Discovered Documents
- Stores Several Metadata Tables to
- Identify Documents
- Filter Out Duplicates
- OverCite
- CiteSeer Like System
- Provides
- Scalable and Load-Balanced Storage
- Automatic Data Management
- Efficient Query Processing
Characteristics of CiteSeer - Problems
- 35 GB Network Traffic Per Day
- 1 TB of Disk Storage
- Significant Human Maintenance
- Coordinating Crawling Activities Across All Sites
- Reducing Inter-Site Communication
- Parallelizing Storage to Minimize Per-Site Burden
- Tolerating Network and Site Failures
- Adding New Resources Difficult
Possible Solutions
- Mentioned Solutions
- Donate Resources
- Run your Own Mirrors
- Partitioning Network
- Use Content Distribution Networks
Structure of OverCite
- 3-Tier, DHT-Backed Design
- Web-based Front End
- Application Server
- DHT Back-End
- Multi-Site Deployment of CiteSeer
- Indexed Keyword Search
- Parallelized Similarly to Cluster-Based Search Engines
Features of Search and Crawl
- Crawling
- Coordinate via DHT
- Searching
- Divide Docs into Partitions, hosts into Groups
- Less Search Work per Host
OverCite - DHT Storage and Partitioning
- Stores Papers for Durability
- Metadata Tables, e.g.
- Document ID → Title, etc.
- Partitioning
- By Document
- Divide the Index into k Partitions
- Each Query
- Sent to k Nodes (One per Partition)
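A toy sketch of partition-by-document with a k-way query fan-out; two partitions and dictionary "inverted indexes" stand in for the real index, which merges ranked results:

```python
K = 2   # number of index partitions

def partition_of(doc_id: int) -> int:
    """Documents are divided among the k index partitions (here: by doc id)."""
    return doc_id % K

def search(query: str, partition_indexes: list) -> list:
    """Fan the query out to k nodes, one per partition, and merge the partial
    results; each host only searches its own slice of the document set."""
    results = []
    for index in partition_indexes:          # one lookup per partition
        results.extend(index.get(query, []))
    return sorted(set(results))

# index four toy documents (ids 0..3) under the term "dht":
parts = [dict() for _ in range(K)]
for doc in range(4):
    parts[partition_of(doc)].setdefault("dht", []).append(doc)
print(search("dht", parts))    # [0, 1, 2, 3]
```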
OverCite - Implementation and Deployment
- Storage: Chord/DHash DHT
- Index/Search Engine
- Web Server: OKWS
- Deployment
- 27 Nodes Across North America
- 9 RON/IRIS Nodes and Private Machines
- 47 Physical Disks, 3 DHash Nodes per Disk
Evaluation and Results
- Clients
- 1 at MIT
- 1,000 Queries from a CiteSeer Trace
- 11,000 Lines of C++ Code
- 9 Web Front-End, 18 Index, and 27 DHT Servers
[Table: system-wide storage overhead]
Thank You