Title: FeedEx: Collaborative Exchange of News Feeds
1. FeedEx: Collaborative Exchange of News Feeds
- Seung Jun, Mustaque Ahamad
- Georgia Institute of Technology
- WWW 2006
2. Outline
- One line comment
- Motivation/Problem
- Approach
- Analysis of feed publishing
- Challenges
- Experiments
- Critique
3. One line comment
- Disseminate web feeds in a distributed (P2P) manner to increase the scalability of web servers
[Figure: traditional method vs. P2P method of delivering an RSS feed to clients A and B]
- RSS reveals visitors to content providers
- RSS decoupled the fetch operation from reading
4. Motivation / Problem
- RSS/Atom feeds have become increasingly popular
- Published by most traditional media and blogs
- Feeding mechanism
[Figure: feeding mechanism. The publisher (nyt.com) updates http://nyt.com/../feed.xml as contents are added; the RSS reader polls the server (HTTP request / HTTP response) to check for updates]
5. Approach
- The approach
- P2P overlay with a gossip-based protocol
- P2P: resources grow scalably with service demand
- Gossip: scalable, robust to joins and leaves
- Feature of this overlay
- Does not have to guarantee delivery or delay
- Challenges
- Content searching
- Data dissemination
- Free riding prevention
- Fetching interval determination
- Overlay construction
6. Analysis of Feed Publishing
- Methodology
- 245 popular feeds monitored for 10 days
- Most-popular-feed information taken from Gmail's web clips and Bloglines
- Feeds fetched every 2 minutes
- Measured
- Publishing rate
- Entry count in a feed
- Entry lifetime
7. Publishing Rate by Rank
- Great difference between publishers
- Partly follows a Zipf distribution
8. Entry Count
- Does a high publishing rate mean a higher entry count? No
- Entry lifetimes are short → entries can be lost with infrequent requests
9. Publishing Rate by Time
- 4 types of publishing patterns
10. Challenges: Overlay Construction (1/2)
- Goal: minimize network management overhead
- Join
- Contact a well-known host
- Or contact previous neighbors
- Share subscription set info
- Update subscription set info to the network
- Leave
- Soft-state
- Update the subscription set periodically
[Figure: a gateway and nodes, each with a neighbor list and a subscription set table of (dest, hop) rows: CNN 0 / YAHOO 0, HANI 1 / CNN 1]
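The (dest, hop) tables and the soft-state rule can be sketched roughly as follows. This is a minimal illustration, not the FeedEx implementation: the class name `SubscriptionTable`, the `ttl` and `max_hops` parameters, and the merge policy are all assumptions made for the example.

```python
class SubscriptionTable:
    """Hypothetical per-node table mapping feed name -> hop count."""

    def __init__(self, own_feeds, ttl=300.0, max_hops=2):
        self.own = set(own_feeds)   # feeds this node subscribes to (hop 0)
        self.learned = {}           # feed -> (hop, expires_at)
        self.ttl = ttl              # assumed soft-state lifetime in seconds
        self.max_hops = max_hops    # assumed limit on how far info travels

    def merge_from_neighbor(self, neighbor_table, now):
        """Learn a neighbor's feeds at one hop more than it reports."""
        for feed, hop in neighbor_table.items():
            new_hop = hop + 1
            if feed in self.own or new_hop > self.max_hops:
                continue
            current = self.learned.get(feed)
            # Keep the shortest known distance; an equal one refreshes the timer.
            if current is None or new_hop <= current[0]:
                self.learned[feed] = (new_hop, now + self.ttl)

    def snapshot(self, now):
        """Drop expired rows (soft state) and return the current table."""
        self.learned = {f: (h, exp) for f, (h, exp) in self.learned.items()
                        if exp > now}
        table = {feed: 0 for feed in self.own}
        table.update({f: h for f, (h, _) in self.learned.items()})
        return table
```

With these assumptions, a node that subscribes to YAHOO and hears from a neighbor subscribing to HANI ends up with the table YAHOO 0, HANI 1, and the HANI row disappears if the neighbor stops refreshing it.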
11. Challenges: Overlay Construction (2/2)
- Neighbor selection
- Many neighbors may incur overhead
- Need to adapt to my resource status
- Select neighbors that are useful to me
- Those whose subscription sets are similar to mine
[Figure: subscription sets (dest hop) of two nodes. A: HANI 0, CNN 0, YAHOO 0, DAUM 0. B: NCLAB 0, CNN 0, HANI 1, DAUM 2. Overlap: 1 direct, 1 one-hop, 1 two-hop]
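The overlap count in the figure (1 direct, 1 one-hop, 1 two-hop) can be reproduced with a short sketch. The function names, the scoring weight, and the idea of keeping the top `k` candidates are assumptions for illustration; the slides only say that neighbors with similar subscription sets are preferred.

```python
from collections import Counter


def overlap_by_hop(my_feeds, neighbor_table):
    """Count my feeds that the neighbor knows, grouped by the neighbor's hop."""
    return Counter(neighbor_table[f] for f in my_feeds if f in neighbor_table)


def similarity_score(my_feeds, neighbor_table, weight=2.0):
    """Assumed scoring: closer matches count more (weight ** -hop each)."""
    overlap = overlap_by_hop(my_feeds, neighbor_table)
    return sum(count * weight ** -hop for hop, count in overlap.items())


def pick_neighbors(my_feeds, candidates, k):
    """Keep the k most similar candidates; k stands in for my resource limit."""
    ranked = sorted(candidates,
                    key=lambda name: similarity_score(my_feeds, candidates[name]),
                    reverse=True)
    return ranked[:k]


a_feeds = ["HANI", "CNN", "YAHOO", "DAUM"]
b_table = {"NCLAB": 0, "CNN": 0, "HANI": 1, "DAUM": 2}
print(overlap_by_hop(a_feeds, b_table))
```

Here node A's four feeds are matched against node B's table, giving one match at each of hop 0, 1, and 2.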
12. Challenges: Fetching interval determination
- Adaptive fetching
- Problem: few hints about the publishing rate or entry lifetime
- Frequent polling overloads servers and consumes clients' network bandwidth
- Lazy polling increases delay or misses entries
- Adaptive algorithm
- Intuition: frequent fetching → few new entries
- Freshness rate: fraction of new entries in the fetched document
- If freshness rate < target freshness → halve the fetching rate
- If freshness rate > target freshness → double the fetching rate
[Figure: fetching the entries in a feed (HANI: Report 1, Report 2, Report 3)]
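The halve/double rule above can be sketched in a few lines. The target freshness of 0.5 and the interval bounds are placeholder values chosen for the example, not figures from the slides; halving the fetching rate is expressed as doubling the interval between fetches.

```python
def freshness_rate(fetched_ids, seen_ids):
    """Fraction of entries in the fetched document that were not seen before."""
    if not fetched_ids:
        return 0.0
    new = [entry for entry in fetched_ids if entry not in seen_ids]
    return len(new) / len(fetched_ids)


def next_interval(interval, freshness, target=0.5,
                  min_interval=120.0, max_interval=3600.0):
    """Return the next fetching interval in seconds.

    target, min_interval and max_interval are assumed values.
    """
    if freshness < target:
        interval *= 2   # too few new entries: halve the fetching rate
    elif freshness > target:
        interval /= 2   # many new entries: double the fetching rate
    return max(min_interval, min(max_interval, interval))
```

For example, a fetch that returns four entries of which one is new has a freshness rate of 0.25, so with a target of 0.5 a 600-second interval would grow to 1200 seconds.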
13. Challenges: Data dissemination
- Goal: minimize bandwidth consumption
- Limit the boundary of delivery
- Forward only to matching neighbors (subscription set, hop_count)
- → Reduces forwarding overhead
- Reduce the unit of delivery
- Unit of delivery: entry bundle
- A set of new entries (old entries are filtered out)
- → Reduces redundant content delivery
- Check before forwarding
- Exchange the ID of an entry bundle (ID: SHA-1 digest of the bundle)
- If it is an undelivered bundle → deliver it
[Figure: forwarding a fetched HANI bundle among nodes whose subscription sets list HANI 0, HANI 1, and HANI 2; max subset hops: 1]
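The three ideas on this slide (delivery boundary, entry bundles, check before forwarding) can be combined in a small sketch. The names `check_did` and `put_entries` are the procedure names mentioned in the communication cost results; their signatures here, the `Neighbor` class, and the way a bundle is serialized before hashing are assumptions for illustration only.

```python
import hashlib


def bundle_id(entries):
    """SHA-1 digest of an entry bundle (the serialization is an assumption)."""
    digest = hashlib.sha1()
    for entry in entries:
        data = entry.encode("utf-8")
        # Length-prefix each entry so ["ab", "c"] and ["a", "bc"] differ.
        digest.update(len(data).to_bytes(4, "big"))
        digest.update(data)
    return digest.hexdigest()


class Neighbor:
    """In-memory stand-in for a remote peer (hypothetical)."""

    def __init__(self, subscriptions):
        self.subscriptions = subscriptions  # feed name -> hop count
        self.delivered = set()              # bundle IDs already received
        self.inbox = []

    def check_did(self, did):
        """Return True if this bundle ID was already delivered here."""
        return did in self.delivered

    def put_entries(self, feed, did, entries):
        self.delivered.add(did)
        self.inbox.append((feed, list(entries)))


def forward(feed, new_entries, neighbors, max_hops=1):
    """Send a bundle of new entries to matching neighbors; return send count."""
    did = bundle_id(new_entries)
    sent = 0
    for neighbor in neighbors:
        hop = neighbor.subscriptions.get(feed)
        if hop is None or hop > max_hops:
            continue  # outside the delivery boundary
        if neighbor.check_did(did):
            continue  # this neighbor already has the bundle
        neighbor.put_entries(feed, did, new_entries)
        sent += 1
    return sent
```

The point of the ID exchange is that the cheap `check_did` call happens first, so the larger `put_entries` payload is sent only when the bundle is actually missing.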
14. Challenges: Free riding prevention
- Nodes may exhibit selfish behavior
- Only receive, without forwarding
- Lie about their subscription set to become a preferred neighbor
- Solution: provide a neighbor evaluation method
- Contribution metric
- Credits nodes that forward feeds that I subscribe to and that my near neighbors subscribe to
- Levels of contribution: direct subscription, 1-hop subscription, 2-hop subscription, ...
- cm_{i,j} = w_f^{-h_f}
- Cut off unhelpful neighbors: I helped it, but it did not help me
- d_{i,j} = cm_{i,j} - cm_{j,i}
- Feature
- Uses local information only
- → Easy to implement and to enforce the mechanism
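One possible reading of the neighbor evaluation can be sketched as a local ledger. Everything specific here is an assumption: the class name, the weight of 2, crediting `weight ** -hop` per forwarded bundle, reading cm_{i,j} as what node i contributed to neighbor j, and the cut-off threshold. The sketch only shows that the deficit can be tracked with information a node already has.

```python
from collections import defaultdict


class ContributionLedger:
    """Hypothetical per-node bookkeeping of give and take with each neighbor."""

    def __init__(self, weight=2.0):
        self.weight = weight
        self.given = defaultdict(float)     # what I contributed to a neighbor
        self.received = defaultdict(float)  # what a neighbor contributed to me

    def record_given(self, neighbor, hop):
        """I forwarded a bundle the neighbor wants at the given hop distance."""
        self.given[neighbor] += self.weight ** -hop

    def record_received(self, neighbor, hop):
        """The neighbor forwarded a bundle I want at the given hop distance."""
        self.received[neighbor] += self.weight ** -hop

    def deficit(self, neighbor):
        """Positive when I helped the neighbor more than it helped me."""
        return self.given[neighbor] - self.received[neighbor]

    def unhelpful(self, threshold):
        """Neighbors whose deficit exceeds an assumed cut-off threshold."""
        names = set(self.given) | set(self.received)
        return sorted(n for n in names if self.deficit(n) > threshold)
```

Direct subscriptions (hop 0) earn full credit and more distant ones earn less, which mirrors the levels of contribution listed above.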
15. Challenges: Entry searching
- Overlay as a distributed storage
- Iterative searching
- Strong points: searching latency, query traffic
- Recursive searching (flooding)
- Strong points: low overhead for the requester, caching of popular queries, results reflected in neighbor evaluation
16. Benefits of FeedEx
- Scalability
- Archivability
- Storage of entries
- Controllability
- Compared to web-based readers, e.g. the fetch interval
- Filtering and recommendation
- Share opinions on entries (e.g. voting)
- Feed recommendation
- Privacy
- Users can fetch documents for others
- → Anonymizes actual users
17. Architecture of FeedEx
- Prototype: Python
- Networking: Twisted
- Protocol: XML-RPC
- Interoperability, fast prototyping
- Entry storage: SQLite (lightweight RDB)
- RSS parser: feedparser.org
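As a rough idea of what SQLite-backed entry storage might look like, here is a sketch using Python's standard `sqlite3` module. The table name, columns, and helper functions are hypothetical; the slides do not describe the prototype's schema.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a hypothetical entry store."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries ("
        " feed TEXT NOT NULL,"
        " entry_id TEXT NOT NULL,"
        " title TEXT,"
        " fetched_at REAL,"
        " PRIMARY KEY (feed, entry_id))"
    )
    return conn


def store_entries(conn, feed, entries, fetched_at):
    """Insert (entry_id, title) pairs, skipping ones already stored.

    Returns the number of rows actually added.
    """
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO entries VALUES (?, ?, ?, ?)",
        [(feed, entry_id, title, fetched_at) for entry_id, title in entries],
    )
    conn.commit()
    return conn.total_changes - before
```

Keying rows on (feed, entry_id) lets the store double as the filter for old entries: anything already present is ignored on insert.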
18. Experimental Setup
- Two modes
- Stand-alone mode → SLN
- FeedEx mode → XCH
- Metrics
- Time lag
- Missing entries
- Communication cost
- Experiments
- Used 189 PlanetLab nodes
- Ran for 22 hours on a weekday
- Primary factor: 6 fetching intervals
- Each node subscribes to 20 out of 70 feeds
19. Results: Time Lag
- Average time lag
- Average of node averages
- Without applying the adaptive fetching algorithm
- → Regardless of the fetching interval, contents are delivered soon
[Figure: average time lag; annotation: 15.8 times]
20. Results: Missing Entries
- Rate of missing entries
- # of entries in a node / # of entries in a reference node
- Low missing rate
- Despite problems (DNS errors or routing errors) in the network
- Sometimes better than the reference node
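The two metrics described so far (an average of node averages for time lag, and an entry count relative to a reference node) can be computed as in the sketch below. The function names and input shapes are assumptions; the sample numbers in the usage note are made up for illustration and are not results from the experiments.

```python
from statistics import mean


def average_time_lag(lags_by_node):
    """Average of per-node average time lags (nodes without samples skipped)."""
    return mean(mean(lags) for lags in lags_by_node.values() if lags)


def entry_ratio(node_entries, reference_entries):
    """Entries collected by a node relative to the reference node."""
    return len(set(node_entries)) / len(set(reference_entries))
```

For instance, `average_time_lag({"n1": [10, 20], "n2": [60]})` averages 15 and 60 rather than the three raw samples, so a node with many samples does not dominate. `entry_ratio` can exceed 1 when a node collects entries the reference node missed, which is how a node can do better than the reference.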
21. Results: Communication Cost
- Two most frequently called procedures: check_did, put_entries
- check_did call: a single IP packet
- put_entries: 2 calls / minute → delivers 2.67 entries / call
- Low communication cost
22. Critique
- Strong points
- Made a new problem from an old domain: web caching
- Free from delay / failure of nodes
- Draws out possible benefits/extensions
- Simple!
- Practically deployable
- Tried to find a mechanism that is good for both servers and clients
23. Critique
- Weak points
- Overload due to RSS feed delivery?
- Only a small text file is delivered
- Should have considered podcasting (multimedia RSS)
- Will the clients donate their resources?
- Is short delay a strong incentive?
- Is low bandwidth consumption a strong incentive?
- Will people's subscription sets really overlap a lot?
- Not effective for SPs providing diverse RSS feeds
- e.g. Naver blog, egloos..
- Is it really robust to frequent leaves and joins?
- Lack of server-side evaluation
- Server load, network resources
- Delivering critical data (e.g. timely news) using RSS?
24. Supplementary slides
25. Entry Lifetime
- Generally CNN,
- Publishers have policies (probably)
26. New idea
- Topic-based feed pub/sub system
- Why should we register the address of a feed?
- Need to find addresses providing the contents I want
- A feed may contain contents that I don't want
27. New idea
- Topic-based feeding services have already launched
- Baebo
- Creates new feeds by keyword from the Amazon, Yahoo, and eBay feeds
- Say4
- Extracts entries containing sentences in the Bible from the BBC feed
- But a centralized server runs the service
- Limitation on the number of input feeds
- Hard to add input feeds dynamically compared to the P2P approach