FeedEx: Collaborative Exchange of News Feeds

1
FeedEx: Collaborative Exchange of News Feeds
  • Seung Jun, Mustaque Ahamad
  • Georgia Institute of Technology
  • WWW 2006

2
Outline
  • One-line comment
  • Motivation/Problem
  • Approach
  • Analysis of feed publishing
  • Challenges
  • Experiments
  • Critique

3
One-line comment
  • Disseminate web feeds in a distributed (P2P)
    manner to increase scalability of web servers

[Figure: traditional method vs. P2P method of delivering an RSS feed to clients A and B]
  • RSS reveals visitors to content providers
  • RSS decouples the fetch operation from the read
4
Motivation/Problem
  • RSS/Atom feeds have become increasingly popular
  • Published by most traditional media and blogs
  • Feeding mechanism

[Figure: an RSS reader sends an HTTP request for http://nyt.com/../feed.xml to nyt.com and receives an HTTP response]
  • Publisher: updates the page as content is added
  • RSS reader: polls the server to check for updates
5
Approach
  • The approach
  • P2P overlay + gossip-based protocol
  • P2P: resources grow scalably with service
    demand
  • Gossip: scalable and robust (to joins and leaves)
  • Feature of this overlay
  • Does not have to guarantee delivery or delay
  • Challenges: overlay construction, fetching interval
    determination, data dissemination, free-riding
    prevention, content searching
6
Analysis of Feed Publishing
  • Methodology
  • 245 popular feeds monitored for 10 days
  • Popular feeds identified from Gmail's Web Clips
    and Bloglines
  • Feeds fetched every 2 minutes
  • Measured:
  • Publishing rate
  • Entry count in a feed
  • Entry lifetime

7
Publishing Rate by Rank
  • Great difference between publishers
  • Partly follows a Zipf distribution

8
Entry Count
  • Does a high publishing rate mean a higher entry count? No
  • Entry lifetimes are short → entries can be
    lost with infrequent requests

9
Publishing Rate by Time
  • 4 types of publishing patterns

10
Challenges: Overlay Construction (1/2)
  • Goal: minimize network management overhead
  • Join
  • Contact a well-known host
  • OR contact previous neighbors
  • Share subscription set info
  • Propagate subscription set updates to the network
  • Leave
  • Soft state
  • Refresh the subscription set periodically

[Figure: a node joins through a gateway; each node keeps a neighbor list and a subscription set of (dest, hop) rows, e.g. (CNN, 0); (YAHO, 0), (HANI, 1); (CNN, 1)]
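The soft-state idea on this slide can be sketched as a table whose rows expire unless a neighbor refreshes them, so a departed node simply ages out. This is a minimal sketch, not the FeedEx implementation: the class name `SoftStateTable`, the `ttl` parameter, and passing the clock in as `now` are all assumptions made for illustration.

```python
class SoftStateTable:
    """Neighbor subscription sets that expire unless refreshed (soft state).

    Hypothetical helper for illustration; not taken from the FeedEx code.
    """

    def __init__(self, ttl):
        self.ttl = ttl    # seconds a row stays valid without a refresh (assumed)
        self.rows = {}    # neighbor -> (subscription set, expiry time)

    def refresh(self, neighbor, subscriptions, now):
        # A periodic update from a neighbor re-arms its expiry timer.
        self.rows[neighbor] = (dict(subscriptions), now + self.ttl)

    def expire(self, now):
        # Drop neighbors whose last update is too old; return their names.
        stale = [n for n, (_, expiry) in self.rows.items() if expiry <= now]
        for n in stale:
            del self.rows[n]
        return stale


table = SoftStateTable(ttl=30)
table.refresh("B", {"CNN": 1}, now=0)   # B advertises CNN at hop 1
table.expire(now=10)                    # B is still fresh, nothing removed
table.expire(now=31)                    # B never refreshed, so it is dropped
```

Passing `now` explicitly keeps the sketch deterministic; a real node would presumably read a clock and run `expire` on a timer.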
11
Challenges: Overlay Construction (2/2)
  • Neighbor selection
  • Many neighbors may incur overhead
  • Need to adapt to my own resource status
  • Select neighbors that are useful to me
  • Those whose subscription sets are similar to mine

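[Figure: node A subscribes to HANI, CNN, YAHOO, and DAUM (all hop 0); node B's subscription set is NCLAB 0, CNN 0, HANI 1, DAUM 2; from A's view, B covers 1 direct, 1 one-hop, and 1 two-hop feed]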
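The A/B example above can be reproduced with a small counting function. This is only a sketch of the similarity idea: the function name `overlap_by_hop` and the `max_hops` cutoff are assumptions, and the slides do not say how FeedEx actually scores similarity.

```python
from collections import Counter


def overlap_by_hop(my_feeds, neighbor_table, max_hops=2):
    """Count how many of my feeds a neighbor can supply, grouped by hop count.

    my_feeds: set of feed names I subscribe to directly.
    neighbor_table: the neighbor's subscription set as {feed: hop}.
    max_hops: assumed cutoff beyond which a feed is ignored.
    """
    counts = Counter()
    for feed in my_feeds:
        hop = neighbor_table.get(feed)
        if hop is not None and hop <= max_hops:
            counts[hop] += 1
    return counts


# Values taken from the slide's figure.
a_feeds = {"HANI", "CNN", "YAHOO", "DAUM"}
b_table = {"NCLAB": 0, "CNN": 0, "HANI": 1, "DAUM": 2}

counts = overlap_by_hop(a_feeds, b_table)
print(dict(sorted(counts.items())))  # {0: 1, 1: 1, 2: 1}
```

The result matches the figure's annotation: one direct, one one-hop, and one two-hop match (CNN, HANI, and DAUM respectively).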
12
Challenges: Fetching Interval Determination
  • Adaptive fetching
  • Problem: few hints about the publishing rate
    or entry lifetime
  • Frequent polling overloads servers and consumes
    the client's network bandwidth
  • Lazy polling increases delay or misses entries
  • Adaptive algorithm
  • Intuition: frequent fetching → few new entries
  • Freshness rate: fraction of new entries in the
    fetched document
  • If freshness rate < target freshness → halve the
    fetching rate
  • If freshness rate > target freshness → double the
    fetching rate

[Figure: a fetch of the HANI feed returns the entries in the feed: Report 1, Report 2, Report 3]
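The halve/double rule above can be sketched in a few lines. The slide states the rule in terms of fetching rate; the sketch below works with the fetching interval instead, so halving the rate means doubling the interval. The target of 0.5 and the interval bounds are assumed values, not figures from the slides.

```python
def freshness_rate(fetched_ids, seen_ids):
    """Fraction of entries in the fetched document that are new to us."""
    if not fetched_ids:
        return 0.0
    new = [e for e in fetched_ids if e not in seen_ids]
    return len(new) / len(fetched_ids)


def next_interval(interval, freshness, target=0.5,
                  min_interval=60.0, max_interval=3600.0):
    """Return the next fetching interval in seconds.

    target, min_interval, and max_interval are illustrative assumptions.
    """
    if freshness < target:
        interval *= 2      # too few new entries: halve the fetching rate
    elif freshness > target:
        interval /= 2      # many new entries: double the fetching rate
    # Clamp so the interval never collapses to zero or grows without bound.
    return max(min_interval, min(max_interval, interval))


seen = {"Report 1", "Report 2"}
fetched = ["Report 1", "Report 2", "Report 3"]
rate = freshness_rate(fetched, seen)      # one new entry out of three
interval = next_interval(600.0, rate)     # below target, so the interval doubles
```

Clamping is an addition of this sketch; without it, a long quiet period followed by a burst would take many fetches to recover from.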
13
Challenges: Data Dissemination
  • Goal: minimize bandwidth consumption
  • Limit the boundary of delivery
  • Forward only to matching neighbors (subscription
    set, hop_count)
  • → Reduces forwarding overhead
  • Reduce the unit of delivery
  • Unit of delivery: entry bundle
  • A set of new entries (old entries filtered out)
  • → Reduces redundant content delivery
  • Check before forwarding
  • Exchange the ID of an entry bundle (ID = SHA-1 digest
    of the bundle)
  • If the bundle has not been delivered yet → deliver it

[Figure: a node fetches the HANI feed; neighbors list HANI at hop counts 0, 1, and 2; max subset hops = 1]
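The check-before-forwarding step can be sketched with `hashlib`. The procedure names `check_did` and `put_entries` appear later in the slides as the two most frequently called procedures, but their signatures and semantics here are assumptions, as are the `Node` class and the way entries are serialized before hashing.

```python
import hashlib


def bundle_id(entries):
    """SHA-1 digest of an entry bundle, as a hex string.

    Entries are joined with a NUL separator before hashing; the real
    serialization used by FeedEx is not given in the slides.
    """
    h = hashlib.sha1()
    for entry in entries:
        h.update(entry.encode("utf-8"))
        h.update(b"\x00")
    return h.hexdigest()


class Node:
    """Hypothetical in-memory stand-in for a remote neighbor."""

    def __init__(self, name):
        self.name = name
        self.delivered = set()   # bundle IDs already received
        self.entries = []

    def check_did(self, did):
        # Cheap query: does this node already hold the bundle?
        return did in self.delivered

    def put_entries(self, did, entries):
        self.delivered.add(did)
        self.entries.extend(entries)


def forward(bundle, neighbors):
    """Send the bundle only to neighbors that have not received it yet."""
    did = bundle_id(bundle)
    sent = 0
    for n in neighbors:
        if not n.check_did(did):
            n.put_entries(did, bundle)
            sent += 1
    return sent


b = Node("B")
forward(["Report 3"], [b])   # first time: delivered
forward(["Report 3"], [b])   # second time: skipped after the ID check
```

Exchanging the 20-byte digest first means the full bundle crosses the network only when the neighbor actually needs it.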
14
Challenges: Free-Riding Prevention
  • Nodes may behave selfishly
  • Only receive, without forwarding
  • Lie about the subscription set to become a preferred
    neighbor
  • Solution: provide a neighbor evaluation method
  • Contribution metric
  • Nodes that forward feeds that I subscribe to, or
    that my near neighbors subscribe to
  • Level of contribution: direct subscription, 1-hop
    subscription, 2-hop subscription, ...
  • $cm_{i,j} = w_f^{-h_f}$
  • Cut out unhelpful neighbors: I helped it, but it
    did not help me
  • $d_{i,j} = cm_{i,j} - cm_{j,i}$
  • Feature
  • Uses local information only
  • → Easy to implement and enforce the mechanism
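One possible reading of the neighbor evaluation can be sketched as follows. Everything specific here is an assumption: that each forwarded feed contributes a weight raised to the negative hop count, that contributions are summed over feeds, the weight `w = 2.0`, the threshold, and the direction of the subtraction (what I gave minus what I received). The slides give only the metric names and the idea that contribution decays from direct to 1-hop to 2-hop subscriptions.

```python
def contribution(hops, w=2.0):
    """Assumed contribution score: each forwarded feed counts w ** -hop.

    hops: hop counts of the feeds one node forwarded to the other
          (0 = direct subscription, 1 = one-hop, 2 = two-hop, ...).
    """
    return sum(w ** -h for h in hops)


def is_unhelpful(given_hops, received_hops, threshold=1.0, w=2.0):
    """True if I contributed much more to the neighbor than it did to me.

    Uses only locally observed forwarding, matching the slide's point
    that the mechanism relies on local information.
    """
    deficit = contribution(given_hops, w) - contribution(received_hops, w)
    return deficit > threshold


# I forwarded two feeds the neighbor subscribes to directly;
# it forwarded me a single feed that only matters two hops away.
is_unhelpful(given_hops=[0, 0], received_hops=[2])      # deficit is 2 - 0.25
is_unhelpful(given_hops=[0], received_hops=[0, 1])      # neighbor gave more
```

Because both scores are computed from what a node itself sent and received, a neighbor that lies about its subscription set gains nothing unless it also forwards useful bundles.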

15
Challenges: Entry Searching
  • Overlay as a distributed storage
  • Iterative searching
  • Strong points: search latency, query traffic
  • Recursive searching (flooding)
  • Strong points: low overhead for the requester,
    caching of popular queries, can be reflected in
    neighbor evaluation
16
Benefits of FeedEx
  • Scalability
  • Archivability
  • Storage of entries
  • Controllability
  • Compared to web-based readers, e.g. control over
    the fetch interval
  • Filtering and recommendation
  • Share opinions on entries (e.g. voting)
  • Feed recommendation
  • Privacy
  • Users can fetch documents for others
  • → Anonymizes actual users

17
Architecture of FeedEx
  • Prototype: Python
  • Networking: Twisted
  • Protocol: XML-RPC
  • Interoperability, fast prototyping
  • Entry storage: SQLite (lightweight RDB)
  • RSS parser: feedparser (feedparser.org)

18
Experimental Setup
  • Two modes
  • Stand-alone mode → SLN
  • FeedEx mode → XCH
  • Metrics
  • Time lag
  • Missing entries
  • Communication cost
  • Experiments
  • Used 189 PlanetLab nodes
  • Ran for 22 hours on a weekday
  • Primary factor: 6 fetching intervals
  • Each node subscribes to 20 out of 70 feeds

19
Results: Time Lag
  • Average time lag
  • Average of node averages
  • Without applying the adaptive fetching algorithm
  • → Regardless of the fetching interval, content is
    delivered quickly

[Figure annotation: 15.8 times]
20
Results: Missing Entries
  • Rate of missing entries
  • # of entries in a node / # of entries in a
    reference node
  • Low missing rate
  • Despite problems (DNS errors or routing errors)
    in the network
  • Sometimes better than the reference node

21
Results: Communication Cost
  • Two most frequently called procedures: check_did,
    put_entries
  • check_did call: a single IP packet
  • put_entries: 2 calls/minute → delivers 2.67
    entries/call
  • Low communication cost

22
Critique
  • Strong points
  • Framed a new problem from an old domain: web
    caching
  • Free from delay/failure of nodes
  • Draws out possible benefits/extensions
  • Simple!
  • Practically deployable
  • Tried to find a mechanism both good for servers
    and clients

23
Critique
  • Weak points
  • Overload due to RSS feed delivery?
  • Only a small text file delivery
  • Should have considered podcasting (multimedia RSS)
  • Will the clients donate their resources?
  • Is short delay a strong incentive?
  • Is low bandwidth consumption a strong
    incentive?
  • Will people's subscription sets really
    overlap a lot?
  • Not effective for SPs (service providers) providing
    diverse RSS feeds
  • e.g. Naver blog, egloos, ...
  • Is it really robust to frequent leaves and joins?
  • Lack of server-side evaluation
  • Server load, network resources
  • Delivering critical data (e.g. timely news) using
    RSS?

24
Supplementary slides
25
Entry Lifetime
  • Generally CNN,
  • Publishers have policies (probably)

26
New idea
  • Topic-based feed pub/sub system
  • Why should we register the address of a feed?
  • Need to find addresses providing the content I want
  • A feed may contain content that I don't want

27
New idea
  • Topic-based feed services have already launched
  • Baebo
  • Creates new feeds by keyword from the Amazon,
    Yahoo, and eBay feeds
  • Say4
  • Extracts entries containing sentences from the Bible
    out of the BBC feed
  • But a centralized server runs the service
  • Limitation on the number of input feeds
  • Hard to add input feeds dynamically compared to
    the P2P approach