Stochastic Models of User-Contributory Web Sites


1
Stochastic Models of User-Contributory Web Sites
  • Tad Hogg
  • HP Labs
  • Kristina Lerman
  • USC Information Sciences Institute

2
The Social Web
Bugzilla
essembly
delicious
wisdom of crowds
3
Activities
  • View existing content
  • Rate existing content
  • simple: vote
  • complex: write a review
  • Add new content
  • Link to other users

focus of this presentation
4
Aggregate group behavior
  • Determines structure and usefulness of
    user-participatory sites
  • Models enable
  • Predicting trends or behaviors
  • E.g., which newly contributed content will become
    popular
  • Designing web sites
  • E.g., productive information displays
  • Altering user incentives
  • E.g., improve content quality or participation

5
Stochastic Modeling: summary
  • Start with individual user behavior
  • Specify states and transitions between states
  • Determine collective behavior
  • Aggregate behavior of interest
  • Individual user behaviors create transitions
    among aggregate states
  • Rate equations give dynamics
  • How average collective behavior changes in time
  • How collective behavior depends on user
    characteristics

6
Illustration: Stochastic Model of Digg
  • Phenomenology of Digg
  • Users submit and vote on news stories
  • Digg promotes popular stories to front page
  • Digg allows social networking
  • Users can designate Friends
  • and view their friends' activity on Digg
  • Directed social network
  • Friends of user A are everyone A is watching
  • Fans of A are all users who are watching A

[Figure: directed social network with nodes Alice and Bob, labeled "Alice's friend" and "Bob's fan"]
7
Lifecycle of a story
  • User submits a story to the Upcoming Stories
    queue
  • Others vote on (digg) the story
  • If story accumulates enough votes in short time,
    it is promoted to the Front page
  • The Friends Interface lets users see
  • Stories friends submitted
  • Stories friends voted on, ...

8
Model of Digg voting behavior
  • Stochastic model based on Digg user interface
  • visibility and interestingness → votes
  • Extension to prior model (Lerman 2007)
  • "law of surfing" for viewing web pages (Huberman
    et al., 1998)
  • instead of geometric distribution
  • incremental average growth in number of voters'
    fans
  • i.e., people who can see story via friends
    interface
  • Related work: aggregate phenomenological models
  • behavior for Digg, Wikipedia, YouTube, ...
  • e.g., Wu & Huberman 2007; Crane & Sornette 2008;
    Wilkinson 2008

9
Voting on stories
  • combination of
  • visibility: does the user see the story?
  • user interface
  • browse
  • recommended by friends
  • search
  • interest: does the user like the story?
  • novelty, ...

10
Story location
  • Digg shows stories as lists
  • most recent first
  • 15 stories per page
  • user must click to view subsequent pages
  • visibility decreases with distance from top of
    list
  • A given story
  • moves down the list as new stories added
  • eventually moves to later pages
  • switches from upcoming to top of front page if
    promoted

11
User behavioral model
[Figure: state diagram of user behavior. States: upcoming1 ... upcomingq, front1 ... frontp, friends, vote, and Ø. Transition labels: r, c, n, wS.]
12
Dynamical model of aggregate behavior
  • How the number of votes Nvote(t) for a story changes
    over time
  • nf: rate at which users find the story on the front
    page queue
  • nu: rate at which users find the story on the
    upcoming stories queue
  • nfriends: rate at which users find the story through
    the friends interface
  • r: fraction of users who see the story and choose to
    vote for it

visibility (slide label for the rates nf, nu, and nfriends)
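
The quantities defined on this slide suggest a rate equation of the following form. This is a reconstruction from the listed definitions, not text recovered from the slide; the subscripted symbols correspond to nf, nu, and nfriends above.

```latex
\frac{dN_{\text{vote}}(t)}{dt} \;=\; r \,\bigl( n_f + n_u + n_{\text{friends}} \bigr)
```

Read this way, the sum in parentheses is the rate at which users see the story, and r converts views into votes.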
13
Estimating model parameters
  • Need model parameters for
  • Story visibility
  • Story interestingness
  • Estimate from behavior of sample of users

14
Digg data set
  • Stories from front and upcoming pages
  • number of votes vs. time since submission
  • for several days in May 2006
  • prior to availability of Digg API
  • sampled more extensively from front than upcoming
    pages
  • Number of fans for active users
  • 2152 stories with at least 4 observations
  • submitted by 1212 distinct users
  • 510 of these stories promoted to front page

15
Story visibility
  • User viewing behavior not available
  • which stories users look at
  • how they find stories
  • front page, friends interface, ...
  • Estimate indirectly from model and data

16
Modeling story visibility
  • Story location
  • Navigating web sites
  • Number of fans

17
Story location vs. time in each list
  • For upcoming and front page lists
  • location on page (1 to 15), which page (1st, 2nd,
    ...)
  • distance from top of list increases linearly with
    time
  • Rate at which story position increases:
  • front page: 0.2 pages/hr
  • upcoming: 4 pages/hr
  • i.e., 1/15th of the rates at which new stories are
  • promoted to front page (3/hr)
  • submitted as new stories (60/hr)
  • since each page holds 15 stories
  • Averages over hourly variation
  • Szabo & Huberman 2008

[Figure: examples]
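
The arithmetic on this slide can be checked with a short sketch. It assumes a story enters at the top of its list and that new stories arrive at a constant average rate (ignoring the hourly variation the slide mentions). The function names are illustrative and not from the presentation.

```python
STORIES_PER_PAGE = 15  # from the slide: each page holds 15 stories


def page_drift_rate(new_stories_per_hour: float) -> float:
    """Pages per hour that a story moves down a list."""
    return new_stories_per_hour / STORIES_PER_PAGE


def page_at(hours_in_list: float, new_stories_per_hour: float) -> int:
    """1-based page number of a story after a given time in the list.

    Assumes the story starts at the top and every newer story
    pushes it down by exactly one position.
    """
    stories_above = hours_in_list * new_stories_per_hour
    return int(stories_above // STORIES_PER_PAGE) + 1


front_rate = page_drift_rate(3)      # promotions to the front page per hour
upcoming_rate = page_drift_rate(60)  # new submissions per hour
print(front_rate, upcoming_rate)
```

With the slide's rates of 3 and 60 new stories per hour, the two drift rates come out to the quoted 0.2 and 4 pages/hr.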
18
Story location: promotion to front page
  • Digg promotion decision algorithm not public
  • based on popularity expressed by user votes
  • Approximation from data
  • story promoted if
  • at least 40 votes within 24 hours of submission
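
The approximate promotion rule above can be written as a small predicate. This is only a sketch of the slide's approximation, not Digg's actual (non-public) algorithm; the function name and the input format (vote times in hours since submission) are assumptions.

```python
def is_promoted(vote_times_hours, min_votes=40, window_hours=24.0):
    """Approximate promotion rule from the slide.

    vote_times_hours: time of each vote, in hours since submission.
    Returns True if at least `min_votes` votes fall within
    `window_hours` of submission.
    """
    early_votes = sum(1 for t in vote_times_hours if 0 <= t <= window_hours)
    return early_votes >= min_votes


# Hypothetical stories: one vote every 30 minutes vs. one vote per hour.
fast_story = [0.5 * i for i in range(40)]
slow_story = [1.0 * i for i in range(40)]
print(is_promoted(fast_story), is_promoted(slow_story))
```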

19
Modeling story visibility
  • Story location
  • Navigating web sites
  • Number of fans

20
Navigating through a web site
  • Empirical model of user following links on a Web
    site
  • "law of surfing" (Huberman et al. 1998)
  • Inverse Gaussian distribution of pages viewed
    before leaving web site

few users go beyond 1st page
parameters estimated from Digg data and model
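
As an illustration of how an inverse Gaussian distribution yields the fraction of users who browse past a given page, here is a minimal sketch using the standard inverse Gaussian CDF. The parameter values are placeholders and are not the estimates from the Digg data. Treating page counts as a continuous variable is a simplifying assumption.

```python
import math


def _std_normal_cdf(z: float) -> float:
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_gaussian_cdf(x: float, mu: float, lam: float) -> float:
    """CDF of the inverse Gaussian distribution with mean mu and shape lam."""
    if x <= 0:
        return 0.0
    a = math.sqrt(lam / x)
    return (_std_normal_cdf(a * (x / mu - 1.0))
            + math.exp(2.0 * lam / mu) * _std_normal_cdf(-a * (x / mu + 1.0)))


def fraction_viewing_beyond(pages: float, mu: float, lam: float) -> float:
    """Fraction of users who view more than `pages` pages before leaving."""
    return 1.0 - inverse_gaussian_cdf(pages, mu, lam)


# Placeholder parameters, chosen only for illustration.
MU, LAM = 1.0, 1.0
for p in (1, 2, 5):
    print(p, fraction_viewing_beyond(p, MU, LAM))
```

The survival fraction falls with page number, which is the qualitative point of the slide: stories on later pages are seen by fewer users.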
21
Modeling story visibility
  • Story location
  • Navigating web sites
  • Number of fans: visibility via friends interface

22
Story visibility via friends interface
  • Each voter enables their fans to see story
  • via friends interface
  • Model of number of fans not yet viewing story,
    s(t)
  • based on number of votes on the story
  • story visible to submitter's fans at submission
    time: s(0)

Contributions to the change in s(t):
  • fans of prior voters visit Digg
  • new fans from new votes
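
One way to read the two contributions named on this slide is as a pair of coupled rate equations: s(t) shrinks as waiting fans visit Digg and grows with each new vote, while votes grow in proportion to how many users see the story. The sketch below integrates such a system with a simple Euler step. The exact functional form, the parameter names (w for the rate at which fans visit, a for the average number of new fans added per vote, browse_rate for views from the story lists) and all numeric values are assumptions made for illustration, not the presenters' fitted model.

```python
def simulate_votes(r, submitter_fans, w, a, browse_rate, hours, dt=0.01):
    """Euler integration of an assumed two-variable model.

    Assumed equations (illustrative only):
        dN/dt = r * (browse_rate + w * s)   # votes from views
        ds/dt = -w * s + a * dN/dt          # fans who have not yet seen the story
    Returns (votes, s) after `hours` hours.
    """
    votes = 0.0                  # starting count of zero is an assumption
    s = float(submitter_fans)    # the slide's s(0): the submitter's fans
    for _ in range(int(round(hours / dt))):
        new_votes = r * (browse_rate + w * s) * dt
        s = max(s - w * s * dt + a * new_votes, 0.0)
        votes += new_votes
    return votes, s


# Hypothetical stories that differ only in the submitter's number of fans.
print(simulate_votes(r=0.1, submitter_fans=0, w=0.1, a=5, browse_rate=10, hours=24))
print(simulate_votes(r=0.1, submitter_fans=200, w=0.1, a=5, browse_rate=10, hours=24))
```

In this form a submitter with many fans gets more early votes at the same interestingness r, which matches the qualitative influence of fans on promotion described later in the deck.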
23
Story interestingness
  • Reasons users vote for story not available, e.g.,
  • topic
  • novelty (Wu & Huberman 2007)
  • popularity (determining interest, not just
    visibility)
  • e.g., cool fashion or gadgets
  • One approach: web-based experiments
  • e.g., Salganik et al. 2006
  • Estimate from model and data
  • from vote history after accounting for visibility

24
Model results
25
Solutions: votes vs. time
model vs. observations for 6 stories
  • model captures qualitative features
  • slow growth initially
  • influence of fans on promotion
  • rapid growth if story promoted (much more
    visible to users)

26
Model requirements for promotion
  • Values of S and r to get the story on front page

27
Promotion to front page: model prediction vs. data
95% accurate
promotion threshold from model
logarithmic scale
most stories not promoted, and from people with no fans
28
Additional model insights
  • Heterogeneity
  • user activity
  • content quality (interestingness)
  • Predictability from early reactions to new story

29
Story interestingness
  • Long-tail distribution (lognormal)
  • a few stories much more interesting than average
  • after accounting for visibility via user
    interface part of model
  • Open question: why?
  • A multiplicative process underlying user
    interests?
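
The slide leaves the cause open. As background for the multiplicative-process suggestion, the sketch below illustrates only the general statistical fact that a product of many independent positive factors is approximately lognormal, because its logarithm is a sum of independent terms. The factor distribution and sample sizes are arbitrary choices, and nothing here is fitted to Digg data.

```python
import math
import random
import statistics


def multiplicative_samples(n_samples, n_factors, seed=0):
    """Draw samples of a product of independent positive random factors."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        x = 1.0
        for _ in range(n_factors):
            x *= rng.uniform(0.5, 1.5)  # arbitrary positive factor
        samples.append(x)
    return samples


samples = multiplicative_samples(n_samples=2000, n_factors=50)
logs = [math.log(x) for x in samples]

# A long right tail shows up as mean well above median on the raw scale,
# while the logs are roughly symmetric (mean close to median).
print(statistics.mean(samples), statistics.median(samples))
print(statistics.mean(logs), statistics.median(logs))
```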

30
Predictions from early behavior
  • Estimate story interestingness
  • from full history, or
  • using initial votes
  • Behavior predictable from early reaction to story
  • also with YouTube
  • e.g., Crane & Sornette 2008; Lerman & Galstyan
    2008; Szabo & Huberman 2008

Example: using the first 4 observations, r estimates
correlate 0.9 with those based on the full history;
predictions of final votes account for 75% of the
variance (rms prediction error: 244 votes)
31
Model based on votes only?
  • Estimate based on initial votes only
  • not including visibility model
  • i.e., ignore effects of law of surfing and
    social network

32
Model based on votes only?
full model is better than not including
visibility (differences significant, p-value
< 10^-4)
33
Future work on models of activities: new content,
links
  • View existing content
  • Rate existing content
  • Add new content
  • What motivates high-quality contribution?
  • Link to other users
  • How do users choose whom to link to?
  • What does link signify?
  • common interests?
  • trust in recommendations?

focus of this presentation
34
Conclusion
  • Stochastic process approach
  • connect user and system behaviors
  • Applicability
  • users have limited information and actions
  • limited use of personalized history
  • e.g., user communities on the web
  • not face-to-face small group interactions
  • Example: news aggregator Digg
  • votes from visibility and interestingness
  • user model from info and actions provided by Digg
    UI