Title: Stochastic Models of User-Contributory Web Sites
1Stochastic Models of User-Contributory Web Sites
- Tad Hogg
- HP Labs
- Kristina Lerman
- USC Information Sciences Institute
2The Social Web
Bugzilla
essembly
delicious
wisdom of crowds
3Activities
- View existing content
- Rate existing content
- simple: vote
- complex: write a review
- Add new content
- Link to other users
focus of this presentation
4Aggregate group behavior
- Determines structure and usefulness of user-participatory sites
- Models enable
- Predicting trends or behaviors
- E.g., which newly contributed content will become popular
- Designing web sites
- E.g., productive information displays
- Altering user incentives
- E.g., improve content quality or participation
5Stochastic Modeling: summary
- Start with individual user behavior
- Specify states and transitions between states
- Determine collective behavior
- Aggregate behavior of interest
- Individual user behaviors create transitions among aggregate states
- Rate equations give dynamics
- How average collective behavior changes in time
- How collective behavior depends on user characteristics
6Illustration: Stochastic Model of Digg
- Phenomenology of Digg
- Users submit and vote on news stories
- Digg promotes popular stories to front page
- Digg allows social networking
- Users can designate Friends
- and view their friends' activity on Digg
- Directed social network
- Friends of user A are everyone A is watching
- Fans of A are all users who are watching A
[Figure: directed link between Alice and Bob, with labels "Alice's friend" and "Bob's fan"]
7Lifecycle of a story
- User submits a story to the Upcoming Stories queue
- Others vote on (digg) the story
- If story accumulates enough votes in a short time, it is promoted to the Front page
- The Friends Interface lets users see
- Stories friends submitted
- Stories friends voted on, ...
8Model of Digg voting behavior
- Stochastic model based on Digg user interface
- visibility and interestingness → votes
- Extension to prior model (Lerman 2007)
- "law of surfing" for viewing web pages (Huberman et al. 1998)
- instead of geometric distribution
- incremental average growth in number of voters' fans
- i.e., people who can see story via friends interface
- Related work: aggregate phenomenological models
- behavior for Digg, Wikipedia, YouTube, ...
- e.g., Wu & Huberman 2007; Crane & Sornette 2008; Wilkinson 2008
9Voting on stories
- combination of
- visibility: does user see the story?
- user interface
- browse
- recommended by friends
- search
- interest: does user like the story?
- novelty, ...
10Story location
- Digg shows stories as lists
- most recent first
- 15 stories per page
- user must click to view subsequent pages
- visibility decreases with distance from top of list
- A given story
- moves down the list as new stories added
- eventually moves to later pages
- switches from upcoming to top of front page if promoted
11User behavioral model
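[Figure: state diagram of the user behavioral model. Labels in the diagram: upcoming_1 to upcoming_q, front_1 to front_p, friends, vote, Ø, and the symbols r, c, n, wS]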
12Dynamical model of aggregate behavior
- How number of votes Nvote(t) for a story changes
- nf: rate users find story on the front page queue
- nu: rate users find story on the upcoming stories queue
- nfriends: rate users find story through the friends interface
- r: fraction of users who see the story that choose to vote for it
visibility
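The equation on this slide did not survive extraction. A form consistent with the definitions listed above, with the three discovery rates summed as the story's visibility, would be the following (a reconstruction from the bullet definitions, not a quotation from the slide):

```latex
\frac{dN_{\mathrm{vote}}(t)}{dt} = r \left( n_f(t) + n_u(t) + n_{\mathrm{friends}}(t) \right)
```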
13Estimating model parameters
- Need model parameters for
- Story visibility
- Story interestingness
- Estimate from behavior of sample of users
14Digg data set
- Stories from front and upcoming pages
- number of votes vs. time since submission
- for several days in May 2006
- prior to availability of Digg API
- sampled more extensively from front than upcoming pages
- Number of fans for active users
- 2152 stories with at least 4 observations
- submitted by 1212 distinct users
- 510 of these stories promoted to front page
15Story visibility
- User viewing behavior not available
- which stories users look at
- how they find stories
- front page, friends interface, ...
- Estimate indirectly from models and data
16Modeling story visibility
- Story location
- Navigating web sites
- Number of fans
17Story location vs. time in each list
- For upcoming and front page lists
- location on page (1 to 15), which page (1st, 2nd, ...)
- distance from top of list increases linearly with time
- Rate story position increases
- front page 0.2 pages/hr
- upcoming 4 pages/hr
- 1/15th the rates new stories are
- promoted to front page (3/hr)
- submitted as new stories (60/hr)
- since each page holds 15 stories
- Averages over hourly variation
- Szabo & Huberman 2008
examples
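The arithmetic behind the two rates can be checked with a short sketch. It assumes a constant arrival rate of new stories (the slide notes that the figures average over hourly variation); the function names are illustrative and not taken from the presentation.

```python
STORIES_PER_PAGE = 15  # from the slide: each list page holds 15 stories


def pages_per_hour(new_stories_per_hour):
    """Rate at which a story is pushed down the list, in pages per hour."""
    return new_stories_per_hour / STORIES_PER_PAGE


def location(hours_in_list, new_stories_per_hour):
    """Return (page, position on page), both 1-based.

    Assumes new stories arrive at a constant rate and each one pushes
    the story down by exactly one slot.
    """
    stories_above = int(hours_in_list * new_stories_per_hour)
    page = stories_above // STORIES_PER_PAGE + 1
    position = stories_above % STORIES_PER_PAGE + 1
    return page, position


front_rate = pages_per_hour(3)      # 3 promotions/hr on the front page
upcoming_rate = pages_per_hour(60)  # 60 submissions/hr in the upcoming list
```

Under these assumptions a story submitted half an hour ago has 30 newer stories above it in the upcoming list, so `location(0.5, 60)` places it at the top of the third page.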
18Story location: promotion to front page
- Digg promotion decision algorithm not public
- based on popularity expressed by user votes
- Approximation from data
- story promoted if
- at least 40 votes within 24 hours of submission
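The approximate promotion rule above can be written as a small function. This is only a sketch of the data-derived approximation stated on the slide, not Digg's actual (non-public) algorithm; the function name and the input format (vote times in hours since submission) are assumptions.

```python
PROMOTION_VOTES = 40           # threshold from the slide
PROMOTION_WINDOW_HOURS = 24.0  # window from the slide


def promotion_time(vote_times_hours):
    """Return the time (hours since submission) at which the story
    reaches the vote threshold, or None if it never does so within
    the window."""
    in_window = sorted(
        t for t in vote_times_hours if 0 <= t <= PROMOTION_WINDOW_HOURS
    )
    if len(in_window) >= PROMOTION_VOTES:
        # The 40th vote inside the window triggers promotion.
        return in_window[PROMOTION_VOTES - 1]
    return None
```

A story receiving one vote every six minutes would, under this rule, be promoted at the 40th vote, four hours after submission.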
19Modeling story visibility
- Story location
- Navigating web sites
- Number of fans
20Navigating through a web site
- Empirical model of user following links on a Web site
- "law of surfing" (Huberman et al. 1998)
- Inverse Gaussian distribution of pages viewed before leaving web site
few users go beyond 1st page
parameters estimated from Digg data and model
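To make the inverse Gaussian assumption concrete, the sketch below computes the fraction of users who reach a given list page. Treating "reaches page m" as "views more than m - 1 pages" is an assumption made here for illustration, and the parameter values in the example are placeholders, not the values estimated from the Digg data.

```python
import math


def std_normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_gaussian_cdf(x, mu, lam):
    """CDF of the inverse Gaussian distribution with mean mu and shape lam."""
    if x <= 0:
        return 0.0
    a = math.sqrt(lam / x)
    return (std_normal_cdf(a * (x / mu - 1.0))
            + math.exp(2.0 * lam / mu) * std_normal_cdf(-a * (x / mu + 1.0)))


def fraction_reaching_page(m, mu, lam):
    """Fraction of users who view page m (1-based).

    Assumption: a user reaches page m if the number of pages they are
    willing to view exceeds m - 1, so every visitor sees page 1.
    """
    return 1.0 - inverse_gaussian_cdf(m - 1, mu, lam)


# Placeholder parameters, chosen only to illustrate the steep decline.
MU, LAM = 0.6, 0.6
fractions = [fraction_reaching_page(m, MU, LAM) for m in range(1, 6)]
```

With parameters of this size the fraction drops sharply after the first page, which is the qualitative behavior the slide describes.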
21Modeling story visibility
- Story location
- Navigating web sites
- Number of fans: visibility via friends interface
22Story visibility via friends interface
- Each voter enables their fans to see story
- via friends interface
- Model of number of fans not yet viewing story, s(t)
- based on number of votes on the story
- story visible to submitter's fans at submission time, s(0)
fans of prior voters visit Digg
new fans from new votes
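The two labels above ("fans of prior voters visit Digg" and "new fans from new votes") suggest a loss term and a gain term for s(t). The sketch below integrates one such pair of rate equations with a simple Euler step. The specific form, the symbols `w` (rate at which fans visit Digg) and `a` (new fans added per vote), and the constant `browse_rate` are assumptions made for illustration, not values or equations quoted from the slides.

```python
def simulate(hours, r, w, a, s0, browse_rate, dt=0.01):
    """Euler integration of an assumed pair of rate equations:

        dN/dt = r * (browse_rate + w * s)   # votes from browsing and from fans
        ds/dt = -w * s + a * dN/dt          # fans visit; new votes add fans

    Returns (votes, fans_not_yet_viewing) at the end of the interval.
    """
    votes, s = 0.0, float(s0)
    steps = int(round(hours / dt))
    for _ in range(steps):
        dvotes = r * (browse_rate + w * s) * dt
        s += -w * s * dt + a * dvotes
        votes += dvotes
    return votes, s


# Hypothetical numbers: 100 fans of the submitter, no browsing traffic.
votes_no_growth, s_left = simulate(5.0, r=0.5, w=1.0, a=0.0, s0=100, browse_rate=0.0)
votes_with_growth, _ = simulate(5.0, r=0.5, w=1.0, a=2.0, s0=100, browse_rate=0.0)
```

With `a = 0` the pool of fans simply drains and the vote count approaches `r * s0`; a positive `a` lets each vote expose the story to additional fans, so the story collects more votes over the same period.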
23Story interestingness
- Reasons users vote for story not available, e.g.,
- topic
- novelty (Wu & Huberman 2007)
- popularity (determining interest, not just visibility)
- e.g., cool fashion or gadgets
- ...
- One approach: web-based experiments
- e.g., Salganik et al. 2006
- Estimate from models and data
- from vote history after accounting for visibility
24Model results
25Solutions: votes vs. time
model vs. observations for 6 stories
- model captures qualitative features
- slow growth initially
- influence of fans on promotion
- rapid growth if story promoted (much more visible to users)
26Model requirements for promotion
- Values of S and r to get the story on front page
27Promotion to front page: model prediction vs. data
95% accurate
promotion threshold from model
logarithmic scale
most stories not promoted, and from people with no fans
28Additional model insights
- Heterogeneity
- user activity
- content quality (interestingness)
- Predictability from early reactions to new story
29Story interestingness
- Long-tail distribution (lognormal)
- a few stories much more interesting than average
- after accounting for visibility via user interface part of model
- Open question: why?
- A multiplicative process underlying user interests?
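The link between a multiplicative process and a lognormal distribution can be illustrated with a toy simulation: a product of many independent positive factors has a logarithm that is a sum of independent terms, so it is approximately normal. The sketch below is a generic illustration with arbitrary factor ranges; it does not claim that this is the mechanism behind story interestingness, which the slide leaves as an open question.

```python
import math
import random
import statistics


def multiplicative_sample(rng, n_factors=50, low=0.5, high=1.5):
    """Product of n_factors independent uniform random factors."""
    value = 1.0
    for _ in range(n_factors):
        value *= rng.uniform(low, high)
    return value


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [multiplicative_sample(rng) for _ in range(5000)]
logs = [math.log(x) for x in samples]

# Long right tail: the mean sits well above the median ...
mean_x, median_x = statistics.mean(samples), statistics.median(samples)
# ... while the logarithms are roughly symmetric.
mean_log, median_log = statistics.mean(logs), statistics.median(logs)
```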
30Predictions from early behavior
- Estimate story interestingness
- from full history, or
- using initial votes
- Behavior predictable from early reaction to story
- also with YouTube
- e.g., Crane & Sornette 2008; Lerman & Galstyan 2008; Szabo & Huberman 2008
example: use first 4 observations
r estimates correlate 0.9 with those based on full history
prediction of final votes accounts for 75% of variance
rms prediction error: 244 votes
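For readers who want to reproduce this kind of comparison on their own data, the three summary statistics quoted in the example can be computed as follows. The slides do not say exactly how "variance accounted for" was defined; the sketch assumes the usual 1 - SSE/SST form, and the numbers in the arrays are made-up toy values, not data from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def rms_error(predicted, actual):
    """Root-mean-square prediction error."""
    return math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    )


def variance_explained(predicted, actual):
    """Fraction of variance accounted for, assumed here to be 1 - SSE/SST."""
    mean_a = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for p, a in zip(predicted, actual))
    sst = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - sse / sst


# Made-up toy numbers, for illustration only.
final_votes = [120.0, 45.0, 900.0, 300.0]
predicted_votes = [100.0, 60.0, 800.0, 350.0]
toy_r2 = variance_explained(predicted_votes, final_votes)
toy_rms = rms_error(predicted_votes, final_votes)
```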
31Model based on votes only?
- Estimate based on initial votes only
- not including visibility model
- i.e., ignore effects of law of surfing and social network
32Model based on votes only?
full model is better than not including visibility (differences significant, p-value < 10^-4)
33Future work on models of activities: new content and links
- View existing content
- Rate existing content
- Add new content
- What motivates high-quality contribution?
- Link to other users
- How do users choose who to link to?
- What does link signify?
- common interests?
- trust in recommendations?
focus of this presentation
34Conclusion
- Stochastic process approach
- connect user and system behaviors
- Applicability
- users have limited information and actions
- limited use of personalized history
- e.g., user communities on the web
- not face-to-face small group interactions
- Example: news aggregator Digg
- votes from visibility and interestingness
- user model from info and actions provided by Digg UI