Title: The Spread of Media Content through the Blogosphere
1 The Spread of Media Content through the Blogosphere
Flash Floods and Ripples
Meeyoung Cha
Juan A. Navarro
Hamed Haddadi
Max Planck Institute for Software Systems (MPI-SWS)
TU Berlin Deutsche Telekom Lab
ICWSM Data Challenge 2009
2 Motivation
- Blogs play a significant role in today's Internet culture
- Blogs are used for information propagation purposes
  - Discuss political issues
  - Review new products and online content
  - Form communities and special interest groups
- Increasingly, media content is shared through blogs
How does content spread in blogs? What kinds of content are shared?
3 Our goal
- Characterize how the structure of the blogosphere influences the patterns of content spreading
- 1. Understand the structure of the blogosphere
  - Is the structure ideal for content dissemination?
- 2. Understand the spreading patterns of content
  - What types of content spread?
  - How quickly does content spread?
4 Part 1. Measurement methodology
Part 2. Analysis of network properties
Part 3. Analysis of spreading patterns
5 Spinn3r dataset
- Extracted post URL, site, host, language, timestamps, etc.
- Step 1: Focus on top 15 blog domains
- Step 2: Scrape content to find embedded HTML links
- Code available at http://www.mpi-sws.org/jnavarro/tools/
- Limitations
  - Comments and blogrolls missing
  - Some blogs only post summaries
  - Only used dataset with numbered tiers
6 Step 1: Top 15 blog sites
[Table: top 15 blog sites, with a Total row]
7 Step 2: Extracting HTML links
- Links to media content
- Links to other blogs
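The link-extraction step can be illustrated with a minimal sketch. This is not the authors' released tool; it assumes posts are available as HTML strings, and the `MEDIA_HOSTS` set, the `LinkExtractor` class, and the `classify_links` helper are hypothetical names introduced here for illustration only.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical set of media hosts; the slides do not list the ones actually used.
MEDIA_HOSTS = {"youtube.com", "www.youtube.com"}


class LinkExtractor(HTMLParser):
    """Collect href targets of <a> tags and src targets of <embed>/<iframe> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Pick the attribute that carries the URL for this tag, if any.
        attr = {"a": "href", "embed": "src", "iframe": "src"}.get(tag)
        if attr is None:
            return
        for name, value in attrs:
            if name == attr and value:
                self.links.append(value)


def classify_links(html):
    """Split the links found in one post into media links and other links."""
    parser = LinkExtractor()
    parser.feed(html)
    media, other = [], []
    for url in parser.links:
        host = urlparse(url).netloc.lower()
        (media if host in MEDIA_HOSTS else other).append(url)
    return media, other


# Toy post, not taken from the Spinn3r dataset.
post = (
    '<p>Watch <a href="http://www.youtube.com/watch?v=abc">this</a> '
    'via <a href="http://example.blogspot.com/2008/09/post.html">a friend</a></p>'
)
media_links, blog_links = classify_links(post)
```

In a real pipeline the "other" links would still need to be matched against known blog hosts before being counted as blog-to-blog edges.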
8 Part 1. Measurement methodology
Part 2. Analysis of network properties
Part 3. Analysis of spreading patterns
9 Network of blogs
Directed network of 85,013 nodes and 129,079 edges
[Figure: example blogs A and B in the network]
10 Network structure
- Average node degree 1.5
- Power-law degree distribution
- 6% of links are reciprocal
- 35% of links cross blog domains
- 7% of links cross language boundaries
73% of blogs in the largest connected component
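As a rough illustration of two of these measures, the sketch below computes link reciprocity and the share of nodes in the largest connected component on a toy edge list. It assumes the component is a weakly connected one (direction ignored), which the slides do not state; the function names and the toy data are hypothetical.

```python
from collections import defaultdict

# Toy directed edge list (source blog links to target blog); not the real data.
edges = [("a", "b"), ("b", "a"), ("b", "c"), ("d", "e"), ("c", "a")]


def reciprocity(edge_list):
    """Fraction of directed links whose reverse link also exists."""
    edge_set = set(edge_list)
    mutual = sum(1 for (u, v) in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)


def largest_component_fraction(edge_list):
    """Share of nodes in the largest weakly connected component."""
    neighbours = defaultdict(set)
    for u, v in edge_list:
        neighbours[u].add(v)
        neighbours[v].add(u)  # ignore direction for weak connectivity
    seen, best = set(), 0
    for start in neighbours:
        if start in seen:
            continue
        # Depth-first traversal of the component containing `start`.
        stack, size = [start], 0
        seen.add(start)
        while stack:
            node = stack.pop()
            size += 1
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        best = max(best, size)
    return best / len(neighbours)


r = reciprocity(edges)
lcc = largest_component_fraction(edges)
```

On the toy graph, two of the five links are reciprocated and three of the five blogs fall in the largest component.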
11 Network structure 2
- Density: ratio of observed links out of all possible links
Network structure is sparser than that of social networks
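The density definition can be checked against the reported network size with a short calculation. This sketch assumes a directed graph without self-loops, so the number of possible links is n(n - 1), and it reads "average node degree" as links per node; both are assumptions, since the slides do not spell out the exact conventions.

```python
def average_degree(num_nodes, num_edges):
    """Links per node in a directed graph."""
    return num_edges / num_nodes


def density(num_nodes, num_edges):
    """Observed links divided by all possible directed links (no self-loops)."""
    return num_edges / (num_nodes * (num_nodes - 1))


# Network size reported on the "Network of blogs" slide.
nodes = 85_013
edges = 129_079

avg_degree = average_degree(nodes, edges)
net_density = density(nodes, edges)
```

Under these assumptions the links-per-node ratio rounds to the 1.5 quoted in the slides, and the density is on the order of 10^-5.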
12 Insights for information propagation
- Sparse structure and power-law degree distribution
  - Clear preference of bloggers for particular topics or sources
  - Trend setters (high in-degree) and recommenders (high out-degree)
- Potential factors that can limit spreading
  - Blog domains had no visible effect on linking
  - Language barriers inhibit the flow of information
13 Part 1. Measurement methodology
Part 2. Analysis of network properties
Part 3. Analysis of spreading patterns
14 Spreading of media content
- What types of content are shared?
- How quickly does information spread?
15 Types of content shared
16 Popularity of YouTube videos
- Video popularity follows a power-law distribution
- Very large diffusion processes exist
- Preferential attachment may drive linking
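One common way to quantify a power-law claim like this is a maximum-likelihood estimate of the exponent. The sketch below uses the standard continuous-data estimator, alpha = 1 + n / sum(ln(x_i / x_min)), on synthetic samples; it is not the fitting procedure used by the authors, and the function names, `x_min`, and the synthetic data are all assumptions made for illustration.

```python
import math
import random


def powerlaw_alpha_mle(values, x_min):
    """Continuous maximum-likelihood estimate of the power-law exponent."""
    tail = [v for v in values if v >= x_min]
    log_sum = sum(math.log(v / x_min) for v in tail)
    return 1.0 + len(tail) / log_sum


def sample_powerlaw(alpha, x_min, n, seed=0):
    """Draw n synthetic samples from a power law via inverse-transform sampling."""
    rng = random.Random(seed)
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


# Synthetic "links per video" counts with a known exponent of 2.5.
samples = sample_powerlaw(alpha=2.5, x_min=1.0, n=20_000)
alpha_hat = powerlaw_alpha_mle(samples, x_min=1.0)
```

For real link counts, which are integers, this continuous estimator is only an approximation, and the choice of `x_min` affects the result.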
17 Popular video categories
- We downloaded metadata of top 10,000 videos
Music most popular
Still spread!
Keen on politics
18 Time lag in the spread of videos
Flash floods
Ripples
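The time-lag idea can be sketched as the delay between a video's upload and each blog post that links to it. The example below labels a video by its median lag; the 7-day cut-off, the function names, and the timestamps are hypothetical and are not taken from the slides, which do not give a numeric boundary between flash floods and ripples.

```python
from datetime import datetime
from statistics import median

FLASH_FLOOD_MAX_DAYS = 7  # hypothetical cut-off, chosen only for illustration


def link_lags_in_days(upload_time, link_times):
    """Days elapsed between the video upload and each blog link."""
    return [(t - upload_time).total_seconds() / 86400 for t in link_times]


def label_pattern(upload_time, link_times, threshold_days=FLASH_FLOOD_MAX_DAYS):
    """Label a video by the median lag of the blog posts linking to it."""
    lags = link_lags_in_days(upload_time, link_times)
    return "flash flood" if median(lags) <= threshold_days else "ripple"


# Toy timestamps: one video linked within days, one linked years after upload.
news_label = label_pattern(
    datetime(2008, 9, 1),
    [datetime(2008, 9, 1), datetime(2008, 9, 2), datetime(2008, 9, 3)],
)
music_label = label_pattern(
    datetime(2006, 5, 1),
    [datetime(2008, 8, 10), datetime(2008, 9, 1), datetime(2008, 9, 20)],
)
```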
19 Example spreading pattern
Blogs linking the same video are connected
Diffusion through the blogosphere
- McCain's political campaign: linked by 79 blogs
20 Insights from spreading patterns
- Videos in different genres spread with very different patterns
  - Flash floods: found quickly and spread rapidly
  - Ripples: took longer to spread, re-discovered years after upload
- Diffusion through links in the blogosphere
  - 24% of videos had any spreading in the blog graph
  - Other spreading factors: featuring and search
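One plausible reading of "spreading in the blog graph" is that a blog posts a video after a blog it links to has already posted it. The sketch below counts the fraction of videos with at least one such pair. This interpretation, the edge direction (later blog links to earlier blog), the function names, and the toy data are all assumptions, not the authors' stated definition.

```python
def has_graph_spreading(posts, blog_links):
    """Check whether any later poster links to a blog that posted the video earlier.

    posts: list of (timestamp, blog) pairs for a single video.
    blog_links: set of (source_blog, target_blog) directed links.
    """
    earlier = set()
    for _, blog in sorted(posts):
        if any((blog, prev) in blog_links for prev in earlier):
            return True
        earlier.add(blog)
    return False


def spreading_fraction(videos, blog_links):
    """Fraction of videos with at least one link-consistent pair of posts."""
    hits = sum(1 for posts in videos.values() if has_graph_spreading(posts, blog_links))
    return hits / len(videos)


# Toy data: blog "b" links to blog "a"; blogs "c" and "d" are unconnected.
blog_links = {("b", "a")}
videos = {
    "v1": [(1, "a"), (2, "b")],  # "b" posts after "a", and "b" links to "a"
    "v2": [(1, "c"), (5, "d")],  # no link between the two posters
}
fraction = spreading_fraction(videos, blog_links)
```

Videos that fail this check may still have spread through other channels, such as the featuring and search factors listed above.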
21 Part 1. Measurement methodology
Part 2. Analysis of network properties
Part 3. Analysis of spreading patterns
22 Conclusion
- Identified spreading patterns and factors that limit spreading
- Blogs serve as a medium to filter and spread media content
- Potential implication: recommendation systems can take into account and exploit different spreading patterns
- Future work: spreading patterns of other types of content