Title: Designing for Discovery: Dealing with the Live Web
1Designing for Discovery: Dealing with the Live Web
- David L. Sifry
- CEO, Technorati Inc.
- dave_at_technorati.com
2Weblogs Cumulative: March 2003 - April 2006
- 34.5 million weblogs tracked
- Doubling in size approx. every 6 months
- Consistent doubling over the last 42 months
- The blogosphere is over 60 times larger than it was 3 years ago
[Chart: cumulative weblogs tracked, with each doubling marked]
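The growth figures on this slide can be checked with a little arithmetic. This is a minimal sketch, assuming an exact 6-month doubling period (the slide only says "approx."); the function name is illustrative and not from the talk.

```python
def growth_factor(months: float, doubling_period_months: float = 6.0) -> float:
    """Multiplicative growth after `months`, assuming the size doubles
    once every `doubling_period_months` (an idealized, exact period)."""
    return 2.0 ** (months / doubling_period_months)


# 36 months is six doublings; 42 months is seven doublings.
three_year_factor = growth_factor(36)
full_period_factor = growth_factor(42)
print(three_year_factor, full_period_factor)
```

Six doublings in three years gives a factor of 64, which is consistent with the "over 60 times larger" claim.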
3New Blogs per Day
- As of April 2006, over 75,000 blogs were created daily.
- A new weblog is created about every second.
- 55% of new bloggers are still posting 3 months later.
- 11% of all blogs update weekly (or more).
- About 9% of new blogs are spam (red spikes).
- 60% of pings are from known spam sources.
- Technorati blocks these spam pings before they even become splogs.
[Chart: new blogs created per day, 1/1/04 to 4/1/06; y-axis 0 to 160,000; spam blogs shown as red spikes]
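Blocking pings from known spam sources before they are indexed could look roughly like the sketch below. It is not Technorati's implementation: the blocklist contents, the function names, and the idea of matching on the ping URL's hostname are all assumptions made for illustration.

```python
from urllib.parse import urlparse

# Hypothetical blocklist; a real one would be much larger and updated continuously.
KNOWN_SPAM_HOSTS = {"spam.example.com"}


def is_spam_ping(ping_url: str, blocklist=KNOWN_SPAM_HOSTS) -> bool:
    """Return True if the ping's hostname is on the blocklist."""
    host = urlparse(ping_url).hostname or ""
    return host in blocklist


def accept_pings(ping_urls, blocklist=KNOWN_SPAM_HOSTS):
    """Keep only pings whose source is not a known spam host."""
    return [url for url in ping_urls if not is_spam_ping(url, blocklist)]
```

Filtering at ping time means the spam never reaches the index, which is cheaper than removing splogs afterwards.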
4Daily Posting Volume
- 1.2 million legitimate posts/day
- About 50,000 postings per hour
[Chart: daily posting volume, with spikes labeled: Super Bowl; London Bombings; 2006 State of the Union; Intel Mac; Justice O'Connor, Live 8 Concerts; Hurricane Katrina; iPod Video, Iraqi Elections; Deep Throat Revealed; Kryptonite Lock Controversy; Newsweek Koran; Schiavo Dies; US Election Day; Super Bowl; Indian Ocean Tsunami]
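The per-hour and per-second rates quoted in the deck follow directly from the daily totals. A small sketch of the conversion, with illustrative helper names that are not from the talk:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_hour(per_day: float) -> float:
    """Average hourly rate implied by a daily total."""
    return per_day / 24


def seconds_between(per_day: float) -> float:
    """Average gap in seconds between events for a daily total."""
    return SECONDS_PER_DAY / per_day


posts_per_hour = per_hour(1_200_000)        # daily posting volume from this slide
blog_gap_seconds = seconds_between(75_000)  # new blogs per day from the previous slide
print(posts_per_hour, blog_gap_seconds)
```

At 1.2 million posts a day the average is 50,000 posts per hour, and 75,000 new blogs a day works out to one roughly every 1.15 seconds.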
5Blue: Mainstream Media; Red: Blog
7Posts by Language (slides 7-13, charts not transcribed)
15Almost one half of blog posts use tags or categories
- Over 100 million tagged posts, growing at about 560k/day
16What does this mean?
- And how does all this data change infrastructure / instrumentation / metrics?
17Understanding your users
- Where are they?
- What do they care about?
- What do they search for?
- How often do they come back?
- What are they looking at?
18Understanding your data set
- How do you deal with realtime data?
- What can you learn from data streams?
- How does your users' data affect those streams?
- How do you create or amplify network effects?
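One common way to learn from a realtime stream is to keep a count over a sliding time window, for example posts mentioning a tag in the last minute. The sketch below is a generic technique, not something described in the talk; the class name is hypothetical and it assumes timestamps arrive in non-decreasing order.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events seen within the last `window_seconds`.

    Assumes add() is called with non-decreasing timestamps.
    """

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def add(self, timestamp: float) -> None:
        """Record one event at the given time (seconds)."""
        self._timestamps.append(timestamp)

    def count(self, now: float) -> int:
        """Drop events at or before the window cutoff and return what remains."""
        cutoff = now - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps)
```

Comparing the current window count with a longer-term baseline is one simple way to notice a spike as it happens.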
19What metrics are most valuable?
- Understanding your data set
- Your users
- Implicit metadata
- Linking behavior
- Authority / Influence
- Revenue drivers
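Linking behavior can be turned into a rough authority metric by counting how many distinct sources link to each target. This is a minimal sketch under that assumption; the talk does not define a formula, and the function name and input format are hypothetical.

```python
from collections import defaultdict


def authority_scores(links):
    """Map each target to its number of distinct inbound linking sources.

    `links` is an iterable of (source, target) pairs. Self-links are
    ignored and repeated links from the same source count once.
    """
    inbound = defaultdict(set)
    for source, target in links:
        if source != target:
            inbound[target].add(source)
    return {target: len(sources) for target, sources in inbound.items()}
```

Counting distinct sources rather than raw links keeps one prolific linker from inflating a target's score.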
20How do you instrument for this?
- Understand your requirements / Questions
- Deal with distributed architectures / Failures in hardware and software
- How do you know you've failed?
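One generic answer to "how do you know you've failed?" in a distributed system is a heartbeat check: each node reports in periodically, and a node that stays silent past a timeout is treated as failed. The sketch below illustrates that idea only; the class, its methods, and the timeout are assumptions and not from the talk.

```python
import time
from typing import Dict, List, Optional


class HeartbeatMonitor:
    """Tracks the last heartbeat per node and reports nodes that went silent."""

    def __init__(self, timeout_seconds: float):
        self.timeout_seconds = timeout_seconds
        self._last_seen: Dict[str, float] = {}

    def beat(self, node: str, now: Optional[float] = None) -> None:
        """Record a heartbeat; `now` can be injected for testing."""
        self._last_seen[node] = time.monotonic() if now is None else now

    def failed_nodes(self, now: Optional[float] = None) -> List[str]:
        """Return nodes whose last heartbeat is older than the timeout."""
        now = time.monotonic() if now is None else now
        return sorted(
            node
            for node, seen in self._last_seen.items()
            if now - seen > self.timeout_seconds
        )
```

A monitor like this only catches nodes that stop reporting; a node that is up but returning bad data needs separate checks.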
21Creating Discovery Applications
- Give users a mirror to themselves
- Understand the explicit and implicit data
- There are tradeoffs to explicit data!
- What can you learn when you can add new metrics deep into the system?
- Understanding time as an independent variable
- The garden path analogy
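Treating time as an independent variable usually starts with bucketing events by time so that any metric can be plotted against it. A minimal sketch, assuming events carry Python `datetime` timestamps; the function name is illustrative.

```python
from collections import Counter
from datetime import datetime


def posts_per_hour(timestamps):
    """Count events per hour bucket, returned in chronological order."""
    buckets = Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )
    return dict(sorted(buckets.items()))
```

The same bucketing works for any other metric (links, tags, searches) once each event has a timestamp.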
22Where to go in the future?
- What are the assumptions about your data model?
- What are you NOT measuring?
- What is your design philosophy?
- Are you planning for the unknown?