Title: Analyzing Browse Patterns of Mobile Clients
1Analyzing Browse Patterns of Mobile Clients
- Lili Qiu
- Joint work with Atul Adya and Victor Bahl
- {adya, bahl, liliq}@microsoft.com
- Microsoft Research
- ACM SIGCOMM Measurement Workshop
- San Francisco, CA, November 2001
2Outline
- Overview
- Related work
- Analysis of a popular mobile Web site
  - Document popularity analysis
  - User behavior analysis
  - System load analysis
  - Content analysis
- Summary and implications
3Motivation
- Phenomenal growth in the cellular industry and handheld devices
- Crucial to understand the performance of the wireless Web
- Limited understanding of how wireless Web services are being used
4Related Work
- Workload of clients at wireline networks
  - Server-based studies
    - ABC96, AW96, MS97, AJ99, PQ00
  - Proxy-based studies
    - BCF99, DMF97, GB97, VDA99, WVS99
  - Client-based studies
    - CBC95 and BBB98
- Workload of wireless clients
  - KBZ2000
    - Only 80K requests over seven months
5Overview
- A popular mobile Web site
  - Content
    - news, weather, stock quotes, email, yellow pages, travel reservations, entertainment, etc.
  - Period studied
    - August 15, 2000 to August 26, 2000
    - 33 million accesses in 12 days
- Types of analysis
  - This paper is part of a larger analysis study
    - Analysis of browse patterns
    - Analysis of notification logs
    - Correlation between how browsing and notification services are being used
6Overview Types of Analysis
- Document popularity analysis
- User behavior analysis
- System load analysis
- Content analysis
7Overview User Categories
- Cellular users
  - Browse the Web in real time on cellular phones
- Offline users
  - Download content onto their PDAs for later (offline) browsing, e.g. AvantGo
- Desktop users
  - Sign up for services and specify preferences
- Many more users now

User Type   Users     Requests
Cellular    58,432    2,210,758
Offline     50,968    20,508,272
Desktop     639,971   7,342,206
Misc.       1,634     2,944,708
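The request shares quoted later in the talk (over 60% from offline users, under 7% from wireless users) can be checked directly from this table. The sketch below only redoes that arithmetic; the dictionary layout and the function names are illustrative choices, not something taken from the study.

```python
# Users and requests per category, copied from the table above.
traffic = {
    "Cellular": (58_432, 2_210_758),
    "Offline": (50_968, 20_508_272),
    "Desktop": (639_971, 7_342_206),
    "Misc.": (1_634, 2_944_708),
}

total_requests = sum(requests for _, requests in traffic.values())


def request_share(category):
    """Fraction of all logged requests issued by one user category."""
    return traffic[category][1] / total_requests


def requests_per_user(category):
    """Average number of requests per user in one category."""
    users, requests = traffic[category]
    return requests / users


for name in traffic:
    print(f"{name:<9} {request_share(name):6.1%} of requests, "
          f"{requests_per_user(name):9.1f} requests per user")
```

The four request counts add up to about 33 million, which agrees with the total given in the overview.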
8Document Popularity
- Previous Web research has found that Web accesses follow a Zipf-like distribution (i.e. request frequency $\propto 1/i^{\alpha}$, where $i$ is the popularity rank of the document)
- Two definitions of document
  - URL
  - <URL, parameter> (i.e. query)
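One common way to test for a Zipf-like distribution is to fit a straight line to log(request count) against log(rank): the negated slope estimates the exponent, and a poor fit suggests the data is not Zipf-like. The sketch below assumes per-document request counts have already been tallied, under either definition of document. It is a generic least-squares fit, not necessarily the procedure used in the study, and `fit_zipf_alpha` is a hypothetical name.

```python
import math


def fit_zipf_alpha(request_counts):
    """Estimate alpha in count ~ C / rank**alpha by least squares on
    log(rank) vs. log(count). Returns (alpha, r_squared)."""
    counts = sorted((c for c in request_counts if c > 0), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two documents with requests")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    # r_squared near 1 means the log-log points lie close to a line.
    r_squared = 1.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return -slope, r_squared


# Synthetic counts that follow 1/rank exactly, so alpha should be close to 1.
alpha, r2 = fit_zipf_alpha([1000.0 / rank for rank in range(1, 51)])
```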
9Document Popularity (Cont.)
Document popularity does not closely follow a Zipf-like distribution.
10Document Popularity (Cont.)
- The majority of requests are concentrated on a small number of documents
- 0.1% to 0.5% of URL and parameter combinations (i.e. 112 to 442) account for 90% of requests
Only a very small amount of memory is needed to cache popular query results.
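The concentration figure above corresponds to a simple calculation: sort documents by request count and find how many of the most popular ones are needed to reach a target share of all requests. A minimal sketch, assuming a list of per-document request counts is available; `documents_needed` is a hypothetical helper and the sample counts are made up.

```python
def documents_needed(request_counts, coverage=0.90):
    """Smallest number of most-popular documents whose requests add up
    to at least `coverage` of all requests."""
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    counts = sorted(request_counts, reverse=True)
    total = sum(counts)
    if total == 0:
        return 0
    target = coverage * total
    running = 0
    for k, count in enumerate(counts, start=1):
        running += count
        if running >= target:
            return k
    return len(counts)


# Made-up counts: two documents already cover 95 of 100 requests.
needed = documents_needed([80, 15, 3, 2], coverage=0.90)
```

The returned number is the count of entries a cache would have to hold to serve that share of requests, which is why the memory footprint stays small when a few hundred queries dominate.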
11User Behavior Analysis
- Understand how long a wireless user stays on the channel as he/she browses the Web
- Determine user sessions
  - Intuition: if a session is idle for a sufficiently long time, we say it has ended.
  - Heuristic to determine the session inactivity period
12User Behavior Analysis (Cont.)
- Determine the session inactivity period (s)
  - Too small s => too many sessions
  - Too large s => too few sessions
  - An appropriate value is at the knee point
  - The knee point is between 30 and 45 seconds
- 95% of users
  - Have session time less than 3 minutes
  - Initiated fewer than 35 sessions during the 12 days
We can reclaim IP addresses more quickly than the 90 seconds used previously in KBZ2000.
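The idea of choosing s at the knee can be illustrated by counting sessions for several candidate values of s and looking for the point where the count stops dropping quickly. The sketch below assumes request timestamps in seconds grouped per user. The function names and the sample log are hypothetical, and the exact heuristic used in the study is not reproduced here.

```python
def count_sessions(timestamps, inactivity_s):
    """Number of sessions in one user's request timestamps (seconds),
    where a gap longer than `inactivity_s` starts a new session."""
    ordered = sorted(timestamps)
    if not ordered:
        return 0
    sessions = 1
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > inactivity_s:
            sessions += 1
    return sessions


def sessions_by_threshold(per_user_timestamps, thresholds):
    """Total session count across all users for each candidate s."""
    return {
        s: sum(count_sessions(ts, s) for ts in per_user_timestamps.values())
        for s in thresholds
    }


# Made-up log: user id -> request times in seconds.
log = {"u1": [0, 10, 20, 100, 105, 400], "u2": [5, 50]}
totals = sessions_by_threshold(log, [5, 30, 60, 300])
```

Plotting `totals` against s gives the curve whose knee is read off as the inactivity period: small values of s split one visit into many sessions, while large values merge separate visits.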
13System Load Analysis
- Understand how to optimize the Web server for better performance
- Small replies
  - 98% of replies to wireless users are < 3 KB
  - 99% of replies to offline users are < 6.3 KB
- Diurnal pattern and weekday vs. weekend variation
- Over 60% of browsing requests are from offline PDA users, and less than 7% are from wireless users.
1) Highly optimize sending small replies.
2) Identify what type of user issued the request, and prioritize the request according to the user type.
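Prioritizing by user type could be sketched as a priority queue in front of the request handlers. The talk does not say which user type should be served first or how a request's user type is detected. The ordering below (real-time wireless requests ahead of offline sync traffic) and all names are assumptions made for illustration.

```python
import heapq
import itertools

# Hypothetical priority order; a lower value is served first.
PRIORITY = {"wireless": 0, "desktop": 1, "offline": 2}


class RequestQueue:
    """Serves requests by user-type priority, FIFO within one type."""

    def __init__(self):
        self._heap = []
        # Arrival counter breaks ties so equal priorities stay in FIFO order.
        self._arrival = itertools.count()

    def push(self, user_type, request):
        # Unknown user types are placed behind all known ones.
        priority = PRIORITY.get(user_type, max(PRIORITY.values()) + 1)
        heapq.heappush(self._heap, (priority, next(self._arrival), request))

    def pop(self):
        _, _, request = heapq.heappop(self._heap)
        return request

    def __len__(self):
        return len(self._heap)


queue = RequestQueue()
queue.push("offline", "sync-1")
queue.push("wireless", "quote")
queue.push("desktop", "signup")
queue.push("wireless", "news")
served = [queue.pop() for _ in range(len(queue))]
```

A strict priority order like this can starve low-priority traffic under sustained load, so a real server would likely need aging or per-type quotas on top of it.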
14Content Analysis
Important to content providers: what content is interesting to users

User type   Rank 1         Rank 2   Rank 3
Wireless    Stock quotes   News     Yellow pages
Offline     Help           News     Stock quotes
Desktop     Sign-ups       Email    Sports

Top three preferences for different kinds of users
15Summary of Results and Implications
- Fact: 0.1% to 0.5% of queries (i.e. 121-442) account for 90% of requests.
  Implication: Caching the results of popular queries can be very effective.
- Fact: A large fraction of requests come from automated sync programs.
  Implication: System designers should prioritize requests according to user type.
16Summary of Results and Implications
- Fact: Most of the replies are short (< 3 KB for wireless users and < 6 KB for offline users).
  Implication: Wireless Web servers should highly optimize sending short replies.
- Fact: The session inactivity period is between 30 and 45 seconds.
  Implication: We may reclaim IP addresses more quickly than the 90 seconds used previously.