Title: Statistical and Applied Mathematical Sciences Institute
1 Statistical and Applied Mathematical Sciences
- Semi-experiment analysis of the shifting knee wavelet spectrum
- F. Hernandez Campos, N. Hohn, J. S. Marron, C. Park, H. Shen, F. D. Smith, D. Veitch
- October 4, 2009
2 Web Traffic Responses (a.k.a. Objects, Files)
[Diagram: TCP packets (TCP Pkt 1, 2, 3, 4) grouped into HTTP responses: Response 1 (HTML Page), Response 2 (GIF Image Page), Response 3 (GIF Image)]
3 HTTP Responses: Wavelet Spectrum
4 HTTP Responses: Wavelet Spectrum
- Shape appears frequently
- Explanation?
- Can generate as Poisson Cluster process
- Physical explanation of clusters?
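A minimal sketch of the Poisson cluster idea above, assuming geometric cluster sizes and exponential within-cluster gaps (the slides specify neither; all function names and parameter values here are illustrative). It simulates clustered arrivals, bins them into counts, and computes a Haar-based wavelet spectrum, taken here as log2 of the mean squared detail coefficient at each scale. Under these assumptions the spectrum should be roughly flat at fine scales and rise toward a higher level at coarse scales.

```python
import numpy as np


def poisson_cluster_times(cluster_rate, duration, mean_size, mean_gap, rng):
    """Simulate a Poisson cluster process on [0, duration).

    Cluster starts form a homogeneous Poisson process. Cluster sizes are
    geometric and within-cluster gaps exponential (illustrative assumptions).
    """
    n_clusters = rng.poisson(cluster_rate * duration)
    starts = rng.uniform(0.0, duration, n_clusters)
    sizes = rng.geometric(1.0 / mean_size, n_clusters)  # sizes are >= 1
    pieces = []
    for start, size in zip(starts, sizes):
        gaps = rng.exponential(mean_gap, size - 1)
        pieces.append(start + np.concatenate(([0.0], np.cumsum(gaps))))
    times = np.concatenate(pieces) if pieces else np.empty(0)
    return np.sort(times[times < duration])


def haar_log_spectrum(counts, n_scales):
    """Return log2 of the mean squared Haar detail coefficient per scale."""
    approx = np.asarray(counts, dtype=float)
    spectrum = []
    for _ in range(n_scales):
        approx = approx[: len(approx) // 2 * 2]  # drop a trailing odd sample
        even, odd = approx[0::2], approx[1::2]
        detail = (even - odd) / np.sqrt(2.0)
        approx = (even + odd) / np.sqrt(2.0)
        spectrum.append(np.log2(np.mean(detail ** 2)))
    return np.array(spectrum)


rng = np.random.default_rng(0)
times = poisson_cluster_times(20.0, 500.0, 10.0, 0.05, rng)
counts, _ = np.histogram(times, bins=50_000, range=(0.0, 500.0))  # 10 ms bins
print(np.round(haar_log_spectrum(counts, 10), 2))
```

For comparison, feeding independent Poisson counts into `haar_log_spectrum` should give an approximately flat spectrum, since the orthonormal Haar transform of white noise has the same variance at every scale.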
5 A Multi-level View of Web Traffic
[Diagram: documents (Document 1, Document 2, Document 3), each made up of responses such as Response 1 (html), Response 2 (gif or jpg), Response 3 (gif)]
6 HTTP Responses: Natural clustering
- Aggregate responses into documents
- Approximation for web pages
- Document start times human chosen
- Not so for responses
- Most are embedded page components
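The slides do not say how responses are aggregated into documents. As a hypothetical illustration only, the sketch below groups one client's response start times into documents whenever the gap between consecutive responses exceeds an idle threshold. The function name and the 1-second default are assumptions, not the authors' rule.

```python
def group_into_documents(response_times, idle_threshold=1.0):
    """Group response start times (seconds) into documents.

    A new document starts whenever the gap since the previous response
    exceeds idle_threshold (an assumed rule, for illustration only).
    """
    documents = []
    for t in sorted(response_times):
        if documents and t - documents[-1][-1] <= idle_threshold:
            documents[-1].append(t)  # same document: short gap
        else:
            documents.append([t])  # long gap: treat as a new document
    return documents


docs = group_into_documents([0.0, 0.2, 0.5, 5.0, 5.1, 20.0])
document_start_times = [doc[0] for doc in docs]
responses_per_document = [len(doc) for doc in docs]
```

The document start times and the number of responses per document are the two quantities the following slides examine.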
7 HTTP Documents: Wavelet Spectrum
8 HTTP Documents: Wavelet Spectrum
- Still have LRD (long-range dependence) type scaling?
- Knee comes up at coarser scales
9 HTTP Documents: SiZer map
- Not Poisson Process
- Why not?
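A SiZer map is not reproduced here. As a much simpler stand-in check of the Poisson hypothesis, the sketch below computes the index of dispersion (variance over mean) of binned counts, which should be close to 1 for a homogeneous Poisson process and larger when arrivals are clustered. The function name and the synthetic data are illustrative assumptions.

```python
import numpy as np


def index_of_dispersion(times, duration, bin_width):
    """Variance-to-mean ratio of counts in bins of width bin_width."""
    n_bins = int(duration / bin_width)
    counts, _ = np.histogram(times, bins=n_bins,
                             range=(0.0, n_bins * bin_width))
    return counts.var(ddof=1) / counts.mean()


rng = np.random.default_rng(0)

# Homogeneous Poisson arrivals: i.i.d. exponential inter-arrival times.
poisson_times = np.cumsum(rng.exponential(0.1, 100_000))
duration = poisson_times[-1]

# Crude clustering: every arrival is followed 1 ms later by a second one.
paired_times = np.concatenate((poisson_times, poisson_times + 1e-3))

d_poisson = index_of_dispersion(poisson_times, duration, 1.0)
d_paired = index_of_dispersion(paired_times, duration, 1.0)
print(round(d_poisson, 2), round(d_paired, 2))
```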
10 Heavy-Tailed Number of Responses?
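One common way to probe the heavy-tail question is the Hill estimator of the tail index. The slides do not say which diagnostic was used, so this is a swapped-in technique shown on synthetic Pareto data; the function name and the choice of `k` are assumptions.

```python
import numpy as np


def hill_tail_index(sample, k):
    """Hill estimate of the tail index from the k largest observations."""
    x = np.sort(np.asarray(sample, dtype=float))[::-1]  # descending order
    if not 0 < k < len(x):
        raise ValueError("k must satisfy 0 < k < len(sample)")
    log_excess = np.log(x[:k]) - np.log(x[k])
    return k / log_excess.sum()


rng = np.random.default_rng(0)
# Classical Pareto with minimum 1 and tail index 1.5 (synthetic stand-in
# for the number of responses per document).
synthetic = 1.0 + rng.pareto(1.5, 100_000)
alpha_hat = hill_tail_index(synthetic, 1000)
print(round(alpha_hat, 2))
```

A tail index below 2 corresponds to infinite variance, which is the regime usually meant by "heavy-tailed" in this setting.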
11 HTTP Document Start Times
- Why not Poisson?
- Wrong level of aggregation?
- Documents have a cluster Poisson distribution?
- Consider Client Level
- Many very strange documents?
- Try filtering them out
12 Web Traffic Responses (a.k.a. Objects, Files)
[Diagram: documents (Doc 1, Doc 2, Doc 3, Doc 4) grouped by client (Client 1 shown)]
13 HTTP Documents: Natural clustering
- Aggregate documents into clients
- Approximation for web browsing session
14 HTTP Clients: Wavelet Spectrum
15 HTTP Clients: Wavelet Spectrum
- Quite flat (Poisson) over most of spectrum
- But still upturns at coarsest scales???
- Why?
- Sample size (17,295) too small?
- Weird clients (not actual web browsing)
- Edge effect?
- Unusual non-stationarity (see SiZer map)
16 HTTP Clients: Filtering bad clients
- Goal: Eliminate non-Web Browsing clients
- Criteria:
- Responses per client > 3000.
- A connection whose duration is > 2 hours.
- > 5 responses, R. I. (p50) > 0.8, median > 1 sec.
- Duration > 3.5 hours.
- Max response interarrivals > 2000 sec.
- A document having number of responses > 250.
- Connections per client > 3000.
- Duration > 2 hr, log10(idle time1) < 0.1 sec.
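The unambiguous thresholds above can be sketched as a simple predicate. The record type and its field names are hypothetical, "Duration > 3.5 hours" is assumed to refer to the client's total active duration, and the two criteria involving R. I. and idle time are left out because their exact definitions are not recoverable from the slide.

```python
from dataclasses import dataclass

HOUR = 3600.0  # seconds


@dataclass
class ClientSummary:
    """Per-client summary statistics (hypothetical field names)."""
    n_responses: int
    n_connections: int
    max_connection_duration_s: float
    active_duration_s: float
    max_response_interarrival_s: float
    max_responses_per_document: int


def is_non_browsing(client):
    """Return True if any of the clearly stated criteria is met."""
    return (
        client.n_responses > 3000
        or client.n_connections > 3000
        or client.max_connection_duration_s > 2 * HOUR
        or client.active_duration_s > 3.5 * HOUR  # assumed meaning
        or client.max_response_interarrival_s > 2000.0
        or client.max_responses_per_document > 250
    )


clients = [
    ClientSummary(120, 15, 30.0, 1800.0, 240.0, 40),
    ClientSummary(5000, 15, 30.0, 1800.0, 240.0, 40),
]
kept = [c for c in clients if not is_non_browsing(c)]
```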
17 Post Filtering Results (little change)
18 HTTP Clients: SiZer map
- Downwards trend?
- Not just at edge
- Impact on wavelet spectrum?
- Non-homogeneous Poisson process?
- Consider only fully captured clients
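To explore how a downward trend alone might affect such analyses, a non-homogeneous Poisson process can be simulated by thinning (Lewis-Shedler): generate a homogeneous process at the maximum rate and keep each candidate with probability proportional to the local rate. The linear rate function and all parameter values below are illustrative assumptions, not estimates from the data.

```python
import numpy as np


def thinning_arrivals(rate_fn, rate_max, duration, rng):
    """Simulate a non-homogeneous Poisson process on [0, duration).

    rate_fn must accept an array of times and satisfy
    0 <= rate_fn(t) <= rate_max for all t in the interval.
    """
    n_candidates = rng.poisson(rate_max * duration)
    candidates = np.sort(rng.uniform(0.0, duration, n_candidates))
    accept_prob = rate_fn(candidates) / rate_max
    keep = rng.uniform(0.0, 1.0, n_candidates) < accept_prob
    return candidates[keep]


def declining_rate(t):
    """Rate falling linearly from 20 to 10 arrivals/sec over 1000 sec."""
    return 20.0 - 0.01 * np.asarray(t)


rng = np.random.default_rng(0)
arrivals = thinning_arrivals(declining_rate, 20.0, 1000.0, rng)
first_half = int(np.sum(arrivals < 500.0))
second_half = int(np.sum(arrivals >= 500.0))
print(first_half, second_half)
```

The resulting arrival times can be binned and passed to any spectrum or dispersion calculation to see whether a trend of this size is enough to lift the coarsest scales.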
19 Post Filtering Fully Captured Results
20 Work in Progress
- Semi-experiments at Client Level
- Threshold Number of responses
- Response Inter-arrivals within clients
- http://www-dirt.cs.unc.edu/semiexps
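The slides do not spell out the semi-experiments themselves. As a hedged illustration of the general idea (manipulate one aspect of the trace, keep the rest, and compare spectra), the sketch below moves each cluster of events to a uniformly random start time while preserving its internal spacing. The function name and the choice of manipulation are assumptions.

```python
import numpy as np


def reposition_cluster_starts(clusters, duration, rng):
    """Shift each non-empty cluster to a uniform random start in [0, duration).

    Within-cluster offsets are preserved, so only the arrangement of
    cluster start times is randomized.
    """
    shifted = []
    for times in clusters:
        times = np.sort(np.asarray(times, dtype=float))
        new_start = rng.uniform(0.0, duration)
        shifted.append(times - times[0] + new_start)
    return shifted


rng = np.random.default_rng(0)
original = [np.array([1.0, 1.5, 2.0]), np.array([10.0, 10.1])]
randomized = reposition_cluster_starts(original, 100.0, rng)
```

Comparing a spectrum computed from the original event times with one computed from the randomized times would indicate how much of its shape depends on the arrangement of cluster starts rather than on within-cluster structure.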