The Cache Location Problem - PowerPoint PPT Presentation

Provided by: engT
Transcript and Presenter's Notes

Title: The Cache Location Problem


1
The Cache Location Problem


2
Overview
  • TERCs Vs. Proxies
  • Stability
  • Cache location

3
Proxy Web Caching is Good
  • Saves network bandwidth
  • Reduces delay
  • Reduces server load
  • But it is not perfect
  • not everybody uses it (configuration)
  • may become a bottleneck and increase delay
  • increases delay for unsatisfied pages

4
Transparent En-Route Caches (TERCs)
  • Caches are located along routes from clients to
    servers, and are transparent to both server and
    client
  • Requests are intercepted by the TERC on their way
    to the server, and either
  • answered by the cache if the information exists
  • otherwise, forwarded to the server
  • Advantages
  • No configuration required! No management!
  • No change required in current network
    infrastructure
  • Can be deployed independently within an ISP
    subnetwork

5
TERCs (-)
  • Must be on the route from client to server
  • sensitive to route changes
  • hierarchies are much harder to implement
  • Needs to intercept traffic
  • implementation problem
  • more complex
  • can TERCs work at line speed?
  • Depends on routing stability, and flow stability

Where should TERCs be placed?
6
Route Stability
  • Published results indicate that routing is stable
    (Paxson, Labovitz)
  • We need stability only during the connection
    lifetime (1 min.)
  • KRS00 measurements to more than 13,000
    destinations show that >93% of connections were
    stable
  • real numbers are probably higher
  • TCP route caching
  • equivalent of IP addresses

7
Stability of Flows
  • We built the flow tree from servers
  • Data from Bell-Labs servers (www.bell-labs.com,
    www.multimedia.bell-labs.com )
  • Nov. 97 - Jan. 98
  • 14000 different hosts, 1 Gbyte, 200k cacheable
    requests (per week)
  • From log files to results
  • extract unique hosts
  • run traceroute for each host
  • obtain the routing tree (or is it a DAG?)
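
The log-to-tree steps above could be sketched roughly as follows. This is a minimal Python sketch, assuming the traceroute output has already been parsed into hop lists ordered from the server outward; the function name `build_routing_tree` and the sample hop names are hypothetical. It records one parent per hop and reports hops reached through more than one predecessor, which is exactly the case where the structure is a DAG rather than a tree.

```python
from collections import defaultdict

def build_routing_tree(paths):
    """Merge hop lists (server first, client last) into a parent map.

    Returns (parent, conflicts): parent maps each hop to the first
    predecessor seen; conflicts maps a hop to the set of all its
    predecessors when more than one was observed (a DAG, not a tree).
    """
    preds = defaultdict(set)
    parent = {}
    for path in paths:
        for prev, hop in zip(path, path[1:]):
            preds[hop].add(prev)
            parent.setdefault(hop, prev)
    conflicts = {h: p for h, p in preds.items() if len(p) > 1}
    return parent, conflicts

# Hypothetical parsed traceroutes, one per unique client host.
paths = [
    ["server", "r1", "r2", "hostA"],
    ["server", "r1", "r3", "hostB"],
    ["server", "r4", "r3", "hostC"],  # r3 is reached via r1 and via r4
]
parent, conflicts = build_routing_tree(paths)
```

An empty `conflicts` map means the observed routes really do form a tree rooted at the server.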

8
Stability - Visual
9
Client return rate between days
day   0111  0112  0113  0114  0115  0116  0117  1130  1201  1202  1203  1204  1205  1206
0111  -     4.35  4     3.78  3.69  3.55  3.73  3.25  3.12  3.21  2.96  3.01  2.79  3.36
0112  4.35  -     6.93  6.06  5.66  5.34  3.58  2.77  4.4   3.85  3.86  3.87  4.02  3.33
0113  4     6.93  -     7.48  6.1   6.12  4.26  3.28  4.58  4.25  4.16  4.34  4.25  2.96
0114  3.78  6.06  7.48  -     7.33  6.48  4.07  3.03  4.21  4.23  4.28  4.34  4.25  3.15
0115  3.69  5.66  6.1   7.33  -     7.41  4.3   2.77  3.71  4.02  4.25  3.98  4.2   2.88
0116  3.55  5.34  6.12  6.48  7.41  -     5.38  3.13  4.21  4.56  4.12  4.1   4.36  3.25
0117  3.73  3.58  4.26  4.07  4.3   5.38  -     3.36  2.99  3.14  2.86  2.88  3.18  3.46
1130  3.25  2.77  3.28  3.03  2.77  3.13  3.36  -     4.32  4.08  4.15  3.42  3.49  4.23
1201  3.12  4.4   4.58  4.21  3.71  4.21  2.99  4.32  -     7     6.34  6.06  4.97  3.58
1202  3.21  3.85  4.25  4.23  4.02  4.56  3.14  4.08  7     -     6.88  5.89  5.35  3.94
1203  2.96  3.86  4.16  4.28  4.25  4.12  2.86  4.15  6.34  6.88  -     7.01  5.58  3.48
1204  3.01  3.87  4.34  4.34  3.98  4.1   2.88  3.42  6.06  5.89  7.01  -     7.15  3.95
1205  2.79  4.02  4.25  4.25  4.2   4.36  3.18  3.49  4.97  5.35  5.58  7.15  -     4.82
1206  3.36  3.33  2.96  3.15  2.88  3.25  3.46  4.23  3.58  3.94  3.48  3.95  4.82  -
10
Stability (3)
  • The relative flow in the tree is stable in time,
    although the client population changes
    significantly
  • Routing is stable for the lifetime of the
    connection
  • Placing caches based on past traffic yields good
    results

11
How Fixed is the Hit Ratio?
12
How Fixed is the Hit Ratio?(2)
13

Where Should the TERCs be Placed?
14
The Model
  • Wide area network
  • Requests are represented by a set of demands (of
    client i from server j)
  • Goal: minimize average delay,
    minimize total flow
  • The hit ratio (P) abstracts cache behavior
  • most hits due to small number of popular pages
  • full dependency - the same pages are cached
    everywhere
  • But part of the flow can come from Proxies

⇒ Each flow is associated with a hit ratio p_{i,j}
15
The General k-cache Location Problem
  • Instance:
  • an undirected graph G = (V, E)
  • a set of demands F = {f_{i,j}}
  • a set of hit ratios P = {p_{i,j}}
  • k - the number of caches
  • Solution: K, a subset of V of size k
  • Objective: minimize the total flow

$$\sum_{i,j} f_{i,j} \min_{v \in K \cup \{j\}} \Bigl[\, p_{i,j}\, d(i,v) + (1 - p_{i,j}) \bigl( d(i,v) + d(v,j) \bigr) \Bigr]$$
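
The objective of the general problem can be evaluated directly from its definition. The following is a minimal Python sketch, assuming the shortest-path distances d(u,v) are already available as a matrix; the names `total_flow` and `best_placement` and the toy instance are hypothetical, and the exhaustive search is only an illustration for very small graphs, not the method of the talk.

```python
from itertools import combinations

def total_flow(dist, demands, hit, caches):
    """Objective value for the cache set `caches`.

    dist[u][v]: shortest-path distance; demands[(i, j)]: flow f_ij from
    client i to server j; hit[(i, j)]: hit ratio p_ij.
    """
    total = 0.0
    for (i, j), f in demands.items():
        p = hit[(i, j)]
        # v = j means "no cache used": the cost is then just d(i, j).
        total += f * min(
            p * dist[i][v] + (1 - p) * (dist[i][v] + dist[v][j])
            for v in set(caches) | {j}
        )
    return total

def best_placement(dist, demands, hit, k):
    """Exhaustive search over all k-subsets of nodes (small graphs only)."""
    nodes = range(len(dist))
    return min(combinations(nodes, k),
               key=lambda K: total_flow(dist, demands, hit, K))

# Toy instance (assumed data): path 0 - 1 - 2 - 3 with unit edges,
# server at node 0, clients at nodes 2 and 3, hit ratio 0.5.
dist = [[abs(u - v) for v in range(4)] for u in range(4)]
demands = {(2, 0): 1.0, (3, 0): 1.0}
hit = {(2, 0): 0.5, (3, 0): 0.5}
K = best_placement(dist, demands, hit, 1)
```

On this toy instance the single cache ends up at node 2, the node closest to the clients that both demands can still reach cheaply.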
16
The k-TERC Location Problem
  • Instance:
  • an undirected graph G = (V, E)
  • a set of demands F = {f_{i,j}}
  • a set of hit ratios P = {p_{i,j}}
  • k - the number of caches
  • Solution: K, a subset of V of size k
  • Objective: minimize the total flow

$$\sum_{i,j} f_{i,j} \min_{\substack{v \in K \cup \{j\} \\ v \text{ on the path from } j \text{ to } i}} \Bigl[\, p_{i,j}\, d(i,v) + (1 - p_{i,j}) \bigl( d(i,v) + d(v,j) \bigr) \Bigr]$$
17
Remarks
  • A generalization of the p-median problem (in the
    p-median problem we want to minimize the total
    cost of serving a set of demands from at most p
    centers)
  • In the k-TERC location problem
  • it is enough to solve the problem for a fixed p
    (p_{i,j} = p)
  • the optimal set K does not depend on p
  • (not true in general)
  • The k-TERC location problem is a special case of
    the general k-location problem (p1/n)
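
Why the optimal set does not depend on p in the TERC case can be seen from a short derivation. It is sketched here under the assumption that distances are additive along the route, i.e. d(i,v) + d(v,j) = d(i,j) for every v on the path from j to i, and that all flows share the same hit ratio p > 0; v ranges over the caches in K on that path, plus j itself.

```latex
\begin{align*}
p\,d(i,v) + (1-p)\bigl(d(i,v) + d(v,j)\bigr)
  &= d(i,v) + (1-p)\,d(v,j) \\
  &= d(i,j) - p\,d(v,j), \\
\sum_{i,j} f_{i,j}\,\min_{v}\bigl[d(i,j) - p\,d(v,j)\bigr]
  &= \sum_{i,j} f_{i,j}\,d(i,j) \;-\; p \sum_{i,j} f_{i,j}\,\max_{v} d(v,j).
\end{align*}
```

The first sum does not depend on K, so minimizing the total flow amounts to maximizing the second sum, and the maximizing set K is the same for every p > 0. In the general problem v need not lie on the route, the additivity step fails, and the argument no longer goes through.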

18
The independence of p_{s,c}
[Figure; surviving labels: TERC, constant]
19
Hardness Results
              line     tree      general graph
one server    Poly.    Poly.     NP-hard
m servers     Poly.    NP-hard   NP-hard
20
Placement on a line
[Figure: a line of nodes 0, 1, 2, ..., n-1]
  • Topology: a line of n nodes
  • Every node may be a server, a client, or both.
  • FR(i): the flow demand on the segment (i-1, i)
  • FR can be easily computed from the input.
  • FC(i, l_o, l_i): the flow on the segment (i-1, i)
    when the closest caches to i are at l_o and l_i.
  • FC can be computed from the input with p = 1.
  • Note: FR(i) = FC(i, n-1, 0)
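
One way FR might be computed from the input is a single pass with a difference array. This is a minimal Python sketch, assuming FR(i) is the cache-free demand crossing the segment and that demands are given as (client, server) node pairs on the line; the function name `segment_demand` and the sample demands are hypothetical.

```python
def segment_demand(n, demands):
    """FR[i] = total demand crossing segment (i-1, i), for i = 1..n-1.

    demands maps (client, server) node pairs on the line 0..n-1 to flow.
    FR[0] is unused and left at 0.
    """
    diff = [0.0] * (n + 1)
    for (c, s), f in demands.items():
        lo, hi = min(c, s), max(c, s)
        # The demand crosses segments lo+1, ..., hi.
        diff[lo + 1] += f
        diff[hi + 1] -= f
    FR, running = [0.0] * n, 0.0
    for i in range(1, n):
        running += diff[i]
        FR[i] = running
    return FR

# Assumed sample input: client 4 -> server 0, client 1 -> server 3.
FR = segment_demand(5, {(4, 0): 2.0, (1, 3): 1.0})
```

The pass is linear in the number of nodes plus the number of demands, which is in line with the remark that FR is easy to obtain.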

21
Placement on a line
  • C(j, l_o, l_i, k): the overall flow in the segment [0, j]
    when k caches are located optimally inside the
    segment, and the closest caches to j are at l_o
    and l_i.

22
The dynamic Program
  • Base case (j = 1)
  • For j > 1

23
The Algorithm
  • Compute C(1, l_i, 1, 1) and C(1, l_i, 0, 0) for 1 ≤ l_i ≤ n-1
  • For each j > 1 compute C(j, l_o, l_i, k') for all 0 ≤ k' ≤ k
    and 0 ≤ l_i ≤ j ≤ l_o ≤ n-1
  • Complexity: O(n^3 k)

24
Optimizing for a single server
  • The routes from the server to all clients form a
    tree (actually a DAG)
  • We'll use dynamic programming to find the optimal
    cache locations

25
The Greedy Algorithm
  • Optimal algorithm using bottom-up dynamic
    programming
  • not trivial
  • complexity O(n k^2 h)
  • Greedy
  • repeat k times: find the best cache location
  • complexity O(n k)
  • How bad can it be?
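
The greedy rule can be illustrated with a short Python sketch. It assumes a single server at the root of a tree, a uniform hit ratio p, full dependency between caches (only the first cache on the way up produces hits), and k no larger than the number of nodes; the names `tree_cost` and `greedy_placement` and the toy tree are hypothetical. This naive version recomputes the cost from scratch for every candidate, so it does not reach the O(n k) bound quoted above.

```python
def tree_cost(parent, dist_up, demand, caches, p):
    """Total flow for a single server at the root of a tree.

    parent[u]: next hop toward the server (None at the root);
    dist_up[u]: length of edge (u, parent[u]); demand[u]: requests
    issued at u; p: uniform hit ratio; caches: set of cache nodes.
    """
    total = 0.0
    for u, f in demand.items():
        node, hit_seen = u, False
        while parent[node] is not None:
            if node in caches:
                hit_seen = True       # first cache on the way up
            # After the first cache only the miss fraction travels on.
            total += f * dist_up[node] * ((1 - p) if hit_seen else 1.0)
            node = parent[node]
    return total

def greedy_placement(parent, dist_up, demand, k, p):
    """Repeat k times: add the node that lowers the total flow most."""
    caches = set()
    for _ in range(k):
        candidates = [v for v in parent if v not in caches]
        best = min(candidates,
                   key=lambda v: tree_cost(parent, dist_up, demand,
                                           caches | {v}, p))
        caches.add(best)
    return caches

# Toy tree (assumed data): server s, router a, clients b and c.
parent = {"s": None, "a": "s", "b": "a", "c": "a"}
dist_up = {"a": 2.0, "b": 1.0, "c": 1.0}
demand = {"b": 3.0, "c": 1.0}
one = greedy_placement(parent, dist_up, demand, 1, 0.5)
two = greedy_placement(parent, dist_up, demand, 2, 0.5)
```

Because each step is locally optimal only, the result can differ from the optimum on other instances, which is the question the slide raises.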

26
Greedy Vs. Optimal
27
Dynamic Programming for Tree
  • First we convert the tree to a binary tree by
    adding dummy nodes.
  • Sort all nodes in reverse BFS order: a node's
    descendants are numbered before the node itself.
  • Children of node i are i_R and i_L
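
The two preprocessing steps might look like this in Python. It is a minimal sketch, assuming the tree is given as a map from each node to its list of children; the names `binarize` and `reverse_bfs_order` and the `("dummy", n)` node labels are hypothetical. Edges into dummy nodes would need length zero and dummy nodes no demand, so that distances and flows are unchanged.

```python
from collections import deque

def binarize(children, root):
    """Return a child map in which every node has at most two children.

    A node with more than two children keeps its first child and hands
    the rest to a fresh dummy node, repeated until at most two remain.
    Dummy nodes are tuples ("dummy", n) and carry no demand.
    """
    binary, counter = {}, 0
    queue = deque([(root, list(children.get(root, [])))])
    while queue:
        node, kids = queue.popleft()
        if len(kids) > 2:
            dummy = ("dummy", counter)
            counter += 1
            binary[node] = [kids[0], dummy]
            queue.append((kids[0], list(children.get(kids[0], []))))
            queue.append((dummy, kids[1:]))
        else:
            binary[node] = kids
            for kid in kids:
                queue.append((kid, list(children.get(kid, []))))
    return binary

def reverse_bfs_order(binary, root):
    """List nodes so that every descendant precedes its ancestors."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(binary[node])
    return order[::-1]

# Assumed sample tree: the root s has three children.
children = {"s": ["a", "b", "c"], "a": ["d"]}
binary = binarize(children, "s")
order = reverse_bfs_order(binary, "s")
```

Processing nodes in this order guarantees that both children of a node are finished before the node itself is visited, which is what a bottom-up dynamic program needs.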

28
Notations
  • C(i,k,l) is the cost of a subtree rooted at i
    with k optimally located caches, where the next
    cache up the tree is at distance l from i.
  • F(i,k,l) is the sum of demands in the subtree i
    that do not pass through a cache in the solution
    C(i,k,l).

29
The Dynamic Program
30
The DP Formula for C(i,k,l)
  • The cost if a cache is not placed at node i
  • The cost if a cache is placed at node i
  • Complexity
  • O(nhk) variables ⇒ O(nhk^2) time complexity
  • Finer analysis yields O(nhk) time complexity
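
The slide's own formula is not reproduced in this transcript, so the following Python sketch is only one possible recurrence over the same kind of state (node, cache budget, distance to the next cache up the tree), not the authors' formula. It assumes a single server at the root, a uniform hit ratio p, and full dependency between caches; the function name `optimal_tree_cost` and the toy tree are hypothetical, and no attempt is made to reach the finer time bound.

```python
from functools import lru_cache

def optimal_tree_cost(children, length, demand, root, k, p):
    """Minimum total flow with at most k caches, single server at root.

    children[u]: list of children; length[u]: length of the edge from u
    to its parent; demand[u]: requests issued at u; p: hit ratio.
    """
    depth, stack = {root: 0.0}, [root]
    while stack:
        u = stack.pop()
        for c in children.get(u, []):
            depth[c] = depth[u] + length[c]
            stack.append(c)

    @lru_cache(maxsize=None)
    def C(u, kk, l):
        # l: distance from u to the nearest cache above it, None if none.
        def kids_cost(budget, l_here):
            # Knapsack over children: best[b] = min cost with b caches.
            best = [0.0] * (budget + 1)
            for c in children.get(u, []):
                l_c = None if l_here is None else l_here + length[c]
                best = [min(best[b - x] + C(c, x, l_c) for x in range(b + 1))
                        for b in range(budget + 1)]
            return best[budget]

        f = demand.get(u, 0.0)
        if l is None:
            own = f * depth[u]                       # no cache on the way
        else:
            own = f * (l + (1 - p) * (depth[u] - l))  # hit at distance l
        no_cache = own + kids_cost(kk, l)
        if kk == 0 or u == root:
            return no_cache
        with_cache = f * (1 - p) * depth[u] + kids_cost(kk - 1, 0.0)
        return min(no_cache, with_cache)

    return C(root, k, None)

# Toy tree (assumed data): server s, router a, clients b and c.
children = {"s": ["a"], "a": ["b", "c"]}
length = {"a": 2.0, "b": 1.0, "c": 1.0}
demand = {"b": 3.0, "c": 1.0}
costs = [optimal_tree_cost(children, length, demand, "s", k, 0.5)
         for k in range(4)]
```

Each state chooses between the two cases listed on the slide (cache at node i or not) and splits the remaining budget among the children.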

31
The Server's Point of View
32
Traffic Reduction
33
TERCs Vs. Edge Caches
34
The Server's Point of View (2)
35
Popularity Stability