The Mystery of Cooperative Web Caching

Slides: 22
Provided by: diUnipiI81

1
The Mystery of Cooperative Web Caching

2
  • Web caching is a process implemented by a
    caching proxy to improve the efficiency of the
    web. It reduces the delay in retrieving a
    document from the Internet by decreasing the
    number of requests directed to it.

3
Cooperative Web Caching: it consists of a set
of web caches located in different places on
the Internet that cooperate with each other to
improve the performance of the system.
4
The main entities in cooperative web
caching are: proxy, router, and
group of proxies and router
  • Entity requirements
  • Proxy: the proxy must act as a proxy
    cache.
  • Router: must implement interior gateway and
    exterior gateway protocols.
  • Group of proxies and router: the main
    requirement is inter-cache communication.
  • The mystery of cooperative web caching is the
    inter-cache communication technique.

5
The inter-cache
communication techniques
  • Many protocols have been proposed for
    inter-cache communication in cooperative web
    caching:
  • ICP (Internet Cache Protocol), proposed
    by Duane Wessels and K. Claffy, 1997
  • Cache Digest, proposed by Alex
    Rousskov and Duane Wessels, 1998
  • Summary Cache, proposed by Pei Cao,
    1998
  • HTCP (Hyper Text Caching Protocol),
    proposed by P. Vixie and D. Wessels, 2000
  • CARP (Cache Array Routing Protocol),
    proposed by Vinod Valloppillil and Keith W. Ross,
    1998

6
1. Internet Cache Protocol
  • ICP is a message-based protocol in
    which each cache collects information about the
    existence of a particular web object in the caches
    of its neighbours by sending an ICP_query
    message.
  • The message is composed of a fixed
    20-octet header followed by a variable-size
    payload.

7
The message format
  • Opcode field (8 bits): an integer that
    indicates the type of the message: query,
    hit, miss or denied.
  • Version field: indicates the ICP version
    number in use.
  • Message length: header length plus payload
    length, at most 16 Kbytes.
  • Payload: contains the URL of the requested
    document, which determines the payload length.

Header layout (bit offsets 0, 8, 16, 32):

Opcode (bits 0-7) | Version (bits 8-15) | Message length (bits 16-31)
Request Number (32 bits)
Options (32 bits)
Option Data (32 bits)
Sender Address (32 bits)
Payload (variable length)
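
The header layout above can be sketched with Python's struct module. This is a minimal illustration, not a reference implementation: the version value 2, the payload layout (a 4-octet requester address followed by a null-terminated URL) and the helper names build_query and parse_header are assumptions made for this sketch.

```python
import struct

# opcode (8 bits), version (8), message length (16), request number (32),
# options (32), option data (32), sender address (32): 20 octets in total.
ICP_HEADER = struct.Struct("!BBHIII4s")  # "!" = network byte order, no padding
ICP_OP_QUERY = 1
ICP_VERSION = 2                # assumed version number for this sketch
MAX_MESSAGE_LEN = 16 * 1024    # header + payload, at most 16 Kbytes


def build_query(request_number: int, url: str, sender: bytes = bytes(4)) -> bytes:
    """Build a query message: fixed 20-octet header followed by the payload."""
    # Assumption: payload = 4-octet requester address + null-terminated URL.
    payload = bytes(4) + url.encode("ascii") + b"\x00"
    length = ICP_HEADER.size + len(payload)
    if length > MAX_MESSAGE_LEN:
        raise ValueError("ICP message longer than 16 Kbytes")
    header = ICP_HEADER.pack(ICP_OP_QUERY, ICP_VERSION, length,
                             request_number, 0, 0, sender)
    return header + payload


def parse_header(message: bytes) -> dict:
    """Decode the fixed header fields of a received message."""
    opcode, version, length, reqnum, options, option_data, sender = \
        ICP_HEADER.unpack_from(message)
    return {"opcode": opcode, "version": version, "length": length,
            "request_number": reqnum, "options": options,
            "option_data": option_data, "sender": sender}
```

The message length field covers header and payload together, so a receiver can check it against the number of octets actually received.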
8
Message Specification
  • A cache sends an ICP_query (Opcode 1) to
    all its neighbours to collect information about a
    particular document.
  • A cache that receives the query extracts the
    URL of the document from the payload and sends an
    ICP response message (Opcode 2 for a hit, 3 for a miss).
  • The cache that generated the query collects all
    the responses, selects the best one, and sends an
    HTTP request to that peer to retrieve the document.
  • There are two kinds of hit-response message:
  • ICP_OP_HIT
  • ICP_OP_HIT_OBJ

9
Peer Selection: the selection of the best
peer from which to retrieve the document can be done by
selection algorithms based on the following
parameters:
  • RTT measurement: measures the congestion
    between two nodes; it varies with time.
  • Hop count: a constant measure.
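
One possible selection rule, sketched under assumptions: among the neighbours that answered with a hit, prefer the lowest RTT and break ties on hop count. The Reply record, its field names and the tie-breaking order are hypothetical and chosen only for illustration; the slides do not prescribe a particular algorithm.

```python
from dataclasses import dataclass
from typing import List, Optional

ICP_OP_HIT = 2
ICP_OP_MISS = 3


@dataclass
class Reply:
    """One ICP response, as seen by the querying cache (hypothetical record)."""
    peer: str       # neighbour that answered
    opcode: int     # ICP_OP_HIT or ICP_OP_MISS
    rtt_ms: float   # measured round-trip time, varies over time
    hops: int       # hop count to the neighbour, constant


def select_peer(replies: List[Reply]) -> Optional[str]:
    """Return the neighbour to fetch from, or None if nobody has the object."""
    hits = [r for r in replies if r.opcode == ICP_OP_HIT]
    if not hits:
        return None  # every neighbour missed: fall back to the origin server
    # Assumed policy: lowest RTT first, hop count as tie-breaker.
    best = min(hits, key=lambda r: (r.rtt_ms, r.hops))
    return best.peer
```

Because RTT changes with congestion, a real cache would refresh the measurements continuously; the hop count only needs to be learned once.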

10
Comparison between ICP format and
HTTP message
11
2. Cache Digest
  • Cache digest provides a mechanism for
    communication among web caches.
  • The digest is a compact summary of the URLs of
    the documents stored in the cache.
  • Digest Construction
  • The URLs of the documents stored in the cache are
    indexed in the digest by keys (sets of bits)
    stored in a Bloom filter.
  • The keys are extracted from the URL by a number
    of hash functions that determine which bits
    must be turned on and which must be turned off.
  • A bit is turned on when its state changes from 0 to 1.
  • A bit is turned off when its state changes from 1 to 0.

12
Bloom filter
  • A hash coding method proposed by Burton
    H. Bloom in 1970.
  • It is based on the idea of reducing the hash area
    size by allowing a small number of tests to be falsely
    identified as members, without increasing the reject time.
  • Reject time is the time needed to establish that
    an element does not belong to the set of elements
    stored in the hash area.
  • The hash area is organised as N one-bit cells
    with addresses 0 to N-1.
  • Initially all the cells are empty: all the bits
    are set to 0. To insert an element, a set of
    hash addresses a1 ... ad is generated and the
    bits at all of these addresses are set to 1.
  • To search for an element, a set of hash addresses
    is generated in the same way.
    If all the bits are set to 1 the element is
    accepted, and if any of them is 0 the
    element is rejected.
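
The insert and search steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the d hash addresses are derived by salting SHA-256 with the hash number, and one byte is used per cell for readability; neither choice comes from the slides.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Hash area of n_bits cells; each element sets d = n_hashes cells."""

    def __init__(self, n_bits: int, n_hashes: int) -> None:
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.cells = bytearray(n_bits)  # all cells start at 0 (empty)

    def _addresses(self, item: str) -> Iterator[int]:
        # Assumption: salt SHA-256 with the hash number to get d addresses.
        for i in range(self.n_hashes):
            h = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(h[:8], "big") % self.n_bits

    def add(self, item: str) -> None:
        # Insert: set every generated address to 1.
        for a in self._addresses(item):
            self.cells[a] = 1

    def __contains__(self, item: str) -> bool:
        # Search: accept only if every address is 1; a single 0 rejects.
        return all(self.cells[a] for a in self._addresses(item))
```

An inserted element is always accepted, so a plain Bloom filter never rejects a member; an element that was never inserted may occasionally be accepted when other elements happen to have set all of its addresses.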

13
The calculation of the public keys
  • Each URL is transformed by MD5 into a public
    key (128 bits), which is composed of two parts: a
    numeric part (1-7 bits) and a second part that
    represents the transformation of the URL.
  • The hash function then assigns to each key an
    index extracted from the URL by doing the
    following computation:
  • 1. Splitting the 128 bits into N
    parts.
  • 2. Finding the index of each
    part by taking the value of that part modulo
    the digest size, where
    digest size = (number of bits per entry
    of the public keys) x (cache capacity).
  • 3. Combining the indices of
    each part to compose the index of the
    corresponding public key.
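
The three steps above can be sketched as follows. Several details are assumptions made for illustration: the URL is hashed as UTF-8 text with nothing prepended, each part is read as a big-endian integer, N defaults to 4, and the digest size is taken to be bits per entry multiplied by cache capacity. The function names are hypothetical.

```python
import hashlib
from typing import List


def digest_size(bits_per_entry: int, cache_capacity: int) -> int:
    """Assumed sizing rule: bits per entry times number of cacheable entries."""
    return bits_per_entry * cache_capacity


def digest_indices(url: str, size_bits: int, n_parts: int = 4) -> List[int]:
    """Map a URL to n_parts bit positions inside a digest of size_bits bits."""
    key = hashlib.md5(url.encode("utf-8")).digest()  # 16 bytes = 128 bits
    if 16 % n_parts != 0:
        raise ValueError("n_parts must divide the 16-byte key evenly")
    part_len = 16 // n_parts
    # Step 1: split the key; step 2: reduce each part modulo the digest size.
    # Step 3: the returned list is the combined index for this key.
    return [int.from_bytes(key[i:i + part_len], "big") % size_bits
            for i in range(0, 16, part_len)]
```

The same URL always yields the same positions, which is what lets a neighbour test membership using only the bit array it received.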

14
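  • Digest Accuracy
  • The calculation of the public keys allows some
    possibility of error. There are two kinds of
    error:
  • 1. False miss: the digest reports a cached
    document as absent.
  • 2. False hit: the digest reports a document as
    cached when it is not.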

15
Digest Requirement
  • The digest is a large data structure:
    200MB-2MB is needed to store all the URLs of the
    documents stored in the cache.
  • Two copies of the digest are necessary:
    one stored on disk and the other in
    memory for fast updates.
  • How does it work?
  • The cache exchanges its own digest with
    its neighbours.
  • The cache digest message consists of a
    fixed 128-byte binary header,
    which contains the digest specifications,
    followed by the entire digest.
  • When a miss occurs in the local cache,
    the cache looks the document up in its neighbours' digests.
  • After the local miss, the cache sends an
    HTTP request to retrieve the document from the
    appropriate location.
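
The lookup sequence above can be sketched as a small decision function. This is an illustration only: plain Python sets stand in for the neighbours' digests (any object that supports the "in" test would do), and the name locate and the return values "local" and "origin" are hypothetical.

```python
from typing import Container, Mapping


def locate(url: str,
           local_cache: Container[str],
           neighbour_digests: Mapping[str, Container[str]]) -> str:
    """Decide where the HTTP request for url should be sent."""
    if url in local_cache:
        return "local"               # local hit, no digest lookup needed
    for neighbour, digest in neighbour_digests.items():
        if url in digest:            # may be a false hit with a real digest
            return neighbour
    return "origin"                  # no digest claims the document
```

No query messages are exchanged at lookup time; the cost was paid earlier, when the digests were transferred.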

16
Conclusion
  • Cache digest eliminates the ICP_query-response
    messages used to collect
    information about the requested document, but it
    requires a lot of memory to store the digest, and
    the quantity of information it transfers over
    the network is proportional to the size of the
    digest.

17
3. Summary Cache
  • It was proposed by Pei Cao and co-authors
    to reduce the internal traffic created by
    ICP_query messages.
  • Each proxy keeps a summary of the URLs of the
    documents stored in each participating proxy.
  • It scales well, because it can employ a large
    number of proxies to reduce web traffic.
  • Two main factors influence the scalability:
  • 1. Updating delay
  • 2. Memory requirement

18
Updating delay: the summary is updated
periodically, or once the fraction of cached
documents not reflected in the summary reaches a
given threshold.
  • Memory requirement: depends on the way the
    summary is represented.
  • The summary can be represented in the following
    ways:
  • Exact directory: it requires a lot of memory.
    For 100 proxies with 8GB caches and 1 million
    documents, with an average URL length of 50 bytes, the
    space needed to represent the summary is 2MB.
  • Server name: it reduces the summary size but
    increases the possibility of error.
  • Bloom filter: proposed by Pei Cao to reduce
    the memory requirement of the summary. The
    documents are stored in the filter in the same
    way as in cache digest, with a difference in the
    calculation of the index, where the hash function
    performs the following computation:
  • 1. The 128 bits are divided into four 32-bit words,
    and an index is extracted from each by taking it
    modulo the summary size.
  • 2. Each proxy maintains a counter C(l)
    for each location l.
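
The two steps above can be sketched as a counting filter. This is a minimal illustration under assumptions: C(l) is taken to count how many stored documents map to location l, so that a location is treated as set while its counter is above zero, and the words are read in network byte order. The class name CountingSummary is hypothetical.

```python
import hashlib
import struct
from typing import List


class CountingSummary:
    """Summary of a proxy's cache with one counter C(l) per location l."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.counters = [0] * size  # C(l) = 0 means location l is unset

    def _locations(self, url: str) -> List[int]:
        # Step 1: split the 128-bit MD5 value into four 32-bit words and
        # reduce each word modulo the summary size.
        words = struct.unpack("!4I", hashlib.md5(url.encode("utf-8")).digest())
        return [w % self.size for w in words]

    def add(self, url: str) -> None:
        # Step 2: a document entering the cache increments its counters.
        for l in self._locations(url):
            self.counters[l] += 1

    def remove(self, url: str) -> None:
        # A document leaving the cache decrements the same counters.
        for l in self._locations(url):
            if self.counters[l] > 0:
                self.counters[l] -= 1

    def __contains__(self, url: str) -> bool:
        return all(self.counters[l] > 0 for l in self._locations(url))
```

The counters are what make removal safe: a location shared by two documents stays set until both have been evicted, which a plain one-bit filter cannot express.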

19
  • There are three kinds of error:
  • False hit
  • False miss
  • Remote stale hit

20
Comparison between the summary
representation methods and ICP

21
Comparison between the summary representation
methods and ICP: Conclusion
  • The memory requirement in the summary cache
    depends on the size of the individual summary and
    on the number of proxies.

22
4. Protocols comparison
Comparison of the three previous
protocols in terms of network traffic