Replication2 - PowerPoint PPT Presentation

Slides: 34
Provided by: michae77

Transcript and Presenter's Notes


1
Replication(2)
2
Some Interesting Observations
  • The top 1% of all documents account for 20-35% of
    proxy requests
  • The top 10% account for 45-55% of requests
  • It takes 25% to 40% of all documents to account
    for 70% of requests
  • It takes 70% to 80% of all documents to account
    for 90% of requests
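
A minimal sketch of how figures like these could be measured from a proxy request log. The log below is made-up sample data and `share_of_requests` is a hypothetical helper name, not something from the slides.

```python
from collections import Counter

def share_of_requests(requests, top_fraction):
    """Fraction of all requests that go to the most popular `top_fraction` of documents."""
    counts = sorted(Counter(requests).values(), reverse=True)
    top_n = max(1, round(len(counts) * top_fraction))
    return sum(counts[:top_n]) / len(requests)

# Made-up log: document "a" is requested far more often than the others.
log = ["a"] * 6 + ["b"] * 2 + ["c", "d"]
top_quarter = share_of_requests(log, 0.25)  # top 25% of documents = 1 of 4
top_half = share_of_requests(log, 0.5)      # top 50% of documents = 2 of 4
```

Running the same function over a real proxy log, for several values of `top_fraction`, would give a table of the kind shown on this slide.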

3
Web Caching
  • As an example, we use the web to illustrate
    caching and other related issues

4
Web Browser Caching
  • Web browsers have their own caches. When a page
    is downloaded from a site the web page is put
    into the browser cache.
  • This is especially useful in those cases when the
    back button is pressed.
  • If a new copy is needed then a refresh can be
    done.
  • No page stays permanently in the cache. There is
    limited room.
  • A replacement algorithm is needed to determine
    which cached page should be purged.
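
The slides do not name a particular replacement algorithm. As one common choice, here is a small sketch of least-recently-used (LRU) replacement; the class name and the tiny capacity are assumptions for illustration.

```python
from collections import OrderedDict

class LRUCache:
    """Page cache that purges the least recently used page when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # url -> page body, least recently used first

    def get(self, url):
        if url not in self.pages:
            return None
        self.pages.move_to_end(url)  # mark as most recently used
        return self.pages[url]

    def put(self, url, body):
        self.pages[url] = body
        self.pages.move_to_end(url)
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # purge the least recently used page

browser_cache = LRUCache(capacity=2)
browser_cache.put("/a", "A")
browser_cache.put("/b", "B")
browser_cache.get("/a")        # "/a" is now the most recently used page
browser_cache.put("/c", "C")   # no room left, so "/b" is purged
```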

5
Web Browser Caching
  • Client pull
  • The server provides the content with instructions
    on when the client should ask for a refreshed
    copy of the content or if the content should be
    cached.
  • Server push
  • The server transmits page information to the
    screen.
  • The browser application displays the information
    and leaves the connection to the server open.
  • With an open connection, the server can continue
    to push updated pages for your screen to display
    on an ongoing basis. You can close the connection
    by closing the page.
  • The server is in control
  • Browser caches are different from proxy caches
    (discussed next).

6
Web Caching
  • Proxy caches (also called proxy servers)
  • Intercept HTTP requests from clients
  • Serve the object if it is in the cache
  • If not, go to the object's home server
  • On behalf of the user, get the object and possibly
    deposit it in the cache before returning it to the user
  • Usually deployed at edges of a network
  • Wide area bandwidth savings, improved response
    time and increased availability of static
    web-based objects
  • A browser may have to be configured to point to
    the proxy server.
  • Usually a web cache is purchased and installed by
    an ISP or an institution acting as one (e.g., a university).
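
The request path described above can be sketched as follows. This is a toy in-memory model, not a real HTTP proxy: `ProxyCache` and `fake_origin` are hypothetical names, and the origin server is replaced by a plain function.

```python
class ProxyCache:
    def __init__(self, fetch_from_origin):
        self.fetch_from_origin = fetch_from_origin  # callable: url -> body
        self.store = {}
        self.hits = 0
        self.misses = 0

    def handle_request(self, url):
        if url in self.store:               # object is cached: serve it directly
            self.hits += 1
            return self.store[url]
        self.misses += 1
        body = self.fetch_from_origin(url)  # go to the home server on the user's behalf
        self.store[url] = body              # deposit a copy before returning it
        return body

origin_calls = []

def fake_origin(url):
    origin_calls.append(url)
    return f"content of {url}"

proxy = ProxyCache(fake_origin)
first = proxy.handle_request("/index.html")
second = proxy.handle_request("/index.html")
```

Only the first request reaches the origin; the second is answered from the cache, which is where the bandwidth and response-time savings come from.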

7
Push-Based Approach
  • Server tracks all proxies that have requested
    objects
  • If a web page is modified, notify each proxy
  • Notification types
  • Indicate object has changed: invalidate
  • Send new version of object: update
  • How to decide between invalidate and updates?
  • Pros and cons?
  • One approach: send updates for more frequently
    accessed objects, invalidate for the rest
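
A rough sketch of the "update popular objects, invalidate the rest" policy. The threshold, the class name `PushServer`, and the message tuples are all assumptions made for illustration; a real server would send these notifications over the network.

```python
class PushServer:
    def __init__(self, update_threshold):
        self.update_threshold = update_threshold  # assumed popularity cutoff
        self.objects = {}         # url -> body
        self.subscribers = {}     # url -> set of proxy ids that requested it
        self.request_counts = {}  # url -> number of requests seen

    def handle_request(self, proxy_id, url):
        # The server must remember every proxy that asked for the object.
        self.subscribers.setdefault(url, set()).add(proxy_id)
        self.request_counts[url] = self.request_counts.get(url, 0) + 1
        return self.objects[url]

    def modify(self, url, new_body):
        """Change an object and return the notifications to send."""
        self.objects[url] = new_body
        popular = self.request_counts.get(url, 0) >= self.update_threshold
        messages = []
        for proxy_id in sorted(self.subscribers.get(url, ())):
            if popular:
                messages.append((proxy_id, "update", url, new_body))
            else:
                messages.append((proxy_id, "invalidate", url, None))
        return messages

push_server = PushServer(update_threshold=2)
push_server.objects = {"/hot": "v1", "/cold": "v1"}
push_server.handle_request("p1", "/hot")
push_server.handle_request("p2", "/hot")
push_server.handle_request("p1", "/cold")
hot_msgs = push_server.modify("/hot", "v2")
cold_msgs = push_server.modify("/cold", "v2")
```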

8
Push-Based Approaches
  • Advantages
  • Provide tight consistency: minimal stale data
  • Proxies can be passive
  • Disadvantages
  • Need to maintain state at the server
  • Recall that HTTP is stateless
  • Need mechanisms beyond HTTP
  • State may need to be maintained indefinitely
  • Not resilient to server crashes
  • These disadvantages are the reason why push-based
    approaches are not used

9
Pull-Based Approaches
  • The proxy is entirely responsible for maintaining
    consistency
  • The proxy periodically polls the server to see if
    object has changed
  • Use If-Modified-Since HTTP messages. This type
    of message can be used by a proxy to tell a
    remote server to return a copy only if the object
    has been modified.
  • Key question: when should a proxy poll?
  • Server-assigned Time-to-Live (TTL) values
  • No guarantee that the object will not change in
    the interim
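
To make the conditional request concrete, here is a toy model of how an origin server might answer an If-Modified-Since request. The function `origin_respond` and the dates are invented for illustration; 304 (Not Modified) and 200 (OK) are the standard HTTP status codes for the two outcomes.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def origin_respond(last_modified, body, headers):
    """Toy origin server: answer 304 if the object is unchanged since the given date."""
    since = headers.get("If-Modified-Since")
    if since is not None and last_modified <= parsedate_to_datetime(since):
        return 304, None  # Not Modified: the proxy keeps its cached copy
    return 200, body      # modified (or unconditional request): send a fresh copy

# The proxy cached its copy on this date and sends it in HTTP date format.
cached_at = datetime(2024, 1, 1, tzinfo=timezone.utc)
headers = {"If-Modified-Since": format_datetime(cached_at, usegmt=True)}

unchanged = origin_respond(datetime(2023, 12, 1, tzinfo=timezone.utc), "page", headers)
changed = origin_respond(datetime(2024, 2, 1, tzinfo=timezone.utc), "page v2", headers)
```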

10
Pull-Based Approach: Intelligent Polling
  • Proxy can dynamically determine the refresh
    interval
  • Compute based on past observations
  • Start with a conservative refresh interval
  • Increase interval if object has not changed
    between two successive polls
  • Decrease interval if object is updated between
    two polls
  • Adaptive: no prior knowledge of object
    characteristics needed
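
One possible way to adapt the interval is sketched below. The slides only say "increase" and "decrease"; doubling, halving, and the bounds of 60 and 3600 seconds are assumptions chosen for illustration.

```python
def next_interval(interval, changed, min_interval=60, max_interval=3600):
    """Return the next polling interval in seconds."""
    if changed:
        return max(min_interval, interval // 2)  # object is changing: poll sooner
    return min(max_interval, interval * 2)       # object is stable: poll less often

interval = 60  # start with a conservative (short) refresh interval
history = []
for changed in [False, False, False, True]:
    interval = next_interval(interval, changed)
    history.append(interval)
# history is [120, 240, 480, 240]
```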

11
Pull-Based Approach
  • Advantages
  • Server remains stateless
  • Resilient to both server and proxy failures
  • Disadvantages
  • Weaker consistency guarantees (objects can change
    between two polls and proxy will contain stale
    data until next poll)
  • High message overhead

12
A Hybrid Approach: Leases
  • Lease: duration of time for which the server agrees
    to notify the proxy of modifications
  • Issue lease on first request, send notification
    until expiry
  • Need to renew lease upon expiry
  • Smooth tradeoff between state and messages
    exchanged
  • Zero duration: polling; infinite lease:
    server push
  • Efficiency depends on the lease duration
  • Limited use
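
A small sketch of the lease bookkeeping on the server side. Time is passed in explicitly as `now` to keep the example deterministic; `LeaseServer` and its method names are hypothetical.

```python
class LeaseServer:
    def __init__(self, lease_duration):
        self.lease_duration = lease_duration  # seconds
        self.leases = {}                      # (proxy_id, url) -> expiry time

    def handle_request(self, proxy_id, url, now):
        """Grant or renew a lease when a proxy requests an object."""
        self.leases[(proxy_id, url)] = now + self.lease_duration

    def proxies_to_notify(self, url, now):
        """Proxies holding an unexpired lease on `url` at time `now`."""
        # Expired leases are dropped, so the server's state stays bounded.
        self.leases = {k: exp for k, exp in self.leases.items() if exp > now}
        return sorted(p for (p, u) in self.leases if u == url)

lease_server = LeaseServer(lease_duration=10)
lease_server.handle_request("p1", "/x", now=0)   # lease expires at t=10
lease_server.handle_request("p2", "/x", now=8)   # lease expires at t=18
at_9 = lease_server.proxies_to_notify("/x", now=9)
at_12 = lease_server.proxies_to_notify("/x", now=12)
```

With `lease_duration=0` no proxy is ever notified and each one has to poll; a very large duration approaches pure server push.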

13
Cooperative Caching
  • Caching infrastructure can have multiple web
    proxies
  • Proxies can be arranged in a hierarchy or other
    structures
  • Proxies can cooperate with one another
  • Answer client requests
  • Propagate server notifications
  • Uses a combination of HTTP and ICP (Internet
    Cache Protocol).
  • ICP can be used by one cache to quickly ask
    another cache if it has an object.
  • HTTP is used to actually retrieve the object.
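
The division of labor between the two protocols can be sketched like this. No real ICP or HTTP messages are sent: `icp_query` and `http_get` are stand-in methods on an in-memory `Cache` class invented for this example.

```python
class Cache:
    def __init__(self, name, siblings=()):
        self.name = name
        self.store = {}
        self.siblings = list(siblings)

    def icp_query(self, url):
        """Stand-in for an ICP query: report only whether the object is cached."""
        return url in self.store

    def http_get(self, url):
        """Stand-in for an HTTP retrieval from this cache."""
        return self.store[url]

    def lookup(self, url, fetch_from_origin):
        if url in self.store:
            return self.store[url], self.name
        for sibling in self.siblings:
            if sibling.icp_query(url):        # cheap "do you have it?" question
                body = sibling.http_get(url)  # the actual transfer
                self.store[url] = body
                return body, sibling.name
        body = fetch_from_origin(url)
        self.store[url] = body
        return body, "origin"

cache_b = Cache("B")
cache_b.store["/logo.gif"] = "GIF-bytes"
cache_a = Cache("A", siblings=[cache_b])
from_sibling = cache_a.lookup("/logo.gif", lambda url: "from origin")
from_origin = cache_a.lookup("/other", lambda url: "from origin")
```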

14
Problems
  • Caching proxies serve only their parents and not
    all Internet users.
  • Content providers (say, Web servers) cannot rely
    on existence and correct implementation of
    caching proxies.
  • Accounting issues with caching proxies
  • Example: www.cnn.com needs to know the number of
    hits to the advertisements displayed on the web
    page.

15
Content Distribution Networks (CDN)
  • Business model: a content provider such as
    www.cnn.com or Yahoo pays a CDN company (such as
    Akamai) to get its content to the requesting
    users with short delays.
  • A CDN provides a mechanism for:
  • Replicating content on multiple servers in the
    Internet
  • Providing clients with a means to determine the
    servers that can deliver the content fastest.

16
Terminology
  • Content: any publicly accessible combination of
    text, images, applets, frames, MP3, video, Flash,
    virtual reality objects, etc.
  • Content provider: any individual, organization,
    or company that has content that it wishes to
    make available to users.
  • Origin server: the content provider's server, where
    the content is first uploaded.
  • Surrogate server (sometimes called edge server):
    the content distributor's server, where the
    replicated content is kept.

17
Players of the game
  • Content Provider: Yahoo, MSNBC, CNN
  • Content Distributor: Akamai, Digital Island, AT&T
  • H/W and S/W Vendor: Cisco, Lucent, Inktomi,
    CacheFlow
  • Hosting Provider: Exodus
  • Arrow labels in the diagram: send content, sells
    servers, install servers

18
CDN Distribution
  • The CDN company places hundreds of CDN servers in
    Internet hosting centers.
  • The CDN replicates its customers' content in the
    CDN servers. Whenever a customer updates its
    content (e.g., a web page), the CDN redistributes
    the fresh content to the CDN servers.
  • The CDN provides a mechanism so that when a user
    requests content, the content is provided by the
    CDN server that can most rapidly deliver the
    content to the user.
  • This can be the closest CDN server to the user
    (perhaps in the same ISP as the user) or may be a
    CDN server with a congestion-free path to the
    user.

19
CDN Distribution
[Figure: the origin server in North America pushes content to the Akamai CDN distribution node, which pushes content to CDN servers in South America, Asia, and Europe.]
20
CDN Functional Components
  • Distribution Service
  • Redirection Service
  • Accounting and Billing system

21
CDN Distribution Service
  • The content provider determines which of its
    objects it wants the CDN to distribute.
  • The content provider tags and then pushes this
    content to a CDN node, which in turn replicates
    and pushes the content to all its CDN servers.

22
CDN Distribution Service
  • When a browser in a user's host is instructed to
    retrieve a specific object (specified using a
    URL), how does the browser determine whether it
    should retrieve the object from the origin server
    or from one of the CDN servers?
  • As an example, suppose the hostname of the
    content provider is www.cnn.com
  • Suppose the hostname of the CDN company is
    www.akamai.com

23
CDN Redirection
  • Users get an HTML document from www.cnn.com; this
    could be index.html
  • The file index.html uses a modified URL for
    content that has been replicated.
  • Example: if the GIF files are what has been
    replicated, then the URL of a GIF file
    may be modified as follows
  • af/x.gif
  • The browser needs to resolve the
    aXYZ.g.akamaitech.net hostname for replicated
    content.
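
A sketch of the rewriting step, under stated assumptions: the input URL is a made-up example, the hostname `aXYZ.g.akamaitech.net` is the placeholder used on the slide, and the layout of the rewritten URL (CDN host, then origin host, then original path) is illustrative only and is not claimed to be Akamai's actual format.

```python
from urllib.parse import urlsplit

def rewrite_for_cdn(url, cdn_host, replicated_suffixes=(".gif",)):
    """Point replicated objects at the CDN host; leave everything else alone."""
    parts = urlsplit(url)
    if not parts.path.lower().endswith(replicated_suffixes):
        return url
    # Illustrative layout only: CDN host, then origin host, then original path.
    return f"{parts.scheme}://{cdn_host}/{parts.netloc}{parts.path}"

cdn_host = "aXYZ.g.akamaitech.net"  # placeholder hostname from the slide
image = rewrite_for_cdn("http://www.cnn.com/images/logo.gif", cdn_host)
page = rewrite_for_cdn("http://www.cnn.com/index.html", cdn_host)
```

The HTML page keeps its original URL and is still served by the origin, while the image reference now names a host that the CDN's DNS servers control.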

24
CDN Redirection
  • DNS is configured so that all queries about
    g.akamaitech.net that arrive at a DNS server are
    sent to an authoritative DNS server for
    g.akamaitech.net. This is referred to as an Akamai
    DNS server (authoritative DNS server).
  • When the Akamai DNS server receives the query, it
    extracts the IP address of the requester (in
    practice, usually the browser's local DNS server,
    which sends the query on the browser's behalf).
  • Based on the IP address and information that it
    has about the Internet (called a map), the IP
    address of an Akamai server (surrogate server) is
    returned to the requesting browser according to a
    policy, e.g., select the server that is the fewest
    hops away.
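
A minimal sketch of a "fewest hops" policy. The map contents, the addresses (all taken from reserved documentation ranges), and the fallback server are invented; how Akamai actually builds and consults its map is not described in the slides.

```python
import ipaddress

# Hypothetical "map": requester network -> hop count to each surrogate server.
NETWORK_MAP = {
    ipaddress.ip_network("192.0.2.0/24"): {"198.51.100.10": 3, "203.0.113.10": 9},
    ipaddress.ip_network("203.0.113.0/24"): {"198.51.100.10": 8, "203.0.113.10": 2},
}
DEFAULT_SURROGATE = "198.51.100.10"

def resolve(requester_ip):
    """Return the IP of the surrogate that is the fewest hops from the requester."""
    addr = ipaddress.ip_address(requester_ip)
    for network, hops in NETWORK_MAP.items():
        if addr in network:
            return min(hops, key=hops.get)  # surrogate with the smallest hop count
    return DEFAULT_SURROGATE  # unknown network: fall back to a fixed server

near_first = resolve("192.0.2.55")
near_second = resolve("203.0.113.77")
unknown = resolve("10.0.0.1")
```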

25
CDN Redirection
  • The Akamai DNS server IP address is now in the
    cache of the local DNS server.
  • This implies that it is not always necessary to
    go to the root DNS server.
  • The TTL associated with the IP address of an
    Akamai server (surrogate) is relatively small.
  • This is done for performance reasons.
  • Akamai content distribution servers are caches
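
The effect of a small TTL can be seen in a toy local DNS cache. Once the entry expires, the next lookup goes back to the authoritative server, which is free to return a different surrogate. The class name, the addresses, and the 20-second TTL are assumptions.

```python
class DnsCache:
    """Toy local DNS cache: answers expire after their TTL."""

    def __init__(self, authoritative_lookup):
        self.authoritative_lookup = authoritative_lookup  # name -> (ip, ttl_seconds)
        self.entries = {}                                 # name -> (ip, expires_at)
        self.upstream_queries = 0

    def resolve(self, name, now):
        entry = self.entries.get(name)
        if entry and entry[1] > now:
            return entry[0]  # still fresh: answer from the cache
        self.upstream_queries += 1
        ip, ttl = self.authoritative_lookup(name)
        self.entries[name] = (ip, now + ttl)
        return ip

# The fake authoritative server picks a different surrogate on the second query.
answers = iter([("198.51.100.10", 20), ("203.0.113.10", 20)])
dns_cache = DnsCache(lambda name: next(answers))
first_answer = dns_cache.resolve("aXYZ.g.akamaitech.net", now=0)
cached_answer = dns_cache.resolve("aXYZ.g.akamaitech.net", now=10)
after_expiry = dns_cache.resolve("aXYZ.g.akamaitech.net", now=25)
```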

26
CDN Redirection
  • What if content is not there?
  • If the requested content is not found then the
    surrogate will ask other surrogates within a
    specified region for information.
  • If requested information is still not found or is
    stale, then a request is made to the original web
    site.
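
The lookup order on a miss can be sketched as a single function. Caches are plain dictionaries here, and `is_stale` is a hypothetical hook that by default treats every cached copy as fresh.

```python
def fetch(url, local, regional_peers, origin_fetch, is_stale=lambda body: False):
    """Look locally, then in regional peer surrogates, then at the original web site."""
    body = local.get(url)
    if body is not None and not is_stale(body):
        return body, "local"
    for peer in regional_peers:
        body = peer.get(url)
        if body is not None and not is_stale(body):
            local[url] = body  # keep a copy for later requests
            return body, "peer"
    body = origin_fetch(url)   # not found nearby, or only stale copies found
    local[url] = body
    return body, "origin"

local_store = {}
peers = [{}, {"/a.gif": "A"}]
r1 = fetch("/a.gif", local_store, peers, lambda u: "origin copy")
r2 = fetch("/a.gif", local_store, peers, lambda u: "origin copy")
r3 = fetch("/b.gif", local_store, peers, lambda u: "origin copy")
```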

27
CDN Redirection
[Figure: message flow among the client, its local DNS server, the authoritative DNS server for cdn.com, and CNN.com. Labeled messages: PUT /images/.gif; GET www.cnn.com/index.html, answered with Index.html; DNS query "cdn.com ?", answered with 64.236.24.28; GET /cnn/images/1.gif, answered with 1.gif.]
28
CDN Selection
  • The tricky issue is selecting which local content
    server to use for a particular request
  • Want to spread load evenly
  • Want minimal impact if server is added or
    removed.
  • In Akamai, each surrogate server sends
    measurement results to the Network Operations
    Command Center (NOCC).
  • Measurement results include number of active TCP
    connections, HTTP request arrival rate, bandwidth
    availability, etc.
  • This information is used by the Akamai DNS
    server.
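
One known technique that targets both goals (even load, minimal impact when a server is added or removed) is consistent hashing. The slides do not say what Akamai actually uses, so this is a sketch of the general idea only; `HashRing`, the server names, and the replica count are assumptions.

```python
import bisect
import hashlib

def _point(key):
    """Map a string to a position on the hash ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, servers, replicas=100):
        # Each server gets several ring positions to even out the load.
        self.ring = sorted((_point(f"{s}#{i}"), s) for s in servers for i in range(replicas))
        self.points = [p for p, _ in self.ring]

    def server_for(self, url):
        # The first ring position clockwise from the URL's hash owns the URL.
        i = bisect.bisect(self.points, _point(url)) % len(self.ring)
        return self.ring[i][1]

urls = [f"/img/{n}.gif" for n in range(1000)]
before = HashRing(["s1", "s2", "s3"])
after = HashRing(["s1", "s2", "s3", "s4"])
moved = [u for u in urls if before.server_for(u) != after.server_for(u)]
```

Adding `s4` only moves URLs onto `s4`; every other URL keeps its server, so the caches on `s1` to `s3` stay warm.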

29
Accounting Mechanism
  • Accounting mechanisms collect and track
    information related to request routing,
    distribution and delivery.
  • Information is gathered in real time and put into
    log files for each CDN component.
  • This gets sent to the Network Operations
    Command Center (NOCC).
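
As a rough illustration of what can be computed from such logs, the sketch below counts hits and bytes per content provider. The log line format and the sample entries are hypothetical; real CDN logs carry far more fields.

```python
from collections import Counter

def hits_per_customer(log_lines):
    """Count delivered requests and bytes per content provider.

    Assumed line format (hypothetical): "<surrogate> <customer-host> <path> <bytes>".
    """
    hits = Counter()
    bytes_served = Counter()
    for line in log_lines:
        _surrogate, customer, _path, size = line.split()
        hits[customer] += 1
        bytes_served[customer] += int(size)
    return hits, bytes_served

sample_log = [
    "edge1 www.cnn.com /ads/banner.gif 1200",
    "edge2 www.cnn.com /ads/banner.gif 1200",
    "edge1 www.example.com /logo.gif 300",
]
hits, bytes_served = hits_per_customer(sample_log)
```

Merging the logs of all surrogates in one place is what lets a content provider see the total number of hits, which a set of independent caching proxies could not report.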

30
Full Site Delivery vs. Partial Site Delivery
  • Full site delivery: all content is delivered
    by the CDN (including HTML, images, and other
    objects).
  • Partial site delivery: only images, streaming
    media, and other bandwidth-intensive objects
    are delivered by the CDN.

31
CDNs and Content
  • Content suitable for CDNs
  • Images
  • Streaming media
  • Java applets
  • Static information
  • Content not suitable
  • Dynamic information
  • Personalized information

32
Current Akamai Customers
33
Summary
  • We have examined replication and issues related
    to the design and implementation of a replicated
    system.
  • Many choices and tradeoffs to consider