Scalable Web Server Clustering Technologies - PowerPoint PPT Presentation

Description:

Trevor Schroeder, Steve Goddard, and Byrav Ramamurthy University of ... Client/server transactions should be relatively short and high in frequency. 3/27/09 ...

1
Scalable Web Server Clustering Technologies
  • Trevor Schroeder, Steve Goddard, and Byrav
    Ramamurthy, University of Nebraska-Lincoln
  • IEEE Network, Volume 14, Issue 3, May-June 2000

2
Outline
  • Introduction
  • Terminology
  • L4/2 Clustering
  • L4/3 Clustering
  • L7 Clustering
  • Conclusions

3
Introduction
  • Background: Growth of the Internet, dynamic
    content, and an increasing number of users force
    us to find faster Web servers.
  • In the past, we replaced the Web server with a
    faster machine (processor).
  • Drawbacks
  • Short-term fix (Moore's Law: the number of
    transistors per integrated circuit doubles
    roughly every 18 months).
  • Expensive: we need to replace almost the
    whole machine.
  • Solution: Add more processors or machines to the
    Web server. (These are commodity hardware and
    software, so we can keep the past investment.)

4
Introduction (cont.)
  • Any server application may be clustered as long
    as it fulfills the following two properties:
  • The application must maintain no state on the
    server.
  • Client/server transactions should be relatively
    short and high in frequency.

5
Terminology
  • L4/2: Layer 4 switching with Layer 2 packet
    forwarding. The servers share an identical
    Layer 3 (network) address but have unique MAC
    addresses.
  • L4/3: Layer 4 switching with Layer 3 packet
    forwarding. The servers offer identical Layer 4
    (transport) services but have unique network
    addresses.
  • Layer 7 switch: Makes forwarding decisions based
    on the content of client requests. It can employ
    L4/2 or L4/3 forwarding.

6
Terminology (cont.)
  • Client-side transparency: The whole cluster
    appears to clients as a single host because of
    the dispatcher.
  • Server-side transparency: Each cluster server
    runs a standard Web server designed for a
    standalone machine. It serves requests forwarded
    by the dispatcher exactly as if they came
    directly from the clients.
  • Performance index: Connections per second or
    bits per second. (Cluster maximum utilization)

7
L4/2 Clustering
  • The cluster's IP address (A) is shared by the
    dispatcher and servers through the use of primary
    and secondary IP addresses. (Background: each
    host can have several IP addresses.)
  • The dispatcher's primary IP address is A.
  • The servers use A as a secondary address.
  • All packets destined for A are forwarded to the
    dispatcher through the use of the Address
    Resolution Protocol (ARP) in the nearest
    gateway/router.

8
Technology Specification
  • Load-sharing algorithm: Round-robin or other
    policies.
  • Session map: If an incoming packet belongs to an
    established connection in the map, forward it to
    the previously selected server. If it is a
    connection initiation (SYN), select a server and
    save the connection in the map. A packet that
    matches no connection and does not contain a SYN
    may be discarded.
  • Backup method: Guards against failure of the
    dispatcher and servers.
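
The session-map rule above can be sketched in a few lines of Python. This is only an illustration, not the implementation of any system named in these slides: the class name Dispatcher, the (client IP, client port) key, and the choice to always discard unmatched non-SYN packets are assumptions made for the example.

```python
from itertools import cycle


class Dispatcher:
    """Toy L4 dispatcher: a session map plus round-robin server selection."""

    def __init__(self, servers):
        self._next_server = cycle(servers)  # round-robin policy (assumed)
        self.session_map = {}               # (client_ip, client_port) -> server

    def dispatch(self, client_ip, client_port, syn):
        """Return the server for this packet, or None if it is discarded."""
        key = (client_ip, client_port)
        if key in self.session_map:
            # Packet belongs to an established connection.
            return self.session_map[key]
        if syn:
            # Connection initiation: pick a server and remember the choice.
            server = next(self._next_server)
            self.session_map[key] = server
            return server
        # No SYN and no known connection: this sketch discards the packet.
        return None

    def close(self, client_ip, client_port):
        """Forget a finished connection (for example after FIN or RST)."""
        self.session_map.pop((client_ip, client_port), None)


d = Dispatcher(["server1", "server2", "server3"])
first = d.dispatch("10.0.0.7", 40000, syn=True)    # new connection
again = d.dispatch("10.0.0.7", 40000, syn=False)   # same connection
other = d.dispatch("10.0.0.8", 40001, syn=True)    # next server in rotation
stray = d.dispatch("10.0.0.9", 40002, syn=False)   # unknown, no SYN
print(first, again, other, stray)
```

Packets of one connection always reach the same server, while new connections rotate through the pool.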

9
L4/2 Traffic Flow
10
L4/2 Traffic Flow (cont.)
  • A client sends a request to A.
  • The router sends the request to the dispatcher.
  • Based on the load-sharing algorithm, the
    dispatcher selects an actual server (server 2)
    to serve the client and forwards the request to
    it by rewriting the destination MAC address.
  • Server 2 replies to the client directly.
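
The key property of the L4/2 flow above is that the dispatcher touches only the Layer 2 destination, leaving the IP packet unchanged. A minimal sketch follows, assuming a simplified frame model; the Frame type, the function l42_forward, and all addresses are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Frame:
    dst_mac: str    # Layer 2 destination
    src_ip: str     # Layer 3 source (the client)
    dst_ip: str     # Layer 3 destination (cluster address A)
    payload: bytes


def l42_forward(frame, server_mac):
    """Forward at Layer 2: only the destination MAC changes."""
    return replace(frame, dst_mac=server_mac)


CLUSTER_IP = "192.0.2.10"            # hypothetical cluster address A
DISPATCHER_MAC = "02:00:00:00:00:ff" # hypothetical dispatcher MAC
SERVER2_MAC = "02:00:00:00:00:02"    # hypothetical MAC of server 2

incoming = Frame(DISPATCHER_MAC, "198.51.100.5", CLUSTER_IP,
                 b"GET / HTTP/1.0\r\n\r\n")
forwarded = l42_forward(incoming, SERVER2_MAC)
print(forwarded.dst_mac, forwarded.dst_ip)
```

Because the destination IP is still A, which server 2 holds as a secondary address, the server accepts the packet and can answer the client with A as the source.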

11
Advantage vs. Disadvantage
  • Advantages
  • Servers reply to clients directly, which keeps
    the dispatcher from becoming a bottleneck.
  • No need to recalculate checksums, because
    forwarding operates at Layer 2.
  • Disadvantage
  • There must be a direct physical connection
    between the dispatcher and all servers.

12
ONE-IP (Bell Labs, 1996)
  • Load-sharing algorithm
  • Routing-based dispatching: Hash the incoming
    client's address to get a number that indicates
    which server will service the request.
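
Routing-based dispatching as described above can be illustrated with a very small hash. The slides do not say which hash function ONE-IP uses, so the modulo over the integer form of the address below is purely an assumption, as is the function name pick_server.

```python
import ipaddress


def pick_server(client_ip, num_servers):
    """Map a client address to a server index (assumed hash: modulo)."""
    return int(ipaddress.ip_address(client_ip)) % num_servers


index_a = pick_server("198.51.100.5", 4)
index_b = pick_server("198.51.100.6", 4)
print(index_a, index_b)
```

The same client address always maps to the same server, so no per-connection state is needed in the dispatcher.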

13
ONE-IP (cont.)
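  • Broadcast-based dispatching: Each server owns a
    fixed and disjoint portion of the client address
    space. Every packet is broadcast to all servers,
    and each server accepts only packets from its
    own portion.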
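
Broadcast-based dispatching moves the decision from the dispatcher to a filter on each server. The sketch below assumes the address space is split by a modulo over the client address; the function name accepts and the sample addresses are hypothetical.

```python
import ipaddress


def accepts(server_index, num_servers, client_ip):
    """Address filter run on each server (assumed split: modulo)."""
    return int(ipaddress.ip_address(client_ip)) % num_servers == server_index


NUM_SERVERS = 3
clients = ["198.51.100.5", "198.51.100.6", "203.0.113.9"]

# Every server sees every packet; exactly one server keeps each of them.
owners = {
    client: [s for s in range(NUM_SERVERS) if accepts(s, NUM_SERVERS, client)]
    for client in clients
}
print(owners)
```

Since the portions are disjoint and cover the whole space, each packet has exactly one owner.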

14
ONE-IP (cont.)
  • Drawback: Cannot adapt when client requests are
    disproportionately distributed.
  • Backup: Watchdog daemon, watchd
  • Dispatcher failure: The backup dispatcher
    notices the missing heartbeat of the primary
    dispatcher and takes over.
  • Server failure: Reconfigure the hash table or
    the address filters on the other servers.

15
L4/3 Clustering
  • The dispatcher appears as a single host to
    clients and as a gateway to the servers (IP
    address A).
  • Each server has its own IP address, which can be
    globally or locally unique (IP addresses
    B1, B2, ..., Bn).
  • Load-sharing algorithm: Round-robin or other
    algorithms.
  • Keeps a session map table.

16
L4/3 Clustering (cont.)
17
L4/3 Clustering (cont.)
  • A client sends a request with A as the
    destination.
  • The packet arrives at the dispatcher.
  • Based on the load-sharing algorithm and session
    table, the dispatcher selects a server, rewrites
    the destination IP address, recalculates the
    checksums, and forwards the packet to the server.
  • The server sends its reply toward the client
    through the dispatcher, which is its gateway.
  • The dispatcher rewrites the source IP address of
    the reply to A, recalculates the checksums, and
    forwards it to the client.
  • Disadvantages
  • Checksums (IP and TCP) must be recalculated
    twice, once for the request and once for the
    reply.
  • All traffic flows through the dispatcher.
    (Bottleneck)
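
To make the checksum cost of the L4/3 steps above concrete, the sketch below rewrites the destination address of a minimal IPv4 header and recomputes the header checksum with the standard Internet checksum (RFC 1071). The helper names and all addresses are assumptions for illustration. A real dispatcher would also have to update the TCP checksum, because the TCP pseudo-header includes the IP addresses; that part is omitted here.

```python
import socket
import struct

IPV4_HEADER = "!BBHHHBBH4s4s"  # 20-byte header without options


def internet_checksum(data):
    """16-bit one's-complement sum of 16-bit words (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_ipv4_header(src, dst, payload_len=0):
    """Minimal IPv4 header carrying TCP, with a valid checksum."""
    fields = [0x45, 0, 20 + payload_len, 0, 0, 64, socket.IPPROTO_TCP, 0,
              socket.inet_aton(src), socket.inet_aton(dst)]
    fields[7] = internet_checksum(struct.pack(IPV4_HEADER, *fields))
    return struct.pack(IPV4_HEADER, *fields)


def rewrite_destination(header, new_dst):
    """Replace the destination address and recalculate the checksum."""
    zeroed = (header[:10] + b"\x00\x00" + header[12:16]
              + socket.inet_aton(new_dst))
    checksum = internet_checksum(zeroed)
    return zeroed[:10] + struct.pack("!H", checksum) + zeroed[12:]


# Client -> cluster address A, rewritten to a hypothetical server address B1.
request = build_ipv4_header("198.51.100.5", "192.0.2.10")
rewritten = rewrite_destination(request, "10.0.0.2")
print(internet_checksum(request), internet_checksum(rewritten))
```

A header with a correct checksum sums to zero under internet_checksum, which gives a quick validity check before and after the rewrite. The reply path performs the mirror-image rewrite on the source address.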

18
Magicrouter
  • University of California at Berkeley, 1996
  • Fast packet interposing, with modifications to
    the kernel
  • Load-sharing algorithms
  • Round robin
  • Random
  • Incremental load
  • Backup
  • Dispatcher: primary/backup model.
  • Server: Uses ARP to map server IP addresses to
    MAC addresses, which also detects failed servers.

19
LocalDirector (Cisco, 1996)
  • Load-sharing algorithms
  • Least connections: choose the server with the
    fewest connections.
  • Fastest response: choose the server that
    responds to the request first.
  • Round-robin: strict round-robin policy.
  • Backup
  • Dispatcher: an extra LocalDirector unit linked
    to the primary one with a special failover cable.
  • Server: Contact servers periodically. When one
    fails, remove it and keep contacting it; when it
    comes back up, add it to the server pool again.
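
The least-connections policy and the periodic server check listed above can be combined in a small sketch. It is not Cisco's implementation; the class ServerPool, its method names, and the probe results passed in by hand are all assumptions made for the example.

```python
class ServerPool:
    """Toy pool: least-connections selection plus up/down tracking."""

    def __init__(self, servers):
        self.connections = {s: 0 for s in servers}  # active connections
        self.down = set()                           # servers that failed a probe

    def pick_least_connections(self):
        candidates = [s for s in self.connections if s not in self.down]
        if not candidates:
            raise RuntimeError("no servers available")
        server = min(candidates, key=lambda s: self.connections[s])
        self.connections[server] += 1
        return server

    def release(self, server):
        """Call when a connection to the server ends."""
        self.connections[server] -= 1

    def record_probe(self, server, alive):
        """Result of a periodic health check (the probe itself is not shown)."""
        if alive:
            self.down.discard(server)
        else:
            self.down.add(server)


pool = ServerPool(["s1", "s2"])
a = pool.pick_least_connections()       # both idle, first server wins the tie
b = pool.pick_least_connections()       # s2 now has fewer connections
pool.record_probe("s1", alive=False)    # s1 fails a check and is removed
c = pool.pick_least_connections()       # only s2 is eligible
pool.record_probe("s1", alive=True)     # s1 recovers and rejoins the pool
d = pool.pick_least_connections()       # s1 has fewer connections than s2
print(a, b, c, d)
```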

20
LSNAT
  • University of Nebraska-Lincoln
  • RFC 2391: Load Sharing using IP Network Address
    Translation (LSNAT)
  • Backup
  • Dispatcher: Select one server as the new
    dispatcher. A distributed state reconstruction
    mechanism rebuilds the map of existing
    connections.
  • Server: Exclude a failed server from the active
    server pool. When it comes back up, include it
    again.

21
L7 Clustering
  • Makes dispatch decisions based on request
    content (application layer).
  • Content-based dispatching
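
Content-based dispatching can be illustrated by routing on the URL path of the HTTP request line. The prefix table, the server-group names, and the function route_request below are hypothetical; real L7 switches use their own rule formats.

```python
ROUTES = [                       # hypothetical prefix -> server group table
    ("/images/", "static-servers"),
    ("/cgi-bin/", "dynamic-servers"),
]
DEFAULT_GROUP = "general-servers"


def route_request(request_line):
    """Pick a server group from the path in an HTTP request line."""
    parts = request_line.split()
    if len(parts) < 2:
        raise ValueError("malformed request line")
    path = parts[1]
    for prefix, group in ROUTES:
        if path.startswith(prefix):
            return group
    return DEFAULT_GROUP


static = route_request("GET /images/logo.gif HTTP/1.0")
dynamic = route_request("GET /cgi-bin/search?q=cluster HTTP/1.0")
other = route_request("GET /index.html HTTP/1.0")
print(static, dynamic, other)
```

Unlike L4 dispatching, the dispatcher must accept the connection and read the request before it can choose a server, which is where the extra cost of L7 clustering comes from.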

22
LARD
  • Locality-Aware Request Distribution, Rice
    University
  • It uses a TCP handoff protocol with a modified
    kernel.
  • Different servers process different kinds of
    requests, which makes it possible to use
    specialized servers.
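
The idea of sending the same kind of request to the same server can be sketched as follows. This is a simplified illustration in the spirit of locality-aware distribution, not the LARD authors' code: the class name, the use of active connections as the load measure, and the threshold values T_LOW and T_HIGH are assumptions.

```python
T_LOW, T_HIGH = 25, 65   # assumed load thresholds (active connections)


class LardDispatcher:
    """Keeps each target on one server unless that server is overloaded."""

    def __init__(self, servers):
        self.load = {s: 0 for s in servers}   # active connections per server
        self.assigned = {}                    # target (URL) -> server

    def _least_loaded(self):
        return min(self.load, key=self.load.get)

    def dispatch(self, target):
        server = self.assigned.get(target)
        if server is None:
            # First request for this target: start on the least loaded server.
            server = self._least_loaded()
        else:
            imbalance = (self.load[server] > T_HIGH
                         and min(self.load.values()) < T_LOW)
            if imbalance or self.load[server] >= 2 * T_HIGH:
                # Give up locality to relieve the overloaded server.
                server = self._least_loaded()
        self.assigned[target] = server
        self.load[server] += 1
        return server

    def finish(self, server):
        """Call when a request served by this server completes."""
        self.load[server] -= 1


lard = LardDispatcher(["s1", "s2"])
x = lard.dispatch("/a")      # new target, goes to the first idle server
y = lard.dispatch("/b")      # new target, goes to the less loaded server
z = lard.dispatch("/a")      # known target stays where its data is cached
lard.load["s1"] = 70         # simulate s1 becoming overloaded
moved = lard.dispatch("/a")  # target is reassigned to the lightly loaded server
print(x, y, z, moved)
```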

23
Web Accelerator (IBM)
  • The accelerator can now perform content-based
    routing in which it makes intelligent decisions
    about where to route requests based on the URL.
  • L7 based on L4/2
  • Web page caching
  • The dispatcher serves as a gateway/router.
  • All traffic flows through the dispatcher.

24
ArrowPoint
  • Content-based dispatching policy
  • Caching mechanism is similar to Web Accelerator
  • Sticky connection
  • Hot standby for the dispatcher and a
    failure-detection mechanism for server nodes.

25
Conclusion (1/4)
  • L4/2 Clustering
  • Bottleneck: power of the dispatcher to process
    incoming requests
  • Advantage: Sustainable request rate.
  • L4/3 Clustering
  • Bottleneck: recalculation of checksums.
  • L7 Clustering
  • Bottleneck: complexity of the content-based
    dispatching algorithm
  • Advantage: Localizing the request space and
    caching request results.

26
Conclusion (2/4)
  • Qualitative comparison
  • Client-based approach
  • Advantage: Reduces the load on the Web server by
    implementing the routing service on the client
    side.
  • Disadvantage: Not generally applicable, and it
    needs server-side cooperation.
  • Dispatcher-based approach
  • Advantage: Full control of client requests to
    achieve good load balancing. Easy to implement.
  • Disadvantage: Risk of dispatcher bottleneck.

27
Conclusion (3/4)
  • DNS-based approach
  • Advantage: High scalability. No risk of
    bottleneck.
  • Disadvantages
  • Because of address caching mechanisms,
    sophisticated algorithms are needed to achieve
    load balancing.
  • Fewer than 32 Web servers for each public URL
    because of the limit on UDP packet size.
  • Server-based approach
  • Advantage: No risk of a single point of failure
    or bottleneck.
  • Disadvantage: Redirection increases latency for
    clients.
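
The limit of fewer than 32 servers in the DNS-based approach above can be checked with rough arithmetic. The calculation below is one plausible reading of that claim, assuming the classic 512-byte limit for DNS messages over UDP, a reply containing only A records, and name compression in every answer; the function names and the sample host name are hypothetical.

```python
DNS_UDP_LIMIT = 512   # classic maximum DNS message size over UDP, in bytes
HEADER_SIZE = 12      # fixed DNS header
# Compressed name (2) + type (2) + class (2) + TTL (4) + length (2) + IPv4 (4)
A_RECORD_SIZE = 16


def question_size(hostname):
    """Encoded name (length-prefixed labels + root byte) plus type and class."""
    labels = hostname.rstrip(".").split(".")
    return sum(1 + len(label) for label in labels) + 1 + 4


def max_a_records(hostname):
    """How many A records fit in one UDP reply for this host name."""
    available = DNS_UDP_LIMIT - HEADER_SIZE - question_size(hostname)
    return available // A_RECORD_SIZE


upper_bound = DNS_UDP_LIMIT // A_RECORD_SIZE
fits = max_a_records("www.example.com")
print(upper_bound, fits)
```

Even with no overhead at all, 512 / 16 gives 32 records, so once the header and question are counted the number of addresses per name is always below 32.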

28
Conclusion (4/4)