Title: This paper presents several key techniques for
2Introduction
- This paper presents several key techniques for designing Web sites that need to handle large request volumes and provide high availability.
- It also gives an overview of how many of these techniques were deployed at the official Web site for the 1998 Olympic Winter Games in Nagano, Japan.
3Topic of Discussion
- Redundant Hardware and Load Balancing
- Web Server Acceleration
- Efficient Management Of Dynamic Data
- 1998 Olympic games site
4Topic One Redundant Hardware and Load Balancing
- Redundant hardware: multiple servers running on different computers (143 processors for the 1998 Olympic Games site)
- Load balancing
- Round-robin Domain Name Server (RR-DNS) approach: a single domain name is associated with multiple IP addresses. Client requests specifying the domain name are mapped to servers in round-robin fashion.
- Problems
- Server-side caching: load imbalance even with a specified TTL
- Client-side caching: load imbalance, lower mean loads
- Node failures: difficult to provide availability
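The caching problem listed above can be illustrated with a small simulation. This is a minimal sketch, not the mechanism of any real DNS server: the class names, the three IP addresses, and the request counts are all assumptions made for illustration. It shows how a resolver that caches one answer for the whole TTL sends all of its clients to a single server.

```python
from collections import Counter
from itertools import cycle


class RoundRobinDNS:
    """Hands out the IP addresses registered for one domain name in rotation."""

    def __init__(self, addresses):
        self._rotation = cycle(addresses)

    def resolve(self):
        return next(self._rotation)


class CachingResolver:
    """Hypothetical intermediate resolver that reuses one answer until its TTL expires."""

    def __init__(self, upstream, ttl):
        self.upstream = upstream
        self.ttl = ttl
        self._cached = None
        self._expires = 0

    def resolve(self, now):
        if self._cached is None or now >= self._expires:
            self._cached = self.upstream.resolve()
            self._expires = now + self.ttl
        return self._cached


def simulate(requests_per_resolver, ttl):
    """Return the number of requests each IP receives.

    requests_per_resolver holds one request count per caching resolver;
    each resolver issues one request per time unit.
    """
    dns = RoundRobinDNS(["10.0.0.1", "10.0.0.2", "10.0.0.3"])  # assumed addresses
    load = Counter()
    for count in requests_per_resolver:
        resolver = CachingResolver(dns, ttl)
        for t in range(count):
            load[resolver.resolve(now=t)] += 1
    return load


if __name__ == "__main__":
    print("long TTL :", dict(simulate([90, 5, 5], ttl=100)))
    print("short TTL:", dict(simulate([90, 5, 5], ttl=1)))
```

With a long TTL the busiest resolver pins all of its requests to one address; with a very short TTL the load evens out, at the price of many more lookups.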
5- TCP routing: a node of the cluster serves as a router, forwarding client requests to the server nodes in the cluster in round-robin order
- Advantages
- Using different load-based algorithms
- Detecting Web server node failures
- Achieving good scalability (when combined with RR-DNS)
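A round-robin router that skips failed nodes can be sketched as follows. This is only an illustration of the idea on this slide, assuming a hypothetical `TcpRouter` class and an external health check that calls `mark_down` and `mark_up`; it does not forward real TCP packets and is not how any commercial router is implemented.

```python
class TcpRouter:
    """Picks server nodes in round-robin order, skipping nodes marked as down."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._next = 0  # index of the next node to try

    def mark_down(self, node):
        """Called by an assumed health check when a node stops responding."""
        self.healthy.discard(node)

    def mark_up(self, node):
        """Called when a known node recovers."""
        if node in self.nodes:
            self.healthy.add(node)

    def route(self):
        """Return the node that should receive the next client request."""
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next]
            self._next = (self._next + 1) % len(self.nodes)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy server nodes")


if __name__ == "__main__":
    router = TcpRouter(["node-a", "node-b", "node-c"])
    print([router.route() for _ in range(4)])
    router.mark_down("node-b")
    print([router.route() for _ in range(3)])
```

Replacing the body of `route` with a choice based on reported load would give one of the load-based algorithms mentioned above.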
6Topic One Redundant Hardware and Load Balancing
- Commercially available TCP routers
- IBM, Radware, Resonate, and Cisco
- Cisco's LocalDirector: the packets returned from the servers go through the router
- IBM's Network Dispatcher (ND)
- Running under an embedded OS, it can route 10,000 HTTP requests/sec
7Topic One Redundant Hardware and Load Balancing
- An embedded OS improves router performance by optimizing the TCP communications stack and eliminating the scheduler and interrupt processing overheads of a general-purpose OS
- Requests can be routed with an affinity toward a specific server. This avoids generating multiple session keys for requests encrypted using Secure Sockets Layer (SSL)
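The benefit of affinity routing can be shown with a toy model. The sketch below assumes, as a simplification, that each server must negotiate its own session key with a client the first time it sees that client; `AffinityRouter` and `count_key_negotiations` are hypothetical names, and no real SSL handshake takes place.

```python
from itertools import cycle


class AffinityRouter:
    """Round-robin router that can pin each client to the server it first reached."""

    def __init__(self, servers, use_affinity=True):
        self._rotation = cycle(servers)
        self.use_affinity = use_affinity
        self._sticky = {}  # client id -> server chosen for that client

    def route(self, client):
        if self.use_affinity and client in self._sticky:
            return self._sticky[client]
        server = next(self._rotation)
        self._sticky[client] = server
        return server


def count_key_negotiations(router, clients):
    """Count requests that land on a server holding no session key for the client."""
    known = set()  # (server, client) pairs that already share a key
    negotiations = 0
    for client in clients:
        pair = (router.route(client), client)
        if pair not in known:
            known.add(pair)
            negotiations += 1
    return negotiations


if __name__ == "__main__":
    servers = ["s1", "s2", "s3"]
    requests = ["alice", "bob"] * 3
    print(count_key_negotiations(AffinityRouter(servers, use_affinity=False), requests))
    print(count_key_negotiations(AffinityRouter(servers, use_affinity=True), requests))
```

In this model, plain round-robin makes every one of the six requests negotiate a new key, while affinity routing needs only one negotiation per client.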
8Topic Two Web server Accelerators
- Description: a Web site cache that serves frequently requested pages
9Topic Two Web server Accelerators
- IBM's Web server accelerator
- Runs under an embedded operating system
- Serves pages at high request rates (5,000 pages/sec)
- An API allows caching of dynamic pages
- To reduce cache miss overhead, persistent TCP connections need to be kept between the cache and the server
- Operates in one of two modes: transparent or dynamic
10Topic Two Web server Accelerators
- Performance: measured for a cache on a uniprocessor 200-MHz PowerPC, using least recently used (LRU) cache replacement.
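LRU replacement itself is simple to sketch. The class below is an illustrative page cache built on Python's `OrderedDict`, not the accelerator's actual implementation; `LRUPageCache` and the `fetch_from_server` callback are assumed names standing in for the cache and for a request to the back-end Web server.

```python
from collections import OrderedDict


class LRUPageCache:
    """Fixed-capacity page cache that evicts the least recently used page."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._pages = OrderedDict()  # oldest entry first, most recent last
        self.hits = 0
        self.misses = 0

    def get(self, url, fetch_from_server):
        """Return the page for url, fetching it from the server on a miss."""
        if url in self._pages:
            self._pages.move_to_end(url)  # mark as most recently used
            self.hits += 1
            return self._pages[url]
        self.misses += 1
        page = fetch_from_server(url)
        self._pages[url] = page
        if len(self._pages) > self.capacity:
            self._pages.popitem(last=False)  # drop the least recently used page
        return page

    def cached_urls(self):
        """URLs currently cached, least recently used first."""
        return list(self._pages)


if __name__ == "__main__":
    cache = LRUPageCache(capacity=2)
    for url in ["/a", "/b", "/a", "/c"]:
        cache.get(url, lambda u: "<html>" + u + "</html>")
    print(cache.cached_urls(), cache.hits, cache.misses)
```

In the usage above, `/b` is evicted when `/c` arrives because `/a` was touched more recently.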
11Topic Three Efficient Dynamic Data Serving
- Dynamic pages are essential at sites that provide frequently changing data, but the CPU overhead associated with repeatedly generating them can cause a performance bottleneck.
- Caching can improve performance, but it raises the problem of determining which pages to cache and when they become obsolete.
12Topic Three Efficient Dynamic Data Serving
- Data Update Propagation (DUP) was developed for cache management.
- DUP maintains dependencies between cached objects and underlying data
- A trigger monitor program can detect changes to the data, and the system can then invalidate or update cached objects that are obsolete
13Topic Three Efficient Dynamic Data Serving
- Dependencies are represented by a directed graph, the object dependence graph (ODG), wherein a vertex usually represents an object or underlying data. An edge from a vertex v to another vertex u indicates that a change to v also affects u.
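The ODG described on this slide can be sketched as an adjacency map with a breadth-first traversal. This is a minimal illustration under stated assumptions, not the DUP implementation: the class and function names, the plain-dict cache, and the vertex labels in the usage example are all hypothetical, and the sketch only invalidates stale objects rather than updating them in place.

```python
from collections import defaultdict, deque


class ObjectDependenceGraph:
    """Directed graph: an edge v -> u means a change to v also affects u."""

    def __init__(self):
        self._edges = defaultdict(set)

    def add_dependency(self, source, affected):
        self._edges[source].add(affected)

    def affected_by(self, changed):
        """Return every vertex reachable from the changed vertex."""
        seen = set()
        queue = deque([changed])
        while queue:
            v = queue.popleft()
            for u in self._edges.get(v, ()):
                if u not in seen and u != changed:
                    seen.add(u)
                    queue.append(u)
        return seen


def propagate_update(odg, cache, changed):
    """Invalidate cached objects made obsolete by a change; return their keys."""
    stale = {key for key in odg.affected_by(changed) if key in cache}
    for key in stale:
        del cache[key]
    return stale


if __name__ == "__main__":
    odg = ObjectDependenceGraph()
    odg.add_dependency("db:medals", "fragment:medal_table")
    odg.add_dependency("fragment:medal_table", "page:/home")
    odg.add_dependency("fragment:medal_table", "page:/medals")
    odg.add_dependency("db:schedule", "page:/schedule")
    cache = {"page:/home": "...", "page:/medals": "...", "page:/schedule": "..."}
    print(sorted(propagate_update(odg, cache, "db:medals")))
    print(sorted(cache))
```

Only the pages that depend on the changed data are removed; unrelated pages stay cached, which is what keeps hit rates high.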
14Topic Three Efficient Dynamic Data Serving
- Interfaces for creating dynamic data
- The interface for invoking server programs that create dynamic pages has a significant effect on performance
- Common Gateway Interface (CGI) creates a new process to handle each request, which incurs considerable overhead
- FastCGI establishes long-running processes to handle Web requests, but incurs some communication overhead between the Web server and the process
15Topic Three Efficient Dynamic Data Serving
- IBM's GWAPI, Netscape's NSAPI, and Microsoft's ISAPI, as well as Apache's modules, all run server tasks in separate threads. Unfortunately, these interfaces can be tricky to use in practice, with issues such as portability, thread safety, and memory management.
- More recent approaches, such as IBM's JSP, Microsoft's ASP, Java Servlets, and Apache's mod_perl, hide those interfaces and the issues of thread safety and also provide built-in garbage collection, thus easing program creation, maintenance, and portability.
16Topic Four 1998 Olympic Games Site
- The 1998 Winter Games Web site's architecture was an outgrowth of experience with the 1996 Summer Games Web site.
- A key objective for the 1998 site was to reduce hits by giving clients the information on the home page for the current day. Redesign of the pages led to at least a three-fold decrease in the hit rate. The 1998 server logs suggest that more than 25% of the users found the information with a single hit.
17Topic Four 1998 Olympic Games Site
- Site Architecture
- Utilized 13 IBM Scalable Power Parallel (SP2) systems at four complexes scattered around the world, containing 143 processors, 78 Gbytes of memory, and more than 2.4 terabytes of disk space, for high performance and availability. 100% availability was achieved by using replicated information and redundant hardware.
- Dynamic pages were created via the FastCGI interface and cached using the DUP algorithm, achieving cache hit rates of better than 97%. The 1996 Web site did not employ DUP; many current pages were invalidated in the process of ensuring that all stale pages were removed, and the hit rates were around 80%.
- Prefetching was another key component in achieving near-100% hit rates
18Topic Four 1998 Olympic Games Site
- System Architecture
- Web pages were served from four locations: Schaumburg, Illinois (4 SP2s); Columbus, Ohio (3 SP2s); Bethesda, Maryland (3 SP2s); and Tokyo, Japan (3 SP2s).
- Each SP2 was composed of 10 RISC/6000 uniprocessors (UPs) and 1 RISC/6000 8-way SMP. Each UP had 512 Mbytes of memory and approximately 18 Gbytes of disk space. Each SMP had 1 Gbyte of memory and approximately 6 Gbytes of disk space. Numerous machines at each location were also dedicated to maintenance, support, file serving, networking, routing, and various other functions
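The per-SP2 figures on this slide can be checked against the site-wide totals quoted earlier (13 SP2 systems, 78 Gbytes of memory, more than 2.4 terabytes of disk). The short calculation below is a sketch using only numbers from the slides; the constant and function names are made up for illustration, and 512 Mbytes is taken as 0.5 Gbyte.

```python
# SP2 systems per location, as listed on the slide.
SP2_PER_SITE = {"Schaumburg": 4, "Columbus": 3, "Bethesda": 3, "Tokyo": 3}

UP_PER_SP2, SMP_PER_SP2 = 10, 1          # nodes in each SP2
UP_MEMORY_GB, SMP_MEMORY_GB = 0.5, 1.0   # 512 Mbytes and 1 Gbyte
UP_DISK_GB, SMP_DISK_GB = 18, 6          # approximate disk per node


def site_totals():
    """Aggregate the per-node figures over all SP2 systems."""
    systems = sum(SP2_PER_SITE.values())
    return {
        "sp2_systems": systems,
        "nodes": systems * (UP_PER_SP2 + SMP_PER_SP2),
        "memory_gb": systems * (UP_PER_SP2 * UP_MEMORY_GB + SMP_PER_SP2 * SMP_MEMORY_GB),
        "disk_gb": systems * (UP_PER_SP2 * UP_DISK_GB + SMP_PER_SP2 * SMP_DISK_GB),
    }


if __name__ == "__main__":
    for name, value in site_totals().items():
        print(name, value)
```

The totals come out to 13 systems, 78 Gbytes of memory, and 2,418 Gbytes (about 2.4 terabytes) of disk, consistent with the architecture slide. The node count, 13 × 11 = 143, matches the figure that the slides quote as the number of processors.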
19Topic Four 1998 Olympic Games Site
- Data flow from Nagano to the internet and the
scoring
20Topic Four 1998 Olympic Games Site
- Data flow from the master database to the
internet server
21Topic Four 1998 Olympic Games Site
- Local load balancing and high availability
- IBM's Network Dispatchers were used as load balancers (LBs)
- The LB servers ran the gated routing daemon, which was configured to advertise IP addresses as routes to the routers via a dynamic routing protocol. Each LB was assigned a different cost based on whether it was the primary or secondary server for an IP address. The secondary server had the higher cost.
- The routers redistributed these routes into the network. Incoming requests were routed to the LB that was the primary source for the address at the closest complex. Only if the primary LB was down would the request go to the secondary LB.
- Each LB server was connected to a pool of front-end Web servers dispersed among the SP2s at each site. Traffic was distributed among the Web servers based on information from advisors.
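The primary/secondary failover described on this slide amounts to choosing the live route with the lowest advertised cost. The sketch below illustrates that selection rule only; the `Route` record, the cost values, and the LB names are assumptions, and no real routing protocol or gated configuration is involved.

```python
from dataclasses import dataclass


@dataclass
class Route:
    """A route to one IP address as advertised by a load balancer (hypothetical record)."""
    load_balancer: str
    cost: int
    alive: bool = True


def select_load_balancer(routes):
    """Pick the live load balancer advertising the lowest cost for an address."""
    live = [route for route in routes if route.alive]
    if not live:
        raise RuntimeError("no load balancer is advertising this address")
    return min(live, key=lambda route: route.cost).load_balancer


if __name__ == "__main__":
    routes = [Route("lb-primary", cost=10), Route("lb-secondary", cost=20)]
    print(select_load_balancer(routes))   # primary while it is up
    routes[0].alive = False               # primary stops advertising the route
    print(select_load_balancer(routes))   # traffic moves to the secondary
```

Because the secondary always advertises a higher cost, it receives traffic only while the primary's route is absent.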
23Summary
- Since the 1998 Olympics, this technology has been deployed at highly accessed sites, including the Web site of the 1999 Wimbledon tennis tournament
- The 1999 Wimbledon site made extensive use of Web server acceleration technology that was not ready for the 1998 Olympic site
- The 1999 Wimbledon site received 942 million hits over 14 days, with peak hit rates of 430,000/min and 125 million/day. The 1998 Olympic site received 643.7 million requests over 16 days, with peak hit rates of 110,000/min and 57 million/day.
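The traffic figures above can be turned into a rough measure of burstiness by comparing the peak day with the average day. This is a small worked calculation using only the numbers on this slide; the function name and the dictionary keys are illustrative.

```python
def peak_to_mean(total_hits, days, peak_day_hits):
    """Ratio of the busiest day's hits to the average daily hits."""
    return peak_day_hits / (total_hits / days)


# Figures as quoted on the summary slide.
SITES = {
    "1999 site": {"total_hits": 942e6, "days": 14, "peak_day_hits": 125e6},
    "1998 site": {"total_hits": 643.7e6, "days": 16, "peak_day_hits": 57e6},
}

if __name__ == "__main__":
    for name, figures in SITES.items():
        average = figures["total_hits"] / figures["days"]
        ratio = peak_to_mean(**figures)
        print(f"{name}: average {average / 1e6:.1f} million/day, peak/mean {ratio:.2f}")
```

On these numbers the 1999 site averaged about 67 million hits per day with a peak day roughly 1.9 times the average, while the 1998 site averaged about 40 million per day with a peak day roughly 1.4 times the average.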