Title: Request Distribution in Server Clusters
1. Request Distribution in Server Clusters
2. Web site infrastructure
- Clustered, multi-tiered architectures
- Example: an e-shopping session
  - Open the portal home page
  - Log in
  - View items, prices, availability
  - Select an item type
  - Specify the number of items
  - Confirm by entering the credit card number
  - Log out
3. WS vs. AS
- Web servers
  - Do well-defined and quantifiable local work
  - e.g., processing HTTP headers, serving static content
- Application servers
  - Run multi-layer programs
  - e.g., scripts involving calls to backends
4. ReDAL
- In clustered, multi-tiered architectures, there are two request distribution points
  - Web Server Request Distribution (WSRD)
    - Web switch distributes requests to the web server cluster
  - Application Server Request Distribution (ASRD)
    - Web server distributes requests requiring business logic to the application server cluster
ReDAL: Request Distribution for the Application Layer. An approach for efficient distribution of requests across a cluster of application servers.
5. Web Server Request Distribution
- Many policies: Random, Round Robin (RR), Weighted Round Robin (WRR), Least Connections
- Several of these policies are commercially implemented (e.g., Cisco's Local Director and F5's BIG/IP)
- Two improvements
  - Session Affinity
  - Locality-Aware Request Distribution (LARD)
    - attempts to exploit locality of working sets on different servers
    - not applicable to dynamically generated content
Session Affinity: Consecutive requests in a given user session will be served faster if they are handled by the same server.
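The policies named on this slide can be sketched in a few lines of Python. This is an illustrative sketch only, not how any commercial switch implements them: the server names are hypothetical, and the weighted variant uses a naive "repeat each server by its weight" expansion rather than a smoothed schedule.

```python
import itertools
import random


def random_policy(servers, rng=random):
    """Random: pick any server with equal probability."""
    return rng.choice(servers)


def round_robin(servers):
    """Round Robin: cycle through the servers in a fixed order."""
    return itertools.cycle(servers)


def weighted_round_robin(weights):
    """Weighted Round Robin (naive form): a server with weight w
    appears w times in each cycle."""
    expanded = [name for name, w in weights.items() for _ in range(w)]
    return itertools.cycle(expanded)


def least_connections(active_connections):
    """Least Connections: pick the server with the fewest open connections."""
    return min(active_connections, key=active_connections.get)


# Hypothetical three-server web cluster.
rr = round_robin(["ws1", "ws2", "ws3"])
first_four = [next(rr) for _ in range(4)]

wrr = weighted_round_robin({"ws1": 2, "ws2": 1})
first_six = [next(wrr) for _ in range(6)]

least = least_connections({"ws1": 5, "ws2": 2, "ws3": 7})
```

None of these policies looks at what a request will cost, which is why they fit web servers doing well-defined local work better than application servers.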
6. Application Server Request Distribution
- Dynamic scheduling techniques usually presuppose some knowledge of the task (e.g., duration, weight) and/or the resource (e.g., queue sizes, service times)
- In ASRD, both tasks and resources are highly dynamic
  - So, techniques are adaptations of WSRD techniques
- Most common technique: combination of RR and Session Affinity
  - Requests starting new sessions are dispatched according to RR
  - Subsequent requests in a session are routed to the server where the session's previous request was served, i.e., where the session object resides
  - This frequently results in load imbalances
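The RR plus Session Affinity combination described above can be sketched as follows. The class name, server names, and session identifiers are hypothetical, and session expiry is left out for brevity.

```python
class RRAffinityDispatcher:
    """Round Robin for new sessions, Session Affinity for the rest."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.next_index = 0          # position of the next RR pick
        self.session_to_server = {}  # where each session object resides

    def dispatch(self, session_id):
        server = self.session_to_server.get(session_id)
        if server is None:
            # New session: take the next server in round-robin order.
            server = self.servers[self.next_index]
            self.next_index = (self.next_index + 1) % len(self.servers)
            self.session_to_server[session_id] = server
        # Known session: always return to the server holding its state.
        return server


dispatcher = RRAffinityDispatcher(["as1", "as2"])
picks = [dispatcher.dispatch(s) for s in ["a", "b", "a", "c", "a"]]
```

Note that the dispatcher never consults load: a session stays on its server no matter how busy that server becomes.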
7. ReDAL Motivation
- Request distribution combining RR and Session Affinity
- Short (S) and long (L) sessions arrive at one-minute intervals
  - S S L S S L S L L S
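To see how this arrival pattern can unbalance a cluster, the sketch below assigns the ten sessions to servers in strict round-robin order. The cluster size of three is an assumption made for illustration; the slide text does not state it.

```python
# Arrival order taken from the slide; S = short session, L = long session.
arrivals = "S S L S S L S L L S".split()
NUM_SERVERS = 3  # assumption: not stated in the slide text


def assign_round_robin(arrivals, num_servers):
    """Give the i-th arriving session to server i mod num_servers."""
    assignment = [[] for _ in range(num_servers)]
    for i, kind in enumerate(arrivals):
        assignment[i % num_servers].append(kind)
    return assignment


assignment = assign_round_robin(arrivals, NUM_SERVERS)
for index, sessions in enumerate(assignment):
    print(f"server {index}: {' '.join(sessions)} (long sessions: {sessions.count('L')})")
# server 0: S S S S (long sessions: 0)
# server 1: S S L (long sessions: 1)
# server 2: L L L (long sessions: 3)
```

Under this assumption one server receives only long sessions, and session affinity then keeps every later request of those sessions on that same server.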
8. ReDAL Objective
- Distribute requests across a cluster of application servers such that
  - Load on each application server is kept below a certain threshold
  - Session affinity is preserved where possible
9. ReDAL Components
Application Analyzer: characterizes the behavior of the application server
- Runs in an offline phase to record peak throughput/load values, which are used at runtime by the Request Dispatcher
Request Dispatcher: routes requests to a set of application servers
- Monitors expected and actual load on each application server
- Routes a given request to the affined server if it is lightly loaded, else to the application server having the lowest expected load
10. ReDAL Algorithm
- Based on a key observation
  - Think time or view time on a page is predictable based on past behavior
Jeffrey Heer and Ed H. Chi (Xerox Palo Alto Research Center), "Mining the Structure of User Activity using Cluster Stability", Proceedings of the Web Analytics Workshop, SIAM Conference on Data Mining (2002)
11. ReDAL Capacity Reservation
- Consider a finite lookahead period partitioned
into discrete time periods or slices
[Figure: timeline showing the current time, requests r1 and r2 at times t1 and t2 separated by the think time, and time slices Slice 0, Slice 1, Slice 2]
- Load metrics
  - Actual Load: number of requests in a time slice
  - Expected Load: number of requests expected in a time slice based on think time, i.e., time between subsequent requests in a session
    - e.g., capacity is reserved for request r2 on this application server during time slice 2
  - Modified Load = Actual Load + α × Expected Load (0 ≤ α ≤ 1)
    - α accounts for prediction errors
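A minimal sketch of the per-server bookkeeping behind capacity reservation is shown below. It assumes the modified load is the actual load plus the expected load scaled by a weight between 0 and 1 (called `alpha` here), and that a request's slice is found by integer division of its time by the slice duration. The class and method names are hypothetical.

```python
from collections import defaultdict


class ServerLoad:
    """Per-slice load counters for one application server (illustrative)."""

    def __init__(self, slice_duration):
        self.slice_duration = slice_duration
        self.actual = defaultdict(int)    # requests seen in each slice
        self.expected = defaultdict(int)  # requests reserved for each slice

    def slice_of(self, t):
        """Map a point in time to the index of the slice containing it."""
        return int(t // self.slice_duration)

    def record_request(self, now, think_time):
        """Count the request that arrived now and reserve capacity for the
        session's next request, expected one think time later."""
        self.actual[self.slice_of(now)] += 1
        self.expected[self.slice_of(now + think_time)] += 1

    def modified_load(self, slice_index, alpha):
        """Actual load plus the expected load discounted by alpha."""
        return self.actual[slice_index] + alpha * self.expected[slice_index]


# r1 arrives at t = 3 s with a 20 s think time and 10 s slices:
# r1 is counted in slice 0 and capacity for r2 is reserved in slice 2.
load = ServerLoad(slice_duration=10.0)
load.record_request(now=3.0, think_time=20.0)
```

With `alpha` below 1, a reservation counts for less than a request that has actually arrived, which leaves room for sessions whose predicted think time turns out to be wrong.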
12. ReDAL Algorithm Overview
- Inputs
  - Request in a session, think time, time slice duration, α
- Output
  - Assignment of request to application server A
- A = NULL
- A = SessionAffinity()
- If A is NULL
  - A = LeastLoaded()
- UpdateLoadMetrics()
- AdvanceTimeSlice()
- Return A
SessionAffinity:
- If ActualLoad() < PeakLoad(), Return AffinedServer()
- Else Return NULL
LeastLoaded:
- If request is part of a new session, A = LeastLoaded(modified)
- Else A = LeastLoaded(actual)
- Return A
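The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: each server is a plain dict with hypothetical keys `actual`, `expected`, and `peak` for the current slice, modified load is taken as actual plus `alpha` times expected, the load update is reduced to a single counter increment, and advancing the time slice is omitted.

```python
def session_affinity(request, servers):
    """Return the affined server if its actual load is below its peak."""
    server = servers.get(request.get("affined_server"))
    if server is not None and server["actual"] < server["peak"]:
        return server["name"]
    return None


def least_loaded(request, servers, alpha):
    """New sessions compare modified load; existing sessions compare actual load."""
    def modified(s):
        return s["actual"] + alpha * s["expected"]

    def actual(s):
        return s["actual"]

    metric = modified if request["new_session"] else actual
    return min(servers.values(), key=metric)["name"]


def dispatch(request, servers, alpha):
    chosen = session_affinity(request, servers)
    if chosen is None:
        chosen = least_loaded(request, servers, alpha)
    servers[chosen]["actual"] += 1  # simplified stand-in for UpdateLoadMetrics()
    return chosen


# Hypothetical cluster state for the current time slice.
servers = {
    "as1": {"name": "as1", "actual": 5, "expected": 0, "peak": 5},
    "as2": {"name": "as2", "actual": 1, "expected": 4, "peak": 5},
    "as3": {"name": "as3", "actual": 2, "expected": 0, "peak": 5},
}
# An existing session affined to as1, which is already at its peak load.
moved = dispatch({"affined_server": "as1", "new_session": False}, servers, 0.9)
# A brand-new session: reservations on as2 now count against it.
fresh = dispatch({"affined_server": None, "new_session": True}, servers, 0.9)
```

In this example the existing session is moved to the server with the lowest actual load, while the new session avoids the server that already holds many reservations.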
13. Consistent global view of metadata
- Multicasting of changed load info by the WS request dispatcher
- Session objects virtualized in a shared db
- Web server records time of response in a cookie
  - Useful for estimating think times in web server clusters
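One way the response-time cookie could be used is sketched below: whichever web server receives the next request reads the timestamp left by the previous response and subtracts it from the arrival time. The cookie name and the dict standing in for the cookie jar are hypothetical.

```python
import time

COOKIE_NAME = "last_response_time"  # hypothetical cookie name


def stamp_response(cookies, now=None):
    """Record the time the response was sent in the cookie."""
    cookies[COOKIE_NAME] = str(time.time() if now is None else now)
    return cookies


def observed_think_time(cookies, request_arrival):
    """Think time = arrival of this request minus time of the previous response."""
    stamp = cookies.get(COOKIE_NAME)
    if stamp is None:
        return None  # first request of the session: nothing to measure yet
    return max(0.0, request_arrival - float(stamp))


# The response leaves at t = 100.0 s; the next request arrives at t = 112.5 s.
cookies = stamp_response({}, now=100.0)
think = observed_think_time(cookies, request_arrival=112.5)
```

Because the timestamp travels with the client, any web server in the cluster can compute the think time without querying the server that sent the previous response.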
14. ReDAL Evaluation
HJ (Hwang and Jung, 2002) uses a least-active-requests routing policy; not applicable to stateful applications
- ReDAL, RR, and HJ implemented as Apache Web Server plug-ins
- Load generator simulates a varying number of simultaneous user sessions, each session submitting a stream of requests
- Each request chosen from a uniform distribution across the high and low load transaction requests
- Load generator (LoadRunner 6), Web server (Apache), 10 application server instances (WebLogic 7.1), and session repository (Oracle 8), each running on separate hardware
- Machine configuration: single CPU (900 MHz), 1 GB RAM, 20 GB disk, running Windows 2000 Advanced Server (SP3)
15. ReDAL Experimental Results
- Performance Metrics
  - Average Throughput per Application Server (ATAS): average number of transactions per second an application server in the cluster provides
  - Average Response Time (ART): average response time provided by the application servers, measured from the end user perspective
  - Web Server CPU Utilization (WSCU): percentage CPU utilization on the web server, measured by OS utilities
  - Peak CPU on the Application Servers: peak percentage CPU usage among a cluster of application servers, measured by OS utilities
  - Scaling with Application Servers: percentage CPU usage on the web server for various numbers of application servers in the application server cluster
16. Throughput Performance
- ReDAL (0.9) is the ReDAL algorithm with α = 0.9
- ReDAL (0.5) is the ReDAL algorithm with α = 0.5
ReDAL with α = 0.9 has the highest throughput
17. Response Time Performance
ReDAL with α = 0.9 has the best response time
18. CPU Overhead on the Web Server
Additional overhead of the ReDAL algorithm is 1.5% or less
19. Peak CPU Utilization on Application Servers
Highest in the RR case and lowest in the ReDAL (α = 0.9) case
20. Scaling with Application Servers
Overhead of the ReDAL algorithm is at or below 15% for 100 concurrent sessions
21. Real World Evaluation
- Online credit card application
- 30 WebLogic application servers on Red Hat Linux 9.0
- Apache Web Server on Red Hat Linux 9.0
- Machine hardware configuration: 1 GB RAM, 2.2 GHz dual processors
- Load was simulated by re-tracing web logs collected at various times over a day
At a peak load of 1000 simultaneous sessions, ReDAL improved on the response time of RR by 100%.
22. Summary
ReDAL: application server load distribution
- Maximizes affinity
- Exploits application characteristics
- Practical and scalable