Improving Web Systems

About This Presentation

Slides: 88

Provided by: edwar57

Learn more at: http://www.cs.uccs.edu

Transcript and Presenter's Notes

Title: Improving Web Systems

1
Improving Web Systems

Edward Chow
Department of Computer Science
University of Colorado at Colorado Springs

2
Outline of the Talk

Trends in Web Systems
Web switches and the support for advanced web
system features.
Load balancing Research
load balancing algorithms research
network bandwidth measurement research
web server status research

3
Trends in Web Systems

Increasing the availability, performance, and
manageability of web sites.
High Performance through multiple servers
connected by high speed networks.
High Availability (HA) 7x24 network services
Reliable/Efficient Content Routing and Content
Distribution
Emerging Network Centric Storage Networks
Emerging Linux virtual server library for low
cost HA web systems.

4
Networkshop's Prediction

Already, load-balancers are overcoming the
inherent one-to-one nature of the network and
distributing queries across tuned servers -- GIFs
to a machine with a huge RAM cache, processing to
servers with fibre-channel-attached databases.
I suspect we'll see content routing as a
full-fledged concept in Las Vegas next spring.
-- Networkshop News, 10/1999.

5
Virtual Resource Management

Also called Server load balancing or Internet
Traffic Management.
Goal: Increasing the availability, performance,
and manageability of web sites.
April 2000 Acuitive Report on 1999 VRM market
share

6
VRM Market Prediction
7
F5 VRM Solution
8
BIG/ip - Delivers High Availability

E-commerce - ensures sites are not only
up-and-running, but taking orders
Fault-tolerance - eliminates single points of
failure
Content Availability - verifies servers are
responding with the correct content
Directory Authentication - load balance
multiple directory and/or authentication services
(LDAP, Radius, and NDS)
Portals/Search Engines - Using EAV, administrators
perform key-word searches
Legacy Systems - Load balance services to
multiple interactive services
Gateways - Load balance gateways (SAA, SNA, etc.)
E-mail (POP, IMAP, SendMail) - Balances traffic
across a large number of mail servers

9
3DNS Intelligent Load Balancing

Intelligent Load Balancing
QoS Load Balancing
Quality of Service load balancing is the ability
to select and apply different load balancing methods
for different users or request types
Modes of Load Balancing
Round Robin
Ratio
Least Connections
Random
User-defined Quality-of-Service
Round Trip Time
Completion Rate (Packet Loss)
BIG/ip Packet Rate
Global Availability
HOPS
Topology Distribution
Access Control
LDNS Round Robin
Dynamic Ratio
E-Commerce

10
GLOBAL-SITE Replicate Multiple Servers and Sites

File archiving engine and scheduler for automated
site and server replication
BIG-IP controls server availability during
replication and synchronization
Graceful shutdown for update
Update in group/scheduled manner
FTP transfers files from GLOBAL-SITE
to target servers (agent-free, scalable)
RCE for source control
No client-side software
Complete, turnkey system (appliance)
(adapted from F5 presentation)

11
Content Distribution

Secure, automated content/application distribution
to single-site (multiple-server) or wide-area
Internet sites.
Provide replication, synchronization, staged
rollout and roll back.
With revision control, transmit only updates.
User-defined file distribution profiles/rules

12
Intel NetStructure

Routing based on XML tag (e.g., giving preferred
treatment to buyers, large volume)
http://www.intel.com/network/solutions/xml.htm

13
1. Compared to SUN E450 server
14
Phobos IPXpress

Balances web traffic among all 4
Ethernet/FastEthernet connections.
Easily connects to any Ethernet network.
Quick set up and remote configuration.
Choose from six different load balancing
algorithms: Round Robin, Least Connections,
Weighted Least Connections, Fastest Response
Time, Adaptive, Fixed
Hot standby failover port for web site uptime.
U.S. Retail: $3495.00

15
Phobos In-Switch

Only load balancing switch in a PCI card form
factor
Plugs directly into any server PCI slot
Supports up to 8,192 servers, ensuring
availability and maximum performance
Six different algorithms are available for
optimum performance: Round Robin, Weighted
Percentage, Least Connections, Fastest Response
Time, Adaptive and Fixed.
Provides failover to other servers for
high availability of the web site
U.S. Retail: $1995.00

16
Foundry Networks ServerIron Internet Traffic
Management Switches

One Million Concurrent Connections
SwitchBack - Also known as direct server return
Throughput: 64 Gbps with BigServerIron
Session Processing: Leads with 80,000
connections/sec.
Symmetric LB: picks up the full load where the
failed switch left off without losing stateful
information.
Switching Capacity: BigServerIron delivers 256
Gbps of total switching capacity.

17
BigServerIron

BigServerIron supports up to 168 10/100Base-TX
ports or 64 Gigabit Ethernet ports.
Internet IronWare supports unlimited virtual
server addresses, up to 64,000 Virtual IP (VIP)
addresses and 1,024 real servers.
Web Hosting: enables network managers to define
multiple VIPs and track service usage by VIP.
Health Checks: provides Layer 3, 4, and 7 health
checks, including HTTP, DNS, SMTP, POP3, IMAP4,
LDAPv3, NNTP, FTP, Telnet and RADIUS

18
BigServerIron LB Algorithms

Round Robin
Least Connections
Weighted Percentage (assign a performance weight
to each server)
Slow Start - Protects the server from a surging
flow of traffic at startup. It can really
happen!
"Ya, LVS has performed for us like a champ..
under higher volumes, I have had some problems
with wlc.... for some reason LVS freaks and
starts binding all traffic to one box... or at
least the majority of it.. it is really weird...
but as soon as you switch to using wrr then
everything worked fine... I have been using LVS
for about 4 months to manage our E-Commerce
cluster and I haven't had any problems other
than the wlc vs wrr problem"
-- Jeremy Johnson <jjohnson_at_real.com>,
6/1/2000
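
The two weighted policies mentioned above (wrr and wlc) can be sketched roughly as follows. This is a minimal illustration, not the LVS or ServerIron implementation: the server names and weights are hypothetical, the round robin simply repeats each server according to its integer weight, and the least-connections rule shown (fewest active connections per unit of weight) is a simplification.

```python
import itertools


def weighted_round_robin(weights):
    """Cycle through servers, each repeated according to its integer weight."""
    expanded = [name for name, w in weights.items() for _ in range(w)]
    return itertools.cycle(expanded)


def weighted_least_connections(weights, active):
    """Pick the server with the fewest active connections per unit of weight.

    Weights must be positive; `active` maps server name -> open connections.
    """
    return min(weights, key=lambda name: active.get(name, 0) / weights[name])


# Hypothetical two-server farm: web1 is rated three times as capable as web2.
weights = {"web1": 3, "web2": 1}

wrr = weighted_round_robin(weights)
first_eight = [next(wrr) for _ in range(8)]

# web1 has 9/3 = 3 connections per weight unit, web2 has 2/1 = 2.
choice = weighted_least_connections(weights, {"web1": 9, "web2": 2})
```

Weighted round robin ignores the current connection counts entirely, which is one reason it can behave more predictably than a connection-based policy when the counts themselves are misleading.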

19
BigServerIron LB Features

Set max connection limit for each server
Cookie Switching - This feature directs HTTP
requests to a server group based on cookie value.
Used for client persistence and servlets.
URL Switching - directs requests based on the
text of a URL string using defined policies. Can
place different web content on different servers
URL Hashing - maps the hash value of the Cookie
header or the URL string to one of the real
servers bound to the virtual server. This HTTP
request and all future HTTP requests that contain
this information then always go to the same real
server.
URL Parsing - Selects real server by applying
pattern matching expression to the entire URL.
ServerIron supports up to 256 URL rules
SSL Session ID Switching - ensures that all the
traffic for a SSL transaction with a given SSL
session ID always goes to the same server.
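
The URL hashing idea above can be sketched as follows. This is only an illustration under stated assumptions: the server names are hypothetical, and CRC32 stands in for whatever hash function the switch actually uses.

```python
import zlib


def pick_server(key, servers):
    """Map a cookie value or URL string to one real server.

    The same key always maps to the same server as long as the
    server list does not change.
    """
    index = zlib.crc32(key.encode("utf-8")) % len(servers)
    return servers[index]


# Hypothetical real servers bound to one virtual server.
real_servers = ["rs1", "rs2", "rs3"]

first = pick_server("/catalog/item42.html", real_servers)
second = pick_server("/catalog/item42.html", real_servers)
```

Note that with a plain modulo, adding or removing a real server remaps many keys, so persistence only holds while the server set is stable.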

20
IronClad Security

NAT
TCP SYN attack protection - stops binding new
sessions for a user-definable timeframe when the
rate of incoming TCP SYN packets exceeds a certain
threshold.
Guard against Denial of Service (DoS) attacks -
protects against massive numbers of uncompleted
handshakes, also known as TCP SYN attacks, by
monitoring and tracking unfinished connections
High Performance Access Control Lists (ACLs) and
Extended ACLs - By using ACLs, network
administrators can restrict access to specific
applications from/to a given address or sub-net,
or port number.
Cisco-syntax ACLs - ServerIron supports
Cisco-syntax ACLs, which enables network
administrators to cut/copy/paste ACLs from their
existing Cisco products.

21
Session Persistence for eCommerce Transactions

Port Tracking - Some web applications define a
lead port (HTTP) and follower (SSL) ports.
ServerIron ensures connections to the follower
ports arrive at the same server
Sticky Ports - ServerIron supports a wide variety
of 'sticky' connections: a client's requests for
the next port or all ports go to the same server
Supports a large range of user-programmable options
Mega Proxy Server Persistence - treats a range of
source IP addresses as a single source to solve
the persistence problem caused by certain mega
proxy sites on the Internet.
Uses the source IP address for session persistence
when the cookie is missing.
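
One way to picture mega proxy persistence is to mask the client address to a network prefix before looking it up in the sticky table. The sketch below assumes a /24 prefix and hypothetical server names; the actual range a switch uses is a configuration choice not given in the slides.

```python
import ipaddress

sticky_table = {}  # persistence key -> chosen server


def persistence_key(client_ip, prefix_len=24):
    """Collapse every address in the same prefix into one key."""
    network = ipaddress.ip_network(f"{client_ip}/{prefix_len}", strict=False)
    return str(network.network_address)


def route(client_ip, servers, prefix_len=24):
    """Send all clients that share a prefix to the same server."""
    key = persistence_key(client_ip, prefix_len)
    if key not in sticky_table:
        # Simple assignment policy for the sketch: spread new keys in order.
        sticky_table[key] = servers[len(sticky_table) % len(servers)]
    return sticky_table[key]


servers = ["rs1", "rs2"]
a = route("198.51.100.7", servers)     # two proxies in the same /24
b = route("198.51.100.200", servers)
c = route("203.0.113.9", servers)      # a client from a different network
```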

22
High Availability Services

Remote Backup Servers - If no local servers or
applications are available, ServerIron sends
client requests to remote servers.
HTTP Re-direct - ServerIron can also use HTTP
redirect to send traffic to remote servers if the
requested application is not available on the
local server farm.
Active/Standby - When deployed in Active/Standby
mode, the standby load-balancing device will
assume control and preserve the state of existing
sessions in the event the primary load-balancing
device fails
Active/Active - When deployed in Active/Active
mode, both load-balancing devices work
simultaneously and provide a backup for each
other while supporting stateful fail-over.
Quality of Service - Network administrators can
prioritize traffic based on ports, MAC, VLAN, and
802.1p attributes, for example granting priority
to HTTP traffic over FTP
Redundant hot-swappable power supplies

23
Linux Virtual Server (LVS)

Virtual server is a highly scalable and highly
available server built on a cluster of real
servers. The architecture of the cluster is
transparent to end users, and the users see only
a single virtual server.

24
LVS-NAT Configuration

All return traffic goes through the load balancer

25
LVS-Tunnel Configuration

Real Servers need to be reconfigured to handle
IP-IP packets
Real Servers can be geographically separated and
return traffic goes through different routes

26
LVS-Direct Routing Configuration

Similar to the one implemented in IBM's
NetDispatcher
Real servers need to configure a non-ARP alias
interface with the virtual IP address, and that
interface must share the same physical segment
with the load balancer.
Load balancer only rewrites the server MAC address;
the IP packet is not changed

27
HA-LVS Configuration
28
Persistence Handling in LVS

Sticky connection examples
FTP: control (port 21), data (port 20). For passive
FTP, the server tells the client the port that
it listens on, and the client initiates the data
connection to that port. For LVS/TUN and LVS/DR,
LinuxDirector is only on the client-to-server
half of the connection, so it is impossible for
LinuxDirector to get the port from the packet
that goes to the client directly.
SSL session: port 443 for secure Web servers and
port 465 for secure mail servers; a key for the
connection must be chosen/exchanged.
Persistent port solution
When a client first accesses the service,
LinuxDirector creates a template between the given
client and the selected server, then creates an
entry for the connection in the hash table.
The template expires in a configurable time, and
the template won't expire until all its
connections expire.
Connections for any port from the client will be
sent to the same server until the template expires.
The timeout of persistent templates can be
configured by users, and the default is 300
seconds
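
The persistent-template behaviour described above can be sketched as a small table keyed by client address. This is a simplified model, not the LinuxDirector code: the class and method names are hypothetical, and refreshing the expiry on every new connection is an approximation of "the template won't expire until all its connections expire".

```python
import time


class PersistenceTable:
    """Remember which real server each client was sent to."""

    def __init__(self, timeout=300.0, clock=time.monotonic):
        self.timeout = timeout      # default of 300 seconds, as on the slide
        self.clock = clock          # injectable so the sketch can be tested
        self.templates = {}         # client_ip -> (server, expiry_time)

    def lookup(self, client_ip, choose_server):
        """Return the persistent server, or schedule a new one if expired."""
        now = self.clock()
        entry = self.templates.get(client_ip)
        if entry is not None and entry[1] > now:
            server = entry[0]
        else:
            server = choose_server()
        # Each new connection pushes the template expiry forward.
        self.templates[client_ip] = (server, now + self.timeout)
        return server


# Usage with a fake clock and a hypothetical scheduler that hands out
# servers in order.
fake_now = [0.0]
table = PersistenceTable(timeout=300.0, clock=lambda: fake_now[0])
pool = iter(["rs1", "rs2", "rs3"])

first = table.lookup("10.0.0.1", lambda: next(pool))
fake_now[0] = 100.0
second = table.lookup("10.0.0.1", lambda: next(pool))   # still within 300 s
fake_now[0] = 500.0
third = table.lookup("10.0.0.1", lambda: next(pool))    # template expired at 400 s
```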

29
Performance of LVS-based Systems

"We ran a very simple LVS-DR arrangement with one
PII-400 (2.2.14 kernel) directing about 20,000
HTTP requests/second to a bank of about 20 Web
servers answering with tiny identical dummy
responses for a few minutes. Worked just fine."
-- Jerry Glomph Black, Director, Internet
Technical Operations, RealNetworks
"I had basically (1024) four class-Cs of virtual
servers which were loadbalanced through a
LinuxDirector (two, actually -- I used redundant
directors) onto four real servers which each had
the four different class-Cs aliased on them."
-- "Ted Pavlic" <tpavlic_at_netwalk.com>

30
What is Content Intelligence? By Erv Johnson,
ArrowPoint
31
ArrowPoint's Content Smart Web Switch
Architecture (from CCL viewgraph)
[Figure: architecture block diagram. Labels:
Control Plane (Content Policy Services), MIPS RISC
CPU, 512 MB Mem, Content Location Services,
Flowwall Security, Flow Managers, Content Based
QoS, Site Server Selection, Switch Fabric,
Forwarding Plane, up to 16 ports, LAN I/O, Mapped
Row Cache, 8 Mb Mem, Shared Memory, 1B hits per
day]
32
Load Balancing Study

The current web switches do not take server load
or network bandwidth directly into consideration.
How can we improve them?
The node with the fewest connections may have the
heaviest load.
The current wide area load balancing does not
consider the available/bottleneck bandwidth.
Lack of simulation and network planning tools for
suggesting network configuration.

33
Server Load Status Collection

Three basic approaches
Observe response time of requests
Modify web servers to report current
queue/processing speed
Use web server agent to collect system data
The 2nd approach requires access to web server
code/internals
We have modified Apache code (v1.3.9) by
accumulating the size of pending requests (in
bytes) in active child servers and dividing it
by the estimated processing speed.
Note that it is harder to estimate CGI script or
Servlet processing.
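
The load estimate described above (pending bytes divided by estimated processing speed) amounts to the following arithmetic. The function name and the sample numbers are hypothetical; this is not the modified Apache code.

```python
def estimated_delay_seconds(pending_request_bytes, bytes_per_second):
    """Estimate how long the server needs to drain its pending work.

    pending_request_bytes: sizes (in bytes) of requests still being served
    bytes_per_second: estimated processing speed of the server
    """
    if bytes_per_second <= 0:
        raise ValueError("processing speed must be positive")
    return sum(pending_request_bytes) / bytes_per_second


# Two pending requests of 2000 and 6000 bytes at an assumed 4000 bytes/s.
delay = estimated_delay_seconds([2000, 6000], 4000.0)
```

A load balancer could compare this value across servers instead of comparing raw connection counts.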

34
Apache Server Status Report

Apache Server Status for gandalf.uccs.edu
Current Time: Wed Dec 10 00:32:51 1997
Restart Time: Wed Dec 10 00:32:27 1997
Server uptime: 24 seconds
Total accesses: 0 - Total Traffic: 0 kB
CPU Usage: u0 s0 cu0 cs0
0 requests/sec - 0 B/second
1 requests currently being processed, 4 idle
servers
...
Forked web server processes with no work (idle
servers)
Requests per second (history)

35
Collecting System Statistics

Web server agent collects system data
Run queue (#)
CPU idle time (%)
Pages scanned by page daemon (pages/s)
Web server agent uses
vmstat 1 2
(sample every 1 second, collect 2 samples)

36
Vmstat Output and Meaning

r - # of processes waiting to run (extent)
sr - # of pages scanned by page daemon to put
back on the free list
id - % of CPU idle time: 100 - (us + sy) = id
(discrete)
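
A web server agent could extract these three columns from `vmstat 1 2` along the following lines. The sample output is made up for illustration in a Solaris-style layout (the `sr` column is not present on every platform), and the parser looks columns up by header name rather than assuming fixed positions. The last row is used because the first vmstat sample reports averages since boot.

```python
SAMPLE_OUTPUT = """\
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s0 s1 s2 s3   in   sy   cs us sy id
 0 0 0 812344 102400  1   5  0  0  0  0  0  0  0  0  0  120  310   95  2  1 97
 2 0 0 811000  98000  3  40  0  0  0  0 12  0  1  0  0  180  900  210 35 10 55
"""


def parse_vmstat(output, wanted=("r", "sr", "id")):
    """Return the wanted columns from the last sample line of vmstat output."""
    rows = [line.split() for line in output.splitlines() if line.strip()]
    # The column header is the row that names both 'r' and 'id'.
    header = next(row for row in rows if "r" in row and "id" in row)
    latest = rows[-1]
    return {name: int(latest[header.index(name)])
            for name in wanted if name in header}


stats = parse_vmstat(SAMPLE_OUTPUT)
```

In a real agent the text would come from running the command (for example with `subprocess.run`), which is left out here so the sketch stays self-contained.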

37
Network Bandwidth Measurement

Bottleneck bandwidth BBw can be measured by
sending a burst of packets (of size S) and
measuring the return time gap (Tg). BBw = S/Tg if
there is no interference
Available bandwidth ABw is harder to measure.
Cprobe (Boston University) sends a burst of
packets and measures the time gap between the 1st
and last msg.
Estimate ABw based on packet round trip time or
comparison with history of round trip time.
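
The relation BBw = S/Tg can be turned into a small estimator. The sketch below assumes a list of reply arrival times for a burst of equal-sized packets; taking the median of the gaps is an added assumption to reduce the effect of interfering traffic and is not described on the slide.

```python
from statistics import median


def bottleneck_bandwidth_bps(packet_size_bytes, arrival_times):
    """Estimate bottleneck bandwidth in bits/s from a burst of replies.

    arrival_times: arrival time (seconds) of each reply, in order.
    """
    gaps = [later - earlier
            for earlier, later in zip(arrival_times, arrival_times[1:])]
    if not gaps:
        raise ValueError("need at least two packets to measure a gap")
    return packet_size_bytes * 8 / median(gaps)


# Hypothetical burst: 1500-byte packets arriving 1.2 ms apart.
bbw = bottleneck_bandwidth_bps(1500, [0.0, 0.0012, 0.0024, 0.0036])
```

With these numbers the estimate is about 10 Mbit/s (12,000 bits every 1.2 ms).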

38
Smart Probe Simulation Results
39
Weight Calculation

Rate each web server with weight based on
statistics sent from the web server agents

weight of server = (19.68 * rid) + (19.58 * rcpu)
+ (19.60 * rrq) + (19.64 * rrps) + (17.24 * rap)
+ (4.23 * rsr)
40
Weight Calculations (Example)

CPU idle time had an average throughput of 51.92.
The sum of averages for the characteristics was
265.18. The relevant percentage is
51.92/265.18 = 0.1958, i.e. 19.58%. This was then
multiplied by the actual CPU percent idle divided
by the approximate threshold (found to be 100
during the benchmarks) to get the weight:
<cpu weight> = 19.58 * (<actual cpu> / 100)
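
The arithmetic of this example can be checked with a short sketch. The function names are hypothetical, and the 80% idle reading is an invented sample value; only 51.92, 265.18 and the threshold of 100 come from the slide.

```python
def relative_share(average, sum_of_averages):
    """Percentage contribution of one characteristic to the total."""
    return 100.0 * average / sum_of_averages


def metric_weight(share_percent, actual, threshold):
    """Scale a characteristic's share by how close it is to its threshold."""
    return share_percent * (actual / threshold)


cpu_share = round(relative_share(51.92, 265.18), 2)   # share of CPU idle time

# Hypothetical reading: the agent reports the CPU as 80% idle.
cpu_weight = metric_weight(cpu_share, actual=80.0, threshold=100.0)
```

The server weight is then the sum of such terms over all collected characteristics.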

41
Network Design/Planning Tool

Need realistic network traffic (Self-similar)
load to exercise the simulator.
Need tools for
specifying network topology,
detecting bottlenecks in the web systems
suggesting new topology and configurations

42
Why is the Internet hard to model?

It's BIG
January 2000: > 72 Million Hosts [1]
Growing Rapidly
> 67% per year
Constantly Changing
Traffic patterns have high variability
Causes of High variability
Client Request Rates
Server Responses
Network Topology

43
Characteristics of Client Request Rate [1]

Client Sleep Time
Inactive Off Time
Active Off Time
Embedded References
[1] Barford and Crovella, Generating Representative
Web Workloads for Network and Server Performance
Evaluation, Boston University, BU-CS-97-006,
1997

44
Internet Traffic Request Pattern
45
Inactive Off Time

Time between requests (Think Time)
Uses a Pareto Distribution
Shape parameter a = 1.5
Lower bound k = 1.0
To create a random variable x
u ~ U(0,1)
x = k / (1.0 - u)^(1.0/a)
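
The inverse-transform recipe above translates directly into code. A minimal sketch, with the slide's parameters as defaults; the function name and the seeded generator are only for illustration.

```python
import random


def pareto_sample(k=1.0, a=1.5, rng=random):
    """Draw one Pareto-distributed think time: x = k / (1 - u)^(1/a)."""
    u = rng.random()                    # u ~ U(0,1), in [0, 1)
    return k / (1.0 - u) ** (1.0 / a)


rng = random.Random(2000)               # seeded so runs are repeatable
think_times = [pareto_sample(rng=rng) for _ in range(1000)]
```

Every sample is at least k, and with a = 1.5 the distribution has infinite variance, which is what produces the occasional very long think time.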

46
Inactive Off Time
47
Active Off Time

Time between embedded references
Uses a Weibull Distribution
alpha a = 1.46 (scale parameter)
beta b = 0.382 (shape parameter)
To create a random variable x
u ~ U(0,1)
x = a * (-ln(1.0 - u))^(1.0/b)
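
The same inverse-transform approach works for the Weibull case. A minimal sketch using the slide's parameters as defaults; the function name and the injectable generator are illustrative choices.

```python
import math
import random


def weibull_sample(a=1.46, b=0.382, rng=random):
    """Draw one active off time: x = a * (-ln(1 - u))^(1/b)."""
    u = rng.random()                    # u ~ U(0,1), in [0, 1)
    return a * (-math.log(1.0 - u)) ** (1.0 / b)


rng = random.Random(1999)               # seeded so runs are repeatable
off_times = [weibull_sample(rng=rng) for _ in range(1000)]
```

When -ln(1 - u) equals 1 the sample equals the scale parameter a, which gives a convenient sanity check on the formula.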

48
Active Off Time
49
Example HTML Document with Embedded References

<html><head>
<title>CS522 F99 Home Page</title>
</head>
<body background="marble1.jpg">
<BGSOUND SRC="rocky.mid"><embed src="rocky.mid"
autostart=true hidden=true loop=false></embed>
<td ALIGN=CENTER><img SRC="rainbowan.gif"
height=15 width=100></td>

50
Embedded References
51
Server Characteristics

File Size Distribution
Body: Lognormal Distribution
Tail: Pareto Distribution
Cache Size
Temporal Locality
Number of Connections
System Performance: CPU speed, disk access time,
memory, network interface

52
File Size Distribution - Body

Lognormal Distribution
Build table with 930 values
Range: 92 < x < 9020 bytes
To create a random variable x
u ~ U(0,1)
if (u < 0.93) then
look up value in table[u * 1000]
else
use tail distribution

53
File Size Distribution - Body
54
File Size Distribution - Tail

Pareto Distribution
Shape parameter a = 1.5
Lower bound k = 9,020
To create a random variable x
u ~ U(0,1)
x = k / (1.0 - u)^(1.0/a)
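
Putting the body and tail together gives a hybrid file size generator. In the sketch below the 930-entry table is only a placeholder filled with log-spaced values between 92 and 9020 bytes; the talk's actual lognormal table is not reproduced in the slides. The function name and the seeded generator are also illustrative.

```python
import random

TABLE_SIZE = 930
BODY_MIN, BODY_MAX = 92.0, 9020.0

# PLACEHOLDER body table: log-spaced sizes, standing in for the lognormal
# table described on the "File Size Distribution - Body" slide.
BODY_TABLE = [BODY_MIN * (BODY_MAX / BODY_MIN) ** (i / (TABLE_SIZE - 1))
              for i in range(TABLE_SIZE)]


def file_size(rng=random, k=9020.0, a=1.5):
    """Draw a file size: table lookup for the body, Pareto for the tail."""
    u = rng.random()
    if u < 0.93:
        # Guard the index against floating-point rounding at the boundary.
        return BODY_TABLE[min(int(u * 1000), TABLE_SIZE - 1)]
    u_tail = rng.random()
    return k / (1.0 - u_tail) ** (1.0 / a)


rng = random.Random(522)
sizes = [file_size(rng) for _ in range(2000)]
body_fraction = sum(1 for s in sizes if s <= BODY_MAX) / len(sizes)
```

Roughly 93% of the generated sizes fall in the body range and the rest come from the heavy tail starting at 9,020 bytes.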

55
File Size Distribution - Tail
56
Self-similarity

Fractal-like characteristics: Fractals look the
same at all size scales
Statistical Self-similarity: Empirical data has
similar variability over a wide range of time
scales.

57
Verification of Self-similarity

Methods
Observation
Variance Time Plot
R/S Plot
Periodogram
Whittle Estimator

58
Self-similarity - Observation
59
Variance Time Plot

Hurst Parameter
H = 1 - b / 2
b = negative of the slope of the log-log
variance-time plot
1/2 < H < 1
H = 0.7
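
A variance-time estimate of H can be sketched in a few lines: average the series over blocks of increasing size m, plot log(variance) against log(m), and fit a line. The block sizes and the test series here are arbitrary choices for illustration, and the least-squares fit is one common way to obtain the slope.

```python
import math
import random


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def hurst_variance_time(series, block_sizes=(1, 2, 4, 8, 16, 32)):
    """Estimate H = 1 - b/2, where -b is the slope of the variance-time plot."""
    points = []
    for m in block_sizes:
        n_blocks = len(series) // m
        block_means = [sum(series[i * m:(i + 1) * m]) / m
                       for i in range(n_blocks)]
        points.append((math.log10(m), math.log10(variance(block_means))))
    # Least-squares slope of log(variance) versus log(block size).
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x, _ in points))
    b = -slope
    return 1.0 - b / 2.0


# Independent noise has no long-range dependence, so H should be near 0.5.
rng = random.Random(7)
noise = [rng.gauss(0.0, 1.0) for _ in range(20000)]
h_noise = hurst_variance_time(noise)
```

For comparison, the value H = 0.7 on the slide corresponds to b = 0.6, a variance that decays more slowly with block size than it does for independent traffic.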

60
Load Balancing vs. Load Sharing

Load Sharing
System avoids having idle processors by placing
new processes on idle processors first
Load Balancing
System attempts to distribute the load equally
across all processors based on some global
average.
Static
Processes are placed and executed on only one
processor.
Dynamic
Processes are initially placed on one processor
but at some point in time the process may be
migrated to another processor based upon some
decision criteria.

61
Load Balancing Algorithms

Stateless
Select a processor without consideration of the
system state.
Round Robin
Random
State-based
Select a processor based upon some knowledge of
the system state.
Greedy
Subset
Stochastic

62
Simulation Entities

Request
Client
Load Balance Manager
Server

63
Request Event Loop
64
Experimental Design

Cooperative Environment
For each algorithm (round robin, random, greedy,
subset, stochastic)
Eight Servers with 1, 4 Connections
8, 16, 32, 64, 128, 256, 512 Clients
1, 2, 4 Load Balance Managers

65
Servers with One Connection
66
Global vs. Local Info.
67
Servers with Four Connections
68
Global vs. Local Info.
69
Experimental Design

Adversarial Environment
For each algorithm (greedy, subset, stochastic)
Eight Servers with 1, 4 Connections
8, 16, 32, 64, 128, 256, 512 Clients
4 Load Balance Managers with 1, 2, 3 Random Load
Balance Managers as adversaries

70
Servers with One Connection
71
LBM w/ Adversaries
72
Servers with Four Connections
73
LBM w/ Adversaries
74
Analysis of Experimental Results

Single Connection
Global
Greedy 2.2-27.6x improvement in Response Time
Subset 1.8-3.4x
Stochastic 1.3-2.2x
Local
Greedy 2.2-6.6x
Subset 1.4-2.5x
Stochastic 1.1-2.0x

75
Analysis of Experimental Results cont.

Four Connections
Global
Greedy 1.7-4.3x improvement in Response Time
Subset 1.8-3.4x
Stochastic 1.1-2.5x
Local
Greedy 1.0-4.1x
Subset 1.0-3.0x
Stochastic 1.0-2.3x

76
Analysis of Experimental Results cont.

Single Connection w/ Adversaries
Greedy 1.1-4.3x
Subset 1.1-2.1x
Stochastic 1.0-1.9x

77
Analysis of Experimental Results cont.

Four Connections w/ Adversaries
Greedy 1.0-3.7x
Subset 1.0-2.7x
Stochastic 1.0-2.1x

78
NetLobars Load Balancing Research Simulator
79
System Bottleneck Detection
80
Conclusion

Survey of some major web switches
Discuss their features and related support
functions
Introduction of their low-cost competitor (LVS)
Discuss future improvement directions
Present load balancing simulation results with
realistic web traffic modeling.
Present preliminary design of a network load
balancing research tool

81
Discussion

Identify tasks/products of content/web
switch/system products
Discuss cooperation issues: how can I/the UCCS
team help?

82
Identify tasks/products of content/web
switch/system products

Examine how content/web switches fit in a high
performance, efficient, intelligent end-to-end
system
Analyze existing products and their features
Identify the products we would like to make:
web switch, client NIC (with DiffServ), server
modules
Derive common functions/modules of our products
Architecture/System Designs

83
Existing Products

web switch (layer 4-7, XML/email/file extension)
wide area load balancer (3DNS)
content distribution appliance (GLOBAL-SITE)
firewall
edge switch

84
Products we like to offer

What basic features we need to provide
What enhanced features we offer to compete
What range of capacity and performance we would
like to offer?

85
Derive Common Modules

Packet classifier
Interaction with encryption feature of server
(IP-sec, SSL key/other encryption ),...
Packet scheduler
Packet rewrite
Module for Packet Routing Rule Management (API)
Allow static/dynamic management of packet routing
rules (QOS, DEN, firewall)

86
Incoming vs. Outgoing Traffic Control

Load balancing (typically addresses the
distribution of incoming requests)
Outgoing data delivery can be regulated for
bandwidth/QoS control (e.g., Packeteer)

87
Architecture/System Design

How to group modules into products.
Interface between intra-switch modules,
inter-switch modules (i.e., switch-server-client
interaction)
How to derive capacity and performance
parameters (through hardware limitation, cost, or
software simulation evaluation)
Management from DEN/policy-based network QoS point
of view.
