Linux in High-Availability Environments

1
Linux in High-Availability Environments

Alan Robertson
IBM Linux Technology Center
alanr_at_unix.sh

2
OSS in HA Environments

Why OSS for High Availability Environments?
What is High-Availability (HA) Clustering?
What can HA do for me?
DRBD Data Replication
The Linux Virtual Server Load Balancer
The Linux-HA project
Linux-HA applications and customers
Thoughts about cluster security

3
Why OSS In High-Availability Environments?

Openness
Broad Range Of Environments
Breadth of Support Options
Lack of Vendor Lock-In

4
Openness

Extensive Peer Review System
Source code freely available
Source code reviewed by outside parties
Changes discussed openly, often in great detail
Ability to obtain uncensored product information
Mailing list archives contain uncensored
comments from:
Users with deep expertise
Users with little expertise
Users who are very happy
Users with problems

5
Broad Range of Environments

OSS typically runs on many platforms, often on
different OSes too
Users often find very creative uses for the
software
Freedom to try something at low cost decreases
perceived risks and encourages this behavior
Creative uses find their way into mailing list
(archives) and sometimes into the OSS product
Users help with testing, providing more breadth
in test environments than might otherwise occur

6
Support for OSS Systems

Mailing lists consist of hundreds to thousands of
users who are very knowledgeable and helpful,
usually regarded as very responsive, and typically
located in most time zones across the world
Can choose support vendor freely
Hardware, OS or OSS supplier
Independent consulting/support organizations
In-house expertise (most motivated)
OSS mailing lists
Any combination of the above

7
No Vendor Lock-In

Does not rely on a vendor's future plans being
compatible with yours (risk mitigation)
Obsolescence more readily manageable
Does not rely on a single vendor in another
company or country
Contributing to the product (or paying someone
else to) provides you a voice in future direction
Compatibility with other systems typically better

8
What Is HA Clustering?

A group of computers which cooperate and trust
each other to provide a service even when cluster
components fail
When one machine goes down, others take over its
work
This involves IP address takeover, service
takeover, etc.
New work comes to the takeover machine
Not primarily designed for high-performance

9
What Can HA Clustering Do For You?

It cannot achieve 100% availability; nothing
can.
HA clustering is designed to recover from single
faults
It can make your outages very short
From about a second to a few minutes
It is like a Magician's (Illusionist's) trick
When it goes well, the hand is faster than the
eye
When it goes not-so-well, it can be reasonably
visible
A good HA clustering system adds a 9 or two to
your availability
99% to 99.9%, 99.9% to 99.99%, 99.99% to 99.999%,
etc.
Complexity is the enemy of reliability!
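
To make "adding a 9" concrete, here is a small arithmetic sketch that converts an availability percentage into the downtime it allows per year. The function name and the 365-day year are illustrative assumptions, not part of the original slides.

```python
# Downtime budget per year for a given availability percentage.
# Assumes a 365-day year; leap years are ignored for simplicity.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the minutes of outage per year allowed at this availability."""
    return MINUTES_PER_YEAR * (1.0 - availability_percent / 100.0)


if __name__ == "__main__":
    for percent in (99.0, 99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(percent)
        print(f"{percent}% available: about {minutes:.1f} minutes of downtime per year")
```

Each added 9 cuts the yearly downtime budget by a factor of ten, from roughly 3.65 days at 99% to a little over 5 minutes at 99.999%.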

10
The Desire for HA systems

Who wants low-availability systems?
Why are so few systems High-Availability?

11
Why isn't everything HA?

Cost
Complexity

12
(No Transcript)
13
Single Points of Failure (SPOFs)

A single point of failure is a component whose
failure will cause near-immediate failure of an
entire system or service
Good HA design eliminates single points of
failure
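
The effect of removing a SPOF can be sketched with basic probability. This is a simplified model that assumes components fail independently and that takeover is instantaneous and always works; the 99% figures in the usage note are made-up inputs, not measurements.

```python
from math import prod


def series_availability(parts):
    """All parts are required, so each one is a single point of failure."""
    return prod(parts)


def redundant_availability(parts):
    """Any one part is enough; the group fails only if every part fails."""
    return 1.0 - prod(1.0 - a for a in parts)


if __name__ == "__main__":
    # Hypothetical numbers: a server and a switch, each 99% available.
    print(series_availability([0.99, 0.99]))     # both needed
    print(redundant_availability([0.99, 0.99]))  # either one suffices
```

Under these assumptions, two required 99% components give about 98.01% together, while two redundant 99% components give about 99.99%. Real clusters do worse than the redundant figure because failover itself takes time and can fail.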

14
How Does HA work?

Manage redundancy to improve service availability
Like a cluster-wide-super-init on steroids
Even complex services are now "respawn"
on node (computer) death
on impairment of nodes
on loss of connectivity
for services that aren't working (not necessarily
stopped)
managing very complex dependency relationships
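
The "respawn on node death" idea can be sketched as follows. This is a deliberately naive illustration, not the Linux-HA implementation: the function name, the data layout, and the timeout parameter are all assumptions made for the example, and every orphaned service simply moves to the first surviving node.

```python
def plan_failover(last_heartbeat, now, deadtime, services):
    """Return a new {service: node} map with services moved off dead nodes.

    last_heartbeat: {node: timestamp of the last heartbeat received}
    now:            current timestamp, same units as the heartbeats
    deadtime:       silence longer than this marks a node as dead (assumed name)
    services:       {service: node currently running it}
    """
    alive = sorted(
        node for node, seen in last_heartbeat.items() if now - seen <= deadtime
    )
    if not alive:
        raise RuntimeError("no surviving node to take over")

    takeover_node = alive[0]  # naive choice: first survivor by name
    placement = {}
    for service, node in services.items():
        # Services on live nodes stay put; the rest are respawned elsewhere.
        placement[service] = node if node in alive else takeover_node
    return placement


if __name__ == "__main__":
    heartbeats = {"node1": 100.0, "node2": 91.0}
    running = {"ip": "node2", "web": "node2", "db": "node1"}
    print(plan_failover(heartbeats, now=102.0, deadtime=10.0, services=running))
```

A real cluster manager also has to handle impaired nodes, lost connectivity, and services that are running but not working, as the list above notes; this sketch covers only outright node death.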

15
Redundant Communications

Intra-cluster communication is critical to HA
system operation
Most HA clustering systems provide mechanisms for
redundant internal communication for heartbeats,
etc.
External communication is usually essential to
the provision of service
External communication redundancy is usually
accomplished through routing tricks
Having an expert in BGP or OSPF is a help

16
Redundant Data Access

Replicated
Copies of data are kept updated on more than one
computer in the cluster
Shared
Typically Fiber Channel Disk (SAN)
Sometimes shared SCSI
Back-end Storage (Somebody Else's Problem)
NFS, SMB
Back-end database

17
DRBD: RAID over the LAN

Block-device (filesystem) level replication
Clever synchronization methods make resyncs
faster, decrease latency, preserve integrity
Useful for both HA and Disaster Recovery
NO single point of failure
Extremely cost-effective: $200 (max) instead of
$20,000 (min) (USD)
Probably not suitable for some high-end
write-intensive applications
Supportable by IBM Support Line

18
(No Transcript)
19
LVS: The Linux Virtual Server Project

LVS is the standard Linux Load Balancer
Called "ipvs" in the standard Linux kernel
Stable, fast, flexible
Especially suitable for large "server farms"

20
LVS in Action
21
Plays Well With Others

Each of these independent services can work
together to scale to large systems
All single points of failure can be eliminated
High availability and load balancing work together
nicely

22
Linux Virtual Server, Linux-HA and DRBD
23
The Linux-HA Project

Linux-HA is the oldest high-availability project
for Linux, with the largest associated community
The core piece of Linux-HA is called
heartbeat (though it does much more than
heartbeat)
Linux-HA has been in production since 1999, and
is currently in use on about ten thousand sites
Linux-HA also runs on FreeBSD and Solaris, and is
being ported to OpenBSD and others
Linux-HA is shipped with every major Linux
distribution except one.

24
Linux-HA Release 1 Applications

Database Servers
Load Balancers
Web Servers
Custom Applications
Firewalls, routers, DNS, DHCP
Retail Point of Sale Solutions
Authentication
File Servers
Proxy Servers
Medical Imaging
Almost any type of server application you can
think of, except SAP

25
Selected Linux-HA customers

Los Alamos (US) National Labs: linear
accelerator badge reader
Emageon: medical imaging for hospitals and
clinics
ISO New England: manages power grid using 20
Linux-HA clusters
Various firewall, DNS, and DHCP products use
Linux-HA, basically embedded
Karstadt, Circuit City, Autozone: use Linux-HA in
each of several hundred stores
MAN Nutzfahrzeuge AG: truck manufacturing
division of MAN AG
Autostrada: 230 clusters across Italy
BBC: Internet infrastructure
Citysavings Bank in Munich (infrastructure)
Bavarian Radio Station (Munich): coverage of 2002
Olympics in Salt Lake City
The Weather Channel (weather.com)
Sony (manufacturing)
Incredimail: bases their mail service on Linux-HA
on IBM hardware
University of Toledo (US): 20k student Computer
Aided Instruction system

26
Linux-HA Release 1 capabilities

Supports 2-node clusters
Can use serial, UDP bcast, mcast, ucast comm.
Fails over on node failure
Fails over on loss of IP connectivity
Capability for failing over on loss of SAN
connectivity
Limited command line administrative tools to fail
over, query current status, etc.
Active/Active or Active/Passive
Simple resource group dependency model
Requires external tool for resource monitoring
SNMP monitoring

27
Linux-HA Release 2 capabilities

Built-in resource monitoring
Support for the OCF resource standard
Much larger clusters supported (>= 8 nodes)
Sophisticated dependency model with rich
constraint support (resources, groups,
incarnations, master/slave) (needed for SAP)
XML-based resource configuration
Configuration and monitoring GUI
Support for GFS cluster filesystem
Multi-state (master/slave) resource support
Initially - no IP, SAN monitoring

28
Release 2 Credits

Andrew Beekhof: CRM, CIB
Gouchun Shi: significant infrastructure
improvements
Sun, Jiang Dong and Huang, Zhen: LRM, Stonithd
and testing
Dave Blaschke: STONITH improvements
Lars Marowsky-Bree: architecture, PHB ;-)
Alan Robertson: architecture, project
leadership, original heartbeat code and testing

29
Linux-HA Release 1 Architecture
30
Linux-HA Release 2 Architecture (add TE and PE)

31
Resource Objects in Release 2

Release 2 supports resource objects, which can
be any of the following:
Primitive resources
OCF, heartbeat-style, or LSB resource agent
scripts
Resource incarnations: need n resource objects,
somewhere
Resource groups: a group of resources with
implied co-location and linear ordering
constraints
Multi-state resources (master/slave)
Designed to model master/slave (replication)
resources (DRBD, et al.)
32
Basic Dependencies in Release 2

Ordering Dependencies
start before (implies stop after)
start after (implies stop before)
Mandatory Co-location Dependencies
must be co-located with
cannot be co-located with
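
The rule that "start before" implies "stop after" can be sketched with a topological sort. This is an illustration only, using Python's standard graphlib module (Python 3.9+); the function name and the resource names in the usage example are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def start_and_stop_order(start_before):
    """Compute start and stop sequences from ordering dependencies.

    start_before: list of (first, then) pairs meaning
                  "first must start before then".
    """
    sorter = TopologicalSorter()
    for first, then in start_before:
        sorter.add(then, first)  # "then" depends on "first"
    start_order = list(sorter.static_order())
    # "start before" implies "stop after": stop in the reverse of start order.
    stop_order = start_order[::-1]
    return start_order, stop_order


if __name__ == "__main__":
    deps = [("filesystem", "database"), ("database", "webserver")]
    start, stop = start_and_stop_order(deps)
    print("start:", start)
    print("stop: ", stop)
```

For the chain in the usage example, the filesystem starts first and stops last, which is the behavior the ordering dependencies above describe.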

33
Resource Location Constraints

Mandatory Constraints
Resource Objects can be constrained to run on any
selected subset of nodes. Default is none.
Preferential Constraints
Resource Objects can also be preferentially
constrained to run on specified nodes by
providing weightings for arbitrary logical
conditions
The resource object is run on the node which has
the highest weight (score)
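
A minimal sketch of how mandatory and preferential constraints might combine: the mandatory constraint filters the candidate nodes, and the preferential weights are summed to pick the highest score. The function, its parameters, and the tie-breaking rule are assumptions for illustration and are not taken from the Linux-HA code.

```python
def choose_node(candidates, preferences, allowed=None):
    """Pick the node with the highest total weight (score).

    candidates:  iterable of node names
    preferences: list of (predicate, weight); predicate takes a node name
    allowed:     optional set of node names (the mandatory constraint);
                 None means no mandatory restriction in this sketch
    """
    eligible = [n for n in candidates if allowed is None or n in allowed]
    if not eligible:
        return None  # nowhere the resource is permitted to run

    def score(node):
        return sum(weight for predicate, weight in preferences if predicate(node))

    # Sorting first makes ties resolve to the alphabetically first node.
    return max(sorted(eligible), key=score)


if __name__ == "__main__":
    nodes = ["alpha", "beta", "gamma"]
    prefs = [
        (lambda n: n == "beta", 100),   # strongly prefer beta
        (lambda n: n != "alpha", 10),   # mildly avoid alpha
    ]
    print(choose_node(nodes, prefs))                              # beta
    print(choose_node(nodes, prefs, allowed={"alpha", "gamma"}))  # gamma
```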

34
Resource Incarnations

Resource Incarnations allow one to have a
resource which runs multiple (n) times on the
cluster
This is useful for managing:
Load balancing clusters where you want n of
them to be slave servers
Cluster filesystems
Cluster alias IP addresses

35
Resource Groups

Resource Groups provide a shorthand for creating
ordering and co-location dependencies
Each resource object in the group is declared to
have linear start-after ordering relationships
Each resource object in the group is declared to
have co-location dependencies on each other
This is an easy way of converting release 1
resource groups to release 2
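
The shorthand described above can be sketched as a function that expands an ordered group into the explicit constraints it stands for. The tuple format and the resource names are illustrative assumptions; Release 2 itself expresses this in its XML configuration.

```python
def expand_group(members):
    """Expand an ordered resource group into explicit pairwise constraints.

    members: list of resource names in their intended start order.
    Returns (ordering, colocation) as lists of (resource, relation, resource).
    """
    # Linear ordering: each member starts after the one before it.
    ordering = [
        (later, "start after", earlier)
        for earlier, later in zip(members, members[1:])
    ]
    # Co-location: every member must be placed with every other member.
    colocation = [
        (a, "must be co-located with", b)
        for i, a in enumerate(members)
        for b in members[i + 1:]
    ]
    return ordering, colocation


if __name__ == "__main__":
    ordering, colocation = expand_group(["ip", "filesystem", "database"])
    for constraint in ordering + colocation:
        print(constraint)
```

A group of three resources expands into two ordering constraints and three co-location constraints, which is why the shorthand saves effort as groups grow.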

36
Multi-State (master/slave) Resources

Normal resources can be in one of two stable
states:
running
stopped
Multi-state resources can have more than two
stable states. For example:
running-as-master
running-as-slave
stopped
This is ideal for modeling replication resources
like DRBD
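
The three stable states can be modeled as a small state machine. The state names come from the slide; the action names (start, promote, demote, stop) and the rule that a resource starts as a slave before promotion are assumptions made for this sketch.

```python
from enum import Enum


class State(Enum):
    STOPPED = "stopped"
    SLAVE = "running-as-slave"
    MASTER = "running-as-master"


# Assumed transition table: resources start as slaves and are promoted later.
TRANSITIONS = {
    (State.STOPPED, "start"): State.SLAVE,
    (State.SLAVE, "promote"): State.MASTER,
    (State.MASTER, "demote"): State.SLAVE,
    (State.SLAVE, "stop"): State.STOPPED,
}


def next_state(state, action):
    """Return the state reached by applying an action, or raise ValueError."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action} a resource that is {state.value}") from None


if __name__ == "__main__":
    state = State.STOPPED
    for action in ("start", "promote", "demote", "stop"):
        state = next_state(state, action)
        print(f"{action} gives {state.value}")
```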

37
Advanced Constraints

Nodes can have arbitrary attributes associated
with them in name=value form
Attributes have types: int, string, version
Constraint expressions can use these attributes
as well as node names, etc., in largely arbitrary
ways
Operators:
=, !=, <, >, <=, >=
defined(attrname), undefined(attrname),
colocated(resource id), not colocated(resource id)

38
Advanced Constraints (cont'd)

Each constraint is associated with a particular
resource, and is evaluated in the context of a
particular node.
A given constraint has a boolean predicate
associated with it, built from the expressions
described before, and is also associated with a
weight and a condition.
If the predicate is true, then the condition is
used to compute the weight associated with
locating the given resource on the given node.
Values of infinity and -infinity are supported
to make these conditional constraints effectively
mandatory
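
One way to picture this evaluation is the sketch below. It simplifies the model to a predicate plus a weight per constraint, evaluated against one node's attributes; the function name, the attribute names, and the handling of -infinity as an immediate veto are assumptions for illustration.

```python
import math


def placement_score(node_attrs, constraints):
    """Score one node for one resource.

    node_attrs:  dict of the node's name=value attributes
    constraints: list of (predicate, weight); predicate takes node_attrs
    """
    total = 0.0
    for predicate, weight in constraints:
        if not predicate(node_attrs):
            continue
        if weight == -math.inf:
            # -infinity makes the constraint effectively mandatory:
            # the resource must not run on this node.
            return -math.inf
        total += weight  # +infinity here forces the node to win
    return total


if __name__ == "__main__":
    # Hypothetical node attributes.
    nodes = {
        "node1": {"memory_mb": 4096, "site": "east"},
        "node2": {"memory_mb": 1024, "site": "west"},
    }
    rules = [
        (lambda a: a.get("memory_mb", 0) >= 2048, 50),  # prefer big nodes
        (lambda a: "site" in a and a["site"] == "west", -math.inf),  # never west
    ]
    for name, attrs in nodes.items():
        print(name, placement_score(attrs, rules))
```

The check `"site" in a` plays the role of defined(attrname) from the operator list. In this example node1 scores 50 and node2 scores -infinity, so the resource can only be placed on node1.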

39
Security Considerations

Cluster: a computer whose backplane is the
Internet
If this isn't scary, you don't understand...
You may think you have a secure cluster network
You're probably mistaken now
You will be in the future

40
Secure Networks are Difficult Because...

Security is not often well-understood by admins
Security is well-understood by black hats
Network security is easy to breach accidentally
Users bypass it
Hardware installers don't fully understand it
Most security breaches come from trusted staff
Staff turnover is often a big issue
Virus/Worm/P2P technologies will create new holes,
especially for Windows machines

41
Security Advice

Good HA software should be designed to assume
insecure networks
Not all HA software assumes insecure networks
Good HA installation architects use dedicated
(secure?) networks for intra-cluster HA
communication
Crossover cables are reasonably secure; all else
is suspect ;-)

42
References

http://linux-ha.org/
http://linux-ha.org/download/
http://wiki.linux-ha.org/NewHeartbeatDesign
New Web site content (a work in progress)
http://wwnew.linux-ha.org/ (prettier)
http://wiki.linux-ha.org/ (editable)
http://wwnew.linux-ha.org/SuccessStories
http://www.linux-mag.com/2003-11/availability_01.html
http://www.linuxvirtualserver.org/
http://drbd.org/

43
Legal Statements

IBM is a trademark of International Business
Machines Corporation.
Linux is a registered trademark of Linus
Torvalds.
Other company, product, and service names may be
trademarks or service marks of others.
This work represents the views of the author and
does not necessarily reflect the views of the IBM
Corporation.
