1
Linux-HA Release 2

Alan Robertson
IBM Linux Technology Center
alanr_at_unix.sh

2
Linux-HA Release 2

What is High-Availability (HA) Clustering?
What can HA do for me?
What is the Linux-HA project?
Linux-HA applications
Linux-HA customers
Linux-HA release 1 capabilities
Linux-HA release 2 capabilities
Comparative Architectures
Release 2 Details
Futures

3
What Is HA Clustering?

Putting together a group of computers which trust
each other to provide a service even when system
components fail
When one machine goes down, others take over its
work
This involves IP address takeover, service
takeover, etc.
New work comes to the takeover machine
Not primarily designed for high-performance

4
What Can HA Clustering Do For You?

It cannot achieve 100% availability; nothing
can.
HA clustering is designed to recover from single
faults
It can make your outages very short
From about a second to a few minutes
It is like a Magician's (Illusionist's) trick
When it goes well, the hand is faster than the
eye
When it goes not-so-well, it can be reasonably
visible
A good HA clustering system adds a 9 to your
base availability
99% -> 99.9%, 99.9% -> 99.99%, 99.99% -> 99.999%,
etc.
Complexity is the enemy of reliability!
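
To make the "adds a 9" claim concrete, the following sketch converts an availability percentage into a yearly downtime budget. It is illustrative arithmetic only: the function name and the 365-day year are assumptions, not part of the presentation.

```python
# Yearly downtime budget for a given availability percentage.
# Assumption: a 365-day year (8760 hours), ignoring leap years.
HOURS_PER_YEAR = 365 * 24


def downtime_hours_per_year(availability_percent: float) -> float:
    """Return the downtime, in hours per year, allowed at this availability."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        hours = downtime_hours_per_year(pct)
        print(f"{pct}% available: about {hours:.2f} hours "
              f"({hours * 60:.1f} minutes) of downtime per year")
```

Each added 9 cuts the downtime budget by a factor of ten, from roughly 87.6 hours per year at 99% to a little over five minutes per year at 99.999%.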

5
Single Points of Failure (SPOFs)

A single point of failure is a component whose
failure will cause near-immediate failure of an
entire system or service
Good HA design eliminates single points of
failure

6
How Does HA work?

Manage redundancy to improve service availability
Like a cluster-wide-super-init on steroids
Even complex services are now "respawned"
on node (computer) death
on impairment of nodes
on loss of connectivity
for services that aren't working (not necessarily
stopped)
managing very complex dependency relationships

7
Redundant Communications

Intra-cluster communication is critical to HA
system operation
Most HA clustering systems provide mechanisms for
redundant internal communication for heartbeats,
etc.
External communication is usually essential to
provision of service
External communication redundancy is usually
accomplished through routing tricks
Having an expert in BGP or OSPF is a help

8
Redundant Data Access

Replicated
Copies of data are kept updated on more than one
computer in the cluster
Shared
Typically Fiber Channel Disk (SAN)
Sometimes shared SCSI
Back-end Storage (Somebody Else's Problem)
NFS, SMB
Back-end database

9
The Desire for HA systems

Who wants low-availability systems?
Why are so few systems High-Availability?

10
Why isn't everything HA?

Cost
Complexity

12
The Linux-HA Project

Linux-HA is the oldest high-availability project
for Linux, with the largest associated community
The core piece of Linux-HA is called
heartbeat (though it does much more than
heartbeat)
Linux-HA has been in production since 1999, and
is currently in use on about ten thousand sites
Linux-HA also runs on FreeBSD and Solaris, and is
being ported to OpenBSD and others
Linux-HA is shipped with every major Linux
distribution except one.

13
Linux-HA Release 1 Applications

Load Balancers
Web Servers
Database Servers
Custom Applications
Firewalls
Retail Point of Sale Solutions
Authentication
File Servers
Proxy Servers
Medical Imaging
Almost any type of server application you can
think of, except SAP

14
Linux-HA customers

Emageon: medical imaging services
Contraloria General de la Republica (Colombian
government)
Incredimail bases their mail service on Linux-HA
on IBM hardware
Karstadt uses Linux-HA in each of several
hundred stores
Bavarian Radio Station (Munich): coverage of 2002
Olympics in Salt Lake City
Circuit City, Autozone, and others use Linux-HA in
each of several hundred stores
Citysavings Bank in Munich (infrastructure)
University of Toledo (US): 20k-student Computer
Aided Instruction system
Autostrada: 230 clusters across the country
The Weather Channel (weather.com)
Sony (manufacturing)
ISO New England manages the power grid using 25
Linux-HA clusters

15
Linux-HA Release 1 capabilities

Supports 2-node clusters
Can use serial, UDP bcast, mcast, ucast comm.
Fails over on node failure
Fails over on loss of IP connectivity
Capability for failing over on loss of SAN
connectivity
Limited command line administrative tools to fail
over, query current status, etc.
Active/Active or Active/Passive
Simple resource group dependency model
Requires external tool for resource monitoring
SNMP monitoring

16
Linux-HA Release 2 capabilities

Built-in resource monitoring
Support for the OCF resource standard
Much Larger clusters supported ( 8 nodes)
Sophisticated dependency model with rich
constraint support (resources, groups,
incarnations, master/slave) (needed for SAP)
XML-based resource configuration
Configuration and monitoring GUI
Support for GFS cluster filesystem
Multi-state (master/slave) resource support
Initially - no IP, SAN monitoring

17
Release 2 Credits

Andrew Beekhof: CRM, CIB
Gouchun Shi: significant infrastructure
improvements
Sun, Jiang Dong and Huang, Zhen: LRM, Stonithd
and testing
Lars Marowsky-Bree: architecture, PHB :-)
Alan Robertson: architecture, project
leadership, original heartbeat code and testing

18
Linux-HA Release 1 Architecture
19
Linux-HA Release 2 Architecture (add TE and PE)

20
Resource Objects in Release 2

Release 2 supports resource objects, which can
be any of the following:
Primitive Resources
OCF, heartbeat-style, or LSB resource agent
scripts
Resource Incarnations: need n resource objects
- somewhere
Resource groups: a group of resources with
implied co-location and linear ordering
constraints
Multi-state resources (master/slave)
Designed to model master/slave (replication)
resources (DRBD, et al)

21
Basic Dependencies in Release 2

Ordering Dependencies
start before (implies stop after)
start after (implies stop before)
Mandatory Co-location Dependencies
must be co-located with
cannot be co-located with

22
Resource Location Constraints

Mandatory Constraints
Resource Objects can be constrained to run on any
selected subset of nodes. Default is none.
Preferential Constraints
Resource Objects can also be preferentially
constrained to run on specified nodes by
providing weightings for arbitrary logical
conditions
The resource object is run on the node which has
the highest weight (score)

23
Resource Incarnations

Resource Incarnations allow one to have a
resource which runs multiple (n) times on the
cluster
This is useful for managing:
load balancing clusters where you want n of
them to be slave servers
Cluster filesystems
Cluster Alias IP addresses

24
Resource Groups

Resource Groups provide a shorthand for
creating ordering and co-location dependencies
Each resource object in the group is declared to
have linear start-after ordering relationships
Each resource object in the group is declared to
have co-location dependencies on each other
This is an easy way of converting release 1
resource groups to release 2
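
The shorthand described on this slide can be sketched as a small expansion step: an ordered group becomes a chain of start-after pairs plus co-location pairs between all members. The function and resource names are hypothetical, and the tuple representation is an assumption made for illustration; it is not the Release 2 XML configuration format.

```python
def expand_group(members):
    """Expand an ordered resource group into explicit constraints.

    Returns (ordering, colocation), where
      ordering   is a list of (later, earlier) pairs, read as
                 "later starts after earlier", and
      colocation is a list of (a, b) pairs, read as
                 "a must be co-located with b".
    """
    # Linear ordering: each member starts after the one listed before it.
    ordering = [(later, earlier)
                for earlier, later in zip(members, members[1:])]
    # Co-location: every member is co-located with every other member.
    colocation = [(a, b)
                  for i, a in enumerate(members)
                  for b in members[i + 1:]]
    return ordering, colocation


if __name__ == "__main__":
    ordering, colocation = expand_group(["ip", "filesystem", "webserver"])
    print("ordering:  ", ordering)
    print("colocation:", colocation)
```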

25
Multi-State (master/slave) Resources

Normal resources can be in one of two stable
states
running
stopped
Multi-state resources can have more than two
stable states. For example
running-as-master
running-as-slave
stopped
This is ideal for modeling replication resources
like DRBD

26
Advanced Constraints

Nodes can have arbitrary attributes associated
with them in name=value form
Attributes have types: int, string, version
Constraint expressions can use these attributes
as well as node names, etc in largely arbitrary
ways
Operators
=, !=, <, >, <=, >=
defined(attrname), undefined(attrname),
colocated(resource id), not colocated(resource id)

27
Advanced Constraints (cont'd)

Each constraint is associated with a particular
resource, and is evaluated in the context of a
particular node.
A given constraint has a boolean predicate
associated with it according to the expressions
before, and is associated with a weight, and a
condition.
If the predicate is true, then the condition is
used to compute the weight associated with
locating the given resource on the given node.
Supported conditions are (these distinctions
may be unneeded?):
can: same as prefer with MAXINT weight
cannot: same as prefer with -MAXINT weight
prefer: positive weight
prefer not: same as prefer with negative weight
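
The evaluation described above can be sketched as follows. This is a guess at the mechanics, not the actual policy engine. It rests on several assumptions: predicates are plain Python callables over a node's attributes, matching weights are summed per node, a matching "cannot" excludes the node outright, and `sys.maxsize` stands in for MAXINT. The node names and attributes are hypothetical.

```python
import sys

MAXINT = sys.maxsize  # assumed stand-in for the slide's MAXINT


def condition_weight(condition, weight=0):
    """Map a condition (and optional weight) to a signed score."""
    if condition == "can":
        return MAXINT
    if condition == "cannot":
        return -MAXINT
    if condition == "prefer":
        return abs(weight)
    if condition == "prefer not":
        return -abs(weight)
    raise ValueError(f"unknown condition: {condition!r}")


def score_node(node_attrs, constraints):
    """Sum the weights of matching constraints for one node.

    Returns None if a matching 'cannot' rules the node out
    (treating 'cannot' as a veto is an assumption of this sketch).
    """
    total = 0
    for predicate, condition, weight in constraints:
        if not predicate(node_attrs):
            continue
        score = condition_weight(condition, weight)
        if score == -MAXINT:
            return None
        total += score
    return total


def place(nodes, constraints):
    """Pick the eligible node with the highest score, or None."""
    scores = {name: score_node(attrs, constraints)
              for name, attrs in nodes.items()}
    eligible = {name: s for name, s in scores.items() if s is not None}
    return max(eligible, key=eligible.get) if eligible else None


if __name__ == "__main__":
    nodes = {
        "node1": {"name": "node1", "memory_mb": 2048},
        "node2": {"name": "node2", "memory_mb": 8192, "maintenance": "true"},
        "node3": {"name": "node3", "memory_mb": 4096},
    }
    constraints = [
        (lambda a: a.get("memory_mb", 0) >= 4096, "prefer", 100),
        (lambda a: "maintenance" in a, "cannot", 0),   # defined(maintenance)
        (lambda a: a["name"] == "node1", "prefer", 50),
    ]
    print(place(nodes, constraints))
```

In this example node2 matches the "cannot" constraint and is excluded, so the resource lands on node3, which outscores node1.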

28
Security Considerations

Cluster: a computer whose backplane is the
Internet
If this isn't frightening, you don't
understand...
You may think you have a secure cluster network
You're probably mistaken now
You will be in the future

29
Secure Networks are Difficult Because...

Security is not often well-understood by admins
Security is well-understood by black hats
Network security is easy to breach accidentally
Users bypass it
Hardware installers don't fully understand it
Most security breaches come from trusted staff
Staff turnover is often a big issue
Virus/Worm/P2P technologies will create new holes
especially for Windows machines

30
Security Advice

Good HA software should be designed to assume
insecure networks
Not all HA software assumes insecure networks
Good HA installation architects use dedicated
(secure?) networks for intra-cluster HA
communication
Crossover cables are reasonably secure; all else
is suspect :-)

31
References

http://linux-ha.org/
http://linux-ha.org/download/
http://wiki.linux-ha.org/NewHeartbeatDesign
New Web site content (in progress)
http://linux-ha.trick.ca/ (pretty - offline!)
http://wiki.linux-ha.org/ (editable)
www.linux-mag.com/2003-11/availability_01.html

32
Legal Statements

IBM is a trademark of International Business
Machines Corporation.
Linux is a registered trademark of Linus
Torvalds.
Other company, product, and service names may be
trademarks or service marks of others.
This work represents the views of the author and
does not necessarily reflect the views of the IBM
Corporation.
