Title: Linux-HA Release 2: An Overview
1. Linux-HA Release 2: An Overview
- Alan Robertson
- Project Leader, Linux-HA project
- alanr@unix.sh
- (a.k.a. alanr@us.ibm.com)
- IBM Linux Technology Center
2. Agenda
- High-Availability (HA) Clustering?
- What is the Linux-HA project?
- Linux-HA applications and customers
- Linux-HA Release 1 / Release 2 feature comparison
- Release 2 details
- Request for feedback
- DRBD: an important component
- Thoughts about cluster security
3. What Is HA Clustering?
- Putting together a group of computers which trust each other to provide a service even when system components fail
- When one machine goes down, others take over its work
- This involves IP address takeover, service takeover, etc.
- New work comes to the remaining machines
- Not primarily designed for high performance
4. High Availability Through Redundancy and Monitoring
- Redundancy eliminates Single Points Of Failure (SPOFs)
- Monitoring determines when things need to change
- Reduces the cost of planned and unplanned outages by reducing MTTR (Mean Time To Repair)
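The MTTR claim above can be sketched numerically with the standard steady-state availability formula. The MTBF and MTTR figures below are illustrative assumptions, not measurements:

```python
# Steady-state availability as a function of MTBF and MTTR:
#   availability = MTBF / (MTBF + MTTR)
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    return mtbf_hours / (mtbf_hours + mttr_hours)

# Same failure rate; only the repair time changes.
manual_recovery = availability(2000, 4.0)   # wait for an admin: ~4 hours
auto_failover = availability(2000, 0.02)    # automated failover: ~72 seconds

print(f"manual recovery: {manual_recovery:.4%}")  # 99.8004%
print(f"auto failover:   {auto_failover:.4%}")    # 99.9990%
```

The point of the sketch: shrinking MTTR from hours to seconds buys roughly an extra nine without touching the failure rate at all.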
5. Failover and Restart
- Monitoring detects failures (hardware, network, applications)
- Automatic recovery from failures (no human intervention)
- Managed restart or failover to standby systems and components
6. What Can HA Clustering Do For You?
- It cannot achieve 100% availability; nothing can.
- HA Clustering is designed to recover from single faults
- It can make your outages very short
- From about a second to a few minutes
- It is like a Magician's (Illusionist's) trick
- When it goes well, the hand is faster than the eye
- When it goes not-so-well, it can be reasonably visible
- A good HA clustering system adds a 9 to your base availability
- 99% becomes 99.9%, 99.9% becomes 99.99%, 99.99% becomes 99.999%, etc.
7. Lies, Damn Lies, and Statistics
- Counting nines: downtime allowed per year
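The counting-nines figures behind this slide can be recomputed from first principles; a minimal sketch:

```python
# Allowed downtime per year for each count of nines of availability.
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes

for nines in range(2, 6):
    avail = 1 - 10.0 ** -nines                 # 99%, 99.9%, 99.99%, 99.999%
    downtime = MINUTES_PER_YEAR * 10.0 ** -nines
    print(f"{avail:.3%} -> {downtime:8.2f} minutes/year")
```

This reproduces the usual table: 99% allows about 3.65 days of downtime per year, 99.9% about 8.8 hours, 99.99% about 53 minutes, and 99.999% about 5.3 minutes.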
8. The Desire for HA Systems
- Who wants low-availability systems?
- Why are so few systems high-availability?
9. Why Isn't Everything HA?
10. Complexity
- Complexity is the Enemy of Reliability
12. Commodity HA?
- Installations with more than 200 Linux-HA pairs:
- Autostrada (Italy)
- Italian Bingo Authority
- Oxfordshire School System
- Many retailers (through IRES and others)
- Karstadt
- Circuit City
- etc.
- Also a component in commercial routers, firewalls, and security hardware
13. The HA Continuum
- Single-node HA system (monitoring without redundancy)
- Provides for application monitoring and restart
- Easy, near-zero-cost entry point: the HA system starts init scripts instead of /etc/init.d/rc (or equivalent)
- Addresses a Solaris / Linux functional gap
- Multiple virtual machines on a single physical machine
- Adds OS crash protection and rolling upgrades of OS and application; good for security fixes, etc.
- Many possibilities for interactions with virtual machines exist
- Multiple physical machines (normal cluster)
- Adds protection against hardware failures
- Split-site (stretch) clusters
- Adds protection against site-wide failures (power, air-conditioning, flood, fire)
14. How Does HA Work?
- Manage redundancy to improve service availability
- Like a cluster-wide super-init with monitoring
- Even complex services are respawned:
- on node (computer) death
- on impairment of nodes
- on loss of connectivity
- for services that aren't working (not necessarily stopped)
- managing potentially complex dependency relationships
15. Single Points of Failure (SPOFs)
- A single point of failure is a component whose failure will cause near-immediate failure of an entire system or service
- Good HA design adds redundancy to eliminate single points of failure
- Non-obvious SPOFs can require deep expertise to spot
16. The Three R's of High-Availability
- Redundancy
- Redundancy
- Redundancy
- If this sounds redundant, that's probably appropriate...
- Most SPOFs are eliminated by redundancy
- HA Clustering is a good way of providing and managing redundancy
17. Redundant Communications
- Intra-cluster communication is critical to HA system operation
- Most HA clustering systems provide mechanisms for redundant internal communication for heartbeats, etc.
- External communication is usually essential to provision of service
- External communication redundancy is usually accomplished through routing tricks
- Having an expert in BGP or OSPF routing is a help
18. Fencing
- Guarantees resource integrity in certain difficult cases (split-brain)
- Four common methods:
- Fibre Channel switch lockouts
- SCSI Reserve/Release (painful to make reliable)
- Self-fencing (like IBM ServeRAID)
- STONITH: Shoot The Other Node In The Head
- Linux-HA has native support for the last two
19. Redundant Data Access
- Replicated
- Copies of data are kept updated on more than one computer in the cluster
- Shared
- Typically Fibre Channel disk (SAN)
- Sometimes shared SCSI
- Back-end storage (Somebody Else's Problem)
- NFS, SMB
- Back-end database
- All are supported by Linux-HA
20. Data Sharing: Replication
- Some applications provide their own replication
- DNS, DHCP, LDAP, DB2, etc.
- Linux has excellent disk replication methods available
- DRBD is my favorite
- DRBD-based HA clusters are shockingly cheap
- Some environments can live with less precise replication methods: rsync, etc.
- Generally does not support parallel access
- Fencing usually required
- EXTREMELY cost-effective
21. Data Sharing: ServeRAID et al.
- The IBM ServeRAID SCSI controller is self-fencing
- This helps integrity in failover environments
- This makes cluster filesystems, etc. impossible
- No Oracle RAC, no GPFS, etc.
- ServeRAID failover requires a script to perform volume handover
- Linux-HA provides such a script in open source
- Linux-HA is ServerProven with ServeRAID
22. Data Sharing: Shared Disk
- The most classic data sharing mechanism; commonly Fibre Channel
- Allows for failover mode
- Allows for true parallel access
- Oracle RAC, cluster filesystems, etc.
- Fencing always required with shared disk
23. Data Sharing: Back-End
- Network Attached Storage can act as a data sharing method
- Existing back-end databases can also act as a data sharing mechanism
- Both make reliable and redundant data sharing Somebody Else's Problem (SEP).
- If they did a good job, you can benefit from them.
- Beware SPOFs in your local network
24. The Linux-HA Project
- Linux-HA is the oldest high-availability project for Linux, with the largest associated community
- Linux-HA is the OSS portion of IBM's HA strategy for Linux
- Linux-HA is the best-tested Open Source HA product
- The Linux-HA package is called Heartbeat (though it does much more than heartbeat)
- Linux-HA has been in production since 1999, and is currently in use on more than ten thousand sites
- Linux-HA also runs on FreeBSD and Solaris, and is being ported to OpenBSD and others
- Linux-HA ships with every major Linux distribution except one.
- Release 2 shipped at the end of July; more than 6000 downloads since then
25. Linux-HA Release 1 Applications
- Database Servers (DB2, Oracle, MySQL, others)
- Load Balancers
- Web Servers
- Custom Applications
- Firewalls
- Retail Point of Sale Solutions
- Authentication
- File Servers
- Proxy Servers
- Medical Imaging
- Almost any type of server application you can think of, except SAP
26. Linux-HA Customers
- FedEx: truck location tracking
- BBC: Internet infrastructure
- Oxfordshire Schools: universal servers; an HA pair in every school
- The Weather Channel (weather.com)
- Sony (manufacturing)
- ISO New England: manages the power grid using 25 Linux-HA clusters
- MAN Nutzfahrzeuge AG: truck manufacturing division of MAN AG
- Karstadt, Circuit City: use Linux-HA and databases, each in several hundred stores
- Citysavings Bank in Munich (infrastructure)
- Bavarian Radio Station (Munich): coverage of the 2002 Olympics in Salt Lake City
- Emageon: medical imaging services
- Incredimail: bases their mail service on Linux-HA on IBM hardware
- University of Toledo (US): 20k-student Computer Aided Instruction system
27. Linux-HA Release 1 Capabilities
- Supports 2-node clusters
- Can use serial, UDP bcast, mcast, or ucast communication
- Fails over on node failure
- Fails over on loss of IP connectivity
- Capability for failing over on loss of SAN connectivity
- Limited command-line administrative tools to fail over, query current status, etc.
- Active/Active or Active/Passive
- Simple resource group dependency model
- Requires an external tool for resource (service) monitoring
- SNMP monitoring
28. Linux-HA Release 2 Capabilities
- Built-in resource monitoring
- Support for the OCF resource standard
- Much larger clusters supported (8+ nodes)
- Sophisticated dependency model
- Rich constraint support (resources, groups, incarnations, master/slave)
- XML-based resource configuration
- Coming in 2.0.x (later in 2005):
- Configuration and monitoring GUI
- Support for the GFS cluster filesystem
- Multi-state (master/slave) resource support
- Monitoring of arbitrary external entities (temp, SAN, network)
29. Release 2 Credits
- Andrew Beekhof (SUSE): CRM, CIB
- Guochun Shi (NCSA): significant infrastructure improvements
- Sun Jiang Dong and Huang Zhen: LRM, stonithd, and testing
- Lars Marowsky-Bree (SUSE): architecture, leadership
- Alan Robertson: architecture, project leadership, original heartbeat code, testing, evangelism
30. Linux-HA Release 1 Architecture
31. Linux-HA Release 2 Architecture (add TE and PE)
32. Linux-HA Release 2 Architecture (more detail)
33. Resource Objects in Release 2
- Release 2 supports "resource objects", which can be any of the following:
- Primitive resources
- Resource groups
- Resource clones (n copies of a resource object)
- Multi-state (master/slave) resources
34. Classes of Resource Agents in R2 (resource primitives)
- OCF (Open Cluster Framework - http://opencf.org/)
- Take parameters as name/value pairs through the environment
- Can be monitored well by R2
- Heartbeat: R1-style heartbeat resources
- Take parameters as command-line arguments
- Can be monitored by a status action
- LSB: standard LSB init scripts
- Take no parameters
- Can be monitored by a status action
- Stonith: node reset capability
- Very similar to OCF resources
35. An OCF Primitive Object
  <primitive type="IPaddr" provider="heartbeat">
    <nvpair name="ip" value="192.168.224.5"/>
  </primitive>
- Attribute nvpairs are translated into environment variables
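The nvpair-to-environment translation can be illustrated with a short sketch. The XML fragment is a simplified, hypothetical version of the object on this slide (the real CIB syntax has more wrapping), and the OCF_RESKEY_ prefix is the convention OCF resource agents use for named parameters:

```python
import xml.etree.ElementTree as ET

# Simplified fragment modeled on the slide above.
FRAGMENT = """
<primitive class="ocf" type="IPaddr" provider="heartbeat">
  <nvpair name="ip" value="192.168.224.5"/>
</primitive>
"""

# Each nvpair becomes an OCF_RESKEY_<name> environment variable
# in the resource agent's environment.
env = {
    "OCF_RESKEY_" + nv.get("name"): nv.get("value")
    for nv in ET.fromstring(FRAGMENT).iter("nvpair")
}
print(env)  # {'OCF_RESKEY_ip': '192.168.224.5'}
```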
36. An LSB Primitive Resource Object (i.e., an init script)
37. A STONITH Primitive Resource
  <primitive type="ibmhmc" provider="heartbeat">
    <nvpair name="..." value="192.168.224.99"/>
  </primitive>
38. Resource Groups
- Resource Groups provide a shorthand for creating ordering and co-location dependencies
- Each resource object in the group is declared to have linear start-after ordering relationships
- Each resource object in the group is declared to have co-location dependencies on the others
- This is an easy way of converting release 1 resource groups to release 2
39. Resource Clones
- Resource Clones allow one to have a resource object which runs multiple (n) times on the cluster
- This is useful for managing:
- load-balancing clusters where you want n of them to be slave servers
- cluster filesystem mount points
- cluster alias IP addresses
- A cloned resource object can be a primitive or a group
40. Sample Clone XML
41. Multi-State (Master/Slave) Resources (coming in 2.0.3)
- Normal resources can be in one of two stable states:
- running
- stopped
- Multi-state resources can have more than two stable states. For example:
- running-as-master
- running-as-slave
- stopped
- This is ideal for modeling replication resources like DRBD
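The extra states can be pictured as a small state machine. The action names below are assumptions modeled on the master/slave states listed above, not the product's exact vocabulary:

```python
# Sketch of multi-state resource transitions: the usual start/stop pair
# is extended with promote/demote between the slave and master states.
TRANSITIONS = {
    ("stopped", "start"): "running-as-slave",
    ("running-as-slave", "promote"): "running-as-master",
    ("running-as-master", "demote"): "running-as-slave",
    ("running-as-slave", "stop"): "stopped",
}

state = "stopped"
for action in ["start", "promote", "demote", "stop"]:
    state = TRANSITIONS[(state, action)]
    print(action, "->", state)
```

For a DRBD-style resource, promote/demote corresponds to switching a node between replication master and slave without a full stop/start cycle.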
42. Basic Dependencies in Release 2
- Ordering dependencies
- start before (normally implies stop after)
- start after (normally implies stop before)
- Mandatory co-location dependencies
- must be co-located with
- cannot be co-located with
43. Resource Location Constraints
- Mandatory constraints
- Resource objects can be constrained to run on any selected subset of nodes. The default depends on the setting of symmetric_cluster.
- Preferential constraints
- Resource objects can also be preferentially constrained to run on specified nodes by providing weightings for arbitrary logical conditions
- The resource object is run on the node which has the highest weight (score)
44. Advanced Constraints
- Nodes can have arbitrary attributes associated with them in name=value form
- Attributes have types: int, string, version
- Constraint expressions can use these attributes as well as node names, etc. in largely arbitrary ways
- Operators:
- =, !=, <, >, <=, >=
- defined(attrname), undefined(attrname)
- colocated(resource id), not colocated(resource id)
45. Advanced Constraints (cont'd)
- Each constraint is associated with a particular resource, and is evaluated in the context of a particular node.
- A given constraint has a boolean predicate associated with it according to the expressions before, and is associated with a weight and condition. Weights can be constants or attribute values.
- If the predicate is true, then the condition is used to compute the weight associated with locating the given resource on the given node.
- Conditions are given weights, positive or negative. Additionally, there are special values for modeling must-have conditions:
- INFINITY
- -INFINITY
- The total score is the sum of all the applicable constraint weights
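The scoring rule above (sum the weights of all constraints whose predicate holds; use INFINITY for must-have conditions; place the resource on the highest-scoring node) can be sketched as follows. The constraint and node data are made-up examples:

```python
INFINITY = float("inf")

# Each constraint is a (predicate, weight) pair evaluated per node;
# the resource runs on the node with the highest total score.
def node_score(constraints, node):
    return sum(weight for predicate, weight in constraints if predicate(node))

constraints = [
    (lambda n: n["name"] == "node01", 100),   # prefer node01
    (lambda n: not n["fc_ok"], -INFINITY),    # must have a working FC link
]
nodes = [
    {"name": "node01", "fc_ok": False},  # preferred, but its FC link is down
    {"name": "node02", "fc_ok": True},
]
best = max(nodes, key=lambda n: node_score(constraints, n))
print(best["name"])  # node02
```

This also previews the fc_status example on the next slide: a -INFINITY (or large negative) weight overwhelms any preference score, so the resource moves away from the broken node.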
46. Sample Dynamic Attribute Use
- Attributes are arbitrary; they are only given meaning by rules
- You can assign them values from external programs
- For example:
- Create a rule which uses the attribute fc_status as its weight for some resource needing a Fibre Channel connection
- Write a script to set fc_status for a node to 0 if the FC connection is working, and -10000 if it is not
- Now those resources automatically move to a place where the FC connection is working if there is such a place; if not, they stay where they are.
47. rsc_location Information
- We prefer the webserver group to run on host node01:
  <rsc_location group="webserver" score="100" operation="eq" value="node01"/>
48. Request for Feedback
- Linux-HA Release 2 is a good, solid HA product
- At this point, human and experience factors will likely be more helpful than most technical doo-dads and refinements
- This audience knows more about that than probably any other similar audience in the world
- So, check out Linux-HA Release 2 and tell us...
- What we got right
- What needs improvement
- What we got wrong
- We are very responsive to comments
- We look forward to your critiques, brickbats, and other comments
49. DRBD: RAID1 over the LAN
- DRBD is a block-level replication technology
- Every time a block is written on the master side, it is copied over the LAN and written on the slave side
- Typically, a dedicated replication link is used
- It is extremely cost-effective; common with xSeries
- Worst case: around 10% throughput loss
- Recent versions have very fast full resync
51. Security Considerations
- Cluster: a computer whose backplane is the Internet
- If this isn't scary, you don't understand...
- You may think you have a secure cluster network
- You're probably mistaken now
- You will be in the future
52. Secure Networks Are Difficult Because...
- Security is not often well-understood by admins
- Security is well-understood by black hats
- Network security is easy to breach accidentally
- Users bypass it
- Hardware installers don't fully understand it
- Most security breaches come from trusted staff
- Staff turnover is often a big issue
- Virus/worm/P2P technologies will create new holes, especially for Windows machines
53. Security Advice
- Good HA software should be designed to assume insecure networks
- Not all HA software assumes insecure networks
- Good HA installation architects use dedicated (secure?) networks for intra-cluster HA communication
- Crossover cables are reasonably secure; all else is suspect ;-)
54. References
- http://linux-ha.org/
- http://linux-ha.org/Talks (these slides)
- http://linux-ha.org/download/
- http://linux-ha.org/SuccessStories
- http://linux-ha.org/Certifications
- http://linux-ha.org/BasicArchitecture
- http://linux-ha.org/NewHeartbeatDesign
- http://www.linux-mag.com/2003-11/availability_01.html
55. Legal Statements
- IBM is a trademark of International Business Machines Corporation.
- Linux is a registered trademark of Linus Torvalds.
- Other company, product, and service names may be trademarks or service marks of others.
- This work represents the views of the author and does not necessarily reflect the views of the IBM Corporation.