Title: FT NT: A Tutorial on Microsoft Cluster Server
1. FT NT: A Tutorial on Microsoft Cluster Server (formerly Wolfpack)
- Joe Barrera
- Jim Gray
- Microsoft Research
- {joebar, gray} @ microsoft.com
- http://research.microsoft.com/barc
2. Outline
- Why FT and Why Clusters
- Cluster Abstractions
- Cluster Architecture
- Cluster Implementation
- Application Support
- Q&A
3. DEPENDABILITY: The 3 ITIES
- RELIABILITY / INTEGRITY: does the right thing (also: large MTTF)
- AVAILABILITY: does it now (also: small MTTR)
- Availability = MTTF / (MTTF + MTTR)
- If 90% of terminals are up and 99% of the DB is up, then >89% of transactions are serviced on time.
- Holistic vs. Reductionist view
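In symbols, with a worked example using the survey numbers on the next slide (MTTF of 10 weeks, about 100,800 minutes, and 90-minute average outages):

    \text{Availability} = \frac{\text{MTTF}}{\text{MTTF} + \text{MTTR}},
    \qquad
    \frac{100{,}800}{100{,}800 + 90} \approx 0.9991 \approx 99.9\%

So even a 10-week MTTF yields roughly three 9s of availability when repair takes only 90 minutes; the availability classes mentioned later count these 9s.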
4. Case Study: Japan
- "Survey on Computer Security", Japan Info Dev Corp., March 1986. (trans. Eiichi Watanabe)
[Pie chart: share of outages by cause: Vendor, Tele Comm lines, Environment, Application Software, Operations; overall MTTF 10 weeks]
- Vendor (hardware and software): 5 months
- Application software: 9 months
- Communications lines: 1.5 years
- Operations: 2 years
- Environment: 2 years
- Overall MTTF: 10 weeks
- 1,383 institutions reported (6/84 - 7/85)
- 7,517 outages; MTTF 10 weeks, average duration 90 MINUTES
- To get a 10-year MTTF, must attack all of these areas
5. Case Studies: Tandem Trends
- MTTF improved
- Shift from Hardware and Maintenance (from 50% down to 10%) to Software (62%) and Operations (15%)
- NOTE: systematic under-reporting of:
- Environment
- Operations errors
- Application Software
6. Summary of FT Studies
- Current situation: ~4-year MTTF => fault tolerance works.
- Hardware is GREAT (maintenance and MTTF).
- Software masks most hardware faults.
- Many hidden software outages in operations:
- new software
- utilities
- Must make all software ONLINE.
- Software seems to define a 30-year MTTF ceiling.
- Reasonable goal: 100-year MTTF
- class 4 today => class 6 tomorrow (i.e., from four 9s of availability to six)
7. Fault Tolerance vs. Disaster Tolerance
- Fault-Tolerance: masks local faults
- RAID disks
- Uninterruptible Power Supplies
- Cluster failover
- Disaster Tolerance: masks site failures
- protects against fire, flood, sabotage, ...
- redundant system and service at a remote site
8. The Microsoft Vision: Plug & Play Dependability
- Transactions for reliability
- Clusters for availability
- Security
- All built into the OS
9. Cluster Goals
- Manageability
- manage nodes as a single system
- perform server maintenance without affecting users
- mask faults, so repair is non-disruptive
- Availability
- restart failed applications and servers
- un-availability ~ MTTR / MTBF, so quick repair helps
- detect/warn administrators of failures
- Scalability
- add nodes for incremental
- processing
- storage
- bandwidth
10. Fault Model
- Failures are independent, so single-fault tolerance is a big win
- Hardware fails fast (blue-screen)
- Software fails fast (or goes to sleep)
- Software often repaired by reboot
- Heisenbugs
- Operations tasks: major source of outage
- utility operations
- software upgrades
11. Clusters: Servers Combined to Improve Availability and Scalability
- Cluster: a group of independent systems working together as a single system. Clients see scalable, FT services (a single system image).
- Node: a server in a cluster. May be an SMP server.
- Interconnect: the communications link used for intra-cluster status info such as heartbeats. Can be Ethernet.
12. Microsoft Cluster Server
- 2-node availability: Summer '97 (20,000 beta testers now)
- Commoditize fault-tolerance (high availability):
- commodity hardware (no special hardware)
- easy to set up and manage
- lots of applications work out of the box
- 16-node scalability later (next year?)
13. Failover Example
[Diagram: Server 1 and Server 2 share the Web site files and Database files; the Web site and Database resources can run on either server]
14. MS Press Failover Demo
- Client/Server
- Software failure
- Admin shutdown
- Server failure
- Resource states: Pending, Partial, Failed, Offline
15. Demo Configuration
[Diagram: two-node Windows NT Server cluster sharing a SCSI disk cabinet]
16. Demo Administration
[Diagram: Server Alice runs SQL Trace and Globe; Server Betty runs SQL Trace; both share the SCSI disk cabinet in the Windows NT Server cluster; a client connects to the cluster]
17. Generic Stateless Application: Rotating Globe
- Mplay32 is a generic app
- Registered with MSCS
- MSCS restarts it on failure
- Move/restart takes about 2 seconds
- Fail over if:
- 4 failures (process exits)
- in 3 minutes
- settable default
18. Demo: Moving or Failing Over an Application
[Diagram: Windows NT Server cluster with shared SCSI disk cabinet]
19. Generic Stateful Application: NotePad
- Notepad saves state on the shared disk
- Failure before save => lost changes
- Failover or move (disk state moves too)
20. Demo Step 1: Alice Delivering Service
[Diagram: Alice active (SQL activity), Betty idle (no SQL activity); SQL, ODBC, and IIS on both nodes; shared SCSI disk cabinet; the client reaches the cluster via IP/HTTP]
21. Demo Step 2: Request Move to Betty
[Diagram: same configuration; a move is requested, and HTTP traffic shifts toward Betty]
22. Demo Step 3: Betty Delivering Service
[Diagram: SQL, ODBC, and IIS now active on Betty; shared SCSI disk cabinet]
23. Demo Step 4: Power Fail Betty, Alice Takes Over
[Diagram: Windows NT Server cluster with shared SCSI disk cabinet]
24. Demo Step 5: Alice Delivering Service
[Diagram: Alice active (SQL activity); SQL, ODBC, and IIS on Alice; client via IP/HTTP]
25. Demo Step 6: Reboot Betty; Betty Can Now Take Over
[Diagram: Alice delivering service (SQL activity), Betty back online as standby; client via IP/HTTP]
26. Outline
- Why FT and Why Clusters
- Cluster Abstractions
- Cluster Architecture
- Cluster Implementation
- Application Support
- Q&A
27. Cluster and NT Abstractions
[Diagram: the cluster abstractions (Resource, Group, Cluster) parallel the NT abstractions (Service, Node, Domain)]
28. Basic NT Abstractions
- Service: a program or device managed by a node
- e.g., file service, print service, database server
- can depend on other services (startup ordering)
- can be started, stopped, paused, failed
- Node: a single (tightly-coupled) NT system
- hosts services; belongs to a domain
- services on a node always remain co-located
- unit of service co-location; involved in naming services
- Domain: a collection of nodes
- cooperation for authentication, administration, naming
29. Cluster Abstractions
- Resource: a program or device managed by a cluster
- e.g., file service, print service, database server
- can depend on other resources (startup ordering)
- can be online, offline, paused, failed
- Resource Group: a collection of related resources
- hosts resources; belongs to a cluster
- unit of co-location; involved in naming resources
- Cluster: a collection of nodes, resources, and groups
- cooperation for authentication, administration, naming
30. Resources
- Resources have...
- Type: what it does (file, DB, print, web)
- An operational state (online/offline/failed)
- Current and possible nodes
- Containing resource group
- Dependencies on other resources
- Restart parameters (in case of resource failure)
31. Resource Types
- Built-in types
- Generic Application
- Generic Service
- Internet Information Server (IIS) Virtual Root
- Network Name
- TCP/IP Address
- Physical Disk
- FT Disk (Software RAID)
- Print Spooler
- File Share
- Added by others
- Microsoft SQL Server,
- Message Queues,
- Exchange Mail Server,
- Oracle,
- SAP R/3
- Your application? (use developer kit wizard).
32. Physical Disk
33. TCP/IP Address
34. Network Name
35. File Share
36. IIS (WWW/FTP) Server
37. Print Spooler
38. Resource States
- Resource states:
- Offline: exists, not offering service
- Online: offering service
- Failed: not able to offer service
- Resource failure may cause:
- local restart
- other resources to go offline
- resource group to move
- (all subject to group and resource parameters)
- Resource failure detected by:
- polling failure
- node failure
[State diagram: transitions among Offline, Offline Pending, and Online, driven by messages such as "Go Online!", "Go Off-line!", "I'm here!", "I'm Online!", "I'm Off-line!"]
39. Resource Dependencies
- Similar to NT service dependencies
- Orderly startup and shutdown:
- a resource is brought online after any resources it depends on are online
- a resource is taken offline before any resources it depends on
- Interdependent resources:
- form dependency trees
- move among nodes together
- fail over together
- as per resource group (see the sketch after this list)
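The online-ordering rule amounts to a depth-first walk of the dependency tree. A minimal sketch in C; the Resource type and the do_online()/do_offline() operations are hypothetical stand-ins for the real resource calls:

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct Resource {
        const char       *name;
        struct Resource **deps;      /* resources this one depends on */
        size_t            dep_count;
        bool              online;
    } Resource;

    /* hypothetical per-resource operations supplied elsewhere */
    extern void do_online(Resource *r);
    extern void do_offline(Resource *r);

    /* Bring r online only after everything it depends on is online. */
    void bring_online(Resource *r)
    {
        for (size_t i = 0; i < r->dep_count; i++)
            if (!r->deps[i]->online)
                bring_online(r->deps[i]);
        do_online(r);
        r->online = true;
    }

    /* Shutdown runs the other way: dependents first, then r.
     * (The caller walks the tree top-down before calling this.) */
    void take_offline(Resource *r)
    {
        do_offline(r);
        r->online = false;
    }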
40. Dependencies Tab
41. NT Registry
- Stores all configuration information
- software
- hardware
- Hierarchical (name, value) map
- Has an open, documented interface
- Is secure
- Is visible across the net (RPC interface)
- Typical entry:
- \Software\Microsoft\MSSQLServer\MSSQLServer\
- DefaultLogin = GUEST
- DefaultDomain = REDMOND
42. Cluster Registry
- Separate from the local NT Registry
- Replicated at each node
- algorithms explained later
- Maintains configuration information:
- cluster members
- cluster resources
- resource and group parameters (e.g., restart)
- Stable storage
- Refreshed from the master copy when a node joins the cluster
43. Other Resource Properties
- Name
- Restart policy (restart N times, then fail over)
- Startup parameters
- Private configuration info (resource-type specific)
- per-node as well, if necessary
- Poll intervals (LooksAlive, IsAlive, Timeout)
- These properties are all kept in the Cluster Registry
44. General Resource Tab
45. Advanced Resource Tab
46. Resource Groups
- Every resource belongs to a resource group.
- Resource groups move (fail over) as a unit.
- Dependencies NEVER cross groups. (Dependency trees are contained within groups.)
- A group may contain a forest of dependency trees.
[Diagram: a Payroll Group containing a Web Server and a SQL Server, depending on an IP Address and on Drives E and F]
47. Moving a Resource Group
48. Group Properties
- CurrentState: Online, Partially Online, Offline
- Members: resources that belong to the group
- members determine which nodes can host the group
- Preferred Owners: ordered list of host nodes
- FailoverThreshold: how many faults cause failover
- FailoverPeriod: time window for the failover threshold (a sketch of this test follows)
- FailbackWindowStart: when can failback happen?
- FailbackWindowEnd: when can failback happen?
- Everything (except CurrentState) is stored in the registry
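How FailoverThreshold and FailoverPeriod plausibly combine: restart a failed resource in place until too many failures fall inside the window, then move the group. A sketch in C; the types and the ring-buffer bookkeeping are illustrative, not the MSCS implementation:

    #include <stdbool.h>
    #include <time.h>

    #define MAX_FAULTS 64

    typedef struct {
        int    threshold;           /* FailoverThreshold          */
        int    period_s;            /* FailoverPeriod, in seconds */
        time_t faults[MAX_FAULTS];  /* recent failure timestamps  */
        int    nfaults;
    } GroupPolicy;

    /* Record a failure; true means "fail the group over" rather
     * than restarting the resource locally. */
    bool record_failure(GroupPolicy *g, time_t now)
    {
        g->faults[g->nfaults++ % MAX_FAULTS] = now;

        int recent = 0;             /* failures inside the window */
        for (int i = 0; i < g->nfaults && i < MAX_FAULTS; i++)
            if (now - g->faults[i] <= g->period_s)
                recent++;
        return recent > g->threshold;
    }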
49. Failover and Failback
- Failover parameters:
- timeout on LooksAlive, IsAlive
- local restarts in the failure window; after this, offline
- Failback to the preferred node
- (during the failback window)
- Do resource failures affect the group?
[Diagram: nodes \\Alice and \\Betty, each running the Cluster Service; the IP address and network name resources move between them]
50. Cluster Concepts: Clusters
[Diagram: a cluster contains multiple resource groups, each containing resources]
51. Cluster Properties
- Defined Members: nodes that can join the cluster
- Active Members: nodes currently joined to the cluster
- Resource Groups: groups in the cluster
- Quorum Resource:
- stores a copy of the cluster registry
- used to form a quorum
- Network: which network is used for communication
- All properties kept in the Cluster Registry
52. Cluster API Functions (operations on nodes and groups)
- Find and communicate with Cluster
- Query/Set Cluster properties
- Enumerate Cluster objects
- Nodes
- Groups
- Resources and Resource Types
- Cluster Event Notifications
- Node state and property changes
- Group state and property changes
- Resource state and property changes
53. Cluster Management
54. Demo
- Server startup and shutdown
- Installing applications
- Changing status
- Failing over
- Transferring ownership of groups or resources
- Deleting Groups and Resources
55. Outline
- Why FT and Why Clusters
- Cluster Abstractions
- Cluster Architecture
- Cluster Implementation
- Application Support
- Q&A
56. Architecture
- Top tier provides cluster abstractions
- Middle tier provides distributed operations
- Bottom tier is NT and drivers
[Diagram, three tiers: Failover Manager and Resource Monitor on top; Cluster Registry, Global Update, Quorum, and Membership in the middle; Windows NT Server, Cluster Disk Driver, and Cluster Net Drivers at the bottom]
57. Membership and Regroup
- Membership:
- used for orderly addition and removal from the active nodes
- Regroup:
- used for failure detection (via heartbeat messages)
- forceful eviction from the active nodes
58. Membership
- Defined cluster: all nodes
- Active cluster:
- subset of the defined cluster
- includes the Quorum Resource
- stable (no regroup in progress)
59. Quorum Resource
- Usually (but not necessarily) a SCSI disk
- Requirements:
- arbitrates for a resource by supporting the challenge/defense protocol
- capable of storing the cluster registry and logs
- Configuration Change Logs:
- track changes to the configuration database when any defined member is missing (not active)
- prevent configuration partitions in time
60. Challenge/Defense Protocol
- SCSI-2 has reserve/release verbs
- semaphore on the disk controller
- Owner gets a lease on the semaphore
- renews the lease once every 3 seconds
- To preempt ownership (sketched below):
- challenger clears the semaphore (SCSI bus reset)
- waits 10 seconds
- 3 seconds for renewal + 2 seconds bus settle time
- x2 to give the owner two chances to renew
- if still clear, then the former owner loses the lease
- challenger issues reserve to acquire the semaphore
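The timing in code form; a minimal sketch assuming hypothetical scsi_reserve / scsi_bus_reset / scsi_is_reserved wrappers for the SCSI-2 verbs (these helpers are not a real driver API):

    #include <stdbool.h>
    #include <windows.h>

    extern void scsi_reserve(HANDLE disk);      /* acquire the semaphore  */
    extern void scsi_bus_reset(HANDLE disk);    /* clears any reservation */
    extern bool scsi_is_reserved(HANDLE disk);  /* probe ownership        */

    /* Defender: renew the lease once every 3 seconds. */
    void defend(HANDLE disk)
    {
        for (;;) {
            scsi_reserve(disk);          /* re-assert the reservation */
            Sleep(3000);
        }
    }

    /* Challenger: clear the semaphore, then wait
     * 10 s = 2 x (3 s renewal + 2 s bus settle), giving a live
     * defender two chances to renew before we conclude it is dead. */
    bool challenge(HANDLE disk)
    {
        scsi_bus_reset(disk);            /* clears the semaphore       */
        Sleep(10000);
        if (scsi_is_reserved(disk))
            return false;                /* defender renewed: we lose  */
        scsi_reserve(disk);              /* still clear: take the disk */
        return true;
    }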
61. Challenge/Defense Protocol: Successful Defense
[Diagram: the defender keeps renewing, so the challenger detects a reservation and backs off]
62. Challenge/Defense Protocol: Successful Challenge
[Diagram: the defender node is down; the challenger detects no reservation and acquires the disk]
63. Regroup
- Invariant: all members agree on the membership
- Regroup re-computes the membership
- Each node sends a heartbeat message to a peer (default is one per second)
- Regroup if two lost heartbeat messages (see the sketch below):
- suspicion that the sender is dead
- failure detection in bounded time
- Uses a 5-round protocol to agree
- checks communication among nodes
- a suspected missing node may survive
- Upper levels (global update, etc.) informed of the regroup event
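The detection rule is small enough to sketch; the names are hypothetical and the 5-round agreement itself is elided:

    #include <time.h>

    /* One heartbeat per second; two missed heartbeats trigger regroup,
     * giving failure detection in bounded time. */
    #define HEARTBEAT_INTERVAL_S 1
    #define MISSED_LIMIT         2

    extern void start_regroup(void);   /* the 5-round protocol */

    void check_peer(time_t last_heartbeat_from_peer, time_t now)
    {
        if (now - last_heartbeat_from_peer >=
                MISSED_LIMIT * HEARTBEAT_INTERVAL_S)
            start_regroup();           /* suspect the sender is dead */
    }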
64. Membership State Machine
[State diagram, roughly: Initialize -> Member Search. Found Online Member -> Joining; Join Succeeds and Synchronize Succeeds -> Online. Member Search Fails -> Quorum Disk Search; Acquire (reserve) Quorum Disk -> Forming -> Online; Search or Reserve Fails -> Sleeping. From Online, Lost Heartbeat -> Regroup; Non-Minority and Quorum -> back to Online; Minority or no Quorum -> leave the cluster]
65. Joining a Cluster
- When a node starts up, it mounts and configures only local, non-cluster devices
- Starts the Cluster Service, which:
- looks in the local (stale) registry for members
- asks each member in turn to sponsor the new node's membership (stop when a sponsor is found)
- Sponsor (any active member):
- authenticates the applicant
- broadcasts the applicant to cluster members
- sends the updated registry to the applicant
- Applicant becomes a cluster member (sketched below)
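The join-or-form decision as a sketch; all helper functions here are hypothetical stand-ins for the RPC calls involved:

    #include <stdbool.h>

    extern int  stale_member_list(const char **members, int max);
    extern bool ask_to_sponsor(const char *member);     /* sponsor authenticates us */
    extern void receive_registry(const char *sponsor);  /* fresh cluster registry   */
    extern void form_new_cluster(void);                 /* arbitrate for quorum     */

    void join_or_form(void)
    {
        const char *members[16];
        int n = stale_member_list(members, 16);

        for (int i = 0; i < n; i++) {
            if (ask_to_sponsor(members[i])) {   /* stop at first sponsor */
                receive_registry(members[i]);   /* refresh stale state   */
                return;                         /* now an active member  */
            }
        }
        form_new_cluster();  /* nobody answered: form a cluster (next slide) */
    }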
66. Forming a Cluster (when joining fails)
- Use the registry to find the quorum resource
- Attach to (arbitrate for) the quorum resource
- Update the cluster registry from the quorum resource
- e.g., if we were down when it was in use
- Form a new one-node cluster
- Bring other cluster resources online
- Let others join your cluster
67. Leaving a Cluster (Gracefully)
- Pause:
- move all groups off this member
- change to paused state (remains a cluster member)
- Offline:
- move all groups off this member
- send a ClusterExit message to all cluster members
- prevents regroup
- prevents stalls during departure transitions
- close cluster connections (now not an active cluster member)
- Cluster Service stops on the node
- Evict: remove the node from the defined member list
68. Leaving a Cluster (Node Failure)
- Node (or communication) failure triggers regroup
- If after regroup:
- minority group OR no quorum device: group does NOT survive
- non-minority group AND quorum device: group DOES survive
- Non-minority rule (in code below):
- number of new members > 1/2 of the old active cluster
- prevents a minority from seizing the quorum device at the expense of a larger, potentially surviving cluster
- Quorum guarantees correctness:
- prevents split-brain
- e.g., with a newly forming cluster containing a single node
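The survival test is a one-liner; a sketch (whether "non-minority" is strictly greater than half, as the slide's wording suggests, is an assumption here):

    #include <stdbool.h>

    /* A regrouped set of nodes survives only if it is a non-minority
     * of the old active cluster AND it holds the quorum device. */
    bool survives(int new_members, int old_active, bool has_quorum)
    {
        bool non_minority = 2 * new_members > old_active;
        return non_minority && has_quorum;
    }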
69. Global Update
- Propagates updates to all nodes in the cluster
- Used to maintain the replicated cluster registry
- Updates are atomic and totally ordered
- Tolerates all benign failures
- Depends on membership:
- all are up
- all can communicate
- R. Carr, Tandem Systems Review, V1.2, 1985, sketches the regroup and global update protocols.
70. Global Update Algorithm
- Cluster has a locker node that regulates updates
- oldest active node in the cluster
- Send the update to the locker node
- Update the other (active) nodes
- in seniority order (e.g., locker first)
- this includes the updating node
- Failure of all updated nodes:
- update never happened
- updated nodes will roll back on recovery
- Survival of any updated node:
- new locker is oldest, and so has the update if any node does
- new locker restarts the update
- (sketched below)
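A sketch of the flow, after Carr's protocol; rpc_send() and the seniority-ordered nodes[] array are hypothetical:

    #define MAX_NODES 16

    extern int  n_nodes;
    extern int  nodes[MAX_NODES];    /* seniority order; nodes[0] = locker */
    extern void rpc_send(int node, const void *update);

    void global_update(const void *update)
    {
        rpc_send(nodes[0], update);        /* 1. lock at the locker node  */
        for (int i = 0; i < n_nodes; i++)  /* 2. apply in seniority order */
            rpc_send(nodes[i], update);    /*    (locker first; includes  */
                                           /*    the updating node)       */
        /* If every updated node fails, the update never happened.
         * If any survives, the oldest survivor is the new locker,
         * has already seen the update, and restarts it. */
    }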
71. Cluster Registry
- Separate from the local NT Registry
- Maintains cluster configuration:
- members, resources, restart parameters, etc.
- Stable storage
- Replicated at each member
- Global Update protocol
- NT Registry keeps a local copy
72. Cluster Registry Bootstrapping
- Membership uses the Cluster Registry for its list of nodes
- Circular dependency!
- Solution:
- Membership uses the stale local cluster registry
- refresh after joining or forming the cluster
- Master is either:
- the quorum device, or
- the active members
73. Resource Monitor
- Polls resources
- IsAlive and LooksAlive
- Detects failures:
- polling failure
- failure event from the resource
- Higher levels tell it:
- Online, Offline
- Restart
74. Failover Manager
- Assigns groups to nodes based on:
- failover parameters
- possible nodes for each resource in the group
- preferred nodes for the resource group
75. Failover (Resource Goes Offline)
[Flowchart, roughly: the Resource Monitor detects a resource error and attempts local restarts; once the resource retry limit is exceeded, it notifies the Failover Manager, which switches the resource (and its dependents) Offline and checks the Failover Window and Failover Threshold. If the failover conditions are outside those constraints, the group is left in a partially Online state until the failback window; otherwise arbitration looks for another owner, and the Failover Manager on the new system brings the resource Online.]
76. Pushing a Group (Resource Failure)
[Flowchart, roughly:
1. The Resource Monitor notifies the Resource Manager of a resource failure.
2. If no resource has "Affect the Group" = True, leave the group in a partially Online state.
3. Otherwise the Resource Manager enumerates all objects in the dependency tree of the failed resource and takes each dependent resource Offline.
4. The Resource Manager notifies the Failover Manager that the dependency tree is Offline and needs to fail over.
5. The Failover Manager performs arbitration to locate a new owner for the group.
6. The Failover Manager on the new owner node brings the resources Online.]
77. Pulling a Group (Node Failure)
[Flowchart, roughly:
1. The Cluster Service notifies the Failover Manager of the node failure.
2. The Failover Manager determines which groups were owned by the failed node and that they need to fail over.
3. The Failover Manager performs arbitration to locate a new owner for the groups.
4. The Failover Manager on the new owner(s) brings the resources Online in dependency order.]
78. Failback to Preferred Owner Node
- A group may have a Preferred Owner
- Preferred Owner comes back online
- Will only occur during the Failback Window (time slot, e.g., at night)
[Flowchart, roughly: when the Preferred Owner comes back Online and the time is within the Failback Window, the Resource Manager notifies the Failover Manager that the group is Offline and needs to fail over to the Preferred Owner; arbitration locates the Preferred Owner, the Resource Manager takes each resource on the current owner Offline, and the Failover Manager on the Preferred Owner brings the resources Online.]
79. Outline
- Why FT and Why Clusters
- Cluster Abstractions
- Cluster Architecture
- Cluster Implementation
- Application Support
- Q&A
80. Process Structure
- Cluster Service:
- Failover Manager
- Cluster Registry
- Global Update
- Quorum
- Membership
- Resource Monitors:
- Resource DLLs
- Resources:
- services
- applications
[Diagram: these components inside a node]
81. Resource Control
- Commands:
- CreateResource()
- OnlineResource()
- OfflineResource()
- TerminateResource()
- CloseResource()
- ShutdownProcess()
- And resource events
[Diagram: within a node, the Cluster Service makes private calls to the Resource Monitor, which calls into the resource DLL, which makes private calls to the resource]
82. Resource DLLs
- Calls to a resource DLL (skeleton below):
- Open: get handle
- Online: start offering service
- Offline: stop offering service
- as a standby, or
- pair is offline
- LooksAlive: quick check
- IsAlive: thorough check
- Terminate: forceful offline
- Close: release handle
[Diagram: the Resource Monitor makes standard calls into the DLL, which makes private calls to the resource]
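A skeleton of the seven entry points in C. This is a sketch: the real Resource API (resapi.h) uses RESID / RESOURCE_HANDLE types, specific signatures, and a registered function table, all simplified away here:

    #include <stdbool.h>
    #include <stdlib.h>

    typedef void *RESID;               /* simplified resource identifier */

    RESID MyOpen(const wchar_t *name)  /* get a handle                   */
    {
        return malloc(1);              /* per-resource state lives here  */
    }

    unsigned MyOnline(RESID r)         /* start offering service         */
    {
        return 0;                      /* 0 = success                    */
    }

    unsigned MyOffline(RESID r)        /* stop offering service cleanly  */
    {
        return 0;
    }

    bool MyLooksAlive(RESID r)         /* quick, cheap check             */
    {
        return true;
    }

    bool MyIsAlive(RESID r)            /* thorough check                 */
    {
        return true;
    }

    void MyTerminate(RESID r)          /* forceful offline               */
    {
    }

    void MyClose(RESID r)              /* release the handle             */
    {
        free(r);
    }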
83. Cluster Communications
- Most communication via DCOM/RPC
- UDP used for membership heartbeat messages
- Standard (e.g., Ethernet) interconnects
[Diagram: management apps reach the Cluster Service via DCOM/RPC for admin; heartbeats travel over UDP; the Cluster Service talks to the Resource Monitors via DCOM/RPC]
84. Outline
- Why FT and Why Clusters
- Cluster Abstractions
- Cluster Architecture
- Cluster Implementation
- Application Support
- Q&A
85. Application Support
- Virtual Servers
- Generic Resource DLLs
- Resource DLL VC Wizard
- Cluster API
86. Virtual Servers
- Problem:
- client and server applications do not want the node name to change when the server app moves to another node
- A Virtual Server simulates an NT node:
- resource group (name, disks, databases, ...)
- NetName and IP address (node \\a keeps its name and IP address as it moves)
- virtual registry (the registry moves (is replicated))
- virtual Service Control
- virtual RPC service
- Challenges:
- limit the app to the virtual server's devices and services
- client reconnect on failover (easy if connectionless, e.g., web clients)
[Diagram: Virtual Server \\a at 1.2.3.4]
87. Virtual Servers (before failover)
- Nodes \\Y and \\Z support virtual servers \\A and \\B
- Things that need to fail over transparently:
- client connection
- server dependencies
- service names
- binding to local resources
- binding to local servers
[Diagram: node \\Y hosts virtual server \\A (SAP on A, SQL, drive S:); node \\Z hosts virtual server \\B (SAP on B, SQL, drive T:)]
88. Virtual Servers (just after failover)
- \\Y's resources and groups (i.e., Virtual Server \\A) moved to \\Z
- \\A's resources bind to each other and to local resources (e.g., the local file system):
- registry
- physical resources
- security domain
- time
- Transactions used to make DB state consistent
- To work, local resources on \\Y and \\Z have to be similar
- e.g., time must remain monotonic after failover
[Diagram: node \\Z now hosts both \\A and \\B]
89. Address Failover and Client Reconnection
- Name and address rebind to the new node
- details later
- Clients reconnect:
- failure is not transparent
- must log on again
- client context lost (encourages connectionless designs)
- applications could maintain context
[Diagram: node \\Z now serves virtual servers \\A and \\B (SAP on A, SAP on B, SQL, drives S: and T:)]
90. Mapping Local References to Group-Relative References
- Send client requests to the correct server:
- \\A\SAP refers to \\.\SQL
- \\B\SAP refers to \\.\SQL
- Must remap references:
- \\A\SAP to \\.\SQLA
- \\B\SAP to \\.\SQLB
- Also handles namespace collision
- Done via:
- modifying server apps, or
- DLLs to transparently rename
[Diagram: nodes \\Y and \\Z hosting virtual servers \\A and \\B, each with SAP and SQL]
91. Naming and Binding and Failover
- Services rely on the NT node name and/or IP address to advertise shares, printers, and services.
- Applications register names to advertise services:
- example: \\Alice\SQL (i.e., <node>\<service>)
- example: 128.2.2.2:80 (http://www.foo.com/)
- Binding:
- clients bind to an address (e.g., name -> IP address)
- thus the node name and IP address must fail over along with the services (preserve client bindings)
92. Client-to-Cluster Communications: IP Address Mobility Based on MAC Rebinding
- Cluster clients:
- must use IP (TCP, UDP, NBT, ...)
- must reconnect or retry after failure
- Cluster servers:
- all cluster nodes must be on the same LAN segment
- the IP address rebinds to the failover node's MAC address
- transparent to client or server
- low-level ARP (address resolution protocol) rebinds the IP address to the new MAC address
[Diagram: a WAN client knows Alice <-> 200.110.120.4, Virtual Alice <-> 200.110.120.5, Betty <-> 200.110.120.6, Virtual Betty <-> 200.110.120.7; on the local network, the router maps 200.110.120.4 and .5 to Alice's MAC, and 200.110.120.6 and .7 to Betty's MAC]
93. Time
- Time must increase monotonically
- otherwise applications get confused
- e.g., make/nmake/build
- Time is maintained within failover resolution
- not hard, since failover is on the order of seconds
- Time is a resource, so one node owns the time resource
- Other nodes periodically correct drift from the owner's time
94. Application Local NT Registry Checkpointing
- Resources can request that local NT registry sub-trees be replicated
- Changes written out to the quorum device
- Uses the registry change-notification interface (sketched below)
- Changes read and applied on fail-over
[Diagram: while \\A runs on \\X, each registry update is checkpointed to the quorum device; after failover, \\A on \\B reads the checkpoint back]
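A sketch of the watch loop using the documented Win32 change-notification call (RegNotifyChangeKeyValue); where the checkpoint is written, and the save routine itself, are hypothetical:

    #include <windows.h>

    /* hypothetical: serialize the sub-tree to the quorum device */
    extern void save_subtree_to_quorum(HKEY subtree);

    void watch_and_checkpoint(HKEY subtree)
    {
        HANDLE event = CreateEvent(NULL, FALSE, FALSE, NULL);
        for (;;) {
            RegNotifyChangeKeyValue(
                subtree,
                TRUE,                         /* watch the whole sub-tree */
                REG_NOTIFY_CHANGE_NAME |      /* key create/delete        */
                REG_NOTIFY_CHANGE_LAST_SET,   /* value writes             */
                event,
                TRUE);                        /* signal asynchronously    */
            WaitForSingleObject(event, INFINITE);
            save_subtree_to_quorum(subtree);  /* checkpoint the change    */
        }
    }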
95. Registry Replication
96. Application Support
- Virtual Servers
- Generic Resource DLLs
- Resource DLL VC Wizard
- Cluster API
97. Generic Resource DLLs
- Generic Application DLL:
- simplest: just starts and stops the application, and makes sure the process is alive
- Generic Service DLL (sketched below):
- translates DLL calls into equivalent NT service calls
- Online => Service Start
- Offline => Service Stop
- Looks/IsAlive => Service Status
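The mapping in code. These are real Win32 service-controller calls, but the resource-DLL framing, the "MyService" name, and the omitted error handling make this a sketch rather than the shipped Generic Service DLL:

    #include <windows.h>

    static SC_HANDLE open_svc(void)       /* handle caching omitted   */
    {
        SC_HANDLE scm = OpenSCManager(NULL, NULL, SC_MANAGER_CONNECT);
        return OpenService(scm, TEXT("MyService"), SERVICE_ALL_ACCESS);
    }

    void GenericOnline(void)              /* Online  => Service Start */
    {
        StartService(open_svc(), 0, NULL);
    }

    void GenericOffline(void)             /* Offline => Service Stop  */
    {
        SERVICE_STATUS st;
        ControlService(open_svc(), SERVICE_CONTROL_STOP, &st);
    }

    BOOL GenericIsAlive(void)             /* Is/LooksAlive => Status  */
    {
        SERVICE_STATUS st;
        QueryServiceStatus(open_svc(), &st);
        return st.dwCurrentState == SERVICE_RUNNING;
    }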
98. Generic Application
99. Generic Service
100. Application Support
- Virtual Servers
- Generic Resource DLLs
- Resource DLL VC Wizard
- Cluster API
101. Resource DLL VC Wizard
- Asks for the resource type name
- Asks for an optional service to control
- Asks for other parameters (and associated types)
- Generates DLL source code
- Source can be modified as necessary
- e.g., additional checks for Looks/IsAlive
102. Creating a New Workspace
103. Specifying Resource Type Name
104. Specifying Resource Parameters
105. Automatic Code Generation
106. Customizing the Code
107. Application Support
- Virtual Servers
- Generic Resource DLLs
- Resource DLL VC Wizard
- Cluster API
108. Cluster API
- Allows resources to:
- examine dependencies
- manage per-resource data
- change parameters (e.g., failover)
- listen for cluster events
- etc.
- Specs and API became public Sept 1996
- on all MSDN Level 3
- on the web site: http://www.microsoft.com/clustering.htm
- (example below)
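For flavor, a small client that connects to the local cluster and lists its nodes through the published Cluster API (clusapi.h); error handling is omitted, so treat it as a sketch:

    #include <windows.h>
    #include <clusapi.h>
    #include <stdio.h>

    int main(void)
    {
        HCLUSTER  hc = OpenCluster(NULL);   /* NULL = the local cluster */
        HCLUSENUM he = ClusterOpenEnum(hc, CLUSTER_ENUM_NODE);

        WCHAR name[256];
        DWORD type, len, i = 0;
        for (;;) {
            len = 256;
            if (ClusterEnum(he, i++, &type, name, &len) != ERROR_SUCCESS)
                break;                      /* e.g. ERROR_NO_MORE_ITEMS */
            wprintf(L"node: %s\n", name);
        }
        ClusterCloseEnum(he);
        CloseCluster(hc);
        return 0;
    }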
109. Cluster API Documentation
110. Outline
- Why FT and Why Clusters
- Cluster Abstractions
- Cluster Architecture
- Cluster Implementation
- Application Support
- Q&A
111. Research Topics?
- Even easier to manage
- Transparent failover
- Instant failover
- Geographic distribution (disaster tolerance)
- Server pools (load-balanced pool of processes)
- Process pairs (active/backup process)
- 10,000 nodes?
- Better algorithms
- Shared memory or shared disk among nodes
- a truly bad idea?
112. References
- Microsoft NT site: http://www.microsoft.com/ntserver/
- BARC site (e.g., these slides): http://research.microsoft.com/joebar/wolfpack
- Inside Windows NT, H. Custer, Microsoft Press, ISBN 155615481.
- "Tandem Global Update Protocol", R. Carr, Tandem Systems Review, V1.2, 1985. Sketches the regroup and global update protocols.
- "VAXclusters: a Closely Coupled Distributed System", Kronenberg, N., Levy, H., Strecker, W., ACM TOCS, V4.2, 1986. A (the) shared-disk cluster.
- In Search of Clusters: The Coming Battle in Lowly Parallel Computing, Gregory F. Pfister, Prentice Hall, 1995, ISBN 0134376250. Argues for shared-nothing.
- Transaction Processing: Concepts and Techniques, Gray, J., Reuter, A., Morgan Kaufmann, 1994, ISBN 1558601902. Survey of outages, transaction techniques.