Title: Fault Tolerant Servers and Disaster Recovery
1Fault Tolerant Servers and Disaster Recovery
Product Management NEC Solutions America
2Outline
- What is Fault Tolerance?
- 5 Customer Benefits of FT
- 99.999% Uptime
- Ease of Maintenance
- Standard Software
- Remote Management
- Beats the TCO of Clusters
- How to Compete with Clusters
- Disaster Recovery Solution
- Benefits
- Failover/Failback
- Review Benefits
3What is Fault Tolerance?
System | Availability | Average Annual Downtime | User Tolerance to Downtime
Fault Tolerant, Continuous Availability (CA) | 99.999% | 5 Minutes | None
Cluster, High Availability (HA) | 99.9% | 8 Hours 45 Minutes | Business Interruption, Lost Transactions
Source: IDC
72% of mission-critical applications experience nine hours of outage per year. - Standish Group Research
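The downtime figures above follow directly from the availability percentages. A minimal sketch of the arithmetic, assuming a 365-day year (the function name is illustrative, not from the slides):

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def annual_downtime_minutes(availability_percent: float) -> float:
    """Expected downtime per year, in minutes, for a given availability."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * HOURS_PER_YEAR * 60


for label, pct in [("Fault tolerant (CA)", 99.999), ("Cluster (HA)", 99.9)]:
    minutes = annual_downtime_minutes(pct)
    print(f"{label}: {pct}% -> {minutes:.1f} min/year ({minutes / 60:.2f} h)")
```

This gives roughly 5.3 minutes per year for 99.999% and about 8.76 hours per year for 99.9%, in line with the 5 minutes and 8 hours 45 minutes quoted above.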
4FT Server Customer Advantages
- 0 Downtime hardware - total hardware redundancy
- Ease of maintenance - modular design
- Runs standard software - no modifications to OS or apps
- Lights out computing - complete remote management
- Lower Total Cost of Ownership - beats clusters
5Benefit 1 - 99.999% Uptime
[Diagram: a conventional system compared with a fault-tolerant system; labeled components are Memory, CPU, Disk, Power, and Chipset]
Fault-tolerant system
No single point of failure
Zero switchover time
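Why removing single points of failure matters can be shown with basic probability. The sketch below is an illustration only: it assumes component failures are independent, and the per-component availability of 0.999 is a made-up value, not an NEC figure.

```python
from math import prod


def series_availability(components):
    """Conventional design: every component must work, so any failure is an outage."""
    return prod(components)


def duplicated_availability(components):
    """Redundant design: a component pair is down only if both copies fail."""
    return prod(1 - (1 - a) ** 2 for a in components)


# Hypothetical per-component availabilities (illustrative, not measured values).
parts = {"cpu": 0.999, "memory": 0.999, "chipset": 0.999, "power": 0.999, "disk": 0.999}

conventional = series_availability(parts.values())
redundant = duplicated_availability(parts.values())
print(f"conventional: {conventional:.5f}")
print(f"duplicated:   {redundant:.7f}")
```

Under these assumptions, duplicating every component moves the system from below the weakest part's availability to well above it. Real systems also depend on how failures are detected and isolated, which this model ignores.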
6Benefit 2 - Ease of Maintenance
Designed for Simplified Service (Customer
Replaceable Units (CRU))
7Benefit 3 - Runs Out-of-the-Box Applications
with FT Software Capabilities
- NEC FT can provide fault-tolerant availability to any straight out-of-the-box application!
- FT uses a standard operating system
- Windows Server 2003 Enterprise Edition
- Requires only one copy of any application
- Applications need not be cluster-aware or Enterprise versions
8OS/Application Monitoring and Recovery
ExpressCluster SRE Monitors and Restores Server Functionality
- Windows OS - Monitors the Windows operating system resources and drivers; reboots the server if a failure is detected
- Applications - Monitors the application processes; restarts failed applications. Optionally reboots the server if applications are not restarted after a preset number of retries
[Diagram: ExpressCluster Self Recovery Edition on Windows 2003, restarting monitored applications]
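The application recovery rule described above (restart, then optionally reboot after a preset number of failed retries) can be sketched as follows. This is not the ExpressCluster API; the function, the callbacks, and the `FlakyApp` stand-in are all hypothetical names used to show the control flow.

```python
from typing import Callable


def recover_application(is_running: Callable[[], bool],
                        restart: Callable[[], None],
                        max_retries: int,
                        reboot_on_failure: bool = True) -> str:
    """Return the action taken: 'healthy', 'restarted', 'reboot', or 'failed'."""
    if is_running():
        return "healthy"
    for _ in range(max_retries):
        restart()
        if is_running():
            return "restarted"
    # Retries exhausted: escalate to a server reboot only if that option is enabled.
    return "reboot" if reboot_on_failure else "failed"


class FlakyApp:
    """Hypothetical application that comes back after a set number of restarts."""

    def __init__(self, restarts_needed: int):
        self.restarts_needed = restarts_needed
        self.restarts = 0

    def is_running(self) -> bool:
        return self.restarts >= self.restarts_needed

    def restart(self) -> None:
        self.restarts += 1


app = FlakyApp(restarts_needed=2)
print(recover_application(app.is_running, app.restart, max_retries=3))

stuck = FlakyApp(restarts_needed=10)
print(recover_application(stuck.is_running, stuck.restart, max_retries=3))
```

The first application recovers within the retry budget; the second does not, so the sketch escalates to a reboot.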
9What is Active Upgrade?
- An advanced method of performing software maintenance by using the system architecture of the FT Series servers.
- Provides the ability to perform software maintenance without requiring a reboot of the operating system
- Windows hot-fixes and security patches
- Service Packs
- System software upgrades from NEC
- Applications (dependent upon characteristics)
10How does Active Upgrade work?
[Diagram: Side A and Side B, each with CPU, Memory, and I/O, shown in three stages: Normal Operation; System Split and Upgrade; Resynchronization]
- Concept Overview
- Mission-critical applications run at 100% on Side A.
- Upgrades are performed on Side B while it is offline.
- The application is turned off and turned back on, directed to the system disks on Side B. Application downtime is only 30-100 seconds.
- Full restore option if the upgrade causes undesirable server behavior
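The split, upgrade, switch, and resynchronize sequence can be modeled as a small state change. The class below is a toy model written for this transcript, not NEC software; the field names and version strings are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class FtServer:
    """Toy model of the two sides of an FT server (names are illustrative)."""
    boot_side: str = "A"  # side whose system disks the application last started from
    versions: dict = field(default_factory=lambda: {"A": "v1", "B": "v1"})
    log: list = field(default_factory=list)

    def active_upgrade(self, new_version: str, upgrade_ok: bool = True) -> str:
        """Run one upgrade cycle and return the version the application ends up on."""
        self.log.append("split: A keeps running the application, B goes offline")
        self.versions["B"] = new_version
        self.log.append(f"upgrade: B patched to {new_version}")
        self.boot_side = "B"
        self.log.append("switch: application restarted against B's system disks")
        if upgrade_ok:
            self.versions["A"] = new_version
        else:
            # Full restore: fall back to the untouched side A and discard B's changes.
            self.boot_side = "A"
            self.versions["B"] = self.versions["A"]
            self.log.append("restore: reverted to A")
        self.log.append("resync: both sides mirrored again")
        return self.versions[self.boot_side]


server = FtServer()
print(server.active_upgrade("v2"))
print("\n".join(server.log))
```

The only step where the application is unavailable in this model is the switch, which corresponds to the 30-100 second restart mentioned above.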
11VMware FT Virtual Server Environment
Guest Operating Systems
- Microsoft: Win 2000; XP; NT 4.0; Win Me; Win 98; Win 95; Win 3.1; MS-DOS
- Linux: RH AS; RH 6.2-9.0; SuSE Ent 7, 8; SuSE 7.3-9.1; Turbo 7.0, 8.0; Mand 8.0-9.2
- Netware: 6.5 SP 1; 6.0 SP 3; 5.1 SP 6; 4.2 SP 9
- FreeBSD: 5.0-5.2; 4.8-4.9; 4.0-4.6.2
- Solaris x86 Platform Edition 9
Virtualization Layer: VMware GSX Server 3.1
Host Operating System: Microsoft Windows Server 2003
Hardware Requirements
- Memory: 512MB minimum, up to 16GB for Windows Enterprise Edition
- Space: 130GB for the host plus 1GB minimum for each virtual stack
12Benefit 4 - Remote Server Management
Industry standard SNMP based management software.
- Integrated management
- Standalone
- Remote (in-band and out-of-band)
- Complete state coverage
- System boot
- Operating system
- SNMP Based
- Open standards
- Standard MIB interface for linking to Tivoli,
OpenView, UniCenter, etc.
Module Control
Module Level System Information
On-Line Diagnostics
13 320Ma Server Family
- 320Ma DC - Dual Core CPUs - Data Center Performance Server
- Equivalent to almost 4 x 2.8GHz Xeon CPUs (logical)
- Supports up to 16GB memory
- 320Ma 3.6GHz - 2 x 3.6GHz CPUs - Ideal Virtual Host
- Supports up to 16GB memory
- Includes riser card
- 320Ma 3.2GHz - 2 x 3.2GHz CPUs - App Platform
- Supports up to 8GB memory
- Ideal SMB or departmental app platform
- 320Ma Single - 1 x 3.2GHz CPU - Volume
- Supports up to 4GB memory
- No support for Active Upgrade
- Not upgradeable to dual CPUs
- Volume purchases only
14NEC Storage S1500
- Compact Design
- 15 drives and electronics in a single 3U enclosure (1 hot standby)
- Supports up to 4 enclosures
- Total of up to 60 drive bays
- Total of 13.8TB with 300GB HDDs
- Performance
- Fibre Channel optical HBA
- 4Gb/s HBA
- Integrated processor(s)
- 1MB of cache per controller
- RAID 0, 1, 5, 10, 50, 6
- Reliability with redundant
- I/O paths
- RAID cache controller(s)
- Power units and batteries
15Benefit 5 - TCO and Performance vs Clusters
[Diagram: Fault Tolerant Server running App A on OS A across two Processor and I/O Modules in lockstep, with RAID storage; a second panel shows the same server with one module failed (X)]
16FT Total Cost of Ownership Model
NEC FT should be judged on Total Cost of Ownership, not price/performance. The NEC FT TCO beats a two-node cluster.
[Chart: TCO of NEC ft vs. Cluster, broken down by Install, Service, Admin., Software, Outage, and Hardware]
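A TCO comparison of this kind sums the six cost categories, with the outage cost derived from expected downtime. The sketch below shows the structure only. Every dollar figure is a placeholder, not NEC or HP pricing, and the downtime inputs (5 minutes for FT, 9 hours for a cluster) are taken from the availability figures quoted elsewhere in this deck.

```python
from dataclasses import dataclass


@dataclass
class CostModel:
    hardware: float
    software: float
    install: float
    service: float
    admin: float
    outage_hours_per_year: float
    outage_cost_per_hour: float

    def outage(self) -> float:
        """Yearly cost of downtime."""
        return self.outage_hours_per_year * self.outage_cost_per_hour

    def total(self) -> float:
        """One-year total cost of ownership across all six categories."""
        return (self.hardware + self.software + self.install
                + self.service + self.admin + self.outage())


# Placeholder figures for illustration only; they are not vendor pricing.
ft = CostModel(hardware=25_000, software=5_000, install=1_000, service=2_000,
               admin=4_000, outage_hours_per_year=5 / 60, outage_cost_per_hour=10_000)
cluster = CostModel(hardware=15_000, software=10_000, install=3_000, service=3_000,
                    admin=8_000, outage_hours_per_year=9, outage_cost_per_hour=10_000)

for name, model in [("NEC ft", ft), ("Cluster", cluster)]:
    print(f"{name}: outage {model.outage():,.0f}, total {model.total():,.0f}")
```

The result depends entirely on the inputs, especially the assumed cost per hour of downtime, so real numbers should be substituted before drawing conclusions.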
17FT vs. Cluster Comparison
[Comparison table: FT series vs. cluster solution]
18FT vs. Cluster Total Cost Comparison
1-Year Total Cost-of-Ownership study: 320Ma vs. HP DL380 G4
On average 9 hours per year
19FT Total Cost of Ownership vs GP servers
NEC FT should be judged based on Total Cost of
Ownership including the cost of downtime.
[Chart: TCO of NEC ft vs. GP server, broken down by Install, Service, Admin., Software, Outage, and Hardware]
20TCO FT vs GP Hot Stand By
NEC 320Ma 3.2 versus HP DL380 G4 Standalone
Server with Cold Standby
21Blade Servers
- General purpose servers in a smaller form factor
- All reliability issues for general purpose servers apply to blade servers
- High availability is achieved with software clusters
[Diagram: Express5800/120Ba-4 blade with labeled Expansion Slot, Server Control Processor (BMC), Intel Processors, CPU Cooling Fan, Memory, and Network and SCSI I/F]
22Disaster Tolerance Solution
- Since 9/11, disaster tolerance has become a critical requirement for IT Directors
- The challenge
- Protect mission-critical data in the event of the destruction of local computing resources
- Allow surviving resources to continue working with access to the latest data
- Don't break the IT budget to do it.
23FT Disaster Tolerant Solution
[Diagram: Site A and Site B, each with an FT Server (320Ma), External Storage (S1500), a LAN, and a router (R), connected through the Corporate Network over a WAN (T1, 1.5Mbps, 60ms RT latency)]
Protection against unexpected HW failures
24Disaster Scenario
- Site A: 40 SQL database users are logged into the local FT server when a fire breaks out in the computer room.
- Failover
- Users lose access to server A
- When the network loses contact, it initiates a failover using a floating IP address or floating IP name. Failover takes about two minutes.
- Users log back into their accounts, but they are now running on remote server B via the corporate network with full up-to-date data.
[Diagram: FT Servers at Site A and Site B on their LANs, connected through the WAN and Corporate Network]
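The failover step can be pictured as a heartbeat check that moves a floating IP to the surviving site once contact is lost. The sketch below is a simplified simulation, not the actual product logic; the class names and the `missed_limit` threshold are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    alive: bool = True


class FailoverMonitor:
    """Moves a floating IP to the standby site after repeated missed heartbeats."""

    def __init__(self, primary: Site, standby: Site, missed_limit: int = 3):
        self.primary = primary
        self.standby = standby
        self.missed_limit = missed_limit
        self.missed = 0
        # Clients always connect to the floating IP, never to a site directly.
        self.floating_ip_owner = primary

    def heartbeat(self) -> Site:
        """Run one heartbeat check and return the current floating-IP owner."""
        if self.floating_ip_owner is not self.primary:
            return self.floating_ip_owner  # already failed over
        if self.primary.alive:
            self.missed = 0
        else:
            self.missed += 1
            if self.missed >= self.missed_limit:
                self.floating_ip_owner = self.standby
        return self.floating_ip_owner


site_a, site_b = Site("A"), Site("B")
monitor = FailoverMonitor(site_a, site_b)
site_a.alive = False  # the fire takes Site A offline
for _ in range(3):
    owner = monitor.heartbeat()
print(f"users reconnect to site {owner.name}")
```

Because users address the floating IP rather than a specific server, they only need to log in again; nothing on the client side has to be reconfigured. Requiring several missed heartbeats avoids failing over on a brief network glitch, at the cost of a longer detection time.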
25Disaster Scenario (continued)
- Recovery
- Servers reconnect and the database begins resynchronizing changed or new data
- Once synchronization completes, the reset can be automatic or on-command
- Automatic: as soon as the databases are ready, users lose their connection to the remote server. When they log in again, they are directed to the local server
- On-command: IT managers can initiate the reset later in the day with appropriate warnings to users
[Diagram: FT Servers at Site A and Site B on their LANs, resynchronizing (Resynch) over the WAN and Corporate Network]
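The recovery rules can be sketched in two parts: copying changed or new records back to the restored site, then deciding where users land at their next login. This is an illustration under simplifying assumptions (databases as dictionaries, no handling of deleted records); the names are hypothetical.

```python
from enum import Enum


class ResetMode(Enum):
    AUTOMATIC = "automatic"
    ON_COMMAND = "on-command"


def resync(local_db: dict, remote_db: dict) -> int:
    """Copy new or changed records from the remote copy into the local copy."""
    changed = {k: v for k, v in remote_db.items()
               if k not in local_db or local_db[k] != v}
    local_db.update(changed)
    return len(changed)


def serving_site(mode: ResetMode, in_sync: bool, operator_approved: bool = False) -> str:
    """Decide which site users are directed to at their next login."""
    if not in_sync:
        return "B"  # stay on the remote server until resynchronization completes
    if mode is ResetMode.AUTOMATIC or operator_approved:
        return "A"
    return "B"  # on-command mode waits for an IT manager to trigger the reset


site_a_db = {"order-1": "shipped"}
site_b_db = {"order-1": "returned", "order-2": "new"}
copied = resync(site_a_db, site_b_db)
print(copied, serving_site(ResetMode.ON_COMMAND, in_sync=True))
```

In on-command mode users stay on Site B even after the data is in sync, which lets IT managers schedule the switch back and warn users first.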
26Conclusions
- 5 Customer Benefits of the FT Server
- 99.999% Uptime
- Easy Maintenance
- Single copy of Standard Software
- Great Remote Management Tools
- FT TCO Beats Software Clusters
- Disaster Recovery Solution
- No Data Loss
- Less than 4 minutes of application loss
- No reprogramming of users or devices
- Affordable to Small/Medium Businesses