Title: The Modern Data Center Topology
1. The Modern Data Center Topology: The High Availability Mantra
2. GreenField Software
- Company
  - GreenField Software is a privately held, early-stage Indian (Kolkata-based) software company that aims to become a globally recognized player in Cloud-based Intelligent Infrastructure Management
- Mission
  - GFS delivers pioneering Cloud-based Intelligent Infrastructure Management solutions to improve the operational and energy efficiency, safety, and environmental conditions of facilities with critical infrastructure
- Vision
  - Our Cloud-based Intelligent Infrastructure Management solutions help our customers to:
    - Optimize capex, reduce operating costs, and mitigate the risk of critical infrastructure failures
    - Improve sustainability through better energy management, and improve the safety of their employees and other stakeholders using the facilities
3. Partners & Customers
Higher Education
Oil & Gas
Media House
Financial Services
Telecom
Power Utility
4. Today's Topics
- The Modern Data Center Overview
- The High Availability (HA) Mantra
- Operating Challenges
- A Solution
5. Modern Data Center Overview
6. Multiple Classes of Data Centers
- Internet Data Center
  - Used by external clients connecting from the Internet
  - Supports the servers and devices required for B2C transaction-based applications (e-commerce)
- Extranet Data Center
  - Provides support and services for external B2B partner transactions
  - Accessed over secure VPN connections or private WAN links between the partner network and the enterprise extranet
- Intranet Data Center
  - Hosts applications and services accessed mostly by internal employees connected to the internal enterprise network
- Special Purpose Data Center
  - For specialized application areas, such as Geological & Geophysical work in the Oil & Gas industry
  - May or may not be interconnected
7. Common Objective: Business Continuity
- Disaster Recovery (DR) Data Center
  - Each class may have a dedicated or shared DR center
  - Usually located separately from the primary data center
- High Availability (HA) Data Center
  - Each data center is provisioned with significant redundancies
  - The DR center comes into play only when a disaster strikes
  - Component or system failures within any DC should either self-heal or be taken over by redundancies within the DC
- Insurance Against Power & Network Outages
  - Reliability through multiple service providers
  - Internal back-ups
- Securing the Data Center
  - Against malicious hacking that can bring down the data center and impact business continuity
  - Implementing firewalls/virtual firewalls
8. Common Complexity: Multitude of Assets
- Multitude of Assets
  - Divided between two worlds: IT & Facilities
  - Includes mission-critical applications
- Like a manufacturing operation
  - Raw material: power & networks
  - Processing: data
  - Output: information & service
  - Needs asset management and resource optimization, à la manufacturing
9. The High Availability Mantra
10. Today's High Availability Data Center
- Extreme redundancies for 99.99% uptime → higher power consumption
- Huge population of N+1/N+2 equipment → asset under-utilization; too complex to manage with spreadsheets & Visio tools
- Chain of interdependent equipment → multiple points of failure
- kW per rack increases as more processing capacity is added → trade-offs between supporting more load per rack versus extra space & heat loads
- Growing heat loads, carbon emissions & e-waste → sustainability issues
High Availability Is Inversely Proportional to Asset Utilization & Energy Efficiency
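Why a chain of interdependent equipment multiplies points of failure can be shown with standard reliability-block arithmetic: availabilities of components in series multiply, while a redundant group is down only when every unit in it is down. The sketch below assumes independent failures; the chain, the 99.9% figures, and the function names are hypothetical illustrations, not measurements.

```python
from math import prod

def series(*availabilities: float) -> float:
    """Chain availability: every component must be up, so availabilities multiply."""
    return prod(availabilities)

def parallel(*availabilities: float) -> float:
    """Redundant units where any one suffices: the group is down only if all are down.
    Assumes failures are independent."""
    return 1 - prod(1 - a for a in availabilities)

# Hypothetical chain (utility feed -> UPS -> PDU -> server), each link 99.9% available.
chain = series(0.999, 0.999, 0.999, 0.999)

# Same chain with the UPS duplicated as a redundant pair.
chain_redundant_ups = series(0.999, parallel(0.999, 0.999), 0.999, 0.999)

print(f"single chain:       {chain:.4%}")
print(f"redundant UPS pair: {chain_redundant_ups:.4%}")
```

Four 99.9% links in series fall to roughly 99.6%, below any single component. Duplicating one link recovers only that link's share (about 99.7% here), so every link in the critical path needs its own redundancy path before the chain can approach four nines.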
11. When HA Fails: A Tale of Two Disasters
- Tech fault at RBS and NatWest freezes millions of UK bank balances
  - RBS and NatWest have failed to register inbound payments for up to three days, customers have reported, leaving people unable to pay for bills, travel and even food. The banks, both owned by RBS Group, have confirmed that technical glitches have left bank accounts displaying the wrong balances and certain services unavailable. There is no fix date available.
- Amazon cloud outage takes down Netflix, Instagram, Pinterest, & more
  - With the critical Amazon outage, which is the second this month, we wouldn't be surprised if these popular services started looking at other options, including Rackspace, SoftLayer, Microsoft's Azure, and Google's just-introduced Compute Engine. Some of Amazon's biggest EC2 outages occurred in April and August of last year.
Which Will Be the Next One?
12. What's the High Availability Mantra?
Availability (%)            Downtime per year   Downtime per month   Downtime per week
99 ("two nines")            3.65 days           7.20 hours           1.68 hours
99.5                        1.83 days           3.60 hours           50.4 minutes
99.8                        17.52 hours         86.4 minutes         20.16 minutes
99.9 ("three nines")        8.76 hours          43.2 minutes         10.1 minutes
99.95                       4.38 hours          21.6 minutes         5.04 minutes
99.99 ("four nines")        52.56 minutes       4.32 minutes         1.01 minutes
99.999 ("five nines")       5.26 minutes        25.9 seconds         6.05 seconds
99.9999 ("six nines")       31.5 seconds        2.59 seconds         0.605 seconds
99.99999 ("seven nines")    3.15 seconds        0.259 seconds        0.0605 seconds
(Based on a 365-day year, a 30-day month, and a 7-day week.)
- Amazon data centers (built to Tier 4 standards, with an expected availability of 99.995%) have already had two outages in 2012, each lasting over 3 hours!
- Tier 3/Tier 4 ratings are defined only by hardware redundancies
- Glaring gaps in operating procedures to prevent fatal human errors
- Lack of purpose-built BCP software to predict failures
- Lack of chain of custody to detect root causes
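The downtime figures follow from one formula: allowed downtime = (1 - availability) × length of the period. This sketch assumes a 365-day year, a 30-day month, and a 7-day week; the function names are illustrative. It also runs the formula in reverse to show what two 3-hour outages in one year do to a 99.995% target.

```python
# Period lengths in seconds: 365-day year, 30-day month, 7-day week.
PERIOD_SECONDS = {
    "year": 365 * 24 * 3600,
    "month": 30 * 24 * 3600,
    "week": 7 * 24 * 3600,
}

def allowed_downtime_seconds(availability_pct: float, period: str) -> float:
    """Seconds of downtime permitted in `period` at `availability_pct` percent uptime."""
    return (1 - availability_pct / 100) * PERIOD_SECONDS[period]

def achieved_availability_pct(downtime_seconds: float, period: str = "year") -> float:
    """Availability actually delivered, given the total downtime in the period."""
    return 100 * (1 - downtime_seconds / PERIOD_SECONDS[period])

if __name__ == "__main__":
    for pct in (99.9, 99.99, 99.995, 99.999):
        minutes = allowed_downtime_seconds(pct, "year") / 60
        print(f"{pct}% -> {minutes:.2f} minutes of downtime per year")

    # Two outages of 3 hours each in one year, as in the Amazon example above.
    print(f"achieved: {achieved_availability_pct(2 * 3 * 3600):.3f}%")
```

A 99.995% target allows only about 26.3 minutes of downtime per year, so two 3-hour outages pull the year down to roughly 99.93%, below even four nines.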
13. Delivering the High Availability Promise
- Adequate Redundancies
  - Are there any points of failure besides power and external networks that can impact uptime? (Not everything is N+1)
  - What are my redundancy paths?
  - Are the relationships & dependencies among critical assets clearly defined?
  - Can I do an impact analysis of the outage/downtime of any equipment? Can I predict the cascading effect of such an outage on other assets/applications in the data center?
- Preventing Failures
  - Can failures be predicted so that proactive measures can be taken? Do I get alerts on threshold breaches so that I can take preventive action before a failure happens?
  - Is there a history of Move-Add-Change (MAC) activity that I should be aware of?
  - What is the impact of a MAC on space, power & cooling?
  - Where can new devices/servers best be placed (Floor → Rack → Cage)? How can this be determined from the current infrastructure and other dependencies to avoid a failure?
  - How do I prevent a fatal human error?
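The impact-analysis questions above amount to propagating a failure through a dependency graph while honoring redundancy paths. Below is a minimal sketch under assumed conventions: each asset lists groups of upstream assets, and it stays up only if every group still has at least one working member. The model, asset names, and topology are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical dependency model: asset -> list of groups of upstream assets.
# A group with several members models a redundant path (any one member suffices).
Dependencies = Dict[str, List[List[str]]]

def impacted_assets(deps: Dependencies, failed: Set[str]) -> Set[str]:
    """Return every asset that goes down, directly or by cascade, when `failed` fail."""
    down = set(failed)
    changed = True
    while changed:  # repeat until no new asset goes down (fixed point)
        changed = False
        for asset, groups in deps.items():
            if asset in down:
                continue
            # The asset fails if any required group has lost all of its members.
            if any(all(member in down for member in group) for group in groups):
                down.add(asset)
                changed = True
    return down

# Illustrative topology, not a real site.
deps: Dependencies = {
    "ups-a": [["utility-feed"]],
    "ups-b": [["utility-feed", "generator"]],
    "pdu-1": [["ups-a", "ups-b"]],           # dual-fed: either UPS will do
    "switch-1": [["pdu-1"]],
    "server-1": [["pdu-1"], ["switch-1"]],   # needs power AND network
    "billing-app": [["server-1"]],
}

print(sorted(impacted_assets(deps, {"ups-a"})))  # ['ups-a']
print(sorted(impacted_assets(deps, {"pdu-1"})))  # ['billing-app', 'pdu-1', 'server-1', 'switch-1']
```

Losing one UPS is absorbed by the redundant path, while the single PDU turns out to be a point of failure that takes the application down with it.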
14. Operating Challenges
15. The High Availability Challenge
- Lack of HA Management Tool
  - IT assets tracked by a systems management tool
  - Facilities assets tracked by the BMS
  - The two are not interoperable, so the missing link for HA cannot be determined
  - Unable to track redundancy paths
  - HA fails if any equipment or software in the critical path fails
  - HA fails if there's a fatal human error
  - Health and history of equipment, and the impact of previous MACs, not tracked
- Too many assets, two classes of assets
  - Absence of a software portfolio (even if hardware assets are tracked)
  - Move-Add-Change decisions not based on simulations & analysis
  - Absence of change management
  - Absence of workflow approvals
  - Unable to predict failures
  - No chain of custody
Need to Predict Failures
16. Beyond HA: Infrastructure & Operational Challenges
- Higher power consumption & growing power bills
- Not monitoring power use at the device level
- Dissipation of enormous heat
- Creation of hot spots
- Drastic reduction in the expected life of computing equipment
- Data center failure
- Increase in CO2 emissions
- Low-level asset tracking
- Under-utilization of many computing resources
- Running old, inefficient equipment
- Decisions not based on analysis
- Cooling not optimized
- Floor & rack space: non-optimal placement of equipment
- Increasing demand for rack space
- Absence of capacity planning
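Threshold alerting on device-level readings is one way to catch hot spots and runaway power draw before they lead to failures. The sketch below is a hypothetical illustration: the metric names, the 27 °C inlet-temperature and 5 kW per-rack limits, and the 90% warning band are assumptions chosen for the example, not values from this deck.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Reading:
    device: str
    metric: str   # hypothetical metric names: "inlet_temp_c" or "power_kw"
    value: float

# Assumed limits for illustration only; real limits come from equipment vendors
# and the site's own operating policy.
THRESHOLDS: Dict[str, float] = {
    "inlet_temp_c": 27.0,
    "power_kw": 5.0,
}

def threshold_breaches(readings: List[Reading], warn_ratio: float = 0.9) -> List[str]:
    """Flag readings that exceed their limit (CRITICAL) or reach warn_ratio of it (WARNING)."""
    alerts = []
    for r in readings:
        limit = THRESHOLDS.get(r.metric)
        if limit is None:
            continue  # no threshold configured for this metric
        if r.value > limit:
            alerts.append(f"CRITICAL {r.device} {r.metric}={r.value} (limit {limit})")
        elif r.value >= warn_ratio * limit:
            alerts.append(f"WARNING {r.device} {r.metric}={r.value} (limit {limit})")
    return alerts

readings = [
    Reading("rack-07", "inlet_temp_c", 29.5),  # hot spot
    Reading("rack-08", "inlet_temp_c", 22.0),
    Reading("rack-07", "power_kw", 4.6),       # approaching its power budget
    Reading("rack-08", "power_kw", 2.1),
]
for alert in threshold_breaches(readings):
    print(alert)
```

The warning band is what makes the alerting preventive: it fires while there is still headroom to act, rather than only after a limit has been crossed.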
17. A Solution
18. Solution That Bridges the Gap Between IT & Facilities
Data Center Infrastructure Management (DCIM) Software
19. Solution That Addresses the High Availability Challenge
- Lack of HA Management Tool
  - IT assets tracked by a systems management tool
  - Facilities assets tracked by the BMS
  - The two are not interoperable, so the missing link for HA cannot be determined
  - Unable to track redundancy paths
  - HA fails if any equipment or software in the critical path fails
  - HA fails if there's a fatal human error
  - Health and history of equipment, and the impact of previous MACs, not tracked
- Too many assets, two classes of assets
  - Absence of a software portfolio (even if hardware assets are tracked)
  - Move-Add-Change decisions not based on simulations & analysis
  - Absence of change management
  - Absence of workflow approvals
  - Unable to predict failures
  - No chain of custody
DCIM Helps to Predict Failures
20. Solution That Addresses Infrastructure & Operational Challenges
- Higher power consumption & growing power bills
- Not monitoring power use at the device level
- Dissipation of enormous heat
- Creation of hot spots
- Drastic reduction in the expected life of computing equipment
- Data center failure
- Increase in CO2 emissions
- Low-level asset tracking
- Under-utilization of many computing resources
- Running old, inefficient equipment
- Decisions not based on analysis
- Cooling not optimized
- Floor & rack space: non-optimal placement of equipment
- Increasing demand for rack space
- Absence of capacity planning
DCIM Improves Energy & Operational Efficiencies
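Placement and capacity planning can start from a simple rule: put a new device in a rack that has enough free space and power headroom, preferring the tightest fit so existing racks are filled before new ones are opened. This is a minimal best-fit sketch with hypothetical rack names, power budgets, and a 0.5 kW safety reserve; it ignores cooling and redundancy paths, which a real DCIM tool would also have to weigh.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Rack:
    name: str
    free_units: int          # free rack units (U)
    power_budget_kw: float
    power_used_kw: float

    @property
    def headroom_kw(self) -> float:
        return self.power_budget_kw - self.power_used_kw

def choose_rack(racks: List[Rack], units_needed: int, power_kw: float,
                reserve_kw: float = 0.5) -> Optional[Rack]:
    """Best-fit placement: among racks with enough space and power (keeping
    `reserve_kw` spare), pick the one left with the least power headroom."""
    candidates = [
        r for r in racks
        if r.free_units >= units_needed and r.headroom_kw - power_kw >= reserve_kw
    ]
    if not candidates:
        return None  # nothing fits: a capacity-planning signal
    return min(candidates, key=lambda r: r.headroom_kw - power_kw)

# Hypothetical racks for illustration.
racks = [
    Rack("A01", free_units=10, power_budget_kw=6.0, power_used_kw=5.2),
    Rack("A02", free_units=4, power_budget_kw=6.0, power_used_kw=3.0),
    Rack("B01", free_units=20, power_budget_kw=8.0, power_used_kw=2.0),
]

choice = choose_rack(racks, units_needed=2, power_kw=0.8)
print(choice.name if choice else "no capacity")
```

Best fit raises utilization of racks already in service, while the reserve keeps a margin against the kW-per-rack trade-off. Returning `None` when no rack qualifies is itself useful: it is an early signal that floor or rack capacity must be added.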
21. Anatomy of DCIM Software: GFS Crane
22. Thank You
http://www.greenfieldsoft.com
Email: sales@greenfieldsoft.com
23. See Also
- Data Center Infrastructure Management
- ERP for the Data Center Manager