Title: The Data Centre Crisis
1. The Data Centre Crisis
- Presentation to the Computer Sciences Course, University of Warwick
- by William Lees
- Executive Director, Information Technology
- UBS Investment Bank
2. Agenda
- UBS Investment Bank: who we are
- Crisis, What Crisis? The insatiable demand for hardware
- The Solution? More space!
- The Solution? More technology!
- Your Challenge
- Conclusion and FAQ
3. Who We Are - http://www.ibb.ubs.com
4. The UBS-IB global IT organization in 2005
UBS-IB IT Development, Production Staff and Contractors
Staff in locations not shown on map: other Europe 53, other APAC 55, other America 63
Source: IT PAC, Sept 2005
- 5,015 UNIX servers
- 5,087 NT Wintel servers
- 3,874 Linux servers
- 36,772 workstations
- 7,743 technologists
- 1,069,233 help desk calls p.a.
- 9,927 market data users
- 232 sites in 140 cities
- 99,887 LAN ports
- 50,593 telephony ports
- 855 routers
- 314 firewalls
Source: Production, Sept 05
5. The Stamford, Connecticut Trading Floor
6. Crisis, What Crisis?
7. Trends
- Server population has doubled in 4 years; power consumption is rising even faster
- 24% have CPU utilisation <15%, 1% above 90%
- 26% are 3 years old
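As a rough illustration of what "doubled in 4 years" means year on year, the implied compound growth rate can be worked out as below. This assumes smooth, compounding growth, which the slide does not claim; the variable names are illustrative only.

```python
# Implied compound annual growth rate if the server population
# doubles over 4 years (assumes smooth, compounding growth).
growth_factor = 2.0   # population doubled (from the slide)
years = 4             # over 4 years (from the slide)

annual_growth = growth_factor ** (1 / years) - 1
print(f"Implied annual growth: {annual_growth:.1%}")  # about 18.9%
```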
8. What's behind the growth?
- General business trends: 2005 Q3 operating income up by 46% compared to Q3 2004
- Transaction rates: hedge funds and program trading in general are generating much higher volumes
- Sarbanes-Oxley: tighter regulatory environment has placed more emphasis on business continuity
- New hardware is more power-hungry
- Hardware has trended towards use of multiple blade servers rather than a smaller number of larger units
9. Blade Servers
Broil IBM Blade servers: 320 Blades, 640 CPUs, 1664 GHz!
- Cabinet with 2 Blade chassis, with 14 Blades in each
- Multiple cabinets in each of 3 locations
- 2 Blade chassis, with 14 Blades in each, running Windows 2000 Advanced Server
- Chassis contain redundant power, fans, network switches and Gigabit backplane
- Inside a Blade server
- 2x 40 GB IDE hard disks (top left)
- 4 GB memory (centre)
- 2x P4 Xeon 2.6 GHz CPUs (bottom)
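The headline figures on this slide follow from the per-blade specification. The sketch below reproduces them. The cabinet count is a derived illustration, not a figure from the talk: it assumes every blade sits in a cabinet of the kind described, with 2 chassis of 14 blades each.

```python
import math

blades = 320            # from the slide
cpus_per_blade = 2      # 2x P4 Xeon per blade (from the slide)
clock_ghz = 2.6         # per CPU (from the slide)

total_cpus = blades * cpus_per_blade
aggregate_ghz = total_cpus * clock_ghz

# Assumption: every blade is housed in a cabinet of 2 chassis x 14 blades.
blades_per_cabinet = 2 * 14
min_cabinets = math.ceil(blades / blades_per_cabinet)

print(total_cpus, round(aggregate_ghz), min_cabinets)  # 640 1664 12
```

Aggregate GHz is a headline number rather than a performance measure, since it says nothing about how well a workload spreads across 640 separate CPUs.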
10. Application architectures can be complex...
[Diagram: example application architecture. Components shown: WebSphere, MQ / RV / WME, Verity, OID / Metadir, Apache]
11. 1 application requires multiple servers: maybe 8-10 in this scenario
We will also need a Business Continuity standby configuration, UAT and development hardware, so 1 application can easily require >30-40 servers!
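One way to see how 8-10 servers becomes 30-40 is to multiply by the number of environments. The sketch below assumes each environment mirrors production in size. That is an illustrative assumption, not something the slide states, and `servers_per_application` is a hypothetical helper.

```python
# Environments named on the slide. Assuming each one mirrors
# production in size is an illustrative assumption.
ENVIRONMENTS = [
    "production",
    "business continuity standby",
    "UAT",
    "development",
]

def servers_per_application(production_servers: int) -> int:
    """Total servers if every environment matches production."""
    return production_servers * len(ENVIRONMENTS)

low = servers_per_application(8)
high = servers_per_application(10)
print(f"{low}-{high} servers per application")  # 32-40
```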
12. The Result
- Data centres are running out of space to accommodate new servers
- Where there's space, we're running out of power or cooling capacity
- Time-to-market is affected
- Much energy is being put into administrative tasks: prioritisation, reconfiguration, decommissioning unwanted equipment quickly
13. The Solution?
14. A new purpose-built Data Centre
- 15,000 server capacity, in 2 phases
- 500M. First phase ready in 3-4 years
- Current server growth: 4000 servers per year, so this represents 4 years' capacity!
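The "4 years' capacity" figure is simple division. The sketch below also shows how many servers would be requested while the first phase is being built. It assumes growth stays flat at 4000 servers per year, which is a simplification; the variable names are illustrative.

```python
capacity = 15_000          # servers, both phases (from the slide)
growth_per_year = 4_000    # servers per year (from the slide)
build_years = (3, 4)       # first phase ready in 3-4 years (from the slide)

# Assumption: growth stays constant at today's rate.
years_of_capacity = capacity / growth_per_year
demand_during_build = [growth_per_year * y for y in build_years]

print(f"Years of capacity: {years_of_capacity:.2f}")              # 3.75
print(f"Servers requested while waiting: {demand_during_build}")  # [12000, 16000]
```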
15. Service Outsourcing: even this has long lead times
[Gantt chart: timeline running May to November, with a "To Apr 2006" marker. Phases: Initiation, Planning, Ordering/Delivery, Infra-Build, Deployment. Workstreams and tasks shown:
- CRE / Facilities: sign lease; power / HVAC; 100 cabinets
- Data Center Operations: design layout; cabinets (order, delivery, installation); cables (cable NetCabs, cable InfraCabs, cable rest of cabinets)
- Networks: distance test; WAN (fiber), LAN (cables) and Voice, each with design, order/delivery, install and test
- Servers and Storage: UNIX, WINTEL and DB, each with design, provisioning, build, install infra, migrate business
- Business Occupation: design, build and populate capacity DB; business forecast / migration policies; schedule migration]
16. Conclusions
- Increasing Data Centre capacity is long term, and very capital intensive
- Outsourced/co-located space can offer short term relief
- But even if we do all we can, current growth rates will be difficult and expensive to accommodate
- Hardware today just isn't meeting application requirements
- Too difficult/risky to concentrate multiple applications on a single server
- Business continuity capacity left idle
- Development/Test servers under-utilised
- Our average server utilisation is only 27%. If we could concentrate usage enough to double this, we wouldn't need to buy another server for three years.
This is a very simplistic assertion. Why?
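The slide does not show how the three-year figure was reached. One back-of-envelope route is sketched below, using the server counts and growth rate quoted earlier in the deck. The assumptions in the comments are mine, not the presenter's.

```python
# Server counts from the "global IT organization in 2005" slide.
servers = 5_015 + 5_087 + 3_874      # UNIX + NT Wintel + Linux
growth_per_year = 4_000              # from the "new Data Centre" slide
utilisation = 0.27                   # average utilisation (from the slide)
target = 2 * utilisation             # "double this"

# Assumption: work is perfectly divisible and movable between servers,
# so doubling utilisation doubles the work the estate can absorb.
# Assumption: demand keeps growing at a flat 4000 servers per year.
server_equivalents_freed = servers * (target / utilisation - 1)
years_without_buying = server_equivalents_freed / growth_per_year

print(servers, round(years_without_buying, 1))  # 13976 3.5
```

Each assumption in the comments is a place where a real server estate may behave differently from this arithmetic.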
17. The Solution?
18. Utility Computing
- "The only technology label more confusing to buyers than grid computing is utility computing." (Saugatuck Research)
- "VMware (a division of EMC) turned up at the top of the IT spend pack, underlining the hot demand for server virtualization technology" (Goldman Sachs)
- IDC study found 21 definitions for grid computing across seven different technical domains.
19. Utility Computing Business Properties
20. Evolutionary vs. Revolutionary
21. The Utility Space and the Value Curve
[Chart: value curve of Agility and Cost Savings against Adoption of Enabling Technologies Over Time. Labels on the chart: Tradable cycles (?), Partner Grid, Comprehensive provisioning, Enterprise Grid, Regional sub-Grids, Network-attached processors, Savvis-style xSP utility, Stream sub-Grids, Server Virtualization, Shareable grid, Strategic grid product, Desktop cycle scavenging, Virtual workstations, Blades, Bespoke Workload Distribution, Storage Filers]
22. Technology Example: Grid Computing
23. How do Grids Help? Cycle Scavenging
- What is Cycle Scavenging?
- A technique used to aggregate unused, existing, distributed, idle resources to provide large amounts of compute power (e.g. SETI@home).
- Machines are added and removed from scavenging systems in an ad-hoc, random and unpredictable manner as primary users stop and start using their machines.
- How does it work?
- A 3 GHz, 1 MB Dell GX280 desktop PC can provide 60-70% of the capacity of a dedicated engine when running during non-business hours only
- Earlier this year, our first project went live with cycle scavenging using more than 350 workstations, saving more than 1 million in equivalent hardware cost and enhancing SLA performance
- In a typical firm, up to 40% of the cycle-scavenged grid may be offline or unavailable at any moment in time. There is no discernible effect on the end-user.
- What are the problems?
- Grid software license fees may be the same whether for a dedicated 3.6 GHz blade or an old 1 GHz desktop at 60% performance for 40% of the time
- With cycle scavenging, grid becomes "the new black": everyone wants in. Again, this can be expensive on licensing.
- Scheduling flexibility becomes very important, but better scheduling can result in less predictable performance.
- Grid/Cycle Scavenging is only suitable for certain types of task, specifically compute-intensive tasks that can be broken into many independent small chunks. The application has to be specially designed to work with it.
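To make "many independent small chunks" concrete, here is a minimal sketch of a compute-intensive job designed that way. It is not taken from any UBS system or grid product. The workload (a Monte Carlo estimate of pi) and the names `make_chunks`, `run_chunk` and `estimate_pi` are illustrative assumptions, and local threads stand in for grid engines.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def make_chunks(total_samples, chunk_size):
    """Split the job into (chunk_id, sample_count) pairs."""
    chunks = []
    chunk_id = 0
    remaining = total_samples
    while remaining > 0:
        size = min(chunk_size, remaining)
        chunks.append((chunk_id, size))
        chunk_id += 1
        remaining -= size
    return chunks

def run_chunk(chunk):
    """Count random points that fall inside the unit quarter circle.

    A chunk needs only its own id (used as the RNG seed) and size, so
    any machine can run it, and it can simply be re-run elsewhere if a
    workstation drops out part-way through.
    """
    chunk_id, size = chunk
    rng = random.Random(chunk_id)
    hits = 0
    for _ in range(size):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits

def estimate_pi(total_samples, chunk_size, workers=4):
    chunks = make_chunks(total_samples, chunk_size)
    # Threads stand in for grid engines here; a real grid would ship
    # each chunk to a remote machine instead.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        hits = sum(pool.map(run_chunk, chunks))
    return 4.0 * hits / total_samples

print(estimate_pi(200_000, 10_000))
```

Because no chunk depends on another, the result is the same however many workers are available and whichever order the chunks finish in. That property is what lets a scheduler cope with machines joining and leaving unpredictably.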
24. Technology Example: OS Virtualisation
25. OS Virtualisation - Continued
- Takes full advantage of the hardware by running several operating system instances on each server
- New Virtual Servers can be deployed rapidly, and moved from server to server to balance performance
- Typically limited to Intel technology (as opposed to Sun Solaris) but Windows and Linux instances can be mixed
- Limit on number of instances will tend to be memory-related: memory will max out before CPU
- Is the technology mature? There are multiple products available and quite a few case studies, but you need to read between the lines and come up with your own analysis (this applies to all three technologies, not just this one)
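The point that memory runs out before CPU can be illustrated with a small sizing sketch. The 4 GB host matches the blade described earlier in the deck. The per-instance figures (1 GB of memory, 15% of host CPU) are hypothetical values chosen for illustration, and hypervisor overhead is ignored.

```python
def max_instances(host_memory_mb, vm_memory_mb, host_cpu_pct, vm_cpu_pct):
    """Return how many instances fit on one host, and what limits it."""
    by_memory = host_memory_mb // vm_memory_mb
    by_cpu = host_cpu_pct // vm_cpu_pct
    return {
        "by_memory": by_memory,
        "by_cpu": by_cpu,
        "limit": min(by_memory, by_cpu),
        "bottleneck": "memory" if by_memory <= by_cpu else "CPU",
    }

sizing = max_instances(
    host_memory_mb=4096,  # 4 GB blade (from the Blade Servers slide)
    vm_memory_mb=1024,    # hypothetical: 1 GB per OS instance
    host_cpu_pct=100,     # whole host CPU budget
    vm_cpu_pct=15,        # hypothetical: each instance averages 15% of host CPU
)
print(sizing)
```

With these illustrative numbers the host fits 4 instances by memory but 6 by CPU, so memory is the binding limit.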
26. Technology Example: Dedicated Appliances
27. Appliances - Continued
- Appliances perform one particular task better than a general purpose server, in this case Java computing
- Because of their specialization they can only replace a proportion of the overall server population (and may limit future flexibility)
- Application server appliances are typically best matched to scalable, highly concurrent workloads, not so well matched to linear, compute-intensive workloads
28. Your Challenge
- You are head of Application Infrastructure at a large financial institution. The data centres are completely congested and the demand for new servers is insatiable; key projects are running late because they cannot obtain servers on time. It's clear that even the very large investments you have in place to provide additional data centre capacity will barely keep up. Unless something changes, the problem could affect your organisation's ability to deliver new services to clients for years to come. Various thought leaders in the company have come up with solutions and are lobbying hard for their adoption. The most promising seem to be related to 'utility computing', specifically Grid, OS Virtualisation, and Application Appliances. Your Engineering department tells you that all of these have merit, but they would each take at least 12 months to implement and they certainly couldn't take on more than one at a time. You know from experience that the 12 month figure is an underestimate...
- Using the presentation material and information from the Internet, analyse the ability of these three technologies to mitigate the problem, and select one to implement that you feel will provide the greatest benefit. If you feel there's a better technological approach, feel free to include it as well. Make sure you consider cost, risks and threats as well as the benefits of each approach.
29. Conclusion
- UBS IB is a large graduate / postgraduate employer. We have a popular summer internship programme for students about to enter their final year or postgraduate year
- All information and application forms are on our website: http://www.ibb.ubs.com