Title: Robust Design of Air Cooled Server Cabinets
1. Robust Design of Air Cooled Server Cabinets
- Nathan Rolander
- CEETHERM Review Meeting
- 16 August 2005
Systems Realization Laboratory
Microelectronics Emerging Technologies Thermal Laboratory (METTL)
2. Outline
- Motivation and problem statement
- Design challenges and constructs
- Introduction to constructs and models
- 2D cabinet results
- Optimal vs. Robust configurations
- 3D cabinet simulation
- Experimental validation of 3D simulation
- Conclusions
3. Background: What is a data center?
- 10,000-500,000 sq. ft. facilities filled with cabinets that house data processing equipment, servers, switches, etc.
- Tens to hundreds of MW of power consumption for computing equipment and associated cooling hardware
- Trend towards very high power density servers (30 kW/cabinet) requiring stringent thermal management
Image: B. Tschudi, Lawrence Berkeley Laboratories
4. Introduction: Motivation
- Up to 40% of data center operating costs can be cooling-related
- Cooling challenges are compounded by a lifecycle mismatch
  - New computer equipment introduced every ~2 years
  - Center infrastructure overhauled every ~25 years
How do we efficiently integrate high powered
equipment into an existing cabinet infrastructure
while maximizing operational stability?
Source: W. Tschudi, Lawrence Berkeley Laboratories
5. Cabinet Design Challenges
- Flow complexity
  - The turbulent CFD models required to analyze the air flow distribution in cabinets are too computationally expensive to use within iterative optimization algorithms
- Operational stability
  - Variations in data center operating conditions, coupled with model inaccuracies, mean that computed optimal solutions do not necessarily translate into efficient or even feasible physical solutions
- Multiple design objectives
  - Efficient thermal management, cooling cost minimization, and operational stability are conflicting goals
6. Approach Overview
- Integration of three constructs to tackle the cabinet design challenges: POD-based flow modeling, robust design, and the compromise DSP
7-11. Introduction to the POD
- Modal expansion of basis functions, φ
- Fit an optimal linear subspace through a series of system observations, u
- Maximize the projection of the basis functions onto the observations
- Pose this as a constrained variational calculus problem
- Assemble the observations covariance matrix
- Take the cross-correlation tensor of the covariance matrix
- Take the eigen-decomposition of the cross-correlation tensor
⟨ , ⟩ denotes ensemble averaging
( , ) denotes L2 inner product
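The slide equations did not survive extraction. For reference, the steps listed above match the standard POD formulation, reconstructed here as an assumption (the original slides may use different notation): φ are the basis functions, u the real-valued observations, ⟨·⟩ the ensemble average, and (·,·) the L2 inner product. Setting the first variation of J to zero turns the constrained maximization into the integral eigenvalue problem on the last line, whose kernel is the cross-correlation tensor R.

```latex
\begin{aligned}
\mathbf{u}(\mathbf{x}) &\approx \sum_{i=1}^{m} a_i\,\boldsymbol{\varphi}_i(\mathbf{x})
  && \text{modal expansion} \\
\max_{\boldsymbol{\varphi}}\;
  &\frac{\langle |(\mathbf{u},\boldsymbol{\varphi})|^{2} \rangle}{(\boldsymbol{\varphi},\boldsymbol{\varphi})}
  && \text{maximize projection onto observations} \\
J[\boldsymbol{\varphi}] &= \langle |(\mathbf{u},\boldsymbol{\varphi})|^{2} \rangle
  - \lambda\left[(\boldsymbol{\varphi},\boldsymbol{\varphi}) - 1\right]
  && \text{constrained variational problem} \\
\mathbf{R}(\mathbf{x},\mathbf{x}') &= \langle \mathbf{u}(\mathbf{x}) \otimes \mathbf{u}(\mathbf{x}') \rangle
  && \text{cross-correlation tensor} \\
\int_{\Omega} \mathbf{R}(\mathbf{x},\mathbf{x}')\,\boldsymbol{\varphi}(\mathbf{x}')\,d\mathbf{x}'
  &= \lambda\,\boldsymbol{\varphi}(\mathbf{x})
  && \text{eigen-decomposition}
\end{aligned}
```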
12. POD Based Turbulent Flow Modeling
- Vector-valued eigenvectors form an empirical basis of an m-dimensional subspace; these are called POD modes
- Superposition of modes is used to reconstruct any solution inside the range of observations to within 10% error
- Flux matching procedure applied at boundaries (areas of known flow conditions), resulting in the minimization problem min_a ||F(a) - G||^2
- Values of a_i found using the method of least squares
- Resulting model has an O(10^5) reduction in DoF
G is the flux goal
F(.) is the contribution to boundary flux from the POD modes
a is the POD mode weight coefficient
See Jeff Rambo's presentation for the complete analysis
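A minimal sketch of the snapshot POD and flux-matching steps described above, under stated assumptions: the discretized field is stored as column vectors, the modes are taken from the SVD of the snapshot matrix (equivalent to eigen-decomposing its covariance matrix), and the boundary flux F is assumed linear in the mode weights, so the least-squares problem is linear. The helper names `pod_modes` and `flux_match`, the probe matrix `boundary_op`, and the toy profiles are all hypothetical; the actual PODc implementation may differ.

```python
import numpy as np


def pod_modes(snapshots: np.ndarray, m: int) -> np.ndarray:
    """First m POD modes of an (n_dof, n_obs) snapshot matrix.

    The left singular vectors are the eigenvectors of the covariance
    matrix snapshots @ snapshots.T, ordered by captured energy."""
    left, _, _ = np.linalg.svd(snapshots, full_matrices=False)
    return left[:, :m]


def flux_match(modes, boundary_op, flux_goal):
    """Mode weights a minimizing ||F(a) - G||^2, assuming F(a) = B @ (modes @ a)."""
    flux_per_mode = boundary_op @ modes  # boundary flux contributed by each mode
    a, *_ = np.linalg.lstsq(flux_per_mode, flux_goal, rcond=None)
    return a


# Hypothetical toy data: 1D profiles whose shape changes with inlet velocity v
x = np.linspace(0.0, 1.0, 50)
s1, s2 = np.sin(np.pi * x), np.sin(2.0 * np.pi * x)


def profile(v):
    return v * s1 + 0.1 * v**2 * s2


v_obs = np.linspace(0.25, 2.0, 8)
snapshots = np.column_stack([profile(v) for v in v_obs])
modes = pod_modes(snapshots, m=2)

# Hypothetical "known flow" probes: each row of B picks one grid point
boundary_op = np.zeros((3, x.size))
boundary_op[[0, 1, 2], [5, 20, 35]] = 1.0

u_true = profile(0.95)            # unseen condition inside the observed range
flux_goal = boundary_op @ u_true  # G: the flux goal at the probes
a = flux_match(modes, boundary_op, flux_goal)
u_rec = modes @ a                 # superposition of weighted POD modes
print(np.max(np.abs(u_rec - u_true)))
```

Because the toy field lies exactly in a two-mode subspace, the reconstruction error is at round-off level; real turbulent fields would not be captured this exactly.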
13. Robust Design Principles
- Determine superior solutions by minimizing the effects of variation, without eliminating their causes
- Type I: minimizing variations in performance caused by variations in noise factors (uncontrollable parameters)
- Type II: minimizing variations in performance caused by variations in control factors (design variables)
- A common implementation of Type I robust design is Taguchi parameter design
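In Taguchi parameter design, each control-factor setting is typically scored with a signal-to-noise (S/N) ratio computed over responses measured under different noise conditions. A minimal sketch of the standard "smaller-the-better" and "nominal-the-best" S/N ratios; the chip temperatures are hypothetical and not from this study.

```python
import math
from statistics import mean, variance


def sn_smaller_the_better(ys):
    """Taguchi S/N ratio (dB) when smaller responses are better."""
    return -10.0 * math.log10(sum(y * y for y in ys) / len(ys))


def sn_nominal_the_best(ys):
    """Taguchi S/N ratio (dB) rewarding a high mean-to-variance ratio."""
    return 10.0 * math.log10(mean(ys) ** 2 / variance(ys))


# Hypothetical chip temperatures (deg C) for two design settings,
# each evaluated under the same three noise conditions
design_a = [62.0, 70.0, 78.0]
design_b = [68.0, 70.0, 72.0]

for name, ys in [("A", design_a), ("B", design_b)]:
    print(name, round(sn_smaller_the_better(ys), 2), round(sn_nominal_the_best(ys), 2))
```

Design B has the same mean but less spread, so it scores higher on both ratios: the Type I preference is to reduce sensitivity to noise rather than remove the noise itself.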
14. Robust Design Application
[Figure: objective function Y plotted against design variable X]
15. Robust Design Application
[Figure: feasible design space over design variables X1 and X2, bounded by a constraint boundary]
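One common way to apply the ideas in these two figures, sketched here as an assumption rather than the exact method of this work: evaluate the objective over a tolerance window ±Δ around each candidate design (Type II variation in a control factor), prefer the design with the better mean and smaller spread, and treat a design as feasible only if the constraint holds across the whole window. The objective `y`, the constraint `g`, and all numbers are hypothetical.

```python
import math
from statistics import mean, pstdev


def y(x):
    """Hypothetical objective (minimize): a deep, narrow well at x = 1
    and a shallower, broad well at x = 3."""
    return -math.exp(-((x - 1.0) ** 2) / 0.01) - 0.8 * math.exp(-((x - 3.0) ** 2) / 2.0)


def g(x):
    """Hypothetical constraint g(x) <= 0: the design variable must stay below 3.5."""
    return x - 3.5


def perturbed(x, delta, n=21):
    """Evenly spaced samples of the design variable within x +/- delta."""
    return [x - delta + 2.0 * delta * i / (n - 1) for i in range(n)]


def robust_stats(f, x, delta):
    """Mean and spread of f when the control factor varies by +/- delta."""
    values = [f(xi) for xi in perturbed(x, delta)]
    return mean(values), pstdev(values)


def robust_feasible(x, delta):
    """Type II feasibility: the constraint must hold for every perturbed x."""
    return all(g(xi) <= 0.0 for xi in perturbed(x, delta))


delta = 0.2
for x in (1.0, 3.0, 3.4):
    mu, sd = robust_stats(y, x, delta)
    print(x, round(y(x), 3), round(mu, 3), round(sd, 3), robust_feasible(x, delta))
```

The nominal optimum sits in the narrow well at x = 1, but once ±0.2 of variation is allowed the broad well at x = 3 gives a better mean and a much smaller spread, while x = 3.4 is rejected because its tolerance window crosses the constraint boundary.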
16. The Compromise DSP Mathematics
- The compromise Decision Support Problem (DSP) is a hybrid of mathematical programming and goal programming optimization routines
17. The Compromise DSP Formulation
- Formulated as a textbook problem statement
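The compromise DSP is usually stated as Given / Find / Satisfy / Minimize, with each goal written as A(x)/G + d⁻ - d⁺ = 1 using under- and over-achievement deviation variables (both ≥ 0, with d⁻·d⁺ = 0) and an Archimedean deviation function Z = Σ wᵢ·(penalized deviations) to minimize. The sketch below shows that bookkeeping on a hypothetical one-variable problem solved by brute-force search; the goals, targets, weights, temperature model, and constraint are illustrative assumptions, not the cabinet formulation used in this work.

```python
def deviations(achieved, target):
    """Deviation variables for the goal A(x)/G + d_minus - d_plus = 1.

    Both are >= 0 and at most one is nonzero (d_minus * d_plus = 0)."""
    ratio = achieved / target
    return max(0.0, 1.0 - ratio), max(0.0, ratio - 1.0)


def chip_temp(v):
    """Hypothetical chip temperature (deg C) vs. inlet velocity v (m/s)."""
    return 25.0 + 30.0 / v


def deviation_function(v, w_velocity, w_temp):
    """Archimedean deviation function Z for two 'minimize' goals,
    so only over-achievement (d_plus) of each target is penalized."""
    _, d_plus_v = deviations(v, target=0.5)              # goal: v down to 0.5 m/s
    _, d_plus_t = deviations(chip_temp(v), target=55.0)  # goal: T down to 55 C
    return w_velocity * d_plus_v + w_temp * d_plus_t


def solve(w_velocity, w_temp, t_max=85.0):
    """Brute-force search over a velocity grid, honoring the constraint T <= t_max."""
    grid = [0.25 + 0.01 * i for i in range(176)]  # 0.25 .. 2.00 m/s
    feasible = [v for v in grid if chip_temp(v) <= t_max]
    return min(feasible, key=lambda v: deviation_function(v, w_velocity, w_temp))


for weights in [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]:
    v = solve(*weights)
    print(weights, round(v, 2), round(chip_temp(v), 1))
```

Full weight on the velocity goal drives the design to the constraint boundary (T = 85 C), full weight on the temperature goal drives it to the temperature target, and intermediate weights give a compromise in between.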
18. Problem Geometry
- Enclosed cabinet containing 10 servers
- Cooling air supplied from an under-floor plenum
[Figures: cabinet profile; server profile]
19. Cabinet Modeling
- 9 observations of Vin, from 0 to 2 m/s in 0.25 m/s steps, used to build the POD
- k-ε turbulence model for RANS, implemented in commercial CFD software (FLUENT)
- Finite difference energy equation solver used for the thermal solution, using the POD-computed flow field
- 1 iteration: 12 s
[Figure: Vin = 0.95 m/s]
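The idea of a finite difference energy solver driven by a precomputed velocity can be illustrated, under strong simplifying assumptions, with a 1D steady advection-diffusion equation ρ·c_p·u·dT/dx = k·d²T/dx² + q along a single flow path. The sketch uses first-order upwinding for advection, central differences for conduction, a fixed inlet temperature, a zero-gradient outlet, and the Thomas (tridiagonal) algorithm. The velocity `u` stands in for a value taken from the POD flow field; the air properties are approximate, and the grid and heat source are hypothetical. The actual solver was multidimensional and may use a different discretization.

```python
RHO, CP, K_AIR = 1.16, 1007.0, 0.026  # approximate air properties near 300 K (SI units)


def thomas(a, b, c, d):
    """Solve a tridiagonal system (sub-diagonal a, diagonal b, super-diagonal c)."""
    n = len(d)
    c_star, d_star = [0.0] * n, [0.0] * n
    c_star[0], d_star[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * c_star[i - 1]
        c_star[i] = c[i] / m
        d_star[i] = (d[i] - a[i] * d_star[i - 1]) / m
    x = [0.0] * n
    x[-1] = d_star[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_star[i] - c_star[i] * x[i + 1]
    return x


def solve_energy_1d(u, q, length, t_in):
    """Steady 1D upwind/central FD solution of rho*cp*u*dT/dx = k*d2T/dx2 + q.

    q holds one volumetric heat source (W/m^3) per node."""
    n = len(q)
    dx = length / (n - 1)
    adv, dif = RHO * CP * u / dx, K_AIR / dx**2
    a = [-(adv + dif)] * n
    b = [adv + 2.0 * dif] * n
    c = [-dif] * n
    d = list(q)
    a[0], b[0], c[0], d[0] = 0.0, 1.0, 0.0, t_in      # inlet: fixed temperature
    a[-1], b[-1], c[-1], d[-1] = -1.0, 1.0, 0.0, 0.0  # outlet: dT/dx = 0
    return thomas(a, b, c, d)


# Hypothetical case: 1 m flow path, u from the POD field, heated section 0.4-0.6 m
n = 101
q = [5000.0 if 40 <= i <= 60 else 0.0 for i in range(n)]
temps = solve_energy_1d(u=0.95, q=q, length=1.0, t_in=20.0)
print(round(temps[-1], 3))
```

Summing the discrete equations shows the outlet temperature rise is the total heat input divided by ρ·c_p·u (per unit flow area), up to a negligible inlet conduction term, which is a quick sanity check on any such solver.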
20-22. Design Variables and Objectives
[Diagram: Server Cabinet Model, with an iterate loop]
23. Results
- Baseline vs. maximum efficient power dissipation
- Without server power redistribution, increasing the cooling air flow alone is ineffective
24. Results
- Inlet air velocity vs. total cabinet power level
- Cooling air is redistributed to different cabinet sections depending on the supply rate → server cooling efficiency
25. Results
- Maximum chip temperature and bounds
- Maximum chip temperature constraint is met as the variation in response changes with varying power and flow rates
26. Robust vs. Optimal Solution
- Investigate the difference in performance requirements between a robust and an optimal solution
- Design parameters do not change linearly with changes in goal weighting
- Plot the response from full weighting of the minimize-inlet-velocity goal to full weighting of the minimize-response-variation goal
- Test for a fixed cabinet power of 2 kW
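The weighting sweep described above can be mimicked on a toy problem: move the weight w from full emphasis on minimizing inlet velocity (w = 1) to full emphasis on minimizing response variation (w = 0), minimize the weighted sum of the two normalized goals, and record the chosen design. Both goal models are hypothetical; the point is only that the chosen design does not move linearly with w, and the recorded goal pairs trace a Pareto-type frontier (a weighted sum recovers only the convex part of a frontier).

```python
def f_velocity(v):
    """Goal 1 (hypothetical): normalized inlet velocity, to be minimized."""
    return v / 2.0


def f_variation(v):
    """Goal 2 (hypothetical): normalized chip-temperature variation,
    shrinking as the cooling flow increases."""
    return (0.25 / v) ** 2


def best_design(w, grid):
    """Minimize w*f1 + (1 - w)*f2 over the candidate inlet velocities."""
    return min(grid, key=lambda v: w * f_velocity(v) + (1.0 - w) * f_variation(v))


grid = [0.25 + 0.005 * i for i in range(351)]  # 0.25 .. 2.00 m/s
for k in range(11):
    w = 1.0 - k / 10  # 1.0 = all weight on minimizing velocity
    v = best_design(w, grid)
    print(f"w={w:.1f}  v={v:.3f}  f1={f_velocity(v):.3f}  f2={f_variation(v):.4f}")
```

For this model the continuous optimum is v = ((1 - w)/(4w))^(1/3), so equal steps in w produce unequal steps in the design, as the slide notes for the real cabinet.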
27. Effects of Robust Solution
- Optimal → Robust: temperature variation
28. Power Loading Configuration
- Optimal → Robust: power profile
29. Robust vs. Optimal Pareto Frontier
- Pareto frontier used to show the bounds of the feasible design space and variable interactions
- Optimal solution
- Robust solution
30. 3D Cabinet Study
- Increasing complexity to a full-scale 3D cabinet simulation
- Experimental mock blade server cabinet modeled and simulated
- Investigation goals:
  - Test PODc modeling for complex 3D flow
  - Compare CFD, POD model, and experimental results
31. Cabinet Geometry
- Model based on experimental cabinet
- Cabinet 2 x 0.6 x 0.8 m
- 7 blade servers, 10 blades per server
- Single chip on each blade
- Alternating blades blank
- Geometry simplified to unit length scale
32. Server Geometry
- Servers: 0.72 x 0.44 x 0.132 m
- Blades: 0.36 x 0.132 x 0.0016 m
- Chip: 32 x 32 mm
- FR4 modeled as an anisotropic material with shell conduction
- 1 oz Cu deposition on the surface of the FR4
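Why the blade boards are modeled as anisotropic: a thin copper layer on FR4 conducts heat well along the board but barely changes conduction through it. A common first estimate treats the layers as parallel thermal paths in-plane and series resistances through-plane. The conductivities below are typical values used as assumptions (the talk itself notes the true FR4 conductivity is unknown without testing); 1 oz copper is taken as about 35 µm thick, and the 1.6 mm board thickness comes from the blade dimensions.

```python
def effective_conductivity(layers):
    """In-plane (parallel) and through-plane (series) conductivity of a layered board.

    layers: list of (thickness_m, k_W_per_mK) tuples."""
    total = sum(t for t, _ in layers)
    k_in_plane = sum(t * k for t, k in layers) / total
    k_through = total / sum(t / k for t, k in layers)
    return k_in_plane, k_through


K_CU = 390.0    # assumed copper conductivity, W/m-K
K_FR4 = 0.3     # assumed (isotropic) FR4 conductivity, W/m-K
T_CU = 35e-6    # approx. thickness of 1 oz copper, m
T_FR4 = 0.0016  # blade board thickness from the server geometry, m

k_xy, k_z = effective_conductivity([(T_CU, K_CU), (T_FR4, K_FR4)])
print(f"in-plane {k_xy:.2f} W/m-K, through-plane {k_z:.3f} W/m-K, ratio {k_xy / k_z:.0f}")
```

Even this crude estimate gives an in-plane conductivity more than an order of magnitude above the through-plane value, which is why an isotropic board model would misplace heat spreading from the chips.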
33. Flow Boundary Conditions
- Velocity Inlet
- Outlet Fan
- Internal Fan
- Servers FR4
34. CFD Flow Results: Cross-Section
[Figure: Vin = 1.625 m/s]
35. CFD Flow Results: Server Profiles
36. CFD Flow Results: Server Flows
37. Simulated Temperature Response
- Maximum chip temperature for all servers and blades
38. PODc Modeling Accuracy
- U covariance matrix assembled from the u, v, w velocity components
- 8 Observations of 0.250.258 m/s Vin
39. Complete Cabinet Simulation
- PODc input into FLUENT as an interpolation file
- Flux matching applied for velocity, k, and ε
- k and ε reconstruction slightly less accurate than velocity, but < 15% error
- FLUENT used to solve the energy equation
- Complete simulation used to find flow and power distribution parameters for maximum reliable cabinet power dissipation
- Tradeoff studies and further investigations performed in thesis
40. Mock Blade Server Cabinet
41. Measurements
- Thermocouples at Tchip, Tin, Tout
- Running linear regression of the last 20 data points → slope < 1e-3 for steady-state measurement
- 100 points taken @ 2 Hz
- Power measured using a precision resistor, at powers of 4, 8, and 12 W
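The steady-state criterion above (a running linear regression over the last 20 readings, with slope magnitude below 1e-3) could be implemented as follows. This is a sketch: the slope units (°C per sample here) and the synthetic readings are assumptions, since the slide does not state the time base; at the 2 Hz sampling rate a per-second slope would be twice the per-sample value.

```python
import math
from collections import deque


def ols_slope(ys):
    """Least-squares slope of ys against sample index 0, 1, ..., n-1."""
    n = len(ys)
    x_mean = (n - 1) / 2.0
    y_mean = sum(ys) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(ys))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def first_steady_index(readings, window=20, tol=1e-3):
    """Index of the first reading at which the running slope over the
    last `window` readings falls below `tol` in magnitude, or None."""
    recent = deque(maxlen=window)
    for idx, value in enumerate(readings):
        recent.append(value)
        if len(recent) == window and abs(ols_slope(list(recent))) < tol:
            return idx
    return None


# Synthetic chip temperature: exponential approach to 60 C (hypothetical)
readings = [60.0 - 35.0 * math.exp(-k / 40.0) for k in range(600)]
start = first_steady_index(readings)
print(start)
```

Once the detector fires, the subsequent readings (100 points at 2 Hz on the slide) would be the ones logged as the steady-state measurement.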
42. Experimental Temperature Response
43. Experimental vs. Simulated Results
Difference (Experimental − Simulated Chip Temperatures)
44. Simulation Accuracy Analysis
- Average temperature difference: 1 °C
- Largest difference is at the lowest server → intricate flow obstructions not modeled
- Blade 10 experimental result is higher, as the model spreads the fans and their placement evenly across the server
- True anisotropic thermal conductivity of FR4 is unknown without expensive testing
- Trends are accurately captured
45. Conclusions
How do we efficiently integrate high powered
equipment into an existing cabinet infrastructure
while maximizing operational stability?
46. Conclusions
- For the typical enclosed cabinet modeled, over 50% more power than baseline can be reliably dissipated through efficient configuration
- Robust solutions account for variability in internal and external operating conditions, as well as a degree of modeling assumptions and inaccuracies
- Server cabinet configuration design can be accomplished without center-level redesign
- PODc flow model is highly accurate even for complex 3D flows
47. Questions?
- Thank you for your attention!
48. Statistical Analysis of Results
[Figure: temperature response at heater powers of 4 W, 8 W, and 12 W]
- Linearity test: R-Sq = 0.998 → 99.8% of the temperature variation is explained by the changing power load on the heaters
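The linearity check reports R-Sq, the coefficient of determination of a straight-line fit of temperature against heater power. A minimal sketch of that calculation; the temperature readings below are made-up placeholders, not the measured data.

```python
def linear_fit_r2(xs, ys):
    """Ordinary least-squares line y = a + b*x and its coefficient of determination R^2."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_mean - b * x_mean
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return a, b, 1.0 - ss_res / ss_tot


powers = [4.0, 8.0, 12.0]   # heater powers used in the experiment, W
temps = [31.2, 38.9, 47.1]  # hypothetical chip temperatures, deg C
intercept, slope, r2 = linear_fit_r2(powers, temps)
print(f"T = {intercept:.2f} + {slope:.3f} * P,  R-Sq = {r2:.4f} ({100 * r2:.1f}% explained)")
```

R-Sq is the fraction of the variance in temperature accounted for by the linear dependence on power, which is why a value of 0.998 reads as "99.8% explained".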
49. Inlet Velocity Tradeoff Study
50. Final Validation
- Comparison of results obtained using robust design and the compact model against FLUENT
51. Obtaining Cabinet Flow Rates
- Top fan rated @ 550 CFM
- Flow hood measured @ 430 CFM
- The standard deviation of flow rates can also be backed out for the modeling and optimization work
52-55. Floor Tile Analysis
56. Current Work
- Currently optimizing the 3D cabinet model
- Using experimental results for accurate estimates of variation