Title: Thermal-Aware Scheduling in Environmentally Coupled Cyber-Physical Distributed Systems
- Qinghui Tang
- Committee
- Dr. Sandeep Gupta
- Dr. Martin Reisslein
- Dr. Loren Schwiebert
- Dr. Cihan Tepedelenlioglu
- Dr. Junshan Zhang
Sponsors
Presentation Outline
- Background and Motivation
- Unified Thermal-Aware Approach
- Applications of Thermal-Aware Scheduling
- Summary of Research Results
- Conclusions
Background and Motivation
- What are Cyber-Physical Systems (CPS)?
- Computing systems tightly coupled with the physical world
- Environmentally coupled cyber-physical distributed systems (ECCPDS)
- Exert interference on the system itself and on the surrounding environment
- Increasing deployment of distributed systems
- Sensor networks
- Pervasive computing
- Grid/cluster computing
- Existing approaches and methodologies do not take into account the interference and interactions among systems and their environment
- Emerging systems require new methodologies and approaches
- Cross-disciplinary, more complex applications
Environmentally Coupled Distributed CPS
- Terminology
- Interference
- The negative impact on the environment
- Self-interference
- Environmental interference
- Cross-interference
- Interference models
- Quantitative model
- Temporal model
- Spatial model
- Comprehensive model
- Individual design approach
- Network/system operation approach
- Task scheduling
Thermal-Aware ECCPDS
- We focus on thermal-related applications because of:
- The correlation between heat dissipation and power consumption (energy efficiency)
- The correlation between temperature change and reliability
- The importance of energy efficiency to system lifetime
- The direct impact on the embedded environment
- Green technology is the new trend
- Energy-efficient and environmentally friendly
Examples of Task Scheduling in Cyber-Physical Systems
- Implanted biomedical sensor networks are used for prosthesis or monitoring
- Sensor nodes work in shifts to accomplish the assigned task
- Task scheduling in the temporal domain
- Server farms inside data centers
- Heat dissipated by one server may heat up other servers
- Task scheduling in the spatial domain
Unified Thermal-Aware Scheduling for ECCPDS (1)
- A cyber-physical system with N nodes interacting with each other
- A scheduler divides the total task C_total into a task vector C = (C_1, …, C_N), resulting in a power consumption vector P = (P_1, …, P_N)
- Each node i
- performs a subset C_i of the total task C_total
- consumes power at a certain rate P_i
- experiences a temperature change T_i that depends on the other nodes' power consumption
- The system objective function W depends on the node temperatures (and the task assignments)
Unified Thermal-Aware Scheduling for ECCPDS (2)
- Problem Statement
- How should the total task C_total be divided into C_1, C_2, …, C_N to minimize (or maximize) the objective function W?
- Generalized Approach
- Step 1: Profile the correlation among task, power consumption, and temperature rise, G_i(·), by estimation, measurement, or profiling
- Step 2: Characterize the thermal interference function F_i(·) and build a fast thermal evaluation method
- Step 3: Formalize the objective function H(·)
- Step 4: Explore the design space to find the best schedule
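The four steps can be combined into a small Python sketch. Every model in it is hypothetical. `g` stands in for G_i(·) with an invented linear power model. `interference` stands in for F_i(·) as a fixed coupling matrix. `objective` plays the role of H(·) as the peak temperature. Step 4 is a brute-force search that only works for toy instances.

```python
from itertools import product


def g(c, idle=1.0, per_unit=2.0):
    """Hypothetical Step 1 model G_i: power drawn by a node given task share c."""
    return idle + per_unit * c


def interference(powers, coupling, base=20.0):
    """Hypothetical Step 2 model F: each node's temperature is a base value plus
    a weighted sum of all nodes' power (self- and cross-interference)."""
    return [base + sum(w * p for w, p in zip(row, powers)) for row in coupling]


def objective(temps):
    """Step 3 objective H: the peak node temperature, to be minimized."""
    return max(temps)


def best_schedule(c_total, coupling):
    """Step 4: try every integer split of c_total and keep the best one."""
    n = len(coupling)
    best = None
    for split in product(range(c_total + 1), repeat=n):
        if sum(split) != c_total:
            continue
        w = objective(interference([g(c) for c in split], coupling))
        if best is None or w < best[0]:
            best = (w, split)
    return best


if __name__ == "__main__":
    # Node 0 heats node 1 strongly (0.8), so the search moves work off node 0.
    print(best_schedule(4, [[0.5, 0.1], [0.8, 0.5]]))
```

In the sample coupling, every unit of work on node 0 also heats node 1. The search therefore places all four units on node 1.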
Related Work
- Previous research on minimizing thermal interference
- focused on the individual design approach instead of the system operation approach
- used numerical methods for thermal evaluation, which are not appropriate for online, real-time scheduling
- failed to consider the cross-interference exerted by neighboring nodes
Dissertation Contributions
- Proposed a unified methodology and analytical technique for analyzing and designing interference-minimized distributed systems
- Can be applied to other forms of interference (e.g., sonic)
- Verified the methodology by applying it to two vastly different thermal applications
- Built an abstract heat model for fast thermal evaluation and power consumption prediction
- Thermal-aware task scheduling for biomedical sensor networks
- IEEE Trans. Biomedical Eng. '07
- DCOSS '05
- Minimizing data center cooling energy cost through thermal-aware task placement
- IEEE TPDS special issue on Power-Aware Parallel and Distributed Computing
- DASC '06, ICISIP '06, Cluster '07, COMSWARE '07
Dissertation Contributions (cont.)
- Thermal-aware task scheduling for biomedical sensor networks
- Modeling thermal interference of implanted biosensors
- Identifying factors that minimize thermal effects
- Time-space function for fast thermal evaluation
- Minimizing data center cooling energy cost through thermal-aware task placement
- Homogeneous data center with a single task
- Thermal-aware algorithm based on interference characterization
- Heterogeneous data center with heterogeneous tasks
- Multiple tasks with different timing information
Application Example: Task Scheduling of Biosensor Networks
Biosensor Scheduling Overview
- Implanted biomedical sensor networks are used for prosthesis or monitoring
- Sensor nodes work in shifts to accomplish the assigned task
- Environmental interference should be minimized
- This is task scheduling in the temporal domain
- Task assignments for multiple time slots
- C_total = 1
- In each slot, only one node performs the task
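Biosensor Scheduling Step 1: Profiling the Correlation
- Profile the correlation between power consumption and temperature rise, G_i(·), with the Pennes bioheat equation:

$$\underbrace{\rho C_p \frac{\partial T}{\partial t}}_{\text{heat accumulated}} = \underbrace{K \nabla^2 T}_{\text{conduction}} - \underbrace{b\,(T - T_b)}_{\text{convection}} + \underbrace{\rho\,\mathrm{SAR}}_{\text{radiation}} + \underbrace{P_c}_{\text{power dissipation}} + \underbrace{Q_m}_{\text{metabolism}}$$

where ρ is the tissue mass density, C_p the specific heat of tissue, K the thermal conductivity of tissue, b the blood perfusion constant, T_b the blood temperature, SAR the specific absorption rate of radiated RF energy, P_c the power dissipated by the sensor circuitry per unit volume, and Q_m the metabolic heat generation.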
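The sketch below shows one way to profile G_i(·) numerically. It solves the equation in one dimension with an explicit finite-difference scheme. The tissue parameters are illustrative assumptions. The metabolic term defaults to zero, so tissue starts at blood temperature. Both ends of the slab are held at blood temperature. A real study would use measured parameters and a 3-D model.

```python
# Illustrative parameters (assumed values, not taken from the slides).
RHO = 1040.0     # tissue density rho, kg/m^3
CP = 3600.0      # tissue specific heat C_p, J/(kg K)
K = 0.5          # tissue thermal conductivity K, W/(m K)
B = 2700.0       # blood perfusion constant b, W/(m^3 K)
T_BLOOD = 37.0   # blood temperature T_b, deg C


def step(temps, sources, dx, dt, q_metabolic=0.0):
    """One explicit Euler step of the 1-D Pennes bioheat equation.

    sources[i] is rho*SAR + P_c at grid point i (W/m^3). The two end points
    are held at blood temperature (a simplifying boundary assumption).
    """
    new = temps[:]
    for i in range(1, len(temps) - 1):
        conduction = K * (temps[i + 1] - 2 * temps[i] + temps[i - 1]) / dx**2
        convection = -B * (temps[i] - T_BLOOD)
        net = conduction + convection + sources[i] + q_metabolic
        new[i] = temps[i] + dt * net / (RHO * CP)
    return new


def simulate(n=21, dx=1e-3, seconds=600.0, power_density=1e5):
    """Temperature profile after `seconds` with one heat source at the centre."""
    # Explicit stability needs dt <= rho*cp / (2K/dx^2 + b); this dt satisfies
    # that bound for the parameters above.
    dt = 0.4 * RHO * CP * dx**2 / K
    temps = [T_BLOOD] * n
    sources = [0.0] * n
    sources[n // 2] = power_density  # hypothetical implanted sensor
    for _ in range(int(seconds / dt)):
        temps = step(temps, sources, dx, dt)
    return temps


if __name__ == "__main__":
    profile = simulate()
    print(f"peak rise: {max(profile) - T_BLOOD:.3f} K")
```

Sweeping `power_density` and recording the peak rise gives a simple empirical power-to-temperature curve for one node.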
Biosensor Scheduling Step 2: Characterizing Thermal Interference F(·)
- Characterize the cross-interference between node i and node j as a function of their spatial distance and temporal distance
Figure: spatial distance and temporal distance among nodes 1 to 4
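A toy version of F(·) makes the idea concrete. The sketch assumes, purely for illustration, that heat from an active node decays exponentially with spatial distance and with the number of slots since the node was active. It also assumes that contributions add linearly. `length_scale` and `time_scale` are hypothetical parameters.

```python
import math


def interference(spatial, temporal, length_scale=1.0, time_scale=2.0):
    """Hypothetical F(.): temperature rise caused by one active slot of a
    node at the given spatial distance, `temporal` slots later."""
    return math.exp(-spatial / length_scale) * math.exp(-temporal / time_scale)


def peak_rise(schedule, positions, heat=1.0):
    """Peak temperature rise over all nodes and slots.

    schedule[t] is the single node active in slot t; positions[i] is node i's
    (x, y) location. Contributions from earlier slots are simply added.
    """
    peak = 0.0
    for t in range(len(schedule)):
        for pj in positions:
            rise = sum(
                heat * interference(math.dist(positions[schedule[s]], pj), t - s)
                for s in range(t + 1)
            )
            peak = max(peak, rise)
    return peak


if __name__ == "__main__":
    nodes = [(0.0, 0.0), (5.0, 0.0)]
    print(peak_rise([0, 0], nodes), peak_rise([0, 1], nodes))
```

Handing the second slot to a distant node yields a lower peak than keeping the same node busy twice in a row. This is the effect a temporal scheduler exploits.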
Biosensor Scheduling Steps 3 and 4: Exploring the Design Space
- The objective function H(·)
- Search for the best scheduling sequence using a genetic algorithm
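The search step can be sketched as a minimal genetic algorithm with tournament selection, one-point crossover, point mutation, and elitism. The objective uses an assumed exponential-decay interference model as a stand-in for H(·). The population size, generation count, and mutation rate are arbitrary choices, not values from the dissertation.

```python
import math
import random


def peak_rise(schedule, positions, length_scale=1.0, time_scale=2.0):
    """Hypothetical H(.): peak temperature rise, assuming heat from each
    active slot decays exponentially in space and time and adds linearly."""
    peak = 0.0
    for t in range(len(schedule)):
        for pj in positions:
            rise = sum(
                math.exp(-math.dist(positions[schedule[s]], pj) / length_scale)
                * math.exp(-(t - s) / time_scale)
                for s in range(t + 1)
            )
            peak = max(peak, rise)
    return peak


def genetic_search(cost, n_nodes, n_slots, pop_size=20, generations=40,
                   mutation_rate=0.1, seed=0):
    """Minimal GA over slot sequences (one active node per slot)."""
    rng = random.Random(seed)
    pop = [[rng.randrange(n_nodes) for _ in range(n_slots)]
           for _ in range(pop_size)]
    best = min(pop, key=cost)

    def tournament():
        a, b = rng.sample(pop, 2)
        return a if cost(a) <= cost(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: always keep the best schedule so far
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_slots) if n_slots > 1 else 0
            child = p1[:cut] + p2[cut:]  # one-point crossover
            for k in range(n_slots):
                if rng.random() < mutation_rate:
                    child[k] = rng.randrange(n_nodes)  # point mutation
            children.append(child)
        pop = children
        best = min(pop, key=cost)
    return best, cost(best)


if __name__ == "__main__":
    line = [(float(x), 0.0) for x in range(4)]  # four nodes, 1 unit apart
    schedule, peak = genetic_search(lambda s: peak_rise(s, line), 4, 8)
    print(schedule, round(peak, 3))
```

Elitism guarantees that the best schedule never gets worse from one generation to the next. A practical scheduler might also cache `cost`, because the GA re-evaluates the same sequences often.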
Application Example: Thermal-Aware Task Scheduling of Data Centers
Problem Statement of Task Scheduling in Data Centers
- Given a total task C, how should it be divided among N server nodes so that the computation finishes with minimal cooling energy cost?
- Self-interference and cross-interference raise the inlet air temperature and should be minimized
- Environmental interference (room temperature) is not critical
- Task scheduling in the spatial domain
Figure: a data center with 4 servers and a total task of 30
Conceptual Overview of Thermal-Aware Task Placement
Server task distribution → power consumption distribution → temperature distribution → energy cost
Data Center Preliminary Layout
Inlet temperature T_in must be less than 25°C
Cold supply temperature T_s
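Data Center Preliminary Scheduling vs. Cooling Cost
Figure: inlet temperature distribution without and with cooling under Scheduling 1 and Scheduling 2 (25°C limit)
Minimizing the peak inlet temperature is equivalent to minimizing the cooling cost. A lower peak lets the cold supply temperature be raised while every inlet stays below 25°C, and a warmer supply temperature lets the cooling system operate more efficiently.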
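A few lines of arithmetic check this equivalence. The sketch assumes the supply temperature is raised until the hottest inlet just reaches 25°C. It also assumes a quadratic coefficient-of-performance (COP) curve for the CRAC unit. The COP coefficients and the inlet-rise numbers are illustrative assumptions, not data from the slides.

```python
RED_LINE = 25.0  # maximum allowed inlet temperature in deg C (from the slides)


def cop(t_sup):
    """Assumed CRAC coefficient of performance vs. supply temperature (deg C)."""
    return 0.0068 * t_sup**2 + 0.0008 * t_sup + 0.458


def cooling_power(inlet_rises, total_power):
    """Cooling power (W) with the supply air set as warm as the limit allows.

    inlet_rises[i] is how far node i's inlet sits above the supply temperature
    because of recirculation; the hottest inlet ends up exactly at RED_LINE.
    """
    t_sup = RED_LINE - max(inlet_rises)
    return total_power / cop(t_sup)


if __name__ == "__main__":
    # Two hypothetical schedules drawing the same 10 kW but with different peaks.
    for name, rises in [("schedule 1", [1.0, 4.0, 2.0]),
                        ("schedule 2", [2.0, 2.5, 2.0])]:
        print(name, round(cooling_power(rises, 10_000.0)), "W")
```

Both schedules remove the same heat. Schedule 2 has the smaller peak rise, so its supply air can be warmer and the assumed COP is higher, which means less cooling power.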
Data Center Step 1: Profiling the Correlation G_i(·)
- Server power consumption P_i depends on the amount of computing task
- Outlet airflow
- Inlet airflow: a mixture of supplied cold air and recirculated hot air
Data Center Step 2: Characterizing Cross-Interference F(·)
- Heat recirculation coefficients
- Analytical
- Matrix-based
- Characterizing process
- Run CFD simulations under various power consumption scenarios
- Calculate recirculation coefficients based on the laws of conservation of mass and energy
- Use the coefficients to predict temperatures without running CFD

$$\mathbf{T}_{in} = \mathbf{T}_{sup} + D\,\mathbf{P}$$

where T_in is the vector of inlet temperatures, T_sup the vector of supplied air temperatures, P the power vector, and D the heat distribution matrix.
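T_in = T_sup + D P is linear in P. One simple way to illustrate characterization is therefore to perturb each node's power in turn and read off one column of D per run. This one-at-a-time scheme is a sketch of the idea, not necessarily the dissertation's exact procedure. `fake_cfd` is a hypothetical stand-in for a CFD simulation.

```python
def predict(d, t_sup, power):
    """Fast thermal evaluation: T_in = T_sup + D P, no CFD run needed."""
    n = len(power)
    return [t_sup[i] + sum(d[i][j] * power[j] for j in range(n)) for i in range(n)]


def characterize(simulate, base_power, delta=100.0):
    """Estimate D one column at a time from len(base_power) + 1 simulations.

    Since T_in is linear in P, raising node j's power by `delta` shifts
    inlet i by D[i][j] * delta.
    """
    n = len(base_power)
    base = simulate(base_power)
    d = [[0.0] * n for _ in range(n)]
    for j in range(n):
        bumped = list(base_power)
        bumped[j] += delta
        temps = simulate(bumped)
        for i in range(n):
            d[i][j] = (temps[i] - base[i]) / delta
    return d


# Hypothetical stand-in for a CFD run: a hidden linear model (toy numbers).
HIDDEN_D = [[0.001, 0.0004], [0.0008, 0.0012]]


def fake_cfd(power):
    return predict(HIDDEN_D, [18.0, 18.0], power)


if __name__ == "__main__":
    d = characterize(fake_cfd, [500.0, 500.0])
    print(predict(d, [18.0, 18.0], [300.0, 900.0]))
```

With a real CFD tool, each `simulate` call would take hours. After N + 1 runs, every later prediction is a matrix-vector product.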
Benefit: Fast Thermal Evaluation
- CFD approach: give workload → run CFD simulation (days) → extract temperatures (courtesy Flomerics)
- Fast thermal evaluation: give workload → compute T_in = T_sup + D P (seconds) → yields temperatures
Data Center Step 4: Explore Solutions
- Homogeneous data center with a single task
- Naïve algorithms that do not consider cross-interference
- Thermal-aware algorithm based on interference characterization
- Heterogeneous data center with heterogeneous tasks
- Multiple tasks with different timing information
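Even a tiny example shows the gap between naïve and thermal-aware placement. In this sketch, the naïve algorithm spreads task units evenly. The thermal-aware one greedily adds each unit where it raises the peak inlet temperature least. The greedy heuristic, the linear power model (`idle`, `per_unit`), and the matrix D are illustrative assumptions, not the dissertation's algorithms or data.

```python
def to_power(tasks, idle=100.0, per_unit=50.0):
    """Hypothetical linear server power model: P_i = idle + per_unit * C_i (W)."""
    return [idle + per_unit * c for c in tasks]


def peak_rise(d, power):
    """Peak inlet temperature rise max_i (D P)_i above the supply temperature."""
    return max(sum(dij * pj for dij, pj in zip(row, power)) for row in d)


def uniform_placement(total, n):
    """Naive placement that ignores cross-interference: spread units evenly."""
    base, extra = divmod(total, n)
    return [base + (1 if i < extra else 0) for i in range(n)]


def greedy_placement(d, total, capacity):
    """Thermal-aware greedy placement: add one task unit at a time to the node
    whose choice gives the lowest peak inlet temperature rise.

    Raises ValueError if total exceeds len(d) * capacity.
    """
    tasks = [0] * len(d)

    def peak_if(i):
        trial = tasks[:]
        trial[i] += 1
        return peak_rise(d, to_power(trial))

    for _ in range(total):
        candidates = [i for i in range(len(d)) if tasks[i] < capacity]
        best = min(candidates, key=peak_if)
        tasks[best] += 1
    return tasks


if __name__ == "__main__":
    # Node 2's exhaust recirculates into nodes 0 and 1 (toy D, deg C per W).
    D = [[0.01, 0.0, 0.02], [0.0, 0.01, 0.02], [0.0, 0.0, 0.01]]
    for name, tasks in [("naive", uniform_placement(6, 3)),
                        ("thermal-aware", greedy_placement(D, 6, capacity=4))]:
        print(name, tasks, round(peak_rise(D, to_power(tasks)), 2))
```

The greedy choice steers work away from the node whose exhaust feeds the other inlets. That is the intuition behind placement based on interference characterization.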
Recirculation Coefficients
- Consistent with data center observations
- Large values are observed along the diagonal
- Strong recirculation among neighboring servers, or between bottom servers and top servers
Figure: recirculation coefficients α_ij by source node i and victim node j (nodes 1 to 50)
Fast Thermal Evaluation Results
- Thermal Evaluation
- Fast thermal evaluation
- Acceptable prediction error, smaller than normal temperature fluctuation
- Energy Efficiency
- Consistently provides optimal or near-optimal energy efficiency
- Energy savings of 5% to 30%, depending on the utilization rate
Heterogeneous Data Center with Heterogeneous Tasks
Figure: data center with 4 servers
- Solution changes from a vector to a matrix
- Constraints change
Multiple Tasks with Different Timing Parameters
Figure: data center with 4 servers; tasks 35, 30
- Objective function changes
- Constraints change
Conclusion and Future Work
- Increasingly tightly coupled cyber-physical systems require new methodologies for new applications
- Proposed approach
- Characterizes complicated interference between systems and the embedded environment
- Minimizes thermal effects
- Supports real-time online decisions
- Future work in biosensor networks
- Thermal-aware scheduling for multiple clusters
- Cross-cluster interference
- Applying interference minimization to coverage and topology applications
Conclusion and Future Work (cont.)
- Future work in data center management
- Overall data center operation cost
- Trade-off between cooling cost and computing cost
- Hardware reliability model: trade-off between energy cost and hardware cost
- Multiple tasks with different priorities and deadlines
- Estimation of execution time
- Other challenges in environmentally coupled cyber-physical systems
- Online characterization
- Without interrupting normal operation
- For cases where testing and verification are impossible
- Unknown environment
- Investigate applying the methodology to non-thermal interference
- Chemical sensors to monitor enzyme reactions
- Minimizing the chance of being detected in a hostile environment
- Different approaches to modeling interference
- For cases where interference cannot be measured directly
Questions?
Backup Slides
Review of the Two Studied Cyber-Physical Applications
Experience Obtained
Figure: research cycle with the stages problem investigation, identifying the interference source and impact, modeling interference, characterizing interference, formalizing the problem, exploring solutions, verifying the solution with performance comparison, and relaxing assumptions
- Cross-disciplinary research problems
- Challenging and promising
- Incremental research approach
- Extensive survey to identify existing problems and gaps in existing solutions
- Start with a simplified system model, then gradually relax system assumptions to obtain a more realistic one
System Model
Figure: heat exchange between nodes; interference causes an undesired temperature rise
System performance depends on the thermal distribution
Characterizing the Interference Function F(·)
- Characterize the interference exerted on neighboring nodes and the environment
- Build a heat model to characterize
- Power consumption of each node
- Heat dissipation of each node
- Thermal interference to other nodes
- Conduct fast thermal evaluation
- Replace traditional numerical methods to predict thermal performance in real time
Figure: a task scheduling result is evaluated by numerical simulation or by fast thermal evaluation to produce a temperature prediction
Application Background
- What are data centers?
- Server farms, IT centers, computer rooms
- Why are they important?
- Centralized management and powerful computation capabilities
- Backbones of the Internet infrastructure
- Why is thermal management important?
- Improve reliability
- Reduce system downtime
- Save energy cost!
- $400,000 annually to power a data center of 1,000 volume servers; how much for a facility like this one?
- More than 40% of that is cooling cost
Data Center Step 2: Characterizing Cross-Interference F(·)
Figure: heat flow through a server. The heat in the inlet air consists of cold supply air and recirculated heat; power consumption adds heat; of the heat in the outlet air, some recirculates to other inlets and some returns to the AC.
- Recirculation coefficients
- Quantified description of recirculation
- Characterizing process
- Run CFD simulations under various power consumption scenarios
- Calculate recirculation coefficients based on the laws of conservation of mass and energy
- Use the coefficients to predict temperatures without running CFD
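Data Center Step 2: Fast Thermal Evaluation
- Based on the law of conservation of energy, after some mathematical derivation, we have

$$\mathbf{T}_{in} = \mathbf{T}_{sup} + \left[(K - A^{T}K)^{-1} - K^{-1}\right]\mathbf{P}$$

The symbols are defined as follows:
- P: the vector of power consumptions
- T_sup: the supplied cold air temperatures
- T_in: the inlet temperatures
- A = [α_ij]: the matrix of recirculation coefficients
- K = diag(K_1, …, K_N): constants that depend on hardware specifications and constant properties of air. Here K_i = ρ f_i c_p, with air density ρ, flow rate f_i through node i, and specific heat c_p of air.

The bracketed matrix is the heat distribution matrix D.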
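The formula can be evaluated without forming an explicit inverse. First solve (K − A^T K) x = P for the outlet temperature rise x, then subtract K^{-1} P. The derivation rests on the per-node energy balance K_i T_out,i = K_i T_in,i + P_i. The inlet heat of each node is the recirculated outlet heat of the other nodes plus the supply air that makes up the rest of its intake. The sketch below uses plain Gaussian elimination, and its argument names are our own.

```python
def solve(m, b):
    """Solve m x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    a = [row[:] + [b[i]] for i, row in enumerate(m)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def inlet_temperatures(k, alpha, t_sup, power):
    """T_in = T_sup + [(K - A^T K)^-1 - K^-1] P.

    k[i]        = rho * f_i * c_p for node i (W/K), the diagonal of K
    alpha[i][j] = fraction of node i's outlet heat recirculated to node j
    """
    n = len(k)
    # (K - A^T K)[i][j] = K_i * delta_ij - alpha[j][i] * K_j
    m = [[(k[i] if i == j else 0.0) - alpha[j][i] * k[j] for j in range(n)]
         for i in range(n)]
    x = solve(m, power)  # x = T_out - T_sup
    return [t_sup[i] + x[i] - power[i] / k[i] for i in range(n)]


if __name__ == "__main__":
    k = [100.0, 100.0]                # toy K_i values, W/K
    alpha = [[0.0, 0.1], [0.2, 0.0]]  # toy recirculation coefficients
    print(inlet_temperatures(k, alpha, [15.0, 15.0], [1000.0, 2000.0]))
```

With all α_ij equal to zero, the bracketed matrix vanishes and every inlet sits at the supply temperature. This is a quick sanity check of the formula.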
Data Center Step 3: Formalizing the Minimization Problem H(·)
- Minimize the maximal inlet temperature
- Can be converted into linear or nonlinear optimization problems
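Assume a linear power model P_i = P_idle + w C_i, where P_idle, w, and the per-node capacity C_max are hypothetical parameters. Under that assumption, the min-max problem becomes a linear program once an auxiliary variable t is introduced for the peak inlet temperature:

$$
\begin{aligned}
\min_{C_1,\dots,C_N,\;t}\quad & t\\
\text{subject to}\quad & T_{sup,i} + \textstyle\sum_{j=1}^{N} D_{ij}\,(P_{idle} + w\,C_j) \le t, \quad i = 1,\dots,N,\\
& \textstyle\sum_{i=1}^{N} C_i = C_{total},\\
& 0 \le C_i \le C_{max}, \quad i = 1,\dots,N.
\end{aligned}
$$

Requiring integer task units turns this into an integer program. A nonlinear power model gives the nonlinear case.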
Airflow Inside Data Centers
- Observation
- Airflow patterns are stable (confirmed through CFD simulations)
- Hypothesis
- The amount of recirculated heat is stable and can be quantified as recirculation coefficients
- Define α_ij as the fraction of node i's outlet heat that recirculates to node j's inlet
Courtesy Flomerics