Title: Power Management in Real-Time Systems
1. Power Management in Real-Time Systems
Collaborators: Daniel Mosse, Bruce Childers. PhD students: Hakan Aydin, Dakai Zhu, Cosmin Rusu, Nevine AbouGhazaleh, Ruibin Xu.
2. Power Management
- Why?
  - Battery-operated devices: laptops, PDAs, and cell phones
  - Heat dissipation in complex servers (multiprocessors)
  - Power-aware: maintain QoS while reducing energy
- How?
  - Power off unused parts (e.g., LCD and disk on a laptop)
  - Gracefully reduce the performance
- CPU dynamic power: Pd = Cef · Vdd² · f (see the sketch below)
  - Cef: effective switching capacitance
  - Vdd: supply voltage
  - f: processor frequency, roughly linearly related to Vdd
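A minimal sketch of what this model implies for energy, assuming Vdd scales linearly with f as stated above; the constants and workload values are illustrative, not from the slides:

```python
# Sketch of the CPU dynamic power model P_d = C_ef * Vdd^2 * f.
# Since f is roughly proportional to Vdd, P_d scales ~ f^3, so running
# a task slower (and longer) still wins on energy.

def dynamic_power(f, c_ef=1.0, k=1.0):
    """Power at frequency f, with Vdd ~ k * f (linear relation)."""
    vdd = k * f
    return c_ef * vdd**2 * f          # ~ f^3

def task_energy(cycles, f):
    """Energy = power * execution time, with time = cycles / f."""
    return dynamic_power(f) * (cycles / f)   # ~ cycles * f^2

if __name__ == "__main__":
    cycles = 1e9
    for f in (1.0, 0.75, 0.5):        # normalized frequencies
        print(f"f={f:.2f}: energy={task_energy(cycles, f):.2e}")
    # Halving f quarters the energy (at twice the execution time).
```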
3. Power-Aware Scheduling
- Static Power Management (SPM)
  - Static slack: uniformly slow down all tasks (see the sketch after the figure)
  - Gets more interesting for multiprocessors
[Figure: tasks T1 and T2 execute at fmax and finish before the deadline D, leaving the static slack as idle time.]
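A sketch of SPM under these assumptions (a simple frame-based task set, speeds normalized to fmax); the function name and model are illustrative:

```python
# Static power management (SPM): with WCETs known offline, uniformly
# slow all tasks so the schedule stretches exactly to the deadline D,
# converting the static slack into energy savings.

def spm_speed(wcets, deadline, f_max=1.0):
    """Uniform speed that consumes all static slack (normalized to f_max)."""
    demand = sum(wcets)               # time needed at f_max
    if demand > deadline:
        raise ValueError("not schedulable even at f_max")
    return f_max * demand / deadline  # slower speed, no idle time

print(spm_speed([2.0, 3.0], deadline=10.0))   # -> 0.5
```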
4. Dynamic Power Management (DPM)
- Dynamic slack: tasks often complete well under their WCET (average execution can be around 10% of it)
- Utilize the slack to slow down future tasks (proportional, greedy, aggressive, etc.), as sketched below
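A sketch of two of these slack-reclamation policies; the single-processor, frame-based setting and the exact sharing rule are illustrative assumptions:

```python
# When a task finishes early, the unused WCET (dynamic slack) is given
# to future tasks: greedy hands it all to the next ready task, while
# proportional spreads it over all remaining tasks.

def greedy(next_wcet, slack):
    """Next task gets all the slack: speed drops to wcet/(wcet+slack)."""
    return next_wcet / (next_wcet + slack)

def proportional(remaining_wcets, slack):
    """Each remaining task i gets a slack share of slack * wcet_i / total."""
    total = sum(remaining_wcets)
    return [w / (w + slack * w / total) for w in remaining_wcets]

print(greedy(2.0, 1.0))               # next task slows to 2/3 speed
print(proportional([2.0, 2.0], 1.0))  # both remaining tasks slow to 0.8
```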
5. Stochastic Power Management
[Figure: slack and the speed factor β1.]
6. Computing βi in Reverse Order
[Figure: tasks T1 through T4; the speed factors βi are computed from the last task backward (sketched below).]
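The slides do not give the exact recurrence, so the sketch below is an assumed variant: suffix sums of average-case and worst-case remaining work are built in reverse order, and each task runs at the slower "expected" speed while never dropping below what keeps the worst case feasible:

```python
# Speed factors beta_i computed in reverse order over a frame [0, deadline].
# Assumes the task set is feasible at f_max.

def make_beta(avg, wcet, deadline, f_max=1.0):
    n = len(avg)
    avg_rem = [0.0] * (n + 1)   # expected remaining work from task i on
    wc_rem = [0.0] * (n + 1)    # worst-case remaining work from task i on
    for i in range(n - 1, -1, -1):                  # reverse order
        avg_rem[i] = avg[i] + avg_rem[i + 1]
        wc_rem[i] = wcet[i] + wc_rem[i + 1]

    def beta(i, now):
        """Speed for task i when it starts at time `now`."""
        left = deadline - now
        s_opt = avg_rem[i] / left                         # stretch expected work
        s_min = wcet[i] / (left - wc_rem[i + 1] / f_max)  # keep WCET feasible
        return min(f_max, max(s_opt, s_min))

    return beta

beta = make_beta(avg=[1, 1, 1, 1], wcet=[2, 2, 2, 2], deadline=10)
print(beta(0, 0.0))   # first task's speed factor -> 0.5
```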
7. Dynamic Speed Adjustment Techniques for Non-Linear Code
[Figure: control-flow graph with PMPs (power-management points) and branches taken with probabilities p1, p2, p3; the min, average, and max remaining execution times are computed at a PMP.]
- The remaining WCET is based on the longest path
- The remaining average-case execution time is based on the branching probabilities (from trace information), as sketched below
8. Who Should Manage?
- Compiler (knows the future better): static analysis
- OS (knows the past better): run-time information
9. Maximizing System Utility
(as opposed to minimizing energy consumption)
- Energy constraints
- Time constraints (deadlines or rates)
- System utility (reward)
  - Increased reward with increased execution
- Determine the appropriate versions to execute
- Determine the most rewarding subset of tasks to execute
10. Many Problem Formulations
- Continuous frequencies, continuous reward functions
- Discrete operating frequencies, no reward for partial execution
- Version programming as an alternative to the IRIS (IC) QoS model
- Optimal solutions
- Heuristics
- Example: for homogeneous power functions, reward is maximized when power is allocated equally to all tasks.
[Flowchart: add a task; if a constraint is violated, repair the schedule (see the sketch below); otherwise keep adding.]
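A sketch of the add-then-repair loop in the flowchart; the reward-density ordering and the drop-lowest-reward repair rule are assumptions, not from the slides:

```python
# Greedy heuristic: add tasks in order of reward density; whenever the
# energy or time budget is violated, repair by dropping the task with
# the lowest reward.

def build_schedule(tasks, energy_budget, time_budget):
    """tasks: list of (reward, energy, time) tuples."""
    chosen = []
    for t in sorted(tasks, key=lambda t: t[0] / (t[1] + t[2]), reverse=True):
        chosen.append(t)
        while (sum(e for _, e, _ in chosen) > energy_budget or
               sum(d for _, _, d in chosen) > time_budget):
            chosen.remove(min(chosen, key=lambda t: t[0]))  # repair step
    return chosen

tasks = [(10, 3, 2), (6, 1, 1), (4, 2, 2)]
print(build_schedule(tasks, energy_budget=4, time_budget=4))
```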
11. Rechargeable Systems
(additional constraints on energy and power)
[Figure: available power over time is split between consumption and recharging the battery; power intervals split and merge while the system stays schedulable.]
Example:
- Solar panel (needs light)
- Tasks are continuously executed
- Keep the battery level above a threshold at all times (see the sketch below)
- Frame-based system
- Three dynamic policies (greedy, speculative, and proportional)
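A sketch of a frame-based policy that keeps the battery above the threshold; the linear battery model, the speed set, and the power function are illustrative assumptions:

```python
# Before each frame, pick the highest speed whose energy draw keeps the
# projected battery level above the threshold; harvested (solar) energy
# is credited as it arrives.

def pick_speed(battery, harvest, threshold, work, speeds, power):
    """Return the fastest speed that keeps the battery above threshold."""
    for f in sorted(speeds, reverse=True):
        energy = power(f) * (work / f)            # draw over the frame
        if battery + harvest - energy >= threshold:
            return f
    return min(speeds)                            # degrade gracefully

battery, threshold = 5.0, 4.0
for harvest in (3.0, 0.5, 0.0):                   # per-frame solar input
    f = pick_speed(battery, harvest, threshold, work=1.0,
                   speeds=[0.5, 0.75, 1.0], power=lambda f: 2 * f**3)
    battery += harvest - 2 * f**3 * (1.0 / f)
    print(f"speed={f}, battery={battery:.2f}")    # slows when light fades
```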
12. Multiprocessing Systems
13. Scheduling Policy
- Partition tasks to processors
  - Each processor applies PM individually
  - Distributed
- Global management
  - Shared memory
14. Dynamic Power Management
- Greedy
  - Any available slack is given to the next ready task
  - Feasible for single-processor systems
  - Fails for multiprocessor systems (slack reclaimed on one processor can delay tasks that other processors wait on)
15. Streaming Applications
- Streaming applications are prevalent
  - Audio, video, real-time tasks, cognitive applications
- Executing on
  - Servers, embedded systems
  - Multiprocessors and processor clusters
  - Chip multiprocessors: TRIPS, RAW, etc.
- Constraints
  - Interarrival time (T)
  - End-to-end delay (D)
- Two possible strategies
  - Master-slave
  - Pipelining
[Figure: a stream of frames with interarrival time T and end-to-end delay D.]
16. Master-Slave Strategy
- Single streaming application
  - The optimal number, n, of active PEs strikes a balance between static and dynamic power (see the sketch below)
  - Given n, the speed of each PE is chosen to minimize energy consumption
- Multiple streaming applications
  - Determine the optimal number of active PEs
  - Given the number of active PEs:
    - First assign streams to groups of PEs (e.g., balance the load using the minimum-span algorithm)
    - Adjust the speed of each PE to minimize energy
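A sketch of that static-vs-dynamic balance for a single stream; the power model (per-PE static power plus cubic dynamic power) and all constants are illustrative assumptions:

```python
# With workload W per period T spread over n slaves, each PE can run at
# speed W/(n*T): more PEs lower the (cubic) dynamic energy, but each one
# adds static power, so an intermediate n minimizes total energy.

def energy(n, W, T, p_static=0.1, c=1.0, f_max=1.0):
    f = W / (n * T)                       # per-PE speed
    if f > f_max:
        return float("inf")               # infeasible with only n PEs
    return n * (p_static + c * f**3) * T  # static + dynamic, one period

W, T = 2.0, 1.0
best = min(range(1, 17), key=lambda n: energy(n, W, T))
print(best, energy(best, W, T))           # optimum is neither 2 nor 16
```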
17. Pipeline Strategy
- (1) Linear pipeline (# of stages = # of PEs)
18. Pipeline Strategy (cont.)
- (2) Linear pipeline (# of stages = # of PEs)
  - Solution 1 (optimal): discretize time and use dynamic programming (sketched below)
  - Solution 2: use heuristics
- (3) Nonlinear pipeline (# of stages = # of PEs)
  - Formulate an optimization problem with multiple sets of constraints, each corresponding to a linear pipeline
  - Problem: the number of constraints can be exponential
  - Solution: add variables denoting the finishing time of each stage
- (4) # of stages > # of PEs
  - Apply partitioning/mapping first and then do power management
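A sketch of Solution 1 for the linear pipeline: discretize the end-to-end delay into ticks and run a dynamic program over the stages; the energy model and the per-stage cap derived from T are assumptions:

```python
# Split the end-to-end delay D into Q ticks; each stage gets a whole
# number of ticks, capped by the period T (so throughput is sustained).
# The DP minimizes total energy, modeled as E_i = w_i * (w_i/t_i)^2.

def pipeline_dp(work, D, T, Q=100):
    tick = D / Q
    cap = int(T / tick)                           # per-stage cap from T
    INF = float("inf")
    best = [0.0] + [INF] * Q                      # best[q]: energy using q ticks
    for w in work:                                # add one stage at a time
        new = [INF] * (Q + 1)
        for q in range(1, Q + 1):
            for t in range(1, min(cap, q) + 1):   # ticks for this stage
                e = best[q - t] + w * (w / (t * tick)) ** 2
                new[q] = min(new[q], e)
        best = new
    return min(best)                              # min energy within delay D

print(pipeline_dp(work=[1.0, 2.0, 1.5], D=6.0, T=2.5))
```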
19. Scheduling onto a 2-D Processor Array (CMP)
[Figure: a task graph with nodes A through J mapped onto the processor array.]
- Step 1: topological-sort-based morphing
- Step 2: a dynamic-programming approach to find the optimal # of stages and the optimal # of processors for each stage
20. Tradeoff: Energy vs. Dependability
21. Time Slack (unused processor capacity)
Time slack can be used in three ways:
- Use it to reduce speed → power management (note the effect of DVS on reliability)
- Use it for redundancy (space or time) → fault tolerance
- Use it to do more work → increased productivity
22. Exploring Time Redundancy
- The slack is used to (1) add checkpoints, (2) reserve recovery time, and (3) reduce the processing speed.
- For a given slack and checkpoint overhead, we can find the number and placement of checkpoints that minimize energy consumption while guaranteeing recovery and timeliness (see the sketch below).
[Plot: energy vs. # of checkpoints.]
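A sketch of the tradeoff behind that plot, assuming n equally spaced checkpoints of overhead r each and a recovery reserve of one segment (w/n); under that model the overhead-plus-reserve term n*r + w/n is minimized near n = sqrt(w/r):

```python
# More checkpoints shrink the re-execution reserve (w/n) but add
# overhead (n*r); the slack left over lets the speed (and energy) drop.

from math import sqrt

def speed(n, w, D, r):
    """Lowest feasible speed with n checkpoints (inf if infeasible)."""
    usable = D - n * r - w / n        # time left after overhead + reserve
    return w / usable if usable > 0 else float("inf")

w, D, r = 4.0, 8.0, 0.25              # work, deadline, checkpoint overhead
n_star = round(sqrt(w / r))           # minimizes n*r + w/n  -> 4
for n in (1, 2, n_star, 8):
    print(n, f"{speed(n, w, D, r):.3f}")   # speed (hence energy) dips at n*
```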
23. TMR vs. Duplex
[Plot: energy-efficient operating regions in the (r, p) plane, where r is the checkpoint overhead and p is the ratio of static to dynamic power, shown for loads 0.5, 0.6, and 0.7 (region boundaries near r = 0.02-0.035 and p = 0.1-0.2); TMR is more energy-efficient in one region, Duplex in the other.]
- Identified energy-efficient operating regions for TMR and Duplex.
24. Effect of DVS on SEU Rate
- Lower voltages → higher fault rate
- Lower speed → less slack for recovery
- The reliability requirement, the fault model, and the available slack together determine the acceptable level of DVS.
25. Near-Memory Caching for Improved Energy Efficiency
26. Near-CPU vs. Near-Memory Caches
[Figure: memory hierarchy with a cache near the CPU and a cache near main memory.]
- Caching masks memory delays
- Where should the cache go: near the CPU or near the memory?
- Which is more power- and performance-efficient?
- Thesis: the allocation between the two must be balanced for better delay and energy.
27. Near-Memory Caching: Cached DRAM (CDRAM)
- On-memory SRAM cache
  - Accessing the fast SRAM cache → improves performance
  - High internal bandwidth → large block sizes can be used
  - Improves performance but consumes more energy
- Same configuration as in Hsu et al., 2003
28. Power-Aware CDRAM
- Power management in near-memory caches
  - Use distributed near-memory caches
  - Choose an adequate cache configuration to reduce miss rate and energy per access
- Power management in the DRAM core
  - Use a moderately sized SRAM cache
  - Turn the DRAM core to a low-power state
  - Use immediate shutdown
- Near-memory versus DRAM energy
  - Tradeoff: cache block size
29. Wireless Networks
Collaborators: Daniel Mosse. PhD student: Sameh Gobrial.
30. Saving Power
- Transmit power is proportional to the square of the distance
- The closer the nodes, the less power is needed
- Power-Aware Routing (PARO) identifies new nodes between other nodes and re-routes packets through them to save energy (worked example below)
- Nodes decide to reduce/increase their transmit power
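A worked example of the saving, assuming the stated distance-squared power model with a relay at the midpoint; constants are illustrative:

```python
# PARO-style relaying: if transmit power grows as d^2, one relay halfway
# between sender and receiver halves the total transmit power.

def tx_power(d, k=1.0):
    return k * d * d                             # power ~ distance^2

d = 10.0
direct = tx_power(d)                             # C -> A directly
relayed = tx_power(d / 2) + tx_power(d / 2)      # C -> B -> A via midpoint
print(direct, relayed)                           # 100.0 vs 50.0
```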
31. Asymmetry in Transmit Power
- Instead of C sending directly to A, traffic can go through B
- This saves transmit power, but may cause some problems.
32. Problems Due to One-Way Links
- The collision-avoidance (RTS/CTS) scheme is impaired
  - Even across bidirectional links!
- Unreliable transmissions through one-way links
  - May need multi-hop ACKs at the data-link layer
- Link outage can be discovered only at downstream nodes
[Figure: nodes A, B, and C connected by a unidirectional link.]
33. Problems for Routing Protocols
- Route discovery mechanism
  - Cannot reply using the inverse path of the route request
  - Need to identify unidirectional links (AODV)
- Route maintenance
  - Needs an explicit neighbor-discovery mechanism
- Connectivity of the network
  - Gets worse (partitions!) if only bidirectional links are used
34. Wireless Bandwidth and Power Savings
- In addition to transmit power, what else can we do to save energy?
- Power has a direct relation to the signal-to-noise ratio (SNR)
  - The higher the power, the stronger the signal, the fewer the errors, and the more data a node can transmit
  - Increasing the power allows for higher bandwidth
- Turn transceivers off when not in use; this creates problems when a node needs to relay messages for other nodes
35. Using Optical Interconnections in Supercomputers
Collaborators: Alex Jones. PhD students: Ding Zhu, Dan Li (now doing AI), Shuyi Shao.
36. Motivation for Using Optical Circuit Switching (OCS) in Supercomputers
- Many HPCS applications have only a small degree (6-10) of high-bandwidth communication among processes/threads
  - The rest of a thread's or process's communication traffic consists of low-bandwidth exceptions
- Many HPCS applications have persistent communication patterns
  - Fixed over the program's entire run time, or slowly changing
  - But there are badly behaved applications, or phases within applications, that are chaotic (GUPS!)
- Optics is good for high bandwidth but bad for fast switching; electronics is the other way around, and is good for processing (collectives)
  - The two networks are needed to complement each other
37. The OCS Network Fabric
- Two networks complement each other:
  - Circuit-switched, all-optical fat trees built from 512x512 MEMS-based optical switches (one of multiple fat-tree networks)
  - An intelligent network (1/10 or less of the bandwidth), including collective communication
- A storage/IO network
[Figure: PERCS D-blocks connected through the OCS by multiple parallel fat-tree networks.]
38. Communication Pattern: AMR CTH (node 48)
[Figure: communication phases for node 48; the pattern changes in phases lasting tens of seconds, e.g., a 250-second phase.]
39. UMT2K: Fixed, Irregular Communication Pattern
[Figure: communication matrix and percentage of traffic by bandwidth. The maximum communication degree from each node to any other node is about 10; the pattern is irregular but fixed.]
40. Handling HPCS Applications in OCS
Communication is handled according to its predictability and temporal locality:
- Statically analyzable at compile time → compiled communication
- Predictable at run time → run-time predictor
- Unpredictable → multiple hops through the OCS, or the intelligent network
NOTE: No changes to the application's code; the OCS is set up by the compiler, run-time auto-prediction, and multi-hop routing.
41. Paradigm of Compiled Communication
[Diagram: the compiler takes an MPI application and emits (a) MPI trace code, run on HPC systems to collect traces that yield communication patterns, and (b) optimized MPI code enhanced with network-configuration instructions; the enhanced MPI code and network configurations drive compiled communication and the run-time predictor on HPC systems (or a simulator), producing performance statistics.]
42. Compilation Framework
- Compiler
  - Recognize and represent communication patterns
- Communication compiling
  - Enhance applications with network-configuration instructions
- Automate trace generation
43. Communication Pattern
- Communication classification
  - Static
  - Persistent
  - Dynamic
- Executions of parallel applications exhibit phases (communication phases)
44. The Communication Predictor
- Initially, set up the OCS for random traffic
- Keep track of connection utilization
- A migration policy to create circuits in the OCS
  - A simple threshold policy (sketched below)
  - An intelligent pattern predictor
- An evacuation policy to remove circuits from the OCS
  - LRU replacement
  - A compiler-inserted directive
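A sketch combining the simple threshold migration policy with LRU evacuation; the class, counters, and threshold value are illustrative assumptions:

```python
# Per-destination traffic counters earn a circuit once they cross a
# threshold (migration); when all circuits are in use, the least
# recently used one is evacuated.

from collections import OrderedDict

class CircuitPredictor:
    def __init__(self, capacity, threshold):
        self.capacity = capacity          # circuits available
        self.threshold = threshold        # traffic needed to earn a circuit
        self.counts = {}                  # per-destination traffic counters
        self.circuits = OrderedDict()     # dst -> circuit, in LRU order

    def on_message(self, dst, size):
        if dst in self.circuits:
            self.circuits.move_to_end(dst)        # refresh LRU position
            return "circuit"
        self.counts[dst] = self.counts.get(dst, 0) + size
        if self.counts[dst] >= self.threshold:    # migration policy
            if len(self.circuits) >= self.capacity:
                self.circuits.popitem(last=False) # evacuate LRU circuit
            self.circuits[dst] = True
            self.counts[dst] = 0
            return "circuit-setup"
        return "packet-network"                   # low-BW traffic stays off OCS

p = CircuitPredictor(capacity=2, threshold=3)
for dst in ["n1", "n1", "n1", "n2", "n1"]:
    print(dst, p.on_message(dst, size=1))
```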
45. Dealing with Unpredictable Communications
- Set up the OCS planes so that any D-block can reach any other D-block with at most two hops through the network.
- Example: route from node 2 to node 4 (the second node in the second group).
46. Scheduling in Buffer-Limited Networks
Collaborators: Taieb Znati. PhD student: Mahmoud Elhaddad.
47. Packet-Switched Networks with Fixed-Size Buffers
- Packet routers connected via time-slotted, buffer-limited links
  - Packet duration is one slot
  - Packet buffers cannot be freely sized to prevent loss
    - All-optical packet routers
    - On-chip (line-driver chip) SRAM buffers
- Connections
  - Ingress-egress traffic aggregates
  - Fixed bandwidth demand
  - Each connection has a fixed path
- Loss rate of a connection
  - The fraction of lost packets
  - The goal is to guarantee a loss rate
  - The loss guarantee depends on the connection's path
48. Link Scheduling Algorithms
- Packet service discipline
  - FCFS, LIFO, Fixed Priority, Nearest-To-Go
- Drop policy
  - Drop tail, drop front, random drop, Furthest-To-Go
- Must be work-conserving
  - Drop excess packets only when the buffer overflows
  - Serve a packet in every slot as long as the buffer is not empty
- Must use only local information
  - No hints or coordination between routers
49. Link Scheduling in Buffer-Limited Networks
- Problem
  - Minimize the guaranteed loss rate for every connection
- Key question: is there a class of algorithms that leads to better loss bounds as a function of utilization and path length?
[Figure: FCFS scheduling with drop-tail vs. the proposed Rolling Priority scheduling.]
50. Link Scheduling in Buffer-Limited Networks: Findings
- A local fairness property is necessary to minimize the guaranteed loss rate for every path length and utilization constraint
  - FCFS/RD (Random Drop) is locally fair
- Rolling Priority, a locally fair algorithm, improves the loss guarantees compared to FCFS/RD and is simple to implement
  - Rolling Priority is optimal
  - FCFS/RD is near-optimal at light load
51. Rolling Priority
- Time is divided into epochs of fixed duration nT
- Connection initialization
  - The ingress chooses a phase at random from the duration of an epoch
  - At t_offset, the ingress sends an init packet along the connection's path
  - Init packets are rare and never dropped
- At every link, a new epoch starts periodically
- At each time slot, every link gives higher priority to the connection whose current epoch started earlier (sketched below)
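A sketch of that priority rule at a single link; the per-connection epoch bookkeeping is simplified and the data structures are illustrative assumptions:

```python
# Each connection's epochs roll with fixed duration from its random
# initial phase; in each slot the link serves the queued connection
# whose *current* epoch started earliest.

import random

class Connection:
    def __init__(self, epoch_len):
        self.epoch_len = epoch_len
        self.phase = random.uniform(0, epoch_len)   # chosen at the ingress

    def epoch_start(self, now):
        """Start time of the epoch this connection is currently in."""
        return self.phase + ((now - self.phase) // self.epoch_len) * self.epoch_len

def serve(queued, now):
    """Pick the queued connection with the earliest current epoch start."""
    return min(queued, key=lambda c: c.epoch_start(now))

random.seed(1)
conns = [Connection(epoch_len=8.0) for _ in range(3)]
winner = serve(conns, now=20.0)
print([round(c.epoch_start(20.0), 2) for c in conns], conns.index(winner))
```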
52. Roaming Honeypots for Mitigating Denial-of-Service Attacks
Collaborators: Daniel Mosse, Taieb Znati. PhD student: Sherif Khattab.
53. Denial-of-Service (DoS) Attacks
- DoS attacks aim at disrupting the legitimate utilization of network and server resources.
54. [Figure: clients send requests to servers; under attack, legitimate requests are dropped.]
55. Packet Filtering
- Not scalable: filter state grows with the number of users.
56. Packet Filtering
- Filtering attackers instead is more scalable.
57. Roaming Honeypots: Basic Idea
[Figure: attacker A1's requests arrive at a pool of servers, some of which act as honeypots.]
58. Roaming Honeypots: Basic Idea (cont.)
[Figure: the honeypot role roams among the servers; attacker A1's requests now land on honeypots and are filtered (selection sketched below).]
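A sketch of one way the roaming set could be derived so that legitimate clients (sharing a secret) can avoid the honeypots while attackers cannot; the keyed-hash construction is an illustrative assumption, not necessarily the scheme's specified mechanism:

```python
# In each time epoch, a pseudo-random subset of the servers acts as
# honeypots, derived deterministically from a shared secret; traffic
# that still hits a honeypot exposes itself as attack traffic.

import hashlib

def honeypots(servers, epoch, key, k):
    """Deterministically pick k honeypots for this epoch from the key."""
    def rank(s):
        h = hashlib.sha256(f"{key}:{epoch}:{s}".encode()).hexdigest()
        return int(h, 16)
    return sorted(servers, key=rank)[:k]

servers = [f"srv{i}" for i in range(6)]
for epoch in range(3):                       # the honeypot set roams
    print(epoch, honeypots(servers, epoch, key="shared-secret", k=2))
```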
59. Effect of Attack Load
- With roaming honeypots, the service exhibits a stable average response time even in the presence of attacks of increasing intensity.