Estimating the Worst-Case Energy Consumption of Embedded Software

1
Estimating the Worst-Case Energy Consumption of Embedded Software
Ramkumar Jayaseelan, Tulika Mitra, Xianfeng Li
School of Computing, National University of Singapore
2
Motivation
  • Conventional scheduling techniques give timing
    guarantees
  • Processor cycles are the critical resource
  • The WCET of each task is a required input
  • Battery life is equally important for mobile
    devices
  • Scheduling techniques must also give energy
    guarantees
  • The worst-case energy consumption (WCEC) of each
    task is a required input

3
Remotely Deployed Systems
Local Station
Sensor Network
  • Available energy unevenly distributed among nodes
  • Spatio-temporal scheduling benefits from WCEC

4
Energy-Based Guarantees
  • Scheduling critical and non-critical tasks in a
    battery-operated system
  • Non-critical tasks can be run only if energy
    constraints for critical tasks are satisfied
  • Worst-case energy estimation is crucial

5
Reward-Based Scheduling
  • Energy consumption ∝ Voltage
  • Delay ∝ (1 / Voltage)
  • Reward-based scheduling attempts to satisfy
    constraints on energy and timing
  • Energy can be guaranteed only if the worst-case
    energy consumption of the tasks is known

6
Outline
  • Background
  • Relation between WCET and Worst-case energy
    consumption
  • Estimation technique: simplified model
  • Instruction cache and speculation
  • Experimental results
  • Conclusion

7
Background
  • Power and energy are often used interchangeably
  • Power is energy consumed per unit time
  • Energy consumed during program execution:
    $E = P \times t$
  • This is an approximation, since P is itself a
    function of time

8
$E = P \times t$ is an approximation
  • In reality, when a program executes, power varies
    over time
  • Energy is the area under the power curve
  • $E = \int P(t)\,dt$
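
As a small illustration of the difference between the two formulas, the sketch below integrates a power trace numerically and compares the result with a single power reading multiplied by the duration. The power samples and the 1 ms sampling interval are made-up values, not measurements from the presentation.

```python
def energy_from_trace(power_w, dt_s):
    """Approximate E = integral of P(t) dt with the trapezoidal rule.

    power_w: evenly spaced power samples in watts
    dt_s:    spacing between samples in seconds
    """
    total = 0.0
    for p0, p1 in zip(power_w, power_w[1:]):
        total += 0.5 * (p0 + p1) * dt_s
    return total


# Hypothetical power samples (watts), one every millisecond.
trace = [1.0, 1.0, 3.0, 3.0, 1.0]
dt = 1e-3
duration = dt * (len(trace) - 1)

e_integral = energy_from_trace(trace, dt)  # area under the curve
e_flat = trace[0] * duration               # E = P * t from one power reading

print(f"integral: {e_integral:.4f} J, flat estimate: {e_flat:.4f} J")
```

With these numbers the flat estimate is half of the integrated value, because the single reading misses the period of higher power.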

9
WCEC versus WCET
[Figure: full input-space expansion for a 5-element insertion sort program]
10
Cannot Estimate WCEC from WCET
Benchmark   WCET × avg. power (µJ)   Observed (µJ)
isort                       489.92          525.88
fft                       12106.49        10260.86
fdct                        138.20          105.57
ludcmp                      131.76          119.33
matsum                      972.03         1154.31
minver                       93.61           80.80
bsearch                       3.84            3.07
des                         724.05          643.75
matmult                     178.12          166.88
qsort                        54.79           43.73
qurt                         23.80           17.65
Possible underestimation when using WCEC = WCET × average power
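
The underestimation can be checked directly from the figures on this slide. The sketch below copies the two columns and lists the benchmarks for which WCET × average power falls below the observed energy; the variable names are illustrative only.

```python
# (benchmark, WCET x average power in uJ, observed energy in uJ),
# copied from the table on this slide.
rows = [
    ("isort", 489.92, 525.88),
    ("fft", 12106.49, 10260.86),
    ("fdct", 138.20, 105.57),
    ("ludcmp", 131.76, 119.33),
    ("matsum", 972.03, 1154.31),
    ("minver", 93.61, 80.80),
    ("bsearch", 3.84, 3.07),
    ("des", 724.05, 643.75),
    ("matmult", 178.12, 166.88),
    ("qsort", 54.79, 43.73),
    ("qurt", 23.80, 17.65),
]

# A safe worst-case bound must never fall below an observed value.
underestimated = [name for name, estimate, observed in rows if estimate < observed]
print(underestimated)
```

For isort and matsum the product is lower than an energy value that was actually observed, so it cannot serve as a worst-case bound.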
11
WCEC versus WCET
  • WCEC path need not be the same as the WCET path
  • WCEC cannot be directly estimated from the WCET
    value

12
A closer look at Power
  • Dynamic power: power consumed due to the
    switching of transistors
  • Leakage power: power consumed independent of
    switching activity
  • Dynamic power forms the bulk of power consumption
    in today's processors

13
Dynamic Power
  • Dynamic power: $P = \frac{1}{2} A V^2 C f$
      • V is the supply voltage
      • C is the capacitance of the circuit
      • f is the frequency
      • A is the activity factor
  • V, C, f are independent of program execution
  • Variation in P is due to the variation in A
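
A minimal sketch of the formula P = ½ A V² C f, showing that with V, C and f fixed the power scales linearly with the activity factor. The voltage, capacitance, frequency and activity values below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def dynamic_power(activity, voltage, capacitance, frequency):
    """Dynamic power P = 1/2 * A * V^2 * C * f (watts for SI inputs)."""
    return 0.5 * activity * voltage ** 2 * capacitance * frequency


# Hypothetical, program-independent parameters.
V = 1.2      # supply voltage in volts
C = 1e-9     # switched capacitance in farads
F = 500e6    # clock frequency in hertz

busy = dynamic_power(0.8, V, C, F)   # phase with high activity factor
quiet = dynamic_power(0.2, V, C, F)  # phase with low activity factor

print(f"busy: {busy:.3f} W, quiet: {quiet:.3f} W, ratio: {busy / quiet:.1f}")
```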

14
Variation in Activity Factor (A)
  • Not all parts of the processor are used in every
    cycle
  • e.g., data-cache is used only for loads/stores
  • Clock gating disables unused components
  • Activity factor (A) varies during the execution
    of the program
  • Model variation in A through static analysis

15
Switch-off Energy
  • An inactive component cannot be fully switched
    off
  • A certain portion of the peak energy is consumed
    even in idle cycles
  • Switch-off energy is proportional to the number
    of idle cycles

16
Clock Energy and Leakage Energy
  • Clock power: power consumed in the clock
    distribution network
  • Leakage power: power consumed due to leakage in
    transistors
  • Clock energy and leakage energy are directly
    proportional to the execution time

17
Energy Components Summary
  • Dynamic energy
      • Switching of transistors during execution
      • Independent of execution time
  • Switch-off energy
      • Energy consumed in unused components
      • Depends on idle cycles
  • Clock and leakage energy
      • Directly proportional to execution time

18
WCEC versus WCET
[Figure: full input-space expansion for a 5-element insertion sort program]
19
Our Analysis Overview
  • Operate on the control flow graph
  • Estimate worst-case energy of basic blocks
  • Formulate estimation for whole program as an
    integer linear programming (ILP) problem

20
ILP Formulation
  • Input: control flow graph of the program
  • Objective function (to be maximized):
    worst-case energy $= \sum_{B} \mathrm{WCEC}_B \times \mathrm{count}_B$
  • Need to estimate the worst-case energy
    consumption ($\mathrm{WCEC}_B$) of each basic block
21
Flow Constraints
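Inflow = basic block execution count = outflow
Bounds on maximum loop iterations
  • $E_{0,1} = B_0 = 1$
  • $E_{2,3} + E_{1,3} = B_3 = 1$
  • $E_{0,1} + E_{2,1} = E_{1,2} + E_{1,3} = B_1$
  • $E_{1,2} = E_{2,3} + E_{2,1} = B_2$
  • Loop bound: $E_{2,1} < 100$

[Figure: control flow graph with basic blocks B0, B1, B2 and B3]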
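
To make the formulation concrete, the sketch below enumerates every assignment of edge counts that satisfies the flow constraints on this slide and keeps the one with the highest total energy. It is a brute-force stand-in for an ILP solver, not the authors' tool. The edge set (B0→B1, B1→B2, B1→B3, B2→B1, B2→B3) is inferred from the constraints, the loop bound is read as E2,1 < 100, and the per-block energies are invented for illustration.

```python
# Hypothetical worst-case energy per basic block (arbitrary units).
BLOCK_ENERGY = {"B0": 5, "B1": 2, "B2": 10, "B3": 3}


def solve_wcec(wcec, loop_bound=100):
    """Maximize sum(WCEC_B * count_B) subject to the flow constraints."""
    best_energy, best_counts = None, None
    for e21 in range(loop_bound):          # back edge B2->B1, E2,1 < loop_bound
        for e13 in (0, 1):                 # exit edge B1->B3
            for e23 in (0, 1):             # exit edge B2->B3
                e01 = 1                    # E0,1 = B0 = 1
                b0 = e01
                b3 = e13 + e23             # inflow of B3
                b1 = e01 + e21             # inflow of B1
                e12 = b1 - e13             # outflow of B1 is E1,2 + E1,3
                b2 = e12                   # inflow of B2
                # B3 runs exactly once; outflow of B2 must equal its inflow.
                if b3 != 1 or e12 < 0 or b2 != e23 + e21:
                    continue
                counts = {"B0": b0, "B1": b1, "B2": b2, "B3": b3}
                energy = sum(wcec[b] * n for b, n in counts.items())
                if best_energy is None or energy > best_energy:
                    best_energy, best_counts = energy, counts
    return best_energy, best_counts


worst_energy, worst_counts = solve_wcec(BLOCK_ENERGY)
print(worst_energy, worst_counts)
```

Enumeration only works because this graph is tiny; for real programs the same constraints would be handed to an ILP solver.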
22
Worst-Case Energy of a Basic Block
  • Processor model
  • Energy components
      • Instruction-specific energy
      • Pipeline-specific energy

23
Processor Model
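[Figure: processor model with pipeline stages IF, ID, ISSUE, EX, WB and CM, an instruction buffer (IBUF), functional units ALU, MULT and FPU, and in-flight instructions labelled I1, I, I-1, I-2, I-3 and I-4]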
24
Pipelined Execution of Instructions
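Instructions: ADD R1,R2,R3; MUL R4,R5,R6; SUB R7,R8,R9
[Figure: pipeline diagram of the three instructions over clock cycles (CC) 1 to 8 through the stages IF, ID, IS, EX, WB and CM, with a MUL label]
Difficult to statically predict the energy
consumption in each cycle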
25
Pipelined Execution of Instructions
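Instructions: ADD R1,R2,R3; MUL R4,R5,R6; SUB R7,R8,R9
[Figure: pipeline diagram of the three instructions over clock cycles (CC) 1 to 8 through the stages IF, ID, IS, EX and WB, with a MUL label]
Difficult to statically predict the energy
consumption in each cycle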
26
Our Approach
  • Determine the maximum energy consumed on a
    component by component basis
  • Static analysis to determine the maximum energy
    consumed by a component in a specified interval

27
Execution of Instruction
[Figure: an instruction passing through the stages IF, ID, ISSUE, EX, WB and CM]
28
Instruction Specific Energy
  • Energy consumed due to the sub-tasks associated
    with execution of an instruction
  • e.g., register file access, ALU usage, etc.
  • Depends on the type of executed instruction
  • No correlation with execution time

29
Pipeline Specific Energy
  • During program execution, energy is consumed due
    to
      • Switch-off power (idle cycles)
      • Leakage power (every cycle)
      • Clock network power (every cycle)
  • Cannot be attributed to any instruction
  • Energy consumed even in idle cycles

30
Energy Components
  • Observation: the energy consumed can be separated
    into
      • Instruction-specific energy
          • Energy associated with the execution of a
            particular instruction
          • Independent of execution time
      • Pipeline-specific energy
          • Energy consumed in other components such
            as the clock network, leakage, etc.
          • Related to execution time

31
Worst-case Energy of a Basic block
  • dynamic_BB: instruction-specific energy for BB
  • switchoff_BB, leakage_BB and clock_BB are the
    energy consumed in unused components, by leakage
    and in the clock network during WCET_BB
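
The equation on this slide is not part of the transcript. Assuming the bound for a block is simply the sum of the four terms named above, with leakage and clock energy proportional to WCET_BB as stated on the earlier slides, it could be sketched as follows. The function name, the per-cycle parameters and all numbers are hypothetical.

```python
def wcec_basic_block(dynamic, switchoff, leakage_per_cycle,
                     clock_per_cycle, wcet_cycles):
    """Assumed composition: dynamic + switch-off + leakage + clock energy.

    leakage and clock energy are taken as proportional to the
    worst-case execution time of the block, in cycles.
    """
    leakage = leakage_per_cycle * wcet_cycles
    clock = clock_per_cycle * wcet_cycles
    return dynamic + switchoff + leakage + clock


# Hypothetical values in nanojoules, for a block with a WCET of 10 cycles.
bb_energy = wcec_basic_block(dynamic=12.0, switchoff=1.5,
                             leakage_per_cycle=0.25, clock_per_cycle=0.5,
                             wcet_cycles=10)
print(bb_energy)
```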

32
Instruction Specific Energy
  • Energy consumed due to switching activity
    generated by the instructions in BB
  • Sum of energy consumed by individual instructions
    in BB

33
Switch-off Energy
  • Unused units consume 10% of peak energy
  • Switch-off energy for a specific component (C)
  • Switch-off energy for basic block BB
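
The two formulas on this slide are missing from the transcript. The sketch below is one plausible reading, assuming the slide's figure means 10% of peak energy per idle cycle and that a component is idle in every cycle of WCET_BB in which it is not accessed. Component names, per-cycle peak energies and access counts are invented.

```python
SWITCH_OFF_FRACTION = 0.10  # assumed reading of the slide: 10% of peak


def switchoff_component(peak_energy_per_cycle, idle_cycles):
    """Switch-off energy of one component over its idle cycles."""
    return SWITCH_OFF_FRACTION * peak_energy_per_cycle * idle_cycles


def switchoff_basic_block(peak_per_cycle, accesses, wcet_cycles):
    """Sum the switch-off energy of all components for one basic block.

    Assumes each access keeps a component busy for exactly one cycle.
    """
    total = 0.0
    for name, peak in peak_per_cycle.items():
        idle = max(wcet_cycles - accesses.get(name, 0), 0)
        total += switchoff_component(peak, idle)
    return total


# Hypothetical block: 10 cycles, ALU used in 6 of them, data cache never.
peaks = {"alu": 2.0, "dcache": 4.0}
bb_switchoff = switchoff_basic_block(peaks, {"alu": 6}, wcet_cycles=10)
print(bb_switchoff)
```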

34
Clock Energy and Leakage Energy
  • Clock Energy
  • Leakage Energy

35
Overlap among basic blocks
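[Figure: timeline with time points t1 to t5 showing the execution of basic block BB overlapping with the neighbouring blocks B1, B2 and B3, with the interval WCET_BB marked]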
36
Switch-off Energy
  • Unused units consume 10% of peak energy
  • Switch-off energy for a specific component (C)
  • Switch-off energy for basic block BB

37
Instruction Cache Modeling
  • Context-based ILP formulation used in WCET
    analysis (Li et al., RTSS 2004)
  • Basic block divided into memory blocks
  • A context maps each of these memory blocks to a
    hit or a miss
  • Estimate the worst-case energy of each context
    taking into account main memory access energy
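
A rough sketch of what enumerating contexts looks like: every memory block of a basic block is mapped to a hit or a miss, and each miss adds the energy of a main memory access. The base energy and miss energy are hypothetical, and the sketch ignores the extra clock and leakage energy that the longer execution time of a miss would cause.

```python
from itertools import product


def context_energies(n_memory_blocks, base_energy, miss_energy):
    """Return the energy of every hit/miss context of a basic block.

    base_energy: energy of the block when all memory blocks hit
    miss_energy: additional main memory access energy per miss
    """
    energies = {}
    for context in product(("hit", "miss"), repeat=n_memory_blocks):
        energies[context] = base_energy + context.count("miss") * miss_energy
    return energies


# Hypothetical basic block spanning two memory blocks.
contexts = context_energies(2, base_energy=10.0, miss_energy=25.0)
for context, energy in sorted(contexts.items()):
    print(context, energy)
```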

38
Modeling Branch Misprediction
39
Objective function
  • count(c, j→i) is the number of times basic block
    Bi is executed after arriving from Bj with the
    branch predicted correctly
  • count(m, j→i) is defined similarly for the case
    where the branch is mispredicted
  • energy(c, j→i) and energy(m, j→i) are defined in
    the same way
  • The ILP problem is solved to generate values for
    count, using constraints similar to WCET analysis
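
The objective function itself is not in the transcript. Assuming it sums, over every predecessor/block pair, the correctly predicted and mispredicted execution counts weighted by their energies, evaluating it for a fixed set of counts would look like the sketch below. In the actual ILP the counts are variables chosen by the solver; here both counts and energies are invented numbers.

```python
def objective(terms):
    """Sum count_c * energy_c + count_m * energy_m over all (Bj, Bi) pairs."""
    return sum(t["count_c"] * t["energy_c"] + t["count_m"] * t["energy_m"]
               for t in terms.values())


# Hypothetical counts and energies, keyed by (predecessor, block).
terms = {
    ("B1", "B2"): {"count_c": 90, "energy_c": 10, "count_m": 10, "energy_m": 14},
    ("B2", "B1"): {"count_c": 99, "energy_c": 2, "count_m": 1, "energy_m": 5},
}

total_energy = objective(terms)
print(total_energy)
```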

40
Results
  • Platform: SimpleScalar toolset
  • Modified the WCET analysis tool of Li et al.
    (RTSS 2004) to estimate worst-case energy
  • Energy values for processor components derived
    from parameterized models in Wattch
  • ILP problem is solved using CPLEX

41
Results
  • Compare estimated WCEC against the observed
    values for eleven benchmarks
  • Observed values are obtained using Wattch power
    simulator
  • The actual inputs that produce the WCEC are
    unknown
  • Inputs that might produce the WCEC are selected
    manually

42
Styles of Clock Gating
  • Simple: peak power is consumed even if there is
    only one access to a component
  • Ideal: power consumed is proportional to the
    number of ports accessed
  • Realistic: same as ideal, but unused components
    consume switch-off power
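
The three styles can be contrasted with a small per-cycle power function. This is a sketch under stated assumptions: an unused component is taken to consume nothing under the simple and ideal styles, and the switch-off fraction defaults to the 10% of peak mentioned on the switch-off energy slide. The peak power and port counts are hypothetical.

```python
def component_power(style, peak, ports_used, ports_total,
                    switchoff_fraction=0.10):
    """Per-cycle power of one component under a clock gating style."""
    if style == "simple":
        # Any access costs full peak power.
        return peak if ports_used > 0 else 0.0
    if style == "ideal":
        # Power scales with the number of ports accessed.
        return peak * ports_used / ports_total
    if style == "realistic":
        # Like ideal, but an unused component still draws switch-off power.
        if ports_used == 0:
            return switchoff_fraction * peak
        return peak * ports_used / ports_total
    raise ValueError(f"unknown clock gating style: {style}")


# Hypothetical component: peak power 8.0 units, 4 ports.
for style in ("simple", "ideal", "realistic"):
    print(style,
          component_power(style, 8.0, 1, 4),   # one port accessed
          component_power(style, 8.0, 0, 4))   # component unused
```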

43
Results
            Ideal clock gating              Simple clock gating
Benchmark   Est (µJ)   Obs (µJ)   Ratio     Est (µJ)   Obs (µJ)   Ratio
isort         468.85     422.76    1.11       524.95     455.94    1.15
fft          9600.99    8586.49    1.12     11057.50    9185.39    1.20
fdct           89.92      83.63    1.08        99.31      88.79    1.11
ludcmp         98.75      92.77    1.06       115.39     100.32    1.15
matsum       1012.83     929.94    1.09      1227.37     994.11    1.23
minver         63.66      59.61    1.07        74.91      64.15    1.17
bsearch         2.54       2.40    1.06         3.51       3.07    1.14
des           546.41     518.22    1.05       613.16     553.74    1.10
matmult       149.70     132.08    1.13       172.39     136.93    1.26
qsort          34.90      31.16    1.12        39.50      33.84    1.17
qurt           13.98      11.91    1.17        16.36      12.97    1.26
  • Results for ideal clock gating are more accurate
    than for simple clock gating because of the
    distribution of accesses
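
The ratio column is the estimate divided by the observed value. The sketch below recomputes it for the ideal clock gating figures on this slide, checks that every estimate stays above the observation, and reports the benchmark with the largest overestimation. Only the data is taken from the slide; the variable names are illustrative.

```python
# benchmark: (estimated uJ, observed uJ) under ideal clock gating,
# copied from the results on this slide.
ideal = {
    "isort": (468.85, 422.76),
    "fft": (9600.99, 8586.49),
    "fdct": (89.92, 83.63),
    "ludcmp": (98.75, 92.77),
    "matsum": (1012.83, 929.94),
    "minver": (63.66, 59.61),
    "bsearch": (2.54, 2.40),
    "des": (546.41, 518.22),
    "matmult": (149.70, 132.08),
    "qsort": (34.90, 31.16),
    "qurt": (13.98, 11.91),
}

ratios = {name: est / obs for name, (est, obs) in ideal.items()}
all_safe = all(r >= 1.0 for r in ratios.values())  # estimate never below observation
loosest = max(ratios, key=ratios.get)               # largest overestimation

print(all_safe, loosest, round(ratios[loosest], 2))
```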

44
Results
            Realistic clock gating          Ideal clock gating
Benchmark   Est (µJ)   Obs (µJ)   Ratio     Est (µJ)   Obs (µJ)   Ratio
isort         596.93     525.88    1.14       468.85     422.76    1.11
fft         13631.21   10260.86    1.33      9600.99    8586.49    1.12
fdct          121.65     105.57    1.15        89.92      83.63    1.08
ludcmp        139.75     119.33    1.17        98.75      92.77    1.06
matsum       1397.72    1154.31    1.21      1012.83     929.94    1.09
minver         90.95      80.80    1.13        63.66      59.61    1.07
bsearch         3.81       3.07    1.24         2.54       2.40    1.06
des           715.58     643.75    1.11       546.41     518.22    1.05
matmult       212.94     166.88    1.28       149.70     132.08    1.13
qsort          49.84      43.73    1.14        34.90      31.16    1.12
qurt           21.95      17.65    1.24        13.98      11.91    1.17
  • Results for ideal clock gating are more accurate
    than for realistic clock gating because of
    conservative WCET estimation

45
Conclusion
  • Static worst-case energy estimation technique
    that takes into account pipelining, instruction
    cache and branch prediction
  • Future work
      • Validation using commercial processors
      • Explore the possibility of providing thermal
        guarantees

46
Execution of an Add Instruction
  • IF: I-cache access
  • ID: instruction decode, rename logic
  • ISSUE: wakeup, selection logic
  • EX: register file read, add unit access
  • WB: result bus
  • CM: ROB retire, register file update
47
Instruction Specific Energy
  • Each component is accessed once
  • Selection logic may be accessed multiple times
  • Instruction-specific energy is the sum of the
    energy of these component accesses
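
The formula that closes this slide is not in the transcript. A minimal sketch, assuming the instruction-specific energy of an ADD is the sum of one access to each component it uses, with the selection logic counted as many times as it may be accessed in the worst case. The component list follows the previous slide; the per-access energies and the function name are hypothetical.

```python
# Hypothetical energy per access (nanojoules) for the components
# an ADD instruction touches on its way through the pipeline.
ACCESS_ENERGY = {
    "icache": 1.0,
    "decode": 0.25,
    "rename": 0.5,
    "wakeup": 0.25,
    "selection": 0.125,
    "regfile_read": 0.5,
    "add_unit": 0.75,
    "result_bus": 0.25,
    "rob_retire": 0.25,
    "regfile_update": 0.5,
}


def instruction_energy(access_energy, max_selection_accesses=1):
    """Sum per-access energies; selection logic may be accessed repeatedly."""
    total = 0.0
    for component, energy in access_energy.items():
        times = max_selection_accesses if component == "selection" else 1
        total += times * energy
    return total


add_once = instruction_energy(ACCESS_ENERGY)
add_worst = instruction_energy(ACCESS_ENERGY, max_selection_accesses=3)
print(add_once, add_worst)
```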