Title: Cooperative cross-layer protection for resource constrained Mobile Multimedia systems
Prof. Nikil Dutt, Prof. Nalini Venkatasubramanian, Prof. Lichun Bao
- Kyoungwoo Lee (final defense)
Nov. 26, 2008
2Contents
- Thesis Motivation
- Thesis Proposal: Cooperative, Cross-layer Methods
- PPC (Partially Protected Caches)
- EAVE (Error-Aware Video Encoding)
- CC-PROTECT (Cooperative, Cross-layer Protection)
- Thesis Contribution and Future Direction
3Mobile Multimedia Embedded Systems
Mobile devices are resource-limited! The main problem is to achieve low power together with high performance, high QoS, and high reliability
Map Routing
3D Graphics
Image Browsing
Animation
Mobile TV
Web Browsing
Video Streaming
Satellite TV
Video Conferencing
4Reliability
- Reliability is an emerging and critical concern in mobile devices
  - New technology makes devices more vulnerable to errors due to high complexity and high integration
  - Soft error rate increases exponentially as technology scales [Baumann, 05]
- Mobile applications run close to humans
  - In pervasive computing, failures of healthcare mobile devices can have serious consequences
- Redundancy techniques incur high overheads in power and performance
  - TMR (Triple Modular Redundancy) may exceed 200% overhead without optimization [Nieuwland, 06]
- It is challenging to optimize multiple properties (e.g., performance, power, QoS, and reliability) in mobile embedded systems
5Soft error is becoming an every second concern!
- Soft Error Rate (SER) is measured in FIT (Failures in Time): the number of errors in 10^9 hours

Configuration | SER (FIT) | MTTF | Reason
1 Mbit @ 0.13 µm | 1000 | 104 years |
64 MB @ 0.13 µm | 64x8x1000 | 81 days | High integration
128 MB @ 65 nm | 2x1000x64x8x1000 | 1 hour | Technology scaling and twice the integration
A system @ 65 nm | 2x2x1000x64x8x1000 | 30 minutes | Memory accounts for 50% of soft errors in a system
A system with voltage scaling @ 65 nm | 100x2x2x1000x64x8x1000 | 18 seconds | Exponential relationship between SER and supply voltage
A system with voltage scaling @ flight (35,000 ft) @ 65 nm | 800x100x2x2x1000x64x8x1000 | 0.02 seconds | High intensity of neutron flux at flight (high altitude)
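The MTTF column follows from the FIT definition: MTTF in hours is 10^9 divided by the FIT value. The sketch below reproduces that conversion for the FIT products listed on this slide. The function names and the 365-day year are assumptions made for illustration. Under this conversion the first row comes to roughly 114 years, a little above the 104 years quoted on the slide; the remaining rows match the slide's rounded values.

```python
FIT_WINDOW_HOURS = 1e9  # one FIT = one failure per 10^9 device-hours


def mttf_hours(fit: float) -> float:
    """Mean time to failure in hours for a soft error rate given in FIT."""
    if fit <= 0:
        raise ValueError("FIT must be positive")
    return FIT_WINDOW_HOURS / fit


def describe(hours: float) -> str:
    """Render a duration in a readable unit (assumes 365-day years)."""
    if hours >= 24 * 365:
        return f"{hours / (24 * 365):.0f} years"
    if hours >= 24:
        return f"{hours / 24:.0f} days"
    if hours >= 1:
        return f"{hours:.1f} hours"
    if hours * 60 >= 1:
        return f"{hours * 60:.0f} minutes"
    return f"{hours * 3600:.2f} seconds"


# FIT products copied from the slide's SER column.
rows = {
    "1 Mbit @ 0.13 um": 1000,
    "64 MB @ 0.13 um": 64 * 8 * 1000,
    "128 MB @ 65 nm": 2 * 1000 * 64 * 8 * 1000,
    "system @ 65 nm": 2 * 2 * 1000 * 64 * 8 * 1000,
    "system + voltage scaling": 100 * 2 * 2 * 1000 * 64 * 8 * 1000,
    "system + voltage scaling @ flight": 800 * 100 * 2 * 2 * 1000 * 64 * 8 * 1000,
}

for name, fit in rows.items():
    print(f"{name}: {describe(mttf_hours(fit))}")
```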
6Errors and Failures in Mobile Embedded Systems
- Faults or Errors can cause Failures
Bug
Packet Loss
Exception
Soft Error
7Errors and Error Control Schemes at Hardware
Failures: Soft Errors, Hard Failures, System Crash
Causes: External Radiations, Thermal Effects, Power Loss, Poor Design, Aging
Metrics: FIT, MTTF, MTBF
Traditional Approaches: Spatial Redundancy (TMR, Duplex, RAID-1, etc.) and Data Redundancy (EDC, ECC, RAID-5, etc.)
- Hardware failures are increasing as technology scales
  - (e.g.) SER increases by up to 1000 times [Mastipuram, 04]
- Redundancy techniques are expensive
  - (e.g.) ECC-based protection in caches can incur a 95% performance penalty [Li, 05]
- FIT: Failures in Time (per 10^9 hours)
- MTTF: Mean Time To Failure
- MTBF: Mean Time Between Failures
- TMR: Triple Modular Redundancy
- EDC: Error Detection Codes
- ECC: Error Correction Codes
- RAID: Redundant Array of Inexpensive Drives
8Errors and Error Control Schemes at Software
Failures: Wrong outputs, Infinite loops, Crash
Causes: Incomplete Specification, Poor software design, Bugs, Unhandled Exception
Metrics: Number of Bugs/Klines, QoS, MTTF, MTBF
Traditional Approaches: Spatial Redundancy (N-version Programming, etc.), Temporal Redundancy (Checkpoints and Backward Recovery, etc.)
- Software errors become dominant as system complexity increases
  - (e.g.) Several bugs per thousand lines of code
- Hard to debug, and redundancy techniques are expensive
  - (e.g.) Backward recovery with checkpoints is inappropriate for real-time applications
9Errors and Error Control Schemes in Networks
Failures: Data Losses, Deadline Misses, Node (Link) Failure, System Down
Causes: Network Congestion, Noise/Interference, Malicious Attacks
Metrics: Packet Loss Rate, Deadline Miss Rate, SNR, MTTF, MTBF, MTTR
Traditional Approaches: Resource Reservation, Data Redundancy (CRC, etc.), Temporal Redundancy (Retransmission, etc.), Spatial Redundancy (Replicated Nodes, MIMO, etc.)
- Networks are unreliable (especially wireless networks)
- Joint approaches across OSI layers have been investigated for minimal costs [Vuran, 06], [Schaar, 07]
- SNR: Signal to Noise Ratio
- MTTR: Mean Time To Recovery
- CRC: Cyclic Redundancy Check
- MIMO: Multiple-In Multiple-Out
10Conventional Approaches
- Most redundancy techniques incur overheads in terms of performance, power, area, etc.
  - Conventional TMR (Triple Modular Redundancy) can incur 200% overhead without optimization.
  - Backward Recovery with Checkpoints cannot guarantee the completion time of a task.
- Recently proposed techniques have focused on reducing cost without losing reliability
  - However, they still incur overheads
11Thesis Problem Statement
- Study tradeoffs among system properties
  - (e.g.) Redundancy incurs energy overheads while DVS increases SER significantly
- Examine errors and error control schemes across system abstraction layers
  - (e.g.) network errors are handled by error-resilient video encoding, soft errors by ECC or EDC, etc.
- Maximize reliability with minimal costs of power and performance for mobile embedded systems
  - Mainly focus on soft error reduction for mobile multimedia embedded systems
12Cross-Layer Methods
- Cross-layer approaches
  - Aim at system-level optimization
  - Integrate and coordinate techniques across system layers
- Classification [Srivastava, 05]
  - Top-down, Bottom-up, or Both directions
    - Top-down: PPC, PDVS (GRACE), etc.
    - Bottom-up: EAVE, etc.
    - Both directions: CC-PROTECT, etc.
  - Coupling or Merging layers
    - Dynamo [Mohapatra], xTune [Kim], etc.
Merging
Bottom-up
Top-down
Coupling
- PDVS: Practical Dynamic Voltage Scaling
13Cross-Layer Approaches GRACE
- GRACE project @ UIUC [W. Yuan, Ph.D. thesis, 04] and [A. F. Harris III, Ph.D. thesis, 06]
- QoS/Power tradeoffs
- Primarily OS adaptation for power management in multimedia mobile devices
- Network adaptation for power management in multimedia communications
GRACE, 05
14Cross-Layer Approaches DYNAMO FORGE
- DYNAMO middleware for the FORGE project @ UCI [S. Mohapatra, Ph.D. thesis, 05] and [R. Cornea, Ph.D. thesis, 07]
- QoS/Performance/Power tradeoffs for mobile embedded systems
- Middleware-driven coordination and proxy-based cooperation
  - Content transcoding at the application layer
  - Network traffic shaping at the network layer
  - Backlight (LCD display) setting at the hardware layer
  - NIC shutdown, CPU DVS/DFS at the hardware layer
15Cross-Layer Approaches xTune
- xTune framework @ UCI and SRI [M. Kim, Ph.D. thesis, 08]
- QoS/Power/Timeliness adaptation for distributed real-time embedded systems
- A formal methodology for cross-layer tuning and verifiable timeliness of mobile embedded systems
16Thesis Proposed Contribution
- Thesis proposes a cross-layer design methodology for mobile multimedia embedded systems with minimal costs
  - Reliability/QoS/Power/Performance system optimization for mobile multimedia systems
- Cooperative, Cross-Layer Protection
  - PPC, EAVE, CC-PROTECT
  - Low-cost reliability
17Overview of Thesis Proposals
- PPC (Partially Protected Caches)
- EAVE (Error-Aware Video Encoding)
- CC-PROTECT (Cooperative, Cross-layer Protection)
[Figure: overview of the thesis proposals across the Application, MW/OS, and Hardware layers. Labels: Multimedia Application, Original Video, Error-Resilient Encoder (e.g., PBPAIR), Error-Aware Video, EAVE, Packet Loss, Frame Drop, QoS, Error-Controller (e.g., frame drop), Monitor and Translate SER, Error Injection Rate, Frame Loss Rate, Correction, Error detection, EDC]
18Contents
- Thesis Motivation
- Thesis Proposal: Cooperative, Cross-layer Methods
- PPC (Partially Protected Caches)
- EAVE
- CC-PROTECT
- Thesis Contribution and Future Direction
19Conventional Protection for Caches
- Caches are hit hardest by soft errors
- Conventional protected caches
  - Are unaware of fault tolerance at the application level
  - Implement a redundancy technique such as ECC to protect all data on every access
  - Are overkill for multimedia applications
  - ECC (e.g., a Hamming code) incurs a performance penalty of up to 95%, a power overhead of up to 22%, and an area cost of up to 25%
Unaware of Application
High Cost
Cache
ECC
20PPC (Partially Protected Caches)
- Observation
  - Not all data are equally failure critical
  - Multimedia data vs. control variables
- Propose PPC architectures to provide unequal protection for mobile multimedia systems [Lee, CASES06], [Lee, TVLSI08]
  - Unprotected cache and protected cache at the same level of the memory hierarchy
  - The protected cache is typically smaller, to keep its power and delay the same as or less than those of the unprotected cache
21PPC for Multimedia Applications
- Propose selective data protection [Lee, CASES06]
  - Unequal protection at the hardware layer, exploiting the error-tolerance of multimedia data at the application layer
- Simple data partitioning for multimedia applications
  - Multimedia data is failure non-critical
  - All other data is failure critical
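The simple partitioning rule above can be sketched as a page-to-cache mapping. This is only an illustration under assumed names: the `Cache` enum, the `partition` function, and the example page names are hypothetical, and the actual mapping mechanism used in the thesis is not described on this slide.

```python
from enum import Enum


class Cache(Enum):
    UNPROTECTED = "unprotected"  # no ECC: cheaper in power and delay
    PROTECTED = "protected"      # ECC-protected: for failure-critical data


def partition(pages: dict[str, bool]) -> dict[str, Cache]:
    """Map each page to a cache.

    `pages` maps a (hypothetical) page name to True when the page holds
    multimedia data. Multimedia data is treated as failure non-critical and
    goes to the unprotected cache; everything else is failure critical.
    """
    return {
        name: Cache.UNPROTECTED if is_multimedia else Cache.PROTECTED
        for name, is_multimedia in pages.items()
    }


mapping = partition({"frame_buffer": True, "loop_counters": False, "stack": False})
for page, cache in mapping.items():
    print(page, "->", cache.value)
```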
Power/Delay Reduction
Fault Tolerance
22PPC for General Applications
- DPExplore [Lee, PPCDIPES08]
  - Explores the partitioning space by exploiting the vulnerability of each data page
- Vulnerable time
  - Data is vulnerable during any interval that ends with it being read by the CPU or written back to memory
  - Pages causing high vulnerable time are failure critical
  - Vulnerable time closely estimates the failure rate
  - Reduces the number of simulations needed to estimate the failure rate
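A rough sketch of how vulnerable time per page could be accumulated from an access trace, and how pages might then be classified. The trace format, the operation names (`fill`, `read`, `write`, `writeback`, `evict`), and the threshold are assumptions for illustration; the slide does not specify how DPExplore computes or uses the metric.

```python
# An interval counts as vulnerable only if it ends with the data being
# consumed: read by the CPU or written back to memory. Intervals that end in
# an overwrite or a clean eviction cannot propagate a bit flip.
CONSUMING_OPS = {"read", "writeback"}
LEAVING_OPS = {"writeback", "evict"}  # data leaves the cache afterwards


def vulnerable_time(trace):
    """trace: iterable of (time, page, op) tuples, time-ordered per page."""
    last_seen = {}  # page -> time of the most recent access while cached
    total = {}      # page -> accumulated vulnerable time
    for time, page, op in trace:
        total.setdefault(page, 0)
        if page in last_seen and op in CONSUMING_OPS:
            total[page] += time - last_seen[page]
        if op in LEAVING_OPS:
            last_seen.pop(page, None)
        else:
            last_seen[page] = time
    return total


def failure_critical_pages(trace, threshold):
    """Pages whose vulnerable time reaches a (hypothetical) threshold."""
    return {p for p, v in vulnerable_time(trace).items() if v >= threshold}


trace = [
    (0, "A", "fill"), (10, "A", "read"), (15, "A", "write"), (40, "A", "writeback"),
    (0, "B", "fill"), (30, "B", "evict"),
]
print(vulnerable_time(trace))
print(failure_critical_pages(trace, threshold=20))
```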
23Summary PPC
- Not all data are equally failure critical
- Propose a PPC architecture to provide unequal protection
  - Supports unequal protection at the hardware layer by exploiting error-tolerance and vulnerability at the application layer
  - Presents cost-efficient reliability
- Related Publications
  - [Lee, CASES06] PPC for multimedia embedded systems
  - [Lee, PPCDIPES08] PPC for general applications
  - [Lee, TVLSI08] PPC and design space exploration
- Under submission
  - [Lee, TODAES??] Partitioning techniques for general applications and instruction caches
Application Data Code
Error-tolerance of MM data Vulnerability of Data
Code
Page Partitioning Algorithms
Failure Non-Critical
Failure Critical
FNC FC are mapped into Unprotected Protected
Caches
Unprotected Cache
Protected Cache
PPC
24Contents
- Thesis Motivation
- Thesis Proposal: Cooperative, Cross-layer Methods
- PPC
- EAVE (Error-Aware Video Encoding)
- CC-PROTECT
- Thesis Contribution and Future Direction
25Active Error Exploitation Intentional Frame Drop
- Intentional Frame Drop (one way to actively exploit errors) can reduce energy for each operation
- FDT-1 affects the following components with respect to power, performance, and QoS in mobile video applications
Mobile Video Application
Enc
Tx
Dec
Rx
CPU
WNI
CPU
WNI
FDT-1
FDT-2
FDT-3
Packet Loss
- FDT: Frame Drop Type
- Enc: Encoding, Dec: Decoding
- WNI: Wireless Network Interface
26Error-Aware Video Encoding
- Propose EE-PBPAIR [Lee, DIPES08]
- Intentionally drop frames at video encoding
- Reduce the energy consumption for video encoding
- Maintain the video quality by exploiting
error-resilience of PBPAIR
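One way to picture intentional frame dropping at a given Error Injection Rate (EIR) is a controller that decides, before encoding, which frames to skip. The sketch below spreads drops evenly with an accumulator. The function name and the even-spacing policy are assumptions for illustration, not the EE-PBPAIR algorithm itself, and the error-resilient encoder is left out.

```python
def select_frames(num_frames: int, eir: float):
    """Split frame indices into (kept, dropped) for a target EIR in [0, 1].

    A drop "debt" grows by `eir` per frame; whenever it reaches 1 the current
    frame is dropped, so about eir * num_frames frames are skipped and never
    reach the encoder, which is where the encoding energy would be saved.
    """
    if not 0.0 <= eir <= 1.0:
        raise ValueError("eir must be within [0, 1]")
    kept, dropped = [], []
    debt = 0.0
    for index in range(num_frames):
        debt += eir
        if debt >= 1.0:
            debt -= 1.0
            dropped.append(index)
        else:
            kept.append(index)
    return kept, dropped


kept, dropped = select_frames(8, 0.25)
print("encode:", kept)
print("drop:", dropped)
```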
Packet Loss
Intentional frame drop
Error-Aware Video Encoder (EAVE)
Error-Resilient Video
Error-Aware Video
Original Video
Error-Resilient Encoder (e.g., PBPAIR)
Error-Controller (e.g., frame dropping)
EIR
27Summary EAVE
- Intentional Frame Drop is one way to exploit errors actively
- Propose an error-aware video encoding (EE-PBPAIR)
  - Presents a knob (EIR) to adjust the amount of errors considering the QoS feedback
  - Maintains the video quality using the error-resilience of PBPAIR
- Related Publication
  - [Lee, DIPES08] EE-PBPAIR
- Considering Submission
  - [Lee, TECS??] Generalized idea for error-resilient video encodings
Error Resilient Video Encoder
Error-Aware Video Data
Application
Error Rate PLR EIR
Error Controller
Network or Decoding Side
EIR
PLR QoS
Middleware
Energy Reduction
CPU, Memory, and WNI
Hardware
- EIR: Error Injection Rate
- PLR: Packet Loss Rate
28Contents
- Thesis Motivation
- Thesis Proposal: Cooperative, Cross-layer Methods
- PPC
- EAVE
- CC-PROTECT (Cooperative Cross-layer Protection)
- Thesis Contribution and Future Direction
29Errors and Error Control Schemes No Coupling
- Different errors and their protection techniques have not been considered jointly
  - No coupling and no cooperation
- Cooperating control schemes in a cross-layer manner can open a new avenue
Mobile Video Application
Packet Loss
Soft Error
30PPC still incurs overheads due to ECC-protection
- Propose PPC architectures to provide unequal protection for mobile multimedia systems [Lee, TVLSI08]
  - Unprotected cache and protected cache at the same level of the memory hierarchy
- PPC still incurs overheads due to expensive ECC protection at the protected cache
  - 29% energy reduction compared to the protected cache
  - 10% energy overhead compared to the unprotected cache
31PBPAIR is energy-inefficient in error-free network
- PBPAIR is error-resilient and energy-efficient in general
- PBPAIR may not be energy-efficient in an error-free network
PLR
PBPAIR
Intra_Threshold
- PBPAIR: Probability-Based Power Aware Intra Refresh [Kim, 06]
32Outline of CC-PROTECT
QoS Loss
BER (Backward Error Recovery)
Soft Error
EDC
Error detection
33Energy Saving
- EDC + DFR impact: 36% reduction compared to HW-PROTECT, 26% reduction compared to BASE
- EDC + DFR + PBPAIR (CC-PROTECT) impact: 56% reduction compared to HW-PROTECT, 49% reduction compared to BASE
- EDC impact: 17% reduction compared to HW-PROTECT, 4% reduction compared to BASE
- BASE: error-prone video encoding + unprotected cache
- HW-PROTECT: error-prone video encoding + PPC with ECC
- APP-PROTECT: error-resilient video encoding + unprotected cache
- MULTI-PROTECT: error-resilient video encoding + PPC with ECC
- CC-PROTECT1: error-prone video encoding + PPC with EDC
- CC-PROTECT2: error-prone video encoding + PPC with EDC + DFR
- CC-PROTECT: error-resilient video encoding + PPC with EDC + DFR
34Summary CC-PROTECT
- Propose the CC-PROTECT approach, which coordinates existing schemes across layers to mitigate the impact of soft errors on the failure rate and video quality in mobile video encoding systems
  - PPC (Partially Protected Caches) with EDC (Error Detection Codes) at the hardware layer
  - DFR (Drop and Forward Recovery) at the middleware
  - PBPAIR (Probability-Based Power Aware Intra Refresh) at the application layer
- Demonstrate the effectiveness of low-cost (about 50%) reliability (1,000x) at a minimal cost in QoS (less than 1%)
- Related Publication
  - [Lee, ACMMM08] CC-PROTECT
- Considering Submission
  - [Lee, ACMTOMCCAP??] Tradeoff space exploration with CC-PROTECT
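The cooperation between EDC and DFR can be pictured as follows: the hardware only detects an error, and the middleware reacts by dropping the affected frame and moving on to the next one instead of re-executing. The sketch below uses a per-word parity bit as the detection code. The data layout, the function names, and the choice of parity are assumptions for illustration; the slide does not give the actual EDC or the DFR interface.

```python
def even_parity(word: int) -> int:
    """Parity bit of a non-negative integer word (1 if the count of 1-bits is odd)."""
    return bin(word).count("1") % 2


def encode_stream(frames, encode_frame):
    """Encode frames, dropping any frame whose data fails the parity check.

    frames: list of (words, stored_parities) pairs, where stored_parities
    were computed when the words were written (hypothetical layout).
    encode_frame: caller-supplied encoder for one frame's words.
    Returns (encoded_frames, dropped_indices).
    """
    encoded, dropped = [], []
    for index, (words, parities) in enumerate(frames):
        corrupted = any(even_parity(w) != p for w, p in zip(words, parities))
        if corrupted:
            # Drop and Forward Recovery: skip this frame, no rollback or retry.
            dropped.append(index)
            continue
        encoded.append(encode_frame(words))
    return encoded, dropped


clean = [0b1011, 0b0001]
parities = [even_parity(w) for w in clean]
flipped = [0b1010, 0b0001]  # one bit flipped in the first word
encoded, dropped = encode_stream(
    [(clean, parities), (flipped, parities), (clean, parities)],
    encode_frame=sum,  # stand-in for a real encoder
)
print(encoded, dropped)
```

In CC-PROTECT the quality loss from such dropped frames is what the error-resilient encoding (PBPAIR) is meant to absorb.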
PBPAIR - Error Resilience
DFR - Error Correction
ECC
EDC
Unprotected Cache
Protected Cache
35Contents
- Thesis Motivation
- Thesis Proposal: Cooperative, Cross-layer Methods
- PPC
- EAVE
- CC-PROTECT
- Thesis Contribution and Future Direction
36Overall Thesis Contribution
- Cross-layer methodology to design mobile
multimedia embedded systems with minimal costs
- Effective Cross-layer approaches for reliability
- Low-cost reliability
- Expanded trade-off space
- Extended applicability of existing techniques
Packet Loss
Frame Drop
Soft Error
37Effectiveness of Thesis Proposals (Energy Saving)
CC-PROTECT
- 29% energy reduction, as compared to a
conventional protected cache with ECC
- 37% energy reduction, as compared to a
conventional video encoding
- 56% energy reduction, as compared to a
conventional composition of protections
38Publication
- [Lee, ACMMM08] K. Lee, A. Shrivastava, M. Kim, N. Dutt, and N. Venkatasubramanian, Mitigating the impact of hardware defects on multimedia applications: A cross-layer approach, In ACM International Conference on Multimedia, Oct. 2008.
- [Lee, TVLSI08] K. Lee, A. Shrivastava, I. Issenin, N. Dutt, and N. Venkatasubramanian, Partially protected caches to reduce failures due to soft errors in multimedia applications, In IEEE Transactions on Very Large Scale Integration Systems (TVLSI), 2008, to appear.
- [Lee, DIPES08] K. Lee, M. Kim, N. Dutt, and N. Venkatasubramanian, Error exploiting video encoder to extend energy/QoS tradeoffs for mobile embedded systems, In 6th IFIP Working Conference on Distributed and Parallel Embedded Systems (DIPES), Sep. 2008.
- [Lee, PPCDIPES08] K. Lee, A. Shrivastava, N. Dutt, and N. Venkatasubramanian, Data partitioning techniques for partially protected caches to reduce soft error induced failures, In 6th IFIP Working Conference on Distributed and Parallel Embedded Systems (DIPES), Sep. 2008.
- [Lee, CASES06] K. Lee, A. Shrivastava, I. Issenin, N. Dutt, and N. Venkatasubramanian, Mitigating soft error failures for multimedia applications by selective data protection, In Int. Conference on Compilers, Architecture, Synthesis for Embedded Systems (CASES), Oct. 2006.
- [Lee, ICME05] K. Lee, N. Dutt, and N. Venkatasubramanian, Experimental Study on Energy Consumption of Video Encryption for Mobile Handheld Devices, In IEEE International Conference on Multimedia and Expo (ICME 05), Poster Session, July 2005.
- [Mohapatra, IPDPS05] S. Mohapatra, R. Cornea, H. Oh, K. Lee, M. Kim, N. Dutt, R. Gupta, A. Nicolau, S. Shukla, and N. Venkatasubramanian, A cross-layer approach for power-performance optimization in distributed mobile systems, In Next Generation Software Program in conjunction with IEEE International Parallel and Distributed Processing Symposium (IPDPS), April 2005.
Lee, DIPES08
Lee, TVLSI08 Lee, PPCDIPES08 Lee, CASES06
Lee, ACMMM08 Mohapatra, IPDPS05 Lee, ICME05
39Future Direction
- Error Rate Translation/Integration
  - Different types of errors
  - Different components across system layers
- Cross-layer methods for distributed embedded systems (Horizontal Expansion)
  - Network-aware methods
  - Context-aware approaches
Mobile Video Application
Bug
Packet Loss
Exception
Soft Error
40Thank you! Any Questions or Comments?
41Backup Slides
42Why Cross-Layer Approach?
- Cross-layer interactions and conflicts arise between system properties
  - DVS increases SER exponentially
- Over protection or under protection
  - ECC for all multimedia data is overkill
- Cross-layer approaches can maximize reliability with minimal power and performance overheads
- Benefits of cross-layer approaches
  - Global system view
  - Coordination for intelligent selection
  - Adaptation
- Cross-layer approaches have been promising for saving resources at the cost of QoS [Mohapatra, 05], [Yuan, 04]
- DVS: Dynamic Voltage Scaling
- SER: Soft Error Rate
- ECC: Error Correction Codes
- QoS: Quality of Service
43Thesis Proposed Contribution CC-PROTECT
- Cooperative Cross-layer Protection (CC-PROTECT) by exploiting error-awareness and error control schemes across system abstraction layers
- Contribution
  - Present cost-efficient reliability methods (cooperative cross-layer protection)
  - Open expanded tradeoff spaces and operating points
  - Rediscover the applicability of existing approaches for other purposes
44Performance vs. Capacity
- Total energy available from a battery is a design issue and is fixed at design time, along with its weight and size
- Stark contrast between the linear growth rate of battery capacity and the exponential improvement rate of system components
[Udani] Sanjay Udani and Jonathan Smith, Power management in mobile computing
45Generalized Fault Tolerance Techniques
- Modular Redundancy
- N-Version Programming
- Error-Control Coding
- Checkpoints and Rollbacks
- Recovery Blocks
[Chetan, SPC04] S. Chetan, A. Ranganathan, and R. Campbell, Towards Fault Tolerant Pervasive Computing, in SPC 04
[Somani, IEEECom97] A. K. Somani and N. H. Vaidya, Understanding Fault Tolerance and Reliability, in IEEE Computer 97, vol. 30, issue 4
461) Modular Redundancy
- Modular Redundancy
  - Multiple identical replicas of hardware modules
  - Voter mechanism
    - Compares outputs and selects the correct output
  - Tolerates most hardware faults
  - Effective but expensive
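A minimal sketch of the voter in modular redundancy: several replicas produce an output and the voter returns the value that a strict majority agrees on. The function name is an assumption for illustration; with three replicas this corresponds to TMR.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value produced by a strict majority of replicas.

    Raises ValueError when no value has more than half of the votes, which
    is the case a voter cannot resolve (e.g., all replicas disagree).
    """
    if not outputs:
        raise ValueError("no replica outputs to vote on")
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise ValueError("no majority among replica outputs")
    return value


# Three replicas, one of which was hit by a fault.
print(majority_vote([42, 42, 17]))
```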
fault
Data
Consumer
Producer A
voter
Producer B
472) N-version Programming
- N-version Programming
- Different versions by different teams
- Different versions may not contain the same bugs
- Voter mechanism
- Tolerates some software bugs
Data
Producer A
Consumer
voter
Program i
Program j
fault
Programmer K
Programmer L
483) Error-Control Coding
- Error-Control Coding
  - Replication is effective but expensive
  - Error-Detection Coding and Error-Correction Coding
  - (example) Parity Bit, Hamming Code, CRC
  - Much less redundancy than replication
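As a concrete instance of error-correction coding, the sketch below implements the standard Hamming(7,4) code: 4 data bits are stored with 3 parity bits, and any single flipped bit can be located and corrected. The function names are chosen for illustration; the slides do not say which Hamming variant is meant.

```python
def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit codeword [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers codeword positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s4  # 0 means no error detected
    if error_position:
        c[error_position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


data = [1, 0, 1, 1]
codeword = hamming74_encode(data)
corrupted = codeword.copy()
corrupted[4] ^= 1  # inject a single bit flip
print(hamming74_decode(corrupted) == data)
```

Here 3 redundant bits protect 4 data bits, whereas triplication would need 8 extra bits for the same data.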
fault
Data
Producer A
Consumer
Error Control
Data
494) Checkpoints Rollbacks
- Checkpoints and Rollbacks
  - Checkpoint
    - A copy of an application's state
    - Saved in storage immune to the failures
  - Rollback
    - Restart the execution from a previously saved checkpoint
  - Recovers from transient and permanent hardware and software failures
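The checkpoint and rollback cycle can be sketched in a few lines. The class and function names, the use of `RuntimeError` to signal a detected fault, and the retry limit are all assumptions for illustration; a real system would keep the checkpoint in storage that survives the failure.

```python
import copy


class CheckpointedApp:
    """Holds application state plus one saved copy of it."""

    def __init__(self, state):
        self.state = state
        self._saved = copy.deepcopy(state)

    def checkpoint(self):
        """Save a copy of the current state."""
        self._saved = copy.deepcopy(self.state)

    def rollback(self):
        """Discard the current state and restore the last checkpoint."""
        self.state = copy.deepcopy(self._saved)


def run_with_rollback(app, step, max_retries=3):
    """Run `step` on the state; on a fault, roll back and re-execute."""
    app.checkpoint()
    for _ in range(max_retries + 1):
        try:
            step(app.state)
            return True
        except RuntimeError:
            app.rollback()  # undo any partial update before retrying
    return False


attempts = {"n": 0}


def flaky_step(state):
    state["count"] += 1  # partial update happens before the fault
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise RuntimeError("transient fault")


app = CheckpointedApp({"count": 0})
print(run_with_rollback(app, flaky_step), app.state)
```

The unbounded time spent in retries is what makes the completion time hard to guarantee for real-time tasks.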
Data
Producer A
Consumer
Application
State K
Rollback
state (K-1)
state K
fault
Checkpoint
505) Recovery Blocks
- Recovery Blocks
  - Multiple alternates to perform the same functionality
    - One Primary module and Secondary modules
    - Different approaches
  - Select a module with output satisfying the acceptance test
- Recovery Blocks and Rollbacks
  - Restart the execution from a previously saved checkpoint with a secondary module
  - Tolerates software failures
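A recovery block can be sketched as a loop over alternates, each started from the same checkpointed input and accepted only if its output passes the acceptance test. The function names and the sorting example are assumptions chosen for illustration.

```python
import copy


def recovery_block(alternates, acceptance_test, state):
    """Try the primary module, then the secondary modules, in order.

    Each alternate receives a fresh copy of the checkpointed state, so a
    failed attempt cannot corrupt the input of the next one.
    """
    checkpoint = copy.deepcopy(state)
    for alternate in alternates:
        candidate_input = copy.deepcopy(checkpoint)  # rollback to checkpoint
        try:
            result = alternate(candidate_input)
        except Exception:
            continue  # treat a crash like a failed acceptance test
        if acceptance_test(result):
            return result
    raise RuntimeError("all alternates failed the acceptance test")


def buggy_sort(values):
    return sorted(values)[:-1]  # primary module with a bug: loses an element


def simple_sort(values):
    return sorted(values)  # secondary module


data = [3, 1, 2]


def is_sorted_permutation(result):
    return len(result) == len(data) and all(a <= b for a, b in zip(result, result[1:]))


print(recovery_block([buggy_sort, simple_sort], is_sorted_permutation, data))
```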
Data
Producer A
Consumer
Application
Block X
Block X2
Block Y
Block Z
Rollback
state (K-1)
state K
fault
Checkpoint
51Soft Errors (Transient Faults)
- SER increases exponentially as technology scales
  - Integration, voltage scaling, altitude, latitude
- Caches are hit hardest due to
  - Larger portion of the processor (more than 50%)
  - No masking effects (e.g., logical masking)
Intel Itanium II Processor
Baumann, 05
Transistor
5 hours MTTF
0
1
1 month MTTF
Bit Flip
- MTTF: Mean Time To Failure
52Related Work
- Process Technology Solutions
  - Hardening [Baze, IEEE Trans. on Nuclear Science 00]
  - SOI [O. Musseau, IEEE Trans. on Nuclear Science 96]
  - Process complexity, yield loss, and substrate cost
- Microarchitectural Solutions for Caches
  - Cache Scrubbing [Mukherjee, PRDC04]
  - Low Power Cache [Li, ISLPED04]
  - Area Efficient Protection [Kim, DATE06]
  - Multiple Bit Correction [Neuberger, TODAES 03]
  - Cache Size Selection [Cai, ASP-DAC06]
  - In-Cache Replication [Zhang, DSN03]
  - Replication Cache [Zhang, IEEE Computers 05]
  - High overheads in terms of power, performance, and area
- Our Solution
  - Protects caches from failures due to soft errors by exploiting the error-tolerance of applications
  - Protection can be used in conjunction with any of these techniques
53Unequal Data Protection
- Not all pages are equally failure critical
  - Multimedia data is failure non-critical
  - Program variables are failure critical
- Failures: system crash, infinite loop, segmentation faults, etc.
  - QoS degradation is not a failure
Only 9 pages out of 83 are failure critical
54Failure Critical and Failure Non-Critical Data
55Soft Errors on Increase
- Increase exponentially due to technology scaling
- 0.18 µm
- 1,000 FIT per Mbit of SRAM
- 0.13 µm
- 10,000 to 100,000 FIT per Mbit of SRAM
- Voltage Scaling
- Voltage scaling increases SER significantly
SER \propto N_{flux} \times CS \times \exp\left( -\frac{Q_{critical}}{Q_s} \right), where Q_{critical} = C \times V
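To see why lowering the supply voltage raises SER so sharply, the ratio of SER at two voltages can be computed from the exponential term alone, since the flux and cross-section factors cancel. The sketch below assumes the relation SER proportional to exp(-C x V / Qs); the parameter values are made up purely for illustration and are not device data.

```python
import math


def relative_ser(v_new: float, v_ref: float, capacitance: float, q_s: float) -> float:
    """SER(v_new) / SER(v_ref), assuming SER is proportional to exp(-C*V/Qs).

    N_flux and CS are taken to be unchanged, so they cancel in the ratio:
    exp(-C*v_new/Qs) / exp(-C*v_ref/Qs) = exp(C*(v_ref - v_new)/Qs).
    """
    return math.exp(capacitance * (v_ref - v_new) / q_s)


# Hypothetical values: C and Qs are placeholders, not measured parameters.
C = 10.0
QS = 1.0
for v in (1.0, 0.9, 0.8, 0.7):
    print(f"V = {v:.1f}: SER x {relative_ser(v, 1.0, C, QS):.1f}")
```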
56Experimental Setup for Page Failure Rates
57Experimental Framework
58Experimental Results Failure Rate
- Failure rate of PPC is close to that of Safe
(Safe is a protected cache configuration with an
ECC protection, i.e., protecting all data, and
Unsafe is an unprotected cache)
59Experimental Results Performance
- Runtime of PPC is close to that of Unsafe
60Experimental Results Power
- Energy consumption of PPC is close to that of
Unsafe
61Experimental Setup for DPExplore
62DPExplore Results
63Video Encoding
64Error-Resilient Video Encoding
Parameters
Resilience
PLR
- Error-resilient video encodings have been developed to combat errors in networks
- PBPAIR: energy-efficient and error-resilient video encoding [Kim, 06]
- Passive Error Exploitation
  - It compresses video data according to PLR
Mobile Video Application
Embed Error-Resilience against packet losses
Maintain the QoS
Packet Loss
- PBPAIR: Probability-Based Power Aware Intra Refresh
65Related Work
- Energy/QoS-aware video encoding
  - Video encoding parameters [Mohapatra, IPDPS05]
  - Motion estimation algorithm [Tourapis, VCIP00]
  - Integrated power management [Mohapatra, ACM MM03]
  - Global cross-layer adaptation [Yuan, MMCN04]
  - Transmission power and QoS [Eisenberg, IEEE Trans. on CSVT 02]
  - These do not consider error-resilience
- Error-resilient video encoding
  - Error-resilient GOP [Yang, JVCIP07]
  - AIR (Adaptive Intra Refreshing) [Worrall, ICASSP01]
  - PGOP (Progressive GOP) [Cheng, PCS04]
  - PBPAIR (Probability-Based Power Aware Intra Refresh) [Kim, MCCR06]
  - Passive error exploitation
- Our Solution
  - Error-aware video encoding exploits errors actively to minimize energy consumption
66EE-PBPAIR
67Experimental Setup
68Experimental Results Energy Reduction
- Energy saving occurs at every component in a path
from encoding to decoding in mobile video
applications
- EC: Energy Consumption
- Enc EC: EC for Encoding, Tx EC: EC for Transmission
- Dec EC: EC for Decoding, Rx EC: EC for Receiving
- PSNR: Peak Signal to Noise Ratio
69Experimental Results Expanded Tradeoff Space
70Experimental Energy Saving
- Source EC = Enc EC + Tx EC
- Destination EC = Rx EC + Dec EC
71Experimental Results Adaptive EIR
- Feedback-based approach (Adaptive EE-PBPAIR)
maintains the required video quality compared to
Static EE-PBPAIR
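A feedback-based EIR adjustment could look like the following step controller: lower the EIR when the measured quality falls below the requirement, raise it otherwise. The function name, step size, EIR ceiling, and the use of PSNR as the feedback signal are assumptions for illustration; the slide does not specify the adaptation rule used by Adaptive EE-PBPAIR.

```python
def adapt_eir(eir, measured_psnr, required_psnr, step=0.05, max_eir=0.5):
    """Return the EIR to use for the next interval.

    Quality below the requirement: inject fewer errors (drop fewer frames).
    Quality at or above the requirement: inject more to save more energy.
    The result is clamped to [0, max_eir].
    """
    if measured_psnr < required_psnr:
        eir -= step
    else:
        eir += step
    return min(max(eir, 0.0), max_eir)


# Hypothetical quality feedback reported by the decoding side (PSNR in dB).
eir = 0.20
for psnr in (33.0, 31.5, 29.0, 30.5):
    eir = adapt_eir(eir, measured_psnr=psnr, required_psnr=30.0)
    print(f"feedback {psnr:.1f} dB -> next EIR {eir:.2f}")
```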
72Adaptive EIR
73Conclusion
- Studied two main cross-layer approaches
- PPC
- EAVE
- Demonstrated the effectiveness of our cooperative
cross-layer approaches by exploiting error
tolerance and error control schemes
Tolerance
Resilience
FLR
EIR
feedback
PLR
Unequal Protection
74Failure Rate
75Video Quality
76Memory Access Time (performance)
77Future Direction
- Cooperative approaches combining PPC and EAVE
  - A middleware-driven cross-layer approach manages error control schemes
  - Translate errors to exploit existing approaches at other abstraction layers
- PPC
  - Apply our approach to other components
  - Instruction caches and logic
- EAVE
  - Intelligent frame dropping techniques
  - To maximize the energy saving while minimizing the quality degradation
Tolerance
Resilience
FLR
EIR
feedback
PLR
SER
Unequal Protection
78Thesis Outline
- Thesis proposes a cross-layer method
  - Exploit errors and error control schemes across layers to maximize reliability with minimal costs for mobile embedded systems
- Topic 1: Approach at hardware and application layers
  - PPC (unequal data protection at hardware, exploiting error tolerance at application) [Lee, CASES06], [Lee, DIPES08], [Lee, TVLSI08]
- Topic 2: Approach at application, middleware, and network layers
  - EAVE (intentional exploitation of errors at application, incorporating error resilience in networks) [Lee, DIPES08]
- Topic 3: Approach across application/middleware-OS/HW
  - CC-PROTECT (middleware-driven cooperative exploitation of errors and error control schemes across layers) [Lee, ACM MM 08]
79References (cross-layers and tools)
- [Bajic, 07] I. V. Bajic. Efficient cross-layer error control for wireless video multicast. 53(1):276-285, Mar 2007.
- [Dynamo] DYNAMO. Power Aware Middleware for Distributed Mobile Computing. University of California at Irvine, http://dynamo.ics.uci.edu/.
- [Forge] FORGE Project. A Framework for Optimization of Distributed Embedded Systems Software. University of California at Irvine, http://www.ics.uci.edu/forge/.
- [Grace] GRACE Project. Global Resource Adaptation through CoopEration. University of Illinois at Urbana-Champaign, http://rsim.cs.uiuc.edu/grace/.
- [Kim, 08] M. Kim, N. Dutt, N. Venkatasubramanian, and C. Talcott. xTune: Online verifiable cross-layer adaptation for distributed real-time embedded systems. ACM SIGBED Review, Special Issue on the RTSS Forum on Deeply Embedded Real-Time Computing, 5(1), Jan 2008.
- [Mohapatra, 03] S. Mohapatra, R. Cornea, N. Dutt, A. Nicolau, and N. Venkatasubramanian. Integrated power management for video streaming to mobile handheld devices. In ACM international conference on Multimedia, 2003.
- [Mohapatra, 05] S. Mohapatra, R. Cornea, H. Oh, K. Lee, M. Kim, N. Dutt, R. Gupta, A. Nicolau, S. Shukla, and N. Venkatasubramanian. A cross-layer approach for power-performance optimization in distributed mobile systems. In Next Generation Software Program in conjunction with IPDPS, page 218.1, April 2005.
- [Shivakumar, 01] P. Shivakumar and N. Jouppi. CACTI 3.0: An Integrated Cache Timing, Power, and Area Model. In WRL Technical Report 2001/2, 2001.
- [Synopsys] Synopsys Inc., Mountain View, CA, USA. Design Compiler Reference Manual, 2001.
- [Schaar, 07] M. van der Schaar and D. S. Turaga. Cross-layer packetization and retransmission strategies for delay-sensitive wireless multimedia transmission. IEEE Transactions on Multimedia, 9(1):185-197, Jan. 2007.
- [Vuran, 06] M. C. Vuran and I. F. Akyildiz. Cross-layer analysis of error control in wireless sensor networks. In IEEE Communications Society on Sensor and Ad Hoc Communications and Networks (SECON), pages 585-594, Sep 2006.
- [Yuan, 03] W. Yuan and K. Nahrstedt. Energy-efficient soft real-time CPU scheduling for mobile multimedia systems. 37(5):149-163, Dec 2003.
- [Yuan, 04] W. Yuan and K. Nahrstedt. Practical voltage scaling for mobile multimedia devices. In ACM international conference on Multimedia, pages 924-931, 2004.
80References (soft errors and reliability)
- [Baumann, 05] R. Baumann. Soft errors in advanced computer systems. IEEE Design and Test of Computers, pages 258-266, 2005.
- [Hazucha, 00] P. Hazucha and C. Svensson. Impact of CMOS technology scaling on the atmospheric neutron soft error rate. IEEE Trans. on Nuclear Science, 47(6):2586-2594, 2000.
- [Li, 05] J.-F. Li and Y.-J. Huang. An error detection and correction scheme for RAMs with partial-write function. In IEEE International Workshop on Memory Technology, Design and Testing (MTDT), pages 115-120, 2005.
- [Li, 04] L. Li, V. Degalahal, N. Vijaykrishnan, M. Kandemir, and M. J. Irwin. Soft error and energy consumption interactions: A data cache perspective. In ISLPED, Aug 2004.
- [Mastipuram, 04] R. Mastipuram and E. C. Wee. Soft Errors' Impact on System Reliability. http://www.edn.com/article/CA454636, Sep 2004.
- [Phelan, 03] R. Phelan. Addressing soft errors in ARM core-based designs. Technical report, ARM, 2003.
- [Pradhan, 96] D. K. Pradhan. Fault-Tolerant Computer System Design. Prentice Hall, 1996. ISBN 0-1305-7887-8.
- [Shrivastava, 05] A. Shrivastava, I. Issenin, and N. Dutt. Compilation techniques for energy reduction in horizontally partitioned cache architectures. In CASES, pages 90-96, 2005.
- [Wrobel, 01] F. Wrobel, J. M. Palau, M. C. Calvet, O. Bersillon, and H. Duarte. Simulation of nucleon-induced nuclear reactions in a simplified SRAM structure: Scaling effects on SEU and MBU cross sections. IEEE Trans. on Nuclear Science, 48(6), 2001.
- [Xu, 96] J. Xu and B. Randell. Roll-forward error recovery in embedded real-time systems. In ICPADS, page 414, 1996.
- [Nieuwland, 06] A. K. Nieuwland, S. Jasarevic, and G. Jerin. Combinational Logic Soft Error Analysis and Protection. In IOLTS06, 2006.
81References (error-resilient encoding, etc.)
- [Cheng, 04] L. Cheng and M. E. Zarki. PGOP: An error resilient technique for low bit rate and low latency video communications. In Picture Coding Symposium (PCS), Dec 2004.
- [Kim, 06] M. Kim, H. Oh, N. Dutt, A. Nicolau, and N. Venkatasubramanian. PBPAIR: An energy-efficient error-resilient encoding using probability based power aware intra refresh. ACM SIGMOBILE Mobile Computing and Communications Review, 10(3):58-69, July 2006.
- [Wang, 98] Y. Wang and Q.-F. Zhu. Error control and concealment for video communication: A review. 86(5):974-997, May 1998.
- [Worrall, 01] S. Worrall, A. Sadka, P. Sweeney, and A. Kondoz. Motion adaptive error resilient encoding for MPEG-4. In ICASSP, May 2001.