Cooperative cross-layer protection for resource constrained Mobile Multimedia systems

Slides: 82
Provided by: ics5
Learn more at: https://ics.uci.edu

Transcript and Presenter's Notes



1
Cooperative cross-layer protection for resource
constrained Mobile Multimedia systems
Prof. Nikil Dutt, Prof. Nalini Venkatasubramanian,
Prof. Lichun Bao
  • Kyoungwoo Lee (final defense)

Nov. 26, 2008
2
Contents
  • Thesis Motivation
  • Thesis Proposal Cooperative, Cross-layer
    Methods
  • PPC (Partially Protected Caches)
  • EAVE (Error-Aware Video Encoding)
  • CC-PROTECT (Cooperative, Cross-layer Protection)
  • Thesis Contribution and Future Direction

3
Mobile Multimedia Embedded Systems
Resource-limited mobile devices! Main problem is
to achieve low power with high performance, high
QoS, and high reliability
Map Routing
3D Graphics
Image Browsing
Animation
Mobile TV
Web Browsing
Video Streaming
Satellite TV
Video Conferencing
4
Reliability
  • Reliability is an emerging and critical concern
    in mobile devices
  • New enhanced technology makes devices vulnerable
    to errors due to high complexity and high
    integration
  • Exponential increase of soft error rate as
    technology scales [Baumann, 05]
  • Mobile applications are running close to humans
  • In pervasive computing, failures of healthcare
    mobile devices cause serious results
  • Redundancy techniques incur high overheads of
    power and performance
  • TMR (Triple Modular Redundancy) may exceed 200%
    overheads without optimization [Nieuwland, 06]
  • Challenging to optimize multiple properties
    (e.g., performance, power, QoS, and reliability)
    in mobile embedded systems

5
Soft error is becoming an every-second concern!
  • Soft Error Rate (SER) is measured in FIT (Failures in Time),
    the number of errors in 10^9 hours

Memory / system | SER (FIT) | MTTF | Reason
1 Mbit @ 0.13 µm | 1000 | 104 years |
64 MB @ 0.13 µm | 64 x 8 x 1000 | 81 days | High integration
128 MB @ 65 nm | 2 x 1000 x 64 x 8 x 1000 | 1 hour | Technology scaling and twice the integration
A system @ 65 nm | 2 x 2 x 1000 x 64 x 8 x 1000 | 30 minutes | Memory takes up 50% of soft errors in a system
A system with voltage scaling @ 65 nm | 100 x 2 x 2 x 1000 x 64 x 8 x 1000 | 18 seconds | Exponential relationship between SER and supply voltage
A system with voltage scaling @ flight (35,000 ft) @ 65 nm | 800 x 100 x 2 x 2 x 1000 x 64 x 8 x 1000 | 0.02 seconds | High intensity of neutron flux at flight (high altitude)
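
The MTTF column follows from the FIT column by a single division, since FIT counts failures per 10^9 hours. The sketch below redoes that arithmetic for the scaling factors listed in the table; the function name `mttf_hours` and the row labels are illustrative choices, not part of the presentation.

```python
HOURS_PER_FIT_WINDOW = 1e9  # FIT counts failures per 10^9 device-hours


def mttf_hours(fit: float) -> float:
    """Mean time to failure in hours for a given FIT rate."""
    return HOURS_PER_FIT_WINDOW / fit


# FIT values built from the multiplication factors given in the table.
rows = {
    "1 Mbit @ 0.13 um": 1000,
    "64 MB @ 0.13 um": 64 * 8 * 1000,
    "128 MB @ 65 nm": 2 * 1000 * 64 * 8 * 1000,
    "system @ 65 nm": 2 * 2 * 1000 * 64 * 8 * 1000,
    "system + voltage scaling": 100 * 2 * 2 * 1000 * 64 * 8 * 1000,
    "system + voltage scaling @ 35,000 ft": 800 * 100 * 2 * 2 * 1000 * 64 * 8 * 1000,
}

for name, fit in rows.items():
    hours = mttf_hours(fit)
    print(f"{name}: {fit:.3g} FIT -> {hours:.3g} h = {hours * 3600:.3g} s")
```

Under this formula the 64 MB row gives about 81 days and the voltage-scaled system about 18 seconds, matching the table. The first row comes out near 114 years rather than the 104 years listed, so that entry may rest on a rounding or assumption this sketch does not capture.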
6
Errors and Failures in Mobile Embedded Systems
  • Faults or Errors can cause Failures

Bug
Packet Loss
Exception
Soft Error
7
Errors and Error Control Schemes at Hardware
Failures: Soft Errors, Hard Failures, System Crash
Causes: External Radiations, Thermal Effects, Power Loss, Poor Design, Aging
Metrics: FIT, MTTF, MTBF
Traditional Approaches: Spatial Redundancy (TMR, Duplex, RAID-1, etc.) and Data Redundancy (EDC, ECC, RAID-5, etc.)
  • Hardware failures are increasing as technology
    scales
  • (e.g.) SER increases by up to 1000 times
    [Mastipuram, 04]
  • Redundancy techniques are expensive
  • (e.g.) ECC-based protection in caches can incur
    95% performance penalty [Li, 05]
  • FIT: Failures in Time (per 10^9 hours)
  • MTTF: Mean Time To Failure
  • MTBF: Mean Time Between Failures
  • TMR: Triple Modular Redundancy
  • EDC: Error Detection Codes
  • ECC: Error Correction Codes
  • RAID: Redundant Array of Inexpensive Drives

8
Errors and Error Control Schemes at Software
Failures: Wrong outputs, Infinite loops, Crash
Causes: Incomplete Specification, Poor software design, Bugs, Unhandled Exception
Metrics: Number of Bugs/Klines, QoS, MTTF, MTBF
Traditional Approaches: Spatial Redundancy (N-version Programming, etc.), Temporal Redundancy (Checkpoints and Backward Recovery, etc.)
  • Software errors become dominant as systems
    complexity increases
  • (e.g.) Several bugs per kilo lines
  • Hard to debug, and redundancy techniques are
    expensive
  • (e.g.) Backward recovery with checkpoints is
    inappropriate for real-time applications
  • QoS Quality of Service

9
Errors and Error Control Schemes in Networks
Failures: Data Losses, Deadline Misses, Node (Link) Failure, System Down
Causes: Network Congestion, Noise/Interference, Malicious Attacks
Metrics: Packet Loss Rate, Deadline Miss Rate, SNR, MTTF, MTBF, MTTR
Traditional Approaches: Resource Reservation, Data Redundancy (CRC, etc.), Temporal Redundancy (Retransmission, etc.), Spatial Redundancy (Replicated Nodes, MIMO, etc.)
  • Network is unreliable (especially wireless
    networks)
  • Joint approaches across OSI layers have been
    investigated for minimal costs [Vuran, 06]
    [Schaar, 07]
  • SNR: Signal to Noise Ratio
  • MTTR: Mean Time To Recovery
  • CRC: Cyclic Redundancy Check
  • MIMO: Multiple-In Multiple-Out

10
Conventional Approaches
  • Most redundancy techniques incur overheads in
    terms of performance, power, area, etc.
  • Conventional TMR (Triple Modular Redundancy) can
    incur 200% overheads without optimization.
  • Backward Recovery with Checkpoints cannot
    guarantee the completion time of a task.
  • Recently proposed techniques have focused on the
    cost reduction without losing reliability
  • However, they still incur overheads

11
Thesis Problem Statement
  • Study tradeoffs among system properties
  • (e.g.) Redundancy incurs energy overheads while
    DVS increases SER significantly
  • Examine errors and error control schemes across
    system abstraction layers
  • (e.g.) network errors → error-resilient video
    encoding, soft errors → ECC or EDC, etc.
  • Maximize reliability with minimal costs of power
    and performance for mobile embedded systems
  • Mainly focus on soft error reduction for mobile
    multimedia embedded systems

12
Cross-Layer Methods
  • Cross-layer approaches
  • aim at system-level optimization
  • Integrate and coordinate techniques across system
    layers
  • Classification [Srivastava, 05]
  • Top-down, Bottom-up, or Both directions
  • Top-down: PPC, PDVS (GRACE), etc.
  • Bottom-up: EAVE, etc.
  • Both directions: CC-PROTECT, etc.
  • Coupling or Merging layers
  • Dynamo [Mohapatra], xTune [Kim], etc.

Merging
Bottom-up
Top-down
Coupling
  • PDVS Practical Dynamic Voltage Scaling

13
Cross-Layer Approaches GRACE
  • GRACE project @ UIUC: W. Yuan, Ph.D. thesis in
    04, and A. F. Harris III, Ph.D. thesis in 06
  • QoS/Power tradeoffs
  • Primarily OS adaptation for power management in
    multimedia mobile devices
  • Network adaptation for power management in
    multimedia communications

GRACE, 05
14
Cross-Layer Approaches DYNAMO FORGE
  • DYNAMO middleware for FORGE project @ UCI: S.
    Mohapatra, Ph.D. thesis in 05, and R. Cornea, Ph.D.
    thesis in 07
  • QoS/Performance/Power tradeoffs for mobile
    embedded systems
  • Middleware-driven coordination and proxy-based
    cooperation
  • Content transcoding at the application layer
  • Network traffic shaping at the network layer
  • Backlight (LCD display) setting at the hardware
    layer
  • NIC shutdown, CPU DVS/DFS at the hardware layer

15
Cross-Layer Approaches xTune
  • xTune framework @ UCI and SRI: M. Kim, Ph.D.
    thesis in 08
  • QoS/Power/Timeliness adaptation for distributed
    real-time embedded systems
  • A Formal Methodology for cross-layer tuning and
    verifiable timeliness of Mobile Embedded Systems

16
Thesis Proposed Contribution
  • Thesis proposes a cross-layer design methodology
    for mobile multimedia embedded systems with
    minimal costs
  • Reliability/QoS/Power/Performance system
    optimization for mobile multimedia systems
  • Cooperative, Cross-Layer Protection
  • PPC, EAVE, CC-PROTECT
  • Low-cost reliability

17
Overview of Thesis Proposals
  • PPC (Partially Protected Caches)
  • EAVE (Error-Aware Video Encoding)
  • CC-PROTECT (Cooperative, Cross-layer Protection)

[Figure: overview of the thesis proposals across layers. Application: Multimedia Application, Original Video, Error-Resilient Encoder (e.g., PBPAIR), Error-Aware Video (EAVE), Packet Loss, Frame Drop, QoS. MW/OS: Error-Controller (e.g., frame drop), Monitor and Translate SER, Error Injection Rate, Frame Loss Rate, Correction. Hardware: Error detection, EDC.]
18
Contents
  • Thesis Motivation
  • Thesis Proposal Cooperative, Cross-layer
    Methods
  • PPC (Partially Protected Caches)
  • EAVE
  • CC-PROTECT
  • Thesis Contribution and Future Direction

19
Conventional Protection for Caches
  • Caches are the components hit hardest by soft errors
  • Conventional Protected Caches
  • Unaware of fault tolerance at applications
  • Implement a redundancy technique such as ECC to
    protect all data for every access
  • Overkill for multimedia applications
  • ECC (e.g., a Hamming Code) incurs a high
    performance penalty of up to 95%, power overhead
    of up to 22%, and area cost of up to 25%

Unaware of Application
High Cost
Cache
ECC
20
PPC (Partially Protected Caches)
  • Observation
  • Not all data are equally failure critical
  • Multimedia data vs. control variables
  • Propose PPC architectures to provide an unequal
    protection for mobile multimedia systems
    [Lee, CASES06][Lee, TVLSI08]
  • Unprotected cache and Protected cache at the same
    level of memory hierarchy
  • Protected cache is typically smaller to keep
    power and delay the same as or less than those of
    Unprotected cache

21
PPC for Multimedia Applications
  • Propose a selective data protection Lee,
    CASES06
  • Unequal protection at hardware layer exploiting
    error-tolerance of multimedia data at application
    layer
  • Simple data partitioning for multimedia
    applications
  • Multimedia data is failure non-critical
  • All other data is failure critical

Power/Delay Reduction
Fault Tolerance
22
PPC for General Applications
  • DPExplore [Lee, PPCDIPES08]
  • Explore partitioning space by exploiting
    vulnerability of each data page
  • Vulnerable time
  • Data is vulnerable during the time before it is
    eventually read by the CPU or written back to memory
  • Pages causing high vulnerable time are failure
    critical
  • Vulnerable time closely estimates failure rate
  • Reduce the number of simulations to estimate the
    failure rate
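
One way to picture the vulnerable-time metric is as a sum over access intervals. The sketch below is not DPExplore itself: the trace format, the operation names, and the threshold are assumptions made for illustration, and it treats each page as a single unit for brevity. An interval is counted as vulnerable when it ends in a read or a write-back (a bit flip would be consumed) and ignored when it ends in an overwrite.

```python
from collections import defaultdict


def vulnerable_time(trace):
    """Sum, per page, the intervals that end in a read or a write-back.

    trace: time-sorted iterable of (time, page, op), with op one of
    "fill", "write", "read", "writeback" (hypothetical trace format).
    """
    last_access = {}
    vulnerable = defaultdict(float)
    for time, page, op in trace:
        if page in last_access and op in ("read", "writeback"):
            vulnerable[page] += time - last_access[page]
        if op == "writeback":
            last_access.pop(page, None)  # the page has left the cache
        else:
            last_access[page] = time
    return dict(vulnerable)


def failure_critical_pages(trace, threshold):
    """Pages whose vulnerable time exceeds a (hypothetical) threshold."""
    return {p for p, t in vulnerable_time(trace).items() if t > threshold}


trace = [
    (0, "A", "fill"), (0, "B", "fill"),
    (10, "A", "read"),       # A was exposed from 0 to 10
    (15, "B", "write"),      # B overwritten: 0 to 15 is not counted
    (40, "A", "read"),       # A was exposed from 10 to 40
    (45, "B", "writeback"),  # B was exposed from 15 to 45
]
print(vulnerable_time(trace))
print(failure_critical_pages(trace, threshold=35))
```

Because the metric needs only one pass over an access trace, it can rank pages without running a fault-injection simulation per candidate partition, which is the saving the bullets above describe.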

23
Summary PPC
  • Not all data are equally failure critical
  • Propose a PPC architecture to provide unequal
    protection
  • Support an unequal protection at hardware layer
    by exploiting error-tolerance and vulnerability
    at application
  • Present cost-efficient reliability
  • Related Publications
  • [Lee, CASES06] PPC for multimedia embedded
    systems
  • [Lee, PPCDIPES08] PPC for general applications
  • [Lee, TVLSI08] PPC and design space exploration
  • Under submission
  • [Lee, TODAES??] Partitioning techniques for
    general applications and instruction caches

Application Data Code
Error-tolerance of MM data Vulnerability of Data
Code
Page Partitioning Algorithms
Failure Non-Critical
Failure Critical
FNC FC are mapped into Unprotected Protected
Caches
Unprotected Cache
Protected Cache
PPC
24
Contents
  • Thesis Motivation
  • Thesis Proposal Cooperative, Cross-layer
    Methods
  • PPC
  • EAVE (Error-Aware Video Encoding)
  • CC-PROTECT
  • Thesis Contribution and Future Direction

25
Active Error Exploitation Intentional Frame Drop
  • Intentional Frame Drop (one way to actively
    exploit errors) can result in energy reduction
    for each operation
  • FDT-1 affects the following components with
    respect to power, performance, and QoS in mobile
    video applications

Mobile Video Application
Enc
Tx
Dec
Rx
CPU
WNI
CPU
WNI
FDT-1
FDT-2
FDT-3
Packet Loss
  • FDT Frame Drop Type
  • Enc Encoding, Dec Decoding
  • WNI Wireless Network Interface

26
Error-Aware Video Encoding
  • Propose EE-PBPAIR [Lee, DIPES08]
  • Intentionally drop frames at video encoding
  • Reduce the energy consumption for video encoding
  • Maintain the video quality by exploiting
    error-resilience of PBPAIR

Packet Loss
Intentional frame drop
Error-Aware Video Encoder (EAVE)
Error- Resilient Video
Error- Aware Video
Original Video
Error-Resilient Encoder (e.g., PBPAIR)
Error-Controller (e.g., frame dropping)
EIR
  • EIR Error Injection Rate

27
Summary EAVE
  • Intentional Frame Drop is one way to exploit
    errors actively
  • Propose an error-aware video encoding (EE-PBPAIR)
  • Present a knob (EIR) to adjust the amount of
    errors considering the QoS feedback
  • Maintain the video quality using error-resilience
    of PBPAIR
  • Related Publication
  • Lee, DIPES08 EE-PBPAIR
  • Considering Submission
  • Lee, TECS?? Generalized idea for
    error-resilient video encodings

Error Resilient Video Encoder
Error-Aware Video Data
Application
Error Rate PLR EIR
Error Controller
Network or Decoding Side
EIR
PLR QoS
Middleware
Energy Reduction
CPU, Memory, and WNI
Hardware
  • EIR Error Injection Rate
  • PLR Packet Loss Rate

28
Contents
  • Thesis Motivation
  • Thesis Proposal Cooperative, Cross-layer
    Methods
  • PPC
  • EAVE
  • CC-PROTECT (Cooperative Cross-layer Protection)
  • Thesis Contribution and Future Direction

29
Errors and Error Control Schemes No Coupling
  • Different errors and their protection techniques
    have not been considered jointly
  • No coupling and no cooperation
  • Cooperating control schemes in a cross-layer
    manner can open a new venue

Mobile Video Application
Packet Loss
Soft Error
30
PPC still incurs overheads due to ECC-protection
  • Propose PPC architectures to provide an unequal
    protection for mobile multimedia systems [Lee,
    TVLSI08]
  • Unprotected cache and Protected cache at the same
    level of memory hierarchy
  • PPC still incurs overheads due to highly expensive
    ECC-protection at the protected cache
  • 29% energy reduction compared to the protected
    cache
  • 10% energy overhead compared to the unprotected
    cache

31
PBPAIR is energy-inefficient in error-free network
  • PBPAIR is error-resilient and energy-efficient in
    general
  • PBPAIR may not be energy efficient in case of
    error-free network

PLR
PBPAIR
Intra_Threshold
  • PBPAIR: Probability-Based Power Aware Intra
    Refresh [Kim, 06]

32
Outline of CC-PROTECT
QoS Loss
BER (Backward Error Recovery)
Soft Error
EDC
Error detection
33
Energy Saving
EDC + DFR impact: 36% reduction compared to
HW-PROTECT, 26% reduction compared to BASE
EDC + DFR + PBPAIR (CC-PROTECT) impact: 56%
reduction compared to HW-PROTECT, 49% reduction
compared to BASE
EDC impact: 17% reduction compared to
HW-PROTECT, 4% reduction compared to BASE
  • BASE: Error-prone video encoding + unprotected
    cache
  • HW-PROTECT: Error-prone video encoding + PPC
    with ECC
  • APP-PROTECT: Error-resilient video encoding +
    unprotected cache
  • MULTI-PROTECT: Error-resilient video encoding +
    PPC with ECC
  • CC-PROTECT1: Error-prone video encoding + PPC
    with EDC
  • CC-PROTECT2: Error-prone video encoding + PPC
    with EDC + DFR
  • CC-PROTECT: Error-resilient video encoding + PPC
    with EDC + DFR

34
Summary CC-PROTECT
  • Propose the CC-PROTECT approach, which coordinates
    existing schemes across layers to mitigate the
    impact of soft errors on the failure rate and
    video quality in mobile video encoding systems
  • PPC (Partially Protected Caches) with EDC (Error
    Detection Codes) at hardware layer
  • DFR (Drop and Forward Recovery) at middleware
  • PBPAIR (Probability-Based Power Aware Intra
    Refresh) at application layer
  • Demonstrate the effectiveness of low-cost (about
    50%) reliability (1,000x) at the minimal cost of
    QoS (less than 1%)
  • Related Publication
  • [Lee, ACMMM08] CC-PROTECT
  • Considering Submission
  • [Lee, ACMTOMCCAP??] Tradeoff space exploration
    with CC-PROTECT
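
The division of labor between EDC and DFR can be sketched as a loop that checks each frame's data and, on a detected error, drops the frame and moves on instead of rolling back. This is only an illustration under assumptions: CRC-32 stands in for the hardware EDC, and `encode_with_dfr`, `flaky_cache`, and the frame payloads are hypothetical names and data.

```python
import zlib


def encode_with_dfr(frames, encode, load_from_cache):
    """Encode frames; drop any frame whose data fails its EDC check."""
    encoded, dropped = [], []
    for index, (payload, checksum) in enumerate(frames):
        data = load_from_cache(index, payload)
        if zlib.crc32(data) != checksum:   # EDC detects but cannot correct
            dropped.append(index)          # drop and move forward, no rollback
            continue
        encoded.append(encode(data))
    return encoded, dropped


raw = [b"frame-0", b"frame-1", b"frame-2"]
frames = [(payload, zlib.crc32(payload)) for payload in raw]


def flaky_cache(index, payload):
    """Models a soft error: one bit of frame 1 flips while it is cached."""
    if index == 1:
        return bytes([payload[0] ^ 0x01]) + payload[1:]
    return payload


encoded, dropped = encode_with_dfr(frames, lambda d: d.upper(), flaky_cache)
print("dropped frames:", dropped)
```

Detection alone is cheaper than correction, and the dropped frame looks to the rest of the pipeline like a lost frame, which is the kind of loss an error-resilient encoder such as PBPAIR is already built to absorb.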

PBPAIR - Error Resilience
DFR - Error Correction
ECC
EDC
Unprotected Cache
Protected Cache
35
Contents
  • Thesis Motivation
  • Thesis Proposal Cooperative, Cross-layer
    Methods
  • PPC
  • EAVE
  • CC-PROTECT
  • Thesis Contribution and Future Direction

36
Overall Thesis Contribution
  • Cross-layer methodology to design mobile
    multimedia embedded systems with minimal costs
  1. Effective Cross-layer approaches for reliability
  2. Low-cost reliability
  3. Expanded trade-off space
  4. Extended applicability of existing techniques

Packet Loss
Frame Drop
Soft Error
37
Effectiveness of Thesis Proposals (Energy Saving)
  • PPC: 29% energy reduction, as compared to a
    conventional protected cache with ECC
  • EAVE: 37% energy reduction, as compared to a
    conventional video encoding
  • CC-PROTECT: 56% energy reduction, as compared to a
    conventional composition of protections

38
Publication
  • [Lee, ACMMM08] K. Lee, A. Shrivastava, M. Kim, N.
    Dutt, and N. Venkatasubramanian, "Mitigating the
    impact of hardware defects on multimedia
    applications: A cross-layer approach," In ACM
    International Conference on Multimedia, Oct.
    2008.
  • [Lee, TVLSI08] K. Lee, A. Shrivastava, I.
    Issenin, N. Dutt, and N. Venkatasubramanian,
    "Partially protected caches to reduce failures
    due to soft errors in multimedia applications,"
    In IEEE Transactions on Very Large Scale
    Integration Systems (TVLSI), 2008, to appear.
  • [Lee, DIPES08] K. Lee, M. Kim, N. Dutt, and N.
    Venkatasubramanian, "Error exploiting video
    encoder to extend energy/QoS tradeoffs for mobile
    embedded systems," In 6th IFIP Working Conference
    on Distributed and Parallel Embedded Systems
    (DIPES), Sep. 2008.
  • [Lee, PPCDIPES08] K. Lee, A. Shrivastava, N.
    Dutt, and N. Venkatasubramanian, "Data
    partitioning techniques for partially protected
    caches to reduce soft error induced failures," In
    6th IFIP Working Conference on Distributed and
    Parallel Embedded Systems (DIPES), Sep. 2008.
  • [Lee, CASES06] K. Lee, A. Shrivastava, I.
    Issenin, N. Dutt, and N. Venkatasubramanian,
    "Mitigating soft error failures for multimedia
    applications by selective data protection," In
    Int. Conference on Compilers, Architecture,
    Synthesis for Embedded Systems (CASES), Oct.
    2006.
  • [Lee, ICME05] K. Lee, N. Dutt, and N.
    Venkatasubramanian, "Experimental Study on Energy
    Consumption of Video Encryption for Mobile
    Handheld Devices," In IEEE International
    Conference on Multimedia and Expo (ICME 05),
    Poster Session, July 2005.
  • [Mohapatra, IPDPS05] S. Mohapatra, R. Cornea, H.
    Oh, K. Lee, M. Kim, N. Dutt, R. Gupta, A.
    Nicolau, S. Shukla, and N. Venkatasubramanian, "A
    cross-layer approach for power-performance
    optimization in distributed mobile systems," In
    Next Generation Software Program in conjunction
    with IEEE International Parallel and Distributed
    Processing Symposium (IPDPS), April 2005.

Lee, DIPES08
Lee, TVLSI08 Lee, PPCDIPES08 Lee, CASES06
Lee, ACMMM08 Mohapatra, IPDPS05 Lee, ICME05
39
Future Direction
  • Error Rate Translation/Integration
  • Different types of errors
  • Different components across system layers
  • Cross-layer methods for distributed embedded
    systems (Horizontal Expansion)
  • Network-aware methods
  • Context-aware approaches

Mobile Video Application
Bug
Packet Loss
Exception
Soft Error
40
Thank you! Any Questions or Comments?
41
Backup Slides
42
Why Cross-Layer Approach?
  • Cross-layer interactions and conflicts arise
    between system properties
  • DVS increases SER exponentially
  • Over protection or under protection
  • All ECC for multimedia data is an overkill
  • Cross-layer approaches can maximize the
    reliability with minimal power and performance
    overheads
  • Benefits of Cross-layer approaches
  • Global system view
  • Coordination for intelligent selection
  • Adaptation
  • Cross-layer approaches have been promising to
    save the resources at the cost of QoS Mohapatra,
    05Yuan, 04
  • DVS Dynamic Voltage Scaling
  • SER Soft Error Rate
  • ECC Error Correction Codes
  • QoS Quality of Service

43
Thesis Proposed Contribution CC-PROTECT
  • Cooperative Cross-layer Protection (CC-PROTECT)
    by exploiting error-awareness and error control
    schemes across system abstraction layers
  • Contribution
  • Present cost-efficient reliability methods
    (cooperative cross-layer protection)
  • Open expanded tradeoff spaces and operating
    points
  • Rediscover applicability of existing approaches
    for other purposes

44
Performance vs. Capacity
  • Total energy available from a battery is a design
    issue and is fixed at a design time, along with
    its weight and size
  • Stark contrast between linear growth rate of
    battery capacity and exponential technology
    improvement rate of system components

Udani Sanjay Udani and Jonathan Smith, Power
management in mobile computing
45
Generalized Fault Tolerance Techniques
  • Modular Redundancy
  • N-Version Programming
  • Error-Control Coding
  • Checkpoints and Rollbacks
  • Recovery Blocks

Chetan, SPC04 S. Chetan, A. Ranganathan, and R.
Campbell, Towards Fault Tolerant Pervasive
Computing, in SPC 04 Somani, IEEECom97 A. K.
Somani and N. H. Vaidya, Understanding Fault
Tolerance and Reliability, in IEEE Computer 97
vol. 30 issue 4
46
1) Modular Redundancy
  • Modular Redundancy
  • Multiple identical replicas of hardware modules
  • Voter mechanism
  • Compare outputs and select the correct output
  • Tolerates most hardware faults
  • Effective but expensive
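
The voter mechanism can be shown in a few lines. This is a minimal software sketch of majority voting over replica outputs, with hypothetical function names; real TMR replicates hardware modules, which is where the 200% overhead comes from.

```python
from collections import Counter


def vote(outputs):
    """Return the majority value among replica outputs, or raise if none."""
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise RuntimeError("no majority among replicas")
    return value


def run_tmr(replicas, x):
    """Run every replica on the same input and vote on the results."""
    return vote([replica(x) for replica in replicas])


def good(x):
    return x * x


def faulty(x):
    return x * x + 1   # models a replica hit by a fault


print(run_tmr([good, faulty, good], 7))
```

With three replicas a single faulty module is outvoted; two simultaneous faults that disagree leave no majority, which the sketch reports as an error rather than guessing.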

fault
Data
Consumer
Producer A
voter
Producer B
47
2) N-version Programming
  • N-version Programming
  • Different versions by different teams
  • Different versions may not contain the same bugs
  • Voter mechanism
  • Tolerates some software bugs

Data
Producer A
Consumer
voter
Program i
Program j
fault
Programmer K
Programmer L
48
3) Error-Control Coding
  • Error-Control Coding
  • Replication is effective but expensive
  • Error-Detection Coding and Error-Correction
    Coding
  • (example) Parity Bit, Hamming Code, CRC
  • Much less redundancy than replication
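
To make the detection versus correction distinction concrete, the sketch below pairs an even-parity bit (detection only) with a textbook Hamming(7,4) code (single-bit correction). It is a generic illustration rather than the coding used in any cache discussed here, and the function names are chosen for this example.

```python
def parity_bit(bits):
    """Even-parity bit: detects a single bit flip but cannot locate it."""
    return sum(bits) % 2


def hamming74_encode(data):
    """Encode 4 data bits into 7 bits laid out as p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3   # 1-based index of the flip, 0 if none
    if position:
        c[position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


data = [1, 0, 1, 1]
word = hamming74_encode(data)
word[5] ^= 1                          # inject a single bit flip
print(hamming74_decode(word) == data)
```

Parity adds one bit per word and Hamming(7,4) adds three bits per four, compared with eight extra bits per four for triplication; the price of correction is the extra encode and decode work on every access.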

fault
Data
Producer A
Consumer
Error Control
Data
49
4) Checkpoints Rollbacks
  • Checkpoints and Rollbacks
  • Checkpoint
  • A copy of an application's state
  • Save it in storage immune to the failures
  • Rollback
  • Restart the execution from a previously
    saved checkpoint
  • Recovers from transient and permanent hardware
    and software failures
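
A compact way to see checkpoint and rollback together is a loop that saves state after each successful step and restores it when a step fails. The class and function names below are hypothetical, and the checkpoint lives in ordinary memory here, whereas the slide calls for storage immune to failures.

```python
import copy


class Checkpointed:
    """Keeps a saved copy of application state and can restore it."""

    def __init__(self, state):
        self.state = state
        self._saved = copy.deepcopy(state)

    def checkpoint(self):
        self._saved = copy.deepcopy(self.state)

    def rollback(self):
        self.state = copy.deepcopy(self._saved)


def run_steps(app, steps, max_retries=3):
    """Apply each step; on a failure, roll back and retry that step."""
    for step in steps:
        for _ in range(max_retries):
            try:
                step(app.state)
                app.checkpoint()
                break
            except RuntimeError:
                app.rollback()
        else:
            raise RuntimeError("step kept failing after rollback")
    return app.state


fault_pending = [True]


def add_one(state):
    state["count"] += 1


def flaky_add(state):
    state["count"] += 100          # partial update that a fault interrupts
    if fault_pending[0]:
        fault_pending[0] = False   # transient fault: happens only once
        raise RuntimeError("transient fault")
    state["count"] -= 99           # a clean run has a net effect of +1


app = Checkpointed({"count": 0})
print(run_steps(app, [add_one, flaky_add, add_one]))
```

The retry loop is also why the time to finish is hard to bound: each rollback repeats work, so the completion time depends on how many faults occur.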

Data
Producer A
Consumer
Application
State K
Rollback
state (K-1)
state K
fault
Checkpoint
50
5) Recovery Blocks
  • Recovery Blocks
  • Multiple alternates to perform the same
    functionality
  • One Primary module and Secondary modules
  • Different approaches
  • Select a module with output satisfying acceptance
    test
  • Recovery Blocks and Rollbacks
  • Restart the execution from a previously saved
    checkpoint with secondary module
  • Tolerates software failures
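
The primary and secondary structure with an acceptance test might look like the following. It is a simplified sketch with invented example modules; because the input is never modified, restarting from the saved state is implicit here rather than an explicit rollback.

```python
def recovery_block(alternates, acceptance_test, x):
    """Try the primary, then each secondary, until one passes the test."""
    for module in alternates:
        try:
            result = module(x)   # x is untouched, so every try starts equal
        except Exception:
            continue
        if acceptance_test(x, result):
            return result
    raise RuntimeError("all alternates failed the acceptance test")


def primary_sqrt(x):
    return x / 2                 # deliberately buggy primary module


def secondary_sqrt(x):
    return x ** 0.5              # independent alternate implementation


def accept(x, result):
    return abs(result * result - x) < 1e-6


print(recovery_block([primary_sqrt, secondary_sqrt], accept, 9.0))
```

Unlike N-version programming, the alternates run one after another rather than side by side, so the cost of a secondary is paid only when the primary fails its test.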

Data
Producer A
Consumer
Application
Block X
Block X2
Block Y
Block Z
Rollback
state (K-1)
state K
fault
Checkpoint
51
Soft Errors (Transient Faults)
  • SER increases exponentially as technology scales
  • Integration, voltage scaling, altitude, latitude
  • Caches are hit hardest because of
  • Larger portion in processors (more than 50%)
  • No masking effects (e.g., logical masking)

Intel Itanium II Processor
Baumann, 05
Transistor
5 hours MTTF
0
1
1 month MTTF
Bit Flip
  • MTTF Mean time To Failure

52
Related Work
  • Process Technology Solutions
  • Hardening [Baze, IEEE Trans. on Nuclear Science
    00]
  • SOI [O. Musseau, IEEE Trans. on Nuclear Science
    96]
  • Process complexity, yield loss, and substrate
    cost
  • Microarchitectural Solutions for Caches
  • Cache Scrubbing [Mukherjee, PRDC04]
  • Low Power Cache [Li, ISLPED04]
  • Area Efficient Protection [Kim, DATE06]
  • Multiple Bit Correction [Neuberger, TODAES 03]
  • Cache Size Selection [Cai, ASP-DAC06]
  • In-Cache Replication [Zhang, DSN03]
  • Replication Cache [Zhang, IEEE Computers 05]
  • High overheads in terms of power, performance,
    and area
  • Our Solution
  • Protects caches from failures due to soft errors
    exploiting error-tolerance of applications
  • Protection can be in conjunction with any
    techniques

53
Unequal Data Protection
  • Not all pages are equally failure critical
  • Multimedia data is failure non-critical
  • Program variables are failure critical
  • Failures: system crash, infinite loop,
    segmentation faults, etc.
  • QoS degradation is not a failure

Only 9 pages out of 83 are failure critical
54
Failure Critical and Failure Non-Critical Data
55
Soft Errors on Increase
  • Increase exponentially due to technology scaling
  • 0.18 µm
  • 1,000 FIT per Mbit of SRAM
  • 0.13 µm
  • 10,000 to 100,000 FIT per Mbit of SRAM
  • Voltage Scaling
  • Voltage scaling increases SER significantly

$SER \propto N_{flux} \times CS \times \exp\left(-\frac{Q_{critical}}{Q_s}\right)$, where $Q_{critical} \propto C \times V$
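
The exponential dependence on supply voltage follows directly from the relation SER ∝ N_flux × CS × exp(−Q_critical / Q_s) with Q_critical ∝ C × V. The sketch below evaluates it with placeholder numbers: the proportionality constant is taken as 1 and the values of C and Q_s are arbitrary, so only the shape of the trend is meaningful.

```python
import math


def relative_ser(n_flux, cs, capacitance, voltage, q_s):
    """SER up to a constant factor: N_flux * CS * exp(-(C * V) / Q_s)."""
    q_critical = capacitance * voltage   # proportionality constant assumed 1
    return n_flux * cs * math.exp(-q_critical / q_s)


def ser_ratio(v_low, v_nominal, capacitance, q_s):
    """How many times the SER grows when the supply drops to v_low."""
    return math.exp(capacitance * (v_nominal - v_low) / q_s)


# Placeholder values in arbitrary units, not measured device parameters.
C, Q_S = 1.0, 0.1
for v in (1.0, 0.9, 0.8, 0.7):
    print(f"V = {v:.1f}: SER grows by {ser_ratio(v, 1.0, C, Q_S):.1f}x")
```

Because the voltage sits inside the exponent, equal steps down in voltage multiply the SER by a constant factor each time, which is the conflict between DVS and reliability that the thesis sets out to manage.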
56
Experimental Setup for Page Failure Rates
57
Experimental Framework
58
Experimental Results Failure Rate
  • Failure rate of PPC is close to that of Safe
    (Safe is a protected cache configuration with an
    ECC protection, i.e., protecting all data, and
    Unsafe is an unprotected cache)

59
Experimental Results Performance
  • Runtime of PPC is close to that of Unsafe

60
Experimental Results Power
  • Energy consumption of PPC is close to that of
    Unsafe

61
Experimental Setup for DPExplore
62
DPExplore Results
63
Video Encoding
64
Error-Resilient Video Encoding
Parameters
Resilience
PLR
  • Error-resilient video encodings have been
    developed to combat errors in networks
  • PBPAIR: energy-efficient and error-resilient
    video encoding [Kim, 06]
  • Passive Error Exploitation
  • It compresses video data according to PLR

Mobile Video Application
Embed Error-Resilience against packet losses
Maintain the QoS
Packet Loss
  • PBPAIR: Probability-Based Power Aware Intra
    Refresh

65
Related Work
  • Energy/QoS-aware video encoding
  • Video encoding parameters [Mohapatra, IPDPS05]
  • Motion estimation algorithm [Tourapis, VCIP00]
  • Integrated power management [Mohapatra, ACM MM03]
  • Global cross-layer adaptation [Yuan, MMCN04]
  • Transmission power and QoS [Eisenberg, IEEE
    Trans. on CSVT 02]
  • Do not consider error-resilience
  • Error-resilient video encoding
  • Error-resilient GOP [Yang, JVCIP07]
  • AIR (Adaptive Intra Refreshing) [Worrall,
    ICASSP01]
  • PGOP (Progressive GOP) [Cheng, PCS04]
  • PBPAIR (Probability-Based Power Aware Intra
    Refresh) [Kim, MCCR06]
  • Passive error exploitation
  • Our Solution
  • Error-aware video encoding exploits errors
    actively to minimize energy consumption

66
EE-PBPAIR
67
Experimental Setup
68
Experimental Results Energy Reduction
  • Energy saving occurs at every component in a path
    from encoding to decoding in mobile video
    applications

  • EC: Energy Consumption
  • Enc EC: EC for Encoding
  • Tx EC: EC for Transmission
  • Dec EC: EC for Decoding
  • Rx EC: EC for Receiving
  • PLR = 10% and EIR = 10%
  • PSNR: Peak Signal to Noise Ratio

69
Experimental Results Expanded Tradeoff Space
70
Experimental Energy Saving
  • Source EC = Enc EC + Tx EC
  • Destination EC = Rx EC + Dec EC

71
Experimental Results Adaptive EIR
  • Feedback-based approach (Adaptive EE-PBPAIR)
    maintains the required video quality compared to
    Static EE-PBPAIR
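
A feedback loop of this kind can be as simple as nudging the EIR down when measured quality falls below the requirement and up when there is slack. The transcript does not give the control law of Adaptive EE-PBPAIR, so the step size, the EIR cap, the use of PSNR as the feedback signal, and the sample readings below are all assumptions.

```python
def adapt_eir(eir_percent, measured_psnr, target_psnr, step=1, eir_max=50):
    """One feedback step: back off when quality is low, push when it has slack."""
    if measured_psnr < target_psnr:
        eir_percent -= step
    else:
        eir_percent += step
    return min(max(eir_percent, 0), eir_max)


eir = 10
for psnr in (33.0, 32.5, 31.0, 30.5, 32.2):   # hypothetical PSNR feedback in dB
    eir = adapt_eir(eir, psnr, target_psnr=32.0)
    print(f"PSNR {psnr:.1f} dB -> EIR {eir}%")
```

A static EIR keeps dropping frames at the same rate even when the network or the content makes quality dip; the adaptive version gives back some of the energy saving in exchange for holding the quality floor.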

72
Adaptive EIR
73
Conclusion
  • Studied two main cross-layer approaches
  • PPC
  • EAVE
  • Demonstrated the effectiveness of our cooperative
    cross-layer approaches by exploiting error
    tolerance and error control schemes

Tolerance
Resilience
FLR
EIR
feedback
PLR
Unequal Protection
74
Failure Rate
75
Video Quality
76
Memory Access Time (performance)
77
Future Direction
  • Cooperative approaches combining PPC and EAVE
  • Middleware-driven cross-layer approach manages
    error control schemes
  • Translate errors to exploit existing approaches
    at other abstraction layers
  • PPC
  • Apply our approach for other components
  • Instruction caches and logics
  • EAVE
  • Intelligent frame dropping techniques
  • To maximize the energy saving while minimizing
    the quality degradation

Tolerance
Resilience
FLR
EIR
feedback
PLR
SER
Unequal Protection
78
Thesis Outline
  • Thesis proposes a cross-layer method
  • Exploit errors and error control schemes across
    layers to maximize reliability with minimal costs
    for mobile embedded systems
  • Topic 1: Approach at hardware and application
    layers
  • PPC (unequal data protection at hardware
    exploiting error tolerance at application)
    [Lee, CASES06][Lee, DIPES08][Lee, TVLSI08]
  • Topic 2: Approach at application, middleware,
    and network layers
  • EAVE (intentional exploitation of errors at
    application, incorporating error resilience in
    networks) [Lee, DIPES08]
  • Topic 3: Approach across
    application/middleware-OS/HW
  • CC-PROTECT (middleware-driven cooperative
    exploitation of errors and error control schemes
    across layers) [Lee, ACM MM 08]

79
References (cross-layers and tools)
  • [Bajic, 07] I. V. Bajic. Efficient cross-layer
    error control for wireless video multicast.
    53(1):276-285, Mar 2007.
  • [Dynamo] DYNAMO. Power Aware Middleware for
    Distributed Mobile Computing. University of
    California at Irvine, http://dynamo.ics.uci.edu/.
  • [Forge] FORGE Project. A Framework for
    Optimization of Distributed Embedded Systems
    Software. University of California at Irvine,
    http://www.ics.uci.edu/forge/.
  • [Grace] GRACE Project. Global Resource Adaptation
    through CoopEration. University of Illinois at
    Urbana-Champaign, http://rsim.cs.uiuc.edu/grace/.
  • [Kim, 08] M. Kim, N. Dutt, N. Venkatasubramanian,
    and C. Talcott. xTune: Online verifiable
    cross-layer adaptation for distributed real-time
    embedded systems. ACM SIGBED Review, Special
    Issue on the RTSS Forum on Deeply Embedded
    Real-Time Computing, 5(1), Jan 2008.
  • [Mohapatra, 03] S. Mohapatra, R. Cornea, N. Dutt,
    A. Nicolau, and N. Venkatasubramanian. Integrated
    power management for video streaming to mobile
    handheld devices. In ACM international conference
    on Multimedia, 2003.
  • [Mohapatra, 05] S. Mohapatra, R. Cornea, H. Oh,
    K. Lee, M. Kim, N. Dutt, R. Gupta, A. Nicolau, S.
    Shukla, and N. Venkatasubramanian. A cross-layer
    approach for power-performance optimization in
    distributed mobile systems. In Next Generation
    Software Program in conjunction with IPDPS, page
    218.1, April 2005.
  • [Shivakumar, 01] P. Shivakumar and N. Jouppi.
    CACTI 3.0: An Integrated Cache Timing, Power, and
    Area Model. In WRL Technical Report 2001/2, 2001.
  • [Synopsys] Synopsys Inc., Mountain View, CA, USA.
    Design Compiler Reference Manual, 2001.
  • [Schaar, 07] M. van der Schaar and D. S. Turaga.
    Cross-layer packetization and retransmission
    strategies for delay-sensitive wireless
    multimedia transmission. IEEE Transactions on
    Multimedia, 9(1):185-197, Jan. 2007.
  • [Vuran, 06] M. C. Vuran and I. F. Akyildiz.
    Cross-layer analysis of error control in wireless
    sensor networks. In IEEE Communications Society
    on Sensor and Ad Hoc Communications and Networks
    (SECON), pages 585-594, Sep 2006.
  • [Yuan, 03] W. Yuan and K. Nahrstedt.
    Energy-efficient soft real-time CPU scheduling
    for mobile multimedia systems. 37(5):149-163, Dec
    2003.
  • [Yuan, 04] W. Yuan and K. Nahrstedt. Practical
    voltage scaling for mobile multimedia devices. In
    ACM international conference on Multimedia, pages
    924-931, 2004.

80
References (soft errors and reliability)
  • [Baumann, 05] R. Baumann. Soft errors in advanced
    computer systems. IEEE Design and Test of
    Computers, pages 258-266, 2005.
  • [Hazucha, 00] P. Hazucha and C. Svensson. Impact
    of CMOS technology scaling on the atmospheric
    neutron soft error rate. IEEE Trans. on Nuclear
    Science, 47(6):2586-2594, 2000.
  • [Li, 05] J.-F. Li and Y.-J. Huang. An error
    detection and correction scheme for RAMs with
    partial-write function. In IEEE International
    Workshop on Memory Technology, Design and Testing
    (MTDT), pages 115-120, 2005.
  • [Li, 04] L. Li, V. Degalahal, N. Vijaykrishnan,
    M. Kandemir, and M. J. Irwin. Soft error and
    energy consumption interactions: A data cache
    perspective. In ISLPED, Aug 2004.
  • [Mastipuram, 04] R. Mastipuram and E. C. Wee.
    Soft Errors Impact on System Reliability.
    http://www.edn.com/article/CA454636, Sep 2004.
  • [Phelan, 03] R. Phelan. Addressing soft errors in
    ARM core-based designs. Technical report, ARM,
    2003.
  • [Pradhan, 96] D. K. Pradhan. Fault-Tolerant
    Computer System Design. Prentice Hall, 1996. ISBN
    0-1305-7887-8.
  • [Shrivastava, 05] A. Shrivastava, I. Issenin, and
    N. Dutt. Compilation techniques for energy
    reduction in horizontally partitioned cache
    architectures. In CASES, pages 90-96, 2005.
  • [Wrobel, 01] F. Wrobel, J. M. Palau, M. C.
    Calvet, O. Bersillon, and H. Duarte. Simulation
    of nucleon-induced nuclear reactions in a
    simplified SRAM structure: Scaling effects on SEU
    and MBU cross sections. IEEE Trans. on Nuclear
    Science, 48(6), 2001.
  • [Xu, 96] J. Xu and B. Randell. Roll-forward error
    recovery in embedded real-time systems. In
    ICPADS, page 414, 1996.
  • [Nieuwland, 06] A. K. Nieuwland, S. Jasarevic,
    and G. Jerin. Combinational Logic Soft Error
    Analysis and Protection. In IOLTS06, 2006.

81
References (error-resilient encoding, etc.)
  • [Cheng, 04] L. Cheng and M. E. Zarki. PGOP: An
    error resilient techniques for low bit rate and
    low latency video communications. In Picture
    Coding Symposium (PCS), Dec 2004.
  • [Kim, 06] M. Kim, H. Oh, N. Dutt, A. Nicolau, and
    N. Venkatasubramanian. PBPAIR: An
    energy-efficient error-resilient encoding using
    probability based power aware intra refresh. ACM
    SIGMOBILE Mobile Computing and Communications
    Review, 10(3):58-69, July 2006.
  • [Wang, 98] Y. Wang and Q.-F. Zhu. Error control
    and concealment for video communication: A
    review. 86(5):974-997, May 1998.
  • [Worrall, 01] S. Worrall, A. Sadka, P. Sweeney,
    and A. Kondoz. Motion adaptive error resilient
    encoding for MPEG-4. In ICASSP, May 2001.