Dynamic Processor Sparing

Slides: 7
Provided by: scottb81
Transcript and Presenter's Notes

1
Dynamic Processor Sparing
  • IBM Academy of Research study on Proactive
    Problem Prediction, Avoidance and Diagnosis
  • 04-28-2003

2
Redundancy
  • Redundancy always provides a theoretical
    mechanism for avoiding failures
  • Requires precise fault detection and containment,
    and operation retry capability
  • Non-retry mechanisms possible, but require more
    redundancy
  • In a multi-processor environment, each processor
    is inherently redundant

3
Processor Recovery
  • Several different recovery techniques used
    throughout the industry
  • Most focused on array errors
      • ECC, miss/purge/re-fetch
  • Few focused on logic errors
      • zSeries uses Instruction Checkpoint Retry
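
As an illustration of the ECC item above, here is a minimal sketch of single-error correction using the textbook Hamming(7,4) code. It is not taken from the slides and is not a description of the code any particular processor array uses; the function names `encode` and `correct` are hypothetical.

```python
from typing import List, Tuple


def encode(data: List[int]) -> List[int]:
    """Encode 4 data bits into a 7-bit Hamming(7,4) word.

    Bit positions 1, 2 and 4 hold parity; positions 3, 5, 6 and 7 hold data.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def correct(word: List[int]) -> Tuple[List[int], int]:
    """Return (data bits, position of the corrected bit or 0 if none)."""
    c = list(word)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4  # the syndrome names the failing bit
    if position:
        c[position - 1] ^= 1  # flip the single bad bit back
    return [c[2], c[4], c[5], c[6]], position


stored = encode([1, 0, 1, 1])
stored[4] ^= 1  # simulate a single-bit array error at position 5
data, fixed_at = correct(stored)
```

A single flipped bit is located and repaired without any retry. A code this small cannot reliably handle multi-bit errors, which is one reason arrays may also rely on schemes such as re-fetching the data.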

4
Instruction Checkpoint Retry
  • Processor micro-architectural checkpoint
    maintained on instruction boundaries
  • Protected with robust error detection
  • When error detected, processor micro-architected
    state restored from checkpoint
  • Operation resumed from last known-good checkpoint
  • Checkpoint may be restored on the same processor,
    or an alternate (redundant) processor
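
The checkpoint-and-retry flow above can be sketched as a small software simulation. This is only an illustrative model under stated assumptions, not the hardware mechanism: the `Checkpoint` fields, the `run_with_retry` function, and the `transient_faults` map (instruction index to number of detected errors) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Checkpoint:
    """Hypothetical known-good state captured on an instruction boundary."""
    pc: int   # index of the next instruction to execute
    acc: int  # a single accumulator stands in for all processor state


def run_with_retry(
    program: List[Callable[[int], int]],
    transient_faults: Dict[int, int],
    max_retries: int = 3,
) -> Tuple[int, int]:
    """Run `program`, retrying from the checkpoint when an error is detected.

    Returns (final accumulator value, total number of retries).
    """
    checkpoint = Checkpoint(pc=0, acc=0)
    remaining = dict(transient_faults)
    total_retries = 0
    attempts = 0
    while checkpoint.pc < len(program):
        # Every attempt starts from the last known-good checkpoint.
        pc, acc = checkpoint.pc, checkpoint.acc
        if remaining.get(pc, 0) > 0:
            # Simulated error detection: discard the attempt and retry.
            remaining[pc] -= 1
            attempts += 1
            total_retries += 1
            if attempts > max_retries:
                raise RuntimeError(f"instruction {pc} failed after {max_retries} retries")
            continue
        # The instruction completed cleanly: advance the checkpoint.
        checkpoint = Checkpoint(pc=pc + 1, acc=program[pc](acc))
        attempts = 0
    return checkpoint.acc, total_retries


PROGRAM = [lambda a: a + 5, lambda a: a * 3, lambda a: a - 4]
result, retries = run_with_retry(PROGRAM, {1: 2})
```

Because the checkpoint is a self-contained value, the same object could in principle be handed to a different executor, which mirrors the last bullet: restoring on the same processor or on an alternate one.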

5
Dynamic Processor Sparing
  • Dynamically move workload from a defective
    processor to a healthy, redundant processor
  • Wide range of implementations
      • Defective processor still able to make forward
        progress
          • Gracefully de-allocate workload from defective
            processor and dispatch to alternate processor
      • Defective processor unable to make forward
        progress
          • Defective processor abruptly shut down
          • Checkpoint extracted from defective processor and
            transplanted into alternate processor
  • Redundant processor could be active or dormant
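
The two cases above (graceful de-allocation versus abrupt shut-down with checkpoint transplant) can be sketched as follows. This is a simplified model under stated assumptions: the `Processor` fields and the `spare_out` function are hypothetical, and a task name stands in for a real checkpoint.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Processor:
    name: str
    active: bool
    in_flight: Optional[str] = None  # task currently executing, if any
    queue: List[str] = field(default_factory=list)      # work not yet started
    completed: List[str] = field(default_factory=list)  # work finished here


def spare_out(defective: Processor, spare: Processor, can_make_progress: bool) -> List[str]:
    """Move workload from `defective` to `spare`; return the tasks that moved."""
    if can_make_progress and defective.in_flight is not None:
        # Graceful case: the defective processor finishes its current task,
        # so only not-yet-started work needs to be re-dispatched.
        defective.completed.append(defective.in_flight)
        defective.in_flight = None
    moved: List[str] = []
    if defective.in_flight is not None:
        # Abrupt case: the in-flight task is transplanted so the spare
        # can resume it from its last checkpoint.
        moved.append(defective.in_flight)
        defective.in_flight = None
    moved.extend(defective.queue)
    defective.queue.clear()
    spare.queue.extend(moved)  # the spare may already hold work if it was active
    defective.active = False
    spare.active = True        # a dormant spare is brought online
    return moved


bad = Processor("P0", active=True, in_flight="t1", queue=["t2", "t3"])
dormant = Processor("P7", active=False)
moved = spare_out(bad, dormant, can_make_progress=False)
```

Appending moved work behind the spare's existing queue is an arbitrary choice for this sketch; a real dispatcher could prioritize the transplanted task.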

6
Proactive Implementation of Dynamic Processor Sparing
  • Implement thresholds for recoverable errors
      • Take action to remove defective processor from
        configuration
  • Guarantee spare (dormant) processors in every
    machine configuration
      • Use of spare processors does not reduce overall
        capacity
      • Potentially transparent to operating system
      • No impact to customer, no parts need to be
        replaced
      • Align number of available spares with concurrent
        repair capabilities
  • Still support sparing to active processors
      • Effective for continuous reliable operation
      • More difficult to hide from operating system
      • Usually does not avoid the need to replace parts
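
One way to picture a threshold for recoverable errors is a sliding-window counter per processor. The sketch below is an assumption for illustration only: the slides do not specify how thresholds are computed, and the class name, the `max_errors` and `window_seconds` parameters, and the numbers used are hypothetical.

```python
from collections import deque
from typing import Deque


class RecoverableErrorThreshold:
    """Flag a processor once it recovers from too many errors in a time window."""

    def __init__(self, max_errors: int, window_seconds: float) -> None:
        self.max_errors = max_errors
        self.window_seconds = window_seconds
        self._timestamps: Deque[float] = deque()

    def record(self, now: float) -> bool:
        """Record one recovered error at time `now`.

        Returns True when the processor should be removed from the
        configuration and its work moved to a spare.
        """
        self._timestamps.append(now)
        # Forget errors that have aged out of the window.
        while self._timestamps and now - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps) > self.max_errors


monitor = RecoverableErrorThreshold(max_errors=2, window_seconds=10.0)
decisions = [monitor.record(t) for t in (0.0, 4.0, 20.0, 22.0, 25.0)]
```

Isolated errors are tolerated, while a burst within the window triggers sparing before the processor fails outright, which is the proactive part of the approach.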