Review last week - PowerPoint PPT Presentation

About This Presentation

Title:

Review last week

Slides: 41

Provided by: danielort

Transcript and Presenter's Notes

1
Review last week

The software problem
Robust SW coding techniques
Regression testing
Reliability models for software
Redundancy in Software
Reliability of N-versioning
Software rejuvenation

2
Today

Reliability of networks
Hardware related FTC techniques
Watchdog Techniques
Redundancy in time (Re-execution)
RESO
Processes, threads
Superscalar, CMP, SMT
Research on FT microarchitectures
AR-stream, DIVA
Other error detection mechanisms in HW
BIST

3
Reliability of Networks

Based on graph theory: nodes represent computers, branches represent communication links
The simplest model assumes that nodes do not fail but links do
Link failures may be due to traffic congestion or physical failures
A path is a collection of branches that provides communication between a specific pair of nodes
In general we are interested in knowing
R_all = P(all nodes are connected)
R_st = P(nodes s and t are connected)
R_k = P(k nodes are connected)

4
Reliability of Networks
Simple state space enumeration
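[Figure: example graph with nodes a, b, c, d and links 1 to 6]
Represent all possible ways to go from node a to node b, considering that links fail
Sum over the states in which a and b stay connected: prob. with no link failure, prob. with 1 link failure, prob. with 2 link failures, ...
All links are assumed equal, with p = prob. of a link being up and q = prob. of a link being down
If p = 0.9 and q = 0.1 then R_ab = 0.997
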
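
The enumeration can be sketched in code. This is a minimal sketch, not the presenter's own calculation: it assumes the example graph is the complete graph on a, b, c, d, with the link numbering in `LINKS` inferred from the cut sets listed on the Cut Sets slide, and it assumes links fail independently. The names `LINKS`, `connected` and `two_terminal_reliability` are illustrative.

```python
from itertools import product

# Assumed topology (inferred, not given explicitly in the transcript):
# link number -> the two nodes it joins.
LINKS = {1: ("a", "b"), 2: ("b", "d"), 3: ("c", "d"),
         4: ("a", "c"), 5: ("a", "d"), 6: ("b", "c")}

def connected(up_links, s, t):
    """Return True if t can be reached from s using only the links that are up."""
    reached, frontier = {s}, [s]
    while frontier:
        node = frontier.pop()
        for link in up_links:
            u, v = LINKS[link]
            for here, there in ((u, v), (v, u)):
                if here == node and there not in reached:
                    reached.add(there)
                    frontier.append(there)
    return t in reached

def two_terminal_reliability(p, s="a", t="b"):
    """Sum the probability of every link state in which s and t are connected."""
    q = 1.0 - p
    total = 0.0
    for state in product((True, False), repeat=len(LINKS)):
        up = [link for link, is_up in zip(LINKS, state) if is_up]
        if connected(up, s, t):
            total += p ** len(up) * q ** (len(LINKS) - len(up))
    return total

r_ab = two_terminal_reliability(0.9)
print(f"R_ab = {r_ab:.6f}")
```

Under these assumptions the result is about 0.9978, in line with the 0.997 quoted on the slide. The loop visits all 2^6 = 64 link states, which is why enumeration does not scale to large networks.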
5
Reliability of Networks
[Figure: example graph with nodes a, b, c, d and links 1 to 6]

To improve network reliability we can increase p or add more branches to the network
There are other, more efficient methods to compute network reliability
Cut sets
Graph reduction

6
Cut Sets
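[Figure: example graph with nodes a, b, c, d and links 1 to 6]
Cut set: a group of links that breaks all paths between s and t when they are removed from the graph (s = a, t = b in the example graph)
C1 = {1, 4, 5}, C2 = {1, 6, 2}, C3 = {1, 5, 6, 3}, C4 = {1, 2, 3, 4}
R = 1 - P(C1 or C2 or C3 or ... or CJ)
R_ab = 1 - P(C1 or C2 or C3 or C4)
R_ab = 1 - P(145 or 162 or 1563 or 1234)
R_ab = 1 - [ P(145) + P(162) + P(1563) + P(1234)
           - P(12456) - P(13456) - P(12345) - P(12356) - P(12346) - P(123456)
           + 4 P(123456) - P(123456) ]
Here P(145) is the probability that links 1, 4 and 5 have all failed; the last line collects the four triple intersections and the quadruple intersection, each of which involves all six links
The expansion repeatedly applies P(A or B) = P(A) + P(B) - P(A and B)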
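
The cut-set expansion can be checked numerically. The following is a minimal sketch that assumes links fail independently with the same probability q, so that the probability of every link in a set having failed is q raised to the size of the set. The function name `reliability_from_cut_sets` is illustrative.

```python
from itertools import combinations

# Cut sets from the slide (s = a, t = b), written as sets of link numbers.
CUT_SETS = [{1, 4, 5}, {1, 6, 2}, {1, 5, 6, 3}, {1, 2, 3, 4}]

def reliability_from_cut_sets(cut_sets, q):
    """Return 1 - P(C1 or C2 or ...), expanded by inclusion-exclusion.

    Assumes independent, identical links: P(all links in a set failed) = q ** len(set).
    """
    p_disconnected = 0.0
    for k in range(1, len(cut_sets) + 1):
        sign = 1 if k % 2 == 1 else -1
        for group in combinations(cut_sets, k):
            failed_links = set().union(*group)  # links that must all be down
            p_disconnected += sign * q ** len(failed_links)
    return 1.0 - p_disconnected

r_ab_cuts = reliability_from_cut_sets(CUT_SETS, q=0.1)
print(f"R_ab = {r_ab_cuts:.6f}")
```

With q = 0.1 this gives about 0.9978, consistent with the R_ab = 0.997 obtained by state enumeration.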
7
Primary Graph Reductions

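Graph reductions facilitate calculation
Series: a sequence of edges that are all required simultaneously; combine with the axiom of probability (independent edges)
P(A and B) = P(A) P(B)
Parallel: the network is operational if any of these edges is operational; combine with the axiom of probability
P(A or B) = P(A) + P(B) - P(A and B)

[Figure: two parallel paths from S to T, one through A and one through B; every link has reliability 0.9]
Serial reduction: each two-link path reduces to a single edge with P = 0.9 x 0.9 = 0.81
Parallel reduction: P(A or B) = 0.81 + 0.81 - (0.81 x 0.81) = 0.9639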
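
The two reductions can be written as small helper functions. This is a sketch assuming independent edges; the names `series` and `parallel` are illustrative, and the example reuses the 0.9 link reliabilities from the slide.

```python
def series(*reliabilities):
    """All edges are required: multiply the individual reliabilities."""
    result = 1.0
    for r in reliabilities:
        result *= r
    return result

def parallel(*reliabilities):
    """Any edge suffices: 1 minus the probability that every edge is down."""
    all_down = 1.0
    for r in reliabilities:
        all_down *= 1.0 - r
    return 1.0 - all_down

# S -> A -> T and S -> B -> T, every link with reliability 0.9.
path_via_a = series(0.9, 0.9)
path_via_b = series(0.9, 0.9)
r_st = parallel(path_via_a, path_via_b)
print(f"path = {path_via_a:.2f}, R_ST = {r_st:.4f}")
```

For two edges, `parallel(x, y)` equals x + y - x*y, which is the P(A or B) formula above.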
8
Watchdog Techniques

Key concept
A process or processor is checked by another unit (normally hardware) that monitors its actions, for example whether the process is still active (alive) and whether it is executing incorrect paths

9
Watchdog Timers

Check for aliveness
The processor resets the timer at certain intervals or on certain conditions
The timer raises an error flag if it is not reset before it overruns
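
A watchdog timer of this kind can be modelled with a simple countdown. The sketch below is a tick-based simulation, not real hardware: the class `WatchdogTimer`, its `kick` and `tick` methods, and the 3-tick timeout are all assumptions chosen for illustration.

```python
class WatchdogTimer:
    """Countdown timer that flags an error if it is not reset in time."""

    def __init__(self, timeout_ticks):
        self.timeout_ticks = timeout_ticks
        self.remaining = timeout_ticks
        self.error_flag = False

    def kick(self):
        """Called by the monitored processor to reset the countdown."""
        self.remaining = self.timeout_ticks

    def tick(self):
        """Advance one clock tick; set the error flag when the countdown overruns."""
        if self.remaining > 0:
            self.remaining -= 1
        if self.remaining == 0:
            self.error_flag = True

def simulate(kick_ticks, total_ticks, timeout_ticks=3):
    """Run the timer for total_ticks; the processor kicks it at the given ticks."""
    watchdog = WatchdogTimer(timeout_ticks)
    for t in range(total_ticks):
        if t in kick_ticks:
            watchdog.kick()
        watchdog.tick()
    return watchdog.error_flag

healthy = simulate(kick_ticks={0, 2, 4, 6, 8}, total_ticks=10)  # keeps resetting
hung = simulate(kick_ticks={0, 2}, total_ticks=10)              # stops after tick 2
print(healthy, hung)
```

The healthy run never lets the countdown reach zero, while the run that stops kicking raises the error flag a few ticks later.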

10
Watchdog Timers (contd.)

Check for timeout
A processor sends a message and starts a timer; the second processor must reply before the timer expires (hardware or software implementation)

11
Watchdog Timers (contd.)

Applications
Process control systems (chemical, mechanical and other control systems)
Switching systems: messages sent or received often wait a certain length of time before they are repeated
Networks: email messages often have timeouts associated with them

12
Watchdog Processors

Consider the following simple architecture

The watchdog can
Observe the address bus
Observe the data
Observe instructions
Check the flow of program control
Need to know what kind of errors can occur
13
Watchdog Control flow checking

Some studies have found that 60% of all transient faults could be detected by monitoring control flow
Control flow checking: basic principle
Analyze the program and extract control information
Branch-free intervals
Subroutine calls
Assign signatures to branch-free intervals and provide these signatures to the watchdog processor, which checks them at run time
Signatures can be checksums of instruction opcodes

14
Watchdog Control flow checking (contd.)
Watchdog: receive start, observe instruction flow, calculate signature, check against stored signature
Program: start of branch-free code, end of branch-free code
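
The signature check can be illustrated with a toy model. This is a minimal sketch: the program, its opcode values, and the 8-bit additive checksum are hypothetical choices made for illustration, and `watchdog_check` stands in for the watchdog processor.

```python
def signature(opcodes):
    """Checksum of the opcodes in one branch-free interval (8-bit sum, an assumed choice)."""
    return sum(opcodes) & 0xFF

# Hypothetical program: name of each branch-free interval -> its opcodes.
PROGRAM = {
    "init": [0x10, 0x22, 0x31],
    "loop": [0x40, 0x41, 0x70],
    "exit": [0x05, 0x90],
}

# Reference signatures, computed ahead of time and handed to the watchdog.
STORED_SIGNATURES = {name: signature(ops) for name, ops in PROGRAM.items()}

def watchdog_check(block_name, observed_opcodes):
    """Recompute the signature of the observed instruction flow and compare it."""
    return signature(observed_opcodes) == STORED_SIGNATURES[block_name]

clean_run = watchdog_check("loop", [0x40, 0x41, 0x70])
# A transient fault flips one bit of the second opcode (0x41 becomes 0x43).
faulty_run = watchdog_check("loop", [0x40, 0x43, 0x70])
print(clean_run, faulty_run)
```

A plain sum is easy to follow but weak: it cannot see reordered opcodes or errors that cancel out, so a stronger checksum would be a natural refinement.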
15
Watchdog Mem access

What to do about memory/data errors?
Use ECC
AMD Opteron and Intel Pentium D multicore processors use ECC techniques to protect against transient errors in memory accesses
A few other methods use watchdog techniques
Check for non-existent memory addresses
Check for out-of-range addresses

16
Fault Detection in Complex Processors

The high density and complexity of current processors increase the probability of transient, intermittent and permanent faults
Diverse techniques are used to detect these faults
RESO
Re-execution
BIST

17
Recomputing with Shifted Operands (RESO)

Re-execute the same arithmetic operations, but with the operands shifted
Goal: detect errors in the ALU
Example: shift left by 2
1 0 1 0 1 0 X X
1 0 0 1 0 1 X X
0 0 1 0 1 1 X X
Output bit 0 of the first execution and output bit 2 of the shifted re-execution should be equal; if they differ, an error in the ALU has been detected
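
The idea can be tried out on a simulated adder. This is a minimal sketch under stated assumptions: the fault model (one ALU output bit stuck at 0), the 12-bit datapath width, and the names `make_alu_add` and `reso_add` are illustrative and not taken from the slides. The operands are the two bit patterns from the example.

```python
def make_alu_add(stuck_at_zero_bit=None, width=12):
    """Return an adder; optionally force one output bit to 0 to model a faulty bit slice."""
    mask = (1 << width) - 1

    def alu_add(x, y):
        result = (x + y) & mask
        if stuck_at_zero_bit is not None:
            result &= ~(1 << stuck_at_zero_bit)  # hypothetical permanent fault
        return result

    return alu_add

def reso_add(alu_add, a, b, shift=2):
    """Add once normally and once with shifted operands, then compare."""
    first = alu_add(a, b)
    second = alu_add(a << shift, b << shift) >> shift  # shift the result back
    return first, second, first != second

healthy = reso_add(make_alu_add(), 0b101010, 0b100101)
faulty = reso_add(make_alu_add(stuck_at_zero_bit=2), 0b101010, 0b100101)
print(healthy, faulty)
```

Because the shifted run pushes each result bit through a different bit slice, the same stuck bit corrupts the two runs differently and the comparison exposes it. The comparison only signals a mismatch; it does not say which run was correct.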
18
Re-execution

Replicate the actions of a module, either on the same module (temporal redundancy) or on spare modules (temporal and spatial redundancy)
Good for detecting and/or correcting transient faults
A transient error will only affect one execution
Can implement this policy at many different
levels
ALU
Thread context
Processor
System
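
Re-execution on the same module can be sketched as follows. The example is an assumption-laden toy: the transient fault is modelled as a single bit flip on one chosen call, and the names `make_flaky_square`, `detect` and `vote` are illustrative. Two executions are enough to detect the error; three allow a majority vote to correct it.

```python
from collections import Counter

def make_flaky_square(faulty_call):
    """Return a squaring function whose result is corrupted on exactly one call."""
    calls = {"n": 0}

    def square(x):
        calls["n"] += 1
        result = x * x
        if calls["n"] == faulty_call:
            result ^= 0b100  # transient fault: flip one result bit
        return result

    return square

def reexecute(fn, arg, times):
    """Temporal redundancy: run the same operation several times on the same module."""
    return [fn(arg) for _ in range(times)]

def detect(results):
    """An error is detected if the executions disagree."""
    return len(set(results)) > 1

def vote(results):
    """Return the majority value, or None if no value has a strict majority."""
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(results) // 2 else None

results = reexecute(make_flaky_square(faulty_call=2), 7, times=3)
print(results, detect(results), vote(results))
```

Since the transient fault affects only one of the three executions, the other two outvote it.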

19
Race Conditions

In concurrent applications race conditions may
happen
A race condition is a bug that occurs when the
outcome of a program depends on which of two or
more threads reaches a particular block of code
first. Running the program many times produces
different results, and the result of any given
run cannot be predicted.
Re-execution of the same threads may be used to
detect a race condition.
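
The dependence on thread ordering can be shown without real threads. The sketch below is a deterministic model, assumed for illustration: two threads each perform a read followed by a write to increment a shared counter, and every possible interleaving is replayed. The names `schedules` and `run_schedule` are hypothetical.

```python
from itertools import permutations

def schedules(n_threads=2, steps_per_thread=2):
    """Every ordering of thread steps that keeps each thread's own steps in order."""
    slots = [tid for tid in range(n_threads) for _ in range(steps_per_thread)]
    return sorted(set(permutations(slots)))

def run_schedule(schedule):
    """Each thread does: step 0 read the counter, step 1 write back the value read + 1."""
    counter = 0
    local_copy = {}
    steps_done = {}
    for tid in schedule:
        if steps_done.get(tid, 0) == 0:
            local_copy[tid] = counter      # read
        else:
            counter = local_copy[tid] + 1  # write
        steps_done[tid] = steps_done.get(tid, 0) + 1
    return counter

outcomes = {schedule: run_schedule(schedule) for schedule in schedules()}
distinct_results = sorted(set(outcomes.values()))
print(distinct_results)
```

Only the two schedules in which one thread finishes before the other starts give the intended result of 2; the other four lose an update. Re-executing the same threads under different schedules and comparing results is what exposes the race.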

20
Break
21
Re-execution with Processes

Idea: use redundant processes to detect errors
Problem: on a uniprocessor the copies are serialized, giving a slowdown factor of 2
On a multicore/multiprocessor, we can execute multiple copies of the same process simultaneously on 2 processors and have them periodically compare their results
Almost no slowdown, except for the comparisons
Disadvantage: the other processor is not available for non-redundant work

[Figure: two copies of a process on one CPU with error checking, and two copies on two CPUs with error checking]
22
Current Multi-Core Processors

A multi-core CPU combines independent processors (cores) on a single silicon chip
Intel: distinguishes between logical and physical processors
Logical refers to the Hyper-Threading side, physical means core
An Intel dual-core processor has two physical processors in the same chip package (Paxville)
AMD: uses the concept of logical processor count to refer to multiple cores within the same chip package
Dual-core Opteron and AMD64 (X2) dual-core

23
Shared Memory Multiprocessor Architectures
Athlon 64FX2
Pentium D
24
Past, Present and the Future?
[Figure: three architectures built from processing elements (PE) and memories]
Basic multicore: IBM Power5
Traditional multiprocessor
Integrated multicore: 16-tile MIT Raw
25
Re-execution of microinstructions
Superscalar uniprocessor microarchitecture
[Figure: pipeline stages IF, ID, RD (in order), Dispatch Buffer, EX (out of order) with functional units ALU, MEM1, MEM2, FP1, FP2, FP3 and BR, Reorder Buffer, WB (in order)]
Re-execute instructions on different functional units
Drawback: tests only the functional units, not the whole pipeline

26
Re-execution with Threads

Use redundant threads to detect errors
Many current superscalar microprocessors are multithreaded (Intel Pentium 4, IBM Power5, Compaq's Alpha 21464, Sun's UltraSPARC 3)
Each processor can run multiple processes or
multiple threads of the same process
Can re-execute a program on multiple thread
contexts, just like with multiple processors
Better performance than re-execution with
multiple processors, since the comparison can be
performed on-chip
Lower cost to use an extra thread context rather
than extra processor

27
SMT

Simultaneous multithreading (SMT) is the same concept as Intel's Pentium 4 Hyper-Threading
Main idea of SMT
Improve the efficiency of a superscalar processor by exploiting thread-level parallelism (TLP) and instruction-level parallelism (ILP) at the same time
Threads are generated by a compiler or by the OS (processes)
According to Intel's data, SMT provides a 30% performance improvement at the cost of 5% more chip area

28
SMT - Flow of Instructions
[Figure: instruction flow of Thread 1, Thread 2, Thread 3 and Thread 4 through the SMT pipeline]
29
Re-execution with Simultaneous Multithreading (SMT)

Motivation (Rotenberg 99)
Increasingly high clock rates and chip density may cause transient errors in high-performance microprocessors
High cost of multiprocessors (at that time)
AR-SMT: Active-stream/Redundant-stream Simultaneous Multithreading
Low overhead, broad coverage of transient faults and some permanent faults
In AR-SMT, two explicit copies of the program run concurrently on the same processor resources

30
Re-execution with Simultaneous Multithreading (SMT)

The A-stream is executed on the SMT processor and its results are committed to the delay buffer
The R-stream executes on the same SMT processor, delayed from the A-stream by no more than the size of the delay buffer
R-stream results are compared with the A-stream results in the delay buffer; a fault is detected if the results differ
SMT pipeline
Time-shared: in any given cycle, a pipeline stage is consumed entirely by one thread
Space-shared: every cycle, a fraction of the bandwidth is allocated to each thread
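
The interplay of A-stream, R-stream and delay buffer can be sketched sequentially. This is a simplified model, not the AR-SMT hardware: the two streams are interleaved in one loop rather than run as concurrent threads, the instruction set has only `add` and `mul`, and the fault is a single bit flip injected into one A-stream result. All names are illustrative.

```python
from collections import deque

def execute(instr, fault=False):
    """Execute one (op, x, y) instruction; optionally flip the low result bit."""
    op, x, y = instr
    result = x + y if op == "add" else x * y
    return result ^ 1 if fault else result

def ar_smt(program, delay_buffer_size, a_fault_at=None):
    """Return the indices of instructions whose A- and R-stream results differ."""
    delay_buffer = deque()
    detected = []

    def r_stream_step():
        # R-stream re-executes the oldest buffered instruction and compares.
        index, a_result = delay_buffer.popleft()
        if execute(program[index]) != a_result:
            detected.append(index)

    for index, instr in enumerate(program):
        if len(delay_buffer) == delay_buffer_size:
            r_stream_step()  # A-stream waits until the R-stream frees a slot
        a_result = execute(instr, fault=(index == a_fault_at))
        delay_buffer.append((index, a_result))

    while delay_buffer:  # drain the remaining entries
        r_stream_step()
    return detected

program = [("add", 1, 2), ("mul", 3, 4), ("add", 5, 6), ("mul", 7, 8)]
clean = ar_smt(program, delay_buffer_size=2)
with_fault = ar_smt(program, delay_buffer_size=2, a_fault_at=2)
print(clean, with_fault)
```

The buffer size bounds how far the R-stream can lag, which matches the "no more than the size of the delay buffer" condition above.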

31
DIVA: Dynamic Implementation Verification Architecture

Permits detection of, and recovery from, all functional and electrical faults
Extends the speculative mechanism to fault detection and recovery
Addresses recovery from permanent faults that may be caused by design faults

32
A high level view of processor
DIVA processor
33
DIVA Overview

The processor is divided into a deeply
speculative core and a functionally and
electrically robust DIVA checker
Core has all the stages except the retirement
stage
DIVA checker verifies correctness of the
computations before saving in architected storage
Incorporates a watchdog timer that is used to
restart the core if no forward progress is being
made

34
DIVA - Architecture

Two pipelines
CHKcomp: verifies the integrity of all functional unit computations
CHKcomm: verifies register and memory communication between instructions
CT: commit stage; instructions are committed if both CHKcomp and CHKcomm pass

35
DIVA

EX: the result of the instruction is recomputed
CMP: the recomputed result is compared with the one from the core
RD: reads register/memory values from architected storage
CHK: compares the values read with the input operands from the core
A bypass is provided for the case where the immediately preceding instruction is still being checked and the values being read are currently being written
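
The two checks and the commit decision can be sketched in a few lines. This is a toy model under stated assumptions: registers are a dictionary, the functional unit knows only `add` and `sub`, the tuple layout of `entry` is hypothetical, and on a failed check the sketch commits the checker's own recomputed value, which is one plausible recovery policy rather than a description of the actual DIVA hardware.

```python
def alu(op, x, y):
    """Tiny functional unit: only 'add' and 'sub' are modelled."""
    return x + y if op == "add" else x - y

def diva_check_and_commit(regs, entry):
    """Check one instruction coming from the core, then commit a result.

    entry = (op, dest, src1, src2, used1, used2, core_result), where used1 and
    used2 are the operand values the core actually computed with.
    """
    op, dest, src1, src2, used1, used2, core_result = entry
    # CHKcomm (RD + CHK): operands must match architected storage.
    chk_comm = (regs[src1], regs[src2]) == (used1, used2)
    # CHKcomp (EX + CMP): recompute with the core's operands and compare.
    chk_comp = alu(op, used1, used2) == core_result
    passed = chk_comm and chk_comp
    # CT: commit the core's result if both checks pass; otherwise (assumed
    # recovery policy) commit the value recomputed from architected state.
    regs[dest] = core_result if passed else alu(op, regs[src1], regs[src2])
    return passed

regs = {"r1": 5, "r2": 3, "r3": 0}
first_ok = diva_check_and_commit(regs, ("add", "r3", "r1", "r2", 5, 3, 8))
after_first = regs["r3"]
second_ok = diva_check_and_commit(regs, ("sub", "r3", "r1", "r2", 5, 3, 9))
after_second = regs["r3"]
print(first_ok, after_first, second_ok, after_second)
```

The second instruction arrives with a wrong result from the core (9 instead of 2), so CHKcomp fails and the corrected value is written instead.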

36
Other Error Detection Mechanisms

Testing techniques are used to detect errors in critical components of a processor
BIST (Built-In Self-Test): random test patterns of bits are applied to the circuit under test and the output is checked for errors
ATPG (Automatic Test Pattern Generation)

37
Basic BIST Architecture
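[Figure: basic BIST architecture. The Test Controller takes BIST Start and produces BIST Done; the Test Pattern Generator drives the Circuit Under Test through the Input Isolation circuitry, which also receives the System Inputs; the Circuit Under Test feeds the System Outputs and the Output Response Analyzer (ORA), which produces Pass/Fail]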
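
The three main blocks can be modelled in software. This is a minimal sketch with assumed details: the test pattern generator is a 4-bit LFSR with feedback from bits 3 and 2, the circuit under test is a made-up function of four inputs, the fault is a hypothetical stuck-at-0 on its AND gate, and the response analyzer simply packs the output bits into one integer and compares it with a golden value obtained from the fault-free model.

```python
def lfsr_patterns(seed=0b0001, count=15):
    """Test pattern generator: 4-bit LFSR, feedback = bit 3 XOR bit 2."""
    state = seed
    for _ in range(count):
        yield state
        feedback = ((state >> 3) ^ (state >> 2)) & 1
        state = ((state << 1) | feedback) & 0xF

def good_circuit(pattern):
    """Hypothetical circuit under test: (a AND b) OR (c XOR d)."""
    a, b = (pattern >> 3) & 1, (pattern >> 2) & 1
    c, d = (pattern >> 1) & 1, pattern & 1
    return (a & b) | (c ^ d)

def faulty_circuit(pattern):
    """Same circuit with the AND gate output stuck at 0."""
    c, d = (pattern >> 1) & 1, pattern & 1
    return c ^ d

def response_signature(circuit):
    """Output response analyzer: pack the response bits into one integer."""
    sig = 0
    for pattern in lfsr_patterns():
        sig = (sig << 1) | circuit(pattern)
    return sig

GOLDEN_SIGNATURE = response_signature(good_circuit)  # from the fault-free model

def bist_pass(circuit):
    """Test controller verdict: pass if the signature matches the golden one."""
    return response_signature(circuit) == GOLDEN_SIGNATURE

print(bist_pass(good_circuit), bist_pass(faulty_circuit))
```

With this seed the LFSR steps through all 15 non-zero 4-bit patterns, so every input combination except 0000 is applied. A hardware ORA would normally compress the responses further, at the cost of occasionally masking an error.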
38
Advantages of BIST

Can be used at all levels of testing
System level testing in field
No need for external test machines
Fewer I/O pins needed for testing
Burn-in test made easy
No need for test vector development

39
Disadvantages of BIST

Area overhead, and with it added susceptibility to manufacturing defects
Performance penalties
Extra effort to design and verify proper operation of the BIST logic at the design level
Additional risk in the project

40
Summary