HW/SW Fault Analysis of Multiprocessor Systems

Provided by: jensbraune

1
HW/SW Fault Analysis of Multiprocessor Systems
  • Rainer G. Spallek,
  • Steffen Köhler
  • TU Dresden

2
Outline
  • Software Bugs and Fault Analysis
  • Principles of debugging uniprocessor systems
  • Performance and Efficiency of Embedded
    Microprocessors
  • The evolution of embedded systems and its
    software development requirements
  • Symmetric Multiprocessing Architectures
  • A flexible approach to provide performance
    scalability
  • SMP Operating System and Application Development
  • Partitioning of execution load through
    task/thread creation
  • Software Debugging in SMP Environments
  • Debugging concurrent tasks/threads at different
    abstraction levels

3
Software Bugs and Fault Analysis
4
Software Fault Sources
  • The programmer creates a defect. A defect is a
    piece of code that can cause an infection.
    Because the defect is part of the code, and
    because all code is initially written by a
    programmer, the defect is technically created by
    the programmer.
  • If the programmer creates a defect, does that
    mean the programmer was at fault? Not in every
    case.
  • A program behavior may become classified as a
    failure only when the user sees it for the
    first time.
  • In a modular program, a failure may happen
    because of incompatible interfaces of two
    modules.
  • In a distributed program (e.g. a multiprocessor
    system), a failure may be the result of some
    unpredictable interaction of several components.

5
Software Fault Propagation
  • The defect causes an infection. The program is
    executed, and with it the defect. The defect now
    creates an infection: after execution of the
    defect, the program state differs from what the
    programmer intended.
  • A defect in the code does not necessarily cause
    an infection.
  • The defective code must be executed, and it must
    be executed under such conditions that the
    infection actually occurs.
  • An infection need not, however, propagate
    continuously. It may be overwritten, masked, or
    corrected by some later program action.
  • The infection causes a failure. A failure is an
    externally observable error in the program
    behavior. It is caused by an infection in the
    program state.
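
The defect, infection, failure chain above can be illustrated with a small C sketch. All names here (`max_defective`, `clamped_max`) are hypothetical and only serve the illustration; the off-by-one loop bound is the planted defect.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: intended to return the largest element of a[0..n-1].
 * Defect: the loop stops one element early (i < n - 1 instead of i < n). */
int max_defective(const int *a, size_t n)
{
    assert(n > 0);
    int max = a[0];
    for (size_t i = 1; i < n - 1; i++) {   /* <-- the defect */
        if (a[i] > max)
            max = a[i];
    }
    return max;
}

/* A later program action that can mask an infection:
 * the result is clamped to an upper limit. */
int clamped_max(const int *a, size_t n, int limit)
{
    int m = max_defective(a, n);
    return (m > limit) ? limit : m;
}
```

With the input {3, 9, 4} the defect is executed but causes no infection, because the skipped last element is not the maximum. With {3, 4, 9} the state is infected (4 instead of 9) and the wrong return value is an externally observable failure. Passing the same input through `clamped_max` with a limit of 2 masks the infection: both the intended and the infected value are clamped to 2.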

6
Performance and Efficiency of Embedded Microprocessors
7
Evolution of Embedded Systems
  • Embedded systems in the past:
  • Single processor
  • Simple memory system
  • Small applications
  • Software development was reasonably simple
  • Embedded systems today and in the future:
  • Many processors
  • Multi-stage memory/communication hierarchy
  • Complex parallel and concurrent applications
  • Software development is getting more and more
    complex

8
Microprocessor Design Challenges
9
The Cycles Per Instruction Problem
  • Limited amount of instruction level parallelism
    in programs
  • Super-scalar units not fully utilized
  • High hardware costs for out-of-order issue
    implementation
  • Solution: use of task- and thread-level
    parallelism

10
Multiprocessor Architectures
  • Hardware efficiency is the main argument
  • Potential performance gain scales nearly linearly
    with the number of processor cores
  • Chip area and power consumption scale linearly
    with the number of processor cores
  • One exception: the communication and memory
    hierarchy
  • Utilization of task and thread parallelism is a
    complex task
  • Identify large blocks of data-independent program
    code
  • Partition these blocks in such a way that
    communication between them can be achieved with
    the available resources
  • Map the concurrent program blocks to physical
    processor cores
  • Adapt this mapping in accordance with the
    current load situation

11
Symmetric Multiprocessing Architectures
12
SMP Basics
  • Symmetric multiprocessing (SMP)
  • Symmetry implies that every task can be
    executed on any processor core
  • Trade-off between higher hardware effort for a
    universal communication network (shared memory)
    and a more flexible task and thread partitioning
    scheme
  • All CPUs are equivalent in access and performance
  • Interconnected with a bus or crossbar
  • Simplified programming through shared memory
    model
  • Unified interrupt distribution sub-system
  • Several IP vendors provide SMP-enabled processor
    solutions for embedded SoCs (ARM, PowerPC, etc.)

13
ARM11 MPCore Architecture
14
SMP Programmer's Model
  • The software developer partitions the application
    manually or semi-automatically
  • Concurrent execution of threads in same address
    space
  • Thread synchronization issues are handled within
    the application context through additional
    dedicated OS functions
  • The developer is responsible for synchronization.
    The OS supports this by providing dedicated
    functions (e.g. Linux futex, pthread mutex)
  • Mapping of application threads to physical
    processor cores is handled transparently to the
    user by the OS kernel

15
SMP Software Development
  • User-driven application partitioning is an
    iterative process
  • Sophisticated development tools are required
    (compiler, profiler, trace analysis, etc.)
  • Objective: find a partitioning that maximizes
    the overall system performance

16
Problems Introduced by Concurrency
  • Efficiency problems
  • Inefficient partitioning through lack of thread
    parallelism in the considered application
    (synchronization and communication reduce
    performance benefits)
  • OS kernel overhead through automatic thread
    mapping onto a particular core, thread migration
    to a different core, use of synchronization
    primitives (load balancing inefficiencies)
  • Potential software bugs
  • Deadlock, blocking or 'starving' situations
  • Race conditions: data race, message race,
    relaxed-order memory access
  • Unprotected entries into critical sections
  • Shared use of local variables (re-entrancy)
  • Non-thread-safe libraries

17
SMP Operating System and Application Development
18
SMP Control Inside the Linux Kernel
[Figure: Linux kernel components that contain SMP support functions]
19
Single Processor vs. SMP
  • Objectives
  • Fair load sharing, efficient load distribution
  • System speedup approaching Ncpu (the number of
    CPUs)

20
Task Migration
  • Kernel function load_balance()
  • Called at most every 200 ms
  • Also called on each CPU whenever its run-queue is
    empty

  • Pulls from 'busiest run-queue'
  • Pulls expired tasks first
  • Pulls highest priority first
  • Pulls 'not running' tasks first
  • Repeats the last two steps until the busiest
    run-queue no longer holds more load than this
    CPU's run-queue

[Figure: load_balance() sequence: Lock(rq1, rq2), Pull(task), Unlock(rq1, rq2). CPU 0, CPU 1 and CPU 2 each run a scheduler task and own a run-queue of tasks held in shared memory.]
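
The pull loop described on this slide can be modelled with a toy sketch. This is not the kernel's load_balance() implementation: `toy_load_balance` is a hypothetical, single-threaded model in which each run-queue is reduced to a task count, and the selection order (expired, highest priority, not running) as well as the run-queue locking are only indicated in comments.

```c
#include <assert.h>

/* Toy model: nr_running[cpu] is the number of tasks queued on that CPU.
 * this_cpu pulls tasks from the busiest run-queue until the busiest
 * queue no longer has a surplus. Returns the number of tasks pulled. */
int toy_load_balance(int nr_running[], int ncpu, int this_cpu)
{
    assert(ncpu > 0 && this_cpu >= 0 && this_cpu < ncpu);

    /* Find the busiest run-queue. */
    int busiest = this_cpu;
    for (int cpu = 0; cpu < ncpu; cpu++) {
        if (nr_running[cpu] > nr_running[busiest])
            busiest = cpu;
    }

    int pulled = 0;
    /* A real scheduler would lock both run-queues here: Lock(rq1, rq2). */
    while (nr_running[busiest] - nr_running[this_cpu] > 1) {
        /* Pull(task): a real scheduler would choose which task to move
         * (expired first, highest priority first, not running first). */
        nr_running[busiest]--;
        nr_running[this_cpu]++;
        pulled++;
    }
    /* ... and release them here: Unlock(rq1, rq2). */
    return pulled;
}
```

For run-queue lengths {0, 5, 2} and an idle CPU 0, the sketch pulls two tasks from CPU 1 and leaves the queues at {2, 3, 2}.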
21
Multi-Thread Application Problems
  • Locking
  • Shared and private memory access
  • Order of execution
  • Synchronization / communication overhead

22
Example: Relaxed-Order Memory Access
Parallel region, executed by Thread 0 and Thread 1:

    double G, L;
    #pragma omp parallel shared(G) private(L)
    {
        G = 0.0;
        L = work();
        #pragma omp atomic
        G += L;
    }

[Figure: each thread accesses memory through its own temporary view; in both views a write is stalled.]
  • G is changed by thread 0.
  • The changed G is overwritten by thread 1. The
    results of thread 0 are lost.
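
One reading of this example is that the lost update comes from initializing G inside the parallel region, so a delayed `G = 0.0` from one thread can land after another thread's atomic update. Under that assumption, a hedged sketch of a corrected variant initializes G once before the region and keeps the atomic update. The loop form, `sum_work` and the stand-in `work()` are illustrative choices, not the slide's code.

```c
#include <assert.h>

/* Hypothetical stand-in for the slide's work() function. */
static double work(int i)
{
    return (double)(i + 1);
}

/* Accumulates work(0) + ... + work(n-1) into the shared variable G. */
double sum_work(int n)
{
    double G = 0.0;              /* initialized once, before the region */

    #pragma omp parallel for shared(G)
    for (int i = 0; i < n; i++) {
        double L = work(i);      /* private: declared inside the loop */
        #pragma omp atomic
        G += L;                  /* atomic update of the shared G */
    }
    return G;
}
```

Built with OpenMP enabled (for example `gcc -fopenmp`), the iterations are spread over threads; without it the pragmas are ignored and the loop runs serially with the same result.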
23
Application Thread Partitioning
  • pthread library
  • Explicit creation and termination of threads
  • Explicit, fast synchronization primitives (locks,
    mutexes, conditions)
  • OpenMP (compiler-directed)
  • Explicitly specified parallel regions
  • Implicit creation and termination of threads
  • Explicit synchronization primitives (nested
    locks, memory access barriers, ordered execution,
    critical section, etc.)
  • Extraction of thread parallelism is controlled by
    additional compiler pragma statements
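
To make the pthread side of this comparison concrete, here is a minimal sketch of explicit partitioning: the data is split into chunks by hand, one thread is created per chunk, and the partial results are combined after joining. `parallel_sum`, `struct chunk` and `NUM_THREADS` are hypothetical names chosen for the illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4

/* One work item per thread: a half-open index range and its result. */
struct chunk {
    const int *data;
    size_t begin, end;
    long sum;
};

static void *sum_chunk(void *arg)
{
    struct chunk *c = arg;
    c->sum = 0;
    for (size_t i = c->begin; i < c->end; i++)
        c->sum += c->data[i];
    return NULL;
}

/* Explicit thread creation and termination; no locks are needed
 * because each thread writes only to its own chunk. */
long parallel_sum(const int *data, size_t n)
{
    pthread_t threads[NUM_THREADS];
    struct chunk chunks[NUM_THREADS];
    long total = 0;

    for (int t = 0; t < NUM_THREADS; t++) {
        chunks[t].data  = data;
        chunks[t].begin = n * t / NUM_THREADS;
        chunks[t].end   = n * (t + 1) / NUM_THREADS;
        int rc = pthread_create(&threads[t], NULL, sum_chunk, &chunks[t]);
        assert(rc == 0);
    }
    for (int t = 0; t < NUM_THREADS; t++) {
        pthread_join(threads[t], NULL);
        total += chunks[t].sum;
    }
    return total;
}
```

With OpenMP the same partitioning would typically be expressed by a pragma on the loop, and thread creation and termination would be implicit.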

24
Software Debugging in SMP Environments
25
System Behavior vs. Invasive Debugging
  • Typically stop, evaluate and restart the entire
    target system
  • All cores are synchronously controlled, but may
    operate asynchronously to peripheral devices and
    memory sub-system
  • System state may not be completely restorable
    after restart
  • System observability depends on debug HW/SW
    system capabilities and includes all typical
    cases:
  • Relaxed memory access race conditions
  • Execution-order or timing-dependent bugs
  • Transient system-state-dependent bugs
  • Preemption-based timing conditions
  • Sources of performance losses:
  • Communication conditions
  • Cache performance

26
JTAG/TAP Emulation
  • Entire system state is observable
  • Physical mapping of tasks / threads onto
    particular SMP cores
  • High level of intrusion: the system is completely
    stopped
  • Stopping a single core in a multi-core SMP system
    might impact the system stability
  • Expensive and complex debug access (JTAG
    accelerator)
  • Application state has to be evaluated through a
    complex interpretation of the raw SMP system
    state (core specific MMU tables, kernel thread
    table, etc.)
  • May be required for kernel development, but is
    rather oversized for pure application development

27
MMU Page Table Interpretation
28
Test Access Port IEEE 1149.1
29
MultiCore Debug-Interface IEEE 1500
30
MultiCore Debugger Architecture
31
Detecting Bugs through Trace-HW
  • E.g. deadlock detection at a given check-point:
  • For each thread, build the list of exclusive
    locks that the thread owns or is blocked by
  • Build the dependency graph and find a cycle
  • Required information (for hardware trace):
  • Thread ID (content of Context/Thread ID Register)
  • Enter/leave trigger events of lock functions
    (content of program counter)
  • Addresses of locks (content of first argument
    register or stack)
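
The check-point analysis above can be sketched as a cycle search in a wait-for graph. The snapshot layout below is an assumption made for the illustration: threads and locks are small integer indices, whereas a real trace would deliver thread IDs and lock addresses that first have to be mapped to such indices. Because each thread is blocked by at most one lock and each exclusive lock has at most one owner, every thread has at most one outgoing edge, so following the edges is enough.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_THREADS 8
#define MAX_LOCKS   8
#define NONE        (-1)

/* Hypothetical snapshot of the lock state at a check-point. */
struct lock_snapshot {
    int owner[MAX_LOCKS];         /* thread owning lock l, or NONE */
    int blocked_on[MAX_THREADS];  /* lock thread t is blocked by, or NONE */
};

void snapshot_init(struct lock_snapshot *s)
{
    for (int l = 0; l < MAX_LOCKS; l++)
        s->owner[l] = NONE;
    for (int t = 0; t < MAX_THREADS; t++)
        s->blocked_on[t] = NONE;
}

/* Edge of the dependency graph: the thread that t waits for, or NONE. */
static int waits_for(const struct lock_snapshot *s, int t)
{
    int l = s->blocked_on[t];
    return (l == NONE) ? NONE : s->owner[l];
}

/* Returns true if some thread transitively waits for itself. */
bool has_deadlock(const struct lock_snapshot *s)
{
    for (int start = 0; start < MAX_THREADS; start++) {
        int t = start;
        /* A cycle through 'start' has at most MAX_THREADS edges. */
        for (int steps = 0; steps < MAX_THREADS; steps++) {
            t = waits_for(s, t);
            if (t == NONE)
                break;
            if (t == start)
                return true;
        }
    }
    return false;
}
```

For example, if thread 0 owns lock 0 and is blocked by lock 1 while thread 1 owns lock 1 and is blocked by lock 0, the graph contains the cycle 0, 1, 0 and `has_deadlock` reports it.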

32
Intelligent Trace Hardware Required
  • Paradigm shift from post-trace analysis to
    pre-trace specification
  • The user has to specify:
  • What has to be captured?
  • When does it have to be captured?
  • On-Chip trace pre-processing requires extensive
    HW support
  • On-chip filter capabilities
  • Trace data compression
  • Trigger logic
  • Cross triggers (multi-core / multi-component)

33
On-Chip Trace Architecture (CoreSight)
  • Unfortunately still not available on any SMP
    processor hardware

34
Trace Programming Model
  • Everything is hierarchical
  • The overall system is built from components (ETMs,
    HTMs, Embedded Cross Trigger, ...)
  • Register based programming model for each trace
    component
  • Registers for identification and management
  • Component specific control registers
  • Memory mapped interface to provide access to
    component registers
  • Typically accessed via an AMBA bus bridge
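
A register-based, memory-mapped programming model of this kind might look as follows in C. The register indices, the `CTRL_ENABLE` bit and all function names are purely hypothetical and do not describe any real trace component; on real hardware the base pointer would refer to the component's memory-mapped register window rather than to an ordinary array.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register layout (word indices), for illustration only. */
enum {
    REG_ID      = 0,   /* identification register */
    REG_CONTROL = 1    /* component-specific control register */
};
#define CTRL_ENABLE (1u << 0)   /* hypothetical enable bit */

/* A trace component is addressed through its register window. */
struct trace_component {
    volatile uint32_t *regs;
};

uint32_t reg_read(const struct trace_component *c, unsigned idx)
{
    return c->regs[idx];
}

void reg_write(struct trace_component *c, unsigned idx, uint32_t value)
{
    c->regs[idx] = value;
}

/* Read-modify-write of the control register to set the enable bit. */
void trace_enable(struct trace_component *c)
{
    assert(c != 0 && c->regs != 0);
    reg_write(c, REG_CONTROL, reg_read(c, REG_CONTROL) | CTRL_ENABLE);
}
```

The `volatile` qualifier keeps the compiler from caching or reordering away the register accesses, which matters once the pointer refers to real device registers.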

35
Operating System Debug Extensions
  • Application level debugging is supported by the
    Linux kernel through several interfaces (ptrace,
    thread_db).
  • One of the most common debuggers based on these
    interfaces is GDB.
  • Low level of intrusion
  • Only selected tasks/threads are stopped /
    observed
  • Simple debug system
  • Thread / core mapping is transparent
  • Only effects of core mapping and concurrent
    execution are visible
  • Only user threads are observable

36
Thread State Observation
  • Display the number of current application threads
  • Observe the related run state of every particular
    application
  • All threads are started and stopped
    synchronously, allowing the debugger to evaluate
    all thread contexts

37
Conclusion
38
Conclusion
  • SMP offers a good trade-off when considering
    hardware effort and programming complexity
  • For a low number of processor cores, hardware
    cost scales nearly linearly with the potential
    performance gain.
  • Through the unified shared memory model, the
    implementation of multi-threaded applications is
    significantly simplified.
  • Debugging SMP systems is a complex task
  • Kernel level development requires physical core
    access (JTAG)
  • User application development might also benefit
    from physical core access, but high intrusion
    level effects can make debugging inefficient
  • OS kernel provided thread debug interfaces are
    sufficient in most cases
  • Non-invasive trace support is always beneficial
    when debugging concurrent tasks / threads
  • Load situations requiring task migration become
    more important in the embedded system domain

39
Thank You