Title: Multithreaded Processor Architectures


1
Multithreaded Processor Architectures
  • Gregory T. Byrd and Mark A. Holliday
  • IEEE Spectrum, Vol. 32, Issue 8, pp. 38-46,
    Aug. 1995
  • Speaker: Wen-Kai Huang

2
Abstract
  • Often, in their unending quest for computers with
    higher performance, architects seek to reduce or
    hide latency, the number of cycles an operation
    takes from start to finish. Multithreaded
    architectures take the tack of hiding latency by
    supporting multiple concurrent threads of
    execution, which are independent of one another.
    This article introduces multithreaded processor
    architectures and discusses their design
    challenges

3
What's the Problem
  • Long-latency operations such as memory accesses,
    remote reads, and synchronization operations may
    extend for 10 to 100 cycles, forcing a traditional
    processor to sit idle until the result comes in

4
Main Idea
  • Multithreaded processors multiplex the execution
    of a number of concurrent threads onto the
    hardware in order to hide latencies
  • When a long-latency operation occurs in one of
    the threads, another begins execution

5
Multiple Hardware Contexts
  • Threads are mapped onto hardware contexts, each
    of which includes
  • General-purpose registers
  • A program counter
  • Status registers
  • Since many cycles are required to switch between
    threads, a multithreaded processor must support
    some mechanism to reduce context-switching
    overhead
  • The mechanism: provide multiple hardware contexts

6
Illustration of Multithreaded Processors
  • [Figure legend: running thread; ready threads]

7
Striking the Right Balance
  • A multithreaded processor's efficiency is
    determined by four parameters
  • Number of contexts supported by the hardware
  • Context-switching overhead
  • Run length: the number of cycles executed between
    context switches
  • Characteristic latency
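How the four parameters interact can be sketched with a simple deterministic model. This model is an assumption added for illustration; it is not taken from the slides. It assumes every thread runs for exactly the run length, then pays the switching overhead, then waits out the characteristic latency. The function and parameter names are hypothetical.

```python
def efficiency(n_contexts, run_length, switch_overhead, latency):
    """Fraction of cycles spent on useful work (simplified model, an assumption)."""
    # Linear region: too few threads to cover the latency, so the processor
    # still idles part of the time.
    linear = n_contexts * run_length / (run_length + switch_overhead + latency)
    # Saturated region: the latency is fully hidden, and only the switching
    # overhead is lost.
    saturated = run_length / (run_length + switch_overhead)
    return min(linear, saturated)


if __name__ == "__main__":
    # Illustrative values only: run length 10, overhead 10, latency 40.
    for n in (1, 2, 3, 4, 8):
        print(n, round(efficiency(n, 10, 10, 40), 3))
```

Under this model, adding contexts helps only until the latency is covered. Beyond that point, efficiency is limited by the ratio of run length to run length plus switching overhead.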

8
Multithreaded Processor Efficiency
  • [Figure not included in the transcript]

9
Multithread Models
  • Coarse-grained (block interleaving): the processor
    executes a single thread until a triggering
    event, such as a long-latency operation, occurs
  • Fine-grained (cycle-by-cycle interleaving): the
    processor switches to a different thread each
    cycle
  • Multiple-issue (simultaneous): the multithreading
    mechanism is integrated into a superscalar
    architecture
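The difference between the first two models shows up in the order in which instructions are issued. The sketch below is a toy illustration, not a description of any real processor. It ignores latencies and switching overhead, treats an instruction named "miss" as the triggering event, and uses hypothetical function names.

```python
def fine_grained(threads):
    """Issue one instruction per cycle, rotating through the threads."""
    pending = [list(t) for t in threads]
    order = []
    while any(pending):
        for tid, instrs in enumerate(pending):
            if instrs:
                order.append((tid, instrs.pop(0)))
    return order


def coarse_grained(threads, trigger="miss"):
    """Run one thread until it issues the triggering instruction, then switch."""
    pending = [list(t) for t in threads]
    order = []
    tid = 0
    while any(pending):
        if not pending[tid]:
            # This thread has finished; move on to the next context.
            tid = (tid + 1) % len(pending)
            continue
        instr = pending[tid].pop(0)
        order.append((tid, instr))
        if instr == trigger:
            tid = (tid + 1) % len(pending)
    return order


if __name__ == "__main__":
    threads = [["add", "miss", "sub"], ["mul", "miss", "or"]]
    print(fine_grained(threads))
    print(coarse_grained(threads))
```

Each tuple is (thread id, instruction). The fine-grained schedule alternates threads on every issue, while the coarse-grained schedule stays on a thread until it reaches the trigger.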

10
Coarse-grained Multithreading
  • The triggering events in a block-interleaving
    (coarse-grained) model can be classified as
    described in [1]

11
Coarse-grained Example - Sparcle
  • The Sparcle processor
  • Supports four hardware contexts and switches from
    one to another whenever a cache miss occurs

12
Discussions about Sparcle
  • It has only one program counter and one status
    register
  • Context-switching overhead is 14 cycles
  • The number of contexts that can be effectively
    used is often less than four
  • Ways to improve efficiency
  • Reduce the switching overhead
  • Reduce the switching frequency (longer run
    length)
  • Memory prefetching
  • Support more hardware contexts? No
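A rough calculation suggests why more contexts would not help. It uses the 14-cycle overhead quoted above, but the run length and latency values are assumptions chosen only for illustration. The saturation model (useful work bounded by run length divided by run length plus overhead) is also an assumed simplification, not something stated on the slide.

```python
import math

SWITCH_OVERHEAD = 14  # cycles, the figure quoted on the slide


def efficiency_ceiling(run_length, switch_overhead=SWITCH_OVERHEAD):
    """Best-case efficiency once latency is fully hidden (assumed model)."""
    return run_length / (run_length + switch_overhead)


def contexts_to_saturate(run_length, latency, switch_overhead=SWITCH_OVERHEAD):
    """Smallest number of contexts that covers the latency (assumed model)."""
    return math.ceil(1 + latency / (run_length + switch_overhead))


if __name__ == "__main__":
    # Hypothetical workload: 50 cycles between misses, 100-cycle miss latency.
    print(efficiency_ceiling(50))         # ceiling set by the switch overhead
    print(contexts_to_saturate(50, 100))  # contexts needed to reach it
    # A longer run length raises the ceiling; more contexts do not.
    print(efficiency_ceiling(200))
```

With these assumed numbers, three contexts already reach the ceiling, so a fifth or sixth context would sit unused. Lowering the overhead or lengthening the run is what moves the ceiling.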

13
Fine-grained Architectures
  • Fine-grained architectures issue an instruction
    from a different thread on every cycle
  • Zero switching overhead, achieved by providing
    enough registers that no state needs to be saved
    or restored when switching contexts
  • Major problems of fine-grained solutions
  • Hardware cost
  • Not all workloads contain sufficient parallelism
  • Poor performance with a single-thread workload

14
Cycle-by-cycle Interleaving
  • An instruction of a thread can be fed into the
    pipeline only after the previous instruction of
    that thread has completed
  • This eliminates pipeline hazards, so the
    processor pipeline can be very simple
  • However, single-thread performance is poor
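The cost to single-thread performance can be estimated with a short sketch. It assumes each thread may have at most one instruction in the pipeline at a time, as described above, and it ignores memory latency. The pipeline depth used in the example is a hypothetical value.

```python
def pipeline_utilization(n_threads, pipeline_depth):
    """Fraction of issue slots filled when each thread may have only one
    instruction in flight (simplified; memory latency is ignored)."""
    return min(1.0, n_threads / pipeline_depth)


if __name__ == "__main__":
    depth = 8  # hypothetical pipeline depth
    for n in (1, 4, 8, 16):
        print(n, pipeline_utilization(n, depth))
```

Under this assumption, a lone thread issues once every `pipeline_depth` cycles, and the pipeline fills only when there are at least as many ready threads as pipeline stages.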

15
Fine-grained Example - MTA
  • The MTA processor
  • Three-stage pipeline
  • 128 hardware contexts
  • No cache, so memory accesses have very long
    latency
  • It's an expensive design

16
The Laudon Scheme
  • Laudon's architecture adopts a cache system
  • It supports only four hardware contexts
  • Because caches keep most latencies short, many
    threads are not needed
  • Also, when running a single-thread workload, that
    thread can fill up the CPU pipeline

17
Multiple Issue (Simultaneous)
  • In a superscalar processor, each operation
    proceeds through one of several functional units
  • This setup is clearly compatible with
    multithreading
  • Operations can be issued by different threads in
    the same cycle
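One way to picture issuing from several threads in a single cycle is the toy slot-filling sketch below. It assumes any ready operation can use any functional unit, which real superscalar hardware does not guarantee. The round-robin policy and the function name are hypothetical.

```python
def issue_cycle(ready_ops, n_units):
    """Choose up to n_units operations for one cycle.

    ready_ops[i] is the number of operations thread i has ready this cycle.
    Returns the thread id assigned to each filled functional unit.
    """
    remaining = list(ready_ops)
    issued = []
    while len(issued) < n_units and any(remaining):
        # Visit threads round-robin, taking one operation from each in turn.
        for tid in range(len(remaining)):
            if remaining[tid] > 0 and len(issued) < n_units:
                remaining[tid] -= 1
                issued.append(tid)
    return issued


if __name__ == "__main__":
    print(issue_cycle([2], 4))        # one thread leaves units idle
    print(issue_cycle([2, 1, 3], 4))  # several threads fill all four units
```

In the sketch, a single thread with two ready operations leaves half of a four-unit machine idle, while three threads together fill every unit.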

18
Simultaneous Example - M-Machine
  • The M-Machine processor supports 8 functional
    units
  • This example illustrates processor coupling
    with 3 threads over 5 consecutive cycles

19
Detecting Parallelism
  • Coarsest level: task-level parallelism,
    identified by the user
  • Medium-grained, or control-level: function- or
    subroutine-level parallelism, specified by the
    programmer or compiler
  • Fine-grained, or data-level: executing the same
    set of instructions on different data, e.g.,
    software pipelining
  • Very fine-grained: VLIW or superscalar
20
Different Levels of Parallelism
  • The threads of a multithreaded architecture
    usually correspond to medium- and fine-grain units

21
Prospects for Success
  • As of 1995, there had been no successful
    multithreaded machines because of
  • Extra cost
  • Complexity of hardware
  • Dearth of tools for extracting thread-level
    parallelism
  • Today, however, there are many solutions to
    these problems
  • Advanced semiconductor technologies
  • Advanced CAD tools for circuit design
  • Advanced software tools for exploiting
    parallelism

22
Conclusion - Multithreaded Trends
  • Recent announcements by processor vendors depict
    a trend, with an increasing emphasis on enhancing
    chip-level throughput through multiple cores and
    threads [2]
  • Each core or thread may not necessarily aim at
    delivering the highest frequency [2]

23
Appendix
  • IEEE Micro Hot Chip 2004 (total 5 chips)
  • Sun Microsystems, A Chip MultiThreaded (CMT)
    Processor for Network Workloads: dual-thread
  • IBM, IBM Power5 A Dual-Core Multithreaded
    Processor: dual-thread for each core
  • Related Work to ARM processor
  • G. Cui and Z. Li, MT_ARM Multithreading
    Implementation in ARM7 Architecture, Proc. 4th
    International Conference on ASIC, pp. 793-796,
    Oct. 2001
  • Reference
  • [1] J. Kreuzinger et al., Context-switching
    Techniques for Decoupled Multithreaded
    Processors, Proc. 25th EUROMICRO Conference,
    Vol. 1, pp. 248-251, Sept. 1999
  • [2] P. Bose, Chip-level microarchitecture
    trends, IEEE Micro, Vol. 24, Issue 2, pp. 5,
    Mar.-Apr. 2004