1
The Stanford FLASH Multiprocessor
  • J. Kuskin, D. Ofelt, M. Heinrich, J. Heinlein, R. Simoni,
    K. Gharachorloo, J. Chapin, D. Nakahira, J. Baxter,
    M. Horowitz, A. Gupta, M. Rosenblum, J. Hennessy.
  • In Proceedings of the 21st Annual International
    Symposium on Computer Architecture (ISCA), 1994.

Henry Cook CS258 4/2/2008
2
Goals
  • Support both cache-coherent shared memory and
    message passing
  • Not just either/or, but both at the same time
  • Design a custom node controller
  • Build actual hardware
  • 256-node target, scaling to thousands
  • (The machine actually built had 64 nodes)

3
Big Idea
  • Principal difference between cache-coherent shared
    memory (CCSM) and message passing (MP) is the
    protocol for transferring data
  • Overall machine structure is the same
  • Functions performed by node controller are also
    the same
  • By making the controller a special purpose
    protocol processor, can leverage flexibility of
    software while reducing overheads

4
System Architecture
  • Each processor is a single MIPS chip
  • The node controller, MAGIC (Memory And General
    Interconnect Controller), contains a programmable
    protocol processor (PP)
  • Different message types are processed by
    different software handlers
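
The handler-per-message-type idea can be sketched as a dispatch table. This is a minimal illustration, not MAGIC's actual handler set: the message type names (`GET`, `PUT`), the message fields, and the handler functions are all hypothetical.

```python
from typing import Callable, Dict

# A handler takes a decoded message and returns a description of what it did.
Handler = Callable[[dict], str]


def handle_get(msg: dict) -> str:
    # Hypothetical handler for a read request.
    return f"GET for line {msg['addr']:#x} from node {msg['src']}"


def handle_put(msg: dict) -> str:
    # Hypothetical handler for a data reply.
    return f"PUT for line {msg['addr']:#x} from node {msg['src']}"


# Dispatch table: message type -> software handler (names are assumptions).
HANDLERS: Dict[str, Handler] = {
    "GET": handle_get,
    "PUT": handle_put,
}


def dispatch(msg: dict) -> str:
    """Look up the handler for the message type and run it."""
    handler = HANDLERS.get(msg["type"])
    if handler is None:
        raise ValueError(f"no handler for message type {msg['type']!r}")
    return handler(msg)
```

Supporting a new protocol in this model means registering different handlers, which is the flexibility the slides attribute to the software approach.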

5
Protocols - CCSM
  • Directory-based, with dynamic pointer allocation
    structure
  • Similar to DASH protocol
  • Separate request-reply networks to eliminate
    cycles
  • Handlers must yield if they cannot run to
    completion
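
A directory with dynamic pointer allocation can be sketched as per-line list heads pointing into a shared pool of sharer entries, with unused entries kept on a free list. The sketch below is a simplified model under that assumption; the class names, pool size, and invalidation behavior are illustrative and are not taken from the FLASH protocol code.

```python
from typing import Dict, List, Optional

END = -1  # marks the end of a linked list


class PointerPool:
    """Shared pool of sharer entries; free entries are chained via next_."""

    def __init__(self, size: int) -> None:
        self.node: List[Optional[int]] = [None] * size
        self.next_: List[int] = list(range(1, size)) + [END]
        self.free_head = 0 if size > 0 else END

    def alloc(self, node_id: int, next_idx: int) -> int:
        if self.free_head == END:
            raise MemoryError("pointer pool exhausted")
        idx = self.free_head
        self.free_head = self.next_[idx]
        self.node[idx] = node_id
        self.next_[idx] = next_idx
        return idx

    def release_list(self, head: int) -> None:
        # Return every entry of a sharer list to the free list.
        while head != END:
            nxt = self.next_[head]
            self.node[head] = None
            self.next_[head] = self.free_head
            self.free_head = head
            head = nxt


class Directory:
    """Maps a memory line address to the head of its sharer list."""

    def __init__(self, pool: PointerPool) -> None:
        self.pool = pool
        self.head: Dict[int, int] = {}

    def sharers(self, addr: int) -> List[int]:
        out = []
        idx = self.head.get(addr, END)
        while idx != END:
            out.append(self.pool.node[idx])
            idx = self.pool.next_[idx]
        return out

    def add_sharer(self, addr: int, node_id: int) -> None:
        if node_id not in self.sharers(addr):
            # Prepend the new sharer to the line's list.
            self.head[addr] = self.pool.alloc(node_id, self.head.get(addr, END))

    def invalidate_for_write(self, addr: int, writer: int) -> List[int]:
        """Return nodes that need invalidations; writer becomes sole holder."""
        victims = [n for n in self.sharers(addr) if n != writer]
        self.pool.release_list(self.head.pop(addr, END))
        self.head[addr] = self.pool.alloc(writer, END)
        return victims
```

Because entries come from one pool rather than a fixed number of slots per line, a widely shared line can hold many sharers while an unshared line costs only its head pointer.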

6
Protocols - MP
  • Long messages (block transfer) vs short messages
    (synchronization)
  • User-level parameters passed to transfer handler
    running on MAGIC
  • When all components of a user message have arrived
    at the destination, a reception handler is invoked
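
The "invoke a reception handler once everything has arrived" behavior can be sketched as follows. This is an illustrative model only: the 128-byte chunk size, the `Receiver` class, and the callback interface are assumptions, not the FLASH message format.

```python
from typing import Callable, Dict, List, Tuple

LINE_BYTES = 128  # assumed chunk size, chosen for illustration only


def split_message(payload: bytes, line_bytes: int = LINE_BYTES) -> List[Tuple[int, bytes]]:
    """Split a long message into (sequence number, chunk) components."""
    return [
        (offset // line_bytes, payload[offset:offset + line_bytes])
        for offset in range(0, len(payload), line_bytes)
    ]


class Receiver:
    """Collects components and fires the reception handler exactly once."""

    def __init__(self, total_chunks: int, on_complete: Callable[[bytes], None]) -> None:
        self.total = total_chunks
        self.chunks: Dict[int, bytes] = {}
        self.on_complete = on_complete
        self.done = False

    def deliver(self, seq: int, data: bytes) -> None:
        self.chunks[seq] = data
        if not self.done and len(self.chunks) == self.total:
            self.done = True
            message = b"".join(self.chunks[i] for i in range(self.total))
            self.on_complete(message)
```

Components may arrive in any order in this sketch; the handler runs only after the last one is delivered.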

7
Protocols - Extensions
  • Just have to change the handlers
  • Emulate COMA attraction memories
  • Implement synchronization primitives as MAGIC
    handlers
  • Short message support similar to active messages
  • Not user-level active messages
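
As one example of a synchronization primitive written as a pair of handlers, the sketch below models a queue-based lock managed at a home node. The handler names, the `GRANT` message, and the FIFO policy are hypothetical; the slides do not specify which primitives were implemented or how.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

Message = Tuple[str, int]  # (message type, destination node)


class LockHome:
    """State kept at the lock's home node, updated only by handlers."""

    def __init__(self) -> None:
        self.holder: Optional[int] = None
        self.waiters: Deque[int] = deque()

    def handle_acquire(self, node: int) -> List[Message]:
        """Grant immediately if the lock is free, otherwise queue the requester."""
        if self.holder is None:
            self.holder = node
            return [("GRANT", node)]
        self.waiters.append(node)
        return []

    def handle_release(self, node: int) -> List[Message]:
        """Pass the lock to the oldest waiter, if any."""
        if node != self.holder:
            raise ValueError(f"node {node} does not hold the lock")
        self.holder = self.waiters.popleft() if self.waiters else None
        return [("GRANT", self.holder)] if self.holder is not None else []
```

Each handler returns the messages it would send, so waiting nodes receive a grant instead of repeatedly polling the lock.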

8
MAGIC Architecture
  • Separation and specialization
  • Use hardwired data movement logic for speed
  • Use control logic that runs software protocols
    for flexibility

9
MAGIC Architecture
  • Must operate quickly enough to avoid being the
    bottleneck
  • Hardware-based speculative message dispatch to the
    protocol processor (PP)
  • Separate hardware for message sends

10
Protocol Processor
  • Implements subset of DLX ISA with extensions for
    common protocol ops
  • Statically-scheduled dual-issue superscalar
    processor
  • No interrupts, exceptions, address translation or
    interlocks
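
With static scheduling and no interlocks, the hardware does not stall on hazards, so the instruction scheduler has to keep dependent instructions out of the same issue slot. The sketch below shows a very simple in-order packer that does this for two-wide bundles. The instruction representation, the register names, and the choice to check only read-after-write and write-after-write dependences are assumptions for illustration; this is not the FLASH toolchain.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str
    dst: Optional[str]
    srcs: Tuple[str, ...]


NOP = Instr("nop", None, ())


def depends(later: Instr, earlier: Instr) -> bool:
    """True if `later` reads or overwrites the register `earlier` writes."""
    if earlier.dst is None:
        return False
    return earlier.dst in later.srcs or earlier.dst == later.dst


def pack_dual_issue(program: List[Instr]) -> List[Tuple[Instr, Instr]]:
    """Pack instructions in order into pairs, padding with NOPs on hazards."""
    bundles = []
    i = 0
    while i < len(program):
        first = program[i]
        if i + 1 < len(program) and not depends(program[i + 1], first):
            bundles.append((first, program[i + 1]))
            i += 2
        else:
            bundles.append((first, NOP))
            i += 1
    return bundles
```

A real scheduler would also reorder independent instructions to fill the NOP slots; this sketch only preserves program order.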

11
Performance
  • How do latencies look for local read misses?
  • Derived from Verilog model
  • Assuming PP cache hits

12
Performance
  • PP occupancy is set by the longest sub-operation
    that must be performed
  • Block transfer bandwidth of 300-400MB/s
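
To put the 300-400MB/s figure in perspective, the arithmetic below estimates how long a block transfer would take at each end of that range. It assumes 1 MB = 10^6 bytes and ignores startup and handler overheads, so it is a rough bound and not a measured result; the 4096-byte example size is arbitrary.

```python
def transfer_time_us(n_bytes: int, mb_per_s: float) -> float:
    """Time in microseconds to move n_bytes at a sustained rate (1 MB = 10**6 bytes)."""
    seconds = n_bytes / (mb_per_s * 1_000_000)
    return seconds * 1_000_000


if __name__ == "__main__":
    for rate in (300, 400):
        print(f"4096 bytes at {rate} MB/s: {transfer_time_us(4096, rate):.2f} us")
```

Under these assumptions a 4096-byte transfer takes roughly 10 to 14 microseconds.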

13
Testbed
  • System-level simulator in C
  • Protocol verifier, SPLASH benchmarks
  • Verilog description of MAGIC
  • N-1 simulated nodes and one Verilog node

14
Retrospective
  • Flexibility of software handlers is key
  • Supporting multiple protocols
  • Scaling the architecture to multiple machine sizes
  • Debugging protocol operation
  • Building hardware in an academic environment is
    difficult and time consuming