The Parallel Computing Laboratory - PowerPoint PPT Presentation

Slides: 13
Provided by: dbCsBe
Learn more at: https://dsf.berkeley.edu
Transcript and Presenter's Notes

1
The Parallel Computing Laboratory
  • Krste Asanovic, Ras Bodik, Jim Demmel, Tony
    Keaveny, Kurt Keutzer, John Kubiatowicz, Edward
    Lee, Nelson Morgan, George Necula, Dave
    Patterson, Koushik Sen, John Wawrzynek, David
    Wessel, and Kathy Yelick

March 17, 2008
2
A Parallel Revolution, Ready or Not
  • Embedded: per-product ASIC to programmable
    platforms → Multicore chip most competitive path
  • Amortize design costs + Reduce design risk +
    Flexible platforms
  • PC, Server: Power Wall + Memory Wall = Brick Wall
  • End of the way we built microprocessors for the
    last 40 years
  • New Moore's Law is 2X processors ("cores") per
    chip every technology generation, but same clock
    rate
  • "This shift toward increasing parallelism is not
    a triumphant stride forward based on
    breakthroughs; instead, this is actually a
    retreat from even greater challenges that thwart
    efficient silicon implementation of traditional
    solutions."
  • The Parallel Computing Landscape: A Berkeley
    View, Dec 2006
  • Sea change for HW & SW industries since changing
    the model of programming and debugging

3
P.S. Parallel Revolution May Fail
  • John Hennessy, President, Stanford University,
    1/07: "...when we start talking about parallelism
    and ease of use of truly parallel computers,
    we're talking about a problem that's as hard as
    any that computer science has faced. ... I would
    be panicked if I were in industry."
  • "A Conversation with Hennessy & Patterson," ACM
    Queue Magazine, 4:10, 1/07.
  • 100% failure rate of Parallel Computer Companies
  • Convex, Encore, Inmos (Transputer), MasPar,
    NCUBE, Kendall Square Research, Sequent,
    (Silicon Graphics), Thinking Machines, ...
  • What if IT goes from a growth industry to
    a replacement industry?
  • If SW can't effectively use 32, 64, ... cores
    per chip → SW no faster on new computer → Only
    buy if computer wears out
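
The last bullet can be made concrete with a rough calculation. The sketch below uses Amdahl's law, which the slides do not cite, and the parallel fractions are illustrative assumptions rather than measurements. It shows why software that stays mostly serial gains almost nothing from 32 or 64 cores.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over one core when only `parallel_fraction` of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Assumed parallel fractions, chosen only for illustration.
    for fraction in (0.0, 0.5, 0.95):
        row = ", ".join(
            f"{cores} cores: {amdahl_speedup(fraction, cores):.2f}x"
            for cores in (32, 64)
        )
        print(f"parallel fraction {fraction:.2f} -> {row}")
```

With a parallel fraction of 0.5, doubling from 32 to 64 cores leaves the speedup below 2x, which is the "SW no faster on new computer" case.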

4
Par Lab Research Overview
Easy to write correct programs that run
efficiently on manycore
[Figure: Par Lab research stack, top to bottom]
  • Applications: Personal Health, Image Retrieval,
    Hearing/Music, Speech, Parallel Browser
  • Motifs
  • Productivity Layer: Composition & Coordination
    Language (CCL), CCL Compiler/Interpreter,
    Parallel Libraries, Parallel Frameworks
  • Efficiency Layer: Efficiency Languages, Sketching,
    Autotuners, Legacy Code, Schedulers,
    Communication & Synch. Primitives, Efficiency
    Language Compilers
  • OS: Legacy OS, OS Libraries & Services, Hypervisor
  • Arch.: Multicore/GPGPU, RAMP Manycore
  • Correctness (alongside the layers): Static
    Verification, Type Systems, Directed Testing,
    Dynamic Checking, Debugging with Replay
  • Diagnosing Power/Performance (alongside the layers)
5
Compelling Client Applications
Music/Hearing
Robust Speech Input
Parallel Browser
Personal Health
6
"Motif" Popularity (Red Hot → Blue Cool)
  • How do compelling apps relate to 13 motifs?

7
Developing Parallel Software
  • 2 types of programmers → 2 layers
  • Efficiency Layer (10% of today's programmers)
  • Expert programmers build Frameworks & Libraries,
    Hypervisors, ...
  • "Bare metal" efficiency possible at Efficiency
    Layer
  • Productivity Layer (90% of today's programmers)
  • Domain experts / Naïve programmers productively
    build parallel apps using frameworks & libraries
  • Frameworks & libraries composed to form app
    frameworks
  • Effective composition techniques allow the
    efficiency programmers to be highly leveraged →
    Create language for Composition and Coordination
    (C&C)
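
A minimal sketch of the two-layer split, assuming Python's standard concurrent.futures module as a stand-in for an efficiency-layer framework. The names parallel_map and word_lengths are hypothetical and do not come from Par Lab.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


# Efficiency layer (hypothetical): an expert hides worker management
# behind a small, reusable API.
def parallel_map(fn: Callable[[T], R], items: Iterable[T], workers: int = 4) -> List[R]:
    """Apply fn to every item with a pool of workers; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, items))


# Productivity layer (hypothetical): a domain programmer composes the
# framework without touching threads, locks, or scheduling.
def word_lengths(words: List[str]) -> List[int]:
    return parallel_map(len, words)


if __name__ == "__main__":
    print(word_lengths(["manycore", "motif", "autotuner"]))
```

Threads are used here only to keep the sketch portable. It illustrates the division of labor between the two kinds of programmers, not bare-metal efficiency.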

8
ParLab OS Research
  • Root Partition implements policy to timeshare
    partitions
  • Suspended partition (passive data structure in
    memory) is not mapped by Root onto physical cores
[Figure: Logical System View and Physical System View; labels include Device Driver, Root Partition Manager, and Suspended]
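
One way to picture the root partition's role is the toy model below. It is a sketch under stated assumptions: the class names, the round-robin rotation, and the core counts are all hypothetical, since the slide does not specify the timesharing policy.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List


@dataclass
class Partition:
    name: str
    cores_needed: int
    # Physical core IDs currently mapped; empty while the partition is suspended.
    cores: List[int] = field(default_factory=list)

    @property
    def suspended(self) -> bool:
        return not self.cores


class RootPartitionManager:
    """Toy policy (assumed): map partitions in queue order, then rotate."""

    def __init__(self, physical_cores: int) -> None:
        self.physical_cores = physical_cores
        self.queue: Deque[Partition] = deque()

    def add(self, partition: Partition) -> None:
        self.queue.append(partition)

    def timeslice(self) -> None:
        # Unmap every partition; each becomes a passive record in memory.
        for p in self.queue:
            p.cores = []
        # Map as many partitions as fit onto the free physical cores.
        free = list(range(self.physical_cores))
        for p in self.queue:
            if p.cores_needed <= len(free):
                p.cores = free[:p.cores_needed]
                free = free[p.cores_needed:]
        # Rotate so a different partition is first in the next timeslice.
        self.queue.rotate(-1)


if __name__ == "__main__":
    root = RootPartitionManager(physical_cores=4)
    parts = [Partition("A", 2), Partition("B", 2), Partition("C", 2)]
    for p in parts:
        root.add(p)
    for step in range(2):
        root.timeslice()
        print(step, {p.name: p.cores or "suspended" for p in parts})
```

A suspended partition here is nothing more than an object with no cores assigned, which mirrors the "passive data structure in memory" wording on the slide.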
9
InfiniCore Architecture Overview
  • Four separate on-chip network types
  • Control networks combine 1-bit signals in
    combinational tree for interrupts & barriers
  • Active message networks carry register-register
    messages between cores
  • L2/Coherence network connects L1 caches to L2
    slices and indirectly to memory
  • Memory network connects L2 slices to memory
    controllers
  • I/O and accelerators potentially attach to all
    network types.
  • Flash replaces rotating disks.
  • Only high-speed I/O is network & display.
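
The control-network bullet can be illustrated with a small software model. This is only a sketch: the slide does not give the tree's arity or gate types, so a binary AND tree for barriers and a binary OR tree for interrupts are assumptions, and the function names are hypothetical.

```python
from typing import Callable, List


def combine_tree(bits: List[bool], gate: Callable[[bool, bool], bool]) -> bool:
    """Reduce 1-bit signals pairwise, level by level, like a combinational tree."""
    if not bits:
        raise ValueError("need at least one input signal")
    level = list(bits)
    while len(level) > 1:
        nxt = [gate(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # an odd signal passes through to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0]


def barrier_released(arrived: List[bool]) -> bool:
    """Barrier: every core must have arrived (AND tree, assumed)."""
    return combine_tree(arrived, lambda a, b: a and b)


def interrupt_pending(requests: List[bool]) -> bool:
    """Interrupt: one core raising its line is enough (OR tree, assumed)."""
    return combine_tree(requests, lambda a, b: a or b)


if __name__ == "__main__":
    print(barrier_released([True, True, False, True]))
    print(interrupt_pending([False, False, True, False]))
```

The tree depth grows with the logarithm of the number of cores, which is one reason a dedicated 1-bit network is attractive for barriers.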

10
1008 Core RAMP Blue
  • 1008 = 12 32-bit RISC cores / FPGA, 4
    FPGAs/board, 21 boards
  • Simple MicroBlaze soft cores @ 90 MHz
  • Full star-connection between modules
  • NASA Advanced Supercomputing (NAS)
    Parallel Benchmarks (all class S)
  • UPC versions (C plus shared-memory abstraction):
    CG, EP, IS, MG
  • RAMPants creating HW & SW for many-core
    community using next gen FPGAs
  • Chuck Thacker & Microsoft designing next boards
  • 3rd party to manufacture and sell boards: 1H08
  • Gateware, Software: BSD open source
  • RAMP Gold for Par Lab: new CPU
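
The core count in the slide title follows from the first bullet. A quick check, using only the numbers on the slide (the variable names are illustrative):

```python
CORES_PER_FPGA = 12
FPGAS_PER_BOARD = 4
BOARDS = 21

cores_per_board = CORES_PER_FPGA * FPGAS_PER_BOARD
total_fpgas = FPGAS_PER_BOARD * BOARDS
total_cores = cores_per_board * BOARDS

print(f"{cores_per_board} cores/board, {total_fpgas} FPGAs, {total_cores} cores")
```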

11
Physical Par Lab - 5th Floor Soda
12
ParLab Summary
Easy to write correct programs that run
efficiently and scale up on manycore
  • Whole IT industry has bet its future on
    parallelism (!)
  • Try Apps-Driven vs. CS Solution-Driven Research
  • Motifs as anti-benchmarks
  • Efficiency layer for 10% of today's programmers
  • Productivity layer for 90% of today's programmers
  • C&C language to help compose and coordinate
  • Autotuners vs. Compilers
  • OS & HW Primitives
  • Diagnose Power/Perf.
  • March 19 announcement: UPCRC winner from top 25
    CS departments

[Figure: Par Lab research stack, repeated from slide 4: Apps, Motifs, Productivity, Efficiency, OS, Arch., with Correctness and Diagnosing Power/Performance Bottlenecks alongside]