Summary

About This Presentation
Title: Summary

Description: Title: The IC Wall Collaboration between Computer science + Physics. Author: rob. Last modified by: rob. Document presentation format: Custom.

Slides: 7
Provided by: Rob244

Transcript and Presenter's Notes


1
Summary
  • Background
  • Why do we need parallel processing? Moore's law.
    Applications.
  • Introduction to algorithms and applications
  • Methodology to develop efficient parallel
    (distributed-memory) algorithms 
  • Understand various forms of overhead
    (communication, load imbalance, search overhead,
    synchronization)
  • Understand various distributions (blockwise,
    cyclic)
  • Understand various load balancing strategies
    (static, dynamic master/worker model)
  • Understand correctness problems (e.g. message
    ordering)
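
The blockwise and cyclic distributions mentioned above can be sketched as a mapping from element index to owning processor. This is only an illustration: the function names and the choice of 8 elements on 4 processors are assumptions, not taken from the slides.

```python
def block_owner(i, n, p):
    """Owner of element i when n elements are split into p contiguous blocks."""
    block = -(-n // p)  # ceil(n / p); the last block may be shorter
    return i // block


def cyclic_owner(i, p):
    """Owner of element i when elements are dealt out round-robin."""
    return i % p


n, p = 8, 4  # hypothetical problem size and processor count
print([block_owner(i, n, p) for i in range(n)])   # contiguous chunks
print([cyclic_owner(i, p) for i in range(n)])     # round-robin
```

Blockwise keeps neighbouring elements on the same processor, which favours locality; cyclic spreads elements out, which tends to balance load when the work per element grows or shrinks with the index.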

2
Summary
  • Parallel machines and architectures
  • Processor organizations, topologies, criteria
  • Types of parallel machines
  • arrays/vectors, shared-memory, distributed memory
  • Routing
  • Flynn's taxonomy
  • What are cluster computers?
  • What networks do real machines (like the Blue
    Gene) use?
  • Speedup, efficiency (and their implications),
    Amdahl's law
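
As a reminder of how these quantities relate, here is a minimal sketch using the usual textbook form of Amdahl's law, S(p) = 1 / (s + (1 - s) / p), where s is the serial fraction, and efficiency E = S / p. The function and parameter names are assumptions made for this example.

```python
def amdahl_speedup(serial_fraction, p):
    """Upper bound on speedup with p processors for a given serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)


def efficiency(speedup, p):
    """Fraction of the ideal (linear) speedup that is actually achieved."""
    return speedup / p


# Hypothetical program with a 10% serial part.
for p in (1, 10, 100, 1000):
    s = amdahl_speedup(0.1, p)
    print(p, round(s, 2), round(efficiency(s, p), 3))
```

With a 10% serial part the speedup can never exceed 1 / 0.1 = 10, no matter how many processors are added, so efficiency drops steadily as p grows.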

3
Summary
  • Programming methods, languages, and environments
  • Different forms of message passing
  • naming, explicit/implicit receive,
    synchronous/asynchronous sending
  • Select statement
  • SR primitives (not syntax)
  • MPI message passing primitives, collective
    communication
  • Java parallel programming model and primitives
  • HPF: problems with automatic parallelization;
    division of work between programmer and HPF
    compiler; alignment/distribution primitives;
    performance implications
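
The difference between synchronous and asynchronous sending with an explicit receive can be illustrated with two threads and a queue. This is a sketch in plain Python, not SR, MPI, or Java code from the course; the `Channel` class and its method names are hypothetical.

```python
import queue
import threading


class Channel:
    """Hypothetical one-way channel between two threads."""

    def __init__(self):
        self._q = queue.Queue()

    def send_async(self, msg):
        # Asynchronous send: buffer the message and return immediately.
        self._q.put((msg, None))

    def send_sync(self, msg):
        # Synchronous send: block until the receiver has taken the message.
        taken = threading.Event()
        self._q.put((msg, taken))
        taken.wait()

    def receive(self):
        # Explicit receive: block until a message is available.
        msg, taken = self._q.get()
        if taken is not None:
            taken.set()  # release a sender blocked in send_sync
        return msg


ch = Channel()
received = []


def receiver():
    for _ in range(3):
        received.append(ch.receive())


t = threading.Thread(target=receiver)
ch.send_async("a")  # returns although no receiver is running yet
ch.send_async("b")
t.start()
ch.send_sync("c")   # returns only after the receiver has taken "c"
t.join()
print(received)
```

The asynchronous sends complete before the receiver even starts, which requires buffering; the synchronous send doubles as a synchronization point between sender and receiver.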

4
Summary
  • Applications
  • N-body problems
  • load balancing and communication (locality)
    optimizations, costzones, performance comparison
  • Search algorithm (TDS)
  • use asynchronous communication and clever
    (transposition-driven) scheduling

5
Summary
  • Many different types of many-core hardware
  • Understand how to analyze it
  • Hardware performance metrics: theoretical peak
    performance, memory bandwidth, power, flops/W
  • Performance analysis: operational intensity,
    arithmetic intensity, Roofline
  • Understand basics of GPU architectures
  • Hierarchical
  • Computational: PCI board -> chips -> SMs -> cores
    -> threads
  • Memories: host -> device -> shared -> registers
  • Hardware multi-threading, SIMT model
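
The Roofline model named above bounds attainable performance by the minimum of the compute peak and memory bandwidth times operational intensity. A minimal sketch follows; the peak and bandwidth figures are made-up values for illustration, not numbers for any real device.

```python
def roofline(peak_gflops, bandwidth_gbs, intensity):
    """Attainable GFLOP/s for a kernel with the given flops-per-byte intensity."""
    return min(peak_gflops, bandwidth_gbs * intensity)


PEAK = 1000.0      # hypothetical theoretical peak, GFLOP/s
BANDWIDTH = 100.0  # hypothetical memory bandwidth, GB/s

for intensity in (0.25, 1.0, 10.0, 20.0):
    print(intensity, roofline(PEAK, BANDWIDTH, intensity))
```

Kernels to the left of the ridge point (here PEAK / BANDWIDTH = 10 flops per byte) are memory-bound, so raising their operational intensity helps more than adding compute power.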

6
Summary
  • Many-core Programming techniques
  • Vectorization
  • DMA and overlapping communication and computation
  • Coalescing
  • How to exploit fast local memories
  • LS on Cell, shared memory on GPUs
  • Atomic instructions
  • Software telescopes
  • Correlator
  • Tiling
  • How to compare implementations on different
    hardware
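
Tiling can be sketched on a matrix multiplication, used here as a stand-in rather than the correlator from the slides. The loops are split so that each small tile of the inputs is reused while it is still in fast local memory; the function name and the default tile size are assumptions.

```python
def matmul_tiled(a, b, tile=2):
    """Multiply matrices a (n x k) and b (k x m) tile by tile."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    # Outer loops walk over tiles; inner loops stay inside one tile.
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
print(matmul_tiled(a, b))
```

In pure Python the tiling brings no speedup; the point is the loop structure, which on a GPU or Cell would map a tile to shared memory or the local store.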