1
Computer Science Overview
  • Laxmikant (Sanjay) Kale

2
Computer Science Projects: Posters
  • Rocketeer
  • Home-grown visualizer
  • John, Fiedler
  • Rocpanda
  • Parallel I/O
  • Winslett et al
  • Novel Linear System Solvers
  • de Sturler, Heath, Saylor
  • Performance monitoring
  • Campbell, Zheng, Lee
  • Parallel Mesh support
  • FEM Framework
  • Parallel remeshing
  • Parallel Solution transfer
  • Adaptive mesh refinement

3
Computer Science Projects: Talks
  • Kale
  • Processor virtualization via migratable objects
  • Jiao
  • Integration Framework
  • Surface propagation
  • Mesh adaptation

4
Migratable Objects and Charm++
  • Charm++
  • Parallel C++
  • Arrays of objects
  • Automatic load balancing
  • Prioritization
  • Mature system
  • Available on all parallel machines we know
  • Rocket Center Collaborations
  • It was clear that Charm++ would not be adopted by
    the whole application community
  • It was equally clear to us that it was a unique
    technology that would improve programmer
    productivity substantially
  • Led to the development of AMPI
  • Adaptive MPI
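
As a concrete illustration of "arrays of objects", here is a minimal Charm++-style chare array. It is a sketch only: it follows real Charm++ conventions (CProxy, ckNew, the generated .decl.h/.def.h headers), but the module, class, and method names are invented for this example, and clean termination (a reduction followed by CkExit()) is omitted.

```cpp
// Sketch of a 1D chare array in Charm++ (illustrative names).
// The companion interface file, hello.ci, would declare:
//   mainmodule hello {
//     mainchare Main { entry Main(CkArgMsg *m); };
//     array [1D] Worker {
//       entry Worker();
//       entry void work(int step);
//     };
//   }
#include "hello.decl.h"

class Main : public CBase_Main {
public:
  Main(CkArgMsg *m) {
    delete m;
    // Over-decompose: create many more array elements than
    // physical processors; the runtime maps and migrates them.
    CProxy_Worker workers = CProxy_Worker::ckNew(8 * CkNumPes());
    workers.work(0);  // broadcast to every element
  }
};

class Worker : public CBase_Worker {
public:
  Worker() {}
  Worker(CkMigrateMessage *m) {}  // required for migratability
  void work(int step) {
    CkPrintf("element %d running on PE %d\n", thisIndex, CkMyPe());
    // A real program would contribute to a reduction and
    // eventually call CkExit().
  }
};

#include "hello.def.h"
```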

5
Processor Virtualization
Benefits
  • Programmer: over-decomposition into virtual
    processors (VPs)
  • Runtime: assigns VPs to processors
  • Enables adaptive runtime strategies
  • Implementations: Charm++, AMPI
  • Software engineering
  • Number of virtual processors can be independently
    controlled
  • Separate VPs for different modules
  • Message driven execution
  • Adaptive overlap of communication
  • Predictability
  • Automatic out-of-core
  • Asynchronous reductions
  • Dynamic mapping
  • Heterogeneous clusters
  • Vacate, adjust to speed, share
  • Automatic checkpointing
  • Change set of processors used
  • Automatic dynamic load balancing
  • Communication optimization
  • Collectives

[Figure: MPI processes implemented as virtual
processors (user-level migratable threads)]
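
To make "message-driven execution" concrete, here is a minimal plain-C++ sketch (all names hypothetical, single-threaded for clarity) of one physical processor running many virtual processors: the scheduler always delivers whichever message is available next, so no single VP's communication wait stalls the processor.

```cpp
// Sketch of message-driven execution over many virtual
// processors (VPs) sharing one physical processor.
#include <cstdio>
#include <queue>
#include <vector>

struct Message { int targetVP; int data; };

struct VirtualProcessor {
  int id = 0;
  long sum = 0;
  void handle(const Message& m) {  // "entry method": runs on arrival
    sum += m.data;
    std::printf("VP %d processed %d\n", id, m.data);
  }
};

int main() {
  const int numVPs = 8;  // many VPs per physical processor
  std::vector<VirtualProcessor> vps(numVPs);
  for (int i = 0; i < numVPs; ++i) vps[i].id = i;

  std::queue<Message> inbox;  // stands in for the network
  for (int i = 0; i < 32; ++i) inbox.push({i % numVPs, i});

  // Scheduler loop: whichever message is available next drives
  // its target VP; waits of one VP overlap with work of another.
  while (!inbox.empty()) {
    Message m = inbox.front(); inbox.pop();
    vps[m.targetVP].handle(m);
  }
}
```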
6
Highly Agile Dynamic Load Balancing
  • Needed, for example, to handle the advent of
    plasticity around a crack
  • Here, a simple example: plasticity in a bar

7
Optimizing All-to-All via Mesh
Organize processors in a 2D (virtual) grid
Phase 1: Each processor sends messages within its row
Phase 2: Each processor sends messages within its
column
A message from (x1,y1) to (x2,y2) goes via (x1,y2)
  • 2(√P - 1) messages instead of P - 1
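
A small standalone sketch (hypothetical demo code, not the library implementation) of the row-then-column routing rule and the resulting per-processor message counts:

```cpp
// Demo of the 2D-mesh all-to-all routing rule: a message from
// (x1,y1) to (x2,y2) is sent along the row to (x1,y2), then
// along the column to (x2,y2). Each processor sends sqrt(P)-1
// row messages plus sqrt(P)-1 column messages, i.e.
// 2(sqrt(P)-1) messages instead of P-1 direct messages.
#include <cmath>
#include <cstdio>

struct Coord { int x, y; };

// Intermediate hop for the row-then-column route.
Coord intermediate(Coord src, Coord dst) { return {src.x, dst.y}; }

int main() {
  int P = 64;                              // total processors
  int side = (int)std::sqrt((double)P);    // virtual grid is side x side
  Coord src{1, 2}, dst{5, 7};
  Coord via = intermediate(src, dst);
  std::printf("route (%d,%d) -> (%d,%d) -> (%d,%d)\n",
              src.x, src.y, via.x, via.y, dst.x, dst.y);
  std::printf("messages per processor: %d (mesh) vs %d (direct)\n",
              2 * (side - 1), P - 1);
}
```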

8
Optimized All-to-All: A Surprise
[Figure: completion time vs. computation overhead for
a 76-byte all-to-all on Lemieux]
The CPU is free during most of the time taken by a
collective operation
This led to the development of asynchronous
collectives, now supported in AMPI
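
The pattern that asynchronous collectives enable can be sketched with the standard MPI-3 nonblocking call MPI_Ialltoall; note this is an approximation of the idea, not AMPI's own interface, which predates MPI-3.

```cpp
// Overlapping computation with an asynchronous all-to-all,
// sketched with MPI-3's MPI_Ialltoall.
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int p; MPI_Comm_size(MPI_COMM_WORLD, &p);

  std::vector<int> sendbuf(p), recvbuf(p);
  for (int i = 0; i < p; ++i) sendbuf[i] = i;

  MPI_Request req;
  MPI_Ialltoall(sendbuf.data(), 1, MPI_INT,
                recvbuf.data(), 1, MPI_INT,
                MPI_COMM_WORLD, &req);

  // The CPU is free while the collective is in flight:
  // do useful work here instead of idling in MPI_Alltoall.
  double acc = 0;
  for (int i = 0; i < 1000000; ++i) acc += i * 0.5;

  MPI_Wait(&req, MPI_STATUS_IGNORE);  // collective now complete
  MPI_Finalize();
  return acc < 0;  // use acc so the overlap work is not optimized out
}
```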
9
Latency Tolerance: Multi-Cluster Jobs
  • Job co-scheduled to run across two clusters to
    provide access to large numbers of processors
  • But cross cluster latencies are large!
  • Virtualization within Charm++ masks high
    inter-cluster latency by allowing overlap of
    communication with computation

[Figure: Cluster A and Cluster B; intra-cluster
latency in microseconds, inter-cluster latency in
milliseconds]
10
Hypothetical Timeline of a Multi-Cluster
Computation
[Figure: timeline of processors A and B (first
cluster) and processor C (second cluster), with
messages crossing the cluster boundary]
  • Processors A and B are on one cluster, Processor
    C on a second cluster
  • Communication between clusters via high-latency
    WAN
  • Processor Virtualization allows latency to be
    masked

11
Multi-cluster Experiments
  • Experimental environment
  • Artificial latency environment: a VMI delay
    device adds a pre-defined latency between
    arbitrary pairs of nodes
  • TeraGrid environment: experiments run between
    NCSA and ANL machines (1.725 ms one-way latency)
  • Experiments
  • Five-point stencil (2D Jacobi) for matrix sizes
    2048x2048 and 8192x8192
  • LeanMD molecular dynamics code running a 30,652
    atom system
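
For reference, the computational kernel of the five-point stencil benchmark is simple; below is a minimal serial sketch (the actual experiments decompose the 2048x2048 and 8192x8192 matrices across virtual processors and exchange ghost rows).

```cpp
// Serial sketch of the five-point stencil (2D Jacobi) kernel.
#include <cstddef>
#include <vector>

// One Jacobi sweep: every interior point becomes the average
// of its four neighbors.
void jacobiSweep(const std::vector<double>& a,
                 std::vector<double>& b, std::size_t n) {
  for (std::size_t i = 1; i + 1 < n; ++i)
    for (std::size_t j = 1; j + 1 < n; ++j)
      b[i*n + j] = 0.25 * (a[(i-1)*n + j] + a[(i+1)*n + j]
                           + a[i*n + j-1] + a[i*n + j+1]);
}

int main() {
  const std::size_t n = 256;  // experiments used 2048 and 8192
  std::vector<double> a(n * n, 0.0);
  for (std::size_t j = 0; j < n; ++j) a[j] = 1.0;  // hot boundary row
  std::vector<double> b = a;
  for (int iter = 0; iter < 100; ++iter) {
    jacobiSweep(a, b, n);
    a.swap(b);
  }
}
```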

12
Five-Point Stencil Results (P = 64)
13
Fault Tolerance
  • Automatic checkpointing for AMPI and Charm++
  • Migrate objects to disk!
  • Automatic fault detection and restart
  • Now available in the distribution version of AMPI
    and Charm++
  • New work
  • In-memory checkpointing
  • Scalable fault tolerance
  • Impending Fault Response
  • Migrate objects to other processors
  • Adjust processor-level parallel data structures
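
"Migrate objects to disk" means the same pack/unpack machinery used for migration between processors also writes a checkpoint. Below is a simplified, hypothetical plain-C++ sketch of that idea; Charm++'s real mechanism is its PUP framework, not this code.

```cpp
// Sketch of checkpoint-by-migration: one pack/unpack routine
// serializes an object's state, whether the destination is
// another processor or a checkpoint file on disk.
#include <cstdint>
#include <fstream>
#include <vector>

struct WorkUnit {
  int step = 0;
  std::vector<double> state;

  void pack(std::ostream& out) const {
    std::uint64_t n = state.size();
    out.write(reinterpret_cast<const char*>(&step), sizeof step);
    out.write(reinterpret_cast<const char*>(&n), sizeof n);
    out.write(reinterpret_cast<const char*>(state.data()),
              n * sizeof(double));
  }
  void unpack(std::istream& in) {
    std::uint64_t n = 0;
    in.read(reinterpret_cast<char*>(&step), sizeof step);
    in.read(reinterpret_cast<char*>(&n), sizeof n);
    state.resize(n);
    in.read(reinterpret_cast<char*>(state.data()),
            n * sizeof(double));
  }
};

int main() {
  WorkUnit w; w.step = 42; w.state.assign(1000, 3.14);
  { std::ofstream f("ckpt.bin", std::ios::binary); w.pack(f); }  // checkpoint
  WorkUnit r;
  { std::ifstream f("ckpt.bin", std::ios::binary); r.unpack(f); }  // restart
}
```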

14
Scalable Fault Tolerance
  • Motivation
  • When one processor out of 100,000 fails, the
    other 99,999 shouldn't have to roll back to their
    checkpoints!
  • How?
  • Sender-side message logging
  • Latency tolerance mitigates costs
  • Restart can be sped up by spreading out the
    failed processor's objects across other processors
  • Long term project
  • Current progress
  • Basic scheme implemented and tested in simple
    programs
  • General purpose implementation in progress

Only the failed processor's objects recover from
checkpoints, while the others continue
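
A simplified, hypothetical sketch of the sender-side idea: every sender retains the messages it has sent, so after a failure only the failed processor's objects roll back, and their inbound messages are replayed from the senders' logs rather than by re-executing the senders.

```cpp
// Sketch of sender-side message logging (simplified: no
// sequence-number garbage collection, no determinism records).
#include <cstdio>
#include <map>
#include <vector>

struct LoggedMessage { int seq; int payload; };

struct Sender {
  std::map<int, std::vector<LoggedMessage>> log;  // per-destination
  int seq = 0;

  void send(int dest, int payload) {
    log[dest].push_back({seq++, payload});  // keep a copy
    // ... actually transmit the message here ...
  }

  // On failure of `dest`, replay its messages in the original
  // order; the restarted objects reprocess them to reach the
  // pre-crash state while everyone else keeps running.
  void replay(int dest) {
    for (const LoggedMessage& m : log[dest])
      std::printf("replaying seq %d payload %d to %d\n",
                  m.seq, m.payload, dest);
  }
};

int main() {
  Sender s;
  s.send(3, 10); s.send(3, 20); s.send(5, 30);
  s.replay(3);  // processor 3 failed: only its messages are resent
}
```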
15
Develop abstractions in the context of full-scale
applications
[Figure: applications (Protein Folding, Quantum
Chemistry (QM/MM), Molecular Dynamics, Computational
Cosmology, Crack Propagation, Space-time Meshes,
Dendritic Growth, Rocket Simulation) built on Parallel
Objects, Adaptive Runtime System, and Libraries and
Tools]
The enabling CS technology of parallel objects and
intelligent runtime systems has led to several
collaborative applications in CSE
16
Next
  • Jim Jiao
  • Integration Framework
  • Surface propagation
  • Mesh adaptation