Automatic Differentiation: Introduction

1
Automatic Differentiation: Introduction
  • Automatic differentiation (AD) is a technology
    for transforming a subprogram that computes some
    function into a subprogram that computes the
    derivatives of that function
  • Derivatives used in optimization, nonlinear
    solvers, sensitivity analysis, uncertainty
    quantification
  • Forward mode of AD is efficient for problems with
    few independent variables or Jacobian-vector
    products (a dual-number sketch follows this list)
  • Reverse mode of AD is efficient for problems with
    few dependent variables or transposed-Jacobian-vector
    (J^T v) products
  • Efficiency of generated code depends on
    sophistication of underlying compiler analysis
    and combinatorial algorithms
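
To make the forward mode concrete, below is a minimal dual-number sketch in C++. It uses operator overloading rather than the source transformation performed by ADIFOR, ADIC, or OpenAD, and the function f is invented for the example. Reverse mode would instead record the operation sequence on a tape and propagate adjoints backward, obtaining all input sensitivities of one output in a single sweep.

    #include <cmath>
    #include <cstdio>

    // Dual number: carries a value and the derivative of that value with
    // respect to one chosen independent variable.
    struct Dual {
        double v;  // value
        double d;  // derivative (tangent)
    };

    Dual operator*(Dual a, Dual b) { return {a.v * b.v, a.d * b.v + a.v * b.d}; }
    Dual operator+(Dual a, Dual b) { return {a.v + b.v, a.d + b.d}; }
    Dual sin(Dual a) { return {std::sin(a.v), std::cos(a.v) * a.d}; }

    // Example function y = x1 * x2 + sin(x1); any code written against Dual
    // is differentiated as a side effect of evaluating it.
    Dual f(Dual x1, Dual x2) { return x1 * x2 + sin(x1); }

    int main() {
        // Seed dx1 = 1, dx2 = 0 to obtain dy/dx1; one sweep is needed per
        // independent variable, which is why forward mode suits problems
        // with few independents.
        Dual x1{2.0, 1.0}, x2{3.0, 0.0};
        Dual y = f(x1, x2);
        std::printf("y = %g, dy/dx1 = %g\n", y.v, y.d);  // dy/dx1 = x2 + cos(x1)
        return 0;
    }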

2
AD Current Capabilities
  • Fortran 77: ADIFOR 2.0/3.0
  • Robust, mature tool with excellent language
    coverage
  • Excellent compiler analysis
  • Efficient forward mode (small number of
    independents)
  • Adequate reverse mode (small number of
    dependents)
  • C/C++: ADIC 2.0
  • Semi-mature tool with full C language coverage
  • Sophisticated differentiation algorithms
  • Efficient forward mode
  • Fortran 90: OpenAD/F
  • New tool with partial language coverage
  • Sophisticated differentiation algorithms
  • Accurate and novel compiler analysis
  • Innovative templating mechanism
  • Efficient forward and reverse modes

3
AD Application Highlight
Sensitivity of flow through the Drake Passage to
bottom topography, using the MIT shallow water model
4
AD Future Capabilities
  • C/C++: ADIC 2.x
  • Enhanced support for C++ (basic templating,
    operator overloading)
  • Fortran 90: OpenAD/F
  • Improved language coverage (user-defined types,
    pointers, etc.)
  • Both tools
  • New differentiation algorithms
  • New checkpointing mechanisms
  • Advanced compiler analysis
  • Efficient forward and reverse modes
  • Integration with CSCAPES coloring algorithms
  • Ease of use through integration with PETSc and
    Zoltan toolkits

5
Load Balancing: Introduction
  • Goals
  • Provide software and algorithms for load
    balancing (partitioning) that can easily be used
    by parallel applications.
  • Load balancing: distribute work evenly among
    processors while minimizing communication cost,
    which reduces parallel run time.
  • Static load balancing (often called
    partitioning)
  • Application computation and communication
    patterns do not change
  • Partition and distribute data once
  • Dynamic load balancing
  • In dynamic or adaptive applications, computation
    and communication change over time.
  • Load balancing should be invoked periodically as
    imbalance develops (an imbalance-check sketch
    follows this list).
  • Try to reduce data migration (the amount of
    application data that must move).
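
As an illustration of when a dynamic application might invoke the load balancer, the sketch below computes a global imbalance ratio (maximum load over average load) and triggers repartitioning when it exceeds a tolerance. The function name, the 10% default threshold, and the MPI reduction pattern are assumptions for the example, not part of Zoltan.

    #include <mpi.h>

    // Returns true when the work distribution is uneven enough to justify
    // repartitioning: imbalance = maximum load / average load over all ranks.
    // The 10% default tolerance is an arbitrary illustrative choice.
    bool should_rebalance(double my_load, MPI_Comm comm, double tol = 1.10) {
        int nprocs = 1;
        MPI_Comm_size(comm, &nprocs);
        double max_load = 0.0, sum_load = 0.0;
        MPI_Allreduce(&my_load, &max_load, 1, MPI_DOUBLE, MPI_MAX, comm);
        MPI_Allreduce(&my_load, &sum_load, 1, MPI_DOUBLE, MPI_SUM, comm);
        double avg = sum_load / nprocs;
        return avg > 0.0 && max_load / avg > tol;
    }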

6
Load Balancing Current Capabilities
  • Zoltan: software toolkit for parallel data
    management and load balancing
  • Available at http://www.cs.sandia.gov/Zoltan
  • Collection of many load-balancing methods
  • Geometric: RCB, space-filling curves (an RCB
    sketch follows this list)
  • Graph and hypergraph partitioning
  • Data-structure neutral interface
  • Call-back functions
  • Single, common interface for many methods
  • Allows applications to plug and play
  • Portable, parallel code (MPI)
  • Used in many DOE and Sandia applications
  • Can run on thousands of processors
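
The following is a toy, serial version of recursive coordinate bisection (RCB), included only to show the geometric idea of cutting at a coordinate median along the longer axis of the bounding box. Zoltan's actual RCB runs in parallel on distributed data and supports weights; none of the names below come from its API.

    #include <algorithm>
    #include <vector>

    struct Point { double x, y; int part; };

    // Serial RCB sketch on pts[first, last): cut at the median of the wider
    // coordinate, then recurse on each half until every point owns a part id.
    void rcb(std::vector<Point>& pts, int first, int last, int part0, int nparts) {
        if (nparts <= 1 || last - first <= 1) {
            for (int i = first; i < last; ++i) pts[i].part = part0;
            return;
        }
        // Bounding box of this region decides the cut direction.
        double xmin = pts[first].x, xmax = xmin, ymin = pts[first].y, ymax = ymin;
        for (int i = first + 1; i < last; ++i) {
            xmin = std::min(xmin, pts[i].x); xmax = std::max(xmax, pts[i].x);
            ymin = std::min(ymin, pts[i].y); ymax = std::max(ymax, pts[i].y);
        }
        const bool cut_x = (xmax - xmin) >= (ymax - ymin);
        auto less = [cut_x](const Point& a, const Point& b) {
            return cut_x ? a.x < b.x : a.y < b.y;
        };
        // Split proportionally to the number of parts assigned to each side.
        int left_parts = nparts / 2;
        int mid = first + (int)((long long)(last - first) * left_parts / nparts);
        std::nth_element(pts.begin() + first, pts.begin() + mid,
                         pts.begin() + last, less);
        rcb(pts, first, mid, part0, left_parts);
        rcb(pts, mid, last, part0 + left_parts, nparts - left_parts);
    }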

7
Load Balancing Applications
  • Large variety of applications, requirements, data
    structures.

8
Load Balancing Future Capabilities
  • Scalable hypergraph partitioning
  • Hypergraphs accurately model communication volume
  • We aim to improve scalability to thousands of
    processors
  • 2D matrix partitioning
  • Reduce communication compared to a standard 1D
    distribution
  • Multiconstraint partitioning
  • Multi-physics simulation
  • Complex-objective partitioning
  • E.g., simultaneously balance computation and
    memory
  • Parallel sparse matrix ordering (nested
    dissection)

9
Reordering Transformations: Introduction
  • Irregular memory access patterns make performance
    sensitive to data and iteration orders
  • Run-time reordering transformations schedule data
    accesses and iterations to maximize performance
    (a data-reordering sketch follows this list)
  • Preliminary work on reordering heuristics shows
    that hypergraph models outperform graph models
  • Full sparse tiling: a new inspector/executor
    strategy that exploits inter-iteration locality
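
A small sketch of a run-time data-reordering inspector/executor, in the spirit of consecutive packing (CPACK): the inspector scans the index array of an irregular loop once and renumbers the data in first-touch order, and the executor then runs the original loop over the reordered data. The loop body and all names are illustrative assumptions, not code from the CSCAPES package.

    #include <vector>

    // Inspector: visit the index array of an irregular loop such as
    //     for (k = 0; k < n; ++k) y[idx[k]] += ...;
    // and renumber the data in first-touch order (consecutive packing).
    // Returns perm, where perm[old] is the new location of data item 'old'.
    std::vector<int> cpack_inspector(const std::vector<int>& idx, int ndata) {
        std::vector<int> perm(ndata, -1);
        int next = 0;
        for (int old : idx)
            if (perm[old] == -1) perm[old] = next++;
        for (int i = 0; i < ndata; ++i)    // data never touched goes at the end
            if (perm[i] == -1) perm[i] = next++;
        return perm;
    }

    // Executor: the original loop, run against the permuted data array and
    // remapped indices; only the memory layout changes, not the computation.
    void executor(const std::vector<int>& idx, const std::vector<int>& perm,
                  std::vector<double>& y_permuted) {
        for (int k = 0; k < (int)idx.size(); ++k)
            y_permuted[perm[idx[k]]] += 1.0;   // stand-in for the real body
    }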

10
RT Current Capabilities
  • Open-source package (Data_N_Comp_Reorder)
    implementing several data and iteration
    reordering heuristics
  • Data reordering heuristics
  • Breadth first search (graph-based)
  • Consecutive packing
  • Partitioning (graph-based)
  • Breadth first search (hypergraph-based)
  • Consecutive packing (hypergraph-based)
  • Partitioning (hypergraph-based)
  • Iteration reordering heuristics
  • Breadth first search (hypergraph-based)
  • Lexicographical sorting and various
    approximations
  • Consecutive packing (hypergraph-based)
  • Partitioning (hypergraph-based)
  • Full sparse tiling implementation for model
    problems

11
RT Application Highlight
  • Reordering for a mesh-quality improvement code
    (FeasNewt, by T. Munson)
  • Hypergraph-BFS data reordering coupled with Cpack
    iteration reordering offers best performance
  • Reordering leads to performance within 90% of the
    memory-bandwidth limit for sparse matrix-vector
    products

12
RT Future Capabilities
  • New hypergraph-based runtime reordering
    transformations
  • Comparison between hypergraph-based and bipartite
    graph-based runtime reordering transformations
  • Hypergraph partitioners for load balancing
    modified to work well for reordering
    transformations
  • Hierarchical full sparse tiling for hierarchical
    parallel systems

13
Graph Coloring and Matching: Introduction
  • Graph coloring deals with partitioning a set of
    pairwise-related objects into a small number of
    groups of mutually independent objects
  • Sparsity exploitation in the computation of
    Jacobians and Hessians leads to a variety of graph
    coloring problems (a compression sketch follows
    this list). Sources of problem variation:
  • Unsymmetric vs symmetric matrix
  • Direct vs substitution method
  • Uni- vs bi-directional partitioning
  • Matching deals with finding a large set of
    independent edges in a graph
  • Variant matching problems occur in
    load-balancing, process scheduling, linear
    solvers, preconditioners, etc.
  • Orthogonal sources of variation in matching
    problems
  • Bipartite vs general graphs
  • Cardinality vs weighted problems
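
To illustrate why coloring pays off for sparse Jacobians (the direct method, unsymmetric case), the sketch below assumes a column coloring in which same-colored columns are structurally orthogonal: every nonzero of J can then be read directly out of the compressed product B = J*S, where seed column S(:,c) is the sum of the columns with color c. The data structures and names are hypothetical.

    #include <vector>

    // Sparsity pattern of an m x n Jacobian: rows_of_col[j] lists the rows
    // in which column j has a structural nonzero.
    struct Pattern {
        int m, n;
        std::vector<std::vector<int>> rows_of_col;
    };

    // Given a column coloring in which same-colored columns share no row,
    // the compressed product B = J * S (one forward-mode J*v product per
    // color) contains every nonzero of J exactly once: J(i,j) = B(i, color[j]).
    std::vector<std::vector<double>>
    recover_jacobian(const Pattern& p, const std::vector<int>& color,
                     const std::vector<std::vector<double>>& B /* m x ncolors */) {
        std::vector<std::vector<double>> J(p.m, std::vector<double>(p.n, 0.0));
        for (int j = 0; j < p.n; ++j)
            for (int i : p.rows_of_col[j])
                J[i][j] = B[i][color[j]];
        return J;
    }
    // With many columns but, say, only 20 colors, the full Jacobian costs
    // 20 forward-mode sweeps instead of one per column.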

14
GCM Current Capabilities
  • Coloring
  • Serial
  • Developed novel (greedy) algorithms for
    distance-1, distance-2, star, and acyclic coloring
    problems. A package implementing these algorithms
    and various ordering routines is available (a
    greedy distance-1 sketch follows this list).
  • Parallel
  • Developed a scheme for parallelizing greedy
    coloring algorithms on distributed-memory
    computers. MPI implementations of distance-1 and
    distance-2 coloring are available via Zoltan.
  • Matching
  • Algorithms that compute optimal solutions to
    matching problems run in polynomial time, but are
    slow in practice and difficult to parallelize.
  • High quality approximate solutions can be
    computed in (near) linear time. Approximation
    techniques make parallelization easier.
  • Developed fast approximation algorithms for
    several matching problems.
  • Efficient implementations of exact matching
    algorithms available.
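
A sketch of the greedy pattern behind the serial coloring algorithms mentioned above, for the distance-1 case: each vertex, visited in some order, receives the smallest color not already used by a colored neighbor. The real package also covers distance-2, star, and acyclic coloring and several vertex-ordering strategies; this simplified serial version is only for illustration.

    #include <vector>

    // Greedy distance-1 coloring: adj[v] lists the neighbors of vertex v.
    // Returns a color per vertex; adjacent vertices get different colors.
    std::vector<int> greedy_color(const std::vector<std::vector<int>>& adj) {
        const int n = (int)adj.size();
        std::vector<int> color(n, -1);
        std::vector<char> forbidden(n + 1, 0);  // colors used by neighbors of v
        for (int v = 0; v < n; ++v) {
            for (int u : adj[v]) if (color[u] >= 0) forbidden[color[u]] = 1;
            int c = 0;
            while (forbidden[c]) ++c;           // smallest color still available
            color[v] = c;
            for (int u : adj[v]) if (color[u] >= 0) forbidden[color[u]] = 0;  // reset
        }
        return color;
    }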

15
GCM Application Highlights
  • Coloring
  • Automatic differentiation (sparse Jacobians and
    Hessians)
  • Parallel computation (discovery of concurrency,
    data migration)
  • Frequency allocation
  • Register allocation in compilers, etc
  • Matching
  • Numerical preprocessing in sparse linear systems
  • permute a matrix so that its diagonal or block
    diagonal is heavy.
  • Block triangular decomposition in sparse linear
    systems
  • decompose a system of equations into smaller sets
    of systems.
  • Graph partitioning
  • guide the coarsening phase of multilevel graph
    partitioning methods.

16
GCM Future Capabilities
  • Develop and implement star and acyclic bicoloring
    algorithms for Jacobian computation
  • Develop parallel algorithms that scale to
    thousands of processors for the various coloring
    problems (distance-1, distance-2, star, acyclic)
  • Integrate coloring software with automatic
    differentiation tools
  • Develop petascale parallel matching algorithms
    based on approximation techniques