Automatic Differentiation: Introduction - PowerPoint PPT Presentation

Slides: 17
Provided by: PaulD211

1
Automatic Differentiation: Introduction
  • Automatic differentiation (AD) is a technology
    for transforming a subprogram that computes some
    function into a subprogram that computes the
    derivatives of that function
  • Derivatives are used in optimization, nonlinear
    solvers, sensitivity analysis, and uncertainty
    quantification
  • Forward mode of AD is efficient for problems with
    few independent variables or for Jacobian-vector
    products (Jv)
  • Reverse mode of AD is efficient for problems with
    few dependent variables or for products of the
    transposed Jacobian with a vector (J^T v)
  • Efficiency of generated code depends on
    sophistication of underlying compiler analysis
    and combinatorial algorithms
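
As a rough illustration of the forward mode described above, the following Python sketch propagates a value together with a directional derivative (a "dual number") through a small function. It uses operator overloading, whereas the tools in this presentation transform source code, so it shows only the underlying idea. The names Dual, sin, f, and jvp are hypothetical and chosen for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Dual:
    """A value paired with its derivative along one input direction."""
    val: float
    dot: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (zero derivative)."""
    return x if isinstance(x, Dual) else Dual(float(x), 0.0)


def sin(x):
    x = _lift(x)
    # Chain rule: d/dt sin(u) = cos(u) * u'
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)


def f(x1, x2):
    """Example function R^2 -> R^2 (an assumption for illustration)."""
    return [x1 * x2 + sin(x1), x1 * x1 * x2]


def jvp(func, point, direction):
    """One forward sweep yields the Jacobian-vector product J(point) @ direction."""
    args = [Dual(p, d) for p, d in zip(point, direction)]
    return [_lift(out).dot for out in func(*args)]


# Seeding with the unit vector e1 extracts the first Jacobian column.
column1 = jvp(f, [2.0, 3.0], [1.0, 0.0])
```

Each sweep costs a small multiple of one function evaluation and produces one Jacobian column, so a full Jacobian needs one sweep per independent variable. That is why forward mode suits problems with few independents.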

2
AD: Current Capabilities
  • Fortran 77: ADIFOR 2.0/3.0
      • Robust, mature tool with excellent language
        coverage
      • Excellent compiler analysis
      • Efficient forward mode (small number of
        independents)
      • Adequate reverse mode (small number of
        dependents)
  • C/C++: ADIC 2.0
      • Semi-mature tool with full C language coverage
      • Sophisticated differentiation algorithms
      • Efficient forward mode
  • Fortran 90: OpenAD/F
      • New tool with partial language coverage
      • Sophisticated differentiation algorithms
      • Accurate and novel compiler analysis
      • Innovative templating mechanism
      • Efficient forward and reverse modes

3
AD: Application Highlight
Sensitivity of flow through the Drake Passage to
bottom topography, using the MIT shallow water model

                          Runtime (ms)    Ratio    Memory
Simulation alone                   220      1.0
Basic adjoint                    14337     61.6     6.87M
Improved checkpointing           14120     60.6    21.44M
Add compiler analysis             2151      9.4     3.17M
Finite differences             23 days   14,400

4
AD: Future Capabilities
  • C/C++: ADIC 2.x
      • Enhanced support for C++ (basic templating,
        operator overloading)
  • Fortran 90: OpenAD/F
      • Improved language coverage (user-defined types,
        pointers, etc.)
  • Both tools
      • New differentiation algorithms
      • New checkpointing mechanisms
      • Advanced compiler analysis
      • Efficient forward and reverse modes
      • Integration with CSCAPES coloring algorithms
      • Ease of use through integration with PETSc and
        Zoltan toolkits

5
Load Balancing: Introduction
  • Goals
      • Provide software and algorithms for load
        balancing (partitioning) that can easily be used
        by parallel applications.
      • Load balancing: distribute work evenly among
        processors while minimizing communication cost,
        which reduces parallel run time.
  • Static load balancing (often called
    partitioning)
      • Application computation and communication
        patterns do not change
      • Partition and distribute data once
  • Dynamic load balancing
      • In dynamic or adaptive applications, computation
        and communication change over time.
      • Load balancing should be invoked at certain
        intervals.
      • Try to reduce data migration (the amount of
        application data that must move)
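
The three quantities named on this slide (work balance, communication cost, and data migration) can be made concrete with a small Python sketch. It assumes a graph model in which vertices carry work and edges represent communication. The function names partition_quality and migration_volume are hypothetical and are not part of any library.

```python
def partition_quality(weights, edges, part, num_parts):
    """Return (load imbalance, edge cut) for an assignment of vertices to parts.

    weights: dict vertex -> work
    edges:   iterable of (u, v) pairs that communicate
    part:    dict vertex -> part index in range(num_parts)
    """
    loads = [0.0] * num_parts
    for vertex, work in weights.items():
        loads[part[vertex]] += work
    average = sum(loads) / num_parts
    imbalance = max(loads) / average  # 1.0 means perfectly balanced
    # An edge costs communication only if its endpoints live on different parts.
    cut = sum(1 for u, v in edges if part[u] != part[v])
    return imbalance, cut


def migration_volume(old_part, new_part, sizes):
    """Total size of the data that changes owner when repartitioning."""
    return sum(sizes[v] for v in old_part if old_part[v] != new_part[v])


# A path 0-1-2-3 with unit work, split across two processors in two ways.
weights = {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0}
edges = [(0, 1), (1, 2), (2, 3)]
contiguous = {0: 0, 1: 0, 2: 1, 3: 1}
interleaved = {0: 0, 1: 1, 2: 0, 3: 1}
```

Both assignments are perfectly balanced, but the contiguous one cuts a single edge while the interleaved one cuts all three. This is the trade-off a partitioner tries to resolve; a dynamic load balancer would also weigh the migration volume of moving from the old assignment to the new one.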

6
Load Balancing: Current Capabilities
  • Zoltan: Software toolkit for parallel data
    management and load balancing
      • Available at http://www.cs.sandia.gov/Zoltan
  • Collection of many load-balancing methods
      • Geometric: RCB, space-filling curves
      • Graph and hypergraph partitioning
  • Data-structure neutral interface
      • Call-back functions
      • Single, common interface for many methods
      • Allows applications to plug and play
  • Portable, parallel code (MPI)
      • Used in many DOE and Sandia applications
      • Can run on thousands of processors
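
To give a feel for the geometric methods listed above, here is a minimal serial sketch of recursive coordinate bisection (RCB) in Python. It is not Zoltan's implementation or API. The function rcb and its coords_of call-back are hypothetical, and the call-back only mimics the idea of a data-structure neutral interface in which the partitioner asks the application for coordinates.

```python
def rcb(ids, coords_of, num_parts):
    """Split objects into num_parts groups by recursive coordinate bisection.

    ids:       list of object identifiers
    coords_of: call-back returning the coordinate tuple of an object
    """
    if num_parts == 1:
        return [list(ids)]
    if not ids:
        return [[] for _ in range(num_parts)]

    dims = len(coords_of(ids[0]))

    def extent(d):
        values = [coords_of(i)[d] for i in ids]
        return max(values) - min(values)

    # Cut perpendicular to the longest side of the bounding box.
    axis = max(range(dims), key=extent)
    ordered = sorted(ids, key=lambda i: coords_of(i)[axis])

    # Give each side a share of objects proportional to its number of parts.
    left_parts = num_parts // 2
    split = len(ordered) * left_parts // num_parts
    return (rcb(ordered[:split], coords_of, left_parts)
            + rcb(ordered[split:], coords_of, num_parts - left_parts))


# Eight points on a 4 x 2 grid, divided among four processors.
grid = [(x, y) for x in range(4) for y in range(2)]
parts = rcb(list(range(len(grid))), lambda i: grid[i], 4)
```

This sketch balances object counts only. A production partitioner would also handle object weights and run in parallel.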

7
Load Balancing: Applications
  • Large variety of applications, requirements, data
    structures.

8
Load Balancing: Future Capabilities
  • Scalable hypergraph partitioning
      • Hypergraphs accurately model communication volume
      • We aim to improve scalability to thousands of
        processors
  • 2d matrix partitioning
      • Reduce communication compared to standard 1d
        distribution
  • Multiconstraint partitioning
      • Multi-physics simulation
  • Complex objectives partitioning
      • E.g., simultaneously balance computation and
        memory
  • Parallel sparse matrix ordering (nested
    dissection)
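
The claim that hypergraphs accurately model communication volume can be illustrated with the commonly used "connectivity minus one" metric: a hyperedge that touches k parts contributes k - 1 units of communication. The Python sketch below is a minimal illustration under that assumption; the function name and the example data are hypothetical.

```python
def communication_volume(hyperedges, part):
    """Sum over hyperedges of (number of parts touched - 1).

    hyperedges: list of vertex lists; each hyperedge stands for one data
                item needed by all of its vertices
    part:       dict vertex -> part index
    """
    total = 0
    for pins in hyperedges:
        parts_touched = {part[v] for v in pins}
        total += len(parts_touched) - 1
    return total


# Three data items shared by four tasks that are split across two processors.
hyperedges = [[0, 1, 2], [2, 3], [0, 3]]
part = {0: 0, 1: 0, 2: 1, 3: 1}
volume = communication_volume(hyperedges, part)  # 2
```

The first hyperedge spans both processors, so its data item is sent once. A graph model that replaces that hyperedge by pairwise edges would count two cut edges (0-2 and 1-2) for the same single transfer, which is the overcounting that hypergraph models avoid.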

9
Reordering Transformations: Introduction
  • Irregular memory access patterns make performance
    sensitive to data and iteration orders
  • Run-time reordering transformations schedule data
    accesses and iterations to maximize performance
  • Preliminary work on reordering heuristics shows
    that hypergraph models outperform graph models
  • Full sparse tiling: a new inspector/executor
    strategy that exploits inter-iteration locality

10
RT: Current Capabilities
  • Open source package implementing several data and
    iteration reordering heuristics:
    Data_N_Comp_Reorder
  • Data reordering heuristics
      • Breadth first search (graph-based)
      • Consecutive packing
      • Partitioning (graph-based)
      • Breadth first search (hypergraph-based)
      • Consecutive packing (hypergraph-based)
      • Partitioning (hypergraph-based)
  • Iteration reordering heuristics
      • Breadth first search (hypergraph-based)
      • Lexicographical sorting and various
        approximations
      • Consecutive packing (hypergraph-based)
      • Partitioning (hypergraph-based)
  • Full sparse tiling implementation for model
    problems
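
Of the data reordering heuristics listed above, consecutive packing is the simplest to sketch: the inspector walks the iterations in order and assigns new data locations in first-touch order, so data used together ends up adjacent in memory. The Python code below is a minimal illustration only and is not taken from the Data_N_Comp_Reorder package; the function name cpack and the example access pattern are hypothetical.

```python
def cpack(accesses, num_data):
    """Consecutive packing: renumber data items in order of first access.

    accesses: list of tuples, one per iteration, holding the data indices
              that the iteration touches
    Returns new_index, where new_index[old] is the packed location of old.
    """
    order = []
    seen = [False] * num_data
    for touched in accesses:          # inspector: scan iterations in order
        for d in touched:
            if not seen[d]:
                seen[d] = True
                order.append(d)
    # Items that are never accessed go to the end.
    order.extend(d for d in range(num_data) if not seen[d])

    new_index = [0] * num_data
    for new, old in enumerate(order):
        new_index[old] = new
    return new_index


# An irregular access pattern over five data items.
accesses = [(3, 0), (0, 4), (1, 3)]
new_index = cpack(accesses, 5)
# The executor then runs the same loop on the remapped indices.
remapped = [tuple(new_index[d] for d in touched) for touched in accesses]
```

After packing, the first two iterations touch locations 0, 1, and 2 instead of the scattered originals 3, 0, and 4.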

11
RT: Application Highlight
  • Reordering for a mesh-quality improvement code
    (FeasNewt, T. Munson)
  • Hypergraph-BFS data reordering coupled with Cpack
    iteration reordering offers best performance
  • Reordering leads to performance within 90% of
    memory bandwidth limit for sparse matvec

12
RT: Future Capabilities
  • New hypergraph-based runtime reordering
    transformations
  • Comparison between hypergraph-based and bipartite
    graph-based runtime reordering transformations
  • Hypergraph partitioners for load balancing
    modified to work well for reordering
    transformations
  • Hierarchical full sparse tiling for hierarchical
    parallel systems

13
Graph Coloring and Matching: Introduction
  • Graph coloring deals with partitioning a set of
    binary-related objects into few groups of
    independent objects
  • Sparsity exploitation in computation of Jacobians
    and Hessians leads to a variety of graph coloring
    problems. Sources of problem variations:
      • Unsymmetric vs symmetric matrix
      • Direct vs substitution method
      • Uni- vs bi-directional partitioning

Matrix     1d partition          2d partition          Method
Jacobian   Distance-2 coloring   Star bicoloring       Direct
Hessian    Star coloring         NA                    Direct
Jacobian   NA                    Acyclic bicoloring    Substitution
Hessian    Acyclic coloring      NA                    Substitution

  • Matching deals with finding a large set of
    independent edges in a graph
  • Variant matching problems occur in
    load-balancing, process scheduling, linear
    solvers, preconditioners, etc.
  • Orthogonal sources of variation in matching
    problems:
      • Bipartite vs general graphs
      • Cardinality vs weighted problems

14
GCM: Current Capabilities
  • Coloring
      • Serial
          • Developed novel (greedy) algorithms for
            distance-1, distance-2, star and acyclic
            coloring problems. A package implementing
            these algorithms and corresponding variant
            ordering routines is available.
      • Parallel
          • Developed a scheme for parallelizing greedy
            coloring algorithms on distributed-memory
            computers. MPI implementations of distance-1
            and distance-2 coloring are available via
            Zoltan.
  • Matching
      • Algorithms that compute optimal solutions for
        matching problems are polynomial in time, but
        slow and difficult to parallelize.
      • High quality approximate solutions can be
        computed in (near) linear time. Approximation
        techniques make parallelization easier.
      • Developed fast approximation algorithms for
        several matching problems.
      • Efficient implementations of exact matching
        algorithms are available.
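
As one example of the approximation idea mentioned above, the classic greedy heuristic for weighted matching scans edges from heaviest to lightest and keeps an edge whenever both endpoints are still free. It is known to reach at least half of the optimal weight and is dominated by the cost of sorting the edges. The Python sketch below shows that textbook heuristic, not the presenters' own algorithms; greedy_matching and the example graph are hypothetical.

```python
def greedy_matching(edges):
    """Greedy 1/2-approximation for maximum weight matching.

    edges: iterable of (weight, u, v) tuples
    Returns the chosen edges as (u, v, weight) tuples.
    """
    matched = set()
    matching = []
    # Consider the heaviest edges first.
    for weight, u, v in sorted(edges, reverse=True):
        if u != v and u not in matched and v not in matched:
            matching.append((u, v, weight))
            matched.update((u, v))
    return matching


# A path 0-1-2-3 whose middle edge is the heaviest.
path = [(2, 0, 1), (3, 1, 2), (2, 2, 3)]
result = greedy_matching(path)  # [(1, 2, 3)]
```

On this path the greedy choice has weight 3, while the optimal matching (the two outer edges) has weight 4. The heuristic trades some quality for speed and simplicity.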

15
GCM: Application Highlights
  • Coloring
      • Automatic differentiation (sparse Jacobians and
        Hessians)
      • Parallel computation (discovery of concurrency,
        data migration)
      • Frequency allocation
      • Register allocation in compilers, etc.
  • Matching
      • Numerical preprocessing in sparse linear systems
          • Permute a matrix so that its diagonal or
            block diagonal is heavy.
      • Block triangular decomposition in sparse linear
        systems
          • Decompose a system of equations into smaller
            sets of systems.
      • Graph partitioning
          • Guide the coarsening phase of multilevel
            graph partitioning methods.
16
GCM: Future Capabilities
  • Develop and implement star and acyclic bicoloring
    algorithms for Jacobian computation
  • Develop parallel algorithms that scale to
    thousands of processors for the various coloring
    problems (distance-1, distance-2, star, acyclic)
  • Integrate coloring software with automatic
    differentiation tools
  • Develop petascale parallel matching algorithms
    based on approximation techniques