Title: zoltan
1. Robert Heaphy
2. Zoltan: Dynamic Load Balancing and Parallel Data
Services
- Erik Boman, Karen Devine, Robert Heaphy, Bruce
Hendrickson, William Mitchell (NIST), Robert
Preis (University of Paderborn), Courtenay
Vaughan
- Sandia National Laboratories,
Albuquerque, NM 87185
3. The Zoltan Toolkit
- Parallel, dynamic, adaptive computations need
many services to obtain peak performance.
- Processor work loads change during computation.
- Communication patterns are complicated.
- Memory usage is dynamic.
- Application developers wrote their own solutions.
- Little expertise in such parallel algorithms.
- No capability to compare approaches.
- No code reuse.
Zoltan: a toolkit of data services for dynamic,
unstructured, adaptive computations.
4. Support for Many Applications
- Different applications, requirements, data
structures.
5. Applications: Adaptive Mesh Refinement
- Dynamic load balancing.
- Redistribute elements after mesh refinement.
- Keep data movement costs low.
- Recursive Coordinate Bisection
- Parent and child elements assigned to same
processor.
- Inexpensive.
- Incremental.
Using RCB with AMR in SIERRA (Edwards, Rath,
Lober, et al., Sandia)
6. Applications: Crash Simulations
- Dynamic load balancing.
- Assigns physically close surfaces to the same
processor.
- Recursive coordinate bisection: inexpensive,
fast, incremental.
- Multiphase simulation
- Graph-based decomposition for finite element
calculation.
- RCB decomposition for contact detection.
- Unstructured Communication package maps between
decompositions.
Using RCB for Contact Detection in
Pronto (Attaway, Hendrickson, Plimpton, et al.,
Sandia)
7. Applications: Parallel Circuit Simulation
- Load balance matrix fill phase.
- Load time for devices can vary by two orders of
magnitude.
- Problem is a network, not a mesh.
- Load balance solve phase.
- Equal number of rows while minimizing
communication.
- Trilinos solver library (Heroux, et al.)
partitions matrix with Zoltan.
- Apply graph partitioning to each phase.
Parallel analog circuit simulation in XYCE
(Hutchinson, Hoekstra, et al., Sandia)
8. Applications: Multiphysics Simulations
- Multiphysics simulations
- Difficult to estimate work in advance.
- Rebalance infrequently; want high quality.
- Dynamic load balancing
- Multi-constraint graph partitioning.
- Two balance criteria: matrix fill and linear
solve.
- Using Zoltan in MPSalsa.
- Load-balancing query functions implemented in
fewer than 200 lines of code.
- All communication for migrating data done with
Zoltan's data migration tools.
- No additional communication routines written by
application developer.
MPSalsa multiphysics simulations (J. Shadid, A.
Salinger, et al., Sandia)
9. Dynamic Load Balancing
- Desirable characteristics for dynamic load
balancing:
- Distribute work evenly among processors.
- Minimize interprocessor communication.
- Keep data movement costs low.
- Incremental partitioning: small changes in
workloads produce only small changes in
decomposition.
- Parallel, scalable implementation.
10. No One-Size-Fits-All Solutions
- No single partitioner works best for all
applications.
- Trade-offs
- Quality vs. speed.
- Geometric locality vs. data dependencies.
- Low data-movement costs vs. tolerance for
remapping.
- Application developers may not know which
partitioner is best for application.
- Zoltan contains suite of partitioning methods.
- Application changes only one parameter to switch
methods.
- Allows experimentation/comparisons to find most
effective partitioner for application.
- Advantage of toolkit approach.
11. Zoltan Suite of Partitioning Algorithms
- Recursive Coordinate Bisection (Berger, Bokhari)
- Recursive Inertial Bisection (Taylor, Nour-Omid)
- ParMETIS (Karypis, Schloegel, Kumar)
- Jostle (Walshaw)
- Space-Filling Curves (Peano, Hilbert)
- Refinement-Tree Partitioning (Mitchell)
- Octree Partitioning (Loy, Flaherty)
12. Recursive Coordinate Bisection (RCB)
- Developed by Berger and Bokhari, 1987, for AMR.
- Idea
- Divide work into two equal parts using a cutting
plane orthogonal to a coordinate axis.
- Recursively cut the resulting subdomains.
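The idea above can be sketched in a few lines of serial Python. This is only an illustrative sketch, not Zoltan's parallel implementation: the function name `rcb`, the choice to cut along the longest axis, and the sort-based search for the cut are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def rcb(points: Sequence[Point], weights: Sequence[float], num_parts: int) -> List[int]:
    """Return a part number in 0..num_parts-1 for every point (serial sketch)."""
    parts = [0] * len(points)

    def bisect(ids: List[int], first_part: int, count: int) -> None:
        if count == 1 or not ids:
            for i in ids:
                parts[i] = first_part
            return
        dim = len(points[ids[0]])
        # Assumption: cut orthogonal to the axis along which the subdomain is longest.
        axis = max(
            range(dim),
            key=lambda d: max(points[i][d] for i in ids) - min(points[i][d] for i in ids),
        )
        ordered = sorted(ids, key=lambda i: points[i][axis])
        left_count = count // 2
        # The left side should receive left_count/count of the total weight.
        target = sum(weights[i] for i in ordered) * left_count / count
        running, split = 0.0, 0
        while split < len(ordered) and running + weights[ordered[split]] <= target + 1e-12:
            running += weights[ordered[split]]
            split += 1
        # Recursively cut the two resulting subdomains.
        bisect(ordered[:split], first_part, left_count)
        bisect(ordered[split:], first_part + left_count, count - left_count)

    bisect(list(range(len(points))), 0, num_parts)
    return parts
```

For a 4 x 2 grid of unit-weight points and four parts, each part receives two neighboring points. Non-uniform weights move the cut so that each side carries its share of the work.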
13. RCB Advantages
- Conceptually simple; fast and inexpensive.
- Regular subdomains.
- Can be used for structured or unstructured
applications.
- All processors can inexpensively know entire
decomposition.
- Effective when connectivity info is not
available.
- Implicitly incremental.
14. RCB Disadvantages
- No explicit control of communication costs.
- Can generate disconnected subdomains.
- Mediocre partition quality.
15. Variations on RCB in Zoltan
- Recursive Inertial Bisection
- Simon, Taylor, et al., 1991
- Cutting planes orthogonal to principal axes of
geometry.
- Not incremental.
- Point-Assign and Box-Assign.
- Given a decomposition, determine to which
processor(s) a new item should be added (based on
its geometric location).
- Useful in contact detection and multiphase
simulations.
- Structured-mesh support.
- Set parameter to generate regular block
subdomains.
16. Space-Filling Curve Partitioning (SFC)
- Developed by Peano, 1890.
- Space-Filling Curve
- Mapping from R^3 to R^1 that completely fills a
domain.
- Applied recursively to obtain desired
granularity.
- Used for partitioning by
- Warren and Salmon, 1993, gravitational
simulations.
- Pilkington and Baden, 1994, smoothed particle
hydrodynamics.
- Patra and Oden, 1995, adaptive mesh refinement.
17. SFC Algorithm
- Run space-filling curve through domain.
- Order objects according to position on curve.
- Perform 1-D partition of curve.
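The three steps above can be sketched for a 2-D grid as follows. This is a minimal serial illustration under stated assumptions: it uses the textbook bit-manipulation form of the 2-D Hilbert index and a plain sort, the names `hilbert_index` and `sfc_partition` are hypothetical, and it does not reproduce the binned strategy that Zoltan's own implementation uses.

```python
from typing import List, Sequence, Tuple


def hilbert_index(order: int, x: int, y: int) -> int:
    """Position of grid cell (x, y) along a Hilbert curve on a 2**order x 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect so the sub-curve in this quadrant has the standard orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def sfc_partition(
    cells: Sequence[Tuple[int, int]],
    weights: Sequence[float],
    num_parts: int,
    order: int,
) -> List[int]:
    """Order objects along the curve, then cut the 1-D ordering into weighted pieces."""
    keys = [hilbert_index(order, x, y) for x, y in cells]
    ordered = sorted(range(len(cells)), key=lambda i: keys[i])
    total = sum(weights)
    parts = [0] * len(cells)
    running = 0.0
    for i in ordered:
        # An object goes to the part containing the midpoint of its weight interval.
        midpoint = running + weights[i] / 2.0
        parts[i] = min(num_parts - 1, int(midpoint * num_parts / total))
        running += weights[i]
    return parts
```

Because consecutive positions on the curve are adjacent grid cells, each contiguous piece of the 1-D ordering tends to be geometrically compact.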
18. SFC Advantages
- Simple, fast, inexpensive.
- Maintains geometric locality of objects in
processors.
- Linear ordering of objects may improve cache
performance.
- Implicitly incremental.
19. SFC Disadvantages
- No explicit control of communication costs.
- Can generate disconnected subdomains.
- Slightly lower quality partitions than RCB.
20. Implementations of SFC in Zoltan
- Binned Hilbert SFC
- Heaphy, Edwards, 2001
- Replace linear sort of objects by adaptive
binning strategy.
- Improved speed over traditional implementations.
- Box-Assign and Point-Assign supported.
- Refinement-Tree Partitioning
- Mitchell, 1998
- Topology-based rather than geometry-based.
- Uses parent-child relationships in AMR to build
tree.
- Octree Partitioning
- Loy, Flaherty, 1998
- Explicitly builds octree data structure.
- Partial tree traversals give (binned) linear
ordering.
21. Applications using SFC
- Adaptive hp-refinement finite element methods.
- Assigns physically close elements to same
processor.
- Inexpensive, incremental, fast.
- Linear ordering can be used to order elements
for efficient memory access.
hp-refinement mesh, 8 processors. Patra, et al.
(SUNY-Buffalo)
22. Graph Partitioning
- Represent problem as a weighted graph.
- Nodes: objects to be partitioned.
- Edges: communication between objects.
- Weights: work load or amount of communication.
- Partition graph so that
- Partitions have equal nodal weight.
- Weight of edges cut by subdomain boundaries is
small.
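The two objectives above can be made concrete with a small helper that scores a given partition. This is a sketch for illustration only; the function name `partition_quality` and the dictionary-based graph layout are assumptions, not a Zoltan data structure.

```python
from typing import Dict, Hashable, List, Tuple


def partition_quality(
    node_weights: Dict[Hashable, float],
    edge_weights: Dict[Tuple[Hashable, Hashable], float],
    parts: Dict[Hashable, int],
    num_parts: int,
) -> Tuple[List[float], float, float]:
    """Return (load per part, weight of cut edges, imbalance = max load / average load)."""
    loads = [0.0] * num_parts
    for node, weight in node_weights.items():
        loads[parts[node]] += weight
    # An edge is cut when its endpoints lie in different parts.
    cut = sum(w for (u, v), w in edge_weights.items() if parts[u] != parts[v])
    imbalance = max(loads) / (sum(loads) / num_parts)
    return loads, cut, imbalance
```

On a four-node path with edge weights 1, 5, 1, cutting the middle edge gives perfect balance but a cut weight of 5, while cutting an end edge gives a cut weight of 1 at an imbalance of 1.5. A graph partitioner has to weigh exactly this trade-off.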
23. Multi-Level Graph Partitioning
- Bui and Jones (1993); Hendrickson and Leland (1993);
Karypis and Kumar (1995)
- Construct smaller approximations to graph.
- Perform graph partitioning on coarse graph.
- Propagate partition back, refining as needed.
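One common way to build the smaller approximation is to merge pairs of nodes joined by heavy edges. The sketch below shows a single coarsening level using greedy heavy-edge matching. That technique is swapped in here as an assumption, since the slides do not say which coarsening scheme is meant, and the function name `coarsen` is hypothetical.

```python
from typing import Dict, Tuple


def coarsen(
    node_weights: Dict[int, float],
    edge_weights: Dict[Tuple[int, int], float],
) -> Tuple[Dict[int, float], Dict[Tuple[int, int], float], Dict[int, int]]:
    """One coarsening level: returns (coarse node weights, coarse edges, fine-to-coarse map)."""
    adjacency: Dict[int, Dict[int, float]] = {u: {} for u in node_weights}
    for (u, v), w in edge_weights.items():
        adjacency[u][v] = w
        adjacency[v][u] = w

    coarse_of: Dict[int, int] = {}
    next_id = 0
    for u in node_weights:  # visiting order is arbitrary in this sketch
        if u in coarse_of:
            continue
        coarse_of[u] = next_id
        free = [(w, v) for v, w in adjacency[u].items() if v not in coarse_of]
        if free:
            # Merge u with the unmatched neighbor across its heaviest edge.
            _, partner = max(free, key=lambda item: item[0])
            coarse_of[partner] = next_id
        next_id += 1

    coarse_nodes: Dict[int, float] = {}
    for u, w in node_weights.items():
        coarse_nodes[coarse_of[u]] = coarse_nodes.get(coarse_of[u], 0.0) + w
    coarse_edges: Dict[Tuple[int, int], float] = {}
    for (u, v), w in edge_weights.items():
        cu, cv = coarse_of[u], coarse_of[v]
        if cu != cv:  # edges inside a merged pair disappear
            key = (min(cu, cv), max(cu, cv))
            coarse_edges[key] = coarse_edges.get(key, 0.0) + w
    return coarse_nodes, coarse_edges, coarse_of
```

A partition of the coarse graph is propagated back by giving each fine node the part of its coarse node, for example `{u: coarse_parts[coarse_of[u]] for u in coarse_of}`, after which a local refinement pass can adjust the boundary.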
24. Multi-level Graph Partitioning
- Advantages
- High quality partitions for many applications.
- Explicit control of communication costs.
- Widely used for static partitioning (Chaco,
METIS, Party, Scotch)
- Disadvantages
- More expensive than geometric approaches.
- Not incremental.
25. Diffusive Graph Partitioning
- Cybenko (1989); Hu and Blake (1995)
- Work is moved from heavily loaded processors to
more lightly loaded neighbors.
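A first-order diffusion step illustrates how work flows toward lightly loaded neighbors. The sketch below works on aggregate load per processor only. The function name `diffusion_step` and the fixed coefficient `alpha` are assumptions for the example; a real partitioner must also decide which objects to move.

```python
from typing import List, Sequence


def diffusion_step(
    loads: Sequence[float],
    neighbors: Sequence[Sequence[int]],
    alpha: float,
) -> List[float]:
    """One first-order diffusion sweep over a symmetric processor graph.

    Each processor exchanges alpha * (load difference) with every neighbor.
    Assumption: alpha <= 1 / (max_degree + 1), which keeps the iteration stable.
    """
    new_loads = list(loads)
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            new_loads[i] += alpha * (loads[j] - loads[i])
    return new_loads
```

On a ring of four processors with all work initially on one of them, repeated sweeps conserve the total load and converge to an even distribution, but only after several iterations. This matches the need for multiple iterations to reach global balance.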
26. Diffusive Graph Partitioning
- Advantages
- Local and parallel.
- Inexpensive.
- Incremental.
- Disadvantages
- Several iterations needed for global balance.
- Partition quality can degrade.
- Hybrid approach may work best.
27. Graph Partitioning in Zoltan
- Zoltan provides interfaces to popular parallel
graph partitioning packages.
- ParMETIS (U. Minnesota)
- PJostle (U. Greenwich)
- Both ParMETIS and PJostle include
- Multilevel graph partitioning
- Diffusive partitioning
- Hybrids of the two strategies
- Multi-constraint partitioning
- Zoltan interface: simple callbacks for neighbor
lists.
- Zoltan builds complicated graph data structures
needed by graph-partitioning packages.
28. Zoltan Data Services
29. Zoltan Data Migration Tools
- Data must be moved for new decomposition.
- Depends strongly on application data structures.
- Complicated communication patterns.
- Zoltan can help!
- Application supplies query functions to
pack/unpack data.
- Zoltan does all communication to new processors.
30. Zoltan Matrix Ordering Interface
- Produce fill-reducing ordering for sparse matrix
factorization.
- Generic matrix-ordering interface in Zoltan.
- Easy to add new ordering algorithms.
- Specific interface to ordering methods in
ParMETIS.
31. Zoltan Unstructured Communication Package
- Simple primitives for efficient irregular
communication.
- Zoltan_Comm_Create: Generates communication plan.
- Processors and amount of data to send and
receive.
- Zoltan_Comm_Do: Send data using plan.
- Can reuse plan. (Same plan, different data.)
- Zoltan_Comm_Do_Reverse: Inverse communication.
- Used for most communication in Zoltan.
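The plan-then-execute pattern can be imitated in a single process to show why a plan is reusable. In the sketch below, `comm_create` and `comm_do` are hypothetical in-memory stand-ins, not the Zoltan functions or their signatures. Processors are simulated as list indices and no real message passing takes place.

```python
from typing import Any, Dict, List, Sequence


def comm_create(send_to: Sequence[Sequence[int]]) -> List[Dict[str, Dict[int, Any]]]:
    """Build one plan per simulated processor.

    send_to[p][k] is the destination processor of the k-th item held by p.
    Each plan records which local items go to which destination and how many
    items will arrive from each source.
    """
    plans: List[Dict[str, Dict[int, Any]]] = [
        {"sends": {}, "recv_counts": {}} for _ in send_to
    ]
    for src, destinations in enumerate(send_to):
        for position, dst in enumerate(destinations):
            plans[src]["sends"].setdefault(dst, []).append(position)
    for src, plan in enumerate(plans):
        for dst, positions in plan["sends"].items():
            plans[dst]["recv_counts"][src] = len(positions)
    return plans


def comm_do(plans: List[Dict[str, Dict[int, Any]]], data: Sequence[Sequence[Any]]) -> List[List[Any]]:
    """Deliver data according to an existing plan; returns the inbox of each processor."""
    inbox: List[List[Any]] = [[] for _ in plans]
    for src, plan in enumerate(plans):
        for dst in sorted(plan["sends"]):
            inbox[dst].extend(data[src][pos] for pos in plan["sends"][dst])
    return inbox
```

The plan depends only on where items go, not on their values, so the same plan can be applied to a second data set with the same layout (for example, IDs first and coordinates later).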
32. Zoltan Dynamic Memory Package
- Support for debugging dynamic memory usage.
- Tracking of mallocs and frees.
- Memory-leak warnings.
- Source code file and line numbers of operations.
- Simple allocation of multi-dimensional arrays.
- Simple to use.
- Replace calls to malloc, free with ZOLTAN_MALLOC,
ZOLTAN_FREE.
- Link with memory package library.
33. Zoltan Distributed Data Directory
- Helps applications locate off-processor data.
- Rendezvous algorithm (Pinar, 2001).
- Directory distributed in known way (hashing)
across processors.
- Requests for object location sent to processor
storing the object's directory entry.
- Easy to use.
- Functions to create, update, search, destroy.
- Customizable data storage, user distribution.
- Scalable performance.
- Constant communication cost for look-ups.
- Linear total memory usage.
- Avoids communication bottlenecks.
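The rendezvous idea can be sketched with one table per simulated processor. The class name `ToyDirectory`, its method names, and the modulo hash are assumptions for illustration. They are not the Zoltan directory API, and no real communication happens here.

```python
from typing import Dict, List, Optional


class ToyDirectory:
    """Single-process sketch of a hashed, distributed directory of object owners."""

    def __init__(self, num_procs: int) -> None:
        self.num_procs = num_procs
        # tables[p] holds the directory entries that processor p is responsible for.
        self.tables: List[Dict[int, int]] = [{} for _ in range(num_procs)]

    def home(self, global_id: int) -> int:
        """Processor storing the entry for global_id (assumption: simple modulo hash)."""
        return global_id % self.num_procs

    def update(self, global_id: int, owner: int) -> None:
        self.tables[self.home(global_id)][global_id] = owner

    def find(self, global_id: int) -> Optional[int]:
        # Any processor can compute home() locally, so a look-up needs one request.
        return self.tables[self.home(global_id)].get(global_id)

    def remove(self, global_id: int) -> None:
        self.tables[self.home(global_id)].pop(global_id, None)
```

Because every processor can compute the home of an entry without asking anyone, a look-up costs one request and one reply regardless of the number of processors, and each entry is stored exactly once.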
34. Zoltan Toolkit Summary
- Data-structure neutral design.
- Application need not use/build prescribed data
structures.
- High-quality implementations of many
partitioners.
- No single algorithm is appropriate for all
applications.
- Suite of algorithms allows experimentation and
comparison.
- Data management tools for dynamic applications.
- Data migration, unstructured communication,
memory management.
- Uses of Zoltan
- Effective toolkit for many different
applications.
- Research test-bed for new algorithm development.
- Interface for new graph-, tree-, or
geometry-based tools.
35. Zoltan Interface
- Simple, easy-to-use interface.
- Small number of callable Zoltan functions.
- Callable from C, C++, Fortran.
- Data-structure neutral design.
- Supports wide range of applications and data
structures.
- Imposes no restrictions on application's data
structures.
- Application does not have to build Zoltan's data
structures.
- Only requirement: unique global IDs for objects.
- Application interface
- Zoltan queries the application for needed info.
- IDs of objects, coordinates, relationships to
other objects.
- Application provides simple functions to answer
queries.
- Small extra costs in memory and function-call
overhead.
36. Zoltan Query Functions
- Query mechanism supports
- Geometric algorithms
- Queries for dimensions, coordinates, etc.
- Graph-based algorithms
- Queries for edge lists, edge weights, etc.
- Tree-based algorithms
- Queries for parent/child relationships, etc.
- Once query functions are implemented, application
can access all Zoltan functionality.
- Can switch between algorithms by setting
parameters.
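The callback design can be illustrated with a toy balancer that never sees the application's data structures and only calls registered query functions. Everything in this sketch is hypothetical: `ToyBalancer`, the query names `object_ids` and `coordinates`, and the `METHOD` values are invented for the example and are not Zoltan's interface.

```python
from typing import Callable, Dict, Hashable, List


class ToyBalancer:
    """Hypothetical callback-driven partitioner (not the Zoltan API)."""

    def __init__(self) -> None:
        self.params: Dict[str, str] = {"METHOD": "BLOCK"}
        self.queries: Dict[str, Callable] = {}

    def set_param(self, name: str, value: str) -> None:
        self.params[name] = value

    def set_fn(self, kind: str, fn: Callable) -> None:
        """Register an application function that answers one kind of query."""
        self.queries[kind] = fn

    def partition(self, num_parts: int) -> Dict[Hashable, int]:
        ids: List[Hashable] = list(self.queries["object_ids"]())
        method = self.params["METHOD"]
        if method == "COORD_SORT":
            # Geometric method: needs the coordinate query as well.
            coords = self.queries["coordinates"]
            ids.sort(key=lambda gid: coords(gid)[0])
        elif method != "BLOCK":
            raise ValueError(f"unknown method: {method}")
        count = len(ids)
        # Split the (possibly reordered) ID list into equal-sized contiguous blocks.
        return {gid: k * num_parts // count for k, gid in enumerate(ids)}


# Application side: data stays in the application's own structure (here a dict).
mesh = {10: (3.0, 0.0), 11: (0.0, 0.0), 12: (2.0, 0.0), 13: (1.0, 0.0)}
balancer = ToyBalancer()
balancer.set_fn("object_ids", lambda: mesh.keys())
balancer.set_fn("coordinates", lambda gid: mesh[gid])
block_parts = balancer.partition(2)
balancer.set_param("METHOD", "COORD_SORT")  # switch algorithms with one parameter
geometric_parts = balancer.partition(2)
```

The application code that registers the queries stays the same when the method changes. Only the parameter value differs between the two calls.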
37. Example Zoltan Application Interface
APPLICATION
- Initialize Zoltan (Zoltan_Initialize,
Zoltan_Create)
- Select LB Method (Zoltan_Set_Params)
- Register query functions (Zoltan_Set_Fn)
- COMPUTE
- Re-partition (Zoltan_LB_Partition)
- Move data (Zoltan_Migrate)
- Clean up (Zoltan_Destroy)
38. Current Development in Zoltan
- Partitioning for complex objectives.
- Communication and computation.
- Overlapped preconditioners.
- Work with Pinar (LBL).
- Heterogeneous partitioning.
- Model machine as hierarchy of components.
- Partition each level of hierarchy.
- Work with Flaherty (RPI), Teresco (Williams).
- Multiconstraint geometric partitioning.
- Find one partition that is good with respect to
multiple weights.
- Graph-based multiconstraint partitioning
available in Zoltan through ParMETIS 3.0.
39. For More Information...
- Zoltan Home Page
- http://www.cs.sandia.gov/Zoltan
- User's and Developer's Guides
- Download Zoltan software under GNU LGPL.
- Email
- zoltan_at_cs.sandia.gov
- kddevin_at_sandia.gov
- rheaphy_at_sandia.gov