About This Presentation

Title: Reconnect

Description: Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, ... In development: ALPS/BiCePs/BLIS ...

Slides: 35

Provided by: willia112

Learn more at: http://archive.dimacs.rutgers.edu

Transcript and Presenter's Notes

1
Reconnect 04: Introduction to PICO

Cynthia Phillips, Sandia National Laboratories
Joint work with
Jonathan Eckstein, Rutgers
William E. Hart, Sandia National Laboratories

2
Parallel Computing Systems

A set of processors (from 2 up to tens of
thousands) working together on a problem
communicating by messages (even if hidden from
user)
Architectures
Grid
Network of workstations (LAN)
Beowulf cluster
Tightly-coupled system

3
Parallelism in Branch and Bound

Two sources of parallelism in B&B
Within subproblems
Across subproblems
Warning:
Parallelism can solve problems that are otherwise unsolvable, but
a constant-factor increase in processors (even
10,000) cannot overcome exponential growth.
We still have to be clever.
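
The warning can be made concrete with a little arithmetic. The sketch below assumes an idealized search tree with `branching**depth` nodes and perfect speedup; both are simplifying assumptions for illustration, and `extra_depth` is a hypothetical helper name.

```python
import math

def extra_depth(processors: int, branching: int = 2) -> float:
    """Extra tree depth that `processors` buy at perfect speedup.

    Assumes the tree has branching**depth nodes, so a P-fold speedup
    only covers log_branching(P) additional levels.
    """
    return math.log(processors, branching)

# 10,000 processors buy roughly 13 extra levels of a binary tree.
gain = extra_depth(10_000)
```

Under these assumptions, a problem that needs dozens of additional branching levels stays out of reach no matter how many processors are added.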

4
Parallelism Issues for Branch and Bound

In the best of cases, all the processors are busy
all the time doing useful, independent work
Overhead (coordination, exchange of data)
Load balancing
What to do when the tree is small?
Tree shape depends on order of node evaluation
Can lead to slowdown anomalies
Try to emulate a good serial ordering
We'd do a lot better with a single processor
1000x faster

5
Parallel Experimental Algorithmics/Engineering
Issues

Inherent nondeterminism
Parallel random number generators
e.g. for randomized algorithms
Debugging
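
One common way to tame nondeterminism from random numbers is to give every processor its own reproducible stream. The following is a minimal sketch, not PICO's approach: `make_stream` and its seeding rule (run seed combined with processor rank) are assumptions made for illustration, and production codes usually rely on generators designed specifically for independent parallel streams.

```python
import random

def make_stream(base_seed: int, rank: int) -> random.Random:
    """Return a private generator for one processor.

    Hypothetical seeding rule: the run-wide seed and the processor
    rank are combined into a string seed, so reruns with the same
    seed and rank reproduce the same sequence.
    """
    return random.Random(f"{base_seed}-{rank}")

# Each rank draws from its own stream; reruns are repeatable.
streams = [make_stream(2004, rank) for rank in range(4)]
draws = [s.random() for s in streams]
```

Repeatable per-rank streams also help with debugging, since a failing run can be replayed with the same random choices on each processor.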

6
Solution Options for Integer Programming

Commercial codes (ILOG's CPLEX)
Good and getting better
Expensive
Serial (or modest SMP)
Free serial codes (ABACUS, MINTO, BCP)
Modest-level parallel codes (Symphony)
Grid parallelism (FATCOP)
In development: ALPS/BiCePs/BLIS
Massive parallelism: PICO (Parallel Integer and
Combinatorial Optimizer)
Note: parallel B&B for simple bounding: PUBB,
BoB/BOB, PPBB-lib, Mallba, Zram

7
Parallel Integer and Combinatorial Optimizer
(PICO)

Distributed memory (MPI), C++
Massively parallel (scalable)
General parallel Branch and Bound environment
Portable, flexible
Serial, small LAN, Cplant, ASCI Red, Red Storm
Allows exploitation of problem-specific
knowledge/structure
Open Source release
Always support a free LP solver

8
PICO Features for Efficient Parallel B&B

Efficient processor use during ramp-up
Integration of heuristics to generate good
solutions early
Efficient work storage/distribution
Load balancing
Non-preemptive proportional-share thread
scheduler
Flexible hub/worker interaction
Subproblem states with flexible search strategy
Correct termination
Early output

9
What To Do With 9000 Processors and One
Subproblem?

Option 1: Presplitting
Make log P branching choices and expand all ways
(P problems)
P processors
BAD!
Expands many problems that would be fathomed in a
serial solution.
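
Presplitting as described above can be sketched as follows, assuming binary branching variables. The function name `presplit` and the choice of simply taking the first variables in the list are assumptions for illustration.

```python
import itertools
import math

def presplit(num_processors: int, branch_vars: list[str]) -> list[dict[str, int]]:
    """Fix ceil(log2 P) binary variables in every possible way."""
    k = math.ceil(math.log2(num_processors))
    if k > len(branch_vars):
        raise ValueError("not enough variables to presplit")
    chosen = branch_vars[:k]
    # One subproblem per 0/1 assignment of the chosen variables.
    return [dict(zip(chosen, bits))
            for bits in itertools.product((0, 1), repeat=k)]

subproblems = presplit(8, ["x1", "x2", "x3", "x4"])
```

Every one of these subproblems is expanded unconditionally, which is exactly why the approach wastes work: a serial search would have fathomed many of them using bounds found along the way.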

10
PICO MIP Ramp-up

Serialize tree growth
All processors work in parallel on a single node
Parallelize
LP bounding
Preprocessing
Cutting plane generation
Incumbent Heuristics
Pseudocost (gradient) initialization
Work division by processor ID/rank
Crossover to parallel with perfect load balance
When there are enough subproblems to keep the
processors busy
When single subproblems cannot effectively use
parallelism
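
Work division by processor ID/rank during ramp-up can be sketched as below. This is only an illustration of the idea: `my_share` is a hypothetical helper, and the round-robin rule is an assumption about how candidates (for example, variables whose pseudocosts need initializing) might be split.

```python
def my_share(candidates: list, rank: int, num_procs: int) -> list:
    """Candidates this processor handles: every num_procs-th item, offset by rank."""
    return candidates[rank::num_procs]

# Ten branching candidates divided among four processors.
variables = list(range(10))
shares = [my_share(variables, rank, 4) for rank in range(4)]
```

Because every processor can compute its own share from its rank alone, no communication is needed to divide the work; only the results have to be combined afterwards.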

11
Parallel Incumbent Search

Genetic algorithms
Decomposition-based methods (general)
Pivot, cut, and dive general heuristic
Custom Methods

12
Interior-Point Method for Solving the Root Problem

Mehrotra's predictor-corrector (primal-dual)
method
Iterative method where the computational core of
each iteration is the solution of a linear system
with constraint matrix
A is the original LP constraint matrix.
D is a diagonal matrix that changes each
iteration.
Direct Cholesky Solvers OK for moderate
parallelism
Iterative methods
Preconditioning is a big issue
Support theory can help if the matrix has network
structure
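
The matrix expression on this slide is missing from the transcript. As an assumption, a common form of the linear system at the core of primal-dual interior-point methods is the normal-equations system below. The symbols $\Delta y$ (the dual step) and $r$ (a right-hand side built from the current residuals) are introduced here for illustration, and whether the slide wrote the diagonal scaling as $D$ or $D^2$ cannot be recovered.

```latex
\left( A D A^{T} \right) \Delta y = r
```

When $A$ has full row rank and $D$ has positive diagonal entries, $A D A^{T}$ is symmetric positive definite, which is what makes a Cholesky factorization applicable.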

13
Resolving LP on Subproblems
[Figure: original LP feasible region with a cutting plane (valid inequality or branch constraint), the LP optimal solution, and the integer optimal solution.]

Dual simplex is much faster than starting over
Need parallel dual simplex!

14
Hubs and Workers

Each hub controls some number of workers (can
work itself)
By setting parameters, can go from fully centralized
to fully distributed
Subproblem pools at both the hub and workers
Heap (best-first), stack (depth-first), queue
(breadth-first), custom
Hubs only keep tokens
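
The three standard pool disciplines named above can be sketched with Python's standard containers. The class names are hypothetical, and the heap assumes minimization (smallest bound is best).

```python
import heapq
from collections import deque

class HeapPool:
    """Best-first: pop the subproblem with the smallest bound."""
    def __init__(self):
        self._items, self._count = [], 0
    def push(self, bound, sp):
        # The counter breaks ties so subproblems are never compared.
        heapq.heappush(self._items, (bound, self._count, sp))
        self._count += 1
    def pop(self):
        return heapq.heappop(self._items)[2]

class StackPool:
    """Depth-first: pop the most recently added subproblem."""
    def __init__(self):
        self._items = []
    def push(self, bound, sp):
        self._items.append(sp)
    def pop(self):
        return self._items.pop()

class QueuePool:
    """Breadth-first: pop the oldest subproblem."""
    def __init__(self):
        self._items = deque()
    def push(self, bound, sp):
        self._items.append(sp)
    def pop(self):
        return self._items.popleft()

def first_out(pool):
    for bound, name in [(5.0, "a"), (1.0, "b"), (3.0, "c")]:
        pool.push(bound, name)
    return pool.pop()
```

Giving all pools the same `push`/`pop` interface is what lets the search strategy be swapped without touching the rest of the search loop.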

15
Subproblem Movement

Hub → Worker
When worker has low load or low-quality local
pool
Worker → Hub
Draw back when hub out of work and cluster
unbalanced
Send new subproblem tokens to hub
(probabilistically) depending on load
Probabilistically scatter tokens to a random hub.
If load in cluster is high relative to others,
scatter probability increases.
By setting parameters, can go from pure master-slave to
local
Tradeoffs: communication, processor utilization,
approximation of serial search order
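
The probabilistic scattering rule can be illustrated with a small sketch. The slides only say that the scatter probability increases when a cluster's load is high relative to others; the specific formula, the parameters `base` and `cap`, and both function names below are assumptions, not PICO's actual rule.

```python
import random

def scatter_probability(cluster_load, mean_load, base=0.05, cap=0.9):
    """Hypothetical rule: scale a base probability by relative load."""
    if mean_load <= 0:
        return base
    return min(cap, base * cluster_load / mean_load)

def choose_hub(my_hub, hubs, cluster_load, mean_load, rng):
    """Return the hub that should receive a new subproblem token."""
    if rng.random() < scatter_probability(cluster_load, mean_load):
        return rng.choice(hubs)  # scatter to a random hub
    return my_hub                # keep the token in this cluster

rng = random.Random(0)
target = choose_hub("hub0", ["hub0", "hub1", "hub2"], 40.0, 10.0, rng)
```

Tuning parameters such as `base` and `cap` is one way to trade communication volume against how closely the parallel search follows a good serial ordering.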

16
Subproblem Movement/Data Storage
[Figure: diagram of a hub and workers. Labels: T (token), SP (subproblem), SP Server, SP Receiver.]

17
Load Balancing

Hub pullback
Random scattering
Rendezvous
Hubs determine load (function of quantity and
quality)
Use binary tree of hubs
Determine donors and receivers, match them,
exchange
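
The donor/receiver matching step of rendezvous load balancing can be sketched as below. This is a centralized simplification for illustration: the slides describe a binary tree of hubs, while this sketch assumes all loads are already known in one place. `match_hubs` and the `tolerance` threshold are hypothetical.

```python
def match_hubs(loads: dict, tolerance: float = 0.25) -> list:
    """Pair overloaded hubs (donors) with underloaded hubs (receivers)."""
    mean = sum(loads.values()) / len(loads)
    donors = sorted(
        (h for h, load in loads.items() if load > mean * (1 + tolerance)),
        key=lambda h: -loads[h],  # most loaded donor first
    )
    receivers = sorted(
        (h for h, load in loads.items() if load < mean * (1 - tolerance)),
        key=lambda h: loads[h],   # least loaded receiver first
    )
    # Each matched pair would then exchange subproblems.
    return list(zip(donors, receivers))

pairs = match_hubs({"h0": 100, "h1": 10, "h2": 50, "h3": 40})
```

Here load is a single number per hub; the slides note that in PICO it is a function of both the quantity and the quality of the work held.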

18
Non-Preemptive Scheduler is Sufficient for PICO

Processes are cooperating
Control is returned voluntarily so data
structures left in clean state
No memory access conflicts, no locks
PICO has its own thread scheduler
High priority, short threads are round robin and
done first
Hub communications, incumbent broadcasts, sending
subproblems
If these are delayed by long tasks, this could lead to:
Idle processors
Processors working on low-quality work
Compute threads use proportional-share (stride)
scheduling
Adjust during computation (e.g. between lower and
upper-bounding)
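
Stride scheduling, the proportional-share policy mentioned above, can be sketched in a few lines. This is the textbook algorithm, not PICO's scheduler: the class name, the `STRIDE1` constant, and the thread names are assumptions for illustration.

```python
import heapq

STRIDE1 = 1 << 20  # large constant divided by tickets to get a stride

class StrideScheduler:
    def __init__(self):
        self._queue = []  # entries: (pass value, sequence, name, stride)
        self._seq = 0

    def add(self, name, tickets):
        """Register a thread; more tickets means a larger share."""
        stride = STRIDE1 // tickets
        heapq.heappush(self._queue, (stride, self._seq, name, stride))
        self._seq += 1

    def next(self):
        """Pick the thread with the smallest pass value and advance it."""
        pass_value, seq, name, stride = heapq.heappop(self._queue)
        heapq.heappush(self._queue, (pass_value + stride, seq, name, stride))
        return name

sched = StrideScheduler()
sched.add("lower_bound", 3)  # three shares of compute time
sched.add("heuristic", 1)    # one share
runs = [sched.next() for _ in range(400)]
```

In a non-preemptive setting, each selected thread runs until it returns control voluntarily, so the shares hold only approximately when work units differ in length. Adjusting the ticket counts during a run shifts effort between threads.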

19
(No Transcript)
20
Subproblem States
Boundable
Being Bounded
Bounded
Being Separated
Separated
Dead
Handlers: lazy, eager, hybrid, build your own
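
The subproblem states can be modeled as a simple enumeration. The sketch assumes a subproblem moves through the states in the order listed; the `advance` helper is hypothetical, and a real search could also send a subproblem straight to Dead when it is fathomed.

```python
from enum import Enum

class State(Enum):
    BOUNDABLE = "boundable"
    BEING_BOUNDED = "being bounded"
    BOUNDED = "bounded"
    BEING_SEPARATED = "being separated"
    SEPARATED = "separated"
    DEAD = "dead"

ORDER = list(State)  # Enum members iterate in definition order

def advance(state: State) -> State:
    """Move to the next state in the listed order; Dead is final."""
    index = ORDER.index(state)
    return ORDER[min(index + 1, len(ORDER) - 1)]

state = State.BOUNDABLE
history = [state]
while state is not State.DEAD:
    state = advance(state)
    history.append(state)
```

Keeping the state explicit on each subproblem means a search handler can decide how far to push it before returning it to a pool.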
21
Early Output

Problem: if you have to abort a long run, you want to
know the variable settings for the incumbent
May be good enough to stop
Otherwise seed new search with the incumbent
value
PICO will save a new incumbent if
It is a strict improvement over the last saved
value (or is the first)
A sufficient time has passed since the last write
Requires a new message-triggered thread in
parallel
Hub, incumbent holder, I/O processor
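
The two save conditions can be sketched as a small gatekeeper class. This assumes minimization and assumes both conditions must hold before a write; the class name `IncumbentSaver` and the time unit are illustrative choices.

```python
import math

class IncumbentSaver:
    def __init__(self, min_interval_seconds):
        self.min_interval = min_interval_seconds
        self.last_value = math.inf   # nothing saved yet
        self.last_time = -math.inf   # so the first offer is never "too soon"

    def offer(self, value, now):
        """Return True (and record the save) if this incumbent should be written."""
        improved = value < self.last_value                   # strict improvement
        waited = now - self.last_time >= self.min_interval   # enough time passed
        if improved and waited:
            self.last_value, self.last_time = value, now
            return True
        return False

saver = IncumbentSaver(60)
decisions = [saver.offer(100, 0), saver.offer(90, 30), saver.offer(90, 61)]
```

The time threshold keeps a burst of rapid improvements early in the run from flooding the I/O processor with writes.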

22
Serial Class Structure - Inheritance

Branching classes - control search
Branchsub classes - subproblems (tree nodes)
Problem data classes - derived only

[Figure: class hierarchy diagram. Labels: PICO Core, AMPL Interface (optional), Nonlinear Branch & Prune, Knapsack, PICO MIP CORE, PICO BC CORE, MIP Application.]
23
Required Methods for Derived Node Class

bGlobal( ) - subproblem pointer to branching
(search control)
setRootComputation( ) - create the root of the
search tree
boundComputation( ) - compute subproblem lower
bound (for min)
splitComputation( ) - determine how to partition
the subproblem
makeChild( ) - create a child subproblem from a
split parent
candidateSolution( ) - determine whether a
proposed solution is a viable candidate for
optimality
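
To show how these methods fit together, here is a toy serial search in Python that borrows the method names from the slide. It is not PICO's C++ interface: the signatures, the `CoverNode` problem (pick items with total weight at least `demand` at minimum cost, with nonnegative costs), and the `solve` driver are assumptions for illustration, and `bGlobal( )` is omitted.

```python
import heapq
import math

class CoverNode:
    """Toy subproblem: items 0..len(fixed)-1 are decided, the rest are free."""

    def __init__(self, costs, weights, demand, fixed=()):
        self.costs, self.weights, self.demand = costs, weights, demand
        self.fixed = tuple(fixed)

    @classmethod
    def setRootComputation(cls, costs, weights, demand):
        return cls(costs, weights, demand)  # root: nothing fixed yet

    def _chosen_weight(self):
        return sum(w for w, x in zip(self.weights, self.fixed) if x)

    def boundComputation(self):
        free_weight = sum(self.weights[len(self.fixed):])
        if self._chosen_weight() + free_weight < self.demand:
            return math.inf  # demand can no longer be met: infeasible
        # Lower bound: cost already committed (valid for nonnegative costs).
        return sum(c for c, x in zip(self.costs, self.fixed) if x)

    def candidateSolution(self):
        # Feasible as soon as chosen items cover demand (free items set to 0).
        return self._chosen_weight() >= self.demand

    def splitComputation(self):
        return 2 if len(self.fixed) < len(self.costs) else 0

    def makeChild(self, which):
        return CoverNode(self.costs, self.weights, self.demand,
                         self.fixed + (which,))

def solve(costs, weights, demand):
    best_value, best_fixed = math.inf, None
    root = CoverNode.setRootComputation(costs, weights, demand)
    pool = [(root.boundComputation(), 0, root)]  # best-first heap
    counter = 1
    while pool:
        bound, _, node = heapq.heappop(pool)
        if bound >= best_value:
            continue  # fathomed by the incumbent
        if node.candidateSolution():
            best_value, best_fixed = bound, node.fixed
            continue
        for which in range(node.splitComputation()):
            child = node.makeChild(which)
            child_bound = child.boundComputation()
            if child_bound < best_value:
                heapq.heappush(pool, (child_bound, counter, child))
                counter += 1
    return best_value, best_fixed

result = solve([4, 3, 5], [3, 2, 4], 5)
```

The driver never looks inside a node; it only calls the bounding, splitting, child-creation, and candidate-check methods, which is the separation between search control and problem-specific logic that the derived-class design aims for.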

24
Optional Customizations