Title: Reconnect
1Reconnect 04: Introduction to PICO
- Cynthia Phillips, Sandia National Laboratories
- Joint work with
- Jonathan Eckstein, Rutgers
- William E. Hart, Sandia National Laboratories
2Parallel Computing Systems
- A set of processors (from 2 up to tens of thousands) working together on a problem
- Communicating by messages (even if hidden from the user)
- Architectures
- Grid
- Network of workstations (LAN)
- Beowulf cluster
- Tightly-coupled system
3Parallelism in Branch and Bound
- Two sources of parallelism in B&B
- Within subproblems
- Across subproblems
- Warning
- Can solve problems otherwise unsolvable, but
- A constant-factor increase in processors (even 10,000) cannot overcome exponential growth.
- We still have to be clever
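The warning can be made concrete with a little arithmetic. The sketch below assumes an idealized search tree that grows by a fixed branching factor per level and assumes perfect parallel speedup; both are simplifications, and `extra_depth` is an illustrative name, not something from PICO.

```python
import math

def extra_depth(processors: int, branching_factor: int = 2) -> float:
    """Extra tree levels that `processors` can absorb, assuming perfect
    speedup and a tree whose size grows as branching_factor ** depth."""
    return math.log(processors, branching_factor)

def report(processor_counts=(2, 100, 10_000)):
    """Print the extra depth for a few processor counts."""
    for p in processor_counts:
        print(f"{p:>6} processors: about {extra_depth(p):.1f} extra levels")
```

Under these assumptions, 10,000 processors buy only about 13 more levels of a binary tree, which is why clever bounding and branching still matter.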
4Parallelism Issues for Branch and Bound
- In the best of cases, all the processors are busy all the time doing useful, independent work
- Overhead (coordination, exchange of data)
- Load balancing
- What to do when the tree is small?
- Tree shape depends on order of node evaluation
- Can lead to slowdown anomalies
- Try to emulate a good serial ordering
- We'd do a lot better with a single processor 1000x faster
5Parallel Experimental Algorithmics/Engineering Issues
- Inherent nondeterminism
- Parallel random number generators
- e.g. for randomized algorithms
- Debugging
6Solution Options for Integer Programming
- Commercial codes (ILOG's CPLEX)
- Good and getting better
- Expensive
- Serial (or modest SMP)
- Free serial codes (ABACUS, MINTO, BCP)
- Modest-level parallel codes (Symphony)
- Grid parallelism (FATCOP)
- In development: ALPS/BiCePS/BLIS
- Massive parallelism: PICO (Parallel Integer and Combinatorial Optimizer)
- Note: Parallel B&B for simple bounding: PUBB, BoB/BOB, PPBB-lib, Mallba, Zram
7Parallel Integer and Combinatorial Optimizer (PICO)
- Distributed memory (MPI), C++
- Massively parallel (scalable)
- General parallel Branch & Bound environment
- Portable, flexible
- Serial, small LAN, Cplant, ASCI Red, Red Storm
- Allows exploitation of problem-specific knowledge/structure
- Open Source release
- Always support a free LP solver
8PICO Features for Efficient Parallel B&B
- Efficient processor use during ramp-up
- Integration of heuristics to generate good solutions early
- Efficient work storage/distribution
- Load balancing
- Non-preemptive proportional-share thread scheduler
- Flexible hub/worker interaction
- Subproblem states with flexible search strategy
- Correct termination
- Early output
9What To Do With 9000 Processors and One Subproblem?
- Option 1: Presplitting
- Make log P branching choices and expand all ways (P problems)
- P = number of processors
- BAD!
- Expands many problems that would be fathomed in a serial solution.
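A minimal sketch of what presplitting does, assuming binary branching variables. The function name `presplit` and the variable names are hypothetical, chosen only for illustration.

```python
import itertools
import math

def presplit(num_processors: int, branch_vars: list) -> list:
    """Fix the first ceil(log2 P) binary variables in every possible way,
    giving roughly one subproblem per processor."""
    k = math.ceil(math.log2(num_processors))
    chosen = branch_vars[:k]
    return [dict(zip(chosen, bits))
            for bits in itertools.product((0, 1), repeat=len(chosen))]

# Hypothetical example: 8 processors, so 3 variables are fixed up front.
subproblems = presplit(8, ["x1", "x2", "x3", "x4"])
```

Every one of these subproblems is expanded unconditionally, whether or not a serial search with a good incumbent would have fathomed its ancestor first, which is the objection raised above.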
10PICO MIP Ramp-up
- Serialize tree growth
- All processors work in parallel on a single node
- Parallelize
- LP bounding
- Preprocessing
- Cutting plane generation
- Incumbent Heuristics
- Pseudocost (gradient) initialization
- Work division by processor ID/rank
- Crossover to parallel with perfect load balance
- When there are enough subproblems to keep the processors busy
- When single subproblems cannot effectively use parallelism
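Work division by processor ID/rank can be sketched as a round-robin split of a shared task list, for example the candidate variables for pseudocost initialization. This is an assumption about one simple way to divide work, not PICO's actual scheme; in a real MPI run, `rank` and `size` would come from the communicator, whereas here they are plain arguments.

```python
def my_share(items, rank, size):
    """Return the items that processor `rank` (0-based) out of `size` handles."""
    if not 0 <= rank < size:
        raise ValueError("rank must satisfy 0 <= rank < size")
    return items[rank::size]

# Hypothetical example: 10 branching candidates split over 4 processors.
candidates = [f"x{i}" for i in range(10)]
shares = [my_share(candidates, r, 4) for r in range(4)]
```

Because every processor derives its share from its own rank, no messages are needed to agree on who does what, and the shares differ in size by at most one item.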
11Parallel Incumbent Search
- Genetic algorithms
- Decomposition-based methods (general)
- Pivot, cut, and dive general heuristic
- Custom Methods
12Interior-Point Method for Solving the Root Problem
- Mehrotra's predictor-corrector (primal-dual) method
- Iterative method where the computational core of each iteration is the solution of a linear system whose matrix is built from A and D
- A is the original LP constraint matrix.
- D is a diagonal matrix that changes each iteration.
- Direct Cholesky solvers OK for moderate parallelism
- Iterative methods
- Preconditioning is a big issue
- Support theory can help if the matrix has network structure
13Resolving LP on Subproblems
[Figure: original LP feasible region, LP optimal solution, integer optimal solution, and a cutting plane (valid inequality or branch constraint)]
- Dual simplex is much faster than starting over
- Need parallel dual simplex!
14Hubs and Workers
- Each hub controls some number of workers (can work itself)
- Setting parameters, can go from fully centralized to fully distributed
- Subproblem pools at both the hub and workers
- Heap (best-first), stack (depth-first), queue (breadth-first), custom
- Hubs only keep tokens
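The three standard pool types differ only in which subproblem comes out next. The sketch below shows that with Python's standard containers; the class and method names are hypothetical and do not mirror PICO's C++ pool classes.

```python
import heapq
from collections import deque

class BestFirstPool:
    """Heap: the subproblem with the smallest bound comes out first
    (assuming minimization)."""
    def __init__(self):
        self._heap = []
        self._count = 0  # tie-breaker so subproblem payloads are never compared

    def insert(self, bound, subproblem):
        heapq.heappush(self._heap, (bound, self._count, subproblem))
        self._count += 1

    def remove(self):
        return heapq.heappop(self._heap)[2]

class DepthFirstPool:
    """Stack: the most recently inserted subproblem comes out first."""
    def __init__(self):
        self._stack = []

    def insert(self, bound, subproblem):
        self._stack.append(subproblem)

    def remove(self):
        return self._stack.pop()

class BreadthFirstPool:
    """Queue: the oldest subproblem comes out first."""
    def __init__(self):
        self._queue = deque()

    def insert(self, bound, subproblem):
        self._queue.append(subproblem)

    def remove(self):
        return self._queue.popleft()
```

Giving all pools the same `insert`/`remove` interface is what lets the search strategy be swapped, or replaced by a custom pool, without touching the rest of the search.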
15Subproblem Movement
- Hub -> Worker
- When worker has low load or low-quality local pool
- Worker -> Hub
- Draw back when hub out of work and cluster unbalanced
- Send new subproblem tokens to hub (probabilistically) depending on load
- Probabilistically scatter tokens to a random hub. If load in cluster is high relative to others, scatter probability increases.
- Setting parameters, go from pure master-slave to local
- Tradeoffs: communication, processor utilization, approximation of serial search order
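The scattering rule can be illustrated with a probability that rises as a cluster's load grows relative to the average. The linear ramp and the `base` and `maximum` parameters below are assumptions made for this sketch; the slides do not give PICO's actual formula.

```python
import random

def scatter_probability(cluster_load, average_load, base=0.05, maximum=0.9):
    """Probability of sending a new token to a random other hub.
    Hypothetical rule: `base` when the cluster is at or below average load,
    rising linearly to `maximum` at twice the average load."""
    if average_load <= 0:
        return base
    ratio = cluster_load / average_load
    if ratio <= 1.0:
        return base
    return min(maximum, base + (maximum - base) * (ratio - 1.0))

def choose_destination(my_hub, hubs, cluster_load, average_load, rng=random):
    """Return the hub that should receive a newly created token."""
    others = [h for h in hubs if h != my_hub]
    if others and rng.random() < scatter_probability(cluster_load, average_load):
        return rng.choice(others)
    return my_hub
```

Raising `base` toward 1 pushes the system toward fully scattered work; lowering it keeps work local and reduces communication, which is the tradeoff listed above.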
16Subproblem Movement/Data Storage
[Figure: hub and worker subproblem storage. Labels: Hub, Worker, T (token), SP (subproblem), SP Server, SP Receiver]
17Load Balancing
- Hub pullback
- Random scattering
- Rendezvous
- Hubs determine load (function of quantity and quality)
- Use binary tree of hubs
- Determine donors and receivers, match them, exchange
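The donor/receiver matching step of rendezvous balancing might look like the following. This sketch computes everything in one place from a dictionary of hub loads; the binary tree of hubs used to gather loads is not modeled, and the `tolerance` threshold is a made-up parameter.

```python
def match_donors_receivers(loads, tolerance=0.25):
    """Pair overloaded hubs with underloaded hubs.

    loads: dict mapping hub name -> load (some function of the quantity
    and quality of its subproblems).
    Returns a list of (donor, receiver) pairs, heaviest donor matched
    with lightest receiver."""
    average = sum(loads.values()) / len(loads)
    donors = sorted((h for h, load in loads.items()
                     if load > average * (1 + tolerance)),
                    key=lambda h: loads[h], reverse=True)
    receivers = sorted((h for h, load in loads.items()
                        if load < average * (1 - tolerance)),
                       key=lambda h: loads[h])
    return list(zip(donors, receivers))

# Hypothetical loads for four hubs; the average is 50.
pairs = match_donors_receivers({"h0": 100, "h1": 10, "h2": 50, "h3": 40})
```

Hubs within the tolerance band take no part, so a well-balanced system exchanges nothing.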
18Non-Preemptive Scheduler is Sufficient for PICO
- Processes are cooperating
- Control is returned voluntarily so data structures left in clean state
- No memory access conflicts, no locks
- PICO has its own thread scheduler
- High priority, short threads are round robin and done first
- Hub communications, incumbent broadcasts, sending subproblems
- If these are delayed by long tasks could lead to
- Idle processors
- Processors working on low-quality work
- Compute threads are proportional share (stride) scheduling
- Adjust during computation (e.g. between lower and upper-bounding)
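Stride scheduling gives each compute thread a share of processor time proportional to its ticket count. The sketch below is a generic textbook-style version, not PICO's scheduler: the thread names and ticket counts are hypothetical, and the high-priority round-robin threads are not modeled.

```python
import heapq

STRIDE1 = 1 << 20  # large constant; a thread's stride is STRIDE1 // tickets

class StrideScheduler:
    """Non-preemptive proportional-share scheduler: always run the thread
    with the smallest pass value, then advance its pass by its stride."""
    def __init__(self):
        self._queue = []  # entries: (pass value, insertion order, name, stride)
        self._order = 0

    def add(self, name, tickets):
        stride = STRIDE1 // tickets
        heapq.heappush(self._queue, (stride, self._order, name, stride))
        self._order += 1

    def run_next(self):
        """Pick the next thread to run and charge it one time slice."""
        pass_value, order, name, stride = heapq.heappop(self._queue)
        heapq.heappush(self._queue, (pass_value + stride, order, name, stride))
        return name

# Hypothetical shares: bounding gets three times the share of the heuristic.
scheduler = StrideScheduler()
scheduler.add("bounding", tickets=3)
scheduler.add("heuristic", tickets=1)
counts = {"bounding": 0, "heuristic": 0}
for _ in range(400):
    counts[scheduler.run_next()] += 1
```

Adjusting shares during the computation, as the last bullet describes, would amount to changing a thread's ticket count and therefore its stride.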
20Subproblem States
- States: Boundable, Being Bounded, Bounded, Being Separated, Separated, Dead
- Handlers: lazy, eager, hybrid, build your own
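The subproblem states can be modeled as a small state machine. The transition rule below is an assumption made for illustration: a subproblem only moves forward through the states in the order listed, and any live subproblem may become Dead. The slides do not spell out PICO's exact transition rules.

```python
from enum import Enum, auto

class State(Enum):
    BOUNDABLE = auto()
    BEING_BOUNDED = auto()
    BOUNDED = auto()
    BEING_SEPARATED = auto()
    SEPARATED = auto()
    DEAD = auto()

# Live states in the order a subproblem is assumed to pass through them.
_ORDER = [State.BOUNDABLE, State.BEING_BOUNDED, State.BOUNDED,
          State.BEING_SEPARATED, State.SEPARATED]

def can_move(src: State, dst: State) -> bool:
    """Assumed rule: forward moves only; any live state may go to DEAD."""
    if src is State.DEAD:
        return False
    if dst is State.DEAD:
        return True
    return _ORDER.index(dst) > _ORDER.index(src)
```

A lazy handler would leave subproblems in an early state until they are selected, while an eager handler would bound them as soon as they are created; both are walks through the same states.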
21Early Output
- Problem: If you have to abort a long run, want to know variable settings for the incumbent
- May be good enough to stop
- Otherwise seed new search with the incumbent value
- PICO will save a new incumbent if
- It is a strict improvement over the last saved value (or is the first)
- A sufficient time has passed since the last write
- Requires a new message-triggered thread in parallel
- Hub, incumbent holder, I/O processor
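The two save conditions can be written as a small policy object. This is a sketch under assumptions: the class name, the `min_interval` parameter, and the choice to write the first incumbent immediately are illustrative, and the actual file write and the message-triggered thread are left out.

```python
class IncumbentSaver:
    """Decide when an incumbent is worth writing out."""
    def __init__(self, min_interval, minimize=True):
        self.min_interval = min_interval  # minimum time between writes
        self.minimize = minimize
        self.last_value = None            # value of the last saved incumbent
        self.last_time = None             # time of the last write

    def should_save(self, value, now):
        if self.last_value is None:
            return True                   # first incumbent: always save
        if self.minimize:
            improved = value < self.last_value
        else:
            improved = value > self.last_value
        waited = (now - self.last_time) >= self.min_interval
        return improved and waited

    def offer(self, value, now):
        """Record the incumbent if the policy allows; return True if saved."""
        if self.should_save(value, now):
            self.last_value, self.last_time = value, now
            return True
        return False
```

The time condition keeps a burst of small improvements early in the run from turning into a burst of disk writes.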
22Serial Class Structure - Inheritance
- Branching classes - control search
- Branchsub classes - subproblems (tree nodes)
- Problem data classes - derived only
[Figure: inheritance diagram. Boxes: PICO Core, Knapsack, Nonlinear Branch & Prune, PICO MIP Core, PICO BC Core, MIP Application, AMPL Interface (optional)]
23Required Methods for Derived Node Class
- bGlobal( ) - subproblem pointer to branching (search control)
- setRootComputation( ) - create the root of the search tree
- boundComputation( ) - compute subproblem lower bound (for min)
- splitComputation( ) - determine how to partition the subproblem
- makeChild( ) - create a child subproblem from a split parent
- candidateSolution( ) - determine whether a proposed solution is a viable candidate for optimality
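To show how these methods fit together, here is a Python analogue on a toy 0/1 knapsack problem, driven by a simple depth-first loop. Only the method names come from the list above; the signatures, return values, the `Branching` and `KnapsackNode` classes, and the `solve` driver are assumptions for illustration and do not reproduce PICO's C++ interface. Because this toy problem maximizes, the bound is an upper bound rather than the lower bound used for minimization.

```python
class Branching:
    """Search control: holds the problem data and the incumbent value."""
    def __init__(self, values, weights, capacity):
        # Sort by value density so the greedy fractional bound is valid.
        order = sorted(range(len(values)),
                       key=lambda i: values[i] / weights[i], reverse=True)
        self.values = [values[i] for i in order]
        self.weights = [weights[i] for i in order]
        self.capacity = capacity
        self.incumbent = 0

class KnapsackNode:
    def __init__(self, global_data):
        self._global = global_data
        self.depth = 0          # items 0..depth-1 have been decided
        self.value = 0
        self.weight = 0
        self.bound = float("inf")

    def bGlobal(self):
        return self._global

    def setRootComputation(self):
        self.depth = self.value = self.weight = 0

    def boundComputation(self):
        """Upper bound: fill greedily, taking the last item fractionally."""
        g = self.bGlobal()
        room = g.capacity - self.weight
        bound = self.value
        for i in range(self.depth, len(g.values)):
            if g.weights[i] <= room:
                room -= g.weights[i]
                bound += g.values[i]
            else:
                bound += g.values[i] * room / g.weights[i]
                break
        self.bound = bound

    def candidateSolution(self):
        return self.depth == len(self.bGlobal().values)

    def splitComputation(self):
        """Branch on the next undecided item; return the number of children."""
        return 2

    def makeChild(self, which):
        """Child 0 takes the item, child 1 skips it; None if infeasible."""
        g = self.bGlobal()
        take = which == 0
        child = KnapsackNode(g)
        child.depth = self.depth + 1
        child.value = self.value + (g.values[self.depth] if take else 0)
        child.weight = self.weight + (g.weights[self.depth] if take else 0)
        return child if child.weight <= g.capacity else None

def solve(values, weights, capacity):
    g = Branching(values, weights, capacity)
    root = KnapsackNode(g)
    root.setRootComputation()
    stack = [root]
    while stack:
        node = stack.pop()
        node.boundComputation()
        if node.bound <= g.incumbent:
            continue                      # fathomed by bound
        if node.candidateSolution():
            g.incumbent = node.value      # new best solution
            continue
        for which in range(node.splitComputation()):
            child = node.makeChild(which)
            if child is not None:
                stack.append(child)
    return g.incumbent
```

The driver never looks inside a node: it only calls the listed methods, which is what allows a generic search core to handle any application that supplies them.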
24Optional Customizations
- Incumbent heuristic
- Incumbent representation/update
- Solution output
- Solution validation
- Preprocessing
- Override default parameters
- In MIP
- Custom cutting planes
- Adjust branching priorities
- Plan to add more complex branching strategies
25All PICO's Parallelism Comes (Almost) For Free
[Figure: class diagram. Boxes: PICO serial Core, PICO parallel Core, Serial application, Parallel application]
- User must
- Define serial application (debug in serial)
- Describe how to pack/unpack data (using a generic packing tool)
- C++ inheritance gives parallel management
- User may add threads to
- Share global data
- Exploit problem-specific parallelism
- MIP pseudocosts
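Describing how to pack and unpack data means turning a subproblem into a flat byte buffer that can travel in a message, and rebuilding it on the other side. The sketch below does this with Python's `struct` module for a made-up `ToyNode`; it is an analogue of the idea only, not utilib's PackBuffer interface, and the field layout is an assumption.

```python
import struct

_HEADER = "<idi"  # depth (int32), bound (float64), number of fixed variables

class ToyNode:
    """Hypothetical subproblem: a depth, a bound, and 0/1 variable fixings."""
    def __init__(self, depth, bound, fixed):
        self.depth = depth
        self.bound = bound
        self.fixed = list(fixed)

    def pack(self) -> bytes:
        """Serialize to bytes suitable for sending in a message."""
        header = struct.pack(_HEADER, self.depth, self.bound, len(self.fixed))
        body = struct.pack(f"<{len(self.fixed)}b", *self.fixed)
        return header + body

    @classmethod
    def unpack(cls, buffer: bytes) -> "ToyNode":
        """Rebuild a node from bytes produced by pack()."""
        depth, bound, count = struct.unpack_from(_HEADER, buffer, 0)
        offset = struct.calcsize(_HEADER)
        fixed = struct.unpack_from(f"<{count}b", buffer, offset)
        return cls(depth, bound, fixed)

# Round trip: what one processor packs, another can unpack.
original = ToyNode(depth=3, bound=12.5, fixed=[1, 0, 1])
copy = ToyNode.unpack(original.pack())
```

Once an application supplies this pair of routines, the parallel layer can move its subproblems between processors without knowing what is inside them.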
26Utilib
- Predates STL
- Abstract data types: arrays, heaps, hash tables, balanced trees
- Random number generators
- Hash tables
- work well for doubles very close in value
- Arrays offer
- Protected access (bounds checking)
- Sharing
- PackBuffer methods facilitate parallelization
27Pieces of PICO
- PICO requires
- utilib (for data structures, math, etc.)
- COIN (an IBM-sponsored optimization interface standard)
- Base interface to LP solvers
- We add more PICO-specific functionality
- Cut generation library
- An LP solver
- Currently support CPLEX, SoPlex, CLP
28Using A Math Programming Language
- How easily can one bring up applications?
- In our world, applications are a moving target: need agility
29A Mathematical Programming Language (AMPL)
- AMPL builds the matrix.
- Nice cross between programming language and LaTeX (math view)
30AMPL-PICO Interface
[Figure: AMPL-PICO interface diagram. Labels: Model Files, Data Files, AMPL, LP, IP, Exact Solver PICO, Compute Approximate Solution, Cutting Planes]
- Write cutting-plane and approximate-solution code using AMPL variables
- Mapping transparent
31AMPL-PICO Interface
- Standard AMPL interfaces
- Customized PICO Interface
32Availability
- PICO will be free under the GNU Lesser General Public License
- MIP: requires serial LP solver
- CPLEX is expensive, but many companies/universities have it
- CLP is free (through COIN)
- Part of ACRO (A Common Repository for Optimizers)
- http://software.sandia.gov/Acro/
- Need password for CVS checkout; otherwise tarballs
33Open Problems (Wish List)
- Tools we'd like to see
- Parallel matrix generation from a math-programming interface
- Parallel (sparse) dual simplex solver for linear programming
- Open algorithms questions
- Ramp-up management: multiple subproblems in parallel
34Development Team
- Core Team
- Jonathan Eckstein (RUTCOR): PICO core
- Bill Hart (Sandia): scheduler, utilib, AMPL interface, design, etc.
- Cindy Phillips (Sandia): MIP layer, MIP applications
- Other Developers
- Harvey Greenberg (UCD): preprocessor design
- Vitus Leung (Sandia): preprocessor
- Tod Morrison (UCD, student): SoPlex interface, porting
- Mikhail Nediak (RUTCOR student, now McMaster): MIP heuristic
- Konrad Borys (RUTCOR student): core templatization, heuristic integration
- Mike Eldred (Sandia): DAKOTA optimization framework
- Ojas Parekh (ex-CMU student, Sandia): SoPlex interface
- Mario Alleva (Sandia): porting