1
Parallel Maximum Likelihood Fitting Using MPI
  • Brian Meadows, U. Cincinnati
  • and
  • David Aston, SLAC

2
What is MPI ?
  • Message Passing Interface - a standard defined
    for passing messages between processors (CPUs)
  • Communications interface to Fortran, C or C++
    (maybe others)
  • Definitions apply across different platforms (can
    mix Unix, Mac, etc.)
  • Parallelization of code is explicit - recognized
    and defined by users
  • Memory can be
    • shared between CPUs,
    • distributed among CPUs, OR
    • a hybrid of these
  • The number of CPUs is not pre-defined, but is
    fixed in any one application
  • The user specifies the required number of CPUs at
    job startup; it is not optimized at runtime.

3
How Efficient is MPI ?
  • The best you can do is speed up a job by a factor
    equal to the number of physical CPUs involved.
  • Factors limiting this (quantified below):
    • poor synchronization between CPUs due to
      unbalanced loads
    • sections of code that cannot be vectorized
    • signalling delays.
  • NOTE: it is possible to request more CPUs than
    physically exist, though this does produce some
    processing overhead!
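
A standard way to quantify the limit from non-vectorized
sections (Amdahl's law; not on the original slide): if a
fraction p of the run parallelizes perfectly over NCPU
processors,

    Speedup <= 1 / ((1 - p) + p/NCPU)

so even a small serial fraction caps the gain well below
NCPU.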

4
Running MPI
  • Run the program with
  • mpirun <job> -np N
  • which submits N identical jobs to the system
  • (You can also specify IP addresses for
    distributed CPUs)
  • The OS in each machine allocates physical CPUs
    dynamically as usual.
  • Each job
    • is given an ID (0 to N-1) which it can access
    • needs to be in an identical environment to the
      others
  • Users can use this ID to label a main job (JOB0,
    for example) and the remaining satellite jobs (a
    minimal example follows below).
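
A minimal, self-contained sketch (not from the original
deck) of the standard calls each job uses to learn its ID:

      Program HELLO
      Implicit none
      include 'mpif.h'
      integer MPIerr, MPIrank, MPIprocs
      call MPI_INIT(MPIerr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, MPIrank, MPIerr)   ! My ID (0..N-1)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, MPIprocs, MPIerr)  ! Total number of jobs
      if (MPIrank .eq. 0) then
         print *, 'Main job (JOB0) of', MPIprocs, 'jobs'
      else
         print *, 'Satellite job', MPIrank
      end if
      call MPI_FINALIZE(MPIerr)
      End

Launched with mpirun as above, each of the N copies prints
its own rank.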

5
Fitting with MPI
  • For a fit, each job should be structured so that
    it can run the parts it is required to do:
    • any set-up (reading in events, etc.)
    • the parts that are vectorized (e.g. its group of
      events or parameters).
  • One job needs to be identified as the main one
    (JOB0) and must do everything, farming out groups
    of events or parameters to the others.
  • Each satellite job must send results (signals)
    back to JOB0 when done with its group, then await
    a return signal from JOB0 telling it to start
    again (a skeleton of this structure follows below).
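
A skeleton of that division of labour might look like this
(a sketch only; in the real fit JOB0 drives MINUIT and the
satellites loop on work requests):

      Program FITSKEL
      Implicit none
      include 'mpif.h'
      integer MPIerr, MPIrank, MPIprocs
      call MPI_INIT(MPIerr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, MPIrank, MPIerr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, MPIprocs, MPIerr)
C --- Set-up common to all jobs (read in events, etc.)
      if (MPIrank .eq. 0) then
C ---    JOB0: run the fit, farming out groups of events or
C ---    parameters to the satellites on each FCN call
      else
C ---    Satellite: await a signal from JOB0, process its
C ---    group, send the result back, repeat until told to stop
      end if
      call MPI_FINALIZE(MPIerr)
      End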

6
How MPI Runs
[Diagram: Scatter-Gather running. mpirun starts the jobs;
CPU 0 runs throughout, scattering work to CPUs 1, 2, ...,
which compute and then wait; results are gathered back on
CPU 0 and the cycle starts again.]
7
Ways to Implement MPI in Maximum Likelihood
Fitting
  • Two main alternatives:
    • A: vectorize FCN, which evaluates f(x) = -2 Σ ln W
    • B: vectorize MINUIT (which finds the best
      parameters)
  • Alternative A has been used in previous BaBar
    analyses
  • E.g. mixing analysis of D0 → K+π-
  • Alternative B is reported here (done by DYAEB and
    tested by BTM)
  • An advantage of B over A is that the
    vectorization is implemented outside the user's
    code.
  • Vectorizing FCN may not be efficient if an
    integral is computed on each call, unless the
    integral evaluation is also vectorized.

8
Vectorize FCN
  • The log-likelihood always includes a sum
      S = -2 Σ (i = 1..n) ln Wi
    where n = number of events or bins.
  • Vectorize the computation of this sum in 2 steps
    (Scatter-Gather):
    • Scatter: divide up the events (or bins) among the
      CPUs. Each CPU computes its partial sum.
    • Gather: re-combine the partial sums from the N
      CPUs (a sketch follows below).
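
The slides do not show this code; MPI_REDUCE is one
standard way to implement the Gather step for a sum. In
this sketch the names pdf, nevloc and xloc are
hypothetical, standing for the user's PDF and the events
held locally by each process:

      Subroutine PARTSUM(nevloc, xloc, sumtot)
C --- Each process sums -2 ln W over its own events;
C --- MPI_REDUCE adds the partial sums into sumtot on process 0
      Implicit none
      include 'mpif.h'
      integer nevloc, i, MPIerr
      double precision xloc(nevloc), sumloc, sumtot, pdf
      external pdf
      sumloc = 0d0
      do i = 1, nevloc
         sumloc = sumloc - 2d0*log(pdf(xloc(i)))
      end do
      call MPI_REDUCE(sumloc, sumtot, 1, MPI_DOUBLE_PRECISION,
     A                MPI_SUM, 0, MPI_COMM_WORLD, MPIerr)
      End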

9
Vectorize FCN
  • The computation of the integral also needs to be
    vectorized
  • This is usually a sum (over bins), so it can be
    done in a similar way.
  • Main advantage of this method:
    • assuming function evaluation dominates the CPU
      cycles, the gain coefficient is close to 1.0,
      independent of the number of CPUs or parameters.
  • Main disadvantage:
    • it requires the user to code each application
      appropriately.

10
Vectorize MINUIT
  • Several algorithms are available in MINUIT:
    • MIGRAD (variable metric algorithm) - finds the
      local minimum and the error matrix at that point
    • SIMPLEX (Nelder-Mead method) - a linear
      programming method
    • SEEK (MC method) - a random search, virtually
      obsolete
  • MIGRAD is used most often, so focus on that
  • It is easily vectorized, but the result may not be
    the highest efficiency

11
One iteration in MIGRAD
  • Compute the function and gradient at the current
    position
  • Use the current curvature metric to compute a step
  • Take a (large) step
  • Compute the function and gradient there, then
    (cubic) interpolate back to the local minimum (may
    need to iterate)
  • If satisfactory, improve the curvature metric (the
    usual notation is sketched below)
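
In the usual variable-metric notation (an assumption from
standard MIGRAD write-ups, not spelled out on the slide):
with gradient g and curvature metric V (the running
estimate of the inverse Hessian), the step is δx = -V g,
and V is improved after each accepted step by a rank-2
update of the Davidon-Fletcher-Powell type.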

12
One iteration in MIGRAD
  • Most of the time is spent computing the gradient
  • Numerical evaluation of the gradient requires 2
    FCN calls per parameter (see the note after this
    list)
  • Vectorize this computation in two steps
    (Scatter-Gather):
    • Scatter: divide up the parameters (xi) among the
      CPUs. Each CPU computes its share of the
      gradient components.
    • Gather: re-combine the results from the N CPUs.
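
The 2 calls per parameter would come from a symmetric
finite difference for each component,

    dF/dxi ≈ [ F(x + hi ei) - F(x - hi ei) ] / (2 hi)

(ei the unit vector along xi), so the 2 x NPAR FCN calls
split naturally across the CPUs.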

13
Vectorize MIGRAD
  • This is less efficient the smaller the number of
    parameters
  • Works well if NPAR is comparable to the number of
    CPUs.
  • Gain = NCPU (NPAR + 2) / (NPAR + 2 NCPU)
  • Max. gain = NCPU (approached as NPAR grows)

For 105 parameters a factor 3.7 was gained with 4
CPUs.
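
As a check: NCPU = 4 and NPAR = 105 give
4 x (105 + 2) / (105 + 2 x 4) = 428 / 113 ≈ 3.8, in line
with the measured factor of 3.7.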
14
Initialization of MPI
      Program FIT_Kpipi
C
C---- Maximum likelihood fit of D -> K pi pi Dalitz plot.
C
      Implicit none
      Save
      external fcn
      include 'mpif.h'
      integer MPIerr, MPIrank, MPIprocs, MPIflag

      MPIerr   = 0
      MPIrank  = 0
      MPIprocs = 1
      MPIflag  = 1

      call MPI_INIT(MPIerr)                                 ! Initialize MPI
      call MPI_COMM_RANK(MPI_COMM_WORLD, MPIrank, MPIerr)   ! Which one am I ?
      call MPI_COMM_SIZE(MPI_COMM_WORLD, MPIprocs, MPIerr)  ! Get number of CPUs

      ... call MINUIT, etc.
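
Not shown on the slide: each job should also call
MPI_FINALIZE(MPIerr) before it exits, so that MPI shuts
down cleanly.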

15
Use of Scatter-Gather Mechanism in MNDERI
(Fortran)
C     Distribute the parameters from proc 0 to everyone
   33 call MPI_BCAST(X, NPAR+1, MPI_DOUBLE_PRECISION, 0,
     A               MPI_COMM_WORLD, MPIerr)
C
C     Use scatter-gather mechanism to compute subset of
C     derivatives in each process
      nperproc = (NPAR-1)/MPIprocs + 1
      iproc1   = 1 + nperproc*MPIrank
      iproc2   = MIN(NPAR, iproc1+nperproc-1)
      call MPI_SCATTER(GRD, nperproc, MPI_DOUBLE_PRECISION,
     A                 GRD(iproc1), nperproc, MPI_DOUBLE_PRECISION,
     A                 0, MPI_COMM_WORLD, MPIerr)
C
C     Loop over this process's variable parameters
      DO 60 i = iproc1, iproc2
         ... compute G(i) ...
   60 Continue
C
C     Wait until everyone is done
      call MPI_GATHER(GRD(iproc1), nperproc, MPI_DOUBLE_PRECISION,
     A                GRD, nperproc, MPI_DOUBLE_PRECISION,
     A                0, MPI_COMM_WORLD, MPIerr)
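
One caveat with this fixed-count scatter/gather: when NPAR
is not a multiple of MPIprocs, the last process covers
fewer parameters (hence the MIN in iproc2) and the trailing
slots it returns are unused; MPI_GATHERV is the standard
way to handle such ragged counts.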