Parallel Computing using Condor on Windows PCs

Transcript and Presenter's Notes


1
Parallel Computing using Condor on Windows PCs
Peng Wang and Corey Shields
Research and Academic Computing Division
University Information Technology Services
Indiana University
2
Problem Description
  • Turn the roughly 2000 Windows desktop systems in
    the STC labs into a parallel scientific computer

3
Discussion
  • Parallel applications need coordination through
    message passing
  • MPI does not handle ephemeral processes well
  • Multiplexing communication among processes
  • Ports brokered among multiple parallel sessions

4
What do we have?
  • Condor NT, vanilla universe
      • Matchmaking, file transfer, fair sharing, job
        submission, suspension, preemption, restart,
        security
  • Test application: fastDNAml-p
      • Parallel application, master-worker model,
        small granularity of work
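
A master-worker run on desktops that can be reclaimed at any moment needs the master to take back work from workers that disappear. The sketch below is a minimal in-memory illustration of that idea, not the fastDNAml-p code; the class `Master` and its method names are hypothetical.

```python
from collections import deque


class Master:
    """Hands out small work units and re-queues any held by a lost worker."""

    def __init__(self, tasks):
        self.pending = deque(tasks)   # work not yet handed out
        self.in_flight = {}           # worker id -> task it currently holds
        self.results = {}             # task -> result

    def request_work(self, worker_id):
        """Give the next task to a worker, or None if nothing is pending."""
        if not self.pending:
            return None
        task = self.pending.popleft()
        self.in_flight[worker_id] = task
        return task

    def submit_result(self, worker_id, result):
        """Record a finished task and free the worker."""
        task = self.in_flight.pop(worker_id)
        self.results[task] = result

    def worker_lost(self, worker_id):
        """Called when a worker is preempted; its task goes back in the queue."""
        task = self.in_flight.pop(worker_id, None)
        if task is not None:
            self.pending.appendleft(task)

    def done(self):
        return not self.pending and not self.in_flight


def run_demo():
    master = Master(tasks=[1, 2, 3, 4])
    # Worker "a" takes a task and is then evicted by the desktop's owner.
    master.request_work("a")
    master.worker_lost("a")
    # Worker "b" survives and processes everything, including the re-queued task.
    while not master.done():
        task = master.request_work("b")
        master.submit_result("b", task * task)
    return master.results
```

Small work units keep the cost of a lost worker low: only the one task it held has to be redone.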

5
How we did it
  • Simple Message Brokering Library (SMBL)
  • Process and Port Manager (PPM)
  • A mechanism for users to submit jobs (web portal)

6
SMBL
  • An I/O multiplexing server in charge of message
    delivery for each parallel session (serializes
    communication)
  • SMBL client library implements selected MPI-like
    calls
  • Both the server and the client library are based
    on a TCP socket abstraction library
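
The relay idea can be sketched as a single-threaded loop that watches every client socket and forwards each message to its destination. This is an illustration only, not SMBL's code or wire format: the 8-byte header (rank, length), the class `Broker`, and the helpers `send_msg` and `recv_msg` are assumptions, and a production server would buffer partial reads instead of blocking on them.

```python
import selectors
import socket
import struct

HEADER = struct.Struct("!II")  # rank, payload length (hypothetical framing)


def recv_exact(sock, n):
    """Read exactly n bytes from a blocking socket."""
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        data += chunk
    return data


def send_msg(sock, dest_rank, payload):
    """Client side: frame a payload and send it to the broker."""
    sock.sendall(HEADER.pack(dest_rank, len(payload)) + payload)


def recv_msg(sock):
    """Client side: read one framed message; returns (source rank, payload)."""
    src_rank, length = HEADER.unpack(recv_exact(sock, HEADER.size))
    return src_rank, recv_exact(sock, length)


class Broker:
    """Single-threaded relay: every message passes through this one loop."""

    def __init__(self):
        self.selector = selectors.DefaultSelector()
        self.by_rank = {}

    def add_client(self, rank, sock):
        self.by_rank[rank] = sock
        self.selector.register(sock, selectors.EVENT_READ, data=rank)

    def poll_once(self, timeout=1.0):
        """Forward one message from each ready socket; return the count."""
        forwarded = 0
        for key, _ in self.selector.select(timeout):
            src_rank, sock = key.data, key.fileobj
            try:
                dest_rank, length = HEADER.unpack(recv_exact(sock, HEADER.size))
                payload = recv_exact(sock, length)
            except ConnectionError:
                # An ephemeral worker vanished: drop it, keep serving the rest.
                self.selector.unregister(sock)
                del self.by_rank[src_rank]
                continue
            dest = self.by_rank.get(dest_rank)
            if dest is not None:
                # Rewrite the header so the receiver learns who sent it.
                dest.sendall(HEADER.pack(src_rank, length) + payload)
                forwarded += 1
        return forwarded


def demo():
    broker = Broker()
    master, master_side = socket.socketpair()
    worker, worker_side = socket.socketpair()
    try:
        broker.add_client(0, master_side)
        broker.add_client(1, worker_side)
        send_msg(master, 1, b"tree 42")
        broker.poll_once()
        return recv_msg(worker)
    finally:
        for s in (master, master_side, worker, worker_side):
            s.close()
```

Because clients only ever talk to the broker, a worker that disappears breaks one connection instead of the whole session.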

7
Process and Port Manager
  • Assigns a port to each SMBL server process
  • Starts the SMBL server and application processes
    on demand
  • Directs workers to their servers
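
The port-brokering part of these duties can be sketched as a small allocator that maps each parallel session to one port and answers lookups from workers. This is a hypothetical illustration, not the PPM's implementation; the class `PortManager`, its method names, and the port range in the test values are assumptions.

```python
class PortManager:
    """Hands out one TCP port per parallel session from a fixed range."""

    def __init__(self, first_port, last_port):
        # Stored in descending order so that pop() yields the lowest port first.
        self.free = list(range(last_port, first_port - 1, -1))
        self.by_session = {}

    def start_session(self, session_id):
        """Reserve a port for a session's message server; idempotent per session."""
        if session_id in self.by_session:
            return self.by_session[session_id]
        if not self.free:
            raise RuntimeError("no free ports in the configured range")
        port = self.free.pop()
        self.by_session[session_id] = port
        return port

    def lookup(self, session_id):
        """Tell a newly started worker where its session's server listens."""
        return self.by_session.get(session_id)

    def end_session(self, session_id):
        """Return a finished session's port to the pool."""
        port = self.by_session.pop(session_id, None)
        if port is not None:
            self.free.append(port)
```

A worker that starts on a freshly matched desktop would call something like `lookup` first, then connect to the returned port.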

8
The Portal
  • Apache based
  • PHP web interface
  • Creates and submits the Condor submit files
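
Generating a submit file for a batch of workers can be sketched as below. The portal itself is PHP; Python is used here only for illustration. The function name, file names, and arguments are hypothetical, and the exact submit commands the portal wrote are not given in the slides.

```python
def make_submit_description(executable, arguments, n_workers, input_files=()):
    """Build the text of a vanilla-universe submit file for n_workers jobs."""
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        f"arguments = {arguments}",
        # Lab desktops share no file system, so files travel with the job.
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
    ]
    if input_files:
        lines.append("transfer_input_files = " + ", ".join(input_files))
    lines += [
        "output = worker.$(Process).out",  # $(Process) is 0 .. n_workers-1
        "error = worker.$(Process).err",
        "log = session.log",
        f"queue {n_workers}",
    ]
    return "\n".join(lines) + "\n"
```

The resulting text would be written to a file and handed to `condor_submit`.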

9
The Big Picture
[Figure: architecture diagram. The shaded box indicates
components hosted on multiple desktop computers.]
10
Statistics
[Figure: pool usage chart. Red: total owner. Blue: total
idle. Green: total Condor.]
11
Scalability Issues
  • Needed a big server
  • Adjusted condor_config
      • MAX_JOBS_RUNNING 1000
      • SHADOW_SIZE_ESTIMATE 900KB
      • MAX_STARTD_LOG 640KB
  • Lost workers because of the per-process file
    descriptor limit (1024)
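
A rough back-of-the-envelope calculation suggests why the submit machine had to be large, assuming one condor_shadow process per running job (Condor's normal arrangement on the submit side). The slides do not say which process hit the descriptor limit, so the `reserved_fds` figure below is purely hypothetical.

```python
def shadow_memory_kb(max_jobs_running, shadow_size_estimate_kb):
    """Estimated memory for all shadows if every job slot is in use."""
    return max_jobs_running * shadow_size_estimate_kb


def max_connections(fd_limit, reserved_fds):
    """Sockets a single process can hold open once other descriptors are counted."""
    return max(fd_limit - reserved_fds, 0)


total_kb = shadow_memory_kb(1000, 900)          # values from the slide
print(f"shadows: about {total_kb / 1024:.0f} MiB")

# Hypothetical: 24 descriptors already used for logs, listeners, stdio.
print("connections under a 1024 limit:", max_connections(1024, 24))
```

With the slide's settings the shadows alone account for roughly 879 MiB, and any single process serving one socket per worker runs out of descriptors near a thousand workers.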

12
Summary
  • Built a large parallel scientific computing
    facility using Condor
  • Built parallel message passing library to deal
    with ephemeral resources
  • Built port broker to handle multiple parallel
    sessions
  • Built web portal
  • It is open source; visit
    http://smbl.sourceforge.net