Distributed Computing in Biomedicine - PowerPoint PPT Presentation

1
Distributed Computing in Biomedicine
  • Arun Krishnan, PhD
  • Francis Tang, PhD
  • BioInformatics Institute, Singapore

2
Agenda
  • Project with NCC
      • SCATTER: high-throughput BLAST
  • GridBLAST
      • Grid-enabled high-throughput BLAST
  • inGRD
      • Inter-Network Grid Resource Discovery
  • GridX
      • Meta-scheduler for the grid
  • Other Projects

3
High-throughput BLAST
4
PROBLEM
  • 10,000 sequences/year, increasing to 100,000/year
    in the future
  • Sequence lengths 400-600 bp
  • Current process
      • Involves submitting sequences one at a time on
        public servers such as NCBI's
  • Inherently limiting
      • Only one sequence can be submitted at a time
      • There is a limit on the number of sequences
        that can be submitted in a week

5
Problem Formulation (contd.)
  • Requirements
      • High-throughput solution
      • Storage for the databases, query sequences and
        results
      • Web interface for job submission
      • Submission of multiple sequences at the same
        time, in the form of a file
      • Automatic vector clipping of the sequences
      • Password-protected login for users
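Two of the requirements above are uploading many sequences as one file and clipping vector sequence automatically. Both can be sketched in a few lines. This is a minimal sketch and not the system's actual code: the upload is split into FASTA records, and each read is trimmed where its ends exactly match known vector flanks. `parse_fasta`, `clip_vector`, the flank arguments and `min_overlap` are all hypothetical. Real vector screening tools such as NCBI VecScreen or cross_match align with mismatches, which this exact-match version does not do.

```python
from typing import List, Optional, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Split a multi-sequence FASTA upload into (header, sequence) pairs."""
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks).upper()))
            header, chunks = line[1:].strip(), []
        elif header is not None:  # ignore stray text before the first header
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks).upper()))
    return records


def clip_vector(seq: str, left_flank: str, right_flank: str,
                min_overlap: int = 8) -> str:
    """Trim exact vector matches from both ends of a read (simplified, no mismatches)."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    start = 0
    # Longest read prefix that equals the end of the 5' vector flank.
    for k in range(min(len(left_flank), len(seq)), min_overlap - 1, -1):
        if seq[:k] == left_flank[-k:]:
            start = k
            break
    end = len(seq)
    # Longest read suffix that equals the start of the 3' vector flank.
    for k in range(min(len(right_flank), len(seq) - start), min_overlap - 1, -1):
        if seq[len(seq) - k:] == right_flank[:k]:
            end = len(seq) - k
            break
    return seq[start:end]
```

Each clipped record would then be queued for BLAST. The `min_overlap` threshold keeps short chance matches (a few bases) from being mistaken for vector.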

6
Solution Architecture
  • Client-server architecture
      • Jobs are submitted on the master
      • The master spawns jobs across the slave nodes
      • Scalability is nearly linear in the number of
        slave nodes
  • Web-based access
      • Login with password protection
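The master/slave split above can be sketched as follows. The master cuts the query list into near-equal contiguous chunks, hands one chunk to each slave, and concatenates the results in input order. This is a sketch under stated assumptions. `split_into_chunks`, `run_master` and the `worker` callable are hypothetical. On the real cluster a worker would launch BLAST on a slave node; here a thread pool stands in for the slave nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def split_into_chunks(items: Sequence[T], n_chunks: int) -> List[List[T]]:
    """Cut items into n_chunks contiguous chunks whose sizes differ by at most 1."""
    if n_chunks < 1:
        raise ValueError("need at least one slave node")
    size, extra = divmod(len(items), n_chunks)
    chunks: List[List[T]] = []
    start = 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty chunks when there are more slaves than items
            chunks.append(list(items[start:end]))
        start = end
    return chunks


def run_master(items: Sequence[T], n_slaves: int,
               worker: Callable[[List[T]], List[R]]) -> List[R]:
    """Give one chunk to each (simulated) slave and merge results in input order."""
    chunks = split_into_chunks(items, n_slaves)
    with ThreadPoolExecutor(max_workers=n_slaves) as pool:
        per_chunk = list(pool.map(worker, chunks))  # map preserves chunk order
    return [result for chunk_result in per_chunk for result in chunk_result]
```

"Nearly linear" scalability means the speedup S(p) = T1 / Tp stays close to p for p slaves. Balanced chunks like these are one precondition for that, since the slowest chunk sets Tp.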

7
GridBLAST
8
GridBLAST
  • Distributed grid computing: main focus areas
  • Integrated computing resources form the grid
  • Developing applications to run on the grid poses
    unique challenges:
      • Dynamic configurations, e.g., performance
        changes and hardware failures
      • Data management
      • Execution management
      • Application management
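Of the challenges listed above, hardware failures are the simplest to illustrate. This is a minimal sketch that assumes a hypothetical `execute(task, node)` call, which raises `NodeFailure` when a node drops out. A failed task is resubmitted to the remaining healthy nodes, and the failed node is taken out of rotation. Tasks run one at a time here for clarity; a real grid scheduler would dispatch them concurrently through the middleware. None of these names come from GridBLAST.

```python
from typing import Callable, Dict, List, Sequence


class NodeFailure(Exception):
    """Raised by the (hypothetical) executor when a grid node fails mid-job."""


def run_with_failover(tasks: Sequence[str], nodes: Sequence[str],
                      execute: Callable[[str, str], str],
                      max_attempts: int = 3) -> Dict[str, str]:
    """Run each task on some healthy node, retrying elsewhere after a node failure."""
    results: Dict[str, str] = {}
    healthy: List[str] = list(nodes)
    for i, task in enumerate(tasks):
        for attempt in range(max_attempts):
            if not healthy:
                raise RuntimeError("no healthy nodes left")
            node = healthy[(i + attempt) % len(healthy)]  # simple rotation
            try:
                results[task] = execute(task, node)
                break
            except NodeFailure:
                healthy.remove(node)  # stop scheduling work on the failed node
        else:
            raise RuntimeError(f"task {task!r} failed {max_attempts} times")
    return results
```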

9
GridBLAST Solution Architecture
[Figure: SERVER/LOCAL MACHINE and CLIENT/REMOTE MACHINES on the COMPUTE/DATA GRID, connected via Grid Middleware (GLOBUS); data exchanged: queries, executables, databases, results.]
10
SPMD Scheduling for Grids
  • Heterogeneous environment: communications,
    processing speed, processor count
  • Naïve: proportional allocation
  • More sophisticated: min-max
      • A performance model that also considers
        inter-node latencies and bandwidth
      • Reduces to a linear optimization problem
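The slide names two allocation policies but gives no formulas. The sketch below is one reading of them under a linear cost model, and that model is an assumption. Node i has start-up latency l_i and per-task cost c_i = 1/speed_i + size/bandwidth_i, so it finishes n_i tasks at time l_i + c_i·n_i.

- Proportional allocation splits tasks by speed alone.
- Min-max chooses the n_i that minimize the latest finish time. Under this model that means equalizing finish times across the nodes that receive work; a node whose latency alone exceeds that time gets nothing.

The slide says the real model reduces to a linear program. This closed-form "water-filling" only solves the simplified continuous version, then rounds to whole tasks. `Node`, `proportional` and `minmax` are hypothetical names, and node names are assumed unique.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    speed: float      # tasks per second (assumed metric)
    latency: float    # fixed start-up/communication latency in seconds (assumed)
    bandwidth: float  # MB/s between master and this node (assumed)


def _round_preserving_sum(shares: List[float], total: int) -> List[int]:
    """Largest-remainder rounding so the integer shares still add up to total."""
    floors = [int(s) for s in shares]
    remainder = total - sum(floors)
    by_fraction = sorted(range(len(shares)), key=lambda i: shares[i] - floors[i], reverse=True)
    for i in by_fraction[:remainder]:
        floors[i] += 1
    return floors


def proportional(nodes: List[Node], n_tasks: int) -> List[int]:
    """Naive policy: split tasks in proportion to processing speed only."""
    total_speed = sum(nd.speed for nd in nodes)
    return _round_preserving_sum([n_tasks * nd.speed / total_speed for nd in nodes], n_tasks)


def minmax(nodes: List[Node], n_tasks: int, mb_per_task: float) -> List[int]:
    """Min-max policy under the model finish_i = latency_i + cost_i * n_i."""
    if n_tasks <= 0:
        return [0] * len(nodes)
    cost = {nd.name: 1.0 / nd.speed + mb_per_task / nd.bandwidth for nd in nodes}
    active = list(nodes)
    while True:
        # Common finish time T with sum((T - l_i) / c_i) == n_tasks over active nodes.
        inv_cost = sum(1.0 / cost[nd.name] for nd in active)
        t_finish = (n_tasks + sum(nd.latency / cost[nd.name] for nd in active)) / inv_cost
        keep = [nd for nd in active if nd.latency < t_finish]
        if len(keep) == len(active):
            break
        active = keep  # nodes whose latency alone exceeds T receive no work
    used = {nd.name for nd in active}
    shares = [(t_finish - nd.latency) / cost[nd.name] if nd.name in used else 0.0
              for nd in nodes]
    return _round_preserving_sum(shares, n_tasks)
```

With zero latency and free transfers, min-max reduces to proportional allocation. A high-latency node is dropped entirely, which is the behaviour the naive policy cannot capture.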

11
Performance Results: Speedup
12
inGRD: Inter-Network Grid Resource Discovery
13
Why inGRD?
  • The information MDS provides can be
    inconsistent, because it depends on how grid
    administrators configure Globus GIIS/GRIS.
  • No additional sensors need to be installed on
    every compute node within a grid node; inGRD uses
    the resource information already collected by the
    job managers.
  • Pre-formatted data on Grid nodes enables faster
    requesting, collection and processing of large
    amounts of data.

14
inGRD overview
  • inGRD sensors are installed on Grid nodes to
    collect available resource information from their
    compute clusters.
  • inGRD client applications handle the submission
    of requests to, and the collection of responses
    from, the inGRD-enabled Grid nodes.
  • Results are represented as a single XML document.
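The slide states only that per-node results are returned as one XML document. The sketch below uses Python's standard `xml.etree.ElementTree` to show one way per-node Local.xml responses could be combined into a single Grid.xml. The element and attribute names (`grid`, `gridnode`, `cpus`, `total`, `free`), the helper functions and the node names are hypothetical, not the actual inGRD schema. Transport over Globus is left out.

```python
import xml.etree.ElementTree as ET
from typing import List


def local_xml(node_name: str, total_cpus: int, free_cpus: int) -> str:
    """What a (hypothetical) inGRD sensor might publish as Local.xml."""
    root = ET.Element("gridnode", name=node_name)
    ET.SubElement(root, "cpus", total=str(total_cpus), free=str(free_cpus))
    return ET.tostring(root, encoding="unicode")


def merge_responses(responses: List[str]) -> ET.Element:
    """Combine per-node Local.xml responses into one Grid.xml document."""
    grid = ET.Element("grid")
    for text in responses:
        grid.append(ET.fromstring(text))
    return grid


def total_free_cpus(grid: ET.Element) -> int:
    """Example query over the merged document: free CPUs across all Grid nodes."""
    return sum(int(cpus.get("free", "0")) for cpus in grid.iter("cpus"))
```

Because each sensor already emits a small pre-formatted fragment, the client only has to concatenate fragments rather than reformat raw job-manager output.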

15
[Figure: inGRD architecture. Client Machine: inGRD Client, Grid.xml. Globus Grid Middleware. External Grid nodes: inGRD Executor, inGRD Sensor, Local.xml, local job manager.]
16
GridX: Meta-scheduler for the Grid
17
GridX: Meta-scheduler for the Grid
  • Meta-scheduler for scheduling jobs in a grid
    framework
  • Will provide a user-friendly interface for grid
    users to submit jobs
  • Provides grid resource information by
    interfacing with inGRD
  • Provides basic grid functionality: job
    submission, monitoring, cancellation, file
    transfer, etc.
  • Advanced features include accounting, load
    balancing, and static and dynamic scheduling
    strategies
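To make the feature list above concrete, here is a toy in-memory meta-scheduler. It covers submission, cancellation, simple load balancing and accounting. Everything in it is hypothetical and not GridX's design:

- the `MetaScheduler`, `Resource` and `Job` classes;
- the most-free-CPUs placement rule;
- the per-CPU-hour charging model.

In practice, free-CPU counts would come from a resource-information service such as inGRD.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class State(Enum):
    RUNNING = "running"
    DONE = "done"
    CANCELLED = "cancelled"


@dataclass
class Resource:
    name: str
    free_cpus: int            # would come from a resource-information service (assumed)
    cost_per_cpu_hour: float  # hypothetical accounting rate


@dataclass
class Job:
    job_id: int
    user: str
    cpus: int
    resource: str
    state: State = State.RUNNING


class MetaScheduler:
    """Toy meta-scheduler: placement, cancellation and per-user accounting."""

    def __init__(self, resources: List[Resource]) -> None:
        self.resources: Dict[str, Resource] = {r.name: r for r in resources}
        self.jobs: Dict[int, Job] = {}
        self.ledger: Dict[str, float] = {}  # user -> accumulated charge
        self._next_id = 1

    def submit(self, user: str, cpus: int) -> Job:
        # Load balancing: place the job on the resource with the most free CPUs.
        fits = [r for r in self.resources.values() if r.free_cpus >= cpus]
        if not fits:
            raise RuntimeError("no resource has enough free CPUs")
        target = max(fits, key=lambda r: r.free_cpus)
        target.free_cpus -= cpus
        job = Job(self._next_id, user, cpus, target.name)
        self.jobs[job.job_id] = job
        self._next_id += 1
        return job

    def _release(self, job_id: int) -> Job:
        job = self.jobs[job_id]
        if job.state is not State.RUNNING:
            raise ValueError(f"job {job_id} is not running")
        self.resources[job.resource].free_cpus += job.cpus
        return job

    def finish(self, job_id: int, hours: float) -> None:
        job = self._release(job_id)
        rate = self.resources[job.resource].cost_per_cpu_hour
        self.ledger[job.user] = self.ledger.get(job.user, 0.0) + job.cpus * hours * rate
        job.state = State.DONE

    def cancel(self, job_id: int) -> None:
        job = self._release(job_id)
        job.state = State.CANCELLED
```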

18
[Figure: GridX meta-scheduler architecture. Information sources: inGRD, NWS, Ganglia, MDS. Services: Resource Monitoring Service, Administration Service, Accounting and Billing Service, Scheduling Service, Job Supervisor Service. Components: GridInfoCrawler, GridInfoMiner, GridKeeper, AccountManager, PolicyManager, GridBanker, GridScheduler, Performance Evaluator, Application Profiler, GridBalancer, GridMapper, GridLauncher, GridMonitor. Records: Grid Info., User/Grp/Org Record, User/Accounting Policy, Log Record, Usage Record, App. Profile Record. Inputs and events: user inputs, job submission requests, resource reservation requests, monitoring events.]
19
Other Projects
  • GridGene Project
      • High-throughput, grid-enabled version of two
        different gene-finding applications, GenScan
        and GeneWise
  • Project with GIS
      • Parallelization of mass spectrometry code for
        analysis of proteomics data
  • Project with NCC
      • In-silico cloning of genes

20
Thank you!!