Title: The Lattice Project - A Grid Computing System
1The Lattice Project - A Grid Computing System
- Michael P. Cummings
- Laboratory of Molecular Evolution
- Center for Bioinformatics and Computational
Biology
2Acknowledgments
- Core Middleware Development
- Adam Bazinet
- Daniel Myers (now MIT)
- John Fuetsch (now Dreamworks Animation)
- Stephen McLellan, Chris Milliron, Deji Akinyemi
- Semantic Web-based Grid Services
- Sung Lee, Fujitsu Laboratories of America
- Nada Hashmi (now CBA, Saudi Arabia)
- David Wang, UMIACS
3Outline
- Grid computing introduction and motivation
- Goals of The Lattice Project
- Basic architecture
- Our current production Grid system
- Implementation details
- Results of usage
- Research and development
4Grid Computing: A Definition
- A model of distributed computing that uses
geographically and administratively disparate
resources. In Grid computing, individual users
can access computers and data transparently,
without having to consider location, operating
system, account administration, and other
details. In Grid computing, the details are
abstracted, and the resources are virtualized.
5Why Go Grid?
- Scientific problems are solved faster
- Parallel execution means higher throughput
- Make compute resources a commodity
- Analogous to the electrical power grid
- Foster growth and interaction in the research community
- Use of the Grid spans departments and domains
- Grid resources are typically shared resources
6Grid Computing Advantages
- Provides increased resources for research
- Utilizes resources already purchased
- Space and HVAC needs already met
- Little increased administrative burden
- Economically and environmentally appealing
7Research Projects Using the Grid
- The Laboratory of David Fushman has run protein-protein docking algorithms on Lattice
- CNS is the primary Grid service in this project
- Floyd Reed and Holly Mortensen from the Laboratory of Sarah Tishkoff have run a number of population genetics analyses
- MDIV and IM are the primary Grid services
- The Laboratory of Molecular Evolution has run statistical phylogenetic analyses
- GSI is the primary Grid service
8Recent Grid Usage
- IM: 0.13 CPU years (BOINC)
- MDIV: 4.93 CPU years (BOINC)
- CNS: 12.4 CPU years (BOINC)
- GSI: 94.05 CPU years (Condor)
- Total: 111.51 CPU years
- BOINC participants in 21 countries
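The total on this slide can be checked by summing the per-service figures. The short Python sketch below does that and also groups the figures by back end; the dictionary layout and variable names are illustrative only and are not part of the Lattice software.

```python
# CPU years per Grid service and the back end that ran it (figures from the slide).
usage = {
    "IM": (0.13, "BOINC"),
    "MDIV": (4.93, "BOINC"),
    "CNS": (12.4, "BOINC"),
    "GSI": (94.05, "Condor"),
}

# Accumulate CPU years per back end.
by_backend = {}
for cpu_years, backend in usage.values():
    by_backend[backend] = by_backend.get(backend, 0.0) + cpu_years

# Round to two decimals to hide floating-point noise.
total = round(sum(by_backend.values()), 2)
by_backend = {name: round(value, 2) for name, value in by_backend.items()}

print(by_backend)
print(total)
```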
9Outline
- Grid computing introduction and motivation
- Goals of The Lattice Project
- Basic architecture
- Our current production Grid system
- Implementation details
- Results of usage
- Research and development
10The Lattice Project Initial Goals
- Develop a Grid system for scientific research that:
- Speeds up workflows by Grid-enabling various programs
- Is simple and intuitive
- Takes advantage of heterogeneous resources
- Is capable of managing large numbers of jobs (thousands)
- Supports multiple users and lowers the barriers to getting involved
- Is community-driven and supported
11Principles of Design
- Make use of well supported open source software
- Globus Toolkit
- BOINC
- Condor
- Engineered software should be scalable, modular, and robust
- Expose programs as well-defined services
- Arbitrary user-supplied code cannot be run
12Outline
- Grid computing introduction and motivation
- Goals of The Lattice Project
- Basic architecture
- Our current production Grid system
- Implementation details
- Usage statistics
- Research and development
13Terminology
- Client: A Grid user interface OR a machine that performs computation
- Grid Service: A Grid-enabled program
- Scheduler: Decides where Grid jobs will run
- Resource: Executes Grid jobs
14Basic Architecture (1 of 3)
15Basic Architecture (2 of 3)
16Basic Architecture (3 of 3)
17Outline
- Grid computing introduction and motivation
- Goals of The Lattice Project
- Basic architecture
- Our current production Grid system
- Implementation details
- Results of usage
- Research and development
18Software Components
- Globus Toolkit version 3.2.1
- Backbone of the Grid
- http://www.globus.org/
- Condor-G
- Grid-level scheduler / resource broker
- http://www.cs.wisc.edu/condor/
- BOINC: Berkeley Open Infrastructure for Network Computing
- SETI@home-style desktop grid
- http://boinc.berkeley.edu/
- Custom components
- GSBL, GSG, Globus-BOINC adaptor, MDS-matchmaking
bridge, user interface(s), administrative
scripts, and much more
19Globus Toolkit 3
- Key components
- Globus Core
- Grid service hosting environment
- GSI: Grid Security Infrastructure
- Uses public key cryptography
- Secures communication
- Authenticates and authorizes Grid users
- WS GRAM: Job management
- GASS: Point-to-point file transfer
- MDS2: Information provider
20Condor-G
- Condor-G is part of the Condor suite
- Resources and jobs send Condor-G descriptions of themselves called ClassAds
- Condor-G matches Grid jobs to suitable resources, then submits and manages them
- This process is called matchmaking
21BOINC
- Most novel feature of our Grid
- Public computing model
- Untrusted resources
- Potentially our largest resource
- We have targeted 3 platforms
- Windows / Linux x86 / Mac OS X
22Our Current Grid System
23User Interface
- The Grid Brick: a machine used to submit Grid jobs
- Our primary interface for Grid users
- Command line clients mimic normal program execution
- Lattice Intranet
- Provides instructions for submitting jobs and managing data input and output
- Provides tools for describing and monitoring jobs
- Other possibilities
- Web portal model of job submission
- A client capable of composing complex workflows
using Task Computing and Semantic Web technology
developed by collaborators at Fujitsu
24Basic Architecture - Client/Service
25Grid Client Stack
- Command-line Interface
- Perl
- Java
- Service-specific templates and stubs are created by the Grid Service Generator
26Grid Service Stack
- Grid Service Hosting Environment, a.k.a. the container
- Java
- Service-specific templates and stubs are created by the Grid Service Generator
27Tools for Writing Grid Services
- Grid Service Base Library (GSBL)
- Java API for building Grid services with the Globus Toolkit
- Shields programmers from having to work with the Globus API directly
- Provides a high-level interface for operations such as job submission and file transfer
- Grid Service Generator (GSG)
- Simplifies the process of creating Grid Services
- Intended for use with GSBL
28Grid Service Generator
- Deploying a Grid service with GT3 is absurdly complicated
- Many files and namespaces mean lots of potential typos
- GSG takes as input a few parameters (service
name, location, an XML argument description,
etc.) and generates all requisite configuration
files and skeleton Java classes
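To make the generator idea concrete, here is a minimal Python sketch of the pattern: a few parameters go in, and a set of file names and contents comes out. It is not the real GSG. The function name, the XML layout of the argument description, the `.properties` file, and the use of a Java package as the "location" are all assumptions made for illustration, and the actual GT3 deployment files are not reproduced.

```python
import xml.etree.ElementTree as ET

def generate_service(name, package, arguments_xml):
    """Return {relative path: file contents} for a hypothetical service skeleton."""
    # Read (argument name, Java type) pairs from the assumed XML description.
    root = ET.fromstring(arguments_xml)
    args = [(a.get("name"), a.get("type")) for a in root.findall("argument")]

    # Skeleton Java class with one field per declared argument.
    fields = "\n".join(f"    private {type_} {arg};" for arg, type_ in args)
    java = f"package {package};\n\npublic class {name}Service {{\n{fields}\n}}\n"

    # A stand-in configuration file; real GT3 descriptors are more involved.
    config = f"service.name={name}\nservice.class={package}.{name}Service\n"

    path = package.replace(".", "/")
    return {f"{path}/{name}Service.java": java, f"{name}.properties": config}

# Hypothetical input for an MDIV-like service.
description = (
    '<arguments>'
    '<argument name="inputFile" type="String"/>'
    '<argument name="chainLength" type="int"/>'
    '</arguments>'
)
files = generate_service("Mdiv", "edu.example.lattice", description)
for path in sorted(files):
    print(path)
```

Generating every file from a single description keeps names consistent across files, which is where hand-edited deployments tend to pick up typos.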
29Grid Services
30Grid Services
- Creating Grid Services requires
- Knowledge of the application
- Techniques for compiling and porting the application to various platforms
- Knowledge of the infrastructure so it can be effectively tested and deployed
- Challenges
- Maintaining bodies of Grid Service code as the number of applications grows and new versions of applications are released
- Minimizing the number of updates that need to be applied when the framework changes
31Basic Architecture - Scheduling
32Condor-G ClassAds
- Resources and jobs send Condor-G descriptions of themselves called ClassAds
- Jobs require certain capabilities of resources
- Resources advertise their capabilities
- Similar to a dating service: a central broker points pairs of compatible jobs/resources at each other
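The two-sided nature of matchmaking can be sketched in a few lines of Python. This is a toy model, not Condor's ClassAd language: ads are plain dictionaries, requirements are Python functions, and attribute names such as `trusted_only` and `memory_mb` are hypothetical. It also assigns at most one job per resource, which a real pool would not be limited to.

```python
def matches(job, resource):
    """A pair is compatible only if each side accepts the other."""
    return job["requirements"](resource) and resource["requirements"](job)

def matchmake(jobs, resources):
    """Pair each job with the first free compatible resource."""
    free = list(resources)
    pairs = []
    for job in jobs:
        for resource in free:
            if matches(job, resource):
                pairs.append((job["name"], resource["name"]))
                free.remove(resource)  # safe: we stop iterating right away
                break
    return pairs

# Hypothetical resource ads: capabilities plus a policy on which jobs to accept.
resources = [
    {"name": "condor-node", "os": "LINUX", "memory_mb": 2048,
     "requirements": lambda job: True},
    {"name": "boinc-pool", "os": "WINDOWS", "memory_mb": 512,
     "requirements": lambda job: job["trusted_only"] is False},
]

# Hypothetical job ads: properties plus requirements on the resource.
jobs = [
    {"name": "gsi-run", "trusted_only": True,
     "requirements": lambda r: r["os"] == "LINUX" and r["memory_mb"] >= 1024},
    {"name": "mdiv-run", "trusted_only": False,
     "requirements": lambda r: r["memory_mb"] >= 256},
]

pairs = matchmake(jobs, resources)
print(pairs)
```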
33Condor-G ClassAds
34Generating ClassAds
- Job ClassAds are generated by the Condor-G job manager
- Job requirements are specified in the Grid service configuration files
- Resource ClassAds are generated by extracting information from MDS
- Lattice information providers supply data required for matchmaking
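As a rough illustration of both generation paths, the Python sketch below renders a resource record and a service's configured requirements as ClassAd-like text. The input dictionaries stand in for MDS data and Grid service configuration files, whose real formats are not shown in these slides; the function names and the exact attributes chosen are assumptions.

```python
def format_value(value):
    """Quote strings; leave numbers bare."""
    return f'"{value}"' if isinstance(value, str) else str(value)

def resource_ad(record):
    """Render an information-provider record as 'Attribute = value' lines."""
    return "\n".join(f"{key} = {format_value(value)}" for key, value in record.items())

def job_ad(service_config):
    """Render a service's requirements as a single Requirements expression."""
    clauses = [
        f"({attribute} {op} {format_value(value)})"
        for attribute, (op, value) in service_config["requirements"].items()
    ]
    executable = format_value(service_config["executable"])
    return f"Executable = {executable}\nRequirements = " + " && ".join(clauses)

# Hypothetical record, as an information provider might report it.
record = {"Name": "umiacs-condor", "OpSys": "LINUX", "Memory": 2048}

# Hypothetical per-service configuration.
config = {
    "executable": "mdiv",
    "requirements": {"OpSys": ("==", "LINUX"), "Memory": (">=", 512)},
}

print(resource_ad(record))
print(job_ad(config))
```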
35Monitoring and Discovery System (MDS2)
- Globus information services component
- LDAP based
- Answers questions like
- What resources are available?
- What capabilities do these resources have?
- What is the load on these resources?
- This in turn allows for intelligent decisions to
be made in areas such as scheduling and resource
accounting
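The three questions above map onto simple queries over resource records. MDS2 itself is LDAP based, so real queries would be directory searches; the Python sketch below uses an in-memory list instead, and the field names and load values are made up for illustration.

```python
# Hypothetical resource records; load values are invented.
resources = [
    {"name": "condor-pool", "online": True, "os": "Linux", "cpus": 400, "load": 0.75},
    {"name": "boinc-campus", "online": True, "os": "Windows", "cpus": 100, "load": 0.20},
    {"name": "test-cluster", "online": False, "os": "Linux", "cpus": 16, "load": 0.0},
]

def available(records):
    """What resources are available?"""
    return [r["name"] for r in records if r["online"]]

def capabilities(records, name):
    """What capabilities does a given resource have?"""
    record = next(r for r in records if r["name"] == name)
    return {"os": record["os"], "cpus": record["cpus"]}

def least_loaded(records):
    """Which available resource currently has the lowest load?"""
    candidates = [r for r in records if r["online"]]
    return min(candidates, key=lambda r: r["load"])["name"]

print(available(resources))
print(capabilities(resources, "condor-pool"))
print(least_loaded(resources))
```

A scheduler could use the last query to prefer lightly loaded resources, which is the kind of decision the slide refers to.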
36Basic Architecture - Resources
37Current Grid Resources
- http://lattice.umiacs.umd.edu/resources/
- UMIACS Condor pool
- 400 processors
- BOINC pools
- Clients on campus: > 100
- Public (off-campus) clients: > 1000
38BOINC
- Works on the pull model, that is:
- One or more servers create workunits
- Clients connect asynchronously, pull down work, and return the results
- Clients are relatively lightweight and easy to install and manage
- One client can process work for multiple projects
- Participants can join teams and are given credit for the work they complete
- http://lattice.umiacs.umd.edu/boinc_public
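The pull model can be sketched as a server that only holds a queue and clients that ask for work when they are ready. The Python below is a simplified in-process illustration, not BOINC's actual protocol or API: the class, method names, and one-credit-per-workunit accounting are assumptions.

```python
from collections import deque

class WorkServer:
    """Holds pending workunits; never contacts clients on its own."""

    def __init__(self, workunits):
        self.pending = deque(workunits)
        self.results = {}
        self.credit = {}

    def request_work(self):
        """Called by a client; returns a workunit, or None if nothing is pending."""
        return self.pending.popleft() if self.pending else None

    def report_result(self, client_id, workunit, result):
        """Called by a client; stores the result and grants one unit of credit."""
        self.results[workunit["id"]] = result
        self.credit[client_id] = self.credit.get(client_id, 0) + 1

def run_client(client_id, server, compute, max_units):
    """One client session: pull up to max_units workunits and return results."""
    done = 0
    while done < max_units:
        workunit = server.request_work()
        if workunit is None:
            break
        server.report_result(client_id, workunit, compute(workunit["input"]))
        done += 1
    return done

def square(x):
    return x * x

server = WorkServer([{"id": i, "input": i} for i in range(4)])
run_client("alice", server, square, max_units=1)   # connects briefly
run_client("bob", server, square, max_units=10)    # takes whatever is left
print(server.results, server.credit)
```

Because clients initiate every exchange, they can sit behind firewalls and come and go freely, which suits a public desktop grid.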
39Globus-BOINC Adapter
- Consists of a number of components that allow us to run Grid Services on BOINC
- BOINC job manager
- Custom validator and assimilator
- Registers BOINC with Globus as a GRAM-addressable resource
- BOINC compatibility library eases the process of porting applications to BOINC
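In BOINC, a validator decides which returned result for a workunit to accept, and an assimilator then handles the accepted result. The slides do not describe the project's custom versions, so the Python sketch below shows only the generic idea, using simple agreement among redundant results from untrusted clients. The function names and the quorum of 2 are assumptions.

```python
from collections import Counter

def validate(results, quorum=2):
    """Return the canonical result if at least `quorum` results agree, else None."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count >= quorum else None

def assimilate(workunit_id, canonical, store):
    """Record the accepted result so it can be handed back to the Grid job."""
    store[workunit_id] = canonical

store = {}
canonical = validate(["-1432.7", "-1432.7", "-1290.1"])  # two of three agree
if canonical is not None:
    assimilate("wu-001", canonical, store)
print(store)
```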
40Outline
- Grid computing introduction and motivation
- Goals of The Lattice Project
- Basic architecture
- Our current production Grid system
- Implementation details
- Results of usage
- Research and development
41GT4 Research and Development
- We are currently upgrading the Grid system to use Globus Toolkit 4.0
- GT4 adheres strictly to emerging and established Web service standards
- Actively developed and supported
- Many components have been greatly improved
- GridFTP/RFT (replace GASS)
- WS GRAM
- MDS4 (XML based; replaces MDS2, which is LDAP based)
- Our basic architecture remains the same, and the
upgrade has been made easier because of tools we
have already developed (GSBL, GSG)
42More Information
- Lattice Website
- http://lattice.umiacs.umd.edu/