Title: Toward a Campus-Wide Grid Computing System
1Toward a Campus-Wide Grid Computing System
- An Overview of The Lattice Project
- Adam L. Bazinet and Michael P. Cummings
- Laboratory of Molecular Evolution
- Center for Bioinformatics and Computational
Biology
2Outline
- Grid computing motivation
- Goals of The Lattice Project
- Basic architecture
- Our current production Grid system
- Implementation details
- Results of usage
- Demo
- Research and development
- Task Computing with colleagues at Fujitsu
- Creating Grid-enabled workflows
3Grid Computing
- Definition: A model of distributed computing that
uses resources that are geographically and
administratively disparate. Individual users can
access computers and data transparently, without
having to consider location, operating system,
account administration, and other details. In
Grid computing the details are abstracted, and
the resources are virtualized.
4Why Go Grid?
- Scientific problems are solved faster
- Parallel execution means higher throughput
- Make compute resources a commodity
- Analogous to the electrical power grid
- Foster growth and interaction in the research community
- Use of the Grid spans departments and domains
- Grid resources are typically shared resources
5Outline
- Grid computing motivation
- Goals of The Lattice Project
- Basic architecture
- Our current production Grid system
- Implementation details
- Results of usage
- Demonstration
- Research and development
- Task Computing with colleagues at Fujitsu
- Creating Grid-enabled workflows
6The Lattice Project: Initial Goals
- Develop a Grid system for scientific research that
- Speeds up workflows by Grid-enabling various programs
- Is simple and intuitive
- Takes advantage of heterogeneous resources
- Is capable of managing large numbers of jobs (thousands)
- Supports multiple users and lowers the barriers to getting involved
- Is community-driven and supported
7Principles of Design
- Make use of well supported open source software
- Globus Toolkit
- BOINC
- Condor
- Engineered software should be scalable, modular, and robust
- Expose programs as well-defined services
- Arbitrary user-supplied code cannot be run
8Outline
- Grid computing motivation
- Goals of The Lattice Project
- Basic architecture
- Our current production Grid system
- Implementation details
- Results of usage
- Demo
- Research and development
- Task Computing with colleagues at Fujitsu
- Creating Grid-enabled workflows
9Terminology
- Client: A Grid user interface OR a machine that performs computation
- Grid Service: A Grid-enabled program
- Scheduler: Decides where Grid jobs will run
- Resource: Executes Grid jobs
10Basic Architecture (1 of 3)
11Basic Architecture (2 of 3)
12Basic Architecture (3 of 3)
13Outline
- Grid computing motivation
- Goals of The Lattice Project
- Basic architecture
- Our current production Grid system
- Implementation details
- Results of usage
- Demo
- Research and development
- Task Computing with colleagues at Fujitsu
- Creating Grid-enabled workflows
14Software Components
- Globus Toolkit version 3.2.1
- Backbone of the Grid
- http://www.globus.org/
- Condor-G
- Grid-level scheduler / resource broker
- http://www.cs.wisc.edu/condor/
- BOINC: Berkeley Open Infrastructure for Network Computing
- SETI@home-style desktop grid
- http://boinc.berkeley.edu/
- Custom components
- GSBL, GSG, Globus-BOINC adaptor, MDS-matchmaking
bridge, user interface(s), administrative
scripts, and much more
15Globus Toolkit 3
- Key components
- Globus Core
- Grid service hosting environment
- GSI: Grid Security Infrastructure
- Uses public key cryptography
- Secures communication
- Authenticates and authorizes Grid users
- WS GRAM: Job management
- GASS: Point-to-point file transfer
- MDS2: Information provider
16Condor-G
- Condor-G is part of the Condor suite
- Resources and jobs send Condor-G descriptions of themselves called ClassAds
- Condor-G matches Grid jobs to suitable resources, then submits and manages them
- This process is called matchmaking
17BOINC
- Most novel feature of our Grid
- Public computing model
- Untrusted resources
- Is potentially our largest resource
- We have targeted 3 platforms
- Windows / Linux x86 / Mac OS X
18Our Current Grid System
19User Interface
- The Grid Brick: a machine used to submit Grid jobs
- Our primary interface for Grid users
- Command-line clients mimic normal program execution
- Lattice Intranet
- Provides instructions for submitting jobs and managing data input and output
- Provides tools for describing and monitoring jobs
- Other possibilities
- Web portal model of job submission
- A client capable of composing complex workflows
using Task Computing and Semantic Web technology
developed by collaborators at Fujitsu
20Demonstration
21Basic Architecture - Client/Service
22Grid Client Stack
Command-line Interface
Perl
Java
Service-specific templates and stubs are
created by the Grid Service Generator
23Grid Service Stack
Grid Service Hosting Environment, a.k.a. the
container
Java
Service-specific templates and stubs are
created by the Grid Service Generator
24Tools for Writing Grid Services
- Grid Service Base Library (GSBL)
- Java API for building Grid services with the Globus Toolkit
- Shields programmers from having to work with the Globus API directly
- Provides a high-level interface for operations such as job submission and file transfer
- Grid Service Generator (GSG)
- Simplifies the process of creating Grid Services
- Intended for use with GSBL
25GSBL Design and Features
- Classes for
- Clients and services (base classes)
- Argument description and processing
- File transfers
- Job submission and control
- Security configuration
- Java synchronization and Globus notifications to paper over the event-based model
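The last point, hiding an event-based model behind a blocking call, can be illustrated with a minimal sketch. GSBL itself is a Java library; the Python below is only an analogy, and the class name `BlockingJob`, the method `on_notification`, and the status strings are all hypothetical rather than taken from GSBL or Globus.

```python
import threading


class BlockingJob:
    """Wraps a callback-driven job so callers can simply wait for it."""

    def __init__(self):
        self._done = threading.Event()
        self.status = None

    def on_notification(self, status):
        # Called by the (hypothetical) event source, possibly on another thread.
        self.status = status
        if status in ("DONE", "FAILED"):
            self._done.set()

    def wait(self, timeout=None):
        """Block until a terminal status arrives; return it (None on timeout)."""
        finished = self._done.wait(timeout)
        return self.status if finished else None


job = BlockingJob()
# Simulate asynchronous notifications arriving from a service.
notifier = threading.Thread(
    target=lambda: [job.on_notification(s) for s in ("PENDING", "ACTIVE", "DONE")]
)
notifier.start()
final_status = job.wait(timeout=5)
notifier.join()
```

The caller sees a single blocking `wait()` call, while the notifications still arrive asynchronously underneath.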
26Grid Service Generator
- Deploying a Grid service with GT3 is absurdly complicated
- Many files and namespaces mean lots of potential typos
- GSG takes as input a few parameters (service name, location, an XML argument description, etc.) and generates all requisite configuration files and skeleton Java classes
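A rough sketch of the generator idea: fill a small set of templates from a few parameters so that each name is typed only once. This is not the actual GSG; the two file names, the XML fragment, and the package name `edu.example.lattice` are invented for illustration, and a real GT3 deployment needs more files than shown here.

```python
from string import Template

# Hypothetical templates; the file names and contents are illustrative only.
TEMPLATES = {
    "${name}Service.java": Template(
        "package ${package};\n\npublic class ${name}Service {\n"
        "    // TODO: implement service logic\n}\n"
    ),
    "${name}-deploy.xml": Template(
        '<service name="${name}" class="${package}.${name}Service"/>\n'
    ),
}


def generate(name, package):
    """Return {filename: contents} with every placeholder filled in."""
    params = {"name": name, "package": package}
    return {
        Template(filename).substitute(params): body.substitute(params)
        for filename, body in TEMPLATES.items()
    }


files = generate("Mdiv", "edu.example.lattice")
```

Because every file is derived from the same parameter set, a service name or namespace cannot be misspelled in one file and not another.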
27Grid Services
28Grid Services
- Creating Grid Services requires
- Knowledge of the application
- Techniques for compiling and porting the application to various platforms
- Knowledge of the infrastructure so it can be effectively tested and deployed
- Challenges
- Maintaining bodies of Grid Service code as the number of applications grows and new versions of applications are released
- Minimizing the number of updates that need to be applied when the framework changes
29Basic Architecture - Scheduling
30Condor-G ClassAds
- Resources and jobs send Condor-G descriptions of themselves called ClassAds
- Jobs require certain capabilities of resources
- Resources advertise their capabilities
- Similar to a dating service: a central broker points pairs of compatible jobs and resources at each other
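The matchmaking idea can be sketched in a few lines of Python. This is an illustrative toy, not the ClassAd language or any Condor-G API: the attribute names (`os`, `memory_mb`, `allowed_os`, `min_memory_mb`) and the greedy first-fit pairing are assumptions made for the example.

```python
# Toy matchmaker: each job lists requirements, each resource advertises
# capabilities. All attribute names here are hypothetical.

def matches(job, resource):
    """Return True if the resource satisfies every job requirement."""
    return (
        resource["os"] in job["allowed_os"]
        and resource["memory_mb"] >= job["min_memory_mb"]
    )


def matchmake(jobs, resources):
    """Pair each job with the first compatible, still-unclaimed resource."""
    free = list(resources)
    pairs = {}
    for job in jobs:
        for resource in free:
            if matches(job, resource):
                pairs[job["name"]] = resource["name"]
                free.remove(resource)
                break  # stop scanning once this job is placed
    return pairs


resources = [
    {"name": "boinc-pool", "os": "windows", "memory_mb": 512},
    {"name": "condor-pool", "os": "linux", "memory_mb": 4096},
]
jobs = [
    {"name": "small-job", "allowed_os": {"windows", "linux"}, "min_memory_mb": 256},
    {"name": "big-job", "allowed_os": {"linux"}, "min_memory_mb": 2048},
]
pairs = matchmake(jobs, resources)
```

Here `small-job` lands on `boinc-pool` and `big-job` on `condor-pool`. A real matchmaker also lets resources state requirements of jobs and ranks candidate matches, which this sketch omits.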
31Condor-G ClassAds
32Generating ClassAds
- Job ClassAds are generated by the Condor-G job manager
- Job requirements are specified in the Grid service configuration files
- Resource ClassAds are generated by extracting information from MDS
- Lattice information providers supply data required for matchmaking
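As a hedged illustration of turning resource information into an advertisement, the sketch below renders a dictionary of resource facts as `Attribute = value` lines in the general style of a ClassAd. The record contents and attribute names (`Name`, `OpSys`, `FreeCpus`) are assumed for the example; the actual MDS schema and the bridge that performs this step are not shown in the slides.

```python
def to_classad(info):
    """Render a dict of resource facts as 'Attribute = value' lines."""
    lines = []
    for key in sorted(info):
        value = info[key]
        if isinstance(value, bool):
            rendered = "TRUE" if value else "FALSE"
        elif isinstance(value, (int, float)):
            rendered = str(value)
        else:
            # Strings are quoted, with embedded quotes escaped.
            rendered = '"' + str(value).replace('"', '\\"') + '"'
        lines.append(f"{key} = {rendered}")
    return "\n".join(lines)


# Hypothetical record, standing in for data extracted from MDS.
mds_record = {"Name": "umiacs-condor", "OpSys": "LINUX", "FreeCpus": 12}
ad = to_classad(mds_record)
```

The boolean check comes before the numeric check because Python treats `bool` as a subclass of `int`.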
33Monitoring and Discovery System (MDS2)
- Globus information services component
- LDAP-based
- Answers questions like
- What resources are available?
- What capabilities do these resources have?
- What is the load on these resources?
- This in turn allows for intelligent decisions to
be made in areas such as scheduling and resource
accounting
34MDS to Condor-G Diagram
35Basic Architecture - Resources
36Current Grid Resources
- http://lattice.umiacs.umd.edu/resources/
- UMIACS Condor pool
- 400 processors
- BOINC pools
- Clients on campus: > 100
- Public (off-campus) clients: > 1000
37BOINC
- Works on the pull model, that is:
- One or more servers create workunits
- Clients connect asynchronously, pull down work, and return the results
- Clients are relatively lightweight and easy to install and manage
- One client can crunch work for multiple projects
- Participants can join teams and are given credit for the work they complete
- http://lattice.umiacs.umd.edu/boinc_public
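The pull model described above can be sketched as follows. This is a toy in-process simulation, not BOINC's scheduler protocol: the `WorkServer` class, its method names, and the squaring "computation" are all hypothetical.

```python
from collections import deque


class WorkServer:
    """Holds a queue of workunits; never contacts clients itself."""

    def __init__(self, workunits):
        self.pending = deque(workunits)
        self.results = {}

    def request_work(self):
        # A client asks for work; None means nothing is left.
        return self.pending.popleft() if self.pending else None

    def report_result(self, workunit, result):
        self.results[workunit] = result


def run_client(server, compute):
    """Pull workunits until the server has none; return how many were done."""
    completed = 0
    while (workunit := server.request_work()) is not None:
        server.report_result(workunit, compute(workunit))
        completed += 1
    return completed


server = WorkServer([1, 2, 3, 4])
done = run_client(server, compute=lambda n: n * n)
```

The server only ever responds to requests, which is what lets clients behind firewalls or on intermittent connections participate.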
38Globus-BOINC Adapter
- Consists of a number of components that allow us to run Grid Services on BOINC
- BOINC job manager
- Custom validator and assimilator
- Registers BOINC with Globus as a GRAM-addressable resource
- BOINC compatibility library eases the process of porting applications to BOINC
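The slides do not say how the custom validator works. Because BOINC clients are untrusted, one common approach in BOINC projects is to send the same workunit to several clients and accept a result only when enough replicas agree. The sketch below assumes that approach; the function name `validate` and the quorum size are hypothetical.

```python
from collections import Counter


def validate(results, quorum=2):
    """Return the agreed result if at least `quorum` replicas match, else None."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count >= quorum else None


# Three replicas of one workunit; one client returned a bad value.
canonical = validate(["-1234.5", "-1234.5", "-9999.0"])
# Two replicas that disagree: no decision can be made yet.
undecided = validate(["a", "b"])
```

An assimilator would then take the accepted result and hand it back to the rest of the system; that step is not sketched here.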
39Demonstration
40Research Projects Using the Grid
- The Laboratory of David Fushman has run protein-protein docking algorithms on Lattice
- CNS is the featured Grid service in this project
- Floyd Reed and Holly Mortensen from the Laboratory of Sarah Tishkoff have run a number of population genetics simulations
- MDIV and IM are the featured Grid services
- The Laboratory of Michael Cummings has run statistical phylogenetic analyses
- GSI is the featured Grid service
41Results of Grid Usage
- IM: 0.13 CPU years (BOINC)
- MDIV: 4.93 CPU years (BOINC)
- CNS: 12.4 CPU years (BOINC)
- GSI: 94.05 CPU years (Condor)
- Total: 111.51 CPU years
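As a quick check of the arithmetic, the per-service figures can be summed and split by resource type. The numbers are taken from the list above; the variable names are only for illustration.

```python
# CPU years per Grid service, as reported on this slide.
cpu_years = {"IM": 0.13, "MDIV": 4.93, "CNS": 12.4, "GSI": 94.05}

total = round(sum(cpu_years.values()), 2)
by_resource = {
    "BOINC": round(cpu_years["IM"] + cpu_years["MDIV"] + cpu_years["CNS"], 2),
    "Condor": cpu_years["GSI"],
}
```

The sum reproduces the stated total of 111.51 CPU years, of which 17.46 ran on BOINC and 94.05 on Condor.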
42Outline
- Grid computing motivation
- Goals of The Lattice Project
- Basic architecture
- Our current production Grid system
- Implementation details
- Results of usage
- Demo
- Research and development
- Task Computing with colleagues at Fujitsu
- Creating Grid-enabled workflows
43GT4 Research and Development
- We are currently upgrading the Grid system to use Globus Toolkit 4.0
- GT4 adheres strictly to emerging and established Web service standards
- Actively developed and supported
- Many components have been greatly improved
- GridFTP/RFT (will replace GASS)
- WS GRAM
- MDS4 (XML-based; replaces the LDAP-based MDS2)
- Our basic architecture will remain the same, and the upgrade will be made easier because of tools we have already developed (GSBL, GSG)
44Outline
- Grid computing motivation
- Goals of The Lattice Project
- Basic architecture
- Our current production Grid system
- Implementation details
- Results of usage
- Demo
- Research and development
- Task Computing with colleagues at Fujitsu
- Creating Grid-enabled workflows
45Fujitsu Task Computing Research
- http://taskcomputing.org/
- Fujitsu Laboratories of America, Inc.
- College Park, Maryland
46Task Computing (TC)
- Goals of Task Computing
- Let ordinary end-users accomplish complex tasks easily in environments rich with applications, devices, and services
- Tasks can be composed on-the-fly from the services found in each environment and on the Internet
- Tasks can then be shared and edited later by others to suit their needs
- Based on Semantic Web Services technology, TC provides many ways to interact with tasks composed of services
47The Core Idea
- Example tasks: "Play Jeff's Video", "Dial Contact from Outlook", "View Weather of Maryland"
- The key is Semantic Service Descriptions (SSDs) for resources
- Resource types: Web Services, OS/Application (.NET, etc.), Device (UPnP)
[Diagram: services drawn from OS/Applications, Devices, and Web Pages, labeled "Video from DV", "Add into Outlook", "Dial", "Open", "Save", "Print", "Weather of", "Aerial Photo of", "Jeff's Video", "Play (Video)", "Play (Audio)", "View", and "Contact from Outlook"]
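A minimal sketch of how semantic descriptions could drive composition: if each service declares the type it consumes and the type it produces, two services can be chained whenever those types line up. This is not the TCE implementation or its SSD format; the `ServiceDescription` class and the type names `Contact` and `Video` are assumptions for illustration, with service names borrowed from this slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ServiceDescription:
    """A stand-in for an SSD: a name plus semantic input/output types."""
    name: str
    input_type: Optional[str]   # None means the service needs no input
    output_type: Optional[str]  # None means the service produces no output


def compose(services):
    """List every two-step task where one service's output feeds another's input."""
    return [
        (a.name, b.name)
        for a in services
        for b in services
        if a is not b and a.output_type is not None and a.output_type == b.input_type
    ]


services = [
    ServiceDescription("Contact from Outlook", None, "Contact"),
    ServiceDescription("Dial", "Contact", None),
    ServiceDescription("Jeff's Video", None, "Video"),
    ServiceDescription("Play (Video)", "Video", None),
]
tasks = compose(services)
```

With these four descriptions, the only valid pairings are "Contact from Outlook" followed by "Dial" and "Jeff's Video" followed by "Play (Video)", matching two of the example tasks on the slide.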
48Task Computing Environment (TCE)
- Windows software to realize TC
- Core is written in Java
- Requirements
- Windows XP with IIS (Internet Information Server) installed
- Java Runtime Environment (for TC clients only)
- Single Windows installer with
- TC clients in many modalities (graphical, voice, Web-based)
- More than 50 kinds of TC services
- OS, application functions, devices
- Many mechanisms for dynamic service creation
- Web Services APIs for TC functions to program your own application
- Available from http://taskcomputing.org for research institutes
49TC Architecture
[Diagram: layered architecture of the Task Computing Environment, with the User at the top]
- Presentation Layer: Task Computing Client, Applications, Web-based Client
- Web Service API
- Middleware Layer: Discovery Engine, Execution and Execution Monitoring Engine, Service Composition Engine, Management Tools
- Service Layer: Services, each with a Semantic Service Description; E-service
- Realization Layer: Device, Application, Content
50TC Process
[Diagram labels: Discover, Execute, Create Task, Execute, By Web, By Email, Save, Share, Edit]
51For Your Applications
- Clients
- Service Discovery
- Task Creation/Edit/Save
- Task Execution
- Services
- Create Web Services
- .NET for Windows, Axis for Java
- Provide Semantic Service Description (SSD)
- Reuse ontology (schema)
- Or create your own ontology
- Create SSD based on the ontology
- Publish SSD
Legend: marked items are provided by TCE or have tools available
52Problems In Scientific Domain
- Complex workflow generation involving many interconnected software tools
- Requires expert knowledge of each tool
- Too many variations of tools
- Too many tools!
- Requires a sophisticated level of computing
- Format conversions
- Different platforms
- Difficult for
- young scientists to start out
- existing scientists to explore new tools/workflows
- Need a new environment where scientists can
- easily experiment with combining several tools to accomplish their research goals
- without requiring sophisticated computing support
- Abstract away the IT details and put the tools in the hands of domain scientists
53Task from Distributed Resources
- Tasks can be composed quickly by end-users from distributed and heterogeneous resources; they can then be easily shared and later edited
[Diagram: three tasks (Task) drawing on seven distributed services (S)]
54Bio-STEER
- Application of Task Computing and Semantic Web technologies in the bioinformatics domain
- Collaboration with UMD
- Professor Mike Cummings, Center for Bioinformatics and Computational Biology
- Lattice Project
- Offers a growing list of bio services on the Grid
- http://lattice.umiacs.umd.edu
55Bio-STEER Benefits
Data into semantic layer
- User-friendly environments
- Frees scientists from relying on computing support to build workflows
- Enables convenient sharing of workflows (sharing not just data, but process)
- Promotes collaboration among scientists
- High reusability and changeability
- Encourages scientists to experiment
Reuse, share, modify workflows
Easily track progress
Easy composition
Interact only when necessary; otherwise automated execution
Scientists can concentrate on their research
56Demo
57What is needed?
- Development of semantic services and web services
- However,
- One-time cost: applications are stable and limited
- Semantic services are easily shared and modified
- Bio IT support team can now concentrate on building better infrastructure, including semantic services
- Near-term enhancements expected
- Better support from TCE: concurrent branch execution and better error handling/recovery
- User feedback: support for additional features and services based on feedback
58More Information
- Lattice Website
- http://lattice.umiacs.umd.edu/
- Task Computing
- http://taskcomputing.org/