HTC in Research - PowerPoint PPT Presentation

About This Presentation
Title:

HTC in Research

Description:

Claims for 'benefits' provided by Distributed Processing Systems ... Easy Expansion in Capacity and/or Function ... IBM Watson Research: Alan King, Jim Sexton ...

Slides: 44
Provided by: rga62
Category:
Tags: htc | research | sexton


Transcript and Presenter's Notes

Title: HTC in Research


1
HTC in Research and Education
2
Claims for benefits provided by Distributed
Processing Systems
  • High Availability and Reliability
  • High System Performance
  • Ease of Modular and Incremental Growth
  • Automatic Load and Resource Sharing
  • Good Response to Temporary Overloads
  • Easy Expansion in Capacity and/or Function

"What is a Distributed Data Processing System?",
P.H. Enslow, Computer, January 1978
3
Democratization of Computing: You do not need to
be a super-person to do super-computing
4
Searching for small RNA candidates in a
kingdom: 45 CPU days
[Workflow diagram. Input: NCBI FTP files (.ffn, .fna, .ptt, .gbk). Tools: IGRExtract3, RNAMotif, FindTerm, TransTerm, BLAST, QRNA, Patser, FFN_parse, sRNAPredict, sRNA_Annotate. Intermediate data: ROI IGRs, all other IGRs, terminators, conservation, known sRNAs and riboswitches, IGR sequences of candidates, candidate loci, IGRs of all known sRNAs, TFBS matrices, ORFs flanking known sRNAs, ORFs flanking candidates. Annotations: homology, 2o cons., paralogy, TFBSs, synteny. Output: annotated candidate sRNA-encoding genes.]
5
Education and Training
  • Computer Science: develop and implement novel
    HTC technologies (horizontal)
  • Domain Sciences: develop and implement
    end-to-end HTC capabilities that are fully
    integrated in the scientific discovery process
    (vertical)
  • Experimental methods: develop and implement a
    curriculum that harnesses HTC capabilities to
    teach how to use modeling and numerical data to
    answer scientific questions
  • System Management: develop and implement a
    curriculum that uses HTC resources to teach how
    to build, deploy, maintain and operate
    distributed systems

6
  • "As we look to hire new graduates, both at the
    undergraduate and graduate levels, we find that
    in most cases people are coming in with a good,
    solid core computer science traditional education
    ... but not a great, broad-based education in all
    the kinds of computing that are near and dear to our
    business."
  • Ron Brachman, Vice President of Worldwide Research
    Operations, Yahoo

7
  • Yahoo! Inc., a leading global Internet company,
    today announced that it will be the first in the
    industry to launch an open source program aimed
    at advancing the research and development of
    systems software for distributed computing.
    Yahoo's program is intended to leverage its
    leadership in Hadoop, an open source distributed
    computing sub-project of the Apache Software
    Foundation, to enable researchers to modify and
    evaluate the systems software running on a
    4,000-processor supercomputer provided by Yahoo.
    Unlike other companies and traditional
    supercomputing centers, which focus on providing
    users with computers for running applications and
    for coursework, Yahoo's program focuses on
    pushing the boundaries of large-scale systems
    software research.

8
1986-2006: Celebrating 20 years since we first
installed Condor in our CS department
9
Integrating Linux Technology with Condor
Kim van der Riet, Principal Software Engineer

10
What will Red Hat be doing?
  • Red Hat will be investing in the Condor project
    locally in Madison, WI, in addition to driving
    work required in upstream and related projects.
    This work will include:
  • Engineering on Condor features and infrastructure
  • Should result in tighter integration with related
    technologies
  • Tighter kernel integration
  • Information transfer between the Condor team and
    Red Hat engineers working on things like
    Messaging, Virtualization, etc.
  • Creating and packaging Condor components for
    Linux distributions
  • Support for Condor packaged in RH distributions
  • All work goes back to upstream communities, so
    this partnership will benefit all.
  • Shameless plug: If you want to be involved, Red
    Hat is hiring...

11
High Throughput Computing on Blue Gene
  • IBM Rochester: Amanda Peters, Tom Budnik
  • With contributions from
  • IBM Rochester: Mike Mundy, Greg Stewart, Pat
    McCarthy
  • IBM Watson Research: Alan King, Jim Sexton
  • UW-Madison Condor: Greg Thain, Miron Livny,
    Todd Tannenbaum

12
Condor and IBM Blue Gene Collaboration
  • Both IBM and Condor teams engaged in adapting
    code to bring Condor and Blue Gene technologies
    together
  • Initial Collaboration (Blue Gene/L)
  • Prototype/research Condor running HTC workloads
    on Blue Gene/L
  • Condor developed dispatcher/launcher running HTC
    jobs
  • Prototype work for Condor being performed on
    Rochester On-Demand Center Blue Gene system
  • Mid-term Collaboration (Blue Gene/L)
  • Condor supports HPC workloads along with HTC
    workloads on Blue Gene/L
  • Long-term Collaboration (Next Generation Blue
    Gene)
  • I/O Node exploitation with Condor
  • Partner in design of HTC services for Next
    Generation Blue Gene
  • Standardized launcher, boot/allocation services,
    job submission/tracking via database, etc.
  • Study ways to automatically switch between
    HTC/HPC workloads on a partition
  • Data persistence (persisting data in memory
    across executables)
  • Data affinity scheduling
  • Petascale environment issues

13
The Grid: Blueprint for a New Computing
Infrastructure. Edited by Ian Foster and Carl
Kesselman, July 1998, 701 pages.
The grid promises to fundamentally change the way
we think about and use computing. This
infrastructure will connect multiple regional and
national computational grids, creating a
universal source of pervasive and dependable
computing power that supports dramatically new
classes of applications. The Grid provides a
clear vision of what computational grids are, why
we need them, who will use them, and how they
will be programmed.
14
  • We claim that these mechanisms, although
    originally developed in the context of a cluster
    of workstations, are also applicable to
    computational grids. In addition to the required
    flexibility of services in these grids, a very
    important concern is that the system be robust
    enough to run in production mode continuously
    even in the face of component failures.

Miron Livny and Rajesh Raman, "High Throughput
Resource Management", in The Grid: Blueprint for
a New Computing Infrastructure.
15
(No Transcript)
16
CERN 92
17
The search for SUSY
  • Sanjay Padhi is a UW Chancellor Fellow who is
    working in the group of Prof. Sau Lan Wu located
    at CERN (Geneva)
  • Using Condor Technologies he established a grid
    access point in his office at CERN
  • Through this access point he managed to harness
    in 3 months (12/05-2/06) more than 500 CPU years
    from the LHC Computing Grid (LCG), the Open
    Science Grid (OSG), the Grid Laboratory Of
    Wisconsin (GLOW) resources, and local group-owned
    desktop resources.

Super-Symmetry
18
High Throughput Computing
  • We first introduced the distinction between High
    Performance Computing (HPC) and High Throughput
    Computing (HTC) in a seminar at the NASA Goddard
    Flight Center in July of 1996 and a month later
    at the European Laboratory for Particle Physics
    (CERN). In June of 1997 HPCWire published an
    interview on High Throughput Computing.

19
Why HTC?
  • For many experimental scientists, scientific
    progress and quality of research are strongly
    linked to computing throughput. In other words,
    they are less concerned about instantaneous
    computing power. Instead, what matters to them is
    the amount of computing they can harness over a
    month or a year --- they measure computing power
    in units of scenarios per day, wind patterns per
    week, instruction sets per month, or crystal
    configurations per year.

20
High Throughput Computing is a 24-7-365 activity
FLOPY ≠ (60*60*24*7*52)*FLOPS
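
A rough numeric sketch of why a year of throughput is not just peak speed times the seconds in a year (60*60*24*7*52). The availability fractions and the function names below are illustrative assumptions, not figures from the talk.

```python
# Seconds in 52 weeks, the factor that appears on the slide.
SECONDS_PER_WEEK = 60 * 60 * 24 * 7
SECONDS_PER_YEAR = SECONDS_PER_WEEK * 52


def peak_flopy(flops):
    """Operations per year if the peak rate were sustained every second."""
    return flops * SECONDS_PER_YEAR


def delivered_flopy(flops, weekly_availability):
    """Operations per year given the fraction of each week that was
    actually spent running jobs (hypothetical input, one value per week)."""
    return sum(flops * SECONDS_PER_WEEK * a for a in weekly_availability)


if __name__ == "__main__":
    # Assumed example: a 1 GFLOPS resource that is usable half of each week.
    print(peak_flopy(10**9))
    print(delivered_flopy(10**9, [0.5] * 52))
```

Only when the resource is available every second of every week do the two numbers agree, which is why HTC treats sustained availability as the quantity to optimize.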
21
High Throughput Computing
EPFL 97
  • Miron Livny
  • Computer Sciences
  • University of Wisconsin-Madison
  • miron_at_cs.wisc.edu

22
Customers of HTC
  • Most HTC applications follow the Master-Worker
    paradigm where a group of workers executes a
    loosely coupled heap of tasks controlled by one or
    more masters.
  • Job Level - Tens to thousands of independent jobs
  • Task Level - A parallel application (PVM, MPI-2)
    that consists of a small group of master
    processes and tens to hundreds of worker processes.
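
The Master-Worker paradigm can be sketched in a few lines. This is a minimal single-machine illustration using Python threads and a shared queue, not how Condor, PVM or MPI implement it; `run_master_worker` and `simulate_scenario` are hypothetical names chosen for the example.

```python
import queue
import threading


def run_master_worker(tasks, work_fn, num_workers=4):
    """Master fills a bag of independent tasks; workers drain it."""
    tasks = list(tasks)
    bag = queue.Queue()
    for task_id, payload in enumerate(tasks):
        bag.put((task_id, payload))

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task_id, payload = bag.get_nowait()
            except queue.Empty:
                return  # bag is empty, this worker is done
            value = work_fn(payload)
            with lock:
                results[task_id] = value

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

    # Master collects the results in the original task order.
    return [results[i] for i in range(len(tasks))]


def simulate_scenario(seed):
    # Hypothetical stand-in for one independent job.
    return seed * seed


if __name__ == "__main__":
    print(run_master_worker(range(1, 6), simulate_scenario, num_workers=3))
```

Because the tasks are loosely coupled, workers can join, leave or fail without coordination among themselves; only the master tracks which tasks remain.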

23
The Challenge
  • Turn large collections of existing
    distributively owned computing resources into
    effective High Throughput Computing Environments
  • Minimize Wait while Idle

24
Obstacles to HTC
(Sociology) (Robustness) (Portability) (Technology)
  • Ownership Distribution
  • Size and Uncertainties
  • Technology Evolution
  • Physical Distribution

25
Sociology
  • Make owners (and system administrators) happy.
  • Give owners full control over
  • when and by whom private resources are used for
    HTC
  • impact of HTC on private Quality of Service
  • membership and information on HTC related
    activities
  • No changes to existing software, and make it easy
    to install, configure, monitor, and maintain

Happy owners → more resources → higher throughput

26
Sociology
  • Owners look for a verifiable contract with the
    HTC environment that spells out the rules of
    engagement.
  • System administrators do not like weird
    distributed applications that have the potential
    of interfering with the happiness of their
    interactive users.

27
Robustness
  • To be effective, an HTC environment must run as
    a 24-7-365 operation.
  • Customers count on it
  • Debugging and fault isolation may be very
    time-consuming processes
  • In a large distributed system, everything that
    might go wrong will go wrong.

Robust system → less down time → higher throughput
28
Portability
  • To be effective, the HTC software must run on
    and support the latest and greatest hardware and
    software.
  • Owners select hardware and software according to
    their needs and tradeoffs
  • Customers expect it to be there.
  • Application developers expect only a few (if any)
    changes to their applications.

Portability → more platforms → higher throughput
29
Technology
  • An HTC environment is a large, dynamic and
    evolving Distributed System
  • Autonomous and heterogeneous resources
  • Remote file access
  • Authentication
  • Local and wide-area networking

30
Robust and Portable Mechanisms Hold the Key
to High Throughput Computing
Policies play only a secondary role in HTC
31
Leads to a bottom-up approach to building and
operating distributed systems
32
My jobs should run
  • on my laptop if it is not connected to the
    network
  • on my group resources if my certificate expired
  • ... on my campus resources if the meta scheduler
    is down
  • on my national resources if the trans-Atlantic
    link was cut by a submarine
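
One way to read this list is as a fallback chain: try the widest pool of resources first and fall back to the next narrower one when a dependency is down. The sketch below is only an illustration of that idea; the tier names and the `ResourceTier` and `pick_tier` helpers are assumptions, not part of Condor or any scheduler's API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ResourceTier:
    name: str
    is_reachable: Callable[[], bool]  # hypothetical health check


def pick_tier(tiers: List[ResourceTier]) -> Optional[str]:
    """Return the first reachable tier, scanning from widest to narrowest."""
    for tier in tiers:
        if tier.is_reachable():
            return tier.name
    return None


if __name__ == "__main__":
    # Assumed situation: the meta scheduler is down, so only the campus,
    # group and laptop tiers can accept jobs.
    tiers = [
        ResourceTier("national", lambda: False),
        ResourceTier("campus", lambda: True),
        ResourceTier("group", lambda: True),
        ResourceTier("laptop", lambda: True),
    ]
    print(pick_tier(tiers))
```

The last tier, the user's own machine, needs no network, certificate or remote scheduler, so a job always has somewhere to run.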

33
The Open Science Grid (OSG)
  • Miron Livny - OSG PI and Facility Coordinator
  • Computer Sciences Department
  • University of Wisconsin-Madison

Supported by the Department of Energy Office of
Science SciDAC-2 program from the High Energy
Physics, Nuclear Physics and Advanced Software
and Computing Research programs, and the
National Science Foundation Math and Physical
Sciences, Office of CyberInfrastructure and
Office of International Science and Engineering
Directorates.
34
The Evolution of the OSG
[Timeline figure, 1999-2009. Projects: PPDG (DOE), GriPhyN (NSF), iVDGL (NSF), DOE Science Grid (DOE), Trillium, Grid3, OSG (DOE+NSF). Parallel tracks: LIGO preparation, then LIGO operation; LHC construction and preparation, then LHC Ops. Related infrastructure: European Grid and Worldwide LHC Computing Grid; campus and regional grids.]
35
The Open Science Grid vision
  • Transform processing and data intensive science
    through a cross-domain, self-managed national
    distributed cyber-infrastructure that brings
    together campus and community infrastructure and
    facilitates the needs of Virtual Organizations
    (VO) at all scales

36
D0 Data Re-Processing
12 sites contributed up to 1000 jobs/day
2M CPU hours, 286M events, 286K jobs on OSG
48 TB input data, 22 TB output data
[Charts: Total Events; OSG CPU Hours/Week]
37
The Three Cornerstones
Need to be harmonized into a well integrated
whole.
National
Campus
Community
38
OSG challenges
  • Develop the organizational and management
    structure of a consortium that drives such a
    Cyber Infrastructure
  • Develop the organizational and management
    structure for the project that builds, operates
    and evolves such Cyber Infrastructure
  • Maintain and evolve a software stack capable of
    offering powerful and dependable capabilities
    that meet the science objectives of the NSF and
    DOE scientific communities
  • Operate and evolve a dependable and well managed
    distributed facility

39
6,400 CPUs available
Campus Condor pool backfills idle nodes in PBS
clusters - provided 5.5 million CPU-hours in 2006,
all from idle nodes in clusters
Use on TeraGrid: 2.4 million hours in 2006 spent
building a database of hypothetical zeolite
structures; 2007: 5.5 million hours allocated to TG
http://www.cs.wisc.edu/condor/PCW2007/presentations/cheeseman_Purdue_Condor_Week_2007.ppt
40
Clemson Campus Condor Pool
  • Machines in 27 different locations on Campus
  • 1,700 job slots
  • >1.8M hours served in 6 months
  • users from Industrial and Chemical engineering,
    and Economics
  • Fast ramp up of usage
  • Accessible to the OSG through a gateway

41
Grid Laboratory of Wisconsin
2003: Initiative funded by NSF(MIR)/UW at $1.5M.
2007: Second phase funded by NSF(MIR)/UW at $1.5M.
Six Initial GLOW Sites
  • Computational Genomics, Chemistry
  • Amanda, Ice-cube, Physics/Space Science
  • High Energy Physics/CMS, Physics
  • Materials by Design, Chemical Engineering
  • Radiation Therapy, Medical Physics
  • Computer Science

Diverse users with different deadlines and usage
patterns.
42
GLOW Usage 4/04-11/08
Over 35M CPU hours served!
43
The next 20 years
  • We all came to this meeting because we believe in
    the value of HTC and are aware of the challenges
    we face in offering researchers and educators
    dependable HTC capabilities.
  • We all agree that HTC is not just about
    technologies but is also very much about people:
    users, developers, administrators, accountants,
    operators, policy makers, ...