Thank you for coming!

Provided by: peterco7

1
Welcome
  • Thank you for coming!
  • If you need a breakout room this AM, we have room
    3111 reserved one floor down; feel free to use
    it.
  • Lunch is at the Fluno Center, 1.5 blocks away. Be
    sure to leave yourself 5-10 minutes to get back
    here afterwards; we will restart in this room at
    1:00pm sharp.
  • At 3:30pm, Becky will hold the afternoon tutorial
    here, while Todd, Ross, and I will hold the
    problem-solving Q&A concurrently in Room 3180.
  • Do me a favor: please fill out the survey at the
    end of the day.

2
Background
  • What are Metronome and the NMI Lab? Why Should I
    Care?

3
The Problem
  • High-quality distributed computing (grid)
    software is:
      • badly needed
      • hard to find
      • hard to build and test

4
The Fix (Part of it, anyway)
  • Good build/test cycle
  • To be good, a build/test process must be:
      • frequent
      • reliable
      • automatic
      • repeatable

5
The (Next) Problem
  • Building and testing distributed computing
    software requires:
      • Distributed resources
          • Not always in-house, not always dedicated to
            builds
          • I.e., shared, scheduled resources
          • Unless you have a spare Blue Gene lying around,
            and an old Alpha running RedHat 7.2, and an HP-UX
            11 box, and an Itanium running Scientific Linux 3
            (CERN-flavored), and...
      • Distributed testbeds, tests
          • Not "our grid middleware works on my machine,
            ship it!"

6
Grid Build and Test
  • Building and testing distributed computing
    software brings distributed challenges
  • Complex workflows, shared resources,
    multi-site/cross-project/multi-user scheduling
    and priorities, data management, fault-tolerance,
    failure recovery
  • A lot like real distributed computing
  • Tinderbox or the latest Web 2.0 build system
    don't address most of the challenges
  • Deep, integrated software stacks
  • Distributed providers

7
Metronome Principles
  • Tool-independent
  • Lightweight
  • Enable the use of disparate, shared resources by
    multiple collaborators; don't assume one user,
    one project, one environment.
  • Encourage explicit, well-controlled build/test
    environments
  • Central results repository
  • Fault-tolerance
  • Support platform-neutral and platform-specific
    tasks
  • Build/test separation

8
Metronome Architecture
[Diagram. Labels, in the order they appear:
 INPUT: Spec File, Customer Source Code, Customer Build/Test Scripts
 Metronome: Spec File, DAG, Condor Queue
 Distributed Build/Test Pool: build/test jobs, results
 OUTPUT: Web Portal, MySQL Results DB, Finished Binaries]
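
The labels on this slide (spec file, DAG, Condor queue, build/test jobs, results) suggest a pipeline in which a spec is expanded into a dependency graph of per-platform jobs. A minimal Python sketch of that idea follows. The spec dictionary, the task names, and the platform names are all made up for illustration; none of this is Metronome's actual spec-file format or API.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical spec: NOT Metronome's real spec-file format.
spec = {
    "project": "demo",
    "platforms": ["platform_a", "platform_b"],  # placeholder platform names
}

def spec_to_dag(spec):
    """Map each task name to the set of task names it depends on."""
    dag = {"fetch": set()}                       # platform-neutral step
    for p in spec["platforms"]:
        dag[f"build:{p}"] = {"fetch"}            # platform-specific build
        dag[f"test:{p}"] = {f"build:{p}"}        # a test waits for its own build
    # Results are collected only after every platform's tests have finished.
    dag["report"] = {f"test:{p}" for p in spec["platforms"]}
    return dag

def run_order(dag):
    """Return one valid execution order for the tasks in the DAG."""
    return list(TopologicalSorter(dag).static_order())

dag = spec_to_dag(spec)
order = run_order(dag)
print(order)
```

In a real system the ordering would be enforced by a scheduler that runs independent per-platform jobs in parallel on shared machines; the flat list here only shows that the dependencies are respected.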
9
(No Transcript)
10
NMI Lab
  • Dedicated, heterogeneous distributed computing
    facility
      • Opposite extreme from typical cluster --
        instead of 1000s of identical CPUs, we have a
        handful of CPUs each for 50 platforms.
      • Much harder to manage. You try finding a
        monitoring tool that works on 50 platforms!
  • Carefully-controlled resources
      • No mystery meat
  • 200 registered users, 84k build & test runs
    carried out by 1.1M Condor jobs, producing 6.5M
    individual B&T tasks/results recorded in the DB
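
For a sense of scale, the figures on this slide imply some rough per-run averages. The short Python calculation below uses only the rounded numbers quoted above, so the ratios are approximate; the variable names are chosen here for illustration.

```python
# Rounded figures as quoted on the slide.
runs = 84_000              # build and test runs
condor_jobs = 1_100_000    # Condor jobs that carried out those runs
task_results = 6_500_000   # individual task results recorded in the DB

jobs_per_run = condor_jobs / runs            # about 13 jobs per run
tasks_per_run = task_results / runs          # about 77 task results per run
tasks_per_job = task_results / condor_jobs   # about 6 task results per job

print(f"{jobs_per_run:.1f} jobs/run, "
      f"{tasks_per_run:.1f} tasks/run, "
      f"{tasks_per_job:.1f} tasks/job")
```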

11
The Team
  • Dedicated Subset of the Condor Team
  • Becky Gietzel, Todd Miller, Peter Couvares:
    Metronome development and NMI Lab support
  • Ross Oldenburg and Ken Hahn: Condor Team
    sysadmins. Ross's primary focus is the NMI Lab,
    Ken's focus is the rest of the Condor world, and
    they back each other up.
  • 2 new hires in progress (one developer, one
    sysadmin)
  • Undergraduate hourly staff: Kevin, iTom, webTom,
    more soon.

12
Why This Meeting?
  • Big reasons
      • Science Matters, Science Needs Quality
        Distributed Computing Software
      • Nurture Community Working on Automated Testing
      • Metronome & NMI Lab Education and Training
      • Solicit Feedback & Collaboration
  • Small reasons
      • We want to show off so that the NSF knows its
        money is well spent.
      • It's an excuse to get our own ducks in a row;
        nothing focuses one's work and plans like the
        need to present it to others.

13
Let's Get On With It...
  • Users First
      • Successful Users
      • Cranky Users
      • Honest Users
      • Useful Users?
  • Developers Next
  • N-way Communication
  • Directions/collaborations for coming year