Title: Condor and the Grid
1 Condor and the Grid
- D. Thain, T. Tannenbaum, M. Livny
Christopher M. Moretti 23 February 2007
2 Problem / Opportunity
- Users need CPUs
  - Scientific computing
  - Mathematical modeling
  - Data mining
- Many CPU cycles are unused
  - Personal workstations
  - General use laboratories
  - Research machines
3 Solution: Condor
- A hunter of idle workstations
- Keeps track of resources, both needed and available
- Determines and assigns matches
- Monitors progress
- Cleans up and reports results
4 Architecture
- Three principals
  - Agent: machine needing resources
  - Matchmaker
  - Resource: machine lending resources
- Three phases
  - Advertising
  - Matching/Claiming
  - Deploying/Executing
5 Advertising
- Agent (needy.cse.nd.edu) to Matchmaker: "I need X"
- Lender (idle.cse.nd.edu) to Matchmaker: "I have Y"
- Matchmaker: "Does Y satisfy X?"
6 Matching / Claiming
- Matchmaker to Agent (needy.cse.nd.edu): "Use idle.cse.nd.edu"
- Matchmaker to Lender (idle.cse.nd.edu): "Listen for needy.cse.nd.edu"
- Agent to Lender: "Are you still available?"
- Lender to Agent: "Yes."
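The advertise, match, and claim exchange can be sketched in a few lines of Python. This is only an illustrative model, not Condor's implementation: the class names, the method names, and the use of feature sets for "X" and "Y" are all assumptions made for the example. The point it tries to show is that the matchmaker only introduces the two parties; the agent then claims the resource directly, so a stale match is caught at claim time.

```python
class Resource:
    """Hypothetical lender machine; 'offer' is the set of features it has (Y)."""
    def __init__(self, name, offer):
        self.name = name
        self.offer = set(offer)
        self.available = True       # becomes False if the owner returns
        self.expected_agent = None
        self.claimed_by = None

    def notify_match(self, agent_name):
        # Matchmaker says: "Listen for <agent_name>".
        self.expected_agent = agent_name

    def claim(self, agent_name):
        # Agent asks: "Are you still available?"
        ok = (self.available and self.claimed_by is None
              and agent_name == self.expected_agent)
        if ok:
            self.claimed_by = agent_name
        return ok


class Agent:
    """Hypothetical agent; 'need' is the set of features it requires (X)."""
    def __init__(self, name, need):
        self.name = name
        self.need = set(need)
        self.matched = None

    def notify_match(self, resource):
        # Matchmaker says: "Use <resource>".
        self.matched = resource

    def claim_match(self):
        # Claiming is done without the matchmaker's involvement.
        return self.matched is not None and self.matched.claim(self.name)


class Matchmaker:
    def __init__(self):
        self.requests = []
        self.offers = []

    def advertise_request(self, agent):
        self.requests.append(agent)

    def advertise_offer(self, resource):
        self.offers.append(resource)

    def negotiate(self):
        """Pair each request with the first offer that satisfies it."""
        matches = []
        for agent in list(self.requests):
            for resource in list(self.offers):
                if agent.need <= resource.offer:   # "Does Y satisfy X?"
                    agent.notify_match(resource)
                    resource.notify_match(agent.name)
                    self.requests.remove(agent)
                    self.offers.remove(resource)
                    matches.append((agent.name, resource.name))
                    break
        return matches


mm = Matchmaker()
needy = Agent("needy.cse.nd.edu", {"INTEL", "LINUX"})
idle = Resource("idle.cse.nd.edu", {"INTEL", "LINUX"})
mm.advertise_request(needy)
mm.advertise_offer(idle)
matches = mm.negotiate()
claimed = needy.claim_match()
print(matches, claimed)
```

If `idle.available` were set to `False` between `negotiate()` and `claim_match()`, the claim would fail even though the matchmaker had reported a match, which is one reason for keeping the claim step separate.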
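7 Deploying / Executing
- Agent (needy.cse.nd.edu): "Fork!" creates the Shadow
- Lender (idle.cse.nd.edu): "Fork!" creates the Sandbox
- Shadow to Sandbox: "Run job J."
- Sandbox runs J
- J (via the Sandbox) to the Shadow: "I need file /tmp/foo."
- Together, the Shadow and Sandbox provide Split Execution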
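Split execution can be illustrated with a small in-process Python model. Everything here is a hypothetical sketch: the `Shadow` and `Sandbox` classes, their methods, and the file contents are assumptions for the example, and a real system would forward these requests over the network rather than through a direct method call. The sketch shows the job running on the lender's side while its file requests are served from the submitting side.

```python
class Shadow:
    """Runs on the agent's machine and holds the job's home files (assumed)."""
    def __init__(self, home_files):
        self.home_files = dict(home_files)

    def read_file(self, path):
        # Serve a file request that originated on the remote machine.
        return self.home_files[path]


class Sandbox:
    """Runs on the lender's machine and confines the job (assumed)."""
    def __init__(self, shadow):
        self.shadow = shadow
        self.forwarded = []          # paths requested by the job

    def open_file(self, path):
        # The job says "I need file <path>"; forward the request to the shadow.
        self.forwarded.append(path)
        return self.shadow.read_file(path)

    def run(self, job):
        # The job only sees the sandbox's I/O function, not the local disk.
        return job(self.open_file)


def job_j(open_file):
    """Hypothetical job J: count the words in /tmp/foo."""
    data = open_file("/tmp/foo")
    return len(data.split())


shadow = Shadow({"/tmp/foo": "three small words"})
sandbox = Sandbox(shadow)
result = sandbox.run(job_j)
print(result)  # 3
```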
8 Matching
- How are matches determined?
  - Policy
  - ClassAds
- Why independently claim a match?
- What if the Matchmaker dies?
9 ClassAds
- Job ClassAd
  - MyType = "Job"
  - TargetType = "Machine"
  - Requirements = ((other.Arch == "INTEL" && other.OpSys == "LINUX" && KeyboardIdle > 600))
  - Cmd = "/tmp/a.out"
  - Owner = "cmoretti"
- Machine ClassAd
  - MyType = "Machine"
  - TargetType = "Job"
  - Machine = "dustpuppy.cse.nd.edu"
  - Requirements = ((KeyboardIdle > 600))
  - Arch = "INTEL"
  - OpSys = "LINUX"
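Two-way matching of a job ad against a machine ad can be sketched in Python as follows. This is not the ClassAd language or its evaluator: the ads are modeled as plain dicts, each `Requirements` entry is a hypothetical predicate taking `(my_ad, other_ad)`, and the `KeyboardIdle` value of 900 seconds is an assumed sample value. The job's `KeyboardIdle` test is assumed to refer to the machine's attribute. A match requires that each ad's requirements hold against the other.

```python
# Job ad: wants an idle INTEL/LINUX machine (attribute names from the slide).
job_ad = {
    "MyType": "Job",
    "TargetType": "Machine",
    "Cmd": "/tmp/a.out",
    "Owner": "cmoretti",
    "Requirements": lambda my, other: (
        other.get("Arch") == "INTEL"
        and other.get("OpSys") == "LINUX"
        and other.get("KeyboardIdle", 0) > 600
    ),
}

# Machine ad: only accepts jobs once its keyboard has been idle for 10 minutes.
machine_ad = {
    "MyType": "Machine",
    "TargetType": "Job",
    "Machine": "dustpuppy.cse.nd.edu",
    "Arch": "INTEL",
    "OpSys": "LINUX",
    "KeyboardIdle": 900,  # assumed sample value, in seconds
    "Requirements": lambda my, other: my.get("KeyboardIdle", 0) > 600,
}


def satisfies(ad, candidate):
    """True if 'candidate' has the type 'ad' targets and meets its Requirements."""
    return (candidate.get("MyType") == ad.get("TargetType")
            and bool(ad["Requirements"](ad, candidate)))


def symmetric_match(a, b):
    """Both sides must accept each other for a match."""
    return satisfies(a, b) and satisfies(b, a)


print(symmetric_match(job_ad, machine_ad))  # True
```

Lowering `KeyboardIdle` in `machine_ad` to, say, 10 makes both requirement checks fail, so the machine is no longer offered to the job.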
10 Flocking
- Using another pool's resources
  - Utilize more total resources
  - Find resources that match needs
- Two methods
  - Gateway flocking
  - Direct flocking
11 Gateway Flocking
- Each pool has a known gateway
- Gateways negotiate sharing
  - Advertise resources and needs
  - Transmit requests to local matchmaker
- Pool-level granularity
  - Accounting
  - Policy
- Now obsolete
12 Gateway Flocking
[Figure: an agent (A), two matchmakers (MM), two gateways, and resources (R); numbered arrows 1 to 5 trace the flocking steps.]
13 Direct Flocking
- Agents report to other matchmakers
- No gateways
- Equivalent to being in multiple pools?
- Now the preferred (only) method
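Direct flocking can be sketched as an agent that submits its request to a list of matchmakers rather than a single one. The sketch below is a simplified assumption, not Condor's behavior in detail: the `Matchmaker` class, the `flock` function, the feature-set matching, and the policy of trying the home pool first and remote pools only afterward are all hypothetical choices made for illustration.

```python
class Matchmaker:
    """Hypothetical per-pool matchmaker holding resource offers."""
    def __init__(self, pool, offers):
        self.pool = pool
        self.offers = dict(offers)   # resource name -> set of features

    def match(self, need):
        """Return the first resource whose features cover 'need', or None."""
        for name, features in list(self.offers.items()):
            if need <= features:
                del self.offers[name]    # the resource is no longer on offer
                return name
        return None


def flock(need, matchmakers):
    """Try each matchmaker in order; no gateway sits in between."""
    for mm in matchmakers:
        resource = mm.match(need)
        if resource is not None:
            return mm.pool, resource
    return None


home = Matchmaker("home", {"r1": {"LINUX"}})
remote = Matchmaker("remote", {"r2": {"LINUX", "INTEL"}})

first = flock({"LINUX", "INTEL"}, [home, remote])
second = flock({"LINUX", "INTEL"}, [home, remote])
print(first, second)
```

The first request is not satisfiable in the home pool and is matched in the remote pool; the second finds nothing left in either pool.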
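14 Direct Flocking
[Figure: an agent (A), two matchmakers (MM), and resources (R), with no gateways; numbered arrows 1 to 3 trace the flocking steps.]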
15 Flocking Comparison
- Gateway Flocking
  - Transparency
  - Fosters organization-level sharing
  - Poor accounting
  - Complicated
- Direct Flocking
  - No gateways
  - Individual relationships supported
  - Non-transparent
  - Fewer organization-level agreements
16 Things Aren't Perfect
- What happens if (when)
  - Matchmaker goes down
  - Network or Agent fails during deploy
  - Resource or App fails during compute
- Non-dedicated machines
  - How do we keep owners happy?
  - What happens when an owner reclaims a resource?
17 Total Consumption in 2006
Condor at Notre Dame
http://www.cse.nd.edu/ccl/operations/condor/2005/users.html
Harnessing Idle Computers with Condor at Notre Dame: Impact on Research in 2006, Douglas Thain
18 Current Donors, Feb 2007
Owner           Nodes  CPUs  Storage (TB)
CRC/OIT            92    92           3.7
CSE                73   124          11.7
Prof. Thain        59    91           5.5
Prof. Flynn        18    35          0.65
Prof. Striegel     10    20          0.65
Misc                7    17
Total             259   379          20.2
Harnessing Idle Computers with Condor at Notre Dame: Impact on Research in 2006, Douglas Thain
19 CPU History
Harnessing Idle Computers with Condor at Notre Dame: Impact on Research in 2006, Douglas Thain
20 Recap
- Condor facilitates distributed computation on dedicated or scavenged CPUs arranged by a matchmaker using ClassAds.
- Split Execution is necessary to fit the job's needs to the environment.
- An agent can advertise to multiple matchmakers to examine more potential matches.