Title: Exa-Scale Volunteer Computing
1. Exa-Scale Volunteer Computing
- David P. Anderson
- Space Sciences Laboratory
- U.C. Berkeley
2. Outline
- Volunteer computing
- BOINC
- Applications
- Research directions
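3. [Diagram: computing options by number of processors]
- Starting point: program runs too slow on PC
- Columns: multiple jobs, single job
- Categories: High-throughput computing, High-performance computing
- Processor scale: 1, 100, 1000, 10K-1M
- Platforms: cluster (batch), cluster (MPI), Grid, Commercial cloud, supercomputer, Volunteer computing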
4. Volunteer computing
- Early projects
- 1997: GIMPS, distributed.net
- 1999: SETI@home, Folding@home
- Today
- 50 projects
- 500K volunteers
- 900K computers
- 10 PetaFLOPS
5. The PetaFLOPS barrier
- September 2007: Folding@home
- January 2008: BOINC
- June 2008: IBM Roadrunner
6. ExaFLOPS
- Current PetaFLOPS breakdown
- Potential ExaFLOPS by 2010
- 4M GPUs × 1 TFLOPS × 0.25 availability = 1 ExaFLOPS
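The estimate above is a product of three factors: device count, per-device speed, and the fraction of time a device is available. A minimal sketch of that arithmetic follows; the function name `aggregate_flops` is hypothetical and only illustrates the calculation.

```python
TERA = 1e12
EXA = 1e18


def aggregate_flops(num_devices: float, flops_per_device: float,
                    availability: float) -> float:
    """Aggregate throughput: devices x FLOPS per device x fraction of time available."""
    return num_devices * flops_per_device * availability


# Figures from the slide: 4M GPUs, 1 TFLOPS each, available 25% of the time.
total = aggregate_flops(4e6, 1 * TERA, 0.25)
print(f"{total / EXA:.2f} ExaFLOPS")
```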
7. BOINC
- Middleware for volunteer computing
- client, server, web
- Based at UC Berkeley Space Sciences Lab
- Open source (LGPL)
- NSF-funded since 2002
- http://boinc.berkeley.edu
8. BOINC volunteers and projects
[Diagram: volunteers linked to projects (LHC@home, CPDN, WCG) by attachments]
9. The BOINC computing ecosystem
[Diagram: the world's computing power, scientific research, the public]
- Goals
- Better research gets more computing power
- The public decides what's better
10. BOINC software overview
[Diagram]
- Project server: MySQL, daemons, scheduler, data server
- Volunteer host: client, GUI, screensaver, apps
- Project server and volunteer host communicate over HTTP
11. Scheduler RPC
- Request
- hardware, software description
- work requests (CPU, GPU)
- completed jobs
- Reply
- application descriptions
- job descriptions
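The request and reply contents listed above can be pictured as two small records. The sketch below is illustrative only: the class names, field names, job dictionary keys, and the toy `handle_request` function are assumptions made for this example, not BOINC's actual message format or scheduler logic.

```python
from dataclasses import dataclass, field


@dataclass
class WorkRequest:
    cpu_seconds: float = 0.0   # amount of CPU work requested
    gpu_seconds: float = 0.0   # amount of GPU work requested


@dataclass
class SchedulerRequest:
    host_description: dict                 # hardware, software description
    work_request: WorkRequest
    completed_jobs: list = field(default_factory=list)


@dataclass
class SchedulerReply:
    app_descriptions: list = field(default_factory=list)
    job_descriptions: list = field(default_factory=list)


def handle_request(req: SchedulerRequest, job_pool: list) -> SchedulerReply:
    """Toy handler: hand out jobs until each requested amount of work is covered."""
    remaining = {"cpu": req.work_request.cpu_seconds,
                 "gpu": req.work_request.gpu_seconds}
    reply = SchedulerReply()
    for job in job_pool:
        if remaining[job["resource"]] > 0:
            reply.job_descriptions.append(job)
            remaining[job["resource"]] -= job["est_seconds"]
            if job["app"] not in reply.app_descriptions:
                reply.app_descriptions.append(job["app"])
    return reply
```

A single round trip carries both the report of completed jobs and the request for new work.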
12. Client job scheduling
- Queue lots of jobs
- to avoid starvation
- for variety
- Job scheduling
- Round-robin time-slicing
- Earliest deadline first
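One way to combine the two policies named above is sketched here. The rule for switching between them (use earliest-deadline-first only for jobs that would otherwise miss their deadline, round-robin for the rest) is an assumption for illustration; the `Job` fields and the `next_job` function are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    project: str
    remaining: float   # estimated seconds of computation left
    deadline: float    # absolute time by which the result is due


def next_job(queue: deque, now: float) -> Job:
    """Pick the job for the next time slice.

    Jobs that cannot finish by their deadline if delayed are served
    earliest-deadline-first; otherwise the queue is served round-robin.
    """
    urgent = [j for j in queue if now + j.remaining > j.deadline]
    if urgent:
        return min(urgent, key=lambda j: j.deadline)
    job = queue[0]
    queue.rotate(-1)   # move the chosen job to the back of the queue
    return job
```

Round-robin gives variety across projects, while the deadline check keeps queued jobs from expiring.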
13. Client work fetch policy
- When? From which project? How much?
- Goals
- maintain enough work
- minimize scheduler requests
- honor resource shares
- per-project debt
[Diagram: queued work on CPU 0 through CPU 3, with min and max levels marked]
14. Work fetch for GPUs: goals
- Queue work separately for different resource types
- Resource shares apply to aggregate
- Example: projects A, B have same resource share
- A has CPU and GPU jobs, B has only GPU jobs
[Diagram: GPU labeled B and A; CPU labeled A]
15. Work fetch for GPUs
- For each resource type
- per-project backoff
- per-project debt
- accumulate only while not backed off
- A project's overall debt is weighted average of resource debts
- Get work from project with highest overall debt
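The debt mechanism above can be sketched as follows. The slide does not give the update formula, so the rule used here (debt grows by the project's share of elapsed time minus the time it actually used) and all names (`ProjectState`, `accumulate_debt`, `overall_debt`, `pick_project`, the `weights` mapping) are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ProjectState:
    name: str
    resource_share: float
    debt: dict = field(default_factory=lambda: {"cpu": 0.0, "gpu": 0.0})
    backed_off: dict = field(default_factory=lambda: {"cpu": False, "gpu": False})


def accumulate_debt(projects, resource, used_seconds, dt):
    """Update per-resource debt over an interval of dt seconds.

    Only projects that are not backed off for this resource accumulate debt.
    Assumed rule: debt += (share of dt owed) - (seconds actually used).
    """
    eligible = [p for p in projects if not p.backed_off[resource]]
    total_share = sum(p.resource_share for p in eligible)
    if total_share == 0:
        return
    for p in eligible:
        owed = dt * p.resource_share / total_share
        p.debt[resource] += owed - used_seconds.get(p.name, 0.0)


def overall_debt(project, weights):
    """Weighted average of the project's per-resource debts."""
    total_weight = sum(weights.values())
    return sum(w * project.debt[r] for r, w in weights.items()) / total_weight


def pick_project(projects, weights):
    """Fetch work from the project with the highest overall debt."""
    return max(projects, key=lambda p: overall_debt(p, weights))
```

A project that has received less than its share ends up with positive debt and is asked for work first.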
16. Scheduling server
- Possible outcomes of a job
- success
- runs but returns wrong answer
- doesn't run, returns wrong answer (hacker)
- crashes, client reports it
- never hear from client again
- Job delay bounds
- Replicated computing
- homogeneous replication
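Replication guards against the wrong-answer outcomes listed above by running several instances of a job and accepting a result only when enough of them agree. The sketch below assumes results can be compared for exact equality, which is only reasonable when the replicas ran on hosts that produce identical output; the function name and the default quorum of 2 are hypothetical.

```python
from collections import Counter


def canonical_result(results, quorum=2):
    """Return the result reported by at least `quorum` instances, else None.

    `results` holds one entry per job instance; None stands for an instance
    that crashed or was never heard from again.
    """
    counts = Counter(r for r in results if r is not None)
    if not counts:
        return None
    value, n = counts.most_common(1)[0]
    return value if n >= quorum else None
```

When no result reaches the quorum, a server would typically issue more instances of the job rather than accept any single answer.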
17. Server abstractions
- applications
- app versions: Win32, Win32 NVIDIA, Win64, Win32 N-core, Mac OS X
- jobs
- instances
18. Scheduler overview
[Diagram components: MySQL, feeder, shared-memory job cache, schedulers, client]
19. How scheduler chooses app versions
- App versions have project-supplied planning function
- Inputs
- host description
- Outputs
- Whether host can run app version
- Resource usage (CPUs, GPUs)
- expected FLOPS
20. App version selection
- Call planning function for platform's app versions
- Skip versions that use resources for which no work is being requested
- Use the version with highest expected FLOPS
- Repeat this when a resource request is satisfied
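The selection steps above can be sketched as a single loop over the platform's app versions. The types and names here (`Plan`, `AppVersion`, `select_version`) are hypothetical, and treating any version with `ngpus > 0` as a GPU version is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Plan:
    ncpus: float            # CPUs the version would use
    ngpus: float            # GPUs the version would use
    expected_flops: float


@dataclass
class AppVersion:
    name: str
    # Planning function: host description -> Plan, or None if the host can't run it.
    plan: Callable[[dict], Optional[Plan]]


def select_version(versions, host, want_cpu, want_gpu):
    """Return (version, plan) with the highest expected FLOPS, or None."""
    best = None
    for version in versions:
        plan = version.plan(host)
        if plan is None:
            continue                      # host cannot run this version
        if plan.ngpus > 0 and not want_gpu:
            continue                      # no GPU work is being requested
        if plan.ngpus == 0 and not want_cpu:
            continue                      # no CPU work is being requested
        if best is None or plan.expected_flops > best[1].expected_flops:
            best = (version, plan)
    return best
```

Calling `select_version` again with updated `want_cpu` and `want_gpu` flags corresponds to repeating the choice once a resource request has been satisfied.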
21. Anonymous platform mechanism
- The idea: volunteer supplies app versions. Why?
- security
- optimization
- unsupported platforms
22. Science areas using BOINC
- Biology
- protein study, genetic analysis
- Medicine
- drug discovery, epidemiology
- Physics
- LHC, nanotechnology, quantum computing
- Astronomy
- data analysis, cosmology, galactic modeling
- Environment
- climate modeling, ecosystem simulation
- Math
- Graphics rendering
23. Application types
- Computing-intensive analysis of large data
- Physical simulations
- Genetic algorithms
- GA-inspired optimization
- Non-CPU-intensive
- Internet study
- distributed sensor network
24. Malariacontrol.net
- Simulation models of the transmission dynamics
and health effects of malaria are an important
tool for malaria control. They can be used to
determine optimal strategies for delivering
mosquito nets, chemotherapy, or new vaccines
which are currently under development and testing.
25. Climateprediction.net
26. Einstein@home
- Gravitational waves; gravitational pulsars
27. SETI@home
28. Milkyway@home
29. GPUGRID.net
30. AQUA@home
- D-Wave Systems
- Simulation of adiabatic quantum algorithms for
binary quadratic optimization
31. Collatz Conjecture
- even N → N/2
- odd N → 3N + 1
- always goes to 1?
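The iteration above is short enough to state directly in code. This is a minimal sketch of the rule itself, not of the project's application; `collatz_steps` is a hypothetical name.

```python
def collatz_steps(n: int) -> int:
    """Count how many applications of the rule it takes for n to reach 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


print(collatz_steps(6))   # 6, 3, 10, 5, 16, 8, 4, 2, 1 -> 8 steps
```

The loop terminates for every starting value tried so far; whether it terminates for all N is the open question.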
32. Quake Catcher Network
33. Organizational models
- Umbrella projects
- Institutional
- Lattice, VTU_at_home
- Corporate
- IBM World Community Grid
- Community
- AlmereGrid
- Research community
- MindModeling.org
[Diagram labels: Project; publicity, web development, sysadmin]
34. Volunteer computing research
- Goals (mutually incompatible)
- maximize throughput
- minimize makespan of job batches
- minimize average time until credit
- minimize network traffic
- minimize server disk usage
35. Characterizing hosts
[Diagram: host states: powered on, available, connected]
- What are good models? What are correlations with
other characteristics? How to model churn?
- BOINC client is instrumented to log all this;
have data from 200K hosts over 1 year
- Mining for Statistical Models of Availability in
Large-Scale Distributed Systems: An Empirical
Study of SETI@home. Bahman Javadi, Derrick Kondo,
Jean-Marc Vincent, David P. Anderson. 17th Annual
Meeting of the IEEE/ACM International Symposium
on Modelling, Analysis and Simulation of Computer
and Telecommunication Systems, Sept 21-23 2009,
London.
- On Correlated Availability in Internet-Distributed
Systems. Derrick Kondo, Artur Andrzejak, and
David P. Anderson. 9th IEEE/ACM International
Conference on Grid Computing (Grid 2008),
Tsukuba, Japan, Sept 29 - Oct 1 2008.
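A first step in analyzing logged host data of the kind described above is computing what fraction of an observation window a host spent in a given state. The sketch below assumes the log has already been reduced to (start, end) intervals for one state, such as "available"; that input format and the function name are assumptions for illustration.

```python
def availability_fraction(intervals, start, end):
    """Fraction of the window [start, end) covered by the given intervals.

    Intervals may overlap or extend beyond the window; they are clipped
    to the window and merged before measuring.
    """
    clipped = sorted((max(a, start), min(b, end))
                     for a, b in intervals if b > start and a < end)
    covered = 0.0
    cur_start = cur_end = None
    for a, b in clipped:
        if cur_end is None or a > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start   # close the previous merged run
            cur_start, cur_end = a, b
        else:
            cur_end = max(cur_end, b)            # extend the current merged run
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / (end - start)
```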
36. Studying server scheduling policies
[Diagram: MySQL, feeder, shared-memory job cache, and scheduler, connected to a simulator of a large, dynamic set of volunteer hosts]
- EmBOINC: BOINC project emulator
- Performance Prediction and Analysis of BOINC
Projects: An Empirical Study with EmBOINC. Trilce
Estrada, Michela Taufer, David Anderson. To
appear, Journal of Grid Computing.
- EmBOINC: An Emulator for Performance Analysis of
BOINC Projects. Trilce Estrada, Michela Taufer,
Kevin Reed, David Anderson. 3rd Workshop on
Desktop Grids and Volunteer Computing Systems
(PCGrid 2009), May 29, 2009, Rome.
37. Studying client scheduling policies
- BOINC client simulator
- simulates a client connected to several projects
- based on actual client code
- Performance Evaluation of Scheduling Policies for
Volunteer Computing. Derrick Kondo, David P.
Anderson and John McLeod VII. 3rd IEEE
International Conference on e-Science and Grid
Computing. Bangalore, India, December 10-13
2007.
- Local Scheduling for Volunteer Computing. David
P. Anderson and John McLeod VII. Workshop on
Large-Scale, Volatile Desktop Grids (PCGrid 2007)
held in conjunction with the IEEE International
Parallel & Distributed Processing Symposium
(IPDPS), March 30, 2007, Long Beach.
38. Supporting distributed applications
- Volpex: Linda-like dataspace system
- MPI layer
- centralized implementation
- fault tolerance, performance issues
- A Communication Framework for Fault-tolerant
Parallel Execution. Nagarajan Kanna, Jaspal
Subhlok, Edgar Gabriel, Eshwar Rohit and David
Anderson. The 22nd International Workshop on
Languages and Compilers for Parallel Computing,
Newark, Delaware, Oct 8-10 2009.
39. Using virtual machines
[Diagram: BOINC client, VM wrapper, VM, hypervisor (VirtualBox, kQEMU, etc.)]
- App version is VM wrapper + virtual machine image
- VM image may contain the client of a non-BOINC
distributed batch system
40. Data-intensive computing
- Maintain large data set on clients
- 10 years of radio telescope data
- gene/protein data
- Compute against data set
- MapReduce, other models
41. Volunteer motivation study
- Online survey correlated with participation data
- Survey is currently being designed
- Preliminary findings
- Talk is cheap: claimed motivations not supported by data
- Team members contribute more
- Contribution decreases over time (especially for
non-team members)
42. Conclusion
- Volunteer computing: Exa-scale potential
- GPUs are crucial
- BOINC: enabling technology
- Bottlenecks
- organizational models
- public awareness
- Lots of research opportunities