Cooperative Computing for Data Intensive Science

University of Notre Dame. NSF Bridges to Engineering 2020 Conference. 12 March 2008 ... We design and build distributed systems that help people to attack BIG ...

Slides: 16
Provided by: dougla9
Learn more at: https://www3.nd.edu

1
Cooperative Computing for Data Intensive Science
  • Douglas Thain
  • University of Notre Dame
  • NSF Bridges to Engineering 2020 Conference
  • 12 March 2008

2
What is Cooperative Computing?
  • By combining our computing and storage resources
    together, we can attack problems larger than we
    could alone.
  • I can use your computer when it is idle, and vice
    versa. (Most computers are idle about 90 percent
    of the day.)
  • Also known as: grid computing, distributed computing,
    metacomputing, volunteer computing, etc.

3
Who Needs Coop Computing?
  • Many fields of study rely on simulation and data
    processing to conduct science.
  • Physics, chemistry, biology, engineering,
    finance, sociology, computer science.
  • More computing, better results.
  • NOT High Performance: speed up one program.
  • High Throughput: produce as many results as
    possible over the next day / week / year.

4
Cooperative Computing Lab
  • We design and build distributed systems that
    help people to attack BIG problems.
  • Work directly with end users to make sure that
    our solutions affect the real world.
  • Operate a modest computing system as both a
    production service and a research testbed.
  • Currently about 500 CPUs and 300 disks.
  • CS research challenges: scalability, robustness,
    usability, debugging, and performance.

http://www.nd.edu/ccl
7
What Makes this Challenging?
  • The Programming Model: I want to process 10 TB of data on 100
    machines, then distribute it across 20 disks, then view the
    best results on my workstation.
  • Fault Tolerance: Something is always broken!
  • Performance Robustness: There is always one slowpoke.
  • Debugging: My job runs correctly here but not there...!?

8
An Example Collaboration: Biometrics Research and Distributed Systems
9
A Common Pattern in Biometrics

Pairwise comparison matrix (upper triangle):

    1  .8  .1   0   0  .1
        1   0  .1  .1   0
            1   0  .1  .3
                1   0   0
                    1  .1
                        1

  • Sample workload: 4000 images, 256 KB each, 1 s per F:
    185 CPU-days
  • Future workload: 60000 images, 1 MB each, 0.1 s per F:
    4166 CPU-days
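
The CPU-day figures quoted for the sample and future workloads can be checked with a few lines of arithmetic. The sketch below assumes that F is run once on every ordered pair of images (n × n comparisons), which is the reading that reproduces the slide's numbers. The function names are illustrative, and the wall-clock estimate assumes perfect speedup on the lab's roughly 500 CPUs, which a real system will not achieve.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def all_pairs_cpu_days(num_images: int, seconds_per_comparison: float) -> float:
    """CPU-days needed to run F once on every ordered pair of images.

    Assumption: all n * n pairs are compared, with no symmetry shortcut.
    """
    comparisons = num_images * num_images
    return comparisons * seconds_per_comparison / SECONDS_PER_DAY


def ideal_wall_clock_days(cpu_days: float, num_cpus: int) -> float:
    """Lower bound on elapsed time, assuming perfect speedup (an idealization)."""
    return cpu_days / num_cpus


if __name__ == "__main__":
    sample = all_pairs_cpu_days(4000, 1.0)    # about 185 CPU-days
    future = all_pairs_cpu_days(60000, 0.1)   # about 4166 CPU-days
    print(f"sample workload: {sample:.1f} CPU-days")
    print(f"future workload: {future:.1f} CPU-days")
    print(f"sample on 500 CPUs, ideal: {ideal_wall_clock_days(sample, 500):.2f} days")
```

The quadratic growth is the point: 15 times more images means 225 times more comparisons, so even a tenfold faster F leaves the future workload more than 20 times larger.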
10
Non-Expert User Using 500 CPUs
11
All Pairs Production System
300 active storage units, 500 CPUs, 40 TB disk
Components: Web Portal, All-Pairs Engine
  1 - Upload F and S into web portal.
  2 - AllPairs(F,S)
  3 - O(log n) distribution by spanning tree.
  4 - Choose optimal partitioning and submit batch jobs.
  5 - Collect and assemble results.
  6 - Return result matrix to user.
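
The numbered steps on this slide can be illustrated with a small single-machine sketch. It is not the lab's engine: `partition`, `run_block`, `all_pairs`, and `broadcast_rounds` are hypothetical names, the fixed `block_size` stands in for the "optimal partitioning" the slide mentions, and each loop iteration stands in for one batch job. The round count assumes a doubling broadcast in which every node that already holds the data forwards it to one new node per round, which is one simple way to get O(log n) distribution.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Block = Tuple[range, range]


def partition(n: int, block_size: int) -> List[Block]:
    """Split the n x n index space into blocks, one per (hypothetical) batch job."""
    ranges = [range(s, min(s + block_size, n)) for s in range(0, n, block_size)]
    return list(product(ranges, ranges))


def run_block(F: Callable[[T, T], float], S: Sequence[T],
              block: Block) -> List[Tuple[int, int, float]]:
    """Work done by one job: apply F to every pair of items in its block."""
    rows, cols = block
    return [(i, j, F(S[i], S[j])) for i in rows for j in cols]


def all_pairs(F: Callable[[T, T], float], S: Sequence[T],
              block_size: int = 2) -> List[List[float]]:
    """Compute M[i][j] = F(S[i], S[j]) by collecting the results of each block."""
    n = len(S)
    matrix: List[List[float]] = [[0.0] * n for _ in range(n)]
    for block in partition(n, block_size):       # a real system runs these in parallel
        for i, j, value in run_block(F, S, block):
            matrix[i][j] = value                  # step 5: collect and assemble
    return matrix


def broadcast_rounds(num_nodes: int) -> int:
    """Rounds for a doubling broadcast to reach num_nodes holders (source included)."""
    return (num_nodes - 1).bit_length()


if __name__ == "__main__":
    result = all_pairs(lambda a, b: abs(a - b), [1, 4, 6])
    for row in result:
        print(row)
    print("rounds to reach 300 nodes:", broadcast_rounds(300))
```

Splitting the index space into blocks keeps each job's input small (only the rows and columns it touches), which is why distributing S to the storage units before submitting jobs pays off.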
12
Some Results on Real Workload
13
Collaboration is Where the Interesting Problems Are!
(Cooperative Computing Provides the Resources)
14
What Makes a Collaboration Work?
  • Like a marriage? (old joke.)
  • First, a show of commitment: go after some
    low-hanging fruit, and publish it.
  • A proposal for funding only succeeds if you have
    already started working together.
  • Need very concrete goals: your partner may not
    share your idea of an interesting tangent.
  • Students sometimes need a big push to leave their
    comfort zone and work together.

15
For more information
  • Douglas Thain
  • dthain_at_nd.edu
  • Cooperative Computing Lab
  • http://www.nd.edu/ccl
  • Apply for Summer 2008 REU
  • http://www.nd.edu/ccl/reu

Supported by NSF Grants CCF-0621434 and
CNS-0643229.