Highly Distributed Parallel Computing

1
Highly Distributed Parallel Computing
  • Neil Skrypuch
  • COSC 3P93
  • 3/21/2007

2
Overview
  • a network of computers all working towards a
    similar goal
  • network consists of many nodes, few servers
  • nodes perform computing and send results to a
    server
  • servers distribute jobs
  • node machines do not communicate with each other
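
The roles above can be sketched in a few lines. This is a minimal in-process illustration, not the presenter's implementation: the `Server` class, `node_step`, and `square` are hypothetical names, and a real deployment would put each node on its own machine talking to the server over the network.

```python
from collections import deque


class Server:
    """Hypothetical central server: hands out jobs and collects results."""

    def __init__(self, jobs):
        self.pending = deque(jobs)  # jobs not yet handed to any node
        self.results = {}           # job -> result reported by a node

    def get_job(self):
        # Return the next job, or None when nothing is left.
        return self.pending.popleft() if self.pending else None

    def submit(self, job, result):
        self.results[job] = result


def node_step(server, compute):
    """One unit of node work. A node talks only to the server."""
    job = server.get_job()
    if job is None:
        return False
    server.submit(job, compute(job))
    return True


def square(n):
    return n * n


server = Server(range(10))
node_functions = [square, square, square]  # three simulated nodes

# Each round, every node asks for one job; stop once no node got any work.
while any([node_step(server, f) for f in node_functions]):
    pass
```

Note that no node ever references another node; all coordination goes through `server`.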

3
Pros

4
Relatively Simple
  • don't need to worry about special
    interconnections
  • don't need to worry about cluster booting

5
Non-Homogeneous Network
  • can work across different computer architectures,
    OSes, etc
  • computers can be of varying speeds
  • doesn't require the fastest or most expensive
    computers
  • computers can be distributed anywhere in the world

6
Infrastructure
  • infrastructure for HDPC already exists almost
    everywhere
  • anyone with a network of computers is already
    ready for HDPC
  • lots of programs already exist that take
    advantage of HDPC

7
Expansion
  • expansion is painless
  • there are no special constraints on the shape
    of the network
  • not fast enough yet? keep adding more computers
    until it is

8
Resilience to Failure
  • it doesn't matter if one or more nodes die
  • only the reliability of the central server(s)
    matters

9
Cons

10
Suitability
  • not all problems are suited to HDPC
  • highly communication-bound problems are a poor
    fit for HDPC

11
Server Dependence
  • central server dependence is a double-edged sword
  • if the central server becomes unavailable,
    everything grinds to a halt

12
Network (In)security
  • how to verify if a client should be allowed to
    join the network?
  • protecting data sent over the network
  • verifying integrity and authenticity of data sent
    over the network

13
Network (Un)reliability
  • nodes that temporarily lose connectivity may be
    temporarily useless

14
Dealing With the Issues

15
Server Dependence
  • the central server need not be a single server
  • server itself may be clustered
  • countless ways to cluster servers

16
Clustering With a Database
  • allow nodes to talk directly to the database
  • cluster the database over multiple servers
  • multi-master replication
  • single master replication
  • lots more...

17
Server Hierarchy
  • multiple tiers of servers may also be used
  • could be considered recursive HDPC
  • very similar to the tree architecture of
    supercomputers

18
Lost Nodes
  • define a maximum amount of time to wait for a
    node's response
  • use redundancy
  • assume some nodes will always be lost
  • send duplicate jobs to multiple nodes
    simultaneously
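
The timeout and redundancy ideas above can be combined in the server's bookkeeping. The following is a hedged sketch, not a method taken from the presentation: `JobTracker`, `copies`, and `timeout` are hypothetical names, and the current time is passed in explicitly so the example is deterministic (a real server might use `time.monotonic()`).

```python
class JobTracker:
    """Hypothetical server-side bookkeeping for lost nodes."""

    def __init__(self, jobs, timeout, copies=2):
        self.timeout = timeout  # how long to wait for a node's response
        self.copies = copies    # how many nodes get each job at once
        self.done = {}          # job -> first result received
        # job -> deadlines of the copies currently handed out
        self.outstanding = {job: [] for job in jobs}

    def next_job(self, now):
        """Return a job that needs another copy handed out, or None."""
        for job, deadlines in self.outstanding.items():
            # Copies past their deadline are presumed lost and forgotten.
            deadlines[:] = [d for d in deadlines if d > now]
            if len(deadlines) < self.copies:
                deadlines.append(now + self.timeout)
                return job
        return None

    def submit(self, job, result):
        """Accept the first result for a job; later duplicates are ignored."""
        if job in self.outstanding:
            del self.outstanding[job]
            self.done[job] = result


tracker = JobTracker(["a", "b"], timeout=10, copies=2)
handed_out = [tracker.next_job(now) for now in (0, 1, 2, 3)]
nothing_left = tracker.next_job(4)   # every job already has two copies out
tracker.submit("b", "result-b")
reissued = tracker.next_job(20)      # both copies of "a" timed out
tracker.submit("a", "result-a")
```

Here both nodes holding job "a" never answer, so once their deadlines pass the job is handed out again without any special failure handling.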

19
Network (In)security
  • not as big of an issue as one might think
  • encryption and public key infrastructures
    mitigate most confidentiality and authenticity
    concerns
  • redundancy is useful for both reliability and
    security
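
The point about redundancy helping security can be illustrated with a simple agreement check: if the same job is sent to several nodes, a result is trusted only when enough of them agree. This is a sketch under that assumption; `accept_result` and `quorum` are hypothetical names, not something the presentation specifies.

```python
from collections import Counter


def accept_result(reports, quorum):
    """Return the value reported by at least `quorum` nodes, else None.

    `reports` holds the results different nodes returned for the same job.
    """
    if not reports:
        return None
    value, count = Counter(reports).most_common(1)[0]
    return value if count >= quorum else None


agreed = accept_result([42, 42, 41], quorum=2)    # one faulty or dishonest node
disputed = accept_result([1, 2, 3], quorum=2)     # no two nodes agree
```

A single node returning a wrong answer, whether through a fault or deliberately, is outvoted as long as the quorum is met by honest nodes.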

20
Work Buffering
  • taking larger portions of work at a time
  • temporary connectivity issues pose less of a
    problem this way
  • a node can continue working without talking to a
    central server for longer
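
A rough way to see the effect of buffering is to count how often a node must contact the server for different batch sizes. The sketch below assumes a hypothetical `CountingServer` that counts only job requests (result uploads are left out for brevity).

```python
class CountingServer:
    """Hypothetical server that counts how often a node asks it for work."""

    def __init__(self, jobs):
        self.jobs = list(jobs)
        self.round_trips = 0

    def take(self, n):
        # Hand out up to n jobs in a single request.
        self.round_trips += 1
        batch, self.jobs = self.jobs[:n], self.jobs[n:]
        return batch


def run_node(server, batch_size, compute):
    """Process all work, fetching batch_size jobs per server contact."""
    results = []
    while batch := server.take(batch_size):
        # The node needs no connectivity while it works through the batch.
        results.extend(compute(job) for job in batch)
    return results, server.round_trips


unbuffered, trips_unbuffered = run_node(CountingServer(range(100)), 1, str)
buffered, trips_buffered = run_node(CountingServer(range(100)), 25, str)
```

Both runs produce the same results, but the buffered node contacts the server far less often, so a short network outage is less likely to leave it idle.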

21
Where is HDPC Useful?

22
Combinatorics
  • search
  • enumeration
  • generation

23
Cryptography
  • brute force cipher cracking
  • gives a glimpse of the future, in terms of what
    the average person will be able to crack

24
Artificial Intelligence
  • genetic algorithms
  • genetic programming
  • alpha-beta search

25
Graphics
  • ray tracing
  • animation
  • fractal generation and calculation

26
Simulation
  • weather and climate modeling
  • particle physics

27
Guidelines for Suitability
  • most problems involving a large search tree are
    well suited to HDPC
  • anything that can be broken down into smaller,
    self-contained, chunks is a good candidate for
    HDPC
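
As a concrete illustration of self-contained chunks, a brute-force search over a numeric range can be split into sub-ranges that each need nothing but their own bounds. The function names and the toy predicate below are assumptions chosen for the example.

```python
def split_range(start, stop, chunk_size):
    """Yield (lo, hi) half-open sub-ranges that together cover [start, stop)."""
    for lo in range(start, stop, chunk_size):
        yield lo, min(lo + chunk_size, stop)


def search_chunk(lo, hi, is_solution):
    """A self-contained job: it depends only on its own bounds."""
    return [x for x in range(lo, hi) if is_solution(x)]


def is_solution(x):
    # Toy predicate standing in for an expensive test, e.g. trying a key.
    return x * x % 1000 == 1


chunks = list(split_range(0, 1000, 128))

# Each chunk could be sent to a different node; here they run in sequence.
hits = [x for lo, hi in chunks for x in search_chunk(lo, hi, is_solution)]
```

Because no chunk needs information from any other chunk, the jobs can be handed out in any order and to any number of nodes.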

28
How Well Does HDPC Work?

29
Folding@Home
  • 200,000 non-dedicated nodes
  • 240 TFLOPS
  • approximately 40 central servers, unknown speeds

30
SETI@Home
  • 200,000 non-dedicated nodes
  • 288 TFLOPS
  • 10 central servers, all relatively modest

31
Blue Gene/L
  • the fastest supercomputer as of the November 2006
    TOP500 list
  • not HDPC
  • 65,536 dedicated nodes
  • 280 TFLOPS
  • cost about US$100,000,000
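
Dividing the totals quoted on these three slides by their node counts gives a rough per-node comparison. The calculation below uses only the figures above; the variable names are arbitrary.

```python
# (total TFLOPS, number of nodes), as quoted on the slides above
systems = {
    "Folding@Home": (240, 200_000),
    "SETI@Home": (288, 200_000),
    "Blue Gene/L": (280, 65_536),
}

# Average throughput per node in GFLOPS (1 TFLOPS = 1000 GFLOPS)
per_node_gflops = {
    name: tflops * 1000 / nodes for name, (tflops, nodes) in systems.items()
}

for name, gflops in per_node_gflops.items():
    print(f"{name}: {gflops:.2f} GFLOPS per node")
```

On these figures a dedicated Blue Gene/L node averages roughly three times the throughput of a volunteer node, yet the volunteer networks reach similar totals simply by having more nodes.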

32
HDPC Works Well
  • typical speedup is close to linear
  • cost is substantially less than a comparable
    supercomputer
  • nodes can also be general purpose computers

33
Why Does HDPC Work Well?

34
Infrastructure Reuse
  • in general, new hardware investments are not
    necessary
  • creating new infrastructure is expensive and
    time-consuming
  • it's easy to justify using things you already
    have for additional purposes
  • there are tons of idle CPUs at any given time,
    why not use them?

35
Low Barrier to Entry
  • anyone with a couple of networked computers can
    start experimenting

36
Painlessly Scalable
  • smooth curve upwards for both cost and performance

37
Simpler to Program
  • doesn't require as much thinking in parallel in
    comparison to other approaches
  • thinking in parallel is hard and fundamentally
    different from thinking serially
  • pushes the heavy lifting onto the database
    instead of the application programmer

38
Commodity Hardware is Fast
  • a typical desktop machine today is more powerful
    than a supercomputer from 15 years ago
  • and costs orders of magnitude less
  • and outputs much less heat
  • and takes up much less space
  • and consumes much less power

39
The Future
  • supercomputers will become faster
  • HDPC will become even faster than supercomputers,
    as both the number of computers and their speed
    increase
  • supercomputers and HDPC will each fill their own
    niche

40
Questions and Discussion

41
References
  • http://fah-web.stanford.edu/cgi-bin/main.py?qtype=osstats
  • http://www.boincstats.com/stats/project_graph.php?pr=sah
  • http://www.boincstats.com/stats/project_graph.php?pr=bo
  • http://www.itjungle.com/tlb/tlb033004-story04.html
  • http://setiathome.berkeley.edu/sah_status.html
  • http://fah-web.stanford.edu/serverstat.html
  • http://top500.org/list/2006/11/100