Example: Sorting on Distributed Computing Environment

Slides: 17
Provided by: okawebEc

Transcript and Presenter's Notes


1
Example: Sorting on Distributed Computing
Environment
  • Apr 20, 2009

2
About this presentation
  • An example to help you start preparing for the
    "Enshu" (exercise) part of this class
  • Give a talk about the problem assigned to
    you, according to the theme of the day.
  • This is not a complete presentation.
  • It only shows the points to be studied, explained,
    and solved.

3
Sorting a Large Amount of Data
  • Data size > memory size of a single computer
  • Ex.: 1 to 100 trillion integers
  • Distributed parallel sort: distribute the data
    across multiple computers on a network and sort.
    -> Uses the computational power of multiple computers
    -> Requires communication among computers

4
Computational Infrastructures
  • Case 1: PCs in a computer room
  • Use all of the PCs on holidays or late at night
  • 100 PCs (200 to 400 GB of memory in total)
  • Case 2: Supercomputers in Japan
  • Enable "Ultra Large Scale Computation" by using
    supercomputers all over Japan
  • 10 to 20 supercomputers
  • Speed: 10 to 100 TFLOPS each
  • Memory: 10 to 100 TB each

5
Network Infrastructures
  • Case 1: Ethernet switch
  • Bandwidth: 100 Mbps to 1 Gbps
  • Latency: 0.05 to 0.1 msec
  • Case 2: SINET3 (academic network in Japan)
  • Backbone bandwidth: 10 to 40 Gbps
  • Bandwidth per computer: 10 Gbps
  • Latency: 10 to 100 msec
  • Depends on the length of the physical network

6
Bandwidth? Latency?
  • Bandwidth: available speed of data transfer
    (bit/sec) on the network
  • Latency: minimum time required for each data
    transfer
  • Estimated cost of a data transfer:
    T = L + S / B
    (L: latency, B: bandwidth, S: data size (bit))
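
The estimate T = L + S / B can be tried out numerically. The sketch below is only an illustration: the message size (one million 64-bit integers) is a made-up example, and the latency and bandwidth values are the upper-end figures quoted for the Ethernet switch (0.1 msec, 1 Gbps) and for SINET3 (100 msec, 10 Gbps).

```python
def transfer_time(size_bits: float, latency_s: float, bandwidth_bps: float) -> float:
    """Estimated time T = L + S / B for one data transfer, in seconds."""
    return latency_s + size_bits / bandwidth_bps


# Hypothetical message: one million 64-bit integers.
size_bits = 1_000_000 * 64

# Case 1 (Ethernet switch): L = 0.1 msec, B = 1 Gbps.
t_lan = transfer_time(size_bits, 0.1e-3, 1e9)

# Case 2 (SINET3): L = 100 msec, B = 10 Gbps.
t_wan = transfer_time(size_bits, 100e-3, 10e9)

print(f"Case 1: {t_lan:.4f} sec, Case 2: {t_wan:.4f} sec")
```

For a message of this size the S / B term dominates in Case 1 and the latency term dominates in Case 2, so Case 2 favors fewer, larger transfers.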

7
System Infrastructure
  • Case 1
  • The network environment can be considered "Reliable", since no
    other user is using the system.
  • Implementing the program will be easier by
    installing MPI (Message Passing Interface).
  • Case 2
  • The network environment may be "Unreliable", since
    many users share the network routes.
  • Using MPI is difficult, since the environment
    is "Heterogeneous"
  • Different architectures and OSs

8
Implementation on the Internet
  • Everything can be built on the "Application Layer"
  • Choose an Internet protocol: TCP or UDP
  • Case 1: UDP (or MPI over UDP)
  • Case 2: TCP
  • Choose a parallel sorting algorithm
  • Parallel algorithm: an algorithm that solves a
    problem by dividing it into multiple tasks and
    running them concurrently
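
As a small illustration of "dividing a problem into multiple tasks and running them concurrently", the sketch below splits a list into chunks, sorts the chunks as separate tasks, and merges the sorted chunks. It is not the method of this presentation: the function name `parallel_sort` and the `num_tasks` parameter are hypothetical, and the thread pool only stands in for separate computers (threads in CPython do not give real CPU parallelism for this kind of work).

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def parallel_sort(data, num_tasks=4):
    """Split data into about num_tasks chunks, sort them as tasks, then merge."""
    chunk = max(1, -(-len(data) // num_tasks))  # ceiling division
    chunks = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    # Each chunk is an independent task; on a real system each task
    # would run on its own computer.
    with ThreadPoolExecutor(max_workers=num_tasks) as pool:
        sorted_chunks = list(pool.map(sorted, chunks))
    # Combine the sorted chunks into one sorted list.
    return list(heapq.merge(*sorted_chunks))


print(parallel_sort([5, 3, 1, 4, 2], num_tasks=2))
```

The final merge runs on a single machine here, which is exactly the step that stops working once the data no longer fits in one computer's memory.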

9
"Layers" of networks
  • OSI model: divides the facilities of network devices
    into 7 layers
  • Application Layer
  • Presentation Layer
  • Session Layer
  • Transport Layer
  • Network Layer
  • Data Link Layer
  • Physical Layer

10
TCP or UDP
  • TCP (Transmission Control Protocol)
  • Guarantees the completion of data transfer.
  • Slower than UDP but reliable.
  • UDP (User Datagram Protocol)
  • No guarantee about data transfer.
  • Faster than TCP but unreliable.
  • Sorting requires every data item to be transferred
    correctly. -> TCP is preferred.
  • On reliable networks such as Case 1, UDP can be
    used.
  • MPI is an interface over UDP (or TCP).
  • Guarantees data transfer even over UDP.
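
A minimal sketch of moving integers between two programs over TCP, using Python's standard `socket` module on the loopback address. The message format (a 4-byte count followed by 64-bit integers) and all function names are assumptions made for this example, not part of MPI or of any standard.

```python
import socket
import struct
import threading


def send_ints(sock, values):
    """Send a list of 64-bit signed integers, prefixed by its length."""
    payload = struct.pack(f"!{len(values)}q", *values)
    sock.sendall(struct.pack("!I", len(values)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes; TCP is a byte stream, so recv may return less."""
    buf = bytearray()
    while len(buf) < n:
        part = sock.recv(n - len(buf))
        if not part:
            raise ConnectionError("peer closed the connection early")
        buf.extend(part)
    return bytes(buf)


def recv_ints(sock):
    """Receive one length-prefixed list of integers."""
    (count,) = struct.unpack("!I", recv_exact(sock, 4))
    return list(struct.unpack(f"!{count}q", recv_exact(sock, 8 * count)))


def sort_worker(server):
    """Accept one connection, sort the received integers, send them back."""
    conn, _ = server.accept()
    with conn:
        send_ints(conn, sorted(recv_ints(conn)))


def remote_sort(values):
    """Send values to a worker over TCP and return the sorted reply."""
    # Port 0 asks the OS for any free port on the loopback interface.
    with socket.create_server(("127.0.0.1", 0)) as server:
        port = server.getsockname()[1]
        worker = threading.Thread(target=sort_worker, args=(server,))
        worker.start()
        with socket.create_connection(("127.0.0.1", port)) as client:
            send_ints(client, values)
            result = recv_ints(client)
        worker.join()
    return result
```

Calling `remote_sort([3, -1, 2])` runs the worker in a thread of the same process; on a real system the worker would run on another computer. With UDP, the program itself would have to detect lost or reordered datagrams.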

11
Detailed Implementation of Sorting Program
  • Implementation of parallel algorithm
  • Cost of computation?
  • Cost of communication?
  • Requirements of Memory?
  • Policies for distributing computation and data
    affect the performance.

12
Characteristics of each case
  • Case 1
  • Low latency and narrow bandwidth
  • Total amount of computational power and memory is
    small
  • No need for load-balancing
  • Case 2
  • High latency and wide bandwidth
  • (Possibly) Large amount of computational power
    and memory
  • Requires load-balancing according to the
    computational power of each machine.

13
Implementation for Case 1
  • Distribute the same amount of computation and data to
    each computer
  • Consider the number of PCs to be used
  • Communication cost increases with the
    number of PCs
  • If the target data is large enough, parallelization
    will achieve sufficient speedup
    even with 100 PCs.
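
The trade-off between computation and communication can be explored with a rough cost model. Everything in the sketch below is an assumption made for illustration: computation is modeled as c * (n / p) * log2(n) seconds, communication as an all-to-all exchange in which each PC sends p - 1 messages of n / p^2 integers, and the constants (c = 10 nsec, 0.1 msec latency, 1 Gbps bandwidth, 64-bit integers) are placeholders rather than measured values.

```python
import math


def sort_time(n, p, c=1e-8, latency=1e-4, bandwidth=1e9, bits=64):
    """Rough time (sec) to sort n > 1 integers on p PCs under the assumed model."""
    compute = c * (n / p) * math.log2(n)
    # Assumed all-to-all exchange: each PC sends p - 1 messages of n / p**2 items.
    message_bits = (n / p**2) * bits
    communicate = (p - 1) * (latency + message_bits / bandwidth)
    return compute + communicate


def speedup(n, p, **model):
    """Speedup of p PCs over a single PC under the same model."""
    return sort_time(n, 1, **model) / sort_time(n, p, **model)


for n in (10**4, 10**7, 10**10):
    print(n, round(speedup(n, 100), 2))
```

Under these assumptions, 100 PCs are slower than a single PC for 10^4 integers because latency dominates, while for 10^10 integers the speedup is large but still below 100.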

14
Implementation for Case 2
  • The amount of computation and data depends on the
    relative performance of each computer.
  • Accurate analysis of the performance of each
    machine and network is important.
  • It is likely to be difficult to obtain a sufficient
    parallelization effect with a large number of nodes.
  • Performance degrades because of load imbalance and
    communication cost.
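
One simple policy for Case 2 is to give each machine a share of the data proportional to its relative performance. The sketch below assumes the relative speeds are already known as positive integers (for example from a benchmark); the function name and the example numbers are hypothetical. Since sorting cost is not linear in the data size and memory limits are ignored, this is only a first approximation.

```python
def partition_sizes(total_items, relative_speeds):
    """Split total_items among machines in proportion to their integer speeds."""
    total_speed = sum(relative_speeds)
    sizes = [total_items * s // total_speed for s in relative_speeds]
    # Hand out the items lost to rounding down, fastest machines first.
    leftover = total_items - sum(sizes)
    fastest_first = sorted(range(len(sizes)), key=lambda i: -relative_speeds[i])
    for i in fastest_first[:leftover]:
        sizes[i] += 1
    return sizes


# Hypothetical example: three machines with relative speeds 10, 30, and 60.
print(partition_sizes(1000, [10, 30, 60]))
```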

15
To complete the presentation of your solution
  • Detailed information about the infrastructure.
  • Detailed information about the implementation
  • Parallel algorithm?
  • Policy for distributing data and computation?
  • Estimate computation and communication time, and
    find the optimal distribution.
  • How to distribute the target data, and how to
    gather the results?
  • Management of multiple jobs.
  • Standardization of the solution
  • Relationship with the future networks

16
Exercise
  • Find existing parallel algorithms
  • For example: sorting
  • Note: look for algorithms for a "distributed-memory
    parallel" computing environment. Each computer
    has its own memory -> requires explicit
    communication
  • Consider how to implement them on computers
    connected via the Internet.
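
As a starting point for the exercise, one well-known algorithm for distributed-memory machines is sample sort. The sketch below simulates it inside a single process: each inner list stands for the memory of one computer, and the loop in step 3 stands in for the explicit communication a real implementation would perform over the network. The function name, the `samples_per_node` parameter, and the splitter selection rule are choices made for this example.

```python
import bisect
import random


def sample_sort(node_data, samples_per_node=3, seed=0):
    """Simulate sample sort; node_data[i] is the list held by node i."""
    p = len(node_data)
    rng = random.Random(seed)
    # Step 1: every node sorts its local data (no communication).
    local = [sorted(chunk) for chunk in node_data]
    # Step 2: gather a few samples from every node and pick p - 1 splitters.
    samples = sorted(
        x
        for chunk in local
        for x in rng.sample(chunk, min(samples_per_node, len(chunk)))
    )
    if samples:
        splitters = [samples[(i * len(samples)) // p] for i in range(1, p)]
    else:
        splitters = []
    # Step 3: all-to-all exchange; each value goes to the node owning its range.
    buckets = [[] for _ in range(p)]
    for chunk in local:
        for x in chunk:
            buckets[bisect.bisect_right(splitters, x)].append(x)
    # Step 4: every node sorts what it received; node order is global order.
    return [sorted(b) for b in buckets]


nodes = [[9, 1, 5], [7, 3, 8], [2, 6, 4]]
print(sample_sort(nodes))
```

After step 4, reading the nodes in order yields the globally sorted sequence, so no single computer ever has to hold all of the data. How evenly the data ends up spread across nodes depends on how well the samples represent it.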