Parallel Sorting on Heterogeneous Cluster - PowerPoint PPT Presentation

Provided by: muqeem

1
Parallel Sorting on Heterogeneous Cluster
  • Muqeem Lodhi
  • Pradeep Monga

2
What is a heterogeneous cluster?
  • A cluster is heterogeneous if its computers are
    connected by a high-speed network and
  • the processors run at different speeds and/or
  • use different operating systems.

3
Motivation for using a heterogeneous cluster
  • Conventional parallel machines are very expensive
    compared to individual workstations.
  • In most organizations,
  • the availability of a variety of individual machine
    architectures,
  • the high availability of spare CPU time, and
  • recent advances in high-speed network
    technology have made clusters of workstations a
    promising and cost-effective alternative to
    conventional parallel machines.

4
What are the cons of using a heterogeneous cluster?
  • Issues that need to be addressed:
  • Maximizing the utilization of all CPUs
  • Minimizing the overhead
  • Load balancing, which is a key issue in
    heterogeneous clusters

5
Heterogeneous cluster
  • Torch (server): 900 MHz CPU, Ultra Sparc III
  • Caper (workstation): 270 MHz CPU, Ultra Sparc V
  • Robot (workstation): 167 MHz CPU, Ultra Sparc I
  • Potato-head (workstation): 248 MHz CPU, Ultra Sparc II

6
We aimed for an algorithm with the following
properties and limitations
  • Good memory utilization: the number of elements
    that can be sorted is close to the number that
    can be stored in the memory of the machine.
  • Comparison based: the only operation used on keys
    is binary comparison.
  • Flexibility: no restrictions are placed on the
    number of keys to sort.

7
Algorithms
  • PSRS (Parallel Sorting by Regular Sampling)
  • QuickSort
  • Basic Idea
  • Split data into k equal-sized segments.
  • Sort segments concurrently (e.g., using
    quicksort).
  • Parallel k-way merge of sorted segments.
  • This approach does not work well in our case. Why?
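
The basic idea above can be sketched in a few lines of Python. This is only an illustration on a single machine, not the presenters' implementation: the function name `naive_parallel_sort` is hypothetical, the built-in `sorted` stands in for quicksort, and the per-segment sorts run one after another instead of on separate nodes.

```python
import heapq

def naive_parallel_sort(data, k):
    """Split data into k near-equal segments, sort each, then k-way merge."""
    n = len(data)
    # Segment j covers data[bounds[j]:bounds[j + 1]].
    bounds = [j * n // k for j in range(k + 1)]
    # On a real cluster each segment would be sorted on its own node.
    segments = [sorted(data[bounds[j]:bounds[j + 1]]) for j in range(k)]
    # heapq.merge performs the k-way merge of the sorted segments.
    return list(heapq.merge(*segments))

print(naive_parallel_sort([9, 3, 7, 1, 8, 2, 6, 4], k=4))
# [1, 2, 3, 4, 6, 7, 8, 9]
```

Note that every segment has the same size regardless of which machine would sort it.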

8
Problem
  • This approach has no load-balancing mechanism.
  • The cluster is heterogeneous: each machine
    has a different CPU clock speed and memory.
  • We need a mechanism that balances the load
    when distributing the data among the different
    processors.
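
To make the load-balancing problem concrete, here is a rough back-of-the-envelope sketch. It assumes a toy cost model in which a node's sorting time is simply its share of the data divided by its clock rate, which ignores memory, network cost, and the n log n growth of sorting. The input size `n` and the function name `finish_time` are hypothetical; the clock rates are the ones listed on the cluster slide.

```python
def finish_time(shares, speeds):
    """Time until the last node finishes, in a toy linear cost model."""
    return max(share / speed for share, speed in zip(shares, speeds))

speeds = [900, 270, 167, 248]   # MHz: Torch, Caper, Robot, Potato-head
n = 1_585_000                   # hypothetical number of keys

# Equal split: every node gets n / 4 keys.
equal = [n / len(speeds)] * len(speeds)
# Proportional split: each node gets a share matching its speed.
proportional = [n * s / sum(speeds) for s in speeds]

print(finish_time(equal, speeds))         # dominated by the 167 MHz node
print(finish_time(proportional, speeds))  # all nodes finish together
```

Under this model the equal split takes more than twice as long as the proportional one, because the fastest machine sits idle while the slowest one finishes.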

9
  • Assumptions
  • We have a performance vector perf that
    contains the relative processing speeds of the p
    processors.
  • $\mathrm{perf}[i] = (\text{speed of processor } i) / (\text{speed of the slowest processor})$
  • The number of duplicates is less than $O(n / p)$.
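
A minimal sketch of how such a performance vector could be computed. Using the clock rates from the cluster slide as the measure of processing speed is an assumption made here for illustration, as is the rounding step, which is included only so that a least common multiple of the values is well defined. The names `performance_vector` and `int_perf` are hypothetical.

```python
def performance_vector(speeds):
    """perf[i] = speed of processor i / speed of the slowest processor."""
    slowest = min(speeds)
    return [s / slowest for s in speeds]

# Clock rates (MHz) of Torch, Caper, Robot and Potato-head.
speeds = [900, 270, 167, 248]
perf = performance_vector(speeds)

# Assumption: round to positive integers so an lcm of the values exists.
int_perf = [max(1, round(v)) for v in perf]

print([f"{v:.2f}" for v in perf])
print(int_perf)  # [5, 2, 1, 1]
```

The slowest processor always gets the value 1, and every other entry says how many times faster that processor is.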

10
Best case scenario
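  • $n = k \cdot \mathrm{perf}[0] \cdot \mathrm{lcm}(\mathrm{perf}, p) + \dots + k \cdot \mathrm{perf}[p-1] \cdot \mathrm{lcm}(\mathrm{perf}, p)$
  • where $n$ = input size,
  • $\mathrm{lcm}(\mathrm{perf}, p)$ = least common multiple of the values in the perf
    vector,
  • $k$ = any integer.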
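
Since every term shares the factor k · lcm(perf, p), the condition is equivalent to n being a multiple of lcm(perf, p) times the sum of the perf values. The sketch below checks this for a given input size. It assumes perf holds positive integers and that k is a positive integer; the helper names `lcm_of` and `best_case_k` and the example vector are hypothetical.

```python
from functools import reduce
from math import gcd

def lcm_of(values):
    """Least common multiple of a list of positive integers."""
    return reduce(lambda a, b: a * b // gcd(a, b), values, 1)

def best_case_k(n, perf):
    """Return k if n = k * lcm(perf) * sum(perf) for a positive k, else None."""
    unit = lcm_of(perf) * sum(perf)
    if n > 0 and n % unit == 0:
        return n // unit
    return None

perf = [4, 2, 1, 1]            # hypothetical integer performance vector
print(best_case_k(64, perf))   # 2, since 64 = 2 * 4 * 8
print(best_case_k(100, perf))  # None, 100 is not a multiple of 32
```

In the best case each processor's share, n · perf[i] / sum(perf), equals k · lcm(perf, p) · perf[i] and is therefore a whole number of keys.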

11
Flexibility
  • If the perf vector contains identical values, we
    have the homogeneous cluster case.
  • The number of nodes in the cluster is scalable.

12
Algorithm Graphically
  • Phase I
  • Given: input size $n$,
  • number of processors $p$,
  • $\mathrm{perf}[i]$ where $i = 0, 1, 2, \dots, p-1$
  • Data is distributed to all processors according to
    their respective processing speeds.
  • Data size of the ith processor = $(n \cdot \mathrm{perf}[i]) / \mathrm{totalperf}$
  • Each processor sorts its local data and chooses
    local regular samples.
  • Number of local samples $L$ in the ith processor = $(p - 1) \cdot \mathrm{perf}[i]$
  • Gap between samples in the ith processor = $(n \cdot \mathrm{perf}[i]) / (\mathrm{totalperf} \cdot L)$
  • where totalperf = sum of all values in the perf
    vector
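
Phase I can be simulated on one machine as follows. This is a sketch under stated assumptions, not the presenters' code: perf is taken to hold positive integers, the last processor absorbs any rounding remainder, and samples are taken at positions 0, gap, 2·gap, and so on, since the slide does not say where the first sample sits. The function name `phase_one` is hypothetical.

```python
def phase_one(data, perf):
    """Distribute data by perf, sort each share, pick regular samples.

    Returns (chunks, samples), one list per simulated processor.
    """
    n, p, totalperf = len(data), len(perf), sum(perf)
    chunks, samples = [], []
    start = 0
    for i in range(p):
        # Share of processor i: (n * perf[i]) / totalperf. The last
        # processor takes the remainder so no element is dropped.
        size = n * perf[i] // totalperf if i < p - 1 else n - start
        local = sorted(data[start:start + size])
        start += size
        num_samples = (p - 1) * perf[i]          # L on the slide
        gap = max(1, size // num_samples) if num_samples else 1
        chunks.append(local)
        # Assumption: samples sit at positions 0, gap, 2 * gap, ...
        samples.append(local[0:num_samples * gap:gap])
    return chunks, samples

data = list(range(95, -1, -1))           # 96 keys in descending order
chunks, samples = phase_one(data, [4, 2, 1, 1])
print([len(c) for c in chunks])   # [48, 24, 12, 12]
print([len(s) for s in samples])  # [12, 6, 3, 3]
```

Substituting L into the gap formula gives n / (totalperf · (p - 1)), so the gap is the same on every processor (4 in this example) and faster processors simply contribute more samples.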

13
  • Phase II
  • Local regular samples are received by processor
    0.
  • $p - 1$ final pivots are selected and broadcast
    to all the processors.
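
The slide does not say how the p - 1 final pivots are chosen from the gathered samples. One plausible rule, shown here purely as an assumption, is to sort all samples and take pivot j at the position matching the cumulative share perf[0] + ... + perf[j] of the total, so that faster processors end up with wider key ranges. The function name `select_final_pivots` and the sample values are hypothetical.

```python
def select_final_pivots(local_samples, perf):
    """Pick p - 1 pivots from the gathered samples, weighted by perf."""
    p, totalperf = len(perf), sum(perf)
    gathered = sorted(s for samples in local_samples for s in samples)
    if not gathered:
        return []
    pivots = []
    cumulative = 0
    for j in range(p - 1):
        cumulative += perf[j]
        # Position at fraction cumulative / totalperf of the sample list.
        idx = min(len(gathered) - 1, len(gathered) * cumulative // totalperf)
        pivots.append(gathered[idx])
    return pivots

# Three processors with perf = [2, 1, 1]: they send 4, 2 and 2 samples.
local_samples = [[1, 7, 13, 19], [4, 16], [10, 22]]
print(select_final_pivots(local_samples, [2, 1, 1]))  # [13, 19]
```

With these pivots, processor 0 is responsible for roughly the lower half of the key range, matching its share of the total performance.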

14
  • Phase III
  • Each processor receives the $p - 1$ final pivots from
    processor 0.
  • Each processor sends its local data to the rest of
    the processors.
  • Each processor i keeps the values greater than the
    (i-1)th final pivot and less than the ith final pivot, and
    drops the rest of the data.
  • All data in the ith processor is now less than the data in the
    (i+1)th processor.
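
The filtering step of Phase III might look like the sketch below, simulated on one machine. One deliberate deviation, flagged as an assumption: the upper bound is inclusive, so that keys equal to a pivot are kept by exactly one processor instead of being dropped by all of them. The function names and the example data are hypothetical.

```python
def keep_for_processor(values, pivots, i):
    """Values that processor i keeps: above pivot i-1, up to pivot i."""
    lower = pivots[i - 1] if i > 0 else None
    upper = pivots[i] if i < len(pivots) else None
    return [v for v in values
            if (lower is None or v > lower)
            and (upper is None or v <= upper)]   # assumption: inclusive

def phase_three(chunks, pivots):
    """Every processor sees every chunk and keeps only its own key range."""
    p = len(chunks)
    return [[v for chunk in chunks
             for v in keep_for_processor(chunk, pivots, i)]
            for i in range(p)]

chunks = [[1, 5, 9, 14], [2, 13, 20], [7, 19, 25]]
print(phase_three(chunks, pivots=[8, 15]))
# [[1, 5, 2, 7], [9, 14, 13], [20, 19, 25]]
```

After this step no key is lost or duplicated, and every key on processor i is smaller than every key on processor i + 1.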

15
  • Phase IV
  • The data on every processor is sorted locally.
  • All the data is merged on processor 0,
    giving the final sorted output.
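
A short sketch of Phase IV, again simulated on one machine with hypothetical names and data. Because the key ranges of the processors no longer overlap, the merge on processor 0 reduces to concatenating the locally sorted partitions in processor order.

```python
def phase_four(partitions):
    """Sort each partition locally, then gather them in processor order."""
    result = []
    for part in partitions:
        # Local sort on processor i, then append on processor 0.
        result.extend(sorted(part))
    return result

partitions = [[1, 5, 2, 7], [9, 14, 13], [20, 19, 25]]
print(phase_four(partitions))
# [1, 2, 5, 7, 9, 13, 14, 19, 20, 25]
```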

16
Evaluation (Randomized Data)
  • 13 MB = 2,001,450 int
  • 27 MB = 4,002,900 int
  • 55 MB = 8,005,800 int
  • 110 MB = 16,011,600 int
  • Timing starts before the data is distributed to all
    processors and ends after the sorted data is gathered
    on processor 0.

17
Next steps
  • Further analysis on skewed data sets.
  • How does performance vary with the
    number of nodes in the heterogeneous cluster?
  • If time permits, we may include some Intel-based
    machines in our cluster.

20
  • Questions and comments?