Title: Programming Multicore Processors
1. Programming Multicore Processors
- Aamir Shafi
- High Performance Computing Lab
- http://hpc.seecs.nust.edu.pk
2. Serial Computation
- Traditionally, software has been written for serial computation
- To be run on a single computer having a single Central Processing Unit (CPU)
- A problem is broken into a discrete series of instructions
3. Parallel Computation
- Parallel computing is the simultaneous use of multiple compute resources to solve a computational problem
- Also known as High Performance Computing (HPC)
- The prime focus of HPC is performance: the ability to solve the largest possible problems in the least possible time
4. Traditional Usage of Parallel Computing: Scientific Computing
- Traditionally, parallel computing has been used to solve challenging scientific problems by doing simulations
- For this reason, it is also called Scientific Computing
  - Computational science
5. Emergence of Multi-core Processors
- In the last decade, processor performance has not been enhanced by increasing clock speed
- Increasing clock speed directly increases power consumption
- Power is dissipated as heat; it is not practical to cool the processors down
- Intel canceled a project to produce a 4 GHz processor!
- This led to the emergence of multi-core processors
- Performance is increased by adding processing cores that run at a lower clock speed
- This implies better power usage
Disruptive Technology!
6. Moore's Law is Alive and Well
7. Power Wall
8. Why Multi-core Processors Consume Less Power
- Dynamic power is proportional to C·V²·f (see the worked example below)
- Increasing frequency (f) also requires increasing the supply voltage (V), which has a more-than-linear effect on power
- Increasing the number of cores increases capacitance (C), which has only a linear effect
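As a back-of-the-envelope illustration (a sketch, assuming supply voltage can scale roughly linearly with frequency, which real designs only approximate), replacing one core at frequency f with two cores at f/2 roughly quarters the dynamic power while delivering the same aggregate clock cycles per second:

    P_{\text{dyn}} \propto C V^2 f
    \qquad
    P_{\text{1 core}} \propto C V^2 f
    \qquad
    P_{\text{2 cores}} \propto 2\,C \left(\tfrac{V}{2}\right)^2 \tfrac{f}{2} = \tfrac{1}{4}\, C V^2 f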
9. Software in the Multi-core Era
- The challenge has been thrown to the software industry
- Parallelism is perhaps the answer
- "The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software"
  - http://www.gotw.ca/publications/concurrency-ddj.htm
  - Some excerpts:
  - "The biggest sea change in software development since the OO revolution is knocking at the door, and its name is concurrency"
- This essentially means every software programmer will be a parallel programmer
- The main motivation behind conducting this Programming Multicore Processors workshop
10. About the Programming Multicore Processors Workshop
11. Instructors
- This workshop will be taught by:
- Akbar Mehdi (http://hpc.seecs.nust.edu.pk/akbar/)
  - Masters from Stanford University, USA
  - NVIDIA CUDA API, POSIX Threads, Operating Systems, Algorithms
- Mohsan Jameel (http://hpc.seecs.nust.edu.pk/mohsan/)
  - Masters from KTH, Sweden
  - Scientific Computing, Parallel Computing Languages, OpenMP
12. Course Contents: A Little Background on Parallel Computing Approaches
13. Parallel Hardware
- Three main classifications:
- Shared Memory Multi-processors
  - Symmetric Multi-Processors (SMP)
  - Multi-core Processors
- Distributed Memory Multi-processors
  - Massively Parallel Processors (MPP)
  - Clusters
    - Commodity and custom clusters
- Hybrid Multi-processors
  - Mixture of shared and distributed memory technologies
14. First Type: Shared Memory Multi-processors
- All processors have access to shared memory
- Notion of a Global Address Space
15. Symmetric Multi-Processors (SMP)
- An SMP is a parallel processing system with a shared-everything approach
- The term signifies that each processor shares the main memory and possibly the cache
- Typically, an SMP can have 2 to 256 processors
- Also called Uniform Memory Access (UMA)
- Examples include AMD Athlon, AMD Opteron 200 and 2000 series, Intel Xeon, etc.
16. Multi-core Processors
17. Second Type: Distributed Memory
- Each processor has its own local memory
- Processors communicate with each other by message passing over an interconnect
18. Cluster Computers
- A group of PCs, workstations, or Macs (called nodes) connected to each other via a fast (and private) interconnect
- Each node is an independent computer
- Each cluster has one head-node and multiple compute-nodes
- Users log on to the head-node and start parallel jobs on the compute-nodes
- Two popular cluster classifications:
  - Beowulf Clusters (http://www.beowulf.org)
  - Rocks Clusters (http://www.rocksclusters.org)
19. Cluster Computer
[Diagram: processors Proc 0 through Proc 7 spread across cluster nodes, connected via an interconnect]
20. Third Type: Hybrid
- Modern clusters have a hybrid architecture
- Distributed memory for inter-node (between nodes) communication
- Shared memory for intra-node (within a node) communication
21. SMP and Multi-core Clusters
- Most modern commodity clusters have SMP and/or multi-core nodes
- Processors not only communicate via the interconnect; shared memory programming is also required
- This trend is likely to continue
- Even a new name, "constellations", has been proposed
22. Classification of Parallel Computers
- Parallel Hardware
  - Shared Memory Hardware
    - SMPs
    - Multicore Processors
  - Distributed Memory Hardware
    - Clusters
    - MPPs
- In this workshop, we will learn how to program shared memory parallel hardware (Parallel Hardware → Shared Memory Hardware)
23. Writing Parallel Software
- There are mainly two approaches to writing parallel software
- The first approach is to use libraries (packages) written in already existing languages
  - Economical
- The second, more radical, approach is to provide new languages
  - Parallel computing has a history of novel parallel languages
  - These languages provide high-level parallelism constructs
24. Shared Memory Languages and Libraries
- Designed to support parallel programming on shared memory platforms
- OpenMP
  - Consists of a set of compiler directives, library routines, and environment variables
  - The runtime uses a fork-join model of parallel execution (a minimal sketch follows this list)
- Cilk
  - A design goal was to support asynchronous parallelism
  - A set of keywords: cilk_for, cilk_spawn, cilk_sync
- POSIX Threads (PThreads)
- Threads Building Blocks (TBB)
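To make the fork-join idea concrete, here is a minimal OpenMP sketch in C (the loop bound and reduction are illustrative, not taken from the workshop material). The pragma forks a team of threads that divide the loop iterations among themselves; the join happens implicitly at the end of the parallel region. Compile with, e.g., gcc -fopenmp.

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        int sum = 0;

        /* Fork: a team of threads splits the iterations;
           reduction(+:sum) combines the per-thread partial sums.
           Join: implicit barrier at the end of the loop. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 1; i <= 100; i++) {
            sum += i;
        }

        printf("sum = %d\n", sum);   /* prints 5050 */
        return 0;
    }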
25. Distributed Memory Languages and Libraries
- Libraries
  - Message Passing Interface (MPI): the de facto standard (a minimal example follows this list)
  - PVM
- Languages
  - High Performance Fortran (HPF)
  - Fortran M
  - HPJava
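For contrast with the shared memory style, here is a minimal MPI sketch in C (the message value and two-process layout are illustrative). Every process runs the same program and is distinguished by its rank; data moves only through explicit sends and receives. Build with mpicc and run with, e.g., mpiexec -n 2.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            int msg = 42;   /* illustrative payload */
            /* Send one int to process 1 with tag 0. */
            MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            int msg;
            MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("process 1 received %d\n", msg);
        }

        MPI_Finalize();
        return 0;
    }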
26. Our Focus
- Shared Memory and Multi-core Processor Machines
  - Using POSIX Threads (a hello world sketch follows this list)
  - Using OpenMP
  - Using Cilk (covered briefly)
- Disruptive Technology
  - Using Graphics Processing Units (GPUs) by NVIDIA for general-purpose computing
We are assuming that all of us know the C programming language.
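Since the Day One practical session starts from a hello world PThreads program, here is a minimal sketch of what such a program looks like in C (the thread count is illustrative). Compile with, e.g., gcc -pthread.

    #include <stdio.h>
    #include <pthread.h>

    #define NUM_THREADS 4   /* illustrative thread count */

    /* Each thread runs this function; arg carries its id. */
    static void *hello(void *arg) {
        long id = (long) arg;
        printf("Hello from thread %ld\n", id);
        return NULL;
    }

    int main(void) {
        pthread_t threads[NUM_THREADS];

        /* Create the worker threads. */
        for (long i = 0; i < NUM_THREADS; i++)
            pthread_create(&threads[i], NULL, hello, (void *) i);

        /* Wait for all of them to finish before exiting. */
        for (int i = 0; i < NUM_THREADS; i++)
            pthread_join(threads[i], NULL);

        return 0;
    }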
27. Day One

Timings | Topic | Presenter
10:00 to 10:30 | Introduction to multicore computing | Aamir Shafi
10:30 to 11:30 | Background discussion: review of processes, threads, and architecture; speedup analysis | Akbar Mehdi
11:30 to 11:45 | Break |
11:45 to 12:55 PM | Introduction to POSIX Threads | Akbar Mehdi
12:55 PM to 1:25 PM | Prayer break |
1:25 PM to 2:30 PM | Practical session: run a hello world PThreads program; introduce Linux, top, and Solaris; also introduce the first coding assignment | Akbar Mehdi
28. Day Two

Timings | Topic | Presenter
10:00 to 11:00 | POSIX Threads continued | Akbar Mehdi
11:00 to 12:55 PM | Introduction to OpenMP | Mohsan Jameel
12:55 PM to 1:25 PM | Prayer break |
1:25 PM to 2:30 PM | OpenMP continued; lab session | Mohsan Jameel
29. Day Three

Timings | Topic | Presenter
10:00 to 12:00 | Parallelizing the image processing application using PThreads and OpenMP (practical session) | Akbar Mehdi and Mohsan Jameel
12:00 to 12:55 PM | Introduction to Intel Cilk | Aamir Shafi
12:55 to 1:25 PM | Prayer break |
1:25 PM to 2:30 PM | Introduction to NVIDIA CUDA | Akbar Mehdi
2:30 PM to 2:35 PM | Concluding remarks | Aamir Shafi
30. Learning Objectives
- To become aware of the multicore revolution and its impact on the computer software industry
- To program multicore processors using POSIX Threads
- To program multicore processors using OpenMP and Cilk
- To program Graphics Processing Units (GPUs) for general-purpose computation (using the NVIDIA CUDA API)
You may download the tentative agenda from http://hpc.seecs.nust.edu.pk/aamir/res/mc_agenda.pdf
31. Next Session
- Review of important and relevant Operating Systems and Computer Architecture concepts by Akbar Mehdi