1
Multitasking and Parallelism
  • Kristopher Windsor
  • CS 147, Fall 2008

2
Table of contents
  • Parallel processing on one core
  • Multicore usage, difficulties, and next steps
  • Alternatives to multicore CPUs
  • Multicore benchmarks

3
Optimizing each clock cycle
  • Multiple instructions and/or multiple data items
    can be processed each cycle, for batch-processing
    efficiency
  • For example, MMX lets many ALU operations run
    simultaneously on multiple data items (SIMD:
    single instruction, multiple data)
  • Vector architecture is similar to SIMD, but its
    speed comes from parallel data movement, not
    parallel data processing
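MMX-style SIMD can be imitated in plain C with "SIMD within a register": one 32-bit addition updates four 8-bit values at once, each wrapping modulo 256 without carrying into its neighbor. This is a sketch of the idea under that assumption, not MMX itself (a real MMX `PADDB` works on 64-bit registers); the function names `packed_add_u8` and `lane` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Adds four 8-bit lanes packed into one 32-bit word. Each lane wraps
   modulo 256 and never carries into its neighbor, like a packed-byte add. */
uint32_t packed_add_u8(uint32_t a, uint32_t b) {
    const uint32_t low7 = 0x7F7F7F7Fu; /* low 7 bits of every lane */
    const uint32_t high = 0x80808080u; /* top bit of every lane */
    /* Adding only the low 7 bits cannot carry across a lane boundary... */
    uint32_t sum = (a & low7) + (b & low7);
    /* ...then each lane's top bit is fixed up with XOR (addition mod 2). */
    return sum ^ ((a ^ b) & high);
}

/* Extracts lane i of a packed word (lane 0 = least significant byte). */
uint8_t lane(uint32_t packed, int i) {
    assert(i >= 0 && i < 4);
    return (uint8_t)(packed >> (8 * i));
}
```

For example, `packed_add_u8(0x01020304u, 0x10203040u)` yields `0x11223344u`: four additions done by a single 32-bit add.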

4
Hardware multithreading
  • Lets one core hold and run several threads;
    useful whenever there are more threads than
    cores
  • There are multiple ways for a core to switch to a
    different thread:
      • Fine-grained multithreading: switch threads
        every cycle
      • Coarse-grained multithreading: switch only when
        the current thread is stalled (e.g., waiting for
        data to come back from RAM)
      • Simultaneous multithreading (SMT): instructions
        from multiple threads are issued in the same
        cycle
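The fine-grained and coarse-grained policies can be modeled as a rule for choosing which thread issues next. This is a toy sketch, not how any real core is built; the function names and the per-thread `stalled` flags are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Fine-grained: move to the next ready thread every cycle (round-robin),
   skipping stalled threads. Returns -1 if every thread is stalled. */
int fine_grained_next(int current, const bool *stalled, int nthreads) {
    assert(nthreads > 0 && current >= 0 && current < nthreads);
    for (int step = 1; step <= nthreads; step++) {
        int candidate = (current + step) % nthreads;
        if (!stalled[candidate])
            return candidate;
    }
    return -1;
}

/* Coarse-grained: keep running the current thread until it stalls
   (e.g., on a cache miss), then switch to the next ready thread. */
int coarse_grained_next(int current, const bool *stalled, int nthreads) {
    assert(nthreads > 0 && current >= 0 && current < nthreads);
    if (!stalled[current])
        return current;
    return fine_grained_next(current, stalled, nthreads);
}
```

With three threads and none stalled, the fine-grained rule moves from thread 0 to thread 1, while the coarse-grained rule stays on thread 0 until it stalls.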

5
Reasons for multiple cores and processors
  • Clock speed per core is limited by heat
      • Heat produced grows faster than clock speed:
        dynamic power is proportional to capacitive
        load × voltage² × frequency, higher
        frequencies need higher voltages, and cooling
        methods are limited
      • This limit has already been reached, so one
        core is not enough
  • Power efficiency
      • Smaller CPU designs can be optimized better
      • Individual cores or processors can be turned
        off when not needed
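The heat argument follows from the dynamic power relation in Patterson and Hennessy. A worked example with hypothetical numbers, assuming the voltage can be lowered in the same proportion as the frequency (85% of the original for both):

```latex
P_{\text{dynamic}} \propto C \times V^{2} \times f

\frac{P_{\text{new}}}{P_{\text{old}}}
  = \frac{C \times (0.85\,V)^{2} \times (0.85\,f)}{C \times V^{2} \times f}
  = 0.85^{3} \approx 0.61
```

Under these assumptions, two such slower cores would draw about 2 × 0.614 ≈ 1.23 times the power of the original single core while offering up to 2 × 0.85 = 1.7 times its throughput, but only if the work splits perfectly between them.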

6
Two types of multicore use
  • Job-level parallelism
      • Each process can only use one core
      • Easier to code
      • Most programs are written like this
      • Inefficient when you have multiple cores but
        only one main program
  • Parallel processing program
      • Each process can have multiple threads, which
        run on different cores
      • Harder to code
      • Used in operating systems, which have many
        independent tasks, and in web servers, where
        each request can be handled separately
      • Best use of multiple cores
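Job-level parallelism needs no threads at all: independent single-threaded processes can each be scheduled on a different core by the OS. A minimal sketch, assuming a POSIX system; `job` and `run_jobs` are hypothetical, and the job itself is only a stand-in workload.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical independent job: sums 1..n; exit status 0 means success. */
static int job(long long n) {
    long long sum = 0;
    for (long long i = 1; i <= n; i++)
        sum += i;
    return sum == n * (n + 1) / 2 ? 0 : 1;
}

/* Starts njobs single-threaded child processes; the OS may run each one
   on a different core. Returns how many jobs exited successfully, or -1
   if a process could not be created. */
int run_jobs(int njobs) {
    assert(njobs >= 0);
    int started = 0;
    for (; started < njobs; started++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0)
            _exit(job(1000000LL * (started + 1))); /* child: run one job */
    }
    int succeeded = 0;
    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            succeeded++;
    }
    return started == njobs ? succeeded : -1;
}
```

Because each job is its own process with one thread, no locking is needed; the trade-off is that no single job can use more than one core.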

7
Problem with parallel processing: the game programming dilemma
  • A software-rendered display accounts for most of
    a game's CPU usage (i.e., more than the physics
    calculations), and the graphics output cannot
    naturally be split into multiple threads
  • 3D hardware-accelerated graphics output is
    typically the performance bottleneck, and since
    rendering on the video card's GPU is about 50x
    faster than on a CPU, multicore CPUs will not
    help
  • In games where every object can collide with
    every other object, physics cannot be
    parallelized easily, because any two collisions
    may need to access the same memory
  • Every event has to happen in order, but parallel
    processing does not naturally guarantee this
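To make the memory conflict concrete: two collisions conflict when they involve the same object, so they cannot safely be handled at the same time. The sketch below greedily groups collisions into batches in which no object appears twice; it is an illustration, not something the slide proposes, and the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A detected collision between objects a and b (hypothetical type). */
typedef struct { int a, b; } CollisionPair;

static bool shares_object(CollisionPair p, CollisionPair q) {
    return p.a == q.a || p.a == q.b || p.b == q.a || p.b == q.b;
}

/* Assigns each collision to the first batch where it touches no object
   already in that batch. Collisions in one batch use disjoint memory.
   Writes batch_of[i] for each pair and returns the number of batches. */
int batch_collisions(const CollisionPair *pairs, size_t n, int *batch_of) {
    assert(n == 0 || (pairs != NULL && batch_of != NULL));
    int batches = 0;
    for (size_t i = 0; i < n; i++) {
        int b = 0;
        for (;;) {
            bool conflict = false;
            for (size_t j = 0; j < i; j++) {
                if (batch_of[j] == b && shares_object(pairs[i], pairs[j])) {
                    conflict = true;
                    break;
                }
            }
            if (!conflict)
                break;
            b++; /* try the next batch */
        }
        batch_of[i] = b;
        if (b + 1 > batches)
            batches = b + 1;
    }
    return batches;
}
```

For collisions (0,1), (0,2), and (3,4), the first and third share a batch, but (0,2) must wait for a second batch because it touches object 0 again. The more objects collide with each other, the smaller the batches become and the less parallelism remains.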

8
Problem with parallel processing: complexity
  • Sequential

        Dim Shared As Integer total

        Sub program ()
            'this part can be done several times at once
            'because it does not depend on
            'other parts of the program
            Dim As Integer addme = 0
            For i As Integer = 1 To 10000
                addme += 1
            Next i
            'accesses a global variable
            total += addme
        End Sub

        For i As Integer = 1 To 100
            program()
        Next i

  • Concurrent

        Dim Shared As Integer total
        Dim Shared As Any Ptr mutex

        'Threadcreate requires a Sub that takes one Any Ptr argument
        Sub program (ByVal userdata As Any Ptr)
            Dim As Integer addme = 0
            For i As Integer = 1 To 10000
                addme += 1
            Next i
            'only one thread at a time may update the global total
            Mutexlock(mutex)
            total += addme
            Mutexunlock(mutex)
        End Sub

        mutex = Mutexcreate()
        Dim As Any Ptr threads(1 To 100)
        For i As Integer = 1 To 100
            threads(i) = Threadcreate(@program)
        Next i
        'wait for every thread to finish before using total
        For i As Integer = 1 To 100
            Threadwait(threads(i))
        Next i
        Mutexdestroy(mutex)
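For readers more familiar with C, the concurrent version maps directly onto POSIX threads: `pthread_create` for `Threadcreate`, `pthread_mutex_lock`/`pthread_mutex_unlock` for `Mutexlock`/`Mutexunlock`, and `pthread_join` for waiting on each thread. A sketch assuming a POSIX system; `run_concurrent` is a hypothetical wrapper that returns the final total.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 100

static long total = 0;                 /* shared global, like `total` */
static long iterations_per_thread = 0; /* set before any thread starts */
static pthread_mutex_t total_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread body: the independent counting needs no lock;
   only the update of the shared total is protected. */
static void *program(void *userdata) {
    (void)userdata;
    long addme = 0;
    for (long i = 0; i < iterations_per_thread; i++)
        addme += 1;
    pthread_mutex_lock(&total_lock);
    total += addme;
    pthread_mutex_unlock(&total_lock);
    return NULL;
}

/* Starts nthreads threads, waits for all of them, and returns the final
   total (or -1 if a thread could not be created). */
long run_concurrent(int nthreads, long iterations) {
    pthread_t threads[MAX_THREADS];
    assert(nthreads >= 0 && nthreads <= MAX_THREADS);
    total = 0;
    iterations_per_thread = iterations;
    int started = 0;
    while (started < nthreads &&
           pthread_create(&threads[started], NULL, program, NULL) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    return started == nthreads ? total : -1;
}
```

`run_concurrent(100, 10000)` returns 1,000,000. Without the mutex, two threads could read the same old value of `total` and one update would be lost, so the result could come out lower.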

9
Problem with parallel processing: cache coherence
  • Each processor has its own cache
  • If one processor changes the memory, the other
    processors may still have stale data cached
  • Snooping protocol: when one processor changes
    the data, every other processor must remove
    (invalidate) its copy
  • AMD's MOESI protocol: every cache block is in one
    of five states: modified, owned, exclusive,
    shared, or invalid
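The MOESI states can be traced with a small model of one cache block shared by several cores. This is a simplified sketch, assuming a snooping bus and ignoring the data itself, write-backs, and evictions; the type and function names are hypothetical.

```c
#include <assert.h>

#define MAX_CORES 8

typedef enum { INVALID, SHARED, EXCLUSIVE, OWNED, MODIFIED } LineState;

/* State of ONE memory block in each core's cache. */
typedef struct {
    int ncores;
    LineState state[MAX_CORES];
} CacheBlock;

void block_init(CacheBlock *blk, int ncores) {
    assert(ncores > 0 && ncores <= MAX_CORES);
    blk->ncores = ncores;
    for (int i = 0; i < ncores; i++)
        blk->state[i] = INVALID;
}

void core_read(CacheBlock *blk, int core) {
    assert(core >= 0 && core < blk->ncores);
    if (blk->state[core] != INVALID)
        return; /* read hit: no bus traffic */
    int other_copy = 0;
    for (int j = 0; j < blk->ncores; j++) {
        if (j == core || blk->state[j] == INVALID)
            continue;
        other_copy = 1;
        if (blk->state[j] == MODIFIED)
            blk->state[j] = OWNED;  /* keeps the dirty data and supplies it */
        else if (blk->state[j] == EXCLUSIVE)
            blk->state[j] = SHARED;
    }
    blk->state[core] = other_copy ? SHARED : EXCLUSIVE;
}

void core_write(CacheBlock *blk, int core) {
    assert(core >= 0 && core < blk->ncores);
    for (int j = 0; j < blk->ncores; j++)
        if (j != core)
            blk->state[j] = INVALID; /* snooped write: invalidate other copies */
    blk->state[core] = MODIFIED;
}
```

Tracing three cores: core 0 reads alone (Exclusive); core 1 reads too (both Shared); core 1 writes (Modified, core 0 Invalid); core 2 reads, so core 1 moves to Owned and supplies the dirty data while memory is still stale. That Owned state is what MOESI adds over simpler protocols.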

10
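Amdahl's law
  • The speedup from an improvement is limited by the
    part of the work it does not improve
  • Adding several cores to a machine therefore gives
    limited speed improvements when the other
    components (such as memory bandwidth) have not
    been upgraded
  • In this example, adding cores allows more FLOPs,
    but not more data transfer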
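Amdahl's law can be written as a one-line function: if a fraction p of the execution time benefits from n cores and the rest stays sequential, the speedup is 1 / ((1 - p) + p / n). A sketch in C; the function name and the sample fraction in the note below are hypothetical.

```c
#include <assert.h>

/* Amdahl's law: overall speedup when `parallel_fraction` of the original
   execution time is spread evenly over `cores` cores and the remainder
   stays sequential. */
double amdahl_speedup(double parallel_fraction, int cores) {
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    assert(cores >= 1);
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores);
}
```

With a hypothetical p = 0.5, 2 cores give a speedup of about 1.33 and 4 cores only 1.6, which is why upgrading from 2 cores to 4 may have little effect.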

11
Parallel processing next steps
  • Intel is developing 6- and 8-core processors
    (Westmere and Nehalem)
  • Tilera produces 64-core chips (TILE64) with an
    architecture made for many cores
      • Removes the bus data-transfer bottleneck
      • Saves power by powering off individual cores
      • Comes with developer tools for writing
        parallel processing programs

12
Alternative architecture: the GPU
  • CPU
      • Slowly adopting multiple cores
      • Caches exploit locality
      • Needs low-latency RAM
  • GPU
      • Naturally better suited to parallelism, and
        uses heavy multithreading to achieve
        performance
      • The GeForce 8800 GTX has 16 multiprocessors,
        each with 8 multithreaded floating-point
        processors (16 × 8 = 128 in total)
      • Does not rely on locality; uses coarse-grained
        hardware multithreading to minimize time lost
        to memory stalls
      • Needs high-bandwidth RAM

13
Alternative architecture: clusters
  • Costs
      • Maintenance and storage costs for each machine
      • Each machine's operating system uses some of
        its RAM
      • Resources such as RAM cannot be shared well
        among machines
  • Benefits
      • Can be built with mass-produced computers and
        standard LAN hardware
      • Can reach sizes beyond the limits of current
        multicore chips
      • Can be spread over multiple physical locations
      • Gives your company more bandwidth than any one
        ISP offers
      • Provides redundancy in case of fire or power
        outage
      • Can be upgraded without replacing the current
        hardware

14
Benchmarks
15
Benchmarks
  • The Sparse Matrix-Vector multiplication test and
    the Lattice-Boltzmann Magneto-Hydrodynamics test
    give different results
  • Fewer FLOPs per core when there are many cores
  • Upgrading from 2 cores to 4 may have little
    effect
  • Certain processors are better suited to certain
    applications (e.g., Xeon)
  • Multicore chips demand new methods of software
    optimization

16
References
  • Computer Organization and Design: The
    Hardware/Software Interface, 4th ed., by David A.
    Patterson and John L. Hennessy
  • AMD.com
  • PCLaunches.com (New Intel Processors)
  • Tilera.com

17
The end