1
Fast Multi-Threading on Shared Memory
Multi-Processors
  • Joseph Cordina
  • B.Sc. Computer Science and Physics
  • Year IV

2
Aims of Project
  • Implement MESH, a user-level threads package, on
    a shared-memory multi-processor machine
  • Take advantage of concurrent processing while
    keeping the advantages of fine-grain user-level
    thread scheduling with low-latency context
    switching
  • Enable concurrent inter-process communication on
    the same machine and across an Ethernet network
    through the NIC

3
What Is MESH?
  • A tightly coupled, fine-grain, uni-processor
    user-level thread scheduler for the C language
  • MESH provides an environment in which to manage
    user-level threads
  • Makes use of inline active context switching,
    relying on the compiler's knowledge of the
    registers in use at any one time (minimum context
    switch 55 ns; illustrated below)
  • Direct hardware access achieves throughput close
    to the theoretical maximum when using jumbo
    frames
  • Communication API supports message pools, ports
    and poolports
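
MESH's own switch is inlined so that the compiler saves only the
registers that are actually live at the switch point. As a portable
point of comparison only, the sketch below shows the same user-level
switching idea using POSIX ucontext calls, a different and much
slower mechanism than MESH's:

#include <stdio.h>
#include <ucontext.h>

/* Illustration only: user-level context switching via the portable
   POSIX swapcontext(). MESH instead inlines the switch so that the
   compiler saves just the live registers, reaching ~55 ns. */
static ucontext_t main_ctx, thread_ctx;
static char stack[16384];

static void worker(void)
{
    puts("in user-level thread");
    swapcontext(&thread_ctx, &main_ctx);    /* yield back to main */
}

int main(void)
{
    getcontext(&thread_ctx);
    thread_ctx.uc_stack.ss_sp   = stack;
    thread_ctx.uc_stack.ss_size = sizeof stack;
    thread_ctx.uc_link          = &main_ctx;
    makecontext(&thread_ctx, worker, 0);
    swapcontext(&main_ctx, &thread_ctx);    /* switch to the thread */
    puts("back in main");
    return 0;
}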

4
Concurrent Resource Access
  • Scheduler entry points are explicit
  • Scheduler entry occurs concurrently when more
    than one thread of execution is in use
  • Access to global data needs to be protected
    against concurrent modification
  • Read access to data does not need to be
    protected
  • Write access to data cannot occur concurrently
    with reads of that data
  • Spin-lock-protected resources with small critical
    sections keep busy-wait time to a minimum
  • Spin on a read to preserve the cache (sketched
    below)
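
A minimal sketch of the spin-on-read (test-and-test-and-set) idea
using GCC atomic builtins; the names are illustrative, not MESH's
API. The waiter spins on a plain read, which is served from its own
cache until the holder's release invalidates the line, and only then
retries the atomic exchange:

/* Test-and-test-and-set spin lock: spin on a read to preserve the
   cache, attempt the atomic exchange only once the lock looks free. */
typedef volatile int spinlock_t;

static void spin_lock(spinlock_t *l)
{
    for (;;) {
        while (*l)                       /* read-only spin: stays in cache */
            ;
        if (__sync_lock_test_and_set(l, 1) == 0)
            return;                      /* acquired */
    }
}

static void spin_unlock(spinlock_t *l)
{
    __sync_lock_release(l);              /* store 0 with release semantics */
}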

5
Scheduling in SMP-MESH
  • A shared run queue stores user-level thread
    descriptors at 32 levels of priority
  • Multiple kernel-level threads access it to
    retrieve threads and enqueue new ones, which can
    lead to data corruption
  • A lock-protected run queue forces synchronisation
    (sketched below)
  • Fine thread granularity increases contention for
    the run-queue lock
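
A sketch of a lock-protected run queue with 32 priority levels,
reusing the spin-lock sketch above; names and layout are illustrative
rather than MESH's actual structures. A single lock guards all
levels, which is exactly why fine granularity drives up contention:

#include <stddef.h>

#define PRIORITIES 32

typedef struct uthread {
    struct uthread *next;
    int priority;                  /* 0 = highest */
} uthread_t;

typedef struct {
    spinlock_t lock;               /* one lock guards the whole queue */
    uthread_t *head[PRIORITIES];   /* FIFO per priority level */
    uthread_t *tail[PRIORITIES];
} runqueue_t;

static void rq_enqueue(runqueue_t *rq, uthread_t *t)
{
    spin_lock(&rq->lock);
    t->next = NULL;
    if (rq->tail[t->priority])
        rq->tail[t->priority]->next = t;
    else
        rq->head[t->priority] = t;
    rq->tail[t->priority] = t;
    spin_unlock(&rq->lock);
}

/* Dequeue the oldest thread at the highest occupied priority. */
static uthread_t *rq_dequeue(runqueue_t *rq)
{
    uthread_t *t = NULL;
    spin_lock(&rq->lock);
    for (int p = 0; p < PRIORITIES; p++)
        if (rq->head[p]) {
            t = rq->head[p];
            rq->head[p] = t->next;
            if (!rq->head[p])
                rq->tail[p] = NULL;
            break;
        }
    spin_unlock(&rq->lock);
    return t;                      /* NULL when the queue is empty */
}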

6
Scheduling in SMP-MESH (2)
Unlike SunOS LWPs, Linux does not provide a private
memory area for each kernel-level thread
  • Kernel-level threads therefore need to identify
    themselves, which is achieved by comparing stack
    addresses (sketched below)
  • For best utilisation, the number of kernel-level
    threads should equal the number of processors
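
A sketch of self-identification by stack comparison, assuming each
kernel-level thread was started on its own statically allocated stack
(the names are hypothetical): the address of any local variable
necessarily lies within the running thread's stack region.

#include <stdint.h>

#define NPROC       4              /* kernel-level threads == processors */
#define STACK_SIZE  (64 * 1024)

/* One stack per kernel-level thread, handed to it at creation time. */
static char stacks[NPROC][STACK_SIZE];

/* The address of a local variable lies inside the stack of whichever
   kernel-level thread is executing, so a range check yields our id. */
static int self_id(void)
{
    uintptr_t a = (uintptr_t)&a;   /* address on the current stack */
    for (int i = 0; i < NPROC; i++)
        if (a >= (uintptr_t)stacks[i] &&
            a <  (uintptr_t)stacks[i] + STACK_SIZE)
            return i;
    return -1;                     /* not a known stack */
}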

7
Well-Behaved Idling
  • Upon finding the run queue empty, kernel-level
    threads sleep in the kernel, giving up the
    processor so that other applications can execute
  • Sleeping on a semaphore removes the risk of a
    lost wakeup, unlike signals and message passing
  • Upon reawakening, the new user-level thread is
    handed directly to the sleeping thread, without
    any run-queue access (sketched below)
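
A sketch of this idle path, reusing names from the earlier sketches
and standing in POSIX semaphores for whatever primitive MESH actually
uses: an idler parks itself on its own semaphore, and a waker hands a
runnable thread straight to it, bypassing the run queue.

#include <semaphore.h>

typedef struct kthread {
    sem_t sleep_sem;               /* blocked on while idle */
    uthread_t *handoff;            /* filled in by the waker */
    struct kthread *next_idle;
} kthread_t;

static kthread_t *idle_list;       /* guarded by idle_lock */
static spinlock_t idle_lock;

/* Called after rq_dequeue() returns NULL: sleep in the kernel so the
   processor is given up; a semaphore cannot lose the wakeup. (A real
   implementation re-checks the run queue after registering, to close
   the window against a concurrent enqueue.) */
static uthread_t *idle_wait(kthread_t *self)
{
    spin_lock(&idle_lock);
    self->next_idle = idle_list;
    idle_list = self;
    spin_unlock(&idle_lock);
    sem_wait(&self->sleep_sem);    /* sleep until work is handed over */
    return self->handoff;          /* delivered directly, no run-queue access */
}

/* Called when a new user-level thread becomes runnable. */
static void wake_or_enqueue(runqueue_t *rq, uthread_t *t)
{
    spin_lock(&idle_lock);
    kthread_t *k = idle_list;
    if (k)
        idle_list = k->next_idle;
    spin_unlock(&idle_lock);
    if (k) {
        k->handoff = t;
        sem_post(&k->sleep_sem);   /* direct hand-off to the sleeper */
    } else {
        rq_enqueue(rq, t);
    }
}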

8
Load Balancing
  • No kernel-level thread is idle while a user-level
    thread is on the shared run queue
  • The run queue's FIFO structure ensures that the
    oldest threads are executed first
  • Cache affinity is not preserved when using a
    shared run queue, since a thread may resume on a
    different processor

9
Communication in SMP-MESH
  • Inter-thread communication both on the same
    system and between different systems
  • All instances of message pools, ports and
    poolports have a private lock, allowing maximally
    concurrent communication (sketched below)
  • Contiguous memory needs to be protected when
    creating messages
  • Message transmission to the NIC uses a lock;
    reception from the NIC uses no lock
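
A sketch of per-instance locking on a port, with hypothetical names
and reusing the spin-lock sketch above: because every port owns its
lock, traffic on different ports never contends. (MESH's lock-free
NIC reception relies on there being a single reader; this generic
sketch locks both paths for safety.)

typedef struct message {
    struct message *next;
    /* payload ... */
} message_t;

typedef struct port {
    spinlock_t lock;               /* private to this port instance */
    message_t *head, *tail;
} port_t;

/* Senders on different ports never contend with one another. */
static void port_send(port_t *p, message_t *m)
{
    m->next = NULL;
    spin_lock(&p->lock);
    if (p->tail) p->tail->next = m;
    else         p->head = m;
    p->tail = m;
    spin_unlock(&p->lock);
}

static message_t *port_recv(port_t *p)
{
    spin_lock(&p->lock);
    message_t *m = p->head;
    if (m) {
        p->head = m->next;
        if (!p->head)
            p->tail = NULL;
    }
    spin_unlock(&p->lock);
    return m;                      /* NULL when the port is empty */
}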

10
Results
  • 500,000 context switches measured at differing
    thread granularities
  • Contention for shared resources at fine thread
    granularity gives worse performance on an SMP
    machine than on a uni-processor machine

11
Conclusion
  • SMP-MESH takes advantage of multi-processors for
    fine-grain multi-threading
  • Concurrency is encouraged in all areas unless a
    risk of data corruption exists
  • Overheads relative to the uni-processor MESH are
    expected, yet they are counterbalanced as the
    number of processors increases
  • Considerable speedup is available at minimum
    cost; the main disadvantage is the need for more
    careful synchronisation in application design