1
Fast Multi-Threading on Shared Memory
Multi-Processors
  • Joseph Cordina
  • B.Sc. Computer Science and Physics
  • Year IV

2
Aims of Project
  • Implement MESH, a user-level threads package, on
    a shared-memory multi-processor machine
  • Take advantage of concurrent processing while
    keeping the advantages of fine-grain user-level
    thread scheduling with low-latency context
    switching
  • Enable concurrent inter-process communication on
    the same machine and across an Ethernet network
    through the NIC

3
What Is MESH?
  • A tightly coupled, fine-grain, uni-processor
    user-level thread scheduler for the C language
  • MESH provides an environment in which to manage
    user-level threads
  • Makes use of inline active context switching,
    relying on the compiler's knowledge of the
    registers in use at any one time (minimum context
    switch 55 ns; illustrated below)
  • Direct hardware access achieves throughput close
    to the theoretical maximum when using jumbo
    frames
  • Communication API supports message pools, ports
    and poolports
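
MESH's own switch is inlined so that the compiler saves only the
registers that are actually live at the switch point. As a portable
point of comparison only, the sketch below shows the same user-level
switching idea using POSIX ucontext calls, a different and much
slower mechanism than MESH's:

#include <stdio.h>
#include <ucontext.h>

/* Illustration only: user-level context switching via the portable
   POSIX swapcontext(). MESH instead inlines the switch so that the
   compiler saves just the live registers, reaching ~55 ns. */
static ucontext_t main_ctx, thread_ctx;
static char stack[16384];

static void worker(void)
{
    puts("in user-level thread");
    swapcontext(&thread_ctx, &main_ctx);    /* yield back to main */
}

int main(void)
{
    getcontext(&thread_ctx);
    thread_ctx.uc_stack.ss_sp   = stack;
    thread_ctx.uc_stack.ss_size = sizeof stack;
    thread_ctx.uc_link          = &main_ctx;
    makecontext(&thread_ctx, worker, 0);
    swapcontext(&main_ctx, &thread_ctx);    /* switch to the thread */
    puts("back in main");
    return 0;
}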

4
Concurrent Resource Access
  • Scheduler entry points are explicit
  • Scheduler entry occurs concurrently when more
    than one thread of execution is in use
  • Access to global data needs to be protected
    against concurrent modification
  • Read access to data does not need to be
    protected
  • Write access to data cannot occur concurrently
    with reads of that data
  • Spin-lock-protected resources with small critical
    sections keep busy-wait time to a minimum
  • Spin on a read to preserve the cache (sketched
    below)
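
A minimal sketch of the spin-on-read (test-and-test-and-set) idea
using GCC atomic builtins; the names are illustrative, not MESH's
API. The waiter spins on a plain read, which is served from its own
cache until the holder's release invalidates the line, and only then
retries the atomic exchange:

/* Test-and-test-and-set spin lock: spin on a read to preserve the
   cache, attempt the atomic exchange only once the lock looks free. */
typedef volatile int spinlock_t;

static void spin_lock(spinlock_t *l)
{
    for (;;) {
        while (*l)                       /* read-only spin: stays in cache */
            ;
        if (__sync_lock_test_and_set(l, 1) == 0)
            return;                      /* acquired */
    }
}

static void spin_unlock(spinlock_t *l)
{
    __sync_lock_release(l);              /* store 0 with release semantics */
}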

5
Scheduling in SMP-MESH
  • A shared run queue stores user-level thread
    descriptors at 32 levels of priority
  • Multiple kernel-level threads access it to
    retrieve threads and enqueue new ones, which can
    lead to data corruption
  • A lock-protected run queue forces synchronisation
    (sketched below)
  • Fine thread granularity increases contention for
    the run-queue lock
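
A sketch of a lock-protected run queue with 32 priority levels,
reusing the spin-lock sketch above; names and layout are illustrative
rather than MESH's actual structures. A single lock guards all
levels, which is exactly why fine granularity drives up contention:

#include <stddef.h>

#define PRIORITIES 32

typedef struct uthread {
    struct uthread *next;
    int priority;                  /* 0 = highest */
} uthread_t;

typedef struct {
    spinlock_t lock;               /* one lock guards the whole queue */
    uthread_t *head[PRIORITIES];   /* FIFO per priority level */
    uthread_t *tail[PRIORITIES];
} runqueue_t;

static void rq_enqueue(runqueue_t *rq, uthread_t *t)
{
    spin_lock(&rq->lock);
    t->next = NULL;
    if (rq->tail[t->priority])
        rq->tail[t->priority]->next = t;
    else
        rq->head[t->priority] = t;
    rq->tail[t->priority] = t;
    spin_unlock(&rq->lock);
}

/* Dequeue the oldest thread at the highest occupied priority. */
static uthread_t *rq_dequeue(runqueue_t *rq)
{
    uthread_t *t = NULL;
    spin_lock(&rq->lock);
    for (int p = 0; p < PRIORITIES; p++)
        if (rq->head[p]) {
            t = rq->head[p];
            rq->head[p] = t->next;
            if (!rq->head[p])
                rq->tail[p] = NULL;
            break;
        }
    spin_unlock(&rq->lock);
    return t;                      /* NULL when the queue is empty */
}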

6
Scheduling in SMP-MESH (2)
Unlike SunOS LWPs, Linux does not provide a private
memory area for each kernel-level thread
  • Kernel-level threads therefore need to identify
    themselves, which is achieved by comparing stack
    addresses (sketched below)
  • For best utilisation, the number of kernel-level
    threads should equal the number of processors
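
A sketch of self-identification by stack comparison, assuming each
kernel-level thread was started on its own statically allocated stack
(the names are hypothetical): the address of any local variable
necessarily lies within the running thread's stack region.

#include <stdint.h>

#define NPROC       4              /* kernel-level threads == processors */
#define STACK_SIZE  (64 * 1024)

/* One stack per kernel-level thread, handed to it at creation time. */
static char stacks[NPROC][STACK_SIZE];

/* The address of a local variable lies inside the stack of whichever
   kernel-level thread is executing, so a range check yields our id. */
static int self_id(void)
{
    uintptr_t a = (uintptr_t)&a;   /* address on the current stack */
    for (int i = 0; i < NPROC; i++)
        if (a >= (uintptr_t)stacks[i] &&
            a <  (uintptr_t)stacks[i] + STACK_SIZE)
            return i;
    return -1;                     /* not a known stack */
}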

7
Well-Behaved Idling
  • Upon finding the run queue empty, kernel-level
    threads sleep in the kernel, giving up the
    processor so that other applications can execute
  • Sleeping on a semaphore removes the risk of a
    lost wakeup, unlike signals and message passing
  • Upon reawakening, the new user-level thread is
    handed directly to the sleeping thread, without
    any run-queue access (sketched below)
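
A sketch of this idle path, reusing names from the earlier sketches
and standing in POSIX semaphores for whatever primitive MESH actually
uses: an idler parks itself on its own semaphore, and a waker hands a
runnable thread straight to it, bypassing the run queue.

#include <semaphore.h>

typedef struct kthread {
    sem_t sleep_sem;               /* blocked on while idle */
    uthread_t *handoff;            /* filled in by the waker */
    struct kthread *next_idle;
} kthread_t;

static kthread_t *idle_list;       /* guarded by idle_lock */
static spinlock_t idle_lock;

/* Called after rq_dequeue() returns NULL: sleep in the kernel so the
   processor is given up; a semaphore cannot lose the wakeup. (A real
   implementation re-checks the run queue after registering, to close
   the window against a concurrent enqueue.) */
static uthread_t *idle_wait(kthread_t *self)
{
    spin_lock(&idle_lock);
    self->next_idle = idle_list;
    idle_list = self;
    spin_unlock(&idle_lock);
    sem_wait(&self->sleep_sem);    /* sleep until work is handed over */
    return self->handoff;          /* delivered directly, no run-queue access */
}

/* Called when a new user-level thread becomes runnable. */
static void wake_or_enqueue(runqueue_t *rq, uthread_t *t)
{
    spin_lock(&idle_lock);
    kthread_t *k = idle_list;
    if (k)
        idle_list = k->next_idle;
    spin_unlock(&idle_lock);
    if (k) {
        k->handoff = t;
        sem_post(&k->sleep_sem);   /* direct hand-off to the sleeper */
    } else {
        rq_enqueue(rq, t);
    }
}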

8
Load Balancing
  • No kernel-level thread is idle while a user-level
    thread is on the shared run queue
  • The run queue's FIFO structure ensures that the
    oldest threads are executed first
  • Cache affinity is not preserved when using a
    shared run queue, since a thread may resume on a
    different processor

9
Communication in SMP-MESH
  • Inter-thread communication both on the same
    system and between different systems
  • All instances of message pools, ports and
    poolports have a private lock, allowing maximally
    concurrent communication (sketched below)
  • Contiguous memory needs to be protected when
    creating messages
  • Message transmission to the NIC uses a lock;
    reception from the NIC uses no lock
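
A sketch of per-instance locking on a port, with hypothetical names
and reusing the spin-lock sketch above: because every port owns its
lock, traffic on different ports never contends. (MESH's lock-free
NIC reception relies on there being a single reader; this generic
sketch locks both paths for safety.)

typedef struct message {
    struct message *next;
    /* payload ... */
} message_t;

typedef struct port {
    spinlock_t lock;               /* private to this port instance */
    message_t *head, *tail;
} port_t;

/* Senders on different ports never contend with one another. */
static void port_send(port_t *p, message_t *m)
{
    m->next = NULL;
    spin_lock(&p->lock);
    if (p->tail) p->tail->next = m;
    else         p->head = m;
    p->tail = m;
    spin_unlock(&p->lock);
}

static message_t *port_recv(port_t *p)
{
    spin_lock(&p->lock);
    message_t *m = p->head;
    if (m) {
        p->head = m->next;
        if (!p->head)
            p->tail = NULL;
    }
    spin_unlock(&p->lock);
    return m;                      /* NULL when the port is empty */
}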

10
Results
  • 500,000 context switches measured at differing
    thread granularities
  • Contention for shared resources at fine thread
    granularity gives worse performance on an SMP
    machine than on a uni-processor machine

11
Conclusion
  • SMP-MESH takes advantage of multi-processors for
    fine-grain multi-threading
  • Concurrency is encouraged in all areas unless a
    risk of data corruption exists
  • Overheads relative to the uni-processor MESH are
    expected, yet they are counterbalanced as the
    number of processors increases
  • Considerable speedup is available at minimum
    cost; the main disadvantage is the need for more
    careful synchronisation in application design