1
Fast Communication and User Level Parallelism
  • Howard Marron

2
Introduction
  • We have studied systems that have attempted to
    build transparent layers below the application
    that created properties like replication and
    group communication.
  • We will look at some areas where more control
    has been given to the user on parallelism

3
Threads
  • Allow finer granularity within programs for
    better parallelism and performance
  • Have lower overhead than processes
  • The same program runs on a uniprocessor or a
    multiprocessor with little or no modification
  • Threads in the same process can communicate
    easily since they share the same address space
    (see the sketch below)
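
The last two points can be made concrete with a small sketch. POSIX threads are assumed here purely for illustration (the slides do not name a particular thread package): two threads in one process update a shared counter through the same address space.

    /* Sketch: two threads sharing one address space (POSIX threads assumed) */
    #include <pthread.h>
    #include <stdio.h>

    static long counter = 0;                  /* shared: lives in the common address space */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg)
    {
        for (int i = 0; i < 100000; i++) {    /* each thread bumps the shared counter */
            pthread_mutex_lock(&lock);
            counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %ld\n", counter);   /* 200000: both threads saw the same memory */
        return 0;
    }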

4
Implementation
  • Do we want threads, and if so, where should we
    implement them?

(Table: latency in µs on a Firefly system)
5
Advantages and problems of ULT
  • Advantages
  • Thread switching does not involve the kernel
    (see the sketch below)
  • Scheduling can be application specific: each
    application can choose the best algorithm
  • ULTs can run on any OS; only a thread library
    is needed
  • Disadvantages
  • Most system calls are blocking, and the kernel
    blocks at the process level, so all threads
    within the process are blocked
  • The kernel can only assign processes to
    processors, so two threads within the same
    process cannot run simultaneously on two
    processors
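
As a small illustration of the first advantage, the sketch below switches between two user-level contexts with the ucontext primitives found on most Unix systems. A real ULT library would wrap this in a scheduler and a thread API; the point here is only that the switch is a user-space save and restore, and the kernel still sees a single process.

    /* Sketch: a user-level context switch; no kernel thread is created */
    #include <stdio.h>
    #include <ucontext.h>

    static ucontext_t main_ctx, thr_ctx;
    static char thr_stack[64 * 1024];          /* stack for the user-level thread */

    static void thread_body(void)
    {
        printf("user-level thread running\n");
        swapcontext(&thr_ctx, &main_ctx);      /* yield back to main in user space */
    }

    int main(void)
    {
        getcontext(&thr_ctx);
        thr_ctx.uc_stack.ss_sp   = thr_stack;
        thr_ctx.uc_stack.ss_size = sizeof thr_stack;
        thr_ctx.uc_link          = &main_ctx;  /* continue here if the thread returns */
        makecontext(&thr_ctx, thread_body, 0);

        printf("main: switching to the thread\n");
        swapcontext(&main_ctx, &thr_ctx);      /* save main's context, run thread_body */
        printf("main: back again\n");
        return 0;
    }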

6
Advantages and disadvantages of KLT
  • Advantages
  • The kernel knows the processing environment and
    will assign threads accordingly
  • Blocking is done at the thread level
  • Kernel routines can be multithreaded
  • Disadvantages
  • Thread switching within the same process
    involves the kernel, costing two mode switches
    per thread switch
  • This results in a significant slowdown of
    thread switches within the same process

7
ULT with Scheduler Activations
  • Implement user level threads with the help of the
    kernel.
  • Gain the flexibility and performance of ULT
  • Keep the functionality of KLT without the
    overhead

8
ULT over KLT
  • The kernel operates without knowledge of the
    user-level threads
  • User threads are never notified of the kernel's
    scheduling decisions, since those are
    transparent to the user
  • The kernel schedules threads without regard to
    user thread priorities and memory locations

9
The Model
(Diagram: a user-level thread pool with two
schedulers; the kernel runs an instance of the
scheduler on each processor, P1 and P2.)
10
Kernel Support of ULT
  • The kernel has control of processor allocation
  • The ULT scheduler has control of which threads
    run on the allocated processors
  • The kernel notifies the ULT scheduler of any
    changes to its environment
  • The ULT scheduler can notify the kernel of its
    current processor needs

11
Scheduler Activations
  • Add this processor: run a thread here
  • Processor has been preempted: returns the state
    of the preempted thread; the scheduler can run
    another thread
  • Activation has blocked: the scheduler can run
    another thread here
  • Activation has unblocked: return the thread to
    the ready list (see the sketch below)
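
The four events above arrive at the user-level scheduler as upcalls. Below is a rough sketch of what such an interface could look like; the names and types are illustrative only, not the actual scheduler-activations API.

    /* Illustrative upcall interface for scheduler activations.        */
    /* All names here are hypothetical, not the real kernel interface. */
    #include <stdio.h>

    typedef int thread_id;

    enum upcall_kind {
        UPCALL_ADD_PROCESSOR,        /* new processor: run a ready thread here          */
        UPCALL_PROCESSOR_PREEMPTED,  /* preempted thread's state returned; run another  */
        UPCALL_ACTIVATION_BLOCKED,   /* an activation blocked in the kernel             */
        UPCALL_ACTIVATION_UNBLOCKED  /* a blocked thread is runnable again              */
    };

    /* Entry point of the user-level scheduler, invoked by the kernel. */
    static void upcall_handler(enum upcall_kind kind, thread_id t)
    {
        switch (kind) {
        case UPCALL_ADD_PROCESSOR:
        case UPCALL_ACTIVATION_BLOCKED:
            printf("pick a ready thread and run it on this processor\n");
            break;
        case UPCALL_PROCESSOR_PREEMPTED:
            printf("save thread %d, then run another ready thread\n", t);
            break;
        case UPCALL_ACTIVATION_UNBLOCKED:
            printf("put thread %d back on the ready list\n", t);
            break;
        }
    }

    int main(void)
    {
        upcall_handler(UPCALL_ADD_PROCESSOR, -1);        /* simulate two upcalls */
        upcall_handler(UPCALL_ACTIVATION_UNBLOCKED, 7);
        return 0;
    }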

12
How the kernel and scheduler work together
13
Hints to Kernel
  • Add more processors
  • This processor is idle

14
Critical Sections
  • Idea 1
  • On a CS conflict, give control back to the
    thread holding the lock
  • That thread gives control back after it is done
    with the CS
  • This was found to be too slow at detecting
    whether a thread was in a CS
  • It is also hard to make the thread give up
    control once the CS is done

15
Critical Sections (Cont.)
  • Idea 2
  • Make copies of the critical sections available
    to the scheduler
  • Compare the thread's PC against the CS copy to
    check whether it is holding a lock (see the
    sketch below)
  • The scheduler can run the copy of the CS, which
    returns sooner than before since the release of
    the lock is known to the scheduler
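
A conceptual sketch of that check: the scheduler compares the preempted thread's saved program counter against the address range of the critical-section copy. The addresses below are simulated values rather than real code labels, so this only illustrates the comparison, not a working scheduler.

    /* Sketch: was a preempted thread inside a critical section? */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    struct cs_region {            /* one copied critical section     */
        uintptr_t start, end;     /* [start, end) of the copy's code */
    };

    static bool preempted_in_cs(uintptr_t saved_pc, const struct cs_region *cs)
    {
        return saved_pc >= cs->start && saved_pc < cs->end;
    }

    int main(void)
    {
        struct cs_region cs = { 0x4000, 0x4040 };     /* pretend addresses */
        uintptr_t inside = 0x4010, outside = 0x5000;

        /* If the PC is inside, the scheduler lets the copy run up to the  */
        /* lock release before switching; otherwise it can switch at once. */
        printf("inside:  %d\n", preempted_in_cs(inside,  &cs));
        printf("outside: %d\n", preempted_in_cs(outside, &cs));
        return 0;
    }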

16
Results
17
Results 2
18
Threads Summary
  • The best solution to the threads problem lies
    somewhere between ULT and KLT
  • Both must cooperate for the best performance
  • Most of the control over thread management
    should stay at user level, since the kernel is
    far removed from the threads

19
Remote Procedure Calls
  • A technique for constructing distributed systems
  • Lets the user program with no knowledge of the
    transport system
  • The called procedure can be located anywhere
  • Imposes a strong client/server model of
    computing (see the stub sketch below)
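
A toy sketch of the stub idea: the caller sees an ordinary function call while the stubs marshal the arguments into a message and unmarshal the reply. Everything below runs in one process purely for illustration; a real RPC system would ship the message over a transport and generate the stubs from an interface definition.

    /* Toy RPC stubs: marshal args into a message, dispatch, return the reply */
    #include <stdio.h>
    #include <string.h>

    struct message { int op; int a, b; int result; };

    static int add_service(int a, int b) { return a + b; }   /* the "remote" procedure */

    static void server_dispatch(struct message *m)           /* server-side stub       */
    {
        if (m->op == 1)
            m->result = add_service(m->a, m->b);
    }

    static int rpc_add(int a, int b)                          /* client-side stub       */
    {
        struct message m;
        memset(&m, 0, sizeof m);
        m.op = 1; m.a = a; m.b = b;      /* marshal arguments into the message  */
        server_dispatch(&m);             /* stand-in for the network round trip */
        return m.result;                 /* unmarshal the reply                 */
    }

    int main(void)
    {
        printf("rpc_add(2, 3) = %d\n", rpc_add(2, 3));  /* caller sees a plain call */
        return 0;
    }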

20
Problems with RPC
  • Adds a large amount of overhead
  • Extra protection checks in every call
  • All calls trap to the OS
  • Must wait for a response from the other system
  • All calls are treated the same as the worst case

21
Ways to improve
  • 95% of all RPCs are to the local domain
  • Optimize the most frequently taken path
  • Reduce the number of system boundaries that an
    RPC crosses

22
Anatomy of a remote RPC
(Diagram: path of a remote RPC across the client and
server user/kernel boundaries)
  • Client (user): callRPC()
  • Client kernel: protection checks, message
    transfer
  • Server kernel: protection checks, interpret and
    dispatch, schedule a server thread
  • Server (user): run service, reply
  • Server kernel: message transfer back
  • Client kernel: wake up the calling thread,
    reschedule
23
Lightweight RPC (LRPC)
  • Create new routines for cross-domain calls
  • Use conventional RPC-like calls for
    cross-machine calls
  • Blur the client/server distinction in the new
    calls
  • Reduce the number of copies of variables into
    messages and stacks by maintaining argument
    stacks dedicated to individual calls (see the
    sketch below)
  • Eliminate the need to schedule threads on RPC
    receipt at the server, because the processor can
    be instructed to simply switch between the
    calling and called threads
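
A rough sketch of the dedicated argument stack: the client writes its arguments once into a stack the server can also see, so no extra copies into messages are needed and no server thread has to be scheduled. The names are hypothetical; in real LRPC the argument stack is mapped into both domains and the kernel switches the calling thread into the server's domain.

    /* Sketch: an LRPC-style shared argument stack (hypothetical names) */
    #include <stdio.h>

    struct astack { int a, b; int result; };    /* pre-allocated, one per call */

    static void server_procedure(struct astack *s)   /* runs in the server domain */
    {
        s->result = s->a * s->b;    /* reads args and writes the result in place */
    }

    static int lrpc_call(struct astack *s, int a, int b)
    {
        s->a = a;                   /* client writes arguments directly into */
        s->b = b;                   /* the shared stack: a single copy       */
        /* Here the kernel would check protection and switch the calling thread  */
        /* into the server's domain; in this sketch it is a plain function call. */
        server_procedure(s);
        return s->result;           /* the result comes back on the same stack */
    }

    int main(void)
    {
        struct astack s;
        printf("lrpc_call(6, 7) = %d\n", lrpc_call(&s, 6, 7));
        return 0;
    }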

24
Anatomy of a local LRPC
(Diagram: path of a local LRPC within one machine)
  • Client (user): callRPC(), copy arguments to the
    dedicated stack
  • Kernel: protection checks; there is no need to
    schedule threads here, the scheduler can be told
    to just switch the two threads
  • Server (user): run service, copy results to the
    stack, reply
  • Client (user): resume
25
Multiprocessors
  • Whole processor contexts can be cached on idle
    processors
  • Instead of context switching the local processor
    for a cross-domain call, run the procedure on
    the processor with the cached context
  • This saves TLB misses and other costs such as
    switching virtual memory mappings

26
Results
27
LRPC Conclusions
  • RPCs can be improved over the general-case
    implementation
  • The common case should be emphasized, not the
    most general case
  • Many unnecessary steps can be removed when
    optimizing for cross-domain calls