1
Fast Communication and User Level Parallelism
  • Howard Marron

2
Introduction
  • We have studied systems that have attempted to
    build transparent layers below the application
    that created properties like replication and
    group communication.
  • We will look at some areas where more control
    has been given to the user on parallelism

3
Threads
  • Allow finer granularity within programs for
    better parallelism and performance
  • Have lower overhead than processes
  • The same program runs on a uniprocessor or a
    multiprocessor with little or no modification
  • Threads in the same process can communicate
    easily since they share the same address space
    (see the sketch below)
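
The last two points can be made concrete with a small sketch. POSIX threads are assumed here purely for illustration (the slides do not name a particular thread package): two threads in one process update a shared counter through the same address space.

    /* Sketch: two threads sharing one address space (POSIX threads assumed) */
    #include <pthread.h>
    #include <stdio.h>

    static long counter = 0;                  /* shared: lives in the common address space */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg)
    {
        for (int i = 0; i < 100000; i++) {    /* each thread bumps the shared counter */
            pthread_mutex_lock(&lock);
            counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %ld\n", counter);   /* 200000: both threads saw the same memory */
        return 0;
    }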

4
Implementation
  • Do we want threads, and if so, where should we
    implement them?

(Table: latency in µs on a Firefly system)
5
Advantages and problems of ULT
  • Advantages
  • Thread switching does not involve the kernel
    (see the sketch below)
  • Scheduling can be application specific: each
    application can choose the best algorithm
  • ULTs can run on any OS; only a thread library
    is needed
  • Disadvantages
  • Most system calls are blocking, and the kernel
    blocks at the process level, so all threads
    within the process are blocked
  • The kernel can only assign processes to
    processors, so two threads within the same
    process cannot run simultaneously on two
    processors
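
As a small illustration of the first advantage, the sketch below switches between two user-level contexts with the ucontext primitives found on most Unix systems. A real ULT library would wrap this in a scheduler and a thread API; the point here is only that the switch is a user-space save and restore, and the kernel still sees a single process.

    /* Sketch: a user-level context switch; no kernel thread is created */
    #include <stdio.h>
    #include <ucontext.h>

    static ucontext_t main_ctx, thr_ctx;
    static char thr_stack[64 * 1024];          /* stack for the user-level thread */

    static void thread_body(void)
    {
        printf("user-level thread running\n");
        swapcontext(&thr_ctx, &main_ctx);      /* yield back to main in user space */
    }

    int main(void)
    {
        getcontext(&thr_ctx);
        thr_ctx.uc_stack.ss_sp   = thr_stack;
        thr_ctx.uc_stack.ss_size = sizeof thr_stack;
        thr_ctx.uc_link          = &main_ctx;  /* continue here if the thread returns */
        makecontext(&thr_ctx, thread_body, 0);

        printf("main: switching to the thread\n");
        swapcontext(&main_ctx, &thr_ctx);      /* save main's context, run thread_body */
        printf("main: back again\n");
        return 0;
    }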

6
Advantages and disadvantages of KLT
  • Advantages
  • The kernel knows the processing environment and
    will assign threads accordingly
  • Blocking is done at the thread level
  • Kernel routines can be multithreaded
  • Disadvantages
  • Thread switching within the same process
    involves the kernel, costing two mode switches
    per thread switch
  • This results in a significant slowdown of
    thread switches within the same process

7
ULT with Scheduler Activations
  • Implement user level threads with the help of the
    kernel.
  • Gain the flexibility and performance of ULT
  • Keep the functionality of KLT without the
    overhead

8
ULT over KLT
  • The kernel operates without knowledge of the
    user-level threads
  • User threads are never notified of the kernel's
    scheduling decisions, since those are
    transparent to the user
  • The kernel schedules threads without regard to
    user thread priorities and memory locations

9
The Model
(Diagram: a user-level thread pool with two
schedulers; the kernel runs an instance of the
scheduler on each processor, P1 and P2.)
10
Kernel Support of ULT
  • The kernel has control of processor allocation
  • The ULT scheduler has control of which threads
    run on the allocated processors
  • The kernel notifies the ULT scheduler of any
    changes to its environment
  • The ULT scheduler can notify the kernel of its
    current processor needs

11
Scheduler Activations
  • Add this processor: run a thread here
  • Processor has been preempted: returns the state
    of the preempted thread; the scheduler can run
    another thread
  • Activation has blocked: the scheduler can run
    another thread here
  • Activation has unblocked: return the thread to
    the ready list (see the sketch below)
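
The four events above arrive at the user-level scheduler as upcalls. Below is a rough sketch of what such an interface could look like; the names and types are illustrative only, not the actual scheduler-activations API.

    /* Illustrative upcall interface for scheduler activations.        */
    /* All names here are hypothetical, not the real kernel interface. */
    #include <stdio.h>

    typedef int thread_id;

    enum upcall_kind {
        UPCALL_ADD_PROCESSOR,        /* new processor: run a ready thread here          */
        UPCALL_PROCESSOR_PREEMPTED,  /* preempted thread's state returned; run another  */
        UPCALL_ACTIVATION_BLOCKED,   /* an activation blocked in the kernel             */
        UPCALL_ACTIVATION_UNBLOCKED  /* a blocked thread is runnable again              */
    };

    /* Entry point of the user-level scheduler, invoked by the kernel. */
    static void upcall_handler(enum upcall_kind kind, thread_id t)
    {
        switch (kind) {
        case UPCALL_ADD_PROCESSOR:
        case UPCALL_ACTIVATION_BLOCKED:
            printf("pick a ready thread and run it on this processor\n");
            break;
        case UPCALL_PROCESSOR_PREEMPTED:
            printf("save thread %d, then run another ready thread\n", t);
            break;
        case UPCALL_ACTIVATION_UNBLOCKED:
            printf("put thread %d back on the ready list\n", t);
            break;
        }
    }

    int main(void)
    {
        upcall_handler(UPCALL_ADD_PROCESSOR, -1);        /* simulate two upcalls */
        upcall_handler(UPCALL_ACTIVATION_UNBLOCKED, 7);
        return 0;
    }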

12
How the kernel and scheduler work together
13
Hints to Kernel
  • Add more processors
  • This processor is idle

14
Critical Sections
  • Idea 1
  • On a CS conflict, give control back to the
    thread holding the lock
  • That thread gives control back after it is done
    with the CS
  • This was found to be too slow at detecting
    whether a thread was in a CS
  • It is also hard to make the thread give up
    control once the CS is done

15
Critical Sections (Cont.)
  • Idea 2
  • Make copies of the critical sections available
    to the scheduler
  • Compare the thread's PC against the CS copy to
    check whether it is holding a lock (see the
    sketch below)
  • The scheduler can run the copy of the CS, which
    returns sooner than before since the release of
    the lock is known to the scheduler
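
A conceptual sketch of that check: the scheduler compares the preempted thread's saved program counter against the address range of the critical-section copy. The addresses below are simulated values rather than real code labels, so this only illustrates the comparison, not a working scheduler.

    /* Sketch: was a preempted thread inside a critical section? */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    struct cs_region {            /* one copied critical section     */
        uintptr_t start, end;     /* [start, end) of the copy's code */
    };

    static bool preempted_in_cs(uintptr_t saved_pc, const struct cs_region *cs)
    {
        return saved_pc >= cs->start && saved_pc < cs->end;
    }

    int main(void)
    {
        struct cs_region cs = { 0x4000, 0x4040 };     /* pretend addresses */
        uintptr_t inside = 0x4010, outside = 0x5000;

        /* If the PC is inside, the scheduler lets the copy run up to the  */
        /* lock release before switching; otherwise it can switch at once. */
        printf("inside:  %d\n", preempted_in_cs(inside,  &cs));
        printf("outside: %d\n", preempted_in_cs(outside, &cs));
        return 0;
    }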

16
Results
17
Results 2
18
Threads Summary
  • The best solution to the threads problem lies
    somewhere between ULT and KLT
  • Both must cooperate for the best performance
  • Most of the control over thread management
    should stay at user level, since the kernel is
    far removed from the threads

19
Remote Procedure Calls
  • A technique for constructing distributed systems
  • Lets the user program with no knowledge of the
    transport system
  • The called procedure can be located anywhere
  • Imposes a strong client/server model of
    computing (see the stub sketch below)
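
A toy sketch of the stub idea: the caller sees an ordinary function call while the stubs marshal the arguments into a message and unmarshal the reply. Everything below runs in one process purely for illustration; a real RPC system would ship the message over a transport and generate the stubs from an interface definition.

    /* Toy RPC stubs: marshal args into a message, dispatch, return the reply */
    #include <stdio.h>
    #include <string.h>

    struct message { int op; int a, b; int result; };

    static int add_service(int a, int b) { return a + b; }   /* the "remote" procedure */

    static void server_dispatch(struct message *m)           /* server-side stub       */
    {
        if (m->op == 1)
            m->result = add_service(m->a, m->b);
    }

    static int rpc_add(int a, int b)                          /* client-side stub       */
    {
        struct message m;
        memset(&m, 0, sizeof m);
        m.op = 1; m.a = a; m.b = b;      /* marshal arguments into the message  */
        server_dispatch(&m);             /* stand-in for the network round trip */
        return m.result;                 /* unmarshal the reply                 */
    }

    int main(void)
    {
        printf("rpc_add(2, 3) = %d\n", rpc_add(2, 3));  /* caller sees a plain call */
        return 0;
    }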

20
Problems with RPC
  • Adds a large amount of overhead
  • Extra protection checks in every call
  • All calls trap to the OS
  • Must wait for a response from the other system
  • All calls are treated the same as the worst case

21
Ways to improve
  • 95% of all RPCs are to the local domain
  • Optimize the most frequently taken path
  • Reduce the number of system boundaries that an
    RPC crosses

22
Anatomy of a remote RPC
(Diagram: path of a remote RPC across the client and
server user/kernel boundaries)
  • Client (user): callRPC()
  • Client kernel: protection checks, message
    transfer
  • Server kernel: protection checks, interpret and
    dispatch, schedule a server thread
  • Server (user): run service, reply
  • Server kernel: message transfer back
  • Client kernel: wake up the calling thread,
    reschedule
23
Lightweight RPC (LRPC)
  • Create new routines for cross-domain calls
  • Use conventional RPC-like calls for
    cross-machine calls
  • Blur the client/server distinction in the new
    calls
  • Reduce the number of copies of variables into
    messages and stacks by maintaining argument
    stacks dedicated to individual calls (see the
    sketch below)
  • Eliminate the need to schedule threads on RPC
    receipt at the server, because the processor can
    be instructed to simply switch between the
    calling and called threads
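
A rough sketch of the dedicated argument stack: the client writes its arguments once into a stack the server can also see, so no extra copies into messages are needed and no server thread has to be scheduled. The names are hypothetical; in real LRPC the argument stack is mapped into both domains and the kernel switches the calling thread into the server's domain.

    /* Sketch: an LRPC-style shared argument stack (hypothetical names) */
    #include <stdio.h>

    struct astack { int a, b; int result; };    /* pre-allocated, one per call */

    static void server_procedure(struct astack *s)   /* runs in the server domain */
    {
        s->result = s->a * s->b;    /* reads args and writes the result in place */
    }

    static int lrpc_call(struct astack *s, int a, int b)
    {
        s->a = a;                   /* client writes arguments directly into */
        s->b = b;                   /* the shared stack: a single copy       */
        /* Here the kernel would check protection and switch the calling thread  */
        /* into the server's domain; in this sketch it is a plain function call. */
        server_procedure(s);
        return s->result;           /* the result comes back on the same stack */
    }

    int main(void)
    {
        struct astack s;
        printf("lrpc_call(6, 7) = %d\n", lrpc_call(&s, 6, 7));
        return 0;
    }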

24
Anatomy of a local LRPC
(Diagram: path of a local LRPC within one machine)
  • Client (user): callRPC(), copy arguments to the
    dedicated stack
  • Kernel: protection checks; there is no need to
    schedule threads here, the scheduler can be told
    to just switch the two threads
  • Server (user): run service, copy results to the
    stack, reply
  • Client (user): resume
25
Multiprocessors
  • Whole processor contexts can be cached on idle
    processors
  • Instead of context switching the local processor
    for a cross-domain call, run the procedure on
    the processor with the cached context
  • This saves TLB misses and other costs such as
    switching virtual memory mappings

26
Results
27
LRPC Conclusions
  • RPCs can be improved over the general-case
    implementation
  • The common case should be emphasized, not the
    most general case
  • Many unnecessary steps can be removed when
    optimizing for cross-domain calls