Title: Outline of Lecture 10
1 Outline of Lecture 10
- Cache Replacement in Multithreaded Architectures
- Page Replacement
- Stack Replacement Algorithms and Their Properties
2 Context Selection in Multithreaded Architectures
[Diagram: context selection feeding the functional units, with memory, the interconnection network, and I/O]
3 Context Selection, States
[Diagram: software/hardware view of context selection; the hardware controls each context's registers, status word, and program counter, and context selection chooses among threads in the Running, Waiting, and Ready states]
4 Threads
[Diagram: processor with register frames, a context pointer (CP), and a program counter and status register per loaded thread; memory holds a ready queue and a suspended queue; a loaded thread may be ready or suspended, while unloaded ready and unloaded suspended threads wait in memory]
5 Context Switching
[Timing diagram: one-thread, two-thread, and three-thread execution along a time axis, with context-switching points marked]
6 Processor Efficiency
[Plot: processor efficiency (up to 1.0) versus the number of contexts, rising through a linear region for few contexts and leveling off in a saturation region as more contexts are added]
7 Efficiency Saturation Point
- Factors influencing the saturation point: context switching time ts, cache loading time tc, and cache miss probability pm.
- The saturation point happens at 1 + tc/(ts + 1/pm) contexts (a numeric sketch follows below).
- For small ts this is about 1 + tc·pm.
- A large cache line yields a small pm.
- A large cache line or slow memory results in a large tc.
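As a quick check of the formula above, here is a minimal Python sketch; the function and parameter names are illustrative, not from the lecture:

```python
def saturation_contexts(t_c, t_s, p_m):
    """Number of contexts at which efficiency saturates: 1 + tc/(ts + 1/pm)."""
    return 1 + t_c / (t_s + 1.0 / p_m)

# Example: 20-cycle cache load, 1-cycle context switch, 10% miss probability.
print(saturation_contexts(t_c=20, t_s=1, p_m=0.10))   # about 2.8 contexts
print(1 + 20 * 0.10)                                   # small-ts approximation: 3.0
```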
8 Context Switching Analysis
[Timing diagram: in one-thread execution each gray block lasts 1/pm instructions and each gap lasts tc cycles; in two-thread execution (s = 2) the blocks of length ts + 1/pm leave a gap of tc - ts - 1/pm; in general, with ts = 1, the remaining gap is tc - (s - 1)(1/pm + 1)]
- Each gray block lasts 1/pm instructions and accesses 1/pm - 1 data items. There are also s register sets, so s threads are ready to execute with a single-instruction context switch.
- We assume that ts = 1, i.e., a context switch costs one instruction, so each of the s threads lasts 1/pm instructions and accesses 1/pm - 1 = (1 - pm)/pm data items. Two cases need to be considered:
- If s(1/pm + 1) < tc + 1/pm (the gap is not filled), then in time tc + 1/pm the system accesses s(1 - pm)/pm data items, so the ADDT is (tc + 1/pm) / (s(1 - pm)/pm) = (1 + tc·pm)/(s - s·pm).
- Otherwise, the gap is filled, so in time 1 + 1/pm the system executes 1/pm - 1 data accesses, and the ADDT is (1/pm + 1)/(1/pm - 1) = (1 + pm)/(1 - pm).
- To combine these two formulas, we notice that the condition s(1/pm + 1) < tc + 1/pm means that s(1 + pm) < 1 + tc·pm, so (1 + tc·pm)/s > 1 + pm and (1 + tc·pm)/(s - s·pm) > (1 + pm)/(1 - pm). But then the following expression gives the correct value (cross-checked in the sketch below): max((1 + tc·pm)/(s - s·pm), (1 + pm)/(1 - pm)) = max((1 + tc·pm)/s, 1 + pm)/(1 - pm).
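The two-case formula and the combined max() form can be compared with a short Python sketch (assuming ts = 1 as above; names are illustrative):

```python
def addt(s, t_c, p_m):
    """ADDT (elapsed time divided by data accesses) with s contexts, cache
    load time t_c, miss probability p_m, and t_s = 1, using the two cases."""
    if s * (1.0 / p_m + 1) < t_c + 1.0 / p_m:         # gap is not filled
        return (t_c + 1.0 / p_m) / (s * (1 - p_m) / p_m)
    return (1.0 / p_m + 1) / (1.0 / p_m - 1)           # gap is filled

def addt_combined(s, t_c, p_m):
    """Single combined formula: max((1 + tc*pm)/s, 1 + pm) / (1 - pm)."""
    return max((1 + t_c * p_m) / s, 1 + p_m) / (1 - p_m)

# The two forms agree, e.g. for tc = 20 cycles and pm = 0.1:
for s in (1, 2, 3, 4):
    print(s, round(addt(s, 20, 0.1), 3), round(addt_combined(s, 20, 0.1), 3))
```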
9 Performance in Virtual Memory
- The performance of a virtual memory management system depends on the total number of page faults, which depends on:
- The paging policies, including frame allocation
  - Static allocation: the number of frames allocated to a process is fixed
  - Dynamic allocation: the number of frames allocated to a process changes
- The frame allocation policies
10 Page Replacement
- When there is a page fault, the referenced page must be loaded.
- If there is no available frame in memory, one page is selected for replacement.
- If the selected page has been modified, it must be copied back to disk (swapped out). This mechanism is sketched in the code below.
- The same pages may be referenced several times, so for good performance a good replacement algorithm will strive to cause the minimum number of page faults.
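A minimal sketch of the replacement steps above, assuming a dict-backed "disk", a dirty set, and a caller-supplied victim policy (all illustrative placeholders, not a particular operating system's interfaces):

```python
def service_page_fault(page, frames, m, dirty, select_victim, disk):
    """Illustrative page-fault service for demand paging: if memory is full,
    evict a victim (writing it back first when it is dirty), then swap the
    referenced page in."""
    if page in frames:
        return                                    # not a fault
    if len(frames) == m:                          # no free frame available
        victim = select_victim(frames)
        if victim in dirty:                       # modified page: swap out first
            disk[victim] = frames[victim]
            dirty.discard(victim)
        del frames[victim]
    frames[page] = disk.get(page, b"")            # swap the referenced page in

# Toy usage: FIFO-like victim choice (oldest entry in the frame table).
disk, frames, dirty = {p: b"data" for p in range(10)}, {}, set()
for p in [3, 6, 2, 1, 4]:
    service_page_fault(p, frames, 3, dirty, lambda fr: next(iter(fr)), disk)
print(list(frames))                               # [2, 1, 4]
```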
11 Paging Policies
- Fetch policy -- decides when a page should be loaded into memory -> demand paging
- Replacement policy -- decides which page in memory should be replaced -> difficult
- Placement policy -- decides where in memory a page should be loaded -> easy for paging
12 Page Faults and Performance Issues
- A page fault requires the operating system to carry out the page fault service. The total time it takes to service a page fault includes several time components:
- The time interval to service the page fault interrupt (system)
- The time interval to store back (swap out) the replaced page to the secondary storage device (process/cleaning)
- The time interval to load (swap in) the referenced page from the secondary storage device (disk unit) (process)
- Delay in queuing for the secondary storage device (process)
- Delay in scheduling the process with the referenced page (process)
13 Demand Paging
- In demand paging, a page fault occurs when a reference is made to a page not in memory. The page fault may occur while
page fault may occur while - fetching an instruction, or
- fetching an operand of an instruction.
14 Problems to be Solved within Demand Paging
- Two major problems must be solved to implement demand paging
- Each process needs a minimum number of frames. This minimum number is based on the machine architecture.
- 1. Frame allocation - decide how many frames to allocate to each process, usually needed only for loading the process to the memory initially.
- 2. Page replacement - select which pages are to be replaced when a page fault occurs.
15 Page Replacement Algorithms
- A paging system may be characterized by 3 items:
- The reference string
- The page replacement algorithm
- The number of page frames available in memory, m
- A page replacement algorithm is said to satisfy the inclusion property, or is called a stack algorithm, if the set of pages in a k-frame memory is always a subset of the pages in a (k+1)-frame memory.
16 Page Reference
- A page reference string is a sequence of page numbers in order of reference.
- An example of a sequence of page references is
- < 3, 6, 2, 1, 4, 7, 3, 5, 8, 9, 2, 8, 10, 7 >
- The sequence of page references represents the behavior of a process during execution.
- Every process generates a sequence of memory references as it runs.
- Each memory reference corresponds to a specific virtual page.
- A process's memory accesses may be characterized by an ordered list of page numbers, referred to as the reference string.
17 Reference String
- w = r1 r2 ... rT-1 rT is the sequence of virtual page references
- M0 - initial memory state
- M0, M1, ..., MT - real memory states
- Mt under the request for page rt is Mt = Mt-1 ∪ Xt - Yt
- where Xt = pages brought in,
- and Yt = pages moved out in the t-th step
- For demand-driven fetching, Xt = {rt} if rt is not in Mt-1, and Xt is empty otherwise (see the sketch below).
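A minimal sketch of this state update under demand fetching; the one-page eviction policy (choose_Y) is an illustrative placeholder:

```python
def next_state(M_prev, r, m, choose_Y):
    """One step of Mt = Mt-1 + Xt - Yt under demand fetching: Xt = {rt} only
    on a page fault, and Yt is one page picked by the replacement policy
    when all m frames are already occupied."""
    X = {r} - M_prev                                  # demand: fetch only the missing page
    Y = choose_Y(M_prev) if X and len(M_prev) == m else set()
    return (M_prev | X) - Y

M = set()                                             # M0: initial (empty) memory state
for r in [3, 6, 2, 1, 4]:
    M = next_state(M, r, m=3, choose_Y=lambda S: {min(S)})  # toy policy: evict lowest page id
print(M)                                              # e.g. {3, 4, 6}
```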
18 Cost of Fetching
- f(k) - cost of fetching k pages,
- f(1) = 1 (one seek plus one transfer, tseek + ttransfer, taken as the unit)
- f(0) = 0
- f(k+1) > f(k). The cost is C(m,w) = sum over t = 1, ..., T of f(|Xt|); for demand replacement, with f(1) = 1, it simplifies to C(m,w) = p × f(1) = p, where p denotes the number of page faults in 1...T (see the sketch below).
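A minimal sketch of this cost under demand paging, counting the faults p on a reference string; FIFO is used here only as an example policy:

```python
from collections import deque

def demand_cost_fifo(w, m, f1=1.0):
    """C(m, w) = p * f(1) under demand paging: simulate reference string w
    in m frames with FIFO replacement and count the page faults p."""
    frames, order, p = set(), deque(), 0
    for page in w:
        if page not in frames:
            p += 1                        # page fault: one fetch of cost f(1)
            if len(frames) == m:          # no free frame: evict the oldest page
                frames.discard(order.popleft())
            frames.add(page)
            order.append(page)
    return p * f1

# Reference string from the earlier slide, m = 3 frames:
print(demand_cost_fifo([3, 6, 2, 1, 4, 7, 3, 5, 8, 9, 2, 8, 10, 7], m=3))  # 13 faults -> 13.0
```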
19 Demand Policy Optimality
- For electronic auxiliary memory (an electronic disk), f(k) = k × f(1) = k (ignoring the costs of the page fault interrupt). When f(k) = k, there is a demand replacement algorithm with a cost function that does at least as well for all memory sizes and reference strings as any other algorithm. A nice result, but not very useful when disks are used!
- Conclusion: pre-fetching helps by bringing in more than one page per page fault, but it is difficult to predict which pages to pre-fetch!
20 Replacement Policies
- OPT (Optimal) - remove the page with the next reference most distant in the future,
- FIFO (First In First Out) - circulating pointer, often inefficient,
- LIFO (Last In First Out) - special situations (sequential access),
- LFU (Least Frequently Used) - good performance,
- MRU (Most Recently Used) - not the same as Last In First Out; e.g., consider the string 1 2 3 1 4 with m = 3: after 1 2 3 1, MRU will replace 1 while LIFO replaces 3 (simulated in the sketch below).
- Example: w = (1 2 3)^r (4 5 6)^s, m = 3.
- LFU: 3 × (1 + min(r,s)) page faults; MRU: 3 × (1 + s) page faults.
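A minimal simulator illustrating the MRU vs. LIFO difference on the string above (function names and the metadata layout are illustrative):

```python
def simulate(refs, m, choose_victim):
    """Tiny demand-paging simulator; choose_victim(frames, meta) returns the
    page to evict, using per-page last-use and load times kept in meta."""
    frames, meta, faults = [], {}, 0
    for t, page in enumerate(refs):
        if page in frames:
            meta[page]["last_use"] = t
            continue
        faults += 1
        if len(frames) == m:
            frames.remove(choose_victim(frames, meta))
        frames.append(page)
        meta[page] = {"last_use": t, "loaded": t}
    return frames, faults

mru  = lambda frames, meta: max(frames, key=lambda p: meta[p]["last_use"])
lifo = lambda frames, meta: max(frames, key=lambda p: meta[p]["loaded"])

refs = [1, 2, 3, 1, 4]
print(simulate(refs, 3, mru))    # MRU evicts page 1  -> ([2, 3, 4], 4)
print(simulate(refs, 3, lifo))   # LIFO evicts page 3 -> ([1, 2, 4], 4)
```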
21 Stack Algorithm Definition
- M(m,w) is the state of real memory after referencing string w in m frames, where M(m, 0) = ∅.
- Inclusion property characterizing stack algorithms:
Given w, there is a permutation of the virtual pages labeled 1, 2, ..., n, called the stack, S(w) = (s1(w), ..., sn(w)), such that M(m,w) = {s1(w), ..., sm(w)}. For a sequence of references, there is a sequence of stacks. dp(w) is the distance of page p from the top of the stack after processing string w (see the sketch below).
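A minimal illustration of the inclusion property using LRU, the classic stack algorithm (LRU is not on the policy list above; it is used here only because its stack is simply the recency order of the pages):

```python
def lru_stack(w):
    """LRU is a stack algorithm: after each reference the stack is the pages
    in recency order, and M(m, w) is exactly the top m stack entries."""
    stack = []
    for page in w:
        if page in stack:
            stack.remove(page)
        stack.insert(0, page)             # referenced page moves to the top
    return stack

stack = lru_stack([3, 6, 2, 1, 4, 7, 3, 5])
print(stack)                              # [5, 3, 7, 4, 1, 2, 6]
# Inclusion property: M(m, w) is the top m entries, so M(m, w) is a subset
# of M(m+1, w) for every m.
for m in range(1, len(stack)):
    assert set(stack[:m]) <= set(stack[:m + 1])
```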
22 Stack Updating via Priority
- Consider a stack-updating procedure through a priority list with the following properties:
- 1. Priority is independent of the number of frames m.
- 2. The currently referenced page has the highest priority.
- 3. The resident page with the lowest priority is moved up in the stack (from real memory) only when necessary and only to its priority level.
- With these three properties, the class of priority algorithms is the same as the class of stack algorithms, since only removal can break the stack property, and if M is the set of pages in m frames and y is the page in the (m+1)st frame, then (writing min for the lowest-priority page):
Selecting a victim with m+1 frames gives min(min(M), y); selecting a victim with m frames gives min(M). Although min(min(M), y) may be y, the stack is the same with m and m+1 frames, because in such a case the (m+1)st frame will contain min(M). A small sketch of this victim selection follows below.
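A minimal sketch of this victim selection; the priority values are hypothetical (e.g., for LRU the priority would be the recency of use):

```python
def victim(resident, priority):
    """The lowest-priority resident page (the 'min' in the slide's notation)."""
    return min(resident, key=priority)

# Hypothetical priorities: a larger number means a higher priority.
prio = {1: 5, 2: 3, 3: 8, 4: 1}.get

M = {1, 2, 3}                 # pages held in the m-frame memory (top m of the stack)
y = 4                         # the page held only in the (m+1)st frame

print(victim(M, prio))        # victim with m frames:   min(M)         -> 2
print(victim(M | {y}, prio))  # victim with m+1 frames: min(min(M), y) -> 4
# When the (m+1)-frame victim is y, the two stacks still agree: min(M) simply
# moves down to the (m+1)st stack position, as noted on the slide.
```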