Title: CS 2200 Lecture 27 Review
1. CS 2200 Lecture 27: Review
- (Lectures based on the work of Jay Brockman, Sharon Hu, Randy Katz, Peter Kogge, Bill Leahy, Ken MacKenzie, Richard Murphy, and Michael Niemier)
2. Logistics: Grade Breakdown
- Course highlights
- Test 1 and Test 2: 2/5/04 and 3/25/04
- Final exam TBD, please consult OSCAR
- Grade breakdown
- Homeworks: 10%
- Projects: 30%
- Test 1: 20%
- Test 2: 20%
- Final Exam: 20%
- (FYI: the grade breakdown is essentially the same for both sections)
- (Tests, HWs, projects, etc. will also be the same)
- Course website
- http://www.cc.gatech.edu/classes/AY2004/cs2200_spring
3. Logistics: Grade Cutoffs
- Grade cutoffs are calculated as follows
- (sigma is the standard deviation)
- A: average + sigma
- B: average (i.e. above average is a B)
- C: average - sigma
- D: average - 2 sigma
4. Instruction Set Architectures
- [Figure: layering diagram. High-level languages (C, Fortran, Ada, Basic, etc.) go through a compiler to assembly language, which an assembler turns into an executable; Java compiles to byte code, which an interpreter runs. One ISA can be realized by many hardware implementations (HW Implementation 1 through N).]
5. Pros and cons for each ISA type
6. ISAs: What about memory addresses?
- Usually instruction sets are byte addressed
- Provide access for bytes (8 bits), half-words (16 bits), words (32 bits), double words (64 bits)
- Two different ordering types: big endian and little endian
- Little endian: puts the byte with address xx...x00 at the least significant position in the word
- Big endian: puts the byte with address xx...x00 at the most significant position in the word
- [Figure: a 32-bit word with bit positions 31, 23, 15, 7, 0 marked for each ordering]
7. ISAs: Endianness
- No, we're not making this up.
- At word address 100 (assume a 4-byte word):
- long a = 0x11223344
- Big-endian (MSB at word address) layout:
  address:   100 101 102 103
  contents:   11  22  33  44
- Little-endian (LSB at word address) layout:
  address:   103 102 101 100
  contents:   11  22  33  44
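A quick way to see this on a real machine (a minimal sketch, not from the original slides; the sample output assumes a little-endian host such as x86):

  #include <stdio.h>

  int main(void) {
      unsigned int a = 0x11223344;           /* the word from the slide */
      unsigned char *p = (unsigned char *)&a;

      /* Print the byte stored at each successive address. A
         little-endian machine prints 44 33 22 11; a big-endian
         machine prints 11 22 33 44. */
      for (int i = 0; i < 4; i++)
          printf("byte %d: %02x\n", i, p[i]);
      return 0;
  }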
8. ISAs: Procedures and Assembly Language
- Software conventions (Why?)
- Reserve some number of registers for parameters, return values, and the return address
- e.g. LC2200
- 5 for params, 1 for return values, one for the return address
- JALR <proc-addr in reg>, ra  (ra is the return address)
- JALR ra, zero  (Where does this go?)
- What if we have more params or return values?
- Common use: stack/memory
- Registers used in procedures
- Temporary registers
- Caller does not expect the value to be preserved upon return
- LC2200: a0 to a4
- Saved registers
- Caller does expect the value to be preserved on return
- LC2200: s0 to s3 (simplifies the amount of state to be saved)
9. ISAs: Example: LC2200 Registers
- Recall the LC2200 register table [figure not reproduced here]
10. ISAs: Caller/Callee Mechanics
- Who does what, when?
  foo() { int temp = 3; bar(42); ... }
  bar(int a) { ...; return(temp + a); }
- 1. caller at call time
- 2. callee at entry
- 3. callee at exit
- 4. caller after return
11. ISAs: Caller/Callee Conventions
- Convention: do most of the work at callee entry/exit
- Caller at call time
- put arguments in a0..a4
- save any caller-save temporaries
- jalr ..., ra
- Callee at entry (most of the work)
- allocate all stack space
- save ra, s0..s3 if necessary
- Callee at exit (most of the work)
- restore ra, s0..s3 if used
- deallocate all stack space
- put return value in v0
- Caller after return
- retrieve return value from v0
- restore any caller-save temporaries
12. ISAs: Instruction Formatting
- Human readable
- add s0, s1, a2
- Machine readable (fields shown as hex digits)
- 0 | 9 | 5 | 00000 | b  (base 16)
- 0000 1001 0101 0000 0000 0000 0000 1011
13. Metrics: So, how do we compare?
- Best to stick with execution time! (more later)
- If we say X is faster than Y, we mean the execution time is lower on X than on Y.
- Alternatively: "X is n times faster than Y" means
- n = Execution time_Y / Execution time_X
- Since performance = 1 / execution time, this is the same as
- n = Performance_X / Performance_Y
- e.g. if X effectively runs at 50 MHz and Y at 200 MHz: Performance_X / Performance_Y = 50/200 = 1/4, therefore X is 4 times slower than Y
14. Metrics: Amdahl's Law
- Quantifies performance gain
- Amdahl's Law defined
- The performance improvement to be gained from using some faster mode of execution is limited by the amount of time the enhancement is actually used.
- Amdahl's Law defines speedup:
- Speedup = Perf. for entire task using enhancement when possible / Perf. for entire task without using enhancement
- or
- Speedup = Execution time for entire task without enhancement / Execution time for entire task using enhancement when possible
15. Metrics: Amdahl's Law and Speedup
- Speedup tells us how much faster the machine will run with an enhancement
- 2 things to consider
- 1st
- Fraction of the computation time in the original machine that can use the enhancement
- i.e. if a program executes in 30 seconds and 15 seconds of exec. uses the enhancement, fraction = 1/2 (always <= 1)
- 2nd
- Improvement gained by the enhancement (i.e. how much faster the enhanced portion runs)
- i.e. if the enhanced task takes 3.5 seconds and the original task took 7, we say the speedup is 2 (always > 1)
16. Metrics: Amdahl's Law Equations

  Execution time_new = Execution time_old x [(1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]

  Speedup_overall = Execution time_old / Execution time_new
                  = 1 / [(1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]

- Use the first equation and solve for speedup
- Please, please, please, don't just try to memorize these equations and plug numbers into them. It's always important to think about the problem too!
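As a sanity check, the law is easy to encode and evaluate (a minimal sketch, not part of the original slides; names are ours, and the two calls preview the example on the next slide):

  #include <stdio.h>

  /* Overall speedup per Amdahl's Law: f is the fraction of time the
     enhancement applies, s is the speedup of the enhanced portion. */
  static double amdahl(double f, double s) {
      return 1.0 / ((1.0 - f) + f / s);
  }

  int main(void) {
      printf("FP multiply 15x faster: %.2f\n", amdahl(0.1, 15.0)); /* 1.10 */
      printf("All FP 3x faster:       %.2f\n", amdahl(0.3, 3.0));  /* 1.25 */
      return 0;
  }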
17. Metrics: Amdahl's Law Example
- A certain machine has a
- Floating point multiply that runs too slow
- It adversely affects benchmark performance.
- One option
- Re-design the FP multiply hardware to make it run 15 times faster than it currently does.
- However, the manager thinks
- Re-designing all of the FP hardware to make each FP instruction run 3 times faster is the way to go.
- FP multiplies account for 10% of execution time.
- FP instructions as a whole account for 30% of execution time.
- Which improvement is better?
18. Metrics: Amdahl's Law Example (cont.)
- The speedup gained by improving the multiply instruction is
- 1 / ((1 - 0.1) + (0.1 / 15)) = 1.10
- The speedup gained by improving all of the floating point instructions is
- 1 / ((1 - 0.3) + (0.3 / 3)) = 1.25
- Believe it or not, the manager is right!
- Improving all of the FP instructions, despite the lesser per-instruction improvement, is the better way to go
19. More CPU metrics
- Instruction count also figures into the mix
- Can affect throughput, execution time, etc.
- Interested in
- instruction path length and instruction count (IC)
- Using this information and the total number of clock cycles for a program, we can determine the clock cycles per instruction (CPI)
20. Metrics: The Bigger Picture
- Recall: CPU time = Instruction count x CPI x Clock cycle time
- We can see CPU performance is dependent on
- Clock rate, CPI, and instruction count
- CPU time is directly proportional to all 3
- Therefore an x% improvement in any one variable leads to an x% improvement in CPU performance
- But, everything usually affects everything
- [Figure: hardware technology and organization determine clock cycle time; organization and the ISA determine CPI; the ISA and compiler technology determine instruction count]
21. Dataflow: the single-bus datapath
- Compute y = a + bx + cx^2
- Ex. A=2, B=4, C=6, x=2
- Part 1: C <- x, D <- x
- Part 2: D <- x^2, C <- 6
- Part 3: A <- Cx^2
- Part 4: C <- 4, D <- x
- Part 5: B <- Bx
- Part 6: B <- Bx + Cx^2 (or RegA + RegB)
- Part 7: A <- A
- Part 8: Y <- RegA + RegB
22. Dataflow: A Single Bus
- Any (and all) functional units can access the bus
- [Figure: several functional units attached to one shared bus]
- (Remember, we need to assert the right control signals, at the right time, in the right order)
23. The beginnings of a generic dataflow
- Abstract / Simplified View
- Two types of signals: data and control
- Clocking strategy
- All storage elements are clocked by the same clock edge.
- [Figure: the PC supplies the instruction address to instruction memory; the instruction's Ra/Rb fields index the register file; the register outputs feed the ALU, whose result supplies the data memory address and flows back to the register file via Rw]
24. Dataflow: Ex. MIPS Instruction Formats
- R-type format (e.g. ADD, SUB, OR, etc.)
- I-type format (e.g. ADDI, LW, SW, BEQ)
- J-type (e.g. JUMP)
- (Remember, opcodes/function codes are used to generate control signals)
25. Single-cycle MIPS dataflow
26. A pipelined dataflow
- Need to carry control signals too
- Pipeline registers separate the stages: IF/ID, ID/EX, EX/MEM, MEM/WB
- [Figure: the PC and a +4 adder feed instruction memory; IR fields (IR 6..10, IR 11..15) index the register file; a branch comparator decides "branch taken"; the ALU, sign extend (16 -> 32), data memory, and muxes route values between stages; MEM/WB.IR selects the register to write]
- Data must be stored from one stage to the next in pipeline registers/latches. They hold temporary values between clocks and the info needed for execution.
27. Dataflow: Execution Sequence Summary (MIPS)
- IR <- Memory[PC]
- PC <- PC + 4
- A <- Reg[IR(25:21)]
- B <- Reg[IR(20:16)]
- ALUOut <- PC + (SignEx(IR(15:0)) << 2)
- Instructions take 3 cycles (branches), 3 cycles (jumps), 4 cycles (R-type), or 5 cycles (loads)
28. Dataflow: FSM (MIPS Machine)
- Tells us what values are needed and during what step
- State 0 (start), instruction fetch: MemRead; ALUSrcA = 0; IorD = 0; IRWrite; ALUSrcB = 01; ALUOp = 00; PCWrite; PCSource = 00
- State 1, instruction decode / register fetch: ALUSrcA = 0; ALUSrcB = 11; ALUOp = 00
- State 2, memory address computation: ALUSrcA = 1; ALUSrcB = 10; ALUOp = 00
- State 3, memory access (load): MemRead; IorD = 1
- State 4, memory read completion: RegDst = 0; RegWrite; MemToReg = 1
- State 5, memory access (store): MemWrite; IorD = 1
- State 6, execution: ALUSrcA = 1; ALUSrcB = 00; ALUOp = 10
- State 7, R-type completion: RegDst = 1; RegWrite; MemToReg = 0
- State 8, branch completion: ALUSrcA = 1; ALUSrcB = 00; ALUOp = 01; PCWriteCond; PCSource = 01
- State 9, jump completion: PCWrite; PCSource = 10
29. Interrupts
- [Figure: processor connected to Device 1 and Device 2 via the address bus and data bus, plus Int and Inta lines]
- If the processor decides to handle the interrupt, it asserts the Inta (interrupt acknowledge) line
30. Example: Device Interrupt (Say, arrival of a network message)
- Interrupted program:
  add  r1,r2,r3
  subi r4,r1,4
  slli r4,r4,2
  Hiccup(!)            <- external interrupt arrives here
  lw   r2,0(r4)
  lw   r3,4(r4)
  add  r2,r2,r3
  sw   8(r4),r2
- Interrupt handler:
  Save registers       (callee save)
  lw   r1,20(r0)
  lw   r2,0(r1)
  addi r3,r0,5
  sw   0(r1),r3        (code to handle int.)
  Restore registers    (callee restore)
  Clear current Int    (reset bit)
  RETI                 (return from interrupt)
31. Program Execution w/Protection (and w/I/O)
- [Figure: PC (mem. addr.) plotted against time. Execution stays in user space during a loop, crosses into kernel space on a system call, and into I/O (kernel) space on an interrupt]
32. Pipelining Lessons
- Multiple tasks operating simultaneously
- Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload
- Pipeline rate is limited by the slowest pipeline stage
- Potential speedup = number of pipe stages
- Unbalanced lengths of pipe stages reduce speedup
- Also, need time to fill and drain the pipeline.
- [Figure: laundry analogy, tasks overlapping in task order from 6 PM to 9 PM]
33. (Pipelining) More on throughput
- All pipe stages are connected, so everything must move from one to another at the same time
- How fast this can occur is a function of the time it takes for the slowest stage to finish
- Example: If laundry takes 30 min. to wash but 40 min. to dry, it's going to sit in the washer for 10 min., idle
- In a processor, this is the machine cycle time (usually 1 clock)
- If each pipe stage is perfectly balanced time-wise:
- Time per instruction = time per instruction unpipelined / number of pipe stages
- Therefore, speedup from pipelining = number of pipe stages
- But of course nothing's perfect!
34. Speedup Equation for Pipelining
- Speedup from pipelining = pipeline depth / (1 + pipeline stall cycles per instruction), ignoring overhead and assuming balanced stages
- For a simple RISC pipeline, CPI = 1. With microcode, unpipelined CPI = pipeline depth
- Single-cycle HW would have a slow clock
35. The hazards of pipelining
- Pipeline hazards prevent the next instruction from executing during its designated clock cycle
- There are 3 classes of hazards
- Structural hazards
- Arise from resource conflicts when the HW cannot support all possible combinations of instructions
- Data hazards
- Occur when a given instruction depends on data from an instruction ahead of it in the pipeline
- Control hazards
- Result from branch-type and other instructions that change the flow of the program (i.e. the PC)
36. (Pipelining) Stalls and performance
- Stalls impede the progress of a pipeline and cause deviation from 1 instruction executing each clock cycle
- Recall that pipelining can be viewed to
- Decrease the CPI or the clock cycle time for an instruction
- Let's see what effect stalls have on CPI
- CPI pipelined
- = Ideal CPI + pipeline stall cycles per instruction
- = 1 + pipeline stall cycles per instruction
- Ignoring overhead and assuming stages are balanced
37. Process States
- New, Ready, Running, Waiting, Terminated
- [Figure: state-transition diagram among these five states]
- A longer example using these states later on in lecture
38. Process Control Block
- Pointer
- Process State
- Process Number
- Program Counter
- Registers
- Scheduling Info
- Memory Limits
- I/O Status Info
- Accounting Info
- Another question for the class: Why do we need each one of these elements?
39. Process Scheduling
- [Figure: queueing diagram. Processes wait in the ready queue for the CPU; from the CPU they leave on an I/O request (into an I/O queue, then I/O), when the time slice expires, when they fork a child (child executes), or to wait for an interrupt, re-entering the ready queue when the interrupt occurs]
40. (Process) Scheduling Algorithms
- First-Come, First-Served
- Shortest-Job-First
- Priority
- Round-Robin
- Multilevel Queue
- Multilevel Feedback Queue
41. Average Memory Access Time
- AMAT = HitTime + (1 - h) x MissPenalty
- Hit time: basic time of every access.
- Hit rate (h): fraction of accesses that hit
- Miss penalty: extra time to fetch a block from a lower level, including time to replace the block in the CPU
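For instance (a small sketch of the formula, not from the slides; the numbers are made up):

  #include <stdio.h>

  /* AMAT = hit_time + miss_rate * miss_penalty, with miss_rate = 1 - h */
  static double amat(double hit_time, double h, double miss_penalty) {
      return hit_time + (1.0 - h) * miss_penalty;
  }

  int main(void) {
      /* e.g. 1-cycle hit, 95% hit rate, 40-cycle miss penalty */
      printf("AMAT = %.2f cycles\n", amat(1.0, 0.95, 40.0));  /* 3.00 */
      return 0;
  }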
42. The Full Memory Hierarchy: always reuse a good idea
- Levels run from the upper level (smallest, fastest, costliest) down to the lower level (largest, slowest, cheapest); each level stages data for the one above it, with its own transfer unit:
- CPU registers: 100s of bytes, <10s ns; staged by the program/compiler; transfer unit: instruction operands (1-8 bytes)
- Cache: K bytes, 10-100 ns, 1-0.1 cents/bit; staged by the cache controller; transfer unit: blocks (8-128 bytes)
- Main memory: M bytes, 200-500 ns, 10^-4 to 10^-5 cents/bit; staged by the OS; transfer unit: pages (4K-16K bytes)
- Disk: G bytes, 10 ms (10,000,000 ns), 10^-5 to 10^-6 cents/bit; staged by the user/operator; transfer unit: files (Mbytes)
- Tape: "infinite" capacity, sec-min access time, 10^-8 cents/bit
43. (Memory) Caches: where we put data
- [Figure: an 8-block cache and a memory with blocks 1, 2, 3, ... Where can memory block 12 go?]
- Fully associative: block 12 can go anywhere
- Direct mapped: block 12 can go only into block 4 (12 mod 8)
- Set associative (sets 0-3): block 12 can go anywhere in set 0 (12 mod 4)
44. Ex. the Alpha 21064 data and instruction cache (Memory)
- [Figure: the CPU address splits into a block address (tag <21> + index <8>) and a block offset <5>. (1) The index selects one of 256 blocks; (2) the stored tag <21>, qualified by the valid bit <1>, is compared against the address tag; (3) on a match the 256-bit data is returned through a 4:1 mux; (4) on a miss the block comes from lower-level memory]
45. Ex. Cache Math (Memory)
- First, the address coming into the cache is divided into two fields
- a 29-bit block address and a 5-bit block offset
- The block address is further divided into
- an address tag and a cache index
- The cache index selects the tag to be tested to see if the desired block is in the cache
- Size of the index depends on cache size, block size, and set associativity
- So, the index is 8 bits wide and the tag is 29 - 8 = 21 bits
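A sketch of the field extraction (ours, not the slides'; the constants match the 8-bit index and 5-bit offset above, giving a 34-bit address):

  #include <stdio.h>
  #include <stdint.h>

  #define OFFSET_BITS 5   /* 32-byte blocks           */
  #define INDEX_BITS  8   /* 256 blocks in the cache  */

  int main(void) {
      uint64_t addr = 0x123456789ULL;  /* a sample 34-bit address */
      uint64_t offset = addr & ((1u << OFFSET_BITS) - 1);
      uint64_t index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
      uint64_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);  /* 21 bits */

      printf("tag=0x%llx index=0x%llx offset=0x%llx\n",
             (unsigned long long)tag, (unsigned long long)index,
             (unsigned long long)offset);
      return 0;
  }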
46. Memory access equations
- Using what we defined on the previous slide, we can say
- Memory stall clock cycles =
- Reads x Read miss rate x Read miss penalty
- + Writes x Write miss rate x Write miss penalty
- Often, reads and writes are combined/averaged
- Memory stall cycles =
- Memory accesses x Miss rate x Miss penalty (approximation)
- It's also possible to factor in instruction count to get a complete formula
47. A clear explanation of pages vs. segments! (Virtual Memory)
- Paging (much like a cache!)
- Virtual address = page number | offset, both fixed length
- The page number maps to a frame; since all pages are the same size, the physical address is just the frame concatenated with the offset
- The offset takes you to a specific word
- Segmentation (segments are no longer all the same size)
- Virtual address = segment | offset; now the 2 fields of the virtual address have variable length
- One specifies the segment, the other the offset within the segment
- The problem: the segment length can vary, so you can't just concatenate
48. Address Translation in a Paging System (Virtual Memory)
- [Figure only]
49. Translation in a Segmentation System (Virtual Memory)
- [Figure] Note: the offset could use all 32 bits, so you can't just concatenate; the segment base must be added
50. A Memory Hierarchy Flow Chart
- Virtual address -> TLB access
- TLB miss? Try to read the page table (TLB miss stall). Page fault? Replace the page from disk. Otherwise, set the entry in the TLB and retry
- TLB hit? For a write, do the cache/buffer memory write; for a read, try to read from the cache
- Cache hit? Deliver the data to the CPU. Cache miss? Cache miss stall
51. A disk, pictorially (I/O)
- When accessing data we read or write to a sector
- All sectors are the same size; outer tracks are just less dense
- To read or write, a moveable arm with a read/write head moves over each surface
- Cylinder: all tracks under the arms at a given point on all surfaces
- To read or write
- The disk controller moves the arm over the proper track: a seek
- The time to move is called the seek time
- When the sector is found, data is transferred
52. Disk Terminology (I/O)
- Cylinder: track 'x' on all platters/surfaces
53. Disk Device Terminology (I/O)
- Several platters, with information recorded magnetically on both surfaces (usually)
- Bits recorded in tracks, which in turn are divided into sectors (e.g., 512 bytes)
- Actuator moves the head (at the end of the arm, 1 per surface) over the track (seek), selects the surface, waits for the sector to rotate under the head, then reads or writes
- Cylinder: all tracks under the heads
54. Ex. average disk access time (I/O)
- What is the average time to read or write a 512-byte sector for a typical disk?
- The average seek time is given to be 9 ms
- The transfer rate is 4 MB per second
- The disk rotates at 7200 RPM
- The controller overhead is 1 ms
- The disk is currently idle before any requests are made (so there is no queuing delay)
- Average disk access time =
- average seek time + average rotational delay + transfer time + controller overhead (a worked sketch of the arithmetic follows below)
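Plugging in the given numbers (our sketch; not part of the original slides):

  #include <stdio.h>

  int main(void) {
      double seek = 9.0;                            /* ms, given             */
      double rotation = 0.5 * (60000.0 / 7200.0);   /* half a revolution, ms */
      double transfer = 512.0 / 4e6 * 1000.0;       /* 512 B at 4 MB/s, ms   */
      double controller = 1.0;                      /* ms, given             */

      /* 9 + 4.17 + 0.13 + 1 = about 14.3 ms */
      printf("average access time = %.2f ms\n",
             seek + rotation + transfer + controller);
      return 0;
  }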
55. Allocation Strategies (I/O)
- Fixed contiguous regions
- Contiguous regions with overflow areas
- Linked allocation
- File Allocation Table (FAT): MS-DOS
- Indexed allocation
- Multilevel indexed allocation
- Hybrid (BSD Unix)
56. Disk Scheduling Algorithms (I/O)
- First come, first served
- Shortest seek time first
- SCAN (elevator algorithm)
- C-SCAN
- LOOK
- C-LOOK
- LOOK is the same as SCAN, but it reverses direction if there are no more requests in the scan direction; this leads to better performance than SCAN
57. Speedup (Parallel Processing): metric for performance on latency-sensitive applications
- Speedup = Time(1) / Time(P) for P processors
- note: must use the best sequential algorithm for Time(1) -- the parallel algorithm may be different.
- [Figure: speedup vs. number of processors (1, 2, 4, 8, 16, 32, 64). Linear speedup is the ideal; typical curves roll off with some number of processors; occasionally you see superlinear speedup... why?]
58. Speedup Challenge (Parallel Processing)
- To get the full benefit of parallelism, you need to be able to parallelize the entire program!
- Amdahl's Law
- Time_after = (Time_affected / Improvement) + Time_unaffected
- Example: we want 100 times speedup with 100 processors
- This requires Time_unaffected = 0!!!
59. Shared-Memory Hardware (1)
- Hardware and programming model don't have to match, but this is the mental model for shared-memory programming
- Memory is centralized with uniform memory access time (UMA) and bus interconnect, I/O
- Examples: Dell Workstation 530, Sun Enterprise, SGI Challenge
- Typical latencies:
- 1 cycle to local cache
- 20 cycles to remote cache
- 100 cycles to memory
60. Shared-Memory Hardware (2)
- Variation: memory is not centralized. Called non-uniform memory access time (NUMA)
- Shared memory accesses are converted into a messaging protocol (usually by hardware)
- Examples: DASH/Alewife/FLASH (academic), SGI Origin, Compaq GS320, Sequent (IBM) NUMA-Q
61. Multiprocessor Cache Coherency
- Means that values in cache and memory are consistent, or that we know they are different and can act accordingly
- Considered to be a good thing.
- Becomes more difficult with multiple processors and multiple caches!
- Popular technique: snooping!
- Write-invalidate
- Write-update
62. Cache coherence protocols (Multiprocessors)
- Directory based
- Whether or not some physical memory location is shared is recorded in 1 central location
- Called the directory
- Snooping
- Every cache with entries from the centralized main memory also has that particular block's sharing status
- No centralized state is kept
- Caches are connected to the shared memory bus
- Whenever there is bus traffic, the caches check (or "snoop") to see if they have the block being transferred on the bus
63. What is a Thread?
- Basic unit of CPU utilization
- A lightweight process (LWP)
- Consists of
- Program Counter
- Register Set
- Stack Space
- Shares with peer threads
- Code
- Data
- OS Resources
- Open files
- Signals
64. Threads
- Recall from board: code, data, and files are shared; no process context switching
- Can be context switched more easily
- Registers and PC
- Not memory management
- Can run on different processors concurrently in an SMP
- Share the CPU in a uniprocessor
- May (will) require concurrency control programming, like mutex locks.
- This is why we talked about critical sections, etc. first
65. Classic CS Problem: Producer/Consumer (Threads)
- Producer:
  if (!full)
      add item to buffer
      empty = FALSE
      if (buffer_is_full)
          full = TRUE
- Consumer:
  if (!empty)
      remove item from buffer
      full = FALSE
      if (buffer_is_empty)
          empty = TRUE
66. Example: Producer Threads Program
  while (forever) {
      // produce item
      pthread_mutex_lock(&padlock);
      while (full)
          pthread_cond_wait(&non_full, &padlock);
      // add item to buffer
      buffercount++;
      if (buffercount == BUFFERSIZE)
          full = TRUE;
      empty = FALSE;
      pthread_mutex_unlock(&padlock);
      pthread_cond_signal(&non_empty);
  }
67. Example: Consumer Threads Program
  while (forever) {
      pthread_mutex_lock(&padlock);
      while (empty)
          pthread_cond_wait(&non_empty, &padlock);
      // remove item from buffer
      buffercount--;
      full = FALSE;
      if (buffercount == 0)
          empty = TRUE;
      pthread_mutex_unlock(&padlock);
      pthread_cond_signal(&non_full);
      // consume item
  }
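The two fragments above, assembled into a self-contained program (our sketch; the variable names follow the slides, the rest is glue we added):

  #include <pthread.h>
  #include <stdio.h>

  #define BUFFERSIZE 4
  #define ITEMS      10

  static int buffercount = 0, full = 0, empty = 1;
  static pthread_mutex_t padlock  = PTHREAD_MUTEX_INITIALIZER;
  static pthread_cond_t non_full  = PTHREAD_COND_INITIALIZER;
  static pthread_cond_t non_empty = PTHREAD_COND_INITIALIZER;

  static void *producer(void *arg) {
      for (int i = 0; i < ITEMS; i++) {
          pthread_mutex_lock(&padlock);
          while (full)
              pthread_cond_wait(&non_full, &padlock);
          buffercount++;                        /* add item to buffer */
          if (buffercount == BUFFERSIZE) full = 1;
          empty = 0;
          pthread_mutex_unlock(&padlock);
          pthread_cond_signal(&non_empty);
      }
      return NULL;
  }

  static void *consumer(void *arg) {
      for (int i = 0; i < ITEMS; i++) {
          pthread_mutex_lock(&padlock);
          while (empty)
              pthread_cond_wait(&non_empty, &padlock);
          buffercount--;                        /* remove item from buffer */
          full = 0;
          if (buffercount == 0) empty = 1;
          pthread_mutex_unlock(&padlock);
          pthread_cond_signal(&non_full);
          printf("consumed item %d\n", i);      /* consume item */
      }
      return NULL;
  }

  int main(void) {
      pthread_t p, c;
      pthread_create(&p, NULL, producer, NULL);
      pthread_create(&c, NULL, consumer, NULL);
      pthread_join(p, NULL);
      pthread_join(c, NULL);
      return 0;
  }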
68. Things to know? (Threads)
- 1. The reason threads are around?
- 2. Benefits of increased concurrency?
- 3. Why do we need software-controlled "locks" (mutexes) on shared data?
- 4. How can we avoid potential deadlocks/race conditions?
- 5. What is meant by producer/consumer thread synchronization/communication using pthreads?
- 6. Why use a "while" loop around a pthread_cond_wait() call?
- 7. Why should we minimize lock scope (minimize the extent of code within a lock/unlock block)?
- 8. Do you have any control over thread scheduling?
69. Locks and condition variables (Threads)
- A semaphore really serves two purposes
- Mutual exclusion: protect shared data
- Always a binary semaphore
- Synchronization: temporally coordinate events
- One thread waits for something; the other thread signals when it's available
- Idea
- Provide this functionality in two separate constructs
- Locks: provide mutual exclusion
- Condition variables: provide synchronization
- Like semaphores, locks and condition variables are language-independent, and are available in many programming environments
70. Locks and condition variables (Threads)
- Locks
- Provide mutually exclusive access to shared data
- A lock can be locked or unlocked
- (Sometimes called busy or free)
- Can be implemented
- Trivially by binary semaphores
- (create a private lock semaphore, use P and V)
- By lower-level constructs, much like semaphores
are implemented
71. Locks and condition variables (Threads)
- Example conventions
- Before accessing shared data, call Acquire() on a specific lock
- Complain (via ASSERT) if a thread tries to acquire a lock it already has
- After accessing shared data, call Release() on the same lock
- Example:
  Thread A              Thread B
  milk->Acquire()       milk->Acquire()
  if (noMilk)           if (noMilk)
      buy milk              buy milk
  milk->Release()       milk->Release()
72. Locks and condition variables (Threads)
- Consider the following code:
  Queue::Add()              Queue::Remove()
      lock->Acquire()           lock->Acquire()
      add item                  if item on queue, remove item
      lock->Release()           lock->Release()
                                return item
- Queue::Remove will only return an item if there's already one in the queue
73. Locks and condition variables (Threads)
- If the queue is empty, it might be more desirable for Queue::Remove to wait until there is something to remove
- Can't just go to sleep
- If it sleeps while holding the lock, no other thread can access the shared queue, add an item to it, and wake up the sleeping thread
- Solution
- Condition variables will let a thread sleep inside a critical section
- By releasing the lock while the thread sleeps
74. Locks and condition variables (Threads)
- Condition variables coordinate events
- Example (generic) syntax
- Condition(name): create a new instance of class Condition (a condition variable) with the specified name
- After creating a new condition, the programmer must call Lock(name) to create a lock that will be associated with that condition variable
- Condition::Wait(conditionLock): release the lock and wait (sleep); when the thread wakes up, immediately try to re-acquire the lock; return when it has the lock
- Condition::Signal(conditionLock): if threads are waiting on the lock, wake up one of those threads and put it on the ready list
- Otherwise, do nothing
75. Locks and condition variables (Threads)
- Condition::Broadcast(conditionLock): if threads are waiting on the lock, wake up all of those threads and put them on the ready list; otherwise do nothing
- IMPORTANT
- A thread must hold the lock before calling Wait, Signal, or Broadcast
- Can be implemented
- Carefully, by higher-level constructs (create and queue threads, sleep and wake up threads as appropriate)
- Carefully, by binary semaphores (create and queue semaphores as appropriate, use P and V to synchronize)
- Carefully, by lower-level constructs, much like semaphores are implemented
76. Locks and condition variables (Threads)
- Associated with a data structure are both a lock and a condition variable
- Before the program performs an operation on the data structure, it acquires the lock
- If it needs to wait until another operation puts the data structure into an appropriate state, it uses the condition variable to wait
- (see next slide for example)
77. Locks and condition variables (Threads)
- Unbounded-buffer producer/consumer:
  Lock lk; int avail = 0;
  Condition c;

  /* producer */            /* consumer */
  while (1) {               while (1) {
      lk->Acquire();            lk->Acquire();
      produce next item         if (avail == 0)
      avail++;                      c->Wait(lk);
      c->Signal(lk);            consume next item
      lk->Release();            avail--;
  }                             lk->Release();
                            }
78. Locks and condition variables (Threads)
- Semaphores and condition variables are pretty similar; perhaps we can build condition variables out of semaphores
- Does this work?
  Condition::Wait()  { sema->P(); }
  Condition::Signal() { sema->V(); }
- NO! We're going to use these condition operations inside a lock. What happens if we use semaphores inside a lock?
79. Locks and condition variables (Threads)
- How about this?
  Condition::Wait() {
      lock->Release();
      sema->P();
      lock->Acquire();
  }
  Condition::Signal() { sema->V(); }
- How do semaphores and condition variables differ with respect to keeping track of history?
80. Locks and condition variables (Threads)
- Semaphores have a value; CVs do not!
- On a semaphore signal (a V), the value of the semaphore is always incremented, even if no one is waiting
- Later on, if a thread does a semaphore wait (a P), the value of the semaphore is decremented and the thread continues
- On a condition variable signal, if no one is waiting, the signal has no effect
- Later on, if a thread does a condition variable wait, it waits (it always waits!)
- It doesn't matter how many signals have been made beforehand
81. (Networks) Performance parameters
- Bandwidth
- Maximum rate at which the interconnection network can propagate data once a message is in the network
- Usually headers and overhead bits are included in the calculation
- Units are usually megabits/second, not megabytes
- Sometimes see "throughput"
- Network bandwidth delivered to an application
- Time of flight
- Time for the 1st bit of the message to arrive at the receiver
- Includes delays of repeaters/switches; equals length / (mu x speed of light), where the factor mu is a property of the transmission material
- Transmission time
- Time required for the message to pass through the network
- = size of message / bandwidth
82. (Networks) More performance parameters
- Transport latency
- = time of flight + transmission time
- Time the message spends in the interconnection network
- But not the overhead of pulling it out of or pushing it into the network
- Sender overhead
- Time for the processor to inject a message into the interconnection network, including both HW and SW components
- Receiver overhead
- Time for the processor to pull a message out of the interconnection network, including both HW and SW components
- So, the total latency of a message is:
- Total latency = sender overhead + time of flight + message size / bandwidth + receiver overhead
83. Metrics, graphically [figure only]
84. (Networks) An example
- Consider a network with the following parameters
- Network has a bandwidth of 10 Mbit/sec
- We're assuming no contention for this bandwidth
- Sending overhead = 230 microseconds, receiving overhead = 270 microseconds
- We want to send a message of 1000 bytes
- This includes the header
- It will be sent as 1 message (no need to split it up)
- What's the total latency to send the message to a machine
- 100 m apart (assume no repeater delay for this)
- 1000 km apart (not realistic to assume no repeater delay)
85. (Networks) An example, continued
- We'll use the facts that
- The speed of light is 299,792.5 km/s
- Our mu value is 0.5
- This means we have a good fiber optic or coaxial cable (more later)
- Let's use: total latency = sender overhead + time of flight + message size / bandwidth + receiver overhead
- If the machines are 100 m apart ...
- If the machines are 1000 km apart ... (a worked sketch follows below)
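Working the numbers (our sketch, not from the slides; it assumes the overheads above are in microseconds):

  #include <stdio.h>

  int main(void) {
      double c = 299792.5e3;             /* speed of light, m/s        */
      double mu = 0.5;                   /* propagation factor (fiber) */
      double bw = 10e6 / 8.0;            /* 10 Mbit/s in bytes/s       */
      double overhead = 230e-6 + 270e-6; /* send + receive, seconds    */
      double msg = 1000.0;               /* bytes                      */
      double dist[2] = { 100.0, 1000e3 };/* 100 m and 1000 km          */

      for (int i = 0; i < 2; i++) {
          double tof = dist[i] / (mu * c);   /* time of flight */
          double total = overhead + tof + msg / bw;
          /* prints ~1.301 ms for 100 m, ~7.971 ms for 1000 km */
          printf("%.0f m: total latency = %.3f ms\n", dist[i], total * 1e3);
      }
      return 0;
  }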
86. (Networks) Some more odds and ends
- Note from the example (with regard to the longer distance)
- Time of flight dominates the total latency
- Repeater delays would factor significantly into the equation
- Message transmission failure rates rise significantly
- It's possible to send other messages without responses to previous ones
- If you have control of the network
- Can help increase network use by overlapping overheads and transport latencies
- Can simplify the total latency equation to
- Total latency = overhead + (message size / bandwidth)
- Leads to
- Effective bandwidth = message size / total latency
87. (Networks) Switched vs. shared
- Shared media (Ethernet): all nodes hang off one shared medium
- Switched media (ATM): each node connects to a switch, which connects the nodes pairwise
- (a.k.a. data switching interchanges, multistage interconnection networks, interface message processors)
88. (Networks) Connection-Based vs. Connectionless
- Telephone: an operator sets up the connection between the caller and the receiver
- Once the connection is established, the conversation can continue for hours
- Share transmission lines over long distances by using switches to multiplex several conversations on the same lines
- Time division multiplexing: divide the bandwidth of the transmission line into a fixed number of slots, with each slot assigned to a conversation
- Problem: lines are busy based on the number of conversations, not the amount of information sent
- Advantage: reserved bandwidth
- (see board for example)
89. (Networks) Connection-Based vs. Connectionless
- Connectionless: every package of information must have an address => packets
- Each package is routed to its destination by looking at its address
- Analogy: the postal system (sending a letter)
- Also called statistical multiplexing
- Note: split-phase buses are sending packets
90. (Networks) Store and Forward vs. Cut-Through
- Store-and-forward policy: each switch waits for the full packet to arrive in the switch before sending it to the next switch (good for WANs)
- Cut-through routing or wormhole routing: the switch examines the header, decides where to send the message, and then starts forwarding it immediately
- In wormhole routing, when the head of the message is blocked, the message stays strung out over the network, potentially blocking other messages (needs to buffer only the piece of the packet that is sent between switches).
- Cut-through routing lets the tail continue when the head is blocked, accordioning the whole message into a single switch. (Requires a buffer large enough to hold the largest packet.)
91. (Networks) Broadband vs. Baseband
- A baseband network has a single channel that is used for communication between stations. Ethernet specifications which use BASE in the name refer to baseband networks.
- BASE refers to baseband signaling: only Ethernet signals are carried on the medium
- A broadband network is much like cable television, where different services communicate across different frequencies on the same cable.
- Broadband communications would allow an Ethernet network to share the same physical cable as voice or video services. 10BROAD36 is an example of broadband networking.
92. (Networks) Ethernet
- The various Ethernet specifications include a maximum distance
- What do we do if we want to go further?
- Repeater
- Hardware device used to extend a LAN
- Amplifies all signals on one segment of a LAN and transmits them to another
- Passes on whatever it receives (GIGO)
- Knows nothing of packets or addresses
- Any limit?
93. (Networks) Bridges
- We want to improve performance over that provided by a simple repeater
- Add functionality (i.e. more hardware)
- A bridge can detect if a frame is valid and then (and only then) pass it to the next segment
- A bridge does not forward interference or other problems
- Computers connected over a bridged LAN don't know that they are communicating over a bridge
94. (Networks) Network Interface Card
- NIC
- Sits on the host station
- Allows a host to connect to a hub or a bridge
- Hub: merely extends multiple segments into a single LAN; does not help with performance, since only 1 message can transmit at a time
- If connected to a hub, the NIC has to use half-duplex communication (i.e. it can only send or receive at a time)
- If connected to a bridge, the NIC (if it is smart) can use either half- or full-duplex mode
- Bridges learn the Media Access Control (MAC) address and the speed of the NIC they are talking to.
95. (Networks) Routers
- Routers
- Devices that connect LANs to WANs or WANs to WANs
- Resolve incompatible addresses (generally slower than bridges)
- Divide interconnection networks into smaller subnets, which simplifies manageability and security
- Work much like bridges
- Pay attention to the upper network layer protocols (OSI layer 3) rather than physical layer (OSI layer 1) protocols.
- (This will make sense later)
- Will decide whether to forward a packet by looking at the protocol-level addresses (for instance, TCP/IP addresses) rather than the MAC address.
- (This will make sense later)
96. (Protocols) Recall
- A protocol is the set of rules used to describe all of the hardware and (mostly) software operations used to send messages from Processor A to Processor B
- Common practice is to attach headers/trailers to the actual payload, forming a packet or frame.
97. (Protocols) Layering Advantages
- Layering allows functionally partitioning the responsibilities (similar to having procedures for modularity in writing programs)
- Allows easily integrating (plug and play) new modules at a particular layer without any changes to the other layers
- Rigidity is only at the level of the interfaces between the layers, not in the implementation of these interfaces
- By specifying the interfaces judiciously, inefficiencies can be avoided
98. (Protocols) ISO Model Examples
- 7 Application: user program
- 6 Presentation
- 5 Session: sockets open/close/read/write interface
- 4 Transport (kernel software): TCP, reliable infinite-length stream
- 3 Network (kernel software): IP, unreliable datagrams anywhere in the world
- 2 Data Link (hardware): Ethernet, unreliable datagrams on the local segment
- 1 Physical (hardware): 10baseT Ethernet spec, twisted pair w/RJ45s
99. (Protocols) Layering Summary
- Key to protocol families is that communication occurs logically at the same level of the protocol (called peer-to-peer),
- but is implemented via services at the next lower level
- Encapsulation: carry higher-level information within a lower-level envelope
- Fragmentation: break a packet into multiple smaller packets and reassemble
- Danger is that each level increases latency if implemented as a hierarchy (e.g., multiple checksums)
100. Techniques Protocols Use
- Sequencing for Out-of-Order Delivery
- Sequencing to Eliminate Duplicate Packets
- Retransmitting Lost Packets
- Avoiding Replay Caused by Excessive Delay
- Flow Control to Prevent Data Overrun
- Mechanism to Avoid Network Congestion
- Name Resolution (external to protocol really)
101. Internetworking (Protocols)
- Different networking solutions exist
- Why? No single networking technology is best for all needs
- Universal service
- A system where arbitrary pairs of computers can communicate
- Increases productivity
- Networks, by themselves, are incompatible with universal service
- Solution: internetworking, or an "internet". Literally: communicating between networks of the same and/or different types
102. Encapsulate universal packets in (any) local network frame format
- A frame (frame header + frame data) is used to communicate within 1 network
- To send a message from 1 network to another (or within the same one), the universal packet rides inside the frame data; we want a uniform standard.
103. Physical Network Connection (Protocols)
- [Figure: individual networks joined by a router]
- A router facilitates communication between networks
- Each cloud represents arbitrary network technology: LAN, WAN, Ethernet, token ring, ATM, etc.
104. Layered Model (Protocols)
- TCP/IP Model
- 5 Application
- 4 Transport
- 3 Internet
- 2 Network Interface
- 1 Physical
105. (Protocols) Layer upon layer upon layer...
- Layer 1: Physical
- Basic network hardware (same as ISO model Layer 1)
- Layer 2: Network Interface
- How to organize data into frames and how to transmit over the network (similar to ISO model Layer 2)
- Layer 3: Internet
- Specifies the format of packets sent across the internet as well as the forwarding mechanisms used by routers
- Layer 4: Transport
- Like ISO Layer 4, specifies how to ensure reliable transfer
- Layer 5: Application
- Corresponds to ISO Layers 6 and 7. Each Layer 5 protocol specifies how one application uses an internet
106. IP Addressing
- Each host in the internet must have a unique address
- Users, application programs, and software operating in the higher layers of the protocol stack use these addresses
- In the IP protocol, each host is assigned a unique 32-bit address. Any packet destined for a host on the internet will contain the destination IP address.
107. IP Address Hierarchy
- Addresses are broken into a prefix and a suffix for routing efficiency
- The prefix is uniquely assigned to an individual network.
- The suffix is uniquely assigned to a host within a given network
- [Figure: two networks; hosts on Network 1 and Network 2 share their network's prefix but carry distinct suffixes]
108. Five Classes of IP Address
- [Figure: the primary address classes and their prefix layouts]
109. Computing the Class (IP)
- [Figure: the class is determined from the leading bits of the first octet]
110. Classes and Dotted Decimal (IP)
- Range of values of the first octet for each class:
- Class A: 0 through 127
- Class B: 128 through 191
- Class C: 192 through 223
- Class D: 224 through 239
- Class E: 240 through 255
- Does this mean there are 64 Class B networks?
- Does this mean there are 32 Class C networks?
- (on the board; a small classifier sketch follows below)
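A small classifier over the first octet (our sketch, mirroring the table above):

  #include <stdio.h>

  /* Return the class letter implied by the first octet of a
     dotted-decimal IPv4 address, per the ranges above. */
  static char ip_class(unsigned first_octet) {
      if (first_octet <= 127) return 'A';
      if (first_octet <= 191) return 'B';
      if (first_octet <= 223) return 'C';
      if (first_octet <= 239) return 'D';
      return 'E';
  }

  int main(void) {
      /* e.g. 131.108.99.5 from the later example is class B */
      printf("131.x.x.x is class %c\n", ip_class(131));
      printf("223.x.x.x is class %c\n", ip_class(223));
      return 0;
  }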
111. Division of the Address Space (IP)

  Address Class | Bits in Prefix | Max Networks | Bits in Suffix | Max Hosts per Network
  A             | 7              | 128          | 24             | 16777216
  B             | 14             | 16384        | 16             | 65536
  C             | 21             | 2097152      | 8              | 256

- (on the board)
112. Special IP Addresses
- Network Address
- Directed Broadcast Address
- Limited Broadcast Address
- This Computer Address
- Loopback Address
- Berkeley Broadcast Address Form
113. Routers and IP Addressing
- Each host has an address
- Each router has two (or more) addresses!
- Why?
- A router has connections to multiple physical networks
- Each IP address contains a prefix that specifies a physical network
- An IP address does not really identify a specific computer, but rather a connection between a computer and a network.
- A computer with multiple network connections (e.g. a router) must be assigned an IP address for each connection
114. Example (IP)
- [Figure: a router joins three networks: Ethernet 131.108.0.0, Token Ring 223.240.129.0, and WAN 78.0.0.0. The router's interfaces have addresses 131.108.99.5, 223.240.129.17, and 78.0.0.17; a host on the token ring is 223.240.129.2. Note: each interface address carries its network's prefix!]
115. How to Resolve Addresses (IP)
- Table lookup
- Store bindings/mappings in a table which software can search
- Closed-form computation
- Protocol addresses are chosen to allow computation of the hardware address from the protocol address using basic boolean and arithmetic operations
- Message exchange
- Computers exchange messages across a network to resolve addresses. One computer sends a message requesting a translation and another computer replies
- (more detail about items 1-3 on an earlier slide)
116. Use of Default Routes (IP)
- Simplified even more
- [Figure: four nodes in a line, each with a (Dest, Next Hop) routing table. Directly reachable destinations show "-" as the next hop; other destinations name the link to forward over, e.g. (2,3) for the link from node 2 to node 3. A default route collapses most of these entries into one.]
117. Error Reporting (ICMP)
- TCP/IP includes a protocol used by IP to send messages when problems are detected: the Internet Control Message Protocol
- IP uses ICMP to signal problems
- ICMP uses IP to send its messages
- When IP detects an error (e.g. a corrupt packet) it sends an ICMP packet
118. ICMP Message Transport [figure only]
119. Services Provided by TCP
- Connection Orientation
- Point-To-Point Communication
- Complete Reliability
- Full Duplex Communication
- Stream Interface
- Reliable Connection Startup
- Graceful Connection Shutdown
120. End to End Services (TCP)
- TCP provides a connection from one application on a computer to an application on a remote computer
- The connection is virtual: provided by software passing messages
- TCP messages are encapsulated in IP datagrams
- Upon arrival, IP passes the TCP message on to the TCP layer.
- TCP exists at both ends of the connection but not at intermediate points (routers).
121. Network Nodes
- [Figure: on each node, the application layer sits on an RPC layer, which sits on the network layer; Remote Procedure Call spans the two nodes]
122. Remote Procedure Call
- Client can run other processes while waiting for the server to service the request
- Natural interface... just like a procedure call
- Implementation issues?
123. Remote Procedure Call: client / messages / server
- 1. User calls the kernel to send an RPC message to procedure X
- 2. Kernel sends a message to the nameserver to find the port number ("To: server; From: client; Port: nameserver; Re: address for RPC X")
- 3. Nameserver receives the message, looks up the answer
- 4. Nameserver replies to the client with port P ("To: client; From: server; Port: kernel; Re: RPC X is at port P")
- 5. Kernel places port P in the user's RPC message and sends the RPC ("From: client; To: server; Port: P; <contents>")
- 6. Daemon listening on port P receives the message
- 7. Daemon processes the request and sends the output ("From: RPC port P; To: client; Port: kernel; <output>")
- 8. Kernel receives the reply, passes it to the user
124. RPC
- User
- No different from any program making use of procedure calls
- User stub
- Marshalls the arguments
- Identifies the target procedure
- Hands over to the RPC runtime to deliver to the callee
- Server stub
- Unmarshalls the arguments
- Makes a normal procedure call in the server using the arguments
- User and server code are part of the distributed app
- Stubs are generated automatically
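To make "marshalling" concrete, here is a toy sketch of what a generated client stub might do (entirely ours; rpc_send and the procedure id are hypothetical, and real RPC runtimes add headers, network byte order, and error handling):

  #include <stdint.h>
  #include <stddef.h>

  /* Toy wire format: procedure id followed by two int32 arguments. */
  struct rpc_msg {
      uint32_t proc_id;
      int32_t  args[2];
  };

  /* Assumed to be provided by the RPC runtime (hypothetical). */
  extern void rpc_send(const void *buf, size_t len);

  /* Client stub for a remote add(a, b): marshall the arguments,
     identify the target procedure, hand off to the runtime. */
  void add_stub(int32_t a, int32_t b) {
      struct rpc_msg m;
      m.proc_id = 7;           /* hypothetical id for "add" */
      m.args[0] = a;
      m.args[1] = b;
      rpc_send(&m, sizeof m);  /* runtime delivers to the callee */
  }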
125. RPC
- Five pieces of program are involved: user, user stub, RPC runtime, server stub, server
- [Figure: on the caller (client) node, the user makes the call and the user stub packs the arguments; the runtime transmits, waits, and receives; on the callee (server) node, the server stub unpacks the arguments, the server executes and returns, the results are packed and transmitted back, unpacked by the user stub, and returned to the user]