CS 2200 Lecture 27 Review
1
CS 2200 Lecture 27: Review
  • (Lectures based on the work of Jay Brockman,
    Sharon Hu, Randy Katz, Peter Kogge, Bill Leahy,
    Ken MacKenzie, Richard Murphy, and Michael
    Niemier)

2
Logistics: Grade Breakdown
  • Course highlights
  • Test 1 and Test 2: 2/5/04, 3/25/04
  • Final exam: TBD, please consult OSCAR
  • Grade breakdown
  • Homeworks: 10%
  • Projects: 30%
  • Test 1: 20%
  • Test 2: 20%
  • Final Exam: 20%
  • (FYI: grade breakdown is essentially the same for
    both sections)
  • (Tests, HWs, projects, etc. will also be the
    same)
  • Course website
  • http://www.cc.gatech.edu/classes/AY2004/cs2200_spring

3
Logistics: Grade Cutoffs
  • Grade cutoffs are calculated as follows
  • (sigma is the standard deviation)
  • A: average + sigma
  • B: average (i.e. above average is a B)
  • C: average - sigma
  • D: average - 2*sigma

4
Instruction Set Architectures
[Diagram: C, Fortran, Ada, Basic, etc. pass through a compiler to
assembly language, then through an assembler to an executable, which
can run on HW implementation 1 through HW implementation N; Java
passes through a compiler to byte code, which runs on an interpreter]
5
Pros and cons for each ISA type
6
ISAs: What about memory addresses?
  • Usually instruction sets are byte addressed
  • Provide access for bytes (8 bits), half-words (16
    bits), words (32 bits), double words (64 bits)
  • Two different ordering types: big/little endian

[Diagram: a 32-bit word, bit positions 31..0]
Little endian: puts the byte w/addr. xx00 at the least significant
position in the word
Big endian: puts the byte w/addr. xx00 at the most significant
position in the word
7
ISAs: Endianness
  • No, we're not making this up.
  • At word address 100 (assume a 4-byte word):
  • long a = 0x11223344
  • big-endian (MSB at word address) layout:
  •   address:     100 101 102 103
  •   byte stored:  11  22  33  44
  •   byte offset:   0   1   2   3
  • little-endian (LSB at word address) layout:
  •   address:     103 102 101 100
  •   byte stored:  11  22  33  44
  •   byte offset:   3   2   1   0

8
ISAs: Procedures Assembly Language
  • Software conventions (Why?)
  • Reserve some number of registers for parameters,
    return values, and return address
  • e.g. LC2200:
  • 5 for params, 1 for return values, one for return
    address
  • JALR <proc-addr in reg>, ra    (ra is the
    return-addr)
  • JALR ra, zero    (Where does this go?)
  • What if we have more params or return values?
  • Common use: stack/memory
  • Registers used in procedures
  • Temporary registers
  • Caller does not expect value to be preserved upon
    return
  • LC2200: a0 to a4
  • Saved registers
  • Caller does expect value to be preserved on
    return
  • LC2200: s0 to s3 (simplifies amount of state to
    be saved)

9
ISAs: Example LC2200 Registers
Recall
10
ISAs: Caller/Callee Mechanics
who does what when?
  • Four places:
  • 1. caller at call time
  • 2. callee at entry
  • 3. callee at exit
  • 4. caller after return

foo() {               bar(int a) {
  int temp = 3;         ...
  bar(42);              return(temp + a);
  ...                 }
}
11
ISAs: Caller/Callee Conventions
do most of the work at callee entry/exit
  • Caller at call time
  • put arguments in a0..a4
  • save any caller-save temporaries
  • jalr ..., ra
  • Callee at entry (most of the work)
  • allocate all stack space
  • save ra, s0..s3 if necessary
  • Callee at exit (most of the work)
  • restore ra, s0..s3 if used
  • deallocate all stack space
  • put return value in v0
  • Caller after return
  • retrieve return value from v0
  • restore any caller-save temporaries
12
ISAs: Instruction Formatting
  • Human Readable
  • add s0, s1, a2
  • Machine Readable
  • 0 9 5 0000 b (base 16), i.e. 0x0950000B
  • 0000 1001 0101 0000 0000 0000 0000 1011

13
Metrics: So, how do we compare?
  • Best to stick with execution time! (more later)
  • If we say X is faster than Y, we mean the
    execution time is lower on X than on Y.
  • Alternatively:

"X is n times faster than Y" means

  n = Execution timeY / Execution timeX
    = PerformanceX / PerformanceY

where Performance = 1 / Execution time.

Example: if X runs at 50 MHz and Y runs at 200 MHz (and performance
scales with clock rate), then PerformanceX / PerformanceY = 50/200
= 1/4, therefore X is 4 times slower than Y.
14
Metrics: Amdahl's Law
  • Quantifies performance gain
  • Amdahl's Law defined:
  • The performance improvement to be gained from
    using some faster mode of execution is limited by
    the amount of time the enhancement is actually
    used.
  • Amdahl's Law defines speedup:

Speedup = Perf. for entire task using enhancement when possible
          / Perf. for entire task without using enhancement

Or:

Speedup = Execution time for entire task without enhancement
          / Execution time for entire task using enhancement
            when possible
15
Metrics: Amdahl's Law and Speedup
  • Speedup tells us how much faster the machine will
    run with an enhancement
  • 2 things to consider
  • 1st:
  • Fraction of the computation time in the original
    machine that can use the enhancement
  • i.e. if a program executes in 30 seconds and 15
    seconds of exec. uses the enhancement, fraction = 1/2
    (always <= 1)
  • 2nd:
  • Improvement gained by the enhancement (i.e. how much
    faster does the program run overall)
  • i.e. if the enhanced task takes 3.5 seconds and the
    original task took 7, we say the speedup is 2
    (always > 1)

16
Metrics: Amdahl's Law Equations

Execution timenew = Execution timeold x
  [(1 - Fractionenhanced) + Fractionenhanced / Speedupenhanced]

Speedupoverall = Execution timeold / Execution timenew
  = 1 / [(1 - Fractionenhanced) + Fractionenhanced / Speedupenhanced]

(Use the first equation and solve for speedup.)

Please, please, please, don't just try to
memorize these equations and plug numbers into
them. It's always important to think about the
problem too!
17
Metrics: Amdahl's Law Example
  • A certain machine has a
  • Floating point multiply that runs too slow
  • It adversely affects benchmark performance.
  • One option:
  • Re-design the FP multiply hardware to make it run
    15 times faster than it currently does.
  • However, the manager thinks
  • Re-designing all of the FP hardware to make each
    FP instruction run 3 times faster is the way to
    go.
  • FP multiplies account for 10% of execution time.
  • FP instructions as a whole account for 30% of
    execution time.
  • Which improvement is better?

18
Metrics: Amdahl's Law Example (cont.)
  • The speedup gained by improving the multiply
    instruction is
  • 1 / ((1 - 0.1) + (0.1/15)) = 1.10
  • The speedup gained by improving all of the
    floating point instructions is
  • 1 / ((1 - 0.3) + (0.3/3)) = 1.25
  • Believe it or not, the manager is right!
  • Improving all of the FP instructions, despite the
    lesser per-instruction improvement, is the better way to go

19
More CPU metrics
  • Instruction count also figures into the mix
  • Can affect throughput, execution time, etc.
  • Interested in:
  • instruction path length and instruction count
    (IC)
  • Using this information and the total number of
    clock cycles for a program, we can determine the
    clock cycles per instruction (CPI)

20
Metrics: The Bigger Picture
  • Recall:
  • We can see CPU performance is dependent on
  • Clock rate, CPI, and instruction count
  • CPU time is directly proportional to all 3
  • Therefore an x% improvement in any one variable
    leads to an x% improvement in CPU performance
  • But, everything usually affects everything

[Diagram: clock cycle time, instruction count, and CPI are each
shaped by hardware technology, organization, compiler technology,
and the ISA]
21
Dataflow: the single-bus datapath
y = a + bx + cx^2
Ex.: A=2, B=4, C=6, x=2
Part 1: C <- x,  D <- x
Part 2: D <- x^2, C <- 6
Part 3: A <- Cx^2
Part 4: C <- 4, D <- x
Part 5: B <- Bx
Part 6: B <- Bx + Cx^2 (or RegA + RegB)
Part 7: A <- A
Part 8: Y <- RegA + RegB
22
Dataflow: A Single Bus (any and all functional
units can access the bus)
[Diagram: four functional units attached to one shared bus]
(Remember, we need to assert the right control
signals, at the right time, in the right order)
23
The beginnings of a generic dataflow
  • Abstract / Simplified View
  • Two types of signals: data and control
  • Clocking strategy:
  • All storage elements are clocked by the same
    clock edge.

[Diagram: the PC supplies an address to instruction memory; the
instruction selects Ra, Rb, and Rw in the register file; register
values feed the ALU, whose output is the address for data memory;
data moves between data memory and the register file]
24
Dataflow Ex. MIPS Instruction Formats
  • R-type format (i.e. ADD, SUB, OR, etc.)
  • I-type format (i.e. ADDI, LW, SW, BEQ)
  • J-type (i.e. JUMP)

(Remember, opcodes/function codes used to
generate control signals)
25
Single cycle MIPS dataflow
26
A pipelined dataflow
Need to carry control signals too
[Diagram: five-stage pipeline with IF/ID, ID/EX, EX/MEM, and MEM/WB
latches between stages; PC and branch logic (ADD, branch-taken
comparator, muxes), instruction memory, register file (indexed by
IR 6..10 and IR 11..15), ALU, data memory, and sign extend (16 to
32 bits)]
Data must be stored from one stage to the next in
pipeline registers/latches. They hold temporary values
between clocks and info. needed for execution.
27
Dataflow: Execution Sequence Summary (MIPS)
IR <- Memory[PC]
PC <- PC + 4
A <- Reg[IR(25:21)]
B <- Reg[IR(20:16)]
ALUOut <- PC + (SignEx(IR(15:0)) << 2)
(Instructions take 3, 4, or 5 cycles depending on type)
28
Dataflow: FSM (MIPS Machine)
Tells us what values are needed and during what step
[State diagram; control signal values per state:]
0. Instruction fetch (start): MemRead, ALUSrcA = 0, IorD = 0,
   IRWrite, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00
1. Instruction decode / register fetch: ALUSrcA = 0, ALUSrcB = 11,
   ALUOp = 00
2. Memory address computation: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
3. Memory access (load): MemRead, IorD = 1
4. Memory read completion: RegDst = 0, RegWrite, MemToReg = 1
5. Memory access (store): MemWrite, IorD = 1
6. Execution (R-type): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
7. R-type completion: RegDst = 1, RegWrite, MemToReg = 0
8. Branch completion: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01,
   PCWriteCond, PCSource = 01
9. Jump completion: PCWrite, PCSource = 10
29
Interrupts
[Diagram: processor connected to Device 1 and Device 2 over the
address bus and data bus, with Int and Inta lines]
If the processor decides to handle the interrupt,
it asserts the inta (interrupt acknowledge) line
30
Example: Device Interrupt (Say, arrival of
network message)

Main program:               Interrupt handler:
  add r1,r2,r3                Save registers     (callee save)
  subi r4,r1,4                lw r1,20(r0)
  slli r4,r4,2                lw r2,0(r1)        (code to
  Hiccup(!) <- external       addi r3,r0,5        handle int.)
    interrupt                 sw 0(r1),r3
  lw r2,0(r4)                 Restore registers  (callee restore)
  lw r3,4(r4)                 Clear current Int  (reset bit)
  add r2,r2,r3                RETI               (return from
  sw 8(r4),r2                                     interrupt)
31
Program Execution w/Protection (and w/IO)
[Diagram: PC (mem. addr.) vs. time; execution runs a loop in user
space, makes a system call into kernel space, takes an interrupt,
and spends time in I/O (kernel) space]
32
Pipelining Lessons
  • Multiple tasks operating simultaneously
  • Pipelining doesn't help latency of a single task;
    it helps throughput of the entire workload
  • Pipeline rate limited by slowest pipeline stage
  • Potential speedup = number of pipe stages
  • Unbalanced lengths of pipe stages reduce speedup
  • Also, need time to fill and drain the
    pipeline.

[Diagram: laundry analogy; tasks in order across the 6 PM to 9 PM
timeline]
33
(Pipelining) More on throughput
  • All pipe stages are connected, so everything must
    move from one to another at the same time
  • How fast this can occur is a function of the time
    it takes for the slowest stage to finish
  • Example: If a laundry takes 30 min. to wash but
    40 min. to dry, it's going to sit in the washer
    for 10 min. idle
  • In a uP, this is the machine cycle time (usually
    1 clock)
  • If each pipe stage is perfectly balanced time
    wise:
  • Time/Instruction = Time/Instruction unpipelined
    / # of pipe stages
  • Therefore speedup from pipelining = # of pipe
    stages
  • But of course nothing's perfect!

34
Speed Up Equation for Pipelining
For a simple RISC pipeline, CPI = 1. W/microcode,
unpipelined CPI = pipeline depth
Single-cycle HW would have a slow clock
35
The hazards of pipelining
  • Pipeline hazards prevent the next instruction
    from executing during its designated clock cycle
  • There are 3 classes of hazards
  • Structural Hazards
  • Arise from resource conflicts when HW cannot
    support all possible combinations of instructions
  • Data Hazards
  • Occur when a given instruction depends on data
    from an instruction ahead of it in the pipeline
  • Control Hazards
  • Result from branch type and other instructions
    that change the flow of the program (i.e. the PC)

36
(Pipelining) Stalls and performance
  • Stalls impede the progress of a pipeline and
    cause a deviation from the ideal of 1 instruction
    executing each clock cycle
  • Recall that pipelining can be viewed to
  • Decrease the CPI or clock cycle time for an
    instruction
  • Let's see what effect stalls have on CPI
  • CPI pipelined
  • = Ideal CPI + Pipeline stall cycles per instruction
  • = 1 + Pipeline stall cycles per instruction
  • Ignoring overhead and assuming stages are
    balanced

37
Process States
[Diagram: New -> Ready -> Running -> Terminated, with Running ->
Waiting -> Ready and Running -> Ready transitions]
A longer example using these states later on in
lecture
38
Process Control Block
Pointer
Process State
Process Number
Program Counter
Another question for the class: Why do we need
each one of these elements?
Registers
Scheduling Info
Memory Limits
I/O Status Info
Accounting Info
39
Process Scheduling
[Diagram: the ready queue feeds the CPU; a running process may issue
an I/O request (enter an I/O queue, rejoin ready after the I/O), have
its time slice expire, fork a child (rejoin after the child executes),
or wait for an interrupt (rejoin when the interrupt occurs)]
40
(Process) Scheduling Algorithms
  • First-Come, First-Served
  • Shortest-Job-First
  • Priority
  • Round-Robin
  • Multilevel Queue
  • Multilevel Feedback Queue

41
Average Memory Access Time
AMAT = HitTime + (1 - h) x MissPenalty
  • Hit time: basic time of every access.
  • Hit rate (h): fraction of accesses that hit
  • Miss penalty: extra time to fetch a block from a
    lower level, including time to replace in CPU

42
The Full Memory Hierarchy: always reuse a good
idea
(capacity / access time / cost, from upper level to lower level,
with the staging/transfer unit moved between levels; upper levels
are faster, lower levels are larger)
  • CPU Registers: 100s of bytes, <10s ns; prog./compiler moves
    1-8 byte Instr. Operands
  • Cache: K bytes, 10-100 ns, 1-0.1 cents/bit; cache cntl moves
    8-128 byte Blocks
  • Main Memory: M bytes, 200-500 ns, .0001-.00001 cents/bit; OS
    moves 4K-16K byte Pages
  • Disk: G bytes, 10 ms (10,000,000 ns), 10^-5 - 10^-6 cents/bit;
    user/operator moves Mbyte Files
  • Tape: infinite capacity, sec-min access time, 10^-8 cents/bit
43
(Memory) Caches: where we put data
[Diagram: a cache of 8 block frames (1-8) and memory blocks 1, 2,
3, ...; where can memory block 12 go?]
  • Fully associative: block 12 can go anywhere
  • Direct mapped: block 12 can go only into block 4 (12 mod 8)
  • Set associative (sets 0-3): block 12 can go anywhere in set 0
    (12 mod 4)
44
Ex. the Alpha 21064 data and instruction cache
(Memory)
[Diagram: (1) the CPU address is split into a block address (tag <21>
+ index <8>) and a block offset <5>; (2) the index selects one of 256
blocks, each holding valid <1>, tag <21>, and data <256>; (3) the
stored tag is compared (=?) with the address tag; (4) a 4:1 mux
selects the data out; misses go to lower level memory]
45
Ex. Cache Math (Memory)
  • First, the address coming into the cache is
    divided into two fields
  • 29-bit block address and a 5-bit block offset
  • The block address is further divided into
  • An address tag and a cache index
  • The cache index selects the tag to be tested to
    see if the desired block is in the cache
  • Size of the index depends on cache size, block
    size, and set associativity
  • So, the index is 8 bits wide and the tag is 29 - 8
    = 21 bits

46
Memory access equations
  • Using what we defined on the previous slide, we
    can say:
  • Memory stall clock cycles =
  • Reads x Read miss rate x Read miss penalty +
  • Writes x Write miss rate x Write miss penalty
  • Often, reads and writes are combined/averaged:
  • Memory stall cycles =
  • Memory accesses x Miss rate x Miss penalty
    (approximation)
  • It's also possible to factor in instruction count
    to get a complete formula

47
A clear explanation of pages vs. segments!
(Virtual Memory)
Paging (much like a cache!):
  • Virtual address = Page | Offset
  • The offset takes you to a specific word
  • Can concatenate the physical frame number w/the offset, as
    all pages are the same size
Segmentation (pages no longer the same size):
  • Now, 2 fields of the virtual address have variable length
  • One specifies the segment
  • The other is the offset
  • Virtual address = Segment | Offset
  • The problem: this offset length can vary
48
Address Translation in a Paging System (Virtual
Memory)
49
Translation in a Segmentation System (Virtual
Memory)
This could use all 32 bits; can't just
concatenate
50
A Memory Hierarchy Flow Chart
[Flow chart:]
  • Virtual address -> TLB access
  • TLB miss? -> try to read the page table (TLB miss stall); on a
    page fault, replace the page from disk; otherwise set the entry
    in the TLB and retry
  • TLB hit, write? -> cache/buffer memory write
  • TLB hit, read? -> try to read from cache
  • Cache hit? -> deliver data to CPU
  • Cache miss? -> cache miss stall
51
A disk, pictorially (I/O)
  • When accessing data we read or write to a sector
  • All sectors are the same size; outer tracks are just less
    dense
  • To read or write, a movable arm with a read/write
    head moves over each surface
  • Cylinder: all tracks under the arms at a given
    point on all surfaces
  • To read or write:
  • The disk controller moves the arm over the proper track (a
    seek)
  • The time to move is called the seek time
  • When the sector is found, the data is transferred

52
Disk Terminology (I/O)
Cylinder Track 'x' on all platters/surfaces
53
Disk Device Terminology (I/O)
  • Several platters, with information recorded
    magnetically on both surfaces (usually)
  • Bits recorded in tracks, which in turn are divided
    into sectors (e.g., 512 Bytes)
  • Actuator moves the head (end of arm, 1/surface) over the
    track (seek), selects the surface, waits for the sector to
    rotate under the head, then reads or writes
  • Cylinder: all tracks under the heads

54
Ex. average disk access time (I/O)
  • What is the average time to read or write a
    512-byte sector for a typical disk?
  • The average seek time is given to be 9 ms
  • The transfer rate is 4 MB per second
  • The disk rotates at 7200 RPM
  • The controller overhead is 1 ms
  • The disk is currently idle before any requests
    are made (so there is no queuing delay)
  • Average disk access time =
  • average seek time + average rotational delay +
    transfer time + controller overhead

55
Allocation Strategies (I/O)
  • Fixed contiguous regions
  • Contiguous regions with overflow areas
  • Linked allocation
  • File Allocation Table (FAT) MS-DOS
  • Indexed Allocation
  • Multilevel Indexed Allocation
  • Hybrid (BSD Unix)

56
Disk Scheduling Algorithms (I/O)
  • First come, First served
  • Shortest seek time first
  • SCAN (elevator algorithm)
  • C-SCAN
  • Look
  • C-Look

(Look: same as SCAN but reverses direction if there are no more
requests in the scan direction; leads to better
performance than SCAN)
57
Speedup (Parallel Processing): metric for
performance on latency-sensitive applications
  • Time(1) / Time(P) for P processors
  • note: must use the best sequential algorithm for
    Time(1) -- the parallel algorithm may be
    different.

[Graph: speedup vs. processors (1, 2, 4, 8, 16, 32, 64); linear
speedup is the ideal; typical curves roll off with some # of
processors; occasionally you see superlinear speedup... why?]
58
Speedup Challenge (Parallel Processing)
  • To get the full benefit of parallelism you need to be
    able to parallelize the entire program!
  • Amdahl's Law:
  • Timeafter = (Timeaffected / Improvement) + Timeunaffected
  • Example: We want 100 times speedup with 100
    processors
  • Timeunaffected = 0!!!

59
Shared-Memory Hardware (1): Hardware and
programming model don't have to match, but this
is the mental model for shared-memory programming
  • Memory centralized with uniform access time
    (UMA) and bus interconnect, I/O
  • Examples: Dell Workstation 530, Sun Enterprise,
    SGI Challenge
  • Typical:
  • 1 cycle to local cache
  • 20 cycles to remote cache
  • 100 cycles to memory

60
Shared-Memory Hardware (2)
  • Variation: memory is not centralized. Called
    non-uniform access time (NUMA)
  • Shared memory accesses are converted into a
    messaging protocol (usually by hardware)
  • Examples: DASH/Alewife/FLASH (academic), SGI
    Origin, Compaq GS320, Sequent (IBM) NUMA-Q

61
Multiprocessor Cache Coherency
  • Means that values in cache and memory are
    consistent or that we know they are different and
    can act accordingly
  • Considered to be a good thing.
  • Becomes more difficult with multiple processors
    and multiple caches!
  • Popular technique: Snooping!
  • Write-invalidate
  • Write-update

62
Cache coherence protocols (Multiprocessors)
  • Directory Based
  • Whether or not some physical memory location is
    shared is recorded in 1 central location
  • Called the directory
  • Snooping
  • Every cache with entries from the centralized
    main memory also has that particular block's
    sharing status
  • No centralized state is kept
  • Caches are connected to the shared memory bus
  • Whenever there is bus traffic, the caches check
    (or snoop) to see if they have the block being
    transferred on the bus

63
What is a Thread?
  • Basic unit of CPU utilization
  • A lightweight process (LWP)
  • Consists of
  • Program Counter
  • Register Set
  • Stack Space
  • Shares with peer threads
  • Code
  • Data
  • OS Resources
  • Open files
  • Signals

64
Threads
Recall from board: code, data, files shared; no
process context switching
  • Can be context switched more easily
  • Registers and PC
  • Not memory management
  • Can run on different processors concurrently in
    an SMP
  • Share CPU in a uniprocessor
  • May (will) require concurrency control
    programming like mutex locks.

This is why we talked about critical sections,
etc. first
65
Classic CS Problem: Producer Consumer (Threads)
  • Producer:
  • if (!full) {
  •   add item to buffer
  •   empty = FALSE
  •   if (buffer_is_full)
  •     full = TRUE
  • }
  • Consumer:
  • if (!empty) {
  •   remove item from buffer
  •   full = FALSE
  •   if (buffer_is_empty)
  •     empty = TRUE
  • }

66
Example Producer Threads Program
  • while (forever) {
  •   // produce item
  •   pthread_mutex_lock(&padlock);
  •   while (full)
  •     pthread_cond_wait(&non_full, &padlock);
  •   // add item to buffer
  •   buffercount++;
  •   if (buffercount == BUFFERSIZE)
  •     full = TRUE;
  •   empty = FALSE;
  •   pthread_mutex_unlock(&padlock);
  •   pthread_cond_signal(&non_empty);
  • }

67
Example Consumer Threads Program
  • while (forever) {
  •   pthread_mutex_lock(&padlock);
  •   while (empty)
  •     pthread_cond_wait(&non_empty, &padlock);
  •   // remove item from buffer
  •   buffercount--;
  •   full = FALSE;
  •   if (buffercount == 0)
  •     empty = TRUE;
  •   pthread_mutex_unlock(&padlock);
  •   pthread_cond_signal(&non_full);
  •   // consume_item
  • }

68
Things to know? (Threads)
  • 1. The reason threads are around?
  • 2. Benefits of increased concurrency?
  • 3. Why do we need software controlled "locks"
    (mutexes) of shared data?
  • 4. How can we avoid potential deadlocks/race
    conditions.
  • 5. What is meant by producer/consumer thread
    synchronization/communication using pthreads?
  • 6. Why use a "while" loop around a
    pthread_cond_wait() call?
  • 7. Why should we minimize lock scope (minimize
    the extent of code within a lock/unlock block)?
  • 8. Do you have any control over thread
    scheduling?

69
Locks and condition variables (Threads)
  • A semaphore really serves two purposes
  • Mutual exclusion: protect shared data
  • Always a binary semaphore
  • Synchronization: temporally coordinate events
  • One thread waits for something; the other thread
    signals when it's available
  • Idea:
  • Provide this functionality in two separate
    constructs
  • Locks: Provide mutual exclusion
  • Condition variables: Provide synchronization
  • Like semaphores, locks and condition variables
    are language-independent, and are available in
    many programming environments

70
Locks and condition variables (Threads)
  • Locks
  • Provide mutually exclusive access to shared data
  • A lock can be locked or unlocked
  • (Sometimes called busy or free)
  • Can be implemented
  • Trivially by binary semaphores
  • (create a private lock semaphore, use P and V)
  • By lower-level constructs, much like semaphores
    are implemented

71
Locks and condition variables (Threads)
  • Example conventions
  • Before accessing shared data, call
    Lock::Acquire() on a specific lock
  • Complain (via ASSERT) if a thread tries to acquire
    a lock it already has
  • After accessing shared data, call
    Lock::Release() on the same lock
  • Example:
  • Thread A              Thread B
  • milk->Acquire()       milk->Acquire()
  • if (noMilk)           if (noMilk)
  •   buy milk              buy milk
  • milk->Release()       milk->Release()

72
Locks and condition variables (Threads)
  • Consider the following code
  • Queue::Add()          Queue::Remove()
  •   lock->Acquire()       lock->Acquire()
  •   add item              if item on queue, remove item
  •   lock->Release()       lock->Release()
  •                         return item
  • Queue::Remove will only return an item if there's
    already one in the queue

73
Locks and condition variables (Threads)
  • If the queue is empty, it might be more desirable
    for Queue::Remove to wait until there is
    something to remove
  • Can't just go to sleep:
  • If it sleeps while holding the lock, no other
    thread can access the shared queue, add an item
    to it, and wake up the sleeping thread
  • Solution
  • Condition variables will let a thread sleep
    inside a critical section
  • By releasing the lock while the thread sleeps

74
Locks and condition variables (Threads)
  • Condition Variables: coordinate events
  • Example (generic) syntax:
  • Condition(name): create a new instance of
    class Condition (a condition variable) with the
    specified name
  • After creating a new condition, the programmer
    must call Lock::Lock() to create a lock that will
    be associated with that condition variable
  • Condition::Wait(conditionLock): release the lock
    and wait (sleep); when the thread wakes up,
    immediately try to re-acquire the lock; return
    when it has the lock
  • Condition::Signal(conditionLock): if threads are
    waiting on the lock, wake up one of those threads
    and put it on the ready list
  • Otherwise, do nothing

75
Locks and condition variables (Threads)
  • Condition::Broadcast(conditionLock): if threads
    are waiting on the lock, wake up all of those
    threads and put them on the ready list; otherwise
    do nothing
  • IMPORTANT
  • A thread must hold the lock before calling Wait,
    Signal, or Broadcast
  • Can be implemented
  • Carefully by higher-level constructs (create and
    queue threads, sleep and wake up threads as
    appropriate)
  • Carefully by binary semaphores (create and queue
    semaphores as appropriate, use P and V to
    synchronize)
  • Carefully by lower-level constructs, much like
    semaphores are implemented

76
Locks and condition variables (Threads)
  • Associated with a data structure is both a lock
    and a condition variable
  • Before the program performs an operation on the
    data structure, it acquires the lock
  • If it needs to wait until another operation puts
    the data structure into an appropriate state, it
    uses the condition variable to wait
  • (see next slide for example)

77
Locks and condition variables (Threads)
  • Unbounded-buffer producer-consumer
  • Lock lk; int avail = 0;
  • Condition c;
  • /* producer */          /* consumer */
  • while (1) {             while (1) {
  •   lk->Acquire();          lk->Acquire();
  •   produce next item       if (avail == 0)
  •   avail++;                  c->Wait(lk);
  •   c->Signal(lk);          consume next item
  •   lk->Release();          avail--;
  • }                         lk->Release();
  •                         }

78
Locks and condition variables (Threads)
  • Semaphores and condition variables are pretty
    similar; perhaps we can build condition
    variables out of semaphores
  • Does this work?
  • Condition::Wait()       Condition::Signal()
  •   sema->P()               sema->V()
  • NO! We're going to use these condition
    operations inside a lock. What happens if we use
    semaphores inside a lock?

79
Locks and condition variables (Threads)
  • How about this?
  • Condition::Wait()       Condition::Signal()
  •   lock->Release()         sema->V()
  •   sema->P()
  •   lock->Acquire()
  • How do semaphores and condition variables differ
    with respect to keeping track of history?

80
Locks and condition variables (Threads)
  • Semaphores have a value, CVs do not!
  • On a semaphore signal (a V), the value of the
    semaphore is always incremented, even if no one
    is waiting
  • Later on, if a thread does a semaphore wait (a
    P), the value of the semaphore is decremented and
    the thread continues
  • On a condition variable signal, if no one is
    waiting, the signal has no effect
  • Later on, if a thread does a condition variable
    wait, it waits (it always waits!)
  • It doesn't matter how many signals have been made
    beforehand

81
(Networks) Performance parameters
  • Bandwidth
  • Maximum rate at which the interconnection network can
    propagate data once a message is in the network
  • Usually headers/overhead bits are included in the
    calculation
  • Units are usually megabits/second, not
    megabytes
  • Sometimes see throughput:
  • Network bandwidth delivered to an application
  • Time of Flight
  • Time for the 1st bit of the message to arrive at the receiver
  • Includes delays of repeaters/switches; = length /
    (u x speed of light), where u is a property of the
    transmission material
  • Transmission Time
  • Time required for the message to pass through the
    network
  • = size of message divided by the bandwidth

82
(Networks) More performance parameters
  • Transport latency
  • = Time of flight + transmission time
  • Time the message spends in the interconnection network,
  • but not the overhead of pulling it out of or pushing it
    into the network
  • Sender overhead
  • Time for the uP to inject a message into the
    interconnection network, including both HW and SW
    components
  • Receiver overhead
  • Time for the uP to pull a message out of the
    interconnection network, including both HW and SW
    components
  • So, the total latency of a message is:
  • Total latency = Sender overhead + Time of flight +
    Transmission time + Receiver overhead

83
Metrics graphically
84
(Networks) An example
  • Consider a network with the following parameters
  • Network has a bandwidth of 10 Mbit/sec
  • We're assuming no contention for this bandwidth
  • Sending overhead: 230 usec; receiving overhead:
    270 usec
  • We want to send a message of 1000 bytes
  • This includes the header
  • It will be sent as 1 message (no need to split it
    up)
  • What's the total latency to send the message to a
    machine
  • 100 m apart (assume no repeater delay for
    this)
  • 1000 km apart (not realistic to assume no
    repeater delay)

85
(Networks) An example, continued
  • We'll use the facts that
  • The speed of light is 299,792.5 km/s
  • Our u value is 0.5
  • This means we have a good fiber optic or coaxial
    cable (more later)
  • Let's use: Total latency = Sender overhead + Time of
    flight + Transmission time + Receiver overhead
  • If the machines are 100 m apart
  • If the machines are 1000 km apart

86
(Networks) Some more odds and ends
  • Note from the example (with regard to longer
    distance):
  • Time of flight dominates the total latency
  • Repeater delays would factor significantly into
    the equation
  • Message transmission failure rates rise
    significantly
  • It's possible to send other messages with no
    responses from previous ones
  • If you have control of the network
  • Can help increase network use by overlapping
    overheads and transport latencies
  • Can simplify the total latency equation to:
  • Total latency = Overhead + (Message
    size / Bandwidth)
  • Leads to:
  • Effective bandwidth = Message size / Total latency

87
(Networks) Switched vs. shared
[Diagram: shared media (Ethernet): several nodes attached to one
shared bus; switched media (ATM): nodes connected through a switch
(a.k.a. data switching interchanges, multistage interconnection
networks, interface message processors)]
88
(Networks) Connection-Based vs. Connectionless
  • Telephone: an operator sets up the connection between
    the caller and the receiver
  • Once the connection is established, the conversation
    can continue for hours
  • Share transmission lines over long distances by
    using switches to multiplex several conversations
    on the same lines
  • Time division multiplexing: divide the bandwidth of the
    transmission line into a fixed number of slots,
    with each slot assigned to a conversation
  • Problem: lines are busy based on the number of
    conversations, not the amount of information sent
  • Advantage: reserved bandwidth

(see board for ex.)
89
(Networks) Connection-Based vs. Connectionless
  • Connectionless: every package of information must
    have an address -> packets
  • Each package is routed to its destination by
    looking at its address
  • Analogy: the postal system (sending a letter)
  • Also called statistical multiplexing
  • Note: Split-phase buses are sending packets

90
(Networks) Store and Forward vs. Cut-Through
  • Store-and-forward policy: each switch waits for
    the full packet to arrive in the switch before
    sending it to the next switch (good for WANs)
  • Cut-through routing or wormhole routing: the
    switch examines the header, decides where to send
    the message, and then starts forwarding it
    immediately
  • In worm hole routing, when head of message is
    blocked, message stays strung out over the
    network, potentially blocking other messages
    (needs only buffer the piece of the packet that
    is sent between switches).
  • Cut through routing lets the tail continue when
    head is blocked, accordioning the whole message
    into a single switch. (Requires a buffer large
    enough to hold the largest packet).
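The latency trade-off between the two policies can be sketched in Python (idealized: uniform link bandwidth, no contention; the packet/header sizes below are made up):

```python
# Store-and-forward: every hop waits for the whole packet, so the full
# transmission time is paid once per hop.
def store_and_forward_latency(packet_bits, bw_bits_s, hops):
    return hops * (packet_bits / bw_bits_s)

# Cut-through: only the header is examined per hop; the packet body is
# pipelined through the switches, so its transmission time is paid once.
def cut_through_latency(packet_bits, header_bits, bw_bits_s, hops):
    return hops * (header_bits / bw_bits_s) + packet_bits / bw_bits_s

# Example: 8000-bit packet, 80-bit header, 1 Mb/s links, 4 hops (assumed).
saf = store_and_forward_latency(8000, 1e6, hops=4)   # 0.032 s
ct = cut_through_latency(8000, 80, 1e6, hops=4)      # 0.00832 s
```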

91
(Networks) Broadband vs. Baseband
  • A baseband network has a single channel that is
    used for communication between stations. Ethernet
    specifications which use BASE in the name refer
    to baseband networks.
  • BASE refers to BASE BAND signaling. Only
    Ethernet signals are carried on the medium
  • A broadband network is much like cable
    television, where different services communicate
    across different frequencies on the same cable.
  • Broadband communications would allow an Ethernet
    network to share the same physical cable as voice
    or video services. 10BROAD36 is an example of
    broadband networking.

92
(Networks) Ethernet
  • The various Ethernet specifications include a
    maximum distance
  • What do we do if we want to go further?
  • Repeater
  • Hardware device used to extend a LAN
  • Amplifies all signals on one segment of a LAN and
    transmits them to another
  • Passes on whatever it receives (GIGO)
  • Knows nothing of packets, addresses
  • Any limit?

93
(Networks) Bridges
  • We want to improve performance over that provided
    by a simple repeater
  • Add functionality (i.e. more hardware)
  • Bridge can detect if a frame is valid and then
    (and only then) pass it to next segment
  • Bridge does not forward interference or other
    problems
  • Computers connected over a bridged LAN don't know
    that they are communicating over a bridge

94
(Networks) Network Interface Card
  • NIC
  • Sits on the host station
  • Allows a host to connect to a hub or a bridge
  • Hub: merely extends multiple segments into a
    single LAN; does not help with performance since
    only 1 message can transmit at a time
  • If connected to a hub, then the NIC has to use
    half-duplex mode of communication (i.e. it can
    only send or receive at any one time)
  • If connected to a bridge, then NIC (if it is
    smart) can use either half/full duplex mode
  • Bridges learn the Media Access Control (MAC)
    address and the speed of the NIC they are talking
    to.

95
(Networks) Routers
  • Routers
  • Devices that connect LANs to WANs or WANs to WANs
  • Resolve incompatible addresses (generally slower
    than bridges)
  • Divide interconnection networks into smaller
    subnets, which simplifies manageability and
    security
  • Work much like bridges
  • Pay attention to the upper network layer (OSI
    layer 3) protocols rather than physical layer (OSI
    layer 1) protocols
  • (This will make sense later)
  • Will decide whether to forward a packet by
    looking at the protocol level addresses (for
    instance, TCP/IP addresses) rather than the MAC
    address.
  • (This will make sense later)

96
(Protocols) Recall
  • A protocol is the set of rules used to describe
    all of the hardware and (mostly) software
    operations used to send messages from Processor A
    to Processor B
  • Common practice is to attach headers/trailers to
    the actual payload forming a packet or frame.

97
(Protocols) Layering Advantages
  • Layering allows functionally partitioning the
    responsibilities (similar to having procedures
    for modularity in writing programs)
  • Allows easily integrating (plug and play) new
    modules at a particular layer without any changes
    to the other layers
  • Rigidity is only at the level of the interfaces
    between the layers, not in the implementation of
    these interfaces
  • By specifying the interfaces judiciously,
    inefficiencies can be avoided

98
(Protocols) ISO Model Examples
7  Application    user program, e.g. FTP
6  Presentation
5  Session        sockets open/close/read/write interface
4  Transport      TCP: reliable infinite-length stream     (kernel software)
3  Network        IP: unreliable datagrams anywhere in world (kernel software)
2  Data Link      Ethernet: unreliable datagrams on local segment (hardware)
1  Physical       10baseT ethernet spec, twisted pair w/RJ45s (hardware)
99
(Protocols) Layering Summary
  • Key to protocol families is that communication
    occurs logically at the same level of the
    protocol, called peer-to-peer,
  • but is implemented via services at the next lower
    level
  • Encapsulation carry higher level information
    within lower level envelope
  • Fragmentation break packet into multiple smaller
    packets and reassemble
  • Danger is each level increases latency if
    implemented as hierarchy (e.g., multiple check
    sums)
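Encapsulation, carrying higher-level information inside a lower-level envelope, can be illustrated with a short Python sketch (the header strings are placeholders, not real frame formats):

```python
# Each layer wraps the payload from the layer above in its own header.
# This also shows the danger noted above: every layer adds bytes (and,
# in a real stack, latency for things like checksums).

def encapsulate(payload, headers):
    # App data gets wrapped by TCP, then IP, then Ethernet (innermost first).
    for h in reversed(headers):
        payload = h + "|" + payload
    return payload

frame = encapsulate("data", ["ETH", "IP", "TCP"])
# -> "ETH|IP|TCP|data"
```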

100
Techniques Protocols Use
  • Sequencing for Out-of-Order Delivery
  • Sequencing to Eliminate Duplicate Packets
  • Retransmitting Lost Packets
  • Avoiding Replay Caused by Excessive Delay
  • Flow Control to Prevent Data Overrun
  • Mechanism to Avoid Network Congestion
  • Name Resolution (external to protocol really)
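Two of these techniques, sequencing for out-of-order delivery and duplicate elimination, can be sketched together in Python (the packet contents below are made up):

```python
def reassemble(packets):
    """packets: list of (seq, data) pairs, possibly duplicated or out of order."""
    seen = {}
    for seq, data in packets:
        if seq not in seen:          # sequencing eliminates duplicate packets
            seen[seq] = data
    return [seen[s] for s in sorted(seen)]  # reorder by sequence number

# Packet 2 arrives first and is also duplicated (e.g. a retransmission):
msg = reassemble([(2, "lo"), (1, "hel"), (2, "lo"), (3, "!")])
# -> ["hel", "lo", "!"]
```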

101
Internetworking (Protocols)
  • Different networking solutions exist
  • Why? No single networking technology is best for
    all needs
  • Universal service
  • System where arbitrary pairs of computers can
    communicate
  • Increases productivity
  • Networks, by themselves, are incompatible with
    universal service
  • Solution Internetworking or an internet

Literally Communicating between networks of the
same and/or different types
102
Encapsulate universal packets in (any) local
network frame format
Used to send a msg. from one network to another (or
within the same network), but we want a uniform
standard.
Frame Header
Frame Data
Used to communicate within one network
103
Physical Network Connection (Protocols)
Router
Router facilitates communication
between networks
Individual Networks
Each cloud represents arbitrary network
technology LAN, WAN, ethernet, token ring, ATM,
etc.
104
Layered Model (Protocols)
TCP/IP Model
5  Application
4  Transport
3  Internet
2  Network Interface
1  Physical
105
(Protocols) Layer upon layer upon layer...
  • Layer 1 Physical
  • Basic network hardware (same as ISO model Layer
    1)
  • Layer 2 Network Interface
  • How to organize data into frames and how to
    transmit over network (similar to ISO model Layer
    2)
  • Layer 3 Internet
  • Specify format of packets sent across the
    internet as well as forwarding mechanisms used by
    routers
  • Layer 4 Transport
  • Like ISO Layer 4; specifies how to ensure
    reliable transfer
  • Layer 5 Application
  • Corresponds to ISO Layers 6 and 7. Each Layer 5
    protocol specifies how one application uses an
    internet

106
IP Addressing
  • Each host in the internet must have a unique
    address
  • Users, application programs and software
    operating in the higher layers of the protocol
    stack use these addresses
  • In the IP protocol each host is assigned a unique
    32 bit address. Any packet destined for a host on
    the internet will contain the destination IP
    address.

107
IP Address Hierarchy
  • Addresses are broken into a prefix and a suffix
    for routing efficiency
  • The Prefix is uniquely assigned to an individual
    network.
  • The Suffix is uniquely assigned to a host within
    a given network

(Figure: Network 1 and Network 2, each with hosts
numbered by suffix; suffix numbers can repeat across
networks)
108
Five Classes of IP Address
Primary Classes
109
Computing the Class (IP)
110
Classes and Dotted Decimal (IP)
  • Class A: 0 through 127
  • Class B: 128 through 191
  • Class C: 192 through 223
  • Class D: 224 through 239
  • Class E: 240 through 255
Does this mean there are 64 Class B networks?
Does this mean there are 32 Class C networks?
(on the board)
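The class can be computed from the first dotted-decimal value alone; a small Python sketch using the ranges from the slide above:

```python
def ip_class(first_octet):
    """Return the primary class of an IPv4 address from its first octet."""
    if first_octet < 128:
        return "A"   # leading bit 0
    if first_octet < 192:
        return "B"   # leading bits 10
    if first_octet < 224:
        return "C"   # leading bits 110
    if first_octet < 240:
        return "D"   # leading bits 1110
    return "E"       # leading bits 1111

cls = ip_class(131)  # e.g. 131.108.99.5 is a class B address
```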
111
Division of the Address Space (IP)
Address Class   Bits in Prefix   Maximum Number   Bits in Suffix   Maximum Number of
                                 of Networks                       Hosts per Network
A               7                128              24               16,777,216
B               14               16,384           16               65,536
C               21               2,097,152        8                256
(on the board)
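The counts in the table follow directly from the bit split: 2^(prefix bits) networks and 2^(suffix bits) hosts per network. A quick Python check (ignoring any reserved values):

```python
# (prefix bits, suffix bits) per class, as in the table above.
splits = {"A": (7, 24), "B": (14, 16), "C": (21, 8)}

# (max networks, max hosts per network) = (2**prefix, 2**suffix)
counts = {cls: (2 ** p, 2 ** s) for cls, (p, s) in splits.items()}
# counts["B"] == (16384, 65536)
```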
112
Special IP Addresses
  • Network Address
  • Directed Broadcast Address
  • Limited Broadcast Address
  • This Computer Address
  • Loopback Address
  • Berkeley Broadcast Address Form

113
Routers and IP Addressing
  • Each host has an address
  • Each router has two (or more) addresses!
  • Why?
  • A router has connections to multiple physical
    networks
  • Each IP address contains a prefix that specifies
    a physical network
  • An IP address does not really identify a specific
    computer but rather a connection between a
    computer and a network.
  • A computer with multiple network connections
    (e.g. a router) must be assigned an IP address
    for each connection

114
Example (IP)
(Figure: router connecting three networks, Ethernet
131.108.0.0, Token Ring 223.240.129.0, and WAN
78.0.0.0, with example addresses 131.108.99.5,
223.240.129.2, 223.240.129.17, and 78.0.0.17)
Note!
115
How to Resolve Addresses (IP)
  • Table Lookup
  • Store bindings/mapping in table which software
    can search
  • Closed-form Computation
  • Protocol addresses are chosen to allow
    computation of hardware address from protocol
    address using basic boolean and arithmetic
    operations
  • Message Exchange
  • Computers exchange messages across a network to
    resolve addresses. One computer sends a message
    requesting a translation and another computer
    replies

(more detail about items 1-3 on earlier slide)
116
Use of Default Routes (IP)
Simplified further

Node 1           Node 2           Node 3           Node 4
Dest  Next Hop   Dest  Next Hop   Dest  Next Hop   Dest  Next Hop
1     -          1     (2,1)      3     -          3     (4,3)
*     (1,2)      2     -          4     (3,4)      4     -
                 3     (2,3)      *     (3,2)      *     (4,2)
                 4     (2,4)
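Lookup against such a table reduces to "exact match, else default"; a minimal Python sketch (here "*" stands for the default/wildcard entry, and the table contents are illustrative):

```python
def next_hop(table, dest):
    """Return the next hop for dest, falling back to the default ("*") route."""
    return table[dest] if dest in table else table["*"]

# Node 1 only needs two entries: itself, and a default route via link (1,2).
node1 = {1: "deliver locally", "*": (1, 2)}
```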
117
Error Reporting (ICMP)
  • TCP/IP includes a protocol used by IP to send
    messages when problems are detected Internet
    Control Message Protocol
  • IP uses ICMP to signal problems
  • ICMP uses IP to send messages
  • When IP detects an error (e.g. corrupt packet) it
    sends an ICMP packet

118
ICMP Message Transport
119
Services Provided by TCP
  • Connection Orientation
  • Point-To-Point Communication
  • Complete Reliability
  • Full Duplex Communication
  • Stream Interface
  • Reliable Connection Startup
  • Graceful Connection Shutdown

120
End to End Services (TCP)
  • TCP provides a connection from one application on
    a computer to an application on a remote computer
  • Connection is virtual - provided by software
    passing messages
  • TCP messages are encapsulated in IP Datagrams
  • Upon arrival IP passes the TCP message on to the
    TCP layer.
  • TCP exists at both ends of the connection but not
    at intermediate points (routers).

121
Remote Procedure Call
(Figure: network nodes, each with an Application
Layer, RPC Layer, and Network Layer)
122
Remote Procedure Call
  • Client can run other processes while waiting for
    server to service request
  • Natural interface... just like procedure call
  • Implementation issues?

123
Remote Procedure Call (client/kernel/server message
exchange)
  • User calls kernel to send RPC message to
    procedure X
  • Kernel sends message to nameserver to find port
    number (To server, From client, Port nameserver,
    Re address for RPC X)
  • Nameserver receives message, looks up answer
  • Nameserver replies to client with port P (To
    client, From server, Port kernel, Re RPC X, Port P)
  • Kernel places port P in user RPC message and
    sends the RPC (From client, To server, Port P,
    ltContentsgt)
  • Daemon listening to port P receives message
  • Daemon processes request and sends output (From
    RPC Port P, To client Port kernel, ltOutputgt)
  • Kernel receives reply, passes it to user
124
RPC
  • User
  • No different from any program making use of
    procedure calls
  • User stub
  • Marshall arguments
  • Identify target procedure
  • Hand over to RPC runtime to deliver to callee
  • Server stub
  • Unmarshall the arguments
  • Make a normal procedure call in server using the
    arguments
  • User and server code part of distributed app
  • Stubs generated automatically
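What the stubs automate can be sketched in Python. The JSON wire format, the function names, and the PROCEDURES registry below are all assumptions for illustration; a real RPC runtime would send these messages over the network rather than calling the server stub directly:

```python
import json

PROCEDURES = {"add": lambda a, b: a + b}  # procedures the server exposes (assumed)

def client_stub(proc_name, *args):
    # Marshall arguments and identify the target procedure.
    request = json.dumps({"proc": proc_name, "args": list(args)})
    reply = server_stub(request)            # stands in for the RPC runtime
    return json.loads(reply)["result"]      # unmarshall the result

def server_stub(request):
    msg = json.loads(request)               # unmarshall the arguments
    # Make a normal procedure call in the server using the arguments.
    result = PROCEDURES[msg["proc"]](*msg["args"])
    return json.dumps({"result": result})   # marshall the result back

answer = client_stub("add", 2, 3)           # looks like a local procedure call
```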

125
RPC
(Figure: RPC runtime, five pieces of program involved)
  • Caller (Client) Node: user (call, return) -> user
    stub (pack args, unpack result) -> RPC runtime
    (transmit, wait, receive)
  • Callee (Server) Node: RPC runtime (receive,
    transmit) -> server stub (unpack args, pack
    result) -> server (call, execute, return)