CS252 Graduate Computer Architecture Lecture 23 I/O Continued

Slides: 76

Provided by: davidapa6

Learn more at: https://people.eecs.berkeley.edu

1
CS252 Graduate Computer Architecture
Lecture 23: I/O Continued

November 17, 1999
Prof. John Kubiatowicz

2
Review: Disk Device Terminology
Disk Latency = Queuing Time + Seek Time + Rotation Time + Xfer Time + Ctrl Time
Order of magnitude times for 4K byte transfers:
Seek: 12 ms or less
Rotate: 4.2 ms @ 7200 rpm = 0.5 rev / (7200 rpm / (60,000 ms/min)) (8.3 ms @ 3600 rpm)
Xfer: 1 ms @ 7200 rpm (2 ms @ 3600 rpm)
Ctrl: 2 ms (big variation)
Disk Latency = Queuing Time + (12 + 4.2 + 1 + 2) ms = QT + 19.2 ms
Average Service Time = 19.2 ms
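The service-time arithmetic on this slide can be checked with a short sketch. The function names are illustrative assumptions, not part of the lecture; the inputs are the slide's order-of-magnitude figures.

```python
def rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: half a revolution, in milliseconds."""
    ms_per_rev = 60_000.0 / rpm  # 60,000 ms per minute
    return 0.5 * ms_per_rev


def disk_service_time_ms(seek_ms: float, rpm: float,
                         xfer_ms: float, ctrl_ms: float) -> float:
    """Seek + rotation + transfer + controller time (queuing time excluded)."""
    return seek_ms + rotational_latency_ms(rpm) + xfer_ms + ctrl_ms


# Slide values: 12 ms seek, 7200 rpm, 1 ms transfer, 2 ms controller.
service = disk_service_time_ms(12.0, 7200, 1.0, 2.0)
print(f"average service time: {service:.1f} ms")
```

Queuing time is left out on purpose: it depends on load, which the queueing theory slides that follow address.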
3
But What about queue time? Or why nonlinear response
[Figure: Response Time (ms), 0 to 300, plotted against Throughput (% total BW), 0 to 100]
Metrics: Response Time, Throughput
Response time = Queue + Device Service time
4
Departure to discuss queueing theory

(On board)

5
Introduction to Queueing Theory
Arrivals
Departures

More interested in long term, steady state than in startup => Arrivals = Departures
Little's Law: Mean number tasks in system = arrival rate x mean response time
Observed by many, Little was first to prove
Applies to any system in equilibrium, as long as nothing in black box is creating or destroying tasks
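A minimal sketch of Little's Law as stated on this slide. The function name and the sample numbers (10 tasks/s, 25 ms response time) are assumptions chosen for illustration.

```python
def mean_tasks_in_system(arrival_rate_per_s: float,
                         mean_response_time_s: float) -> float:
    """Little's Law: mean number of tasks = arrival rate x mean response time."""
    return arrival_rate_per_s * mean_response_time_s


# Hypothetical steady-state system: 10 tasks/s arriving, 25 ms mean response.
tasks = mean_tasks_in_system(10.0, 0.025)
print(f"mean tasks in system: {tasks:.2f}")
```

The law makes no assumption about the arrival or service distributions; it only requires that the system be in equilibrium.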

6
A Little Queuing Theory: Notation

Queuing models assume state of equilibrium: input rate = output rate
Notation:
r: average number of arriving customers/second
Tser: average time to service a customer (traditionally µ = 1/Tser)
u: server utilization (0..1): u = r x Tser (or u = r / µ)
Tq: average time/customer in queue
Tsys: average time/customer in system: Tsys = Tq + Tser
Lq: average length of queue: Lq = r x Tq
Lsys: average length of system: Lsys = r x Tsys
Little's Law: Length_system = rate x Time_system
(Mean number customers = arrival rate x mean response time)

7
A Little Queuing Theory

Service time completions vs. waiting time for a busy server: randomly arriving event joins a queue of arbitrary length when server is busy, otherwise serviced immediately
Unlimited length queues: key simplification
A single server queue: combination of a servicing facility that accommodates 1 customer at a time (server) + waiting area (queue); together called a system
Server spends a variable amount of time with customers; how do you characterize variability?
Distribution of a random variable: histogram? curve?

8
A Little Queuing Theory

Server spends a variable amount of time with customers
Weighted mean m1 = (f1 x T1 + f2 x T2 + ... + fn x Tn)/F (F = f1 + f2 + ...)
variance = (f1 x T1^2 + f2 x T2^2 + ... + fn x Tn^2)/F - m1^2
Must keep track of unit of measure (100 ms^2 vs. 0.1 s^2)
Squared coefficient of variance: C = variance/m1^2
Unitless measure (100 ms^2 vs. 0.1 s^2)
Exponential distribution C = 1: most short relative to average, few others long; 90% < 2.3 x average, 63% < average
Hypoexponential distribution C < 1: most close to average; C = 0.5 => 90% < 2.0 x average, only 57% < average
Hyperexponential distribution C > 1: further from average; C = 2.0 => 90% < 2.8 x average, 69% < average
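The weighted mean, variance, and squared coefficient of variance defined on this slide can be computed directly from a histogram of service times. This is a sketch; the function name and the two-bucket histogram are hypothetical.

```python
def weighted_stats(freqs, times):
    """Return (m1, variance, C) for service times `times` seen `freqs` times."""
    total = sum(freqs)                                   # F = f1 + f2 + ...
    m1 = sum(f * t for f, t in zip(freqs, times)) / total
    variance = sum(f * t * t for f, t in zip(freqs, times)) / total - m1 ** 2
    c = variance / m1 ** 2                               # unitless
    return m1, variance, c


# Hypothetical histogram: half the requests take 10 ms, half take 30 ms.
m1, variance, c = weighted_stats([1, 1], [10.0, 30.0])
print(m1, variance, c)
```

Rescaling the times from milliseconds to seconds changes m1 and the variance but leaves C unchanged, which is why C is the convenient measure of variability.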

9
A Little Queuing Theory: Variable Service Time

Server spends a variable amount of time with customers
Weighted mean m1 = (f1 x T1 + f2 x T2 + ... + fn x Tn)/F (F = f1 + f2 + ...)
Squared coefficient of variance C
Disk response times C = 1.5 (majority seeks < average)
Yet usually pick C = 1.0 for simplicity
Another useful value is average time must wait for server to complete task: m1(z)
Not just 1/2 x m1 because doesn't capture variance
Can derive m1(z) = 1/2 x m1 x (1 + C)
No variance => C = 0 => m1(z) = 1/2 x m1

10
A Little Queuing Theory: Average Wait Time

Calculating average wait time in queue Tq:
If something at server, it takes to complete on average m1(z)
Chance server is busy = u; average delay is u x m1(z)
All customers in line must complete; each avg Tser
Tq = u x m1(z) + Lq x Tser = 1/2 x u x Tser x (1 + C) + Lq x Tser
Tq = 1/2 x u x Tser x (1 + C) + r x Tq x Tser
Tq = 1/2 x u x Tser x (1 + C) + u x Tq
Tq x (1 - u) = Tser x u x (1 + C) / 2
Tq = Tser x u x (1 + C) / (2 x (1 - u))
Notation:
r: average number of arriving customers/second
Tser: average time to service a customer
u: server utilization (0..1): u = r x Tser
Tq: average time/customer in queue
Lq: average length of queue: Lq = r x Tq
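The closed form for Tq can be checked against the first line of the derivation, Tq = u x m1(z) + Lq x Tser. The sketch below does that numerically; the function names and sample inputs are assumptions for illustration.

```python
def residual_service_time(t_ser: float, c: float) -> float:
    """m1(z): average time to finish the task already at the server."""
    return 0.5 * t_ser * (1.0 + c)


def queue_time(arrival_rate: float, t_ser: float, c: float) -> float:
    """Tq = Tser x u x (1 + C) / (2 x (1 - u)), valid only for u < 1."""
    u = arrival_rate * t_ser
    if not 0.0 <= u < 1.0:
        raise ValueError("utilization must be in [0, 1) for equilibrium")
    return t_ser * u * (1.0 + c) / (2.0 * (1.0 - u))


# Sample inputs: 10 requests/s, 20 ms service time, C = 1.5.
r, t_ser, c = 10.0, 0.020, 1.5
t_q = queue_time(r, t_ser, c)

# Starting point of the derivation: Tq = u x m1(z) + Lq x Tser, Lq = r x Tq.
u = r * t_ser
rhs = u * residual_service_time(t_ser, c) + (r * t_q) * t_ser
print(f"Tq = {t_q * 1000:.2f} ms, self-consistency check = {rhs * 1000:.2f} ms")
```

The guard on u reflects the equilibrium assumption: as u approaches 1 the (1 - u) term drives Tq toward infinity.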

11
A Little Queuing Theory: M/G/1 and M/M/1

Assumptions so far:
System in equilibrium
Time between two successive arrivals in line are random
Server can start on next customer immediately after prior finishes
No limit to the queue; works First-In-First-Out
Afterward, all customers in line must complete; each avg Tser
Described "memoryless" or Markovian request arrival (M for C = 1, exponentially random), General service distribution (no restrictions), 1 server: M/G/1 queue
When service times have C = 1, M/M/1 queue:
Tq = Tser x u x (1 + C) / (2 x (1 - u)) = Tser x u / (1 - u)
Tser: average time to service a customer
u: server utilization (0..1): u = r x Tser
Tq: average time/customer in queue

12
A Little Queuing Theory: An Example

processor sends 10 x 8KB disk I/Os per second, requests and service exponentially distrib., avg. disk service = 20 ms
On average, how utilized is the disk?
What is the number of requests in the queue?
What is the average time spent in the queue?
What is the average response time for a disk request?
Notation:
r: average number of arriving customers/second = 10
Tser: average time to service a customer = 20 ms (0.02s)
u: server utilization (0..1): u = r x Tser = 10/s x .02s = 0.2
Tq: average time/customer in queue = Tser x u / (1 - u) = 20 x 0.2/(1 - 0.2) = 20 x 0.25 = 5 ms (0.005s)
Tsys: average time/customer in system: Tsys = Tq + Tser = 25 ms
Lq: average length of queue: Lq = r x Tq = 10/s x .005s = 0.05 requests in queue
Lsys: average tasks in system: Lsys = r x Tsys = 10/s x .025s = 0.25
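The worked example can be reproduced with a small M/M/1 calculator. This is a sketch using the slide's notation; the function name and dictionary keys are assumptions.

```python
def mm1_metrics(arrival_rate: float, t_ser: float) -> dict:
    """M/M/1 queue metrics (C = 1); times in seconds, rate in requests/s."""
    u = arrival_rate * t_ser            # server utilization
    t_q = t_ser * u / (1.0 - u)         # average time in queue
    t_sys = t_q + t_ser                 # average response time
    return {
        "u": u,
        "Tq": t_q,
        "Tsys": t_sys,
        "Lq": arrival_rate * t_q,       # Little's Law applied to the queue
        "Lsys": arrival_rate * t_sys,   # Little's Law applied to the system
    }


# Slide inputs: 10 I/Os per second, 20 ms average service time.
for name, value in mm1_metrics(10.0, 0.020).items():
    print(f"{name} = {value:.4f}")
```

Calling `mm1_metrics(20.0, 0.012)` gives the inputs of the next example, where utilization is 0.24.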

13
A Little Queuing Theory: Another Example

processor sends 20 x 8KB disk I/Os per sec, requests and service exponentially distrib., avg. disk service = 12 ms
On average, how utilized is the disk?
What is the number of requests in the queue?
What is the average time spent in the queue?
What is the average response time for a disk request?
Notation:
r: average number of arriving customers/second = 20
Tser: average time to service a customer = 12 ms
u: server utilization (0..1): u = r x Tser = 20/s x .012s = 0.24
Tq: average time/customer in queue = Tser x u / (1 - u) = 12 x 0.24/(1 - 0.24) = 12 x 0.32 = 3.8 ms
Tsys: average time/customer in system: Tsys = Tq + Tser = 15.8 ms
Lq: average length of queue: Lq = r x Tq = 20/s x .0038s = 0.076 requests in queue
Lsys: average tasks in system: Lsys = r x Tsys = 20/s x .016s = 0.32

14
A Little Queuing Theory: Yet Another Example

Suppose processor sends 10 x 8KB disk I/Os per second, squared coef. var. (C) = 1.5, avg. disk service time = 20 ms
On average, how utilized is the disk?
What is the number of requests in the queue?
What is the average time spent in the queue?
What is the average response time for a disk request?
Notation:
r: average number of arriving customers/second = 10
Tser: average time to service a customer = 20 ms
u: server utilization (0..1): u = r x Tser = 10/s x .02s = 0.2
Tq: average time/customer in queue = Tser x u x (1 + C) / (2 x (1 - u)) = 20 x 0.2 x (2.5) / (2 x (1 - 0.2)) = 20 x 0.3125 = 6.25 ms
Tsys: average time/customer in system: Tsys = Tq + Tser = 26 ms
Lq: average length of queue: Lq = r x Tq = 10/s x .006s = 0.06 requests in queue
Lsys: average tasks in system: Lsys = r x Tsys = 10/s x .026s = 0.26

15
Manufacturing Advantages of Disk Arrays
Disk Product Families
Conventional: 4 disk designs (14", 10", 5.25", 3.5"), spanning low end to high end
Disk Array: 1 disk design (3.5")
16
Replace Small Number of Large Disks with Large Number of Small Disks! (1988 Disks)
                IBM 3390 (K)    IBM 3.5" 0061    x70
Data Capacity   20 GBytes       320 MBytes       23 GBytes
Volume          97 cu. ft.      0.1 cu. ft.      11 cu. ft.
Power           3 KW            11 W             1 KW
Data Rate       15 MB/s         1.5 MB/s         120 MB/s
I/O Rate        600 I/Os/s      55 I/Os/s        3900 I/Os/s
MTTF            250 KHrs        50 KHrs          ??? Hrs
Cost            $250K           $2K              $150K
Disk Arrays have potential for large data and I/O rates, high MB per cu. ft., high MB per KW; reliability?
17
Array Reliability

Reliability of N disks = Reliability of 1 Disk / N
50,000 Hours / 70 disks = 700 hours
Disk system MTTF: Drops from 6 years to 1 month!
Arrays (without redundancy) too unreliable to be useful!

Hot spares support reconstruction in parallel with access: very high media availability can be achieved
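The reliability estimate on this slide is a single division, sketched below. The function name is an assumption, and the model is the slide's own simplification: independent disks, with any one failure taking down the non-redundant array.

```python
HOURS_PER_YEAR = 24 * 365


def array_mttf_hours(disk_mttf_hours: float, n_disks: int) -> float:
    """MTTF of a non-redundant array: single-disk MTTF divided by disk count."""
    return disk_mttf_hours / n_disks


single = 50_000.0                      # hours, from the slide
array = array_mttf_hours(single, 70)   # 70 disks, from the slide
print(f"one disk: {single / HOURS_PER_YEAR:.1f} years")
print(f"70-disk array: {array:.0f} hours, about {array / (24 * 30):.1f} months")
```

The exact quotient is a little over 700 hours; the slide rounds it down.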
18
Redundant Arrays of Disks
Files are "striped" across multiple spindles
Redundancy yields high data availability
Disks will fail
Contents reconstructed from data redundantly stored in the array
Capacity penalty to store it
Bandwidth penalty to update
Techniques:
Mirroring/Shadowing (high capacity cost)
Horizontal Hamming Codes (overkill)
Parity & Reed-Solomon Codes
Failure Prediction (no capacity overhead!): VaxSimPlus; technique is controversial
19
Redundant Arrays of Disks RAID 1: Disk Mirroring/Shadowing
[Figure: each disk paired with its shadow to form a recovery group]
Each disk is fully duplicated onto its "shadow"
Very high availability can be achieved
Bandwidth sacrifice on write: Logical write = two physical writes
Reads may be optimized
Most expensive solution: 100% capacity overhead
Targeted for high I/O rate, high availability environments
20
Redundant Arrays of Disks RAID 3: Parity Disk
[Figure: a logical record (10010011 11001101 10010011 . . .) is striped as physical records across the data disks, with a parity disk P]
1 0 0 1 0 0 1 1
1 1 0 0 1 1 0 1
1 0 0 1 0 0 1 1
0 0 1 1 0 0 0 0
Parity computed across recovery group to protect against hard disk failures
33% capacity cost for parity in this configuration
wider arrays reduce capacity costs, decrease expected availability, increase reconstruction time
Arms logically synchronized, spindles rotationally synchronized: logically a single high capacity, high transfer rate disk
Targeted for high bandwidth applications: Scientific, Image Processing
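Parity across a recovery group and reconstruction of a failed disk can be sketched with bytewise XOR. The function names are assumptions, and the one-byte "disks" reuse the three bit patterns of the slide's logical record purely as sample data.

```python
from functools import reduce


def parity(blocks):
    """Bytewise XOR of equally sized blocks, one block per disk."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving_blocks, parity_block):
    """Rebuild the one missing block: XOR of the survivors and the parity."""
    return parity(list(surviving_blocks) + [parity_block])


# Sample data: the three bytes of the slide's logical record, one per disk.
d0, d1, d2 = bytes([0b10010011]), bytes([0b11001101]), bytes([0b10010011])
p = parity([d0, d1, d2])

# Simulate losing the second disk and rebuilding it from the rest.
rebuilt = reconstruct([d0, d2], p)
print(format(p[0], "08b"), format(rebuilt[0], "08b"))
```

Because XOR is its own inverse, the same routine computes parity and recovers any single lost block, which is why one parity disk protects the whole group against one failure.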
21
Redundant Arrays of Disks RAID 5: High I/O Rate Parity
A logical write becomes four physical I/Os
Independent writes possible because of interleaved parity
Reed-Solomon Codes ("Q") for protection during reconstruction
Targeted for mixed applications
[Figure: stripe units laid out across five disk columns, one stripe per row, with logical disk addresses increasing left to right and top to bottom]
D0    D1    D2    D3    P
D4    D5    D6    P     D7
D8    D9    P     D10   D11
D12   P     D13   D14   D15
P     D16   D17   D18   D19
D20   D21   D22   D23   P
. . . . . . . . . . . . . . .
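The rotating parity placement in the figure can be expressed as a small mapping. The rule below is inferred from the slide's layout (parity starts in the last column and moves one column left per stripe); it is an assumption for illustration, since real RAID 5 implementations use several different rotation schemes.

```python
def parity_column(stripe: int, n_disks: int = 5) -> int:
    """Column holding parity for a stripe, rotating right to left."""
    return (n_disks - 1 - stripe) % n_disks


def layout_row(stripe: int, n_disks: int = 5) -> list:
    """Labels of the stripe units in one stripe, in disk-column order."""
    first = stripe * (n_disks - 1)          # first data unit in this stripe
    labels = [f"D{first + i}" for i in range(n_disks - 1)]
    labels.insert(parity_column(stripe, n_disks), "P")
    return labels


for stripe in range(6):
    print("  ".join(f"{label:<4}" for label in layout_row(stripe)))
```

Spreading parity over all columns is what lets small writes to different stripes proceed independently instead of contending for one parity disk.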
22
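Problems of Disk Arrays: Small Writes
RAID-5: Small Write Algorithm
1 Logical Write = 2 Physical Reads + 2 Physical Writes
[Figure: stripe D0 D1 D2 D3 P becomes D0' D1 D2 D3 P']
(1. Read) old data D0
(2. Read) old parity P
XOR new data D0' with old data, then XOR the result with old parity to form new parity P'
(3. Write) new data D0'
(4. Write) new parity P'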
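The small-write update can be sketched as two XORs and checked against a full parity recomputation. The function names and the one-byte blocks are assumptions for illustration.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write_parity(old_data: bytes, old_parity: bytes,
                       new_data: bytes) -> bytes:
    """New parity = (old data XOR new data) XOR old parity."""
    return xor_bytes(xor_bytes(old_data, new_data), old_parity)


# Hypothetical stripe of four one-byte data blocks plus parity.
stripe = [b"\x93", b"\xcd", b"\x93", b"\x30"]
old_parity = b"\x00"
for block in stripe:
    old_parity = xor_bytes(old_parity, block)

new_d0 = b"\x0f"
new_parity = small_write_parity(stripe[0], old_parity, new_d0)

# Cross-check: recompute parity from scratch over the updated stripe.
full = b"\x00"
for block in [new_d0] + stripe[1:]:
    full = xor_bytes(full, block)
print(new_parity == full)
```

Only the target disk and the parity disk are touched (two reads, two writes); D1 to D3 are never read, which is the point of the algorithm.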
23
Subsystem Organization
[Figure: host, host adapter, array controller, and several single board disk controllers]
host adapter: manages interface to host, DMA
array controller: control, buffering, parity logic
single board disk controller: physical device control; often piggy-backed in small format devices
striping software off-loaded from host to array controller: no applications modifications, no reduction of host performance
24
System Availability: Orthogonal RAIDs
[Figure: an Array Controller connected to six String Controllers, each driving a string of disks]
Data Recovery Group: unit of data redundancy
Redundant Support Components: fans, power supplies, controller, cables
End to End Data Integrity: internal parity protected data paths
25
System-Level Availability
[Figure: fully dual redundant system with two hosts, two I/O Controllers, and two Array Controllers, with duplicated paths to each Recovery Group]
Goal: No Single Points of Failure
with duplicated paths, higher performance can be obtained when there are no failures
26
Review Storage System Issues

Historical Context of Storage I/O
Secondary and Tertiary Storage Devices
Storage I/O Performance Measures
Processor Interface Issues
A Little Queuing Theory
Redundant Arrays of Inexpensive Disks (RAID)
I/O Buses
ABCs of UNIX File Systems
I/O Benchmarks
Comparing UNIX File System Performance

27
CS 252 Administrivia

Upcoming schedule of project events in CS 252:
Friday Nov 12: finish I/O? Start multiprocessing/networking
Remaining 3 lectures before Thanksgiving: multiprocessing
Wednesday Dec 1: Midterm I
Friday Dec 3: Esoteric computation. Quantum/DNA/Nano computing
Next week: Midproject meetings. Tuesday? (Sharad?)
Tue/Wed Dec 7/8 for oral reports?
Friday Dec 10: project reports due. Get moving!!!

28
What is a bus?

A Bus Is
shared communication link
single set of wires used to connect multiple
subsystems
A Bus is also a fundamental tool for composing
large, complex systems
systematic means of abstraction

29
Buses
30
Advantages of Buses
[Figure: three I/O devices on a shared bus]

Versatility
New devices can be added easily
Peripherals can be moved between computer systems that use the same bus standard
Low Cost
A single set of wires is shared in multiple ways

31
Disadvantage of Buses
[Figure: three I/O devices on a shared bus]

It creates a communication bottleneck
The bandwidth of that bus can limit the maximum
I/O throughput
The maximum bus speed is largely limited by
The length of the bus
The number of devices on the bus
The need to support a range of devices with
Widely varying latencies
Widely varying data transfer rates

32
General Organization of a Bus
Control Lines
Data Lines

Control lines
Signal requests and acknowledgments
Indicate what type of information is on the data
lines
Data lines carry information between the source
and the destination
Data and Addresses
Complex commands

33
Master versus Slave
Master issues command
Bus Master
Bus Slave
Data can go either way

A bus transaction includes two parts:
Issuing the command (and address): request
Transferring the data: action
Master is the one who starts the bus transaction by issuing the command (and address)
Slave is the one who responds to the address by:
Sending data to the master if the master asks for data
Receiving data from the master if the master wants to send data

34
DMA (Direct Memory Access)?

Typical I/O devices must transfer large amounts of data to memory of processor:
Disk must transfer complete block (4K? 16K?)
Large packets from network
Regions of frame buffer
DMA gives external device ability to write memory directly: much lower overhead than having processor request one word at a time.
Processor (or at least memory system) acts like slave
Issue: Cache coherence
What if I/O devices write data that is currently in processor Cache?
The processor may never see new data!
Solutions:
Flush cache on every I/O operation (expensive)
Have hardware invalidate cache lines (remember "Coherence" cache misses?)

35
Types of Buses

Processor-Memory Bus (design specific)
Short and high speed
Only need to match the memory system
Maximize memory-to-processor bandwidth
Connects directly to the processor
Optimized for cache block transfers
I/O Bus (industry standard)
Usually is lengthy and slower
Need to match a wide range of I/O devices
Connects to the processor-memory bus or backplane
bus
Backplane Bus (standard or proprietary)
Backplane: an interconnection structure within the chassis
Allow processors, memory, and I/O devices to coexist
Cost advantage: one bus for all components

36
Example Pentium System Organization
Processor/Memory Bus
PCI Bus
I/O Busses
37
A Computer System with One Bus: Backplane Bus
[Figure: Processor, Memory, and I/O Devices attached to a single Backplane Bus]

A single bus (the backplane bus) is used for:
Processor to memory communication
Communication between I/O devices and memory
Advantages: Simple and low cost
Disadvantages: slow, and the bus can become a major bottleneck
Example: IBM PC-AT

38
A Two-Bus System

I/O buses tap into the processor-memory bus via
bus adaptors
Processor-memory bus mainly for processor-memory
traffic
I/O buses provide expansion slots for I/O
devices
Example: Apple Macintosh-II
NuBus: Processor, memory, and a few selected I/O devices
SCSI Bus: the rest of the I/O devices

39
A Three-Bus System

A small number of backplane buses tap into the
processor-memory bus
Processor-memory bus is only used for
processor-memory traffic
I/O buses are connected to the backplane bus
Advantage: loading on the processor bus is greatly reduced

40
North/South Bridge architectures separate buses
Processor Memory Bus
backside cache
Bus Adaptor
I/O Bus
Backplane Bus
Bus Adaptor
I/O Bus

Separate sets of pins for different functions
Memory bus
Caches
Graphics bus (for fast frame buffer)
I/O buses are connected to the backplane bus
Advantage
Buses can run at different speeds
Much less overall loading!

41
What defines a bus?
Transaction Protocol
Timing and Signaling Specification
Bunch of Wires
Electrical Specification
Physical / Mechanical Characteristics: the connectors
42
Synchronous and Asynchronous Bus

Synchronous Bus
Includes a clock in the control lines
A fixed protocol for communication that is
relative to the clock
Advantage: involves very little logic and can run very fast
Disadvantages:
Every device on the bus must run at the same clock rate
To avoid clock skew, they cannot be long if they are fast
Asynchronous Bus
It is not clocked
It can accommodate a wide range of devices
It can be lengthened without worrying about clock
skew
It requires a handshaking protocol

43
Busses so far
Master
Slave

Control Lines
Address Lines
Data Lines

Bus Master: has ability to control the bus, initiates transaction
Bus Slave: module activated by the transaction
Bus Communication Protocol: specification of sequence of events and timing requirements in transferring information.
Asynchronous Bus Transfers: control lines (req, ack) serve to orchestrate sequencing.
Synchronous Bus Transfers: sequence relative to common clock.

44
Bus Transaction

Arbitration: Who gets the bus
Request: What do we want to do
Action: What happens in response

45
Arbitration: Obtaining Access to the Bus
Control Master initiates requests
Bus Master
Bus Slave
Data can go either way

One of the most important issues in bus design
How is the bus reserved by a device that wishes
to use it?
Chaos is avoided by a master-slave arrangement
Only the bus master can control access to the
bus
It initiates and controls all bus requests
A slave responds to read and write requests
The simplest system
Processor is the only bus master
All bus requests must be controlled by the
processor
Major drawback: the processor is involved in every transaction

46
Multiple Potential Bus Masters: the Need for Arbitration

Bus arbitration scheme:
A bus master wanting to use the bus asserts the bus request
A bus master cannot use the bus until its request is granted
A bus master must signal to the arbiter after it finishes using the bus
Bus arbitration schemes usually try to balance two factors:
Bus priority: the highest priority device should be serviced first
Fairness: even the lowest priority device should never be completely locked out from the bus
Bus arbitration schemes can be divided into four broad classes:
Daisy chain arbitration
Centralized, parallel arbitration
Distributed arbitration by self-selection: each device wanting the bus places a code indicating its identity on the bus.
Distributed arbitration by collision detection: each device just goes for it. Problems found after the fact.

47
The Daisy Chain Bus Arbitration Scheme
[Figure: the Bus Arbiter passes Grant through Device 1 (highest priority), Device 2, ..., Device N (lowest priority); Request (wired-OR) and Release lines return to the arbiter]

Advantage: simple
Disadvantages:
Cannot assure fairness: a low-priority device may be locked out indefinitely
The use of the daisy chain grant signal also limits the bus speed
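The fairness problem can be seen in a toy model of the grant chain. This is a behavioral sketch only, not a hardware description; the function name and the request pattern are assumptions.

```python
def daisy_chain_grant(requests):
    """Return the index of the device that keeps the grant, or None.

    The grant enters at device 0 (highest priority) and is passed down the
    chain until it reaches a device that is requesting the bus.
    """
    for device, wants_bus in enumerate(requests):
        if wants_bus:
            return device   # this device absorbs the grant
    return None             # nobody requested; grant falls off the chain


# Hypothetical load: device 0 requests every cycle, device 2 also requests.
grants = [daisy_chain_grant([True, False, True]) for _ in range(5)]
print(grants)
```

With this load device 2 is never granted the bus, which is the lockout the slide warns about. The sequential pass through every device is also why the grant signal limits bus speed.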

48
Centralized Parallel Arbitration
Device 1
Device N
Device 2
Req
Grant
Bus Arbiter

Used in essentially all processor-memory busses
and in high-speed I/O busses

49
Simplest bus paradigm

All agents operate synchronously
All can source / sink data at same rate
=> simple protocol
just manage the source and target

50
Simple Synchronous Protocol
[Timing diagram: signals BReq, BG, R/W Address (Cmd+Addr), Data (Data1, Data2)]

Even memory busses are more complex than this:
memory (slave) may take time to respond
it may need to control data rate

51
Typical Synchronous Protocol
[Timing diagram: signals BReq, BG, R/W Address (Cmd+Addr), Wait, Data (Data1, Data2)]

Slave indicates when it is prepared for data xfer
Actual transfer goes at bus rate

52
Increasing the Bus Bandwidth

Separate versus multiplexed address and data lines:
Address and data can be transmitted in one bus cycle if separate address and data lines are available
Cost: (a) more bus lines, (b) increased complexity
Data bus width:
By increasing the width of the data bus, transfers of multiple words require fewer bus cycles
Example: SPARCstation 20's memory bus is 128 bits wide
Cost: more bus lines
Block transfers:
Allow the bus to transfer multiple words in back-to-back bus cycles
Only one address needs to be sent at the beginning
The bus is not released until the last word is transferred
Cost: (a) increased complexity, (b) worse response time for other requests waiting on the bus

53
Increasing Transaction Rate on Multimaster Bus

Overlapped arbitration
perform arbitration for next transaction during
current transaction
Bus parking
master can hold onto bus and perform multiple transactions as long as no other master makes request
Overlapped address / data phases (prev. slide)
requires one of the above techniques
Split-phase (or packet switched) bus
completely separate address and data phases
arbitrate separately for each
address phase yields a tag which is matched with data phase
All of the above in most modern memory buses

54
1993 MP Server Memory Bus Survey: GTL revolution

Bus MBus Summit Challenge XDBus
Originator Sun HP SGI Sun
Clock Rate (MHz) 40 60 48 66
Address lines 36 48 40 muxed
Data lines 64 128 256 144 (parity)
Data Sizes (bits) 256 512 1024 512
Clocks/transfer 4 5 4?
Peak (MB/s) 320(80) 960 1200 1056
Master Multi Multi Multi Multi
Arbitration Central Central Central Central
Slots 16 9 10
Busses/system 1 1 1 2
Length 13 inches 12? inches 17 inches

55
Asynchronous Handshake
Write Transaction
[Timing diagram: signals Address, Data, Read, Req, Ack over t0 to t5; master asserts address and data, then the next address]

t0: Master has obtained control and asserts address, direction, data
Waits a specified amount of time for slaves to decode target
t1: Master asserts request line
t2: Slave asserts ack, indicating data received
t3: Master releases req
t4: Slave releases ack
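The t0 to t4 sequence is a four-phase req/ack handshake, which can be traced with a toy model. This is a sketch of the event ordering only; the function name and the trace format are assumptions.

```python
def write_handshake_trace():
    """Return (time, req, ack) after each step of the write handshake."""
    req = ack = False
    trace = [("t0", req, ack)]   # master drives address, direction, data
    req = True                   # t1: master asserts request
    trace.append(("t1", req, ack))
    ack = True                   # t2: slave acknowledges, data received
    trace.append(("t2", req, ack))
    req = False                  # t3: master releases request
    trace.append(("t3", req, ack))
    ack = False                  # t4: slave releases ack; bus is idle again
    trace.append(("t4", req, ack))
    return trace


for time, req, ack in write_handshake_trace():
    print(f"{time}: req={int(req)} ack={int(ack)}")
```

Each signal changes only in response to a change on the other one, so the protocol works without a shared clock and tolerates devices of different speeds.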

56
Read Transaction
[Timing diagram: signals Address, Data, Read, Req, Ack over t0 to t5; master asserts address, slave drives data, then the next address]

t0: Master has obtained control and asserts address, direction
Waits a specified amount of time for slaves to decode target
t1: Master asserts request line
t2: Slave asserts ack, indicating ready to transmit data
t3: Master releases req, data received
t4: Slave releases ack

57
1993 Backplane/IO Bus Survey

Bus                  SBus       TurboChannel   MicroChannel    PCI
Originator           Sun        DEC            IBM             Intel
Clock Rate (MHz)     16-25      12.5-25        async           33
Addressing           Virtual    Physical       Physical        Physical
Data Sizes (bits)    8,16,32    8,16,24,32     8,16,24,32,64   8,16,24,32,64
Master               Multi      Single         Multi           Multi
Arbitration          Central    Central        Central         Central
32 bit read (MB/s)   33         25             20              33
Peak (MB/s)          89         84             75              111 (222)
Max Power (W)        16         26             13              25

58
High Speed I/O Bus

Examples
graphics
fast networks
Limited number of devices
Data transfer bursts at full rate
DMA transfers important
small controller spools stream of bytes to or
from memory
Either side may need to squelch transfer: buffers fill up

59
PCI Read/Write Transactions

All signals sampled on rising edge
Centralized Parallel Arbitration
overlapped with previous transaction
All transfers are (unlimited) bursts
Address phase starts by asserting FRAME
Next cycle "initiator" asserts cmd and address
Data transfers happen when:
IRDY asserted by master when ready to transfer data
TRDY asserted by target when ready to transfer data
transfer when both asserted on rising edge
FRAME deasserted when master intends to complete only one more data transfer
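The data-phase rule (a word moves only on a clock edge where both IRDY and TRDY are asserted) can be sketched by sampling the two signals per cycle. The signals are modeled as active-high booleans for readability, and the sample waveforms are hypothetical.

```python
def data_transfer_cycles(irdy, trdy):
    """Return the clock cycles on which a data word is transferred.

    `irdy` and `trdy` hold one sampled value per rising clock edge, with
    True meaning "asserted" (a modeling choice made for this sketch).
    """
    return [cycle for cycle, (i, t) in enumerate(zip(irdy, trdy)) if i and t]


# Hypothetical burst: the target inserts a wait state in cycle 1,
# the master inserts one in cycle 3.
irdy = [True, True, True, False, True]
trdy = [True, False, True, True, True]
print(data_transfer_cycles(irdy, trdy))
```

Either side can throttle the burst simply by deasserting its ready signal, which is the flow control the optimization slide refers to as xRDY.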

60
PCI Read Transaction
Turn-around cycle on any signal driven by more
than one agent
61
PCI Write Transaction
62
PCI Optimizations

Push bus efficiency toward 100% under common simple usage
like RISC
Bus Parking
retain bus grant for previous master until
another makes request
granted master can start next transfer without
arbitration
Arbitrary Burst length
initiator and target can exert flow control with
xRDY
target can disconnect request with STOP (abort or
retry)
master can disconnect by deasserting FRAME
arbiter can disconnect by deasserting GNT
Delayed (pended, split-phase) transactions
free the bus after request to slow device

63
Summary

Buses are an important technique for building
large-scale systems
Their speed is critically dependent on factors such as length, number of devices, etc.
Critically limited by capacitance
Tricks: esoteric drive technology such as GTL
Important terminology:
Master: The device that can initiate new transactions
Slaves: Devices that respond to the master
Two types of bus timing:
Synchronous: bus includes clock
Asynchronous: no clock, just REQ/ACK strobing
Direct Memory Access (DMA) allows fast, burst transfer into processor's memory:
Processor's memory acts like a slave
Probably requires some form of cache-coherence so that DMAed memory can be invalidated from cache.

64
Summary of Bus Options

Option          High performance                         Low cost
Bus width       Separate address & data lines            Multiplex address & data lines
Data width      Wider is faster (e.g., 32 bits)          Narrower is cheaper (e.g., 8 bits)
Transfer size   Multiple words has less bus overhead     Single-word transfer is simpler
Bus masters     Multiple (requires arbitration)          Single master (no arbitration)
Clocking        Synchronous                              Asynchronous

65
Processor Interface Issues

Processor interface
Interrupts
Memory mapped I/O
I/O Control Structures
Polling
Interrupts
DMA
I/O Controllers
I/O Processors
Capacity, Access Time, Bandwidth
Interconnections
Busses

66
I/O Interface
Independent I/O Bus
[Figure: CPU connected to Memory over the memory bus, and to peripheral Interfaces over an independent I/O bus]
Separate I/O instructions (in, out)
Common memory & I/O bus
[Figure: CPU, Memory, and peripheral Interfaces on one common bus]
Lines distinguish between I/O and memory transfers
Examples: VME bus, Multibus-II, Nubus
40 Mbytes/sec optimistically; 10 MIP processor completely saturates the bus!
67
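Memory Mapped I/O
Single Memory & I/O Bus; No Separate I/O Instructions
[Figure: CPU, Memory, and peripheral Interfaces on one bus, with ROM, RAM, and I/O sharing the address space]
[Figure: CPU with L2 cache on the Memory Bus with Memory; a Bus Adaptor connects the Memory Bus to the I/O bus]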
68
Programmed I/O (Polling)
[Figure: CPU, Memory, IOC, and device, with a flowchart: Is the data ready? (no: check again; yes: read data), store data, done? (no: repeat; yes: exit)]
busy wait loop not an efficient way to use the CPU unless the device is very fast!
but checks for I/O completion can be dispersed among computationally intensive code
69
Interrupt Driven Data Transfer
[Figure: CPU running the user program (add, sub, and, or, nop) with Memory, IOC, and device: (1) I/O interrupt, (2) save PC, (3) interrupt service addr, (4) interrupt service routine (read, store, ..., rti) moves data to memory]
User program progress only halted during actual transfer
1000 transfers at 1 ms each:
1000 interrupts @ 2 µsec per interrupt
1000 interrupt service @ 98 µsec each = 0.1 CPU seconds
Device xfer rate = 10 MBytes/sec => 0.1 x 10^-6 sec/byte => 0.1 µsec/byte
=> 1000 bytes = 100 µsec
1000 transfers x 100 µsecs = 100 ms = 0.1 CPU seconds
Still far from device transfer rate! 1/2 in interrupt overhead
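The CPU cost of interrupt-driven transfer on this slide is simple arithmetic, sketched below. The function names are assumptions; the figures are the slide's.

```python
US = 1e-6  # seconds per microsecond


def interrupt_cpu_seconds(n_transfers: int, interrupt_us: float,
                          service_us: float) -> float:
    """CPU time spent taking and servicing one interrupt per transfer."""
    return n_transfers * (interrupt_us + service_us) * US


def device_seconds(n_bytes: int, bytes_per_second: float) -> float:
    """Time the device itself needs to move n_bytes at its transfer rate."""
    return n_bytes / bytes_per_second


cpu = interrupt_cpu_seconds(1000, interrupt_us=2.0, service_us=98.0)
per_kb = device_seconds(1000, 10e6)   # 1000 bytes at 10 MBytes/sec
print(f"CPU time for 1000 transfers: {cpu:.3f} s")
print(f"device time per 1000 bytes: {per_kb / US:.0f} µsec")
```

The 100 µsec of CPU work per interrupt matches the 100 µsec the device needs to move 1000 bytes, so the processor rather than the device sets the pace.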
70
Direct Memory Access
Time to do 1000 xfers at 1 msec each:
1 DMA set-up sequence @ 50 µsec
1 interrupt @ 2 µsec
1 interrupt service sequence @ 48 µsec
.0001 second of CPU time
CPU sends a starting address, direction, and length count to DMAC. Then issues "start".
[Figure: CPU, Memory, DMAC, and IOC with device on a shared bus; the memory-mapped address space from 0 to n holds ROM, RAM, Peripherals, and DMAC]
DMAC provides handshake signals for Peripheral Controller, and Memory Addresses and handshake signals for Memory.
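The DMA figures on this slide can be compared with the interrupt-driven case, one interrupt per transfer at 2 µsec plus 98 µsec of service. The sketch below uses the lecture's numbers; the function names are assumptions.

```python
US = 1e-6  # seconds per microsecond


def dma_cpu_seconds(setup_us: float, interrupt_us: float,
                    service_us: float) -> float:
    """CPU time for one DMA operation: one set-up and one completion interrupt."""
    return (setup_us + interrupt_us + service_us) * US


def per_transfer_interrupt_cpu_seconds(n_transfers: int, interrupt_us: float,
                                       service_us: float) -> float:
    """CPU time when every transfer raises its own interrupt."""
    return n_transfers * (interrupt_us + service_us) * US


dma = dma_cpu_seconds(setup_us=50.0, interrupt_us=2.0, service_us=48.0)
interrupts = per_transfer_interrupt_cpu_seconds(1000, 2.0, 98.0)
print(f"DMA: {dma:.4f} s, interrupt-driven: {interrupts:.1f} s, "
      f"ratio: {interrupts / dma:.0f}x")
```

The CPU cost of DMA is paid once per block rather than once per transfer, which is where the roughly thousandfold saving comes from.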
71
Input/Output Processors
[Figure: CPU and IOP on the main memory bus with Mem; the IOP drives devices D1, D2, . . ., Dn on the I/O bus]
(1) CPU issues instruction to IOP: OP, Device, Address (target device; where cmnds are)
(2) IOP looks in memory for commands: OP, Addr, Cnt, Other (what to do; where to put data; how much; special requests)
(3) Device to/from memory transfers are controlled by the IOP directly. IOP steals memory cycles.
(4) IOP interrupts CPU when done
72
Relationship to Processor Architecture

I/O instructions have largely disappeared
Interrupt vectors have been replaced by jump tables:
PC <- M[IVA + interrupt number] (interrupt vector)
PC <- IVA + interrupt number (jump table)
Interrupts:
Stack replaced by shadow registers
Handler saves registers and re-enables higher priority int's
Interrupt types reduced in number; handler must query interrupt controller

73
Relationship to Processor Architecture

Caches required for processor performance cause problems for I/O
Flushing is expensive, I/O pollutes cache
Solution is borrowed from shared memory multiprocessors: "snooping"
Virtual memory frustrates DMA
Load/store architecture at odds with atomic operations
load locked, store conditional
Stateful processors hard to context switch

74
Summary

Disk industry growing rapidly, improves:
bandwidth 40%/yr,
areal density 60%/year, $/MB faster?
queue + controller + seek + rotate + transfer
Advertised average seek time benchmark much greater than average seek time in practice
Response time vs. Bandwidth tradeoffs
Queueing theory or
Value of faster response time
0.7 sec off response saves 4.9 sec and 2.0 sec (70%) total time per transaction => greater productivity
everyone gets more done with faster response, but novice with fast response = expert with slow

75
Summary Relationship to Processor Architecture