Title: CS 2200 Lecture 18 IO 1
1. CS 2200 Lecture 18: I/O (1)
- (Lectures based on the work of Jay Brockman,
Sharon Hu, Randy Katz, Peter Kogge, Bill Leahy,
Ken MacKenzie, Richard Murphy, and Michael
Niemier)
2. What is it exactly?
- To anyone in computer science or computer engineering, I/O probably has many different meanings
- My research in computer architecture focuses on processor design
- So I/O generally just involves a processor/memory interface
- For a DRAM chip designer, I/O might involve
- A processor/memory interface
- A memory/disk interface
- For an OS designer, I/O might be
- An interrupt from a device, input from the user, etc.
- Basically, it can mean lots of different things
- In computer architecture, levels of the memory hierarchy beyond main memory are often ignored
3. Why study I/O?
- We've talked a lot about the CPU time metric
- (In fact I've probably stressed it quite a bit!)
- CPU time is important
- for measuring how fast an instruction or program is actually executed
- But what's perhaps more important is response time
- The time between when the user types a command and when the results appear
- This might be a better measure of performance
- A brief study of I/O will help complete the picture of a general computer architecture or organization
4. A quick example
- The difference between CPU time and response time is 10%
- (So, I/O overhead basically adds 10% to our execution time before the user sees results)
- Can speed up the CPU by a factor of 10, but I/O overhead/time will stay the same
- With no changes in I/O performance, Amdahl's law states that
- We'll only get a speedup of 5.5! About ½ of the CPU improvement is wasted
- What if we make the CPU 100 times faster?
- We'll only get a speedup of 10! 90% of the speedup is wasted!
- With CPU performance skyrocketing, if we don't improve I/O, tasks will become I/O bound
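The arithmetic behind the 5.5 and 10 figures can be checked with a short sketch. It assumes the I/O overhead is 10% on top of the CPU time (100 units of CPU time plus 10 units of I/O time, in arbitrary units), which is the reading that reproduces the numbers on the slide. The function name and constants are illustrative only.

```python
def overall_speedup(cpu_time, io_time, cpu_speedup):
    """Amdahl's law: only the CPU part of the response time gets faster."""
    old_response = cpu_time + io_time
    new_response = cpu_time / cpu_speedup + io_time
    return old_response / new_response


# Assumed workload: I/O adds 10% on top of the CPU time (arbitrary units).
CPU_TIME = 100.0
IO_TIME = 10.0

for factor in (10, 100):
    speedup = overall_speedup(CPU_TIME, IO_TIME, factor)
    print(f"CPU {factor}x faster -> overall speedup {speedup:.1f}")
```

With these inputs the loop reports 5.5 for a 10x faster CPU and 10.0 for a 100x faster CPU: the fixed I/O time increasingly dominates the response time.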
5. Our Road Map
Processor
Memory Hierarchy
I/O Subsystem
Parallel Systems
Networking
6. Five Classic Components of a Computer System: all computers since 1946
7. ... and the software abstractions atop them! (operating systems, networking)
- Computation: Processes, Threads
- Communication: I/O devices, the internet
- Storage: Virtual Memory, Files
8. I/O Plan
- I/O devices in general
- magnetic disks in particular
- networks in particular
- Hardware interface issues
- tradeoff of performance and convenience
- dealing with external events
- Software abstractions
- example: filesystems
- disk head scheduling
- POSIX models all I/O as files (plus device drivers)
9. Roots of I/O Devices
- Telecommunications
- Smoke signals
- Drums
- Optical telegraphy
- Electrical telegraphy
- Wireless telegraphy
- Teletype
- Television
- etc.
10. Roots of I/O Devices
- Control/Tabulation
- Jacquard's Punched Cards
- Hollerith's tabulating machines
- IBM
- Other technologies
- Magnetic recording
- Wire
- Tape
11. Characterization
- Behavior
- Input
- Output
- Storage
14. Types
(Behavior: I = input, O = output, IO = input and output, S = storage)
Device            Behavior  Partner  Data rate (kb/s)
Keyboard          I         Human    0.01
Mouse             I         Human    0.02
Voice Input       I         Human    0.02
Scanner           I         Human    400
Voice Output      O         Human    0.6
Line Printer      O         Human    1
Laser Printer     O         Human    200
Graphics Display  O         Human    60,000
Modem             IO        Machine  8
Network           IO        Machine  6,000
Floppy Disk       S         Machine  100
Optical Disk      S         Machine  1,000
Magnetic Tape     S         Machine  2,000
Magnetic Disk     S         Machine  10,000
15. Mouse
"I got the idea for the mouse while attending a talk at a computer conference. The speaker was so boring that I started daydreaming and hit upon the idea." (Doug Engelbart)
- Uses mechanical counters or optical devices to generate pulses which increment or decrement counters
- Counter values determined by polling.
16. Magnetic Disks
- Drums
- Disks
- Removable disk packs
- Floppy disk
- Invented for IBM Field Engineers
- Contact
- Slow speed
17. Magnetic Disks
- Most common form of long-term, rewriteable storage device
- Usually considered the lowest level of the memory hierarchy
- How does a magnetic disk work?
- Collection of platters rotates on a spindle at some RPM
- Platters are metal disks covered with magnetic recording material on both sides
- Disk diameters can vary
- Usually wider is faster, narrower is cheaper
- Disk surface is divided into tracks, which are divided into sectors
- Sectors are the smallest unit that can be written
18. A disk, pictorially
- When accessing data we read or write to a sector
- All sectors are the same size; outer tracks are just less dense
- To read or write, a moveable arm with a read/write head moves over each surface
- Cylinder: all tracks under the arms at a given point on all surfaces
- To read or write
- Disk controller moves the arm over the proper track (a seek)
- The time to move is called the seek time
- When the sector is found, data is transferred
19. Disk Terminology
Cylinder: Track 'x' on all platters/surfaces
20. The speed of light? No.
- Time required for a requested sector to rotate under the read/write head is called the rotational latency or rotational delay
- Involves mechanical components, so it is on the order of milliseconds
- i.e. we're no longer moving at the speed of light like in our CPU!
- Time required to actually write or read data is called the transfer time
- (a function of block size, rotation speed, recording density on a track, and speed of the electronics connecting the disk to the computer)
21. Disk odds 'n' ends
- Often transfer time is a very small portion of a full access
- It's possible to use techniques (discussed with caches) to help reduce disk overhead. Any thoughts?
- To help reduce complexity there's usually additional HW called a disk controller
- Disk controller helps manage disk accesses
- but also adds more overhead: controller time
- (Can also have a queuing delay)
- (Time spent waiting for a disk to become free if it's already in use for another access)
22. Example: average disk access time
- What is the average time to read or write a 512-byte sector for a typical disk?
- The average seek time is given to be 9 ms
- The transfer rate is 4 MB per second
- The disk rotates at 7200 RPM
- The controller overhead is 1 ms
- The disk is currently idle before any requests are made (so there is no queuing delay)
- Average disk access time = average seek time + average rotational delay + transfer time + controller overhead
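The example can be worked numerically with a short sketch. Two assumptions are made that the slide does not state: the average rotational delay is taken as half a revolution, and 1 MB is taken as 10^6 bytes. The function and parameter names are illustrative.

```python
def avg_disk_access_ms(seek_ms, rpm, transfer_mb_per_s, sector_bytes, controller_ms):
    """Sum the four components of an average access (no queuing delay)."""
    # Assumed: on average the sector is half a revolution away from the head.
    rotation_ms = 0.5 * 60_000.0 / rpm
    # Assumed: 1 MB = 10**6 bytes.
    transfer_ms = sector_bytes / (transfer_mb_per_s * 1e6) * 1000.0
    return seek_ms + rotation_ms + transfer_ms + controller_ms


total = avg_disk_access_ms(seek_ms=9.0, rpm=7200, transfer_mb_per_s=4.0,
                           sector_bytes=512, controller_ms=1.0)
print(f"average access time: {total:.2f} ms")
```

Under these assumptions the components are 9 ms seek, about 4.17 ms rotational delay, about 0.13 ms transfer, and 1 ms controller overhead, for a total of roughly 14.3 ms. The transfer itself is a very small share of the access.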
23. Capacity trends and disks
- Capacity of disks is usually described in terms of areal density
Cost for 1 GB of magnetic disk space has decreased (and will decrease) almost exponentially over time!
24. Magnetic Disks: short overview
- Hard disk
- Higher speed (3600-7200 RPM)
- Larger
- Higher Density
- Multiple platters
- Performance
- Seek time (8-20 ms or faster)
- Rotational latency (4-8 ms)
- Transfer rate: 2-40 MB/sec
25. Disk Latency
Disk Latency = Queuing Time + Controller Time + Seek Time + Rotation Time + Transfer Time
Order of magnitude times for 4K byte transfers:
- Seek: 8 ms or less
- Rotate: 4.2 ms @ 7200 rpm
- Transfer: 1 ms @ 7200 rpm
26. Technology Trends
Disk capacity now doubles every 18 months; before 1990, every 36 months
- Today: Processing power doubles every 18 months
- Today: Memory size doubles every 18 months (4X/3 yr)
- Today: Disk capacity doubles every 18 months
- Disk positioning rate (seek + rotate) doubles every ten years!
The I/O GAP
27. Historical Perspective
- 1956 IBM RAMAC to early 1970s Winchester
- Developed for mainframes
- Had proprietary interfaces
- Steady shrink in form factor: 27 in. to 14 in.
- 1970s developments
- 5.25 inch floppy disk form factor (microcode into mainframe)
- early emergence of industry standard disk interfaces
- ST506, SASI, SMD, ESDI
28. Historical Perspective
- Early 1980s
- PCs and first generation workstations
- Mid 1980s
- Client/server computing
- Centralized storage on file server
- accelerates disk downsizing: 8 inch to 5.25 inch
- Mass market disk drives become a reality
- industry standards: SCSI, IPI, IDE
- 5.25 inch drives for standalone PCs; end of proprietary interfaces
29. Disk History
Columns: year, data density (Mbit/sq. in.), capacity of unit shown (MBytes)
- 1973: 1.7 Mbit/sq. in., 140 MBytes
- 1979: 7.7 Mbit/sq. in., 2,300 MBytes
Source: New York Times, 2/23/98, page C3, "Makers of disk drives crowd even more data into even smaller spaces"
30. Historical Perspective
- Late 1980s/Early 1990s
- Laptops, notebooks, (palmtops)
- 3.5 inch, 2.5 inch, (1.8 inch) form factors
- Form factor plus capacity drives the market, not so much performance
- Recently: bandwidth improving at 40%/year
- Challenged by DRAM, flash RAM in PCMCIA cards
- still expensive
- unattractive MBytes per cubic inch
- Optical disk fails on performance (e.g., NeXT) but finds a niche (CD ROM)
31. Disk History
- 1989: 63 Mbit/sq. in., 60,000 MBytes
- 1997: 1450 Mbit/sq. in., 2300 MBytes
- 1997: 3090 Mbit/sq. in., 8100 MBytes
Source: New York Times, 2/23/98, page C3, "Makers of disk drives crowd even more data into even smaller spaces"
32. Something cool
- This iPod mini
- A 4 GB disk in a 2 x 3.6 x 0.5 inch space
33. MBits per square inch: DRAM as % of Disk over time
Data labels on figure (DRAM v. disk):
- 9 v. 22 Mb/si
- 470 v. 3000 Mb/si
- 0.2 v. 1.7 Mb/si
Source: New York Times, 2/23/98, page C3, "Makers of disk drives crowd even more data into even smaller spaces"
34. Technology Trends
        Capacity        Speed (latency)
Logic   2x in 3 years   2x in 3 years
DRAM    4x in 3 years   2x in 10 years
Disk    4x in 3 years   2x in 10 years
35. Technology Trends
        Capacity        Speed (latency)
Logic   2x in 3 years   2x in 3 years
DRAM    4x in 3 years   2x in 10 years
Disk    4x in 3 years   2x in 10 years
Labels on figure: Optics, Aerodynamics, Mechanics
DRAM-to-Disk density ratio: Disk 3000 Mbit/sq. in., DRAM 470 Mbit/sq. in.
36. Magnetic Disks
(illustration; source unknown)
37. Second Major Example: Networks
- Examples
- System Area Networks (SP2): 100s of nodes; 25 meters per link
- Local Area Networks (Ethernet): 100s of nodes; 1000 meters
- Wide Area Network (ATM): 1000s of nodes; 5,000,000 meters
[Figure: nodes (a.k.a. end systems, hosts) connected by an Interconnection Network (a.k.a. network, communication subnet)]
38. ABCs of Networks
- Starting point: send bits between 2 computers
- Queue (FIFO) on each end
- Information sent is called a message
- Can send both ways (Full Duplex)
- Rules for communication? A protocol
- Inside a computer
- Loads/Stores: Request (Address), Response (Data)
- Need Request and Response signaling
39. Trivial Example
- What is the format of a message?
- Fixed? Number of bytes?
Packet fields: Request/Response (1 bit), Address/Data (32 bits)
- 0: Please send data from Address
- 1: Packet contains data corresponding to request
- Header/Trailer: information to deliver a message
- Payload: data in message (1 word above)
40. Extensions
- What if more than 2 computers want to communicate?
- Need computer address field (destination) in packet
- What if packet is garbled in transit?
- Add error detection field in packet (e.g., CRC)
- What if packet is lost?
- More elaborate protocols to detect loss (e.g., NAK, ARQ, time outs)
- What if multiple processes/machine?
- Queue per process to provide protection
- Simple questions such as these lead to elaborate protocols and packet formats, and so to complexity
- Note: complexity often means slow
41. A Simple Example Revisited
- What is the format of a packet?
- Fixed? Number of bytes?
Packet fields: Code (2 bits), Address/Data (32 bits), CRC (4 bits)
- 00: Request (Please send data from Address)
- 01: Reply (Packet contains data corresponding to request)
- 10: Acknowledge request
- 11: Acknowledge reply
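A minimal sketch of how such a packet could be packed and checked follows. The slide gives only the field widths (2-bit code, 32-bit address/data, 4-bit CRC) and the four code values. The field order (code, then address/data, then CRC), the generator polynomial x^4 + x + 1, and all function names are assumptions made for illustration.

```python
REQUEST, REPLY, ACK_REQUEST, ACK_REPLY = 0b00, 0b01, 0b10, 0b11

CRC_POLY = 0b10011  # x^4 + x + 1: an assumed generator, not given on the slide


def crc4(value, nbits):
    """Remainder of (value * x^4) divided by CRC_POLY over GF(2)."""
    reg = value << 4
    for bit in range(nbits + 3, 3, -1):
        if reg & (1 << bit):
            reg ^= CRC_POLY << (bit - 4)
    return reg & 0xF


def pack(code, word):
    """Build a 38-bit packet: 2-bit code, 32-bit address/data, 4-bit CRC."""
    body = ((code & 0b11) << 32) | (word & 0xFFFFFFFF)
    return (body << 4) | crc4(body, 34)


def unpack(packet):
    """Split a packet into (code, word), rejecting it if the CRC does not match."""
    body, crc = packet >> 4, packet & 0xF
    if crc4(body, 34) != crc:
        raise ValueError("CRC mismatch: packet garbled in transit")
    return body >> 32, body & 0xFFFFFFFF


packet = pack(REQUEST, 0x1000)       # "please send data from address 0x1000"
print(unpack(packet) == (REQUEST, 0x1000))
```

A receiver that gets a packet with a flipped bit sees a CRC mismatch and can fall back on the loss-handling machinery (NAK, retransmission, time outs) instead of acting on bad data.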
42. Network Media
- There are different ways to connect computers together
- Can kind of think of it like a memory hierarchy
- Different kinds of media vary in cost, performance, and reliability
- There are several different kinds we'll consider
- Twisted Pair
- Coaxial Cable
- Fiber Optics
- Air
43. Twisted pair media
- Just a twisted pair of copper wires
- Insulated, about 1 mm thick
- Twisted together to reduce electrical interference
- Makes sure we don't turn it into an antenna!
- Data transfer speeds of
- A few Mb/s over a few kilometers; 10s of Mb/s over shorter distances
- Uses
- Used lots in the telephone industry
- OK for LANs because of reasonable data transfer rates
44. Coaxial (coax) cable
- A picture of it is included below
- Consists of a copper center surrounded by an insulator, a mesh, and a plastic coating
- Originally developed for cable companies to transmit at a higher rate over a few km
- Good bandwidth: 50 ohm coax cable can deliver 10 Mb/s over a kilometer
- Good for LAN
45. Coax cable junctions
- It's harder to connect things to this medium, however
- One method is the T-junction
- The typical way this is handled
- Cable is cut in 2 and a connector is inserted that reconnects the cable and adds a 3rd wire to the computer
- But, if you add a new connector, you have to split the network and therefore bring it down for a short period of time
- Additional maintenance is a headache b/c any user can disconnect the network
- Better: the vampire tap
- Drill a hole to terminate in the copper core
- Screw in connector: no cable cut, no network down time
46. Fiber optics
- Replaces copper with plastic and electrons with photons
- Information is now transmitted via pulses of light
- Usually, 3 basic components
- Transmission medium: fiber optic cable
- Light source: LED or laser diode
- Light detector: photodiode
- A simplex medium: data can only go in 1 direction
- How it works
47. Fiber optics: how it really works
- Because light is bent/refracted at interfaces, it can slowly spread out as it travels down the diameter of a cable
- Unless, that is, we transfer a single wavelength of light
- Then it'll travel in a straight line
- With this in mind, let's consider the 2 kinds of fiber optic cable
- Multimode Fiber
- Allows light to be dispersed
- Uses inexpensive LEDs
- Useful for transmissions of about 2 km; 600 Mb/s in 1995
- Single-mode Fiber
- A single-wavelength fiber
- Uses more expensive laser diodes as light sources
- Transmits Gb/s over 100s of km: great for phone companies!
48. Fiber optics: practical issues
- Single mode fiber is a better transmitter, but it is more difficult to attach connectors
- Also, less reliable, more expensive, can't bend as much
- Usually in a LAN, multimode is the weapon of choice
- So, how do you connect fiber optics to a computer?
- Passive Mode
- Taps are fused into the fiber and a photodiode looks at passing light
- Electrical output passes to the computer interface
- A failure cuts off just 1 computer
- Active Mode
- Really a break in the cable
- Light converted to electrical signals, sent to computer, converted back to light, sent back down cable
- Problem: tap failure causes net failure
- Advantage: light source refreshed, can go longer distances
49. Some comparisons
50. The bottom line
- Bandwidth problems can be fixed with more money for more wires
- Improving your latency is somewhat more difficult to do
- After all, 299792.5 km/s is kinda fixed
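To put a number on that floor, here is a small sketch computing the minimum one-way propagation delay over a given distance at the speed of light quoted above. The 5,000 km distance is the wide area network scale from the earlier networks slide. The function name is illustrative, and a real link would be slower still (signals in copper or fiber travel below this speed, and switching adds delay).

```python
SPEED_OF_LIGHT_KM_S = 299_792.5  # value quoted on the slide


def min_one_way_latency_ms(distance_km):
    """Lower bound on propagation delay: distance divided by the speed of light."""
    return distance_km / SPEED_OF_LIGHT_KM_S * 1000.0


print(f"{min_one_way_latency_ms(5000.0):.1f} ms")  # WAN-scale link, 5,000,000 meters
```

No amount of extra wiring lowers this figure, which is about 16.7 ms for 5,000 km, whereas bandwidth can be scaled by adding parallel links.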
51. I/O Device Summary
- Disks/Networks are very different, but consider these similarities
- Data handled in batches (sectors, messages)
- Lots of waiting around for external events
- Compatibility is important (more than performance)
- Reliability is important (and requires work to achieve)
- Slow devices are simple (and boring)
- Fast devices may be substantially autonomous (e.g., graphics)
52. I/O Hardware Interface Issues
53. I/O Hardware
- Basic memory-map w/polling and/or interrupts
- Project 2!
- Advanced bus issues
- Performance vs. compatibility - multiple busses
- Namespaces
- Smart device controllers
- Direct Memory Access (DMA)
- Arbitration
- Caching issues
- I/O processors
- the wheel of reincarnation
54. Basic: I/O devices as memory, a la project 2
55. Performance vs. Compatibility
- Problem
- Processor-memory is a performance-crucial path ... improve as often as possible!
- I/O controllers are made by many vendors ... change is expensive!
58. Multiple Busses
[Figure: bus hierarchy. Busses labeled on the figure:]
- Cache Bus: e.g. 256b, 533MHz
- Memory Bus: e.g. 64b, 533MHz
- I/O Bus (e.g. PCI): e.g. 64b, 66MHz
- Disk Drive Bus: e.g. SCSI 16b, 20MHz
Components shown: Processor (with interrupts), Cache, Main Memory, bridge, I/O Controllers, Graphics, Disks, Network
59. Namespaces
- Two issues
- Separate namespace for I/O devices or not?
- kind of a historical curiosity at this point
- Assigning the namespace(s) you do have
- physical memory space
- I/O space (if any)
- Interrupt space
- blackboard talk/brainstorming about how to allocate namespace to devices
- 1. fixed, i.e. by slot number
- 2. jumpers on the boards (ISA did this)
- 3. software jumpers configured at boot time (PCI)
60. Smart Device Controllers
61. Polling
- Computer
- Busy bit set? Yes.
- Busy bit set? Yes.
- Busy bit set? Yes.
- Busy bit set? Yes.
- Busy bit set? Yes.
- Busy bit set? Yes.
- Busy bit set? Yes.
- Busy bit set? Yes.
- Busy bit set? Yes.
- Busy bit set? Yes.
62. Polling
- Computer
- Busy bit set? No.
- Set write bit in command register
- Write a byte (or word) of data to Data-out
- Set command ready bit in control register
- Busy bit set? Yes.
- Busy bit set? Yes.
- Controller
- Controller clears busy bit
- Sees command ready
- Set busy bit
63. Polling
- Computer
- Busy bit set? Yes.
- Busy bit set? Yes.
- Busy bit set? Yes.
- Busy bit set? Yes.
- Busy bit set? Yes.
- Busy bit set? Yes.
- Busy bit set? Yes.
- Busy bit set? Yes.
- Busy bit set? Yes.
- Busy bit set? No.
- Controller
- Checks write bit
- Reads data-out
- Does I/O with device
- Clears command ready bit
- Clears error bit
- Clears busy bit
64. Polling
- Appropriate when controller and device are very fast
- Very inefficient when the controller is busy most of the time
- Better solution... Interrupts
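The handshake walked through on the polling slides can be sketched as a toy simulation. The register bit names (busy, command ready, write, error, data-out) follow the slides. The `Controller` class, the `tick` method, and the `busy_polls` parameter (how many polls one I/O takes) are hypothetical and stand in for real hardware running in parallel with the processor.

```python
class Controller:
    """Toy device controller exposing the register bits named on the slides."""

    def __init__(self, busy_polls=3):
        self.busy = False
        self.command_ready = False
        self.write = False
        self.error = False
        self.data_out = None
        self.device = []               # data the "device" has received
        self._busy_polls = busy_polls  # hypothetical: polls one I/O takes
        self._remaining = 0

    def tick(self):
        """One step of controller work, run between host polls."""
        if self.command_ready and not self.busy:
            self.busy = True           # sees command ready, sets busy bit
            self._remaining = self._busy_polls
        elif self.busy:
            self._remaining -= 1
            if self._remaining == 0:
                if self.write:         # checks write bit, reads data-out
                    self.device.append(self.data_out)
                self.command_ready = False
                self.error = False
                self.busy = False      # clears busy bit last


def polled_write(ctrl, byte):
    """Write one byte using busy-waiting; returns the number of wasted polls."""
    polls = 0
    while ctrl.busy:                   # wait for the controller to be free
        ctrl.tick()
        polls += 1
    ctrl.write = True                  # set write bit in command register
    ctrl.data_out = byte               # write the data to data-out
    ctrl.command_ready = True          # set command ready bit
    ctrl.tick()                        # controller notices and sets busy
    while ctrl.busy:                   # "Busy bit set? Yes." over and over
        ctrl.tick()
        polls += 1
    return polls


ctrl = Controller(busy_polls=3)
print(polled_write(ctrl, 0x41), ctrl.device)
```

Every pass through the second loop is a poll during which the processor does no useful work, which is why polling only pays off when the controller and device respond quickly.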
65. Interrupts
- Used by I/O controllers to communicate with the Processor
- Also used by applications to communicate with the operating system
- Software Interrupt or Trap
- OS can now use same device registers as before
66. Interrupts
- Processor
- Initiate I/O
- Context switch to something else
- Receive interrupt; transfer to handler
- Interrupt handler processes data, returns from interrupt
- Resume processing of interrupted task
- I/O Controller
- Initiate I/O with physical device
- Completion (good or bad)
- Generate interrupt
67. DMA
- Preceding scheme is effective but wasteful for large blocks of data.
- Using a sophisticated general-purpose processor for a very specialized function
- Solution: Add enough processing power to the device controller (and possibly bus controller) to allow direct transfer between device and memory.
68-79. DMA (step-by-step sequence)
- (count = N) Processor tells controller to make DMA transfer. Assume disk to memory. (Includes N, the number of bytes)
- (count = N) Controller gets sector of data from disk.
- (count = N-1, N-2, N-3, ..., 0) Controller transfers one word to memory and updates count. Checks for termination. If not done, repeats.
- (count = 0) Termination check succeeds: Controller interrupts processor
- Processor acknowledges interrupt
- Controller sends interrupt vector
- Processor can now have scheduler take appropriate action (i.e. move process waiting for I/O into ready queue, etc.)
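The controller's side of this sequence can be sketched as a simple loop. This is a toy model under stated assumptions: memory is a Python list of words, the count is kept in words, and the interrupt is modeled as a callback. The names `dma_transfer`, `dest`, and `on_interrupt` are illustrative, not part of any real controller interface.

```python
def dma_transfer(disk_sector, memory, dest, on_interrupt):
    """Copy a sector into memory one word at a time, then raise an interrupt."""
    count = len(disk_sector)              # N, supplied by the processor
    offset = 0
    while count > 0:                      # check for termination
        memory[dest + offset] = disk_sector[offset]  # transfer one word
        offset += 1
        count -= 1                        # update count
    on_interrupt()                        # count reached 0: interrupt the processor


memory = [0] * 8
events = []
dma_transfer([11, 22, 33], memory, dest=2,
             on_interrupt=lambda: events.append("I/O done"))
print(memory, events)
```

The processor appears only at the start (setting up the transfer) and at the end (handling the interrupt). The per-word copying that polling or plain interrupts would charge to the CPU happens entirely in the controller.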
80. Arbitration
- DMA implies multiple owners of the bus
- must decide who owns the bus from cycle to cycle
- Arbitration
- Daisy chain
- Centralized parallel arbitration
- Distributed arbitration by self selection
- Distributed arbitration by collision detection
81. Daisy Chain
Simple, but slow and not fair.
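Why a daisy chain is unfair can be seen in a minimal sketch. It assumes the usual arrangement in which the grant signal enters the chain at device 0 and each device either keeps the grant (if it is requesting) or passes it to the next device. The function name and list representation are illustrative.

```python
def daisy_chain_grant(requests):
    """Return the index of the device that gets the bus, or None.

    requests[i] is True if device i wants the bus; device 0 is assumed to be
    closest to the arbiter, so it sees the grant signal first.
    """
    for device, wants_bus in enumerate(requests):
        if wants_bus:
            return device     # grant is absorbed here and never passed on
    return None


print(daisy_chain_grant([False, True, True]))  # device 1 wins over device 2
```

A device far down the chain only gets the bus when every device ahead of it is idle, so a busy upstream device can starve it. The grant also has to ripple through the chain device by device, which is where the slowness comes from.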
82. Centralized Parallel Arbitration
- Requires central arbiter
- Each device has separate line
- Central arbiter may become bottleneck
- Used in PCI bus
83. Distributed Arbitration by Self Selection
- Each device sees all requestors
- Priority scheme allows each to know if it gets the bus
- Requires lots of request lines
- Used by Apple NuBus (backplane)
84. Distributed Arbitration by Collision Detection
- Devices independently request bus
- Devices have the ability to detect simultaneous requests, or collisions.
- Upon collision a variety of schemes are used to select among requestors
- Used by Ethernet
85. Caching Issues
- What happens if the processor has a cached copy of data when a device does DMA?
- blackboard talk ... short answer is that there's a cache coherence problem: the DMA may change memory and the processor doesn't see the change. Two solutions:
- Device driver (software) flushes cache before using DMA
- Elaborate bus hardware maintains consistency by checking the cache on every external bus transaction
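The stale-data problem and the software fix can be illustrated with a toy model. It assumes a write-through cache represented as a dictionary, so that "flushing" only has to discard the cached copy. The `ToySystem` class and its method names are hypothetical.

```python
class ToySystem:
    """Toy model: main memory plus a processor cache that DMA bypasses."""

    def __init__(self):
        self.memory = {}
        self.cache = {}

    def cpu_read(self, addr):
        if addr not in self.cache:          # miss: fill from memory
            self.cache[addr] = self.memory[addr]
        return self.cache[addr]             # hit: memory is not consulted

    def dma_write(self, addr, value):
        self.memory[addr] = value           # device writes memory directly

    def flush(self, addr):
        self.cache.pop(addr, None)          # driver discards the cached copy


system = ToySystem()
system.memory[0x10] = 1
system.cpu_read(0x10)            # processor now holds a cached copy
system.dma_write(0x10, 2)
stale = system.cpu_read(0x10)    # still the old value: coherence problem

system.flush(0x10)               # software solution: flush before the DMA
system.dma_write(0x10, 3)
fresh = system.cpu_read(0x10)    # miss, so the new value is fetched
print(stale, fresh)
```

The hardware solution would correspond to having `dma_write` itself check and invalidate `cache` on every transaction, at the cost of more elaborate bus logic.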
86. Smart(er) Device Controllers
[Figure: CPU and Mem on the main memory bus; a bridge connects it to the I/O bus, which carries devices D1, D2, ..., Dn]
- So far, smart meant devices could support DMA
- Note, the bridge could be the unit to support DMA
- Why not add even more functionality than that?
87. Input/Output Processors
[Figure: CPU, IOP, and Mem on the main memory bus; the IOP connects to the I/O bus, which carries devices D1, D2, ..., Dn]
- (1) CPU issues instruction to IOP: OP, Device, Address (target device; where commands are)
- (2) IOP looks in memory for commands: OP, Addr, Cnt, Other (what to do; where to put data; how much; special requests)
- (3) Device to/from memory transfers are controlled by the IOP directly. IOP steals memory cycles.
- (4) IOP interrupts CPU when done
88. Input/Output Processors?
[Figure: the IOP diagram (CPU, IOP, Mem, main memory bus, I/O bus, devices D1, D2, ..., Dn) shown next to the same diagram with the IOP replaced by a second CPU, with a "?" between them]
89. wheel of reincarnation
- Start with simple devices
- Add cute functionality
- Add lots of functionality
- Declare it to be a processor in its own right
- Repeat...
- Graphics community has been around this wheel a couple of times now.
90. Summary
- Example Devices
- often work in blocks
- spend lots of time waiting
- Bus Issues
- memory map w/polling and/or interrupts (project 2)
- Performance vs. compatibility - multiple busses
- Namespaces
- Smart device controllers
- Direct Memory Access (DMA)
- Arbitration
- Caching issues
- I/O processors
- the wheel of reincarnation
91. I/O Software Abstraction