Chapters 8

Slides: 30

Provided by: toda9

Transcript and Presenter's Notes

1
Chapters 8 & 9

I/O
Multiprocessors and Clusters
(partial coverage)

2
Interfacing Processors and Peripherals

Typical collection of I/O devices and
interconnection

3
Interfacing Processors and Peripherals

I/O devices are very diverse
devices: behavior (i.e., input vs. output), partner (who is at the other end?), data rate
I/O design is affected by many factors (expandability, resilience)
Performance: access latency, throughput, connection between devices and the system, the memory hierarchy, the operating system
A variety of different users (e.g., banks, supercomputers, engineers)

4
I/O

Important but neglected
The difficulties in assessing and designing I/O systems have often relegated I/O to second class status
courses in every aspect of computing, from programming to computer architecture, often ignore I/O or give it scanty coverage
textbooks leave the subject to near the end, making it easier for students and instructors to skip it!
GUILTY!
we won't be looking at I/O in much detail
be sure and read Chapter 8 in its entirety
you should probably take a networking class!

5
I/O Example: Disk Drives

To access data:
seek: position head over the proper track (3 to 14 ms avg.)
rotational latency: wait for desired sector (0.5 rotation / RPM)
transfer: grab the data (one or more sectors), 30 to 80 MB/sec
Average disk access time = average seek time + average rotational delay + transfer time + controller overhead
Example: pages 570-571
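
The access-time sum above can be tried out with a few lines of Python. This is only a sketch: the drive parameters in the usage line (6 ms seek, 10,000 RPM, 512-byte sector, 50 MB/sec, 0.2 ms controller overhead) are hypothetical values picked from the ranges on this slide, not the textbook example, and 1 MB is assumed to be 10^6 bytes.

```python
def rotational_latency_ms(rpm):
    """Average rotational latency: half a rotation, converted to milliseconds."""
    return 0.5 / rpm * 60_000.0


def disk_access_time_ms(seek_ms, rpm, sector_bytes, transfer_mb_per_s, controller_ms):
    """Seek + average rotational delay + transfer time + controller overhead."""
    # Assumption: 1 MB = 10**6 bytes.
    transfer_ms = sector_bytes / (transfer_mb_per_s * 1e6) * 1000.0
    return seek_ms + rotational_latency_ms(rpm) + transfer_ms + controller_ms


# Hypothetical drive; the result is roughly 9.2 ms, dominated by seek and rotation.
print(disk_access_time_ms(6.0, 10_000, 512, 50.0, 0.2))
```

For a single sector the transfer term is tiny, which is why seek time and rotational delay dominate small random accesses.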

6
I/O Example: Buses

Shared communication link (one or more wires)
Difficult design: may be bottleneck, length of the bus, number of devices, tradeoffs (buffers for higher bandwidth increase latency), support for many different devices, cost
Types of buses:
processor-memory (short, high speed, custom design)
backplane (high speed, often standardized, e.g., PCI)
I/O (lengthy, different devices, e.g., USB, Firewire)
Synchronous vs. Asynchronous:
synchronous: use a clock and a synchronous protocol; fast and small, but every device must operate at the same rate and clock skew requires the bus to be short
asynchronous: don't use a clock and instead use handshaking

7
I/O Bus Standards

Today we have two dominant bus standards

8
Other important issues

Bus arbitration:
daisy chain arbitration (not very fair)
centralized arbitration (requires an arbiter), e.g., PCI
collision detection, e.g., Ethernet
Operating system: polling, interrupts, direct memory access (DMA)
Performance analysis techniques: queuing theory, simulation, analysis, i.e., find the weakest link (see 'I/O System Design')
Many new developments
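
Why daisy chain arbitration is "not very fair" can be seen in a small model. The sketch below assumes the grant signal enters the chain at position 0 and is consumed by the first requesting device; the function names are made up for illustration.

```python
def daisy_chain_grant(requests):
    """Return the position of the device that takes the grant, or None.

    requests[i] is True if the device at chain position i wants the bus.
    The grant travels from position 0 and stops at the first requester.
    """
    for position, requesting in enumerate(requests):
        if requesting:
            return position  # grant is consumed here and not passed on
    return None


def count_grants(request_pattern, rounds):
    """Count how often each device wins if the same requests repeat every round."""
    grants = [0] * len(request_pattern)
    for _ in range(rounds):
        winner = daisy_chain_grant(request_pattern)
        if winner is not None:
            grants[winner] += 1
    return grants


# Devices 1 and 2 both request all the time; device 2 is starved.
print(count_grants([False, True, True], 100))  # [0, 100, 0]
```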

9
OS and I/O

Questions
How is a user I/O request transformed into a
device command and communicated to the device?
How is data actually transferred to or from a
memory location?
What is the role of the OS?
Why does the OS assume major responsibility for
handling I/O?
I/O system is shared
Kernel mode is required to handle interrupts
Low-level I/O control is complicated
Three types of communication
OS gives commands to I/O devices
Devices notify OS of operation completion or error
Data transfer between memory and I/O devices

10
I/O Techniques

Programmed I/O (polling)
Memory-mapped I/O vs. special I/O instructions
Polling required for flow control
Status register
Interrupt-driven I/O
Processor issues I/O command, does something
useful while waiting for I/O completion
Pending I/O interrupt is checked after execution of an
instruction is finished (compared to an exception)
Has to know identity of device raising the
interrupt
DMA (direct memory access)
Transfer data directly to or from memory without
involving processor
Interrupt processor at completion
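
Programmed I/O with a status register can be sketched as a busy-wait loop. The device below is simulated: `SimulatedDevice`, `READY_BIT`, and the register layout are assumptions made for illustration, not any real hardware interface.

```python
READY_BIT = 0x1  # assumed position of the "data ready" flag in the status register


class SimulatedDevice:
    """Stand-in for a device that becomes ready after a number of status reads."""

    def __init__(self, ready_after_polls, data):
        self._remaining = ready_after_polls
        self._data = data

    def read_status(self):
        """Return the status register; READY_BIT is set once data is available."""
        if self._remaining > 0:
            self._remaining -= 1
            return 0
        return READY_BIT

    def read_data(self):
        return self._data


def polled_read(device):
    """Spin on the status register, then read the data register.

    Returns the data and the number of polls spent waiting.
    """
    polls = 0
    while True:
        polls += 1
        if device.read_status() & READY_BIT:
            return device.read_data(), polls


print(polled_read(SimulatedDevice(3, 0x41)))  # (65, 4)
```

Every pass through the loop is processor time spent doing nothing useful, which is the cost that interrupt-driven I/O and DMA try to avoid.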

11
Interrupt-driven I/O

Interrupt priority levels
Steps in handling an interrupt
Determine which interrupts are enabled
Select highest priority of interrupts
Save status register (interrupt mask field)
Change interrupt mask field to disable interrupts
of equal or lower priorities
Save process state
Set interrupt enable bit to allow higher-priority
interrupts
Call appropriate interrupt service routine
Clear interrupt enable bit, restore interrupt
mask and process state.
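
The selection and masking steps above can be modeled with bit masks. This is a sketch under assumptions: bit i stands for priority level i, a higher bit means higher priority, and the register is 8 bits wide. None of this is taken from a specific processor.

```python
def highest_enabled(pending, mask):
    """Return the highest priority level that is both pending and enabled, or None."""
    candidates = pending & mask
    return candidates.bit_length() - 1 if candidates else None


def mask_for(level, width=8):
    """Interrupt mask that disables `level` and everything below it."""
    all_levels = (1 << width) - 1
    at_or_below = (1 << (level + 1)) - 1
    return all_levels & ~at_or_below


# Levels 1 and 2 are pending, all levels enabled: level 2 is serviced first.
level = highest_enabled(0b00000110, 0b11111111)
saved_mask = 0b11111111            # save the old mask field
new_mask = mask_for(level)         # disable equal or lower priorities

# While servicing level 2, a level-1 request stays pending but a level-4 request gets in.
print(level, bin(new_mask))                   # 2 0b11111000
print(highest_enabled(0b00000010, new_mask))  # None
print(highest_enabled(0b00010010, new_mask))  # 4
```

After the service routine returns, restoring `saved_mask` lets the pending level-1 request through.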

12
DMA

Steps in a DMA transfer
Processor sets up DMA
Identity of device, operation to perform,
address, number of bytes to transfer
DMA transfers data
Cycle stealing (not an interrupt)
DMA interrupts processor upon completion
Complication with memory system
Virtual memory (virtual or physical addresses)
Cache (coherence problem)

13
Programmed I/O: Polling

Assume
Clock cycles per polling operation: 400
CPU clock rate: 500 MHz
Determine the fraction of CPU time consumed for
the following three cases
1. Mouse, polled 30 times/sec
2. Floppy disk, 16 bits transferred at a
time, data rate 50 KB/sec
3. Hard disk, 16 bytes transferred at a
time, data rate 4 MB/sec
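
One way to work the three cases is sketched below. It assumes decimal units (1 KB = 1000 bytes, 1 MB = 10^6 bytes) and that each device must be polled once per transfer unit so that no data is missed.

```python
CLOCK_HZ = 500e6          # CPU clock rate from the slide
CYCLES_PER_POLL = 400     # clock cycles per polling operation from the slide


def polling_fraction(polls_per_sec):
    """Fraction of CPU time spent polling at the given rate."""
    return polls_per_sec * CYCLES_PER_POLL / CLOCK_HZ


mouse = polling_fraction(30)             # 30 polls/sec
floppy = polling_fraction(50e3 / 2)      # 50 KB/sec, 2 bytes per poll
hard_disk = polling_fraction(4e6 / 16)   # 4 MB/sec, 16 bytes per poll

# Roughly 0.0024%, 2%, and 20% of the CPU respectively.
for name, fraction in [("mouse", mouse), ("floppy", floppy), ("hard disk", hard_disk)]:
    print(f"{name}: {fraction:.4%}")
```

Polling the mouse is negligible, the floppy is tolerable, and the hard disk would consume a fifth of the processor.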

14
Interrupt-Driven I/O

Assume
Same processor and hard disk as in the previous example
Overhead for each transfer, including interrupt:
500 cycles
Determine the fraction of CPU time consumed if
hard disk only transfers 5% of the time
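
A sketch of the calculation, assuming the 500 MHz processor and the 4 MB/sec, 16-bytes-per-transfer hard disk from the polling example, decimal units (1 MB = 10^6 bytes), and a disk that is active 5% of the time:

```python
CLOCK_HZ = 500e6              # same processor as the polling example
CYCLES_PER_INTERRUPT = 500    # overhead per transfer, including the interrupt
DISK_BYTES_PER_SEC = 4e6      # assumption: 1 MB = 10**6 bytes
BYTES_PER_TRANSFER = 16


def interrupt_fraction(busy_fraction):
    """Fraction of CPU time spent on interrupt overhead.

    busy_fraction is the share of time the disk is actually transferring.
    """
    transfers_per_sec = DISK_BYTES_PER_SEC / BYTES_PER_TRANSFER
    while_transferring = transfers_per_sec * CYCLES_PER_INTERRUPT / CLOCK_HZ
    return while_transferring * busy_fraction


print(f"{interrupt_fraction(1.0):.2%}")   # 25.00% if the disk never idles
print(f"{interrupt_fraction(0.05):.2%}")  # 1.25% at 5% activity
```

The per-transfer cost is higher than polling, but the processor pays it only while the disk is actually transferring.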

15
DMA

Assume
Same processor and hard disk as in the previous example
Overhead for handling interrupt at DMA completion:
500 cycles
Initial setup of a DMA transfer takes 1000 clock
cycles
Average transfer size from disk: 8 KB
Determine the fraction of CPU time consumed if
hard disk actively transfers 100% of the time,
Not including overhead of cycle stealing
Including overhead of cycle stealing (depending
on DMA configuration)
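
The first case (no cycle stealing) can be sketched as follows, assuming decimal units (8 KB = 8000 bytes, 4 MB/sec = 4 x 10^6 bytes/sec) and the 500 MHz processor from the earlier examples. The cycle-stealing case is left out because it depends on the DMA configuration.

```python
CLOCK_HZ = 500e6
SETUP_CYCLES = 1000           # initial setup of one DMA transfer
COMPLETION_CYCLES = 500       # interrupt handling at DMA completion
TRANSFER_BYTES = 8000         # assumption: 8 KB = 8000 bytes
DISK_BYTES_PER_SEC = 4e6      # assumption: 1 MB = 10**6 bytes


def dma_fraction():
    """Fraction of CPU time spent on DMA setup and completion, disk always busy."""
    seconds_per_transfer = TRANSFER_BYTES / DISK_BYTES_PER_SEC   # 2 ms
    cycles_per_transfer = SETUP_CYCLES + COMPLETION_CYCLES
    cycles_per_sec = cycles_per_transfer / seconds_per_transfer
    return cycles_per_sec / CLOCK_HZ


print(f"{dma_fraction():.2%}")  # 0.15%
```

Even with the disk transferring continuously, the processor spends well under 1% of its time on I/O.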

16
Pentium 4

I/O Options

17
Fallacies and Pitfalls

Fallacy: the rated mean time to failure of disks
is 1,200,000 hours, so disks practically never
fail.
Fallacy: magnetic disk storage is on its last
legs, will be replaced.
Fallacy: A 100 MB/sec bus can transfer 100
MB/sec.
Pitfall: Moving functions from the CPU to the
I/O processor, expecting to improve performance
without analysis.
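
The first fallacy is easier to see with numbers. The sketch below assumes a constant failure rate and that failed disks are replaced. The fleet size of 1000 disks is a hypothetical figure chosen for illustration.

```python
HOURS_PER_YEAR = 24 * 365  # 8760


def mttf_years(mttf_hours):
    """Rated MTTF expressed in years."""
    return mttf_hours / HOURS_PER_YEAR


def expected_failures_per_year(num_disks, mttf_hours):
    """Expected failures per year in a fleet, assuming a constant failure rate."""
    return num_disks * HOURS_PER_YEAR / mttf_hours


print(round(mttf_years(1_200_000)))                             # 137 years
print(round(expected_failures_per_year(1000, 1_200_000), 1))    # 7.3 failures/year
```

A 137-year MTTF does not mean a disk lasts 137 years; it means that under these assumptions about 0.7% of a large population fails each year.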

18
Multiprocessors

Idea: create powerful computers by connecting many smaller ones
good news: works for timesharing (better than supercomputer)
bad news: it's really hard to write good concurrent programs; many commercial failures

19
Questions

How do parallel processors share data?
single address space (SMP vs. NUMA)
message passing
How do parallel processors coordinate?
synchronization (locks, semaphores)
built into send / receive primitives
operating system protocols
How are they implemented?
connected by a single bus
connected by a network
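
Coordination through a lock in a single address space can be illustrated with threads. This is a minimal sketch using Python's standard `threading` module. Threads stand in for processors here, and the function name and counts are made up for illustration.

```python
import threading


def locked_count(num_threads=4, increments=10_000):
    """Have several threads increment one shared counter under a lock."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            with lock:          # only one thread updates the shared data at a time
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter


print(locked_count())  # 40000
```

Without the lock, the read-modify-write on `counter` could interleave between threads and lose updates.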

20
Multiprocessor Cache Coherence

Snooping cache coherency
All cache controllers snoop on the bus to
determine whether they have a copy of the shared
block

21
Multiprocessor Cache Coherence

A write-invalidate cache coherence protocol

22
Multiprocessor Cache Coherence

A write-invalidate cache coherence protocol
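
The protocol diagram for this slide is not in the transcript. As a rough stand-in, the sketch below models a three-state write-invalidate scheme (invalid, shared, modified) with one-word blocks and a write-back on every snooped hit to a modified line. The class and operation names are assumptions made for illustration and are not necessarily the protocol shown in the original figure.

```python
from enum import Enum


class State(Enum):
    INVALID = "invalid"
    SHARED = "shared"        # clean copy, other caches may also hold it
    MODIFIED = "modified"    # dirty copy, no other cache holds it


class Bus:
    def __init__(self):
        self.caches = []
        self.memory = {}

    def broadcast(self, sender, op, addr):
        """Let every other cache controller snoop the transaction."""
        for cache in self.caches:
            if cache is not sender:
                cache.snoop(op, addr)


class Cache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}  # addr -> (State, value)
        bus.caches.append(self)

    def state(self, addr):
        return self.lines.get(addr, (State.INVALID, None))[0]

    def read(self, addr):
        if self.state(addr) is State.INVALID:
            # Read miss: a cache holding the block modified writes it back first.
            self.bus.broadcast(self, "read_miss", addr)
            self.lines[addr] = (State.SHARED, self.bus.memory.get(addr, 0))
        return self.lines[addr][1]

    def write(self, addr, value):
        if self.state(addr) is not State.MODIFIED:
            # Gain exclusive ownership by invalidating all other copies.
            self.bus.broadcast(self, "invalidate", addr)
        self.lines[addr] = (State.MODIFIED, value)

    def snoop(self, op, addr):
        state, value = self.lines.get(addr, (State.INVALID, None))
        if state is State.MODIFIED:
            self.bus.memory[addr] = value          # write back dirty data
        if op == "invalidate" and state is not State.INVALID:
            self.lines[addr] = (State.INVALID, None)
        elif op == "read_miss" and state is State.MODIFIED:
            self.lines[addr] = (State.SHARED, value)


bus = Bus()
a, b = Cache(bus), Cache(bus)
a.write(0x10, 7)
print(b.read(0x10), a.state(0x10))   # 7 State.SHARED
b.write(0x10, 9)
print(a.state(0x10), a.read(0x10))   # State.INVALID 9
```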

23
Supercomputers

Plot of top 500 supercomputer sites over a decade

24
Using multiple processors: an old idea

Some SIMD designs
Costs for the Illiac IV escalated from $8 million in 1966 to $32 million in 1972 despite completion of only ¼ of the machine. It took three more years before it was operational!
For better or worse, computer architects are not easily discouraged
Lots of interesting designs and ideas, lots of failures, few successes

25
Topologies

26
Topologies

27
Clusters

Constructed from whole computers
Independent, scalable networks
Strengths
Many applications amenable to loosely coupled
machines
Exploit local area networks
Cost effective / Easy to expand
Weaknesses
Administration costs not necessarily lower
Connected using I/O bus
Highly available due to separation of memories
In theory, we should be able to do better

28
Google

Serve an average of 1000 queries per second
Google uses 6,000 processors and 12,000 disks
Two sites in Silicon Valley, two in Virginia
Each site connected to internet using OC48 (2488
Mbit/sec)
Reliability
On an average day, 20 machines need to be rebooted
(software error)
2% of the machines replaced each year
In some sense, simple ideas well executed.
Better (and cheaper) than other approaches
involving increased complexity

29
Concluding Remarks

Evolution vs. Revolution
More often the expense of innovation comes from being too disruptive to computer users
Acceptance of hardware ideas requires acceptance by software people; therefore hardware people should learn about software. And if software people want good machines, they must learn more about hardware to be able to communicate with and thereby influence hardware engineers.
