Chapters 8 - PowerPoint PPT Presentation

Provided by: toda9

1
Chapters 8 & 9
  • I/O
  • Multiprocessors and Clusters
  • (partial coverage)

2
Interfacing Processors and Peripherals
  • Typical collection of I/O devices and
    interconnection

3
Interfacing Processors and Peripherals
  • I/O devices are very diverse: behavior (i.e.,
    input vs. output), partner (who is at the other
    end?), data rate
  • I/O design is affected by many factors
    (expandability, resilience)
  • Performance: access latency; throughput;
    connection between devices and the system; the
    memory hierarchy; the operating system
  • A variety of different users (e.g., banks,
    supercomputers, engineers)

4
I/O
  • Important but neglected: The difficulties in
    assessing and designing I/O systems have often
    relegated I/O to second-class status. Courses
    in every aspect of computing, from programming
    to computer architecture, often ignore I/O or
    give it scanty coverage. Textbooks leave the
    subject to near the end, making it easier for
    students and instructors to skip it!
  • GUILTY! We won't be looking at I/O in much
    detail. Be sure to read Chapter 8 in its
    entirety. You should probably take a
    networking class!

5
I/O Example: Disk Drives
  • To access data: seek, position head over the
    proper track (3 to 14 ms avg.); rotational
    latency, wait for desired sector (on average
    0.5 rotation, i.e., 0.5 / RPM); transfer, grab
    the data (one or more sectors) at 30 to 80 MB/sec
  • Average disk access time = average seek time +
    average rotational delay + transfer time +
    controller overhead
  • Example: pages 570-571
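
The access-time sum above can be checked with a few lines of Python. This is only a sketch: the function name and the sample figures (6 ms seek, 10,000 RPM, one 512-byte sector, 50 MB/sec, 0.2 ms controller overhead) are illustrative assumptions chosen to fall inside the ranges on the slide, not values taken from it, and 1 MB is assumed to be 10^6 bytes.

```python
def avg_disk_access_ms(seek_ms, rpm, transfer_bytes, mb_per_sec, controller_ms):
    """Average access time = seek + rotational delay + transfer + controller."""
    # Half a rotation on average; 0.5 / RPM is in minutes, so convert to ms.
    rotational_ms = 0.5 / rpm * 60_000
    # Assumes 1 MB = 10**6 bytes.
    transfer_ms = transfer_bytes / (mb_per_sec * 1e6) * 1000
    return seek_ms + rotational_ms + transfer_ms + controller_ms

# Hypothetical drive parameters (assumptions, not from the slides).
example = avg_disk_access_ms(seek_ms=6.0, rpm=10_000, transfer_bytes=512,
                             mb_per_sec=50, controller_ms=0.2)
print(f"{example:.2f} ms")
```

With these assumed numbers, seek time and rotational delay dominate; the transfer itself contributes only about 0.01 ms.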

6
I/O Example: Buses
  • Shared communication link (one or more wires)
  • Difficult design: may be bottleneck; length
    of the bus; number of devices; tradeoffs
    (buffers for higher bandwidth increase
    latency); support for many different devices;
    cost
  • Types of buses: processor-memory (short, high
    speed, custom design); backplane (high speed,
    often standardized, e.g., PCI); I/O (lengthy,
    different devices, e.g., USB, Firewire)
  • Synchronous vs. asynchronous: synchronous buses
    use a clock and a synchronous protocol, fast and
    small, but every device must operate at the same
    rate and clock skew requires the bus to be
    short; asynchronous buses don't use a clock and
    instead use handshaking

7
I/O Bus Standards
  • Today we have two dominant bus standards

8
Other important issues
  • Bus arbitration: daisy chain arbitration (not
    very fair); centralized arbitration (requires
    an arbiter), e.g., PCI; collision detection,
    e.g., Ethernet
  • Operating system: polling; interrupts;
    direct memory access (DMA)
  • Performance analysis techniques: queuing
    theory; simulation; analysis, i.e., find the
    weakest link (see "I/O System Design")
  • Many new developments

9
OS and I/O
  • Questions:
  • How is a user I/O request transformed into a
    device command and communicated to the device?
  • How is data actually transferred to or from a
    memory location?
  • What is the role of the OS?
  • Why does the OS assume major responsibility for
    handling I/O?
  • I/O system is shared
  • Kernel mode is required to handle interrupts
  • Low-level I/O control is complicated
  • Three types of communication:
  • OS gives commands to I/O devices
  • Devices notify the OS of operation completion or
    error
  • Data transfer between memory and I/O devices

10
I/O Techniques
  • Programmed I/O (polling)
  • Memory-mapped I/O vs. special I/O instructions
  • Polling required for flow control
  • Status register
  • Interrupt-driven I/O
  • Processor issues I/O command, does something
    useful while waiting for I/O completion
  • Pending I/O interrupt is checked after execution
    of an instruction is finished (compare with
    exceptions)
  • Has to know identity of device raising the
    interrupt
  • DMA (direct memory access)
  • Transfer data directly to or from memory without
    involving processor
  • Interrupt processor at completion

11
Interrupt-driven I/O
  • Interrupt priority levels
  • Steps in handling an interrupt:
  • Determine which interrupts are enabled
  • Select highest priority of interrupts
  • Save status register (interrupt mask field)
  • Change interrupt mask field to disable interrupts
    of equal or lower priorities
  • Save process state
  • Set interrupt enable bit to allow higher-priority
    interrupts
  • Call appropriate interrupt service routine
  • Clear interrupt enable bit, restore interrupt
    mask and process state.
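
The selection and masking steps above can be illustrated with a small Python simulation. It is a sketch under simplifying assumptions, not a model of any particular processor: priority levels are plain integers (a higher number means higher priority), the mask is a single level rather than a bit field, and the function name `service_interrupts` is made up for this example. Saving process state and nested interrupts are not modeled.

```python
def service_interrupts(pending, mask):
    """Service every pending interrupt whose level is above the mask.

    pending: set of integer priority levels (higher number = higher priority)
    mask:    current mask level; levels <= mask are disabled
    Returns the order in which levels were serviced.
    """
    serviced = []
    while True:
        # Determine which pending interrupts are enabled.
        enabled = [lvl for lvl in pending if lvl > mask]
        if not enabled:
            return serviced
        level = max(enabled)       # select the highest priority
        saved_mask = mask          # save the interrupt mask field
        mask = level               # disable equal or lower priorities
        pending.discard(level)
        serviced.append(level)     # stand-in for the interrupt service routine
        mask = saved_mask          # restore the mask afterwards

pending = {1, 2, 3}
order = service_interrupts(pending, mask=1)
print(order, pending)
```

Level 1 stays pending in this run because it is not above the mask, which is the behavior the mask field is meant to provide.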

12
DMA
  • Steps in a DMA transfer
  • Processor sets up DMA
  • Identity of device, operation to perform,
    address, number of bytes to transfer
  • DMA transfers data
  • Cycle stealing (not an interrupt)
  • DMA interrupts processor upon completion
  • Complication with memory system
  • Virtual memory (virtual or physical addresses)
  • Cache (coherence problem)

13
Programmed I/O: Polling
  • Assume
  • Clock cycles per polling operation: 400
  • CPU clock rate: 500 MHz
  • Determine the fraction of CPU time consumed for
    the following three cases:
  • 1. Mouse, polled 30 times/sec
  • 2. Floppy disk, 16 bits transferred at a
    time, data rate 50 KB/sec
  • 3. Hard disk, 16 bytes transferred at a
    time, data rate 4 MB/sec
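
One way to work the three polling cases is sketched below. It assumes 1 KB = 1,000 bytes and 1 MB = 10^6 bytes, and that each device is polled exactly as often as needed to keep up with its data rate; the function and variable names are illustrative.

```python
CLOCK_HZ = 500e6          # CPU clock rate: 500 MHz
CYCLES_PER_POLL = 400     # clock cycles per polling operation

def polling_fraction(polls_per_sec):
    """Fraction of CPU time spent polling at the given rate."""
    return polls_per_sec * CYCLES_PER_POLL / CLOCK_HZ

mouse = polling_fraction(30)                   # 30 polls/sec
floppy = polling_fraction(50_000 / 2)          # 50 KB/sec, 2 bytes per poll
hard_disk = polling_fraction(4_000_000 / 16)   # 4 MB/sec, 16 bytes per poll

for name, frac in [("mouse", mouse), ("floppy", floppy), ("hard disk", hard_disk)]:
    print(f"{name}: {frac * 100:.4f}% of CPU time")
```

Under these assumptions the mouse costs about 0.0024% of the CPU, the floppy disk about 2%, and the hard disk about 20%, so polling is negligible for slow devices and expensive for fast ones.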

14
Interrupt-Driven I/O
  • Assume
  • Same processor and hard disk in previous example
  • Overhead for each transfer, including interrupt:
    500 cycles
  • Determine the fraction of CPU time consumed if
    the hard disk only transfers 5% of the time
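
The interrupt-driven case can be worked the same way. The sketch below reuses the hard disk figures from the polling example (4 MB/sec, 16 bytes per transfer, 500 MHz clock), reads the transfer fraction as 5% of the time, and assumes 1 MB = 10^6 bytes; the names are illustrative.

```python
CLOCK_HZ = 500e6             # CPU clock rate: 500 MHz
CYCLES_PER_TRANSFER = 500    # overhead per transfer, including the interrupt
DISK_BYTES_PER_SEC = 4_000_000
BYTES_PER_TRANSFER = 16
ACTIVE_FRACTION = 0.05       # disk transfers 5% of the time (assumed reading)

transfers_per_sec = DISK_BYTES_PER_SEC / BYTES_PER_TRANSFER
busy_fraction = transfers_per_sec * CYCLES_PER_TRANSFER / CLOCK_HZ
average_fraction = busy_fraction * ACTIVE_FRACTION

print(f"while transferring: {busy_fraction * 100:.2f}%")
print(f"averaged over time: {average_fraction * 100:.2f}%")
```

While the disk is transferring, the overhead is about 25% of the CPU, slightly worse than polling, but the processor pays nothing when the disk is idle, so the average is about 1.25%.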

15
DMA
  • Assume
  • Same processor and hard disk in previous example
  • Overhead for handling interrupt at DMA completion:
    500 cycles
  • Initial setup of a DMA transfer takes 1000 clock
    cycles
  • Average transfer size from disk: 8 KB
  • Determine the fraction of CPU time consumed if
    the hard disk actively transfers 100% of the time:
  • Not including overhead of cycle stealing
  • Including overhead of cycle stealing (depending
    on DMA configuration)
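
The first DMA case (not including cycle stealing) can be sketched as follows, assuming 1 KB = 1,000 bytes and 1 MB = 10^6 bytes and a disk that transfers continuously; with binary units the result differs slightly. The cycle-stealing case is left out because it depends on the DMA configuration.

```python
CLOCK_HZ = 500e6            # CPU clock rate: 500 MHz
SETUP_CYCLES = 1000         # initial setup of one DMA transfer
INTERRUPT_CYCLES = 500      # interrupt handling at DMA completion
TRANSFER_BYTES = 8_000      # average transfer size: 8 KB (decimal, assumed)
DISK_BYTES_PER_SEC = 4_000_000

seconds_per_transfer = TRANSFER_BYTES / DISK_BYTES_PER_SEC
transfers_per_sec = 1 / seconds_per_transfer
cpu_cycles_per_sec = transfers_per_sec * (SETUP_CYCLES + INTERRUPT_CYCLES)
dma_fraction = cpu_cycles_per_sec / CLOCK_HZ

print(f"{dma_fraction * 100:.2f}% of CPU time")
```

Each 8 KB transfer takes 2 ms, so the processor spends 1,500 cycles every 2 ms, about 0.15% of its time, compared with roughly 20% to 25% for polling or per-transfer interrupts on the same disk.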

16
Pentium 4
  • I/O Options

17
Fallacies and Pitfalls
  • Fallacy: the rated mean time to failure of disks
    is 1,200,000 hours, so disks practically never
    fail.
  • Fallacy: magnetic disk storage is on its last
    legs, will be replaced.
  • Fallacy: a 100 MB/sec bus can transfer 100
    MB/sec.
  • Pitfall: moving functions from the CPU to the
    I/O processor, expecting to improve performance
    without analysis.
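
A quick calculation shows why the first fallacy fails. The sketch below converts the rated MTTF into an approximate annual failure rate, assuming disks run continuously and fail at a constant rate; the 1,000-disk installation is a hypothetical figure used only for illustration.

```python
HOURS_PER_YEAR = 24 * 365        # 8,760 hours of continuous operation
MTTF_HOURS = 1_200_000           # rated mean time to failure from the slide
FLEET_SIZE = 1_000               # hypothetical number of disks (assumption)

# For a small ratio, hours-per-year / MTTF approximates the yearly failure rate.
annual_failure_rate = HOURS_PER_YEAR / MTTF_HOURS
expected_failures = annual_failure_rate * FLEET_SIZE

print(f"annual failure rate: {annual_failure_rate * 100:.2f}%")
print(f"expected failures per year in {FLEET_SIZE} disks: {expected_failures:.1f}")
```

A 1,200,000-hour MTTF therefore corresponds to roughly 0.73% of disks failing per year, which is several failures a year in a large installation rather than "practically never".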

18
Multiprocessors
  • Idea: create powerful computers by connecting
    many smaller ones. Good news: works for
    timesharing (better than supercomputer). Bad
    news: it's really hard to write good concurrent
    programs; many commercial failures

19
Questions
  • How do parallel processors share data? Single
    address space (SMP vs. NUMA); message passing
  • How do parallel processors coordinate?
    Synchronization (locks, semaphores); built into
    send / receive primitives; operating system
    protocols
  • How are they implemented? Connected by a
    single bus; connected by a network
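
As a small illustration of lock-based coordination in a shared address space, the sketch below uses Python threads as stand-ins for processors sharing one counter. The thread count, iteration count, and names are arbitrary choices for the example; real multiprocessor locks are built on hardware primitives that this sketch does not show.

```python
import threading

state = {"count": 0}          # shared data in a single address space
lock = threading.Lock()       # protects every update to state["count"]

def worker(iterations):
    for _ in range(iterations):
        # Only one thread at a time may perform the read-modify-write.
        with lock:
            state["count"] += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(state["count"])
```

Because every increment happens while holding the lock, no update is lost and the final count equals the total number of increments.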

20
Multiprocessor Cache Coherence
  • Snooping cache coherency
  • All cache controllers snoop on the bus to
    determine whether they have a copy of the shared
    block

21
Multiprocessor Cache Coherence
  • A write-invalidate cache coherence protocol

22
Multiprocessor Cache Coherence
  • A write-invalidate cache coherence protocol
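
The idea of snooping with write-invalidate can be sketched in a few lines of Python. This is a deliberately simplified model, not the protocol shown on the slides: caches are write-through, there are no per-block states, and the `Bus` and `Cache` classes and their method names are invented for illustration.

```python
class Bus:
    """Shared bus: holds memory and broadcasts invalidations to all caches."""
    def __init__(self, memory):
        self.memory = memory
        self.caches = []

    def invalidate(self, addr, requester):
        for cache in self.caches:
            if cache is not requester:
                cache.snoop_invalidate(addr)

class Cache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:             # miss: fetch the block over the bus
            self.lines[addr] = self.bus.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.bus.invalidate(addr, self)        # other caches drop their copies
        self.lines[addr] = value
        self.bus.memory[addr] = value          # write-through (simplifying assumption)

    def snoop_invalidate(self, addr):
        self.lines.pop(addr, None)             # discard our copy if we hold one

bus = Bus({0x10: 1})
a, b = Cache(bus), Cache(bus)
first = (a.read(0x10), b.read(0x10))           # both caches now hold the block
a.write(0x10, 2)                               # invalidates b's copy
b_has_copy = 0x10 in b.lines
second = b.read(0x10)                          # b misses and sees the new value
print(first, b_has_copy, second)
```

After the write, the second cache no longer holds the block, so its next read misses and fetches the updated value instead of returning stale data.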

23
Supercomputers
Plot of top 500 supercomputer sites over a decade

24
Using multiple processors: an old idea
  • Some SIMD designs
  • Costs for the Illiac IV escalated from $8
    million in 1966 to $32 million in 1972 despite
    completion of only ¼ of the machine. It took
    three more years before it was operational!
    "For better or worse, computer architects are not
    easily discouraged." Lots of interesting designs
    and ideas, lots of failures, few successes

25
Topologies

26
Topologies

27
Clusters
  • Constructed from whole computers
  • Independent, scalable networks
  • Strengths:
  • Many applications amenable to loosely coupled
    machines
  • Exploit local area networks
  • Cost effective / Easy to expand
  • Weaknesses:
  • Administration costs not necessarily lower
  • Connected using I/O bus
  • Highly available due to separation of memories
  • In theory, we should be able to do better

28
Google
  • Serve an average of 1000 queries per second
  • Google uses 6,000 processors and 12,000 disks
  • Two sites in Silicon Valley, two in Virginia
  • Each site connected to the Internet using OC48
    (2488 Mbit/sec)
  • Reliability:
  • On an average day, 20 machines need to be
    rebooted (software error)
  • 2% of the machines are replaced each year
  • In some sense, simple ideas well executed.
    Better (and cheaper) than other approaches
    involving increased complexity

29
Concluding Remarks
  • Evolution vs. Revolution: "More often the
    expense of innovation comes from being too
    disruptive to computer users." Acceptance of
    hardware ideas requires acceptance by software
    people; therefore hardware people should
    learn about software. And if software people
    want good machines, they must learn more about
    hardware to be able to communicate with and
    thereby influence hardware engineers.