Chapter 8: Part II - PowerPoint PPT Presentation

Slides: 34

Provided by: WenH

Transcript and Presenter's Notes

1
Chapter 8 Part II

Storage, Network and Other Peripherals

2
Performance Analysis: Sync. vs. Async.

Synchronous bus: clock cycle time = 50 ns; each
transaction takes one clock cycle.
Asynchronous bus: 40 ns per handshake.
Data portion: 32 bits.
Question: Find the bandwidth of each bus when
performing one-word reads from a 200 ns memory.

3
Sync. vs. Async. Buses (I)

For the synchronous bus:
Send the address to memory: 50 ns
Read the memory: 200 ns
Send the data to the device: 50 ns
Total time = 300 ns, bandwidth = 4 bytes / 300 ns = 13.3
MB/s

4
Sync. vs. Async. Buses (II)

For the asynchronous bus:
Step 1: 40 ns
Steps 2, 3, 4: max(3 x 40 ns, 200 ns) = 200 ns
Steps 5, 6, 7: 3 x 40 ns = 120 ns
Total time = 360 ns, maximum bandwidth =
4 bytes / 360 ns = 11.1 MB/s
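
The arithmetic for the two buses can be checked with a short script. This is only a sketch that uses the figures given on the slides; the function and variable names are made up for illustration.

```python
# One-word read comparison; all figures come from the slides.
WORD_BYTES = 4      # 32-bit data portion
MEMORY_NS = 200     # memory access time

def sync_read_ns(clock_ns=50):
    # send address + read memory + send data
    return clock_ns + MEMORY_NS + clock_ns

def async_read_ns(handshake_ns=40):
    # step 1, then steps 2-4 overlapped with the memory access, then steps 5-7
    return handshake_ns + max(3 * handshake_ns, MEMORY_NS) + 3 * handshake_ns

def bandwidth_mb_per_s(nbytes, time_ns):
    # bytes/ns * 1e9 ns/s / 1e6 bytes/MB = bytes/ns * 1000
    return nbytes / time_ns * 1000

sync_bw = bandwidth_mb_per_s(WORD_BYTES, sync_read_ns())
async_bw = bandwidth_mb_per_s(WORD_BYTES, async_read_ns())
print(f"sync:  {sync_read_ns()} ns, {sync_bw:.1f} MB/s")    # 300 ns, 13.3 MB/s
print(f"async: {async_read_ns()} ns, {async_bw:.1f} MB/s")  # 360 ns, 11.1 MB/s
```

Shortening the handshake or lengthening the memory access time in this sketch shows when the asynchronous bus catches up with the synchronous one.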

5
Increasing Bus Bandwidth

Data bus width
Separate versus multiplexed address and data
lines
Block transfers

6
Performance Analysis of Two Bus Schemes

Given a system with:
a memory and bus system supporting block access
of 4 to 16 words;
a 64-bit synchronous bus clocked at 200 MHz, with
each 64-bit transfer taking 1 clock cycle, and 1
clock cycle to send an address to memory;
two clock cycles needed between each bus
operation;
memory access for the first 4 words takes 200 ns; each
additional set of 4 words requires 20 ns.

7
Question

Find the sustained bandwidth and latency for a
read of 256 words for transfers using 4-word
blocks and 16-word blocks.
Find the effective number of bus transactions for
each case.

8
4-Word Block Transfer

1 clock cycle to send the address to memory
200 ns / (5 ns/cycle) = 40 cycles to read memory
2 cycles to send the data from memory
2 idle cycles
Total = 45 cycles per transaction
256 words require 256/4 = 64 transactions: 45 x 64 = 2880 cycles

9
4-Word Block Transfer

Latency = 2880 cycles x 5 ns/cycle = 14,400 ns
Number of bus transactions per second = 64 x 1 s / 14,400 ns
= 4.44M transactions/s
Bandwidth = (256 x 4 bytes) x 1 / 14,400 ns = 71.11 MB/s

10
16-Word Block Transfer

1 clock cycle to send address to memory
40 cycles to read first 4 words from memory
2 cycles to send data, during which the read of
the next 4 words is started.
2 idle cycles between transfers, during which the
read of the next block is completed.
The last two steps are performed 4 times in all (3
repeats) to read a total of 16 words.

11
16-Word Block Transfer

Total cycles required per transaction = 1 + 40 + 4 x (2 + 2) = 57
cycles
256/16 = 16 transactions are required
Total number of cycles required for 256 words =
16 x 57 = 912 cycles, latency = 4560 ns
Number of bus transactions per second = 16 x 1 s / 4560 ns
= 3.51M transactions/s
Bandwidth = (256 x 4 bytes) x 1 / 4560 ns = 224.56 MB/s
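
Both block sizes follow the same cycle-counting pattern, so the two cases can be reproduced with one small function. This is a sketch under the slides' assumptions (5 ns cycle, 4-byte words, later 4-word reads fully hidden behind the 2 transfer cycles and 2 idle cycles); the names are illustrative.

```python
CYCLE_NS = 5        # 200 MHz bus clock
WORD_BYTES = 4      # 32-bit words, so 4 words take two 64-bit transfers
TOTAL_WORDS = 256

def cycles_per_transaction(block_words):
    address = 1                    # send the address
    first_read = 200 // CYCLE_NS   # 40 cycles for the first 4 words
    groups = block_words // 4      # 4-word groups in one block
    # each group: 2 cycles of data transfer + 2 idle cycles;
    # the 20 ns (4-cycle) read of the next group overlaps these
    return address + first_read + groups * (2 + 2)

def analyze(block_words):
    transactions = TOTAL_WORDS // block_words
    cycles = transactions * cycles_per_transaction(block_words)
    latency_ns = cycles * CYCLE_NS
    tx_per_s = transactions / (latency_ns * 1e-9)
    mb_per_s = TOTAL_WORDS * WORD_BYTES / latency_ns * 1000
    return cycles, latency_ns, tx_per_s, mb_per_s

for block in (4, 16):
    cycles, latency, tx, bw = analyze(block)
    print(f"{block:2d}-word blocks: {cycles} cycles, {latency} ns, "
          f"{tx / 1e6:.2f}M transactions/s, {bw:.2f} MB/s")
```

The 16-word case wins because the 40-cycle initial memory access is paid 16 times instead of 64 times.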

12
Bus Arbitration

Daisy chain arbitration (not very fair)
Centralized arbitration (requires an arbiter),
e.g., PCI
Self selection, e.g., NuBus used in Macintosh
Collision detection, e.g., Ethernet

13
Bus Standards

PCI (a general-purpose backplane bus)
SCSI (Small Computer System Interface)
IEEE 1394 (Firewire)
USB 2.0

Characteristic       Firewire (1394)             USB 2.0
Bus width (signals)  4                           2
Clocking             asynchronous                asynchronous
Peak bandwidth       50 MB/s (Firewire 400),     0.2 MB/s (low speed),
                     100 MB/s (Firewire 800)     1.5 MB/s (full speed),
                                                 60 MB/s (high speed)
Hot pluggable        Yes                         Yes
Max. no. of devices  63                          127
Max. bus length      4.5 m                       5 m

14
Interfacing I/O Devices

How is a user I/O request transformed into a
device command and communicated to the device?
How is data actually transferred to or from a
memory location?
What is the role of the operating system?

15
Role of the OS

The OS plays a major role in handling I/O, in
that:
the I/O system is shared by multiple programs using
the processor;
I/O systems often use interrupts, which cause a transfer
to supervisor mode;
low-level control of I/O is complex.

16
Communications between OS and I/O Devices

The OS must be able to give commands to I/O devices.
An I/O device must be able to notify the OS when an
operation has completed or an error has occurred.
Data must be transferred between memory and an
I/O device.

17
Giving Commands to I/O

To give a command, the processor must be able to
address the device and to supply command words.
Memory-mapped I/O: portions of the address space
are assigned to I/O devices.
Special I/O instructions: dedicated I/O instructions in the
processor.
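
The idea behind memory-mapped I/O can be shown with a toy model in which an ordinary store is routed to a device whenever its address falls in the device's range. This is only a sketch: the address range, register offsets, and class names below are hypothetical and do not describe any real machine.

```python
# Toy model of memory-mapped I/O. DEVICE_BASE and DEVICE_SIZE are
# hypothetical values chosen only for illustration.
DEVICE_BASE = 0xFFFF0000
DEVICE_SIZE = 0x10

class ToyDevice:
    def __init__(self):
        self.commands = []          # command words received so far

    def write_register(self, offset, value):
        self.commands.append((offset, value))

class ToyBus:
    def __init__(self, device):
        self.memory = {}            # ordinary memory, address -> value
        self.device = device

    def store(self, address, value):
        # Address decoding: the same store operation reaches either
        # memory or the device, depending only on the address.
        if DEVICE_BASE <= address < DEVICE_BASE + DEVICE_SIZE:
            self.device.write_register(address - DEVICE_BASE, value)
        else:
            self.memory[address] = value

device = ToyDevice()
bus = ToyBus(device)
bus.store(0x1000, 42)               # normal memory write
bus.store(DEVICE_BASE + 4, 0x1)     # command word sent to the device
print(bus.memory, device.commands)
```

With special I/O instructions, by contrast, the processor would use a separate operation for the device access instead of relying on the address alone.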

18
Communicating with the Processor

Polling
Interrupts
DMA

19
Polling

Polling: the processor periodically checks the status
of the I/O device.
Overhead of polling in an I/O system
Example 1 mouse
Example 2 floppy disk
Example 3 hard disk

20
Mouse

Assume the number of clock cycles for a polling
operation, including transferring to the polling
routine, accessing the device, and restarting the
user program, is 400, with a 500 MHz clock.
The mouse must be polled 30 times a second to
ensure that no user movement is missed.
Fraction of CPU time = 30 x 400 / (500 x 10^6) ≈ 0.002%

21
Floppy Disk

The floppy disk transfers data to the processor
in 16-bit units and has a data rate of 50 KB/s.
Polling rate = (50 KB/s) / (2 bytes/poll) = 25K
polls/s
Fraction of CPU time = 25K x 400 / (500 x 10^6) = 2%

22
Hard Disk

Transfers in 4-word blocks
Transfer rate = 4 MB/s
Polling rate = (4 MB/s) / (4 x 4 bytes/poll) = 250K
polls/s
Fraction of CPU time = 250K x 400 / (500 x 10^6) = 20%
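
The three polling examples use the same formula, so they can be reproduced with one helper. This is a sketch using the slides' figures (400 cycles per poll, 500 MHz clock, KB = 10^3 bytes, MB = 10^6 bytes); the names are illustrative.

```python
CLOCK_HZ = 500e6     # 500 MHz processor clock
POLL_CYCLES = 400    # cycles per polling operation

def polling_fraction(polls_per_second):
    # fraction of all processor cycles spent polling
    return polls_per_second * POLL_CYCLES / CLOCK_HZ

mouse = polling_fraction(30)           # 30 polls/s
floppy = polling_fraction(50e3 / 2)    # 50 KB/s in 2-byte units
disk = polling_fraction(4e6 / 16)      # 4 MB/s in 16-byte blocks

for name, frac in (("mouse", mouse), ("floppy", floppy), ("hard disk", disk)):
    print(f"{name}: {frac:.4%}")
```

The mouse works out to 0.0024%, which the slide rounds to 0.002%.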

23
Overhead of Polling

Can do the polling only when the device is
active, thus reducing the overhead.
However, the overhead is still significant,
resulting in another design called
interrupt-driven I/O.

24
Overhead of Interrupt-Driven I/O

Assume the overhead for each transfer, including
the interrupt, is 500 cycles.
Cycles per second for the disk = 250K x 500 = 125 x 10^6
cycles
Fraction of processor consumed =
125 x 10^6 / (500 x 10^6) = 25%
Assuming the disk is transferring data 5% of the
time, fraction of CPU on average = 25% x 5% = 1.25%
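
The interrupt-driven figures can be checked in the same way. A minimal sketch, assuming the hard-disk parameters from the polling example (4 MB/s in 16-byte blocks, 500 MHz clock); the variable names are illustrative.

```python
CLOCK_HZ = 500e6          # 500 MHz processor clock
INTERRUPT_CYCLES = 500    # overhead per transfer, including the interrupt

transfers_per_second = 4e6 / 16    # 250K transfers/s while the disk is active
busy_fraction = transfers_per_second * INTERRUPT_CYCLES / CLOCK_HZ
average_fraction = busy_fraction * 0.05   # disk active 5% of the time

print(f"while transferring: {busy_fraction:.2%}")   # 25.00%
print(f"on average:         {average_fraction:.2%}")  # 1.25%
```

Unlike polling, the processor pays this overhead only while the disk is actually transferring, which is why the average is so much lower.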

25
Direct Memory Access(DMA)

If the disk is transferring data most of the time,
the overhead for interrupt-driven I/O is still
high.
For a high-bandwidth device, let the device
controller transfer data directly to or from
memory without involving the processor; this is known as
direct memory access.
An interrupt is used to signal the completion of the I/O
transfer or an error.
Note: How does DMA affect cache design?

26
Overhead of I/O Using DMA

Assume the initial setup of a DMA transfer takes 1000
cycles, handling of the interrupt at DMA completion
takes 500 cycles, and the average transfer from disk is
8 KB.
Each DMA transfer takes 8 KB / (4 MB/s) = 2 x 10^-3 s
If the disk is constantly transferring data, it
requires (1000 + 500) / (2 x 10^-3 s) = 750 x 10^3 cycles/s
Fraction of CPU time = 750 x 10^3 / (500 x 10^6) = 0.15%
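
The DMA overhead can be worked through in a few lines. This is a sketch under the slide's assumptions; it treats 8 KB as 8000 bytes and 4 MB/s as 4 x 10^6 bytes/s so that the transfer time comes out to the 2 ms used above. The names are illustrative.

```python
CLOCK_HZ = 500e6           # 500 MHz processor clock
SETUP_CYCLES = 1000        # initial setup of a DMA transfer
COMPLETION_CYCLES = 500    # interrupt handling at DMA completion
TRANSFER_BYTES = 8e3       # 8 KB, taken as 8000 bytes (assumption)
DISK_BYTES_PER_S = 4e6     # 4 MB/s

transfer_seconds = TRANSFER_BYTES / DISK_BYTES_PER_S          # 0.002 s
cycles_per_second = (SETUP_CYCLES + COMPLETION_CYCLES) / transfer_seconds
cpu_fraction = cycles_per_second / CLOCK_HZ

print(f"{transfer_seconds * 1e3:.1f} ms per transfer")
print(f"{cycles_per_second:.0f} cycles/s, {cpu_fraction:.2%} of the CPU")
```

Even with the disk busy all the time, the processor overhead is far below the 25% of interrupt-driven I/O because it is paid once per 8 KB transfer rather than once per 16 bytes.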

27
I/O System Design

Latency constraints: ensuring the latency to
complete an I/O operation is bounded.
Bandwidth constraints
Performance analysis techniques: queuing
theory, simulation, analysis

28
I/O System Design- Example

CPU: 3 BIPS (billion instructions per second); the OS
executes an average of 100,000 instructions per I/O operation
Backplane bus: transfer rate 1000 MB/s
SCSI Ultra320 controller: transfer rate
320 MB/s, accommodating up to 7 disks
Disk: bandwidth 75 MB/s, seek + rotational
latency = 6 ms
Workload: 64 KB reads; the user program needs 200,000
instructions per I/O operation

29
Example