Title: CH3 CPUs
1CH3 CPUs
2summary
- Input and output mechanisms.
- Supervisor mode, exceptions, and traps.
- Memory management and address translation.
- Caches.
- How architecture affects program performance.
- How architecture affects program power consumption.
33.1 Introduction
4outline
- aspects of CPUs that do not directly relate to their instruction sets
- interrupts and memory management
- performance and power consumption
5outline
- 3.2 studies input and output mechanisms such as interrupts
- 3.3 covers several mechanisms designed to handle internal events
- 3.4 covers co-processors that provide optional support for parts of the instruction set
- 3.5 covers memory systems, memory management, and caches
6outline
- 3.6 looks at performance
- 3.7 considers power consumption
- 3.8 data compressor example
73.2 Programming Input and Output
8- basics of I/O programming
- basic characteristics of I/O devices
93.2.1 Input and Output Devices
10Structure of a typical I/O device
- Input and output devices usually have some analog or nonelectronic component
- Relationship between the I/O device and the CPU
- Registers form the interface between the CPU and the device's internals
- The CPU talks to the device by reading and writing the registers
11Structure of a typical I/O device
12Structure of a typical I/O device
- Data registers hold data values, such as the data read or written by a disk
- Status registers provide information about the device's operation
13Ex1. 8251 UART
- The 8251 UART (Universal Asynchronous Receiver/Transmitter) is the original device used for serial communications
- Data are transmitted as streams of characters
- Every character starts with a start bit (a 0) and ends with a stop bit (a 1)
14Ex1. 8251 UART
- Baud rate: data bits are sent as high and low voltages at a uniform rate
- The CPU must set the UART's mode registers
- - baud rate
- - data bits: 5 to 8 bits
- - parity bit: even, odd, or none
- - stop bits: 1, 1.5, or 2 bits
15Ex1. 8251 UART
- An 8-bit register buffers characters between the UART and the CPU bus
- Transmitter Ready output: indicates that the transmitter is ready to accept a data character
- Transmitter Empty signal: goes high when the UART has no characters to send
- Receiver Ready: goes high when the UART has a character ready to be read by the CPU
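The framing described on the UART slides (a start bit of 0, 5 to 8 data bits, an optional parity bit, and a stop bit of 1) can be sketched in C. This is only an illustration, not the 8251's actual logic: the names `uart_frame_even` and `odd_ones` are hypothetical, and sending the data bits least-significant bit first, using even parity, using exactly one stop bit, and packing the frame into a 16-bit integer are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the low nbits bits of data hold an odd number of 1s. */
static unsigned odd_ones(uint8_t data, unsigned nbits) {
    unsigned parity = 0;
    for (unsigned i = 0; i < nbits; i++)
        parity ^= (data >> i) & 1u;
    return parity;
}

/* Build one asynchronous frame; bit 0 of the result is the first bit on the wire.
   Layout (assumed): start bit (0), data bits LSB first, even-parity bit, one stop bit (1). */
uint16_t uart_frame_even(uint8_t data, unsigned nbits) {
    assert(nbits >= 5 && nbits <= 8);      /* data length options listed on the slide */
    uint16_t frame = 0;                    /* bit 0 stays 0: the start bit */
    for (unsigned i = 0; i < nbits; i++)
        frame |= (uint16_t)(((data >> i) & 1u) << (i + 1));
    /* Even parity: the parity bit makes the total number of 1s even. */
    frame |= (uint16_t)(odd_ones(data, nbits) << (nbits + 1));
    frame |= (uint16_t)(1u << (nbits + 2)); /* stop bit */
    return frame;
}
```

For the 8-bit character 0x41 ('A'), which already has an even number of 1s, the parity bit is 0 and the sketch yields 0x482: start bit 0, data 0x41 shifted up by one, parity 0, stop bit 1.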
163.2.2 Input and Output Primitives
17programming support for input and output
- I/O instructions
- - special instructions (as on the Intel x86) for input and output
- Memory-mapped I/O
- - provides addresses for the registers in each I/O device
- - ordinary read and write instructions communicate with the devices
18Ex1. Memory-Mapped I/O on ARM
- Use the EQU pseudo-op to define a symbolic name for the memory location of our I/O device
    DEV1 EQU 0x1000
19Ex1. Memory-Mapped I/O on ARM
- Read and write the device register
    LDR r1,=DEV1    ; set up device address
    LDR r0,[r1]     ; read DEV1
    MOV r0,#8       ; set up value to write
    STR r0,[r1]     ; write 8 to device
20Ex2. Memory-Mapped I/O on SHARC
- A memory-mapped I/O device must be assigned within the external memory space, which starts at 0x400000
- Use a DM access to read and write the off-chip device register
    I0 = 0x400000;
    M0 = 0;
    R1 = DM(I0,M0);
21write I/O devices in C
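- Functions that read and write arbitrary memory locations are traditionally called peek and poke
- The peek function can be written in C as
    int peek(char *location) {
        return *(volatile char *)location;   /* read the byte at that address */
    }
- Define the device address and read the device status
    #define DEV1 ((char *)0x1000)
    dev_status = peek(DEV1);
22write I/O devices in C
- The poke function can be implemented as
    void poke(char *location, char newval) {
        *(volatile char *)location = newval; /* write the byte to that address */
    }
- To write 8 to the status register
    poke(DEV1, 8);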
233.2.3 Busy-Wait I/O
24busy-wait I/O
- Devices are slower than the CPU and require many cycles to complete an operation
- The CPU must wait for one operation to complete before starting the next one
- Polling: asking an I/O device whether it is finished by reading its status register
25Ex3-3 Busy-Wait I/O Programming
- Write a sequence of characters to an output device
- The device has two registers: one for the character to be written and a status register
- The status register's value is 1 when the device is busy writing and 0 when the write transaction has completed
26Ex3-3 Busy-Wait I/O Programming
- Register addresses
    #define OUT_CHAR   ((char *)0x1000)  /* output device character register */
    #define OUT_STATUS ((char *)0x1001)  /* output device status register */
27Ex3-3 Busy-Wait I/O Programming
- The sequence of characters is stored in a standard C string, which is terminated by a null (0) character
    char *mystring = "Hello, world.";  /* string to write */
    char *current_char;                /* pointer to current position in string */
28Ex3-3 Busy-Wait I/O Programming
    current_char = mystring;             /* point to head of string */
    while (*current_char != '\0') {      /* until null character */
        poke(OUT_CHAR, *current_char);   /* send character to device */
        while (peek(OUT_STATUS) != 0)    /* keep checking status */
            ;
        current_char++;                  /* update character pointer */
    }
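To try the busy-wait loop on a desktop machine, where addresses such as 0x1000 are not real device registers, the device can be replaced by a small simulation. The sketch below is an illustration under stated assumptions: `sim_write_char` and `sim_read_status` are hypothetical stand-ins for `poke(OUT_CHAR, ...)` and `peek(OUT_STATUS)`, and the simulated device is assumed to stay busy for `BUSY_POLLS` status reads after each write. The loop has the same shape as the one above and also counts how many polls found the device busy, which makes the wasted CPU work visible.

```c
#include <assert.h>
#include <stddef.h>

enum { BUSY_POLLS = 3, LOG_SIZE = 64 };  /* hypothetical simulation parameters */

static int busy_countdown = 0;           /* status reads left before the device goes idle */
static char device_log[LOG_SIZE];        /* characters the simulated device has written */
static size_t device_log_len = 0;

/* Stands in for poke(OUT_CHAR, c): the device accepts c and becomes busy. */
static void sim_write_char(char c) {
    if (device_log_len < LOG_SIZE - 1) {
        device_log[device_log_len++] = c;
        device_log[device_log_len] = '\0';
    }
    busy_countdown = BUSY_POLLS;
}

/* Stands in for peek(OUT_STATUS): 1 while busy, 0 once the write has completed. */
static int sim_read_status(void) {
    if (busy_countdown > 0) {
        busy_countdown--;
        return 1;
    }
    return 0;
}

/* Returns everything the simulated device has written so far. */
const char *device_log_text(void) { return device_log; }

/* Busy-wait output: returns how many status polls found the device still busy. */
unsigned long busy_wait_write(const char *s) {
    unsigned long busy_polls = 0;
    assert(s != NULL);
    for (const char *p = s; *p != '\0'; p++) {
        sim_write_char(*p);                /* send character to device */
        while (sim_read_status() != 0)     /* keep checking status */
            busy_polls++;
    }
    return busy_polls;
}
```

With `BUSY_POLLS` set to 3, writing a two-character string costs six busy polls, during which the CPU does no useful work.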
29Ex3-4 Copy Characters from Input to Output Using
Busy-Wait I/O
- Repeatedly read a character from the input device and write it to the output device
- Define addresses for the device registers
    #define IN_DATA    ((char *)0x1000)
    #define IN_STATUS  ((char *)0x1001)
    #define OUT_DATA   ((char *)0x1100)
    #define OUT_STATUS ((char *)0x1101)
30Ex3-4 Copy Characters from Input to Output Using
Busy-Wait I/O
- The input device
- - sets its status register to 1 when a new character has been read
- - the program must set the status register back to 0 after the character has been read
- When writing
- - set the output status register to 1 to start writing and wait for it to return to 0
31
    while (TRUE) {                       /* perform operation forever */
        /* read a character into achar */
        while (peek(IN_STATUS) == 0)     /* wait until ready */
            ;
        achar = (char)peek(IN_DATA);     /* read the character */
        poke(IN_STATUS, 0);              /* reset status so the device can read again */
        /* write achar */
        poke(OUT_DATA, achar);
        poke(OUT_STATUS, 1);             /* turn on device */
        while (peek(OUT_STATUS) != 0)    /* wait until done */
            ;
    }
323.2.4 Interrupts
- Busy-wait I/O is inefficient: the CPU does nothing but test the device status
- The CPU could do other work in parallel with the I/O transaction
- - computation
- - control of other I/O devices
33interrupt
- The interrupt mechanism allows devices to signal the CPU and to force execution of a particular piece of code
- When an interrupt occurs, the program counter is set to point to an interrupt handler routine (also called a device driver), which services the device, for example by writing the next data or reading data that is ready
- The CPU can then return to the program that was interrupted
34interrupt
35interrupt
- The interface between the CPU and the I/O device includes the following signals
- The I/O device asserts the interrupt request signal when it wants service
- The CPU asserts the interrupt acknowledge signal when it is ready to handle the I/O device's request
36interrupt
- The interrupt handler operates much like a subroutine, except that it is not called by the executing program
- The program that runs when no interrupt is being handled is often called the foreground program
- When the interrupt handler finishes, it returns to the foreground program
37ex3-5 Copy Characters from Input to Output with
Basic Interrupts
- Repeatedly read a character from an input device and write it to an output device
- Use a global variable achar for the input handler to pass the character to the foreground program
- Use a global Boolean variable, gotchar, to signal when a new character has been received
38
    void input_handler() {       /* get a character and put in global */
        achar = peek(IN_DATA);   /* get character */
        gotchar = TRUE;          /* signal to main program */
        poke(IN_STATUS, 0);      /* reset status to initiate next transfer */
    }

    void output_handler() {      /* react to character being sent */
        /* don't have to do anything */
    }
39ex3-5 Copy Characters from Input to Output with
Basic Interrupts
    int main(void) {
        while (TRUE) {                     /* read then write forever */
            if (gotchar) {                 /* write a character */
                poke(OUT_DATA, achar);     /* put character in device */
                poke(OUT_STATUS, 1);       /* set status to initiate write */
                gotchar = FALSE;           /* reset flag */
            }
        }
    }
40Ex3-6 Copy Characters from Input to Output with
Interrupt and Buffer
- Performs reads and writes independently
- The read and write routines communicate through the following global variables
- - a character string io_buf holds a queue of characters that have been read but not yet written
- - integers buf_start and buf_end point to the first and last characters read
- - an integer error is set to 0 whenever io_buf overflows
41Ex3-6 Copy Characters from Input to Output with
Interrupt and Buffer
- The buffer allows the input and output devices to run at different rates
- The queue io_buf acts as a wraparound buffer
- - add characters to the tail
- - take characters from the head
42Ex3-6 Copy Characters from Input to Output with
Interrupt and Buffer
- When head and tail are equal, the queue is empty
43Ex3-6 Copy Characters from Input to Output with
Interrupt and Buffer
- When the buffer is full, we leave one character
in the buffer unused
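The wraparound queue described on these slides can be sketched as follows. The names `io_buf`, `buf_start`, and `buf_end` come from the slides; `BUF_SIZE`, the function names, and reporting overflow through a return value are assumptions made for this example. One further assumption: `buf_end` here indexes the next free slot rather than the last character read, which is what makes "head equals tail" mean empty and leaves one slot unused when the queue is full.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BUF_SIZE 8                 /* hypothetical capacity; one slot always stays unused */

static char io_buf[BUF_SIZE];
static int buf_start = 0;          /* head: index of the next character to remove */
static int buf_end = 0;            /* tail: index of the next free slot (assumption) */

/* Empty when head and tail are equal. */
bool buffer_empty(void) { return buf_start == buf_end; }

/* Full when advancing the tail would make it equal to the head. */
bool buffer_full(void) { return (buf_end + 1) % BUF_SIZE == buf_start; }

/* Add a character at the tail; returns false on overflow and leaves the queue unchanged. */
bool add_char(char c) {
    if (buffer_full())
        return false;
    io_buf[buf_end] = c;
    buf_end = (buf_end + 1) % BUF_SIZE;   /* wrap around at the end of the array */
    return true;
}

/* Remove a character from the head; returns false if the queue is empty. */
bool remove_char(char *out) {
    assert(out != NULL);
    if (buffer_empty())
        return false;
    *out = io_buf[buf_start];
    buf_start = (buf_start + 1) % BUF_SIZE;
    return true;
}
```

In the interrupt-driven version, the input handler would call something like `add_char` and the output side would call `remove_char`. Keeping one slot unused is what lets a full queue be told apart from an empty one without a separate counter.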
45Debug interrupt
- Because an interrupt can occur at any time, the same bug can manifest itself in different ways when the interrupt handler interrupts different segments of the foreground program
46Ex3-7 Debugging Interrupt Code
- y = Ax + b
    for (i = 0; i < M; i++) {
        y[i] = b[i];
        for (j = 0; j < N; j++)
            y[i] = y[i] + A[i][j]*x[j];
    }
47Ex3-7 Debugging Interrupt Code
- Assume read_handler has a bug that causes it to change the value of j
- Any CPU register that is written by the interrupt handler must be saved before it is modified and restored before the handler exits
48implement
- The CPU implements interrupts by checking the interrupt request line at the beginning of execution of every instruction
- If an interrupt request has been asserted, the CPU does not fetch the instruction pointed to by the PC; it sets the PC to the start of the interrupt handler instead
- The starting address of the interrupt handler is usually given as a pointer
49interrupts and subroutines
- The interrupt handler must return to the foreground program without disturbing the foreground program's operation
- Most CPUs use the same basic mechanism for remembering the foreground program's PC as is used for subroutines
- The interrupt mechanism puts the return address on a stack
50Priorities and Vectors
- Interrupts can be generalized to handle multiple devices and to provide more flexible definitions
- - interrupt priorities allow the CPU to recognize some interrupts as more important than others
- - interrupt vectors allow the interrupting device to specify its handler
51Prioritized interrupts
- Prioritized interrupts
- - allow multiple devices to be connected
- - allow the CPU to ignore less important interrupt requests
- The lower-numbered interrupt lines are given higher priority
52Prioritized device interrupts
- most CPUs provide the priority number in binary
form
53change the priority
- How do we change the priority of a device?
- Simply by connecting it to a different interrupt request line
- This requires hardware modification
- If priorities need to be changeable, programmable switches or a similar mechanism make the change easy
54Nested interrupt
- Masking: when the CPU acknowledges an interrupt, it stores that interrupt's priority level in an internal register
- When a subsequent interrupt occurs,
- - its priority is checked against the priority register
- - the new request is acknowledged only if it has higher priority
- When the interrupt handler exits, the priority register must be reset.
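The masking rule can be sketched as a small software model. This is an illustration of the rule, not of any particular CPU: `current_priority`, `NO_INTERRUPT`, and the three function names are hypothetical, and, following the convention stated earlier, a lower number means a higher priority.

```c
#include <assert.h>

#define NO_INTERRUPT 255u   /* hypothetical sentinel: no handler is running */

/* Models the CPU's internal priority register. */
static unsigned current_priority = NO_INTERRUPT;

/* A new request is acknowledged only if it outranks the interrupt being handled.
   Lower number = higher priority. */
int request_accepted(unsigned level) {
    return level < current_priority;
}

/* On acknowledge: record the new level and hand back the old one so it can be restored. */
unsigned enter_handler(unsigned level) {
    assert(request_accepted(level));
    unsigned previous = current_priority;
    current_priority = level;
    return previous;
}

/* On handler exit: reset the priority register to what it held before. */
void exit_handler(unsigned previous) {
    current_priority = previous;
}
```

Nesting falls out of the save and restore: while a level-3 handler runs, a level-5 request is held off, a level-1 request is accepted, and once the level-1 handler exits the register again holds 3.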
55power-down interrupts
- The highest-priority interrupt is normally called the nonmaskable interrupt, or NMI
- The NMI cannot be turned off
- It is usually reserved for interrupts caused by power failures
- Simple circuitry can detect a dangerously low power supply
- The NMI's interrupt handler can save critical state in nonvolatile memory and turn off I/O devices
56- Most CPUs provide a relatively small number of interrupt priority levels
- More priority levels can be added with external logic
- Polling can be combined with prioritized interrupts to handle the devices efficiently
57Using polling to share an interrupt over several
devices
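One way to picture several devices sharing a single interrupt line is a handler that polls each device's status when the shared interrupt fires. The sketch below simulates this on a host machine: the `device_ready` array stands in for the devices' status registers, and `NUM_DEVICES`, `raise_request`, `times_serviced`, and the fixed polling order (lowest index first) are assumptions made for illustration.

```c
#include <assert.h>

#define NUM_DEVICES 3                      /* hypothetical number of devices on the shared line */

static int device_ready[NUM_DEVICES];      /* simulated status registers: 1 = wants service */
static int service_count[NUM_DEVICES];     /* how often each device has been serviced */

/* Simulates device d asserting the shared interrupt request line. */
void raise_request(int d) {
    assert(d >= 0 && d < NUM_DEVICES);
    device_ready[d] = 1;
}

/* Returns how many times device d has been serviced. */
int times_serviced(int d) {
    assert(d >= 0 && d < NUM_DEVICES);
    return service_count[d];
}

/* Shared handler: polls the devices in a fixed order, so among the devices that
   share the line, the lowest index is effectively the highest priority.
   Returns the device serviced, or -1 if no device was requesting. */
int shared_interrupt_handler(void) {
    for (int d = 0; d < NUM_DEVICES; d++) {
        if (device_ready[d]) {
            device_ready[d] = 0;           /* clear the request */
            service_count[d]++;            /* the device-specific work would go here */
            return d;
        }
    }
    return -1;
}
```

If devices 2 and 1 both request service, the first invocation services device 1 and the next services device 2. The polling order provides extra priority levels in software, at the cost of a few status reads per interrupt.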
58Ex3-8 I/O with Prioritized Interrupts
- A has priority 1
- B priority 2
- C priority 3.
59Interrupt vectors
- Vectors make it possible to define the interrupt handler that should service a request from a device
- Hardware structure to support interrupt vectors
60Interrupt vectors
- Additional interrupt vector lines run from the devices to the CPU
- After a device's request is acknowledged, it sends its interrupt vector to the CPU
- The CPU uses the vector number as an index into a table stored in memory
- The table entry gives the address of the handler
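The table lookup can be sketched in C with an array of function pointers, where the vector number sent by the device is the index. This is a software model only: `vector_table`, `dispatch`, `install_handler`, the two example handlers, and the table size are hypothetical names and values chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_VECTORS 4                      /* hypothetical table size */

typedef void (*handler_t)(void);           /* a handler is a routine taking no arguments */

int last_serviced = -1;                    /* records which example handler ran last */

void keyboard_handler(void) { last_serviced = 0; }   /* hypothetical handler */
void disk_handler(void)     { last_serviced = 1; }   /* hypothetical handler */

/* The interrupt vector table: entry i holds the address of the handler for vector i. */
static handler_t vector_table[NUM_VECTORS] = { keyboard_handler, disk_handler, NULL, NULL };

/* Change which handler a vector number selects. */
void install_handler(unsigned vector, handler_t handler) {
    assert(vector < NUM_VECTORS);
    vector_table[vector] = handler;
}

/* Use the vector as an index into the table and call the handler found there.
   Returns 0 on success, -1 if the vector is out of range or has no handler. */
int dispatch(unsigned vector) {
    if (vector >= NUM_VECTORS || vector_table[vector] == NULL)
        return -1;
    vector_table[vector]();
    return 0;
}
```

Because the handler is found through the table, the mapping from vector numbers to handlers is not fixed: `install_handler(2, keyboard_handler)` makes vector 2 reach the same routine as vector 0 without touching the dispatch code.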
61Activity on the bus during a vectored interrupt
62Interrupt vectors
- First, the device, not the CPU, stores its vector number, so a device can be given a new handler without modifying the system software
- Second, there is no fixed relationship between vector numbers and interrupt handlers
63implement
- Most modern CPUs implement both prioritized and vectored interrupts
- Priorities determine which device is serviced first
- Vectors determine what routine is used to service the interrupt
64Interrupt Overhead
- The complete interrupt handling process
- Once a device requests an interrupt, some steps are performed by the CPU, some by the device, and others by software
- The basic procedure is described below
- 1. CPU: the CPU checks for pending interrupts at the beginning of an instruction and answers the highest-priority interrupt
65Interrupt Overhead
- 2. Device: the device receives the acknowledgment and sends the CPU its interrupt vector
- 3. CPU: the CPU looks up the device handler address in the interrupt vector table, then saves the current PC and possibly other internal CPU state, such as general-purpose registers
66Interrupt Overhead
- 4. Software: the device driver may save additional CPU state, performs the required operations, restores any saved state, and executes the interrupt return instruction
- 5. CPU: the interrupt return instruction restores the PC and other automatically saved state, returning execution to the code that was interrupted
67performance penalty
- Because an interrupt causes a change in the program counter, it incurs a branch penalty
- If the interrupt automatically stores CPU registers, that action requires extra cycles
- The interrupt requires extra cycles to acknowledge the interrupt and obtain the vector from the device
68performance penalty
- The interrupt handler will save and restore CPU registers that were not automatically saved by the interrupt
- The interrupt return instruction incurs a branch penalty as well as the time required to restore the automatically saved state
69performance penalty
- The time required for the hardware to respond to the interrupt and obtain the vector cannot be changed by the programmer
- Careful programming can keep the number of registers used by an interrupt handler small, which reduces the time spent saving and restoring them
- Coding the interrupt handler in assembly language rather than a high-level language can also help
70Interrupts in ARM
- Two types of interrupts: fast interrupt requests (FIQs) and interrupt requests (IRQs)
- An FIQ takes priority over an IRQ
- The interrupt table is kept in the bottom memory addresses, starting at location 0
- The entries in the table contain subroutine calls to the appropriate handler
71Interrupts in ARM
- responding to an interrupt
- saves the appropriate value of the PC to be used to return,
- copies the CPSR into an SPSR (saved program status register),
- forces bits in the CPSR to note the interrupt, and
- forces the PC to the appropriate interrupt vector.
72Interrupts in ARM
- leaving the interrupt handler
- restore the proper PC value,
- restore the CPSR from the SPSR, and
- clear interrupt disable flags.
73Interrupts in ARM
- worst-case latency to respond
- 2 cycles to synchronize external request,
- up to 20 cycles to complete current instruction,
- 3 cycles for data abort
- 2 cycles to enter interrupt handling state.
- in total, latency ranges from 4 clock cycles in the best case (2 + 2) to 27 clock cycles in the worst case (2 + 20 + 3 + 2)
74Interrupts in SHARC
- supports three prioritized, vectored, maskable interrupts, each of which calls an interrupt handler subroutine
75When processing an interrupt
- outputs the interrupt vector address
- pushes the current PC onto the PC stack
- may push the ASTAT and MODE1 registers onto the status stack
- sets the appropriate bit in the interrupt latch register
- changes the interrupt mask pointer to show the current interrupt nesting state
76return from an interrupt
- pops the return address off the PC stack and loads it into the PC,
- pops the status stack if appropriate, and
- clears the appropriate bits in the interrupt latch and mask registers.
77Interrupts in SHARC
- The interrupt vector table may be kept in either internal or external memory
- The vector table provides interrupt vectors for a number of actions, including
- - reset, the three external interrupts,
- - internal DMA channels, timers,
- - floating-point errors,
- - user software interrupts.
78Interrupts in SHARC
- For most instructions, the latency for an external interrupt is four cycles
- Some instructions require multiple cycles to finish and will delay interrupt handling
- Waiting for external memory may also delay handling