Title: IO Subsystems
1I/O Subsystems
3I/O Subsystems
- Data registers hold values that are treated as data by the device, such as the data read or written from/to a disk
- Status registers provide information about the device's operation, such as whether the current transaction has completed
4CPU I/O Structures
- Isolated I/O
- Separate I/O instructions in instruction set
- Memory Mapped I/O
- Memory reference instructions used for I/O
- NIOS
- Memory mapped I/O
5Memory mapped I/O in C
- Peek and Poke functions
- traditional functions to read and write arbitrary
memory locations
6Basic Structures
- int peek(char *location)
- return *location; /* return contents of location */
- void poke(char *location, char newval)
- (*location) = newval; /* output newval to location */
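A minimal sketch of peek and poke that can be tried on a desktop host. It is an assumption of this sketch, not something the slides state, that the device registers are simulated by a small array `sim_mem` and that a location is an offset into it. On a real memory-mapped target the functions would instead dereference a `volatile` pointer to the register address.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_MEM_SIZE 0x2000u   /* hypothetical size of the simulated address space */

/* volatile: every access really happens, as device registers require. */
static volatile uint8_t sim_mem[SIM_MEM_SIZE];

/* Return the contents of a location. */
int peek(uint16_t location)
{
    assert(location < SIM_MEM_SIZE);
    return sim_mem[location];
}

/* Write newval to a location. */
void poke(uint16_t location, uint8_t newval)
{
    assert(location < SIM_MEM_SIZE);
    sim_mem[location] = newval;
}
```

The `volatile` qualifier matters on real hardware: without it the compiler may cache a status register in a CPU register and a busy-wait loop would never see the device change state.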
7BusyWait I/O - Input
- #define IN_STATUS 0x1001
- #define IN_DATA 0x1000
- while (peek(IN_STATUS) == 0); /* wait loop: status is 1 when a character is ready */
- achar = (char) peek(IN_DATA); /* input character */
8BusyWait I/O - Output
- #define OUT_STATUS 0x1101
- #define OUT_DATA 0x1100
- while (peek(OUT_STATUS) != 0); /* wait until the device is idle */
- poke(OUT_STATUS, 1); /* set device busy */
- poke(OUT_DATA, achar); /* output character */
9String Output Example
- Sequence of characters stored in a standard C string
- terminated by a null (0) character
10Definitions
11String Output
12- The outer while loop sends the characters one at a time
- The inner while loop checks the device status
- it implements the busy-wait function by repeatedly checking the device status until the status changes to 0
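The two loops described above can be sketched as follows. The device here is simulated, which is an assumption of this sketch: `poke` to `OUT_DATA` appends to a hypothetical `device_log` and marks the device busy, and the status returns to 0 after a few polls (`busy_polls` is likewise hypothetical).

```c
#include <assert.h>
#include <stddef.h>

#define OUT_DATA   0x1100
#define OUT_STATUS 0x1101

static char   device_log[64];  /* what the simulated device has received */
static size_t device_len;
static int    out_status;      /* simulated status register: 0 = idle, 1 = busy */
static int    busy_polls;      /* polls remaining until the device goes idle */

static int peek(int location)
{
    if (location == OUT_STATUS) {
        if (out_status && --busy_polls <= 0)
            out_status = 0;    /* simulated device finishes its transaction */
        return out_status;
    }
    return 0;
}

static void poke(int location, char newval)
{
    if (location == OUT_DATA) {
        if (device_len < sizeof device_log - 1)
            device_log[device_len++] = newval;
        out_status = 1;        /* device is busy with this character */
        busy_polls = 3;
    }
}

/* Send a null-terminated string one character at a time. */
void output_string(const char *s)
{
    assert(s != NULL);
    while (*s != '\0') {                 /* outer loop: one character per pass */
        poke(OUT_DATA, *s);
        while (peek(OUT_STATUS) != 0)    /* inner loop: busy-wait until status is 0 */
            ;
        s++;
    }
}
```

The CPU spends every poll inside the inner loop doing no useful work, which is the cost of busy-wait I/O.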
13Copying Characters from Input to Output Using
Busy-Wait I/O
- Repeatedly read a character from the input
device and write it to the output device
14Copying Characters from Input to Output Using
Busy-Wait I/O
15Copying Characters from Input to Output Using
Busy-Wait I/O
16Copying Characters from Input to Output Using
Busy-Wait I/O
- Advantage
- simplicity of implementation, in both hardware and software
- Disadvantage
- CPU can do nothing else while in the wait loop
- OK for small systems
17Solution - Interrupts
- Eliminates dead time for I/O transfers
- Allows CPU to respond to asynchronous external
events
18Basic Interrupt Structure
- Device issues interrupt request
- CPU saves state and transfers control to the interrupt handler
- CPU issues interrupt acknowledge
20Copying Characters from Input to Output with
Basic Interrupts
- Write C functions as interrupt handlers
- Define global variables
- achar: used by the input handler to pass a character to the foreground program
- gotchar: Boolean variable that signals a new character has been received
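A minimal sketch of how `achar` and `gotchar` might connect the input handler to the foreground program. The register variables `in_data` and `out_data` and the function `foreground_step` are hypothetical stand-ins introduced here so the idea can be exercised without hardware. In a real system the handler would be invoked by the interrupt mechanism rather than called directly.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated device registers (hypothetical stand-ins for IN_DATA / OUT_DATA). */
static volatile char in_data;
static volatile char out_data;

/* Globals shared between the handler and the foreground program. */
static volatile char achar;
static volatile bool gotchar = false;

/* Input interrupt handler: grab the character and flag its arrival. */
void input_handler(void)
{
    achar = in_data;
    gotchar = true;
}

/* One pass of the foreground loop; returns true if a character was copied. */
bool foreground_step(void)
{
    if (!gotchar)
        return false;      /* nothing new: the foreground still has to keep polling */
    out_data = achar;      /* write the character to the output device */
    gotchar = false;
    return true;
}
```

The shared variables are declared `volatile` because they change outside the foreground program's normal control flow.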
21Input Handler
22Main
23Output Handler
24Circular Queue
25Copying Characters from Input to Output with
Basic Interrupts
- Use of interrupts has made the main program somewhat simpler
- But program design still does not let the foreground program do useful work
- Hence, need a more sophisticated program design to let the foreground program work completely independently of input and output
26Copying Characters from Input to Output with
Interrupts and Buffers
- Need program to perform reads and writes independently
- Solution: a buffer to hold inputs until they are written
27Copying Characters from Input to Output with
Interrupts and Buffers
- Read and write routines communicate through the following global variables
- Character string io_buf holds a queue of characters that have been read but not yet written
- Integer error will be set to 0 whenever io_buf overflows
28For io_buf - a Circular Queue
- The queue io_buf acts as a wraparound buffer
- characters are added to the tail when an input is received
- characters are taken from the head when we are ready for output
29Circular Queue
- Situation at the start of the program execution
- tail points to the first available location
- head points to the next character to be output
- if head and tail are equal the queue is empty
30Circular Queue
- When the first character is entered, the tail is
incremented after the character is added to the
queue
31Circular Queue
32Circular Queue
- When the buffer is full, leave one location in the buffer empty
- if another character were added and the tail pointer updated (wrapping it around to the head of the buffer), we could not distinguish a full buffer from an empty one
33Circular Queue
34Circular Queue
- What happens when the output goes past the end of io_buf?
35Circular Queue
36Service Routines
37Copying Characters from Input to Output with
Interrupts and Buffers
- Two interrupt handler routines defined in C
- input-handler for the input device
- output-handler for the output device
38Copying Characters from Input to Output with
Interrupts and Buffers
- The complication is in starting the output device
- If io_buf has characters waiting, the output driver can start a new output transaction by itself
- If there are no characters waiting, an outside agent must start a new output action whenever the new character arrives
39Copying Characters from Input to Output with
Interrupts and Buffers
- Solution: have the input handler check whether there is only one character in the buffer and, if so, start a new output transaction
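A sketch of the two handlers working through the buffer, under stated assumptions: the device registers are simulated by the hypothetical variables `in_data`, `out_data`, and `out_busy`, and the handlers are called directly instead of by hardware. The extra `!out_busy` check in the input handler is an addition of this sketch. It avoids starting a second transaction while the device is still sending the previous character.

```c
#include <assert.h>
#include <stdbool.h>

#define BUF_SIZE 8                       /* hypothetical buffer size */

static char io_buf[BUF_SIZE];
static int  head, tail;                  /* head: next to output; tail: next free slot */
bool overflowed;                         /* hypothetical overflow flag */

static volatile char in_data;            /* simulated input data register */
static volatile char out_data;           /* simulated output data register */
static volatile bool out_busy;           /* simulated output status */

static int nchars(void) { return (tail - head + BUF_SIZE) % BUF_SIZE; }

/* Take the character at the head and hand it to the output device. */
static void start_output(void)
{
    out_data = io_buf[head];
    head = (head + 1) % BUF_SIZE;
    out_busy = true;
}

/* Input interrupt: queue the character; start the output device if it is idle. */
void input_handler(void)
{
    if (nchars() == BUF_SIZE - 1) {      /* full: one slot is always left empty */
        overflowed = true;
        return;
    }
    io_buf[tail] = in_data;
    tail = (tail + 1) % BUF_SIZE;
    if (nchars() == 1 && !out_busy)      /* only one character waiting: nobody else will start it */
        start_output();
}

/* Output-complete interrupt: send the next queued character, if any. */
void output_handler(void)
{
    out_busy = false;
    if (nchars() > 0)
        start_output();
}
```

Once output is running, each output-complete interrupt drains one more character, so the input handler only needs to start the device when the queue was empty.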
40Input and Output Handlers
41UML Sequence Diagram
42- Foreground program does not need to do anything; everything is taken care of by the interrupt handlers
- Simulation shows that the foreground program is not executing continuously, but continues to run in a regular state independent of the number of characters waiting in the queue
43Debugging Interrupt Code
44Y = Ax + b
45Y = Ax + b
- Assume
- the foreground code is performing the matrix multiplication operation
- the interrupt handlers perform I/O while the matrix computation is performed
- but with one small problem
- read-handler has a bug that causes it to change the value of j
46- Any CPU register that is written by the interrupt handler must be saved before it is modified and restored before the handler exits
- Any type of bug such as forgetting to save the register or to properly restore it can cause that register to mysteriously change value in the foreground program
47- What happens to the foreground program when j changes value during an interrupt depends on when the interrupt handler executes
- Because the value of j is reset at each iteration of the outer loop, the bug will affect only one entry of the result y
- But clearly the entry that changes will depend on when the interrupt occurs
48- Furthermore, the change observed in y depends on
not only what new value is assigned to j (which
may depend on the data handled by the interrupt
code), but also when in the inner loop the
interrupt occurs
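The effect can be reproduced deterministically with a small simulation. Everything about the simulation is an assumption of this sketch: `N`, the `Clobber` structure, and the idea of injecting the handler's side effect at a chosen `(i, j)` iteration stand in for a real interrupt arriving at an arbitrary time.

```c
#include <assert.h>

#define N 3                  /* hypothetical problem size */

/* Hypothetical stand-in for the buggy handler: at one chosen iteration it
   overwrites j, as a register that was not saved and restored would be. */
typedef struct {
    int at_i, at_j;          /* iteration at which the "interrupt" fires; at_i = -1 means never */
    int new_j;               /* value the buggy handler leaves in j */
} Clobber;

/* Foreground computation y = Ax + b with an optional injected clobber of j. */
void matvec(int A[N][N], const int x[N], const int b[N], int y[N], Clobber bug)
{
    for (int i = 0; i < N; i++) {
        y[i] = b[i];
        for (int j = 0; j < N; j++) {           /* j is reset on each outer iteration */
            if (i == bug.at_i && j == bug.at_j) {
                j = bug.new_j;                  /* the handler's side effect */
                bug.at_i = -1;                  /* the interrupt fires only once */
                if (j < 0 || j >= N)
                    break;                      /* keep the simulation in bounds */
            }
            y[i] += A[i][j] * x[j];
        }
    }
}
```

With A having rows (1,2,3), (4,5,6), (7,8,9), x all ones, and b zero, the correct y[1] is 15. A clobber at (i=1, j=0) that sets j to 2 skips two terms and gives 6, while a clobber at (i=1, j=2) that sets j to 0 repeats two terms and gives 24. Only y[1] is wrong in both cases, but the wrong value depends on both the timing and the value written.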
49Prioritized and Vectored Interrupts
- Early CPUs (and many still in use) had only an interrupt request and interrupt acknowledge
- Multiple I/O devices or other external events required additional external hardware and instructions in the handler to handle multiple interrupts
50Prioritized and Vectored Interrupts
- Priority: implemented via internal or external hardware
- Vector: starting address of interrupt handler
51Vectored Interrupts
52UML Sequence Diagram Description
53NIOS
- 64 prioritized interrupts
- 6-bit interrupt priority number must be supplied by the device
- One interrupt request line
- Service routine accessed via device number
54Interrupt Overhead
- Once a device requests an interrupt
- some steps are performed by the CPU hardware
- some by the device
- others by software
55Basic Procedure
- CPU
- checks for pending interrupts at the beginning of an instruction cycle
- answers the highest-priority interrupt that has a higher priority than that given in the interrupt priority register
- Device
- receives the acknowledgement and sends the CPU its interrupt vector
56Basic Procedure - continued
- CPU
- looks up the device handler address in the interrupt vector table using the vector as an index
- a subroutine-like mechanism is used to save the current value of the PC and possibly other internal CPU state, such as general-purpose registers
57Basic Procedure - continued
- Software
- device driver may save additional CPU state
- performs the required operations on the device
- restores any saved state and executes the interrupt return instruction
- CPU
- interrupt return instruction restores the PC and other automatically saved states to return execution to the code that was interrupted
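The procedure above can be modelled in software as a sketch. Several things here are assumptions rather than facts from the slides: a lower number means a higher priority, the vector doubles as the priority number, and `pending`, `priority_reg`, `handler_a`, and `handler_b` are hypothetical names. Real CPUs perform the selection and state saving in hardware.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_VECTORS 64                        /* hypothetical table size */

typedef void (*handler_t)(void);

static handler_t     vector_table[NUM_VECTORS];   /* handler address, indexed by vector */
static unsigned char pending[NUM_VECTORS];        /* 1 = device is requesting service */
static int           priority_reg = NUM_VECTORS;  /* only requests numerically below this are answered */

int last_serviced = -1;                       /* set by the example handlers */
void handler_a(void) { last_serviced = 5; }   /* hypothetical device handlers */
void handler_b(void) { last_serviced = 9; }

/* Highest-priority pending request that beats the priority register, or -1. */
int select_interrupt(void)
{
    for (int v = 0; v < NUM_VECTORS && v < priority_reg; v++)
        if (pending[v])
            return v;
    return -1;
}

/* One check at the start of an instruction cycle. Returns the vector serviced, or -1. */
int service_pending(void)
{
    int v = select_interrupt();
    if (v < 0 || vector_table[v] == NULL)
        return -1;
    pending[v] = 0;                 /* acknowledge: the device withdraws its request */
    int saved = priority_reg;       /* part of the automatically saved state */
    priority_reg = v;               /* mask equal and lower priorities while handling */
    vector_table[v]();              /* look up the handler in the table and call it */
    priority_reg = saved;           /* interrupt return restores the saved state */
    return v;
}
```

Indexing a table by vector lets each device reach its own handler without a software polling chain.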
58Interrupt Performance Penalty
- Interrupt itself has overhead similar to a subroutine call
- because an interrupt causes a change in the program counter, it incurs a branch penalty
- if the interrupt automatically stores CPU registers, that action requires extra cycles, even if the state is not modified by the interrupt handler
59Interrupt Performance Penalty
- In addition to the branch delay penalty
- interrupt requires extra cycles to acknowledge the interrupt and obtain the vector from the device
- Interrupt handler will, in general, save and restore CPU registers that were not automatically saved by the interrupt
- Interrupt return instruction incurs a branch penalty as well as the time required to restore the automatically saved state
60Processor Characteristics
61Supervisor Mode
- Complex systems are often implemented as several programs that communicate with each other
- Even with an operating system, it may be desirable to provide hardware checks to ensure that the programs do not interfere with each other
62Supervisor Mode
- Often useful to have a supervisor mode provided by the CPU
- Normal programs run in user mode
- Supervisor mode has privileges that user mode does not
- e.g. control of the memory management unit is typically reserved for supervisor mode to avoid the obvious problems
- NIOS has no supervisor mode
63Exceptions
- An exception is an internally detected error
- Simple example is division by zero
- One possibility: check every divisor in software before division to be sure it is not zero
- CPU can more efficiently check the divisor's value during execution with an exception
- The exception mechanism provides a way for the program to react to such unexpected events
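The software alternative mentioned above might look like the following sketch. The function name `checked_div` is hypothetical. The additional `INT_MIN / -1` check is not from the slides; it is included because that division also overflows in C.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Software check: test the divisor before every division.
   Returns false, without dividing, when the division cannot be performed. */
bool checked_div(int num, int den, int *quotient)
{
    assert(quotient != NULL);
    if (den == 0)
        return false;                    /* the case a divide-by-zero exception would catch */
    if (num == INT_MIN && den == -1)
        return false;                    /* result would overflow int */
    *quotient = num / den;
    return true;
}
```

Every call pays for the comparison even though a zero divisor is rare, which is why a hardware exception is the more efficient way to detect it.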
64Exceptions
- Exceptions are generally implemented as a variation of an interrupt
- however, exceptions are generated internally
- Exceptions in general require both prioritization and vectoring
- A single operation may generate more than one exception, for example an illegal operand and an illegal memory access
65Exceptions
- Priority of exceptions is usually fixed by the CPU architecture
- Vectoring provides a way for the user to specify the handler for the exception condition
- The vector number for an exception is usually predefined by the architecture
66NIOS
- Provides two exceptions
- Register file window underflow
- Register file window overflow
67Traps
- A trap, also known as a software interrupt, is an instruction that explicitly generates an exception condition
- most common use of a trap is to enter supervisor mode
- NIOS
- provides a trap instruction
68Co-processors
- CPU architects
- often want to provide flexibility in what features are implemented in the CPU
- or the features cannot be fit on the CPU chip
- Co-processors attached to the CPU can provide such flexibility at the instruction set level
- e.g. Intel 8086 CPU with the 8087 co-processor
69Co-processors
- To support co-processors
- certain opcodes must be reserved in the instruction set for co-processor operations
- Co-processor must be tightly coupled to the CPU
- when the CPU receives a co-processor instruction, the CPU must activate the co-processor and pass it the relevant instruction
- co-processor instructions can load and store co-processor registers or can perform internal operations
70Co-processors
- CPU may receive co-processor instructions even when there is no co-processor attached
- illegal instruction traps are used to handle these situations
- the function is executed in software on the main CPU
71Memory Systems
- Caches
- Memory Management Units
72Cache
- A fast and small memory that holds copies of
some of the content of main memory
73Cache
74Cache
- Cache hit: requested location is in the cache
- Cache miss: requested location is not in the cache
75Cache Performance
- h = hit rate
- T_cache = cache access time
- T_main = main memory access time
- T_av = h T_cache + (1 - h) T_main
- improves system performance
- can cause problems in embedded systems involving
hard real time control
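The average access time formula is easy to evaluate in code. The function name and the numbers in the note below are hypothetical, chosen only to illustrate the arithmetic.

```c
#include <assert.h>

/* Average access time: T_av = h * T_cache + (1 - h) * T_main.
   h is the hit rate in [0, 1]; both times must use the same unit. */
double average_access_time(double h, double t_cache, double t_main)
{
    assert(h >= 0.0 && h <= 1.0);
    return h * t_cache + (1.0 - h) * t_main;
}
```

For example, with an assumed hit rate of 0.9, a 1-cycle cache, and a 10-cycle main memory, the average is 0.9 * 1 + 0.1 * 10 = 1.9 cycles. The average hides the spread between 1 and 10 cycles per access, which is the source of the hard real-time concern.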
76Cache Types
- Direct Mapped
- Set Associative caches
- Set associative yields better average performance (with additional complexity), but the penalty for a miss is more severe, which reduces predictability
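As an illustration of the simpler of the two organisations, here is a sketch of a direct-mapped lookup. The block size, number of lines, and all names are assumptions of this sketch. It tracks only tags, not data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define OFFSET_BITS 4                    /* hypothetical: 16-byte blocks */
#define INDEX_BITS  6                    /* hypothetical: 64 lines */
#define NUM_LINES   (1u << INDEX_BITS)

typedef struct {
    bool     valid;
    uint32_t tag;
} Line;

static Line cache[NUM_LINES];

/* Split an address: low bits are the byte offset, middle bits pick the line, the rest is the tag. */
static uint32_t index_of(uint32_t addr) { return (addr >> OFFSET_BITS) & (NUM_LINES - 1); }
static uint32_t tag_of(uint32_t addr)   { return addr >> (OFFSET_BITS + INDEX_BITS); }

/* Returns true on a hit. On a miss, the block replaces whatever was in its one possible line. */
bool cache_access(uint32_t addr)
{
    Line *line = &cache[index_of(addr)];
    if (line->valid && line->tag == tag_of(addr))
        return true;
    line->valid = true;
    line->tag = tag_of(addr);
    return false;
}
```

Two addresses with the same index but different tags evict each other on every access. A set-associative cache reduces these conflict misses by allowing several lines per index.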
77Memory Management Unit
- Translates addresses between the CPU and physical memory
- virtual addresses, virtual memory
- requires a disk or other secondary storage device
- To date have not been common in embedded systems
78Memory Management Units
79CPU Performance
- Pipelining
- ARM and SHARC: three stages
- Fetch: the instruction is fetched from memory
- Decode: the instruction's opcode and operands are decoded to determine what function to perform
- Execute: the decoded instruction is executed
80CPU Performance
- NIOS - four stages
- Instruction Fetch
- Instruction Decode/Operand Fetch
- Execute
- Write-back
81CPU Performance
82CPU Performance
83Data Stall
84Branches - Control Stall or Branch Penalty
85NIOS
- Delayed branch
- some number of instructions directly after the branch are always executed, whether or not the branch is taken
- maintains full pipeline; some instructions may have to be no-ops
86Superscalar Processors
87Data Dependency
88Superscalar Processors
- Improves throughput but complicates performance estimation
- instructions scheduled at execution time
- Simulator needed to calculate performance
89Power Consumption
- Energy vs power
- Power -> heat generation
- Energy -> battery life
- Power generally used for both unless needed for
clarification
90Power Consumption
- Voltage drops
- the power consumption of a CMOS circuit is proportional to the square of the power supply voltage (V^2)
- Toggling
- a CMOS circuit uses most of its power when it is changing its output value
91Power Consumption
- Leakage
- even when a CMOS circuit is not active, some charge leaks out of the circuit's nodes through the substrate
92Power Management
- Static Power Management
- invoked by user
- Dynamic Power Management
- automatic control by the CPU
93Power Saving Strategies in CMOS
- Reduce power supply level
- e.g. 5.0 V -> 3.3 V: (5.0/3.3)^2 = 2.30 factor reduction, or 56% reduction
- Lower clock frequency
- lowers power but not energy consumption
- Disable certain internal function units (turn off clock)
- Totally disconnect power from some internal units
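The supply-voltage arithmetic can be checked with a short sketch. The function names are hypothetical, and the model assumes only the V^2 proportionality stated earlier, with everything else held constant.

```c
#include <assert.h>

/* CMOS power is proportional to V^2, so the reduction factor is (v_old / v_new)^2. */
double power_reduction_factor(double v_old, double v_new)
{
    assert(v_new > 0.0);
    double ratio = v_old / v_new;
    return ratio * ratio;
}

/* Fraction of the original power that is saved, e.g. about 0.56 for 5.0 V -> 3.3 V. */
double power_saving_fraction(double v_old, double v_new)
{
    return 1.0 - 1.0 / power_reduction_factor(v_old, v_new);
}
```

For 5.0 V to 3.3 V the factor is (5.0/3.3)^2, roughly 2.30, so the circuit uses about 44% of its original power, a saving of about 56%.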
94Power Down Mode
- Provides the opportunity to greatly reduce power consumption because it will typically be entered for a substantial period of time
- going into and especially out of a power-down mode is not free: it costs both time and energy
- pipelined processors require complex control that must be properly initialized to avoid corrupting data in the pipeline
95Power State Machine
96Power-Saving Modes of the Strong ARM SA-1100
97Power-Saving Modes of the Strong ARM SA-1100
- Run mode is normal operation and has the highest power consumption
- Idle mode saves power by stopping the CPU clock
- system unit modules (real-time clock, operating system timer, interrupt control, general-purpose I/O, and power manager) all remain operational
- idle mode is entered by executing a three-instruction sequence
- CPU returns to run mode upon receiving an interrupt from one of the internal system units or from a peripheral, or by resetting the CPU
98Power-Saving Modes of the Strong ARM SA-1100
- Sleep mode shuts off most of the chip's activity
- entering sleep mode causes the system to shut down on-chip activity, reset the CPU, and negate the PWR_EN pin to tell the external electronics that the chip's power supply should be driven to 0 volts
- a separate I/O power supply remains on and supplies power to the power manager so that the CPU can be awakened from sleep mode; the low-speed clock keeps the power manager running at low speeds sufficient to manage sleep mode
- sleep mode is entered by forcing the sleep bit in the power manager control register; it can also be entered by a power supply fault
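The three modes can be sketched as a small state machine. This models only the transitions described in the bullets above. The event names are hypothetical, `EV_WAKEUP` stands for whatever event wakes the chip (the slides do not name it), and modelling sleep entry from run mode only is an assumption of this sketch.

```c
#include <assert.h>

typedef enum { MODE_RUN, MODE_IDLE, MODE_SLEEP } PowerMode;

typedef enum {
    EV_IDLE_SEQUENCE,   /* the three-instruction idle entry sequence */
    EV_INTERRUPT,       /* interrupt from a system unit or a peripheral */
    EV_RESET,           /* CPU reset */
    EV_SLEEP_BIT,       /* sleep bit forced in the power manager control register */
    EV_POWER_FAULT,     /* power supply fault */
    EV_WAKEUP           /* hypothetical: event that wakes the chip from sleep */
} PowerEvent;

/* Return the mode that follows `mode` when `ev` occurs. */
PowerMode next_mode(PowerMode mode, PowerEvent ev)
{
    switch (mode) {
    case MODE_RUN:
        if (ev == EV_IDLE_SEQUENCE)
            return MODE_IDLE;
        if (ev == EV_SLEEP_BIT || ev == EV_POWER_FAULT)
            return MODE_SLEEP;
        break;
    case MODE_IDLE:
        if (ev == EV_INTERRUPT || ev == EV_RESET)
            return MODE_RUN;
        break;
    case MODE_SLEEP:
        if (ev == EV_WAKEUP)
            return MODE_RUN;
        break;
    }
    return mode;        /* any other event leaves the mode unchanged */
}
```

A power manager built on such a table would also need the time and energy cost of each transition, which this sketch leaves out.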
99Power-Saving Modes of the Strong ARM SA-1100