Title: Today: a look to the future
1. Today: a look to the future
- Future classes
- where you will learn ways in which real computers are more sophisticated than the one we looked at
- Future computer architectures in the coming decades
- RISC
- Fractional representations
- Fixed-point
- Floating-point
- Multi-cycle datapaths
- Pipelining
- Memory hierarchies (Caches)
- Speculation (Optimistically proactive)
- Next decade: Moore's law continues
- And beyond
2. Data movement instructions
- Finally, the types of operands allowed in data manipulation instructions are another way of characterizing instruction sets.
- So far, we've assumed that ALU operations can have only register and constant operands.
- Many real instruction sets allow memory-based operands as well.
- We'll use the book's example and illustrate how the following operation can be translated into some different assembly languages.
- X = (A + B)(C + D)
- Assume that A, B, C, D and X are really memory addresses.
3. Register-to-register RISC architectures
- Our programs so far assume a register-to-register, or load/store, architecture, which matches our datapath from last week nicely.
- Operands in data manipulation instructions must be registers.
- Other instructions are needed to move data between memory and the register file.
- With a register-to-register, three-address instruction set, we might translate X = (A + B)(C + D) into

    LD  R1, A        R1 ← M[A]       // Use direct addressing
    LD  R2, B        R2 ← M[B]
    ADD R3, R1, R2   R3 ← R1 + R2    // R3 = M[A] + M[B]
    LD  R1, C        R1 ← M[C]
    LD  R2, D        R2 ← M[D]
    ADD R1, R1, R2   R1 ← R1 + R2    // R1 = M[C] + M[D]
    MUL R1, R1, R3   R1 ← R1 * R3    // R1 has the result
    ST  X, R1        M[X] ← R1       // Store that into M[X]
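The load/store sequence on this slide can be traced with a small Python sketch. The memory contents (2, 3, 4, 5) and the helper names `ld`, `st`, `add` and `mul` are illustrative assumptions, not part of the course material; only the order of operations follows the slide.

```python
# Hypothetical memory contents; the keys A, B, C, D, X stand in for addresses.
memory = {"A": 2, "B": 3, "C": 4, "D": 5, "X": 0}
registers = {"R1": 0, "R2": 0, "R3": 0}

def ld(reg, addr):
    """LD reg, addr: copy M[addr] into a register."""
    registers[reg] = memory[addr]

def st(addr, reg):
    """ST addr, reg: copy a register into M[addr]."""
    memory[addr] = registers[reg]

def add(dst, src1, src2):
    """ADD dst, src1, src2: all three operands are registers."""
    registers[dst] = registers[src1] + registers[src2]

def mul(dst, src1, src2):
    """MUL dst, src1, src2: all three operands are registers."""
    registers[dst] = registers[src1] * registers[src2]

# Same instruction order as the slide.
ld("R1", "A")
ld("R2", "B")
add("R3", "R1", "R2")   # R3 = M[A] + M[B]
ld("R1", "C")
ld("R2", "D")
add("R1", "R1", "R2")   # R1 = M[C] + M[D]
mul("R1", "R1", "R3")   # R1 has the result
st("X", "R1")

print(memory["X"])      # (2 + 3) * (4 + 5) = 45
```

Only `ld` and `st` touch `memory`; the arithmetic helpers see registers only, which is the defining property of a load/store design.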
4. Memory-to-memory architectures
- In memory-to-memory architectures, all data manipulation instructions use memory addresses as operands.
- With a memory-to-memory, three-address instruction set, we might translate X = (A + B)(C + D) into simply

    ADD X, A, B    M[X] ← M[A] + M[B]
    ADD T, C, D    M[T] ← M[C] + M[D]    // T is temporary storage
    MUL X, X, T    M[X] ← M[X] * M[T]

- How about with a two-address instruction set?

    MOVE X, A    M[X] ← M[A]           // Copy M[A] to M[X] first
    ADD  X, B    M[X] ← M[X] + M[B]    // Add M[B]
    MOVE T, C    M[T] ← M[C]           // Copy M[C] to M[T]
    ADD  T, D    M[T] ← M[T] + M[D]    // Add M[D]
    MUL  X, T    M[X] ← M[X] * M[T]    // Multiply
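To see how much data memory traffic the two-address version generates, here is a minimal Python sketch that counts reads and writes. The `CountingMemory` class, the initial values and the helper names are assumptions made for illustration. Instruction fetches are deliberately not counted.

```python
class CountingMemory:
    """Memory model that counts every data read and write (hypothetical helper)."""

    def __init__(self, contents):
        self.cells = dict(contents)
        self.accesses = 0

    def read(self, addr):
        self.accesses += 1
        return self.cells[addr]

    def write(self, addr, value):
        self.accesses += 1
        self.cells[addr] = value


def run_two_address(mem):
    """Execute the two-address memory-to-memory sequence for X = (A + B)(C + D)."""
    def move(dst, src):            # M[dst] <- M[src]
        mem.write(dst, mem.read(src))

    def add(dst, src):             # M[dst] <- M[dst] + M[src]
        mem.write(dst, mem.read(dst) + mem.read(src))

    def mul(dst, src):             # M[dst] <- M[dst] * M[src]
        mem.write(dst, mem.read(dst) * mem.read(src))

    move("X", "A")
    add("X", "B")
    move("T", "C")
    add("T", "D")
    mul("X", "T")


mem = CountingMemory({"A": 2, "B": 3, "C": 4, "D": 5, "X": 0, "T": 0})
run_two_address(mem)
print(mem.cells["X"], mem.accesses)   # 45 13
```

Under these assumptions the five instructions make 13 data accesses (2 per MOVE, 3 per ADD or MUL), compared with the 5 data accesses (four loads, one store) of a load/store sequence.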
5. Register-to-memory architectures
- Finally, register-to-memory architectures let the data manipulation instructions access both registers and memory.
- With two-address instructions, we might do the following

    LD  R1, A     R1 ← M[A]         // Load M[A] into R1 first
    ADD R1, B     R1 ← R1 + M[B]    // Add M[B]
    LD  R2, C     R2 ← M[C]         // Load M[C] into R2
    ADD R2, D     R2 ← R2 + M[D]    // Add M[D]
    MUL R1, R2    R1 ← R1 * R2      // Multiply
    ST  X, R1     M[X] ← R1         // Store
6. Size and speed
- There are lots of tradeoffs in deciding how many and what kind of operands and addressing modes to support in a processor.
- These decisions can affect the size of machine language programs.
- Memory addresses are long compared to register file addresses, so instructions with memory-based operands are typically longer than those with register operands.
- Permitting more operands also leads to longer instructions.
- There is also an impact on the speed of the program.
- Memory accesses are much slower than register accesses.
- Longer programs require more memory accesses, just for loading the instructions!
- Most newer processors use register-to-register designs.
- Reading from registers is faster than reading from RAM.
- Using register operands also leads to shorter instructions.
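The size tradeoff can be made concrete with a rough Python estimate. The field widths below (8-bit opcode, 3-bit register number, 16-bit memory address) are assumptions chosen only for illustration; the slides do not specify an encoding. Each instruction is described by its operand kinds, `r` for a register and `m` for a memory address.

```python
# Assumed field widths in bits; not taken from the slides.
OPCODE_BITS = 8
REGISTER_BITS = 3
ADDRESS_BITS = 16

def instruction_bits(operands):
    """Width of one instruction given its operand kinds, e.g. "rm" or "rrr"."""
    width = {"r": REGISTER_BITS, "m": ADDRESS_BITS}
    return OPCODE_BITS + sum(width[kind] for kind in operands)

# Operand kinds of each instruction in the four translations of X = (A + B)(C + D).
programs = {
    "register-to-register, three-address":
        ["rm", "rm", "rrr", "rm", "rm", "rrr", "rrr", "mr"],
    "memory-to-memory, three-address": ["mmm"] * 3,
    "memory-to-memory, two-address": ["mm"] * 5,
    "register-to-memory, two-address": ["rm", "rm", "rm", "rm", "rr", "mr"],
}

sizes = {name: sum(instruction_bits(ops) for ops in program)
         for name, program in programs.items()}

for name, bits in sizes.items():
    print(f"{name}: {len(programs[name])} instructions, {bits} bits")
```

With these assumed widths, a register-only instruction is 17 bits while a three-address memory instruction is 56 bits. Total program size also depends on how many instructions each style needs, so the ranking can change with different field widths.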
7. Representing fractional numbers
- Fixed-point numbers
- Represent numbers using a fixed number of bits dedicated to the integer and fractional portions.
- 10001001.0010110 - 8 bits each
- .0111010010010101 - all 16 to the fractional portion
- Frequently used in business applications
- Evenly spaced gaps between representable numbers across the full range.
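A minimal Python sketch of the 8.8 split (8 integer bits, 8 fractional bits) follows. The function names and the sample values are assumptions for illustration, and only non-negative numbers are handled.

```python
FRACTION_BITS = 8            # bits to the right of the binary point
SCALE = 1 << FRACTION_BITS   # 256: one unit in the last place is 1/256

def to_fixed(value):
    """Encode a non-negative number as a 16-bit 8.8 fixed-point integer."""
    return round(value * SCALE)

def from_fixed(raw):
    """Decode an 8.8 fixed-point integer back to a Python float."""
    return raw / SCALE

raw = to_fixed(5.75)
print(format(raw, "016b"))   # 0000010111000000 -> 00000101.11000000
print(from_fixed(raw))       # 5.75

# Addition works directly on the raw integers.
total = to_fixed(1.5) + to_fixed(2.25)
print(from_fixed(total))     # 3.75

# The gap between neighbouring values is the same everywhere in the range.
print(from_fixed(raw + 1) - from_fixed(raw))   # 0.00390625 (= 1/256)
```

Because the scale factor is fixed, addition and subtraction are ordinary integer operations, and every representable value is an exact multiple of 1/256.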
8. Representing fractional numbers
- Floating-point numbers
- Somewhat similar to scientific notation
- E.g. 1.001 x 2^13, 1.1 x 2^-4
- Several standards; the most popular is IEEE 754
- A 32-bit float consists of
- 1 sign bit
- 8 exponent bits
- excess-127 format: treat the bits as unsigned and subtract 127 from them to find the actual exponent
- 11111111 is reserved to signify infinities and NaNs
- 00000000 is reserved to represent denormalized numbers (no leading 1 assumed, and the actual exponent is -126)
- 23 mantissa bits
- Represent the fractional component, e.g. the 01101 in 1.01101
- Assume a leading 1 unless the exponent bits are 0.
- Value = (-1)^s x 2^(e-127) x 1.m
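The field layout can be checked with Python's standard `struct` module. This is a sketch: the function names `decode_float32` and `float32_value` are hypothetical, and the reserved all-ones exponent (infinities and NaNs) is rejected rather than decoded.

```python
import struct

def decode_float32(value):
    """Split a number, stored as a 32-bit float, into (sign, exponent bits, mantissa bits)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    return sign, exponent, mantissa

def float32_value(sign, exponent, mantissa):
    """Rebuild the value from the three fields (normal and denormalized numbers only)."""
    if exponent == 0xFF:
        raise ValueError("all-ones exponent is reserved for infinities and NaNs")
    if exponent == 0:
        significand = mantissa / 2**23        # denormalized: no leading 1
        power = -126
    else:
        significand = 1 + mantissa / 2**23    # normal: implicit leading 1
        power = exponent - 127                # excess-127
    return (-1) ** sign * significand * 2.0 ** power

fields = decode_float32(-6.5)                 # -6.5 = -1.101 (binary) x 2^2
print(fields)                                 # (1, 129, 5242880)
print(float32_value(*fields))                 # -6.5
```

Here the stored exponent is 129, so the actual exponent is 129 - 127 = 2, and the mantissa bits 101 followed by twenty zeros give the significand 1.101 in binary.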
9. Representing fractional numbers
- Floating-point numbers
- IEEE 754
- A 64-bit double-precision float consists of
- 1 sign bit
- 11 exponent bits
- excess-1023 format.
- 11111111111 (all ones) is reserved to signify infinities and NaNs
- 00000000000 (all zeros) is reserved to represent denormalized numbers (no leading 1 assumed, and the actual exponent is -1022)
- 52 mantissa bits
- Represent the fractional component, e.g. the 01101 in 1.01101
- Assume a leading 1 unless the exponent bits are 0.
- Value = (-1)^s x 2^(e-1023) x 1.m
- Uneven gaps between representable numbers (closer together when very small, larger gaps with larger numbers)
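The uneven spacing can be observed by stepping to the next representable double. The sketch below adds 1 to the 64-bit pattern, which moves to the next larger value for positive finite inputs. The helper names `next_double` and `gap` are assumptions for illustration.

```python
import struct

def next_double(x):
    """Next representable double above x (valid for positive, finite x)."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return struct.unpack(">d", struct.pack(">Q", bits + 1))[0]

def gap(x):
    """Distance from x to the next larger representable double."""
    return next_double(x) - x

for x in (1.0, 1024.0, 2.0 ** 60):
    print(x, gap(x))

# Near 1.0 the gap is 2^-52; near 2^60 it is 256, so adding 1 has no effect there.
print(2.0 ** 60 + 1 == 2.0 ** 60)   # True
```

Each time the exponent grows by one, the 52 mantissa bits cover a range twice as wide, so the gap doubles.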
10. Problems with our datapath
- Other than the obvious: we need more registers and more bits in each register (and therefore in the datapath)
- The clock cycle time is constrained by the longest possible instruction execution time.
- Solution
- break an instruction execution into multiple cycles
- "Bucket brigade" pipelined datapath
- Microprogrammed datapath
Access PC 1 ns
Instruction Memory 4 ns
Register read 3 ns
MUX B 1 ns
ALU or Memory 4 ns
MUX D 1 ns
Register write 3 ns
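As a rough check on the delay table, the Python sketch below adds up the delays. It assumes that the slowest instruction passes through every listed step in series within one clock cycle, which is a simplification of the real datapath.

```python
# Step delays in nanoseconds, copied from the table.
delays_ns = {
    "Access PC": 1,
    "Instruction Memory": 4,
    "Register read": 3,
    "MUX B": 1,
    "ALU or Memory": 4,
    "MUX D": 1,
    "Register write": 3,
}

# Single-cycle design: the clock period must cover the whole chain of steps.
single_cycle_period_ns = sum(delays_ns.values())

# If the work is split across cycles, the period only has to cover the slowest step.
longest_step_ns = max(delays_ns.values())

print(single_cycle_period_ns)   # 17
print(longest_step_ns)          # 4
```

Under this assumption the single-cycle clock period is 17 ns, while no individual step needs more than 4 ns.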
11. A Microprogrammed Datapath
- The datapath we worked with for the past few weeks was just an example
- We will look at another datapath today
- To emphasize that alternate designs are possible
- To show an example where each instruction takes multiple cycles to finish
- To show a different way of generating control signals
- Material is not based on the book
- Used to be in the older version
- For the exam
- Basic understanding of the slides, and
- section 8-7 (of the 3rd edition)
- Follow the web link there if you are interested
12. Why multiple cycles?
- Wouldn't it be slower?
- Not necessarily, if each clock cycle can be made shorter
- Variable number of cycles for instructions (some 2, some 5)
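A back-of-the-envelope comparison in Python shows why a multi-cycle design need not be slower. The 17 ns single-cycle period is the sum of the delays in the table on slide 10, and the 4 ns multi-cycle period is the longest single step in that table. The instruction mix (6 two-cycle and 4 five-cycle instructions out of 10) is a purely hypothetical assumption.

```python
SINGLE_CYCLE_PERIOD_NS = 17   # sum of the step delays from slide 10
MULTI_CYCLE_PERIOD_NS = 4     # assumed: the longest single step from slide 10

# Hypothetical mix: cycles per instruction -> how many such instructions run.
instruction_mix = {2: 6, 5: 4}

instruction_count = sum(instruction_mix.values())
total_cycles = sum(cycles * count for cycles, count in instruction_mix.items())

single_cycle_time_ns = instruction_count * SINGLE_CYCLE_PERIOD_NS
multi_cycle_time_ns = total_cycles * MULTI_CYCLE_PERIOD_NS

print(single_cycle_time_ns)   # 170
print(multi_cycle_time_ns)    # 128
```

With this mix the multi-cycle design finishes sooner, even though a single five-cycle instruction (20 ns) takes longer than one 17 ns single-cycle instruction. A mix dominated by five-cycle instructions would reverse the result.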
13. New Datapath
- Let us use one memory module for both data and instructions
- Allow for multiple cycles for each instruction
14. [Figure: the new datapath, with blocks labeled Register File, MUX B, ALU, Memory (Data In, Address, Data Out), and MUX D]
15. How to generate control signals
- Consequence of this datapath
- Needs a cycle to fetch the instruction from memory
- Control word
- the set of control signals
- In our older datapath
- Control word was determined fully by the instruction
- Here
- It depends on the instruction and on which cycle within the instruction we are in
- Example
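The dependence on both the instruction and the cycle can be sketched in Python as a lookup keyed by the pair (opcode, cycle). The opcodes, cycle counts and control-word descriptions below are hypothetical placeholders, not the actual signals of the datapath discussed in class.

```python
# Hypothetical control words, described in words rather than as bit patterns.
CONTROL_WORDS = {
    ("ADD", 0): "fetch instruction from memory into IR",
    ("ADD", 1): "ALU adds register operands, result written to register file",
    ("LD", 0): "fetch instruction from memory into IR",
    ("LD", 1): "send operand address to memory",
    ("LD", 2): "write memory data out to register file",
}

def control_sequence(opcode):
    """List the control words issued for one instruction, cycle by cycle."""
    words = []
    cycle = 0
    while (opcode, cycle) in CONTROL_WORDS:
        words.append(CONTROL_WORDS[(opcode, cycle)])
        cycle += 1
    return words

for opcode in ("ADD", "LD"):
    for cycle, word in enumerate(control_sequence(opcode)):
        print(opcode, cycle, word)
```

In this sketch every instruction spends cycle 0 on the fetch, because the single memory module is busy delivering the instruction. The later cycles differ per instruction, which is why the opcode alone no longer determines the control word.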
16. Generating control: sequential circuit
[Figure: block diagram with IR (Instruction Register), Cycle Counter, Control Unit, and Control word]
17. Generating control: microprogram memory
[Figure: block diagram with IR (Instruction Register), MicroProgram Counter, Microprogram Memory, Next MicroInstruction Address, and Control word]
19. Pipelined datapath
- Simplified scenario
- 4-step assembly line
- Instruction Fetch
- Operand Fetch
- Execution of operation
- Writeback
- Although the total time for each instruction to finish is the same (or slightly larger)
- The unit as a whole processes more instructions per unit time
- Just as in the assembly of a car
- More on this in CS 232 and beyond
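The throughput gain of the four-step assembly line can be sketched in Python. The sketch assumes an idealized pipeline in which every step takes exactly one cycle and no instruction ever has to wait for another. The function names are hypothetical.

```python
STAGES = ["Instruction Fetch", "Operand Fetch", "Execute", "Writeback"]

def unpipelined_cycles(n):
    """Each instruction finishes all four steps before the next one starts."""
    return len(STAGES) * n

def pipelined_cycles(n):
    """A new instruction enters every cycle; the last one needs 4 cycles to drain."""
    return len(STAGES) + n - 1 if n > 0 else 0

def schedule(n):
    """Map (instruction index, cycle) to the stage that instruction occupies."""
    return {(i, cycle): STAGES[cycle - i]
            for i in range(n)
            for cycle in range(i, i + len(STAGES))}

print(unpipelined_cycles(100))   # 400
print(pipelined_cycles(100))     # 103
print(schedule(3)[(2, 5)])       # Writeback: instruction 2 finishes in cycle 5
```

Each instruction still spends four cycles in the pipeline, but for long instruction streams the unit completes close to one instruction per cycle.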
20. Other ideas
- Memory hierarchies (Caches)
- Speculation (Optimistically proactive)
- Next decade: Moore's law continues
- And beyond