Title: Lecture 1: Flynn
1 Lecture 1: Flynn's Taxonomy
2 The Global View of Computer Architecture
- Computer Architecture: instruction set design, organization, hardware/software boundary
- Surrounding areas: Applications, Parallelism, History, Technology, Programming Languages, Operating Systems, Compilers, Interface Design (ISA), Measurement and Evaluation
3 The Task of a Computer Designer
- Determine what attributes are important for a new machine.
- Design a machine to maximize performance while staying within cost constraints.
4 Flynn's Taxonomy
- Michael Flynn (from Stanford)
- Devised a classification of computer systems that became known as Flynn's Taxonomy
5 Flynn's Taxonomy
- SISD: Single Instruction, Single Data systems
[Diagram: one SISD unit fed by a single instruction stream (SI) and a single data stream (SD)]
6 Flynn's Taxonomy
- SIMD: Single Instruction, Multiple Data systems (array processors)
[Diagram: a single instruction stream (SI) driving several units, each with its own data stream (SD)]
7 Flynn's Taxonomy
- MIMD: Multiple Instructions, Multiple Data systems (multiprocessors)
[Diagram: several units, each with its own instruction stream (SI) and its own data stream (SD)]
8 Flynn's Taxonomy
- MISD: Multiple Instructions, Single Data systems
- Some people say pipelining belongs here, but this is debatable.
[Diagram: several instruction streams (SI) operating on a single data stream (SD)]
9 Abbreviations: SISD One-Address Machine
- IP: Instruction Pointer
- MAR: Memory Address Register
- MDR: Memory Data Register
- A: Accumulator
- ALU: Arithmetic Logic Unit
- IR: Instruction Register
- OP: Opcode
- ADDR: Address
[Diagram: one-address machine datapath connecting IP, MAR, MEMORY, MDR, IR (OP and ADDR fields), DECODER, A, and ALU]
10 One address format
Instruction format: OP | ADDRESS
- LOAD X
  - MAR ← IP
  - MDR ← M[MAR], IP ← IP + 1
  - IR ← MDR
  - DECODER ← IR.OP
  - MAR ← IR.ADDR
  - MDR ← M[MAR]
  - A ← MDR
[Diagram: one-address machine datapath]
11 One address format
Instruction format: OP | ADDRESS
- ADD X
  - MAR ← IP
  - MDR ← M[MAR], IP ← IP + 1
  - IR ← MDR
  - DECODER ← IR.OP
  - MAR ← IR.ADDR
  - MDR ← M[MAR]
  - A ← A + MDR
[Diagram: one-address machine datapath]
12 One address format
Instruction format: OP | ADDRESS
- STORE X
  - MAR ← IP
  - MDR ← M[MAR], IP ← IP + 1
  - IR ← MDR
  - DECODER ← IR.OP
  - MAR ← IR.ADDR
  - MDR ← A
  - M[MAR] ← MDR
[Diagram: one-address machine datapath]
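The LOAD X, ADD X, and STORE X register transfers above can be traced with a small simulator. This is only a sketch under stated assumptions: memory is a Python list, an instruction word is stored as an `(opcode, address)` tuple, and the function name `run_one_address` is hypothetical. None of these details come from the slides.

```python
def run_one_address(memory, n_instructions):
    """Execute n_instructions starting at address 0; return the accumulator."""
    ip = 0  # instruction pointer
    a = 0   # accumulator
    for _ in range(n_instructions):
        mar = ip               # MAR <- IP
        mdr = memory[mar]      # MDR <- M[MAR]
        ip += 1                # IP <- IP + 1
        ir = mdr               # IR <- MDR
        op, addr = ir          # DECODER <- IR.OP (split opcode and address fields)
        mar = addr             # MAR <- IR.ADDR
        if op == "LOAD":
            mdr = memory[mar]  # MDR <- M[MAR]
            a = mdr            # A <- MDR
        elif op == "ADD":
            mdr = memory[mar]  # MDR <- M[MAR]
            a = a + mdr        # A <- A + MDR
        elif op == "STORE":
            mdr = a            # MDR <- A
            memory[mar] = mdr  # M[MAR] <- MDR
        else:
            raise ValueError(f"unknown opcode: {op}")
    return a


# Assumed layout: instructions at addresses 0-2, data at addresses 3-5.
memory = [("LOAD", 3), ("ADD", 4), ("STORE", 5), 7, 35, 0]
accumulator = run_one_address(memory, 3)
```

Each line of the loop body corresponds to one register transfer, so the three instructions differ only in their last two steps.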
13 SISD Stack Machine
- First stack machine: the B5000
[Diagram: stack machine datapath with IP, MAR, MEMORY, MDR, IR (OP and ADDR fields), DECODER, ALU, and a four-entry stack (entries 1 to 4)]
14 PUSH
- PUSH: ST4 ← ST3 ‖ ST3 ← ST2 ‖ ST2 ← ST1 ‖ ST1 ← MDR (the old value of ST4 is lost)
- ‖ means in parallel
- LOAD X
  - MAR ← IP
  - MDR ← M[MAR], IP ← IP + 1
  - IR ← MDR
  - DECODER ← IR.OP
  - MAR ← IR.ADDR
  - MDR ← M[MAR]
  - PUSH
[Diagram: stack machine datapath; stack entries 1 to 4 are the registers ST1 to ST4]
15 POP
- POP: MDR ← ST1 ‖ ST1 ← ST2 ‖ ST2 ← ST3 ‖ ST3 ← ST4 ‖ ST4 ← 0
- ‖ means in parallel
- STORE X
  - MAR ← IP
  - MDR ← M[MAR], IP ← IP + 1
  - IR ← MDR
  - DECODER ← IR.OP
  - MAR ← IR.ADDR
  - POP
  - M[MAR] ← MDR
[Diagram: stack machine datapath]
16 Zero address format
Instruction format: OP | NOT USED
- ADD
  - MAR ← IP
  - MDR ← M[MAR], IP ← IP + 1
  - IR ← MDR
  - DECODER ← IR.OP
  - ST2 ← ST1 + ST2
  - ST1 ← ST2 ‖ ST2 ← ST3 ‖ ST3 ← ST4 ‖ ST4 ← 0
[Diagram: stack machine datapath]
17 Example
- Stack trace (ST1 ST2 ST3 ST4)
  - Loadi 1: _ _ _ _ (the stack is empty before execution)
  - Loadi 2
  - Add
  - Store X
18 Example (cont.)
- Stack trace
  - Loadi 1: 1 _ _ _
  - Loadi 2
  - Add
  - Store X
[Diagram: stack machine datapath; the stack holds 1]
19 Example (cont.)
- Stack trace
  - Loadi 1: 1 _ _ _
  - Loadi 2: 2 1 _ _
  - Add
  - Store X
[Diagram: stack machine datapath; the stack holds 2, 1]
20 ADD step 1
- Stack trace
  - Loadi 1: 1 _ _ _
  - Loadi 2: 2 1 _ _
  - Add: 2 3 _ _ (ST2 ← ST1 + ST2)
  - Store X
[Diagram: stack machine datapath; the stack holds 2, 3]
21 ADD step 2
- Stack trace
  - Loadi 1: 1 _ _ _
  - Loadi 2: 2 1 _ _
  - Add: 3 _ _ _ (the stack shifts up by one entry)
  - Store X
[Diagram: stack machine datapath; the stack holds 3]
22 Before Store X is executed
- Stack trace
  - Loadi 1: 1 _ _ _
  - Loadi 2: 2 1 _ _
  - Add: 3 _ _ _
  - Store X: 3 _ _ _
[Diagram: stack machine datapath; the stack holds 3]
23 After Store X is executed
- Stack trace
  - Loadi 1: 1 _ _ _
  - Loadi 2: 2 1 _ _
  - Add: 3 _ _ _
  - Store X: _ _ _ _
[Diagram: stack machine datapath; the stack is empty]
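The stack trace above can be reproduced with a short simulation of the four-entry stack. This is a sketch with assumptions of its own: the instruction encoding as tuples, the names `push`, `pop`, and `run_stack_machine`, and the use of `None` for an empty entry (printed as `_`; the slides write 0 into ST4 on a pop) are all hypothetical.

```python
EMPTY = None  # assumption: an empty stack entry, shown as "_" in the trace


def push(stack, value):
    # ST4 <- ST3 || ST3 <- ST2 || ST2 <- ST1 || ST1 <- value
    return [value] + stack[:-1]


def pop(stack):
    # MDR <- ST1 || ST1 <- ST2 || ST2 <- ST3 || ST3 <- ST4 || ST4 <- empty
    return stack[0], stack[1:] + [EMPTY]


def show(stack):
    return " ".join("_" if v is EMPTY else str(v) for v in stack)


def run_stack_machine(program):
    """Run LOADI/ADD/STORE instructions; return (trace, memory)."""
    stack = [EMPTY] * 4
    memory = {}
    trace = []
    for instr in program:
        op = instr[0]
        if op == "LOADI":
            stack = push(stack, instr[1])
        elif op == "ADD":
            stack[1] = stack[0] + stack[1]  # step 1: ST2 <- ST1 + ST2
            _, stack = pop(stack)           # step 2: shift the stack up
        elif op == "STORE":
            mdr, stack = pop(stack)         # POP
            memory[instr[1]] = mdr          # M[MAR] <- MDR
        else:
            raise ValueError(f"unknown opcode: {op}")
        trace.append((op, show(stack)))
    return trace, memory


trace, memory = run_stack_machine(
    [("LOADI", 1), ("LOADI", 2), ("ADD",), ("STORE", "X")]
)
```

The trace records the stack after each instruction completes, so the intermediate state of ADD step 1 does not appear in it.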
24 SIMD (Array Processor)
[Diagram: one control path (IP, MAR, MEMORY, MDR, IR with OP and ADDR fields, DECODER) driving N ALUs; ALU i has its own registers Ai, Bi, Ci, for i = 1 to N]
25 Array Processors
- One of the first array processors was the ILLIAC IV.
- Load A1, V1
- Load B1, Y1
- Load A2, V2
- Load B2, Y2
- ...
- Load An, Vn
- Load Bn, Yn
- ADD
- Store C1, W1
- Store C2, W2
- Store C3, W3
- ...
- Store Cn, Wn
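The load / ADD / store sequence above can be mimicked in software to show what a single broadcast ADD does. This is a sketch only: the class `ProcessingElement` and the function `simd_add` are hypothetical names, and the Python loop stands in for hardware that performs all the additions at the same time.

```python
class ProcessingElement:
    """One ALU with its private registers A, B, and C."""

    def __init__(self):
        self.a = 0
        self.b = 0
        self.c = 0


def simd_add(v, y):
    """Compute W[i] = V[i] + Y[i] with one processing element per element."""
    pes = [ProcessingElement() for _ in v]
    # Load Ai, Vi and Load Bi, Yi for every element i.
    for pe, vi, yi in zip(pes, v, y):
        pe.a = vi
        pe.b = yi
    # A single ADD instruction: every element executes it on its own data.
    for pe in pes:
        pe.c = pe.a + pe.b
    # Store Ci, Wi for every element i.
    return [pe.c for pe in pes]


w = simd_add([1, 2, 3], [10, 20, 30])
```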
26 Pipelining
- Definition: Pipelining is an implementation technique whereby multiple instructions are overlapped in execution, taking advantage of parallelism that exists among the actions needed to execute an instruction.
- Example: Pipelining is similar to an automobile assembly line.
27 Pipelining: Automobile Assembly Line
- T0: Frame
- T1: Frame + wheels
- T2: Frame + wheels + engine
- T3: Frame + wheels + engine + body → new car
- If it takes 1 hour to complete one car and each of the four stages above takes 15 minutes, then building the first car takes 1 hour, but after that one car can be produced every 15 minutes. The same principle can be applied to computer instructions.
28 Pipelining: Floating point addition
- Suppose a floating point addition operation could be divided into 4 stages, each completing ¼ of the total addition operation.
- Then the following chart would be possible.
[Chart: STAGE1 → STAGE2 → STAGE3 → STAGE4]
29 The steps
- Step one: Compare and choose the exponents.
- Step two: Set both numbers to the same exponent.
- Step three: Perform addition/subtraction on the two numbers.
- Last step: Normalization.
30 Block Diagram of an FP adder/multiplier unit
[Diagram: inputs Exponent E1, Mantissa M1, Exponent E2, Mantissa M2 → Exponent Compare → Align the Proper Significand → Add/Multiply → Result Normalization (Round) → outputs Result Exponent, Result Significand]
31 Example
- Compare exponents: 0.9056 × 10^2 + 3.7401 × 10^4
- Alignment: 0.9056 × 10^2 + 374.01 × 10^2
- Add: 374.9156 × 10^2
- Normalize: 0.3749156 × 10^5
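The four steps in this example can be written out as a short function. This is a sketch, assuming decimal mantissas, alignment to the smaller exponent (as in the example), and normalization to a mantissa between 0.1 and 1; the function name `fp_add` is hypothetical, and rounding is ignored.

```python
from decimal import Decimal


def fp_add(m1, e1, m2, e2):
    """Add m1 * 10**e1 and m2 * 10**e2; return (mantissa, exponent)."""
    # Step 1: compare the exponents and choose the common (smaller) one.
    e = min(e1, e2)
    # Step 2: align both mantissas to the common exponent.
    m1_aligned = m1 * Decimal(10) ** (e1 - e)
    m2_aligned = m2 * Decimal(10) ** (e2 - e)
    # Step 3: add the aligned mantissas.
    m = m1_aligned + m2_aligned
    # Step 4: normalize so that 0.1 <= |mantissa| < 1.
    while abs(m) >= 1:
        m /= 10
        e += 1
    while m != 0 and abs(m) < Decimal("0.1"):
        m *= 10
        e -= 1
    return m, e


mantissa, exponent = fp_add(Decimal("0.9056"), 2, Decimal("3.7401"), 4)
```

`Decimal` is used instead of `float` so that the intermediate mantissas match the hand calculation digit for digit.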
32 Example
- 123 × 10^0 + 456 × 10^-2 = 123 × 10^0 + 4.56 × 10^0 = 127.56 × 10^0
- Suppose that we need to add two vectors of length 50 and that one add operation takes 4 time units.
- Without pipelining, the 50 additions take 4 × 50 = 200 time units.
33 Floating point unit
[Diagram: the ALU as a floating point unit with the stages Compare Exponents → Alignment → Add → Normalization]
34 ALU as a floating point pipelined unit
- Stages: (1) Exponent Compare, (2) Align the Proper Mantissa, (3) Add, (4) Result Normalization
- Pipeline occupancy for the 50 operand pairs:

Stage  T0    T1    T2    T3    T4
1      x1y1  x2y2  x3y3  x4y4  x5y5
2            x1y1  x2y2  x3y3  x4y4
3                  x1y1  x2y2  x3y3
4                        x1y1  x2y2

- 4 steps for the first result
- One additional step for each subsequent result
- 4 + 49 × 1 = 53 time units
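- In a SIMD architecture with 50 processing elements (ALUs), all 50 additions run in parallel, so the whole vector add takes the time of a single add operation rather than 50 of them.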
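The timing comparison above follows from two simple formulas, sketched here together with a generator for the occupancy chart. The function names and the assumption that every stage takes exactly one time unit are mine, not the slides'.

```python
def sequential_time(n_ops, n_stages, stage_time=1):
    """Time when each operation finishes all stages before the next starts."""
    return n_ops * n_stages * stage_time


def pipelined_time(n_ops, n_stages, stage_time=1):
    """Time when operations overlap: fill the pipeline once, then one result per step."""
    return (n_stages + (n_ops - 1)) * stage_time


def pipeline_schedule(n_ops, n_stages):
    """table[s][t] is the index of the operand pair in stage s at time t, or None."""
    total_steps = n_stages + n_ops - 1
    return [
        [t - s if 0 <= t - s < n_ops else None for t in range(total_steps)]
        for s in range(n_stages)
    ]


unpipelined = sequential_time(50, 4)  # 4 x 50
pipelined = pipelined_time(50, 4)     # 4 + 49
schedule = pipeline_schedule(5, 4)
```

In `schedule`, row 0 is stage 1 and pair index 0 is x1y1, so `schedule[3][3]` being 0 corresponds to x1y1 reaching stage 4 at T3.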
35 Vector processors
- Vector processor characteristics:
  - Pipelining is used in processing.
  - Vector registers are used to provide the ALU with a constant stream of input.
  - Memory interleaving is used to load input into the vector registers.
  - Also known as supercomputers.
  - The CRAY-1 is regarded as the first of this type.
36 Vector Processors Diagram
[Diagram: vector processor datapath with IP, MAR, MEMORY, MDR, IR (OP and ADDR fields), DECODER, vector registers A, B, C, and ALU]
37 Vector Processing Compilers
- The addition of vector processing has led to vectorization in compiler design.
- Example: the following loop construct
  - FOR I = 1 to N
  - Ci ← Ai + Bi
- can be unrolled into the following instructions:
  - C1 ← A1 + B1
  - C2 ← A2 + B2
  - C3 ← A3 + B3 ...
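The loop transformation above can be illustrated by comparing an element-at-a-time loop with a version that works on fixed-length strips, each strip standing in for one vector instruction. This is a sketch under assumptions: the names `scalar_add` and `vector_add` and the strip length of 4 are hypothetical, and plain Python lists stand in for vector registers.

```python
def scalar_add(a, b):
    """Original loop: one addition per iteration."""
    c = [0] * len(a)
    for i in range(len(a)):
        c[i] = a[i] + b[i]
    return c


def vector_add(a, b, vector_length=4):
    """Vectorized form: each pass handles a whole strip of elements."""
    c = []
    for start in range(0, len(a), vector_length):
        strip_a = a[start:start + vector_length]  # load vector register A
        strip_b = b[start:start + vector_length]  # load vector register B
        # One vector add: all elements of the strip are independent.
        c.extend(x + y for x, y in zip(strip_a, strip_b))
    return c


a = list(range(10))
b = [10 * x for x in range(10)]
result = vector_add(a, b)
```

The transformation is only valid because no iteration of the loop reads a value written by another iteration.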
38 Memory Interleaving
- Definition: Memory interleaving is a design that speeds up memory access by organizing memory into separate banks, each with its own MAR (memory address register). The banks can be accessed in parallel, which eliminates the wait for a single MAR to finish a memory access.
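One common way to spread addresses over the banks is low-order interleaving, where consecutive addresses fall in consecutive banks. The slides do not say which mapping is intended, so the scheme below, the four-bank default, and the function names are assumptions made for illustration.

```python
from collections import Counter


def bank_and_offset(address, n_banks=4):
    """Low-order interleaving: the bank is address mod n_banks."""
    return address % n_banks, address // n_banks


def access_cycles(addresses, n_banks=4):
    """Memory cycles needed if each bank can serve one access per cycle."""
    per_bank = Counter(address % n_banks for address in addresses)
    return max(per_bank.values(), default=0)


sequential = access_cycles([0, 1, 2, 3])  # four different banks
strided = access_cycles([0, 4, 8, 12])    # all four in bank 0
```

Sequential addresses hit different banks and can be fetched together, while a stride equal to the number of banks sends every access to the same bank and serializes them.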
39 Memory Interleaving Diagram
[Diagram: a single MAR and MDR connected to four memory banks (MEMORY1 to MEMORY4), each with its own address register (MAR1 to MAR4) and data register (MDR1 to MDR4)]
40 Vector Processors: Memory Interleaving Diagram
[Diagram: vector processor datapath with IP, MAR, MDR, IR (OP and ADDR fields), DECODER, ALU, and the vector registers A, B, C]
41 Matrix Access
- Parallel access by Rows
- Parallel access by Columns
- Parallel access by Diagonals
- Skewed matrix representation
42 Parallel access by Row
P1   P2   P3   P4
a11  a12  a13  a14
a21  a22  a23  a24
a31  a32  a33  a34
a41  a42  a43  a44
- Serial access by columns
43 Parallel access by Column (reordering is necessary)
P1   P2   P3   P4
a11  a21  a31  a41
a12  a22  a32  a42
a13  a23  a33  a43
a14  a24  a34  a44
- Serial access by rows
44 Parallel access by Diagonals
P1   P2   P3   P4
a11  a12  a13  a14
a24  a21  a22  a23
a33  a34  a31  a32
a42  a43  a44  a41
45 Parallel Column/Row Access
- Skewed matrix representation
P1   P2   P3   P4
a11  a12  a13  a14
a24  a21  a22  a23
a33  a34  a31  a32
a42  a43  a44  a41
[Diagram: columns 1 and 3 of the matrix (a11, a21, a31, a41 and a13, a23, a33, a43) highlighted in the skewed layout]
46 Parallel Column/Row/Diagonal Access
- Skewed matrix representation (5 banks; _ marks an unused location)
a11  a12  a13  a14  _
_    a21  a22  a23  a24
a34  _    a31  a32  a33
a43  a44  _    a41  a42
[Diagram: the five memory banks connect through an interconnection network to P1, P2, P3, P4]
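Whether an access pattern is conflict-free can be checked by mapping each element to its bank and looking for repeats. The sketch below assumes a skewing rule of bank = (column + row × skew) mod number of banks with 0-based indices, which reproduces the layouts shown in the skewed slides; the function names are hypothetical.

```python
def bank(i, j, n_banks, skew=1):
    """Bank holding element (i, j), with 0-based row i and column j."""
    return (j + i * skew) % n_banks


def conflict_free(cells, n_banks, skew=1):
    """True if all listed cells live in different banks."""
    banks = [bank(i, j, n_banks, skew) for i, j in cells]
    return len(set(banks)) == len(banks)


def check(n, n_banks, skew=1):
    """Report which access patterns of an n x n matrix are conflict-free."""
    rows = all(
        conflict_free([(i, j) for j in range(n)], n_banks, skew) for i in range(n)
    )
    columns = all(
        conflict_free([(i, j) for i in range(n)], n_banks, skew) for j in range(n)
    )
    diagonal = conflict_free([(i, i) for i in range(n)], n_banks, skew)
    return {"rows": rows, "columns": columns, "diagonal": diagonal}


unskewed = check(4, 4, skew=0)
four_banks = check(4, 4, skew=1)
five_banks = check(4, 5, skew=1)
```

Under this rule, four banks give parallel row and column access but the main diagonal collides, while five banks make rows, columns, and the main diagonal all conflict-free.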
47 Multiprocessor Machines (MIMD)
[Diagram: several CPUs connected to one shared MEMORY]
48 Hierarchy of Parallelism - 1
1. Expression Level
( ( ( a + b ) + c ) + d ) → ( a + b ) + ( c + d )
49 Hierarchy of Parallelism - 2
2. Statement Level
A) Bernstein's conditions for two statements S1 and S2
- Input set: IS(x ← a + b) = {a, b}
- Output set: OS(x ← a + b) = {x}
- If IS(S1) ∩ OS(S2) = ∅, IS(S2) ∩ OS(S1) = ∅, and OS(S1) ∩ OS(S2) = ∅, then S1 is parallel to S2.
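Bernstein's conditions translate directly into set intersections. In this sketch each statement is described by hand as an input set and an output set; the function name `bernstein_parallel` and the example statements are hypothetical.

```python
def bernstein_parallel(is1, os1, is2, os2):
    """True if statements S1 and S2 satisfy all three Bernstein conditions."""
    return (
        not (is1 & os2)      # S1 does not read what S2 writes
        and not (is2 & os1)  # S2 does not read what S1 writes
        and not (os1 & os2)  # S1 and S2 do not write the same variable
    )


# S1: x = a + b    S2: y = c + d    S3: z = x + 1
s1 = ({"a", "b"}, {"x"})
s2 = ({"c", "d"}, {"y"})
s3 = ({"x"}, {"z"})

s1_s2 = bernstein_parallel(*s1, *s2)  # independent statements
s1_s3 = bernstein_parallel(*s1, *s3)  # S3 reads x, which S1 writes
```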
50 Hierarchy of Parallelism - 3
2. Statement Level
B) IF...THEN
   A ← B + 1
   If A > 5 Then
      C ← A + 1
      Goto 5
   Else
      D ← A + 1
      Goto 7
   Endif
5: E ← A + 1
7: F ← A + 1
Will become:
   A ← B + 1
   B ← A > 5
   C ← A + 1 when B
   D ← A + 1 when ¬B
   E ← A + 1 when B
   F ← A + 1
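The equivalence of the branching code and its guarded form can be checked by running both on the same inputs. This is a sketch: variables are kept in a dictionary, the guard is stored under the separate name `p` (the slide reuses B for it), and both function names are hypothetical.

```python
def branching(b):
    """Original control flow with IF / Goto."""
    v = {}
    v["A"] = b + 1
    if v["A"] > 5:
        v["C"] = v["A"] + 1
        v["E"] = v["A"] + 1  # label 5, reached only from the Then branch
    else:
        v["D"] = v["A"] + 1
    v["F"] = v["A"] + 1      # label 7, reached on both paths
    return v


def guarded(b):
    """Same statements, each executed only when its guard holds."""
    v = {}
    v["A"] = b + 1
    p = v["A"] > 5           # the guard, computed once
    if p:
        v["C"] = v["A"] + 1  # C <- A + 1 when p
    if not p:
        v["D"] = v["A"] + 1  # D <- A + 1 when not p
    if p:
        v["E"] = v["A"] + 1  # E <- A + 1 when p
    v["F"] = v["A"] + 1      # F <- A + 1 (unconditional)
    return v


same = all(branching(b) == guarded(b) for b in range(10))
```

Once the guard is computed, the four guarded statements no longer depend on control flow, only on A and the guard, which is what exposes them to parallel execution.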
51 Hierarchy of Parallelism - 3 (cont.)
2. Statement Level
C) Vectorization
   For I = 1 to n: A[I] ← B[I] + C[I]
   Vectorize:
   A1 ← B1 + C1
   A2 ← B2 + C2
   ...
   AN ← BN + CN