Lecture 1: Flynn's Taxonomy - PowerPoint PPT Presentation

Provided by: ucfst (http://www.cs.ucf.edu)
Slides: 52
1
Lecture 1: Flynn's Taxonomy
2
The Global View of Computer Architecture

(Diagram: Computer Architecture — instruction set design, organization,
the hardware/software boundary, and interface design (ISA) — at the
center of Applications, Parallelism, History, Technology, Programming
Languages, OS, Compilers, and Measurement and Evaluation.)
3
The Task of A Computer Designer
  • Determine what attributes are important for a
    new machine.
  • Design a machine to maximize performance while
    staying within cost constraints.

4
Flynn's Taxonomy
  • Michael Flynn (from Stanford)
  • Made a characterization of computer systems
    which became known as Flynn's Taxonomy.

5
Flynn's Taxonomy
  • SISD: Single Instruction, Single Data systems

(Diagram: a single instruction stream (SI) and a single data stream
(SD) feeding one processing unit.)
6
Flynn's Taxonomy
  • SIMD: Single Instruction, Multiple Data systems —
    Array Processors

(Diagram: one instruction stream (SI) broadcast to several processing
units, each with its own data stream (SD) — multiple data.)
7
Flynn's Taxonomy
  • MIMD: Multiple Instructions, Multiple Data systems —
    Multiprocessors

(Diagram: several processing units, each with its own instruction
stream (SI) and its own data stream (SD).)
8
Flynn's Taxonomy
  • MISD: Multiple Instructions, Single Data systems
  • Some people say pipelining lies here, but this
    is debatable.

(Diagram: several processing units with separate instruction streams
(SI) all operating on a single data stream (SD).)
9
Abbreviations (SISD one-address machine)
  • IP: Instruction Pointer
  • MAR: Memory Address Register
  • MDR: Memory Data Register
  • A: Accumulator
  • ALU: Arithmetic Logic Unit
  • IR: Instruction Register
  • OP: Opcode
  • ADDR: Address

(Diagram: IP and MAR address MEMORY; MDR transfers data; the IR's OP
field drives the DECODER and its ADDR field the MAR; the ALU operates
on the accumulator A.)
10
  • LOAD X
  • MAR ← IP
  • MDR ← M[MAR]; IP ← IP + 1
  • IR ← MDR
  • DECODER ← IR.OP
  • MAR ← IR.ADDR
  • MDR ← M[MAR]
  • A ← MDR

One address format: | OP | ADDRESS |
11

One address format: | OP | ADDRESS |
  • ADD X
  • MAR ← IP
  • MDR ← M[MAR]; IP ← IP + 1
  • IR ← MDR
  • DECODER ← IR.OP
  • MAR ← IR.ADDR
  • MDR ← M[MAR]
  • A ← A + MDR
12

One address format: | OP | ADDRESS |
  • STORE X
  • MAR ← IP
  • MDR ← M[MAR]; IP ← IP + 1
  • IR ← MDR
  • DECODER ← IR.OP
  • MAR ← IR.ADDR
  • MDR ← A
  • M[MAR] ← MDR
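The LOAD/ADD/STORE register transfers above can be sketched as a small simulator. Register and memory names (IP, MAR, MDR, IR, A, M) follow the slides; the encoding of an instruction as an (opcode, address) tuple and the example addresses are assumptions for illustration only.

```python
# A minimal sketch of the one-address (accumulator) machine's
# fetch/decode/execute cycle from the slides. Instructions are
# (opcode, address) tuples — an assumed encoding, not the real one.

def run_one_address(program, memory):
    """Execute (op, addr) instructions until HALT; return memory."""
    M = dict(memory)
    for i, instr in enumerate(program):
        M[i] = instr                  # program stored at addresses 0..n
    IP, A = 0, 0
    while True:
        MAR = IP                      # MAR <- IP
        MDR = M[MAR]; IP = IP + 1     # MDR <- M[MAR]; IP <- IP + 1
        IR = MDR                      # IR  <- MDR
        op, addr = IR                 # DECODER <- IR.OP
        if op == "HALT":
            return M
        MAR = addr                    # MAR <- IR.ADDR
        if op == "LOAD":
            MDR = M[MAR]              # MDR <- M[MAR]
            A = MDR                   # A   <- MDR
        elif op == "ADD":
            MDR = M[MAR]              # MDR <- M[MAR]
            A = A + MDR               # A   <- A + MDR
        elif op == "STORE":
            MDR = A                   # MDR <- A
            M[MAR] = MDR              # M[MAR] <- MDR

# X at address 100, Y at 101; compute X + Y into address 102.
result = run_one_address(
    [("LOAD", 100), ("ADD", 101), ("STORE", 102), ("HALT", 0)],
    {100: 2, 101: 3})
print(result[102])  # -> 5
```

Note that each instruction repeats the same fetch prefix (MAR ← IP; MDR ← M[MAR]; IP ← IP + 1; IR ← MDR); only the execute phase differs.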
13
SISD Stack Machine
  • The first stack machine was the Burroughs B5000.

(Diagram: IP and MAR address MEMORY; MDR transfers data; the DECODER
takes the OP field; a four-register stack (ST1–ST4) feeds the ALU.)
14
PUSH (‖ means "in parallel"):
  ST4 ← ST3 ‖ ST3 ← ST2 ‖ ST2 ← ST1 ‖ ST1 ← MDR

LOAD X:
  MAR ← IP
  MDR ← M[MAR]; IP ← IP + 1
  IR ← MDR
  DECODER ← IR.OP
  MAR ← IR.ADDR
  MDR ← M[MAR]
  PUSH
15
  • POP
  • MDR ← ST1
  • ST1 ← ST2
  • ST2 ← ST3
  • ST3 ← ST4
  • ST4 ← 0

(‖ means "in parallel")

STORE X:
  MAR ← IP
  MDR ← M[MAR]; IP ← IP + 1
  IR ← MDR
  DECODER ← IR.OP
  MAR ← IR.ADDR
  POP
  M[MAR] ← MDR
16
Zero address format: | OP | NOT USED |
  • ADD
  • MAR ← IP
  • MDR ← M[MAR]; IP ← IP + 1
  • IR ← MDR
  • DECODER ← IR.OP
  • ST2 ← ST1 + ST2
  • ST1 ← ST2
  • ST2 ← ST3
  • ST3 ← ST4
  • ST4 ← 0
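The stack-machine transfers above can be sketched the same way. The four-register stack ST1..ST4 follows the slides; the instruction-tuple format is an assumption for illustration.

```python
# A sketch of the zero-address stack machine: PUSH shifts the stack
# down, ADD adds the top two entries then pops, STORE pops into
# memory. ST[0] plays the role of ST1 (the top of stack).

def run_stack(program, memory):
    M = dict(memory)
    ST = [0, 0, 0, 0]                     # ST1..ST4
    for instr in program:
        op = instr[0]
        if op == "LOADI":                 # push an immediate (PUSH)
            ST = [instr[1], ST[0], ST[1], ST[2]]
        elif op == "ADD":                 # ST2 <- ST1 + ST2, then shift up
            ST[1] = ST[0] + ST[1]
            ST = [ST[1], ST[2], ST[3], 0]
        elif op == "STORE":               # POP the top into M[addr]
            M[instr[1]] = ST[0]
            ST = [ST[1], ST[2], ST[3], 0]
    return M

# Loadi 1; Loadi 2; Add; Store X  (X at an assumed address 200)
mem = run_stack([("LOADI", 1), ("LOADI", 2), ("ADD",), ("STORE", 200)], {})
print(mem[200])  # -> 3
```

The example is exactly the stack trace worked through on the next slides: 1 is pushed, 2 is pushed on top of it, ADD leaves 3, and STORE X empties the stack.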
17
Example
  • Stack Trace
  • Loadi 1 _ _ _ _
  • Loadi 2
  • Add
  • Store X

18
Cont.
  • Stack Trace
  • Loadi 1      1 _ _ _
  • Loadi 2
  • Add
  • Store X

(Diagram: stack holds 1.)
19
Cont.
  • Stack Trace
  • Loadi 1      1 _ _ _
  • Loadi 2      2 1 _ _
  • Add
  • Store X

(Diagram: stack holds 2, 1.)
20
ADD Step 1
  • Stack Trace
  • Push 1       1 _ _ _
  • Push 2       2 1 _ _
  • Add          2 3 _ _
  • Store X

(Diagram: stack holds 2, 3 — ST2 ← ST1 + ST2 has executed.)
21
ADD step 2
  • Stack Trace
  • Push 1       1 _ _ _
  • Push 2       2 1 _ _
  • Add          3 _ _ _
  • Store X

(Diagram: stack holds 3 — the stack has shifted up.)
22
Before Store X is executed
  • Stack Trace
  • Push 1       1 _ _ _
  • Push 2       2 1 _ _
  • Add          3 _ _ _
  • Store X      3 _ _ _

(Diagram: stack holds 3.)
23
After Store X is executed
  • Stack Trace
  • Push 1       1 _ _ _
  • Push 2       2 1 _ _
  • Add          3 _ _ _
  • Store X      _ _ _ _

(Diagram: stack is empty; 3 has been written to X.)
24
SIMD (Array Processor)

(Diagram: a single IP/MAR/MDR/DECODER fetch unit drives N ALUs in
parallel; each ALU i has its own registers Ai, Bi, Ci.)
25
Array Processors
  • One of the first Array Processors was the ILLIAC IV.
  • Load A1, V1
  • Load B1, Y1
  • Load A2, V2
  • Load B2, Y2
  • Load An, Vn
  • Load Bn, Yn
  • ADD
  • Store C1, W1
  • Store C2, W2
  • Store C3, W3
  • ..
  • ..
  • Store Cn, Wn
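The sequence above — load every Ai and Bi, then issue one broadcast ADD — can be sketched with plain Python lists standing in for the per-element registers:

```python
# A sketch of the SIMD array-processor idea: a single ADD instruction
# is broadcast to N processing elements, each holding its own A/B/C
# registers (A1..AN, B1..BN, C1..CN as in the diagram).

def simd_add(A, B):
    """One broadcast ADD: every element i computes C[i] = A[i] + B[i]."""
    return [a + b for a, b in zip(A, B)]  # conceptually all in one step

C = simd_add([1, 2, 3, 4], [10, 20, 30, 40])
print(C)  # -> [11, 22, 33, 44]
```

On real hardware the N additions happen simultaneously; here the list comprehension only models the single-instruction, multiple-data shape of the operation.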
26
Pipelining
  • Definition: Pipelining is an implementation
    technique whereby multiple instructions are
    overlapped in execution, taking advantage of
    parallelism that exists among actions needed to
    execute an instruction.
  • Example: Pipelining is similar to an automobile
    assembly line.

27
Pipelining: Automobile Assembly Line
  • T0: Frame
  • T1: Frame + wheels
  • T2: Frame + wheels + engine
  • T3: Frame + wheels + engine + body → new car
  • If it takes 1 hour to complete one car, and each
    of the above stages takes 15 minutes, then
    building the first car takes 1 hour; after that,
    one car can be produced every 15 minutes. This
    same principle can be applied to computer
    instructions.

28
Pipelining: Floating point addition
  • Suppose a floating point addition operation could
    be divided into 4 stages, each completing ¼ of
    the total addition operation.
  • Then the following chart would be possible.

STAGE1 → STAGE2 → STAGE3 → STAGE4
29
The steps
  • Step one: Compare and choose the exponents.
  • Step two: Set both numbers to the same exponent.
  • Step three: Perform addition/subtraction on the
    two numbers.
  • Last step: Normalization.
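The four steps can be sketched on decimal (mantissa, exponent) pairs. The decimal base and the [0.1, 1) normalization range are assumptions for readability — real hardware works on binary significands — and the sketch handles addition only, not subtraction.

```python
# The four floating-point addition steps from the slide, on decimal
# (mantissa, exponent) pairs. Addition-only sketch; decimal base
# assumed for readability.

def fp_add(m1, e1, m2, e2):
    # Step 1: compare exponents and choose the larger.
    if e1 < e2:
        m1, e1, m2, e2 = m2, e2, m1, e1
    # Step 2: align — shift the smaller operand's mantissa right.
    m2 = m2 / 10 ** (e1 - e2)
    # Step 3: add the aligned mantissas.
    m = m1 + m2
    # Last step: normalize so the mantissa lies in [0.1, 1).
    while abs(m) >= 1:
        m, e1 = m / 10, e1 + 1
    return m, e1

# Slide 31's example: 0.9056 x 10^2 + 0.37401 x 10^5
# (3.7401 x 10^4 written in normalized form) = 0.3749156 x 10^5.
m, e = fp_add(0.9056, 2, 0.37401, 5)
print(round(m, 7), e)  # -> 0.3749156 5
```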

30
Block Diagram of a FP adder/multiplier unit

(Diagram: Mantissa M1, Exponent E1, Mantissa M2, Exponent E2 →
Exponent Compare → Align the proper Significand → Add/Multiply →
Result Normalization (Round) → Result Exponent, Result Significand.)
31
Example
  • Compare exponents:  0.9056 × 10^2  +  3.7401 × 10^4
  • Alignment:          0.9056 × 10^2  +  374.01 × 10^2
  • Add:                374.9156 × 10^2
  • Normalize:          0.3749156 × 10^5
32
Example
  123 × 10^0 + 456 × 10^-2  →  123 × 10^0 + 4.56 × 10^0  =  127.56 × 10^0

Suppose that we need to add two vectors of length
50, and the add operation has a duration of 4 time
units. Done sequentially, it takes 4 × 50 = 200
time units.
33
Floating point unit

(Diagram: a single non-pipelined ALU performs Compare Exponents,
Alignment, Add, and Normalization in sequence.)
34
ALU as a floating point pipelined unit
Stages: (1) Exponent Compare, (2) Align the proper mantissa,
(3) Add, (4) Result Normalization.

Stage   T1     T2     T3     T4     T5
  1     x1y1   x2y2   x3y3   x4y4   x5y5
  2            x1y1   x2y2   x3y3   x4y4
  3                   x1y1   x2y2   x3y3
  4                          x1y1   x2y2

  • 4 steps for the first result
  • One additional step for each subsequent result
  • 4 + 49 × 1 = 53 time units for all 50 results

In a SIMD architecture with 50 processor elements
(ALUs) it would take 1 time unit.
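The two counts above — 200 time units sequentially versus 53 pipelined — follow from a simple formula that can be checked directly:

```python
# Timing comparison from the slides: adding two length-50 vectors
# with a 4-stage floating-point adder, sequentially vs. pipelined.

def sequential_time(n, stages):
    return n * stages              # each add runs start-to-finish

def pipelined_time(n, stages):
    return stages + (n - 1)        # fill the pipe, then 1 result/step

print(sequential_time(50, 4))  # -> 200
print(pipelined_time(50, 4))   # -> 53  (4 + 49)
```

The pipelined formula is the general one: latency of the first result plus one step per remaining result.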
35
Vector processors
  • Vector processor characteristics:
  • - Pipelining is used in processing.
  • - Vector registers are used to provide the ALU
    with constant input.
  • - Memory interleaving is used to load input to
    the vector registers.
  • - AKA supercomputers.
  • - The CRAY-1 is regarded as the first of these
    types.

36
Vector Processors Diagram

(Diagram: IP/MAR/MDR fetch unit and DECODER, with vector registers
A, B, and C feeding the ALU.)
37
Vector Processing Compilers
  • The addition of vector processing has led to
    vectorization in compiler design.
  • Example: the following loop construct
  • FOR I = 1 to N
  •   C[I] ← A[I] + B[I]
  • can be unrolled into the following instructions:
  • C[1] ← A[1] + B[1]
  • C[2] ← A[2] + B[2]
  • C[3] ← A[3] + B[3] ..
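The compiler transformation above can be sketched by writing the scalar loop and the vectorized form side by side and checking that they agree; plain Python lists stand in for the vector registers.

```python
# The scalar loop and its vectorized form compute the same result;
# a vectorizing compiler performs this rewrite so the whole loop
# body can be issued to the vector unit as one operation.

def scalar_loop(A, B, n):
    C = [0] * n
    for i in range(n):                     # FOR I = 1 to N
        C[i] = A[i] + B[i]                 # C[I] <- A[I] + B[I]
    return C

def vectorized(A, B, n):
    # All n element adds expressed as a single vector operation.
    return [a + b for a, b in zip(A[:n], B[:n])]

A, B = [1, 2, 3], [4, 5, 6]
print(scalar_loop(A, B, 3))  # -> [5, 7, 9]
print(vectorized(A, B, 3))   # -> [5, 7, 9]
```

The rewrite is legal here because the iterations are independent — no element of C is read by a later iteration.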

38
Memory Interleaving
  • Definition: Memory interleaving is a design used
    to gain faster access to memory by organizing
    memory into separate memory banks, each with its
    own MAR (memory address register). This allows
    parallel access and eliminates the wait for a
    single MAR to finish a memory access.
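A common interleaving scheme — assumed here, since the slide does not fix one — assigns address a to bank a mod N, so consecutive addresses fall in different banks:

```python
# A sketch of 4-way memory interleaving: address a lives in bank
# a mod 4, so consecutive addresses land in different banks and
# can be accessed in parallel (each bank has its own MAR/MDR).

NUM_BANKS = 4

def bank_of(addr):
    return addr % NUM_BANKS

# Consecutive vector elements map to pairwise-distinct banks:
addrs = [100, 101, 102, 103]
banks = [bank_of(a) for a in addrs]
print(banks)  # -> [0, 1, 2, 3]
```

Because all four accesses hit different banks, their MARs can work simultaneously instead of queueing behind a single memory port.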

39
Memory Interleaving Diagram

(Diagram: one shared MAR/MDR pair fans out to four banks MEMORY1–
MEMORY4, each with its own MAR1–MAR4 and MDR1–MDR4.)
40
Vector Processors Memory Interleaving Diagram

(Diagram: IP/MAR/MDR fetch unit and DECODER; interleaved memory loads
the vector registers A, B, and C, which feed the ALU.)
41
Matrix Access
  • Parallel access by Rows
  • Parallel access by Columns
  • Parallel access by Diagonals
  • Skewed matrix representation

42
Parallel access by Rows
  • a11 a12 a13 a14
  • a21 a22 a23 a24
  • a31 a32 a33 a34
  • a41 a42 a43 a44

Serial access by Columns (banks P1 P2 P3 P4 each hold one column)
43
Parallel access by Columns (reordering is necessary)
  • a11 a21 a31 a41
  • a12 a22 a32 a42
  • a13 a23 a33 a43
  • a14 a24 a34 a44

Serial access by Rows (banks P1 P2 P3 P4 each hold one row)
44
Parallel access by Diagonals
  • a11 a12 a13 a14
  • a24 a21 a22 a23
  • a33 a34 a31 a32
  • a42 a43 a44 a41

(Each row is rotated so the main diagonal a11 a22 a33 a44 falls in
distinct banks P1–P4.)
45
Parallel Column/Row Access
  • a11 a12 a13 a14
  • a24 a21 a22 a23
  • a33 a34 a31 a32
  • a42 a43 a44 a41

Skewed matrix representation: with the rows rotated, a column such as
(a11, a21, a31, a41) or (a13, a23, a33, a43) is spread across banks
P1–P4, so both rows and columns can be accessed in parallel.
46
Parallel Column/Row/Diagonal Access
Skewed matrix representation (5 banks)
  • a11 a12 a13 a14
  • a21 a22 a23 a24
  • a34 a31 a32 a33
  • a43 a44 a41 a42

(An interconnection network routes the 5 banks to processors P1–P4.)
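One way to realize the 5-bank skewed representation — an assumed scheme consistent with the slide's goal, not necessarily the exact one pictured — is to store element (i, j) in bank (i + j) mod 5. Every row, every column, and the main diagonal then touch pairwise-distinct banks:

```python
# Skewed storage of a 4x4 matrix across 5 banks: with
# bank(i, j) = (i + j) mod 5 (one possible skewing scheme, assumed
# for illustration), rows, columns, and the main diagonal are all
# conflict-free, i.e. hit pairwise-distinct banks.

N, BANKS = 4, 5

def bank(i, j):
    return (i + j) % BANKS

rows = [{bank(i, j) for j in range(N)} for i in range(N)]
cols = [{bank(i, j) for i in range(N)} for j in range(N)]
diag = {bank(i, i) for i in range(N)}

print(all(len(r) == N for r in rows))  # -> True  (rows conflict-free)
print(all(len(c) == N for c in cols))  # -> True  (columns conflict-free)
print(len(diag) == N)                  # -> True  (diagonal conflict-free)
```

The fifth bank is what makes this work: with only 4 banks, the diagonal elements (i, i) would map to (2i) mod 4, which repeats.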
47
Multiprocessor Machines (MIMD)

(Diagram: several CPUs sharing one MEMORY.)
48
Hierarchy of Parallelism - 1
1. Expression Level
   ((( a + b ) + c ) + d )  →  ( a + b ) + ( c + d )
   Re-associating the expression exposes ( a + b ) and ( c + d ) as
   independent sub-expressions that can be evaluated in parallel.
49
Hierarchy of Parallelism - 2
2. Statement Level
A) Bernstein's Conditions for statements S1, S2:
   IS(x ← a + b) = {a, b}    OS(x ← a + b) = {x}
   If IS(S1) ∩ OS(S2) = ∅
      and IS(S2) ∩ OS(S1) = ∅
      and OS(S1) ∩ OS(S2) = ∅
   then S1 is parallel to S2.
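Bernstein's conditions can be checked mechanically by modelling each statement as a pair of (input-set, output-set):

```python
# A checker for Bernstein's conditions: S1 and S2 may run in
# parallel when IS(S1) ∩ OS(S2), IS(S2) ∩ OS(S1), and
# OS(S1) ∩ OS(S2) are all empty.

def parallel(s1, s2):
    IS1, OS1 = s1
    IS2, OS2 = s2
    return not (IS1 & OS2) and not (IS2 & OS1) and not (OS1 & OS2)

# S1: x <- a + b   IS = {a, b}, OS = {x}
# S2: y <- c + d   IS = {c, d}, OS = {y}   — independent, parallel.
print(parallel(({"a", "b"}, {"x"}), ({"c", "d"}, {"y"})))  # -> True
# S3: z <- x + 1 reads x, which S1 writes — not parallel with S1.
print(parallel(({"a", "b"}, {"x"}), ({"x"}, {"z"})))       # -> False
```

The second example is the classic read-after-write dependence: S3's input set intersects S1's output set, so the first condition fails.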
50
Hierarchy of Parallelism - 3
2. Statement Level
B) IF..THEN
   A ← B + 1
   If A > 5 Then C ← A + 1; Goto 5
   Else D ← A + 1; Goto 7
   Endif
   5: E ← A + 1
   7: F ← A + 1

   will become:
   A ← B + 1
   B ← (A > 5)
   C ← A + 1 when B
   D ← A + 1 when ¬B
   E ← A + 1 when B
   F ← A + 1
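The if-conversion above can be sketched with a boolean guard: every branch target becomes a guarded assignment, so all of them can be issued without control flow. Variable names follow the slide (the guard is written G here to avoid reusing B for both the input and the condition).

```python
# Guarded execution of the slide's IF..THEN example: the branch
# condition becomes a guard G, and each assignment runs
# "when G" / "when not G". None marks a guarded-off assignment.

def converted(b):
    A = b + 1                      # A <- B + 1
    G = A > 5                      # guard (the slide calls it B)
    C = A + 1 if G else None       # C <- A + 1 when B
    D = A + 1 if not G else None   # D <- A + 1 when ¬B
    E = A + 1 if G else None       # E <- A + 1 when B (then-path only)
    F = A + 1                      # F <- A + 1 (both paths reach 7)
    return C, D, E, F

print(converted(7))  # A = 8 > 5  -> (9, None, 9, 9)
print(converted(1))  # A = 2 <= 5 -> (None, 3, None, 3)
```

With the gotos eliminated, the four guarded assignments have no control dependence on each other and can be evaluated in parallel once A and G are known.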
51
Hierarchy of Parallelism - 3
2. Statement Level
C) Vectorization
   For I = 1 to n: A[I] ← B[I] + C[I]
   Vectorize:
   A[1] ← B[1] + C[1]
   A[2] ← B[2] + C[2]
   ...
   A[N] ← B[N] + C[N]