CPE 626: Advanced VLSI Design L02 - PowerPoint PPT Presentation

About This Presentation
Title:

CPE 626: Advanced VLSI Design L02

Description:

Title: CA226: Advanced Computer Architectures Author: aleksander Last modified by: Default Created Date: 1/5/2001 1:58:05 PM Document presentation format – PowerPoint PPT presentation

Slides: 47
Provided by: Alek155
Learn more at: http://www.ece.uah.edu

Transcript and Presenter's Notes

Title: CPE 626: Advanced VLSI Design L02


1
CPE 626 Advanced VLSI Design L02
  • Department of Electrical and Computer
    Engineering University of Alabama in Huntsville

2
Outline
  • Simple Processor MU0
  • Datapath Design
  • Control Logic
  • ALU Design
  • Pipeline Processor DLX
  • ISA
  • Registers
  • Addressing Modes and Data Types
  • Instruction Format
  • Instruction Set
  • Non-pipeline Implementation
  • Pipeline Implementation

3
MU0 A Simple Processor
  • Instruction format
  • Instruction set

4
MU0 Logic Design
  • Follow an approach that separates the design into
    two components
  • Datapath: all the components carrying, storing,
    or processing bits, including the accumulator,
    program counter, ALU, and instruction register
  • Control logic: everything that does not fit
    comfortably into the datapath
  • Datapath design: there are many ways to do this
  • Assume that memory access is the limiting factor, and
    that each memory access takes exactly one
    clock cycle

5
MU0 Datapath Example
  • Program Counter - PC
  • Accumulator - ACC
  • Arithmetic-Logic Unit - ALU
  • Instruction Register - IR
  • Instruction Decode and Control Logic

Follow the principle that memory is the
limiting factor in the design: each instruction takes
exactly the number of clock cycles defined by the
number of memory accesses it must make.
Note: We do not have a dedicated PC incrementer!
Why?
6
MU0 Datapath Design
  • Assume that each instruction starts when it has
    arrived in the IR
  • Step 1: EX (execute)
  • LDA S: ACC <- Mem[S]
  • STO S: Mem[S] <- ACC
  • ADD S: ACC <- ACC + Mem[S]
  • SUB S: ACC <- ACC - Mem[S]
  • JMP S: PC <- S
  • JGE S: if (ACC >= 0) PC <- S
  • JNE S: if (ACC != 0) PC <- S
  • Step 2: IF (fetch the next instruction)
  • Either the PC or the address in the IR is issued to
    fetch the next instruction
  • The address is incremented in the ALU and the value is saved
    into the PC
  • Initialization
  • The reset input starts instruction execution from
    a known address; here it is 000 hex
  • Provide zero at the ALU output and then load it
    into the PC register
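
The execute and fetch steps above can be sketched as a small instruction-level model. This is a minimal sketch, not a cycle-accurate model of the datapath: it assumes 16-bit words with a 4-bit opcode in the top bits and a 12-bit address S, and it assumes STP is encoded as 0111. The function name `run_mu0` and the `max_steps` guard are hypothetical.

```python
MASK16 = 0xFFFF   # assumed 16-bit data word
MASK12 = 0x0FFF   # assumed 12-bit address field S


def run_mu0(mem, max_steps=1000):
    """Interpret MU0 instructions in `mem` (a mutable list of 16-bit words).

    Returns the final accumulator value. Stops at STP or after max_steps.
    """
    acc, pc = 0, 0x000                # reset: start from address 000 hex
    for _ in range(max_steps):
        ir = mem[pc]                  # IF: fetch the instruction into the IR
        pc = (pc + 1) & MASK12        # the address is incremented (by the ALU in MU0)
        op, s = ir >> 12, ir & MASK12
        if op == 0b0000:              # LDA S
            acc = mem[s]
        elif op == 0b0001:            # STO S
            mem[s] = acc
        elif op == 0b0010:            # ADD S
            acc = (acc + mem[s]) & MASK16
        elif op == 0b0011:            # SUB S
            acc = (acc - mem[s]) & MASK16
        elif op == 0b0100:            # JMP S
            pc = s
        elif op == 0b0101:            # JGE S: taken when the sign bit ACC15 is 0
            if not acc & 0x8000:
                pc = s
        elif op == 0b0110:            # JNE S: taken when ACC is non-zero
            if acc != 0:
                pc = s
        elif op == 0b0111:            # STP (encoding assumed)
            break
        else:
            raise ValueError(f"undefined opcode {op:04b}")
    return acc
```

For example, the program `[0x0004, 0x2005, 0x1006, 0x7000, 5, 3, 0]` loads the word at address 4, adds the word at address 5, stores the sum at address 6, and stops.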

7
MU0 RTL Organization
  • Control Logic
  • Asel
  • Bsel
  • ACCce (ACC change enable)
  • PCce (PC change enable)
  • IRce (IR change enable)
  • ACCoe (ACC output enable)
  • ALUfs (ALU function select)
  • MEMrq (memory request)
  • RnW (read/write)
  • Ex/ft (execute/fetch)

8
MU0 control logic
9
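LDA S (0000)
Ex/ft = 0: ALU function B
Ex/ft = 1: ALU function B+1
10
STO S (0001)
Ex/ft = 0: ALU function x (don't care)
Ex/ft = 1: ALU function B+1
11
ADD S (0010)
Ex/ft = 0: ALU function A+B
Ex/ft = 1: ALU function B+1
12
SUB S (0011)
Ex/ft = 0: ALU function A-B
Ex/ft = 1: ALU function B+1
13
JMP S (0100)
Ex/ft = 0: ALU function B+1
14
JGE S (0101)
Ex/ft = 0, ACC15 = 1: ALU function B+1
Ex/ft = 0, ACC15 = 0: ALU function B+1
15
JNE S (0110)
Ex/ft = 0, ACCz = 1: ALU function B+1
Ex/ft = 0, ACCz = 0: ALU function B+1
16
STP (0111)
Ex/ft = 0: ALU function x (don't care)
17
Reset
Ex/ft = 0: ALU function 0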
18
MU0 ALU Design
  • ALU functions: A+B, A-B, B, B+1, 0 (used only
    when reset is active) => 4 functions
  • Aen (enable operand A)
  • Binv (invert operand B)
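
One way to see how Aen and Binv cover the four functions is to model the ALU as an adder with gated inputs. This is a sketch under assumptions: the carry-in signal `cin` is not named on the slide and is assumed here, the word width is assumed to be 16 bits, and reset is modelled as a separate input that forces the output to zero.

```python
MASK16 = 0xFFFF  # assumed 16-bit datapath


def mu0_alu(a, b, aen, binv, cin, reset=False):
    """Adder-based ALU: (A if Aen else 0) + (~B if Binv else B) + cin."""
    if reset:
        return 0                                # the 0 function, used only during reset
    a_in = a if aen else 0                      # Aen gates operand A
    b_in = (~b & MASK16) if binv else b         # Binv inverts operand B
    return (a_in + b_in + cin) & MASK16


# Control settings (Aen, Binv, cin) for each function; cin is an assumed signal.
ALU_FS = {
    "A+B": (1, 0, 0),
    "A-B": (1, 1, 1),   # A + ~B + 1 is two's-complement subtraction
    "B":   (0, 0, 0),
    "B+1": (0, 0, 1),
}
```

With these settings, `mu0_alu(7, 5, *ALU_FS["A-B"])` evaluates 7 + 0xFFFA + 1 modulo 2^16, which is 2.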

19
Another Example: DLX Architecture
20
DLX Registers
  • GPRs with load-store architecture
  • GPR: 32 32-bit registers named R0, R1, ..., R31; R0 = 0
  • FPR (floating-point registers)
  • Single precision: 32 32-bit registers named F0, F1, ..., F31
    (accessed independently)
  • Double precision: 16 64-bit registers named F0, F2, ..., F30
    (accessed in pairs)
  • Instructions support transfers between
    GPRs and FPRs
  • Other status registers, e.g., the floating-point
    status register (holds information about the
    results of FP ops)

21
Addressing Modes and Data Types
  • Immediate with a 16-bit value field
  • Displacement with a 16-bit displacement
  • Register deferred: derived from displacement when disp = 0
  • Absolute: derived from displacement with R0 as the base register
  • Byte addressable, big-endian, with 32-bit
    addresses
  • All memory references are loads/stores through a GPR
    or FPR and must be aligned
  • Data types
  • 8-bit bytes and 16-bit half words (loaded into
    registers with either zeros or the sign bit
    replicated to fill 32 bits)
  • 32-bit integers
  • 32-bit single-precision and 64-bit
    double-precision for FP

22
Instruction Formats
  • I-type: load, store, arithmetic, logic,
    relational, shift, branch
  • R-type: arithmetic, logic, relational
  • J-type: jump, jump and link, trap, return from
    exception

I-type instruction
Encodes: loads and stores of bytes, words, and half
words; all immediates (rd <- rs1 op
immediate); conditional branch instructions (rs1
is the register, rd is unused); jump register and jump
and link register (rd = 0, rs1 = destination, immediate = 0)
R-type instruction
Encodes: register-register ALU operations (rd <- rs1 func rs2),
where func is add, sub, ...; reads/writes of special registers
and moves
J-type instruction
Fields: Opcode (6 bits), Offset added to PC (26 bits)
Encodes: jump and jump and link; trap and return from
exception
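
The field positions used by the three formats can be sketched as a bit-slicing helper. This is a minimal sketch: it uses the bit numbering of the later slides (bit 0 is the most significant bit, so the opcode is IR0..5), the function and key names are hypothetical, and the opcode value in the usage note is arbitrary rather than a real DLX encoding.

```python
def decode_fields(ir):
    """Split a 32-bit instruction word into raw (unsigned) bit fields."""

    def bits(first, last):
        # Inclusive range in DLX numbering, where bit 0 is the MSB.
        width = last - first + 1
        return (ir >> (31 - last)) & ((1 << width) - 1)

    return {
        "opcode": bits(0, 5),       # 6 most significant bits, all formats
        "rs1": bits(6, 10),         # I-type and R-type source register
        "r2": bits(11, 15),         # rd for I-type, rs2 for R-type
        "r3": bits(16, 20),         # rd for R-type
        "imm16": bits(16, 31),      # I-type immediate, not yet sign-extended
        "offset26": bits(6, 31),    # J-type offset, not yet sign-extended
    }
```

For instance, the word `(35 << 26) | (2 << 21) | (1 << 16) | 30` decodes to opcode 35 (an arbitrary value), rs1 = 2, r2 = 1, and imm16 = 30. Which of the fields is meaningful depends on the instruction format.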
23
Instructions for Data Transfers
Instruction opcode | Instruction meaning
LB, LBU, SB | Load byte, load byte unsigned, store byte
LH, LHU, SH | Load half word, load half word unsigned, store half word
LW, SW | Load word, store word (to/from integer registers)
LF, LD, SF, SD | Load SP float, load DP float, store SP float, store DP float (SP: single precision, DP: double precision)
MOVI2S, MOVS2I | Move from/to GPR to/from a special register
MOVF, MOVD | Copy one floating-point register or a DP pair to another register or pair
MOVFP2I, MOVI2FP | Move 32 bits from/to FP register to/from integer registers
Example instruction | Meaning
LW R1, 30(R2) | Regs[R1] <-32 Mem[30 + Regs[R2]]
LW R1, 1000(R0) | Regs[R1] <-32 Mem[1000 + 0]
LB R1, 40(R3) | Regs[R1] <-32 (Mem[40 + Regs[R3]]_0)^24 ## Mem[40 + Regs[R3]]
LBU R1, 40(R3) | Regs[R1] <-32 0^24 ## Mem[40 + Regs[R3]]
LH R1, 40(R3) | Regs[R1] <-32 (Mem[40 + Regs[R3]]_0)^16 ## Mem[40 + Regs[R3]] ## Mem[41 + Regs[R3]]
LF F0, 50(R3) | Regs[F0] <-32 Mem[50 + Regs[R3]]
LD F0, 50(R2) | Regs[F0] ## Regs[F1] <-64 Mem[50 + Regs[R2]]
Notation: <-n transfers n bits; x_0 is the most significant bit of x; x^n replicates x n times; ## concatenates.
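
The difference between the signed and unsigned loads can be sketched in a few lines. This is an illustrative sketch, not the DLX hardware: memory is modelled as a Python `bytes` object, the function names are hypothetical, and register values are kept as unsigned 32-bit integers.

```python
def sign_extend(value, bits):
    """Sign-extend a `bits`-wide value to an unsigned 32-bit pattern."""
    sign = 1 << (bits - 1)
    return ((value ^ sign) - sign) & 0xFFFFFFFF


def load(mem, addr, size, signed=True):
    """Model LB/LH/LW (signed=True) and LBU/LHU (signed=False).

    mem is bytes-like and big-endian; size is 1, 2, or 4 bytes.
    """
    if addr % size:
        raise ValueError("memory references must be aligned")
    raw = int.from_bytes(mem[addr:addr + size], "big")
    return sign_extend(raw, 8 * size) if signed else raw
```

With `mem = bytes([0x80, 0x01, 0x00, 0x2A])`, a signed byte load from address 0 gives 0xFFFFFF80, while the unsigned form gives 0x00000080.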
24
Arithmetic/logical instructions
  • All ALU instructions are register-register
  • add, sub, and, or, xor, shift
  • Immediate forms are also available
  • LHI loads an immediate value into the most significant
    16 bits
  • R0 is used to synthesise other operations
  • Loading a constant is an add immediate with R0
    as one source
  • A register-register move is an add with R0 as one
    source
  • Compare operations put 1 ("true") in the destination
    if the condition is met, and 0 otherwise

25
Arithmetic/logical instructions (contd)
Instruction opcode | Instruction meaning
ADD, ADDI, ADDU, ADDUI | Add, add immediate (all immediates are 16 bits); signed and unsigned
SUB, SUBI, SUBU, SUBUI | Subtract, subtract immediate; signed and unsigned
MULT, MULTU, DIV, DIVU | Multiply and divide, signed and unsigned; operands must be in floating-point registers; all operations take and yield 32-bit values
AND, ANDI | And, and immediate
OR, ORI, XOR, XORI | Or, or immediate, exclusive or, exclusive or immediate
LHI | Load high immediate: loads upper half of register with immediate
SLL, SRL, SRA, SLLI, SRLI, SRAI | Shifts, both immediate (S__I) and variable form (S__); shifts are shift left logical, right logical, right arithmetic
S__, S__I | Set conditional: "__" may be LT, GT, LE, GE, EQ, NE
Example instruction | Meaning
ADD R1, R2, R3 | Regs[R1] <- Regs[R2] + Regs[R3]
ADDI R1, R2, 3 | Regs[R1] <- Regs[R2] + 3
LHI R1, 42 | Regs[R1] <- 42 ## 0^16 (42 concatenated with 16 zero bits)
SLLI R1, R2, 5 | Regs[R1] <- Regs[R2] << 5
SLT R1, R2, R3 | if (Regs[R2] < Regs[R3]) Regs[R1] <- 1 else Regs[R1] <- 0
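
The example instructions above can be sketched as functions on 32-bit values. This is a sketch for illustration only: the function names are hypothetical, registers are modelled as unsigned 32-bit Python integers, and SLT is assumed to compare its operands as signed numbers.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def add(a, b):
    return (a + b) & MASK32          # ADD / ADDI, wrapping to 32 bits


def lhi(imm16):
    return (imm16 & 0xFFFF) << 16    # LHI: immediate ## 0^16


def slli(a, n):
    return (a << n) & MASK32         # SLLI: shift left logical


def slt(a, b):
    return 1 if to_signed(a) < to_signed(b) else 0   # SLT: 1 if true, else 0
```

The R0 idioms from the slide follow directly: loading the constant 5 is `add(0, 5)`, and a register-to-register move of a value `x` is `add(x, 0)`.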
26
Control-flow instructions
  • Jump can use 26-bit signed offset from PC or
    contents of register
  • Jump-and-link saves PC in R31
  • Conditional branches test source for
    zero/non-zero and use 16-bit signed offset

Instruction opcode | Instruction meaning
BEQZ, BNEZ | Branch if GPR equal/not equal to zero; 16-bit offset from PC
BFPT, BFPF | Test comparison bit in the FP status register and branch; 16-bit offset from PC
J, JR | Jumps: 26-bit offset from PC (J) or target in register (JR)
TRAP | Transfer to operating system at a vectored address
RFE | Return to user code from an exception; restore user mode
27
Floating-point instructions in DLX
  • Moves between single-precision floating-point (32-bit) and
    double-precision (64-bit) registers
  • Operations: add, subtract, multiply, divide
  • Also, integer multiply/divide on floating-point
    regs

Instruction opcode | Instruction meaning
ADDD, ADDF | Add DP, SP numbers
SUBD, SUBF | Subtract DP, SP numbers
MULTD, MULTF | Multiply DP, SP floating point
DIVD, DIVF | Divide DP, SP floating point
CVTF2D, CVTF2I, CVTD2F, CVTD2I, CVTI2F, CVTI2D | Convert instructions: CVTx2y converts from type x to type y, where x and y are one of I (integer), D (double precision), or F (single precision). Both operands are in the FP registers.
__D, __F | DP and SP compares: "__" may be LT, GT, LE, GE, EQ, NE; sets comparison bit in FP status register.
28
A Simple Implementation of DLX
29
Instruction Execution
  • Process of instruction execution is usually
    broken up into stages (divide and conquer)
  • smaller stages are easier to design
  • easy to optimize (change) one stage without
    touching the others
  • 5 main stages for DLX; each stage takes one
    clock cycle
  • Instruction Fetch (IF)
  • Instruction Decode / Register fetch cycle (ID)
  • Execution / Effective address cycle (EX)
  • Memory access / Branch completion cycle (MEM)
  • Write-back cycle (WB)

30
Instruction Fetch (IF)
  • Send out PC and fetch the instruction from the
    memory into instruction register (IR)
  • IR is used to hold the instruction
  • Increment the PC by 4 to address the next
    sequential instruction
  • NPC is used to hold the next sequential address

IR <- Mem[PC]; NPC <- PC + 4
31
Instruction Decode (ID)
  • Decode the instruction to determine the instruction
    type (opcode field: the 6 most significant bits of the
    instruction)
  • Read in data from all necessary registers
  • temporary registers A and B hold the outputs of the GPRs
  • Imm holds the sign-extended lower 16 bits
    of the IR
  • decoding is done in parallel with reading
    registers, since these fields are at fixed
    locations
  • a register may be read even if we do not use it

A <- Regs[IR6..10]; B <- Regs[IR11..15];
Imm <- (IR16)^16 ## IR16..31 (bit IR16 replicated 16 times, concatenated with IR16..31)
32
Execution EX (1/2)
  • Register-register ALU instruction
  • ALU performs the operation specified by the
    opcode on the values in registers A and B; the
    result is placed in the temporary register
    ALUOutput
  • Register-immediate ALU instruction
  • ALU performs the operation specified by the
    opcode on the value in register A and on the
    value in register Imm; the result is placed in
    the temporary register ALUOutput

Register-register: ALUOutput <- A op B
Register-immediate: ALUOutput <- A op Imm
33
Execution EX (2/2)
  • Memory reference
  • ALU adds the operands to form the effective address
    and places the result into the temporary
    register ALUOutput
  • Branch
  • ALU adds the NPC to the Imm to compute the
    address of the branch target
  • Register A is checked to determine whether the
    branch is taken (for BEQZ, op is ==; for BNEZ, op
    is !=)
  • Cond is a 1-bit register (1 - branch is taken, 0 -
    not taken)

Memory reference: ALUOutput <- A + Imm
Branch: ALUOutput <- NPC + Imm; Cond <- (A op 0)
34
Memory access (MEM)
  • Memory reference
  • load
  • store
  • Branch
  • if the instruction branches, the PC is replaced
    with the branch destination; otherwise, it is
    replaced with NPC

Load: LMD <- Mem[ALUOutput]
Store: Mem[ALUOutput] <- B
Branch: if (Cond) PC <- ALUOutput else PC <- NPC
35
Write-back (WB)
  • Register-register ALU
  • Register-immediate ALU
  • Load instruction

Register-register ALU: Regs[IR16..20] <- ALUOutput
Register-immediate ALU: Regs[IR11..15] <- ALUOutput
Load: Regs[IR11..15] <- LMD
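
The five steps IF, ID, EX, MEM, and WB can be strung together in a small non-pipelined model. This is a sketch under stated assumptions, not the course's implementation: the instruction arrives already decoded as a dictionary (a hypothetical format, chosen so that no opcode encodings have to be invented), memory is a dictionary keyed by address, and `imm` is already sign-extended.

```python
import operator

MASK32 = 0xFFFFFFFF


def execute(instr, regs, mem, pc):
    """Run one pre-decoded instruction through the five steps; return the new PC.

    instr keys (hypothetical): kind ("alu_rr", "alu_ri", "load", "store",
    "branch"), rs1 (IR6..10), r2 (IR11..15), r3 (IR16..20), imm, and op
    (a two-argument function used by the ALU or the branch test).
    """
    # IF: fetching is skipped because instr is pre-decoded; NPC <- PC + 4.
    npc = pc + 4
    # ID: both registers are read whether or not they are used.
    a = regs[instr["rs1"]]
    b = regs[instr.get("r2", 0)]
    imm = instr.get("imm", 0)
    # EX
    kind, cond = instr["kind"], False
    if kind == "alu_rr":
        alu_out = instr["op"](a, b) & MASK32
    elif kind == "alu_ri":
        alu_out = instr["op"](a, imm) & MASK32
    elif kind in ("load", "store"):
        alu_out = (a + imm) & MASK32              # effective address
    elif kind == "branch":
        alu_out = (npc + imm) & MASK32            # branch target
        cond = bool(instr["op"](a, 0))            # == for BEQZ, != for BNEZ
    else:
        raise ValueError(f"unknown instruction kind: {kind}")
    # MEM
    lmd = None
    if kind == "load":
        lmd = mem[alu_out]
    elif kind == "store":
        mem[alu_out] = b
    new_pc = alu_out if cond else npc
    # WB
    if kind == "alu_rr":
        regs[instr["r3"]] = alu_out
    elif kind == "alu_ri":
        regs[instr["r2"]] = alu_out
    elif kind == "load":
        regs[instr["r2"]] = lmd
    regs[0] = 0                                   # R0 always holds zero
    return new_pc
```

A load such as LW R1, 30(R2) is then `execute({"kind": "load", "rs1": 2, "r2": 1, "imm": 30}, regs, mem, pc)`, and BEQZ passes `operator.eq` as `op`.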
36
Datapath
[Figure: non-pipelined DLX datapath. Stages: Instruction Fetch; Instr. Decode / Reg. Fetch; Execute / Addr. Calc; Memory Access; Write Back. Blocks: PC, Add (+4), Instruction Memory, IR, Reg. File (RS1, RS2, RD), Sign Extend, Zero?, ALU, Data Memory, MUXes. Temporary registers: NPC, A, B, Imm, ALUoutput, LMD. Signals: Next PC, Next SEQ PC, WB Data.]
37
Sequential Execution
[Figure: timing diagram of instructions Ii, Ii+1, Ii+2 against time in clocks]
Sequential execution of these 3 instructions
(Ii, Ii+1, Ii+2) takes 15 clock cycles
38
Pipelined Execution
  • Analogy with an automobile assembly line
  • many steps, each contributing something to the
    construction of the car
  • each step operates in parallel with other steps,
    though on a different car

[Figure: timing diagram of instructions Ii to Ii+4 against time in clocks; each instruction passes through the pipe stages (segments)]
Pipelined execution of instructions Ii, Ii+1,
and Ii+2 takes 7 clock cycles
39
Pipelining Lessons
  • Pipelining does not help the latency of a single
    instruction; it helps the throughput of the entire
    workload
  • Multiple instructions operate simultaneously
    using different resources
  • Potential speedup = number of pipe stages
  • Time to fill the pipeline and time to drain it
    reduce speedup: about 2.14X (15 vs. 7 clock cycles) vs. 5X in this example

  • Latency and throughput
  • Latency: how long it takes to execute an
    instruction
  • Throughput: how often an instruction exits the
    pipeline
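
The fill and drain effect can be checked with a short calculation. This sketch assumes an ideal pipeline in which every stage takes one clock cycle and there are no stalls; the function names are hypothetical.

```python
def sequential_cycles(n_instr, n_stages):
    """Each instruction runs all stages before the next one starts."""
    return n_instr * n_stages


def pipelined_cycles(n_instr, n_stages):
    """The first instruction fills the pipe; each later one adds one cycle."""
    return n_stages + (n_instr - 1)


def speedup(n_instr, n_stages):
    return sequential_cycles(n_instr, n_stages) / pipelined_cycles(n_instr, n_stages)
```

For 3 instructions and 5 stages this gives 15 and 7 cycles, a speedup of 15/7 (roughly 2.14). As the number of instructions grows, the speedup approaches the number of stages.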

40
Pipelining Lessons (contd)
  • Pipeline stages are hooked together => all stages
    must be ready to proceed at the same time
  • Machine cycle: the time required to move
    an instruction one step down the pipeline
    (usually one clock cycle)
  • The length of a machine cycle is determined by
    the time required for the slowest stage
  • Unbalanced lengths of pipe stages also reduce
    speedup
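
The effect of unbalanced stages can be sketched numerically. This is a simplified steady-state model with assumptions stated up front: fill and drain time and pipeline-register overhead are ignored, the unpipelined machine is assumed to spend the sum of the stage times on each instruction, and the stage times in the usage note are made-up values.

```python
def steady_state_speedup(stage_times):
    """Speedup of a pipeline whose machine cycle equals its slowest stage."""
    time_unpipelined = sum(stage_times)   # one instruction, all stages in turn
    machine_cycle = max(stage_times)      # the slowest stage sets the cycle
    return time_unpipelined / machine_cycle
```

With balanced stages `[10, 10, 10, 10, 10]` the speedup is 5.0, the number of stages. With unbalanced stages `[2, 1, 2, 2, 1]` it drops to 4.0, because every stage must wait for the slowest one.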

41
Visualizing Pipeline
[Figure: pipeline diagram; instructions in program order (vertical axis) against time in clock cycles CC 1 to CC 7 (horizontal axis)]
42
Pipeline Datapath
[Figure: pipelined DLX datapath. Stages: Instruction Fetch; Instr. Decode / Reg. Fetch; Execute / Addr. Calc; Memory Access; Write Back. Blocks: PC, Add (+4), Instruction Memory, IR, Reg. File, Sign Extend, Zero?, ALU, Data Memory, MUXes. Register file read addresses: IR6..10 and IR11..15; write address: MEM/WB.IR11..15 or MEM/WB.IR16..20. Signals: Next PC, Next SEQ PC, Imm, WB Data.]
43
Instruction Flow through Pipeline Regs
Time (clock cycles) | IF/ID | ID/EX | EX/MEM | MEM/WB
CC 1 | Add R1,R2,R3 | Nop | Nop | Nop
CC 2 | Lw R4,0(R2) | Add R1,R2,R3 | Nop | Nop
CC 3 | Sub R6,R5,R7 | Lw R4,0(R2) | Add R1,R2,R3 | Nop
CC 4 | Xor R9,R8,R1 | Sub R6,R5,R7 | Lw R4,0(R2) | Add R1,R2,R3
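
The way instructions advance one pipeline register per clock cycle can be sketched as follows. This is an illustration that assumes no stalls and that empty registers hold Nop; the function name and the list-of-rows output format are hypothetical.

```python
def pipeline_occupancy(program, n_slots, n_cycles):
    """Return one row per clock cycle listing the instruction in each slot.

    Slot 0 is the first pipeline register; an instruction issued in cycle c
    (counting from 1) sits in slot s during cycle c + s.
    """
    table = []
    for cc in range(1, n_cycles + 1):
        row = []
        for slot in range(n_slots):
            index = cc - 1 - slot          # which instruction has reached this slot
            in_range = 0 <= index < len(program)
            row.append(program[index] if in_range else "Nop")
        table.append(row)
    return table
```

For the program Add, Lw, Sub, Xor with four slots, the row for clock cycle 4 is Xor, Sub, Lw, Add: the oldest instruction has travelled furthest down the pipeline.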
44
DLX Pipeline Definition IF, ID
  • Stage IF
  • IF/ID.IR <- Mem[PC]
  • if EX/MEM.cond then {IF/ID.NPC, PC <- EX/MEM.ALUOUT}
    else {IF/ID.NPC, PC <- PC + 4}
  • Stage ID
  • ID/EX.A <- Regs[IF/ID.IR6..10]; ID/EX.B <-
    Regs[IF/ID.IR11..15]
  • ID/EX.Imm <- (IF/ID.IR16)^16 ## IF/ID.IR16..31
  • ID/EX.NPC <- IF/ID.NPC; ID/EX.IR <- IF/ID.IR

45
DLX Pipeline Definition EX
  • ALU
  • EX/MEM.IR <- ID/EX.IR
  • EX/MEM.ALUOUT <- ID/EX.A func ID/EX.B
    or EX/MEM.ALUOUT <- ID/EX.A func ID/EX.Imm
  • EX/MEM.cond <- 0
  • load/store
  • EX/MEM.IR <- ID/EX.IR; EX/MEM.B <- ID/EX.B
  • EX/MEM.ALUOUT <- ID/EX.A + ID/EX.Imm
  • EX/MEM.cond <- 0
  • branch
  • EX/MEM.ALUOUT <- ID/EX.NPC + ID/EX.Imm
  • EX/MEM.cond <- (ID/EX.A func 0)

46
DLX Pipeline Definition MEM, WB
  • Stage MEM
  • ALU
  • MEM/WB.IR <- EX/MEM.IR
  • MEM/WB.ALUOUT <- EX/MEM.ALUOUT
  • load/store
  • MEM/WB.IR <- EX/MEM.IR
  • MEM/WB.LMD <- Mem[EX/MEM.ALUOUT]
    or Mem[EX/MEM.ALUOUT] <- EX/MEM.B
  • Stage WB
  • ALU
  • Regs[MEM/WB.IR16..20] <- MEM/WB.ALUOUT
    or Regs[MEM/WB.IR11..15] <- MEM/WB.ALUOUT
  • load
  • Regs[MEM/WB.IR11..15] <- MEM/WB.LMD