Title: CSC 159 COMPUTER ORGANIZATION Diploma in Computer Science CS110
- Chapter 5
- Modern Computer System
2- Pipeline
- Superscalar
- Cache: associative, direct mapped
- Virtual Memory
- Multiprocessing
- RISC
3Subject
- You see a gorgeous girl at a party. You go up to her and say, "I am very rich. Marry me!" That's Direct Marketing.
- You're at a party with a bunch of friends and see a gorgeous girl. One of your friends goes up to her and, pointing at you, says, "He's very rich. Marry him." That's Advertising.
- You see a gorgeous girl at a party. You go up to her and get her telephone number. The next day you call and say, "Hi, I'm very rich. Marry me." That's Telemarketing.
- You're at a party and see a gorgeous girl. You get up, straighten your tie, walk up to her and pour her a drink. You open the door for her, pick up her bag after she drops it, offer her a ride, and then say, "By the way, I'm very rich. Will you marry me?" That's Public Relations.
- You're at a party and see a gorgeous girl. She walks up to you and says, "You are very rich, I want to marry you." That's Brand Recognition.
- You see a gorgeous girl at a party. You go up to her and say, "I'm rich. Marry me." She gives you a nice hard slap on your face. That's Customer Feedback.
4- Instruction execution is extremely complex and involves several operations which are executed successively (see slide 2). This implies a large amount of hardware, but only one part of this hardware works at a given moment.
- Pipelining is an implementation technique whereby multiple instructions are overlapped in execution. This is achieved without additional hardware, but only by letting different parts of the hardware work for different instructions at the same time.
- The pipeline organization of a CPU is similar to an assembly line: the work to be done in an instruction is broken into smaller steps (pieces), each of which takes a fraction of the time needed to complete the entire instruction. Each of these steps is a pipe stage (or a pipe segment).
- Pipe stages are connected to form a pipe.
- The time required for moving an instruction from one stage to the next is a machine cycle (often this is one clock cycle). The execution of one instruction takes several machine cycles as it passes through the pipeline.
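- As an illustration (an addition to the slides, not part of them), the short Python sketch below prints which instruction occupies which stage in each machine cycle of an ideal pipeline; the five stage names and the four instruction labels are assumptions chosen only for the example.

# Minimal sketch: occupancy of an ideal 5-stage pipeline, one new instruction
# entering per machine cycle, no hazards and no stalls.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]    # assumed stage names
instructions = ["i1", "i2", "i3", "i4"]     # assumed instruction labels

total_cycles = len(instructions) + len(STAGES) - 1
for cycle in range(total_cycles):
    row = []
    for idx, instr in enumerate(instructions):
        stage_nr = cycle - idx              # instruction idx enters the pipe at cycle idx
        if 0 <= stage_nr < len(STAGES):
            row.append(f"{instr}:{STAGES[stage_nr]}")
    print(f"cycle {cycle + 1}: " + "  ".join(row))

- Once the pipe is full, one instruction completes every machine cycle, even though each single instruction still needs five cycles of its own.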
5Pipelining
- Pipelining - the laundry analogy. Suppose you had to do 4 loads of laundry, each load goes through 4 stages, and each stage takes 30 minutes.
- Which is faster, doing it sequentially or pipelined?
- ExeTime_seq = 4 × 4 × 30 = 480 minutes
- ExeTime_pipe = 4 × 30 + 90 = 210 minutes (120 minutes to push the first load through all four stages, then one load finishes every 30 minutes)
- How much faster?
- Pipelined is about 2.3 times (480/210) faster than sequential.
6Pipelining - so what's the best to expect?
- Suppose you had 1000 loads to do!
- ExeTime_seq = 1000 × 4 × 30 = 120,000 minutes
- ExeTime_pipe = 1000 × 30 + 90 = 30,090 minutes
- How much faster is this case?
- Perf_ratio = 120,000 min / 30,090 min ≈ 3.99
- Here pipelined is about 3.99 times faster than sequential
- So, as the number of loads increases, this ratio approaches the number of stages in the pipeline (see the sketch below)
- So the more stages, the more concurrency, hence better throughput
- BUT no change in execution time per load!
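- The sketch below (an addition, not from the slides) evaluates the general relation behind these numbers, ExeTime_seq = n × k × t and ExeTime_pipe = (k + n - 1) × t for n loads, k stages and stage time t, and shows the speedup approaching the number of stages k as n grows.

# Ideal speedup of a k-stage pipeline over sequential execution for n tasks,
# each stage taking t time units (no stalls).
def speedup(n, k, t=30):
    seq = n * k * t            # every task occupies all k stages alone
    pipe = (k + n - 1) * t     # fill the pipe once, then one task per stage time
    return seq / pipe

for n in (4, 1000, 100_000):
    print(f"{n:>7} loads: speedup = {speedup(n, k=4):.2f}")
# prints roughly 2.29, 3.99 and 4.00 -> the limit is the number of stages (4)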
7Pipelining - why such improvement?
- Multiple tasks happen simultaneously
- Each resource is kept busy (usually)
- Except when filling and draining pipe
- Not much idle time
- Pipelining - what permits this parallelism?
- Each stage is independent of the others
- Some method to transition from one stage to the next
- Need to empty a stage before reloading
- May need a basket to carry to the next stage
- Time for each stage is about the same
8Acceleration by Pipelining
- Apparently a greater number of stages always provides better performance. However:
- a greater number of stages increases the overhead in moving information between stages and in synchronization between stages.
- with the number of stages, the complexity of the CPU grows.
- it is difficult to keep a large pipeline at the maximum rate because of pipeline hazards.
- 80486 and Pentium: five-stage pipeline for integer instr., eight-stage pipeline for FP instr.
- PowerPC: four-stage pipeline for integer instr., six-stage pipeline for FP instr.
9Pipeline hazards
- Pipeline hazards are situations that prevent the
next instruction in the instruction stream from
executing during its designated clock cycle. The
instruction is said to be stalled. When an
instruction is stalled, all instructions later in
the pipeline than the stalled instruction are
also stalled. Instructions earlier than the
stalled one can continue. No new instructions are fetched during the stall.
- Types of hazards
- 1. Structural hazards
- 2. Data hazards
- 3. Control hazards
10Hazards - 3 types of pipelining hazards
- Structural hazards: attempt to use the same resource in two different ways at the same time
- E.g., a combined washer/dryer would be a structural hazard, or the folder is busy doing something else (watching TV)
- Data hazards: attempt to use an item before it is ready
- E.g., one sock of a pair is in the dryer and one is in the washer; you can't fold until you get the sock from the washer through the dryer
- An instruction depends on the result of a prior instruction still in the pipeline
- Control hazards: attempt to make a decision before the condition is evaluated
- E.g., washing football uniforms and you need to get the proper detergent level; you need to see the result after the dryer before putting the next load in
- Branch instructions
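- To make the data-hazard case concrete, here is a small sketch (an addition to the slides; the three-instruction program is made up) that counts the stall cycles a simple in-order pipeline would insert when an instruction reads a register written by an earlier instruction, assuming the result becomes usable only a few cycles after issue and there is no forwarding.

# Each instruction is (destination register, list of source registers).
program = [
    ("R1", ["R2", "R3"]),   # i1: R1 <- R2 + R3
    ("R4", ["R1", "R5"]),   # i2: R4 <- R1 + R5  (RAW dependency on i1)
    ("R6", ["R7", "R8"]),   # i3: independent
]

RESULT_LATENCY = 3          # assumed: a result is usable 3 cycles after issue

ready_cycle = {}            # cycle at which each register value becomes available
cycle = 0
for dest, sources in program:
    # stall until every source produced by an earlier instruction is ready
    earliest = max([ready_cycle.get(s, 0) for s in sources] + [cycle])
    stalls = earliest - cycle
    cycle = earliest
    print(f"issue {dest} <- {sources} at cycle {cycle} ({stalls} stall cycle(s))")
    ready_cycle[dest] = cycle + RESULT_LATENCY
    cycle += 1

- Forwarding (bypassing) hardware would remove most of these stalls; control hazards are handled similarly, by stalling or by predicting the outcome of the branch.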
11Pipelining Advances - super-pipelining, superscalar
- Super-pipelining: longer pipes
- Increases in pipe stages (up to 10 or more)
- The more stages, the more throughput
- Superscalar: multiple pipes
- More than one instruction started each cycle (referred to as multiple issue)
- Requires more hardware
- More complex dependency detection
- Sometimes different types of pipes: ALU, FPU, branch, etc.
12Superscalar
- A superscalar architecture is one in which
several instructions can be initiated
simultaneously and executed independently. - Pipelining allows several instructions to be
executed at the same time, but they have to be in
different pipeline stages at a given moment. - Superscalar architectures include all features of
pipelining but, in addition, there can be several
instructions executing simultaneously in the same
pipeline stage. - They have the ability to initiate multiple
instructions during the same clock cycle. - There are two typical approaches today, in order
to - improve performance
- 1. Superpipelining
- 2. Superscalar
13Superscalar (contd)
- Superscalar architectures allow several
instructions to be issued and completed per clock
cycle. - A superscalar architecture consists of a number
of pipelines that are working in parallel. - Depending on the number and kind of parallel
units available, a certain number of instructions
can be executed in parallel. - In the following example a floating point and two
integer operations can be issued and executed
simultaneously each unit is pipelined and can
execute several operations in different pipeline
stages.
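- A minimal sketch of the issue policy just described (this code and its instruction mix are illustrative assumptions, not the lecture's example): each cycle, up to two integer operations and one floating-point operation are taken, in program order, by their own pipelined units, so up to three instructions start in the same clock cycle.

from collections import deque

# Hypothetical instruction stream, tagged only with the unit type it needs.
stream = deque(["int", "fp", "int", "int", "fp", "int", "fp", "int"])

ISSUE_SLOTS = {"int": 2, "fp": 1}   # two integer pipelines, one FP pipeline

cycle = 0
while stream:
    cycle += 1
    free = dict(ISSUE_SLOTS)
    issued = []
    # issue in program order until a needed slot type runs out (no reordering,
    # no dependency checking in this toy model)
    while stream and free[stream[0]] > 0:
        kind = stream.popleft()
        free[kind] -= 1
        issued.append(kind)
    print(f"cycle {cycle}: issued {issued}")

- A real superscalar must also check data and control dependencies before issuing; the next slides discuss exactly those limitations.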
14Limitations on Parallel Execution
- The situations which prevent instructions from being executed in parallel by a superscalar architecture are very similar to those which prevent an efficient execution on any pipelined architecture (see pipeline hazards - lectures 3, 4).
- The consequences of these situations on superscalar architectures are more severe than those on simple pipelines, because the potential of parallelism in superscalars is greater and, thus, a greater opportunity is lost.
- Limitations on Parallel Execution (contd)
- Three categories of limitations have to be considered:
- 1. Resource conflicts
- - They occur if two or more instructions compete for the same resource (register, memory, functional unit) at the same time; they are similar to the structural hazards discussed with pipelines. By introducing several parallel pipelined units, superscalar architectures try to reduce a part of the possible resource conflicts.
- 2. Control (procedural) dependency
15- 2. Control (procedural) dependency
- - The presence of branches creates major problems in assuring an optimal parallelism. How to reduce branch penalties has been discussed in lectures 7-8.
- - If instructions are of variable length, they cannot be fetched and issued in parallel; an instruction has to be decoded in order to identify the following one and to fetch it. Therefore superscalar techniques are efficiently applicable to RISCs, with fixed instruction length and format.
- 3. Data conflicts
- - Data conflicts are produced by data dependencies between instructions in the program. Because superscalar architectures provide a great liberty in the order in which instructions can be issued and completed, data dependencies have to be considered with much attention.
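- As a small complement (an addition, not from the slides), the sketch below classifies the data dependencies between two instructions into the usual read-after-write, write-after-read and write-after-write cases; the encoding of an instruction as a (destination, sources) pair is an assumption made for the example.

# Classify the data dependencies that limit how freely a superscalar processor
# may overlap or reorder two instructions i and j, where i comes first.
def dependencies(i, j):
    dest_i, src_i = i
    dest_j, src_j = j
    deps = []
    if dest_i in src_j:
        deps.append("RAW (true dependency): j reads what i writes")
    if dest_j in src_i:
        deps.append("WAR (anti-dependency): j overwrites what i still reads")
    if dest_j == dest_i:
        deps.append("WAW (output dependency): both write the same register")
    return deps

i = ("R1", ["R2", "R3"])    # i: R1 <- R2 op R3
j = ("R1", ["R1", "R4"])    # j: R1 <- R1 op R4
for d in dependencies(i, j):
    print(d)
# Only the RAW case reflects a real data flow; WAR and WAW conflicts can
# usually be removed by register renaming.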
21REDUCED INSTRUCTION SET COMPUTERS (RISC)
- What are RISCs and why do we need them?
- RISC architectures represent an important
innovation in the area of computer organization. - The RISC architecture is an attempt to produce
more CPU power by simplifying the instruction set
of the CPU. - The opposed trend to RISC is that of complex
instruction set computers (CISC). - Both RISC and CISC architectures have been
developed as an attempt to cover the semantic
gap.
22The Semantic Gap
- In order to improve the efficiency of software development, new and powerful programming languages have been developed (Ada, C, Java). They provide a high level of abstraction, conciseness, and power.
- Through this evolution, the semantic gap (the gap between the operations provided by high-level languages and those provided by the processor) grows.
- Problem: how should new HLL programs be compiled and executed efficiently on a processor architecture?
- Two possible answers:
- 1. The CISC approach: design very complex architectures, including a large number of instructions and addressing modes; include also instructions close to those present in HLL.
- 2. The RISC approach: simplify the instruction set and adapt it to the real requirements of user programs.
23Main Characteristics of RISC Architectures
- The instruction set is limited and includes only simple instructions.
- - The goal is to create an instruction set containing instructions that execute quickly; most of the RISC instructions are executed in a single machine cycle (after being fetched and decoded).
- Pipeline operation (without memory reference)
- - RISC instructions, being simple, are hard-wired, while CISC architectures have to use microprogramming in order to implement complex instructions.
- - Having only simple instructions results in reduced complexity of the control unit and the data path; as a consequence, the processor can work at a high clock frequency.
- - The pipelines are used efficiently if instructions are simple and of similar execution time.
- - Complex operations on RISCs are executed as a sequence of simple RISC instructions. In the case of CISCs they are executed as a single or a few complex instructions.
24Main Characteristics of RISC Architectures
- Assume:
- - we have a program with 80% of the executed instructions being simple and 20% complex
- - on a CISC machine, simple instructions take 4 cycles and complex instructions take 8 cycles; cycle time is 100 ns (10^-7 s)
- - on a RISC machine, simple instructions are executed in one cycle; complex operations are implemented as a sequence of simple instructions; we consider on average 14 instructions (14 cycles) for a complex operation; cycle time is 75 ns (0.75 × 10^-7 s).
- How much time does a program of 1,000,000 instructions take?
- CISC: (10^6 × 0.80 × 4 + 10^6 × 0.20 × 8) × 10^-7 = 0.48 s
- RISC: (10^6 × 0.80 × 1 + 10^6 × 0.20 × 14) × 0.75 × 10^-7 = 0.27 s
- Complex operations take more time on the RISC, but their number is small.
- Because of its simplicity, the RISC works at a smaller cycle time; with the CISC, simple instructions are slowed down because of the increased data path length and the increased control complexity.
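- The same calculation written out as a short script (an illustrative addition; the figures are exactly the assumptions stated above):

# Execution time of a 1,000,000-instruction program under the assumptions above.
N = 1_000_000
SIMPLE, COMPLEX = 0.80, 0.20

# CISC: 4 cycles for a simple, 8 for a complex instruction, 100 ns cycle time
cisc_time = (N * SIMPLE * 4 + N * COMPLEX * 8) * 100e-9
# RISC: 1 cycle for a simple instruction, 14 simple instructions (14 cycles)
# per complex operation, 75 ns cycle time
risc_time = (N * SIMPLE * 1 + N * COMPLEX * 14) * 75e-9

print(f"CISC: {cisc_time:.2f} s")   # 0.48 s
print(f"RISC: {risc_time:.2f} s")   # 0.27 s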
25Main Characteristics of RISC Architectures
- Load-and-store architecture
- - Only LOAD and STORE instructions reference data in memory; all other instructions operate only with registers (they are register-to-register instructions); thus, only the few instructions accessing memory need more than one cycle to execute (after being fetched and decoded).
- Pipeline operation with memory reference
- Instructions use only a few addressing modes
- - The addressing modes are usually register, direct, register indirect, and displacement.
- Instructions are of fixed length and uniform format
- - This makes the loading and decoding of instructions simple and fast; it is not necessary to wait until the length of an instruction is known in order to start decoding the following one.
- - Decoding is simplified because the opcode and address fields are located in the same position for all instructions.
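- To illustrate why a fixed length and uniform format simplify decoding, here is a small sketch that extracts the opcode and register fields from a 32-bit word; the field widths and positions are invented for the example and do not correspond to any particular real instruction set.

# Hypothetical fixed 32-bit format: opcode in bits 31-26, rd in 25-21,
# rs1 in 20-16, rs2 in 15-11 (the remaining bits are unused here).
def decode(word):
    return {
        "opcode": (word >> 26) & 0x3F,
        "rd":     (word >> 21) & 0x1F,
        "rs1":    (word >> 16) & 0x1F,
        "rs2":    (word >> 11) & 0x1F,
    }

# Because every instruction is 4 bytes, the next one always starts at
# address + 4, and the fields are always in the same bit positions.
instr = (0x02 << 26) | (3 << 21) | (1 << 16) | (2 << 11)   # e.g. "add r3, r1, r2"
print(decode(instr))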
26Main Characteristics of RISC Architectures
- A large number of registers is available
- - Variables and intermediate results can be
stored in registers and do not require repeated
loads and stores from/to memory. - - All local variables of procedures and the
passed parameters can be stored in registers (see
slide 8 for comments on possible number of
variables and parameters). - What happens when a new procedure is called?
- - Normally the registers have to be saved in memory (they contain values of variables and parameters for the calling procedure); at return to the calling procedure, the values have to be loaded again from memory. This takes a lot of time.
- - If a large number of registers is available, a
new set of registers can be allocated to the
called procedure and the register set assigned to
the calling one remains untouched.
27Main Characteristics of RISC Architectures
- Is the above strategy realistic?
- - The strategy is realistic because the number of local variables in procedures is not large, and the chain of nested procedure calls is only exceptionally deeper than 6.
- - If the chain of nested procedure calls becomes large, at a certain call there will be no registers left to be assigned to the called procedure; in this case local variables and parameters have to be stored in memory.
- Why is a large number of registers typical for
RISC architectures? - - Because of the reduced complexity of the
processor there is enough space on the chip to be
allocated to a large number of registers. This,
usually, is not the case with CISCs.
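- A toy model of this idea (an assumption-laden addition, loosely inspired by register-window schemes such as SPARC's, not an exact description of any of them): each call gets a fresh on-chip register set until the sets run out, after which the oldest window is spilled to memory; returns, which would restore and possibly reload windows, are not modelled.

# Toy model: NUM_WINDOWS register sets on chip; when they are exhausted,
# the oldest active window is spilled (saved) to memory.
NUM_WINDOWS = 8              # assumed number of on-chip register sets

free_windows = list(range(NUM_WINDOWS))
active = []                  # call chain of (procedure, window) still on chip
spilled = []                 # windows saved to memory

def call(name):
    if not free_windows:                 # window overflow
        victim = active.pop(0)           # spill the oldest caller's window
        spilled.append(victim)
        free_windows.append(victim[1])
    w = free_windows.pop(0)
    active.append((name, w))
    print(f"call {name}: uses window {w} (windows spilled so far: {len(spilled)})")

for depth in range(10):                  # 10 nested calls, only 8 windows
    call(f"proc{depth}")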
28Are RISCs Really Better than CISCs?
- RISC architectures have several advantages and
they were discussed throughout this lecture.
However, a definitive answer to the above
question is difficult to give. - A lot of performance comparisons have shown
that benchmark programs are really running faster
on RISC processors than on processors with CISC
characteristics. - However, it is difficult to identify which
feature of a processor produces the higher
performance. Some "CISC fans" argue that the higher speed is produced not by the typical RISC features but by technology, better compilers, etc.
- An argument in favour of the CISC: the simpler instruction set of RISC processors results in a larger memory requirement compared to the same program compiled for a CISC architecture.
- Most current processors are not typical RISCs or CISCs but try to combine the advantages of both approaches.
29Some Processor Examples
- CISC Architectures
- VAX 11/780
- Nr. of instructions: 303
- Instruction size: 2-57 bytes
- Instruction format: not fixed
- Addressing modes: 22
- Number of general purpose registers: 16
- Pentium
- Nr. of instructions: 235
- Instruction size: 1-11 bytes
- Instruction format: not fixed
- Addressing modes: 11
- Number of general purpose registers: 8
- RISC Architectures
- Sun SPARC
- Nr. of instructions: 52
- Instruction size: 4 bytes
- Instruction format: fixed
- Addressing modes: 2
- Number of general purpose registers: up to 520
- PowerPC
- Nr. of instructions: 206
- Instruction size: 4 bytes
- Instruction format: not fixed (but small differences)
- Addressing modes: 2
- Number of general purpose registers: 32
30Summary
- Both RISCs and CISCs try to solve the same problem: to cover the semantic gap. They do it in different ways. CISCs go the traditional way of implementing more and more complex instructions. RISCs try to simplify the instruction set.
- Innovations in RISC architectures are based on a close analysis of a large set of widely used programs.
- The main features of RISC architectures are: a reduced number of simple instructions, few addressing modes, a load-store architecture, fixed instruction length and format, and a large number of available registers.
- One of the main concerns of RISC designers was
to maximise the efficiency of pipelining. - Present architectures often include both RISC
and CISC features.
32Memory Hierarchy - distance versus speed and size
- The closer to the CPU, the smaller the memory, but the faster the access time
- Larger capacity memory is typically slower
33The End