Nirmal Chhugani - PowerPoint PPT Presentation

About This Presentation

Title:

Nirmal Chhugani

Description:

Introduction. PowerPC (Performance Optimization With Enhanced RISC Performance Computing) is a RISC architecture created by (AIM) Apple IBM Motorola alliance ... – PowerPoint PPT presentation

Slides: 29

Provided by: csWinona

Learn more at: https://cs.winona.edu

Transcript and Presenter's Notes

Title: Nirmal Chhugani

1
PowerPC Architecture

Nirmal Chhugani

2
Introduction

PowerPC (Performance Optimization With Enhanced RISC - Performance Computing) is a RISC architecture created in 1991 by the AIM (Apple-IBM-Motorola) alliance.
The original idea for the PowerPC architecture came from IBM's POWER architecture (introduced in the RISC System/6000), and PowerPC retains a high level of compatibility with it.
The intention was to build a high-performance, superscalar, low-cost processor.

3
History

The history of the PowerPC began in the late 1970s with IBM's 801 prototype chip, which implemented the RISC ideas of John Cocke (IBM Watson Research Lab), with further refinements developed by David Patterson.
801-based cores were used in a number of IBM embedded products, eventually becoming the 16-register ROMP processor used in the IBM RT, a computer workstation by IBM. ROMP (Research Office Products Division Micro Processor) was a 10 MHz RISC microprocessor designed by IBM in the early 1980s.
The RT had disappointing performance, and IBM started a project to build the fastest processor on the market. The result was the POWER architecture, introduced with the RISC System/6000 in early 1990.

4
History: POWER architecture

The POWER architecture incorporated many of the typical RISC characteristics:
fixed-length instructions,
register-to-register architecture,
simple addressing modes,
a large general register file,
a three-operand instruction format.
Additionally, it has other features more characteristic of complex ISAs.

5
Power Architecture

Designed to be superscalar: instructions are dispatched across three independent units (branch, fixed-point arithmetic, and floating-point). This allows out-of-order execution.
Compound instructions: loads and stores can update the base register with the newly calculated effective address, eliminating the extra add instructions otherwise needed to increment the index during array traversals.
Does not implement delayed branches: instead, the POWER architecture uses a branch target buffer and the now well-known branch folding technique.
Branching technique: the POWER architecture has a condition register with eight fields that are set by compare instructions. One additional bit in the opcode of many instructions (the record bit) signals that the instruction should also record its result status in the condition register, so a later conditional branch can test it without a separate compare.
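The base-register update on loads described above can be sketched as follows. This is a minimal Python model, not real POWER code: the register names, the dictionary-based memory, and the helper names `load_word_update` and `sum_array` are all assumptions made for illustration.

```python
def load_word_update(mem, regs, rt, ra, disp):
    """Model of a load-with-update: compute EA, load, then write EA back to the base."""
    ea = regs[ra] + disp      # effective address = base + displacement
    regs[rt] = mem[ea]        # load the word at EA into the target register
    regs[ra] = ea             # update the base register; no separate add is needed


def sum_array(mem, base, count):
    """Walk an array of 4-byte words using only load-with-update to advance."""
    # r4 starts one element before the array so the first update lands on element 0.
    regs = {"r3": 0, "r4": base - 4, "r5": 0}
    for _ in range(count):
        load_word_update(mem, regs, "r5", "r4", 4)
        regs["r3"] += regs["r5"]
    return regs["r3"]


# Hypothetical memory image: three words starting at address 0x100.
memory = {0x100: 10, 0x104: 20, 0x108: 30}
total = sum_array(memory, 0x100, 3)
```

The loop body contains one load and one add for the running sum; the pointer increment is folded into the load itself, which is the saving the slide refers to.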

6
Shortfalls..

The original POWER microprocessor, one of the
first superscalar RISC implementations, was a
high performance, multi-chip design.
IBM soon realized that they would need a
single-chip microprocessor to scale their RS/6000
line from lower-end to high-end machines.
Work began on a single-chip POWER microprocessor, called the RSC (RISC Single Chip). In early 1991, IBM realized that their design could potentially become a high-volume microprocessor used across the industry.

7
PowerPC Architecture

To maintain RS/6000 software compatibility, the PowerPC adapted the POWER architecture, with many enhancements added to provide a low-cost, single-chip, superscalar, multiprocessor-capable, 64-bit processor.
Several bit-field instructions that use three source operands were eliminated to avoid the need for extra register ports.
Complex string instructions were left out, consistent with the RISC philosophy.
Instructions whose operation depended on the value of a source operand were eliminated.
Precision shifts, integer multiplies, and divide-with-remainder instructions were omitted.
Support for operation in both big-endian and little-endian modes.
Single- and double-precision floating-point arithmetic.
64-bit architecture, backward compatible with 32-bit.
8
PowerPC family

PowerPC 601
A medium-sized, medium-performance processor.
Includes a more sophisticated branch unit.
Can dispatch up to three instructions per cycle, out of order.
Up to 8 instructions per cycle can be fetched directly into an eight-entry instruction queue (IQ), where they are decoded before being dispatched to the execution core.
Branch folding: the instruction queue is used for detecting and dealing with branches.
The branch unit scans the bottom four entries of the queue, identifying branch instructions and determining what type they are (conditional or unconditional).
When the branch unit has enough information to resolve the branch right away (an unconditional branch, or a conditional branch whose condition depends on information that is already in the condition register), the branch instruction is simply deleted from the instruction queue and replaced with the instruction located at the branch target.
PowerPC 603
Smaller die size than the 601.
Smaller cache.
Can dispatch up to three instructions per cycle, out of order.
The 604 and 620 microprocessors followed in the PowerPC product line. Both aimed for higher performance. The 604 is based on the 32-bit architecture, while the 620 is a 64-bit implementation.

9
Current Status

PowerPC e200: 32-bit Power Architecture microprocessor with speeds up to 600 MHz, ideal for embedded applications.
PowerPC e300: similar to the e200, with speeds up to 667 MHz.
PowerPC e600: speeds up to 2 GHz, ideal for high-performance routing and telecommunications applications.
POWER5: IBM dual-core µP.
POWER6: IBM dual-core µP. A notable difference from POWER5 is that the POWER6 executes instructions in order instead of out of order.
PowerPC G3: used in Apple Macintosh computers such as the PowerBook G3, the multicolored iMacs, iBooks, and several desktops, including both the Beige and the Blue and White Power Macintosh G3s.
PowerPC G4: a designation used by Apple Computer to describe a fourth generation of 32-bit PowerPC microprocessors.
PowerPC G5: 64-bit Power Architecture processors.
Xenon: based on IBM's PowerPC ISA; used in the Xbox 360 game console.
Broadway: based on IBM's PowerPC ISA; used in the Nintendo Wii gaming console.
Blue Gene/L: dual-core PowerPC 440, 700 MHz, 2004.
Blue Gene/P: quad-core PowerPC 450, 850 MHz, 2007.

10
PowerPC ISA

A mix between SPARC (RISC) and Motorola (CISC) designs.
Different implementation levels (so the chip does not need to implement the full architecture for embedded solutions).
Load/store architecture: operations are always performed on registers, and memory is accessed only through load and store instructions.
Offers a large number of mnemonics that increase the number of instructions available to the programmer without increasing the number of instructions implemented on the chip.
Passes arguments using registers and the stack.
32-bit registers allow addressing 4 gigabytes of virtual memory.

11
Overall design

Integer Execution Unit
Floating Point Unit
Load/Store Unit (LSU)
Branch Execution Units
Memory Management Unit
Memory Unit
Cache

12
PowerPC Registers

PowerPC's application-level registers are broken into three categories: general-purpose, floating-point, and special-purpose registers.
General-purpose registers (GPRs): r0 to r31
A flat scheme of 32 general-purpose registers.
They are the source and destination for all integer operations and the address source for all load/store operations.
They also provide access to SPRs.
All GPRs are available for use, with one exception: in certain instructions, GPR0 simply means the value 0, and no lookup is done for GPR0's contents.
Some of these registers have special tasks assigned to them:
r0: volatile register which may be modified during function linkage
r1: stack frame pointer, always valid
r2: system-reserved register
r3-r4: volatile registers used for parameter passing and return values
r5-r10: volatile registers used for parameter passing
r11-r12: volatile registers which may be modified during function linkage
r13: small data area pointer register
r14-r30: registers used for local variables
r31: used for local variables or "environment pointers"
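The GPR0 exception noted above shows up in effective-address calculation for loads and stores. The sketch below models it in Python; the function name `effective_address` and the list-based register file are illustrative assumptions, and the model covers only a 32-bit base-plus-displacement address.

```python
MASK32 = 0xFFFFFFFF


def effective_address(gprs, ra, disp):
    """Base + displacement, where register number 0 as the base means the value 0."""
    # When the base register field is 0, the contents of r0 are not read at all.
    base = 0 if ra == 0 else gprs[ra]
    return (base + disp) & MASK32


gprs = [0] * 32
gprs[0] = 0xDEAD0000   # whatever r0 holds is ignored when it is named as a base
gprs[1] = 0x1000       # r1: stack pointer in the convention listed above

via_r1 = effective_address(gprs, 1, 8)
via_r0 = effective_address(gprs, 0, 8)
```

Here `via_r1` is `0x1008`, while `via_r0` is just the displacement `8`, even though r0 holds a nonzero value.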

13
Floating point registers

Floating-point registers (FPRs): f0 to f31
32 floating-point registers with 64-bit precision.
They are the source and destination operands of all floating-point operations.
They can contain 32-bit and 64-bit signed and unsigned integer values, as well as single-precision and double-precision floating-point values.
FPRs also provide access to the FPSCR (Floating-Point Status and Control Register).
The FPSCR captures status and exceptions resulting from floating-point operations, and also provides control bits for enabling specific exception types.
Instructions that load and store double-precision floating-point numbers transfer 64 bits of data without conversion.
Instructions that load single-precision floating-point numbers from memory convert them to double-precision format before storing them in the register.
f0: volatile register
f1: volatile register used for parameter passing and return values
f2-f8: volatile registers used for parameter passing
f9-f13: volatile registers
f14-f31: registers used for local variables
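The single-to-double conversion on load can be illustrated with Python's `struct` module, since a Python float is a 64-bit double. This is only a model of the data conversion, assuming big-endian memory; `load_single` and `store_single` are hypothetical helper names, not PowerPC mnemonics.

```python
import struct


def load_single(memory_bytes):
    """Read a 4-byte single-precision value and return it widened to a double."""
    (value,) = struct.unpack(">f", memory_bytes)
    return value   # Python floats are doubles, so this is the 64-bit register value


def store_single(value):
    """Narrow a double back to a 4-byte single-precision memory image."""
    return struct.pack(">f", value)


raw = struct.pack(">f", 0.1)                     # 0.1 rounded to single precision
in_register = load_single(raw)                   # widened to double on load
register_image = struct.pack(">d", in_register)  # the 64-bit register contents
```

Widening a single to a double is exact, so storing `in_register` back as a single reproduces the original four bytes. The value in the register is still the single-precision approximation of 0.1, not the closest double to 0.1.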

14
Special-purpose registers (SPRs)

The Fixed-Point Exception Register (XER): used for indicating conditions for integer operations, such as carries and overflows.
The Floating-Point Status and Control Register (FPSCR): a 32-bit register used to store the status and control of floating-point operations.
The Count Register (CTR): used to hold a loop count that can be decremented during the execution of branch instructions.
The Condition Register (CR): a 32-bit register grouped into eight fields, where each field is 4 bits that signify the result of an instruction's operation: Less Than (LT), Greater Than (GT), Equal (EQ), and Summary Overflow (SO).
The Link Register (LR): contains the address to return to at the end of a function call.
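As a rough illustration of how a compare result lands in one of the eight 4-bit CR fields, the Python sketch below builds a field value and places it in a 32-bit CR. It assumes the usual PowerPC ordering inside a field (LT, GT, EQ, SO from most to least significant) and that field 0 occupies the most significant four bits; the function names are made up for this example.

```python
# Bit masks inside one 4-bit CR field, most significant first.
LT, GT, EQ, SO = 0b1000, 0b0100, 0b0010, 0b0001


def compare_field(a, b, summary_overflow=False):
    """Return the 4-bit field a signed compare of a and b would produce."""
    if a < b:
        bits = LT
    elif a > b:
        bits = GT
    else:
        bits = EQ
    return bits | (SO if summary_overflow else 0)


def set_cr_field(cr, field, bits):
    """Write a 4-bit value into CR field 0..7, leaving the other fields untouched."""
    shift = 4 * (7 - field)          # field 0 is the leftmost nibble
    mask = 0xF << shift
    return (cr & ~mask & 0xFFFFFFFF) | (bits << shift)


cr = set_cr_field(0, 0, compare_field(3, 5))    # 3 < 5 sets LT in field 0
cr = set_cr_field(cr, 7, compare_field(4, 4))   # 4 == 4 sets EQ in field 7
```

After these two compares, `cr` is `0x80000002`: LT in the top nibble and EQ in the bottom one.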

15
Data Types

Either little-endian or big-endian byte ordering can be used.
Fixed-point data types include:
Unsigned byte: 8 bits
Unsigned halfword: 16 bits
Signed halfword: 16 bits
Unsigned word: 32 bits
Signed word: 32 bits
Unsigned doubleword: 64 bits
Byte strings: from 0 to 128 bytes in length
Two's complement is used for negative values.
Floating-point data formats:
single-precision, 32 bits long (23-bit fraction, 8-bit exponent, 1 sign bit)
double-precision, 64 bits long (52-bit fraction, 11-bit exponent, 1 sign bit)
Characters are stored using 8-bit ASCII codes.
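The two floating-point layouts listed above (fraction, exponent, and sign widths of 23/8/1 and 52/11/1) can be checked by splitting the raw bits of a value. This Python sketch uses the standard `struct` module; the helper names are assumptions for illustration.

```python
import struct


def single_fields(x):
    """Split a value encoded as single precision into (sign, exponent, fraction)."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    return raw >> 31, (raw >> 23) & 0xFF, raw & 0x7FFFFF


def double_fields(x):
    """Split a value encoded as double precision into (sign, exponent, fraction)."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))
    return raw >> 63, (raw >> 52) & 0x7FF, raw & ((1 << 52) - 1)


# -6.5 is -1.625 * 2**2, so the biased exponents are 127 + 2 and 1023 + 2.
s_sign, s_exp, s_frac = single_fields(-6.5)
d_sign, d_exp, d_frac = double_fields(-6.5)
```

Both encodings carry the same sign and the same leading fraction bits; only the exponent bias and the field widths differ.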

16
Instruction types

17
Instruction Format

All instruction encodings are 32 bits in length.
Bit numbering for PowerPC is the opposite of most other definitions: bit 0 is the most significant bit, and bit 31 is the least significant bit.
Instructions are first decoded by the upper 6 bits, a field called the primary opcode. The remaining 26 bits contain fields for operand specifiers, immediate operands, and extended opcodes; some of these bits or fields may be reserved.
Common instruction formats

Format    Fields (bit ranges)
D-form    opcd (0-5), tgt/src (6-10), src/tgt (11-15), immediate (16-31)
X-form    opcd (0-5), tgt/src (6-10), src/tgt (11-15), src (16-20), extended opcd (21-31)
A-form    opcd (0-5), tgt/src (6-10), src/tgt (11-15), src (16-20), src (21-25), extended opcd (26-30), Rc (31)
BD-form   opcd (0-5), BO (6-10), BI (11-15), BD (16-29), AA (30), LK (31)
I-form    opcd (0-5), LI (6-29), AA (30), LK (31)

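Because bit 0 is the most significant bit, extracting a field needs a shift computed from the right-hand end. The Python sketch below shows one way to pull D-form fields out of a 32-bit word using the bit ranges from the table. The helper names are hypothetical, and treating the immediate as signed is an assumption that holds for instructions such as `addi` but not for every D-form instruction.

```python
def bits(word, start, end):
    """Extract bits start..end (inclusive) of a 32-bit word, with bit 0 as the MSB."""
    width = end - start + 1
    shift = 31 - end                 # distance of the field's last bit from the LSB
    return (word >> shift) & ((1 << width) - 1)


def decode_d_form(word):
    """Split a D-form word into primary opcode, two register fields, and immediate."""
    imm = bits(word, 16, 31)
    if imm & 0x8000:                 # sign-extend the 16-bit immediate (assumption)
        imm -= 0x10000
    return {
        "opcd": bits(word, 0, 5),
        "rt": bits(word, 6, 10),
        "ra": bits(word, 11, 15),
        "imm": imm,
    }


# 0x38610008 encodes addi r3, r1, 8 (primary opcode 14).
decoded = decode_d_form(0x38610008)
```

For this word, `decoded` holds opcode 14, target register 3, source register 1, and immediate 8.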
18
Instruction format

D-form: provides up to two registers as source operands, one immediate source, and up to two registers as target operands. Some variations of this instruction format use portions of the target and source register operand specifiers as immediate fields or as extended opcodes.
X-form: provides up to two registers as source operands and up to two target operands. Some variations of this instruction format use portions of the target and source operand specifiers as immediate fields or as extended opcodes.
A-form: provides up to three registers as source operands and one target operand. Some variations of this instruction format use portions of the target and source operand specifiers as immediate fields or as extended opcodes.
BD-form: the conditional branch instruction. The BO field specifies the type of condition; the BI field specifies which CR bit is used as the condition; the BD field is used as the branch displacement. The AA bit specifies whether the branch is an absolute or relative branch. The LK bit specifies whether the address of the next sequential instruction is saved in the Link Register as a return address for a subroutine call.
I-form: used by the unconditional branch instruction. Because the branch is unconditional, the BO and BI fields of the BD format are exchanged for additional branch displacement to form the LI instruction field. This instruction format also supports the AA and LK bits in the same fashion as the BD format.
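The BD-form fields described above can be separated with the same MSB-first bit numbering. This Python sketch is illustrative only: the helper names are assumptions, and the displacement is computed by sign-extending the 14-bit BD field and multiplying by 4, since instructions are word-aligned.

```python
def field(word, start, end):
    """Extract bits start..end (inclusive) of a 32-bit word, with bit 0 as the MSB."""
    return (word >> (31 - end)) & ((1 << (end - start + 1)) - 1)


def decode_bd_form(word):
    """Split a conditional-branch word into BO, BI, byte displacement, AA, and LK."""
    bd = field(word, 16, 29)
    if bd & 0x2000:                  # sign-extend the 14-bit BD field
        bd -= 0x4000
    return {
        "opcd": field(word, 0, 5),
        "bo": field(word, 6, 10),    # type of condition
        "bi": field(word, 11, 15),   # which CR bit is tested
        "displacement": bd * 4,      # BD counts words; convert to bytes
        "aa": field(word, 30, 30),   # 1 = absolute, 0 = relative
        "lk": field(word, 31, 31),   # 1 = save return address in LR
    }


# 0x40820008 is a conditional branch (opcode 16) testing CR bit 2, 8 bytes forward.
branch = decode_bd_form(0x40820008)
```

CR bit 2 is the EQ bit of CR field 0, so with BO = 4 (branch if the bit is clear) this word is a "branch if not equal" that skips one instruction.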
Simplified PowerPC instruction set: http://pds.twi.tudelft.nl/vakken/in1200/labcourse/instruction-set/

D-form opcd tgt/src src/tgt immediate
X-form opcd tgt/src src/tgt src extended opcd
A-form opcd tgt/src src/tgt src src extended opcd Rc
BD-form opcd BO BI BD AA LK
I-form opcd LI AA LK

19
Instruction formats

(Figures: BD-form, D-form, and A-form field layouts.)

20
PowerPC Addressing Modes

Load/store architecture
Indirect:
Instruction includes a 16-bit displacement to be added to the base register (which may be a GP register).
Can replace the base register content with the new address.
Indirect indexed:
Instruction references a base register and an index register (both may be GP registers).
EA is the sum of their contents.
Branch address (target address calculation)
Absolute: TA = actual address
Relative: TA = current instruction address + displacement (25 bits, signed)
Indirect:
Link Register: TA = (LR)
Count Register: TA = (CTR)
Arithmetic
Operands are in registers or part of the instruction.
Floating point is register only.
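The branch target address rules above can be written out as a small Python sketch. It follows the I-form layout given earlier, where a 24-bit LI field is extended with two zero bits before sign extension; that width, the function names, and the clearing of the two low bits for register targets are assumptions of this sketch rather than something the slide states.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value, width):
    """Interpret the low `width` bits of value as a two's-complement number."""
    sign = 1 << (width - 1)
    return (value & (sign - 1)) - (value & sign)


def branch_target(current_address, li_field, aa):
    """Absolute: TA = displacement. Relative: TA = current address + displacement."""
    displacement = sign_extend(li_field << 2, 26)   # LI counts words, not bytes
    if aa:
        return displacement & MASK32
    return (current_address + displacement) & MASK32


def register_target(register_value):
    """Indirect: TA comes from the Link or Count Register, forced to word alignment."""
    return register_value & ~0x3 & MASK32


relative = branch_target(0x1000, 4, aa=0)           # 16 bytes forward
absolute = branch_target(0x1000, 4, aa=1)           # address 0x10 itself
backward = branch_target(0x1000, 0xFFFFFE, aa=0)    # LI = -2 words
```

With the current instruction at `0x1000`, the three calls give `0x1010`, `0x10`, and `0xFF8` respectively.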

21
PowerPC function call conventions

Results from a function call are returned in
GPR3, FPR1, or by passing a pointer to a
structure as the implicit leftmost parameter.
Any parameters that do not fit into the
designated registers are passed on the stack. In
addition, enough space is allocated on the stack
to hold all parameters, whether they are passed
in registers or not.
The PowerPC run-time environment uses a grow-down stack that allocates space for a function's parameters, linkage information, and local variables.
The environment uses a single stack pointer without any frame pointer. To achieve this simplification, the PowerPC stack has a much more rigidly defined structure.
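A simplified picture of how parameters are assigned to registers or the stack, using the r3-r10 and f1-f8 parameter registers listed in the register slides, might look like the Python sketch below. It is a deliberately reduced model: real PowerPC ABIs differ in details such as alignment, structures, and whether floating-point arguments also consume general-purpose registers, and the function name and "int"/"float" labels are invented for this example.

```python
def assign_arguments(arg_kinds):
    """Map each argument ("int" or "float") to a register name or a stack slot."""
    gprs = [f"r{n}" for n in range(3, 11)]   # r3-r10: integer parameter registers
    fprs = [f"f{n}" for n in range(1, 9)]    # f1-f8: floating-point parameter registers
    placement = []
    stack_slot = 0
    for kind in arg_kinds:
        pool = gprs if kind == "int" else fprs
        if pool:
            placement.append(pool.pop(0))    # next free register of the right class
        else:
            placement.append(f"stack[{stack_slot}]")   # registers exhausted
            stack_slot += 1
    return placement


mixed = assign_arguments(["int", "float", "int"])
many_ints = assign_arguments(["int"] * 9)
```

In this model the first integer result would come back in r3 and the first floating-point result in f1, matching the GPR3/FPR1 return convention described above.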

22
PowerPC G4e Pipelining

Seven Stage Pipeline
Superscalar Microprocessor allows multiple
instructions to be executed in parallel.
Nine Execution Units
BPU Branch Processing Unit
VPU Vector Permute Unit
VIU Vector Integer Unit
VCIU Vector Complex Integer Unit
VFPU Vector Floating Point Unit
FPU Floating Point Unit
IU Integer Unit
CIU Complex Integer Unit
LSU Load/Store Unit

23
PowerPC G4e Pipeline Stages

Stages 1 and 2 - Instruction Fetch
These two stages are both dedicated primarily to
grabbing an instruction from the L1 cache.
The G4e can fetch four instructions per clock
cycle from the L1 cache and send them on to the
next stage
Stage 3 - Decode/Dispatch
Once an instruction has been fetched, it goes
into a 12-entry instruction queue to be decoded.
The G4e's decoder can dispatch up to three
instructions per clock cycle to the next stage.

24
PowerPC G4e Pipeline Stages

Stage 4 - Issue
Dispatched instructions wait in one of three issue queues.
The first is the Floating-Point Issue Queue (FIQ), which holds floating-point (FP) instructions that are waiting to be executed.
The second is the Vector Issue Queue (VIQ), which holds vector operations.
The third is the General Instruction Queue (GIQ), which holds everything else.
Once an instruction leaves its issue queue, it goes to the execution engine to be executed.

25
PowerPC G4e Pipeline Stages

Stage 5 - Execute
The instructions can pass out-of-order from their
issue queues into their respective functional
units and be executed.
Stage 6 and 7 - Complete and Write-Back
In these two stages, the instructions are put back into the order in which they came into the processor, and their results are written back to the register file.
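The idea of executing out of order but completing in program order can be sketched with a small Python model. It is not a description of the G4e's actual completion hardware; the function name and the representation of instructions as program-order indices are assumptions for illustration.

```python
def complete_in_order(finish_order):
    """Given the order in which instructions finish executing, show when each retires.

    Instructions are identified by their program-order index (0, 1, 2, ...).
    Returns a list of (finished_index, retired_now) pairs, one per finish event.
    """
    finished = set()
    next_to_retire = 0
    timeline = []
    for index in finish_order:
        finished.add(index)
        retired_now = []
        # Retire only while the oldest unretired instruction has finished.
        while next_to_retire in finished:
            retired_now.append(next_to_retire)
            next_to_retire += 1
        timeline.append((index, retired_now))
    return timeline


# Instruction 2 finishes first but must wait for instructions 0 and 1.
timeline = complete_in_order([2, 0, 1, 3])
```

In this run instruction 2 finishes first yet retires only after 0 and 1 have finished, so results become visible in program order.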

26
Design principles

Simplicity favors regularity
Standard 32-bit instruction format for all instructions:
fixed-length instructions,
register-to-register architecture,
three-operand instruction format.
Smaller is faster
Three categories of registers, each handling specific instructions, so access time is presumably faster.
Make the common case fast
Integer and floating-point instructions
Good design demands good compromises
To align with RISC principles, many instructions that required three source operands were eliminated.
Many complex instructions were curtailed to conform to RISC principles, compensated for by a large number of mnemonics that increase the number of instructions available.

27
Pros and Cons

Instruction Set
200 machine instructions
More complex than most RISC machines
e.g. floating-point multiply-and-add instructions that take three input operands
e.g. load and store instructions may automatically update the index register to contain the just-computed target address
Pipelined execution
More sophisticated than SPARC
Input and Output
Two different modes:
Direct-store segment: maps virtual address space to an external address space
Normal virtual memory access
Permits a range of implementations, from low-cost controllers through high-performance processors.

28
References

http://www.ibm.com/developerworks/linux/library/l-powarch/
http://www.cresco.enea.it/LA1/cresco_sp14_ylichron/CBE-docs/PowerPC_Vers202_Book1_public.pdf
http://en.wikipedia.org/wiki/PowerPC
http://pds.twi.tudelft.nl/vakken/in1200/labcourse/instruction-set
http://www.eecs.umich.edu/stever/373/lecnotes2.pdf
http://www.devx.com/ibm/Article/20943