Nirmal Chhugani - PowerPoint PPT Presentation

About This Presentation
Title:

Nirmal Chhugani

Slides: 29
Provided by: csWinona
Learn more at: https://cs.winona.edu

Transcript and Presenter's Notes

Title: Nirmal Chhugani


1
PowerPC Architecture
  • Nirmal Chhugani

2
Introduction
  • PowerPC (Performance Optimization With Enhanced
    RISC - Performance Computing) is a RISC
    architecture created by the Apple-IBM-Motorola
    (AIM) alliance in 1991.
  • The original idea for the PowerPC architecture
    came from IBM's POWER architecture (introduced in
    the RISC System/6000), and PowerPC retains a high
    level of compatibility with it.
  • The intention was to build a high-performance,
    superscalar, low-cost processor.

3
History
  • The history of the PowerPC began with IBM's 801,
    a prototype chip built on John Cocke's (IBM Watson
    Research Lab) RISC ideas in the late 1970s (with
    further refinements developed by David Patterson).
  • 801-based cores were used in a number of IBM
    embedded products, eventually becoming the
    16-register ROMP (Research Office Products
    Division Micro Processor), a 10 MHz RISC
    microprocessor designed by IBM in the early 1980s
    and used in the IBM RT workstation.
  • The RT had disappointing performance, and IBM
    started a project to build the fastest
    processor on the market. The result was the POWER
    architecture, introduced with the RISC
    System/6000 in early 1990.

4
History: POWER architecture
  • The POWER architecture incorporated many RISC
    characteristics:
  • fixed-length instructions,
  • register-to-register architecture,
  • simple addressing modes,
  • large general register file,
  • three-operand instruction format.
  • Additionally, it has other features more
    characteristic of more complex ISAs.

5
Power Architecture
  • Designed to be superscalar: instructions are
    dispatched across three independent units (branch,
    fixed-point arithmetic, and floating-point). This
    allows out-of-order execution.
  • Compound instructions: a load or store can update
    the base register with the newly calculated
    effective address, eliminating the extra add
    instructions otherwise needed to increment the
    index for array traversals.
  • No delayed branches: instead, the POWER
    architecture uses a branch target buffer and the
    now well-known branch folding technique.
  • Branching technique: the POWER architecture has a
    condition register with eight fields that are set
    by compare instructions. One additional bit in
    the encoding of many instructions (the record
    bit, Rc) signals that the instruction should also
    set a condition field from its result, which
    often removes the need for a separate compare.
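
The compound (update-form) load described above can be sketched in a few lines. This is only an illustrative model, not a description of any real simulator: memory is assumed to be a Python dict of word values keyed by address, and the names `load_with_update` and `sum_array` are hypothetical.

```python
def load_with_update(memory, regs, rt, ra, disp):
    """Model of an update-form load: rt <- MEM[ra + disp], then ra <- ra + disp."""
    ea = regs[ra] + disp      # effective address
    regs[rt] = memory[ea]     # load the word at the effective address
    regs[ra] = ea             # write the effective address back to the base register
    return ea


def sum_array(memory, base, count, stride=4):
    """Walk an array with one load-with-update per element (no separate add for the index)."""
    # r4 starts one element before the array so the first update lands on element 0.
    regs = {"r3": 0, "r4": base - stride, "r5": 0}
    for _ in range(count):
        load_with_update(memory, regs, "r5", "r4", stride)
        regs["r3"] += regs["r5"]
    return regs["r3"]


memory = {0x100: 10, 0x104: 20, 0x108: 30}   # assumed toy memory image
total = sum_array(memory, 0x100, 3)
```

Each loop iteration needs only the load and the accumulate; the pointer increment is folded into the load itself.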

6
Shortfalls
  • The original POWER microprocessor, one of the
    first superscalar RISC implementations, was a
    high performance, multi-chip design.
  • IBM soon realized that they would need a
    single-chip microprocessor to scale their RS/6000
    line from lower-end to high-end machines.
  • Work on a single-chip POWER microprocessor,
    called the RSC (RISC Single Chip), began. In early
    1991 IBM realized that this design could
    potentially become a high-volume microprocessor
    used across the industry.

7
PowerPC Architecture
  • In order to maintain RS/6000 software
    compatibility, the PowerPC adapted the POWER
    architecture, and many enhancements were added to
    provide a low-cost, single-chip, superscalar,
    multiprocessor-capable, 64-bit processor.
  • Several bit/field instructions that use three
    source operands were eliminated to avoid the need
    for extra register ports.
  • Complex string instructions were left out,
    consistent with the RISC philosophy.
  • Instructions whose operation depended on the
    value of a source operand were eliminated.
  • POWER's extended-precision shifts, integer
    multiplies, and divide-with-remainder
    instructions that relied on the MQ register were
    omitted.
  • Support for operation in both big-endian and
    little-endian modes
  • Single- and double-precision floating-point
    arithmetic
  • 64-bit architecture, backward compatible with
    32-bit
8
PowerPC family
  • PowerPC 601
  • medium-sized, medium-performance processor
  • includes a more sophisticated branch unit
  • capable of dispatching up to three instructions
    per cycle, out of order
  • up to 8 instructions per cycle can be fetched
    directly into an eight-entry instruction queue
    (IQ), where they are decoded before being
    dispatched to the execution core.
  • Branch folding: the instruction queue is used for
    detecting and dealing with branches. The branch
    unit scans the bottom four entries of the queue,
    identifying branch instructions and determining
    what type they are (conditional or
    unconditional).
  • In cases where the branch unit has enough
    information to resolve the branch right then and
    there (an unconditional branch, or a conditional
    branch whose condition depends on information
    that is already in the condition register), the
    branch instruction is simply deleted from the
    instruction queue and replaced with the
    instruction located at the branch target.
  • PowerPC 603
  • smaller die size than the 601
  • smaller cache
  • capable of dispatching up to three instructions
    per cycle, out of order
  • The 604 and 620 microprocessors were developed
    later in the PowerPC production line. Both
    aimed for higher performance. The 604 is based
    on the 32-bit architecture, while the 620 is a
    64-bit architecture.
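
Branch folding as described for the 601 can be illustrated with a toy instruction queue. This is a simplified sketch under stated assumptions: instructions are tuples, `("b", target)` is an unconditional branch, `("bc", bit, target)` is a conditional branch, `cr` is a dict of already-known condition bits, and the function name `fold_branches` is hypothetical. Real hardware would also redirect fetch to continue from the target; the sketch only shows the substitution in the queue.

```python
def fold_branches(queue, program, cr=None):
    """Remove resolvable branches from the queue, substituting the target instruction."""
    folded = []
    for instr in queue:
        op = instr[0]
        if op == "b":
            # Unconditional branch: always resolvable, replace with the target instruction.
            folded.append(program[instr[1]])
        elif op == "bc" and cr is not None:
            # Conditional branch whose condition bit is already known.
            _, bit, target = instr
            if cr[bit]:
                folded.append(program[target])   # taken: substitute the target
            # not taken: the branch is simply dropped and execution falls through
        else:
            # Ordinary instruction, or a branch that cannot be resolved yet.
            folded.append(instr)
    return folded


program = [("add",), ("b", 3), ("sub",), ("mul",), ("bc", "eq", 0)]
queue_after = fold_branches(program[0:2], program)
```

In this model the execution core never sees the folded branch, which is the point of the technique.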

9
Current Status
  • PowerPC e200 - 32-bit Power Architecture
    microprocessor with speeds up to 600 MHz; ideal
    for embedded applications.
  • PowerPC e300 - similar to the e200, with speeds
    up to 667 MHz.
  • PowerPC e600 - speeds up to 2 GHz; ideal for
    high-performance routing and telecommunications
    applications.
  • POWER5 - IBM dual-core µP
  • POWER6 - IBM dual-core µP. A notable
    difference from POWER5 is that the POWER6
    executes instructions in order instead of
    out of order.
  • PowerPC G3 - used in Apple Macintosh computers
    such as the PowerBook G3, the multicolored iMacs,
    iBooks, and several desktops, including both the
    Beige and Blue and White Power Macintosh G3s.
  • PowerPC G4 - a designation used by Apple
    Computer to describe a fourth generation of
    32-bit PowerPC microprocessors.
  • PowerPC G5 - 64-bit Power Architecture processors
  • Xenon - based on IBM's PowerPC ISA; used in the
    Xbox 360 game console.
  • Broadway - based on IBM's PowerPC ISA; used in
    the Nintendo Wii gaming console.
  • Blue Gene/L - dual core PowerPC 440, 700 MHz,
    2004
  • Blue Gene/P - quad core PowerPC 450, 850 MHz,
    2007

10
PowerPC ISA
  • A mix between SPARC (RISC) and Motorola (CISC).
  • Different implementation levels (so the chip
    does not need to be fully implemented for
    embedded solutions).
  • Load/store architecture. Operations are
    always done on registers. Memory is only
    accessed through load and store instructions.
  • Offers a large number of mnemonics that increase
    the number of instructions available to the
    programmer without increasing the number of
    on-chip instructions.
  • Passes arguments using registers and the stack.
  • 32-bit registers allow addressing a 4-gigabyte
    effective address space.

11
Overall design
  • Integer Execution Unit
  • Floating Point Unit
  • Load/Store Unit (LSU)
  • Branch Execution Units
  • Memory Management Unit
  • Memory Unit
  • Cache

12
PowerPC Registers
  • PowerPC's application-level registers are broken
    into three categories: general-purpose,
    floating-point, and special-purpose registers.
  • General-purpose registers (GPRs): r0 to r31
  • flat scheme of 32 general-purpose registers
  • source and destination for all integer operations
  • address source for all load/store operations
  • they also provide access to SPRs
  • All GPRs are available for use, with one
    exception: in certain instructions, GPR0 simply
    means the value 0, and no lookup is done for
    GPR0's contents.
  • Some of these registers have special tasks
    assigned to them:
  • r0: volatile register which may be modified during
    function linkage
  • r1: stack frame pointer, always valid
  • r2: system-reserved register
  • r3-r4: volatile registers used for parameter
    passing and return values
  • r5-r10: volatile registers used for parameter
    passing
  • r11-r12: volatile registers which may be modified
    during function linkage
  • r13: small data area pointer register
  • r14-r30: registers used for local variables
  • r31: used for local variables or "environment
    pointers"
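
The GPR0 exception can be made concrete with a small model of how a base register operand is read during address calculation. This is a hedged sketch: the register file is assumed to be a plain list of 32 integers, and `base_operand` is a hypothetical helper name.

```python
def base_operand(gprs, ra):
    """Base value for an address calculation: literal 0 when ra is 0, else the register contents."""
    return 0 if ra == 0 else gprs[ra]


gprs = [0] * 32
gprs[0] = 0x1234      # r0 holds a value, but it is ignored when r0 is named as a base
gprs[4] = 0x2000

ea_with_r4 = base_operand(gprs, 4) + 8    # base register r4 plus displacement 8
ea_with_r0 = base_operand(gprs, 0) + 8    # "r0" as base means the constant 0
```

The same rule is what lets the `li rD, value` mnemonic be expressed as `addi rD, 0, value`: the 0 in the base position is the constant zero, not the contents of r0.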

13
Floating point registers
  • Floating-point registers (FPRs): fr0 to fr31
  • 32 floating-point registers with 64-bit
    precision
  • source and destination operands of all
    floating-point operations
  • can contain 32-bit and 64-bit signed and unsigned
    integer values, as well as single-precision and
    double-precision floating-point values
  • FPRs also provide access to the
    FPSCR (Floating-Point Status and Control Register)
  • The FPSCR captures status and exceptions resulting
    from floating-point operations, and also provides
    control bits for enabling specific exception
    types.
  • Instructions that load and store double-precision
    floating-point numbers transfer 64 bits of data
    without conversion.
  • Instructions that load single-precision
    floating-point numbers from memory convert them
    to double-precision format before storing them in
    the register.
  • f0: volatile register
  • f1: volatile register used for parameter passing
    and return values
  • f2-f8: volatile registers used for parameter
    passing
  • f9-f13: volatile registers
  • f14-f31: registers used for local variables
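
The widening performed by a single-precision load can be imitated with Python's `struct` module, since Python floats are IEEE doubles. This is an illustrative sketch, assuming big-endian memory; `load_single` is a hypothetical name and not a real instruction binding.

```python
import struct


def load_single(memory_bytes):
    """Model of a single-precision load: read 4 big-endian bytes and widen to double."""
    (value,) = struct.unpack(">f", memory_bytes)   # result is a Python float, i.e. a double
    return value


raw = struct.pack(">f", 0.1)     # the single-precision value nearest to 0.1, as stored in memory
register_value = load_single(raw)

# Widening is exact: converting the register value back to single gives the same bytes.
round_trip = struct.pack(">f", register_value)
```

Because every single-precision value is exactly representable as a double, the conversion on load loses nothing; the register simply does not hold the double nearest to 0.1, only the widened single.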

14
Special-purpose registers (SPRs)
  • The Fixed-Point Exception Register (XER): used
    for indicating conditions for integer operations,
    such as carries and overflows.
  • The Floating-Point Status and Control Register
    (FPSCR): 32-bit register used to store the status
    and control of the floating-point operations.
  • The Count Register (CTR): used to hold a loop
    count that can be decremented during the
    execution of branch instructions.
  • The Condition Register (CR): 32-bit register
    grouped into eight fields, where each field is 4
    bits that signify the result of an instruction's
    operation: Less Than (LT), Greater Than (GT),
    Equal (EQ), and Summary Overflow (SO).
  • The Link Register (LR): contains the address to
    return to at the end of a function call.
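
How a compare fills one 4-bit CR field can be sketched as follows. The bit order within a field (LT, GT, EQ, SO from most to least significant) and the placement of field 0 in the most significant nibble follow PowerPC's MSB-first bit numbering; the function names are hypothetical.

```python
LT, GT, EQ, SO = 0b1000, 0b0100, 0b0010, 0b0001   # bit positions inside one 4-bit CR field


def compare_field(a, b, summary_overflow=False):
    """Return the 4-bit field a signed compare of a and b would produce."""
    if a < b:
        bits = LT
    elif a > b:
        bits = GT
    else:
        bits = EQ
    if summary_overflow:
        bits |= SO        # SO is copied from the XER, modeled here as a flag
    return bits


def set_cr_field(cr, field, bits):
    """Write a 4-bit value into CR field 0..7, where field 0 is the most significant nibble."""
    shift = 4 * (7 - field)
    cleared = cr & ~(0xF << shift) & 0xFFFFFFFF
    return cleared | (bits << shift)


cr = set_cr_field(0, 0, compare_field(1, 2))   # 1 < 2 sets LT in CR field 0
```

A conditional branch then only has to test one bit of the CR, selected by its BI field.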

15
Data Types
  • Either little-endian or big-endian byte ordering
    can be used.
  • Fixed-point data types include:
  • Unsigned byte: 8 bits
  • Unsigned halfword: 16 bits
  • Signed halfword: 16 bits
  • Unsigned word: 32 bits
  • Signed word: 32 bits
  • Unsigned doubleword: 64 bits
  • Byte strings: from 0 to 128 bytes in length
  • Two's complement is used for negative values
  • Floating-point data formats:
  • single-precision, 32 bits long (23-bit fraction,
    8-bit exponent, 1 sign bit)
  • double-precision, 64 bits long (52-bit fraction,
    11-bit exponent, 1 sign bit)
  • Characters are stored using 8-bit ASCII codes
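
The 23/8/1 and 52/11/1 splits of the two floating-point formats can be checked by pulling the fields out of an encoded value. A minimal sketch using Python's `struct` module; the helper names are hypothetical.

```python
import struct


def single_fields(value):
    """Split a value encoded as IEEE single precision into (sign, exponent, fraction)."""
    (word,) = struct.unpack(">I", struct.pack(">f", value))
    sign = word >> 31                 # 1 bit
    exponent = (word >> 23) & 0xFF    # 8 bits, biased by 127
    fraction = word & 0x7FFFFF        # 23 bits
    return sign, exponent, fraction


def double_fields(value):
    """Split a value encoded as IEEE double precision into (sign, exponent, fraction)."""
    (word,) = struct.unpack(">Q", struct.pack(">d", value))
    sign = word >> 63                     # 1 bit
    exponent = (word >> 52) & 0x7FF       # 11 bits, biased by 1023
    fraction = word & ((1 << 52) - 1)     # 52 bits
    return sign, exponent, fraction


fields = single_fields(-1.5)   # -1.5 = -(1.1 binary) x 2^0
```

For -1.5 the sign bit is set, the biased exponent is 127 (true exponent 0), and only the top fraction bit is set.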

16
Instruction types
17
Instruction Format
  • All instruction encodings are 32 bits in length.
  • Bit numbering for PowerPC is the opposite of most
    other definitions: bit 0 is the most significant
    bit, and bit 31 is the least significant bit.
  • Instructions are first decoded by the upper 6
    bits, a field called the primary opcode. The
    remaining 26 bits contain fields for operand
    specifiers, immediate operands, and extended
    opcodes; some of these may be reserved bits or
    fields.
  • Common instruction formats:

Format   0-5   6-10     11-15    16-20  21-25  26-29  30  31
D-form   opcd  tgt/src  src/tgt  immediate (bits 16-31)
X-form   opcd  tgt/src  src/tgt  src    extended opcd (bits 21-31)
A-form   opcd  tgt/src  src/tgt  src    src    extended opcd (bits 26-30), Rc (bit 31)
BD-form  opcd  BO       BI       BD (bits 16-29)       AA  LK
I-form   opcd  LI (bits 6-29)                          AA  LK
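
Field extraction with PowerPC's MSB-first bit numbering is easy to get backwards, so a small decoder for the D-form row of the table may help. This is a sketch, not a full disassembler: the helper names are hypothetical, and the immediate is treated as signed (as it is for `addi`), which is an assumption that does not hold for every D-form instruction.

```python
def bits(word, start, end):
    """Extract bits start..end (inclusive) of a 32-bit word, where bit 0 is the MSB."""
    width = end - start + 1
    return (word >> (31 - end)) & ((1 << width) - 1)


def decode_d_form(word):
    """Decode a D-form word into primary opcode, two register fields, and a signed immediate."""
    imm = bits(word, 16, 31)
    if imm & 0x8000:          # sign-extend the 16-bit immediate
        imm -= 0x10000
    return {
        "opcd": bits(word, 0, 5),
        "rt": bits(word, 6, 10),
        "ra": bits(word, 11, 15),
        "imm": imm,
    }


decoded = decode_d_form(0x3864000A)   # encoding of addi r3, r4, 10
```

Here the primary opcode comes out as 14 (`addi`), with target r3, source r4, and immediate 10.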
18
Instruction format
  • D-form: provides up to two registers as source
    operands, one immediate source, and up to two
    registers as target operands. Some variations of
    this instruction format use portions of the
    target and source register operand specifiers as
    immediate fields or as extended opcodes.
  • X-form: provides up to two registers as source
    operands and up to two target operands. Some
    variations of this instruction format use
    portions of the target and source operand
    specifiers as immediate fields or as extended
    opcodes.
  • A-form: provides up to three registers as source
    operands, and one target operand. Some variations
    of this instruction format use portions of the
    target and source operand specifiers as immediate
    fields or as extended opcodes.
  • BD-form: conditional branch instruction. The BO
    field specifies the type of condition. The BI
    field specifies which CR bit is used as the
    condition. The BD field is used as the branch
    displacement. The AA bit specifies whether the
    branch is an absolute or relative branch. The LK
    bit specifies whether the address of the next
    sequential instruction is saved in the Link
    Register as a return address for a subroutine
    call.
  • I-form: used by the unconditional branch
    instruction. Being unconditional, the BO and BI
    fields of the BD format are exchanged for
    additional branch displacement to form the LI
    instruction field. This instruction format also
    supports the AA and LK bits in the same fashion
    as the BD format.
  • Simplified PowerPC instruction set:
    http://pds.twi.tudelft.nl/vakken/in1200/labcourse/instruction-set/

D-form opcd tgt/src src/tgt immediate
X-form opcd tgt/src src/tgt src extended opcd
A-form opcd tgt/src src/tgt src src extended opcd Rc
BD-form opcd BO BI BD AA LK
I-form opcd LI AA LK
19
Instruction formats
BD-Form
D-Form
A-Form
20
PowerPC Addressing Modes
  • Load/store architecture
  • Indirect:
  • Instruction includes a 16-bit displacement to be
    added to the base register (may be a GP register)
  • Can replace the base register content with the
    new address
  • Indirect indexed:
  • Instruction references a base register and an
    index register (both may be GP registers)
  • EA is the sum of their contents
  • Branch address (target address, TA) calculation:
  • Absolute: TA = actual address
  • Relative: TA = current instruction address +
    displacement (signed; the 24-bit LI or 14-bit BD
    field with two zero bits appended)
  • Indirect via Link Register: TA = (LR)
  • Indirect via Count Register: TA = (CTR)
  • Arithmetic:
  • Operands in registers or part of the instruction
  • Floating point is register only
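
The branch target calculations listed above can be written out for the I-form unconditional branch. A minimal sketch assuming a 32-bit address space; `sign_extend`, `branch_target`, and `indirect_target` are hypothetical names, `li` is the raw 24-bit LI field, and `aa` is the AA bit.

```python
def sign_extend(value, width):
    """Interpret the low `width` bits of value as a two's complement number."""
    mask = 1 << (width - 1)
    return (value ^ mask) - mask


def branch_target(cia, li, aa):
    """Target of an I-form branch: LI gets two zero bits appended, then is sign-extended."""
    displacement = sign_extend(li << 2, 26)
    target = displacement if aa else cia + displacement   # absolute vs. relative
    return target & 0xFFFFFFFF


def indirect_target(register_value):
    """Target of a branch through LR or CTR: the low two bits are ignored."""
    return register_value & 0xFFFFFFFC


forward = branch_target(0x1000, 4, aa=False)           # 4 instructions ahead
backward = branch_target(0x1000, 0xFFFFFF, aa=False)   # LI = -1, one instruction back
```

Appending two zero bits works because every instruction is 4 bytes long, so the low two bits of any instruction address are always zero.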

21
PowerPC function call conventions
  • Results from a function call are returned in
    GPR3, FPR1, or by passing a pointer to a
    structure as the implicit leftmost parameter.
  • Any parameters that do not fit into the
    designated registers are passed on the stack. In
    addition, enough space is allocated on the stack
    to hold all parameters, whether they are passed
    in registers or not.
  • The PowerPC run-time environment uses a grow-down
    stack that allocates space for a function's
    parameters, linkage information, and local
    variables.
  • The environment uses a single stack pointer
    without any frame pointer. To achieve this
    simplification, the PowerPC stack has a much more
    rigidly defined structure.
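
The register-then-stack rule for parameters can be sketched with the register ranges given in the register tables earlier in this transcript (r3-r10 for integer parameters, f1-f8 for floating-point parameters). This is a deliberately simplified model: real ABIs add rules for structures, 64-bit values, and variadic calls, and `assign_arguments` is a hypothetical name.

```python
def assign_arguments(arg_kinds):
    """Map each argument ("int" or "float") to a parameter register, or to the stack when none is left."""
    gprs = [f"r{n}" for n in range(3, 11)]   # r3..r10
    fprs = [f"f{n}" for n in range(1, 9)]    # f1..f8
    placement = []
    for kind in arg_kinds:
        pool = fprs if kind == "float" else gprs
        placement.append(pool.pop(0) if pool else "stack")
    return placement


placement = assign_arguments(["int", "float", "int"])
```

With nine integer arguments, the first eight land in r3 through r10 and the ninth spills to the stack.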

22
PowerPC G4e Pipelining
  • Seven-stage pipeline
  • Superscalar microprocessor: allows multiple
    instructions to be executed in parallel.
  • Nine execution units:
  • BPU: Branch Processing Unit
  • VPU: Vector Permute Unit
  • VIU: Vector Integer Unit
  • VCIU: Vector Complex Integer Unit
  • VFPU: Vector Floating Point Unit
  • FPU: Floating Point Unit
  • IU: Integer Unit
  • CIU: Complex Integer Unit
  • LSU: Load/Store Unit

23
PowerPC G4e Pipeline Stages
  • Stages 1 and 2 - Instruction Fetch
  • These two stages are both dedicated primarily to
    grabbing an instruction from the L1 cache.
  • The G4e can fetch four instructions per clock
    cycle from the L1 cache and send them on to the
    next stage
  • Stage 3 - Decode/Dispatch
  • Once an instruction has been fetched, it goes
    into a 12-entry instruction queue to be decoded.
  • The G4e's decoder can dispatch up to three
    instructions per clock cycle to the next stage.

24
PowerPC G4e Pipeline Stages
  • Stage 4 - Issue
  • Dispatched instructions wait in one of three
    issue queues.
  • The first queue is the Floating-Point Issue Queue
    (FIQ), which holds floating-point (FP)
    instructions that are waiting to be executed.
  • The second is the Vector Issue Queue (VIQ), which
    holds vector operations.
  • The third queue is the General Instruction Queue
    (GIQ), which holds everything else.
  • Once the instruction leaves its issue queue, it
    goes to the execution engine to be executed.
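
The routing of dispatched instructions into the three issue queues can be sketched as a simple classifier. This is only a toy model: each instruction is assumed to be a `(name, kind)` pair with kind `"fp"`, `"vector"`, or anything else, queue capacities are ignored, and the function name `route_to_issue_queues` is hypothetical.

```python
from collections import deque


def route_to_issue_queues(instructions):
    """Place each (name, kind) instruction into the FIQ, VIQ, or GIQ according to its kind."""
    queues = {"FIQ": deque(), "VIQ": deque(), "GIQ": deque()}
    for name, kind in instructions:
        if kind == "fp":
            queues["FIQ"].append(name)      # floating-point instructions
        elif kind == "vector":
            queues["VIQ"].append(name)      # vector operations
        else:
            queues["GIQ"].append(name)      # everything else: integer, load/store
    return queues


queues = route_to_issue_queues(
    [("fadd", "fp"), ("lwz", "load"), ("vaddubm", "vector"), ("add", "int")]
)
```

Within each queue the original program order is preserved; reordering only happens when instructions leave the queues for the execution units.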

25
PowerPC G4e Pipeline Stages
  • Stage 5 - Execute
  • The instructions can pass out of order from their
    issue queues into their respective functional
    units and be executed.
  • Stages 6 and 7 - Complete and Write-Back
  • In these two stages, the instructions are put
    back into the order in which they came into the
    processor, and their results are written back to
    the architectural registers.
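
The idea of executing out of order but completing in order can be shown with a small model of the completion stage. This is a sketch under simplifying assumptions (one instruction finishes per step, no limit on how many retire together); `retire_in_order` is a hypothetical name.

```python
def retire_in_order(program_order, finish_order):
    """For each finish event, list the instructions that can now retire in program order."""
    finished = set()
    next_index = 0            # oldest instruction that has not yet retired
    timeline = []
    for name in finish_order:
        finished.add(name)
        retired_now = []
        # Retire the oldest instructions as long as they have finished executing.
        while next_index < len(program_order) and program_order[next_index] in finished:
            retired_now.append(program_order[next_index])
            next_index += 1
        timeline.append(retired_now)
    return timeline


timeline = retire_in_order(["a", "b", "c"], ["c", "a", "b"])
```

Instruction `c` finishes first but has to wait: nothing retires until `a` is done, and `c` only retires together with `b`.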

26
Design principles
  • Simplicity favors regularity
  • Standard 32-bit instruction format for all
    instructions
  • fixed-length instructions,
  • register-to-register architecture,
  • three-operand instruction format.
  • Smaller is faster
  • Three categories of registers, each handling
    specific instructions, so access time is
    presumably faster
  • Make the common case fast
  • Integer and floating-point instructions
  • Good design demands good compromises
  • To align with RISC principles, many instructions
    that required three source operands were
    eliminated
  • Many complex instructions were curtailed to
    conform to RISC principles, compensated for by a
    large number of mnemonics that increase the
    number of instructions available to the
    programmer.

27
Pros and Cons
  • Instruction Set
  • 200 machine instructions
  • More complex than most RISC machines
  • e.g. floating-point multiply and add
    instructions that take three input operands
  • e.g. load and store instructions may
    automatically update the index register to
    contain the just-computed target address
  • Pipelined execution
  • More sophisticated than SPARC
  • Input and Output
  • Two different modes
  • Direct-store segment: maps virtual address space
    to an external address space
  • Normal virtual memory access
  • Permits a range of implementations, from low-cost
    controllers through high-performance processors.

28
References
  • http://www.ibm.com/developerworks/linux/library/l-powarch/
  • http://www.cresco.enea.it/LA1/cresco_sp14_ylichron/CBE-docs/PowerPC_Vers202_Book1_public.pdf
  • http://en.wikipedia.org/wiki/PowerPC
  • http://pds.twi.tudelft.nl/vakken/in1200/labcourse/instruction-set
  • http://www.eecs.umich.edu/stever/373/lecnotes2.pdf
  • http://www.devx.com/ibm/Article/20943