64 Architecture and innovations

Provided by: Deborah3
1
64 Architecture and innovations
  • Today's Architecture Challenges
  • IA-64 Architecture
  • Other 64 Architectures

Part of this is from Intel documents
2
Today's Architecture Challenges
  • Sequential semantics of the ISA
  • Instruction level parallelism (ILP)
  • Unpredictable branches, memory dependencies
  • Memory latency
  • Limited resources (registers, memory address space)
  • Procedure call, loop pipelining overhead

3
Sequential semantics
  • A program is a sequence of instructions
  • Potential dependence from one instruction to
    the next
  • High performance requires parallel execution
  • Parallel execution needs independent instructions
  • Sequentiality inherent in traditional
    architecture

4
Sequential semantics
  • Compiler knows the available parallelism
  • But has no vocabulary to express it
  • Computer architecture must (re)discover
    parallelism and provide ILP

5
Low instruction level parallelism
  • Branches frequent, code blocks small
  • Limited parallelism within basic blocks
  • Wider machines need more parallel instructions
  • Need to exploit ILP across branches
  • Limited ILP available within basic blocks

6
Branch unpredictability
  • Branches alter the sequence of instructions
  • ILP must be extracted across branches
  • Branch prediction has its limitations
  • Not perfect, performance penalty when wrong
  • Need to speculatively execute instructions that
    can fault
  • Memory operations (loads), floating-point
    operations, ...
  • Need to defer exceptions on speculative
    operations
  • More bookkeeping hardware overhead

7
Memory dependencies
  • Loads usually at the top of chains of
    instructions
  • ILP extraction requires moving these loads
  • Branches may be a barrier
  • Stores may be also a barrier
  • Dynamic disambiguation has its limitations
  • Limited in scope, requires additional
    hardware
  • Adds to code size increase if done in software

8
Memory latency
  • Speed difference between CPU and memory
  • Has been increasing over time
  • Need to distance loads from their uses
  • Cache hierarchy has its limitations
  • Small cache limits working set
  • Helps if there is locality
  • Managed asynchronously by hardware

9
Resource constraints
  • Small register space
  • Limits the compiler's ability to express parallelism
  • Shared resources
  • Condition flags, control registers, etc.
  • Forces dependencies on otherwise independent
    instructions
  • Floating point resources

10
Procedure call overhead
  • Modular programming increasingly used
  • Programs tend to be call intensive
  • Register space is shared by caller and callee
  • Procedure calls require register save/restores

11
Loop optimization overhead
  • Loops are a common source of good ILP
  • Current technology: unrolling/pipelining exploits
    this ILP
  • Prologue/epilogue cause code expansion
  • Unrolling causes more code expansion
  • This limits the applicability of these techniques
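
The prologue/epilogue cost can be seen in a minimal sketch of a software-pipelined loop. The three-stage split (load, multiply, store) and the name `pipelined_scale` are assumptions made for illustration; they are not from the slides.

```python
def pipelined_scale(src, factor):
    """Compute [x * factor for x in src] with a 3-stage software pipeline.

    Stage 1 loads element i, stage 2 multiplies element i-1, and
    stage 3 stores element i-2, so one kernel iteration overlaps
    work from three different source iterations.
    """
    n = len(src)
    out = [None] * n
    if n < 2:
        # Too short to fill the pipeline: fall back to the plain loop.
        for i in range(n):
            out[i] = src[i] * factor
        return out

    # Prologue: fill the pipeline (extra code outside the loop).
    loaded = src[0]               # load     0
    product = loaded * factor     # multiply 0
    loaded = src[1]               # load     1

    # Kernel: steady state, all three stages busy in each iteration.
    for i in range(2, n):
        out[i - 2] = product          # store    i-2
        product = loaded * factor     # multiply i-1
        loaded = src[i]               # load     i

    # Epilogue: drain the pipeline (more extra code).
    out[n - 2] = product              # store    n-2
    product = loaded * factor         # multiply n-1
    out[n - 1] = product              # store    n-1
    return out
```

The kernel is a single short loop, but the prologue and epilogue repeat parts of its body, which is the code expansion the slide refers to.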

12
Other challenges
  • Complex conditionals
  • Sequential branch execution increases critical
    path
  • Dynamic resource binding
  • Parallel instructions need to be reorganized to
    fit machine capacity

13
Innovations
  • Predication enhances parallelism
  • Speculation minimizes the memory latency
  • Loop unrolling and rotation
  • Instruction scheduling
  • Massive resources

14
Predication instructions
  • Instruction with a predicate condition
  • (condition) Operator Destination, Source1,
    Source2
  • Execute an instruction only if its condition
    is satisfied
  • Advantages
  • Avoid jumps over code blocks
  • Increasing effectiveness of pipeline
  • Avoid problems with cache
  • Eliminate unnecessary branch instructions
  • Increase ILP
  • Costs
  • Encoding space
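
As a rough illustration of the "(condition) Operator Destination, Source1, Source2" format, the sketch below interprets a tiny predicated program that computes an absolute difference without a branch. The interpreter, the tuple layout, and the register names are hypothetical; only the idea of a compare writing a predicate and its complement, and `p0` being always true, follows IA-64.

```python
def run_predicated(program, regs):
    """Execute a list of (pred, op, dst, src1, src2) tuples.

    `pred` names a predicate register; the instruction takes effect
    only when that predicate is true. "p0" is hard-wired to True.
    """
    preds = {"p0": True}
    for pred, op, dst, src1, src2 in program:
        if not preds.get(pred, False):
            continue  # predicate false: the instruction is nullified
        if op == "cmp.gt":
            # A compare writes a predicate and its complement.
            p_true, p_false = dst
            result = regs[src1] > regs[src2]
            preds[p_true], preds[p_false] = result, not result
        elif op == "sub":
            regs[dst] = regs[src1] - regs[src2]
        else:
            raise ValueError(f"unknown op {op!r}")
    return regs


# if (r1 > r2) r3 = r1 - r2; else r3 = r2 - r1;   (no branch needed)
ABS_DIFF = [
    ("p0", "cmp.gt", ("p1", "p2"), "r1", "r2"),
    ("p1", "sub", "r3", "r1", "r2"),
    ("p2", "sub", "r3", "r2", "r1"),
]
```

Both `sub` instructions are issued on every run, and only the one whose predicate is true updates `r3`. That is how the jump over a code block is avoided.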

15
Load Speculation
  • Initiate loads from memory earlier in the
    instruction stream, even before a branch
  • Control speculation
  • Data speculation
  • Proposed to reduce memory latency
  • Benefits
  • Increase parallelism
  • Performance improvement of 79% (a study published in
    1998)

16
Loop unrolling and rotation
  • Loop unrolling
  • Duplicate code in the loop two, four, or even
    more times to reduce the number of branches and
    allow greater parallelism
  • Code expansion
  • Rotation
  • Both register and predicate values can be stored
    and advanced in the register chains for future
    uses
  • Advantages
  • Benefits of loop unrolling without paying its
    price
  • Increase ILP
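
A minimal sketch of four-way loop unrolling, assuming a simple summation loop (the function name and the choice of four accumulators are illustrative only):

```python
def sum_unrolled(values):
    """Sum a list with the loop body unrolled four times."""
    n = len(values)
    s0 = s1 = s2 = s3 = 0
    i = 0
    # Unrolled body: one loop test per four elements, and four
    # independent accumulators that could be updated in parallel.
    while i + 4 <= n:
        s0 += values[i]
        s1 += values[i + 1]
        s2 += values[i + 2]
        s3 += values[i + 3]
        i += 4
    total = s0 + s1 + s2 + s3
    # Remainder loop for the last n % 4 elements (extra code).
    while i < n:
        total += values[i]
        i += 1
    return total
```

The duplicated body and the remainder loop are the code expansion mentioned above. Register rotation aims to get the same overlap while keeping a single copy of the loop body.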

17
Instruction scheduling
  • A compiler optimization used to improve
    instruction level parallelism
  • Analyze data dependency
  • Form basic blocks of instructions such that all
    instructions within a block can be executed in
    any order, possibly in parallel
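
The dependence analysis can be sketched as follows. This is a simplified model, not a real compiler pass: each instruction is a hypothetical `(dst, srcs)` pair, and an instruction is placed in the earliest group that follows every instruction it depends on.

```python
def schedule_groups(instrs):
    """Group instructions so that no group has an internal dependence.

    Each instruction is (dst, srcs). An instruction depends on an
    earlier one if it reads its result (RAW), overwrites its result
    (WAW), or overwrites one of its inputs (WAR).
    """
    level = []  # level[i] = index of the group that instruction i joins
    for i, (dst, srcs) in enumerate(instrs):
        lvl = 0
        for j in range(i):
            prev_dst, prev_srcs = instrs[j]
            raw = prev_dst in srcs
            waw = prev_dst == dst
            war = dst in prev_srcs
            if raw or waw or war:
                lvl = max(lvl, level[j] + 1)
        level.append(lvl)

    groups = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lvl in enumerate(level):
        groups[lvl].append(i)
    return groups


EXAMPLE = [
    ("r1", ("r10", "r11")),  # 0
    ("r2", ("r12", "r13")),  # 1
    ("r3", ("r1", "r2")),    # 2: needs the results of 0 and 1
    ("r4", ("r10", "r12")),  # 3: independent of 0, 1 and 2
    ("r5", ("r3", "r4")),    # 4: needs the results of 2 and 3
]
```

For `EXAMPLE`, instructions 0, 1 and 3 land in the first group, instruction 2 in the second, and instruction 4 in the third. Instructions within one group can be executed in any order.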

18
Massive resources
  • Thanks to advances in hardware technology, a
    new architecture can be designed with massive
    resources that were not available earlier
  • IA-64 architecture
  • 128 general integer registers
  • 128 floating point registers
  • 64 predicate registers, and
  • Many extra execution units

19
IA-64 features
  • EPIC (Explicitly Parallel Instruction Computing)
  • Data types, memory and registers
  • Register stack
  • Predication and parallel compares
  • Software pipelining and register rotation
  • Control and data speculation
  • Branch architecture
  • Integer architecture
  • Floating point architecture

20
IA 64 Instructions
  • EPIC instruction parallelism
  • Instruction bundles
  • Predicated instructions
  • Support for IA-32 instructions
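
An IA-64 instruction bundle is 128 bits wide: a 5-bit template in the low bits followed by three 41-bit instruction slots. The sketch below packs and unpacks that layout. The helper names are hypothetical, and the slot values used with them are arbitrary bit patterns, not real instruction encodings.

```python
TEMPLATE_BITS = 5
SLOT_BITS = 41
TEMPLATE_MASK = (1 << TEMPLATE_BITS) - 1
SLOT_MASK = (1 << SLOT_BITS) - 1


def pack_bundle(template, slot0, slot1, slot2):
    """Pack a template and three slots into one 128-bit integer."""
    if not 0 <= template <= TEMPLATE_MASK:
        raise ValueError("template must fit in 5 bits")
    bundle = template
    for index, slot in enumerate((slot0, slot1, slot2)):
        if not 0 <= slot <= SLOT_MASK:
            raise ValueError("slot must fit in 41 bits")
        # Slot 0 starts at bit 5, slot 1 at bit 46, slot 2 at bit 87.
        bundle |= slot << (TEMPLATE_BITS + index * SLOT_BITS)
    return bundle


def unpack_bundle(bundle):
    """Split a 128-bit bundle into (template, slot0, slot1, slot2)."""
    template = bundle & TEMPLATE_MASK
    slots = tuple(
        (bundle >> (TEMPLATE_BITS + index * SLOT_BITS)) & SLOT_MASK
        for index in range(3)
    )
    return (template,) + slots
```

The template tells the hardware which execution unit type each slot needs and where groups of independent instructions end, so the parallelism found by the compiler is stated in the code itself.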

21
(No Transcript)
22
Predication instructions
23
Register set
24
Control speculation
  • Control speculation moves loads above branches
  • Detected exceptions are deferred and indicated using a tag bit
  • Check raises detected exceptions
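
A toy model of the three points above, assuming memory is a Python dict and a missing key stands in for a faulting address. The class and function names are hypothetical; they only mimic the roles of a speculative load and its later check.

```python
from dataclasses import dataclass


@dataclass
class SpecValue:
    value: object = None
    tag: bool = False  # set when the speculative load would have faulted


def speculative_load(memory, addr):
    """Speculative load: never faults; a bad address yields a tagged value."""
    if addr in memory:
        return SpecValue(memory[addr])
    return SpecValue(tag=True)  # exception deferred, recorded in the tag bit


def check(spec, memory, addr):
    """Check: if the tag is set, redo the load non-speculatively."""
    if spec.tag:
        return memory[addr]  # the deferred exception (KeyError) surfaces here
    return spec.value


def guarded_read(memory, ptr):
    # Original code: return memory[ptr] if ptr is not None else 0
    spec = speculative_load(memory, ptr)  # load hoisted above the branch
    if ptr is not None:
        return check(spec, memory, ptr)
    return 0
```

When `ptr` is `None` the hoisted load produces a tagged value that is never checked, so no exception is raised on the path that would not have executed the load.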

25
Data speculation
  • Data speculation moves loads above possibly
    conflicting stores
  • Advanced-loaded data can be used speculatively
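
A toy sketch of data speculation, assuming a small table that remembers the address of each advanced load. Stores remove matching entries, and a later check reloads the value only if its entry is gone. The class and method names are hypothetical; the table plays the role of IA-64's advanced load address table.

```python
class AdvancedLoadTable:
    """Tracks the addresses of advanced loads that are still valid."""

    def __init__(self):
        self.entries = {}  # target register name -> address

    def advanced_load(self, memory, reg, addr):
        """Load early (above a possibly conflicting store) and record it."""
        self.entries[reg] = addr
        return memory[addr]

    def store(self, memory, addr, value):
        """Perform a store and invalidate entries for the same address."""
        memory[addr] = value
        for reg in [r for r, a in self.entries.items() if a == addr]:
            del self.entries[reg]

    def check_load(self, memory, reg, addr, spec_value):
        """Keep the early value if still valid, otherwise reload."""
        if self.entries.get(reg) == addr:
            return spec_value
        return memory[addr]


def demo(store_addr):
    memory = {0: 1, 8: 2}
    table = AdvancedLoadTable()
    early = table.advanced_load(memory, "r1", 8)  # load moved above the store
    table.store(memory, store_addr, 99)           # may or may not conflict
    return table.check_load(memory, "r1", 8, early)
```

`demo(0)` keeps the early value because the store went to a different address, while `demo(8)` detects the conflict and reloads.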

26
AMD 64 Architecture
  • Designed by AMD (x86-64)
  • A superset of IA-32
  • Full support for 64-bit integers
  • Additional registers (16)
  • Additional XMM registers
  • Large virtual address space (2^64)
  • Instruction pointer relative data access
  • SSE instructions
  • No-execute bit
  • Removal of older features

27
Intel 64 Architecture (Intel EM64T)
  • Extended Memory 64 Technology
  • Compatibility mode
  • 64-bit mode
  • 64-bit address space
  • 8 additional registers
  • 8 additional registers for SSE instructions
  • Fast interrupt mechanism
  • Instruction pointer relative addressing mode

28
Difference between AMD 64 and Intel 64
  • A small number of differences
  • Compilers generally produce binaries that target
    both AMD64 and EM64T, making the differences
    mainly of interest to compiler developers and
    operating system developers.
  • BSF and BSR
  • AMD64 supports 3DNow!
  • Intel 64 supports microcode updates as in 32-bit
    mode, while AMD64 uses a different microcode
    update mechanism
  • A few others

29
Itanium at the market place
  • HP is the driving force behind Itanium servers
    using the IA-64 architecture
  • HP Workstation zx6000
  • 900 MHz Itanium 2
  • 3000
  • Dell PowerEdge 7250
  • Starting price US$12,000 in 2004
  • Dell dropped all Itanium chips from its product
    line
  • Its future
  • ?

30
Operating system support
  • Windows, DOS
  • Linux
  • Mac OS X
  • Solaris
  • OpenBSD
  • NetBSD

31
Intel 64
  • Intel 64 is the ISA for
  • Intel Xeon processor
  • Intel Core 2 Duo processor

32
AMD 64/X86-64
  • AMD Athlon 64
  • AMD Athlon 64 X2
  • AMD Opteron
  • AMD Turion 64