Pentium III Instruction Stream

Slides: 27

Provided by: KevinH181

Learn more at: https://www.cs.virginia.edu

1
Pentium III Instruction Stream
2
Introduction

Pentium III uses several key features to exploit ILP (instruction-level parallelism)
This part of our presentation covers the methods that the third-generation P6/IA32 architecture uses, and their advantages and disadvantages.

3
Features

Completely speculative execution
Superscalar issue
Speculative register renaming
Deeply pipelined execution
Large branch prediction unit

4
Pentium III Execution

Deeply Pipelined
Roughly 10 to 14 stages for most ops, depending on how they are counted (without miss penalties)
Several tradeoffs for deeply pipelined models
Stall penalties
Clock rate
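
The stall-penalty side of this tradeoff can be illustrated with a back-of-the-envelope model. The sketch below is not from the slides: the function name, the ideal CPI, the branch fraction, the misprediction rate, and both flush penalties are assumed values, chosen only to show how the cost of a flush grows with pipeline depth.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, flush_penalty_cycles):
    """Average cycles per instruction once branch-mispredict stalls are included."""
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty_cycles


# Assumed workload: 20% branches, 10% of them mispredicted, ideal CPI of 1.0.
shallow = effective_cpi(1.0, 0.20, 0.10, flush_penalty_cycles=5)
deep = effective_cpi(1.0, 0.20, 0.10, flush_penalty_cycles=15)
print(f"shallow pipeline CPI: {shallow:.2f}, deep pipeline CPI: {deep:.2f}")
```

With these assumed numbers the CPI rises from roughly 1.1 to roughly 1.3, so the deeper design only wins if its clock rate is faster by more than that ratio.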

5
Pentium III Execution Model

Consists of
In-order front end/issue
Out of order execution core
In order retirement unit (non-speculative)

6
Front End Execution

ICache access
Branch prediction
Decode
Issue

7
ICache

ICache is
16 KB, 4-way set associative, 32-byte cache lines
Backed by the unified L2 cache
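
The geometry on this slide fixes how an address is split into tag, set index, and line offset. The following is a minimal sketch of that arithmetic; the function name and the treatment of the address as a plain integer (ignoring virtual versus physical indexing) are assumptions made for illustration.

```python
CACHE_BYTES = 16 * 1024   # 16 KB, from the slide
WAYS = 4                  # 4-way set associative
LINE_BYTES = 32           # 32-byte cache lines

NUM_SETS = CACHE_BYTES // (WAYS * LINE_BYTES)
OFFSET_BITS = LINE_BYTES.bit_length() - 1   # log2 of the line size
INDEX_BITS = NUM_SETS.bit_length() - 1      # log2 of the number of sets


def split_address(addr):
    """Return (tag, set index, byte offset) for an instruction address."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


print(NUM_SETS, OFFSET_BITS, INDEX_BITS)
print(split_address(0x12345))
```

The cache therefore has 128 sets, addressed by 7 index bits above a 5-bit line offset.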

8
Branch Prediction

BTB (branch target buffer) predicts the address of the next instruction to execute
Advantages of keeping speculative state
Less complicated recovery
Lower mispredict costs
BTB runs off of prefetch

9
Branch Prediction (Cont.)

Dynamic predictor
Yeh's algorithm (two-level adaptive prediction)
Last 4 branch directions kept per branch address
One-cycle penalty on taken branches
RSB (return stack buffer)
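
A two-level scheme with 4 bits of per-branch history can be sketched as follows. This is an illustrative model only, not the actual BTB organization: the class name, the use of unbounded dictionaries instead of a fixed-size table, and the initial counter value are all assumptions.

```python
from collections import defaultdict

HISTORY_BITS = 4  # last 4 directions per branch address, as on the slide


class TwoLevelPredictor:
    def __init__(self):
        # First level: recent outcomes per branch address (1 = taken).
        self.history = defaultdict(int)
        # Second level: 2-bit saturating counter per (address, history) pattern.
        # Starting at 1 (weakly not taken) is an assumption.
        self.counters = defaultdict(lambda: 1)

    def predict(self, pc):
        """Predict taken when the counter for the current pattern is 2 or 3."""
        return self.counters[(pc, self.history[pc])] >= 2

    def update(self, pc, taken):
        """Train the counter for the current pattern, then shift in the outcome."""
        key = (pc, self.history[pc])
        counter = self.counters[key]
        self.counters[key] = min(3, counter + 1) if taken else max(0, counter - 1)
        mask = (1 << HISTORY_BITS) - 1
        self.history[pc] = ((self.history[pc] << 1) | int(taken)) & mask


predictor = TwoLevelPredictor()
for i in range(40):                      # train on an alternating branch
    predictor.update(0x400, i % 2 == 0)
```

Because the history distinguishes "last outcome taken" from "last outcome not taken", a strictly alternating branch becomes predictable once the counters are trained, which a single counter per branch cannot achieve.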

10
Branch Prediction (Cont.)

Static predictor
6-cycle penalty
Forward branches: predicted not taken
Backward branches: predicted taken
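
The static rule reduces to a comparison of the branch address with its target. A minimal sketch, with a hypothetical function name and the assumption that a branch to its own address counts as backward:

```python
def static_predict_taken(branch_pc, target_pc):
    """Static fallback: backward branches are predicted taken, forward ones not taken."""
    return target_pc <= branch_pc


print(static_predict_taken(0x1000, 0x0F00))  # backward branch, e.g. a loop edge
print(static_predict_taken(0x1000, 0x1040))  # forward branch, e.g. skipping a block
```

The rule works because loops typically end in a backward branch that is taken on every iteration but the last.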

11
Decode

Three decode units
Two simple, one complex
Micro-ops
RISC-type operations
1-4 micro-ops per CISC instruction

12
Decode (Cont.)

Issue problems arise
Program instruction ordering is very important
Tradeoff
Issue of 4-wide instructions improves compiler performance by allowing more optimization
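
Why instruction ordering matters for decode can be shown with a toy model of one complex and two simple decoders. The sketch makes several assumptions that are not on the slides: the complex decoder always takes the first instruction of a group, simple decoders accept only single-micro-op instructions, a multi-micro-op instruction arriving at a simple decoder ends the group, and instructions needing more than 4 micro-ops are not modeled.

```python
def decode_cycles(uop_counts):
    """Count decode cycles for instructions given as micro-ops each, in program order."""
    cycles = 0
    i = 0
    while i < len(uop_counts):
        if not 1 <= uop_counts[i] <= 4:
            raise ValueError("this sketch only models instructions of 1 to 4 micro-ops")
        i += 1                      # complex decoder takes the first instruction
        simple_used = 0
        # Each of the two simple decoders accepts only a 1-micro-op instruction.
        while i < len(uop_counts) and simple_used < 2 and uop_counts[i] == 1:
            i += 1
            simple_used += 1
        cycles += 1
    return cycles


well_ordered = [2, 1, 1, 2, 1, 1]
badly_ordered = [1, 1, 2, 1, 1, 2]
print(decode_cycles(well_ordered), decode_cycles(badly_ordered))
```

Both sequences contain the same six instructions, but under these assumptions the first decodes in 2 cycles and the second needs 3, because its multi-micro-op instructions keep landing on a simple decoder.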

13
Decode (Cont.)

Willamette (latest IA32 architecture) has
Execution trace cache
Immediately accessible (no cache hit delay)
Exploits temporal locality

14
Execution

Micro-ops follow distinct trails
RAT (register alias table)
ROB (re-order buffer)
Reservation station
Execution units

15
RAT

Register Mappings (source, destination)
Eliminates false dependencies
In-Order Retirement
Allows out of order execution from ROB
Issues up to 3 micro-ops to ROB per cycle
See any throughput problems?
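
How renaming removes false dependencies can be sketched with a dictionary standing in for the RAT. The instruction format, the tag strings, and the function name are hypothetical, and the 40-entry ROB limit and wraparound are not modeled.

```python
def rename(instructions):
    """Rename (dest, src1, src2) triples to (rob_tag, src1_tag, src2_tag)."""
    rat = {}        # architectural register -> ROB entry holding its newest value
    renamed = []
    for n, (dest, src1, src2) in enumerate(instructions):
        rob_tag = f"ROB{n}"
        # Sources use the current mapping; unmapped registers are read from the RRF.
        s1 = rat.get(src1, f"RRF:{src1}")
        s2 = rat.get(src2, f"RRF:{src2}")
        rat[dest] = rob_tag     # later readers of dest now see this entry
        renamed.append((rob_tag, s1, s2))
    return renamed


program = [
    ("eax", "ebx", "ecx"),   # eax = ebx op ecx
    ("edx", "eax", "ebx"),   # true dependency on the first eax
    ("eax", "ecx", "ecx"),   # reuses eax: WAW with the first, WAR with the second
    ("esi", "eax", "edx"),   # must read the newest eax
]
for entry in rename(program):
    print(entry)
```

The third instruction gets its own ROB entry, so it no longer has to wait for the earlier writer or reader of eax; only the true dependencies (second on first, fourth on second and third) remain.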

16
RAT (cont.)

Can access either ROB or RRF
Solves true dependencies
State bits required
Branch Mispredicts?
Flush all state (mappings) younger than the branch
No new mappings until all current instructions have retired

17
ROB

ROB is the temporary location of queued micro-ops
40 entries
Entries contain micro-ops, state, and results

18
ROB states

SD: scheduled for execution
DP: micro-op is at the head of the dispatch queue
EX: currently being executed
WB: completed execution, waiting to write back results
RR, RT: ready for retirement, being retired
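
In-order retirement from a 40-entry ROB can be sketched with a queue. This is a simplified model: the class and method names are hypothetical, the retire width of 3 per cycle is an assumption, and an entry jumps straight from SD to RR without passing through the DP, EX, and WB states.

```python
from collections import deque

ROB_ENTRIES = 40    # from the slides
RETIRE_WIDTH = 3    # assumed number of micro-ops retired per cycle


class ReorderBuffer:
    def __init__(self):
        self.entries = deque()   # oldest micro-op on the left

    def allocate(self, name):
        """Add a micro-op in program order; False means the front end must stall."""
        if len(self.entries) >= ROB_ENTRIES:
            return False
        self.entries.append({"name": name, "state": "SD"})
        return True

    def complete(self, name):
        """Mark a micro-op as finished; completion may happen out of order."""
        for entry in self.entries:
            if entry["name"] == name:
                entry["state"] = "RR"

    def retire(self):
        """Retire ready micro-ops from the head only, preserving program order."""
        retired = []
        while (self.entries and len(retired) < RETIRE_WIDTH
               and self.entries[0]["state"] == "RR"):
            retired.append(self.entries.popleft()["name"])
        return retired


rob = ReorderBuffer()
for name in ("a", "b", "c"):
    rob.allocate(name)
rob.complete("c")
rob.complete("b")
print(rob.retire())   # nothing retires while the oldest micro-op is unfinished
rob.complete("a")
print(rob.retire())
```

Even though "b" and "c" finish first, nothing leaves the buffer until "a" is ready, which is what keeps the retired state non-speculative.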

19
Reservation Station
20
Reservation Station (Cont.)

5 ports for different ops
FP, Int, MMX, SSE, LSQ (load/store queue) ops
More throughput problems?
20-entry queue
Organization not specified

21
Execution

Scheduling
One scheduler for each port
20-entry queue, ordered by a priority algorithm
Dispatch
All 5 ports can be dispatched every clock cycle
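
Per-port dispatch can be sketched as picking at most one ready micro-op for each of the 5 ports per cycle. The slides do not specify the priority algorithm, so oldest-first selection is an assumption here, as are the port numbering, the entry format, and the function name.

```python
NUM_PORTS = 5


def dispatch(reservation_station):
    """Pick at most one ready micro-op per port; entries are listed oldest first.

    Each entry is a dict with 'name', 'port' (0 to 4), and 'ready'
    (True once all source operands are available).
    """
    chosen = {}
    for uop in reservation_station:
        if not 0 <= uop["port"] < NUM_PORTS:
            raise ValueError("port out of range")
        if uop["ready"] and uop["port"] not in chosen:
            chosen[uop["port"]] = uop["name"]
    return chosen


station = [
    {"name": "add1", "port": 0, "ready": True},
    {"name": "add2", "port": 0, "ready": True},    # loses port 0 to the older add1
    {"name": "load1", "port": 2, "ready": False},  # operands not yet available
    {"name": "fmul", "port": 1, "ready": True},
]
print(dispatch(station))
```

Here only two of the five ports are used in the cycle: two ready micro-ops compete for the same port and one is still waiting on its operands, which is the kind of throughput limit the earlier slides hint at.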

22
Execution (Cont.)

Dispatch
Dcache misses, hazards resolved
Results written back to ROB
Resolves dependency chain

23
Retirement

Results written to RRF
Non-speculative state
Register maps deleted, if possible
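
One reading of "deleted, if possible" is that a mapping can be dropped only when the retiring micro-op is still the newest writer of its register. The sketch below follows that interpretation, which is an assumption, and uses hypothetical names and dictionaries for the RAT and RRF.

```python
def retire_uop(rob_tag, dest, value, rat, rrf):
    """Commit one micro-op: update the RRF and drop the RAT mapping if still current."""
    rrf[dest] = value
    if rat.get(dest) == rob_tag:
        # No younger micro-op has renamed dest, so readers can use the RRF again.
        del rat[dest]


rat = {"eax": "ROB2", "edx": "ROB1"}   # eax was renamed again after ROB0
rrf = {}
retire_uop("ROB0", "eax", 7, rat, rrf)  # mapping stays: ROB2 is the newest eax
retire_uop("ROB1", "edx", 9, rat, rrf)  # mapping removed: edx now lives in the RRF
print(rat, rrf)
```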

24
Throughput
25
Area Considerations

As it turns out
IA32 architecture doesn't scale entirely well
Die area is a large problem
Bus / logical complexity grows in a non-linear fashion

26
Finally

It seems that
IA32 is at an end
VLIW is next
