Pentium Architecture Studying

Slides: 22
Provided by: min83
Transcript and Presenter's Notes

1
Pentium Architecture Studying
2
Member / Topic
  • Jian-Jang Chen: Micro-architecture / Pipeline
  • I-Hwei Yen: Instruction Set Architecture
  • Hung-Jen Huang: Cache / Memory Architecture
  • Min Li: Code Optimization (Branch prediction)

3
Micro-Architecture/Pipeline
Pentium Architecture
4
Micro-Architecture/Pipeline
  • Fetch / Decode Unit

[Figure: fetch/decode unit block diagram. Blocks: Icache (fed from the bus interface unit), Next_IP, branch target buffer, instruction decoder (x3), microcode instruction sequencer, register alias table (allocate). Output goes to the instruction pool (reorder buffer).]
5
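  • Dispatch / Execute Unit

[Figure: dispatch/execute unit block diagram. The reservation station (R.S.) exchanges instructions with the instruction pool (reorder buffer) and dispatches through five ports.]
  • Port 0: Int. Exe. Unit, FP Exe Unit, MMX Exe Unit
  • Port 1: Int. Exe. Unit, Jmp Exe Unit, MMX Exe Unit
  • Port 2: Load Unit (loads)
  • Port 3, 4: Store Unit (stores)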
6
  • Retire Unit

[Figure: retire unit block diagram. Blocks: reservation station, memory interface unit (to/from Dcache), retirement register file, with paths to and from the instruction pool.]
7
  • Operand Addressing
  • An operand in an IA instruction can be located in
    the instruction itself, a register, a memory
    location, or an I/O port. Operands are classified
    as follows:
  • Immediate Operands
  • Register Operands
  • Memory Operands
  • I/O Port Addressing

8
Memory Operands
  • Operands in memory are referenced by a segment
    selector and an offset

[Figure: a 16-bit segment selector (bits 15 to 0) paired with a 32-bit offset (bits 31 to 0).]
9
  • Segment Selector
  • Can be specified implicitly or explicitly
  • Rules

10
  • Offset
  • The offset can be any combination of the factors
    below
  • Offset = Base + (Index × Scale) + Displacement
  • Direct (static address): Offset = Displacement
  • Indirect (dynamic): Offset = Base
  • Examples
  • (Index × Scale) + Displacement can address the
    elements of an array. The displacement locates the
    beginning of the array, the index selects the
    element to be fetched, and the scale accounts for
    the element size of different data types.
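
The offset formula on this slide can be illustrated with a small sketch. The function name, the example array address, and the 32-bit wrap-around are assumptions made for illustration; the restriction of the scale to 1, 2, 4, or 8 follows the IA-32 addressing modes.

```python
def effective_offset(base=0, index=0, scale=1, displacement=0):
    """Offset = Base + (Index * Scale) + Displacement, wrapped to 32 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + displacement) & 0xFFFFFFFF


# Hypothetical array of 4-byte elements that starts at offset 0x1000.
ARRAY_START = 0x1000

# Element 3: the displacement locates the array, the index picks the
# element, and the scale is the element size in bytes.
element_3 = effective_offset(index=3, scale=4, displacement=ARRAY_START)
print(hex(element_3))
```

Leaving out the base and index gives the direct form (Offset = Displacement), and passing only a base gives the indirect form (Offset = Base).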

11
  • Instruction Format
  • The general instruction format is as follows:

12
  • L1 instruction/data: 16 KB, 4-way, 32 bytes/block
  • L2: unified, 512 KB, 32 bytes/block
  • Write buffer: 32 bytes, 4 in Pentium III
  • L2 has a separate cache bus
  • No partially filled cache lines
  • Snoop ability for multiprocessors.

[Figure: memory hierarchy block diagram. Blocks: physical memory, system bus, bus interface unit, L2 cache (on its own cache bus), instruction TLB, fetch unit and L1 instruction cache, data TLB, data cache unit, write buffer.]
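
The L1 figures above (16 KB, 4-way, 32-byte blocks) imply 128 sets. The following sketch shows how an address would split into tag, set index, and block offset under those parameters. The function names are hypothetical, and the sketch assumes power-of-two sizes; it does not model which address bits (linear or physical) the real hardware uses.

```python
L1_SIZE = 16 * 1024  # bytes, from the slide
L1_WAYS = 4          # 4-way set associative
LINE_SIZE = 32       # bytes per block


def cache_geometry(size, ways, line_size):
    """Return (number of sets, offset bits, index bits)."""
    sets = size // (ways * line_size)
    offset_bits = line_size.bit_length() - 1  # log2 for powers of two
    index_bits = sets.bit_length() - 1
    return sets, offset_bits, index_bits


def split_address(addr, size=L1_SIZE, ways=L1_WAYS, line_size=LINE_SIZE):
    """Split an address into (tag, set index, block offset)."""
    sets, offset_bits, index_bits = cache_geometry(size, ways, line_size)
    offset = addr & (line_size - 1)
    index = (addr >> offset_bits) & (sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


print(cache_geometry(L1_SIZE, L1_WAYS, LINE_SIZE))
print(split_address(0x12345678))
```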
13
  • Memory type
  • Any area of system memory can be cached in L1
    or L2.
  • The type of caching (the memory type) can be
    specified by a variety of flags and registers.
    Five memory types are defined.
  • UC (uncacheable): 1) accesses occur in order;
    2) useful for memory-mapped I/O.
  • WC (write combining): system memory locations
    are not cached; writes can be delayed and
    combined in the write buffer until the buffer is
    full or a serializing event occurs.
  • WT (write through): reads and writes are
    cached; all writes go through both a cache line
    and system memory.
  • WB (write back): all reads and writes occur in
    the cache when possible.
  • WP (write protected): writes cause the
    corresponding cache lines on all processors on
    the bus to be invalidated.

14
  • Cache control protocol
  • MESI maintains consistency between the caches
    of different processors.
  • The L1 instruction cache implements only the
    S and I states, because it is not writable.
  • Each cache line can be in one of the following
    four states: Modified, Exclusive, Shared, or
    Invalid.
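
As a rough illustration of how a cache line moves between the four states, here is a textbook-style MESI transition function. It is a simplified sketch, not the Pentium bus protocol: the event names (`local_read`, `local_write`, `snoop_read`, `snoop_write`) and the `others_have_copy` parameter are assumptions made for this example, and write-backs are not modelled.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def next_state(state, event, others_have_copy=False):
    """Return the next MESI state of one cache line (simplified)."""
    if event == "local_read":
        if state is State.INVALID:
            # Line fill: shared if another cache holds it, else exclusive.
            return State.SHARED if others_have_copy else State.EXCLUSIVE
        return state  # read hit keeps the current state
    if event == "local_write":
        # After a write this cache holds the only, dirty copy.
        return State.MODIFIED
    if event == "snoop_read":
        # Another processor reads the line: any valid copy becomes shared.
        return State.INVALID if state is State.INVALID else State.SHARED
    if event == "snoop_write":
        # Another processor writes the line: the local copy is invalidated.
        return State.INVALID
    raise ValueError(f"unknown event: {event}")


line = State.INVALID
for event in ("local_read", "local_write", "snoop_read", "snoop_write"):
    line = next_state(line, event)
    print(event, "->", line.value)
```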

15
  • Cache control
  • Two levels of cache control: global and page.
  • The CD flag in control register CR0 turns
    caching of all system memory (L2, L1) on or off.
  • The NW flag in CR0 controls the write policy
    for all of system memory.
  • Each page table or page directory entry has two
    similar flags to control caching at the page
    level: 1) PCD: enable/disable caching;
    2) PWT: clear for WB, set for WT.
  • Global page: page entries stay resident in the
    TLB unless a special operation is performed.
  • Precedence: the global flags overrule the
    page-level caching-control flags.
  • Precedence: uncached is selected when a
    conflict occurs.
  • Precedence: WC takes precedence over WT, which
    takes precedence over WB.
  • Invalidate: some instructions can invalidate
    the cache when caching is disabled.
  • The TLB or write buffer may be drained or
    invalidated by special operations.
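
The global and page-level flags above can be sketched as a small decision function. This is a deliberately simplified model: it ignores the NW flag and the MTRRs, and the function names and return labels are assumptions. The bit positions are those of the IA-32 CR0 register (CD is bit 30) and page-table entry (PWT is bit 3, PCD is bit 4).

```python
CR0_CD = 1 << 30  # CR0 cache-disable flag
PTE_PWT = 1 << 3  # page-level write-through flag
PTE_PCD = 1 << 4  # page-level cache-disable flag

# Precedence as stated on the slide: uncached first, then WC, WT, WB.
PRECEDENCE = ["UC", "WC", "WT", "WB"]


def page_caching(cr0, pte):
    """Caching policy for one page from CR0 and its page-table entry."""
    if cr0 & CR0_CD:
        return "disabled"  # the global flag overrules the page flags
    if pte & PTE_PCD:
        return "disabled"
    return "WT" if pte & PTE_PWT else "WB"


def resolve(*memory_types):
    """Pick the winning memory type when several settings conflict."""
    return min(memory_types, key=PRECEDENCE.index)


print(page_caching(0, PTE_PWT))
print(resolve("WB", "WT"))
```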

16
  • MTRR register
  • MTRRs (memory type range registers) associate
    a memory type with a physical address range.
  • Allow 96 memory ranges to be defined in
    physical memory.
  • In a multiprocessor system, all processors
    must use an identical MTRR map.
  • In general, the BIOS configures the MTRRs, and
    the operating system remaps them.
  • The MTRRcap register records: 1) the number of
    variable ranges implemented; 2) whether
    fixed-range registers are supported; 3) whether
    the WC type is supported.
  • The MTRRdefType register is used to 1) define
    the default memory type; 2) turn the MTRRs
    on/off; 3) enable/disable the fixed-range MTRRs.
  • If the fixed-range MTRRs are enabled, they take
    priority over the variable-range MTRRs.
  • 11 fixed-range registers, each in charge of the
    memory type of 8 fixed ranges.
  • A maximum of 8 variable ranges can be defined
    by 16 MTRRs.
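
The two registers described above can be decoded with a few bit masks. The sketch below assumes the P6-family layout from Intel's documentation (MTRRcap: VCNT in bits 7:0, FIX in bit 8, WC in bit 10; MTRRdefType: type in bits 7:0, FE in bit 10, E in bit 11). The function names and the sample values are illustrative, so check the layout against the manual for the processor at hand.

```python
# Memory type encodings used by the MTRRs.
MEMORY_TYPES = {0: "UC", 1: "WC", 4: "WT", 5: "WP", 6: "WB"}


def decode_mtrrcap(value):
    """Decode the capability fields of an MTRRcap value."""
    return {
        "variable_ranges": value & 0xFF,            # VCNT, bits 7:0
        "fixed_supported": bool(value & (1 << 8)),  # FIX, bit 8
        "wc_supported": bool(value & (1 << 10)),    # WC, bit 10
    }


def decode_mtrr_def_type(value):
    """Decode the default type and enable flags of an MTRRdefType value."""
    return {
        "default_type": MEMORY_TYPES.get(value & 0xFF, "reserved"),
        "fixed_enabled": bool(value & (1 << 10)),   # FE, bit 10
        "mtrrs_enabled": bool(value & (1 << 11)),   # E, bit 11
    }


# Sample value: 8 variable ranges, fixed ranges and WC supported.
print(decode_mtrrcap(0x508))
# Sample value: MTRRs and fixed ranges enabled, default type WB.
print(decode_mtrr_def_type(0xC06))
```

With 11 fixed-range registers covering 8 ranges each (88 ranges) plus 8 variable ranges, the total matches the 96 ranges mentioned above.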

17
  • Branch Prediction Rules
  • If the instruction address is not in the BTB,
    execution is predicted to continue without
    branching (fall through)
  • Predicted-taken branches have a 1-clock delay
  • The BTB stores a four-bit history of branch
    predictions
  • The BTB pattern-matches on the direction of the
    last four branches to dynamically predict
    whether a branch will be taken
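
To make the four-bit history idea concrete, here is a generic two-level adaptive predictor. It is only a sketch: the slide does not give the actual BTB algorithm, so the 2-bit saturating counters, their initial value, and the class and method names are all assumptions.

```python
class TwoLevelPredictor:
    """Per-branch 4-bit history that selects a 2-bit saturating counter."""

    HISTORY_BITS = 4

    def __init__(self):
        # address -> [history, counters]; one counter per history pattern
        self.btb = {}

    def predict(self, address):
        entry = self.btb.get(address)
        if entry is None:
            return False  # not in the BTB: predict fall through
        history, counters = entry
        return counters[history] >= 2

    def update(self, address, taken):
        patterns = 1 << self.HISTORY_BITS
        # Counters start at 1 (weakly not taken), an arbitrary choice.
        entry = self.btb.setdefault(address, [0, [1] * patterns])
        history, counters = entry
        if taken:
            counters[history] = min(3, counters[history] + 1)
        else:
            counters[history] = max(0, counters[history] - 1)
        # Shift the newest outcome into the history register.
        entry[0] = ((history << 1) | int(taken)) & (patterns - 1)


predictor = TwoLevelPredictor()
BRANCH = 0x401000  # hypothetical branch address
hits = 0
for i in range(40):
    taken = (i % 2 == 0)  # alternating taken / not-taken pattern
    hits += predictor.predict(BRANCH) == taken
    predictor.update(BRANCH, taken)
print(hits, "of 40 predicted correctly")
```

A plain per-branch counter without history would keep mispredicting an alternating pattern like this one, which is the case the history bits are meant to handle.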

18
BRANCH PREDICTION OPTIMIZATION
  • Optimize Branch Predictions in Code
  • Reduce or eliminate branches
  • Ensure that each CALL instruction has a
    matching RET instruction
  • Do not intermingle data with instructions in a
    code segment
  • Unroll all very short loops
  • Write code to follow the static prediction
    algorithm

19
BRANCH PREDICTION OPTIMIZATION
  • Static Prediction Algorithm

When branches don't have a history in the BTB
  • Predicts unconditional branches to be taken
  • JMP
  • Predicts backward conditional branches to be
    taken. This rule is suitable for loops
  • loop <condition>
  • Predicts forward conditional branches to be
    NOT taken
  • if <condition>
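
The three static rules can be written as one small function. The function and parameter names are hypothetical, and the addresses in the usage lines are made up; a branch is treated as backward when its target is not above its own address.

```python
def static_predict(branch_address, target_address, conditional):
    """Return True if the static algorithm predicts the branch as taken."""
    if not conditional:
        return True  # unconditional branches (JMP) are predicted taken
    # Backward conditional branches (typical loops) are predicted taken;
    # forward conditional branches (typical if statements) are not.
    return target_address <= branch_address


# Loop-closing branch that jumps back to the top of the loop.
print(static_predict(0x1040, 0x1000, conditional=True))
# Forward branch that skips over an if body.
print(static_predict(0x1040, 0x1080, conditional=True))
```

Writing code to follow this algorithm means placing the most likely path as the fall-through of a forward branch.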

20
BRANCH PREDICTION OPTIMIZATION
  • Eliminating and Reducing the Number of Branches

WHY
  • Removing the possibility of branch
    mispredictions
  • Reducing the number of BTB entries required

HOW
  • Using replacement instructions instead of branch
    instructions
  • SETcc
  • CMOVcc or FCMOVcc: combine JNE (JGE, etc.) and
    MOV instructions into one
21
BRANCH PREDICTION OPTIMIZATION
Example 1: X = (A < B) ? C1 : C2
Example 2:
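
The first example can be computed without a conditional jump. The sketch below models in Python the arithmetic that a SETcc-based sequence (compare, SETGE, DEC, AND, ADD) would perform on 32-bit registers. It is an illustration of the idea rather than the code from the slide, and the function name is hypothetical; on a real processor a CMOVcc instruction would achieve the same result more directly.

```python
MASK32 = 0xFFFFFFFF  # emulate 32-bit register arithmetic


def select_branchless(a, b, c1, c2):
    """Compute (a < b) ? c1 : c2 using only arithmetic on a 0/1 flag."""
    flag = int(a >= b)            # SETGE: 0 if a < b, 1 otherwise
    mask = (flag - 1) & MASK32    # DEC: 0xFFFFFFFF if a < b, 0 otherwise
    diff = (c1 - c2) & MASK32     # constant known at assembly time
    return ((mask & diff) + c2) & MASK32  # AND, then ADD c2


print(select_branchless(1, 2, 10, 20))
print(select_branchless(2, 1, 10, 20))
```

Because the result depends only on data flow, there is no branch to mispredict and no BTB entry is used.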