204321 Computer Architecture

Provided by: pitimonp

1
????????????????????????? (???????)
  • ?????????? ??????????
  • ??????????????????????????
  • ??????????????????????

2
Improving Cache Performance
  • Reduce miss rate
  • Reduce miss penalty
  • Reduce hit time

3
Misses
  • Compulsory
  • ???????????????????????????
  • ?? miss ??????????????????????????????????????
  • A.k.a. cold start misses or first reference
    misses.
  • Capacity
  • ????????? ????????????????????????????????????????
    ??????????????????
  • (Misses in a fully associative cache of size X)
  • Conflict
  • ??????????????????????????????????????????????????
    ???????????????????????????
  • A.k.a. collision misses or interference misses.
  • (Misses in an N-way associative cache of size X)
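
The conflict category can be illustrated by running the same block trace through a direct-mapped cache and a fully associative cache of equal capacity. The following is a minimal sketch, assuming a hypothetical 4-block cache, LRU replacement in the fully associative case, and traces given as block numbers; the function names are illustrative and do not come from the slides.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_BLOCKS 4   /* hypothetical cache capacity, in blocks */

/* Misses in a direct-mapped cache: block b may only live in slot b % NUM_BLOCKS. */
static int misses_direct_mapped(const int *blocks, size_t n)
{
    int slot[NUM_BLOCKS];
    int misses = 0;
    for (int s = 0; s < NUM_BLOCKS; s++)
        slot[s] = -1;                           /* -1 marks an empty slot */
    for (size_t t = 0; t < n; t++) {
        assert(blocks[t] >= 0);
        int s = blocks[t] % NUM_BLOCKS;
        if (slot[s] != blocks[t]) {             /* miss: replace the resident block */
            misses++;
            slot[s] = blocks[t];
        }
    }
    return misses;
}

/* Misses in a fully associative LRU cache of the same capacity. */
static int misses_fully_assoc(const int *blocks, size_t n)
{
    int line[NUM_BLOCKS];                       /* line[0] is the most recently used */
    int used = 0, misses = 0;
    for (size_t t = 0; t < n; t++) {
        int pos = -1;
        for (int i = 0; i < used; i++)
            if (line[i] == blocks[t]) { pos = i; break; }
        if (pos < 0) {                          /* miss */
            misses++;
            if (used < NUM_BLOCKS)
                used++;
            pos = used - 1;                     /* new slot, or the LRU slot when full */
        }
        for (int i = pos; i > 0; i--)           /* move the block to the MRU position */
            line[i] = line[i - 1];
        line[0] = blocks[t];
    }
    return misses;
}

/* Extra misses of the direct-mapped cache over the fully associative one. */
static int conflict_misses(const int *blocks, size_t n)
{
    return misses_direct_mapped(blocks, n) - misses_fully_assoc(blocks, n);
}
```

For the trace 0, 4, 0, 4, 0, 4 both blocks map to slot 0 of the direct-mapped cache, so every access misses there, while the fully associative cache only takes the two compulsory misses.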

4
3Cs Absolute Miss Rate (SPEC92)
[Figure: absolute miss rate broken down by the 3 Cs; surviving labels: Conflict, Compulsory]
5
2:1 Cache Rule
miss rate of a 1-way associative cache of size X
= miss rate of a 2-way associative cache of size X/2
[Figure: miss-rate plot; surviving label: Conflict]
6
3Cs Relative Miss Rate
[Figure: relative miss rate broken down by the 3 Cs; surviving label: Conflict]
7
Pop Quiz
  • 3 Cs: Compulsory, Capacity, Conflict
  • ????????? ????????????????? C ??????????
  • Change block size
  • Change associativity
  • Change compiler

8
Reducing Misses
  • Larger Block Size
  • Higher Associativity
  • Victim Cache
  • Pseudo-Associativity
  • HW Prefetching: Instruction, Data
  • SW Prefetching: Data
  • Compiler Optimizations

9
1. Larger Block Size
10
2. Higher Associativity
  • 2:1 Cache Rule
  • Miss rate of a direct-mapped cache of size N ≈ miss
    rate of a 2-way set-associative cache of size N/2
  • ???????????
  • Execution time is the only final measure
  • Pop quiz
  • Will the clock cycle time increase?
  • Hit time for 2-way vs. 1-way:
  • external cache +10%
  • internal +2%

11
Example: AMAT vs. Miss Rate
  • AMAT = average memory access time

    Cache size          Associativity
    (KB)        1-way   2-way   4-way   8-way
      1         2.33    2.15    2.07    2.01
      2         1.98    1.86    1.76    1.68
      4         1.72    1.67    1.61    1.53
      8         1.46    1.48    1.47    1.43
     16         1.29    1.32    1.32    1.32
     32         1.20    1.24    1.25    1.27
     64         1.14    1.20    1.21    1.23
    128         1.10    1.17    1.18    1.20

  • (Blue means AMAT not improved by more
    associativity)

12
3. Victim Cache
  • How to combine the fast hit time of a
    direct-mapped cache and still avoid conflict misses?
  • Add a small associative cache that holds blocks
    evicted by conflicts in the direct-mapped cache
  • A 4-entry victim cache removes 20% to 95% of the
    conflicts in a 4 KB direct-mapped cache
  • Used in Alpha, HP
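
The victim-cache idea can be sketched as a small simulation: a direct-mapped cache backed by a 4-entry buffer that is searched on a miss. This is an illustration under stated assumptions, not the Alpha or HP design: the 4-slot main cache, FIFO replacement in the victim buffer, and the function name are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define DM_SLOTS       4  /* hypothetical direct-mapped cache size, in blocks */
#define VICTIM_ENTRIES 4  /* 4-entry victim cache, as on the slide */

/* Count the accesses that must go to the next memory level.
 * A hit in the victim cache swaps the block back into the main cache. */
static int misses_with_victim_cache(const int *blocks, size_t n)
{
    int slot[DM_SLOTS];
    int victim[VICTIM_ENTRIES];      /* FIFO replacement (an assumption); -1 = empty */
    int next_victim = 0, misses = 0;

    for (int i = 0; i < DM_SLOTS; i++) slot[i] = -1;
    for (int i = 0; i < VICTIM_ENTRIES; i++) victim[i] = -1;

    for (size_t t = 0; t < n; t++) {
        int b = blocks[t];
        assert(b >= 0);
        int s = b % DM_SLOTS;
        if (slot[s] == b)
            continue;                            /* hit in the main cache */

        int found = -1;
        for (int i = 0; i < VICTIM_ENTRIES; i++)
            if (victim[i] == b) { found = i; break; }

        if (found >= 0) {
            victim[found] = slot[s];             /* swap: evicted block takes b's place */
        } else {
            misses++;                            /* true miss: fetch from the next level */
            if (slot[s] != -1) {                 /* keep the evicted block as a victim */
                victim[next_victim] = slot[s];
                next_victim = (next_victim + 1) % VICTIM_ENTRIES;
            }
        }
        slot[s] = b;
    }
    return misses;
}
```

On the trace 0, 4, 0, 4, 0, 4 (both blocks map to slot 0) a plain direct-mapped cache misses on every access, while this sketch only pays the two compulsory misses because the evicted block is always found in the victim buffer.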

13
4. Pseudo-Associativity
  • How to combine the fast hit time of a
    direct-mapped cache with the lower conflict misses
    of a 2-way SA cache?
  • Hit ??????????
  • Miss ???????? pseudo cache
  • ????????????? pseudo hit ????????? miss
  • ??????????????pseudo miss ????????????????????
  • Drawback
  • The CPU pipeline is hard to design if a hit takes 1
    or 2 cycles
  • Better suited to L2; used in MIPS R10000, UltraSPARC

14
5. Hardware Prefetching
  • Instruction Prefetching
  • Alpha 21064 fetches 2 blocks on a miss
  • The extra block is placed in a stream buffer
  • On a miss, check the stream buffer
  • Data Prefetching
  • From a 4 KB cache:
  • 1 stream buffer caught 25% of misses
  • 4 streams caught 43%
  • From 2 64 KB, 4-way set-associative caches:
  • 8 streams caught 50% to 70%
  • Prefetching relies on extra memory bandwidth
    that can be used without penalty

15
6. Software Prefetching Data
  • Data ????????
  • Data Prefetch
  • Load data into register (HP PA-RISC loads)
  • Cache Prefetch
  • Load into cache (MIPS IV, PowerPC, SPARC V9)
  • Issuing prefetch instructions takes time
  • Is the cost of prefetch issues < savings in reduced
    misses?
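
A cache prefetch of the kind listed above can be sketched in C with the `__builtin_prefetch` intrinsic provided by GCC and Clang. This is a minimal sketch, not something taken from the slides: the function name and the prefetch distance of 16 elements are hypothetical, and a real distance has to be tuned so that the cost of issuing prefetches stays below the savings in misses.

```c
#include <assert.h>
#include <stddef.h>

#define PREFETCH_DISTANCE 16  /* hypothetical: how many elements to run ahead */

/* Sum an array while asking the cache to start loading data a few
 * iterations before it is needed. The prefetch is only a hint and
 * cannot fault; the sum is the same with or without it. */
static long sum_with_prefetch(const int *a, size_t n)
{
    long sum = 0;
    assert(a != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
#if defined(__GNUC__) || defined(__clang__)
        if (i + PREFETCH_DISTANCE < n)
            /* arguments: address, 0 = prefetch for read, 1 = low temporal locality */
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 1);
#endif
        sum += a[i];
    }
    return sum;
}
```

On compilers without the intrinsic the loop degrades to a plain sum, which makes it easy to measure whether the prefetches pay for themselves.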

16
7. Compiler Optimizations
  • Misses reduced by 75% in software on an 8 KB
    direct-mapped cache with 4-byte blocks
  • Instructions
  • Reorder code in memory to reduce conflict misses
  • Data
  • Merging Arrays
  • Loop Interchange
  • Loop Fusion
  • Blocking

17
Example: Merging Arrays
    /* Before: 2 sequential arrays */
    int val[SIZE];
    int key[SIZE];

    /* After: 1 array of structures */
    struct merge {
        int val;
        int key;
    };
    struct merge merged_array[SIZE];
  • Reduces conflicts between val and key; improves
    spatial locality

18
Example: Loop Interchange
    /* Before */
    for (k = 0; k < 100; k = k+1)
        for (j = 0; j < 100; j = j+1)
            for (i = 0; i < 5000; i = i+1)
                x[i][j] = 2 * x[i][j];

    /* After */
    for (k = 0; k < 100; k = k+1)
        for (i = 0; i < 5000; i = i+1)
            for (j = 0; j < 100; j = j+1)
                x[i][j] = 2 * x[i][j];
  • Sequential accesses instead of striding through
    memory every 100 words
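
The 100-word stride follows from C's row-major layout. The sketch below measures the distance between consecutive inner-loop accesses in both loop orders, assuming the array is declared as `int x[5000][100]` (the declaration is not shown on the slide, so the shape is inferred from the loop bounds); the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 5000
#define COLS 100

static int x[ROWS][COLS];   /* assumed declaration, inferred from the loop bounds */

/* Distance between two elements of x, measured in words (ints). */
static size_t stride_words(const int *from, const int *to)
{
    return (size_t)((const char *)to - (const char *)from) / sizeof(int);
}

/* Before: the inner loop varies i, so successive accesses are x[i][j], x[i+1][j]. */
static size_t stride_before(void) { return stride_words(&x[0][0], &x[1][0]); }

/* After: the inner loop varies j, so successive accesses are x[i][j], x[i][j+1]. */
static size_t stride_after(void)  { return stride_words(&x[0][0], &x[0][1]); }
```

With this layout `stride_before()` is 100 words and `stride_after()` is 1 word, so after the interchange every word of a fetched block is used before the block is replaced.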

19
Example: Loop Fusion
    /* Before */
    for (i = 0; i < N; i = i+1)
        for (j = 0; j < N; j = j+1)
            a[i][j] = 1/b[i][j] * c[i][j];
    for (i = 0; i < N; i = i+1)
        for (j = 0; j < N; j = j+1)
            d[i][j] = a[i][j] + c[i][j];

    /* After */
    for (i = 0; i < N; i = i+1)
        for (j = 0; j < N; j = j+1) {
            a[i][j] = 1/b[i][j] * c[i][j];
            d[i][j] = a[i][j] + c[i][j];
        }
  • 2 misses per access to a and c before fusion vs.
    1 miss per access after

20
Example: Blocking
    /* Before */
    for (i = 0; i < N; i = i+1)
        for (j = 0; j < N; j = j+1) {
            r = 0;
            for (k = 0; k < N; k = k+1)
                r = r + y[i][k] * z[k][j];
            x[i][j] = r;
        }
  • Idea: compute on many BxB submatrices that fit in
    the cache

21
Example: Blocking
    /* After; x must be initialized to zero */
    for (jj = 0; jj < N; jj = jj+B)
        for (kk = 0; kk < N; kk = kk+B)
            for (i = 0; i < N; i = i+1)
                for (j = jj; j < min(jj+B, N); j = j+1) {
                    r = 0;
                    for (k = kk; k < min(kk+B, N); k = k+1)
                        r = r + y[i][k] * z[k][j];
                    x[i][j] = x[i][j] + r;
                }
  • B is called the Blocking Factor
  • Capacity misses go from $2N^3 + N^2$ to $2N^3/B + N^2$
  • POP QUIZ: Conflict misses too?
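
To check that blocking changes only the access order and not the result, the two loop nests can be compared on a small matrix. This is a sketch under stated assumptions: N = 6 and B = 4 are hypothetical values chosen so that B does not divide N, the tile bounds use min(jj+B, N) so that no column is skipped, and `min_int`, `matmul_naive`, and `matmul_blocked` are illustrative names.

```c
#include <assert.h>

#define N 6  /* hypothetical matrix size, kept small for checking */
#define B 4  /* blocking factor; deliberately not a divisor of N */

static int min_int(int a, int b) { return a < b ? a : b; }

/* Unblocked version: x = y * z. */
static void matmul_naive(double y[N][N], double z[N][N], double x[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double r = 0;
            for (int k = 0; k < N; k++)
                r += y[i][k] * z[k][j];
            x[i][j] = r;
        }
}

/* Blocked version: works on one B x B tile of z at a time. */
static void matmul_blocked(double y[N][N], double z[N][N], double x[N][N])
{
    assert(B > 0 && B <= N);
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            x[i][j] = 0;              /* tiles accumulate into x, so start from zero */

    for (int jj = 0; jj < N; jj += B)
        for (int kk = 0; kk < N; kk += B)
            for (int i = 0; i < N; i++)
                for (int j = jj; j < min_int(jj + B, N); j++) {
                    double r = 0;
                    for (int k = kk; k < min_int(kk + B, N); k++)
                        r += y[i][k] * z[k][j];
                    x[i][j] += r;     /* partial sum for this tile */
                }
}
```

Calling both functions on the same y and z should give identical x; in practice B is picked so that the tiles being reused fit in the cache together.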

22
Reducing Misses
  • ????????????? parameter ??????????????????????????
    ????????????????
  • Larger Block Size
  • Higher Associativity
  • Victim Cache
  • Pseudo-Associativity
  • HW Prefetching: Instruction, Data
  • SW Prefetching: Data
  • Compiler Optimizations

23
Improving Cache Performance
  • Reduce miss rate
  • Reduce miss penalty
  • Reduce hit time

24
Reducing Miss Penalty
  • Read priority over write on miss
  • Subblock placement
  • Early Restart and Critical Word First on miss
  • Non-blocking Caches
  • Second Level Cache

25
1. Read Priority over Write on Miss
  • Write through with write buffers
  • Creates RAW conflicts between buffered writes and
    main-memory reads on cache misses
  • Simply waiting for the write buffer to empty
    increases the read miss penalty
  • Check the write buffer contents before the read;
    if there is no conflict, let the read proceed
  • Write Back?
  • A read miss may replace a block that has been
    modified (dirty block)
  • Normal: write the dirty block to memory, then do
    the read
  • Instead: copy the dirty block to a write buffer,
    do the read, then do the write
  • The CPU stalls less because it restarts as soon as
    the read is done

26
2. Subblock Placement
  • Don't have to load the full block on a miss
  • Have valid bits per subblock to indicate which
    subblocks are valid

27
3. Early Restart and Critical Word First
  • Don't wait for the full block to be loaded before
    restarting the CPU
  • Early restart
  • As soon as the requested word arrives, send it to
    the CPU and let the CPU continue execution
  • Critical Word First
  • Request the missed word first from memory and send
    it to the CPU as soon as it arrives
  • Also called wrapped fetch and requested word
    first
  • ??????????????????????
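
The "wrapped fetch" name describes the order in which the words of a block arrive: the requested word first, then the following words, wrapping around to the start of the block. A minimal sketch of that ordering, with a hypothetical function name and the block size in words as a parameter:

```c
#include <assert.h>

/* Fill order[] with the word indices of a block in the order they are
 * fetched under critical word first: the requested (critical) word
 * leads, then the rest of the block follows, wrapping around. */
static void wrapped_fetch_order(int words_per_block, int critical_word, int *order)
{
    assert(words_per_block > 0);
    assert(critical_word >= 0 && critical_word < words_per_block);
    for (int k = 0; k < words_per_block; k++)
        order[k] = (critical_word + k) % words_per_block;
}
```

For an 8-word block with a miss on word 5, the fetch order is 5, 6, 7, 0, 1, 2, 3, 4; the CPU can restart as soon as word 5 arrives instead of waiting for the whole block.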

28
4. Non-blocking Caches
  • Non-blocking cache or lockup-free cache
  • Lets the cache keep supplying data (cache hits)
    while a miss is being serviced
  • Requires out-of-order execution
  • hit under miss
  • Reduces the effective miss penalty by doing useful
    work during a miss
  • hit under multiple miss or miss under miss
  • May further lower the effective miss penalty by
    overlapping multiple misses
  • ????????????????????????
  • Requires multiple memory banks

29
5. Second Level Cache
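  • L2 Equations
  • $\text{AMAT} = \text{Hit Time}_{L1} + \text{Miss Rate}_{L1} \times \text{Miss Penalty}_{L1}$
  • $\text{Miss Penalty}_{L1} = \text{Hit Time}_{L2} + \text{Miss Rate}_{L2} \times \text{Miss Penalty}_{L2}$
  • $\text{AMAT} = \text{Hit Time}_{L1} + \text{Miss Rate}_{L1} \times (\text{Hit Time}_{L2} + \text{Miss Rate}_{L2} \times \text{Miss Penalty}_{L2})$
  • Definitions
  • Local miss rate
  • Misses in this cache divided by the total number of
    memory accesses to this cache ($\text{Miss Rate}_{L2}$)
  • Global miss rate
  • Misses in this cache divided by the total number of
    memory accesses generated by the CPU
    ($\text{Miss Rate}_{L1} \times \text{Miss Rate}_{L2}$)
  • Global miss rate is what matters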
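
The two-level AMAT equation and the local/global distinction can be checked with a few lines of C. This is a sketch with hypothetical function names; the numbers in the usage note are illustrative and do not come from the slides.

```c
#include <assert.h>

/* AMAT = HitTime_L1 + MissRate_L1 * (HitTime_L2 + MissRate_L2 * MissPenalty_L2),
 * where MissRate_L2 is the local miss rate of L2. Times are in clock cycles. */
static double amat_two_level(double hit_time_l1, double miss_rate_l1,
                             double hit_time_l2, double local_miss_rate_l2,
                             double miss_penalty_l2)
{
    assert(miss_rate_l1 >= 0.0 && miss_rate_l1 <= 1.0);
    assert(local_miss_rate_l2 >= 0.0 && local_miss_rate_l2 <= 1.0);
    double miss_penalty_l1 = hit_time_l2 + local_miss_rate_l2 * miss_penalty_l2;
    return hit_time_l1 + miss_rate_l1 * miss_penalty_l1;
}

/* Global L2 miss rate: fraction of all CPU accesses that miss in both levels. */
static double global_miss_rate_l2(double miss_rate_l1, double local_miss_rate_l2)
{
    return miss_rate_l1 * local_miss_rate_l2;
}
```

With an assumed 1-cycle L1 hit, 5% L1 miss rate, 10-cycle L2 hit, 20% local L2 miss rate, and 100-cycle L2 miss penalty, the AMAT is 1 + 0.05 × (10 + 0.2 × 100) = 2.5 cycles, and the global L2 miss rate is 0.05 × 0.2 = 1%, even though the local L2 miss rate looks much worse.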

30
Reducing Misses in L2
  • Reducing Miss Rate
  • Larger Block Size
  • Higher Associativity
  • Victim Cache
  • Pseudo-Associativity
  • HW Prefetching: Instruction, Data
  • SW Prefetching: Data
  • Compiler Optimizations

31
POP QUIZ
  • ???????????? L3 ??????????????????????????????????
    ???
  • ????????? L1, L2, ??? L3 ???????????????????????
  • ??????????? Miss penalty ??????? L2 ??? L3
  • ??????????? Miss rate ??? L2 ??? L3
  • ???????? L3 ??????? ?????????

32
????????? L3
  • Miss penalty ????????? (????????????????)
  • ??? L2 ?????????????? L3 ?????????????????????????
    ??

33
Summary: Reducing Miss Penalty
  • Reducing miss penalty
  • Read priority over write on miss
  • Subblock placement
  • Early Restart and Critical Word First on miss
  • Non-blocking Caches (Hit under Miss, Miss under
    Miss)
  • Second Level Cache

34
Improving Cache Performance
  • Reduce miss rate
  • Reduce miss penalty
  • Reduce hit time

35
Reducing Hit Time
  • Small and simple caches
  • Avoiding address translation
  • Pipelining Writes

36
1. Small and Simple Caches
  • Alpha 21164
  • 8 KB instruction cache
  • 8 KB data cache
  • 96 KB L2 cache (shared by instructions and data)
  • Direct mapped, on chip

37
2. Avoiding Address Translation (1)
  • Send the virtual address to the cache
  • Called a Virtually Addressed Cache or Virtual
    Cache vs. Physical Cache
  • Every time the process is switched, the cache must
    be flushed
  • Cost: time to flush + compulsory misses
  • Two different virtual addresses may map to the same
    physical address (aliases or synonyms)
  • I/O must interact with the cache, so it needs the
    virtual address

38
2. Avoiding Address Translation (2)
  • Solution to aliases
  • HW: guarantee that every cache block has a unique
    physical address
  • SW: guarantee that the lower n bits of aliased
    addresses are the same (page coloring)
  • Solution to cache flush
  • Add a process identifier tag that identifies the
    process as well as the address within the process
39
Virtually Addressed Caches
[Figure: three cache organizations, each built from CPU, TB (translation buffer), cache tags, and MEM, with paths labeled VA or PA]
  • Conventional Organization
  • Virtually Addressed Cache: translate only on
    miss; synonym problem
  • Overlap cache access with VA translation:
    requires the cache index to remain invariant
    across translation
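
The invariance requirement for the overlapped organization can be stated as a size check: the index and block-offset bits must all fall inside the page offset, which holds when the cache size divided by the associativity is no larger than the page size. A minimal sketch with a hypothetical function name; the cache and page sizes in the usage note are illustrative.

```c
#include <assert.h>

/* Returns 1 if the cache index is unchanged by address translation,
 * i.e. one way of the cache (cache_bytes / ways) fits within a page,
 * so the index can be taken from the untranslated page-offset bits. */
static int index_invariant_under_translation(unsigned long cache_bytes,
                                             unsigned long ways,
                                             unsigned long page_bytes)
{
    assert(ways > 0 && page_bytes > 0);
    assert(cache_bytes % ways == 0);
    return cache_bytes / ways <= page_bytes;
}
```

With 8 KB pages, an 8 KB direct-mapped cache passes the check, a 64 KB 4-way cache (16 KB per way) does not, and a 64 KB 8-way cache does. This is why raising associativity, or page coloring in software, is used to grow such a cache beyond the page size.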
40
3. Pipelined Writes
  • Pipeline the tag check and the cache update as
    separate stages
  • Write n: tag check
  • Write n-1: cache update

41
Summary: Cache Optimization
  (MR = miss rate, MP = miss penalty, HT = hit time)

    Technique                            Improves   Complexity
    Larger Block Size                    MR         0
    Higher Associativity                 MR         1
    Victim Caches                        MR         2
    Pseudo-Associative Caches            MR         2
    HW Prefetching of Instr/Data         MR         2
    Compiler Controlled Prefetching      MR         3
    Compiler Reduce Misses               MR         0
    Priority to Read Misses              MP         1
    Subblock Placement                   MP         1
    Early Restart and Critical Word 1st  MP         2
    Non-Blocking Caches                  MP         3
    Second Level Caches                  MP         2
    Small Simple Caches                  HT         0
    Avoiding Address Translation         HT         2
    Pipelining Writes                    HT         1