Reconfigurable Caches and their Application to Media Processing
1
Reconfigurable Caches and their Application to
Media Processing
  • Parthasarathy (Partha) Ranganathan, Dept. of Electrical and Computer Engineering, Rice University, Houston, Texas
  • Sarita Adve, Dept. of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois
  • Norman P. Jouppi, Western Research Laboratory, Compaq Computer Corporation, Palo Alto, California
2
Motivation (1 of 2)
  • Different workloads on general-purpose processors
  • Scientific/engineering, databases, media
    processing, ...
  • Widely different characteristics
  • Challenge for future general-purpose systems
  • Use most transistors effectively for all workloads

3
Motivation (2 of 2)
  • Challenge for future general-purpose systems
  • Use most transistors effectively for all
    workloads
  • 50 to 80% of processor transistors devoted to
    cache
  • Very effective for engineering and database
    workloads
  • BUT large caches often ineffective for media
    workloads
  • Streaming data and large working sets [ISCA
    1999]
  • Can we reuse cache transistors for other
    useful work?

4
Contributions
  • Reconfigurable Caches
  • Flexibility to reuse cache SRAM for other
    activities
  • Several applications possible
  • Simple organization and design changes
  • Small impact on cache access time

5
Contributions
  • Reconfigurable Caches
  • Flexibility to reuse cache SRAM for other
    activities
  • Several applications possible
  • Simple organization and design changes
  • Small impact on cache access time
  • Application for media processing
  • e.g., instruction reuse: reuse memory for
    computation
  • 1.04X to 1.20X performance improvement

6
Outline for Talk
  • Motivation
  • Reconfigurable caches
  • Key idea
  • Organization
  • Implementation and timing analysis
  • Application for media processing
  • Summary and future work

7
Reconfigurable Caches Key Idea
Key idea: reuse cache transistors!
  • Dynamically divide SRAM into multiple partitions
  • Use partitions for other useful activities

⇒ Cache SRAM useful for both conventional and
media workloads
8
Reconfigurable Cache Uses
  • Number of different uses for reconfigurable
    caches
  • Optimizations using lookup tables to store
    patterns
  • Instruction reuse, value prediction, address
    prediction, ...
  • Hardware and software prefetching
  • Caching of prefetched lines
  • Software-controlled memory
  • QoS guarantees, scratch memory area

⇒ Cache SRAM useful for both conventional and
media workloads
9
Key Challenges
  • How to partition SRAM?
  • How to address the different partitions as they
    change?
  • Minimize impact on cache access (clock cycle) time
  • Associativity-based partitioning

10
Conventional Cache Organization
11
Associativity-Based Partitioning
Partition at the granularity of ways; requires multiple
data paths and additional state/logic
12
Reconfigurable Cache Organization
  • Associativity-based partitioning
  • Simple - small changes to conventional caches
  • But number and granularity of partitions depend
    on associativity
  • Alternate approach: overlapped-wide-tag
    partitioning
  • More general, but slightly more complex
  • Details in paper

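The associativity-based scheme above can be illustrated with a minimal Python sketch (not the paper's design; the class and parameter names are illustrative): ways of a set-associative cache are assigned to partitions, and a lookup for a given partition searches only the ways that partition owns.

```python
# Illustrative sketch of associativity-based partitioning: one shared
# SRAM array whose ways are divided among partitions at way granularity.
# All names here are assumptions for illustration, not the paper's RTL.

class WayPartitionedCache:
    def __init__(self, num_sets=4, num_ways=4):
        self.num_sets = num_sets
        # tags[set][way]; None marks an invalid line (data omitted)
        self.tags = [[None] * num_ways for _ in range(num_sets)]
        # Initially every way belongs to partition 0 (a conventional cache)
        self.way_to_partition = [0] * num_ways

    def repartition(self, ways_per_partition):
        """Reassign ways to partitions, e.g. [2, 2] splits a 4-way
        cache into two 2-way partitions."""
        assert sum(ways_per_partition) == len(self.way_to_partition)
        self.way_to_partition = [
            p for p, n in enumerate(ways_per_partition) for _ in range(n)
        ]

    def lookup(self, partition, address):
        """Hit iff the tag matches in a way owned by this partition."""
        index = address % self.num_sets
        tag = address // self.num_sets
        for way, owner in enumerate(self.way_to_partition):
            if owner == partition and self.tags[index][way] == tag:
                return True
        return False

    def fill(self, partition, address):
        """Place a line in the first way owned by the partition
        (no replacement policy, for brevity)."""
        index = address % self.num_sets
        tag = address // self.num_sets
        for way, owner in enumerate(self.way_to_partition):
            if owner == partition:
                self.tags[index][way] = tag
                return
```

A fill into partition 0 is then invisible to partition 1, since each partition decodes only its own ways.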
13
Other Organizational Choices (1 of 2)
  • Ensuring consistency of data at repartitioning
  • Cache scrubbing: flush data at repartitioning
    intervals
  • Lazy transitioning: augment state with partition
    information
  • Addressing of partitions - software (ISA) vs.
    hardware

14
Other Organizational Choices (2 of 2)
  • Method of partitioning - hardware vs. software
    control
  • Frequency of partitioning - frequent vs.
    infrequent
  • Level of partitioning - L1, L2, or lower levels
  • Tradeoffs based on application requirements

15
Outline for Talk
  • Motivation
  • Reconfigurable caches
  • Key idea
  • Organization
  • Implementation and timing analysis
  • Application for media processing
  • Summary and future work

16
Conventional Cache Implementation
[Diagram: conventional cache implementation — address feeding decoders; tag and data arrays with word lines and bit lines; column muxes, sense amps, comparators, and mux drivers; output drivers producing data and valid output]
  • Tag and data arrays split into multiple
    sub-arrays
  • to reduce/balance length of word lines and bit
    lines

17
Changes for Reconfigurable Cache
[Diagram: reconfigurable cache implementation — same structure as the conventional cache, with decoders, sense amps, and output drivers replicated per partition (1..NP) and partition-select signals added]
  • Associate sub-arrays with partitions
  • Constraint on minimum number of sub-arrays
  • Additional multiplexors, drivers, and wiring

18
Impact on Cache Access Time
  • Sub-array-based partitioning
  • Multiple simultaneous accesses to SRAM array
  • No additional data ports
  • Timing analysis methodology
  • CACTI analytical timing model for cache access
    time (Compaq WRL)
  • Extended to model reconfigurable caches
  • Experiments varying cache sizes, partitions,
    technology, ...

19
Impact on Cache Access Time
  • Cache access time
  • Comparable to base (within 1-4%) for few
    partitions (2)
  • Higher for more partitions, especially with small
    caches
  • But still within 6% for large caches
  • Impact on clock frequency likely to be even lower

20
Outline for Talk
  • Motivation
  • Reconfigurable caches
  • Application for media processing
  • Instruction reuse with media processing
  • Simulation results
  • Summary and future work

21
Application for Media Processing
  • Instruction reuse/memoization [Sodani and Sohi,
    ISCA 1997]
  • Exploits value redundancy in programs
  • Store instruction operands and result in reuse
    buffer
  • If later instruction and operands match in reuse
    buffer,
  • skip execution
  • read answer from reuse buffer

[Diagram: reuse buffer mapped onto cache partitions]
Few changes for implementation with
reconfigurable caches
22
Simulation Methodology
  • Detailed simulation using RSIM (Rice)
  • User-level execution-driven simulator
  • Media processing benchmarks
  • JPEG image encoding/decoding
  • MPEG video encoding/decoding
  • GSM speech decoding and MPEG audio decoding
  • Speech recognition and synthesis

23
System Parameters
  • Modern general-purpose processor with ILP + media
    extensions
  • 1 GHz, 8-way issue, OOO, VIS, prefetching
  • Multi-level memory hierarchy
  • 128KB 4-way associative 2-cycle L1 data cache
  • 1 MB 4-way associative 20-cycle L2 cache
  • Simple reconfigurable cache organization
  • 2 partitions at L1 data cache
  • 64 KB data cache, 64 KB instruction reuse buffer
  • Partitioning at start of application in software

24
Impact of Instruction Reuse
[Chart: normalized execution time with instruction reuse — JPEG decode 92, MPEG decode 89, speech synthesis 84 (base = 100 for each)]
  • Performance improvements for all applications
    (1.04X to 1.20X)
  • Use memory to reduce compute bottleneck
  • Greater potential with aggressive design (details
    in paper)

25
Summary
  • Goal: use cache transistors effectively for all
    workloads
  • Reconfigurable caches: flexibility to reuse cache
    SRAM
  • Simple organization and design changes
  • Small impact on cache access time
  • Several applications possible
  • Instruction reuse - reuse memory for computation
  • 1.04X to 1.20X performance improvement
  • More aggressive reconfiguration currently under
    investigation

26
  • More information available at
  • http://www.ece.rice.edu/parthas
  • parthas@rice.edu