Code Transformations to Improve Memory Parallelism

1
Code Transformations to Improve Memory Parallelism
  • Vijay S. Pai and Sarita Adve
  • MICRO-32, 1999

2
Motivation and Solutions
  • The memory system is the bottleneck in ILP-based
    systems
  • Solution: overlap multiple read misses (the
    dominant source of memory stalls) within the same
    instruction window, while preserving cache
    locality
  • Too few independent load misses occur in a
    single instruction window
  • Solution: read miss clustering enabled by code
    transformations, e.g., unroll-and-jam
  • The code transformation should be automated
  • Solution: map the memory parallelism problem to
    floating-point pipelining (D. Callahan et al.
    Estimating Interlock and Improving Balance for
    Pipelined Machines. Journal of Parallel and
    Distributed Computing, Aug. 1988)

3
  • Unroll-and-jam
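
As a rough illustration of the transformation named on this slide, the C sketch below applies unroll-and-jam to a simple row-sum loop nest. The function names, the flat row-major layout, and the unroll degree of 4 are assumptions made for the example, not details taken from the presentation. The outer loop over rows is unrolled and the copies are fused (jammed) into a single inner loop, so each inner iteration touches four different rows and their read misses can fall within the same instruction window.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline: the inner loop walks one row at a time, so consecutive
   read misses go to the same row, one cache line apart. */
void row_sums_baseline(const double *a, size_t rows, size_t cols, double *out)
{
    assert(a != NULL && out != NULL);
    for (size_t j = 0; j < rows; j++) {
        double s = 0.0;
        for (size_t i = 0; i < cols; i++)
            s += a[j * cols + i];
        out[j] = s;
    }
}

/* Unroll-and-jam by 4 (an assumed degree): the outer loop is unrolled
   and the four inner-loop copies are fused into one body, so loads
   from four different rows sit close together in the instruction
   stream. */
void row_sums_unroll_and_jam(const double *a, size_t rows, size_t cols,
                             double *out)
{
    assert(a != NULL && out != NULL);
    size_t j = 0;
    for (; j + 4 <= rows; j += 4) {
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
        for (size_t i = 0; i < cols; i++) {
            s0 += a[(j + 0) * cols + i];
            s1 += a[(j + 1) * cols + i];
            s2 += a[(j + 2) * cols + i];
            s3 += a[(j + 3) * cols + i];
        }
        out[j + 0] = s0;
        out[j + 1] = s1;
        out[j + 2] = s2;
        out[j + 3] = s3;
    }
    /* Remainder rows when rows is not a multiple of 4. */
    for (; j < rows; j++) {
        double s = 0.0;
        for (size_t i = 0; i < cols; i++)
            s += a[j * cols + i];
        out[j] = s;
    }
}
```

Each row is still summed in the same order, so both functions produce identical results. Whether the clustered misses actually overlap depends on the hardware, for example on a non-blocking cache and a large enough instruction window.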

4
  • Apply code transformations in a compiler
  • Automatic unroll-and-jam transformation
  • Locality analysis to determine leading references
    (M. E. Wolf and M. S. Lam. A Data Locality
    Optimizing Algorithm. PLDI 1991)
  • Analysis of dependences that limit memory parallelism:
  • Cache-line dependences
  • Address dependences
  • Window constraints
  • Experimental methodology
  • Environment: Rice Simulator for ILP
    Multiprocessors (RSIM)
  • Workload: Latbench and five scientific applications
  • Miss clustering incorporated by hand
  • Results
  • 9-39% reduction in multiprocessor execution time
  • 11-48% reduction in uniprocessor execution time
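
The address dependences listed on this slide can be illustrated with pointer chasing. The C sketch below is hypothetical: the `node` type and both function names are invented for the example and are not taken from the presentation or from Latbench. Within one chain, the address of each load comes from the previous load, so those misses cannot overlap. Walking two independent chains in the same loop body places two independent misses close together.

```c
#include <assert.h>
#include <stddef.h>

typedef struct node {
    struct node *next;
    long value;
} node;

/* Baseline: chains are walked one after another. Each load of
   p->next depends on the previous load (an address dependence),
   so misses within a chain are serialized. */
long walk_chains_baseline(node *const heads[], size_t nchains)
{
    assert(heads != NULL);
    long total = 0;
    for (size_t c = 0; c < nchains; c++)
        for (const node *p = heads[c]; p != NULL; p = p->next)
            total += p->value;
    return total;
}

/* Jammed by 2 (an assumed degree): two independent chains advance
   in the same loop body, so one miss from each chain can be
   outstanding at the same time. */
long walk_chains_jammed(node *const heads[], size_t nchains)
{
    assert(heads != NULL);
    long total = 0;
    size_t c = 0;
    for (; c + 2 <= nchains; c += 2) {
        const node *p = heads[c];
        const node *q = heads[c + 1];
        while (p != NULL && q != NULL) {
            total += p->value + q->value;
            p = p->next;
            q = q->next;
        }
        /* Finish whichever chain is longer. */
        for (; p != NULL; p = p->next) total += p->value;
        for (; q != NULL; q = q->next) total += q->value;
    }
    /* Leftover chain when nchains is odd. */
    for (; c < nchains; c++)
        for (const node *p = heads[c]; p != NULL; p = p->next)
            total += p->value;
    return total;
}
```

The transformation does not remove the dependence inside a chain. It only interleaves work from chains that are independent of each other, which is why the analysis has to establish that independence first.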

5
  • Strengths
  • Good performance
  • Weaknesses
  • Validity of the transformations is not established

6
  • Questions to discuss
  • What hardware support is needed to overlap
    multiple read misses?
  • Why use unroll-and-jam instead of the
    strip-mine-and-interchange transformation?
  • What do you think of the future work?
  • V. S. Pai and S. Adve. Improving Software
    Prefetching with Transformations to Increase
    Memory Parallelism.
    http://www.ece.rice.edu/rsim/pubs/TR9910.ps
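
As background for the question about strip-mine and interchange, the C sketch below shows what that alternative looks like on a simple row-sum loop nest. The function names, the flat row-major layout, and the `strip` parameter are assumptions made for the example. It does not try to answer the question. The outer loop over rows is strip-mined, and the strip loop is then interchanged to become the innermost loop.

```c
#include <assert.h>
#include <stddef.h>

/* Reference loop nest: rows outer, columns inner. */
void row_sums_reference(const double *a, size_t rows, size_t cols, double *out)
{
    assert(a != NULL && out != NULL);
    for (size_t j = 0; j < rows; j++) {
        out[j] = 0.0;
        for (size_t i = 0; i < cols; i++)
            out[j] += a[j * cols + i];
    }
}

/* Strip-mine and interchange: the row loop is split into strips of
   `strip` rows (an assumed parameter), and the loop over rows within
   a strip is moved inside the column loop. Loads from up to `strip`
   different rows then occur in consecutive iterations. */
void row_sums_strip_mine_interchange(const double *a, size_t rows,
                                     size_t cols, double *out, size_t strip)
{
    assert(a != NULL && out != NULL && strip > 0);
    for (size_t j = 0; j < rows; j++)
        out[j] = 0.0;
    for (size_t jj = 0; jj < rows; jj += strip) {
        size_t jend = (jj + strip < rows) ? jj + strip : rows;
        for (size_t i = 0; i < cols; i++)
            for (size_t j = jj; j < jend; j++)
                out[j] += a[j * cols + i];
    }
}
```

Fully unrolling the innermost `j` loop for a fixed strip length gives the unroll-and-jam form, so the two transformations differ mainly in whether the short inner loop remains as a loop or is expanded into straight-line code.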