1
OR682/Math685/CSI700
  • Lecture 13
  • Fall 2000

2
Today's Topics
  • Memory
  • Chapter 3
  • Floating-Point Numbers (not in lecture)
  • Chapter 4
  • Much of this material discussed earlier
  • Compilers
  • Chapter 5

3
Memory
  • Memory access can slow computations
  • slower memory units
  • limited connections to memory
  • hierarchies of memory
  • Understanding memory can help you write better
    software (and help you diagnose performance
    problems)

4
Memory Technology
  • DRAM (needs dynamic refreshing)
  • lower cost
  • less space on chip
  • less power and heat
  • slower
  • SRAM (static: no refreshing)
  • higher speed, cost, heat, power, and space

5
Access Time
  • Time to read/write a memory location
  • Cycle time: time for a repeat reference
  • may be longer (memory needs to recover)
  • may be shorter (pipelining, multiple memory
    banks, etc.)
  • Different levels of memory have different speeds
  • registers, cache, main memory, virtual memory

6
Registers
  • Registers are part of the main processor (and just as fast)
  • Very limited storage (perhaps a few dozen
    locations)
  • Compilers must decide how to use registers
    effectively
  • re-use of numbers without storing to memory

7
Cache
  • Small amounts of (fast) SRAM
  • Layers of cache (on-chip and off-chip)
  • Want processor and compiler to guess correctly
    which memory locations to put in cache
  • hit rate: percentage of memory references that
    are satisfied by the cache
  • Cache memory must be kept consistent
  • Cache divided into lines to reduce effort of
    tracking locations in memory

Fortran files: ex37.f, ex37a.f
8
Timing Output
  • time f1
  • 0.14u 0.14s 0:00 43% 0+47k 76io 0pf+0w
  • u: user time (seconds)
  • s: system time
  • 0:00: elapsed time
  • %: CPU time as % of elapsed time
  • k: text plus data space
  • io: # of blocks (read/write)
  • pf: page faults, w: # of process swaps

9
Cache Misses
  • Organization of loops
  • SUM = SUM + A(I,J)
  • Traversing a linked list
  • Indirect addressing
  • SUM = SUM + A(IND(J))
  • Thrashing: a lot of cache misses (and slow
    performance)

10
Cache Organization
  • Direct-mapped cache
  • Each cache location maps onto a fixed set of
    memory locations (only one of which is stored in
    the cache)
  • problems if the program needs to access two such
    locations simultaneously
  • Fully associative cache
  • any memory location can map into any cache line
  • remove least recently used line as necessary

11
Virtual Memory
  • Separate physical space (actual memory) from
    address space (memory that can be referenced)
  • Extra memory on auxiliary devices (e.g., disks)
  • Memory is often divided into units called pages
    (about 1Mbyte, say)

12
Virtual Memory (cont.)
  • Page table: translates from address space to
    physical space (and yet another cache to watch
    for repeated usage)
  • Page fault: the desired page is on an auxiliary
    device
  • computer/compiler must manage memory
  • depends on number of users
  • can seriously degrade performance

13
What Can You Do?
  • Localized use of memory
  • Program with BLAS (see Lecture 3)

Fortran files: ex38.f, ex38a.f,
http://www.netlib.org/lapack/index.html
14
What a Compiler Does
  • Focus: optimization of machine instructions
  • Convert high-level language into fastest possible
    machine language (in an accurate way)
  • Essential for getting good performance from
    complicated processors and memory
  • May require human guidance

15
Choice of Language
  • C, C++
  • pointers (unpredictable before run time)
  • harder to optimize
  • Fortran
  • older, standardized Fortran (Fortran 77)
  • Fortran 90, etc.
  • better data structures, but ...

16
Compilation Process
  • Preprocessing (including files, text
    substitutions, etc.)
  • Lexical analysis (identify variables, constants,
    comments, ...)
  • Parsing (analyze syntax) into intermediate
    language suitable for optimization
  • Optimization (one or more passes)
  • Translation to assembly language

17
Levels of Optimization
  • Often an option at compile time
  • basic optimizations described below
  • interprocedural (across subroutines)
  • runtime profile
  • floating-point optimizations (e.g., algebraic
    transformations)
  • data flow analysis (identify parallelism)
  • advanced (vectorization, parallelization, data
    decomposition)

18
Basic Optimization
  • Copy propagation
  • Change
  • X = Y
  • Z = 1.0 + X
  • to
  • X = Y
  • Z = 1.0 + Y
  • so statements can execute simultaneously

19
Constant Folding
  • Look for disguised constants
  • INTEGER I,K
  • PARAMETER (I=100)
  • K = 200
  • J = I + K

20
Strength Reduction
  • Change
  • Y = X**2
  • J = K**2
  • to
  • Y = X*X
  • J = K*K
  • for faster execution

21
Loop Invariants
  • Change
  • DO I=1,N
  • A(I)=B(I)+C*D
  • E=G(K)
  • ENDDO
  • to
  • TEMP=C*D
  • DO I=1,N
  • A(I)=B(I)+TEMP
  • ENDDO
  • E=G(K)

22
Other Techniques
  • Dead code removal
  • between procedures?
  • Variable renaming (when variables are re-used)
  • to allow more flexibility in register, memory use
  • Common sub-expressions
  • D = C * (A + B)
  • E = (A + B)/2

23
Object Code Generation
  • Specific to the target computer processor
  • Compiler may have to
  • manage use of registers
  • schedule instructions
  • distribute resources
  • balance the instruction mix
  • exploit fast operations (e.g., for loop
    indices)
  • exploit parallelism

24
For Next Class
  • Homework: see web site
  • Reading:
  • Dowd: chapters 6, 7, and 8