Workload Characteristics and Representative Workloads - PowerPoint PPT Presentation

1
Workload Characteristics and Representative
Workloads
  • David Kaeli
  • Department of Electrical and Computer Engineering
  • Northeastern University
  • Boston, MA
  • kaeli@ece.neu.edu

2
Overview
  • When we want to collect profiles to be used in
    the design of a next-generation computing system,
    we need to be very careful that we capture a
    representative sample in our profile
  • Workload characteristics allow us to better
    understand the content of the samples which we
    collect
  • We need to select programs to study which
    represent the class of applications we will
    eventually run on the target system

3
Examples of Workload Characteristics
  • Instruction mix
  • Static vs. dynamic instruction count
  • Working set behavior
  • Control flow behavior
  • Inter-reference gap model (temporal locality)
  • Database size
  • Address and value predictability
  • Application/library/OS breakdown
  • Heap vs. stack allocation
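Several of the characteristics above fall straight out of a dynamic trace. As a minimal sketch (the trace and the instruction class labels are hypothetical), instruction mix is just a normalized histogram of executed instruction classes:

```python
from collections import Counter

def instruction_mix(trace):
    """Return the fraction of the dynamic trace taken by each instruction class."""
    counts = Counter(trace)
    total = len(trace)
    return {cls: n / total for cls, n in counts.items()}

# Hypothetical dynamic trace: one class label per executed instruction
trace = ["load", "alu", "alu", "store", "branch", "alu", "load", "alu"]
mix = instruction_mix(trace)   # here, half the dynamic instructions are ALU ops
```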

4
Benchmarks
  • Real or synthetic programs/applications used to
    exercise a hardware system, generally
    representing a range of behaviors found in a
    particular class of applications
  • Benchmark classes include
  • Toy benchmarks: Tower of Hanoi, qsort, fibo
  • Synthetic benchmarks: Dhrystone, Whetstone
  • Embedded: EEMBC, UTDSP
  • CPU benchmarks: SPECint/fp
  • Internet benchmarks: SPECjAppServer, SPECjvm
  • Commercial benchmarks: TPC, SAP, SPECjAppServer
  • Supercomputing: Perfect Club, SPLASH, Livermore
    Loops

5
SPEC Benchmarks Presentation
6
Is there a way to reduce the runtime of SPEC
while maintaining representativeness?
  • MinneSPEC (U. of Minn.): using gprof statistics
    about the runtime of SPEC and various
    SimpleScalar simulation results (instruction mix,
    cache misses, etc.), we can capture statistically
    similar, though significantly shorter, runs of
    the programs
  • Provides three input sets that will run in
  • A few minutes
  • A few hours
  • A few days
  • Published in IEEE Computer Architecture Letters

7
How many programs do we need?
  • Does the set of applications capture enough
    variation to be representative of the entire
    class of workloads?
  • Should we consider using multiple benchmark
    suites to factor out similarities in programming
    styles?
  • Can we utilize workload characteristics to
    identify the particular programs of interest?

8
Example of capturing program behavior
Quantifying Behavioral Differences Between C
and C++ Programs (Calder, Grunwald and Zorn, 1995)
  • C++ is a programming language growing in
    popularity
  • We need to design tomorrow's computer
    architectures based on tomorrow's software
    paradigms
  • How do workload characteristics change as we
    move to new programming paradigms?

9
Example of capturing program behavior
Quantifying Behavioral Differences Between C
and C++ Programs (Calder, Grunwald and Zorn, 1995)
  • First problem: find a set of representative
    programs from both the procedural and OO domains
  • Difficult for OO in 1995
  • Differences between the programming models
  • OO relies heavily on messages and methods
  • Data locality will change due to the bundling
    together of data structures in objects
  • The size of functions will be reduced in OO
  • Polymorphism allows for indirect function
    invocation and runtime method selection
  • OO programs will manage a larger number of
    dynamically allocated objects

10
Address and Value Profiling
  • Lipasti observed that profiled instructions tend
    to repeat their behavior
  • Many addresses are nearly constant
  • Many values do not change between executions of
    the same instruction
  • Can we use profiles to better understand some of
    these behaviors, and then use this knowledge to
    optimize execution?

11
Address Profiling
  • If an address remains unchanged, can we issue
    loads and stores early (similar to prefetching)?
  • Do we even have to issue the load or store if we
    have not modified memory?
  • What are the implications if indirect addressing
    is used?
  • Can we detect patterns (i.e., strides) in the
    address values?
  • Can we do anything smart when we detect pointer
    chasing?

12
Data Value Profiling
  • When we see that particular data values do not
    change, how can we take advantage of this?
  • Lipasti noticed that a large percentage of store
    instructions overwrite memory with the value
    already stored there
  • Can we avoid computing new results if we notice
    that our input operands have not changed?
  • What can we do if a particular operand only
    takes on a small set of values?
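Lipasti's observation above, that many stores overwrite memory with the value already there ("silent stores"), can be checked with a simple trace-driven sketch; the store trace below is hypothetical:

```python
def count_silent_stores(stores):
    """Count stores that write the value memory already holds ('silent stores').

    `stores` is a hypothetical trace of (address, value) pairs, applied in order.
    """
    memory = {}
    silent = 0
    for addr, value in stores:
        if memory.get(addr) == value:
            silent += 1        # this store would leave memory unchanged
        memory[addr] = value
    return silent

# Two of these five stores rewrite the value already in memory
store_trace = [(0x10, 1), (0x10, 1), (0x20, 5), (0x10, 2), (0x20, 5)]
```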

13
Parameter Value Profiling
  • Profile the parameter values passed to functions
  • If these parameters are predictable, we can
    exploit this fact during compilation
  • We can study this on an individual function basis
    or a call site basis
  • Compiler optimizations such as code
    specialization and function cloning can be used
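A minimal sketch of the specialization idea, assuming profiling has shown that one parameter value dominates (the function and the dominant value are hypothetical, purely for illustration): the compiler emits a clone optimized for that value and guards it at the call site:

```python
def scale(values, factor):
    """Generic version: multiply by an arbitrary factor."""
    return [v * factor for v in values]

def scale_by_2(values):
    """Specialized clone: with factor known to be 2, the multiply
    can be strength-reduced to a shift."""
    return [v << 1 for v in values]

def scale_dispatch(values, factor):
    """Guard the compiler would insert at a call site after profiling
    shows factor == 2 on the vast majority of calls."""
    if factor == 2:
        return scale_by_2(values)   # common case: cheap specialized path
    return scale(values, factor)    # rare case: fall back to generic code
```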

14
Parameter Value Profiling
  • We have profiled a set of MS Windows NT 4.0
    desktop applications
  • Word97
  • Foxpro 6.0
  • SQLserver 7.0
  • VC 6.0
  • Excel97
  • Powerpoint97
  • Access97
  • We measured the value predictability of parameter
    values for all non-pointer based parameters

15
Parameter Value Profiling
  • We look for the predictability of parameters
    using
  • Invariance-1: probability that the most frequent
    value is passed
  • Invariance-4: probability that one of the 4 most
    frequent values is passed
  • Parameter values are more predictable on a call
    site basis than on a function basis (e.g., for
    Word97, 8% of the functions pass highly
    predictable parms, whereas when computed on
    individual call sites, over 16% of the call sites
    pass highly predictable parms)
  • Highly predictable means that on over 90% of the
    calls the same value is observed
  • We will discuss how to clone and specialize
    procedures when we discuss profile guided data
    transformations
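The two invariance metrics can be computed directly from a profile of observed parameter values. A sketch, using hypothetical values (not the Word97 data):

```python
from collections import Counter

def invariance(values, k):
    """Probability that a call passes one of the k most frequent values."""
    counts = Counter(values)
    top_k = counts.most_common(k)
    return sum(n for _, n in top_k) / len(values)

# Hypothetical profile of one parameter across 10 calls to a function
observed = [4, 4, 4, 4, 4, 4, 4, 7, 8, 9]
inv1 = invariance(observed, 1)   # most frequent value covers 7 of 10 calls
inv4 = invariance(observed, 4)   # top 4 values cover all 10 calls
```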

16
How can we reduce the runtime of a single
benchmark and still maintain accuracy?
  • Simpoint attempts to collect a set of trace
    samples that best represents the whole execution
    of the program
  • Identifies phase behavior in programs
  • Considers a metric that captures the differences
    between two samples
  • Computes the difference between each pair of
    intervals
  • Selects the interval that is closest to all other
    intervals

17
Simpoint (Calder ASPLOS 2002)
  • Utilize basic block frequencies to build basic
    block vectors (bbf0, bbf1, ..., bbfn-1)
  • Each frequency is weighted by the basic block's
    length
  • The entire vector is normalized by dividing by
    the total number of basic blocks executed
  • Take fixed-length samples (100M instructions)
  • Compare BBVs using
  • Euclidean Distance
  • ED(a, b) = sqrt(sum(i=1..n) (ai - bi)^2)
  • Manhattan Distance
  • MD(a, b) = sum(i=1..n) |ai - bi|
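The normalization step and both distance metrics can be sketched as follows; the basic block frequency vectors are hypothetical:

```python
import math

def normalize(bbv):
    """Normalize a basic block vector so its elements sum to 1."""
    total = sum(bbv)
    return [x / total for x in bbv]

def euclidean(a, b):
    """ED(a, b) = sqrt(sum_i (a_i - b_i)^2)"""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def manhattan(a, b):
    """MD(a, b) = sum_i |a_i - b_i|"""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

# Hypothetical weighted basic block frequencies for two sampling intervals
a = normalize([10, 30, 60])
b = normalize([20, 30, 50])
```

Note that the Manhattan metric needs the absolute value of each element difference; otherwise positive and negative differences cancel.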

18
Simpoint (Calder ASPLOS 2002)
  • Manhattan Distance is used to build a similarity
    matrix
  • N x N matrix, where N is the number of sampling
    intervals in the program
  • Element SM(x, y) is the Manhattan Distance
    between the BBVs at sample offsets x and y
  • Plot the Similarity Matrix as an upper triangle
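A sketch of building the similarity matrix from hypothetical normalized BBVs; the matrix is symmetric with a zero diagonal, which is why only the upper triangle needs to be plotted:

```python
def similarity_matrix(bbvs):
    """N x N matrix of Manhattan distances between interval BBVs."""
    n = len(bbvs)
    sm = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            sm[x][y] = sum(abs(a - b) for a, b in zip(bbvs[x], bbvs[y]))
    return sm

# Hypothetical normalized BBVs for three sampling intervals;
# the first two intervals behave identically, the third differs
bbvs = [[0.5, 0.5], [0.5, 0.5], [0.1, 0.9]]
sm = similarity_matrix(bbvs)
```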

19
Simpoint (Calder ASPLOS 2002)
  • Basic Block Vectors can adequately capture the
    necessary representative characteristics of a
    program
  • Distance metrics can help to identify the most
    representative samples
  • Cluster analysis (k-means) can improve
    representativeness by selecting multiple samples

20
Simpoint (Calder ISPASS 2004)
  • Newer work on Simpoint considers using register
    def-use behavior on an interval basis
  • Also, tracking loop behavior and procedure call
    frequencies provides similar accuracy to using
    basic block vectors

21
Simpoint (Calder ASPLOS 2002)
  • Algorithm overview
  • Profile the program by dividing it into
    fixed-size intervals (e.g., 1M, 10M, 100M insts)
  • Collect frequency vectors (e.g., BBVs, def-use,
    etc.) and compute normalized frequencies
  • Run the k-means clustering algorithm to divide
    the set of intervals into k partitions/sets, for
    values of k from 1 to K
  • Compute a goodness-of-fit of the data for each
    value of k
  • Select the clustering that uses a small k and
    provides a reasonable goodness-of-fit result
  • The result is a selection of representative
    simulation points that best fit the entire
    application execution
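The clustering and simulation-point selection steps can be sketched with a minimal Lloyd's k-means; this is an illustration under simplifying assumptions (fixed iteration count, random initial centroids), not the Simpoint implementation, and the interval vectors are hypothetical:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal Lloyd's k-means over per-interval frequency vectors."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each interval joins its nearest centroid
        for i, p in enumerate(points):
            assign[i] = min(range(k),
                            key=lambda c: sum((pi - ci) ** 2
                                              for pi, ci in zip(p, centroids[c])))
        # Update step: each centroid moves to the mean of its members
        for c in range(k):
            members = [p for i, p in enumerate(points) if assign[i] == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assign, centroids

def representative(points, assign, centroids, cluster):
    """The simulation point: the interval closest to its cluster centroid."""
    members = [i for i in range(len(points)) if assign[i] == cluster]
    return min(members,
               key=lambda i: sum((pi - ci) ** 2
                                 for pi, ci in zip(points[i], centroids[cluster])))

# Hypothetical normalized BBVs for four intervals spanning two program phases
points = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
assign, cents = kmeans(points, k=2)
simpoint = representative(points, assign, cents, assign[2])
```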

22
Simpoint Paper
23
Improvements to Simpoints (KDD05, ISPASS06)
  • Utilize a Mixture of Multinomials instead of
    k-means
  • Assumes the data is generated by a mixture of K
    component density functions
  • We utilize Expectation-Maximization (EM) to find
    a local maximum likelihood for the parameters of
    the density functions, iterating on the E and M
    steps until convergence
  • The number of clusters is selected using the
    Bayesian Information Criterion (BIC) to judge
    goodness of fit
  • A multinomial clustering model for fast
    simulation of computer architecture designs, K.
    Sanghai et al., Proc. of KDD 2005, Chicago, IL,
    pp. 808-813.
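A compact sketch of the EM iteration and BIC scoring described above, under simplifying assumptions (random initialization, fixed iteration count, hypothetical count vectors; the multinomial coefficient is dropped since it depends only on the data, not the model):

```python
import math
import random

def em_multinomial(counts, k, iters=50, seed=0):
    """EM for a mixture of k multinomials over per-interval count vectors."""
    rng = random.Random(seed)
    d = len(counts[0])
    pi = [1.0 / k] * k                         # mixing weights
    theta = []                                 # per-component probabilities
    for _ in range(k):
        row = [rng.random() + 0.5 for _ in range(d)]
        s = sum(row)
        theta.append([v / s for v in row])
    for _ in range(iters):
        # E step: responsibilities from log-likelihoods (log-sum-exp for stability)
        resp = []
        for x in counts:
            logp = [math.log(pi[c])
                    + sum(xi * math.log(theta[c][j]) for j, xi in enumerate(x))
                    for c in range(k)]
            m = max(logp)
            w = [math.exp(lp - m) for lp in logp]
            s = sum(w)
            resp.append([wi / s for wi in w])
        # M step: reestimate mixing weights and component probabilities
        for c in range(k):
            pi[c] = sum(r[c] for r in resp) / len(counts)
            weighted = [sum(r[c] * x[j] for r, x in zip(resp, counts))
                        for j in range(d)]
            s = sum(weighted)
            theta[c] = [(wj + 1e-9) / (s + d * 1e-9) for wj in weighted]
    return pi, theta

def bic(counts, pi, theta):
    """BIC score: log-likelihood penalized by model size (higher is better)."""
    k, d = len(pi), len(theta[0])
    ll = 0.0
    for x in counts:
        logp = [math.log(pi[c])
                + sum(xi * math.log(theta[c][j]) for j, xi in enumerate(x))
                for c in range(k)]
        m = max(logp)
        ll += m + math.log(sum(math.exp(lp - m) for lp in logp))
    n_params = (k - 1) + k * (d - 1)
    return ll - 0.5 * n_params * math.log(len(counts))

# Hypothetical interval count vectors drawn from two distinct program phases
data = [[9, 1], [8, 2], [1, 9], [2, 8]]
pi, theta = em_multinomial(data, k=2)
```

In the full method, `em_multinomial` would be run for a range of k and the k with the best BIC score chosen.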

24
Summary
  • Before you begin studying a new architecture,
    have a clear understanding of the target
    workloads for this system
  • Perform a significant amount of workload
    characterization before you begin profiling work
  • Benchmarks are very useful tools, but must be
    used properly to obtain meaningful results
  • Value profiling is a rich area for future
    research
  • Simpoints can be used to reduce the runtime of
    simulation and still maintain simulation fidelity