Title: Workload Characteristics and Representative Workloads
1 Workload Characteristics and Representative Workloads
- David Kaeli
- Department of Electrical and Computer Engineering
- Northeastern University
- Boston, MA
- kaeli_at_ece.neu.edu
2 Overview
- When we want to collect profiles to be used in the design of a next-generation computing system, we need to be very careful that we capture a representative sample in our profile
- Workload characteristics allow us to better understand the content of the samples we collect
- We need to select programs to study that represent the class of applications we will eventually run on the target system
3Examples of Workload Characteristics
- Instruction mix (see the sketch after this list)
- Static vs. dynamic instruction count
- Working sets
- Control flow behavior
- Working set behavior
- Inter-reference gap model (temporal locality)
- Database size
- Address and value predictability
- Application/library/OS breakdown
- Heap vs. stack allocation
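To make the first few characteristics concrete, here is a minimal Python sketch that computes the dynamic instruction mix, static vs. dynamic instruction counts, and an approximate data working set from a trace; the (pc, opcode, addr) record format and the 64-byte block granularity are assumptions of this sketch:

    from collections import Counter

    def characterize(trace):
        """Compute simple workload characteristics from a dynamic
        instruction trace.

        `trace` is assumed to yield one (pc, opcode, addr) tuple per
        dynamically executed instruction; addr is None for
        non-memory instructions."""
        mix = Counter()        # dynamic opcode frequencies
        static_pcs = set()     # distinct PCs -> static instruction count
        working_set = set()    # distinct data blocks touched
        dynamic = 0

        for pc, opcode, addr in trace:
            dynamic += 1
            mix[opcode] += 1
            static_pcs.add(pc)
            if addr is not None:
                working_set.add(addr & ~0x3F)   # 64-byte block granularity

        return {
            "dynamic_insts": dynamic,
            "static_insts": len(static_pcs),
            "mix": {op: n / max(dynamic, 1) for op, n in mix.items()},
            "working_set_blocks": len(working_set),
        }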
4 Benchmarks
- Real or synthetic programs/applications used to exercise a hardware system, generally representing a range of behaviors found in a particular class of applications
- Benchmark classes include:
- Toy benchmarks: Tower of Hanoi, qsort, fibo
- Synthetic benchmarks: Dhrystone, Whetstone
- Embedded: EEMBC, UTDSP
- CPU benchmarks: SPECint/fp
- Internet benchmarks: SPECjAppServ, SPECjvm
- Commercial benchmarks: TPC, SAP, SPECjAppServ
- Supercomputing: Perfect Club, Splash, Livermore Loops
5 SPEC Benchmarks Presentation
6 Is there a way to reduce the runtime of SPEC while maintaining representativeness?
- MinneSPEC (U. of Minn): using gprof statistics about the runtime of SPEC and various SimpleScalar simulation results (I-mix, cache misses, etc.), we can capture statistically similar, though significantly shorter, runs of the programs
- Provides three input sets that will run in:
- A few minutes
- A few hours
- A few days
- Published as an IEEE Computer Architecture Letters paper
7 How many programs do we need?
- Does the set of applications capture enough variation to be representative of the entire class of workloads?
- Should we consider using multiple benchmark suites to factor out similarities in programming styles?
- Can we utilize workload characteristics to identify the particular programs of interest?
8 Example of capturing program behavior
"Quantifying Behavioral Differences Between C and C++ Programs", Calder, Grunwald and Zorn, 1995
- C++ is a programming language growing in popularity
- We need to design tomorrow's computer architectures based on tomorrow's software paradigms
- How do workload characteristics change as we move to new programming paradigms?
9 Example of capturing program behavior
"Quantifying Behavioral Differences Between C and C++ Programs", Calder, Grunwald and Zorn, 1995
- First problem: find a set of representative programs from both the procedural and object-oriented (OO) domains
- Difficult for OO in 1995
- Differences between programming models:
- OO relies heavily on messages and methods
- Data locality will change due to the bundling together of data structures in objects
- Size of functions will be reduced in OO
- Polymorphism allows for indirect function invocation and runtime method selection
- OO programs will manage a larger number of dynamically allocated objects
10 Address and Value Profiling
- Lipasti observed that profiled instructions tend to repeat their behavior
- Many addresses are nearly constant
- Many values do not change between instruction executions
- Can we use profiles to better understand some of these behaviors, and then utilize this knowledge to optimize execution?
11 Address Profiling
- If an address remains unchanged, can we issue loads and stores early (similar to prefetching)?
- Do we even have to issue the load or store if we have not modified memory?
- What are the implications if indirect addressing is used?
- Can we detect patterns (i.e., strides) in the address values? (See the sketch below.)
- Can we do anything smart when we detect pointer chasing?
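As one way to act on an address profile, the following is a minimal Python sketch of a per-instruction stride detector; the (pc, addr) trace format and the two-confirmation confidence policy are assumptions of this sketch, not details of any specific published design:

    def detect_strides(address_trace):
        """Classify each memory-referencing PC by the stride between
        successive addresses it touches.

        `address_trace` is assumed to yield (pc, addr) pairs in program
        order. Returns {pc: stride} for PCs whose stride repeated at
        least twice in a row (a simple confidence policy)."""
        last_addr, last_stride, confidence = {}, {}, {}

        for pc, addr in address_trace:
            if pc in last_addr:
                stride = addr - last_addr[pc]
                if stride == last_stride.get(pc):
                    confidence[pc] = confidence.get(pc, 0) + 1
                else:
                    confidence[pc] = 0       # stride changed; reset
                last_stride[pc] = stride
            last_addr[pc] = addr

        return {pc: s for pc, s in last_stride.items()
                if confidence.get(pc, 0) >= 2}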
12 Data Value Profiling
- When we see that particular data values do not change, how can we take advantage of this?
- Lipasti noticed that a large percentage of store instructions overwrite memory with the value already stored there (see the sketch below)
- Can we avoid computing new results if we notice that our input operands have not changed?
- What can we do if a particular operand only takes on a small set of values?
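A minimal Python sketch of how a profiler could measure such "silent stores" (stores that write the value already held in memory); the (addr, value) trace format is an assumption of this sketch:

    def silent_store_fraction(store_trace):
        """Fraction of dynamic stores that overwrite memory with the
        value already held at that address.

        `store_trace` is assumed to yield one (addr, value) pair per
        dynamic store, in program order."""
        memory = {}          # simulated memory image: addr -> last value
        silent = total = 0

        for addr, value in store_trace:
            total += 1
            if memory.get(addr) == value:
                silent += 1  # store is "silent": value unchanged
            memory[addr] = value

        return silent / total if total else 0.0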
13 Parameter Value Profiling
- Profile the parameter values passed to functions
- If these parameters are predictable, we can exploit this fact during compilation
- We can study this on an individual function basis or on a call site basis
- Compiler optimizations such as code specialization and function cloning can be used (see the sketch below)
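As an illustration only (not the instrumentation used in the study on the next slide), a Python decorator can gather this kind of per-function parameter value profile; all names here are hypothetical:

    from collections import Counter
    from functools import wraps

    profiles = {}   # function name -> per-argument value counters

    def profile_params(fn):
        """Record the values passed in each positional argument of fn."""
        counters = profiles.setdefault(fn.__name__, {})

        @wraps(fn)
        def wrapper(*args, **kwargs):
            for i, v in enumerate(args):
                counters.setdefault(i, Counter())[v] += 1
            return fn(*args, **kwargs)
        return wrapper

    @profile_params
    def scale(x, factor):
        return x * factor

    for x in range(1000):
        scale(x, 3)      # 'factor' is invariant here; a specializer could
                         # clone scale() with factor hard-wired to 3

    print(profiles["scale"][1].most_common(1))   # -> [(3, 1000)]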
14 Parameter Value Profiling
- We have profiled a set of MS Windows NT 4.0 desktop applications:
- Word97
- FoxPro 6.0
- SQL Server 7.0
- VC 6.0
- Excel97
- PowerPoint97
- Access97
- We measured the value predictability of parameter values for all non-pointer-based parameters
15 Parameter Value Profiling
- We look for the predictability of parameters using the following metrics (sketched in code below):
- Invariance(1): probability that the most frequent value is passed
- Invariance(4): probability that one of the 4 most frequent values is passed
- Parameter values are more predictable on a call site basis than on a function basis (e.g., for Word97, 8% of the functions pass highly predictable parms, whereas when computed on individual call sites, over 16% of the call sites pass highly predictable parms)
- "Highly predictable" means that on over 90% of the calls the same value is observed
- We will discuss how to clone and specialize procedures when we discuss profile-guided data transformations
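A minimal Python sketch of the Invariance(1) and Invariance(4) metrics defined above, assuming we already have the list of values observed for one parameter at one call site (the example profile is hypothetical):

    from collections import Counter

    def invariance(values, k=1):
        """Invariance(k): fraction of calls in which one of the k most
        frequent values for this parameter was passed.

        `values` is the list of values observed for one parameter at
        one call site (or aggregated per function)."""
        counts = Counter(values)
        top_k = sum(n for _, n in counts.most_common(k))
        return top_k / len(values)

    observed = [0]*93 + [1]*4 + [2, 3, 4]    # hypothetical profile
    print(invariance(observed, 1))           # 0.93 -> "highly predictable"
    print(invariance(observed, 4))           # 0.99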
16 How can we reduce the runtime of a single benchmark and still maintain accuracy?
- SimPoint: attempts to collect a set of trace samples that best represents the whole execution of the program
- Identifies phase behavior in programs
- Considers a metric that captures the differences between two samples
- Computes the difference between these two intervals
- Selects the interval that is closest to all other intervals
17 SimPoint (Calder, ASPLOS 2002)
- Utilize basic block frequencies to build basic block vectors (bbf_0, bbf_1, ..., bbf_{n-1})
- Each frequency is weighted by the basic block's length (instruction count)
- The entire vector is normalized by dividing by the total number of basic blocks executed
- Take fixed-length samples (100M instructions)
- Compare BBVs using the following metrics (sketched below):
- Euclidean Distance: ED(a, b) = sqrt( sum_{i=1..n} (a_i - b_i)^2 )
- Manhattan Distance: MD(a, b) = sum_{i=1..n} |a_i - b_i|
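A minimal Python sketch of the BBV construction and the two distance metrics above; the input layout (parallel per-block count and length lists) is an assumption of this sketch:

    import math

    def bbv(block_counts, block_lengths):
        """Build one interval's normalized basic block vector: each
        block's execution count is weighted by the block's length
        (instruction count), then the vector is scaled to sum to 1."""
        weighted = [c * l for c, l in zip(block_counts, block_lengths)]
        total = sum(weighted)
        return [w / total for w in weighted]

    def euclidean(a, b):
        # ED(a, b) = sqrt( sum_{i=1..n} (a_i - b_i)^2 )
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def manhattan(a, b):
        # MD(a, b) = sum_{i=1..n} |a_i - b_i|
        return sum(abs(x - y) for x, y in zip(a, b))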
18 SimPoint (Calder, ASPLOS 2002)
- Manhattan Distance is used to build a similarity matrix
- An N x N matrix, where N is the number of sampling intervals in the program
- Element SM(x, y) is the Manhattan Distance between the BBVs of the 100M-instruction intervals at sample offsets x and y
- Plot the Similarity Matrix as an upper triangle (see the sketch below)
19 SimPoint (Calder, ASPLOS 2002)
- Basic Block Vectors can adequately capture the necessary representative characteristics of a program
- Distance metrics can help to identify the most representative samples
- Cluster analysis (k-means) can improve representativeness by selecting multiple samples
20 SimPoint (Calder, ISPASS 2004)
- Newer work on SimPoint considers using register def-use behavior on an interval basis
- Also, tracking loop behavior and procedure call frequencies provides similar accuracy to using basic block vectors
21 SimPoint (Calder, ASPLOS 2002)
- Algorithm overview:
- Profile the program by dividing it into fixed-size intervals (e.g., 1M, 10M, 100M insts)
- Collect frequency vectors (e.g., BBVs, def-use, etc.) and compute normalized frequencies
- Run the k-means clustering algorithm to divide the set of intervals into k partitions/sets, for values of k from 1 to K
- Compute a goodness-of-fit of the data for each value of k
- Select the clustering that keeps k small while providing a reasonable goodness-of-fit result
- The result is a selection of representative simulation points that best fit the entire application execution (see the sketch below)
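A compressed Python sketch of this pipeline using scikit-learn's k-means; the inertia-based elbow heuristic below stands in for the paper's actual goodness-of-fit scoring, and the nearest-to-centroid representative selection follows the SimPoint idea only loosely:

    import numpy as np
    from sklearn.cluster import KMeans

    def pick_simpoints(bbvs, max_k=10):
        """Cluster interval BBVs and return one representative interval
        index per cluster (the interval nearest its cluster centroid).

        `bbvs` is an (N, n_blocks) array of normalized BBVs, one row
        per fixed-size interval."""
        X = np.asarray(bbvs)
        best = None
        for k in range(1, max_k + 1):
            km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
            # Crude goodness-of-fit: stop once an extra cluster no longer
            # reduces within-cluster distance much (the paper uses a
            # principled criterion instead).
            if best is not None and km.inertia_ > 0.9 * best.inertia_:
                break
            best = km
        # Representative = interval closest to each centroid.
        reps = []
        for c in range(best.n_clusters):
            members = np.flatnonzero(best.labels_ == c)
            centroid = best.cluster_centers_[c]
            dists = np.abs(X[members] - centroid).sum(axis=1)  # Manhattan
            reps.append(int(members[dists.argmin()]))
        return sorted(reps)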
22 SimPoint Paper
23 Improvements to SimPoint (KDD 2005, ISPASS 2006)
- Utilize a Mixture of Multinomials instead of k-means
- Assumes data is generated by a mixture of K component density functions
- We utilize Expectation-Maximization (EM) to find a local maximum likelihood for the parameters of the density functions: iterate on E and M steps until convergence (see the sketch below)
- The number of clusters is selected using the Bayesian Information Criterion (BIC) approach to judge goodness of fit
- "A multinomial clustering model for fast simulation of computer architecture designs", K. Sanghai et al., Proc. of KDD 2005, Chicago, IL, pp. 808-813.
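A minimal NumPy sketch of EM for a k-component multinomial mixture over basic block count vectors; the random initialization, smoothing constant, and fixed iteration count are assumptions of this sketch (the paper selects the number of components with BIC, which would wrap this routine in a loop over candidate values):

    import numpy as np

    def multinomial_mixture_em(counts, k, iters=100, seed=0):
        """EM for a k-component multinomial mixture over basic block
        count vectors (one row of `counts` per interval).

        Returns (mixing weights, per-component probabilities,
        per-interval responsibilities)."""
        counts = np.asarray(counts, dtype=float)
        n, d = counts.shape
        rng = np.random.default_rng(seed)
        weights = np.full(k, 1.0 / k)               # mixing proportions
        params = rng.dirichlet(np.ones(d), size=k)  # p[j, i] per component

        for _ in range(iters):   # fixed budget; a convergence test also works
            # E-step: responsibilities, computed in the log domain
            # for numerical stability
            log_r = np.log(weights) + counts @ np.log(params).T
            log_r -= log_r.max(axis=1, keepdims=True)
            r = np.exp(log_r)
            r /= r.sum(axis=1, keepdims=True)

            # M-step: re-estimate mixing weights and multinomial params
            weights = r.mean(axis=0)
            params = r.T @ counts + 1e-9            # smooth to avoid log(0)
            params /= params.sum(axis=1, keepdims=True)

        return weights, params, r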
24 Summary
- Before you begin studying a new architecture, have a clear understanding of the target workloads for this system
- Perform a significant amount of workload characterization before you begin profiling work
- Benchmarks are very useful tools, but must be used properly to obtain meaningful results
- Value profiling is a rich area for future research
- SimPoints can be used to reduce the runtime of simulation and still maintain simulation fidelity