Title: Workload Selection and Characterization
Workload Selection and Characterization
- Andy Wang
- CIS 5930-03
- Computer Systems Performance Analysis
Workloads
- Types of workloads
- Workload selection
Types of Workloads
- What is a Workload?
- Instruction Workloads
- Synthetic Workloads
- Real-World Benchmarks
- Application Benchmarks
- Standard Benchmarks
- Exercisers and Drivers
What is a Workload?
- Workload: anything a computer is asked to do
- Test workload: any workload used to analyze performance
- Real workload: any workload observed during normal operations
- Synthetic workload: a workload created for controlled testing
Real Workloads
- Advantage: represents reality
- Disadvantage: uncontrolled
- Can't be repeated
- Can't be described simply
- Difficult to analyze
- Nevertheless, often useful for final analysis papers
- E.g., "We ran system foo and it works well"
Synthetic Workloads
- Advantages
- Controllable
- Repeatable
- Portable to other systems
- Easily modified
- Disadvantage: can never be sure the real world will be the same
Instruction Workloads
- Useful only for CPU performance
- But teach useful lessons for other situations
- Development over decades
- Typical instruction (ADD)
- Instruction mix (by frequency of use; see the sketch after this list)
- Sensitive to compiler, application, architecture
- Still used today (GFLOPS)
- Processor clock rate
- Only valid within processor family
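As a minimal sketch of the instruction-mix idea, the mix reduces to a weighted average of per-instruction times; the mix fractions and times below are made-up numbers, not measurements:

```python
# Hypothetical instruction mix (frequencies of use) and per-instruction
# times in microseconds; all numbers are illustrative.
mix = {"load": 0.30, "store": 0.15, "add": 0.35, "branch": 0.20}
time_us = {"load": 0.4, "store": 0.4, "add": 0.2, "branch": 0.3}

avg_time = sum(mix[i] * time_us[i] for i in mix)  # weighted average, in µs
print(f"average instruction time: {avg_time:.2f} µs")
print(f"speed: {1 / avg_time:.2f} MIPS")          # MIPS = 1 / (avg time in µs)
```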
Instruction Workloads (cont'd)
- Modern complexity makes mixes invalid
- Pipelining
- Data/instruction caching
- Prefetching
- Kernel: an inner loop that does useful work
- Sieve, matrix inversion, sort, etc.
- Ignores setup and I/O, so it can be timed by analysis if desired (at least in theory)
Synthetic Workloads
- Complete programs
- Designed specifically for measurement
- May do real or fake work
- May be adjustable (parameterized)
- Two major classes
- Benchmarks
- Exercisers
Real-World Benchmarks
- Pick a representative application
- Pick sample data
- Run it on the system to be tested
- The Modified Andrew Benchmark (MAB) is a real-world benchmark
- Easy to do, accurate for that sample data
- Fails to consider other applications and data
Application Benchmarks
- Variation on real-world benchmarks
- Choose the most important subset of functions
- Write a benchmark to test those functions
- Tests what the computer will be used for
- Need to be sure important characteristics aren't missed
- Mix of functions must reflect reality
Standard Benchmarks
- Often need to compare general-purpose computer systems for general-purpose use
- E.g., should I buy a Compaq or a Dell PC?
- Tougher: Mac or PC?
- Desire for an easy, comprehensive answer
- People writing articles often need to compare tens of machines
Standard Benchmarks (cont'd)
- Often need to make comparisons over time
- Is this year's PowerPC faster than last year's Pentium?
- Probably yes, but by how much?
- Don't want to spend time writing your own code
- Could be buggy or not representative
- Need to compare against other people's results
- Standard benchmarks offer a solution
Popular Standard Benchmarks
- Sieve, 8 queens, etc.
- Whetstone
- Linpack
- Dhrystone
- Debit/credit
- TPC
- SPEC
- MAB
- Winstone, webstone, etc.
- ...
Sieve, etc.
- Prime number sieve (Eratosthenes); see the sketch after this list
- Nested for loops
- Often such a small array that it's silly
- 8 queens
- Recursive
- Many others
- Generally not representative of real problems
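For concreteness, a minimal Python sketch of the sieve kernel; the small default array size echoes the traditionally tiny problem sizes the list above calls silly:

```python
# Sieve of Eratosthenes: count the primes below n. A small array like
# this fits in cache, which is part of why the benchmark misleads.
def sieve(n=8191):
    is_prime = [True] * n
    is_prime[0] = is_prime[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if is_prime[i]:
            for j in range(i * i, n, i):  # nested inner loop does the work
                is_prime[j] = False
    return sum(is_prime)

print(sieve(), "primes found")
```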
Whetstone
- Dates way back (can compare against the '70s)
- Based on real observed frequencies
- Entirely synthetic (no useful result)
- Modern optimizers may delete code
- Mixed data types, but best for floating point
- Be careful of incomparable variants!
LINPACK
- Based on real programs and data
- Developed by supercomputer users
- Great if you're doing serious numerical computation
Dhrystone
- Bad pun on Whetstone
- Motivated by Whetstone's perceived excessive emphasis on floating point
- Dates to when µPs were integer-only
- Very popular in the PC world
- Again, watch out for version mismatches
Debit/Credit Benchmark
- Developed for transaction processing environments
- CPU processing is usually trivial
- Remarkably demanding I/O, scheduling requirements
- Models real TPS workloads synthetically
- Modern version is TPC benchmark
SPEC Suite
- Result of a multi-manufacturer consortium
- Addresses flaws in existing benchmarks
- Uses 10 real applications, trying to characterize specific real environments
- Considers multiple CPUs
- Geometric mean of the individual results gives the SPECmark for a system (see the sketch below)
- Becoming the standard comparison method
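A minimal sketch of the summary computation, assuming per-benchmark performance ratios as inputs (the ratios below are made up for illustration):

```python
import math

# SPECmark-style summary: the geometric mean of per-benchmark ratios.
def geometric_mean(ratios):
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

ratios = [12.1, 9.8, 15.3, 11.0]  # hypothetical benchmark ratios
print(f"overall mark: {geometric_mean(ratios):.1f}")
```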
Modified Andrew Benchmark
- Used in research to compare file system and operating system designs
- Based on a software engineering workload
- Exercises copying, compiling, and linking
- Probably ill-designed, but common use makes it important
- Needs scaling up for modern systems
Winstone, Webstone, etc.
- "-stone" has become a suffix meaning "benchmark"
- Many specialized suites to test specialized applications
- Too many to review here
- Important to understand strengths and drawbacks:
- Bias toward certain workloads
- Assumptions about the system under test
Exercisers and Drivers
- For I/O, network, and other non-CPU measurements
- Generate a workload and feed it to the measured system, internally or externally (see the sketch below)
- I/O on the local OS
- Network
- Sometimes uses a dedicated system or interface hardware
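A minimal sketch of an internal disk exerciser, assuming a scratch file on the local file system; the block size, request count, and file name are arbitrary illustrative choices:

```python
import os
import time

BLOCK = 4096   # bytes per request (illustrative)
COUNT = 1000   # number of write requests (illustrative)

def exercise(path="exerciser.dat"):
    data = os.urandom(BLOCK)
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(COUNT):
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push each request to the device
    elapsed = time.perf_counter() - start
    print(f"{COUNT} writes of {BLOCK} B in {elapsed:.3f} s "
          f"({COUNT * BLOCK / elapsed / 1e6:.1f} MB/s)")
    os.remove(path)

exercise()
```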
Advantages of Exercisers
- Easy to develop, port
- Can incorporate measurement
- Easy to parameterize, adjust
Disadvantages of Exercisers
- High cost if external
- Often too small compared to real workloads
- Thus not representative
- E.g., may use caches incorrectly
- Internal exercisers often don't have real CPU activity
- Affects overlap of CPU and I/O
- Synchronization effects caused by loops
Workload Selection
- Services Exercised
- Completeness
- Sample service characterization
- Level of Detail
- Representativeness
- Timeliness
- Other Considerations
Services Exercised
- What services does the system actually use?
- A faster CPU won't speed up cp
- Network performance is useless for matrix work
- What metrics measure these services?
- MIPS/GIPS for CPU speed
- Bandwidth/latency for network, I/O
- TPS for transaction processing
Completeness
- Computer systems are complex
- Effects of interactions are hard to predict
- So must be sure to test the entire system
- Important to understand the balance between components
- I.e., don't use a 90% CPU mix to evaluate an I/O-bound application
Component Testing
- Sometimes only individual components are compared
- Would a new CPU speed up our system?
- How does IPv6 affect Web server performance?
- But the component may not be directly related to performance
- So be careful: do ANOVA, and don't extrapolate too much
Service Testing
- May be possible to isolate interfaces to just one component
- E.g., instruction mix for CPU
- Consider services provided and used by that component
- System often has layers of services
- Can cut at any point and insert a workload
Characterizing a Service
- Identify service provided by major subsystem
- List factors affecting performance
- List metrics that quantify demands and performance
- Identify the workload provided to that service
Example: Web Server
- Layered view: each component receives a workload from the layer above
- Web page visits → Web Client
- TCP/IP connections → Network
- HTTP requests → Web Server
- Web page accesses → File System
- Disk transfers → Disk Drive
Web Client Analysis
- Services: visit page, follow hyperlink, display page information
- Factors: page size, number of links, fonts required, embedded graphics, sound
- Metrics: response time (both definitions)
- Workload: a list of pages to be visited and links to be followed
Network Analysis
- Services: connect to server, transmit request, transfer data
- Factors: bandwidth, latency, protocol used
- Metrics: connection setup time, response latency, achieved bandwidth
- Workload: a series of connections to one or more servers, with data transfer
Web Server Analysis
- Services: accept and validate connection, fetch and send HTTP data
- Factors: network performance, CPU speed, system load, disk subsystem performance
- Metrics: response time, connections served
- Workload: a stream of incoming HTTP connections and requests
File System Analysis
- Services: open file, read file (writing often doesn't matter for a Web server)
- Factors: disk drive characteristics, file system software, cache size, partition size
- Metrics: response time, transfer rate
- Workload: a series of file-transfer requests
Disk Drive Analysis
- Services: read sector, write sector
- Factors: seek time, transfer rate
- Metrics: response time
- Workload: a statistically generated stream of read/write requests (see the sketch below)
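A minimal sketch of such a statistically generated stream; the read/write ratio, device size, and uniform placement are assumptions for illustration only:

```python
import random

NUM_SECTORS = 1_000_000  # assumed device size, in sectors
READ_FRACTION = 2 / 3    # assumed 2:1 read/write ratio

def request_stream(n):
    # Yield (operation, sector) pairs with uniformly random placement.
    for _ in range(n):
        op = "read" if random.random() < READ_FRACTION else "write"
        yield op, random.randrange(NUM_SECTORS)

for op, sector in request_stream(5):
    print(op, sector)
```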
Level of Detail
- Detail trades off accuracy vs. cost
- Highest detail is a complete trace
- Lowest is a single request, usually the most common one
- Intermediate approach: weight by frequency
- We will return to this when we discuss workload characterization
Representativeness
- Obviously, the workload should represent the desired application:
- Arrival rate of requests
- Resource demands of each request
- Resource usage profile of workload over time
- Again, accuracy and cost trade off
- Need to understand whether detail matters
Timeliness
- Usage patterns change over time
- File size grows to match disk size
- Web pages grow to match network bandwidth
- If using old workloads, must be sure user behavior hasn't changed
- Even worse, behavior may change after the test, as a result of installing the new system
- The latent-demand phenomenon
Other Considerations
- Loading levels
- Full capacity
- Beyond capacity
- Actual usage
- External components not considered as parameters
- Repeatability of workload
Workload Characterization
- Terminology
- Averaging
- Specifying dispersion
- Single-parameter histograms
- Multi-parameter histograms
- Principal-component analysis
- Markov models
- Clustering
Workload Characterization Terminology
- A user (maybe nonhuman) requests service
- Also called a workload component or workload unit
- Workload parameters or workload features model or characterize the workload
Selecting Workload Components
- Most important: components should be external, at the interface of the SUT
- Components should be homogeneous
- Should characterize activities of interest to the study
Choosing Workload Parameters
- Select parameters that depend only on the workload (not on the SUT)
- Prefer controllable parameters
- Omit parameters that have no effect on the system, even if important in the real world
Averaging
- The basic character of a parameter is its average value
- Not just the arithmetic mean
- Good for uniform distributions or gross studies
Specifying Dispersion
- Most parameters are non-uniform
- Specifying the variance or standard deviation brings a major improvement over the average alone
- Average and s.d. (or C.O.V.) together allow workloads to be grouped into classes (see the sketch below)
- Still ignores the exact distribution
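A minimal sketch of these summary statistics, on made-up request sizes:

```python
import statistics

sizes = [4.2, 3.9, 5.1, 4.4, 12.0, 4.0, 4.6]  # illustrative request sizes, KB

mean = statistics.mean(sizes)
sd = statistics.stdev(sizes)  # sample standard deviation
cov = sd / mean               # coefficient of variation (C.O.V.)

print(f"mean={mean:.2f}  s.d.={sd:.2f}  C.O.V.={cov:.2f}")
```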
Single-Parameter Histograms
- Make a histogram or kernel density estimate
- Fit a probability distribution to the shape of the histogram (see the sketch below)
- Chapter 27 (not covered in this course) lists many useful shapes
- Ignores multiple-parameter correlations
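A minimal sketch of the histogram-and-fit step; the synthetic exponential data and the exponential candidate distribution are both illustrative assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
samples = rng.exponential(scale=5.0, size=1000)  # fake interarrival times

counts, edges = np.histogram(samples, bins=20)   # the histogram itself
print("first bin counts:", counts[:5])

loc, scale = stats.expon.fit(samples)            # maximum-likelihood fit
print(f"fitted exponential: loc={loc:.2f}, scale={scale:.2f}")
```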
Multi-Parameter Histograms
- Use a 3-D plotting package to show 2 parameters
- Or plot each datum as a 2-D point and look for black spots
- Shows correlations
- Allows identification of important parameters
- Not practical for 3 or more parameters
Principal-Component Analysis (PCA)
- How to analyze more than 2 parameters?
- Could plot endless pairs
- Still might not show complex relationships
- Principal-component analysis solves the problem mathematically (see the sketch below)
- Rotates the parameter set to align with the axes
- Sorts axes by importance
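A minimal sketch of PCA via an eigendecomposition of the covariance of standardized data; the 100 x 4 data matrix is random stand-in data:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 4))  # 100 observations of 4 parameters

z = (data - data.mean(axis=0)) / data.std(axis=0)  # standardize each column
cov = np.cov(z, rowvar=False)                      # ~ correlation matrix

eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]                  # sort axes by importance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

components = z @ eigvecs                           # rotated, decorrelated axes
print("fraction of variance explained:", np.round(eigvals / eigvals.sum(), 3))
```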
Advantages of PCA
- Handles more than two parameters
- Insensitive to scale of original data
- Detects dispersion
- Combines correlated parameters into a single variable
- Identifies variables by importance
Disadvantages of PCA
- Tedious computation (if no software)
- Still requires hand analysis of the final plotted results
- Often difficult to relate results back to the original parameters
Markov Models
- Sometimes, a distribution isn't enough
- Requests come in sequences
- Sequencing affects performance
- Example: disk bottleneck
- Suppose jobs need 1 disk access per CPU slice
- A CPU slice is much faster than a disk access
- Strict alternation uses the CPU better
- Long strings of disk accesses slow the system
Introduction to Markov Models
- Represent the model as a state diagram
- Probabilistic transitions between states
- Requests generated on transitions
Creating a Markov Model
- Observe a long string of activity
- Use a matrix to count pairs of states
- Normalize rows to sum to 1.0
Example Markov Model
- Reference string of opens, reads, and closes: ORORRCOORCRRRRCC
- Pairwise frequency matrix (counting adjacent pairs in the string):

           Open  Read  Close
  Open       1     3     0
  Read       1     4     3
  Close      1     1     1
Markov Model for I/O String
- Divide each row by its sum to get the transition matrix:

           Open  Read  Close
  Open     0.25  0.75  0.00
  Read     0.13  0.50  0.37
  Close    0.33  0.33  0.34

- The model: a three-state diagram (Open, Read, Close) with these probabilities on the transitions; a sketch of the computation follows
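A minimal sketch that reproduces the matrices above from the reference string (0.125 and 0.375 appear rounded as 0.13 and 0.37 on the slide):

```python
from collections import Counter

reference = "ORORRCOORCRRRRCC"
states = ["O", "R", "C"]

# Count adjacent pairs of states, then normalize each row to sum to 1.0.
pairs = Counter(zip(reference, reference[1:]))

for s in states:
    row_sum = sum(pairs[(s, t)] for t in states)
    probs = [round(pairs[(s, t)] / row_sum, 3) for t in states]
    print(s, probs)
```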
Clustering
- Often useful to break the workload into categories
- A canonical example of each category can be used to represent all samples
- With many samples, generating categories is difficult
- Solution: clustering algorithms
Steps in Clustering
- Select sample
- Choose and transform parameters
- Drop outliers
- Scale observations
- Choose distance measure
- Do clustering
- Use results to adjust parameters, repeat
- Choose representative components
Selecting a Sample
- Clustering algorithms are often slow
- Must use a subset of all observations
- Can test the sample after clustering: does every observation fit into some cluster?
- Sampling options:
- Random
- Heaviest users of component under study
Choosing and Transforming Parameters
- Goal is to limit the complexity of the problem
- Concentrate on parameters with high impact and high variance
- Use principal-component analysis
- Drop a parameter, re-cluster, and see if the result differs
- Consider transformations such as those in Sec. 15.4 (logarithms, etc.)
Dropping Outliers
- Must get rid of observations that would skew results
- Need great judgment here
- No firm guidelines
- Drop things that you know are unusual
- Keep things that consume major resources
- E.g., daily backups
Scale Observations
- Cluster analysis is often sensitive to parameter ranges, so scaling affects results
- Options (the sketch below shows the first and third):
- Scale to zero mean and unit variance
- Weight based on importance or variance
- Normalize range to [0, 1]
- Normalize 95% of data to [0, 1]
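A minimal sketch of two of these scaling options, on made-up data for one parameter:

```python
import numpy as np

x = np.array([2.0, 3.5, 7.0, 4.4, 10.1, 3.3])  # one illustrative parameter

z = (x - x.mean()) / x.std()                   # zero mean, unit variance
ranged = (x - x.min()) / (x.max() - x.min())   # normalize range to [0, 1]

print("z-scores:", np.round(z, 2))
print("range-normalized:", np.round(ranged, 2))
```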
Choosing a Distance Measure
- Endless possibilities available
- Represent observations as vectors in k-space
- Popular measures include (see the sketches below):
- Euclidean distance, weighted or unweighted
- Chi-squared distance
- Rectangular (Manhattan) distance
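Minimal sketches of the three measures, for observations represented as equal-length vectors; the chi-squared form shown is one common variant and assumes positive components:

```python
import math

def euclidean(a, b, weights=None):
    w = weights or [1.0] * len(a)
    return math.sqrt(sum(wi * (ai - bi) ** 2 for wi, ai, bi in zip(w, a, b)))

def rectangular(a, b):  # also known as Manhattan distance
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

def chi_squared(a, b):  # one common form; assumes positive components
    return sum((ai - bi) ** 2 / (ai + bi) for ai, bi in zip(a, b))

x, y = [1.0, 4.0, 2.5], [2.0, 3.0, 2.0]
print(euclidean(x, y), rectangular(x, y), chi_squared(x, y))
```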
Clustering Methods
- Many algorithms available
- Computationally expensive (finding the optimum is NP-hard)
- Can be simple or hierarchical
- Many require you to specify the number of desired clusters
- Minimum Spanning Tree is not the only option!
Minimum Spanning Tree Clustering
- Start with each point in its own cluster
- Repeat until a single cluster remains (see the sketch after this list):
- Compute the centroid of each cluster
- Compute intercluster distances
- Find the smallest distance
- Merge the clusters separated by the smallest distance
- Method produces stable results
- But not necessarily the optimum
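A minimal sketch of the merging loop described above, on made-up 2-D points; it prints each merge as it happens:

```python
import math
from itertools import combinations

points = [(1, 1), (1, 2), (8, 8), (9, 8), (5, 1)]
clusters = [[p] for p in points]  # start with each point in its own cluster

def centroid(cluster):
    n = len(cluster)
    return (sum(x for x, _ in cluster) / n, sum(y for _, y in cluster) / n)

while len(clusters) > 1:
    # find the two clusters whose centroids are closest, then merge them
    i, j = min(combinations(range(len(clusters)), 2),
               key=lambda ij: math.dist(centroid(clusters[ij[0]]),
                                        centroid(clusters[ij[1]])))
    print("merging", clusters[i], "and", clusters[j])
    clusters[i].extend(clusters[j])
    del clusters[j]
```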
K-Means Clustering
- One of the most popular methods
- The number of clusters is an input parameter
- First randomly assign points to clusters
- Repeat until no change (see the sketch after this list):
- Calculate the center of each cluster
- Assign each point to the cluster with the nearest center
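A minimal sketch of the k-means loop described above, on made-up 2-D points with k = 2; the iteration cap and empty-cluster reseeding are safety details, not part of the slide's description:

```python
import math
import random

points = [(1, 1), (1, 2), (8, 8), (9, 8), (5, 1), (6, 2)]
k = 2

random.seed(0)
assign = [random.randrange(k) for _ in points]  # random initial clusters

for _ in range(100):  # iteration cap for safety
    # calculate the center of each cluster
    centers = []
    for c in range(k):
        members = [p for p, a in zip(points, assign) if a == c]
        centers.append((sum(x for x, _ in members) / len(members),
                        sum(y for _, y in members) / len(members))
                       if members else random.choice(points))
    # assign each point to the cluster with the nearest center
    new_assign = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
    if new_assign == assign:
        break
    assign = new_assign

print(assign)
```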
Interpreting Clusters
- Art, not science
- Drop small clusters (if they have little impact on performance)
- Try to find meaningful characterizations
- Choose representative components
- Number proportional to cluster size or to total resource demands
Drawbacks of Clustering
- Clustering is basically an AI problem
- Humans will often see patterns where the computer sees none
- Results are extremely sensitive to:
- Choice of algorithm
- Parameters of the algorithm
- Minor variations in the points clustered
- Results may not have functional meaning