1
Workload Selection and Characterization
  • Andy Wang
  • CIS 5930-03
  • Computer Systems
  • Performance Analysis

2
Workloads
  • Types of workloads
  • Workload selection

3
Types of Workloads
  • What is a Workload?
  • Instruction Workloads
  • Synthetic Workloads
  • Real-World Benchmarks
  • Application Benchmarks
  • Standard Benchmarks
  • Exercisers and Drivers

4
What is a Workload?
  • Workload: anything a computer is asked to do
  • Test workload: any workload used to analyze
    performance
  • Real workload: any workload observed during normal
    operations
  • Synthetic workload: a workload created for
    controlled testing

5
Real Workloads
  • Advantage: represents reality
  • Disadvantage: uncontrolled
  • Can't be repeated
  • Can't be described simply
  • Difficult to analyze
  • Nevertheless, often useful for the final analysis
    in papers
  • E.g., "We ran system foo and it works well"

6
Synthetic Workloads
  • Advantages
  • Controllable
  • Repeatable
  • Portable to other systems
  • Easily modified
  • Disadvantage: can never be sure the real world will
    be the same

7
Instruction Workloads
  • Useful only for CPU performance
  • But teach useful lessons for other situations
  • Development over decades
  • Typical instruction (ADD)
  • Instruction mix (by frequency of use)
  • Sensitive to compiler, application, architecture
  • Still used today (GFLOPS)
  • Processor clock rate
  • Only valid within processor family

8
Instruction Workloads (cont'd)
  • Modern complexity makes mixes invalid
  • Pipelining
  • Data/instruction caching
  • Prefetching
  • Kernel: an inner loop that does useful work
  • Sieve, matrix inversion, sort, etc.
  • Ignores setup, I/O, so can be timed by analysis
    if desired (at least in theory)

9
Synthetic Workloads
  • Complete programs
  • Designed specifically for measurement
  • May do real or fake work
  • May be adjustable (parameterized)
  • Two major classes
  • Benchmarks
  • Exercisers

10
Real-World Benchmarks
  • Pick a representative application
  • Pick sample data
  • Run it on system to be tested
  • Modified Andrew Benchmark, MAB, is a real-world
    benchmark
  • Easy to do, accurate for that sample data
  • Fails to consider other applications, data

11
Application Benchmarks
  • Variation on real-world benchmarks
  • Choose most important subset of functions
  • Write benchmark to test those functions
  • Tests what computer will be used for
  • Need to be sure important characteristics aren't
    missed
  • Mix of functions must reflect reality

12
Standard Benchmarks
  • Often need to compare general-purpose computer
    systems for general-purpose use
  • E.g., should I buy a Compaq or a Dell PC?
  • Tougher: Mac or PC?
  • Desire for an easy, comprehensive answer
  • People writing articles often need to compare
    tens of machines

13
Standard Benchmarks (cont'd)
  • Often need to make comparisons over time
  • Is this year's PowerPC faster than last year's
    Pentium?
  • Probably yes, but by how much?
  • Don't want to spend time writing own code
  • Could be buggy or not representative
  • Need to compare against other people's results
  • Standard benchmarks offer solution

14
Popular Standard Benchmarks
  • Sieve, 8 queens, etc.
  • Whetstone
  • Linpack
  • Dhrystone
  • Debit/credit
  • TPC
  • SPEC
  • MAB
  • Winstone, webstone, etc.
  • ...

15
Sieve, etc.
  • Prime number sieve (of Eratosthenes); see the
    sketch after this list
  • Nested for loops
  • Often uses such a small array that it's silly
  • 8 queens
  • Recursive
  • Many others
  • Generally not representative of real problems
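
A minimal Python sketch of the sieve kernel, timed the way such toy
benchmarks typically are (the bound of 1,000,000 is an arbitrary choice):

    import time

    def sieve(n):
        """Sieve of Eratosthenes: count the primes below n."""
        is_prime = [True] * n
        is_prime[0:2] = [False, False]
        for i in range(2, int(n ** 0.5) + 1):
            if is_prime[i]:
                for j in range(i * i, n, i):   # the nested inner loop
                    is_prime[j] = False
        return sum(is_prime)

    start = time.perf_counter()
    count = sieve(1_000_000)
    elapsed = time.perf_counter() - start
    print(f"{count} primes below 1,000,000 in {elapsed:.3f} s")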

16
Whetstone
  • Dates way back (can compare against the 1970s)
  • Based on real observed frequencies
  • Entirely synthetic (no useful result)
  • Modern optimizers may delete code
  • Mixed data types, but best for floating point
  • Be careful of incomparable variants!

17
LINPACK
  • Based on real programs and data
  • Developed by supercomputer users
  • Great if you're doing serious numerical
    computation

18
Dhrystone
  • Bad pun on Whetstone
  • Motivated by Whetstone's perceived excessive
    emphasis on floating point
  • Dates to when µPs (microprocessors) were
    integer-only
  • Very popular in PC world
  • Again, watch out for version mismatches

19
Debit/Credit Benchmark
  • Developed for transaction processing environments
  • CPU processing is usually trivial
  • Remarkably demanding I/O, scheduling requirements
  • Models real TPS workloads synthetically
  • Modern version is TPC benchmark

20
SPEC Suite
  • Result of multi-manufacturer consortium
  • Addresses flaws in existing benchmarks
  • Uses 10 real applications, trying to characterize
    specific real environments
  • Considers multiple CPUs
  • Geometric mean gives SPECmark for system
  • Becoming standard comparison method

21
Modified Andrew Benchmark
  • Used in research to compare file system,
    operating system designs
  • Based on software engineering workload
  • Exercises copying, compiling, linking
  • Probably ill-designed, but common use makes it
    important
  • Needs scaling up for modern systems

22
Winstone, Webstone, etc.
  • "Stone" has become a suffix meaning "benchmark"
  • Many specialized suites to test specialized
    applications
  • Too many to review here
  • Important to understand strengths and drawbacks
  • Bias toward certain workloads
  • Assumptions about system under test

23
Exercisers and Drivers
  • For I/O, network, non-CPU measurements
  • Generate a workload and feed it to the internal or
    external system being measured (see the sketch below)
  • I/O on local OS
  • Network
  • Sometimes uses dedicated system, interface
    hardware
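
A minimal sketch of an internal I/O exerciser, assuming a pre-created
test file named testfile.dat larger than one block (the path, block
size, and request count are illustrative; os.pread is Unix-only):

    import os, random, time

    PATH, BLOCK, COUNT = "testfile.dat", 4096, 1000   # illustrative values

    fd = os.open(PATH, os.O_RDONLY)
    size = os.fstat(fd).st_size
    latencies = []
    for _ in range(COUNT):
        offset = random.randrange(0, size - BLOCK)    # random block offset
        t0 = time.perf_counter()
        os.pread(fd, BLOCK, offset)                   # timed read
        latencies.append(time.perf_counter() - t0)
    os.close(fd)
    print(f"mean read latency: {sum(latencies) / COUNT * 1e6:.1f} microseconds")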

24
Advantages of Exercisers
  • Easy to develop, port
  • Can incorporate measurement
  • Easy to parameterize, adjust

25
Disadvantages of Exercisers
  • High cost if external
  • Often too small compared to real workloads
  • Thus not representative
  • E.g., may use caches incorrectly
  • Internal exercisers often don't have real CPU
    activity
  • Affects overlap of CPU and I/O
  • Synchronization effects caused by loops

26
Workload Selection
  • Services Exercised
  • Completeness
  • Sample service characterization
  • Level of Detail
  • Representativeness
  • Timeliness
  • Other Considerations

27
Services Exercised
  • What services does system actually use?
  • A faster CPU won't speed up cp
  • Network performance useless for matrix work
  • What metrics measure these services?
  • MIPS/GIPS for CPU speed
  • Bandwidth/latency for network, I/O
  • TPS for transaction processing

28
Completeness
  • Computer systems are complex
  • Effect of interactions hard to predict
  • So must be sure to test entire system
  • Important to understand balance between
    components
  • I.e., don't use a 90%-CPU mix to evaluate an
    I/O-bound application

29
Component Testing
  • Sometimes only individual components are compared
  • Would a new CPU speed up our system?
  • How does IPv6 affect Web server performance?
  • But component may not be directly related to
    performance
  • So be careful, do ANOVA, don't extrapolate too
    much

30
Service Testing
  • May be possible to isolate interfaces to just one
    component
  • E.g., instruction mix for CPU
  • Consider services provided and used by that
    component
  • System often has layers of services
  • Can cut at any point and insert workload

31
Characterizing a Service
  • Identify service provided by major subsystem
  • List factors affecting performance
  • List metrics that quantify demands and
    performance
  • Identify workload provided to that service

32
Example: Web Server
(Layered diagram: each component receives a workload and generates one
for the layer below.)
Web Page Visits → Web Client → TCP/IP Connections → Network →
HTTP Requests → Web Server → Web Page Accesses → File System →
Disk Transfers → Disk Drive
33
Web Client Analysis
  • Services: visit page, follow hyperlink, display
    page information
  • Factors: page size, number of links, fonts
    required, embedded graphics, sound
  • Metrics: response time (both definitions)
  • Workload: a list of pages to be visited and links
    to be followed

34
Network Analysis
  • Services: connect to server, transmit request,
    transfer data
  • Factors: bandwidth, latency, protocol used
  • Metrics: connection setup time, response latency,
    achieved bandwidth
  • Workload: a series of connections to one or more
    servers, with data transfer

35
Web Server Analysis
  • Services: accept and validate connection, fetch and
    send HTTP data
  • Factors: network performance, CPU speed, system
    load, disk subsystem performance
  • Metrics: response time, connections served
  • Workload: a stream of incoming HTTP connections
    and requests

36
File System Analysis
  • Services: open file, read file (writing often
    doesn't matter for a Web server)
  • Factors: disk drive characteristics, file system
    software, cache size, partition size
  • Metrics: response time, transfer rate
  • Workload: a series of file-transfer requests

37
Disk Drive Analysis
  • Services: read sector, write sector
  • Factors: seek time, transfer rate
  • Metrics: response time
  • Workload: a statistically generated stream of
    read/write requests

38
Level of Detail
  • Detail trades off accuracy vs. cost
  • Highest detail is complete trace
  • Lowest is one request, usually most common
  • Intermediate approach weight by frequency
  • We will return to this when we discuss workload
    characterization

39
Representativeness
  • Obviously, workload should represent desired
    application
  • Arrival rate of requests
  • Resource demands of each request
  • Resource usage profile of workload over time
  • Again, accuracy and cost trade off
  • Need to understand whether detail matters

40
Timeliness
  • Usage patterns change over time
  • File size grows to match disk size
  • Web pages grow to match network bandwidth
  • If using old workloads, must be sure user
    behavior hasn't changed
  • Even worse, behavior may change after test, as
    result of installing new system
  • Latent demand phenomenon

41
Other Considerations
  • Loading levels
  • Full capacity
  • Beyond capacity
  • Actual usage
  • External components not considered as parameters
  • Repeatability of workload

42
Workload Characterization
  • Terminology
  • Averaging
  • Specifying dispersion
  • Single-parameter histograms
  • Multi-parameter histograms
  • Principal-component analysis
  • Markov models
  • Clustering

43
Workload Characterization Terminology
  • User (maybe nonhuman) requests service
  • Also called workload component or workload unit
  • Workload parameters or workload features model or
    characterize the workload

44
Selecting Workload Components
  • Most important components should be external, at
    the interface of the SUT (system under test)
  • Components should be homogeneous
  • Should characterize activities of interest to the
    study

45
Choosing Workload Parameters
  • Select parameters that depend only on workload
    (not on SUT)
  • Prefer controllable parameters
  • Omit parameters that have no effect on system,
    even if important in real world

46
Averaging
  • Basic character of a parameter is its average
    value
  • Not just the arithmetic mean (median or mode may
    be more appropriate)
  • Good for uniform distributions or gross studies

47
Specifying Dispersion
  • Most parameters are non-uniform
  • Specifying variance or standard deviation brings
    major improvement over average
  • Average and s.d. (or C.O.V., the coefficient of
    variation) together allow workloads to be grouped
    into classes (see the sketch below)
  • Still ignores exact distribution
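
A quick sketch of computing these statistics for one workload
parameter (the sample request sizes are made up for illustration):

    import statistics

    sizes_kb = [4, 8, 8, 12, 16, 64, 128, 4, 8, 256]  # illustrative data

    mean = statistics.mean(sizes_kb)
    sd = statistics.stdev(sizes_kb)   # sample standard deviation
    cov = sd / mean                   # coefficient of variation
    print(f"mean={mean:.1f} KB  s.d.={sd:.1f} KB  C.O.V.={cov:.2f}")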

48
Single-Parameter Histograms
  • Make histogram or kernel density estimate
  • Fit a probability distribution to the shape of the
    histogram (see the sketch below)
  • Chapter 27 (not covered in course) lists many
    useful shapes
  • Ignores multiple-parameter correlations
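
A sketch of the histogram-and-fit step, using NumPy/SciPy with an
exponential distribution purely as an example candidate (the data
here are synthetic stand-ins):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    samples = rng.exponential(scale=2.0, size=1000)   # stand-in data

    counts, edges = np.histogram(samples, bins=20, density=True)
    loc, scale = stats.expon.fit(samples)             # max-likelihood fit
    centers = (edges[:-1] + edges[1:]) / 2
    fitted = stats.expon.pdf(centers, loc, scale)
    # Compare the empirical and fitted densities bin by bin.
    print(np.round(counts[:5], 3), np.round(fitted[:5], 3))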

49
Multi-Parameter Histograms
  • Use 3-D plotting package to show 2 parameters
  • Or plot each datum as 2-D point and look for
    black spots
  • Shows correlations
  • Allows identification of important parameters
  • Not practical for 3 or more parameters

50
Principal-Component Analysis (PCA)
  • How to analyze more than 2 parameters?
  • Could plot endless pairs
  • Still might not show complex relationships
  • Principal-component analysis solves problem
    mathematically
  • Rotates parameter set to align with axes
  • Sorts axes by importance (see the sketch below)
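
A minimal PCA sketch using NumPy's eigendecomposition (the parameter
matrix, rows = workload components and columns = parameters, is
illustrative):

    import numpy as np

    X = np.array([[10.0, 200.0, 1.0],
                  [12.0, 240.0, 1.2],
                  [50.0, 980.0, 5.1],
                  [48.0, 950.0, 4.9]])

    # Standardize so the scale of the original parameters can't dominate.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Eigendecompose the covariance of the standardized data; columns
    # of vecs are the principal components.
    vals, vecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(vals)[::-1]        # sort axes by importance
    vals, vecs = vals[order], vecs[:, order]
    scores = Z @ vecs                     # data rotated onto the new axes
    print("variance explained:", np.round(vals / vals.sum(), 3))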

51
Advantages of PCA
  • Handles more than two parameters
  • Insensitive to scale of original data
  • Detects dispersion
  • Combines correlated parameters into single
    variable
  • Identifies variables by importance

52
Disadvantages of PCA
  • Tedious computation (if no software)
  • Still requires hand analysis of final plotted
    results
  • Often difficult to relate results back to
    original parameters

53
Markov Models
  • Sometimes, distribution isn't enough
  • Requests come in sequences
  • Sequencing affects performance
  • Example: disk bottleneck
  • Suppose jobs need 1 disk access per CPU slice
  • CPU slice is much faster than disk
  • Strict alternation uses CPU better
  • Long disk access strings slow system

54
Introduction to Markov Models
  • Represent model as state diagram
  • Probabilistic transitions between states
  • Requests generated on transitions

55
Creating a Markov Model
  • Observe long string of activity
  • Use matrix to count pairs of states
  • Normalize rows to sum to 1.0

56
Example Markov Model
  • Reference string of Opens, Reads, and Closes:
    ORORRCOORCRRRRCC
  • Pairwise frequency matrix (counts of adjacent pairs
    in the string):

            To:  Open  Read  Close
    From Open       1     3      0
    From Read       1     4      3
    From Close      1     1      1

57
Markov Model for I/O String
  • Divide each row by its sum to get transition
    matrix
  • Model (transition probabilities, shown here as a
    matrix rather than the original state diagram; a
    script to derive it follows):

            To:  Open  Read  Close
    From Open    0.25  0.75  0.00
    From Read    0.13  0.50  0.37
    From Close   0.34  0.33  0.33
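
The same matrix can be built mechanically from the reference string;
a short Python sketch:

    from collections import defaultdict

    trace = "ORORRCOORCRRRRCC"   # Opens, Reads, Closes from slide 56

    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(trace, trace[1:]):
        counts[a][b] += 1        # count each adjacent pair of states

    for state, row in sorted(counts.items()):
        total = sum(row.values())
        probs = {nxt: round(n / total, 2) for nxt, n in row.items()}
        print(state, probs)      # each row of the transition matrix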
58
Clustering
  • Often useful to break workload into categories
  • Canonical example of each category can be used
    to represent all samples
  • If many samples, generating categories is
    difficult
  • Solution: clustering algorithms

59
Steps in Clustering
  • Select sample
  • Choose and transform parameters
  • Drop outliers
  • Scale observations
  • Choose distance measure
  • Do clustering
  • Use results to adjust parameters, repeat
  • Choose representative components

60
Selecting A Sample
  • Clustering algorithms are often slow
  • Must use subset of all observations
  • Can test sample after clustering: does every
    observation fit into some cluster?
  • Sampling options
  • Random
  • Heaviest users of component under study

61
Choosing and Transforming Parameters
  • Goal is to limit complexity of problem
  • Concentrate on parameters with high impact, high
    variance
  • Use principal-component analysis
  • Drop a parameter, re-cluster, see if different
  • Consider transformations such as those in Sec. 15.4
    (logarithms, etc.)

62
Dropping Outliers
  • Must get rid of observations that would skew
    results
  • Need great judgment here
  • No firm guidelines
  • Drop things that you know are unusual
  • Keep things that consume major resources
  • E.g., daily backups

63
Scale Observations
  • Cluster analysis is often sensitive to parameter
    ranges, so scaling affects results
  • Options (the first and third are sketched below)
  • Scale to zero mean and unit variance
  • Weight based on importance or variance
  • Normalize range to [0, 1]
  • Normalize 95% of data to [0, 1]
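
A sketch of the first and third options (parameter values are
illustrative):

    import numpy as np

    x = np.array([3.0, 7.0, 7.0, 12.0, 95.0])    # one parameter's values

    z = (x - x.mean()) / x.std()                 # zero mean, unit variance
    r = (x - x.min()) / (x.max() - x.min())      # range normalized to [0, 1]
    print("z-scored:         ", np.round(z, 2))
    print("range-normalized: ", np.round(r, 2))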

64
Choosing a Distance Measure
  • Endless possibilities available
  • Represent observations as vectors in k-space
  • Popular measures (sketched below) include
  • Euclidean distance, weighted or unweighted
  • Chi-squared distance
  • Rectangular (Manhattan) distance
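
Sketches of the three listed measures; note that chi-squared distance
has several variants, and the form below is one common choice that
assumes strictly positive components:

    import numpy as np

    def euclidean(x, y, w=None):
        w = np.ones_like(x) if w is None else w    # optional weights
        return np.sqrt(np.sum(w * (x - y) ** 2))

    def chi_squared(x, y):
        return np.sum((x - y) ** 2 / (x + y))      # one common variant

    def rectangular(x, y):
        return np.sum(np.abs(x - y))               # a.k.a. Manhattan distance

    x, y = np.array([1.0, 4.0, 2.0]), np.array([2.0, 1.0, 2.0])
    print(euclidean(x, y), chi_squared(x, y), rectangular(x, y))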

65
Clustering Methods
  • Many algorithms available
  • Computationally expensive (finding the optimum is
    NP-hard)
  • Can be simple or hierarchical
  • Many require you to specify number of desired
    clusters
  • Minimum Spanning Tree is not the only option!

66
Minimum Spanning Tree Clustering
  • Start with each point in its own cluster
  • Repeat until a single cluster remains (see the
    sketch below)
  • Compute centroid of each cluster
  • Compute intercluster distances
  • Find smallest distance
  • Merge clusters with smallest distance
  • Method produces stable results
  • But not necessarily optimal
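
A minimal sketch of this centroid-merging loop (the points are
illustrative; a real run would usually stop at a chosen number of
clusters rather than merging all the way down to one):

    import numpy as np

    points = np.array([[1.0, 1.0], [1.2, 0.9], [5.0, 5.0],
                       [5.1, 4.8], [9.0, 1.0]])
    clusters = [[i] for i in range(len(points))]   # one point per cluster

    while len(clusters) > 1:
        cents = [points[c].mean(axis=0) for c in clusters]
        best, pair = float("inf"), None
        for i in range(len(cents)):                # all intercluster distances
            for j in range(i + 1, len(cents)):
                d = np.linalg.norm(cents[i] - cents[j])
                if d < best:
                    best, pair = d, (i, j)
        i, j = pair
        print(f"merging {clusters[i]} and {clusters[j]} at distance {best:.2f}")
        clusters[i] += clusters[j]                 # merge the closest pair
        del clusters[j]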

67
K-Means Clustering
  • One of the most popular methods
  • Number of clusters is an input parameter
  • First randomly assign points to clusters
  • Repeat until no change (see the sketch below)
  • Calculate center of each cluster
  • Assign each point to cluster with nearest center
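
A sketch of these steps (k and the data are illustrative; the
empty-cluster fallback is an added assumption, one common practical fix):

    import numpy as np

    rng = np.random.default_rng(1)
    points = rng.random((20, 2))                  # illustrative data
    k = 3
    assign = rng.integers(0, k, len(points))      # random initial assignment

    while True:
        # Center of each cluster; re-seed any cluster that goes empty.
        centers = np.array([points[assign == c].mean(axis=0)
                            if np.any(assign == c)
                            else points[rng.integers(len(points))]
                            for c in range(k)])
        dists = np.linalg.norm(points[:, None] - centers[None], axis=2)
        new = np.argmin(dists, axis=1)            # nearest center wins
        if np.array_equal(new, assign):
            break                                 # no change: converged
        assign = new
    print("cluster sizes:", np.bincount(assign, minlength=k))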

68
Interpreting Clusters
  • Art, not science
  • Drop small clusters (if little impact on
    performance)
  • Try to find meaningful characterizations
  • Choose representative components
  • Number proportional to cluster size or to total
    resource demands

69
Drawbacks of Clustering
  • Clustering is basically an AI problem
  • Humans will often see patterns where computer
    sees none
  • Result is extremely sensitive to
  • Choice of algorithm
  • Parameters of algorithm
  • Minor variations in points clustered
  • Results may not have functional meaning
