Title: CPE 619: Modeling and Analysis of Computer and Communications Systems
1CPE 619 Modeling and Analysis of Computer and
Communications Systems
- Aleksandar Milenkovic
- The LaCASA Laboratory
- Electrical and Computer Engineering Department
- The University of Alabama in Huntsville
- http://www.ece.uah.edu/milenka
- http://www.ece.uah.edu/lacasa
2Topics
- Background
- About the Course
3Professor Background
- Dr. Aleksandar Milenkovic
- Research interests
- LaCASA laboratory www.ece.uah.edu/lacasa
- Computer architecture: hardware/software structures for secure, cost-effective, and performance-effective computation and communication
- Performance analysis and evaluation
- Low-power deeply embedded systems (sensor networks)
- Teaching interests
- CPE 631 Advanced Computer Systems Architecture
- CPE 619 Modeling and Analysis of Computer and Communication Systems
- CPE 527 VLSI Design
- CPE 323 Introduction to Embedded Computer Systems
4Goals of This Course
- Comprehensive course on performance analysis
- Includes measurement, statistical modeling, experimental design, simulation, and queuing theory
- How to avoid common mistakes in performance analysis
- Graduate course (Advanced Topics)
- A lot of independent reading and writing
- Project/survey paper (research techniques)
5Syllabus
- Course Web page: http://www.ece.uah.edu/milenka/cpe619-09F
- Office hours: MW 1:00 PM - 2:00 PM
- Email: milenka at ece.uah.edu
6Text Books
- Required
- Raj Jain. The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling, John Wiley and Sons, Inc., New York, NY, 1991. ISBN 0471503363
- Other
- Edward D. Lazowska, John Zahorjan, G. Scott Graham, and Kenneth C. Sevcik. Computer System Analysis Using Queueing Network Models. NOTE: Available for free on the Web at http://www.cs.washington.edu/homes/lazowska/qsp/
- David J. Lilja. Measuring Computer Performance: A Practitioner's Guide, Cambridge University Press, New York, NY, 2000.
7Objectives What You Will Learn
- Specifying performance requirements
- Evaluating design alternatives
- Comparing two or more systems
- Determining the optimal value of a parameter (system tuning)
- Finding the performance bottleneck (bottleneck identification)
- Characterizing the load on the system (workload characterization)
- Determining the number and sizes of components (capacity planning)
- Predicting the performance at future loads (forecasting)
8Basic Terms
- System: any collection of hardware, software, and firmware
- Metrics: criteria used to evaluate the performance of the system components
- Workloads: the requests made by the users of the system
9Main Parts of the Course
- Part I An Overview of Performance Evaluation
- Part II Measurement Techniques and Tools
- Part III Probability Theory and Statistics
- Part IV Experimental Design and Analysis
- Part V Simulation
- Part VI Queuing Theory
10Part I An Overview of Performance Evaluation
- Introduction
- Common Mistakes and How To Avoid Them
- Selection of Techniques and Metrics
11Part II Measurement Techniques and Tools
- Types of Workloads
- Popular Benchmarks
- The Art of Workload Selection
- Workload Characterization Techniques
- Monitors
- Accounting Logs
- Monitoring Distributed Systems
- Load Drivers
- Capacity Planning
- The Art of Data Presentation
- Ratio Games
12Part III Probability Theory and Statistics
- Probability and Statistics Concepts
- Four Important Distributions
- Summarizing Measured Data By a Single Number
- Summarizing The Variability Of Measured Data
- Graphical Methods to Determine Distributions of Measured Data
- Sample Statistics
- Confidence Interval
- Comparing Two Alternatives
- Measures of Relationship
- Simple Linear Regression Models
- Multiple Linear Regression Models
- Other Regression Models
13Part IV Experimental Design and Analysis
- Introduction to Experimental Design
- 2^k Factorial Designs
- 2^k r Factorial Designs with Replications
- 2^(k-p) Fractional Factorial Designs
- One Factor Experiments
- Two Factors Full Factorial Design without Replications
- Two Factors Full Factorial Design with Replications
- General Full Factorial Designs With k Factors
14Part V Simulation
- Introduction to Simulation
- Types of Simulations
- Model Verification and Validation
- Analysis of Simulation Results
- Random-Number Generation
- Testing Random-Number Generators
- Random-Variate Generation
- Commonly Used Distributions
15Part VI Queuing Theory
- Introduction to Queueing Theory
- Analysis of A Single Queue
- Queuing Networks
- Operational Laws
- Mean Value Analysis and Related Techniques
- Convolution Algorithm
- Advanced Techniques
16Grading Policy
- Prerequisites
- MA 585 or EE 500 (soft requirement)
- General knowledge about computer systems
- Interest in performance evaluation
- Grading
- Homeworks: 20%
- Midterm Exam: 25%
- Final Exam: 25%
- Project: 25%
- Class participation: 5%
17More on Projects
- Goal: provide an insight (or information) not obvious before the project
- Two project types
- A survey paper on a performance topic
- A real case study on performance of a system you are already working on
18Other Course Policies
19Introduction
20Outline
- Objectives
- What kind of problems will you be able to solve after taking this course?
- The Art
- Common Mistakes
- Systematic Approach
- Case Study
21Objectives (1 of 6)
- Select appropriate evaluation techniques, performance metrics, and workloads for a system
- Techniques: measurement, simulation, analytic modeling
- Metrics: criteria to study performance (e.g., response time)
- Workloads: requests by users/applications to the system
- Example: What performance metrics should you use for the following systems?
- a) Two disk drives
- b) Two transaction processing systems
- c) Two packet retransmission algorithms
22Objectives (2 of 6)
- Conduct performance measurements correctly
- Need two tools
- Load generator: a tool to load the system
- Monitor: a tool to measure the results
- Example: Which type of monitor (software or hardware) would be more suitable for measuring each of the following quantities?
- a) Number of instructions executed by a processor
- b) Degree of multiprogramming on a timesharing system
- c) Response time of packets on a network
23Objectives (3 of 6)
- Use proper statistical techniques to compare several alternatives
- Find the best among a number of alternatives
- One run of a workload is often not sufficient
- Many non-deterministic computer events affect performance
- Comparing the average of several runs may also not lead to correct results
- Especially if variance is high
- Example: Packets lost on a link. Which link is better?
File Size  Link A  Link B
1000       5       10
1200       7       3
1300       3       0
50         0       1
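One possible way to approach the packet-loss question above is a paired comparison: take the per-file-size difference between the two links and check whether a confidence interval for the mean difference includes zero. The sketch below is only an illustration under stated assumptions: the 90% confidence level is chosen here, not given on the slide, and the Student t quantile for 3 degrees of freedom is hard-coded from a standard table.

```python
import math
import statistics

# Packets lost per file size, copied from the table above.
link_a = [5, 7, 3, 0]
link_b = [10, 3, 0, 1]

# Paired differences (A minus B), one per file size.
diffs = [a - b for a, b in zip(link_a, link_b)]

mean_diff = statistics.mean(diffs)
std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))

# Assumption: two-sided 90% interval. 2.353 is the Student t quantile
# t[0.95; 3] for n - 1 = 3 degrees of freedom, from a standard table.
T_QUANTILE = 2.353
ci_low = mean_diff - T_QUANTILE * std_err
ci_high = mean_diff + T_QUANTILE * std_err

print(f"mean difference (A - B) = {mean_diff:.2f}")
print(f"90% CI = ({ci_low:.2f}, {ci_high:.2f})")
if ci_low > 0 or ci_high < 0:
    print("the links differ at this confidence level")
else:
    print("no significant difference at this confidence level")
```

With only four observations and a large spread, the interval straddles zero, so under these assumptions neither link can be called better.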
24Objectives (4 of 6)
- Design measurement and simulation experiments to provide the most information with the least effort
- Often many factors affect performance. Separate out the effects of individual factors.
- Example: The performance of a system depends upon three factors
- A) garbage collection technique: G1, G2, or none
- B) type of workload: editing, compiling, AI
- C) type of CPU: P2, P4, Sparc
- How many experiments are needed? How can the performance impact of each factor be estimated?
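For the three-factor example above, a full factorial design runs every combination of levels. The sketch below only counts and lists those combinations. The dictionary keys are illustrative names, and a full factorial is just one possible design (fractional designs would need fewer runs).

```python
from itertools import product

# Factors and levels as listed in the example; key names are illustrative.
factors = {
    "garbage_collection": ["G1", "G2", "none"],
    "workload": ["editing", "compiling", "AI"],
    "cpu": ["P2", "P4", "Sparc"],
}

# Full factorial design: one experiment per combination of levels.
experiments = list(product(*factors.values()))

print(f"{len(experiments)} experiments in the full factorial design")
for gc, workload, cpu in experiments[:3]:
    print(gc, workload, cpu)
```

Three factors with three levels each give 3 x 3 x 3 = 27 experiments, before any replication.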
25Objectives (5 of 6)
- Perform simulations correctly
- Select correct language, seeds for random numbers, length of simulation run, and analysis
- Before all of that, may need to validate the simulator
- Example: To compare the performance of two cache replacement algorithms
- A) What type of simulation model should be used?
- B) How long should the simulation be run?
- C) What can be done to get the same accuracy with a shorter run?
- D) How can one decide if the random-number generator in the simulation is a good generator?
26Objectives (6 of 6)
- Use simple queuing models to analyze the performance of systems
- Queuing models are commonly used for analytical modeling of computer systems
- Computer systems can often be modeled by the service rate and the arrival rate of the load
- Multiple servers
- Multiple queues
- Example: The average response time of a database system is 3 seconds. During a 1-minute observation interval, the idle time on the system was 10 seconds. Using a queuing model for the system, determine the following
- System utilization, average service time per query, the number of queries completed during observation, average number of jobs in the system,
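The database example above can be worked through with the operational laws plus one modeling assumption. The slide says only "a queuing model", so the choice of a single-server M/M/1 queue (where response time R = S / (1 - U)) is an assumption of this sketch, not something the slide states.

```python
# Givens from the example.
response_time = 3.0   # average response time per query, seconds
observation = 60.0    # observation interval, seconds
idle_time = 10.0      # idle time during the interval, seconds

# Utilization: fraction of the interval the system was busy.
utilization = (observation - idle_time) / observation

# Assumption: M/M/1 queue, where R = S / (1 - U), hence S = R * (1 - U).
service_time = response_time * (1.0 - utilization)

# Utilization law U = X * S gives the throughput X; completions = X * T.
throughput = utilization / service_time
queries_completed = throughput * observation

# Little's law: average number of jobs in the system N = X * R.
jobs_in_system = throughput * response_time

print(f"utilization       = {utilization:.3f}")
print(f"service time      = {service_time:.3f} s")
print(f"queries completed = {queries_completed:.1f}")
print(f"jobs in system    = {jobs_in_system:.2f}")
```

Under the M/M/1 assumption the same number of jobs also follows from N = U / (1 - U), which is a useful cross-check on the arithmetic.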
27Outline
- Objectives
- The Art
- Common Mistakes
- Systematic Approach
- Case Study
28The Art of Performance Evaluation
- Evaluation cannot be produced mechanically
- Requires intimate knowledge of system
- Careful selection of methodology, workload, tools
- Not one correct answer, as two performance analysts may choose different metrics or workloads
- Like art, there are techniques to learn
- how to use them
- when to apply them
29Example Comparing Two Systems
- Two systems, two workloads, measure transactions per second
- Which is better?
System  Workload 1  Workload 2
A       20          10
B       10          20
30Example Comparing Two Systems
- Two systems, two workloads, measure transactions per second
- They are equally good!
- but is A better than B?
System  Workload 1  Workload 2  Average
A       20          10          15
B       10          20          15
31The Ratio Game
- Take system B as the base
- A is better!
- but is B better than A?
System  Workload 1  Workload 2  Average
A       2           0.5         1.25
B       1           1           1
32The Ratio Game
- Take system A as the base
- B is better!?
System  Workload 1  Workload 2  Average
A       1           1           1
B       0.5         2           1.25
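The ratio game in the tables above can be reproduced in a few lines: normalize each system's throughput to a chosen base system, then average the ratios. The function name `mean_ratios` is made up for this sketch. The numbers come from the slides.

```python
# Transactions per second on workloads 1 and 2, from the slides.
throughput = {"A": [20, 10], "B": [10, 20]}


def mean_ratios(base):
    """Average of per-workload throughput ratios relative to `base`."""
    result = {}
    for name, values in throughput.items():
        ratios = [v / b for v, b in zip(values, throughput[base])]
        result[name] = sum(ratios) / len(ratios)
    return result


for base in ("B", "A"):
    print(f"base {base}: {mean_ratios(base)}")
```

Whichever system is taken as the base scores 1.0 while the other scores 1.25, so the averaged ratio favors the non-base system each time even though the raw averages are identical.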
33Outline
- Objectives
- The Art
- Common Mistakes
- Systematic Approach
- Case Study
34Common Mistakes (1-4)
- 1. Undefined Goals (Don't shoot and then draw the target)
- There is no such thing as a general model
- Describe goals and then design experiments
- 2. Biased Goals (Performance analysis is like a jury)
- Don't show YOUR system better than HERS
- 3. Unsystematic Approach
- Arbitrary selection of system parameters, factors, and metrics will lead to inaccurate conclusions
- 4. Analysis without Understanding (A problem well-stated is half solved)
- Don't rush to modeling before defining a problem
35Common Mistakes (5-8)
- 5. Incorrect Performance Metrics
- E.g., MIPS
- 6. Unrepresentative Workload
- Wrong workload will lead to inaccurate conclusions
- 7. Wrong Evaluation Technique (Don't have a hammer and see everything as a nail)
- Use the most appropriate technique: model, simulation, or measurement
- 8. Overlooking Important Parameters
- Start from a complete list of system and workload parameters that affect the performance
36Common Mistakes (9-12)
- 9. Ignoring Significant Factors
- Parameters that are varied are called factors; others are fixed
- Identify parameters that make significant impact on performance when varied
- 10. Inappropriate Experimental Design
- Relates to the number of measurement or simulation experiments to be conducted
- 11. Inappropriate Level of Detail
- Can have too much! Ex.: modeling disk
- Can have too little! Ex.: analytic model for congested router
- 12. No Analysis
- Having a measurement expert is desirable but not enough
- Expertise in analyzing results is crucial
37Common Mistakes (13-16)
- 13. Erroneous Analysis
- E.g., taking averages over simulations that are too short
- 14. No Sensitivity Analysis
- Analysis is evidence and not fact
- Need to determine how sensitive results are to settings
- 15. Ignoring Errors in Input
- Often parameters of interest cannot be measured. Instead, they are estimated using other variables
- Adjust the level of confidence in the model output
- 16. Improper Treatment of Outliers
- Outliers are values that are too high or too low compared to a majority of values
- If they are possible in real systems or workloads, do not ignore them
38Common Mistakes (17-20)
- 17. Assuming No Change in the Future
- Workload may change in the future
- 18. Ignoring Variability
- If variability is high, the mean performance alone may be misleading
- 19. Too Complex Analysis
- A simpler and easier to explain analysis should be preferred
- 20. Improper Presentation of Results
- It is not the number of graphs, but the number of graphs that help make decisions
39Common Mistakes (21-22)
- 21. Ignoring Social Aspects
- Writing and speaking are social skills
- 22. Omitting Assumptions and Limitations
- E.g., may assume most traffic is TCP, whereas some links may have significant UDP traffic
- May lead to applying results where assumptions do not hold
40Checklist for Avoiding Common Mistakes in
Performance Evaluation
- Is the system correctly defined and are the goals clearly stated?
- Are the goals stated in an unbiased manner?
- Have all the steps of the analysis been followed systematically?
- Is the problem clearly understood before analyzing it?
- Are the performance metrics relevant for this problem?
- Is the workload correct for this problem?
- Is the evaluation technique appropriate?
- Is the list of parameters that affect performance complete?
- Have all parameters that affect performance been chosen as factors to be varied?
- Is the experimental design efficient in terms of time and results?
- Is the level of detail proper?
- Is the measured data presented with analysis and interpretation?
- Is the analysis statistically correct?
- Has the sensitivity analysis been done?
- Would errors in the input cause an insignificant change in the results?
- Have the outliers in the input or the output been treated properly?
- Have the future changes in the system and workload been modeled?
- Has the variance of input been taken into account?
- Has the variance of the results been analyzed?
41Outline
- Objectives
- The Art
- Common Mistakes
- Systematic Approach
- Case Study
42A Systematic Approach
- State goals and define boundaries
- List services and outcomes
- Select performance metrics
- List system and workload parameters
- Select factors and values
- Select evaluation techniques
- Select workload
- Design experiments
- Analyze and interpret the data
- Present the results. Repeat.
43State Goals and Define Boundaries
- Just "measuring performance" or "seeing how it works" is too broad
- E.g., goal is to decide which ISP provides better throughput
- Definition of system may depend upon goals
- E.g., if measuring CPU instruction speed, system may include CPU and cache
- E.g., if measuring response time, system may include CPU, memory, OS, and user workload
44List Services and Outcomes
- List services provided by the system
- E.g., a computer network allows users to send
packets to specified destinations - E.g., a database system responds to queries
- E.g., a processor performs a number of tasks
- A user request for any of these services results in a number of possible outcomes (desirable or not)
- E.g., a database system may answer correctly, incorrectly (due to inconsistent updates), or not at all (due to deadlocks)
45Select Metrics
- Criteria to compare performance
- In general, related to speed, accuracy, and/or availability of system services
- E.g., network performance
- Speed: throughput and delay
- Accuracy: error rate
- Availability: data packets sent do arrive
- E.g., processor performance
- Speed: time to execute instructions
46List Parameters
- List all parameters that affect performance
- System parameters (hardware and software)
- E.g. CPU type, OS type,
- Workload parameters
- E.g. Number of users, type of requests
- List may not be initially complete, so keep a working list and let it grow as the study progresses
47Select Factors to Study
- Divide parameters into those that are to be studied and those that are not
- E.g., may vary CPU type but fix OS type
- E.g., may fix packet size but vary number of connections
- Select appropriate levels for each factor
- Want typical levels and ones with potentially high impact
- For workload, often a smaller (1/2 or 1/10th) and larger (2x or 10x) range
- Start small, or the number of experiments can quickly overwhelm available resources!
48Select Evaluation Technique
- Depends upon time, resources, and desired level of accuracy
- Analytic modeling
- Quick, less accurate
- Simulation
- Medium effort, medium accuracy
- Measurement
- Typically most effort, most accurate
- Note, the above are all typical but can be reversed in some cases!
49Select Workload
- Set of service requests to system
- Depends upon measurement technique
- Analytic model may have probabilities of various requests
- Simulation may have a trace of requests from a real system
- Measurement may have scripts that impose transactions
- Should be representative of real life
50Design Experiments
- Want to maximize results with minimal effort
- Phase 1
- Many factors, few levels
- See which factors matter
- Phase 2
- Few factors, more levels
- See where the range of impact for the factors is
51Analyze and Interpret Data
- Compare alternatives
- Take into account variability of results
- Statistical techniques
- Interpret results
- The analysis does not provide a conclusion
- Different analysts may come to different
conclusions
52Present Results
- Make it easily understood
- Graphs
- Disseminate (entire methodology!)
"The job of a scientist is not merely to see: it is to see, understand, and communicate. Leave out any of these phases, and you're not doing science. If you don't see, but you do understand and communicate, you're a prophet, not a scientist. If you don't understand, but you do see and communicate, you're a reporter, not a scientist. If you don't communicate, but you do see and understand, you're a mystic, not a scientist."
53Outline
- Objectives
- The Art
- Common Mistakes
- Systematic Approach
- Case Study
54Case Study
- Consider remote pipes (rpipe) versus remote procedure calls (rpc)
- rpc is like a procedure call, but the procedure is handled on a remote server
- Client caller blocks until return
- rpipe is like a pipe, but the server gets the output on a remote machine
- Client process can continue, non-blocking
- Results are returned asynchronously
- Goal: compare the performance of applications using rpipes to similar applications using rpcs
55System Definition
- Client, server, and network
- Key component is the channel, either an rpipe or an rpc
- Only the subset of the client and server that handles the channel is part of the system
- Try to minimize the effect of components outside the system
56Services
- There are a variety of services that can happen over an rpipe or rpc
- Choose data transfer as a common one, with data being a typical result of most client-server interactions
- Classify amount of data as either large or small
- Thus, two services
- Small data transfer
- Large data transfer
57Metrics
- Limit metrics to correct operation only (no failure or errors)
- Study service rate and resources consumed
- Performance metrics
- A) Elapsed time per call
- B) Maximum call rate per unit time
- C) Local CPU time per call
- D) Remote CPU time per call
- E) Number of bytes sent per call
58Parameters
System
- Speed of CPUs: local, remote
- Network: speed, reliability (retransmissions)
- Operating system overhead: for interfacing with channels, for interfacing with network
Workload
- Time between calls
- Number and sizes of parameters and of results
- Type of channel: rpc or rpipe
- Other loads: on CPUs, on network
59Key Factors
- A) Type of channel
- rpipe or rpc
- B) Speed of network
- Choose short distance (LAN) and across country (WAN)
- C) Size of parameters
- Small or large
- D) Number of calls
- 11 values: 8, 16, 32, ..., 1024
- E) All other parameters are fixed
- (Note, try to run during light network load)
60Evaluation Technique
- Since there are prototypes, use measurement
- Use analytic modeling based on measured data for
values outside the scope of the experiments
conducted
61Workload
- Synthetic program generates the specified channel requests
- Will also monitor resources consumed and log results
- Use null channel requests to get the baseline resources consumed by logging
- Heisenberg uncertainty principle in physics: the measurement of position necessarily disturbs a particle's momentum, and vice versa, i.e., the uncertainty principle is a manifestation of the observer effect
62Experimental Design
- Full factorial (all possible combinations of factors)
- 2 channels, 2 network speeds, 2 sizes, 11 numbers of calls
- => 2 x 2 x 2 x 11 = 88 experiments
63Data Analysis
- Analysis of variance will be used to quantify the effects of the first three factors
- Are they different?
- Regression will be used to quantify the effects of n consecutive calls
- Is performance linear? Exponential?
64Data Presentation
- The final results will be plotted as a function of the block size n