Title: Tales from the Lab: Experiences and Methodology
1 Tales from the Lab: Experiences and Methodology
- Demand Technology User Group
- December 5, 2005
- Ellen Friedman
- SRM Associates, Ltd
2 Testing in the Lab
- Experiences of a consultant
- Taming the Wild West
- Bringing order to Chaos
- HOW?
- Methodology - capacity planning, SPE, load testing
- Discipline
- Checklists/Procedures
- What happens when procedures aren't followed
- Detective Work
3 Agenda
- Introduction
- Software Performance Engineering and Benefits of Testing
- Back to Basics: Workload Characterization/Forecasting, Capacity Planning
- Building the Test Labs
- Testing Considerations
- Scripts and test execution
- Some Examples
- Documenting the test plan and reporting results
- Summary
4 Software Performance Engineering
- Performance engineering is the process by which new applications (software) are tested and tuned with the intent of realizing the required performance.
- Benefits
- Identify problems early in the application life cycle - manage risk
- Facilitates the identification and correction of bottlenecks to
- Minimize end-to-end response time
- Maximize application performance
5 Should we bother to Test??
WE CAN'T PLAN FOR WHAT WE DON'T KNOW
6 What do we need to achieve?
- Scalability
- Predictable scaling of the software/hardware architecture
- Do we have the capacity to meet resource requirements?
- How many users will the system handle before we need to upgrade or add web servers/app servers?
- Stability
- Ability to achieve results under unexpected loads and conditions
- Performance vs. Cost
- Achieving SLAs while minimizing cost
7 Testing throughout the application lifecycle
The cost of fixing a problem late in development is extremely high.
8 What is a Performance Test Lab?
A facility to proactively assess the satisfactory delivery of service to users prior to system implementation or roll-out.
- A "test drive" capability.
9 Lab - What is it Good For?
- Before you deploy the application, create an environment that simulates the production environment
- Use this environment to reflect the conditions of the target production environment
10 Testing Plan
- Evaluate system: SLAs, workload characterization, volumes
- Develop scripts / test strategy: obtain tools and methodology, build scripts
- Execute baseline tests: run the tests in the lab and obtain a baseline; ensure that test scripts adequately represent the production environment
- Validate baseline
- Run controlled benchmarks
- Analyze results
- Report findings
11 Evaluate System: Workload Characterization
- Identify Critical Business Functions
- Define Corresponding System Workloads/Transactions
- Map business workloads to system transactions
- Identify flow of transactions through the system
- Identify current and expected future volume
- Determine resource requirements for business-based workloads at all architectural tiers (see the capacity sketch below)
- Web server, application server, database server
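To make "determine resource requirements at each tier" concrete, here is a minimal sketch using the standard utilization = arrival rate x service demand relationship. The per-tier service demands and the transaction volume are assumed, illustrative numbers, not measurements from the case study.

```python
# Minimal sketch (illustrative service demands and volume, not measured values):
# translate a business volume into CPU utilization at each architectural tier
# using utilization = arrival_rate x service_demand.
service_demand_sec = {        # CPU seconds consumed per transaction at each tier (assumed)
    "web server":      0.020,
    "app server":      0.045,
    "database server": 0.030,
}

transactions_per_hour = 90_000                    # forecast peak-hour volume (assumed)
arrival_rate = transactions_per_hour / 3600.0     # transactions per second

for tier, demand in service_demand_sec.items():
    utilization = arrival_rate * demand           # fraction of one CPU kept busy
    note = " (needs more than one CPU)" if utilization > 1.0 else ""
    print(f"{tier:15}: {utilization:6.1%} of one CPU at {transactions_per_hour:,} tx/hr{note}")
```

A tier whose computed utilization exceeds 100% of one CPU is the first place to look when deciding where to add servers or processors.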
12 Evaluate System: Workload Forecasting
- Define key volume indicators
- What are the drivers for volume and/or resource usage for the system?
- Examples
- Banking: checks processed
- Insurance: claims processed
- Financial: trades processed
- Shipping: packages processed
13 Workload Forecasting: Historical Review
- Does the business have a set peak?
- December for retail and shipping
- Peak/average ratio? 20% or 30% higher?
- Volume vs. resource usage
- Larger centers require greater computing resources
- Need to determine scaling of hardware/software resources as a function of volume (see the sketch below)
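A minimal sketch of the peak/average calculation described above, using hypothetical monthly volumes: derive the peak-to-average ratio from history, then apply it to a forecast average to size the volume the lab test must drive.

```python
# Minimal sketch (hypothetical volumes): derive a peak/average ratio from
# historical monthly volumes and apply it to a forecast average.
monthly_volumes = {                 # packages processed per month (illustrative)
    "Jan": 410_000, "Feb": 395_000, "Mar": 420_000, "Apr": 415_000,
    "May": 430_000, "Jun": 425_000, "Jul": 405_000, "Aug": 440_000,
    "Sep": 450_000, "Oct": 470_000, "Nov": 520_000, "Dec": 610_000,
}

average = sum(monthly_volumes.values()) / len(monthly_volumes)
peak = max(monthly_volumes.values())
peak_ratio = peak / average
print(f"peak/average ratio: {peak_ratio:.2f}")     # e.g. ~1.38 -> plan ~40% above average

# Apply the ratio to next year's forecast average to get the target test volume.
forecast_average = 500_000
test_target = forecast_average * peak_ratio
print(f"target peak-month volume to test: {test_target:,.0f}")
```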
14 Volume vs. Response Time
[Chart: response time as a function of volume; volume scale in 1,000 packages per hour (PPH)]
15 Service Level Considerations
- e-Business system: tracking system for package inquiries - "WHERE IS MY PACKAGE?"
- Call center handles real-time customer inquiries
- SLA: caller cannot be put on hold > 3 minutes
- 90% of all calls should be cleared on first contact
- Responsiveness to customer needs
- Web interface for customers
- Page load time and query resolution < 6-8 seconds
16 Lab can be used throughout the Application Lifecycle
- Testing throughout the Application Life Cycle
- Planning
- Design/ coding
- Development/testing/UAT
- Production Deployment
- Post-production-change management
- Optimization (performance and volume testing)
- Labs reduce risk to your production environment
- Solid testing leads to cleaner implementations!!
17 How many Labs? Where to put them?
- Locations for testing vary with technical, business, or political contexts. The following factors influence the decisions you make about your test environment:
- Your testing methodology
- Features and components you will test
- PEOPLE, MONEY, Location
- Personnel who will perform the testing
- Size, location, and structure of your application project teams.
- Size of your budget.
- Availability of physical space.
- Location of testers.
- Use of the labs after deployment.
18 Types of Labs and their Purpose
- Application unit testing
- Hardware or software incompatibilities
- Design flaws
- Performance issues
- Systems integration testing lab
- User Acceptance Testing (UAT)
- Application compatibility
- Operational or deployment inefficiencies
- Windows 2003 features
- Network infrastructure compatibility
- Interoperability with other network operating systems
- Hardware compatibility
- Tools (OS, third-party, or custom)
- Volume testing lab
- Performance and capacity planning
- Baseline traffic patterns
- traffic volumes without user activity
- Certification Lab
- Installation and configuration documentation
- Administrative procedures and documentation
- Production rollout (processes, scripts, and files; back-out plans)
19 Testing Concepts 101
- Define the problem - test objectives
- Limit the scope
- Establish metrics analysis methodology
- Tools/analysis
- Establish the environment
- Design the test bed
- Simulate the key business functions
- Develop scripts and their frequency of execution
20 Testing Process 101
- Ensure that the lab mimics production (H/W, S/W, workload/business functions being tested)
- Test measurement tools and develop analysis tools
- ARM the application
- Instrumentation to provide end-to-end response time (see the timing sketch after this list)
- Instrumentation to provide business metrics to correlate
- Execute controlled tests
- Single-variable manipulation
- Ensure repeatability
- Analyze data; repeat if required (e.g., tune system)
- Extrapolate
- Document test set-up and results
21 Developing the script
- Meet with the business team and applications team to understand the workload
- What is typical? What is most resource intensive?
- Determine the appropriate mix of work (see the weighted-mix sketch below)
- Typical navigation and screen flow
- % of time each screen is accessed by the user
- Number of users to test with, number of different accounts to use (other factors impacting the representativeness of the test)
- Include cases to test resource-intensive activities and functions
- Include cases where the user may abandon the session because response time is too long
- Test for time-outs
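A minimal sketch of driving a virtual user through screens according to the observed production mix, with varied think time. The screen names and percentages are hypothetical; a real script would use the percentages gathered from the business and applications teams.

```python
# Minimal sketch (hypothetical screens and percentages) of selecting the next
# screen for a virtual user according to the observed production mix.
import random

screen_mix = {            # % of time each screen is accessed (illustrative numbers)
    "track_package": 0.55,
    "rate_quote":    0.25,
    "create_label":  0.15,
    "admin_report":  0.05,  # resource-intensive case; keep it in the mix
}

def next_screen():
    """Pick a screen weighted by the production access percentages."""
    screens, weights = zip(*screen_mix.items())
    return random.choices(screens, weights=weights, k=1)[0]

def think_time():
    """Vary user think time instead of hammering the server back-to-back."""
    return random.uniform(2.0, 10.0)   # seconds, illustrative range

# One virtual user's session of 20 steps:
session = [(next_screen(), round(think_time(), 1)) for _ in range(20)]
print(session[:5])
```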
22 Load Testing Parameters
- Simulating volume and the distribution of the arrival rate
- Hourly volume: distribution is not uniform; bursty arrival rate
- Web sessions are only about 3 minutes long
- When is traffic heaviest?
- How long does the user spend at the site?
- Need to vary the number of users started over the hour and the user think time (see the arrival-rate sketch below)
- Package shipping example: different from a web site - more predictable
- Arrival rate is highest in the first hour
- Limited by the capacity of the site to load the packages, speed of the belts, etc.
- Package scanning: some automated, but still has human involvement
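A minimal sketch of a non-uniform arrival schedule of the kind described above: exponential inter-arrival times within each hour, with the rate changing hour to hour so the first hour is heaviest. The hourly rates are illustrative, not taken from the case study.

```python
# Minimal sketch (illustrative rates) of a bursty, non-uniform arrival schedule.
import random

hourly_rate = [1200, 900, 600, 300]   # arrivals per hour for a 4-hour test (illustrative)

def arrival_times(hourly_rate):
    """Yield arrival times in seconds across the whole test run."""
    for hour, rate in enumerate(hourly_rate):
        t = hour * 3600.0
        end = t + 3600.0
        while True:
            t += random.expovariate(rate / 3600.0)   # mean gap = 3600/rate seconds
            if t >= end:
                break
            yield t

times = list(arrival_times(hourly_rate))
print(f"total arrivals: {len(times)}")
print(f"arrivals in first hour: {sum(1 for t in times if t < 3600)}")
```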
23 X-drive read bytes/second over time
How long should the test run?
Note the reduction in read bytes/sec over time.
The test run is four hours here!
Need to reach steady state!
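One simple way to decide whether the run has reached steady state is to compare the means of the last two measurement windows of a counter such as read bytes/sec. The window size, tolerance, and sample data below are illustrative choices, not values from the case study.

```python
# Minimal sketch: has a counter (e.g., read bytes/sec sampled once a minute)
# settled? Compare the means of the last two windows of the run.
def reached_steady_state(samples, window=30, tolerance=0.05):
    """True if the means of the last two windows differ by < tolerance (5%)."""
    if len(samples) < 2 * window:
        return False
    prev = samples[-2 * window:-window]
    last = samples[-window:]
    mean_prev = sum(prev) / window
    mean_last = sum(last) / window
    return abs(mean_last - mean_prev) / mean_prev < tolerance

# Example with made-up data: high reads at the start (cold cache), settling later.
samples = [9e6 - i * 20e3 for i in range(120)] + [6.6e6] * 120
print(reached_steady_state(samples))   # True once the tail flattens out
```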
24 Creating the Test Environment in the Lab
- Creating the data/database
- Copy database from production- subset it
- Manually key/Edit some of the data
- Create image copy of system for use in each run
- Verifying the test conditions
- Utilize ghost imaging or software such as PowerQuest or LiveState to save the database and system state between test runs
- May need to also verify configuration settings that aren't saved in the image copy
- Make sure that you are simulating the correct conditions (end of day/beginning of day/normal production flow)
- Scripting the key business functions
- Vary the test data as part of scripting
- Vary users/accounts/pathing
25 What type of staff do we need?
- Programmers
- Korn Shell Programmers
- Mercury Mavens?
26 Establish Metrics Analysis Methodology
- Based on the testing objectives, what data do we need to collect and measure?
- CPU, memory, I/O, network, response time
- What tools do we need for measurement?
- Do not over-measure
- Don't risk over-sampling and incurring high overhead
- Create a template to use for comparison between test runs
27 Build a Template for Comparison
- Before vs. after comparison of test cases (see the comparison sketch below)
- Collect the performance data - metrics:
- CPU/Processor metrics
- System, user, and total processor utilization
- Memory
- Available bytes, page reads/second, page-ins/second, virtual/real bytes
- Network
- Bytes sent/received, packets sent/received per NIC
- Disk
- Reads and writes/second, read and write bytes/second, seconds/read, seconds/write, disk utilization
- Process: SQL Server (2 instances)
- CPU
- Working set size
- Read/write bytes per second
- Database - SQL
- Database reads/writes per instance, stored procedure timings
- Log bytes flushed per database
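A minimal sketch of the comparison template idea: compute the percentage change of each collected metric between a baseline run and a test run so differences stand out at a glance. The counter names and values are illustrative, not real measurements.

```python
# Minimal sketch (illustrative counters and values) of a before-vs-after
# comparison template between a baseline run and a test run.
baseline = {
    "Processor: % Total Utilization": 42.0,
    "Memory: Available MBytes":       780.0,
    "Disk X: Read Bytes/sec":         6.6e6,
    "Disk D: Write Bytes/sec":        1.2e6,
    "SQL Main: Page Reads/sec":       950.0,
}
test_run = {
    "Processor: % Total Utilization": 44.0,
    "Memory: Available MBytes":       655.0,
    "Disk X: Read Bytes/sec":         7.7e6,
    "Disk D: Write Bytes/sec":        1.3e6,
    "SQL Main: Page Reads/sec":       1310.0,
}

print(f"{'Metric':38} {'Baseline':>12} {'Test':>12} {'Delta %':>9}")
for metric, base in baseline.items():
    new = test_run[metric]
    delta = 100.0 * (new - base) / base
    print(f"{metric:38} {base:12,.1f} {new:12,.1f} {delta:8.1f}%")
```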
28 CASE STUDY
- Packaging- Shipping System
- Many centers throughout the country
- Same Applications
- Same Hardware
- Testing in the lab is required to identify bottlenecks and optimize performance
- SLA not being met in some larger centers
- Suspect Database Performance
29 Case Study: Configuration Architecture
- Database Server
- Runs 2 Instances of SQL (Main, Reporting)
- Databases are configured on the X drives
- TempDB and Logs are configured on D drive
30 Scanning the package on the Belt
If the SLA is not met, packages aren't processed automatically; additional manual work is required to handle exceptions.
31 Case Study: Hardware
- Database Server - DB 1
- G3 (2.4 GHz) with 4 GB memory
- RAID 10 configuration
- Internal
- 1 C/D logically partitioned
- External (10 slots)
- 2 X drives- mirrored
- 2 Y drives- mirrored
- Application Server
- G3 (2.4 GHz) with 3 GB memory
- 2 Internal Drives (C/D)
- Database Server- DB 2
- G3 (2.4 GHz) with 4 GB memory
- Internal
- 1 C/D logically partitioned
- 2 X mirrored drives
32 Case Study: Software and OS
- Windows 2000
- SQL Server 2000
- 2 Database Instances
- Reporting
- Main Instance - multiple databases
- Replication of Main Instance to Reporting Instance on the same server
- Main Instance and Reporting Instance share the same drives
33 Case Study: When do we test in the Lab?
- Hardware Changes
- OS Changes
- Software patch-level changes to the main suite of applications
- Major application changes
- Changes to other applications which coexist with the primary application suite
34 Checklists and Forms
- Test Objectives
- Application Groups must identify
- Specific application version to be tested, as well as those of other co-dependent applications
- Database set-up to process the data
- Special data
- Workstation set-up
- Volume - induction rate/flow (arrival rate)
- Workflow and percentages
- Scripts/percentage/flow rate
35 Case Study: Hardware Checklist
36 Sign-offs on Procedures/Pre-flight
- Who?
- Applications team
- Lab group
- Systems groups
- Network
- Distributed Systems
- Database
- Performance
37 Script Development: Collected data from Production Systems
- Applications to include for testing and to be used to determine resource profiles for key transactions and business functions
- Volumes to test with
- Database conditions, including database size and database state requirements (e.g., end-of-day conditions)
- Application workflow, based on operational characteristics in various centers
- Job and queue dependencies
- Requirements for specific data feeds to include
38 Case Study: Developing a Script
- Major business functions for labeling and shipping
- Verifying the name and address of the item to be shipped
- Interfaces to other systems and uses algorithms for parsing names/addresses
- Route planning - interface with OR systems to optimize routing
- Scanning the package information (local operation)
- Determining the type of shipment (freight/letter/overnight small package) for shipping the item, and the appropriate route
- Sorting the packages according to type of shipment
- Printing the smart labels
- How/where to load the package
- Tracking the package
39 Case Study: Performance Testing in the Lab
- Production analysis indicated
- Insufficient memory to support database storage requirements
- Resulting in increased I/O processing
- OPTIONS
- Add memory
- Not feasible: requires an OS upgrade to address more than 4 GB of storage with Windows 2000 Standard Edition
- Make the I/O faster - faster drives or more drives
- Spread the I/O across multiple drives (external disk storage is expandable; up to 10 slots available)
- Separate the database usage across 2 sets of physical drives
- Split the database across multiple servers (2 database servers)
- Easier upgrade than an OS change
- Change the database design (expected in 1Q2006, testing now)
40 Planning: Testing out the configuration options
- Test out each of the options and provide a recommendation
- SLA: 99% of packages must complete their processing in under 500 milliseconds (see the percentile check below)
- Each option was evaluated based on its relative ability to satisfy the SLA criteria.
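A minimal sketch of checking the 99th-percentile processing time against the 500 ms SLA. The sample data is randomly generated and illustrative only; in practice the times would come from the lab measurements.

```python
# Minimal sketch: check the 99th-percentile package processing time against the
# 500 ms SLA from a list of measured times (sample data is illustrative).
import random

def percentile(values, pct):
    """Nearest-rank percentile: smallest value with pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, round(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]

# Illustrative measurements in milliseconds (long-tailed, as response times usually are).
times_ms = [random.lognormvariate(5.5, 0.5) for _ in range(10_000)]

p99 = percentile(times_ms, 99)
print(f"p99 = {p99:.0f} ms -> SLA {'met' if p99 < 500 else 'missed'} (target < 500 ms)")
```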
41 Validating the baseline
Taming the West!
If you can't measure it, you can't manage it! (CMG slogan)
42 Case Study: What are we measuring?
- End-to-end response time (percentiles, average)
- SQL stored procedure timings (percentiles, average)
- SQL Trace information summarized for each stored procedure for a period of time (see the summarization sketch below)
- Perfmon: System, Process, SQL (average, max)
- CPU, memory, disk
- Process: memory, disk, processor
- SQL: database activity, checkpoints, buffer hit ratio, etc.
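A minimal sketch of summarizing an exported trace per stored procedure: call count, average and 95th-percentile duration, and total logical reads. It assumes the trace has been exported to a CSV with hypothetical column names (proc_name, duration_ms, reads); adjust to match the actual export.

```python
# Minimal sketch: summarize an exported SQL trace per stored procedure.
# Columns proc_name, duration_ms, reads are assumed/hypothetical names.
import csv
from collections import defaultdict

def p95(values):
    ordered = sorted(values)
    return ordered[max(0, int(round(0.95 * len(ordered))) - 1)]

durations = defaultdict(list)
reads = defaultdict(int)

with open("sql_trace.csv", newline="") as f:     # hypothetical export file
    for row in csv.DictReader(f):
        proc = row["proc_name"]
        durations[proc].append(float(row["duration_ms"]))
        reads[proc] += int(row["reads"])

print(f"{'Stored procedure':30} {'Calls':>7} {'Avg ms':>8} {'p95 ms':>8} {'Reads':>10}")
for proc, times in sorted(durations.items()):
    avg = sum(times) / len(times)
    print(f"{proc:30} {len(times):7d} {avg:8.1f} {p95(times):8.1f} {reads[proc]:10d}")
```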
43 Validating the Baseline
- Data from two production systems was obtained to produce
- Test database from multiple application systems
- Database states were obtained, system inter-dependencies were satisfied, application configuration files were captured
- Baseline test was executed - multiple iterations
- Performance measurements from two other systems were collected and compared against the baseline execution
- Results were compared
- Database and scripts were modified to better reflect production conditions
44 Story: Creating a new Environment
- A series of performance tests were conducted in the Green environment to evaluate I/O performance
- To be reviewed in the presentation on Thursday 12-8
- The Green environment was required for another project, so we moved to a new Red environment
- Data created from a different source (2 different production environments)
- Simulating high volume
- What happened?
- Different page densities
- Different distribution of package delivery dates
- Different database size for critical database
- Red was much fatter!
45 Analysis to evaluate new Baseline
- Compare I/O activity for Green and Red
- Metrics
- End to End Response Time
- SQL Stored Procedure Timings
- SQL Activity
- Database page reads/writes, overall and for each database
- (X drive containing the databases)
- Log bytes flushed per second (each database)
- D drive (logs)
- SQL read and write bytes/second
- SQL reads and writes are overall, so they include database I/O and log activity
- Disk activity
- Overall drive D/X read/write bytes/second
46 Comparing Overall Response Time: Red vs. Green and Separate Server
Green and Red tests with 2 mirrored pairs of X drives are the baselines.
Results of the baselines should be comparable!!!
47 Comparison of Green and Red Environments (X drive: database)
Read activity 16% higher; write activity 38% higher
48 Comparison of Green and Red Environments (D drive: TempDB/logs)
Read activity 1% higher; write activity 13% higher
I/O activity is approximately the same on the D drive
49 Comparison of I/O Load: SQL Activity, Green vs. Red
Increase in reads in Red is due to the Main instance; increase in writes in Red is caused by both instances.
50 I/O Load Change: Main Instance, Separate Server vs. Baseline
Read activity is reduced by 43% with a separate server
51 Differences between Red and Green
- D Drive activity is approximately the same
- TempDB and logging
- X Drive activity is increased in Red environment
- Most of the differences are due to an increase in reads on the X drive for the Main Instance
- Implies that the database was much fatter
- Confirm this by reviewing page reads/page writes per database from SQL statistics
- Review database sizes (unfortunately we didn't have this data, so we inferred it based on I/O data and SQL trace data)
- SQL trace data showed more page reads for key databases
52 Red Environment: Comparing Three Days
- Background
- Several large databases
- Main: UOWIS, PAS
- Reporting: Adhoc, UW1, Distribution
- 4-1: Replication turned off for UW1 database
- 4-4: Replication on for UW1 database
- 4-8: Separate server for UOWIS, replication turned on for UW1
- Expectations
- 4-1 will perform better than 4-4 - reduce I/O significantly
- Expect significant reduction in Reporting database I/O
- 4-8: separate server will separate out the critical database
- Expect same amount of work performed as 4-4 but a reduction in read activity for UOWIS because data will now be in memory
53 Reviewing Log Write Activity
Note: no log bytes - no replication of UW1 database on 4-4
54 RED: Comparing Three Days of Database Disk Activity
Note: 4-8 UOWIS results are for the separate server
Increase in work performed on 4-8 vs. 4-4
55 Comparing Database Reads/Writes: Main Instance
56 Comparing Database Reads/Writes: Reporting Instance
Total page reads for the reporting instance should remain constant. Why did it increase on 4-8?
57 Where are the differences on the two days?
Note: differences in stored procedure total reads (logical) for the Data Cap Summary and Belt Summary reports (not main functionality)
58 What have we uncovered about test differences?
- Processor usage is approximately the same
- Amount of write activity per instance is the same
- Reviewed log bytes flushed for each instance
- Reporting instance performed more I/O - more reads
- Additional report jobs were executed on 4-8 and not on 4-4
- Reports run 4 times per hour (every 15 minutes), which causes bursts in I/O activity
- When the UOWIS database is on the same server (sharing the same drives as other Main Instance and Reporting Instance work), response time is higher
- Response time is directly related to physical reads and physical disk read performance
- Spreading the I/O across more drives and/or providing more memory for the critical database instance improves performance
59 Testing Summary
- Need to create and follow a test plan which outlines
- All pre-flight procedures
- Confirm that the environment is ready to go
- Validate baselines
- Run tests in an organized fashion, following the plan
- Do a sanity check!
- Do the results make sense?
- Otherwise, search for the truth - don't bury the results
60 Measurement Summary
- The nature of performance data is that it is long-tailed
- Averages aren't representative
- Get percentiles
- Need to understand the variability of the tests conducted (see the variability sketch below)
- Run the same test multiple times to obtain a baseline
- Helps you iron out your procedures
- Can get a measure of variability of the test case so that you can determine if the change you are testing is significant
- If the variability experienced between your base test runs is small, that is good - you have repeatability
- If the variability is large
- You need to make sure that any change you make shows an even greater change
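A minimal sketch of the repeatability check described above, with made-up numbers: measure the run-to-run variability of the baseline and only call a change significant when it is well outside that spread. The threshold used is a simple rule of thumb, not a formal statistical test.

```python
# Minimal sketch (illustrative numbers): compare run-to-run variability of the
# baseline against the change being tested.
from statistics import mean, stdev

# p95 response time (ms) from repeated baseline runs and from the modified configuration.
baseline_runs = [428.0, 441.0, 435.0, 446.0, 431.0]
change_runs = [396.0, 402.0, 389.0]

base_mean = mean(baseline_runs)
base_cv = stdev(baseline_runs) / base_mean          # coefficient of variation
delta_pct = 100.0 * (mean(change_runs) - base_mean) / base_mean

print(f"baseline run-to-run variability: {base_cv:.1%}")
print(f"observed change vs. baseline:    {delta_pct:+.1f}%")
if abs(delta_pct) > 100.0 * 2 * base_cv:            # rule of thumb, not a t-test
    print("change is well outside baseline variability - likely real")
else:
    print("change is within baseline noise - rerun or tighten procedures")
```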
61 Reporting the Test Results: Template
- Executive Summary
- Graphs of results - e.g., end-to-end response time
- Scalability of solution
- Overall findings
- Background
- Hardware/OS/Applications
- Scripts
- Analysis of Results
- System and application performance
- Decomposition of response time
- Web tier, Application, Database
- Drill down again for details as necessary, e.g., database metrics
- Next steps
62 Summary
- Can't always simulate everything - do the best you can.
- Implement the change in production and go back to the lab to understand why it matched or didn't
- When you discover a problem,
- Apply what you've learned
- Make necessary changes to procedures, documentation, and methodology in the lab, and recommend changes for outside the lab
- Improve the process; don't just bury or hide the flaws!
- Result: better testing and smoother implementations
63 Questions?????????
- Contact Info
- Ellen Friedman
- SRM Associates, Ltd
- ellen@srmassoc.com
- 516-433-1817
- Part II: To be presented at the CMG Conference
- Thursday, 9:15-10:15
- Session 512
- Measuring Performance in the Lab
- A Windows Case Study