Title: Automated Testing: Better, Cheaper, Faster, For Everything
1. Automated Testing: Better, Cheaper, Faster, For Everything
- Larry Mellon, Steve Keller
- Austin Game Conference
- September 2004
2. About This Talk
- Highly visual slides are often followed by a "Key Points" text slide that provides additional details. For smoother flow, such slides are hidden in presentation mode.
- Some animations are not compatible with older versions of PowerPoint.
3. What Is An MMP Automated Testing System?
- Push-button ability to run large-scale, repeatable tests
- Cost
  - Hardware / software
  - Human resources
  - Process changes
- Benefit
  - Accurate, repeatable, measurable tests during development and operations
  - Stable software, faster, measurable progress
  - Base key decisions on fact, not opinion
4. Key Points
- Comfort and confidence level
  - Managers/producers can easily judge how development is progressing
  - Just like bug-count reports, test reports indicate the overall quality of the current state of the game
  - Frequent, repeatable tests show progress and backsliding
- Investing developers in the test process helps prevent QA-vs.-development shouting matches
  - Smart developers like numbers and metrics just as much as producers do
- Making your goals: you will ship cheaper, better, sooner
  - Cheaper: even though initial costs may be higher, issues get exposed when it is cheaper to fix them (and developer efficiency increases)
  - Better: robust code
  - Sooner: "it's OK to ship now" is based on real data, not supposition
5. MMP Requires A Strong Commitment To Testing
- System complexity, non-determinism, scale
  - Tests provide hard data in a confusing sea of possibilities
  - Increase the comfort and confidence of the entire team
- Tools augment your team's ability to do their jobs
  - Find problems faster
  - Measure / change / measure; repeat as necessary
- Production / exec teams come to depend on this data to a high degree
6. How To Get There
- Plan for testing early
  - Non-trivial system
  - Architectural implications
- Make sure the entire team is on board
- Be willing to devote time and money
7. Automation Architecture
[Diagram: startup, control, collection, analysis. A Test Manager handles test selection/setup and controls N clients via real-time probes; repeatable, synced test inputs drive scripted test clients (emulated user play sessions with multi-client synchronization) against the systems under test; Report Managers handle raw data collection, aggregation/summarization, and alarm triggers.]
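As a minimal sketch of the Test Manager's role, the snippet below launches N scripted clients and turns their exit codes into pass/fail data plus an alarm. The `test_client.py` entry point and its flags are invented for illustration, not taken from the talk.

```python
import subprocess
import sys

def run_test(script_path, num_clients=4):
    """Launch N scripted test clients against the system under test,
    wait for them, and treat non-zero exit codes as failures."""
    procs = [
        subprocess.Popen([sys.executable, "test_client.py",
                          "--script", script_path, "--id", str(i)])
        for i in range(num_clients)
    ]
    results = [p.wait() for p in procs]            # raw data collection
    failures = sum(1 for code in results if code != 0)
    if failures:                                   # alarm trigger
        print(f"ALARM: {failures}/{num_clients} clients failed {script_path}")
    return failures == 0

if __name__ == "__main__":
    run_test("scripts/enter_lot.script", num_clients=8)
```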
8. Key Points
- Scriptable test clients
  - Lightweight subset of the shipping client
  - Instrumented: spits out lots of useful information
  - Repeatable
  - Bots help you understand the test results
- Log both server and client output (common format), with timestamps!
- Automated metrics collection and aggregation
  - High-level at-a-glance reports with detail drill-down
- Push-button application for both running and analyzing a test
9. Outline
- Overview: Automated Testing
  - Definition, Value, High-Level Approach
- Applying Automated Testing
  - Mechanics, Applications
  - Process Shifts: Stability, Scale & Metrics
- Implementation: Key Risks
- Summary & Questions
10. Scripted Test Clients
- Scripts are emulated play sessions, just like somebody playing the game
- Command steps: what the player does to the game
- Validation steps: what the game should do in response (see the sketch below)
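To make that concrete, here is a hypothetical fragment of such a script and a driver for it; the `game` object and the step names are invented for illustration, not taken from the talk.

```python
# Command steps say what the player does; validation steps say what
# the game should do in response.
SCRIPT = [
    ("command",  "login",         {"user": "test_01"}),
    ("validate", "connected",     {"expect": True}),
    ("command",  "use_object",    {"object": "chair"}),
    ("validate", "avatar_state",  {"expect": "sitting"}),
]

def run(script, game):
    """Replay one emulated play session against a game-client API."""
    for kind, step, args in script:
        if kind == "command":
            game.execute(step, **args)                 # drive the game
        else:
            assert game.check(step, **args), \
                f"validation failed: {step} {args}"    # check the response
```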
11. Scripts Tailored To Each Test Application
- Unit testing: 1 feature, 1 script
- Load testing: representative play session
  - The average Joe, times thousands
- Shipping quality: corner cases, feature completeness
- Integration: test code changes for catastrophic failures
12. Bread Crumbs: Aggregated Instrumentation Flags Trouble Spots
[Chart: aggregated instrumentation trail leading up to a server crash.]
13. Quickly Find Trouble Spots
[Chart: DB byte count oscillates out of control.]
14. Drill Down For Details
[Chart: a single DB request is clearly at fault.]
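The aggregate-then-drill-down flow these charts show can be sketched roughly as below; the record fields and the alarm threshold are invented for illustration.

```python
from collections import defaultdict

def summarize(records, threshold_ms=500):
    """Aggregate per-request timings from client/server logs, print an
    at-a-glance summary, and drill down on anything over threshold."""
    by_kind = defaultdict(list)
    for rec in records:
        by_kind[rec["kind"]].append(rec)
    for kind, recs in sorted(by_kind.items()):
        avg = sum(r["ms"] for r in recs) / len(recs)
        print(f"{kind:12s} n={len(recs):4d} avg={avg:8.1f} ms")
        if avg > threshold_ms:                      # alarm: drill down
            worst = max(recs, key=lambda r: r["ms"])
            print(f"  ALARM -> worst request: {worst}")

summarize([
    {"kind": "db_request", "ms": 90},
    {"kind": "db_request", "ms": 4200},   # the one request at fault
    {"kind": "login",      "ms": 120},
])
```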
15. Process Shift: Applying Automation To Development
[Chart: earlier tools investment equals more gain; investing late is not good enough.]
16. Process Shifts: Automated Testing Can Change The Shape Of The Development Progress Curve
- Stability: keep developers moving forward, not bailing water
- Scale: focus developers on key, measurable roadblocks
17. Process Shift: Measurable Targets, Projected Trend Lines
[Chart: core functionality tests passing for any feature (e.g. clients), plotted against time, with a projected trend line toward "target complete" at any milestone (e.g. Alpha).]
Actionable progress metrics, early enough to react.
18. Stability Analysis: What Brings Down The Team?
- Test case: can an avatar sit in a chair?
- Critical path: login() → create_avatar() → buy_house() → enter_house() → buy_object() → use_object()
- Failures on the critical path block access to much of the game. Worse: unreliable failures. (A sketch of a chained critical-path test follows below.)
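That critical path reads naturally as a chained test: each step depends on every step before it, so one flaky failure blocks everything downstream. A minimal sketch, assuming a `game` object that exposes these calls:

```python
CRITICAL_PATH = [
    "login", "create_avatar", "buy_house",
    "enter_house", "buy_object", "use_object",
]

def run_critical_path(game):
    """Walk the critical path in play order and report where it breaks."""
    for step in CRITICAL_PATH:
        if not getattr(game, step)():    # e.g. game.login()
            # Everything after this point is untestable this run.
            return f"BLOCKED at {step}()"
    return "PASS: avatar can sit in a chair"
```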
19. Impact On Others
20. (No Transcript)
21. Key Points
- Build instability slowed forward progress (especially on the critical path)
  - People were blocked from getting work done
  - Uncertainty: "did I break that, or did it just happen?"
  - A lot of developers just didn't get non-determinism
  - Backsliding: things kept breaking
- Monkey tests: an always-current baseline for developers
  - A common measuring stick across builds & deployments is extremely valuable
22. Monkey Test: EnterLot
23. Non-Deterministic Failures
24. Key Points
- 30 test runs, 4 behaviours (see the tallying sketch below):
  - Successful entry
  - Hang or crash
  - Owner evicted, all possessions stolen
- Random results observed in all major features
- Critical path: random failures outside of unit tests are very difficult to track
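Flaky behaviour like this only becomes visible when the same test runs many times and the outcomes are tallied, along these lines (a sketch; `enter_lot_test` is a stand-in name):

```python
from collections import Counter

def characterize(test_fn, runs=30):
    """Run one test repeatedly and tally the distinct behaviours seen:
    a deterministic test yields one outcome, a flaky one several."""
    outcomes = Counter()
    for _ in range(runs):
        try:
            outcomes[test_fn()] += 1       # e.g. "entered", "evicted"
        except Exception as exc:
            outcomes[f"crash: {type(exc).__name__}"] += 1
    return outcomes

# e.g. characterize(enter_lot_test)
#   -> Counter({'entered': 26, 'hang': 2, 'crash: Timeout': 1, 'evicted': 1})
```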
25. Stability Via Monkey Tests
Continual repetition of critical-path unit tests.
26. Key Points
- Hourly stability checkers
  - Aging (dirty processes, growing datasets, leaking memory)
  - Moving parts (race conditions)
  - Stability measure: what works, right now?
  - Flares go off, etc. (a minimal loop is sketched below)
- Unit tests (against features)
  - Minimal noise / side effects
  - Reference point: what should work?
  - Clarity in reporting / triaging
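An hourly stability checker can be as simple as the loop below, a sketch assuming each test is a callable that raises on failure; it is the long-running repetition that exposes aging effects a single run never sees.

```python
import time
import traceback

def monkey_loop(tests, period_s=3600):
    """Run the critical-path unit tests every hour, forever."""
    while True:
        for name, test in tests.items():
            try:
                test()
                print(f"{time.strftime('%H:%M')} {name}: PASS")
            except Exception:
                print(f"{time.strftime('%H:%M')} {name}: FAIL")  # the flare
                traceback.print_exc()
        time.sleep(period_s)
```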
27. Process Shift: Comb Filter Testing
[Diagram: a three-stage filter. Sniff Test & Monkey Tests (fast to run, catch major errors, keep coders working) gate whether new code is ready for check-in; Smoke Test & Server Sniff (is the game playable? are the servers stable under a light load? do all key features work?) gate whether a full system build is promotable to full testing; Full Feature Regression & Full Load Test (do all test suites pass? are the servers stable under peak load conditions?) gate whether a build is promotable to paying customers.]
- Cheap tests catch gross errors early in the pipeline
- More expensive tests run only on known-functional builds
(A pipeline sketch follows below.)
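The comb filter is staged gating: cheap suites run first, and an expensive suite never sees a build that failed a cheaper one. A sketch, with stage names following the slide and `run_suite` an assumed callable returning pass/fail:

```python
STAGES = [
    ("sniff/monkey",    ["sniff_test", "monkey_tests"]),        # gates check-in
    ("smoke",           ["smoke_test", "server_sniff"]),        # gates full testing
    ("full regression", ["feature_regression", "full_load"]),   # gates going live
]

def comb_filter(build, run_suite):
    """Promote a build stage by stage; reject at the first failure."""
    for stage, suites in STAGES:
        for suite in suites:
            if not run_suite(build, suite):
                return f"rejected at stage '{stage}' ({suite})"
    return "promotable to paying customers"
```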
28. Key Points
- Much faster progress after the stability checkers were added
- Sniff
  - Hourly reference tests (sniff + monkey, unit + monkey)
- Comb filters kept the manpower overhead low on both sides and gave quick feedback: fewer redos for engineers, fewer bugs for QA to find
- Extra post-check-in testing story (optional)
  - The size of the team gives a high broken-build cost
  - Fewer redos
  - Fewer side-effect bugs
29. Process Shift: Who Tests What?
- Automation: simple tasks (repetitive or large-scale)
  - Load @ scale
  - Workflow (information management)
  - Full weapon damage assessment; broad, shallow feature coverage
- Manual: judgment / innovative tasks
  - Visuals, playability, creative bug hunting
- Combined
  - Tier 1 / Tier 2: automation flags potential errors, manual investigates
  - Within a single test: automation snapshots key game states, manual evaluates results
  - Augmented / accelerated complex build steps
30. Process Shift: Load Testing (Before Paying Customers Show Up)
- Expose issues that only occur at scale
- Establish hardware requirements
- Establish that play is acceptable @ scale
31. (No Transcript)
32. Client-Server Comparison
33. Highly Accurate Load Testing: Monkey See / Monkey Do
[Diagram: sim actions recorded from player-controlled sessions are replayed as script-controlled sim actions.]
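Monkey see / monkey do is essentially record-and-replay: capture the sim actions of real player-controlled sessions, then feed the same stream back under script control. A rough sketch, where the log format and the `sim` interface are invented:

```python
import json

def record(session_actions, out_path):
    """Persist timestamped sim actions from a player-controlled session."""
    with open(out_path, "w") as f:
        for t, action, args in session_actions:
            f.write(json.dumps({"t": t, "action": action, "args": args}) + "\n")

def replay(path, sim):
    """Feed the recorded actions back in, script-controlled."""
    for line in open(path):
        step = json.loads(line)
        sim.execute(step["action"], **step["args"])  # same code path as a player
```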
34. Outline
- Overview: Automated Testing
  - Definition, Value, High-Level Approach
- Applying Automated Testing
  - Mechanics, Applications
  - Process Shifts: Stability, Scale & Metrics
- Implementation: Key Risks
- Summary & Questions
35. Data-Driven Test Client
[Diagram: reusable scripts and data drive a single test client through a single API; the same client serves both regression and load testing, and reports key game states, pass/fail, responsiveness, and script-specific logs & metrics.]
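The single-API idea can be sketched like this: one data-driven client, where the same script either runs as a validated regression pass or is multiplied into unvalidated load (all names invented; a real load test fans out to processes or machines rather than looping serially):

```python
def run_client(script, api, validate=True):
    """One client, two uses: regression (validate each step) or load
    (skip validation, just generate realistic traffic)."""
    for kind, step, args in script:
        if kind == "command":
            api.execute(step, **args)
        elif validate:
            assert api.check(step, **args), f"regression failure at {step}"

def load_test(script, make_api, n_clients=1000):
    """Reuse the same script at scale with validation turned off."""
    for i in range(n_clients):
        run_client(script, make_api(i), validate=False)
```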
36. Scripted Players: Implementation
[Diagram: script commands enter the game through the Presentation Layer.]
37. What Level To Test At?
[Diagram: game client stack (View, Presentation Layer, Logic) with test input injected as mouse clicks at the View level.]
- Regression: too brittle (UI pixel shift)
- Load: too bulky
38. What Level To Test At?
[Diagram: test input injected as internal events between the View and the Presentation Layer.]
- Regression & Load: too brittle (churn rate vs. logic data)
39. Automation Scripts & QA Tester Scripts
- Basic gameplay changes less frequently than UI or protocol implementations.
[Diagram: NullView client — the View is removed and scripts drive the Presentation Layer and Logic directly.]
(A sketch of this boundary follows below.)
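One way to picture the boundary: the shipping client puts a graphical View on top of the presentation layer, while the NullView client drives the very same layer headless. A minimal sketch, with all class names assumed:

```python
class PresentationLayer:
    """Single entry point for gameplay commands, shared by UI and scripts."""
    def __init__(self, logic):
        self.logic = logic
    def execute(self, command, **args):
        return getattr(self.logic, command)(**args)

class NullView:
    """Script-driven stand-in for the graphical View: no pixels and no
    wire protocol, just gameplay commands against the logic layer."""
    def __init__(self, presentation):
        self.presentation = presentation
    def play(self, script):
        return [self.presentation.execute(cmd, **args) for cmd, args in script]
```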
40. Key Points
- Support costs: one (data-driven) client is better than N clients
- Tailorable validation output turned out to be a very powerful construct
  - Each test script contains its required validation steps (flexible, tunable, …)
- Minimize the state to regress against: fewer false positives
41. Common Gotchas
- Setting the test bar too high, too early
  - Feature drift means expensive test maintenance
  - Code is built incrementally; reporting failures nobody is prepared to deal with wastes everybody's time
- Non-determinism
  - Race conditions, dirty buffers/process state, …
  - Developers test with a single client against a single server: no chance to expose race conditions
- Not designing for testability
  - Testability is an end requirement
  - Retrofitting is expensive
- No senior engineering committed to the testing problem
42. Outline
- Overview: Automated Testing
  - Definition, Value, High-Level Approach
- Applying Automated Testing
  - Mechanics, Applications
  - Process Shifts: Stability & Scale
- Implementation: Key Risks
- Summary & Questions
43. Summary: Mechanics & Implications
- Scripted test clients and instrumented code rock!
- Collection, aggregation, and display of test data are vital to making day-to-day decisions
  - Lessens the panic
- Scale & break is a very clarifying experience
- Stable code & servers in development greatly ease the pain of building an MMP game
- Hard data (not opinion) is both illuminating and calming
- Long term: operations testing is a recurring cost
44. Summary: Process
- Integrate automated testing at all levels
  - Don't just throw testing over the wall to QA monsters
- Use automation to speed & focus development
  - Stability: Sniff Test, Monkey Tests
  - Scale: Load Test
45. Summary: Key Points
- Ship a better game
  - Lessen the panic
  - Constant testing for stability prevents backsliding during development and operations, keeps the team moving forward roadblock-free, and keeps the player experience smooth
  - Early load testing exposes critical server costs and failures in time to be addressed
  - Everybody knows what works, every day
- Testing: it's not just for QA anymore
  - Continual content extensions while keeping previous features stable, over years of operations
  - Stable systems keep customers happy and developers working on new features, not fire-fighting
  - A recurring cost is an excellent fit for tool investment
46. Tabula Rasa
- Pre-Checkin Sniff Test: keep mainline working
- Hourly Monkey Tests: baseline for developers
- Dedicated Tools Group: easy to use = used
- Executive Support: radical shifts in process
- Load Test Early & Often: break it before live
- Distribute test development & ownership across the full team
47. Cautionary Tales
- Flexible game development requires flexible tests
- Signal-to-noise ratio
- Defects: variance in the testing system
48. Key Points
- Initial development phase: game design in constant flux
  - Tests usually start by not working
  - Noise makes it hard to find results
  - "Boy who cried wolf" syndrome
- Business decisions get made off testing results; make sure they're accurate (load-testing inputs, report generators, probing system, script errors, …)
  - Team trust is another factor
- A complex system with a high degree of flex requires:
  - Senior engineers, full time
  - Team & management commitment
49. Questions (15 Minutes)
- Overview: Automated Testing
  - Definition, Value, High-Level Approach
- Applying Automated Testing
  - Mechanics, Applications
  - Process Shifts: Stability, Scale & Metrics
- Implementation: Key Risks
Slides online @ www.maggotranch.com/MMP