1
GDC Tutorial, 2005. Building Multi-Player Games
  • Case Study: The Sims Online
  • Lessons Learned
  • Larry Mellon

2
TSO Overview
  • Initial team: little to no MMP experience
  • Engineering estimate: switching from 4-8 player
    peer-to-peer to MMP client/server would take no
    additional development time!
  • No code / architecture / tool support for:
  • The long-term, continually changing nature of the
    game
  • Non-deterministic execution, dual platform (Win32
    / Linux)
  • Overall process designed for single-player
    complexity and a small development team:
  • Limited nightly builds, minimal daily testing
  • Limited design reviews, limited scalability
    testing, no maintainable/extensible implementation
    requirement

3
TSO Case Study Outline (Lessons Learned)
  • Poorly designed SP → MP → MMP transitions
  • Scaling
  • Team & code size, data set size
  • Build distribution
  • Architecture: logical & code
  • Visibility: development & operations
  • Testability: development, release, load
  • Multi-player non-determinism
  • Persistent user data vs code/content updates
  • Patching / new content / custom content

4
Scalability (Team Size & Code Size)
  • What were the problems?
  • Side effects break the ability to work in parallel
  • Limited encapsulation + poor testability +
    non-determinism = TROUBLE
  • Independent module design & its impact on the
    overall system (initially, no system architect)
  • #include structure
  • Win32 / Linux, compile times, pre-compiled
    headers, ...
  • What worked?
  • Move to the new architecture via Refactoring &
    Scaffolding
  • HSB, incSync, nullView simulator, nullView
    client, ...
  • Rolling integrations: never dark
  • Sandboxing & pumpkins

5
Scalability (Build Distribution)
  • To developers, customers & fielded servers
  • What didn't work (well enough):
  • Pulling builds from developers' workstations
  • Shell scripts & manual publication
  • What worked well:
  • Heavy automation with web tracking
  • Repeatability, speed, visibility
  • Hierarchies of promotion & test

6
Scalability (Architecture)
  • Logical versus physical versus code structure
  • Only physical was not a major, MAJOR issue
  • Logical: replicated computing vs client/server
  • Security & stability implications
  • Code: client/server isolation & code sharing
  • Multiple, concurrent logic threads were sharing
    code & data, each impacting the others
  • Nullview client & simulator
  • Regulators vs protocols: bug counts & state
    machines

7
Go to final architecture ASAP
[Diagram: several peer Client + Sim pairs ("Here be Sync Hell") with an arrow labeled "Evolve" toward a single multiplayer client/server architecture]
8
Final Architecture ASAP: Make Everything Smaller &
Separate
9
Final Architecture ASAP: Reduce Complexity of
Branches
  • Client & server teams would constantly break each
    other via changes to shared state & code (the
    anti-pattern is sketched below)
[Diagram: packet arrival flowing through shared code & shared state riddled with if (client), if (server) and #ifdef (nullview) branches before dispatching client and server events]
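To make the breakage concrete, here is a minimal sketch of the branching anti-pattern this slide describes; the function, types and the NULLVIEW define are invented for illustration, not taken from TSO's code.

```cpp
// Hypothetical reconstruction of the anti-pattern: one shared packet
// handler branching on role. Every role change or new nullview build
// touches this one function, so teams constantly break each other.
#include <cstdio>

enum class Role { Client, Server };
struct Packet { int type; };

void OnPacketArrival(Role role, const Packet& p) {
    if (role == Role::Client) {
        std::printf("client handles packet %d\n", p.type);
    } else if (role == Role::Server) {
        std::printf("server handles packet %d\n", p.type);
    }
#ifdef NULLVIEW
    // A third compile-time variant tangled into the same shared path.
    std::printf("nullview stub for packet %d\n", p.type);
#endif
}

int main() {
    OnPacketArrival(Role::Client, Packet{1});
    OnPacketArrival(Role::Server, Packet{1});
}
```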
10
Final Architecture ASAP: Refactoring
  • Decomposed into multiple DLLs
  • Found the Simulator
  • Interfaces
  • Reference counting
  • Client/server subclassing (sketched below)
  • How it helped:
  • Reduced coupling. Even reduced compile times!
  • Developers in different modules broke each other
    less often.
  • We went everywhere and learned the code base.
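A minimal sketch of the client/server subclassing listed above: role-specific subclasses behind a shared interface, so shared code never branches on role. All names here (ISimulator, ClientSim, ServerSim) are assumptions, not TSO's.

```cpp
#include <cstdio>
#include <memory>

struct Packet { int type; };

class ISimulator {                     // shared integration point
public:
    virtual ~ISimulator() = default;
    virtual void OnPacketArrival(const Packet& p) = 0;
};

class ClientSim : public ISimulator {  // lives in the client module
public:
    void OnPacketArrival(const Packet& p) override {
        std::printf("client view update for packet %d\n", p.type);
    }
};

class ServerSim : public ISimulator {  // lives in the server module
public:
    void OnPacketArrival(const Packet& p) override {
        std::printf("authoritative server step for packet %d\n", p.type);
    }
};

int main() {
    std::unique_ptr<ISimulator> sim = std::make_unique<ServerSim>();
    sim->OnPacketArrival(Packet{42});  // no role branches in shared code
}
```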

11
Final Architecture ASAP: It Had to Always Run
  • Initially, clients wouldn't behave predictably
  • We could not even play test
  • Game design was demoralized
  • We needed a bridge, now!
12
Final Architecture ASAP: Incremental Sync
  • A quick, temporary solution
  • Couldn't wait for the final system to be finished
  • High overhead; couldn't ship it
  • We took partial state snapshots on the server and
    restored to them on the client (sketched below)
  • How it helped:
  • Could finally see the game as it would be.
  • Allowed parallel game design and coding
  • Bought time to lay in the right stuff.
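A toy sketch of the snapshot/restore idea, with invented types; the real system captured far richer object state, which is where the unshippable overhead came from.

```cpp
#include <cstdio>
#include <map>
#include <vector>

using ObjectId = int;
struct ObjectState { int x = 0, y = 0, mood = 0; };
using Snapshot = std::map<ObjectId, ObjectState>;

// Server side: capture only the objects that changed since last tick.
Snapshot CapturePartialSnapshot(const Snapshot& world,
                                const std::vector<ObjectId>& dirty) {
    Snapshot out;
    for (ObjectId id : dirty) out[id] = world.at(id);
    return out;
}

// Client side: stomp local state with the server's version. Expensive
// (whole objects on the wire) but guarantees the client re-converges.
void RestoreSnapshot(Snapshot& local, const Snapshot& fromServer) {
    for (const auto& [id, state] : fromServer) local[id] = state;
}

int main() {
    Snapshot server{{1, {10, 20, 5}}}, client{{1, {9, 19, 4}}};
    RestoreSnapshot(client, CapturePartialSnapshot(server, {1}));
    std::printf("client x=%d\n", client[1].x);  // re-synced to 10
}
```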

13
Architecture Conclusions
  • Keep it simple, stupid!
  • Client/server
  • Keep it clean
  • DLL/module integration points
  • ifdefs must die!
  • Keep it alive
  • Plan for a constant system architect role: review
    all modules for impact on the team, other modules &
    extensibility
  • Expose & control all inter-process communication
  • See Regulators: state machines that control
    transactions (sketched below)
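A hedged sketch of what a regulator might look like: a small state machine that owns one inter-process transaction, with a single transition choke point that can be logged and counted. The states and API are illustrative assumptions; TSO's real regulators differed.

```cpp
#include <cstdio>

class Regulator {
public:
    enum class State { Idle, Requested, Acked, Committed, Failed };

    void Request() { Transition(State::Requested); }

    void OnMessage(bool ack) {           // driven by network events
        if (state_ != State::Requested) { Transition(State::Failed); return; }
        Transition(ack ? State::Acked : State::Failed);
        if (state_ == State::Acked) Transition(State::Committed);
    }

    State state() const { return state_; }

private:
    void Transition(State next) {
        // Central choke point: every transaction step is logged, so bug
        // counts per transition can be aggregated across the cluster.
        std::printf("regulator: %d -> %d\n", int(state_), int(next));
        state_ = next;
    }
    State state_ = State::Idle;
};

int main() {
    Regulator buyObject;
    buyObject.Request();
    buyObject.OnMessage(/*ack=*/true);   // Requested -> Acked -> Committed
}
```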

14
TSO Case Study Outline (Lessons Learned)
  • Poorly designed SP → MP → MMP transitions
  • Scaling
  • Team & code size, data set size
  • Build distribution
  • Architecture: logical & code
  • Visibility: development & operations
  • Testability: development, release, load
  • Multi-player non-determinism
  • Persistent user data vs code/content updates
  • Patching / new content / custom content

15
Visibility
  • Problems:
  • Debugging a client/server issue was very slow &
    painful
  • Knowing what to work on next was largely
    guesswork
  • Reproducing system failures from the live
    environment
  • Knowing how one build or server cluster differed
    from another was again largely guesswork
  • What we did that worked:
  • Log / crash aggregators & filters (the bread-crumb
    idea is sketched below)
  • Live critical event monitor
  • Esper: live player & engine metrics
  • Repeatable load testing
  • Web-based Dashboard: health, status, where is
    everything
  • Fully automated build & publish procedures
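A minimal sketch, with invented names, of the bread-crumb instrumentation the next three slides visualize: code drops cheap counters as it runs, and an aggregator rolls them up so outliers stand out before (or after) a crash.

```cpp
#include <cstdio>
#include <map>
#include <string>

class BreadCrumbs {
public:
    void Drop(const std::string& tag, long amount = 1) {
        counts_[tag] += amount;          // cheap enough for hot paths
    }
    void Report() const {                // shipped to the log aggregator
        for (const auto& [tag, n] : counts_)
            std::printf("%s=%ld\n", tag.c_str(), n);
    }
private:
    std::map<std::string, long> counts_;
};

int main() {
    BreadCrumbs crumbs;
    crumbs.Drop("db.request");
    crumbs.Drop("db.bytes", 150000);     // outliers here flag a DB spike
    crumbs.Report();
}
```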

16
Visibility via Bread Crumbs: Aggregated
Instrumentation Flags Trouble Spots
[Chart: aggregated instrumentation counts over time, spiking just before a server crash]
17
Quickly Find Trouble Spots
[Chart: the DB byte count oscillates out of control, and the server crashes]
18
Drill Down For Details
[Chart: a single DB request is clearly at fault]
19
TSO Case Study Outline (Lessons Learned)
  • Poorly designed SP → MP → MMP transitions
  • Scaling
  • Team & code size, data set size
  • Build distribution
  • Architecture: logical & code
  • Visibility: development & operations
  • Testability: development, release, load
  • Multi-player non-determinism
  • Persistent user data vs code/content updates
  • Patching / new content / custom content

20
Testability
  • Development, release, load: all had show-stopper
    problems
  • QA coordination / speed / cost
  • Repeatability & non-determinism
  • Need for many, many tests per day, each with
    multiple inputs (two to two thousand players per
    test)

21
Testability: What Worked
  • Automated testing for repeatability & scale
  • Scriptable test clients mirrored actual user
    play sessions
  • Changed the game's architecture to increase
    testability
  • External test harnesses to control 50 test
    clients per CPU, 4,000 per session (sketched
    below)
  • Push-button UI to configure, run & analyze tests
    (developer & QA)
  • Constantly updated baselines, with Monkey Test
    stats
  • Pre-checkin regression
  • QA: a web-driven state machine to control testers &
    collect/publish results
  • What didn't work:
  • Event recorders, unit testing
  • Manual-only testing
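A bare-bones sketch of the harness idea, using threads as stand-ins for real headless client processes; only the 50-clients-per-CPU figure comes from the slide, everything else is invented.

```cpp
#include <cstdio>
#include <thread>
#include <vector>

void RunScriptedClient(int id) {
    // In the real system this would connect, run a play-session script,
    // and report pass/fail; here it is a stand-in.
    std::printf("client %d: script complete\n", id);
}

int main() {
    const int kClientsPerCpu = 50;       // figure quoted on this slide
    const int kCpus = 2;                 // illustrative harness box
    std::vector<std::thread> clients;
    for (int id = 0; id < kClientsPerCpu * kCpus; ++id)
        clients.emplace_back(RunScriptedClient, id);
    for (auto& t : clients) t.join();    // gather results push-button
}
```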

22
MMP Automated Testing Approach
  • Push-button ability to run large-scale,
    repeatable tests
  • Cost:
  • Hardware / software
  • Human resources
  • Process changes
  • Benefit:
  • Accurate, repeatable & measurable tests during
    development and operations
  • Stable software; faster, measurable progress
  • Base key decisions on fact, not opinion

23
Why Spend The Time & Money?
  • System complexity, non-determinism, scale
  • Tests provide hard data in a confusing sea of
    possibilities
  • End users: a high Quality of Service bar
  • Dev team: greater comfort & confidence
  • Tools augment your team's ability to do their
    jobs
  • Find problems faster
  • Measure / change / measure: repeat as necessary
  • Production & executives come to depend on this
    data to a high degree

24
Scripted Test Clients
  • Scripts are emulated play sessions: just like
    somebody playing the game
  • Command steps: what the player does to the game
  • Validation steps: what the game should do in
    response (a toy script is sketched below)
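A toy rendering of the command/validation split, with an invented GameClient API: a script is just an ordered list of steps, each one either acting on the game or asserting what the game should now report.

```cpp
#include <cstdio>
#include <functional>
#include <string>
#include <vector>

struct GameClient {                      // stand-in for the real client API
    int simoleons = 100;
    void BuyChair() { simoleons -= 25; }
};

struct Step {
    std::string name;
    std::function<bool(GameClient&)> run;  // false = validation failed
};

int main() {
    GameClient game;
    std::vector<Step> script = {
        {"command: buy chair",
         [](GameClient& g) { g.BuyChair(); return true; }},
        {"validate: money deducted",
         [](GameClient& g) { return g.simoleons == 75; }},
    };
    for (const auto& step : script)
        std::printf("%s -> %s\n", step.name.c_str(),
                    step.run(game) ? "ok" : "FAIL");
}
```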

25
Scripts Tailored To Each Test Application
  • Unit testing: 1 feature = 1 script
  • Load testing: a representative play session
  • The average Joe, times thousands
  • Shipping quality: corner cases, feature
    completeness
  • Integration: test code changes for catastrophic
    failures

26
Scripted Players: Implementation
[Diagram: scripted Commands feeding into the client at the Presentation Layer]
27
Process Shift: Earlier Tools Investment Equals More
Gain
[Chart: payoff versus when tools are built; late investment is marked "Not Good Enough"]
28
Process Shifts: Automated Testing Changes The Shape
Of The Development Progress Curve
  • Stability (code base & servers): keep developers
    moving forward, not bailing water
  • Scale & feature completeness: focus developers on
    key, measurable roadblocks
29
Process Shift: Measurable Targets, Projected Trend
Lines
[Chart: core functionality tests passed for any feature (e.g. clients) plotted against time, with a projected trend line toward "Target Complete" at any given milestone (e.g. Alpha)]
Actionable progress metrics, early enough to react
30
Process Shift: Load Testing (Before Paying
Customers Show Up)
  • Expose issues that only occur at scale
  • Establish hardware requirements
  • Establish that play is acceptable @ scale
31
Client-Server Comparison
[Chart comparing client-side and server-side measurements]
32
TSO Case Study Outline (Lessons Learned)
  • Poorly designed SP → MP → MMP transitions
  • Scaling
  • Team & code size, data set size
  • Build distribution
  • Architecture: logical & code
  • Visibility: development & operations
  • Testability: development, release, load
  • Multi-player non-determinism
  • Persistent user data vs code/content updates
  • Patching / new content / custom content

33
User Data
  • Oops!
  • Users stored much more data (with much more
    variance) than we had planned for
  • Caused many DB failures, city failures
  • BIG problem: their persistent data has to work,
    always, across all builds & DB instances
  • What helped:
  • Regression testing, each build, against the live
    set of user data
  • What would have helped more:
  • Sanity checks against the DB
  • Range checks against user data (sketched below)
  • Better code & architecture support for validation
    of user data
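A minimal sketch of the missing range checks: validate persistent user records on load so a corrupt row degrades gracefully instead of taking down a city server. Field names and limits are invented for illustration.

```cpp
#include <algorithm>
#include <cstdio>

struct UserRecord {
    int simoleons;   // player money (name assumed)
    int lotObjects;  // objects placed on the player's lot (name assumed)
};

// Clamp out-of-range values and report whether the record was sane.
bool SanitizeUserRecord(UserRecord& r) {
    const int kMaxSimoleons  = 9999999;  // illustrative limits, not TSO's
    const int kMaxLotObjects = 2048;
    bool ok = r.simoleons >= 0 && r.simoleons <= kMaxSimoleons &&
              r.lotObjects >= 0 && r.lotObjects <= kMaxLotObjects;
    r.simoleons  = std::clamp(r.simoleons, 0, kMaxSimoleons);
    r.lotObjects = std::clamp(r.lotObjects, 0, kMaxLotObjects);
    return ok;
}

int main() {
    UserRecord r{-500, 100000};          // a corrupt row from the live DB
    bool ok = SanitizeUserRecord(r);
    std::printf("record %s; simoleons=%d, objects=%d\n",
                ok ? "ok" : "repaired", r.simoleons, r.lotObjects);
}
```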

34
Patching / New Content / Custom Content
  • Oops!
  • Initial patch budget of 1 MB was blown in the
    first week of operations
  • New Content required a stronger, more predictable
    process
  • Custom Content required infrastructure able to
    easily add new content, on the fly
  • Key issue: all effort had gone into going Live,
    not into creating a sustainable process once Live
  • Conclusion: designing these in would have been
    much easier than retrofitting

35
Lessons Learned
  • autoTest: scripted test clients and instrumented
    code rock!
  • Collection, aggregation and display of test data
    are vital for making decisions on a day-to-day
    basis
  • Lessen the panic
  • Scale & Break is a very clarifying experience
  • Stable code & servers greatly ease the pain of
    building an MMP game
  • Hard data (not opinion) is both illuminating and
    calming
  • autoBuild: make it push-button with instant web
    visibility
  • Use early, use often to get bugs out before going
    live
  • Budget for a strong architect role & a strong
    design review process for the entire game
    lifecycle
  • Scalability, testability, patching & new content,
    and long-term persistence are requirements MUCH
    cheaper to design in than to retrofit frantically
  • The KISS principle is mandatory, as is expecting
    change

36
Lessons Learned
  • Visibility: tremendous volumes of data require
    automated collection & summarization
  • Provide drill-down access to details from
    summary-view web pages
  • Get some people on board who've been burned
    before: a lot of TSO's pain could have been easily
    avoided, but little distributed-systems or MMP
    design experience existed in the early phases of
    the project
  • Fred Brooks, the 31st programmer
  • Strong tools & process pay off for large teams &
    long-term operations
  • Measure & improve your workspace, constantly
  • Non-determinism is painful & unavoidable
  • Minimize its impact via explicit design support &
    use strong, constant calibration to understand it

37
Biggest Wins
  • Code isolation
  • Scaffolding
  • Tools: build / test / measure, information
    management
  • Pre-checkin regression / load testing
38
Biggest Losses
  • Architecture: massively peer-to-peer
  • Early lack of tools
  • ifdef across platform / function
  • Critical-path dependencies

More details: www.maggotranch.com/MMP (3 TSO
Lessons Learned talks)