1
GDC Tutorial, 2005. Building Multi-Player Games
  • Case Study: The Sims Online
  • Lessons Learned
  • Larry Mellon

2
TSO Overview
  • Initial team: little to no MMP experience
  • Engineering estimate: switching from 4-8 player
    peer-to-peer to MMP client/server would take no
    additional development time!
  • No code / architecture / tool support for:
  • The long-term, continually changing nature of the
    game
  • Non-deterministic execution, dual platform (Win32
    / Linux)
  • Overall process designed for single-player
    complexity and a small development team:
  • Limited nightly builds, minimal daily testing
  • Limited design reviews, limited scalability
    testing, no maintainable/extensible implementation
    requirement

3
TSO Case Study Outline (Lessons Learned)
  • Poorly designed SP → MP → MMP transitions
  • Scaling
  • Team & code size, data set size
  • Build distribution
  • Architecture: logical & code
  • Visibility: development & operations
  • Testability: development, release, load
  • Multi-player non-determinism
  • Persistent user data vs code/content updates
  • Patching / new content / custom content

4
Scalability (Team Size & Code Size)
  • What were the problems?
  • Side effects break the ability to work in parallel
  • Limited encapsulation + poor testability +
    non-determinism = TROUBLE
  • Independent module design & its impact on the
    overall system (initially, no system architect)
  • #include structure
  • Win32 / Linux, compile times, pre-compiled
    headers, ...
  • What worked?
  • Move to the new architecture via Refactoring &
    Scaffolding
  • HSB, incSync, nullView simulator, nullView
    client, ...
  • Rolling integrations: never dark
  • Sandboxing & pumpkins

5
Scalability (Build Distribution)
  • To developers, customers & fielded servers
  • What didn't work (well enough):
  • Pulling builds from developers' workstations
  • Shell scripts & manual publication
  • What worked well:
  • Heavy automation with web tracking
  • Repeatability, speed, visibility
  • Hierarchies of promotion & test

6
Scalability (Architecture)
  • Logical versus physical versus code structure
  • Only physical was not a major, MAJOR issue
  • Logical: replicated computing vs client/server
  • Security & stability implications
  • Code: client/server isolation & code sharing
  • Multiple, concurrent logic threads were sharing
    code & data, each impacting the others
  • Nullview client & simulator
  • Regulators vs protocols: bug counts & state
    machines

7
Go to final architecture ASAP
[Diagram: several peer Client + Sim pairs ("Here be Sync Hell") with an arrow labeled "Evolve" toward a single multiplayer client/server architecture]
8
Final Architecture ASAP: Make Everything Smaller &
Separate
9
Final Architecture ASAP: Reduce Complexity of
Branches
  • Client & server teams would constantly break each
    other via changes to shared state & code (the
    anti-pattern is sketched below)
[Diagram: packet arrival flowing through shared code & shared state riddled with if (client), if (server) and #ifdef (nullview) branches before dispatching client and server events]
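To make the breakage concrete, here is a minimal sketch of the branching anti-pattern this slide describes; the function, types and the NULLVIEW define are invented for illustration, not taken from TSO's code.

```cpp
// Hypothetical reconstruction of the anti-pattern: one shared packet
// handler branching on role. Every role change or new nullview build
// touches this one function, so teams constantly break each other.
#include <cstdio>

enum class Role { Client, Server };
struct Packet { int type; };

void OnPacketArrival(Role role, const Packet& p) {
    if (role == Role::Client) {
        std::printf("client handles packet %d\n", p.type);
    } else if (role == Role::Server) {
        std::printf("server handles packet %d\n", p.type);
    }
#ifdef NULLVIEW
    // A third compile-time variant tangled into the same shared path.
    std::printf("nullview stub for packet %d\n", p.type);
#endif
}

int main() {
    OnPacketArrival(Role::Client, Packet{1});
    OnPacketArrival(Role::Server, Packet{1});
}
```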
10
Final Architecture ASAP: Refactoring
  • Decomposed into multiple DLLs
  • Found the Simulator
  • Interfaces
  • Reference counting
  • Client/server subclassing (sketched below)
  • How it helped:
  • Reduced coupling. Even reduced compile times!
  • Developers in different modules broke each other
    less often.
  • We went everywhere and learned the code base.
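A minimal sketch of the client/server subclassing listed above: role-specific subclasses behind a shared interface, so shared code never branches on role. All names here (ISimulator, ClientSim, ServerSim) are assumptions, not TSO's.

```cpp
#include <cstdio>
#include <memory>

struct Packet { int type; };

class ISimulator {                     // shared integration point
public:
    virtual ~ISimulator() = default;
    virtual void OnPacketArrival(const Packet& p) = 0;
};

class ClientSim : public ISimulator {  // lives in the client module
public:
    void OnPacketArrival(const Packet& p) override {
        std::printf("client view update for packet %d\n", p.type);
    }
};

class ServerSim : public ISimulator {  // lives in the server module
public:
    void OnPacketArrival(const Packet& p) override {
        std::printf("authoritative server step for packet %d\n", p.type);
    }
};

int main() {
    std::unique_ptr<ISimulator> sim = std::make_unique<ServerSim>();
    sim->OnPacketArrival(Packet{42});  // no role branches in shared code
}
```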

11
Final Architecture ASAP: It Had to Always Run
  • Initially, clients wouldn't behave predictably
  • We could not even play test
  • Game design was demoralized
  • We needed a bridge, now!
12
Final Architecture ASAP: Incremental Sync
  • A quick, temporary solution
  • Couldn't wait for the final system to be finished
  • High overhead; couldn't ship it
  • We took partial state snapshots on the server and
    restored to them on the client (sketched below)
  • How it helped:
  • Could finally see the game as it would be.
  • Allowed parallel game design and coding
  • Bought time to lay in the right stuff.
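A toy sketch of the snapshot/restore idea, with invented types; the real system captured far richer object state, which is where the unshippable overhead came from.

```cpp
#include <cstdio>
#include <map>
#include <vector>

using ObjectId = int;
struct ObjectState { int x = 0, y = 0, mood = 0; };
using Snapshot = std::map<ObjectId, ObjectState>;

// Server side: capture only the objects that changed since last tick.
Snapshot CapturePartialSnapshot(const Snapshot& world,
                                const std::vector<ObjectId>& dirty) {
    Snapshot out;
    for (ObjectId id : dirty) out[id] = world.at(id);
    return out;
}

// Client side: stomp local state with the server's version. Expensive
// (whole objects on the wire) but guarantees the client re-converges.
void RestoreSnapshot(Snapshot& local, const Snapshot& fromServer) {
    for (const auto& [id, state] : fromServer) local[id] = state;
}

int main() {
    Snapshot server{{1, {10, 20, 5}}}, client{{1, {9, 19, 4}}};
    RestoreSnapshot(client, CapturePartialSnapshot(server, {1}));
    std::printf("client x=%d\n", client[1].x);  // re-synced to 10
}
```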

13
Architecture Conclusions
  • Keep it simple, stupid!
  • Client/server
  • Keep it clean
  • DLL/module integration points
  • ifdefs must die!
  • Keep it alive
  • Plan for a constant system architect role: review
    all modules for impact on the team, other modules &
    extensibility
  • Expose & control all inter-process communication
  • See Regulators: state machines that control
    transactions (sketched below)
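A hedged sketch of what a regulator might look like: a small state machine that owns one inter-process transaction, with a single transition choke point that can be logged and counted. The states and API are illustrative assumptions; TSO's real regulators differed.

```cpp
#include <cstdio>

class Regulator {
public:
    enum class State { Idle, Requested, Acked, Committed, Failed };

    void Request() { Transition(State::Requested); }

    void OnMessage(bool ack) {           // driven by network events
        if (state_ != State::Requested) { Transition(State::Failed); return; }
        Transition(ack ? State::Acked : State::Failed);
        if (state_ == State::Acked) Transition(State::Committed);
    }

    State state() const { return state_; }

private:
    void Transition(State next) {
        // Central choke point: every transaction step is logged, so bug
        // counts per transition can be aggregated across the cluster.
        std::printf("regulator: %d -> %d\n", int(state_), int(next));
        state_ = next;
    }
    State state_ = State::Idle;
};

int main() {
    Regulator buyObject;
    buyObject.Request();
    buyObject.OnMessage(/*ack=*/true);   // Requested -> Acked -> Committed
}
```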

14
TSO Case Study Outline (Lessons Learned)
  • Poorly designed SP → MP → MMP transitions
  • Scaling
  • Team & code size, data set size
  • Build distribution
  • Architecture: logical & code
  • Visibility: development & operations
  • Testability: development, release, load
  • Multi-player non-determinism
  • Persistent user data vs code/content updates
  • Patching / new content / custom content

15
Visibility
  • Problems:
  • Debugging a client/server issue was very slow &
    painful
  • Knowing what to work on next was largely
    guesswork
  • Reproducing system failures from the live
    environment
  • Knowing how one build or server cluster differed
    from another was again largely guesswork
  • What we did that worked:
  • Log / crash aggregators & filters (the bread-crumb
    idea is sketched below)
  • Live critical event monitor
  • Esper: live player & engine metrics
  • Repeatable load testing
  • Web-based Dashboard: health, status, where is
    everything
  • Fully automated build & publish procedures
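A minimal sketch, with invented names, of the bread-crumb instrumentation the next three slides visualize: code drops cheap counters as it runs, and an aggregator rolls them up so outliers stand out before (or after) a crash.

```cpp
#include <cstdio>
#include <map>
#include <string>

class BreadCrumbs {
public:
    void Drop(const std::string& tag, long amount = 1) {
        counts_[tag] += amount;          // cheap enough for hot paths
    }
    void Report() const {                // shipped to the log aggregator
        for (const auto& [tag, n] : counts_)
            std::printf("%s=%ld\n", tag.c_str(), n);
    }
private:
    std::map<std::string, long> counts_;
};

int main() {
    BreadCrumbs crumbs;
    crumbs.Drop("db.request");
    crumbs.Drop("db.bytes", 150000);     // outliers here flag a DB spike
    crumbs.Report();
}
```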

16
Visibility via Bread Crumbs: Aggregated
Instrumentation Flags Trouble Spots
[Chart: aggregated instrumentation counts over time, spiking just before a server crash]
17
Quickly Find Trouble Spots
[Chart: the DB byte count oscillates out of control, and the server crashes]
18
Drill Down For Details
[Chart: a single DB request is clearly at fault]
19
TSO Case Study Outline (Lessons Learned)
  • Poorly designed SP → MP → MMP transitions
  • Scaling
  • Team & code size, data set size
  • Build distribution
  • Architecture: logical & code
  • Visibility: development & operations
  • Testability: development, release, load
  • Multi-player non-determinism
  • Persistent user data vs code/content updates
  • Patching / new content / custom content

20
Testability
  • Development, release, load: all had show-stopper
    problems
  • QA coordination / speed / cost
  • Repeatability & non-determinism
  • Need for many, many tests per day, each with
    multiple inputs (two to two thousand players per
    test)

21
Testability: What Worked
  • Automated testing for repeatability & scale
  • Scriptable test clients mirrored actual user
    play sessions
  • Changed the game's architecture to increase
    testability
  • External test harnesses to control 50 test
    clients per CPU, 4,000 per session (sketched
    below)
  • Push-button UI to configure, run & analyze tests
    (developer & QA)
  • Constantly updated baselines, with Monkey Test
    stats
  • Pre-checkin regression
  • QA: a web-driven state machine to control testers &
    collect/publish results
  • What didn't work:
  • Event recorders, unit testing
  • Manual-only testing
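A bare-bones sketch of the harness idea, using threads as stand-ins for real headless client processes; only the 50-clients-per-CPU figure comes from the slide, everything else is invented.

```cpp
#include <cstdio>
#include <thread>
#include <vector>

void RunScriptedClient(int id) {
    // In the real system this would connect, run a play-session script,
    // and report pass/fail; here it is a stand-in.
    std::printf("client %d: script complete\n", id);
}

int main() {
    const int kClientsPerCpu = 50;       // figure quoted on this slide
    const int kCpus = 2;                 // illustrative harness box
    std::vector<std::thread> clients;
    for (int id = 0; id < kClientsPerCpu * kCpus; ++id)
        clients.emplace_back(RunScriptedClient, id);
    for (auto& t : clients) t.join();    // gather results push-button
}
```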

22
MMP Automated Testing Approach
  • Push-button ability to run large-scale,
    repeatable tests
  • Cost:
  • Hardware / software
  • Human resources
  • Process changes
  • Benefit:
  • Accurate, repeatable & measurable tests during
    development and operations
  • Stable software; faster, measurable progress
  • Base key decisions on fact, not opinion

23
Why Spend The Time & Money?
  • System complexity, non-determinism, scale
  • Tests provide hard data in a confusing sea of
    possibilities
  • End users: a high Quality of Service bar
  • Dev team: greater comfort & confidence
  • Tools augment your team's ability to do their
    jobs
  • Find problems faster
  • Measure / change / measure: repeat as necessary
  • Production & executives come to depend on this
    data to a high degree

24
Scripted Test Clients
  • Scripts are emulated play sessions: just like
    somebody playing the game
  • Command steps: what the player does to the game
  • Validation steps: what the game should do in
    response (a toy script is sketched below)
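A toy rendering of the command/validation split, with an invented GameClient API: a script is just an ordered list of steps, each one either acting on the game or asserting what the game should now report.

```cpp
#include <cstdio>
#include <functional>
#include <string>
#include <vector>

struct GameClient {                      // stand-in for the real client API
    int simoleons = 100;
    void BuyChair() { simoleons -= 25; }
};

struct Step {
    std::string name;
    std::function<bool(GameClient&)> run;  // false = validation failed
};

int main() {
    GameClient game;
    std::vector<Step> script = {
        {"command: buy chair",
         [](GameClient& g) { g.BuyChair(); return true; }},
        {"validate: money deducted",
         [](GameClient& g) { return g.simoleons == 75; }},
    };
    for (const auto& step : script)
        std::printf("%s -> %s\n", step.name.c_str(),
                    step.run(game) ? "ok" : "FAIL");
}
```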

25
Scripts Tailored To Each Test Application
  • Unit testing: 1 feature = 1 script
  • Load testing: a representative play session
  • The average Joe, times thousands
  • Shipping quality: corner cases, feature
    completeness
  • Integration: test code changes for catastrophic
    failures

26
Scripted Players: Implementation
[Diagram: scripted Commands feeding into the client at the Presentation Layer]
27
Process Shift: Earlier Tools Investment Equals More
Gain
[Chart: payoff versus when tools are built; late investment is marked "Not Good Enough"]
28
Process Shifts: Automated Testing Changes The Shape
Of The Development Progress Curve
  • Stability (code base & servers): keep developers
    moving forward, not bailing water
  • Scale & feature completeness: focus developers on
    key, measurable roadblocks
29
Process Shift: Measurable Targets, Projected Trend
Lines
[Chart: core functionality tests passed for any feature (e.g. clients) plotted against time, with a projected trend line toward "Target Complete" at any given milestone (e.g. Alpha)]
Actionable progress metrics, early enough to react
30
Process Shift: Load Testing (Before Paying
Customers Show Up)
  • Expose issues that only occur at scale
  • Establish hardware requirements
  • Establish that play is acceptable @ scale
31
Client-Server Comparison
[Chart comparing client-side and server-side measurements]
32
TSO Case Study Outline (Lessons Learned)
  • Poorly designed SP → MP → MMP transitions
  • Scaling
  • Team & code size, data set size
  • Build distribution
  • Architecture: logical & code
  • Visibility: development & operations
  • Testability: development, release, load
  • Multi-player non-determinism
  • Persistent user data vs code/content updates
  • Patching / new content / custom content

33
User Data
  • Oops!
  • Users stored much more data (with much more
    variance) than we had planned for
  • Caused many DB failures, city failures
  • BIG problem: their persistent data has to work,
    always, across all builds & DB instances
  • What helped:
  • Regression testing, each build, against the live
    set of user data
  • What would have helped more:
  • Sanity checks against the DB
  • Range checks against user data (sketched below)
  • Better code & architecture support for validation
    of user data
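A minimal sketch of the missing range checks: validate persistent user records on load so a corrupt row degrades gracefully instead of taking down a city server. Field names and limits are invented for illustration.

```cpp
#include <algorithm>
#include <cstdio>

struct UserRecord {
    int simoleons;   // player money (name assumed)
    int lotObjects;  // objects placed on the player's lot (name assumed)
};

// Clamp out-of-range values and report whether the record was sane.
bool SanitizeUserRecord(UserRecord& r) {
    const int kMaxSimoleons  = 9999999;  // illustrative limits, not TSO's
    const int kMaxLotObjects = 2048;
    bool ok = r.simoleons >= 0 && r.simoleons <= kMaxSimoleons &&
              r.lotObjects >= 0 && r.lotObjects <= kMaxLotObjects;
    r.simoleons  = std::clamp(r.simoleons, 0, kMaxSimoleons);
    r.lotObjects = std::clamp(r.lotObjects, 0, kMaxLotObjects);
    return ok;
}

int main() {
    UserRecord r{-500, 100000};          // a corrupt row from the live DB
    bool ok = SanitizeUserRecord(r);
    std::printf("record %s; simoleons=%d, objects=%d\n",
                ok ? "ok" : "repaired", r.simoleons, r.lotObjects);
}
```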

34
Patching / New Content / Custom Content
  • Oops!
  • Initial patch budget of 1 MB was blown in the
    first week of operations
  • New Content required a stronger, more predictable
    process
  • Custom Content required infrastructure able to
    easily add new content, on the fly
  • Key issue: all effort had gone into going Live,
    not into creating a sustainable process once Live
  • Conclusion: designing these in would have been
    much easier than retrofitting

35
Lessons Learned
  • autoTest: scripted test clients and instrumented
    code rock!
  • Collection, aggregation and display of test data
    are vital for making decisions on a day-to-day
    basis
  • Lessen the panic
  • Scale & Break is a very clarifying experience
  • Stable code & servers greatly ease the pain of
    building an MMP game
  • Hard data (not opinion) is both illuminating and
    calming
  • autoBuild: make it push-button with instant web
    visibility
  • Use early, use often to get bugs out before going
    live
  • Budget for a strong architect role & a strong
    design review process for the entire game
    lifecycle
  • Scalability, testability, patching & new content,
    and long-term persistence are requirements MUCH
    cheaper to design in than to retrofit frantically
  • The KISS principle is mandatory, as is expecting
    change

36
Lessons Learned
  • Visibility: tremendous volumes of data require
    automated collection & summarization
  • Provide drill-down access to details from
    summary-view web pages
  • Get some people on board who've been burned
    before: a lot of TSO's pain could have been easily
    avoided, but little distributed-systems or MMP
    design experience existed in the early phases of
    the project
  • Fred Brooks, the 31st programmer
  • Strong tools & process pay off for large teams &
    long-term operations
  • Measure & improve your workspace, constantly
  • Non-determinism is painful & unavoidable
  • Minimize its impact via explicit design support &
    use strong, constant calibration to understand it

37
Biggest Wins
  • Code isolation
  • Scaffolding
  • Tools: build / test / measure, information
    management
  • Pre-checkin regression / load testing
38
Biggest Losses
  • Architecture: massively peer-to-peer
  • Early lack of tools
  • ifdef across platform / function
  • Critical-path dependencies

More details: www.maggotranch.com/MMP (3 TSO
Lessons Learned talks)