Title: DirectPlay Performance and Scalability
2. DirectPlay Performance and Scalability
Brett Humphrey
Test Lead, DirectPlay
Microsoft Corporation
3. How well do we scale?
- Look at data from one sample application
- Data Profile
- Look at the DirectPlay layers
- What affects scalability
- DirectPlay Service Provider Layer
- Your callback functions
- Understanding your link's state
- Look at your tuning options
4. Sample maze runs
- Max clients we attempted
- 800 on a single-proc PIII 800
- 5,200 clients on our quad proc
- System setup for the 5,200-client run
- 104 client systems running Windows 2000
- Each system ran 50 console clients
- 750 clients joining at nearly the same time
- Server: Compaq ProLiant R6500 quad-proc Xeon PIII 550 with 512K of cache running Windows XP Advanced Server
5. Data Profiling
- Network conditions beyond our control
- Size of data being sent
- Time between sends
- Guaranteed vs. non-guaranteed
- Unidirectional vs. bidirectional traffic
6. DirectPlay 8 Diagram
Your Application
DirectPlay Core Layer
DirectPlay Protocol layer
DirectPlay Service Provider Layer
WinSock Layer
7. Server: bottom up, top down
Your Application
DirectPlay Core Layer
DirectPlay Protocol layer
DirectPlay Service Provider Layer
WinSock Layer
8. Service Provider
- Receives packets off the wire
- The start of your receive callback
- Controls the use of your callbacks
- Handles work with the WinSock layer, TAPI, and future methods of transport
- Adjustable to your application's needs
9. Adjusting the Caps
- SetSPCaps
- Found on the IDirectPlay8 Server, Client, and Peer interfaces
- HRESULT SetSPCaps( const GUID *const pguidSP, const DPN_SP_CAPS *const pdpnSPCaps, const DWORD dwFlags )
- Sets two parts of the DPN_SP_CAPS
- Number of threads with dwNumThreads
- System buffer size with dwSystemBufferSize
10. Adjusting the threads
- Default thread pool size is 2N+2, where N is the number of processors
- This is the number of threads used to receive your packets off the wire
- If your application is I/O bound and not CPU bound, you can increase your thread pool to help offset this problem
- Unused threads are idle
- Increasing the thread pool helps connection response time
11. Adjusting the System Buffer
- Decreasing the system buffer size
- Lowers latency
- Increases the data send rate response
- Our tests
- We use a buffer size of 0 to decrease the latency and increase the protocol's response to changing network conditions
12. Setting these two variables
- // Set the proper size on the structure
- dnSPCaps.dwSize = sizeof(DPN_SP_CAPS);
- // Do a GetSPCaps so all other information is correct
- hr = m_pDPlay->GetSPCaps( &CLSID_DP8SP_TCPIP, &dnSPCaps, dwFlags );
- // Check return code here
- // Set the threads and buffer size for our application
- dnSPCaps.dwNumThreads = dwYourNumSPThreads;
- dnSPCaps.dwSystemBufferSize = dwYourSPBufferSize;
- hr = m_pDPlay->SetSPCaps( &CLSID_DP8SP_TCPIP, &dnSPCaps, dwFlags );
- // Check return code here
13. Your changes to the Service Provider Caps
Your Application Layer
DirectPlay Core Layer
DirectPlay Protocol layer
Your requested receive threads
Your requested system buffer size
14. Your receive callback thread
- The thread is used through all layers
- Minimizes context switching
- Gives the application a transparent pipe from raw data to assembled messages
- Your processing time in the thread can affect data throughput
15. Receive message flow (diagram)
Your Application Layer
Receive Message
Core Layer
5) You do your processing
Protocol Layer
4) Hands up the message
Service Provider Layer
3) Assembles message
2) Validates inbound packet
WinSock Layer
1) Receives packet
16. A nice callback function
- Is reentrant (it can run on all callback threads at once)
- Takes locks sparingly
- Keeps data processing time short
- Uses DPNSUCCESS_PENDING on message receives
- Returns the receive thread to the service provider
- Calls ReturnBuffer when finished
- Tracks some statistical data
- Maximum and average over time of:
- Threads held
- Time the thread is held
(A sketch of such a callback follows below.)
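As a rough illustration of these points, here is a minimal sketch of a receive handler, assuming the DirectX 8.x dplay8.h header. QueueForProcessing is a hypothetical application helper, and the worker thread it feeds is assumed to call ReturnBuffer when it finishes with the buffer; this is a sketch, not the deck's own sample code.

#include <windows.h>
#include <dplay8.h>

// Hypothetical helper: queues the message for a worker thread; the worker
// calls IDirectPlay8Server::ReturnBuffer( hBuffer, 0 ) when it is done.
void QueueForProcessing( DPNID idSender, BYTE* pData, DWORD cbData,
                         DPNHANDLE hBuffer );

// Sketch of a receive handler following the guidelines above: no heavy
// work, no long-held locks, and the buffer is kept with DPNSUCCESS_PENDING.
HRESULT WINAPI DirectPlayMessageHandler( PVOID pvUserContext,
                                         DWORD dwMessageType,
                                         PVOID pMessage )
{
    if( dwMessageType == DPN_MSGID_RECEIVE )
    {
        PDPNMSG_RECEIVE pReceive = (PDPNMSG_RECEIVE) pMessage;

        // Hand the raw buffer to the application's own queue instead of
        // parsing it here, so the receive thread returns quickly.
        QueueForProcessing( pReceive->dpnidSender,
                            pReceive->pReceiveData,
                            pReceive->dwReceiveDataSize,
                            pReceive->hBufferHandle );

        // We keep the buffer; DirectPlay must not reuse it until
        // ReturnBuffer is called on the worker thread.
        return DPNSUCCESS_PENDING;
    }

    return DPN_OK;
}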
17. Receive thread tracking: enter receive thread
- m_csThreadCountLock.Enter();
- // Get the start time of the callback thread
- FLOAT fStartTime = DXUtil_Timer( TIMER_GETAPPTIME );
- m_wActiveThreadCount++;  // Increment thread count
- // Collect max thread count.
- if( m_wActiveThreadCount > m_wMaxThreadCount )
- m_wMaxThreadCount = m_wActiveThreadCount;
- // Calculate an average.
- // Change the divisor to adjust the avg movement.
- FLOAT fdiff = m_wActiveThreadCount - m_fAvgThreadCount;
- m_fAvgThreadCount += fdiff / 32;
- m_csThreadCountLock.Leave();
18. Receive thread tracking: exit receive thread
- m_csThreadCountLock.Enter();
- // Decrement the thread count
- m_wActiveThreadCount--;
- // Calculate our time in the thread.
- FLOAT fDiffTime = ( DXUtil_Timer( TIMER_GETAPPTIME ) - fStartTime ) - m_fAvgThreadTime;
- m_fAvgThreadTime += fDiffTime / 32;
- // Get the max time in the thread.
- if ( fDiffTime > m_fMaxThreadTime )
- m_fMaxThreadTime = fDiffTime;
- m_csThreadCountLock.Leave();
19. CPU load and DirectPlay
- When CPU bound, data can be dropped
- Callback threads are slower to respond
- Slow response causes packets to drop
- Dropped packets can throttle clients
- Overall throughput from clients suffers
- Debug builds increase CPU load and locking
20. One possible goal (diagram)
Game Server 1, Game Server 2, Game Server 3
Many clients (Client 1, Client 2, ... Client 3000, ... Client 5000) distributed across the game servers
21. Connection statistics
- GetConnectionInfo
- Found on the IDirectPlay8 Server, Client, and Peer interfaces
- HRESULT GetConnectionInfo( const DPNID dpnidEndPoint, DPN_CONNECTION_INFO *const pdnInfo, const DWORD dwFlags )
- Stored by the protocol layer
- Contains per-connection information
22. Retrieving Connection Info
- // Create your connection info structure
- DPN_CONNECTION_INFO dpnConnectionInfo;
- // Set the correct size
- dpnConnectionInfo.dwSize = sizeof(DPN_CONNECTION_INFO);
- // Call GetConnectionInfo off of your interface
- hr = m_pDP8Server->GetConnectionInfo( dpnID, &dpnConnectionInfo, 0 );
- // Check your return code
- // Note: DPNERR_INVALIDPLAYER may happen if the
- // player has left the session just before you
- // made the call to GetConnectionInfo
(A sketch of checking a few of the returned statistics follows below.)
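Building on the snippet above, a minimal sketch of reading a few of the returned statistics. The field names are as found in the DirectX 8.x dplay8.h header (verify against your SDK version); the thresholds and the "unhealthy link" policy are illustrative assumptions, not part of the deck.

#include <dplay8.h>

// Sketch: examine a few per-connection statistics.
bool LinkLooksUnhealthy( const DPN_CONNECTION_INFO& info )
{
    // High round-trip latency suggests the client cannot keep up with the
    // current send rate.
    if( info.dwRoundTripLatencyMS > 500 )
        return true;

    // Many expired (timed-out) messages point at a backed-up send queue.
    if( info.dwMessagesTimedOutNormalPriority > 100 )
        return true;

    return false;
}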
23. Sending data
- Two locations for send-data information
- GetConnectionInfo
- Shows information on sent data
- GetSendQueueInfo
- Messages left to be sent
- Includes undelivered guaranteed messages
- The send queue can grow without bound
- If the queue backs up, moderate your sending
- Use timeouts on your sends
24. GetSendQueueInfo
- GetSendQueueInfo
- Found on the IDirectPlay8 Server, Client, and Peer interfaces
- HRESULT GetSendQueueInfo( const DPNID dpnid, DWORD *const pdwNumMsgs, DWORD *const pdwNumBytes, const DWORD dwFlags )
- (Note: this slide is hidden)
(A usage sketch follows below.)
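A minimal usage sketch, assuming an IDirectPlay8Server pointer; the queue-depth threshold of 3 is taken from the 300 ms timeout / 100 ms send-rate example later in the deck and is illustrative only.

#include <dplay8.h>

// Sketch: decide whether to skip this frame's update for one client
// because its send queue is already backed up.
bool SendQueueIsBackedUp( IDirectPlay8Server* pDP8Server, DPNID dpnidClient )
{
    DWORD dwNumMsgs  = 0;
    DWORD dwNumBytes = 0;

    HRESULT hr = pDP8Server->GetSendQueueInfo( dpnidClient,
                                               &dwNumMsgs,
                                               &dwNumBytes,
                                               0 );

    // Treat a failed call as "not backed up"; the player may have just left.
    if( FAILED( hr ) )
        return false;

    return dwNumMsgs > 3;
}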
25. Timeouts
- You are submitting data at a high rate
- The high rate of sending will cause the protocol to queue up your unsent data
- Using timeouts will expire the old messages
- Timed-out messages will show up in GetConnectionInfo
(A send-with-timeout sketch follows below.)
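A minimal sketch of a non-guaranteed send with a 150 ms timeout, assuming an IDirectPlay8Server pointer; the function name, payload, and client id are illustrative assumptions.

#include <dplay8.h>

// Sketch: a non-guaranteed state update sent with a 150 ms timeout.
HRESULT SendStateUpdate( IDirectPlay8Server* pDP8Server, DPNID dpnidClient,
                         BYTE* pPayload, DWORD cbPayload )
{
    DPN_BUFFER_DESC bufferDesc;
    bufferDesc.pBufferData  = pPayload;
    bufferDesc.dwBufferSize = cbPayload;

    DPNHANDLE hAsync = 0;

    // dwTimeOut = 150: if the protocol has not put this message on the wire
    // within 150 ms it is expired instead of aging in the queue, and the
    // timeout shows up in the GetConnectionInfo counters.
    return pDP8Server->SendTo( dpnidClient,
                               &bufferDesc,
                               1,        // one buffer
                               150,      // timeout in milliseconds
                               NULL,     // async context
                               &hAsync,  // async completion handle
                               0 );      // non-guaranteed, default flags
}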
26. Timeout flow (diagram)
Your Application Layer
1) Sending data with a 150 ms timeout
Core Layer
Protocol Layer
2) Sends have not timed out
3) Old messages will be expired
Service Provider Layer
WinSock Layer
27. What data do I have?
- Data Profile
- Adjustable service provider caps
- Thread information
- Connection information
- GetConnectionInfo
- Queue statistics
28. Scenario: game traffic 1
- Data profile
- Client to server: 120 bytes every 150 ms
- Server to client: 300 bytes every 150 ms
- Continuously connected for greater than 20 min
- Bidirectional traffic
- All non-guaranteed
- 300 ms timeout on server and client
- DirectPlay Service Provider Layer
- Threads: default
- System buffers: default
29. Observations: game traffic 1
- Server CPU usage is less than 70%
- Server CPU spikes to 95% during connections
- All server receive threads are consumed during connects
- Clients report dropped packets
- Latency seems arbitrarily high
30. Adjustments: game traffic 1
- Adjustment for server-side receive thread consumption
- Increase thread pool
- Result
- More threads to handle receive data
- CPU usage may increase
- Clients may have fewer dropped packets
- More clients should connect
31. Adjustments: game traffic 1
- Adjustment for latency
- Decrease the system buffer size to zero
- Result
- Latency should drop
32. Scenario: game traffic 2
- Data profile
- Sending to server: 50 bytes every 50 ms
- Sending to client: 600 bytes every 100 ms
- Continuously connected for greater than 20 min
- Bidirectional traffic
- All non-guaranteed
- Server timeout of 300 ms
- Client timeout of 150 ms
- DirectPlay Service Provider Layer
- Threads: 40
- System buffers: 0
33. Observations: game traffic 2
- Server queues are full at 3 messages
- 300 ms timeout / 100 ms sends = 3 messages
- Server reports a high rate of timed-out messages to most endpoints
- Server CPU usage is near 80%
- Server time in receive threads is 0.1 ms
- Clients report choppy behavior
- Clients report dropped packets
34. Adjustments: game traffic 2
- Adjustment 1 for the queue being full
- Increase your time between sends on both the server and client
- Increase your timeout to match the change
- Result
- This will reduce the overall bandwidth consumption and may be enough to offset this problem
35. Adjustments: game traffic 2
- Adjustment 2 for the queue being full
- Reduce the number of clients attached to the server
- Load balance
- Result
- The system will be able to keep up with outbound traffic
36. Scenario: game traffic 3
- Data profile
- Sending to server: 50 bytes every 150 ms
- Sending to client: 400 bytes every 150 ms
- Continuously connected for greater than 20 min
- Bidirectional traffic
- 10% guaranteed traffic
- 600 ms timeout (max queue of 4 messages)
- DirectPlay Service Provider Layer
- Threads: 40
- System buffers: 0
37. Observations: game traffic 3
- Server CPU usage is near 95%
- Server send queues are empty
- Server receive threads are consumed
- Clients report jerky behavior
- Client send queue is backed up at 4 messages
- Clients report a high rate of timed-out messages
- Clients report dropped packets
38. Adjustments: game traffic 3
- Adjustment for the client queue being full
- Increase the update interval to make the time between sends longer
- Result
- This may lower the inbound data to the server
- Adjustment for server CPU being at 95%
- Reduce clients on the server
- Result
- Lower inbound data to the server
39. Scenario: logon server
- Data profile
- Sending to server: 120 bytes at connect
- Sending to client: 500 bytes on connect complete, after retrieving information from the game data store
- Connected only for a few minutes
- All guaranteed
- No timeout on client or server
- DirectPlay Service Provider Layer
- Threads: 40
40. Observations: logon server
- All server receive threads are consumed
- Server CPU usage is around 95% during connects
- Server time spent in receive threads is around 2 ms to 4 ms
- Hundreds of clients are attempting to connect
- Clients are failing to connect
41. Adjustments: logon server
- Adjustment for receive thread consumption
- Increase thread pool
- Result
- More threads to handle connections
- More clients may successfully connect
42. Adjustments: logon server
- Adjustment to lower time in the receive thread
- Remove the data store work from the receive thread and move it to a worker thread (see the sketch below)
- Result
- The time in the receive threads may drop, allowing more connections
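A minimal sketch of that adjustment, assuming the Win32 QueueUserWorkItem thread pool; the LogonWork structure and the lookup helper are hypothetical, and the receive handler is assumed to keep the buffer with DPNSUCCESS_PENDING as shown earlier.

#include <windows.h>
#include <dplay8.h>

// Hypothetical work item carrying what the worker thread needs.
struct LogonWork
{
    IDirectPlay8Server* pServer;      // for ReturnBuffer and the reply
    DPNID               dpnidPlayer;
    BYTE*               pData;        // buffer still owned by us
    DWORD               cbData;
    DPNHANDLE           hBuffer;      // handle from DPNMSG_RECEIVE
};

// Runs on a system thread-pool thread, not a DirectPlay receive thread.
DWORD WINAPI LogonWorker( LPVOID pvContext )
{
    LogonWork* pWork = (LogonWork*) pvContext;

    // Slow work (the data store lookup) happens here, keeping the
    // DirectPlay receive threads free to accept more connections.
    // LookupPlayerRecordAndReply( pWork );   // hypothetical helper

    // Hand the receive buffer back to DirectPlay now that we are done.
    pWork->pServer->ReturnBuffer( pWork->hBuffer, 0 );
    delete pWork;
    return 0;
}

// Inside the DPN_MSGID_RECEIVE case of the message handler:
//     LogonWork* pWork = new LogonWork;
//     /* fill in pWork from the DPNMSG_RECEIVE fields */
//     QueueUserWorkItem( LogonWorker, pWork, WT_EXECUTEDEFAULT );
//     return DPNSUCCESS_PENDING;   // keep the buffer until ReturnBuffer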
43. Questions?
44. Thank you!