Transcript and Presenter's Notes

Title: Stackless Python in EVE, pt. II


1
  • Stackless Python in EVE, pt. II
  • Kristján Valur Jónsson
  • CCP Games

2
Introduction
  • The history of EVE
  • Built using Stackless Python from the year 2000
  • Launched in 2003
  • Moved from 1.5.2 to 2.5.1
  • Talk at PyCon 2006
  • Custom patches
  • Unicode imports
  • Faster __getattr__

3
Introduction
  • Eve subscribers

4
Introduction
  • Currently some 260,000 subscribers
  • More than 55,000 concurrent players
  • Apocrypha expansion on shelves
  • Future growth expected

5
Introduction

6
Introduction
  • Core Concepts
  • Stackless Python: numerous, inexpensive tasklets
  • Cooperative multithreading
  • stackless.run() never interrupts a tasklet
  • Tasklets switch only at known points (send/recv,
    Yield(), channel.recv(), etc.)
  • No locking required (most of the time)
  • Simple programming model (see the sketch below)
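A minimal sketch of these concepts in plain Stackless Python (not EVE code); it assumes a Stackless build, and the only switch points are the channel operations and the scheduler itself:

      import stackless

      def producer(ch):
          for i in range(3):
              ch.send(i)          # switch point: blocks until a receiver is ready

      def consumer(ch):
          for _ in range(3):
              print(ch.recv())    # switch point: blocks until a sender is ready

      ch = stackless.channel()
      stackless.tasklet(producer)(ch)
      stackless.tasklet(consumer)(ch)
      stackless.run()             # cooperative: never interrupts a running tasklet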

7
Introduction
  • Game Loop (see the sketch below)
      while True:
          WaitForMultipleItems()    # IO
          WakeupIOTasklets()
          RunScheduler()            # sleepers, yielders
          stackless.run()
          TickGame()                # space simulation, etc.
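A hypothetical, simplified sketch of one frame of such a loop in plain Stackless Python; WaitForMultipleItems, TickGame and the completed list stand in for engine internals not shown in this talk:

      import stackless

      completed = []                  # finished IO requests, filled in by the OS layer

      def WaitForMultipleItems():
          pass                        # placeholder: block on OS IO completion here

      def WakeupIOTasklets():
          while completed:
              r = completed.pop(0)
              r.channel.send(r.result)    # wakes the tasklet blocked in recv() on this channel

      def TickGame():
          pass                        # placeholder: space simulation, etc.

      def Frame():                    # called by the engine in a "while True" loop
          WaitForMultipleItems()      # IO
          WakeupIOTasklets()
          stackless.run()             # run sleepers, yielders and woken IO tasklets
          TickGame()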

8
Introduction
  • RPC (see the sketch below)
      r = remoteService.GetShipID(playerID)

      def RemoteServiceCall(target, call):
          callID = newID()
          SendMessage(target, callID, call)
          r = WaitForResponse(callID)
          return UnpackResponse(r)
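One plausible way to build the WaitForResponse half on top of a channel per outstanding call; the helper names and the pending table are hypothetical, not the EVE implementation:

      import stackless

      pending = {}                    # callID -> channel of the tasklet awaiting a reply

      def WaitForResponse(callID):
          ch = stackless.channel()
          pending[callID] = ch
          return ch.recv()            # this tasklet blocks here until the reply arrives

      def OnResponseReceived(callID, payload):
          # Called by the network layer when the reply packet comes in.
          pending.pop(callID).send(payload)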

9
Introduction
  • IO using Stackless Python
      def StacklessRecv(socket, len):
          r = socket.PostRequest(len)
          return r.channel.recv()

      def Poller():
          for r in requests:
              if r.done:
                  r.channel.send(r.result)

10
Problems
  • Inter-node latency
  • Ping times of some 200ms
  • High variance
  • Sometimes get very long ping times (seconds)
  • Unpredictable scheduling
  • Loaded nodes kept working, but some operations
    would never complete, while others finished in a
    jiffy.

11
Problems
  • Jita system. A busy market hub

12
Problems
  • Simple Ping RPC to Jita node from client
  • 0.5 seconds
  • Spikes of up to 4 minutes
  • Meanwhile concurrent pings succeed!

13
Problems
  • Polaris, lightly loaded node
  • Occasional spikes of 3 seconds

14
StacklessIO
  • Get rid of legacy IO systems
  • Server socket layer written in three days in 2002
  • External C source code adapted for stackless
  • A different system for DB queries
  • Yet another socket layer for clients
  • Windows 98 didn't support IO completion ports

15
StacklessIO
  • Provide a unified framework for blocking
    operations in Stackless Python
  • Write a socket implementation using it
  • Same semantics as std socket
  • Use on both client and server
  • Rewrite network code to use socket
  • Adapt DB to use this system

16
StacklessIO
  • Written in C
  • A central IOEventQueue object
  • An IOEvent hierarchy
  • IORequest
  • IOOverlapped
  • IOWorker
  • Subclassable by other modules
  • Currently uses Boost and the Win32 API

17
StacklessIO
  • Uses the Win32 thread pool for IOWorker and
    IOOverlapped
  • Further subclasses of IOOverlapped, e.g.
    IOWSAOverlapped to support overlapped winsock
    operations
  • Provides its own stacklessio._socket module.

18
StacklessIO
  • Extra enhancements
  • socket.sendpacket(), recvpacket()
  • socket.send(a, b, (c, d))
  • socket.blockingsend = False

19
connect()
      class ConnectResult : public WSIOWorker
      {
          void ThreadFunc()
          {
              int err = connect(mXtra->Socket(), (sockaddr *)mAddr, mAddrLen);
              if (err == SOCKET_ERROR)
                  SetError("connect", WSAGetLastError());
              else
                  mXtra->mStats.Connect();
          }
      };

20
connect()
      int slsock_connect(PySocketSockObject *s, sock_addr_t *addr,
                         int addrlen, int *timeoutp)
      {
          try {
              *timeoutp = 0;
              boost::intrusive_ptr<ConnectResult> result(
                  new ConnectResult(
                      static_cast<SocketXtra *>(s->sock_xtradata), addr, addrlen));
              result->ExecuteAndWait();
              return 0;
          }
          catch (const Win32Error &e) {
              SetWSAError(e);
              return e.GetCode();
          }
      }

21
StacklessIO
  • Monkeypatching is easy
  • Monkeypatch _socket with stacklessio._socket
      import sys
      from stacklessio import _socket
      sys.modules["_socket"] = _socket
  • stacklessio._socket is a copy of socketmodule.c,
    with enhancements.
  • Standard modules, such as urllib, just work
    without blocking! (See the sketch below.)
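For illustration, a sketch of the patch in context; urlopen is the stock Python 2.5 urllib call, and the patch must run before the standard socket module is first imported for it to take effect:

      import sys
      from stacklessio import _socket
      sys.modules["_socket"] = _socket    # do this before anything imports socket

      import urllib                       # stdlib networking now uses StacklessIO

      def fetch(url):
          # Runs inside a tasklet: blocks only this tasklet, not the interpreter.
          return urllib.urlopen(url).read()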

22
Deployment
  • Only one real testing ground: the Tranquility
    cluster.
  • Needed empirical data to guide us
  • Like changing the tyres on a moving bus full of
    passengers!

23
Tranquility cluster
  • Up 7 days a week
  • 1 hour regular downtime each day, for DB
    maintenance, updates, etc.
  • Hundreds of thousands of players expect their
    game to work
  • Established QA protocols for server updates,
    client patches, etc. to minimize risk.
  • Completely inadequate for interactive improvement.

24
Enter Cowboy
  • (Slide shows a picture of Ray Krebbs)

25
Cowboy mode
  • Normal QA procedures were short-circuited.
  • Daily Cowboy meetings of EVE software group,
    Operations, and Core technology group.
  • Review of problems, and cluster performance.
  • Proposed changes discussed.
  • Server-side deployment fast-tracked through
    unanimous consensus.

26
Iterative refinement
  • Low latency is very important!
  • (But so is fairness! More on that later.)
  • How do we wake up the IO tasklets as soon as
    possible?
  • Explored many different ways

27
Iterative refinement
  • Waking up sleeping tasklets
  • Method 1: the main tasklet calls Dispatch() from
    the main loop, after WaitForMultipleObjectsEx()
  • Method 2: the client tasklet, having submitted its
    request but before calling channel.recv(), calls
    Dispatch()

28
Iterative refinement
  • Method 3: use ScheduleThreadCall(TODO) to invoke a
    callback on the main thread.
  • Callback executed during SleepEx(),
    WaitForMultipleObjectsEx() and other calls that
    put the thread in an alertable state.
  • Callback must acquire the GIL using
    PyGILState_Ensure()
  • Must call Dispatch(). Caller of SleepEx() et al.,
    beware of tasklet action!

29
Iterative refinement
  • Method 4: have the worker thread that finishes the
    IO acquire the GIL itself (using PyGILState_Ensure())
    and call Dispatch()
  • Use Stackless inter-thread channel action.
  • Will make the target tasklet runnable, otherwise no
    tasklet synchronization

30
Iterative refinement
  • Method 5: use Python's pending-calls mechanism
    (Py_AddPendingCall()).
  • Used for signals on Unix
  • Asks Python to call a callback during the next
    housekeeping tick (see sys.setcheckinterval())
  • The tick callback calls Dispatch()
  • Again, no tasklet switches may happen, since the
    application doesn't expect tasklet switches at
    arbitrary points.

31
Iterative refinement
  • The above methods (all except method 1) were added
    incrementally
  • Togglable live from ESP
  • Or from prefs.ini

32
Problems with these
  • Client tasklets calling Dispatch() could be
    problematic
  • Must be careful not to wake up requests more
    recent than their own (fairness)

33
Problems with these
  • Thread callbacks during alertable states could
    not perform tasklet switches
  • Got strange results from OLEDB if tasklet
    switches occurred in an alertable state
  • Presumably, stack-allocated variables were being
    used by some workers.
  • Stackless swaps the C stack around, so other
    threads can't rely on stack variables.

34
Problems with these
  • Worker threads acquiring the GIL and performing
    inter-thread channel action didn't scale.
  • Callbacks on system pool threads got hung up
    waiting for the GIL
  • System thread pool got starved of threads!
  • Only saw this on live cluster! (Hello Cowboy!)
  • Quickly disabled this cluster-wide using ESP

35
Fairness
  • Back story
  • Historically, channel.send() would always cause a
    tasklet switch
  • If no one was receive()-ing, the sending tasklet
    would block
  • If a receiver was blocked, it would be woken up
    and the sender would sleep
  • EVE had always assumed this behaviour
  • Sometimes we wrote a non-blocking send using
    worker tasklets

36
Fairness
  • Priority chaos!
  • Tasklets are awoken in a chain reaction.
  • Overview is quickly lost
  • The target tasklet may wake some other tasklet,
    giving it priority

37
Fairness
  • channel.preference
  • Available since Stackless 2.3
  • Default is -1, receive preference, same as the old
    days.
  • A value of 0, no preference, means that a tasklet
    switch is not done unless needed.
  • channel.send(), where a receiver is waiting, will
    make that receiver runnable but not switch to it
    (see the sketch below).
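A small illustrative sketch of preference = 0 (requires a Stackless build). The point is the ordering of the two print lines: the sender keeps running after send(), and the receiver runs later from the runnables queue:

      import stackless

      ch = stackless.channel()
      ch.preference = 0               # no preference: send() does not switch

      def receiver():
          print("receiver got %d" % ch.recv())

      def sender():
          stackless.tasklet(receiver)()
          stackless.schedule()        # let the receiver block in recv() first
          ch.send(42)                 # receiver becomes runnable, sender continues
          print("sender continues before the receiver runs")

      stackless.tasklet(sender)()
      stackless.run()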

38
Fairness
  • The requirement that some of the Dispatch methods
    must not switch tasklets led us to
  • The Round Robin Way!

Holy Python!
39
Round Robin
  • Always use channel.preference = 0
  • Tasklets are made runnable in the order that
    they receive their send()
  • No tasklet gets to skip the queue.
  • Stackless maintains an ordered queue of runnable
    tasklets; trust it.

40
Round Robin
  • Extend this to the rest of the code
  • blue.pyos.Synchro() now uses preference = 0
    (handles Sleep(), Yield(), etc.)
  • uthread.Semaphore, uthread.CriticalSection() and
    others use preference = 0
  • All the tasklets in the forest shall be friends!
    (Torbjørn Egner)

41
Round Robin
  • StacklessIO's Dispatch method also uses this
  • Dispatch never causes a tasklet switch; it only
    makes IO requests runnable.
  • We can stick with methods 1, 2 and 5 of waking up
    tasklets: 5 (pending calls) is the fastest gun in
    the West, while 1 remains the Charles Ingalls of
    Dispatches.

42
Necessary Changes
  • We must make sure that stackless.run() regularly
    finishes, to keep the game loop running
  • It could otherwise keep waking IO tasklets and
    switching to them ad infinitum
  • Old stackless.run() would quit only when
  • There were no runnable tasklets
  • A tasklet had been running for N ticks without
    switching.

43
Necessary Changes
  • EVE tasklets switch regularly, so stackless.run()
    could go on forever
  • A new mode was needed!

44
Necessary Changes
  • stackless.run(totaltimeout=N, soft=True)
  • Will return once N ticks have gone by
  • Will not interrupt a tasklet, but uses tasklet
    switches as interruption points
  • Allows tasklets to live, switch and die at their
    own pace until the fixed timeout occurs.
  • Can scale this timeout to match a desired frame
    rate, such as 4 fps on a server (see the sketch
    below).
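A sketch of a game loop driving this mode. Note that in released Stackless versions the equivalent spelling is stackless.run(timeout, soft=True, totaltimeout=True), with the budget counted in interpreter ticks rather than wall-clock time; the constant below is purely illustrative:

      import stackless

      TICKS_PER_FRAME = 10000         # illustrative budget, tuned to hit ~4 fps on a server

      def TickGame():
          pass                        # placeholder: space simulation, etc.

      def Frame():
          # soft=True: no tasklet is interrupted; the scheduler returns at the
          # first tasklet switch after the total tick budget has been spent.
          stackless.run(TICKS_PER_FRAME, soft=True, totaltimeout=True)
          TickGame()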

45
Necessary Changes
  • Py_AddPendingCall() wasn't thread safe!
  • This useful API was only used by signals on the
    main thread on Unix platforms (OK, and SIGINT on
    Windows)
  • The implementation predated Python multithreading
    and was not thread-aware
  • Made it fully reentrant
  • Can be used by signal handlers
  • Can be called from any thread
  • Known to Python or not
  • With or without the GIL

46
Comparison client-Jita ping
47
Comparison client-Jita ping
48
Comparison client-Jita ping
49
Open source
  • StacklessIO to be made open source in the next
    few weeks.
  • Needs work to make it cross-platform
  • Synchronization primitives
  • Threadpool
  • Async sockets on linux

50
Questions?
  • Kristján Valur Jónsson
  • CCP Games
  • kristjan@ccpgames.com