bw logo

Chapter 11. Profiling

Profiling the server can take on a number of forms depending on the stage of development you are in and what kind of issues you are attempting to isolate. The following list is a brief outline of the different areas you may wish to profile.

  • Entities

    Entities are the main game object which are used by all components of the BigWorld engine. Inefficient or bloated entities can cause excessive load and network communication on your server processes which will reduce the overall performance of your server cluster. This in turn can affect the total number of active users your server can support.

    For more information see Profiling Entities.

  • Python script

    Python script represents the majority of custom game code and as such is one of the most variable sources of load that is placed on the BigWorld server processes. Profiling Python script should be performed on a semi-regular basis and preferably while performing load tests in order to ensure that there are no major bottlenecks in your script.

    For more information see Python Game Script.

  • Server Processes

    As the BigWorld server processes are the core of a server cluster that run custom script and game code, it can be useful to identify which areas of the server processes are under load. This can help to pinpoint issues in either game script, navigation issues, network communication, entity management or any of the other numerous tasks that are handled by the server processes.

    For more information see Profiling Server Processes (C++ Code).

  • Client Communication

    Understanding the network traffic being sent to client connections can help lower bandwidth usage, thus reducing network costs as well as increasing the overall performance of your server cluster.

    For more information see Client Communication.

  • Server Process Communication

    In order to diagnose network related latency issues on individual server processes, it can be useful to identify which network messages are causing the most load on each process.

    For more information see Server Communication.

We recommend you run profiles regularly while you are developing or making changes to configuration, etc. as you can see what effect a change can make. You should keep the profiles that you generate to use as a historical reference for your games performance over the development life. For example, if you make a script change, and suddenly cell load is quite high, you can use the profiles to see when this load spike started and narrow down the cause of the issue to script usage as well as the particular methods causing the script load spike.

11.1. Profiling Entities

As Entities are the main game object being used by all components of the BigWorld engine it is important to make sure that your game entities are implemented as efficiently as possible. This includes:

  • Minimising persistent properties.

  • Ensuring properties have the smallest applicable data type (while considering long term scaling).

  • Ensuring properties have the most appropriate data propagation flags assigned.

  • Ensuring properties have AoI where appropriate.

The best mechanism currently in place for profiling entity sizes is the usage of the WebConsole filtered watchers view to collect information about the impact of each entity type's properties, coupled with a peer review system to ensure that multiple people have an understanding of the impact of property types.

11.1.1. Persistent Properties

Persistent properties cause load from both network transmission costs between CellApp, BaseApp and DBMgr, as well as placing load onto DBMgr when not using Secondary Databases. By minimising the number of persistent properties as well as decreasing the size of properties that are being persisted you will effect a long term performance gain in your cluster.

Determining the persistent properties can be performed by using the WebConsole filtered watcher page using a Processes of 'cellapp' and a Path of 'entityTypes/*/properties/*/persistent'. Once you have narrowed down the persistent properties you can perform a review of them to determine any optimisations that can be made.

11.1.2. Property Data Types

Choosing the smallest possible data type that can be used for a property will assist in lowering network load within your cluster. This includes all properties regardless of whether they are persisted.

11.1.3. Property Data Propagation

Often the easiest solution when developing a game is to set all properties to CELL_PUBLIC or ALL_CLIENTS as it provides the greatest visibility of the property. This quite often works perfectly when developing in the office on a single machine server that isn't heavily loaded, however this can lead to large problems when performing load tests and scaling a game up to production.

Ideally the best approach when deciding on the propagation flags to use with properties is to use the object oriented design philosophy of calling methods on an object, in this case an Entity, to request that it perform work for you, rather than accessing its private information yourself.

A common example is the health or HP of an entity. This value may be constantly changing due to damage or regenerating health over time and is often only directly relevant for the entity the property is associated with. While other entities may be interested in this property, they are only transiently interested, such as when they are all in combat together, or if the entities have formed a group of players. In this circumstance it may make sense to have the propagation flag as CELL_PRIVATE which can be requested as required by a method call. This reduces the amount of network traffic generated from broadcasting the health property every time it is updated but still allows access to the property when required.

11.2. Python Game Script

control_cluster.py pyprofile

Profiling Python script generally occurs as a result of identifying particular bottlenecks in script that are causing issues with server processes, such as long tick times resulting in process termination through a SIGQUIT signal.

Profiling server side script is performed with the control_cluster.py script using the pyprofile[24] command. This command only applies to server components that run Python script, i.e., CellApp, BaseApp and DBMgr.

This command will dump a table of the top cumulative and internal times for script calls over a period of time. The user can optionally display tables of callers and callees as well. See control_cluster.py --help for exact usage and options.

This tool is invaluable in determining which script calls are causing the most server-side CPU load for your game, for example:

11.2.1. Understanding the output

PyProfile will output its results for each process that has been queried in two orderings, internal time and cumulative time. The output format of these two result sets are identical. The profile aims to help in identifying the most time-expensive Python script methods implemented by your game.

      6681 function calls in 0.106 CPU seconds

Ordered by: internal time
List reduced from 30 to 10 due to restriction <10>

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   643    0.019    0.000    0.020    0.000 Guard.py:446(moveToPosition)
   102    0.016    0.000    0.016    0.000 Creature.py:476(moveTowardsPoint)
   643    0.011    0.000    0.050    0.000 Guard.py:349(think)
   354    0.010    0.000    0.036    0.000 Creature.py:332(think)
   239    0.007    0.000    0.008    0.000 Guard.py:584(nextPatrolNode)
   627    0.006    0.000    0.054    0.000 Guard.py:735(onMove)
   202    0.006    0.000    0.006    0.000 Flock.py:166(onTimer)
   404    0.005    0.000    0.019    0.000 Guard.py:653(doCamp)
   101    0.004    0.000    0.004    0.000 DustDevil.py:26(onTimer)
   239    0.004    0.000    0.019    0.000 Guard.py:639(doPatrol)

The hotshot Python module which produces these reports uses two different names when referring to the same information. The internal time report corresponds to the total time column outlined in the table below along with the other column descriptions.

Column Name Description
ncalls The total number of calls seen for this function.
tottime The amount of time (in seconds) that is spent executing the code only inside of the current method. Methods called from this method are not included.
cumtime The amount of time (in seconds) this method took to execute, including the execution time of any other methods called from this method.
percall The percall columns refer to the columns immediately to the left of them and indicate the average time per call for the profile time they refer to.
filename:lineno(function) The filename and line number containing function the profile refers to.

11.2.2. Increasing Memory Usage / Entity Count

Automatic garbage collection has been disabled in BigWorld to avoid the Python engine from utilising large amounts of CPU time at unexpected intervals. In order to allow Python objects to be deleted, the server processes will only delete objects when their reference count reaches zero. This approach has the side effect of not deleting objects that have circular references or are referenced by objects having circular references. See Debugging Circular References.

11.3. Profiling Server Processes (C++ Code)

When experiencing load spikes in your server that are not easily attributable to a specific cause it can be useful to profile the C++ server code to help narrow down the areas where the most time is being spent. This is supported on both the BaseApp and the CellApp using the control_cluster.py command cprofile.

This cprofile command collects and reports the internal C++ profiles of a server process over a period of time. The profiles are broken down into key time sensitive code blocks such as sending updates to clients, entity navigation, etc.

Below is the cprofile output from a CellApp running the FantasyDemo resources that is idling:

Profile                Count   Cumulative Times          Internal Times         
Idle                    2758 |  8.963s   3249us  88.0% |  8.963s   3249us  88.0%
scriptCall              8825 |  0.804s     91us   7.9% |  0.744s     84us   7.3%
callUpdates              102 |  0.383s   3758us   3.8% |  0.109s   1067us   1.1%
shuffleEntity          15648 |  0.069s      4us   0.7% |  0.069s      4us   0.7%
backup                   102 |  0.065s    637us   0.6% |  0.052s    510us   0.5%
boundaryCheck            448 |  0.035s     78us   0.3% |  0.035s     78us   0.3%
findPath                 466 |  0.030s     65us   0.3% |  0.030s     65us   0.3%
RunningTime                0 |  0.000s      0us   0.0% |  0.025s      0us   0.2%
callTimers               102 |  0.637s   6241us   6.3% |  0.020s    193us   0.2%
visionUpdate            9999 |  0.018s      1us   0.2% |  0.018s      1us   0.2%
tickStats                102 |  0.017s    163us   0.2% |  0.017s    163us   0.2%
canNavigateTo           2018 |  0.046s     22us   0.5% |  0.016s      8us   0.2%
calcBoundary             102 |  0.016s    159us   0.2% |  0.014s    140us   0.1%
onTimer                 3614 |  0.618s    170us   6.1% |  0.010s      2us   0.1%
watchersTCP              250 |  0.007s     26us   0.1% |  0.007s     26us   0.1%
transientLoad            509 |  0.004s      8us   0.0% |  0.004s      8us   0.0%
gameTick                 102 |  1.158s  11356us  11.4% |  0.004s     41us   0.0%
findEntity              1327 |  0.003s      2us   0.0% |  0.003s      2us   0.0%
onMove                   796 |  0.198s    248us   1.9% |  0.002s      2us   0.0%
chunksMainThread         509 |  0.006s     10us   0.1% |  0.001s      2us   0.0%

Total running time: 10.181s (0.000s spare)

The following sections outline some of the code blocks that are profiled along with a short description of their meaning. The sections have been broken down into code that is common between server processes, and server process specific code.

11.3.1. Common Code Block Profiles

Profile Type Description
Idle Time spent idling while polling for network / file descriptor activity.
RunningTime Time spent collating profile statistics.
tickStats Time spent updating moving average for statistics.
watchersTCP Time spent processing TCP watcher requests.
watchersUDP Time spent processing UDP watcher requests.

11.3.2. BaseApp Code Block Profiles

Profile Type Description
archive Time spent archiving entities to the database as part of the second level fault tolerance. See Server Operations Guide chapter Backups and Disaster Recovery.
backup Time spent performing entity backups to other BaseApps as part of the first level fault tolerance mechanism. See Server Operations Guide chapter Fault Tolerance.
encryptRecv Time spent decrypting incoming network traffic from the EncryptionFilter (bigworld/src/lib/network/encryption_filter.cpp).
encryptSend Time spent encrypting outgoing network traffic from the EncryptionFilter (bigworld/src/lib/network/encryption_filter.cpp).
tickGameTime Time spent processing game ticks.

11.3.3. CellApp Code Block Profiles

Profile Type Description
backup Time spent backing up cell entities to their Base.
boundaryCheck Time spent calculating all entities that need to be offloaded, ghosted or removed.
calcBoundary Time spent calculating the number of entities that can be offloaded and notifying CellAppMgr.
callTimers Time spent processing timers.
callUpdates Time spent calling updatable objects such as controllers and witnesses. Calls to Controller.update() are included in this time.
canNavigateTo Time spent calculating whether an Entity can move to a destination position from its current location.
chunksMainThread Time spent performing chunk loading / unloading calculations in the main thread.
findEntity Time spent locating a specific entity within the known population on the CellApp.
gameTick Time processing the game tick.
onMove Time spent in the onMove() script callback from a movement controller.
onTimer Time spent in the onTimer() script callback from the timer controller.
scriptCall Time spent in any Python script call that has been invoked from C++.
shuffleEntity Time spent updating the range lists on entity movement.
transientLoad Time spent performing entity management tasks such as creation, deletion and initialisation.
visionUpdate Time spent updating the vision of entities.

11.4. Client Communication

control_cluster.py eventprofile

Recall that the server sends down both non-volatile data and volatile data down to the client. Volatile data is in the form of position and pose updates, while non-volatile data consists of the property updates on an entity and client-side method calls for that entity, sent down to any player that can witness that entity. The profiling of non-volatile data can give valuable insight into which methods and property updates are particularly high throughput, enabling effective optimisation of the network impact of entity scripts.

There is a distinction made between own-client communication and other-client communication. Own-client communication consists of method calls and property updates that only an entity's own client will receive. These include updates for properties marked as OWN_CLIENT as well as method calls that are made using the ownClient mailbox, and are private to the player's own entity and not broadcast to other players. Note that this type of client communication only applies to player entities.

In contrast, there is also other-client communication for an entity, which consists of the public method calls and property updates that players having that entity in their AoI are able to witness. This includes updates for properties marked as OTHER_CLIENTS or ALL_CLIENTS, as well as method calls made on an entity''s otherClients or as well as the allClients mailbox. These events are potentially propagated to any player that has that entity in their AoI.

Other-client events are not propagated to witnessing clients immediately. For instance, the distance a player is from the entity an event pertains to will affect how soon that event is propagated after the event has actually occurred on the server side. In addition, the LoD level set on properties and methods will affect whether a particular event will be sent down to a witnessing player.

Client communication is broken down into three types for the purposes of profiling:

  • Property changes and method calls to an entity's own client (privateClientEvents). This tracks whenever any own-client communication occurs.

  • Property changes and method calls that can be made available to players that witness it (publicClientEvents). This tracks whenever any public property is updated or any client method is called that can be sent down to any witnessing players.

  • Property changes and method calls that are actually sent down to players (totalPublicClientEvents). This tracks the total number of public properties that were actually sent down to witnessing players, subject to LoD levels.

The eventprofile command samples a running server over a period of time, collecting counts and sizes of all non-volatile communication down to the client. A sample output of eventprofile is provided below, followed by a brief description of the sections. This section is brief as the event profile command is scheduled to be updated to use the new property statistics available under the entityTypes watcher tree.

Waiting 120.0 secs for sample data ...
**** cellapp01 ****

Event Type: privateClientEvents

Name                                    #       Size    AvgSize  Bandwidth

Avatar.cellBounds                     120       2880     24.000     24.000
Avatar.myArray                          2         41     20.500      0.342
Avatar.modeTarget                       2         16      8.000      0.133
Avatar.myDict                           1         11     11.000      0.092
Avatar.mode                             2         10      5.000      0.083


Event Type: publicClientEvents

Name                                    #       Size    AvgSize  Bandwidth

Creature.moving                      4741      23705      5.000    197.542
Creature.performAction                956       4780      5.000     39.833
Guard.moving                          647       3235      5.000     26.958
Creature.creatureState                150        750      5.000      6.250
Guard.didGesture                       69        345      5.000      2.875
Avatar.modeTarget                       2         16      8.000      0.133
Merchant.modeTarget                     2         16      8.000      0.133
Avatar.mode                             2         10      5.000      0.083
Beast.initiateRage                      2          8      4.000      0.067


Event Type: totalPublicClientEvents

Name                                    #       Size    AvgSize  Bandwidth

Creature.moving                      1559       7795      5.000     64.958
Creature.performAction                396       1980      5.000     16.500
Creature.creatureState                 52        260      5.000      2.167
Merchant.modeTarget                     2         16      8.000      0.133
Beast.initiateRage                      2          8      4.000      0.067

11.4.1. Private Client Events

These include:

  • Changes on properties marked as OWN_CLIENT or ALL_CLIENTS.

  • Client-side method calls on a player's own entity that are propagated to that player's own client via Entity.ownClient, Entity.client or Entity.allClients.

11.4.2. Public Client Events

These include:

  • Changes on properties marked as OTHER_CLIENTS or ALL_CLIENTS.

  • Client-side methods calls that are made on the mailboxes Entity.otherClients, Entity.allClients, or using the special Entity.clientMethod( entityID ) method.

11.4.3. Total Public Client Events

These include:

  • Changes on properties marked as OTHER_CLIENTS or ALL_CLIENTS that were actually sent down to witnessing players.

  • Client-side method calls that are made on the mailboxes Entity.otherClients, Entity.allClients, or using the special Entity.clientMethod( entityID ) method, that were actually sent down to witnessing players.

11.5. Server Communication

control_cluster.py mercuryprofile

Server communication primarily occurs via Mercury[25] interfaces. These can be seen on server processes under the Watcher tree path nub/interfacesByName.

The output of mercuryprofile provides a condensed table of results that can be optionally sorted as required by using options when running the profile. A sample of the output from a single instance of CellApp is provided below.

cellapp01 - Internal Nub

 id                   name     br    mr max br   aml abps amps
  0       DBInterfaceBirth      8     1      8   8.0  0.0  0.0
  1                addCell     92     4     23  23.0  0.0  0.0
  2                startup      8     1      8   8.0  0.0  0.0
  3            setGameTime      4     1      4   4.0  0.0  0.0
 14         cellAppMgrInfo 380184 95046      4   4.0 40.0 10.0
 18         updateGeometry 646272 38016     17  17.0 68.0  4.0
 19    spaceGeometryLoaded     78     4     22  19.5  0.0  0.0
 21           createEntity    270     4     70  67.5  0.0  0.0
 22 createEntityNearEntity  63181   466    295 135.6  0.0  0.0
 35       writeToDBRequest   1536   384      4   4.0  0.1  0.0
 50        runScriptMethod  49110  4821    190  10.2 -5.0 -0.5
255                  Reply  10892   165   9600  66.0 -0.0 -0.0

By default this output is sorted by ID. As mentioned the sorting can be altered by providing command line options with running the profile. Refer to the online help with control_cluster.py help mercuryprofile for more details. The table below provides a description of each column type.

Column Name Description
id The Mercury message ID for this message. This corresponds to the watcher value nub/interfaceByName/messageName/id.
name The name for this Mercury message. This corresponds to the watcher value nub/interfaceByID/ID/name.
br The total number of bytes received for messages of this message type.
mr A count of the number of messages received for this message type.
max br The maximum numbers of bytes received for a single instance of this message type from all messages of this type that have been seen.
aml The average message length (in bytes) for this message type.
abps The average bytes per second that this message type is handling.
amps The average number of messages per second of this message type.

Each server process type provides a different interface based on the functionality that process is providing. The two major causes of network traffic from Mercury messages are the BaseApp and CellApp. The following sections outline the most commonly encountered interface messages and their associated functionality. With this information it is possible to narrow down the scope of any potential issues.

11.5.1. BaseApp Interface Summary

Interface Name Description
addGlobalBase Messages received from the BaseAppMgr informing a BaseApp of a new global base.
backupBaseEntity Messages containing backup information for bases on other BaseApps. This is part of the first level fault tolerance mechanism. See Server Operations Guide chapter Fault Tolerance.
backupCellEntity Messages from cell entities containing backup information. This is part of the first level fault tolerance mechanism.
callBaseMethod Messages from other processes requesting a Python method is run on a base entity.
callCellMethod Messages from a CellViaBaseMailBox. These messages are forwarded on to the requested base's cell entity.
callClientMethod Messages from a ClientViaBaseMailBox. These messages are forwarded to the Proxy's connect client entity.
cellEntityLost Messages from a cell entity notifying its base that it has been destroyed.
createBaseFromDB Messages asking for a base entity to be created using a known DBID as a result of calling BigWorld.createRemoteBaseFromDB*().
createBaseWithCellData Messages requesting a base entity is created.
createCellPlayer Messages forwarded from a CellApp to a Client connection via a Proxy notifying the client that its Witness has been created.
currentCell Messages from CellApps to BaseApps notifying a base of an offload in its cell entity. This occurs as a normal part of offloading entities as part of load balancing.
detailedPosition Messages containing position updates from the cell to the client.
entityMessage Messages that are received on a nub to call a specific interface method. For internal nubs this corresponds to all methods defined in an entity's <baseMethod> section. For external nubs this corresponds only to methods that are an <Exposed> <baseMethod>.
forwardedBaseMessage Messages intended for a base that has been recently offloaded to another BaseApp.
logOnAttempt Messages from the DBMgr for logon attempts.
modWard Messages from the cell notifying the proxy and the client of a change of client control as a result of calling BigWorld.controlledBy().
sendToClient Messages from the cell notifying the proxy that it has sent all the required updates for the entity for the current tick and they should now be forwarded to the client.
setBackupBaseApps Messages from the BaseAppMgr notifying the current BaseApp of the other BaseApps that it is responsible for performing backups of.
setClient Message to notify the BaseApp that the next message received refers to a specific Entity. This will generally correspond with the number of callBaseMethod, callCellMethod and callClientMethod messages.
setCreateBaseInfo Messages from the BaseAppMgr informing the BaseApp of the best BaseApp to use for calls to BigWorld.createBaseAnywhere() and BigWorld.createBaseRemotely().
setSharedData Messages notifying the BaseApp of any changes to shared data which can include BigWorld.baseAppData and BigWorld.globalData.
teleportOther Messages requesting that an entity (A) is teleported to the same space as an entity (B) that is owned by this BaseApp.
writeToDB Messages from the cell notifying the base entity to write itself to the database.

The following table outlines some of the BaseApp interfaces that are used primarily to forward messages directly to the client without performing any processing. These message occur as a result of CellApp witnesses sending updates to their associated client.

Interface Name Description
enterAoI Proxy message forwarded to client.
enterAoIOnVehicle Proxy message forwarded to client.
forcedPosition Proxy message forwarded to client.
leaveAoI Proxy message forwarded to client.
updateEntity Proxy message forwarded to client.

11.5.2. CellApp Interface Summary

Interface Name Description
addCell Messages from a CellAppMgr to add a Cell into a Space.
avatarUpdateExplicit Position updates from a Client connection to their entity.
avatarUpdateImplicit Position updates from a Client connection to their entity.
callBaseMethod Method calls occurring on a BaseViaCell mailbox.
callClientMethod Method calls occurring on a ClientViaCell mailbox.
callWatcher A forwarding watcher request from the corresponding manager process (i.e., CellAppMgr / BaseAppMgr). For more information see Forwarding Watchers.
createEntity Requests to create a new entity within a Space.
createEntityNearEntity Requests to create a new entity near to another Entity.
createGhost Messages from other CellApps requesting a ghost entity is created for a new real entity in a nearby Cell.
forwardedBaseEntityPacket Messages that are forwarded from a ghost entity to the real so that it may be provided to the base entity. This is used to avoid race conditions when offloading entities between CellApps.
ghostAvatarUpdate Messages from nearby Cells updating ghost entities on the current CellApp of position changes.
ghostHistoryEvent Messages sent from real entities that are being ghosted on the current CellApp providing history event updates.
ghostedDataUpdate Messages from real entities updating ghosted properties.
notifyOfCellRemoval Messages sent from the CellAppMgr to notify of a neighbouring cells removal.
onload Messages sent from a nearby cell that is handing responsibility for a real entity over to the current CellApp.
runExposedMethod Messages from Clients to run an <Exposed> method on an entity.
runScriptMethod Messages from other server components to run a method on an entity.
spaceData Messages from other CellApps notifying us of Space Data changes. For more information see Space Data.
writeToDBRequest Messages to entities to stream their information back to their base to be written to the database.


[24] For information on pyprofile usage see the control_cluster.py help.

$ control_cluster.py help pyprofile

[25] For more information on Mercury see the Server Overview section Inter-Process Communication (Mercury).