Table of Contents
Profiling the server can take on a number of forms depending on the stage of development you are in and what kind of issues you are attempting to isolate. The following list is a brief outline of the different areas you may wish to profile.
-
Entities
Entities are the main game object which are used by all components of the BigWorld engine. Inefficient or bloated entities can cause excessive load and network communication on your server processes which will reduce the overall performance of your server cluster. This in turn can affect the total number of active users your server can support.
For more information see Profiling Entities.
-
Python script
Python script represents the majority of custom game code and as such is one of the most variable sources of load that is placed on the BigWorld server processes. Profiling Python script should be performed on a semi-regular basis and preferably while performing load tests in order to ensure that there are no major bottlenecks in your script.
For more information see Python Game Script.
-
Server Processes
As the BigWorld server processes are the core of a server cluster that run custom script and game code, it can be useful to identify which areas of the server processes are under load. This can help to pinpoint issues in either game script, navigation issues, network communication, entity management or any of the other numerous tasks that are handled by the server processes.
For more information see Profiling Server Processes (C++ Code).
-
Client Communication
Understanding the network traffic being sent to client connections can help lower bandwidth usage, thus reducing network costs as well as increasing the overall performance of your server cluster.
For more information see Client Communication.
-
Server Process Communication
In order to diagnose network related latency issues on individual server processes, it can be useful to identify which network messages are causing the most load on each process.
For more information see Server Communication.
We recommend you run profiles regularly while you are developing or making changes to configuration, etc. as you can see what effect a change can make. You should keep the profiles that you generate to use as a historical reference for your games performance over the development life. For example, if you make a script change, and suddenly cell load is quite high, you can use the profiles to see when this load spike started and narrow down the cause of the issue to script usage as well as the particular methods causing the script load spike.
As Entities are the main game object being used by all components of the BigWorld engine it is important to make sure that your game entities are implemented as efficiently as possible. This includes:
-
Minimising persistent properties.
-
Ensuring properties have the smallest applicable data type (while considering long term scaling).
-
Ensuring properties have the most appropriate data propagation flags assigned.
-
Ensuring properties have AoI where appropriate.
The best mechanism currently in place for profiling entity sizes is the usage of the WebConsole filtered watchers view to collect information about the impact of each entity type's properties, coupled with a peer review system to ensure that multiple people have an understanding of the impact of property types.
Persistent properties cause load from both network transmission costs between CellApp, BaseApp and DBMgr, as well as placing load onto DBMgr when not using Secondary Databases. By minimising the number of persistent properties as well as decreasing the size of properties that are being persisted you will effect a long term performance gain in your cluster.
Determining the persistent properties can be performed by using the WebConsole filtered watcher page using a Processes of 'cellapp' and a Path of 'entityTypes/*/properties/*/persistent'. Once you have narrowed down the persistent properties you can perform a review of them to determine any optimisations that can be made.
Choosing the smallest possible data type that can be used for a property will assist in lowering network load within your cluster. This includes all properties regardless of whether they are persisted.
Often the easiest solution when developing a game is to set all properties to CELL_PUBLIC or ALL_CLIENTS as it provides the greatest visibility of the property. This quite often works perfectly when developing in the office on a single machine server that isn't heavily loaded, however this can lead to large problems when performing load tests and scaling a game up to production.
Ideally the best approach when deciding on the propagation flags to use with properties is to use the object oriented design philosophy of calling methods on an object, in this case an Entity, to request that it perform work for you, rather than accessing its private information yourself.
A common example is the health or HP of an entity. This value may be constantly changing due to damage or regenerating health over time and is often only directly relevant for the entity the property is associated with. While other entities may be interested in this property, they are only transiently interested, such as when they are all in combat together, or if the entities have formed a group of players. In this circumstance it may make sense to have the propagation flag as CELL_PRIVATE which can be requested as required by a method call. This reduces the amount of network traffic generated from broadcasting the health property every time it is updated but still allows access to the property when required.
control_cluster.py pyprofile
Profiling Python script generally occurs as a result of identifying particular bottlenecks in script that are causing issues with server processes, such as long tick times resulting in process termination through a SIGQUIT signal.
Profiling server side script is performed with the
control_cluster.py
script using the pyprofile
[24] command. This command only applies to server components that
run Python script, i.e., CellApp, BaseApp and DBMgr.
This command will dump a table of the top cumulative and internal
times for script calls over a period of time. The user can optionally
display tables of callers and callees as well. See
control_cluster.py --help
for exact
usage and options.
This tool is invaluable in determining which script calls are causing the most server-side CPU load for your game, for example:
PyProfile will output its results for each process that has been queried in two orderings, internal time and cumulative time. The output format of these two result sets are identical. The profile aims to help in identifying the most time-expensive Python script methods implemented by your game.
6681 function calls in 0.106 CPU seconds Ordered by: internal time List reduced from 30 to 10 due to restriction <10> ncalls tottime percall cumtime percall filename:lineno(function) 643 0.019 0.000 0.020 0.000 Guard.py:446(moveToPosition) 102 0.016 0.000 0.016 0.000 Creature.py:476(moveTowardsPoint) 643 0.011 0.000 0.050 0.000 Guard.py:349(think) 354 0.010 0.000 0.036 0.000 Creature.py:332(think) 239 0.007 0.000 0.008 0.000 Guard.py:584(nextPatrolNode) 627 0.006 0.000 0.054 0.000 Guard.py:735(onMove) 202 0.006 0.000 0.006 0.000 Flock.py:166(onTimer) 404 0.005 0.000 0.019 0.000 Guard.py:653(doCamp) 101 0.004 0.000 0.004 0.000 DustDevil.py:26(onTimer) 239 0.004 0.000 0.019 0.000 Guard.py:639(doPatrol)
The hotshot
Python module which produces
these reports uses two different names when referring to the same
information. The internal time report corresponds to the total time
column outlined in the table below along with the other column
descriptions.
Column Name | Description |
---|---|
ncalls | The total number of calls seen for this function. |
tottime | The amount of time (in seconds) that is spent executing the code only inside of the current method. Methods called from this method are not included. |
cumtime | The amount of time (in seconds) this method took to execute, including the execution time of any other methods called from this method. |
percall | The percall columns refer to the columns immediately to the left of them and indicate the average time per call for the profile time they refer to. |
filename:lineno(function) | The filename and line number containing function the profile refers to. |
Automatic garbage collection has been disabled in BigWorld to avoid the Python engine from utilising large amounts of CPU time at unexpected intervals. In order to allow Python objects to be deleted, the server processes will only delete objects when their reference count reaches zero. This approach has the side effect of not deleting objects that have circular references or are referenced by objects having circular references. See Debugging Circular References.
When experiencing load spikes in your server that are not easily
attributable to a specific cause it can be useful to profile the C++
server code to help narrow down the areas where the most time is being
spent. This is supported on both the BaseApp and the CellApp using the
control_cluster.py
command cprofile
.
This cprofile
command collects
and reports the internal C++ profiles of a server process over a period of
time. The profiles are broken down into key time sensitive code blocks
such as sending updates to clients, entity navigation, etc.
Below is the cprofile
output
from a CellApp running the FantasyDemo resources that is idling:
Profile Count Cumulative Times Internal Times Idle 2758 | 8.963s 3249us 88.0% | 8.963s 3249us 88.0% scriptCall 8825 | 0.804s 91us 7.9% | 0.744s 84us 7.3% callUpdates 102 | 0.383s 3758us 3.8% | 0.109s 1067us 1.1% shuffleEntity 15648 | 0.069s 4us 0.7% | 0.069s 4us 0.7% backup 102 | 0.065s 637us 0.6% | 0.052s 510us 0.5% boundaryCheck 448 | 0.035s 78us 0.3% | 0.035s 78us 0.3% findPath 466 | 0.030s 65us 0.3% | 0.030s 65us 0.3% RunningTime 0 | 0.000s 0us 0.0% | 0.025s 0us 0.2% callTimers 102 | 0.637s 6241us 6.3% | 0.020s 193us 0.2% visionUpdate 9999 | 0.018s 1us 0.2% | 0.018s 1us 0.2% tickStats 102 | 0.017s 163us 0.2% | 0.017s 163us 0.2% canNavigateTo 2018 | 0.046s 22us 0.5% | 0.016s 8us 0.2% calcBoundary 102 | 0.016s 159us 0.2% | 0.014s 140us 0.1% onTimer 3614 | 0.618s 170us 6.1% | 0.010s 2us 0.1% watchersTCP 250 | 0.007s 26us 0.1% | 0.007s 26us 0.1% transientLoad 509 | 0.004s 8us 0.0% | 0.004s 8us 0.0% gameTick 102 | 1.158s 11356us 11.4% | 0.004s 41us 0.0% findEntity 1327 | 0.003s 2us 0.0% | 0.003s 2us 0.0% onMove 796 | 0.198s 248us 1.9% | 0.002s 2us 0.0% chunksMainThread 509 | 0.006s 10us 0.1% | 0.001s 2us 0.0% Total running time: 10.181s (0.000s spare)
The following sections outline some of the code blocks that are profiled along with a short description of their meaning. The sections have been broken down into code that is common between server processes, and server process specific code.
Profile Type | Description |
---|---|
Idle | Time spent idling while polling for network / file descriptor activity. |
RunningTime | Time spent collating profile statistics. |
tickStats | Time spent updating moving average for statistics. |
watchersTCP | Time spent processing TCP watcher requests. |
watchersUDP | Time spent processing UDP watcher requests. |
Profile Type | Description |
---|---|
archive | Time spent archiving entities to the database as part of the second level fault tolerance. See Server Operations Guide chapter Backups and Disaster Recovery. |
backup | Time spent performing entity backups to other BaseApps as part of the first level fault tolerance mechanism. See Server Operations Guide chapter Fault Tolerance. |
encryptRecv | Time spent decrypting incoming network
traffic from the EncryptionFilter
(bigworld/src/lib/network/encryption_filter.cpp ).
|
encryptSend | Time spent encrypting outgoing network
traffic from the EncryptionFilter
(bigworld/src/lib/network/encryption_filter.cpp ).
|
tickGameTime | Time spent processing game ticks. |
Profile Type | Description |
---|---|
backup | Time spent backing up cell entities to their Base. |
boundaryCheck | Time spent calculating all entities that need to be offloaded, ghosted or removed. |
calcBoundary | Time spent calculating the number of entities that can be offloaded and notifying CellAppMgr. |
callTimers | Time spent processing timers. |
callUpdates | Time spent calling updatable objects such as
controllers and witnesses. Calls to
Controller .update()
are included in this time.
|
canNavigateTo | Time spent calculating whether an Entity can move to a destination position from its current location. |
chunksMainThread | Time spent performing chunk loading / unloading calculations in the main thread. |
findEntity | Time spent locating a specific entity within the known population on the CellApp. |
gameTick | Time processing the game tick. |
onMove | Time spent in the
onMove() script callback from a
movement controller.
|
onTimer | Time spent in the
onTimer() script callback from the
timer controller.
|
scriptCall | Time spent in any Python script call that has been invoked from C++. |
shuffleEntity | Time spent updating the range lists on entity movement. |
transientLoad | Time spent performing entity management tasks such as creation, deletion and initialisation. |
visionUpdate | Time spent updating the vision of entities. |
control_cluster.py eventprofile
Recall that the server sends down both non-volatile data and volatile data down to the client. Volatile data is in the form of position and pose updates, while non-volatile data consists of the property updates on an entity and client-side method calls for that entity, sent down to any player that can witness that entity. The profiling of non-volatile data can give valuable insight into which methods and property updates are particularly high throughput, enabling effective optimisation of the network impact of entity scripts.
There is a distinction made between own-client communication and other-client communication. Own-client communication consists of method calls and property updates that only an entity's own client will receive. These include updates for properties marked as OWN_CLIENT as well as method calls that are made using the ownClient mailbox, and are private to the player's own entity and not broadcast to other players. Note that this type of client communication only applies to player entities.
In contrast, there is also other-client communication for an entity, which consists of the public method calls and property updates that players having that entity in their AoI are able to witness. This includes updates for properties marked as OTHER_CLIENTS or ALL_CLIENTS, as well as method calls made on an entity''s otherClients or as well as the allClients mailbox. These events are potentially propagated to any player that has that entity in their AoI.
Other-client events are not propagated to witnessing clients immediately. For instance, the distance a player is from the entity an event pertains to will affect how soon that event is propagated after the event has actually occurred on the server side. In addition, the LoD level set on properties and methods will affect whether a particular event will be sent down to a witnessing player.
Client communication is broken down into three types for the purposes of profiling:
-
Property changes and method calls to an entity's own client (privateClientEvents). This tracks whenever any own-client communication occurs.
-
Property changes and method calls that can be made available to players that witness it (publicClientEvents). This tracks whenever any public property is updated or any client method is called that can be sent down to any witnessing players.
-
Property changes and method calls that are actually sent down to players (totalPublicClientEvents). This tracks the total number of public properties that were actually sent down to witnessing players, subject to LoD levels.
The eventprofile
command
samples a running server over a period of time, collecting counts and
sizes of all non-volatile communication down to the client. A sample
output of eventprofile
is provided
below, followed by a brief description of the sections. This section is
brief as the event profile command is scheduled to be updated to use the
new property statistics available under the entityTypes
watcher tree.
Waiting 120.0 secs for sample data ... **** cellapp01 **** Event Type: privateClientEvents Name # Size AvgSize Bandwidth Avatar.cellBounds 120 2880 24.000 24.000 Avatar.myArray 2 41 20.500 0.342 Avatar.modeTarget 2 16 8.000 0.133 Avatar.myDict 1 11 11.000 0.092 Avatar.mode 2 10 5.000 0.083 Event Type: publicClientEvents Name # Size AvgSize Bandwidth Creature.moving 4741 23705 5.000 197.542 Creature.performAction 956 4780 5.000 39.833 Guard.moving 647 3235 5.000 26.958 Creature.creatureState 150 750 5.000 6.250 Guard.didGesture 69 345 5.000 2.875 Avatar.modeTarget 2 16 8.000 0.133 Merchant.modeTarget 2 16 8.000 0.133 Avatar.mode 2 10 5.000 0.083 Beast.initiateRage 2 8 4.000 0.067 Event Type: totalPublicClientEvents Name # Size AvgSize Bandwidth Creature.moving 1559 7795 5.000 64.958 Creature.performAction 396 1980 5.000 16.500 Creature.creatureState 52 260 5.000 2.167 Merchant.modeTarget 2 16 8.000 0.133 Beast.initiateRage 2 8 4.000 0.067
These include:
-
Changes on properties marked as OWN_CLIENT or ALL_CLIENTS.
-
Client-side method calls on a player's own entity that are propagated to that player's own client via
Entity
.ownClient
,Entity
.client
orEntity
.allClients
.
These include:
-
Changes on properties marked as OTHER_CLIENTS or ALL_CLIENTS.
-
Client-side methods calls that are made on the mailboxes
Entity
.otherClients
,Entity
.allClients
, or using the specialEntity
.clientMethod
(entityID
) method.
These include:
-
Changes on properties marked as OTHER_CLIENTS or ALL_CLIENTS that were actually sent down to witnessing players.
-
Client-side method calls that are made on the mailboxes
Entity
.otherClients
,Entity
.allClients
, or using the specialEntity
.clientMethod
(entityID
) method, that were actually sent down to witnessing players.
control_cluster.py mercuryprofile
Server communication primarily occurs via Mercury[25] interfaces. These can be seen on server processes under the Watcher tree path nub/interfacesByName.
The output of mercuryprofile
provides a condensed table of results that can be optionally sorted as
required by using options when running the profile. A sample of the output
from a single instance of CellApp is provided below.
cellapp01 - Internal Nub id name br mr max br aml abps amps 0 DBInterfaceBirth 8 1 8 8.0 0.0 0.0 1 addCell 92 4 23 23.0 0.0 0.0 2 startup 8 1 8 8.0 0.0 0.0 3 setGameTime 4 1 4 4.0 0.0 0.0 14 cellAppMgrInfo 380184 95046 4 4.0 40.0 10.0 18 updateGeometry 646272 38016 17 17.0 68.0 4.0 19 spaceGeometryLoaded 78 4 22 19.5 0.0 0.0 21 createEntity 270 4 70 67.5 0.0 0.0 22 createEntityNearEntity 63181 466 295 135.6 0.0 0.0 35 writeToDBRequest 1536 384 4 4.0 0.1 0.0 50 runScriptMethod 49110 4821 190 10.2 -5.0 -0.5 255 Reply 10892 165 9600 66.0 -0.0 -0.0
By default this output is sorted by ID. As mentioned the sorting can
be altered by providing command line options with running the profile.
Refer to the online help with control_cluster.py help
mercuryprofile
for more details. The table below provides a
description of each column type.
Column Name | Description |
---|---|
id | The Mercury message ID for this message. This
corresponds to the watcher value
nub/interfaceByName/messageName /id.
|
name | The name for this Mercury message. This
corresponds to the watcher value
nub/interfaceByID/ID /name.
|
br | The total number of bytes received for messages of this message type. |
mr | A count of the number of messages received for this message type. |
max br | The maximum numbers of bytes received for a single instance of this message type from all messages of this type that have been seen. |
aml | The average message length (in bytes) for this message type. |
abps | The average bytes per second that this message type is handling. |
amps | The average number of messages per second of this message type. |
Each server process type provides a different interface based on the functionality that process is providing. The two major causes of network traffic from Mercury messages are the BaseApp and CellApp. The following sections outline the most commonly encountered interface messages and their associated functionality. With this information it is possible to narrow down the scope of any potential issues.
Interface Name | Description |
---|---|
addGlobalBase | Messages received from the BaseAppMgr informing a BaseApp of a new global base. |
backupBaseEntity | Messages containing backup information for bases on other BaseApps. This is part of the first level fault tolerance mechanism. See Server Operations Guide chapter Fault Tolerance. |
backupCellEntity | Messages from cell entities containing backup information. This is part of the first level fault tolerance mechanism. |
callBaseMethod | Messages from other processes requesting a Python method is run on a base entity. |
callCellMethod | Messages from a CellViaBaseMailBox. These messages are forwarded on to the requested base's cell entity. |
callClientMethod | Messages from a ClientViaBaseMailBox. These messages are forwarded to the Proxy's connect client entity. |
cellEntityLost | Messages from a cell entity notifying its base that it has been destroyed. |
createBaseFromDB | Messages asking for a base entity to be
created using a known DBID as a result of calling
BigWorld .createRemoteBaseFromDB .
|
createBaseWithCellData | Messages requesting a base entity is created. |
createCellPlayer | Messages forwarded from a CellApp to a Client connection via a Proxy notifying the client that its Witness has been created. |
currentCell | Messages from CellApps to BaseApps notifying a base of an offload in its cell entity. This occurs as a normal part of offloading entities as part of load balancing. |
detailedPosition | Messages containing position updates from the cell to the client. |
entityMessage | Messages that are received on a nub to call a specific interface method. For internal nubs this corresponds to all methods defined in an entity's <baseMethod> section. For external nubs this corresponds only to methods that are an <Exposed> <baseMethod>. |
forwardedBaseMessage | Messages intended for a base that has been recently offloaded to another BaseApp. |
logOnAttempt | Messages from the DBMgr for logon attempts. |
modWard | Messages from the cell notifying the proxy
and the client of a change of client control as a result of
calling
BigWorld .controlledBy() .
|
sendToClient | Messages from the cell notifying the proxy that it has sent all the required updates for the entity for the current tick and they should now be forwarded to the client. |
setBackupBaseApps | Messages from the BaseAppMgr notifying the current BaseApp of the other BaseApps that it is responsible for performing backups of. |
setClient | Message to notify the BaseApp that the next message received refers to a specific Entity. This will generally correspond with the number of callBaseMethod, callCellMethod and callClientMethod messages. |
setCreateBaseInfo | Messages from the BaseAppMgr informing the
BaseApp of the best BaseApp to use for calls to
BigWorld .createBaseAnywhere()
and
BigWorld .createBaseRemotely() .
|
setSharedData | Messages notifying the BaseApp of any
changes to shared data which can include
BigWorld .baseAppData
and
BigWorld .globalData .
|
teleportOther | Messages requesting that an entity (A) is teleported to the same space as an entity (B) that is owned by this BaseApp. |
writeToDB | Messages from the cell notifying the base entity to write itself to the database. |
The following table outlines some of the BaseApp interfaces that are used primarily to forward messages directly to the client without performing any processing. These message occur as a result of CellApp witnesses sending updates to their associated client.
Interface Name | Description |
---|---|
enterAoI | Proxy message forwarded to client. |
enterAoIOnVehicle | Proxy message forwarded to client. |
forcedPosition | Proxy message forwarded to client. |
leaveAoI | Proxy message forwarded to client. |
updateEntity | Proxy message forwarded to client. |
Interface Name | Description |
---|---|
addCell | Messages from a CellAppMgr to add a Cell into a Space. |
avatarUpdateExplicit | Position updates from a Client connection to their entity. |
avatarUpdateImplicit | Position updates from a Client connection to their entity. |
callBaseMethod | Method calls occurring on a BaseViaCell mailbox. |
callClientMethod | Method calls occurring on a ClientViaCell mailbox. |
callWatcher | A forwarding watcher request from the corresponding manager process (i.e., CellAppMgr / BaseAppMgr). For more information see Forwarding Watchers. |
createEntity | Requests to create a new entity within a Space. |
createEntityNearEntity | Requests to create a new entity near to another Entity. |
createGhost | Messages from other CellApps requesting a ghost entity is created for a new real entity in a nearby Cell. |
forwardedBaseEntityPacket | Messages that are forwarded from a ghost entity to the real so that it may be provided to the base entity. This is used to avoid race conditions when offloading entities between CellApps. |
ghostAvatarUpdate | Messages from nearby Cells updating ghost entities on the current CellApp of position changes. |
ghostHistoryEvent | Messages sent from real entities that are being ghosted on the current CellApp providing history event updates. |
ghostedDataUpdate | Messages from real entities updating ghosted properties. |
notifyOfCellRemoval | Messages sent from the CellAppMgr to notify of a neighbouring cells removal. |
onload | Messages sent from a nearby cell that is handing responsibility for a real entity over to the current CellApp. |
runExposedMethod | Messages from Clients to run an <Exposed> method on an entity. |
runScriptMethod | Messages from other server components to run a method on an entity. |
spaceData | Messages from other CellApps notifying us of Space Data changes. For more information see Space Data. |
writeToDBRequest | Messages to entities to stream their information back to their base to be written to the database. |
[24] For information on pyprofile
usage see the
control_cluster.py
help.
$ control_cluster.py help pyprofile
[25] For more information on Mercury see the Server Overview section Inter-Process Communication (Mercury).