Chapter 23. Debugging

23.1. General Debugging

23.1.1. Information and Error Messages

When the server is run using control_cluster.py, for example, there may be no console associated with an application. To view process messages in that case, use the server tools MessageLogger[37] and WebConsole[38], which collect process output and display log messages respectively.

23.1.2. Testing Scripts Using the Python Server

To test Python scripts and behaviour on a live server, you can connect to a telnet server on the BaseApp and CellApp processes and run script from there. By default the telnet server runs on port 40001 for a BaseApp and 50001 for a CellApp; both can be configured in the file <res>/server/bw.xml using the <pythonPort> configuration option.

If the desired port is already used on the application's machine, then a random port is chosen. You can find out the assigned port by looking at the log output of the particular BaseApp or CellApp for a line such as the following:

INFO: Python server is running on port 33225

This value can also be found in the watcher value pythonServerPort.
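When the port has been assigned at random, the log line shown above can be picked out programmatically. The following is a minimal sketch that assumes the log output is available as an iterable of text lines and that the message format matches the example; the function name find_python_server_port is hypothetical and not part of BigWorld.

```python
import re

# Matches the message shown above, e.g.
# "INFO: Python server is running on port 33225"
PORT_LINE = re.compile(r"Python server is running on port (\d+)")


def find_python_server_port(log_lines):
    """Return the port from the last matching log line, or None if absent."""
    port = None
    for line in log_lines:
        match = PORT_LINE.search(line)
        if match:
            # Keep the most recent match in case the process was restarted.
            port = int(match.group(1))
    return port
```

The last match is kept rather than the first because a restarted process may log a different port later in the same output.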

Note

This is intended primarily as a development time only utility and should be used sparingly in a production environment. Performing CPU intensive Python operations such as listing all entities may adversely affect game behaviour for your clients.

Telnet can be used to connect to the Python server of each BaseApp and CellApp, which provides a Python console.

There are currently three methods of connecting to the Python telnet server:

  1. Connecting via WebConsole

  2. Connecting via control_cluster.py

  3. Connecting via the commandline using telnet

Connecting to the Python server via WebConsole is the recommended method of interacting with the Python server as it allows access to the other server debugging tools as required. To connect through WebConsole simply select the Python Console module from the menu on the left hand side of the main WebConsole page.

To connect using control_cluster.py, simply use the pyconsole[39] option. For example, to connect to the second CellApp in a cluster:

$ ./control_cluster.py pyconsole cellapp02

To connect using telnet, first determine the port on which the process is running its Python server, then provide telnet with the hostname and port. For example, to connect to a machine called cluster01 running a BaseApp with a Python server on port 40001, use the following command:

$ telnet cluster01 40001

Once connected to the Python console server it is possible to call script methods. For example, in FantasyDemo, the player entity is called Avatar, and it is possible to access its cell or base parts after a player logs in:

$ telnet cluster02 50001
Trying 10.40.3.4...
Connected to cluster02.
Escape character is '^]'.
Welcome to Cell App 1
Build: 13:34:56 Apr 12 2005
> BigWorld.entities.keys()
[4848, 4849]
> BigWorld.entities[4848]
Avatar at 0x08533FDC
> avatar = _
> avatar.playerName
'Trogdor the Burninator'
> avatar.beginTrade()

Accessing an avatar via the Python console on the cell

23.2. Performance Profiling

Python has helpful modules that can be used to profile your script. BigWorld exposes the _hotshot profiler through the watcher interface, to help with profiling BigWorld script.

To start profiling, set the watcher pythonProfile/running to true. The profiler writes its log to the file named by the watcher pythonProfile/filename. The file name is relative to the current working directory, which is most likely bigworld/bin/Hybrid. Setting the watcher pythonProfile/running to false ends the profiling session and closes the profile log.

To inspect the log, use the module hotshot.stats.

For example,

$ python
> import hotshot.stats
> stats = hotshot.stats.load( "cell.prof" )
> stats.sort_stats( "time", "calls" )
> stats.print_stats( 20 )

Please note that there may be problems inspecting the profiling log if the call stack depth drops below the depth at which the session was started, or if the session is stopped at a different call stack depth. For this reason, take care when starting and stopping profiling from script.

It is also possible to use the module _hotshot in script. This may be helpful if you only want to profile specific parts of the script. A profiler object can be created, and then started and stopped over a specific method call.

For example, the following could be added to the file fantasydemo/res/scripts/cell/Creature.py to profile the method Creature.onTimer:

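import BigWorld
import _hotshot

# Active profiler object, or None while profiling is switched off.
profiler = None

def startProfiling():
    # Create the profiler; its log is written to creature.prof.
    global profiler
    profiler = _hotshot.profiler( "creature.prof" )

def stopProfiling():
    # Close the profile log if a profiler is active.
    global profiler
    if profiler:
        profiler.close()
    profiler = None

class Creature( BigWorld.Entity ):
    def onTimer( self, timerId, userId ):
        if profiler:
            profiler.start()

        # Normal function body

        if profiler:
            profiler.stop()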

Profiling can then be started and stopped from the Python console, for example:

$ telnet bgserver 50001
Trying 10.40.3.4...
Connected to bgserver.
Escape character is '^]'.
Welcome to Cell App 1
Build: 13:34:56 Apr 12 2005
> import Creature
> Creature.startProfiling()
> Creature.stopProfiling()
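Because the profiling log can be hard to inspect when a session stops at a different call stack depth from where it started, it may help to pair start() and stop() in a try/finally block so that an exception in the method body cannot skip the stop() call. The sketch below is one way to do that, not part of BigWorld: the decorator name profiled is hypothetical, and it only assumes a profiler object exposing start() and stop() as in the Creature example above.

```python
import functools


def profiled(get_profiler):
    """Build a decorator that brackets each call with the active profiler.

    get_profiler is a zero-argument callable returning the current profiler
    object, or None while profiling is switched off.
    """
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            active = get_profiler()
            if active is None:
                # Profiling is off: call straight through.
                return func(*args, **kwargs)
            active.start()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs on normal return and on exceptions, so stop() is
                # issued from the same frame that called start().
                active.stop()
        return wrapper
    return decorate
```

Under these assumptions, onTimer in the Creature example could be decorated with @profiled( lambda: profiler ) in place of the two explicit if profiler: blocks.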

For more details on the hotshot module, see the Python documentation at http://docs.python.org/lib/module-hotshot.html.

23.3. Common Mistakes

23.3.1. Definition Files Inconsistent Between the Server and Client

To ensure that the client can understand the data sent by the server, the definition files must be kept consistent between them.

A client will not be able to log in if it has inconsistent definition files. The LoginApp and the DBMgr produce the following error:

INFO: LoginApp::sendFailure: LogOn for 10.40.3.17:2254 failed 'Bad digest'
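When this error appears, a quick way to locate the mismatch is to compare the definition files in the server and client resource trees directly. The following sketch does so by hashing each .def file; it is not the digest that BigWorld itself computes, and the function names and the two directory arguments are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def def_file_digests(root):
    """Map each .def file path (relative to root) to a hash of its bytes."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix():
            hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*.def"))
    }


def differing_def_files(server_root, client_root):
    """Return sorted names of .def files missing from, or different in, either tree."""
    server = def_file_digests(server_root)
    client = def_file_digests(client_root)
    return sorted(
        name for name in server.keys() | client.keys()
        if server.get(name) != client.get(name)
    )
```

An empty result means the two trees hold byte-identical .def files; any name returned is a candidate cause of the failed login.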

23.3.2. Implementation (.py) Does Not Match Definition (.def)

For each entity type, its Python script must implement the methods described in its .def file. The server will report an error if this does not occur.

For example:

ERROR: EntityDescription::checkMethods: class Avatar does not have method sendMessageToFriends
ERROR: EntityType::Type: Script for Avatar is missing a method.
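A similar check can be run before the server is started. The sketch below only illustrates the idea behind the error above and is not the server's own implementation: the function name missing_methods is hypothetical, and the list of declared method names is assumed to have been taken from the entity's .def file by hand or by a separate tool.

```python
def missing_methods(entity_class, declared_methods):
    """Return the declared method names that entity_class does not implement."""
    return [
        name for name in declared_methods
        if not callable(getattr(entity_class, name, None))
    ]


class Avatar(object):
    """Stand-in for an entity script class; a real one derives from a BigWorld type."""

    def beginTrade(self):
        pass


# sendMessageToFriends is declared but not implemented in the stand-in class.
print(missing_methods(Avatar, ["beginTrade", "sendMessageToFriends"]))
```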

23.3.3. Accessing Other Entities' Properties and Methods Not Declared in the Definition File

It is possible to access a property of another entity when it is on the same process as the calling entity. This is true for both base and cell entities.

This works during initial testing, when only one BaseApp and one CellApp are used, but is likely to fail when more BaseApps and CellApps are used, because the two entities may then be on different processes.

Properties of remote entities cannot be written to directly (i.e. they are read-only), regardless of whether those properties are declared in the .def file. Also, only methods declared in the .def file can be called when the entity is on another application.

23.3.4. Trying to Update the Properties of a Ghost Entity

Sometimes the game design might have two entities moving through the world close to each other, as would be the case of an Avatar and a Bodyguard, or a Pet. Due to their proximity, developers might assume that they will always be located in the same cell as each other, and thus have one of the entities try to update a property on the other (e.g., self.bodyguard.armour = True, or self.pet.state = Alert).

Though this will not cause problems most of the time, it might happen that the two entities are separated by a cell boundary, and thus will only have access to the other one's ghost, which will cause the properties to be read-only.

To test for the existence of this kind of problem, CellApp has the configuration option treatAllOtherEntitiesAsGhosts. This option causes the CellApp to treat only its own entity as real, and all others as ghosts. For more details, see Data Distribution.

This debugging mode allows script writers to catch these errors immediately instead of leaving them lurking in the background to only appear on rare occasions.
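The read-only behaviour can be pictured with a toy model. The sketch below is not BigWorld code: Entity, GhostView and setState are hypothetical names, and the exception raised by a real CellApp may differ. It only shows why a direct property write fails on a ghost, while a call to a method declared in the .def file still reaches the real entity.

```python
class Entity(object):
    """Toy real entity with one property and two methods."""

    def __init__(self, state):
        self.state = state

    def setState(self, state):
        # Stands in for a method declared in the entity's .def file.
        self.state = state

    def localHelper(self):
        # Stands in for a method that is not declared in the .def file.
        return "helper"


class GhostView(object):
    """Toy ghost: readable properties, no writes, declared methods only."""

    def __init__(self, real, declared_methods):
        # Bypass our own __setattr__, which rejects every write.
        object.__setattr__(self, "_real", real)
        object.__setattr__(self, "_declared", frozenset(declared_methods))

    def __getattr__(self, name):
        value = getattr(self._real, name)
        if callable(value) and name not in self._declared:
            raise AttributeError("method %r is not declared" % name)
        return value

    def __setattr__(self, name, value):
        raise AttributeError("cannot set %r on a ghost" % name)


pet = Entity("Idle")
ghost = GhostView(pet, ["setState"])
ghost.setState("Alert")  # allowed: the call reaches the real entity
```

In this model the fix for the mistake described above is to replace the direct write with a call to a declared method, which is why the example uses setState rather than assigning to state.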

23.3.5. Database Backup and Fault Tolerance Do Not Work for Entities Lacking a Base Part

As noted in the Python API documentation, the writeToDB() method can only be called on entities that have a Base part. That means you cannot persist entities that do not have a Base part.

The server's first-level fault tolerance (which restores entities when a CellApp dies) also relies on those entities having a Base part. The state of the entity is backed up from the Cell to the Base part over time and if the CellApp that is hosting an entity's Cell part disappears, the Base entity will restore it to another CellApp. This does not work unless you create each entity type with a Base and Cell part.

This means that you may need to declare more-or-less empty Base entity definitions for entities that don't have any Base methods, just so that they can be written to the database and so that they will be restored in the event of a CellApp crash.

23.4. Fixed Cell Boundaries

To help the testing and debugging of the transitioning of entities between CellApps, it can be helpful to have fixed cell boundaries.

Typical things that could be tested this way include:

  • Controllers and entity extras implemented in extensions.

  • Script interaction of entities on different CellApps.

  • Streaming of entity properties.

To configure fixed cell boundaries in BigWorld, follow the steps below:

  • Start the server including at least two CellApps.

  • Make sure that cells are created on all CellApps by setting the configuration options cellAppMgr/cellAppLoadLowerBound and cellAppMgr/cellAppLoadUpperBound to 0.0 in file <res>/server/bw.xml. These options can also be changed with the watcher values cellAppLoad/balanceLowerBound and cellAppLoad/balanceUpperBound of CellAppMgr.

  • Disable load balancing by setting the CellAppMgr watcher value debugging/shouldLoadBalance to false.

  • It can also be convenient to set the exact position of a partition. Currently, only the root partition of a space can have its position set. This can be set with the CellAppMgr watcher value spaces/<spaceID>/rootPartition, where <spaceID> is the space ID.

23.5. Message Reliability And Ordering

BigWorld is a networked, distributed system, and as such, script writers need to be aware of the reliability and ordering issues that can arise in such a system. All non-volatile messages are reliably delivered. BigWorld guarantees not only the reliable delivery but also the in-order delivery of some messages. These are:

  • Messages sent between Proxy and Client

  • Messages sent between the Base and Cell part of the same entity

  • Messages sent between two Base entities

  • Messages sent between any pair of server processes

  • Updates sent from a real entity to its ghosts

The offloading of Cell entities from one CellApp to another can cause some messages to be delivered slightly out-of-order. This means your game script may need to cope with method calls and property updates being slightly out-of-order if they are triggered by any of the following message types:

  • Messages sent between two different Cell entities

  • Messages sent between the Cell part of one entity and the Base part of another

  • Messages sent between the Cell part of one entity and the Client part of another

It is important to note that the probability of out-of-order delivery of these messages is directly proportional to the amount of packet loss on the network that the server processes are running on, i.e. this re-ordering cannot happen unless you are getting some degree of packet loss. We cannot emphasise enough the importance of using good quality hardware (both computers and network hardware) in your production deployment clusters, and of having enough hardware in your clusters to run your game servers with ample CPU and network capacity to spare. BigWorld's experience with customer deployments has shown that inferior and/or insufficient hardware is likely to cause critical (i.e. showstopping) problems at runtime.
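One common way for script to tolerate slightly out-of-order delivery is to attach a sequence number to each update and ignore any update older than the newest one already applied. The sketch below shows the general technique rather than a BigWorld facility: the class name LatestValue and its receive method are hypothetical, and the approach suits state that may safely be overwritten, not commands that must all take effect.

```python
class LatestValue(object):
    """Keep the newest value seen, judged by a sender-assigned sequence number."""

    def __init__(self):
        self.value = None
        self.last_seq = -1

    def receive(self, seq, value):
        """Apply the update if it is newer than the last one; return True if applied."""
        if seq <= self.last_seq:
            # A stale or duplicate update arrived late: drop it.
            return False
        self.last_seq = seq
        self.value = value
        return True


target = LatestValue()
target.receive(1, "Idle")
target.receive(3, "Fleeing")  # arrives before update 2
target.receive(2, "Alert")    # late arrival, ignored
```

The sender would increment the sequence number for every update it issues to a given receiver, so the receiver needs no knowledge of how the messages were routed.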



[37] For details on MessageLogger, see the Server Operations Guide, section Cluster Administration Tools, Message Logger.

[38] For details on WebConsole, see the Server Operations Guide, section Cluster Administration Tools, WebConsole.

[39] For more information regarding this option see the control_cluster.py program help using the --help flag.