This chapter discusses some general issues that programmers implementing game systems should keep in mind, and also provides example designs and implementations of common systems used in MMOGs.
In general, server processing load, internal network bandwidth and external network bandwidth scale linearly with the number of players, provided that player and entity density remain constant. There is a small extra cost as density increases.
Capacity can be added by:
- Adding more BaseApps, for more external connection points and connection processing capacity
- Adding more CellApps, for more spatial processing capacity
- Adding a combination of both more BaseApps and more CellApps, to increase game script processing capacity
CellAppMgr, BaseAppMgr and DBMgr are single instances, and so are theoretical scaling bottlenecks. CellAppMgr and BaseAppMgr are concerned only with managing CellApps and BaseApps, and have very low load; they can scale to handle thousands of BaseApps and CellApps. The main concern for scaling is DBMgr, although BigWorld's design does not make heavy use of the database. This is addressed in BigWorld Database Scalability below.
The number of BaseApps and CellApps required to sufficiently service an entity population should generally scale linearly with the number of entities. Most communication between entities is with those that are nearby. This is handled by keeping those entities together on CellApps as much as possible. Other communication involves point-to-point communication using remote method calls. The main issue here is to try to minimise situations where entities need to be looked up globally.
The DBMgr functionality for writing out entity state is distributed across the secondary databases for each BaseApp and consolidated when the entity is retired. See BigWorld Database Scalability below.
The general strategy for combating bottlenecks in game script is to avoid global game systems where possible, such as singleton entities that control some operation of the game, for example, trading. In general, these bottlenecks can be avoided by restructuring game script and using distributed object methods to implement such global sub-systems, rather than entrusting the request handling to a single entity. An example of this is presented in AoI-based trading below.
Updates to some of the entities in a player's AoI are propagated to the player's client every game tick (by default, gameUpdateHertz is 10Hz). The amount of update data sent to the player's client is constrained to a downstream bit rate (by default, bitsPerSecondToClient is 20kbps).
These updates consist of property changes and method calls. Every cell entity keeps a history of these changes, and for each entity in a player's AoI, the player is updated incrementally about that entity periodically. The position and direction data of an entity is specially treated so that only the most recent value of these properties (so called volatile properties) is sent to the client, instead of the full history of the property.
Internally, entities in an AoI are in a priority queue. The priority of an entity in a player's AoI determines how long it will be before the next update about that entity occurs to the player. Generally speaking, entities closer to the player are updated more frequently than those that are towards the edge of a player's AoI. Properties can have Level of Detail (LoD) rules applied so that these properties will only be updated if the entity is close enough to the player. See the chapter LOD (Level of Detail) on Properties.
In general, many game operations are localised to the specific area that a player inhabits. Load balance partitioning is done across each space depending on the load being generated per cell. As entity densities increase, the partitioning scheme changes in response to equalise the load amongst the cells servicing a space. The amount of data per entity that is sent to the client is also reduced as the density of entities increases. This reduces a lot of the extra cost due to density and also makes good use of the client's bandwidth.
However, very high entity densities can cause problems, as each periodic update to a player client can be overrun with excessive amounts of entity event data. Recall that the downstream bandwidth is a configurable constant. Due to the prioritisation of change events for entities in a player's AoI, updates for entities further away can be starved if many more entities are close to the player.
Increasing the downstream bandwidth can improve this situation, but eventually it is usually the client that becomes the limiting factor. There is a per-entity processing cost on game clients, for example:
- processing notifications for each entity's position and direction
- processing notifications for each entity's property changes
- processing notifications for each entity's method calls
- applying physics rules to each entity
- rendering of each entity
There is also a limit to the amount of information that a player can comprehend. With large numbers of entities nearby, less information tends to be needed for distant entities.
Extreme entity densities that can negatively affect the end-user experience can be avoided with good game design.
A brief discussion of the operation of DBMgr, and its implications for scalability, follows.
When entities are checked out of the database, they are assigned to the least-loaded BaseApp. Once entities are loaded onto a BaseApp they do not generally migrate away from that BaseApp unless that BaseApp process terminates, in which case they are restored on other BaseApps in the system. See the chapter Fault Tolerance.
For each entity that resides on it, the BaseApp is responsible for collecting all the explicit script writes (from calls to BigWorld.writeToDB()) for that entity over its checked-out lifetime, as well as the periodic backups for that entity. These writes are performed on a secondary database stored on the BaseApp machine. There can be arbitrarily many BaseApps in a cluster, and entities are statically load-balanced across them when they are instantiated, by assigning each to the least-loaded BaseApp.
When the entity is destroyed, it is checked back into the primary database: the database writes accumulated in the BaseApp secondary database for that entity are consolidated back into the primary database on the DBMgr.
This consolidation can be a bottleneck, and there are future features planned to reduce this so as to not overload the DBMgr. In general, writing back to the primary database is not a time-critical operation, so that checking entities back into the database is a fire-and-forget operation. No data loss is possible as the data is persisted on the secondary database, and not removed until the consolidation for that entity is done.
On server shutdown, all checked-out entities have their database writes consolidated back into the primary database. If an unexpected failure occurs (e.g. power failure), this consolidation can take place on the next server startup.
The following operations on the DBMgr can still be a bottleneck:
- checking login credentials
- handling lookup requests for entities by name or database ID
- loading entities from persistent storage
- writing entity state when entities are checked back into persistent storage
In practice, looking up which BaseApp an entity is checked out to (by name or database ID) is a read operation, and comparatively inexpensive due to the underlying MySQL query cache. However, schemes such as the PlayerRegistry entity (see Player Look-up below) can be implemented to offload the task of handling lookup requests from the DBMgr to game script running on arbitrarily many BaseApps.
However, the global DBMgr process will still place an implicit limit on how quickly entities can be loaded from, and saved to, persistent storage. Future improvements being considered include sharding the database to spread this load over many external databases.
Each player must be able to query the status of another player by name:
- whether or not they are logged in
- if they are logged in, get their player mailbox
Using BigWorld.lookUpBaseByName() causes a query to the DBMgr (which causes a read on the primary database). While sufficient for many scenarios (and empirically it works in many released BigWorld-based games), this introduces a potential bottleneck. The discussion below outlines a design for a distributed mapping of player names to player mailboxes, which effectively offloads this load to BaseApp game script, and which can be scaled up by adding more BaseApps.
The idea is to have multiple PlayerRegistry base entities that contain a distributed mapping of player names to player mailboxes. These PlayerRegistry entities have no geospatial representation; they exist only as a system service, and so generate no load with respect to AoI updates.
Each BaseApp has a corresponding PlayerRegistry entity. This spreads the PlayerRegistry entities out and protects against BaseApp failures. Having more than one PlayerRegistry entity per BaseApp does not add any additional redundancy benefit.
PlayerRegistry entity instances register themselves globally. Globally registered bases have their mailboxes registered under a string key in a global bases mapping that is synchronised across every BaseApp. Player names are hashed against the known number of player registries, and a particular PlayerRegistry instance is located via the Global Bases mechanism (see Global Bases).
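As a sketch, the name-hashing step might look like the following in game script. The CRC32 hash and the 'PlayerRegistry_<n>' key format are illustrative assumptions, not part of the BigWorld API; any stable hash and key scheme would do:

```python
import zlib

def registryKeyFor(playerName, numRegistries):
    # Hash the player name onto one of the known PlayerRegistry
    # instances.  CRC32 is used here only for illustration; the
    # "PlayerRegistry_<n>" format is simply the assumed string key
    # that each registry registers itself under in the global bases
    # mapping.
    index = zlib.crc32(playerName.encode('utf-8')) % numRegistries
    return 'PlayerRegistry_%d' % index
```

A caller would then locate the registry mailbox through the global bases mapping (assuming it is exposed to script as BigWorld.globalBases), e.g. BigWorld.globalBases.get(registryKeyFor(name, numRegistries)).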
When player entities are created, they add themselves to the distributed registry by hashing their own name to the appropriate PlayerRegistry entity, and registering their base mailbox with that PlayerRegistry entity. On logout, they contact that same PlayerRegistry to notify it of the logout, which results in the removal of the mapping between that player name and that player mailbox. A scheme can be implemented which re-balances the registry entries across the PlayerRegistry entities when a PlayerRegistry is added or removed, such that the hash scheme remains consistent.
Queries for a particular player name are performed by first hashing the name to identify the appropriate PlayerRegistry, and then calling a remote method on that PlayerRegistry, passing a mailbox for a remote callback method. Requests for player look-up are asynchronous: caller entities implement a callback method that is invoked when the look-up is complete.
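The register/look-up flow might be sketched as follows, with remote methods modelled as plain Python calls. The method names registerPlayer, deregisterPlayer and lookUpPlayer are assumptions for illustration, not fixed BigWorld APIs:

```python
class PlayerRegistry(object):
    """Sketch of one shard of the distributed name -> mailbox map.

    In a real game these methods would be remote methods on a base
    entity, and the callback would be a remote method on the caller's
    mailbox; plain calls are used here to keep the sketch runnable.
    """

    def __init__(self):
        self._mailboxes = {}

    def registerPlayer(self, name, mailbox):
        # Called by a player entity on login.
        self._mailboxes[name] = mailbox

    def deregisterPlayer(self, name):
        # Called by a player entity on logout.
        self._mailboxes.pop(name, None)

    def lookUpPlayer(self, name, onLookUpComplete):
        # Asynchronous in spirit: the caller's callback receives the
        # mailbox, or None if the player is not logged in.
        onLookUpComplete(name, self._mailboxes.get(name))
```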
Each PlayerRegistry needs a persistent mailbox list for fault tolerance purposes, so that the registry is restored to another BaseApp along with the PlayerRegistry entity if the BaseApp it formerly resided on fails. In this case, it is likely that it will be restored to a BaseApp which already has its own PlayerRegistry, so re-balancing should be performed and the restored PlayerRegistry then destroyed.
This system can be scaled up by increasing the number of BaseApps to handle queries. Tiered request schemes could also be used to avoid large numbers of globally registered base entities becoming a bottleneck.
Each player maintains a list of other players that they can use for the following purposes:
- to contact a friend
- to send private messages to friends
- to receive presence updates
Assume that the friendship relation is symmetric, so that if A is on the friend list of B, then B is on the friend list of A. A friends list can be implemented as an ARRAY of FIXED_DICT, each element consisting of a STRING name property, the MAILBOX of the player (or None if offline), and a UINT8 Boolean flag hasResponded indicating whether that player has responded to our request to add them as a friend.
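One way to model a single element of this ARRAY in script is sketched below; the field names follow the description above, but the helper function itself is illustrative:

```python
def makeFriendEntry(name, mailbox=None, hasResponded=False):
    # One FIXED_DICT element of the friends-list ARRAY: the friend's
    # name, their base mailbox (None while offline), and the UINT8
    # flag recording whether they have acknowledged the friendship.
    return {'name': name,
            'mailbox': mailbox,
            'hasResponded': 1 if hasResponded else 0}
```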
Let the player adding the friend be called Player A, and the friend being added to Player A's list be called Player B.
- Player A checks that Player B is not already in A's friends list. Player A uses Player B's name to look up B's status and mailbox (if online) via the Player Look-up mechanism.
- If B is not online, then the operation fails. A scheme could be implemented to accommodate this situation, but for the sake of simplicity it is not discussed here. If Player B is online, then Player A adds Player B to its friends list, setting the hasResponded flag to False, and writes itself to the database using Base.writeToDB(), registering a callback for when the database write completes.
- If the write fails, then the friends list is rolled back by removing Player B's FIXED_DICT element, the process is aborted, and Player A's client is informed of a system error. Otherwise, the write completes successfully, and Player A informs Player B via a remote method call to add Player A to Player B's list, passing along Player A's name and mailbox. Periodically (say, every 3 seconds), Player A resends any outstanding requests (indicated by hasResponded being False in the friends list). The mailbox for each of these resends should be looked up each time, in case Player B has been restored to another BaseApp, or has logged off and/or back on again. If Player B is not online during a retry, then the operation fails and Player A's client is informed that Player B is not online.
- Typically, Player B won't already have Player A as a friend, and so Player B adds to its local friends list a FIXED_DICT element for Player A containing Player A's name and mailbox, with the hasResponded flag set to True. A write to the database is requested with a callback. Player B may already have an entry for Player A in its friends list. This can happen if Player A and Player B both simultaneously attempt to add each other as friends (in which case hasResponded will be False). It can also happen if Player B is restored to another BaseApp or is destroyed and re-created during the wait for the database write, or if the database write takes so long that Player A has resent the request; in these cases, hasResponded will be True. If the hasResponded flag is True, then Player B signals to Player A that the operation succeeded straight away. If the hasResponded flag is False, then it should be set to True, and the database write performed and confirmed via its callback before success is signalled to Player A.
- Typically, the write succeeds, and so Player B calls back on Player A to indicate that the request was successful. In the exceptional case where the write fails, Player B removes Player A's FIXED_DICT entry from its friends list, and calls back on Player A to indicate that the operation failed. In this scenario, Player A should try to remove Player B's FIXED_DICT element, and this should be made persistent by writing Player A to the database. However, there is a chance that this second database write fails while the earlier write succeeded, leaving Player A's friends list inconsistent in the database. There are some ways of handling this situation:
  - Do not remove Player B's FIXED_DICT entry from Player A's list, and instead have Player A retry the request to Player B periodically until Player B responds with success.
  - Remove Player B's FIXED_DICT entry, and have Player A retry the database write periodically.
  Both of these approaches assume that database write failures are a temporary phenomenon. A failure could be caused by, for example, the BaseApp secondary databases running out of disk space, which is cleared up when the system administrator makes more space available. A retry count could be kept, with the FIXED_DICT entry removed from the friends list once the count exceeds some threshold, at which point Player A's client is informed of the failure.
- On a successful callback from Player B, Player A sets the hasResponded flag to True. A database write is not necessary at this point, as the periodic backup and archival systems can be relied on to save this out eventually. In the event that the system is restarted, or Player A is restored to another BaseApp, the periodic retry of FIXED_DICT entries with hasResponded set to False will elicit a second successful callback, and the flag will eventually be written out.
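Player B's branching logic above can be summarised in a sketch. The friends list is modelled here as a dict keyed by name for brevity (the real property is an ARRAY of FIXED_DICT), and the returned action strings are illustrative:

```python
def onAddFriendRequest(player, requesterName, requesterMailbox):
    """Player B's handling of an incoming add-friend request (sketch).

    Returns the action to take: 'write_then_ack' means persist the
    friends list (Base.writeToDB with a callback) and acknowledge
    Player A only once the write succeeds; 'ack' means acknowledge
    immediately.
    """
    entry = player.friends.get(requesterName)
    if entry is None:
        # Usual case: create the entry already marked as responded.
        player.friends[requesterName] = {'name': requesterName,
                                         'mailbox': requesterMailbox,
                                         'hasResponded': True}
        return 'write_then_ack'
    if entry['hasResponded']:
        # A resend, or a restore re-created the entry: the operation
        # already succeeded, so signal success straight away.
        return 'ack'
    # Simultaneous mutual add: mark responded, persist, then ack.
    entry['hasResponded'] = True
    entry['mailbox'] = requesterMailbox
    return 'write_then_ack'
```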
Adding to friends lists is not expected to be a frequent operation on average over the entire player population, and players are typically spread out across the available BaseApps.
Removing a player from a friends list can be done in a similar fashion.
See the section on Chat below. Once you have a player mailbox, a chat message can be sent to them using a simple remote method call.
Presence notifications can be implemented simply by calling a method on each player in that player's friends list indicating that they have logged in or logged out (a logout signals that the mailbox is invalidated, and should be set to None in the corresponding FIXED_DICT in the ARRAY).
Player status notifications (e.g. away from keyboard) can be done in a similar way. Player base entities inform their clients of any change in the status of any friends, so they can update a user interface to the friends list.
The friends list can be used as a cache of player mailboxes while those friends are logged in, so that players do not need to use the general Player Look-up mechanism in order to communicate with their friends' player entities. Friend mailboxes are set to None when the friends log out.
Caches are not required to be persistent, and so do not add any additional processing cost to the database.
When a player is restored, some of its friends may have come online or gone offline (or gone offline and then come back online) since the player and its friends list were last backed up. At restore or initialisation time, player entities should perform look-ups on all the players in their friends list. They should also notify all online friends of their new mailbox when restoring.
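A restore-time refresh might be sketched like this, assuming friends-list entries of the shape described above and a lookUpPlayer function modelling the Player Look-up mechanism; friendMailboxChanged is an assumed remote method name:

```python
def onRestore(player, lookUpPlayer):
    """Refresh friend mailboxes after a restore and publish our own.

    lookUpPlayer models the Player Look-up mechanism: it takes a name
    and a callback receiving (name, mailbox-or-None).
    """
    for entry in player.friendsList:
        def update(name, mailbox, entry=entry):
            # Record the friend's current mailbox (None if offline).
            entry['mailbox'] = mailbox
            if mailbox is not None:
                # Tell the online friend about our new base mailbox.
                mailbox.friendMailboxChanged(player.playerName, player)
        lookUpPlayer(entry['name'], update)
```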
- P2P chat
- AoI-based chat
- Channel-based chat (includes guild chat and world chat)
Players need to be able to send messages to other players. Players are identified by name.
See the section on Player Look-up above. Chatting from one player to another player involves the following:
- The mailbox of the destination player needs to be acquired. This can be done in one of the following ways:
  - supplied by the player cell entity, as the destination entity is in the player's AoI
  - a local look-up in the friends list mailbox cache
  - a look-up of their mailbox using the Player Look-up mechanism described above
- Calling the chat remote method on that mailbox with the chat message contents.
To save the remote method cost of look-ups, player mailboxes can be cached on player entities as a non-persistent entity property. For example, private messages to non-friend players tend to result in conversations, so a local cache mapping player names to player mailboxes saves a look-up each time a further chat message is sent.
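A sketch of this resolution order follows, with the mailbox cache and a lookUpPlayer function modelling the Player Look-up mechanism. The names mailboxCache, chat and chatFailed are assumptions for illustration:

```python
def sendPrivateMessage(player, destName, text, lookUpPlayer):
    """Resolve the destination mailbox, preferring the local cache.

    mailboxCache is an assumed non-persistent dict property on the
    player entity; lookUpPlayer takes a name and a callback receiving
    (name, mailbox-or-None).
    """
    mailbox = player.mailboxCache.get(destName)
    if mailbox is not None:
        # Cache hit: no look-up needed for this conversation.
        mailbox.chat(player.playerName, text)
        return

    def onLookUpComplete(name, mb):
        if mb is None:
            player.client.chatFailed(name)   # assumed client method
        else:
            player.mailboxCache[name] = mb   # cache for later messages
            mb.chat(player.playerName, text)

    lookUpPlayer(destName, onLookUpComplete)
```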
Players need to be able to broadcast messages to players in their immediate spatial vicinity.
AoI-based chat can be implemented as a broadcast remote method call to all player entities that have the speaking player in their AoI. This does not require looping through all entities in script, and is implemented efficiently on the CellApp. The chat method call is broadcast to client entities using the same mechanism as any other broadcast method call, or as when an ALL_CLIENTS or OTHER_CLIENTS property changes.
Volatile distance constraints can be specified for that chat method call so that only players within a certain radius of the originating player receive the method call message.
Non-AoI-based chat channels are chat channels of entities that are not necessarily in the same spatial location. This could be used for guild-scope chat and world-scope chat.
A non-AoI-based channel can be implemented as a ChatChannel entity that contains a list of the base mailboxes of the players connected to that chat channel.
When a player wants to connect to a channel, a channel look-up is performed for the particular ChatChannel entity. This could be done via a scheme similar to the Player Look-up scheme described above. Once a mailbox to the channel is found, the player registers its base mailbox with the ChatChannel entity, which adds it to the list of connected player mailboxes.
A connected player broadcasts to that channel via a remote method call carrying the chat message contents. The ChatChannel entity is responsible for broadcasting that message to each of its connected player base mailboxes.
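The ChatChannel entity's connection and broadcast behaviour might be sketched as follows; receiveChat is an assumed remote method on the player base entity, and remote calls are modelled as plain method calls:

```python
class ChatChannel(object):
    """Sketch of a non-AoI chat channel base entity."""

    def __init__(self, name):
        self.name = name
        self.members = []   # connected player base mailboxes

    def connect(self, mailbox):
        # Register a player's base mailbox with the channel.
        if mailbox not in self.members:
            self.members.append(mailbox)

    def disconnect(self, mailbox):
        if mailbox in self.members:
            self.members.remove(mailbox)

    def broadcast(self, senderName, text):
        # Fan the message out to every connected player.
        for mailbox in self.members:
            mailbox.receiveChat(self.name, senderName, text)
```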
Each player must have the ability to send mail to other players. This mail includes some text and optionally in-game items.
The scalability of SMTP/IMAP mail servers can be leveraged here. Note that these game mail servers are completely internal to the game - no public access would be allowed (though this would be up to the game design).
Each player has an associated email address. BaseApps can query IMAP servers asynchronously using a TCP socket registered with BigWorld, without blocking game script. Python has good support for communication with IMAP over a socket (see the chapter Non-Blocking Socket I/O Using Mercury).
Items can be gifted using special attachments or special email headers, depending on the item system used. Item data would never be sent directly via email; instead, gifted items would be held in escrow, as with AoI-based player item trading. See Inventory and Item trading below.
Assume a game inventory system with the following features:
- Items are instances of a finite set of item archetypes.
- Each item instance has associated customisations that differentiate it from other instances of the same item archetype. These may be visual customisations or different attributes (e.g. durability, bonus to strength, etc.).
Store a fixed number of inventory item slots per player on the player entity, to limit the amount of data associated with player inventory.
Some popular MMOs have the concept of banks where players must be in a specific area to access items stored at the bank. This could be a separate entity that is loaded on request when a player is accessing their bank, and then destroyed once they leave the bank. The capacity of the on-player inventory and the bank inventory could be tuned to optimise database load.
The on-player inventory can be stored as a BigWorld ARRAY of item descriptors. Item descriptors themselves would be persisted as a BigWorld FIXED_DICT, but could be class-customised when loaded from the database so that items are represented in script as an arbitrary object type.
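A minimal sketch of a fixed-slot inventory follows, with None marking empty slots; MAX_SLOTS and the descriptor fields are assumptions for illustration:

```python
MAX_SLOTS = 40  # assumed fixed per-player inventory capacity

def addItem(inventory, itemDescriptor):
    # inventory is the ARRAY of item-descriptor FIXED_DICTs, with
    # None marking an empty slot.  Returns the slot index used, or
    # -1 if the inventory is full.
    for i, slot in enumerate(inventory):
        if slot is None:
            inventory[i] = itemDescriptor
            return i
    return -1
```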
Player inventory changes are expected to be frequent. Per-element changes in a BigWorld ARRAY are propagated to the client with a description of the change path to that element and the new element value (i.e. the entire array is not sent from the server to the client each time an element is changed).
If the entity wrote its state out to the secondary database each time the inventory changed, a bottleneck could occur, as this operation is expected to be frequent.
In this case, we rely on the fault tolerance mechanism for ensuring against data loss. This works by periodically saving out the state of the entity to another process. That other process is responsible for restoring the entity in the event of a process failure. For example, cell entity data is backed up to the corresponding base entity's BaseApp, and base entities are backed up to other BaseApps. This is the first level of fault tolerance, and the frequency of backups can be configured.
There is also a second level of fault tolerance, which is the periodic archiving of the base and cell entity state to the secondary databases. The frequency of this can similarly be tuned to achieve optimum BaseApp load.
However, for important changes to the item inventory (for example, acquiring a quest item), game script can request a write to the secondary database and have it confirmed via an onWriteToDB() callback. For trading transactions between two players, see below.
Player entities in the same spatial vicinity must be able to negotiate trade of items that they own.
Each player makes an offer to each other, placing their offered items in escrow. Once both players accept the opposing player's offer, the trade succeeds and the items are traded. If one player cancels the trade, all offered items are returned to their respective players.
Item trading transactions must not result in duplicate items or item loss.
BigWorld can readily supply the base mailboxes of any player entity in a player's AoI. Otherwise, if trading with a specific person not in the player's AoI, a player look-up is required.
Escrow entities are created for the lifetime of a transaction, and hold mailboxes to the two bartering entities. Escrow entities persist to the database, and are created on the least-loaded BaseApp. Trading consists of two stages: the negotiation stage and the transfer stage.
The negotiation stage is a series of offer operations made from a player entity to an Escrow entity, each of which is then forwarded to the opposing player entity.
If the server stops in the middle of a transaction, the Escrow entity has enough persistent information to cancel itself on restore and return items to their owning player entities.
Player entities on the server offer items to the other player (in response to GUI interactions from their player client) in the form of remote method requests to the Escrow entity. In doing so, they transfer those items from their inventory to a special holding area on the player entity on the server. This holding area is not accessible by the player's client for any purpose other than removing an item from the current offer, which moves that item back into the inventory.
Each transfer between the player inventory and the trade holding area results in:
- notification of a change in the items being offered, sent to the Escrow entity via remote method call
- a database write on the Escrow entity
- an acknowledgement remote method call from the Escrow entity back to the originating player entity
- removal of the items from the holding area, and a database write on the player entity
If, for some reason (temporary or otherwise), the database write fails, the entire trade is cancelled, and the items are returned to the players via remote method calls, which the players acknowledge via remote method calls back to the Escrow entity. When the Escrow entity has received acknowledgements from both players that the trade has been cancelled, it deletes itself from the database.
Each player can signal to the Escrow entity that it is willing to accept the trade as it stands. Once the Escrow entity receives positive notification from both parties, it transfers ownership of the items to the corresponding opposing players by signalling to each player the item data that they have traded.
- The Escrow entity transfers ownership to each player of their corresponding traded items.
- On receipt of the items, each player initiates a write to the database. When this is confirmed to be OK, the player entity acknowledges that it has the items by calling back on the Escrow entity.
- The Escrow entity waits for both acknowledgements to return, and then destroys itself and deletes itself from the database.
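The Escrow entity's offer/accept/cancel logic might be sketched as follows, with remote methods modelled as plain calls. The method names receiveItems and tradeCancelled are assumed remote methods on the player base entities, and the database writes described above are omitted for brevity:

```python
class Escrow(object):
    """Sketch of the Escrow entity's negotiation and transfer logic."""

    def __init__(self, mailboxA, mailboxB):
        # Items each party currently has in escrow, keyed by mailbox.
        self.parties = {mailboxA: [], mailboxB: []}
        self.accepted = set()

    def offer(self, party, items):
        # A new offer replaces the party's held items and invalidates
        # any earlier acceptances, so both sides must accept again.
        self.parties[party] = list(items)
        self.accepted.clear()

    def accept(self, party):
        self.accepted.add(party)
        if len(self.accepted) == 2:
            # Transfer stage: each side receives the other's items.
            a, b = self.parties.keys()
            a.receiveItems(self.parties[b])
            b.receiveItems(self.parties[a])

    def cancel(self):
        # Return all held items to their owning players.
        for party, items in self.parties.items():
            party.tradeCancelled(items)
```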
Total database writes: 2 for each offer made (and at least 2 offers are made), plus 3 writes for the transfer stage.
This illustrates that trading can potentially be an expensive operation in terms of writes to disk. However, the writes are distributed amongst the entities involved, and most are made to secondary databases. Only one of the database writes, when the Escrow entity is destroyed, involves the primary database, in order to remove that Escrow entity from persistent storage.
Note that the participating entities in a trading transaction are not required to be on the same process. This scales well because there can be an arbitrary number of BaseApps, and players and Escrow entities would be uniformly distributed amongst the BaseApps. Recall that while CellApps have player distributions that map to where players are spatially, base entities on BaseApps have no such spatial relation.
There is a cost to the primary database associated with the creation and destruction of each Escrow entity. This design can be improved by consolidating the escrow operations onto a pre-existing EscrowManager entity, rather than creating and destroying Escrow entities. A scheme similar to the PlayerRegistry entities could be implemented, with one EscrowManager entity per BaseApp. Trading entities would nominate and agree on a random EscrowManager to use for their trading transaction.