Chapter 19. Transactions and Handling Fault Tolerance and Disaster Recovery

This chapter illustrates the previous chapters' guidelines on fault tolerance and disaster recovery in BigWorld with an example: a transaction between two entities, and how to make it work with both mechanisms.

19.1. Transaction logic

The example is a trading transaction that transfers an item between two player entities.

Figure 19.1. Transaction sequence diagram

The transaction logic between the two player entities Alice and Bob is as follows:

  1. Alice and Bob are within each other's Area of Interest (AoI). Alice's client informs her base entity that she would like to give Bob a Sword item.

    Alice's base entity is passed the player name of Bob as part of the request from the client, along with the representation of the Sword item within Alice's inventory.

    Alice's base entity adds an entry into her transaction list. This entry contains a unique transaction ID that identifies this transaction, Bob's player name, the state of the transaction (set to a symbolic constant called BEGIN), and the item in Alice's inventory.

    Alice's base entity removes the Sword from Alice's inventory.

    Alice requests a write to the database.

  2. When the write to the database calls back, it indicates whether the write was successful or not.

    If it was not successful, there is a problem with the database. The transaction is aborted, the Sword is added back to Alice's inventory, and a message is sent to the client informing Alice of the failure.

    Otherwise, if the write to the database was successful, Alice's base entity requests a base entity mailbox lookup by player name via the BigWorld.lookUpBaseByName() method, registering a callback functor that contains the transaction ID.

  3. Alice's base entity gets notification with the base mailbox of Bob.

    If Bob's base entity can't be found, the transaction is aborted: Alice's base entity removes the transaction entry from the transaction list, adds the Sword back to Alice's inventory, informs Alice's client, and calls writeToDB() again.

    Otherwise, Alice's base entity calls a method on Bob's base mailbox, requesting that he add the Sword item to his inventory. The call passes the item, the transaction ID, Alice's player name, and a mailbox back to Alice's base entity.

  4. Bob's base entity adds the Sword item to its inventory (but marks it as unusable by Bob's client for the moment).

    Bob's base entity adds an entry into its transaction list, with Alice's player name, the state of the transaction set to the symbolic constant REMOVE, and the same transaction ID that was passed in from Alice.

    Bob's base entity then starts a write to the database, registering a callback to a functor object that holds Alice's base entity mailbox and the transaction ID.

  5. Bob's base entity is called back with the result of the database write.

    If it was unsuccessful, Bob removes the Sword from his inventory and the transaction entry from his transaction list (the transaction ID is stored in the functor callback).

    If the write was successful, the item should be marked as usable for Bob's client.

    Whether or not the database write was successful, Bob's base entity informs Alice of the success of the database write through her base mailbox that is supplied through the functor callback. Bob passes in the success flag, a mailbox to Bob's base entity and the transaction ID.

  6. Alice's base entity receives the result of the transaction from Bob's side.

    Alice removes the transaction entry from her transaction list.

    If Bob indicated that the transaction was unsuccessful, Alice adds the item back to her inventory and informs the client of the trading failure.

    Alice writes to the database, and registers a callback to a functor that holds Bob's base mailbox and the transaction ID.

    Alice notifies Bob that the transaction on her side is complete, by passing in the transaction ID.

  7. Bob receives this notification, and removes the transaction entry with the given transaction ID.
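The seven steps above can be sketched as a small in-memory simulation. This is an illustration only, not BigWorld code: the Player class, its method names, the db_ok flag and the directory dictionary (standing in for the lookup by player name) are all hypothetical, and database writes call back immediately instead of asynchronously. For brevity, the sketch handles a failed first write and a failed lookup the same way.

```python
import itertools

BEGIN = "BEGIN"    # giver has removed the item and is waiting on the receiver
REMOVE = "REMOVE"  # receiver holds the item and is waiting on the giver

_next_id = itertools.count(1)  # hypothetical source of unique transaction IDs


class Player:
    """Stand-in for a player base entity; not a BigWorld class."""

    def __init__(self, name, inventory=()):
        self.name = name
        self.inventory = list(inventory)
        self.unusable = set()    # items received but not yet confirmed
        self.transactions = {}   # transaction ID -> entry
        self.db_ok = True        # simulated result of database writes

    def write_to_db(self, callback):
        # A real write is asynchronous; here the callback runs at once.
        callback(self.db_ok)

    # Giver side (Alice): steps 1 to 3, and step 6.
    def give(self, item, receiver_name, directory):
        tid = next(_next_id)
        self.transactions[tid] = {
            "peer": receiver_name, "state": BEGIN, "item": item}
        self.inventory.remove(item)
        self.write_to_db(lambda ok: self._on_begin_written(ok, tid, directory))
        return tid

    def _on_begin_written(self, ok, tid, directory):
        entry = self.transactions[tid]
        peer = directory.get(entry["peer"]) if ok else None
        if peer is None:
            # Write failed or receiver not found: undo the removal.
            del self.transactions[tid]
            self.inventory.append(entry["item"])
            return
        peer.receive(entry["item"], tid, self)

    def on_peer_result(self, ok, tid, peer):
        entry = self.transactions.pop(tid)
        if not ok:
            self.inventory.append(entry["item"])
        self.write_to_db(lambda _ok: peer.on_giver_done(tid))

    # Receiver side (Bob): steps 4, 5 and 7.
    def receive(self, item, tid, giver):
        self.inventory.append(item)
        self.unusable.add(item)
        self.transactions[tid] = {"peer": giver.name, "state": REMOVE}
        self.write_to_db(
            lambda ok: self._on_receive_written(ok, tid, item, giver))

    def _on_receive_written(self, ok, tid, item, giver):
        self.unusable.discard(item)
        if not ok:
            self.inventory.remove(item)
            del self.transactions[tid]
        giver.on_peer_result(ok, tid, self)

    def on_giver_done(self, tid):
        self.transactions.pop(tid, None)


alice = Player("Alice", ["Sword"])
bob = Player("Bob")
alice.give("Sword", "Bob", {"Bob": bob})
print(alice.inventory, bob.inventory)  # [] ['Sword']
```

Setting db_ok to False on the receiver before calling give() exercises the failure path: the Sword ends up back in the giver's inventory and both transaction lists are empty.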

19.2. Fault Tolerance Behaviour

19.2.1. CellApp Fault Tolerance

If the CellApp that Alice's and/or Bob's cell entity resides on exits, all cell entities that have base entities will be restored to another CellApp. For the transaction described above, the behaviour of restored cell entities is of little concern, because the transaction only involves BaseApps.

However, suppose the inventory system were implemented so that player cell entities required knowledge of items. For example, the item a player is holding in its hands would need to be an OTHER_CLIENTS or ALL_CLIENTS cell entity property so that other players could see it. If the cell entity were restored from an older version of its cell entity data, as last backed up to the base entity, the cell entity state could be inconsistent with the base entity state.

For example, if Alice was restored to another CellApp, her cell entity could check with her base entity whether she still owned the item that she was holding, and if not, her cell entity should remove that item.

Cell entities that are restored do not have their __init__() method called. Instead, after they are restored with the cell backup data from their base entity, they have their onRestore() method called. Checks such as these can be done in this method to make sure the cell entity state is consistent with the base entity state.
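A minimal sketch of such a check follows. Only onRestore() comes from the text above. The PlayerBase and PlayerCell classes, the heldItem attribute and the checkOwnership() and onOwnershipChecked() methods are hypothetical stand-ins, and the call to the base runs synchronously here, whereas a real call through a mailbox would be asynchronous.

```python
class PlayerBase:
    """Stand-in for the player base entity; not a BigWorld class."""

    def __init__(self, inventory):
        self.inventory = set(inventory)

    def checkOwnership(self, item, callback):
        # Hypothetical method: report whether the base still owns the item.
        callback(item, item in self.inventory)


class PlayerCell:
    """Stand-in for the player cell entity; not a BigWorld class."""

    def __init__(self, base, heldItem=None):
        self.base = base          # stand-in for the base mailbox
        self.heldItem = heldItem  # would be an OTHER_CLIENTS property

    def onRestore(self):
        # The restored backup data may be stale, so ask the base entity
        # whether the held item is still in the inventory.
        if self.heldItem is not None:
            self.base.checkOwnership(self.heldItem, self.onOwnershipChecked)

    def onOwnershipChecked(self, item, owned):
        # Drop the item if the base entity no longer owns it.
        if not owned and self.heldItem == item:
            self.heldItem = None


# Alice gave away her Sword after the last cell backup was taken.
cell = PlayerCell(PlayerBase({"Shield"}), heldItem="Sword")
cell.onRestore()
print(cell.heldItem)  # None
```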

19.2.2. BaseApp Fault Tolerance

If the BaseApp that Alice's and/or Bob's base entity resides on exits, those base entities will be restored to other BaseApps if they exist (if there is only one BaseApp, they cannot be restored).

As with cell entities, restored base entities do not have __init__() called on them. Instead, they have onRestore() called on them when they are each restored from their most recent base entity backup data. This is a good place to check for uncompleted transactions.

For example, if the BaseApp that contained Alice exited and Alice was restored onto another BaseApp (Bob may have been restored too, possibly onto a different BaseApp from the one he was on), then we need to replay any transactions that may have been underway.

The entity needs to replay each entry in its transaction list according to the state that the entry is in.

For example, if it is in the BEGIN state, we resume the transaction from step 3 by looking up Bob's base entity, and continuing on.

If we are Bob, we may have transactions in the REMOVE state. We resume each of these from step 6 by telling Alice (or whichever player the transaction's player name refers to) to complete the transaction on their end.
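The replay can be sketched as a dispatch on the stored state. Only onRestore() and the BEGIN and REMOVE states come from the text. The PlayerBase class here, the layout of the transaction entries and the resumeAsGiver() and resumeAsReceiver() helpers are hypothetical, and the helpers only record what a real implementation would do.

```python
BEGIN = "BEGIN"
REMOVE = "REMOVE"


class PlayerBase:
    """Stand-in for a restored player base entity; not a BigWorld class."""

    def __init__(self, transactions):
        self.transactions = transactions  # transaction ID -> entry
        self.replayed = []                # record of resumed work

    def onRestore(self):
        # Copy the items so that handlers may modify the transaction list.
        for tid, entry in list(self.transactions.items()):
            if entry["state"] == BEGIN:
                self.resumeAsGiver(tid, entry)
            elif entry["state"] == REMOVE:
                self.resumeAsReceiver(tid, entry)

    def resumeAsGiver(self, tid, entry):
        # Real code would look up the receiver's base entity by name again
        # and continue from step 3.
        self.replayed.append((tid, "lookup", entry["peer"]))

    def resumeAsReceiver(self, tid, entry):
        # Real code would tell the giver to complete its side (step 6).
        self.replayed.append((tid, "notify", entry["peer"]))


entity = PlayerBase({
    7: {"peer": "Bob", "state": BEGIN},
    9: {"peer": "Alice", "state": REMOVE},
})
entity.onRestore()
print(entity.replayed)  # [(7, 'lookup', 'Bob'), (9, 'notify', 'Alice')]
```

Because the peer may already have processed a request before the BaseApp exited, a real implementation would probably need to tolerate receiving the same transaction ID more than once.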

19.3. Disaster Recovery Behaviour

When the server is started and restored from the database, the base entities are restored, and each of them has __init__() called on it. The variable BigWorld.hasStarted will be False for restored base entities, so __init__() can perform checks similar to those described in the BaseApp fault tolerance section.

It is also the responsibility of the base entities to recreate the cell entities, usually via createCellEntity(). The space ID is archived with the entity when it is written to the database, and this is present in the base entity's cellData dictionary.
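A rough sketch of this start-up path is shown below. BigWorld.hasStarted, createCellEntity() and cellData are named in the text, but everything here is a stand-in: the BigWorld object is a plain namespace so that the example runs outside the engine, the createCellEntity() body only records that it was called, and the "spaceID" key and the resumeTransactions() helper are assumptions.

```python
from types import SimpleNamespace

# Stand-in for the engine module so that the sketch runs on its own.
BigWorld = SimpleNamespace(hasStarted=False)


class PlayerBase:
    """Stand-in for a player base entity; not a BigWorld class."""

    def __init__(self, transactions, cellData):
        self.transactions = transactions  # restored from the database
        self.cellData = cellData          # restored from the database
        self.resumed = []
        self.cellCreated = False
        if not BigWorld.hasStarted:
            # Restoring from the database: replay unfinished transactions,
            # then recreate the cell entity.
            self.resumeTransactions()
            self.createCellEntity()

    def resumeTransactions(self):
        # Hypothetical helper: the same per-state replay as in onRestore().
        self.resumed = sorted(self.transactions)

    def createCellEntity(self):
        # Stand-in for the engine method; the "spaceID" key is an assumption.
        self.cellCreated = "spaceID" in self.cellData


restored = PlayerBase({3: {"peer": "Bob", "state": "BEGIN"}}, {"spaceID": 2})
print(restored.resumed, restored.cellCreated)  # [3] True
```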