
Chapter 4. Cluster Hardware

A typical BigWorld network cluster is built from multiple multi-core machines and a high performance network connecting them. Correct planning of these components will allow for an easy and cost effective implementation. BigWorld recommends installing a test environment using hardware similar to, but smaller in scale than, the planned production environment, and basing the production environment configuration on the test results acquired from this environment.

4.1. Choosing the Number of BigWorld Server Instances

When releasing your game, you may want to consider running more than one BigWorld server instance.

If you are releasing your game in multiple geographic locations, you may want to host your game in multiple locations so that your servers are closer to the players. Each data centre will need to run at least one BigWorld server.

Even inside a single data centre, there may be reasons to run multiple servers. If your game has multiple shards, you can choose to develop and run your game so that all shards run within a single server instance or you may choose to run a single server instance per shard.

Advantages of running a single server instance include:

  1. Machine resources are balanced between all shards.

  2. Easier handling of player movement between shards.

  3. Ability to send script messages between entities in different shards.

  4. Easier management, as there are fewer databases and fewer accounts to monitor and deploy to.

Advantages of running a single server instance per shard include:

  1. Not needing to have a single network to run the entire game.

  2. Greater isolation of faults. If a single shard gets corrupted, it will not affect other shards.

A solution between these two extremes can also be used, with multiple server instances each running multiple shards.

The Server Overview contains more details.

4.2. The Cluster Hardware

BigWorld recommends using brand name cluster machines with proven stability and performance. Reducing the variations between machines in the cluster can also reduce complexity and increase flexibility. The BigWorld engine has not been extensively tested using virtual machines to host the server instance.

Using multiple cores with big L1 and L2 cache sizes is recommended as long as the cores are not overloading the NIC bandwidth. One server component (such as a CellApp or BaseApp) should be run on each core.

4.2.1. General Cluster Machines

Multi-core brand name machines are becoming the market standard and are recommended. We recommend taking into account the specific game requirements and the cost effective market solutions when choosing the cluster machine specifications. In general, our customers are currently using 2-4 core machines with 2-4 GB of memory. One CellApp or BaseApp application should be run on each core. Depending on the usage requirement, each machine should be equipped with one or two 1Gb NICs and with enough disk space to host the server resources, the server binaries and system log files. Appropriate free disk space should be available on the disk where the server is installed. Redundant PSUs should be considered to ensure high reliability.

BaseApp processes using the secondary database feature require disk space to write to their SQLite databases. Attention should be paid to ensure that there is enough disk space for these processes.

The machines should have enough memory to avoid swapping, as swapping can lead to poor performance and even to processes being shut down. Swap space is still recommended, as it gives the system a chance to handle such situations.

For sizing the cluster, please review the section Sizing the Cluster.

Please note that we strongly recommend that the processor affinity for the server processes be left to the operating system to manage.

4.2.2. Database Machines

The hardware required by your database will be heavily dependent on how much load your database is put under. As with most databases, the general rule is to use multiple cores, large memory and high performance reliable hard disks. RAID solutions should be considered for the MySQL hard disk. These should be implemented by a MySQL expert.

Setting up the database machine should be done by an experienced MySQL DBA taking into account the performance measurements taken in the test environment.

4.2.3. Server Tools Machines

The server tools components have greater disk requirements than normal cluster machines. You may wish to have a higher performance hard-disk in this machine to handle the load of logging for all server processes. It can also be a good idea to isolate logs to separate partitions. This ensures that if a single logging process runs away and consumes too much disk space, other processes still have some free disk space.

The general strategy used for long term storage of logs is to use logrotate to perform periodic log switches. Older log files should be archived in order to reduce the space requirements.

4.2.4. Secondary Storage Considerations

It is crucial that the master copy of the server binaries and resources and the MySQL database are stored on fault tolerant hard disks which are also backed up regularly.

BigWorld recommends against using NAS related solutions like NFS for the database as these solutions can cause increased network load and therefore degrade performance.

The BigWorld cluster has yet to be performance tested using SAN solutions but we expect these solutions to better fit the BigWorld cluster deployment.

When deploying the BigWorld cluster on separate machines without the usage of a SAN or a NAS solution, a solution should be used to distribute game resources between the cluster components. Any commonly used solution can be used for this issue. Some examples include:

4.2.5. Sizing the Cluster

BigWorld Technology supports multiple types of games. Some games include many NPC entities and complex server AI calculations while others include more player characters and fewer NPCs. The number of spaces can also vary between different implementations. This variety means that the cluster requirements will depend on the specific customer game implementation.

The best solution for sizing the cluster is to measure the BigWorld server instance requirements on a test environment and then to use these results to predict the required cluster hardware. BigWorld Technology allows adding machines to a cluster as well as changing the role of each machine. This allows easy adjustment of the cluster during both the testing and the deployment stages. Please note that extra resources should be allocated to any server to handle load spikes and unexpected events. The chapter Cluster Deployment Examples below contains examples based on real cluster deployments.
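As a rough illustration of turning test measurements into a machine count, the sketch below assumes one server process per core (as recommended earlier in this chapter) and adds spare capacity for load spikes. The function name, the 20% spare figure and the example numbers are illustrative assumptions, not BigWorld recommendations.

```python
def machines_needed(process_count, cores_per_machine, spare_percent=20):
    """Return the machine count for one process per core plus spare capacity.

    spare_percent is an assumed safety margin; choose it from your own tests.
    """
    if process_count < 0 or cores_per_machine <= 0 or spare_percent < 0:
        raise ValueError("counts must be non-negative and cores positive")
    # Integer ceiling of process_count * (1 + spare_percent / 100).
    cores_with_spare = (process_count * (100 + spare_percent) + 99) // 100
    # Integer ceiling of cores_with_spare / cores_per_machine.
    return (cores_with_spare + cores_per_machine - 1) // cores_per_machine


# Hypothetical example: 90 CellApp and BaseApp processes on 4-core machines.
print(machines_needed(90, 4))
```

The result is only a starting point; the measurements from the test environment should drive the final figure.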

4.2.5.1. Cluster Machines

In order to establish the hardware specification required for a game, tests should be run in a representative test environment. We recommend creating a representative cluster which will include the same ratio of CellApps, BaseApps and other processes as planned for the production environment. A WoW style game should probably have a test environment for at least one complete shard while any style game should aim to have a test environment sized at least 10% of the planned production environment.
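The following sketch shows one way to derive a test environment that keeps roughly the same process ratio as the planned production environment at 10% of its size. The process counts are hypothetical, and very small counts are rounded up to one, which slightly distorts the ratio for those processes.

```python
def scale_process_counts(production, percent=10):
    """Scale each production process count to `percent`, rounding up.

    Every process type keeps at least one instance so the test cluster
    still contains a complete set of components.
    """
    scaled = {}
    for name, count in production.items():
        # Integer ceiling of count * percent / 100.
        scaled[name] = max(1, (count * percent + 99) // 100)
    return scaled


# Hypothetical production plan; the numbers are illustrative only.
production_plan = {"CellApp": 120, "BaseApp": 60, "LoginApp": 4}
test_plan = scale_process_counts(production_plan)
print(test_plan)
```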

Scaling from the test environment to the production environment should be done in small steps to detect any bottlenecks or unexpected issues. This includes identifying changes that may need to be made in the game's script and other resources. In some cases, it may even require changes to the game design, so early detection of these issues is important.

The testing stage should try to estimate the expected production environment while also monitoring less scalable components (like the CellAppMgr, BaseAppMgr and database components) to detect potential bottlenecks.

4.2.5.2. Cluster Network

The external-facing cluster machines would typically be connected directly to the Internet without the usage of any hardware based firewalls. The estimated required bandwidth per machine should be calculated based on the results of the test environment and extra bandwidth should be allocated to prevent problems due to network spikes. A rough estimate can be achieved by multiplying the requested downstream bandwidth per client by the expected number of clients. Extra bandwidth should be allocated for resending dropped packets as well as handling temporary spikes in the amount of data being sent. Outgoing bandwidth is generally higher than incoming.
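The rough estimate described above can be written as a short calculation. The per-client rate, the client count and the 50% headroom used here are illustrative assumptions; real values should come from the test environment.

```python
def estimate_outgoing_mbps(clients, downstream_kbps_per_client, headroom=0.5):
    """Estimate outgoing bandwidth in Mbps for a number of clients.

    headroom is the assumed extra fraction reserved for resent packets
    and temporary spikes (0.5 means 50% on top of the base figure).
    """
    base_kbps = clients * downstream_kbps_per_client
    return base_kbps * (1 + headroom) / 1000.0


# Hypothetical example: 2000 clients at 20 kbps downstream each.
print(estimate_outgoing_mbps(2000, 20))
```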

In order to establish the network requirements, it is recommended to run the same network cluster as above and monitor network bandwidth and loss. Additionally, the network switch load should be monitored using switch monitoring utilities or other network interface monitoring utilities to measure the internal cluster bandwidth requirements and internet bandwidth requirements.

BigWorld Technology has multiple mechanisms to overcome packet loss issues, but it is recommended that the underlying network error rate is within acceptable parameters (for example, less than 0.01% in unsaturated network conditions).
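To make the example threshold concrete, the sketch below converts packet counters into a loss percentage and compares it with 0.01%. The counter values are hypothetical; in practice they would come from whatever switch or interface monitoring utility is in use.

```python
def packet_loss_percent(sent, lost):
    """Return lost packets as a percentage of packets sent."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return 100.0 * lost / sent


def loss_is_acceptable(sent, lost, max_percent=0.01):
    """True if the loss rate is below the example 0.01% threshold."""
    return packet_loss_percent(sent, lost) < max_percent


# Hypothetical counters: 50 lost packets out of 1,000,000 sent.
print(loss_is_acceptable(1000000, 50))
```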

4.2.5.3. The Bots Process

The bots process should be used early and regularly to test with large numbers of simulated players. Time should be spent creating bot scripts that simulate player behaviour. Server load and bandwidth usage should grow near linearly with the number of bots added. If this growth does not appear to be linear, it should be investigated, as it can lead to scaling issues in larger production environments.
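One simple way to check for near-linear growth is to compare the marginal cost per added bot between successive measurements. The sketch below is a minimal illustration; the sample figures and the 25% tolerance are assumptions, and the load values stand in for whichever metric (CPU load, bandwidth) is being measured.

```python
def marginal_costs(samples):
    """Return the load added per extra bot between successive samples.

    samples is a list of (bot_count, load) pairs sorted by bot_count.
    """
    return [
        (load2 - load1) / (bots2 - bots1)
        for (bots1, load1), (bots2, load2) in zip(samples, samples[1:])
    ]


def grows_near_linearly(samples, tolerance=0.25):
    """True if every marginal cost stays within tolerance of the first."""
    if len(samples) < 3:
        raise ValueError("need at least three samples")
    costs = marginal_costs(samples)
    baseline = costs[0]
    return all(abs(c - baseline) <= tolerance * abs(baseline) for c in costs)


# Hypothetical measurements: (number of bots, measured load).
linear = [(100, 5.0), (200, 10.0), (400, 20.0), (800, 40.0)]
superlinear = [(100, 5.0), (200, 12.0), (400, 35.0), (800, 120.0)]
print(grows_near_linearly(linear), grows_near_linearly(superlinear))
```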

4.2.5.4. Staged Release

It can be difficult to estimate the expected player base and expected concurrent player count. It may make sense to release your game in a staged manner to help predict the peak usage and cluster requirements, and to provide an opportunity to identify problems earlier with a smaller player base.

4.3. Cluster Configuration

4.3.1. Linux Distributions

For long term deployment environments, BigWorld supports and recommends the usage of one of the following Linux distributions. Be aware that 32 bit Linux distributions are not supported.

4.3.2. Smoothing load during startup

While some effort should be put into tweaking server configuration for operational stability, it is also important to consider the load of your system during server startup. Spiky load during server startup may slow down cluster startup and, in the worst case, prevent server startup completely due to overloaded processes.

To help spread load between server processes and improve space load times, the following options should be configured in your game's bw.xml file.

  • <desiredBaseApps>

  • <desiredCellApps>

  • <cellAppMgr/maxLoadingCells>

  • <cellAppMgr/minLoadingArea>

While these options will assist in smoothing out your server startup, it is important to take into consideration that the CellAppMgr configuration options apply across the entire lifetime of the server, not just during startup. For this reason it may be useful for your game script to implement a loading phase and an active game phase. During the transition between these two phases the configuration options such as <cellAppMgr/maxLoadingCells> can be modified via a Watcher set request. This also applies to any other load balancing options or other configuration options you may choose to tweak for game startup. See the Server Operations Guide for more details on these options.

4.3.3. MySQL Configuration

MySQL is quite well configured by default; however, there are a few options that should be considered for tuning to ensure optimal performance within your own production environment.

innodb_buffer_pool_size

The MySQL documentation suggests that, on a dedicated database server, this system variable can be set to as much as 80% of the physical memory size.
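As a small worked example, the sketch below computes an 80% value for a given amount of physical memory and prints it in option-file form. The 16 GB figure is hypothetical, and the result should be treated as an upper starting point to be validated by your MySQL DBA.

```python
def buffer_pool_size_mb(physical_memory_mb, percent=80):
    """Return `percent` of physical memory in whole megabytes."""
    if physical_memory_mb <= 0 or not 0 < percent <= 100:
        raise ValueError("memory must be positive and percent in (0, 100]")
    return physical_memory_mb * percent // 100


# Hypothetical dedicated database machine with 16 GB of RAM.
print("innodb_buffer_pool_size = %dM" % buffer_pool_size_mb(16384))
```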

4.3.4. Network Configuration

The cluster network should be carefully configured to provide both security and performance.

4.3.4.1. Installing Multiple LoginApps

BigWorld recommends using DNS round robin in order to achieve load balancing when deploying multiple LoginApps. There are many good references available on the Internet explaining how to install and use this DNS feature.
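The effect of DNS round robin can be illustrated with a toy model: the name has several address records, and the order of the answer rotates on each query, so clients that use the first address are spread across the LoginApps. This sketch only simulates the rotation; it does not query DNS, the addresses are documentation-range placeholders, and real behaviour depends on the DNS server and client resolver in use.

```python
from collections import deque


class RoundRobinRecord:
    """Toy model of a DNS name with several address records."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def answer(self):
        """Return the current ordering, then rotate for the next query."""
        ordering = list(self._addresses)
        self._addresses.rotate(-1)
        return ordering


# Placeholder LoginApp addresses (203.0.113.0/24 is reserved for examples).
login = RoundRobinRecord(["203.0.113.10", "203.0.113.11", "203.0.113.12"])
first_choices = [login.answer()[0] for _ in range(4)]
print(first_choices)
```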

4.3.4.2. Security Considerations

4.3.4.2.1. External Network Security

A typical BigWorld cluster includes multiple external facing machines as displayed in the image below. These machines include both BaseApp and LoginApp machines which are used for the client login and for communication with the other server components. All external facing machines should be carefully configured to prevent hackers from getting access to these machines. Additionally, BaseApp and CellApp Python code should be carefully reviewed to prevent any potential exploits. For further information please review the Server Operations Guide chapter Security.

BigWorld Server Instance components

4.3.4.2.2. Communication Encryption

BigWorld Technology ships with a default LoginApp RSA key pair to enable encrypted login communication; however, all BigWorld customers receive the same pre-generated key pair. In order to secure the client connections for a BigWorld game, a new key pair should be generated and the public key distributed with the game clients. The key size directly impacts the performance of LoginApp. For instructions please see Encrypting Client-Server Traffic.

4.3.4.3. Default broadcast network routes and External Interfaces

4.3.4.3.1. Configuring internal interfaces

In order to ensure the correct operation of your cluster, it is important to correctly configure network routing on every host. From BigWorld's perspective, this means making sure that machines with multiple network interfaces (such as BaseApp and LoginApp machines) have the correct default network route established for broadcast network messages. BigWorld servers use broadcast messages as a mechanism to establish internal network interfaces as well as finding other servers in the cluster. Please review the Server Installation Guide for more details.

4.3.4.3.2. Configuring external interfaces

While default broadcast routes are used to identify internal network interfaces, it is necessary to configure BigWorld servers to know which interface to use for external network communication. In production environments, the Internet facing network will generally have a separate network assignment from the internal network.

To configure your cluster to use the correct network interface, use the bw.xml option <externalInterface>[1]. The recommended method for specifying external interfaces is with an IP address / netmask combination to avoid having individual bw.xml files per host. For example, using the previous example host configuration, we could add an entry as follows:

<root>
    <personality> FantasyDemo </personality>
    
    <parentFile> server/production_defaults.xml </parentFile>

    <externalInterface> 192.168.1.0/24 </externalInterface>
</root>
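To show why a single address / netmask entry can serve every host, the sketch below picks, from a host's addresses, the one that falls inside the configured network. It uses Python's standard ipaddress module and is only an illustration of the matching idea; it is not BigWorld's own selection logic, and the host addresses are hypothetical.

```python
import ipaddress


def pick_external_address(host_addresses, external_network="192.168.1.0/24"):
    """Return the first host address inside external_network, or None."""
    network = ipaddress.ip_network(external_network)
    for address in host_addresses:
        if ipaddress.ip_address(address) in network:
            return address
    return None


# Hypothetical hosts: one internal address plus, optionally, an external one.
print(pick_external_address(["10.40.1.21", "192.168.1.21"]))
print(pick_external_address(["10.40.1.22"]))
```

Each host resolves the same entry to its own external address, so the bw.xml file can be shared across the cluster.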

Although not strictly necessary, it is recommended to standardise the network interface assignment to assist with system administration. For example, all internal network interfaces could be configured as eth0 and all external interfaces as eth1.

4.3.5. BWMachined Version Consistency

While BigWorld strives to make BWMachined versions interact with each other as seamlessly as possible, it is recommended to make sure that all the installed versions of BWMachined within your BigWorld network cluster have the same version number. The easiest way to check this is from within WebConsole. Navigate to the Cluster Control module's All Machines page. WebConsole will display differently versioned BWMachined installations in red and will show a warning at the bottom of the screen if multiple BWMachined versions are installed within the same cluster.

Another way to see the same information using command line tools is to use the control_cluster.py command cinfo. Differing or older versions of BWMachined have their version number displayed at the end of each line. For example:

$ control_cluster.py cinfo
dev01      10.40.1.01       0 processes   0% of 1000MHz (7% mem)
dev02      10.40.1.02       6 processes   7% of 2200MHz (77% mem)
dev03      10.40.1.03       0 processes   0% of 2405MHz (2% mem) (v41)
fedora6    10.40.1.06       1 process     0% of 1300MHz (5% mem) (v41)
bwtools    10.40.1.100      1 process     2%, 0% of 2133MHz (30% mem)
5 machines total

In this example we can see that the machines dev03 and fedora6 are running an older BWMachined version (41) than the rest of the cluster.
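If this check needs to be automated, the cinfo output can be scanned for the trailing version marker. The sketch below is a minimal example that assumes the output format matches the sample above exactly; it is not an official parser, and the regular expression may need adjusting for other versions of the tools.

```python
import re

# Sample output copied from the cinfo example above.
CINFO_OUTPUT = """\
dev01      10.40.1.01       0 processes   0% of 1000MHz (7% mem)
dev02      10.40.1.02       6 processes   7% of 2200MHz (77% mem)
dev03      10.40.1.03       0 processes   0% of 2405MHz (2% mem) (v41)
fedora6    10.40.1.06       1 process     0% of 1300MHz (5% mem) (v41)
bwtools    10.40.1.100      1 process     2%, 0% of 2133MHz (30% mem)
5 machines total
"""

# Host name, dotted address, then an optional "(vNN)" marker at line end.
LINE_PATTERN = re.compile(
    r"^(\S+)\s+(\d+\.\d+\.\d+\.\d+)\s+.*?(?:\(v(\d+)\))?\s*$"
)


def find_flagged_versions(cinfo_text):
    """Return {hostname: version} for lines carrying a version marker."""
    flagged = {}
    for line in cinfo_text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match and match.group(3) is not None:
            flagged[match.group(1)] = int(match.group(3))
    return flagged


flagged = find_flagged_versions(CINFO_OUTPUT)
print(flagged)
```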

If you want to quickly check the version of BWMachined on your current host and don't have access to the BigWorld server tools, you can query the installed BWMachined binary directly using either the init.d script as root:

# /etc/init.d/bwmachined2 version
BWMachined (BigWorld 2.0.0 Hybrid64. 11:18:12 Jul  8 2010)
Protocol version 42

or query the installed BWMachined binary using the --version command line argument:

# /opt/bigworld/current/bwmachined/sbin/bwmachined2 --version
BWMachined (BigWorld 2.0.0 Hybrid64. 11:18:12 Jul  8 2010)
Protocol version 42

Newer BWMachined versions are developed to support running older instances of the server, so updating to a newer version of BWMachined will almost always still allow running older servers.

4.4. Client Hardware Recommendations

The requirements for running the client are heavily dependent on the visual complexity of your game and the number of effects (such as particles) that are used for different quality levels. The following recommendations are based upon the FantasyDemo game environment and client as shipped with the default BigWorld Technology packages. We recommend running client performance and stability tests on different platforms to ensure compatibility and performance of the client on these platforms.

The other major factor that can determine client hardware requirements is the kind of terrain used within your game. For more information on the different terrain types available, please refer to the Client Programming Guide's chapter Terrain.

4.4.1. Advanced Terrain Recommendations

Recommended                                | Minimum
GeForce 7600 or ATI Radeon x1600           | GeForce 6600 128 MB or ATI Radeon 9600 128 MB
2 GHz CPU                                  | 2 GHz CPU
1 GB RAM (2 GB if running Vista)           | 512 MB RAM
Windows XP Home 32bit or Vista Home 32bit  | Windows XP Home 32bit

4.4.2. Simple Terrain Recommendations

Recommended                                    | Minimum
GeForce4 Ti or ATI Radeon 9500                 | GeForce4 MX or ATI Radeon 7 Series
2 GHz CPU                                      | 2 GHz CPU
1 GB RAM                                       | 512 MB RAM
Windows XP Home/Pro 32bit or Vista Home 32bit  | Windows XP Home 32bit