During the early stages of testing Nuclear Dawn, it became clear that we needed a good way of analysing the game we have implemented against the game we have designed. Weapon balancing is one example: we need to be sure that the Avenger assault rifle for the Empire team is balanced against the F2100 for the Consortium team. While extensive game-play testing is still the main method of balancing, it definitely helps to have solid statistics behind those decisions.
As well as balance testing, we also do performance testing, to make sure we aren’t slowly turning Nuclear Dawn into an unplayable memory hog. As with balance testing, we use a variety of tools to keep Nuclear Dawn’s performance in check, but the rest of those tools will be left for another blog post.
To gather this data, we developed our own statistics system, which works much like Google’s Analytics web statistics tool. We track a variety of data points during our game-play test sessions, some of which are:
- Weapon used to kill
- Shots fired per weapon, per player
- Weapon hits per weapon
- Position of killer/victim
- Class of killer/victim
- Structure Kills/Destructions
- Structure build locations
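To give an idea of what one of these data points looks like, here is a minimal sketch of the kind of record a client might generate for a single kill event. The field names and structure are purely illustrative, not our actual schema:

```python
# Illustrative only -- field names and structure are hypothetical,
# not the actual Nuclear Dawn event schema.
kill_event = {
    "session_id": "2011-01-14-playtest-03",   # hypothetical session identifier
    "timestamp": 1295003520,                   # when the kill happened (Unix time)
    "map": "nd_tokyo",
    "weapon": "avenger",                       # weapon used to kill
    "killer": {"class": "assault", "pos": [812.4, -1440.0, 96.0]},
    "victim": {"class": "engineer", "pos": [790.1, -1398.6, 96.0]},
}
```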
Along with the data collected during test sessions, we also run each map through a benchmark run after each and every revision to the game. This allows us to quickly and easily see which change to the game caused, for example, a huge frame drop. Some of the data collected during these runs includes:
- Minimum, maximum & average render time (FPS)
- Minimum, maximum & average time spent in each code system (networking, physics, renderer, input system, UI, etc.)
- Game/Map load time
- Maximum memory usage per map
- Memory usage for each type of data (model vertices, model textures, light-maps, render targets, etc.)
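Because every benchmark run is tied to a game revision, spotting the revision that introduced a frame drop can be as simple as comparing averages between consecutive runs. Here is a rough sketch of that idea in Python; the input format, revision names, and 10% threshold are assumptions for illustration, not our actual tooling:

```python
# Rough sketch: flag revisions where the average frame time jumps noticeably.
# The input format and the 10% threshold are assumptions for illustration.
def find_regressions(runs, threshold=0.10):
    """runs: list of (revision, avg_frame_time_ms) tuples, ordered by revision."""
    regressions = []
    for (prev_rev, prev_ms), (rev, ms) in zip(runs, runs[1:]):
        if ms > prev_ms * (1 + threshold):
            regressions.append((prev_rev, rev, prev_ms, ms))
    return regressions

runs = [("r1520", 7.4), ("r1521", 7.5), ("r1522", 9.8)]  # made-up numbers
print(find_regressions(runs))  # -> [('r1521', 'r1522', 7.5, 9.8)]
```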
All of this data is recorded by each client, then bundled and uploaded to a central server where it is inserted into a MongoDB database. At the time of writing, we have just over 53,500 game sessions tracked and stored using this system.
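For the curious, the server-side storage looks roughly like the sketch below, using MongoDB’s Python driver. The database, collection, and field names are invented for the example; this isn’t our actual ingestion code:

```python
# Sketch of the server-side insert, assuming the pymongo driver and
# hypothetical database/collection/field names.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["nd_stats"]                      # hypothetical database name

# A client upload arrives as one bundle of documents for a test session.
db.sessions.insert_one({
    "session_id": "2011-01-14-playtest-03",  # hypothetical
    "map": "nd_tokyo",
    "events": [],                            # per-event documents like the one sketched above
})

print(db.sessions.count_documents({}))       # how many sessions are tracked
```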
The great thing about having this amount of data is that you can run some interesting queries on it and generate some useful graphs. Because all of the data is time-stamped before it is saved into the database, we can create historical reports and graphs for any time range we need to analyse.
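As a rough illustration, pulling everything recorded in, say, December for a historical report is just a query on that timestamp. Again, the collection and field names are placeholders, and this assumes timestamps are stored as native datetimes:

```python
# Sketch of a time-range query against the (hypothetical) sessions collection.
from datetime import datetime
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["nd_stats"]

start = datetime(2010, 12, 1)
end = datetime(2011, 1, 1)

december_sessions = db.sessions.find(
    {"timestamp": {"$gte": start, "$lt": end}}   # every session from December
)
for session in december_sessions:
    ...  # feed into whatever report or graph we need
```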
Below, I have included some screen captures from our analytics system. Note that not all of the maps we will be shipping appear below, so don’t be alarmed that some maps aren’t shown.
The first graph shows the total memory usage for each map, in megabytes, over the entire period this system has been active. The good news here is that the general trend of memory usage is downward. One point to note is the sharp drop halfway through December; this is when we finished merging the ND code base onto the L4D2 engine code branch. There are a number of reasons for this drop, but for the sake of this blog post, let’s just say that there were a lot of optimizations.
The next graph shows the total time it takes to load each map. The load times in the first part of the graph are so spread out because many factors affect load time, the biggest being Windows accessing the hard drive, which drags the load times out. Another factor is that, at any given time, the system will have different files cached in memory for different maps, which causes unpredictable level load times. We have since improved the way the test machines perform the benchmark runs by “priming” the memory: each test machine loads the same map twice before performing the test, which ensures that all of the content is cached in memory before the test starts. Obviously the tests no longer account for HDD seek and read time, but we use this data for relative comparisons between game revisions that all use the same “priming” method.
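A minimal sketch of the priming idea, with `load_map` standing in as a placeholder for whatever actually loads a level on the test machine (it is not a real engine call):

```python
import time

def primed_load_time(load_map, map_name):
    """Load a map twice and time only the second load.

    `load_map` is a placeholder for whatever actually loads a level on the
    test machine; the first call just warms the file cache.
    """
    load_map(map_name)            # priming run: pull the content into memory
    start = time.perf_counter()
    load_map(map_name)            # measured run: cache is already warm
    return time.perf_counter() - start
```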
The following graphs all relate to the “nd_tokyo” map. This graph shows how texture memory usage has been optimized over time. Each colour is a different type of texture stored in memory, and the units are megabytes.
The last graph is like the texture memory one, but shows how long each game component takes (on average) to process during each frame. As with memory usage, this has been optimized over time. But as you can see, we have had a problem in the last few days which has caused the main renderer to take longer to process than it usually does, resulting in lower FPS. This is definitely not good, but it will be resolved, and at least you can’t say I have faked the results for this post!
The y-axis unit is “frame time” in milliseconds. The formula to calculate FPS from this data is: `1000 / frame time`. So before the recent problem, “nd_tokyo” was averaging about 133 frames-per-second on this test machine.
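In code, the conversion is trivial; here it is in Python, using the ~133 FPS figure mentioned above, which works out to roughly 7.5 ms per frame:

```python
def frame_time_to_fps(frame_time_ms):
    """Convert an average frame time in milliseconds to frames per second."""
    return 1000.0 / frame_time_ms

print(frame_time_to_fps(7.5))   # ~133 FPS, roughly where nd_tokyo sat before the regression
```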
The last two images should be familiar to most people who have seen Valve’s HL2 Episode 2 stats page. They effectively show the worst places to stand if you want to stay alive. The “hotter” the colours (yellow, orange, red, clear), the more deaths have occurred around that area of the map; the “cooler” the colours (green, blue), the fewer deaths have occurred in that area. The first image is a full historical death-map for “nd_tokyo”, and the second image shows a heat-map from a single testing session.
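For those wondering how a death heat-map like this is put together, the basic idea is just a 2D histogram of recorded death positions rendered over the map overview. Below is a minimal sketch with numpy and matplotlib; the coordinates, bin count, and colour map are arbitrary choices for illustration, not our actual rendering code:

```python
# Minimal heat-map sketch: bin death positions into a 2D histogram and plot it.
# The coordinates, bin count, and colour map are arbitrary illustration values.
import numpy as np
import matplotlib.pyplot as plt

# x/y world coordinates of every recorded death on the map (made-up sample)
deaths_x = np.array([812.4, 790.1, -220.0, -215.5, 640.0])
deaths_y = np.array([-1440.0, -1398.6, 300.2, 310.9, -90.0])

heat, xedges, yedges = np.histogram2d(deaths_x, deaths_y, bins=64)

plt.imshow(heat.T, origin="lower", cmap="hot",
           extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]])
plt.title("nd_tokyo death heat-map (sketch)")
plt.show()
```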
We have quite a few other statistics graphs built from this data, but those are for another time.