A team running a pizza-themed game wants to see whether giving out new, unique toppings as part of a real-money starter pack would improve player conversion, engagement, and satisfaction. Some members of the team feel that this is the best way forward, and also suggest increasing the price of the pack so that players perceive its value to be high. They believe that this change will have no downside.

Other members of the team disagree. They contend that monetizing pizza toppings in this way risks turning players off, and that combining it with a price increase is too much change all at once. This group favors separating the changes to ensure that each one brings positive, quantifiable results.

Dilemmas like this regularly confront game and product teams. Improving KPIs is always a big focus, but so is keeping the game fun and enjoyable. In the pizza example above, the dilemma is about the pace of change: is it fine to push through a big change and “get on with it,” or is it better to be slow and careful? So how should you resolve it? With A/B testing!

A/B testing has long been the norm when it comes to making changes in a gaming product. While this norm is most historically associated with mobile, today it applies to many kinds of game across almost all platforms (the main exceptions being platforms whose publishing policies make it inherently difficult). Its purpose is very simple: to make your KPIs trend positive. By testing different versions of a game against two or more audience cohorts, you can measure how changes affect your KPIs, and so choose the right path based on evidence rather than trusting your gut.
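
As a rough illustration of how players might be split into cohorts, the sketch below hashes a player ID together with a test name, so that the same player always lands in the same variant and different tests split players independently of each other. The function name, the ID format, and the variant labels are all hypothetical; in practice this is usually handled by whatever remote configuration or analytics service the game already uses.

```python
import hashlib
from typing import Sequence


def assign_variant(player_id: str, test_name: str, variants: Sequence[str]) -> str:
    """Deterministically map a player to one variant of one test.

    Hypothetical helper: hashing the test name together with the player ID
    means a player keeps the same variant for the life of a test, while
    a different test produces an unrelated split.
    """
    key = f"{test_name}:{player_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the hash as a large integer bucket.
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


# Example with made-up names: a control group and one variant.
variant = assign_variant("player-1234", "starter_pack_toppings",
                         ["control", "new_toppings"])
print(variant)
```

Because the assignment depends only on the player ID and the test name, it can be recomputed anywhere (client, server, or analysis scripts) without storing a lookup table.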

However, while the purpose of A/B testing may be simple, the application of it as a product management methodology for games can be very complicated. Studios new to A/B testing often make many mistakes, from overcomplicating their test conditions to poorly applying statistical analysis to evaluate results. Proper A/B testing requires a significant degree of rigor and patience, and a willingness to interpret the results of a test honestly rather than rationalizing negative results. It also requires a high degree of clarity.
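
To make the statistical side a little more concrete, here is a minimal sketch of one common way to compare conversion rates between a control and a variant: a two-sided, two-proportion z-test. It is not the only valid method, and the player counts in the example are invented purely for illustration.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-sided z-test for a difference between two conversion rates.

    conv_a / n_a: converting players and total players in the control.
    conv_b / n_b: converting players and total players in the variant.
    Returns (z, p_value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up numbers: 200 of 10,000 control players bought the starter pack,
# versus 260 of 10,000 players who saw the variant.
z, p = two_proportion_z_test(200, 10_000, 260, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference is unlikely to be noise, but the threshold you act on should be agreed before the test starts, not after you have seen the results.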

The best way that we have found to approach A/B testing is to be as clear as possible about the questions you’re asking, and for the team to agree on the goals of those questions. Once those two things are understood, A/B testing becomes a process through which studios diligently and systematically vary specific factors within their game and assess results. They generally vary only one factor while keeping all other internal and external factors constant and thereby understand the impact of the singular change. 

There’s a lot more to it than that, of course. Knowing your KPI roadmap, knowing which remote configuration variables let you test particular aspects of a feature, and understanding how serious the impact of a given change could be are all important when you start to A/B test. There are many considerations to keep in mind through the process, such as these:

  • Always create an A/B test plan. An A/B test plan can be as simple as a series of statements or questions that you want to answer. Be clear about your questions, and make sure they have clear answers. An example of a good question would be whether changing the price of an in-game item causes players to visit the Market. An example of a question you can’t really answer would be whether players like the change.
  • Loop engineering in early to give them lead time in case there are features required for your A/B testing.
  • Create a central tracking document to chart out your plan. This document should contain a list of your A/B tests, as well as their priority or order. It should include start date, end date, platforms, regions, the hypothesis you’re testing, KPIs, variants, results, and any other relevant links. Getting this together early will help keep you organized, and it can be reviewed and updated as you go. Use a consistent naming convention for your A/B tests that includes the date and the topic, e.g. “economy1b_2022620”.
  • If you’re just beginning to use A/B testing, run one at a time and see how the process works before trying multiple tests.
  • When you do run multiple tests, always ensure that the tests do not touch the same parts of your game. Do not, for example, run two separate tests on grain pricing simultaneously. If you do, the results will corrupt each other and leave you with no useful information. Good separations include things like testing changes on entirely separate levels. Poor separations include things like testing product price and pop-up offers at the same time.
  • Ensure that the user groups for each variant do not overlap in any way. For example, if some players in a multiplayer game experience a change in matchmaking, but others do not, don’t matchmake them together in battles.
  • If you need results quickly, try to limit the number of variants you test at once. More variants require more users, and tend to produce less definitive results.
  • If you need new users for your test, loop in marketing/UA early. If you’re testing FTUE completion rates or changes meant to impact early retention, you’ll need new users and a user acquisition plan that will bring in qualified users in sufficient quantities when you run the test.
  • Don’t start new tests while others are still running. Always call and stop all current tests before running new ones to keep results clear and avoid muddying the results of both sets of A/B tests.
  • Try to split tests and test results by platform (Android vs. iOS). There can be significant differences in how users from each platform interact with a game (possibly due to culture, demographics, economic status, etc.), such that mixing the two often leads to corrupted results.
  • Do not stop a test before the results are in. This sounds obvious, but if, for example, you’re testing the impact of a change on day-14 retention, you need to run that test for 14 days after you have attained enough users. If you stop the test midway, users who experienced the change likely get dumped back into the general population with an experience that alters their behavior compared to existing users. This makes the day-14 test data invalid.
  • Tell your community when changes are being made. Switching to the winning variant can derail player strategy and cause dissatisfaction. Even if you end up choosing your control, consider messaging players in the variant and taking the necessary steps to ensure that they don’t have issues shifting back.
  • Finally, remember that not all A/B tests are the same, and hence all of the above factors need to be weighed for each test rather than just assuming they will work. 
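
To show why more variants require more users, here is a rough sketch of a sample size estimate for a conversion test, using the standard normal-approximation formula for comparing two proportions. The function name and the example rates are hypothetical. The Bonferroni-style adjustment for multiple variants is one common, conservative choice and is an assumption of this sketch, not the only option.

```python
import math
from statistics import NormalDist


def users_per_variant(baseline: float, target: float, alpha: float = 0.05,
                      power: float = 0.8, comparisons: int = 1) -> int:
    """Approximate users needed in EACH group to detect baseline -> target.

    baseline:    current conversion rate (e.g. 0.02 for 2%).
    target:      rate you hope the variant reaches.
    alpha:       accepted false-positive rate across the whole test.
    power:       chance of detecting the change if it is real.
    comparisons: number of variants compared against the control; alpha is
                 split across them (Bonferroni-style, an assumption here).
    """
    normal = NormalDist()
    z_alpha = normal.inv_cdf(1 - (alpha / comparisons) / 2)
    z_beta = normal.inv_cdf(power)
    variance = baseline * (1 - baseline) + target * (1 - target)
    n = (z_alpha + z_beta) ** 2 * variance / (target - baseline) ** 2
    return math.ceil(n)


# Made-up rates: hoping to lift starter pack conversion from 2.0% to 2.6%.
for variant_count in (1, 2, 3):
    per_group = users_per_variant(0.02, 0.026, comparisons=variant_count)
    total = per_group * (variant_count + 1)  # variants plus the control
    print(f"{variant_count} variant(s): {per_group} per group, {total} total")
```

Each extra variant adds another full group of users and, under this adjustment, also raises the number needed per group. For a day-14 retention test, remember that the 14 days start counting only once the last required user has joined.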

Happy testing!

(Photo by Letizia Bordoni on Unsplash)