Making a card game is hard! Players want decks that are strong, but they also want decks that are unique and represent their individuality. They want the game to be fair, but they also want the game to be diverse and exciting. These games have hundreds of cards that interact in all sorts of complex and surprising ways, which simultaneously makes them very varied and very hard to balance.

The primary defense against broken cards is playtesting – you get a team of people together and have them play your game for months on end. In this process, they get a feel for each card’s strength relative to every other card and they can suggest balance changes. It’s a simple system, but it has some flaws.

These prototype Magic: The Gathering cards are used for rapid playtesting before the final card art is created.

For a start, people are expensive! Unless you’re a major studio, hiring even 1 or 2 dedicated playtesters can be a serious drain on your resources. But the worst part is that playtesters are actually pretty bad at balancing games. That’s not an insult, but rather a recognition of the fact that most games are designed to be hard to solve. If the average player could quickly figure out which cards are the strongest, the game would become stale almost instantly. Instead, card games want to encourage ambiguity and debate over which cards are better in various situations.

As a result, playtesters often miss broken cards. Maybe the card just flew under their radar, or maybe it’s only broken when combined with several other cards that haven’t been fully considered. Or maybe the playtester likes some cards more than others, and therefore doesn’t give them all a fair shake.

The fallibility of individual playtesters pushes many card game companies to hire large numbers of testers. This way, the blind spots of one tester have a good chance of being cancelled out by the other testers. But this makes the financial cost problem even bigger! And even in the case of Magic: The Gathering (which may be the most intensely playtested card game ever made), some broken cards still slip through the cracks.
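Here is a rough back-of-the-envelope sketch of why more testers help but never fully solve the problem. It assumes (purely for illustration) that each tester independently has the same chance of spotting a given broken card; the function name `miss_probability` and the 30% figure are made up for the example, not measured data.

```python
def miss_probability(p_catch: float, testers: int) -> float:
    """Chance that *every* tester misses a broken card.

    Assumption: each tester spots the card independently with
    probability p_catch, so all of them miss with (1 - p_catch) ** testers.
    """
    return (1.0 - p_catch) ** testers


if __name__ == "__main__":
    # Hypothetical figure: each tester has a 30% chance of spotting the card.
    for team_size in (1, 5, 10, 20):
        chance = miss_probability(0.3, team_size)
        print(f"{team_size:>2} testers -> {chance:.1%} chance the card slips through")
```

Under these assumptions the miss chance shrinks quickly as the team grows, but it never reaches zero, and real testers share blind spots, so the true numbers are probably worse.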

For the time being, playtesting is still the best way to balance games. But if we look just a few years forwards, maybe there’s something we can do to make the lives of playtesters easier.

The Rise of AI Gamers

We likely all know that Google’s AlphaGo defeated a world champion Go player in 2016 (a feat many experts thought was decades away!).

You’d be making a similar face if you had to play against AlphaGo

But did you know that AlphaGo is totally obsolete these days? In 2017 Google created AlphaGo Zero, a new system that was stronger and also easier to train. And whereas the original AlphaGo relied on a database of professional Go games, AlphaGo Zero learned entirely from playing itself.

Just a few months later, AlphaGo Zero was also obsolete. Google’s latest system, simply named AlphaZero, is even stronger and even easier to train. But the truly stunning thing about AlphaZero is that it also plays Chess and Shogi. The system managed to reach superhuman ability in all 3 games in under 24 hours, purely by playing itself! AlphaZero is a (relatively) general-purpose gaming genius.

Google aren’t the only ones working on this kind of research. Last year, OpenAI created a bot that defeated pro players in 1v1 Dota 2 matches. Today, they’re working on a full 5v5 version.

It’s debatable whether Dota is really more complex than Go. But either way, this race is starting to heat up!

Hearthstone is a complicated game with lots of small decisions that influence your odds of winning. But it’s nowhere near as complicated as Go – a game renowned for its staggering complexity. If AlphaZero can play Go, Chess and Shogi, I see no reason why a similar system couldn’t also learn to play Hearthstone to a superhuman level. I can only see 3 significant barriers:

1) Hardware. AlphaZero was trained on 5,000 of Google’s TPUs (Tensor Processing Units, custom machine-learning chips). That’s a lot of hardware! But computer hardware tends to fall in price very rapidly, meaning this training power could be very affordable in just 5 or 10 years. A large developer (like Blizzard) could easily afford it right now!

2) Software. As of right now, only Google knows the full details of how AlphaZero was built. But they’ve also released some research papers that explain the general architecture of the system, giving open source (and commercial) projects a chance to catch up. In 5 or 10 years, I’d be very surprised if the technology behind AlphaZero isn’t widely available.

3) Developer Will. Building an AI system to play your game at a superhuman level is a big undertaking (for now, at least). There are probably dozens of things developers would rather do with their time. But in the next section of this article, I’m going to explain how such a system could be used and why it would be so hugely beneficial.

AI Assisted Testing

Here’s the design process I’m envisioning:

  1. The dev team makes a first draft of some cards.
  2. Overnight, an AI system learns how to play the game to a superhuman level.
  3. In the morning, the AI system tells the developers which cards (and decks) were the strongest and by how much.
  4. The dev team can use this information to rebalance their cards. Repeat from step 2.

This process is very similar to what card game developers already do. The big difference is they get accurate data overnight, instead of approximate data after many days or weeks of playtesting. This could radically increase iteration speed for card balancing, while also catching the most egregious balance mistakes well in advance of release.
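To make step 3 a little more concrete, here is a minimal sketch of what the morning report might compute. It assumes the overnight self-play run has already produced a log of matches as (winning deck, losing deck) pairs. The function names, the 55% threshold and the placeholder card names are all hypothetical, and a real system would need far more games plus some statistical care.

```python
from collections import defaultdict


def card_win_rates(matches):
    """Return {card: win rate} from (winning_deck, losing_deck) pairs.

    Each deck is a collection of card names. A card is credited with a
    win whenever it appears in the winning deck.
    """
    played = defaultdict(int)
    won = defaultdict(int)
    for winning_deck, losing_deck in matches:
        for card in set(winning_deck):
            played[card] += 1
            won[card] += 1
        for card in set(losing_deck):
            played[card] += 1
    return {card: won[card] / played[card] for card in played}


def flag_outliers(rates, threshold=0.55):
    """List cards whose win rate exceeds the threshold, strongest first."""
    flagged = [card for card, rate in rates.items() if rate > threshold]
    return sorted(flagged, key=lambda card: -rates[card])


if __name__ == "__main__":
    # Placeholder match log; in practice this would come from self-play.
    matches = [
        ({"Alpha", "Bravo"}, {"Charlie", "Delta"}),
        ({"Alpha", "Charlie"}, {"Bravo", "Delta"}),
        ({"Alpha", "Delta"}, {"Bravo", "Charlie"}),
        ({"Bravo", "Charlie"}, {"Alpha", "Delta"}),
    ]
    rates = card_win_rates(matches)
    for card in flag_outliers(rates):
        print(f"{card}: {rates[card]:.0%} win rate, worth a second look")
```

Per-card win rate is a deliberately crude measure. It would not, by itself, separate a card that is strong on its own from one that is only strong in a specific combo, so a real report would probably also break results down by deck archetype.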

Let’s look at a concrete example. The recent Kobolds and Catacombs expansion for Hearthstone contained a particularly troublesome card – Corridor Creeper.

A 7-cost 5/5 may not seem all that powerful, but this card gets heavily discounted as minions die: its cost drops by 1 for every minion that dies while it’s in your hand. It also just so happens that one of the most powerful deck archetypes for the past 2 years has been pirate aggro – where the player runs a large number of small, aggressive pirate creatures that can swing in for a couple of damage before dying. This synergises perfectly with Corridor Creeper, making it an incredibly powerful card!

The card was so strong that pro players Reynad and Firebat encouraged working Corridor Creeper and pirates into virtually every aggro deck (they are all neutral cards, meaning they can be played in every class).
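To put rough numbers on the Creeper discount, here is a tiny sketch. It assumes the cost drops by 1 mana for each minion that dies while the card is in hand and that the cost can’t go below 0; `discounted_cost` is a made-up helper, not anything from the game’s code.

```python
def discounted_cost(base_cost: int, minion_deaths: int) -> int:
    """Mana cost after the discount.

    Assumption: 1 mana off per minion death while the card is in hand,
    with the cost floored at 0.
    """
    return max(0, base_cost - minion_deaths)


if __name__ == "__main__":
    for deaths in (0, 2, 4, 7):
        cost = discounted_cost(7, deaths)
        print(f"{deaths} minion deaths -> a 5/5 for {cost} mana")
```

An aggro deck full of cheap, disposable minions racks up those deaths within the first few turns, which is why the discount mattered so much in practice.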

Iksar, a Hearthstone developer who focuses on balancing cards, later said this about Corridor Creeper:

“I think the oversight was that we didn’t try it [in conjunction] with many non-token cards. There are 135 cards [in a new set], and we have a lot of time, but not enough to test one card in every single possible deck. So we use our best judgment, and in this instance our judgment call was wrong on Corridor Creeper. We still want to push things, because there are a lot of dangers in being too safe. Everything ends up underpowered and nothing changes.”

It’s fair enough – you can’t catch these things every single time just by playtesting. And they did eventually nerf the card (lowering its attack from 5 to 2, something many would consider overkill). But a sufficiently powerful AI could have quickly converged on Creeper Pirates, letting them fix the issue long before the expansion was released.

Looking Forwards

This kind of AI system wouldn’t replace playtesters or designers. We would still need playtesters to evaluate how much fun a deck is to play and how much skill it requires. We would still need designers to create the cards and to decide on appropriate changes.

But by augmenting the design process with an AI we could take a lot of the guesswork, drudgery and fear out of balancing. We could spend more time making wildly creative cards, safe in the knowledge that any truly broken combos will be caught.

The technology for all of this is still a little ways out, but with some dedicated attention (especially by large developers) I think we could make it happen pretty soon! Even if things progress slowly, I predict that within 10 years most complex PvP games will be balanced in part by an AI.

How quickly do you expect this might happen? Are there any other ways in which these AIs could take some of the heavy lifting out of game development? Am I just barking up the wrong tree? I’d love to hear your thoughts in the Reddit comments.


If you liked this article, why not read some of my other ones? (like this one on placing limits on fast travel systems).

I work full time as the sole developer on Patch Quest, a roguelike-inspired exploration game for PC. You can learn about the latest update by watching this video, and you can help support me by subscribing or playing the game (it’s free!). Your support is what lets me make time to write these articles. Thanks again!