🤖 My First AI
I was always curious about how an AI in a game works. To be honest, it is quite a complex topic, and I thought it could be interesting to learn how to build an AI for a small board game.
Instead of choosing the typical Tic-Tac-Toe, I decided to use Mancala. Mancala is an ancient family of board games, played widely across Africa and the Middle East, whose rules are not very complicated, and I thought it could be a good start for learning how to build my first AI.
The goal of this post is not to explain how Mancala works, but rather to focus on how I implemented the AI algorithm that plays it. If you aren't familiar with the game, I recommend reading its rules and playing one or two games to get a feel for it.
Q-Learning
There are several algorithms for building a game AI, but the one I will use is Q-Learning. In short, Q-learning is a reinforcement learning algorithm that learns the value of taking a given action in a given state. It helps an agent learn to maximize its total reward over time through repeated interactions with the environment, even when no model of that environment is known.
In Mancala's case, the state is the number of seeds in every pit, and the algorithm stores, for each state, its possible actions and their learned values.
How the algorithm works
Imagine the following situation (state) S1. The players' mancalas are on the sides: in this example, Player2's mancala contains 5 seeds and Player1's mancala contains 8 seeds. The 12 pits are placed in two rows of 6 pits each:
 _____Player2_____________________________________________________
/   _____   ____   ____   ____   ____   ____   ____               \
/  |     | [____] [____] [__1_] [____] [____] [____]    ____      \
/  |  5  |                                             |    |     \
/  |_____|  ____   ____   ____   ____   ____   ____    |  8 |     \
/          [____] [____] [____] [____] [____] [__1_]   |____|     \
/                                                                 \
/____________________________________________________*Player1_____\
Player1 has only one option (action), A5. If it plays that action, the game is finished and Player1 wins.
The algorithm then records that this state-action pair, and all the previous states and actions that led to it, result in a win:
(S1, A5) = 1
Before this state, we had a previous one, e.g.:
 _____Player2_____________________________________________________
/   _____   ____   ____   ____   ____   ____   ____               \
/  |     | [____] [____] [____] [__1_] [____] [____]    ____      \
/  |  5  |                                             |    |     \
/  |_____|  ____   ____   ____   ____   ____   ____    |  8 |     \
/          [____] [____] [____] [____] [_1__] [____]   |____|     \
/                                                                 \
/____________________________________________________*Player1_____\
In this state S2, Player1 has one action, A4. If it plays that action, the game continues, so the algorithm would save something like this:
(S2, A4) = 0
So, basically, it assigns a score to the different states and actions. By playing many games (training), the algorithm's knowledge of states and actions grows, and it becomes able to make intelligent decisions.
Playing a game
If you want to try it out, don't hesitate to check out my repo: https://github.com/manuelarte/mancala-go