Reinforcement Learning for Tic-Tac-Toe
Introduction
In this project reinforcement learning is used for implementing Tic-Tac-Toe game.
There are two scripts for learning tt.cgi and for playing play.cgi.
The script for playing is using values obtained during the learning and saved in the text file. You can visit the working demo at Tic-Tac-Toe Game.
Here is the latest version for Tic Tac Toe game program.
Learning
Learning can be done by running perl script tt.cgi from command prompt.
After results of learning are saved and transfered to web server someone can play with the program over the web.
Learning is using a simple formula for update:
V(s)=V(s)+alpha(V(s')-V(s))
where s is state after move, s' is state before move
The learned values are saved in the the file level1.txt in the format
s:V(s)
The board position s is represented by string of 9 digits
where 0 for 0 , 1 for RL program or x and 2 for emty cell
During the learning the program counts the number of wins for each 500 games and prints all results in the end.
That number always bigger in the end then in the beginning because the program starts without any knowledge how to play. (See table from Observations section)
Playing
After learning is completed the file with values should be copied to directory with the script play.cgi.
The player always is doing move first by clicking on the cell and the script will mark that position and also make own move.
Observations
By increasing the number of games in learning we can improve learning results however there is limit.
After 40000 number of wins for each 500 games during of learning is almost the same. But perfomance of such agent in the game with real opponent is bad.
The demo is using agent trained in 400000 games. The perfomance is better.
Learning can also be improved by changing alpha. Here is example of results of learning for alpha=0.02 , 0.4 and alpha=0.07-0.06*n/N
| alpha\N | 500 | 1000 | 1500 | 2000 | 2500 | 3000 | 3500 | 4000 | 4500 | 5000 | 5500 | 6000 | 6500 | 7000 | 7500 | 8000 | 8500 | 9000 | 9500 | 10000 |
| 0.02 | 272 | 317 | 346 | 375 | 398 | 437 | 452 | 457 | 448 | 447 | 455 | 445 | 444 | 455 | 455 | 446 | 454 | 452 | 438 | 455 |
| 0.4 | 290 | 339 | 374 | 380 | 376 | 383 | 380 | 406 | 394 | 410 | 405 | 412 | 414 | 411 | 426 | 405 | 410 | 395 | 412 | 404 |
| 0.07-0.06*n/N | 271 | 318 | 391 | 438 | 438 | 434 | 439 | 462 | 440 | 447 | 451 | 447 | 444 | 448 | 456 | 450 | 440 | 447 | 456 | 457 |
Thus we saw the example how reinforcement learning can be used for training agent to play in game programming.
Source Code (Perl)
Data file (learned values from training for using to play)
perl Source Code for Tic-Tac_Toe Game
Working Demo: Tic-Tac-Toe Game
Comments, suggestions
Links
Applied Math and Computer Science Lab
ScriptSearch.com
|