Applied Math & Computer Science Lab
Data Analysis, Optimization & Mathematical Modeling, Artificial Intelligence, Neural Net For Everyday Life Applications
Artificial Intelligence/Data Mining Links Webmaster Resources AMCSL Forum: Web Mining Submit Link Archive

Reinforcement Learning for Tic-Tac-Toe



Introduction

In this project reinforcement learning is used for implementing Tic-Tac-Toe game. There are two scripts for learning tt.cgi and for playing play.cgi. The script for playing is using values obtained during the learning and saved in the text file. You can visit the working demo at Tic-Tac-Toe Game. Here is the latest version for Tic Tac Toe game program.

Learning

Learning can be done by running perl script tt.cgi from command prompt.
After results of learning are saved and transfered to web server someone can play with the program over the web. Learning is using a simple formula for update:
V(s)=V(s)+alpha(V(s')-V(s))
where s is state after move, s' is state before move
The learned values are saved in the the file level1.txt in the format s:V(s)
The board position s is represented by string of 9 digits where 0 for 0 , 1 for RL program or x and 2 for emty cell
During the learning the program counts the number of wins for each 500 games and prints all results in the end. That number always bigger in the end then in the beginning because the program starts without any knowledge how to play. (See table from Observations section)

Playing

After learning is completed the file with values should be copied to directory with the script play.cgi. The player always is doing move first by clicking on the cell and the script will mark that position and also make own move.

Observations

By increasing the number of games in learning we can improve learning results however there is limit. After 40000 number of wins for each 500 games during of learning is almost the same. But perfomance of such agent in the game with real opponent is bad. The demo is using agent trained in 400000 games. The perfomance is better. Learning can also be improved by changing alpha. Here is example of results of learning for alpha=0.02 , 0.4 and alpha=0.07-0.06*n/N

alpha\N50010001500200025003000350040004500500055006000650070007500800085009000950010000
0.02272 317 346 375 398 437 452 457 448 447 455 445 444 455 455 446 454 452 438 455
0.4290 339 374 380 376 383 380 406 394 410 405 412 414 411 426 405 410 395 412 404
0.07-0.06*n/N271 318 391 438 438 434 439 462 440 447 451 447 444 448 456 450 440 447 456 457
Thus we saw the example how reinforcement learning can be used for training agent to play in game programming.

Source Code (Perl)

Data file (learned values from training for using to play)
perl Source Code for Tic-Tac_Toe Game
Working Demo: Tic-Tac-Toe Game
Comments, suggestions

Links

Applied Math and Computer Science Lab




Add Me!      
ScriptSearch.com