Projects:2017s1-121 Learning Procedural Knowledge using Random Forests
Contents
- 1 Project Team
- 2 Abstract
- 3 Introduction & Motivation
- 4 Objectives
- 5 Background
- 6 Project Method
- 7 Project Results
- 8 Project Conclustion
- 9 Future Project Recommendations
Project Team
Students: Woong-Ji Choi and Yiming Shitao
Supervisor: Dr Braden Phillips and Xin Yuan
Abstract
This project aims to produce artificially intelligent(AI) programs, or agents, that are automatically generated using a learning process. The agents are written in a production rule language and are created by inferring decision trees. A decision tree describes a series of conditions that link observed variables to a conclusion. Decision trees can be generated automatically (or inferred) from training data and random forests of inferred trees have often proven to be very effective classifiers. Although it is less common, decision trees can also be used for AI control of agents. In effect, the agent's behaviour is learned by the system rather than programmed in to it. As outcome of the project, it was possible to produce AI agents by using random forest for simple games such as Tic-Tac-toe and Connect 4. However, the behaviour of the AI agents depend strongly on the quality of training data and gathering these data is a time-consuming process. Also, the AI agent was only able to make decisions based on current environment state and can not foresee the future outcome. Hence, it was concluded that using random forest alone is not an optimal way of developing more complex AI agents. As a solution, it was suggested that a further research needs to be carried out to overcome the limitations of random forest by combining it with reinforcement learning.
Introduction & Motivation
In current technology, a computer requires a set of specific rules or an algorithm that pre- defines the process to execute any given task [1]. Therefore, successfully completing a task relies on the presence of a precise and extensive set of instructions. Existing computer systems lack the ability to: execute tasks through examples and previously executed tasks; observe and imitate experts to acquire new abilities; or improve on errors previously made [1] [2]. As computing tasks continue to become more complex, manually defining and implementing these rules is challenging and time consuming, hence demanding extra time and expertise from programmers. Machine learning methods have the potential to overcome these limitations [1] [3]. Machine learning involves the learning process of algorithms which can automatically be generated from studying data, learning from examples (models) and a process of developing computer systems or programs which automatically improve with experience [4]. In recent years there has been great progress in the use of machine learning for classification tasks [5] [3]. Using machine learning to generate agents is also possible and is beginning to emerge as an important research topic. Significant examples from recent years include developing new reinforcement learning algorithm for robot soccer [6] and applying machine learning strategy for controlling mobile robots [7] . Similar to computer programs and algorithms, developing an artificial intelligent (AI) agent is also usually rule-based, meaning it requires programmers to define a set of rules in order to perform a task. Additionally, like defining algorithms, developing AI agents are a complex and time-consuming process as its reasoning and decision-making abilities must mimic or resemble those of a human [8]. Utilising machine learning to develop AI agents can enhance fast development, reduce the workload of the developer but most importantly, results in more versatile AI agents through their ability to learn [1].
Objectives
The purpose of this project is to produce AI agents using random forests, a well-known machine learning method. The first objective is to produce a training dataset which is used as an input for random forests. The next objective is to Convert the production rules into a functional agent in Java language and develop a program to demonstrate the games using the AI agent. Here, another goal is to investigate the effects of different training data size has on the production rules, hence, the performance of AI agent. Finally, the last objective is to achieve expert behaviour of the agents.
Background
Machine Learning
Machine learning is referred to as the autonomous learning ability of a computer through series of models [8]. There are two types of machine learning: supervised and unsupervised. Unsupervised learning uses a set of data without any class information and reveals patterns or features hidden within the data [9]. On the other hand, supervised learning uses classified training data, which contains an input and a desired output to learn a function and predict a class for any valid data [9]. Supervised learning usually involves classification tasks. Decision trees and random forests are well-known examples of supervised learning methods [9].
Decision Trees
A decision tree is supervised logic based machine learning method ultimately used to classify data, through categorizing information depending on values or features [3]. Decision tree contains two types of decision nodes: root nodes (also known as decision nodes); and leaf nodes [10]. Figure 1 represents an example of a decision tree. The node at the upmost part of the tree is referred to as the Root Node where the classification initiates. From the decision node are multiple branches, which then lead to leaf nodes. Leaf nodes signify a result or classification. The set of rules that results a in classification is called the production rules. The production rules can easily be defined by translating each path from a leaf node to the root node of a decision tree into the form of “if-then” statement (Figure 1).
Random Forest
Random forest is a machine learning method used for classification that consists of multiple decision trees. A random selection of data (or variables) is used to generate the decision trees. This results in an ensemble of classifiers [11]. When presented with data to classify, the results from the decision trees may differ. The final classification for the input data is determined through casting a majority vote. The most popular classification among the various decision trees is output as the final result. The process is illustrated in Figure 2 [11] [12]. To achieve correct classification, it is crucial that the set of data chosen to generate the random forests [CHECK THIS] contains important features [13]. However, this is not always the case as the dataset is randomly selected and may contain impurities. Random forest provides a process called “feature selection” which involves developing a model that only includes the most important feature from a random dataset [11]. Therefore, random forest has advantages on minimising the generalization error and does not overfit (too many features effecting the overall accuracy in prediction) [13]. Other advantages of random forests include the effectiveness in computing and short amount of time required to train which makes random forest a good classifier [13].
C5.0
C5.0 is a software platform used as a data mining tool which involves a process of evaluating patterns from a set of data [14]. The evaluated patterns are categorised and constructed into classifiers [14]. C5.0 uses the classification techniques discussed in 2.2 and 2.3. A random set of input data produces classifiers in the form of decision trees and by majority voting the final output classification is generated. As a result, the set of rules (production rules) can be found automatically which describe the classifier [14].
Reinforcement Learning
RL is a type of machine learning which its agents learn the desired behaviour through interacting with the environment [6]. The objective of RL is to obtain the best action based on the current state [7]. The RL agents learns the behaviour through the process of rewarding/penalising the actions it has taken [7] and the RL algorithm aims to select actions that maximizes the expected cumulative rewards [6].
Project Method
1. Developing Training Dataset
Tic-Tac-Toe Training Dataset
The steps taken to generate the training dataset are:
1) Play the game of Tic-Tac-Toe and record current state of the board in a line followed by a number that represents the position of the board where next move will be made. Starting from the left, each letter corresponds to state of the board position 1 to 9 (e.g. N,N,N,N,N,N,N,N,N will represent the initial state of the board since no move is made by players).
2) Repeat step 1) until the game is over.
3) Include recorded results in the data file.
4) Repeat step 1) to 3) for each game played.
5) To investigate the effect data size has on the agent’s behaviour, Record 5,10,15 and 20 games of data separately to form training datasets. Each training dataset will be used to generate production rules for the AI agent in later stage.
Connect 4 Training Dataset
following steps are taken to generate the training dataset.
1) Play the game of Connect 4 and record current state of the board in a line followed by a number that represents the column of the board where next move will be made. Starting from the left, each letter corresponds to state of the board column number 1 to 7 (e.g. N,N,N,N,N,N,N will represent the initial state of the bottom row since no move is made by players). However, there are six rows in the board therefore, total of forty-two letters are required to display the board state.
2) Repeat step 1) until the game is over.
3) Include recorded results in the data file.
4) Repeat step 1) to 3) for each game played.
5) To investigate the effect data size has on the agent’s behaviour, Record 10,20,30,40 and 50 games of data separately to form training datasets. Each training dataset will be used to generate production rules for the AI agent in later stage.
2. Converting Decision Trees to Java Agent
Tic-Tac-Toe Agent
The steps involved in converting production rules to Tic-Tac-Toe agent in Java are:
1) Open C5.0 software and locate the training dataset from Section 4.1.1.
2) Click on “Construct Classifier” button in the toolbar. This shows the classifier construction options. Construct classifier with default setting and click OK. The resulting decision trees are displayed in a new window with details.
3) Copy and save the decision trees.
4) Repeat step 1) to 3) with different sizes of each training datasets.
5) Translate each set of production rules from resulting decision trees into Java language. This is essentially the agent for the game for each different sizes of training dataset.
Connect 4 Agent
The steps involved in converting production rules to Connect 4 agent in Java are:
1) Open C5.0 software and locate the training dataset from Section 4.1.2.
2) Click on “Construct Classifier” button in the toolbar. This shows the classifier construction options. Construct classifier with default setting and click OK. The resulting decision trees are displayed in a new window with details.
3) Copy and save the decision trees.
4) Repeat step 1) to 3) with different sizes of each training datasets.
5) Translate each set of production rules from resulting decision trees into Java language. This is essentially the agent for the game for each different sizes of training dataset.
3. Simulating & Testing the Performance of Agent
Java Programs for Game Environment
To simulate and test the game of Tic-Tac-Toe and Connect 4, gaming environment is required for both games in Java. The program developing processes are:
1) Write a program for the agent which will be used in a main program.
2) Write a program to construct the board for the game.
3) Develop a function to display and update the board state.
4) Develop a function which takes an input from user.
5) Write a main program that uses above functions and programs. This main program also implements the rules of the games.
Testing the Performance of Agents
The performance of the agents will be recorded by simulating the game. The desired performance of the agent is to either win or at least, draw the game. The testing procedures are as follows:
1) Use the programs and agent developed in previous section to play both games and record the results for twenty games.
2) Repeat step 1) with each agent with different training dataset size.
3) Calculate the winning rate for each agent.
Improving Performance of Agent
Tests are performed to discover undesired behaviour of the agent. These errors are corrected by including new training data for particular error cases into existing training dataset. The updated training dataset will undergo the agent development cycle depicted in Figure 4 and as a result, the performance of the agent will be improved.
The steps for improving performance of agent are:
1) Simulate both games and test the performance of each agent.
2) When undesired behaviour of the agents occurs, record the conditions (e.g. the state of the board and next move made).
3) Include a new training case into existing training dataset. Repeat step 1) to 3) to improve the performance.
Project Results
Training Dataset for Tic-Tac-Toe
Training Dataset for Connect 4
Figure 9 and 10 represents the examples of the training dataset produced for the game of Connect 4 by following the steps in Section 4.1.2. The largest training dataset contains fifty games of training data. Both data and names files are successfully defined and each training data cases are separated by space between them.