Projects:2021s1-13332 Artificial General Intelligence in fully autonomous systems
Project team
Project students
- Chaoyong Huang
- Jingke Li
- Ruslan Mugalimov
- Sze Yee Lim
Supervisors
- Prof. Peng Shi
- Prof. Cheng-Chew Lim
Advisors
- Dr. Xin Yuan
- Yang Fei
- Zhi Lian
Introduction
Artificial Intelligence (AI) has driven many innovations across industries in recent years. According to Elon Musk’s interview with the New York Times, within five years we will have machines vastly smarter than humans in narrow functions and applications, such as recognition and prediction. However, this is only the first stage of “the AI revolution”. Smarter machines will need to achieve human-level intelligence and recursive self-improvement. This category of AI is called Artificial General Intelligence (AGI), which extends machine intelligence to broader tasks. AGI could be implemented in autonomous systems to make machines think, react and perform as humans do.
Objectives
This project aims to apply a rudimentary form of AGI in a fully autonomous system. In this project, AGI will be demonstrated by reproducing basic human behaviours that are understandable and explainable to humans. This will be achieved by designing a heterogeneous, multi-agent maze-solving system in which an Unmanned Aerial Vehicle (UAV) and an Unmanned Ground Vehicle (UGV) cooperate. A non-AGI system will also be developed so that its performance can be compared against that of the AGI system. Both the AGI and non-AGI systems will be developed on virtual and physical platforms to facilitate testing and demonstration of the concepts developed by the team.
Literature Review
AGI Relevant Literature
ANI Relevant Literature
Background
Looking back to the days when technology was far less advanced, hardly anyone thought that machines would one day be capable of matching, or even surpassing, human intelligence. In the 21st century, however, even the most ambitious technological ideas have some chance of becoming reality.
We are currently in the later stage of AI, with many researchers and technology companies starting to venture into the next field of AI: AGI, also known as strong AI. According to Kaplan and Haenlein [1], AGI is the ability of a system to reason, plan and solve problems autonomously for tasks it was never designed for. AGI has not yet been realised; however, AI experts have predicted its debut by the year 2060, according to the survey in [2].
System Design
The high-level design of the project incorporates a system with AGI and a system without AGI. Each of these systems consists of three main modules: the Operations Control Centre (OCC), the UAV and the UGV.
The OCC acts as the core support for the UGV and UAV, facilitating the communication of data between both agents. The UAV plays a role in scanning the environment from a higher perspective than the UGV, to provide the UGV with the essential information to solve the maze in both systems. The UGV will then be deployed in the maze once it has obtained the required information from the UAV.
The UAV acts as the eyes in the sky for the UGV on the ground: it has a broader field of vision and provides supplementary information for the UGV to make decisions. The UAV recognises the checkpoints on the ground and provides their coordinates to the UGV. It communicates bidirectionally with the OCC and has four subsystems: the Movement System, the Information Processing System, the Communication System and the Self Health Checking System.
The UGV is the main part of the system; its aim is to navigate autonomously through a maze built on a flat surface. The UAV provides the checkpoint coordinates as a guide for the UGV's navigation. The UGV is intended to provide a dependable autonomous navigation service. It will encounter various decision-making situations and must make each decision based on the information it has.
System without AGI
In this system, the UAV and UGV are designed to work together while carrying out separate tasks. The difference between the two systems is that the system without AGI relies more heavily on the performance of the UAV. The UAV acts as the UGV’s eyes, providing a better field of view and more information, and the UGV must follow the specific navigation information it is given to arrive at its destination. The UAV is equipped with image processing and information collection systems. The collection system uses a monocular camera to take pictures during flight. The collected images are then processed and converted into position information, expressed in coordinates corresponding to the UGV’s location, which guides the UGV’s direction of movement. Once the movement information is provided, the UGV follows it to reach the desired position, using its own collision avoidance function along the way. This process is similar to a person who is lost in a shopping mall and uses Google Maps to find the way out instead of relying on their own decisions. This system will be purely autonomous and significantly less intelligent than the system with AGI.
System with AGI
In comparison with the aforementioned system without AGI (the Artificial Narrow Intelligence, or ANI, system), the AGI system comprises a custom maze-traversal algorithm. The UAV and UGV still work together to solve the maze; however, the primary goal of this system is to attempt to mimic human maze-solving behaviour. Humans are not optimal problem solvers, so it can be expected that this system may lack the strengths that come from raw logical input and deduction. Humans, however, are capable of adapting easily to a wide range of environments and conditions. This is where the system with AGI should excel: adapting to different mazes dynamically and solving them through exploration without failure. In this system, rather than having the UAV assert full control over the UGV, the UAV only provides the UGV with guiding information. The UAV tells the UGV roughly where the landmarks in the maze are, and these landmarks guide the UGV towards the solution path. This is akin to how a human being might use tall buildings or road signs to navigate the streets of an unfamiliar city.
Methods
This section covers the methodologies that have been implemented to build the ANI and AGI systems. The project was initiated on a virtual platform in CoppeliaSim and has gradually transitioned to a physical platform for more practical and thorough testing. Simulation code was mainly written in the Python programming language. The UAV used on the physical platform is the DJI Tello EDU drone and the UGV is the RoboMaster EP Core.
Virtual Platform
UAV Motion Control
The maze structure has been divided into three rows, and three dummy points ('Start', 'Mid' and 'End') have been placed in the three rows respectively. The starting position of the UAV is the location of the 'Start' dummy point. Following the arrows, the UAV moves horizontally in the negative x direction in steps of 1.5 units per loop until it reaches the last column of the first row. It then moves in the negative y direction to the 'Mid' dummy point and horizontally in the positive x direction to the last column. The same procedure is executed for the last row until the UAV reaches the exit of the maze. Overall, to capture the entire maze image, the UAV moves in an 'S'-shaped pattern throughout the maze.
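The S-shaped sweep described above can be sketched as a waypoint generator. This is only an illustration under stated assumptions: the function name `serpentine_waypoints`, the row y-values and the number of columns are hypothetical, and only the 1.5-unit step and the alternating sweep direction come from the description.

```python
def serpentine_waypoints(start_x, row_ys, n_cols, step=1.5):
    """Return (x, y) capture positions for an S-shaped sweep.

    start_x : x coordinate of the 'Start' dummy point (assumed value).
    row_ys  : y coordinate of each row, first row first (assumed values).
    n_cols  : number of capture positions per row (assumed value).
    step    : distance moved along x on every loop (1.5 units in the text).
    """
    waypoints = []
    for row_index, y in enumerate(row_ys):
        # Even rows sweep in the negative x direction; odd rows sweep back
        # in the positive x direction, which produces the 'S' shape.
        if row_index % 2 == 0:
            cols = range(n_cols)
        else:
            cols = range(n_cols - 1, -1, -1)
        for col in cols:
            waypoints.append((start_x - col * step, y))
    return waypoints


# Hypothetical 3 x 3 layout: three rows, three capture positions per row.
waypoints = serpentine_waypoints(start_x=1.5, row_ys=[1.5, 0.0, -1.5], n_cols=3)
for x, y in waypoints:
    print(f"capture at x={x:+.1f}, y={y:+.1f}")
```

Each waypoint would then be sent to the simulated UAV as its next target position, with an image captured on arrival.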
Maze Reconstruction
Due to the limited field of view of the UAV, several images needed to be taken at different positions to cover the entire maze structure. The UAV motion control algorithm was integrated with the vision sensor to capture an image at every new set position from start to end. The images in each of the three rows are then concatenated horizontally to form three row images, which are finally concatenated vertically to form the complete maze image.
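A minimal sketch of this stitching step with NumPy is shown below. It assumes that all tiles have the same size, that they arrive in capture order, and that every second row was captured in the opposite direction and therefore has to be reversed before concatenation. The function name `stitch_maze` and the left-to-right orientation of the first row are assumptions, not details from the project code.

```python
import numpy as np


def stitch_maze(tiles, n_rows, n_cols):
    """Concatenate equally sized image tiles into one maze image.

    tiles : list of arrays in capture order (serpentine sweep assumed).
    """
    row_images = []
    for r in range(n_rows):
        row_tiles = tiles[r * n_cols:(r + 1) * n_cols]
        if r % 2 == 1:
            # Odd rows were flown in the opposite direction, so flip the order
            # to keep every row running the same way in the final image.
            row_tiles = row_tiles[::-1]
        row_images.append(np.hstack(row_tiles))  # horizontal concatenation
    return np.vstack(row_images)                 # vertical concatenation


# Nine dummy 2 x 2 tiles, each filled with its own capture index.
tiles = [np.full((2, 2), i, dtype=np.uint8) for i in range(9)]
maze = stitch_maze(tiles, n_rows=3, n_cols=3)
print(maze.shape)
```

In practice the tiles would be the vision sensor images, and any overlap between neighbouring captures would need to be cropped before stitching.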
Path planning algorithm
After the maze has been captured and processed, it must be solved to provide a path that guides the UGV out of the maze. The processed maze image is not used directly; it is first converted to a binary grid map, that is, a map composed of 0s and 1s in which obstacles are represented by 1.
Two maze-solving algorithms were used: Breadth-First Search (BFS) and A*. BFS scans the maze outwards from the start point, recording the distance from the start point to each position it visits. Once the endpoint is found, the algorithm traces back to the start point to recover the shortest path. A* is a best-first search algorithm: it prefers to expand positions with a short straight-line distance to the endpoint, and this distance is counted as a cost in the calculation alongside the cost of the movement made so far. The A* algorithm therefore determines the movement cost first and then decides the direction of movement. As a result, it can avoid overshooting the desired path and generates more accurate movement information.
Furthermore, the configuration space of the UGV needs to be considered; it represents the map of available movement given the robot’s size and degrees of freedom. To apply the configuration space in the path planning algorithm, the obstacles in the maze can be expanded for the BFS algorithm, while in the A* algorithm the configuration space can be expressed as an additional movement cost.
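The following sketch illustrates both searches on a binary grid map, together with obstacle expansion as a simple stand-in for the configuration space. It is not the project's implementation: the 4-connected moves, unit step cost, Euclidean heuristic, square expansion radius and all function names are assumptions made for illustration, and the A* variant with an additional movement cost near obstacles is not shown.

```python
import heapq
import math
from collections import deque


def inflate(grid, radius):
    """Expand every obstacle (1) by `radius` cells in all directions."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1:
                for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                    for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                        out[rr][cc] = 1
    return out


def neighbours(grid, cell):
    """Yield the free 4-connected neighbours of a (row, col) cell."""
    r, c = cell
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] == 0:
            yield (nr, nc)


def backtrack(parent, goal):
    """Walk the parent links from the goal back to the start."""
    path = [goal]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]


def bfs(grid, start, goal):
    """Breadth-first search; returns the shortest path or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            return backtrack(parent, goal)
        for nxt in neighbours(grid, cell):
            if nxt not in parent:
                parent[nxt] = cell
                queue.append(nxt)
    return None


def astar(grid, start, goal):
    """A* with cost so far plus straight-line distance to the goal."""
    def heuristic(cell):
        return math.hypot(cell[0] - goal[0], cell[1] - goal[1])

    parent = {start: None}
    cost = {start: 0}
    heap = [(heuristic(start), start)]
    while heap:
        _, cell = heapq.heappop(heap)
        if cell == goal:
            return backtrack(parent, goal)
        for nxt in neighbours(grid, cell):
            new_cost = cost[cell] + 1
            if new_cost < cost.get(nxt, math.inf):
                cost[nxt] = new_cost
                parent[nxt] = cell
                heapq.heappush(heap, (new_cost + heuristic(nxt), nxt))
    return None
```

A typical call would first expand the obstacles, for example `bfs(inflate(grid, 1), start, goal)`, so that the returned path keeps the UGV body clear of the walls. Both functions return a list of `(row, col)` cells, or `None` when no path exists.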
Landmark Detection
Template matching was used to detect landmarks in the maze, where the landmarks were represented by resizable concrete blocks. An HSV colour range was defined to enable the algorithm to segment the green colour of the landmarks from the rest of the maze.
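The colour segmentation idea can be illustrated with the standard library alone. The sketch below marks the pixels whose HSV values fall inside a green range; the threshold values, the function name `green_mask` and the nested-list image format are assumptions for illustration only, and an image processing library would normally be used for this step.

```python
import colorsys

# Hypothetical HSV range for 'green', with all channels scaled to 0..1.
HUE_RANGE = (0.25, 0.45)
MIN_SATURATION = 0.40
MIN_VALUE = 0.20


def green_mask(image):
    """Return a boolean mask that is True where a pixel lies in the green range.

    image : nested list of (R, G, B) tuples with 0..255 channels.
    """
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            in_range = (HUE_RANGE[0] <= h <= HUE_RANGE[1]
                        and s >= MIN_SATURATION
                        and v >= MIN_VALUE)
            mask_row.append(in_range)
        mask.append(mask_row)
    return mask


# Tiny 2 x 2 test image: pure green, grey, pure red and a darker green.
image = [[(0, 255, 0), (128, 128, 128)],
         [(255, 0, 0), (30, 200, 40)]]
print(green_mask(image))
```

Working in HSV rather than RGB keeps the hue test largely independent of brightness, which is why a single hue interval can cover both the bright and the darker green pixel in the example.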
Following that, to avoid multiple detections of a single landmark, the Non-Maximum Suppression (NMS) technique was used. It works by selecting the best match out of all the overlapping bounding boxes using the Intersection over Union (IOU). The IOU measures the overlap between two boxes, such as a ground-truth detection box and a prediction box, as a fraction of the area they cover together. Expressed mathematically:
IOU(Box1, Box2) = Intersection Size(Box1, Box2) / Union Size(Box1, Box2)
The IOU is then used in the NMS technique to filter the detections, keeping only one bounding box per landmark. The method selects the prediction with the highest confidence score and suppresses all other predictions that overlap it.
This method has also been applied in the physical platform with some slight modifications to the ratio.
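The IOU calculation and the suppression step can be sketched as follows. Boxes are assumed to be `(x1, y1, x2, y2)` corner tuples, and the 0.5 overlap threshold and the function names are illustrative choices rather than values taken from the project.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Width and height are clamped to zero when the boxes do not overlap.
    intersection = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes kept after greedy NMS."""
    # Visit the detections from the highest to the lowest confidence score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        # Keep a box only if it does not overlap an already kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept


# Two overlapping detections of one landmark and one separate detection.
boxes = [(0, 0, 90, 90), (5, 5, 95, 95), (200, 200, 290, 290)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # prints [0, 2]
```

The second box overlaps the first with an IOU of about 0.8, so it is suppressed, while the third box does not overlap either of them and is kept as a separate landmark.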
Coordinate Conversion
Coordinate conversion is needed to convert the pixel coordinates used by the UAV into real-world coordinates that the UGV can use to traverse the maze. This is one of the most essential steps in ensuring the success of both the system with AGI and the system without AGI: without an accurate coordinate conversion, the UGV risks moving towards the wrong location and, in the worst case, crashing into walls.
The final reconstructed maze was first plotted on a graph spanning -2.5 m to +2.5 m on both the x and y axes. This extent was chosen because it matches the actual maze size of 5 m x 5 m in the virtual environment. A ratio was then established by comparing several reference points in the plotted maze with the same points in the real environment. The reference points were chosen from the bounding boxes in the maze, as all bounding boxes were set to span 90 pixels in both width and height, which corresponds to approximately two small squares in the maze and to 1 m in the real world.
Based on this logic, the point of origin (0,0) was found to lie at approximately pixel (205,175). Subsequent coordinates were obtained by first subtracting the origin's pixel coordinate (205 for the x axis) from the pixel coordinate of the point of interest to obtain the gap between the two, then dividing each 90-pixel gap into 100 divisions of 0.01 m. The pixel gap, expressed in divisions, was multiplied by the division size of 0.01 m to obtain the real-world increment, which was then added to the point of origin to obtain the real-world coordinates. The same procedure was applied to both the x and y axes. This method of coordinate conversion was also applied to the physical platform with a different set of ratios.
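As a worked illustration of this conversion, the sketch below uses the origin pixel (205,175) and the scale of 90 pixels per metre quoted above, and rounds the result to the 0.01 m division. The function name `pixel_to_world` and the optional `flip_y` flag are assumptions: the text does not state whether the image y axis points in the same direction as the world y axis.

```python
ORIGIN_PX = (205, 175)   # pixel location of the world origin (0, 0)
PIXELS_PER_METRE = 90    # 90 pixels correspond to about 1 m


def pixel_to_world(px, py, origin=ORIGIN_PX,
                   pixels_per_metre=PIXELS_PER_METRE, flip_y=False):
    """Convert a pixel coordinate into world coordinates in metres."""
    # Gap between the point of interest and the origin, scaled to metres.
    x = (px - origin[0]) / pixels_per_metre
    y = (py - origin[1]) / pixels_per_metre
    if flip_y:
        # Image rows usually grow downwards; negate y if the world axis
        # grows upwards (assumption, not stated in the text).
        y = -y
    # Round to the smallest division of 0.01 m.
    return round(x, 2), round(y, 2)


print(pixel_to_world(295, 175))  # a point 90 px to the right of the origin
print(pixel_to_world(250, 130))  # a point 45 px away on both axes
```

For the physical platform, only `origin` and `pixels_per_metre` would need to change, which matches the statement that the same method was reused with a different set of ratios.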
Physical Platform
Results
Conclusion & Future Work
The performance comparison of both systems shows that the non-AGI system is more robust and efficient than the AGI system. However, the AGI system has higher adaptability in solving problems in varying environments. There is vast potential for improvement, from the rudimentary form of AGI designed here to an AGI system equipped with human-like capabilities.
References
[1] A. Kaplan and M. Haenlein, "Siri, Siri, in my hand: Who’s the fairest in the land? On the interpretations, illustrations, and implications of artificial intelligence", Business Horizons, vol. 62, no. 1, pp. 15-25, 2019.
[2] S. D. Baum, B. Goertzel and T. G. Goertzel, "How Long Until Human-Level AI? Results from an Expert Assessment", Technological Forecasting and Social Change, vol. 78, no. 1, pp. 185-195, 2011. Available: https://sethbaum.com/ac/2011_AI-Experts.pdf.