Projects:2021s1-13332 Artificial General Intelligence in fully autonomous systems
Contents
Project team
Project students
- Chaoyong Huang
- Jingke Li
- Ruslan Mugalimov
- Sze Yee Lim
Supervisors
- Prof. Peng Shi
- Prof. Cheng-Chew Lim
Advisors
- Dr. Xin Yuan
- Yang Fei
- Zhi Lian
Introduction
Artificial Intelligence (AI) has driven many innovations across industries in recent years. According to Elon Musk's interview with the New York Times, within five years we will have machines vastly smarter than humans at narrow functions and applications, such as recognition and prediction. However, this is only the first stage of "the AI revolution". Smarter machines will need to achieve human-level intelligence and recursive self-improvement. This category of AI is called Artificial General Intelligence (AGI), which extends machine intelligence to broader tasks. AGI could be implemented in autonomous systems to make machines think, react and perform as humans do.
Motivations
The field of AGI has seen many recent developments; however, a gap remains between Artificial Narrow Intelligence (ANI) and human intelligence due to ANI's limited performance and functions. This project explores the use of AGI in an autonomous system and investigates the collaboration of two agents under AGI and conventional autonomous algorithms.
Objectives
This project aims to apply a rudimentary form of AGI in a fully autonomous system. In this project, AGI will be demonstrated by reproducing basic human behaviours that are understandable and explainable to humans. This will be achieved by designing a heterogeneous, multi-agent maze-solving system based on the cooperation of an Unmanned Aerial Vehicle (UAV) and an Unmanned Ground Vehicle (UGV). A non-AGI system will also be developed to evaluate its relative performance against the AGI system. Both the AGI and non-AGI systems will be developed on virtual and physical platforms to facilitate testing and demonstration of the concepts developed by the team.
Literature Review
AGI Relevant Literature
ANI Relevant Literature
Background
Looking back to when technological development was far less advanced, hardly anyone thought that machines would one day be capable of achieving the same level of intelligence as humans, let alone superseding them. In the 21st century, however, even the most ambitious technological dreams have a chance of becoming reality.
We are currently in the later stages of AI, with many researchers and technology companies starting to venture into the upcoming field of AGI, also known as strong AI. According to Kaplan and Haenlein in [1], AGI is the ability to reason, plan and solve problems autonomously for tasks the system was never designed for. AGI has not yet been realised; however, AI experts have predicted its debut by the year 2060 according to a survey in [2].
System Design
High Level Diagram
The high-level design of the project incorporates a system with AGI and a system without AGI. The key difference between the two is where the Information Processing System resides. In the system without AGI, decision making is performed on the UAV, which controls the UGV and tells it precisely where to go. In the system with AGI, decision making is shifted to the UGV, and the UAV only provides additional information. Each of these systems consists of three main modules: the Operations Control Centre (OCC), the UAV, and the UGV.
The OCC acts as the core support for the UGV and UAV, facilitating the communication of data between both agents. The UAV plays a role in scanning the environment from a higher perspective than the UGV, to provide the UGV with the essential information to solve the maze in both systems. The UGV will then be deployed in the maze once it has obtained the required information from the UAV.
The UAV acts as the eyes in the sky for the UGV on the ground; it has a broader field of view and provides supplementary information for the UGV's decision making. The UAV recognises the checkpoints on the ground and provides their coordinates to the UGV. It communicates with the OCC bidirectionally and has four subsystems: the Movement System, Information Processing System, Communication System and Self Health Checking System.
The UGV is the main part of the system and its aim is to navigate autonomously through a maze created on a flat surface. The UAV provides the checkpoint coordinates as a guide for the UGV's navigation. The UGV is intended to provide a dependable and reliable autonomous navigation service; it will encounter various decision-making situations and must decide based on the information it has.
System without AGI
In this system, the UAV and UGV are designed to cooperate while operating separately. Compared with the AGI system, the system without AGI relies more heavily on the performance of the UAV. The UAV plays the role of the UGV's eyes, providing a wider field of view and more information, while the UGV simply follows the specific navigation information it receives to reach its destination. The UAV is designed with image processing and information collection capabilities: the collection system uses a monocular camera to take pictures while flying, and the collected images are then processed and converted into position information in coordinates corresponding to the UGV's location, which guides the UGV's direction of travel. Once this movement information is provided, the UGV follows it to the desired position, using its own collision avoidance function while navigating. This process is similar to a person lost in a shopping mall using Google Maps to find the way out rather than relying on their own decisions. This system is purely autonomous and significantly less intelligent than the system with AGI.
System with AGI
In comparison with the aforementioned ANI system, the AGI system comprises a custom maze-traversal algorithm. The UAV and UGV still work together to solve the maze, however the primary goal of this system is to attempt to mimic human maze-solving behaviour. Evidently, humans are not optimal creatures, and as such, it can be expected that this system may lack aspects that benefit from raw logical input and deduction. Humans however, are capable of adapting easily to a plethora of environments and conditions. This is where the system with AGI should excel: adapting to different mazes dynamically, being able to solve the maze through exploration without failure. In this system, rather than having the UAV assert full control over the UGV, the UAV would only serve to provide the UGV with guiding information. The UAV would roughly tell the UGV where there are landmarks in the maze that would serve to guide the UGV towards the solution path. This is akin to how a human being might use tall buildings or road signs to navigate the streets of an unfamiliar city, for example.
Methods
This section covers the methodologies implemented to build the ANI and AGI systems. The project was initiated on a virtual platform in CoppeliaSim and gradually transitioned to a physical platform for more practical and thorough testing. Simulation code was mainly written in Python. The UAV used on the physical platform is the DJI Tello EDU drone and the UGV is the RoboMaster EP Core.
Virtual Platform
UAV Motion Control
The maze structure has been divided into three rows, with three dummy points - 'Start', 'Mid' and 'End' - placed on the three rows respectively. The UAV's starting position is where the 'Start' dummy point is located. The UAV moves horizontally in the negative x-axis by 1.5 units every loop until it reaches the last column of the first row. It then moves in the negative y-axis to the 'Mid' dummy point and horizontally in the positive x-direction to the last column. The same procedure is executed for the last row until the UAV reaches the exit of the maze. Overall, to capture the entire maze image, the UAV moves in an 'S'-shaped pattern throughout the maze.
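As an illustration, a minimal Python sketch of this 'S'-shaped sweep is given below. The 1.5-unit step and the three rows follow the description above; the starting coordinates, column count and function name are assumptions made for the example.

def s_pattern_waypoints(start_x, start_y, cols=4, rows=3, dx=1.5, dy=1.5):
    """Generate UAV waypoints that sweep the maze row by row in an S shape."""
    waypoints = []
    x, y = start_x, start_y
    for row in range(rows):
        direction = -1 if row % 2 == 0 else 1  # negative x on the first row, then reverse
        for _ in range(cols):
            waypoints.append((x, y))
            x += direction * dx
        x -= direction * dx  # stay on the last column of the row
        y -= dy              # drop down to the next row ('Mid', then the last row)
    return waypoints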
Maze Reconstruction
Due to the limited field of view of the UAV's camera, several images need to be taken at different positions to cover the entire maze structure. The UAV motion control algorithm was integrated with the vision sensor to capture an image at every new set position from start to end. The images in each of the three rows are then concatenated horizontally to form three row images, which are finally concatenated vertically to form the complete maze image.
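Assuming the captured images are already grouped by row, this concatenation step can be sketched with NumPy as follows (function and variable names are illustrative):

import numpy as np

def reconstruct_maze(rows_of_images):
    """Concatenate the captured images: horizontally within each row,
    then vertically across the three rows, as described above."""
    row_strips = [np.hstack(images) for images in rows_of_images]
    return np.vstack(row_strips)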
Path planning algorithm
After the maze has been captured and processed, it needs to be solved to provide a path that guides the UGV out of the maze. Before the processed maze image can be used directly, it must be converted into a binary grid map, i.e. a map composed of 0s and 1s in which obstacles are represented by 1. Two maze-solving algorithms were considered: the Breadth-First Search (BFS) algorithm and the A* algorithm. BFS scans the maze outwards from the start point, recording the distance from each visited position back to the start. Once the end point is found, the algorithm traces back to the start point to compute the shortest path. A* is a best-first search algorithm: it prefers moves that minimise the estimated straight-line distance from the current position to the end point, and this distance is counted as a cost in the calculation. Because A* evaluates the movement cost before deciding the moving direction, it avoids overshooting the desired path and generates more accurate movement information. Furthermore, the configuration space of the UGV needs to be considered; it represents the map of available movements given the robot's size and degrees of freedom. To apply the configuration space in the path planning algorithms, the obstacles in the maze can be dilated (expanded) for the BFS algorithm, while in the A* algorithm the configuration space can be expressed as an additional movement cost.
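As a concrete example, the sketch below shows a BFS shortest-path search on the binary grid map described above (1 = obstacle, 0 = free space); it is a generic textbook implementation rather than the project's exact code.

from collections import deque

def bfs_shortest_path(grid, start, goal):
    """Breadth-first search on a binary grid map (1 = wall, 0 = free).
    Returns the shortest path from start to goal as a list of (row, col)
    cells, or None if the goal is unreachable."""
    rows, cols = len(grid), len(grid[0])
    queue = deque([start])
    parent = {start: None}
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:         # trace back from the goal to the start
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols \
                    and grid[nr][nc] == 0 and (nr, nc) not in parent:
                parent[(nr, nc)] = (r, c)
                queue.append((nr, nc))
    return None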
Landmark Detection
Template matching was used to detect landmarks in the maze, where the landmarks were represented by resizable concrete blocks. An HSV colour range was defined to enable the algorithm to segment the green colour on the landmarks from the rest of the maze.
Following that, to avoid multiple detections on a single landmark, the Non-Maximum Suppression (NMS) technique was used. It selects the best match out of all the overlapping bounding boxes by computing the Intersection over Union (IOU), which measures the overlap between the ground-truth box and the predicted box. Expressed mathematically:
IOU(Box1, Box2) = Intersection Size(Box1, Box2) / Union Size(Box1, Box2)
The IOU is then used in the NMS technique to filter out detections, keeping only one bounding box per landmark. The method selects the prediction with the highest confidence score and suppresses all other overlapping predictions.
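A minimal sketch of the IOU computation and the NMS filtering described above is shown below, assuming boxes are given as (x1, y1, x2, y2) corners; the 0.5 threshold is an illustrative default, not the project's tuned value.

def iou(box1, box2):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    xa, ya = max(box1[0], box2[0]), max(box1[1], box2[1])
    xb, yb = min(box1[2], box2[2]), min(box1[3], box2[3])
    inter = max(0, xb - xa) * max(0, yb - ya)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    return inter / float(area1 + area2 - inter)

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring box and suppress any box that overlaps an
    already kept box by more than the IOU threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return [boxes[i] for i in kept]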
This method has also been applied in the physical platform with some slight modifications to the ratio.
Coordinate Conversion
Coordinate conversion is needed to convert the pixel coordinates used by the UAV into real-world coordinates for the UGV to traverse the maze. This is one of the most essential steps in ensuring the success of both the system with AGI and the system without AGI, because without an accurate coordinate conversion the UGV risks moving towards the wrong location and, in the worst case, crashing into walls.
The final reconstructed maze was plotted on a graph spanning from -2.5m to +2.5m in both the x and y axes. This extent was chosen because it matches the actual 5m x 5m maze size in the virtual environment. A ratio comparison was then made by choosing several reference points from the plotted maze and the real environment. These reference points were chosen based on the bounding boxes in the maze, as all bounding boxes were set to span 90 pixels in length and height, which corresponds to approximately two small squares in the maze and to 1m in the real world.
Based on this, the pixel location of the origin (0,0) was found to be approximately (205,175). Subsequent coordinates were obtained by first subtracting 205 from the pixel coordinate to obtain the offset from the origin to the point of interest, then dividing the 1m represented by the 90-pixel gap into 100 divisions of 0.01m each. The pixel offset was multiplied by the smallest division of 0.01 to obtain the increment in the real world, and this increment was added to the origin to obtain the real-world coordinate. The same procedure was applied to both the x and y axes. This method of coordinate conversion was also applied, with a different set of ratios, on the physical platform.
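The conversion can be sketched as below, using the constants quoted above for the virtual platform (pixel origin near (205, 175) and roughly 0.01 m per pixel); the physical platform would use a different origin and ratio, and the function name is illustrative.

ORIGIN_PX = (205, 175)    # pixel coordinates of the world origin (0, 0), per the text
METRES_PER_PIXEL = 0.01   # approximate scale on the virtual platform

def pixel_to_world(px, py, origin=ORIGIN_PX, scale=METRES_PER_PIXEL):
    """Convert a pixel coordinate from the reconstructed maze image into
    real-world metres for the UGV."""
    wx = (px - origin[0]) * scale
    wy = (py - origin[1]) * scale
    return wx, wy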
Physical Platform
UAV Motion Control
Two different approaches were taken to control the UAV motion for image capturing, because the method used in the virtual environment is no longer viable without modification due to the Tello drone's front-facing camera. The single-image approach was to fly sufficiently far and high to capture one image covering the entire maze. The second approach was to attach an acrylic mirror sheet in front of the camera to reflect the ground, effectively creating a downward-facing camera.
Single Image Approach
This approach involves taking off to a specified height to capture an image of the entire maze. Due to space constraints, the UAV can only fly up to a maximum height of 2.4m, so the maze was shrunk to 1.5m x 1.5m just to demonstrate the feasibility of the idea. However, for practical real-world applications the UAV will be required to scan an environment many times larger than the current maze; therefore, the second approach may be more practical.
Multiple Image Approach
This approach involves the more tedious procedure of defining fixed intervals between captured images once a fixed maze size has been decided. The traversal path is the same 'S' pattern used on the virtual platform. However, due to the underperforming gyroscope and the acrylic mirror sheet, the quality of the images taken by the UAV during flight was largely compromised; the images were relatively pixelated, which made combining them challenging.
Due to time and budget constraints, an alternative of hanging the drone from safety nets was used to prove the feasibility of the initial concept while eliminating the potential issues. The UAV was hung with its front camera facing vertically downwards, eliminating the reflection issues of the mirror sheet. A total of 9 images were taken (3 per row, 3 per column) to be combined with the image stitching algorithm detailed in the next section.
Image stitching
In the virtual simulation environment, the UAV movement is accurate and precise, so the entire maze image can simply be assembled by connecting the separate maze images. The physical environment is different: the main problem is that the UAV movement is not stable. Unstable movement causes errors and deviations from the desired position, so the simple image-connection algorithm used in the virtual environment fails. Therefore, instead of relying on strict UAV motion control, an image stitching method can be used. The method is based on the Scale-Invariant Feature Transform (SIFT), a feature matching algorithm: once features have been matched between two overlapping images, a spatial transformation can be applied to stitch them together. The basic image stitching algorithm is referred from [1]. Because the maze environment is visually uniform and lacks distinctive features, and the Tello's camera cannot capture maze images with enough features, image stitching did not work appropriately on its own. Hence, additional features in simple shapes such as stars and ovals, with colours significantly different from the maze, were added manually.
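A minimal sketch of SIFT-based stitching of two overlapping maze images with OpenCV is shown below; the ratio-test threshold, canvas size and function name are assumptions made for the example, not the project's exact implementation.

import cv2
import numpy as np

def stitch_pair(img_left, img_right, ratio=0.75):
    """Stitch two overlapping images using SIFT feature matching and a homography."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img_left, None)
    kp2, des2 = sift.detectAndCompute(img_right, None)

    # Lowe's ratio test keeps only distinctive matches.
    matcher = cv2.BFMatcher()
    matches = []
    for pair in matcher.knnMatch(des2, des1, k=2):
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            matches.append(pair[0])

    src = np.float32([kp2[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp1[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

    # Warp the right image into the left image's frame and overlay the left image.
    h, w = img_left.shape[:2]
    canvas = cv2.warpPerspective(img_right, H, (w * 2, h))
    canvas[0:h, 0:w] = img_left
    return canvas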
Image processing
After stitching the entire maze image, the image needs to be processed so that it retains only the maze information, such as the walls, paths, and landmarks. The first step in removing unwanted content is to extract the information we want: relying on the colour differences between the maze walls and most unwanted objects, a script from [x] can be used to determine the maze wall colour as a range in the HSV colour space. After removing most unwanted content, some shapes from the manually added features may remain, because the lighting conditions and similar colour ranges leave thin edges in the image. The Canny detector and the Hough Line Transform can be applied to remove these thin edges. Because line detection retains only the line information, the wall area between two edges then needs to be filled in; a morphological filter with a closing operation can fill the small empty areas and complete an image that can be converted into a binary grid map.
[x] "HSV-Color-Picker/HSV Color Picker.py at master · alieldinayman/HSV-Color-Picker", GitHub, 2020. [Online]. Available: https://github.com/alieldinayman/HSV-Color-Picker/blob/master/HSV%20Color%20Picker.py. [Accessed: 24 Oct 2021].
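This pipeline can be sketched with OpenCV as follows; the HSV bounds, Canny/Hough thresholds and kernel size are illustrative assumptions and would need tuning to the actual maze.

import cv2
import numpy as np

def maze_wall_mask(bgr_image, hsv_lower, hsv_upper):
    """Threshold the wall colour in HSV, erase thin stray edges with
    Canny + Hough lines, then close small gaps so the mask can be
    downsampled into a binary grid map."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array(hsv_lower), np.array(hsv_upper))

    # Remove thin edges left by the manually added stitching features.
    edges = cv2.Canny(mask, 50, 150)
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=30,
                            minLineLength=20, maxLineGap=5)
    if lines is not None:
        for x1, y1, x2, y2 in lines[:, 0]:
            cv2.line(mask, (x1, y1), (x2, y2), 0, thickness=3)

    # Closing fills the small holes between the two edges of each wall.
    kernel = np.ones((15, 15), np.uint8)
    return cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)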
Pioneer_p3dx
Pioneer_p3dx is a built-in mobile rover provided by CoppeliaSim. It has 16 ultrasonic sensors surrounding it to achieve 360-degree obstacle avoidance. A vision sensor was added for the image recognition function. A front view of this rover can be seen below. As it is a two-wheeled rover, its motion is simple and easy to understand; however, it has limitations in movement and self-balance.
Sensors overshooting
In the simulation, each ultrasonic sensor can only detect a limited range of distances and outputs a value between 0 and 1. When the rover enters an open area, the sensor behaves erratically and returns an extremely small value such as 2e-47. This affected the function for finding the smallest sensor reading and caused the rover to turn in the opposite direction to the one intended. To prevent this issue, a threshold was set for all the sensors: when a sensor value falls below 0.0001, it is corrected to 1. With this fix, the rover can successfully avoid obstacles and navigate through the maze without collision.
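A minimal sketch of this correction (the threshold value is taken from the text; the function name is illustrative):

def clean_sensor_readings(readings, threshold=1e-4):
    """Treat spurious near-zero ultrasonic readings (e.g. 2e-47 in open
    areas) as 'no obstacle' by clamping them to the maximum value of 1."""
    return [1.0 if r < threshold else r for r in readings]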
Motion control for Two-Wheel
The two-wheel rover motion control was done through simple proportional (P) control inputs to the left and right motors. The steering direction was derived from the lowest collision avoidance sensor value, which was stored in the steer variable:
steer = -1 / sensor_loc[min_ind]
The left and right inputs to the respective motors follow from above:
vl = v + kp * steer
vr = v - kp * steer
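Putting the two expressions together, a minimal sketch of the two-wheel controller might look like the following; the v and kp values are illustrative tuning constants, and it assumes the sensor readings have already been cleaned as described earlier.

def wheel_speeds(sensor_loc, v=1.0, kp=0.5):
    """Proportional steering away from the closest obstacle."""
    min_ind = sensor_loc.index(min(sensor_loc))
    steer = -1.0 / sensor_loc[min_ind]  # stronger steering the closer the obstacle
    vl = v + kp * steer                 # left motor input
    vr = v - kp * steer                 # right motor input
    return vl, vr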
Omnidirectional
The four-wheel rover motion control covered four motor inputs, rather than two, in a similar fashion to how motion control was handled for the two-wheel rover:
vlf = v + kp * steer
vrf = v - kp * steer
vlr = v + kp * steer
vrr = v - kp * steer
It is important to note that, for both the two-wheel and four-wheel UGVs, the P value was kept at a moderate level so as not to cause overshoot when corrections needed to occur.
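For completeness, the four-wheel version can be sketched in the same way: each left motor adds the steer term and each right motor subtracts it, mirroring the expressions above (values are illustrative).

def omni_wheel_speeds(steer, v=1.0, kp=0.5):
    """Front and rear motor inputs for the four-wheel rover."""
    vlf = v + kp * steer  # left front
    vrf = v - kp * steer  # right front
    vlr = v + kp * steer  # left rear
    vrr = v - kp * steer  # right rear
    return vlf, vrf, vlr, vrr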
UGV Rudimentary AGI Algorithm
During the conception phase of the project, we proposed the following high-level description of the rudimentary AGI algorithm and its behaviours:
Results
Conclusion & Future Work
The performance comparison of both systems shows that the non-AGI system is more robust and efficient than the AGI system. However, the AGI system has higher adaptability in solving problems in varying environments. There is vast potential for improvement and boundless possibilities, from the rudimentary form of AGI designed here to an AGI system equipped with human-like capabilities.
References
[1] A. Kaplan and M. Haenlein, "Siri, Siri, in my hand: Who’s the fairest in the land? On the interpretations, illustrations, and implications of artificial intelligence", Business Horizons, vol. 62, no. 1, pp. 15-25, 2019.
[2] S. D. Baum, B. Goertzel and T. G. Goertzel, "How Long Until Human-Level AI? Results from an Expert Assessment", Technological Forecasting and Social Change, vol. 78, no. 1, pp. 185-195, 2011. Available: https://sethbaum.com/ac/2011_AI-Experts.pdf.