Projects:2019s2-23301 Robust Formation Control for Multi-Vehicle Systems
Formation control has been widely used in the control of robots, since it can improve the overall efficiency of the system. In this project, we aim to design a robust formation control strategy for a multi-vehicle system that can tolerate at least one network problem or physical failure.
Introduction
Formation control of multi-agent systems (MASs) has been widely used for cooperative tasks in applications such as terrain exploration, mobile networks and traffic control. However, communication-induced problems and the high failure risk of an increasing amount of equipment have created a number of challenges for the security of MASs. The objective of this project is to design a robust formation control strategy for a multi-vehicle system against communication and physical failures (e.g., network attacks, link failures, packet dropouts, sensor/actuator faults).
The vehicles are designed to sense their local environment by visual navigation and to achieve a self-organising formation. A robust fault-tolerant control strategy is investigated to deal with at least one network problem or physical failure. The effectiveness of the formation control strategy and its robustness should be verified by both simulations and experiments. Potential applications are in highly flexible MASs and high-security cyber-physical systems. Currently, our lab is equipped with a multi-vehicle platform consisting of quadrotors, ground robots and a camera-based localisation system. Algorithms are developed in either MATLAB or C. MATLAB, Simulink, OpenGL, Motive and Visual Studio are candidate software packages for this project.
Project team
Student members
- Abdul Rahim Mohammad
- Jie Yang
- Kamalpreet Singh
- Zirui Xie
Supervisors
- Prof. Peng Shi
- Prof. Cheng-Chew Lim
Advisors
- Xin Yuan
- Yuan Sun
- Yang Fei
- Zhi Lian
Objectives
Design a robust formation control approach for a multi-vehicle system to achieve:
- Self-decision making
- Environment detection
- Communication
- Obstacle avoidance
- Tolerance to physical or network problems
Background
Autonomous Control System
An autonomous control system has the power and ability for self-governance in the performance of control functions. It is composed of a collection of hardware and software that can perform the necessary control functions without intervention over extended time periods. There are several degrees of autonomy. A conventional fixed controller can be considered to handle only a restricted class of plant parameter variations and disturbances, while at a high degree of autonomy the controller must be able to perform a number of functions beyond conventional ones such as regulation or tracking.
Agent
For the most part, we are happy to accept computers as obedient, literal, unimaginative servants. For many applications, this is entirely acceptable. However, for an increasingly large number of applications, we require systems that can decide for themselves what they need to do in order to achieve the objectives that we delegate to them. Such computer systems are known as agents. Agents that must operate robustly in rapidly changing, unpredictable, or open environments, where there is a significant possibility that actions can fail, are known as intelligent agents, or sometimes autonomous agents.
Multi-Agent System
A multi-agent system is a group of loosely connected autonomous agents acting in an environment to achieve a common goal, which is done by cooperating and sharing knowledge with each other. Multi-agent systems have been widely adopted in many application domains because of the advantages they offer. Some of the benefits of using MAS technology in large systems are:
- An increase in the speed and efficiency of the operation due to parallel computation and asynchronous operation
- A graceful degradation of the system when one or more of the agents fail, which increases the reliability and robustness of the system
- Scalability and flexibility: agents can be added as and when necessary
- Reduced cost: individual agents cost much less than a centralized architecture
- Reusability: agents have a modular structure and can easily be replaced in other systems or upgraded more easily than a monolithic system
Formation Control
A formation of a multi-agent system is composed of a group of specific agents, the communication among the agents, and the geometric information of the agents. This project focuses on the formation control of the multi-vehicle system. The aim of formation control is to design a controller that brings the agents to a desired geometric shape by assigning local control laws to individual agents.
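As a rough illustration of such local control laws, the sketch below implements a displacement-based formation controller in MATLAB for four single-integrator vehicles on a ring communication graph; the agent dynamics, gain, adjacency matrix and square-formation offsets are illustrative assumptions rather than the controller designed in this project.

```matlab
% Displacement-based formation control for four single-integrator vehicles
% (assumed dynamics x_i' = u_i); the ring graph, gain and square-formation
% offsets are illustrative assumptions.
N  = 4;                                       % number of vehicles
A  = [0 1 0 1; 1 0 1 0; 0 1 0 1; 1 0 1 0];    % adjacency matrix of a ring graph
d  = [0 0; 1 0; 1 1; 0 1]';                   % desired offsets (unit square), 2 x N
x  = 5 * rand(2, N);                          % random initial positions
k  = 1.0;                                     % control gain
dt = 0.01;                                    % integration step (s)

for t = 0:dt:10
    u = zeros(2, N);
    for i = 1:N
        for j = 1:N
            if A(i, j) == 1
                % drive the relative position x_j - x_i towards the desired
                % offset d_j - d_i using only locally available information
                u(:, i) = u(:, i) + k * ((x(:, j) - x(:, i)) - (d(:, j) - d(:, i)));
            end
        end
    end
    x = x + dt * u;                           % Euler step of x' = u
end

plot(x(1, :), x(2, :), 'o');                  % final positions form the square shape
```

Under these assumptions each vehicle only uses the relative positions of its neighbours, so the control law is local in the sense described above.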
Method
Detection
Environmental sensing in a multimodal sensing system requires three key types of information: distance, rotation/orientation, and visual information. SONAR and LiDAR were chosen for distance ranging, an IMU for orientation, and a monovision sensor for visual information. SONAR can be implemented with an ultrasonic sensor, examples of which can be found in *REF* and *REF*; however, an ultrasonic sensor is highly susceptible to noise and requires considerable tuning and calibration. LiDAR works on a similar principle to SONAR but is much more accurate due to the nature of light reflection and the speed of detection; LiDAR can thus be used comfortably for ranging purposes *REF*. Vision sensors can be used to distinguish detected objects from other agents through the use of Convolutional Neural Networks (CNNs) *REF*. YOLOv3 is the current trend in vision-based detection due to its flexibility in modelling, speed and robustness compared to other vision-based classifiers such as Haar cascades, Regions with Convolutional Neural Networks (R-CNN), Fast R-CNN and the Single Shot Detector (SSD) *REF*. Current techniques utilise various features to detect vehicles, such as Histogram of Oriented Gradients (HOG) features, Haar-like features, edge features and optical flow *REF*. Various datasheets covering generic ultrasonic sensors, LiDAR Time-of-Flight (ToF) sensors and 9-Degrees-of-Freedom (DOF) IMUs were studied to best model the sensors in the simulation environment. Multimodal sensing has the drawback that each sensor must individually be as accurate as possible. Appropriate filters are introduced to counteract noise in the system, namely the Kalman filter, the alpha-trimmed mean filter and the complementary filter. The Kalman filter is used to track moving objects based on the centre of gravity of a moving object region within its minimum bounding box *REF*. The alpha-trimmed mean filter is a moving-average filter that works well with the Gaussian and impulsive noise from the ranging sensors. The complementary filter is used to filter out noise and compute the best orientation estimate from the gyroscope, accelerometer and magnetometer data of the IMU.
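As a rough illustration of the IMU fusion step, the following is a minimal complementary-filter sketch in MATLAB that blends the integrated gyro rate (trusted over short periods) with an accelerometer-derived pitch angle (trusted over long periods); the placeholder sensor data, the 100 Hz sample rate and the blending factor alpha are assumptions made only for illustration.

```matlab
% Complementary filter fusing gyro rate and accelerometer pitch; the placeholder
% sensor data, 100 Hz sample rate and blending factor alpha are illustrative
% assumptions.
dt    = 0.01;                                % 100 Hz IMU sample period (s)
alpha = 0.98;                                % weight on the integrated gyro estimate
pitch = 0;                                   % filtered pitch estimate (rad)

gyroRate = 0.02 * randn(1, 1000);            % placeholder pitch-rate samples (rad/s)
accPitch = 0.1  + 0.05 * randn(1, 1000);     % placeholder accelerometer pitch (rad)

for n = 1:numel(gyroRate)
    % trust the integrated gyro over short periods (high-pass) and the
    % accelerometer over long periods (low-pass)
    pitch = alpha * (pitch + gyroRate(n) * dt) + (1 - alpha) * accPitch(n);
end
pitch                                        % final fused pitch estimate
```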
Convolutional Neural Network Object Classifier
Object classifiers are based on visual information received from a vision sensor. Some classic vision classification algorithms are the Viola-Jones detection framework, the Scale-Invariant Feature Transform (SIFT) and HOG; however, these algorithms require a lot of pretraining and are only suitable for visual data captured under controlled circumstances. A newer approach to object classification comes from deep learning, derived from modern machine learning. A Convolutional Neural Network is a class of Deep Neural Networks (DNNs), machine learning algorithms that are commonly used in visual recognition and image classification systems. Various image classification algorithms were considered for object classification, such as the Region-based Convolutional Neural Network (R-CNN) and Fast R-CNN. These classification DNNs are robust and have been tested over the last decade; however, they are designed to be used in industrial systems with large GPU-based computational support for better performance. Both R-CNN and Fast R-CNN segment the input image into regions and extract features. These features are then passed to a Support Vector Machine (SVM), which tries to find a "hyperplane in N-dimensional space (N being the number of features) that distinctly classifies the data points". Fast R-CNN is an optimised R-CNN which uses the entire image instead of segments of the image to generate a mapping to classify the image. Several more image classification algorithms, such as Faster R-CNN, were developed later as an attempt to run Fast R-CNN in real time, but their performance is locked behind high-end GPUs and server-grade processing power.
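To make the hyperplane idea concrete, the following is a minimal sketch of a linear SVM separating two toy clusters of 2-D feature vectors; the data, labels and the use of fitcsvm from MATLAB's Statistics and Machine Learning Toolbox are illustrative assumptions and not part of the project's detection pipeline.

```matlab
% Minimal linear-SVM sketch (assumes the Statistics and Machine Learning
% Toolbox); the toy features and labels below are illustrative only.
X = [randn(20, 2) + 2; randn(20, 2) - 2];        % two clusters of 2-D feature vectors
y = [ones(20, 1); -ones(20, 1)];                 % class labels for the two clusters
mdl = fitcsvm(X, y, 'KernelFunction', 'linear'); % fit a separating hyperplane
label = predict(mdl, [1.5 1.5])                  % classify a new feature vector
```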
The YOLOv3 CNN is chosen over traditional R-CNN and Fast R-CNN because of its speed, flexibility of training and low performance cost on lower-end computers. As the system was designed to be implemented on hardware, it is necessary that we choose the most performance-friendly approach to our design. Completely covering CNNs and how YOLOv3 fits within the ecosystem is out of the scope of this wiki; instead, this section covers the basic functioning of YOLOv3 and how it is used within our system. YOLOv3 is built on Darknet-53, a variant of Residual Network (ResNet) technology which uses residual layers as inputs for convolution layers. Darknet is an open-source neural network framework written in C and CUDA, which decreases training times, especially with Graphics Processing Units (GPUs), compared to previously used frameworks such as ResNet and Inception-ResNet. A few important terms to understand before the YOLOv3 architecture are convolution, residual, pooling, bounding boxes and stride. *REF* Convolution is the process of iterating over an image with a kernel; a kernel is an n x n mask that 'slides' over the image computing whatever operation is specified. A residual is the output of an image operation after convolution and pooling are performed. Pooling is the process of changing image elements by applying a max, min or averaging function with a kernel, similarly to convolution. Bounding boxes are the boxes drawn around a detected object during object detection. Stride is the downsampling factor of the network; for example, if the strides of the network are 32, 16 and 8, then an input image of size 416 x 416 will produce detections on grids of 13 x 13, 26 x 26 and 52 x 52. Generally, the stride of any layer is equal to the factor by which the output of the layer is smaller than the input image.
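To make the convolution and stride terminology concrete, the sketch below slides a 3 x 3 averaging kernel over a toy single-channel image with stride 2; the image, kernel and stride values are illustrative assumptions.

```matlab
% One convolution pass with an explicit stride (correlation form, no padding);
% the toy image, kernel and stride are illustrative assumptions.
img    = rand(8, 8);            % toy single-channel image
kern   = ones(3, 3) / 9;        % 3 x 3 averaging kernel
stride = 2;                     % downsampling factor of this layer

outSize = floor((size(img, 1) - size(kern, 1)) / stride) + 1;
out = zeros(outSize);
for r = 1:outSize
    for c = 1:outSize
        rows = (r - 1) * stride + (1:size(kern, 1));
        cols = (c - 1) * stride + (1:size(kern, 2));
        out(r, c) = sum(sum(img(rows, cols) .* kern));   % kernel response at this position
    end
end
% The 8 x 8 input produces a 3 x 3 output: the output is smaller than the
% input by roughly the stride factor, as described above.
```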
YOLOv3 has 106 convolution layers that concatenate the residual output from a block of residual layers to the alternate block; these blocks are shown in pink in the figure below. YOLOv3 detects objects at 3 stages with different strides. The green blocks are upsampling layers, which upsample the input feature maps and output the new feed for the network. Each grid cell in the network makes 3 predictions using 3 anchor boxes. Anchor boxes are predefined priors that specify the size of the objects to be detected. A cell is selected if the object to be detected falls under the receptive field of the cell. The network downsamples the input image until the first detection layer, where a detection is made using the feature maps of a layer with stride 32. Layers are then upsampled by a factor of 2 and concatenated with the feature maps of previous layers having identical feature map sizes. Another detection is made at the layer with stride 16. The same upsampling procedure is repeated, and a final detection is made at the layer with stride 8. The output of the network for an input image is a set of bounding boxes specifying the locations of the detected objects. For an image of 416 x 416, YOLO predicts ((52 x 52) + (26 x 26) + (13 x 13)) x 3 = 10647 bounding boxes. To reduce the number of bounding boxes to the number of objects, we use the Non-Maximum Suppression (NMS) algorithm, which takes the detection probabilities of the set of bounding boxes, sorts them in descending order and outputs a single bounding box with the highest confidence level for each object. A detailed explanation, along with examples and the underlying theory, can be found in the links attached in the references and the appendix (see appendices 2, 3, 4 and 5). In our system, we set the resolution of the image sensor to 512 x 256 and pass 10 frames per second as the input to the classifier, instead of performing real-time detection, because there is not enough time to process every frame for hypothesis generation and verification. Processing such as computing object coordinate points and global coordinates is highly time dependent, so processing frames every 100 ms allows us to complete hypothesis verification as well as to communicate this information to the other rovers and to display output on the simulation PC.
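The following is a minimal sketch of the greedy NMS step described above, written in MATLAB over a few hand-picked boxes in [x1 y1 x2 y2] form; the example boxes, scores and the IoU threshold of 0.5 are illustrative assumptions.

```matlab
% Greedy non-maximum suppression over axis-aligned boxes [x1 y1 x2 y2];
% the boxes, scores and IoU threshold below are illustrative assumptions.
boxes  = [10 10 60 60; 12 12 62 62; 100 100 150 150];
scores = [0.90; 0.85; 0.70];
iouThr = 0.5;

[~, order] = sort(scores, 'descend');   % highest-confidence boxes first
keep = [];
while ~isempty(order)
    i = order(1);
    keep(end + 1) = i;                  % keep the most confident remaining box
    rest = order(2:end);
    % intersection-over-union of box i with every remaining box
    xx1 = max(boxes(i, 1), boxes(rest, 1));
    yy1 = max(boxes(i, 2), boxes(rest, 2));
    xx2 = min(boxes(i, 3), boxes(rest, 3));
    yy2 = min(boxes(i, 4), boxes(rest, 4));
    inter = max(0, xx2 - xx1) .* max(0, yy2 - yy1);
    areaI = (boxes(i, 3) - boxes(i, 1)) * (boxes(i, 4) - boxes(i, 2));
    areaR = (boxes(rest, 3) - boxes(rest, 1)) .* (boxes(rest, 4) - boxes(rest, 2));
    iou   = inter ./ (areaI + areaR - inter);
    order = rest(iou <= iouThr);        % discard boxes that overlap box i too much
end
keep                                    % indices of the surviving bounding boxes
```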
Results
Conclusion
References
[1] Wooldridge, M (2002). An Introduction to MultiAgent Systems. John Wiley & Sons. ISBN 978-0-471-49691-5
[2] Balaji, P., & Srinivasan, D. (2010). An introduction to multi-agent systems. Studies in Computational Intelligence, 310, 1-27.
[3] Ma, H.-J., & Yang, G.-H. (2016). Adaptive fault tolerant control of cooperative heterogeneous systems with actuator faults and unreliable interconnections. IEEE Transactions on Automatic Control, 61(11), 3240-3255.
[4] Oh, K.-K., Park, M.-C., & Ahn, H.-S. (2015). A survey of multi-agent formation control. Automatica, 53, 424-440.
[5] Khatib, O. (1986). Real-Time Obstacle Avoidance for Manipulators and Mobile Robots. The International Journal of Robotics Research, 5(1), 90–98. https://doi.org/10.1177/027836498600500106
[6] Autonomous Ground Vehicles Self-Guided Formation Control https://github.com/vitsensei/Trionychid-Formation-Control