Projects:2019s2-23102 Secure Machine Learning Against DoS Induced by Poisoning Attacks


Project Team

Students

  • Fengyi Yang
  • Elaine Kuan

Supervisors

  • Prof. Cheng-Chew Lim
  • Prof. Ali Babar

Introduction

Poisoning attacks are a class of adversarial machine learning techniques that aim to fool a target learning-based system by injecting "false" data into its training set, so as to maximise the system's misclassification rate. This project analyses how poisoning attacks can compromise the functionality of a network intrusion detection system (NIDS) and proposes countermeasures.

In the target system, a Denial of Service (DoS) at the application layer can occur if the detector misclassifies legitimate users as malicious. The problem to solve in this project is to reduce the DoS caused by this kind of misclassification when the detector is subjected to poisoning attacks simulated by statistical-based and gradient-based methods. The machine learning algorithms under study are those with the highest accuracies in current research, e.g., Random Forest, SVM and some classifier ensemble techniques.

The Canadian Institute for Cybersecurity Intrusion Detection System dataset (CICIDS 2017) is chosen as the network traffic dataset to work on.

Objectives

The three main objectives of this project are:
1. To develop learning-based detectors for more than one type of network intrusion.
2. To simulate an intelligent and adaptive adversary to attack the learning-based system, meaning the attack mechanism is transferable to other "peer" datasets, not only the target one.
3. To implement a robust proactive defence mechanism against the imposed poisoning attacks.

Motivations

1. Network intrusion detection is a significant component of the data security framework.
2. Machine learning techniques are widely applied in network intrusion detection.
3. Online services rely heavily on machine learning, which exposes learning algorithms to the threat of data poisoning.
4. Most existing work uses outdated datasets.

Research Framework

1. Problem Formulation
Problem to Solve
To reduce the impact of poisoning attacks, simulated by statistical-based and gradient-based methods, on the test accuracy of network intrusion detectors based on Random Forest and SVM.
What is Given
i. Network traffic datasets: KDD '99, UNSW-NB15, CICIDS 2017, etc.
ii. Mechanisms of poisoning attacks: random label flips, feature manipulation, Jacobian Saliency Map Attack (JSMA), Fast Gradient Sign Method (FGSM), etc. (a minimal gradient-based sketch follows the constraints below)
iii. Libraries of machine learning algorithms: scikit-learn, the MATLAB Statistics and Machine Learning Toolbox, the cleverhans library
Constraints
i. Time
ii. Complexity
iii. Memory
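
To make the gradient-based mechanism concrete, the following is a minimal FGSM-style sketch against a logistic-regression surrogate (Random Forests are not differentiable, so a differentiable surrogate stands in). The synthetic data, model and epsilon value are illustrative assumptions, not the project's configuration.

# Minimal FGSM-style sketch on a logistic-regression surrogate.
# All names and parameters here are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
clf = LogisticRegression().fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]

def fgsm_perturb(x, y_true, eps=0.3):
    """One FGSM step: move x in the sign of the log-loss gradient.
    For logistic regression, dL/dx = (sigmoid(w.x + b) - y) * w."""
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    grad = (p - y_true) * w
    return x + eps * np.sign(grad)

X_adv = np.array([fgsm_perturb(x, t) for x, t in zip(X, y)])
print("clean acc:", clf.score(X, y), "adversarial acc:", clf.score(X_adv, y))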

2. Develop Conceptual Model
Network Intrusion Detection System

3. Identify Relevant Approaches

4. Collection & Synthesis of Existing Data & Metadata
Dataset Analysis

5. Generate Specific Hypotheses

6. Test Hypotheses

7. Research & Findings

8. Synthesis of Results

Related Work

Phase I: Model Generation & Simple Attack Simulation
In Phase I we will perform a literature review to solidify our understanding of adversarial machine learning and to investigate what has been done in this field. We will then evaluate and analyse commonly used datasets for network intrusion detection, including benchmark sets such as KDD '99 and NSL-KDD, and more recent datasets such as UNSW-NB15 and CICIDS 2017. Once the choice of dataset is justified, we preprocess the data and evaluate it using Python's scikit-learn library and WEKA, both of which are commonly used machine learning tools. A sketch of this preprocessing and baseline step follows.
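
The sketch below assumes a merged CICIDS 2017 CSV export; the file name and the "Label" column (with "BENIGN" marking normal traffic) are assumptions about the dataset's distribution format, and the Random Forest serves only as a baseline detector.

# Sketch of the Phase I preprocessing/baseline step with scikit-learn.
# The CSV path and column names are assumptions about the CICIDS 2017 export.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("cicids2017.csv")          # hypothetical merged CSV
df.columns = df.columns.str.strip()         # CICIDS headers carry stray spaces
df = df.replace([np.inf, -np.inf], np.nan).dropna()

y = (df["Label"] != "BENIGN").astype(int)   # binary: attack vs. benign
X = df.drop(columns=["Label"]).select_dtypes(include=[np.number])

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("baseline test accuracy:", clf.score(X_te, y_te))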

Phase II: State-of-the-Art Attacks & Defence Methods Implementation
Attack:
1. Random Label Flipping
Select random samples and flip their labels (a minimal sketch follows this list).
2. Statistical-Based Poisoning
Manipulate feature values.
3. Optimisation-Based Poisoning
Initialise an attack point and move it along the direction of steepest ascent of the outer objective function.
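
Below is a minimal sketch of attack 1 (random label flipping), measuring how test accuracy degrades as the flip rate grows. Synthetic data stands in for the NIDS training set, and the flip rates are illustrative values.

# Minimal sketch of random label flipping on a synthetic stand-in
# for the NIDS training set; flip rates are illustrative parameters.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

def flip_labels(y, rate, rng):
    """Flip the labels of a random fraction `rate` of training samples."""
    y_poisoned = y.copy()
    idx = rng.choice(len(y), size=int(rate * len(y)), replace=False)
    y_poisoned[idx] = 1 - y_poisoned[idx]   # binary labels
    return y_poisoned

rng = np.random.default_rng(0)
for rate in (0.0, 0.1, 0.2, 0.3):
    y_p = flip_labels(y_tr, rate, rng)
    acc = RandomForestClassifier(random_state=0).fit(X_tr, y_p).score(X_te, y_te)
    print(f"flip rate {rate:.0%}: test accuracy {acc:.3f}")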

Defence:
1. KNN Relabelling
Relabel each training point according to the majority label of its k nearest neighbours in the training set (sketched after this list).
2. Label Propagation
A semi-supervised method that propagates labels to an unlabelled set from a small set of verified data (also sketched below).
3. Hybrid Method
Combine the methods in 1. and 2.
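
The following sketches defences 1 and 2; the neighbourhood size k and the confidence threshold eta are illustrative choices, not values fixed by the project. A hybrid along the lines of method 3 could, for example, run the KNN relabelling first and then propagate labels from the points it left unchanged.

# Sketches of defences 1 (KNN relabelling) and 2 (label propagation);
# k and eta are illustrative parameters, not the project's settings.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.semi_supervised import LabelPropagation

def knn_relabel(X, y, k=10, eta=0.6):
    """Defence 1: relabel each training point to the majority label of its
    k nearest neighbours when that majority is confident enough."""
    idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)[1]
    y_clean = y.copy()
    for i in range(len(X)):
        labels, counts = np.unique(y[idx[i, 1:]], return_counts=True)  # skip self
        if counts.max() / k >= eta:
            y_clean[i] = labels[counts.argmax()]
    return y_clean

def propagate_labels(X, y, trusted_mask):
    """Defence 2: keep only a small verified subset and propagate its labels
    to the rest; scikit-learn treats y == -1 as 'unlabelled'."""
    y_masked = np.where(trusted_mask, y, -1)
    return LabelPropagation().fit(X, y_masked).transduction_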

Phase III: Model Security Evaluation

References

[1] B. Biggio and F. Roli, “Wild patterns: Ten years after the rise of adversarial machine learning,” Pattern Recognition, vol. 84, pp. 317–331, 2018.
[2] M. Jagielski, A. Oprea, B. Biggio, C. Liu, C. Nita-Rotaru, and B. Li, “Manipulating machine learning: Poisoning attacks and countermeasures for regression learning,” in 2018 IEEE Symposium on Security and Privacy (SP), IEEE, 2018, pp. 19–35.
[3] B. Biggio, B. Nelson, and P. Laskov, “Poisoning attacks against support vector machines,” in ICML’12: Proceedings of the 29th International Conference on Machine Learning, USA: Omnipress, 2012, pp. 1467–1474.
[4] M. Ghifary, W. Kleijn, and M. Zhang, “Deep hybrid network with good out-of-sample object recognition,” in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing, May 2014.
[5] A. Paudice, L. Muñoz-González, and E. Lupu, “Label sanitization against label flipping poisoning attacks,” Springer, 2019, pp. 5–15.
[6] R. Taheri, R. Javidan, M. Shojafar, Z. Pooranian, A. Miri, and M. Conti, “On defending against label flipping attacks on malware detection systems,” Mar. 2020. arXiv:1908.04473v2 [cs.LG].