Projects:2017s1-101 Classifying Network Traffic Flows with Deep-Learning
Contents
Project Team
Clinton Page
Daniel Smit
Kyle Millar
Supervisors
Dr Cheng-Chew Lim
Dr Hong-Gunn Chew
Dr Adriel Cheng (DST Group)
Introduction
The internet has become a key facilitator of large-scale global communications and is vital in providing an immeasurable number of services every day. With the ever-expanding growth of internet use, it is critical to effectively manage the underlying networks that hold it together. Network traffic classification plays a crucial role in this management, providing quality of service, forecasting future trends, and detecting potential security threats. For these reasons, accurate network traffic classification is of great interest to internet service providers (ISPs), large-scale enterprise companies, and government agencies alike.
Current methods of network traffic classification have become less effective in recent years due to the increasing trend of obscuring network activity, whether it be for security, priority, or malicious intent [1, 2, 3]. Therefore, in today’s network there arises a need for a more effective classification algorithm to handle these conditions.
Objectives
- Gain knowledge about the application of deep-learning for classifying network traffic flows
- Conduct experiments on synthetic traffic flows and/or make use of communications flow data from real-life enterprise networks
- Develop network traffic classifying software using deep-learning techniques to an acceptable accuracy when comparing against the results of previous years
Relevant Work
Extensive research has been performed on network traffic classification. Common techniques include port-based classification and deep pack inspection (DPI). Port-based classification performs poorly due to the usage of non-standard port numbers, and DPI requires updates to recognise unseen classes of network traffic. Moore and Papagiannaki [4] found that using port-based classification as the sole classifier resulted in classification accuracies of less than 70%. nDPI, an open-source tool for investigating traffic using DPI, shows that a high classification accuracy (around 99%) can be achieved for standard applications but will deteriorate depending on how common the application is or whether it has been encrypted [5].
The use of machine learning techniques for network traffic classification has therefore been researched extensively to tackle the problems with these traditional methods. Previous iterations of this project have investigated using various machine learning techniques to preform network classification. In the 2014, tree based and support vector machines (SVM) algorithms were investigated. Using the Universitat Polit`ecnica de Catalunya (UPC) data set [6] they were able to classify botnet traffic from legitimate traffic with up to 94% classification accuracy using decision trees and 89% classification accuracy using SVM techniques [7]. In 2016, 10 graph-based methods which utilised spatial traffic statistics were explored and achieved classification accuracies up to 95% using the same UPC data set [8].
Auld et al. [3] investigated the use of statistical data (e.g., number of packets per flow) as inputs to a neural network. The study used Bayesian Neural Networks and created a model using 246 selected flow level features. A list of the most valuable features was then created based on the weightings in the neural network, and included in the report. With this method, an accuracy of 95.8% was achieved with over 200 features.
Trivedi et al. [9] found similar results utilising a neural network to classify network traffic based upon the variance of packet lengths. Comparing the neural network against a clustering approach, it was found that a neural network could both achieve a better classification as well as take less time to train.
As the utilisation of deep learning has been a relatively recent addition to the field of machine learning, only few papers have considered their use in network traffic classification. Wang [1] showed that utilising the first one thousand bytes of a network flow’s payload could prove an effective input for a deep (multilayered) neural network. Although the paper was lacking in overall detail about their implementation and information about the data set used, the results showed promising results for the utilisation of deep learning in network traffic classification.
Background
Network Flows
The term ‘flow’ can be thought of as conversation between two end points on a network. These two end points will exchange packets with each other until this conversation ends. Packets consists of two sections, the header section which holds information about the packet (e.g., destination and source address), and the payload section which holds the message that will be delivered to the recipient. An individual flow has been defined as the unidirectional or bidirectional exchange of packets containing the same five key properties [10]:
- Destination IP address
- Source IP address
- Destination port address
- Source port address
- Transport layer protocol (TCP or UDP).
Machine Learning and Deep Learning
ML is a subset of artificial intelligence (AI) that uses pattern recognition to classify or make predictions from a given set of data. This project will focus on one area of ML in particular, supervised learning. Supervised learning is a type of learning algorithm which uses the desired output (in the case of this report, the correct classification of network traffic) to assess the performance of a model and make corrections based upon the difference between its prediction and the correct classification [11].
Artificial neural networks (ANN, or alternatively simplified to NN) are a subset of machine learning that are formed on the basis of designing mathematical models to mimic how biological neural networks compute information. There are many different types of NNs but the foundation of which, is a network of simple processors, referred to as nodes or neurons, which communicate through numerous connections to other nodes within the network [12] (refer to Figure 2). NN are structured in layers; it is these layers that hold the distinction between deep neural networks and other NNs. A typical NN will be divided into three layers, the input, hidden, and output layer. When a NN is made up of multiple hidden layers it is said to be a Deep Neural Network (DNN). Deep learning is the process of using these deep neural networks for machine learning.
Convolutional Neural Networks (CNNs) are a type of NN that differ from traditional feed-forward reverse propagation by how data is presented to the network. CNNs are passed in data in the form of an image which is then filtered and processed by a neural network to extract an output prediction. Convolutional neural networks have been shown in industry to achieve high performance but require their input to be in the form of an image.
Precision and Recall
In terms of comparing ML algorithms it is not always prudent to compare the overall accuracy of the models. Take for example a ML model which aims to identify all malicious sample in a set of 100 results. If there is only 1 malicious sample in this set, the model can still achieve a 99% accuracy by classifying every sample as non-malicious. Two metrics that help to measure this imbalance are precision and recall [13].
A confusion matrix is used to show the prediction versus the true classification for every class in the system. For example, a confusion matrix for the previous example can be seen below. It should be noted that here the positive class has been identified as the malicious sample. It can be seen that as the example model does not correctly identify any malignant samples it will have a precision and recall of zero.
Predicted Class | |||
---|---|---|---|
Malicious | Non-Malicious | ||
True Classes | Malicious | True Positive (TP)
0 |
False Negative (FN)
1 |
Non-Malicious | False Positive (FP)
0 |
True Negative (TN)
99 |
Method
One of the major questions faced in this project was how network traffic should be represented to a deep neural network. To address this challenge, the research was divided into three main phases, each proposing a new input strategy.
- Phase 1: Drawing from research conducted by Wang [1], Phase 1 explored using patterns found in the payload sections to preform flow classification. In its essence, this is the same method used by deep packet inspection, a widely used method of traffic classification. However, by utilising deep learning to find these patterns it is no longer limited to just those that can be detected by humans.
- Phase 2: Assessed the viability of using statistical features derived from a flow as inputs to a deep neural network. As these features have a higher tolerance to encryption methods and are harder to obscure, this phase aimed to construct a more robust classifier. There have been
many studies utilising this method in machine learning, the aim of this phase was extending this research by utilising a deep learning classifier.
- Phase 3: By exploiting classification methods typically found in image recognition, Phase 3 focused on representing network traffic as images based on the network traffic data and then using a convolutional neural network for classification.
This research aims to provide an extensive analysis of deep learning’s performance on traffic classification using a multitude of neural network configurations. In machine learning the parameters selected before a classifier trains are called the hyperparameters. This is to avoid confusion with the self-defined parameters the classifier will set through training. Six model hyperparameters were investigated to determine their effect on network traffic classification and are described below. TensorFlow [14] was then utilised to build and evaluate the different model architectures.
Hyperparameters | Description and Significance |
---|---|
Number of nodes in the hidden layer(s) | Increases the model’s degrees of freedom, as the number of processing units and connections is increased. |
Number of hidden layers | Each layer generates a higher order representation of the input data. |
Padding style | Padding a flow with either zeros or random values such that each flow meets the required input length. |
Input length | The number of bytes used from each bi-directional flow. As each input byte maps to a respective node in the input layer, the number of bytes used also determines how many nodes are present in the input layer. |
Training/Optimization algorithm | Responsible for adjusting the connection weights to minimize the cost function. |
Activation functions | Responsible for detecting non-linear patterns in the input data. |
Category encoding | How the inputs and outputs are encoded. |
Data set
Deep learning benefits from a large and extensive data set and as supervised learning was to be used, this data set must also be labelled. Due to these reasons, the UNSW-NB15 data set [3, 21] was chosen as the basis for this research. Released in 2015, UNSW-NB15 contains a capture of around 1 million bi-directional flows, made up of both application and malicious traffic. This data set was generated over two separate days, for approximately 15 hours on each of the days. The data set was provided in two distinct sets of packet captures from the 22 nd of January 2015 and 17 th of February 2015. To ensure that the developed model generalises well with new data, the proposed use of this data set for this phase was to train using one set, and test with the other. By using the different sets recorded at different times, our classification results may correspond more closely to the results of classification when applied in a real scenario.
From this data set, a selection of application and malicious classes was chosen. For the application classes, the ten largest represented applications were selected. The remaining flows were group into an 11th “Unknown” class. Two different versions of data sets were then made: one for general application network traffic classification, and another for malicious traffic classification. These were developed to provide an approximately equal number of classes for training.
For the application data set, the data set from 22 Jan 2015 was used for training, while the 17 Feb 2015 data set was split into testing and validation sets, with proportions of 60% and 40%. This resulted in overall proportions of 50% training, 30% testing and 20% validation. The malicious data sets were created by merging both the 22 Jan 2015 and 17 Feb 2015 data sets together, and creating a data set of evenly distributed malicious classes, as well as an equal sized general application class. While the training set is made from the 22 Jan 2015 data set, the testing and validation sets were made by randomly sorting the 17 Feb 2015 data set and splitting the results into the 60%/40% distribution. Additionally, due to the small number of instances of the four smallest malicious classes: ANALYSIS, BACKDOOR, SHELLCODE and WORMS, these were grouped together into a single class called OTHER_MALICIOUS.
Application Protocols | Malicious Classes |
---|---|
DNS | Exploits |
FTP | Fuzzers |
FTP-DATA | Generic |
Mail (POP3, SMTP, IMAP) | DOS |
SSH | Reconnaissance |
P2P (eDonkey, Bittorrent) | Other Malicious |
NFS | |
HTTP | |
BGP | |
OSCAR | |
Unknown |
Phase 1
Phase 1 explored network classification based on the first one thousand bytes in the payload of a bi-directional flow. As a flow is made of multiple packets, each containing their own payload, these payload sections were concatenated together until the required number of bytes was met. The data was then used to train a feedforward-backpropagation neural network. The motivation behind this phase was to reproduce and extend the work documented by Wang [1] by considering the addition of UDP flows and malicious output classes.
Phase 2
References
[1] Z. Wang, "The Applications of Deep Learning on Traffic Identification," Black Hat USA, 2015.
[2] S. Zeba and D.G. Harkut, "An overview of network traffic classification methods," International Journal on Recent and Innovation Trends in Computing and Communication (IJRITCC), no. ISSN: 2321-8169, pp. 482 - 488, February 2015.
[3] T. Auld, A. W. Moore, and S. F. Gull, "Bayesian neural networks for internet traffic classification," IEEE Transactions on Neural Networks, vol. 18, no. 1, pp. 223-39, Jan 2007.
[4] A. W. Moore and K. Papagiannaki, "Toward the Accurate Identification of Network Applications," in PAM, 2005, vol. 5, pp. 41-54: Springer. An intensive examination of network traffic and methods to classify them.
[5] L. Deri, M. Martinelli, T. Bujlow, and A. Cardigliano, "nDPI: Open-source high-speed deep packet inspection," in 2014 International Wireless Communications and Mobile Computing Conference (IWCMC), 2014, pp. 617-622. A deep packet inspection tool utilise to add additional labels to the UNSW-NB15 data set.
[6] V. Carela-Español, P. Barlet-Ros, A. Cabellos-Aparicio, and J. Solé-Pareta, "Analysis of the impact of sampling on NetFlow traffic classification," Computer Networks, vol. 55, no. 5, pp. 1083-1099, 2011. The “UPC” data set, used by both the 2014 and 2016 iterations of this project.
[7] B. McAleer et al., "Honours Project 10: Development of Machine Learning Techniques for Analysing Network Communications," The University of Adelaide, Adelaide 2014. 2014’s iteration of the project. Investigated network traffic with tree based and SVM classifiers.
[8] K. Hörnlund, J. Trann, H. G. Chew, C. C. Lim, and A. Cheng, "Classifying Internet Applications and Detecting Malicious Traffic from Network Communications," ECMS, The University of Adelaide, 2016. Last years’ iteration of the project. Explored network traffic classification utilising graph based techniques.
[9] C. Trivedi, M.-Y. Chow, A. A. Nilsson, and H. J. Trussell, "Classification of Internet traffic using artificial neural networks," 2002. Used the packet size for classification. Showed a neural network was favourable compared to a standarised clusting technique.
[10] J. Quittek, T. Zseby, B. Claise, and S. Zander, "Requirements for IP Flow Information Export (IPFIX)," Internet Engineering Task Force. (IETF), October 2004. Used to understand the requirements for IPFIX designation.
[11] A. Ng. Machine Learning. Available: https://www.coursera.org/browse/datascience/machine-learning A free course on machine learning. Was used initially to get fimilar with the concepts.
[12] Nerual Network FAQ. Available: ftp://ftp.sas.com/pub/neural/FAQ.html An extensive resource for information reagarding neural networks.
[13] J. Rajagopal, I. Descutner, M. Scibior, and N. Pickorita. (2016). Deep Learning SIMPLIFIED. Available: https://www.youtube.com/watch?v=iIjtgrjgAug A YouTube series, useful to get fimilar with the different deep learning networks.
[14] A. Martín et al., "TensorFlow: A System for Large-Scale Machine Learning," in OSDI, 2016, vol. 16, pp. 265-283. A python library utilised to create and evaluate all models used within this project.