Projects:2017s1-101 Classifying Network Traffic Flows with Deep-Learning


Project Team

Clinton Page

Daniel Smit

Kyle Millar

Supervisors

Dr Cheng-Chew Lim

Dr Hong-Gunn Chew

Dr Adriel Cheng (DST Group)

Introduction

The internet has become a key facilitator of large-scale global communications and is vital in providing an immeasurable number of services every day. With the ever-expanding growth of internet use, it is critical to effectively manage the underlying networks that hold it together. Network traffic classification plays a crucial role in this management, providing quality of service, forecasting future trends, and detecting potential security threats. For these reasons, accurate network traffic classification is of great interest to internet service providers (ISPs), large-scale enterprise companies, and government agencies alike.

Current methods of network traffic classification have become less effective in recent years due to the increasing trend of obscuring network activity, whether for security, priority, or malicious intent [1, 2, 3]. Today's networks therefore require more effective classification algorithms that can handle these conditions.

Objectives

  • Gain knowledge about the application of deep-learning for classifying network traffic flows
  • Conduct experiments on synthetic traffic flows and/or make use of communications flow data from real-life enterprise networks
  • Develop network traffic classification software using deep-learning techniques that achieves an acceptable accuracy when compared against the results of previous years' projects

Relevant Work

Extensive research has been performed on network traffic classification. Common techniques include port-based classification and deep packet inspection (DPI). Port-based classification performs poorly due to the use of non-standard port numbers, and DPI requires updates to recognise unseen classes of network traffic. Moore and Papagiannaki [4] found that using port-based classification as the sole classifier resulted in classification accuracies of less than 70%. nDPI, an open-source tool for investigating traffic using DPI, shows that a high classification accuracy (around 99%) can be achieved for standard applications but deteriorates depending on how common the application is or whether it has been encrypted [5].

The use of machine learning techniques for network traffic classification has therefore been researched extensively to tackle the problems with these traditional methods. Previous iterations of this project have investigated various machine learning techniques to perform network classification. In 2014, tree-based and support vector machine (SVM) algorithms were investigated. Using the Universitat Politècnica de Catalunya (UPC) data set [6], botnet traffic could be distinguished from legitimate traffic with up to 94% classification accuracy using decision trees and 89% classification accuracy using SVM techniques [7]. In 2016, ten graph-based methods which utilised spatial traffic statistics were explored, achieving classification accuracies of up to 95% on the same UPC data set [8].

Auld et al. [3] investigated the use of statistical data (e.g., number of packets per flow) as inputs to a neural network. The study used Bayesian neural networks and created a model using 246 selected flow-level features. A list of the most valuable features, based on the weightings in the neural network, was then included in their paper. With this method, an accuracy of 95.8% was achieved with over 200 features.

Trivedi et al. [9] found similar results, utilising a neural network to classify network traffic based upon the variance of packet lengths. When the neural network was compared against a clustering approach, it was found that the neural network both achieved better classification accuracy and took less time to train.

As deep learning is a relatively recent addition to the field of machine learning, only a few papers have considered its use in network traffic classification. Wang [1] showed that the first one thousand bytes of a network flow's payload could prove an effective input for a deep (multilayered) neural network. Although the paper lacked detail about the implementation and the data set used, the results were promising for the use of deep learning in network traffic classification.

Background

Network Flows

The term ‘flow’ can be thought of as a conversation between two end points on a network. These two end points exchange packets with each other until the conversation ends. A packet consists of two sections: the header, which holds information about the packet (e.g., destination and source address), and the payload, which holds the message to be delivered to the recipient. An individual flow is defined as the unidirectional or bidirectional exchange of packets sharing the same five key properties [10] (a sketch of grouping packets by this five-tuple follows the list):

  • Destination IP address
  • Source IP address
  • Destination port address
  • Source port address
  • Transport layer protocol (TCP or UDP).
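
As an illustration of this definition, the following sketch groups parsed packets into bidirectional flows by their five-tuple. The Packet structure and its field names are hypothetical; a real capture would be parsed with a packet library rather than constructed by hand.

 from collections import defaultdict, namedtuple

 # Hypothetical parsed-packet structure; field names are illustrative only.
 Packet = namedtuple("Packet", ["src_ip", "dst_ip", "src_port", "dst_port", "protocol", "payload"])

 def flow_key(pkt):
     # Order the two end points so that both directions of the conversation
     # map onto the same bidirectional flow key.
     a = (pkt.src_ip, pkt.src_port)
     b = (pkt.dst_ip, pkt.dst_port)
     return (pkt.protocol,) + (a + b if a <= b else b + a)

 def group_flows(packets):
     # Collect packets sharing the same five-tuple into one flow.
     flows = defaultdict(list)
     for pkt in packets:
         flows[flow_key(pkt)].append(pkt)
     return flows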

Machine Learning and Deep Learning

Machine learning (ML) is a subset of artificial intelligence (AI) that uses pattern recognition to classify or make predictions from a given set of data. This project focuses on one area of ML in particular: supervised learning. Supervised learning is a type of learning algorithm which uses the desired output (in this case, the correct classification of network traffic) to assess the performance of a model and make corrections based on the difference between its prediction and the correct classification [11].

Artificial neural networks (ANNs, often simply NNs) are a subset of machine learning formed on the basis of mathematical models that mimic how biological neural networks compute information. There are many different types of NN, but the foundation of each is a network of simple processors, referred to as nodes or neurons, which communicate through numerous connections to other nodes within the network [12]. NNs are structured in layers, and it is these layers that distinguish deep neural networks from other NNs. A typical NN is divided into three layers: the input, hidden, and output layers. When a NN contains multiple hidden layers it is said to be a deep neural network (DNN). Deep learning is the process of using these deep neural networks for machine learning.

Convolutional neural networks (CNNs) are a type of NN that differ from traditional feed-forward backpropagation networks in how data is presented to the network. CNNs are passed data in the form of an image, which is then filtered and processed by the network to produce an output prediction. Convolutional neural networks have been shown to achieve high performance in industry but require their input to be in the form of an image.
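
As a minimal sketch (not the architecture used in this project), the following contrasts a deep feed-forward network with a small convolutional network in TensorFlow's Keras API. The layer sizes, the 1000-byte input, the 32x32 image shape, and the 11 output classes are illustrative assumptions only.

 import tensorflow as tf

 # Deep feed-forward network: an input layer, multiple hidden layers, and an output layer.
 dnn = tf.keras.Sequential([
     tf.keras.Input(shape=(1000,)),                     # e.g. 1000 input bytes
     tf.keras.layers.Dense(256, activation="relu"),     # hidden layer 1
     tf.keras.layers.Dense(128, activation="relu"),     # hidden layer 2
     tf.keras.layers.Dense(11, activation="softmax"),   # e.g. 11 traffic classes
 ])

 # Convolutional network: the input is presented as a 2-D "image" and filtered
 # by convolutional layers before classification.
 cnn = tf.keras.Sequential([
     tf.keras.Input(shape=(32, 32, 1)),
     tf.keras.layers.Conv2D(16, kernel_size=3, activation="relu"),
     tf.keras.layers.MaxPooling2D(),
     tf.keras.layers.Flatten(),
     tf.keras.layers.Dense(11, activation="softmax"),
 ])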

Precision and Recall

In comparing ML algorithms, it is not always prudent to compare only the overall accuracy of the models. Take for example an ML model which aims to identify all malicious samples in a set of 100 results. If there is only one malicious sample in this set, the model can still achieve 99% accuracy by classifying every sample as non-malicious. Two metrics that help to measure this imbalance are precision and recall [13].

A confusion matrix shows the predicted versus the true classification for every class in the system. A confusion matrix for the previous example is shown below; note that the malicious samples have been taken as the positive class. As the example model does not correctly identify any malicious samples, it has a precision and recall of zero.

                        Predicted Malicious        Predicted Non-Malicious
 True Malicious         True Positive (TP) = 0     False Negative (FN) = 1
 True Non-Malicious     False Positive (FP) = 0    True Negative (TN) = 99
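
The short sketch below shows how accuracy, precision, and recall follow from the counts in the matrix above; returning zero when a denominator is zero is a common convention rather than a formal definition.

 def accuracy(tp, fn, fp, tn):
     return (tp + tn) / (tp + fn + fp + tn)

 def precision(tp, fp):
     # Fraction of predicted positives that are truly positive.
     return tp / (tp + fp) if (tp + fp) else 0.0

 def recall(tp, fn):
     # Fraction of true positives that were actually found.
     return tp / (tp + fn) if (tp + fn) else 0.0

 tp, fn, fp, tn = 0, 1, 0, 99
 print(accuracy(tp, fn, fp, tn))   # 0.99 despite finding no malicious samples
 print(precision(tp, fp))          # 0.0
 print(recall(tp, fn))             # 0.0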

Method

An overview of each approach to classification


One of the major questions faced in this project was how network traffic should be represented to a deep neural network. To address this challenge, the research was divided into three main phases, each proposing a new input strategy.

  • Phase 1: Drawing from research conducted by Wang [1], Phase 1 explored using patterns found in the payload sections to perform flow classification. In essence, this is the same approach used by deep packet inspection, a widely used method of traffic classification. However, by utilising deep learning to find these patterns, classification is no longer limited to patterns that can be defined by humans.
  • Phase 2: Assessed the viability of using statistical features derived from a flow as inputs to a deep neural network. As these features have a higher tolerance to encryption and are harder to obscure, this phase aimed to construct a more robust classifier. Many studies have applied this method with machine learning; the aim of this phase was to extend that research by utilising a deep learning classifier.

  • Phase 3: By exploiting classification methods typically found in image recognition, Phase 3 focused on representing network traffic as images and then using a convolutional neural network for classification (a rough sketch of this representation follows the list).
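
As a rough sketch of the Phase 3 idea (the exact image construction used in the project is not shown here), the normalised byte values of a flow could be reshaped into a square grayscale image for a convolutional network; the 32x32 size is an assumption for illustration.

 import numpy as np

 def bytes_to_image(byte_values, side=32):
     # Take up to side*side byte values, normalise to [0, 1] and zero-fill the rest.
     img = np.zeros(side * side, dtype=np.float32)
     data = np.asarray(byte_values, dtype=np.float32)[: side * side] / 255.0
     img[: len(data)] = data
     return img.reshape(side, side, 1)   # height x width x channels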

This research aims to provide an extensive analysis of deep learning's performance on traffic classification using a multitude of neural network configurations. In machine learning, the parameters selected before a classifier trains are called hyperparameters, to distinguish them from the parameters the classifier sets itself through training. The model hyperparameters investigated to determine their effect on network traffic classification are described below (a sketch of a model built from these hyperparameters follows the list). TensorFlow [14] was then utilised to build and evaluate the different model architectures.

Hyperparameters and their description and significance:
  • Number of nodes in the hidden layer(s): increases the model's degrees of freedom, as the number of processing units and connections is increased.
  • Number of hidden layers: each layer generates a higher-order representation of the input data.
  • Padding style: padding a flow with either zeros or random values so that each flow meets the required input length.
  • Input length: the number of bytes used from each bi-directional flow. As each input byte maps to a respective node in the input layer, the number of bytes used also determines how many nodes are present in the input layer.
  • Training/optimization algorithm: responsible for adjusting the connection weights to minimize the cost function.
  • Activation functions: responsible for detecting non-linear patterns in the input data.
  • Category encoding: how the inputs and outputs are encoded.
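
The sketch below (referenced above) shows one way such hyperparameters could be exposed when building a feed-forward model with TensorFlow's Keras API. The function, its defaults, and the one-hot output encoding are illustrative assumptions, not the project's actual implementation.

 import tensorflow as tf

 def build_classifier(input_length=1000, hidden_layers=2, nodes_per_layer=256,
                      activation="relu", num_classes=11, optimizer="adam"):
     # The input length fixes the number of input nodes; each hidden layer adds
     # nodes_per_layer fully connected nodes with the chosen activation function.
     model_layers = [tf.keras.Input(shape=(input_length,))]
     for _ in range(hidden_layers):
         model_layers.append(tf.keras.layers.Dense(nodes_per_layer, activation=activation))
     # One-hot (categorical) output encoding with a softmax output layer.
     model_layers.append(tf.keras.layers.Dense(num_classes, activation="softmax"))
     model = tf.keras.Sequential(model_layers)
     model.compile(optimizer=optimizer,
                   loss="categorical_crossentropy",
                   metrics=["accuracy"])
     return model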

Data set

Deep learning benefits from a large and extensive data set, and as supervised learning was to be used, this data set must also be labelled. For these reasons, the UNSW-NB15 data set [15] was chosen as the basis for this research. Released in 2015, UNSW-NB15 contains a capture of around 1 million bi-directional flows, made up of both application and malicious traffic. The data set was generated over two separate days, for approximately 15 hours on each day, and is provided as two distinct sets of packet captures from the 22nd of January 2015 and the 17th of February 2015. To ensure that the developed model generalises well to new data, the proposed use of this data set was to train on one set and test with the other. By using sets recorded at different times, the classification results should correspond more closely to those obtained when the classifier is applied in a real scenario.

From this data set, a selection of application and malicious classes was chosen. For the application classes, the ten largest represented applications were selected. The remaining flows were grouped into an 11th “Unknown” class. Two different versions of the data set were then made: one for general application network traffic classification, and another for malicious traffic classification. These were constructed to provide an approximately even class balance for training.

For the application data set, the 22 Jan 2015 capture was used for training, while the 17 Feb 2015 capture was split into testing and validation sets with proportions of 60% and 40%, giving overall proportions of 50% training, 30% testing and 20% validation. The malicious data sets were created by merging the 22 Jan 2015 and 17 Feb 2015 captures to form a data set of evenly distributed malicious classes, together with an equally sized general application class. The training set was again drawn from the 22 Jan 2015 capture, while the testing and validation sets were made by randomly shuffling the 17 Feb 2015 capture and splitting the result into the same 60%/40% distribution. Additionally, due to the small number of instances of the four smallest malicious classes (ANALYSIS, BACKDOOR, SHELLCODE and WORMS), these were grouped into a single class called OTHER_MALICIOUS.
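
A sketch of this splitting scheme is shown below, assuming the flows and labels from each capture day are already loaded into NumPy arrays; the variable names are hypothetical.

 import numpy as np

 def split_feb(x_feb, y_feb, test_fraction=0.6, seed=0):
     # Shuffle the 17 Feb 2015 capture and split it 60% testing / 40% validation.
     idx = np.random.RandomState(seed).permutation(len(x_feb))
     cut = int(test_fraction * len(x_feb))
     test_idx, val_idx = idx[:cut], idx[cut:]
     return (x_feb[test_idx], y_feb[test_idx]), (x_feb[val_idx], y_feb[val_idx])

 # x_train, y_train = x_jan, y_jan                        # training: 22 Jan 2015 capture
 # (x_test, y_test), (x_val, y_val) = split_feb(x_feb, y_feb)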

Application protocol classes: DNS, FTP, FTP-DATA, Mail (POP3, SMTP, IMAP), SSH, P2P (eDonkey, BitTorrent), NFS, HTTP, BGP, OSCAR, Unknown

Malicious classes: Exploits, Fuzzers, Generic, DOS, Reconnaissance, Other Malicious

Phase 1

An overview of how data was represented in Phase 1

Phase 1 data input sample: the byte values of two types of network traffic protocols, FTP-DATA and SSH


Phase 1 explored network classification based on the first one thousand bytes in the payload of a bi-directional flow. As a flow is made of multiple packets, each containing its own payload, these payload sections were concatenated together until the required number of bytes was met. The data was then used to train a feedforward-backpropagation neural network. The motivation behind this phase was to reproduce and extend the work documented by Wang [1] by considering the addition of UDP flows and malicious output classes.

The first one thousand bytes of a bi-directional flow were normalised and mapped to corresponding input nodes of the classifier. Normalisation of the input values is not always required in neural networks but has been shown to help speed up the learning process.
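
A minimal sketch of this input construction is given below, assuming each flow is available as a list of packets with raw payload bytes. The function name is illustrative, and the zero padding of short flows is only one of the options compared later.

 import numpy as np

 def flow_to_input(packets, input_length=1000):
     # Concatenate the packet payloads of the flow and keep the first input_length bytes.
     payload = b"".join(pkt.payload for pkt in packets)[:input_length]
     data = np.frombuffer(payload, dtype=np.uint8).astype(np.float32)
     # Zero-pad short flows so every input has the same length, then normalise to [0, 1].
     data = np.pad(data, (0, input_length - len(data)))
     return data / 255.0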

Extensive testing was performed to find the optimal hyperparameters for application protocol classification and malicious traffic classification.

Number of nodes in the hidden layers

Testing accuracy for different degrees of freedom

The initial number of nodes in the hidden layers was selected based on the following relationship:

$\text{Degrees of freedom} = \sum_{i=1}^{N-1} L_i \times L_{i+1}$ [15]

where $L_i$ is the number of nodes in layer $i$, and $N$ is the total number of layers in the neural network (N = input layer + hidden layers + output layer).
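
Under this relationship, the degrees of freedom for a given choice of layer sizes can be computed as in the sketch below; the example layer sizes are illustrative only.

 def degrees_of_freedom(layer_sizes):
     # Sum of L_i * L_{i+1} over consecutive layers (input, hidden layers, output).
     return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

 # Example: 1000 input nodes, hidden layers of 256 and 128 nodes, 11 output nodes
 # -> 1000*256 + 256*128 + 128*11 = 290,176 degrees of freedom.
 print(degrees_of_freedom([1000, 256, 128, 11]))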

Number of hidden layers

Similar experiments were run to find the optimal number of hidden layers. To investigate the effect that the depth of the model had on its ability to classify network traffic, ten models were constructed, each with the same total number of nodes in their hidden layers but spread across varying depths, ranging from one to ten hidden layers. From this, it could be seen that adding more layers did not have a conclusive effect on the testing accuracy and in many cases reduced it. The results also indicated that adding more nodes to the network has a greater effect on the model's accuracy than adding more hidden layers.

Padding style

For flows that did not meet the 1000-byte input length, zero padding and random padding were investigated. These two methods fill the remaining positions with either zeros or pseudorandom values in the range 0 to 255, respectively.
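
The two padding styles can be sketched as follows, assuming the flow bytes are held in a NumPy array; the function names are illustrative.

 import numpy as np

 def zero_pad(flow_bytes, length=1000):
     # Fill the remaining positions with zeros.
     return np.pad(flow_bytes, (0, length - len(flow_bytes)))

 def random_pad(flow_bytes, length=1000, seed=None):
     # Fill the remaining positions with pseudorandom values in the range 0 to 255.
     filler = np.random.RandomState(seed).randint(0, 256, length - len(flow_bytes))
     return np.concatenate([flow_bytes, filler])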

Twenty models of various degrees of freedom were then tested with both zero and random padding. The findings indicated that zero padding achieves a greater classification accuracy for all models. Because it is clear where zero padding begins, part of this result is speculated to come from the classifier implicitly learning flow length as an additional feature.

Input length


The number of inputs to the neural network is determined by how many bytes from each bi-directional flow are passed into the classifier. If the input length is too small, the classifier will be unable to detect patterns in the input, resulting in poor classification accuracy. As the input length increases, the degrees of freedom also increase, which, as shown in the previous experiments, may lead to better accuracy.

To investigate this, three classifiers of varying degrees of freedom were trained on a series of input lengths ranging from 50 to 800 bytes. The classifiers suffered no deterioration at the smaller input lengths, and therefore an input length of 50 bytes was utilised for the subsequent experiments, allowing for a reduced training time.


Phase 2

References

[1] Z. Wang, "The Applications of Deep Learning on Traffic Identification," Black Hat USA, 2015.

[2] S. Zeba and D. G. Harkut, "An overview of network traffic classification methods," International Journal on Recent and Innovation Trends in Computing and Communication (IJRITCC), pp. 482-488, February 2015.

[3] T. Auld, A. W. Moore, and S. F. Gull, "Bayesian neural networks for internet traffic classification," IEEE Transactions on Neural Networks, vol. 18, no. 1, pp. 223-39, Jan 2007.

[4] A. W. Moore and K. Papagiannaki, "Toward the Accurate Identification of Network Applications," in PAM, 2005, vol. 5, pp. 41-54: Springer. An intensive examination of network traffic and methods to classify them.

[5] L. Deri, M. Martinelli, T. Bujlow, and A. Cardigliano, "nDPI: Open-source high-speed deep packet inspection," in 2014 International Wireless Communications and Mobile Computing Conference (IWCMC), 2014, pp. 617-622. A deep packet inspection tool utilised to add additional labels to the UNSW-NB15 data set.

[6] V. Carela-Español, P. Barlet-Ros, A. Cabellos-Aparicio, and J. Solé-Pareta, "Analysis of the impact of sampling on NetFlow traffic classification," Computer Networks, vol. 55, no. 5, pp. 1083-1099, 2011. The “UPC” data set, used by both the 2014 and 2016 iterations of this project.

[7] B. McAleer et al., "Honours Project 10: Development of Machine Learning Techniques for Analysing Network Communications," The University of Adelaide, Adelaide, 2014. 2014's iteration of the project; investigated network traffic with tree-based and SVM classifiers.

[8] K. Hörnlund, J. Trann, H. G. Chew, C. C. Lim, and A. Cheng, "Classifying Internet Applications and Detecting Malicious Traffic from Network Communications," ECMS, The University of Adelaide, 2016. Last year's iteration of the project; explored network traffic classification utilising graph-based techniques.

[9] C. Trivedi, M.-Y. Chow, A. A. Nilsson, and H. J. Trussell, "Classification of Internet traffic using artificial neural networks," 2002. Used the packet size for classification; showed a neural network was favourable compared to a standardised clustering technique.

[10] J. Quittek, T. Zseby, B. Claise, and S. Zander, "Requirements for IP Flow Information Export (IPFIX)," Internet Engineering Task Force (IETF), October 2004. Used to understand the requirements for IPFIX flow designation.

[11] A. Ng, Machine Learning. Available: https://www.coursera.org/browse/datascience/machine-learning A free course on machine learning, used initially to become familiar with the concepts.

[12] Neural Network FAQ. Available: ftp://ftp.sas.com/pub/neural/FAQ.html An extensive resource for information regarding neural networks.

[13] J. Rajagopal, I. Descutner, M. Scibior, and N. Pickorita. (2016). Deep Learning SIMPLIFIED. Available: https://www.youtube.com/watch?v=iIjtgrjgAug A YouTube series, useful for becoming familiar with the different deep learning networks.

[14] M. Abadi et al., "TensorFlow: A System for Large-Scale Machine Learning," in OSDI, 2016, vol. 16, pp. 265-283. A Python library utilised to create and evaluate all models used within this project.

[15] N. Moustafa and J. Slay, "UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set)," Military Communications and Information Systems Conference (MilCIS), IEEE, 2015. The data set used for all experiments shown in this report.