Projects:2017s1-101 Classifying Network Traffic Flows with Deep-Learning

Revision as of 15:29, 29 October 2017 by A1646872 (talk | contribs)

Project Team

Clinton Page

Daniel Smit

Kyle Millar

Supervisors

Dr Cheng-Chew Lim

Dr Hong-Gunn Chew

Dr Adriel Cheng (DST Group)

Introduction

The internet has become a key facilitator of large-scale global communications and is vital in providing an immeasurable number of services every day. With the ever-expanding growth of internet use, it is critical to effectively manage the underlying networks that hold it together. Network traffic classification plays a crucial role in this management, providing quality of service, forecasting future trends, and detecting potential security threats. For these reasons, accurate network traffic classification is of great interest to internet service providers (ISPs), large-scale enterprise companies, and government agencies alike.

Current methods of network traffic classification have become less effective in recent years because network activity is increasingly obscured, whether for security, priority, or malicious intent [1-3]. Today’s networks therefore need a more effective classification algorithm that can handle these conditions.

Objectives

  • Gain knowledge about the application of deep-learning for classifying network traffic flows
  • Conduct experiments on synthetic traffic flows and/or make use of communications flow data from real-life enterprise networks
  • Develop network traffic classification software using deep-learning techniques that reaches an acceptable accuracy when compared against the results of previous years

Relevant Work

Extensive research has been performed on network traffic classification. Common techniques include port-based classification and deep packet inspection (DPI). Port-based classification performs poorly because applications often use non-standard port numbers, and DPI requires updates to recognise unseen classes of network traffic. Moore and Papagiannaki [4] found that using port-based classification as the sole classifier resulted in classification accuracies of less than 70%. Results for nDPI, an open-source tool for investigating traffic using DPI, show that a high classification accuracy (around 99%) can be achieved for standard applications, but that accuracy deteriorates depending on how common the application is or whether its traffic has been encrypted [5].
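The weakness of port-based classification can be illustrated with a minimal sketch. The port table, the function name classify_by_port, and the three example flows below are assumptions made up for illustration; they are not taken from the project or from any of the cited tools.

```python
# Hypothetical port-to-application table (illustrative only; a real
# classifier would draw on the full IANA port registry).
WELL_KNOWN_PORTS = {
    22: "ssh",
    25: "smtp",
    53: "dns",
    80: "http",
    443: "https",
}


def classify_by_port(src_port: int, dst_port: int) -> str:
    """Label a flow using only its port numbers.

    The destination port is checked first, then the source port.
    Returns "unknown" when neither port appears in the table.
    """
    for port in (dst_port, src_port):
        label = WELL_KNOWN_PORTS.get(port)
        if label is not None:
            return label
    return "unknown"


# Made-up flows: (source port, destination port, true application)
flows = [
    (51234, 80, "http"),    # standard port: labelled correctly
    (51235, 8080, "http"),  # non-standard port: not recognised
    (51236, 443, "ssh"),    # SSH tunnelled over port 443: labelled as https
]

for src, dst, truth in flows:
    predicted = classify_by_port(src, dst)
    print(f"dst_port={dst:<5} predicted={predicted:<8} actual={truth}")
```

In this sketch only the first flow is labelled correctly. The second is missed and the third is mislabelled, which are the two failure modes that motivate classifiers based on traffic behaviour rather than port numbers.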

Machine learning techniques for network traffic classification have therefore been researched extensively to address the problems with these traditional methods. Previous iterations of this project investigated various machine learning techniques to perform network traffic classification. In 2014, tree-based and support vector machine (SVM) algorithms were investigated. Using the Universitat Politècnica de Catalunya (UPC) data set [6], the team distinguished botnet traffic from legitimate traffic with up to 94% classification accuracy using decision trees and 89% classification accuracy using SVM techniques [7]. In 2016, 10 graph-based methods that utilised spatial traffic statistics were explored, achieving classification accuracies of up to 95% on the same UPC data set [8].



References

[1] Z. Wang, "The Applications of Deep Learning on Traffic Identification," Black Hat USA, 2015.

[2] S. Zeba and D. G. Harkut, "An overview of network traffic classification methods," International Journal on Recent and Innovation Trends in Computing and Communication (IJRITCC), ISSN: 2321-8169, pp. 482-488, February 2015.

[3] T. Auld, A. W. Moore, and S. F. Gull, "Bayesian neural networks for internet traffic classification," IEEE Transactions on Neural Networks, vol. 18, no. 1, pp. 223-39, Jan 2007.

[4] A. W. Moore and K. Papagiannaki, "Toward the Accurate Identification of Network Applications," in PAM, 2005, vol. 5, pp. 41-54: Springer. An intensive examination of network traffic and methods to classify them.

[5] L. Deri, M. Martinelli, T. Bujlow, and A. Cardigliano, "nDPI: Open-source high-speed deep packet inspection," in 2014 International Wireless Communications and Mobile Computing Conference (IWCMC), 2014, pp. 617-622. A deep packet inspection tool utilised to add additional labels to the UNSW-NB15 data set.

[6] V. Carela-Español, P. Barlet-Ros, A. Cabellos-Aparicio, and J. Solé-Pareta, "Analysis of the impact of sampling on NetFlow traffic classification," Computer Networks, vol. 55, no. 5, pp. 1083-1099, 2011. The “UPC” data set, used by both the 2014 and 2016 iterations of this project.

[7] B. McAleer et al., "Honours Project 10: Development of Machine Learning Techniques for Analysing Network Communications," The University of Adelaide, Adelaide, 2014. 2014’s iteration of the project. Investigated network traffic with tree-based and SVM classifiers.

[8] K. Hörnlund, J. Trann, H. G. Chew, C. C. Lim, and A. Cheng, "Classifying Internet Applications and Detecting Malicious Traffic from Network Communications," ECMS, The University of Adelaide, 2016. Last year’s iteration of the project. Explored network traffic classification utilising graph-based techniques.