Difference between revisions of "Projects:2016s1-102 Classifying Internet Applications and Detecting Malicious Traffic from Network Communications"
Revision as of 19:16, 26 October 2016
Project Team
Karl Hornlund
Jason Trann
Supervisors
Assoc Prof Cheng Chew Lim
Dr Hong Gunn Chew
Dr Adriel Cheng (DSTG)
Introduction
The project aims to use machine learning to predict the application class of computer network traffic. In particular, we will explore the usefulness of graph-based techniques for extracting additional features and providing a simplified model for classification, and evaluate the classification performance with respect to identifying malicious network traffic.
Objectives
- Implement a supervised machine learning system which utilises NetFlow data and spatial traffic statistics to classify network traffic, as described by Jin et al. [12] [18] [19].
- Achieve an appropriate level of accuracy when benchmarked against previous years’ iterations of the project and verify the results of Jin et al. [18].
- Evaluate the effectiveness of using spatial traffic statistics, in particular with respect to identifying malicious traffic.
- Explore improvements and extensions to the method prescribed by Jin et al. [12] [18] [19].
Stage 1: Bootstrap
The bootstrap stage begins by constructing the edge-level features to be used as inputs to the supervised machine learning system. Edge-level features are built from the flow-level features of the NetFlow data, as part of the process of building the Traffic Activity Graph (TAG).
Constructing the Traffic Activity Graph
The TAG is constructed as follows:
(1) Map each unique host in the network to a node in the TAG.
(2) For each flow in the network, create a directed edge between the respective nodes in the TAG, corresponding to the source and destination hosts of the flow. Assign the label of that flow as an edge attribute; do the same for duration, packets, and bytes.
(3) Calculate two additional edge attributes:
mean packet size = bytes/packets
mean packet rate = packets/duration
(4) For each set of edges with both nodes in common, perform the following simplification:
Assume x edges e_1, e_2, …, e_x between nodes u and v, in either direction.
Create a new undirected edge e_(x+1), and assign it the following attributes:
label(u,v) := the most common application class label among e_1, e_2, …, e_x.
min duration := minimum duration among e_1, e_2, …, e_x.
min packet size := minimum mean packet size among e_1, e_2, …, e_x.
min packet rate := minimum mean packet rate among e_1, e_2, …, e_x.
max duration := maximum duration among e_1, e_2, …, e_x.
max packet size := maximum mean packet size among e_1, e_2, …, e_x.
max packet rate := maximum mean packet rate among e_1, e_2, …, e_x.
bytes_uv := sum of bytes flowing from u to v.
bytes_vu := sum of bytes flowing from v to u.
symmetry := min(bytes_uv / bytes_vu, bytes_vu / bytes_uv)
Remove edges e_1, e_2, …, e_x from the TAG.
(5) Remove any loops (edges that leave and enter the same node) from the TAG.
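The five steps above can be sketched in Python as follows. This is a minimal illustration, not the project's actual implementation. The function name build_tag, the flow record keys (src, dst, label, duration, packets, bytes) and the output attribute names are all hypothetical. The sketch assumes every flow has a non-zero packet count and duration. It also assumes a symmetry of 0.0 when traffic flows in only one direction, a case the definition above leaves undefined. Ties for the most common label are resolved arbitrarily.

```python
from collections import Counter, defaultdict


def build_tag(flows):
    """Collapse flow records into undirected TAG edges keyed by (u, v).

    Each flow is a dict with the hypothetical keys
    src, dst, label, duration, packets and bytes.
    """
    groups = defaultdict(list)
    for flow in flows:
        src, dst = flow["src"], flow["dst"]
        if src == dst:
            continue  # step 5: loops are dropped
        # Steps 1-2: hosts become nodes; flows sharing both endpoints
        # are grouped under one sorted (u, v) key, whatever their direction.
        groups[tuple(sorted((src, dst)))].append(flow)

    tag = {}
    for (u, v), edges in groups.items():
        durations = [e["duration"] for e in edges]
        # Step 3: per-flow mean packet size and mean packet rate.
        # Assumes packets > 0 and duration > 0 for every flow.
        sizes = [e["bytes"] / e["packets"] for e in edges]
        rates = [e["packets"] / e["duration"] for e in edges]
        bytes_uv = sum(e["bytes"] for e in edges if e["src"] == u)
        bytes_vu = sum(e["bytes"] for e in edges if e["src"] == v)
        larger = max(bytes_uv, bytes_vu)
        # Step 4: one undirected edge replaces the whole group.
        tag[(u, v)] = {
            "label": Counter(e["label"] for e in edges).most_common(1)[0][0],
            "min_duration": min(durations),
            "max_duration": max(durations),
            "min_packet_size": min(sizes),
            "max_packet_size": max(sizes),
            "min_packet_rate": min(rates),
            "max_packet_rate": max(rates),
            "bytes_uv": bytes_uv,
            "bytes_vu": bytes_vu,
            # For positive byte counts, min(a/b, b/a) equals min(a, b) / max(a, b).
            # Assumption: one-way pairs get a symmetry of 0.0.
            "symmetry": min(bytes_uv, bytes_vu) / larger if larger else 0.0,
        }
    return tag


# Illustrative flow records (made-up values, not project data).
example_flows = [
    {"src": "A", "dst": "B", "label": "web", "duration": 2.0, "packets": 10, "bytes": 1000},
    {"src": "B", "dst": "A", "label": "web", "duration": 4.0, "packets": 20, "bytes": 4000},
    {"src": "A", "dst": "B", "label": "dns", "duration": 1.0, "packets": 2, "bytes": 100},
    {"src": "C", "dst": "C", "label": "web", "duration": 1.0, "packets": 1, "bytes": 60},
]
example_tag = build_tag(example_flows)
```

Here the three flows between A and B collapse into a single undirected edge, and the C-to-C loop is discarded. A graph library could hold the same attributes, but a plain dictionary keeps the sketch self-contained.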