A Preliminary Performance Comparison of Five Machine Learning Algorithms for Practical IP Traffic Flow Classification
The identification of network applications through observation of associated packet traffic flows is vital to the areas of network management and surveillance. Currently popular methods such as port number and payload-based identification exhibit a number of shortfalls.
An alternative is to use machine learning (ML) techniques and identify network applications based on per-flow statistics, derived from payload-independent features such as packet length and inter-arrival time distributions. The performance impact of feature set reduction, using Consistency-based and Correlation-based feature selection, is demonstrated on Naïve Bayes, C4.5, Bayesian Network and Naïve Bayes Tree algorithms. We then show that it is useful to differentiate algorithms based on computational performance rather than classification accuracy alone, as although classification accuracy between the algorithms is similar, computational performance can differ significantly.
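As a rough illustration of what "per-flow statistics derived from payload-independent features" can look like, the following minimal sketch groups packets by a flow key and summarises packet lengths and inter-arrival times. The function name `flow_features`, the tuple layout of the input, and the particular statistics chosen are assumptions made for illustration; they are not the feature set used in the paper.

```python
from collections import defaultdict
from statistics import mean, pstdev


def flow_features(packets):
    """Summarise each flow using only packet sizes and timing.

    `packets` is an iterable of (flow_key, timestamp_seconds, length_bytes).
    The flow key is assumed to be a hashable 5-tuple; no payload is inspected.
    """
    flows = defaultdict(list)
    for key, ts, length in packets:
        flows[key].append((ts, length))

    features = {}
    for key, pkts in flows.items():
        pkts.sort()  # order packets by timestamp
        lengths = [length for _, length in pkts]
        times = [ts for ts, _ in pkts]
        # Inter-arrival times: gaps between consecutive packets of the flow.
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        features[key] = {
            "packets": len(pkts),
            "len_min": min(lengths),
            "len_mean": mean(lengths),
            "len_max": max(lengths),
            "len_std": pstdev(lengths),
            # A single-packet flow has no gaps, so fall back to 0.0.
            "iat_mean": mean(gaps) if gaps else 0.0,
            "iat_std": pstdev(gaps) if gaps else 0.0,
        }
    return features


# Hypothetical packets: (flow key, timestamp in seconds, length in bytes).
web_flow = ("10.0.0.1", "10.0.0.2", 40000, 80, "tcp")
dns_flow = ("10.0.0.3", "10.0.0.2", 40001, 53, "udp")
example_packets = [
    (web_flow, 0.0, 60),
    (dns_flow, 0.2, 80),
    (web_flow, 0.5, 1500),
    (web_flow, 1.5, 1500),
]
feats = flow_features(example_packets)
```

Each resulting dictionary is one feature vector that a supervised classifier could be trained on; a feature selection step would then keep only a subset of these keys.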
This work is a nice empirical study of the use of mainstream machine learning algorithms for the classification of network traffic. As the title suggests, it is a preliminary study, and it does a good job of filling that role.
An important role of this work is to show the need for thorough comparisons among the plethora of proposed solutions for traffic classification. The machine learning techniques and their use are explained so carefully that the paper can also serve as a quick primer on supervised learning. Certainly there are other learning algorithms, other features, other performance measures, different approaches to traffic classification, and (in general) more research that could be done. This paper is a good first attempt to create discussion and inspire future research in this direction.
| Attachment | Size |
|---|---|
| p7-williams.pdf | 269.75 KB |
