Intrusion detection has become an open challenge in any modern ICT system due to the ever-growing urge towards assuring security of present day networks. Various machine learning methods have been proposed for finding an effective solution to detect and prevent network intrusions. Many approaches, tuned and tested by means of public datasets, capitalize on well-known classifiers, which often reach detection accuracy close to 1. However, these results strongly depend on the training data, which may not be representative of real production environments and ever-evolving attacks. This paper is an initial exploration around this problem. After having learned a detector on the top of a public intrusion detection dataset, we test it against held-out data not used for learning and additional data gathered by attack emulation in a controlled network. The experiments presented are focused on Denial of Service attacks and based on the CICIDS2017 dataset. Overall, the figures gathered confirm that results obtained in the context of synthetic datasets may not generalize in practice.
A Critique on the Use of Machine Learning on Public Datasets for Intrusion Detection
Catillo M.;Pecchia A.;Villano U.
2021-01-01
Abstract
Intrusion detection has become an open challenge in any modern ICT system due to the ever-growing urge towards assuring security of present day networks. Various machine learning methods have been proposed for finding an effective solution to detect and prevent network intrusions. Many approaches, tuned and tested by means of public datasets, capitalize on well-known classifiers, which often reach detection accuracy close to 1. However, these results strongly depend on the training data, which may not be representative of real production environments and ever-evolving attacks. This paper is an initial exploration around this problem. After having learned a detector on the top of a public intrusion detection dataset, we test it against held-out data not used for learning and additional data gathered by attack emulation in a controlled network. The experiments presented are focused on Denial of Service attacks and based on the CICIDS2017 dataset. Overall, the figures gathered confirm that results obtained in the context of synthetic datasets may not generalize in practice.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.