A Hybrid Federated Learning Framework for Privacy-Preserving Near-Real-Time Intrusion Detection in IoT Environments
Salvatore Rampone
2025-01-01
Abstract
The proliferation of Internet of Things (IoT) devices has introduced significant cybersecurity challenges, particularly for intrusion detection. Traditional centralized machine learning approaches, while effective, often compromise data privacy and scalability because they require data aggregation. In this study, we propose a federated learning framework for near-real-time intrusion detection in IoT environments. Federated learning enables decentralized model training across multiple devices without exchanging raw data, thereby preserving privacy and reducing communication overhead. Our approach builds upon a previously proposed hybrid model, which combines a machine learning model deployed on IoT devices with a second-level cloud-based analysis. That earlier work required all data to be passed to the cloud in aggregate form, which limited security. We extend the model to incorporate federated learning, allowing distributed training while maintaining high accuracy and privacy. We evaluate our federated-learning-based model against a traditional centralized model, focusing on accuracy retention, training efficiency, and privacy preservation. Our experiments use real attack data partitioned across multiple nodes. The results demonstrate that this hybrid federated learning approach offers significant advantages in data privacy and scalability while retaining the competitive accuracy of the previous model. The paper also explores the integration of federated learning with cloud-based infrastructure, leveraging platforms such as Databricks and Google Cloud Storage. We discuss the challenges and benefits of implementing federated learning in a distributed environment, including the use of Apache Spark and MLlib for scalable model training. All the algorithms used maintain excellent identification accuracy (98% for logistic regression, 97% for SVM, and 100% for Random Forest). We also report a very short training time (less than 11 s on a single machine) and confirm the very low application time of the previous model (0.16 s for over 1,697,851 packets). Our findings highlight the potential of federated learning as a viable solution for enhancing cybersecurity in IoT ecosystems, paving the way for further research in privacy-preserving machine learning techniques.
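The core idea described in the abstract, training on each node's private data and sharing only model parameters, can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract mentions Apache Spark and MLlib, whereas this example uses plain NumPy. The aggregation rule (sample-weighted federated averaging), the synthetic two-class "traffic" data, and all function names and hyperparameters are assumptions made for illustration only.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def local_update(weights, X, y, lr=0.5, epochs=20):
    """Run a few gradient-descent steps of logistic regression on one node's data.

    The raw data (X, y) never leaves this function; only weights are returned.
    """
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)
        w -= lr * grad
    return w


def federated_average(local_weights, sizes):
    """Combine node models as a mean weighted by local sample count (assumed rule)."""
    sizes = np.asarray(sizes, dtype=float)
    return np.average(np.stack(local_weights), axis=0, weights=sizes)


def make_synthetic_traffic(n, rng):
    """Hypothetical stand-in for packet features: benign (0) vs attack (1) blobs."""
    y = rng.integers(0, 2, size=n)
    X = rng.normal(loc=y[:, None] * 2.0 - 1.0, scale=1.0, size=(n, 4))
    X = np.hstack([X, np.ones((n, 1))])  # constant column acts as the bias term
    return X, y.astype(float)


def run_federated_training(n_nodes=3, rounds=10, seed=0):
    """Simulate several nodes training locally and a server averaging their models."""
    rng = np.random.default_rng(seed)
    nodes = [make_synthetic_traffic(200 + 100 * i, rng) for i in range(n_nodes)]
    global_w = np.zeros(5)
    for _ in range(rounds):
        local_ws = [local_update(global_w, X, y) for X, y in nodes]
        global_w = federated_average(local_ws, [len(y) for _, y in nodes])
    return global_w, nodes


def accuracy(w, X, y):
    """Fraction of samples whose predicted class (threshold 0.5) matches the label."""
    return float(np.mean((sigmoid(X @ w) >= 0.5) == (y == 1.0)))


if __name__ == "__main__":
    w, nodes = run_federated_training()
    for i, (X, y) in enumerate(nodes):
        print(f"node {i}: accuracy {accuracy(w, X, y):.3f}")
```

In this sketch, each round sends the current global weights to every node and receives updated weights back, so the only traffic between nodes and server is a five-element vector per node, regardless of how many packets each node has observed.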