
AutoLog: Anomaly detection by deep autoencoding of system logs

Catillo M.; Pecchia A.; Villano U.
2022-01-01

Abstract

System logs have been used to detect and troubleshoot anomalies in production systems since the early days of computing. Despite advances in the area, analyzing the log files emitted by real-life systems still poses many peculiar challenges. Up-to-date tools, such as log management and Security Information and Event Management (SIEM) products, capitalize on standard data formats, logging protocols and dictionaries of threat signatures, which hardly fit the logs of industrial and proprietary systems. This paper addresses the analysis of logs emitted by computer systems, with a focus on anomaly detection. The proposed approach, named AutoLog, consists of sampling the logs at regular intervals and computing numeric scores. Scores collected under normative operations are used to train a semi-supervised deep autoencoder, which serves as a baseline to classify future scores. The approach is not constrained by the structure of the underlying logs and does not require anomalies at training time. Results obtained in detecting anomalies in two industrial systems and in the public BG/L and Hadoop datasets, which are widely used as benchmarks, indicate that the recall of AutoLog ranges between 0.96 and 0.99, while the precision lies between 0.93 and 0.98. A comparative study with isolation forest, one-class SVM, decision tree, vanilla autoencoder and variational autoencoder is conducted to demonstrate the validity of the proposal.
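The sampling step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the numeric scores are computed, so everything here is an assumption made for illustration: the `window_scores` function, the `TERMS` tuple, and the choice of "line count plus per-term hit counts" as the score vector are hypothetical and are not AutoLog's actual scoring.

```python
from collections import defaultdict

# Hypothetical terms of interest (assumption: the abstract does not
# specify which quantities AutoLog scores).
TERMS = ("error", "fail", "timeout")

def window_scores(entries, interval, terms=TERMS):
    """Group (timestamp_seconds, message) pairs into fixed-length windows
    and return one numeric score vector per window.

    Each vector is [line_count, hits_for_terms[0], hits_for_terms[1], ...],
    where a hit is a line containing the term (case-insensitive).
    """
    buckets = defaultdict(list)
    for ts, msg in entries:
        buckets[int(ts // interval)].append(msg.lower())
    if not buckets:
        return []
    first, last = min(buckets), max(buckets)
    rows = []
    for idx in range(first, last + 1):
        # Windows with no log lines are kept as all-zero rows,
        # so the sampling stays regular in time.
        msgs = buckets.get(idx, [])
        rows.append([len(msgs)] + [sum(t in m for m in msgs) for t in terms])
    return rows

# Toy log: timestamps in seconds, sampled every 10 seconds.
entries = [
    (0.5, "service started"),
    (3.0, "request served"),
    (12.0, "ERROR: disk write failed"),
    (14.2, "timeout waiting for peer"),
    (35.0, "request served"),
]
scores = window_scores(entries, interval=10)
```

In a pipeline of the kind the abstract outlines, vectors like these collected during normative operation would form the training set of the autoencoder. A common way to use such a model, assumed here rather than taken from the paper, is to flag a later window as anomalous when its reconstruction error exceeds a threshold learned from the training data.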
2022
Anomaly detection
Autoencoder
Cybersecurity
Deep learning
System logs

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12070/52429